The process I describe below has been great for finding and reducing sources of workload growth. How do you monitor your workload?
I want to know the following:
- what resources are consumed by each database account
- what resources are consumed by each database object (table, index)
- what resources are consumed by each database statement
I use the SHOW TABLE_STATISTICS and SHOW INDEX_STATISTICS commands to measure resource consumption per table and index. These measure the number of rows read and changed per table and index.
Daily cron jobs run these SHOW commands, archive the results and generate HTML reports. My tools for this are remarkably primitive. I use AWK and Bash shell scripts and the results are stored in flat files. It would be nice to use a better toolchain for this, store the results in a database and produce graphs on demand. There is an opportunity here for others to provide a great tool.
Another tool (thanks Chip) runs these commands every N seconds and displays the changes in values. This is excellent for identifying the source of a workload surge. Without them you either need to capture all queries that are run or hope that most of the load is from long running statements and use SHOW PROCESSLIST to find them. To log all statements that get run you can use tcpdump or the slow query log with long_query_time=0. But on many builds, long_query_time is not a dynamic variable and requires a server restart to change.
There are a few ways to monitor statement activity. The slow query log can be used with long_query_time=0 to log all statements. I prefer not to log all statements to local disk because of the overhead. I am in favor of sampling. SHOW PROCESSLIST can be used to sample the statements that get run. Statements that run for a short amount of time seem to be underreported using this technique. An alternative is to provide an option for the slow query log to log one of every N statements.
It is difficult to make sense of a log of SQL statements, so you need to rank and aggregate the results. Aggregation is done by normalizing the query text (remove extra white space, replace literals with _). Ranking is done by determining the total time spent running the query if you have output from the slow query log and by counting the number of occurrences of the query if you have output from SHOW PROCESSLIST. This can be expressed as SQL by:
select normalized(stmt), sum(cost) from Log group by normalized(stmt) order by sum(cost)We have a tool that samples queries using SHOW PROCESSLIST, archives the results and supports queries over the archive. I have a cron job that queries the archive once per day to display the top-N statements from the last day, last 3 days and last 7 days. There is a great opportunity for others to provide a tool. MySQL Enterprise has work in progress to do this.
Where do we go from here?
The Percona MySQL branch has support for some of SHOW USER_STATISTICS, SHOW TABLE_STATISTICS and SHOW INDEX_STATISTICS. These features have yet to appear in a MySQL release nor has their potential appearance been announced. We need to keep on asking for them. Putting the code in the server allows for open and closed source monitoring tools to be built.
If it hasn't already been done, long_query_time must be a dynamic variable.
Someone needs to figure out a good solution for statement logging.

