You know that long-running query on your server, the one with a GROUP BY and a large join? It was probably creating a very large temp table on disk. How large was it? That is hard to tell. If the temp table has spilled to disk and you can log in to the server, then you can check the size of the temp table file.
I prefer to monitor this and have mysqld enforce a limit. Who wants to run out of disk space because of a 100G temp table? But MySQL provides no limit for this. You can put temp tables on a separate file system to cap their size, but if you have not done that then I think you are stuck. I want to add an option to MySQL so that the max size of an on-disk temp table can be enforced. Options already exist for the max size of an in-memory temp table.
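Until mysqld can enforce such a limit, the spill can at least be watched from outside. A minimal sketch, assuming temp tables spill into a single known directory (MySQL's tmpdir) and that summing file sizes there is a good-enough proxy:

```python
import os

def temp_dir_bytes(tmpdir):
    """Sum the sizes of all files under tmpdir, where on-disk temp
    tables spill. Files may vanish mid-walk, so errors are ignored."""
    total = 0
    for root, _dirs, files in os.walk(tmpdir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # temp file removed while we were walking
    return total

def over_limit(tmpdir, limit_bytes):
    """True when spill has exceeded the limit we would like
    mysqld itself to enforce."""
    return temp_dir_bytes(tmpdir) > limit_bytes
```

A cron job could call `over_limit` and alert (or kill the offending query) when it returns true; the limit value and the reaction are site policy, not anything MySQL defines.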
-
I want an option to prevent queries with unexpected cartesian joins from running. When enabled an error will be raised for this query:
select * from foo, bar
This query will continue to work:
select * from foo cross join bar
I read many more queries than I write, and it is much easier to find missing join conditions in a query when ANSI join syntax is used.
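Until such an option exists in the server, a rough client-side lint can flag the comma syntax. This is only a sketch: the regex ignores strings, comments, and subqueries, and will also flag commas that happen to follow FROM for other reasons (e.g. an ORDER BY list), so treat hits as candidates for review, not errors.

```python
import re

# Match a comma between two identifiers somewhere after FROM,
# i.e. the implicit (comma) join syntax the proposed option would reject.
COMMA_JOIN = re.compile(r"\bfrom\b[^(]*?\w\s*,\s*\w", re.IGNORECASE)

def has_comma_join(sql):
    """True when the query appears to use comma-join syntax."""
    return bool(COMMA_JOIN.search(sql))
```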
-
How do you monitor the usage of temp tables in a MySQL database? I want to know what tables exist and how much disk space and memory they consume. I also want to view the index statistics for them when there are long running queries with bad plans. The following SQL commands might make this easier to do:
- SHOW TEMP TABLE STATUS displays the table status for every temp table on a server including those created implicitly to process queries with GROUP BY and ORDER BY
- SHOW TEMP TABLE INDEXES displays index cardinality statistics for all temp tables
-
There are some cases when I want to know the plan for a running query. These include:
- MySQL doesn't explain INSERT, UPDATE and DELETE statements. As the optimizer handles these statements differently from SELECT, the query plan may be very different from what is expected.
- InnoDB uses sampling to provide index cardinality statistics. Different servers with the same data frequently have different stats when the distribution of rows to distinct key values is not uniform. This is a frequent cause for bad query plans and usually requires hints to force a join order.
- The contents of a temp table used by a query are not available to the DBA attempting to explain the query. It would help if I could run SHOW TEMP TABLE STATUS and SHOW TEMP TABLE INDEXES to understand the contents of all temp tables currently in use.
- A user may forget to collect statistics on a temp table prior to using it. I wish that stats were automatically collected in this case. They are not.
- The MySQL optimizer is very good at choosing the correct index because per-table selectivity estimates are extremely accurate. It has occasional problems choosing a good join order when there is skew. The problem is that the optimizer computes the join selectivity between tables from the number of distinct values in the join columns (displayed by SHOW INDEXES when the columns are indexed). This computation assumes that rows are uniformly distributed across the distinct values. The join selectivity is used to determine the join order. There is a page in the MySQL documentation that explains some of this but I cannot find it. We have many queries that force join orders to compensate for this. Forcing the index for a table is done much less frequently.
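The uniformity assumption behind that estimate can be illustrated with made-up numbers: one hot key value in a join column makes rows-per-key a bad predictor, which is exactly how a join order chosen from these stats goes wrong.

```python
from collections import Counter

# Hypothetical join column in table bar: 1000 rows, 101 distinct
# values, but heavily skewed toward key value 1.
fk_values = [1] * 900 + list(range(2, 102))

rows = len(fk_values)              # 1000
distinct = len(set(fk_values))     # 101

# What the optimizer assumes: every key matches rows/distinct rows.
est_rows_per_key = rows / distinct  # about 10

# What actually happens for the hot key.
actual_hot = Counter(fk_values)[1]  # 900 -- roughly 90x the estimate
```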
The plan for this query:
select * from foo, bar where foo.fk = bar.pk
Will be:
foo:fk_index,bar:pk_index
This can be extended to report on query progress by displaying the number of rows read from each table in the plan:
foo:fk_index:10,bar:pk_index:100
This output can also be written to the slow query log.
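A consumer of that progress line might parse it like this. Note the `table:index:rows_read` format is the proposal above, not an existing MySQL feature:

```python
def parse_progress(line):
    """Parse 'table:index:rows_read' triples from the proposed
    progress format, e.g. 'foo:fk_index:10,bar:pk_index:100'."""
    plan = []
    for part in line.split(","):
        table, index, rows = part.split(":")
        plan.append((table, index, int(rows)))
    return plan
```

A tool tailing the slow query log could diff successive samples of these counts to show which table in the join is making progress.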
-
Percona has done a great job improving the user_statistics code and making it available to others. The data from user_statistics can be used for real-time and long-term workload monitoring.
A common problem for real-time monitoring is determining the source of the load on a database server. If the load on a server suddenly increases and the cause is many fast queries, you may not be able to identify the problem using SHOW PROCESSLIST. You can use tcpdump or enable the slow query log, but neither is easy to use. If you are an Enterprise customer and use the MySQL Proxy, then there is a new workload monitoring tool that can be used.
All of the data you need to identify the problem is in user_statistics -- find the account with large changes for Busy_time, Cpu_time, Select_commands, Update_commands or Other_commands. It helps to have a tool display the differences in values from user_statistics over time. I hope that innotop provides this.
Another problem is long-term workload monitoring. Changes in workload are easier to spot if you sample and log statistics. I run a cron job to read the values once per day and then generate reports that list the top-N accounts over the past day and week. If you use MySQL 5.1, then you can use the event scheduler (I can enter the use-case competition now). I imagine that third-party monitoring tools might also be a good place to put this data, but I don't have much experience with them. An advantage of putting this data into a SQL database is that you can query it and evaluate expressions (Cpu_time / (Select_commands + Update_commands)).
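The top-N report above reduces to diffing two samples per account. A minimal sketch, assuming each sample has been read into a dict of per-user counters; the counter names (Busy_time, Select_commands, ...) come from the user_statistics patch:

```python
def top_accounts(prev, curr, column, n=3):
    """Rank accounts by the growth of one user_statistics counter
    between two samples.  Each sample is {user: {counter: value}}.
    Accounts absent from the earlier sample count from zero."""
    deltas = {
        user: counters.get(column, 0) - prev.get(user, {}).get(column, 0)
        for user, counters in curr.items()
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Run against daily samples this yields the top-N report; run against samples seconds apart it answers the real-time "who is loading the server" question from the previous paragraph.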