Sunday, September 21, 2008

SHOW QUERY PLANS

There are some cases when I want to know the plan for a running query. These include:
  • MySQL doesn't explain INSERT, UPDATE and DELETE statements. As the optimizer for SELECT statements is different than that for the other statements, the query plan may be very different than expected.
  • InnoDB uses sampling to provide index cardinality statistics. Different servers with the same data frequently have different stats when the distribution of rows to distinct key values is not uniform. This is a frequent cause for bad query plans and usually requires hints to force a join order.
  • The contents of a temp table used by a query are not available to the DBA attempting to explain the query.  It would help if I could run SHOW TEMP TABLE STATUS and SHOW TEMP TABLE INDEXES to understand the contents of all temp tables currently in use.
  • A user may forget to collect statistics on a temp table prior to using it. I wish that stats were automatically collected in this case. They are not.
  • The MySQL optimizer is very good at choosing the correct index as per-table selectivity estimates are extremely accurate. It has occasional problems choosing a good join order when there is skew. The problem is that the optimizer computes the join selectivity between tables based on the number of distinct number of values in the join columns (this is displayed by SHOW INDEXES for when the columns are indexes). This computation assumes that the number of rows per distinct value are uniformly distributed. The join selectivity is used to determine the join order. There is a page in the MySQL documentation that explains some of this but I cannot find it. We have many queries that force join orders to compensate for this. Forcing the index for a table is done much less frequently.
To compensate for these problems, I will implement SHOW QUERY PLANS or SHOW PROCESSLIST WITH PLANS. The output from this is one line per query where the plan is represented as a comma-separated list of table:index pairs and represents the join order used for the query. It would be great if I could also do EXPLAIN PLAN FOR _connection-id_, but the join order is sufficient for me.  An official implementation of this should use the information schema.

The plan for this query:
select * from foo, bar where foo.fk = bar.pk
Will be:
foo:fk_index,bar:pk_index
This can be extended to report on query progress by displaying the number of rows read from each table in the plan:
foo:fk_index:10,bar:pk_index:100
This output can also be written to the slow query log.

2 comments:

 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.