Saturday, July 26, 2008

Top-N monitoring

A good way to save a lot of time with performance monitoring and workload management is to only look at the largest consumers of your DBMS. I call this top-N monitoring, because I only want to look at the top-N entries on any report that displays resource consumption. I also want to look at the changes in values over time. Tools for doing this with MySQL are starting to emerge.

The process I describe below has been great for finding and reducing sources of workload growth. How do you monitor your workload?

I want to know the following:
  • what resources are consumed by each database account
  • what resources are consumed by each database object (table, index)
  • what resources are consumed by each database statement
I use the SHOW USER_STATISTICS command to measure resource consumption per account. This measures wall clock and CPU time running commands, the number of rows fetched/changed/read, the number of connections and connection failures, the number of network bytes sent/received, the number of commits and rollbacks and the number of binlog bytes written per account.

I use the SHOW TABLE_STATISTICS and SHOW INDEX_STATISTICS commands to measure resource consumption per table and index. These measure the number of rows read and changed per table and index.

Daily cron jobs run these SHOW commands, archive the results and generate HTML reports. My tools for this are remarkably primitive. I use AWK and Bash shell scripts and the results are stored in flat files. It would be nice to use a better toolchain for this, store the results in a database and produce graphs on demand. There is an opportunity here for others to provide a great tool.

Another tool (thanks Chip) runs these commands every N seconds and displays the changes in values. This is excellent for identifying the source of a workload surge. Without them you either need to capture all queries that are run or hope that most of the load is from long running statements and use SHOW PROCESSLIST to find them. To log all statements that get run you can use tcpdump or the slow query log with long_query_time=0. But on many builds, long_query_time is not a dynamic variable and requires a server restart to change.

There are a few ways to monitor statement activity. The slow query log can be used with long_query_time=0 to log all statements. I prefer not to log all statements to local disk because of the overhead. I am in favor of sampling. SHOW PROCESSLIST can be used to sample the statements that get run. Statements that run for a short amount of time seem to be underreported using this technique. An alternative is to provide an option for the slow query log to log one of every N statements.

It is difficult to make sense of a log of SQL statements, so you need to rank and aggregate the results. Aggregation is done by normalizing the query text (remove extra white space, replace literals with _). Ranking is done by determining the total time spent running the query if you have output from the slow query log and by counting the number of occurrences of the query if you have output from SHOW PROCESSLIST. This can be expressed as SQL by:
select normalized(stmt), sum(cost) from Log group by normalized(stmt) order by sum(cost)
We have a tool that samples queries using SHOW PROCESSLIST, archives the results and supports queries over the archive. I have a cron job that queries the archive once per day to display the top-N statements from the last day, last 3 days and last 7 days. There is a great opportunity for others to provide a tool. MySQL Enterprise has work in progress to do this.

Where do we go from here?

The Percona MySQL branch has support for some of SHOW USER_STATISTICS, SHOW TABLE_STATISTICS and SHOW INDEX_STATISTICS. These features have yet to appear in a MySQL release nor has their potential appearance been announced. We need to keep on asking for them. Putting the code in the server allows for open and closed source monitoring tools to be built.

If it hasn't already been done, long_query_time must be a dynamic variable.

Someone needs to figure out a good solution for statement logging.

Only 4 DBMS vendors matter (according to Microsoft)

As part of the marketing for the new version of SQL Server, Microsoft provides comparisons for 3 other vendors. MySQL made the list! This is quite an honor. The content is OK, but there are a few funny points.
SQL Server is integrated with Microsoft Update for security updates. MySQL has no automatic update patching. 
I would love to have automatic updates applied to servers running in production. It would be even better if Microsoft did not notify me when servers were being stopped so this could be done.
Microsoft has the largest developer and support staff in the world. MySQL has approximately 70 developers and 50 support staff.
Apparently, Microsoft needs thousands or maybe tens of thousands of support staff.

Friday, July 25, 2008

Hey Drizzle, share the love

Drizzle gets all the love right now. MySQL developers who aren't working on Drizzle might feel a little bit unloved. I don't want them to stop what they are doing and move over to Drizzle. I need them to keep on fixing bugs in 5.0 and getting 5.1 ready for GA. So lets share the love. Hug a MySQL developer and let them know you appreciate their hard work -- but only if they are not working on Drizzle.

And speaking of 5.1, what is the plan for releasing 5.1 plugins for Maria and Falcon? Given that the storage engine API supports plugins, the release of a storage engine doesn't require a new server release. InnoDB has already proven that by giving us a plugin that provides compression and fast DDL.

Hey Drizzle, share the love

Drizzle gets all the love right now. MySQL developers who aren't working on Drizzle might feel a little bit unloved. I don't want them to stop what they are doing and move over to Drizzle. I need them to keep on fixing bugs in 5.0 and getting 5.1 ready for GA. So lets share the love. Hug a MySQL developer and let them know you appreciate their hard work -- but only if they are not working on Drizzle.



And speaking of 5.1, what is the plan for releasing 5.1 plugins for Maria and Falcon? Given that the storage engine API supports plugins, the release of a storage engine doesn't require a new server release. InnoDB has already proven that by giving us a plugin that provides compression and fast DDL.

Thursday, July 17, 2008

Going to OSCON

I will be at OSCON on Thursday and Friday. Who wants to go raise heck at the PostgreSQL booth? Will MySQL host Curt Monash at the Sun booth?

To fill out this post, my list of great things I don't get to use:
  • MapReduce, Bigtable and all of the other great Google technology. I focus on MySQL and don't get to partake.
  • ZFS - with this I can disable the InnoDB double write buffer
  • Thumper - Is there a desktop version of this? Perhaps I can win one at the Sun booth.
  • PostgreSQL

Wednesday, July 16, 2008

What exactly does FLUSH TABLES WITH READ LOCK do?

FLUSH TABLES WITH READ LOCK can do wonders. But you should understand what it does to avoid problems. The manual describes it with this:
Closes all open tables and locks all tables for all databases with a read lock until you explicitly release the lock by executing UNLOCK TABLES. This is very convenient way to get backups if you have a filesystem such as Veritas that can take snapshots in time.
FLUSH TABLES WITH READ LOCK acquires a global read lock and not table locks, so it is not subject to the same behavior as LOCK TABLES and UNLOCK TABLES with respect to table locking and implicit commits:
The implementation does this:
  1. set the global read lock - after this step, insert/update/delete/replace/alter statements cannot run
  2. close open tables - this step will block until all statements started previously have stopped
  3. set a flag to block commits
If long-running statements were started prior to step 2, then the FLUSH TABLES WITH READ LOCK command will block until they complete. This is a bad state to get stuck in as the server is in read-only mode at this point and this statement is frequently run on a primary. It might be a good idea to kill long-running statements prior to running the FLUSH command or when it gets stuck. Note that commit can still be done until step 3 finishes.

Here is the code:
      if (lock_global_read_lock(thd))
        return 1;                               // Killed
      result=close_cached_tables(thd,(options & REFRESH_FAST) ? 0 : 1,
                                 tables);
      if (make_global_read_lock_block_commit(thd)) // Killed
      {
        /* Don't leave things in a half-locked state */
        unlock_global_read_lock(thd);
        return 1;

Tuesday, July 15, 2008

Do you really want to use XA with MySQL?

I have never been a fan of XA. One reason is because I have had to use and debug buggy implementations of it elsewhere. But the big reason is that it seems like one of those things that sounds great with a lot of hand-waving -- click here, here and here and presto -- distributed commit. And then you put it in production and failure happens.

Given the restrictions on XA in MySQL, there is even less reason to use it. If a server crashes and there are transactions in the PREPARED state, the transactions can be commited after the server recovers. But those transactions will not be written into the binlog. What good is doing all of this work to keep separate primaries in sync if the primaries and slaves will not have the same transactions?

What happens when an app forgets to commit or rollback a PREPARED transaction? Nothing good. This is what a PREPARED transaction normally looks like in SHOW INNODB STATUS.
---TRANSACTION 0 23167490, ACTIVE (PREPARED) 39 sec, process no 13492, OS thread id 3002907568
1 lock struct(s), heap size 320, undo log entries 1
MySQL thread id 1, query id 9 mark 172.0.0.0 m
And this is what it looks like after a server has crashed and restarted. There is not much here and no way to map it back to the database account that created it. At this point you need to use XA RECOVER and hope the app that created it used a good transaction ID.
---TRANSACTION 0 23167490, ACTIVE (PREPARED) 2 sec, process no 0, OS thread id 0
, undo log entries 1
XA RECOVER is your friend in these cases. It provides enough data so you can commit or rollback.
mysql> xa recover;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            3 |            0 | bar  |
+----------+--------------+--------------+------+
Today I looked at a log from a crash. The cause was a 4 day old XA transaction that someone forgot to commit. Ouch. I have a solution that is better than monitoring for stale transactions. Disable the use of  XA commands in the MySQL parser and eliminate the problem. There is a variable, innodb_support_xa, that can be set to disable the support for XA by InnoDB, but this only changes things internally. The XA commands will still be parsed. And applications will not know that XA is not supported.

The MySQL implementation has other interesting behavior. It will rollback a prepared transaction when the session that prepared the transaction disconnects and when the server does a clean shutdown. When the server crashes, prepared transactions do not get a rollback during recovery. Given that the MySQL server acts as a resource manager and not as a transaction manager, this seems odd. If a client starts an XA transaction on several servers, prepares all of them and then loses a connection because of a flaky network, it will be suprised that it cannot commit on all servers. And there is no way to prevent this.

Monday, July 14, 2008

Enterprise implies upwards compatability

MySQL is pretty good about supporting upwards compatability with one exception -- replication. This should be fixed as production upgrades usually require running the new version first on a slave. Compatability can be measured in the following areas:
  • client API - must I relink my clients
  • datafile - must I dump and reload my database
  • replication - can I replicate from a version N master to version N+X slave
  • SQL - do statements that work in version N work in version N+X
MySQL has been very good about client API and SQL. Things were only broken to make the system better (client API change to use more secure password hash, SQL parser stopped accepting ambiguous statements).

MySQL states this about replication compatability. I don't have much faith in the support for rolling upgrades after reading this. They should assert there is no problem when replicating to a slave that is one version greater than the master. Are there tests for replication compatability?
As a general rule, you should set up replication only between masters and slaves running the same major versions (5.1, 5.0 or 4.1) of MySQL. If you must execute replication between different major versions, ensure that your client is at a version equal to or higher than that of the master.
MySQL docs are clear that datafile compatability is probably not supported on upgrade. I don't mind using dump and restore as that is a one time cost and it is one less thing to worry about.
MySQL recommends that you dump and reload your tables from any previous version to upgrade to 5.1.

Saturday, July 12, 2008

DDL twice as fast

If you are not running MySQL 5.1 with the InnoDB plugin, then DDL on large tables takes a long time because it usually requires making a copy of the table and rebuilding all indexes. There are two easy ways to make this go faster:
  1. Run the DDL on master and slaves concurrently. Assuming you can take the master and slaves down at the same time, don't use MySQL replication for long running DDL statements. If you do, then the operation takes twice as long as the DDL statement as the slaves don't start the DDL statement until the master finishes it. Make sure to run set sql_log_bin=0 on the master prior to running the DDL statement.
  2. Use a large value for innodb_log_file_size. You shouldn't always use the largest value possible, but it can make DDL much faster for operations that are IO bound because there is less checkpoint IO. For a test I ran today, the alter table operation was twice as fast with 3 1.3GB log files than with 3 128MB log files.

Friday, July 11, 2008

How do you know when InnoDB gets behind on IO?

Hopefully, someone will write a book about InnoDB one day because it is beautiful. Most of the complexity is hidden by adaptive algorithms that work. Sometimes you need to understand some of the complexity. Peter Zaitsev has written about this as well. The blog post is worth reading.

InnoDB uses an insert buffer to reduce IO for secondary index maintenance on updates and inserts. This is a persistent index of pending changes that must be done for secondary indexes. When a server gets busy, the insert buffer may grow. This line from SHOW INNODB STATUS displays the size of the insert buffer. When it is large, there is more pending IO to be done to flush changes from the insert buffer to the secondary indexes.
 Ibuf: size 37, free list len 70989, seg size 71027,
InnoDB does not remove deleted rows from a table until they cannot be read by any other transaction. This is done by a background thread. Deleted rows that have not been purged artificially increase the size of a table. They also increase the overhead of table scan operations. This line from SHOW INNODB STATUS displays the amount of work pending for the purge operation.

History list length 30
There are two times when you might want to know the value of these variables. When these values are growing, there is more pending IO to be done by background threads. The rate at which the background threads perform this IO is determined by whether the server is currently busy or idle and by an assumption that the server can do 100 IOs per second.
  1. When an application scans a table in primary key order concurrent with another application that deletes rows in primary key order, the scanning application will encounter rows that have been deleted but not yet removed. If the purge by the background thread does not keep up with the delete application, then the scan application may run much slower than expected. The innodb_max_purge_lag configuration variable may help with this. But the first time you have this problem, it might take a while to recognize.
  2. When batch application run concurrent with online applications, it is good to rate limit the work done by the batch apps and these variables can be used to determine when the batch app show slow down.
Finally, there are a couple of changes that can make this easier:
  1. Display the history list length value in SHOW STATUS so we don't have to parse SHOW INNODB STATUS.
  2. Display the insert buffer size in SHOW STATUS as well.
  3. Make the IO capacity of the server a configurable value and derive the max rate of background IO from these threads. This is hardwired to be 100 IOs per second today. Because of this, a server with many disks will take much longer than necessary to flush pending IOs from the insert buffer and to purge deleted rows.

Thursday, July 10, 2008

Can't we all just get along - EnterpriseDB and MySQL

I am not a big fan of marketing and my lack of fanhood extends to MySQL with their Falcon push. Now we have this gem from the CTO of EnterpriseDB.
One big issue with MySQL is that they continue to suffer from shipping products very late and with big quality issues.
I don't think this message is new. I just find it funny that EnterpriseDB is delivering it. They seem to be turning their attention from Oracle to MySQL. That must mean that MySQL is doing well in the marketplace. MySQL is certainly doing better in the marketing marketplace. They were featured in the Wall Street Journal and EnterpriseDB was featured on the blog of Curt Monash. They are also doing much better at trends.google.com.
 
Can MySQL survive the constant attacks from Bob at EnterpriseDB and Emma at Ingres? When will Sybase join in?
 
And in case you missed it, I have been linked by one of my favorite bloggers.

Monday, July 7, 2008

PostgreSQL vs MySQL according to EnterpriseDB

Curt Monash mentions a white paper from EnterpriseDB that compares PostgreSQL and MySQL. It is worth reading.

Did anyone watch the Marcia, Marcia, Marcia episode of the Brady Bunch? Sometimes I think that MySQL is Marcia and someone else is Jan.

There is a claim about MyISAM versus InnoDB performance. It is wrong, but not that different from an opinion I hear too frequently within the MySQL community. Because of this, the focus of too much of the paper is on MyISAM rather than InnoDB.
However, acceptable performance is generally considered to be only available from MySQL’s default storage engine, MyISAM ...

A typo and an incorrect claim about lock granularity in MyISAM. Everyone knows that MyISAM uses table-level locks, not page or block-level locks.

MyISAM lasks this feature, and instead locks rows at the less-granular block level.

More details are needed to make this problem clear. Changes to catalog tables during DDL are not transactional because MyISAM doesn't do transactions. Also, crashes during some DDL operations can be a problem (InnoDB might drop a table but the delete from the catalog is not done, InnoDB might add a table but the add to the catalog is not done).

However, the MySQL catalog only operates on MyISAM. Because of this limitation, catalog corruption and administrative tasks are still problematic.

PostgreSQL does concurrent index builds. The index build runs concurrently with DML as write locks are not taken. It requires two table scans. This is great, but I am confused about the lack of write locks. My guess is that writes to the indexed table are prevented during the second table scan. MySQL doesn't support this. Long running DDL can be done in MySQL without downtime by performing the operation on a slave that is otherwise offline and then make the slave a primary after it has caught up on replication.

Is this true? Sometimes, I wish I were hacking on PostgreSQL rather than MySQL, but the lack of a storage engine API has kept me from doing that. Note that PostgreSQL didn't exist in the late 1980's, Postgres did. The PostgreSQL project converted an interesting academic project into an amazing RDBMS product. MySQL has supported multiple storage engines long before the pluggable storage engine API started.

PostgreSQL has supported multiple storage engines since the late 1980’s. It wasn’t until just recently that MySQL implemented similar functionality via their newly developed Pluggable Storage Engine API.

Partly true. One of the calls is used to get index cardinality stats and no query context is available for that. Index cardinality stats are used to determine join order. These stats also do not represent skew as a uniform distribution is assume. The other call is used to determine selectivity for predicates that can use an index. In this case the query context is provided as the storage engine is given the predicates. InnoDB estimates selectivity by walking the index from the root to leaf for the first and last rows to be scanned for the predicates. The results are amazing.

MySQL’s API presents only two optimizer-related function calls to storage engine developers. These calls, if implemented, can assist in costing plans for the query optimizer. However, because these calls do not present the context of the query to the storage engines, the engines themselves cannot accurately return a proper estimate.

Tuesday, July 1, 2008

The best MySQL book ever

My copy of High Performance MySQL arrived today. Pardon the hyperbole but I reviewed it and I just saw another episode of The Bachelor with the most dramatic rose ceremony ever.

As good as this book is, it will soon need a third edition. There are several storage engines under development (PBXT, q4m, Maria, Falcon) and InnoDB continues to innovate (compression in the 5.1 plugin). Some of the things that we know (MyISAM and InnoDB don't scale on SMP) will no longer be true. And there will be many new things to figure out (just how do you tune Falcon).

Can Domas help write the next edition?
 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.