Thursday, May 1, 2008

EnterpriseDb -- the real open source database

According to ads displayed with my email, Enterprise DB is the real open-source database. At first, I thought that the source for most of their features was not available (I have no interest in Oracle compatability, so I don't want to see that code). But it is available at the download page.

I think that PostgreSQL is amazing for its technology, community, code (nice formatting and documentation) and environment that allows many companies to create value-added versions of it and eventually push back some of those changes to the community. The talk at the MySQL Conference was also amazing, but I had to miss it.

EnterpriseDb has published the source for gridsql that can run queries in parallel on a cluster. Who wants to do a MySQL port?

Thursday, April 24, 2008

The community gives back

Bluehost.com and Percona have made it easier for others to use one of my favorite features, user and table accounting, from the Google patch.

By easier I mean that patches for recent versions of MySQL 5.0 have been published. And if you don't want to go through the trouble of applying the patch, I imagine that Bluehost.com has it running on their hosted servers.

The MySQL Performance blog post describes using the microslow patch in combination with SHOW USER_STATISTICS output to debug workload problems. I hope they describe the process in greater detail in another blog post. I use user and table accounting as the first step in debugging workload problems. With that data I determine which users and tables create the majority of the load and then I use SQL statement logging or sampling from SHOW PROCESSLIST to determine the statements that might be the problem.

The external development community is growing, if slowly, and I am happy to see this. We have had some nice patches and major features and I look forward to more. I am sure I have left others off of this list -- let me know.

Saturday, April 19, 2008

A better (community) HEAP engine

A patch has been posted that makes the HEAP engine use much less memory for rows with variable length columns. This was contributed by Igor Chernyshev while working on the eBay Kernel Team. Thank you Igor and eBay!

Now for the fun and uncertainty. This patch was published with with a GPLv2 license. How do users get it? There is no community branch into which people can submit changes with a GPL license. I will put it into my patch, I expect it to show up in other places such as the 4.0.Domas source tree and there are companies that will port changes such as this on demand. The result of this is that community users are likely to benefit from this change much sooner than others.

Tuesday, April 15, 2008

Online (Community) Backup

Announcing pending availability of the online community backup utility.

I will have patches that make warm backup possible. By warm I mean that the server is put into readonly mode. It does not have to be stopped. Many queries can be done during this time, but DDL and statements that modify rows are blocked. After the server has been put into readonly mode, all files can be copied to create the backup. This will be fast if your filesystem supports snapshots.

The idea for this wasn't mine. Someone I work with requested it and then someone else did the work.

There are two big differences between MySQL Enterprise Online Backup and this:
  • Enterprise Online Backup takes hot backups. Transactions can be done while the backup is taken. Community Online Backup takes warm backups. The server is not stopped, but transactions and DDL are.
  • Community Online Backup works for InnoDB. As InnoDB already has a great hot backup utility (ibbackup), I doubt they will implement the new online backup APIs. The current plan for Enterprise Online Backup with InnoDB is to use the equivalent of mysqldump for backup and LOAD DATA INFILE for restore. Restores will take a really long time or forever depending on how patient you are. Community Online Backup is fast if your filesystem supports snapshots and as fast as your disks otherwise.
This is the procedure to take a backup. As there are more than a few steps, it would be best if tools such as Maatkit were to provide the automation.
  1. Stop replication if the server is a slave.
  2. Put the server into readonly mode. This uses new global variables rather than the existing read_only option as that allows transactions by accounts with the SUPER privilege and from the replication SQL thread.
  3. Kill any sessions performing updates or DDL or wait for the statements to finish.
  4. Flush the InnoDB transaction log and the binlog.
  5. Disable writes by InnoDB. This uses a new option and prevents InnoDB from writing to any files. That includes sessions performing work for user's sessions and background IO threads.
  6. Copy files or take a snapshot.
  7. Undo the previous steps.

InnoDB continues to improve

InnoDB just announced the 5.1 plugin with some great features.
  • page compression
  • fast index creation
  • better storage for large columns
I can't wait to get the source. These will solve critical problems for me. Fast index creation should reduce the downtime window used to perform the (too) frequent schema changes. Page compression will reduce the size of the database which should improve the utilization of the buffer cache and make backup/restore much faster.

Wednesday, April 9, 2008

InnoDB scales on big SMP servers

The performance problems for InnoDB on big SMP servers have been fixed. You just haven't seen the code yet. Attend my talk at the MySQL User Conference for more details on the fixes and some amazing graphs.

Friday, February 15, 2008

Is MySQL getting faster?

Is MySQL getting faster over time? I think so. It is difficult to keep things fast as features are added, but many operations have gotten faster between 4.0.26 and 5.0.54 and the performance of 5.0.54 is much improved over 5.0.37. I ran a few tests to measure the difference:
  • dump and reload a table
  • sql-bench
The formatting of the tables below looked great when I previewed them using the blog editor, but not so good when published.

Dump and Reload

I measured the time to dump and reload a table. The table has 10 million rows, 3 int columns and no indexes. MyISAM and InnoDB storage engines were used. The InnoDB tables were on a tmpfs filesystem to avoid disk IO. The test was performed in 4 steps:
  • Load the data into a MyISAM table using load data infile. This measures the performance of inserting without the overhead of parsing SQL statements.
  • Dump the table using mysqldump --extended-insert. This measures the performance of the MySQL client library and the select code path in the storage engine and the server.
  • Reload the output of mysqldump into a MyISAM table. This measures the performance of the MySQL client library, the mysql command-line tool, the parser and the storage engine.
  • Reload the output of mysqldump into an InnoDB table.
Performance can change significantly between releases, and usually in the right direction:
  • mysqldump is much faster in 5.0. The client library was changed to do fewer read() calls when fetching a query result. This is great.
  • the parser was very fast in 4.0 and 5.0.54 has returned some of that performance. Inserting a large number of rows is frequently bound by the performance of the parser and this was much slower in early 5.0 releases.
  • because of a bug in the 5.0.54 mysql command line client, I used the version from 5.0.51 for the 5.0.54 tests.
This lists the number of seconds to run each test.












































Versionload data infilemysqldumpinsert to MyISAMinsert to InnoDB
4.0.268.919.733.772.7
4.1.228.522.742.779.9
5.0.379.39.469.6117.7
5.0.549.29.650.852.5

sql-bench

This is a simple benchmark distributed with MySQL. I ran the version provided with 4.0.26.
  • InnoDB was used for all tables.
  • Constants that determine the amount of work done for each test were changed.
  • Tests were run on a server with 4 CPU cores.
  • 32-bit binaries were used.
5.0.54 is the fastest. This lists the number of seconds to run each section of the test.







































































ReleaseTotal TimeAlterATISbig-tablesconnectcreateinsertselecttransactionswisconsin
4.0.26390924435415931393227
4.1.22412854250526233323631
5.0.37422424566676134384836
5.0.54378404558545633363930