My Launchpad branches for features from the big Google patch have moved. They are now in the Patch for MySQL project. Hopefully, this is of limited interest as the purpose of these branches is to break down the big Google patch into usable pieces and then hope that others incorporate those into the official and community MySQL projects.
I also committed a few changes (revisions 2695 - 2698 for 5.0, revisions 2750 - 2753 for 5.1) to fix the performance regression the the faster rw-mutex patch. The server is ~25% faster than unmodified MySQL with the fix for the regression test provided by Percona.
Monday, October 27, 2008
Saturday, October 25, 2008
Crashes from Innodb long semaphore waits
In the unlikely case that there was a mysqld crash on servers that I use, a common cause was a long semaphore wait in InnoDB. MySQL bug 32149 has been open for this since November, 2007. We fixed this in our local branch, so this is rarely a problem for us now. The fixes are available in branches of MySQL 5.0 and 5.1 from Launchpad.
Does anyone else get this crash? The symptoms can be found in the database error log:
Does anyone else get this crash? The symptoms can be found in the database error log:
- it contains the text long semaphore wait
- it has text similar to the output SHOW INNODB STATUS
- connections wait too long for the adaptive hash latch (from btr0sea.c)
- a connection holds the adaptive hash latch -- search for holds adaptive hash latch
Friday, October 24, 2008
Hey! You! Get off of my cloud!
Failures of public services generate a lot of publicity. Every EC2 blip generates too many blog posts, including this one, about how the cloud cannot be trusted to run business critical functions. It is a difficult debate to have because availability numbers for cloud services are easy to get (there are companies devoted to reporting this) but public numbers for internal deployments are rare. What is the typical uptime for a deployment of server software from the large providers of server software?
We could start by reporting uptime numbers for MySQL deployments. But there is a big problem here. Who is willing to report a lousy number?
We could start by reporting uptime numbers for MySQL deployments. But there is a big problem here. Who is willing to report a lousy number?
Saturday, October 18, 2008
MySQL Conference proposals that I want to see
These are proposals that I would love to see for the 2009 MySQL Conference:
- MySQL Cluster Data Warehouse Edition. HP converted Tandem NonStep into HP NewView. MySQL Cluster can do the same. I am sure this will happen, but probably not by next year.
- Running a cluster of database servers on 1 big SMP. Sun has the hardware and software to make this possible. The new Sun T2+ server has 32 cores and 256 hardware threads. Solaris has lots of features for partitioning a larger server. The use of SMP as a cluster has been written about before. This is a good fit for MySQL as it will be a while before MySQL can effectively use a large SMP and multi-threaded MySQL Cluster will be limited to small (less than 10) number of threads.
- MySQL performance on Intel SSD. Actually, this should be open to any SSD vendor that provides commodity pricing and great random-write performance. This platform significantly raises the bar for what we expect. Sun has written about using Flash as a cache between memory and spinning disk. It will be interesting to see how this turns out with respect to price performance.
- How we pushed changes from Drizzle back to mainline. Dare to dream. For now I will settle for the changes to compile with -Wall and remove compiler warnings.
- High-throughput reporting performance: MySQL, ZFS, Flash and Sun Thumper. Thumpers have been popular for running Greenplum Postgres, but have not garnered as much press for MySQL. I distinguish reporting from data warehouse workloads. In this case there will be many concurrent complex queries. The queries have enough selectivity that nested loops join is preferred over hash join. 48 disks in a box should give you 5000 to 10,000 IOPs.
Friday, October 17, 2008
Performance regression in the SMP fix patch
The SMP fix patch has a performance regression. This was reported by Percona. They are very good at what they do. I get many ideas for improvements from comments by them and from Domas. The regression may be an issue for workloads with many concurrent updates. We will fix it.
Wednesday, October 15, 2008
Who needs hash join -- Intel SSD will soon be here
Intel SSD will soon be here. Does MySQL still need hash join?
SSD can be a huge deal for MySQL data warehouse deployments. Disk-bound joins between large tables can be slow because nested-loops join uses so many random reads. Hash join is one way to fix this. Decreasing the time for a random read is another.
Intel SSD can do a random read in less than 0.1 milliseconds. It also has great random write performance which has been an issue for SSD and the price is reasonable. Decreasing random read latency by 50X to 100X will make disk-bound joins much faster.
This should also help with:
SSD can be a huge deal for MySQL data warehouse deployments. Disk-bound joins between large tables can be slow because nested-loops join uses so many random reads. Hash join is one way to fix this. Decreasing the time for a random read is another.
Intel SSD can do a random read in less than 0.1 milliseconds. It also has great random write performance which has been an issue for SSD and the price is reasonable. Decreasing random read latency by 50X to 100X will make disk-bound joins much faster.
This should also help with:
- disk-bound aggregation when an indexed MyISAM temp table is used
- old-style external sort when the sort buffer contains (key, row pointer)
Tuesday, October 14, 2008
Faith-based upgrades
MySQL documentation states that replicating from a 5.0 master to a 5.1 slave should work. This is very different from stating that it does work. That section of the manual should enter the 5.1 no-use case competition.
An Enterprise database must claim that it supports replication from one release to the next. MySQL seems to support it in practice -- I am able to replicate from 5.0 to 5.1. Am I taking a risk by using 5.1 slaves or does the documentation need an update?
In general (but not always), slaves running MySQL 5.1.x can be used with older masters, but not the reverse.Most of us upgrade to a new version by running it on slaves for a long time before putting it on a master. Can we put it on critical slaves? After reading the documentation I am not sure.
An Enterprise database must claim that it supports replication from one release to the next. MySQL seems to support it in practice -- I am able to replicate from 5.0 to 5.1. Am I taking a risk by using 5.1 slaves or does the documentation need an update?
Saturday, October 11, 2008
More background IO threads for InnoDB in MySQL 5.1
I have ported code from the Google patch to support more background IO threads and make the InnoDB background IO limit configurable. More details on this are here. The code in in launchpad. More details on launchpad are here.
These changes can improve performance if your servers are IO bound and have many disks. They are more likely to be an issue if you use O_DIRECT for InnoDB or a remote file system such as NFS.
The 5.1 branch in launchpad has these patches:
These changes can improve performance if your servers are IO bound and have many disks. They are more likely to be an issue if you use O_DIRECT for InnoDB or a remote file system such as NFS.
The 5.1 branch in launchpad has these patches:
show_patches.patch SHOW PATCHES 1.0 Jeremy Cole N/A innodb_io_threads.patch INNODB IO THREADS 1.0 mcallaghan@google.com GPL More background IO threads innodb_fast_mutex.patch INNODB FAST MUTEX 1.0 benhandy@google.com GPL Faster Innodb mutex innodb_io_capacity.patch INNODB IO CAPACITY 1.0 mcallaghan@google.com GPL Background IO rate limits show_global_mutex_stats.patch SHOW GLOBAL MUTEX STATUS 1.0 mcallaghan@google.com GPL Displays mutex contention statistics
Friday, October 10, 2008
InnoDB faster mutex patch ported to MySQL 5.1
I have ported the Google patch for faster InnoDB mutexes to MySQL 5.1. The Google patch is for the InnoDB plugin. This is for the version of InnoDB in the 5.1 branch.
This improves performance on servers with more than 4 cores. It provides the following on supported platforms (x86, modern gcc)
The code is in Launchpad
This improves performance on servers with more than 4 cores. It provides the following on supported platforms (x86, modern gcc)
- uses atomic ops for the InnoDB mutex
- replaces the InnoDB rw-mutex
- disables the InnoDB memory heap and memory heap mutex
The code is in Launchpad
Monday, October 6, 2008
More background IO threads for InnoDB
I extracted code to support a configurable number of background IO threads for InnoDB. The code is available in a branch of MySQL 5.0. This adds 3 my.cnf variables:
The output from the FILE I/O section of SHOW INNODB STATUS has been extended to include statistics on IO operations. There is also one line per IO thread. The added output includes:
The state of the 5.0 branch after the latest patch:
- innodb_read_io_threads - the number of threads for read prefetch requests made by InnoDB. Only 1 thread was used prior to this patch.
- innodb_write_io_threads - the number of threads for writing dirty pages. Only 1 thread was used prior to this patch. If InnoDB is run with O_DIRECT then it usually helps to make this value larger than 1.
- innodb_max_merge_io - the maximum number of requests for adjacent pages to merge into one large IO operation. The default is 64. You don't need to change this, but if you do, please post the results.
The output from the FILE I/O section of SHOW INNODB STATUS has been extended to include statistics on IO operations. There is also one line per IO thread. The added output includes:
- reads, writes - the number of read or write requests done by this thread. One request is for one 16kb Innodb page except for the log thread.
- requests - the number of read or write system calls used. As reads and writes for adjacent pages may be merged into one IO, the number of requests is always less than or equal to the number of reads and writes.
- io secs - the number of seconds to process the reads and writes.
- io msecs/request - average time for an IO request in milliseconds, (io secs * 1000) / (requests)
- max_io_wait - the maximum time in milliseconds for an IO operation
-------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) reads 0 writes 0 requests 0 io secs 0.000000 io msecs/request 0.000000 max_io_wait 0.000000 I/O thread 1 state: waiting for i/o request (log thread) reads 0 writes 30 requests 30 io secs 0.000570 io msecs/request 0.019000 max_io_wait 0.043000 I/O thread 2 state: waiting for i/o request (read thread) reads 414 writes 0 requests 36 io secs 0.004601 io msecs/request 0.127806 max_io_wait 0.210000 I/O thread 3 state: waiting for i/o request (read thread) reads 337 writes 0 requests 26 io secs 0.003806 io msecs/request 0.146385 max_io_wait 0.298000 I/O thread 10 state: waiting for i/o request (write thread) reads 0 writes 1137 requests 75 io secs 0.022748 io msecs/request 0.303307 max_io_wait 5.213000 I/O thread 11 state: waiting for i/o request (write thread) reads 0 writes 259 requests 11 io secs 0.005803 io msecs/request 0.527545 max_io_wait 2.524000
The state of the 5.0 branch after the latest patch:
show patches; File Name Version Author License Comment show_patches.patch SHOW PATCHES 1.0 Jeremy Cole N/A innodb_io_threads.patch INNODB IO THREADS 1.0 mcallaghan@google.com GPL More background IO threads innodb_fast_mutex.patch INNODB FAST MUTEX 1.0 benhandy@google.com GPL Faster Innodb mutex innodb_io_capacity.patch INNODB IO CAPACITY 1.0 mcallaghan@google.com GPL Background IO rate limits innodb_disallow_writes.patch INNODB DISALLOW WRITES 1.0 jtolmer@google.com GPL Prevent writes to Innodb
InnoDB background IO
I ported code from the v2 Google patch to a branch of MySQL 5.0 on launchpad. Two new my.cnf variables have been added:
- innodb_io_capacity - specifies the IO capacity of the server in IOPs. The default inherited from existing InnoDB behavior is 100. This is used to determine the rate of background IO. Prior to this patch, background IO was rate limited assuming the server could do 100 IOPs. It can help to increase the background IO rate on servers with many disks. This makes it easier for the background IO threads to keep up with a server during high rates of row changes. When this is increased the rates of the following may increase:
- flushing dirty pages
- applying changes from the insert buffer
- physically removing deleted rows
- innodb_extra_dirty_writes - the default for this is ON. When ON, dirty pages may be flushed to disk if the server is idle even when innodb_max_dirty_pages_pct has not been reached. When this is set to off, dirty pages may be written to disk less frequently.
- the order of output for SHOW INNODB STATUS has been changed so that transactions are printed last. The output from SHOW INNODB STATUS is limited in size and I prefer that the active transaction list get truncated rather than the other sections.
- A new section has been added to SHOW INNODB STATUS output that displays the busy-wait time for InnoDB spin locks and statistics on the work done by the main background IO thread.
- The main background IO thread synchronously forces the InnoDB transaction log to disk less frequently when innodb_flush_log_at_trx_commit != 1. I think this code might need to be changed so that the log flush is not done at all rather than getting done asynchronously in that case. Regardless, the main background IO thread forces the log to disk once per second in all cases.
show variables like 'Inno%';Example output added to SHOW INNODB STATUS
Variable_name Value
innodb_extra_dirty_writes ON
innodb_io_capacity 1000
----------
BACKGROUND THREAD
----------
srv_master_thread loops: 1 1_second, 0 sleeps, 0 10_second, 0 background, 0 flush
srv_master_thread log flush: 0 sync, 0 async
srv_wait_thread_mics 0 microseconds, 0.0 seconds
spinlock delay for 5 delay 20 rounds is 4 mics
Thursday, October 2, 2008
Innovation: MySQL Cluster, Oracle Exadata, H-Store
MySQL Cluster continues to be ahead of the game, but we don't always hear about it as Cluster developers are either too busy or too modest.
Oracle announced Exadata last week. It provides extremely high-performance disk access and runs software on the storage servers to perform selection and projection at the storage server. Oracle has discovered the value of condition pushdown. It is only a matter of time before they figure out the value of batch key access and query fragment evaluation. I wonder if you can buy the Exadata hardware without Oracle software? And I wonder when MySQL will use Cluster as the basis for a data warehouse server.
Mikael Ronstrom described work in progress to support multi-threaded Cluster servers. This has many things in common with H-Store which is a radically different design for a DBMS to scale on multi-core architectures. Well, it is different when you don't consider Cluster.
Oracle announced Exadata last week. It provides extremely high-performance disk access and runs software on the storage servers to perform selection and projection at the storage server. Oracle has discovered the value of condition pushdown. It is only a matter of time before they figure out the value of batch key access and query fragment evaluation. I wonder if you can buy the Exadata hardware without Oracle software? And I wonder when MySQL will use Cluster as the basis for a data warehouse server.
Mikael Ronstrom described work in progress to support multi-threaded Cluster servers. This has many things in common with H-Store which is a radically different design for a DBMS to scale on multi-core architectures. Well, it is different when you don't consider Cluster.
Features from the Google patch at Launchpad.net
I have published a few features from the Google patch in bzr branches at launchpad.net. There is one branch for 5.0 and one for 5.1. More features will be published over time. The 5.0 branch has:
- show patches - provides SHOW PATCHES to display installed patches. Courtesy of Jeremy Cole at Proven Scaling and others at Percona
- innodb disallow writes - provides SET GLOBAL INNODB_DISALLOW_WRITES to temporarily block any file system changes by InnoDB (so you can take a backup). Courtesy of Justin Tolmer at Google.
- innodb fast mutex - makes the InnoDB mutex and rw-mutex scale better on SMP and disables the InnoDB malloc heap as glibc (ptmalloc) or mtmalloc are more than good enough. Courtesy of Ben Handy at Google.
- show patches - provides SHOW PATCHES to display installed patches. Courtesy of Jeremy Cole at Proven Scaling and others at Percona
- show global mutex stats - provides SHOW GLOBAL MUTEX STATUS to display mutex contention stats for mutexes other than ones internal to InnoDB. Courtesy of Mark Callaghan at Google.
Wednesday, October 1, 2008
Bzr and MySQL: articles by others
Many good articles have been written about using bzr for MySQL. But you need to find them. These articles made it very easy for me to use bzr with MySQL.
- Bazaar in 5 minutes
- Advanced Bazaar for MySQL Development
- Getting started with Bazaar for MySQL code
- A Contributor's Guide to Launchpad.net - Part 1 - Getting Started
- A Contributor's Guide to Launchpad.net - Part 2 - Code Management
- Bazaar
- Building MySQL from source
- From Bazaar to Sandbox In 5 Moves
- Announcement from MySQL
- man page
Subscribe to:
Posts (Atom)

