I used a simple workload with sysbench to determine the rate at which InnoDB can read blocks from disk. The workload is read-only and each query fetches 1 row by PK. The workload was IO-bound with a 2G InnoDB buffer pool and 32G database. Storage was fast courtesy of buffered IO and enough RAM to cache the database in the OS filesystem cache.
Using MySQL 5.6.11 and InnoDB with a few hacks the peak throughput was about 240,000 QPS and 210,000 block reads/second. The test server has 16 physical cores and 32 logical cores with HT enabled. This is a great result that can probably be even better. Contention on fil_system->mutex was the bottleneck and I think that can be improved (see feature request #69276). I wonder if 400,000 block reads/second is possible?
A few years back, in 2009 or 2010, I ran similar tests using a server with 8 physical cores. I think HT was disabled. Using MySQL 5.1 with the Facebook patch I was able to get about 40,000 QPS and 35,000 block reads/second. It is good to see software advance to keep up with hardware.
-
I ran single-threaded performance tests to compare MySQL using releases 4.0.30, 5.0.85, 5.1.63 and 5.6.11. On my simple tests 4.0.30 is almost 1.5X faster than 5.6.11. I think it is important to reduce these regressions. Maybe this is an area in which the forks (MariaDB, Percona) will lead the way? I previously opened bug 68825 for this and will update it with the results I report here. Peter and I have written about this before. Bug 69236 is also open for this now.
I used most of the advice from a previous blog post to build & configure 5.6 for peak performance, including disabling the performance schema at compile time. All binaries were compiled with the same version of gcc and used jemalloc. For 4.0, 5.0 and 5.6 the built-in version of InnoDB was used. For 5.1 the InnoDB plugin was used.
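As a rough sketch of that build & configure advice (not the exact commands; the paths and the use of LD_PRELOAD are assumptions), MySQL 5.6 can be built without the performance schema and run with jemalloc preloaded like this:
cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITHOUT_PERFSCHEMA_STORAGE_ENGINE=1
make -j16 && make install
# preload jemalloc when starting mysqld (adjust the library path as needed)
LD_PRELOAD=/usr/lib/libjemalloc.so mysqld_safe --defaults-file=/path/to/my.cnf &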
Two of the tests below report the time in seconds to reload a table and a lower value is better. The other test reports QPS for sysbench point queries and a higher value is better. To avoid (or add) confusion from looking at one graph where less is better and another where more is better I list a value for relative performance below and use that in the graphs. For relative performance a larger value is better and the result for 4.0.30 is the baseline. For the other binaries performance was always worse and the relative performance is less than 1.
I also ran the sql-bench tests from the 4.0.30 release with all MySQL versions. The total time to run all tests was 789, 921, 911 and 1093 seconds for 4.0.30, 5.0.85, 5.1.63 and 5.6.11. Much of the 5.6 regression was from the create tests but there were significant regressions for many tests.
Bulk load
The first workload is bulk load. I used mysqldump to dump all rows for a sysbench table with 16M rows. For the important configuration variables: the InnoDB buffer pool was 64G, the doublewrite buffer was disabled, the binlog was disabled and I set innodb_flush_log_at_trx_commit=2. The test was repeated using two files created by mysqldump. The first used --opt and the second used --opt --skip-extended-insert.
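For reference, a sketch of the two dump commands, with hypothetical database, table and file names:
mysqldump --opt mydb sbtest > sbtest-opt.sql
mysqldump --opt --skip-extended-insert mydb sbtest > sbtest-one-row-per-insert.sql
The reload is then just mysql mydb < dump-file.sql run against each server binary.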
Time to reload 16M rows
Input from mysqldump --opt
Relative-perf is Seconds-for-4.0.30 / Seconds
Seconds Relative-perf
4.0.30 143 1
5.0.85 178 0.80
5.1.63 176 0.81
5.6.11 203 0.70 -> no perf-schema
5.6.11 210 0.68 -> perf-schema
Time to reload 16M rows
Input from mysqldump --opt --skip-extended-insert
Relative-perf is Seconds-for-4.0.30 / Seconds
Seconds Relative-perf
4.0.30 966 1
5.0.85 1150 0.84
5.1.63 1201 0.80
5.6.11 1274 0.76
One source of overhead in 5.6 is the mysql client. For the first reload test above (input from mysqldump --opt), which took 2 to 3 minutes, the mysql client used about 10 seconds of CPU time with 4.0.30 versus 26 seconds with 5.6. Why does it need almost 3X more CPU time? CPU profile reports show a lot more time in utf8 functions. I added results above for 5.6.11 with the perf-schema enabled & disabled at compile time. The table below lists the CPU time (user + system) consumed by the mysql client during the reload of the output from mysqldump --opt. There is a big regression from 4.0 to 5.1, mostly due to conversions to utf8. There is another regression from 5.1 to 5.6. I don't know why.
5.6 user 0m25.121s sys 0m0.629s -> 25.8 seconds
5.1 user 0m19.884s sys 0m0.747s -> 20.6 seconds
5.0 user 0m18.700s sys 0m0.853s -> 19.6 seconds
4.0 user 0m8.994s sys 0m0.923s -> 9.9 seconds
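The user and sys values above come from timing the mysql client during the reload. A minimal sketch, with hypothetical file and database names:
time mysql mydb < sbtest-opt.sql
# "user" and "sys" report CPU consumed by the mysql client, not by mysqld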
My server was using these character sets:
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
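To confirm the character set configuration on a server, something like this can be used:
mysql -e "SHOW VARIABLES LIKE 'character_set%'"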
Point lookup
The next workload is point queries from sysbench. These use auto-commit and fetch 1 row by primary key per query. The buffer pool was warmed prior to the test and cached all rows but the adaptive hash index was cold.
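Before the results, a sketch of a sysbench 0.4-style invocation for single-threaded point selects by primary key. The connection options, table size and duration are illustrative assumptions, and the exact client & flags used for these tests may differ:
sysbench --test=oltp --oltp-test-mode=simple --oltp-table-size=16000000 \
  --mysql-db=test --mysql-user=root \
  --num-threads=1 --max-time=180 --max-requests=0 run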
Queries per second from sysbench
Relative-perf is QPS / QPS-for-4.0.30
QPS Relative-perf
4.0.30 14898 1
5.0.85 12247 0.82
5.1.63 10538 0.71
5.6.11 10334 0.69
-
For some workloads the InnoDB performance model is very simple. It must flush dirty pages as fast as storage will allow. With a few more bug fixes InnoDB in MySQL 5.6 should be able to do that. The performance model becomes a bit more complicated when you consider that there are two reasons for sustaining a high rate of flushing dirty pages.
Flush list
The flush list maintains pages in the order in which they were made dirty (see below for the fine print). This list only has dirty pages. Good reasons to flush dirty pages from the flush list include:
- The percentage of dirty pages in the buffer pool exceeds innodb-max-dirty-pages-pct. These writes are scheduled by a background thread (srv_master_thread prior to 5.6, buf_flush_page_cleaner_thread in 5.6).
- A dirty page was modified a long time ago and uses a redo log file that will soon be rotated. Fuzzy checkpoint code schedules such pages to be written in the background before the log rotate happens to avoid blocking the rotate. Read more about fuzzy checkpoints here, here and here. These writes are scheduled by background & foreground threads. In 5.1 the asynchronous writes scheduled by foreground threads weren't really async. I don't know if that is still true in 5.6. See bug 55004 for details.
This is the source-code description of the oldest_modification field used by the flush list: "log sequence number of the START of the log entry written of the oldest modification to this block which has not yet been flushed on disk; zero if all modifications are on disk".
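The settings that drive flush list writes show up in my.cnf; here is a fragment with the values used elsewhere in these tests, as an example only:
innodb_max_dirty_pages_pct=80   # flush when more than 80% of buffer pool pages are dirty
innodb_log_file_size=1900M      # larger redo logs reduce checkpoint-driven flushing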
LRU list
The LRU list maintains all pages, dirty & clean, in reference order from most-recently to least-recently used. When data is read from disk into a page, the page is taken from the free list, but the free list is frequently empty. When the free list is empty pages are taken from the tail of the LRU. When pages at the tail of the LRU are dirty they must be flushed first, and work is done to schedule flushes for such pages as they approach the tail of the LRU. Prior to 5.6 most of these flushes were scheduled by foreground threads via buf_flush_free_margin. The buf_flush_free_margin code wasn't efficient and has been replaced in 5.6. I also replaced it in the Facebook patch for 5.1 via innodb_fast_free_list, but my patch is no longer needed. The big change in 5.6 is that most of this flushing should now be done by a background thread, buf_flush_page_cleaner_thread.
Configuration
In MySQL 5.6 the rate of flushing for the LRU and flush lists should be determined by configuration variables - innodb_io_capacity for the flush list and innodb_lru_scan_depth for the LRU. This is a big change from earlier MySQL releases because innodb_lru_scan_depth is new and prior to 5.6 furious flushing would frequently be done.
In MySQL 5.6 these variables are the peak page write rates per buffer pool instance and with the exception of shutdown these rates should not be exceeded. Prior to 5.6 InnoDB would flush as fast as possible when needed and that rate could exceed innodb_io_capacity. I use furious flushing to describe this.
This is good news for getting predictable performance, assuming InnoDB is able to sustain these configured rates. There are a few more bugs to fix in 5.6 before that is true. Fortunately the bugs are getting fixed quickly and many are fixed in 5.6.12. But I wish there were only one variable to set the page flush rate rather than two.
Benchmarks
There are a few basic performance tests to determine whether InnoDB can sustain the desired page flush rates. I use sysbench with a workload that updates 1 row by primary key per transaction and then run it for two configurations -- cached and IO-bound. The doublewrite buffer is enabled but fsync-on-commit and the binlog are disabled. The workload has no data contention. The workload & configuration make it as easy as possible for InnoDB to flush pages. If it can't keep up here then it will have more trouble elsewhere.
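A my.cnf fragment matching that configuration (doublewrite on, fsync-on-commit off, binlog off); this is a sketch, not the full file:
innodb_doublewrite=1
innodb_flush_log_at_trx_commit=2   # commit does not fsync the redo log
skip_log_bin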
Cached
The cached configuration uses a buffer pool much larger than the test tables. For this workload all of the flushing should be done from the flush list. Whether InnoDB needs to sustain the maximum page flush rate (innodb_buffer_pool_instances X innodb_io_capacity) is determined by the size & number of the redo logs and the rate at which redo logs are written. For the tests reported here I did not vary the size & number of redo logs to confirm the peak flush rate could be sustained.
The tables below list the rate of updates/second. One interesting result is that for both 4.0.30 and 5.6.11 the rate at high-concurrency with all data in one table is better than the rate for a workload that uses 8 tables. I hope that problem will be fixed in 5.6.12 (see bugs 68658, 68588, 68555 and 45892). Note also that the rates at low-concurrency are better (about 1.2X here) for 4.0.30. I hope that we can undo that change. My builds & configurations use most of my advice for getting peak performance from 5.6.
updates/second using 1 table
1 2 4 8 16 32 clients
5411 10342 20268 36473 19616 9718 4.0.30
4372 8347 16757 34656 60680 70662 5.6.11
updates/second using 8 tables
8 16 32 64 128 256 clients
33580 18501 9875 7367 6659 6012 4.0.30
34328 25716 24633 23828 23790 23114 5.6.11
IO-bound
The IO-bound configuration uses a buffer pool much smaller than the test tables and most of the flushing should be done from the LRU list. The buffer pool is 2G and the tables are ~32G in total. The peak flush rate (innodb_buffer_pool_instances X innodb_lru_scan_depth) is one limit on the peak update rate.
I ran IO-bound tests to compare MySQL 4.0.30 and 5.6.11 and was disappointed by the results for 5.6.11 as the rate was much less than the peak LRU flush rate. I filed bug 69170 and learned it was already fixed in 5.6.12. The results below are the rate of updates/second for 4.0.30, 5.6.11 and 5.6.11 with my hack to fix bug 69170. The 5.6.11 configuration used innodb_buffer_pool_size=2G, innodb_buffer_pool_instances=4 and innodb_lru_scan_depth=4000.
For both the 1 table and 8 table configurations the peak LRU flush rate should be 16,000 and I want the peak rate of updates/second to be close to 16,000 when concurrency grows. Because of bug 69170 it was far from the peak for the 1 table case and my hack fixes that. So I expect that InnoDB in 5.6.12 will be able to sustain much higher QPS for workloads that are bound by LRU flushing. The results with 5.6.11 for 8 tables are much worse than for 1 table and I think this suffers from the bugs listed in the cached section above. I will wait for 5.6.12 to confirm.
updates/second using 1 table
1 2 4 8 16 32 clients
2535 3865 5554 6729 6759 5825 4.0.30
2818 3740 5026 5359 5557 5525 5.6.11
3070 5893 10940 15420 15514 13628 5.6.11+hack
updates/second using 8 tables
8 16 32 64 128 256 clients
4626 4522 4253 4076 3972 3941 4.0.30
7565 6234 6109 6144 6125 6271 5.6.11
-
Percona published Linkbench results. It looks like they set innodb_io_capacity to a large value but did not increase innodb_lru_scan_depth. I think that is a mistake. This is a mistake that I have made and that I hope you can avoid. innodb_io_capacity limits the number of pages flushed per second from the flush list. innodb_lru_scan_depth limits the number of pages flushed per second from the LRU list. Perhaps it would have been better to derive both flush rates from innodb_io_capacity. These are approximate limits on the number of page writes per buffer pool instance per second rather than on the number of page writes per second. When innodb_buffer_pool_instances=8, innodb_io_capacity=2000 and innodb_lru_scan_depth=1000 then InnoDB can flush about 8 X 2000 pages per second from the flush list and 8 X 1000 pages per second from the LRU.
It is now very important to set both of these variables to the IO rate you want InnoDB to maintain. Prior to 5.6 this was much less important as InnoDB did "furious flushing" and the main background thread could greatly exceed innodb_io_capacity when there was work to be done.
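A my.cnf sketch for the example above, with the per-second totals worked out in comments:
innodb_buffer_pool_instances=8
innodb_io_capacity=2000       # flush list: about 8 x 2000 = 16,000 page writes/second
innodb_lru_scan_depth=1000    # LRU list: about 8 x 1000 = 8,000 page writes/second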
--
The rest of this was posted to mysqlperformanceblog.com but it looks like it is held up by spam filters. Maybe it has too many URLs. One day spam filters will discover the notion of relevance for URLs.
For 5.6 you need to set innodb_lru_scan_depth to a value similar to innodb_io_capacity. They should have made LRU page flushing a function of innodb_io_capacity, but they didn’t and a lot of people are going to waste time and lose performance because of this. I am one of those people.
When MySQL 5.6 spends a lot of time in buf_LRU_scan_and_free_block you are not going to have good performance results. Prior to 5.6 foreground threads were responsible for moving pages to the free list via buf_flush_free_margin. I did a lot of work for the innodb_fast_free_list option to fix the stalls and that change makes 5.1 competitive with 5.6 (http://mysqlha.blogspot.com/2013/02/mysql-56-is-much-faster-on-io-bound.html) but that patch can’t be used in 5.6.
In 5.6 a background thread, buf_flush_page_cleaner_thread, should be moving all pages to the free list. Foreground threads can still do it, but the server is much less efficient when they must do it. And when they call buf_LRU_scan_and_free_block they are trying to do it.
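One way to check whether foreground threads are spending time there is to aggregate mysqld thread stacks (poor man's profiler style) and count calls to buf_LRU_scan_and_free_block. A rough sketch:
gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $(pidof mysqld) > stacks.txt
grep -c buf_LRU_scan_and_free_block stacks.txt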
I filed a few bugs for this and I hope most of them will be fixed in 5.6.12. They should make a difference. I think Oracle might be aware of some of these problems prior to my bug reports — but it is hard to know what goes on behind the large wall between us and them.
http://bugs.mysql.com/bug.php?id=68658
http://bugs.mysql.com/bug.php?id=68588
http://bugs.mysql.com/bug.php?id=68555
http://bugs.mysql.com/bug.php?id=68481
I am not sure this one will be fixed. Too bad, as Percona has a fix for it – http://bugs.mysql.com/bug.php?id=45892
-
I compared the performance of MySQL 5.6.11 versus 4.0.30 using a read-only workload with sysbench. Performance was much better for 5.6.11 in most cases. At low-concurrency MySQL 4.0 was a bit faster. MySQL 5.6.11 was faster at high-concurrency and when doing many page reads per second. The product has improved a lot since I started using MySQL.
I followed most of my advice on building and configuring MySQL 5.6 for peak performance. I used two test servers - one for the sysbench clients and the other for mysqld. Each server has 16 real cores and 32 with HT enabled. Each test server has 144 GB of memory and fast storage that can do ~150,000 16 KB page reads/second.
The tests selected 1 row by primary key per query and all queries used auto-commit mode. Tests were run in IO-bound and cached modes. For cached tests all of the tables were in the InnoDB buffer pool at test start, but the adaptive hash index might not be warm. For the IO-bound tests only a subset of the tables fit in the InnoDB buffer pool. Direct IO was used to avoid caching by the OS. The buffer pool was 64 GB for the cached tests and 2 GB for the IO-bound tests. The full my.cnf files are at the end of this post.
One group of tests used 1 table and was run for 1, 2, 4, 8, 16 and 32 concurrent clients. The other group of tests used 8 tables and was run for 8, 16, 32, 64, 128 and 256 concurrent clients. The database size was 4 GB for the 1 table tests and 32 GB for the 8 table tests. Thus for the IO-bound tests about 50% of the table is in cache for the 1-table test and about 7% is in cache for the 8-tables test.
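As a sketch of the test driver for the 1-table runs, a loop over the client counts might look like the following. The flags are sysbench 0.4-style, the host and table size are assumptions, and the 8-table runs need a client that supports multiple tables, which stock sysbench 0.4 does not:
for n in 1 2 4 8 16 32; do
  sysbench --test=oltp --oltp-test-mode=simple --oltp-table-size=16000000 \
    --mysql-host=$DB_HOST --mysql-db=test --mysql-user=root \
    --num-threads=$n --max-time=300 --max-requests=0 run
done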
IO-bound
MySQL 5.6.11 was much faster than 4.0.30 except at low concurrency. Some of the changes that make this possible include multiple buffer pool instances and a more efficient LRU page replacement algorithm.
QPS, 1-table
1 2 4 8 16 32 clients
3224 6231 12705 26193 55477 92301 5.6.11
3579 7160 13418 12646 10769 10184 4.0.30
QPS, 8-tables
8 16 32 64 128 256 clients
22676 44121 71019 79787 79082 78269 5.6.11
17168 16520 14990 14852 14767 14696 4.0.30
Cached
Had I run the 1-table test for more than 32 clients I expect that the difference between 5.6.11 and 4.0.30 would have been significant as it was in the 8-table test. At low-concurrency 4.0.30 is faster.
QPS, 1-table
1 2 4 8 16 32 clients
5005 10065 20178 39595 83888 154852 5.6.11
6130 11868 24391 45856 106598 133349 4.0.30
QPS, 8-tables
8 16 32 64 128 256 clients
39895 76498 153421 227335 245511 244313 5.6.11
49603 101307 97728 77737 75790 75500 4.0.30
Configuration
This is the my.cnf file for 4.0.30:
[mysqld]
innodb_log_file_size=1900M
innodb_flush_method=O_DIRECT
innodb_max_dirty_pages_pct=80
innodb_flush_log_at_trx_commit=2
innodb_thread_concurrency=32
# 64G for the cached tests, 2G for the IO-bound tests
innodb_buffer_pool_size=64G
max_connections=2000
table_cache=2000
key_buffer_size=200M
query_cache_size=0
query_cache_type=0
This is the my.cnf file for 5.6.11:
[mysqld]
innodb_log_file_size=1900M
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table
innodb_io_capacity=8000
innodb_lru_scan_depth=8000
innodb_checksum_algorithm=CRC32
metadata_locks_hash_instances=256
innodb_flush_log_at_trx_commit=2
innodb_thread_concurrency=32
innodb_buffer_pool_size=64G
innodb_buffer_pool_instances=8
table-definition-cache=1000
table-open-cache=2000
table-open-cache-instances=1
max_connections=2000
key_buffer_size=200M
skip_log_bin
query_cache_size=0
query_cache_type=0