Sunday, March 31, 2013

MySQL 5.6: single-threaded, read-only

Peter Zaitsev wrote about the importance of single-threaded performance and expressed concern that there might be regressions in MySQL 5.6. Not much has been published on it so I will repeat tests that I ran for high-concurrency workloads using sysbench with IO-bound and cached workloads. These tests used sysbench with 128M rows in one table. I compared MySQL 5.6.10 (orig5610 below), MySQL 5.1.63 (orig5163 below) and MySQL 5.1.63 with the Facebook patch (fb5163 below). For all tests the sysbench clients ran on the same host as mysqld. The sysbench workload is to fetch one row by primary key per query.

My performance summary is:
  • performance_schema reduced peak performance by 9% for the IO-bound test. I assume the overhead of the PS is only an issue for very fast queries and the storage for my test is fast (100us reads). But the table_stats and user_stats feature from the Facebook patch is able to provide a lot of useful statistics without that overhead. I hope the PS can be made as efficient.
  • performance_schema did not reduce peak performance for the cached test. But in that case MySQL 5.6 was always about 9% worse in peak QPS than the others. I have yet to identify the source of the overhead.
  • innodb_checksum_algorithm=CRC32 helps when storage is fast
IO bound

For the IO bound test a 4GB InnoDB buffer pool was used. The database file is ~29GB. Fast storage was used with an average latency of 100 microseconds per 16kb page read. MySQL 5.1.63 is slower because it doesn't use x86 instructions to make the InnoDB checksum validation faster. For orig5610 I used innodb_checksum_algorithm=CRC32 and fb5163 does something similar. The performance schema still has a big cost - about 9% of peak QPS is lost when it is enabled with default options.
  • 5063 QPS - orig5610, performance_schema not compiled
  • 5012 QPS - fb5163
  • 4913 QPS - orig5163, innodb_checksums=0
  • 4682 QPS - orig5610, performance_schema=ON, default options
  • 4342 QPS - orig5163, innodb_checksums=1
Cached

For the cached tests a 64GB InnoDB buffer pool was used and the table was read into the buffer pool via a SELECT statement prior to the benchmark. The adaptive hash index was not warm prior to the test so most queries spent some time updating it.

  • 11940 QPS - orig5163
  • 11901 QPS - fb5163
  • 10875 QPS - orig5610, performance_schema not compiled
  • 10860 QPS - orig5610, performance_schema with default options

Thursday, March 28, 2013

I'd rather not always be compacting

I read a post about the need to defragment MongoDB databases. I liked the title but prefer to avoid a DBMS that requires routine maintenance. Alas I don't know what the author means by routine. There is a big difference between having to defragment a database once a week versus once a quarter. This isn't an online operation per the manual so it can have a big impact on service availability.

PostgreSQL isn't immune to this problem. It has a long history with vacuum and each release makes things better. But I haven't run either PostgreSQL or MongoDB in production for a busy OLTP workload so I don't know whether these would be a serious problem. Context matters in this case. I can read many things on the internet that tell me MongoDB compaction and PostgrSQL vacuum are never an issue or are always an issue. The hard part is figuring out whether that is true for the workload that you care about.

I do have experience with InnoDB. InnoDB purge has some things in common with PostgreSQL vacuum. It cleans up old versions required by MVCC when the old versions are no longer visible. For InnoDB this means it physically removes rows that were delete marked. Performance can slow to a crawl when purge gets too far behind and there are a few things that could be done to control that, but purge has never been routine maintenance. It just runs in the background. The controls for it are:
Purge != defragmentation

Even though purge is online InnoDB can still require maintenance. On servers that I care about 2/3 of the data growth rate was caused by fragmentation versus 1/3 from new data. Defragmentation is useful to reclaim the space wasted by fragmentation, especially when using pure-flash storage. The frequency in this case is a few times per year.

How do you defragment servers in production? If you are fortunate to have several replicas per master and spare capacity, then you can defragment a replica and then replace the master. The other options include using online DDL that is new in MySQL 5.6 to defragment secondary indexes. To defragment a primary index you probably need an online schema change framework like pt-online-schema change, oak-online-alter-table or Facebook OSC.

I think there is a better way to defragment InnoDB that is incremental and less invasive. But this is only a theory today. One of my talented peers is trying to figure this out and we hope to have good news and code to share in a few months.

Wednesday, March 27, 2013

My peers at the UC

We hope to see you at the MySQL user conference. Once again Yoshi outdoes Domas (3 talks versus 2). Attend to learn more about their rivalry. Nizam has new results for making InnoDB compression better and he will describe his recent work at the UC.
  • InnoDB Compression Present and Future - Nizam and Justin will talk about past and present work to get InnoDB compression into production. We were excited when this was first deployed and shrank many tables to 50% of their uncompressed size. Recent work has significantly improved the compression rate we get for real data. But you need to wait for the UC to learn more.
  • Lots and lots of small data - we are big believers in the small data movement. The db-perf team (Yoshi, Domas, Harrison) will describe what we do to keep our small data servers happy . They will be joined by an honorary member of the db-perf team (Mark). Note that by small data we still have a huge amount of data per host, but we prefer to brag about removing data more than having a lot (and too much) data.
  • Just say no to replication lag - Domas describes our long-term efforts to combat replication lag.
  • Reducing Database Size - Yoshi provides a lot more advice on making sure your servers remain in the small data movement.
  • Practical Failover Design - there is a rumor that some clever people have been able to automate MySQL failover.
  • Online Shard Migration - Vamsi enjoyed doing Online Schema Change so much that he returned to the area with a framework for migrating shards online.

Tuesday, March 12, 2013

MySQL 5.6: O_DIRECT, fil_flush and sleep

I discussed my previous post on O_DIRECT_NO_FSYNC with the InnoDB team and they fixed my understanding of a few parts of the code that contribute to the stalls I have been reporting.  We also discussed a problem I have been ignoring. The InnoDB code that does an fsync for a tablespace (fil_flush) can make a thread sleep when there are concurrent attempts to do the fsync. The amount of time to sleep, 20 milliseconds, was probably chosen in 1995. It is too long for fast storage including HW RAID cards with battery backed write cache and flash-based devices.

I created bug 68588 for the call to sleep and today I have performance numbers to share from MySQL 5.6.10. The workload is the same as described in the previous post. Three binaries were tested:

  • O_DIRECT,condvar - use innodb_flush_method=O_DIRECT and patch the binary to use condition variable waits rather than sleep when there are concurrent threads in fil_flush.
  • O_DIRECT,sleep - use innodb_flush_method=O_DIRECT
  • O_DIRECT_NO_FSYNC - use innodb_flush_method=O_DIRECT_NO_FSYNC

updates/second for update 1 row by PK via sysbench
    8      16      32      64     128     256   concurrent clients
18234   23513   22542   21967   21941   22135   O_DIRECT,condvar
18382   25290   10464    9868   10059   10917   O_DIRECT,sleep
18237   26332   30318   29695   29633   29380   O_DIRECT_NO_FSYNC

There is still a benefit from using O_DIRECT_NO_FSYNC but the difference is less significant. I didn't see any obvious stalls when using PMP with the O_DIRECT,condvar binary. However, at test end the average rate for innodb_pages_written was about 19,000/second for O_DIRECT,condvar versus 24,000/second for O_DIRECT_NO_FSYNC. My explanation at this point is that the page cleaner thread is able to write pages faster when it doesn't have to wait to call fsync and go through fil_flush().

Speaking at OpenWest

I am speaking about MySQL at OpenWest. I have a keynote on May 2 and a Q&A session on May 3.

Tuesday, March 5, 2013

MySQL 5.6: no O_DIRECT_NO_FSYNC for you

MySQL 5.6 has a new option for innodb_flush_method. When O_DIRECT_NO_FSYNC is used then fsync is not done after writes. This was a response to feature request 45892. Unfortunately this option as implemented and documented is not safe to use with XFS. That is too bad because it can make performance much better for workloads that stall on fil_flush. I revisited the update-only & IO-bound workload from a previous post. QPS below is much better when the new option is used. I used MySQL 5.6 with 8 buffer pool instances and the binlog disabled. The only difference was the value of innodb_flush_method.


updates/second for update 1 row by PK via sysbench
    8      16      32      64     128     256   concurrent clients
18234   24359   10379    9795    9843   10283   O_DIRECT
17996   26853   30265   28923   29293   29477   O_DIRECT_NO_FSYNC

This is the problem thread stack when there are stalls from O_DIRECT:
os_thread_sleep,fil_flush,fil_flush_file_spaces,buf_flush_sync_datafiles,buf_flush_single_page_from_LRU,buf_LRU_get_free_block,buf_page_init_for_read,buf_read_page_low,..

This is the my.cnf from the test. The server has fast storage that can do ~150k disk reads/second. I think innodb_io_capacity and innodb_lru_scan_depth were large enough. I don't think the host was mis-tuned.
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
innodb_log_file_size=1900M
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table
table-definition-cache=1000
table-open-cache=2000
max_connections=2000
key_buffer_size=200M
innodb_io_capacity=1000
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
query_cache_size=0
query_cache_type=0
innodb_thread_concurrency=0
innodb_flush_method=O_DIRECT_NO_FSYNC
metadata_locks_hash_instances=256
innodb_checksum_algorithm=CRC32
innodb_thread_concurrency=32
innodb_buffer_pool_size=4g
innodb_io_capacity=8192
innodb_buffer_pool_instances=8
innodb_adaptive_hash_index=1
loose-table_open_cache_instances=1
innodb_lru_scan_depth=8192

Sunday, March 3, 2013

MySQL 5.6: cached & update-only workload

For this post on MySQL 5.6 performance I used sysbench for a cached & update-only workload. The previous post on an update-only but not cached workload is here. For this workload each query updates 1 row by primary key. The database size is ~32GB and the data was cached in a 64GB InnoDB buffer pool. The table was read into the buffer pool before the tests were done.

The summary is that MySQL 5.6 did much better than MySQL 5.1 but throughput degraded when more buffer pool instances were used. I filed bug 68555 for that and if my patch is OK then the fix is easy. Using innodb_flush_method=O_DIRECT_NO_FSYNC also helped.

There binaries were tested:
  • orig5610 - MySQL 5.6.10
  • orig5163 - MySQL 5.1.63
  • fb5163 - MySQL 5.1.63 + the Facebook patch
Except where changes are indicated below the common configuration for these tests is:
innodb_buffer_pool_size=64G
innodb_log_file_size=1900M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_flush_method=O_DIRECT
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table
max_connections=2000
key_buffer_size=200M
innodb_io_capacity=1000
query_cache_size=0
query_cache_type=0
skip_log_bin
The MySQL 5.6 specific configuration is:
table-definition-cache=1000
table-open-cache=2000
metadata_locks_hash_instances=256
innodb_checksum_algorithm=CRC32
The MySQL 5.1 + Facebook patch specific configuration is:
table_cache=2000
innodb_fast_checksums=1

innodb_thread_concurrency=0

This test used innodb_thread_concurrency=0. MySQL 5.1 does better than 5.6 at low concurrency. MySQL 5.1 does much worse at high concurrency because of mutex contention. The workaround for high concurrency is to set innodb_thread_concurrency.

updates/second
    8      16      32      64    128      256   concurrent clients
43451   53975   31389   15628   10849    6520   fb5163
40463   52065   32846   15490   10637    6366   orig5163
39522   51445   62650   62788   61743   57441   orig5610

innodb_thread_concurrency=32

This test used innodb_thread_concurrency=32. For MySQL 5.1, QPS increased when innodb_thread_concurrency was changed from 0 to 32.

updates/second
    8      16      32      64    128      256   concurrent clients
44955   53898   27178   24056   22719   21375   fb5163
41282   49807   32451   28031   25959   23757   orig5163
36691   54741   62789   62262   61208   60884   orig5610

More buffer pool instances, part 1

This test used innodb_thread_concurrency=0 and varied innodb_buffer_pool_instances. For either value of innodb_thread_concurrency QPS drops when innodb_buffer_pool_instance is 8.

  • bpX means innodb_buffer_pool_instances=X
  • itcX means innodb_thread_concurrency=X

updates/second
    8      16      32      64    128      256   concurrent clients
36691   54741   62789   62262   61208   60884   orig5610, bp1, itc32
30076   30535   39266   33280   37135   31415   orig5610, bp8, itc32
39522   51445   62650   62788   61743   57441   orig5610  bp1, itc0
30141   31017   39089   40449   44994   47164   orig5610  bp8, itc0

More buffer pool instances, part 2

These used innodb_thread_concurrency=32. Note that performance with 8 buffer pool instances is much worse than with 1 although increasing innodb_io_capacity reduces the regression. Below iocap=X means innodb_io_capacity=X.


updates/second for innodb_buffer_pool_instances=1
    8      16      32      64    128      256   concurrent clients
36691   54741   62789   62262   61208   60884   orig5610, iocap=1000
34860   55060   61938   62270   61661   61107   orig5610, iocap=2048
40105   56780   64174   64257   62996   62208   orig5610, iocap=4096
43293   61882   66132   64147   63124   62533   orig5610, iocap=8192

updates/second for innodb_buffer_pool_instances=8
    8      16      32      64     128     256   concurrent clients
30076   30535   39266   33280   37135   31415   orig5610, iocap=1000
30151   32422   38110   39712   37551   34242   orig5610, iocap=2048
32092   38397   43014   44260   42148   41299   orig5610, iocap=4096
38302   51486   55676   55044   54883   54032   orig5610, iocap=8192

Fixing performance for multiple buffer pool instances

There were two changes to improve performance. The first is to fix bug 68555 and avoid thread stalls in log_checkpoint_margin when innodb_buffer_pool_instances > 1. The code change is simple and the QPS results below are much better.


updates/second for innodb_buffer_pool_instances=8
    8      16      32      64     128     256   concurrent clients
46726   70465   57360   49329   49120   48730   orig5610, iocap=1024
42710   70487   65408   57632   57587   57067   orig5610, iocap=2048
42464   66687   70299   63091   62940   61831   orig5610, iocap=4096
39295   69216   72740   66955   65382   65283   orig5610, iocap=8192
41798   68033   71868   66918   64496   64865   orig5610, iocap=16384



The second change is to use innodb_flush_method=O_DIRECT_NO_FSYNC along with a fix for bug 68555.  When O_DIRECT_NO_FSYNC is used there are fewer stalls in fil_flush_file_spaces. That function is called to do fsyncs on database files after writes have been done. At least on Linux, the fsync calls are not needed unless the file has been extended and there are pending filesystem metadata changes. The overhead isn't just the cost of doing fsync it is also the cost of getting through code that isn't exactly efficient (fil_flush, fil_flush_file_spaces).


updates/second for innodb_buffer_pool_instances=8
    8      16      32      64     128     256   concurrent clients
44092   69082   73446   69393   68118   67118   orig5610, iocap=1024
42445   67282   74463   69547   68324   67443   orig5610, iocap=2048
40601   69185   75044   69907   68406   67640   orig5610, iocap=4096
35835   68327   74777   69765   68069   67286   orig5610, iocap=8192
41997   66772   75226   70166   68531   67716   orig5610, iocap=16384

Neighbor page flushing

This test used innodb_thread_concurrency=32. Disabling neighbor page flushing had a small benefit for both MySQL 5.6 (innodb_flush_neighbors=0) and the FB patch for MySQL 5.1 (innodb_flush_neighbors_for_lru=0, innodb_flush_neighbors_on_checkpoint=0).

updates/second
    8      16      32      64    128      256   concurrent clients
29180   23622   18248   16683   16137   15655   fb5163
37945   53316   65973   65674   64065   62279   orig5610








 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.