Sunday, March 3, 2013

MySQL 5.6: cached & update-only workload

For this post on MySQL 5.6 performance I used sysbench for a cached & update-only workload. The previous post on an update-only but not cached workload is here. For this workload each query updates 1 row by primary key. The database size is ~32GB and the data was cached in a 64GB InnoDB buffer pool. The table was read into the buffer pool before the tests were done.

The summary is that MySQL 5.6 did much better than MySQL 5.1 but throughput degraded when more buffer pool instances were used. I filed bug 68555 for that and if my patch is OK then the fix is easy. Using innodb_flush_method=O_DIRECT_NO_FSYNC also helped.

There binaries were tested:
  • orig5610 - MySQL 5.6.10
  • orig5163 - MySQL 5.1.63
  • fb5163 - MySQL 5.1.63 + the Facebook patch
Except where changes are indicated below the common configuration for these tests is:
innodb_buffer_pool_size=64G
innodb_log_file_size=1900M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
innodb_flush_method=O_DIRECT
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table
max_connections=2000
key_buffer_size=200M
innodb_io_capacity=1000
query_cache_size=0
query_cache_type=0
skip_log_bin
The MySQL 5.6 specific configuration is:
table-definition-cache=1000
table-open-cache=2000
metadata_locks_hash_instances=256
innodb_checksum_algorithm=CRC32
The MySQL 5.1 + Facebook patch specific configuration is:
table_cache=2000
innodb_fast_checksums=1

innodb_thread_concurrency=0

This test used innodb_thread_concurrency=0. MySQL 5.1 does better than 5.6 at low concurrency. MySQL 5.1 does much worse at high concurrency because of mutex contention. The workaround for high concurrency is to set innodb_thread_concurrency.

updates/second
    8      16      32      64    128      256   concurrent clients
43451   53975   31389   15628   10849    6520   fb5163
40463   52065   32846   15490   10637    6366   orig5163
39522   51445   62650   62788   61743   57441   orig5610

innodb_thread_concurrency=32

This test used innodb_thread_concurrency=32. For MySQL 5.1, QPS increased when innodb_thread_concurrency was changed from 0 to 32.

updates/second
    8      16      32      64    128      256   concurrent clients
44955   53898   27178   24056   22719   21375   fb5163
41282   49807   32451   28031   25959   23757   orig5163
36691   54741   62789   62262   61208   60884   orig5610

More buffer pool instances, part 1

This test used innodb_thread_concurrency=0 and varied innodb_buffer_pool_instances. For either value of innodb_thread_concurrency QPS drops when innodb_buffer_pool_instance is 8.

  • bpX means innodb_buffer_pool_instances=X
  • itcX means innodb_thread_concurrency=X

updates/second
    8      16      32      64    128      256   concurrent clients
36691   54741   62789   62262   61208   60884   orig5610, bp1, itc32
30076   30535   39266   33280   37135   31415   orig5610, bp8, itc32
39522   51445   62650   62788   61743   57441   orig5610  bp1, itc0
30141   31017   39089   40449   44994   47164   orig5610  bp8, itc0

More buffer pool instances, part 2

These used innodb_thread_concurrency=32. Note that performance with 8 buffer pool instances is much worse than with 1 although increasing innodb_io_capacity reduces the regression. Below iocap=X means innodb_io_capacity=X.


updates/second for innodb_buffer_pool_instances=1
    8      16      32      64    128      256   concurrent clients
36691   54741   62789   62262   61208   60884   orig5610, iocap=1000
34860   55060   61938   62270   61661   61107   orig5610, iocap=2048
40105   56780   64174   64257   62996   62208   orig5610, iocap=4096
43293   61882   66132   64147   63124   62533   orig5610, iocap=8192

updates/second for innodb_buffer_pool_instances=8
    8      16      32      64     128     256   concurrent clients
30076   30535   39266   33280   37135   31415   orig5610, iocap=1000
30151   32422   38110   39712   37551   34242   orig5610, iocap=2048
32092   38397   43014   44260   42148   41299   orig5610, iocap=4096
38302   51486   55676   55044   54883   54032   orig5610, iocap=8192

Fixing performance for multiple buffer pool instances

There were two changes to improve performance. The first is to fix bug 68555 and avoid thread stalls in log_checkpoint_margin when innodb_buffer_pool_instances > 1. The code change is simple and the QPS results below are much better.


updates/second for innodb_buffer_pool_instances=8
    8      16      32      64     128     256   concurrent clients
46726   70465   57360   49329   49120   48730   orig5610, iocap=1024
42710   70487   65408   57632   57587   57067   orig5610, iocap=2048
42464   66687   70299   63091   62940   61831   orig5610, iocap=4096
39295   69216   72740   66955   65382   65283   orig5610, iocap=8192
41798   68033   71868   66918   64496   64865   orig5610, iocap=16384



The second change is to use innodb_flush_method=O_DIRECT_NO_FSYNC along with a fix for bug 68555.  When O_DIRECT_NO_FSYNC is used there are fewer stalls in fil_flush_file_spaces. That function is called to do fsyncs on database files after writes have been done. At least on Linux, the fsync calls are not needed unless the file has been extended and there are pending filesystem metadata changes. The overhead isn't just the cost of doing fsync it is also the cost of getting through code that isn't exactly efficient (fil_flush, fil_flush_file_spaces).


updates/second for innodb_buffer_pool_instances=8
    8      16      32      64     128     256   concurrent clients
44092   69082   73446   69393   68118   67118   orig5610, iocap=1024
42445   67282   74463   69547   68324   67443   orig5610, iocap=2048
40601   69185   75044   69907   68406   67640   orig5610, iocap=4096
35835   68327   74777   69765   68069   67286   orig5610, iocap=8192
41997   66772   75226   70166   68531   67716   orig5610, iocap=16384

Neighbor page flushing

This test used innodb_thread_concurrency=32. Disabling neighbor page flushing had a small benefit for both MySQL 5.6 (innodb_flush_neighbors=0) and the FB patch for MySQL 5.1 (innodb_flush_neighbors_for_lru=0, innodb_flush_neighbors_on_checkpoint=0).

updates/second
    8      16      32      64    128      256   concurrent clients
29180   23622   18248   16683   16137   15655   fb5163
37945   53316   65973   65674   64065   62279   orig5610








5 comments:

  1. Hi Mark.

    In this benchmark, no configuration changes related to the performance schema are listed, and the performance schema is enabled by default in 5.6.

    Could you confirm if all the MySQL 5.6.10 numbers are with the performance schema enabled, in the default configuration ?

    Thanks,
    -- Marc

    ReplyDelete
  2. I do not think O_DIRECT is a substitute for fsync(). Or is it and I'm mistaken? I thought O_DIRECT is bypassing filesystem cache, but data still can be cached in different non-OS related caches (e.g on HW level).

    ReplyDelete
  3. PS was disabled at compile time. That is our default build.

    ReplyDelete
  4. Wlad - you are correct per the Linux man page. I had your question in the past, but then never followed up. This time I will follow up.

    ReplyDelete
  5. Wlad - we think we are OK despite what the man page states. We had this discussion previously. Linux has an interesting history with O_DIRECT.

    ReplyDelete

 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.