1. I read a post about the need to defragment MongoDB databases. I liked the title but prefer to avoid a DBMS that requires routine maintenance. Alas, I don't know what the author means by routine. There is a big difference between having to defragment a database once a week versus once a quarter. This isn't an online operation per the manual, so it can have a big impact on service availability.

    PostgreSQL isn't immune to this problem. It has a long history with vacuum and each release makes things better. But I haven't run either PostgreSQL or MongoDB in production for a busy OLTP workload, so I don't know whether either would be a serious problem. Context matters in this case. I can read many things on the internet that tell me MongoDB compaction and PostgreSQL vacuum are never an issue or are always an issue. The hard part is figuring out whether that is true for the workload that you care about.

    I do have experience with InnoDB. InnoDB purge has some things in common with PostgreSQL vacuum. It cleans up old versions required by MVCC when those versions are no longer visible. For InnoDB this means it physically removes rows that were delete-marked. Performance can slow to a crawl when purge gets too far behind, and there are a few options to control that, but purge has never been routine maintenance. It just runs in the background.
    Purge != defragmentation

    Even though purge is online, InnoDB can still require maintenance. On servers that I care about, 2/3 of the data growth rate was caused by fragmentation versus 1/3 from new data. Defragmentation is useful to reclaim the space wasted by fragmentation, especially when using pure-flash storage. The frequency in this case is a few times per year.

    How do you defragment servers in production? If you are fortunate enough to have several replicas per master and spare capacity, then you can defragment a replica and then promote it to master. Other options include using the online DDL that is new in MySQL 5.6 to defragment secondary indexes. To defragment a primary index you probably need an online schema change framework like pt-online-schema-change, oak-online-alter-table or Facebook OSC.

    I think there is a better way to defragment InnoDB that is incremental and less invasive. But this is only a theory today. One of my talented peers is trying to figure this out and we hope to have good news and code to share in a few months.

  2. We hope to see you at the MySQL user conference. Once again Yoshi outdoes Domas (3 talks versus 2). Attend to learn more about their rivalry. Nizam has new results for making InnoDB compression better and he will describe his recent work at the UC.
    • InnoDB Compression Present and Future - Nizam and Justin will talk about past and present work to get InnoDB compression into production. We were excited when this was first deployed and shrank many tables to 50% of their uncompressed size. Recent work has significantly improved the compression rate we get for real data. But you need to wait for the UC to learn more.
    • Lots and lots of small data - we are big believers in the small data movement. The db-perf team (Yoshi, Domas, Harrison) will describe what we do to keep our small data servers happy. They will be joined by an honorary member of the db-perf team (Mark). Note that by small data we still have a huge amount of data per host, but we prefer to brag about removing data more than having a lot (and too much) data.
    • Just say no to replication lag - Domas describes our long-term efforts to combat replication lag.
    • Reducing Database Size - Yoshi provides a lot more advice on making sure your servers remain in the small data movement.
    • Practical Failover Design - there is a rumor that some clever people have been able to automate MySQL failover.
    • Online Shard Migration - Vamsi enjoyed doing Online Schema Change so much that he returned to the area with a framework for migrating shards online.

  3. I discussed my previous post on O_DIRECT_NO_FSYNC with the InnoDB team and they fixed my understanding of a few parts of the code that contribute to the stalls I have been reporting.  We also discussed a problem I have been ignoring. The InnoDB code that does an fsync for a tablespace (fil_flush) can make a thread sleep when there are concurrent attempts to do the fsync. The amount of time to sleep, 20 milliseconds, was probably chosen in 1995. It is too long for fast storage including HW RAID cards with battery backed write cache and flash-based devices.

    I created bug 68588 for the call to sleep and today I have performance numbers to share from MySQL 5.6.10. The workload is the same as described in the previous post. Three binaries were tested:

    • O_DIRECT,condvar - use innodb_flush_method=O_DIRECT and patch the binary to use condition variable waits rather than sleep when there are concurrent threads in fil_flush.
    • O_DIRECT,sleep - use innodb_flush_method=O_DIRECT
    • O_DIRECT_NO_FSYNC - use innodb_flush_method=O_DIRECT_NO_FSYNC

    updates/second for update 1 row by PK via sysbench
        8      16      32      64     128     256   concurrent clients
    18234   23513   22542   21967   21941   22135   O_DIRECT,condvar
    18382   25290   10464    9868   10059   10917   O_DIRECT,sleep
    18237   26332   30318   29695   29633   29380   O_DIRECT_NO_FSYNC

    There is still a benefit from using O_DIRECT_NO_FSYNC but the difference is less significant. I didn't see any obvious stalls when using PMP with the O_DIRECT,condvar binary. However, at test end the average rate for innodb_pages_written was about 19,000/second for O_DIRECT,condvar versus 24,000/second for O_DIRECT_NO_FSYNC. My explanation at this point is that the page cleaner thread is able to write pages faster when it doesn't have to wait to call fsync and go through fil_flush().

  4. I am speaking about MySQL at OpenWest. I have a keynote on May 2 and a Q&A session on May 3.

  5. MySQL 5.6 has a new option for innodb_flush_method. When O_DIRECT_NO_FSYNC is used then fsync is not done after writes. This was a response to feature request 45892. Unfortunately this option as implemented and documented is not safe to use with XFS. That is too bad because it can make performance much better for workloads that stall on fil_flush. I revisited the update-only & IO-bound workload from a previous post. QPS below is much better when the new option is used. I used MySQL 5.6 with 8 buffer pool instances and the binlog disabled. The only difference was the value of innodb_flush_method.


    updates/second for update 1 row by PK via sysbench
        8      16      32      64     128     256   concurrent clients
    18234   24359   10379    9795    9843   10283   O_DIRECT
    17996   26853   30265   28923   29293   29477   O_DIRECT_NO_FSYNC

    This is the problem thread stack when there are stalls from O_DIRECT:
    os_thread_sleep,fil_flush,fil_flush_file_spaces,buf_flush_sync_datafiles,buf_flush_single_page_from_LRU,buf_LRU_get_free_block,buf_page_init_for_read,buf_read_page_low,..

    This is the my.cnf from the test. The server has fast storage that can do ~150k disk reads/second. I think innodb_io_capacity and innodb_lru_scan_depth were large enough. I don't think the host was mis-tuned.
    sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
    innodb_log_file_size=1900M
    innodb_max_dirty_pages_pct=80
    innodb_file_format=barracuda
    innodb_file_per_table
    table-definition-cache=1000
    table-open-cache=2000
    max_connections=2000
    key_buffer_size=200M
    innodb_flush_log_at_trx_commit=2
    innodb_doublewrite=0
    query_cache_size=0
    query_cache_type=0
    innodb_flush_method=O_DIRECT_NO_FSYNC
    metadata_locks_hash_instances=256
    innodb_checksum_algorithm=CRC32
    innodb_thread_concurrency=32
    innodb_buffer_pool_size=4g
    innodb_io_capacity=8192
    innodb_buffer_pool_instances=8
    innodb_adaptive_hash_index=1
    loose-table_open_cache_instances=1
    innodb_lru_scan_depth=8192


  6. For this post on MySQL 5.6 performance I used sysbench for a cached & update-only workload. The previous post on an update-only but not cached workload is here. For this workload each query updates 1 row by primary key. The database size is ~32GB and the data was cached in a 64GB InnoDB buffer pool. The table was read into the buffer pool before the tests were done.

    The summary is that MySQL 5.6 did much better than MySQL 5.1 but throughput degraded when more buffer pool instances were used. I filed bug 68555 for that and if my patch is OK then the fix is easy. Using innodb_flush_method=O_DIRECT_NO_FSYNC also helped.

    Three binaries were tested:
    • orig5610 - MySQL 5.6.10
    • orig5163 - MySQL 5.1.63
    • fb5163 - MySQL 5.1.63 + the Facebook patch
    Except where changes are indicated below the common configuration for these tests is:
    innodb_buffer_pool_size=64G
    innodb_log_file_size=1900M
    innodb_flush_log_at_trx_commit=2
    innodb_doublewrite=0
    innodb_flush_method=O_DIRECT
    innodb_max_dirty_pages_pct=80
    innodb_file_format=barracuda
    innodb_file_per_table
    max_connections=2000
    key_buffer_size=200M
    innodb_io_capacity=1000
    query_cache_size=0
    query_cache_type=0
    skip_log_bin
    The MySQL 5.6 specific configuration is:
    table-definition-cache=1000
    table-open-cache=2000
    metadata_locks_hash_instances=256
    innodb_checksum_algorithm=CRC32
    The MySQL 5.1 + Facebook patch specific configuration is:
    table_cache=2000
    innodb_fast_checksums=1

    innodb_thread_concurrency=0

    This test used innodb_thread_concurrency=0. MySQL 5.1 does better than 5.6 at low concurrency. MySQL 5.1 does much worse at high concurrency because of mutex contention. The workaround for high concurrency is to set innodb_thread_concurrency.

    updates/second
        8      16      32      64    128      256   concurrent clients
    43451   53975   31389   15628   10849    6520   fb5163
    40463   52065   32846   15490   10637    6366   orig5163
    39522   51445   62650   62788   61743   57441   orig5610

    innodb_thread_concurrency=32

    This test used innodb_thread_concurrency=32. For MySQL 5.1, QPS increased when innodb_thread_concurrency was changed from 0 to 32.

    updates/second
        8      16      32      64    128      256   concurrent clients
    44955   53898   27178   24056   22719   21375   fb5163
    41282   49807   32451   28031   25959   23757   orig5163
    36691   54741   62789   62262   61208   60884   orig5610

    More buffer pool instances, part 1

    This test varied innodb_buffer_pool_instances with innodb_thread_concurrency set to 0 and to 32. For either value of innodb_thread_concurrency, QPS drops when innodb_buffer_pool_instances is 8.

    • bpX means innodb_buffer_pool_instances=X
    • itcX means innodb_thread_concurrency=X

    updates/second
        8      16      32      64    128      256   concurrent clients
    36691   54741   62789   62262   61208   60884   orig5610, bp1, itc32
    30076   30535   39266   33280   37135   31415   orig5610, bp8, itc32
    39522   51445   62650   62788   61743   57441   orig5610, bp1, itc0
    30141   31017   39089   40449   44994   47164   orig5610, bp8, itc0

    More buffer pool instances, part 2

    These tests used innodb_thread_concurrency=32. Note that performance with 8 buffer pool instances is much worse than with 1, although increasing innodb_io_capacity reduces the regression. Below, iocap=X means innodb_io_capacity=X.


    updates/second for innodb_buffer_pool_instances=1
        8      16      32      64    128      256   concurrent clients
    36691   54741   62789   62262   61208   60884   orig5610, iocap=1000
    34860   55060   61938   62270   61661   61107   orig5610, iocap=2048
    40105   56780   64174   64257   62996   62208   orig5610, iocap=4096
    43293   61882   66132   64147   63124   62533   orig5610, iocap=8192

    updates/second for innodb_buffer_pool_instances=8
        8      16      32      64     128     256   concurrent clients
    30076   30535   39266   33280   37135   31415   orig5610, iocap=1000
    30151   32422   38110   39712   37551   34242   orig5610, iocap=2048
    32092   38397   43014   44260   42148   41299   orig5610, iocap=4096
    38302   51486   55676   55044   54883   54032   orig5610, iocap=8192

    Fixing performance for multiple buffer pool instances

    There were two changes to improve performance. The first is to fix bug 68555 and avoid thread stalls in log_checkpoint_margin when innodb_buffer_pool_instances > 1. The code change is simple and the QPS results below are much better.


    updates/second for innodb_buffer_pool_instances=8
        8      16      32      64     128     256   concurrent clients
    46726   70465   57360   49329   49120   48730   orig5610, iocap=1024
    42710   70487   65408   57632   57587   57067   orig5610, iocap=2048
    42464   66687   70299   63091   62940   61831   orig5610, iocap=4096
    39295   69216   72740   66955   65382   65283   orig5610, iocap=8192
    41798   68033   71868   66918   64496   64865   orig5610, iocap=16384



    The second change is to use innodb_flush_method=O_DIRECT_NO_FSYNC along with the fix for bug 68555. When O_DIRECT_NO_FSYNC is used there are fewer stalls in fil_flush_file_spaces. That function is called to do fsyncs on database files after writes have been done. At least on Linux, the fsync calls are not needed unless the file has been extended and there are pending filesystem metadata changes. The overhead isn't just the cost of doing fsync; it is also the cost of getting through code that isn't exactly efficient (fil_flush, fil_flush_file_spaces).


    updates/second for innodb_buffer_pool_instances=8
        8      16      32      64     128     256   concurrent clients
    44092   69082   73446   69393   68118   67118   orig5610, iocap=1024
    42445   67282   74463   69547   68324   67443   orig5610, iocap=2048
    40601   69185   75044   69907   68406   67640   orig5610, iocap=4096
    35835   68327   74777   69765   68069   67286   orig5610, iocap=8192
    41997   66772   75226   70166   68531   67716   orig5610, iocap=16384

    Neighbor page flushing

    This test used innodb_thread_concurrency=32. Disabling neighbor page flushing had a small benefit for both MySQL 5.6 (innodb_flush_neighbors=0) and the FB patch for MySQL 5.1 (innodb_flush_neighbors_for_lru=0, innodb_flush_neighbors_on_checkpoint=0).

    updates/second
        8      16      32      64    128      256   concurrent clients
    29180   23622   18248   16683   16137   15655   fb5163
    37945   53316   65973   65674   64065   62279   orig5610

