Sunday, February 24, 2013

MySQL 5.6: IO-bound, update-only workloads

The third performance test I am doing compares MySQL 5.6 (5.6.10) with MySQL 5.1 for an update-only workload with an IO-bound database. The configuration is similar to what I used for the IO-bound & read-only tests. The performance summary is that for my test servers:

  • MySQL 5.6 is always slower at <= 64 threads.
  • MySQL 5.6 is a bit faster at >= 128 threads with 1 buffer pool instance.
  • MySQL 5.6 is a lot faster at >= 128 threads with 8 buffer pool instances.

I think there is a performance regression in MySQL 5.6.10 for workloads that require high rates of page flushing when the working set does not fit in the InnoDB buffer pool, and I created bug 68481 for it. I see many changes to this code in launchpad that are not in 5.6.10, and perhaps these problems have been fixed there.

In MySQL 5.1 and 5.5 the main background IO thread (srv_master_thread) supported furious flushing when needed. That thread ran a loop that scheduled background IO (writing back dirty pages, doing reads for insert buffer merges). There was a one-second sleep at the start of the loop, but the sleep was skipped when the previous iteration flushed many dirty pages. I use furious flushing to describe the InnoDB behavior when the sleep is frequently skipped. Each iteration of the loop would do about innodb_io_capacity disk requests (the innodb_io_capacity limit was fuzzy: it might try to do twice the rate, but probably not 10X the rate). Because the sleep could be skipped, the innodb_io_capacity limit didn't really set the IOPs rate for background IO (despite what the docs state), but this was usually a good thing on servers that can do a lot of IOPs. The alternative would be to make InnoDB respect the limit and then set innodb_io_capacity to a large value. I think that is a bad idea without support for real AIO from InnoDB, which is now in MySQL 5.6.
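
The loop described above can be sketched as a toy simulation. This is not the InnoDB code: the function name, the `many` threshold that decides when the sleep is skipped, and the fixed 100 ms cost per batch are all assumptions made for illustration.

```python
def simulate_master_loop(backlog, io_capacity=1000, many=None,
                         duration_ms=10_000, batch_ms=100, allow_skip=True):
    """Return the number of pages flushed in `duration_ms` of simulated time.

    backlog     -- dirty pages waiting to be flushed
    io_capacity -- pages flushed per loop iteration (stand-in for innodb_io_capacity)
    many        -- hypothetical threshold for "the previous iteration flushed many pages"
    batch_ms    -- assumed time to write one batch (illustrative only)
    allow_skip  -- True models furious flushing, False always sleeps
    """
    if many is None:
        many = io_capacity // 2  # assumption, not taken from the source code
    clock_ms = 0
    flushed_total = 0
    flushed_prev = 0
    while clock_ms < duration_ms and backlog > 0:
        # One-second sleep at the top of the loop, skipped after a busy iteration.
        if not (allow_skip and flushed_prev >= many):
            clock_ms += 1000
            if clock_ms >= duration_ms:
                break
        flushed_prev = min(backlog, io_capacity)
        backlog -= flushed_prev
        flushed_total += flushed_prev
        clock_ms += batch_ms
    return flushed_total


furious = simulate_master_loop(100_000, allow_skip=True)
polite = simulate_master_loop(100_000, allow_skip=False)
print(furious, polite)
```

Under these assumptions the loop that skips the sleep flushes about ten times as many pages in the same window, which is why innodb_io_capacity did not really cap the background IO rate.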

Several things have changed in MySQL 5.6.10:

  • flushing of dirty pages from the tail of the LRU used to be done by foreground threads (threads that handle query processing) via buf_flush_free_margin and a thread would attempt to flush many dirty pages at a time. Note that clean pages at the end of the LRU can quickly be moved to the free list but dirty pages must first be flushed. In MySQL 5.6 foreground threads will try to move one page at a time via buf_flush_single_page_from_LRU and hope that the page cleaner thread does the rest of the work.
  • the page cleaner thread (buf_flush_page_cleaner_thread) doesn't do furious flushing. It sleeps so that it won't run more than once per second. In theory this means that the documented behavior for innodb_io_capacity is more likely to be correct. It sleeps if any of the following are true: 1) the server is not idle, 2) there are pending background reads, 3) pages were not flushed on the previous loop iteration. Note that the first condition is always true on a busy server, so the sleep is never skipped on a busy server.
  • when the page cleaner flushes dirty pages from the end of the LRU it does not use innodb_io_capacity to determine how much work to do. Follow the call chain from buf_flush_LRU_tail to buf_flush_LRU. It looks like InnoDB will do up to ~1000 page writes per buffer pool instance. So the trick to getting a higher rate of page flushes from the LRU is to use more buffer pool instances. But that is not the real solution.
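
A rough sketch of the last two points, not the InnoDB source. The function names are hypothetical, the sleep rule is the three conditions listed above, and the ~1000 pages per buffer pool instance per pass is the approximate figure quoted above.

```python
def page_cleaner_sleeps(server_idle, pending_background_reads,
                        flushed_last_iteration):
    """Sleep if any of the three conditions from the list above holds."""
    return (
        (not server_idle)
        or pending_background_reads
        or (not flushed_last_iteration)
    )


def max_lru_flush_per_sec(buffer_pool_instances, pages_per_instance=1000,
                          passes_per_sec=1):
    """Upper bound on LRU page writes per second when the page cleaner
    makes at most `passes_per_sec` passes and writes up to
    `pages_per_instance` pages per buffer pool instance on each pass."""
    return buffer_pool_instances * pages_per_instance * passes_per_sec


# A busy server always sleeps, whatever the other two conditions say.
print(page_cleaner_sleeps(server_idle=False,
                          pending_background_reads=False,
                          flushed_last_iteration=True))

# The ceiling scales with the number of buffer pool instances.
print(max_lru_flush_per_sec(1), max_lru_flush_per_sec(8))
```

With one pass per second, going from 1 to 8 buffer pool instances raises the ceiling from about 1000 to about 8000 page writes per second, which is the effect the last bullet describes.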

The workload is sysbench configured to update 1 row by primary key per query. I disabled the InnoDB write buffer, set innodb_flush_log_at_trx_commit=2 and disabled the binlog. I probably would not do that in production. I used the following binaries, with the abbreviations bpi=innodb_buffer_pool_instances, iocap=innodb_io_capacity, itc=innodb_thread_concurrency, lru=innodb_lru_scan_depth:
  • fb5163 - MySQL 5.1.63 + the Facebook patch, iocap=1000, itc=0
  • orig5163 - MySQL 5.1.63, iocap=1000, itc=0
  • orig5610+hack - MySQL 5.6.10, iocap=1000, bpi=8 and a hack to get the page cleaner thread to do furious flushing when needed.
  • orig5610+bp8 - MySQL 5.6.10, iocap=1000, bpi=8, itc=0
  • orig5610+bp1 - MySQL 5.6.10, iocap=1000, bpi=1, itc=0
  • orig5610+bp1+lru4k - MySQL 5.6.10, bpi=1, itc=0, iocap=lru=4k
  • orig5610+bp8+lru4k - MySQL 5.6.10, bpi=8, itc=0, iocap=lru=4k
  • orig5610+bp1+lru8k - MySQL 5.6.10, bpi=1, itc=0, iocap=lru=8k
  • orig5610+bp8+lru8k - MySQL 5.6.10, bpi=8, itc=0, iocap=lru=8k

updates/second
     8      16      32      64     128     256   concurrent clients
 15134   19623   18521   14804    9730    5898   fb5163
 10802   12980   13140   11337   11822    6284   orig5163
 17649   22993   17281   15066   14899   14907   orig5610+hack
  8317    8054    9379   11091   12684   14488   orig5610+bp8
  4695    5393    6436    7378    8632    8951   orig5610+bp1
  7402    7890    7058    7379    7893    8011   orig5610+bp1+lru4k
 18505   25492   10042   11387   12858   14388   orig5610+bp8+lru4k
 12965   12101    5134    4637    5078    5202   orig5610+bp1+lru8k
 18579   24612   10566   11385   12571   13700   orig5610+bp8+lru8k

There are a few obvious problems. QPS falls quickly with concurrency for MySQL 5.1.63 because of mutex contention. QPS is much worse for MySQL 5.6 because of stalls on LRU flushing, but using more buffer pool instances helps for the reason described above. From PMP (the poor man's profiler) I see that foreground threads are all stuck in buf_flush_single_page_from_LRU. Using a larger value for innodb_io_capacity does not help.
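
To put a number on how much more buffer pool instances help, the sketch below divides the orig5610+bp8 row by the orig5610+bp1 row from the table above. The variable names are illustrative; the figures are copied from the table.

```python
clients = [8, 16, 32, 64, 128, 256]

# updates/second from the first results table (itc=0)
bp8 = [8317, 8054, 9379, 11091, 12684, 14488]  # orig5610+bp8
bp1 = [4695, 5393, 6436, 7378, 8632, 8951]     # orig5610+bp1

# Speedup of 8 buffer pool instances over 1 at each client count.
ratios = {c: a / b for c, a, b in zip(clients, bp8, bp1)}

for c, r in ratios.items():
    print(f"{c:>3} clients: bp8 is {r:.2f}x bp1")
```

On these numbers bp8 is between roughly 1.5x and 1.8x faster than bp1 at every concurrency level, far less than the 8x difference in the LRU flushing ceiling, so something other than that ceiling is also limiting throughput.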

I repeated tests using innodb_thread_concurrency=32. It fixes the problems that occur at high concurrency for MySQL 5.1.

updates/second
     8      16      32      64     128     256   concurrent clients
 15092   19577   18506   17678   17101   16541   fb5163
 10669   12929   12994   12745   12617   12254   orig5163
  4697    5385    6475    6301    6208    6083   orig5610+bp1
 18650    8094    9696    9660    9548    9548   orig5610+bp8+lru2k
 16917   22861   17425   17164   17201   17006   orig5610+bp1+hack
 16971   24206    9484    9379    9286    9258   orig5610+bp8+hack

I have spent a lot of time working on the LRU flushing code for MySQL 5.1. That includes the innodb_fast_free_list option which allows MySQL 5.1 + the Facebook patch to almost match MySQL 5.6 on IO-bound & read-only workloads. Pages were moved from the LRU to the free list on demand in MySQL 5.1 & 5.5 when foreground threads needed a free page for a disk read and the free list was empty. Unfortunately the code in buf_flush_free_margin to do that work wasn't efficient. MySQL 5.6 might be more efficient given that most of the work will now be done by the page cleaner, a background thread. However this adds the risk that it won't keep up with demand. For example it is possible today for the page cleaner thread to be sleeping when the free list is empty and there are many dirty pages at the end of the LRU. That is not a good state for a high-perf server.


With the Facebook patch, using a large amount of memory with xtrabackup required the innodb_fast_free_list option or recovery was much too slow. I wonder if a similar problem exists in MySQL 5.6.

13 comments:

  1. Hi Mark,

    - did you play with "innodb_lru_scan_depth"? - the default should match 1000 flushed pages per BP instance you've observed, so try to increase it to see the impact..

    - via the METRICS table you can now monitor "single page" flushes done by user threads (the tuning goal is to avoid them), and you can monitor LRU flushing activity as well..

    - what do you mean by "hack" in orig5610+hack results?

    well, we know we're not yet perfect today on LRU flushing and generally on flushing limits (see: http://dimitrik.free.fr/blog/archives/2013/01/mysql-performance-innodb-heavy-io-rw-workloads-limits-in-56.html), but I hope with a planned re-design + parallel flushing implemented we'll finally hit the max storage write speed ;-)

    Rgds,
    -Dimitri

  2. Mark,

    The page cleaner does not sleep invariably. It notes the time when it starts an iteration. On the next iteration it checks whether less than a second has passed since the last one. If so, it sleeps for the remaining period. Otherwise it keeps working. Look at page_cleaner_sleep_if_needed().

    Secondly, how much the page cleaner flushes from the LRU every second is controlled by innodb_lru_scan_depth. The default is 1K. It is a per buffer pool instance value. When using a single instance on a high-end server it makes sense to set this value much higher.

    Single page flushing for LRU is a 'last resort' type thing, just like synchronous flushing of the flush list. It should not happen on a typical server. You can see whether or not this is happening by querying information_schema.innodb_metrics where name like '%lru%'. I understand that I have chosen a pretty conservative default for innodb_lru_scan_depth, but it is somewhat analogous to innodb_io_capacity, which also has a low-end default.
    If you get a chance to rerun your benchmark for single buffer pool instance with innodb_lru_scan_depth equal to, say, 5000 or some such, you'll probably see improvement in QPS.

  3. Thanks for the feedback and I will repeat the test with changes to innodb_lru_scan_depth. I prefer that InnoDB derives all of the IO rates from one parameter - innodb_io_capacity. People will either miss the new option or won't be able to set it to a good value.

    I realize it will sleep less than one second. My primary claim is that "furious flushing" is no longer done, as it was by srv_master_thread. The page cleaner iteration runs at most once per second.

    By "hack" I mean that I added back "furious flushing" to the LRU thread.

    My prediction is that large LRU scan depths will have too much overhead because of the cases where LRU searches have O(N*N) overhead. It might be more efficient to change the page cleaner thread to run at most once per 100ms rather than once per second, and then use smaller values for LRU scan depth.

  4. Added results for innodb_lru_scan_depth=4k and 8k. The hack still does better than innodb_lru_scan_depth and nobody wants to spend time figuring out the right value for another my.cnf value.

  5. A few notes on the above without any claims of superior understanding of the flushing parameters.

    In 5.1 as you mentioned the IO capacity was used, but at a certain point one went into furious flushing to avoid issues.

    In 5.6 IO capacity has been amended with a new variable Max IO capacity.

    From reading the code one can see that whenever the server is busy, flushing from the flush list is controlled by either the dirty level in the buffer pool or the fill level of the log.

    Flushing to keep the dirty level in the buffer pool at a good level is never done at a higher rate than the IO capacity rate.

    The second reason for flushing is to avoid running out of log space. Here we have a curve which depends on the fill level of the log raised to 1.5. So we will exponentially increase the IO rate as we get closer to the end of the log. In addition we will multiply this rate by Max IO capacity / IO capacity. By default Max IO capacity = 2 * IO capacity. So by increasing Max IO capacity one can get closer to the 5.1 behaviour of furious flushing.

    Actually in 5.6 the IO capacity is controlling things more than it did in 5.1. So I would actually set IO capacity higher in a 5.6 server than I would in a 5.1 server. Personally I always set IO capacity to at least 2000 in 5.6 and this is even with a memory bound workload.

  6. Mark,

    everything is relative ;-)
    from what I can see, the best max TPS on 8 and 16 users
    is reached with LRU scan depth setting (4K and 8K)..

    may you try "orig5610+bp8+lru4k+hack" on your server?..
    this test result should be the most interesting imho :-)
    (while, as you see by yourself, it's not easy to "predict" here
    anything ahead)..

    while I agree that such a "tuning" should be auto-adapted in
    the code, and LRU flushing activity aligned with IO capacity
    settings..

    Rgds,
    -Dimitri

  7. Mark,

    Was the only thing you changed in 5.6 to let page_cleaner run without sleeping if the previous iteration had work to do?

    Thanks for testing with lru_scan_depth. We obviously need to get this fixed.

  8. Inaam - my hack to bring back furious flushing to the page cleaner thread is to skip sleep when the previous iteration did a lot of work. This was an easy hack, not sure if it is the real solution for me.

    I have a few concerns about the new behavior in 5.6. It is much more sensitive to getting the correct values for innodb_io_capacity and innodb_lru_scan_depth. The meaning of innodb_io_capacity has changed given that furious flushing isn't done anymore.

    Note that in 5.1 we didn't have to set innodb_io_capacity to the right value. Now we have to set it and innodb_lru_scan_depth to the right value. I think this is a step back. I hope that the right value only depends on HW and not on workload, because workload changes.

    Perhaps I just need to get used to larger values for innodb_io_capacity and innodb_lru_scan_depth. But I don't think that using large values for them with fast flash storage is a good idea. And by large I mean 30,000 or more. At that point I would prefer for the background IO loops to run more frequently and do less work on each iteration -- like running every 100ms rather than every second.

    Much more needs to be done to educate users about some of these changes. I filed a bug because I think the docs need to be updated - http://bugs.mysql.com/bug.php?id=68497

  10. I think innodb_lru_scan_depth should be more dynamic. If there are many clean pages at the end of the LRU and threads have not been waiting for free pages then the page cleaner thread doesn't need to check innodb_lru_scan_depth pages per buffer pool instance.

  11. Mark,

    I believe page_cleaner works that way. If free list has innodb_lru_scan_depth pages the page_cleaner won't do any scanning. Its mandate is to try and maintain innodb_lru_scan_depth free pages on the free list. Every second it tries to replenish the free list. Note that I coded it this way instead of leaving clean pages at the tail of LRU to avoid O(n*n) scanning of LRU.

  12. Inaam - using source in launchpad, I see a check in buf_flush_LRU_list_batch that stops the search when free_len >= srv_LRU_scan_depth. My suggestion is to possibly stop the search sooner as I might need to set innodb_lru_scan_depth to a large value.


 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.