Friday, February 4, 2011

So long kernel mutex

If you are willing to look, there are a few good changes in trunk for InnoDB. A frequent source of mutex contention in InnoDB, kernel_mutex, has been replaced with a rw-lock for the transaction system (see trx_sys_struct), two mutexes in srv_sys_struct and possibly other mutexes and rw-locks. The dulint struct has been replaced with a native 8-byte int. All of these changes should make InnoDB more efficient for workloads with a large number of concurrent transactions and help with bug 49169.
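
To illustrate the general idea of swapping one global mutex for a rw-lock, here is a minimal C++17 sketch. It is not InnoDB code: `TrxSys`, `start_trx`, `end_trx` and `snapshot_active` are hypothetical names invented for this example, and `std::shared_mutex` stands in for InnoDB's own rw-lock implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for a transaction system; not InnoDB's trx_sys_struct.
class TrxSys {
public:
    // Starting a transaction modifies shared state, so it takes the lock exclusively.
    std::uint64_t start_trx() {
        std::unique_lock<std::shared_mutex> guard(lock_);
        std::uint64_t id = next_id_++;
        assert(id != 0);  // ids start at 1 in this sketch
        active_.push_back(id);
        return id;
    }

    // Ending a transaction also needs exclusive access.
    void end_trx(std::uint64_t id) {
        std::unique_lock<std::shared_mutex> guard(lock_);
        for (auto it = active_.begin(); it != active_.end(); ++it) {
            if (*it == id) {
                active_.erase(it);
                return;
            }
        }
    }

    // Readers only take a shared lock, so many of them can run concurrently.
    std::vector<std::uint64_t> snapshot_active() const {
        std::shared_lock<std::shared_mutex> guard(lock_);
        return active_;
    }

private:
    mutable std::shared_mutex lock_;
    std::uint64_t next_id_ = 1;  // a native 8-byte counter instead of a two-word struct
    std::vector<std::uint64_t> active_;
};
```

With a plain mutex every caller of `snapshot_active` would serialize behind every other caller. With the rw-lock only the writers exclude everyone else, which is where the benefit for read-mostly paths would come from.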

I wish there were a better way to track changes in InnoDB. Right now my tools are luck and recursive diff.

10 comments:

  1. Mark,

    that's great to post here - at least it's possible to comment without having a Facebook account :-))

    regarding kernel mutex: resolving its contention will expose many others, so this work is only at its beginning and will need detailed analysis to get a performance improvement (personally I don't expect too much improvement just from removing contention on the kernel mutex).

    then regarding your test workload: you're mainly testing with sysbench, but when you find an improvement on sysbench, does it hold on the Facebook workload too?

    Rgds,
    -Dimitri

  2. Mutex contention on production workloads has been an intermittent problem for us so it is harder to quantify.

    The feature I want is the ability to have 10,000+ mostly idle connections and maintain constant throughput under load when there are occasional spikes in concurrent queries. Right now MySQL throughput collapses in that case, as will every other RDBMS. While a proxy or connection pool is the standard solution, I want protection in the RDBMS.

    Part of the problem is in InnoDB. The hot spots are enforcement of innodb_thread_concurrency and transaction start. My team will probably work on a solution.

  3. I'm not sure what exactly you have in mind when you say "transaction start". Does it include the malloc and init too? or just the assigning of the id and putting the trx instance on the trx_list (trx_start_low()) ?

  4. By "transaction start" I mean read_view_open_now. The overhead there grows with the number of open transactions. It would be nice to change it not to malloc while holding kernel_mutex. Iterating over the list isn't anything that will be easy to change (or even possible to change).

  5. Right, the read view create code is a pain point. Getting rid of the malloc is something I've thought about but haven't spent many cycles on; I wanted the kernel mutex split code to be correct first.

    With the new 5.6 code the only requirement is to hold an S lock on trx_sys_t::lock. This should mitigate the problem.

  6. Hi Mark and Sunny,

    I've been trying to track down a performance issue which looks related to locking on the enterprise MySQL with InnoDB. At high (data-altering) transaction concurrency we see a pretty much dead-on 60s periodicity of activity in the database. From the shape of the response, it looks very much like a queued mutex, with a load of shared locks that drain (as the transactions process) and then queue up behind the mutex. The mutex-ed operation happens very fast, and transactions start being able to operate concurrently again.

    Do you think these changes in where you're taking locks and what exactly you're locking might help us with this situation? Is this periodicity something you've also seen, or is it otherwise a known problem? I've been looking as much as I can into what might be causing it, and this post and your linked bug are about the closest I have come. Interestingly, we see the highest number of mutex os waits for srv0srv.c:938 (which is srv_free() as best I can tell). Would this change fix that?

    Cheers

    MBM

  7. I don't think it is srv_free. It might be kernel_mutex as the only mutex_create call in srv0srv.c is for kernel_mutex. PMP (poormansprofiler.org) output would help. So would a summary of your problem by Percona.

  8. Aha, I hadn't seen that, useful info for further digging - I was resigned to spending an afternoon trying to dig into the InnoDB source code.

    Thanks for your help. I hope it isn't really 3:37AM where you are :-)

    This problem has been stumping me for 3 days or so, so I'm really up for tracking it down.

    Cheers

    MBM

  9. It is that late. Obama flew in and out of SFO today and that delayed my flight and thousands of others by 2 to 3 hours. He had dinner with Silicon Valley tech leaders (and big donors) and made a few thousand little donors and voters unhappy.

  10. Argh! Sleep well, then. Looking at a more recent version of the innodb plugin code than I found on the web, 938 is indeed a kernel mutex. Interestingly, though, in srv0srv.c there's the stats collector which wakes up every 60s - more investigation needed :-)

    Cheers

    MBM

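
The comments above raise the cost of calling malloc while holding kernel_mutex when a read view is opened. One common way to avoid that, sketched below in C++, is to allocate the buffer before taking the lock and only copy under it. This is an illustration under stated assumptions, not InnoDB's read_view_open_now: `TrxList` and `open_read_view` are hypothetical names, and a `std::mutex` plays the role of kernel_mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-ins; these are not InnoDB's real structures.
struct TrxList {
    std::mutex mutex;                   // plays the role of kernel_mutex
    std::vector<std::uint64_t> active;  // ids of open transactions
};

// Build a read view (a copy of the active ids) while keeping the
// allocation outside the critical section.
std::vector<std::uint64_t> open_read_view(TrxList& list) {
    std::vector<std::uint64_t> view;
    std::size_t guess = 0;
    {
        // Short critical section: read only the current list size.
        std::lock_guard<std::mutex> guard(list.mutex);
        guess = list.active.size();
    }
    for (;;) {
        view.reserve(guess);  // allocation happens with the mutex released
        std::lock_guard<std::mutex> guard(list.mutex);
        if (list.active.size() <= view.capacity()) {
            // The copy fits in the preallocated buffer, so nothing is
            // allocated while the mutex is held.
            view.assign(list.active.begin(), list.active.end());
            return view;
        }
        // The list grew in the meantime; retry with a larger buffer.
        guess = list.active.size() * 2;
    }
}
```

The trade-off is that the mutex is taken at least twice, and the copy still iterates over the whole list, so the cost still grows with the number of open transactions. Only the allocation moves out of the critical section.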

 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.