Sunday, January 18, 2009

Double sysbench throughput with TCMalloc

TCMalloc can make MySQL much faster (2X) for some workloads. I ran the sysbench OLTP readonly test on servers with 4, 8 and 16 cores to determine the benefit of using TCMalloc with unmodified MySQL 5.0.75. Malloc is a bottleneck for some workloads with MySQL because:
  • The overhead to parse, optimize and set up query execution structures is significant for fast statements. LOCK_open and mutexes used for access control have a lot of contention.
  • The HEAP engine has a global mutex held when malloc is called and HEAP tables are used to process ORDER BY and GROUP BY clauses. This should be easy to fix.
A key point is that InnoDB is far from the only problem in MySQL for SMP servers. While alternatives are great, fixing InnoDB or replacing it with Maria, Falcon or PBXT won't fix all of the problems.

And now, a few claims that are true for workloads similar to the sysbench readonly OLTP test.
  • At high concurrency, tcmalloc doubles throughput.
  • InnoDB is much faster than MyISAM and the advantage grows with the number of CPU cores.
  • innodb_thread_concurrency=4 has a cost. It helps for CPU-bound workloads dominated by long-running statements. In other cases, it can hurt performance.
All of the data used for the graphs below is available. Unmodified MySQL 5.0.75 was used with and without tcmalloc. The test is sysbench OLTP readonly. Malloc is a bottleneck for sysbench OLTP readonly, so TCMalloc makes a big difference. For other workloads there are other bottlenecks and other changes are needed (such as the faster rw-mutex patch and patches to reduce other mutex contention in InnoDB).

Is this motivation to buy a big SMP?

TCMalloc makes InnoDB much faster at 8+ concurrent users:

TCMalloc makes MyISAM much faster at 8+ concurrent users:

InnoDB is much faster than MyISAM on a 16-core server at 8+ concurrent users:

InnoDB is much faster than MyISAM on an 8-core server at 8+ concurrent users:

For a 16-core server, innodb_thread_concurrency avoids both the worst and the best throughput:

11 comments:

  1. Hm... it would be interesting to see how this plays out with in-memory InnoDB workloads.

    I might try to rerun our benchmarks.

  2. That's pretty impressive. I understand there are other bottlenecks for other workloads, but if you were to combine this with all of the other improvements that Google and Percona have made to remove the other bottlenecks in InnoDB/MySQL, what would the results look like? Would tcmalloc still provide a 2X speedup in those situations after some of the other bottlenecks were removed?

    If nothing else this is a great demo for tcmalloc. Thanks for sharing all of your work.

  3. Could someone post the steps to include tcmalloc in a MySQL install? Thanks a lot.

  4. For sysbench OLTP readonly, this is the largest obvious bottleneck. The next thing to fix is the mutex contention on LOCK_open and the access control mutex. The faster rw-mutex change doesn't make this much faster.

    For sysbench OLTP readwrite, there are a few fixes in progress that have yet to be published. I will try to publish more numbers for this but a lot of work remains to be done to make InnoDB faster on readwrite workloads.

    For workloads with concurrent long-running queries using InnoDB, the faster rw-mutex patch is needed.

  5. Percona has published several changes to make readwrite InnoDB workloads faster and has more work in progress. Some of their changes overlap with changes in the Google patch, but many do not. So follow their blogs.

  6. Steps 3 and beyond might not be the proper way to do it, but they worked for me.

    1) Build or install Google perftools
    2) Build MySQL
    3) cd $MYSQL_ROOT/sql
    4) edit the line that starts with 'LDFLAGS = ' in Makefile by adding '-L$PERFTOOLS_ROOT'.
    5) edit the line that starts with 'LIBS = ' in Makefile by adding '-ltcmalloc_minimal'
    6) rm mysqld; make mysqld
    7) confirm -- nm mysqld | grep -i tcmalloc
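
The Makefile edits in steps 4 and 5 can also be scripted. The following is only a minimal sketch under stated assumptions: `add_tcmalloc_flags` is a hypothetical helper (not part of MySQL or perftools), the generated Makefile is assumed to contain lines starting with `LDFLAGS =` and `LIBS =`, and `$PERFTOOLS_ROOT` is assumed to be the directory that holds the tcmalloc library, as in step 4.

```shell
#!/bin/sh
# Sketch of steps 4 and 5: append the tcmalloc linker flags to the Makefile
# generated in $MYSQL_ROOT/sql. add_tcmalloc_flags is a hypothetical helper.

add_tcmalloc_flags() {
    makefile="$1"   # path to sql/Makefile
    libdir="$2"     # directory containing libtcmalloc_minimal (assumed path;
                    # must not contain '|', '&' or '\', which sed would interpret)

    # '&' in the replacement stands for the whole matched line, so each flag
    # is appended to whatever the line already holds.
    sed -e "s|^LDFLAGS =.*|& -L${libdir}|" \
        -e "s|^LIBS =.*|& -ltcmalloc_minimal|" \
        "$makefile" > "${makefile}.tcmalloc" &&
        mv "${makefile}.tcmalloc" "$makefile"
}

# Usage (steps 3 to 7), left commented out because it rebuilds mysqld:
#   cd "$MYSQL_ROOT/sql"
#   add_tcmalloc_flags Makefile "$PERFTOOLS_ROOT"
#   rm mysqld; make mysqld
#   nm mysqld | grep -i tcmalloc
```

Rerunning configure regenerates the Makefile, so the edit would need to be applied again after that.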

  7. Why not use ICC as a compiler? It should accelerate even more.

  8. For me the assumed benefit isn't worth the cost. I know from years of usage that gcc works for my environment and there is an internal team that supports it. I don't even use -O3 when building -- I use -O2.

  9. Any recommendations for tools to automatically migrate a MySQL 5.0 database to PostgreSQL 8.3? Thanks.

  10. Hire good consultants. Upgrading any DBMS is hard.

  11. Did you try jemalloc (http://www.canonware.com/jemalloc/)?

    And a recent glibc (>= 2.10)?
    See malloc scalability: http://udrepper.livejournal.com/20948.html

    It would be interesting to see new tests on a recent distribution (Fedora 13, ...).

    -thanks-


 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.