I want to hear from people who test this with real workloads on servers with 8+ cores or who do any type of testing on platforms other than Linux/x86. The patch for MySQL 5.0 is at code.google.com.
The change is to replace the mutex_t used in the InnoDB rw_lock_struct with a pthread_mutex_t. Calls to lock, unlock, create and destroy rw_lock_struct::mutex in sync0rw.c must also be updated.
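The shape of this change can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the actual patch: `demo_rw_lock_t`, its `reader_count` and `writer_held` fields, and the `demo_*` functions are hypothetical stand-ins for rw_lock_struct and the code in sync0rw.c. Only the idea that a `pthread_mutex_t` member named `mutex` protects the lock's internal state comes from the text above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for rw_lock_struct.
   Only the `mutex` member mirrors the change described above. */
typedef struct demo_rw_lock {
    pthread_mutex_t mutex;  /* previously an InnoDB mutex_t */
    int reader_count;       /* illustrative internal state */
    int writer_held;        /* illustrative internal state */
} demo_rw_lock_t;

/* "create": initialize the state and the pthread mutex that guards it. */
int demo_rw_lock_create(demo_rw_lock_t *lock) {
    lock->reader_count = 0;
    lock->writer_held = 0;
    return pthread_mutex_init(&lock->mutex, NULL);
}

/* "destroy": the lock must be unused when it is freed. */
int demo_rw_lock_free(demo_rw_lock_t *lock) {
    assert(lock->reader_count == 0 && !lock->writer_held);
    return pthread_mutex_destroy(&lock->mutex);
}

/* Try to take a shared lock without waiting.
   Returns 1 if granted, 0 if a writer holds the lock. */
int demo_rw_lock_s_lock_nowait(demo_rw_lock_t *lock) {
    int granted = 0;
    pthread_mutex_lock(&lock->mutex);    /* "lock" call site */
    if (!lock->writer_held) {
        lock->reader_count++;
        granted = 1;
    }
    pthread_mutex_unlock(&lock->mutex);  /* "unlock" call site */
    return granted;
}

/* Release a shared lock taken with demo_rw_lock_s_lock_nowait(). */
void demo_rw_lock_s_unlock(demo_rw_lock_t *lock) {
    pthread_mutex_lock(&lock->mutex);
    assert(lock->reader_count > 0);
    lock->reader_count--;
    pthread_mutex_unlock(&lock->mutex);
}
```

The four kinds of call site (create, destroy, lock, unlock) are the ones that have to change when the member type changes. The rest of the read-write lock logic is untouched in this sketch.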
InnoDB implements a mutex (mutex_t) and a read-write lock (rw_lock_struct). Both of these spin when a lock cannot be granted. On my platforms, the code spins for about 4 microseconds and then the thread waits on a condition variable. rw_lock_struct uses mutex_t to protect its internal state. I think that InnoDB is faster on SMP when pthread_mutex_t is used in place of mutex_t for rw_lock_struct::mutex.
The following describes the overhead from the use of the InnoDB mutex when there is contention. A thread that must sleep waiting for a lock does:
- spin for a few microseconds trying to get the lock
- reserve a slot in the sync array (one lock/unlock of the global sync array pthread_mutex_t)
- reset an event (lock/unlock the event pthread_mutex_t)
- wait on the event (lock/unlock the global sync array pthread_mutex_t, lock the event pthread_mutex_t, wait on a pthread_cond_t)
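To make the cost of the steps above concrete, here is a rough sketch that counts the pthread mutex acquisitions on the sleep path. It is not InnoDB code: `demo_event_t`, `demo_cost_t`, `demo_sync_array_mutex` and the `demo_*` functions are hypothetical names, the sync array slot bookkeeping is left out, and the initial spin is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical event: a flag guarded by a mutex, plus a condition variable. */
typedef struct demo_event {
    pthread_mutex_t mutex;  /* the event pthread_mutex_t */
    pthread_cond_t  cond;   /* the pthread_cond_t the waiter sleeps on */
    int             is_set;
} demo_event_t;

/* Per-thread tally of mutex acquisitions made on the sleep path. */
typedef struct demo_cost {
    int global_locks;  /* acquisitions of the global sync array mutex */
    int event_locks;   /* acquisitions of the event mutex */
} demo_cost_t;

/* Stand-in for the global sync array pthread_mutex_t. */
pthread_mutex_t demo_sync_array_mutex = PTHREAD_MUTEX_INITIALIZER;

int demo_event_init(demo_event_t *ev) {
    int rc = pthread_mutex_init(&ev->mutex, NULL);
    ev->is_set = 0;
    return rc != 0 ? rc : pthread_cond_init(&ev->cond, NULL);
}

/* Reserve a slot in the sync array (slot bookkeeping omitted). */
void demo_reserve_slot(demo_cost_t *cost) {
    pthread_mutex_lock(&demo_sync_array_mutex);
    cost->global_locks++;
    pthread_mutex_unlock(&demo_sync_array_mutex);
}

/* Reset the event before waiting on it. */
void demo_event_reset(demo_event_t *ev, demo_cost_t *cost) {
    pthread_mutex_lock(&ev->mutex);
    cost->event_locks++;
    ev->is_set = 0;
    pthread_mutex_unlock(&ev->mutex);
}

/* Called by the thread that releases the lock, to wake any waiters. */
void demo_event_set(demo_event_t *ev) {
    pthread_mutex_lock(&ev->mutex);
    ev->is_set = 1;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->mutex);
}

/* Wait on the event: one more trip through the global mutex,
   then sleep on the condition variable until the event is set. */
void demo_event_wait(demo_event_t *ev, demo_cost_t *cost) {
    pthread_mutex_lock(&demo_sync_array_mutex);
    cost->global_locks++;
    pthread_mutex_unlock(&demo_sync_array_mutex);

    pthread_mutex_lock(&ev->mutex);
    cost->event_locks++;
    while (!ev->is_set) {
        pthread_cond_wait(&ev->cond, &ev->mutex);
    }
    assert(ev->is_set);
    pthread_mutex_unlock(&ev->mutex);
}
```

Calling `demo_reserve_slot`, `demo_event_reset` and `demo_event_wait` in that order leaves both counters at 2: the waiter makes four mutex acquisitions, two of them on a global mutex, before it finally sleeps on the condition variable. In real use another thread calls `demo_event_set` when it releases the lock. A waiter must also re-check the lock after the reset so that it does not miss a wakeup, which this sketch omits.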
Of course, you shouldn't take my word for it, so I will provide a few results. These were measured on an 8-core x86 server running Linux 2.6. Three mysqld binaries were tested:
- base - MySQL 5.0.37 and the Google patch excluding the smpfix changes
- smpfix+tcmalloc - MySQL 5.0.37 and the Google patch including the smpfix changes and linked with tcmalloc
- pthread_mutex - base with rw_lock_struct::mutex changed to use pthread_mutex_t
Results for sysbench --test=oltp --oltp-read-write. The chart displays transactions per second for sysbench run with 1, 2, 4, 8, 16, 32 and 64 concurrent connections.
Results for concurrent queries. Each query is a primary key - foreign key join between tables that each have 2M rows. "Too long" means it ran for tens of minutes and I killed it. The table displays the time in seconds to complete the query for 1, 2, 4, 8 and 16 concurrent users.
| Binary | 1 user | 2 users | 4 users | 8 users | 16 users |
| base | 2.6 | 3.9 | 8.1 | 182.5 | Too long |
| smpfix+tcmalloc | 2.6 | 3.7 | 4.9 | 7.6 | 15.2 |
| pthread_mutex | 2.5 | 3.7 | 9.1 | 27.8 | 58.6 |
Results for concurrent inserts. Each user does a sequence of insert statements to a different table. "Too long" means it ran for tens of minutes and I killed it. The table displays the time in seconds to complete the inserts for 1, 2, 4, 8 and 16 concurrent users.
| Binary | 1 user | 2 users | 4 users | 8 users | 16 users |
| base | 15.5 | 32.4 | 78.2 | Too long | Too long |
| smpfix+tcmalloc | 12.6 | 21.5 | 40.5 | 112.4 | 232.9 |
| pthread_mutex | 13.5 | 23.8 | 76.0 | 378.7 | Too long |


Interesting... just wondering: how is "throughput" measured? Please add units to the x-axis.
Roland
Throughput for sysbench is transactions per second. The concurrent join and insert tests report the number of seconds to complete the SQL statements.
The graphics show reduced throughput with pthread_mutex_t. If the numbers in the tables are seconds, then they also show that using pthread_mutex_t makes it slower. Am I reading the data incorrectly? If not, what is the point?
The graphics show that sysbench throughput for pthread_mutex_t almost matches that for smpfix+tcmalloc while performance for the base case drops dramatically at 8 concurrent users.
Mark,
Is there any reason that smpfix+tcmalloc+pthread_mutex_t were not tested together?
smpfix and pthread_mutex_t are mutually exclusive. pthread_mutex_t is an attempt to get some of the benefit provided by smpfix with a much simpler code change.
"Mark Callaghan showed us how to make MySQL faster in one hour. Nice stuff. And real purty charts, too."
Log Buffer #128
great post
How to make MySQL much faster in 3 seconds: replace InnoDB with MyISAM :))