I tested 5 configurations:
- inno.5150fb.b0.s0 - MySQL 5.1.50, the Facebook patch, binlog disabled
- inno.5150fb.b1.s0 - MySQL 5.1.50, the Facebook patch, binlog enabled, sync_binlog=0
- inno.5150fb.b1.s1 - MySQL 5.1.50, the Facebook patch, binlog enabled, sync_binlog=1
- mongo.170.safe - MongoDB 1.7.0 and safe updates
- mongo.170.unsafe - MongoDB 1.7.0 and unsafe updates
Note that with unsafe updates the client does not wait for the server to respond. It sends requests as fast as it can until the buffer between the client and server is full. When enough concurrent clients run for long enough, the buffer fills and unsafe updates no longer improve performance.
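A toy discrete-time model illustrates why unsafe updates stop helping once the buffer fills. It assumes a fixed-size buffer and a server that drains a fixed number of requests per tick. Both are hypothetical simplifications, not how pymongo or mongod actually work. Once the buffer is full, the client's send rate converges to the server's service rate.

```python
from collections import deque


def simulate_unsafe(ticks, buffer_size, client_rate, server_rate):
    """Toy model of one client sending unsafe (unacknowledged) updates.

    Each tick the client sends up to client_rate requests without waiting for
    replies, but it stalls once the client-to-server buffer holds buffer_size
    requests. The server then handles up to server_rate queued requests.
    Returns (requests sent, requests served).
    """
    buffer = deque()
    sent = served = 0
    for _ in range(ticks):
        for _ in range(client_rate):
            if len(buffer) >= buffer_size:
                break  # buffer full: the fire-and-forget client must wait
            buffer.append(sent)
            sent += 1
        for _ in range(min(server_rate, len(buffer))):
            buffer.popleft()
            served += 1
    return sent, served


sent, served = simulate_unsafe(ticks=1000, buffer_size=50, client_rate=10, server_rate=3)
print(sent, served)  # 3047 3000
```

The extra 47 requests are the ones still sitting in the buffer. After the first few ticks, the client sends 3 requests per tick, which is exactly what the server drains, no matter how fast the client could send.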
InnoDB has much better throughput when the binlog is disabled. The clients update rows selected at random from 2M rows. InnoDB can run more of that load concurrently than MongoDB because MongoDB uses a reader-writer lock that prevents concurrency within the database. InnoDB is not perfect either: MySQL/InnoDB has several global mutexes, including LOCK_open, kernel_mutex and the InnoDB buffer pool mutex. When the binlog is enabled but sync_binlog is disabled, even more global mutexes are used. Finally, when the binlog is enabled and sync_binlog=1, group commit is not enabled and updates are rate limited by the performance of fsync. In this case fsync is fast because a HW RAID card with a battery-backed cache was used.
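When commits are serialized and each one waits for its own fsync, fsync latency puts a hard ceiling on update throughput. Group commit raises that ceiling by letting several commits share one fsync. The following back-of-the-envelope sketch ignores CPU, lock, and network costs. The latencies in it are hypothetical examples, not numbers from this benchmark.

```python
def max_commits_per_sec(fsync_latency_us, commits_per_fsync=1):
    """Upper bound on commits/second when commits wait on fsync.

    Without group commit each commit pays for its own fsync
    (commits_per_fsync=1). With group commit, several commits can share one
    fsync. CPU, lock and network costs are ignored.
    """
    return commits_per_fsync * 1000000 // fsync_latency_us


# Hypothetical latencies, not measurements from this benchmark:
print(max_commits_per_sec(8000))     # 125: ~8 ms fsync on a disk without a write cache
print(max_commits_per_sec(200))      # 5000: ~200 us fsync into a battery-backed cache
print(max_commits_per_sec(200, 10))  # 50000: same device, 10 commits per fsync
```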
This is output from PMP (the Poor Man's Profiler) that shows one source of mutex contention in mongod. The pileup occurs at mongo::MongoMutex, which apparently implements the reader-writer lock.
```
48 mongo::connThread,thread_proxy,start_thread,clone
37 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait,mongo::MongoMutex::lock,mongo::receivedUpdate,mongo::assembleResponse
10 recv,mongo::MessagingPort::recv,mongo::MessagingPort::recv
2
1 select,mongo::Listener::initAndListen,mongo::listen,mongo::_initAndListen,mongo::initAndListen,main,select
1 select,mongo::Listener::initAndListen
1 pthread_cond_wait@@GLIBC_2.3.2,mongo::FileAllocator::Runner::operator(),thread_proxy,start_thread,clone
1 nanosleep,mongo::DataFileSync::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone
1 nanosleep,mongo::ClientCursorMonitor::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone
1 nanosleep
1 mongo::webServerThread,thread_proxy,start_thread,clone
1 mongo::SnapshotThread::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone
1 mongo::interruptThread,thread_proxy,start_thread,clone
1 mongo::BtreeBucket::findSingle,mongo::ModSetState::createNewFromMods,mongo::_updateObjects,mongo::updateObjects,mongo::receivedUpdate,mongo::assembleResponse
```
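In the stacks above, 37 threads in mongo::receivedUpdate are waiting in pthread_cond_wait under mongo::MongoMutex::lock, while only one thread is doing update work in mongo::BtreeBucket::findSingle. The sketch below is a minimal, generic reader-preferring reader-writer lock built on a condition variable. It shows why updates pile up: a writer can proceed only when no reader and no other writer holds the lock. It assumes a design similar to what the stack suggests and is not MongoDB's actual MongoMutex code.

```python
import threading


class ReaderWriterLock:
    """Reader-preferring reader-writer lock built on one condition variable.

    Many readers may hold the lock together; a writer needs it exclusively.
    Illustrative only: not MongoDB's MongoMutex.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:  # readers only wait for an active writer
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()  # blocked updaters pile up here
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


def run_updates(lock, counts, n_threads, per_thread):
    """Every update takes the write lock, so updates run one at a time."""
    def worker():
        for _ in range(per_thread):
            lock.acquire_write()
            try:
                counts["k"] += 1
            finally:
                lock.release_write()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["k"]
```

Adding client threads to `run_updates` does not add update concurrency here. Extra threads just wait in `acquire_write`, much like the 37 threads waiting in the PMP output.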
Source code for the MongoDB queries is listed below. The code to set up MongoDB and MySQL is described in the previous posts.
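```python
import random
import signal
import time

import pymongo  # Python 2 code using the old pymongo API (Connection, safe=True)

# SummaryStats, sigterm_handler and the global got_sigterm are defined
# elsewhere in the benchmark script. pipe_to_parent, check and testname
# are not used in the excerpt shown here.


def query_mongo(host, port, pipe_to_parent, requests_per, dbname, rows,
                check, testname, worst_n, id):
    conn = pymongo.Connection(host, port)
    db = conn[dbname]
    signal.signal(signal.SIGTERM, sigterm_handler)
    gets = 0
    stats = SummaryStats(worst_n)
    while True:
        for loop in xrange(0, requests_per):
            # Increment the counter in a randomly chosen existing document.
            target = random.randrange(0, rows)
            s = time.time()
            try:
                # safe=True: wait for the server to acknowledge the update.
                r = db.c.update({'_id': target}, {'$inc': {'k': 1}}, safe=True)
                assert r['updatedExisting']
                assert r['ok'] == 1
                stats.update(s)
                gets += 1
            except:
                # Failures are only expected once SIGTERM has been received.
                assert got_sigterm
```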
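The excerpt calls `SummaryStats(worst_n)` and `stats.update(s)`, but the class is not shown. The version below is a hypothetical minimal implementation consistent with those two calls. It records each request's latency (the time elapsed since `s`) and keeps the `worst_n` slowest requests in a bounded min-heap. The optional `now` argument and the `mean`/`worst` methods are assumptions added for illustration and are not part of the original helper.

```python
import heapq
import time


class SummaryStats:
    """Hypothetical stand-in for the benchmark's latency tracker."""

    def __init__(self, worst_n):
        self.worst_n = worst_n
        self.count = 0
        self.total = 0.0
        self._worst = []  # min-heap holding the worst_n largest latencies

    def update(self, start, now=None):
        # Latency of one request that started at time `start`.
        if now is None:
            now = time.time()
        elapsed = now - start
        self.count += 1
        self.total += elapsed
        if len(self._worst) < self.worst_n:
            heapq.heappush(self._worst, elapsed)
        elif self.worst_n > 0 and elapsed > self._worst[0]:
            # Evict the smallest of the current worst latencies.
            heapq.heapreplace(self._worst, elapsed)

    def mean(self):
        return self.total / self.count if self.count else 0.0

    def worst(self):
        return sorted(self._worst, reverse=True)
```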
The my.cnf settings for MySQL, other than log_bin and sync_binlog (which vary by configuration):
```
innodb_buffer_pool_size=2000M
innodb_log_file_size=100M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table
innodb_deadlock_detect=0
max_connections=2000
table_cache=2000
key_buffer_size=2000M
innodb_doublewrite=0
```
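The list sets innodb_doublewrite twice. When MySQL reads the same option more than once, the last value takes precedence, so the effective setting here is innodb_doublewrite=0. The helper below is a hypothetical checker for flat option-file fragments like this one. It flags repeated options and reports the value that wins. It only strips `#` comments and skips `[section]` headers. It does not handle `!include`, quoting, or the rest of the option-file syntax.

```python
def effective_options(cnf_text):
    """Return (effective, duplicates) for a flat my.cnf fragment.

    Later occurrences override earlier ones, as MySQL does for repeated
    options. Dashes and underscores in option names are treated as equal.
    Bare flags such as innodb_file_per_table map to None.
    """
    effective = {}
    seen = {}
    for raw in cnf_text.splitlines():
        line = raw.split('#', 1)[0].strip()
        if not line or line.startswith('['):
            continue
        key, sep, value = line.partition('=')
        key = key.strip().replace('-', '_')
        value = value.strip() if sep else None
        seen.setdefault(key, []).append(value)
        effective[key] = value
    duplicates = {k: v for k, v in seen.items() if len(v) > 1}
    return effective, duplicates


if __name__ == "__main__":
    cnf = "innodb_doublewrite=1\ninnodb_file_per_table\ninnodb_doublewrite=0\n"
    eff, dups = effective_options(cnf)
    print(eff["innodb_doublewrite"], dups)  # 0 {'innodb_doublewrite': ['1', '0']}
```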



web scale!
Thanks for the great benchmarks.
Now that you've done select and update, would be great to complete the trifecta and do an insert benchmark.
In your my.cnf you have both:
innodb_doublewrite=1
and
innodb_doublewrite=0
Can you explain?
Normally I test with innodb_doublewrite=1 as that is what most of us use in production. As this is a silly benchmark and I was already testing various levels of durability, I disabled the doublewrite buffer. It probably didn't make a huge difference for this test.
Eventually I will run the insert benchmark for mongodb and mysqld. See http://mysqlha.blogspot.com/2008/12/innodb-insert-performance.html for more details on that.
One day I might even test something that resembles a real application. But I will get there one step at a time.
Your results show an almost 90% drop in MySQL/facebook performance when binlog & sync_binlog are turned on.
In another post (http://www.facebook.com/note.php?note_id=438641125932) you mentioned that facebook patch has a solution to fix the group commit bug.
Is that group commit bug fix included in this benchmark?
I'm trying to figure out if the 90% drop in performance observed in this benchmark is due to the group commit bug, or if the group commit bug has already been "fixed" but even after the fix there's still a 90% drop in performance.
The new behavior (group commit) was off for these tests. Comparing InnoDB and MongoDB is apples and oranges -- InnoDB is crash safe.
I wasn't comparing InnoDB & MongoDB. I was comparing InnoDB with binlog disabled and InnoDB with sync_binlog=1.
From your graph, InnoDB with binlog disabled peaked at just under 50K updates/sec. With sync_binlog=1 the performance dropped to about 5K updates/sec. That's an almost 90% drop.
What kind of results would you get using the group commit fix of the facebook patch? Would the performance stay at almost 50K updates/sec even with sync_binlog=1? Or would there still be some performance drop but it's less than 90%?