Please read the update as I was able to get much better throughput from MongoDB once I realized that the client host was saturated on the CPU.
I want to evaluate the performance claims. Several of the MongoDB versus the world benchmarks were run on Mac laptops. I love my Mac laptop, but I don't use it to confirm database server throughput.
My first test uses concurrent clients to determine the load at which the server saturates. I run this test in two modes: cached and not cached. No disk IO should be done during the cached test. Most requests should require disk IO during the uncached test. I am not sure that I will run the uncached test for MongoDB as it memory maps the database file and there is no way to limit the amount of memory it will use so the MongoDB uncached test requires a database much larger than RAM. That will take a long time to setup and I don't know whether many MongoDB deployments run with databases much larger than RAM.
The test is interesting to me because it has found problems that limit throughput on workloads that I care about. Other tests should be done after understand the results from this test. I suspect that for many people peak QPS on highly-concurrent workloads is not a priority.
Each client does a sequence of calls (SELECT or HANDLER for MySQL, find_one for MongoDB) where each calls fetches one row by specifying the value of an indexed column. The column has a primary key index for MySQL and used the Collections create_index method for MongoDB. The id value to fetch is randomly selected.
The code for the benchmark is a work in progress, so I have yet to publish it. I used Python, the MySQLdb driver for MySQL and the pymongo driver for MongoDB. The clients are separate processes forked via the Python multiprocessing module to avoid the penalty from the Python GIL. Some of the code used for MongoDB is listed at the end of this post.
I only report peak QPS at this time. The results ignore average and worst-case response time. It is not good to ignore them which is another reason why this might be a silly benchmark. Eventually the code will be updated to measure that.
The clients and DBMS (mysqld, mongod) ran on separate 16-core servers. The test was run for 1, 2, 4, 8, 16, 24, ..., 152, 160 concurrent clients. Tests were run for MyISAM and InnoDB for MySQL. The tests used MySQL 5.1.50 unmodified (5150orig) and 5.1.50 with the Facebook patch (5150fb). There are a few changes in the Facebook patch that make a huge difference for peak QPS. I expect official MySQL to have these changes soon. Peak QPS was much higher for MySQL than for MongoDB. In order of peak QPS:
- Peak QPS on MySQL 5.1.50 with the Facebook patch exceeds 100,000 and QPS increases to 128 concurrent clients.
- Peak QPS on MySQL 5.1.50 unmodified is between 70,000 and 80,000 and QPS increases to 80 concurrent clients.
- Peak QPS on MongoDB 1.7.0 is 40,000 and QPS increases to 64 concurrent clients.
MongoDB appears to saturate on mutex contention on the server at 64 concurrent clients. Alas, the binary is stripped and I cannot run PMP to determine where the problem occurs. My diagnosis is based on vmstat output.
This displays throughput MySQL 5.1.50 with the Facebook patch, MySQL 5.1.50 unmodified and MongoDB 1.7.0. All ran on a 16-core Nehalem server -- 16 cores with hyperthreading enabled.
This displays average and 98th percentile response time for MySQL 5.1.50 with the Facebook patch and MongoDB 1.7.0. Both ran on a 16-core Nehalem server -- 16 cores with hyperthreading enabled. Results are from the same test as used for the previous graph.
I repeated the tests using an 8-core Nehalem server. It is the same as the previous server except that hyperthreading was disabled. The graphs below display throughput. MySQL is able to handle more concurrent clients before saturating on the server.
The servers for the tests above were 1ms apart according to ping. I repeated the test for servers that have a 200us ping time but I don't want to inline another graph. It is linked in the next sentence. The QPS for get by primary key with an interesting and unexplained spike in QPS for MongoDB near 152 concurrent clients. Despite having a similar QPS at that point, MongoDB average response time was about 1.5X that for MySQL. This includes results for MongoDB versions 1.6.2 and 1.7.0.
The code to setup the collection for MongoDB is:
def setup_db(host, port, dbname, rows):
conn = pymongo.Connection(host, port)
conn.drop_database(dbname)
db = conn[dbname]
for x in xrange(0, rows):
sx = str(x)
lsx = len(sx)
db.c.save({'_id':x, 'k':x, 'c':sx+'x'*(120 - lsx), 'pad':sx+'y'*(120 - lsx)})
if x % 1000 == 0:
print '... row %d' % xAnd the code to query MongoDB:
def query_process(host, port, pipe_to_parent, requests_per, dbname, rows, check, id):
conn = pymongo.Connection(host, port)
db = conn[dbname]
gets = 0
while True:
for loop in xrange(0, requests_per):
target = random.randrange(0, rows)
o = db.c.find_one({'_id': target})
assert o['_id'] == target
if check:
assert o['k'] == target
sx = str(o['id'])
lsx = len(sx)
assert o['c'] == sx+'x'*(120-lsx)
assert o['pad'] == sx+'y'*(120-lsx)
gets += 1
if pipe_to_parent.poll():
msg = pipe_to_parent.recv()
print 'Received: %s' % msg
pipe_to_parent.send({'gets' : gets})
pipe_to_parent.close()
returnThe my.cnf settings for MySQL 5.1:
plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so
innodb_buffer_pool_size=2000M
innodb_log_file_size=100M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
innodb_file_format=barracuda
innodb_file_per_table
max_connections=2000
table_cache=2000
key_buffer_size=2000M






Nice to see some appropriately tested methods instead of macbook-centric tests. Looking forward to seeing your benchmarking code when it's completed.
ReplyDeleteThis might be a silly benchmark, but of all the benchmarks of MongoDB I've seen, it's the least silly.
ReplyDeleteYes, I do not find it especially silly: at least you are comparing apples to apples here. A comparison based on write operations would not be so meaningful IMO: MongoDB uses coarse read-write locking (i.e. only one client can write at a given point in time), but does not acknowledge the operation (no OP_REPLY packet in the protocol for update/insert/delete).
ReplyDeleteHowever, MongoDB, as a document oriented database, would probably shine in read benchmarks involving several tables with 1-n relationships (rather than a simple key/values retrieval). With MySQL (or any other RDBMS) it would involve several queries (or query with joins), while with MongoDB it still costs only a single lookup (provided all the data are stored in a single object).
Mark, interesting results!
ReplyDeletewe/re to doing benchmarks on a Mac: I've seen other cases were the Mac results were very much off as compared to other platforms, for instance, look at this post by Mark Atwood on MySQL UDFs:
http://fallenpegasus.livejournal.com/707050.html
kind regards,
Roland Bouman
If you're querying on the primary key in mysql - you should use "_id" instead of "id" for MongoDB to make it fair.
ReplyDeleteYes, I think that MongoDB would perform much better were I to test document processing, especially something that was harder to model in SQL. I might get there but I have a few more basic tests to run.
ReplyDeleteRoland - I remember a similar result for one of the PBXT perf tests. Results on OSX were very different than on Linux.
ReplyDeleteSammy - yes, that would be better. I just read more of the docs to confirm how to do that and I will repeat my tests.
ReplyDeleteThe results have been updated to use _id for MongoDB. The results are better with that change. I will report on fetch by secondary key in another post.
ReplyDeleteAre you doing a "select *" in mysql or only pulling back some columns?
ReplyDeleteI've gotten much better numbers on some benchmarks - so curious why the difference.
These use "select *"
ReplyDeleteYou may want to consider trying mongo with java or c.
ReplyDeleteI just copied yours in java and got 120k/sec on a core i7.
I've heard before the python mongo driver is a bit of a dog, not sure why
In what context do you get that much? Client and server on same host? On different hosts? What is the network latency between them?
ReplyDeleteI know that more efficient client code can drive MySQL faster and I imagine the same is true for Mongo.
The peak QPS I get from Python clients to MySQL is almost 120,000 using MySQL 5.1 and the Facebook patch. Using sysbench for the clients rather than Python the peak is about 180,000.
Another significant component to peak QPS is network latency between client and server. Ping round-trip time is about 1ms in my setup. But both Mongo and MySQL suffer the same from that. I need to co-locate my client and server in the same rack to get something better.
Alas, I will repeat a few tests between two hosts that have less network latency.
Different hosts with .1 ms ping time
ReplyDeleteMy results are with mongodb 1.7.0. I will try the stable release next -- 1.6.2.
ReplyDeleteI just found client-server hosts that have the same 0.1ms ping time and will repeat tests there. A quick test shows that peak QPS is not changed much but the peak is reached with fewer clients and the server suffers severe mutex contention (based on vmstat) at saturation.
I asked my contact at 10gen to provide a non-stripped mongod binary so I can provide more details on the source of contention.
Updated with results from hosts that have a 200us ping time. The peak QPS results are similar but the servers saturate sooner.
ReplyDeleteIn summary from this post is it fair to say that-INNODB is faster, handles higher concurrency and saves your data. FB-INNODB is even better. Sounds like innodb is winner for my book.
ReplyDeleteRange queries / joins would be interesting to compare which @Didier Spezia indicated.
Dathan - I think that is fair if your primary concern is high-throughput OLTP, although this benchmark is trivial and I have yet to publish anything for a write workload. But MongoDB has many things going for it including flexible schemas, better support for document-oriented processing and what appears to be a more manageable implementation of replication.
ReplyDeleteI am not anti-mongodb, although I don't enjoy the "MongoDB is fast" mantra when that implies that MongoDB is universally faster than MySQL.
Updated to include results for 1.6.2 and 1.7.0 -- see
ReplyDeletehttps://spreadsheets.google.com/oimg?key=0AteR_jot1VDGdDVYWG4yaG51bW1lamZObFlzMHVCUkE&oid=19&zx=dl0e7oobpaky
Always great input with benchmarks, but what I miss is benchmarks of multi-server set ups. Especially since this seems to be the focus of mongodb with auto sharding and all. Iirc it's advertised that a single server set up with mongodb isn't even preferable.
ReplyDeleteBut more generally speaking...
As some comments above has indicated, both mysql and mongodb isn't designed to solve all problems; definitively not the same ones.
My two cents to anyone in the decision/advise chair: get to know the problem you want to solve before you even think of the word benchmarking. Based on that, design the benchmarks for the different db softwares you have in mind.
Just for the love of god don't make the benchmark code identical just to make them «fair». A lot of the alternatives out there is designed differently, thus should be used differently.
Those are all excellent points. I will add a few more:
ReplyDelete* hire or rent experts during the evaluation -- if you don't have MySQL or MongoDB expertise, then bring in consultants who do
* hire someone with expertise in physical db design if you have problems with performance, start by reading my post on "index only"
* hire someone with expertise on system performance
I suspect that absolute performance is overrated in terms of priorities for most deployments. Predictable performance and cost of operating are probably more important.
With respect to auto-sharding and multi-server setups, I have not read of many large MongoDB deployments that require much from that yet. I suspect that MongoDB deployments will have to be sharded earlier than MySQL deployments because of their reliance on mmap but that remains to be measured.
But for some of us it is important to know the point at which a server falls over as we encounter these problems in production and must prepare for them.
The tests here are necessary for me (maybe not for you) but far from sufficient for testing a DBMS.
from what I have heard python mongo driver is still slow
ReplyDeleteThe MongoDB python driver, pymongo, uses more CPU than the MySQL python driver, MySQLdb. That skews the results from this benchmark. But I never claimed the benchmark was perfect. I much prefer writing this in Python than in Java so I will just have to live with the results. As soon as I add tests that do more complex processing on the server then the impact from the Python overhead wil be less important.
ReplyDeleteCorrection! The extra CPU overhead from pymongo limited throughput. Details at http://mysqlha.blogspot.com/2010/09/oh-no-mongodb-can-be-fast-for-key-value.html. After correcting for this I was able to get more than 155,000 QPS from MongoDB for fetch by primary key. Wow.
ReplyDeletehey mark, since your testing innodb why do you have the key_buffer_size set to 2GB. Is the test producing temp tables?
ReplyDeleteNo, but sometimes I test MyISAM
ReplyDeleteHow can I turn off MongoDB cache?
ReplyDeleteYou set doublewrite to 1 : MySQL is doing twice operations instead of one. This is not fair :O
ReplyDeletePlease, retry with some of these parameters :
# Binary logging
# ------------------------------------------------------------------------------
#log_bin = /usr/local/mysql/data/mysql-bin
#server-id = SkeetMeet-1
#binlog_format = ROW
#expire_logs_days = 14
#sync_binlog = 1
# Core
# ------------------------------------------------------------------------------
#sort_buffer_size = 64M
tmp_table_size = 1000M
max_heap_table_size = 1000M
query_cache_type = 1
query_cache_size = 50M
max_connections = 1000
back_log = 10000
thread_cache_size = 50
open_files_limit = 65535
table_definition_cache = 1024
table_open_cache = 2048
wait_timeout = 30
connect_timeout = 5
max_allowed_packet = 16M
max_connect_errors = 1000000
transaction-isolation = READ-COMMITTED
# MyISAM
# ------------------------------------------------------------------------------
key_buffer_size = 32M
myisam_recover = FORCE,BACKUP
# InnoDB (http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html)
# ------------------------------------------------------------------------------
innodb_strict_mode = 1
innodb_file_format = Barracuda
innodb_compression_level = 6
innodb_flush_method = O_DIRECT
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_files_in_group = 2
innodb_log_file_size = 256M
innodb_io_capacity = 250
innodb_read_io_threads = 2
innodb_write_io_threads = 2
innodb_doublewrite = 0
innodb_read_ahead = 0
innodb_max_dirty_pages_pct = 80
innodb_lock_wait_timeout = 30
#innodb_ibuf_max_pct_of_buffer = 10
#innodb_ibuf_flush_pct = 40
innodb_support_xa = 0 # Slave:0, Master:1
innodb_flush_log_at_trx_commit = 2 # Slave:2, Master:1
innodb_file_per_table = 1
innodb_buffer_pool_size = 4G
innodb_buffer_pool_instances = 1
innodb_thread_concurrency = 0 # 0-1000, 0:default (no limit)
innodb_autoinc_lock_mode = 2 # 0, 1:default, 2:parralelize requests, but breaks statement-based replication, because order is non-deterministic (binary replication is ok)
innodb_sync_spin_loops = 200
innodb_spin_wait_delay = 0
I ran across this blog post while searching for something else, and I think you are perhaps guilty of selectively choosing indicators that benefit MySQL versus Mongo.
ReplyDeleteIt may well be the case that MySQL can handle a greater number of concurrent users (I don't know, and don't care to try to replicate your tests), but Mongo is the faster database by far - particularly with large data sets.
I have done quite a bit of development with Mongo, and I can testify that for applications accessing large data sets Mongo runs approximately 1000x faster once you get millions of rows of data or you have to join large tables.
Your graphs make it appear that MySQL is a faster database. That's just not correct. But unfortunately, people are going to confuse concurrent users tests with speed.
Just to put a fine point on this. I have a procedure I was running in MySQL that took 20 minutes to complete. That was too long.
ReplyDeleteI re-wrote it to use Mongo, and now it completes in 2 seconds.
It appears to me that you are a mysql partisan who is choosing a highly-specific indicator that in a very limited scope shows MySQL to be a faster database. Whereas in the vast majority of real-world application scenarios, it is Mongo that is the faster database.
I have done a bit of support for MySQL and I can testify that letting my local db-perf team do their magic makes things bazillion times faster. Maybe it is faster for you, or maybe you are better at Mongo than MySQL.
ReplyDeletehttp://dom.as/2012/06/26/memsql-rage/