Saturday, September 11, 2010

MySQL versus MongoDB - yet another silly benchmark

This is yet another silly benchmark because the results are likely to be misused. The results probably do not matter to you. I like MongoDB. It has many things in common with MySQL (ease of use, rapid iteration, give customers what they want). It would be more interesting to me were it to use embedded InnoDB as the backend.

Please read the update as I was able to get much better throughput from MongoDB once I realized that the client host was saturated on the CPU.

I want to evaluate the performance claims. Several of the MongoDB versus the world benchmarks were run on Mac laptops. I love my Mac laptop, but I don't use it to confirm database server throughput.

My first test uses concurrent clients to determine the load at which the server saturates. I run this test in two modes: cached and not cached. No disk IO should be done during the cached test. Most requests should require disk IO during the uncached test. I am not sure that I will run the uncached test for MongoDB as it memory maps the database file and there is no way to limit the amount of memory it will use so the MongoDB uncached test requires a database much larger than RAM. That will take a long time to setup and I don't know whether many MongoDB deployments run with databases much larger than RAM.

The test is interesting to me because it has found problems that limit throughput on workloads that I care about. Other tests should be done after understand the results from this test. I suspect that for many people peak QPS on highly-concurrent workloads is not a priority.

Each client does a sequence of calls (SELECT or HANDLER for MySQL, find_one for MongoDB) where each calls fetches one row by specifying the value of an indexed column. The column has a primary key index for MySQL and used the Collections create_index method for MongoDB. The id value to fetch is randomly selected.

The code for the benchmark is a work in progress, so I have yet to publish it. I used Python, the MySQLdb driver for MySQL and the pymongo driver for MongoDB. The clients are separate processes forked via the Python multiprocessing module to avoid the penalty from the Python GIL. Some of the code used for MongoDB is listed at the end of this post.

I only report peak QPS at this time. The results ignore average and worst-case response time. It is not good to ignore them which is another reason why this might be a silly benchmark. Eventually the code will be updated to measure that.

The clients and DBMS (mysqld, mongod) ran on separate 16-core servers. The test was run for 1, 2, 4, 8, 16, 24, ..., 152, 160 concurrent clients. Tests were run for MyISAM and InnoDB for MySQL. The tests used MySQL 5.1.50 unmodified (5150orig) and 5.1.50 with the Facebook patch (5150fb). There are a few changes in the Facebook patch that make a huge difference for peak QPS. I expect official MySQL to have these changes soon. Peak QPS was much higher for MySQL than for MongoDB. In order of peak QPS:
  1. Peak QPS on MySQL 5.1.50 with the Facebook patch exceeds 100,000 and QPS increases to 128 concurrent clients.
  2. Peak QPS on MySQL 5.1.50 unmodified is between 70,000 and 80,000 and QPS increases to 80 concurrent clients.
  3. Peak QPS on MongoDB 1.7.0 is 40,000 and QPS increases to 64 concurrent clients.
MongoDB appears to saturate on mutex contention on the server at 64 concurrent clients. Alas, the binary is stripped and I cannot run PMP to determine where the problem occurs. My diagnosis is based on vmstat output.

This displays throughput MySQL 5.1.50 with the Facebook patch, MySQL 5.1.50 unmodified and MongoDB 1.7.0. All ran on a 16-core Nehalem server -- 16 cores with hyperthreading enabled.

This displays average and 98th percentile response time for MySQL 5.1.50 with the Facebook patch and MongoDB 1.7.0. Both ran on a 16-core Nehalem server -- 16 cores with hyperthreading enabled. Results are from the same test as used for the previous graph.
I repeated the tests using an 8-core Nehalem server. It is the same as the previous server except that hyperthreading was disabled. The graphs below display throughput. MySQL is able to handle more concurrent clients before saturating on the server.

This is the response time graph for the 8-core server.
The servers for the tests above were 1ms apart according to ping. I repeated the test for servers that have a 200us ping time but I don't want to inline another graph. It is linked in the next sentence. The QPS for get by primary key with an interesting and unexplained spike in QPS for MongoDB near 152 concurrent clients. Despite having a similar QPS at that point, MongoDB average response time was about 1.5X that for MySQL. This includes results for MongoDB versions 1.6.2 and 1.7.0.

The code to setup the collection for MongoDB is:
def setup_db(host, port, dbname, rows):
  conn = pymongo.Connection(host, port)
  conn.drop_database(dbname)
  db = conn[dbname]  
  for x in xrange(0, rows):
    sx = str(x)
    lsx = len(sx)
    db.c.save({'_id':x, 'k':x, 'c':sx+'x'*(120 - lsx), 'pad':sx+'y'*(120 - lsx)})
    if x % 1000 == 0:
      print '... row %d' % x
And the code to query MongoDB:
def query_process(host, port, pipe_to_parent, requests_per, dbname, rows, check, id):
  conn = pymongo.Connection(host, port)
  db = conn[dbname]
  gets = 0
  while True:
    for loop in xrange(0, requests_per):
      target = random.randrange(0, rows)
      o = db.c.find_one({'_id': target})
      assert o['_id'] == target
      if check:
        assert o['k'] == target
        sx = str(o['id'])
        lsx = len(sx)
        assert o['c'] == sx+'x'*(120-lsx)
        assert o['pad'] == sx+'y'*(120-lsx)

      gets += 1

    if pipe_to_parent.poll():
      msg = pipe_to_parent.recv()
      print 'Received: %s' % msg
      pipe_to_parent.send({'gets' : gets})
      pipe_to_parent.close()
      return
The my.cnf settings for MySQL 5.1:
plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so
innodb_buffer_pool_size=2000M
innodb_log_file_size=100M
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=1
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0

innodb_file_format=barracuda
innodb_file_per_table

max_connections=2000
table_cache=2000
key_buffer_size=2000M

31 comments:

  1. Nice to see some appropriately tested methods instead of macbook-centric tests. Looking forward to seeing your benchmarking code when it's completed.

    ReplyDelete
  2. This might be a silly benchmark, but of all the benchmarks of MongoDB I've seen, it's the least silly.

    ReplyDelete
  3. Yes, I do not find it especially silly: at least you are comparing apples to apples here. A comparison based on write operations would not be so meaningful IMO: MongoDB uses coarse read-write locking (i.e. only one client can write at a given point in time), but does not acknowledge the operation (no OP_REPLY packet in the protocol for update/insert/delete).

    However, MongoDB, as a document oriented database, would probably shine in read benchmarks involving several tables with 1-n relationships (rather than a simple key/values retrieval). With MySQL (or any other RDBMS) it would involve several queries (or query with joins), while with MongoDB it still costs only a single lookup (provided all the data are stored in a single object).

    ReplyDelete
  4. Mark, interesting results!

    we/re to doing benchmarks on a Mac: I've seen other cases were the Mac results were very much off as compared to other platforms, for instance, look at this post by Mark Atwood on MySQL UDFs:

    http://fallenpegasus.livejournal.com/707050.html

    kind regards,

    Roland Bouman

    ReplyDelete
  5. If you're querying on the primary key in mysql - you should use "_id" instead of "id" for MongoDB to make it fair.

    ReplyDelete
  6. Yes, I think that MongoDB would perform much better were I to test document processing, especially something that was harder to model in SQL. I might get there but I have a few more basic tests to run.

    ReplyDelete
  7. Roland - I remember a similar result for one of the PBXT perf tests. Results on OSX were very different than on Linux.

    ReplyDelete
  8. Sammy - yes, that would be better. I just read more of the docs to confirm how to do that and I will repeat my tests.

    ReplyDelete
  9. The results have been updated to use _id for MongoDB. The results are better with that change. I will report on fetch by secondary key in another post.

    ReplyDelete
  10. Are you doing a "select *" in mysql or only pulling back some columns?

    I've gotten much better numbers on some benchmarks - so curious why the difference.

    ReplyDelete
  11. You may want to consider trying mongo with java or c.
    I just copied yours in java and got 120k/sec on a core i7.
    I've heard before the python mongo driver is a bit of a dog, not sure why

    ReplyDelete
  12. In what context do you get that much? Client and server on same host? On different hosts? What is the network latency between them?

    I know that more efficient client code can drive MySQL faster and I imagine the same is true for Mongo.

    The peak QPS I get from Python clients to MySQL is almost 120,000 using MySQL 5.1 and the Facebook patch. Using sysbench for the clients rather than Python the peak is about 180,000.

    Another significant component to peak QPS is network latency between client and server. Ping round-trip time is about 1ms in my setup. But both Mongo and MySQL suffer the same from that. I need to co-locate my client and server in the same rack to get something better.

    Alas, I will repeat a few tests between two hosts that have less network latency.

    ReplyDelete
  13. Different hosts with .1 ms ping time

    ReplyDelete
  14. My results are with mongodb 1.7.0. I will try the stable release next -- 1.6.2.

    I just found client-server hosts that have the same 0.1ms ping time and will repeat tests there. A quick test shows that peak QPS is not changed much but the peak is reached with fewer clients and the server suffers severe mutex contention (based on vmstat) at saturation.

    I asked my contact at 10gen to provide a non-stripped mongod binary so I can provide more details on the source of contention.

    ReplyDelete
  15. Updated with results from hosts that have a 200us ping time. The peak QPS results are similar but the servers saturate sooner.

    ReplyDelete
  16. In summary from this post is it fair to say that-INNODB is faster, handles higher concurrency and saves your data. FB-INNODB is even better. Sounds like innodb is winner for my book.

    Range queries / joins would be interesting to compare which @Didier Spezia indicated.

    ReplyDelete
  17. Dathan - I think that is fair if your primary concern is high-throughput OLTP, although this benchmark is trivial and I have yet to publish anything for a write workload. But MongoDB has many things going for it including flexible schemas, better support for document-oriented processing and what appears to be a more manageable implementation of replication.

    I am not anti-mongodb, although I don't enjoy the "MongoDB is fast" mantra when that implies that MongoDB is universally faster than MySQL.

    ReplyDelete
  18. Updated to include results for 1.6.2 and 1.7.0 -- see
    https://spreadsheets.google.com/oimg?key=0AteR_jot1VDGdDVYWG4yaG51bW1lamZObFlzMHVCUkE&oid=19&zx=dl0e7oobpaky

    ReplyDelete
  19. Always great input with benchmarks, but what I miss is benchmarks of multi-server set ups. Especially since this seems to be the focus of mongodb with auto sharding and all. Iirc it's advertised that a single server set up with mongodb isn't even preferable.

    But more generally speaking...

    As some comments above has indicated, both mysql and mongodb isn't designed to solve all problems; definitively not the same ones.

    My two cents to anyone in the decision/advise chair: get to know the problem you want to solve before you even think of the word benchmarking. Based on that, design the benchmarks for the different db softwares you have in mind.

    Just for the love of god don't make the benchmark code identical just to make them «fair». A lot of the alternatives out there is designed differently, thus should be used differently.

    ReplyDelete
  20. Those are all excellent points. I will add a few more:
    * hire or rent experts during the evaluation -- if you don't have MySQL or MongoDB expertise, then bring in consultants who do
    * hire someone with expertise in physical db design if you have problems with performance, start by reading my post on "index only"
    * hire someone with expertise on system performance

    I suspect that absolute performance is overrated in terms of priorities for most deployments. Predictable performance and cost of operating are probably more important.

    With respect to auto-sharding and multi-server setups, I have not read of many large MongoDB deployments that require much from that yet. I suspect that MongoDB deployments will have to be sharded earlier than MySQL deployments because of their reliance on mmap but that remains to be measured.

    But for some of us it is important to know the point at which a server falls over as we encounter these problems in production and must prepare for them.

    The tests here are necessary for me (maybe not for you) but far from sufficient for testing a DBMS.

    ReplyDelete
  21. from what I have heard python mongo driver is still slow

    ReplyDelete
  22. The MongoDB python driver, pymongo, uses more CPU than the MySQL python driver, MySQLdb. That skews the results from this benchmark. But I never claimed the benchmark was perfect. I much prefer writing this in Python than in Java so I will just have to live with the results. As soon as I add tests that do more complex processing on the server then the impact from the Python overhead wil be less important.

    ReplyDelete
  23. Correction! The extra CPU overhead from pymongo limited throughput. Details at http://mysqlha.blogspot.com/2010/09/oh-no-mongodb-can-be-fast-for-key-value.html. After correcting for this I was able to get more than 155,000 QPS from MongoDB for fetch by primary key. Wow.

    ReplyDelete
  24. hey mark, since your testing innodb why do you have the key_buffer_size set to 2GB. Is the test producing temp tables?

    ReplyDelete
  25. How can I turn off MongoDB cache?

    ReplyDelete
  26. You set doublewrite to 1 : MySQL is doing twice operations instead of one. This is not fair :O

    Please, retry with some of these parameters :

    # Binary logging
    # ------------------------------------------------------------------------------
    #log_bin = /usr/local/mysql/data/mysql-bin
    #server-id = SkeetMeet-1
    #binlog_format = ROW
    #expire_logs_days = 14
    #sync_binlog = 1

    # Core
    # ------------------------------------------------------------------------------
    #sort_buffer_size = 64M
    tmp_table_size = 1000M
    max_heap_table_size = 1000M
    query_cache_type = 1
    query_cache_size = 50M
    max_connections = 1000
    back_log = 10000
    thread_cache_size = 50
    open_files_limit = 65535
    table_definition_cache = 1024
    table_open_cache = 2048
    wait_timeout = 30
    connect_timeout = 5
    max_allowed_packet = 16M
    max_connect_errors = 1000000
    transaction-isolation = READ-COMMITTED

    # MyISAM
    # ------------------------------------------------------------------------------
    key_buffer_size = 32M
    myisam_recover = FORCE,BACKUP

    # InnoDB (http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html)
    # ------------------------------------------------------------------------------
    innodb_strict_mode = 1
    innodb_file_format = Barracuda
    innodb_compression_level = 6
    innodb_flush_method = O_DIRECT
    innodb_data_file_path = ibdata1:10M:autoextend
    innodb_log_files_in_group = 2
    innodb_log_file_size = 256M
    innodb_io_capacity = 250
    innodb_read_io_threads = 2
    innodb_write_io_threads = 2
    innodb_doublewrite = 0
    innodb_read_ahead = 0
    innodb_max_dirty_pages_pct = 80
    innodb_lock_wait_timeout = 30
    #innodb_ibuf_max_pct_of_buffer = 10
    #innodb_ibuf_flush_pct = 40
    innodb_support_xa = 0 # Slave:0, Master:1
    innodb_flush_log_at_trx_commit = 2 # Slave:2, Master:1
    innodb_file_per_table = 1
    innodb_buffer_pool_size = 4G
    innodb_buffer_pool_instances = 1
    innodb_thread_concurrency = 0 # 0-1000, 0:default (no limit)
    innodb_autoinc_lock_mode = 2 # 0, 1:default, 2:parralelize requests, but breaks statement-based replication, because order is non-deterministic (binary replication is ok)
    innodb_sync_spin_loops = 200
    innodb_spin_wait_delay = 0

    ReplyDelete
  27. I ran across this blog post while searching for something else, and I think you are perhaps guilty of selectively choosing indicators that benefit MySQL versus Mongo.

    It may well be the case that MySQL can handle a greater number of concurrent users (I don't know, and don't care to try to replicate your tests), but Mongo is the faster database by far - particularly with large data sets.

    I have done quite a bit of development with Mongo, and I can testify that for applications accessing large data sets Mongo runs approximately 1000x faster once you get millions of rows of data or you have to join large tables.

    Your graphs make it appear that MySQL is a faster database. That's just not correct. But unfortunately, people are going to confuse concurrent users tests with speed.

    ReplyDelete
  28. Just to put a fine point on this. I have a procedure I was running in MySQL that took 20 minutes to complete. That was too long.

    I re-wrote it to use Mongo, and now it completes in 2 seconds.

    It appears to me that you are a mysql partisan who is choosing a highly-specific indicator that in a very limited scope shows MySQL to be a faster database. Whereas in the vast majority of real-world application scenarios, it is Mongo that is the faster database.

    ReplyDelete
  29. I have done a bit of support for MySQL and I can testify that letting my local db-perf team do their magic makes things bazillion times faster. Maybe it is faster for you, or maybe you are better at Mongo than MySQL.

    http://dom.as/2012/06/26/memsql-rage/

    ReplyDelete

 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.