I have broken my promise to stop writing about this. Sorry, but I had to correct my mistake. I ran three micro-benchmarks: get by primary key, get by secondary key and update by primary key. MySQL had a higher peak QPS for all of them. Alas, the results for get by primary key were skewed because pymongo, the Python driver for MongoDB, uses more CPU than MySQLdb, the Python driver for MySQL. The client host was saturated during the test and this limited peak QPS to 80,000 for MongoDB versus 110,000 for MySQL.
I repeated one test using two 16-core client hosts with 40 processes per host. For that test the peak QPS on MongoDB improved to 155,000 while the peak for MySQL remained at 110,000. That is an impressive result. The results for get by secondary key and update by primary key are still valid as the server host saturated on those tests.
Now I must consider rewriting the test harness in Java, C or C++, or I could add MongoDB support to sysbench. I prefer Python, but in addition to under-reporting MongoDB peak performance the Python harness also under-reports MySQL performance. I am able to get 180,000 peak QPS using sysbench on one 16-core client host and mysqld on another versus 110,000 using the Python equivalent.
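One way to check whether the client is the bottleneck, as it was for the pymongo test, is to measure how much client CPU a single request costs and derive the ceiling that one host can generate. The following is a minimal Python sketch under stated assumptions: `fake_get_by_pk` is a hypothetical stand-in for a real driver call (it is not pymongo or MySQLdb), and the helper names and the example numbers are illustrative, not taken from the original harness.

```python
import time

# Hypothetical in-memory "table" used only so the sketch runs without a database.
TABLE = {i: str(i) for i in range(1000)}


def fake_get_by_pk():
    """Stand-in for a driver call such as a get by primary key (assumption)."""
    return TABLE[42]


def client_cpu_per_op(op, iterations=100_000):
    """Return the client-side CPU seconds consumed per call of op().

    time.process_time() counts CPU time of this process only, so time spent
    waiting on the network or the server is excluded.
    """
    start = time.process_time()
    for _ in range(iterations):
        op()
    return (time.process_time() - start) / iterations


def max_client_qps(cpu_per_op, cores):
    """Upper bound on the QPS a client host can issue when every core is busy."""
    if cpu_per_op <= 0:
        raise ValueError("cpu_per_op must be positive")
    return cores / cpu_per_op


if __name__ == "__main__":
    cost = client_cpu_per_op(fake_get_by_pk)
    if cost > 0:
        print("client CPU per op: %.9f s" % cost)
        print("ceiling on a 16-core client: %.0f QPS" % max_client_qps(cost, 16))
```

If the ceiling computed this way is close to the peak QPS that was measured, the client rather than the server is probably the limit, and adding client hosts or using a driver that costs less CPU per request should raise the result.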
Tuesday, September 14, 2010


Keep this up! Love reading about an honest comparison.
I guess the moral of the story is that Python is slow ;)
Yes, and in this case pymongo appears to be mostly Python while MySQLdb is a thin Python wrapper on top of a C library.
I don't quite see the point in benchmarks like yours. While those are a nice read for when I'm bored, any "real benchmark" (meaningful to you as a user of software X) should *always* deal with the actual data.
Personally I don't think it makes a lot of sense to rely on these benchmarks (yours, someone else's, mine) - except for when you do have the same usage patterns with the same data and the same amount of questions asked to your system.
Btw.: If you really want a fast key-value store, Google for the post that used Postgresql with a nice memory-only configuration. AFAIR it was nearly memcached speed - hope you won't argue with me that memcached vs. [mysql, postgres, mongo, or any other "storage system" with persistence] are totally different use cases...
Perhaps you should define "real benchmark". I frequently make MySQL better. I frequently run benchmarks like this as part of my work. The results help me. I support large MySQL deployments that demand better performance. Maybe I am just lucky.
I also measure performance using real workloads. But neither is sufficient on its own (only use real workloads, only use micro-benchmarks).
One point of the benchmark is to determine the point at which the server falls over. If I can't get more than 10,000 updates/second on a simple benchmark then I know I can't support more than that on complex workloads in production. When I turn on the binlog, set sync_binlog=1 and updates/second drops to the rate at which I can do fsync, then I know there is another thing to look at.
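To compare an update rate against "the rate at which I can do fsync", the fsync rate of the storage has to be measured first. The following is a rough Python sketch, not the method used for the results above: the function name, the 512-byte payload and the half-second duration are arbitrary assumptions.

```python
import os
import tempfile
import time


def fsync_rate(duration=0.5, payload=b"x" * 512, directory=None):
    """Return write+fsync cycles per second on a temporary file.

    directory should point at the volume of interest; by default the
    system temp directory is used, which may be a RAM-backed filesystem
    where fsync is nearly free.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    count = 0
    try:
        start = time.perf_counter()
        while time.perf_counter() - start < duration:
            os.write(fd, payload)  # small append, similar in spirit to a log write
            os.fsync(fd)           # force the data to stable storage
            count += 1
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed


if __name__ == "__main__":
    print("fsyncs/second: %.0f" % fsync_rate())
```

Running it with `directory` set to the volume that holds the binlog gives a number to hold against updates/second. If the two are close when sync_binlog=1, the commit path is likely bound by fsync latency rather than by CPU or mutex contention.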
Another point of the benchmark is to identify bottlenecks in the server. For MySQL they are frequently one of: LOCK_open, kernel_mutex, buffer_pool mutex, prepare_commit_mutex.
Do you have the URL for the Postgres results? MySQL, InnoDB and the Facebook patch do pretty well on key-value workloads. I can get 180,000 QPS from that setup for a read-only load. Peak memcache throughput for this hardware is likely to be greater than 400,000 QPS.
O'Reilly just sent me MongoDB as a book to review. I'm going to start playing around with this (using PHP). Cassandra just does not have any real documentation, while MongoDB is doc'ed up. With your results, I can see now that MongoDB for my use will kill Cassandra hands down - let's see :)
One thing really impresses me. Join their mailing list and check out how quickly they provide good responses (good meaning the responders know their stuff). Maybe this is what MySQL was like in the early days.
Mark: I thought I defined it :)
I didn't mean to say the benchmarks are completely useless.
They may help:
* you
* for your use cases
* because you know how this data compares to your real data
They won't help:
* me
* Joe Random
* anyone else
O.K. actually that's a lie. But if I read a random benchmark on the web it isn't telling me anything about whether Product A or B will perform better on my data and use cases.
I need to know my data to be able to tune, test, benchmark....that was what I meant to say.
Your benchmarks are yours. And while I appreciate the read they can be nothing more to me than a starting point and a guideline finding cases I haven't thought of.
No offence intended :) - I just see a lot of people fall for benchmarks of other people like: "But this benchmark says that product A is faster than B" leaving out actually what was tested, how it was tested (which you nicely describe), why it was tested, ...
regarding the post about in-memory postgres - I can't find the original but here are the basic instructions:
http://rhaas.blogspot.com/2010/06/postgresql-as-in-memory-only-database_24.html
(not quite a benchmark but with these instructions it should be a breeze to create one)
regards,
Martin
Martin,
I often don't appreciate that results published by me and others are misinterpreted or misused. I did better this time by claiming this was a "silly benchmark" but even that is probably not enough and someone will say that MySQL/MongoDB is faster than MongoDB/MySQL on a comparison that doesn't mean much.
Hi Mark,
Any chance you are still pursuing this now that MongoDB has released 2.4?
-Abhay
I still wonder how well it does on IO-bound workloads. Maybe TokuDB + MongoDB are the solution?
http://www.tokutek.com/2013/03/wanted-evaluators-to-try-mongodb-with-fractal-tree-indexing