1. Congratulations!

    Can we please tone down the marketing? It is great to see a community project grow, especially in the MySQL family. But you are judged by successful deployments, not by the class libraries used by your project.

    Drizzle is only more reliable than MySQL when it keeps the replication log and InnoDB in sync during crash recovery, but it does not do that today. Official MySQL supports this on a master via sync_binlog. MySQL slaves do this today via rpl_transaction_enabled, which is available in the Facebook and Google patches. I think it is also available in Percona Server, MariaDB and XtraDB.
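
    For reference, this is a minimal sketch of the my.cnf settings involved; sync_binlog, innodb_flush_log_at_trx_commit and innodb_support_xa are stock options, while rpl_transaction_enabled exists only in the patched servers:

    # master: keep the binlog and InnoDB in sync at commit
    sync_binlog=1
    innodb_flush_log_at_trx_commit=1
    innodb_support_xa=1

    # slave: keep replication state and InnoDB in sync during crash recovery
    # (available in the Facebook and Google patches, not in official MySQL)
    rpl_transaction_enabled=1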

    I have been told that Drizzle was designed for multi-core scalability. I am not sure what that means on real workloads or benchmarks. I know that there are some bottlenecks in InnoDB that have nothing to do with the code above it (MySQL, Drizzle). I also know that MySQL has made huge strides this year and MySQL 5.5 far exceeds my expectations. Alas, I cannot validate whether "designed for scalability" results in performance that is better than MySQL 5.5, as I cannot build Drizzle on my servers due to the dependency on gcc 4.2 or greater.

  2. I have broken my promise to stop writing about this. Sorry, but I had to correct my mistake. I ran three micro-benchmarks: get by primary key, get by secondary key and update by primary key. MySQL had a higher peak QPS for all of them. Alas, the results for get by primary key were skewed because pymongo, the Python driver for MongoDB, uses more CPU than MySQLdb, the Python driver for MySQL. The client host was saturated during the test and this limited peak QPS to 80,000 for MongoDB versus 110,000 for MySQL.

    I repeated one test using two 16-core client hosts with 40 processes per host. For that test the peak QPS on MongoDB improved to 155,000 while the peak for MySQL remained at 110,000. That is an impressive result. The results for get by secondary key and update by primary key are still valid as the server host saturated on those tests.

    Now I must consider rewriting the test harness in Java, C or C++, or adding MongoDB support to sysbench. I prefer Python, but in addition to under-reporting MongoDB peak performance the Python harness also under-reports MySQL performance. I am able to get 180,000 peak QPS using sysbench on one 16-core client host and mysqld on another, versus 110,000 using the Python equivalent.

  3. This is the end of my public performance comparison of MongoDB versus MySQL, at least for the next few weeks. For these tests I used a 16-core x86_64 client host and an 8-core x86_64 server host, MySQL 5.1.50 with the Facebook patch, MongoDB 1.7.0 and Python clients. Ping between the client and server hosts was ~200us.

    I tested 5 configurations:

    • inno.5150fb.b0.s0 - MySQL 5.1.50, the Facebook patch, binlog disabled
    • inno.5150fb.b1.s0 - MySQL 5.1.50, the Facebook patch, binlog enabled, sync_binlog=0
    • inno.5150fb.b1.s1 - MySQL 5.1.50, the Facebook patch, binlog enabled, sync_binlog=1
    • mongo.170.safe - MongoDB 1.7.0 and safe updates
    • mongo.170.unsafe - MongoDB 1.7.0 and unsafe updates
    Note that with unsafe updates the client does not wait for the server to respond. It sends requests as fast as it can, until the buffer between the client and server is full. When a sufficient number of concurrent clients run for enough time, the buffers fill and unsafe updates no longer improve performance.
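
    To make the two modes concrete, here is a minimal pymongo sketch using the 1.x-era API from these tests; the host, port and database name are placeholders:

    import pymongo

    conn = pymongo.Connection('127.0.0.1', 27017)  # placeholder host and port
    db = conn['test']

    # unsafe: fire-and-forget, the client does not wait for the server
    db.c.update({'_id': 0}, {'$inc': {'k': 1}})

    # safe: block until the server acknowledges the write via getLastError
    r = db.c.update({'_id': 0}, {'$inc': {'k': 1}}, safe=True)
    assert r['ok'] == 1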

    InnoDB has much better throughput when the binlog is disabled. The clients update rows selected at random from 2M rows. InnoDB is able to handle more of that load concurrently than MongoDB, which uses a reader-writer lock that prevents concurrent updates within the database. Note that while InnoDB allows more of the update work to be done concurrently, it isn't perfect either, as there are several global mutexes in MySQL/InnoDB including LOCK_open, kernel_mutex and the InnoDB buffer pool mutex. When the binlog is enabled but sync_binlog is disabled, even more global mutexes are used. Finally, when the binlog is enabled and sync_binlog=1, group commit does not work and updates are rate limited by the performance of fsync. In this case fsync is fast because a HW RAID card with a battery-backed write cache was used.

    This is output from PMP that demonstrates one source of mutex contention in mongod. The pileup occurs at mongo::MongoMutex, which apparently implements the reader-writer lock.

         48 mongo::connThread,thread_proxy,start_thread,clone
         37 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait,mongo::MongoMutex::lock,mongo::receivedUpdate,mongo::assembleResponse
         10 recv,mongo::MessagingPort::recv,mongo::MessagingPort::recv
          2 
          1 select,mongo::Listener::initAndListen,mongo::listen,mongo::_initAndListen,mongo::initAndListen,main,select
          1 select,mongo::Listener::initAndListen
          1 pthread_cond_wait@@GLIBC_2.3.2,mongo::FileAllocator::Runner::operator(),thread_proxy,start_thread,clone
          1 nanosleep,mongo::DataFileSync::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone
          1 nanosleep,mongo::ClientCursorMonitor::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone
          1 nanosleep
          1 mongo::webServerThread,thread_proxy,start_thread,clone
          1 mongo::SnapshotThread::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone
          1 mongo::interruptThread,thread_proxy,start_thread,clone
          1 mongo::BtreeBucket::findSingle,mongo::ModSetState::createNewFromMods,mongo::_updateObjects,mongo::updateObjects,mongo::receivedUpdate,mongo::assembleResponse
    Source code for MongoDB queries is listed below. The code to set up MongoDB and MySQL is described in the previous posts.



    import pymongo
    import random
    import signal
    import time

    # SummaryStats, sigterm_handler and got_sigterm are harness helpers that
    # are not shown here.
    def query_mongo(host, port, pipe_to_parent, requests_per, dbname, rows, check, testname, worst_n, id):
      conn = pymongo.Connection(host, port)
      db = conn[dbname]
      signal.signal(signal.SIGTERM, sigterm_handler)

      gets = 0
      stats = SummaryStats(worst_n)

      while True:
        for loop in xrange(0, requests_per):

          # increment the counter for a row chosen at random
          target = random.randrange(0, rows)

          s = time.time()
          try:
            r = db.c.update({'_id': target}, {'$inc': {'k': 1}}, safe=True)
            assert r['updatedExisting'] == True
            assert r['ok'] == 1

            stats.update(s)

            gets += 1

          except:
            # the only expected exception is the one raised when SIGTERM
            # ends the test
            assert got_sigterm
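
    The SummaryStats class is part of the harness and not shown in these posts. A minimal sketch of what it must do, under the assumption that worst_n bounds how many of the worst response times are kept, is:

    import time

    class SummaryStats(object):
      # an illustrative sketch, not the real harness class
      def __init__(self, worst_n):
        self.worst_n = worst_n
        self.count = 0
        self.total = 0.0
        self.worst = []

      # record one request given its start time
      def update(self, start_time):
        elapsed = time.time() - start_time
        self.count += 1
        self.total += elapsed
        self.worst.append(elapsed)
        self.worst.sort(reverse=True)
        del self.worst[self.worst_n:]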

    The my.cnf settings for MySQL, except for the values of log_bin and sync_binlog which varied by test:
    innodb_buffer_pool_size=2000M
    innodb_log_file_size=100M
    innodb_flush_log_at_trx_commit=2
    innodb_flush_method=O_DIRECT
    innodb_thread_concurrency=0
    innodb_max_dirty_pages_pct=80
    innodb_file_format=barracuda
    innodb_file_per_table
    innodb_deadlock_detect=0

    max_connections=2000
    table_cache=2000

    key_buffer_size=2000M

    innodb_doublewrite=0


  4. This continues the silly benchmark series and compares performance from concurrent clients that fetch by secondary key. The previous post compared fetch by primary key. The test setup was the same as before. Clients were run on a 16-core x86 server. The servers (mongod, mysqld) were run alone on 8-core and 16-core x86 servers. The tests were run for servers that were 1ms and 200us apart according to ping. The database server saturates earlier when the client is only 200us away. That is to be expected.

    InnoDB tables are clustered on the primary key, and a query that fetches all columns by PK only has to read data from one leaf block of the PK index. When all columns are fetched by secondary key, both the secondary index leaf block and the PK index leaf block must be read. As all data was cached for this test that does not make a big difference. Were the data not cached, the extra IO used to read the PK index leaf block would be significant.
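
    In terms of the bm table created below (the connection details are placeholders), the two access paths are:

    import MySQLdb

    conn = MySQLdb.connect(host='127.0.0.1', user='root', db='test')  # placeholders
    cursor = conn.cursor()

    # PK lookup: reads one leaf block of the clustered PK index
    cursor.execute('select id, k, c, pad from bm where id = 5')

    # secondary-key lookup: reads a leaf block of sidx to find the PK value,
    # then a leaf block of the PK index to get the row
    cursor.execute('select id, k, c, pad from bm where sid = 5')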

    This displays throughput on the 16-core server. MongoDB saturates earlier on the 16-core server than on the 8-core server. From vmstat output this appears to be mutex contention on the server, but I cannot provide more details on where that occurs as the mongod binary I downloaded has been stripped.

    This displays throughput on the 8-core server. Peak QPS for MongoDB is much better than on the 16-core server.

    This displays response time for the 16-core server.

    The test was repeated using hosts that were 200us apart according to ping. The database host was an 8-core server in this test. The peak QPS is similar to the previous tests but the servers saturate with fewer concurrent clients. The results are here and have been updated to include results for MongoDB 1.6.2 and 1.7.0.

    Source code to set up the MySQL table:

    import MySQLdb
    from warnings import filterwarnings

    # connect_mysql is a harness helper that wraps MySQLdb.connect
    def setup_mysql(host, db, user, password, engine, rows):
      filterwarnings('ignore', category=MySQLdb.Warning)
      conn = connect_mysql(host, db, user, password)
      conn.autocommit(True)
      cursor = conn.cursor()
      cursor.execute('drop table if exists bm')
      cursor.execute('create table bm (id int primary key, sid int, k int, c char(120), pad char(60), key sidx(sid)) engine=%s' % engine)

      # load rows with multi-row inserts, 1000 rows at a time
      vals = []
      for x in xrange(0, rows):
        sx = str(x)
        lsx = len(sx)
        row = '(%d, %d, %d, "%s", "%s")' % (x, x, x, sx+'x'*(120-lsx), sx+'y'*(60-lsx))
        vals.append(row)

        if len(vals) == 1000:
          r = cursor.execute('insert into bm values %s' % ','.join(vals))
          vals = []
          print '... row %d, result %s' % (x, r)

      # insert the final partial batch, if any
      if vals:
        r = cursor.execute('insert into bm values %s' % ','.join(vals))
        print '... row %d, result %s' % (x, r)


    Source code to query the MySQL table:


    import random
    import time

    # connect_mysql and SummaryStats are harness helpers; the process runs
    # until it is terminated by the parent.
    def query_mysql(host, db, user, password, pipe_to_parent, requests_per, rows, check, testname, worst_n, id):
      conn = connect_mysql(host, db, user, password)
      conn.autocommit(True)
      cursor = conn.cursor()

      gets = 0
      stats = SummaryStats(worst_n)

      while True:
        for loop in xrange(0, requests_per):

          # fetch one row by secondary key
          target = random.randrange(0, rows)

          s = time.time()
          cursor.execute('select id, k, c, pad from bm where sid = %d' % target)
          sel_rows = cursor.fetchall()
          stats.update(s)

          if len(sel_rows) != 1:
            print 'No rows for %d' % target
            assert False
          if sel_rows[0][0] != target:
            print 'id is %s and should be %s' % (sel_rows[0][0], target)
            assert False

          gets += 1

    Source code to set up the MongoDB collection:


    import pymongo

    def setup_mongo(host, port, dbname, rows):
      conn = pymongo.Connection(host, port)
      conn.drop_database(dbname)
      db = conn[dbname]

      for x in xrange(0, rows):
        sx = str(x)
        lsx = len(sx)
        db.c.save({'_id':x, 'sid':x, 'k':x, 'c':sx+'x'*(120 - lsx), 'pad':sx+'y'*(60 - lsx)})
        if x % 1000 == 0:
          print '... row %d' % x

      # build the secondary index after the load
      db.c.create_index('sid')

    Source code to query the MongoDB collection:

    import pymongo
    import random
    import time

    # SummaryStats is a harness helper that tracks response times
    def query_mongo(host, port, pipe_to_parent, requests_per, dbname, rows, check, testname, worst_n, id):
      conn = pymongo.Connection(host, port)
      db = conn[dbname]

      gets = 0
      stats = SummaryStats(worst_n)

      while True:
        for loop in xrange(0, requests_per):

          # fetch one document by secondary key
          target = random.randrange(0, rows)

          s = time.time()
          o = db.c.find_one({'sid': target})
          stats.update(s)

          assert o['_id'] == target
          if check:
            assert o['k'] == target
            sx = str(o['_id'])
            lsx = len(sx)
            assert o['c'] == sx+'x'*(120-lsx)
            assert o['pad'] == sx+'y'*(60-lsx)

          gets += 1


  5. This is yet another silly benchmark because the results are likely to be misused. The results probably do not matter to you. I like MongoDB. It has many things in common with MySQL (ease of use, rapid iteration, give customers what they want). It would be more interesting to me were it to use embedded InnoDB as the backend.

    Please read the update as I was able to get much better throughput from MongoDB once I realized that the client host was saturated on the CPU.

    I want to evaluate the performance claims. Several of the MongoDB versus the world benchmarks were run on Mac laptops. I love my Mac laptop, but I don't use it to confirm database server throughput.

    My first test uses concurrent clients to determine the load at which the server saturates. I run this test in two modes: cached and not cached. No disk IO should be done during the cached test. Most requests should require disk IO during the uncached test. I am not sure that I will run the uncached test for MongoDB as it memory-maps the database file and there is no way to limit the amount of memory it will use, so the uncached test requires a database much larger than RAM. That will take a long time to set up, and I don't know whether many MongoDB deployments run with databases much larger than RAM.

    The test is interesting to me because it has found problems that limit throughput on workloads that I care about. Other tests should be done after understanding the results from this one. I suspect that for many people peak QPS on highly-concurrent workloads is not a priority.

    Each client does a sequence of calls (SELECT or HANDLER for MySQL, find_one for MongoDB) where each call fetches one row by specifying the value of an indexed column. The column is the primary key for MySQL and was indexed via the Collection's create_index method for MongoDB. The id value to fetch is selected at random.
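
    For reference, this is a minimal sketch of the two MySQL fetch paths, assuming a table like the bm table used in these posts; HANDLER reads a row by index while bypassing much of the SQL layer:

    import MySQLdb

    conn = MySQLdb.connect(host='127.0.0.1', user='root', db='test')  # placeholders
    cursor = conn.cursor()

    # fetch one row through the normal SQL layer
    cursor.execute('select id, k, c, pad from bm where id = 5')
    print cursor.fetchall()

    # fetch the same row through the HANDLER interface
    cursor.execute('handler bm open')
    cursor.execute('handler bm read `PRIMARY` = (5)')
    print cursor.fetchall()
    cursor.execute('handler bm close')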

    The code for the benchmark is a work in progress, so I have yet to publish it. I used Python, the MySQLdb driver for MySQL and the pymongo driver for MongoDB. The clients are separate processes forked via the Python multiprocessing module to avoid the penalty from the Python GIL. Some of the code used for MongoDB is listed at the end of this post.
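
    As a rough sketch of that process model (the names here are illustrative, not the exact harness), the parent forks workers like the query_process function listed below and collects counters over a pipe:

    import multiprocessing
    import time

    def run_clients(nclients, test_seconds, host, port, requests_per, dbname, rows, check):
      # one process per client so the Python GIL is not a bottleneck
      pipes, procs = [], []
      for n in xrange(nclients):
        parent_end, child_end = multiprocessing.Pipe()
        p = multiprocessing.Process(target=query_process,
                                    args=(host, port, child_end, requests_per,
                                          dbname, rows, check, n))
        p.start()
        pipes.append(parent_end)
        procs.append(p)

      time.sleep(test_seconds)  # let the clients run

      # ask each client to report its counters, then sum them
      total_gets = 0
      for pipe in pipes:
        pipe.send('report')
        total_gets += pipe.recv()['gets']
      for p in procs:
        p.join()
      return total_gets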

    I only report peak QPS at this time. The results ignore average and worst-case response time. It is not good to ignore them, which is another reason why this might be a silly benchmark. Eventually the code will be updated to measure them.

    The clients and DBMS (mysqld, mongod) ran on separate 16-core servers. The test was run for 1, 2, 4, 8, 16, 24, ..., 152, 160 concurrent clients. Tests were run for MyISAM and InnoDB for MySQL. The tests used MySQL 5.1.50 unmodified (5150orig) and 5.1.50 with the Facebook patch (5150fb). There are a few changes in the Facebook patch that make a huge difference for peak QPS. I expect official MySQL to have these changes soon. Peak QPS was much higher for MySQL than for MongoDB. In order of peak QPS:
    1. Peak QPS on MySQL 5.1.50 with the Facebook patch exceeds 100,000 and QPS increases to 128 concurrent clients.
    2. Peak QPS on MySQL 5.1.50 unmodified is between 70,000 and 80,000 and QPS increases to 80 concurrent clients.
    3. Peak QPS on MongoDB 1.7.0 is 40,000 and QPS increases to 64 concurrent clients.
    MongoDB appears to saturate on mutex contention on the server at 64 concurrent clients. Alas, the binary is stripped and I cannot run PMP to determine where the problem occurs. My diagnosis is based on vmstat output.

    This displays throughput for MySQL 5.1.50 with the Facebook patch, MySQL 5.1.50 unmodified and MongoDB 1.7.0. All ran on a 16-core Nehalem server -- 16 cores with hyperthreading enabled.

    This displays average and 98th percentile response time for MySQL 5.1.50 with the Facebook patch and MongoDB 1.7.0. Both ran on a 16-core Nehalem server -- 16 cores with hyperthreading enabled. Results are from the same test as used for the previous graph.
    I repeated the tests using an 8-core Nehalem server. It is the same as the previous server except that hyperthreading was disabled. The graphs below display throughput. MySQL is able to handle more concurrent clients before saturating on the server.

    This is the response time graph for the 8-core server.
    The servers for the tests above were 1ms apart according to ping. I repeated the test for servers that have a 200us ping time but I don't want to inline another graph. It is linked in the next sentence. The linked graph shows QPS for get by primary key, with an interesting and unexplained spike in QPS for MongoDB near 152 concurrent clients. Despite having a similar QPS at that point, MongoDB average response time was about 1.5X that for MySQL. This includes results for MongoDB versions 1.6.2 and 1.7.0.

    The code to set up the collection for MongoDB is:
    import pymongo

    def setup_db(host, port, dbname, rows):
      conn = pymongo.Connection(host, port)
      conn.drop_database(dbname)
      db = conn[dbname]
      # load rows; _id serves as the primary key
      for x in xrange(0, rows):
        sx = str(x)
        lsx = len(sx)
        db.c.save({'_id':x, 'k':x, 'c':sx+'x'*(120 - lsx), 'pad':sx+'y'*(120 - lsx)})
        if x % 1000 == 0:
          print '... row %d' % x
    And the code to query MongoDB:
    import pymongo
    import random

    def query_process(host, port, pipe_to_parent, requests_per, dbname, rows, check, id):
      conn = pymongo.Connection(host, port)
      db = conn[dbname]
      gets = 0
      while True:
        for loop in xrange(0, requests_per):
          # fetch one document by primary key
          target = random.randrange(0, rows)
          o = db.c.find_one({'_id': target})
          assert o['_id'] == target
          if check:
            assert o['k'] == target
            sx = str(o['_id'])
            lsx = len(sx)
            assert o['c'] == sx+'x'*(120-lsx)
            assert o['pad'] == sx+'y'*(120-lsx)

          gets += 1

        # between batches, check whether the parent has asked for results
        if pipe_to_parent.poll():
          msg = pipe_to_parent.recv()
          print 'Received: %s' % msg
          pipe_to_parent.send({'gets' : gets})
          pipe_to_parent.close()
          return
    The my.cnf settings for MySQL 5.1:
    plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so
    innodb_buffer_pool_size=2000M
    innodb_log_file_size=100M
    innodb_flush_log_at_trx_commit=2
    innodb_doublewrite=1
    innodb_flush_method=O_DIRECT
    innodb_thread_concurrency=0

    innodb_file_format=barracuda
    innodb_file_per_table

    max_connections=2000
    table_cache=2000
    key_buffer_size=2000M

  6. The 5.1 manual states:
    Therefore, if you are using statement-based replication, you must either avoid INSERT ... ON DUPLICATE KEY UPDATE or use innodb_autoinc_lock_mode = 0
    And earlier in the same page is the description of innodb_autoinc_lock_mode=0:
    This lock mode is provided only for backward compatibility and performance testing. There is little reason to use this lock mode unless you use “mixed-mode inserts” and care about the important difference in semantics described later.
    I don't think these statements agree. I am confused even more because upgrading a master-slave pair of servers to 5.1 with innodb_autoinc_lock_mode=1 and statement-based replication fixed a logical corruption problem that occurred when the wrong value was written to the binlog for SET INSERT_ID=... as part of a transaction that does INSERT ... ON DUPLICATE KEY UPDATE. There are a few details on that at bug 50413.
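
    To make the "mixed-mode inserts" case concrete, this is the kind of statement in question; the table is a made-up example:

    create table t (
      id int auto_increment primary key,
      k int not null unique,
      cnt int not null
    ) engine=innodb;

    -- The server does not know in advance whether this inserts a new row
    -- (consuming an auto-increment value) or updates an existing one, so
    -- the auto-increment values assigned can differ when a slave replays
    -- the statement under statement-based replication.
    insert into t (k, cnt) values (1, 0)
    on duplicate key update cnt = cnt + 1;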

    I am speaking at MySQL Sunday. The title for my talk is Success with MySQL and I will focus on things that operations and users can do to make a MySQL deployment succeed. There are many interesting talks scheduled for Sunday, including several at the same time as mine. I hope to see you there.
