Sunday, December 21, 2008

How much does insert performance matter?

I ran the insert benchmark using Innodb. The insert rate degraded from 20,000 to 2,000 rows per second during the test. I don't think another storage engine would do much more than 20,000 rows per second on that workload and similar hardware, because much of the overhead is above the storage engine. But Innodb degrades by 10X over time, so a storage engine whose performance did not degrade would be 10X faster than Innodb by the end of the test.
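The 10X figure is the ratio of the starting and ending insert rates. The short calculation below makes that concrete for a hypothetical load of 100 million rows. The row count is an assumption chosen for illustration, not the size of the benchmark, and the sketch ignores the shape of the degradation curve and simply compares the two rates.

```python
# Worked arithmetic for the Innodb degradation described above.
# The two rates come from the post; ROWS is a hypothetical row count.
START_RATE = 20_000  # rows/second at the start of the test
END_RATE = 2_000     # rows/second at the end of the test
ROWS = 100_000_000   # hypothetical: 100 million rows, for illustration only

slowdown = START_RATE / END_RATE
hours_at_start_rate = ROWS / START_RATE / 3600
hours_at_end_rate = ROWS / END_RATE / 3600

print(f"slowdown: {slowdown:.0f}X")
print(f"{ROWS:,} rows at {START_RATE:,}/s: {hours_at_start_rate:.1f} hours")
print(f"{ROWS:,} rows at {END_RATE:,}/s: {hours_at_end_rate:.1f} hours")
```

At the starting rate the hypothetical load takes about 1.4 hours; at the ending rate it takes about 13.9 hours.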

Would 10X faster be enough to get you to switch? Do you need something that is faster for insert-only workloads or must it also be faster for updates and deletes?

Papers have been written that describe both the theory and practice of building systems that support high rates of inserts, updates and deletes. Some of this technology may eventually appear in a MySQL storage engine:
  • Log-Structured Merge-Tree provides a framework for evaluating performance.
  • Bigtable describes a similar approach for avoiding random IO during updates.
  • ROSE describes how to combine these techniques with compression for modern CPUs (check out the authors).
  • Graefe describes improvements that can be made to b-tree indexes.

5 comments:

  1. FWIW, the Hypertable alpha (Hypertable is an open source implementation similar to Bigtable) sustained well over 1M inserts/s while loading 1TB of data, randomly ordered by primary key and replicated 3-way, so about 3TB of data was written to disk. It ran on 9 nodes of much cheaper commodity hardware using JBOD (Just a Bunch Of Disks: four 7.2K RPM SATA disks per node on the onboard controllers, not RAIDed). The performance is expected to double in beta.

    In Hypertable, updates and deletes have the same performance as inserts.

  2. More info on that at http://hypertable.org/documentation.html

  3. Is there a mature technology that supports HA, other than MySQL Cluster?
    The Hypertable developers plan to implement the minimum code needed to tolerate the loss of one node.

  4. What about PBXT? One of the expected benefits of its architecture seems to be better performance for write operations. Unfortunately, there are not many PBXT benchmarks for insert, update and delete operations.

  5. PBXT might be better. My vague memory is that many, but not all, of its persistent structures are updated using copy-on-write.

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.