Friday, April 24, 2009

New storage engines for MySQL -- rocket science or great engineering?

There were several new storage engine vendors at the MySQL Conference. I spoke with people from Virident, Tokutek and Schooner at length about their technology. Their products are impressive and I look forward to more performance details from them, including results for read-intensive and write-intensive workloads. Tokutek has already published performance results on an insert-intensive workload and then worked with me to improve the InnoDB results and improve the test code so others can run it. The code and results are here.

One test I want all of them to run: generate a write-intensive workload so that InnoDB accumulates many dirty pages in the buffer pool and many entries in the insert buffer, kill mysqld, and then measure how long the server takes to perform crash recovery. The results should be compared across InnoDB on commodity hardware, InnoDB on Virident and Schooner hardware, and TokuDB on commodity hardware. I suspect that the results will be impressive for the new storage engines.
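The timing half of that test can be sketched as a small harness. This is only a sketch under assumptions: `time_until_ready` and the `is_ready` probe are hypothetical names, and the probe is whatever check you trust (for example, a function that tries to connect and run a trivial query with your client library of choice). Generating the load, killing mysqld and restarting it happen outside this code.

```python
import time


def time_until_ready(is_ready, timeout_s=3600.0, poll_s=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Return the seconds elapsed until is_ready() first returns True.

    is_ready is a caller-supplied probe (hypothetical here), for example
    a function that returns True once the restarted server accepts a
    connection and answers a trivial query.
    Raises TimeoutError if the probe never succeeds within timeout_s.
    """
    start = clock()
    while True:
        if is_ready():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError("server did not become ready in time")
        sleep(poll_s)


# Intended sequence (steps 1 to 3 are done outside this sketch):
#   1. run the write-intensive load until the buffer pool is mostly dirty
#   2. kill mysqld abruptly so no clean shutdown happens
#   3. restart mysqld
#   4. call time_until_ready(probe) right after the restart
```

Start the timer immediately after the restart so the measured interval covers crash recovery rather than only the tail of it. The polling interval bounds the resolution of the result.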

I use the term rocket science because a lot of vendors will have you believe that they have something special. In this case, I believe that each of the vendors really does have something special. But of course, more results will help us understand what they can do. Each of them has also chosen a path that doesn't require a huge investment on their part to build a product. They have limited their software and hardware investments to areas where they have a lot of value to add, and that makes it more likely that they can deliver on their promises.
  • TokuDB is software-only. But this is really clever software. They have implemented a new algorithm that significantly reduces random IO for write-intensive workloads. Other algorithms already do this, for example Log-Structured Merge Trees. I have even published a paper on this at VLDB. But TokuDB may be much better than previously known approaches.
  • Most of the hardware expense for Virident is isolated to one component that implements industry-standard interfaces and can plug into commodity servers. Their software investment is focused on improving pieces of MySQL/InnoDB to leverage their hardware. They have been able to improve on the work of others in the InnoDB developer community.
  • Schooner uses mostly commodity hardware, with the value added in the integration of that hardware. Their software investment in MySQL also appears to be focused on improving pieces of it rather than replacing it, and they have been able to improve on the work of others.
TokuDB allows a small server to handle a much larger workload. This has many benefits including reduced power consumption and less need to shard or add shards to a large MySQL deployment. I think they also use much less disk space than InnoDB. Their technical staff explained the math behind the algorithms that justify their performance. Math is hard so this took some time, but eventually I kind of understood and I believe in their results. Their approach will enable many more optimizations in the future.

The servers from Virident and Schooner are optimized for InnoDB, so it should be easy for existing users to try them out. I expect ridiculously high throughput results from both of them. Schooner hardware is easier to understand as they provide much better IO performance (and many other benefits). They also appear to have designed a balanced system so that peak and actual performance won't be too far apart. Oracle has done this with the Exadata machine and it is very nice to see a similar effort from Schooner.

Virident uses NOR Flash to provide fast, byte-level access (as opposed to reading a disk page at a time). This takes more time to understand. It is almost as if they have reduced the InnoDB page size to the size of a row, so much less data is transferred when reading rows randomly. MySQL loves to access rows randomly, and reading less data means less effort is wasted and there should be less contention on shared resources with many-core and multi-core servers.
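A rough back-of-the-envelope calculation shows why row-sized reads matter. The 16 KB figure is the InnoDB default page size. The 128-byte row size and the read count are assumptions picked only for illustration, as is the simplification that every random row read misses the cache and touches exactly one page.

```python
INNODB_PAGE_BYTES = 16 * 1024  # InnoDB default page size
ROW_BYTES = 128                # assumed average row size, illustration only
RANDOM_ROW_READS = 1_000_000   # assumed number of uncached random row reads


def bytes_transferred(reads, unit_bytes):
    """Total bytes moved when each read transfers unit_bytes."""
    return reads * unit_bytes


def read_amplification(page_bytes, row_bytes):
    """How many bytes are transferred per byte of row data wanted."""
    return page_bytes / row_bytes


page_total = bytes_transferred(RANDOM_ROW_READS, INNODB_PAGE_BYTES)
row_total = bytes_transferred(RANDOM_ROW_READS, ROW_BYTES)

print(f"page-at-a-time: {page_total / 1e9:.2f} GB")
print(f"row-at-a-time:  {row_total / 1e9:.2f} GB")
print(f"amplification:  {read_amplification(INNODB_PAGE_BYTES, ROW_BYTES):.0f}x")
```

This says nothing about latency or about how Virident's hardware actually behaves. It only puts a number on how much of each page read is discarded when a single row is wanted.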

7 comments:

  1. Mark,

    Are the slides from your MySQL on EC2 presentation available anywhere? I would love to see them.

  2. They should be at the O'Reilly site.

  3. They have one of your talks up there, but not the EC2 one: http://www.mysqlconf.com/mysql2009/public/schedule/proceedings . I'll check back next week and see if it is up yet.

  4. Hey Mark, do you have access to TokuDB slides that talk about the big O of insert speed? For some reason I have in my head that it's
    O((log_b N) / b), where 'b' == blocks and N == number of rows.

  5. I don't. They worked through the math of a more basic cache-oblivious algorithm with me and eventually I understood that what they have done is for real. They have been as open as they can be about their tech, so you should ask them.

  6. Bradley Kuszmaul's slides covering TokuDB are up on the Percona Performance conference website:
    http://www.percona.com/ppc2009/PPC2009_Covering_Indexes_Tokutek.pdf

    Please contact me if you have any questions at hotchkiss at tokutek dot com.

  7. We would like to get further improvements on SQL with high-load selects and concurrent updates on tables with over 100 million entries and several indices.
    I heard about another (not free) storage engine, which seems to be used in telephony environments. Did you hear anything about that?


 
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.