1. I ran the iibench test using a server with 2 CPU cores, 2 disks in SW RAID 0 and 1 MB stripe, 2G RAM and XFS. If you just want a summary, it is that software changes can make InnoDB run much faster on the same hardware. There is a lot of opportunity -- but certainly not enough to catch TokuDB.

    Three binaries were tested; they are reported anonymously below as X, Y and Z.
    There were two tests:
    1. Time to insert 50M rows into an empty table
    2. Time to insert several million rows into a table with 50M rows
    I disabled the innodb doublewrite buffer for all tests as I want to compare the results to a server that doesn't use that level of safety.

    The my.cnf parameters for 5.0.77 are:
    innodb_buffer_pool_size=1G
    innodb_log_file_size=1900M
    innodb_flush_log_at_trx_commit=1
    innodb_flush_method=O_DIRECT
    innodb_max_dirty_pages_pct=20
    innodb_doublewrite=0
     The my.cnf parameters for the v3 Google patch are:
    innodb_buffer_pool_size=1G
    innodb_log_file_size=1900M
    innodb_flush_log_at_trx_commit=1
    innodb_flush_method=O_DIRECT
    innodb_io_capacity=250
    innodb_read_io_threads=2
    innodb_write_io_threads=2
    innodb_max_dirty_pages_pct=20
    innodb_ibuf_max_pct_of_buffer=10
    innodb_ibuf_reads_sync=1
    innodb_doublewrite=0
    The my.cnf parameters for XtraDB are:
    innodb_log_file_size=1900M
    innodb_buffer_pool_size=1G
    innodb_flush_log_at_trx_commit=1
    innodb_flush_method=O_DIRECT
    innodb_io_capacity=250
    innodb_use_sys_malloc=0
    innodb_read_io_threads=2
    innodb_write_io_threads=2
    innodb_max_dirty_pages_pct=20
    innodb_ibuf_max_size=100M
    innodb_ibuf_active_contract=1
    innodb_ibuf_accel_rate=200
    innodb_doublewrite=0
    All performance results are anonymous for binaries X, Y and Z. Maybe I can monetize my performance testing effort by doing this. I probably need a new disk soon.

    The first result is the time to insert 50M rows into an empty table, measured in seconds. The difference is not that significant. However, the results can be misleading: 5.0.77 has much more pending work at test end (dirty pages and insert buffer entries). That also made 5.0.77 much slower near the end of the test, but I will save the graphs for the next set of results. The test is run using the run_ib script found at the iibench link at the start of this page. The command line is:
    bash run.sh 1 $path no root pw test no innodb 50000000 innodb yes yes $binary
    And the results are:
    • 24478 seconds -- binary X
    • 21016 seconds -- binary Y
    • 37146 seconds -- binary Z
    The second result is the time to insert 3,380,000 rows into a table that starts with 50m rows on a cold server (no entries in the insert buffer, no dirty pages, server restarted). Queries are continuously run by 4 threads concurrent with the inserts. The test is run using run_ib from the iibench link at the top of this page. The command line is:
    bash run.sh 1 $path no root pw test no innodb 10000000 innodb no no $binary
    And the results are:
    • 38673 seconds -- binary X
    • 11143 seconds -- binary Y
    • 21018 seconds -- binary Z
    Finally, the graph for row insert rate over time. Note that the graphs for binaries Y and Z don't extend to the right because they inserted the 3.3M rows much faster.


  2. Did you know that Rackspace has a cloud offering? I didn't. The name is Mosso.

    Someone from Rackspace/Mosso on the drizzle-discuss mailing list offered to hire a person full time to work on Drizzle. Curious? Find the post on the mailing list.

    Drizzle has a lot of potential for making it easier to run a DBMS server in the cloud. A few things need to be done differently from traditional MySQL replication. Drizzle has started over and has removed the code inherited from MySQL. Their focus is a clean API (right, Jay?). There should be a lot of interesting work that can be done.

    Surely there must be a better way to get your point across. I assume this was meant as an insult, because you don't appear to be too hippie-ish yourself. But if you really are the expert in open-source business that you claim to be, you can probably do better than describing part of the MySQL community as an open-source hippie commune that displays hippie-esque tendencies, unless they self-identify as that. Although that will be a fair description once we start holding the user conferences at Burning Man and Country Fair.

    I haven't linked the blog in question because I don't want to promote it.

  4. Slides for my talks. Code described here is in the v3 Google patch.


  5. In his keynote Baron reminded us that we need to focus on what we can do to improve community MySQL rather than wait for things to get done by the corporate owners. What will you do?

    Many of us will continue to add high-end features to MySQL. It would be great if those features make it into an official MySQL release. It will be a great business opportunity for the community if they do not.

    In the short term, I have some things to do:
    • run IO bound tests (iibench) for PBXT and provide the results to the PBXT team with a comparison to the v3 Google patch
    • run IO bound tests (iibench) for XtraDB and provide the results to Percona with a comparison to the v3 Google patch
    • publish more documentation for features in the v3 Google patch (support for roles, more changes to improve IO performance, more details on row-change logging)
    • read the docs and evaluate embedded InnoDB 
    Other people on my team also plan to share more details and code:
    • Ben is working on a backport of the pool-of-threads code to MySQL 5.0. While the backport itself to 5.0.37 might not help those using 5.1 or recent 5.0 versions, we have also fixed the SMP performance problems in the pool-of-threads code and that change is isolated to a single file. Others will be able to use it (but only on Linux as it uses epoll directly).
    • Justin may publish a patch for a recent version of 5.0 that only includes the changes for global group IDs, binlog event checksums and crash safe replication that works for all storage engines.
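    To illustrate the pool-of-threads idea mentioned above: the design uses one epoll loop to watch many client connections and hand ready ones to a small worker pool, instead of dedicating a thread to each connection. The sketch below is my own minimal illustration in Python (Linux-only, like the backport, since it uses epoll directly); it is not the actual MySQL code.

    ```python
    import os
    import select

    def poll_ready(fds, handle_ready, timeout=1.0):
        """Wait for readable fds and run handle_ready on each that is ready.

        In a real pool-of-threads server, handle_ready would enqueue the
        connection for one of a fixed number of worker threads.
        """
        ep = select.epoll()  # Linux-only event-notification interface
        try:
            for fd in fds:
                ep.register(fd, select.EPOLLIN)
            return [handle_ready(fd) for fd, ev in ep.poll(timeout)
                    if ev & select.EPOLLIN]
        finally:
            ep.close()

    if __name__ == "__main__":
        r, w = os.pipe()
        os.write(w, b"SELECT 1")  # a pending "request" on one connection
        print(poll_ready([r], lambda fd: os.read(fd, 64)))
    ```

    The point of the design is that thousands of idle connections cost almost nothing: only the connections with pending work consume a worker thread.
    
    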
    Much of the big patch is not easy to consume. But there is an answer to that. If you really want the feature then you can hire someone to port it to a recent 5.0 or 5.1 release. Rumor has it that Percona has done just that with several features. This makes me happy and proud.

    InnoDB has continued to add valuable features to their releases. The most recent is embedded InnoDB. It can be used to do some very interesting things. But first I must read the docs. They have also added a lot of new functionality to the 5.1 branch via the InnoDB plugin. This includes fast index creation, compression and SMP performance improvements. This is a big deal as the standard response and practice from MySQL is that new features cannot go into a production branch.

  6. There were several new storage engine vendors at the MySQL Conference. I spoke with people from Virident, Tokutek and Schooner at length about their technology. Their products are impressive and I look forward to more details on performance from them including read-intensive and write-intensive workloads. Tokutek has already published performance results on an insert intensive workload and then worked with me to improve the InnoDB results and improve the test code so others can run it. The code and results are here.

    One test I want all of them to run is to run a write-intensive workload so that InnoDB accumulates many dirty pages in the buffer pool and many entries in the insert buffer, kill mysqld and then determine how long it takes the server to perform crash recovery. This should be compared between InnoDB on commodity hardware, InnoDB on Virident and Schooner hardware and TokuDB on commodity hardware. I suspect that the results will be impressive for the new storage engines.
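    The crash-recovery test above is easy to script. The sketch below is only an outline of the measurement, assuming a generic deployment; the exact commands for loading data and starting mysqld are deployment-specific and shown here as hypothetical comments.

    ```python
    import subprocess
    import time

    def time_command(cmd):
        """Run a shell command and return (elapsed_seconds, returncode)."""
        start = time.monotonic()
        rc = subprocess.call(cmd, shell=True)
        return time.monotonic() - start, rc

    # Hypothetical procedure (commands are assumptions, not exact steps):
    #   1. run the write-intensive load until many dirty pages and insert
    #      buffer entries accumulate
    #   2. subprocess.call("kill -9 $(pidof mysqld)", shell=True)
    #   3. secs, rc = time_command("mysqld --defaults-file=my.cnf ...")
    #      crash recovery runs during startup, so secs approximates the
    #      recovery time to compare across engines and hardware
    ```
    
    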

    I use the term rocket science because a lot of vendors will have you believe that they have something special. In this case, I believe that each of these vendors really does have something special. Of course, more results will help us understand what they can do. Each of them has also chosen a path that doesn't require a huge investment to build a product: they have limited their software and hardware investments to areas where they have a lot of value to add, which makes it more likely that they can deliver on their promises.
    • TokuDB is software-only. But this is really clever software. They have implemented a new algorithm that significantly reduces random IO for write-intensive workloads. There have been earlier algorithms that do this, such as Log-Structured Merge Trees; I have even published a paper on this at VLDB. But TokuDB may be much better than previously known approaches.
    • most of the hardware expense for Virident is isolated to one component that implements industry standard interfaces and can plug into commodity servers. Their software investment is focused on improving pieces of MySQL/InnoDB to leverage their hardware. They have been able to improve on the work of others in the InnoDB developer community.
    • Schooner uses mostly commodity hardware with value-added in the integration of that hardware. Their software investment in MySQL also appears to be focused on improving pieces of it rather than replacing it and they have been able to improve on the work of others.
    TokuDB allows a small server to handle a much larger workload. This has many benefits including reduced power consumption and less need to shard or add shards to a large MySQL deployment. I think they also use much less disk space than InnoDB. Their technical staff explained the math behind the algorithms that justify their performance. Math is hard so this took some time, but eventually I kind of understood and I believe in their results. Their approach will enable many more optimizations in the future.
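    The write-optimized idea can be sketched with a toy LSM-style structure. To be clear, this is not TokuDB's actual algorithm; it only illustrates why buffering inserts turns many random writes into a few sequential ones.

    ```python
    import heapq

    class TinyLSM:
        """Toy write-optimized store, loosely in the spirit of LSM trees."""

        def __init__(self, buffer_limit=4):
            self.buffer_limit = buffer_limit
            self.memtable = {}        # in-memory buffer of recent changes
            self.runs = []            # sorted "on-disk" runs, oldest first
            self.sequential_flushes = 0

        def put(self, key, value):
            self.memtable[key] = value
            if len(self.memtable) >= self.buffer_limit:
                self.flush()

        def flush(self):
            # One sequential write covers buffer_limit inserts, instead of
            # one random leaf write per insert as in a B-tree.
            if self.memtable:
                self.runs.append(sorted(self.memtable.items()))
                self.memtable.clear()
                self.sequential_flushes += 1

        def get(self, key):
            if key in self.memtable:
                return self.memtable[key]
            for run in reversed(self.runs):   # newest run wins
                for k, v in run:
                    if k == key:
                        return v
            return None

        def compact(self):
            # Merge all runs in one sequential pass; for equal keys,
            # heapq.merge is stable, so later runs overwrite earlier ones.
            merged = dict(heapq.merge(*self.runs, key=lambda kv: kv[0]))
            self.runs = [sorted(merged.items())] if merged else []
    ```

    The trade-off is that reads may have to check several runs, which is why these structures periodically compact; the math behind bounding that read cost is the part that took me a while to follow.
    
    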

    The servers from Virident and Schooner are optimized for InnoDB, so it should be easy for existing users to try them out. I expect ridiculously high throughput results from both of them. Schooner hardware is easier to understand as it provides much better IO performance (and many other benefits). They also appear to have designed a balanced system so that peak and actual performance won't be too far apart. Oracle has done this with the Exadata machine and it is very nice to see a similar effort from Schooner.

    Virident uses NOR Flash to provide fast access with byte-level accessing (as opposed to reading a disk page at a time). This takes more time to understand. It is almost as if they have reduced the InnoDB page size to the size of a row, so much less data is transferred when reading rows randomly. MySQL loves to access rows randomly and reading less data means less effort is wasted and there should be less contention on shared resources with many-core and multi-core servers.

  7. We added support for row-change logging to MySQL 5.0. The logged data is similar to row-based replication with changes to the output that make it much easier to parse. Gene Pang describes this work at 2pm at the conference.

    What might be done with this data?
    • replicate row changes to a data store that is not MySQL (Teradata, HBase/Hypertable, memcached)
    • materialized view maintenance
    • change notification
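    The uses above all amount to replaying logged row changes into some other system. The post doesn't show the log format, so the sketch below assumes a hypothetical tab-separated line per change (EVENT, table, primary key, then col=value pairs) and replays it into a dict standing in for a non-MySQL store such as memcached.

    ```python
    def apply_row_change(store, line):
        """Apply one logged row change to store; returns store.

        Assumed (hypothetical) line format:
            EVENT \t table \t primary_key \t col=value ...
        """
        event, table, pk, *cols = line.rstrip("\n").split("\t")
        key = f"{table}:{pk}"
        if event == "DELETE":
            store.pop(key, None)
        else:  # INSERT, or UPDATE merging column values into the row
            row = dict(store.get(key, {}))
            for col in cols:
                name, _, value = col.partition("=")
                row[name] = value
            store[key] = row
        return store

    if __name__ == "__main__":
        s = {}
        apply_row_change(s, "INSERT\tusers\t7\tname=ada\tcity=sf")
        apply_row_change(s, "UPDATE\tusers\t7\tcity=nyc")
        print(s)  # {'users:7': {'name': 'ada', 'city': 'nyc'}}
    ```

    A consumer like this is what makes the "much easier to parse" property of the output matter: the downstream code stays a few lines instead of a binlog parser.
    
    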
    And I talk at the Percona Performance Conference at 10:50am today on the InnoDB IO architecture.

    Calpont has a talk on their MPP column-store storage engine for MySQL at 2PM today. The talk title is Open Source Columnar Storage Engine. It sounds interesting, especially if the source will be available, since then many people can try it out. But the source isn't available today. Where is the source?

    Note, Calpont doesn't mind that I am asking about this in public.

    Other questions I have include:
    1. Does it implement the condition pushdown interface?
    2. Will it implement the batch key access interface?

  9. Justin and Ben talk today at 4:25pm on features for SMP performance and high availability.

    Ben is an expert on InnoDB internals related to SMP performance. He designed and implemented the faster rw-mutex changes that are now in the 1.0.3 InnoDB plugin and MySQL 5.4 and have made MySQL much faster on SMP servers. More recently he changed InnoDB to significantly reduce mutex contention on the transaction log and buffer pool mutexes. This makes InnoDB 20% faster on sysbench and other read-write workloads. Right now he is finishing the backport of the pool-of-threads code from MySQL 6 to 5.0 and making it scale on SMP.

    Justin is a replication expert. He added support for global transaction IDs to automate slave failover, made replication slaves crash-safe, added checksums for binlog events and fixed many bugs in replication.

    They both have done very interesting work. And we don't just build these features, we also run them in production soon after adding them.

    How does InnoDB do on high-IOPs servers? Thanks to SSD, many of us will soon have such servers. I will provide more details in my talks.
    The summary is that InnoDB is very effective at using the read capacity of a high-IOPs server. It has problems using enough of the write capacity. Fortunately, this is InnoDB, so the problems are easy to fix. Many of the fixes are in the v3 Google patch. Many are also in the Percona patches and builds, although there may be one fix in the v3 Google patch that Percona has yet to implement.

    Note that I may mention the P word a few times in my talk today and tomorrow.
