I have taken the Tokutek challenge and run iibench. This test matches behavior that occasionally matters to me: reloading all tables of a database via INSERT statements, which I do because ALTER TABLE requires a copy of the table to be made and all indexes to be rebuilt. The iibench workload has one table with a primary key index and two secondary indexes. 1B rows are inserted in primary key order. The other indexed columns have random values.
How fast should this run on InnoDB?
This analysis assumes that the table is large enough that the data does not fit in the buffer cache. On a typical DBMS that updates rows in place and implements a NO FORCE and STEAL policy for the buffer manager, each insert is likely to require 1 read and 1 write per index. The read fetches the leaf block that must be updated and the write flushes a dirty page from the buffer cache to make room for the block being read.
The primary key leaf blocks should not require reads as the inserts are done in primary key order. The writes for the row should be amortized over many inserts for both index-organized and heap tables.
Each insert therefore needs 4 IOs (1 read and 1 write for each of the two secondary indexes) once the table is much larger than the buffer cache. If I run with a server that can do 1000 IOPs, then I expect to insert 250 rows/second.
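The arithmetic behind that estimate can be written down directly. This is just the back-of-envelope calculation from the text; the IOPs number is the assumed device capability, not a measurement:

```python
# Back-of-envelope estimate: with no insert buffer, each insert pays
# 1 leaf read + 1 dirty-page write per secondary index once the
# indexes no longer fit in the buffer cache.
SECONDARY_INDEXES = 2
IOS_PER_INDEX = 2      # 1 leaf read + 1 dirty-page write
SERVER_IOPS = 1000     # assumed server IO capability

ios_per_insert = SECONDARY_INDEXES * IOS_PER_INDEX
rows_per_second = SERVER_IOPS // ios_per_insert
print(ios_per_insert, rows_per_second)  # prints: 4 250
```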
Alas, Heikki is clever. InnoDB has an insert buffer. IO for secondary index maintenance after UPDATE and INSERT statements is deferred when leaf blocks are not in the buffer cache. Instead, a change record is written to the insert buffer in (index-id, key) order. This clusters changes for the same leaf blocks. Changes are applied from the insert buffer by a background thread.
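The benefit of buffering in (index-id, key) order can be seen with a toy model. This is a minimal sketch, not InnoDB's implementation: the page size, key-to-leaf mapping, and entry format are all made up, but it shows why sorting buffered changes lets one leaf-page read and write absorb several entries:

```python
# Toy model of change buffering: secondary-index changes accumulate
# as (index_id, key) entries; applying them in sorted order clusters
# changes that land on the same leaf page, so each touched page
# costs one read + one write no matter how many entries it absorbs.
from collections import defaultdict

KEYS_PER_LEAF = 4  # hypothetical leaf-page fan-out

def leaf_page(index_id, key):
    # Hypothetical mapping from a key to its leaf page.
    return (index_id, key // KEYS_PER_LEAF)

def apply_buffer(changes):
    # Group buffered entries by destination leaf page.
    pages = defaultdict(list)
    for index_id, key in sorted(changes):
        pages[leaf_page(index_id, key)].append(key)
    return pages

changes = [(1, 7), (2, 3), (1, 5), (1, 6), (2, 2)]
pages = apply_buffer(changes)
# 5 buffered changes touch only 2 leaf pages.
print(len(changes), len(pages))  # prints: 5 2
```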
On real servers that I watch in production this provides a 4:1 or 8:1 reduction in IO for secondary index maintenance. But what impact does it have on iibench? If the insert buffer reduces IO by 8:1, then the row insert rate should be 8X faster -- 2000 rows/second rather than 250.
The insert buffer helps when it is large enough to buffer multiple changes per index leaf block. We can construct a scenario where the insert buffer does not help, but it works for the iibench case. After 1B rows have been inserted into the iibench test table, the secondary indexes use ~6M pages and each index entry requires ~50 bytes. The insert buffer can use half of the InnoDB buffer pool, which is several hundred thousand pages for a buffer pool close to 10GB. From the output of SHOW INNODB STATUS, about 50% of the space allocated for the insert buffer was used for entries. The result is that the insert buffer can store ~50M entries when the buffer pool is near 10GB for the iibench test. With ~50M insert buffer entries for 6M index pages there will be multiple entries per leaf page, so the insert buffer should help performance. I want to use (# insert buffer entries / # index pages) as the estimate for the IO reduction rate, but tests I have run suggest that other factors are involved.
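The sizing above can be checked with the same numbers the text uses (10GB buffer pool, half usable by the insert buffer, ~50% of that space holding entries, ~50 bytes per entry, ~6M secondary-index leaf pages). Note the resulting entries-per-page ratio lands near the 8:1 reduction seen in production:

```python
# Rough sizing of the insert buffer for this test, using the
# figures quoted in the text. All inputs are approximations.
GiB = 1 << 30
buffer_pool = 10 * GiB
ibuf_space = buffer_pool // 2   # insert buffer may use half the pool
ibuf_used = ibuf_space // 2     # ~50% of that space held entries
entries = ibuf_used // 50       # ~50 bytes per entry
index_pages = 6_000_000         # ~6M secondary-index pages

print(f"{entries:,} entries, {entries / index_pages:.1f} per leaf page")
# prints: 53,687,091 entries, 8.9 per leaf page
```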
The test ran for 382431 seconds (6373 minutes, or 106 hours and 14 minutes, or 4.42 days). The results are close to what my hand-waving suggests can be achieved. The row insert rate near the end of this test was ~1800/second. The results are from a server with:
- 8 CPU cores
- 16GB RAM
- 10 disks using SW RAID 0
- InnoDB with the patch to use multiple background IO threads
- innodb_read_io_threads=4, innodb_write_io_threads=4 (a Google patch)
- innodb_log_files_in_group=3, innodb_log_file_size=1300M
- innodb_io_capacity (another Google patch)