Are you going to JavaOne? I am not, but I will be at the Open HA Cluster Summit tomorrow, where I will be part of a panel session on HA. I guess they needed someone with the HA-light perspective: how do you get a highly available service when you don't get to use HA components like MySQL Cluster? A lot of interesting work remains to be done to make this possible with regular MySQL. Projects like MMM, Tungsten and the Google patch with global transaction IDs are pieces that might eventually provide a complete solution. There is work underway at MySQL/Sun in the replication team as well. They may even be building the integrated solution for MySQL Enterprise.
-
-
These are my plans for making InnoDB faster on SMP and high-IOPS servers. I think we can double throughput at high levels of concurrency.
Future work:
- Reduce the size of mutex and rw-lock structures
- Reduce contention on the sync array mutex
- Reduce contention on kernel_mutex
- Reduce contention on commit_prepare_mutex
- Reduce the number of mutex lock/unlock calls used when a thread is put on the sync array
- Name all events, rw-locks and mutexes in InnoDB to make contention statistics output useful
- Add optional support to time all operations that may block
- Change dulint to native 64-bit integer types
- Make BUF_READ_AHEAD_AREA a compile-time constant
- Prevent full table scans from wiping out the InnoDB buffer cache
- Make prefetching smarter
- Get feedback from Dimitri, Domas, Mikael and Percona
- Use prefetch with MRR/BKA to get parallel IO in InnoDB
- Investigate larger doublewrite buffer to allow for more concurrent IOs
- Make InnoDB work with a 4KB page size
- Make trx_purge() faster when called by the main background thread
- Use crc32 for InnoDB page checksums with hardware support, or otherwise make the checksum faster
- Reduce the per-page overhead for sync objects
- Repeat
- Add my.cnf options to disable InnoDB prefetch reads
- Put more output in SHOW INNODB STATUS and SHOW STATUS
- Reduce the overhead from buf_flush_free_margin()
- Change background IO threads to use available IO capacity
- Use more IO to merge insert buffer records when the insert buffer is full
- Fix mutex contention for the HEAP engine
- Fix mutex contention for the MyISAM engine
- Fix mutex contention for the query cache
- Give priority (CPU, disk) to the replication SQL thread to minimize replication delay.
- Push changes for --oltp-secondary-index to public sysbench branch
- Add support to sysbench fileio for transaction log and doublewrite buffer IO patterns
-
Once again Domas is unhappy with some aspect of InnoDB performance and is doing crazy things with gdb to tune it. I made it faster by changing the checksum code to process one 32-bit word at a time rather than one byte at a time. This will be in a future Google patch and is enabled with the parameter innodb_fast_checksum. The new checksum is not compatible with the old one, so you must dump and reload the database to use it.
I measured the benefit using the insert benchmark from Tokutek on a server that can do a lot of IO. CPU overheads are measured using oprofile. The data below lists the percentage of time for the top 4 functions in mysqld. The checksum is computed in buf_calc_page_new_checksum. By using the fast checksum, the checksum overhead drops from 33.6% to 22.1% for gcc -O2 and from 31.6% to 17.3% for gcc -O3.
Overhead for gcc -O2
Using the original checksum code:
- 33.6% - buf_calc_page_new_checksum
- 10.4% - memcpy
- 4.4% - os_aio_simulated_handle
- 4.3% - rec_get_offsets_func
Using the fast checksum code:
- 22.1% - buf_calc_page_new_checksum
- 12.1% - memcpy
- 5.1% - rec_get_offsets_func
- 4.9% - os_aio_simulated_handle
Overhead for gcc -O3
Using the original checksum code:
- 31.6% - buf_calc_page_new_checksum
- 12.6% - memcpy
- 5.8% - rec_get_offsets_func
- 2.6% - os_aio_simulated_handle
Using the fast checksum code:
- 17.3% - buf_calc_page_new_checksum
- 13.6% - memcpy
- 6.8% - rec_get_offsets_func
- 2.0% - os_aio_simulated_handle
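To make the byte-versus-word idea concrete, here is a rough sketch in Python. It is not the InnoDB code and not the code from the patch: the mixing step is an FNV-1a style stand-in that I picked for illustration, and the function names are made up. It only shows why consuming 32 bits per step needs a quarter of the loop iterations, and why the two results differ for the same page, which is the reason a dump and reload is needed.

```python
import struct

PAGE_SIZE = 16 * 1024      # InnoDB's default page size
MASK32 = 0xFFFFFFFF
FNV_OFFSET = 0x811C9DC5    # FNV-1a 32-bit offset basis (stand-in, not InnoDB's hash)
FNV_PRIME = 0x01000193     # FNV-1a 32-bit prime


def fold(acc, value):
    """Mix one input value into a 32-bit accumulator."""
    return ((acc ^ value) * FNV_PRIME) & MASK32


def checksum_bytewise(page):
    """One mixing step per byte: 16384 steps for a 16KB page."""
    acc = FNV_OFFSET
    for b in page:
        acc = fold(acc, b)
    return acc


def checksum_wordwise(page):
    """One mixing step per little-endian 32-bit word: 4096 steps for a 16KB page."""
    if len(page) % 4 != 0:
        raise ValueError("page length must be a multiple of 4 bytes")
    acc = FNV_OFFSET
    for (word,) in struct.iter_unpack("<I", page):
        acc = fold(acc, word)
    return acc


# A fake page with a repeating byte pattern, just to have some input.
page = bytes(range(256)) * (PAGE_SIZE // 256)
print(hex(checksum_bytewise(page)), hex(checksum_wordwise(page)))
```

The two functions read the same bytes but feed them to the mixing step in different units, so a page written with one checksum will not verify with the other.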
-
I added support for per-tablespace IO statistics to InnoDB. This also provides per-table IO statistics when innodb_file_per_table is used. The stats are listed in SHOW INNODB STATUS and the text below is output when tpcc-mysql is run -- pardon the formatting. The code should appear at code.google.com real soon now.
File IO statistics
./test/warehouse.ibd 10 -- read: 4 requests, 4 pages, 0.00 secs, 0.72 msecs/r, write: 3 requests, 3 pages, 0.00 secs, 1.43 msecs/r
./ibdata1 0 -- read: 30 requests, 203 pages, 0.03 secs, 0.99 msecs/r, write: 124 requests, 3020 pages, 0.74 secs, 5.93 msecs/r
./test/orders.ibd 29 -- read: 8490 requests, 10033 pages, 8.48 secs, 1.00 msecs/r, write: 6754 requests, 12728 pages, 34.27 secs, 5.07 msecs/r
./test/customer.ibd 28 -- read: 33901 requests, 34226 pages, 32.05 secs, 0.95 msecs/r, write: 11224 requests, 11850 pages, 43.17 secs, 3.85 msecs/r
./test/stock.ibd 27 -- read: 151957 requests, 176913 pages, 256.89 secs, 1.69 msecs/r, write: 41475 requests, 52199 pages, 220.43 secs, 5.31 msecs/r
./test/order_line.ibd 25 -- read: 14239 requests, 14876 pages, 13.10 secs, 0.92 msecs/r, write: 11610 requests, 38413 pages, 45.01 secs, 3.88 msecs/r
./test/new_orders.ibd 22 -- read: 2023 requests, 2316 pages, 1.80 secs, 0.89 msecs/r, write: 1213 requests, 7004 pages, 7.58 secs, 6.25 msecs/r
./test/history.ibd 21 -- read: 5740 requests, 7711 pages, 5.64 secs, 0.98 msecs/r, write: 4938 requests, 22754 pages, 27.97 secs, 5.66 msecs/r
./test/district.ibd 18 -- read: 15 requests, 15 pages, 0.01 secs, 0.78 msecs/r, write: 8 requests, 31 pages, 0.02 secs, 3.02 msecs/r
./test/item.ibd 16 -- read: 757 requests, 904 pages, 0.67 secs, 0.89 msecs/r, write: 0 requests, 0 pages, 0.00 secs, 0.00 msecs/r
./ib_logfile0 4294967280 -- read: 6 requests, 9 pages, 0.00 secs, 0.02 msecs/r, write: 25630 requests, 25877 pages, 0.56 secs, 0.02 msecs/r
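If you want to rank files by IO time, the output is regular enough to parse. Below is a minimal sketch in Python, assuming each file's stats sit on a single line in exactly the layout shown above. The regular expression, the field names and the helper functions are my own for this example and are not part of the patch; the three sample lines are copied from the output above.

```python
import re

# Matches one per-file line of the "File IO statistics" output.
LINE_RE = re.compile(
    r"^(?P<path>\S+) (?P<space_id>\d+) -- "
    r"read: (?P<read_requests>\d+) requests, (?P<read_pages>\d+) pages, "
    r"(?P<read_secs>[\d.]+) secs, (?P<read_msecs_per_request>[\d.]+) msecs/r, "
    r"write: (?P<write_requests>\d+) requests, (?P<write_pages>\d+) pages, "
    r"(?P<write_secs>[\d.]+) secs, (?P<write_msecs_per_request>[\d.]+) msecs/r$"
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line is not a stats line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    row = {"path": m["path"]}
    for key, value in m.groupdict().items():
        if key == "path":
            continue
        # Timing fields are fractional; counts and the space id are integers.
        row[key] = float(value) if "secs" in key else int(value)
    return row


def total_secs(row):
    """Total time spent in reads and writes for one file."""
    return row["read_secs"] + row["write_secs"]


SAMPLE = [
    "./ibdata1 0 -- read: 30 requests, 203 pages, 0.03 secs, 0.99 msecs/r, "
    "write: 124 requests, 3020 pages, 0.74 secs, 5.93 msecs/r",
    "./test/orders.ibd 29 -- read: 8490 requests, 10033 pages, 8.48 secs, 1.00 msecs/r, "
    "write: 6754 requests, 12728 pages, 34.27 secs, 5.07 msecs/r",
    "./test/stock.ibd 27 -- read: 151957 requests, 176913 pages, 256.89 secs, 1.69 msecs/r, "
    "write: 41475 requests, 52199 pages, 220.43 secs, 5.31 msecs/r",
]

rows = [parse_line(line) for line in SAMPLE]
busiest = max(rows, key=total_secs)
print(busiest["path"], round(total_secs(busiest), 2))
```

For the three sample lines this picks stock.ibd, which matches what you see by eye in the full output: the stock table accounts for most of the IO time in this tpcc-mysql run.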
-
Justin just added a patch for global transaction IDs, binlog event checksums and crash-safe replication state. It is at code.google.com. This patch is based on MySQL 5.0.68, so Justin did a bit of work to port code forward from the version we use (5.0.37).
Well, I assume that it includes support for crash-safe replication state. This replaces transactional replication, but it works for all storage engines.
Percona has ported a few of the replication features from previous Google patches. Hopefully, they are interested in these changes. MySQL has semi-sync replication in 6.0 with a promise to backport to 5.4. Perhaps these changes will end up there too.