Saturday, May 30, 2009
JavaOne and the Open HA Cluster Summit
Are you going to JavaOne? I am not, but I will be at the Open HA Cluster Summit tomorrow, where I will be part of a panel session on HA. I guess they needed someone with the HA-light perspective -- that is, how do you get a highly available service when you don't get to use HA components like MySQL Cluster? A lot of interesting work remains to be done to make this possible with regular MySQL. Projects like MMM, Tungsten and the Google patch with global transaction IDs are pieces that might eventually provide a complete solution. There is also work underway in the replication team at MySQL/Sun. They may even be building the integrated solution for MySQL Enterprise.
Thursday, May 28, 2009
InnoDB performance TODO list
These are my plans for making InnoDB faster on SMP and high-IOPS servers. I think we can double throughput at high levels of concurrency.
Future work:
- Reduce the size of mutex and rw-lock structures
- Reduce contention on the sync array mutex
- Reduce contention on kernel_mutex
- Reduce contention on commit_prepare_mutex
- Reduce the number of mutex lock/unlock calls used when a thread is put on the sync array
- Name all events, rw-locks and mutexes in InnoDB to make contention statistics output useful
- Add optional support to time all operations that may block
- Convert dulint to native 64-bit integer types
- Make BUF_READ_AHEAD_AREA a compile-time constant
- Prevent full table scans from wiping out the InnoDB buffer cache
- Make prefetching smarter
- Get feedback from Dimitri, Domas, Mikael and Percona
- Use prefetch with MRR/BKA to get parallel IO in InnoDB
- Investigate larger doublewrite buffer to allow for more concurrent IOs
- Make InnoDB work with a 4KB page size
- Make trx_purge() faster when called by the main background thread
- Use crc32 for InnoDB page checksums when there is hardware support, or otherwise make the checksum faster
- Reduce the per-page overhead for sync objects
- Repeat
- Add my.cnf options to disable InnoDB prefetch reads
- Put more output in SHOW INNODB STATUS and SHOW STATUS
- Reduce the overhead from buf_flush_free_margin()
- Change background IO threads to use available IO capacity
- Use more IO to merge insert buffer records when the insert buffer is full
- Fix mutex contention for the HEAP engine
- Fix mutex contention for the MyISAM engine
- Fix mutex contention for the query cache
- Give priority (CPU, disk) to the replication SQL thread to minimize replication delay.
- Push changes for --oltp-secondary-index to public sysbench branch
- Add support to sysbench fileio for transaction log and doublewrite buffer IO patterns
InnoDB checksum performance
Once again Domas is unhappy with some aspect of Innodb performance and doing crazy things with gdb to tune it. I made it faster by changing the checksum code to process one 32-bit word at a time rather than one byte at a time. This will be in a future Google patch and is enabled with the parameter innodb_fast_checksum. This is not compatible with the old checksum so you must dump and reload the database to use it.
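The byte-versus-word change can be illustrated with a minimal C sketch. It is not the code in buf_calc_page_new_checksum: the mixing step below is an FNV-style hash chosen only for this example, and the sketch does not model which page header and trailer fields InnoDB leaves out of the checksum. It shows why the word version is cheaper (a 16KB page needs 4096 loop iterations instead of 16384) and why it is incompatible with the old one: the two functions compute different values for the same page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mixing step for this sketch (FNV-1a style). It is NOT the
 * fold function InnoDB actually uses. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    return (h ^ v) * 16777619u;
}

/* Old style: one loop iteration (and one mix) per byte. */
uint32_t checksum_bytewise(const unsigned char *page, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++)
        h = mix(h, page[i]);
    return h;
}

/* Fast style: one loop iteration per 32-bit word, so a quarter of the
 * iterations. Each word is assembled in little-endian order so the result
 * does not depend on the host byte order. */
uint32_t checksum_wordwise(const unsigned char *page, size_t len)
{
    assert(len % 4 == 0);
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i += 4) {
        uint32_t w = (uint32_t)page[i]
                   | ((uint32_t)page[i + 1] << 8)
                   | ((uint32_t)page[i + 2] << 16)
                   | ((uint32_t)page[i + 3] << 24);
        h = mix(h, w);
    }
    return h;
}
```

Building each word from bytes in a fixed order matters for a checksum stored on disk; modern compilers can usually turn that expression into a single 32-bit load on little-endian hardware.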
I measured the benefit using the insert benchmark from Tokutek on a server that can do a lot of IO. CPU overheads are measured using oprofile. The data below lists the percentage of time for the top 4 functions in mysqld. The checksum is computed in buf_calc_page_new_checksum. By using the fast checksum, the checksum overhead drops from 33.6% to 22.1% for gcc -O2 and from 31.6% to 17.3% for gcc -O3.
Overhead for gcc -O2
Using the original checksum code:
- 33.6% - buf_calc_page_new_checksum
- 10.4% - memcpy
- 4.4% - os_aio_simulated_handle
- 4.3% - rec_get_offsets_func
Using the fast checksum code:
- 22.1% - buf_calc_page_new_checksum
- 12.1% - memcpy
- 5.1% - rec_get_offsets_func
- 4.9% - os_aio_simulated_handle
Overhead for gcc -O3
Using the original checksum code:
- 31.6% - buf_calc_page_new_checksum
- 12.6% - memcpy
- 5.8% - rec_get_offsets_func
- 2.6% - os_aio_simulated_handle
Using the fast checksum code:
- 17.3% - buf_calc_page_new_checksum
- 13.6% - memcpy
- 6.8% - rec_get_offsets_func
- 2.0% - os_aio_simulated_handle
Tuesday, May 26, 2009
InnoDB IO performance and the v4 Google patch
Friday, May 22, 2009
A good reason to use innodb_file_per_table -- per-table IO statistics
I added support for per-tablespace IO statistics to InnoDB. This also provides per-table IO statistics when innodb_file_per_table is used. The stats are listed in SHOW INNODB STATUS, and the text below is the output when tpcc-mysql is run -- pardon the formatting. The code should appear at code.google.com real soon now.
File IO statistics
./test/warehouse.ibd 10 -- read: 4 requests, 4 pages, 0.00 secs, 0.72 msecs/r, write: 3 requests, 3 pages, 0.00 secs, 1.43 msecs/r
./ibdata1 0 -- read: 30 requests, 203 pages, 0.03 secs, 0.99 msecs/r, write: 124 requests, 3020 pages, 0.74 secs, 5.93 msecs/r
./test/orders.ibd 29 -- read: 8490 requests, 10033 pages, 8.48 secs, 1.00 msecs/r, write: 6754 requests, 12728 pages, 34.27 secs, 5.07 msecs/r
./test/customer.ibd 28 -- read: 33901 requests, 34226 pages, 32.05 secs, 0.95 msecs/r, write: 11224 requests, 11850 pages, 43.17 secs, 3.85 msecs/r
./test/stock.ibd 27 -- read: 151957 requests, 176913 pages, 256.89 secs, 1.69 msecs/r, write: 41475 requests, 52199 pages, 220.43 secs, 5.31 msecs/r
./test/order_line.ibd 25 -- read: 14239 requests, 14876 pages, 13.10 secs, 0.92 msecs/r, write: 11610 requests, 38413 pages, 45.01 secs, 3.88 msecs/r
./test/new_orders.ibd 22 -- read: 2023 requests, 2316 pages, 1.80 secs, 0.89 msecs/r, write: 1213 requests, 7004 pages, 7.58 secs, 6.25 msecs/r
./test/history.ibd 21 -- read: 5740 requests, 7711 pages, 5.64 secs, 0.98 msecs/r, write: 4938 requests, 22754 pages, 27.97 secs, 5.66 msecs/r
./test/district.ibd 18 -- read: 15 requests, 15 pages, 0.01 secs, 0.78 msecs/r, write: 8 requests, 31 pages, 0.02 secs, 3.02 msecs/r
./test/item.ibd 16 -- read: 757 requests, 904 pages, 0.67 secs, 0.89 msecs/r, write: 0 requests, 0 pages, 0.00 secs, 0.00 msecs/r
./ib_logfile0 4294967280 -- read: 6 requests, 9 pages, 0.00 secs, 0.02 msecs/r, write: 25630 requests, 25877 pages, 0.56 secs, 0.02 msecs/r
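Per-tablespace counters like these can be sketched in a few lines of C. This is a minimal sketch, assuming each tablespace keeps request, page and elapsed-time counters for reads and writes; the struct and function names (io_counters, tablespace_io_stats, io_record, io_format_line) are hypothetical and not taken from the Google patch. The msecs/r column appears to be total seconds divided by the number of requests, which matches the rows above (for example, 8.48 secs over 8490 reads of orders.ibd is about 1.00 msecs/r), and pages are presumably at least the request count because one request can cover several adjacent pages.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counters for one direction (reads or writes) of one
 * tablespace. Names are illustrative, not from the Google patch. */
typedef struct {
    unsigned long requests; /* IO requests issued */
    unsigned long pages;    /* pages transferred; a request may cover several */
    double secs;            /* total time spent on those requests */
} io_counters;

typedef struct {
    const char *name;       /* file name, e.g. "./test/orders.ibd" */
    unsigned long space_id; /* tablespace id printed after the name */
    io_counters read;
    io_counters write;
} tablespace_io_stats;

/* Account for one completed request of n_pages pages. */
void io_record(io_counters *c, unsigned long n_pages, double elapsed_secs)
{
    c->requests++;
    c->pages += n_pages;
    c->secs += elapsed_secs;
}

/* Average latency in milliseconds per request, 0 if nothing was done. */
double io_msecs_per_request(const io_counters *c)
{
    return c->requests ? c->secs * 1000.0 / (double)c->requests : 0.0;
}

/* Format one line in the same layout as the SHOW INNODB STATUS output. */
int io_format_line(char *buf, size_t len, const tablespace_io_stats *s)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "%s %lu -- read: %lu requests, %lu pages, %.2f secs, %.2f msecs/r, "
                    "write: %lu requests, %lu pages, %.2f secs, %.2f msecs/r",
                    s->name, s->space_id,
                    s->read.requests, s->read.pages, s->read.secs,
                    io_msecs_per_request(&s->read),
                    s->write.requests, s->write.pages, s->write.secs,
                    io_msecs_per_request(&s->write));
}
```

Because the counters are kept per tablespace, they only become per-table numbers when each table has its own .ibd file; with a shared ibdata1, all tables are reported on one line.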
Tuesday, May 12, 2009
Patch for global transaction IDs, binlog event checksums and crash-safe replication state
Justin just added a patch for global transaction IDs, binlog event checksums and crash-safe replication state. It is at code.google.com. This patch is based on MySQL 5.0.68, so Justin did a bit of work to port code forward from the version we use (5.0.37).
I assume this also includes support for crash-safe replication state. That replaces transactional replication, but it works for all storage engines.
Percona has ported a few of the replication features from previous Google patches. Hopefully, they are interested in these changes. MySQL has semi-sync replication in 6.0 with a promise to backport to 5.4. Perhaps these changes will end up there too.

