1. Are you going to JavaOne? I am not, but I will be at the Open HA Cluster Summit tomorrow. I will be part of a panel session on HA. I guess they needed someone with the HA-light perspective -- that is, how do you get a highly available service when you don't get to use HA components like MySQL Cluster? A lot of interesting work remains to be done to make this possible with regular MySQL. Projects like MMM, Tungsten and the Google patch with global transaction IDs are pieces that might eventually provide a complete solution. There is work underway in the replication team at MySQL/Sun as well. They may even be building the integrated solution for MySQL Enterprise.

  2. These are my plans for making InnoDB faster on SMP and high-IOPS servers. I think we can double throughput at high levels of concurrency.

    Future work:
    1. Reduce the size of mutex and rw-lock structures
    2. Reduce contention on the sync array mutex
    3. Reduce contention on kernel_mutex
    4. Reduce contention on commit_prepare_mutex
    5. Reduce the number of mutex lock/unlock calls used when a thread is put on the sync array
    6. Name all events, rw-locks and mutexes in InnoDB to make contention statistics output useful
    7. Add optional support to time all operations that may block (a sketch of the idea follows these lists)
    8. Replace dulint with native 64-bit integer types
    9. Make BUF_READ_AHEAD_AREA a compile-time constant
    10. Prevent full table scans from wiping out the InnoDB buffer cache
    11. Make prefetching smarter
    12. Get feedback from Dimitri, Domas, Mikael and Percona
    13. Use prefetch with MRR/BKA to get parallel IO in InnoDB
    14. Investigate larger doublewrite buffer to allow for more concurrent IOs
    15. Make InnoDB work with a 4KB page size
    16. Make trx_purge() faster when called by the main background thread
    17. Use CRC32 with hardware support for InnoDB page checksums, or otherwise make the checksum faster
    18. Reduce the per-page overhead for sync objects
    19. Repeat
    Current work:
    1. Add my.cnf options to disable InnoDB prefetch reads
    2. Put more output in SHOW INNODB STATUS and SHOW STATUS
    3. Reduce the overhead from buf_flush_free_margin()
    4. Change background IO threads to use available IO capacity
    5. Use more IO to merge insert buffer records when the insert buffer is full
    Non-InnoDB work:
    1. Fix mutex contention for the HEAP engine
    2. Fix mutex contention for the MyISAM engine
    3. Fix mutex contention for the query cache
    4. Give priority (CPU, disk) to the replication SQL thread to minimize replication delay.
    5. Push changes for --oltp-secondary-index to public sysbench branch
    6. Add support to sysbench fileio for transaction log and doublewrite buffer IO patterns
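
    To illustrate items 6 and 7 in the future-work list, here is a minimal sketch -- not InnoDB code -- of a named mutex with optional wait timing. The type and function names are assumptions made up for this example.

      /* Sketch only: wrap pthread mutex acquisition to record, per named
         mutex, how often and for how long callers block. */
      #include <pthread.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <sys/time.h>

      typedef struct {
          pthread_mutex_t mutex;
          const char     *name;        /* e.g. "kernel_mutex" */
          uint64_t        waits;       /* number of contended acquisitions */
          uint64_t        wait_usecs;  /* total time spent blocked */
      } named_mutex_t;

      static uint64_t now_usecs(void)
      {
          struct timeval tv;
          gettimeofday(&tv, NULL);
          return (uint64_t) tv.tv_sec * 1000000 + tv.tv_usec;
      }

      void named_mutex_init(named_mutex_t *m, const char *name)
      {
          pthread_mutex_init(&m->mutex, NULL);
          m->name = name;
          m->waits = 0;
          m->wait_usecs = 0;
      }

      void named_mutex_enter(named_mutex_t *m)
      {
          if (pthread_mutex_trylock(&m->mutex) == 0)
              return;                     /* fast path: no contention */

          uint64_t start = now_usecs();
          pthread_mutex_lock(&m->mutex);  /* slow path: we had to wait */
          m->waits++;                     /* safe: we hold the mutex now */
          m->wait_usecs += now_usecs() - start;
      }

      void named_mutex_report(const named_mutex_t *m)
      {
          printf("%s: %llu waits, %llu usecs waiting\n", m->name,
                 (unsigned long long) m->waits,
                 (unsigned long long) m->wait_usecs);
      }

    With every sync object carrying a name, contention output can report which mutex is hot instead of printing a bare address, and the timing can be disabled when its overhead matters.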

  3. Once again Domas is unhappy with some aspect of InnoDB performance and is doing crazy things with gdb to tune it. I made it faster by changing the checksum code to process one 32-bit word at a time rather than one byte at a time. This will be in a future Google patch and is enabled with the parameter innodb_fast_checksum. The fast checksum is not compatible with the old one, so you must dump and reload the database to use it.
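
    The sketch below -- not the actual InnoDB code -- shows the general idea: fold four bytes of the page per loop iteration instead of one, so the checksum loop runs a quarter as many times. The function names are made up for illustration.

      #include <stdint.h>
      #include <stddef.h>
      #include <string.h>

      /* Byte-at-a-time folding: one multiply-and-add per byte. */
      uint32_t checksum_by_byte(const unsigned char *page, size_t len)
      {
          uint32_t fold = 0;
          size_t i;
          for (i = 0; i < len; i++)
              fold = fold * 31 + page[i];
          return fold;
      }

      /* Word-at-a-time folding: one multiply-and-add per 4 bytes. The result
         differs from the byte-at-a-time version, which is why the two page
         checksum formats are incompatible on disk. */
      uint32_t checksum_by_word(const unsigned char *page, size_t len)
      {
          uint32_t fold = 0;
          size_t i = 0;
          for (; i + 4 <= len; i += 4) {
              uint32_t word;
              memcpy(&word, page + i, 4);   /* avoid unaligned loads */
              fold = fold * 31 + word;
          }
          for (; i < len; i++)              /* tail bytes, if any */
              fold = fold * 31 + page[i];
          return fold;
      }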

    I measured the benefit using the insert benchmark from Tokutek on a server that can do a lot of IO. CPU overheads are measured using oprofile. The data below lists the percentage of time for the top 4 functions in mysqld. The checksum is computed in buf_calc_page_new_checksum. By using the fast checksum, the checksum overhead drops from 33.6% to 22.1% for gcc -O2 and from 31.6% to 17.3% for gcc -O3.

    Overhead for gcc -O2

    Using the original checksum code:
    • 33.6% - buf_calc_page_new_checksum
    • 10.4% - memcpy
    • 4.4% - os_aio_simulated_handle
    • 4.3% - rec_get_offsets_func
    Using the fast checksum code:
    • 22.1% - buf_calc_page_new_checksum
    • 12.1% - memcpy
    • 5.1% - rec_get_offsets_func
    • 4.9% - os_aio_simulated_handle
    Overhead for gcc -O3

    Using the original checksum code:
    • 31.6% - buf_calc_page_new_checksum
    • 12.6% - memcpy
    • 5.8% - rec_get_offsets_func
    • 2.6% - os_aio_simulated_handle
    Using the fast checksum code:
    • 17.3% - buf_calc_page_new_checksum
    • 13.6% - memcpy
    • 6.8% - rec_get_offsets_func
    • 2.0% - os_aio_simulated_handle

  4. The v4 patch has been published. A description of the changes and performance results are here. I am still analyzing the results to make sure that I can explain performance differences and the lack of performance differences.

  5. I added support for per-tablespace IO statistics to InnoDB. This also provides per-table IO statistics when innodb_file_per_table is used. The stats are listed in SHOW INNODB STATUS and the text below is the output when tpcc-mysql is run -- pardon the formatting. A rough sketch of the per-file bookkeeping follows the output. The code should appear at code.google.com real soon now.

    File IO statistics
      ./test/warehouse.ibd 10 -- read: 4 requests, 4 pages, 0.00 secs, 0.72 msecs/r, write: 3 requests, 3 pages, 0.00 secs, 1.43 msecs/r
      ./ibdata1 0 -- read: 30 requests, 203 pages, 0.03 secs, 0.99 msecs/r, write: 124 requests, 3020 pages, 0.74 secs, 5.93 msecs/r
      ./test/orders.ibd 29 -- read: 8490 requests, 10033 pages, 8.48 secs, 1.00 msecs/r, write: 6754 requests, 12728 pages, 34.27 secs, 5.07 msecs/r
      ./test/customer.ibd 28 -- read: 33901 requests, 34226 pages, 32.05 secs, 0.95 msecs/r, write: 11224 requests, 11850 pages, 43.17 secs, 3.85 msecs/r
      ./test/stock.ibd 27 -- read: 151957 requests, 176913 pages, 256.89 secs, 1.69 msecs/r, write: 41475 requests, 52199 pages, 220.43 secs, 5.31 msecs/r
      ./test/order_line.ibd 25 -- read: 14239 requests, 14876 pages, 13.10 secs, 0.92 msecs/r, write: 11610 requests, 38413 pages, 45.01 secs, 3.88 msecs/r
      ./test/new_orders.ibd 22 -- read: 2023 requests, 2316 pages, 1.80 secs, 0.89 msecs/r, write: 1213 requests, 7004 pages, 7.58 secs, 6.25 msecs/r
      ./test/history.ibd 21 -- read: 5740 requests, 7711 pages, 5.64 secs, 0.98 msecs/r, write: 4938 requests, 22754 pages, 27.97 secs, 5.66 msecs/r
      ./test/district.ibd 18 -- read: 15 requests, 15 pages, 0.01 secs, 0.78 msecs/r, write: 8 requests, 31 pages, 0.02 secs, 3.02 msecs/r
      ./test/item.ibd 16 -- read: 757 requests, 904 pages, 0.67 secs, 0.89 msecs/r, write: 0 requests, 0 pages, 0.00 secs, 0.00 msecs/r
      ./ib_logfile0 4294967280 -- read: 6 requests, 9 pages, 0.00 secs, 0.02 msecs/r, write: 25630 requests, 25877 pages, 0.56 secs, 0.02 msecs/r
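
    A rough sketch -- not the published patch -- of the kind of per-file bookkeeping behind counters like these. The struct layout and function names are assumptions for illustration.

      #include <stdint.h>
      #include <stdio.h>

      typedef struct {
          const char *path;       /* e.g. "./test/stock.ibd" */
          uint32_t    space_id;   /* tablespace id printed after the path */
          uint64_t    read_requests, read_pages, read_usecs;
          uint64_t    write_requests, write_pages, write_usecs;
      } file_io_stats_t;

      /* Called when an IO request against the file completes. A request may
         cover more than one page, which is why requests and pages differ. */
      void file_io_stats_update(file_io_stats_t *s, int is_write,
                                uint64_t pages, uint64_t usecs)
      {
          if (is_write) {
              s->write_requests++;
              s->write_pages += pages;
              s->write_usecs += usecs;
          } else {
              s->read_requests++;
              s->read_pages += pages;
              s->read_usecs += usecs;
          }
      }

      /* Prints one line per file in the format shown above. */
      void file_io_stats_print(const file_io_stats_t *s)
      {
          double r_ms = s->read_requests ?
              (double) s->read_usecs / 1000.0 / (double) s->read_requests : 0.0;
          double w_ms = s->write_requests ?
              (double) s->write_usecs / 1000.0 / (double) s->write_requests : 0.0;

          printf("%s %u -- read: %llu requests, %llu pages, %.2f secs, "
                 "%.2f msecs/r, write: %llu requests, %llu pages, %.2f secs, "
                 "%.2f msecs/r\n",
                 s->path, s->space_id,
                 (unsigned long long) s->read_requests,
                 (unsigned long long) s->read_pages,
                 (double) s->read_usecs / 1e6, r_ms,
                 (unsigned long long) s->write_requests,
                 (unsigned long long) s->write_pages,
                 (double) s->write_usecs / 1e6, w_ms);
      }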

  6. Justin just added a patch for global transaction IDs, binlog event checksums and crash-safe replication state. It is at code.google.com. This patch is based on MySQL 5.0.68, so Justin did a bit of work to port code forward from the version we use (5.0.37).

    Well, I assume this includes support for crash-safe replication state. It replaces transactional replication, but unlike that feature it works for all storage engines.

    Percona has ported a few of the replication features from previous Google patches. Hopefully, they are interested in these changes. MySQL has semi-sync replication in 6.0 with a promise to backport to 5.4. Perhaps these changes will end up there too.