There are workloads for which MyISAM is faster. But we should stop saying MyISAM is faster. This is repeated too often with neither qualification nor quantification. So I will take a turn. Either my results are bogus or InnoDB is faster (for most workloads). Regardless, tmcalloc makes everything in MySQL faster. I am not the first to say this and this isn't the first time I have said this, but I have pretty graphs to accompany my statement.
MyISAM has several advantages versus InnoDB, but it also has several performance problems:
- readers block writers and writers block readers with the exception of insert-append
- data blocks are not cached by the MySQL server
- there is a global MyISAM index cache (key cache) protected by one mutex. Tables can have their own key cache which I hope is protected by a separate mutex, but this doesn't reduce contention for workloads that are isolated to one or a few tables
- sysbench test args: --test=oltp --oltp-read-only --oltp-table-size=1000000 --max-time=60 --max-requests=0 --db-ps-mode=disable
- server with 8 CPU cores
- dataset is cached in memory
- sysbench client and mysqld ran on the same server
- unmodified MySQL 5.0.75 and Google-patched MySQL 5.0.37. The Google patch includes the use of tcmalloc.
This plots the performance of InnoDB, MyISAM and Blackhole for patched 5.0.37. InnoDB is much faster than MyISAM.
This plots the performance of InnoDB, MyISAM and Blackhole for unmodified 5.0.75. InnoDB is a bit faster than MyISAM.


One case where MyISAM tables have helped me is in its ability to do MERGE tables. A bottleneck in InnoDB might happen if you've got gigabytes of time series data to insert (on a daily basis or something); merge tables will be much faster than say inserting into one big InnoDB table with tons of rows and indices already added on the table esp if you can process MyISAM tables remotely on slaves and copy them to a master machine. It's much faster to create a database table without indices and then ALTER that table to add the indices and append it to the MERGE table in my experience than the InnoDB monolithic table way.
ReplyDeleteMySQL will get much better at supporting data-warehouse workloads when Maria arrives because it will maintain this property and provide a buffer cache for data blocks and provide crash recovery. Maria with crash recovery but without support for concurrent transactions would be sufficient for that.
ReplyDeleteYour use case for all data in-memory is ideal for InnoDB to crush MyISAM.
ReplyDeleteThe difficulty with InnoDB is that the on-disk format is MUCH larger than MyISAM... 3-4x larger.
For disk-based workloads InnoDB can fall over because it can't keep up with MyISAM in a number of situations.
Especially when you're in a insert-append mode with MyISAM where it's somewhat lock free.
@Kevin - InnoDB tables use more space than MyISAM and may be 3X to 4X larger for your schema. They are not 3X or 4X larger as a rule of thumb. For the sysbench table, InnoDB uses 239 bytes per row and MyISAM uses 203 bytes per row == 1.17X larger.
ReplyDeleteWhat is up with your schema? Do you have many small rows? Do you have huge primary key columns and many secondary indexes?
For diskbound workloads InnoDB has a few huge advantages over MyISAM -- insert buffer, index organized on the primary key (great when inserts are in PK order) and a bufer cache.
1) readers block writers and writers block readers with the exception of insert-append
ReplyDeleteThat alone can kill replication, if you are doing large queries on the slave.
@Mark: The 3x to 4x Kevin quotes is due to MVCC, the overhead of which you would only experience if you were performing a significant number of writes.
ReplyDelete@Stu - I am not sure what Kevin means. He has repeated this statement many times but never provided more details.
ReplyDeleteI doubt that MyISAM is any less prone to fragmentation than InnoDB. And fragmentation should leave the leaf blocks ~60% full on average which would account for ~1.7X difference for space used by secondary indexes assuming there is no fragmentation for MyISAM. Prior to the 5.1 plugin, it is safe to assume that InnoDB secondary indexes are always fragmented because they cannot be built by sort.
@Stu - by MVCC I assume you mean the per-row overhead for InnoDB. That is ~20 bytes per row + 1 byte per column. That is a big deal for small rows. The other overhead is that the PK column(s) is (are) in all secondary index entries. That is a big deal when the PK is big. From a test I just did, it looks like the per-index entry overhead for InnoDB is ~9 bytes.
ReplyDeleteThanks for the tcmalloc info:
ReplyDeleteI've been looking for a benchmark that compares tcmalloc vs glibc-malloc for a while. I set up my own comparison but did not find a significant enough difference to consider valid, which I attributed in my mind to ineffective linking/compiling.
Surely .75 doesn't come set with tcmalloc enabled, right? You wouldn't happen to have compile/run time arguments or specifications or more detailed performance benchmarks up your sleeve, would you?
What platform do you use (CPU architecure, OS, ...)?
ReplyDeleteLinking with libtcmalloc_minimal will get you many of the benefits. Our builds have more modifications: a SQL command to display memory in use, a SQL command to get tcmalloc to return unused memory to the OS. The v2 Google patch for MySQL 5.0.37 has some of these changes.
tcmalloc helps workloads dominated by fast statements as those are the most likely to spend a lot of time in malloc. Also, workloads that use the heap engine get a big boost (including the implicit use of HEAP tables for group by and order by) because HEAP tables have contention on one mutex that is unfortunately held when malloc is called. sysbench OLTP tests are one such workload for which there is a benefit.
I published many performance results but neglected to test unmodified MySQL + tcmalloc with no other changes -- http://code.google.com/p/google-mysql-tools/wiki/SmpPerformance. Stay tuned and I will try to post glibc vs tcmalloc results.
Mark,
ReplyDeleteThanks for the benchmarks you posted here: http://mysqlha.blogspot.com/2009/01/double-sysbench-throughput-with_18.html
Aside from a few HEAP tables, my production workload is disk bound. I was simply trying to gauge any performance improvements brought about by tcmalloc for high transaction volumes.
The server I ran the tests on was 16 core x86_64, cent5, 32gb ram.
If you are diskbound on InnoDB then the Percona build or the Google patch that support more background IO threads for InnoDB might be very useful.
ReplyDelete"MyISAM is faster" is a myth propagated by benchmark junkies who have no experience with real-world applications. I have one customer who gained 100x average performance improvement just by switching from MyISAM to InnoDB. The gain came mostly from the reduction in lock contention. We had a heavy volume of lightweight read queries which were being blocked several seconds or minutes at a time by infrequent but slow update statements. With MyISAM, the update locks had caused the performance on the reads to degrade from a few milliseconds to a minute or more, driving the average response time up to 10sec.
ReplyDelete"The difficulty with InnoDB is that the on-disk format is MUCH larger than MyISAM... 3-4x larger" is a big lie that's completely irreproducible in the real-world. My customer storage requirements for the whole database increased by only 25% after conversion of all 32 tables to InnoDB, going from 800MB to 1GB.