LevelDB is here. It might make an interesting storage engine for MySQL as it has many performance benefits at the cost of a few limitations: no multi-statement transactions and MyISAM-like concurrency. I doubt we will ever get a production quality LevelDB storage engine for MySQL because the storage engine API is hard, so hard that such projects require funding. This is unfortunate.
LevelDB might be a great fit for MongoDB. MongoDB doesn't need multi-statement transactions. Both are limited by 1-writer or N-reader concurrency, but writes to database files are much faster with LevelDB because it doesn't do update in place. So LevelDB doesn't lose as much performance for IO-bound workloads by doing 1-writer or N-readers and my guess is that this could make MongoDB much better at supporting IO-bound workloads.
14
There're few interesting design choices to make.
ReplyDeleteThough it is easy to have shared block cache, per-table write buffers is a different topic, so either tables have to coexist in single read-write lock environment, or there're multiple transaction environments per table each having their own logs/compaction threads/buffers/etc.
One could pick something in the middle and have per-database.
Of course, global environment could work too for various workloads, and would have most transactional consistency.
How does this compare to Aria? It seems (after a quick glance) fairly similar.
ReplyDeleteAria is B-Tree, LevelDB is LSM-Tree, very different!
ReplyDeleteMySQL storage engine API isn't hard. It's just arcane. It should be possible to wrap this quite quickly.
ReplyDeleteDomas: And that's the only difference?
ReplyDeleteAntony - you might be able to quickly produce a proof-of-concept that kind of works. That doesn't help much. If it were easy we would have more production quality storage engines, we don't. PBXT had great potential and a talented developer. Who runs that in production? No engines have emerged from the community. A few have emerged from funded startups but that process has not been easy.
ReplyDeleteI have read handler.h, I have maintained a read-only engine that was widely used in production. The API is difficult, messy, poorly documented or not documented, .... The protocols are not documented for operations that occur across multiple calls. I have read a few times on internals@ that people should consult MyISAM source to figure out how things are supposed to work.
LevelDB has many concurrency issues which actually place it in the same league with Redis. Works nice in a single thread. If someone wants to get maximum from available hardware one needs to shard DB and run several processes/threads in parallel. Each with its own DB shard. Its not easy task.
ReplyDeleteAnother leveldb feature: vast amount of storage files. From I/O performance point of view, the lesser files - better performance. Unfortunately, there is a trade off between number of SST files and query latency variation: the larger single storage file is - the more time will require to compact it, hence leveldb availability will be limited during this time due to quite primitive internal concurrency control.
This is probably a good fit for single user storage but currently we do not have any evidence that it can scale nicely beyond couple dozens of GB of data.
Another note:
ReplyDeleteTo my best knowledge, although LevelDB design was inspired by BigTable tablet design - its totally different product and is not used in the current BigTable.
This comment has been removed by the author.
ReplyDeleteThe results here provide more details on the behavior that you describe:
ReplyDeletehttps://github.com/m1ch1/mapkeeper/wiki/MySQL-vs.-LevelDB
I would like to see a LevelDB plugin as well but when I think about trying to use it in production my main concern is trying to allocate memory between InnoDB and LevelDB. But however I decide to split it will be wrong more often than not for dynamic workloads. MySQL needs a shared memory pool for storage engines. Until then, I would rather see an LSM based index added to InnoDB.
ReplyDeleteI have tried TokuDB and it works great!! But I will check LevelDB to put some comment here. In the meantime would you have anything to say about TokuDB?
ReplyDeleteI also think TokuDB is great
ReplyDeleteThe development progress of LeveDB storage engine can be tracked here: https://code.launchpad.net/~maria-captains/maria/mysql-5.6-leveldb and https://mariadb.atlassian.net/browse/MDEV-3841
ReplyDeleteHow do you think about it's progress?
many won't fix issues.