LevelDB is here. It might make an interesting storage engine for MySQL as it has many performance benefits at the cost of a few limitations: no multi-statement transactions and MyISAM-like concurrency. I doubt we will ever get a production quality LevelDB storage engine for MySQL because the storage engine API is hard, so hard that such projects require funding. This is unfortunate.

LevelDB might be a great fit for MongoDB. MongoDB doesn't need multi-statement transactions. Both are limited by 1-writer or N-reader concurrency, but writes to database files are much faster with LevelDB because it doesn't do update in place. So LevelDB doesn't lose as much performance for IO-bound workloads by doing 1-writer or N-readers and my guess is that this could make MongoDB much better at supporting IO-bound workloads.
14

View comments

  1. There're few interesting design choices to make.

    Though it is easy to have shared block cache, per-table write buffers is a different topic, so either tables have to coexist in single read-write lock environment, or there're multiple transaction environments per table each having their own logs/compaction threads/buffers/etc.

    One could pick something in the middle and have per-database.

    Of course, global environment could work too for various workloads, and would have most transactional consistency.

    ReplyDelete
  2. How does this compare to Aria? It seems (after a quick glance) fairly similar.

    ReplyDelete
  3. Aria is B-Tree, LevelDB is LSM-Tree, very different!

    ReplyDelete
  4. MySQL storage engine API isn't hard. It's just arcane. It should be possible to wrap this quite quickly.

    ReplyDelete
  5. Domas: And that's the only difference?

    ReplyDelete
  6. Antony - you might be able to quickly produce a proof-of-concept that kind of works. That doesn't help much. If it were easy we would have more production quality storage engines, we don't. PBXT had great potential and a talented developer. Who runs that in production? No engines have emerged from the community. A few have emerged from funded startups but that process has not been easy.

    I have read handler.h, I have maintained a read-only engine that was widely used in production. The API is difficult, messy, poorly documented or not documented, .... The protocols are not documented for operations that occur across multiple calls. I have read a few times on internals@ that people should consult MyISAM source to figure out how things are supposed to work.

    ReplyDelete
  7. LevelDB has many concurrency issues which actually place it in the same league with Redis. Works nice in a single thread. If someone wants to get maximum from available hardware one needs to shard DB and run several processes/threads in parallel. Each with its own DB shard. Its not easy task.

    Another leveldb feature: vast amount of storage files. From I/O performance point of view, the lesser files - better performance. Unfortunately, there is a trade off between number of SST files and query latency variation: the larger single storage file is - the more time will require to compact it, hence leveldb availability will be limited during this time due to quite primitive internal concurrency control.

    This is probably a good fit for single user storage but currently we do not have any evidence that it can scale nicely beyond couple dozens of GB of data.

    ReplyDelete
  8. Another note:

    To my best knowledge, although LevelDB design was inspired by BigTable tablet design - its totally different product and is not used in the current BigTable.

    ReplyDelete
  9. This comment has been removed by the author.

    ReplyDelete
  10. The results here provide more details on the behavior that you describe:
    https://github.com/m1ch1/mapkeeper/wiki/MySQL-vs.-LevelDB

    ReplyDelete
  11. I would like to see a LevelDB plugin as well but when I think about trying to use it in production my main concern is trying to allocate memory between InnoDB and LevelDB. But however I decide to split it will be wrong more often than not for dynamic workloads. MySQL needs a shared memory pool for storage engines. Until then, I would rather see an LSM based index added to InnoDB.

    ReplyDelete
  12. I have tried TokuDB and it works great!! But I will check LevelDB to put some comment here. In the meantime would you have anything to say about TokuDB?

    ReplyDelete
  13. The development progress of LeveDB storage engine can be tracked here: https://code.launchpad.net/~maria-captains/maria/mysql-5.6-leveldb and https://mariadb.atlassian.net/browse/MDEV-3841

    How do you think about it's progress?
    many won't fix issues.

    ReplyDelete

Many startups depend on MySQL. Check out the list of keynote speakers from past and present MySQL User Conferences. There are things to make better and new features to add like better support for dynamic schemas (documents) and auto-sharding. But existing deployments aren't in crisis mode and it is being used for new deployments. Maybe I have an agenda in writing this but anyone writing the opposite is likely to have their own agenda.

There is a problem in MySQL land that doesn't get enough attention. When startups using MySQL get really big then they try to hire developers to work on MySQL internals (see Facebook, Google, Twitter, LinkedIn, etc). And I mean try because there is much more demand than supply in the US. There is a lot of talent but most is in other countries. While remote development has been successful in the MySQL community most of that is for traditional software development. Hacking on MySQL at a fast growing startup is far from traditional. You occasionally need to make MySQL better right now to fix serious problems in production. You won't get 6 month development cycles unless you have a large team. This requires a strong relationship with the operations team. This also requires personal experience with operations. That is harder to get while working remotely. Remote development can work. I am an example of that, but I am only 1 hour from the office by plane and I already had strong relationships with the operations team.

A great solution is for someone already working at the startup to begin hacking on MySQL. This is great because it grows the supply of MySQL internals expertise and MySQL will get better at a faster rate. My teams at Facebook and Google grew in that manner. The problem is that it can be very hard to grow the team size from 0 to 1. The first person won't have a mentor. If the first person is also new to mature DBMS code then they might be in for a surprise. In my case 9 years between Informix and Oracle increased my tolerance level but not everyone has that background.

I think we can make it easy for the first person on the team by providing training and mentorship. A bootcamp in April during/after/before the UC is one way to do that and there can be remote mentorship once per 2 weeks after that. The experts who can teach the class (like gurus from MariaDB) will be in town. Are people interested in this? I don't expect this to be free. If we expect professional training then we need to pay the professionals. But I hope that free training materials can eventually be produced from the effort.

Beyond getting paid for professional services there would be other benefits were MariaDB to lead the program. This can increase their community of users and hackers.
16

View comments

  1. Mark,

    Your idea sounds interesting. It also makes me think of a similar way to potentially attract more developers and that is possibly to offer bounties for feature requests that different people are interested in. That's not quite the focus you suggest above but for those idle minds who need a real incentive, and assuming that the required conditions could be met, this might bring in people and tempt them to look at the code and provide those new features that people need.

    Managing this might be tricky but other projects seem to have found a way to do that so I don't think it's impossible, and many feature requests to improve MySQL are not so hard. For those, where there is someone willing to pay to get it resolved, and assuming the code can be "easily" merged into the upstream code base this might be good for all those involved.

    ReplyDelete
    Replies
    1. Summer of code projects are one example of the bounty-for-features program. I haven't been involved in it, but I think it can work. Participants are screened and then rewarded for their effort. MariaDB is another example, but there is no money in return for work. However, the MariaDB/SkySQL developers are very good at responding to patch contributors.

      But in this case I am proposing a much more narrow program targeted at companies that want their existing employees to hack on MySQL/MariaDB.

      Delete
  2. I think taking the time to groom developers about what to expect to see in the MySQL code base, and how to write code that will be more likely accepted upstream would be good topics. The MySQL code base really takes some getting used to, and if you aren't very careful things will end up breaking in some seemingly obscure ways in the test suite (if you are lucky), or you introduce a new bug which is no good.

    Writing actual good tests with mtr would also be a good topic.

    Even something like #dbhangops could be organized where developers can join for a one or two hour chat every two weeks. Also a separate discussion list about getting started coding mysql might be a good idea, because I always feel weird about posting newbie questions to the internals list, but maybe that is just me.

    Basically some aces like yourself, antony, jeremy cole, you guys can look at most patches and know right away if something is "not right". Such code rewviews would be a very valuable contribution to the community. Again, the mailing list I described might be a good format for such a thing.

    ReplyDelete
  3. Hi, Mark
    I think this is a great idea!
    I am he dengcheng from China, i have been involved in database development for more than 8 years. Now, i am leading a team in Netease, developing our own storage engine based on MySQL, plus a distributed database proxy. All of these two products, have been used in our company for more than 3 years.
    I fully agree your opinion: "The problem is that it can be very hard to grow the team size from 0 to 1. " So if your idea can carry out successfully in America, i think i can run the same idea in China, helping more company for their database teams.

    ReplyDelete
    Replies
    1. The MySQL community beyond China doesn't have enough contact with the community in China. I read about very interesting work being done there, but the stories about that work aren't being written frequently.

      Delete
  4. Very interested in this.

    ReplyDelete
  5. This makes good sense, Mark.

    Given the number of recruiters that seem to think I'd enjoy or be good at the sort of work you talk about, this is clearly needed. The questions that pop into my mind are:

    (1) are there ways that my employer can help make this a reality?

    (2) is there a role for the MariaDB Foundation in this?

    Jeremy

    ReplyDelete
    Replies
    1. My focus at this point is on companies that want a current employer to get better at hacking on MySQL. Your employer can signing up for that training -- assuming you need a MySQL hacker. It will take time to figure out who is going to provide this training. I am trying to get people talking about this for now and then there will be private conversations with some of the web-scale MySQL deployments and hopefully MariaDB.

      Delete
    2. Training for people to get better at MySQL operations is another topic that I am ignoring for now. I think there is a need for that too.

      Delete
    3. Agreed on *both* needs. We just wonder from time to time about the best ways we can help and contribute back. We have talked about having a MySQL developer on staff, not necessarily because we *need* one, but doing that sort thing would be a benefit to the community and could also come in handy for us once in a while too.

      Delete
    4. If you are interested in getting support for auto-sharding and dynamic schemas then participate in that discussion -- http://smalldatum.blogspot.com/2014/02/use-case.html

      Delete
  6. Hi Mark,
    As far as publications the only book available to hack MySQL is Understanding MySQL internals which is outdated and there are not any guides for hacking MySQL at the production level.
    It's a great idea to engage more developers in MySQL development.

    ReplyDelete
    Replies
    1. This is the only book I recommend for internals. It isn't an internals book but has some details and really it is an excellent book - http://www.packtpub.com/mysql-51-plugin-development/book

      I think the best way to start is to begin reading the code relevant to a problem you want to understand. InnoDB is a good place to start because the code is well structured.

      Personally it would be very frustrating to try and figure out the architecture of MySQL starting from scratch. But the code can be understood in small steps. Even today there are a lot of things in MySQL that I have never understood. I rarely look at optimizer code.

      Delete
    2. Thank you so much for the suggestion.

      Delete
  7. I recently needed to maintain an in memory, non persistent data structure inside of TokuDB. It would have been nice to be able to use the std::unordered_map class, but there are naming conflicts between the standard container types and the MySQL source. One example is the min and max macros. It is nice to see that MySQL 5.6 renamed these macros so naming conflicts with the standard containers do not occur. However, TokuDB also runs on MySQL 5.5.

    The MySQL implementations of basic containers are not easy to use. The API definition (my_tree.h for example) has no documentation. I could not find any tests for these containers, nor could I find examples that use them. Unfortunately, the MySQL source code is the documentation. This lack of documentation is certainly a barrier.

    ReplyDelete
    Replies
    1. A few weeks back I wondered why Toku doesn't have a windows port. After all, MySQL has a portable library. I tend to stay far away from the libraries as I remember the lack of documentation. You might have more tolerance than I do. Unfortunately, reading existing uses of the library is the documentation. And back in the day common practice was to ignore error return values.

      Delete
Loading