LevelDB might be a great fit for MongoDB. MongoDB doesn't need multi-statement transactions. Both are limited by 1-writer or N-reader concurrency, but writes to database files are much faster with LevelDB because it doesn't do update in place. So LevelDB doesn't lose as much performance for IO-bound workloads by doing 1-writer or N-readers and my guess is that this could make MongoDB much better at supporting IO-bound workloads.
There is a problem in MySQL land that doesn't get enough attention. When startups using MySQL get really big then they try to hire developers to work on MySQL internals (see Facebook, Google, Twitter, LinkedIn, etc). And I mean try because there is much more demand than supply in the US. There is a lot of talent but most is in other countries. While remote development has been successful in the MySQL community most of that is for traditional software development. Hacking on MySQL at a fast growing startup is far from traditional. You occasionally need to make MySQL better right now to fix serious problems in production. You won't get 6 month development cycles unless you have a large team. This requires a strong relationship with the operations team. This also requires personal experience with operations. That is harder to get while working remotely. Remote development can work. I am an example of that, but I am only 1 hour from the office by plane and I already had strong relationships with the operations team.
A great solution is for someone already working at the startup to begin hacking on MySQL. This is great because it grows the supply of MySQL internals expertise and MySQL will get better at a faster rate. My teams at Facebook and Google grew in that manner. The problem is that it can be very hard to grow the team size from 0 to 1. The first person won't have a mentor. If the first person is also new to mature DBMS code then they might be in for a surprise. In my case 9 years between Informix and Oracle increased my tolerance level but not everyone has that background.
I think we can make it easy for the first person on the team by providing training and mentorship. A bootcamp in April during/after/before the UC is one way to do that and there can be remote mentorship once per 2 weeks after that. The experts who can teach the class (like gurus from MariaDB) will be in town. Are people interested in this? I don't expect this to be free. If we expect professional training then we need to pay the professionals. But I hope that free training materials can eventually be produced from the effort.
Beyond getting paid for professional services there would be other benefits were MariaDB to lead the program. This can increase their community of users and hackers.
View comments
Mark,
ReplyDelete
Your idea sounds interesting. It also makes me think of a similar way to potentially attract more developers and that is possibly to offer bounties for feature requests that different people are interested in. That's not quite the focus you suggest above but for those idle minds who need a real incentive, and assuming that the required conditions could be met, this might bring in people and tempt them to look at the code and provide those new features that people need.
Managing this might be tricky but other projects seem to have found a way to do that so I don't think it's impossible, and many feature requests to improve MySQL are not so hard. For those, where there is someone willing to pay to get it resolved, and assuming the code can be "easily" merged into the upstream code base this might be good for all those involved.- AnonymousFebruary 22, 2014 at 1:05 PM
I think taking the time to groom developers about what to expect to see in the MySQL code base, and how to write code that will be more likely accepted upstream would be good topics. The MySQL code base really takes some getting used to, and if you aren't very careful things will end up breaking in some seemingly obscure ways in the test suite (if you are lucky), or you introduce a new bug which is no good.
ReplyDelete
Writing actual good tests with mtr would also be a good topic.
Even something like #dbhangops could be organized where developers can join for a one or two hour chat every two weeks. Also a separate discussion list about getting started coding mysql might be a good idea, because I always feel weird about posting newbie questions to the internals list, but maybe that is just me.
Basically some aces like yourself, antony, jeremy cole, you guys can look at most patches and know right away if something is "not right". Such code rewviews would be a very valuable contribution to the community. Again, the mailing list I described might be a good format for such a thing. Hi, Mark
ReplyDelete
I think this is a great idea!
I am he dengcheng from China, i have been involved in database development for more than 8 years. Now, i am leading a team in Netease, developing our own storage engine based on MySQL, plus a distributed database proxy. All of these two products, have been used in our company for more than 3 years.
I fully agree your opinion: "The problem is that it can be very hard to grow the team size from 0 to 1. " So if your idea can carry out successfully in America, i think i can run the same idea in China, helping more company for their database teams.This makes good sense, Mark.
ReplyDelete
Given the number of recruiters that seem to think I'd enjoy or be good at the sort of work you talk about, this is clearly needed. The questions that pop into my mind are:
(1) are there ways that my employer can help make this a reality?
(2) is there a role for the MariaDB Foundation in this?
JeremyHi Mark,
ReplyDelete
As far as publications the only book available to hack MySQL is Understanding MySQL internals which is outdated and there are not any guides for hacking MySQL at the production level.
It's a great idea to engage more developers in MySQL development.- AnonymousFebruary 24, 2014 at 8:18 AM
I recently needed to maintain an in memory, non persistent data structure inside of TokuDB. It would have been nice to be able to use the std::unordered_map class, but there are naming conflicts between the standard container types and the MySQL source. One example is the min and max macros. It is nice to see that MySQL 5.6 renamed these macros so naming conflicts with the standard containers do not occur. However, TokuDB also runs on MySQL 5.5.
ReplyDelete
The MySQL implementations of basic containers are not easy to use. The API definition (my_tree.h for example) has no documentation. I could not find any tests for these containers, nor could I find examples that use them. Unfortunately, the MySQL source code is the documentation. This lack of documentation is certainly a barrier.
There're few interesting design choices to make.
ReplyDeleteThough it is easy to have shared block cache, per-table write buffers is a different topic, so either tables have to coexist in single read-write lock environment, or there're multiple transaction environments per table each having their own logs/compaction threads/buffers/etc.
One could pick something in the middle and have per-database.
Of course, global environment could work too for various workloads, and would have most transactional consistency.
How does this compare to Aria? It seems (after a quick glance) fairly similar.
ReplyDeleteAria is B-Tree, LevelDB is LSM-Tree, very different!
ReplyDeleteMySQL storage engine API isn't hard. It's just arcane. It should be possible to wrap this quite quickly.
ReplyDeleteDomas: And that's the only difference?
ReplyDeleteAntony - you might be able to quickly produce a proof-of-concept that kind of works. That doesn't help much. If it were easy we would have more production quality storage engines, we don't. PBXT had great potential and a talented developer. Who runs that in production? No engines have emerged from the community. A few have emerged from funded startups but that process has not been easy.
ReplyDeleteI have read handler.h, I have maintained a read-only engine that was widely used in production. The API is difficult, messy, poorly documented or not documented, .... The protocols are not documented for operations that occur across multiple calls. I have read a few times on internals@ that people should consult MyISAM source to figure out how things are supposed to work.
LevelDB has many concurrency issues which actually place it in the same league with Redis. Works nice in a single thread. If someone wants to get maximum from available hardware one needs to shard DB and run several processes/threads in parallel. Each with its own DB shard. Its not easy task.
ReplyDeleteAnother leveldb feature: vast amount of storage files. From I/O performance point of view, the lesser files - better performance. Unfortunately, there is a trade off between number of SST files and query latency variation: the larger single storage file is - the more time will require to compact it, hence leveldb availability will be limited during this time due to quite primitive internal concurrency control.
This is probably a good fit for single user storage but currently we do not have any evidence that it can scale nicely beyond couple dozens of GB of data.
Another note:
ReplyDeleteTo my best knowledge, although LevelDB design was inspired by BigTable tablet design - its totally different product and is not used in the current BigTable.
This comment has been removed by the author.
ReplyDeleteThe results here provide more details on the behavior that you describe:
ReplyDeletehttps://github.com/m1ch1/mapkeeper/wiki/MySQL-vs.-LevelDB
I would like to see a LevelDB plugin as well but when I think about trying to use it in production my main concern is trying to allocate memory between InnoDB and LevelDB. But however I decide to split it will be wrong more often than not for dynamic workloads. MySQL needs a shared memory pool for storage engines. Until then, I would rather see an LSM based index added to InnoDB.
ReplyDeleteI have tried TokuDB and it works great!! But I will check LevelDB to put some comment here. In the meantime would you have anything to say about TokuDB?
ReplyDeleteI also think TokuDB is great
ReplyDeleteThe development progress of LeveDB storage engine can be tracked here: https://code.launchpad.net/~maria-captains/maria/mysql-5.6-leveldb and https://mariadb.atlassian.net/browse/MDEV-3841
ReplyDeleteHow do you think about it's progress?
many won't fix issues.