LevelDB might be a great fit for MongoDB. MongoDB doesn't need multi-statement transactions. Both are limited by 1-writer or N-readers concurrency, but writes to database files are much faster with LevelDB because it doesn't do update in place. So LevelDB loses less performance on IO-bound workloads from the 1-writer or N-readers restriction, and my guess is that this could make MongoDB much better at supporting IO-bound workloads.
-
Many startups depend on MySQL. Check out the list of keynote speakers from past and present MySQL User Conferences. There are things to make better and new features to add, like better support for dynamic schemas (documents) and auto-sharding. But existing deployments aren't in crisis mode and it is being used for new deployments. Maybe I have an agenda in writing this, but anyone writing the opposite is likely to have their own agenda.
There is a problem in MySQL land that doesn't get enough attention. When startups using MySQL get really big they try to hire developers to work on MySQL internals (see Facebook, Google, Twitter, LinkedIn, etc). And I mean try, because there is much more demand than supply in the US. There is a lot of talent but most of it is in other countries. While remote development has been successful in the MySQL community, most of that is for traditional software development. Hacking on MySQL at a fast-growing startup is far from traditional. You occasionally need to make MySQL better right now to fix serious problems in production. You won't get 6-month development cycles unless you have a large team. This requires a strong relationship with the operations team. It also requires personal experience with operations. That is harder to get while working remotely. Remote development can work. I am an example of that, but I am only 1 hour from the office by plane and I already had strong relationships with the operations team.
A great solution is for someone already working at the startup to begin hacking on MySQL. This is great because it grows the supply of MySQL internals expertise and MySQL will get better at a faster rate. My teams at Facebook and Google grew in that manner. The problem is that it can be very hard to grow the team size from 0 to 1. The first person won't have a mentor. If the first person is also new to mature DBMS code then they might be in for a surprise. In my case 9 years between Informix and Oracle increased my tolerance level but not everyone has that background.
I think we can make it easier for the first person on the team by providing training and mentorship. A bootcamp in April before, during, or after the UC is one way to do that, and there can be remote mentorship every two weeks after that. The experts who can teach the class (like gurus from MariaDB) will be in town. Are people interested in this? I don't expect this to be free. If we expect professional training then we need to pay the professionals. But I hope that free training materials can eventually be produced from the effort.
Beyond getting paid for professional services, there would be other benefits were MariaDB to lead the program. It could increase their community of users and hackers.
-
My kids watched the new Lego movie today and spent the rest of the day repeating "Everything is awesome". I spent a few hours reading MongoDB documentation to help a friend who uses it. Everything wasn't awesome for all of us. I try to be a pessimist when reading database documentation. If you spend any time near production then you spend a lot of time debugging things that fail. Being less than optimistic is a good way to predict failure.
One source of pessimism is database limits. MongoDB has a great page to describe limits. It limits index keys to less than 1025 bytes. But this is a great example that shows the value of pessimism. The documentation states that values (MongoDB documents) are not added to the index when the index key is too large. An optimist might assume that an insert or update statement fails when the index key is too large, but that is not the specified behavior.
As far as I can tell, prior to MongoDB 2.5.5 the behavior was to not add the document to the index when the indexed column exceeded 1024 bytes. The insert or update would succeed but the index maintenance would fail. Queries that used the index after this can return incorrect results.
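Here is a minimal sketch of what that looks like from a client, assuming a pre-2.5.5 mongod and the pymongo driver; the collection and field names are made up for illustration:

    # Hedged sketch: the insert succeeds but the document is skipped during
    # index maintenance because the indexed value exceeds the key size limit.
    from pymongo import MongoClient

    coll = MongoClient().test.big_keys
    coll.ensure_index("val")                    # secondary index on "val"
    coll.insert({"_id": 1, "val": "x" * 2000})  # no error is returned
    # The document was never added to the index on "val", so a query that
    # uses that index can miss it while a lookup by _id still finds it.
    print(coll.find_one({"val": "x" * 2000}))   # may print None
    print(coll.find_one({"_id": 1}))            # prints the document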
A quick search of the interwebs shows that people were aware of the problem in 2011. I can reproduce the problem on my Ubuntu 12.10 VM. Why do we tolerate problems like this? Maybe this isn't a big deal, and the real problem is that new risks (from a system I don't know much about) seem worse to me than risks in software that I have been using for years. But corruption (either permanent or via incorrect query results) has been a stop-the-world bug for me -- as in you do nothing else until the problem has been fixed. Why have MongoDB users tolerated this problem for years?
While it is great that the bug appears to have been fixed, database vendors should understand that FUD takes much more time to go away. See all of the stories about transactions and MySQL that lived on long after InnoDB became viable. And note that some of the MySQL FUD was self-inflicted -- see the section of the MySQL manual on Atomic Operations.
I found this code in 2.4.9 that explains how key-too-large is handled. A careful reader might figure out that index maintenance isn't done. Nice optimization. I spoke to one user who doesn't like the old behavior but doesn't want to break apps with the new behavior that fails inserts/updates with too-large keys. Indexes on a prefix of a key would help in that case.
template< class V >
void BtreeBucket<V>::twoStepInsert(DiskLoc thisLoc,
                                   IndexInsertionContinuationImpl<V> &c,
                                   bool dupsAllowed) const
{
    if ( c.key.dataSize() > this->KeyMax ) {
        problem() << "ERROR: key too large len:" << c.key.dataSize()
                  << " max:" << this->KeyMax << ' '
                  << c.key.dataSize() << ' '
                  << c.idx.indexNamespace() << endl;
        return; // op=Nothing
    }
    insertStepOne(thisLoc, c, dupsAllowed);
}
/** todo: meaning of return code unclear clean up */
template< class V >
int BtreeBucket<V>::bt_insert(const DiskLoc thisLoc, const DiskLoc recordLoc,
                              const BSONObj& _key, const Ordering &order,
                              bool dupsAllowed,
                              IndexDetails& idx, bool toplevel) const
{
    guessIncreasing = _key.firstElementType() == jstOID && idx.isIdIndex();
    KeyOwned key(_key);
    dassert(toplevel);
    if ( toplevel ) {
        if ( key.dataSize() > this->KeyMax ) {
            problem() << "Btree::insert: key too large to index, skipping "
                      << idx.indexNamespace() << ' ' << key.dataSize()
                      << ' ' << key.toString() << endl;
            return 3;
        }
    }
    // ... rest of bt_insert not shown
-
Pardon the rant but the winter storm has kept me in a hotel away from home for a few days. I exchanged email this week with someone pitching a solution to a problem I don't have (MySQL failover). But by "I" I really mean the awesome operations team with whom I share an office. The pitch got off to a bad start. It is probably better to compliment the supposed expertise of the person to whom you are pitching than to suggest they are no different than COBOL hackers working on Y2K problems.
Unfortunately legacy is a bad word in my world. Going off topic, so is web scale. I hope we can change this. The suggestion that MySQL was a legacy technology was conveyed to me via email, x86, Linux and a laptop. Most of those have been around long enough to be considered legacy technology. DNA and the wheel are also legacy technology. Age isn't the issue. Relevance is determined by utility and efficiency.
Remember that utility is measured from a distance. It is easy to show that one algorithm can do one narrow operation much faster than as implemented in existing products. But an algorithm shouldn't be confused with a solution. A solution requires user education, documentation, skilled operations, trust, client libraries, backup, monitoring and more.
-
What is a modern database? We have some terms that wander between marketing and technical descriptions - NewSQL, NoSQL. We have much needed work on write-optimized database algorithms - Tokutek, LevelDB, RocksDB, HBase, Cassandra. We also get reports of amazing performance. I think there is too much focus on peak performance and not enough on predictable performance and manageability.
Building a DBMS for production workloads is hard. Writing from scratch is an opportunity to do a lot better than the products that you hope to replace. It is also an opportunity to repeat many mistakes. You can avoid some of the mistakes by getting advice from someone who has a lot of experience supporting production workloads. I worked at Oracle for 8 years, wrote some good code (new sort!) and fixed a lot of bugs but never got anywhere near production.
Common mistakes include insufficient monitoring and poor manageability. Monitoring should be simple. I want to know where something is running and where it is not running (waiting on IO, locks). I also want to drill down by user and table -- user & table aren't just there for access control. I am SQL-centric in what follows. While there are frequent complaints about optimizers making bad choices, I can only imagine how much fun it will be to debug load problems when the query plan is hidden away in some external application.
The best time to think about monitoring is after spending too much time debugging a problem. At that point you have a better idea about the data that would have made things easier. One example of missing monitoring was the lack of disk IO latency metrics in MySQL. In one case, not having them made it much easier to miss that an oversubscribed NFS server was making queries slow via 50 millisecond disk reads.
Monitoring should be cheap so that it can always be enabled; from it I can understand average costs and spot changes in load from the weekly push. But I also need to debug some problems manually, so I need to monitor sessions that I know are too slow (get the query plan for a running SQL statement) and to find sessions/statements that are too slow (dump things into the slow query log when certain conditions are met). Letting me do EXPLAIN for statements in my session is useful, but I really need to do EXPLAIN for statements running in production -- if the optimizer uses sampling I want to see the plan they get, and if temp tables are involved I have no idea what will be in their temp tables. MariaDB (and MySQL) recently added support to show the query plan for statements that are currently running. This is even more useful when the query plan and performance metrics can be dumped into the slow query log when needed.
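As a rough sketch of how that can be used, assuming a MariaDB server that supports SHOW EXPLAIN and the MySQLdb driver (the connection id 1234 below is made up for illustration):

    # Hedged sketch: get the plan for a statement that is already running.
    import MySQLdb

    conn = MySQLdb.connect(host="127.0.0.1", user="root", db="test")
    cur = conn.cursor()
    cur.execute("SHOW PROCESSLIST")       # find the id of the slow session
    for row in cur.fetchall():
        print(row)
    cur.execute("SHOW EXPLAIN FOR 1234")  # plan for the statement running in that session
    for row in cur.fetchall():
        print(row)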
The goal for monitoring performance problems is to eliminate the use of PMP (the poor man's profiler). When I want to understand why a system is running slower than expected I frequently look at thread stacks from a running server. I hope one day that pstack + awk is not the best tool for the job on a modern database. I was debugging a problem with RocksDB while writing this. The symptom was slow performance for a read-only workload with a database that doesn't fit in cache. I have seen the problem previously and was quickly able to figure out the cause -- the block cache used too many shards. Many problems are easy to debug when you have previously experienced them. Restated: many problems are expensive to debug for most users because they don't have full-time database performance experts.
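For readers who haven't seen PMP, here is a rough sketch of the idea in Python rather than awk, assuming gdb-style pstack output and a made-up pid: grab the thread stacks once and count how many threads share each stack.

    # Hedged sketch of the PMP idea: aggregate thread stacks by frequency.
    import subprocess
    from collections import Counter

    pid = 1234  # hypothetical server pid
    out = subprocess.run(["pstack", str(pid)], capture_output=True,
                         text=True).stdout
    stacks, frames = Counter(), []
    for line in out.splitlines():
        if line.startswith("Thread "):      # start of the next thread's stack
            if frames:
                stacks[tuple(frames)] += 1
            frames = []
        elif line.startswith("#"):          # one stack frame
            frames.append(line.split(" in ")[-1])
    if frames:
        stacks[tuple(frames)] += 1
    for stack, count in stacks.most_common(5):
        print(count, " <- ".join(stack))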
The focus on peak performance can be at odds with manageability. The fast way to peak performance is via tuning options, and too often these options are static. Restarting a database in production to change an option is a bad idea. Dynamic options are an improvement. Using adaptive algorithms in place of many options is even better. And if you add options, make sure the defaults are reasonable.
Predictable performance is part of manageability. How does your modern database behave when confronted with too much load? It helps when you can classify your workload as high-priority and best-effort and then shed load from the best-effort users. Alas this requires some way to distinguish users. In theory you want the hi-pri users to get their work done and then let best-effort users compete for the spare capacity. This requires some notion of SLA for the hi-pri users and spare capacity for the DBMS. These are hard problems and I have not used a great solution for load shedding.
This could be a much larger rant/document but I must return to my performance debugging.
-
The UC schedule has been published and there are several talks from the database teams at Facebook.
- Small Data and MySQL by Domas Mituzas - small data is another name for OLTP. Given the popularity of big data, we think that small data also deserves attention.
- Asynchronous MySQL by Chip Turner - this describes the work done by Chip's team to implement an async MySQL client API. The feature is in the FB patch, widely used at FB and is integrated with HHVM.
- Performance Monitoring at Scale by Yoshinori - this will explain how to be effective when monitoring many servers with few people. It is easy to get distracted by false alarms.
- MySQL 5.6 at Facebook by Yoshinori - Yoshi will share many stories about what it took to get 5.6 into production. This included a bit of debugging and performance testing, bug fixes from upstream and a lot of work from the MySQL teams at FB.
- Global Transaction ID by Evan, Yoshinori, Santosh - at last global transaction IDs have arrived (for people not using the Google patch). Learn what it took to get this ready for production.
- InnoDB Defragmentation by Rongrong - learn about the work to reduce the amount of space wasted by fragmentation.
- MySQL Pool Scanner by Shlomo - MPS is one of the key tools created by our automation experts (aka operations gurus) that make it possible to manage many machines with few people.
-
Big downtime gets a lot of attention in the MySQL world. There will be some downtime when you replace a failed master. With GTID in MariaDB and MySQL that time will soon be much smaller. There might be lost transactions if you use asynchronous replication. You can also lose transactions with synchronous replication, depending on how you define "lose". I don't think this gets sufficient appreciation in the database community. If the higher commit latency from sync replication prevents your data service from keeping up with demand then update requests will time out and requested changes will not be done. This is one form of small downtime. Whether or not you consider this to be a lost transaction, it is definitely an example of lousy quality of service.
My future project, MarkDB, might have a mode where it never loses a transaction. This is really easy to implement. Just return an error on calls to COMMIT.
-
I looked at the release notes for 5.6.14 and then at my bzr tree that has been upgraded to 5.6.14. I was able to find changes in bzr based on bug numbers. However, for the 5 changes I checked I did not see any regression tests. For the record, I checked the diffs in bzr for these bugs: 1731508, 1476798, 1731673, 1731284, 1730289.
I think this is where the MySQL Community team can step up and help the community understand this. Has something changed? Or did the tests move over here?
-
Google search results for "mariadb trademark" are interesting. I had forgotten how much had been written about this in the past. Did the trademark policy ever get resolved? This discussion started in 2010.
- http://openlife.cc/blogs/2010/november/leaving-monty-program-and-mariadb
- http://mariadb.com/kb/en/mariadb-trademark-policy
- http://blog.mariadb.org/mariadb-draft-trademark-policy-available
- http://monty-says.blogspot.com/2010/12/proposal-for-mariadb-trademark-policy.html
- http://blog.mariadb.org/mariadb-and-trademark
- http://www.skysql.com/about/legal/trademarks
-
There aren't many new files under mysql-test for 5.6.14. Is this compression or something else? Many bugs were fixed per the release notes.
diff --recursive --brief mysql-5.6.13 mysql-5.6.14 | grep "Only in"
Only in mysql-5.6.14/man: ndb_blob_tool.1
Only in mysql-5.6.14/mysql-test/include: have_valgrind.inc
Only in mysql-5.6.14/support-files: mysql.5.6.14.spec
Only in mysql-5.6.14/unittest/gunit: log_throttle-t.cc
Only in mysql-5.6.14/unittest/gunit: strtoll-t.cc
Only in mysql-5.6.13/packaging/rpm-uln: mysql-5.6-stack-guard.patch
Only in mysql-5.6.13/support-files: mysql.5.6.13.spec
-
I have been wondering what the Foundation has been up to. I had high hopes for it and even contributed money but it has been very quiet. Fortunately I learned that it has been busy making decisions, maybe not in public, but somewhere. And at Percona London we will be told why it forked MariaDB prior to 5.6 and reimplemented a lot of features.
In other news, the Percona London lineup looks great and I appreciate that Oracle is part of it.