1. Many startups depend on MySQL. Check out the list of keynote speakers from past and present MySQL User Conferences. There are things to make better and new features to add, like better support for dynamic schemas (documents) and auto-sharding. But existing deployments aren't in crisis mode and MySQL is still being chosen for new deployments. Maybe I have an agenda in writing this, but anyone writing the opposite is likely to have their own agenda.

    There is a problem in MySQL land that doesn't get enough attention. When startups using MySQL get really big they try to hire developers to work on MySQL internals (see Facebook, Google, Twitter, LinkedIn, etc). And I mean try, because there is much more demand than supply in the US. There is a lot of talent, but most of it is in other countries. While remote development has been successful in the MySQL community, most of that has been for traditional software development. Hacking on MySQL at a fast-growing startup is far from traditional. You occasionally need to make MySQL better right now to fix serious problems in production. You won't get 6-month development cycles unless you have a large team. This requires a strong relationship with the operations team. It also requires personal experience with operations, and that is harder to get while working remotely. Remote development can work. I am an example of that, but I am only 1 hour from the office by plane and I already had strong relationships with the operations team.

    A great solution is for someone already working at the startup to begin hacking on MySQL. This grows the supply of MySQL internals expertise and makes MySQL better at a faster rate. My teams at Facebook and Google grew in that manner. The problem is that it can be very hard to grow the team size from 0 to 1. The first person won't have a mentor, and if the first person is also new to mature DBMS code then they might be in for a surprise. In my case, 9 years between Informix and Oracle increased my tolerance level, but not everyone has that background.

    I think we can make it easier for the first person on the team by providing training and mentorship. A bootcamp in April before, during, or after the UC is one way to do that, with remote mentorship every two weeks after that. The experts who can teach the class (like gurus from MariaDB) will be in town. Are people interested in this? I don't expect this to be free. If we expect professional training then we need to pay the professionals. But I hope that free training materials can eventually be produced from the effort.

    Beyond getting paid for professional services, there would be other benefits were MariaDB to lead the program: it could grow their community of users and hackers.

  2. My kids watched the new Lego movie today and spent the rest of the day repeating "Everything is awesome". I spent a few hours reading MongoDB documentation to help a friend who uses it. Everything wasn't awesome for all of us. I try to be a pessimist when reading database documentation. If you spend any time near production then you spend a lot of time debugging things that fail. Being less than optimistic is a good way to predict failure.

    One source of pessimism is database limits. MongoDB has a good page that describes limits. It limits index keys to less than 1025 bytes. But this is a great example of the value of pessimism. The documentation states that values (MongoDB documents) are not added to the index when the index key is too large. An optimist might assume that an insert or update statement fails when the index key is too large, but that is not the specified behavior.

    As far as I can tell, prior to MongoDB 2.5.5 the behavior was to not add the document to the index when the indexed field exceeded 1024 bytes. The insert or update would succeed but the index maintenance would fail, so queries that used the index afterwards could return incorrect results.
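
    Here is a minimal sketch of that behavior, assuming pymongo 2.x and a pre-2.5.5 mongod on localhost; the collection name, field name and sizes are mine:

    from pymongo import MongoClient

    coll = MongoClient().test.keytest       # assumes mongod on localhost:27017
    coll.drop()
    coll.ensure_index('val')                # secondary index on 'val'

    big = 'x' * 2000                        # index key would exceed 1024 bytes
    coll.insert({'_id': 1, 'val': 'small'})
    coll.insert({'_id': 2, 'val': big})     # succeeds, but the index entry is silently skipped

    print(coll.find().count())              # 2 -- a collection scan sees both documents
    print(coll.find({'val': big}).count())  # expected 0 on pre-2.5.5 builds -- the query
                                            # uses the index on 'val' and misses doc 2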

    A quick search of the interwebs shows that people were aware of the problem in 2011. I can reproduce the problem on my Ubuntu 12.10 VM. Why do we tolerate problems like this? Maybe this isn't a big deal and the real problem is that new risks (from a system I don't know much about) seem worse than risks in software that I have been using for years. But corruption (either permanent or via incorrect query results) has been a stop-the-world bug for me -- as in you do nothing else until the problem has been fixed. Why have MongoDB users tolerated this problem for years?

    While it is great that the bug appears to have been fixed, database vendors should understand that FUD takes much longer to go away. See all of the stories about transactions and MySQL that lived on long after InnoDB became viable. And note that some of the MySQL FUD was self-inflicted -- see the section of the MySQL manual on Atomic Operations.

    I found this code in 2.4.9 that explains how key-too-large is handled. A careful reader might figure out that index maintenance isn't done. Nice optimization. I spoke to one user who doesn't like the old behavior but also doesn't want to break apps with the new behavior that fails inserts/updates with too-large keys. Indexes on a prefix of a key would help in that case (a client-side approximation is sketched after the code below).

    template< class V >
    void BtreeBucket<V>::twoStepInsert(DiskLoc thisLoc,
                                       IndexInsertionContinuationImpl<V> &c,
                                       bool dupsAllowed) const
    {
        if ( c.key.dataSize() > this->KeyMax ) {
            problem() << "ERROR: key too large len:" << c.key.dataSize()
                      << " max:" << this->KeyMax << ' '
                      << c.key.dataSize() << ' '
                      << c.idx.indexNamespace() << endl;
            return; // op=Nothing
        }
        insertStepOne(thisLoc, c, dupsAllowed);
    }

    /** todo: meaning of return code unclear clean up */
    template< class V >
    int BtreeBucket<V>::bt_insert(const DiskLoc thisLoc, const DiskLoc recordLoc,
                                  const BSONObj& _key, const Ordering &order,
                                  bool dupsAllowed,
                                  IndexDetails& idx, bool toplevel) const
    {
        guessIncreasing = _key.firstElementType() == jstOID && idx.isIdIndex();
        KeyOwned key(_key);

        dassert(toplevel);
        if ( toplevel ) {
            if ( key.dataSize() > this->KeyMax ) {
                problem() << "Btree::insert: key too large to index, skipping "
                          << idx.indexNamespace() << ' ' << key.dataSize()
                          << ' ' << key.toString() << endl;
                return 3;
            }
        }
        // ... rest of bt_insert not shown

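    Since MongoDB doesn't have prefix indexes, here is a rough sketch of how an app could approximate one today by storing and indexing a truncated copy of the field. This is my own workaround idea, not a MongoDB feature, and it again assumes pymongo 2.x; the names and prefix length are placeholders:

    from pymongo import MongoClient

    PREFIX_LEN = 800                         # keeps the index key safely under 1024 bytes

    coll = MongoClient().test.keytest        # placeholder database/collection names
    coll.ensure_index('val_prefix')          # index the truncated copy instead of 'val'

    def insert_doc(doc):
        # Store a truncated copy of 'val' alongside the full value.
        doc['val_prefix'] = doc['val'][:PREFIX_LEN]
        coll.insert(doc)

    def find_by_val(needle):
        # The index narrows candidates by prefix; the full value is rechecked.
        return coll.find({'val_prefix': needle[:PREFIX_LEN], 'val': needle})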


  3. Pardon the rant, but the winter storm has kept me in a hotel away from home for a few days. I exchanged email this week with someone pitching a solution to a problem I don't have (MySQL failover). But by "I" I really mean the awesome operations team with whom I share an office. The pitch got off to a bad start. It is probably better to compliment the supposed expertise of the person to whom you are pitching than to suggest they are no different from COBOL hackers working on Y2K problems.

    Unfortunately, "legacy" is a bad word in my world. Going off topic, so is "web scale". I hope we can change this. The suggestion that MySQL was a legacy technology was conveyed to me via email, x86, Linux and a laptop. Most of those have been around long enough to be considered legacy technology. DNA and the wheel are also legacy technology. Age isn't the issue. Relevance is determined by utility and efficiency.

    Remember that utility is measured from a distance. It is easy to show that one algorithm can do one narrow operation much faster than existing products do it. But an algorithm shouldn't be confused with a solution. A solution requires user education, documentation, skilled operations, trust, client libraries, backup, monitoring and more.