Tuesday, April 20, 2010

CAP for MySQL

Stonebraker wrote a post on CAP. It is interesting because he highlights all of the causes for failures in a database service and then provides his estimates on the frequency of those causes. Many posts on CAP ignore failures caused by how the service is used and operated.
  • Choose CA - You can get CA (consistency, availability) for a database service running within a datacenter. Use DRBD with MySQL. Then use asynchronous replication to a remote datacenter to survive the loss of a datacenter. You don't get CA between datacenters in this setup. Others have written about the ability to get CA within a datacenter.
  • Repeatable DBMS errors - I call these queries of death. I notice about one per year. MySQL is remarkably stable if you are careful about the features that you use. New SQL must never run first on a master.
  • Unrepeatable DBMS errors - These are also infrequent with MySQL but much more frequent than repeatable errors. These are usually impossible to distinguish from intermittent errors caused by hardware and other system software.
  • P of the CAP theorem is a rare event - I don't have numbers but I don't agree with this for services running across a WAN. Additionally, there are other reasons to sacrifice P. Many applications cannot afford the latency required for strong consistency.
Can we afford strong consistency?

I am certainly not an expert on Paxos, but it is the way to get strong consistency for a database service running across a WAN. This costs one or two round trips depending on whether the commit coordinator migrates between servers. In theory we should be able to afford the overhead of 100 milliseconds to 200 milliseconds for latency sensitive services. We get a lot in return.

But applications and existing database servers can make this difficult to achieve. Many applications are conversational (request, think, request, think, commit -- substitute network latency for 'think'). Many database servers have one resource, the database log, for which Paxos must be run. This guarantees that the commit coordinator will frequently migrate and commit will require two round trips. Even when the log isn't a problem, there will be performance problems for rows that are frequently updated from all locations.

This will change in the future. Servers that are optimized for OLTP (no conversational transactions) will be designed. VoltDB is doing this today (disclaimer, a family member works there). Servers that are optimized for strong consistency across a WAN are also getting built and will become more popular over time.

I think that MySQL can take part in that future (strong consistency across a WAN), but that requires a new storage engine.

3 comments:

  1. I've been doing a lot of experiments for years on this and found that a new storage engine from the ground up is not really needed. If you had something like a Proxy storage engine in front of INNODB with multi-master replication (although you can remove this requirement as well). BCP can be achieved. Think of the concept of Partition DB that acts as a layer to distribute data that sits in front of INNODB or PBXT etc.

    Things that I have done:

    Put in an application layer that writes to both datacenters (latency grows as more DC are put in the mix)

    Daemon that simulates multi-master replication with mySQL comments to indicate which DC it came from. It reads the data from DC 1 and applies it to DC 2 - the problem with this is that DC 2 can be inconsistent if the data is written to DC 1 and DC 2 is lagging.

    Same as above but sticky user sessions to DC's to make both DC's hot and fail-over when
    either DC is down. Same problem.

    Inter DC redundancy with master/master shard setup for the above 2 (3) solutions tried.

    So from this testing having a proxy engine to simulate the behavior is all I think is really needed which would be much faster then building a new storage engine from the ground up.

    ReplyDelete
  2. Just out of curiosity, how common do you think the requirement for strong consistency over WAN really is?

    I see the strong consistency requirement mostly for small subsets of data, for example configuration information that identify location of locally hosted data sets. Just about every multi-tenant application has this type of data, for example. With such data sets it is not hard to design application access in such a way that you can work around network partitions, which are definitely more common than Mike Stonebraker indicated.

    In most other cases it seems sufficient to have a hot copy of data on one site with DR copies elsewhere.

    ReplyDelete
  3. I wrote more about this at http://mysqlha.blogspot.com/2010/04/consistency-across-wan.html

    For a data service with users around the world it is very hard to cluster data so that all writes can be local and network partitions are not a problem. Some online communities are limited to a geographic region. Many others are not. Most people belong to both types of communities.

    ReplyDelete

 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.