There are three solutions for providing consistency in a data service that operates across a wide area network (WAN). None of them is free. What are you willing to pay, and where are you willing to add complexity? Depending on what you choose, your system will be more complex for external users, internal application developers, or operations. The choices are multi-master with conflict resolution (eventual consistency), multi-master with conflict prevention (strong consistency), and single master with downtime on failover.
If you choose eventual consistency (EC) then internal application developers must write logic to resolve conflicts and external users will occasionally encounter inconsistent data. This might be a small price to pay for a system that provides higher availability and transparent failover. I am not aware of support for secondary indexes in the popular EC systems. I wonder if the same logic that does eventual consistency across a WAN might be reused to keep secondary indexes eventually consistent within a datacenter. That would impose an additional cost on internal application developers in return for expanding the workloads that EC can support.
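To make the cost to internal application developers concrete, here is a minimal sketch of the kind of resolver they must write under eventual consistency. Last-writer-wins is the simplest possible policy; the record layout and field names here are hypothetical, not from any particular EC system:

```python
def lww_merge(local, remote):
    # Last-writer-wins: keep the version with the newer timestamp.
    # Ties break on replica id so every replica converges to the same value.
    if (remote["ts"], remote["replica"]) > (local["ts"], local["replica"]):
        return remote
    return local

a = {"value": "x", "ts": 100, "replica": "dc-east"}
b = {"value": "y", "ts": 100, "replica": "dc-west"}
# The merge is commutative, so both datacenters converge on the same winner.
assert lww_merge(a, b) == lww_merge(b, a)
```

Real resolvers are rarely this simple (deletes, counters, and multi-row invariants all need their own policies), which is exactly the complexity EC shifts onto the application.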
If you choose strong consistency then external users experience more latency on writes as the transaction commit requires one or two round trips across a WAN. This might be a small price to pay for a system that provides higher availability and transparent failover. Galera is doing interesting work in this area for MySQL and they have already begun to publish results. I need to read more about that.
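A back-of-the-envelope model shows what "one or two round trips" costs the user. The numbers below are assumptions for illustration, not measurements from Galera or any other system:

```python
def commit_latency_ms(wan_rtt_ms, round_trips, local_work_ms=2):
    # A synchronous commit cannot acknowledge until the WAN round trips
    # complete, so WAN RTT dominates local commit work.
    return local_work_ms + wan_rtt_ms * round_trips

rtt = 100  # assumed cross-datacenter round trip, in milliseconds
print(commit_latency_ms(rtt, 1))  # coordinator is local: 102 ms
print(commit_latency_ms(rtt, 2))  # coordinator is remote: 202 ms
```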
If you choose single master then you will spend more money to make that master less likely to fail. You will also experience more downtime and higher support costs, because failover is manual and must be done as quickly as possible. Solutions include RAID 10, battery-backed write cache, highly-available SAN/NFS, DRBD, and pagers for your operations team.
I don't know whether single master is really a choice in the MySQL community, as there are not many alternatives. MySQL supports multi-master replication, but without conflict resolution. It supports strong consistency with Galera, but Galera is new to the MySQL market; it might be the killer application for MariaDB. Tungsten is another product that can reduce the complexity of master-slave replication.
Unless you are using Tungsten, it is very hard to automate master failover for MySQL when there is more than one slave per master. Yet many deployments need a master and a slave in one datacenter and another slave in a remote datacenter.
-
Stonebraker wrote a post on CAP. It is interesting because he highlights all of the causes for failures in a database service and then provides his estimates on the frequency of those causes. Many posts on CAP ignore failures caused by how the service is used and operated.
- Choose CA - You can get CA (consistency, availability) for a database service running within a datacenter. Use DRBD with MySQL. Then use asynchronous replication to a remote datacenter to survive the loss of a datacenter. You don't get CA between datacenters in this setup. Others have written about the ability to get CA within a datacenter.
- Repeatable DBMS errors - I call these queries of death. I notice about one per year. MySQL is remarkably stable if you are careful about the features that you use. New SQL must never run first on a master.
- Unrepeatable DBMS errors - These are also infrequent with MySQL but much more frequent than repeatable errors. These are usually impossible to distinguish from intermittent errors caused by hardware and other system software.
- P of the CAP theorem is a rare event - I don't have numbers, but I don't agree with this for services running across a WAN. Additionally, there are other reasons to sacrifice C: many applications cannot afford the latency required for strong consistency.
I am certainly not an expert on Paxos, but it is the way to get strong consistency for a database service running across a WAN. This costs one or two round trips, depending on whether the commit coordinator migrates between servers. In theory we should be able to afford 100 to 200 milliseconds of overhead for latency-sensitive services. We get a lot in return.
But applications and existing database servers can make this difficult to achieve. Many applications are conversational (request, think, request, think, commit -- substitute network latency for 'think'). Many database servers have one resource, the database log, for which Paxos must be run. This guarantees that the commit coordinator will frequently migrate and commit will require two round trips. Even when the log isn't a problem, there will be performance problems for rows that are frequently updated from all locations.
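The conversational pattern compounds the problem: every statement in the conversation pays a WAN round trip before the commit does. A rough sketch, with assumed numbers for illustration:

```python
def conversation_latency_ms(wan_rtt_ms, statements, commit_round_trips):
    # Each statement in a conversational transaction pays a WAN round trip,
    # and the commit pays one or two more, so total latency grows with the
    # length of the conversation, not just the commit protocol.
    return wan_rtt_ms * (statements + commit_round_trips)

# A 5-statement conversational transaction over a 100 ms WAN link with a
# remote commit coordinator (2 round trips at commit):
print(conversation_latency_ms(100, 5, 2))  # 700 ms before any server work
```

This is why eliminating conversational transactions (as OLTP-optimized servers do) matters as much as optimizing the commit protocol itself.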
This will change in the future. Servers optimized for OLTP (no conversational transactions) will be built; VoltDB is doing this today (disclaimer: a family member works there). Servers optimized for strong consistency across a WAN are also being built and will become more popular over time.
I think that MySQL can take part in that future (strong consistency across a WAN), but that requires a new storage engine.
-
Do you want to reference the CAP theorem at your next tech conference? I certainly do. Consistency, availability and partition tolerance are wonderful. You only get two, and sometimes you only get one. But what are they? In the CAP context, availability and partition tolerance are labels for behavior defined by the CAP presentation and paper, and this is the source of much confusion. Start with the original work before reading summaries by others.
Eric Brewer presented CAP in a keynote at PODC. The best slides are on page 4. With XA you forfeit partition tolerance and get CA (consistency + availability). With majority protocols you forfeit availability and get CP (consistency + partition tolerance). XA == CA is easy for me to remember, and then I know that majority protocols provide CP.
CAP was proved in a paper by Gilbert and Lynch. This paper defines availability and partition tolerance as used by CAP.
- availability - every request received by a non-failing node must result in a response
- partition tolerance - the network will be allowed to lose arbitrarily many messages sent from one node to another; every node receiving a request from a client must respond, even though arbitrary messages that are sent may be lost
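These definitions make the majority-protocol trade-off easy to see in a few lines. A sketch, assuming a simple quorum rule (not the behavior of any specific MySQL product):

```python
def has_quorum(reachable, cluster_size):
    # Majority protocols refuse writes without a quorum. A node partitioned
    # into the minority cannot respond to requests, which is exactly how CP
    # gives up the CAP definition of availability above.
    return 2 * reachable > cluster_size

# A 5-node cluster split 3/2 by a network partition:
print(has_quorum(3, 5))  # True: the majority side keeps serving writes
print(has_quorum(2, 5))  # False: the minority side must stop
```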
What does this have to do with MySQL? I think we will be talking about CAP much more in the future. I found an interesting presentation that explained current MySQL solutions (DRBD, master-slave, master-master, NDB/Cluster) in terms of CAP, but it didn't have enough notes to explain the content.