Monday, April 6, 2009

Master-master replication and crash recovery

The slave SQL thread executes binlog events from the relay log to keep a slave in sync with the master. Prior to row-based replication, binlog events were SQL statements. The slave SQL thread records its state in the relay-log.info file. The state includes the file offset of the next binlog event to execute, so it is important that this state be correct to avoid skipping a transaction or running a transaction multiple times on the slave. The slave SQL thread does the following in a loop:
  1. replay all binlog events from a transaction
  2. commit the transaction to the storage engines that participated
  3. write new state to the relay-log.info file
Unfortunately, if the mysqld server crashes after step 2 and before step 3 then it will run the last transaction twice (before the crash, after the crash). This may fail with a duplicate key error or it may not and leave the slave inconsistent with the master with little evidence left behind for the DBA to notice the problem.

This is bug 26540 if you want to express interest in a fix. This is also fixed in some cases for InnoDB by transactional replication which has made it into Percona. By some cases, I mean that it does not protect transactions that update MyISAM tables, at least not in the Google patch. I have not reviewed the Percona code.

This problem gets more interesting for master-master replication. In that case a server writes a binlog and a relay log and the update sequence is:
  1. replay all binlog events from a transaction
  2. XA prepare for the binlog
  3. XA prepare for InnoDB (assuming InnoDB is used)
  4. write the XID to the binlog (commit)
  5. commit the transaction to the storage engines that participated
  6. write new state to the relay-log.info file
In this configuration, the server uses internal XA to coordinate the update of the binlog and commit to the storage engines.  There are three interesting crashes that can occur:
  • before step 4 - This is not a problem. The prepared InnoDB transaction is rolled back during crash recovery and then run when the slave SQL thread starts.
  • between step 4 and step 5 - This is a problem. The prepared InnoDB transaction is committed during crash recovery but relay-log.info is not updated. Note that transactional replication does not correct the mismatch so the last transaction will be run again when the slave SQL thread starts. Running the same transaction multiple times may cause replication to halt or may corrupt your database.
  • between step 5 and step 6 - This problem is fixed by transactional replication.
So there is a new problem that we need to fix for servers that will soon run with --log-slave-updates (not because I will use master-master replication).  Some of the changes described in my previous post can fix this new problem, at least they do in our stress test framework.

3 comments:

  1. how do I be surprised and totally not surprised at the same time.... it's inexcusable these days to have such a giant hole for corruption.

    Determined not to have such things for Drizzle.

    ReplyDelete
  2. Hi Mark,

    We solve this problem in Tungsten Replicator by updating an extra table as part of the same database transaction on the slave, which removes ambiguity about whether the salve updated. Maybe I'm missing something but couldn't MySQL use the same approach internally? It seems as if a lot of the trouble comes from before and after updates across stores.

    If so please don't implement it too quickly as I want to make some house payments first. ;)

    Cheers, Robert

    ReplyDelete
  3. @Robert - there is a whole lot of value to be added on the path between anything we do in 5.0.37 and anything that a customer can depend on when deployed in production.

    Tables would be good. You still need to be sure that any files that store binlog events are updated safely and can deal with partial updates.

    ReplyDelete

 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.