1. Maria is here. Despite the disclaimers, the performance is not too bad. I have but one suggestion: the engine comment should be changed. The following query was actually much faster on Maria than on other storage engines:
    • select count(*) from foo where d > 0

    mysql> show engines;
    +--------+---------+----------------------------------------+--------------+----+------------+
    | Engine | Support | Comment                                | Transactions | XA | Savepoints |
    +--------+---------+----------------------------------------+--------------+----+------------+
    | MARIA  | YES     | Traditional transactional MySQL tables | YES          | NO | NO         |
    +--------+---------+----------------------------------------+--------------+----+------------+
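
    To repeat the comparison, something like the following works; the table definition here is only an example:

    CREATE TABLE foo (d INT NOT NULL) ENGINE=MARIA;
    CREATE TABLE foo_myisam (d INT NOT NULL) ENGINE=MYISAM;
    -- load the same data into both tables, then time the query on each
    SELECT COUNT(*) FROM foo WHERE d > 0;
    SELECT COUNT(*) FROM foo_myisam WHERE d > 0;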

  2. Back in the day, servers with 8 to 32 CPUs were big iron. Soon commodity servers will have 8 or more CPU cores, 128GB of RAM and thousands of IO operations per second to local disks.

    While the Falcon community might have you think that Falcon is the storage engine for this future platform, the InnoDB community is also working hard to improve scalability and maintain its role as an amazing storage engine.

    I will speak about changes in the code that we and others have made and the impact of those changes on performance. The results are significant.

    Details on my talk are here. I am speaking on Thursday, April 17 at 11:55AM.

  3. I am surrounded by Harry Potter fans. Harry took Defense Against the Dark Arts at school to learn how to defend himself against other wizards and witches. How do you defend your MySQL server against applications that run too many queries?
    1. connection limits
    2. query per hour limits
    Are there any other options?

    You can use connection limits to reduce the number of concurrent connections an application can use. But your application might not be prepared to work with fewer connections.

    You can set a limit on the number of queries per hour an account can use, but that is a blunt instrument for rate limiting. Before the limit is reached, no limiting is done and after the limit is reached the account is useless until the hour is up. This would be more useful for my current problem if the rate limit were computed over a shorter interval, such as queries per minute. In that case, the outage would last no more than one minute when the limit was reached.
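
    Both limits can be set per account with GRANT; the account name and values below are just examples:

    GRANT USAGE ON *.* TO 'report_app'@'%'
      WITH MAX_QUERIES_PER_HOUR 3600 MAX_USER_CONNECTIONS 20;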

    I changed MySQL to allow the number of InnoDB concurrency tickets to be set at the account level. The nickname for this was karma, and accounts with low karma would spend more time waiting to access InnoDB than accounts with high karma. But karma doesn't help much when the problem queries are short and don't enter InnoDB more than a few times.

    Is there a way to slow an account down without preventing it from running queries? A coworker proposed one idea. Allow a pre-query delay to be set per account. When set to a value greater than zero, the thread sleeps for that number of milliseconds before running a query.
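
    If this were implemented, the per-account setting might look something like the following. The syntax is made up to illustrate the idea and does not exist in MySQL:

    -- hypothetical syntax: sleep 50 milliseconds before each query from this account
    GRANT USAGE ON *.* TO 'batch_app'@'%' WITH PRE_QUERY_DELAY 50;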

  4. Are there open slots for keynotes at the MySQL User Conference? Can one be reserved for Curt Monash? I would be thrilled if he were invited.

  5. While debugging a problem, I read some of the code used to send binlog events from the master to a slave. I wanted to determine when data for replication is written to the socket by the master. The protocol for this is interesting. Data for binlog events are written to a buffer on the master as soon as the events are available. The code for this is in mysql_binlog_send which is run by the thread on the master associated with one slave. The data is sent to the slave (written to a socket) in two cases:
    1. The buffer is full. I think the size of the buffer is determined by the value of the net_buffer_length variable. This is set to 16384 on my servers.
    2. There are no more binlog events to send.
    An implication of this is that when a master is busy and the IO thread on a slave falls behind the master, the slave will receive replication events in net_buffer_length-byte chunks, and some replication events will remain in a buffer on the master until the buffer is full rather than being sent as soon as they are available.
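
    To check the buffer size on a server:

    mysql> show global variables like 'net_buffer_length';
    +-------------------+-------+
    | Variable_name     | Value |
    +-------------------+-------+
    | net_buffer_length | 16384 |
    +-------------------+-------+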

    I don't know how much this contributes to replication delay. When semi-sync replication as implemented in the Google patch is used, the buffer is usually flushed at transaction boundaries whether or not the buffer is full. This is done so that a slave can acknowledge receipt of the binlog events before the transaction commit returns to a user.

    Whether or not semi-sync replication is deployed, flushing data to the slave on transaction boundaries decreases the chance of transaction loss on failover.

  6. I used to think that MySQL rarely crashed. Then I began running a script to search the database error log (/var/lib/mysql/HOSTNAME.err) for crashes on a daily basis. Now I have numbers to show that it rarely crashes and can compute the MTBF for my servers.

    For the purpose of this blog, I measure MTBF as the expected number of days that a MySQL server will run before crashing. If I have 100 servers and there is 1 crash every 10 days, then my MTBF is 1000 days. The crash may be caused by a software bug or flaky hardware and it isn't always possible to distinguish between the two. Also, this excludes planned restarts so the uptime of a server almost never reaches the MTBF.
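
    As a formula: MTBF = (number of servers * days observed) / crashes observed = (100 * 10) / 1 = 1000 days.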

    With those rules, what is your MTBF?

    I use two types of servers and MySQL with InnoDB. The two types have an MTBF of 600 days and 2000 days. With a bit of work, the MTBF for one of the types can be improved from 600 to more than 1000 days. I think these numbers are remarkable, especially when I convert from days to years and round up (2 to 6 years is a long time). But MySQL is the only RDBMS I have deployed in a production environment, so I can't compare it to anything. Thus, my question.

  7. Chained and circular replication are popular methods to provide highly available database service for MySQL. They are popular because such a deployment is easy to manage and provides great uptime. Chained replication is also popular elsewhere. I want to use it, but I don't, and will explain why. In what follows, I focus on chained replication, but most of this also applies to circular replication.

    There are three problems that make chained replication difficult to deploy. I expect most of these to be fixed by future MySQL releases.
    1. It is difficult to avoid replicating statements that don't need to be replicated.
    2. Each server in the chain has more replication delay from the master.
    3. Transactions may be lost when a failed node is removed from the chain.
    I will use as an example servers deployed in a chain with foo as the master and bat as the last node in the chain: foo -> bar -> baz -> bat. All of the servers have the following parameters set in /etc/my.cnf.
    • log_bin
    • log_slave_updates
    • replicate-wild-ignore-table=test.%
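
    As these would appear in /etc/my.cnf:

    [mysqld]
    log_bin
    log_slave_updates
    replicate-wild-ignore-table=test.%
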
    Controlling what replicates

    As the slaves will be used for reporting queries, applications need to create tables to evaluate queries in multiple steps, to improve performance and to compensate for some SQL features that MySQL does not support. Tables will be created in the test database for this purpose. Statements that create and modify rows for tables in the test database must not be replayed on other servers by the replication thread: they might fail because those tables do not exist (or already exist) on other servers, and even when they succeed they create useless load. As all slaves have log_bin set, these statements will be written to the slave's binlog and received by the next slave in the replication chain.

    The replicate-wild-ignore-table parameter prevents these statements from being replayed by replication. Unfortunately, the replication filter parameters are a constant source of confusion and a potential source of errors. On this I speak from experience and others agree with me. The replicate-wild-ignore-table parameter works as long as:
    1. All applications use the test database for working tables.
    2. Tables from the test database are never used in a statement that changes rows for tables in the main database. If they are, then those statements will be ignored by slaves and transactions will be lost.
    The other replication filter options, such as replicate-ignore-db, have the same problems. You have to trust applications to know the rules and do the right thing. Eventually they won't and transactions will be lost.
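
    For example, based on the rule above, a statement like the following would be ignored by the slaves even though it changes rows in the main database (the table names are made up):

    INSERT INTO proddb.report_summary
    SELECT customer_id, SUM(amount) FROM test.scratch_orders GROUP BY customer_id;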

    Row based replication should fix all of this. Today, replication events are SQL statements, and filters are limited to skipping a statement that references a given table or skipping a statement based on the default database. When a replication event describes a change to one row in one table, filters can be specified so that only events that modify a given table are not replayed.

    Replication delay

    Before a replication transaction is replayed on a slave, it must first be replayed on the previous node in the chain. Each node on a chain adds replication delay. Unless all transactions are very fast or the replication chain is short, this creates a lot of replication delay towards the tail of the chain. This will be improved, but not completely fixed, in future MySQL releases, as row based replication removes the overhead of processing SQL statements from the replication apply path.

    Failover

    When a slave in the interior of the chain fails, the node replicating from the failed slave must change to use the predecessor of the failed slave. Transactions are likely to be lost when the CHANGE MASTER command is run to do this. The problem is that the replication state of a slave is local to the master used by the slave: it is a binlog filename and offset. That binlog file does not exist on any other server and there is no easy way to map this to a binlog file and offset on another master. It can be done in many cases, but it is difficult to automate, and I suspect that most deployments use the easy solution, which is to begin replicating from the end of the binlog on the new master. The easy solution is likely to lose transactions.

    Assume that the server bar has failed and baz is changed to replicate from the end of the active binlog on foo. This will lose transactions that replicated from foo to bar but not from bar to baz. It will also lose transactions that had yet to reach bar.
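
    The easy solution looks something like this, run on baz after bar fails; the binlog file name and offset are made up and would come from SHOW MASTER STATUS on foo:

    STOP SLAVE;
    CHANGE MASTER TO MASTER_HOST='foo',
      MASTER_LOG_FILE='foo-bin.000123', MASTER_LOG_POS=98765432;
    START SLAVE;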

    I don't like to lose transactions.

    This too can be fixed and MySQL intends to fix it. One solution is to add metadata to replication events to make them globally unique. When this is done, the current replication state can be used to reconnect after failover without losing transactions. Assume the following is done:
    1. Metadata is added to replication events to include a globally unique ID. This can be the pair (server_id, event_id) where event_id is the value of a counter that is unique within one server and incremented for each event written to a binlog. The value is determined by the server that first logs the binlog event and does not change when binlog events are written by a slave because log_slave_updates is set. This behavior is already provided for server_id.
    2. A command is added to get the globally unique ID for the last replication event run by a node (SHOW SLAVE GLOBAL_OFFSET)
    3. A method is added to map a globally unique ID to a binlog filename and offset on a server (SHOW MASTER OFFSET FOR ID <id>). This might be slow as a server must scan existing binlogs to determine the offset.
    With this functionality, the following provides lossless failover:
    1. Run this on the slave that must reconnect: SHOW SLAVE GLOBAL_OFFSET
    2. Run this on the server to which the slave reconnects, using the ID from step 1: SHOW MASTER OFFSET FOR ID <id>
    3. Run this on the slave that must reconnect using the values from the previous step: CHANGE MASTER ...
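
    With the hypothetical commands above, the sequence might look like the following when bar fails and baz reconnects to foo; all IDs, file names and offsets are made up:

    -- on baz: get the global ID of the last replayed event
    SHOW SLAVE GLOBAL_OFFSET;                 -- returns, say, (server_id=1, event_id=88211)
    -- on foo: map that global ID to a local binlog file and offset
    SHOW MASTER OFFSET FOR ID (1, 88211);     -- returns, say, ('foo-bin.000123', 4711)
    -- on baz: reconnect at that position so no events are skipped
    CHANGE MASTER TO MASTER_HOST='foo',
      MASTER_LOG_FILE='foo-bin.000123', MASTER_LOG_POS=4711;
    START SLAVE;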

  8. You cannot replicate down versions in MySQL. For example, a MySQL 4.0 slave cannot connect to a MySQL 5.0 master. You can replicate up versions.

    I helped upgrade a critical MySQL deployment from version 4.0 to version 5.0. In case something went wrong, we wanted to support fast failover to a master running MySQL 4.0 without losing transactions.

    One way to do this is to wait for the crisis, and then replay binlogs from the MySQL 5.0 master on the old MySQL 4.0 master. This isn't very fast, although you could replay the binlogs as soon as the 5.0 master rotated to a new log file. It also won't work, as some of the SQL statements produced by mysqlbinlog with a 5.0 binlog as input cannot be parsed by a 4.0 server. The problem SQL is generated by MySQL replication code, as applications used only the subset of SQL supported by both versions 4.0 and 5.0.

    My coworkers built a tool to poll a 5.0 master for new binlog events, filter the SQL that is not supported and run the output against a 4.0 server. With the tool, the 4.0 server was current with the 5.0 master and the people who worry about database downtime were appeased.

    The upgrade was done without a crisis. I hope MySQL provides a product for this before I must do an upgrade to version 5.1 or 6.0.

  9. When David DeWitt and Michael Stonebraker call you out, you might be on to something. In a new article they claim that MapReduce is a step backwards for parallel query processing. Regardless of whether they are fond of it, they are helping to spread the message and start interesting discussions, and that is good.

    Even better is the impact that Sun's purchase of MySQL might have on the future of MySQL and Hadoop. MySQL has great transaction engines that generate a lot of operational data. Eventually, there is so much operational data that parallel query, hash join and fast external sorts are needed to process queries quickly. Sun likes to sell lots of machines and Hadoop uses a lot of machines. Hopefully, it is just a matter of time before we are able to maintain copies of MySQL tables in realtime in Hadoop using row-based replication and then query that data using Pig Latin or something else that looks like SQL.
