  1. There is a great article on how using LIMIT clauses in update and delete statements can produce different results on a slave than on the master. If you really want to keep a master and slave in sync, I have a larger list of rules to follow for SQL that is run on a master. They might not be popular but they make it easy to run a large number of replication slaves with few problems.
    1. Do not use LIMIT clauses in statements that can change rows (see Baron's article)
    2. Do not use session variables in SQL statements that might be written to the binlog. This is OK to do on recent MySQL versions, but some of us still use old releases.
    3. Do not use temporary tables on the master. When the binlog contains statements that populate and then use a temporary table, a crash or restart of the slave mysqld process between the population and use means that the temp table will not exist after the restart and replication will halt.
    4. Do not use MyISAM tables on the slave. A slave mysqld process may crash and a slave SQL thread might roll back a transaction because of a lock wait timeout or deadlock. The slave SQL thread must then retry the transaction. But MyISAM doesn't support rollback. At this point, the MyISAM table on the slave is likely to have different contents than the master.
    5. Do not use MyISAM tables on the master. There is code to keep InnoDB and the binlog in sync on the master. That doesn't work for MyISAM. Also, it is not possible to guarantee that what gets written to the binlog matches the changes done to MyISAM on the master as MyISAM does not do rollback and does not undo partially executed statements.
    6. Do not allow the slave to crash. Unfortunately, this is impossible. But slave replication state is not updated atomically with the changes made by the slave SQL thread. The slave SQL thread first commits transactions and then updates the files in which replication state is stored. A crash between the two means that the slave SQL thread will replay statements from the relay log on restart, statements that have already been executed.
    7. With many thanks to Peter for documenting the problems, do not use replicate-ignore-db and binlog-ignore-db.
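    The first rule above is easy to see in a toy model. This is a hypothetical sketch, not MySQL code: the same "DELETE ... LIMIT 1" is applied to a master and a slave that hold identical rows in a different physical order. Without an ORDER BY, the server is free to delete any matching row, so the copies can diverge.

```python
# Hypothetical sketch: "DELETE ... LIMIT 1" without ORDER BY applied to two
# copies of the same rows stored in different physical order.

def delete_limit_one(rows, predicate):
    """Delete the first row matching predicate, mimicking DELETE ... LIMIT 1."""
    for i, row in enumerate(rows):
        if predicate(row):
            del rows[i]
            return

master = [{"id": 1, "grp": "a"}, {"id": 2, "grp": "a"}]
slave = [{"id": 2, "grp": "a"}, {"id": 1, "grp": "a"}]  # same rows, other order

delete_limit_one(master, lambda r: r["grp"] == "a")
delete_limit_one(slave, lambda r: r["grp"] == "a")

print(master)  # [{'id': 2, 'grp': 'a'}]
print(slave)   # [{'id': 1, 'grp': 'a'}] -- the copies have diverged
```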

  2. A few DBMS vendors support parallel query processing on large SMP servers or on clusters of commodity servers. MySQL does not do that yet. But there is a way to use MySQL and parallel data processing -- maintain a copy of your data in Hadoop. This can be done today if you are willing to frequently dump all of your tables from MySQL and reload them into the Hadoop Distributed File System. You might not want to do this because full dumps are slow and the copy of the data in Hadoop will be stale.

    To replicate from MySQL to another data store, you need a description of each row changed in MySQL. MySQL 5.1 almost provides that with row-based replication. It is missing a library that can be used by a MySQL client to decode the contents of a binlog event. Hopefully, MySQL will provide such a library.

    There are other problems that you might want to solve for this to work and these include:
    • Support for incremental changes in the alternate data store. Files in Hadoop are written, closed and then read. Once closed, they are read-only. To replicate from MySQL into Hadoop, a new file must be created for each batch of replicated transactions.
    • Support for schema changes in the alternate data store. The amount of data that must be changed in the alternate store depends on several factors including whether rows are self-describing, whether all rows must have the same schema and whether the schema change impacts an indexed column.
    • Support for updates and deletes in the alternate data store. If this is a file, then inserts can be handled by appending the new data to the end of the file. But updates and deletes require more work.
    • Support for indexes. Indexes are not needed for query processing if you expect to scan all rows for each query. Primary key indexes are needed if you hope to process replication changes without doing full scans.
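    The last two problems above are related: with a primary key, applying replicated updates and deletes to the copy is a lookup rather than a scan. A minimal sketch, with hypothetical event tuples that stand in for decoded row-based replication events:

```python
# Hypothetical sketch: apply row-change events to a copy of the data keyed
# by primary key. With the key, updates and deletes are O(1) lookups;
# without it, each would require a full scan.

def apply_events(table, events):
    """table: dict mapping primary key -> row. events: (op, pk, row) tuples."""
    for op, pk, row in events:
        if op in ("insert", "update"):
            table[pk] = row
        elif op == "delete":
            table.pop(pk, None)
    return table

copy = {}
events = [
    ("insert", 1, {"id": 1, "val": "a"}),
    ("insert", 2, {"id": 2, "val": "b"}),
    ("update", 1, {"id": 1, "val": "a2"}),
    ("delete", 2, None),
]
print(apply_events(copy, events))  # {1: {'id': 1, 'val': 'a2'}}
```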

    Is anyone else trying to do this? Does anyone else want to do this?

  3. When not injecting MySQL with code, I try to make MySQL work better for my employer. I also campaign for features I want in MySQL and against features I don't want in MySQL. Support for parallel operations is an area that I am still uncertain about.

    There are a few reasons why I don't want it supported in MySQL.
    • It will make the MySQL server much more complex and bugs accompany complexity.
    • MySQL doesn't need it for many workloads. It favors throughput over response time for queries on large data sets. That model works for many customers. The simplicity and scalability of replication further enable throughput.
    • There are alternatives for parallel query processing including Greenplum, if they open source all of their code, and Hadoop, if you don't need SQL and you can figure out how to get your data into it from MySQL.
    But what does it mean to support parallel operations in MySQL? MySQL can support parallel query processing within one mysqld process. I am wary of this because it will require significant changes to the optimizer and query execution code in MySQL, even if this is limited to queries on partitioned tables.

    There are more limited forms of support for parallel operations that require fewer changes to MySQL.
    • InnoDB already supports parallel IO because it issues prefetch requests during table and index scans and the requests are processed by a background thread.
    • Table and index scans on partitioned tables can be done in parallel.
    • Filesort can use async IO, real or simulated, to overlap IO with sorting.
    Finally, there is an alternative to the traditional approach to parallel query processing that is made possible by the MySQL Proxy and the ease with which MySQL supports scale out. Queries could be parallelized in the MySQL Proxy rather than in the MySQL server. Different parts of a query can be run on different servers and combined within the proxy to be returned to the user as if a single query were used. I hope this is done.
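    The proxy idea above can be sketched in a few lines. This is a toy scatter-gather model, not MySQL Proxy code: each shard runs its part of the query in parallel and a combiner in the proxy merges the partial results into one answer, as a proxy might do for "SELECT COUNT(*) FROM t" across servers.

```python
# Hypothetical sketch of scatter-gather in a proxy layer: run the same
# aggregate on each shard in parallel, then combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def shard_count(shard_rows):
    # Stand-in for running "SELECT COUNT(*)" on one shard's connection.
    return len(shard_rows)

def proxy_count(shards):
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(shard_count, shards))
    return sum(partials)  # the proxy combines partial counts into one result

shards = [[1, 2, 3], [4, 5], [6]]
print(proxy_count(shards))  # 6
```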

  4. How do you tune load problems on MySQL? You might get lucky and see that there are a few long running statements when running SHOW PROCESSLIST. You might be working on a small system where you know all of the people and/or SQL statements. Unfortunately, these approaches don't work for me most of the time. My first approach was to run SHOW PROCESSLIST many times and aggregate the results by account and by statement. This is a simple and occasionally effective profiler. Aggregating by statement worked better after normalizing the query text by replacing all literals with a constant. Eventually we implemented support for account, table and index monitoring. The results are reported with SHOW USER_STATISTICS, SHOW TABLE_STATISTICS and SHOW INDEX_STATISTICS. The features are described here.
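    The normalization step can be sketched with two regular expressions. This is a minimal, hypothetical version of the idea, not our actual normalizer: replace string and numeric literals with a placeholder so that statements differing only in their literals aggregate under the same key.

```python
# Hypothetical sketch of query-text normalization: replace literals with a
# placeholder so equivalent statements collapse to one aggregation key.
import re

def normalize(sql):
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)  # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)            # numeric literals
    return sql

a = normalize("SELECT * FROM t WHERE id = 42 AND name = 'bob'")
b = normalize("SELECT * FROM t WHERE id = 7 AND name = 'alice'")
print(a)       # SELECT * FROM t WHERE id = ? AND name = ?
print(a == b)  # True -- both statements map to the same normalized text
```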

    I use the output from these commands in several ways. I generate nightly reports that rank accounts by the number of seconds spent running SQL statements (Busy_time in SHOW USER_STATISTICS). I look at that report on a weekly basis to see if there is a change in the most busy accounts. I also look at that report when new applications are launched to see if there are SQL statements that must be adjusted. I should generate reports that contain the change in values from day to day, but I have yet to do that. Many of my reports are generated with a mix of Bash, Awk and SQL statements and my reporting capabilities are limited.

    When there is an immediate problem, I use a tool that displays the change in values from SHOW USER_STATISTICS over intervals that I define so I can see the load per account over the last 10 minutes rather than since the server was started. It would be great to roll this kind of functionality into innotop. But there is not much point in doing that until the new monitoring code is in MySQL.
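    The interval tool boils down to differencing two snapshots of cumulative counters. A minimal sketch, assuming the snapshots are dicts mapping account name to a cumulative Busy_time value:

```python
# Hypothetical sketch: turn cumulative per-account counters (as reported by
# SHOW USER_STATISTICS since server start) into per-interval deltas.

def interval_delta(prev, curr):
    """prev, curr: dict mapping account -> cumulative counter value."""
    return {acct: curr[acct] - prev.get(acct, 0) for acct in curr}

snap_t0 = {"app_rw": 1200, "batch": 300}
snap_t1 = {"app_rw": 1250, "batch": 900, "new_app": 15}

print(interval_delta(snap_t0, snap_t1))
# {'app_rw': 50, 'batch': 600, 'new_app': 15}
```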

    Occasionally people ask if patches for this feature are available on a particular release. Unfortunately, they are limited to 5.0.37 and 4.0.26 for now, and all changes are in one patch. If that doesn't work for you then update the feature request for this here. And if you don't want to wait, I am sure you could hire someone from the MySQL community to port it for you.

  5. I think this story has a moral. The source to MySQL is open. Because of that, we were able to find the cause of a serious performance regression between MySQL4 and MySQL5. We may also have reduced the regression for one of our benchmarks from 58% to 29% slower. Note that this difference is for our benchmark that consists of fast SQL insert statements with many expressions and minimal network latency.

    As part of upgrading a few database servers from MySQL 4.0 to MySQL 5.0, I ran a benchmark that measures the time to reload all tables of a database. The number of sessions used to perform the reload was configurable, and the performance difference between MySQL 4 and 5 was the greatest when 1 session (no concurrency) was used.

    This was quite a shock. I first assumed the problem was caused by InnoDB and the new support for row_format=compact. I even removed support for this feature and reran the tests. That made MySQL5 faster, but not much faster.

    I then began using oprofile, which is amazing for performance debugging, and found that MySQLParse was the top consumer of CPU time. Unfortunately, this is one large function containing a huge switch statement generated by Bison. Fortunately, I was able to use the rdtsc instruction to isolate the problem.

    The grammar for MySQL5 has changed significantly from MySQL4 in a few places. There is good reason for this change as the SQL supported by MySQL5 is much closer to the SQL standard than it used to be. One unexpected effect of this change is that much more CPU time is used to parse expressions (see http://bugs.mysql.com/bug.php?id=29921).

    I found a colleague who gets interrupted at work less frequently than I do and asked him to modify the grammar and get back the performance. I think we have something that works. The worst case performance difference for MySQL5 has been cut in half for MyISAM and almost in half for InnoDB.

    The bug report has many more numbers. The number of seconds to reload a database with MyISAM tables are listed below. MySQL5 is 58% slower without the fix and 29% slower with it.


    MySQL 4.0.26 - 10061 seconds
    MySQL 5.0.37 (original) - 15808 seconds
    MySQL 5.0.37 (fixed) - 12995 seconds
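    The percentages follow from the reload times above (my arithmetic gives 57% for the unfixed case, roughly the 58% quoted):

```python
# Derive the slowdown percentages from the reload times listed above.
base = 10061   # MySQL 4.0.26, seconds
orig = 15808   # MySQL 5.0.37 original
fixed = 12995  # MySQL 5.0.37 with the grammar fix

print(round(100 * (orig - base) / base))   # 57 -- roughly the 58% quoted
print(round(100 * (fixed - base) / base))  # 29
```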

  6. What is rate limiting in MySQL and why might you need it? For starters this is limited to the InnoDB engine. InnoDB had performance problems on SMP servers and uses rate limiting as a workaround. There is an excellent description of this at mysqlperformanceblog.com. InnoDB limits the number of threads that can execute concurrently by giving each thread a number of tickets and making threads sleep when they are out of tickets and too many other threads are active.

    Unfortunately, the replication thread is not given special treatment. It gets no more tickets than other sessions. On a busy replica where the number of concurrent queries exceeds innodb_thread_concurrency, replication can get far behind as the replication thread spends too much time waiting for tickets. This is easy to fix. Change the code so that the replication thread gets many more tickets per allocation than other threads. The result of this change is much less replication delay on busy slaves.

    This feature in InnoDB can also be used to rate limit accounts that are using too many resources on a busy server. The trick is to provide a mechanism to change the number of tickets that a session may get at the account level. When too much work is done by one account and there is no easy way to change applications that use the account, reducing the number of tickets given to sessions for that account is a good way to make that account use less of a server. This requires a mechanism for setting the equivalent of innodb_concurrency_tickets per account. This can be done by adding new SQL syntax to the parser, adding a system variable that can be set with a comma-separated list of account and ticket-count pairs, or by redefining the values of rarely used columns in the mysql.user table. Since I have never been fond of the max_questions and max_updates columns in mysql.user, I would use the value of the column, when not set to 0, to be the per-account value for innodb_concurrency_tickets.
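    The ticket mechanism is easy to model. This is a hypothetical single-threaded sketch, not InnoDB code: a session receives a batch of tickets when it enters the engine and only has to compete for entry again when the tickets run out, so a smaller per-grant ticket count forces an account's sessions to queue more often.

```python
# Hypothetical sketch of InnoDB-style concurrency tickets: fewer tickets per
# grant means more trips through the entry queue for the same workload.

class Session:
    def __init__(self, tickets_per_grant):
        self.tickets_per_grant = tickets_per_grant
        self.tickets = 0
        self.entries = 0  # times the session had to (re)enter the engine

    def run_statement(self):
        if self.tickets == 0:
            self.entries += 1  # must queue for engine entry again
            self.tickets = self.tickets_per_grant
        self.tickets -= 1

normal = Session(tickets_per_grant=500)
limited = Session(tickets_per_grant=10)  # a rate-limited account

for _ in range(1000):
    normal.run_statement()
    limited.run_statement()

print(normal.entries, limited.entries)  # 2 100
```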

  7. My wishes are:
    1. Provide HA for traditional storage engines (InnoDB) as defined by the features in the Google patch. The features include support for semi-sync replication, binlogs that survive a master failover and slave replication state that does not get corrupt on a slave crash. For the binlog to survive a master failover, the slave's position in the master's binlog must be represented by something other than the file offset, or binlogs must be mirrored.
    2. Support parallel IO. Parallel query is very hard to support. Parallel IO is a bit simpler. External sort (index creation) can be much faster on multi-disk servers when async IO is used so that more disks are utilized concurrently. Async IO can be provided by libaio or a set of threads as InnoDB does for Linux. Parallel IO can also be used for external hash join, external sort merge join and prefetch. InnoDB uses async IO on some platforms and when the Google patch is used. But external sort does not use it and nested loops join is the only join method in production MySQL releases.
    3. Continue to improve InnoDB thread scalability.
    4. Separate threads from connections in the MySQL server so I can run a server with many more connections than is possible today or provide a proxy that provides connection pools.
    5. Fix the InnoDB performance regressions in MySQL 5.0.
      • Group commit is broken for InnoDB in MySQL 5.0. This bug has been open for some time. I may try to fix it.
      • There might be a performance regression for insert statements with InnoDB in MySQL 5.0.37. I say might because I have great respect for Heikki and InnoDB and I am still investigating the problem. But I think that the performance difference is significant (30% to 50% slower for table restore using mysqldump output). The difference does not exist for queries and I have yet to test update and insert. I need to fix this if I am to use MySQL 5.0.
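    The parallel IO wish in item 2 is about overlapping reads with work. A hypothetical sketch of the external sort case, using a prefetch thread in place of real async IO (the names and chunk model are mine, not MySQL's):

```python
# Hypothetical sketch: overlap IO with sorting, as an external sort with
# async IO might. A background thread "reads" the next chunk while the
# current chunk is sorted into a run.
import queue
import threading

def prefetch(chunks, q):
    for chunk in chunks:
        q.put(chunk)  # stands in for an async read completing
    q.put(None)       # end-of-input marker

def external_sort_runs(chunks):
    q = queue.Queue(maxsize=2)
    threading.Thread(target=prefetch, args=(chunks, q), daemon=True).start()
    runs = []
    while (chunk := q.get()) is not None:
        runs.append(sorted(chunk))  # sorting overlaps with the next read
    return runs

print(external_sort_runs([[3, 1], [5, 4], [2]]))  # [[1, 3], [4, 5], [2]]
```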

  8. I have uploaded the patch for semi-sync replication with MySQL 5.0.37 at http://code.google.com/p/google-mysql-tools. It is in the Source section (here) and there is some documentation on the wiki. There is not much documentation at this time. I would like to get some feedback from power users before making this easier to use. I am also still trying this out.

    This only works with InnoDB, but can easily be extended to other storage engines.

    There are many interesting features that can be added to it. The HA feature from Solid as described at the MySQL Conference supports different levels of durability and these determine when the slave acknowledges receipt of replication events. This implementation ACKs receipt as soon as the IO thread receives them, which is before the IO thread buffers them. With not much effort, there could be options that delay the ACK until the IO thread has written the relay log or forced the relay log to disk.

    More complex commit protocols can be implemented. Currently, the master calls commit on a storage engine, and then blocks return from a successful commit until at least one slave ACKs receipt or the timeout expires. I am sure that you can think of options to make this more interesting.
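    The commit protocol described above is simple to model. This is a hypothetical sketch with my own names, not the patch's code: the master commits locally, then blocks the return from commit until a slave ACK arrives or the timeout expires.

```python
# Hypothetical sketch of semi-sync commit: commit locally, then wait for a
# slave ACK with a timeout so a dead slave cannot block the master forever.
import threading

def commit_semi_sync(ack_event, timeout_secs):
    # The local storage engine commit happens first (not shown), then the
    # master blocks until a slave acknowledges receipt or the timeout fires.
    acked = ack_event.wait(timeout=timeout_secs)
    return "acked" if acked else "timed out"

ack = threading.Event()
threading.Timer(0.05, ack.set).start()  # simulate a slave ACKing shortly after
print(commit_semi_sync(ack, timeout_secs=1.0))  # acked

no_ack = threading.Event()  # simulate a slave that never ACKs
print(commit_semi_sync(no_ack, timeout_secs=0.1))  # timed out
```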

  9. Chip Turner has begun work on a new storage engine that makes it easy to export files as if they were database tables. Why is this good? I have written code to export data from /proc in SHOW STATUS. This makes it possible to track changes in system resource consumption, such as the output from vmstat. The code I wrote is not portable, as it only exports a few values. The work underway by Chip makes it possible to read many more files. This code isn't ready for production, but it is amazing that it only takes a few days to get something running.

    My description doesn't do this justice; the examples that follow do. And another example is here.


    create table foo (load1 DECIMAL(3, 2),
                      load5 DECIMAL(3, 2),
                      load15 DECIMAL(3, 2),
                      active_processes VARCHAR(12),
                      lastpid INTEGER)
    engine=filesystem connection="/proc/loadavg";

    select * from foo;

    +-------+-------+--------+------------------+---------+
    | load1 | load5 | load15 | active_processes | lastpid |
    +-------+-------+--------+------------------+---------+
    |  0.01 |  0.39 |   0.65 | 1/98             |    1713 |
    +-------+-------+--------+------------------+---------+

    create table vmstat (label varchar(64), value integer)
    engine=filesystem connection="/proc/vmstat";

    select * from vmstat order by label limit 4;

    +-------------------+--------+
    | label             | value  |
    +-------------------+--------+
    | allocstall        |      0 |
    | kswapd_inodesteal |   1393 |
    | kswapd_steal      | 133347 |
    | nr_anon_pages     |   2265 |
    +-------------------+--------+
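    Under the covers, the engine just has to tokenize a file's lines into the columns declared for the table. A hypothetical sketch of the /proc/loadavg case in Python (the function and field names are mine):

```python
# Hypothetical sketch of what the filesystem engine does for /proc/loadavg:
# split the file's single line into the columns exposed by the table above.

def parse_loadavg(line):
    load1, load5, load15, active, lastpid = line.split()
    return {"load1": float(load1), "load5": float(load5),
            "load15": float(load15), "active_processes": active,
            "lastpid": int(lastpid)}

row = parse_loadavg("0.01 0.39 0.65 1/98 1713")
print(row["load1"], row["active_processes"], row["lastpid"])  # 0.01 1/98 1713
```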

  10. I am lucky right now to be working for a company whose public statements about open source database software, MySQL and InnoDB, get noticed. I am even luckier that such statements are true.

    I went to the MySQL Replication Roadmap and Vision talk by Lars Thalmann today. I am very happy with what I heard, as they are working on the problems that, when solved, will make MySQL deployments more highly available, even if you don't use MySQL Cluster. We have published patches that solve some of these problems for us today, but we much prefer a solution provided by MySQL, as they will then provide support for those features and we can reduce the overhead of maintaining our patches.

    Some of the features that I really want are listed below. Were these available today, we would not have done our patches and would have had much more spare time to visit the gym and work off the great lunches and dinners that are provided to us.
    • including the original SQL statement when row-based replication is used
    • tools for converting row-based replication events into a format I can read
    • use of SQL tables for slave replication state and making updates transactional
    • support for some variations of synchronous replication
    • support for making binlog offsets usable on multiple servers
    • support for fast failover