<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5915567578707286635</id><updated>2012-01-27T13:58:36.171-08:00</updated><category term='myisam'/><category term='distributed'/><category term='postgres'/><category term='other'/><category term='mysql'/><category term='ha'/><category term='innodb'/><category term='free'/><category term='optimizer'/><category term='shill'/><category term='windows'/><category term='oops'/><category term='nosql'/><category term='performance'/><category term='pbxt'/><category term='mongodb'/><category term='rant'/><category term='replication'/><title type='text'>High Availability MySQL</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default?start-index=101&amp;max-results=100'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>217</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7354196434421186378</id><published>2012-01-26T20:27:00.000-08:00</published><updated>2012-01-26T20:27:57.648-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Error injection tests for InnoDB would be nice</title><content type='html'>I am trying to figure out why an InnoDB table was lost when a DDL statement failed. I think it was a RENAME TABLE statement. I have yet to find the root cause, but I did find that InnoDB doesn't report some errors when RENAME fails: the user thinks that the table was renamed, the FRM file is renamed, and the ibd file is not renamed. This is only a problem for tables stored outside the InnoDB system tablespace, so it only happens when --innodb_file_per_table=1 is used. 
This is &lt;a href="http://bugs.mysql.com/bug.php?id=64144"&gt;bug&amp;nbsp;64144&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As I wrote in a &lt;a href="http://mysqlha.blogspot.com/2011/12/marketing-bug-in-3-easy-steps.html"&gt;previous blog post&lt;/a&gt;, it is time to add error injection tests to InnoDB.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7354196434421186378?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7354196434421186378/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2012/01/error-injection-tests-for-innodb-would.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7354196434421186378'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7354196434421186378'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2012/01/error-injection-tests-for-innodb-would.html' title='Error injection tests for InnoDB would be nice'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1361057475268881275</id><published>2012-01-12T07:07:00.000-08:00</published><updated>2012-01-12T07:33:14.200-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Who wants to write a storage engine?</title><content type='html'>&lt;a 
href="http://code.google.com/p/leveldb/"&gt;LevelDB&lt;/a&gt; is here. It might make an interesting storage engine for MySQL, as it has many performance benefits at the cost of a few limitations: no multi-statement transactions and MyISAM-like concurrency. I doubt we will ever get a production-quality LevelDB storage engine for MySQL because the storage engine API is hard, so hard that such projects require funding. This is unfortunate.&lt;br /&gt;&lt;br /&gt;LevelDB might be a great fit for MongoDB. MongoDB doesn't need multi-statement transactions. Both are limited to 1-writer or N-reader concurrency, but writes to database files are much faster with LevelDB because it doesn't do update in place. So LevelDB loses less performance on IO-bound workloads from that concurrency limit, and my guess is that this could make MongoDB much better at supporting IO-bound workloads.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1361057475268881275?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1361057475268881275/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2012/01/who-wants-to-write-storage-engine.html#comment-form' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1361057475268881275'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1361057475268881275'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2012/01/who-wants-to-write-storage-engine.html' title='Who wants to write a storage engine?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-174033801786501014</id><published>2011-12-02T18:42:00.001-08:00</published><updated>2011-12-02T18:45:27.114-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Marketing a bug in 3 easy steps</title><content type='html'>&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=60343"&gt;File a request&lt;/a&gt; for crash recovery tests and wait a few months&lt;/li&gt;&lt;li&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=62401"&gt;File a request&lt;/a&gt; for error injection tests during InnoDB DDL and wait a few months&lt;/li&gt;&lt;li&gt;Lose a table during alter table because &lt;a href="http://bugs.mysql.com/bug.php?id=63553"&gt;untested error handling is incorrect&lt;/a&gt;&amp;nbsp;and blog about it&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-174033801786501014?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/174033801786501014/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/12/marketing-bug-in-3-easy-steps.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/174033801786501014'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/174033801786501014'/><link rel='alternate' type='text/html' 
href='http://mysqlha.blogspot.com/2011/12/marketing-bug-in-3-easy-steps.html' title='Marketing a bug in 3 easy steps'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2614989863700045711</id><published>2011-11-18T14:15:00.001-08:00</published><updated>2011-11-18T14:21:33.993-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Great work, bug #12704861 was fixed!</title><content type='html'>MySQL Community Server 5.1.60 has been released and I am very happy because the &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/news-5-1-60.html"&gt;release notes&lt;/a&gt; state that bug&amp;nbsp;#12704861 has been fixed. I know this bug quite well. As my readers are very busy let me provide all of the details that have been made available to the community:&lt;br /&gt;&lt;blockquote class="tr_bq"&gt;&lt;span class="Apple-style-span" style="background-color: white; font-family: arial, sans-serif; font-size: 13px;"&gt;InnoDB Storage Engine: Data from BLOB columns could be lost if&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="background-color: white; font-family: arial, sans-serif; font-size: 13px;"&gt;the server crashed at a precise moment when other columns were&amp;nbsp;&lt;/span&gt;&lt;span class="Apple-style-span" style="background-color: white; font-family: arial, sans-serif; font-size: 13px;"&gt;being updated in an InnoDB table. 
(Bug #12704861)&lt;/span&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2614989863700045711?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2614989863700045711/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/11/great-work-bug-12704861-was-fixed.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2614989863700045711'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2614989863700045711'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/11/great-work-bug-12704861-was-fixed.html' title='Great work, bug #12704861 was fixed!'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1108804542136531423</id><published>2011-10-19T23:05:00.000-07:00</published><updated>2011-10-19T23:05:42.261-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Unexplained jumps in Seconds_Behind_Master</title><content type='html'>I am trying to understand why a server would go from 0 to 45 and then back to 0 seconds of replication lag as reported by the Seconds_Behind_Master column in SHOW SLAVE STATUS output. This occurs over a few seconds so there isn't a statement that runs for 45 seconds on the slave. 
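&lt;br /&gt;&lt;br /&gt;The comparison of SET TIMESTAMP values described next is easy to script. This is a minimal sketch, not the tool used for this post. It assumes the binlog has been decoded to text with mysqlbinlog, which prints statement-based events with lines such as SET TIMESTAMP=1319076342/*!*/; and the function name max_timestamp_jump is hypothetical.

```python
import re

# mysqlbinlog text output for statement-based events contains lines like:
#   SET TIMESTAMP=1319076342/*!*/;
# Only the integer seconds are captured.
TIMESTAMP_RE = re.compile(r"SET TIMESTAMP=(\d+)")


def max_timestamp_jump(lines):
    """Return the largest absolute difference, in seconds, between
    consecutive SET TIMESTAMP values found in mysqlbinlog output.

    Returns 0 when fewer than two SET TIMESTAMP lines are present.
    """
    previous = None
    largest = 0
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # not a SET TIMESTAMP line
        current = int(match.group(1))
        if previous is not None:
            largest = max(largest, abs(current - previous))
        previous = current
    return largest
```

The function accepts any iterable of lines, so a small wrapper can pass it sys.stdin with mysqlbinlog output piped in. A small maximum suggests that the event timestamps themselves are not what produces a sudden spike in Seconds_Behind_Master.&lt;br /&gt;&lt;br /&gt;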
I then compared consecutive SET TIMESTAMP values in the binlog, and the absolute difference between consecutive values was at most 2 seconds.&lt;br /&gt;&lt;br /&gt;Has anyone else been confused by this? I filed &lt;a href="http://bugs.mysql.com/?id=62839"&gt;bug 62839&lt;/a&gt;&amp;nbsp;and think there is a race condition in the code that computes the value for Seconds_Behind_Master. If I am correct about this, then we need to fix the problem. Replication lag is a big problem for many of us, and reporting a value that is much larger than the actual value is bad PR.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1108804542136531423?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1108804542136531423/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/10/unexplained-jumps-in.html#comment-form' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1108804542136531423'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1108804542136531423'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/10/unexplained-jumps-in.html' title='Unexplained jumps in Seconds_Behind_Master'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8791180465127846726</id><published>2011-10-13T07:57:00.000-07:00</published><updated>2011-10-13T07:57:58.576-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Multi-master, NoSQL and MySQL</title><content type='html'>The MySQL family has been innovating rapidly. New features need names and sometimes those names are confusing. Describing something as &lt;b&gt;multi-master&lt;/b&gt; or a &lt;b&gt;NoSQL solution&lt;/b&gt; has confused me.&lt;br /&gt;&lt;br /&gt;Multi-master requires one of conflict prevention, conflict resolution or faith. MySQL Cluster provides both conflict prevention and resolution as described &lt;a href="http://messagepassing.blogspot.com/2011/10/eventual-consistency-with-transactions.html"&gt;in these&lt;/a&gt; &lt;a href="http://messagepassing.blogspot.com/2011/10/eventual-consistency-with-mysql.html"&gt;great posts&lt;/a&gt;. Regular MySQL has minimal support for conflict prevention (auto-increment-offset can prevent insert conflicts) and thus requires faith that the application does the right thing. Regular MySQL gets conflict prevention via synchronous replication when used with &lt;a href="http://www.codership.com/"&gt;Galera&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;There has been talk of adding support to replicate from multiple masters into one slave. We have yet to agree on a name for this. It has been called fan-in, multi-source and multi-master. I hope multi-master isn't reused for this.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;You can now use HandlerSocket and the memcache API to access MySQL. But taking away SQL doesn't make this a NoSQL solution. 
And MySQL Cluster already has many of the properties of a NoSQL solution, so getting the memcache API is just gravy. The new APIs will allow some workloads to get more read throughput from MySQL. They won't do much for write throughput because those bottlenecks are in InnoDB and independent of SQL/NoSQL. Note that the memcache plugin for InnoDB has tuning options that are good for benchmarks but might not be good for production. There are options that reduce the frequency at which transactions are committed and started. These options avoid the transaction start and commit bottlenecks at the cost of stale reads and async commits; see &lt;a href="http://blogs.innodb.com/wp/2011/04/nosql-to-innodb-with-memcached/"&gt;daemon_memcached_r_batch_size and daemon_memcached_w_batch_size.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I don't think the lack of SQL access is what makes many of the NoSQL products compelling. The big thing for me is that they can reduce TCO because you will spend less time managing a DBMS that has these features:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;no sharding required, or sharding and resharding are easy&lt;/li&gt;&lt;li&gt;no failover required thanks to multi-master, or failover is automated&lt;/li&gt;&lt;li&gt;much less downtime on schema change, or no schema changes required&lt;/li&gt;&lt;/ul&gt;Eventual consistency is supported by some of the NoSQL products. Given their simpler data models, I expect it to be supported by more. This feature is critical for a service with users distributed around the world, as response time can be reduced when users access a local database.&lt;br /&gt;&lt;br /&gt;Regardless of the API (SQL or NoSQL), regular MySQL doesn't have any of the features listed above. MySQL Cluster already has all of them. But there is still hope. 
Tungsten has automated failover, Galera continues to get better, and official MySQL can get automated failover once &lt;a href="http://d2-systems.blogspot.com/2011/10/global-transaction-identifiers-feature.html"&gt;global transaction IDs are supported&lt;/a&gt;. &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html"&gt;Semi-sync replication is supported&lt;/a&gt;, and I think a few things can be done to make it much more useful for HA systems. There are tools to do online schema changes, but I think that is better done in the MySQL server.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8791180465127846726?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8791180465127846726/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/10/multi-master-nosql-and-mysql.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8791180465127846726'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8791180465127846726'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/10/multi-master-nosql-and-mysql.html' title='Multi-master, NoSQL and MySQL'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-3368914641397885007</id><published>2011-08-11T08:53:00.000-07:00</published><updated>2011-08-11T08:53:26.169-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Read Amplification Factor</title><content type='html'>Flash devices are described in part by their &lt;a href="http://en.wikipedia.org/wiki/Write_amplification"&gt;write amplification factor&lt;/a&gt;&amp;nbsp;(or WAF). When the OS writes a page once, the device might write it more than once, and this multiple is the write amplification factor. The WAF isn't always described in marketing, and even if it were, the value you get in production is workload dependent.&lt;br /&gt;&lt;br /&gt;Variations of the&amp;nbsp;&lt;a href="http://www.bing.com/search?q=log+structured+merge+tree"&gt;log-structured merge tree&lt;/a&gt;&amp;nbsp;have been used by many new storage servers including &lt;a href="http://hbase.apache.org/"&gt;HBase&lt;/a&gt;,&amp;nbsp;&lt;a href="http://labs.google.com/papers/bigtable-osdi06.pdf"&gt;Bigtable&lt;/a&gt;, &amp;nbsp;&lt;a href="http://www.odbms.org/download/cassandra.pdf"&gt;Cassandra&lt;/a&gt;&amp;nbsp;and &lt;a href="http://code.google.com/p/leveldb/"&gt;leveldb&lt;/a&gt;. These servers append changes (delete, insert, update) to the end of a file rather than updating in place. To find one row by key value with an LSM, the server might have to read from multiple files, or from multiple locations within one file. I have been calling this the read penalty because a workload is very likely to do more disk reads when using an LSM than when using an update-in-place engine, but I think that &lt;b&gt;read amplification factor&lt;/b&gt; (or RAF) might be a better phrase. 
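&lt;br /&gt;&lt;br /&gt;The read penalty can be made concrete with a toy model in Python. It is a sketch under stated assumptions, not a model of any real LSM. As in the simple model used later in this post, only index leaf pages cost a disk read, so an update-in-place engine does one read per point lookup. The hypothetical LSM stores keys in several sorted runs, newest first, and reads one leaf page from each run it probes until it finds the key. The optional bloom filter is idealized with no false positives, which real filters do not achieve. All function names are hypothetical.

```python
def lsm_point_lookup_reads(key, runs, use_bloom=False):
    """Count leaf-page reads for one point lookup in a toy LSM.

    runs is a list of sets of keys, newest run first. Without a bloom
    filter every run is probed, one leaf read each, until the key is
    found. With an idealized bloom filter (no false positives) runs that
    do not hold the key are skipped without a read.
    """
    reads = 0
    for run in runs:
        if use_bloom and key not in run:
            continue  # the filter rules this run out, so no disk read
        reads += 1  # read one leaf page from this run
        if key in run:
            break  # newest version found, older runs are not needed
    return reads


def lsm_range_scan_reads(runs):
    """A short range scan, assumed to fit in one leaf page per run, reads
    one page from every run: a bloom filter answers 'might this key be
    here?' and cannot skip runs for a range."""
    return len(runs)


def read_amplification(keys, runs, use_bloom=False):
    """RAF = LSM disk reads / update-in-place disk reads for point lookups.

    In the simple model the update-in-place engine does exactly one leaf
    read per point lookup. keys must be non-empty.
    """
    lsm_reads = sum(lsm_point_lookup_reads(k, runs, use_bloom) for k in keys)
    in_place_reads = len(keys)
    return lsm_reads / in_place_reads
```

With runs [{1, 2}, {3}, {4, 5, 6}] and lookups of keys 1, 3 and 6, the toy LSM does 6 leaf reads where update-in-place does 3, an RAF of 2.0, and the idealized bloom filter brings it down to 1.0. A short range scan still reads one page per run, so its RAF stays at 3 for these runs.&lt;br /&gt;&lt;br /&gt;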
If a workload does 100 disk reads on InnoDB and 120 disk reads on an LSM, then the RAF is 1.2. The RAF matters even for many write-optimized servers because an update-intensive workload requires many random disk reads. However, an LSM can avoid some of those reads when the update is a replace, or when the operation is commutative and doesn't require an immediate result. For example, an update that increments a column can log +1 when the request doesn't need to return the old value.&lt;br /&gt;&lt;br /&gt;Many LSM implementations &lt;a href="http://www.bing.com/search?q=log+structured+merge+tree+bloom+filter"&gt;use a bloom filter&lt;/a&gt; to reduce the RAF. The bloom filter avoids some reads from files that are known not to have data for a given key. A bloom filter only works for point lookups. It cannot be used for a range scan, and the RAF for a workload will be at its worst when you map a relational schema directly to HBase (1 row in InnoDB --&amp;gt; 1 row in HBase). Fortunately, many of the LSM implementations support schemas in which more data is consolidated into one row, and in many cases something that requires a range scan in a SQL RDBMS will use a point lookup in HBase.&lt;br /&gt;&lt;br /&gt;There are new products (&lt;a href="http://www.tokutek.com/"&gt;TokuDB&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.acunu.com/"&gt;Acunu&lt;/a&gt;, &lt;a href="http://www.rethinkdb.com/"&gt;maybe RethinkDB&lt;/a&gt;) that claim to be better than an LSM in part because their RAF is much closer to one for both point lookups and range scans. By closer to one, I mean that there is (almost) no read penalty. This should be easy to verify with a production workload.&lt;br /&gt;&lt;br /&gt;While there are very interesting performance models described in the literature, I use a very simple one when considering the read amplification factor. In my model, all levels of a tree-structured index are in RAM except for the lowest level. 
In this model a point lookup with an update-in-place DBMS does at most one disk read from an index leaf page excluding access to external/overflow pages for LOB columns and other special cases. For something that claims to be better than an update-in-place DBMS I want to know how many index leaf pages are read in the worst and average cases.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-3368914641397885007?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/3368914641397885007/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/08/read-amplification-factor.html#comment-form' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3368914641397885007'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3368914641397885007'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/08/read-amplification-factor.html' title='Read Amplification Factor'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6871418593009589361</id><published>2011-08-10T11:38:00.000-07:00</published><updated>2011-08-10T11:38:56.732-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>This Percona patch saved me</title><content type='html'>If you use InnoDB and have long-running 
select statements, then the InnoDB undo space can grow large because purge can't advance beyond the oldest open transaction. If purge is blocked for too long, then pages to be purged might leave the InnoDB buffer pool, and the purge thread will have to do many disk reads. As the InnoDB purge process is single-threaded, it might not be able to catch up, and a server can use too much disk space for a very long time.&lt;br /&gt;&lt;br /&gt;I experienced this problem at work. I modified innochecksum to report the number of pages by type, and it was easy to see that undo accounted for most of the ibdata1 file (a few hundred GB). As InnoDB doesn't shrink ibdata1 or ibd files, there is no way to get this space back on a running server, but I would like to prevent the problem in the future.&lt;br /&gt;&lt;br /&gt;The problem was solved by using the &lt;a href="http://www.mysqlperformanceblog.com/2011/08/09/announcing-percona-live-mysql-conference-and-expo-2012/"&gt;multi-threaded purge patch&lt;/a&gt; from XtraDB for MySQL 5.1. The patch is small, so I was able to review it and felt comfortable testing it. 
Since deploying it I no longer have to worry about servers using too much space for undo because of purge lag.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6871418593009589361?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6871418593009589361/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/08/this-percona-patch-saved-me.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6871418593009589361'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6871418593009589361'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/08/this-percona-patch-saved-me.html' title='This Percona patch saved me'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5554198031143742246</id><published>2011-08-02T21:06:00.000-07:00</published><updated>2011-08-03T06:40:17.503-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Technical debt</title><content type='html'>Last time I checked &lt;a href="http://bugs.mysql.com/bug.php?id=60343"&gt;there was one test&lt;/a&gt; in the MySQL test suite (mtr) that covers one case for crash recovery. Perhaps there is a private test suite. 
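&lt;br /&gt;&lt;br /&gt;As an illustration of what a deterministic crash test checks, here is a toy Python simulation of the two-phase commit between the binlog and InnoDB. It is not MySQL code, and all names in it are hypothetical. It assumes the recovery rule used when the binlog is the XA coordinator: a transaction left prepared in the storage engine is committed if its XID was written to the binlog and rolled back otherwise. The test crashes after each step and asserts that the engine and the binlog agree after recovery.

```python
# Steps of the toy commit protocol, in order.
STEPS = ("prepare", "write_binlog", "commit")


def run_commit(xid, crash_after):
    """Run the toy two-phase commit for one transaction and stop
    ("crash") after crash_after steps. Returns the surviving state."""
    engine = {}  # xid -> "prepared" or "committed"
    binlog = []  # xids written to the binlog
    for step in STEPS[:crash_after]:
        if step == "prepare":
            engine[xid] = "prepared"
        elif step == "write_binlog":
            binlog.append(xid)
        else:
            engine[xid] = "committed"
    return engine, binlog


def recover(engine, binlog):
    """Toy recovery: commit prepared transactions whose xid is in the
    binlog and roll back (drop) every other prepared transaction."""
    for xid, state in list(engine.items()):
        if state == "prepared":
            if xid in binlog:
                engine[xid] = "committed"
            else:
                del engine[xid]
    return engine


def check_all_crash_points(xid=1):
    """Crash after every step and confirm that, after recovery, the
    transaction is committed in the engine exactly when it is in the binlog."""
    for crash_after in range(len(STEPS) + 1):
        engine, binlog = run_commit(xid, crash_after)
        engine = recover(engine, binlog)
        committed_in_engine = engine.get(xid) == "committed"
        assert committed_in_engine == (xid in binlog), crash_after
    return True
```

A real test would crash mysqld at the same points, for example with the DEBUG macros mentioned below, and then compare tables after restart.&lt;br /&gt;&lt;br /&gt;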
Given that I modify InnoDB and replication code, and that I frequently debug crashes at work, I wish there were more tests. I added many crash recovery tests to the Facebook patch to confirm that recovery works for the replication slave, the replication master and InnoDB. While doing so, I found at least one bug in rpl_transaction_enabled. While working on global transaction IDs, Justin found a few bugs in official MySQL that prevented recovery after the slave crashed. These were fixed in official MySQL 5.1, so there is a lot of value in having tests like this. But there is a lot more that should be tested, including:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;crash recovery during DDL. There are windows where recovery is not possible given that many DDL commands are not atomic between InnoDB and the FRM file.&lt;/li&gt;&lt;li&gt;crash recovery during DDL on a partitioned table. I know from debugging crashes that a file named &lt;b&gt;ddl.log&lt;/b&gt; is written.&amp;nbsp;&lt;/li&gt;&lt;li&gt;crash recovery on a replication master. The master uses XA to keep the binlog and InnoDB in sync. But there are no tests to confirm that the right thing is done after a crash at each step of the two-phase commit process. The Facebook patch has such tests, so I trust the code in official MySQL, but these tests should be in mtr.&lt;/li&gt;&lt;li&gt;crash recovery on a replication slave. Unfortunately, there are known race conditions in replication slave state updates that might be fixed in 5.6.3, so not all of the possible failures can be tested. But updates to the replication state index files can be made crash-proof, and I think they are now, after the Google team reported the problems while testing global transaction IDs.&lt;/li&gt;&lt;li&gt;crash recovery for InnoDB. 
I don't think I have to write anything about this.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;MySQL has extremely useful DEBUG macros for making the server crash at specific points in the code. This makes it very easy to add deterministic crash tests. The test suite was updated within the last year to allow the server to be crashed without failing a test. Unfortunately, these features are rarely used in the official test suite.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I hope this changes. Tests with deterministic failures are just the start. The next step is a test with randomized failures. Unfortunately, it won't be as easy to add this to mtr, as such a test is much easier to write in Perl or Python. We have a test suite at work where a random workload is run against the master and the master is killed at random times. A slave replicates from that master, and after each kill the tables on the master and slave are compared.&lt;br /&gt;&lt;br /&gt;And now some of the recovery bugs that I have encountered, including problems that have been fixed:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=41609"&gt;Crash recovery does not work for InnoDB temp tables&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Verdana, 'Lucida Grande', 'Lucida Sans Unicode', Tahoma, Arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=56373"&gt;InnoDB Plugin : DROP TABLE on a corrupt table crashes server&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Verdana, 'Lucida Grande', 'Lucida Sans Unicode', Tahoma, Arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=37148"&gt;Most callers of 
mysql_bin_log.write ignore the return result&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Verdana, 'Lucida Grande', 'Lucida Sans Unicode', Tahoma, Arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=38826"&gt;Race in MYSQL_LOG::purge_logs is impossible to debug in production&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Verdana, 'Lucida Grande', 'Lucida Sans Unicode', Tahoma, Arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=52620"&gt;Crash can leave null characters in binary log index file&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Verdana, 'Lucida Grande', 'Lucida Sans Unicode', Tahoma, Arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=39325"&gt;Server crash inside MYSQL_LOG::purge_first_log halts replication&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;And here are the open bugs and feature requests:&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Verdana, 'Lucida Grande', 'Lucida Sans Unicode', Tahoma, Arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=25922"&gt;InnoDB crash recovery changes: make DDL in MySQL 'atomic'&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Verdana, 'Lucida Grande', 'Lucida Sans Unicode', Tahoma, Arial, 
sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=62037"&gt;assert fails in row_search_for_mysql on TRX_ISO_READ_UNCOMMITTED&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Verdana, 'Lucida Grande', 'Lucida Sans Unicode', Tahoma, Arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=24894"&gt;Prevent relay-log.info from getting out of sync with transactions on slave crash&lt;/a&gt;&amp;nbsp;or &lt;a href="http://bugs.mysql.com/bug.php?id=26540"&gt;26540&lt;/a&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://bugs.mysql.com/bug.php?id=26489"&gt;26489&lt;/a&gt; will be the proxy for the binlog event checksums feature request. PITR isn't possible after a crash when binlog events are incorrect but you can't detect that.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5554198031143742246?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5554198031143742246/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/08/technical-debt.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5554198031143742246'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5554198031143742246'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/08/technical-debt.html' title='Technical debt'/><author><name>Mark 
Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6704833196290805624</id><published>2011-04-29T17:08:00.000-07:00</published><updated>2011-04-29T17:17:01.141-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='nosql'/><title type='text'>Is this a new feature?</title><content type='html'>Is this an amazing new feature or the next step after the &lt;a href="http://mysqlha.blogspot.com/2011/02/where-have-bugs-gone.html"&gt;change to bugs.mysql.com&lt;/a&gt;? I was about to file a bug report to improve the MySQL manual, but that won't happen now:&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Go to the page for the &lt;a href="http://dev.mysql.com/doc/refman/5.6/en/index.html"&gt;MySQL Reference Manual&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Type text in the "Search Manual" box on the left-hand side of the page&lt;/li&gt;&lt;li&gt;End up at the login.oracle.com &lt;a href="https://login.oracle.com/mysso/signon.jsp"&gt;sign-in page&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6704833196290805624?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6704833196290805624/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/04/is-this-new-feature.html#comment-form' title='17 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6704833196290805624'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6704833196290805624'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/04/is-this-new-feature.html' title='Is this a new feature?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8790118682121203794</id><published>2011-04-17T17:14:00.000-07:00</published><updated>2011-04-17T17:16:50.180-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>O'Reilly MySQL &amp; more 2011 was great</title><content type='html'>I hope this is held again in 2012 and that Oracle sends more developers to the conference. I had useful conversations with the four developers and development managers who attended from Oracle. I also had great discussions with Percona, Monty Program, Continuent and many others. At home I sit and type; at the conference I stand and talk. It is a welcome change.&lt;br /&gt;&lt;br /&gt;The MySQL team has been very productive recently, with many bug fixes, many features implemented, a great 5.5 release and what appears to be an even better 5.6 release. The changes in 5.6 are a really big deal. I can't wait to stop porting rpl_transaction_enabled to get crash-proof slave state.&lt;br /&gt;&lt;br /&gt;The value-added community continues to push the state of the art for those who can't wait for the great new features to be GA or for those with special problems. 
I am interested in trying &lt;a href="http://scale-out-blog.blogspot.com/2011/04/settling-in-at-codegooglecom.html"&gt;parallel replication apply from Tungsten&lt;/a&gt;, working with Monty Program to improve monitoring in MariaDB, and working with Percona to improve InnoDB quality-of-service for high-throughput OLTP.&lt;br /&gt;&lt;br /&gt;I have begun to catch up on my reading to figure out what has changed for 5.6. I probably need another week to finish reading the many useful blogs and presentations. DimitriK published a 5-part performance report that I have yet to start (&lt;a href="http://dimitrik.free.fr/blog/archives/2011/04/mysql-performance-56-notes-part-1-discovery.html"&gt;1&lt;/a&gt;, &lt;a href="http://dimitrik.free.fr/blog/archives/2011/04/mysql-performance-56-notes-part-2-under-full-dbstress-workload.html"&gt;2&lt;/a&gt;, &lt;a href="http://dimitrik.free.fr/blog/archives/2011/04/mysql-performance-56-notes-part-3-more-in-depth.html"&gt;3&lt;/a&gt;, &lt;a href="http://dimitrik.free.fr/blog/archives/2011/04/mysql-performance-56-notes-part-4-fixing-purge-issue.html"&gt;4&lt;/a&gt;, &lt;a href="http://dimitrik.free.fr/blog/archives/2011/04/mysql-performance-56-notes-part-5-fixing-adaptive-flushing.html"&gt;5&lt;/a&gt;). I am happy to find that a few more MySQL developers at Oracle have begun blogging (&lt;a href="http://oysteing.blogspot.com/2011/04/more-stable-query-execution-time-by.html"&gt;Oystein&lt;/a&gt;, &lt;a href="http://didrikdidrik.blogspot.com/2011/04/optimizing-mysql-filesort-with-small.html"&gt;Didrik&lt;/a&gt;, &lt;a href="http://d2-systems.blogspot.com/2011/04/mysql-562-dm-binlog-informational.html"&gt;Luis&lt;/a&gt;, &lt;a href="http://olavsandstaa.blogspot.com/2011/04/mysql-56-index-condition-pushdown.html"&gt;Olav&lt;/a&gt;). 
How do I type the accented "O" in Oystein?&lt;br /&gt;&lt;br /&gt;First the InnoDB changes:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/information-schema-system-table/"&gt;Information schema system tables&lt;/a&gt;&amp;nbsp;- InnoDB has moved a lot of data to IS tables. The ones listed here are not the most interesting to me but the migration in general makes life easier for me. I don't know if these are new in 5.6, but they are more interesting to me than the ones described in the InnoDB blog:&amp;nbsp;&lt;span class="Apple-style-span" style="color: #555555; font-family: verdana, arial, helvetica, sans-serif; font-size: 12px; line-height: 18px;"&gt;&lt;code style="background-attachment: initial; background-clip: initial; background-color: transparent; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #761596; font-size: 12px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; vertical-align: baseline;"&gt;INNODB_BUFFER_PAGE&lt;/code&gt;,&amp;nbsp;&lt;code style="background-attachment: initial; background-clip: initial; background-color: transparent; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #761596; font-size: 12px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 
0px; padding-right: 0px; padding-top: 0px; vertical-align: baseline;"&gt;INNODB_BUFFER_PAGE_LRU&lt;/code&gt;, and&amp;nbsp;&lt;code style="background-attachment: initial; background-clip: initial; background-color: transparent; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #761596; font-size: 12px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; vertical-align: baseline;"&gt;INNODB_BUFFER_POOL_STATS.&lt;/code&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/mysql-5-6-innodb-scalability-fix-kernel-mutex-removed/"&gt;No more kernel_mutex&lt;/a&gt; - Wow! This is a huge deal for multi-core performance and will enable even more improvements in future releases. As part of the admission control feature we have been trying to reduce kernel_mutex contention and noticed that 5.5 already was much better for that.&lt;/li&gt;&lt;li&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/innodb-persistent-statistics-at-last/"&gt;Persistent index cardinality statistics&lt;/a&gt; - when a server is restarted all index cardinality stats are computed in MySQL 5.1 and without the Facebook patch, this is serialized because of LOCK_open. That can create too many stalls on restart. I assume this allows a DBA to populate the stats table manually which can be very useful for a deployment with many scale-out slaves to prevent query plans from changing between slaves. 
Other details are at &lt;a href="http://oysteing.blogspot.com/2011/04/more-stable-query-execution-time-by.html"&gt;Oystein's blog&lt;/a&gt;.&amp;nbsp;&lt;span class="Apple-style-span" style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 12px; font-weight: bold; line-height: 16px;"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/mysql-5-6-multi-threaded-purge/"&gt;Multi-threaded purge&lt;/a&gt;&amp;nbsp;- I want this right now. This and parallel replication apply make it possible to deploy large databases on slaves using disk setups that match what is available on a master. Otherwise you must read &lt;a href="http://yoshinorimatsunobu.blogspot.com/2011/04/slides-linux-and-hw-optimizations-for_14.html"&gt;Yoshinori's slides&lt;/a&gt; to find the workaround.&lt;/li&gt;&lt;li&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/mysql-5-6-multi-threaded-purge/"&gt;&lt;/a&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/mysql-5-6-data-dictionary-lru/"&gt;Data dictionary LRU&lt;/a&gt;&amp;nbsp;helps if you have a lot (or too many) tables.&lt;/li&gt;&lt;li&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/introducing-page_cleaner-thread-in-innodb/"&gt;Page cleaner thread&lt;/a&gt; - This is a good step but I wonder if it is enough and need to read code or wait for a Percona performance report. Has anything been done to prevent stalls when the async flush tries to flush too many pages in one call? The problem is that all dirty pages from an extent are flushed when at least one page with a too-old LSN is in the extent. Flushing neighbor pages can increase the number of pages flushed by up to 64X. I have seen benchmark servers stall for 60+ seconds while 200,000+ pages were flushed when the async limit was reached. 
This can be very painful on servers that can cache the database and accumulate many dirty pages.&lt;/li&gt;&lt;li&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/nosql-to-innodb-with-memcached/"&gt;memcached API for InnoDB&lt;/a&gt; - this should enable HandlerSocket-like performance while using an API that most clients already support (PHP, Java, Python, ...). I think that non-SQL interfaces to InnoDB will be a big deal.&lt;/li&gt;&lt;li&gt;&lt;a href="http://blogs.innodb.com/wp/2011/04/innodb-metrics-table/"&gt;metrics table for InnoDB counters&lt;/a&gt; - I spent too much time adding counters to SHOW STATUS for InnoDB. Now you don't have to.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;The InnoDB team has done a great job of making it easy to understand the changes. Getting docs for the replication changes required a bit more searching because they were not referenced from the main Oracle announcement. I added links to worklog entries and blogs by Mats and Luis. I am very interested in many of the replication changes that match and exceed what was in &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches"&gt;the Google patch&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://forge.mysql.com/wiki/ReplicationFeatures/ParallelSlave"&gt;parallel replication apply&lt;/a&gt; - I am confused. The labs page and worklog state that this is for RBR. I want parallel apply for SBR to overcome replication lag on IO-bound slaves, because parallel apply issues IO requests in parallel. Per the &lt;a href="http://d2-systems.blogspot.com/2011/04/mysql-56x-feature-preview-multi.html"&gt;blog by Luis&lt;/a&gt;, I think this supports SBR.&lt;/li&gt;&lt;li&gt;&lt;a href="http://forge.mysql.com/worklog/task.php?id=2775"&gt;system tables for slave state&lt;/a&gt; - I don't know if WL2775 describes what was implemented. 
&lt;a href="http://datacharmer.blogspot.com/2011/04/replication-metadata-in-mysql-562.html"&gt;Feedback from gmaxia&lt;/a&gt;&amp;nbsp;makes me wish for more docs. Hopefully I can stop porting rpl_transaction_enabled and begin to use this instead. I need to read the &lt;a href="http://mysqlmusings.blogspot.com/2011/04/crash-safe-replication.html"&gt;blog by Mats&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;&lt;a href="http://forge.mysql.com/worklog/task.php?id=2540"&gt;replication checksums&lt;/a&gt; - I want binlog event checksums. I have not needed them for a long time since a certain bug was fixed but I would rather not worry about that problem again. Mats also &lt;a href="http://mysqlmusings.blogspot.com/2011/04/replication-event-checksum.html"&gt;wrote about this&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;&lt;a href="http://d2-systems.blogspot.com/2011/04/mysql-562-dm-binlog-informational.html"&gt;informational log events&lt;/a&gt; - I need the original SQL including its query comment in the binlog if I am to use RBR. With this feature I have one less excuse for not trying RBR.&lt;/li&gt;&lt;li&gt;remote binlog backup - We already have this in the 5.1 Facebook patch thanks to a backport by Harrison. It lets you archive the binlog almost as soon as it is written.&amp;nbsp;&lt;/li&gt;&lt;li&gt;&lt;a href="http://forge.mysql.com/worklog/task.php?id=3584"&gt;universal group identifiers&lt;/a&gt; - This might be the equivalent of global transaction IDs from the &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds"&gt;Google patch&lt;/a&gt;.&amp;nbsp;&lt;/li&gt;&lt;li&gt;&lt;a href="http://d2-systems.blogspot.com/2011/04/mysql-562-dm-optimized-row-based.html"&gt;optimized RBR logging&lt;/a&gt; - This will be a big deal for tables with BLOB columns that get frequent updates to non-BLOB columns. 
I have a few of those.&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.clusterdb.com/mysql-replication/delayed-replication-in-mysql-5-6-development-release/"&gt;Time-delayed replication&lt;/a&gt; - I don't need this, but many others will.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;There also appear to be many useful changes in the Performance Schema. Alas, it isn't easy to figure out what has changed from reading the &lt;a href="http://dev.mysql.com/tech-resources/articles/whats-new-in-mysql-5.6.html"&gt;big announcement&lt;/a&gt;. I haven't been a fan of the P_S because my primary need is for aggregated workload stats (per-table, per-index, per-user), and the user_stats patch from Facebook, Percona, MariaDB and Google has provided that for many years. It requires no setup, and its overhead is so low that it is always enabled in my benchmarks and will always be enabled in production. It is also trivial to query: &lt;b&gt;select * from easy_to_remember_table_name&lt;/b&gt;. I hope the P_S can also provide that. Mark and Marc have begun to write more about this. 
I hope that continues.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8790118682121203794?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8790118682121203794/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/04/oreilly-mysql-more-2011-was-great.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8790118682121203794'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8790118682121203794'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/04/oreilly-mysql-more-2011-was-great.html' title='O&apos;Reilly MySQL &amp; more 2011 was great'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-507185809639159198</id><published>2011-02-20T16:18:00.000-08:00</published><updated>2011-02-20T16:18:02.087-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Where have the bugs gone?</title><content type='html'>The incoming bug rate over the &lt;a 
href="http://bugs.mysql.com/search.php?search_for=&amp;amp;status=Active&amp;amp;severity=&amp;amp;limit=10&amp;amp;order_by=&amp;amp;cmd=display&amp;amp;direction=ASC&amp;amp;bug_type=&amp;amp;os=0&amp;amp;phpver=&amp;amp;bug_age=7"&gt;past 7 days&lt;/a&gt; is much lower than the &lt;a href="http://bugs.mysql.com/search.php?search_for=&amp;amp;status=Active&amp;amp;severity=&amp;amp;limit=10&amp;amp;order_by=&amp;amp;cmd=display&amp;amp;direction=ASC&amp;amp;bug_type=&amp;amp;os=0&amp;amp;phpver=&amp;amp;bug_age=14"&gt;past 14 days&lt;/a&gt;. Maybe this is a blip. But commits&amp;nbsp;to &lt;a href="http://bazaar.launchpad.net/~mysql/mysql-server/mysql-trunk/changes"&gt;trunk&lt;/a&gt;&amp;nbsp;have begun to use 8-digit bug numbers that do not reference entries in bugs.mysql.com. I think something has changed but nothing has been announced. I like bugs.mysql.com. We all benefit by sharing bug reports and I &lt;a href="http://www.google.com/search?hl=en&amp;amp;q=site:bugs.mysql.com+callaghan"&gt;contributed a lot of content&lt;/a&gt;. 
It is also easy to search (add "site:bugs.mysql.com" to Google searches).&lt;br /&gt;&lt;br /&gt;Assuming there was a change, maybe it doesn't matter to most community members.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-507185809639159198?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/507185809639159198/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/02/where-have-bugs-gone.html#comment-form' title='35 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/507185809639159198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/507185809639159198'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/02/where-have-bugs-gone.html' title='Where have the bugs gone?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>35</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6474837125214250904</id><published>2011-02-11T16:10:00.000-08:00</published><updated>2011-02-11T16:15:21.894-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>This happens when you don't have a condition variable</title><content type='html'>Did Windows NT not have condition variables? This might be an artifact of that. 
It shows what happens when you don't have a condition variable. Rather than sleeping until it is signaled that n_pending_ibuf_merges has dropped to zero, the thread deleting the tablespace releases the mutex, sleeps for 20 milliseconds and checks again.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;stop_ibuf_merges:

        mutex_enter(&amp;amp;fil_system-&amp;gt;mutex);

        space = fil_space_get_by_id(id);

        if (space != NULL) {
                space-&amp;gt;stop_ibuf_merges = TRUE;

                if (space-&amp;gt;n_pending_ibuf_merges == 0) {
                        mutex_exit(&amp;amp;fil_system-&amp;gt;mutex);
                        count = 0;
                        goto try_again;
                } else {
                        if (count &amp;gt; 5000) {
                                ut_print_timestamp(stderr);
                                fputs("  InnoDB: trying to delete tablespace ", stderr);
                                ut_print_filename(stderr, space-&amp;gt;name);
                        }

                        mutex_exit(&amp;amp;fil_system-&amp;gt;mutex);

                        os_thread_sleep(20000);
                        count++;

                        goto stop_ibuf_merges;
                }&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6474837125214250904?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6474837125214250904/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/02/this-happens-when-you-dont-have.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6474837125214250904'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6474837125214250904'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/02/this-happens-when-you-dont-have.html' title='This happens when you don&apos;t have a condition variable'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-118713363328501007</id><published>2011-02-04T06:22:00.000-08:00</published><updated>2011-02-04T07:01:21.600-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>So long kernel mutex</title><content type='html'>If you are willing to look, there are a few good changes in &lt;a href="https://launchpad.net/mysql-server/trunk"&gt;trunk&lt;/a&gt; for InnoDB. A frequent source of mutex contention in InnoDB, kernel_mutex, has been replaced with a rw-lock for the transaction system (see trx_sys_struct), two mutexes in srv_sys_struct and possibly other mutexes and rw-locks. The dulint struct has been replaced with a native 8-byte int. All of these changes should make InnoDB more efficient for workloads with a large number of concurrent transactions and help with &lt;a href="http://bugs.mysql.com/bug.php?id=49169"&gt;bug 49169&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I wish there were a better way to track changes in InnoDB. 
Right now my tools are luck and recursive diff.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-118713363328501007?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/118713363328501007/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/02/so-long-kernel-mutex.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/118713363328501007'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/118713363328501007'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/02/so-long-kernel-mutex.html' title='So long kernel mutex'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5142326113380707109</id><published>2011-01-18T14:50:00.000-08:00</published><updated>2011-01-18T14:50:28.783-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Odd MySQL fact of the day</title><content type='html'>The Log_event class has four static methods named &lt;b&gt;read_log_event&lt;/b&gt;. This is the base class for binlog events. I might prefer different names. I was extremely confused by servers on which the value of Seconds_Behind_Master rapidly fluctuated between 0 and a large value. 
This behavior &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/show-slave-status.html"&gt;is documented&lt;/a&gt;, I just had not read that page recently. Fortunately, the db operations experts explained it to me. So I am adding a few more variables to make it easier to monitor replication including:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Counters for the number of binlog events written by the IO thread and run by the SQL thread.&lt;/li&gt;&lt;li&gt;Counters for the number of bytes written by the IO thread and run by the SQL thread.&lt;/li&gt;&lt;li&gt;A timer for the number of seconds that the SQL thread waits for the IO thread to provide more events.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;And if you are curious, these are the read_log_event methods.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;One:&lt;br /&gt;&amp;nbsp;&amp;nbsp;static Log_event* read_log_event(IO_CACHE* file,&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pthread_mutex_t* log_lock,&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const Format_description_log_event&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; *description_event);&lt;br /&gt;&lt;br /&gt;&lt;div&gt;Two:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;static int read_log_event(IO_CACHE* file, String* packet,&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;pthread_mutex_t* 
log_lock);&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Three:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;static Log_event* read_log_event(IO_CACHE* file,&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const Format_description_log_event&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; *description_event);&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Four:&lt;/div&gt;&lt;div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;static Log_event* read_log_event(const char* buf, uint event_len,&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const char **error,&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; const Format_description_log_event&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; *description_event);&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5142326113380707109?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://mysqlha.blogspot.com/feeds/5142326113380707109/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2011/01/odd-mysql-fact-of-day.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5142326113380707109'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5142326113380707109'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2011/01/odd-mysql-fact-of-day.html' title='Odd MySQL fact of the day'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-4387716562724520593</id><published>2010-12-28T12:02:00.000-08:00</published><updated>2010-12-28T16:11:33.845-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Performance monitoring in MySQL</title><content type='html'>There are several types of performance monitoring. Some of them are made easier when workload metrics are summarized by database user, table and index. The Google patch added SHOW TABLE_STATISTICS, SHOW INDEX_STATISTICS and SHOW USER_STATISTICS to MySQL several years ago. Since then they have been ported and improved in &lt;a href="https://launchpad.net/mysqlatfacebook/51"&gt;the Facebook patch&lt;/a&gt;,&amp;nbsp;&lt;a href="http://kb.askmonty.org/v/user-statistics"&gt;MariaDB&lt;/a&gt; and &lt;a href="http://www.percona.com/docs/wiki/percona-server:features:userstatv2"&gt;Percona Server&lt;/a&gt;. 
I think that Eric Bergen &lt;a href="http://ebergen.net/wordpress/2010/12/24/second-draft-of-the-per-session-row-and-index-stats-patch/"&gt;has also ported it&lt;/a&gt;. I guess you could say the community has spoken.&lt;br /&gt;&lt;br /&gt;I like these features because they have been extremely useful to me. They have little impact on performance, and most of what you need to know about them can be determined by listing the columns in the tables. But more than anything else, they are easy to use.&lt;br /&gt;&lt;br /&gt;The Facebook patch has these columns for user_statistics:&lt;br /&gt;USER_NAME&lt;br /&gt;BINLOG_BYTES_WRITTEN&lt;br /&gt;BYTES_RECEIVED&lt;br /&gt;BYTES_SENT&lt;br /&gt;COMMANDS_DDL&lt;br /&gt;COMMANDS_DELETE&lt;br /&gt;COMMANDS_HANDLER&lt;br /&gt;COMMANDS_INSERT&lt;br /&gt;COMMANDS_OTHER&lt;br /&gt;COMMANDS_SELECT&lt;br /&gt;COMMANDS_TRANSACTION&lt;br /&gt;COMMANDS_UPDATE&lt;br /&gt;CONNECTIONS_CONCURRENT&lt;br /&gt;CONNECTIONS_DENIED_MAX_GLOBAL&lt;br /&gt;CONNECTIONS_DENIED_MAX_USER&lt;br /&gt;CONNECTIONS_LOST&lt;br /&gt;CONNECTIONS_TOTAL&lt;br /&gt;DISK_READ_BYTES&lt;br /&gt;DISK_READ_REQUESTS&lt;br /&gt;DISK_READ_SVC_USECS&lt;br /&gt;DISK_READ_WAIT_USECS&lt;br /&gt;ERRORS_ACCESS_DENIED&lt;br /&gt;ERRORS_TOTAL&lt;br /&gt;MICROSECONDS_CPU&lt;br /&gt;MICROSECONDS_WALL&lt;br /&gt;QUERIES_EMPTY&lt;br /&gt;ROWS_DELETED&lt;br /&gt;ROWS_FETCHED&lt;br /&gt;ROWS_INSERTED&lt;br /&gt;ROWS_READ&lt;br /&gt;ROWS_UPDATED&lt;br /&gt;ROWS_INDEX_FIRST&lt;br /&gt;ROWS_INDEX_NEXT&lt;br /&gt;TRANSACTIONS_COMMIT&lt;br /&gt;TRANSACTIONS_ROLLBACK&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Facebook patch has these columns for 
table_statistics:&lt;/div&gt;&lt;div&gt;&lt;div&gt;TABLE_SCHEMA&lt;/div&gt;&lt;div&gt;TABLE_NAME&lt;/div&gt;&lt;div&gt;TABLE_ENGINE&lt;/div&gt;&lt;div&gt;ROWS_INSERTED&lt;/div&gt;&lt;div&gt;ROWS_UPDATED&lt;/div&gt;&lt;div&gt;ROWS_DELETED&lt;/div&gt;&lt;div&gt;ROWS_READ&lt;/div&gt;&lt;div&gt;ROWS_REQUESTED&lt;/div&gt;&lt;div&gt;ROWS_INDEX_FIRST&lt;/div&gt;&lt;div&gt;ROWS_INDEX_NEXT&lt;/div&gt;&lt;div&gt;IO_READ_BYTES&lt;/div&gt;&lt;div&gt;IO_READ_REQUESTS&lt;/div&gt;&lt;div&gt;IO_READ_SVC_USECS&lt;/div&gt;&lt;div&gt;IO_READ_SVC_USECS_MAX&lt;/div&gt;&lt;div&gt;IO_READ_WAIT_USECS&lt;/div&gt;&lt;div&gt;IO_READ_WAIT_USECS_MAX&lt;/div&gt;&lt;div&gt;IO_READ_OLD_IOS&lt;/div&gt;&lt;div&gt;IO_WRITE_BYTES&lt;/div&gt;&lt;div&gt;IO_WRITE_REQUESTS&lt;/div&gt;&lt;div&gt;IO_WRITE_SVC_USECS&lt;/div&gt;&lt;div&gt;IO_WRITE_SVC_USECS_MAX&lt;/div&gt;&lt;div&gt;IO_WRITE_WAIT_USECS&lt;/div&gt;&lt;div&gt;IO_WRITE_WAIT_USECS_MAX&lt;/div&gt;&lt;div&gt;IO_WRITE_OLD_IOS&lt;/div&gt;&lt;div&gt;IO_INDEX_INSERTS&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Facebook patch has these columns for index_statistics:&lt;/div&gt;&lt;div&gt;&lt;div&gt;TABLE_SCHEMA&lt;/div&gt;&lt;div&gt;TABLE_NAME&lt;/div&gt;&lt;div&gt;INDEX_NAME&lt;/div&gt;&lt;div&gt;TABLE_ENGINE&lt;/div&gt;&lt;div&gt;ROWS_INSERTED&lt;/div&gt;&lt;div&gt;ROWS_UPDATED&lt;/div&gt;&lt;div&gt;ROWS_DELETED&lt;/div&gt;&lt;div&gt;ROWS_READ&lt;/div&gt;&lt;div&gt;ROWS_REQUESTED&lt;/div&gt;&lt;div&gt;ROWS_INDEX_FIRST&lt;/div&gt;&lt;div&gt;ROWS_INDEX_NEXT&lt;/div&gt;&lt;div&gt;IO_READ_BYTES&lt;/div&gt;&lt;div&gt;IO_READ_REQUESTS&lt;/div&gt;&lt;div&gt;IO_READ_SVC_USECS&lt;/div&gt;&lt;div&gt;IO_READ_SVC_USECS_MAX&lt;/div&gt;&lt;div&gt;IO_READ_WAIT_USECS&lt;/div&gt;&lt;div&gt;IO_READ_WAIT_USECS_MAX&lt;/div&gt;&lt;div&gt;IO_READ_OLD_IOS&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5915567578707286635-4387716562724520593?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/4387716562724520593/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/12/performance-monitoring-in-mysql.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4387716562724520593'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4387716562724520593'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/12/performance-monitoring-in-mysql.html' title='Performance monitoring in MySQL'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-997599190325518966</id><published>2010-11-23T06:46:00.000-08:00</published><updated>2010-11-23T08:40:56.502-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='postgres'/><title type='text'>How are index-only scans implemented in InnoDB?</title><content type='html'>There have been interesting discussions in the PostgreSQL community about &lt;a href="http://rhaas.blogspot.com/2010/11/index-only-scans.html"&gt;adding support for index only scans&lt;/a&gt;. On several occasions people were curious about how InnoDB supports this. 
A &lt;a href="http://blogs.innodb.com/wp/2010/09/mysql-5-5-innodb-change-buffering/"&gt;recent post by the InnoDB&lt;/a&gt; team is an excellent overview. A brief summary of that post and other material is:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;records in the clustered (primary) index &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-multi-versioning.html"&gt;store hidden columns&lt;/a&gt; (DB_TRX_ID, DB_ROLL_PTR)&lt;/li&gt;&lt;li&gt;records in the non-clustered (secondary) index do not store hidden columns&lt;/li&gt;&lt;li&gt;records in clustered and non-clustered indexes have a delete-mark flag&lt;/li&gt;&lt;li&gt;records are not updated in place in the secondary index; they are delete-marked on delete, inserted on insert and delete-marked/inserted on update&lt;/li&gt;&lt;li&gt;delete-marked records are removed from indexes by the purge thread when it is safe to do so&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;When a secondary index page is read, if the max transaction ID on the page is less than the max transaction ID for which all transactions are visible to the reading transaction (the low-water mark, up_limit_id), then the page can be used as is and the page read is index-only. If this condition is not true, then for any entry read from this page, the record is read from the clustered index page to determine whether the index entry is visible. In that case the secondary index read is not index-only. Index-only matters because when a read is not index-only there can be an additional random disk read to the clustered index for each entry read from the secondary index.&lt;br /&gt;&lt;br /&gt;The max transaction ID for which all transactions are visible to the reading transaction is described as the low-water mark and is assigned to the up_limit_id field in the read view (read_view_struct). This is the max transaction ID for which there are no unresolved transactions when the reading transaction starts. 
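&lt;br /&gt;&lt;br /&gt;To make the page-level test concrete, here is a minimal C sketch. It assumes a simplified read view that holds only the low-water mark and a simplified page header that holds only the max transaction ID on the page. The names txn_id_t, read_view, sec_index_page and page_is_index_only are hypothetical stand-ins and are not the InnoDB definitions; the real check is done by lock_sec_rec_cons_read_sees.
&lt;pre&gt;
```c
/* Hypothetical, simplified stand-ins for InnoDB structures. */
typedef unsigned long long txn_id_t;

/* Simplified read view. Every transaction with an ID below
   up_limit_id had committed when the view was created, so its
   changes are visible (the low-water mark). */
struct read_view {
    txn_id_t up_limit_id;
};

/* Simplified secondary index page header: the largest transaction ID
   that modified any record on the page. */
struct sec_index_page {
    txn_id_t max_trx_id;
};

/* Returns 1 when every entry on the page is visible to the view, so the
   page can be used as is and the read is index-only. Returns 0 when each
   entry must be checked against its clustered index record. */
int page_is_index_only(const struct sec_index_page *page,
                       const struct read_view *view)
{
    /* Equivalent to: max_trx_id is strictly below the low-water mark. */
    return view->up_limit_id > page->max_trx_id;
}
```
&lt;/pre&gt;
For example, with up_limit_id = 100, a page whose max transaction ID is 99 is index-only, while a page whose max transaction ID is 100 or 150 is not, and the sketch assumes the caller then reads the clustered index record for each entry.&lt;br /&gt;&lt;br /&gt;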
If there is a long-open transaction when the reading transaction starts, then up_limit_id will be less than the transaction ID of the long-open transaction.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I began to read the code for this today as I want to add a counter for the number of secondary index page reads that are and are not index-only. If you want to read the code too, the function &lt;b&gt;lock_sec_rec_cons_read_sees&lt;/b&gt; determines whether all entries on a secondary index page are definitely visible to a transaction (read view).&lt;br /&gt;&lt;br /&gt;If you are interested in this topic, I recommend these books:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="https://www.packtpub.com/postgresql-9-0-high-performance/book"&gt;PostgreSQL 9.0 High Performance&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://oreilly.com/catalog/9780596003067"&gt;High Performance MySQL&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-997599190325518966?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/997599190325518966/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/11/how-are-index-only-scans-implemented-in.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/997599190325518966'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/997599190325518966'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/11/how-are-index-only-scans-implemented-in.html' title='How are index-only scans implemented in InnoDB?'/><author><name>Mark 
Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6088238260140647892</id><published>2010-11-19T07:59:00.000-08:00</published><updated>2010-11-19T09:05:08.590-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><title type='text'>Monty style</title><content type='html'>I make bad jokes about Monty-style code. I don't like to read or modify it. I love to run it in production. It never crashes. It never leaks memory. I exaggerate a little, but not too much. It is remarkably stable.&lt;br /&gt;&lt;br /&gt;This is an amazing accomplishment. Alas, some of the senior developers who did that at MySQL have since left. Another senior developer left this week. I hope this trend does not continue.&lt;br /&gt;&lt;br /&gt;Everyone can be replaced. Smart people can be found. But smart and productive people are not as widely available, and it takes a while to figure out how things are done in MySQL. I know because I have made a lot of mistakes while trying to make things better.&lt;br /&gt;&lt;br /&gt;Perhaps the Google response is appropriate. 
Everyone gets a 10% raise and a $1000 holiday bonus.&lt;br /&gt;&lt;br /&gt;I did not submit this to &lt;a href="http://planet.mysql.com/"&gt;http://planet.mysql.com&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6088238260140647892?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6088238260140647892/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/11/monty-style.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6088238260140647892'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6088238260140647892'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/11/monty-style.html' title='Monty style'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6979244758852158307</id><published>2010-10-26T11:04:00.000-07:00</published><updated>2010-10-26T11:23:44.454-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>More about MySQL at Facebook</title><content type='html'>The Facebook database teams will describe how MySQL is used at Facebook. &lt;a href="http://www.facebook.com/event.php?eid=160712450628622"&gt;Join us on Tuesday, November 2&lt;/a&gt;. 
The performance, operations and engineering teams will describe work in progress to keep MySQL happy. The event is from 7pm to 9pm (PDT or UTC -7).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6979244758852158307?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6979244758852158307/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/10/more-about-mysql-at-facebook.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6979244758852158307'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6979244758852158307'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/10/more-about-mysql-at-facebook.html' title='More about MySQL at Facebook'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-144702120322837614</id><published>2010-09-30T18:37:00.000-07:00</published><updated>2010-09-30T18:37:05.028-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>The MySQL Plugins Development book is excellent</title><content type='html'>Packt Publishing gave me a copy of &lt;a 
href="http://www.packtpub.com/mysql-5-1-plugins-development/book?utm_source=mysqlha.blogspot.com&amp;amp;utm_medium=bookrev&amp;amp;utm_content=blog&amp;amp;utm_campaign=mdb_004860"&gt;MySQL 5.1 Plugins Development&lt;/a&gt;&amp;nbsp;to review.&amp;nbsp;The book is written by two people who are MySQL experts -- Sergei Golubchik and Andrew Hutchings. I know Sergei because he answers a lot of my questions on the &lt;a href="http://lists.mysql.com/internals"&gt;MySQL Internals mail list&lt;/a&gt;.&amp;nbsp;I wasn't in the best mood when I opened the e-book as I worked late last night. My mood was much better after reading a few pages. The book is amazing. The content, presentation and editing are excellent. The book is full of relevant examples that show you how to build plugins including source code and tips on compiling. The text carefully explains all of the steps required to build each of the plugins.&lt;br /&gt;&lt;br /&gt;I wish I had this book several years ago. It would have saved me a lot of trouble. But from the quality of the book I suspect that they have been working on it for several years.&lt;br /&gt;&lt;br /&gt;Thank you Sergei and Andrew. This is a great addition to the MySQL community.&lt;br /&gt;&lt;br /&gt;The target audience for this book is someone who wants to write code that will run within the mysqld process. This includes user defined functions (UDF) to be called by SQL statements, daemon plugins that can run code using a dedicated thread in the server process, information schema plugins to expose data in INFORMATION_SCHEMA, full-text parser plugins and storage engine plugins. There are examples for three storage engines. The first engine is simple and only supports read-only tables. The second example supports read-write tables without indexes and uses HTML as the file format. 
The final example builds a storage engine using &lt;a href="http://fallabs.com/"&gt;Tokyo Cabinet&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;My team has put a lot of things into the mysqld process. Some of those changes were even useful. This book would have prevented us from making a few mistakes. It even has an example to export the output from &lt;a href="http://linux.die.net/man/2/getrusage"&gt;getrusage&lt;/a&gt;&amp;nbsp;via SHOW STATUS. I added a similar patch to MySQL, but I don't think my change was as nice.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-144702120322837614?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/144702120322837614/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/09/mysql-plugins-development-book-is.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/144702120322837614'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/144702120322837614'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/09/mysql-plugins-development-book-is.html' title='The MySQL Plugins Development book is excellent'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1089259307883852118</id><published>2010-09-29T16:41:00.000-07:00</published><updated>2010-09-29T16:42:08.476-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Drizzle beta is here!</title><content type='html'>Congratulations!&lt;br /&gt;&lt;br /&gt;Can we please &lt;a href="http://adrianotto.com/2010/09/drizzle-is-now-beta"&gt;tone down the marketing&lt;/a&gt;? It is great to see a community project grow, especially in the MySQL family. But you are judged by successful deployments, not by the class libraries used by your project.&lt;br /&gt;&lt;br /&gt;Drizzle is only more reliable than MySQL when it keeps the replication log and InnoDB in sync during crash recovery. But it does not do that today. Official MySQL supports this on a master via &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html"&gt;sync_binlog&lt;/a&gt;. MySQL slaves do this today via &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/TransactionalReplication"&gt;rpl_transaction_enabled&lt;/a&gt; which is available in the Facebook and Google patches. I think it is also available in Percona Server, MariaDB and XtraDB.&lt;br /&gt;&lt;br /&gt;I have been told that Drizzle was designed for multi-core scalability. I am not sure what that means on real workloads or benchmarks. I know that there are some bottlenecks in InnoDB that have nothing to do with the code above it (MySQL, Drizzle). 
I also know that MySQL has made huge strides this year, and MySQL 5.5 far exceeds my expectations. Alas, I cannot validate whether "designed for scalability" results in better performance than MySQL 5.5, as I cannot build Drizzle on my servers due to the dependency on gcc 4.2 or greater.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1089259307883852118?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1089259307883852118/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/09/drizzle-beta-is-here.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1089259307883852118'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1089259307883852118'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/09/drizzle-beta-is-here.html' title='Drizzle beta is here!'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1116158295937096159</id><published>2010-09-14T11:59:00.000-07:00</published><updated>2010-09-14T11:59:26.541-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='nosql'/><category scheme='http://www.blogger.com/atom/ns#' term='mongodb'/><title type='text'>Oh, no 
- MongoDB can be fast for key-value stores</title><content type='html'>I have broken my promise to stop writing about this. Sorry, but I had to correct my mistake. I ran three micro-benchmarks: get by primary key, get by secondary key and update by primary key. MySQL had a higher peak QPS for all of them. Alas, the results for get by primary key were skewed because pymongo, the Python driver for MongoDB, uses more CPU than MySQLdb, the Python driver for MySQL. The client host was saturated during the test, and this limited peak QPS to 80,000 for MongoDB versus 110,000 for MySQL.&lt;br /&gt;&lt;br /&gt;I repeated one test using two 16-core client hosts with 40 processes per host. For that test the peak QPS on MongoDB improved to 155,000 while the peak for MySQL remained at 110,000. That is an impressive result. The results for get by secondary key and update by primary key are still valid, as the server host was saturated on those tests.&lt;br /&gt;&lt;br /&gt;Now I must consider rewriting the test harness in Java, C or C++, or adding MongoDB support to sysbench. I prefer Python, but in addition to under-reporting MongoDB peak performance, the Python harness also under-reports MySQL performance. 
I am able to get 180,000 peak QPS using sysbench on one 16-core client host and mysqld on another versus 110,000 using the Python equivalent.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1116158295937096159?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1116158295937096159/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/09/oh-no-mongodb-can-be-fast-for-key-value.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1116158295937096159'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1116158295937096159'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/09/oh-no-mongodb-can-be-fast-for-key-value.html' title='Oh, no - MongoDB can be fast for key-value stores'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8164240566156218598</id><published>2010-09-13T22:19:00.000-07:00</published><updated>2010-09-13T22:19:37.899-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='nosql'/><category scheme='http://www.blogger.com/atom/ns#' term='mongodb'/><title type='text'>MySQL versus MongoDB - update performance</title><content type='html'>This is the end of my public performance comparison 
of MongoDB versus MySQL, at least for the next few weeks. For these tests I used a 16-core x86_64 client host and an 8-core x86_64 server host, MySQL 5.1.50 with the Facebook patch, MongoDB 1.7.0 and Python clients. Ping between the client and server hosts was ~200us.&lt;br /&gt;&lt;br /&gt;I tested 5 configurations:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;inno.5150fb.b0.s0 - MySQL 5.1.50, the Facebook patch, binlog disabled&lt;/li&gt;&lt;li&gt;inno.5150fb.b1.s0 - MySQL 5.1.50, the Facebook patch, binlog enabled, sync_binlog=0&lt;/li&gt;&lt;li&gt;inno.5150fb.b1.s1 - MySQL 5.1.50, the Facebook patch, binlog enabled, sync_binlog=1&lt;/li&gt;&lt;li&gt;mongo.170.safe - MongoDB 1.7.0 and safe updates&lt;/li&gt;&lt;li&gt;mongo.170.unsafe - MongoDB 1.7.0 and unsafe updates&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Note that with unsafe updates the client does not wait for the server to respond. It sends requests as fast as it can or until the buffer between the client and server is full. When a sufficient number of concurrent clients are used and the clients run for enough time, the buffer becomes full and unsafe updates do not improve performance.&lt;/div&gt;&lt;br /&gt;InnoDB has much better throughput when the binlog is disabled. The clients update rows selected at random from 2M rows. InnoDB is able to handle more of that load concurrently than MongoDB, which uses a reader-writer lock that prevents concurrency within the database. Note that while InnoDB allows more of the update to be done concurrently, it isn't perfect, as there are several global mutexes in MySQL/InnoDB including LOCK_open, kernel_mutex and the InnoDB buffer pool mutex. When the binlog is enabled but sync_binlog is disabled, even more global mutexes are used. Finally, when the binlog is enabled and sync_binlog=1, group commit is not enabled and updates are rate limited by the performance of fsync. 
In this case fsync is fast because a HW RAID card with a battery-backed cache was used.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_3rU41dez5TI/TI8BrJNc7CI/AAAAAAAAAYo/3hYek4RHh7Q/s1600/updates_-_8_core_-_200us_ping.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://1.bp.blogspot.com/_3rU41dez5TI/TI8BrJNc7CI/AAAAAAAAAYo/3hYek4RHh7Q/s320/updates_-_8_core_-_200us_ping.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;This is output from &lt;a href="http://poormansprofiler.org/"&gt;PMP&lt;/a&gt; that demonstrates one source of mutex contention in mongod. The pileup occurs at&amp;nbsp;mongo::MongoMutex, which apparently implements the reader-writer lock.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 48 mongo::connThread,thread_proxy,start_thread,clone&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 37 pthread_cond_wait@@GLIBC_2.3.2,boost::condition_variable::wait,mongo::MongoMutex::lock,mongo::receivedUpdate,mongo::assembleResponse&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 10 recv,mongo::MessagingPort::recv,mongo::MessagingPort::recv&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;2&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 select,mongo::Listener::initAndListen,mongo::listen,mongo::_initAndListen,mongo::initAndListen,main,select&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 select,mongo::Listener::initAndListen&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 pthread_cond_wait@@GLIBC_2.3.2,mongo::FileAllocator::Runner::operator(),thread_proxy,start_thread,clone&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 
nanosleep,mongo::DataFileSync::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 nanosleep,mongo::ClientCursorMonitor::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 nanosleep&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 mongo::webServerThread,thread_proxy,start_thread,clone&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 mongo::SnapshotThread::run,mongo::BackgroundJob::thr,thread_proxy,start_thread,clone&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 mongo::interruptThread,thread_proxy,start_thread,clone&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;1 mongo::BtreeBucket::findSingle,mongo::ModSetState::createNewFromMods,mongo::_updateObjects,mongo::updateObjects,mongo::receivedUpdate,mongo::assembleResponse&lt;/blockquote&gt;&lt;div&gt;Source code for MongoDB queries is listed below. 
The code to setup MongoDB and MySQL is described in the &lt;a href="http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-yet-another-silly.html"&gt;previous&lt;/a&gt; &lt;a href="http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-fetch-by-secondary.html"&gt;posts&lt;/a&gt;.&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;def query_mongo(host, port, pipe_to_parent, requests_per, dbname, rows, check, testname, worst_n, id):&lt;br /&gt;&amp;nbsp;&amp;nbsp;conn = pymongo.Connection(host, port)&lt;br /&gt;&amp;nbsp;&amp;nbsp;db = conn[dbname]&lt;br /&gt;&amp;nbsp;&amp;nbsp;signal.signal(signal.SIGTERM, sigterm_handler)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;gets = 0&lt;br /&gt;&amp;nbsp;&amp;nbsp;stats = SummaryStats(worst_n)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;while True:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;for loop in xrange(0, requests_per):&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;target = random.randrange(0, rows)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;s = time.time()&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;try:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;r = db.c.update({'_id': target}, {'$inc': {'k': 1 }}, safe=True)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert r['updatedExisting'] == True&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert r['ok'] == 1&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;stats.update(s)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;gets += 1&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;except:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert got_sigterm&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The my.cnf settings for MySQL except for the values of log_bin and 
sync_binlog:&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;innodb_buffer_pool_size=2000M&lt;/blockquote&gt;&lt;blockquote&gt;innodb_log_file_size=100M&lt;/blockquote&gt;&lt;blockquote&gt;innodb_flush_log_at_trx_commit=2&lt;/blockquote&gt;&lt;blockquote&gt;innodb_doublewrite=1&lt;/blockquote&gt;&lt;blockquote&gt;innodb_flush_method=O_DIRECT&lt;/blockquote&gt;&lt;blockquote&gt;innodb_thread_concurrency=0&lt;/blockquote&gt;&lt;blockquote&gt;innodb_max_dirty_pages_pct=80&lt;/blockquote&gt;&lt;blockquote&gt;innodb_file_format=barracuda&lt;/blockquote&gt;&lt;blockquote&gt;innodb_file_per_table&lt;/blockquote&gt;&lt;blockquote&gt;innodb_deadlock_detect=0&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;max_connections=2000&lt;/blockquote&gt;&lt;blockquote&gt;table_cache=2000&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;key_buffer_size=2000M&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;innodb_doublewrite=0&lt;/blockquote&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8164240566156218598?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8164240566156218598/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-update-performance.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8164240566156218598'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8164240566156218598'/><link rel='alternate' type='text/html' 
href='http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-update-performance.html' title='MySQL versus MongoDB - update performance'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_3rU41dez5TI/TI8BrJNc7CI/AAAAAAAAAYo/3hYek4RHh7Q/s72-c/updates_-_8_core_-_200us_ping.png' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2646798577990660505</id><published>2010-09-13T11:28:00.000-07:00</published><updated>2010-09-13T15:06:50.055-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='nosql'/><category scheme='http://www.blogger.com/atom/ns#' term='mongodb'/><title type='text'>MySQL versus MongoDB - fetch by secondary index</title><content type='html'>This continues the silly benchmark series and compares performance from concurrent clients that fetch by secondary key. The &lt;a href="http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-yet-another-silly.html"&gt;previous post&lt;/a&gt; compared fetch by primary key. The test setup was the same as before. Clients were run on a 16-core x86 server. The servers (mongod, mysqld) were run alone on 8-core and 16-core x86 servers. The tests were run for servers that were 1ms and 200us apart according to ping. The database server saturates earlier when the client is only 200us away. That is to be expected.&lt;br /&gt;&lt;br /&gt;InnoDB tables are clustered on the primary key, so a query that fetches all columns by PK only has to read data from one leaf block of the PK index. 
When all columns are fetched by secondary key, both the secondary index leaf node and the PK index leaf node must be read. As all data was cached for this test, that does not make a big difference. Were data not cached, the extra IO used to read the PK index leaf node would be significant.&lt;br /&gt;&lt;br /&gt;This displays throughput on the 16-core server. MongoDB saturates earlier on the 16-core server than on the 8-core server. From vmstat output this appears to be mutex contention on the server, but I cannot provide more details on where that occurs as the mongod binary I downloaded has been stripped.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_3rU41dez5TI/TI0yixqw-NI/AAAAAAAAAYQ/XEmmm_TnHxs/s1600/get_by_secondary_key_-_16_core_server.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://2.bp.blogspot.com/_3rU41dez5TI/TI0yixqw-NI/AAAAAAAAAYQ/XEmmm_TnHxs/s320/get_by_secondary_key_-_16_core_server.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;This displays throughput on the 8-core server. 
Peak QPS for MongoDB is much better than on the 16-core server.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_3rU41dez5TI/TI0ypguMBLI/AAAAAAAAAYU/pmuFSB0kxRg/s1600/get_by_secondary_key_-_8_core.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://4.bp.blogspot.com/_3rU41dez5TI/TI0ypguMBLI/AAAAAAAAAYU/pmuFSB0kxRg/s320/get_by_secondary_key_-_8_core.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;This displays response time for the 16-core server.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_3rU41dez5TI/TI0ywNLQZtI/AAAAAAAAAYY/td1vM65PbEs/s1600/get_by_secondary_key_-_response_time_-_average_%2526_98th_percentile_-_16_core_server.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://1.bp.blogspot.com/_3rU41dez5TI/TI0ywNLQZtI/AAAAAAAAAYY/td1vM65PbEs/s320/get_by_secondary_key_-_response_time_-_average_%2526_98th_percentile_-_16_core_server.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;The test was repeated using hosts that were 200us apart according to ping. The database host was an 8-core server in this test. The peak QPS is similar to the previous tests, but the servers saturate with fewer concurrent clients. 
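&lt;br /&gt;&lt;br /&gt;A rough way to see why closer hosts saturate with fewer clients is Little's law. Each client in this benchmark keeps one request in flight, so one client completes at most 1 / (RTT + server time + client time) requests per second, and the number of clients needed to reach the server's peak is roughly peak QPS multiplied by that per-request time. The sketch below only does that arithmetic. The 1ms and 200us RTTs match the ping times above, while the 50,000 QPS peak and 100us server time are hypothetical.&lt;pre&gt;
```python
def clients_to_saturate(peak_qps, rtt_s, server_s, client_s=0.0):
    """Estimate the concurrent clients needed to reach peak_qps.

    Closed-loop model (Little's law): each client has exactly one request
    in flight, so one client completes at most 1 / (rtt + server + client)
    requests per second.
    """
    per_request_s = rtt_s + server_s + client_s
    return peak_qps * per_request_s


# Hypothetical server: peaks at 50,000 QPS with 100us of service time.
for rtt_s in (0.001, 0.0002):
    n = clients_to_saturate(50000, rtt_s, 0.0001)
    print('RTT %4.0f us: about %3.0f clients to saturate' % (rtt_s * 1e6, n))
```
&lt;/pre&gt;With these inputs about 55 clients are needed at 1ms but only about 15 at 200us. The model ignores queueing and client CPU limits, so it only estimates where the knee of the throughput curve falls.&lt;br /&gt;&lt;br /&gt;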
The &lt;a href="https://spreadsheets.google.com/oimg?key=0AteR_jot1VDGdDVYWG4yaG51bW1lamZObFlzMHVCUkE&amp;amp;oid=23&amp;amp;zx=7i40w8h3mefl"&gt;results are here&lt;/a&gt;&amp;nbsp;and have been updated to include results for MongoDB 1.6.2 and 1.7.0.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Source code to setup the MySQL table:&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;def setup_mysql(host, db, user, password, engine, rows):&lt;br /&gt;&amp;nbsp;&amp;nbsp;filterwarnings( 'ignore', category = MySQLdb.Warning )&lt;br /&gt;&amp;nbsp;&amp;nbsp;conn = connect_mysql(host, db, user, password)&lt;br /&gt;&amp;nbsp;&amp;nbsp;conn.autocommit(True)&lt;br /&gt;&amp;nbsp;&amp;nbsp;cursor = conn.cursor()&lt;br /&gt;&amp;nbsp;&amp;nbsp;cursor.execute('drop table if exists bm')&lt;br /&gt;&amp;nbsp;&amp;nbsp;cursor.execute('create table bm (id int primary key, sid int, k int, c char(120), pad char(60), key sidx(sid)) engine=%s' % engine)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;vals = []&lt;br /&gt;&amp;nbsp;&amp;nbsp;for x in xrange(0, rows):&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;sx = str(x)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;lsx = len(sx)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;row = '(%d, %d, %d, "%s", "%s")' % (x, x, x, sx+'x'*(120-lsx), sx+'y'*(60-lsx))&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;vals.append(row)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;if len(vals) == 1000:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;r = cursor.execute('insert into bm values %s' % ','.join(vals))&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;vals = []&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;print '... row %d, result %s' % (x, r)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;if vals:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;r = cursor.execute('insert into bm values %s' % ','.join(vals))&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;vals = []&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;print '... 
row %d, result %s' % (x, r)&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;Source code to query the MySQL table:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;def query_mysql(host, db, user, password, pipe_to_parent, requests_per, rows, check, testname, worst_n, id):&lt;br /&gt;&amp;nbsp;&amp;nbsp;conn = connect_mysql(host, db, user, password)&lt;br /&gt;&amp;nbsp;&amp;nbsp;conn.autocommit(True)&lt;br /&gt;&amp;nbsp;&amp;nbsp;cursor = conn.cursor()&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;gets = 0&lt;br /&gt;&amp;nbsp;&amp;nbsp;stats = SummaryStats(worst_n)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;while True:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;for loop in xrange(0, requests_per):&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;target = random.randrange(0, rows)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;s = time.time()&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;cursor.execute('select id, k, c, pad from bm where sid = %d' % target)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;sel_rows = cursor.fetchall()&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;stats.update(s)&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;if len(sel_rows) != 1:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;print 'No rows for %d' % target&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert False&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;if sel_rows[0][0] != target:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;print 'id is %s and should be %s' % (sel_rows[0][0], target)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert False&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;gets += 1&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Source code to setup the MongoDB collection:&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;def 
setup_mongo(host, port, dbname, rows):&lt;br /&gt;&amp;nbsp;&amp;nbsp;conn = pymongo.Connection(host, port)&lt;br /&gt;&amp;nbsp;&amp;nbsp;conn.drop_database(dbname)&lt;br /&gt;&amp;nbsp;&amp;nbsp;db = conn[dbname]&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;for x in xrange(0, rows):&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;sx = str(x)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;lsx = len(sx)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;db.c.save({'_id':x, 'sid':x, 'k':x, 'c':sx+'x'*(120 - lsx), 'pad':sx+'y'*(60 - lsx)})&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;if x % 1000 == 0:&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;print '... row %d' % x&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;db.c.create_index('sid')&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Source code to query the MongoDB table:&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;def query_mongo(host, port, pipe_to_parent, requests_per, dbname, rows, check, testname, worst_n, id):&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;conn = pymongo.Connection(host, port)&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;db = conn[dbname]&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;gets = 0&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;stats = SummaryStats(worst_n)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp;while True:&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;for loop in xrange(0, requests_per):&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;target = random.randrange(0, rows)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;s = time.time()&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;o = db.c.find_one({'sid': target})&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;stats.update(s)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 
&amp;nbsp; &amp;nbsp;assert o['_id'] == target&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;if check:&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert o['k'] == target&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;sx = str(o['_id'])&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;lsx = len(sx)&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert o['c'] == sx+'x'*(120-lsx)&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert o['pad'] == sx+'y'*(60-lsx)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;gets += 1&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2646798577990660505?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2646798577990660505/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-fetch-by-secondary.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2646798577990660505'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2646798577990660505'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-fetch-by-secondary.html' title='MySQL versus MongoDB - fetch by secondary index'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_3rU41dez5TI/TI0yixqw-NI/AAAAAAAAAYQ/XEmmm_TnHxs/s72-c/get_by_secondary_key_-_16_core_server.png' height='72' width='72'/><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-4487884184438887676</id><published>2010-09-11T10:59:00.000-07:00</published><updated>2010-09-15T14:07:29.820-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='nosql'/><title type='text'>MySQL versus MongoDB - yet another silly benchmark</title><content type='html'>This is yet another silly benchmark because the results are likely to be misused. The results probably do not matter to you. I like MongoDB. It has many things in common with MySQL (ease of use, rapid iteration, give customers what they want). It would be more interesting to me were it to use embedded InnoDB as the backend.&lt;br /&gt;&lt;br /&gt;Please read &lt;a href="http://mysqlha.blogspot.com/2010/09/oh-no-mongodb-can-be-fast-for-key-value.html"&gt;the update&lt;/a&gt; as I was able to get much better throughput from MongoDB once I realized that the client host was saturated on the CPU.&lt;br /&gt;&lt;br /&gt;I want to evaluate the performance claims.&amp;nbsp;Several of the &lt;a href="http://www.mongodb.org/display/DOCS/Benchmarks"&gt;MongoDB versus the world benchmarks&lt;/a&gt;&amp;nbsp;were run on Mac laptops. I love my Mac laptop, but I don't use it to confirm database server throughput.&lt;br /&gt;&lt;br /&gt;My first test uses concurrent clients to determine the load at which the server saturates. I run this test in two modes: cached and not cached. No disk IO should be done during the cached test. Most requests should require disk IO during the uncached test. 
I am not sure that I will run the uncached test for MongoDB. It memory maps the database file and there is no way to limit the amount of memory it uses, so the MongoDB uncached test requires a database much larger than RAM. That will take a long time to set up and I don't know whether many MongoDB deployments run with databases much larger than RAM.&lt;br /&gt;&lt;br /&gt;The test is interesting to me because it has found problems that limit throughput on workloads that I care about. Other tests should be done after understanding the results from this test. I suspect that for many people peak QPS on highly-concurrent workloads is not a priority.&lt;br /&gt;&lt;br /&gt;Each client does a sequence of calls (SELECT or HANDLER for MySQL, find_one for MongoDB) where each call fetches one row by specifying the value of an indexed column. The column has a primary key index for MySQL and an index created with the Collection create_index method for MongoDB. The id value to fetch is randomly selected.&lt;br /&gt;&lt;br /&gt;The code for the benchmark is a work in progress, so I have yet to publish it. I used Python, the MySQLdb driver for MySQL and the pymongo driver for MongoDB. The clients are separate processes forked via the Python multiprocessing module to avoid the penalty from the Python GIL. Some of the code used for MongoDB is listed at the end of this post.&lt;br /&gt;&lt;br /&gt;I only report peak QPS at this time. The results ignore average and worst-case response time. It is not good to ignore them, which is another reason why this might be a silly benchmark. Eventually the code will be updated to measure that.&lt;br /&gt;&lt;br /&gt;The clients and DBMS (mysqld, mongod) ran on separate 16-core servers. The test was run for 1, 2, 4, 8, 16, 24, ..., 152, 160 concurrent clients. Tests were run for MyISAM and InnoDB for MySQL. The tests used MySQL 5.1.50 unmodified (5150orig) and 5.1.50 with the Facebook patch (5150fb). 
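&lt;br /&gt;&lt;br /&gt;The benchmark code has not been published, so the following is only a hypothetical sketch of the structure described above, not the code used for these results: one forked process per client via multiprocessing, the 1, 2, 4, 8, 16, 24, ..., 160 concurrency sweep, and a pipe per client that the parent uses to stop it and collect its request count, similar to the pipe_to_parent handling in query_process at the end of this post. fake_request, client and run_step are made-up names, and fake_request just sleeps in place of a real find_one or SELECT.&lt;pre&gt;
```python
import multiprocessing
import time

# 1, 2, 4, 8, then steps of 8 up to 160, matching the sweep in the post.
CONCURRENCY = [1, 2, 4, 8] + list(range(16, 161, 8))


def fake_request():
    """Hypothetical stand-in for one find_one() or SELECT round trip."""
    time.sleep(0.001)


def client(pipe_to_parent):
    """Issue requests until the parent sends a message, then report the count."""
    gets = 0
    while not pipe_to_parent.poll():
        fake_request()
        gets += 1
    pipe_to_parent.recv()                 # consume the parent's stop message
    pipe_to_parent.send({'gets': gets})
    pipe_to_parent.close()


def run_step(nclients, seconds):
    """Run nclients client processes for about that many seconds; return QPS.

    The result is approximate: process start-up and shutdown skew the timing.
    """
    pipes, procs = [], []
    for _ in range(nclients):
        parent_end, child_end = multiprocessing.Pipe()
        proc = multiprocessing.Process(target=client, args=(child_end,))
        proc.start()
        pipes.append(parent_end)
        procs.append(proc)

    start = time.time()
    time.sleep(seconds)
    for pipe in pipes:
        pipe.send('stop')
    total = sum(pipe.recv()['gets'] for pipe in pipes)
    elapsed = time.time() - start
    for proc in procs:
        proc.join()
    return total / elapsed


if __name__ == '__main__':
    for n in CONCURRENCY[:3]:             # only a short sweep for illustration
        print('%3d clients: %8.0f QPS' % (n, run_step(n, 1.0)))
```
&lt;/pre&gt;Separate processes, rather than threads, keep the Python GIL from limiting the client side, which is the reason given above for using multiprocessing.&lt;br /&gt;&lt;br /&gt;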
There are a few changes in the Facebook patch that make a huge difference for peak QPS. I expect official MySQL to have these changes soon. Peak QPS was much higher for MySQL than for MongoDB. In order of peak QPS:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Peak QPS on MySQL 5.1.50 with the Facebook patch exceeds 100,000 and QPS increases up to 128 concurrent clients.&lt;/li&gt;&lt;li&gt;Peak QPS on MySQL 5.1.50 unmodified is between 70,000 and 80,000 and QPS increases up to 80 concurrent clients.&lt;/li&gt;&lt;li&gt;Peak QPS on MongoDB 1.7.0 is 40,000 and QPS increases up to 64 concurrent clients.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;MongoDB appears to saturate on mutex contention on the server at 64 concurrent clients. Alas, the binary is stripped and I cannot run &lt;a href="http://www.poormansprofiler.org/"&gt;PMP&lt;/a&gt; to determine where the problem occurs. My diagnosis is based on vmstat output.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;This displays throughput for MySQL 5.1.50 with the Facebook patch, MySQL 5.1.50 unmodified and MongoDB 1.7.0. All ran on a 16-core Nehalem server -- 16 cores with hyperthreading enabled.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_3rU41dez5TI/TI1EgOPGs_I/AAAAAAAAAYk/Mvw23TISYaw/s1600/gets_by_pk_-_16_core_server.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://3.bp.blogspot.com/_3rU41dez5TI/TI1EgOPGs_I/AAAAAAAAAYk/Mvw23TISYaw/s320/gets_by_pk_-_16_core_server.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;This displays average and 98th percentile response time for MySQL 5.1.50 with the Facebook patch and MongoDB 1.7.0. Both ran on a 16-core Nehalem server -- 16 cores with hyperthreading enabled. 
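&lt;br /&gt;&lt;br /&gt;The client code in these posts calls SummaryStats(worst_n) and then stats.update(s) with the request start time, but that class is not shown. The following is a hypothetical sketch of a helper that could produce average and 98th percentile response times like those graphed here: it keeps every latency and uses the nearest-rank percentile. What worst_n controls is an assumption (here, how many of the slowest samples worst() returns), and the real class may differ.&lt;pre&gt;
```python
import math
import time


class SummaryStats(object):
    """Hypothetical latency recorder; not the class used for these results."""

    def __init__(self, worst_n):
        self.worst_n = worst_n       # assumed: how many slowest samples to keep
        self.samples = []

    def update(self, start_time, now=None):
        """Record the latency of a request that began at start_time."""
        if now is None:
            now = time.time()
        self.samples.append(now - start_time)

    def average(self):
        return sum(self.samples) / len(self.samples)

    def percentile(self, pct):
        """Nearest-rank percentile, e.g. percentile(98)."""
        ordered = sorted(self.samples)
        rank = int(math.ceil(pct / 100.0 * len(ordered)))
        return ordered[max(rank, 1) - 1]

    def worst(self):
        return sorted(self.samples, reverse=True)[:self.worst_n]


stats = SummaryStats(worst_n=3)
for ms in range(1, 101):                 # synthetic latencies of 1ms .. 100ms
    stats.update(0.0, now=ms / 1000.0)
print('avg %.4f s, p98 %.3f s' % (stats.average(), stats.percentile(98)))
```
&lt;/pre&gt;Keeping every sample is simple but uses memory proportional to the number of requests; a long-running client might keep a histogram instead.&lt;br /&gt;&lt;br /&gt;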
Results are from the same test as used for the previous graph.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_3rU41dez5TI/TIz-i8WFnTI/AAAAAAAAAYE/pnVJa_r9iVw/s1600/gets_by_pk_-_response_time_-_average_%2526_98th_percentile_-_16_core_server.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://2.bp.blogspot.com/_3rU41dez5TI/TIz-i8WFnTI/AAAAAAAAAYE/pnVJa_r9iVw/s320/gets_by_pk_-_response_time_-_average_%2526_98th_percentile_-_16_core_server.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;I repeated the tests using an 8-core Nehalem server. It is the same as the previous server except that hyperthreading was disabled. The graph below displays throughput. MySQL is able to handle more concurrent clients before saturating on the server.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_3rU41dez5TI/TI1EXx-rt2I/AAAAAAAAAYg/xABUqC4i49U/s1600/gets_by_pk_-_8_core_server.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://1.bp.blogspot.com/_3rU41dez5TI/TI1EXx-rt2I/AAAAAAAAAYg/xABUqC4i49U/s320/gets_by_pk_-_8_core_server.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;This is the response time graph for the 8-core server.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_3rU41dez5TI/TIz_D238bJI/AAAAAAAAAYM/VSD6S_JS3qg/s1600/gets_by_pk_-_response_time_-_average_%2526_98th_percentile_-_8_core_server.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" 
src="http://1.bp.blogspot.com/_3rU41dez5TI/TIz_D238bJI/AAAAAAAAAYM/VSD6S_JS3qg/s320/gets_by_pk_-_response_time_-_average_%2526_98th_percentile_-_8_core_server.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;The servers for the tests above were 1ms apart according to ping. I repeated the test for servers that have a 200us ping time, but rather than inline another graph I link to it in the next sentence. The&amp;nbsp;&lt;a href="https://spreadsheets.google.com/oimg?key=0AteR_jot1VDGdDVYWG4yaG51bW1lamZObFlzMHVCUkE&amp;amp;oid=19&amp;amp;zx=dl0e7oobpaky"&gt;QPS for get by primary key&lt;/a&gt;&amp;nbsp;shows an interesting and unexplained spike in QPS for MongoDB near 152 concurrent clients. Despite having a similar QPS at that point, MongoDB average response time was about 1.5X that for MySQL. This includes results for MongoDB versions 1.6.2 and 1.7.0.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;The code to set up the collection for MongoDB is:&lt;br /&gt;&lt;blockquote&gt;def setup_db(host, port, dbname, rows):&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;conn = pymongo.Connection(host, port)&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;conn.drop_database(dbname)&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;db = conn[dbname]&amp;nbsp;&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;for x in xrange(0, rows):&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;sx = str(x)&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;lsx = len(sx)&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;db.c.save({'_id':x, 'k':x, 'c':sx+'x'*(120 - lsx), 'pad':sx+'y'*(120 - lsx)})&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;if x % 1000 == 0:&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;print '... 
row %d' % x&lt;/blockquote&gt;And the code to query MongoDB:&lt;br /&gt;&lt;blockquote&gt;def query_process(host, port, pipe_to_parent, requests_per, dbname, rows, check, id):&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;conn = pymongo.Connection(host, port)&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;db = conn[dbname]&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;gets = 0&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;while True:&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;for loop in xrange(0, requests_per):&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;target = random.randrange(0, rows)&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;o = db.c.find_one({'_id': target})&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;assert o['_id'] == target&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;if check:&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert o['k'] == target&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;sx = str(o['_id'])&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;lsx = len(sx)&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert o['c'] == sx+'x'*(120-lsx)&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;assert o['pad'] == sx+'y'*(120-lsx)&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;gets += 1&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;if pipe_to_parent.poll():&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;msg = 
pipe_to_parent.recv()&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;print 'Received: %s' % msg&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;pipe_to_parent.send({'gets' : gets})&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;pipe_to_parent.close()&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;return&lt;/blockquote&gt;The my.cnf settings for MySQL 5.1:&lt;br /&gt;&lt;blockquote&gt;plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so&lt;/blockquote&gt;&lt;blockquote&gt;innodb_buffer_pool_size=2000M&lt;/blockquote&gt;&lt;blockquote&gt;innodb_log_file_size=100M&lt;/blockquote&gt;&lt;blockquote&gt;innodb_flush_log_at_trx_commit=2&lt;/blockquote&gt;&lt;blockquote&gt;innodb_doublewrite=1&lt;/blockquote&gt;&lt;blockquote&gt;innodb_flush_method=O_DIRECT&lt;/blockquote&gt;&lt;blockquote&gt;innodb_thread_concurrency=0&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;innodb_file_format=barracuda&lt;/blockquote&gt;&lt;blockquote&gt;innodb_file_per_table&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;max_connections=2000&lt;/blockquote&gt;&lt;blockquote&gt;table_cache=2000&lt;/blockquote&gt;&lt;blockquote&gt;key_buffer_size=2000M&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-4487884184438887676?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/4487884184438887676/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-yet-another-silly.html#comment-form' title='27 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4487884184438887676'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4487884184438887676'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-yet-another-silly.html' title='MySQL versus MongoDB - yet another silly benchmark'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_3rU41dez5TI/TI1EgOPGs_I/AAAAAAAAAYk/Mvw23TISYaw/s72-c/gets_by_pk_-_16_core_server.png' height='72' width='72'/><thr:total>27</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-336642638921468185</id><published>2010-09-08T16:32:00.000-07:00</published><updated>2010-09-08T16:32:47.016-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>SBR and innodb_autoinc_lock_mode</title><content type='html'>The &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html"&gt;5.1 manual&lt;/a&gt; states:&lt;br /&gt;&lt;blockquote&gt;Therefore, if you are using statement-based replication, you must either avoid INSERT ... 
ON DUPLICATE KEY UPDATE or use&amp;nbsp;innodb_autoinc_lock_mode = 0&lt;/blockquote&gt;And earlier &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html"&gt;in the same page&lt;/a&gt; is the description of innodb_autoinc_lock_mode=0&lt;br /&gt;&lt;blockquote&gt;This lock mode is provided only for backward compatibility and performance testing. There is little reason to use this lock mode unless you use “mixed-mode inserts” and care about the important difference in semantics described later.&lt;/blockquote&gt;I don't think these statements agree. I am even more confused because upgrading a master-slave pair of servers to 5.1 with innodb_autoinc_lock_mode=1 and statement-based replication fixed a logical corruption problem that occurred when the wrong value was written to the binlog for SET INSERT_ID=... as part of a transaction that does INSERT ... ON DUPLICATE KEY UPDATE. There are a few details on that at &lt;a href="http://bugs.mysql.com/bug.php?id=50413"&gt;bug 50413&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-336642638921468185?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/336642638921468185/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/09/sbr-and-innodbautoinclockmode.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/336642638921468185'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/336642638921468185'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/09/sbr-and-innodbautoinclockmode.html' title='SBR and innodb_autoinc_lock_mode'/><author><name>Mark 
Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7633823412011969042</id><published>2010-09-02T07:31:00.000-07:00</published><updated>2010-09-02T09:05:06.979-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Speaking at MySQL Sunday</title><content type='html'>I am speaking at MySQL Sunday. The title for my talk is &lt;a href="http://www.oracle.com/us/openworld/mysql-sunday-078000.html"&gt;Success with MySQL&lt;/a&gt; and I will focus on things that operations and users can do to make a MySQL deployment succeed. There are many interesting talks scheduled for Sunday, including several at the same time as mine. 
I hope to see you there.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7633823412011969042?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7633823412011969042/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/09/speaking-at-mysql-sunday.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7633823412011969042'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7633823412011969042'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/09/speaking-at-mysql-sunday.html' title='Speaking at MySQL Sunday'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5375293534007425762</id><published>2010-07-25T12:18:00.000-07:00</published><updated>2010-07-25T12:18:29.107-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Why don't you use X?</title><content type='html'>Sometimes I am asked why I don't use X instead of official MySQL. The answer is simple. I use official MySQL because I have been using it, because the MySQL development team (including InnoDB) has done great work this year, and because change is expensive. 
The cost of change includes the cost of evaluating the alternatives and the cost of deploying them. The cost of change also includes the features I won't work on because I am doing an evaluation. I also use it because the quality of new 5.1 releases has been very high this year. I know because I test them and some of the alternatives.&lt;br /&gt;&lt;br /&gt;My initial evaluation criteria are simple. I don't like compiler or valgrind warnings. The alternative should not introduce new ones. I like regression tests. The alternative should not disable or fail existing tests. If the existing test is somewhat bogus, then it should be fixed. I love &lt;a href="http://buildbot.askmonty.org/buildbot/"&gt;buildbot&lt;/a&gt; as done by MariaDB, &lt;a href="http://bugs.mysql.com/bug.php?id=53445"&gt;fixes in official MySQL&lt;/a&gt; to reduce compiler warnings and all of the work done by Drizzle to not tolerate compiler warnings. When the alternative adds new features it must add new regression tests (hooray for status_user.test in MariaDB).&lt;br /&gt;&lt;br /&gt;I spent a lot of time debugging valgrind warnings that occur in MySQL 5.1.47. All of them were bogus and I think future versions of MySQL will prevent these. That is good news as I prefer to not repeat that effort the next time I upgrade to a new MySQL release.&lt;br /&gt;&lt;br /&gt;Percona and MariaDB confront the same issues and more when considering code to incorporate. In addition to what I wrote about above, they must review patches to make sure the code is reasonable. There is a lot of duplicate effort done by groups that patch or fork MySQL. I wish this weren't the case. 
If the external fork &amp;amp; patch effort is consolidated around MariaDB, then we can reduce some of this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5375293534007425762?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5375293534007425762/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/07/why-dont-you-use-x.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5375293534007425762'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5375293534007425762'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/07/why-dont-you-use-x.html' title='Why don&apos;t you use X?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-4307901994708846434</id><published>2010-07-23T10:55:00.000-07:00</published><updated>2010-07-23T10:56:41.756-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Building MariaDB with the InnoDB plugin</title><content type='html'>This post was inspired by a couple of events. But I won't explain them other than to say I think there have been too many subjective comments (or FUD) about quality. 
This is an attempt to quantify whether the grass is greener on the other side.&lt;br /&gt;&lt;br /&gt;Today I tried to build MariaDB with the InnoDB plugin. I was told this is now supported. The last time I checked, XtraDB had replaced the InnoDB plugin in MariaDB. MPAB had a good reason for doing this, as they don't want to test both the InnoDB plugin and XtraDB. But I prefer choice, despite the many great features in XtraDB.&lt;br /&gt;&lt;br /&gt;First I tried using the 5.3 release. That failed fast. I prefer fast failures over obscure ones:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;./configure --enable-thread-safe-client --with-plugins=partition,csv,blackhole,myisam,heap,innodb_plugin --without-plugin-innobase --with-fast-mutexes --with-extra-charsets=all --with-debug C_EXTRA_FLAGS="-fno-omit-frame-pointer -fno-strict-aliasing -Wall"&lt;/blockquote&gt;&lt;blockquote&gt;...&lt;/blockquote&gt;&lt;br /&gt;&lt;blockquote&gt;configure: error: unknown plugin: innodb_plugin&lt;/blockquote&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Then I tried the 5.2 MariaDB release. This time the configure command worked. After running make, both XtraDB and the InnoDB plugin were compiled. Time to try a test. There were no failures! 
Alas, all tests were skipped.&lt;/div&gt;&lt;blockquote&gt;./mysql-test-run.pl --suite=innodb_plugin&lt;/blockquote&gt;&lt;blockquote&gt;...&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-analyze &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-autoinc &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-autoinc-44030 &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-consistent &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-index &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-index_ucs2 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-lock &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-replace &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-semi-consistent &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-timeout &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb 
support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-use-sys-malloc &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb-zip &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug21704 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug34053 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug34300 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug35220 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug36169 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug36172 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug38231 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug39438 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug40360 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug40565 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug41904 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug42101 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug42101-nonzero &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug44032 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug44369 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug44571 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug45357 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug46000 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug46676 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug47167 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug47621 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug47622 &amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug47777 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug51378 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug51920 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug52663 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_bug52745 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_file_format &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; [ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_information_schema &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;blockquote&gt;innodb_plugin.innodb_trx_weight &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;[ skipped ] &amp;nbsp;No innodb support&lt;/blockquote&gt;&lt;br /&gt;Then I checked for compiler warnings. I really dislike compiler warnings. MySQL has recently done a lot of work to remove them from 5.1 (thanks Davi). I think all of this work was considered one bug, but it was a lot of work and will make MySQL better. They have also begun to do some builds with Werror. See &lt;a href="http://bugs.mysql.com/bug.php?id=53445"&gt;bug 53445&lt;/a&gt;&amp;nbsp;for all of the details.&lt;br /&gt;&lt;br /&gt;After compiling storage engines with -Wall, there are &lt;b&gt;no&lt;/b&gt; warnings for the official InnoDB plugin (storage/innodb_plugin). 
There are warnings for XtraDB (storage/xtradb):&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;btr/btr0btr.c:2871: warning: null argument where non-null required (argument 1)&lt;/blockquote&gt;&lt;blockquote&gt;btr/btr0cur.c:1841: warning: null argument where non-null required (argument 2)&lt;/blockquote&gt;&lt;blockquote&gt;btr/btr0cur.c:1860: warning: null argument where non-null required (argument 1)&lt;/blockquote&gt;&lt;blockquote&gt;btr/btr0cur.c:1967: warning: null argument where non-null required (argument 1)&lt;/blockquote&gt;&lt;blockquote&gt;fil/fil0fil.c:3106: warning: pointer targets in passing argument 2 of ‘dict_table_get_index_on_name’ differ in signedness&lt;/blockquote&gt;&lt;blockquote&gt;ibuf/ibuf0ibuf.c:775: warning: null argument where non-null required (argument 1)&lt;/blockquote&gt;&lt;blockquote&gt;ibuf/ibuf0ibuf.c:950: warning: null argument where non-null required (argument 1)&lt;/blockquote&gt;&lt;blockquote&gt;os/os0file.c:4194: warning: pointer targets in assignment differ in signedness&lt;/blockquote&gt;&lt;div&gt;I want to compare the results from running all regression tests with valgrind. But that might take some time. I am able to run all InnoDB tests without valgrind warnings using the Facebook-patched MySQL 5.1.47. 
That required a few small changes that are likely in the recent 5.1.49 release.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Have all regression tests been run under valgrind for MariaDB using either the InnoDB or XtraDB plugin?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Stay tuned for part 2.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-4307901994708846434?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/4307901994708846434/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/07/building-mariadb-with-innodb-plugin.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4307901994708846434'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4307901994708846434'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/07/building-mariadb-with-innodb-plugin.html' title='Building MariaDB with the InnoDB plugin'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5605571725055711653</id><published>2010-07-21T10:01:00.000-07:00</published><updated>2010-07-21T12:46:22.729-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Conversation starters for OSCON</title><content type='html'>I will be 
at OSCON in a few hours. Mohan and I &lt;a href="http://www.oscon.com/oscon2010/public/schedule/detail/15567"&gt;have a talk on FlashCache&lt;/a&gt; on Thursday. The talk will have lots of details on the FlashCache implementation. I expect to be quiet except for a few slides on performance. Mohan and Paul did an amazing job getting FlashCache running on our servers. This is an opportunity to learn from Mohan.&lt;br /&gt;&lt;br /&gt;As Percona has been doing a lot of work with FlashCache, I hope they will be at OSCON to discuss their experience with it.&lt;br /&gt;&lt;br /&gt;There are other interesting things to talk about if your area is data management:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;MySQL is doing great for its customers this year. I began using MySQL in 2005. This has been the best year for me. The 5.1 release is excellent and the 5.5 beta looks great. The MySQL development team has been fixing bugs fast. Merging InnoDB and MySQL into one company means that their development teams work better, fix bugs faster and talk more about new features.&lt;/li&gt;&lt;li&gt;Your SQL, not MySQL, is frequently the problem. Sometimes we make MySQL deployments better by hacking on MySQL. More often improvement comes from changing application SQL. While some of the application changes compensate for less-than-perfect behavior in MySQL, more are done to fix things that would be a problem for any database. The biggest thing that MySQL needs to fix in this area is monitoring. It must make it easier to identify performance problems. Until then, tcpdump, &lt;a href="http://poormansprofiler.org/"&gt;Poor Man's Profiler&lt;/a&gt; and &lt;a href="http://www.maatkit.org/doc/mk-query-digest.html"&gt;mk-query-digest&lt;/a&gt; are excellent options.&lt;/li&gt;&lt;li&gt;Where did Java go wrong? MySQL has a wonderful JDBC driver. I don't blame the implementation. But Java clients continue to cause too many database problems for me. 
I recently logged all SQL on a server and noticed that the JDBC client connected, ran 13 statements to prepare the connection (including 5 set autocommit statements) and then ran 1 query. That is an amazing amount of bloat. I have seen many other cases where preparing/returning a connection from/to the pool required 5 to 10 statements. Given that number of round trips between the client and server, it isn't likely that the connection pool saves any overhead on the database. I am currently dealing with Java applications that set tx_isolation to read-committed for InnoDB. With MySQL 5.1, all binlog events written for such a connection must use row-based replication. For now I will assume that most of the Java apps use read-committed because they want to rather than because they need to.&lt;/li&gt;&lt;li&gt;Just because your database is sharded doesn't mean you lose joins. You lose the ability to do joins or enforce foreign keys across all of your data. But lots of interesting queries can be run within one shard. 
I prefer that long-running queries use something other than MySQL, as they will run much faster elsewhere.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5605571725055711653?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5605571725055711653/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/07/conversation-starters-for-oscon.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5605571725055711653'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5605571725055711653'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/07/conversation-starters-for-oscon.html' title='Conversation starters for OSCON'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1607243466745779668</id><published>2010-07-13T09:21:00.000-07:00</published><updated>2010-07-13T10:19:42.999-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>A new book on MySQL replication and HA</title><content type='html'>There is a &lt;a href="http://oreilly.com/catalog/9780596807306?utm_content=em-orm-pr-MySQL+High+Availability"&gt;new book&lt;/a&gt; on MySQL replication and HA from Charles, Mats and Lars. 
I read it as a reviewer and learned more than a few things. It has many details on internals that are not described elsewhere unless you are willing to read the source code. It also describes how to deploy MySQL replication for many use cases. I think the book can save people from some failures that are inevitable when a distributed system is deployed for enough time on enough servers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1607243466745779668?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1607243466745779668/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/07/new-book-on-mysql-replication-and-ha.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1607243466745779668'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1607243466745779668'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/07/new-book-on-mysql-replication-and-ha.html' title='A new book on MySQL replication and HA'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6473650283485075411</id><published>2010-05-05T10:35:00.000-07:00</published><updated>2010-05-05T11:14:43.555-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Best practices</title><content type='html'>Back in the day &lt;a href="http://en.wikipedia.org/wiki/The_C_Programming_Language_%28book%29"&gt;we wrote C&lt;/a&gt; without support for type checking. Many bugs were missed because of this. Someone wrote &lt;a href="http://en.wikipedia.org/wiki/Lint_%28software%29"&gt;lint&lt;/a&gt; and many bugs were prevented. Not everyone takes advantage of tools that prevent easily fixed errors. MySQL builds are done without using the &lt;a href="http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Warning-Options.html#index-Wall-234"&gt;-Wall option in gcc&lt;/a&gt; to generate more warnings. Nor do they use the &lt;a href="http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Warning-Options.html#index-Werror-226"&gt;-Werror option&lt;/a&gt; to fail on warnings. This allows for some silly things in production releases like &lt;a href="http://bugs.mysql.com/bug.php?id=51289"&gt;bug 51289&lt;/a&gt; (return NULL for a function that is declared to return double):&lt;br /&gt;&lt;blockquote&gt;&lt;pre class="note"&gt;double Item_cache_decimal::val_real()&lt;br /&gt;{&lt;br /&gt;  DBUG_ASSERT(fixed);&lt;br /&gt;  double res;&lt;br /&gt;  if (!value_cached &amp;amp;&amp;amp; !cache_value())&lt;br /&gt;    return NULL;&lt;/pre&gt;&lt;/blockquote&gt;Not using -Wall and -Werror allows for more serious problems to be missed. &lt;a href="http://bugs.mysql.com/bug.php?id=42733"&gt;Bug 42733&lt;/a&gt; is an example of one such problem. It also has allowed me to miss problems in code that I change.&lt;br /&gt;&lt;br /&gt;I filed &lt;a href="http://bugs.mysql.com/bug.php?id=53445"&gt;bug 53445&lt;/a&gt; for this. MySQL has been very good at fixing things lately. This should also get fixed. 
Visit the bug and subscribe to it or update it.&lt;br /&gt;&lt;br /&gt;Searching for &lt;a href="http://www.google.com/search?hl=en&amp;amp;client=firefox-a&amp;amp;hs=j0O&amp;amp;rls=org.mozilla%3Aen-US%3Aofficial&amp;amp;q=site%3Abugs.mysql.com+compiler+warning"&gt;"site:bugs.mysql.com compiler warning"&lt;/a&gt; finds a lot of entries.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6473650283485075411?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6473650283485075411/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/05/best-practices.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6473650283485075411'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6473650283485075411'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/05/best-practices.html' title='Best practices'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8273253010643874425</id><published>2010-04-25T11:05:00.000-07:00</published><updated>2010-04-25T14:31:43.394-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='distributed'/><title type='text'>Consistency across a WAN</title><content type='html'>There are 
three solutions for providing consistency in a data service that operates across a wide area network (WAN). None of them are free. What are you willing to pay, and where are you willing to add complexity? Depending on what you choose, your system can be more complex for external users, internal application developers or operations. The choices are multi-master with conflict resolution (eventual consistency), multi-master with conflict prevention (strong consistency) and single master with downtime on failover.&lt;br /&gt;&lt;br /&gt;If you choose &lt;a href="http://www.allthingsdistributed.com/2008/12/eventually_consistent.html"&gt;eventual consistency&lt;/a&gt; (EC), then internal application developers must write logic to resolve conflicts and external users will occasionally encounter inconsistent data. This might be a small price to pay for a system that provides higher availability and transparent failover. I am not aware of support for secondary indexes in the popular EC systems. I wonder if the same logic that does eventual consistency across a WAN might be reused to keep secondary indexes eventually consistent within a datacenter. That would impose an additional cost on internal application developers in return for expanding the workloads that EC can support.&lt;br /&gt;&lt;br /&gt;If you choose &lt;a href="http://en.wikipedia.org/wiki/Paxos_algorithm"&gt;strong consistency&lt;/a&gt;, then external users experience more latency on writes, as the transaction commit requires one or two round trips across a WAN. This might be a small price to pay for a system that provides higher availability and transparent failover. &lt;a href="http://www.codership.com/"&gt;Galera is doing interesting work&lt;/a&gt; in this area for MySQL, and they have already begun to publish results. I need to read more about that.&lt;br /&gt;&lt;br /&gt;If you choose single master, then you will spend more money to make that master less likely to fail. 
You will also experience more downtime and higher support costs while doing manual failover as quickly as possible. Solutions include RAID 10, battery-backed write cache, highly-available SAN/NFS, &lt;a href="http://www.linbit.com/"&gt;DRBD&lt;/a&gt; and pagers for your operations team.&lt;br /&gt;&lt;br /&gt;I don't know if people choose single master in the MySQL community. There are not many choices. It supports multi-master replication but without conflict resolution. It supports strong consistency with Galera, but that is new on the MySQL market. Galera might be the killer application for MariaDB. &lt;a href="http://www.continuent.com/"&gt;Tungsten&lt;/a&gt; is another product that can reduce the complexity of master-slave replication.&lt;br /&gt;&lt;br /&gt;Unless you are using Tungsten, it is very hard to automate master failover for MySQL when there is more than one slave per master. But many deployments need a master and slave in one datacenter and another slave in a remote datacenter.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8273253010643874425?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8273253010643874425/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/04/consistency-across-wan.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8273253010643874425'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8273253010643874425'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/04/consistency-across-wan.html' title='Consistency across a WAN'/><author><name>Mark 
Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2750242660154467308</id><published>2010-04-20T08:34:00.000-07:00</published><updated>2010-04-20T09:10:29.579-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='distributed'/><title type='text'>CAP for MySQL</title><content type='html'>Stonebraker wrote a &lt;a href="http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext"&gt;post on CAP&lt;/a&gt;. It is interesting because he highlights all of the causes for failures in a database service and then provides his estimates on the frequency of those causes. Many posts on CAP ignore failures caused by how the service is used and operated.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Choose CA&lt;/b&gt; - You can get CA (&lt;a href="http://mysqlha.blogspot.com/2010/04/cap-theorem.html"&gt;consistency, availability&lt;/a&gt;) for a database service running within a datacenter. Use DRBD with MySQL. Then use asynchronous replication to a remote datacenter to survive the loss of a datacenter. You don't get CA between datacenters in this setup. Others &lt;a href="http://jsensarma.com/blog/2009/11/dynamo-a-flawed-architecture-part-i/"&gt;have written&lt;/a&gt; about the ability to get CA &lt;a href="http://jsensarma.com/blog/2009/11/dynamo-part-i-a-followup-and-re-rebuttals/"&gt;within a datacenter&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Repeatable DBMS errors&lt;/b&gt; - I call these queries of death. I notice about one per year. 
MySQL is remarkably stable if you are careful about the features that you use. New SQL must never run first on a master.&lt;/li&gt;&lt;li&gt;&lt;b&gt;Unrepeatable DBMS errors&lt;/b&gt; - These are also infrequent with MySQL but much more frequent than repeatable errors. They are usually impossible to distinguish from intermittent errors caused by hardware and other system software.&lt;/li&gt;&lt;li&gt;&lt;b&gt;P of the CAP theorem is a rare event&lt;/b&gt; - I don't have numbers, but I don't agree with this for services running across a WAN. Additionally, there are other reasons to sacrifice P. Many applications cannot afford the latency required for strong consistency.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Can we afford strong consistency?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I am certainly not an expert on &lt;a href="http://www.google.com/search?hl=en&amp;amp;client=firefox-a&amp;amp;hs=wOa&amp;amp;rls=org.mozilla%3Aen-US%3Aofficial&amp;amp;q=paxos+consensus"&gt;Paxos&lt;/a&gt;, but it is the way to get strong consistency for a database service running across a WAN. This costs one or two round trips, depending on whether the commit coordinator migrates between servers. In theory, we should be able to afford the overhead of 100 to 200 milliseconds for latency-sensitive services. We get a lot in return.&lt;br /&gt;&lt;br /&gt;But applications and existing database servers can make this difficult to achieve. Many applications are conversational (request, think, request, think, commit -- substitute network latency for 'think'). Many database servers have one resource, the database log, for which Paxos must be run. This guarantees that the commit coordinator will frequently migrate, and commit will require two round trips. Even when the log isn't a problem, there will be performance problems for rows that are frequently updated from all locations.&lt;br /&gt;&lt;br /&gt;This will change in the future. 
Servers that are optimized for OLTP (no conversational transactions) will be designed. &lt;a href="http://www.voltdb.com/"&gt;VoltDB&lt;/a&gt; is doing this today (disclaimer: a family member works there). Servers that are optimized for strong consistency across a WAN are also being built and will become more popular over time.&lt;br /&gt;&lt;br /&gt;I think that MySQL can take part in that future (strong consistency across a WAN), but that requires a new storage engine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2750242660154467308?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2750242660154467308/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/04/cap-for-mysql.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2750242660154467308'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2750242660154467308'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/04/cap-for-mysql.html' title='CAP for MySQL'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1388361716536573083</id><published>2010-04-18T21:49:00.000-07:00</published><updated>2010-04-18T21:49:23.985-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category 
scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>How MySQL can placate the community</title><content type='html'>&lt;ol&gt;&lt;li&gt;Announce&lt;/li&gt;&lt;li&gt;Release&lt;/li&gt;&lt;/ol&gt;This works as long as the gap between Announce and Release is not unreasonably large. The Announce step was done last week. We can all help with the Release step by using a 5.5 beta, reporting bugs and reading the excellent documentation for &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/index.html"&gt;MySQL 5.5&lt;/a&gt; and &lt;a href="http://dev.mysql.com/doc/innodb-plugin/1.1/en/index.html"&gt;InnoDB 1.1&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1388361716536573083?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1388361716536573083/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/04/how-mysql-can-placate-communtiy.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1388361716536573083'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1388361716536573083'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/04/how-mysql-can-placate-communtiy.html' title='How MySQL can placate the community'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1541001661434761076</id><published>2010-04-18T09:22:00.000-07:00</published><updated>2010-04-18T09:22:51.059-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='distributed'/><title type='text'>CAP theorem</title><content type='html'>Do you want to reference the CAP theorem at your next tech conference? I certainly do. Consistency, availability and partition tolerance are wonderful. You only get two, and sometimes you only get one. But what are they? In the CAP context, availability and partition tolerance are labels for behavior defined by the CAP presentation and paper, and this is the source of much confusion. Start with the original work before reading summaries by others.&lt;br /&gt;&lt;br /&gt;Eric Brewer presented CAP in a &lt;a href="http://www.cs.berkeley.edu/%7Ebrewer/cs262b-2004/PODC-keynote.pdf"&gt;keynote at the PODC&lt;/a&gt;. The best slides are on page 4. With XA you forfeit partitions and get CA (consistency + availability). With majority protocols you forfeit availability and get CP (consistency + partition tolerance). &lt;b&gt;XA == CA&lt;/b&gt; is easy for me to remember, and then I know that majority protocols provide CP.&lt;br /&gt;&lt;br /&gt;CAP was proved in a &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.20.1495&amp;amp;rep=rep1&amp;amp;type=pdf"&gt;paper by Gilbert and Lynch&lt;/a&gt;. This paper defines availability and partition tolerance as used by CAP.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;availability - every request received by a non-failing node must result in a response&lt;/li&gt;&lt;li&gt;partition tolerance - The network will be allowed to lose arbitrarily many messages sent from one node to another. 
Every node receiving a request from a client must respond, even though arbitrary messages that are sent may be lost.&lt;/li&gt;&lt;/ul&gt;CAP was explained in a great post by Jeff Darcy on &lt;a href="http://pl.atyp.us/wordpress/?p=2521"&gt;Availability and Partition Tolerance&lt;/a&gt;. Read it. He explains why &lt;b&gt;XA == CA&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;What does this have to do with MySQL? I think we will be talking about CAP much more in the future. I found &lt;a href="http://www.mysqlab.net/blog/wp-content/uploads/2010/04/e8b0ade4bf8ae99d92-mysqle695b0e68daee5ba93e99b86e7bea4e9ab98e58fafe794a8e8aebee8aea1e58f8ae5ba94e794a8.pdf"&gt;an interesting presentation&lt;/a&gt; that explained current MySQL solutions (DRBD, master-slave, master-master, NDB/Cluster) in terms of CAP, but it didn't have enough notes to explain the content.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1541001661434761076?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1541001661434761076/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/04/cap-theorem.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1541001661434761076'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1541001661434761076'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/04/cap-theorem.html' title='CAP theorem'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2581232363416829646</id><published>2010-03-28T16:39:00.000-07:00</published><updated>2010-03-28T16:43:26.981-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='myisam'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><category scheme='http://www.blogger.com/atom/ns#' term='pbxt'/><title type='text'>Fast reads or fast scans?</title><content type='html'>MyISAM is frequently described and marketed as providing fast reads when it really provides fast index and table scans. This is a narrower use case: fast reads implies great performance for most queries, while fast scans implies great performance for single-table queries that are index only or do a full table scan.&lt;br /&gt;&lt;br /&gt;MyISAM caches index blocks but not data blocks. There can be a lot of overhead from re-reading data blocks from the OS buffer cache, assuming mmap is not used. InnoDB and PBXT are 20X faster than MyISAM for some of my tests. However, I suspect that mutex contention on the key cache is also a factor in the performance differences.&lt;br /&gt;&lt;br /&gt;While there are many claims about the great performance of MyISAM, there are not as many examples that explain when it is fast. 
Alas, the same marketing technique is being repeated with NoSQL to the disadvantage of MySQL.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.5/en/ansi-diff-transactions.html"&gt;http://dev.mysql.com/doc/refman/5.5/en/ansi-diff-transactions.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-foreign-keys.html"&gt;http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-foreign-keys.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.mysql.com/products/dw"&gt;http://www.mysql.com/products/dw&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.5/en/storage-engine-compare-transactions.html"&gt;http://dev.mysql.com/doc/refman/5.5/en/storage-engine-compare-transactions.html&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;Tests were run on a server that reports 16 CPU cores. The full test configuration is &lt;a href="http://www.facebook.com/notes/mysqlfacebook/pbxt-still-looks-good/379934640932"&gt;described elsewhere&lt;/a&gt;. For this test I modified the sysbench oltp test to do a self-join query. I will publish the code soon. The schema for the test is:&lt;br /&gt;&lt;blockquote&gt;CREATE TABLE sbtest (&lt;br /&gt;&amp;nbsp; id int(10) unsigned NOT NULL AUTO_INCREMENT,&lt;br /&gt;&amp;nbsp; k int(10) unsigned NOT NULL DEFAULT '0',&lt;br /&gt;&amp;nbsp; c char(120) NOT NULL DEFAULT '',&lt;br /&gt;&amp;nbsp; pad char(60) NOT NULL DEFAULT '',&lt;br /&gt;&amp;nbsp; PRIMARY KEY (id),&lt;br /&gt;&amp;nbsp; KEY k (k)&lt;br /&gt;) ENGINE=InnoDB;&lt;/blockquote&gt;The self-join query uses a range predicate that selects a fixed number (1, 10, 100, 1000 or 10000) of rows. This is an example that selects 1000 rows.&lt;br /&gt;&lt;blockquote&gt;SELECT t1.c, t2.c FROM sbtest t1, sbtest t2 &lt;br /&gt;WHERE t1.id between 245793 and 246792 and t2.id = 2000000 - t1.id&lt;/blockquote&gt;Tests were run using MySQL 5.1.45 for MyISAM, InnoDB plugin 1.0.6 and PBXT 1.1. 
Results are in queries per second for 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 concurrent clients. I do not report results for 512 and 1024 clients to avoid long lines in this post. &lt;br /&gt;&lt;br /&gt;The performance of MyISAM is much worse compared to InnoDB and PBXT as the number of rows selected grows from 1 to 10,000.&lt;br /&gt;&lt;br /&gt;Queries per second when the between predicate selects 1 row: &lt;br /&gt;&amp;nbsp; 6843&amp;nbsp; 13157&amp;nbsp; 24552&amp;nbsp; 46822&amp;nbsp; 62588&amp;nbsp; 57023&amp;nbsp; 46568&amp;nbsp; 30582&amp;nbsp; 18745 innodb&lt;br /&gt;&amp;nbsp; 6164&amp;nbsp; 13627&amp;nbsp; 25671&amp;nbsp; 48705&amp;nbsp; 63741&amp;nbsp; 59217&amp;nbsp; 48300&amp;nbsp; 30964&amp;nbsp; 18866 pbxt&lt;br /&gt;&amp;nbsp; 6354&amp;nbsp; 12061&amp;nbsp; 23373&amp;nbsp; 44284&amp;nbsp; 50778&amp;nbsp; 49546&amp;nbsp; 44412&amp;nbsp; 30444&amp;nbsp; 18827 myisam&lt;br /&gt;&lt;br /&gt;Queries per second when the between predicate selects 10 rows:&lt;br /&gt;&amp;nbsp; 4240&amp;nbsp;&amp;nbsp; 8466&amp;nbsp; 16387&amp;nbsp; 33221&amp;nbsp; 53902&amp;nbsp; 39599&amp;nbsp; 36214&amp;nbsp; 28026&amp;nbsp; 18084 innodb&lt;br /&gt;&amp;nbsp; 4802&amp;nbsp;&amp;nbsp; 8835&amp;nbsp; 17688&amp;nbsp; 35917&amp;nbsp; 57461&amp;nbsp; 47691&amp;nbsp; 41578&amp;nbsp; 29087&amp;nbsp; 18558 pbxt&lt;br /&gt;&amp;nbsp; 3890&amp;nbsp;&amp;nbsp; 7129&amp;nbsp; 12512&amp;nbsp; 16450&amp;nbsp; 12272&amp;nbsp; 12304&amp;nbsp; 12441&amp;nbsp; 12448&amp;nbsp; 11304 myisam&lt;br /&gt;&lt;br /&gt;Queries per second when the between predicate selects 100 rows:&lt;br /&gt;&amp;nbsp; 1842&amp;nbsp;&amp;nbsp; 3455&amp;nbsp;&amp;nbsp; 7249&amp;nbsp; 14842&amp;nbsp; 20206&amp;nbsp; 13875&amp;nbsp; 13471&amp;nbsp; 12942&amp;nbsp; 12344 innodb&lt;br /&gt;&amp;nbsp; 2113&amp;nbsp;&amp;nbsp; 3522&amp;nbsp;&amp;nbsp; 7893&amp;nbsp; 13411&amp;nbsp; 18597&amp;nbsp; 18905&amp;nbsp; 18694&amp;nbsp; 18123&amp;nbsp; 12301 pbxt&lt;br /&gt;&amp;nbsp; 1608&amp;nbsp;&amp;nbsp; 
2260&amp;nbsp;&amp;nbsp; 2263&amp;nbsp;&amp;nbsp; 1899&amp;nbsp;&amp;nbsp; 1371&amp;nbsp;&amp;nbsp; 1399&amp;nbsp;&amp;nbsp; 1451&amp;nbsp;&amp;nbsp; 1468&amp;nbsp;&amp;nbsp; 1442 myisam&lt;br /&gt;&lt;br /&gt;Queries per second when the between predicate selects 1000 rows:&lt;br /&gt;&amp;nbsp;&amp;nbsp; 380&amp;nbsp;&amp;nbsp;&amp;nbsp; 654&amp;nbsp;&amp;nbsp; 1222&amp;nbsp;&amp;nbsp; 2023&amp;nbsp;&amp;nbsp; 2487&amp;nbsp;&amp;nbsp; 1866&amp;nbsp;&amp;nbsp; 1791&amp;nbsp;&amp;nbsp; 1794&amp;nbsp;&amp;nbsp; 1942 innodb&lt;br /&gt;&amp;nbsp;&amp;nbsp; 303&amp;nbsp;&amp;nbsp;&amp;nbsp; 641&amp;nbsp;&amp;nbsp; 1149&amp;nbsp;&amp;nbsp; 1699&amp;nbsp;&amp;nbsp; 2044&amp;nbsp;&amp;nbsp; 2069&amp;nbsp;&amp;nbsp; 2072&amp;nbsp;&amp;nbsp; 2063&amp;nbsp;&amp;nbsp; 2056 pbxt&lt;br /&gt;&amp;nbsp;&amp;nbsp; 232&amp;nbsp;&amp;nbsp;&amp;nbsp; 248&amp;nbsp;&amp;nbsp;&amp;nbsp; 227&amp;nbsp;&amp;nbsp;&amp;nbsp; 189&amp;nbsp;&amp;nbsp;&amp;nbsp; 141&amp;nbsp;&amp;nbsp;&amp;nbsp; 143&amp;nbsp;&amp;nbsp;&amp;nbsp; 149&amp;nbsp;&amp;nbsp;&amp;nbsp; 148&amp;nbsp;&amp;nbsp;&amp;nbsp; 148 myisam&lt;br /&gt;&lt;br /&gt;Queries per second when the between predicate selects 10000 rows:&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 43&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 70&amp;nbsp;&amp;nbsp;&amp;nbsp; 130&amp;nbsp;&amp;nbsp;&amp;nbsp; 213&amp;nbsp;&amp;nbsp;&amp;nbsp; 254&amp;nbsp;&amp;nbsp;&amp;nbsp; 199&amp;nbsp;&amp;nbsp;&amp;nbsp; 194&amp;nbsp;&amp;nbsp;&amp;nbsp; 196&amp;nbsp;&amp;nbsp;&amp;nbsp; 199 innodb&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 49&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 69&amp;nbsp;&amp;nbsp;&amp;nbsp; 123&amp;nbsp;&amp;nbsp;&amp;nbsp; 182&amp;nbsp;&amp;nbsp;&amp;nbsp; 213&amp;nbsp;&amp;nbsp;&amp;nbsp; 216&amp;nbsp;&amp;nbsp;&amp;nbsp; 216&amp;nbsp;&amp;nbsp;&amp;nbsp; 216&amp;nbsp;&amp;nbsp;&amp;nbsp; 216 pbxt&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 24&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 24&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
23&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 19&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 14&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 14&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 15&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 15&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 15 myisam&lt;br /&gt;&lt;br /&gt;MyISAM is at a disadvantage because it does not cache data blocks, so I changed the query to be index only and it is listed below. This did not make MyISAM faster. I think the bottleneck is contention on the key cache mutex.&lt;br /&gt;&lt;blockquote&gt;SELECT t1.id, t2.id FROM sbtest t1, sbtest t2 &lt;br /&gt;WHERE t1.id between 245793 and 246792 and t2.id = 2000000 - t1.id&lt;/blockquote&gt;Queries per second for range 1000 using the index only query:&lt;br /&gt;&amp;nbsp;&amp;nbsp; 457&amp;nbsp;&amp;nbsp;&amp;nbsp; 706&amp;nbsp;&amp;nbsp; 1354&amp;nbsp;&amp;nbsp; 2146&amp;nbsp;&amp;nbsp; 2596&amp;nbsp;&amp;nbsp; 2044&amp;nbsp;&amp;nbsp; 1918&amp;nbsp;&amp;nbsp; 1887&amp;nbsp;&amp;nbsp; 1953 innodb&lt;br /&gt;&amp;nbsp;&amp;nbsp; 576&amp;nbsp;&amp;nbsp;&amp;nbsp; 837&amp;nbsp;&amp;nbsp; 1386&amp;nbsp;&amp;nbsp; 1681&amp;nbsp;&amp;nbsp; 2058&amp;nbsp;&amp;nbsp; 2094&amp;nbsp;&amp;nbsp; 2103&amp;nbsp;&amp;nbsp; 2095&amp;nbsp;&amp;nbsp; 2087 pbxt&lt;br /&gt;&amp;nbsp;&amp;nbsp; 353&amp;nbsp;&amp;nbsp;&amp;nbsp; 244&amp;nbsp;&amp;nbsp;&amp;nbsp; 223&amp;nbsp;&amp;nbsp;&amp;nbsp; 190&amp;nbsp;&amp;nbsp;&amp;nbsp; 140&amp;nbsp;&amp;nbsp;&amp;nbsp; 142&amp;nbsp;&amp;nbsp;&amp;nbsp; 147&amp;nbsp;&amp;nbsp;&amp;nbsp; 146&amp;nbsp;&amp;nbsp;&amp;nbsp; 146 myisam&lt;br /&gt;&lt;br /&gt;Results for MySQL 5.0.84 are similar to 5.1.45 for the range 1000 query:&lt;br /&gt;&amp;nbsp;&amp;nbsp; 390&amp;nbsp;&amp;nbsp;&amp;nbsp; 642&amp;nbsp;&amp;nbsp; 1241&amp;nbsp;&amp;nbsp; 2045&amp;nbsp;&amp;nbsp; 2547&amp;nbsp;&amp;nbsp; 1891&amp;nbsp;&amp;nbsp; 1825&amp;nbsp;&amp;nbsp; 1813&amp;nbsp;&amp;nbsp; 1930 innodb&lt;br /&gt;&amp;nbsp;&amp;nbsp; 303&amp;nbsp;&amp;nbsp;&amp;nbsp; 
239&amp;nbsp;&amp;nbsp;&amp;nbsp; 225&amp;nbsp;&amp;nbsp;&amp;nbsp; 189&amp;nbsp;&amp;nbsp;&amp;nbsp; 140&amp;nbsp;&amp;nbsp;&amp;nbsp; 141&amp;nbsp;&amp;nbsp;&amp;nbsp; 147&amp;nbsp;&amp;nbsp;&amp;nbsp; 146&amp;nbsp;&amp;nbsp;&amp;nbsp; 146 myisam&lt;br /&gt;&lt;br /&gt;The query plan for the basic query:&lt;br /&gt;&lt;blockquote&gt;explain&amp;nbsp; SELECT t1.c, t2.c&lt;br /&gt;from sbtest t1, sbtest t2&lt;br /&gt;where t1.id between 245793 and 246792 and t2.id = 2000000 - t1.id&lt;/blockquote&gt;&lt;br /&gt;*************************** 1. row ***************************&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; id: 1&lt;br /&gt;&amp;nbsp; select_type: SIMPLE&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; table: t1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; type: range&lt;br /&gt;possible_keys: PRIMARY&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key: PRIMARY&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key_len: 4&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ref: NULL&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; rows: 1072&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Extra: Using where; Using index&lt;br /&gt;*************************** 2. 
row ***************************&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; id: 1&lt;br /&gt;&amp;nbsp; select_type: SIMPLE&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; table: t2&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; type: ref&lt;br /&gt;possible_keys: PRIMARY&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key: PRIMARY&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key_len: 4&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ref: func&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; rows: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Extra: Using where; Using index&lt;br /&gt;2 rows in set (0.01 sec)&lt;br /&gt;&lt;br /&gt;The query plan for the index only join: &lt;br /&gt;&lt;blockquote&gt;explain&amp;nbsp; SELECT t1.id, t2.id&lt;br /&gt;from sbtest t1, sbtest t2&lt;br /&gt;where t1.id between 1916457 and 1917456 and t2.id = 2000000 - t1.id&lt;/blockquote&gt;*************************** 1. 
row ***************************&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; id: 1&lt;br /&gt;&amp;nbsp; select_type: SIMPLE&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; table: t1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; type: range&lt;br /&gt;possible_keys: PRIMARY&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key: PRIMARY&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key_len: 4&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ref: NULL&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; rows: 978&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Extra: Using where; Using index&lt;br /&gt;*************************** 2. row ***************************&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; id: 1&lt;br /&gt;&amp;nbsp; select_type: SIMPLE&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; table: t2&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; type: eq_ref&lt;br /&gt;possible_keys: PRIMARY&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key: PRIMARY&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; key_len: 4&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ref: func&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; rows: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Extra: Using where; Using index&lt;br /&gt;2 rows in set (0.00 sec)&lt;div class="blogger-post-footer"&gt;&lt;img 
width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2581232363416829646?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2581232363416829646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/fast-reads-or-fast-scans.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2581232363416829646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2581232363416829646'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/03/fast-reads-or-fast-scans.html' title='Fast reads or fast scans?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8713234088138469148</id><published>2010-03-28T08:44:00.000-07:00</published><updated>2010-03-28T08:44:49.373-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>Do we still need innodb_thread_concurrency?</title><content type='html'>Baron wrote this in a comment to a &lt;a href="http://www.xaprb.com/blog/2010/03/04/a-growing-trend-innodb-mutex-contention/"&gt;recent blog post&lt;/a&gt;.&lt;br /&gt;&lt;blockquote&gt;I consider innodb_thread_concurrency a vestigial tail of the “built-in  InnoDB” that ships by 
default with MySQL 5.0 or 5.1, and should  generally be set to 0 with recent versions of XtraDB or the InnoDB  Plugin.&lt;/blockquote&gt;Can this be? I cannot wait for &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_thread_concurrency"&gt;innodb_thread_concurrency&lt;/a&gt; to be made obsolete. I run a lot of CPU-bound benchmarks on 8 and 16 core servers and I always set it to 0 in my benchmark framework. A few times I have repeated tests with it set to a non-zero value to understand whether that helps and it has never helped. Alas, this will also make &lt;a href="http://www.facebook.com/note.php?note_id=175800920932"&gt;my FLIFO patch&lt;/a&gt; obsolete.&lt;br /&gt;&lt;br /&gt;I agree with Baron that it should be set to 0 with the InnoDB plugin and XtraDB. This is a big deal that has not received enough attention. InnoDB and XtraDB have gotten much better at supporting highly-concurrent workloads on many-core servers. For me highly-concurrent means 100 to 1000 concurrent transactions and many-core means 8 and 16 core servers.&lt;br /&gt;&lt;br /&gt;This is not an easy workload to support. MySQL is getting much better at it. A lot of work remains to be done. MySQL 5.5 has even more improvements and several problems have yet to be fixed in InnoDB. But this is a huge deal. 
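&lt;br /&gt;&lt;br /&gt;Here is a minimal sketch of that setting, assuming it goes in the [mysqld] section of my.cnf. With the InnoDB plugin the variable should also be dynamic, so SET GLOBAL innodb_thread_concurrency = 0 can change it without a restart. Check the documentation for your version:&lt;blockquote&gt;&lt;pre class="note"&gt;[mysqld]&lt;br /&gt;# 0 removes the InnoDB limit on threads running inside the engine&lt;br /&gt;innodb_thread_concurrency = 0&lt;/pre&gt;&lt;/blockquote&gt;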
Maybe we can have a going away party for innodb_thread_concurrency at &lt;a href="http://en.oreilly.com/mysql2010/"&gt;the conference&lt;/a&gt;?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8713234088138469148?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8713234088138469148/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/do-we-still-need-innodbthreadconcurrenc.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8713234088138469148'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8713234088138469148'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/03/do-we-still-need-innodbthreadconcurrenc.html' title='Do we still need innodb_thread_concurrency?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2485659573300814766</id><published>2010-03-25T21:17:00.000-07:00</published><updated>2010-03-28T08:20:20.194-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Pay now or pay later</title><content type='html'>I think I have &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/TransactionalReplication"&gt;rpl_transaction_enabled&lt;/a&gt; 
working for MySQL 5.1 and will publish the patch after more testing. I hope never to port this again, but that depends on whether the distribution I use provides an equivalent feature. Apparently people in operations enjoy not having to restore slaves after hardware and software crashes.&lt;br /&gt;&lt;br /&gt;Some features require payment up front: they cost a lot either for developers to implement or for users to deploy. Others avoid the up-front costs but require payment down the road by users who encounter many problems. I think that MySQL replication has been on the wrong side of this trade-off for too long. But things are changing, as the replication team has done a lot of good things over the past few years. I am sure that if we &lt;a href="http://en.oreilly.com/mysql2010/public/schedule/speaker/3122"&gt;follow Mats around&lt;/a&gt; at the User Conference we can find out what is coming.&lt;br /&gt;&lt;br /&gt;MySQL has to improve to remain competitive, &lt;a href="http://scale-out-blog.blogspot.com/2009/02/simple-ha-with-postgresql-point-in-time.html"&gt;as PostgreSQL&lt;/a&gt; and others have compelling features pending or available now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2485659573300814766?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2485659573300814766/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/pay-now-or-pay-later.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2485659573300814766'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2485659573300814766'/><link rel='alternate' type='text/html' 
href='http://mysqlha.blogspot.com/2010/03/pay-now-or-pay-later.html' title='Pay now or pay later'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-3546905409452241993</id><published>2010-03-24T07:58:00.000-07:00</published><updated>2010-03-28T08:11:33.191-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Durable, not durable and really not durable</title><content type='html'>A lot of interesting work is being done today on SQL and NoSQL servers. This generates a lot of interesting discussions about CAP, ACID and BASE. Be careful what you read.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://en.wikipedia.org/wiki/ACID#Durability"&gt;D in ACID&lt;/a&gt; stands for durability. A DBMS (SQL or NoSQL) is either durable or it is not. But there are several ways to be not durable. Two popular ways are lose-last-N-transactions (&lt;b&gt;not durable&lt;/b&gt;) and lose-unspecified-amount-of-data (&lt;b&gt;really not durable&lt;/b&gt;). One of these is much better than the other. Alas this distinction is frequently ignored when describing really-not-durable servers.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;lose-last-N-transactions - &lt;b&gt;not durable&lt;/b&gt; servers provide a configuration that enables better performance by allowing the last N transactions to be lost during a crash. InnoDB does this when &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit"&gt;innodb_flush_log_at_trx_commit=2&lt;/a&gt;. 
I am not a MySQL Cluster expert, but I think that &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-basics.html"&gt;global checkpoints&lt;/a&gt; provide the same property. Cassandra can run in this mode. I think that HBase must run in this mode. Some filesystems provide a similar option. The key point is that the system will quickly recover to a consistent point in time after a crash, and the time to which it recovers will not be far in the past.&lt;/li&gt;&lt;li&gt;lose-unspecified-amount-of-data - &lt;b&gt;really not durable&lt;/b&gt; servers can be great for batch and read-only workloads. I am skeptical about using them for OLTP. A server-specific version of &lt;a href="http://en.wikipedia.org/wiki/Fsck"&gt;fsck&lt;/a&gt; must be run after a software or hardware crash. The really-not-durable servers that I am familiar with do not prominently document this behavior (the amount of data that might be lost after crash recovery and the amount of time required to run the recovery tool on a large database file). 
I suspect some of their users are not aware of the problems that await them.&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-3546905409452241993?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/3546905409452241993/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/durable-not-durable-and-really-not.html#comment-form' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3546905409452241993'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3546905409452241993'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/03/durable-not-durable-and-really-not.html' title='Durable, not durable and really not durable'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1109897021981228359</id><published>2010-03-20T20:36:00.000-07:00</published><updated>2010-03-28T08:20:50.978-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Why do NoSQL systems use XML for configuration files?</title><content type='html'>MySQL uses SQL for data and name-value pairs for configuration files. 
Cassandra uses XML for configuration files and something closer to name-value pairs for data (or name-value-value-... pairs). Why does it use a stronger data model for configuration than for data?&lt;br /&gt;&lt;br /&gt;While I am writing this in jest I think this is an interesting question.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1109897021981228359?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1109897021981228359/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/why-do-nosql-systems-use-xml-for.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1109897021981228359'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1109897021981228359'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/03/why-do-nosql-systems-use-xml-for.html' title='Why do NoSQL systems use XML for configuration files?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5735988744909726479</id><published>2010-03-16T09:27:00.000-07:00</published><updated>2010-03-28T08:20:41.941-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Can a protocol be GPL?</title><content 
type='html'>I am trying to understand the behavior of MYSQL_OPT_READ_TIMEOUT, which can be used to set a client-side read timeout for connections to a MySQL server. This determines how long a client will wait for a response to a request. &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/mysql-options.html"&gt;The documentation&lt;/a&gt; left me uncertain.&lt;br /&gt;&lt;blockquote&gt;The timeout in seconds for attempts to read from the server. Each attempt uses this timeout value and there are retries if necessary, so the total effective timeout value is three times the option value.&lt;/blockquote&gt;I read the code. The documentation is correct: the code attempts to read from the socket three times. I would prefer not to have to multiply by three to know the real timeout, but it is too late to change this. Maybe they could add a new option: MYSQL_OPT_READ_TIMEOUT_DO_NOT_MULTIPLY_BY_THREE.&lt;br /&gt;&lt;br /&gt;I encountered this interesting claim while reading the source in sql/net_serv.c. We recently &lt;a href="http://krow.livejournal.com/684068.html?thread=2669860"&gt;discussed this elsewhere&lt;/a&gt;. I wasn't aware that the claim was still in the source code.&lt;br /&gt;&lt;blockquote&gt;This file is the net layer API for the MySQL client/server protocol, which is a tightly coupled, proprietary protocol owned by MySQL AB.&lt;/blockquote&gt;&lt;blockquote&gt;@note&lt;br /&gt;&amp;nbsp; Any re-implementations of this protocol must also be under GPL, unless one has got an license from MySQL AB stating otherwise.&lt;/blockquote&gt;&lt;b&gt;UPDATE&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The read timeout is enforced by my_real_read in sql/net_serv.c. This code is hard to read; output from the preprocessor is slightly better. I think it is an accident that the read is retried 3 times in client-side code.&lt;br /&gt;&lt;br /&gt;I am pretty sure that Drizzle removed this code. Good for them. 
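&lt;br /&gt;&lt;br /&gt;To see why the effective timeout works out to three times MYSQL_OPT_READ_TIMEOUT, here is a rough model of the retry logic in my_real_read. It is a simplified Python sketch, not the real C code. It assumes that every vio_read call times out, that the alarm setup block runs only after the first failure, and that net-&amp;gt;retry_count is 1. The function names read_attempts and effective_timeout are made up for illustration.
&lt;pre&gt;
```python
# Hypothetical, simplified model of the client-side retry logic in
# my_real_read (sql/net_serv.c). Not the real code: it assumes every
# vio_read call fails by timing out after read_timeout seconds.


def read_attempts(retry_count_limit=1):
    """Count the read calls made before my_real_read gives up.

    retry_count_limit stands in for net->retry_count.
    """
    attempts = 0
    alarm_in_use = False  # stands in for thr_alarm_in_use(...)
    retry_count = 0       # local counter, 0 at function entry
    while True:
        attempts += 1     # vio_read is called and times out
        if not alarm_in_use:
            # First failure: the "alarm" block runs, then continue.
            alarm_in_use = True
            continue
        # Later failures: retry only while the old value of retry_count
        # (before the post-increment) is below net->retry_count.
        previous = retry_count
        retry_count += 1
        if retry_count_limit > previous:
            continue
        return attempts


def effective_timeout(read_timeout, retry_count_limit=1):
    """Worst-case seconds a client waits if every attempt times out."""
    return read_timeout * read_attempts(retry_count_limit)


print(read_attempts())        # prints 3
print(effective_timeout(10))  # prints 30
```
&lt;/pre&gt;
With net-&amp;gt;retry_count at 1, the model makes three read attempts, which matches the documented tripling of the timeout. Changing retry_count_limit shows how the total would move if that field had a different value.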
&lt;br /&gt;&lt;br /&gt;The outermost loop can be ignored because all retries occur in its first iteration. The comment saying that the first read is done in non-blocking mode is wrong when this code is used in the client library. That may explain why one of the retries is done:&lt;br /&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (i=0 ; i &amp;lt; 2 ; i++)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; while (remain &amp;gt; 0)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* First read is done with non blocking mode */&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if ((long) (length= vio_read(net-&amp;gt;vio, pos, remain)) &amp;lt;= 0L)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/blockquote&gt;When the first read attempt fails, the block of code below runs and then &lt;b&gt;continue&lt;/b&gt; jumps to the start of the &lt;b&gt;while&lt;/b&gt; loop. The comment in that block is also wrong, because the code within it changes the socket to blocking mode. It also contradicts the earlier comment quoted above.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /*&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; We got an error that there was no data on the socket. 
We now set up&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; an alarm to not 'read forever', change the socket to non blocking&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; mode and try again&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; */&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if ((interrupted || length == 0) &amp;amp;&amp;amp; !thr_alarm_in_use(&amp;amp;alarmed))&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (!thr_alarm(&amp;amp;alarmed,net-&amp;gt;read_timeout,&amp;amp;alarm_buff)) /* Don't wait too long */&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/blockquote&gt;The block above is not executed after the second read call fails because &lt;b&gt;thr_alarm_in_use&lt;/b&gt; is true. In this case alarms aren't really used: the &lt;b&gt;alarmed&lt;/b&gt; variable is an int that is set to 0 at function entry and to 1 after the first read fails. The block of code below is executed after the second read call fails; it executes a continue statement to branch to the start of the while loop and retry the read call a third time. 
It increments retry_count to 1 before doing so.&lt;br /&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (thr_alarm_in_use(&amp;amp;alarmed) &amp;amp;&amp;amp; !thr_got_alarm(&amp;amp;alarmed) &amp;amp;&amp;amp;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; interrupted)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* Probably in MIT threads */&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (retry_count++ &amp;lt; net-&amp;gt;retry_count)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; continue;&lt;br /&gt;#ifdef EXTRA_DEBUG&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fprintf(stderr, "%s: read looped with error %d, aborting thread\n",&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; my_progname,vio_errno(net-&amp;gt;vio));&lt;br /&gt;#endif /* EXTRA_DEBUG */&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/blockquote&gt;The previous block of code is also executed after the read fails for the second time. 
However, retry_count was previously incremented to 1 and net-&amp;gt;retry_count equals 1. So the continue statement is not called after the third read failure and my_real_read returns.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5735988744909726479?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5735988744909726479/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/can-protocol-be-gpl.html#comment-form' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5735988744909726479'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5735988744909726479'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/03/can-protocol-be-gpl.html' title='Can a protocol be GPL?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7907516496698018643</id><published>2010-03-14T11:16:00.000-07:00</published><updated>2010-03-28T08:11:33.193-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Thoughts on Drizzle</title><content type='html'>I wish the &lt;a href="http://www.rackspacecloud.com/blog/2010/03/13/rackspace-and-drizzle-its-time-to-rethink-everything/"&gt;case for Drizzle&lt;/a&gt; could be 
made without bashing MySQL. Sometimes it is, but too often it isn't. I guess &lt;a href="http://cloudcomputing.sys-con.com/node/1318133"&gt;this is karma&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;This isn't a rant against Drizzle. This is a rant against pulling up Drizzle by pushing down MySQL. I occasionally have negative things to say about MySQL, but I usually say them to get the problems fixed. We have lots of complaints about MySQL because we use it in production.&lt;br /&gt;&lt;br /&gt;What have I learned about &lt;a href="http://www.rackspacecloud.com/blog/2010/03/13/rackspace-and-drizzle-its-time-to-rethink-everything/"&gt;the Drizzle vision&lt;/a&gt;&lt;b&gt;?&lt;/b&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Drizzle will re-think everything&lt;/b&gt;.&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Alas, I have problems to solve today. While I am passionate about doing things correctly, I am also aware that compromises must be made to get things done. Some of those compromises turn out to be mistakes. It isn't always possible to know which compromises will turn out to be a mistake. Nor is it always possible to identify the right thing. &lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;b&gt;You hate MySQL replication? You now love Drizzle&lt;/b&gt;.&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I love what Drizzle might do for replication. I love what MySQL is doing with replication. I can't compare the two until Drizzle replication is running in production.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;b&gt;MySQL has a data type called a 3-byte integer. Think about that for a moment. On today’s server hardware that does not make a whole lot of sense&lt;/b&gt;.&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I thought about it. Does this mean that some of my tables will grow from 3G to 4G on disk? I won't be happy if that is the result.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;strong&gt; &lt;b&gt;No triggers or stored procedures&lt;/b&gt;&lt;/strong&gt;&lt;b&gt;. 
That stuff is bloat as done in MySQL, and Drizzle has other ways to deal with these needs. These capabilities can be added in later as needed such that they are done right.&lt;/b&gt;&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I need stored procedures. They are required for high-performance OLTP because they minimize transaction duration for multi-statement transactions. Alas, I have yet to use them in MySQL. &lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;b&gt;MyISAM is gone. Long live the Queen!&lt;/b&gt;&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Alas, I need MyISAM. Long-running insert, update and delete statements consume too many resources in InnoDB. Such statements are used for reporting jobs on slaves, and in that case I want to use InnoDB for production tables and MyISAM for transient tables.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt; &lt;b&gt;Ever tried to compile MySQL from source. Hah! Yeah, drizzle builds like butter&lt;/b&gt;.&lt;/li&gt;&lt;ul&gt;&lt;li&gt; I have no problems building MySQL from source. I have had more problems building Drizzle because it has a few more dependencies (Google protobufs, libdrizzle). But both are easy to build and nobody cares much in either case, with one exception: 
Does Drizzle build on Windows?&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7907516496698018643?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7907516496698018643/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/thoughts-on-drizzle.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7907516496698018643'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7907516496698018643'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/03/thoughts-on-drizzle.html' title='Thoughts on Drizzle'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1024090234306056044</id><published>2010-03-12T18:29:00.000-08:00</published><updated>2010-03-28T08:11:33.193-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Index only</title><content type='html'>A problem with SQL is SQL. It is easy to write queries that require random IO in the worst case. 
It is usually easy to find queries that do too much random IO on a NoSQL system as you must code the extra data fetches manually.&lt;br /&gt;&lt;br /&gt;Digg has begun to write about their reasons for migrating from MySQL to Cassandra. They &lt;a href="http://about.digg.com/node/564"&gt;provide an excellent summary&lt;/a&gt; and then describe a &lt;a href="http://about.digg.com/blog/looking-future-cassandra"&gt;performance problem fixed by the migration&lt;/a&gt;. I think Cassandra and a few other members of the NoSQL family are amazing technology but I don't think a migration was needed to fix this performance problem. A better index on the Diggs table would have done that. &lt;a href="http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Performance_Lie/"&gt;Others have said&lt;/a&gt; the same thing. Maybe I don't have all of the details. I can only go on what was written in the blog.&lt;br /&gt;&lt;br /&gt;You can learn more about the power of indexes &lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13284"&gt;at the MySQL conference&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The Diggs table was the source of the problem:&lt;br /&gt;&lt;blockquote&gt;CREATE TABLE Diggs (&lt;br /&gt;&amp;nbsp; id&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; itemid&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; userid&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; digdate DATETIME,&lt;br /&gt;&amp;nbsp; PRIMARY KEY (id),&lt;br /&gt;&amp;nbsp; KEY user&amp;nbsp; (userid),&lt;br /&gt;&amp;nbsp; KEY item&amp;nbsp; (itemid)&lt;br /&gt;) ENGINE=InnoDB;&lt;/blockquote&gt;It supported an important query that was too slow. A simple form of this query is:&lt;br /&gt;&lt;blockquote&gt;SELECT digdate, id&lt;br /&gt;FROM Diggs&lt;br /&gt;WHERE userid in (10, 20, 30) AND itemid = 50&lt;br /&gt;ORDER BY digdate DESC, id DESC LIMIT 4;&lt;/blockquote&gt;This query requires too much random IO because it isn't index only. 
The query can use either the index on itemid or the index on userid. In both cases it scans more entries from the secondary index than it needs to and then looks up the remaining columns in the primary index. Each lookup in the primary index can require a disk seek. On my test server the plan for this query is:&lt;br /&gt;&lt;blockquote&gt;id&amp;nbsp;&amp;nbsp;&amp;nbsp; select_type&amp;nbsp;&amp;nbsp;&amp;nbsp; table&amp;nbsp;&amp;nbsp;&amp;nbsp; type&amp;nbsp;&amp;nbsp;&amp;nbsp; possible_keys&amp;nbsp;&amp;nbsp;&amp;nbsp; key&amp;nbsp;&amp;nbsp;&amp;nbsp; key_len&amp;nbsp;&amp;nbsp;&amp;nbsp; ref&amp;nbsp;&amp;nbsp;&amp;nbsp; rows&amp;nbsp;&amp;nbsp;&amp;nbsp; Extra&lt;br /&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp; SIMPLE&amp;nbsp;&amp;nbsp;&amp;nbsp; Diggs&amp;nbsp;&amp;nbsp;&amp;nbsp; ref&amp;nbsp;&amp;nbsp;&amp;nbsp; user,item&amp;nbsp;&amp;nbsp;&amp;nbsp; item&amp;nbsp;&amp;nbsp;&amp;nbsp; 5&amp;nbsp;&amp;nbsp;&amp;nbsp; const&amp;nbsp;&amp;nbsp;&amp;nbsp; 8960&amp;nbsp;&amp;nbsp;&amp;nbsp; Using where; Using filesort&lt;/blockquote&gt;After running the query I ran &lt;b&gt;SHOW SESSION STATUS LIKE "Handler_read%"&lt;/b&gt;; the result is below. 
The query scanned 5000 entries from the secondary index and, in the worst case, would have done more than 5000 disk seeks to look up columns in the primary key index.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Variable_name&amp;nbsp;&amp;nbsp;&amp;nbsp; Value&lt;br /&gt;Handler_read_first&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_key&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;br /&gt;Handler_read_next&amp;nbsp;&amp;nbsp;&amp;nbsp; 5000&lt;br /&gt;Handler_read_prev&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_rnd&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_rnd_next&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/blockquote&gt;The query is much faster for a table with different indexes:&lt;br /&gt;&lt;blockquote&gt;CREATE TABLE DiggsFast (&lt;br /&gt;&amp;nbsp; id&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; itemid&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; userid&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; digdate DATETIME,&lt;br /&gt;&amp;nbsp; PRIMARY KEY (itemid,userid,digdate,id),&lt;br /&gt;&amp;nbsp; UNIQUE KEY (id)&lt;br /&gt;) ENGINE=InnoDB;&lt;/blockquote&gt;&lt;br /&gt;The query has a better plan:&lt;br /&gt;&lt;blockquote&gt;id&amp;nbsp;&amp;nbsp;&amp;nbsp; select_type&amp;nbsp;&amp;nbsp;&amp;nbsp; table&amp;nbsp;&amp;nbsp;&amp;nbsp; type&amp;nbsp;&amp;nbsp;&amp;nbsp; possible_keys&amp;nbsp;&amp;nbsp;&amp;nbsp; key&amp;nbsp;&amp;nbsp;&amp;nbsp; key_len&amp;nbsp;&amp;nbsp;&amp;nbsp; ref&amp;nbsp;&amp;nbsp;&amp;nbsp; rows&amp;nbsp;&amp;nbsp;&amp;nbsp; Extra&lt;br /&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp; SIMPLE&amp;nbsp;&amp;nbsp;&amp;nbsp; DiggsFast&amp;nbsp;&amp;nbsp;&amp;nbsp; range&amp;nbsp;&amp;nbsp;&amp;nbsp; PRIMARY&amp;nbsp;&amp;nbsp;&amp;nbsp; PRIMARY&amp;nbsp;&amp;nbsp;&amp;nbsp; 8&amp;nbsp;&amp;nbsp;&amp;nbsp; NULL&amp;nbsp;&amp;nbsp;&amp;nbsp; 149&amp;nbsp;&amp;nbsp;&amp;nbsp; Using where; Using index; Using filesort&lt;/blockquote&gt;It is also much better in practice. 
The output from &lt;b&gt;SHOW SESSION STATUS LIKE "Handler_read%"&lt;/b&gt; is listed below. With a better index, the query scans 150 entries from 5 range scans of the index, so it should do about 5 disk seeks in the worst case. It is also index only, so it doesn't have to look up other columns after the index scan. That doesn't matter much in this case, though, because the query uses the primary key index, which holds all columns of an InnoDB table. This query will be much faster than the previous one (5 disk seeks versus 5000).&lt;br /&gt;&lt;br /&gt;Note that this query uses the first two columns in the primary index for the predicates on itemid and userid. InnoDB stores all columns in the primary key index entries, so any query that uses the PK index is index only.&lt;br /&gt;&lt;blockquote&gt;Variable_name&amp;nbsp;&amp;nbsp;&amp;nbsp; Value&lt;br /&gt;Handler_read_first&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_key&amp;nbsp;&amp;nbsp;&amp;nbsp; 5&lt;br /&gt;Handler_read_next&amp;nbsp;&amp;nbsp;&amp;nbsp; 150&lt;br /&gt;Handler_read_prev&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_rnd&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_rnd_next&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/blockquote&gt;&lt;b&gt;UPDATE&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I created another variant of the Diggs table that uses a secondary index for the query. InnoDB includes all PK columns in each secondary index entry to serve as the pointer to the row. 
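&lt;br /&gt;&lt;br /&gt;A rough way to reason about index-only plans is to ask which columns each index entry holds. The Python sketch below is a hypothetical simplification, not how InnoDB or the optimizer actually decide anything. It only models two facts: the primary key index stores every column, and a secondary index entry stores its declared columns plus the PK columns. The names stored_columns and is_index_only are made up for illustration.
&lt;pre&gt;
```python
# Hypothetical, simplified model of which columns an InnoDB index entry
# holds. It ignores prefix indexes, hidden columns and the optimizer's
# cost-based choice of index.

DIGGS_COLUMNS = {"id", "itemid", "userid", "digdate"}

# Columns the example query reads: SELECT list, WHERE clause, ORDER BY.
QUERY_COLUMNS = {"digdate", "id", "userid", "itemid"}


def stored_columns(table_columns, index_columns, pk_columns, is_primary):
    """Return the set of columns present in one entry of the index."""
    if is_primary:
        # The clustered (primary key) index stores every column of the row.
        return set(table_columns)
    # A secondary index entry also carries the PK columns as the row pointer.
    return set(index_columns) | set(pk_columns)


def is_index_only(query_columns, table_columns, index_columns, pk_columns,
                  is_primary=False):
    """True if the query never needs a lookup into the primary key index."""
    stored = stored_columns(table_columns, index_columns, pk_columns,
                            is_primary)
    return set(query_columns).issubset(stored)


# Diggs: KEY item (itemid) with PRIMARY KEY (id).
print(is_index_only(QUERY_COLUMNS, DIGGS_COLUMNS, ["itemid"], ["id"]))  # False

# DiggsFast: PRIMARY KEY (itemid, userid, digdate, id) is used directly.
print(is_index_only(QUERY_COLUMNS, DIGGS_COLUMNS,
                    ["itemid", "userid", "digdate", "id"], [],
                    is_primary=True))  # True

# A secondary index on (itemid, userid, digdate) with PRIMARY KEY (id).
print(is_index_only(QUERY_COLUMNS, DIGGS_COLUMNS,
                    ["itemid", "userid", "digdate"], ["id"]))  # True
```
&lt;/pre&gt;
The first check fails because userid and digdate are not stored in the item index, so each matching entry needs a lookup in the primary key index.&lt;br /&gt;&lt;br /&gt;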
Note that there is a difference between being 'in the index' and being indexed: id is in every entry of the itemuserdig index below, but it is not indexed.&lt;br /&gt;&lt;blockquote&gt;CREATE TABLE DiggsFast2 (&lt;br /&gt;&amp;nbsp; id&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; itemid&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; userid&amp;nbsp; INT(11),&lt;br /&gt;&amp;nbsp; digdate DATETIME,&lt;br /&gt;&amp;nbsp; KEY itemuserdig (itemid,userid,digdate),&lt;br /&gt;&amp;nbsp; PRIMARY KEY (id)&lt;br /&gt;) ENGINE=InnoDB;&lt;/blockquote&gt;The query plan:&lt;br /&gt;&lt;blockquote&gt;id&amp;nbsp;&amp;nbsp;&amp;nbsp; select_type&amp;nbsp;&amp;nbsp;&amp;nbsp; table&amp;nbsp;&amp;nbsp;&amp;nbsp; type&amp;nbsp;&amp;nbsp;&amp;nbsp; possible_keys&amp;nbsp;&amp;nbsp;&amp;nbsp; key&amp;nbsp;&amp;nbsp;&amp;nbsp; key_len&amp;nbsp;&amp;nbsp;&amp;nbsp; ref&amp;nbsp;&amp;nbsp;&amp;nbsp; rows&amp;nbsp;&amp;nbsp;&amp;nbsp; Extra&lt;br /&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp; SIMPLE&amp;nbsp;&amp;nbsp;&amp;nbsp; DiggsFast2&amp;nbsp;&amp;nbsp;&amp;nbsp; range&amp;nbsp;&amp;nbsp;&amp;nbsp; itemuserdig&amp;nbsp;&amp;nbsp;&amp;nbsp; itemuserdig&amp;nbsp;&amp;nbsp;&amp;nbsp; 10&amp;nbsp;&amp;nbsp;&amp;nbsp; NULL&amp;nbsp;&amp;nbsp;&amp;nbsp; 150&amp;nbsp;&amp;nbsp;&amp;nbsp; Using where; Using index; Using filesort&lt;/blockquote&gt;And the output from &lt;b&gt;SHOW SESSION STATUS LIKE "Handler_read%"&lt;/b&gt;:&lt;br /&gt;&lt;blockquote&gt;Variable_name&amp;nbsp;&amp;nbsp;&amp;nbsp; Value&lt;br /&gt;Handler_read_first&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_key&amp;nbsp;&amp;nbsp;&amp;nbsp; 5&lt;br /&gt;Handler_read_next&amp;nbsp;&amp;nbsp;&amp;nbsp; 150&lt;br /&gt;Handler_read_prev&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_rnd&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;Handler_read_rnd_next&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1024090234306056044?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1024090234306056044/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/index-only.html#comment-form' title='24 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1024090234306056044'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1024090234306056044'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/03/index-only.html' title='Index only'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>24</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8145685778199426897</id><published>2010-03-01T08:40:00.000-08:00</published><updated>2010-03-28T08:11:33.194-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Plays well with others</title><content type='html'>A few years ago MySQL+memcached and PostgreSQL+memcached were the only choices for high-scale applications. That has changed with the arrival of NoSQL. Change is good. Open-source monopolies are not much better than closed-source ones from the perspective of an end user. I expect MySQL to focus much more on the needs of high-scale applications to remain relevant. 
I also expect it to play better with others as it is no longer the only persistent data store for high-scale applications.&lt;br /&gt;&lt;br /&gt;I think that MySQL+memcached is still the default choice and I don't think it is going away in the high-scale market. But some high-scale applications either don't need all of the features of a SQL RDBMS or are willing to go without those features to scale. This isn't a blanket endorsement of NoSQL, as the definition of NoSQL is weak. I am referring to the NoSQL systems that support high-scale applications.&lt;br /&gt;&lt;br /&gt;I don't believe all of the bad press that MySQL receives from high-scale applications. I know that some problems with MySQL are self-inflicted (seriously, I know this). It is hard to diagnose many problems whose primary symptom is a slow MySQL server, so it is also hard to identify the self-inflicted ones. I also don't think that some NoSQL systems will provide a different scale-out experience than MySQL, given that some of them scale out by sharding (just like MySQL) and that I can deploy MySQL like NoSQL (disallow joins and secondary indexes, use HANDLER statements).&lt;br /&gt;&lt;br /&gt;I also wonder whether affordable SSD/Flash reduces the need to migrate from MySQL to NoSQL. Many MySQL deployments that were IO-bound when it was difficult to get more than a few thousand IOPS on a commodity server can now get 10,000 to 100,000 IOPS in that server at commodity prices.&lt;br /&gt;&lt;br /&gt;MySQL and NoSQL are also at significantly different stages. MySQL is mature, and maturity has its benefits. MySQL has amazing support and documentation.&amp;nbsp; There are client libraries for almost every language that you should use. There are even bindings for languages you shouldn't use. The MySQL C API is easy to use. The JDBC driver is awesome, even if support for JDBC makes it much more complex than needed. 
There is a lot of MySQL expertise that can be hired or rented (MySQL, Monty Program, Percona, Open Query, Pythian, FromDual) and there is some innovation (not enough companies, but they are doing amazing things) from third-parties such as InfiniDB, InfoBright and TokuDB. &lt;br /&gt;&lt;b&gt;&lt;br /&gt;What happened?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;NoSQL systems are improving faster than MySQL, MySQL has focused on features for the enterprise RDBMS market in the past two releases and the changes we need from MySQL are hard to implement.&amp;nbsp; Change is hard because MySQL is a complex server that supports many features. Change is also much harder than it should be because of the MySQL coding style. Parts of it are not modular and features are entangled. Some of the difficulty could be overcome were there interest from external contributors. There are external contributors willing and able to improve server code but they are working on other projects like NoSQL. The MySQL effort is also split (or diluted) between official MySQL, Drizzle and MariaDB.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What really happened?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I don't know. It may have been better for the business of MySQL to focus on the enterprise market. I can describe some of the problems that need to be fixed in MySQL to make things easier for me. I think other high-scale applications share these problems:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Multi-master - high-scale applications have users around the world. Latency is reduced by distributing databases and application servers around the world. Databases are rarely sharded by location so the data store must support multi-master deployments with conflict resolution and eventual consistency for some database tables. There is no support for conflict resolution in MySQL. It might be possible to do something with the output of row-based replication.&lt;/li&gt;&lt;li&gt;SQL - this is a problem that MySQL cannot fix. 
SQL makes it easy to make mistakes. Mistakes include insert, update and delete statements that lock all rows in a table. Alas, the EXPLAIN statement in MySQL &lt;a href="http://bugs.mysql.com/bug.php?id=14745"&gt;does not support&lt;/a&gt; insert, update and delete statements. Another serious mistake is a query that has a lousy response time when the database buffer cache is cold because it does many random disk reads. The EXPLAIN statement in MySQL does not estimate the worst-case number of random disk IOs, and many people who write SQL don't know how to interpret its output to get such an estimate. Worst-case performance is critical for queries run during web requests.&lt;/li&gt;&lt;li&gt;Write-optimization - several NoSQL systems are write-optimized, including Cassandra, HBase and Bigtable. A write-optimized system makes it possible to use more indexes than an update-in-place data store. With more indexes it is more likely that every popular query has an index that reduces the number of disk reads needed to evaluate it.&amp;nbsp; This improves worst-case query response time and reduces the need to use memcached or a huge database buffer cache. Write-optimization has finally arrived for MySQL with the availability of &lt;a href="http://www.tokutek.com/"&gt;TokuDB&lt;/a&gt;. I hope that &lt;a href="http://www.rethinkdb.com/"&gt;RethinkDB&lt;/a&gt; provides a GA version in the future.&lt;/li&gt;&lt;li&gt;Monitoring - without good monitoring you will either spend too much time fixing performance problems or never find them and buy too much hardware. I suspect that monitoring in MySQL is much better than anything in a NoSQL system, but MySQL is missing features that make it easy to understand current and new sources of workload. I need to aggregate the overhead (CPU time, disk operations, rows read, ...) by database user, table and statement. 
It is extremely hard, if not impossible, to do this by database user and table. Starting with MySQL 5.1 it is possible to do this by statement for short periods of time by using the slow query log. Prior to MySQL 5.1, the slow query log was limited to queries that ran for at least two seconds. The alternative is to use tcpdump with &lt;a href="http://www.maatkit.org/doc/mk-query-digest.html"&gt;mk-query-digest&lt;/a&gt;. Despite all of the work that has gone into the &lt;a href="http://dev.mysql.com/doc/performance-schema/en/index.html"&gt;performance schema&lt;/a&gt;, MySQL has yet to support anything like my favorite feature: &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/UserTableMonitoring"&gt;user and table monitoring&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Crash-proof slaves - replication slaves are not crash-proof. The slave commits transactions to a storage engine and then updates a state file to maintain the replication offset. There is nothing to keep the state file and storage engine in sync. Until recently there wasn't even an option to force the state file to disk after it was updated. Unplanned hardware reboots are frequent when there are hundreds or thousands of slaves. If you are clever and use the right version of InnoDB, it is possible in some cases to figure out the correct offset for the slave after a crash and repair it manually. This isn't a good use of DBA time. Otherwise DBAs must spend their time and network bandwidth restoring the slaves. The Google patch published two different fixes for this: &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/TransactionalReplication"&gt;rpl_transaction_enabled&lt;/a&gt; and &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds"&gt;global transaction IDs&lt;/a&gt;. 
MySQL is working on a fix.&lt;/li&gt;&lt;li&gt;Automated failover - for MySQL deployments that have many slaves connected to one master, it isn't possible to automate failover when a master crashes. Tungsten and DRBD might make this better.&amp;nbsp;&lt;/li&gt;&lt;li&gt;Resharding - sharding is an excellent way to scale MySQL. Sharding usually requires resharding. Resharding is hard and must be done with minimal downtime. It might be possible to build a tool that uses row-based replication output to reshard a database in the background with little downtime. No such tool exists today.&lt;/li&gt;&lt;li&gt;Replication lag - a slave with replication lag is useless for OLTP scale-out. The slave applies replication events with a single thread. MySQL is working on support for parallel execution on a slave. Until then we need to improve &lt;a href="http://www.maatkit.org/doc/mk-slave-prefetch.html"&gt;mk-slave-prefetch&lt;/a&gt; (Domas, can you hear me?).&lt;/li&gt;&lt;li&gt;Schema changes - these are frequently needed for growing high-scale applications. Long-running schema changes in MySQL require downtime unless first done on a slave (assuming you have a spare slave and that slave can become the master after the change). Users don't like downtime. I think it is possible to do many of these on a master with minimal downtime using the output from row-based replication. Alas, there is no tool for that today.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;What is NoSQL?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Do your homework when evaluating NoSQL systems, as they differ greatly from each other:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Crash safety - most NoSQL systems are crash safe but a few are not. I would limit the use of systems that are not crash safe to batch workloads. Unplanned server reboots are frequent for high-scale applications that use a large number of servers. At least two prominent members of the NoSQL family are not crash safe. That should be documented in bold text on their project pages. 
It is not.&lt;/li&gt;&lt;li&gt;Sharding - some NoSQL systems do sharding. Bigtable and others do not. With sharding it is possible to support transactions and multiple indexes on a table within the scope of a shard. That in turn requires support for resharding. It also requires that queries on secondary indexes be run on all shards, while queries on the primary index can be limited to one shard, which may limit the ability to use secondary indexes.&lt;/li&gt;&lt;li&gt;Index types - many NoSQL systems are limited to hash indexes. You can't do range scans on hash indexes. I wonder whether this leads to data redundancy when every query must be resolved by one index lookup.&lt;/li&gt;&lt;li&gt;Secondary indexes - NoSQL systems like Bigtable do not support transactions, and they also do not support secondary indexes. You can explicitly maintain a secondary index, but there is no support to make multiple changes atomic, and a failure between the primary and secondary index updates results in data drift. It is also difficult to do consistent reads between the two.&lt;/li&gt;&lt;li&gt;Consistent reads - consistency is usually the responsibility of the client and is done by using per-row timestamps.&lt;/li&gt;&lt;li&gt;Single-node performance - I know that &lt;b&gt;performance != scale-out&lt;/b&gt;, but scale-out is not a substitute for lousy single-node performance in the high-scale application market. Using 5X as many nodes because your data store is slow might be acceptable when you end up with 40 nodes. It becomes a show-stopper when you end up with thousands of nodes. One NoSQL system has accepted this compromise. While I know it has many other use cases, I think that will limit its use for high-scale applications.&lt;/li&gt;&lt;li&gt;Network efficiency - MySQL reduces use of the network because all query evaluation is done at the server. All NoSQL systems evaluate predicates implied by the index access. 
Only some NoSQL systems evaluate non-indexed predicates. This can result in more data returned to the client.&lt;/li&gt;&lt;li&gt;Technology or solution - MySQL is more mature than the NoSQL systems. A lot of work remains to grow NoSQL from technology into solution with support for audit, backup, monitoring and all of the other things required to scale in a large company. &lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8145685778199426897?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8145685778199426897/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html#comment-form' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8145685778199426897'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8145685778199426897'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html' title='Plays well with others'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1130591546548783283</id><published>2010-02-20T14:12:00.000-08:00</published><updated>2010-03-28T08:11:33.194-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title 
type='text'>Time to use -Werror</title><content type='html'>&lt;a href="http://bugs.mysql.com/bug.php?id=51289"&gt;This&lt;/a&gt; is in 5.1.44. It is easy to make mistakes like this in a large and rapidly changing code base. Why not compile with &lt;a href="http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html"&gt;-Werror&lt;/a&gt; to catch the problem?&lt;br /&gt;&lt;blockquote&gt;double Item_cache_decimal::val_real()&lt;br /&gt;{&lt;br /&gt;DBUG_ASSERT(fixed);&lt;br /&gt;double res;&lt;br /&gt;if (!value_cached &amp;amp;&amp;amp; !cache_value())&lt;br /&gt;return NULL;&lt;br /&gt;my_decimal2double(E_DEC_FATAL_ERROR, &amp;amp;decimal_value, &amp;amp;res);&lt;br /&gt;return res;&lt;br /&gt;}&lt;/blockquote&gt;&lt;pre class="note"&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1130591546548783283?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1130591546548783283/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/02/time-to-use-werror.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1130591546548783283'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1130591546548783283'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/02/time-to-use-werror.html' title='Time to use -Werror'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2105708834626415412</id><published>2010-02-15T09:02:00.000-08:00</published><updated>2010-03-28T08:11:33.195-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Save MyISAM</title><content type='html'>What is the future for MyISAM? MySQL has invested a lot in storage engines over the past few years (Falcon, Maria) and it isn't clear that anything will come from those efforts. A lot of effort has been put into InnoDB and much will come from that. There has not been a significant effort to improve MyISAM (other than hot backup). What could be done with it?&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Support undo. The manual &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/ansi-diff-transactions.html"&gt;claims that MyISAM&lt;/a&gt; supports atomic operations. They must use a different meaning for &lt;b&gt;atomic&lt;/b&gt;. When a long-running insert, update, delete or replace statement is killed it remains half-done for MyISAM. This could be fixed by supporting undo for MyISAM. I will guess that MySQL can reuse some of the code they added for hot backup to implement undo.&lt;/li&gt;&lt;li&gt;Reduce mutex contention. MyISAM could use multiple key caches per table as &lt;a href="http://askmonty.org/worklog/Server-Sprint/?tid=85"&gt;MPAB has proposed&lt;/a&gt;.&lt;/li&gt;&lt;/ol&gt;MyISAM performance matters even when you don't explicitly create MyISAM tables. MyISAM is used for some implicit temp tables created to evaluate ORDER BY and GROUP BY clauses. 
&lt;br /&gt;&lt;ol&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2105708834626415412?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2105708834626415412/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/02/save-myisam.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2105708834626415412'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2105708834626415412'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/02/save-myisam.html' title='Save MyISAM'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-3309689673108036564</id><published>2010-02-03T16:03:00.000-08:00</published><updated>2010-03-28T08:11:33.196-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Dude, where is my link?</title><content type='html'>What happened to the obvious link from &lt;a href="http://mysql.com/"&gt;mysql.com&lt;/a&gt; to &lt;a href="http://dev.mysql.com/"&gt;dev.mysql.com&lt;/a&gt;? 
Did it move &lt;a href="http://www.postgresql.org/"&gt;here&lt;/a&gt; or &lt;a href="http://www.askmonty.org/"&gt;here&lt;/a&gt;?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-3309689673108036564?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/3309689673108036564/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2010/02/dude-where-is-my-link.html#comment-form' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3309689673108036564'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3309689673108036564'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2010/02/dude-where-is-my-link.html' title='Dude, where is my link?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2468893924987291648</id><published>2009-12-29T16:08:00.000-08:00</published><updated>2010-03-28T08:11:33.196-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Save MySQL, save the world</title><content type='html'>Things are &lt;a href="http://monty-says.blogspot.com/2009/12/help-keep-internet-free.html"&gt;getting interesting&lt;/a&gt;. 
MPAB continues to drive away potential supporters with the tone of their messages, the inclusion of pointless assertions, and the complete lack of references.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;For example, Oracle could buy some companies developing PostgreSQL and target the core developers. Without the core developers working actively on PostgreSQL, the PostgreSQL project will be weakened tremendously and it could even die as ar result.&lt;/i&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Or another company could hire all of the core developers from one area of MySQL. I am glad that people have the opportunity to work elsewhere.&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;MySQL is the database with the highest number of installed units in all markets (except in the high enterprise market where it has only a medium size unit share).&lt;/i&gt;&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;All markets or non-embedded markets? SQLite claims &lt;a href="http://www.sqlite.org/mostdeployed.html"&gt;at least 500M&lt;/a&gt; deployments. 
Oracle claims &lt;a href="http://www.oracle.com/database/berkeley-db/index.html"&gt;there are 200M&lt;/a&gt; deployments of Berkeley DB.&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;MySQL is causing Oracle sales losses around 1 billion usd/year (in lost sales to MySQL and because of having to do heavy discounting when competing with MySQL).&lt;/i&gt;&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Where does this number come from?&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;Oracle did not provide any remedies to the EC and the &lt;a href="http://www.marketwire.com/press-release/Oracle-Corporation-NASDAQ-ORCL-1090000.html"&gt;public promises&lt;/a&gt; they have published are just &lt;a href="http://monty-says.blogspot.com/2009/12/oracle-gives-only-empty-promises-for.html"&gt;empty promises&lt;/a&gt;.&lt;/i&gt;&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;How do we know what Oracle has or has not promised? I am sure that Oracle can contact the EC without involving MPAB. Besides, I thought the hearings at the EC were private. I don't trust summaries of the hearings from anyone on either side of the issue.&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;The open source software it has acquired, like InnoDB, has after being acquired, been developed secretly and slowly which is against how things are done in the open source environment.&lt;/i&gt;&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Compared to what? From my perspective neither Oracle/InnoDB nor Sun/MySQL have been great in this area. But so what? Both continue to improve their software and most people don't care whether or not the development process is open.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;MariaDB is an enhanced (faster, more features and less bugs) drop-in replacement of MySQL that is only available under GPL.&lt;/i&gt;&amp;nbsp;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;The GA release of MariaDB has no bugs because there is no GA release. 
&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;The fork can't be used with other products that are using MySQL as a building block for their closed source applications.&lt;/i&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Yes, it can (thanks, Sheeri).&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;The fork has to work in an environment where no one has to pay for it.&lt;/i&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I will speculate that most of the money earned by MySQL is from customers who don't have to pay. People buy support contracts because the support product is excellent, not because they must.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;As long as the products are recognized to be competing, any solution that the EC would accept has to ensure that there is as much competition in the database field before the merger as after the merger.&lt;/i&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;Is it that simple? Is competition a binary decision?&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;If MySQL were licensed under a permissive license, like BSD, then the users would benefit as they now can securely continue to use MySQL in all context. Monty Program Ab would also switch to only produce code under BSD for the MariaDB server, to ensure that also MariaDB can be used in all context. Monty Program Ab would benefit very little from of this; We cannot take money from selling BSD; We can only hope that there is a market demand for our skilled engineers.&lt;/i&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;I would love for it to be BSD. Then I could form a company to build a custom version of it and make people pay for that version. Your blog post cites EnterpriseDB for doing this. 
Why can't MPAB do the same?&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;i&gt;The companies that would benefit the most from BSD are the companies that enhance MySQL (storage engine vendors and companies providing extensions to MySQL) and companies that embed MySQL in their products, like Adobe or Cisco.&lt;/i&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;How does saving storage engine vendors, Adobe and Cisco save the internet? This has nothing to do with the MySQL users you claim have something at risk. Storage engine vendors don't have thousands of customers. I assume app vendors who embed MySQL have more customers, but even in that case I fail to see how the internet is at risk.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2468893924987291648?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2468893924987291648/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/12/save-mysql-save-world.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2468893924987291648'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2468893924987291648'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/12/save-mysql-save-world.html' title='Save MySQL, save the world'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-3144811453185678435</id><published>2009-12-15T07:34:00.000-08:00</published><updated>2010-03-28T08:11:33.197-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='shill'/><title type='text'>MySQL 5.5 and semi-sync replication are here</title><content type='html'>&lt;a href="http://blogs.mysql.com/kaj/2009/12/15/mysql-550-m2-a-milestone-ready-to-download"&gt;MySQL 5.5&lt;/a&gt; is here with &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/mysql-nutshell.html"&gt;several new features&lt;/a&gt; including &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html"&gt;semi-sync replication&lt;/a&gt;. The MySQL team has been getting a lot done lately. I know because I get many bug and feature request status updates. They did a lot of work on semi-sync because their implementation was a rewrite rather than a port of the Google patch. It had to be done that way to maintain code quality. Those of us who maintain large patches against MySQL frequently do things the convenient way rather than the right way to simplify patch maintenance.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html"&gt;Read the MySQL manual&lt;/a&gt; to understand what semi-sync does. It is not synchronous replication. With semi-sync each connection to a master can lose at most one transaction when the master crashes. This reduces but does not eliminate the problem of transaction loss on a master crash. 
Semi-sync also limits a busy connection so that it cannot commit faster than a slave can receive its transactions.&lt;br /&gt;&lt;br /&gt;Additional references for this include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication"&gt;notes&lt;/a&gt; from the MySQL forge page&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://code.google.com/p/google-mysql-tools/wiki/SemiSyncReplicationDesign"&gt;design docs&lt;/a&gt; from the Google patch&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-3144811453185678435?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/3144811453185678435/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/12/mysql-55-and-semi-sync-replication-are.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3144811453185678435'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3144811453185678435'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/12/mysql-55-and-semi-sync-replication-are.html' title='MySQL 5.5 and semi-sync replication are here'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2241581870985370281</id><published>2009-12-02T22:22:00.000-08:00</published><updated>2010-03-28T08:11:33.197-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Oracle RDBMS != MySQL RDBMS</title><content type='html'>The Oracle and MySQL RDBMS are very different products. This makes me happy. I used to work on the Oracle RDBMS. It has a lot of features that do amazing things. Unfortunately, this also makes it extremely hard to modify. MySQL doesn't have as many features. This makes it easier to modify. This also means there are a lot of things to fix in it when you care about high-performance and high-availability OLTP workloads.&lt;br /&gt;&lt;br /&gt;But now we have a new story emerging from an independent source of news on the Oracle-Sun merger.&lt;br /&gt;&lt;blockquote&gt;&lt;a href="http://blogs.zdnet.com/open-source/?p=5323"&gt;One more week won’t change the fact that MySQL competes fiercely with Oracle’s database products including its flagship ‘11g’ across all major market segments.&lt;/a&gt;&lt;br /&gt;&lt;/blockquote&gt;What does this mean besides a few more months of uncertainty for people at Sun/MySQL? Do they compete for customers? Or do they compete based on technology? We can only guess as the report is not public. I am sure it is a great document, at least &lt;a href="http://blogs.the451group.com/opensource/2009/11/30/oracle-sun-statements-and-observations/"&gt;that is what I have been told&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Can we get this done and return our focus to the roadmap for 5.4, 6.0 and the MySQL User Conference? 
I would much rather bicker about who doesn't get to present at the conference, the rate at which community patches are accepted and my inability to republish an edited version of MySQL docs. MySQL would otherwise be on a roll right now with the progress they have made on 5.1 (it is a great release) and with work in progress for future releases.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Wow, maybe the &lt;a href="http://www.reuters.com/article/mergersNews/idUSN0310376420091203"&gt;GPL means something&lt;/a&gt;. Eben Moglen finds factual errors in the still-secret statement of objections.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2241581870985370281?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2241581870985370281/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/12/oracle-rdbms-mysql-rdbms.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2241581870985370281'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2241581870985370281'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/12/oracle-rdbms-mysql-rdbms.html' title='Oracle RDBMS != MySQL RDBMS'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-4097833373580472137</id><published>2009-11-10T09:51:00.000-08:00</published><updated>2010-03-28T08:11:33.198-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Time to get it done</title><content type='html'>I am neither for nor against the Oracle-Sun merger. I am against the damage done by extending the uncertainty about the outcome of the deal. MySQL as an organization is in great shape. The 5.1 release turned out better than some expected. The InnoDB plugin is excellent. What is the roadmap? MySQL is limited in what they can say about their future. That hurts all users and customers.&lt;br /&gt;&lt;br /&gt;A lot of nonsense has been written about this. As MySQL employees cannot write about it, the discussion has been one-sided, full of speculation and justified by quotes from random people. I almost provided a few quotes myself when contacted by someone I thought was a potential MySQL customer.&lt;br /&gt;&lt;br /&gt;I am neither a lawyer nor an economist, so I don't understand their notion of competition as applied to this issue. I wish that were clear. I don't think that the 8-year-old E-Week benchmark implies anything about whether MySQL and Oracle compete. Nor do I think that a few slides from a project at Sun that failed to migrate Oracle customers to MySQL are evidence of that. &lt;a href="http://news.cnet.com/8301-13505_3-10370162-16.html"&gt;Marten's letter&lt;/a&gt; set a high standard for the discussion. I hope others follow it.&lt;br /&gt;&lt;br /&gt;Clearly, competition isn't defined by revenue, as MySQL's revenue is somewhere between $100M and $300M per year. The database market is much larger than that. 
For better or worse, MySQL has not done a good job of monetizing their users. Maybe they have not tried to do that as their value seems to be independent of their revenue. But someone has to fund the development of MySQL.&lt;br /&gt;&lt;br /&gt;I have worked on source code for the Oracle and MySQL database servers. I have used MySQL in production. Some parts of MySQL are amazing (InnoDB, JDBC, support, docs, bug database, server uptime, ease of use, NDB) but in no way do they compete on a feature basis. I hope that never changes as MySQL would be ruined were it to become as complex as Oracle. I am sure they compete for some customers who don't need all of the features provided by Oracle. But that competition includes Sybase, Microsoft and IBM.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-4097833373580472137?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/4097833373580472137/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/11/time-to-get-it-done.html#comment-form' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4097833373580472137'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4097833373580472137'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/11/time-to-get-it-done.html' title='Time to get it done'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6617923741002050480</id><published>2009-11-05T09:25:00.000-08:00</published><updated>2010-03-28T08:11:33.198-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='shill'/><title type='text'>InnoDB plugin gets better again</title><content type='html'>Forgive me for being a shill, but InnoDB &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-buffer-pool.html"&gt;appears to have added a feature&lt;/a&gt; for the next release of the InnoDB plugin that prevents the buffer pool from getting wiped out by a full table scan. Many people &lt;a href="http://www.mysqlperformanceblog.com/2007/10/26/heikki-tuuri-innodb-answers-part-i/"&gt;have requested&lt;/a&gt; this. &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-buffer-pool.html"&gt;The documentation&lt;/a&gt; is excellent. I have tested it and not only did it work as advertised, but it didn't degrade performance on OLTP workloads. This fixes &lt;a href="http://bugs.mysql.com/bug.php?id=45015"&gt;bug 45015&lt;/a&gt; and is a nice feature to have when you occasionally use mysqldump to copy a table from a busy OLTP server. 
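&lt;br /&gt;&lt;br /&gt;Below is a minimal sketch of how this might be used around a mysqldump run. It assumes the InnoDB plugin 1.0.5 or later, where the feature is controlled by the dynamic global variables innodb_old_blocks_time and innodb_old_blocks_pct. The 1000 ms value is only an illustration, not a tuned recommendation, so check the documentation linked above for your release. The user variable @old_time is a hypothetical name used to restore the previous setting:&lt;pre&gt;-- Remember the current setting so it can be restored later&lt;br /&gt;SET @old_time = @@GLOBAL.innodb_old_blocks_time;&lt;br /&gt;-- Pages read by a scan stay in the "old" sublist for at least 1000 ms,&lt;br /&gt;-- so one full table scan should not push hot pages out of the buffer pool&lt;br /&gt;SET GLOBAL innodb_old_blocks_time = 1000;&lt;br /&gt;-- ... run mysqldump (from another shell) or another full table scan here ...&lt;br /&gt;-- Restore the previous setting when the scan is done&lt;br /&gt;SET GLOBAL innodb_old_blocks_time = @old_time;&lt;br /&gt;SHOW GLOBAL VARIABLES LIKE 'innodb_old_blocks%';&lt;/pre&gt;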
Now is a good time to evaluate MySQL 5.1 with the InnoDB plugin.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6617923741002050480?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6617923741002050480/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/11/innodb-plugin-gets-better-again.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6617923741002050480'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6617923741002050480'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/11/innodb-plugin-gets-better-again.html' title='InnoDB plugin gets better again'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8811814766096622213</id><published>2009-10-27T10:19:00.000-07:00</published><updated>2010-03-28T08:11:33.199-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Managed MySQL -- Amazon RDS</title><content type='html'>Managed MySQL is here. &lt;a href="http://aws.amazon.com/rds"&gt;Amazon RDS&lt;/a&gt; allows you to run MySQL on their hardware. It isn't perfect, but I think this is a great first release. 
I expect this will support PostgreSQL soon, given that the command-line tools are not MySQL-specific.&lt;br /&gt;&lt;br /&gt;Note:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;This uses MySQL 5.1.38&lt;/li&gt;&lt;li&gt;I did not see an option to enable SSH connections to MySQL. I think that is required for this to be a great way to run MySQL. &lt;br /&gt;&lt;/li&gt;&lt;li&gt;This supports MyISAM and InnoDB. They don't give you command-line access to the machines, so you cannot run myisamchk to recover corrupt MyISAM tables, nor can you run myisampack to compress them. I think it is a good idea to stick with InnoDB and then ask Amazon to upgrade to the InnoDB 1.0.4+ plugin.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;This appears to use network-attached storage for most data. For example, innodb_data_home_dir=/rdsdbdata/db/innodb. I am not sure whether data is buffered in the OS buffer cache. If it is not, MyISAM performance will suffer, because MyISAM does not cache table data itself (its key cache holds only index blocks).&lt;/li&gt;&lt;li&gt;Replication is disabled. That makes it much easier to run many instances of MySQL in the environment. Replication state is not crash-proof and Amazon probably does not want to spend their days recovering/replacing/rebuilding slaves. But that also limits the use of this for read scale-out. Maybe Amazon and RightScale have something in progress to change that without introducing manageability overhead.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The master user does not have SHUTDOWN, SUPER or replication privileges. &lt;br /&gt;&lt;/li&gt;&lt;li&gt;Binlogs are enabled, but the master user does not have privileges to run SHOW MASTER STATUS. The documents state that databases can be recovered up to the last 5 minutes. I assume this means that any write is guaranteed to be archived somewhere within 5 minutes. 
If there were an option to archive the binlogs, then that would provide an extra degree of safety.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8811814766096622213?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8811814766096622213/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/10/managed-mysql-amazon-rds.html#comment-form' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8811814766096622213'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8811814766096622213'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/10/managed-mysql-amazon-rds.html' title='Managed MySQL -- Amazon RDS'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6311365403796360396</id><published>2009-10-16T11:57:00.000-07:00</published><updated>2010-03-28T08:11:33.199-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Be careful with FLUSH TABLES WITH READ LOCK</title><content type='html'>Be careful when using &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/flush.html"&gt;FLUSH TABLES WITH READ LOCK&lt;/a&gt; (aka FTWRL). 
I have &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/flush.html"&gt;written about potential problems&lt;/a&gt; that may occur when using FTWRL. Anyone who runs ibbackup or xtrabackup on a server that writes a binlog needs FTWRL to run as fast as possible with as few problems as possible, but that is not always the case. In its current form, you must monitor FTWRL and either kill it or long-running queries when FTWRL takes too long.&lt;br /&gt;&lt;br /&gt;MySQL does three things when processing FTWRL. First it sets the global read lock. Then it closes open tables. Finally it sets a flag to block commits. You will have problems in production when FTWRL doesn't return quickly. It doesn't return quickly when there are long running queries as it waits for the current queries to finish. The problem is that insert, update, delete and replace statements are blocked after the first step. When FTWRL lingers in the second step (close open tables), then your server will stop accepting writes. An additional problem is that for deployments with many open tables, it is a lot of work to close and then re-open them. I need to confirm whether re-open is done serially because a mutex is held and whether InnoDB re-samples all indexes on all reopened tables to get optimizer statistics.&lt;br /&gt;&lt;br /&gt;I blame MyISAM for the current problems. As I am not a MyISAM expert, this is an educated guess and I welcome your feedback. The problem with FTWRL is FT (flush tables) and MyISAM is the reason that tables must be flushed. The &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/server-options.html#option_mysqld_delay-key-write"&gt;--delay-key-write option&lt;/a&gt; and possibly other features in MyISAM allow open tables to buffer committed changes. 
The buffered changes are written to MyISAM data files when the open table is closed.&lt;br /&gt;&lt;br /&gt;INSERT DELAYED might also cause problems, but anybody who needs a hot backup shouldn't be using that option.&lt;br /&gt;&lt;br /&gt;I think we can make this better and my solution is DFTBGRL (don't flush tables but get read lock). Maybe it needs a better name. DFTBGRL skips the second step of FTWRL -- it sets the global read lock and then it sets a flag to block commits. This should be much safer to use in production.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update:&lt;/b&gt;&lt;br /&gt;After I wrote this, Harrison and Konstantin from MySQL/Sun gave me advice on a better way to fix this. I have implemented their advice and it appears to work, but I need to test it. The result will be much better than FTWRL for InnoDB.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6311365403796360396?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6311365403796360396/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/10/be-careful-with-flush-tables-with-read.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6311365403796360396'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6311365403796360396'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/10/be-careful-with-flush-tables-with-read.html' title='Be careful with FLUSH TABLES WITH READ LOCK'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2843477819706210371</id><published>2009-09-12T12:58:00.000-07:00</published><updated>2010-03-28T08:11:33.200-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>OpenSQL Camp</title><content type='html'>My travel is booked for &lt;a href="http://www.opensqlcamp.org/Events/Portland2009/"&gt;OpenSQL camp in Portland&lt;/a&gt; on November 14 and 15. It should be a great event in my favorite city. It is also an opportunity to speak with technical people about something other than MySQL. The current sessions are &lt;a href="http://www.opensqlcamp.org/Events/Portland2009/Sessions"&gt;skewed towards MySQL&lt;/a&gt;, but Portland has an active PostgreSQL community and is home to Len Shapiro who &lt;a href="http://portal.acm.org/citation.cfm?id=6315"&gt;contributed a lot&lt;/a&gt; to the development of high-performance hash joins. I hope there is some kind of PostgreSQL-MySQL exchange. 
I have yet to propose a topic, but am considering &lt;a href="http://dev.mysql.com/tech-resources/articles/4.1/gis-with-mysql.html"&gt;MySQL GIS&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/MUMPS"&gt;MUMPS&lt;/a&gt;, &lt;a href="http://www.innodb.com/products/embedded-innodb/"&gt;embedded InnoDB&lt;/a&gt; or the &lt;a href="http://www.innodb.com/products/innodb_plugin/"&gt;InnoDB plugin&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2843477819706210371?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2843477819706210371/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/09/opensql-camp.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2843477819706210371'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2843477819706210371'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/09/opensql-camp.html' title='OpenSQL Camp'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5637350588898880373</id><published>2009-09-05T10:55:00.000-07:00</published><updated>2010-03-28T08:11:33.201-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='optimizer'/><title 
type='text'>Explaining subqueries in the FROM clause</title><content type='html'>I was debugging the performance of a DELETE statement that contained a subquery in the FROM clause. As there is no EXPLAIN for DELETE, I converted it to a SELECT statement (and hoped the same optimizations were done). But I still had to wait for EXPLAIN to complete, because in MySQL EXPLAIN evaluates subqueries in the FROM clause. This can make EXPLAIN take a long time and add load to the server. Recent versions of MySQL have had many improvements for subquery optimization, but the &lt;a href="http://dev.mysql.com/doc/refman/6.0/en/unnamed-views.html"&gt;documentation&lt;/a&gt; for all versions states that this is still done. A &lt;a href="http://bugs.mysql.com/bug.php?id=44802"&gt;feature request is open&lt;/a&gt; to change this. Feature requests are also open to get EXPLAIN for &lt;a href="http://bugs.mysql.com/bug.php?id=14745"&gt;UPDATE&lt;/a&gt;, &lt;a href="http://bugs.mysql.com/bug.php?id=35355"&gt;INSERT and DELETE&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Do other RDBMS products support EXPLAIN for subqueries in a FROM clause without evaluating the subquery?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5637350588898880373?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5637350588898880373/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/09/explaining-subqueries-in-from-clause.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5637350588898880373'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5637350588898880373'/><link rel='alternate' type='text/html' 
href='http://mysqlha.blogspot.com/2009/09/explaining-subqueries-in-from-clause.html' title='Explaining subqueries in the FROM clause'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8541309914511904083</id><published>2009-09-01T18:03:00.000-07:00</published><updated>2010-03-28T08:11:33.201-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='myisam'/><title type='text'>Blame it on MyISAM</title><content type='html'>I reviewed most of the changes from the v4 Google patch today. My head hurts now. During this review I checked whether bugs fixed in the patch have also been fixed in recent releases of official MySQL. I am happy that most of them have been fixed. But some changes will never be accepted, such as the one that added support for INF for FLOAT/DOUBLE columns.&lt;br /&gt;&lt;br /&gt;The default value of &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html"&gt;sql_mode&lt;/a&gt; is the empty string. You probably want to change that before your applications come to depend on it. When it is the empty string, invalid values are coerced to valid values on INSERT and UPDATE and a warning is returned. Applications usually ignore the warnings. The coercion includes: &lt;br /&gt;&lt;ul&gt;&lt;li&gt;INT values that are too big are set to the maximum value of an INT. 
The same is done for BIGINT.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;INF is changed to MAX_DOUBLE or MAX_FLOAT for a DOUBLE/FLOAT column&lt;/li&gt;&lt;li&gt;VARCHAR and LOB columns are truncated to the maximum length&lt;/li&gt;&lt;li&gt;invalid DATE values are accepted &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;What needs this behavior? MyISAM. For a storage engine that doesn't do rollback, one way to handle invalid data during an INSERT or UPDATE statement is to coerce it to valid values and proceed with the statement. I am not fond of this approach. An alternative for data warehouse workloads is to use an exception table to log rows with invalid data and avoid corrupting non-exception tables.&lt;br /&gt;&lt;br /&gt;MyISAM has also made replication semantics and internals much more complex. For example, what is written to the binlog in this case, and has this behavior changed between releases?&lt;br /&gt;&lt;pre&gt;begin;&lt;br /&gt;insert into Innodb_table values (1);&lt;br /&gt;insert into Myisam_table values (1);&lt;br /&gt;rollback;&lt;br /&gt;&lt;/pre&gt;I think that MyISAM has its place. It does fast table scans, but InnoDB is much faster on just about everything else. I am just not thrilled with the impact it has had on MySQL. It can be used for tasks where a table or partition is loaded once and then made read-only. This is a good fit for data warehouse tasks, although it would be even better if multi-core performance were improved and the key cache were expanded to include data blocks. MyISAM can also be used for scratch tables on a slave. 
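&lt;br /&gt;&lt;br /&gt;Below is a minimal sketch of the INT coercion listed above. It assumes MySQL 5.0 or 5.1 and a hypothetical scratch table named t. With an empty sql_mode, the out-of-range value is clamped to the INT maximum and only a warning is returned, which an application that ignores warnings will never notice. With STRICT_ALL_TABLES, the same statement is rejected with an error:&lt;pre&gt;CREATE TABLE t (i INT) ENGINE=MyISAM;&lt;br /&gt;SET SESSION sql_mode = '';&lt;br /&gt;-- accepted: i is stored as 2147483647 and a warning is returned&lt;br /&gt;INSERT INTO t VALUES (99999999999);&lt;br /&gt;SHOW WARNINGS;&lt;br /&gt;SET SESSION sql_mode = 'STRICT_ALL_TABLES';&lt;br /&gt;-- rejected: the statement fails with an out-of-range error and no row is added&lt;br /&gt;INSERT INTO t VALUES (99999999999);&lt;br /&gt;&lt;/pre&gt;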
&lt;br /&gt;&lt;br /&gt;Drizzle avoided these problems by limiting MyISAM to temporary tables.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8541309914511904083?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8541309914511904083/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/09/blame-it-on-myisam.html#comment-form' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8541309914511904083'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8541309914511904083'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/09/blame-it-on-myisam.html' title='Blame it on MyISAM'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2534258952579930366</id><published>2009-09-01T13:37:00.000-07:00</published><updated>2010-03-28T08:11:33.202-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Fun with user variables</title><content type='html'>What type is used for the expression returned by this SELECT statement?&lt;br /&gt;&lt;pre&gt;set @x = 1e300; select @x&lt;br /&gt;&lt;/pre&gt;It depends:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;MySQL 4.0, 5.1 - double&lt;br /&gt;&lt;/li&gt;&lt;li&gt;MySQL 4.1, 
5.0 - string&lt;/li&gt;&lt;/ul&gt;Note that the column created by this statement is a DOUBLE in all releases:&lt;br /&gt;&lt;pre&gt;set @x = 1e300; create table tt as select @x&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2534258952579930366?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2534258952579930366/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/09/fun-with-user-variables.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2534258952579930366'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2534258952579930366'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/09/fun-with-user-variables.html' title='Fun with user variables'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-105843751019682390</id><published>2009-08-27T08:32:00.000-07:00</published><updated>2010-03-28T08:11:33.202-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='windows'/><title type='text'>MySQL on windows is irrelevant (to me)</title><content type='html'>Henrik &lt;a 
href="http://openlife.cc/blogs/2009/august/mariadb-release-plan-and-other-news-mp-company-meeting"&gt;mentioned the problem&lt;/a&gt; of patches for MariaDB that ignore the Windows platform. I am guilty of this, as MySQL on Windows is irrelevant to me. I am not alone. What contributions do we get from the MySQL on Windows community in patches, performance tests or bug reports? I have never seen a benchmark result for MySQL on Windows.&lt;br /&gt;&lt;br /&gt;There is nothing wrong with this. The relationship between users and vendors for MySQL on Windows is different. This is an opportunity for vendors (Sun/MySQL, Monty Program, Innobase/Oracle, Percona, Pythian, Open Query) to add value. I suspect that Sun/MySQL and Innobase/Oracle already care a lot about MySQL on Windows. But maybe they should offer a discount for MySQL on Linux users to offset our contributions (insert smiley face).&lt;br /&gt;&lt;br /&gt;As an example of the value that can be added, I filed &lt;a href="http://bugs.mysql.com/bug.php?id=46957"&gt;bug 46957&lt;/a&gt; because InnoDB does not appear to support concurrent IO requests per file on Windows. That is, there can be at most one pending IO request (either a read or a write) per file at a time. That might hurt performance. I &lt;a href="http://mysqlha.blogspot.com/2009/03/does-anyone-really-use-mysqlinnodb-on.html"&gt;wrote about this in March&lt;/a&gt; and, while some asserted that this couldn't be the case, I am not aware of any progress. My claim is based on reading &lt;a href="http://jpipes.com/lcov/storage/innobase/os/os0file.c.gcov.html"&gt;the source&lt;/a&gt;. Maybe someone who builds and/or runs MySQL on Windows can confirm or deny this.&lt;br /&gt;&lt;br /&gt;If you want to learn more about systems programming on Windows, this is the &lt;a href="http://www.amazon.com/Windows-Programming-Addison-Wesley-Microsoft-Technology/dp/0321256190"&gt;book&lt;/a&gt; to buy. I have an earlier edition of it. One day I will read it. 
I tried to learn more about WriteFile, ReadFile and why win32/win64 don't support the equivalent of pread/pwrite by searching online for 'WriteFile API' and 'WriteFile API site:mysql.com'. The results were disappointing. Am I expected to buy a copy of the API docs?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-105843751019682390?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/105843751019682390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/08/mysql-on-windows-is-irrelevant-to-me.html#comment-form' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/105843751019682390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/105843751019682390'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/08/mysql-on-windows-is-irrelevant-to-me.html' title='MySQL on windows is irrelevant (to me)'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6070132086311294485</id><published>2009-08-12T07:39:00.000-07:00</published><updated>2010-03-28T08:11:33.203-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>A reason to 
use 5.1</title><content type='html'>We now have 2 great storage engines for 5.1 -- InnoDB 1.0.4 and XtraDB. We need more performance results to understand InnoDB 1.0.4, but it looks excellent from the code I have reviewed. This post describes some of the changes, based on a brief review. All of this makes my work easier, as I can reduce the size of the patch I need to maintain extreme performance with MySQL.&lt;br /&gt;&lt;br /&gt;Kudos to InnoDB for delivering these features in 5.1 and to Percona and Google for contributing patches. &lt;br /&gt;&lt;ol&gt;&lt;li&gt;support for more background IO threads - InnoDB and XtraDB support a configurable number of background IO threads for prefetch reads and dirty page writes. The my.cnf parameters are innodb_read_io_threads and innodb_write_io_threads. Prefetch read requests are generated during queries and when insert buffer entries must be merged. For InnoDB, IO requests are hashed by extent number (64 16KB pages per extent) to the per-thread request queues, although when a request queue is full, a request will use any queue. Each queue can hold 256 pending requests. I assume the code in XtraDB is the same. The Google patch uses one queue for all read or write threads, which should provide better throughput when there are hot extents, but it also requires many more changes to the current source.&lt;/li&gt;&lt;li&gt;support for group commit - not only does this fix an old regression, I think that it also fixes &lt;a href="http://bugs.mysql.com/bug.php?id=46459"&gt;bug 46459&lt;/a&gt;, which degrades performance when autocommit insert statements are used on tables with an auto increment column.&lt;/li&gt;&lt;li&gt;adaptive flushing - one of the things that makes InnoDB work well is its use of adaptive algorithms to keep the server balanced. Many of these are not documented because they work and we don't need to know about them. They may have added a new one with support for adaptive flushing. 
I hope to see performance results from Percona for this, but I think this is another thing in InnoDB we can soon forget as it will work without problems.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;readahead - prior to 1.0.4, InnoDB could generate read prefetch requests when it detected sequential or random access to most pages in an extent. For 1.0.4, the use of readahead for random access to pages within an extent appears to have been removed. The use of readahead for sequential access to pages within an extent has been changed to use a new my.cnf parameter, innodb_read_ahead_threshold, that sets the number of pages that must be accessed sequentially within an extent before all of the pages in the physically adjacent extent will be prefetched. I am still not fond of this feature because:&lt;/li&gt;&lt;/ol&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;I am not aware of any performance counters that report on the success of readahead (#fetched versus #fetched_and_used). But you can disable readahead now and measure the impact on your application.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Prefetch requests for the pages in the next extent are generated late. For example, if innodb_read_ahead_threshold=56, then requests are generated when the 56th (out of 64) page in the current extent is used. 
&lt;/li&gt;&lt;li&gt;If request merging is done for all of the pages in the next extent, then a 1MB read will be used and none of the pages can be accessed until the read completes.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6070132086311294485?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6070132086311294485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/08/reason-to-use-51.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6070132086311294485'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6070132086311294485'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/08/reason-to-use-51.html' title='A reason to use 5.1'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8245036642178120004</id><published>2009-08-04T11:20:00.000-07:00</published><updated>2010-03-28T08:11:33.203-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>Fast count(*) for InnoDB</title><content type='html'>Why must SELECT COUNT(*) FROM FOO run fast? 
It is much more valuable to make that query fast when it has a WHERE clause. When there isn't a WHERE clause, MyISAM executes SELECT COUNT(*) FROM FOO fast. When there is a WHERE clause, MySQL has limited support for combining index scans but nothing like bitmap indexes.&lt;br /&gt;&lt;br /&gt;If you must, the following will make SELECT COUNT(*) FROM FOO fast for InnoDB:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Install INSERT and DELETE triggers to maintain the row count for the InnoDB table in a metadata table. This will kill concurrency on the table, just like MyISAM, because every insert and delete must update the same counter row.&lt;/li&gt;&lt;li&gt;Modify the parser to accept SELECT ESTIMATED_COUNT(*) and evaluate this internally to use the row count estimate provided by the storage engine. Is the estimate good enough? If an exact answer is needed and the table is not locked after the exact value is computed, then the exact value will quickly become incorrect.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8245036642178120004?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8245036642178120004/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/08/fast-count-for-innodb.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8245036642178120004'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8245036642178120004'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/08/fast-count-for-innodb.html' title='Fast count(*) for InnoDB'/><author><name>Mark 
Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-4630678248926146552</id><published>2009-07-29T09:26:00.000-07:00</published><updated>2010-03-28T08:11:33.204-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='other'/><title type='text'>Who knew Ingres had it in them?</title><content type='html'>&lt;a href="http://www.vectorwise.com/index_js.php"&gt;VectorWise&lt;/a&gt; has emerged from stealth mode. Even more interesting is that Ingres is collaborating with them on &lt;a href="http://www.ingres.com/vectorwise/index.php"&gt;Ingres VectorWise&lt;/a&gt;. And most interesting is the &lt;a href="http://www.youtube.com/watch?v=ViUL769ilro"&gt;video made to promote&lt;/a&gt; the work. In the interest of disclosure, I have friends who work on this at Ingres and VectorWise.&lt;br /&gt;&lt;br /&gt;I have read many of the papers on X100 and MonetDB and believe the performance claims they make. Within the MySQL community, this is most similar to Kickfire in terms of the query workloads it can support. The difference is that VectorWise writes their software to run at full speed on a modern CPU and maintain a high IPC rate while Kickfire uses custom hardware (and a lot of really good software that they probably cannot describe without an NDA). 
Anyone attempting to build a VectorWise storage engine would have to do all of the &lt;b&gt;fun things&lt;/b&gt; that Kickfire has done to circumvent much of the MySQL optimizer and execution code, so I doubt that effort will be repeated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-4630678248926146552?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/4630678248926146552/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/07/who-knew-ingres-had-it-in-them.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4630678248926146552'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4630678248926146552'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/07/who-knew-ingres-had-it-in-them.html' title='Who knew Ingres had it in them?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-3434006664490968703</id><published>2009-07-24T08:47:00.000-07:00</published><updated>2010-03-28T08:11:33.204-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>It is alive -- Monty Program AB</title><content type='html'>I have been curious about Monty Program AB as their mailing list has been 
relatively quiet. They have been too busy to write much as they are setting up the company, talking to customers and doing real work including debugging the problems with MRR/BKA/ICP and InnoDB (thanks Igor and Sergey).&lt;br /&gt;&lt;br /&gt;I will let them reveal their plans for growing the MariaDB community but I hope it includes my agenda of improving MySQL replication. &lt;a href="http://drizzle.org/"&gt;Drizzle&lt;/a&gt; has a very clean interface for providing different replication implementations. I think the same is possible with &lt;a href="http://www.askmonty.org/"&gt;MariaDB&lt;/a&gt;. It won't be as clean as the API in Drizzle because it has not diverged from MySQL source, but it should allow us to provide interesting alternatives (sync replication, conflict resolution with multi-master) in a reasonable amount of time. With such an interface and with row-based replication we can make it much easier and less expensive to manage a large MySQL deployment.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-3434006664490968703?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/3434006664490968703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/07/it-is-alive-monty-program-ab.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3434006664490968703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3434006664490968703'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/07/it-is-alive-monty-program-ab.html' title='It is alive -- Monty Program AB'/><author><name>Mark 
Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7649483785601072210</id><published>2009-07-23T11:02:00.000-07:00</published><updated>2010-03-28T08:11:33.205-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ha'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Relevance in the datacenter</title><content type='html'>Do you know SQL or do you &lt;a href="http://www.dbms2.com/2009/07/01/nosql-sql-alternative/"&gt;NoSQL&lt;/a&gt;? MySQL has been very popular for internet-scale deployments. But times have changed and there are alternatives. The alternatives either out-scale or out-avail MySQL and this is more important than providing the features of an RDBMS for many applications. My prediction is that there will be much less usage of MySQL for internet-scale applications in the future if we do not make big changes.&lt;br /&gt;&lt;br /&gt;What are the problems and what can we do to fix them? From my perspective there are two problems:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;MySQL is not efficient on modern hardware (multicore, many disk IOPs)&lt;/li&gt;&lt;li&gt;Replication is very expensive to manage&lt;/li&gt;&lt;/ol&gt;We are in the process of fixing the first problem for InnoDB and Percona has binaries you can use in production today that make things much better. However many problems remain that limit throughput on servers with 8+ cores and there is little visible work in progress to fix them (MyISAM, query cache, LOCK_table, ...). 
This is a serious issue as 8 cores is or will soon be the new common box in the datacenter and price/performance comparisons will get much worse for MySQL.&lt;br /&gt;&lt;br /&gt;Replication requires much more work. I want more automation and more flexibility.&lt;br /&gt;&lt;br /&gt;The lack of automation is apparent when you consider the replication related errors that require manual intervention. These errors are frequent or constant when you run a large number of MySQL servers. It is very expensive to support MySQL in this environment. Actions that must be automated include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt; the promotion of a slave to a master after the failure of the master&lt;/li&gt;&lt;li&gt;failover of slaves to the new master&lt;/li&gt;&lt;/ul&gt;I also want the flexibility to extend replication. I have participated in the development of many replication enhancements (semi-sync, mirror binlog, global group IDs) and that effort has been incredibly difficult. I am still amazed at what Wei and Justin were able to accomplish. I doubt that anyone would ever volunteer for such a project (I was paid). The code is not fun to modify.&lt;br /&gt;&lt;br /&gt;I have more ideas to improve replication but it isn't clear to me that I can afford the cost to modify the replication code in official MySQL. But then I looked at the code for &lt;a href="http://drizzle.org/"&gt;Drizzle&lt;/a&gt;. Wow! The code is clean, easy to read and easy to modify. 
So I still have hope for MySQL-related technology in the datacenter, but in the form of Drizzle.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7649483785601072210?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7649483785601072210/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/07/relevance-in-datacenter.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7649483785601072210'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7649483785601072210'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/07/relevance-in-datacenter.html' title='Relevance in the datacenter'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5257295201986051554</id><published>2009-07-22T13:53:00.000-07:00</published><updated>2010-03-28T08:11:33.206-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='free'/><title type='text'>No docs for you</title><content type='html'>I use http://dev.mysql.com every day. But not today as the site is down. There is a mirror of the docs at &lt;a href="http://docs.sun.com/source/mysql-refman-5.0/index.html"&gt;http://docs.sun.com/source/mysql-refman-5.0/index.html&lt;/a&gt;. 
Although the docs license is one of my &lt;a href="http://mysqlha.blogspot.com/2009/07/listen-to-monty.html"&gt;favorite topics&lt;/a&gt;, I am not a lawyer. Does the license not allow others to serve a copy of the docs?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5257295201986051554?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5257295201986051554/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/07/no-docs-for-you.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5257295201986051554'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5257295201986051554'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/07/no-docs-for-you.html' title='No docs for you'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5410874389872916331</id><published>2009-07-19T09:38:00.000-07:00</published><updated>2010-03-28T08:11:33.206-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Crashing bugs</title><content type='html'>What does it mean to release a server with &lt;b&gt;no known crashing bugs&lt;/b&gt;? I don't know. A lot of this depends on your perspective. 
Perhaps old releases of MySQL were done with no known crashing bugs, but testing for current releases is much more rigorous, so more crashing bugs become known. If you don't believe me, then read some of the bug updates from Shane and others to see how good they are at testing. &lt;br /&gt;&lt;br /&gt;I judge quality by whether I can depend on the code in production. Both 4.0 and 5.0 have been stable in production for me and both have had crashing bugs. I am sure that MySQL 5.0 releases have had more known bugs than 4.0 releases, as 5.0 has more features, a larger community testing it and is run on much larger hardware (more RAM, more disks, multicore, more load). Note that my perception of 5.0 is based on using it with many features disabled, including query cache, subqueries, stored procedures, triggers and views.&lt;br /&gt;&lt;br /&gt;The quality metric that I am interested in is the number of major releases needed for a feature to become stable in production and feature complete. I don't mind when this number is more than 1 as long as I can disable the feature. 
But this number should never exceed 2.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5410874389872916331?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5410874389872916331/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/07/crashing-bugs.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5410874389872916331'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5410874389872916331'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/07/crashing-bugs.html' title='Crashing bugs'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-4000061206231477198</id><published>2009-07-16T06:08:00.000-07:00</published><updated>2010-03-28T08:11:33.207-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='free'/><title type='text'>Listen to Monty</title><content type='html'>I don't always agree with Monty but &lt;a href="http://monty-says.blogspot.com/2009/07/helping-us-department-of-justice.html"&gt;this time he is right&lt;/a&gt;. Now is the time to provide feedback on the merger. 
I wish there were more commentary on plans for MySQL after the merger.&lt;br /&gt;&lt;br /&gt;My concern is the documentation license. If we want to improve MySQL at a faster rate than the docs copyright holder, then we must create separate documentation, as we cannot publish a modified version of the official documentation. Several people have written about this before, including &lt;a href="http://mysqlha.blogspot.com/2009/04/vendor-lock-in-and-mysql-documentation.html"&gt;me&lt;/a&gt;, &lt;a href="http://www.xaprb.com/blog/2009/05/08/please-re-license-the-mysql-documentation/"&gt;Baron&lt;/a&gt;, &lt;a href="http://www.pythian.com/news/2274/mysql-documentation-licensing-woes"&gt;Sheeri&lt;/a&gt;, &lt;a href="http://openquery.com/blog/mysql-docs-freedom"&gt;Arjen&lt;/a&gt; and &lt;a href="http://swanhart.livejournal.com/126077.html"&gt;Justin&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Despite comments on stage between Karen and Kaj at the User Conference, there have been no changes to the docs license. The lead for the MySQL docs team &lt;a href="http://blogs.sun.com/mysqlf/entry/mysql_documentation_no_license_change"&gt;explained to us&lt;/a&gt; that there will be no change, and he deserves kudos for his bravery. I am very disappointed that neither Karen nor Kaj has said anything on this topic since then.&lt;br /&gt;&lt;br /&gt;What else can you do? Find a different community. Check out the activity on the &lt;a href="http://drizzle.org/"&gt;Drizzle&lt;/a&gt; mailing lists. I don't participate in Drizzle but I lurk there and am extremely impressed by the activity on the mailing lists and the changes they make to the code. 
&lt;a href="http://askmonty.org/wiki/index.php/MariaDB"&gt;MariaDB&lt;/a&gt; has yet to become active for external developers but it is quickly gaining momentum.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-4000061206231477198?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/4000061206231477198/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/07/listen-to-monty.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4000061206231477198'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/4000061206231477198'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/07/listen-to-monty.html' title='Listen to Monty'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8020913619863176678</id><published>2009-06-22T08:00:00.000-07:00</published><updated>2010-03-28T08:11:33.207-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Hello Facebook!</title><content type='html'>Today is my first day at &lt;a href="http://facebook.com/"&gt;Facebook&lt;/a&gt;. I am thrilled to be here. This is a great opportunity for me to work on hard problems with talented people. 
I expect to continue making MySQL faster.&lt;br /&gt;&lt;br /&gt;You can &lt;a href="http://www.facebook.com/Mark.Callaghan"&gt;friend me on Facebook&lt;/a&gt;. I have started a &lt;a href="http://www.facebook.com/group.php?gid=93123303841"&gt;Facebook group&lt;/a&gt; on which MySQL can be discussed. I will start posting there soon. But this is a group so you are welcome to join and post.&lt;br /&gt;&lt;br /&gt;I will be at the Velocity Conference on Tuesday evening. Will anyone else who deploys MySQL be in town?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8020913619863176678?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8020913619863176678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/06/hello-facebook.html#comment-form' title='23 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8020913619863176678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8020913619863176678'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/06/hello-facebook.html' title='Hello Facebook!'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>23</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8373660218738124639</id><published>2009-06-16T09:11:00.000-07:00</published><updated>2010-03-28T08:11:33.208-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='oops'/><title type='text'>What could possibly go wrong?</title><content type='html'>The default table type was changed from MyISAM to InnoDB on a few production servers. There were good reasons for doing this. Besides, what could possibly go wrong? Sometimes the status quo has value even when you can't articulate it.&lt;br /&gt;&lt;br /&gt;MyISAM tables are created in several cases:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;CREATE TABLE (...) engine=MyISAM&lt;/li&gt;&lt;li&gt;CREATE TABLE (...) ; // when the default storage engine is MyISAM&lt;/li&gt;&lt;li&gt;implicit temporary tables created for ORDER BY and GROUP BY processing that are too large are converted from HEAP to MyISAM&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;What was the problem?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Users began reporting ER_LOCK_TABLE_FULL errors during long-running CREATE TABLE AS SELECT statements. These statements did not specify the storage engine type and began using InnoDB. From reading the code, this error is raised when buf_LRU_buf_pool_running_out returns TRUE during an insert statement. It returns TRUE when less than 25% of the memory allocated for innodb_buffer_pool_size is available for caching pages. Memory from this allocation can be used elsewhere, including for lock structures.&lt;br /&gt;&lt;br /&gt;I need to investigate this to determine whether the locks were used for the created table or the selected table. I don't think the locks have to be used in either case. Locks on rows in the created table might not be needed because the table is not visible to other sessions until the CTAS statement completes (right?). In this case it was CREATE TEMPORARY TABLE ... SELECT, so the table will never be visible to others. Share locks on rows in the selected table might have been taken to guarantee deterministic replay in replication. 
But this statement was run on a slave, so that should not be needed. I have dealt with this particular problem in the past.&lt;br /&gt;&lt;br /&gt;When innodb_locks_unsafe_for_binlog is set in my.cnf, share locks are not obtained on rows read from the selected table. InnoDB really should be clever enough to do this when the binlog is not open.&lt;br /&gt;&lt;br /&gt;Lock structs do not appear to be allocated for the inserted table regardless of the value of innodb_locks_unsafe_for_binlog, so I have yet to determine the source of memory allocations from the buffer pool. The binary we use behaves like innodb_locks_unsafe_for_binlog, but based on whether or not the binlog is open.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Still a mystery&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Memory is allocated from the buffer pool for:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Buffer pool frames&lt;/li&gt;&lt;li&gt;Row locks (trx-&amp;gt;lock_heap) - but row locks should not be allocated for this statement given the change in our binary, or if innodb_locks_unsafe_for_binlog were used.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Recovery (recv_sys-&amp;gt;heap) - this looks like code that is only run during crash recovery when the server is started.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Adaptive hash index (btr_search_sys-&amp;gt;hash_index) - this should use no more than a fixed amount of memory as the number of pages to be indexed is fixed.&lt;/li&gt;&lt;/ol&gt;So, given the above, I don't know what is using memory from the buffer pool. 
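&lt;br /&gt;&lt;br /&gt;The ER_LOCK_TABLE_FULL condition described above can be modeled in a few lines. This is a minimal Python sketch, not InnoDB code: it assumes that frames "available for the buffer pool" are the frames on the free list plus those on the LRU list, so frames borrowed for lock structs or the adaptive hash index count against the 25% limit. The function names and the 1200M example are illustrative only.&lt;br /&gt;&lt;pre&gt;
```python
# Simplified model of the buf_LRU_buf_pool_running_out() check described
# above; not InnoDB code. Assumption: frames "available for the buffer
# pool" are those on the free list plus those on the LRU list, and frames
# borrowed for lock structs or the adaptive hash index are on neither.

PAGE_SIZE = 16 * 1024  # default InnoDB page size


def borrowed_frames(total_frames, free_frames, lru_frames):
    """Frames used for something other than caching database pages."""
    return total_frames - free_frames - lru_frames


def buf_pool_running_out(total_frames, free_frames, lru_frames):
    """True when fewer than 25% of the frames are still buffer pages."""
    # Integer division mirrors a C-style "total / 4" threshold.
    return total_frames // 4 > free_frames + lru_frames


if __name__ == "__main__":
    total = 1200 * 1024 * 1024 // PAGE_SIZE  # innodb_buffer_pool_size=1200M
    lru = total // 5  # hypothetical: only 20% of frames still cache pages
    print(total, borrowed_frames(total, 0, lru),
          buf_pool_running_out(total, 0, lru))
```
&lt;/pre&gt;With these made-up numbers, 61440 of the 76800 frames are borrowed and the check returns True, the state in which, per the code path described above, an insert raises ER_LOCK_TABLE_FULL.&lt;br /&gt;&lt;br /&gt;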
But I learned a bit more about InnoDB, so all is not lost.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Notes from 5.0.77&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A call path to buf_LRU_get_free_block is:&lt;br /&gt;&amp;nbsp;&amp;nbsp; mem_heap_create_block -&amp;gt; buf_frame_alloc -&amp;gt; buf_block_alloc&lt;br /&gt;&lt;br /&gt;And mem_heap_create_block is called by (search for MEM_HEAP_BUFFER)&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;mem_heap_create_func&lt;/li&gt;&lt;li&gt;mem_heap_add_block&lt;/li&gt;&lt;/ul&gt;So the callers are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;mem_heap_create_in_buffer&lt;/li&gt;&lt;ul&gt;&lt;li&gt;used for trx-&amp;gt;lock_heap, recv_sys-&amp;gt;heap, ha_create &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;mem_heap_create_in_btr_search&lt;/li&gt;&lt;ul&gt;&lt;li&gt;used for ha_create&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;And ha_create is used for btr_search_sys-&amp;gt;hash_index&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8373660218738124639?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8373660218738124639/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/06/what-could-possibly-go-wrong.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8373660218738124639'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8373660218738124639'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/06/what-could-possibly-go-wrong.html' title='What could possibly go wrong?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2320784672142273236</id><published>2009-06-15T20:14:00.000-07:00</published><updated>2010-03-28T08:11:33.208-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>Alternate page sizes in InnoDB</title><content type='html'>InnoDB supports multiple page sizes. The default page size is 16kb and is compiled into the binary/plugin. The valid page sizes are 8kb, 16kb, 32kb and 64kb. However, there is a known bug for page sizes &amp;gt; 16kb with row_format=COMPACT. Also, there are few deployments with a page size other than 16kb, so the amazing reliability of InnoDB may degrade for other page sizes until the bugs are found and fixed.&lt;br /&gt;&lt;br /&gt;To use an alternate page size, edit innobase/include/univ.i to change these:&lt;br /&gt;&lt;blockquote&gt;/* The universal page size of the database */&lt;br /&gt;#define UNIV_PAGE_SIZE&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (2 * 8192)&lt;br /&gt;/* The 2-logarithm of UNIV_PAGE_SIZE: */&lt;br /&gt;#define UNIV_PAGE_SIZE_SHIFT&amp;nbsp;&amp;nbsp;&amp;nbsp; 14&lt;/blockquote&gt;Does InnoDB support 4kb pages? Maybe. If you edit the values above for a 4kb page size and run MySQL, you will quickly get a core dump. The problem is that there are other per-page objects that are too big for a 4kb page. The first problem I encountered is fixed by editing innobase/include/trx0rseg.h to reduce the number of rollback segment slots per page. 
The default value is:&lt;br /&gt;&lt;blockquote&gt;#define TRX_RSEG_N_SLOTS&amp;nbsp;&amp;nbsp;&amp;nbsp; 1024&lt;/blockquote&gt;To be really safe, I set it to 256 and the segfault was fixed. I haven't tested beyond this, so your mileage may vary.&lt;br /&gt;&lt;br /&gt;InnoDB had a &lt;a href="http://mysqlha.blogspot.com/2008/11/innodb-memory-overhead.html"&gt;900 byte per-page overhead&lt;/a&gt; when I last measured this. That is about 22% of a 4kb page but only about 5.5% of a 16kb page. I am not sure when this will get fixed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2320784672142273236?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2320784672142273236/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/06/alternate-page-sizes-in-innodb.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2320784672142273236'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2320784672142273236'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/06/alternate-page-sizes-in-innodb.html' title='Alternate page sizes in InnoDB'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1103937629488747962</id><published>2009-06-08T08:43:00.000-07:00</published><updated>2010-03-28T08:11:33.209-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>InnoDB, IO-bound OLTP, 2-disk server</title><content type='html'>I have run many tests on mid-size servers to show how &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches"&gt;patches&lt;/a&gt; &lt;a href="http://www.percona.com/percona-lab.html"&gt;make&lt;/a&gt; InnoDB faster for IO bound workloads. The servers could do more than 1000 IOPs and the patches make a big difference. But do the patches help on small servers? I ran tpcc-mysql on my home server with 2 SATA disks, 2 CPU cores and 2GB RAM. There were two interesting results:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The v4 Google patch makes things faster.&lt;/li&gt;&lt;li&gt;Only the v4 Google patch enforced innodb_max_dirty_pages_pct=20. InnoDB is not good at enforcing this limit on write-intensive workloads. This problem has not received much press. If you are running a critical OLTP server and want it to recover quickly after a crash then this limit must be enforced. Code has been added to the v4 Google patch to delay user sessions by making them flush dirty pages prior to making more pages dirty if the limit has been exceeded. This is enabled by the my.cnf parameter innodb_check_max_dirty_foreground. 
Nobody has reviewed this code (hint, hint).&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;I tested the v4 Google patch (5.0.37), &lt;a href="http://www.percona.com/mysql/5.0.77-b13/source/mysql-5.0.77-percona-highperf-b13-src.tar.gz"&gt;Percona highperf b13&lt;/a&gt; (5.0.77) and unmodified MySQL 5.0.77 using &lt;a href="https://code.launchpad.net/%7Epercona-dev/perconatools/tpcc-mysql"&gt;tpcc-mysql&lt;/a&gt;. The server uses 2 disks with SW RAID 0, 1 MB RAID stripe, XFS and the SATA write cache disabled.&lt;br /&gt;&lt;br /&gt;tpcc-mysql was run with 20 warehouses and 8 users with a 600 second warmup and 3600 second measurement period.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Results&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The 5077 binaries were faster than the v4 Google patch at innodb_max_dirty_pages_pct=20 because they did not enforce that limit. I have included a result for the v4 Google patch at innodb_max_dirty_pages_pct=50 to provide an additional reference point.&amp;nbsp;&lt;b&gt; &lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;table border="2"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Binary&lt;/th&gt;&lt;th&gt;TpmC&lt;/th&gt;&lt;th&gt;innodb_max_dirty_pages_pct&lt;/th&gt;&lt;th&gt;Avg %dirty pages&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;v4-5037&lt;/td&gt;&lt;td align="center"&gt;1294&lt;/td&gt;&lt;td align="center"&gt;20&lt;/td&gt;&lt;td align="center"&gt;25.1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;percona-5077&lt;/td&gt;&lt;td align="center"&gt;2112&lt;/td&gt;&lt;td align="center"&gt;20&lt;/td&gt;&lt;td align="center"&gt;60.0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;mysql-5077&lt;/td&gt;&lt;td align="center"&gt;2836&lt;/td&gt;&lt;td align="center"&gt;20&lt;/td&gt;&lt;td align="center"&gt;72.1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;-&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;v4-5037&lt;/td&gt;&lt;td align="center"&gt;3024&lt;/td&gt;&lt;td align="center"&gt;50&lt;/td&gt;&lt;td 
align="center"&gt;52.8&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;-&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;v4-5037&lt;/td&gt;&lt;td align="center"&gt;3854&lt;/td&gt;&lt;td align="center"&gt;80&lt;/td&gt;&lt;td align="center"&gt;72.0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;percona-5077&lt;/td&gt;&lt;td align="center"&gt;2930&lt;/td&gt;&lt;td align="center"&gt;80&lt;/td&gt;&lt;td align="center"&gt;73.7&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center"&gt;mysql-5077&lt;/td&gt;&lt;td align="center"&gt;3373&lt;/td&gt;&lt;td align="center"&gt;80&lt;/td&gt;&lt;td align="center"&gt;82.8&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Throughput over time for v4-5037:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_3rU41dez5TI/Si2AnUdXpuI/AAAAAAAAAU0/kE0Mk0aI9Ws/s1600-h/tpm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_3rU41dez5TI/Si2AnUdXpuI/AAAAAAAAAU0/kE0Mk0aI9Ws/s320/tpm.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;Configuration&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Notes:&lt;br /&gt;&lt;ul&gt;&lt;li&gt; tests were run twice for each binary with innodb_max_dirty_pages_pct set to 20 and 80&lt;/li&gt;&lt;li&gt;the Percona highperf b13 5.0.77 binary uses more memory for the same value of innodb_buffer_pool_size, so I had to reduce that value to 1G&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The my.cnf settings for the v4 Google patch:&lt;br /&gt;&lt;blockquote&gt;innodb_buffer_pool_size=1200M&lt;br /&gt;innodb_log_file_size=1900M&lt;br /&gt;innodb_flush_log_at_trx_commit=2&lt;br /&gt;innodb_io_capacity=250&lt;br /&gt;innodb_read_io_threads=2&lt;br /&gt;innodb_write_io_threads=2&lt;br /&gt;innodb_max_dirty_pages_pct=80 or 20&lt;br /&gt;innodb_ibuf_max_pct_of_buffer=10&lt;br /&gt;skip_innodb_ibuf_reads_sync&lt;br 
/&gt;innodb_doublewrite=0&lt;br /&gt;innodb_file_per_table&lt;br /&gt;allow_view_trigger_sp_subquery&lt;br /&gt;skip_innodb_readahead_random&lt;br /&gt;skip_innodb_readahead_sequential&lt;br /&gt;innodb_flush_method=O_DIRECT&lt;br /&gt;innodb_check_max_dirty_foreground&lt;/blockquote&gt;The my.cnf settings for Percona highperf b13 5.0.77:&lt;br /&gt;&lt;blockquote&gt;innodb_log_file_size=1900M&lt;br /&gt;innodb_buffer_pool_size=1000m&lt;br /&gt;innodb_flush_log_at_trx_commit=2&lt;br /&gt;innodb_flush_method=O_DIRECT&lt;br /&gt;innodb_io_capacity=250&lt;br /&gt;innodb_read_io_threads=2&lt;br /&gt;innodb_write_io_threads=2&lt;br /&gt;innodb_max_dirty_pages_pct=80 or 20&lt;br /&gt;innodb_ibuf_max_size=120M&lt;br /&gt;innodb_ibuf_active_contract=1&lt;br /&gt;innodb_ibuf_accel_rate=200&lt;br /&gt;innodb_doublewrite=0&lt;br /&gt;innodb_read_ahead=0&lt;br /&gt;innodb_adaptive_checkpoint=1&lt;/blockquote&gt;The my.cnf settings for 5.0.77:&lt;br /&gt;&lt;blockquote&gt;innodb_buffer_pool_size=1200M&lt;br /&gt;innodb_log_file_size=1900M&lt;br /&gt;innodb_flush_log_at_trx_commit=2&lt;br /&gt;innodb_max_dirty_pages_pct=80 or 20&lt;br /&gt;innodb_doublewrite=0&lt;br /&gt;innodb_file_per_table&lt;br /&gt;innodb_flush_method=O_DIRECT&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1103937629488747962?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1103937629488747962/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/06/innodb-io-bound-oltp-2-disk-server.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1103937629488747962'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1103937629488747962'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/06/innodb-io-bound-oltp-2-disk-server.html' title='InnoDB, IO-bound OLTP, 2-disk server'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_3rU41dez5TI/Si2AnUdXpuI/AAAAAAAAAU0/kE0Mk0aI9Ws/s72-c/tpm.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1193484956512133556</id><published>2009-06-05T11:03:00.000-07:00</published><updated>2010-03-28T08:11:33.209-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>Buffered versus direct IO for InnoDB</title><content type='html'>I learn a lot about InnoDB when people ask me questions. In this case, someone asked whether &lt;a href="http://linux.die.net/man/2/fsync"&gt;fsync&lt;/a&gt; was used for writes done when InnoDB is running with &lt;a href="http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html"&gt;innodb_flush_method=O_DIRECT&lt;/a&gt;. It is still used. It does not need to be used and I am not sure whether this has a measurable impact on performance.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Buffered IO&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;You can configure InnoDB to use direct IO for data files or for transaction log files but not for both at the same time. I added a new value for innodb_flush_method, allsync, to change that. 
When innodb_flush_method=allsync is used, the behaviors described below for O_DIRECT and O_DSYNC are both applied. I have yet to find a significant performance benefit from that change. I may not have the right hardware.&lt;br /&gt;&lt;br /&gt;Data files are opened with &lt;a href="http://www.kernel.org/doc/man-pages/online/pages/man2/open.2.html"&gt;O_DIRECT&lt;/a&gt; when innodb_flush_method is set to O_DIRECT. fsync is still used in this case, but it doesn't need to be.&lt;br /&gt;&lt;br /&gt;Transaction log files are opened with &lt;a href="http://www.kernel.org/doc/man-pages/online/pages/man2/open.2.html"&gt;O_SYNC&lt;/a&gt; when innodb_flush_method is set to O_DSYNC. fsync is not used in this case. Writes to the transaction log are done in multiples of OS_FILE_LOG_BLOCK_SIZE (set to 512 in os0file.h). The file system block size is likely to be much larger than 512, so each 512 byte (or N*512 byte) write done by InnoDB requires a file system block write to the disk. According to the man page and actual behavior on Linux, buffering can be done so that a sequence of 512 byte writes with a 4kb file system block size does not require a disk read on every write.&lt;br /&gt;&lt;br /&gt;It might be good to use a larger value for OS_FILE_LOG_BLOCK_SIZE (1024?) and a smaller file system block size for the file system that stores the transaction log (but not for the file system that stores the data files). This is more likely to help on SSD, and the benefit depends on your workload.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I used sysbench fileio to determine whether there is a performance impact from calling fsync after writes on a file opened with O_DIRECT. There is an impact, but it remains to be seen whether that translates to a performance impact for InnoDB. 
To test this, I used a server with:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;2 CPU cores&lt;/li&gt;&lt;li&gt;2 disks, SATA, 7200 RPM&lt;/li&gt;&lt;li&gt;2 disk SW RAID 0, 1MB RAID stripe, XFS file system&lt;/li&gt;&lt;li&gt;SATA write cache enabled&lt;/li&gt;&lt;/ul&gt;I used this sysbench command line with fsf set to 0 (no fsync) or 1 (fsync after every write) and nt set to 1, 2, 4, 8 or 16:&lt;br /&gt;&lt;blockquote&gt;sysbench --test=fileio --file-num=1 --file-total-size=4G --file-test-mode=rndwr --file-extra-flags=direct --file-fsync-freq=$fsf --num-threads=$nt --max-requests=0 --max-time=60 run&lt;/blockquote&gt;With the SATA write cache enabled, writes per second (without fsync vs with fsync) were higher without fsync in every case except --num-threads=1:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;503 vs 532 for --num-threads=1&lt;/li&gt;&lt;li&gt;501 vs 438 for --num-threads=2&lt;/li&gt;&lt;li&gt;510 vs 447 for --num-threads=4&lt;/li&gt;&lt;li&gt;495 vs 451 for --num-threads=8&lt;/li&gt;&lt;li&gt;502 vs 464 for --num-threads=16&lt;/li&gt;&lt;/ul&gt;With the SATA write cache disabled, writes per second were much higher without fsync:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;109 vs 49 for --num-threads=1&lt;/li&gt;&lt;li&gt;160 vs 58 for --num-threads=2&lt;/li&gt;&lt;li&gt;189 vs 70 for --num-threads=4&lt;/li&gt;&lt;li&gt;209 vs 77 for --num-threads=8&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Update2&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I get a ~5% speedup on tpcc-mysql (1294 vs 1237 TpmC) when fsync calls are disabled with innodb_flush_method set to O_DIRECT.&amp;nbsp; The server for this test is described in the previous section. The SATA write cache was disabled for the test. 
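&lt;br /&gt;&lt;br /&gt;The relative gains behind these numbers can be checked with a few lines of arithmetic. This Python sketch assumes that in each "X vs Y" pair above, X is the result without fsync and Y the result with fsync; the function name and dictionary layout are illustrative.&lt;br /&gt;&lt;pre&gt;
```python
# Relative throughput change for the numbers quoted above.
# Assumption: in each "X vs Y" pair, X is without fsync and Y is with fsync.


def pct_gain(without_fsync, with_fsync):
    """Percent by which the no-fsync result exceeds the fsync result."""
    return 100.0 * (without_fsync - with_fsync) / with_fsync


# sysbench fileio writes/second, SATA write cache disabled, keyed by --num-threads
cache_disabled = {1: (109, 49), 2: (160, 58), 4: (189, 70), 8: (209, 77)}

for threads, (no_sync, with_sync) in sorted(cache_disabled.items()):
    print(threads, round(pct_gain(no_sync, with_sync)))

# tpcc-mysql throughput without and with fsync on O_DIRECT data files
print(round(pct_gain(1294, 1237), 1))
```
&lt;/pre&gt;For the write-cache-disabled sysbench runs this gives gains of 122%, 176%, 170% and 171% at 1, 2, 4 and 8 threads, while the tpcc-mysql gain works out to 4.6%.&lt;br /&gt;&lt;br /&gt;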
These my.cnf parameters were used:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;innodb_buffer_pool_size=1200M&lt;/li&gt;&lt;li&gt;innodb_log_file_size=1900M&lt;/li&gt;&lt;li&gt;innodb_flush_log_at_trx_commit=2&lt;/li&gt;&lt;li&gt;innodb_io_capacity=250&lt;/li&gt;&lt;li&gt;innodb_read_io_threads=2&lt;/li&gt;&lt;li&gt;innodb_write_io_threads=2&lt;/li&gt;&lt;li&gt;innodb_max_dirty_pages_pct=20&lt;/li&gt;&lt;li&gt;innodb_ibuf_max_pct_of_buffer=10&lt;/li&gt;&lt;li&gt;skip_innodb_ibuf_reads_sync&lt;/li&gt;&lt;li&gt;innodb_doublewrite=0&lt;/li&gt;&lt;li&gt;innodb_file_per_table&lt;/li&gt;&lt;li&gt;allow_view_trigger_sp_subquery&lt;/li&gt;&lt;li&gt;skip_innodb_readahead_random&lt;/li&gt;&lt;li&gt;skip_innodb_readahead_sequential&lt;/li&gt;&lt;li&gt;innodb_flush_method=O_DIRECT&lt;/li&gt;&lt;li&gt;innodb_check_max_dirty_foreground&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1193484956512133556?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1193484956512133556/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/06/buffered-versus-direct-io-for-innodb.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1193484956512133556'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1193484956512133556'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/06/buffered-versus-direct-io-for-innodb.html' title='Buffered versus direct IO for InnoDB'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2336827623859337719</id><published>2009-06-03T09:26:00.000-07:00</published><updated>2010-03-28T08:11:33.210-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Current events in the MySQL community</title><content type='html'>Jeremy has an &lt;a href="http://www.linux-mag.com/id/7342"&gt;article in LinuxMag&lt;/a&gt;. This is worth reading.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2336827623859337719?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2336827623859337719/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/06/current-events-in-mysql-community.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2336827623859337719'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2336827623859337719'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/06/current-events-in-mysql-community.html' title='Current events in the MySQL community'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8048945696333416683</id><published>2009-06-02T09:36:00.000-07:00</published><updated>2010-03-28T08:11:33.211-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ha'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='replication'/><title type='text'>On synchronous replication</title><content type='html'>Is synchronous replication possible in MySQL? Yes. Is it possible without major surgery to the existing code? Probably (or hopefully). Notes on an approach are at &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/MysqlSyncReplication"&gt;code.google.com&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The MySQL replication team may be working on this now for MySQL 6.0. They have spent a lot of time recently making replication flexible to support semi-sync and other new features. 
I assume they plan to support sync replication as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8048945696333416683?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8048945696333416683/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/06/on-synchronous-replication.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8048945696333416683'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8048945696333416683'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/06/on-synchronous-replication.html' title='On synchronous replication'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1243576527913734792</id><published>2009-06-01T15:16:00.000-07:00</published><updated>2010-03-28T08:11:33.211-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>Performance impact of prefetching in InnoDB</title><content type='html'>&lt;a href="http://dev.mysql.com/doc/refman/5.0/en/innodb-disk-io.html"&gt;InnoDB prefetches blocks&lt;/a&gt; when it detects multiple accesses to blocks within an extent. 
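&lt;br /&gt;&lt;br /&gt;The extent-based trigger described above can be pictured with a toy model. This Python sketch is not InnoDB's heuristic: it assumes 64 pages per extent (1MB of 16kb pages) and a hypothetical threshold of pages already read from the same extent, and it keeps two illustrative counters so that prefetch requests and pages read ahead are counted separately.&lt;br /&gt;&lt;pre&gt;
```python
# Toy model of extent-based read-ahead; not InnoDB's actual heuristic.
# Assumptions: 64 pages per extent (1MB of 16kb pages) and a
# hypothetical threshold of pages already read from the same extent.


class ReadAheadModel:
    """Decides when to prefetch the rest of an extent."""

    def __init__(self, threshold, extent_pages=64):
        self.threshold = threshold        # hypothetical trigger level
        self.extent_pages = extent_pages  # pages per extent
        self.cached = set()               # pages read or prefetched so far
        self.prefetch_requests = 0        # one per decision to prefetch
        self.pages_prefetched = 0         # one per page read ahead

    def access(self, page_no):
        """Record a read of page_no; return the pages to prefetch, if any."""
        self.cached.add(page_no)
        first = (page_no // self.extent_pages) * self.extent_pages
        extent = range(first, first + self.extent_pages)
        seen = sum(1 for p in extent if p in self.cached)
        if self.threshold > seen:
            return []
        missing = [p for p in extent if p not in self.cached]
        if missing:
            self.prefetch_requests += 1
            self.pages_prefetched += len(missing)
            self.cached.update(missing)  # prefetched pages are now cached
        return missing


if __name__ == "__main__":
    ra = ReadAheadModel(threshold=8)
    for page in range(8):
        result = ra.access(page)
    print(len(result), ra.prefetch_requests, ra.pages_prefetched)
```
&lt;/pre&gt;With threshold=8, reading pages 0 through 7 produces one prefetch request for the remaining 56 pages of the extent, so a per-request counter and a per-page counter would report 1 and 56 for the same event.&lt;br /&gt;&lt;br /&gt;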
Unfortunately, there are no metrics in the server to determine whether it is effective. The metrics that show how frequently it is done are also weak -- counters incremented each time the readahead code prefetches one or more blocks rather than once per prefetch request.&lt;br /&gt;&lt;br /&gt;There are cases where prefetch improves performance. A query that does a full table scan was run with prefetch enabled and disabled. It was 35% slower with prefetch disabled.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.mysqlperformanceblog.com/2007/06/26/can-innodb-read-ahead-reduce-read-performance/"&gt;Percona&lt;/a&gt; and &lt;a href="http://www.bigdbahead.com/?p=104"&gt;Matt&lt;/a&gt; have written about potential performance problems from this feature. There isn't much data to indicate when this feature should be enabled.&amp;nbsp; I have &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/InnodbIoPrefetch"&gt;published data for a few IO-bound benchmarks&lt;/a&gt;. On these tests, the prefetching done by InnoDB reduces performance. The tests run were:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://blogs.tokutek.com/tokuview/mysql_insert_performance_with_iibench_python_client/"&gt;insert benchmark&lt;/a&gt;&amp;nbsp;&lt;/li&gt;&lt;li&gt; &lt;a href="https://code.launchpad.net/%7Epercona-dev/perconatools/tpcc-mysql"&gt;tpcc-mysql&lt;/a&gt; with data cached by InnoDB&lt;/li&gt;&lt;li&gt;tpcc-mysql with too much data to cache&lt;/li&gt;&lt;/ul&gt;XtraDB has an &lt;a href="http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_io"&gt;option to disable readahead&lt;/a&gt;.&amp;nbsp; The v4 Google patch changes the SHOW STATUS counters for readahead to display the number of prefetch requests. 
An upcoming v4 Google patch will also have options to disable readahead.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Update&lt;/b&gt; -- my peers just reminded me to add support for per-session (dynamic) usage of the new my.cnf parameters (innodb_readahead_random, innodb_readahead_sequential).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1243576527913734792?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1243576527913734792/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/06/performance-impact-of-prefetching-in.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1243576527913734792'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1243576527913734792'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/06/performance-impact-of-prefetching-in.html' title='Performance impact of prefetching in InnoDB'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-8933481402946953350</id><published>2009-05-30T08:44:00.000-07:00</published><updated>2010-03-28T08:11:33.212-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>JavaOne and the Open HA Cluster Summit</title><content type='html'>Are you going to &lt;a 
href="http://java.sun.com/javaone/"&gt;JavaOne&lt;/a&gt;? I am not, but I will be at the &lt;a href="http://wikis.sun.com/display/OpenSolaris/Open+HA+Cluster+Summit+May+2009"&gt;Open HA Cluster Summit&lt;/a&gt; tomorrow. I will be part of a panel session on HA. I guess they needed someone with the &lt;i&gt;HA light&lt;/i&gt; perspective -- that is how do you get a highly available service when you don't get to use HA components like &lt;a href="http://www.mysql.com/cluster"&gt;MySQL Cluster&lt;/a&gt;. A lot of interesting work remains to be done to make this possible with regular MySQL. Projects like &lt;a href="https://launchpad.net/mysql-mmm"&gt;MMM&lt;/a&gt;, &lt;a href="http://scale-out-blog.blogspot.com/2009/04/tungsten-replicator-build-101-available.html"&gt;Tungsten&lt;/a&gt; and the Google patch with &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds"&gt;global transaction IDs&lt;/a&gt; are pieces that might eventually provide a complete solution. There is work underway at MySQL/Sun in the replication team as well. 
They may even be building the integrated solution for MySQL Enterprise.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-8933481402946953350?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/8933481402946953350/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/05/javaone-and-open-ha-cluster-summit.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8933481402946953350'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/8933481402946953350'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/05/javaone-and-open-ha-cluster-summit.html' title='JavaOne and the Open HA Cluster Summit'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-9018082899142668410</id><published>2009-05-28T11:33:00.000-07:00</published><updated>2010-03-28T08:11:33.212-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>InnoDB performance TODO list</title><content type='html'>These are my plans for making InnoDB faster on SMP and high-IOPs servers. 
I think we can double throughput at high levels of concurrency.&lt;br /&gt;&lt;br /&gt;Future work: &lt;br /&gt;&lt;ol&gt;&lt;li&gt;Reduce the size of mutex and rw-lock structures&lt;/li&gt;&lt;li&gt;Reduce contention on the sync array mutex&lt;/li&gt;&lt;li&gt;Reduce contention on kernel_mutex&lt;/li&gt;&lt;li&gt;Reduce contention on commit_prepare_mutex&lt;/li&gt;&lt;li&gt;Reduce the number of mutex lock/unlock calls used when a thread is put on the sync array&lt;/li&gt;&lt;li&gt;Name all events, rw-locks and mutexes in InnoDB to make contention statistics output useful&lt;/li&gt;&lt;li&gt;Add optional support to time all operations that may block&lt;/li&gt;&lt;li&gt;Convert dulint to native 64-bit integer types&lt;/li&gt;&lt;li&gt;Make BUF_READ_AHEAD_AREA a compile-time constant&lt;/li&gt;&lt;li&gt;Prevent full table scans from wiping out the InnoDB buffer cache&lt;/li&gt;&lt;li&gt;Make prefetching smarter&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Get feedback from Dimitri, Domas, Mikael and Percona&lt;/li&gt;&lt;li&gt;Use prefetch with &lt;a href="http://www.google.com/search?q=mrr+bka+sergey"&gt;MRR/BKA&lt;/a&gt; to get parallel IO in InnoDB&lt;/li&gt;&lt;li&gt;Investigate a larger doublewrite buffer to allow for more concurrent IOs&lt;/li&gt;&lt;li&gt;Make InnoDB work with a 4KB page size&lt;/li&gt;&lt;li&gt;Make trx_purge() faster when called by the main background thread&lt;/li&gt;&lt;li&gt;Use crc32 for InnoDB page checksums with hardware support, or otherwise make the checksum faster.&lt;/li&gt;&lt;li&gt;Reduce the &lt;a href="http://mysqlha.blogspot.com/2008/11/innodb-memory-overhead.html"&gt;per-page overhead&lt;/a&gt; for sync objects&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Repeat&lt;/li&gt;&lt;/ol&gt;Current work:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Add my.cnf options to disable InnoDB prefetch reads&lt;/li&gt;&lt;li&gt;Put more output in SHOW INNODB STATUS and SHOW STATUS&lt;/li&gt;&lt;li&gt;Reduce the overhead from buf_flush_free_margin()&lt;/li&gt;&lt;li&gt;Change 
background IO threads to use available IO capacity&lt;/li&gt;&lt;li&gt;Use more IO to merge insert buffer records when the insert buffer is full&lt;/li&gt;&lt;/ol&gt;Non-InnoDB work:&lt;br /&gt;&lt;ol&gt;&lt;li&gt; Fix mutex contention for the HEAP engine&lt;/li&gt;&lt;li&gt;Fix mutex contention for the MyISAM engine&lt;/li&gt;&lt;li&gt;Fix mutex contention for the query cache&lt;/li&gt;&lt;li&gt;Give priority (CPU, disk) to the replication SQL thread to minimize replication delay. &lt;/li&gt;&lt;li&gt;Push changes for --oltp-secondary-index to public sysbench branch&lt;/li&gt;&lt;li&gt;Add support to sysbench fileio for transaction log and doublewrite buffer IO patterns&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-9018082899142668410?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/9018082899142668410/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/05/innodb-performance-todo-list.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/9018082899142668410'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/9018082899142668410'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/05/innodb-performance-todo-list.html' title='InnoDB performance TODO list'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7123566388790586552</id><published>2009-05-28T10:17:00.000-07:00</published><updated>2010-03-28T08:11:33.213-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>InnoDB checksum performance</title><content type='html'>Once again &lt;a href="http://dammit.lt/2009/05/28/checksums-again-some-io-too/"&gt;Domas is unhappy&lt;/a&gt; with some aspect of InnoDB performance and is doing crazy things with gdb to tune it. I made it faster by changing the checksum code to process one 32-bit word at a time rather than one byte at a time. This will be in a future Google patch and is enabled with the parameter innodb_fast_checksum. It is not compatible with the old checksum, so you must dump and reload the database to use it.&lt;br /&gt;&lt;br /&gt;I measured the benefit using the &lt;a href="http://www.tokutek.com/benchmark.php"&gt;insert benchmark from Tokutek&lt;/a&gt; on a server that can do a lot of IO. CPU overheads are measured using oprofile. The data below lists the percentage of time for the top 4 functions in mysqld. The checksum is computed in buf_calc_page_new_checksum. 
By using the fast checksum, the checksum overhead drops from 33.6% to 22.1% for gcc -O2 and from 31.6% to 17.3% for gcc -O3.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Overhead for gcc -O2&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Using the original checksum code:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;33.6% - buf_calc_page_new_checksum&lt;/li&gt;&lt;li&gt;10.4% - memcpy&lt;/li&gt;&lt;li&gt;4.4% - os_aio_simulated_handle&lt;/li&gt;&lt;li&gt;4.3% - rec_get_offsets_func&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Using the fast checksum code:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;22.1% - buf_calc_page_new_checksum&lt;/li&gt;&lt;li&gt;12.1% - memcpy&lt;/li&gt;&lt;li&gt;5.1% - rec_get_offsets_func&lt;/li&gt;&lt;li&gt;4.9% - os_aio_simulated_handle&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Overhead for gcc -O3&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Using the original checksum code:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;31.6% - buf_calc_page_new_checksum&lt;/li&gt;&lt;li&gt;12.6% - memcpy&lt;/li&gt;&lt;li&gt;5.8% - rec_get_offsets_func&lt;/li&gt;&lt;li&gt;2.6% - os_aio_simulated_handle&lt;/li&gt;&lt;/ul&gt;Using the fast checksum code:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;17.3% - buf_calc_page_new_checksum&lt;/li&gt;&lt;li&gt;13.6% - memcpy&lt;/li&gt;&lt;li&gt;6.8% - rec_get_offsets_func&lt;/li&gt;&lt;li&gt;2.0% - os_aio_simulated_handle&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7123566388790586552?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7123566388790586552/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/05/innodb-checksum-performance.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7123566388790586552'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7123566388790586552'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/05/innodb-checksum-performance.html' title='InnoDB checksum performance'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5201791399186643558</id><published>2009-05-26T17:16:00.000-07:00</published><updated>2010-03-28T08:11:33.214-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>InnoDB IO performance and the v4 Google patch</title><content type='html'>The v4 patch has been &lt;a href="http://google-mysql-tools.googlecode.com/svn/trunk/mysql-patches/all.v3-mysql-5.0.37.patch.gz"&gt;published&lt;/a&gt;. A description of the changes and performance results are &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/InnodbIoPerformance"&gt;here&lt;/a&gt;. 
I am still analyzing the results to make sure that I can explain both the performance differences and the lack of performance differences.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5201791399186643558?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5201791399186643558/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/05/innodb-io-performance-and-v4-google.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5201791399186643558'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5201791399186643558'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/05/innodb-io-performance-and-v4-google.html' title='InnoDB IO performance and the v4 Google patch'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7694290707844884067</id><published>2009-05-22T22:51:00.000-07:00</published><updated>2010-03-28T08:11:33.215-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>A good reason to use innodb_file_per_table -- per-table IO statistics</title><content type='html'>I added support for per-tablespace IO 
statistics to InnoDB. This also provides per-table IO statistics when innodb_file_per_table is used. The stats are listed in SHOW INNODB STATUS, and the text below is the output when tpcc-mysql is run -- pardon the formatting. The code should appear at code.google.com real soon now.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;File IO statistics&lt;br /&gt;&amp;nbsp; ./test/warehouse.ibd 10 -- read: 4 requests, 4 pages, 0.00 secs, 0.72 msecs/r, write: 3 requests, 3 pages, 0.00 secs, 1.43 msecs/r&lt;br /&gt;&amp;nbsp; ./ibdata1 0 -- read: 30 requests, 203 pages, 0.03 secs, 0.99 msecs/r, write: 124 requests, 3020 pages, 0.74 secs, 5.93 msecs/r&lt;br /&gt;&amp;nbsp; ./test/orders.ibd 29 -- read: 8490 requests, 10033 pages, 8.48 secs, 1.00 msecs/r, write: 6754 requests, 12728 pages, 34.27 secs, 5.07 msecs/r&lt;br /&gt;&amp;nbsp; ./test/customer.ibd 28 -- read: 33901 requests, 34226 pages, 32.05 secs, 0.95 msecs/r, write: 11224 requests, 11850 pages, 43.17 secs, 3.85 msecs/r&lt;br /&gt;&amp;nbsp; ./test/stock.ibd 27 -- read: 151957 requests, 176913 pages, 256.89 secs, 1.69 msecs/r, write: 41475 requests, 52199 pages, 220.43 secs, 5.31 msecs/r&lt;br /&gt;&amp;nbsp; ./test/order_line.ibd 25 -- read: 14239 requests, 14876 pages, 13.10 secs, 0.92 msecs/r, write: 11610 requests, 38413 pages, 45.01 secs, 3.88 msecs/r&lt;br /&gt;&amp;nbsp; ./test/new_orders.ibd 22 -- read: 2023 requests, 2316 pages, 1.80 secs, 0.89 msecs/r, write: 1213 requests, 7004 pages, 7.58 secs, 6.25 msecs/r&lt;br /&gt;&amp;nbsp; ./test/history.ibd 21 -- read: 5740 requests, 7711 pages, 5.64 secs, 0.98 msecs/r, write: 4938 requests, 22754 pages, 27.97 secs, 5.66 msecs/r&lt;br /&gt;&amp;nbsp; ./test/district.ibd 18 -- read: 15 requests, 15 pages, 0.01 secs, 0.78 msecs/r, write: 8 requests, 31 pages, 0.02 secs, 3.02 msecs/r&lt;br /&gt;&amp;nbsp; ./test/item.ibd 16 -- read: 757 requests, 904 pages, 0.67 secs, 0.89 msecs/r, write: 
0 requests, 0 pages, 0.00 secs, 0.00 msecs/r&lt;br /&gt;&amp;nbsp; ./ib_logfile0 4294967280 -- read: 6 requests, 9 pages, 0.00 secs, 0.02 msecs/r, write: 25630 requests, 25877 pages, 0.56 secs, 0.02 msecs/r&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7694290707844884067?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7694290707844884067/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/05/good-reason-to-use-inodbfilepertable.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7694290707844884067'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7694290707844884067'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/05/good-reason-to-use-inodbfilepertable.html' title='A good reason to use innodb_file_per_table -- per-table IO statistics'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7782225283088604991</id><published>2009-05-12T12:44:00.000-07:00</published><updated>2010-03-28T08:11:33.216-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ha'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='replication'/><title 
type='text'>Patch for global transaction IDs, binlog event checksums and crash-safe replication state</title><content type='html'>Justin just added a patch for &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/GlobalTransactionIds"&gt;global transaction IDs&lt;/a&gt;, &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/BinlogEventChecksums"&gt;binlog event checksums&lt;/a&gt; and crash-safe replication state. It is at &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches"&gt;code.google.com&lt;/a&gt;. This patch is based on MySQL 5.0.68, so Justin did a bit of work to port code forward from the version we use (5.0.37).&lt;br /&gt;&lt;br /&gt;Well, I assume that this includes support for crash-safe replication state. This replaces &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/TransactionalReplication"&gt;transactional replication&lt;/a&gt;, but it works for all storage engines.&lt;br /&gt;&lt;br /&gt;Percona has ported a few of the replication features from previous Google patches. Hopefully they will be interested in these changes. MySQL has semi-sync replication in 6.0 with a promise to backport it to 5.4. 
Perhaps these changes will end up there too.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7782225283088604991?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7782225283088604991/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/05/patch-for-global-transaction-ids-binlog.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7782225283088604991'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7782225283088604991'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/05/patch-for-global-transaction-ids-binlog.html' title='Patch for global transaction IDs, binlog event checksums and crash-safe replication state'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7556871304760974466</id><published>2009-04-30T08:26:00.000-07:00</published><updated>2010-03-28T08:11:33.216-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='ha'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Vendor lock in and MySQL documentation</title><content type='html'>Part of the sales pitch for MySQL is that there is less risk of vendor lock in. 
This is repeated frequently in their marketing material &lt;a href="http://mysql.com/why-mysql/topreasons_vp.html"&gt;here&lt;/a&gt;, &lt;a href="http://mysql.com/why-mysql/topreasons_pm.html"&gt;here&lt;/a&gt;, &lt;a href="http://mysql.com/why-mysql/topreasons_dba.html"&gt;here&lt;/a&gt; and &lt;a href="http://mysql.com/why-mysql/topreasons_cio.html"&gt;here&lt;/a&gt;. The explanation is that the source code for MySQL is available under the GPL, and if you are unhappy with MySQL the company, you can continue using MySQL the product and get support elsewhere.&lt;br /&gt;&lt;br /&gt;The documentation does not have a similar license. You can decide whether this creates the risk of vendor lock-in. Details are &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/index.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;We cannot edit it.&lt;/li&gt;&lt;li&gt;We have limited rights to publish it.&lt;/li&gt;&lt;/ul&gt;Isn't it in the best interests of Sun/MySQL to address this issue and reassure potential customers?&lt;br /&gt;&lt;br /&gt;&lt;a href="http://openquery.com/blog/mysql-docs-freedom"&gt;Arjen&lt;/a&gt;, &lt;a href="http://www.pythian.com/news/2274/mysql-documentation-licensing-woes"&gt;Sheeri&lt;/a&gt;, &lt;a href="http://www.xaprb.com/blog/2009/05/08/please-re-license-the-mysql-documentation/"&gt;Baron&lt;/a&gt; and the &lt;a href="http://blogs.sun.com/mysqlf/entry/mysql_documentation_no_license_change"&gt;lead for the MySQL docs team&lt;/a&gt; have also written about this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7556871304760974466?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7556871304760974466/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://mysqlha.blogspot.com/2009/04/vendor-lock-in-and-mysql-documentation.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7556871304760974466'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7556871304760974466'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/vendor-lock-in-and-mysql-documentation.html' title='Vendor lock in and MySQL documentation'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6529103590555361138</id><published>2009-04-28T13:39:00.000-07:00</published><updated>2010-03-28T08:11:33.217-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>InnoDB on IO bound workloads</title><content type='html'>I ran the &lt;a href="https://code.launchpad.net/%7Emdcallag/mysql-patch/mytools"&gt;iibench&lt;/a&gt; test using a server with 2 CPU cores, 2 disks in SW RAID 0 with a 1 MB stripe, 2 GB of RAM and XFS. If you just want a summary: software changes can make InnoDB run much faster on the same hardware. 
There is a lot of opportunity, but certainly not enough to catch &lt;a href="http://blogs.tokutek.com/tokuview/tokudb_storage_engine_for_mysql/"&gt;TokuDB&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The binaries tested are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.percona.com/mysql/5.1.34-5/"&gt;XtraDB 1.0.3-5&lt;/a&gt; &lt;br /&gt;&lt;/li&gt;&lt;li&gt;InnoDB with &lt;a href="http://dev.mysql.com/downloads/mysql/5.0.html#source"&gt;MySQL 5.0.77&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches"&gt;v3 Google patch&lt;/a&gt; dated April 12&lt;/li&gt;&lt;/ul&gt;There were two tests:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Time to insert 50M rows into an empty table&lt;/li&gt;&lt;li&gt;Time to insert several million rows into a table with 50M rows&lt;/li&gt;&lt;/ol&gt;I disabled the InnoDB doublewrite buffer for all tests because I want to compare the results to a server that doesn't use that level of safety.&lt;br /&gt;&lt;br /&gt;The my.cnf parameters for 5.0.77 are:&lt;br /&gt;&lt;blockquote&gt;innodb_buffer_pool_size=1G&lt;br /&gt;innodb_log_file_size=1900M&lt;br /&gt;innodb_flush_log_at_trx_commit=1&lt;br /&gt;innodb_flush_method=O_DIRECT&lt;br /&gt;innodb_max_dirty_pages_pct=20&lt;br /&gt;innodb_doublewrite=0&lt;/blockquote&gt;The my.cnf parameters for the v3 Google patch are:&lt;br /&gt;&lt;blockquote&gt;innodb_buffer_pool_size=1G&lt;br /&gt;innodb_log_file_size=1900M&lt;br /&gt;innodb_flush_log_at_trx_commit=1&lt;br /&gt;innodb_flush_method=O_DIRECT&lt;br /&gt;innodb_io_capacity=250&lt;br /&gt;innodb_read_io_threads=2&lt;br /&gt;innodb_write_io_threads=2&lt;br /&gt;innodb_max_dirty_pages_pct=20&lt;br /&gt;innodb_ibuf_max_pct_of_buffer=10&lt;br /&gt;innodb_ibuf_reads_sync=1&lt;br /&gt;innodb_doublewrite=0&lt;/blockquote&gt;The my.cnf parameters for XtraDB are:&lt;br /&gt;&lt;blockquote&gt;innodb_log_file_size=1900M&lt;br /&gt;innodb_buffer_pool_size=1G&lt;br /&gt;innodb_flush_log_at_trx_commit=1&lt;br 
/&gt;innodb_flush_method=O_DIRECT&lt;br /&gt;innodb_io_capacity=250&lt;br /&gt;innodb_use_sys_malloc=0&lt;br /&gt;innodb_read_io_threads=2&lt;br /&gt;innodb_write_io_threads=2&lt;br /&gt;innodb_max_dirty_pages_pct=20&lt;br /&gt;innodb_ibuf_max_size=100M&lt;br /&gt;innodb_ibuf_active_contract=1&lt;br /&gt;innodb_ibuf_accel_rate=200&lt;br /&gt;innodb_doublewrite=0&lt;/blockquote&gt;All performance results are anonymized as binaries X, Y and Z. Maybe I can monetize my performance testing effort by doing this. I probably need a new disk soon.&lt;br /&gt;&lt;br /&gt;The first result is the time, in seconds, to insert 50M rows into an empty table. The difference is not that significant. However, the results can be misleading. 5.0.77 has much more pending work at the end of the test (dirty pages and insert buffer entries). That also made 5.0.77 much slower near the end of the test, but I will save the graphs for the next set of results. The test is run using the run_ib found in the link for iibench at the start of this page. The command line is:&lt;br /&gt;&lt;blockquote&gt;bash run.sh 1 $path no root pw test no innodb 50000000 innodb yes yes $binary&lt;/blockquote&gt;And the results are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;24478 seconds -- binary X&lt;br /&gt;&lt;/li&gt;&lt;li&gt;21016 seconds -- binary Y&lt;/li&gt;&lt;li&gt;37146 seconds -- binary Z&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The second result is the time to insert 3,380,000 rows into a table that starts with 50M rows on a cold server (no entries in the insert buffer, no dirty pages, server restarted). Queries are run continuously by 4 threads concurrently with the inserts. The test is run using run_ib from the iibench link at the top of this page. 
The command line is:&lt;br /&gt;&lt;blockquote&gt;bash run.sh 1 $path no root pw test no innodb 10000000 innodb no no $binary&lt;/blockquote&gt;And the results are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;38673 seconds -- binary X&lt;/li&gt;&lt;li&gt; 11143 seconds -- binary Y&lt;/li&gt;&lt;li&gt;21018 seconds -- binary Z&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Finally, the graph for row insert rate over time. Note that the graphs for binaries Y and Z don't extend to the right because they inserted the 3.3M rows much faster.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_3rU41dez5TI/SfdpAXPvn_I/AAAAAAAAAUU/lCTD0d1j-iU/s1600-h/rir.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/_3rU41dez5TI/SfdpAXPvn_I/AAAAAAAAAUU/lCTD0d1j-iU/s320/rir.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6529103590555361138?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6529103590555361138/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/innodb-on-io-bound-workloads.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6529103590555361138'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6529103590555361138'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/innodb-on-io-bound-workloads.html' title='InnoDB on IO bound workloads'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_3rU41dez5TI/SfdpAXPvn_I/AAAAAAAAAUU/lCTD0d1j-iU/s72-c/rir.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-206209425952806308</id><published>2009-04-27T15:15:00.000-07:00</published><updated>2010-03-28T08:11:33.217-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Hack on Drizzle, get paid!</title><content type='html'>Did you know that Rackspace has a cloud offering? I didn't. The &lt;a href="http://www.mosso.com/"&gt;name is Mosso&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Someone from Rackspace/Mosso on the drizzle-discuss mailing list offered to hire a person full-time to work on &lt;a href="http://drizzle.org/"&gt;Drizzle&lt;/a&gt;. Curious? Find the post on the mailing list.&lt;br /&gt;&lt;br /&gt;Drizzle has a lot of potential for making it easier to run a DBMS server in the cloud. There are a few things that need to be done differently from traditional MySQL replication. Drizzle has started over and has removed the code inherited from MySQL. Their focus is a clean API (right, Jay?). 
There should be a lot of interesting work that can be done.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-206209425952806308?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/206209425952806308/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/hack-on-drizzle-get-paid.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/206209425952806308'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/206209425952806308'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/hack-on-drizzle-get-paid.html' title='Hack on Drizzle, get paid!'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7147698690232004842</id><published>2009-04-27T12:20:00.000-07:00</published><updated>2010-03-28T08:11:33.218-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Why must you insult us Matt?</title><content type='html'>Surely there must be a better way to get your point across. Well, I assume this is an insult because you don't appear to be too hippie-ish. 
But if you really are the expert in open-source business that you claim to be, you probably can do better than to describe part of the MySQL community as an &lt;b&gt;open-source hippie commune&lt;/b&gt; that displays &lt;b&gt;hippie-esque tendencies&lt;/b&gt;, unless they self-identify as that. Although that will be a fair description once we start holding the user conferences at Burning Man and &lt;a href="http://www.oregoncountryfair.org/"&gt;Country Fair&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I haven't linked the blog in question because I don't want to promote it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7147698690232004842?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7147698690232004842/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/why-must-you-insult-us-matt.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7147698690232004842'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7147698690232004842'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/why-must-you-insult-us-matt.html' title='Why must you insult us Matt?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-6493623872758740516</id><published>2009-04-26T06:10:00.000-07:00</published><updated>2010-03-28T08:11:33.219-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>Slides for MySQL User Conference 2009 Talks</title><content type='html'>Slides for my talks. Code described here is in the &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches"&gt;v3 Google patch&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;iframe frameborder="0" height="342" src="http://docs.google.com/EmbedSlideshow?docid=dhngrkwh_9qsv8n9gm" width="410"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br /&gt;&lt;iframe frameborder="0" height="342" src="http://docs.google.com/EmbedSlideshow?docid=dhngrkwh_7fn256bdj" width="410"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-6493623872758740516?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/6493623872758740516/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/slides-for-mysql-user-conference-2009.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6493623872758740516'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/6493623872758740516'/><link rel='alternate' type='text/html' 
href='http://mysqlha.blogspot.com/2009/04/slides-for-mysql-user-conference-2009.html' title='Slides for MySQL User Conference 2009 Talks'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1866192532496729981</id><published>2009-04-25T13:37:00.000-07:00</published><updated>2010-03-28T08:11:33.219-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><category scheme='http://www.blogger.com/atom/ns#' term='replication'/><title type='text'>What is next?</title><content type='html'>In his keynote Baron reminded us that we need to focus on what we can do to improve community MySQL rather than wait for things to get done by the corporate owners. What will you do?&lt;br /&gt;&lt;br /&gt;Many of us will continue to add high-end features to MySQL. It will be great if those features make it into an official MySQL release. 
It will be a great business opportunity for the community if they do not.&lt;br /&gt;&lt;br /&gt;In the short term, I have some things to do:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;run IO bound tests (iibench) for PBXT and provide the results to the PBXT team with a comparison to the v3 Google patch&lt;br /&gt;&lt;/li&gt;&lt;li&gt;run IO bound tests (iibench) for XtraDB and provide the results to Percona with a comparison to the v3 Google patch&lt;br /&gt;&lt;/li&gt;&lt;li&gt;publish more documentation for features in the v3 Google patch (support for roles, more changes to improve IO performance, more details on row-change logging)&lt;/li&gt;&lt;li&gt;read the docs and evaluate &lt;a href="http://www.innodb.com/products/embedded-innodb/"&gt;embedded InnoDB&lt;/a&gt;&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;Other people on my team also plan to share more details and code:&lt;br /&gt;&lt;ul&gt;&lt;li&gt; Ben is working on a backport of the pool-of-threads code to MySQL 5.0. While the backport itself to 5.0.37 might not help those using 5.1 or recent 5.0 versions, we have also fixed the SMP performance problems in the pool-of-threads code and that change is isolated to a single file. Others will be able to use it (but only on Linux as it uses epoll directly).&lt;/li&gt;&lt;li&gt;Justin may publish a patch for a recent version of 5.0 that only includes the changes for global group IDs, binlog event checksums and crash safe replication that works for all storage engines.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Much of the big patch is not easy to consume. But there is an answer to that. If you really want the feature then you can hire someone to port it to a recent 5.0 or 5.1 release. Rumor has it that &lt;a href="http://www.percona.com/"&gt;Percona&lt;/a&gt; has done just that with several features. This makes me happy and proud.&lt;br /&gt;&lt;br /&gt;InnoDB has continued to add valuable features to their releases. 
The most recent is &lt;a href="http://www.innodb.com/products/embedded-innodb/"&gt;embedded InnoDB&lt;/a&gt;. It can be used to do some very interesting things. But first I must read the docs. They have also added a lot of new functionality to the 5.1 branch via the InnoDB plugin. This includes fast index creation, compression and SMP performance improvements. This is a big deal as the standard response and practice from MySQL is that new features cannot go into a production branch.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1866192532496729981?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1866192532496729981/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/what-is-next.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1866192532496729981'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1866192532496729981'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/what-is-next.html' title='What is next?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-738728085911094866</id><published>2009-04-24T09:08:00.000-07:00</published><updated>2010-03-28T08:11:33.220-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title 
type='text'>New storage engines for MySQL -- rocket science or great engineering?</title><content type='html'>There were several new storage engine vendors at the MySQL Conference. I spoke at length with people from &lt;a href="http://www.virident.com/"&gt;Virident&lt;/a&gt;, &lt;a href="http://www.tokutek.com/"&gt;Tokutek&lt;/a&gt; and &lt;a href="http://www.schoonerinfotech.com/"&gt;Schooner&lt;/a&gt; about their technology. Their products are impressive, and I look forward to more performance details from them for both read-intensive and write-intensive workloads. Tokutek has already published performance results for an insert-intensive workload and then worked with me to improve the InnoDB results and the test code so that others can run it. The &lt;a href="http://tokutek.com/technology.php"&gt;code and results are here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One test I want all of them to run: run a write-intensive workload so that InnoDB accumulates many dirty pages in the buffer pool and many entries in the insert buffer, kill mysqld, and then measure how long the server takes to perform crash recovery. This should be compared between InnoDB on commodity hardware, InnoDB on Virident and Schooner hardware, and TokuDB on commodity hardware. I suspect that the results will be impressive for the new storage engines.&lt;br /&gt;&lt;br /&gt;I use the term &lt;b&gt;rocket science&lt;/b&gt; because a lot of vendors will have you believe that they have something special. In this case, I believe that each of the vendors really does have something special. But of course, more results will help us understand what they can do. Each of them has also chosen a path that doesn't require a huge investment to build a product: they have limited their software and hardware investments to areas where they have a lot of value to add, and that makes it more likely that they can deliver on their promises.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;TokuDB is software-only. 
But this is really clever software. They have implemented a new algorithm that significantly reduces random IO for write-intensive workloads. Other algorithms already do this, for example &lt;a href="http://www.google.com/search?q=log+structured+merge+tree"&gt;Log-Structured Merge Trees&lt;/a&gt;, and I have even &lt;a href="http://portal.acm.org/citation.cfm?id=1453856.1453914&amp;amp;coll=GUIDE&amp;amp;dl=GUIDE&amp;amp;idx=J1174&amp;amp;part=journal&amp;amp;WantType=Journals&amp;amp;title=Proceedings%20of%20the%20VLDB%20Endowment"&gt;published a paper&lt;/a&gt; on this at VLDB. But TokuDB may be much better than previously known approaches.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Most of the hardware expense for Virident is isolated to one component that implements industry-standard interfaces and can plug into commodity servers. Their software investment is focused on improving pieces of MySQL/InnoDB to leverage their hardware. They have been able to improve on the work of others in the InnoDB developer community.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Schooner uses mostly commodity hardware and adds value in the integration of that hardware. Their software investment in MySQL also appears to be focused on improving pieces of it rather than replacing it, and they have been able to improve on the work of others.&lt;/li&gt;&lt;/ul&gt;TokuDB allows a small server to handle a much larger workload. This has many benefits, including reduced power consumption and less need to shard or add shards to a large MySQL deployment. I think they also use much less disk space than InnoDB. Their technical staff explained the math behind the algorithms that justify their performance. Math is hard so this took some time, but eventually I kind of understood, and I believe in their results. 
Their approach will enable many more optimizations in the future.&lt;br /&gt;&lt;br /&gt;The servers from Virident and Schooner are optimized for InnoDB, so it should be easy for existing users to try them out. I expect ridiculously high throughput results from both of them. Schooner hardware is easier to understand because it provides much better IO performance (and many other benefits). They also appear to have designed a balanced system so that peak and actual performance won't be too far apart. Oracle has done this with the Exadata machine, and it is very nice to see a similar effort from Schooner.&lt;br /&gt;&lt;br /&gt;Virident uses NOR flash to provide fast, byte-level access (as opposed to reading a disk page at a time). This takes more time to understand. It is almost as if they have reduced the InnoDB page size to the size of a row, so much less data is transferred when reading rows randomly. MySQL loves to access rows randomly, and reading less data means less wasted effort and should mean less contention on shared resources on many-core and multi-core servers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-738728085911094866?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/738728085911094866/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/new-storage-engines-for-mysql-rocket.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/738728085911094866'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/738728085911094866'/><link rel='alternate' type='text/html' 
href='http://mysqlha.blogspot.com/2009/04/new-storage-engines-for-mysql-rocket.html' title='New storage engines for MySQL -- rocket science or great engineering?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2974121277946283786</id><published>2009-04-23T07:42:00.000-07:00</published><updated>2010-03-28T08:11:33.220-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ha'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Cool things you can almost do with replication</title><content type='html'>We added support for row-change logging to MySQL 5.0. The logged data is similar to row-based replication with changes to the output that make it much easier to parse. 
Gene Pang &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6780"&gt;describes this work at 2pm&lt;/a&gt; at the conference.&lt;br /&gt;&lt;br /&gt;What might be done with this data?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;replicate row changes to a data store that is not MySQL (Teradata, HBase/Hypertable, memcached)&lt;/li&gt;&lt;li&gt;materialized view maintenance&lt;/li&gt;&lt;li&gt;change notification&lt;/li&gt;&lt;/ul&gt;And I &lt;a href="http://tinyurl.com/dzshl2"&gt;talk&lt;/a&gt; at the Percona Performance Conference &lt;a href="http://conferences.percona.com/percona-performance-conference-2009/schedule.html"&gt;at 10:50am today&lt;/a&gt; on the InnoDB IO architecture.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2974121277946283786?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2974121277946283786/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/cool-things-you-can-almost-do-with.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2974121277946283786'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2974121277946283786'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/cool-things-you-can-almost-do-with.html' title='Cool things you can almost do with replication'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-3026636427627894993</id><published>2009-04-22T11:22:00.000-07:00</published><updated>2010-03-28T08:11:33.221-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Where is the Calpont code?</title><content type='html'>&lt;a href="http://www.calpont.com/"&gt;Calpont&lt;/a&gt; has a talk on their MPP column-store storage engine for MySQL at 2PM today. The talk title is &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/8997"&gt;Open Source Columnar Storage Engine&lt;/a&gt;. It sounds interesting, especially if the source is made available, because then many people can try it out. But the source isn't available today. Where is the source?&lt;br /&gt;&lt;br /&gt;Note that Calpont doesn't mind that I am asking about this in public.&lt;br /&gt;&lt;br /&gt;Other questions I have include:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Does it implement the condition pushdown interface?&lt;/li&gt;&lt;li&gt;Will it implement the batch key access interface?&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-3026636427627894993?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/3026636427627894993/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/where-is-calpont-code.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3026636427627894993'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3026636427627894993'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/where-is-calpont-code.html' title='Where is the Calpont code?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-621055651315989964</id><published>2009-04-22T09:34:00.000-07:00</published><updated>2010-03-28T08:11:33.221-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Really cool features in the Google patch</title><content type='html'>Justin and Ben talk today at 4:25pm on &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6903"&gt;features for SMP performance and high availability&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Ben is an expert on InnoDB internals related to SMP performance. He designed and implemented the faster rw-mutex changes that are now in the 1.0.3 InnoDB plugin and MySQL 5.4 and have made MySQL much faster on SMP servers. More recently he changed InnoDB to significantly reduce mutex contention on the transaction log and buffer pool mutexes. This makes InnoDB 20% faster on sysbench and other read-write workloads. Right now he is finishing the backport of the pool-of-threads code from MySQL 6 to 5.0 and making it scale on SMP.&lt;br /&gt;&lt;br /&gt;Justin is a replication expert. He added support for global transaction IDs to automate slave failover, made replication slaves crash-safe, added checksums for binlog events and fixed many bugs in replication.&lt;br /&gt;&lt;br /&gt;They both have done very interesting work. 
And we don't just build these features, we also run them in production soon after adding them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-621055651315989964?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/621055651315989964/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/really-cool-features-in-google-patch.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/621055651315989964'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/621055651315989964'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/really-cool-features-in-google-patch.html' title='Really cool features in the Google patch'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1995079318277265956</id><published>2009-04-22T07:15:00.000-07:00</published><updated>2010-03-28T08:11:33.222-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb'/><title type='text'>InnoDB on high IOPs servers (SSD)</title><content type='html'>How does InnoDB do on high IOPs servers? Thanks to SSD, many of us will soon have such servers. 
I will provide more details in my talks:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6653"&gt;MySQL performance in the cloud&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://conferences.percona.com/percona-performance-conference-2009/schedule.html"&gt;Life of a Dirty Page&lt;/a&gt; (&lt;a href="http://docs.google.com/Presentation?id=dhngrkwh_7fn256bdj"&gt;slides are here&lt;/a&gt;)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The summary is that InnoDB is very effective at using the read capacity of a high IOPs server, but it has problems using enough of the write capacity. Fortunately, this is InnoDB, and the problem is easy to fix. Many of the fixes are in the &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches"&gt;v3 Google patch&lt;/a&gt;. Many are also in the &lt;a href="http://www.percona.com/"&gt;Percona&lt;/a&gt; patches and builds. 
However, there may still be one fix in the v3 Google patch that Percona has yet to implement.&lt;br /&gt;&lt;br /&gt;Note that I may mention the P word a few times in my talks today and tomorrow.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-1995079318277265956?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1995079318277265956/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/innodb-on-high-iops-servers-ssd.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1995079318277265956'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1995079318277265956'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/innodb-on-high-iops-servers-ssd.html' title='InnoDB on high IOPs servers (SSD)'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2580310519582177456</id><published>2009-04-19T17:10:00.000-07:00</published><updated>2010-03-28T08:11:33.222-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Hack MySQL at MySQL Camp on Monday</title><content type='html'>I host the &lt;a href="http://forge.mysql.com/wiki/MySQLCamp2009Sessions#Monday_9:00_am_-_12:00_pm"&gt;MySQL Hackfest&lt;/a&gt; at MySQL Camp on Monday. 
Bring a laptop that is set up to build MySQL. I will have a few CDs with MySQL 5.0.37, the v3 Google patch for MySQL 5.0.37, and a few other releases of MySQL 5. Possible projects that can be started (but maybe not finished) include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Parse the output from the &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6780"&gt;row-change log&lt;/a&gt; and convert it to protocol buffers.&lt;/li&gt;&lt;li&gt;Apply the output from the &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6780"&gt;row-change log&lt;/a&gt; to another data store (RDBMS, HBase, Hypertable).&lt;/li&gt;&lt;li&gt;Add a new SHOW command.&lt;/li&gt;&lt;li&gt;Add a new SQL function.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2580310519582177456?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2580310519582177456/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/hack-mysql-at-mysql-camp-on-monday.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2580310519582177456'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2580310519582177456'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/hack-mysql-at-mysql-camp-on-monday.html' title='Hack MySQL at MySQL Camp on Monday'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2604602862081468986</id><published>2009-04-16T11:49:00.000-07:00</published><updated>2010-03-28T08:11:33.223-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Everything I know about running MySQL on Amazon EC2...</title><content type='html'>... I learned from excellent documentation written by &lt;a href="http://www.anvilon.com/"&gt;Eric Hammond&lt;/a&gt;. For someone just starting out with EC2 (me), the documentation is great. It has made it very easy for me to try things out and repeat many of the performance tests I run for the v2 Google patch.&lt;br /&gt;&lt;br /&gt;I needed to set up RAID on the local storage -- &lt;a href="http://groups.google.com/group/ec2ubuntu/web/raid-0-on-ec2-ebs-volumes-elastic-block-store-using-mdadm?pli=1"&gt;not a problem&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I needed to set up RAID over &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1667&amp;amp;ref=featured"&gt;several EBS&lt;/a&gt; volumes -- &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?categoryID=112&amp;amp;externalID=1663"&gt;not a problem&lt;/a&gt;. Just be sure to follow the instructions and use XFS rather than ext2 or ext3 (which don't allow concurrent writes to a file).&lt;br /&gt;&lt;br /&gt;Clouds like EC2, whether internal or external, represent an interesting environment in which to deploy MySQL. Many of the guarantees that a stable MySQL environment relied on in the past might not hold anymore. For example, it isn't so easy to guarantee that the server is always shut down cleanly (yes, MyISAM and replication state files, I am looking at you). 
But once a few of the problems are solved, then it becomes much easier for a few people to manage a large number of servers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-2604602862081468986?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2604602862081468986/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/everything-i-know-about-running-mysql.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2604602862081468986'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2604602862081468986'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/everything-i-know-about-running-mysql.html' title='Everything I know about running MySQL on Amazon EC2...'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-5933659558362214190</id><published>2009-04-14T21:15:00.000-07:00</published><updated>2010-03-28T08:11:33.223-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Talks at the (free) MySQL Camp</title><content type='html'>David Lutz will talk about &lt;a href="http://forge.mysql.com/wiki/MySQLCamp2009Sessions"&gt;Predicting Performance with Queueing Models&lt;/a&gt; at the (free) MySQL Camp on Thursday at 2PM. I will be there. 
Good performance testing includes tests and an explanation of the results. The results are frequently much more useful when the explanation includes a performance model. I usually skip the model (math is hard). My published results would be more useful were I to include the models. I have a few books by &lt;a href="http://www.perfdynamics.com/"&gt;Neil Gunther&lt;/a&gt; gathering dust at home that I really need to read.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches"&gt;v3 Google patch&lt;/a&gt; now includes support for row-change logging. The code has a few bugs that we are fixing ASAP, but it is ready for testing. There is a &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6780"&gt;talk on this&lt;/a&gt; at the regular conference. We can try it out at the &lt;a href="http://forge.mysql.com/wiki/MySQLCamp2009Sessions"&gt;MySQL Hackfest&lt;/a&gt; at the (free) MySQL Camp on Monday morning. Row-change logging generates one line of text per changed row that describes the change. 
It is easy to parse and can be used to:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Maintain a copy of the MySQL table in another RDBMS or in a scalable data structure such as HBase or Hypertable.&lt;/li&gt;&lt;li&gt;Implement a change notification service.&lt;/li&gt;&lt;li&gt;Maintain materialized views.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-5933659558362214190?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/5933659558362214190/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/talks-at-free-mysql-camp.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5933659558362214190'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/5933659558362214190'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/talks-at-free-mysql-camp.html' title='Talks at the (free) MySQL Camp'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-7945983536714981664</id><published>2009-04-13T17:03:00.000-07:00</published><updated>2010-03-28T08:11:33.224-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Battle of the hot boxes</title><content type='html'>Two companies are in the process of launching MySQL appliances: 
&lt;a href="http://www1.schoonerinfotech.com/"&gt;Schooner Infotech&lt;/a&gt; and &lt;a href="http://www.virident.com/index.php"&gt;Virident&lt;/a&gt;. Both incorporate flash in some form and it will be interesting to find out what value they add. I also want to know how they overcome some of the performance limits in InnoDB for multi-core and high-IOPs servers. The problems on multi-core servers are well known. The problems on high-IOPs servers are slowly becoming understood and there are fixes in the &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches"&gt;v3 Google patch&lt;/a&gt; and in &lt;a href="http://www.percona.com/percona-lab.html"&gt;Percona binaries&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Do the appliances use official versions of MySQL? The schooner web site states that InnoDB 1.0.3 is used, which goes a long way towards improving performance on servers with 8+ cores.&lt;br /&gt;&lt;br /&gt;I will describe some of the IO problems and improvements during my talks at the &lt;a href="http://conferences.percona.com/percona-performance-conference-2009/schedule.html"&gt;Percona Performance Conference&lt;/a&gt; and the &lt;a href="http://www.mysqlconf.com/mysql2009/public/schedule/detail/6653"&gt;MySQL Conference&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5915567578707286635-7945983536714981664?l=mysqlha.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/7945983536714981664/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/battle-of-hot-boxes.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7945983536714981664'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5915567578707286635/posts/default/7945983536714981664'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/battle-of-hot-boxes.html' title='Battle of the hot boxes'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-2678281939106527213</id><published>2009-04-10T08:17:00.000-07:00</published><updated>2010-03-28T08:11:33.224-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>MySQL has a new storage engine for DB2?</title><content type='html'>MySQL has a new storage engine for DB2 on &lt;a href="http://en.wikipedia.org/wiki/AS400"&gt;IBM System i&lt;/a&gt;. System i used to be known as iSeries, which used to be known as AS/400. With a name change rate of 1 per decade, it has at least 3 more decades as a viable system (2020 -- renamed to &lt;b&gt;the i&lt;/b&gt;, 2030 -- renamed to &lt;b&gt;i&lt;/b&gt;, 2040 -- renamed to &lt;b&gt;''&lt;/b&gt;).&lt;br /&gt;&lt;br /&gt;I would describe this as a hybrid engine because it provides an interface to an existing DBMS and is tightly integrated with that DBMS. By comparison, the Federated storage engine also provides an interface to an existing DBMS, but it is not tightly integrated. The integration should greatly increase usability and performance. 
Performance can be further improved by adding support for &lt;a href="http://bugs.mysql.com/bug.php?id=44206"&gt;condition pushdown&lt;/a&gt; for MySQL 5.1 and by implementing the batch key access interface when the engine is available in MySQL 6.&lt;br /&gt;&lt;br /&gt;A few more comments:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;My favorite line from the &lt;a href="http://en.wikipedia.org/wiki/AS/400"&gt;Wikipedia page&lt;/a&gt; is &lt;i&gt;&lt;b&gt;on the System i everything is an object. &lt;/b&gt;&lt;/i&gt;Profound. &lt;/li&gt;&lt;li&gt;Where is the storage-engine-independent test suite that makes it easy to test a new engine? The current suite is hardwired to use MyISAM and InnoDB for most tests.&lt;/li&gt;&lt;li&gt;Where is the virtual machine image that will allow us to try out System i with MySQL and the new engine using our Mac/Win/Linux boxes?&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/2678281939106527213/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/mysql-has-new-storage-engine-for-db2.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2678281939106527213'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/2678281939106527213'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/mysql-has-new-storage-engine-for-db2.html' title='MySQL has a new storage engine for DB2?'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-3957603213479316568</id><published>2009-04-09T13:19:00.000-07:00</published><updated>2010-03-28T08:11:33.225-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>An obscure bug</title><content type='html'>We have begun running a &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/OnlineDataDrift"&gt;tool&lt;/a&gt; that determines whether tables on slaves and masters match. This tool is similar to &lt;a href="http://www.maatkit.org/doc/mk-table-checksum.html"&gt;mk-table-checksum&lt;/a&gt;, with a few optimizations from the Google patch:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;It uses the &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/NewSqlFunctions"&gt;aggregate function LAST_VALUE&lt;/a&gt; to determine where to start a scan when a table is scanned using multiple (fast) queries rather than one slow query.&lt;/li&gt;&lt;li&gt;It uses the &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/NewSqlFunctions"&gt;aggregate functions UNORDERED_CHECKSUM and ORDERED_CHECKSUM&lt;/a&gt; to compute checksums.&lt;/li&gt;&lt;/ul&gt;We soon found an InnoDB table with rows in the secondary index that were missing from the PRIMARY index. How can this be? My guess is that one or more disk writes to the PRIMARY index were lost. InnoDB stores an LSN and a checksum on each page. The page whose writes were lost still had a valid old version on disk. 
When that page is read, the checksum is still valid, and there is no way to determine whether the page's LSN is current unless the last write for that page is recorded in the current transaction log or in the doublewrite buffer.&lt;br /&gt;&lt;br /&gt;ZFS would detect this error because it stores the checksum for a page separately from the page on disk. RAID 10 might help. If the writes to one of the copies were lost, then as long as you are lucky enough to read the page from the good side of the mirror, you will get the good data. But that might not be comforting. What else would help to detect and/or correct this?</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/3957603213479316568/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/obscure-bug.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3957603213479316568'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/3957603213479316568'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/obscure-bug.html' title='An obscure bug'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5915567578707286635.post-1224965036695730010</id><published>2009-04-06T11:44:00.000-07:00</published><updated>2010-03-28T08:11:33.225-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Master-master replication and crash recovery</title><content type='html'>The slave SQL thread executes binlog events from the relay log to keep a slave in sync with the master. Prior to row-based replication, binlog events were SQL statements. The slave SQL thread records its state in the relay-log.info file. The state includes the file offset of the next binlog event to execute, so it is important that this state be correct to avoid skipping a transaction or running a transaction multiple times on the slave. The slave SQL thread does the following in a loop:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;replay all binlog events from a transaction&lt;/li&gt;&lt;li&gt;commit the transaction to the storage engines that participated&lt;/li&gt;&lt;li&gt;write new state to the relay-log.info file&lt;/li&gt;&lt;/ol&gt;Unfortunately, if the mysqld server crashes after step 2 and before step 3, then it will run the last transaction twice (once before the crash and once after it). This may fail with a duplicate key error, or it may succeed and leave the slave inconsistent with the master, with little evidence left behind for the DBA to notice the problem.&lt;br /&gt;&lt;br /&gt;This is &lt;a href="http://bugs.mysql.com/bug.php?id=26540"&gt;bug 26540&lt;/a&gt; if you want to express interest in a fix. 
This is also fixed in some cases for InnoDB by &lt;a href="http://code.google.com/p/google-mysql-tools/wiki/TransactionalReplication"&gt;transactional replication&lt;/a&gt;, which has made it into &lt;a href="http://www.mysqlperformanceblog.com/2009/03/04/making-replication-a-bit-more-reliable/"&gt;Percona&lt;/a&gt;. By &lt;i&gt;some cases&lt;/i&gt;, I mean that it does not protect transactions that update MyISAM tables, at least not in the Google patch. I have not reviewed the Percona code.&lt;br /&gt;&lt;br /&gt;This problem gets more interesting for master-master replication. In that case, a server writes both a binlog and a relay log, and the update sequence is:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;replay all binlog events from a transaction&lt;/li&gt;&lt;li&gt;XA prepare for the binlog&lt;/li&gt;&lt;li&gt;XA prepare for InnoDB (assuming InnoDB is used)&lt;/li&gt;&lt;li&gt;write the XID to the binlog (commit)&lt;/li&gt;&lt;li&gt;commit the transaction to the storage engines that participated&lt;/li&gt;&lt;li&gt;write new state to the relay-log.info file&lt;/li&gt;&lt;/ol&gt;In this configuration, the server uses internal XA to coordinate the binlog update with the commit to the storage engines. There are three interesting points at which a crash can occur:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;before step 4 - This is not a problem. The prepared InnoDB transaction is rolled back during crash recovery and then run again when the slave SQL thread starts.&lt;/li&gt;&lt;li&gt;between step 4 and step 5 - This is a problem. The prepared InnoDB transaction is committed during crash recovery, but relay-log.info is not updated. Note that transactional replication does not correct the mismatch, so the last transaction will be run again when the slave SQL thread starts. Running the same transaction multiple times may cause replication to halt or may corrupt your database. 
&lt;/li&gt;&lt;li&gt;between step 5 and step 6 - This problem is fixed by transactional replication.&lt;/li&gt;&lt;/ul&gt;So there is a new problem that we need to fix for our servers that will soon run with --log-slave-updates (for reasons other than master-master replication, which I do not plan to use). Some of the changes described in my &lt;a href="http://mysqlha.blogspot.com/2009/04/making-replication-more-robust.html"&gt;previous post&lt;/a&gt; can fix this new problem; at least, they do in our stress test framework.</content><link rel='replies' type='application/atom+xml' href='http://mysqlha.blogspot.com/feeds/1224965036695730010/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://mysqlha.blogspot.com/2009/04/master-master-replication-and-crash.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1224965036695730010'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5915567578707286635/posts/default/1224965036695730010'/><link rel='alternate' type='text/html' href='http://mysqlha.blogspot.com/2009/04/master-master-replication-and-crash.html' title='Master-master replication and crash recovery'/><author><name>Mark Callaghan</name><uri>http://www.blogger.com/profile/09590445221922043181</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://1.bp.blogspot.com/_3rU41dez5TI/Snh3Q8IDb7I/AAAAAAAAAWg/s9BOv7OeFEs/S220/Photo+21.jpg'/></author><thr:total>3</thr:total></entry></feed>
