1. Is MySQL getting faster over time? I think so. It is difficult to keep things fast as features are added, but many operations have gotten faster between 4.0.26 and 5.0.54 and the performance of 5.0.54 is much improved over 5.0.37. I ran a few tests to measure the difference:
    • dump and reload a table
    • sql-bench

    Dump and Reload

    I measured the time to dump and reload a table. The table has 10 million rows, 3 int columns, and no indexes. Both the MyISAM and InnoDB storage engines were used, and the InnoDB tables were on a tmpfs filesystem to avoid disk IO. The test was performed in 4 steps (a sketch of the commands follows the list):
    • Load the data into a MyISAM table using load data infile. This measures the performance of inserting without the overhead of parsing SQL statements.
    • Dump the table using mysqldump --extended-insert. This measures the performance of the MySQL client library and the select code path in the storage engine and the server.
    • Reload the output of mysqldump into a MyISAM table. This measures the performance of the MySQL client library, the mysql command-line tool, the parser and the storage engine.
    • Reload the output of mysqldump into an InnoDB table.
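    A minimal sketch of the four steps, assuming a table t with three int columns in a database named test; the file names and DDL details here are hypothetical, not the exact ones used for the benchmark:
    -- step 1: load rows without SQL parsing overhead (t is MyISAM)
    load data infile '/tmp/rows.txt' into table t;
    -- step 2 (shell): mysqldump --extended-insert test t > /tmp/t.sql
    -- step 3 (shell): recreate t as an empty MyISAM table, then run: mysql test < /tmp/t.sql
    -- step 4: recreate t with engine=innodb, then repeat the reload from step 3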
    Performance can change significantly between releases, and usually in the right direction:
    • mysqldump is much faster in 5.0. The client library was changed to do fewer read() calls when fetching a query result. This is great.
    • the parser was very fast in 4.0, and 5.0.54 has recovered some of that performance. Inserting a large number of rows is frequently bound by the performance of the parser, and this was much slower in early 5.0 releases.
    • because of a bug in the 5.0.54 mysql command-line client, I used the client from 5.0.51 for the 5.0.54 tests.
    The table below lists the number of seconds to run each test.

    Version  load data infile  mysqldump  insert to MyISAM  insert to InnoDB
    4.0.26   8.9               19.7       33.7              72.7
    4.1.22   8.5               22.7       42.7              79.9
    5.0.37   9.3               9.4        69.6              117.7
    5.0.54   9.2               9.6        50.8              52.5

    sql-bench

    This is a simple benchmark distributed with MySQL. I ran the version provided with 4.0.26.
    • InnoDB was used for all tables.
    • Constants that determine the amount of work done for each test were changed.
    • Tests were run on a server with 4 CPU cores.
    • 32-bit binaries were used.
    5.0.54 is the fastest. The table below lists the number of seconds to run each section of the test.

    Release  Total Time  Alter  ATIS  big-tables  connect  create  insert  select  transactions  wisconsin
    4.0.26   390         92     44    35          41       59      31      39      32            27
    4.1.22   412         85     42    50          52       62      33      32      36            31
    5.0.37   422         42     45    66          67       61      34      38      48            36
    5.0.54   378         40     45    58          54       56      33      36      39            30

  2. Do you create transient tables and then use them with complex queries? By transient I mean a table with a short lifetime, temporary or not. If there are indexes on them, then you may need to run ANALYZE TABLE prior to running queries; exactly what you need to do is determined by the storage engine you use.

    As background, the MySQL optimizer uses two types of table statistics:
    • predicate selectivity - computed per-query by handler::records_in_range. This determines the number of rows that will be fetched from a table. It is evaluated for the single-table predicates in the WHERE clause, and only for columns that are indexed.
    • index cardinality - uses the data displayed by SHOW INDEXES. This influences join order and is used with the join conditions to determine join selectivity.
    For InnoDB:
    • predicate selectivity is computed by probing the index and accesses one or two leaf nodes per probe.
    • index cardinality is computed by sampling 8 leaf nodes per index. This is done the first time a table is opened, and after many rows have been changed for a table. Because this is very fast, index cardinality stats are always available whether or not ANALYZE TABLE has been run.
    For MyISAM and Heap:
    • index cardinality stats are not available until ANALYZE TABLE has been run. If you forget to run it for transient tables, then you are likely to get bad query plans.
    • predicate selectivity is accurate before ANALYZE TABLE has been run.
    I have filed a feature request asking for changes so that index cardinality stats are available for MyISAM and Heap tables before ANALYZE TABLE is run. This can save many of us from slow query plans.

    Examples below show that index cardinality stats are available for InnoDB but not for MyISAM and Heap tables. First, create the tables for the experiment.
    drop table if exists n, tt2, st2, it2;

    create table n(i int);
    insert into n values (0), (1), (2), (3), (4), (5), (6), (7), (8);
    insert into n select * from n;
    insert into n select * from n;
    insert into n select * from n;
    insert into n select * from n;

    create temporary table tt2 (i int, p int not null auto_increment, index(i,p), primary key (p));
    create table st2 (i int, p int not null auto_increment, index(i,p), primary key (p));
    create table it2 (i int, p int not null auto_increment, index(i,p), primary key (p)) engine=innodb;

    insert into tt2 select 0, null from n;
    insert into st2 select 0, null from n;
    insert into it2 select 0, null from n;
    Display index cardinality for MyISAM and Heap tables. Note that the value for Cardinality is NULL except for the primary key columns.
    show indexes from tt2;

    Table  Non_unique  Key_name  Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Null  Index_type  Comment
    tt2    0           PRIMARY   1             p            A          144          NULL      NULL          BTREE
    tt2    1           i         1             i            A          NULL         NULL      NULL    YES   BTREE
    tt2    1           i         2             p            A          NULL         NULL      NULL          BTREE

    show indexes from st2;

    Table  Non_unique  Key_name  Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Null  Index_type  Comment
    st2    0           PRIMARY   1             p            A          144          NULL      NULL          BTREE
    st2    1           i         1             i            A          NULL         NULL      NULL    YES   BTREE
    st2    1           i         2             p            A          NULL         NULL      NULL          BTREE

    Display index cardinality for the InnoDB table. Note that the values for Cardinality are valid.
    show indexes from it2;

    Table  Non_unique  Key_name  Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Null  Index_type  Comment
    it2    0           PRIMARY   1             p            A          144          NULL      NULL          BTREE
    it2    1           i         1             i            A          144          NULL      NULL    YES   BTREE
    it2    1           i         2             p            A          144          NULL      NULL          BTREE
    MyISAM, Heap and InnoDB are able to determine predicate selectivity without ANALYZE TABLE having been run. In the queries below, the WHERE clause selects all of the rows in the table.
    explain select * from tt2 where i = 0;

    id  select_type  table  type   possible_keys  key  key_len  ref    rows  Extra
    1   SIMPLE       tt2    index  i              i    9        NULL   108   Using where; Using index

    explain select * from st2 where i = 0;

    id  select_type  table  type   possible_keys  key  key_len  ref    rows  Extra
    1   SIMPLE       st2    index  i              i    9        NULL   108   Using where; Using index

    explain select * from it2 where i = 0;

    id  select_type  table  type   possible_keys  key  key_len  ref    rows  Extra
    1   SIMPLE       it2    ref    i              i    5        const  72    Using where; Using index
    The same holds when the WHERE clause selects none of the rows in the table.
    explain select * from tt2 where i = 1;

    id  select_type  table  type  possible_keys  key  key_len  ref    rows  Extra
    1   SIMPLE       tt2    ref   i              i    5        const  1     Using where; Using index

    explain select * from st2 where i = 1;

    id  select_type  table  type  possible_keys  key  key_len  ref    rows  Extra
    1   SIMPLE       st2    ref   i              i    5        const  1     Using where; Using index

    explain select * from it2 where i = 1;

    id  select_type  table  type  possible_keys  key  key_len  ref    rows  Extra
    1   SIMPLE       it2    ref   i              i    5        const  1     Using where; Using index
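    To complete the example, ANALYZE TABLE fills in the missing cardinality statistics for the MyISAM and Heap tables. This is a sketch of the fix, continuing with the tables created above:
    analyze table tt2;
    analyze table st2;
    -- SHOW INDEXES should now report non-NULL Cardinality for the columns
    -- of index i, so joins against these tables can get good plans
    show indexes from st2;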

  3. Decimal is the default numeric data type in MySQL 5. By this I mean that Decimal is the type for most numeric literals, that the implementation is much improved over MySQL 4, and that Decimal is the numeric type preferred by just about anyone who has had to explain the semantics of decimal to binary floating point conversions.

    This is a good thing, as correctness is usually preferred over performance and decimal rounding is much easier to reason about than the IEEE 754 standard. I ran a few simple tests to compare the performance of arithmetic using Decimal and Double, and the tests made me realize one thing.

    Decimal arithmetic is fast in MySQL.

    To measure this, I created two tables with the same data. Each table stored 4,000,000 rows -- 4 copies of the integers 1 to 1,000,000 in one column and the inverse of those values in the other column. The table definitions were:
    create table d(d double, i double not null) engine=innodb;
    create table e(d decimal, i decimal(8,3) not null) engine=innodb;
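    The tables can be populated with something like the following sketch, which assumes that "inverse" means 1/n, that the first column holds the integer, and which uses a hypothetical helper table seq(n) holding the integers 1 to 1,000,000:
    -- seq(n) is a hypothetical helper table holding 1 .. 1000000
    insert into d select n, 1/n from seq;
    insert into d select n, 1/n from seq;
    insert into d select n, 1/n from seq;
    insert into d select n, 1/n from seq;  -- 4 copies gives 4,000,000 rows
    -- repeat the same four inserts for table e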
    I then ran these queries several times to determine the performance with a warm cache:
    select count(*) from TABLE
    select sum(i) from TABLE
    select sum(i*i) from TABLE
    The first query was done to determine the overhead of fetching rows. The time for it was subtracted from the time to execute the other queries in an attempt to isolate the cost of arithmetic. The second query does one addition per row. The third query does one addition and one multiplication per row.

    The per row overhead for one addition:
    • 133 nanoseconds for Double
    • 195 nanoseconds for Decimal
    The per row overhead for one addition and one multiplication:
    • 163 nanoseconds for Double
    • 358 nanoseconds for Decimal
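    To put those per-row numbers in context: across 4,000,000 rows, 195 nanoseconds per row is roughly 0.78 seconds of query time above the count(*) baseline for Decimal, versus roughly 0.53 seconds for Double.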

  4. We (many people) just upgraded a MySQL deployment from 4.0.26 to 5.0.37. It took a long time to get it ready, but everything works and there was minimal impact on the database users. I will share some of the fun here.
    • Our database files are about 15% smaller thanks to the compact row format that is new for InnoDB in 5.0.
    • The data in our 5.0 databases was exactly the same as the data in our 4.0 databases, with two exceptions. I know because we checked many times, and this is a big deal. The exceptions were one application we were unwilling to change to fix differences in the rounding of arithmetic expressions, and the least significant bits of the values of Double columns.
    • The parser in MySQL 5 is much faster than it used to be. We measured a big performance regression in our database reload benchmark, and MySQL worked with us to fix it. Had this been a closed-source RDBMS, the collaboration would not have been possible.
    • InnoDB has begun to fix the concurrency bottlenecks in 5.0 and we expect much better performance on large SMPs in the near future.
    • Primaries are much more crash-proof thanks to the use of XA internally to synchronize commit between a storage engine and the binlog.
    • I haven't noticed a big performance difference between 4.0 and 5.0. The codepath in 5.0 is longer as there are many more features there, and this makes some things a bit slower. But 5.0 has new features that make some queries faster.
    • We had to frequently convert a database from 4.0 to 5.0 and we did this by dumping the tables with mysqldump from 4.0 and reloading into 5.0. We ended up writing Python utilities to do this with configurable levels of parallelism. The tool used a SQLite database to checkpoint the progress of the dump and restore. Maatkit provides similar features.
    • Primarily because of the use of Decimal rather than Double for some arithmetic operations and numeric literals, and because of a change in blank trimming, there were differences in column values on 5.0 slaves compared to 4.0 masters. We built a few tools to identify all differences between tables. Computing table checksums is only part of the problem (a sketch of one checksum query follows this list): once you know that tables differ, you need to determine where and why. In most of these cases, the SQL statements contained in the binlog were used to identify the applications that needed to be updated. I am not sure how we would have done this had row-based replication been used. Again, Maatkit provides similar functionality, but I am not sure if it identifies the rows that differ.
    • Some queries that worked in 4.0 did not parse in 5.0 because the SQL syntax for 5.0 was improved to be more standards compliant. This is a great thing to do, but it required some work for us to identify and fix the broken queries.
    • You cannot replicate from a 5.0 master to a 4.0 slave. If you are paranoid during an upgrade, you want to preserve a downgrade path. We built a tool for this.
    • Condition pushdown, new in 5.0, is a big deal for us. We hope for more enhancements to the table handler interface. (hint -- aggregation pushdown)
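    For illustration, here is a minimal sketch of one chunked checksum query; the table t(id, a, b) is hypothetical and this is not the exact tool we built:
    -- compare this result between master and slave; a mismatch narrows
    -- the search to one chunk of 100,000 rows
    select count(*) as cnt,
           bit_xor(crc32(concat_ws('#', id, a, b))) as crc
    from t
    where id between 1 and 100000;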
