1. MySQL performance on SMP servers has been reduced because there are too many subsystems that use a mutex or rw-lock to protect access to all subsystem state. I use the term global mutexes to describe these. This is to be expected given that the majority of the MySQL deployments were not on SMP servers when these subsystems were implemented.

    But times have changed and we really need to stop adding global mutexes to MySQL and by we I mean MySQL the company as the community has a limited role in their development process. Yet global mutexes continue to be added and continue to reduce SMP performance. Can the code and design review process be changed to prevent this?

    Examples of recent problems include:
    Examples of not-recent problems include:
    • MyISAM engine has a global mutex on the key cache and this limits performance on SMP even when all data is cached. InnoDB can be much faster than MyISAM on SMP.
    • HEAP engine has a global mutex. In a few places it is held when malloc and free are called. This is part of the reason why tcmalloc makes sysbench OLTP readonly so much faster.
    • InnoDB has several hot global mutexes that are being fixed by InnoDB, Percona and Google. The biggest open problems are prepare_commit_mutex and kernel_mutex.
    • LOCK_open is hot when queries are parsed and ended. acl_cache->lock is hot when access control checks are made.
    We are in the middle of backporting the thread/connection pool code from 6.0 to 5.0 and I ran sysbench OLTP readonly tests on the current code. Peak throughput is about 2X faster for InnoDB and 3X faster for blackhole with the thread/connection pool code disabled. With the code enabled, every command (query) sent to the server requires a pthread mutex lock/unlock pair on LOCK_event_loop. The bug for this is here.

    I am sure this will be fixed. But a more interesting problem remains. The thread/connection pool allows thousands of connections to be run on hundreds of threads. The pool has a fixed number of threads. This can make a server hang when all of the threads are handling long-running queries or queries that are blocked in the server. A few changes can be made to reduce this problem:
    • get MySQL to listen on multiple ports. Use the pool to handle requests that arrive on one port. Don't use the pool to handle requests that arrive on other ports. We already added code to MySQL to make it listen on a separate port but only accept connections from accounts with SUPER or REPL_SLAVE privileges. I don't want replication connections to get blocked waiting for a thread from the pool.
    • limit the max number of concurrent queries per account. I don't want one bad app to prevent all others from connecting to a server. We have added a column to the mysql.user table to specify this limit.
    • run with innodb_thread_concurrency=0 as the MySQL docs state
    • make all blocked threads respond to KILL commands. InnoDB has a srv_lock_timeout_and_monitor thread that scans the array of threads blocked on InnoDB locks. This could be extended to wake KILLed connections.
    • add another background thread that kills blocked connections when the thread pool runs out of threads
    But the problem still remains even with these changes as there are many ways to block a query, and the connections that block use threads that cannot be used by other connections to unblock them and for many of the problems there are no timeouts. The connection that is blocked will block until you noticed the problem and run a KILL command.
    • LOCK TABLES - this command is blocked until the referenced tables are locked. Other connections with long-running connections will make this wait for a long time. When this has completed, other connections may be blocked by the locks held for this command. Timeouts are not supported in either case although the kill command can be used.
    • FLUSH TABLES WITH READ LOCK - this has problems similar to LOCK TABLES
    • GET_LOCK() - fortunately, this has a timeout
    • innodb_lock_wait_timeout determines the number of seconds a connection waits for a row lock when the row is locked in a conflicting mode by another connection
    • MyISAM tables are implicitly exclusively locked for the duration of DELETE, UPDATE and (non-appending) INSERT statements. Queries block until the statements complete.
    • Tables are exclusively locked for the duration of DDL statements. 
    • innodb_thread_concurrency - the thread/connection pool docs state that you don't need to set this when the pool is used. I disagree. But if it is set, it is another thing that may block a running connection. You will still want to set this for performance reasons on an 8-core server for some workloads. If you have many long-running queries and don't have patches from Percona or Google, then things can be slower than expected unless innodb_thread_concurrency=4 is used.
    I don't have a solution to propose for this problem but I think one is required if the thread/connection pool code is to be used widely. Or MySQL support is going to be very busy explaining and debugging.
    4

    View comments

  2. TCMalloc can make MySQL much faster (2X) for some workloads. I ran the sysbench OLTP readonly test on servers with 4, 8 and 16 cores to determine the benefit of using TCMalloc with unmodified MySQL 5.0.75. Malloc is a bottleneck for some workloads with MySQL because:
    • The overhead to parse, optimize and setup query execution structures is significant for fast statements. LOCK_open and mutexes used for access control have a lot of contention.
    • The HEAP engine has a global mutex held when malloc is called and HEAP tables are used to process ORDER BY and GROUP BY clauses. This should be easy to fix.
    A key point is that InnoDB is far from the only problem in MySQL for SMP servers. While alternatives are great, fixing InnoDB or replacing it with Maria, Falcon and PBXT won't fix all of the problems

    And now, a few claims that are true for workloads similar to the sysbench readonly OLTP test.
    • At high concurrency, tcmalloc doubles throughput.
    • InnoDB is much faster than MyISAM and the advantage grows with the number of CPU cores.
    • innodb_thread_concurrency=4 has a cost. It helps for CPU bound workloads dominated by long running statements. In other cases, it can hurt performance.
    All of the data used for the graphs below is available. Unmodified MySQL 5.0.75 was used with and without tcmalloc. The test is sysbench OLTP readonly. Malloc is a bottleneck for sysbench OLTP readonly, so TCMalloc makes a big difference. For other workloads there are other bottlenecks and other changes are needed (such as the faster rw-mutex patch and patches to reduce other mutex contention in InnoDB).

    Is this motivation to buy a big SMP?

    TCMalloc makes InnoDB much faster at 8+ concurrent users:

    TCMalloc makes MyISAM much faster at 8+ concurrent users:
    InnoDB is much faster than MyISAM on a 16-core server at 8+ concurrent users:

    InnoDB is much faster than MyISAM on an 8-core server at 8+ concurrent users:

    For a 16-core server, innodb_thread_concurrency avoids both worst and best throughput:
    11

    View comments

  3. InnoDB uses more disk space than MyISAM because transactions are not free. But how much more will it use? That depends on your schema and data. Depending on the value of pack_keys for a MyISAM table, InnoDB uses 1.24X, 1.35X or 1.46X the disk space as used by MyISAM for the schema and data that I care about. I don't think this is a huge difference and is certainly far from a 3X to 4X overhead.

    I measured the overhead by reloading the database using InnoDB and MyISAM. For MyISAM I used pack_keys=default, 1 and 0. I would not use pack_keys=1 in production because of the performance overhead during query processing. The schema is complex -- think enterprise app rather than web app. The load was done using 8 concurrent sessions on an 8-core server. It took much longer (3.5X) to load the MyISAM tables even though repair by sort was used for index creation. The MyISAM load was CPU bound with almost 2/3 of the CPU time in pthread_mutex calls. Had I manually assigned indexes to multiple key caches this would have been less of a problem, but that was too much work. Why doesn't MySQL do this automatically? Percona has great articles on this topic.

    The InnoDB row structure is described here. The overhead for an InnoDB row in MySQL 5.0 and beyond is approximately 20 + (1 * #columns) bytes:
    • 6 bytes for a transaction ID
    • 7 bytes for a roll pointer
    • 5 bytes for a row header
    • (#possibly_null_columns + 7) / 8 bytes for the null column bitmap
    • 1 or 2 bytes per variable length column
    There are additional overheads:
    • Each secondary index entry stores all primary key columns. You should avoid creating many secondary indexes on a table with a large primary key.
    • Each secondary index entry may have a per-row overhead similar to that for the base row. I have not read the code to determine whether this is true (shame on me). But from a quick test, the overhead was no more than 9 bytes per entry on MySQL 4.0.
    • Secondary indexes are almost always fragmented. Prior to the MySQL 5.1 plugin they are always built incrementally after insert, update and delete statements. Assuming changes are made in a random key order, they should be about 2/3 full.
    • Primary (clustered) indexes may be fragmented. This can be avoided by occasionally reorganizing the table.
    3

    View comments

  4. MyISAM is faster because everyone says so. This has even made it to the MyISAM wikipedia entry. That is unfortunate because I think that InnoDB is faster for most workloads.

    There are workloads for which MyISAM is faster. But we should stop saying MyISAM is faster. This is repeated too often with neither qualification nor quantification. So I will take a turn. Either my results are bogus or InnoDB is faster (for most workloads). Regardless, tmcalloc makes everything in MySQL faster. I am not the first to say this and this isn't the first time I have said this, but I have pretty graphs to accompany my statement.

    MyISAM has several advantages versus InnoDB, but it also has several performance problems:
    1. readers block writers and writers block readers with the exception of insert-append
    2. data blocks are not cached by the MySQL server
    3. there is a global MyISAM index cache (key cache) protected by one mutex. Tables can have their own key cache which I hope is protected by a separate mutex, but this doesn't reduce contention for workloads that are isolated to one or a few tables
    I ran sysbench using a read-only CPU-bound workload. More details on the test environment:
    • sysbench test args: --test=oltp --oltp-read-only --oltp-table-size=1000000 --max-time=60 --max-requests=0 --db-ps-mode=disable
    • server with 8 CPU cores
    • dataset is cached in memory
    • sysbench client and mysqld ran on the same server
    • unmodified MySQL 5.0.75 and Google-patched MySQL 5.0.37. The Google patch includes the use of tcmalloc.
    This plots the performance of InnoDB and MyISAM for unmodified 5.0.75 and patched 5.0.37. Note that MyISAM is much faster in the patched 5.0.37 build than in the unmodified 5.0.75 build. The difference is tcmalloc.


    This plots the performance of InnoDB, MyISAM and Blackhole for patched 5.0.37. InnoDB is much faster than MyISAM.


    This plots the performance of InnoDB, MyISAM and Blackhole for unmodified 5.0.75. InnoDB is a bit faster than MyISAM.
    16

    View comments

  5. Oracle presented at OpenWorld on the use of MySQL for sharding. The presentation was targeted at large-scale (#shards >= 100) deployments of sharded MySQL. The presentation claims that Oracle is better than MySQL for such a deployment and lists some of the operational costs that exist in a MySQL deployment. I think the focus should have been on small-scale deployments as large-scale deployments have more financial incentives to make MySQL work.

    Some of us were quoted (Monty, Baron, the pre-GA 5.1 manual, work by Facebook and possibly work by my employer) without the courtesy of an Oracle t-shirt or tickets to the OpenWorld party. Bah humbug.

    The presentation had few details about the use of Oracle in a large-scale sharded deployment.  Are there customers who use hundreds or thousands of shards? Does anyone write about their experiences?
    8

    View comments

  6. Parsed but ignored has to stop. What am I talking about? If you don't know then I have made my point.
    This DDL doesn't do what was requested. There is no foreign key.
    mysql> create table a(i int primary key);
    mysql> create table b(i int primary key, ai int, index aix(ai));
    mysql> alter table b add constraint foreign key (ai) references a;
    This DDL doesn't do what was requested. The desc option is ignored.
    mysql> create table c(i int, o int, index x(i asc, o desc));
    mysql> insert into c values (1,1), (2,2), (3,3);
    mysql> analyze table c;
    Surprise! This query can use the index: select * from c order by i asc, o asc
    mysql> explain select * from c order by i asc, o asc;
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    |  1 | SIMPLE      | c     | index | NULL          | x    | 10      | NULL |    3 | Using index |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
    Surprise! This query cannot use the index: select * from c order by i asc, o desc
    mysql> explain select * from c order by i asc, o desc;
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
    | id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                       |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
    |  1 | SIMPLE      | c     | index | NULL          | x    | 10      | NULL |    3 | Using index; Using filesort |
    +----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
    7

    View comments

Loading