Thursday, October 25, 2007

Going parallel

When not injecting MySQL with code, I try to make MySQL work better for my employer. I also campaign for features I want in MySQL and against features I don't want in MySQL. Support for parallel operations is an area that I am still uncertain about.

There are a few reasons why I don't want it supported in MySQL.
  • It will make the MySQL server much more complex and bugs accompany complexity.
  • MySQL doesn't need it for many workloads. It favors throughput over response time for queries on large data sets. That model works for many customers. The simplicity and scalability of replication further enable throughput.
  • There are alternatives for parallel query processing including Greenplum, if they open source all of their code, and Hadoop, if you don't need SQL and you can figure out how to get your data into it from MySQL.
But what does it mean to support parallel operations in MySQL? MySQL could support parallel query processing within one mysqld process. I am wary of this because it would require significant changes to the optimizer and query execution code in MySQL, even if it were limited to queries on partitioned tables.

There are more limited forms of support for parallel operations that require fewer changes to MySQL.
  • InnoDB already supports parallel IO because it issues prefetch requests during table and index scans and the requests are processed by a background thread.
  • Table and index scans on partitioned tables can be done in parallel.
  • Filesort can use async IO, real or simulated, to overlap IO with sorting.
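To make the partitioned-scan idea concrete, here is a minimal Python sketch. It is not MySQL code: the partitions are hypothetical in-memory lists, and `scan_partition` and `parallel_scan` are names made up for illustration. It only shows the shape of the work, where each partition is scanned independently and the results are concatenated afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a table split into three partitions.
PARTITIONS = [
    [(1, "a"), (4, "d"), (7, "g")],
    [(2, "b"), (5, "e"), (8, "h")],
    [(3, "c"), (6, "f"), (9, "i")],
]

def scan_partition(rows, predicate):
    """Scan one partition and return the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def parallel_scan(partitions, predicate, max_workers=4):
    """Scan each partition on a worker thread, then concatenate the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in partition order, not completion order.
        per_partition = pool.map(
            scan_partition, partitions, [predicate] * len(partitions)
        )
        return [row for rows in per_partition for row in rows]

# Find rows with an even key across all partitions.
result = parallel_scan(PARTITIONS, lambda row: row[0] % 2 == 0)
```

Because no partition depends on another, the only coordination needed is the final concatenation, which is why this form of parallelism should need fewer changes than a general parallel optimizer. Threads in CPython will not speed up a CPU-bound scan, so treat this as an outline of the structure rather than a benchmark.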
Finally, there is an alternative to the traditional approach to parallel query processing that is made possible by the MySQL Proxy and the ease with which MySQL supports scale out. Queries could be parallelized in the MySQL Proxy rather than in the MySQL server. Different parts of a query can be run on different servers and combined within the proxy to be returned to the user as if a single query were used. I hope this is done.
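A rough sketch of what the proxy-side approach might look like, written in Python rather than against the actual MySQL Proxy API. Everything here is an assumption for illustration: in-memory SQLite databases stand in for separate MySQL servers, the `orders` table is invented, and `make_shard`, `partial_aggregate`, and `scatter_gather_avg` are hypothetical names. The point is the rewrite step: an `AVG` cannot be computed by averaging per-server averages, so each server returns a count and a sum and the proxy combines them.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def make_shard(amounts):
    """Build an in-memory SQLite database standing in for one MySQL server."""
    # check_same_thread=False lets a worker thread use this connection;
    # each connection is only ever used by one thread at a time here.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE orders (amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(a,) for a in amounts])
    conn.commit()
    return conn

def partial_aggregate(conn):
    """Run the per-server part of the query: a row count and a sum."""
    sql = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders"
    return conn.execute(sql).fetchone()

def scatter_gather_avg(shards):
    """Answer SELECT AVG(amount) FROM orders across all shards."""
    # Scatter: send the rewritten query to every server concurrently.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    # Gather: combine the partial results into a single answer.
    total_rows = sum(count for count, _ in partials)
    total_amount = sum(amount for _, amount in partials)
    return total_amount / total_rows if total_rows else None

shards = [make_shard([10, 20]), make_shard([30]), make_shard([40, 50, 60])]
average = scatter_gather_avg(shards)
```

The client would see one result, as if a single query had run. Joins, `ORDER BY` with `LIMIT`, and `GROUP BY` each need their own combine step, which is where most of the real work in such a proxy would be.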

1 comment:

  1. Agreed on all parts, Mark. I'd hope that parallel execution shows up in certain parts of the code (partitioning, etc) especially when I/O bottlenecks can be bypassed when multiple disk heads are available for writing...

    I think it's best to start simple, in modules of MySQL that can benefit from it, instead of trying to rework a major subsystem, where bugs can so easily come into play...

    Cheers,

    Jay


 
Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.