1. Someone I know used to make jokes about their plans to run MySQL 4.0 forever. It wasn't a horrible idea as 4.0 was very efficient and the Google patch added monitoring and crash-proof replication slaves. I spent time this week comparing MySQL 5.7.2 with 5.6.12 and 5.1.63. To finish the results I now have numbers for 4.1.22. I wanted to include 4.0 but I don't think it works well when compiled with a modern version of gcc and I didn't want to debug the problem. The result summary is that 4.1.22 is much faster at low concurrency and much slower at high concurrency. Of course we want the best of both worlds -- 4.1.22 performance at low concurrency and 5.7.2 performance at high. Can we get that?

    I used sysbench for single-threaded and high concurrency workloads. The database is cached by InnoDB. All of the QPS numbers are in previous posts except for the 4.1.22 results. I only include the charts and graphs here as the differences between 4.1.22 and modern MySQL stand out.

    Single thread results

    For all of the charts the results for 4.1.22 are at the top. The first result is for a workload that fetches 1 row by primary key via SELECT. MySQL 4.1.22 is much better than modern MySQL.


    The next result is for a workload that fetches 1 row by primary key via HANDLER. MySQL 4.1.22 is still the best but the difference is smaller than for SELECT.


    The last result is for a workload that updates 1 row by primary key. The database uses reduced durability (no binlog, no fsync on commit). Modern MySQL has gotten much slower here. Bulk loading a database with MySQL might be a lot slower than it was with 4.1.22.

    Concurrent results

    MySQL 4.1.22 looked much better than modern MySQL on the single-threaded results. It looks much worse on the high-concurrency workloads and displays a pattern that was well known back in the day -- QPS collapses once there are too many concurrent requests.

    Here is an example of that pattern for SELECT by primary key.


    This is an example of the collapse for fetch 1 row by primary key via HANDLER.


    The final example of the collapse is for UPDATE 1 row by primary key. Note that 5.1.63 with and without the Facebook patch also collapses.


  2. Many of my write-intensive benchmarks use reduced durability mode (fsync off, binlog off) because that was required to understand whether other parts of the server might be the bottleneck. Fortunately, real group commit exists in 5.6 and 5.7 and it works great. Results here compare performance for official 5.1, 5.6 and 5.7 along with Facebook 5.1. I included FB 5.1 because it was the first to have group commit and the first to use it in production. But the official implementation of real group commit is much better, as is the MariaDB version. Performance for the same workload without group commit is here.
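
    To be concrete about the two modes, this is a minimal sketch (not the exact settings from my test scripts) using variables that exist in 5.1 through 5.7. Durable means the binlog is enabled and both the binlog and the InnoDB redo log are fsynced on commit; reduced durability disables the binlog (skip_log_bin at startup) and skips the fsync on commit.

    -- durable: fsync the binlog and the InnoDB redo log on commit
    -- (real group commit in 5.6+ amortizes these fsyncs across concurrent transactions)
    SET GLOBAL sync_binlog = 1;
    SET GLOBAL innodb_flush_log_at_trx_commit = 1;

    -- reduced durability: write the redo log on commit but only fsync it about once per second;
    -- the binlog is disabled at startup via skip_log_bin
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;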

    I compared 5 binaries:
    • orig5612.gc - MySQL 5.6.12 with group commit
    • orig572.gc - MySQL 5.7.2 with group commit
    • fb5163.gc - MySQL 5.1.63 and the FB patch with group commit
    • fb5163.nogc - MySQL 5.1.63 and the FB patch without group commit
    • orig5163 - MySQL 5.1.63 without group commit

    This graph displays performance for an update-only workload. The load is generated by 8 sysbench processes that run on one host while mysqld runs on another host. The total number of clients tested was 8, 16, 32, 64, 128 and 256 where the clients are evenly divided between the sysbench processes. The test database was cached by InnoDB and 8 tables with 8M rows each were used. The clients were evenly divided between the 8 tables. Each transaction is an auto-commit UPDATE that changes the non-indexed column for 1 row found by primary key lookup. The binlog was enabled and InnoDB did fsync on commit. The test server has 24 CPUs with HT enabled and storage is fast flash.
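
    Each transaction looks like the statement below, sketched here assuming the standard sysbench table layout where id is the primary key and c is a non-indexed string column. Because auto-commit is used, every statement pays the commit cost (binlog write and InnoDB fsync) in this test.

    -- one sysbench transaction: update a non-indexed column in 1 row found by PK
    UPDATE sbtest1 SET c = '...new value...' WHERE id = 12345;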
    This table has all of the results from the test.

    binary          8      16     32     64     128    256
    orig5612.gc     11108  17728  28789  39533  47708  51110
    orig572.gc      11021  17583  27133  37145  43736  46717
    fb5163.gc       8051   14709  22486  28882  31743  31888
    fb5163.nogc     7457   6784   6722   6689   6810   6537
    orig5163        6942   7125   6957   6598   6648   6568

    For comparison I include these results from tests that disabled the binlog and fsync on commit. In these tests the performance of 5.1.63 collapses under concurrency. I did not debug the cause, but using the binlog and fsync on commit improved performance. Note also that 5.6 and 5.7 can do ~70k TPS in a reduced durability configuration and ~50k in a durable configuration. So durability costs about 2/7 (~29%) of the peak throughput.

    binary          8      16     32     64     128
    fb5163.noahi    28543  41978  31758  14480  10208
    orig5163.noahi  27714  47582  35231  14936  10308
    fb5612.noahi    25935  47862  69679  75730  71983
    orig5612.noahi  26920  51842  73288  78757  72966
    orig5612.psdis  27138  50902  71711  77208  71674
    orig5612.psen   26576  48000  69451  75729  70466
    orig572.noahi   26089  48382  73190  83373  75456
    orig572.psdis   25090  48368  71795  82348  75060
    orig572.psen    25375  45751  69154  79023  70585

  3. These are results for sysbench with a cached database and concurrent workload. All data is in the InnoDB buffer pool and I used three workloads (select only, handler only, update only) as described here. The summary is that MySQL sustains much higher update rates starting with 5.6 and that improves again in 5.7. Read-only performance also improves but to get a huge increase over 5.1 or 5.6 you need a workload with extremely high concurrency.

    The tests used one server for clients and another for mysqld. Ping between the hosts takes ~250 microseconds. The mysqld host has 24 CPUs with HT enabled. Durability was reduced for the update test -- binlog off, no fsync on commit -- and host storage was fast for the writes/fsyncs that had to be done. The names used to describe the binaries are explained here. Each test was repeated for 8, 16, 32, 64 and 128 concurrent clients.

    You might also notice there is a performance regression in the FB patches for MySQL 5.6. I am still trying to figure that out. The regression is less than the one in 5.6/5.7 when the PS is enabled but I hope we can get per-table and per-user resource monitoring with less overhead.

    SELECT by PK

    binary          8      16     32      64      128
    fb5163.noahi    31228  59099  128500  184677  192537
    orig5163.noahi  32450  67192  126999  183784  193457
    fb5612.noahi    28996  58856  118444  168934  175622
    orig5612.noahi  33207  59713  124882  176286  184590
    orig5612.psdis  27963  63654  123344  174108  180259
    orig5612.psen   29192  57937  116613  160917  164578
    orig572.noahi   30053  62649  121101  171441  180280
    orig572.psdis   30835  62925  117282  165293  171528
    orig572.psen    31869  58030  117433  156647  160074

    HANDLER by PK

    binary          8      16     32      64      128
    fb5163.noahi    34613  73636  156946  239452  207223
    orig5163.noahi  38014  83000  152202  223349  133286
    fb5612.noahi    34552  83458  152313  243776  266989
    orig5612.noahi  36064  84524  158341  246033  276491
    orig5612.psdis  38537  71292  159299  242109  272497
    orig5612.psen   34997  82608  151322  228510  249636
    orig572.noahi   33260  73790  161773  242909  280488
    orig572.psdis   34244  71770  151687  239340  272236
    orig572.psen    37723  72841  153221  226125  248008

    UPDATE by PK

    binary          8      16     32     64     128
    fb5163.noahi    28543  41978  31758  14480  10208
    orig5163.noahi  27714  47582  35231  14936  10308
    fb5612.noahi    25935  47862  69679  75730  71983
    orig5612.noahi  26920  51842  73288  78757  72966
    orig5612.psdis  27138  50902  71711  77208  71674
    orig5612.psen   26576  48000  69451  75729  70466
    orig572.noahi   26089  48382  73190  83373  75456
    orig572.psdis   25090  48368  71795  82348  75060
    orig572.psen    25375  45751  69154  79023  70585

  4. I used sysbench to measure the performance for concurrent clients connecting and then running a query. Each transaction in this case is one new connection followed by a HANDLER statement to fetch 1 row by primary key.  Connection create is getting faster in 5.6 and even more so in 5.7. But enabling the performance schema with default options significantly reduces performance. See bug 70018 if you care about that.

    There are more details on my test setup in previous posts. For this test clients and server ran on separate hosts and ping takes ~250 usecs between them today. Eight sysbench processes were run on the client host and each process created between 1 and 16 connections to mysqld. The database is cached by InnoDB and the clients were divided evenly between the tables.  Each table has 8M rows.

    These are results in TPS for 8, 16, 32, 64 and 128 concurrent clients. Each transaction is connect followed by a HANDLER fetch. The binaries orig572.psen and orig5612.psen use the performance schema with default options for MySQL 5.7.2 and 5.6.12. Throughput is much worse compared to the same code without the PS. All binary names are explained here.
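
    For reference, this is a sketch of the per-transaction work after the connect, assuming the sysbench table is named sbtest1. The HANDLER interface bypasses most of the optimizer, so the fetch itself is cheap relative to creating the connection.

    HANDLER sbtest1 OPEN;
    HANDLER sbtest1 READ `PRIMARY` = (12345);
    HANDLER sbtest1 CLOSE;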

    binary          8     16    32     64     128
    fb5163.noahi    4041  8084  16323  16380  16066
    orig5163.noahi  4026  7848  15587  15912  15741
    fb5612.noahi    4004  7425  23125  24570  24688
    orig5612.noahi  4027  7601  26155  28021  28091
    orig5612.psdis  4008  7643  25640  27517  27631
    orig5612.psen   4205  9366  21197  21456  21592
    orig572.noahi   4172  9248  28613  39451  39721
    orig572.psdis   4025  7612  27600  37963  38044
    orig572.psen    4001  7870  18437  22982  23240

    And this chart has data for some of the binaries.


  5. I used sysbench to understand the changes in connection create performance between MySQL versions 5.1, 5.6 and 5.7. The test used single-threaded sysbench where each query created a new connection and then selected one row by PK via HANDLER. The database was cached by InnoDB and both the single client thread and mysqld ran on the same host. The tests were otherwise the same as described in a previous post.

    The summary is that connection create has gotten faster in MySQL 5.6 and 5.7 but enabling the performance schema with default options reduces that by about 10% for a single threaded workload. Bug 70018 is open to reduce this overhead. The memory consumed per increment of max_connections by the PS might also be interesting to you.
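
    If you want to see how much memory the PS allocates for a given value of max_connections, one way to check in 5.6 and 5.7 is the statement below; the last row of its output (performance_schema.memory) reports the total number of bytes allocated.

    SHOW ENGINE PERFORMANCE_SCHEMA STATUS;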

    binary          QPS
    fb5163.noahi    2087
    orig5163.noahi  2122
    fb5612.noahi    2656
    orig5612.noahi  2775
    orig5612.psdis  2706
    orig5612.psen   2468
    orig572.noahi   2687
    orig572.psdis   2611
    orig572.psen    2427

  6. This isn't a new message but single-threaded performance continues to get worse in 5.7.2. There have been regressions from 5.1 to 5.6 and now to 5.7; I skipped testing 5.5. On the bright side there is progress on a bug I opened for this and the MySQL team seems very interested in making things better. The regressions for UPDATE and SELECT are much worse than for HANDLER so I assume the optimizer accounts for much of the new overhead.

    The performance schema with default instrumentation appears to have a higher overhead than the Facebook patch. The critical monitoring in the FB patch is per-user and per-table statistics. While it would be nice to shrink the FB patch by switching back to the PS for that monitoring, I don't think that will happen until the PS becomes more efficient.

    The graphs below have results for 5.1 at the top, 5.6 in the middle and 5.7 on the bottom. This makes it easier to see the regressions over time.

    I tested 3 workloads using sysbench: select, handler and update. The select workload fetches all columns in one row by primary key using SELECT. The handler workload does the same using HANDLER instead of SELECT. The update workload updates a non-indexed column in 1 row by primary key. For all of the tests the database was cached by InnoDB. The tests used 1 sysbench process and 1 table (sbtest1), and sysbench and mysqld ran on the same server. Only one client connection (1 thread) was used during each test. Durability was reduced for the update test -- no binlog, no fsync on commit.
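
    The statements below sketch what each workload does per query, assuming the standard sysbench table layout (id is the primary key, c and pad are non-indexed string columns).

    -- select: fetch all columns of 1 row by primary key
    SELECT * FROM sbtest1 WHERE id = 12345;

    -- handler: the same fetch via the HANDLER interface
    HANDLER sbtest1 OPEN;
    HANDLER sbtest1 READ `PRIMARY` = (12345);
    HANDLER sbtest1 CLOSE;

    -- update: change a non-indexed column in 1 row found by primary key
    UPDATE sbtest1 SET c = '...new value...' WHERE id = 12345;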

    MySQL 5.1.63, 5.6.12 and 5.7.2 were tested in several configurations -- with/without the adaptive hash index (AHI below) and with/without the performance schema (PS below). When the PS is enabled only the default options are used. The results use the following binary names (a sketch of how the AHI and PS settings are toggled follows the list):

    • fb5163.noahi - 5.1.63, Facebook patch, AHI off
    • fb5163.ahi - 5.1.63, Facebook patch, AHI on
    • orig5163.noahi - 5.1.63, AHI off
    • orig5163.ahi - 5.1.63, AHI on
    • fb5612.noahi - 5.6.12, Facebook patch, AHI off
    • fb5612.ahi - 5.6.12, Facebook patch, AHI on
    • orig5612.noahi - 5.6.12, AHI off, PS not compiled
    • orig5612.ahi - 5.6.12, AHI on, PS not compiled
    • orig5612.psdis - 5.6.12, AHI off, PS compiled but disabled
    • orig5612.psen - 5.6.12, AHI off, PS compiled & enabled
    • orig572.noahi - 5.7.2, AHI off, PS not compiled
    • orig572.ahi - 5.7.2, AHI on, PS not compiled
    • orig572.psdis - 5.7.2, AHI off, PS compiled but disabled
    • orig572.psen - 5.7.2, AHI off, PS compiled & enabled
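
    This is a minimal sketch of how those two dimensions are controlled, assuming everything else uses the defaults. The AHI can be toggled at runtime (at least in 5.6 and 5.7), while the performance schema can only be switched on or off at server startup; the psdis binaries have it compiled in but started with it disabled.

    -- AHI is a dynamic InnoDB setting
    SET GLOBAL innodb_adaptive_hash_index = 0;

    -- the PS cannot be changed at runtime; the psdis binaries start with
    --   performance_schema = OFF
    -- in my.cnf, and the noahi/ahi binaries for 5.6 and 5.7 were compiled without the PS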

    Fetch 1 row by PK via SELECT

    binary          QPS
    fb5163.noahi    10087
    fb5163.ahi      10918
    orig5163.noahi  10230
    orig5163.ahi    10614
    fb5612.noahi    9070
    fb5612.ahi      9607
    orig5612.noahi  9412
    orig5612.ahi    9511
    orig5612.psdis  9334
    orig5612.psen   8702
    orig572.noahi   9128
    orig572.ahi     9607
    orig572.psdis   8573
    orig572.psen    8572

    Fetch 1 row by PK via HANDLER

    binary          QPS
    fb5163.noahi    14560
    fb5163.ahi      14832
    orig5163.noahi  14679
    orig5163.ahi    15535
    fb5612.noahi    14532
    fb5612.ahi      14908
    orig5612.noahi  15068
    orig5612.ahi    14638
    orig5612.psdis  13840
    orig5612.psen   14433
    orig572.noahi   13960
    orig572.ahi     14799
    orig572.psdis   14434
    orig572.psen    13697

    Update 1 row by PK

    binary          QPS
    fb5163.noahi    7947
    fb5163.ahi      8130
    orig5163.noahi  8184
    orig5163.ahi    8273
    fb5612.noahi    6569
    fb5612.ahi      6587
    orig5612.noahi  6813
    orig5612.ahi    6893
    orig5612.psdis  6613
    orig5612.psen   6395
    orig572.noahi   6350
    orig572.ahi     6306
    orig572.psdis   6131
    orig572.psen    5984

  7. I used linkbench to compare MySQL/InnoDB 5.1, 5.6 and 5.7. After a few improvements to linkbench and to the InnoDB my.cnf variables I was able to get much better QPS than before (about 1.5X better). I was ready to try 5.7 because it reduces contention from the per-index latch. All tests below use reduced durability (no binlog, no fsync on commit) and more details on the my.cnf options are at the end of this page. The tests were very IO-bound as the databases were ~600GB at test start prior to fragmentation and the InnoDB buffer pool was 64GB.

    The summary is that 5.7.2 has better performance than 5.6 and 5.1 and much less mutex contention. The test server has 32 cores with HT enabled and a fast flash device. InnoDB was doing about 40,000 page reads & writes per second.

    • 11281 QPS -> MySQL 5.1.63
    • 23079 QPS -> MySQL 5.6.12
    • 24710 QPS -> MySQL 5.7.2

    Mutex contention for 5.7.2

    This was collected using the performance schema during an 1800-second test run with 64 client connections. The nsecs_per column is the average number of nanoseconds per attempt to lock the mutex or rw-lock. The seconds column is the total number of seconds spent attempting to lock it.

    +---------------------------------------------+---------+--------+
    | event_name                                  |nsecs_per| seconds|
    +---------------------------------------------+---------+--------+
    | wait/synch/rwlock/innodb/index_tree_rw_lock | 19543.3 | 3456.1 |
    | wait/synch/mutex/innodb/log_sys_mutex       |  2071.3 |  385.8 |
    | wait/synch/rwlock/innodb/hash_table_locks   |   165.7 |  184.5 |
    | wait/synch/mutex/innodb/fil_system_mutex    |   328.3 |  113.6 |
    | wait/synch/mutex/innodb/redo_rseg_mutex     |  1766.4 |   84.9 |
    | wait/synch/rwlock/sql/MDL_lock::rwlock      |   430.2 |   73.9 |
    | wait/synch/mutex/innodb/buf_pool_mutex      |   264.5 |   72.7 |
    | wait/synch/rwlock/innodb/fil_space_latch    | 27216.1 |   53.8 |
    | wait/synch/mutex/sql/THD::LOCK_query_plan   |   167.0 |   50.7 |
    | wait/synch/mutex/innodb/trx_sys_mutex       |   394.9 |   41.5 |
    +---------------------------------------------+---------+--------+
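
    The nsecs_per and seconds columns can be derived from events_waits_summary_global_by_event_name. A query along the lines of the one below should produce similar numbers, assuming the default configuration where the summary timer values are reported in picoseconds and the wait/synch instruments are enabled.

    SELECT event_name,
           sum_timer_wait / count_star / 1000 AS nsecs_per,
           sum_timer_wait / 1000000000000 AS seconds
    FROM performance_schema.events_waits_summary_global_by_event_name
    WHERE event_name LIKE 'wait/synch/%' AND count_star > 0
    ORDER BY sum_timer_wait DESC LIMIT 10;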

    Mutex contention for 5.6.12

    This was collected using the performance schema during an 1800-second test run with 64 client connections. The total number of seconds stalled on index_tree_rw_lock and buf_pool_mutex is much higher compared to 5.7.2.

    +---------------------------------------------+----------+---------+
    | event_name                                  | nsecs_per| seconds |
    +---------------------------------------------+----------+---------+
    | wait/synch/rwlock/innodb/index_tree_rw_lock | 144148.0 | 24491.5 |
    | wait/synch/mutex/innodb/buf_pool_mutex      |   5439.9 |  1531.8 |
    | wait/synch/mutex/innodb/log_sys_mutex       |   1821.6 |   349.7 |
    | wait/synch/rwlock/innodb/hash_table_locks   |    112.4 |   240.9 |
    | wait/synch/mutex/innodb/fil_system_mutex    |    234.9 |    83.2 |
    | wait/synch/rwlock/sql/MDL_lock::rwlock      |    373.1 |    66.1 |
    | wait/synch/mutex/innodb/trx_sys_mutex       |    332.3 |    50.6 |
    | wait/synch/mutex/sql/THD::LOCK_thd_data     |    159.6 |    46.7 |
    | wait/synch/mutex/innodb/os_mutex            |    190.3 |    42.9 |
    | wait/synch/mutex/sql/LOCK_table_cache       |    325.4 |    35.9 |
    +---------------------------------------------+----------+---------+

    my.cnf options

    This lists the my.cnf options for 5.7.2 and 5.6.12.

    table-definition-cache=1000
    table-open-cache=2000
    table-open-cache-instances=1
    max_connections=2000
    key_buffer_size=200M
    metadata_locks_hash_instances=256
    query_cache_size=0
    query_cache_type=0
    skip_log_bin
    max_allowed_packet=16000000
    innodb_buffer_pool_size=64G
    innodb_log_file_size=1900M
    innodb_buffer_pool_instances=8
    innodb_io_capacity=16384
    innodb_lru_scan_depth=2048
    innodb_checksum_algorithm=CRC32
    innodb_flush_log_at_trx_commit=2
    innodb_thread_concurrency=0
    innodb_flush_method=O_DIRECT
    innodb_max_dirty_pages_pct=80
    innodb_file_format=barracuda
    innodb_file_per_table
    innodb_adaptive_hash_index=0
    innodb_doublewrite=0
    innodb_flush_neighbors=0
    innodb_use_native_aio=1


  8. Several members of the small data team at FB will be at MySQL Connect this weekend. It would be interesting to learn whether anyone else has used Linkbench. I use it in addition to sysbench. After some effort tuning InnoDB and a few changes to the source I was able to almost double the Linkbench QPS, but I really need MySQL 5.7 as the per-index latch for InnoDB indexes is the primary bottleneck.

    In addition to networking at conferences, I recently spent a day looking at networking in MySQL 5.1 and 5.6. A good overview is the output from strace -c -p $PID where $PID is the ID of a thread busy with sysbench read-only queries for a cached database. Below I describe the results from MySQL 5.1.63 and 5.6.12 using official MySQL and the Facebook patch. Each result is from a sample of about 10 seconds (give or take a few seconds).

    Official MySQL 5.1.63

    This strace output is from official MySQL 5.1.63. There are two interesting things in these results. The first is frequent calls to sched_setparam and all of them return an error. That is bug 35164 which was fixed in MySQL 5.6. Removing the calls in 5.1 improved performance by about 0.3% on my test server. That isn't a big deal but I am happy the code is gone. The second interesting result is the high number of calls to fcntl. I filed feature request 54790 asking for them to be removed. They were a big problem for performance on older Linux kernels that used a big kernel mutex for some of the fcntl processing. See this post for details on the impact. This is not a performance problem on the kernels I have been using recently.


    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     40.30    0.964307          17     57447     17100 read
     36.58    0.875208          22     40348     40348 sched_setparam
     11.30    0.270419           8     34199           fcntl
      9.12    0.218152          11     20174           write
      2.51    0.060160          20      3021       621 futex
      0.19    0.004601          29       156           sched_yield
    ------ ----------- ----------- --------- --------- ----------------
    100.00    2.392847                155345     58069 total

    Facebook 5.1.63

    This strace output is from the Facebook patch for MySQL 5.1.63. It still has the frequent errors from calls to sched_setparam. But instead of too many calls to fcntl it has too many calls to setsockopt. That was a good tradeoff on some Linux kernels as described in this post but it doesn't matter on recent kernels.

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     46.15    0.673638          42     15851           read
     28.17    0.411157          26     15850     15850 sched_setparam
     11.73    0.171289          11     15851           setsockopt
      6.93    0.101176          13      7925           write
      4.61    0.067359          26      2628       615 futex
      2.41    0.035137          39       896           sched_yield
    ------ ----------- ----------- --------- --------- ----------------
    100.00    1.459756                 59001     16465 total

    MySQL 5.6.12

    This result is the same for both official MySQL and the Facebook patch. Hooray, the calls to sched_setparam are gone! There are many calls to recvfrom that get errors. I assume these are the non-blocking calls that return no data. There are also many calls to poll. I prefer to see fewer calls to poll, so I hacked on MySQL to do blocking calls to recv. That made the poll calls go away but didn't have a significant impact on performance. Perhaps it will help in the future when other bottlenecks are removed.

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     52.61    1.743483         108     16120           poll
     33.75    1.118461          59     19055           sendto
      9.71    0.321903           6     54229     16121 recvfrom
      3.92    0.130002          19      6846      1716 futex
      0.01    0.000353         118         3           sched_yield
    ------ ----------- ----------- --------- --------- ----------------
    100.00    3.314202                 96253     17837 total

    Server-side network reads

    The pattern for server-side network reads is to first do a non-blocking read and if that doesn't return the expected amount of data then a read with timeout is done. I don't have all of the history behind the design decision but my guess is that there had to be support for interrupting reads on shutdown. The implementation of read with timeout has changed over time and it can be hard to figure out some of the code unless you look at the preprocessor output.

    • in official MySQL 5.1.63 read with timeout was usually implemented by doing a blocking read and then the alarm thread would unblock the thread when the timeout was reached or on shutdown. There was also an option to not use the alarm thread (see -DNO_ALARM). This suffered from frequent calls to fcntl which was a problem when Linux required a big kernel mutex to process them.
    • in the Facebook patch for MySQL 5.1.63 the code was changed to set a read timeout on the socket via setsockopt and this worked with -DNO_ALARM.
    • in MySQL 5.6 the recv call is used to read data from a socket and the code does non-blocking reads and then does a wait with timeout via poll until there is more data.

    Reference

    I frequently refer to my old blog posts to research problems so I pasted some of the network stack that is used during server-side socket reads. The call stack is:
    my_net_read()
    -- net_read_packet()
    ---- net_read_packet_header()
    ------ net_read_raw_loop()
    ---- net_read_raw_loop() to get body

    And the interesting code for net_read_raw_loop() is listed below:

    net_read_raw_loop has:

      while (count)
      {
        size_t recvcnt= vio_read(net->vio, buf, count);

        /* VIO_SOCKET_ERROR (-1) indicates an error. */
        if (recvcnt == VIO_SOCKET_ERROR)
        {
          /* A recoverable I/O error occurred? */
          if (net_should_retry(net, &retry_count))
            continue;
          else
            break;
        }
        /* Zero indicates end of file. */
        else if (!recvcnt)
        {
          eof= true;
          break;
        }

        count-= recvcnt;
        buf+= recvcnt;
      }

    vio_read is:

    size_t vio_read(Vio *vio, uchar *buf, size_t size)
    {
      ssize_t ret;
      int flags= 0;

      /* If timeout is enabled, do not block if data is unavailable. */
      if (vio->read_timeout >= 0)
        flags= VIO_DONTWAIT;  

    /* this is VIO_DONTWAIT == MSG_DONTWAIT 
       and with tracing all calls to mysql_socket_recv have read_timeout > 0 and use MSG_DONTWAIT */

      while ((ret= mysql_socket_recv(vio->mysql_socket, (SOCKBUF_T *)buf, size, flags)) == -1)
      {
        int error= socket_errno;

        /* The operation would block? */
        if (error != SOCKET_EAGAIN && error != SOCKET_EWOULDBLOCK)
          break;

        /* Wait for input data to become available. */
        if ((ret= vio_socket_io_wait(vio, VIO_IO_EVENT_READ)))
          break;
      }

      DBUG_RETURN(ret);
    }

    int vio_socket_io_wait(Vio *vio, enum enum_vio_io_event event)
    {
      int timeout, ret;

      DBUG_ASSERT(event == VIO_IO_EVENT_READ || event == VIO_IO_EVENT_WRITE);

      /* Choose an appropriate timeout. */
      if (event == VIO_IO_EVENT_READ)
        timeout= vio->read_timeout;
      else
        timeout= vio->write_timeout;

      /* Wait for input data to become available. */
      switch (vio_io_wait(vio, event, timeout))
      {
      case -1:
        /* Upon failure, vio_read/write() shall return -1. */
        ret= -1;
        break;
      case  0:
        /* The wait timed out. */
        ret= -1;
        break;
      default:
        /* A positive value indicates an I/O event. */
        ret= 0;
        break;
      }

      return ret;
    }

    int vio_io_wait(Vio *vio, enum enum_vio_io_event event, int timeout)
    {
      int ret;
      short DBUG_ONLY revents= 0;
      struct pollfd pfd;
      my_socket sd= mysql_socket_getfd(vio->mysql_socket);

      memset(&pfd, 0, sizeof(pfd));
      pfd.fd= sd;

      /* Set the poll bitmask describing the type of events.
        The error flags are only valid in the revents bitmask. */

      switch (event)
      {
      case VIO_IO_EVENT_READ:
        pfd.events= MY_POLL_SET_IN;
        revents= MY_POLL_SET_IN | MY_POLL_SET_ERR | POLLRDHUP;
        break;
      case VIO_IO_EVENT_WRITE:
      case VIO_IO_EVENT_CONNECT:
        pfd.events= MY_POLL_SET_OUT;
        revents= MY_POLL_SET_OUT | MY_POLL_SET_ERR;
        break;
      }

      /* Wait for the I/O event and return early in case of error or timeout */
      switch ((ret= poll(&pfd, 1, timeout)))
      {
      case -1:
        break; /* return -1 on error */
      case 0:
        /* Set errno to indicate a timeout error. */
        errno= SOCKET_ETIMEDOUT;
        break;
      default:
        /* Ensure that the requested I/O event has completed. */
        DBUG_ASSERT(pfd.revents & revents);
        break;
      }

      DBUG_RETURN(ret);
    }

    static inline ssize_t 
    inline_mysql_socket_recv(MYSQL_SOCKET mysql_socket,  SOCKBUF_T *buf, size_t n, int flags)
    {
      ssize_t result;

      /* Non instrumented code */
      result= recv(mysql_socket.fd, buf, IF_WIN((int),) n, flags);

      return result;
    }

  9. My co-workers will speak about big and small data at XLDB. Jeremy Cole and the Tokutek founders are also speaking. I hope to learn many interesting things there including my fate (1, 2, 3, 4), whether I am doing things right or wrong and what database technology might be used by future extremely large science experiments. Oh, you probably missed it but we are doing it all wrong. See the slides/abstract from NEDS 2013 on "The Traditional Wisdom is All Wrong". Excessively strong claims without any attempt to understand web-scale data management don't make a great paper.

