Tuesday, October 27, 2009

Managed MySQL -- Amazon RDS

Managed MySQL is here. Amazon RDS lets you run MySQL on Amazon's hardware. It isn't perfect, but I think this is a great first release. I expect it to support PostgreSQL soon, given that the command-line tools are not MySQL-specific.

Note:
  • This uses MySQL 5.1.38
  • I did not see an option to enable SSL connections to MySQL. I think that is required for this to be a great way to run MySQL.
  • This supports MyISAM and InnoDB. They don't give you command line access to the machines, so you cannot run myisamchk to recover corrupt MyISAM tables, nor can you run myisampack to compress them. I think it is a good idea to stick with InnoDB and then ask Amazon to upgrade to the InnoDB 1.0.4+ plugin.
  • This appears to use network-attached storage for most data. For example, innodb_data_home_dir=/rdsdbdata/db/innodb. I am not sure whether data is buffered in the OS buffer cache. If it is not, MyISAM performance will suffer, because MyISAM caches only index blocks (in its key cache) and relies on the OS to cache table data.
  • Replication is disabled. That makes it much easier to run many instances of MySQL in this environment. Replication state is not crash-proof, and Amazon probably does not want to spend its days recovering, replacing and rebuilding slaves. But it also limits the use of RDS for read scale-out. Maybe Amazon and RightScale have something in progress to change that without introducing manageability overhead.
  • The master user does not have SHUTDOWN, SUPER or replication privileges.
  • Binlogs are enabled, but the master user does not have privileges to run SHOW MASTER STATUS. The documentation states that databases can be recovered up to the last 5 minutes. I assume this means that any write is guaranteed to be archived somewhere within 5 minutes. If there were an option to archive the binlogs, that would provide an extra degree of safety.

17 comments:

  1. I discovered the lack of SUPER privileges when I tried to create a TRIGGER. Somewhat surprisingly, I was able to grant my master user SUPER privileges.

  2. I did the math and we'd have to pay ~$20,000/year for something we do with a $7k server (plus some power costs, but those aren't much).

    I guess this isn't designed to run anything too big, is it?

  3. +1 for supporting PostgreSQL. I've got clients who are already running Pg in AWS, and would love it if it was managed similarly.

  4. @Domas - many companies have neither a server room nor a DBA nor a good backup/failover solution. This might be great for them.

  5. Domas,

    RDS prices are the EC2 instance prices with a mark-up. Once they introduce reserved instances for RDS (in the roadmap), the price disparity may reduce.

    Best,
    Ismael

  6. Yeah, but with the lack of access there is also a lack of ability to monitor, analyse performance and address issues.
    It will indeed address the needs of a (large) group of simple single-server setups that currently just stuff around with unmanaged default installs.
    It does not address the needs of environments that require tuning, replication and scaling. In that context, the product table makes no sense, as the higher-end servers will not actually perform well given the InnoDB internals.

  7. @Arjen - I think you are overstating your case. For tuning, my.cnf parameters can be set. For scaling, the larger servers also provide more memory and better IO performance, so they will run some workloads much faster. When RDS switches from the built-in InnoDB to the 1.0.4+ plugin, performance for high-concurrency workloads will get much better, as the 1.0.4+ plugin scales to or beyond 16 cores on some workloads for me. By 'scale' I mean that I get more throughput than on an 8-core server, not 2X the throughput.

  8. See my comment on ReadWriteWeb about RDS. I/O performance is horrific. If you have any significant write load, RDS and EC2 are pretty useless.

    Of course, this was just my experience. I've heard you can stripe EBS volumes in a RAID configuration.

    http://www.readwriteweb.com/enterprise/2009/10/amazon-web-services-announces-relational-database-services.php#comment-165560

  9. @Justin - from the numbers you published, you are IO-bound at the latency of a slow SATA disk in the RDS case, IO-bound at the latency of a SAS disk in the non-RDS EC2 case, and not IO-bound at all in the Dell case.

    In other words, in the RDS case each transaction waits for the latency of 2 IO operations, in the non-RDS EC2 case it waits for 1, and in the Dell case it waits for none, thanks to write buffering by the disk cache, OS buffer cache or HW RAID card.

    The binlog is enabled for RDS. Was it also enabled for the other cases? I don't know enough about EBS to understand its latencies for write and fsync.

  10. Write performance is horrible and the price is rather steep. So what's the use case for this?

  11. @Nils - how can I respond to a detailed statement like that? I will try.

    At a job long ago, the QA team decided the day before the release that they were the perf team. Wizards that they were, they reported that the competition did 10,000 commits per second from one thread on a server with 1 slow disk, while our product could only do a miserable 100 per second. Therefore we sucked and the product should not be shipped. They never explained how the competition could get 10,000+ IOPS out of that disk. Nor did they explain how the competition's pure-Java server could force a sync on commit, as pure Java did not do sync/fsync at the time.

    So, maybe write latency is horrible, and maybe people are comparing it with the performance of writes into an OS buffer cache, drive cache or HW RAID cache. I want to see more results. Also, when Amazon upgrades to the 1.0.4+ plugin, we get multiple background write threads, and that will make a big difference when InnoDB writes see real IO latency rather than the latency of a write into a cache.

    With respect to price, I think the base case varies. For someone who has no DBA, sysadmin, network tech, server room, UPS or backup solution, the prices for RDS might be good.

  12. Mark,

    I just wanted to sum up some previous statements and then get right to the punch line ;)

    You don't need to buy a whole server room and hire staff to run one database. You can rent or buy a server, colocate it somewhere and pay someone to do the operations work for you if you can't do it yourself. So the question is: could you still save by using RDS? Especially since you can customize the hardware to your needs, particularly for I/O. A BBU for the RAID card, for example, does wonders for many workloads and only costs about $150. This is not possible with EC2. Getting a "Quadruple Extra Large DB Instance" (do you get fries with that?) and then only getting the write latency equivalent of a single SAS disk (assuming 85 qps with binlog enabled)? COME ON! At $2,000 a month I'd expect a bit more.

  13. @Nils -- with the details, your summary is much more convincing. As I am curious about the results, I would like to understand what limits performance in this case, but I don't have time to try that now. Maybe someone else in the community will publish more results. EC2 (MySQL without RDS) can overcome some of this by using local storage plus something to archive the binlogs remotely.

    I am also curious about how the EBS IO is set up on the various RDS instances. Do the larger instances stripe RAID over more EBS devices?

  14. I wonder if they will make it easy to transfer data to servers outside EC2, for example when your product becomes successful and you want to run your own servers. Are there any tools to get a backup of InnoDB or MyISAM tables so the data owner can transfer it to another facility? If so, do they allow remote SSL connections over the internet for replication?

  15. Nils, for all of us who don't have highly capable DBAs, air-conditioned and power-conditioned server rooms, quick and ready hardware replacements, or the patience and skill to coordinate a team that can achieve this, the RDS option looks to be as big a win as EC2 turned out to be.

  16. Performance is another problem with RDS, so apps that need raw speed will have to look at alternatives (or a traditional setup with a local server).

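Comments 9, 11 and 12 all turn an observed single-thread commit rate into a per-IO latency. A minimal Python sketch of that back-of-the-envelope arithmetic follows. It assumes a single client thread, that every synchronous write waits for the device (no write cache), and, for rotating disks, roughly one full rotation per synchronous write. The 7200 RPM figure and the two-IOs-per-commit count applied to comment 12's 85 qps are illustrative assumptions, not measurements from the thread, and the function names are hypothetical.

```python
"""Back-of-the-envelope commit-rate arithmetic (simplified model, not a benchmark).

Assumptions: one client thread, every synchronous write waits for the device
(no battery-backed, drive or OS write cache), and a rotating disk needs roughly
one full rotation per synchronous write to the same log location.
"""


def max_commits_per_second(ios_per_commit: int, io_latency_ms: float) -> float:
    """Upper bound on single-thread commits/s when each commit waits for
    `ios_per_commit` synchronous IOs of `io_latency_ms` each."""
    if ios_per_commit <= 0 or io_latency_ms <= 0:
        raise ValueError("inputs must be positive")
    return 1000.0 / (ios_per_commit * io_latency_ms)


def implied_io_latency_ms(commits_per_second: float, ios_per_commit: int) -> float:
    """Per-IO latency implied by an observed single-thread commit rate."""
    if commits_per_second <= 0 or ios_per_commit <= 0:
        raise ValueError("inputs must be positive")
    return 1000.0 / (commits_per_second * ios_per_commit)


def rotation_latency_ms(rpm: int) -> float:
    """Time for one full platter rotation, in milliseconds."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60_000.0 / rpm


if __name__ == "__main__":
    # Comment 12: 85 qps (treated as commits/s), assuming 2 synchronous IOs per
    # commit as comment 9 estimates for RDS, implies roughly 5.9 ms per IO.
    print(round(implied_io_latency_ms(85, 2), 1))  # 5.9
    # Comment 11: an assumed 7200 RPM "slow disk" allows about 120 commits/s...
    print(round(max_commits_per_second(1, rotation_latency_ms(7200)), 1))  # 120.0
    # ...while 10,000 commits/s would need 0.1 ms per synchronous write.
    print(implied_io_latency_ms(10_000, 1))  # 0.1
```

Under these assumptions, 10,000 commits per second from one thread would require 0.1 ms per synchronous write, far below the time of one rotation of a spinning disk, which is why comment 11 suspects such results come from writes landing in a cache rather than on the platter.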

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.