1. What is a modern database? We have terms that wander between marketing and technical description - NewSQL, NoSQL. We have much-needed work on write-optimized database algorithms - Tokutek, LevelDB, RocksDB, HBase, Cassandra. We also get reports of amazing performance. I think there is too much focus on peak performance and not enough on predictable performance and manageability.

    Building a DBMS for production workloads is hard. Writing one from scratch is an opportunity to do a lot better than the products you hope to replace. It is also an opportunity to repeat many mistakes. You can avoid some of them by getting advice from someone with a lot of experience supporting production workloads. I worked at Oracle for 8 years, wrote some good code (new sort!) and fixed a lot of bugs but never got anywhere near production.

    Common mistakes include insufficient monitoring and poor manageability. Monitoring should be simple. I want to know when something is running and when it is not (waiting on IO, waiting on locks). I also want to drill down by user and table -- user and table aren't just there for access control. I am SQL-centric in what follows. While there are frequent complaints about optimizers making bad choices, I can only imagine how much fun it will be to debug load problems when the query plan is hidden away in some external application.
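    To illustrate the drill-down by user and table, here is a minimal sketch that assumes the server can emit a (user, table, elapsed time) record per statement. The event format and names below are invented for illustration, not any real server's interface:

```python
from collections import defaultdict

def top_consumers(events, n=3):
    """Aggregate elapsed query time by (user, table) and return the
    top-n consumers. Each event is (user, table, elapsed_seconds)."""
    totals = defaultdict(float)
    for user, table, elapsed in events:
        totals[(user, table)] += elapsed
    return sorted(totals.items(), key=lambda kv: -kv[1])[:n]

# Hypothetical per-statement records collected from the server.
events = [
    ("app_rw", "orders", 0.4),
    ("app_rw", "orders", 0.6),
    ("batch",  "clicks", 5.0),
    ("app_ro", "users",  0.1),
]
print(top_consumers(events))
```

    With aggregates keyed this way it is immediately obvious which user and which table are consuming the server, which access control metadata alone cannot tell you.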

    The best time to think about monitoring is right after spending too much time debugging a problem. At that point you have a better idea of the data that would have made things easier. One example of missing monitoring was the lack of disk IO latency metrics in MySQL. In one case the lack of them made it easy to miss that an oversubscribed NFS server was slowing queries with 50 millisecond disk reads.
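    A toy example of why per-read latency metrics matter: an average can hide a tail of slow NFS reads that a percentile exposes. The numbers here are made up to mimic that incident:

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (seconds)."""
    ordered = sorted(samples)
    k = max(0, int(round(p / 100.0 * len(ordered))) - 1)
    return ordered[k]

# 95 fast local reads plus a tail of 50ms reads from a slow NFS server.
reads = [0.0002] * 95 + [0.050] * 5
avg = sum(reads) / len(reads)
print("avg %.4fs  p99 %.3fs" % (avg, percentile(reads, 99)))
```

    The average here is under 3 milliseconds and looks harmless; the p99 is 50 milliseconds and points straight at the problem.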

    Monitoring should be cheap so that it can always be enabled; from that I can understand average costs and spot changes in load from the weekly push. But I also need to debug some problems manually, so I need both to monitor sessions that I know are too slow (get the query plan for a running SQL statement) and to find sessions/statements that are too slow (dump things into the slow query log when certain conditions are met). Letting me run EXPLAIN for statements in my session is useful, but I really need EXPLAIN for statements running in production - if the optimizer uses sampling I want to see the plan they get, and if temp tables are involved I have no idea what will be in their temp tables. MariaDB (and MySQL) recently added support for showing the query plan for statements that are currently running. This is even more useful when the query plan and performance metrics can be dumped into the slow query log when needed.

    The goal for monitoring performance problems is to eliminate the use of PMP, the Poor Man's Profiler. When I want to understand why a system is running slower than expected I frequently look at thread stacks from a running server. I hope that one day pstack + awk is not the best tool for the job on a modern database. I was debugging a problem with RocksDB while writing this. The symptom was slow performance for a read-only workload with a database that doesn't fit in cache. I had seen the problem before and was quickly able to figure out the cause -- the block cache used too many shards. Many problems are easy to debug when you have previously experienced them. Put another way: many problems are expensive to debug for most users because they don't have full-time database performance experts.
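    The pstack + awk workflow can be sketched in a few lines: collapse identical thread stacks and count them so the dominant wait stands out. This mimics only the idea; real pstack output varies by platform and the frame parsing below is a simplifying assumption:

```python
from collections import Counter

def aggregate_stacks(pstack_output):
    """Collapse pstack-style output into (stack, count) pairs, most
    common first, mimicking the pstack + awk workflow."""
    stacks, frames = Counter(), []
    # A trailing sentinel "Thread" line commits the final stack.
    for line in pstack_output.splitlines() + ["Thread"]:
        if line.startswith("Thread"):
            if frames:
                stacks[",".join(frames)] += 1
            frames = []
        elif line.startswith("#"):
            # Keep just the function name from "#0  0xADDR in name ()".
            frames.append(line.split()[3])
    return stacks.most_common()

sample = """Thread 2 (LWP 11):
#0  0x1 in pread ()
#1  0x2 in os_file_read ()
Thread 1 (LWP 10):
#0  0x1 in pread ()
#1  0x2 in os_file_read ()
"""
print(aggregate_stacks(sample))
```

    When most threads share one stack, that stack is usually the bottleneck - which is exactly the signal a database should be able to report about itself without an external profiler.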

    The focus on peak performance can be at odds with manageability. The fastest way to peak performance is via tuning options, and too often these options are static. Restarting a database in production to change an option is a bad idea. Dynamic options are an improvement. Using adaptive algorithms in place of many options is even better. And if you add options, make sure the defaults are reasonable.
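    A hypothetical sketch of an adaptive algorithm replacing a static option: size a cache from its observed miss rate instead of a fixed setting. The thresholds and names here are assumptions for illustration, not any real server's policy:

```python
def adapt_cache_size(size_mb, miss_rate, lo=0.01, hi=0.10,
                     max_mb=4096, min_mb=64):
    """Adjust a cache size from the observed miss rate instead of a
    static option: grow when misses are high, shrink when it is cold."""
    if miss_rate > hi:
        return min(max_mb, size_mb * 2)   # misses hurt: grow the cache
    if miss_rate < lo:
        return max(min_mb, size_mb // 2)  # mostly idle: give memory back
    return size_mb                        # in the sweet spot: leave it alone
```

    Run periodically against fresh counters, a controller like this removes both the static option and the production restart needed to change it.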

    Predictable performance is part of manageability. How does your modern database behave when confronted with too much load? It helps when you can classify your workload as high-priority and best-effort and then shed load from the best-effort users. Alas this requires some way to distinguish users. In theory you want the hi-pri users to get their work done and then let best-effort users compete for the spare capacity. This requires some notion of SLA for the hi-pri users and spare capacity for the DBMS. These are hard problems and I have not used a great solution for load shedding.
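    One simple form of the load shedding described above can be sketched as admission control that reserves capacity for high-priority users and lets best-effort users compete for the rest. This is an illustration of the idea, not a claim about any specific DBMS:

```python
class AdmissionController:
    """Shed load from best-effort users when the server is near capacity
    while always admitting high-priority users up to full capacity."""

    def __init__(self, capacity, reserved_for_hipri):
        self.capacity = capacity            # max concurrent queries
        self.reserved = reserved_for_hipri  # slots best-effort may not use
        self.running = 0

    def admit(self, hipri):
        limit = self.capacity if hipri else self.capacity - self.reserved
        if self.running < limit:
            self.running += 1
            return True
        return False                        # shed: reject or queue the query

    def done(self):
        self.running -= 1

# With capacity 10 and 4 reserved slots, best-effort users can hold at
# most 6 slots while hi-pri users can use all 10.
ac = AdmissionController(capacity=10, reserved_for_hipri=4)
```

    The hard parts the paragraph above alludes to are not in this loop: deciding who is hi-pri, defining their SLA, and estimating spare capacity are where real load shedding gets difficult.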

    This could be a much larger rant/document but I must return to my performance debugging.

  2. The UC schedule has been published and there are several talks from the database teams at Facebook.
    • Small Data and MySQL by Domas Mituzas - small data is another name for OLTP. Given the popularity of big data we think that small data also deserves attention.
    • Asynchronous MySQL by Chip Turner - this describes the work done by Chip's team to implement an async MySQL client API. The feature is in the FB patch, widely used at FB and is integrated with HHVM.
    • Performance Monitoring at Scale by Yoshinori - this will explain how to be effective when monitoring many servers with few people. It is easy to get distracted by false alarms.
    • MySQL 5.6 at Facebook by Yoshinori - Yoshi will share many stories about what it took to get 5.6 into production. This included a bit of debugging and performance testing, bug fixes from upstream and a lot of work from the MySQL teams at FB.
    • Global Transaction ID by Evan, Yoshinori, Santosh - at last global transaction IDs have arrived (for people not using the Google patch). Learn what it took to get this ready for production. 
    • InnoDB Defragmentation by Rongrong - learn about the work by Rongrong to reduce the amount of space wasted from fragmentation.
    • MySQL Pool Scanner by Shlomo - MPS is one of the key tools created by our automation experts (aka operations gurus) that make it possible to manage many machines with few people.