- Support undo. The manual claims that MyISAM supports atomic operations. They must use a different meaning for atomic. When a long-running insert, update, delete or replace statement is killed it remains half-done for MyISAM. This could be fixed by supporting undo for MyISAM. I will guess that MySQL can reuse some of the code they added for hot backup to implement undo.
- Reduce mutex contention. MyISAM could use multiple key caches per table as MPAB has proposed.
Monday, February 15, 2010
Save MyISAM
What is the future for MyISAM? MySQL has invested a lot in storage engines over the past few years (Falcon, Maria) and it isn't clear that anything will come from those efforts. A lot of effort has been put into InnoDB and much will come from that. There has not been a significant effort to improve MyISAM (other than hot backup). What could be done with it?
Subscribe to:
Post Comments (Atom)


The strength of MyISAM is in the simplicity of it. Row after row of data stacked up in a file.
ReplyDeleteThe only weakness I see is that it is not crash safe or atomic, so a simple modifaction of how data changes/inserts are written to disk would be the best change.
Keep it simple and fast, there are users who depend upon that speed in a low concurrency environment.
Make it crash safe by implementing a synch to disk at statement completion. This should give you single statement atomicity.
MyISAM should not be made transactional beyond this level though. Because as you state..
" MyISAM performance matters even when you don't explicitly create MyISAM tables. MyISAM is used for some implicit temp tables created to evaluate ORDER BY and GROUP BY clauses."
So internal creation of tables , I assume those are non indexed temp tables, needs to be simple and fast.
--
Tom Hanlon
Meh. How about we just replace MyISAM altogether? PBXT, for instance, has some interesting design concepts. ARCHIVE is great for fast INSERT performance (slow retrieval obviously). And for data-warehousing some of the newer engines coming out seem to be a preferred choice.
ReplyDeleteMyISAM is old and busted and perhaps I have distaste for it given the number of off the shelf applications that use it when InnoDB would be highly preferred.
MyISAM was built to fast (for some things) and simple. I think it makes sense to leave it as such and work on something that is superior. Even Drizzle, to my knowledge, is only using MyISAM internally because another solution has not been created yet.
Plus, Monty is already working on MariaDB and it is supposed to be crash safe and, eventually, transactional if I'm not mistaken.
Partitioning the key cache is fairly straightforward. It doesn't really make the engine that much more complicated. But it scales a lot better.
ReplyDeleteOur patch and benchmarking for partitioning the MyISAM key cache here: http://peter.stardoll.com/
We're using it internally. We would love if more people try it out and provide feedback.
@Tom - the first thing to fix is to undo half-done statements and sync at statement completion doesn't fix that.
ReplyDelete@Tim - I think PBXT is great and at times regret that all of my energy must go into InnoDB. That will change some day. There is a use case for MyISAM. In a master slave setup, you should use InnoDB for the replicated tables (tomorrow you should use PBXT). If long running queries are being done on the slave and long running insert-select statements used to create reporting tables, then it is not a good idea to use something other than InnoDB for that. MyISAM is the engine of choice in that case but has suffered from neglect.
@Fredrik - is this the patch that was contributed to MariaDB? It sounds great and I can't wait to get it into my builds.
@Tim - I have never had a reason to use ARCHIVE. The use case is very narrow as it supports INSERT and full table scans.
ReplyDeleteThe truth is nobody wants to look at it because it's a piece of old code written in old, obscure, micro-optimized "Monty" style. If I was an engineer looking at this code I would take several months to first make it readable and encapsulated.
ReplyDeleteHaving pulled the source code to pieces on several occasions, I can honestly say, I think I understand why MySQL has done so little work on MyISAM recently.
ReplyDeleteIt's because they don't understand the source. I don't think anybody can.
Key cache contention would be very beneficial to fix, imo. Use case: (single or ) denormalized datasets that can't benefit from multiple multiple key caches.
ReplyDelete@Mark - Yes, we contributed that patch to MariaDB. We're using vanilla MySQL though.
ReplyDeleteMark, as you suggest, they *do* have a different definition for atomic. Further down in the page:
ReplyDelete"Atomic," in the sense that we mean it, is nothing magical. It only means that you can be sure that while each specific update is running, no other user can interfere with it, and there can never be an automatic rollback (which can happen with transactional tables if you are not very careful).
So....yes, it doesn't mean the same thing, but to their credit, they did explain what they meant by "atomic"....even if it's completely silly to call that "atomic".
@Tim Monty designed the [My]ISAM engine (in the pre-MySQL UNIREG era) for datawarehousing: concurrent selects, fast tablescans, single (bulk)insert thread.
ReplyDeleteARCHIVE "solves" a problem that doesn't exist. The premise was that for high inserts you get IO bound. Untrue. With multi-row inserts and other tuning, I can max out a thread CPU at +/- 340k row inserts/second on MyISAM with IO to spare for other threads/logging/etc.
MariaDB != Maria. MariaDB includes the Maria engine, but it's also a branch/fork with other additional features like PBXT, XtraDB and various useful patches and fixes.
@Mark there's a community patch for "partitioned key cache" for MyISAM that's getting added to MariaDB. This patch also solves the problem for retrieving stats when using multiple key caches (that feature already existed for a few years).
I agree with the undo, adding that to MyISAM might make a lot of sense.
@Arjen - you should have submitted 340k insert/second as a conference talk. I am sure there is something you are not telling me. For example, are there secondary indexes? Are the inserts in PK order? Inserting quickly is hard once the data doesn't fit in the buffer cache and the inserts aren't in PK order or there are secondary indexes. And if there are no indexes (as in archive) then I agree with you 100%, MyISAM can store that data without problems. But there won't be much use of it and it isn't better than a file system at that point.
ReplyDelete[...]Mark Callaghan gets our MySQL stuff started. “What is the future for MyISAM?” He asks.[...]
ReplyDeleteLog Buffer #179