Falcon has the performance, reliability and scalability you need to match your application requirements.
This statement must be from the future as there is no production release today.
Row-level replication (instead of statement-based replication) is required when replicating Falcon objects.
This might be enough to keep me from upgrading, but I am not sure if this is limited to Falcon. Will future MySQL releases require the use of row based replication? Having SQL statements in the binlog is invaluable to me and I am not willing to give that up.
True clustered indexes on InnoDB tables would not be migrated and function on Falcon as they do on InnoDB.
Most descriptions of Falcon discuss the differences between clustered indexes for InnoDB and the index scan optimizations done for Falcon. This one does not. Index range scans on the primary key index for Falcon tables do not behave the same as InnoDB and are likely to use more IO in many cases.
Falcon excels in processing short to medium-sized transactions on multi-CPU hardware, and is therefore ideal for most online database applications.
Does this mean that Falcon is targeted towards OLTP and will not do well for data warehouse queries? InnoDB is great at both. I think Maria will be great at both, although I would be happy to use it if it supported crash recovery and many readers concurrent with 1 writer, but not concurrent writers.
The plan is to have a GA release of Falcon in mid-2008, but is dependent on the information gathered from the alpha and beta testing periods.
Has the plan changed? We are not that far off from mid-2008. What are the release targets for 5.1 and 6.0 and which one has priority?


Hi Mark,
The questions about Falcon are relevant, sorry if my question is a bit off topic.
But, I would be very interested to know, what do you use the SQL statements in the binlog for?
So maybe MySQL AB has realized that Falcon is not going to be the migration path for many InnoDB users and this is why Maria has transaction safety on its roadmap?
Mark -
Thanks for your comments. In response to some of your questions, I've updated and reposted (may take a day to appear on the site) the several different docs you've pulled quotes from and also updated the additional FAQ you refer to with the new timeframe for 6.0 that reflects the changes in the 5.1 delivery schedule.
Some of the team may post other specific answers for you later in this blog.
Thanks again,
--Robin
Mark, I am glad you are happy with InnoDB. It IS an amazing database engine for performance and we use it in the falcon team as a target for comparisons.
"there is no production release today" - We are hoping to get a production release later this year.
Statement-based replication is planned, but not for the first release.
"Falcon excels in processing short to medium-sized transactions on multi-CPU hardware". Falcon handles large transactions well too. We are still making some tweaks to record cache handling of many concurrent large transactions. But the main point should be that Falcon continues to make the most use of each CPU on the system and handles high concurrency very well because there are no index locks. Falcon locks only records that are being changed or are locked for update.
Kevin Lewis - Falcon Team Lead
Hi!
Falcon is just another option, it is not an Innodb replacement. I would ignore the marketing fluff and just test it to figure out if it belongs in your stack or not. Stick to first hand experience.
Oracle continues to be committed to Innodb (they even hired a friend of mine in the last couple of weeks to work on it, so it's far from being dead there).
The future of replication is a hybrid mix of row and statement based. It would be simpler, and more reliable to stick to row based, but there are some great advantages with UPDATE that only come with statement.
Will Innodb some day require it? Maybe, it is up to Heikki. I can see good arguments for and against it. Extra bandwidth in replication may be offset with fewer locks if you go with row based (but... big transactions are harmed by IO...).
Cheers,
-Brian
This just goes on to show the lack of understanding and communication between the marketing
team and those working on product(s) at MySQL.
"Having SQL statements in the binlog is invaluable to me and I am not willing to give that up."
Very good point Mark. I hope MySQL doesn't force us to use RBR. Come to think of it, I don't
think production InnoDB environments can really take that hit.
"Index range scans on the primary key index for Falcon tables do not behave the same as InnoDB
and are likely to use more IO in many cases."
I am with you on this one too, Mark. Sometimes I feel there is a gap in MySQL's understanding
of true production environments even though they provide consulting to many. Clustered indexes
are extremely valuable to many large environments focusing on decreasing IO as much as
possible. This is also the reason Maria fails to inspire me. Those who understand the true cost
of IO will avoid any marginal IO from happening as much as possible. Even talking to upcoming
vendors and products I find it hard to believe how everyone just simply doesn't even think of
clustered indexes.
"Does this mean that Falcon is targeted towards OLTP and will not do well for data warehouse
queries? InnoDB is great at both"
Man, did you really get into my mind and find an eloquent way of describing my thoughts on your
blog? One of the biggest misconceptions that widely exists is that InnoDB is a poor choice for
data warehouse queries with multi-hundred GB InnoDB tables. I passionately disagree with that
as well. I cannot thank Heikki enough for thinking of clustered indexes in InnoDB. Without it,
we probably will be against a wall as far as MySQL is concerned because our application needs
the reliability of OLTP systems and yet the ability to run data warehouse queries. Our DW
queries on InnoDB run much faster and cause the least disruption from an IO point of view.
"I think Maria will be great at both"
I am curious to know why you think Maria will be great at both? AFAIK Maria still lacks
clustered indexes (they are not even on the roadmap). Unless clustering of data is involved
it will still cause more IO, IMHO. As I have pointed out on my blog before, that is the one reason
even Maria offers us no hope with our needs. However, that is just because of our production
needs. Maria seems a very fine creation and looks ideal for many kinds of environments.
"many readers concurrent with 1 writer, but not concurrent writers"
hmm... this makes me even more curious. May I ask, why concurrent writers are not ideal in your
situation especially if you want OLTP support with the ability to run DW queries?
Thanks for a great post.
I use this for debugging and monitoring on a regular basis. We just upgraded a MySQL deployment from 4.0 to 5.0. During this we wanted to make sure that the data in 5.0 matched that in 4.0. Some Double columns were significantly different. The cause was a rounding error (4.0 did the math as Double, 5.0 did the math as Decimal) that became significant when multiplied by a large number. I found the offending SQL in the binlog and the application owners changed their code. I had to fix several differences in this way.
I have had to check many times for uses of temporary tables. That is done now, as we don't allow them to be created on masters. Before then, the binlog helped me find the applications creating temp tables.
When there is a load problem on a machine, I start by looking at vmstat, iostat and top output. When there is a replication problem, I look at the binlog.
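The Double versus Decimal drift described above can be reproduced outside MySQL. This is a minimal Python sketch, not the SQL that was found in the binlog: the input values and the multiplier are made up, and Python's `float` and `decimal.Decimal` only stand in for Double and Decimal arithmetic.

```python
from decimal import Decimal

# Hypothetical inputs; the real statement and values are not given in the post.
price = "0.1"
quantity = "0.2"
scale = 10**18  # a large multiplier, as in the anecdote

# Double-style arithmetic: binary floating point carries a small representation error.
as_double = (float(price) + float(quantity)) * scale

# Decimal-style arithmetic: the sum is exact in base 10.
as_decimal = (Decimal(price) + Decimal(quantity)) * scale

# Convert the float to its exact decimal value so the comparison itself adds no rounding.
difference = Decimal(as_double) - as_decimal
print(as_double, as_decimal, difference)
```

The error in the sum is far below anything an application would notice, but after the multiplication it is no longer zero, which is the kind of difference a data comparison between two servers would flag.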
I am more interested in support for large queries than large transactions. With respect to locking rows, the same can be said about InnoDB (ignoring the S locks taken on the master when statement based replication is used). These locks are not an issue for me. I know that Falcon and InnoDB have very different implementations which may have an impact on performance for different workloads. We will have to wait and see what the differences are.
Frank,
As we have not run the workload on anything other than InnoDB, I don't know how much benefit we get from clustered indexes. I hope to have per object (index, table) and per account IO stats in SHOW USER_STATISTICS output, and when that is done I can get a better idea of where my IO goes.
If Maria were limited to 1 writer with concurrent readers I could use it on a slave. That combined with partitioning and compressing some of the Maria partitions (is that supported?) would be a big win.
In 5.1, to get the highest performance from InnoDB for auto-increment lock modes row based replication must be used. Perhaps Falcon is favoring performance/concurrency/scalability in this and other cases which brings the necessity for row based replication.
http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html
Falcon uses row based replication for correctness. Our repeatable read mode is not serializable and statement based replication could cause the master and slave to diverge.
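The divergence argument above can be illustrated with a toy model. The sketch below is not Falcon code: it assumes that a non-serializable repeatable read lets two concurrent transactions each read a starting snapshot, and all function names are hypothetical.

```python
# Toy model: two single-row "tables", a and b, and two transactions that each
# read the other's row. Only an illustration of the argument, not Falcon code.

def run_master_snapshot(db):
    """Both transactions read from the same starting snapshot, then commit."""
    snapshot = dict(db)                 # what each transaction sees
    row_changes = [
        ("a", snapshot["b"]),           # T1: UPDATE a SET v = (SELECT v FROM b)
        ("b", snapshot["a"]),           # T2: UPDATE b SET v = (SELECT v FROM a)
    ]
    result = dict(db)
    for table, value in row_changes:
        result[table] = value
    return result, row_changes

def replay_statements(db):
    """Statement-based slave: re-executes the SQL serially, in commit order."""
    result = dict(db)
    result["a"] = result["b"]           # T1 re-executed
    result["b"] = result["a"]           # T2 re-executed, now sees T1's write
    return result

def replay_rows(db, row_changes):
    """Row-based slave: applies the row images the master produced."""
    result = dict(db)
    for table, value in row_changes:
        result[table] = value
    return result

initial = {"a": 1, "b": 2}
master, changes = run_master_snapshot(initial)
slave_sbr = replay_statements(initial)
slave_rbr = replay_rows(initial, changes)
print(master, slave_sbr, slave_rbr)
```

In this model the master ends with the two values swapped, the row-based slave matches it, and the statement-based slave does not, because serial re-execution cannot reproduce what two overlapping snapshots saw.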
On the clustered index issue, there are benefits on both sides. Many applications use artificial primary keys, which aren't terribly useful for range retrievals. If the primary key is artificial, the performance of secondary keys is important. In Falcon, primary and secondary keys have the same access time. Clustered indexes are also subject to overflows and fragmentation.
Our testing shows that Falcon's record encoding reduces the amount of data stored, which is a different way to approach the question of reducing I/O.
Kudos to Falcon for reminding us that the logical size of an int doesn't have to be the physical size. As some of us go through slow alter table statements to grow our int columns, we wish that all storage engines did this.
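One generic way to decouple an integer's logical type from its stored size is a variable-length encoding. The sketch below is only an illustration in Python and makes no claim about Falcon's actual record format; `encode_int` and `decode_int` are hypothetical names.

```python
def encode_int(value: int) -> bytes:
    """Store a signed integer as a length byte plus the fewest bytes that hold it."""
    # Find the smallest byte count whose two's-complement range contains value.
    length = 1
    while not (-(1 << (8 * length - 1)) <= value < (1 << (8 * length - 1))):
        length += 1
    return bytes([length]) + value.to_bytes(length, "big", signed=True)

def decode_int(data: bytes) -> int:
    """Inverse of encode_int."""
    length = data[0]
    return int.from_bytes(data[1:1 + length], "big", signed=True)

# The stored size follows the value, not the declared column width.
for v in (7, -1, 300, 2**40):
    assert decode_int(encode_int(v)) == v
    print(v, len(encode_int(v)))
```

With a scheme like this, the declared width of a column is a constraint on values and not a property of the stored bytes, which is why growing the declared type need not touch existing rows.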
I agree that artificial PK values are not very useful for range scans. But they are frequently referenced during FK-PK joins. However, I don't have stats to show that one approach is better for my workload.
Falcon definitely offers a big optimization for IO by ordering rowids before fetching them.
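The effect of ordering rowids before fetching can be sketched with a simplified cost model. The code below is an assumption-laden illustration, not Falcon's implementation: it assumes rowids map to pages by integer division, that only one page is cached at a time, and `ROWS_PER_PAGE` and `page_reads` are hypothetical names.

```python
ROWS_PER_PAGE = 100  # hypothetical page capacity

def page_reads(rowids):
    """Count page reads when rows are fetched in the given order with a one-page cache."""
    reads = 0
    current_page = None
    for rowid in rowids:
        page = rowid // ROWS_PER_PAGE
        if page != current_page:        # a different page must be read
            reads += 1
            current_page = page
    return reads

# Rowids as an index scan might return them: ordered by key, not by location.
index_order = [5, 950, 12, 903, 47, 988, 20, 915]

print(page_reads(index_order))          # fetch in index order
print(page_reads(sorted(index_order)))  # sort rowids first, then fetch
```

A real buffer pool holds many pages, so the gap is smaller in practice, but the direction is the same: sorted rowids visit each page once instead of bouncing between pages.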
"this just goes on to show the lack of understanding and communication between the marketing team and those working on product(s) at MySQL."
Actually, no collateral on Falcon goes out the door until it gets thumbs up from the Falcon Eng team. However, not all docs contain the same degree of detailed info so sometimes it may appear that something isn't known when in reality, it is and just present in a doc more targeted for detailed reading. We'll look into making this better though, so thanks.
--Robin