When David DeWitt and Michael Stonebraker call you out, you might be on to something. In a new article they claim that MapReduce is a step backwards for parallel query processing. Whether or not they are fond of it, they are helping to spread the message and start interesting discussions, and that is good.
Even better is the impact that Sun's purchase of MySQL might have on the future of MySQL and Hadoop. MySQL has great transaction engines that generate a lot of operational data. Eventually, there is so much operational data that parallel query, hash joins, and fast external sorts are needed to process queries quickly. Sun likes to sell lots of machines and Hadoop uses a lot of machines. Hopefully, it is just a matter of time before we are able to maintain copies of MySQL tables in real time in Hadoop using row-based replication and then query that data using Pig Latin or something else that looks like SQL.
Friday, January 18, 2008


Though it does not support HA (clustered "sharding") in the application layer yet, this might be of interest.
https://issues.apache.org/jira/browse/HADOOP-2536