SSD can be a huge deal for MySQL data warehouse deployments. Disk-bound joins between large tables can be slow because nested-loops join uses so many random reads. Hash join is one way to fix this. Decreasing the time for a random read is another.
Intel SSD can do a random read in less than 0.1 milliseconds. It also has great random write performance which has been an issue for SSD and the price is reasonable. Decreasing random read latency by 50X to 100X will make disk-bound joins much faster.
This should also help with:
- disk-bound aggregation when an indexed MyISAM temp table is used
- old-style external sort when the sort buffer contains (key, row pointer)


Yes, I think hash-join should still be implemented. SSDs may eventually replace spinning disks, but not for some time, and not while promising the same capacities as spinning disks for some time as well.
ReplyDeleteBy the way, your SPAM catcher seems broken on Safari :(
Which will get here first -- commodity pricing on SSD or a full implementation of hash join in MySQL? There are slides on the use of hash join in MySQL 6 for some subqueries, but I think it is limited to cases where the build side fits in memory and does not do external (2-pass) hash join.
ReplyDeleteHi!
ReplyDeleteIf you want a Hash Join quickly, and have the money, I know how to get one in Drizzle. It would just be up to you to port it over to MySQL :)
Cheers,
-Brian
Hi!
ReplyDeleteIf you have the cash I know how to get this written for Drizzle. It would be up to you to port it back to your version of MySQL :)
Cheers,
-Brian
I wanted to implement it myself. It might be fun. I can do the query execution part of it if someone else delivers the optimizer changes.
ReplyDeleteGive me both SSD and hash joins. The data volumes in reporting systems keeps increasing. 50x may seem like a lot, but that is 5 to 6 years of yearly doubling of data volume. This may be a bit of a stretch, but I don't want to have to wait for MySQL 7.0 for hash joins.
ReplyDeleteTimour (you've met!) has written nearly all of the hash join code for MySQL, put loads of time into it. I don't know the current state and it may already be bitrotted.
ReplyDeleteArjen, as I already replied to your email, there is no code that can be reused, however there is something that IMO is better than code - there is a very detailed design (with pseudocode and explanations):
ReplyDeletehttp://forge.mysql.com/worklog/task.php?id=2241
In particular, in WL#2241 I have addressed both how to hook hash join to the optimizer and the executioner.
Please notice that WL#2241 is the design for the so-called One-pass hash join. This algorithm makes sense only if the build tables fits in main memory. The optimizer part of the design makes sure that we don't apply single-pass hash join when the build table is much bigger than the available memory.
The idea behind this project was to make it as a first step to a complete multi-pass hashing based implementation. We also assumed that it will be useful in many cases, because dimension tables are often pretty small.
As far as I know from our discussions with Marc, WL#2241 wouldn't help Google in particular, as all their tables are too big.
Actually, just a thought - given SSDs are so much better than HDD with random access, would it make sense to store potentially big build tables on an SSD? Does anyone know if lookups into hash indexes that are on an SSD would provide good performance?
ReplyDeleteIf that is so, then single-pass hash joins might become quite interesting. Well, just a thought ...
Batch key access + spinning disk + nested loops join should make things much better than they are today. If storage engines then take advantage of the batch of keys to do lookups in parallel, things get even better.
ReplyDeleteDisk-bound nested loops join on SSD will be much faster. We just have to wait for someone to buy an Intel SSD or its equivalent and run some benchmarks.
I think hash joins are still needed. Even when the amount of latency for random read is decreased significantly, the relative cost of a nested loop into an index with low selectivity won't be good compared to a hash join on the same column.
ReplyDeleteJust to add to some of the discussions, there are conditions where nested loop has not only random reads, but repeated reads of the same blocks.
ReplyDeleteConditions where the size of the index is larger than one query’s available portion of the data buffer cache, additional probes into the same index block separated over time can require an additional disk access. The number of repeated disk reads required depends on overall pressures on the data buffer cache, the percent of records required from the total, index structure, sorting of the driving set, among other conditions.
Reduced cost of random read via SSD will be a clear net gain in performance for cases where the number of probes is few and distributed close in time. Under those circumstances, a repeated access to an index block is very likely to be from cache, and the % of index disk blocks required is few (and scattered).
Under other circumstances where the number of index probes is higher and drives up repeated accesses to blocks as well as adding to the duration, the likelihood of a disk read goes up. Access to even 10% of the index keys with 400 keys per block can require each block to be touched 40 times. Under these conditions, the net gain from reduced latency delivered by SSD is reduced or potentially overcome by any repeated disk accesses.