19.5. Join Reordering
Background
The order in which joins are executed in a query can have a significant impact on the query’s performance. The aspect of join ordering that has the largest impact on performance is the size of the data being processed and passed over the network. If a join is not a primary key-foreign key join, the data produced can be much greater than the size of either table in the join– up to |Table 1| x |Table 2| for a cross join. If a join that produces a lot of data is performed early in the execution, then subsequent stages will need to process large amounts of data for longer than necessary, increasing the time and resources needed for the query.
Manual Join Reordering
By default, Presto joins tables in the order in which they are listed in a query. It is the responsibility of the user to optimize the join order when writing queries in order to achieve better performance and handle larger joins. It is often a good idea to join small tables early in the plan, and leave larger fact tables until the end. One should also be careful of introducing cross joins or join conditions that produce output that is larger than the size of the input. Though such joins may sometimes be necessary or optimal, when introduced unintentionally, they can dramatically reduce the performance of the query.
Cross Join Elimination
When the configuration property reorder-joins
or the session property
reorder_joins
is enabled, the optimizer will search for cross joins in
the query plan and try to eliminate them by changing the join order. When
reordering, it will try to preserve the original join order as much as
possible. If cross joins cannot be eliminated, the original join order will be
maintained. For this optimization, the optimizer does not use any statistics.
It will try to eliminate any cross join it can, even if including the cross
joins would have resulted in a more optimal query plan. For example, it may be
optimal to perform a cross join of two small dimension tables before joining in
the larger fact table. However, the optimizer will nevertheless reorder the
joins to remove the cross join. Because of this limitation, this property
should be used cautiously (note that a user can instead reorder the joins
manually to achieve the same effect). The reorder-joins
property is set to
false
by default.
Examples
For the following query:
SELECT * FROM part p, orders o, lineitem l WHERE p.partkey = l.partkey AND l.orderkey = o.orderkey;
In the original join order part, orders, lineitem
, Presto will first join
the part
table with the orders
table, for which there is no join
condition. When reorder-joins=true
, the join order will be changed to
part, lineitem, orders
to eliminate the cross join.
For the following query:
SELECT * FROM part p, orders o, lineitem l, supplier s, nation n
WHERE p.partkey = l.partkey AND l.orderkey = o.orderkey AND l.suppkey = s.suppkey AND s.nationkey = n.nationkey;
The join order will change from part, orders, lineitem, supplier, nation
to
part, lineitem, orders, supplier, nation
.