19.4. Join Reordering

Background

The order in which joins are executed in a query can have a significant impact on the query’s performance. The aspect of join ordering that has the largest impact on performance is the size of the data being processed and passed over the network. If a join is not a primary key-foreign key join, the data produced can be much greater than the size of either table in the join– up to |Table 1| x |Table 2| for a cross join. If a join that produces a lot of data is performed early in the execution, then subsequent stages will need to process large amounts of data for longer than necessary, increasing the time and resources needed for the query.

Join Reordering Options in Presto

Presto’s join reordering strategy can be set by the configuration property join-reordering-strategy or the session property join_reordering_strategy. The options are NONE, ELIMINATE_CROSS_JOINS, and COST_BASED. The default value is ELIMINATE_CROSS _JOINS.

NONE

When the join reordering strategy is set to NONE, Presto joins tables in the order in which they are listed in a query. It is the responsibility of the user to optimize the join order when writing queries in order to achieve better performance and handle larger joins. It is often a good idea to join small tables early in the plan, and leave larger fact tables until the end. One should also be careful of introducing cross joins or join conditions that produce output that is larger than the size of the input. Though such joins may sometimes be necessary or optimal, when introduced unintentionally, they can dramatically reduce the performance of the query.

ELIMINATE_CROSS_JOINS

When the join reordering strategy is set to ELIMINATE_CROSS_JOINS (the default), the optimizer will search for cross joins in the query plan and try to eliminate them by changing the join order. When reordering, Presto will try to preserve the original join order as much as possible. If cross joins cannot be eliminated, the original join order will be maintained. Note that this join reordering strategy does not use any statistics. Therefore, Presto will try to eliminate any cross join it can, even if including the cross joins would have resulted in a more optimal query plan. For example, it may be optimal to perform a cross join of two small dimension tables before joining in the larger fact table. However, Presto will nevertheless reorder the joins to remove the cross join.

Examples

For the following query:

SELECT * FROM part p, orders o, lineitem l WHERE p.partkey = l.partkey AND l.orderkey = o.orderkey;

In the original join order part, orders, lineitem, Presto will first join the part table with the orders table, for which there is no join condition. When reorder-joins=true, the join order will be changed to part, lineitem, orders to eliminate the cross join.

For the following query:

SELECT * FROM part p, orders o, lineitem l, supplier s, nation n
WHERE p.partkey = l.partkey AND l.orderkey = o.orderkey AND l.suppkey = s.suppkey AND s.nationkey = n.nationkey;

The join order will change from part, orders, lineitem, supplier, nation to part, lineitem, orders, supplier, nation.

COST_BASED

When the join reordering strategy is set to COST_BASED, Presto will use Table Statistics provided by the connectors to estimate the costs for different join orders. It will then choose the join order with the lowest computed cost. If statistics are not available or if for any other reason a cost could not be computed, the ELIMINATE_CROSS_JOINS strategy is used instead. Enumerating all possible join orders for many tables is computationally intensive, so joins are reordered in groups of ten at a time. For example if a query has 15 joins, the first ten will be reordered together, and the last five will be reordered together.