=============== Join Reordering =============== Background ---------- The order in which joins are executed in a query can have a significant impact on the query's performance. The aspect of join ordering that has the largest impact on performance is the size of the data being processed and passed over the network. If a join is not a primary key-foreign key join, the data produced can be much greater than the size of either table in the join-- up to \|Table 1\| x \|Table 2\| for a cross join. If a join that produces a lot of data is performed early in the execution, then subsequent stages will need to process large amounts of data for longer than necessary, increasing the time and resources needed for the query. Join Reordering Options in Presto --------------------------------- Presto's join reordering strategy can be set by the configuration property ``join-reordering-strategy`` or the session property ``join_reordering_strategy``. The options are ``NONE``, ``ELIMINATE_CROSS_JOINS``, and ``COST_BASED``. The default value is ``ELIMINATE_CROSS _JOINS``. NONE ==== When the join reordering strategy is set to ``NONE``, Presto joins tables in the order in which they are listed in a query. It is the responsibility of the user to optimize the join order when writing queries in order to achieve better performance and handle larger joins. It is often a good idea to join small tables early in the plan, and leave larger fact tables until the end. One should also be careful of introducing cross joins or join conditions that produce output that is larger than the size of the input. Though such joins may sometimes be necessary or optimal, when introduced unintentionally, they can dramatically reduce the performance of the query. ELIMINATE_CROSS_JOINS ===================== When the join reordering strategy is set to ``ELIMINATE_CROSS_JOINS`` (the default), the optimizer will search for cross joins in the query plan and try to eliminate them by changing the join order. When reordering, Presto will try to preserve the original join order as much as possible. If cross joins cannot be eliminated, the original join order will be maintained. Note that this join reordering strategy does not use any statistics. Therefore, Presto will try to eliminate any cross join it can, even if including the cross joins would have resulted in a more optimal query plan. For example, it may be optimal to perform a cross join of two small dimension tables before joining in the larger fact table. However, Presto will nevertheless reorder the joins to remove the cross join. Examples ^^^^^^^^ For the following query: .. code-block:: sql SELECT * FROM part p, orders o, lineitem l WHERE p.partkey = l.partkey AND l.orderkey = o.orderkey; In the original join order ``part, orders, lineitem``, Presto will first join the ``part`` table with the ``orders`` table, for which there is no join condition. When ``reorder-joins=true``, the join order will be changed to ``part, lineitem, orders`` to eliminate the cross join. For the following query: .. code-block:: sql SELECT * FROM part p, orders o, lineitem l, supplier s, nation n WHERE p.partkey = l.partkey AND l.orderkey = o.orderkey AND l.suppkey = s.suppkey AND s.nationkey = n.nationkey; The join order will change from ``part, orders, lineitem, supplier, nation`` to ``part, lineitem, orders, supplier, nation``. COST_BASED ========== When the join reordering strategy is set to ``COST_BASED``, Presto will use :doc:`/optimizer/statistics` provided by the connectors to estimate the costs for different join orders. It will then choose the join order with the lowest computed cost. If statistics are not available or if for any other reason a cost could not be computed, the ``ELIMINATE_CROSS_JOINS`` strategy is used instead. Enumerating all possible join orders for many tables is computationally intensive, so joins are reordered in groups of ten at a time. For example if a query has 15 joins, the first ten will be reordered together, and the last five will be reordered together.