=================== Migrating From Hive =================== Presto uses ANSI SQL syntax and semantics, whereas Hive uses a SQL-like language called HiveQL which is loosely modeled after MySQL (which itself has many differences from ANSI SQL). Use subscript for accessing a dynamic index of an array instead of a udf ------------------------------------------------------------------------ The subscript operator in SQL supports full expressions, unlike Hive (which only supports constants). Therefore you can write queries like:: SELECT my_array[CARDINALITY(my_array)] as last_element FROM ... Avoid out of bounds access of arrays ------------------------------------ Accessing out of bounds elements of an array will result in an exception. You can avoid this with an ``if`` as follows:: SELECT IF(CARDINALITY(my_array) >= 3, my_array[3], NULL) FROM ... Use ANSI SQL syntax for arrays ------------------------------ Arrays are indexed starting from 1, not from 0:: SELECT my_array[1] AS first_element FROM ... Construct arrays with ANSI syntax:: SELECT ARRAY[1, 2, 3] AS my_array Use ANSI SQL syntax for identifiers and strings ----------------------------------------------- Strings are delimited with single quotes and identifiers are quoted with double quotes, not backquotes:: SELECT name AS "User Name" FROM "7day_active" WHERE name = 'foo' Quote identifiers that start with numbers ----------------------------------------- Identifiers that start with numbers are not legal in ANSI SQL and must be quoted using double quotes:: SELECT * FROM "7day_active" Use the standard string concatenation operator ---------------------------------------------- Use the ANSI SQL string concatenation operator:: SELECT a || b || c FROM ... Use standard types for CAST targets ----------------------------------- The following standard types are supported for ``CAST`` targets:: SELECT CAST(x AS varchar) , CAST(x AS bigint) , CAST(x AS double) , CAST(x AS boolean) FROM ... In particular, use ``VARCHAR`` instead of ``STRING``. Use CAST when dividing integers ------------------------------- Presto follows the standard behavior of performing integer division when dividing two integers. For example, dividing ``7`` by ``2`` will result in ``3``, not ``3.5``. To perform floating point division on two integers, cast one of them to a double:: SELECT CAST(5 AS DOUBLE) / 2 Use WITH for complex expressions or queries ------------------------------------------- When you want to re-use a complex output expression as a filter, use either an inline subquery or factor it out using the ``WITH`` clause:: WITH a AS ( SELECT substr(name, 1, 3) x FROM ... ) SELECT * FROM a WHERE x = 'foo' Use UNNEST to expand arrays and maps ------------------------------------ Presto supports :ref:`unnest` for expanding arrays and maps. Use ``UNNEST`` instead of ``LATERAL VIEW explode()``. Hive query:: SELECT student, score FROM tests LATERAL VIEW explode(scores) t AS score; Presto query:: SELECT student, score FROM tests CROSS JOIN UNNEST(scores) AS t (score); Outer Join Differences ---------------------- Adhering to the ANSI SQL spec, Presto respects the abstract concept that the *whole* ``ON`` clause is evaluated to determine whether or not a row from the left table will be joined with a right table row. In a ``LEFT JOIN``, all the rows of the left table are always returned out of the join, vice versa for a ``RIGHT JOIN``. In contrast, Hive will *first* apply any constant filters in the ``ON`` clause *then* perform the join. This can produce very different results when ``ON`` clause predicates refer to the outer table. When you want to convert a Hive ``OUTER JOIN`` query to Presto, remember that Hive treats the ``ON`` clause predicates as if it were part of the ``WHERE`` clause. So to get the equivalent behavior in Presto, you need to move your ``ON`` clause predicates into the ``WHERE`` clause. Hive query:: SELECT a.id, b.userid FROM a LEFT JOIN b ON a.id = b.id AND a.ds = '2013-11-11' Presto query:: SELECT a.id, b.userid FROM a LEFT JOIN b ON a.id = b.id WHERE a.ds = '2013-11-11'