23.2. Release 0.167-t

Presto 0.167-t is equivalent to Presto release 0.167, with some additional features and patches.

SQL Features

  • Add GROUPING support.

Data Types

  • Native functions for CHAR(x).
  • DECIMAL is the default literal for non-integral numbers.

The Teradata distribution of Presto fixes the semantics of the TIMESTAMP and TIME types to align with the SQL standard. See the following sections for details.

TIMESTAMP semantic changes

Previously, the TIMESTAMP type described an instance in time in the Presto session’s time zone. Now, Presto treats TIMESTAMP values as a set of the following fields representing wall time:

  • YEAR OF ERA
  • MONTH OF YEAR
  • DAY OF MONTH
  • HOUR OF DAY
  • MINUTE OF HOUR
  • SECOND OF MINUTE - as decimal with precision 3

For that reason, a TIMESTAMP value is not linked with the session time zone in any way until a time zone is needed explicitly, such as when casting to a TIMESTAMP WITH TIME ZONE or TIME WITH TIME ZONE. In those cases, the time zone offset of the session time zone is applied, as specified in the SQL standard.

For various compatibility reasons, when casting from date/time type without a time zone to one with a time zone, a fixed time zone is used as opposed to the named one that may be set for the session.

eg. with -Duser.timezone="Asia/Kathmandu" on CLI

  • Query: SELECT CAST(TIMESTAMP '2000-01-01 10:00' AS TIMESTAMP WITH TIME ZONE);
  • Previous result: 2000-01-01 10:00:00.000 Asia/Kathmandu
  • Current result: 2000-01-01 10:00:00.000 +05:45

TIME semantic changes

The TIME type was changed similarly to the TIMESTAMP type.

TIME WITH TIME ZONE semantic changes

Due to compatibility requirements, having TIME WITH TIME ZONE completely aligned with the SQL standard was not possible yet. For that reason, when calculating the time zone offset for TIME WITH TIME ZONE, the Teradata distribution of Presto uses the session’s start date and time.

This can be seen in queries using TIME WITH TIME ZONE in a time zone that has had time zone policy changes or uses DST. eg. With session start time on 1 March 2017

  • Query: SELECT TIME '10:00:00 Asia/Kathmandu' AT TIME ZONE 'UTC'
  • Previous result: 04:30:00.000 UTC
  • Current result: 04:15:00.000 UTC

Time-related bug fixes

  • The current_time and localtime functions were fixed to return the correct value for non-UTC timezones.

Security

  • File based system access control plugin that allows you to specify catalog access control rules and Kerberos principal matching rules.
  • Secure cluster communication over HTTPS.
  • SHOW GRANTS support for the Hive connector.
  • ROLE support for the Hive connector, including CREATE ROLE, DROP ROLE, GRANT ROLE, REVOKE ROLE, SET ROLE, SHOW CURRENT ROLES, SHOW ROLES and SHOW ROLE GRANTS commands.

Performance

  • Automatically choose between repartitioned and replicated based on Hive table statistics. Must have join-distribution-type set to automatic.
  • Push aggregations below outer joins when possible. It is particularly useful for correlated scalar aggregations.
  • Faster DECIMAL implementation.
  • Distributed aggregation for ROLLUP and CUBE.
  • Improve window function performance.
  • Improve performance of projections without filters.
  • Push down equality predicates on clustering keys for the Cassandra connector.

Teradata Presto ODBC/JDBC Drivers

  • The Teradata JDBC driver does not support batch queries.
  • SET ROLE functionality is not supported on ODBC versions <= 1.1.8 and JDBC versions <= 1.0.14.

Connectors

  • SQL Server connector, TPC-DS connector and Memory Connector were added

Documentation

  • Query Optimizer
  • Query Performance Analysis
  • Presto Tuning
  • Prepared Statements
  • Installation of Presto via Presto Admin
  • CLI options

Miscellaneous

  • Support prepared statements that are longer than 4K bytes.
  • Additional EXPLAIN ANALYZE information for window functions and spill to disk.
  • More informative error messages for CLI users.

Bugs Fixed

  • Fix analysis error message for queries containing both aggregations and subqueries in some cases.
  • Fix issue “Hive table is corrupt. It is declared as being bucketed, but the files do not match the bucketing declaration. The number of files in the directory (1) does not match the declared.” by fixing support for Hive bucketed tables. See option hive.multi-file-bucketing.enabled in the Presto Hive connector documentation.
  • Allow empty partitions for clustered hive tables.
  • Clean up files on failed CREATE TABLE AS SELECT queries on S3.
  • Fix bug in JOIN ON clause with two char fields.
  • Fix wrong results when an argument to a function in the ORDER BY clause needs to be coerced.
  • Temporarily revert empty join short-circuit optimization due to issue with hanging queries.
  • Fix reading decimals for RCFile text format using non-optimized reader.

Unsupported Functionality

Some functionality from Presto 0.167 may work but is not officially supported by Teradata.

  • The installation method as documented on prestodb.io.
  • Web Connector for Tableau
  • The following connectors:
    • Accumulo Connector
    • Kafka Connector
    • Local File Connector
    • MongoDB Connector
    • Redis Connector

SQL/DML/DDL Limitations

  • The SQL keyword end is used as a column name in system.runtime.queries, so in order to query from that column, end must be wrapped in quotes
  • NATURAL JOIN is not supported
  • LIMIT ALL and OFFSET are not supported
  • For INSERT INTO ... VALUES, the data types must be exact, i.e. must use 2.0 for double, cast('2015-1-1' as date) for date, and you must supply a value for every column.
  • The expression on the left hand side of IN must not be NULL for any of the queried rows. Otherwise, the query will fail. This limitation is needed to ensure correct results and may be dropped in the future. This also affect quantified subqueries. See SELECT and Comparison Functions and Operators. In previously versions of Presto this query was allowed to execute potentially resulting in incorrect results.

Additional Limitations

  • If called through JDBC, executeUpdate does not return the count of rows inserted.
  • Presto does not push down aggregate calculations. This means that when a user executes a simple query such as SELECT COUNT(*) FROM lineitem the entire table will be retrieved and the aggregate calculated by Presto. If the table is large or the network slow, this may take a very long time.

QueryGrid

QueryGrid connectors Presto-to-Teradata and Teradata-to-Presto version 1.5 will be the terminal release for the 1.x generation. For future releases of Presto, you must upgrade to the QueryGrid 2.x current generation.


Hive Connector Limitations

File Formats

Teradata supports data stored in the following formats:

  • Text files
  • ORC
  • RCFILE
  • PARQUET

TIMESTAMP limitations

Presto supports a granularity of milliseconds for the TIMESTAMP datatype, while Hive supports microseconds.

INSERT INTO ... SELECT limitations

INSERT INTO creates unreadable data (unreadable both by Hive and Presto) if a Hive table has a schema for which Presto only interprets some of the columns (e.g. due to unsupported data types). This is because the generated file on HDFS will not match the Hive table schema.

Hive Transactions

The Hive connector does not support Hive ACID tables. https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions


PostgreSQL and MySQL Connector Limitations

Known Bugs

PostgreSQL connector describe table reports Table has no supported column types inappropriately. https://github.com/facebook/presto/issues/4082

Security

Presto connects to MySQL and PostgreSQL using the credentials specified in the properties file. The credentials are used to authenticate the users while establishing the connection. Presto runs queries as the “presto” service user and does not pass down user information to MySQL or PostgreSQL connectors.

Datatypes

PostgreSQL and MySQL each support a wide variety of datatypes. Many of these types are not supported in Presto. Table columns that are defined using an unsupported type are not visible to Presto users. These columns are not shown when describe table or select * SQL statements are executed.

MySQL Catalogs

MySQL catalog names are mapped to Presto schema names.


23.3. Release 0.167-t.0.4

The following has been added to 0.167-t.0.4:

  • Hive configuration property to enable/disable creating non-managed (external) Hive tables
  • Support NULLs as valid values in value IN (%subquery%)

23.4. Release 0.167-t.0.5

The following has been added to 0.167-t.0.5:

  • Disable identity projection optimization that might lead to unpredicted transfer errors

23.5. Release 0.167-t.0.6

The following has been added to 0.167-t.0.5:

  • Disable gathering of extended statistics for window function that might lead to unpredictable failures on huge data sets