22.1. Release 0.167-t
Presto 0.167-t is equivalent to Presto release 0.167, with some additional features and patches.
SQL Features
- Add
GROUPING
support.
Data Types
- Native functions for
CHAR(x)
. DECIMAL
is the default literal for non-integral numbers.
The Teradata distribution of Presto fixes the semantics of the TIMESTAMP
and TIME
types to align with the SQL standard. See the following sections for details.
TIMESTAMP semantic changes
Previously, the TIMESTAMP
type described an instance in time in the Presto session’s time zone.
Now, Presto treats TIMESTAMP
values as a set of the following fields representing wall time:
YEAR OF ERA
MONTH OF YEAR
DAY OF MONTH
HOUR OF DAY
MINUTE OF HOUR
SECOND OF MINUTE
- as decimal with precision 3
For that reason, a TIMESTAMP
value is not linked with the session time zone in any way until a time zone is needed explicitly,
such as when casting to a TIMESTAMP WITH TIME ZONE
or TIME WITH TIME ZONE
.
In those cases, the time zone offset of the session time zone is applied, as specified in the SQL standard.
For various compatibility reasons, when casting from date/time type without a time zone to one with a time zone, a fixed time zone is used as opposed to the named one that may be set for the session.
eg. with -Duser.timezone="Asia/Kathmandu"
on CLI
- Query:
SELECT CAST(TIMESTAMP '2000-01-01 10:00' AS TIMESTAMP WITH TIME ZONE);
- Previous result:
2000-01-01 10:00:00.000 Asia/Kathmandu
- Current result:
2000-01-01 10:00:00.000 +05:45
TIME semantic changes
The TIME
type was changed similarly to the TIMESTAMP
type.
TIME WITH TIME ZONE semantic changes
Due to compatibility requirements, having TIME WITH TIME ZONE
completely aligned with the SQL standard was not possible yet.
For that reason, when calculating the time zone offset for TIME WITH TIME ZONE
, the Teradata distribution of Presto uses
the session’s start date and time.
This can be seen in queries using TIME WITH TIME ZONE
in a time zone that has had time zone policy changes or uses DST.
eg. With session start time on 1 March 2017
- Query:
SELECT TIME '10:00:00 Asia/Kathmandu' AT TIME ZONE 'UTC'
- Previous result:
04:30:00.000 UTC
- Current result:
04:15:00.000 UTC
Time-related bug fixes
- The
current_time
andlocaltime
functions were fixed to return the correct value for non-UTC timezones.
Security
- File based system access control plugin that allows you to specify catalog access control rules and Kerberos principal matching rules.
- Secure cluster communication over HTTPS.
SHOW GRANTS
support for the Hive connector.ROLE
support for the Hive connector, includingCREATE ROLE
,DROP ROLE
,GRANT ROLE
,REVOKE ROLE
,SET ROLE
,SHOW CURRENT ROLES
,SHOW ROLES
andSHOW ROLE GRANTS
commands.
Performance
- Automatically choose between repartitioned and replicated based on Hive table statistics.
Must have
join-distribution-type
set toautomatic
. - Push aggregations below outer joins when possible. It is particularly useful for correlated scalar aggregations.
- Faster
DECIMAL
implementation. - Distributed aggregation for ROLLUP and CUBE.
- Improve window function performance.
- Improve performance of projections without filters.
- Push down equality predicates on clustering keys for the Cassandra connector.
Teradata Presto ODBC/JDBC Drivers
- The Teradata JDBC driver does not support batch queries.
SET ROLE
functionality is not supported on ODBC versions <= 1.1.8 and JDBC versions <= 1.0.14.
Connectors
- SQL Server connector, TPC-DS connector and Memory Connector were added
Documentation
- Query Optimizer
- Query Performance Analysis
- Presto Tuning
- Prepared Statements
- Installation of Presto via Presto Admin
- CLI options
Miscellaneous
- Support prepared statements that are longer than 4K bytes.
- Additional
EXPLAIN ANALYZE
information for window functions and spill to disk. - More informative error messages for CLI users.
Bugs Fixed
- Fix analysis error message for queries containing both aggregations and subqueries in some cases.
- Fix issue “Hive table is corrupt. It is declared as being bucketed, but the files do not match the bucketing declaration. The number of files in the directory (1) does not match the declared.” by fixing support for Hive bucketed tables. See option hive.multi-file-bucketing.enabled in the Presto Hive connector documentation.
- Allow empty partitions for clustered hive tables.
- Clean up files on failed
CREATE TABLE AS SELECT
queries on S3. - Fix bug in
JOIN ON
clause with two char fields. - Fix wrong results when an argument to a function in the
ORDER BY
clause needs to be coerced. - Temporarily revert empty join short-circuit optimization due to issue with hanging queries.
- Fix reading decimals for RCFile text format using non-optimized reader.
Unsupported Functionality
Some functionality from Presto 0.167 may work but is not officially supported by Teradata.
- The installation method as documented on prestodb.io.
- Web Connector for Tableau
- The following connectors:
- Accumulo Connector
- Kafka Connector
- Local File Connector
- MongoDB Connector
- Redis Connector
SQL/DML/DDL Limitations
- The SQL keyword
end
is used as a column name insystem.runtime.queries
, so in order to query from that column,end
must be wrapped in quotesNATURAL JOIN
is not supportedLIMIT ALL
andOFFSET
are not supported- For
INSERT INTO ... VALUES
, the data types must be exact, i.e. must use2.0
fordouble
,cast('2015-1-1' as date)
fordate
, and you must supply a value for every column.- The expression on the left hand side of
IN
must not beNULL
for any of the queried rows. Otherwise, the query will fail. This limitation is needed to ensure correct results and may be dropped in the future. This also affect quantified subqueries. See SELECT and Comparison Functions and Operators. In previously versions of Presto this query was allowed to execute potentially resulting in incorrect results.
Additional Limitations
- If called through JDBC, executeUpdate does not return the count of rows inserted.
- Presto does not push down aggregate calculations. This means that when a user executes a simple query such as
SELECT COUNT(*) FROM lineitem
the entire table will be retrieved and the aggregate calculated by Presto. If the table is large or the network slow, this may take a very long time.
QueryGrid
QueryGrid connectors Presto-to-Teradata and Teradata-to-Presto version 1.5 will be the terminal release for the 1.x generation. For future releases of Presto, you must upgrade to the QueryGrid 2.x current generation.
Hive Connector Limitations
File Formats
Teradata supports data stored in the following formats:
- Text files
- ORC
- RCFILE
- PARQUET
TIMESTAMP limitations
Presto supports a granularity of milliseconds for the TIMESTAMP
datatype, while Hive
supports microseconds.
INSERT INTO ... SELECT limitations
INSERT INTO creates unreadable data (unreadable both by Hive and Presto) if a Hive table has a schema for which Presto only interprets some of the columns (e.g. due to unsupported data types). This is because the generated file on HDFS will not match the Hive table schema.
Hive Transactions
The Hive connector does not support Hive ACID tables. https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
PostgreSQL and MySQL Connector Limitations
Known Bugs
PostgreSQL connector describe table
reports Table has no supported column types
inappropriately.
https://github.com/facebook/presto/issues/4082
Security
Presto connects to MySQL and PostgreSQL using the credentials specified in the properties file. The credentials are used to authenticate the users while establishing the connection. Presto runs queries as the “presto” service user and does not pass down user information to MySQL or PostgreSQL connectors.
Datatypes
PostgreSQL and MySQL each support a wide variety of datatypes. Many of these
types are not supported in Presto. Table columns that are defined using an unsupported type are not visible to Presto
users. These columns are not shown when describe table
or select *
SQL statements are executed.
MySQL Catalogs
MySQL catalog names are mapped to Presto schema names.