Teradata QueryGrid: Presto-To-Teradata Database User Guide

Overview

Teradata® QueryGrid™: Presto-To-Teradata Database (henceforth known as “P2T”) provides a SQL interface to query the Teradata Database from Presto. Presto is a SQL query engine for big data that provides access to many data sources, including Hadoop ecosystem databases like Hive and traditional relational databases like MySQL and PostgreSQL.

You can issue queries from Presto with P2T that do the following:

  • Import data from Teradata Database tables and store it in Hadoop or one of the databases that Presto has a connector for.
  • Export data from Presto into an existing table in the Teradata Database.
  • Select data from Teradata tables and join it with existing Hadoop or other Presto data.

P2T additionally performs push-down of simple predicates and selects only the columns needed for the SQL statement, in order to reduce the amount of data that needs to be retrieved. For a list of supported pushdown, see Predicate Pushdown.

Prerequisites

P2T is compatible with versions of the Teradata database in the 15.00 and 15.10 releases. The minimum version for the 15.00 release is 15.00.04.02. The minimum version for the 15.10 release is 15.10.01.01. The Teradata database must have Java 8 installed. Additionally, there are recommended changes to the Teradata JVM properties that greatly improve performance, as described here: JVM Configuration. Changing the JVM properties on the Teradata database requires a restart of the JVM in Teradata.

Only Presto version 0.167-t is supported with this version of P2T.

These installation instructions use the presto-admin utility, described here. The administrator needs sudo and ssh access to all of the Presto nodes to install a new connector with presto-admin.

The following ports must be opened on the Teradata and Presto nodes:
  • The Teradata database port (by default 1025) on all the Teradata nodes for the Presto coordinator and all workers.
  • The TRANSPORT_COORDINATOR_PORT (by default 21338) on all the Presto coordinator and workers for all Teradata nodes.
  • The data-exchange.server.port (by default 21338) on all Teradata nodes for all Presto coordinator and workers.

Note: We assume that the Presto coordinator and workers can connect to each other on TRANSPORT_COORDINATOR_PORT (by default 21338).

Compatibility Matrix

Each version of P2T works with exactly one version of Presto. See the table below for the mapping:

Presto version Presto-to-Teradata QueryGrid version
0.127t 0.2
0.141-t 0.3
0.148-t 1.2
0.152-t 1.3
0.157-t 1.4
0.167-t 1.5

Installation

To install P2T, you must install a driver on Teradata, a P2T foreign server on Teradata, and a connector on Presto.

Before starting installation, upload presto-to-teradata-driver.zip to both the Teradata and Presto clusters and unzip it:

unzip presto-to-teradata-driver.zip -d presto-to-teradata-driver
cd presto-to-teradata-driver
The archive contains the following:
  • the Teradata driver jar file, presto-to-teradata-driver.jar
  • the RPM file to be installed on the Presto cluster, presto-to-teradata-connector-*.noarch.rpm
  • the sample properties files to be copied to the Presto cluster, properties/teradata.properties
  • the install script for the driver on Teradata, install_teradata
  • a utility script to restart the JVM on Teradata, restart_jvm.bteq
  • this documentation, in the docs directory

Installation on Teradata

install-teradata script

All management of P2T on the Teradata side happens via the install-teradata script.

The installation script contain four commands:
  • install - installs the P2T driver jar
  • uninstall - uninstall the P2T driver jar
  • create_presto_to_teradata_server - creates a P2T foreign server
  • drop_server - drops a foreign server

To see a full list of supported options, type the following:

./install-teradata

Teradata Privileges and Security

P2T uses two Teradata accounts: a DBA user and a user that will run queries on the foreign server.

The DBA user (henceforth known as dba_user) is usually sysdba or a user with similar permissions, and it installs the driver in the syslib database.

The following permissions are granted to dba_user during the install task:

GRANT EXECUTE PROCEDURE ON sqlj TO dba_user;
GRANT CREATE EXTERNAL PROCEDURE ON syslib TO dba_user WITH GRANT OPTION;
GRANT CREATE FUNCTION ON syslib TO dba_user;
GRANT CREATE SERVER ON td_server_db TO dba_user;
GRANT DROP SERVER ON td_server_db TO dba_user;

The user that will be running queries (henceforth known as presto_user) can either be an existing user or a new user. That user must have sufficient spool space for the query workload: the spool space must be at least as large as the amount of data used in the query.

The following permissions are granted to presto_user during the install task:

GRANT CREATE SERVER ON td_server_db TO presto_user;
GRANT DROP SERVER ON td_server_db TO presto_user;

In addition, SELECT and EXECUTE FUNCTION privileges for the import and export functions for the foreign server are granted to presto_user.

Warning

P2T foreign servers should not be queried directly by Teradata users, so it is recommended to limit access to foreign servers to presto_user only.

JVM Configuration

P2T requires additional Java Virtual Machine (JVM) options. For best performance, it is recommended to use the Java G1 garbage collector and increase the JVM heap size to 30GB using cufconfig.

Warning

cufconfig -f overwrites the existing JVM properties. To list the existing properties, run cufconfig -o.

Warning

If on a VM or other low-memory environment, you will need to set -Xms and -Xmx to a value no larger than the amount of free memory.

In order to set these JVM options, run the following commands on any Teradata node:

cufconfig -o | grep JVMOptions > old_jvmoptions.txt
echo "JVMOptions: -Xms30g -Xmx30g -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+ExplicitGCInvokesConcurrent -XX:OnOutOfMemoryError=kill -9 %p" > /tmp/jre_config;
cufconfig -f /tmp/jre_config;

Note

For more information about available JVM options and restrictions for -Xms and -Xmx values, please refer to the Java Documentation.

Then, restart the JVM for the Teradata Database using the restart_jvm.bteq script as dba_user:

.run file restart_jvm.bteq

Driver Installation

To install P2T, you need the following information:
  • dba_logon - Teradata’s database admin credentials
  • user_logon - Teradata’s database credentials of the existing user who will use P2T.
  • Transport Coordinator IP - IP address or hostname of the Presto coordinator, accessible from the Teradata hosts; must match host provided in data-exchange.coordinator.host-and-port property in the Presto configuration
  • Transport Coordinator Port - Port used for data exchange on Presto coordinator, accessible from the Teradata hosts; must match port number provided in the data-exchange.server.port property in the Presto configuration

You can install the driver with the following command:

./install-teradata install \
                   --dba_logon dba_login,dba_password \
                   --user_logon user_login,user_password

Note

All values used in command examples must be replaced with values corresponding to your environment.

./install-teradata install performs the following tasks:
  • Registers the driver jar file in the Teradata database.
  • Grants proper rights to the user (in order to be able to use P2T), which are listed in full in the Teradata Privileges and Security section.
  • Registers import and export functions (needed to create Teradata P2T foreign servers).

Note

It is not possible to call install-teradata twice, and only dba_logon and user_logon will have permission to create foreign servers. However, you may grant permissions on the foreign servers to different users.

Foreign Server Creation

After installing the driver, create one or more foreign servers. The Presto connector(s) connect to the foreign server(s) on Teradata.

To create a foreign server, issue the following command (transport_coordinator_port is optional, and defaults to 21338):

./install-teradata create_presto_to_teradata_server \
                   --user_logon user_login,user_password \
                   --server_name p2t \
                   --transport_coordinator_ip <transport_coordinator_ip> \
                   [--transport_coordinator_port <transport_coordinator_port>]

It is possible to create multiple foreign servers with different configurations, as long as they have different names.

Drop foreign servers

To remove a foreign server, issue the following command:

./install-teradata drop_server \
                   --dba_logon dba_login,dba_password \
                   --server_name presto_server

Uninstall Driver

The driver can be uninstalled only after dropping all of the foreign servers. To uninstall the driver, execute the following command:

./install-teradata uninstall \
                  --dba_logon dba_login,dba_password

After uninstalling P2T, run the provided restart_jvm.bteq script to restart the JVM in Teradata to clean up resources allocated by the driver as dba_user:

.run file restart_jvm.bteq

Alternatively, you can run uninstall task with the restart parameter:

./install-teradata uninstall \
                    --dba_logon dba_login,dba_password
                   --restart

Installation on Presto

Privileges and Security

All users of Presto have access to the P2T connector. The connector respects the privileges in Teradata for the tables that it reads from and writes to. If the teradata.connection.user property is set in Presto to presto_user, SHOW SCHEMAS, SHOW TABLES, and DESCRIBE <TABLE> will show schemas, tables, and columns for which presto_user has INSERT and/or SELECT privileges. To issue an INSERT INTO or SELECT query on a table, the user needs INSERT or SELECT privileges respectively. On Teradata, the user needs SELECT privileges for the dbc system tables.

Connector Installation

To install P2T on Presto:

  1. Install the presto-to-teradata-connector RPM:

    ./presto-admin package install presto-to-teradata-connector-*.noarch.rpm
    
  1. Copy properties/teradata.properties from the package zip file to ~/.prestoadmin/catalog. Your ~/.prestoadmin/catalog/teradata.properties file will be as follows:

    connector.name=teradata
    data-exchange.server.port=21338
    data-exchange.coordinator.host-and-port=<PRESTO-HOST>:21338
    teradata.connection.url=jdbc:teradata://<TERADATA-HOST>
    teradata.foreign-server-name=p2t
    teradata.connection.user=<USERNAME_ON_TERADATA>
    teradata.connection.password=<PASSWORD_FOR_ABOVE_USER>
    

Replace <PRESTO-HOST> with the actual Presto coordinator hostname or IP reachable by Teradata. If bynet is configured, make sure to use the bynet IP or a hostname that resolves to the bynet IP so that network traffic uses the fastest possible connection.

Replace <TERADATA-HOST> with the hostname or IP of Teradata reachable by Presto.

If the foreign server created above in Foreign Server Creation is not named p2t, replace p2t with that foreign server name in teradata.foreign-server-name.

Replace <USERNAME_ON_TERADATA> and <PASSWORD_FOR_ABOVE_USER> with a user that has SELECT and INSERT privileges for the foreign server.

Additional configuration properties may be found in the Configuration section.

  1. Deploy the P2T connector configuration using presto-admin:

    ./presto-admin catalog add teradata
    
  2. Restart Presto:

    ./presto-admin server restart
    
  3. Use the Presto CLI to ensure that the P2T connector is installed:

./presto --server localhost:8080 --catalog system --schema default
presto:default> show catalogs;
  Catalog
-----------
 ...
 teradata
 ...

Note

If running the CLI on any node but the Presto coordinator, substitute the Presto coordinator hostname/IP for “localhost” in the above command.

It is possible to add multiple Teradata connectors with different properties by having a different name. For example, you can add a connector specified in other_teradata.properties which connects to a different Teradata Database, for which the connector name will be other_teradata. When adding another Teradata connector, it is necessary to use a different port value for data-exchange.server.port.

Additional configuration is necessary in order to run P2T with Kerberos authentication. For a detailed description of how to configure Kerberos, see Kerberos Support.

Querying Teradata with P2T

In Presto, connectors allow access to different data sources. P2T uses a Teradata connector in Presto. The connector is configured via a configuration file placed in Presto’s catalog directory (see Installation on Presto). For example, a connector specified in teradata.properties can be queried via the following syntax:

SELECT * FROM teradata.<schema-name>.<table-name>;
INSERT INTO teradata.<schema-name>.<table-name> SELECT * FROM hive.default.nation;

To show the schemas in Teradata:

SHOW SCHEMAS IN teradata;

To show tables in Teradata:

SHOW TABLES IN teradata.<schema-name>;

To show table metadata for a given Teradata table:

presto:test_presto_user> describe teradata.test_presto_user.nation;
   Column    |  Type   | Null | Partition Key | Comment
-------------+---------+------+---------------+---------
 n_nationkey | bigint  | true | false         |
 n_name      | varchar | true | false         |
 n_regionkey | bigint  | true | false         |
 n_comment   | varchar | true | false         |
(4 rows)

For extensive documentation of the syntax available in Presto, see the Presto documentation.

Upgrading P2T

In order to upgrade P2T, you will need to update both the Presto connector and the Teradata driver. Each release of P2T works with exactly one version of Presto, so upgrading Presto necessitates upgrading P2T, and vice versa. P2T works with both Teradata 15.00 and 15.10; there is no need to upgrade Teradata.

Upgrading Presto

In order to upgrade the Presto connector:

  1. Uninstall the Teradata connector on Presto:

    ./presto-admin package uninstall presto-to-teradata-connector
    
  2. Upgrade Presto to the version specified in the Compatibility Matrix. See the Presto documentation for the Presto upgrade itself.

  3. Install the new Teradata connector on Presto and restart Presto:
    ./presto-admin package install presto-to-teradata-connector
    ./presto-admin server restart
    

    The configuration files from the previous version of the Teradata connector will remain. You can edit them via ./presto-admin catalog add.

Upgrading Teradata

In order to upgrade the Teradata driver:

  1. Drop all existing foreign servers that use the old version of P2T. In order to see all existing foreign servers, you can issue the following query as dba_user:

    select * from dbc.serverv;

    Before dropping foreign servers, you may want to save the foreign server text:

    SHOW FOREIGN SERVER [foreign-server-name]
    
  2. Uninstall the old version of P2T, using the instructions in Uninstall Driver from the old version of the driver.

  3. Restart the JVM for the Teradata Database using the restart_jvm.bteq script as dba_user:

    .run file restart_jvm.bteq
    
  4. Install the new version, using the instructions in Driver Installation.

  5. Re-create the foreign servers. The release notes will mention any changes to the foreign server DDL, otherwise the old foreign server DDL can be used.

Configuration

Presto Performance Tuning

There are several Presto configuration variables that should be modified for QueryGrid to work with better performance. Add the following properties to the ~/.prestoadmin/workers/config.properties and ~/.prestoadmin/coordinator/config.properties files for Presto:

redistribute-writes=false
task.writer-count=X

Where X is an integer around (CORES_COUNT / EXPECTED_CONCURRENT_QUERIES_COUNT) and must be a power of 2.

Predicate Pushdown Disabling

It is possible to disable predicate pushdown via the property teradata.pushdown-disabled=true in teradata.properties.

Data exchange configuration

The following properties control the data exchange. They can be set for both Presto and Teradata:

  • data-exchange.server.buffer-size - query results write buffer size. This buffer is shared between all the queries that are in progress. When set on Presto, this property sets the buffer size when exporting data from Presto to Teradata (e.g. INSERT INTO teradata...). When set on Teradata, it indicates the import buffer size (e.g. SELECT * FROM teradata..). If not specified, the default is 2GB.
  • data-exchange.server.port - data exchange server port. If not specified, the default is 21338.
  • data-exchange.client.buffer-size - query results read buffer size. This buffer is shared between all the queries that are in progress. When set on Presto, this property sets the buffer size when importing data from Teradata to Presto (e.g. SELECT * FROM teradata..). When set on Teradata, it indicates the export buffer size (e.g. INSERT INTO teradata...). If not specified, the default is 2GB.

Note

Buffer sizes must be specified in human readable data size notation: 100MB, 1GB, 1TB

The data-exchange.server.buffer-size and data-exchange.client.buffer-size properties may need to be changed if latency in the system causes the buffers to fill up. This will cause increased memory usage by P2T.

The data-exchange.server-port property must be changed if the port 21338 is not free. If the port on Presto is changed, TRANSPORT_COORDINATOR_PORT in the foreign server definition, as described in `Optional Name Value Pairs`_, must also be changed to match.

To change a property for Presto, edit the P2T connector configuration file (~/.prestoadmin/catalog/teradata.properties). To apply your changes, distribute the new properties file to all Presto nodes and restart the Presto server::

./presto-admin catalog add teradata
./presto-admin server restart

For Teradata, the data exchange properties are shared across all P2T foreign servers. In order to change a static property for Teradata, you must set a JVM system property in the following format: -Dp2t.property.name=value. Properties for the Teradata JVM can be set using the cufconfig utility. To apply your changes, you must restart the Teradata database using the tpareset command.

Compression

If the network bandwidth is the bottleneck, data compression may increase throughput at the cost of increased CPU utilization. Data compression can be enabled for a particular foreign server/Presto connector pair using the teradata.compression property. If it is not specified, the data is not compressed.

On Presto, add the property to teradata.properties, and then restart the server:

teradata.compression=<compression_algorithm>

The two options for compression_algorithm are LZ4 and SNAPPY. Both algorithms are focused on compression and decompression speed. Compression ratio and compression speed may vary for different queries and data sets.

Query Timeouts

P2T contains several query timeout mechanisms.

Read Timeout

The read timeout (default: 2h) determines the maximum amount of time waiting for query results. It is recommended to increase this value for queries that require complex computation before returning any data.

INSERT INTO teradata.schema.table SELECT count(*) from some_huge_table;

To create an appropriate foreign server with a read timeout of 3 days:

./install-teradata create_presto_to_teradata_server \
                   --user_logon user_login,user_password \
                   --server_name p2t \
                   --transport_coordinator_ip <transport_coordinator_ip> \
                   [--transport_coordinator_port <transport_coordinator_port>] \
                   --server_options "ReadTimeout('3d')"

Set the read timeout on Presto to 3 days as well, by adding the following property to teradata.properties, and then restarting the server:

teradata.read_timeout=3d

Timeouts must have both a number and a unit: for example, 10s, 5m, or 2h.

Write Timeout

The write timeout (default: 1h) determines maximum amount of time waiting for consumer to consume the results. It can be updated by setting the Presto session variable teradata.write_timeout. It is recommended to increase this value for queries for which Presto will have to wait a long time for Teradata to consume the data, e.g. queries that require complex computation after consuming the data or pass a lot of data.

INSERT INTO teradata.schema.table SELECT * FROM some_large_table;

Set the write timeout on Presto to 3 days by adding the following property to teradata.properties, and then restarting the server:

teradata.write_timeout=3d

Timeouts must have both a number and a unit: for example, 10s, 5m, or 2h.

Logging on Teradata

In order to turn on diagnostics/debug logging on Teradata, add the following property to teradata.properties and restart the Presto server:

teradata.debug-enabled=true

When enabled, the logs are in /tmp/jlog-CURRENT_DATE.txt on the Teradata node.

Warning

There is a large amount of output, so in most cases debugging should not be enabled.

Logging on Presto

To adjust the log debug levels on Presto, follow the instructions about adjusting the log level in the Presto documentation.

JVM Configuration in Presto

In Presto you can change JVM properties by editing the jvm.config file. In particular, if you face frequent out-of-memory errors on the Presto workers, you should consider adding additional memory to Presto (by increasing -Xmx). jvm.config can be easily distributed across the cluster using the presto-admin utility. Please refer to the presto-admin documentation for more details.

Kerberos Support

It is possible to connect to Teradata via Kerberos. Before doing so, you need to set up Teradata to authenticate via Kerberos. To do that, see the documentation for Teradata 15.00 or 15.10.

In order to connect to Teradata via Kerberos:

  1. Create a Kerberos principal with a password that you will use to connect to Teradata. All queries from Presto to Teradata will be run via this principal.

  2. Create a Teradata user with the same name as the above Kerberos principal. That user needs to be granted WITH NULL PASSWORD permissions. For example, given a Kerberos principal Test_User:

    CREATE USER Test_User AS PERMANENT=<permsize> SPOOL=<spoolsize> TEMPORARY=<tempsize> PASSWORD="<password>";
    GRANT LOGON ON ALL TO Test_User WITH NULL PASSWORD;
    
  3. Grant INSERT access to the P2T foreign server, if the foreign server was created with a different user than the Kerberos principal:

    GRANT INSERT ON TD_SERVER_DB.p2t TO <principal>
    
  4. Ensure that the /etc/krb5.conf file specifies a restricted keyset. The following lines must be included in /etc/krb5.conf on both the Presto cluster and the Teradata cluster:

    default_tgs_enctypes = arcfour-hmac des3-cbc-sha1 des-cbc-md5
    default_tkt_enctypes = arcfour-hmac des3-cbc-sha1 des-cbc-md5
    
  5. On Presto, add the following additional configuration properties to teradata.properties, using the principal created above as the user and the Kerberos password as the password, and restart the Presto server:

    teradata.authentication.type=KERBEROS
    teradata.connection.user=<principal>
    teradata.connection.password=<password>
    teradata.connection.kerberos.realm=<realm>
    

Note

Please note that though usernames in the Teradata database are not case sensitive, all Kerberos entities, including principals and realms, are case sensitive. Thus, using the above example, the principal must be Test_User.

  1. Add the following property to the jvm.config for both the coordinator and the workers:

    -Djava.security.auth.login.config=/usr/lib/presto/lib/plugin/teradata/TeraJDBC.conf
    
jvm.config can be easily distributed across the cluster using the presto-admin utility. Please refer to the presto-admin documentation for more details.

Technical specification

Type support

The following table refers to type handling/conversion between Teradata and Presto. Type handling can differ between Presto connectors or even between storage types within a connector. Please refer to the documentation for specific Presto connectors for more details.

Type conversions from Teradata to Presto:

Type in Teradata Type in Presto Notes
BYTEINT TINYINT  
SMALLINT SMALLINT  
INTEGER INTEGER  
BIGINT BIGINT  
REAL DOUBLE  
DECIMAL(X,Y) DECIMAL(X,Y) Max precision in both Presto and Teradata is 38; scale <= precision
CHAR(X) CHAR(X)  
VARCHAR(X) VARCHAR(X)  
DATE DATE  
TIMESTAMP TIMESTAMP  
BYTE VARBINARY  
VARBYTE VARBINARY  
JSON VARCHAR  

Note

Presto’s TIMESTAMP type is not SQL standard. When given a timestamp literal like TIMESTAMP ‘1988-04-08 00:00:00’, Presto interprets and stores it in the user time zone, rather than without a time zone as is the SQL standard. Teradata follows the SQL standard. Thus, in order to ensure that the TIMESTAMP values are interpreted properly, it is best to run the Presto server in UTC (by either setting the JVM time zone or setting the time zone via the JDBC driver).

Note

For DECIMAL, Presto does not perform any implicit conversions on INSERT INTO, so the values being inserted must match the type of the column exactly, including precision and scale. Constant decimal values have a precision of the number of digits in the number and scale of the number of fractional digits, so they may need a cast. e.g. INSERT INTO decimal_8_2 VALUES (CAST(1.0 AS DECIMAL(8, 2)))

Predicate Pushdown

P2T can push all projections and some predicates down to Teradata, to limit data transfer. The supported predicates are:

  • IS NULL and IS NOT NULL
  • IN and NOT IN
  • =, <=, >=, <, >, BETWEEN...AND
  • OR and AND of the above predicates (though OR predicates must be on the same column)

Known issues/limitations

  • The maximum VARCHAR size is 31232B, rather than the Teradata limit of 64K
  • Currently the maximum number of concurrent queries is 20. However, if your queries are long running or take a lot of resources (e.g. if they have a large result set), it is advised to run less than 20 queries.
  • If you get an error message stating “Another instance of the Querygrid driver is still running.”, please restart the JVM of your Teradata instance with the restart_jvm.bteq script.
  • TIMESTAMP WITH TIME ZONE and DATE WITH TIME ZONE are not currently supported.
  • It is not possible to access the dbc system tables via P2T, even if the user has proper permissions.
  • When installing more than one Teradata Presto connector, it is necessary to use different port values for data-exchange.server.port.
  • If a Teradata table contains columns of types not supported by Presto, an INSERT INTO query will fail with the message “The positional assignment list has too few values.”
  • SELECT queries will return only the columns of supported Presto types (unless there are no columns with Presto supported types, in which case the SELECT will fail).
  • Since Teradata does not have a BOOLEAN type, in order to insert a Presto BOOLEAN column into Teradata you must cast that column to BIGINT.

Release Notes

Release 1.2

  • Add support for data types INTEGER, TINYINT, SMALLINT, FLOAT, CHAR, and parameterized VARCHAR.
  • Add support for accessing Teradata views.
  • Fix bug in handling DECIMAL of precision 18.
  • Add support for multi-byte UTF-16 characters.
  • Improved performance by implementing low latency connection mechanisms.

Release 1.2.3

  • Allow Presto to connect to multiple Teradata servers.

Release 1.3

  • Update QueryGrid to work with Presto 0.152-t.
  • Add logging for the QueryGrid server version in /tmp/querygrid-presto-driver.log on query startup on Teradata.
  • Improved data transfer speed and resource utilization to lower CPU use on TD.
  • FLOAT type renamed to REAL since the name of the Presto type changed.

Release 1.4

  • Update QueryGrid to work with Presto 0.157-t.

Release 1.4.3

  • Fix correctness bug in DATE/VARBINARY pushdown.

Release 1.5

  • Update QueryGrid to work with Presto 0.167-t.