Welcome to QueryGrid’s documentation!

Installation on Teradata

Prerequisites

Querygrid requires JRE 7 (or higher). Teradata Database 15 has JRE 7 pre-installed along with the JRE 6. In order to change JRE run below code snippet on any of the Teradata nodes:

echo "JREPath: /opt/teradata/jvm64/jre7/jre/" > /tmp/jre_config;
cufconfig -f /tmp/jre_config;
tpareset 'JRE version changed';

We recommend to use G1 collector with the following properties:

-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent

You must ensure that you have enough memory for query and IO buffers. In case of frequent GC invocations try to increase JVM heap size.

-Xms30g
-Xmx30g

In Teradata Database JVM properties can be set with using *cufconfig*. Provided properties will be applied for all of the Teradata nodes. Below is the set of properties which are a good start point of configuring your JVM.

echo "JVMOptions: -Xms30g -Xmx30g -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+ExplicitGCInvokesConcurrent" > /tmp/jre_config;
cufconfig -f /tmp/jre_config;
tpareset 'JVMOptions has been changed';

Be aware that the *cufconfig* doesn’t append the newly added properties but just overwrite them instead. In order to list your existing properties please run *cufconfig -o*. Among other things you will find *JVMOptions* that contains your current JVM settings.

In Presto you can change JVM properties with editing the jvm.config file. It can be easily distributed across the cluster using the presto-admin utility. Please refer to presto-admin documentation for more details.

Upload

Please upload a package zip (teradata-to-presto-driver.zip) file on Teradata and Presto systems and unzip it.

unzip teradata-to-presto-driver.zip -D presto-teradata-driver
cd presto-teradata-driver
The archive contains the following:
  • the driver jar file
  • the installation bash script
  • the rpm file to be installed on Presto cluster
  • the sample properties files to be copied on Presto cluster
  • this documentation

Installation on Presto

Now make Presto compatible with Querygrid by executing the following commands:
  • Use Teradata provided distribution of Presto (version 0.127t), installed with presto-admin tool.
  • Install the querygrid-presto-connector rpm
presto-admin package install querygrid-presto-connector-*.noarch.rpm
  • Copy properties/querygrid.properties to /etc/opt/prestoadmin/connectors.
  • In /etc/opt/prestoadmin/connectors/querygrid.properties replace <PRESTO-HOST> with the actual Presto coordinator hostname or IP reachable by Teradata.
  • Deploy Querygrid connector configuration using presto-admin
presto-admin connector add querygrid
  • Restart Presto
presto-admin server restart
  • Make sure that Querygrid connector is installed
presto:default> show catalogs;
  Catalog
-----------
 ...
 querygrid
 ...

Driver installation

Driver installation consists of the following steps:
  • register the driver jar file in the Teradata database
  • grant proper rights to the user (in order to use Querygrid functionality)
  • register import and export functions (needed to create Teradata’s foreign server).
To perform installation you need the following information:
  • dba_logon - Teradata’s database admin credentials
  • user_logon - Teradata’s database credentials of user who will use the Presto Querygrid driver.
  • presto ip address - and port Presto server, accessible from Teradata’s host

To see a full list of supported options type the following:

./install-teradata
Installation scripts contain four commands:
  • install - installs the querygrid driver
  • uninstall - uninstall the querygrid driver
  • create_teradata_to_presto_server - creates foreign server which will be using querygrid driver
  • drop_server - drops foreign server

Once you have all needed information you can install the driver with the following command:

./install-teradata install \
                   --dba_logon dba_login,dba_password \
                   --user_logon user_login,user_password

Note

All values used in command examples must be replaced with values corresponding to your environment.

After this step you need to create a foreign server (you can skip static configuration).

Static driver configuration

Currently there are following static configurable properties shared across all foreign servers:

  • data-exchange.server.buffer-size - query results write buffer size. Buffer is shared between all the queries that are in progress. Defaults to: 2GB.
  • data-exchange.server.port - querygrid server port. Defaults to: 21337.
  • data-exchange.client.buffer-size - query results read buffer size. Buffer is shared between all the queries that are in progress. Defaults to: 2GB.

Static properties can be set for both Presto and Teradata.

In order to change static property for Presto you must edit the querygrid connector configuration file (/etc/opt/prestoadmin/connectors/querygrid.properties). To apply your changes you must distribute new properties file to all Presto nodes and restart the Presto server:

presto-admin connector add querygrid
presto-admin server restart

In order to change static property for Teradata you must set JVM system property in the following format: -Dt2p.property.name=value. Properties for Teradata JVM can be set using the cufconfig utility. To apply your changes you must restart the Teradata database using the tpareset command.

Note

Buffer sizes must be specified in human readable data size notation: 100MB, 1GB, 1TB

Foreign server creation

To create a foreign server issue the following command:

./install-teradata create_teradata_to_presto_server \
                   --user_logon user_login,user_password \
                   --presto_master_ip 192.168.205.1 \
                   --server_name presto_server

By default hive catalog is queried. You can specify another one via server_catalog:

./install-teradata create_teradata_to_presto_server \
                   --user_logon user_login,user_password \
                   --presto_master_ip 192.168.205.1 \
                   --server_name presto_server_tpch \
                   --server_catalog tpch

If you want to query one particular schema you can specify it via server_schema, in other case you will need to specify it explicitly for each query. Be noticed that when schema is already specified in foreign server you won’t be able to specify any other schema explicitly.

./install-teradata create_teradata_to_presto_server \
                   --user_logon user_login,user_password \
                   --presto_master_ip 192.168.205.1 \
                   --server_name presto_server \
                   --server_schema default

If the network bandwidth is the bottleneck, data compression may increase throughput at the cost of increased CPU utilization. Data compression can be enabled for a particular foreign server using the compression parameter.

./install-teradata create_teradata_to_presto_server \
                   --user_logon user_login,user_password \
                   --presto_master_ip 192.168.205.1 \
                   --server_name presto_server
                   --compression <compression_algorithm>

LZ4 and SNAPPY compression algorithms are available. Both algorithms are focused on compression and decompression speed. Compression ratio and compression speed may vary for different queries and data sets.

Presto database credentials can be changed using the presto_user and presto_password parameters:

./install-teradata create_teradata_to_presto_server \
                   --user_logon user_login,user_password \
                   --presto_master_ip 192.168.205.1 \
                   --server_name presto_server
                   --presto_user <username>
                   --presto_password <password>

Depending on your workflow you may want to adjust a foreign server parameter presto_writer_count. If your general workload consists of running large queries with low concurrency, please set it to higher value. However, if your workload contains running concurrent queries, you may need to lower it. This property controls how many threads will be used by Presto per each QueryGrid query. If your Presto cluster has 64 cores per machine, to fully utilize all of those cores with 4 concurrent queries you need to set presto_writer_count to at least 16. By default the number of concurrent writers is set to 8.

./install-teradata create_teradata_to_presto_server \
                   --user_logon user_login,user_password \
                   --presto_master_ip 192.168.205.1 \
                   --server_name presto_server
                   --presto_writer_count 8

If you need to set any server options, follow the convention in the command below (note last option):

./install-teradata create_teradata_to_presto_server \
                   --user_logon user_login,user_password \
                   --presto_master_ip 192.168.205.1 \
                   --server_name presto_server_custom_temporary_schema \
                   --server_options "MaximumTextLength('4088')"

To see full list of server configuration parameters please see Configuration parameters section.

Usage of foreign servers

Calling bteq command on teradata

Once you have installed the querygrid driver and created a foreign server, play around with them by running the bteq command on the teradata host (logged in as the querygrid driver user) followed by a few sql queries:

bteq .logon user_login,user_password

Bteq’s default character encoding is NOT UTF8 and Presto supports only UTF8/UTF16, so not specifying encoding will cause errors displaying/entering non ascii characters in BTEQ So to avoid issues with encoding, it is better to run bteq as follows:

bteq -c UTF8 -e UTF8 .logon localhost/user_login,user_password

Once you are logged on teradata data warehouse you can run the SQL examples below.

To see the list of schemas (databases) of a foreign server

HELP FOREIGN SERVER presto_server;

To see the list of tables in schema (database)

HELP FOREIGN DATABASE default@presto_server;

To see the list of columns of a table from a foreign server

HELP FOREIGN TABLE region@presto_server;

Select query (import path)

SELECT * from region@presto_server;

Insert into query (export path)

INSERT INTO region@presto_server SELECT * from region_on_teradata_table;

Drop foreign servers

To remove foreign server you can:

./install-teradata drop_server \
                   --user_logon user_login,user_password \
                   --server_name presto_server

Uninstall Driver

Once you dropped all foreign servers, you are able to uninstall driver:

./install-teradata uninstall \
                   --user_logon user_login,user_password

After the uninstall you have to restart your Teradata Database. It can be done with:

tpareset "Querygrid driver has been uninstalled."

Or you can run uninstall task with the restart parameter:

./install-teradata uninstall \
                   --user_logon user_login,user_password \
                   --restart

Configuration parameters

Foreign Presto Querygrid servers can be configured (see –server-options of install-teradata script) with the following parameters:

  • MaximumTextLength - (default: 4096) an integer which determines maximum length in characters of text which could be transferred between databases (Teradata and Presto)
  • MaximumBinaryLength - (default: 4096) an integer which determines maximum length of binary data in bytes which could be transferred between databases (Teradata and Presto)
  • ReadTimeout - (default: 2h) determines the maximum amount of time waiting for query results. It is recommended to increase this value for the queries that require complex computation before returning any data.
SELECT *
FROM FOREIGN TABLE(
  SELECT count(*)
  FROM some_enormous_table
)@foreign_server AS table;
  • WriteTimeout - (default: 1h) determines maximum amount of time waiting for consumer to consume the results. It is recommended to increase this value for the queries that require complex computation after consuming the data.
SELECT fibonacci(bigint_column)
FROM large_table@foreign_server;
  • QueryTimeout - (default: 1h) determines maximum amount of time waiting for the remote query to finish after data transfer is finished. It is recommended to increase this value for the INSERT INTO queries which insert the data into the slow consuming connectors.
INSERT INTO "slow_consuming_connector"."slow_consuming_table"
SELECT * FROM large_table;

Note

Sample timeout values: 10s, 5m, 2h.

To see how to use above parameters see Foreign server creation section.

Technical specification

Type support

The following table refers to type handling/conversion between Presto query engine and Teradata. Type handling can differ between presto connectors or even between storage types within a connector. Please refer to a specific presto connector documentation for more details. For example presto-hive connector maps all integral types to Presto BIGINT type.

Type in Presto Type in Teradata Notes
BOOLEAN BYTEINT  
BIGINT BIGINT  
DOUBLE REAL  
VARCHAR VARCHAR Values exceeding MaximumTextLength will be truncated
DATE DATE  
TIMESTAMP TIMESTAMP  
VARBINARY VARBYTE Error thrown on values exceeding MaximumBinaryLength
JSON VARCHAR  
ARRAY VARCHAR Serialized to/from json
MAP VARCHAR Serialized to/from json
ROW VARCHAR Serialized to json in import path. Writing ROW values to Presto is not supported

Known issues/limitations

  • maximum VARCHAR length is limited by default to 4096 characters (see Configuration parameters)
  • maximum VARBINARY length is limited by default to 4096 bytes (see Configuration parameters)
  • Complex Presto type’s objects are translated to JSON, and they are visible in Teradata as strings. Large complex objects can’t be selected, because their JSON string representation may exceed 64kb.
  • Native Presto queries that have aggregations without explicitly specified aliases fail with an error: Presto query failed: Column name not specified at position 1.
SELECT *
FROM FOREIGN TABLE(
   SELECT count(*)
   FROM nation
)@presto_server AS presto_query

We should provide alias explicitly to make it work.

SELECT *
FROM FOREIGN TABLE(
   SELECT count(*) as some_explicit_alias
   FROM nation
)@presto_server AS presto_query
  • Currently QueryGrid is limited to run 20 concurrent queries. However, if your queries are long running or taking a lot of resources (e.g. big result set) it is advised to run even less than 20 queries. Otherwise you may get an Teradata error:
[Error 7583] [SQLState HY000] The secure mode processes had a set up error.
  • In order to run INSERT INTO queries with the high presto_writer_count you must change *node-scheduler* settings in Presto config.properties file as well

    node-scheduler.max-splits-per-node=MAX_SPLITS_PER_NODE
    node-scheduler.max-pending-splits-per-node-per-task=MAX_SPLITS_PER_TASK_PER_NODE
    
    Where
    • *MAX_SPLITS_PER_TASK_PER_NODE* must equal or greater than presto_writer_count
    • *MAX_SPLITS_PER_NODE* must be equal or greater than presto_writer_count multiplied by the number of queries that are supposed to be run concurrently
  • High presto_writer_count can cause high CPU utilization. If the Presto starts to fail with the timeout related issue try to decrease presto_writer_count.

  • In case of getting the Another instance of the Querygrid driver is still running. error please restart your Teradata instance with the tpareset reason command.