Querygrid requires JRE 7 (or higher). Teradata Database 15 has JRE 7 pre-installed along with the JRE 6. In order to change JRE run below code snippet on any of the Teradata nodes:
echo "JREPath: /opt/teradata/jvm64/jre7/jre/" > /tmp/jre_config;
cufconfig -f /tmp/jre_config;
tpareset 'JRE version changed';
We recommend to use G1 collector with the following properties:
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
You must ensure that you have enough memory for query and IO buffers. In case of frequent GC invocations try to increase JVM heap size.
-Xms30g
-Xmx30g
In Teradata Database JVM properties can be set with using *cufconfig*. Provided properties will be applied for all of the Teradata nodes. Below is the set of properties which are a good start point of configuring your JVM.
echo "JVMOptions: -Xms30g -Xmx30g -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+ExplicitGCInvokesConcurrent" > /tmp/jre_config;
cufconfig -f /tmp/jre_config;
tpareset 'JVMOptions has been changed';
Be aware that the *cufconfig* doesn’t append the newly added properties but just overwrite them instead. In order to list your existing properties please run *cufconfig -o*. Among other things you will find *JVMOptions* that contains your current JVM settings.
In Presto you can change JVM properties with editing the jvm.config file. It can be easily distributed across the cluster using the presto-admin utility. Please refer to presto-admin documentation for more details.
Please upload a package zip (teradata-to-presto-driver.zip) file on Teradata and Presto systems and unzip it.
unzip teradata-to-presto-driver.zip -D presto-teradata-driver
cd presto-teradata-driver
presto-admin package install querygrid-presto-connector-*.noarch.rpm
presto-admin connector add querygrid
presto-admin server restart
presto:default> show catalogs; Catalog ----------- ... querygrid ...
To see a full list of supported options type the following:
./install-teradata
Once you have all needed information you can install the driver with the following command:
./install-teradata install \
--dba_logon dba_login,dba_password \
--user_logon user_login,user_password
Note
All values used in command examples must be replaced with values corresponding to your environment.
After this step you need to create a foreign server (you can skip static configuration).
Currently there are following static configurable properties shared across all foreign servers:
- data-exchange.server.buffer-size - query results write buffer size. Buffer is shared between all the queries that are in progress. Defaults to: 2GB.
- data-exchange.server.port - querygrid server port. Defaults to: 21337.
- data-exchange.client.buffer-size - query results read buffer size. Buffer is shared between all the queries that are in progress. Defaults to: 2GB.
Static properties can be set for both Presto and Teradata.
In order to change static property for Presto you must edit the querygrid connector configuration file (/etc/opt/prestoadmin/connectors/querygrid.properties). To apply your changes you must distribute new properties file to all Presto nodes and restart the Presto server:
presto-admin connector add querygrid presto-admin server restart
In order to change static property for Teradata you must set JVM system property in the following format: -Dt2p.property.name=value. Properties for Teradata JVM can be set using the cufconfig utility. To apply your changes you must restart the Teradata database using the tpareset command.
Note
Buffer sizes must be specified in human readable data size notation: 100MB, 1GB, 1TB
To create a foreign server issue the following command:
./install-teradata create_teradata_to_presto_server \
--user_logon user_login,user_password \
--presto_master_ip 192.168.205.1 \
--server_name presto_server
By default hive catalog is queried. You can specify another one via server_catalog:
./install-teradata create_teradata_to_presto_server \
--user_logon user_login,user_password \
--presto_master_ip 192.168.205.1 \
--server_name presto_server_tpch \
--server_catalog tpch
If you want to query one particular schema you can specify it via server_schema, in other case you will need to specify it explicitly for each query. Be noticed that when schema is already specified in foreign server you won’t be able to specify any other schema explicitly.
./install-teradata create_teradata_to_presto_server \
--user_logon user_login,user_password \
--presto_master_ip 192.168.205.1 \
--server_name presto_server \
--server_schema default
If the network bandwidth is the bottleneck, data compression may increase throughput at the cost of increased CPU utilization. Data compression can be enabled for a particular foreign server using the compression parameter.
./install-teradata create_teradata_to_presto_server \
--user_logon user_login,user_password \
--presto_master_ip 192.168.205.1 \
--server_name presto_server
--compression <compression_algorithm>
LZ4 and SNAPPY compression algorithms are available. Both algorithms are focused on compression and decompression speed. Compression ratio and compression speed may vary for different queries and data sets.
Presto database credentials can be changed using the presto_user and presto_password parameters:
./install-teradata create_teradata_to_presto_server \
--user_logon user_login,user_password \
--presto_master_ip 192.168.205.1 \
--server_name presto_server
--presto_user <username>
--presto_password <password>
Depending on your workflow you may want to adjust a foreign server parameter presto_writer_count. If your general workload consists of running large queries with low concurrency, please set it to higher value. However, if your workload contains running concurrent queries, you may need to lower it. This property controls how many threads will be used by Presto per each QueryGrid query. If your Presto cluster has 64 cores per machine, to fully utilize all of those cores with 4 concurrent queries you need to set presto_writer_count to at least 16. By default the number of concurrent writers is set to 8.
./install-teradata create_teradata_to_presto_server \
--user_logon user_login,user_password \
--presto_master_ip 192.168.205.1 \
--server_name presto_server
--presto_writer_count 8
If you need to set any server options, follow the convention in the command below (note last option):
./install-teradata create_teradata_to_presto_server \
--user_logon user_login,user_password \
--presto_master_ip 192.168.205.1 \
--server_name presto_server_custom_temporary_schema \
--server_options "MaximumTextLength('4088')"
To see full list of server configuration parameters please see Configuration parameters section.
Once you have installed the querygrid driver and created a foreign server, play around with them by running the bteq command on the teradata host (logged in as the querygrid driver user) followed by a few sql queries:
bteq .logon user_login,user_password
Bteq’s default character encoding is NOT UTF8 and Presto supports only UTF8/UTF16, so not specifying encoding will cause errors displaying/entering non ascii characters in BTEQ So to avoid issues with encoding, it is better to run bteq as follows:
bteq -c UTF8 -e UTF8 .logon localhost/user_login,user_password
Once you are logged on teradata data warehouse you can run the SQL examples below.
HELP FOREIGN SERVER presto_server;
HELP FOREIGN DATABASE default@presto_server;
HELP FOREIGN TABLE region@presto_server;
SELECT * from region@presto_server;
INSERT INTO region@presto_server SELECT * from region_on_teradata_table;
To remove foreign server you can:
./install-teradata drop_server \
--user_logon user_login,user_password \
--server_name presto_server
Once you dropped all foreign servers, you are able to uninstall driver:
./install-teradata uninstall \
--user_logon user_login,user_password
After the uninstall you have to restart your Teradata Database. It can be done with:
tpareset "Querygrid driver has been uninstalled."
Or you can run uninstall task with the restart parameter:
./install-teradata uninstall \
--user_logon user_login,user_password \
--restart
Foreign Presto Querygrid servers can be configured (see –server-options of install-teradata script) with the following parameters:
- MaximumTextLength - (default: 4096) an integer which determines maximum length in characters of text which could be transferred between databases (Teradata and Presto)
- MaximumBinaryLength - (default: 4096) an integer which determines maximum length of binary data in bytes which could be transferred between databases (Teradata and Presto)
- ReadTimeout - (default: 2h) determines the maximum amount of time waiting for query results. It is recommended to increase this value for the queries that require complex computation before returning any data.
SELECT * FROM FOREIGN TABLE( SELECT count(*) FROM some_enormous_table )@foreign_server AS table;
- WriteTimeout - (default: 1h) determines maximum amount of time waiting for consumer to consume the results. It is recommended to increase this value for the queries that require complex computation after consuming the data.
SELECT fibonacci(bigint_column) FROM large_table@foreign_server;
- QueryTimeout - (default: 1h) determines maximum amount of time waiting for the remote query to finish after data transfer is finished. It is recommended to increase this value for the INSERT INTO queries which insert the data into the slow consuming connectors.
INSERT INTO "slow_consuming_connector"."slow_consuming_table" SELECT * FROM large_table;Note
Sample timeout values: 10s, 5m, 2h.
To see how to use above parameters see Foreign server creation section.
The following table refers to type handling/conversion between Presto query engine and Teradata. Type handling can differ between presto connectors or even between storage types within a connector. Please refer to a specific presto connector documentation for more details. For example presto-hive connector maps all integral types to Presto BIGINT type.
Type in Presto | Type in Teradata | Notes |
---|---|---|
BOOLEAN | BYTEINT | |
BIGINT | BIGINT | |
DOUBLE | REAL | |
VARCHAR | VARCHAR | Values exceeding MaximumTextLength will be truncated |
DATE | DATE | |
TIMESTAMP | TIMESTAMP | |
VARBINARY | VARBYTE | Error thrown on values exceeding MaximumBinaryLength |
JSON | VARCHAR | |
ARRAY | VARCHAR | Serialized to/from json |
MAP | VARCHAR | Serialized to/from json |
ROW | VARCHAR | Serialized to json in import path. Writing ROW values to Presto is not supported |
- maximum VARCHAR length is limited by default to 4096 characters (see Configuration parameters)
- maximum VARBINARY length is limited by default to 4096 bytes (see Configuration parameters)
- Complex Presto type’s objects are translated to JSON, and they are visible in Teradata as strings. Large complex objects can’t be selected, because their JSON string representation may exceed 64kb.
- Native Presto queries that have aggregations without explicitly specified aliases fail with an error: Presto query failed: Column name not specified at position 1.
SELECT * FROM FOREIGN TABLE( SELECT count(*) FROM nation )@presto_server AS presto_queryWe should provide alias explicitly to make it work.
SELECT * FROM FOREIGN TABLE( SELECT count(*) as some_explicit_alias FROM nation )@presto_server AS presto_query
- Currently QueryGrid is limited to run 20 concurrent queries. However, if your queries are long running or taking a lot of resources (e.g. big result set) it is advised to run even less than 20 queries. Otherwise you may get an Teradata error:
[Error 7583] [SQLState HY000] The secure mode processes had a set up error.
In order to run INSERT INTO queries with the high presto_writer_count you must change *node-scheduler* settings in Presto config.properties file as well
node-scheduler.max-splits-per-node=MAX_SPLITS_PER_NODE node-scheduler.max-pending-splits-per-node-per-task=MAX_SPLITS_PER_TASK_PER_NODE
- Where
- *MAX_SPLITS_PER_TASK_PER_NODE* must equal or greater than presto_writer_count
- *MAX_SPLITS_PER_NODE* must be equal or greater than presto_writer_count multiplied by the number of queries that are supposed to be run concurrently
High presto_writer_count can cause high CPU utilization. If the Presto starts to fail with the timeout related issue try to decrease presto_writer_count.
In case of getting the Another instance of the Querygrid driver is still running. error please restart your Teradata instance with the tpareset reason command.