10.4. Presto Configuration Options for YARN-Based Clusters
If you are using Ambari to install the Presto App package, you can update the configuration properties from the Ambari Slider View UI. If you are using Slider to install Presto on YARN manually, you must edit the configuration files by hand.
The appConfig.json and resources-[singlenode|multinode].json files are the two major configuration files you need to configure before you can get Presto running on YARN. Sample configuration files for the Presto App package are available in the repository in the following directory:
presto-yarn-package/src/main/resources
The “default” values listed for the appConfig.json and resources.json sections below come from the presto-yarn-package/src/main/resources/appConfig.json and presto-yarn-package/src/main/resources/resources-multinode.json files respectively. These default values will be auto-populated in the Slider View UI for installation using the Ambari Slider View, but you can modify the properties on the UI as per your requirements.
For manual installation using Slider, copy presto-yarn-package/src/main/resources/appConfig.json and presto-yarn-package/src/main/resources/resources-[singlenode|multinode].json to a local location where you plan to run Slider, naming them appConfig.json and resources.json. Update these sample JSON files with whatever configuration you want for Presto. If the default values in the sample files suit your deployment, you can use them as-is.
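For example, assuming you plan to run Slider from a hypothetical working directory ~/presto-slider, the copy step would look like:

cp presto-yarn-package/src/main/resources/appConfig.json ~/presto-slider/appConfig.json
cp presto-yarn-package/src/main/resources/resources-multinode.json ~/presto-slider/resources.json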
Follow the steps here and configure the presto-yarn configuration files to match your cluster requirements. Optional properties are marked (optional). Please do not change any variables other than the ones listed below.
appConfig.json
site.global.app_user (default - yarn): This is the user which will be launching the YARN application for Presto, so all the Slider commands (using the bin/slider script) will be run as this user. Make sure that an HDFS home directory exists for the app_user. For example, for user yarn, create /user/yarn with the yarn user as the owner:
hdfs dfs -mkdir -p /user/yarn
hdfs dfs -chown yarn:yarn /user/yarn
Note: Operations involving the Hive connector in Presto, especially INSERT, ALTER TABLE etc., may require that the user running Presto has access to HDFS directories such as the Hive warehouse directories. Make sure that the app_user you set has appropriate access permissions to those HDFS directories. For example, /apps/hive/warehouse is usually where the Presto user will need access for various DML operations involving the Hive connector, and it is owned by hdfs in most cases. In that case, one way to fix the permission issue is to set site.global.app_user to user hdfs and also create a /user/hdfs directory in HDFS if it is not already there (as above). You will also need to run any Slider scripts (bin/slider) as user hdfs in this case.
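Following the same pattern as above, the home directory for the hdfs user would be created as:

hdfs dfs -mkdir -p /user/hdfs
hdfs dfs -chown hdfs:hdfs /user/hdfs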
site.global.user_group (default - hadoop): The group owning the application.
site.global.data_dir (default - /var/lib/presto/data): This will be the data directory used by Presto. The configured directory should be pre-created on all nodes and must be owned by user yarn, otherwise Slider will fail to start Presto with permission errors:
mkdir -p /var/lib/presto/data
chown -R yarn:hadoop /var/lib/presto/data
site.global.config_dir (default - /var/lib/presto/etc): The configuration directory on the cluster where the Presto config files node.properties, jvm.config, config.properties and connector configuration files are deployed. These files will have configuration values created from the templates presto-yarn-package/package/templates/*.j2 and other relevant appConfig.json parameters.
site.global.singlenode (default - true): If set to true, the node used will act as both coordinator and worker (singlenode mode). For a multi-node setup, this should be set to false.
site.global.presto_query_max_memory (default - 50GB): This will be used as query.max-memory in Presto's config.properties file.
site.global.presto_query_max_memory_per_node (default - 1GB): This will be used as query.max-memory-per-node in Presto's config.properties file.
site.global.presto_server_port (default - 8080): Presto server's HTTP port.
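Taken together, these settings appear in the global section of appConfig.json. A minimal sketch using the defaults above (the surrounding schema and metadata keys of the sample file are omitted; see the bundled sample for the full structure):

"global": {
    "site.global.app_user": "yarn",
    "site.global.user_group": "hadoop",
    "site.global.data_dir": "/var/lib/presto/data",
    "site.global.config_dir": "/var/lib/presto/etc",
    "site.global.singlenode": "true",
    "site.global.presto_query_max_memory": "50GB",
    "site.global.presto_query_max_memory_per_node": "1GB",
    "site.global.presto_server_port": "8080"
}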
site.global.catalog (optional) (default - configures the tpch connector): This property is used to configure connectors for Presto. The value should match the properties you would normally add in a connector .properties file for Presto in a non-YARN based installation. It should be of the format (note the single quotes around each value) - {'connector1' : ['key1=value1', 'key2=value2'..], 'connector2' : ['key1=value1', 'key2=value2'..]..}. This will create files connector1.properties, connector2.properties for Presto with entries key1=value1 etc. For example, to have hive.properties for CDH Hive:
"site.global.catalog": "{'hive': ['connector.name=hive-cdh5', 'hive.metastore.uri=thrift://${NN_HOST}:9083'], 'tpch': ['connector.name=tpch']}"
Note: The NN_HOST used in hive.metastore.uri is a variable for your HDFS Namenode, and it expects that your Hive metastore is up and running on your HDFS Namenode host. You do not have to replace it with your actual Namenode hostname; the variable will be substituted with your Namenode hostname at runtime. If you have the Hive metastore running elsewhere, make sure you replace NN_HOST with the appropriate hostname.
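Given the catalog example above, Slider would generate a hive.properties file whose entries look like the following (the hostname shown is hypothetical; NN_HOST is substituted at runtime):

connector.name=hive-cdh5
hive.metastore.uri=thrift://namenode.example.com:9083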
site.global.jvm_args (default - as in the example below): This configures Presto's jvm.config file, and the default heapsize is 1GB. Since Presto needs the jvm.config format to be a list of options, one per line, this property must be a string representation of a list of strings. Each entry of this list will be a new line in your jvm.config. For example, the configuration should look like:
"site.global.jvm_args": "['-server', '-Xmx1024M', '-XX:+UseG1GC', '-XX:G1HeapRegionSize=32M', '-XX:+UseGCOverheadLimit', '-XX:+ExplicitGCInvokesConcurrent', '-XX:+HeapDumpOnOutOfMemoryError', '-XX:OnOutOfMemoryError=kill -9 %p']",
site.global.additional_node_properties and site.global.additional_config_properties (optional) (default - None): Presto launched via Slider will use config.properties and node.properties created from the templates presto-yarn-package/package/templates/config.properties*.j2 and presto-yarn-package/package/target/node.properties.j2 respectively. If you want to add any additional properties to these configuration files, add site.global.additional_config_properties and site.global.additional_node_properties to your appConfig.json. The value of these has to be a string representation of an array of entries (key=value) that will go into the corresponding .properties file. Eg:
"site.global.additional_config_properties": "['task.max-worker-threads=50', 'distributed-joins-enabled=true']"
site.global.plugin (optional) (default - None): This allows you to add any additional jars you want to copy to the plugin presto-server-<version>/plugin/<connector> directory in addition to what is already available there. It should be of the format {'connector1' : ['jar1', 'jar2'..], 'connector2' : ['jar3', 'jar4'..]..}. This will copy jar1, jar2 to the Presto plugin directory at plugin/connector1 and jar3, jar4 at plugin/connector2. Make sure the plugin jars you want to add to Presto are available at presto-yarn-package/src/main/slider/package/plugins/ prior to building the presto-yarn app package, so that the built app package presto-yarn-package-<version>-<presto-version>.zip will have the jars under the package/plugins directory.
"site.global.plugin": "{'ml': ['presto-ml-${presto.version}.jar']}",
java_home (default - /usr/lib/jvm/java): Presto requires Java 1.8, so either make JDK 8 the default Java on your nodes or point java_home here at a JDK 8 installation (see the example after this list).
Variables in appConfig.json like ${COORDINATOR_HOST}, ${AGENT_WORK_ROOT} etc. do not need any substitution and will be appropriately configured at runtime.
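For example, if JDK 8 is installed somewhere other than the default location, java_home in appConfig.json would be set like the following (the path shown is hypothetical; use your cluster's actual JDK 8 path):

"java_home": "/usr/lib/jvm/java-1.8.0-openjdk"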
resources.json
The configuration here can be added either globally (for COORDINATOR and WORKER) or for each component. See: Advanced Configuration Options.
yarn.vcores (default - 1): By default this is set globally.
yarn.component.instances (default - 1 for COORDINATOR and 3 for WORKER): The multinode sample file presto-yarn-package/src/main/resources/resources-multinode.json is configured for a 4-node cluster with 1 coordinator and 3 workers under a strict placement policy, meaning there will be one component instance running on every node irrespective of failure history. If there is an insufficient number of NodeManager nodes in your cluster to accommodate the number of workers requested, the application launch will fail. If you want Presto to be on all YARN nodes, the number of workers could be the number of NodeManagers in your cluster minus 1, with 1 node reserved for the coordinator. If you want to deploy Presto on a single node (site.global.singlenode set to true), make sure you set 1 for the COORDINATOR and simply omit the WORKER component section (refer to presto-yarn-package/src/main/resources/resources-singlenode.json); you can also just set yarn.component.instances to 0 for WORKER in this case.
yarn.memory (default - 1500MB): The heapsize defined as -Xmx in site.global.jvm_args in appConfig.json is used by the Presto JVM itself. The value of yarn.memory MUST be bigger than this heap size; Slider suggests that at least 50% more appears to work, though some experimentation will be needed.
yarn.label.expression (optional) (default - coordinator for COORDINATOR and worker for WORKER): See Configuring Presto.
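A minimal multinode resources.json sketch reflecting the defaults above (yarn.memory is in MB; the schema and metadata keys of the bundled sample are omitted here, so consult resources-multinode.json for the authoritative layout):

{
  "components": {
    "COORDINATOR": {
      "yarn.component.instances": "1",
      "yarn.vcores": "1",
      "yarn.memory": "1500",
      "yarn.label.expression": "coordinator"
    },
    "WORKER": {
      "yarn.component.instances": "3",
      "yarn.vcores": "1",
      "yarn.memory": "1500",
      "yarn.label.expression": "worker"
    }
  }
}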
Now you are ready to deploy Presto on YARN either manually or by using Ambari.