10.2. Manual Installation on a YARN-Based Cluster
You can use Apache Slider to manually install Presto on a YARN-based cluster.
Installing and Integrating Presto with YARN
Deploying Presto on a YARN-Based Cluster
The installation procedures assume that you have a basic knowledge of Presto and the configuration files and properties it uses.
Note
All example files referred to are from: https://github.com/prestodb/presto-yarn/
Pre-Requisites
- A cluster with HDP 2.2+ or CDH5.4+ installed
- Apache Slider 0.80.0 (download from https://slider.incubator.apache.org/)
- JDK 1.8
- Zookeeper
- openssl >= 1.0.1e-16
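As a minimal sketch of how the JDK prerequisite might be checked on a node, the helper below tests whether version text reports a 1.8 JDK. The name is_jdk8 is hypothetical, and the sketch assumes the usual `version "1.8.0_..."` form that Java 8 prints.

```shell
# Hypothetical helper: succeeds if the version text on stdin reports JDK 1.8.
# Assumes the usual Java 8 form, e.g.: openjdk version "1.8.0_292"
is_jdk8() {
  grep -q 'version "1\.8\.'
}

# Typical use (java prints its version to stderr, hence 2>&1):
#   java -version 2>&1 | is_jdk8 && echo "JDK 1.8 found"
```

The same pattern could be applied to the output of `openssl version`, although comparing that output against 1.0.1e-16 is left out here.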
Presto Installation Directory Structure
When you use Slider to install Presto on a YARN-based cluster, the Presto installation directory structure differs from the standard structure.
For more information, see: Presto Installation Directory Structure for YARN-Based Clusters.
Presto Installation Configuration Options
Before installation, you must configure the .json files required for running Presto.
For more information, see: Presto Configuration Options for YARN-Based Clusters.
Using Apache Slider to Manually Install Presto on a YARN-Based Cluster
- Download the Slider 0.80.0 installation file from http://slider.incubator.apache.org/index.html to one of the nodes in the cluster, then extract it:
tar -xvf slider-0.80.0-incubating-all.tar.gz
- Configure Slider by setting JAVA_HOME and HADOOP_CONF_DIR in slider-0.80.0-incubating/conf/slider-env.sh:
export JAVA_HOME=/usr/lib/jvm/java
export HADOOP_CONF_DIR=/etc/hadoop/conf
- Configure ZooKeeper in conf/slider-client.xml. If ZooKeeper is listening on master:2181, add the following section:
<property>
<name>slider.zookeeper.quorum</name>
<value>master:2181</value>
</property>
- Configure the path where Slider packages will be installed:
<property>
<name>fs.defaultFS</name>
<value>hdfs://master/</value>
</property>
Make sure the user running Slider, which should be the same as site.global.app_user in appConfig.json, has a home directory in HDFS (see the note on appConfig.json). For more details about appConfig.json and resources.json, see Presto Configuration Options for YARN-Based Clusters.
su hdfs
hdfs dfs -mkdir -p /user/<user>
hdfs dfs -chown -R <user>:<user> /user/<user>
- Now run Slider:
su <user>
cd slider-0.80.0-incubating
bin/slider package --install --name PRESTO --package ../presto-yarn-package-*.zip
# Use .json files modified to match your requirements.
bin/slider create presto1 --template appConfig.json --resources resources.json
This should start your application, which you can see in the YARN ResourceManager web UI. If the application runs successfully, it remains listed in the ResourceManager as a “RUNNING” application. If the job fails, check the job history logs along with the logs on the node’s disk. See Debugging and Logging for YARN-Based Clusters.
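The RUNNING state can also be checked from the command line. The following is a minimal sketch, not part of Slider or presto-yarn: the helper name app_is_running is hypothetical, and it assumes the usual column layout of `yarn application -list` (application ID, name, type, user, queue, state) and an application name without spaces.

```shell
# Hypothetical helper: reads `yarn application -list` output on stdin and
# succeeds if an application with the given name is in state RUNNING.
# Assumed columns: $1 = application ID, $2 = name, $6 = state.
app_is_running() {
  awk -v name="$1" '
    $1 ~ /^application_/ && $2 == name && $6 == "RUNNING" { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Typical use on a cluster node:
#   yarn application -list -appStates RUNNING | app_is_running presto1 \
#     && echo "presto1 is RUNNING"
```

Reading the listing from stdin keeps the helper independent of how the listing is obtained, so it can be tried against saved output first.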
Additional Slider Commands
You can use the following Slider commands to manage your existing Presto application.
Check the Status
To check the status of the running application, run the following command. The status is written to the file status_file:
bin/slider status presto1 --out status_file
Destroy the App and Re-create
To re-create the app after failures, or to reconfigure Presto (for example, to add a new connector), run:
bin/slider destroy presto1
bin/slider create presto1 --template appConfig.json --resources resources.json
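If the two commands are run often, they could be wrapped so that the create step only runs when destroy succeeds. This is a sketch under stated assumptions: recreate_app is a hypothetical helper, and it expects appConfig.json and resources.json in the current directory, as in the commands above.

```shell
# Hypothetical wrapper around the destroy/create pair shown above.
# $1 = path to the slider launcher (e.g. bin/slider), $2 = application name.
# The && stops before create if destroy reports an error.
recreate_app() {
  slider_bin=$1
  app_name=$2
  "$slider_bin" destroy "$app_name" &&
    "$slider_bin" create "$app_name" --template appConfig.json --resources resources.json
}

# Typical use from the Slider directory:
#   recreate_app bin/slider presto1
```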
Completely Remove the App
Delete the app including the app package.
bin/slider package --delete --name PRESTO
‘Flex’ible App
Flex the number of Presto workers to a new value. If the new value is greater than before, new copies of the worker are requested. If it is less, component instances are destroyed.
Changes are immediate and depend on the availability of resources in the YARN cluster. When adding workers, make sure that extra nodes are available with YARN NodeManagers running and with the Presto data directory pre-created and owned by the yarn user. Also make sure these nodes do not already have a Presto component running; otherwise flexing may deploy a worker on such a node and eventually fail.
For example, assume there are 2 nodes (with YARN NodeManagers running) in the cluster and you initially deployed Presto via Slider on only one of them. To deploy and start the Presto WORKER component on the second node (assuming it meets all resource requirements), and thus bring the total number of WORKERs to 2, run:
bin/slider flex presto1 --component WORKER 2
Note that if your cluster already had 3 WORKER nodes running, the above command destroys one of them and retains 2 WORKERs.
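The effect of a flex target can be worked out before running the command. The sketch below only illustrates the arithmetic described above and does not call Slider; flex_plan is a hypothetical helper that takes the current and the target number of WORKERs.

```shell
# Hypothetical helper: describe what flexing from $1 to $2 WORKERs implies.
# It only does the arithmetic and never contacts Slider or YARN.
flex_plan() {
  current=$1
  target=$2
  if [ "$target" -gt "$current" ]; then
    echo "request $((target - current)) new WORKER instance(s)"
  elif [ "$target" -lt "$current" ]; then
    echo "destroy $((current - target)) WORKER instance(s)"
  else
    echo "no change"
  fi
}

flex_plan 1 2   # the two-node example above
flex_plan 3 2   # the three-WORKER case above
```

The first call corresponds to adding the second node, and the second to the case where one of three running WORKERs is destroyed.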
Advanced Configuration Options
The following advanced configuration options are available:
- Configuring memory, CPU, and YARN CGroups
- Failure policy
- YARN label
For more information, see Advanced Configuration Options for YARN-Based Clusters.
Debugging and Logging
For more information, see: Debugging and Logging for YARN-Based Clusters.