7.3. Tuning Presto
The default Presto settings should work well for most workloads. The following information may help you if your cluster is facing a specific performance problem.
Config Properties
These configuration options may require tuning in specific situations:
- task.info-refresh-max-wait: Controls staleness of task information, which is used in scheduling. Increasing this value can reduce coordinator CPU load, but may result in suboptimal split scheduling.
- task.max-worker-threads: Sets the number of threads used by workers to process splits. Increasing this number can improve throughput if worker CPU utilization is low, but it also increases heap space usage.
- distributed-joins-enabled: Use hash distributed joins instead of broadcast joins. Distributed joins require redistributing both tables using a hash of the join key. This can be slower (sometimes substantially) than broadcast joins, but allows much larger joins. Broadcast joins require that the tables on the right side of the join fit in memory on each machine, whereas distributed joins only require that those tables fit in distributed memory across all machines. This property can also be set on a per-query basis using the distributed_join session property.
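As a rough illustration, the three properties above might appear together in a node's configuration file as sketched here. This assumes the usual etc/config.properties location. The values are placeholders chosen only to show the syntax, not recommendations, and are best adjusted one at a time while observing the effect on the cluster.

```properties
# Illustrative values only (assumptions, not recommended settings).

# Allow task information to be somewhat staler to reduce coordinator CPU load.
task.info-refresh-max-wait=500ms

# Raise the split-processing thread count on workers with idle CPU;
# expect higher heap usage as a consequence.
task.max-worker-threads=64

# Prefer hash distributed joins over broadcast joins cluster-wide.
distributed-joins-enabled=true
```

To change the join strategy for a single session rather than the whole cluster, a statement such as SET SESSION distributed_join = true should have the equivalent effect, assuming the client in use supports session properties.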
JVM Settings
The following JVM options can be helpful for diagnosing garbage collection (GC) issues:
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCCause
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC
-XX:PrintFLSStatistics=2
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
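A minimal sketch of how a subset of these options could be enabled, assuming a Java 8 HotSpot JVM (newer JVMs replace most of these flags with unified logging and may refuse to start when given them) and assuming JVM options are kept one per line in etc/jvm.config. The -Xloggc path is a hypothetical location. Without a log file option, the GC output goes to the server's standard output.

```text
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCCause
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/presto/data/var/log/gc.log
```

Starting with a small set like this and adding the more verbose options (class histograms, safepoint statistics) only while investigating a specific problem keeps the log volume manageable.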