Presto 0.127t Documentation

9.2. Tuning Presto

The default Presto settings should work well for most workloads. The following information may help you if your cluster is facing a specific performance problem.

Config Properties

These configuration options may require tuning in specific situations:

  • task.info-refresh-max-wait: Controls staleness of task information, which is used in scheduling. Increasing this value can reduce coordinator CPU load, but may result in suboptimal split scheduling.
  • task.max-worker-threads: Sets the number of threads used by workers to process splits. Increasing this number can improve throughput when worker CPU utilization is low, but it also increases heap space usage.
  • distributed-joins-enabled: Use hash distributed joins instead of broadcast joins. Distributed joins require redistributing both tables using a hash of the join key. This can be slower (sometimes substantially) than broadcast joins, but allows much larger joins. Broadcast joins require that the tables on the right side of the join fit in memory on each machine, whereas distributed joins only require that those tables fit in distributed memory across all machines. The join strategy can also be chosen on a per-query basis using the distributed_join session property. This property is enabled by default.
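
As a rough illustration of how the properties above might be set, the fragment below sketches entries in etc/config.properties. The numeric values are assumptions chosen only for the example, not defaults or recommendations; suitable values depend on the cluster and workload.

```properties
# Sketch of etc/config.properties entries (illustrative values only).

# Assumed value: allow task information to be somewhat staler in exchange
# for lower coordinator CPU load; scheduling may become less optimal.
task.info-refresh-max-wait=500ms

# Assumed value: more threads for processing splits on each worker.
# Only worth raising if worker CPU utilization is low; uses more heap.
task.max-worker-threads=32

# Hash distributed joins instead of broadcast joins (enabled by default).
distributed-joins-enabled=true
```

Changes to this file normally take effect only after the affected nodes are restarted, so it is usually easier to try the distributed_join session property on a single query before changing the cluster-wide setting.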

JVM Settings

The following JVM options can be helpful for diagnosing garbage collection (GC) issues:

-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCCause
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC
-XX:PrintFLSStatistics=2
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1