From Xiaoyong Zhu <xiaoy...@microsoft.com>
Subject RE: How to Tuning Tez Task Performance
Date Mon, 27 Apr 2015 23:11:09 GMT
Btw, Rajesh, I set tez.task.generate.counters.per.io=true in my cluster but did not find the
task counter per edge. Could you please give some counter examples when this is enabled so
I could verify?

Thanks!

Xiaoyong

From: Rajesh Balamohan [mailto:rajesh.balamohan@gmail.com]
Sent: Friday, April 24, 2015 4:55 PM
To: user@tez.apache.org
Subject: Re: How to Tuning Tez Task Performance

Listing some details at a very high level:

- Set "tez.task.generate.counters.per.io=true" to get more details on the task counters.
Basically this starts printing the counters per edge, which can be a lot more useful for
debugging.

- In case you want to avoid container launches etc. when you analyze for the first time, try
hive.prewarm.enabled=true and hive.prewarm.numcontainers=<number of containers you want
prewarmed in your session>

- Container reuse is enabled by default in Tez. (tez.am.container.idle.release-timeout-min.millis
and tez.am.container.idle.release-timeout-max.millis control the amount of time a container
is held by the AM before it is released.)

- Set tez.runtime.io.sort.mb appropriately to avoid spills (you can check the task counters
in the logs to see how many spills occurred and adjust it accordingly)

- Set tez.runtime.sort.threads=2 to enable PipelinedSorter, which is a lot more performant
than DefaultSorter (this is the default in the master branch, but if you are using an earlier
release you can turn it on with this setting).

- Set tez.runtime.compress=true and set tez.runtime.compress.codec (SnappyCodec is preferred,
but it is up to you to choose)

- Set tez.runtime.shuffle.keep-alive.enabled=true in case you have a shuffle-heavy workload.
This reduces the number of connections in shuffle.

- Adjust the memory allocated to different inputs/outputs via tez.task.scale.memory.ratios
(but this is more of an expert-level setting, which you might want to touch only after
nailing down any memory pressure)

- Adjusting shuffle buffers is also possible, but I would advise it only once you have nailed
down an issue related to the shuffle/merge codepath.

- Set "tez.runtime.optimize.local.fetch=true" to bypass HTTP fetches (when the data is present
locally)
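
As a rough sketch, the Tez properties listed above could be collected in one place and
rendered either as a tez-site.xml fragment or as Hive session "set" statements. The property
names are the ones from the list; the helper names (render_site_xml, render_hive_set) are
hypothetical, the tez.runtime.io.sort.mb value is only a placeholder, and the fully qualified
codec class name is an assumption about how SnappyCodec would be referenced on a standard
Hadoop install.

```python
import xml.etree.ElementTree as ET

# Property names come from the list above. Values marked "assumed" are
# illustrative placeholders, not recommendations.
TEZ_SETTINGS = {
    "tez.task.generate.counters.per.io": "true",
    "tez.runtime.io.sort.mb": "512",  # assumed; size it from the spill counters
    "tez.runtime.sort.threads": "2",
    "tez.runtime.compress": "true",
    # assumed fully qualified name for SnappyCodec
    "tez.runtime.compress.codec": "org.apache.hadoop.io.compress.SnappyCodec",
    "tez.runtime.shuffle.keep-alive.enabled": "true",
    "tez.runtime.optimize.local.fetch": "true",
}


def render_site_xml(settings):
    """Render settings as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def render_hive_set(settings):
    """Render settings as one Hive 'set key=value;' statement per line."""
    return "\n".join(f"set {name}={value};" for name, value in settings.items())


if __name__ == "__main__":
    print(render_site_xml(TEZ_SETTINGS))
    print(render_hive_set(TEZ_SETTINGS))
```

Keeping the settings in a single dictionary makes it easier to change one value at a time
and compare task counters between runs.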


Feel free to refer to https://github.com/t3rmin4t0r/tez-autobuild/blob/master/tez-site.xml
for any commonly used settings for benchmarks.

On Fri, Apr 24, 2015 at 1:52 PM, r7raul1984@163.com <r7raul1984@163.com> wrote:
I want to tune Tez task performance. The Tez tasks are created by Hive. How should I tune
them? Should I analyze performance using the Tez task counters in the Tez log? Any
suggestions?

________________________________
r7raul1984@163.com



--
~Rajesh.B