From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: How to Tuning Tez Task Performance
Date Fri, 24 Apr 2015 17:12:35 GMT
>   What are the problems with having
>tez.runtime.shuffle.keep-alive.enabled and
>tez.runtime.optimize.local.fetch set to true always by default?

Nothing has failed due to these so far - we've gone through one entire
release where we tested both heavily and found that they work very well at
scale.

local.fetch is already enabled by default in 0.7.x (TEZ-2333).

shared.fetch isn't getting flipped right now because last release it
didn't get enough coverage on customer setups (for my liking) to bake it
in (the broadcast edge didn't whitelist that config).

The keep-alive shuffle was tested on 350 nodes, with 10,000 mappers. And
the advantage of these was significant - between those three options a
broadcast JOIN went from about 30 minutes of shuffle time to around 2 1/2
minutes.
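
For reference, a minimal sketch of how the three options could be written out as a tez-site.xml style fragment. The first two property names are the ones quoted in this thread; the full name used for shared.fetch is an assumption (by analogy with local.fetch), and the helper names build_tez_site and render are made up for illustration. Check the names against your Tez version before using them.

```python
import xml.etree.ElementTree as ET

# The first two names are quoted in this thread. The third is ASSUMED to be
# the full property name behind "shared.fetch"; verify it for your release.
SHUFFLE_FLAGS = {
    "tez.runtime.shuffle.keep-alive.enabled": "true",
    "tez.runtime.optimize.local.fetch": "true",
    "tez.runtime.optimize.shared.fetch": "true",  # assumed full name
}


def build_tez_site(flags):
    """Return a Hadoop-style <configuration> element holding the given flags."""
    root = ET.Element("configuration")
    for name, value in flags.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


def render(root):
    """Serialize the element tree to an XML string."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render(build_tez_site(SHUFFLE_FLAGS)))
```

The output is only a fragment to merge into an existing site file; it says nothing about whether a given edge type honours each flag.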

You do need a 64-bit OS (not sandbox) with a modern kernel to safely flip
these on - system configs on CentOS need to roughly correspond to the
ktune settings for RHEL (other than THP & numad/zone_reclaim).

These configs help shuffle in general - off the top of my head,
tcp_fin_timeout and somaxconn come to mind immediately as being the
relevant configs to always tune.
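
A quick way to see what a node currently has for those two settings is to read them from /proc/sys. This is a rough sketch, assuming a Linux host; the helper name read_sysctls is hypothetical, and it only reports values rather than judging them, since no target numbers are given here.

```python
from pathlib import Path

# The two sysctls mentioned above, mapped to their paths under /proc/sys.
SYSCTLS = {
    "net.ipv4.tcp_fin_timeout": "net/ipv4/tcp_fin_timeout",
    "net.core.somaxconn": "net/core/somaxconn",
}


def read_sysctls(proc_sys="/proc/sys"):
    """Return {sysctl name: int value}, with None where a value is unreadable."""
    values = {}
    for name, rel_path in SYSCTLS.items():
        try:
            values[name] = int((Path(proc_sys) / rel_path).read_text().strip())
        except (OSError, ValueError):
            # Missing file, no permission, or non-numeric content.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_sysctls().items():
        print(f"{name} = {value}")
```

Running it across the cluster makes it easy to spot nodes whose settings have drifted from the rest.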

There's a certain inflection point we hit in shuffle, where it's worse to
be faster - fixes like HADOOP-11226 help there, but they need
router/switch configs as well.

Cheers,
Gopal


