From Rajesh Balamohan <rajesh.balamo...@gmail.com>
Subject Re: How to Tuning Tez Task Performance
Date Tue, 28 Apr 2015 00:07:19 GMT
I'm not sure whether you have the Tez UI, which should render this info
automatically. Otherwise, you can verify it from the application logs. An
example is given below.

2015-04-27 04:15:12,834 INFO [Dispatcher thread: Central]
history.HistoryEventHandler:
[HISTORY][DAG:dag_1429683757595_0452_1][Event:DAG_FINISHED]:
dagId=dag_1429683757595_0452_1, startTime=1430133293306,
finishTime=1430133312773, timeTaken=19467, status=SUCCEEDED, diagnostics=,
counters=Counters: 225, org.apache.tez.common.counters.DAGCounter,
NUM_SUCCEEDED_TASKS=43, TOTAL_LAUNCHED_TASKS=43,
.....
.....
 *TaskCounter_Map_4_OUTPUT_Map_1*, ADDITIONAL_SPILLS_BYTES_READ=0,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0,
OUTPUT_BYTES=137200, OUTPUT_BYTES_PHYSICAL=119705,
OUTPUT_BYTES_WITH_OVERHEAD=548794, OUTPUT_LARGE_RECORDS=0,
OUTPUT_RECORDS=27440, SPILLED_RECORDS=0, TaskCounter_Map_5_INPUT_date_dim,
INPUT_RECORDS_PROCESSED=10, TaskCounter_Map_5_OUTPUT_Map_1,
ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0,
ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=1825, OUTPUT_BYTES_PHYSICAL=1505,
OUTPUT_BYTES_WITH_OVERHEAD=7297, OUTPUT_LARGE_RECORDS=0,
OUTPUT_RECORDS=365, SPILLED_RECORDS=0,
.......
.......
 *TaskCounter_Map_6_OUTPUT_Map_1*, ADDITIONAL_SPILLS_BYTES_READ=0,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0,
OUTPUT_BYTES=909, OUTPUT_BYTES_PHYSICAL=464,
OUTPUT_BYTES_WITH_OVERHEAD=2421, OUTPUT_LARGE_RECORDS=0,
OUTPUT_RECORDS=101, SPILLED_RECORDS=0, TaskCounter_Map_7_INPUT_item,
INPUT_RECORDS_PROCESSED=47,
 ....
 ....
 *TaskCounter_Map_7_OUTPUT_Map_1*, ADDITIONAL_SPILLS_BYTES_READ=0,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0,
OUTPUT_BYTES=1104000, OUTPUT_BYTES_PHYSICAL=341828,
OUTPUT_BYTES_WITH_OVERHEAD=1727999, OUTPUT_LARGE_RECORDS=0,
OUTPUT_RECORDS=48000, SPILLED_RECORDS=0, TaskCounter_Reducer_2_INPUT_Map_1,
ADDITIONAL_SPILLS_BYTES_READ=821473, ADDITIONAL_SPILLS_BYTES_WRITTEN=0,
COMBINE_INPUT_RECORDS=0, FIRST_EVENT_RECEIVED=12, LAST_EVENT_RECEIVED=5049,
MERGED_MAP_OUTPUTS=36, MERGE_PHASE_TIME=5070, NUM_DISK_TO_DISK_MERGES=0,
NUM_FAILED_SHUFFLE_INPUTS=0, NUM_MEM_TO_DISK_MERGES=0,
NUM_SHUFFLED_INPUTS=36, NUM_SKIPPED_INPUTS=0, REDUCE_INPUT_GROUPS=47999,
REDUCE_INPUT_RECORDS=670353, SHUFFLE_BYTES=16402510,
SHUFFLE_BYTES_DECOMPRESSED=52252736, SHUFFLE_BYTES_DISK_DIRECT=821473,
SHUFFLE_BYTES_TO_DISK=0, SHUFFLE_BYTES_TO_MEM=15581037,
SHUFFLE_PHASE_TIME=5056, SPILLED_RECORDS=33317,
....
....
  *TaskCounter_Reducer_2_OUTPUT_Reducer_3*, ADDITIONAL_SPILLS_BYTES_READ=0,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0,
OUTPUT_BYTES=5600, OUTPUT_BYTES_PHYSICAL=0, OUTPUT_BYTES_WITH_OVERHEAD=0,
OUTPUT_RECORDS=100, SPILLED_RECORDS=100
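
If you want to scan these per-edge counters without reading the log by eye, here is a rough Python sketch (not part of Tez). It assumes the counter text is a comma-separated list in which a token without "=" names a group and the NAME=VALUE tokens that follow belong to it, as in the excerpt above. The function name parse_counters and the choice to flag non-zero SPILLED_RECORDS are illustrative assumptions only.

```python
import re


def parse_counters(text):
    """Group NAME=VALUE counters under the most recent group-name token.

    Assumption: a token without '=' starts a new counter group, as in the
    DAG_FINISHED excerpt above. Emphasis asterisks and '....' elisions from
    the email are ignored.
    """
    groups = {}
    current = None
    for token in re.split(r"[,\s]+", text):
        token = token.strip("*")
        if not token or set(token) == {"."}:
            continue  # skip empty tokens and elision markers
        if "=" in token:
            name, _, value = token.partition("=")
            if current is not None and value.isdigit():
                groups[current][name] = int(value)
        else:
            current = token
            groups.setdefault(current, {})
    return groups


# Sample text copied from the log excerpt above (counter portion only).
sample = """
 *TaskCounter_Map_6_OUTPUT_Map_1*, ADDITIONAL_SPILLS_BYTES_READ=0,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0,
OUTPUT_BYTES=909, OUTPUT_BYTES_PHYSICAL=464,
OUTPUT_BYTES_WITH_OVERHEAD=2421, OUTPUT_LARGE_RECORDS=0,
OUTPUT_RECORDS=101, SPILLED_RECORDS=0, TaskCounter_Map_7_INPUT_item,
INPUT_RECORDS_PROCESSED=47,
 ....
  *TaskCounter_Reducer_2_OUTPUT_Reducer_3*, ADDITIONAL_SPILLS_BYTES_READ=0,
ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0,
OUTPUT_BYTES=5600, OUTPUT_BYTES_PHYSICAL=0, OUTPUT_BYTES_WITH_OVERHEAD=0,
OUTPUT_RECORDS=100, SPILLED_RECORDS=100
"""

groups = parse_counters(sample)

# Collect the groups that report a non-zero SPILLED_RECORDS value.
spilled = {
    group: counters["SPILLED_RECORDS"]
    for group, counters in groups.items()
    if counters.get("SPILLED_RECORDS", 0) > 0
}
print(spilled)
```

The sketch expects only the counter portion of the line (the text after the "Counters: <n>," prefix). Whether a non-zero value actually calls for tuning depends on the workload.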

On Tue, Apr 28, 2015 at 4:41 AM, Xiaoyong Zhu <xiaoyzhu@microsoft.com>
wrote:

>  Btw, Rajesh, I set tez.task.generate.counters.per.io=true in my cluster
> but did not find the task counter per edge. Could you please give some
> counter examples when this is enabled so I could verify?
>
>
>
> Thanks!
>
>
>
> Xiaoyong
>
>
>
> *From:* Rajesh Balamohan [mailto:rajesh.balamohan@gmail.com]
> *Sent:* Friday, April 24, 2015 4:55 PM
> *To:* user@tez.apache.org
> *Subject:* Re: How to Tuning Tez Task Performance
>
>
>
> Listing some details at a very high level:
>
>
>
> - Set "tez.task.generate.counters.per.io=true" to get more details on the
> task counters. Basically, this starts printing the counters per edge, which
> can be a lot more useful for debugging.
>
>
>
> - In case you want to avoid container launches etc. when you analyze for
> the first time, try hive.prewarm.enabled=true & hive.prewarm.numcontainers=<no
> of containers you want in your session to be prewarmed>
>
>
>
> - Container reuse is enabled by default in Tez.
> (tez.am.container.idle.release-timeout-min.millis and
> tez.am.container.idle.release-timeout-max.millis control the amount of
> time a container is held by the AM before releasing it)
>
>
>
> - Set tez.runtime.io.sort.mb appropriately to avoid spills (you can check
> task counters in the logs to find out the spills and adjust it accordingly)
>
>
>
> - Set tez.runtime.sort.threads=2 to enable PipelinedSorter, which is a lot
> more performant than DefaultSorter (this is the default in the master
> branch, but if you are using earlier releases, you can turn it on by setting
> tez.runtime.sort.threads=2).
>
>
>
> - Set tez.runtime.compress=true and set tez.runtime.compress.codec
> (SnappyCodec is preferred, but it is up to you to choose)
>
>
>
> - Set tez.runtime.shuffle.keep-alive.enabled=true in case you have a
> shuffle-heavy workload. This reduces the number of connections in shuffle.
>
>
>
> - Adjust memory allocated to different inputs/outputs based on
> tez.task.scale.memory.ratios (but this is more of an expert-level setting,
> which you might want to touch after nailing down any memory pressure)
>
>
>
> - Adjusting shuffle buffers is also possible, but I would advise it only
> when you have nailed down an issue related to the shuffle/merge codepath.
>
>
>
> - Set "tez.runtime.optimize.local.fetch=true" to bypass http fetches (when
> data is locally present)
>
>
>
>
>
> Feel free to refer to
> https://github.com/t3rmin4t0r/tez-autobuild/blob/master/tez-site.xml for
> any commonly used settings for benchmarks.
>
>
>
> On Fri, Apr 24, 2015 at 1:52 PM, r7raul1984@163.com <r7raul1984@163.com>
> wrote:
>
>  I want to tune Tez task performance. This Tez task is created by
> Hive. How do I tune Tez task performance?
>
> Should I analyze performance using the Tez task counters in the Tez log? Any suggestions?
>
>
>  ------------------------------
>
> r7raul1984@163.com
>
>
>
>
>
> --
>
> ~Rajesh.B
>



-- 
~Rajesh.B
