From Sungwoo Park <glap...@gmail.com>
Subject Question on Tez 0.6 and Tez 0.7
Date Sat, 11 Jun 2016 10:57:59 GMT
Hello,

I have a question about the performance difference between Tez 0.6.2 and
Tez 0.7.0.

This is what we did:

1. Installed HDP 2.4 on an 11-node cluster (10 data nodes) with the default
settings recommended by HDP 2.4. No other changes were made to these
settings.

2. Ran TeraSort using Tez 0.6.2 and Tez 0.7.0, and compared the running
time.

Each experiment specifies the amount of input data per node. For example,
10GB_per_node means a total of
100GB input because there are 10 data nodes in the cluster.
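The naming convention above amounts to a single multiplication. A minimal sketch follows; the helper name `total_input_gb` is hypothetical, and the node count is taken from the configuration listed in this message.

```python
# Number of data nodes, as stated in the configuration section of this message.
DATA_NODES = 10


def total_input_gb(gb_per_node, data_nodes=DATA_NODES):
    """Total TeraSort input in GB for an experiment named <N>GB_per_node."""
    return gb_per_node * data_nodes


# The five experiment sizes reported in this message.
for per_node in (10, 20, 40, 80, 160):
    print(f"{per_node}GB_per_node -> {total_input_gb(per_node)}GB total")
```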

We've found that Tez 0.7.0 runs consistently slower than Tez 0.6.2, and it
often produces 'Vertex re-running' errors when the size of input data per
node is 40GB or more. Even when there is no 'Vertex re-running', Tez 0.7.0
takes much longer than Tez 0.6.2.

We know that Tez 0.7.0 can run about as fast as Tez 0.6.2, because on a
cluster of 44 nodes (with only 24GB memory per node), Tez 0.7.0 finished
TeraSort almost as fast as Tez 0.6.2. We are trying to figure out what we
missed in the experiments on the 11-node cluster.

Any help here would be appreciated. Thanks a lot.

Sungwoo Park

----- Configuration

HDP 2.4
11 nodes, 10 data nodes, each with 96GB memory, 6 x 500GB HDDs
same HDFS, Yarn, MR

Each mapper container uses 5GB.
Each reducer container uses 10GB.

Configurations specific to tez-0.6.2
tez.runtime.sort.threads = 2

Configurations specific to tez-0.7.0
tez.grouping.max-size = 1073741824
tez.runtime.sorter.class = PIPELINED
tez.runtime.pipelined.sorter.sort.threads = 2

----- TEZ-0.6.2

10GB_per_node
id              time            num_containers  mem             core            diag
0               212             239             144695261       21873
1               204             239             139582665       20945
2               211             239             143477178       21700

20GB_per_node
id              time            num_containers  mem             core            diag
0               392             239             272528515       42367
1               402             239             273085026       42469
2               410             239             270118502       42111

40GB_per_node
id              time            num_containers  mem             core            diag
0               761             239             525320249       82608
1               767             239             527612323       83271
2               736             239             520229980       82317

80GB_per_node
id              time            num_containers  mem             core            diag
0               1564            239             1123903845      173915
1               1666            239             1161079968      178656
2               1628            239             1146656912      175998

160GB_per_node
id              time            num_containers  mem             core            diag
0               3689            239             2523160230      377563
1               3796            240             2610411363      388928
2               3624            239             2546652697      381400

----- TEZ-0.7.0

10GB_per_node
id              time            num_containers  mem             core            diag
0               262             239             179373935       26223
1               259             239             179375665       25767
2               271             239             186946086       26516

20GB_per_node
id              time            num_containers  mem             core            diag
0               572             239             380034060       55515
1               533             239             364082337       53555
2               515             239             356570788       52762

40GB_per_node
id              time            num_containers  mem             core            diag
0               1405            339             953706595       136624          Vertex re-running
1               1157            239             828765079       118293
2               1219            239             833052604       118151

80GB_per_node
id              time            num_containers  mem             core            diag
0               3046            361             1999047193      279635          Vertex re-running
1               2967            337             2079807505      290171          Vertex re-running
2               3138            355             2030176406      282875          Vertex re-running

160GB_per_node
id              time            num_containers  mem             core            diag
0               6832            436             4524472859      634518          Vertex re-running
1               6233            365             4123693672      573259          Vertex re-running
2               6133            379             4121812899      579044          Vertex re-running
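
To make the gap easier to read, the run times in the tables can be reduced to one slowdown ratio per input size. The following is a rough sketch, not part of the original measurements: the `time` values are copied from the tables in this message, the function name `slowdown` is hypothetical, and averaging the three runs per size is an assumption about how to summarize them. The message does not state the unit of `time`.

```python
from statistics import mean

# "time" column from the tables in this message, keyed by GB of input per node.
# Each list holds the three runs (id 0, 1, 2). The unit is not stated in the message.
TEZ_062 = {
    10: [212, 204, 211],
    20: [392, 402, 410],
    40: [761, 767, 736],
    80: [1564, 1666, 1628],
    160: [3689, 3796, 3624],
}
TEZ_070 = {
    10: [262, 259, 271],
    20: [572, 533, 515],
    40: [1405, 1157, 1219],
    80: [3046, 2967, 3138],
    160: [6832, 6233, 6133],
}


def slowdown(baseline, candidate):
    """Return {size: mean(candidate) / mean(baseline)} for sizes present in both."""
    return {
        size: mean(candidate[size]) / mean(baseline[size])
        for size in sorted(baseline)
        if size in candidate
    }


ratios = slowdown(TEZ_062, TEZ_070)
for size, ratio in ratios.items():
    print(f"{size:>4}GB_per_node  Tez 0.7.0 / Tez 0.6.2 = {ratio:.2f}")
```

A ratio above 1 means Tez 0.7.0 took longer on average for that input size. Using the mean hides run-to-run variation, so the per-run rows remain the primary data.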
