From Sungwoo Park <glap...@gmail.com>
Subject Re: Question on Tez 0.6 and Tez 0.7
Date Mon, 13 Jun 2016 12:00:33 GMT
What we have found is that mapred-site.xml affects the configuration of
Tez 0.6 and Tez 0.7 differently.

Here is what we tried:

1. We installed HDP 2.4.2.0-258.

2. In the directory /usr/hdp/2.4.2.0-258/hadoop/etc/hadoop, there is
mapred-site.xml. The default value for mapreduce.task.io.sort.mb is 2047.

    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>2047</value>
    </property>

3. In our local installation of Tez, there is tez-site.xml. The default
value for tez.runtime.io.sort.mb is 1843.

    <property>
      <name>tez.runtime.io.sort.mb</name>
      <value>1843</value>
    </property>

4. When we run TeraSort from hadoop-mapreduce 2.7.1, Tez 0.6 uses the value
of 1843 from tez-site.xml for tez.runtime.io.sort.mb in the user payload of
the ProcessorDescriptor. However, Tez 0.7 somehow uses 2047 from
mapred-site.xml instead. For example, setting mapreduce.task.io.sort.mb to
3000 changes tez.runtime.io.sort.mb to 3000 on Tez 0.7, whereas it has no
effect on Tez 0.6.
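
To make what we suspect concrete, here is a minimal sketch (not the actual
Tez client code) of how a client-side step that copies
mapreduce.task.io.sort.mb over tez.runtime.io.sort.mb after tez-site.xml is
read would make the mapred-site.xml value win. The class name, the file
paths, and the translation step itself are only illustrative assumptions;
the Hadoop Configuration calls are standard.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Illustration of the suspected override, not the real client code.
    public class SortMbOverrideSketch {
      public static void main(String[] args) {
        Configuration conf = new Configuration(false);

        // tez-site.xml sets tez.runtime.io.sort.mb = 1843 (path may differ).
        conf.addResource(new Path("tez-site.xml"));
        System.out.println("from tez-site.xml: "
            + conf.getInt("tez.runtime.io.sort.mb", -1));          // 1843

        // mapred-site.xml sets mapreduce.task.io.sort.mb = 2047.
        conf.addResource(new Path(
            "/usr/hdp/2.4.2.0-258/hadoop/etc/hadoop/mapred-site.xml"));
        int mrSortMb = conf.getInt("mapreduce.task.io.sort.mb", -1);

        // Hypothetical translation step, applied after tez-site.xml is read.
        conf.setInt("tez.runtime.io.sort.mb", mrSortMb);

        System.out.println("effective value: "
            + conf.getInt("tez.runtime.io.sort.mb", -1));          // 2047
      }
    }

If the 0.7 client performs something like this translation and the 0.6
client does not, it would explain why only 0.7 follows mapred-site.xml.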

I guess some key-value pairs from mapred-site.xml interfere with Tez 0.7
(but not with Tez 0.6). I say this because setting tez.runtime.io.sort.mb
to 1843 makes no significant difference in execution time on Tez 0.7 (it
still runs slowly). In conclusion, I think that Tez 0.7 itself runs fast,
but there might be some bug in the Tez 0.7 client in handling MapReduce
jobs such as TeraSort.
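
One way to test this hypothesis (a sketch, under the assumption that values
set explicitly in the job Configuration survive whatever translation the
0.7 client performs) is to pin both keys to the same value before
submitting TeraSort; the wrapper class below is ours, while TeraSort and
ToolRunner come from hadoop-mapreduce 2.7.1:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.examples.terasort.TeraSort;
    import org.apache.hadoop.util.ToolRunner;

    // Pin the sort buffer on both the MR key and the Tez key so that any
    // translation between them cannot silently change the value.
    public class TeraSortPinnedSortMb {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.task.io.sort.mb", 1843);
        conf.setInt("tez.runtime.io.sort.mb", 1843);
        System.exit(ToolRunner.run(conf, new TeraSort(), args));
      }
    }

If 0.7 then behaves like 0.6, the difference really is in how the client
merges mapred-site.xml into the runtime configuration.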

Let me try to run TeraSort on Tez 0.7 without using mapred-site.xml.
(Currently it fails with a memory error.)

--- Sungwoo

On Mon, Jun 13, 2016 at 12:56 PM, Rajesh Balamohan <
rajesh.balamohan@gmail.com> wrote:

> Are the runtime configs identical between 0.6 and 0.7? E.g., do you see any
> changes in tez.runtime.io.sort.mb between 0.6 and 0.7?
>
> On Mon, Jun 13, 2016 at 9:20 AM, Sungwoo Park <glapark@gmail.com> wrote:
>
>> Yes, we also think that that's what is happening. When there is 'Vertex
>> re-running', we see Exceptions like:
>>
>> 2016-05-13 09:45:28,280 WARN [ResponseProcessor for block
>> BP-358076261-10.1.91.4-1462195174702:blk_1075135562_1396252]
>> hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block
>> BP-358076261-10.1.91.4-14621951747\
>> 02:blk_1075135562_1396252
>> java.net.SocketTimeoutException: 65000 millis timeout while waiting for
>> channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/10.1.91.11:37025
>> remote=/10.1.91.11:50010]
>>   at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>>   at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>   at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>>   at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
>>   at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>   at java.io.FilterInputStream.read(FilterInputStream.java:83)
>>   at
>> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2201)
>>   at
>> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
>>   at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
>> 2016-05-13 09:45:28,283 INFO [TezChild] task.TezTaskRunner: Encounted an
>> error while executing task: attempt_1462251281999_0859_1_01_000063_0
>> java.io.IOException: All datanodes 10.1.91.11:50010 are bad. Aborting...
>>   at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1206)
>>   at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1004)
>>   at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:548)
>>
>> However, this still does not explain why Tez 0.7 runs much slower than
>> Tez 0.6.2 even when no such exception arises. As an example, see the
>> summary of results below. Tez 0.7 takes more than 1100 seconds to sort
>> 40GB per node (with no Vertex re-running and no exception other than at
>> the end of the run), whereas Tez 0.6.2 finishes in just under 800
>> seconds. These results were obtained from the same cluster, using the
>> default settings from HDP 2.4 (installed with Ambari), with very few
>> differences in configuration settings.
>>
>> I believe that we missed some important settings for Tez 0.7 (because we
>> did not install Hadoop and HDFS manually). Let me try the same experiment
>> in other clusters and see what happens.
>>
>> Thanks for the response,
>>
>> --- Sungwoo
>>
>> tez-0.6.2-40GB_10nodes
>> id              time            num_containers  mem             core            diag
>> 0               761             239             525320249       82608
>>
>> 1               767             239             527612323       83271
>>
>> 2               736             239             520229980       82317
>>
>>
>>
>> tez-0.7.0-40GB_10nodes
>>
>> id              time            num_containers  mem             core            diag
>> 0               1405            339             953706595       136624          Vertex re-running, vertexName=initialmap, vertexId=vertex_1462251281999_0859_1_00
>> 1               1157            239             828765079       118293
>>
>> 2               1219            239             833052604       118151
>>
>>
>>
>> On Mon, Jun 13, 2016 at 6:21 AM, Bikas Saha <bikas@apache.org> wrote:
>>
>>> From vertex re-running messages it seems like you may be seeing errors
>>> in fetching shuffle data to reducers from mappers. This results in
>>> re-running map vertex tasks to produce only the missing data – with the
>>> corresponding vertex re-running message.
>>>
>>>
>>>
>>> Bikas
>>>
>>>
>>>
>>> *From:* Rajesh Balamohan [mailto:rajesh.balamohan@gmail.com]
>>> *Sent:* Saturday, June 11, 2016 5:14 AM
>>> *To:* user@tez.apache.org
>>> *Subject:* Re: Question on Tez 0.6 and Tez 0.7
>>>
>>>
>>>
>>> Can you share the exceptions that happened in the Tez 0.7 runs?
>>>
>>> -Rajesh.B
>>>
>>> On 11-Jun-2016 16:28, "Sungwoo Park" <glapark@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>>
>>>
>>> I have a question about the performance difference between Tez 0.6.2 and
>>> Tez 0.7.0.
>>>
>>>
>>>
>>> This is what we did:
>>>
>>>
>>>
>>> 1. Installed HDP 2.4 on a 10-node cluster with default settings. No
>>> other particular changes were made to the default settings recommended
>>> by HDP 2.4.
>>>
>>>
>>>
>>> 2. Ran TeraSort using Tez 0.6.2 and Tez 0.7.0, and compared the running
>>> time.
>>>
>>>
>>>
>>> Each experiment specifies the amount of input data per node. For
>>> example, 10GB_per_node means a total of 100GB of input because there
>>> are 10 data nodes in the cluster.
>>>
>>>
>>>
>>> We've found that Tez 0.7.0 runs consistently slower than Tez 0.6.2,
>>> producing 'Vertex re-running' errors quite often when the size of input
>>> data per node is over 40GB. Even when there is no 'Vertex re-running',
>>> Tez 0.7.0 took much longer than Tez 0.6.2.
>>>
>>>
>>>
>>> We know that Tez 0.7.0 is not inherently slower than Tez 0.6.2, because
>>> on a cluster of 44 nodes (with only 24GB memory per node), Tez 0.7.0
>>> finished TeraSort almost as fast as Tez 0.6.2. We are trying to figure
>>> out what we missed in the experiments on the 11-node cluster.
>>>
>>>
>>>
>>> Any help here would be appreciated. Thanks a lot.
>>>
>>>
>>>
>>> Sungwoo Park
>>>
>>>
>>>
>>> ----- Configuration
>>>
>>>
>>>
>>> HDP 2.4
>>>
>>> 11 nodes, 10 data nodes, each with 96GB memory, 6 x 500GB HDDs
>>>
>>> same HDFS, Yarn, MR
>>>
>>>
>>>
>>> Each mapper container uses 5GB.
>>>
>>> Each reducer container uses 10GB.
>>>
>>>
>>>
>>> Configurations specific to tez-0.6.0
>>>
>>> tez.runtime.sort.threads = 2
>>>
>>>
>>>
>>> Configurations specific to tez-0.7.0
>>>
>>> tez.grouping.max-size = 1073741824
>>>
>>> tez.runtime.sorter.class = PIPELINED
>>>
>>> tez.runtime.pipelined.sorter.sort.threads = 2
>>>
>>>
>>>
>>> ----- TEZ-0.6.2
>>>
>>>
>>>
>>> 10GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               212             239             144695261       21873
>>>
>>> 1               204             239             139582665       20945
>>>
>>> 2               211             239             143477178       21700
>>>
>>>
>>>
>>> 20GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               392             239             272528515       42367
>>>
>>> 1               402             239             273085026       42469
>>>
>>> 2               410             239             270118502       42111
>>>
>>>
>>>
>>> 40GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               761             239             525320249       82608
>>>
>>> 1               767             239             527612323       83271
>>>
>>> 2               736             239             520229980       82317
>>>
>>>
>>>
>>> 80GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               1564            239             1123903845      173915
>>>
>>> 1               1666            239             1161079968      178656
>>>
>>> 2               1628            239             1146656912      175998
>>>
>>>
>>>
>>> 160GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               3689            239             2523160230      377563
>>>
>>> 1               3796            240             2610411363      388928
>>>
>>> 2               3624            239             2546652697      381400
>>>
>>>
>>>
>>> ----- TEZ-0.7.0
>>>
>>>
>>>
>>> 10GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               262             239             179373935       26223
>>>
>>> 1               259             239             179375665       25767
>>>
>>> 2               271             239             186946086       26516
>>>
>>>
>>>
>>> 20GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               572             239             380034060       55515
>>>
>>> 1               533             239             364082337       53555
>>>
>>> 2               515             239             356570788       52762
>>>
>>>
>>>
>>> 40GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               1405            339             953706595       136624          Vertex re-running
>>>
>>> 1               1157            239             828765079       118293
>>>
>>> 2               1219            239             833052604       118151
>>>
>>>
>>>
>>> 80GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               3046            361             1999047193      279635          Vertex re-running
>>>
>>> 1               2967            337             2079807505      290171          Vertex re-running
>>>
>>> 2               3138            355             2030176406      282875          Vertex re-running
>>>
>>>
>>>
>>> 160GB_per_node
>>>
>>> id              time            num_containers  mem             core            diag
>>>
>>> 0               6832            436             4524472859      634518          Vertex re-running
>>>
>>> 1               6233            365             4123693672      573259          Vertex re-running
>>>
>>> 2               6133            379             4121812899      579044          Vertex re-running
>>>
>>>
>>
>
>
> --
> ~Rajesh.B
>
