From: Johannes Zillmann <jzillm...@googlemail.com>
Subject: Re: container fails to start with malloc error
Date: Thu, 16 Apr 2015 08:44:47 GMT

Hi Hitesh,

I will check the memory situation!
Java version is:
  java version "1.7.0_76"
  Java(TM) SE Runtime Environment (build 1.7.0_76-tdc1-b13)
  Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

I don’t think it’s necessarily the same Java version that was used to compile (Hadoop, Tez, Datameer?)

Johannes 
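
On the question of whether the compile-time and runtime JDK versions match: one way to check is to read the version field in a class-file header. Below is a minimal sketch in Python, not something taken from this thread; the function name `class_file_target` and the `MAJOR_TO_JAVA` table are illustrative. It assumes the standard class-file layout (4-byte magic 0xCAFEBABE, then a 2-byte minor and a 2-byte major version), where major version 51 corresponds to Java 7. Note that this field records the target level the compiler was asked to emit, which is not necessarily the JDK release that did the compiling.

```python
import struct

# Class-file major versions for a few Java releases.
MAJOR_TO_JAVA = {49: "5", 50: "6", 51: "7", 52: "8"}


def class_file_target(header: bytes) -> str:
    """Return the Java release a class file targets, given its first 8 bytes."""
    # Big-endian: u4 magic, u2 minor_version, u2 major_version.
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return MAJOR_TO_JAVA.get(major, f"unknown (major version {major})")


# Hand-built sample header: magic, minor 0, major 0x33 (51).
sample = bytes.fromhex("cafebabe00000033")
print(class_file_target(sample))
```

In practice the 8 bytes would come from a .class entry inside the jar in question (for example via Python's zipfile module) rather than from a hand-built sample.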


> On 15 Apr 2015, at 18:59, Hitesh Shah <hitesh@apache.org> wrote:
> 
> Hi Johannes 
> 
> Not sure if anyone has seen this earlier. Do you know if the machines have enough memory to run the number of tasks/containers that you are launching? Also, I am assuming that you are compiling and running against the same JDK version?
> 
> Would you mind sharing which Java version you are running?
> 
> — Hitesh
> 
> On Apr 14, 2015, at 1:19 AM, Johannes Zillmann <jzillmann@googlemail.com> wrote:
> 
>> Hey guys,
>> 
>> In a customer environment, certain Tez jobs fail to start.
>> 
>> On the client side it looks like:
>> ——————————————————————————
>> INFO [2015-04-08 15:19:30.213] [MrPlanRunnerV2] (YarnClientImpl.java:204) - Submitted application application_1428177121154_0065
>> INFO [2015-04-08 15:19:30.214] [MrPlanRunnerV2] (TezClient.java:357) - The url to track the Tez Session: http://master:8088/proxy/application_1428177121154_0065/
>> INFO [2015-04-08 15:19:33.219] [MrPlanRunnerV2] (TezClient.java:556) - App did not succeed. Diagnostics: Application application_1428177121154_0065 failed 2 times due to AM Container for appattempt_1428177121154_0065_000002 exited with  exitCode: 134 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 1: 10818 Aborted                 (core dumped) /opt/teradata/jvm64/jdk7/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dapple.awt.UIElement=true -Djava.awt.headless=true -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session > /data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_000001/stdout 2> /data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_000001/stderr
>> 
>> org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 1: 10818 Aborted                 (core dumped) /opt/teradata/jvm64/jdk7/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dapple.awt.UIElement=true -Djava.awt.headless=true -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session > /data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_000001/stdout 2> /data3/hadoop/yarn/log/application_1428177121154_0065/container_1428177121154_0065_02_000001/stderr
>> 
>> 	at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
>> 	at org.apache.hadoop.util.Shell.run(Shell.java:418)
>> 	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
>> 	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
>> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
>> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> 	at java.lang.Thread.run(Thread.java:745)
>> 
>> 
>> Container exited with a non-zero exit code 134
>> .Failing this attempt.. Failing the application.
>> ——————————————————————————
>> 
>> 
>> And this is what the logs show for the task:
>> ——————————————————————————
>> Log Type: stderr
>> Log Length: 429
>> java: malloc.c:3090: sYSMALLOc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 * (sizeof(size_t))) - 1)) & ~((2 * (sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long)old_end & pagemask) == 0)' failed.
>> 
>> Log Type: stdout
>> Log Length: 0
>> ——————————————————————————
>> 
>> Any ideas?
>> 
>> Johannes
> 
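
For reference, the exit code 134 in the diagnostics is consistent with the "Aborted (core dumped)" message: shells conventionally report death by signal as 128 plus the signal number, and signal 6 on Linux is SIGABRT, which is what a failed glibc assertion raises via abort(). A small sketch of that decoding follows; the helper name `describe_exit_code` and the signal table are illustrative and not from the thread, and the signal numbers assume Linux.

```python
# A few common signal numbers on Linux (assumed platform).
LINUX_SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status; values above 128 mean 128 + signal."""
    if code > 128:
        signum = code - 128
        name = LINUX_SIGNALS.get(signum, "unknown signal")
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


print(describe_exit_code(134))
```

This only says how the JVM process died, not why the malloc assertion fired in the first place.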

