From Stephen Sprague <sprag...@gmail.com>
Subject Re: Tez GC issues perhaps? not sure.
Date Thu, 15 Dec 2016 01:20:52 GMT
ah.

2016-12-14 14:05:07,855 [WARN] [AMShutdownThread] |ats.ATSHistoryLoggingService|: ATSService being stopped, eventQueueBacklog=14820, maxTimeLeftToFlush=-1, waitForever=true
2016-12-14 14:05:37,877 [ERROR] [AMShutdownThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing

so it looks like something is wonky with the timeline service.

and yet:


$ ps -ef | grep timeline
spragues 14326 19414 99 16:43 pts/1    00:02:02
/usr/lib/jvm/java-8-oracle/jre/bin/java -Dproc_timelineserver -Xmx1000m
-Dhadoop.log.dir=/usr/lib/hadoop-yarn/logs
-Dyarn.log.dir=/usr/lib/hadoop-yarn/logs -Dhadoop.log.file=yarn.log
-Dyarn.log.file=yarn.log -Dyarn.home.dir= -Dyarn.id.str=
-Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console
-Djava.library.path=/usr/lib/hadoop/lib/native
-Dyarn.policy.file=hadoop-policy.xml
-Dhadoop.log.dir=/usr/lib/hadoop-yarn/logs
-Dyarn.log.dir=/usr/lib/hadoop-yarn/logs -Dhadoop.log.file=yarn.log
-Dyarn.log.file=yarn.log -Dyarn.home.dir=/usr/lib/hadoop-yarn
-Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,console
-Dyarn.root.logger=INFO,console
-Djava.library.path=/usr/lib/hadoop/lib/native -classpath
/etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/usr/lib/apache-tez-0.8.4-bin/conf:/usr/lib/apache-tez-0.8.4-bin/*:/usr/lib/apache-tez-0.8.4-bin/lib/*:/opt/pepperdata/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/timelineserver-config/log4j.properties
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer


$ sudo netstat -lanp | grep 14326 | grep LISTEN
tcp        0      0 172.19.73.136:10200     0.0.0.0:*     LISTEN      14326/java
tcp        0      0 172.19.73.136:8188      0.0.0.0:*     LISTEN      14326/java


so i'm pretty sure it's up and running.
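
A LISTEN socket only shows that the server is bound on this host; it says nothing about whether the node running the Tez AM can actually reach it. One way to check that from the DN is a plain TCP connect test. This is a minimal sketch, not part of the original troubleshooting: the helper name `can_connect` and the 3-second timeout are my own choices.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only proves the port is
    reachable, not that the timeline service answers requests correctly.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run on the DN, `can_connect("dwrdevnn1.sv2.trulia.com", 8188)` and the same call with port 10200 would show whether the configured host name resolves and whether both listening ports are reachable from there.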


ran the test tez job again just now and looked at a syslog file on the DN.
found this again.

spragues@dwrdevdn13:~$ sudo cat
/storage7/hadoop/yarn/logs/application_1481520856023_2250/container_1481520856023_2250_01_000001/syslog

2016-12-14 16:46:21,177 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
2016-12-14 16:46:21,178 [WARN] [HistoryEventHandlingThread] |ats.ATSHistoryLoggingService|: Could not handle history events
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
        at java.lang.Thread.run(Thread.java:745)
2016-12-14 16:46:21,541 [ERROR] [HistoryEventHandlingThread] |impl.TimelineClientImpl|: Failed to get the response from the timeline server.
2016-12-14 16:46:21,541 [WARN] [HistoryEventHandlingThread] |ats.ATSHistoryLoggingService|: Could not handle history events
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server.
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:339)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:301)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:357)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
        at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
        at java.lang.Thread.run(Thread.java:745)



mapreduce.job.emit-timeline-data=false
yarn.timeline-service.address=${yarn.timeline-service.hostname}:10200
yarn.timeline-service.client.max-retries=30
yarn.timeline-service.client.retry-interval-ms=1000
yarn.timeline-service.enabled=true
yarn.timeline-service.handler-thread-count=10
yarn.timeline-service.hostname=dwrdevnn1.sv2.trulia.com
yarn.timeline-service.http-authentication.simple.anonymous.allowed=true
yarn.timeline-service.http-authentication.type=simple
yarn.timeline-service.http-cross-origin.enabled=true
yarn.timeline-service.keytab=/etc/krb5.keytab
yarn.timeline-service.leveldb-timeline-store.path=${hadoop.tmp.dir}/yarn/timeline
yarn.timeline-service.leveldb-timeline-store.read-cache-size=104857600
yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size=10000
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size=10000
yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=300000
yarn.timeline-service.store-class=org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore
yarn.timeline-service.ttl-enable=false
yarn.timeline-service.ttl-ms=604800000
yarn.timeline-service.webapp.address=dwrdevnn1.sv2.trulia.com:8188
yarn.timeline-service.webapp.https.address=${yarn.timeline-service.hostname}:8190
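
Several of the values above use `${...}` references, so the address a client ends up with depends on how those expand. The sketch below is a simplified illustration of that substitution, not Hadoop's own code: `parse_props` and `expand_props` are hypothetical helper names, only other keys in the same dump are consulted (Hadoop can also fall back to system properties), and names that cannot be resolved are left untouched.

```python
import re
from typing import Dict

# Matches ${some.property.name}
_VAR = re.compile(r"\$\{([^}]+)\}")


def parse_props(text: str) -> Dict[str, str]:
    """Parse key=value lines such as the dump above into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def expand_props(props: Dict[str, str], max_depth: int = 20) -> Dict[str, str]:
    """Expand ${name} references against other keys in `props`.

    Simplified sketch: unknown names stay as-is, and expansion stops
    after `max_depth` nested substitutions to avoid looping on cycles.
    """
    def resolve(value: str, depth: int = 0) -> str:
        if depth >= max_depth:
            return value

        def substitute(match):
            name = match.group(1)
            if name in props:
                return resolve(props[name], depth + 1)
            return match.group(0)  # leave unresolved references untouched

        return _VAR.sub(substitute, value)

    return {key: resolve(value) for key, value in props.items()}
```

Feeding the dump above through `expand_props(parse_props(...))` gives `yarn.timeline-service.address` as `dwrdevnn1.sv2.trulia.com:10200`, which is the host name worth comparing against the IP the timeline server is actually bound to.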



I think I must be missing something obvious, but if the timeline service is
running and Tez is using ATSHistoryLoggingService, one would think it would
work, no?

thanks again for your help!

Cheers,
Stephen.



On Wed, Dec 14, 2016 at 4:23 PM, Gopal Vijayaraghavan <gopalv@apache.org>
wrote:

>
> > looking at the stderr of that one container hanging around we have this
> below.
>
> Look in the syslog for a log line which starts with
>
> ATSService being stopped, eventQueueBacklog=<number>…, waitForever=true
>
> Cheers,
> Gopal
>
