From Stephen Sprague <sprag...@gmail.com>
Subject Re: Tez GC issues perhaps? not sure.
Date Wed, 14 Dec 2016 23:05:12 GMT
first pass:
   1. changing yarn.timeline-service.ttl-enable to false didn't seem to work.
i restarted the timeline server (TLS), HS2 and RM, and the query still stuck around.

   2. figured i'd try using RollingLevelDbTimelineStore but got class not
found so i'll dig around for that later today.
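
One way to hunt for the jar that ships the missing class is to scan a directory tree of jars for the matching .class entry. This is only a minimal sketch: the helper name jars_containing is made up for illustration, and the directory to scan is an assumption that depends on where the Hadoop install keeps its jars.

```python
import os
import zipfile


def jars_containing(root, class_name):
    """Return sorted paths of .jar files under root that contain class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable or corrupt archives rather than aborting the scan
                continue
    return sorted(hits)
```

Calling jars_containing(some_hadoop_lib_dir, "org.apache.hadoop.yarn.server.timeline.RollingLevelDbTimelineStore") would list candidate jars; an empty list would suggest the installed Hadoop build does not ship that class at all.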


current settings for "yarn.timeline-service.*" vars are now this:

yarn.timeline-service.address=${yarn.timeline-service.hostname}:10200
yarn.timeline-service.client.max-retries=30
yarn.timeline-service.client.retry-interval-ms=1000
yarn.timeline-service.enabled=true
yarn.timeline-service.handler-thread-count=10
yarn.timeline-service.hostname=XXXXX.sv2.trulia.com
yarn.timeline-service.http-authentication.simple.anonymous.allowed=true
yarn.timeline-service.http-authentication.type=simple
yarn.timeline-service.http-cross-origin.enabled=true
yarn.timeline-service.keytab=/etc/krb5.keytab
yarn.timeline-service.leveldb-timeline-store.path=${hadoop.tmp.dir}/yarn/timeline
yarn.timeline-service.leveldb-timeline-store.read-cache-size=104857600
yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size=10000
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size=10000
yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=300000
yarn.timeline-service.store-class=org.apache.hadoop.yarn.server.timeline.RollingLevelDbTimelineStore  <-- need to find jar with this class
yarn.timeline-service.ttl-enable=false  <-- change to false
yarn.timeline-service.ttl-ms=604800000  <-- one week?
yarn.timeline-service.webapp.address=XXXX.sv2.trulia.com:8188
yarn.timeline-service.webapp.https.address=${yarn.timeline-service.hostname}:8190
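
To sanity-check the "one week?" guess, the millisecond values from the settings above can be converted with a few lines of Python. This is just arithmetic on the numbers listed, not anything read from the cluster.

```python
from datetime import timedelta

# Values copied from the settings listed above
ttl_ms = 604_800_000          # yarn.timeline-service.ttl-ms
ttl_interval_ms = 300_000     # ...leveldb-timeline-store.ttl-interval-ms
read_cache_bytes = 104_857_600  # ...leveldb-timeline-store.read-cache-size

ttl = timedelta(milliseconds=ttl_ms)
ttl_interval = timedelta(milliseconds=ttl_interval_ms)
read_cache_mib = read_cache_bytes / (1024 * 1024)

print("ttl:", ttl)
print("ttl interval:", ttl_interval)
print("read cache MiB:", read_cache_mib)
```

So ttl-ms is exactly seven days, the TTL interval is five minutes, and the read cache is 100 MiB.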


looking at the stderr of that one container hanging around we have this
below.

2016-12-14 13:58:38 Running Dag: dag_1481520856023_2137_1
Dec 14, 2016 1:58:51 PM
com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider
get
WARNING: You are attempting to use a deprecated API (specifically,
attempting to @Inject ServletContext inside an eagerly created singleton.
While we allow this for backwards compatibility, be warned that this MAY
have unexpected behavior if you have more than one injector (with
ServletModule) running in the same JVM. Please consult the Guice
documentation at http://code.google.com/p/google-guice/wiki/Servlets for
more information.
2016-12-14 13:59:06 Completed Dag: dag_1481520856023_2137_1
2016-12-14 13:59:11 Running Dag: dag_1481520856023_2137_2
2016-12-14 13:59:19 Completed Dag: dag_1481520856023_2137_2
2016-12-14 13:59:25 Running Dag: dag_1481520856023_2137_3
2016-12-14 13:59:39 Completed Dag: dag_1481520856023_2137_3
2016-12-14 13:59:43 Running Dag: dag_1481520856023_2137_4
2016-12-14 13:59:54 Completed Dag: dag_1481520856023_2137_4
2016-12-14 13:59:56 Running Dag: dag_1481520856023_2137_5
2016-12-14 14:00:08 Completed Dag: dag_1481520856023_2137_5
2016-12-14 14:00:10 Running Dag: dag_1481520856023_2137_6
2016-12-14 14:03:21 Completed Dag: dag_1481520856023_2137_6
2016-12-14 14:03:26 Running Dag: dag_1481520856023_2137_7
2016-12-14 14:03:44 Completed Dag: dag_1481520856023_2137_7
2016-12-14 14:03:47 Running Dag: dag_1481520856023_2137_8
2016-12-14 14:04:04 Completed Dag: dag_1481520856023_2137_8
2016-12-14 14:04:11 Running Dag: dag_1481520856023_2137_9
2016-12-14 14:04:35 Completed Dag: dag_1481520856023_2137_9
2016-12-14 14:04:48 Running Dag: dag_1481520856023_2137_10
2016-12-14 14:04:54 Completed Dag: dag_1481520856023_2137_10
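
To see which DAG took longest, the Running/Completed pairs in that stderr can be turned into durations. The sketch below assumes the exact line format shown above and uses only two of the DAGs as sample input; the function name dag_durations is made up for illustration.

```python
import re
from datetime import datetime

# Sample lines copied from the stderr above (DAGs 5 and 6 only)
LOG = """\
2016-12-14 13:59:56 Running Dag: dag_1481520856023_2137_5
2016-12-14 14:00:08 Completed Dag: dag_1481520856023_2137_5
2016-12-14 14:00:10 Running Dag: dag_1481520856023_2137_6
2016-12-14 14:03:21 Completed Dag: dag_1481520856023_2137_6
"""

LINE = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) (Running|Completed) Dag: (\S+)$"
)


def dag_durations(text):
    """Map each DAG id to the seconds between its Running and Completed lines."""
    started, durations = {}, {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # ignore unrelated lines such as the Guice warning
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        state, dag = m.group(2), m.group(3)
        if state == "Running":
            started[dag] = ts
        elif dag in started:
            durations[dag] = (ts - started[dag]).total_seconds()
    return durations


durations = dag_durations(LOG)
for dag, secs in durations.items():
    print(dag, secs)
```

On these sample lines, DAG 5 comes out at 12 seconds and DAG 6 at 191 seconds.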


and this is stdout:

* spragues@dwrdevdn27:~$ sudo ls -l
/storage6/hadoop/yarn/logs/application_1481520856023_2137/container_1481520856023_2137_01_000001/stdout
-rw-rw-r-- 1 yarn yarn 655355 Dec 14 14:54
/storage6/hadoop/yarn/logs/application_1481520856023_2137/container_1481520856023_2137_01_000001/stdout

* spragues@dwrdevdn27:~$ sudo tail
/storage6/hadoop/yarn/logs/application_1481520856023_2137/container_1481520856023_2137_01_000001/stdout
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Free CSet: 0.4 ms]
   [Eden: 215.0M(215.0M)->0.0B(157.0M) Survivors: 3072.0K->28.0M Heap:
331.0M(373.0M)->149.4M(373.0M)]
 [Times: user=0.11 sys=0.00, real=0.03 secs]
Heap
 garbage-first heap   total 381952K, used 229430K [0x00000000ccc00000,
0x00000000e4100000, 0x0000000100000000)
  region size 1024K, 102 young (104448K), 28 survivors (28672K)
 Metaspace       used 52193K, capacity 52928K, committed 52952K, reserved
1095680K
  class space    used 5696K, capacity 5868K, committed 5888K, reserved
1048576K
* spragues@dwrdevdn27:~$ sudo tail -30
/storage6/hadoop/yarn/logs/application_1481520856023_2137/container_1481520856023_2137_01_000001/stdout
3352.102: [GC pause (G1 Evacuation Pause) (young), 0.0283240 secs]
   [Parallel Time: 5.7 ms, GC Workers: 18]
      [GC Worker Start (ms): Min: 3352102.4, Avg: 3352102.5, Max:
3352102.6, Diff: 0.2]
      [Ext Root Scanning (ms): Min: 0.8, Avg: 1.0, Max: 2.2, Diff: 1.5,
Sum: 17.7]
      [Update RS (ms): Min: 1.1, Avg: 2.7, Max: 4.4, Diff: 3.2, Sum: 49.0]
         [Processed Buffers: Min: 1, Avg: 3.1, Max: 10, Diff: 9, Sum: 56]
      [Scan RS (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.9]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.2, Diff: 0.2,
Sum: 0.5]
      [Object Copy (ms): Min: 0.1, Avg: 1.2, Max: 1.5, Diff: 1.5, Sum: 21.4]
      [Termination (ms): Min: 0.0, Avg: 0.5, Max: 0.6, Diff: 0.6, Sum: 8.4]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum:
0.6]
      [GC Worker Total (ms): Min: 5.3, Avg: 5.5, Max: 5.6, Diff: 0.3, Sum:
98.5]
      [GC Worker End (ms): Min: 3352107.9, Avg: 3352108.0, Max: 3352108.0,
Diff: 0.1]
   [Code Root Fixup: 0.4 ms]
   [Code Root Migration: 0.8 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.4 ms]
   [Other: 21.0 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 20.2 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.1 ms]
      [Free CSet: 0.4 ms]
   [Eden: 215.0M(215.0M)->0.0B(157.0M) Survivors: 3072.0K->28.0M Heap:
331.0M(373.0M)->149.4M(373.0M)]
 [Times: user=0.11 sys=0.00, real=0.03 secs]
Heap
 garbage-first heap   total 381952K, used 229430K [0x00000000ccc00000,
0x00000000e4100000, 0x0000000100000000)
  region size 1024K, 102 young (104448K), 28 survivors (28672K)
 Metaspace       used 52193K, capacity 52928K, committed 52952K, reserved
1095680K
  class space    used 5696K, capacity 5868K, committed 5888K, reserved
1048576K
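
For reading numbers out of a G1 pause record like the one above, a rough sketch follows. It assumes the log format shown in this stdout with the e-mail line wrapping undone, and the regexes are illustrative only, not a general GC log parser.

```python
import re

# Two lines copied from the stdout above, with the wrapped Heap line rejoined
GC_LOG = """\
3352.102: [GC pause (G1 Evacuation Pause) (young), 0.0283240 secs]
   [Eden: 215.0M(215.0M)->0.0B(157.0M) Survivors: 3072.0K->28.0M Heap: 331.0M(373.0M)->149.4M(373.0M)]
"""

UNIT = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

# JVM uptime in seconds, then the pause duration in seconds
PAUSE = re.compile(r"^(\d+\.\d+): \[GC pause .*?, (\d+\.\d+) secs\]", re.M)
# Heap occupancy before and after the collection (capacities are skipped)
HEAP = re.compile(r"Heap: ([\d.]+)([BKMG])\([\d.]+[BKMG]\)->([\d.]+)([BKMG])\(")


def to_bytes(value, unit):
    """Convert a value such as ('331.0', 'M') to bytes."""
    return float(value) * UNIT[unit]


pause = PAUSE.search(GC_LOG)
heap = HEAP.search(GC_LOG)

uptime_s = float(pause.group(1))
pause_ms = float(pause.group(2)) * 1000
heap_before = to_bytes(heap.group(1), heap.group(2))
heap_after = to_bytes(heap.group(3), heap.group(4))
freed_mib = (heap_before - heap_after) / UNIT["M"]

print(f"uptime {uptime_s / 60:.1f} min, pause {pause_ms:.1f} ms, "
      f"freed {freed_mib:.1f} MiB")
```

Running the same extraction over the whole stdout file, one pause record at a time, would show whether pause times or post-collection heap occupancy grow over the life of the container.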



So it definitely looks GC-related, yeah?  okay, onward to looking for that
RollingLevelDb class next...

Cheers,
Stephen.



On Wed, Dec 14, 2016 at 10:03 AM, Stephen Sprague <spragues@gmail.com>
wrote:

> Thanks Gopal.  I'll set the ttl flag to false and see what gives.
>
> Cheers,
> Stephen
>
> On Tue, Dec 13, 2016 at 10:48 PM, Gopal Vijayaraghavan <gopalv@apache.org>
> wrote:
>
>> > yarn.timeline-service.ttl-enable=true
>>
>> Let us validate that this is due to the TTL GC kicking in and disable the
>> TTL flag & leave it running for a day.
>>
>> Better to also verify the Tez logs of sessions hanging along waiting for
>> the ATS to collect events (look for the last _post log file in the AM logs
>> link).
>>
>> > you propose that setting that to "RollingLevelDbTimelineStore" might
>> fix the issue?
>>
>> Yes, but you would lose all the existing history, so not yet - but it
>> will be what you need to do to get out of the TTL.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>
