From Hitesh Shah <hit...@apache.org>
Subject Re: Tez Job fails - waiting for AM container to be allocated
Date Sat, 18 Jun 2016 01:38:38 GMT
-dev@tez for now.

Hello Anandha, 

The usual cause of this is a lack of resources, e.g. no cluster capacity to launch the AM,
queue configs not allowing another AM to launch, or the memory size configured for the AM
being too large to be scheduled on any existing node.
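
As a rough illustration of those three checks, here is a minimal sketch. It calls no YARN API: the function name and all four parameters are hypothetical, and the numbers would have to be read off the RM UI by hand (free memory per node, AM memory already used in the queue, and the queue's AM limit, which on the Capacity Scheduler is derived from yarn.scheduler.capacity.maximum-am-resource-percent). The AM size would typically come from tez.am.resource.memory.mb.

```python
def am_blockers(am_mb, node_free_mb, queue_am_used_mb, queue_am_limit_mb):
    """Return the reasons (possibly none) why an AM of am_mb MB may stay pending.

    All inputs are hand-collected figures in MB; nothing here queries YARN.
    """
    reasons = []
    if sum(node_free_mb) < am_mb:
        # The cluster as a whole has less free memory than the AM needs.
        reasons.append("no cluster capacity")
    elif max(node_free_mb, default=0) < am_mb:
        # Enough memory overall, but a container must fit on a single node.
        reasons.append("AM too large for any single node")
    if queue_am_used_mb + am_mb > queue_am_limit_mb:
        # The queue's share reserved for AMs would be exceeded.
        reasons.append("queue AM limit reached")
    return reasons
```

An empty list only means that none of these three simple conditions applies to the figures supplied; it does not prove that the scheduler will place the AM.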

Can you search for this string “1466115469995_0142” within the ResourceManager logs? That
should shed some more light on what is going on. 
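
If grep is not at hand, the same search can be sketched in a few lines of Python. The helper names are hypothetical, and the log path is deliberately left as a parameter because the ResourceManager log location depends on the installation.

```python
APP_ID = "1466115469995_0142"  # the ID string from the message above


def lines_mentioning(app_id, log_lines):
    """Return the lines that contain app_id, without trailing newlines."""
    return [line.rstrip("\n") for line in log_lines if app_id in line]


def search_log_file(path, app_id=APP_ID):
    """Search one log file; path is supplied by the caller, not assumed."""
    with open(path, errors="replace") as log:
        return lines_mentioning(app_id, log)
```

Calling search_log_file with the path of the active ResourceManager log returns every line for this application, in file order.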

thanks
— Hitesh 


> On Jun 17, 2016, at 6:30 PM, Anandha L Ranganathan <analog.sony@gmail.com> wrote:
> 
> Yes, sufficient resources are available for that job. No other jobs are running;
> this is the only job.
>  
> 
> 
> On Fri, Jun 17, 2016 at 5:16 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
> Please check RM UI whether you have sufficient resources for your app
> 
> 
> On Sat, Jun 18, 2016 at 7:35 AM, Anandha L Ranganathan <analog.sony@gmail.com> wrote:
> I am upgrading one of our clusters from HDP 2.2 to HDP 2.4.0.
> 
> 
> 
> The status I see in the Application monitoring URL is
> 
> YARN Application Status: ACCEPTED: waiting for AM container to be
> allocated, launched and register with RM.  But when we submit an MR job,
> it runs fine.
> 
> It waits in that state for some time (300 seconds), then dies, and the service
> check fails.  All nodes are live and in Active status.
> 
> 
> 
> We tried to run the job manually, and the job stops at this point.
> 
> hadoop --config /usr/hdp/2.4.0.0-169/hadoop/conf jar
> /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount
> /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput1/
> WARNING: Use "yarn jar" to launch YARN applications.
> 16/06/17 19:04:47 INFO client.TezClient: Tez Client Version: [
> component=tez-api, version=0.7.0.2.4.0.0-169,
> revision=3c1431f45faaca982ecc8dad13a107787b834696,
> SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git,
> buildTime=20160210-0711 ]
> 16/06/17 19:04:47 INFO impl.TimelineClientImpl: Timeline service
> address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> 16/06/17 19:04:48 INFO client.RMProxy: Connecting to ResourceManager at
> usw2stdpma03.glassdoor.local/172.17.212.107:8050
> 16/06/17 19:04:48 INFO client.TezClient: Using
> org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to
> manage Timeline ACLs
> 16/06/17 19:04:48 INFO impl.TimelineClientImpl: Timeline service
> address: http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> 16/06/17 19:04:49 INFO examples.OrderedWordCount: Running OrderedWordCount
> 16/06/17 19:04:49 INFO client.TezClient: Submitting DAG application
> with id: application_1466115469995_0142
> 16/06/17 19:04:49 INFO client.TezClientUtils: Using tez.lib.uris value
> from configuration: /hdp/apps/2.4.0.0-169/tez/tez.tar.gz
> 16/06/17 19:04:49 INFO client.TezClient: Stage directory
> /tmp/root/staging doesn't exist and is created
> 16/06/17 19:04:49 INFO client.TezClient: Tez system stage directory
> hdfs://dfs-nameservices/tmp/root/staging/.tez/application_1466115469995_0142
> doesn't exist and is created
> 16/06/17 19:04:49 INFO acls.ATSHistoryACLPolicyManager: Created
> Timeline Domain for History ACLs,
> domainId=Tez_ATS_application_1466115469995_0142
> 16/06/17 19:04:50 INFO client.TezClient: Submitting DAG to YARN,
> applicationId=application_1466115469995_0142,
> dagName=OrderedWordCount, callerContext={ context=TezExamples,
> callerType=null, callerId=null }
> 16/06/17 19:04:50 INFO impl.YarnClientImpl: Submitted application
> application_1466115469995_0142
> 16/06/17 19:04:50 INFO client.TezClient: The url to track the Tez AM:
> http://usw2stdpma03.glassdoor.local:8088/proxy/application_1466115469995_0142/
> 16/06/17 19:04:50 INFO impl.TimelineClientImpl: Timeline service address:
> http://usw2stdpma03.glassdoor.local:8188/ws/v1/timeline/
> 16/06/17 19:04:50 INFO client.RMProxy: Connecting to ResourceManager at
> usw2stdpma03.glassdoor.local/172.17.212.107:8050
> 16/06/17 19:04:51 INFO client.DAGClientImpl: Waiting for DAG to start running
> 
> 
> 
> How do I fix this problem?
> 
> Thanks
> Anand
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang
> 

