Mailing list archives

From "Jianfeng (Jeff) Zhang" <jzh...@hortonworks.com>
Subject Re: Tez Vertex Info analysis
Date Tue, 07 Apr 2015 00:54:57 GMT
<1> All the Tez APIs are based on Java. Could I get a C# version, or how can I implement
a C# version based on the Tez service?
        There’s no C# version of the Tez Java API. The Tez ATS service, however, is a REST API,
so a client for it could be written in C#.

<2> I am using the ATS API /ws/v1/timeline/TEZ_DAG_ID/ to get the DAG execution plan, and
I can also get the DAG status. Can this status be running? Or is it like the vertex info,
which is only pushed to ATS when the graph execution has finished? If so, we also
cannot get the DAG info dynamically while it is running?

        You can get the running status through this REST API, but it may take more work to parse
the JSON result. E.g. you can get the DAG status from /ws/v1/timeline/TEZ_DAG_ID, then get
the vertex info by parsing the JSON response, and then query /ws/v1/timeline/TEZ_VERTEX_ID
to get the vertex status. Tez-UI uses this REST API to get the DAG running status. But I am not
sure whether there’s a unified API for parsing the JSON response. I need to check that.
        BTW, what is your purpose for this? Is it for monitoring or performance diagnosis?
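
The two-step REST flow described above could be sketched roughly as follows in Python. This is only a sketch under assumptions: the host and port in ATS_BASE are placeholders, and the JSON field names ("otherinfo", "status", "relatedentities") are assumed from typical ATS entity responses and should be checked against what your ATS actually returns. The sample payload is made up for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

ATS_BASE = "http://timeline-host:8188"  # placeholder address, not from the thread


def entity_url(entity_type, entity_id):
    """Build the timeline URL for one entity, e.g. a DAG or a vertex."""
    return f"{ATS_BASE}/ws/v1/timeline/{entity_type}/{quote(entity_id)}"


def fetch_entity(entity_type, entity_id, timeout=10):
    """Fetch and decode one entity from ATS (needs a reachable server)."""
    with urlopen(entity_url(entity_type, entity_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def dag_status(dag_entity):
    """Assumed layout: status lives under "otherinfo"; None if absent."""
    return dag_entity.get("otherinfo", {}).get("status")


def vertex_ids(dag_entity):
    """Assumed layout: vertex IDs are listed under "relatedentities"."""
    return dag_entity.get("relatedentities", {}).get("TEZ_VERTEX_ID", [])


# Made-up response used instead of a live call.
sample_dag = json.loads("""
{
  "entitytype": "TEZ_DAG_ID",
  "entity": "dag_1428000000000_0001_1",
  "otherinfo": {"status": "RUNNING"},
  "relatedentities": {
    "TEZ_VERTEX_ID": ["vertex_1428000000000_0001_1_00",
                      "vertex_1428000000000_0001_1_01"]
  }
}
""")

print(dag_status(sample_dag))
for vid in vertex_ids(sample_dag):
    # Each of these URLs would be queried next to read the vertex status.
    print(entity_url("TEZ_VERTEX_ID", vid))
```

Against a real cluster, fetch_entity("TEZ_DAG_ID", dag_id) would replace the sample payload. The same requests and parsing could be written in C# with any HTTP and JSON library.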


Best Regards,
Jeff Zhang


From: "Joe Zhang (SDE)" <guizha@microsoft.com>
Reply-To: "user@tez.apache.org" <user@tez.apache.org>
Date: Saturday, April 4, 2015 at 12:14 PM
To: "user@tez.apache.org" <user@tez.apache.org>, Xiaoyong Zhu <xiaoyzhu@microsoft.com>
Cc: Yifung Lin <yifungl@microsoft.com>, HDInsight VS Tooling V-team <hdivstool@microsoft.com>
Subject: RE: Tez Vertex Info analysis

Hi Jianfeng:

<1> All the Tez APIs are based on Java. Could I get a C# version, or how can I implement
a C# version based on the Tez service?

<2>
>>>>>>>>>    Why does it not contain running tasks? How can I get the
number of running tasks and not-started tasks?

       If you get the data from ATS, it is not possible to get the number of running tasks,
because this data is pushed to ATS when the vertex completes, which means there are no running
tasks at that time.
       But you can use the Tez API to get the status of a running vertex, which includes the
number of running tasks.
 >>>>>>>>
I am using the ATS API /ws/v1/timeline/TEZ_DAG_ID/ to get the DAG execution plan, and I can
also get the DAG status. Can this status be running? Or is it like the vertex info, which is
only pushed to ATS when the graph execution has finished? If so, we also cannot
get the DAG info dynamically while it is running?

From: Jianfeng (Jeff) Zhang [mailto:jzhang@hortonworks.com]
Sent: Saturday, April 4, 2015 8:49 AM
To: Xiaoyong Zhu; user@tez.apache.org
Cc: Yifung Lin; HDInsight VS Tooling V-team
Subject: Re: Tez Vertex Info analysis


Hi Xiaoyong,

Here’s the javadoc link for the Tez API: http://tez.apache.org/releases/0.6.0/tez-api-javadocs/index.html
I would suggest checking the WordCount example to get started with the Tez API:
https://github.com/apache/tez/blob/master/tez-examples/src/main/java/org/apache/tez/examples/WordCount.java

The main flow is as follows:

  1.   Create a TezClient
  2.   Create a DAG
  3.   Use the TezClient to submit the DAG (which returns a DAGClient to you)
  4.   Query the DAGClient until the DAG is finished (here you can get the vertex status)
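
Step 4 is a poll-until-finished loop. The shape of that loop can be sketched in Python as below. This is not the Tez Java API: get_status is a hypothetical callable standing in for whatever returns the current DAG state (in Java it would be a query on the DAGClient), and the set of terminal state names is an assumption that should be checked against the DAGStatus javadoc.

```python
import time

# Assumed terminal DAG states; verify against the Tez javadoc.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "KILLED", "ERROR"}


def wait_for_dag(get_status, poll_seconds=1.0, max_polls=None, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal state.

    get_status is a hypothetical zero-argument callable returning a state name.
    max_polls bounds the number of queries; None means poll indefinitely.
    """
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in TERMINAL_STATES:
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"DAG still {status} after {polls} polls")
        sleep(poll_seconds)  # vertex status could be inspected here as well


# Simulated sequence of states instead of a real cluster.
states = iter(["SUBMITTED", "RUNNING", "RUNNING", "SUCCEEDED"])
final = wait_for_dag(lambda: next(states), sleep=lambda seconds: None)
print(final)
```

Passing the sleep function in as a parameter keeps the loop easy to exercise without real waiting.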

Best Regards,
Jeff Zhang


From: Xiaoyong Zhu <xiaoyzhu@microsoft.com>
Date: Friday, April 3, 2015 at 9:15 PM
To: Jianfeng Zhang <jzhang@hortonworks.com>, "user@tez.apache.org" <user@tez.apache.org>
Cc: Yifung Lin <yifungl@microsoft.com>, HDInsight VS Tooling V-team <hdivstool@microsoft.com>
Subject: RE: Tez Vertex Info analysis

Yes, we mean the ATS APIs.
What Tez API did you refer to? Are there additional REST APIs to get the Tez info directly?

Xiaoyong

From: Jianfeng (Jeff) Zhang [mailto:jzhang@hortonworks.com]
Sent: Friday, April 3, 2015 5:29 PM
To: user@tez.apache.org
Cc: Xiaoyong Zhu; Yifung Lin; HDInsight VS Tooling V-team
Subject: Re: Tez Vertex Info analysis


Hi Joe,

What do you mean by the Tez REST API? Do you mean you get this info through ATS (the Application
Timeline Service)?

<1> What does “numTasks” contain? What are the relationships among numTasks, numberCompletedTasks,
numKilledTasks, numFailedTasks, and numSucceededTasks?

        numTasks is the number of tasks this vertex should run. If you have some knowledge
of MapReduce, you can think of it as the number of tasks in the mapper/reducer.
        numberCompletedTasks is the number of tasks that have finished. There are 3 possible
end states for a task: Succeeded/Failed/Killed. So numberCompletedTasks should
equal the sum of numKilledTasks, numFailedTasks, and numSucceededTasks.
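
As a small illustration of that relationship, here is a Python sketch that checks the sum on a dictionary of counters. The counter names are the ones used in this thread; the values are made up, and the exact field names in a real ATS response may differ.

```python
def completed_tasks(counts):
    """Sum of the three end states: succeeded, failed and killed."""
    return (counts["numSucceededTasks"]
            + counts["numFailedTasks"]
            + counts["numKilledTasks"])


def is_consistent(counts):
    """True when numberCompletedTasks matches the sum of the end states."""
    return counts["numberCompletedTasks"] == completed_tasks(counts)


def unfinished_tasks(counts):
    """Tasks that never reached an end state (running or not started)."""
    return counts["numTasks"] - counts["numberCompletedTasks"]


# Made-up counters for one vertex.
sample = {
    "numTasks": 10,
    "numberCompletedTasks": 10,
    "numSucceededTasks": 7,
    "numFailedTasks": 1,
    "numKilledTasks": 2,
}

print(completed_tasks(sample), is_consistent(sample), unfinished_tasks(sample))
```

Note that the difference numTasks - numberCompletedTasks cannot tell running tasks apart from tasks that have not started.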


<2> Why does it not contain running tasks? How can I get the number of running tasks and
not-started tasks?

       If you get the data from ATS, it is not possible to get the number of running tasks,
because this data is pushed to ATS when the vertex completes, which means there are no running
tasks at that time.
       But you can use the Tez API to get the status of a running vertex, which includes the
number of running tasks.


<3> If any task in this vertex fails, is the whole vertex considered failed?

       Yes, a vertex only succeeds when no task has failed or been killed. That means numberCompletedTasks
should equal numSucceededTasks if the vertex succeeded.
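
The rule above can be written as a simple check. This is a Python sketch over made-up counters, using the counter names from this thread; it only restates the rule as described here and does not read the vertex state that Tez itself reports.

```python
def all_tasks_succeeded(counts):
    """True when no task failed or was killed and every completed task succeeded."""
    return (counts["numFailedTasks"] == 0
            and counts["numKilledTasks"] == 0
            and counts["numberCompletedTasks"] == counts["numSucceededTasks"])


# Made-up counters: one clean vertex, one with a single failed task.
clean_vertex = {"numberCompletedTasks": 4, "numSucceededTasks": 4,
                "numFailedTasks": 0, "numKilledTasks": 0}
failed_vertex = {"numberCompletedTasks": 4, "numSucceededTasks": 3,
                 "numFailedTasks": 1, "numKilledTasks": 0}

print(all_tasks_succeeded(clean_vertex))
print(all_tasks_succeeded(failed_vertex))
```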


Best Regards,
Jeff Zhang


From: "Joe Zhang (SDE)" <guizha@microsoft.com>
Reply-To: "user@tez.apache.org" <user@tez.apache.org>
Date: Friday, April 3, 2015 at 3:30 PM
To: "user@tez.apache.org" <user@tez.apache.org>
Cc: Xiaoyong Zhu <xiaoyzhu@microsoft.com>, Yifung Lin <yifungl@microsoft.com>, HDInsight VS Tooling V-team <hdivstool@microsoft.com>
Subject: Tez Vertex Info analysis

Hi Tez experts:

       I am using the Tez REST API to analyze vertex running information. Below is what I get,
but I am wondering about some concepts.

<1> What does “numTasks” contain? What are the relationships among numTasks, numberCompletedTasks,
numKilledTasks, numFailedTasks, and numSucceededTasks?

<2> Why does it not contain running tasks? How can I get the number of running tasks and
not-started tasks?

<3> If any task in this vertex fails, is the whole vertex considered failed?



[Inline image: image001.png]
Best wishes
Joe zhang

