From Rajesh Balamohan <rbalamo...@apache.org>
Subject Re: Trace Key-Value pairs
Date Mon, 05 Dec 2016 02:14:20 GMT
Hi Robert,

Tez deals with bytes and does not know whether the data is coming from
Hive/Pig/Cascading etc. So if you print content that comes from Hive at
the Tez level, you would get mostly binary data. For Hive, the key would be
org.apache.hadoop.hive.ql.io.HiveKey and the value would be
org.apache.hadoop.io.BytesWritable, so printing them would just churn out
binary contents. You can print them from the locations below in Tez.

Writing keyValues:
https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/sort/impl/PipelinedSorter.java#L375

Reading keyValues:
https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/ValuesIterator.java#L186

https://github.com/apache/tez/blob/master/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/common/ValuesIterator.java#L213
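
If you do log from those Tez locations, a hex dump is usually more readable
than printing the raw bytes directly. Below is a minimal sketch of such a
helper. The class name, method name and the truncation limit are made up
for illustration and are not part of Tez; it only assumes you have a byte
array plus an offset and length for the key or value at that point.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tez: formats a slice of a byte buffer as hex.
class KeyValueHexDump {

    // Render `length` bytes of `buf` starting at `offset` as space-separated hex,
    // showing at most `maxBytes` bytes and appending " ..." if truncated.
    static String toHex(byte[] buf, int offset, int length, int maxBytes) {
        int n = Math.min(length, maxBytes);
        StringBuilder sb = new StringBuilder(Math.max(n * 3, 4));
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", buf[offset + i] & 0xff));
        }
        if (length > n) {
            sb.append(" ...");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample bytes standing in for a serialized key and value.
        byte[] key = "k1".getBytes(StandardCharsets.UTF_8);
        byte[] value = {0x00, 0x01, (byte) 0xff};
        System.out.println("key=[" + toHex(key, 0, key.length, 16) + "]"
            + " value=[" + toHex(value, 0, value.length, 16) + "]");
        // prints: key=[6b 31] value=[00 01 ff]
    }
}
```

This still only shows the serialized form; it does not tell you what the
bytes mean.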

If you are interested in the real key/value details, you may want to print
them from the Hive side. This may be best answered on the Hive community
mailing list.
At a very high level, though, in Hive the key gets converted to HiveKey,
which is a wrapper around BytesWritable. You may want to print the key and
value details using the relevant object inspector in Hive, e.g. at
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java#L526
There, you could get the relevant object inspector and print out the
contents. This is just an example.
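
To illustrate why the schema matters, here is a small self-contained
sketch. It does not use Hive's classes or Hive's actual serialization
format; the (int id, String name) row layout and all names are assumptions
made up for the example. The point is only that the same bytes are opaque
without the layout and readable with it, which is the role the object
inspector plays on the Hive side.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical example, not Hive code: a fixed (int id, String name) layout
// stands in for the schema knowledge that a Hive object inspector carries.
class SchemaAwareDecode {

    // Serialize as: 4-byte id, 4-byte name length, UTF-8 name bytes.
    static byte[] serialize(int id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + nameBytes.length);
        buf.putInt(id);
        buf.putInt(nameBytes.length);
        buf.put(nameBytes);
        return buf.array();
    }

    // Decode using the same layout; without knowing it, `raw` is just bytes.
    static String decode(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        int id = buf.getInt();
        int len = buf.getInt();
        byte[] nameBytes = new byte[len];
        buf.get(nameBytes);
        return "id=" + id + ", name=" + new String(nameBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = serialize(42, "store_sales");
        // What a byte-level layer such as Tez would see:
        System.out.println("raw bytes: " + Arrays.toString(raw));
        // What a schema-aware layer can print:
        System.out.println("decoded:   " + decode(raw));
    }
}
```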

~Rajesh.B


On Mon, Dec 5, 2016 at 5:43 AM, Robert Grandl <rgrandl@yahoo.com> wrote:

> Hi guys,
>
> I am running Hive atop Tez and run several TPC-DS / TPC-H queries. I am
> trying to print the Key/Value pairs received as input by each vertex and
> generated as output accordingly.
>
> However, looking at Hive / Tez code, it seems they are converted to Object
> type and use their serialized forms along. I would like to print the
> original content in <Key, Value> pairs both when generated and received by
> a vertex (just for the purpose of  understanding).
>
> Could you please give me some hints on how I can do that?
>
> Thank you,
> Robert
>
>
