From Hitesh Shah <hit...@apache.org>
Subject Re: MRv2 and Tez co-existence
Date Tue, 08 Jul 2014 19:02:48 GMT
Hi Bala, 

I believe that for rolling upgrades you would install the new version of Tez
without removing the older one ( i.e. “simple” rpms are probably a bad idea if you want rolling
upgrades :-) ). This implies that HADOOP_CLASSPATH in any scenario ( MR on Tez, Hive
on Tez or Hive Server ) can continue pointing to the older version of Tez, and likewise for the
Tez jars on HDFS. It also means that you need two versions of tez-site.xml in versioned config
dirs, e.g. TEZ_CLASSPATH=/opt/tez-0.4.1/conf:/opt/tez-0.4.1/*:/opt/tez-0.4.1/lib/* for the old
version and TEZ_CLASSPATH=/opt/tez-0.4.2/conf:/opt/tez-0.4.2/*:/opt/tez-0.4.2/lib/* for the new one.
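A minimal sketch of such a side-by-side install, assuming releases are unpacked under `${TEZ_ROOT:-/opt}/tez-<version>` and that the release archive holds `conf/`, the Tez jars and `lib/` at its top level. `install_tez_version` is a hypothetical helper, not part of Tez. It refuses to overwrite an existing version directory and prints the TEZ_CLASSPATH for the version it just installed, leaving older versions untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating side-by-side Tez installs.
# Assumptions: releases live under ${TEZ_ROOT:-/opt}/tez-<version>, and the
# archive holds conf/, the Tez jars and lib/ at its top level.
install_tez_version() {
  local archive="$1" version="$2"
  local dest="${TEZ_ROOT:-/opt}/tez-$version"
  # Never overwrite an installed version: running jobs may still depend on it.
  if [ -e "$dest" ]; then
    echo "install_tez_version: $dest already exists" >&2
    return 1
  fi
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest" || return 1
  # Print this version's classpath; older versions keep their own conf and jars.
  printf 'TEZ_CLASSPATH=%s/conf:%s/*:%s/lib/*\n' "$dest" "$dest" "$dest"
}

# Usage (hypothetical archive name):
# install_tez_version tez-0.4.2.tar.gz 0.4.2
```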

Switching to the newer version of Tez should just be done by changing the env var to point
to the new version directory of Tez ( the conf in it will also point to the newer version
of Tez on HDFS).
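Under the same layout assumption, the switch can be a single function that re-points TEZ_CLASSPATH after checking that the target version actually has a tez-site.xml (the config that points at that version's jars on HDFS). `use_tez` is a hypothetical name; in practice this would live in whatever script sets up the client environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Tez): switch the client-side environment to
# one installed Tez version. Assumes releases live side by side under
# ${TEZ_ROOT:-/opt}/tez-<version>, each with its own conf/tez-site.xml.
use_tez() {
  local root="${TEZ_ROOT:-/opt}/tez-$1"
  # Refuse to switch to a version whose directory or config is missing.
  if [ ! -f "$root/conf/tez-site.xml" ]; then
    echo "use_tez: no tez-site.xml under $root/conf" >&2
    return 1
  fi
  # Only newly submitted jobs read this; running jobs already localized their jars.
  export TEZ_CLASSPATH="$root/conf:$root/*:$root/lib/*"
}

# Usage: switch this shell, then submit new jobs as usual.
# use_tez 0.4.2
```

A failed switch leaves the previous TEZ_CLASSPATH in place, so a typo in the version does not leave clients pointing at nothing.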

Given that Tez is completely client-side, any job ( be it a Hive query or an MR job ) already
running on the cluster will not be affected when the switch is made ( jars are localized when
the job kicks off ). All newly submitted jobs will pick up the new version. Likewise, the
Hive Server, assuming it has been configured with a particular class path, will not be affected
until it is restarted with a class path pointing to the newly installed version.
The only gotcha is that the older jars cannot be deleted until all running jobs using them
have completed.

We can set up a face-to-face meeting/meetup on this topic if there is interest.

thanks
— Hitesh

On Jul 8, 2014, at 11:26 AM, Bala Krishna Gangisetty <bala@altiscale.com> wrote:

> Thanks Gopal, Bikas and Hitesh for sharing your thoughts.
> 
> Hi Gopal,
> 
> One follow-up question: as you advised, to avoid these errors during rolling upgrades,
> the best place to add the Tez jars to HADOOP_CLASSPATH for Hive is hive-config.sh.
> Could you also suggest the best way to add the Tez jars to HADOOP_CLASSPATH for MapReduce
> programs, and for non-Hive-CLI sessions (through HiveServer2, et al.)?
> 
> --Bala G.
> 
> 
> On Mon, Jul 7, 2014 at 7:30 PM, Gopal V <gopalv@apache.org> wrote:
> On 7/7/14, 5:50 PM, Bala Krishna Gangisetty wrote:
> Thanks Hitesh for your inputs. I've not come across any issues yet. So, I
> can safely assume that putting Tez jars in Hadoop class path will not cause
> the map reduce programs to use Tez framework unless it is enabled. Let me
> know if my understanding is not correct.
> 
> Your assumptions are correct.
> 
> But this is not advised because it will break rolling upgrades.
> 
> The main issue early adopters have run into is installing a Tez build compiled against hadoop-2.4.x
> into a cluster running hadoop-2.2.x.
> 
> As Hitesh/Bikas mentioned, that would cause errors at runtime even for MR jobs.
> 
> The errors you will get in that case are similar to the errors you get during a rolling
> upgrade between versions.
> 
> There is no real reason to put Tez jars on the classpath of any Hadoop daemons (datanode, nodemanager)
> you run in your cluster, and if you do, they might error out while those files are being replaced.
> 
> The correct solution for this is to install Tez in its own versioned directory.
> 
> And for Hive, do the following within your hive-config.sh:
> 
> export HADOOP_CLASSPATH=/opt/tez/current/*:/opt/tez/current/lib/*:/etc/tez/conf/:/usr/share/java/*:$HADOOP_CLASSPATH
> 
> This setup, with symlinks from
> 
> /etc/tez/conf -> /opt/tez/current/conf
> /opt/tez/current -> /opt/tez/0.4.1
> 
> will ensure that you are ready to do rolling upgrades from day #1.
> 
> After the symlinks point to a new version, the only daemon to restart would be hive-server2.
> 
> Cheers,
> Gopal
> 
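A minimal sketch of the symlink layout described above. `point_tez_at` is a hypothetical helper that takes a prefix argument so it can be tried in a scratch directory before touching /opt and /etc, and it assumes GNU coreutils (`mv -T`). It builds the new `current` link under a temporary name and renames it over the old one, so the link is replaced in a single rename instead of being briefly absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point <prefix>/opt/tez/current at one installed version
# and make sure <prefix>/etc/tez/conf follows it. Assumes GNU mv (-T).
point_tez_at() {
  local prefix="$1" version="$2"
  local base="$prefix/opt/tez"
  if [ ! -d "$base/$version" ]; then
    echo "point_tez_at: $base/$version is not installed" >&2
    return 1
  fi
  # Relative target: "current" resolves to <prefix>/opt/tez/<version>.
  ln -sfn "$version" "$base/current.tmp" || return 1
  # Rename over the old link in one step (-T: treat "current" as a file).
  mv -T "$base/current.tmp" "$base/current" || return 1
  # /etc/tez/conf points through "current", so it only needs creating once.
  mkdir -p "$prefix/etc/tez"
  [ -L "$prefix/etc/tez/conf" ] || ln -s "$base/current/conf" "$prefix/etc/tez/conf"
}

# Usage on a real host (prefix empty):
# point_tez_at "" 0.4.2
```

After the switch, hive-server2 still needs a restart to pick up the new jars, as noted above; newly launched Hive CLI sessions read the new version through the unchanged HADOOP_CLASSPATH line.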
> 
> On Mon, Jul 7, 2014 at 4:10 PM, Hitesh Shah <hitesh@apache.org> wrote:
> 
> Hi
> 
> For the most part, there should be no issues as most dependencies that Tez
> pulls in are compatible with the hadoop version that it is compiled with (
> 2.2 or higher ). The major issue to be aware of is that you should compile
> Tez against the same version of hadoop/mapreduce that is deployed on your
> cluster.  The tez dependency jars contain both 3rd party deps as well as
> hadoop jars ( hdfs, common, yarn client-side and mapreduce client-side ) -
> if there is a version mismatch, this may cause a problem when the tez
> directory is added to the hadoop classpath.
> 
> Have you seen any issues? If yes, could you provide more details?
> 
> thanks
> — Hitesh
> 
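Hitesh's point about compiling Tez against the Hadoop version deployed on the cluster can be checked mechanically before a Tez build goes onto the classpath. The sketch below is a hypothetical helper, not a Tez tool; it assumes the Tez `lib/` directory bundles Hadoop jars with standard Maven names such as `hadoop-yarn-client-2.2.0.jar`, and compares their version suffixes with the version you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the distinct Hadoop versions among <lib>/hadoop-*.jar.
# Assumes Maven-style names: hadoop-<artifact>-<version>.jar.
bundled_hadoop_versions() {
  local lib="$1" f
  for f in "$lib"/hadoop-*.jar; do
    [ -e "$f" ] || continue                  # glob matched nothing
    case "$f" in *-tests.jar) continue ;; esac
    basename "$f"
  done | sed -nE 's/^hadoop-[a-z-]+-([0-9].*)\.jar$/\1/p' | sort -u
}

# Succeed only if every bundled Hadoop jar matches the cluster's version.
check_tez_hadoop_match() {
  local lib="$1" cluster="$2" v status=0
  for v in $(bundled_hadoop_versions "$lib"); do
    if [ "$v" != "$cluster" ]; then
      echo "mismatch: Tez bundles hadoop $v, cluster runs $cluster" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage on a cluster node (`hadoop version` prints e.g. "Hadoop 2.2.0" first):
# check_tez_hadoop_match /opt/tez/0.4.1/lib "$(hadoop version | head -n 1 | cut -d' ' -f2)"
```

A mismatch here corresponds to the hadoop-2.4.x build on a hadoop-2.2.x cluster case described in the thread.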
> 
> On Jul 7, 2014, at 3:44 PM, Bala Krishna Gangisetty <bala@altiscale.com>
> wrote:
> 
> > I'm wondering, from an operational point of view, whether there are any specifics
> > that need special attention to make MRv2 and Tez frameworks coexist in
> > harmony. I heard that putting Tez jars in the Hadoop class path would impact
> > the mapred behavior, even when Tez is not enabled (either through
> > mapred-site.xml, or Hive). Could someone shed more light on this and share
> > their thoughts?
> >
> > --Bala G.
> 

