1397535668 posted on 2018-11-18 08:16:17

Apache Tez Installation Guide

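  Tez is Apache's open-source engine for executing DAG (directed acyclic graph) jobs, proposed as a way to reduce the latency of Hive jobs. Hortonworks has used Tez to optimize the Hive engine, with tests reportedly showing performance gains of up to roughly 100x. Hive on Tez still builds on the MapReduce computing model, but Tez prunes the dependencies in the DAG and merges many small jobs into one large job, which reduces both the number of jobs and the number of writes to HDFS. Tez has the following features:
(1) A rich dataflow programming API (dataflow, NOT streaming!);
(2) An extensible "Input-Processor-Output" runtime model;
(3) Simplified deployment (it relies fully on the YARN framework; Tez itself is only a client-side library, so no services need to be deployed in advance);
(4) Better performance than MapReduce;
(5) Optimized resource management (it runs directly on top of the YARN resource management system);
(6) Dynamic generation of the physical dataflow.
The difference between Tez and MapReduce is shown in the following figure:
[Figure: Tez compared with MapReduce]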

I. Installing from Source
1.1 Dependencies
This article uses Oracle Linux 7.4 as the operating system. Install the following dependency packages:

# yum -y install git bzip2 redhat-lsb
1.2 Install protobuf

# wget https://github.com/google/protobuf/releases/download/v3.5.1/protobuf-all-3.5.1.tar.gz
# tar -xzf /u02/software/src/protobuf-all-3.5.1.tar.gz
# cd /u02/protobuf-3.5.1;./configure;make;make install
-- After the build and install finish, protoc is installed correctly if the following command prints this result:
# protoc --version
libprotoc 3.5.1
1.3 Build and install Tez

$ wget http://mirrors.hust.edu.cn/apache/tez/0.9.0/apache-tez-0.9.0-src.tar.gz
$ tar -xzf /u02/software/src/apache-tez-0.9.0-src.tar.gz
$ cd apache-tez-0.9.0-src
-- If protoc is not version 2.5.0, you must edit pom.xml in the source directory and change the protobuf version to the one currently installed on the system.
$ mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
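This step takes quite a while. A successful build ends as shown in the following figure:
[Figure: output of a successful Maven build]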
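
The pom.xml edit mentioned before the build command could be scripted roughly as follows. This is only a sketch: it assumes the top-level pom.xml declares the version in a single-line `<protobuf.version>` property (check your copy of the file first), and the helper name `set_protobuf_version` is made up for illustration.

```shell
# Hypothetical helper: rewrite the <protobuf.version> property in a pom.xml.
# Assumes the property is written on one line, e.g.
#   <protobuf.version>2.5.0</protobuf.version>
set_protobuf_version() {
  pom="$1"      # path to pom.xml
  version="$2"  # protobuf version installed on this machine, e.g. 3.5.1
  # Write to a temporary file first so a failed sed cannot truncate the pom.
  sed "s#<protobuf.version>[^<]*</protobuf.version>#<protobuf.version>${version}</protobuf.version>#" \
    "$pom" > "$pom.tmp" && mv "$pom.tmp" "$pom"
}

# Example (run from the apache-tez-0.9.0-src directory):
#   set_protobuf_version pom.xml 3.5.1
```

Run it before `mvn clean package`, and keep a backup of pom.xml if you want to be able to go back to the original version.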

1.4 Configure Tez
The binary package needed is tez-dist/target/tez-0.9.0.tar.gz, which the build produces.
1.4.1 Upload the binary package

$ hdfs dfs -mkdir /user/tez
$ hdfs dfs -put /u02/software/apache-tez-0.9.0-src/tez-dist/target/tez-0.9.0.tar.gz /user/tez
1.4.2 Extract the archive

$ tar -xzf /u02/software/apache-tez-0.9.0-src/tez-dist/target/tez-0.9.0.tar.gz
$ mv tez-0.9.0 tez
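1.4.3 Create tez-site.xml
On the Hadoop master node, create tez-site.xml in $HADOOP_HOME/etc/hadoop/ (creating it on the master node is enough) with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/user/tez/tez-0.9.0.tar.gz</value>
  </property>
  <property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.3</value>
  </property>
</configuration>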
1.4.4 Edit mapred-site.xml
Change the value of mapreduce.framework.name from yarn to yarn-tez.
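
To confirm the edit took effect, the value can be read back from the file. Below is a minimal sketch; `get_conf_value` is a made-up helper, and it assumes the usual hand-written layout where `<name>` and `<value>` sit on the same line or on consecutive lines.

```shell
# Hypothetical helper: print the <value> that belongs to a <name> in a
# Hadoop-style XML config file. Assumes <name> and <value> are on the same
# line or on consecutive lines.
get_conf_value() {
  name="$1"
  file="$2"
  # -F treats the property name literally (dots are not wildcards);
  # -A1 also prints the line after the match, where <value> usually is.
  grep -F -A1 "<name>${name}</name>" "$file" \
    | sed -n 's#.*<value>\(.*\)</value>.*#\1#p'
}

# Example:
#   get_conf_value mapreduce.framework.name /u01/hadoop/etc/hadoop/mapred-site.xml
```

After the change described above, the example call should print yarn-tez.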
1.4.5 Edit hadoop-env.sh
Append the following lines:

export TEZ_CONF_DIR=/u01/hadoop/etc/hadoop
export TEZ_JARS=/u01/tez
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
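
One way to check that the Tez entries really ended up on the classpath is to split the classpath string and filter it. This is a sketch only; `list_tez_entries` is a made-up name, and it matches any entry containing "tez".

```shell
# Hypothetical helper: list the entries of a colon-separated classpath
# that mention "tez", one per line.
list_tez_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'tez'
}

# Example, in a shell where the edited hadoop-env.sh is in effect:
#   list_tez_entries "$(hadoop classpath)"
```

If nothing is printed, the export lines above are not being picked up.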
1.4.6 Sync the files
Copy tez-site.xml, mapred-site.xml, hadoop-env.sh, and the /u01/tez directory to the other nodes in the cluster:

$ for i in {2..4};do scp hadoop-env.sh hdp0$i:/u01/hadoop/etc/hadoop/;done
$ for i in {2..4};do scp mapred-site.xml hdp0$i:/u01/hadoop/etc/hadoop/;done
$ for i in {2..4};do scp tez-site.xml hdp0$i:/u01/hadoop/etc/hadoop/;done
$ for i in {2..4};do scp -r /u01/tez hdp0$i:/u01;done
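
The four loops above can also be folded into one function with a dry-run switch, which makes it easier to review what will be copied. This is a sketch under the same assumptions as the post (hosts hdp02 to hdp04, config under /u01/hadoop/etc/hadoop); the name `sync_tez` and the `DRY_RUN` variable are made up for illustration.

```shell
# Hypothetical helper: copy the Tez-related config files and the Tez directory
# to the other nodes. Set DRY_RUN=1 to print the scp commands instead of
# running them.
sync_tez() {
  conf_dir=/u01/hadoop/etc/hadoop
  for i in 2 3 4; do
    host="hdp0$i"
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
    ${DRY_RUN:+echo} scp "$conf_dir/hadoop-env.sh" "$conf_dir/mapred-site.xml" \
      "$conf_dir/tez-site.xml" "$host:$conf_dir/" || return 1
    ${DRY_RUN:+echo} scp -r /u01/tez "$host:/u01" || return 1
  done
}

# Example:
#   DRY_RUN=1 sync_tez   # only print the commands
#   sync_tez             # actually copy
```

Stopping at the first failed copy (`|| return 1`) is a design choice here; the original loops keep going when one host fails.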
1.4.7 Restart the Hadoop cluster

$ stop-yarn.sh;stop-dfs.sh
$ start-dfs.sh;start-yarn.sh
That completes the Tez installation.
II. Test and Verify
2.1 Prepare test files

$ echo "Hello World Hello Tez" > file01
$ echo "Hello World Goodbye Tez" > file02
$ hdfs dfs -mkdir /user/tez/input
$ hdfs dfs -mkdir /user/tez/output
$ hdfs dfs -put file0* /user/tez/input
2.2 Verify with the following commands

$ cd /u01/tez
$ hadoop jar tez-examples-0.9.0.jar orderedwordcount /user/tez/input /user/tez/output
17/12/26 11:49:47 INFO shim.HadoopShimsLoader: Trying to locate HadoopShimProvider for hadoopVersion=2.7.4, majorVersion=2, minorVersion=7
17/12/26 11:49:47 INFO shim.HadoopShimsLoader: Picked HadoopShim org.apache.tez.hadoop.shim.HadoopShim27, providerName=org.apache.tez.hadoop.shim.HadoopShim25_26_27Provider, overrideProviderViaConfig=null, hadoopVersion=2.7.4, majorVersion=2, minorVersion=7
17/12/26 11:49:47 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.9.0, revision=0873a0118a895ca84cbdd221d8ef56fedc4b43d0, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2017-07-18T05:41:23Z ]
17/12/26 11:49:48 INFO examples.OrderedWordCount: Running OrderedWordCount
17/12/26 11:49:48 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
17/12/26 11:49:48 INFO client.TezClient: Submitting DAG application with id: application_1513929521869_0023
17/12/26 11:49:48 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: hdfs://hdp01:9000/user/tez/tez.tar.gz
17/12/26 11:49:48 INFO client.TezClientUtils: Using tez.lib.uris.classpath value from configuration: null
17/12/26 11:49:48 INFO client.TezClient: Tez system stage directory hdfs://hdp01:9000/tmp/hadoop/tez/staging/.tez/application_1513929521869_0023 doesn't exist and is created
17/12/26 11:49:49 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1513929521869_0023, dagName=OrderedWordCount, callerContext={ context=TezExamples, callerType=null, callerId=null }
17/12/26 11:49:49 INFO impl.YarnClientImpl: Submitted application application_1513929521869_0023
17/12/26 11:49:49 INFO client.TezClient: The url to track the Tez AM: http://hdp04:8088/proxy/application_1513929521869_0023/
17/12/26 11:49:53 INFO client.DAGClientImpl: DAG initialized: CurrentState=Running
17/12/26 11:49:53 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 3 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:53 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:53 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:53 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl: DAG: State: RUNNING Progress: 33.33% TotalTasks: 3 Succeeded: 1 Running: 1 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 1 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl: DAG: State: SUCCEEDED Progress: 100% TotalTasks: 3 Succeeded: 3 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Tokenizer Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Summation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl:    VertexStatus: VertexName: Sorter Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
17/12/26 11:49:58 INFO client.DAGClientImpl: DAG completed. FinalState=SUCCEEDED
After the job succeeds, check the files under output:

$ hdfs dfs -ls /user/tez/output
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2017-12-26 11:49 /user/tez/output/_SUCCESS
-rw-r--r--   3 hadoop supergroup         32 2017-12-26 11:49 /user/tez/output/part-v002-o000-r-00000
$ hdfs dfs -text /user/tez/output/part-v002-o000-r-00000
Goodbye 1
Tez   2
World   2
Hello   3
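
To cross-check the job output, the same counts can be computed locally from file01 and file02. The sketch below is not part of Tez; `local_wordcount` is a made-up helper built from standard tools, and it sorts by ascending count like the job output above (the order of words with equal counts may differ from what Tez prints).

```shell
# Hypothetical helper: count words in the given files and print
# "word<TAB>count", sorted by ascending count.
local_wordcount() {
  cat "$@" | tr -s ' ' '\n' | LC_ALL=C sort | uniq -c \
    | LC_ALL=C sort -n | awk '{ print $2 "\t" $1 }'
}

# Example (in the directory that holds the two test files):
#   local_wordcount file01 file02
```

For the two test files this gives Goodbye 1, Tez 2, World 2 and Hello 3, the same counts as the part file.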
III. Verifying with Hive
In the Hive console, set the execution engine to tez; the default is mr (MapReduce).

hive> set hive.execution.engine=tez;
hive> use hivedb;
hive> select count(*) from xj_student;

To make tez the default, edit hive-site.xml, set the execution engine to tez, and restart the Hive service.
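
For reference, the hive-site.xml change amounts to a property block like the one printed by this sketch. The property name hive.execution.engine is the one set in the console session above; the helper `hive_engine_property` is made up for illustration and only prints the fragment, it does not edit any file.

```shell
# Hypothetical helper: print the <property> block for hive-site.xml that
# selects the execution engine (defaults to tez).
hive_engine_property() {
  engine="${1:-tez}"
  cat <<EOF
<property>
  <name>hive.execution.engine</name>
  <value>${engine}</value>
</property>
EOF
}

# Example:
#   hive_engine_property tez
```

Paste the printed block inside the `<configuration>` element of hive-site.xml, or change the value if the property is already defined there.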



