This installation puts Pig on a single pseudo-distributed Hadoop node. Pig is a project that Yahoo donated to Apache: a SQL-like, high-level query language built on top of MapReduce. It compiles operations into the Map and Reduce phases of the MapReduce model, and users can define their own functions. Pig is a client-side application, so even if you run Pig against a Hadoop cluster, nothing extra needs to be installed on the cluster itself.

First, download the Pig tarball from the official site and upload it to the server, then unpack it:

[hadoop@hadoop1 soft]$ tar -zxvf pig-0.13.0.tar.gz

For easier configuration, rename the unpacked directory:

[hadoop@hadoop1 ~]$ mv pig-0.13.0 pig2

Add the Pig environment variables to the hadoop user's .bash_profile:

[hadoop@hadoop1 ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin
export PATH
export JAVA_HOME=/usr/lib/jvm/java-1.7.0/
export HADOOP_HOME=/home/hadoop/hadoop2
export PIG_HOME=/home/hadoop/pig2
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/
export PATH=$PATH:$JAVA_HOME/bin/:$HADOOP_HOME/bin:$PIG_HOME/bin

[hadoop@hadoop1 ~]$ source .bash_profile
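As a quick sanity check, the PATH-appending pattern from .bash_profile can be exercised on its own. This is a minimal sketch; /home/hadoop/pig2 is the path used in this post and is assumed to match your layout:

```shell
# Sketch: append Pig's bin directory to PATH the same way .bash_profile does.
# PIG_HOME here matches the path used in this post; adjust for your machine.
PIG_HOME=/home/hadoop/pig2
PATH="$PATH:$PIG_HOME/bin"

# Confirm the directory is now a component of PATH.
case ":$PATH:" in
  *":$PIG_HOME/bin:"*) echo "pig bin is on PATH" ;;
  *)                   echo "pig bin is missing from PATH" ;;
esac
```

If the check fails after `source .bash_profile`, the most common cause is a typo in the final `export PATH=...` line.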
Pig has two execution modes. The first is local mode: Pig runs inside a single JVM and accesses the local file system, which only suits small data sets and is mainly useful for trying Pig out. Note that local mode does not use Hadoop's LocalJobRunner; Pig translates the query into a physical plan and executes it itself. Enter local mode by typing:

% pig -x local

The second is Hadoop (MapReduce) mode. In this mode Pig really does translate queries into MapReduce jobs and submits them to a Hadoop cluster, which can be either fully distributed or pseudo-distributed.
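To make the mode selection explicit, the two launch commands can be captured in a tiny wrapper. `pig_cmd` is a hypothetical helper, not part of Pig; it only prints the invocation it would run, so the sketch works even where Pig is not installed:

```shell
# Hypothetical helper: print the pig invocation for a given execution mode.
pig_cmd() {
  case "$1" in
    local)     echo "pig -x local" ;;      # single JVM, local file system
    mapreduce) echo "pig -x mapreduce" ;;  # submit jobs to the Hadoop cluster
    *)         echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

pig_cmd local       # → pig -x local
pig_cmd mapreduce   # → pig -x mapreduce
```

Running plain `pig` with no `-x` flag defaults to MapReduce mode, as the startup log below shows.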
[hadoop@hadoop1 ~]$ pig
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-09-10 21:04:09,149 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-09-10 21:04:09,150 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig2/pig-err.log
2014-09-10 21:04:09,435 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2014-09-10 21:04:10,345 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-10 21:04:10,345 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-10 21:04:10,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop1:9000
2014-09-10 21:04:10,360 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-09-10 21:04:12,820 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-10 21:04:12,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop1:9001
2014-09-10 21:04:12,831 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>
grunt> help
Commands:
<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig
File system commands:
fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic commands:
describe <alias>[::<alias>] - Show the schema for the alias. Inner aliases can be described as A::B.
explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] [-param <param_name>=<param_value>]
[-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.
-script - Explain the entire script.
-out - Store the output into directory rather than print to stdout.
-brief - Don't expand nested plans (presenting a smaller graph for overview).
-dot - Generate the output in .dot format. Default is text format.
-xml - Generate the output in .xml format. Default is text format.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
alias - Alias to explain.
dump <alias> - Compute the alias and writes the results to stdout.
Utility Commands:
exec [-param <param_name>=<param_value>] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment including aliases.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
run [-param <param_name>=<param_value>] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
sh <shell command> - Invoke a shell command.
kill <job_id> - Kill the hadoop job specified by the hadoop job id.
set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
The following keys are supported:
default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
debug - Set debug on or off. Default is off.
job.name - Single-quoted name for jobs. Default is PigLatin:<scriptname>
job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal
stream.skippath - String that contains the path. This is used by streaming.
any hadoop property.
help - Display this message.
history [-n] - Display the list statements in cache.
-n Hide line numbers.
quit - Quit the grunt shell.
grunt>
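With the Grunt shell working, a first script can be tried end to end. The sketch below writes a minimal Pig Latin script to a file and runs it in local mode; the file name first.pig and the /etc/passwd example are illustrative, while LOAD, FOREACH, and DUMP are standard Pig Latin operators:

```shell
# Write a minimal Pig Latin script: list the user names from /etc/passwd.
cat > first.pig <<'EOF'
A = LOAD '/etc/passwd' USING PigStorage(':');
B = FOREACH A GENERATE $0 AS user;
DUMP B;
EOF

# Run it in local mode if pig is on PATH (guarded so the sketch is harmless
# on machines without Pig installed).
if command -v pig >/dev/null 2>&1; then
  pig -x local first.pig
else
  echo "pig not found on PATH; skipping run"
fi
```

In local mode this reads the local file system directly; in MapReduce mode the same script would expect '/etc/passwd' to be a path on HDFS.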