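This walkthrough installs Pig on a single Hadoop node running in pseudo-distributed mode.
Pig is a project that Yahoo donated to Apache. It provides a SQL-like, high-level query language built on top of MapReduce: it compiles operations into the Map and Reduce phases of the MapReduce model, and users can define their own functions.
Pig is a client-side application. Even if you want to run Pig against a Hadoop cluster, nothing extra needs to be installed on the cluster itself.
First download the Pig package from the official site and upload it to the server, then unpack it with the following command: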
[hadoop@hadoop1 soft]$ tar -zxvf pig-0.13.0.tar.gz
To simplify later configuration, you can rename the unpacked directory:
[hadoop@hadoop1 ~]$ mv pig-0.13.0 pig2
Add the Pig environment variables to the hadoop user's .bash_profile:
[hadoop@hadoop1 ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
export JAVA_HOME=/usr/lib/jvm/java-1.7.0/
export HADOOP_HOME=/home/hadoop/hadoop2
export PIG_HOME=/home/hadoop/pig2
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/
export PATH=$PATH:$JAVA_HOME/bin/:$HADOOP_HOME/bin:$PIG_HOME/bin
[hadoop@hadoop1 ~]$ source .bash_profile
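After sourcing the profile, it can be worth confirming that the Pig bin directory really ended up on PATH. The following is a minimal sketch, not part of Pig or Hadoop: `path_contains` is a hypothetical helper, and the fallback location `$HOME/pig2` simply mirrors the directory name assumed in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if directory $2 is an entry of the
# colon-separated list $1 (for example, the value of PATH).
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fall back to the install location used in this walkthrough if PIG_HOME is unset.
PIG_HOME="${PIG_HOME:-$HOME/pig2}"

if path_contains "$PATH" "$PIG_HOME/bin"; then
    echo "pig bin directory is on PATH"
else
    echo "pig bin directory is missing from PATH"
fi
```

If the check reports the directory as missing, the export lines in .bash_profile are the first place to look, since a misspelled variable name there fails silently.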
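Pig has two execution modes:
The first is local mode. Here Pig runs in a single JVM and accesses the local file system, so it is only suitable for small data sets and is mostly used for trying Pig out. It also does not use Hadoop's local job runner; Pig translates the query into a physical plan and executes it itself.
At the terminal, enter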
% pig -x local
to enter local mode.
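Local mode can also be tried non-interactively by passing a script file to `pig -x local`. The sketch below writes a small word-count script and runs it only if the pig launcher is found on PATH. The scratch directory, the file names and the relation names are all hypothetical choices made for this illustration.

```shell
#!/bin/sh
# Work in a scratch directory (hypothetical location) so nothing else is touched.
WORKDIR="${TMPDIR:-/tmp}/pig_local_demo"
mkdir -p "$WORKDIR"

# Small sample input on the local file system.
printf 'hello pig\nhello hadoop\n' > "$WORKDIR/input.txt"

# A word-count script in Pig Latin; the relation names are illustrative.
# The heredoc is unquoted so that $WORKDIR is expanded into the LOAD path.
cat > "$WORKDIR/wordcount.pig" <<EOF
lines   = LOAD '$WORKDIR/input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group, COUNT(words);
DUMP counts;
EOF

# Run in local mode only if the pig launcher is available.
if command -v pig >/dev/null 2>&1; then
    pig -x local "$WORKDIR/wordcount.pig"
else
    echo "pig not found on PATH; script written to $WORKDIR/wordcount.pig"
fi
```

Because the input is read from the local file system, this runs without HDFS or a job tracker being up, which makes it a convenient smoke test before moving on to a cluster.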
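The other is Hadoop (MapReduce) mode. Only in this mode does Pig actually translate queries into MapReduce jobs and submit them to a Hadoop cluster, which may be either fully distributed or pseudo-distributed.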
[hadoop@hadoop1 ~]$ pig
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-09-10 21:04:09,149 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-09-10 21:04:09,150 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig2/pig-err.log
2014-09-10 21:04:09,435 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2014-09-10 21:04:10,345 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-10 21:04:10,345 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-10 21:04:10,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop1:9000
2014-09-10 21:04:10,360 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-09-10 21:04:12,820 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-10 21:04:12,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop1:9001
2014-09-10 21:04:12,831 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>
grunt> help
Commands:
; - See the PigLatin manual for details: http://hadoop.apache.org/pig
File system commands:
fs - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic commands:
describe [:: |