西大 posted on 2018-10-30 13:35:20

Installing Pig on a Hadoop pseudo-distributed node

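  This walkthrough installs Pig on a single Hadoop pseudo-distributed node.
  Pig is a project that Yahoo donated to Apache. It provides a SQL-like high-level query language built on top of MapReduce: operations are compiled into the Map and Reduce phases of the MapReduce model, and users can define their own functions.
  Pig is a client-side application. Even when you run Pig against a Hadoop cluster, nothing extra has to be installed on the cluster itself.
  First download the Pig release package from the official site and upload it to the server, then unpack it with the following command: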
  $ tar -zxvf pig-0.13.0.tar.gz
  To make configuration easier, you can rename the extracted directory:
  $ mv pig-0.13.0 pig2
  Add the Pig environment variables to the hadoop user's .bash_profile:
  $ cat .bash_profile
  # .bash_profile
  # Get the aliases and functions
  if [ -f ~/.bashrc ]; then
  . ~/.bashrc
  fi
  # User specific environment and startup programs
  PATH=$PATH:$HOME/bin
  export PATH
  export JAVA_HOME=/usr/lib/jvm/java-1.7.0/
  export HADOOP_HOME=/home/hadoop/hadoop2
  export PIG_HOME=/home/hadoop/pig2
  export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/
  export PATH=$PATH:$JAVA_HOME/bin/:$HADOOP_HOME/bin:$PIG_HOME/bin
  $ source .bash_profile
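  A typo in one export line is easy to miss, so it can help to check the environment right after sourcing the profile. The following is only a minimal sketch: the function name check_pig_env is made up for this post, and it assumes the four variables shown above are the ones you care about.

```shell
# check_pig_env (hypothetical helper): report which of the variables
# exported in .bash_profile are unset or do not point at a directory,
# and whether the pig launcher can be found through PATH.
# Returns 0 when everything looks fine, 1 otherwise.
check_pig_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME PIG_HOME PIG_CLASSPATH; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING: $name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "BAD: $name=$value (no such directory)"
            status=1
        else
            echo "OK: $name=$value"
        fi
    done
    # The later steps type plain "pig", so it has to be on PATH.
    if command -v pig >/dev/null 2>&1; then
        echo "OK: pig found at $(command -v pig)"
    else
        echo "MISSING: pig is not on PATH"
        status=1
    fi
    return $status
}
```

  Paste the function into the shell (or into .bash_profile) and run check_pig_env after source .bash_profile; any MISSING or BAD line points at the export that needs fixing.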
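  Pig has two execution modes.
  The first is local mode. Here Pig runs inside a single JVM and reads the local file system, so it is only suitable for small data sets and is mostly used for trying Pig out. Note that in older Pig releases local mode did not use Hadoop's local job runner: Pig converted the query into a physical plan and executed it itself.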
  At the terminal, run
  $ pig -x local
  to enter local mode.
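  To try local mode without typing at the grunt prompt, you can also hand Pig a script file. The sketch below is not from the original post: the file names scores.txt and pass.pig and the sample data are invented for illustration, and pig is only invoked if it is found on PATH.

```shell
# Use a scratch directory so nothing in the home directory is touched.
workdir=$(mktemp -d)

# Hypothetical sample data: tab-separated name and score.
printf 'alice\t82\nbob\t47\ncarol\t65\n' > "$workdir/scores.txt"

# A small Pig Latin script. The quoted 'EOF' stops the shell from
# expanding anything inside the here-document.
cat > "$workdir/pass.pig" <<'EOF'
records = LOAD 'scores.txt' USING PigStorage('\t') AS (name:chararray, score:int);
passed  = FILTER records BY score >= 60;
DUMP passed;
EOF

# Run in local mode only when pig is actually installed. The subshell
# keeps the cd from affecting the current shell; the relative path in
# LOAD is resolved against the local file system in this mode.
if command -v pig >/dev/null 2>&1; then
    ( cd "$workdir" && pig -x local pass.pig )
else
    echo "pig not found on PATH; script left in $workdir/pass.pig"
fi
```

  With the sample data above, the FILTER keeps the rows for alice and carol, whose scores are at least 60.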
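  The other is Hadoop (MapReduce) mode. Only in this mode does Pig actually translate queries into MapReduce jobs and submit them to a Hadoop cluster, which can be either fully distributed or pseudo-distributed. Running pig without the -x option picks this mode, as the startup log shows: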
  $ pig
  14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
  14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
  14/09/10 21:04:08 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
  2014-09-10 21:04:09,149 INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
  2014-09-10 21:04:09,150 INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig2/pig-err.log
  2014-09-10 21:04:09,435 INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
  2014-09-10 21:04:10,345 INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
  2014-09-10 21:04:10,345 INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
  2014-09-10 21:04:10,346 INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop1:9000
  2014-09-10 21:04:10,360 INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
  2014-09-10 21:04:12,820 INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
  2014-09-10 21:04:12,821 INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop1:9001
  2014-09-10 21:04:12,831 INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
  grunt>
  grunt> help
  Commands:
  ; - See the PigLatin manual for details: http://hadoop.apache.org/pig
  File system commands:
  fs - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
  Diagnostic commands:
  describe [::
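  The startup log earlier in this post shows Pig trying LOCAL, then MAPREDUCE, and finally reporting which one it picked. If you want to confirm the mode from a saved log or from a script, that one line can be pulled out with sed. This is just a sketch: the function name picked_exectype is made up here, and it assumes the message keeps the "Picked ... as the ExecType" wording seen in the 0.13.0 output above.

```shell
# picked_exectype (hypothetical helper): read Pig startup output on stdin
# and print the execution type named in the "Picked ... as the ExecType"
# line. Prints nothing if no such line is present.
picked_exectype() {
    sed -n 's/.*Picked \([A-Z_]*\) as the ExecType.*/\1/p'
}

# Example using the log line quoted in this post.
echo '14/09/10 21:04:08 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType' | picked_exectype
# prints MAPREDUCE
```

  Seeing LOCAL here when you expected MAPREDUCE usually means pig was started with -x local.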