云帆大数据学院: Compiling Hadoop 2.2.0 from Source

namedhao · posted 2018-10-30 10:49:21

2.1 Download locations
  1. Apache Hadoop (100% open source, permanently):
  - http://hadoop.apache.org/releases.html
  - SVN: http://svn.apache.org/repos/asf/hadoop/common/branches/
  2. CDH (Cloudera Distributed Hadoop, 100% open source, permanently):
  - http://archive.cloudera.com/cdh4/cdh/4/ (tar.gz files)
  - http://archive.cloudera.com/cdh5/cdh/ (tar.gz files)
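2.2 Notes on the official release
  (1) Official site: http://hadoop.apache.org
  (2) Download the Hadoop package

  (3) Problem with the official release
  The official binary release was compiled on 32-bit Linux, so running it on 64-bit Linux produces a warning:

  - Warning: WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  - The native library shipped in the official binary package is 32-bit, which can be checked with:
  $ file $HADOOP_PREFIX/lib/native/libhadoop.so.1.0.0
  The output shows that the library is 32-bit:
  libhadoop.so.1.0.0: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, BuildID[sha1]=0x9eb1d49b05f67d38454e42b216e053a27ae8bac9, not stripped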
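The check above can be scripted. The following is only a sketch: elf_bits is a hypothetical helper name, and it assumes the output of file contains the usual "ELF 32-bit" or "ELF 64-bit" wording shown above.

```shell
# Classify one line of `file` output as 32, 64, or unknown.
# elf_bits is a hypothetical helper name, not part of Hadoop.
elf_bits() {
  case "$1" in
    *"ELF 32-bit"*) echo 32 ;;
    *"ELF 64-bit"*) echo 64 ;;
    *)              echo unknown ;;
  esac
}

# Usage: runs only when the library actually exists on this machine.
lib="${HADOOP_PREFIX:-/nonexistent}/lib/native/libhadoop.so.1.0.0"
if [ -f "$lib" ]; then
  echo "libhadoop: $(elf_bits "$(file "$lib")")-bit, machine: $(uname -m)"
fi
```

If the library reports 32 while uname -m reports x86_64, the native library will not load and the warning above is expected.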
2.3 Official build instructions
  The downloaded hadoop-2.2.0-src.tar.gz package contains a BUILDING.txt file that describes the build steps in detail:
  Build instructions for Hadoop
  ----------------------------------------------------------------------------------
  Requirements:
  * Unix System       (this guide uses 64-bit CentOS 6.4)
  * JDK 1.6+
  * Maven 3.0 or later (3.0.5 is recommended)
  * Findbugs 1.3.9 (if running findbugs)
  * ProtocolBuffer 2.5.0
  * CMake 2.6 or newer (if compiling native code)
  * Internet connection for first build (to fetch all Maven and Hadoop dependencies)
  ----------------------------------------------------------------------------------
  Maven main modules:
  hadoop                     (Main Hadoop project)
  - hadoop-project           (Parent POM for all Hadoop Maven modules.)
                             (All plugins & dependencies versions are defined here.)
  - hadoop-project-dist      (Parent POM for modules that generate distributions.)
  - hadoop-annotations       (Generates the Hadoop doclet used to generate the Javadocs)
  - hadoop-assemblies        (Maven assemblies used by the different modules)
  - hadoop-common-project    (Hadoop Common)
  - hadoop-hdfs-project      (Hadoop HDFS)
  - hadoop-mapreduce-project (Hadoop MapReduce)
  - hadoop-tools             (Hadoop tools like Streaming, Distcp, etc.)
  - hadoop-dist              (Hadoop distribution assembler)
  ----------------------------------------------------------------------------------
  Where to run Maven from?
  It can be run from any module. The only catch is that if not run from utrunk all modules that are not part of the build run must be installed in the local Maven cache or available in a Maven repository.
  ----------------------------------------------------------------------------------
  Maven build goals:
  * Clean                     : mvn clean
  * Compile                   : mvn compile [-Pnative]
  * Run tests                 : mvn test [-Pnative]
  * Create JAR                : mvn package
  * Run findbugs              : mvn compile findbugs:findbugs
  * Run checkstyle            : mvn compile checkstyle:checkstyle
  * Install JAR in M2 cache   : mvn install
  * Deploy JAR to Maven repo  : mvn deploy
  * Run clover                : mvn test -Pclover [-DcloverLicenseLocation=${user.name}/.clover.license]
  * Run Rat                   : mvn apache-rat:check
  * Build javadocs            : mvn javadoc:javadoc
  * Build distribution        : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar]
  * Change Hadoop version     : mvn versions:set -DnewVersion=NEWVERSION
  Build options:
  * Use -Pnative to compile/bundle native code
  * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
  * Use -Psrc to create a project source TAR.GZ
  * Use -Dtar to create a TAR with the distribution (using -Pdist)
  Snappy build options:
  Snappy is a compression library that can be utilized by the native code. It is currently an optional component, meaning that Hadoop can be built with or without this dependency.
  * Use -Drequire.snappy to fail the build if libsnappy.so is not found. If this option is not specified and the snappy library is missing, we silently build a version of libhadoop.so that cannot make use of snappy. This option is recommended if you plan on making use of snappy and want to get more repeatable builds.
  * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy header files and library files. You do not need this option if you have installed snappy using a package manager.
  * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library files. Similarly to snappy.prefix, you do not need this option if you have installed snappy using a package manager.
  * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into the final tar file. This option requires that -Dsnappy.lib is also given, and it ignores the -Dsnappy.prefix option.
  ----------------------------------------------------------------------------------
  Building components separately
  If you are building a submodule directory, all the hadoop dependencies this submodule has will be resolved as all other 3rd party dependencies. This is, from the Maven cache or from a Maven repository (if not available in the cache or the SNAPSHOT 'timed out').
  An alternative is to run 'mvn install -DskipTests' from Hadoop source top level once; and then work from the submodule. Keep in mind that SNAPSHOTs time out after a while, using the Maven '-nsu' will stop Maven from trying to update SNAPSHOTs from external repos.
  ----------------------------------------------------------------------------------
  Protocol Buffer compiler
  The version of Protocol Buffer compiler, protoc, must match the version of the protobuf JAR.
  If you have multiple versions of protoc in your system, you can set in your build shell the HADOOP_PROTOC_PATH environment variable to point to the one you want to use for the Hadoop build. If you don't define this environment variable, protoc is looked up in the PATH.
  ----------------------------------------------------------------------------------
  Importing projects to eclipse
  When you import the project to eclipse, install hadoop-maven-plugins at first.
  $ cd hadoop-maven-plugins
  $ mvn install
  Then, generate eclipse project files.
  $ mvn eclipse:eclipse -DskipTests
  At last, import to eclipse by specifying the root directory of the project via
  [File] > [Import] > [Existing Projects into Workspace].
  ----------------------------------------------------------------------------------
  Building distributions:
  Create binary distribution without native code and without documentation: (binaries only)
  $ mvn package -Pdist -DskipTests -Dtar
  Create binary distribution with native code and with documentation: (binaries + native libraries + documentation)
  $ mvn package -Pdist,native,docs -DskipTests -Dtar
  Create source distribution: (source only)
  $ mvn package -Psrc -DskipTests
  Create source and binary distributions with native code and documentation: (source + binaries + native libraries + documentation)
  $ mvn package -Pdist,native,docs,src -DskipTests -Dtar
  Create a local staging version of the website (in /tmp/hadoop-site)
  $ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
  ----------------------------------------------------------------------------------
  Handling out of memory errors in builds
  If the build process fails with an out of memory error, you should be able to fix it by increasing the memory used by Maven, which can be done via the environment variable MAVEN_OPTS.
  Here is an example setting to allocate between 256 and 512 MB of heap space to Maven
  export MAVEN_OPTS="-Xms256m -Xmx512m"
  ----------------------------------------------------------------------------------
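The distribution commands quoted from BUILDING.txt differ only in their profile list and a few -D flags. As a rough sketch (not part of BUILDING.txt), the helper below prints the command rather than running it. dist_build_cmd is a hypothetical name, and it assumes tests are always skipped and a tarball is always wanted.

```shell
# Print (do not run) the Maven command for a distribution build.
# $1: comma-separated profile list; remaining arguments: extra -D flags.
# dist_build_cmd is a hypothetical helper name.
dist_build_cmd() {
  profiles="$1"
  shift
  cmd="mvn package -P$profiles -DskipTests -Dtar"
  if [ "$#" -gt 0 ]; then
    cmd="$cmd $*"
  fi
  printf '%s\n' "$cmd"
}

# Binary distribution with native code:
dist_build_cmd dist,native
# Same build, but fail if libsnappy.so is missing:
dist_build_cmd dist,native -Drequire.snappy
```

Printing the command first makes it easy to review the profile combination before starting a build that can take a long time.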
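2.4 Build steps
Step1: Install VMware 10 (omitted)
Step2: Install a 64-bit Linux operating system (omitted)
  This guide uses 64-bit CentOS 6.4. Download: http://www.centoscn.com/CentosSoft/
Step3: Set up networking on Linux
  (1) Set the VMware virtual machine's network mode to NAT
  (2) Configure the Linux network interface to obtain its address via DHCP, sharing the host machine's network connection

  (3) Test: ping www.baidu.com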

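Step4: Install the JDK
  Note: JDK 1.6 or later is required, in a 64-bit build (this environment uses jdk-6u45-linux-x64.bin)
  (1) Use an FTP tool (WinSCP or FileZilla) to upload jdk-6u45-linux-x64.bin to /software/ on the Linux system
  (2) Install the JDK
  cd /software/
  chmod u+x jdk-6u45-linux-x64.bin    # grant execute permission
  mkdir /workDir                      # create an installation directory (personal preference)
  cp jdk-6u45-linux-x64.bin /workDir  # copy the installer into /workDir
  cd /workDir
  ./jdk-6u45-linux-x64.bin            # run the self-extracting installer
  mv jdk1.6.0_45 jdk6u45              # rename the directory for convenience
  (3) Configure environment variables
  vi /etc/profile
  Add the following:
  export JAVA_HOME=/workDir/jdk6u45
  export PATH=.:$PATH:$JAVA_HOME/bin
  (4) Apply the environment variables
  source /etc/profile
  (5) Verify that the JDK is installed
  java -version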
Step5: Install dependency packages
  yum install autoconf -y
  yum install automake -y
  yum install libtool -y
  yum install cmake -y
  yum install ncurses-devel -y
  yum install openssl-devel -y
  yum install gcc -y
  yum install gcc-c++ -y
  yum install lzo-devel -y
  yum install zlib-devel -y
  Note: -y answers "yes" automatically to every prompt during installation
  Verify:
  rpm -qa | grep autoconf
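Checking one package at a time with rpm -qa gets tedious with ten packages. A minimal sketch of a batch check follows; missing_packages is a hypothetical helper name, and it assumes the check command (rpm -q on CentOS) exits non-zero for a package that is not installed.

```shell
# Print every package name for which the check command fails.
# $1: a command that succeeds when a package is installed (e.g. "rpm -q");
# remaining arguments: package names.
# missing_packages is a hypothetical helper name.
missing_packages() {
  check="$1"
  shift
  for pkg in "$@"; do
    # $check is deliberately unquoted so "rpm -q" splits into command and flag.
    $check "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

# Usage on the build machine (runs only where rpm exists):
if command -v rpm >/dev/null 2>&1; then
  missing_packages "rpm -q" autoconf automake libtool cmake ncurses-devel \
    openssl-devel gcc gcc-c++ lzo-devel zlib-devel
fi
```

An empty result means every listed package is installed; any name printed still needs yum install.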
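  [About the yum command]
  yum (Yellowdog Updater, Modified) is a shell front-end package manager used in Fedora, Red Hat and SUSE. Built on RPM package management, it downloads RPM packages from the configured servers and installs them, resolving dependencies automatically so that all required packages are installed in one go instead of being downloaded and installed one by one. yum provides commands to search for, install and remove a single package, a group of packages, or all packages, and the commands are short and easy to remember.
  The general form of a yum command is: yum [options] [command] [package ...]
  [options] is optional and includes -h (help), -y (answer "yes" to all prompts during installation) and -q (do not show the installation progress). [command] is the operation to perform and [package ...] is what it operates on.
  - Some commonly used commands:
  Plugin that picks the fastest mirror automatically: yum install yum-fastestmirror
  Graphical front end for yum: yum install yumex
  List the package groups available for installation: yum grouplist
  - Installing
  yum install package1       install the specified package package1
  yum groupinstall group1    install the package group group1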
Step6: Install Maven
  (1) Maven version: use apache-maven-3.0.5-bin.tar.gz
  Note: do not use the then-latest Maven 3.1.1. The Hadoop 2.2.0 source is incompatible with Maven 3.1.x, and the build fails with
  java.lang.NoClassDefFoundError: org/sonatype/aether/graph/DependencyFilter
  Maven 3.0.5 is recommended.
  (2) Download
  URL: http://maven.apache.org/download.cgi
  Choose apache-maven-3.0.5-bin.tar.gz
  (3) Upload it to Linux and extract it into the installation directory
  tar -zxvf apache-maven-3.0.5-bin.tar.gz -C /workDir
  (4) Set environment variables
  vi /etc/profile
  Add:
  export MAVEN_HOME=/workDir/apache-maven-3.0.5
  export PATH=$PATH:$MAVEN_HOME/bin
  Then run: source /etc/profile   (or: . /etc/profile)
  Verify:
  mvn -v
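Because the wrong Maven line breaks the build, it can help to check the version before starting. This is a sketch only: is_maven_30 is a hypothetical helper name, and it assumes the first line of mvn -v starts with "Apache Maven" followed by the version number.

```shell
# Succeed only if the given `mvn -v` banner line reports a 3.0.x release.
# is_maven_30 is a hypothetical helper name.
is_maven_30() {
  case "$1" in
    "Apache Maven 3.0."*) return 0 ;;
    *)                    return 1 ;;
  esac
}

# Usage (runs only when mvn is installed):
if command -v mvn >/dev/null 2>&1; then
  banner=$(mvn -v 2>/dev/null | head -n 1) || banner=""
  if is_maven_30 "$banner"; then
    echo "Maven OK: $banner"
  else
    echo "Unexpected Maven version: $banner"
  fi
fi
```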
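Step7: Configure a Maven mirror in China
  (1) Edit settings.xml
  Go to the installation directory /workDir/apache-maven-3.0.5/conf
  * Add the following inside the <mirrors> element:
  <mirror>
    <id>nexus-osc</id>
    <mirrorOf>*</mirrorOf>
    <name>Nexusosc</name>
    <url>http://maven.oschina.net/content/groups/public/</url>
  </mirror>
  * Add the following inside the <profiles> element:
  <profile>
    <id>jdk-1.6</id>
    <activation>
      <jdk>1.6</jdk>
    </activation>
    <repositories>
      <repository>
        <id>nexus</id>
        <name>local private nexus</name>
        <url>http://maven.oschina.net/content/groups/public/</url>
        <releases>
          <enabled>true</enabled>
        </releases>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
      </repository>
    </repositories>
    <pluginRepositories>
      <pluginRepository>
        <id>nexus</id>
        <name>local private nexus</name>
        <url>http://maven.oschina.net/content/groups/public/</url>
        <releases>
          <enabled>true</enabled>
        </releases>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
      </pluginRepository>
    </pluginRepositories>
  </profile>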
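  (2) Copy the configuration
  Note: copy settings.xml into the user's home directory so that Maven uses this configuration on every run
  cd /home/hadoop    # check whether the home directory /home/hadoop contains a .m2 folder; create it if not
  mkdir .m2
  cp /workDir/apache-maven-3.0.5/conf/settings.xml ~/.m2    # copy the file
  (3) Configure DNS
  vi /etc/resolv.conf
  Change it to:
  nameserver 8.8.8.8
  nameserver 8.8.4.4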
Step8: Install protobuf
  (1) Download protobuf-2.5.0.tar.gz
  https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
  (2) Extract it into the installation directory
  cd /software
  tar -zxvf protobuf-2.5.0.tar.gz -C /workDir
  (3) Install the following three dependency packages (skip any that are already installed)
  yum install gcc -y
  yum install gcc-c++ -y
  yum install make -y
  [Note] Without these three packages protobuf cannot be built, and the Hadoop build later fails with the following error:
  [ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.2.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
  [ERROR]
  [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
  [ERROR] Re-run Maven using the -X switch to enable full debug logging.
  [ERROR]
  [ERROR] For more information about the errors and possible solutions, please read the following articles:
  [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
  [ERROR]
  [ERROR] After correcting the problems, you can resume the build with the command
  [ERROR]   mvn <goals> -rf :hadoop-common
  (4) Configure the build
  Go to the installation directory and run the configure script
  cd /workDir/protobuf-2.5.0    # enter the installation directory
  ./configure                   # run the configure script
  (5) Build and install
  make && make check && make install
  Note: building protobuf requires the gcc and gcc-c++ system packages (no need to reinstall them if they are already present)
  (6) Configure environment variables
  vi /etc/profile
  Add:
  export PROTOBUF_HOME=/workDir/protobuf-2.5.0
  export PATH=$PATH:$PROTOBUF_HOME/bin
  Apply the configuration:
  source /etc/profile   (or: . /etc/profile)
  Verify:
  protoc --version
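BUILDING.txt notes that the build uses HADOOP_PROTOC_PATH when it is set and otherwise looks protoc up in the PATH. The sketch below mirrors that lookup and checks the reported version. resolve_protoc and is_protoc_250 are hypothetical helper names, and it assumes HADOOP_PROTOC_PATH holds the path of the protoc executable itself and that protoc 2.5.0 prints "libprotoc 2.5.0".

```shell
# Print the protoc binary the Hadoop build would pick up: HADOOP_PROTOC_PATH
# when set, otherwise the first protoc found on PATH.
# resolve_protoc is a hypothetical helper name.
resolve_protoc() {
  if [ -n "${HADOOP_PROTOC_PATH:-}" ]; then
    printf '%s\n' "$HADOOP_PROTOC_PATH"
  else
    command -v protoc
  fi
}

# Succeed only if a `protoc --version` line names release 2.5.0.
# is_protoc_250 is a hypothetical helper name.
is_protoc_250() {
  [ "$1" = "libprotoc 2.5.0" ]
}

# Usage (prints a hint instead of failing when protoc is absent):
if p=$(resolve_protoc) && [ -x "$p" ]; then
  if is_protoc_250 "$("$p" --version)"; then
    echo "protoc OK: $p"
  else
    echo "Wrong protoc version: $p"
  fi
else
  echo "protoc not found"
fi
```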
Step9: Install findbugs-3.0.0
  (1) Download findbugs-3.0.0.tar.gz
  http://sourceforge.jp/projects/sfnet_findbugs/releases/
  (2) Extract it into the installation directory
  cd /software
  tar -zxvf findbugs-3.0.0.tar.gz -C /workDir
  (3) Set environment variables
  vi /etc/profile
  Add the following:
  export FINDBUGS_HOME=/workDir/findbugs-3.0.0
  export PATH=$PATH:$FINDBUGS_HOME/bin
  (4) Apply the environment variables
  source /etc/profile   (or: . /etc/profile)
  (5) Verify
  findbugs -version
  Important:
  If the verification command fails with an error, the JDK version is incompatible. findbugs-2.5.0 and findbugs 3.0.0 are built for JDK 7 or later, so JDK 7 must be installed on the Linux machine.
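Steps 4, 6, 8 and 9 each add export lines to /etc/profile by hand, and repeating a step can leave duplicates. One possible way to make the edit repeatable is sketched below; append_once is a hypothetical helper name, and the usage lines are left commented out because they modify a system file.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name.
append_once() {
  file="$1"
  line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run as root on the build machine):
#   append_once /etc/profile 'export FINDBUGS_HOME=/workDir/findbugs-3.0.0'
#   append_once /etc/profile 'export PATH=$PATH:$FINDBUGS_HOME/bin'
#   . /etc/profile
```

The single quotes in the example keep $PATH and $FINDBUGS_HOME unexpanded, so they are written to the file literally and evaluated when the profile is sourced.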

Step10: Compile the hadoop-2.2.0-src source
  (1) Download hadoop-2.2.0-src.tar.gz
  http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
  (2) Extract it into the installation directory
  cd /software
  tar -zxvf hadoop-2.2.0-src.tar.gz -C /workDir
  (3) Patch the source
  - Important: the hadoop-2.2.0 source contains a bug, which is described in the official Apache JIRA:
  JIRA: https://issues.apache.org/jira/browse/HADOOP-10110
  - The fix:
  Index: hadoop-common-project/hadoop-auth/pom.xml
  ===================================================================
  --- hadoop-common-project/hadoop-auth/pom.xml  (revision 1543124)
  +++ hadoop-common-project/hadoop-auth/pom.xml  (working copy)
  @@ -54,6 +54,11 @@
       </dependency>
       <dependency>
         <groupId>org.mortbay.jetty</groupId>
  +      <artifactId>jetty-util</artifactId>
  +      <scope>test</scope>
  +    </dependency>
  +    <dependency>
  +      <groupId>org.mortbay.jetty</groupId>
         <artifactId>jetty</artifactId>
         <scope>test</scope>
       </dependency>
  As the official fix shows, edit pom.xml in $HADOOP_SRC_HOME/hadoop-common-project/hadoop-auth and add the following dependency immediately before the existing org.mortbay.jetty jetty dependency (around line 55):
  <dependency>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>jetty-util</artifactId>
    <scope>test</scope>
  </dependency>
  Otherwise the build fails with the following error:
  [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hadoop-auth: Compilation failure: Compilation failure:
  [ERROR] /home/chuan/trunk/hadoop-common-project/hadoop-auth/src/test/java/org/apache/hadoop/security/authentication/client/AuthenticatorTestCase.java:[84,13] cannot access org.mortbay.component.AbstractLifeCycle
  [ERROR] class file for org.mortbay.component.AbstractLifeCycle not found
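Before starting the long build, it may be worth confirming that the pom.xml edit is in place. A minimal sketch follows: has_jetty_util is a hypothetical helper name, the check assumes the artifactId element sits on a single line, and the default path assumes the source was unpacked under /workDir as in this guide.

```shell
# Succeed if the given pom.xml already declares the jetty-util artifact.
# has_jetty_util is a hypothetical helper name.
has_jetty_util() {
  grep -q '<artifactId>jetty-util</artifactId>' "$1"
}

# Usage: HADOOP_SRC_HOME is assumed to point at the unpacked source tree.
pom="${HADOOP_SRC_HOME:-/workDir/hadoop-2.2.0-src}/hadoop-common-project/hadoop-auth/pom.xml"
if [ -f "$pom" ]; then
  if has_jetty_util "$pom"; then
    echo "jetty-util dependency present"
  else
    echo "jetty-util dependency missing: edit $pom"
  fi
fi
```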
  (4) Compile
  The official instructions:
  Create source and binary distributions with native code and documentation: (source + binaries + native libraries + documentation)
  $ mvn package -Pdist,native,docs,src -DskipTests -Dtar
  This guide builds only the binary distribution with native code:
  cd /workDir/hadoop-2.2.0-src
  mvn package -DskipTests -Pdist,native -Dtar
  Note: if the build runs out of memory, increase the heap available to Maven
  export MAVEN_OPTS="-Xms256m -Xmx512m"
  The build takes a while, because dependencies have to be downloaded from the internet.
  When the following output appears, the build has succeeded:
  [INFO] ------------------------------------------------------------------------
  [INFO] BUILD SUCCESS
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time: 11:53.144s
  [INFO] Finished at: Fri Nov 22 16:58:32 CST 2013
  [INFO] Final Memory: 70M/239M
  [INFO] ------------------------------------------------------------------------
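Since the build runs for a long time, saving its output makes the result easy to check afterwards. This is just a sketch: build_succeeded is a hypothetical helper name and build.log is an arbitrary file name chosen for illustration.

```shell
# Succeed if a saved Maven log contains the BUILD SUCCESS marker.
# build_succeeded is a hypothetical helper name.
build_succeeded() {
  grep -q 'BUILD SUCCESS' "$1"
}

# Example (on the build machine):
#   mvn package -DskipTests -Pdist,native -Dtar 2>&1 | tee build.log
#   build_succeeded build.log && echo "build finished" || echo "build failed, see build.log"
```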
Step11: After the build
  1. Inspect the build output
  The build output is located in hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
  cd /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
  ll    # list the compiled directory
  Contents of the compiled hadoop-2.2.0 directory:
  drwxr-xr-x. 2 root root 4096 Aug 11 12:00 bin
  drwxr-xr-x. 3 root root 4096 Aug 11 12:00 etc
  drwxr-xr-x. 2 root root 4096 Aug 11 12:00 include
  drwxr-xr-x. 3 root root 4096 Aug 11 12:00 lib
  drwxr-xr-x. 2 root root 4096 Aug 11 12:00 libexec
  drwxr-xr-x. 2 root root 4096 Aug 11 12:00 sbin
  drwxr-xr-x. 4 root root 4096 Aug 11 12:00 share
  Go to the bin directory and run the hadoop script to check the version
  cd bin
  ./hadoop version
  The version information is printed:
  [root@localhost bin]# ./hadoop version
  Hadoop 2.2.0
  Subversion Unknown -r Unknown
  Compiled by root on 2014-08-11T18:34Z
  Compiled with protoc 2.5.0
  From source with checksum 79e53ce7994d1628b240f09af91e1af4
  This command was run using /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
  2. Check the architecture of the native libraries
  cd /workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0
  file lib//native/*
  The shared libraries are now 64-bit (see the ELF 64-bit entries):
  [root@localhost hadoop-2.2.0]# file lib//native/*
  lib//native/libhadoop.a:        current ar archive
  lib//native/libhadooppipes.a:   current ar archive
  lib//native/libhadoop.so:       symbolic link to `libhadoop.so.1.0.0'
  lib//native/libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
  lib//native/libhadooputils.a:   current ar archive
  lib//native/libhdfs.a:          current ar archive
  lib//native/libhdfs.so:         symbolic link to `libhdfs.so.0.0.0'
  lib//native/libhdfs.so.0.0.0:   ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
  With that, the build is complete.
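Reading the file listing by eye works for a handful of libraries. As a sketch of an automated check, the filter below reads file output and prints any ELF object that is not 64-bit. non64_elf is a hypothetical helper name, and the directory is the one used in this guide.

```shell
# Read `file` output on stdin and print the paths of ELF objects that are
# not 64-bit. Archives and symbolic links are ignored.
# non64_elf is a hypothetical helper name.
non64_elf() {
  awk -F: '/ELF/ && !/ELF 64-bit/ { print $1 }'
}

# Usage (runs only where the build output exists):
native_dir="/workDir/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native"
if [ -d "$native_dir" ]; then
  file "$native_dir"/* | non64_elf
fi
```

No output means every shared object in the directory is 64-bit.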
