1. Deployment environment
OS: CentOS 6.3. The JDK must be installed. Turn off iptables and SELinux:
| /etc/init.d/iptables stop
chkconfig iptables off
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0
|
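The summary at the end also mentions editing the hosts file; every node needs to resolve the hadoop1/hadoop2/hadoop3 hostnames used throughout this post. A minimal sketch, with placeholder IP addresses that are not from the original:

| cat >> /etc/hosts <<EOF
192.168.1.101 hadoop1
192.168.1.102 hadoop2
192.168.1.103 hadoop3
EOF
|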
2. SSH configuration
| useradd hadoop
echo 123456 | passwd --stdin hadoop
su - hadoop
ssh-keygen -t rsa                 # generate the key pair
ssh-copy-id user@ip               # copy the SSH public key to the target host
cd .ssh                           # every server also needs passwordless SSH to itself
cat id_rsa.pub >> authorized_keys
|
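A quick check that passwordless login works (hadoop2 is one of the worker hostnames used later in this post):

| ssh hadoop2 hostname   # should print "hadoop2" without asking for a password
|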
3. Deploying Hadoop
Seven configuration files need to be edited: etc/hadoop/hadoop-env.sh, etc/hadoop/yarn-env.sh, etc/hadoop/slaves, etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/mapred-site.xml and etc/hadoop/yarn-site.xml. Some of these files do not exist by default and can be created by copying the corresponding template file (see the example below).
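In the Hadoop 2.5 binary distribution, mapred-site.xml is typically the one that has to be created this way; a minimal sketch, run from the Hadoop installation directory:

| cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
|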
1. etc/hadoop/hadoop-env.sh
| export JAVA_HOME=/usr/java/jdk1.7.0_67
|
2. etc/hadoop/yarn-env.sh
| export JAVA_HOME=/usr/java/jdk1.7.0_67
|
3. etc/hadoop/slaves
This file lists the slave (DataNode) nodes, one hostname per line; when the SecondaryNameNode runs on a different host from the master, it is recorded in the masters file. A sketch of the file follows below.
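The contents of slaves are not shown in the original; assuming hadoop2 and hadoop3 are the worker nodes (matching the process lists later on), it is simply one hostname per line:

| hadoop2
hadoop3
|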
4. etc/hadoop/core-site.xml
| <configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9200</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop-2.5.0/tmp</value> <!-- Hadoop temporary directory -->
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
|
5. etc/hadoop/hdfs-site.xml
| <configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop-2.5.0/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop-2.5.0/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
|
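The tmp, name and data directories referenced in core-site.xml and hdfs-site.xml can be created up front; Hadoop will normally create them itself, but doing it explicitly (as the hadoop user) avoids ownership surprises. Paths as configured above:

| mkdir -p /home/hadoop/hadoop-2.5.0/{tmp,name,data}
|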
6. etc/hadoop/mapred-site.xml
| <configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop1:19888</value>
</property>
</configuration>
|
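The two jobhistory addresses above only matter if the JobHistory server is actually running; it is not started by start-dfs.sh or start-yarn.sh. With the standard Hadoop 2.x sbin scripts it would be started on hadoop1 like this:

| ./sbin/mr-jobhistory-daemon.sh start historyserver
|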
7. etc/hadoop/yarn-site.xml
| <configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop1:8088</value>
</property>
</configuration>
|
4. Starting and verifying the cluster
Distribute the hadoop directory to every node:
| scp -r /home/hadoop/hadoop-2.5.0 ip:/home/hadoop
|
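With the hadoop2 and hadoop3 hostnames used elsewhere in this post, the same step can be written as a small loop:

| for node in hadoop2 hadoop3; do
    scp -r /home/hadoop/hadoop-2.5.0 $node:/home/hadoop
done
|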
Format the NameNode:
| ./bin/hdfs namenode -format
|
Start HDFS; at this point hadoop1 runs NameNode and SecondaryNameNode, while hadoop2 and hadoop3 run DataNode. Then start YARN.
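The start commands themselves are not shown in the original; assuming the standard Hadoop 2.x sbin scripts, run from the hadoop-2.5.0 directory on hadoop1:

| ./sbin/start-dfs.sh    # starts the NameNode, SecondaryNameNode and DataNodes
./sbin/start-yarn.sh   # starts the ResourceManager and NodeManagers
|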
Now the processes running on hadoop1 are NameNode, SecondaryNameNode and ResourceManager, while hadoop2 and hadoop3 run DataNode and NodeManager. The processes can be checked with jps; the path below is for the Oracle JDK installed from the RPM:
| /usr/java/jdk1.7.0_67/bin/jps
|
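Besides jps, the web UIs give a quick health check; port 8088 is the ResourceManager address configured in yarn-site.xml above, and 50070 is the Hadoop 2.x default for the NameNode UI:

| curl http://hadoop1:50070   # NameNode web UI
curl http://hadoop1:8088    # ResourceManager web UI
|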
The console prints a warning:
| WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
|
Setting the logger to DEBUG shows the details of the problem:
| export HADOOP_ROOT_LOGGER=DEBUG,console
|
This shows that the GLIBC version the native library depends on is not available:
| DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: /home/hadoop/hadoop-2.5.0/lib/native/libhadoop.so.1.0.0: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/hadoop/hadoop-2.5.0/lib/native/libhadoop.so.1.0.0)
|
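Before upgrading, it is worth confirming the currently installed version (CentOS 6.3 ships glibc 2.12):

| ldd --version
|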
Upgrade GLIBC:
| tar xf glibc-2.14.tar.gz
cd glibc-2.14
tar xf ../glibc-linuxthreads-2.5.tar.bz2   # the linuxthreads add-on must be unpacked inside the glibc source tree
cd ..                                      # configure must not be run from inside the glibc source directory
export CFLAGS="-g -O2"                     # without the optimization flags the build fails
./glibc-2.14/configure --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
make
make install
|
Three things to watch during the build: 1. glibc-linuxthreads must be extracted into the glibc source directory. 2. configure must not be run inside the glibc source directory itself. 3. Set the optimization flags, export CFLAGS="-g -O2", otherwise the build fails with errors.
If the bundled Hadoop native library is 32-bit but the system is 64-bit, Hadoop has to be recompiled. As a temporary workaround, set the following environment variables so that Hadoop fails to find the native library and falls back to the pure-Java implementation:
| export HADOOP_COMMON_LIB_NATIVE_DIR=/home/hadoop/hadoop-2.5.0/lib/native
export HADOOP_OPTS="-Djava.library.path=/home/hadoop/hadoop-2.5.0/lib"
|
Summary: install the JDK, edit the hosts file, turn off the firewall and SELinux, set up passwordless SSH, download and extract Hadoop 2.5, edit the configuration files, distribute Hadoop to every node, and start the cluster.