[toughhou@hd1 hadoop]$ cd /opt/hadoop-2.4.0/etc/hadoop
1) hadoop-env.sh and yarn-env.sh: set the JAVA_HOME environment variable
At first I assumed that because JAVA_HOME was already set in /etc/profile, hadoop-env.sh and yarn-env.sh would pick it up automatically, so there was no need to set it again. It turns out this does not work in hadoop-2.4.0: start-all.sh failed with "hd1: Error: JAVA_HOME is not set and could not be found."
Find the JAVA_HOME line in both files and change it to the actual JDK path.
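For example, in hadoop-env.sh the default line
export JAVA_HOME=${JAVA_HOME}
becomes an explicit path (the JDK location below is an assumption; substitute your own installation directory):
export JAVA_HOME=/opt/jdk1.7.0_55
Do the same in yarn-env.sh.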
2) slaves
This file lists all DataNode hosts, one hostname per line, so the NameNode knows where to find them.
[toughhou@hd1 hadoop]$ vi slaves
hd2
hd3
3) core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hd1:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/temp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>hd1</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>
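The formatting step itself is not shown above; before the first start, HDFS is formatted once on the NameNode (the "Exiting with status 0" check below is the tail of that command's output):
[toughhou@hd1 hadoop]$ hdfs namenode -format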
If the output ends with "Exiting with status 0", the format succeeded:
14/07/23 03:26:33 INFO util.ExitUtil: Exiting with status 0
4. Start the cluster
[toughhou@hd1 sbin]$ cd /opt/hadoop-2.4.0/sbin
[toughhou@hd1 sbin]$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hd1]
hd1: namenode running as process 12580. Stop it first.
hd2: starting datanode, logging to /opt/hadoop-2.4.0/logs/hadoop-toughhou-datanode-hd2.out
hd3: starting datanode, logging to /opt/hadoop-2.4.0/logs/hadoop-toughhou-datanode-hd3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 12750. Stop it first.
starting yarn daemons
resourcemanager running as process 11900. Stop it first.
hd3: starting nodemanager, logging to /opt/hadoop-2.4.0/logs/yarn-toughhou-nodemanager-hd3.out
hd2: starting nodemanager, logging to /opt/hadoop-2.4.0/logs/yarn-toughhou-nodemanager-hd2.out
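The "running as process … Stop it first." messages mean those daemons were still running from an earlier start, so start-all.sh skipped them; only the DataNodes and NodeManagers on hd2 and hd3 were started fresh here. For a clean restart, run ./stop-all.sh (or stop-dfs.sh plus stop-yarn.sh) before starting again.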
5. Check the status of each node
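One quick check is jps on each host (a sketch of the expected daemons, not captured output):
[toughhou@hd1 ~]$ jps    # expect NameNode, SecondaryNameNode, ResourceManager
[toughhou@hd2 ~]$ jps    # expect DataNode, NodeManager (likewise on hd3)
hdfs dfsadmin -report also lists the live DataNodes and HDFS capacity.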
3) Run wordcount
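The job reads its input from /in/wordcount in HDFS, so upload it first. A minimal sketch (the two local file names are assumptions; they match the "Total input paths to process : 2" line below):
[toughhou@hd1 ~]$ hadoop fs -mkdir -p /in/wordcount
[toughhou@hd1 ~]$ hadoop fs -put file1.txt file2.txt /in/wordcount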
[toughhou@hd1 ~]$ cd /opt/hadoop-2.4.0/share/hadoop/mapreduce/
[toughhou@hd2 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.4.0.jar wordcount /in/wordcount /out/out1
14/07/23 10:42:36 INFO client.RMProxy: Connecting to ResourceManager at hd1/192.168.0.101:18040
14/07/23 10:42:38 INFO input.FileInputFormat: Total input paths to process : 2
14/07/23 10:42:38 INFO mapreduce.JobSubmitter: number of splits:2
14/07/23 10:42:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406105556378_0003
14/07/23 10:42:38 INFO impl.YarnClientImpl: Submitted application application_1406105556378_0003
14/07/23 10:42:38 INFO mapreduce.Job: The url to track the job: http://hd1:8088/proxy/application_1406105556378_0003/
14/07/23 10:42:38 INFO mapreduce.Job: Running job: job_1406105556378_0003
14/07/23 10:42:46 INFO mapreduce.Job: Job job_1406105556378_0003 running in uber mode : false
14/07/23 10:42:46 INFO mapreduce.Job: map 0% reduce 0%
14/07/23 10:42:55 INFO mapreduce.Job: map 100% reduce 0%
14/07/23 10:43:01 INFO mapreduce.Job: map 100% reduce 100%
4) Check the job output
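To see which files the job produced, list the output directory first; part-r-00000 is the standard name for the first reducer's output:
[toughhou@hd2 mapreduce]$ hadoop fs -ls /out/out1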
[toughhou@hd2 mapreduce]$ hadoop fs -cat /out/out1/part-r-00000
, 1
China 1
China, 1
Hello 3
How 1
I 1
Shanghai 1
World 1
are 1
love 1
you 1
With that, the entire hadoop-2.4.0 cluster setup is complete.