export JAVA_HOME=/usr/lib/jvm/java-8-oracle
7. Hadoop Configuration
The Hadoop configuration here mainly involves four files: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml.
Before continuing with the steps below, be sure to read the following passage (excerpted from the web) for a better understanding:
Hadoop Distributed File System: A distributed file system that provides high-throughput access to application data. An HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data. If you compare HDFS to traditional storage structures (e.g. FAT, NTFS), then the NameNode is analogous to a directory node structure, and a DataNode is analogous to the actual file storage blocks.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
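As a concrete reference for the first of the four files listed above, a minimal core-site.xml normally only needs to point every node at the NameNode. The snippet below is a sketch, not the exact file used in this setup; in particular the hdfs://master:9000 address (and its port) is an assumption and should be adjusted to your environment:

<!-- etc/hadoop/core-site.xml (sketch; port 9000 is an assumed value) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>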
In etc/hadoop/hdfs-site.xml the following properties are set:

dfs.replication
The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.

dfs.namenode.name.dir = /data/hduser/hdfs/namenode
Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.

dfs.datanode.data.dir = /data/hduser/hdfs/datanode
Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
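Put together, the corresponding section of hdfs-site.xml looks roughly like the sketch below. The two directory values come from the property list above; the dfs.replication value of 3 is an assumption (the text does not state the value actually used), so set it to whatever fits your cluster:

<!-- etc/hadoop/hdfs-site.xml (sketch; dfs.replication = 3 is an assumed value) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hduser/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hduser/hdfs/datanode</value>
  </property>
</configuration>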
⑤ Update the slaves file
On the master node, edit the slaves file: add the hostnames (or IP addresses) of the master and slave nodes, and remove "localhost":
$ vim /home/hduser/hadoop/etc/hadoop/slaves
master
slave-1
slave-2
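If the configuration files edited above have not yet been copied to the slave nodes, they can be pushed out with something like the sketch below. This assumes the same /home/hduser/hadoop installation path on every node and passwordless SSH from master to the slaves; adjust hostnames and paths as needed:

# Sketch: copy the edited configuration directory to each slave (assumed hosts and paths)
$ scp -r /home/hduser/hadoop/etc/hadoop/* slave-1:/home/hduser/hadoop/etc/hadoop/
$ scp -r /home/hduser/hadoop/etc/hadoop/* slave-2:/home/hduser/hadoop/etc/hadoop/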
⑥ Format the NameNode:
The NameNode needs to be formatted before the cluster is started. On the master, run:
$ hdfs namenode -format
If you see an INFO message similar to "Storage directory /home/hduser/data/hduser/hdfs/namenode has been successfully formatted.", the format succeeded.
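As an extra check (not part of the original steps), the freshly formatted metadata directory should now contain a current/ subdirectory. The path below follows the directory reported in the log line above; use whatever path your own log reports:

# Sketch: confirm the NameNode metadata directory was initialized
$ ls /home/hduser/data/hduser/hdfs/namenode/current
# Expect to see VERSION, seen_txid and an fsimage_* file here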
⑦ Start the services
You can start all of the services at once with Hadoop's "start-all.sh" script, or start dfs and yarn separately (see the sketch further below). The script can be called by its absolute path, /home/hduser/hadoop/sbin/start-all.sh, or simply as start-all.sh (since PATH was already updated earlier):
$ start-all.sh
As shown in the figure below, if no error messages appear, the cluster has started successfully:
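If you prefer to start the two layers separately, as mentioned above, the equivalent pair of scripts (also in /home/hduser/hadoop/sbin) is:

# Start HDFS daemons (NameNode, DataNodes, SecondaryNameNode)
$ start-dfs.sh
# Start YARN daemons (ResourceManager, NodeManagers)
$ start-yarn.sh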
⑧ Verification
Use the jps command on the master and on the slaves to check which services have started.
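Roughly, with the slaves file shown above (which lists master as a worker node too), the output should look something like the sketch below. The process IDs are placeholders, and the exact set of daemons can differ slightly depending on configuration:

# On master (sketch; PIDs are placeholders)
$ jps
20001 NameNode
20002 SecondaryNameNode
20003 DataNode
20004 ResourceManager
20005 NodeManager
20006 Jps

# On slave-1 / slave-2 (sketch)
$ jps
10001 DataNode
10002 NodeManager
10003 Jps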
Web UI verification:
Open http://master:50070 in a browser.
Check the YARN web console at http://master:8088/cluster/nodes
If all nodes have started correctly, they are all listed here:
The share directory of the extracted Hadoop distribution provides several example jar packages; let's run one of them to see the effect:
$ hadoop jar /home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 30 100
After it starts, open http://master:8088/cluster/apps in a browser and you can see the job that is currently running:
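The running job can also be checked from the command line (an optional extra, not part of the original steps):

# List applications currently known to the YARN ResourceManager
$ yarn application -list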