<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop/tmp/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8010</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
Note: because the configuration points at /home/hduser/hadoop/tmp/, you must create that directory with mkdir /home/hduser/hadoop/tmp/; otherwise later steps will fail with an error.
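The directory requirement in the note above can be sketched as a small shell helper. This is only an illustration: the function name ensure_writable_dir is hypothetical and not part of Hadoop. It uses mkdir -p so that missing parent directories are created and running it again is harmless.

```shell
# Hypothetical helper (not part of Hadoop): create a directory if it is
# missing and confirm that the current user can write to it.
ensure_writable_dir() {
    dir="$1"
    # -p creates missing parents and does not fail if the directory exists.
    mkdir -p "$dir" || return 1
    # Hadoop runs as the current user, so that user needs write access.
    [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
}

# Usage for this tutorial's layout:
#   ensure_writable_dir /home/hduser/hadoop/tmp
```

Run it as the user that will start Hadoop (hduser in this tutorial), so that the directory ends up owned by that user.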
Edit /home/hduser/hadoop/etc/hadoop/mapred-site.xml:
(1) mv /home/hduser/hadoop/etc/hadoop/mapred-site.xml.template /home/hduser/hadoop/etc/hadoop/mapred-site.xml
(2) Add the following inside <configuration>:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>10</value>
<description>As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).
</description>
</property>
Note: this example does not use the YARN framework to run MapReduce; for a YARN configuration see: http://liyonghui160com.iyunv.com/admin/blogs/2111134
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
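</property>
5: Running Hadoop
Before running Hadoop for the first time, you must format the Hadoop file system with the following commands:
$ cd /home/hduser/hadoop/bin
$ ./hdfs namenode -format
If the format succeeds, a message like the following appears in the last few lines of the log: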
common.Storage: Storage directory /home/hduser/hadoop/tmp/hadoop-hduser/dfs/name has been successfully formatted.
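Start HDFS with the following commands:
$ cd /home/hduser/hadoop/sbin/
$ ./start-dfs.sh
Note: this step prompts for a password several times. To avoid the repeated prompts, set up SSH trust (key-based login) first.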
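One common way to establish the SSH trust mentioned above is to authorize your own public key for logins to localhost. The following is a minimal sketch under assumptions: the function name setup_local_ssh_trust is hypothetical, it assumes OpenSSH with an RSA key in id_rsa, and it assumes the start scripts connect to localhost as the current user.

```shell
# Hypothetical helper: allow key-based SSH logins to this machine for the
# current user. The optional argument overrides the default ~/.ssh directory.
setup_local_ssh_trust() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    # Generate a key pair with an empty passphrase only if none exists yet.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi
    # Append the public key once, so reruns do not add duplicate entries.
    touch "$ssh_dir/authorized_keys"
    grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" \
        || cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    # sshd typically ignores authorized_keys files with loose permissions.
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage: setup_local_ssh_trust
# Then check that "ssh localhost" logs in without asking for a password.
```

An empty passphrase is convenient on a single-machine test setup like this one; on a shared machine a passphrase with ssh-agent is the safer choice.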
hduser@ubuntu:~/hadoop/sbin$ jps
4266 SecondaryNameNode
4116 DataNode
4002 NameNode
Note: jps shows that three processes have started.
$ ./start-yarn.sh
hduser@ubuntu:~/hadoop/sbin$ jps
4688 NodeManager
4266 SecondaryNameNode
4116 DataNode
4002 NameNode
4413 ResourceManager
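The manual jps check can also be scripted. The sketch below is an illustration only: missing_daemons is a hypothetical helper name, and the list of expected daemons is taken from the jps output shown above. It reads jps output on standard input and prints every expected daemon that is absent.

```shell
# Hypothetical helper: read `jps` output on stdin and print each expected
# daemon name (passed as arguments) that does not appear in it.
missing_daemons() {
    jps_output=$(cat)
    for name in "$@"; do
        # jps prints "<pid> <MainClass>"; compare the second column exactly,
        # so NameNode is not confused with SecondaryNameNode.
        printf '%s\n' "$jps_output" \
            | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
            || printf '%s\n' "$name"
    done
}

# Usage after start-dfs.sh and start-yarn.sh:
#   jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager
```

Empty output means all listed daemons are running; any name printed points at a daemon whose log is worth checking.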