However, in the layout above Hadoop's configuration files live inside the Hadoop installation directory, so they would all be overwritten the next time Hadoop is upgraded. It is therefore better to keep the configuration separate from the installation. A good approach is to create a dedicated configuration directory, /home/dbrg/HadoopInstall/hadoop-config/, copy the three files hadoop-site.xml, slaves, and hadoop-env.sh from hadoop/conf/ into it, and point the environment variable $HADOOP_CONF_DIR at that directory. (Oddly, the Getting Started With Hadoop guide on the official site says that copying these three files is enough, but in practice I found that the masters file must be copied into hadoop-config/ as well; otherwise Hadoop fails at startup complaining that it cannot find masters.) Set the environment variable in both /home/dbrg/.bashrc and /etc/profile.
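A minimal sketch of these steps, run as dbrg on dbrg-1 (the file names and target directory come from the setup above; the unpacked Hadoop tree is assumed to sit at /home/dbrg/HadoopInstall/hadoop, which may differ on your machine):

[dbrg@dbrg-1:~]$cd /home/dbrg/HadoopInstall
[dbrg@dbrg-1:HadoopInstall]$mkdir hadoop-config
[dbrg@dbrg-1:HadoopInstall]$cp hadoop/conf/hadoop-site.xml hadoop/conf/slaves hadoop/conf/hadoop-env.sh hadoop/conf/masters hadoop-config/
[dbrg@dbrg-1:HadoopInstall]$echo 'export HADOOP_CONF_DIR=/home/dbrg/HadoopInstall/hadoop-config' >> ~/.bashrc

The same export line goes into /etc/profile (which requires root) so that the variable is set system-wide.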
These two directives belong in /etc/ssh/sshd_config on every node; the first disables password logins so that only key-based authentication is accepted, and the second tells sshd where to find each user's public keys:
PasswordAuthentication no
AuthorizedKeysFile .ssh/authorized_keys
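sshd has to re-read its configuration before the new settings apply; a minimal sketch, assuming a SysV-style init script (on a systemd distribution, systemctl restart sshd does the same):

# run as root on every node after changing /etc/ssh/sshd_config
/etc/init.d/sshd restart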
At this point the SSH configuration on every machine is complete and can be tested, for example by opening an ssh connection from dbrg-1 to dbrg-2:
[dbrg@dbrg-1:~]$ssh dbrg-2
If ssh is configured correctly, a prompt like the following appears:
The authenticity of host [dbrg-2] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?
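Answer yes: OpenSSH then records dbrg-2's host key in ~/.ssh/known_hosts, and later logins skip this prompt. A quick way to confirm that the passwordless setup works end to end is to run a remote command non-interactively (hostname here is just an arbitrary test command; it should print dbrg-2 without asking for a password):

[dbrg@dbrg-1:~]$ssh dbrg-2 hostname
dbrg-2

With SSH in place, the cluster parameters themselves go into hadoop-site.xml in the hadoop-config/ directory; values set there override the defaults from hadoop-default.xml. The file should contain the following properties: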
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>dbrg-1:9000</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>dbrg-1:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/dbrg/HadoopInstall/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/dbrg/HadoopInstall/filesystem/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/dbrg/HadoopInstall/filesystem/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
</configuration>
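Hadoop creates most of these locations on demand, but as the dfs.data.dir description above notes, data directories that do not exist are silently ignored, so it is worth pre-creating them on each node. A minimal sketch using the paths above; the name directory is initialized later by bin/hadoop namenode -format, so it is omitted here:

[dbrg@dbrg-1:~]$mkdir -p /home/dbrg/HadoopInstall/tmp
[dbrg@dbrg-1:~]$mkdir -p /home/dbrg/HadoopInstall/filesystem/data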