export PATH=$PATH:$JAVA_HOME/bin
[root@hadoop ~]# source /etc/profile
[root@hadoop ~]# source /etc/bashrc
[root@hadoop ~]# java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
[root@hadoop ~]#
Note: for the configuration to take effect immediately, run source /etc/profile and source /etc/bashrc so the current shell reloads the updated settings.
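To double-check that the variables are now visible in the current shell, you can print them (this quick check is an extra illustration, not one of the original steps):
[root@hadoop ~]# echo $JAVA_HOME
[root@hadoop ~]# echo $PATH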
3.2 Installing Hadoop
Download the Hadoop package mentioned earlier to the /root directory.
Then install Hadoop with the following commands:
[root@hadoop ~]# cd /root
[root@hadoop ~]# tar -zxf /root/hadoop-2.9.0.tar.gz -C /usr/local
Hadoop 2.9.0 is now installed under /usr/local/hadoop-2.9.0, which we can verify with the following command:
[root@hadoop ~]# /usr/local/hadoop-2.9.0/bin/hadoop version
Hadoop 2.9.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50
Compiled by arsuresh on 2017-11-13T23:15Z
Compiled with protoc 2.5.0
From source with checksum 0a76a9a32a5257331741f8d5932f183
This command was run using /usr/local/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar
[root@hadoop ~]#
Typing this long path every time you run hadoop is inconvenient, especially when entering commands by hand. Just as we did for the Java environment variables, we can add the Hadoop-related variables to the environment configuration files /etc/profile and /etc/bashrc. Append the following lines to the end of both files:
export HADOOP_HOME=/usr/local/hadoop-2.9.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
After saving the changes, don't forget to run the following commands to reload the environment variables:
[root@hadoop ~]# source /etc/profile
[root@hadoop ~]# source /etc/bashrc
Now hadoop commands can be run without the full path, as shown here:
[root@hadoop ~]# hadoop version
Hadoop 2.9.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 756ebc8394e473ac25feac05fa493f6d612e6c50
Compiled by arsuresh on 2017-11-13T23:15Z
Compiled with protoc 2.5.0
From source with checksum 0a76a9a32a5257331741f8d5932f183
This command was run using /usr/local/hadoop-2.9.0/share/hadoop/common/hadoop-common-2.9.0.jar
4. Configuration
4.1 User configuration
For easier management and maintenance, we create a dedicated system account to run the Hadoop scripts and jobs. The account is named hadoop and is created with the following command:
useradd hadoop -s /bin/bash -m
The command above creates a user and group named hadoop; -s /bin/bash sets bash as its login shell, and -m creates its home directory. You can inspect the new account with the id command:
[root@hadoop ~]# id hadoop
uid=1001(hadoop) gid=1001(hadoop) groups=1001(hadoop)
A user created this way has no password yet, so set one with passwd:
[root@hadoop ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
Because the hadoop user sometimes needs to run privileged commands, we grant it sudo rights. Open /etc/sudoers, find the line "root ALL=(ALL) ALL", and add the following line below it:
hadoop ALL=(ALL) ALL
Then save the file (remember that if you edit it with vim or a similar editor, you must force-save with ":wq!", because the file is read-only).
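To verify that the new entry works, you can list the hadoop user's sudo privileges as root (an extra check, not one of the original steps; sudo -l -U prints the sudo rules that apply to the given user):
[root@hadoop ~]# sudo -l -U hadoop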
4.2 Passwordless login configuration
Although this article covers a pseudo-distributed Hadoop deployment, some of the steps are still distributed in nature and require logging in over ssh. Note that ssh here is not the SSH framework stack in Java (Struts + Spring + Hibernate); SSH stands for Secure Shell, a protocol for remote logins between Linux servers.
If you are currently logged in as root, switch to the hadoop user:
[root@hadoop hadoop]# su hadoop
[hadoop@hadoop ~]$ cd ~
[hadoop@hadoop ~]$ pwd
/home/hadoop
As shown, the hadoop user's home directory is /home/hadoop. Next we log in to the local machine over ssh. On the first login you are asked whether to continue connecting; type "yes", then enter the password of the ssh user (here hadoop) on the target server (here localhost). After a successful login, type "exit" to log out, as shown below:
[hadoop@hadoop ~]$ ssh localhost
hadoop@localhost's password:
Last login: Sat Dec 2 11:48:52 2017 from localhost
[hadoop@hadoop ~]$ rm -rf /home/hadoop/.ssh
[hadoop@hadoop ~]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is aa:21:ce:7a:b2:06:3e:ff:3f:3e:cc:dd:40:38:64:9d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
hadoop@localhost's password:
Last login: Sat Dec 2 11:49:58 2017 from localhost
[hadoop@hadoop ~]$ exit
logout
Connection to localhost closed.
These operations created the directory /home/hadoop/.ssh and the known_hosts file inside it.
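You can confirm this by listing the directory (an extra illustration):
[hadoop@hadoop ~]$ ls -la /home/hadoop/.ssh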
With this setup, every login still prompts for a password, but while Hadoop runs it executes commands on remote servers through non-interactive shells, so passwordless login must be configured. Create a key pair with the following command (just press Enter at every prompt):
[hadoop@hadoop ~]$ cd /home/hadoop/.ssh/
[hadoop@hadoop .ssh]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
19:b3:11:a5:6b:a3:26:03:c9:b9:b3:b8:02:ea:c9:25 hadoop@hadoop
The key's randomart image is:
+--[ RSA 2048]----+
| ... |
| o |
| = |
| . o B |
| = S |
|. o o . |
|oEo.o o |
|+.+o + |
|==. |
+-----------------+
Then append the public key to the authorized_keys file and set its permissions to 600:
[hadoop@hadoop .ssh]$ cat id_rsa.pub >> authorized_keys
[hadoop@hadoop .ssh]$ chmod 600 authorized_keys
Now ssh localhost no longer asks for a password:
[hadoop@hadoop .ssh]$ ssh localhost
Last login: Sat Dec 2 11:50:44 2017 from localhost
[hadoop@hadoop ~]$ exit
logout
Connection to localhost closed.
Note: the ninth article in this series described a similar procedure for passwordless git logins, and mentioned there that git also transfers files over the ssh protocol.
4.3 Hadoop configuration
4.3.1 Changing the owner of the Hadoop installation directory
First check whether the owner and group of the Hadoop installation directory /usr/local/hadoop-2.9.0 are hadoop; if not, change them with chown:
[hadoop@hadoop .ssh]$ ls -lh /usr/local/hadoop-2.9.0
total 128K
drwxr-xr-x. 2 root root 194 Nov 14 07:28 bin
drwxr-xr-x. 3 root root 20 Nov 14 07:28 etc
drwxr-xr-x. 2 root root 106 Nov 14 07:28 include
drwxr-xr-x. 3 root root 20 Nov 14 07:28 lib
drwxr-xr-x. 2 root root 239 Nov 14 07:28 libexec
-rw-r--r--. 1 root root 104K Nov 14 07:28 LICENSE.txt
-rw-r--r--. 1 root root 16K Nov 14 07:28 NOTICE.txt
-rw-r--r--. 1 root root 1.4K Nov 14 07:28 README.txt
drwxr-xr-x. 3 root root 4.0K Nov 14 07:28 sbin
drwxr-xr-x. 4 root root 31 Nov 14 07:28 share
The following command changes the owner and group:
[hadoop@hadoop .ssh]$ sudo chown -R hadoop:hadoop /usr/local/hadoop-2.9.0
We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:
#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.
[sudo] password for hadoop:
Listing the directory again shows that the command succeeded:
[hadoop@hadoop .ssh]$ ls -lh /usr/local/hadoop-2.9.0
total 128K
drwxr-xr-x. 2 hadoop hadoop 194 Nov 14 07:28 bin
drwxr-xr-x. 3 hadoop hadoop 20 Nov 14 07:28 etc
drwxr-xr-x. 2 hadoop hadoop 106 Nov 14 07:28 include
drwxr-xr-x. 3 hadoop hadoop 20 Nov 14 07:28 lib
drwxr-xr-x. 2 hadoop hadoop 239 Nov 14 07:28 libexec
-rw-r--r--. 1 hadoop hadoop 104K Nov 14 07:28 LICENSE.txt
-rw-r--r--. 1 hadoop hadoop 16K Nov 14 07:28 NOTICE.txt
-rw-r--r--. 1 hadoop hadoop 1.4K Nov 14 07:28 README.txt
drwxr-xr-x. 3 hadoop hadoop 4.0K Nov 14 07:28 sbin
drwxr-xr-x. 4 hadoop hadoop 31 Nov 14 07:28 share
4.3.2 Changing the Hadoop configuration
The Hadoop configuration files live in /usr/local/hadoop-2.9.0/etc/hadoop. The main ones are:
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
The last two mainly contain YARN-related settings.
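For reference, a minimal pseudo-distributed YARN setup conventionally tells MapReduce to run on YARN in mapred-site.xml (a sketch of the usual Hadoop 2.x values, not taken from this article's own files; in 2.9.0 the file is typically created by copying mapred-site.xml.template):

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

and enables the shuffle auxiliary service in yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>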
Change core-site.xml to the following content:
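A typical pseudo-distributed core-site.xml looks like this (a sketch built from the conventional single-node values; the hdfs://localhost:9000 address is an assumption):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

For a single node, hdfs-site.xml usually also sets the replication factor to 1 (again a conventional sketch):

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

With the configuration in place, HDFS is initialized by formatting the NameNode:

[hadoop@hadoop hadoop]$ hdfs namenode -format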
The output of the format command should include the line "INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.", which indicates that formatting succeeded.
5.2 Starting the NameNode and DataNode daemons
Start the NameNode and DataNode daemons with the start-dfs.sh command. The first time it runs you are asked whether to continue connecting; typing "yes" is enough, and no password is needed because passwordless ssh login is already configured. Be sure to run this as the hadoop user we created; if you are logged in as someone else, switch with su hadoop first:
[hadoop@hadoop hadoop]$ start-dfs.sh
17/12/02 13:54:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.9.0/logs/hadoop-hadoop-namenode-hadoop.out
localhost: starting datanode, logging to /usr/local/hadoop-2.9.0/logs/hadoop-hadoop-datanode-hadoop.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is aa:21:ce:7a:b2:06:3e:ff:3f:3e:cc:dd:40:38:64:9d.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.9.0/logs/hadoop-hadoop-secondarynamenode-hadoop.out
17/12/02 13:54:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
We can then check what was started with jps, a tool that ships with the JDK:
[hadoop@hadoop hadoop]$ jps
11441 Jps
11203 SecondaryNameNode
10903 NameNode
11004 DataNode
If startup succeeded, you will see the processes listed above. If the NameNode or DataNode process is missing, re-check your configuration, or look through the logs under /usr/local/hadoop-2.9.0/logs for configuration errors.
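For example, to inspect the tail of the NameNode log (the exact file name is an assumption following the hadoop-<user>-namenode-<host>.log pattern that matches the .out paths in the startup output):
[hadoop@hadoop hadoop]$ tail -n 50 /usr/local/hadoop-2.9.0/logs/hadoop-hadoop-namenode-hadoop.log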
At this point you can open http://localhost:50070/ in a browser to view information about the NameNode, the DataNodes, and HDFS (screenshot of the web interface omitted).
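On a server without a graphical browser, the same page can be probed from the shell (an extra illustration; assumes curl is installed):
[hadoop@hadoop hadoop]$ curl -s http://localhost:50070/ | head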