
Deploying a Hadoop 2.7.1, MySQL 5.5.46, Hive 1.2.1 and Spark 1.5.0 cluster on 64-bit CentOS 6.4



Deployment environment:

CentOS 6.4 64-bit
Hadoop 2.7.1, MySQL 5.5.46, Hive 1.2.1, Scala 2.11.7, Spark 1.5.0
jdk1.7.0_79

Host IPs:
master (namenode): 10.10.4.115
slave1 (datanode): 10.10.4.116
slave2 (datanode): 10.10.4.117

Part I: Environment preparation and Hadoop deployment
(1) Environment preparation (the hosts file must be identical on every node, otherwise errors may occur):
#vi /etc/hosts
127.0.0.1       localhost
10.10.4.115   master.hadoop
10.10.4.116   slave1.hadoop
10.10.4.117   slave2.hadoop

(2) Set the hostname:
#vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master.hadoop

(3) Make the new hostname take effect immediately:
Either reboot the system,
or run:
#hostname master.hadoop
or
#echo master.hadoop > /proc/sys/kernel/hostname
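
As a quick optional sanity check, confirm that the new hostname is active and that the names in /etc/hosts resolve from this node:
#hostname                      # should print master.hadoop
#ping -c 1 slave1.hadoop       # should resolve to 10.10.4.116
#ping -c 1 slave2.hadoop       # should resolve to 10.10.4.117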


(4) Stop the firewall service:
#service iptables stop
Prevent iptables from starting again after a reboot:
#chkconfig iptables off
Confirm it is disabled:
#chkconfig --list|grep iptables
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off
Disabling iptables with chkconfig does not by itself keep it off for good; after a reboot it can still come back at runlevel 3. Clearing the saved rules achieves the same effect as permanently disabling it:
#echo "" > /etc/sysconfig/iptables

(5) Disable SELinux:
#vi /etc/sysconfig/selinux
Change the setting to SELINUX=disabled
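
Editing /etc/sysconfig/selinux only takes effect after a reboot; to turn SELinux off in the running system right away, the usual commands are:
#setenforce 0        # switch to permissive mode immediately
#getenforce          # should now report Permissive (or Disabled after a reboot with SELINUX=disabled)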

(6) Remove the pre-installed OpenJDK (the Hadoop documentation recommends the Oracle JDK):
Check whether it is installed:
#rpm -qa|grep openjdk
Remove it (use the full package name reported by rpm -qa):
#rpm -e <openjdk package name>

(7) Download and install Oracle JDK 1.7.0_79:
#rpm -ivh jdk-7u79-linux-x64.rpm

(8) Set environment variables:
#vi /etc/profile
##############JAVA#############################
export JAVA_HOME=/usr/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar
export JRE_HOME=$JAVA_HOME/jre
export PATH=/usr/local/mysql/bin:$JAVA_HOME/bin:$PATH
################END############################

(9) Apply the changes:
#source /etc/profile

(10) Test:
#echo $JAVA_HOME
/usr/java/jdk1.7.0_79
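
A further optional check is that the Oracle JDK, and not a leftover OpenJDK, is the one found on the PATH:
#java -version        # should report java version "1.7.0_79" and "Java(TM) SE Runtime Environment", not OpenJDK
#which java           # with the PATH above this should be /usr/java/jdk1.7.0_79/bin/java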

(11) Create the hadoop user and group:
#useradd hadoop -p mangocity      # note: -p expects an already-encrypted password, so running passwd hadoop afterwards is more reliable

(12) Configure passwordless SSH login from master to the slaves:
Enable key-based login in the SSH daemon configuration:
#vi /etc/ssh/sshd_config
RSAAuthentication yes                      # enable RSA authentication
PubkeyAuthentication yes                   # enable public/private key authentication
AuthorizedKeysFile .ssh/authorized_keys    # path of the public key file (the one generated below)

(13) Restart the SSH service so the changes take effect:
#service sshd restart

####### Apply the settings above on the slave servers in the same way as on master ##########


(14) On master, switch to the hadoop user to generate the key pair:
#su - hadoop

(15) Generate the key pair:
$ssh-keygen -t rsa -P ''
$more /home/hadoop/.ssh/id_rsa.pub
ssh-rsa AAAB3**************************************************7RbuPw== hadoop@master.hadoop

(16) Append the contents of id_rsa.pub to the authorized keys file:
$cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys

(17) Verify local passwordless login:
Before testing, make sure /home/hadoop/.ssh/authorized_keys has permission 600, i.e. only the hadoop user has read/write access:
$ ll ./.ssh/authorized_keys
-rw-------. 1 hadoop hadoop 402 Nov 18 01:53 ./.ssh/authorized_keys
Wrong permissions will make passwordless login fail; if they are not 600, fix them:
$chmod 600 ~/.ssh/authorized_keys

(18) Test:
$ssh 127.0.0.1
The authenticity of host '127.0.0.1 (127.0.0.1)' can't be established.
RSA key fingerprint is 31:45:00:ff:a7:80:23:f2:2a:86:7f:13:0d:bf:f0:d6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '127.0.0.1' (RSA) to the list of known hosts.
hadoop@127.0.0.1's password:      (the first login asks for a password)
$exit
$ssh 127.0.0.1
Last login: Mon Nov 23 09:52:02 2015 from localhost      (the second login needs no password, so local passwordless login works)
$

(19) Set up passwordless SSH login on the slave servers:
At this point the hadoop home directory on the slaves has no .ssh directory or authorized_keys file, so create them by hand:
#su - hadoop
$ls -a
.  ..  .bash_history  .bash_logout  .bash_profile  .bashrc  .gnome2  .viminfo

(20) Create the directory and set its permissions:
$mkdir ./.ssh                                                   
$chmod 700 ./.ssh                                          
                     
(21) Create the file and set its permissions:
$touch ./.ssh/authorized_keys
$chmod 600 ./.ssh/authorized_keys
$ls -a
.  ..  .bash_history  .bash_logout  .bash_profile  .bashrc  .gnome2  .ssh  .viminfo

(22) Copy the id_rsa.pub generated by the hadoop user on master into the hadoop home directory on the slave:
$scp hadoop@master.hadoop:/home/hadoop/.ssh/id_rsa.pub /home/hadoop/

(23) Append the copied public key to the authorized_keys file:
$cat /home/hadoop/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
Repeat the steps above on slave2 as well, including editing /etc/ssh/sshd_config and restarting the sshd service.

(24) Verify passwordless login from master to the slave servers:
$ssh slave1.hadoop
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave1.hadoop' (RSA) to the list of known hosts.
hadoop@slave1.hadoop's password:      (the first login asks for a password)
$exit
logout
Connection to slave1.hadoop closed.
$ ssh slave1.hadoop
Last login: Mon Nov 23 19:25:05 2015 from master.hadoop      (the second login needs no password and the session is now the hadoop user on slave1, so passwordless login works)
$
Verifying passwordless login from master to slave2 works the same way as for slave1.
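
As a side note, steps (19) through (23) can also be done in a single command per slave with ssh-copy-id, which creates .ssh, appends the key and sets the permissions automatically (a shortcut that still asks for the hadoop password once; the manual steps above give the same result):
$ssh-copy-id hadoop@slave1.hadoop
$ssh-copy-id hadoop@slave2.hadoop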

Deploy Hadoop 2.7.1 on master:
(1) Download and extract the Hadoop package:
#tar -xvf hadoop-2.7.1.tar.gz
#cp -Rv ./hadoop-2.7.1 /usr/local/hadoop/

(2) Create the directories that will be needed under /usr/local/hadoop:
#mkdir -p /usr/local/hadoop/tmp /usr/local/hadoop/dfs/data /usr/local/hadoop/dfs/name
#chown -R hadoop:hadoop /usr/local/hadoop/

(3) Set environment variables (the slaves use the same settings as master):
#vi /etc/profile
##############Hadoop###########################
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=/usr/local/mysql/bin:$HADOOP_HOME/bin:$PATH
################END############################

(4) Apply the changes:
#source /etc/profile
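
A quick optional check that the new variables point at the right installation:
#hadoop version        # should report Hadoop 2.7.1 and show a jar path under /usr/local/hadoop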

(5) Switch to the hadoop user and edit the Hadoop configuration files under $HADOOP_HOME/etc/hadoop/; the main ones are core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml:
#su - hadoop
$cd /usr/local/hadoop/etc/hadoop/

(6) Edit the env files: append JAVA_HOME to the end of hadoop-env.sh, mapred-env.sh and yarn-env.sh:
$vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79      ### add this line at the end of the file; edit the other two files the same way

Edit the site files; the relevant settings are:
(7) core-site.xml:
$more core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.hadoop:9000</value>    <!-- use the hostname here; the later Hive integration depends on it -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>



(8) mapred-site.xml:
$more mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>



(9) hdfs-site.xml:
$more hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>10.10.4.115:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>


(10) yarn-site.xml:
$more yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>10.10.4.115</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>10.10.4.115:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>10.10.4.115:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>10.10.4.115:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>10.10.4.115:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>10.10.4.115:8088</value>
  </property>
</configuration>


(11) Add the datanode nodes:
$vi slaves
10.10.4.116                     # either IP addresses or hostnames work here
10.10.4.117

(12) Deploy the datanode nodes:
Copy the hadoop directory to slave1 and slave2 and fix its ownership (run the chown on each slave):
#scp -r /usr/local/hadoop/ root@slave1.hadoop:/usr/local/
#chown -R hadoop:hadoop /usr/local/hadoop      # on slave1
#scp -r /usr/local/hadoop/ root@slave2.hadoop:/usr/local/
#chown -R hadoop:hadoop /usr/local/hadoop      # on slave2


(13) Format the namenode:
#su - hadoop
$cd /usr/local/hadoop
$./bin/hdfs namenode -format
The message "successfully formatted" indicates the format succeeded.

(14) Start Hadoop:
$./sbin/start-all.sh

(15) Check the process status on master (namenode):
$ jps
2697 Jps
7820 SecondaryNameNode
7985 ResourceManager
7614 NameNode
Output like the above means the namenode services are running normally.

(16) Check the process status on the slaves (datanodes):
$ jps
18342 DataNode
30450 Jps
18458 NodeManager

(17) Output like the above means the datanode services are running normally.
The cluster can also be monitored through the web UIs:
http://master.hadoop:8088
http://s4.运维网.com/wyfs02/M01/76/7B/wKiom1ZUM2vB9Bc9AALshzIeWAs689.jpg
http://master.hadoop:50070
http://s4.运维网.com/wyfs02/M00/76/7B/wKiom1ZUM1fgsCyMAAVCWLBYRjg627.jpg
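Beyond the web UIs, a quick command-line check from the hadoop user on master confirms that both datanodes have registered and that HDFS accepts writes (a small sketch; /tmp and hosts.txt are just example paths):
$hdfs dfsadmin -report              # should report "Live datanodes (2)"
$hdfs dfs -mkdir -p /tmp
$hdfs dfs -put /etc/hosts /tmp/hosts.txt
$hdfs dfs -ls /tmp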
At this point the Hadoop cluster installation is complete; next comes the integration with Hive.

Part II: Integrating Hive with Hadoop (Hive is installed on the namenode):
(1) Install MySQL 5.5.46 (a yum install would also work; here it is built from source):
Install the build dependencies:
#yum -y install gcc gcc-c++ ncurses ncurses-devel cmake

(2) Download the source package:
#wget http://cdn.mysql.com/Downloads/MySQL-5.5/mysql-5.5.46.tar.gz

(3) Add the mysql user:
#useradd -M -s /sbin/nologin mysql

(4) Configure the build (since MySQL 5.5, cmake replaces the old configure script):
#tar -xzvf mysql-5.5.46.tar.gz
#mkdir -p /data/mysql
#cd mysql-5.5.46
#cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/mysql \
-DMYSQL_DATADIR=/data/mysql \
-DSYSCONFDIR=/etc \
-DWITH_INNOBASE_STORAGE_ENGINE=1 \
-DWITH_PARTITION_STORAGE_ENGINE=1 \
-DWITH_FEDERATED_STORAGE_ENGINE=1 \
-DWITH_BLACKHOLE_STORAGE_ENGINE=1 \
-DWITH_MYISAM_STORAGE_ENGINE=1 \
-DENABLED_LOCAL_INFILE=1 \
-DENABLE_DTRACE=0 \
-DDEFAULT_CHARSET=utf8mb4 \
-DDEFAULT_COLLATION=utf8mb4_general_ci \
-DWITH_EMBEDDED_SERVER=1

(5) Compile and install:
#make
#make install

(6) Install the init script and enable it at boot:
#cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld
#chmod +x /etc/init.d/mysqld
#chkconfig --add mysqld
#chkconfig mysqld on
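
Before initializing the database in the next step, a server configuration file is normally put in place; the source tree ships sample configs under support-files (a sketch that assumes the bundled my-medium.cnf is an acceptable starting point):
#cp /usr/local/mysql/support-files/my-medium.cnf /etc/my.cnf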

(7) Initialize the database (MySQL 5.5 uses the mysql_install_db script; the --initialize options only exist in 5.7 and later):
#/usr/local/mysql/scripts/mysql_install_db --user=mysql --basedir=/usr/local/mysql --datadir=/data/mysql

(8) Start MySQL and set the root password:
#service mysqld start
#mysql -uroot -p      # a freshly initialized root account has an empty password; just press Enter, then set one with SET PASSWORD if desired

(9) Create the hive database:
mysql>create database hive;

(10) Create the hive account and grant it access to MySQL:
mysql>grant all privileges on *.* to 'hive'@'%' identified by '123456' with grant option;      # set the username, password and access rights
mysql>flush privileges;
mysql>alter database hive character set latin1;      # change the character set (avoids index-length problems in the metastore schema)
mysql>flush privileges;
mysql>exit

(11) Test the hive account (with the password set in the grant above):
#mysql -uhive -p123456
mysql>
Login succeeded.

Install Hive:
(1) Set global environment variables (append to /etc/profile):
#vi /etc/profile
#############Hive##############################
export HIVE_HOME=/usr/local/hive-1.2.1
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_LIB=$HIVE_HOME/lib
export CLASSPATH=$CLASSPATH:$HIVE_LIB
export PATH=$HIVE_HOME/bin/:$PATH
###################END#########################

(2) Apply the changes:
#source /etc/profile

(3) Download and extract Hive, then move it to the location the environment variables point at:
#tar -xzvf apache-hive-1.2.1-bin.tar.gz
#mv ./apache-hive-1.2.1-bin /usr/local/hive-1.2.1


(4) Change ownership of the Hive directory and switch to the hadoop user:
#chown -R hadoop:hadoop /usr/local/hive-1.2.1
#su - hadoop
$cd /usr/local/hive-1.2.1/


(5) Copy the downloaded mysql-connector-java-5.1.37-bin.jar (the MySQL JDBC driver) into Hive's lib directory:
$cp /home/hadoop/mysql-connector-java-5.1.37-bin.jar /usr/local/hive-1.2.1/lib/

(6) Copy jline-2.12.jar from $HIVE_HOME/lib into the corresponding Hadoop directory, replacing the older jline-0.9.94.jar; locate it with find:
$find /usr/local/hadoop -name 'jline*.jar'
$mv /usr/local/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar /usr/local/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar.bak      # back up the old jar
$cp /usr/local/hive-1.2.1/lib/jline-2.12.jar /usr/local/hadoop/share/hadoop/yarn/lib/

(7) Set Hive environment variables:
Edit $HIVE_HOME/bin/hive-config.sh and append at the end:
$vi /usr/local/hive-1.2.1/bin/hive-config.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive-1.2.1

(8) Add the Hive variables to the Hadoop environment:
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and append at the end:
$vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export HIVE_HOME=/usr/local/hive-1.2.1
export HIVE_CONF_DIR=/usr/local/hive-1.2.1/conf

(9) Configure the Hive-to-MySQL connection parameters:
$vi /usr/local/hive-1.2.1/conf/hive-site.xml
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>    <!-- local vs. remote MySQL metastore; local here -->
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>    <!-- JDBC connection parameters -->
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>    <!-- MySQL account used for the connection -->
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>    <!-- password of that MySQL account -->
    <description>password to use against metastore database</description>
  </property>
</configuration>
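
Before the first start, the metastore tables can be created explicitly with the schematool shipped with Hive 1.2.1 (optional; with createDatabaseIfNotExist=true Hive will normally also create them on first use, but running schematool up front surfaces MySQL connection problems early):
$cd /usr/local/hive-1.2.1
$./bin/schematool -dbType mysql -initSchema      # creates the metastore tables in the MySQL "hive" database
$./bin/schematool -dbType mysql -info            # prints the schema version, confirming the JDBC connection works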



(10) Start Hive:
$./bin/hive --service metastore &            # start the metastore service in the background
$./bin/hive --service hiveserver2 &          # start HiveServer2 in the background (Hive 1.2.1 no longer ships the old hiveserver service)

(11) Verify: jps should now show a RunJar process:
$jps
5640 NameNode
6013 ResourceManager
5847 SecondaryNameNode
6490 RunJar
13978 Jps

(12) Test the Hive, Hadoop and MySQL integration:
$ ./bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
(remaining SLF4J multiple-binding warnings omitted)
15/11/24 10:12:24 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
Logging initialized using configuration in jar:file:/usr/local/hive-1.2.1/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 1.878 seconds, Fetched: 1 row(s)
hive> show tables;
OK
Time taken: 2.025 seconds, Fetched: 1 row(s)
hive> create table test(name string, age int);                # try creating a table
OK
Time taken: 2.952 seconds
hive> desc test;
OK
name                  string                                    
age                     int                                       
Time taken: 0.225 seconds, Fetched: 2 row(s)
hive> exit;
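
As a final cross-check that the table metadata really went into MySQL (and not into a local Derby database), query the metastore's TBLS table (a sketch; TBLS is one of the standard metastore tables created above):
#mysql -uhive -p123456 -e "select TBL_NAME, TBL_TYPE from hive.TBLS;"      # should list the test table as MANAGED_TABLE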

At this point the Hadoop/Hive integration is complete; next comes the integration with Spark.

Part III: Integrating Spark with Hadoop
Install Scala:

(1) Extract the archive and move it to /usr/local:
#tar -zxvf ./scala-2.11.7.tgz
#mv ./scala-2.11.7 /usr/local/scala-2.11.7


(2) Give the hadoop user ownership of the directory:
#chown -R hadoop:hadoop /usr/local/scala-2.11.7

(3) Set environment variables:
#vi /etc/profile
Append the following:
#############Scala#############################
export SCALA_HOME=/usr/local/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
####################END########################

(4) Apply the changes:
#source /etc/profile

(5) Verify that Scala is installed and configured correctly:
$ scala -version
Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL

Install Spark

Download Spark 1.5.0 from the official site: spark-1.5.0-bin-hadoop2.6.tgz

(1) Extract the archive into /usr/local:
#tar -zxvf ./spark-1.5.0-bin-hadoop2.6.tgz
#mv ./spark-1.5.0-bin-hadoop2.6 /usr/local/spark-1.5.0

(2) Give the hadoop user ownership of the directory:
#chown -R hadoop:hadoop /usr/local/spark-1.5.0

(3) Set environment variables:
#vi /etc/profile
Append the following:
############Spark##############################
export SPARK_HOME=/usr/local/spark-1.5.0
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
#################END###########################

(4) Apply the changes:
#source /etc/profile

(5) Verify that Spark is installed and configured correctly:
#spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0
      /_/

Type --help for more information.

(6) Edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh and add the following:
export SPARK_MASTER_IP=10.10.4.115
export SPARK_WORKER_MEMORY=1024m

(7) Edit the Spark configuration:
#su - hadoop
$cd /usr/local/spark-1.5.0


(8) Copy the template and edit spark-env.sh to set the relevant environment variables:
$cp ./conf/spark-env.sh.template ./conf/spark-env.sh
$vi ./conf/spark-env.sh
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/usr/local/scala-2.11.7
export JAVA_HOME=/usr/java/jdk1.7.0_79
export SPARK_MASTER_IP=10.10.4.115
export SPARK_WORKER_MEMORY=1024m

(9) Configure the slave (worker) nodes:
$vi ./conf/slaves
# A Spark Worker will be started on each of the machines listed below.
slave1.hadoop
slave2.hadoop

(10) Switch back to root and copy Scala, Spark and the related environment settings to the datanode nodes, i.e. slave1 and slave2:
#scp -r /usr/local/scala-2.11.7 root@slave1.hadoop:/usr/local/scala-2.11.7
#scp -r /usr/local/spark-1.5.0 root@slave1.hadoop:/usr/local/spark-1.5.0
Log in to slave1 and give the hadoop user ownership of the two copied directories:
#chown -R hadoop:hadoop /usr/local/scala-2.11.7 /usr/local/spark-1.5.0
Make /etc/profile and /usr/local/hadoop/etc/hadoop/hadoop-env.sh on slave1 match master, then source them.

#scp -r /usr/local/scala-2.11.7 root@slave2.hadoop:/usr/local/scala-2.11.7
#scp -r /usr/local/spark-1.5.0 root@slave2.hadoop:/usr/local/spark-1.5.0
Log in to slave2 and give the hadoop user ownership of the two copied directories:
#chown -R hadoop:hadoop /usr/local/scala-2.11.7 /usr/local/spark-1.5.0
Make /etc/profile and /usr/local/hadoop/etc/hadoop/hadoop-env.sh on slave2 match master, then source them.

(11) Start Spark and test:
$./sbin/start-all.sh      # run from /usr/local/spark-1.5.0; this is Spark's start-all.sh, not Hadoop's

Running jps on the namenode should now show a Master process:
$jps
5640 NameNode
6013 ResourceManager
5847 SecondaryNameNode
6897 Master
15164 Jps
6490 RunJar
Running jps on a datanode should show a Worker process:
$ jps
3982 Jps
31810 DataNode
32136 Worker
31928 NodeManager


The Spark master web UI also shows this information:
http://master.hadoop:8080
http://s5.运维网.com/wyfs02/M02/76/7B/wKiom1ZUMwKiUOvPAAONkcOyJZI955.jpg
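
To confirm that the standalone cluster really executes jobs on the workers, the bundled SparkPi example can be submitted to the master (a sketch; the examples jar name below matches the spark-1.5.0-bin-hadoop2.6 layout and may differ slightly in other builds):
$cd /usr/local/spark-1.5.0
$./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://10.10.4.115:7077 \
    ./lib/spark-examples-1.5.0-hadoop2.6.0.jar 10
The driver output should end with a line like "Pi is roughly 3.14...", and the finished application appears in the web UI above.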

At this point the entire Hadoop, Hive and Spark cluster deployment is complete.
                  

  



