Machine and system preparation
CentOS 7, minimal install. Configure the network and firewall; the machines need outbound internet access.
Three machines: one master and two slaves, with hostnames master, slave01, and slave02.
Add the hosts entries (identical on all three machines):
cat /etc/hosts
192.168.1.10 master
192.168.1.11 slave01
192.168.1.12 slave02
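The three entries can be appended idempotently so repeated runs don't duplicate lines. A minimal sketch; it runs against a scratch file here, so point HOSTS_FILE at /etc/hosts (as root) on a real node:

```shell
# Demo uses a scratch file; on a real node set HOSTS_FILE=/etc/hosts and run as root.
HOSTS_FILE="$(mktemp)"
for entry in "192.168.1.10 master" "192.168.1.11 slave01" "192.168.1.12 slave02"; do
    # append only if the exact line is not already present
    grep -qxF "$entry" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
done
cat "$HOSTS_FILE"
```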
Disable SELinux and the firewall:
setenforce 0
systemctl stop firewalld
systemctl disable firewalld
sed -i 's/enforcing/disabled/g' /etc/selinux/config
Set up the yum repositories and basic tools:
yum install wget -y
cd /etc/yum.repos.d/
yum -y install epel-release
yum install net-tools -y
yum install tree -y
Configure passwordless SSH login between the three machines.
Enable public-key authentication for SSH:
vi /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
(On newer OpenSSH releases the RSAAuthentication option has been removed and can be omitted; PubkeyAuthentication is the one that matters.)
Then restart sshd: systemctl restart sshd
Create the hadoop user:
groupadd hadoop
useradd -m -g hadoop hadoop
echo "hadoop" | passwd --stdin hadoop
(or simply run passwd hadoop and enter the password interactively)
Switch to the hadoop user and generate a key pair:
su hadoop
cd /home/hadoop/
ssh-keygen -t rsa    # press Enter through the prompts to accept the defaults
This creates:
.ssh
├── id_rsa        # private key
└── id_rsa.pub    # public key; the server checks its contents to verify the connecting client
cd .ssh/
touch authorized_keys
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
chmod 600 id_rsa
Append the id_rsa.pub public keys from slave01 and slave02 to master's authorized_keys, then copy the authorized_keys file (now holding all three machines' public keys) back to the slaves:
scp authorized_keys hadoop@slave01:/home/hadoop/.ssh/
scp authorized_keys hadoop@slave02:/home/hadoop/.ssh/
Restart sshd on each machine (systemctl restart sshd); passwordless login should now work.
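What the key exchange above boils down to: every node's id_rsa.pub ends up in every node's authorized_keys, with strict permissions. A sketch of just that mechanism, run against a scratch directory (the real directory is ~/.ssh):

```shell
SSH_DIR="$(mktemp -d)"                                    # stand-in for /home/hadoop/.ssh
ssh-keygen -t rsa -N "" -q -f "$SSH_DIR/id_rsa"           # one keypair per node, no passphrase
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"   # collect all three nodes' pubkeys here
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/id_rsa" "$SSH_DIR/authorized_keys"    # sshd rejects group/world-readable files
```

On a real cluster, ssh-copy-id hadoop@host performs this append-with-permissions step in one command.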
Make sure the basic environment above is configured consistently on all three machines; get it right before continuing.
##############################################
Next, install the JDK and Hadoop. This walkthrough uses jdk-8u151-linux-x64.tar.gz (downloaded from the official site).
As root (the hadoop user cannot write under /usr):
cd /usr/
mkdir java
cd java/
tar zxf jdk-8u151-linux-x64.tar.gz
Then, as the hadoop user, download Hadoop and:
cd /home/hadoop/
mkdir bigdata cd bigdata/ tar -zxf hadoop-2.7.5.tar.gz mv hadoop-2.7.5 hadoop
Set the user environment variables:
vi /home/hadoop/.bashrc
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_USER_NAME=hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /home/hadoop/.bashrc    # load the new variables
Next, modify the Hadoop configuration files and create the data directories.
## The JDK, environment variables, and directories must also be set up on the slaves; once the Hadoop config is finished, copy everything over to the slave machines.
cd /home/hadoop/bigdata/
mkdir -p data/hadoop/tmp
mkdir -p data/hadoop/hdfs/datanode
mkdir -p data/hadoop/hdfs/namenode
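The three mkdir calls can be collapsed with bash brace expansion, and the same layout must exist on every node. A sketch against a scratch base directory (the real base is /home/hadoop/bigdata/data):

```shell
BASE="$(mktemp -d)"    # stand-in for /home/hadoop/bigdata/data
# one command creates the whole layout: tmp, hdfs/datanode, hdfs/namenode
mkdir -p "$BASE"/hadoop/{tmp,hdfs/datanode,hdfs/namenode}
find "$BASE" -type d | sort
```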
vi /home/hadoop/bigdata/hadoop/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/bigdata/data/hadoop/tmp</value>
    </property>
</configuration>
core-site.xml sets each node's temporary directory (hadoop.tmp.dir); create it beforehand.
vi /home/hadoop/bigdata/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/bigdata/data/hadoop/hdfs/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/bigdata/data/hadoop/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
hdfs-site.xml sets the datanode and namenode data directories; create them in advance. (Note: with only two DataNodes in this cluster, consider lowering dfs.replication to 2.)
vi /home/hadoop/bigdata/hadoop/etc/hadoop/mapred-site.xml    # if this file doesn't exist, copy it from the template: cp mapred-site.xml.template mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
vi /home/hadoop/bigdata/hadoop/etc/hadoop/yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
vi /home/hadoop/bigdata/hadoop/etc/hadoop/slaves
slave01
slave02
The settings above are only the basics; add resource-allocation and performance parameters as your workload requires.
Once master is configured, copy the hosts file, .bashrc, /home/hadoop/bigdata/hadoop, and the data directory to the matching locations on slave01 and slave02. (Copy /etc/hosts as root; the hadoop user cannot write to /etc.)
scp /etc/hosts root@slave01:/etc/hosts
scp /etc/hosts root@slave02:/etc/hosts
#scp -r /usr/java/jdk1.8.0_151 hadoop@slave01:/usr/java/
#scp -r /usr/java/jdk1.8.0_151 hadoop@slave02:/usr/java/
scp /home/hadoop/.bashrc hadoop@slave01:/home/hadoop/
scp /home/hadoop/.bashrc hadoop@slave02:/home/hadoop/
scp -r /home/hadoop/bigdata/hadoop hadoop@slave01:/home/hadoop/bigdata/
scp -r /home/hadoop/bigdata/hadoop hadoop@slave02:/home/hadoop/bigdata/
Finally, on each slave run:
source /home/hadoop/.bashrc    # load the new variables
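The repeated scp lines can be generated by one loop; printing the commands first makes them easy to review before executing. The hosts and paths below are the ones used in this guide:

```shell
# Print the copy commands for each slave instead of typing them twice.
sync_cmds() {
    for host in slave01 slave02; do
        echo "scp /home/hadoop/.bashrc hadoop@$host:/home/hadoop/"
        echo "scp -r /home/hadoop/bigdata/hadoop hadoop@$host:/home/hadoop/bigdata/"
        echo "scp -r /home/hadoop/bigdata/data hadoop@$host:/home/hadoop/bigdata/"
    done
}
sync_cmds    # review the list, then execute it with: sync_cmds | sh
```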
Start the Hadoop cluster from master. Before the very first start, format the NameNode:
hdfs namenode -format
Then:
cd /home/hadoop/bigdata/hadoop/sbin
sh start-all.sh
[hadoop@master sbin]$ jps
2713 ResourceManager
2362 NameNode
5053 Jps
2558 SecondaryNameNode
[hadoop@slave01 sbin]$ jps
2769 NodeManager
3897 Jps
2565 DataNode
At this point the Hadoop cluster is up.
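Rather than eyeballing jps, a small helper can assert that the expected daemons are present. Below it is fed the sample listing above; on the cluster you would pipe real jps output into it:

```shell
# Read jps output on stdin and report any daemon named on the command line that is missing.
check_daemons() {    # usage: jps | check_daemons NameNode ResourceManager ...
    local out missing=0
    out="$(cat)"
    for d in "$@"; do
        echo "$out" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
    done
    [ "$missing" -eq 0 ] && echo "all daemons present"
}
printf '2713 ResourceManager\n2362 NameNode\n2558 SecondaryNameNode\n' \
    | check_daemons NameNode ResourceManager SecondaryNameNode
```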
##########################################
Install Hive (on master):
cd /home/hadoop/bigdata/
tar zxf apache-hive-1.2.2-bin.tar.gz
mv apache-hive-1.2.2-bin hive
Modify the configuration:
cd /home/hadoop/bigdata/hive/conf
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-log4j.properties.template hive-log4j.properties
vi hive-env.sh
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HIVE_CONF_DIR=/home/hadoop/bigdata/hive/conf
vi hive-log4j.properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/home/hadoop/bigdata/hive/log
hive.log.file=hive.log
vi hive-site.xml
Set (or add) the following properties inside the existing <configuration> element:

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://master:9000/user/hive/warehouse</value>
</property>
<property>
    <name>hive.exec.scratchdir</name>
    <value>hdfs://master:9000/user/hive/scratchdir</value>
</property>
<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/hadoop/bigdata/hive/tmp</value>
    <description>Local scratch space for Hive jobs</description>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/home/hadoop/bigdata/hive/tmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/home/hadoop/bigdata/hive/tmp</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/home/hadoop/bigdata/hive/logs</value>
    <description>Location of Hive run time structured log file</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hivemeta?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
</property>
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://master:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
The Hive configuration is a frequent source of errors; adjust it according to the messages you see at startup. Note that the metastore settings above assume a MySQL instance on master with a hive user and database, and the mysql-connector-java JAR placed in /home/hadoop/bigdata/hive/lib; set those up first (schematool -dbType mysql -initSchema initializes the metastore schema).
Create the HDFS directories:
hadoop fs -mkdir -p /user/hive/scratchdir
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/scratchdir
hadoop fs -chmod g+w /user/hive/warehouse
Start the metastore and hiveserver2 services:
nohup hive --service metastore &
nohup hive --service hiveserver2 &

[hadoop@master bin]$ hive
Logging initialized using configuration in file:/home/hadoop/bigdata/hive/conf/hive-log4j.properties
hive> show databases;
OK
default
fucktime
Time taken: 1.14 seconds, Fetched: 2 row(s)
hive>
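HiveServer2 can take a while to open its Thrift port (10000 by default), so connecting right after startup often fails with "connection refused". A small poll helper avoids that; it is bash-only (uses /dev/tcp), and the host, port, and beeline invocation in the comment are this guide's values:

```shell
# Poll a TCP port until it accepts connections or the timeout expires.
wait_port() {    # usage: wait_port <host> <port> <timeout_seconds>
    local i=0
    until (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; do
        i=$((i + 1))
        [ "$i" -ge "$3" ] && return 1    # give up after <timeout_seconds> attempts
        sleep 1
    done
}
# On the cluster:
# wait_port master 10000 60 && beeline -u jdbc:hive2://master:10000 -n hadoop
```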
##########################################
Deploy Spark, ZooKeeper, and HBase. As before, do everything on master first, then scp to the slaves. Download the tarballs (e.g. from the Aliyun mirror):
cd /home/hadoop/bigdata/
Extract into the bigdata directory:
cd /home/hadoop/bigdata/
tar zxf spark-2.2.1-bin-hadoop2.7.tgz
tar zxf scala-2.10.4.tgz
tar zxf zookeeper-3.4.10.tar.gz
tar zxf hbase-1.2.6-bin.tar.gz
mv spark-2.2.1-bin-hadoop2.7 spark
mv scala-2.10.4 scala
mv zookeeper-3.4.10 zk
mv hbase-1.2.6 hbase
(Note: Spark 2.2.1 ships prebuilt against Scala 2.11; the standalone Scala 2.10.4 here is only a local toolchain and is not used by Spark itself.)
Add the corresponding environment variables to the hadoop user's .bashrc:
export HIVE_HOME=/home/hadoop/bigdata/hive
export PATH=$PATH:$HIVE_HOME/bin
export SCALA_HOME=/home/hadoop/bigdata/scala
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/home/hadoop/bigdata/spark
export PATH=$PATH:$SPARK_HOME/bin
export ZK_HOME=/home/hadoop/bigdata/zk
export PATH=$PATH:$ZK_HOME/bin
export HBASE_HOME=/home/hadoop/bigdata/hbase
export PATH=$PATH:$HBASE_HOME/bin
source /home/hadoop/.bashrc
Copy it to the slave machines and reload:
scp /home/hadoop/.bashrc hadoop@slave01:/home/hadoop/
scp /home/hadoop/.bashrc hadoop@slave02:/home/hadoop/
source /home/hadoop/.bashrc    # run on each slave
******************************************************
Modify the Spark configuration:
cd /home/hadoop/bigdata/spark/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
export SCALA_HOME=/home/hadoop/bigdata/scala
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_CONF_DIR=/home/hadoop/bigdata/hadoop/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/home/hadoop/bigdata/spark
SPARK_DRIVER_MEMORY=512M
cp slaves.template slaves
vi slaves
slave01
slave02
Copy spark to the slave machines:
cd /home/hadoop/bigdata/
scp -r spark hadoop@slave01:/home/hadoop/bigdata/
scp -r spark hadoop@slave02:/home/hadoop/bigdata/
cd /home/hadoop/bigdata/spark/sbin
sh start-all.sh

[hadoop@master sbin]$ jps
2713 ResourceManager
2362 NameNode
1268 Master
5053 Jps
2558 SecondaryNameNode
[hadoop@slave01 sbin]$ jps
2769 NodeManager
3897 Jps
25623 Worker
2565 DataNode
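A standard smoke test is submitting the bundled SparkPi example to the standalone master. The jar path below assumes the default Spark 2.2.1 layout (examples/jars, Scala 2.11 suffix; adjust if your build differs). The helper only prints the command so it can be reviewed, then piped to sh on master:

```shell
SPARK_HOME="${SPARK_HOME:-/home/hadoop/bigdata/spark}"
# Print the spark-submit invocation for the bundled SparkPi example.
pi_cmd() {
    echo "$SPARK_HOME/bin/spark-submit" \
         "--master spark://master:7077" \
         "--class org.apache.spark.examples.SparkPi" \
         "$SPARK_HOME/examples/jars/spark-examples_2.11-2.2.1.jar 100"
}
pi_cmd    # review, then run it with: pi_cmd | sh
```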
*********************************************
Modify the ZooKeeper configuration:
cd /home/hadoop/bigdata/zk/conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
dataDir=/home/hadoop/bigdata/zk/zkdata
dataLogDir=/home/hadoop/bigdata/zk/zkdatalog
server.1=master:2888:3888
server.2=slave01:2888:3888
server.3=slave02:2888:3888
mkdir -p /home/hadoop/bigdata/zk/zkdata
echo "1" > /home/hadoop/bigdata/zk/zkdata/myid
Copy zk to the slave machines:
cd /home/hadoop/bigdata/
scp -r zk hadoop@slave01:/home/hadoop/bigdata/
scp -r zk hadoop@slave02:/home/hadoop/bigdata/
Then set each slave's myid:
echo "2" > /home/hadoop/bigdata/zk/zkdata/myid    # on slave01
echo "3" > /home/hadoop/bigdata/zk/zkdata/myid    # on slave02
Start zkServer on every node:
cd /home/hadoop/bigdata/zk/bin/
./zkServer.sh start
Check the status:
sh zkServer.sh status
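Each node's myid must match its server.N line in zoo.cfg, and mixing the files up is a common mistake; deriving the id from the hostname avoids it. A sketch whose hostname-to-id mapping follows this guide's zoo.cfg:

```shell
# Map a cluster hostname to its ZooKeeper server id (per the server.N lines in zoo.cfg).
myid_for() {    # usage: myid_for <hostname>
    case "$1" in
        master)  echo 1 ;;
        slave01) echo 2 ;;
        slave02) echo 3 ;;
        *) return 1 ;;    # unknown host: no id
    esac
}
myid_for slave01    # on a real node: myid_for "$(hostname)" > /home/hadoop/bigdata/zk/zkdata/myid
```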
******************************************************
Modify the HBase configuration:
cd /home/hadoop/bigdata/hbase/conf
Since ZooKeeper runs here as its own service, also set export HBASE_MANAGES_ZK=false (and JAVA_HOME) in hbase-env.sh so HBase does not launch a ZooKeeper of its own.
vi hbase-site.xml

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave01,slave02</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/bigdata/zk/zkdata</value>
</property>
</configuration>
vi regionservers
slave01
slave02
Copy hbase to the slave machines:
cd /home/hadoop/bigdata/
scp -r hbase hadoop@slave01:/home/hadoop/bigdata/
scp -r hbase hadoop@slave02:/home/hadoop/bigdata/
Start HBase on master:
cd /home/hadoop/bigdata/hbase/bin
sh start-hbase.sh
Check the status from the shell:
hbase shell
status
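A non-interactive smoke test can be scripted: create a table, write a cell, read it back, then drop it. The table and column names below are arbitrary examples; the script is only printed for review, then fed to hbase shell on master:

```shell
# HBase shell commands for a create/put/get/drop round trip.
HBASE_SMOKE=$(cat <<'EOF'
create 'smoke', 'cf'
put 'smoke', 'r1', 'cf:c1', 'hello'
get 'smoke', 'r1'
disable 'smoke'
drop 'smoke'
EOF
)
echo "$HBASE_SMOKE"    # review, then run it with: echo "$HBASE_SMOKE" | hbase shell
```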