mt-hadoop-data5
Configure /usr/hadoop/conf/datanode-deny-list, which lists the hostnames or IPs of datanodes that are denied connections (IPs are fragile here, so hostnames are recommended).
An empty file means all datanodes are allowed.
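The deny list only takes effect if HDFS is pointed at it; a minimal sketch, assuming the standard Hadoop 1.x property dfs.hosts.exclude in hdfs-site.xml:
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/hadoop/conf/datanode-deny-list</value>
</property>
After editing the list, apply it to a running NameNode without a restart:
hadoop dfsadmin -refreshNodes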
Create a Hadoop configuration sync script:
vi /usr/hadoop/bin/sync-datanode-config.sh
#!/usr/bin/env bash
# Copy the Hadoop configuration to every other node listed in
# masters/slaves (no argument), or to the hosts named in argument 1.
if [ $# -eq 0 ]; then
    ## sync namenode and datanode config
    hostlist=$(cat /usr/hadoop/conf/masters /usr/hadoop/conf/slaves | grep -vw "$(hostname)" | tr "\n" " ")
    echo "starting copy hadoop config to {$hostlist}..."
    for host in $hostlist; do
        if scp /usr/hadoop/conf/* "hadoop@${host}:/usr/hadoop/conf" > /dev/null 2>&1; then
            echo "copy to $host Successful"
        else
            echo "copy to $host Failure"
        fi
    done
    ## sync hadoop config on the hive server
    host_hive=$(grep hive /etc/hosts | awk '{print $2}')
    echo "starting copy hadoop config to {$host_hive}..."
    if scp /usr/hadoop/conf/* "hadoop@${host_hive}:/usr/hadoop/conf" > /dev/null 2>&1; then
        echo "copy to $host_hive Successful"
    else
        echo "copy to $host_hive Failure"
    fi
elif [ $# -eq 1 ]; then
    ## sync only to the hosts in the (space-separated, quoted) argument
    hostlist=$1
    echo "starting copy hadoop config to {$hostlist}..."
    for host in $hostlist; do
        if scp /usr/hadoop/conf/* "hadoop@${host}:/usr/hadoop/conf" > /dev/null 2>&1; then
            echo "copy to $host Successful"
        else
            echo "copy to $host Failure"
        fi
    done
fi
Make the script executable:
chmod +x /usr/hadoop/bin/sync-datanode-config.sh
Sync the configuration:
sync-datanode-config.sh
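To push the configuration to specific hosts only, use the script's one-argument branch; for example, with the hostname from the allow list above:
sync-datanode-config.sh "mt-hadoop-data5"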
5) Start and verify the Hadoop cluster. Perform the following as the hadoop user.
a) Format the HDFS filesystem
hadoop namenode -format
b) Start and stop Hadoop
Bring up the VIP (failover can be automated via HA):
sudo ifconfig eth0:0 172.26.10.140 netmask 255.255.255.0
Start Hadoop:
start-all.sh
Stop Hadoop:
stop-all.sh
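A quick sanity check after start-all.sh is jps (ships with the JDK); which daemons appear depends on the node's role:
jps
# master: NameNode, SecondaryNameNode, JobTracker
# slaves: DataNode, TaskTracker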
c) Check DataNode status
hadoop dfsadmin -report
d) Web UIs
http://172.26.10.125:50030 JobTracker (MapReduce administration)
http://172.26.10.126:50060 TaskTracker
http://172.26.10.125:50070 NameNode (HDFS status)
e) Run a quick HDFS test
hadoop fs -mkdir /user/pset
hadoop fs -chown pset:pset /user/pset
hadoop fs -put /home/hadoop/software/jdk-7u25-linux-x64.gz /user/pset/
hadoop fs -ls /user/pset
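To confirm the file round-trips intact, a simple checksum comparison works (paths taken from the steps above; md5sum assumed present on the host):
hadoop fs -get /user/pset/jdk-7u25-linux-x64.gz /tmp/jdk-check.gz
md5sum /home/hadoop/software/jdk-7u25-linux-x64.gz /tmp/jdk-check.gz
# the two checksums should match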
f) Benchmark HDFS with TestDFSIO
Write test:
hadoop jar $HADOOP_HOME/hadoop-test*.jar TestDFSIO -write -nrFiles 2 -fileSize 1000
Read test:
hadoop jar $HADOOP_HOME/hadoop-test*.jar TestDFSIO -read -nrFiles 2 -fileSize 1000
Clean up the test data (TestDFSIO also appends its throughput figures to TestDFSIO_results.log in the current working directory):
hadoop jar $HADOOP_HOME/hadoop-test*.jar TestDFSIO -clean

IV. Installing and configuring Hive
In the previous chapter, while setting up the cluster, we already installed Hadoop on the Hive host (the steps in Chapter III, Section 1, Hadoop installation). We now install Hive on top of that.
Terminology: Metastore
The metastore is the central store for Hive's metadata. By default the Hive service embeds a Derby database as its metastore, but because only one embedded Derby instance can access the database files at a time, only one Hive session can be open at a time. Starting a second session fails with the error:
Failed to start database 'metastore_db'
To get past this single-session limit we use a standalone database; here we use MySQL. In the words of the Definitive Guide:
Using an embedded metastore is a simple way to get started with Hive; however, only
one embedded Derby database can access the database files on disk at any one time,
which means you can have only one Hive session open at a time that shares the same
metastore. Trying to start a second session gives the error:
Failed to start database 'metastore_db'
when it attempts to open a connection to the metastore.
1) Install MySQL. Perform the following as root:
Install MySQL:
apt-get install mysql-server
Create the hive account (in the mysql client, against the mysql system database):
use mysql;
-- drop every account except root@localhost, then open root up to remote hosts
delete from user where user != 'root' or host != 'localhost';
update user set host = '%' where user = 'root';
-- create the hive account; the password matches the 'mysql -uhive -phive' login below
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
flush privileges;
Log in to MySQL as the hive user and create the hive database:
mysql -uhive -phive
create database hive;
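A commonly reported pitfall with Hive on MySQL (a precaution here, not something this guide hit): if the server defaults to UTF-8, metastore schema creation can fail with "Specified key was too long"; the usual workaround is to force latin1 on the metastore database:
alter database hive character set latin1;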
2) Install Hive. Perform the following as the hadoop user:
Unpack the Hive tarball and copy it to /usr/hive:
tar -zxvf hive-0.12.0.tar.gz
sudo cp -r hive-0.12.0/ /usr/hive
Fix ownership and permissions:
sudo chown -R hadoop:hadoop /usr/hive
sudo chmod -R 755 /usr/hive
3) Configure environment variables
sudo vi /etc/profile and append the following lines (the Java and Hadoop variables were already configured during the Hadoop installation):
#set hive path
export HIVE_INSTALL=/usr/hive
export PATH=$PATH:$HIVE_INSTALL/bin
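Reload the profile and confirm the hive launcher is on the PATH:
source /etc/profile
which hive
# should print /usr/hive/bin/hive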
4) Configure Hive
In $HIVE_INSTALL/conf/hive-site.xml, add the following property so query output includes column names:
<property>
  <name>hive.cli.print.header</name>
  <value>true</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>
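hive-site.xml also needs the MySQL metastore connection; a minimal sketch, assuming the standard Hive JDO option names and the hive/hive credentials created in section 1 (adjust the host in the URL to wherever MySQL runs; the MySQL JDBC driver jar must also be placed in $HIVE_INSTALL/lib):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>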
5) Verification
Start the Hive shell:
hive
Run the following statements to test:
show tables;
CREATE TABLE xp(id INT,name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
Edit sample.txt (fields separated by tabs, to match the table definition):
vi /tmp/sample.txt
1 zhangsan
2 lisi
3 test
Load the data in Hive:
load data local inpath '/tmp/sample.txt' overwrite into table xp;
select * from xp;
quit;
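If the MySQL metastore is wired up correctly, the xp table created above also shows up in the metastore schema; a quick check from the shell (TBLS is one of the standard metastore tables):
mysql -uhive -phive -e "use hive; select TBL_NAME from TBLS;"
# should list: xp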