[Experience] Using a Hadoop-1.2.1 environment (sqoop, hive, hbase, zookeeper, pig)

Posted 2015-12-23 09:13:25
Setting up a Hadoop environment on Linux (personally tested)
First configure the basic server environment Hadoop needs on Linux: a JDK and passwordless (key-based) SSH login. The cluster layout:
master 192.168.1.110
slave1 192.168.1.190
slave2 192.168.1.191
First make sure the three machines can reach each other, then configure key-based login between them.
1. The SSH key setup itself is omitted here.
2. Configure the JDK environment
[iyunv@localhost ~]# mkdir /usr/java
[iyunv@localhost ~]# mv jdk-7u51-linux-x64.rpm /usr/java/
[iyunv@localhost ~]# rpm -qa |grep gcj
[iyunv@localhost ~]# rpm -qa |grep jdk
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
[iyunv@localhost ~]# rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
[iyunv@localhost ~]# rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
[iyunv@localhost ~]# cd /usr/java/
[iyunv@localhost java]# rpm -ivh jdk-7u51-linux-x64.rpm
Preparing...                ########################################### [100%]
   1:jdk                    ########################################### [100%]
Create alternatives links for the JDK install directory and executables, for convenience:
[iyunv@localhost java]# alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_51/bin/java 1
[iyunv@localhost java]# alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_51/bin/javac 1
Check the active java alternative:
[iyunv@localhost java]#  alternatives --config java
#vim /etc/profile
Append the following at the end:
# set java environment
JAVA_HOME=/usr/java/jdk1.7.0_51
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH
Save and exit.
[iyunv@localhost ~]# source /etc/profile
Check the Java version:
[iyunv@localhost ~]# java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
3. Configure Hadoop
[iyunv@master ~]# wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
[iyunv@master ~]# tar zxf hadoop-1.2.1.tar.gz
[iyunv@master ~]# mv hadoop-1.2.1 /usr/local/
[iyunv@master conf]# vim hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_51            ### add this line to point at the JDK
[iyunv@master conf]# vim core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.110:9000</value>
  </property>
</configuration>                             ### add the configuration above
[iyunv@master conf]# vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hdfs/data</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>                  <!-- number of data replicas -->
  </property>
</configuration>
[iyunv@master conf]# vim mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.110:9001</value>
  </property>
</configuration>
Configure the master and slave IP addresses (conf/masters and conf/slaves).
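The post doesn't show these files; based on the cluster layout above, conf/masters and conf/slaves would presumably look like this (a sketch using the IPs from this post):

```
# conf/masters
192.168.1.110

# conf/slaves
192.168.1.190
192.168.1.191
```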
4. Clone the master's environment to the slaves
[iyunv@master conf]# ssh 192.168.1.190 '[ -d /usr/java ] || mkdir -p /usr/java'
[iyunv@master conf]# ssh 192.168.1.191 '[ -d /usr/java ] || mkdir -p /usr/java'
[iyunv@master conf]# scp -r /usr/java/jdk1.7.0_51 root@192.168.1.190:/usr/java/
[iyunv@master conf]# scp -r /usr/java/jdk1.7.0_51 root@192.168.1.191:/usr/java/
[iyunv@master conf]# scp -r /usr/local/hadoop-1.2.1 root@192.168.1.190:/usr/local/
[iyunv@master conf]# scp -r /usr/local/hadoop-1.2.1 root@192.168.1.191:/usr/local/
[iyunv@master bin]# ./hadoop namenode -format                  ### format the HDFS filesystem
[iyunv@master bin]# ./start-all.sh                             ### start Hadoop
starting namenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-master.out
192.168.1.190: starting datanode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-slave1.out
192.168.1.191: starting datanode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-slave2.out
The authenticity of host '192.168.1.110 (192.168.1.110)' can't be established.
RSA key fingerprint is 09:48:07:75:1b:13:85:46:0c:16:45:53:c5:be:0e:07.
Are you sure you want to continue connecting (yes/no)? yes
192.168.1.110: Warning: Permanently added '192.168.1.110' (RSA) to the list of known hosts.
192.168.1.110: starting secondarynamenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-master.out
starting jobtracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-master.out
192.168.1.190: starting tasktracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-slave1.out
192.168.1.191: starting tasktracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-slave2.out
[iyunv@master hadoop-1.2.1]# bin/hadoop jar hadoop-examples-1.2.1.jar pi 10 100       ### Hadoop's bundled example jar; output like the following means the cluster started successfully
Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
....................
Now test the Hadoop web interfaces:
JobTracker (job management) UI: http://192.168.1.110:50030/
NameNode (HDFS storage) UI: http://192.168.1.110:50070/

Using Hadoop
1. Write a MapReduce job in plain Python; first create some test data
[iyunv@master script]# vim input.txt
foo foo off off off off on on python      # test data
[iyunv@master script]# vim mapper.py
#!/usr/bin/env python
import sys
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print("%s\t%s" % (word, 1))        ### emit word<TAB>1 for every word

[iyunv@master script]# vim reducer.py

#!/usr/bin/env python
from operator import itemgetter
import sys
current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

if word == current_word:
    print("%s\t%s" % (current_word, current_count))       ### emit the final word's count
[iyunv@master script]# cat input.txt | python mapper.py |sort |python reducer.py
foo    2
off    4
on    2
python    1
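The `sort` in the pipeline above stands in for Hadoop streaming's shuffle phase: it brings equal keys together, so reducer.py only ever has to compare adjacent lines. A minimal pure-Python sketch of the same map-sort-reduce logic, using the sample line from input.txt:

```python
from itertools import groupby

line = "foo foo off off off off on on python"

# map: emit (word, 1) for every word, like mapper.py
pairs = [(word, 1) for word in line.split()]

# shuffle: sorting groups equal keys together (the role `sort` plays above)
pairs.sort(key=lambda kv: kv[0])

# reduce: sum each run of identical keys, like reducer.py
counts = {key: sum(n for _, n in grp)
          for key, grp in groupby(pairs, key=lambda kv: kv[0])}
print(counts)
```

This matches the pipeline's output: foo 2, off 4, on 2, python 1.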
2. Now run the same count on Hadoop
First create a directory on HDFS:
[iyunv@master hadoop-1.2.1]# /usr/local/hadoop-1.2.1/bin/hadoop dfs -mkdir /usr/local/word
Upload the file to the newly created HDFS directory:
[iyunv@master hadoop-1.2.1]# bin/hadoop fs -put /home/script/input.txt /usr/local/word
[iyunv@master hadoop-1.2.1]# bin/hadoop fs -ls /usr/local/word
-output specifies the job's output directory:
[iyunv@master hadoop-1.2.1]# bin/hadoop jar /usr/local/hadoop-1.2.1/contrib/streaming/hadoop-streaming-1.2.1.jar -file /home/script/mapper.py -mapper /home/script/mapper.py -file /home/script/reducer.py -reducer /home/script/reducer.py -input /usr/local/word -output /output/word1


Sync log data from a client machine into HDFS on the server:
[iyunv@master script]# vim hdfsput.sh
#!/bin/bash
#
webid="web1"
logspath="/var/log/httpd/access_log"
logname="access.log$webid"

/usr/local/hadoop-1.2.1/bin/hadoop dfs -mkdir /usr/local/hadoop_web/`date +%Y%m%d`      ### create the HDFS directory
/usr/local/hadoop-1.2.1/bin/hadoop dfs -put $logspath /usr/local/hadoop_web/`date +%Y%m%d`/$logname     ### put the local log into HDFS; when uploading from multiple machines, use the full URI hdfs://192.168.1.110/usr/local/hadoop_web/`date +%Y%m%d`
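hdfsput.sh builds a date-partitioned HDFS path with `date +%Y%m%d`; the same path construction, sketched in Python with the webid and base directory taken from the script:

```python
from datetime import date

webid = "web1"
base = "/usr/local/hadoop_web"
logname = "access.log" + webid

# same layout the script creates: /usr/local/hadoop_web/YYYYMMDD/access.logweb1
target = "%s/%s/%s" % (base, date.today().strftime("%Y%m%d"), logname)
print(target)
```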

Traffic-statistics script:
[iyunv@master script]# vim httpflow.py
#!/usr/bin/python
#coding=utf-8

from mrjob.job import MRJob
import re

class MRCounter(MRJob):
    def mapper(self, key, line):
        i=0
        for flow in line.split():
            if i==3:
                timerow=flow.split(":")
                hm=timerow[1]+":"+timerow[2]
            if i==9 and re.match(r"\d{1,}", flow):
                yield hm, int(flow)
            i+=1
    def reducer(self, key, occurrences):
        yield key, sum(occurrences)

if __name__ == '__main__':
    MRCounter.run()
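httpflow.py depends on the positional layout of an Apache access-log line: after split(), field 3 holds the timestamp (`[dd/Mon/yyyy:HH:MM:SS`) and field 9 the response size. A quick sketch of what the mapper extracts; the log line here is made up for illustration, not taken from the post:

```python
# hypothetical Apache access-log line, for illustration only
line = ('192.168.1.50 - - [23/Dec/2015:09:13:25 +0800] '
        '"GET /index.html HTTP/1.1" 200 5120')

fields = line.split()
timerow = fields[3].split(":")       # ['[23/Dec/2015', '09', '13', '25']
hm = timerow[1] + ":" + timerow[2]   # hour:minute bucket the job groups by
size = int(fields[9])                # field 9: response bytes
print(hm, size)
```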


Count the number of accesses per IP in the log:
[iyunv@master ~]# vim /home/script/wordip.py
#!/usr/bin/python
#coding=utf-8
from mrjob.job import MRJob
import re

IPlist = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

class MRword(MRJob):
    def mapper(self, key, line):
        for word in IPlist.findall(line):
            yield word, 1
    def reducer(self, word, occurrences):
        yield word, sum(occurrences)

if __name__ == '__main__':
    MRword.run()
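wordip.py's per-IP count can be checked locally without mrjob or a cluster; a sketch with a few made-up log lines:

```python
import re
from collections import Counter

IPlist = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

log = [
    '10.0.0.1 - - [23/Dec/2015:09:13:25 +0800] "GET / HTTP/1.1" 200 100',
    '10.0.0.2 - - [23/Dec/2015:09:13:26 +0800] "GET / HTTP/1.1" 200 100',
    '10.0.0.1 - - [23/Dec/2015:09:13:27 +0800] "GET / HTTP/1.1" 404 0',
]

# same map (findall -> 1) and reduce (sum) logic as wordip.py, run in-process
hits = Counter(ip for line in log for ip in IPlist.findall(line))
print(hits)
```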

Status-code counting script:
[iyunv@master ~]# vim /home/script/wordhttp.py
#!/usr/bin/python
#coding=utf-8
from mrjob.job import MRJob
import re

#HTTPlist = re.match(r"\d{1,3}", word)
class MRword(MRJob):
    def mapper(self, key, line):
        i=0
        for word in line.split():
            if i==8 and re.match(r"\d{1,3}", word):
                yield word, 1
            i+=1
    def reducer(self, word, occurrences):
        yield word, sum(occurrences)

if __name__ == '__main__':
    MRword.run()
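Likewise, wordhttp.py's status-code count (field 8 after split()) can be tried in plain Python on made-up lines:

```python
import re
from collections import Counter

log = [
    '10.0.0.1 - - [23/Dec/2015:09:13:25 +0800] "GET / HTTP/1.1" 200 100',
    '10.0.0.1 - - [23/Dec/2015:09:13:27 +0800] "GET / HTTP/1.1" 404 0',
    '10.0.0.2 - - [23/Dec/2015:09:13:28 +0800] "GET / HTTP/1.1" 200 321',
]

codes = Counter()
for line in log:
    fields = line.split()
    # field 8 is the HTTP status code in this layout, as in wordhttp.py
    if len(fields) > 8 and re.match(r"\d{1,3}$", fields[8]):
        codes[fields[8]] += 1
print(codes)
```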


########################################################################################################################################
Sqoop, the first Hadoop add-on we install, uses Java's JDBC to talk to a MySQL database, so it also requires a Java environment.
Install and use sqoop
[iyunv@master ~]# tar zxf sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz
[iyunv@master ~]# mv sqoop-1.4.5.bin__hadoop-1.0.0 /usr/local/sqoop
[iyunv@master ~]# cd /usr/local/sqoop/
[iyunv@master sqoop]# mv conf/sqoop-env-template.sh conf/sqoop-env.sh
[iyunv@master sqoop]# vim conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop-1.2.1/
export HADOOP_MAPRED_HOME=/usr/local/hadoop-1.2.1/
Put the JDBC driver (Java's MySQL connector) into the lib directory:
[iyunv@master ~]# unzip mysql-connector-java-5.1.22.zip
Copy both mysql-connector-java-5.1.22-bin.jar and hadoop-core-1.2.1.jar into the lib directory.
Export data from HDFS into MySQL:
[iyunv@master ~]#/usr/local/sqoop/bin/sqoop export --connect "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table user --export-dir /output/word1/part-00000 --fields-terminated-by ':' -m 1
Import data from MySQL into HDFS:
[iyunv@master ~]#/usr/local/sqoop/bin/sqoop import --connect "jdbc:mysql://127.0.0.1:3306/ndsg?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table table_user --fields-terminated-by ':' -m 1
List databases:
[iyunv@master ~]#/usr/local/sqoop/bin/sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password 123456


########################################################################################################################################
Hive, the first Hadoop subproject we install:
[iyunv@master ~]#tar zxf apache-hive-0.13.1-bin.tar.gz
[iyunv@master ~]#mv apache-hive-0.13.1-bin /usr/local/hive
[iyunv@master conf]# cp hive-default.xml.template hive-default.xml
[iyunv@master conf]# cp hive-default.xml.template hive-site.xml
[iyunv@master conf]# vim hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1:3306/test?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>false</value>
  </property>
</configuration>
[iyunv@master conf]# vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop            ### assumes /usr/local/hadoop points at the hadoop-1.2.1 install (e.g. a symlink)
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
[iyunv@master conf]# vim hive-env.sh
export HADOOP_HEAPSIZE=1024
HADOOP_HOME=/usr/local/hadoop
Create hive's warehouse (root) directory on HDFS:
[iyunv@master ~]# /usr/local/hadoop/bin/hadoop fs -mkdir /user/hive/warehouse
[iyunv@master ~]# /usr/local/hadoop/bin/hadoop fs -chmod g+w /user/hive/warehouse
Start hive:
[iyunv@master conf]# ../bin/hive
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive>                                   ### started successfully
hive> CREATE TABLE table_hive (a string, b int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';     #### create a table
hive> LOAD DATA LOCAL INPATH '/home/script/iplist.txt' OVERWRITE INTO TABLE table_hive ;        ### load the local file into the hive table
hive> select * from table_hive;
OK
123456  2
123456  9
Time taken: 0.083 seconds, Fetched: 2 row(s)
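Judging from the SELECT output above, the tab-delimited iplist.txt would contain two rows like the following (a reconstruction; the original file isn't shown in the post):

```
123456	2
123456	9
```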

Copy hive data into a MySQL table:
[iyunv@master hive]# /usr/local/sqoop/bin/sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table user --export-dir /user/hive/warehouse/table_hive/iplist.txt --input-fields-terminated-by '\t'
Copy a MySQL table into hive:
[iyunv@master hive]# /usr/local/sqoop/bin/sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password 123456 --table table_ip --hive-import --hive-table table_ip -m 1


########################################################################################################################################
HBase is the second Hadoop subproject we install. It can be used together with ZooKeeper, which manages the cluster in a unified way; ZooKeeper's main role is to provide a unified coordination interface.
Install and use hbase and zookeeper
First install zookeeper      #### part of the hbase cluster setup; it can start along with hbase and provides the unified coordination hbase relies on
[iyunv@master ~]#tar zxf zookeeper-3.3.6.tar.gz
[iyunv@master ~]#mv zookeeper-3.3.6 /usr/local/zookeeper
[iyunv@master ~]#cd /usr/local/zookeeper/
[iyunv@master zookeeper]# cp conf/zoo_sample.cfg conf/zoo.cfg
[iyunv@master zookeeper]# vim conf/zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/data/zookeeper/data                  ### zookeeper data directory
dataLogDir=/data/zookeeper/logs               ### zookeeper log directory
# the port at which the clients will connect
clientPort=2181
server.1=master:2887:3887      
server.2=slave1:2888:3888
server.3=slave2:2889:3889
[iyunv@master zookeeper]# mkdir -p /data/zookeeper/logs
[iyunv@master zookeeper]# mkdir -p /data/zookeeper/data
[iyunv@master zookeeper]# vim /data/zookeeper/data/myid
1                                ### a different id on each machine, used to identify it; set 2 and 3 on the other machines
[iyunv@master zookeeper]# scp -r /usr/local/zookeeper/ root@192.168.1.190:/usr/local/      ### copy the setup to the other machines, then create the directories and myid values there
[iyunv@master zookeeper]# scp -r /usr/local/zookeeper/ root@192.168.1.191:/usr/local/
[iyunv@master zookeeper]# bin/zkServer.sh start       ### start the service, then start it on the other machines too

Install hbase to manage zookeeper
[iyunv@master ~]#tar zxf hbase-0.94.26.tar.gz
[iyunv@master ~]#mv hbase-0.94.26 /usr/local/hbase
[iyunv@master ~]#cd /usr/local/hbase/
[iyunv@master hbase]# vim conf/hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_51
[iyunv@master hbase]# vim conf/regionservers
master
slave1
slave2
[iyunv@master hbase]# vim conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.1.180:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.1.180,192.168.1.190,192.168.1.191</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data/zookeeper/data</value>               <!-- zookeeper's data directory -->
  </property>
</configuration>
[iyunv@master hbase]# scp -r /usr/local/hbase root@192.168.1.190:/usr/local/
[iyunv@master hbase]# scp -r /usr/local/hbase root@192.168.1.191:/usr/local/
[iyunv@master hbase]# bin/start-hbase.sh
192.168.1.180: starting zookeeper, logging to /usr/local/hbase/bin/../logs/hbase-root-zookeeper-master.out
192.168.1.191: starting zookeeper, logging to /usr/local/hbase/bin/../logs/hbase-root-zookeeper-slave2.out
192.168.1.190: starting zookeeper, logging to /usr/local/hbase/bin/../logs/hbase-root-zookeeper-slave1.out
starting master, logging to /usr/local/hbase/bin/../logs/hbase-root-master-master.out
slave1: starting regionserver, logging to /usr/local/hbase/bin/../logs/hbase-root-regionserver-slave1.out
master: starting regionserver, logging to /usr/local/hbase/bin/../logs/hbase-root-regionserver-master.out
slave2: starting regionserver, logging to /usr/local/hbase/bin/../logs/hbase-root-regionserver-slave2.out      ### started successfully
[iyunv@master hbase]# bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.94.26, ra9114543cd48c7e230dabd31f45a086f7b7a5d6a, Wed Dec 17 20:37:01 UTC 2014

hbase(main):001:0>             ### hbase is ready to use
The hbase services are now running on the other machines as well.

########################################################################################################################################
Install and use pig. I didn't know what pig was either when I started; it turns out to be a high-level dataflow language that compiles down to MapReduce jobs. Let's take a look together.
[iyunv@master local]# tar zxf pig-0.14.0.tar.gz
[iyunv@master local]# mv pig-0.14.0 pig
[iyunv@master local]# vim /etc/profile
export PIG_HOME=/usr/local/pig
export PATH=$PATH:$PIG_HOME/bin
[iyunv@master local]# pig -x local         ### enter pig's local mode, working in the current directory
grunt> ls
grunt> log = LOAD 'tutorial/data/excite-small.log'        ### a simple small test
grunt> AS (user:chararray, time:long, query:chararray);
grunt> grpd = group log by user;
grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
grunt> dump cntd;                         ### print cntd to the screen
grunt> STORE cntd INTO 'output';          ### write to 'output'; an output directory is created under the current directory, where we can inspect the result
[iyunv@master pig]# cat output/part-r-00000         ### that was pig's local mode; pig also has a Hadoop mode
Next, pig's Hadoop mode:
[iyunv@master pig]# vim conf/pig.properties
fs.default.name=hdfs://192.168.1.180:9000
mapred.job.tracker=192.168.1.180:9001
[iyunv@master ~]# pig              ### enter Hadoop mode directly; first put a copy of the log into HDFS
grunt> ls
hdfs://192.168.1.180:9000/usr/root/test/access.log 133878764
grunt> a = load 'access.log'             ### count the per-IP access totals for our access log
grunt> using PigStorage(' ')             ### split fields on a space; the default delimiter is tab
grunt> AS (ip,a1,a2,a3,a4,a5,a6,a7,a8);
grunt> b = foreach a generate ip;
grunt> c = group b by ip;
grunt> d = foreach c generate group,COUNT($1);
grunt> dump d;      ### print to standard output
grunt> store d INTO 'output';     ### save to HDFS (viewable in the web UI); next we export from HDFS into the mysql database
[iyunv@master]# /usr/local/sqoop/bin/sqoop export --connect "jdbc:mysql://127.0.0.1:3306/test?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table table_ip --export-dir /usr/root/test/output/part-r-00000 --fields-terminated-by '\t' -m 1
Check the stored data in the MySQL database.
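The Pig script above (project the ip column, group by it, COUNT) is the same counting pattern as the earlier streaming job, and its STORE output is the tab-separated `group<TAB>count` file the sqoop export consumes. A local Python sketch of the equivalence, on made-up lines:

```python
from collections import Counter

# made-up access-log lines; only the first (ip) field matters here
lines = [
    "10.0.0.1 - - [x] GET / HTTP/1.1 200 99",
    "10.0.0.1 - - [x] GET / HTTP/1.1 200 99",
    "10.0.0.2 - - [x] GET / HTTP/1.1 200 99",
]

# b = foreach a generate ip;  c = group b by ip;  d = generate group, COUNT($1)
counts = Counter(line.split(" ")[0] for line in lines)

# STORE writes group<TAB>count, which --fields-terminated-by '\t' reads back
out = ["%s\t%d" % (ip, n) for ip, n in sorted(counts.items())]
print(out)
```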

That's it; I hope this helps. Questions are welcome via QQ (645868779).
