[Experience Sharing] Setting Up a Hadoop, HBase, and Hive Pseudo-Distributed Environment

Hadoop HBase Hive
Startup commands:
$HADOOP_HOME/bin/start-all.sh
$HBASE_HOME/bin/start-hbase.sh
$HIVE_HOME/bin/hive
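To confirm the daemons came up, a quick check with jps (shipped with the JDK) should list the Hadoop and HBase processes; a rough sketch of what to expect on this pseudo-distributed setup:
$ jps
# Expected processes: NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker,
# HMaster, HRegionServer, and HQuorumPeer (when HBase manages ZooKeeper itself, the default)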
Environment Configuration
1. Install the JDK
2. Configure SSH
3. Set the environment variables in /etc/profile:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_65
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${JRE_HOME}/lib/rt.jar
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=.:$CLASSPATH:$HADOOP_HOME/lib
export HBASE_HOME=/usr/local/hbase
export HIVE_HOME=/usr/local/hive
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:${HADOOP_HOME}/bin:${HBASE_HOME}/bin:$HIVE_HOME/bin:$PATH
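After saving /etc/profile, reload it and verify that the expected versions resolve from the new paths (a quick sanity check; the version strings below simply match what this guide installs):
$ source /etc/profile
$ java -version       # should report 1.7.0_65
$ hadoop version      # should report Hadoop 1.1.2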
Hadoop   
Hadoop version: 1.1.2

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).


Prerequisites



Required Software


Required software for Linux and Windows includes:
1.    Java 1.6.x or later, preferably from Sun/Oracle, must be installed (this guide uses JDK 1.7.0_65).
2.    ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Additional requirements for Windows include:
1.   Cygwin - Required for shell support in addition to the required software above.


Installing Software


If your cluster doesn't have the requisite software you will need to install it.
For example on Ubuntu Linux:
$ sudo apt-get install ssh 
$ sudo apt-get install rsync
On Windows, if you did not install the required software when you installed cygwin, start the cygwin installer and select the packages:
·  openssh - the Net category


Download


To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.


Prepare to Start the Hadoop Cluster


Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_65
Try the following command:
$ bin/hadoop 
This will display the usage documentation for the hadoop script.

 


Pseudo-Distributed Operation


Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.


Configuration


First give the machine a fixed hostname and map it to its IP address:
hostname myhadoop
vi /etc/hostname       # set the file contents to: myhadoop
vi /etc/hosts          # add a line mapping the machine's IP address to myhadoop
<ip-address> myhadoop
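For example, if the machine's address were 192.168.1.100 (an illustrative value only; use your real IP), the added /etc/hosts line would read:
192.168.1.100 myhadoop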
conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://myhadoop:9000</value>
     </property>
</configuration>
conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
    <property>
       <name>dfs.name.dir</name>
       <value>/usr/local/hadoop/hadoopdata/dfsname</value>
    </property>
    <property>
       <name>dfs.data.dir</name>
       <value>/usr/local/hadoop/hadoopdata/dfsdata</value>
    </property>
</configuration>
conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>myhadoop:9001</value>
     </property>
</configuration>

Setup passphraseless ssh


Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
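On some systems sshd also requires tight permissions on the key files before it will accept them; if the login still prompts for a password, this is worth checking:
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost       # should now log in without a passphrase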



Execution


Format a new distributed-filesystem:
$ bin/hadoop namenode -format
Start the hadoop daemons:
$ bin/start-all.sh
The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
·  NameNode - http://localhost:50070/
·  JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output 
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh
 
HBase
HBase version: 0.94.7
hbase-site.xml
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://myhadoop:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>myhadoop</value>
  </property>

bin/start-hbase.sh
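A quick way to confirm HBase is healthy is to open the HBase shell and check the cluster status (a minimal sketch; the exact output wording varies by version):
$HBASE_HOME/bin/hbase shell
hbase(main):001:0> status
# should report one live region server and no dead servers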


 
Hive
Hive version: 0.9.0
1. Hive configuration
 cp hive-default.xml.template hive-site.xml
 cp hive-log4j.properties.template hive-log4j.properties
 cp hive-env.sh.template hive-env.sh
 
2. Edit hive-env.sh
Set the HADOOP_HOME path in it.
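A minimal sketch of the relevant hive-env.sh line, assuming the Hadoop install path used earlier in this guide:
# hive-env.sh
export HADOOP_HOME=/usr/local/hadoop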
3. Edit hive-site.xml so that the Hive metastore is stored in MySQL
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://myhadoop:3306/hive_metadata?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
 
<property>
    <name>hive.aux.jars.path</name>
<value>file:///usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive/lib/hbase-0.94.7-security.jar,file:///usr/local/hive/lib/protobuf-java-2.4.0a.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>
</property>
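The MySQL account configured above must be able to connect from the Hive host; createDatabaseIfNotExist=true in the JDBC URL then creates the hive_metadata database on first use. A rough sketch of the MySQL-side grant, assuming the root/root credentials above (the 'myhadoop' host part is an assumption; adjust to how Hive actually connects):
mysql> GRANT ALL PRIVILEGES ON hive_metadata.* TO 'root'@'myhadoop' IDENTIFIED BY 'root';
mysql> FLUSH PRIVILEGES;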
Delete hbase-0.92.0.jar, hbase-0.92.0-tests.jar and zookeeper-3.4.3.jar from /usr/local/hive/lib.
 
Copy hbase-0.94.7-security.jar, zookeeper-3.4.5.jar and protobuf-java-2.4.0a.jar from the HBase installation into hive/lib (commands sketched below).
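A sketch of those jar operations, assuming the install paths used throughout this guide (in the HBase 0.94 tarball the main jar sits at the top level and the zookeeper/protobuf jars under lib/; adjust if your layout differs):
$ rm $HIVE_HOME/lib/hbase-0.92.0.jar $HIVE_HOME/lib/hbase-0.92.0-tests.jar $HIVE_HOME/lib/zookeeper-3.4.3.jar
$ cp $HBASE_HOME/hbase-0.94.7-security.jar $HIVE_HOME/lib/
$ cp $HBASE_HOME/lib/zookeeper-3.4.5.jar $HBASE_HOME/lib/protobuf-java-2.4.0a.jar $HIVE_HOME/lib/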
4. Edit hive-log4j.properties
#log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
 
5. Create the working directories on HDFS
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
 
6. Manually place the MySQL JDBC driver into hive/lib
~ ls /home/cos/toolkit/hive-0.9.0/lib
mysql-connector-java-5.1.22-bin.jar
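If the connector jar is not yet in place, copying it in is a one-liner (the source path is wherever you downloaded the jar; the filename below simply matches the version listed above):
$ cp mysql-connector-java-5.1.22-bin.jar $HIVE_HOME/lib/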
7. Start Hive
Option 1:
bin/hive
Option 2:
# start the metastore service
~ bin/hive --service metastore &
Starting Hive Metastore Server
 
# start the hiveserver (Thrift) service
~ bin/hive --service hiveserver &
Starting Hive Thrift Server
# start the hive client
~ bin/hive shell
Logging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.properties
Hive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txt
 
hive> show tables;
OK
 
 
 


Hive Functions and Complex Type Access


Hive provides composite (complex) data types:
Structs: fields inside a struct are accessed with dot notation (.). For example, if column c of a table has type STRUCT{a INT; b INT}, field a is read as c.a.
Maps (key-value pairs): a value is looked up by its key in brackets. For example, if a map M contains a 'group' -> gid pair, the gid value is read as M['group'].
Arrays: all elements of an array have the same type. For example, if array A holds the elements ['a','b','c'], then A[1] is 'b'.
 
Array
Create the table:
create table class_test(name string, student_id_list array<INT>)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':';
Load the data:
vim test.txt
034,1:2:3:4
035,5:6
036,7:8:9:10
 
LOAD DATA LOCAL INPATH '/opt/test.txt' INTO TABLE class_test ;
Query:
select student_id_list[3] from class_test;
 
Map
Create the table:
create table employee(id string, perf map<string, int>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':';
Load the data:
vim test2.txt  
1   job:80,team:60,person:70
2   job:60,team:80
3   job:90,team:70,person:100
 
LOAD DATA LOCAL INPATH '/opt/test2.txt' INTO TABLE employee;
Query:
select perf['person'] from employee; 
select perf['person'] from employee where perf['person'] is not null;
 
Using a Struct
Create the table:
create table student_test(id INT, info struct<name:STRING, age:INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':';
'FIELDS TERMINATED BY': the separator between fields
'COLLECTION ITEMS TERMINATED BY': the separator between the items within one field
Load the data:
cat test3.txt  
1,zhou:30 
2,yan:30 
3,chen:20 
4,li:80 
LOAD DATA LOCAL INPATH '/opt/test3.txt' INTO TABLE student_test; 
Query:
select info.age from student_test;
 
 
Query the ID-card numbers recorded during the National Day holiday (September 30 to October 7) of each year:
select substr(rzsj,1,4) as year, sfzmhm from jnlk where substr(rzsj,6,5)>='09-30' and substr(rzsj,6,5)<='10-07' and rzsj is not null order by year,sfzmhm;
Exercise
Create a table in HBase:
create 'blog','article','author'
 
Insert data into HBase:
put 'blog','1','article:title','Head First HBase'
put 'blog','1','article:content','HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data.'
put 'blog','1','article:tags','Hadoop,HBase,NoSQL'
put 'blog','1','author:name','hujinjun'
put 'blog','1','author:nickname','一叶渡江'
 
put 'blog','10','article:tags','Hadoop'
put 'blog','10','author:nickname','heyun'
 
 
put 'blog','100','article:tags','hbase,nosql'
put 'blog','100','author:nickname','shenxiu'
 
Hive: map the HBase table as an external table (note that hbase.columns.mapping must include :key for the row key so that it lines up with the six Hive columns):
CREATE EXTERNAL TABLE blog(key int,title string,content string,tags string,name string,nickname string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,article:title,article:content,article:tags,author:name,author:nickname") TBLPROPERTIES("hbase.table.name" = "blog");
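Once the external table is registered, the HBase rows inserted above can be queried straight from Hive, for example:
hive> select key, title, nickname from blog;
hive> select key, tags from blog where tags like '%Hadoop%';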
 
 
hive> create table wyp (id int, name string, age int, tel string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
 
vim /opt/wyp.txt
1   wyp 25  13188888888888
2   test   30  13888888888888
3   zs  34  899314121
 
hive> load data local inpath '/opt/wyp.txt' into table wyp;
 
vim /opt/add.txt
5   wyp1   23  131212121212
6   wyp2   24  134535353535
7   wyp3   25  132453535353
8   wyp4   26  154243434355
 
$HADOOP_HOME/bin/hadoop fs -mkdir /wyp
$HADOOP_HOME/bin/hadoop fs -copyFromLocal /opt/add.txt /wyp/add.txt
 
hive> load data inpath '/wyp/add.txt' into table wyp;
 
hive> select * from wyp;
 
 
FAQ
 


hive hwi startup error


Error log:
INFO hwi.HWIServer: HWI is starting up
WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /usr/local/hive/conf/hive-default.xml
FATAL hwi.HWIServer: HWI WAR file not found at /usr/local/hive/usr/local/hive/lib/hive-hwi-0.9.0.war
Fix:
The fix is simple: add the following to hive-site.xml:
<property>
  <name>hive.hwi.war.file</name>
  <value>lib/hive-hwi-0.9.0.war</value>
  <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>
</property>
Otherwise HWI resolves the WAR path incorrectly, as in the doubled /usr/local/hive/usr/local/hive path shown in the error above (the value must be relative to ${HIVE_HOME}).
 
