Using Elasticsearch with Hive

Posted by 寂寞大萝卜 on 2019-1-29 07:08:31

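　　First, download elasticsearch-hadoop-2.0.2.jar.
　　It can be fetched with Maven:
　　<dependency>
　　  <groupId>org.elasticsearch</groupId>
　　  <artifactId>elasticsearch-hadoop</artifactId>
　　  <version>2.0.2</version>
　　</dependency>
　　A newer version is also available:
　　<dependency>
　　  <groupId>org.elasticsearch</groupId>
　　  <artifactId>elasticsearch-hadoop</artifactId>
　　  <version>2.1.0.Beta3</version>
　　</dependency>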
　　It can also be downloaded from http://www.elasticsearch.org/overview/hadoop/download/
　　The tutorial is here: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html#_writing_data_to_elasticsearch_2
　　Once you have the jar, copy it into Hive's lib directory, then start the Hive CLI like this:
　　bin/hive -hiveconf hive.aux.jars.path=/root/hive/lib/elasticsearch-hadoop-2.0.2.jar
　　This setting can also be written into Hive's configuration file:
　　==============================================================================================================
　　CLI configuration.
　　$ bin/hive --auxpath=/path/elasticsearch-hadoop.jar
　　or use the hive.aux.jars.path property, specified either through the command line or, if available, through the hive-site.xml file, to register additional jars (it accepts a URI as well):
　　$ bin/hive -hiveconf hive.aux.jars.path=/path/elasticsearch-hadoop.jar
　　or if the hive-site.xml configuration can be modified, one can register additional jars through the hive.aux.jars.path option (that accepts an URI as well):
　　hive-site.xml configuration.
　　
　　<property>
　　  <name>hive.aux.jars.path</name>
　　  <value>/path/elasticsearch-hadoop.jar</value>
　　  <description>A comma separated list (with no spaces) of the jar files</description>
　　</property>
　　
　　==============================================================================================================
　　The above is the configuration method given in the official documentation.
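Since the property takes a comma-separated list with no spaces, a small helper can assemble the value when more than one jar is needed. This is only a sketch: join_aux_jars is a hypothetical function name, not something Hive provides, and the jar path is the one used in this post.

```shell
# Sketch: build a hive.aux.jars.path value from one or more jar paths.
# join_aux_jars is a hypothetical helper, not part of Hive.
join_aux_jars() {
  # "$*" joins the arguments using the first character of IFS.
  local IFS=','
  printf '%s' "$*"
}

# The jar path below is the one used in this post; adjust to your install.
AUX_JARS="$(join_aux_jars /root/hive/lib/elasticsearch-hadoop-2.0.2.jar)"

# Print the command rather than running it, so it can be inspected first.
echo "bin/hive -hiveconf hive.aux.jars.path=${AUX_JARS}"
```

Passing several paths, for example `join_aux_jars /a.jar /b.jar`, yields a single comma-joined string with no spaces, which is the shape the property description asks for.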
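　　First, you have to tell Hive that the table is backed by Elasticsearch.
　　Create the external table, which acts as a view onto the Elasticsearch index: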
　　CREATE EXTERNAL TABLE user(id BIGINT, name STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'radio/artists','es.index.auto.create' = 'true');
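　　If data cannot be inserted, create the table with the ES IP and port specified instead (if the table already exists, drop it first):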
　　CREATE EXTERNAL TABLE user(id BIGINT, name STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'radio/artists','es.index.auto.create' = 'true','es.nodes'='192.168.1.88','es.port'='9200');
　　For other settings, see http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html
　　In es.resource, radio/artists gives the index name (radio) and the index type (artists); these are what you use when accessing the data in ES.
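To make the index/type split concrete, here is a rough sketch that takes an es.resource value apart and builds the REST URL you would query on the ES side. The host and port are the es.nodes and es.port values from the table definition above; the variable names are illustrative only, and the curl call is left commented out because it needs a reachable cluster.

```shell
# Sketch: split an es.resource value into index and type, then build the
# REST URL for searching it. ES_HOST and ES_PORT mirror the es.nodes and
# es.port values used in this post; adjust them for your cluster.
ES_RESOURCE="radio/artists"
ES_HOST="192.168.1.88"
ES_PORT="9200"

ES_INDEX="${ES_RESOURCE%%/*}"   # text before the first slash
ES_TYPE="${ES_RESOURCE#*/}"     # text after the first slash

SEARCH_URL="http://${ES_HOST}:${ES_PORT}/${ES_INDEX}/${ES_TYPE}/_search"
echo "$SEARCH_URL"

# To actually query (requires a reachable cluster):
#   curl "${SEARCH_URL}?pretty"
```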
　　Next, create the source data table:
　　CREATE TABLE user_source (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
　　On Linux, create a data.txt file whose contents will be loaded into user_source:
　　vim data.txt
　　1,medcl
　　2,lcdem
　　3,tom
　　4,jack
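If you would rather not edit the file by hand in vim, the same four rows can be written with a heredoc. This is a sketch under assumptions: DATA_FILE is an illustrative variable defaulting to the current directory, whereas the LOAD DATA statement further down expects the file at /home/steven/data.txt.

```shell
# Sketch: create data.txt non-interactively instead of editing it in vim.
# DATA_FILE is an illustrative variable; point it at the path that your
# LOAD DATA statement uses.
DATA_FILE="${DATA_FILE:-./data.txt}"

cat > "$DATA_FILE" <<'EOF'
1,medcl
2,lcdem
3,tom
4,jack
EOF

# Sanity check: every line should have exactly two comma-separated fields,
# matching the (id, name) columns of user_source.
BAD_LINES="$(awk -F',' 'NF != 2' "$DATA_FILE" | wc -l | tr -d ' ')"
echo "lines with wrong field count: ${BAD_LINES}"
```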
　　Load the data into the user_source table:
　　LOAD DATA LOCAL INPATH '/home/steven/data.txt' OVERWRITE INTO TABLE user_source;
　　hive> select * from user_source;
　　OK
　　1 medcl
　　2 lcdem
　　3 tom
　　4 jack
　　Time taken: 0.149 seconds, Fetched: 4 row(s)
　　Insert the data into the user table:
　　INSERT OVERWRITE TABLE user SELECT s.id, s.name FROM user_source s;
　　For a reason I did not understand at first, the insert failed with a file-not-found error:
　　INSERT OVERWRITE TABLE user SELECT s.id,s.name FROM user_source s;
　　Total jobs = 1
　　Launching Job 1 out of 1
　　Number of reduce tasks is set to 0 since there's no reduce operator
　　java.io.FileNotFoundException: File does not exist: hdfs://dev-53:8020/root/hive/lib/elasticsearch-hadoop-2.0.2.jar
　　at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
　　at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
　　at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
　　at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
　　at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
　　at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
　　at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
　　at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
　　at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
　　at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
　　at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
　　at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
　　at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
　　at java.security.AccessController.doPrivileged(Native Method)
　　at javax.security.auth.Subject.doAs(Subject.java:415)
　　at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
　　at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
　　at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
　　at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
　　at java.security.AccessController.doPrivileged(Native Method)
　　at javax.security.auth.Subject.doAs(Subject.java:415)
　　at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
　　at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
　　at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
　　at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
　　at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
　　at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
　　at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
　　at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
　　at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
　　at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
　　at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
　　at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
　　at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
　　at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
　　at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
　　at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
　　at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
　　at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
　　at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
　　at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
　　at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
　　at java.lang.reflect.Method.invoke(Method.java:606)
　　at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
　　Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://dev-53:8020/root/hive/lib/elasticsearch-hadoop-2.0.2.jar)'
　　FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
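　　This is how I eventually fixed it.
　　First, upload the jar to HDFS with the hadoop command:
　　hadoop fs -put /root/hive/lib/elasticsearch-hadoop-2.0.2.jar /tmp/elasticsearch-hadoop-2.0.2.jar
　　Then start Hive like this:
　　bin/hive -hiveconf hive.aux.jars.path=/tmp/elasticsearch-hadoop-2.0.2.jar
　　With that, it works. Judging from the error message, the path given in hive.aux.jars.path was being looked up on HDFS (hdfs://dev-53:8020/root/hive/lib/elasticsearch-hadoop-2.0.2.jar) at job submission, so the jar has to exist at that path there.
　　If the insert reports a failure to connect to ES, add the ES IP and port (es.nodes and es.port) to the table properties.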
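The two fix-up steps can be wrapped in a small script, roughly as below. Treat it as a sketch: hdfs_target is a hypothetical helper, /tmp is simply the HDFS directory used in this post, and the upload is skipped when the hadoop client or the local jar is not present.

```shell
# Sketch: copy the connector jar to HDFS and print the matching Hive command.
# hdfs_target is a hypothetical helper; /tmp is the HDFS directory used in
# this post.
LOCAL_JAR="/root/hive/lib/elasticsearch-hadoop-2.0.2.jar"

hdfs_target() {
  # Keep the file name and place it under /tmp on HDFS.
  printf '/tmp/%s' "$(basename "$1")"
}

HDFS_JAR="$(hdfs_target "$LOCAL_JAR")"

# Only talk to HDFS when the hadoop client and the local jar are present.
if command -v hadoop >/dev/null 2>&1 && [ -f "$LOCAL_JAR" ]; then
  # "hadoop fs -test -e" returns 0 when the path already exists on HDFS.
  if ! hadoop fs -test -e "$HDFS_JAR"; then
    hadoop fs -put "$LOCAL_JAR" "$HDFS_JAR"
  fi
else
  echo "hadoop client or ${LOCAL_JAR} not found; skipping upload" >&2
fi

# Print the start command so it can be checked before running it.
echo "bin/hive -hiveconf hive.aux.jars.path=${HDFS_JAR}"
```

Checking with -test -e first avoids the error that -put raises when the destination file already exists.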
