
[Experience Sharing] Using Elasticsearch together with Hive

Posted on 2019-1-29 07:08:31
  First, download elasticsearch-hadoop-2.0.2.jar.
  You can pull it with Maven:
  <dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>2.0.2</version>
  </dependency>
  There is also a newer version:
  <dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>2.1.0.Beta3</version>
  </dependency>
  You can also download it directly from http://www.elasticsearch.org/overview/hadoop/download/
  The official tutorial is here: http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/hive.html#_writing_data_to_elasticsearch_2
  Once you have the jar, copy it into Hive's lib directory, then open the Hive CLI like this:
  bin/hive -hiveconf hive.aux.jars.path=/root/hive/lib/elasticsearch-hadoop-2.0.2.jar
  This can also be set in Hive's configuration file.
  ==============================================================================================================
  CLI configuration.
  $ bin/hive --auxpath=/path/elasticsearch-hadoop.jar
  or use the hive.aux.jars.path property, specified either through the command line or, if available, through the hive-site.xml file, to register additional jars (it accepts a URI as well):
  $ bin/hive -hiveconf hive.aux.jars.path=/path/elasticsearch-hadoop.jar
  or if the hive-site.xml configuration can be modified, one can register additional jars through the hive.aux.jars.path option (that accepts an URI as well):
  hive-site.xml configuration.
  <property>
    <name>hive.aux.jars.path</name>
    <value>/path/elasticsearch-hadoop.jar</value>
    <description>A comma separated list (with no spaces) of the jar files</description>
  </property>
  ==============================================================================================================
  The block above is the configuration approach given by the official site.
  First, you have to tell Hive that the table is backed by Elasticsearch.
  Create the external table:
  CREATE EXTERNAL TABLE user(id BIGINT, name STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'radio/artists','es.index.auto.create' = 'true');
  If you cannot insert data, run the following instead, which specifies the ES IP and port explicitly:
  CREATE EXTERNAL TABLE user(id BIGINT, name STRING) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'radio/artists','es.index.auto.create' = 'true','es.nodes'='192.168.1.88','es.port'='9200');
  For other configuration options, see http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html
  In es.resource, radio and artists are the index name and the index type, respectively; they are used when the data is accessed in Elasticsearch.
  Then create the source-data table:
  CREATE TABLE user_source (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
  On Linux, create a data.txt file whose contents will be imported into user_source:
  vim data.txt
  1,medcl
  2,lcdem
  3,tom
  4,jack
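  The four sample rows above can also be written from the shell in one step; a minimal sketch (the /tmp/data.txt path is an assumption here, adjust it to your environment):

```shell
# Write the four sample rows to a file (the path is illustrative; adjust as needed).
cat > /tmp/data.txt <<'EOF'
1,medcl
2,lcdem
3,tom
4,jack
EOF
```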
  Load the data into the user_source table:
  LOAD DATA LOCAL INPATH '/home/steven/data.txt' OVERWRITE INTO TABLE user_source;
  hive> select * from user_source;
  OK
  1 medcl
  2 lcdem
  3 tom
  4 jack
  Time taken: 0.149 seconds, Fetched: 4 row(s)
  Insert the data into the user table:
  INSERT OVERWRITE TABLE user SELECT s.id, s.name FROM user_source s;
  For some reason, after running the INSERT the job could not find the jar file:
  INSERT OVERWRITE TABLE user SELECT s.id,s.name FROM user_source s;
  Total jobs = 1
  Launching Job 1 out of 1
  Number of reduce tasks is set to 0 since there's no reduce operator
  java.io.FileNotFoundException: File does not exist: hdfs://dev-53:8020/root/hive/lib/elasticsearch-hadoop-2.0.2.jar
  at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1110)
  at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
  at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
  at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
  at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
  at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
  at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:264)
  at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300)
  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387)
  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
  at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
  at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
  at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
  at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
  at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
  at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
  at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
  at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
  at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
  Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://dev-53:8020/root/hive/lib/elasticsearch-hadoop-2.0.2.jar)'
  FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
  The fix: the job submitter resolved the local jar path against HDFS (note the hdfs://dev-53:8020 prefix in the error), so the jar has to be available there.
  First, upload the jar to HDFS:
  hadoop fs -put /root/hive/lib/elasticsearch-hadoop-2.0.2.jar /tmp/elasticsearch-hadoop-2.0.2.jar
  Then start Hive pointing at the HDFS path:
  bin/hive -hiveconf hive.aux.jars.path=/tmp/elasticsearch-hadoop-2.0.2.jar
  After that everything worked.
  If the INSERT reports an Elasticsearch connection failure, add the ES IP and port (es.nodes and es.port) to the table's TBLPROPERTIES as shown earlier.
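  After the INSERT succeeds, you can check from the shell that the rows actually reached Elasticsearch. A minimal sketch, assuming the node at 192.168.1.88:9200 used in the table definition above:

```shell
# Search the radio/artists index that the external table writes to
# (host and port come from the example above; adjust to your cluster).
curl -s 'http://192.168.1.88:9200/radio/artists/_search?pretty' \
  || echo 'Elasticsearch not reachable at 192.168.1.88:9200'
```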

