[Experience Sharing] Getting logs into Hive with Flume

  Primer: Flume is a log collection system under Apache, built from three parts: source + channel + sink. The source is where the logs come from (this example uses an exec source); the channel is the transfer leg, which can be file-based or memory-based; the sink is the output, commonly a hive sink, hbase sink, hdfs sink, or avro sink. A single machine can of course run several source + channel + sink pipelines.
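  As a minimal sketch of how those three pieces are wired together (the component names a1/s1/c1/k1 and the log path here are made up for illustration, not part of the setup below), a Flume agent is described in a properties file like this:

# minimal illustrative agent; names and path are hypothetical
a1.sources = s1
a1.channels = c1
a1.sinks = k1
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /var/log/some.log
a1.sources.s1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1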
  Hosts: 172.16.6.152 node1  — Flume + DataNode
         172.16.6.151 master — Flume + Hive + NameNode
  Plan: 1. On node1, use an exec source that runs tail -F /*/*.log to pick up the logs, push the events through a memory channel, and forward them to master over Avro.
  2. On master, Flume's source is Avro, receiving whatever node1 sends; the events pass through a memory channel and are finally written to HDFS by an HDFS sink.
  3. My skills being limited, I originally wanted to use a Hive sink directly on master, but it kept failing with a "hive class not found" error. So I fell back on a cheap trick: Hive will pick up data if you simply copy files into the directory behind the table's location, and that became the workaround (a hedged sketch of the idea follows right after this list).
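  A hedged sketch of that copy-into-location trick (the paths assume the table location and partition used later in this post; sample.log is just a placeholder file name):

# create the directory that will back the partition, then drop a file into it
hdfs dfs -mkdir -p /zhonghui/flume/logs/type=hadoop/host=172.16.6.152
hdfs dfs -put sample.log /zhonghui/flume/logs/type=hadoop/host=172.16.6.152/
# once the matching partition has been added in Hive, the file becomes queryable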
  The detailed configuration follows:
  1. Install Flume: unpack the tarball, then edit conf/flume-env.sh to set the environment variables.
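  Usually the only thing flume-env.sh really needs is JAVA_HOME; the path below is a placeholder, adjust it to your own JDK (the heap setting is optional):

# conf/flume-env.sh -- the JAVA_HOME path is a placeholder
export JAVA_HOME=/usr/java/jdk1.7.0_67
# optional: give the agent a bit more heap
export JAVA_OPTS="-Xms256m -Xmx512m"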
  2. On node1, create example4.conf in the conf directory with the following content:
  

  # Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
#agent1.sources.avro-source1.channels = ch1
#agent1.sources.avro-source1.type = avro
#agent1.sources.avro-source1.bind = 0.0.0.0
#agent1.sources.avro-source1.port = 41414
#agent1.sources.avro-source1.threads = 5

#define source monitor a file
agent1.sources.avro-source1.type = exec
agent1.sources.avro-source1.shell = /bin/bash -c
agent1.sources.avro-source1.command =  tail -F /opt/cdh5.3.0/hadoop/logs/hadoop-hadoop-namenode-node1.log
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.threads = 5

# Define an Avro sink that forwards every event it receives
# to the collector agent on master, and connect it to the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = avro
agent1.sinks.log-sink1.hostname=master
agent1.sinks.log-sink1.port=41415


# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1

  

  3. Flume setup on master: create example6.conf in conf with the following content:
  # Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 100000
agent1.channels.ch1.transactionCapacity = 100000
agent1.channels.ch1.keep-alive = 30

# Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
#agent1.sources.avro-source1.channels = ch1
#agent1.sources.avro-source1.type = avro
#agent1.sources.avro-source1.bind = 0.0.0.0
#agent1.sources.avro-source1.port = 41414
#agent1.sources.avro-source1.threads = 5

# define an Avro source that receives the events sent from node1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = master
agent1.sources.avro-source1.port =  41415
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.threads = 5

# Define an HDFS sink that writes every event it receives to HDFS,
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = hdfs
agent1.sinks.log-sink1.hdfs.path = hdfs://master:8020/zhonghui/flume/logs/type=hadoop/host=172.16.6.152/
agent1.sinks.log-sink1.hdfs.useLocalTimeStamp=true
agent1.sinks.log-sink1.hdfs.writeFormat = Text
agent1.sinks.log-sink1.hdfs.fileType = DataStream
agent1.sinks.log-sink1.hdfs.rollInterval = 0
agent1.sinks.log-sink1.hdfs.rollSize = 1000000
agent1.sinks.log-sink1.hdfs.rollCount = 0
agent1.sinks.log-sink1.hdfs.batchSize = 1000
agent1.sinks.log-sink1.hdfs.txnEventMax = 1000
agent1.sinks.log-sink1.hdfs.callTimeout = 60000
agent1.sinks.log-sink1.hdfs.appendTimeout = 60000
agent1.sinks.log-sink1.hdfs.filePrefix = log
# note: hdfs.batchSize was already set to 1000 above; this later value (2) is the one that takes effect
agent1.sinks.log-sink1.hdfs.batchSize = 2



# Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1

  4. Start Flume on master and on node1:
  master: bin/flume-ng agent --conf conf --conf-file conf/example6.conf --name agent1 -Dflume.root.logger=INFO,console
  node1: bin/flume-ng agent --conf conf --conf-file conf/example4.conf --name agent1 -Dflume.root.logger=INFO,console
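  The commands above run the agents in the foreground. If you want them to survive the terminal closing, one option (not from the original post) is nohup, e.g. for the master agent:

# hedged example: run the master agent in the background, logging to a local file
nohup bin/flume-ng agent --conf conf --conf-file conf/example6.conf --name agent1 \
    -Dflume.root.logger=INFO,console > flume-agent1.out 2>&1 &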
  5. After a few minutes (ten at most in my case; this should be controllable from the config, see the note right below), files will start appearing under the HDFS path. If they do, you are halfway there; if not, go over the config files carefully. Then stop the agents on node1 and master and delete the directory (and the files in it) that was just generated.
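  For what it's worth, the hdfs.rollInterval / hdfs.rollSize / hdfs.rollCount properties already in example6.conf (plus hdfs.idleTimeout, if set) are what decide when the sink closes a file; with rollSize = 1000000 and the other two set to 0, a file is only finalized once roughly 1 MB has accumulated. A quick way to check for output and then clean up (paths follow the config above):

hdfs dfs -ls /zhonghui/flume/logs/type=hadoop/host=172.16.6.152/
# if files are there, stop both agents, then remove the test output
hdfs dfs -rm -r /zhonghui/flume/logs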
  6. I won't repeat how to set up the Hive environment here; look up a guide yourself. One note on the DDL below: the column name create is a Hive keyword, so it is backquoted here to be safe.
  create table logs(
  `create` string,
  content string)
  PARTITIONED BY (type string, host string)
  row format delimited fields terminated by ','
  location '/zhonghui/flume/logs/';

  alter table logs add partition (type='hadoop',host='172.16.6.152');

  This creates the Hive table and one partition. Note that the partition's directory, /zhonghui/flume/logs/type=hadoop/host=172.16.6.152, is exactly the path the HDFS sink on master writes to, which is what makes the copy-into-location workaround from the plan work without any extra loading step.
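  To double-check that the partition really points at the directory the HDFS sink writes to, something like this can be run in the Hive shell (table and partition spec follow the DDL above):

hive> show partitions logs;
hive> describe formatted logs partition (type='hadoop', host='172.16.6.152');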
  7. Start Flume on node1 and master again, then just sit back and wait for the data:
  hive> select * from logs;
OK
2015-09-15 17:49:15 800 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 millisecondshadoop 172.16.6.152
2015-09-15 17:49:15 801 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).hadoop 172.16.6.152
2015-09-15 17:49:37 825 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 1050 hadoop 172.16.6.152
2015-09-15 17:49:37 853 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073860942_120256 172.16.6.153:50010 172.16.6.152:50010hadoop 172.16.6.152
2015-09-15 17:49:37 870 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073860943_120257 172.16.6.152:50010 172.16.6.151:50010hadoop 172.16.6.152
2015-09-15 17:49:38 801 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* BlockManager: ask 172.16.6.151:50010 to delete [blk_1073860943_120257]hadoop 172.16.6.152
2015-09-15 17:49:41 801 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* BlockManager: ask 172.16.6.153:50010 to delete [blk_1073860942_120256]hadoop 172.16.6.152
2015-09-15 17:49:44 802 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* BlockManager: ask 172.16.6.152:50010 to delete [blk_1073860942_120256hadoop 172.16.6.152
2015-09-15 17:49:45 800 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 millisecondshadoop 172.16.6.152
2015-09-15 17:49:45 801 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).hadoop 172.16.6.152
2015-09-15 17:50:15 800 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 millisecondshadoop 172.16.6.152
2015-09-15 17:50:15 800 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).hadoop 172.16.6.152
2015-09-15 17:50:22 104 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 172.16.6.150hadoop 172.16.6.152
2015-09-15 17:50:22 104 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logshadoop 172.16.6.152
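  Because type and host are partition columns, filtering on them lets Hive read only the matching directory instead of scanning the whole table; a hedged example (not from the original post):

hive> select * from logs where type='hadoop' and host='172.16.6.152' limit 10;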

  Appendix: https://flume.apache.org/FlumeUserGuide.html#hdfs-sink — the official user guide has fairly detailed documentation on the HDFS sink.
  
