Posted 2017-03-24
ELK + Filebeat + Nginx Centralized Logging Solution (Part 3): Adding a Kafka + ZooKeeper Cluster
I. Background:


Kafka:


Kafka is a messaging system originally developed at LinkedIn as the foundation of its activity-stream and operational-data processing pipelines. It has since been adopted by many companies for a wide range of data pipelines and messaging workloads. Activity-stream data is a standard part of the reporting almost every site does on its own usage: page views, information about viewed content, searches, and so on. This data is usually handled by writing each activity to a log file and then periodically aggregating and analyzing those files. Operational data refers to server performance metrics (CPU and I/O utilization, request latency, service logs, etc.); the ways of aggregating it are many and varied.


If Kafka is new to you, see the earlier article 《Kafka 入门 and kafka+logstash 实战应用》.




ZooKeeper:

ZooKeeper is a highly available, high-performance, consistent open-source coordination service designed for distributed applications. It is an open-source implementation of Google's Chubby and a key component of Hadoop and HBase. Its core primitive is a distributed lock service; because ZooKeeper is open source, developers have since built further uses on top of that primitive: configuration maintenance, group membership, distributed message queues, distributed notification/coordination, and more.
ZooKeeper synchronizes data in memory, so every node in the ensemble holds an identical in-memory data structure, which makes it very efficient.
It is mainly used to solve the data-management problems distributed applications commonly face, such as unified naming, state synchronization, cluster management, and management of distributed configuration.


If ZooKeeper is new to you, see the earlier article 《ZooKeeper基本讲解 & 集群构建 & 常用操作指令》.


II. Lab Environment


Architecture diagram:
(image: architecture diagram)



Reading the architecture (left to right, five layers in total):

Layer 1: data collection
On the far left is the application server cluster. Filebeat is installed on each server to collect logs and ship them to the two Logstash services.

Layer 2: data processing and buffering
The Logstash services format the received logs and write them into the local Kafka broker + ZooKeeper cluster.

Layer 3: data forwarding
A dedicated Logstash node continuously pulls data from the Kafka broker cluster and forwards it to the ES data nodes.

Layer 4: persistent storage
The ES data nodes write the received data to disk and build the indices.

Layer 5: search and visualization
ES master + Kibana coordinate the ES cluster, serve search requests, and display the data.

To save scarce server resources, I have co-located some separable services on the same hosts. Feel free to split them out and extend the architecture to fit your own environment.




8 servers (CentOS 6.5 Final):

192.168.1.194 (Filebeat log collection; nginx as web server)
192.168.1.195 (Filebeat log collection; nginx as web server)
192.168.1.196 (Logstash; Kafka + ZooKeeper)
192.168.1.197 (Kafka + ZooKeeper)
192.168.1.199 (Logstash; Kafka + ZooKeeper)
192.168.1.198 (Elasticsearch master; Kibana; nginx as reverse proxy)
192.168.1.200 (Elasticsearch data node)
192.168.1.201 (Elasticsearch data node)




Versions used:
java-1.8.0-openjdk
filebeat-5.2.2
logstash-5.2.2
elasticsearch-5.2.2
kibana-5.2.2
nginx-1.6.1
zookeeper-3.4.9
kafka_2.11-0.10.2.0





III. Installation and Configuration:


This walkthrough builds on the first two articles in the series. If you have not read them, start with 《ELK+Filebeat+Nginx集中式日志解决方案(一)》 and 《ELK+Filebeat+Nginx集中式日志解决方案(二)——添加ElasticSearch集群》.

  1. Install and configure the ZooKeeper cluster:

ZooKeeper website: http://zookeeper.apache.org/
Install java-1.8.0-openjdk on 192.168.1.196, 192.168.1.197, and 192.168.1.199, then unpack ZooKeeper on each:
tar xf zookeeper-3.4.9.tar.gz




Configuration file:

[iyunv@localhost zookeeper-3.4.9]# vim conf/zoo.cfg

Contents:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/u01/zookeeper/zookeeper-3.4.9/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
server.6=192.168.1.196:2888:3888
server.7=192.168.1.197:2888:3888
server.9=192.168.1.199:2888:3888
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/ ... html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1




Create the myid files (each id must match the server.N entry in zoo.cfg):

#192.168.1.196
echo 6 > /u01/zookeeper/zookeeper-3.4.9/data/myid

#192.168.1.197
echo 7 > /u01/zookeeper/zookeeper-3.4.9/data/myid

#192.168.1.199
echo 9 > /u01/zookeeper/zookeeper-3.4.9/data/myid




Start the services and check each node's role:

#192.168.1.196
bin/zkServer.sh start
bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/workspace/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: leader

#192.168.1.197
bin/zkServer.sh start
bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/workspace/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower

#192.168.1.199
bin/zkServer.sh start
bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/workspace/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower




The ZooKeeper cluster is now up and running.
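As a quick sanity check (my own addition, not part of the original walkthrough), ZooKeeper 3.4 answers the "four-letter-word" commands on the client port, so you can poll every node without opening a shell on it. This assumes nc (netcat) is installed and requires the live cluster above:

```shell
# Poll each ensemble member over the client port (2181)
for host in 192.168.1.196 192.168.1.197 192.168.1.199; do
    echo -n "$host: "
    # "ruok" answers "imok" when the server is running
    echo ruok | nc "$host" 2181
    echo
    # "stat" reports the node's role (Mode: leader/follower), among other things
    echo stat | nc "$host" 2181 | grep Mode
done
```

This gives the same leader/follower picture as `bin/zkServer.sh status`, but works remotely from a single host.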


  2. Install and configure the Kafka cluster:

Kafka website: http://kafka.apache.org/
On 192.168.1.196, 192.168.1.197, and 192.168.1.199, run:
tar xf kafka_2.11-0.10.2.0.tar.gz




Edit the configuration file on 192.168.1.196:

[iyunv@localhost workspace]# vim kafka_2.11-0.10.2.0/config/server.properties

Contents:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=6
# Switch to enable topic deletion or not, default value is false
#delete.topic.enable=true
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://192.168.1.196:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads handling network requests
num.network.threads=3
# The number of threads doing disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma seperated list of directories under which to store log files
log.dirs=/home/workspace/kafka_2.11-0.10.2.0/data
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=168
# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=192.168.1.196:2181,192.168.1.197:2181,192.168.1.199:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000




Edit the same file on the other two brokers, 192.168.1.197 and 192.168.1.199:

# change broker.id and listeners
# 192.168.1.197
broker.id=7
listeners=PLAINTEXT://192.168.1.197:9092
  
# 192.168.1.199
broker.id=9
listeners=PLAINTEXT://192.168.1.199:9092



Everything else stays identical.

Start the service (the same way on every broker):
bin/kafka-server-start.sh config/server.properties




OK — the Kafka + ZooKeeper cluster is now complete!
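Before wiring up Logstash, it is worth verifying the brokers end to end with the console tools bundled with Kafka 0.10.x (this check is my addition; it requires the live cluster and uses the testnginx topic that the Logstash pipelines in this article rely on):

```shell
# Create the topic explicitly, replicated across all three brokers
bin/kafka-topics.sh --create \
    --zookeeper 192.168.1.196:2181,192.168.1.197:2181,192.168.1.199:2181 \
    --replication-factor 3 --partitions 3 --topic testnginx

# Describe it to confirm partition leaders and in-sync replicas (Isr)
bin/kafka-topics.sh --describe --zookeeper 192.168.1.196:2181 --topic testnginx

# Produce a test message...
echo "hello kafka" | bin/kafka-console-producer.sh \
    --broker-list 192.168.1.196:9092 --topic testnginx

# ...and consume it back (Ctrl-C to stop)
bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.196:9092 \
    --topic testnginx --from-beginning
```

Creating the topic up front also avoids relying on auto-creation, which would fall back to the num.partitions=1 default from server.properties.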


3. Install and configure Logstash on 192.168.1.199

Install it as described in the first article, 《ELK+Filebeat+Nginx集中式日志解决方案(一)》. The Logstash instance on this host acts as the message consumer for the Kafka cluster: it actively pulls messages from Kafka and forwards them to the Elasticsearch cluster.


Edit the configuration:

vim /etc/logstash/conf.d/kafka2elasticsearch.conf

Contents:
input {
    kafka {
        bootstrap_servers => "192.168.1.196:9092,192.168.1.197:9092,192.168.1.199:9092"
        group_id => "logstashnginx"
        topics => ["testnginx"]
        consumer_threads => 10
    }
}

filter {
    #if "nginx-accesslog" in [tags] {
        grok {
    match => { "message" => "%{HTTPDATE:timestamp}\|%{IP:remote_addr}\|%{IPORHOST:http_host}\|(?:%{DATA:http_x_forwarded_for}|-)\|%{DATA:request_method}\|%{DATA:request_uri}\|%{DATA:server_protocol}\|%{NUMBER:status}\|(?:%{NUMBER:body_bytes_sent}|-)\|(?:%{DATA:http_referer}|-)\|%{DATA:http_user_agent}\|(?:%{DATA:request_time}|-)\|"}
        }
        mutate {
                convert => ["status","integer"]
                convert => ["body_bytes_sent","integer"]
                convert => ["request_time","float"]
        }
        geoip {
                source=>"remote_addr"
        }
        date {
                match => [ "timestamp","dd/MMM/YYYY:HH:mm:ss Z"]
        }
        useragent {
                source=>"http_user_agent"
        }
#}
#if "sys-messages"  in [tags] {
#        grok {         
#                        match => { "message" => "%{SYSLOGLINE}" }
#                        add_field => [ "received_at", "%{@timestamp}" ]
#                        add_field => [ "received_from", "%{host}" ]
#        }
#        date {  
#                match => [ "timestamp", "MMM  d HH:mm:ss" ]
#        }
        #ruby {
        #        code => "event['@timestamp'] = event['@timestamp'].getlocal"
        #}
#}
}
  
output {
    elasticsearch {
        hosts => ["192.168.1.200:9200","192.168.1.201:9200"]
        index => "logstash-nginx-%{+YYYY.MM.dd}"
        #document_type => "%{type}"
        flush_size => 50000
        idle_flush_time => 10
    }
    #stdout { codec => rubydebug }
}
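The grok pattern above depends on nginx writing a pipe-delimited log line. As an offline illustration (my own sketch, not part of the pipeline — the sample line is hypothetical, and it assumes no field value contains a literal "|"), the same decomposition plus the mutate/convert step can be reproduced in a few lines of Python:

```python
# Sketch of the pipe-delimited parsing that the grok pattern above performs.
# Sample log line is hypothetical; field order mirrors the grok pattern.
sample = ("24/Mar/2017:16:48:32 +0800|1.2.3.4|www.example.com|-|GET|/index.html|"
          "HTTP/1.1|200|612|-|Mozilla/5.0|0.003|")

FIELDS = ["timestamp", "remote_addr", "http_host", "http_x_forwarded_for",
          "request_method", "request_uri", "server_protocol", "status",
          "body_bytes_sent", "http_referer", "http_user_agent", "request_time"]

def parse(line):
    # The format ends with a trailing '|', so strip it before splitting
    parts = line.rstrip("|").split("|")
    if len(parts) != len(FIELDS):
        return None  # line does not match the expected format
    event = dict(zip(FIELDS, parts))
    # Mirror the mutate/convert step in the Logstash filter
    event["status"] = int(event["status"])
    event["body_bytes_sent"] = int(event["body_bytes_sent"]) if event["body_bytes_sent"] != "-" else 0
    event["request_time"] = float(event["request_time"]) if event["request_time"] != "-" else 0.0
    return event

event = parse(sample)
print(event["status"], event["request_time"])  # prints: 200 0.003
```

Running a sample line through a sketch like this is a quick way to debug the field layout before touching the live grok pattern.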




Then start Logstash:
nohup logstash -r -f kafka2elasticsearch.conf --path.settings /etc/logstash/ &





4. Configure Logstash on 192.168.1.196

Edit the configuration:

vim /etc/logstash/conf.d/2kafka.conf

Contents:
input {
    beats {
            port => 5044
    }
}
output {
    #elasticsearch {
    #    hosts => ["192.168.1.198:9200"]
    #    index => "logstash-%{type}-%{+YYYY.MM.dd}"
    #    document_type => "%{type}"
    #}
    #stdout { codec => rubydebug }
    kafka {
    #workers => 3
    bootstrap_servers => "192.168.1.196:9092,192.168.1.197:9092,192.168.1.199:9092"
    topic_id => "testnginx"
    }
}




Then start Logstash:
nohup logstash -r -f 2kafka.conf --path.settings /etc/logstash/ &





Done! The whole cluster architecture is now in place, and the collected logs show up in Kibana.

More articles in this series will follow.

IV. Unresolved issue:
During the experiment I found that the second-layer Logstash receives fields such as tags and document_type from Filebeat, but once the events have passed through the Kafka + ZooKeeper cluster, the third-layer Logstash can no longer pull those fields out of Kafka. That makes it impossible to branch the filter on tags or document_type to tell log sources apart, so different grok patterns cannot be applied per source. If anyone has a solution, please share it in the comments.
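For what it's worth, one likely culprit (my own guess, not tested in this setup): both the Logstash kafka output and kafka input default to the plain codec, which serializes only the message field, so tags, document_type, and other event metadata are dropped on the way into Kafka. Declaring the json codec on both sides should carry the full event through, roughly like this hypothetical change:

```
# Producer side (2kafka.conf)
output {
    kafka {
        bootstrap_servers => "192.168.1.196:9092,192.168.1.197:9092,192.168.1.199:9092"
        topic_id => "testnginx"
        codec => json        # serialize the whole event, not just "message"
    }
}

# Consumer side (kafka2elasticsearch.conf)
input {
    kafka {
        bootstrap_servers => "192.168.1.196:9092,192.168.1.197:9092,192.168.1.199:9092"
        group_id => "logstashnginx"
        topics => ["testnginx"]
        consumer_threads => 10
        codec => json        # restore fields such as tags on the way out
    }
}
```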
