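I. Preface
Since last week I have been evaluating Kafka monitoring tools. I tried KafkaOffsetMonitor, Burrow, kafka-monitor, and Kafka-Manager. Each has its strengths and weaknesses, which I won't go into here; see their Git repositories for details. They mostly monitor topic reads and writes, and none of them reports cluster-wide information such as partitions, latency, or memory usage. Then I came across jmxtrans, a collector that pulls data from Java applications over JMX and can output to Graphite, StatsD, Ganglia, InfluxDb, and others. Our existing monitoring already stores data in InfluxDb and displays it with Grafana, so this post walks through a complete jmxtrans + InfluxDb + Grafana solution for monitoring Kafka. It needs no extra development work; every component is used as shipped.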
II. Environment
1. Roles
a. 10.10.10.10 InfluxDb
b. 10.10.10.100 Grafana
c. 10.10.30.69 jmxtrans
d. Kafka cluster
10.10.20.14 node1
10.10.20.15 node2
10.10.20.16 node3
10.10.20.17 node4
2. Software versions
influxdb-1.2.4-1.x86_64
grafana-4.1.1-1484211277.x86_64
jmxtrans-266.rpm
kafka_2.10-0.9.0.0.jar.asc
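3. Architecture diagram
(The diagram is not reproduced here. The data flow is: jmxtrans polls each Kafka node over JMX and writes to InfluxDb, and Grafana reads from InfluxDb for display.)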
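III. Configuration plan
1. jmxtrans can be deployed on every Kafka node or on a single machine. I chose the latter because my cluster is small and this keeps the config files in one place. For a larger cluster, consider a distributed deployment.
2. The jmxtrans config files are split into global metrics (per Kafka node) and topic metrics. Global metrics use one config file per node, named like base_10.10.20.14.json (base_ plus the node IP). Topic metrics use one config file per topic per node, named like falcon_monitor_us_17.json (topic name plus the last octet of the node IP).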
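The naming rule can be written down as two small helpers. This is only a sketch of the convention described above; the function names are made up for illustration, and the node and topic lists are the ones from this post.

```python
def base_config_name(node_ip: str) -> str:
    """Global-metrics file for one node, e.g. base_10.10.20.14.json."""
    return f"base_{node_ip}.json"


def topic_config_name(topic: str, node_ip: str) -> str:
    """Topic-metrics file for one topic on one node, e.g. falcon_monitor_us_14.json."""
    last_octet = node_ip.rsplit(".", 1)[-1]
    return f"{topic}_{last_octet}.json"


nodes = ["10.10.20.14", "10.10.20.15", "10.10.20.16", "10.10.20.17"]
topics = ["falcon_monitor_us"]

# One base file per node, plus one file per (topic, node) pair.
file_names = [base_config_name(ip) for ip in nodes]
file_names += [topic_config_name(t, ip) for t in topics for ip in nodes]

for name in file_names:
    print(name)
```

With four nodes and one topic this yields eight file names, which matches the layout of the config directory.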
IV. Metrics
1. Global metrics
Inbound traffic per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
"attr" : [ "Count" ],
"resultAlias":"BytesInPerSec",
"tags" : {"application" : "BytesInPerSec"}
Outbound traffic per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec",
"attr" : [ "Count" ],
"resultAlias":"BytesOutPerSec",
"tags" : {"application" : "BytesOutPerSec"}
Rejected traffic per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec",
"attr" : [ "Count" ],
"resultAlias":"BytesRejectedPerSec",
"tags" : {"application" : "BytesRejectedPerSec"}
Messages written per second
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
"attr" : [ "Count" ],
"resultAlias":"MessagesInPerSec",
"tags" : {"application" : "MessagesInPerSec"}
FetchFollower requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"tags" : {"request" : "FetchFollower"}
FetchConsumer requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"tags" : {"request" : "FetchConsumer"}
Produce requests per second
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"tags" : {"request" : "Produce"}
Memory usage
"obj" : "java.lang:type=Memory",
"attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
"resultAlias":"MemoryUsage",
"tags" : {"application" : "MemoryUsage"}
GC count and GC time
"obj" : "java.lang:type=GarbageCollector,name=*",
"attr" : [ "CollectionCount","CollectionTime" ],
"resultAlias":"GC",
"tags" : {"application" : "GC"}
Thread usage
"obj" : "java.lang:type=Threading",
"attr" : [ "PeakThreadCount","ThreadCount" ],
"resultAlias":"Thread",
"tags" : {"application" : "Thread"}
Maximum number of messages by which a follower replica lags behind the leader
"obj" : "kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica",
"attr" : [ "Value" ],
"resultAlias":"ReplicaFetcherManager",
"tags" : {"application" : "MaxLag"}
Number of partitions on this broker
"obj" : "kafka.server:type=ReplicaManager,name=PartitionCount",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"tags" : {"application" : "PartitionCount"}
Number of under-replicated partitions (partitions whose replicas have not all caught up)
"obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"tags" : {"application" : "UnderReplicatedPartitions"}
Number of partitions for which this broker is the leader
"obj" : "kafka.server:type=ReplicaManager,name=LeaderCount",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"tags" : {"application" : "LeaderCount"}
Total time spent on a FetchConsumer request
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"tags" : {"application" : "FetchConsumer"}
Total time spent on a FetchFollower request
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"tags" : {"application" : "FetchFollower"}
Total time spent on a Produce request
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"tags" : {"application" : "Produce"}
2. Topic metrics
Inbound traffic per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"tags" : {"application" : "BytesInPerSec"}
Outbound traffic per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"tags" : {"application" : "BytesOutPerSec"}
Messages written per second for falcon_monitor_us
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"tags" : {"application" : "MessagesInPerSec"}
Log end offset of each partition of falcon_monitor_us
"obj" : "kafka.log:type=Log,name=LogEndOffset,topic=falcon_monitor_us,partition=*",
"attr" : [ "Value" ],
"resultAlias":"falcon_monitor_us",
"tags" : {"application" : "LogEndOffset"}
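PS:
1. Parameter description
"obj" is the JMX ObjectName, i.e. the metric we want to monitor.
"attr" is an attribute of the ObjectName; think of it as the value of the metric.
"resultAlias" is the metric name; in InfluxDb it becomes the MEASUREMENT name.
"tags" maps to InfluxDb tags, which distinguish different metrics stored in the same MEASUREMENT. We use them when drawing graphs in Grafana, so I recommend tagging every metric.
2. For global monitoring, each metric maps to one MEASUREMENT, and all Kafka nodes write the same metric into the same MEASUREMENT. For topic monitoring, all Kafka nodes write a given topic's metrics into a single MEASUREMENT named after the topic.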
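To see how resultAlias and tags work together, the following sketch groups a few of the queries listed above by MEASUREMENT and prints the tag values that tell the series apart. The (resultAlias, tags) pairs are copied from the metric list; the helper name is hypothetical.

```python
from collections import defaultdict

# (resultAlias, tags) pairs copied from the metric list in this section.
QUERIES = [
    ("BytesInPerSec", {"application": "BytesInPerSec"}),
    ("RequestsPerSec", {"request": "FetchFollower"}),
    ("RequestsPerSec", {"request": "FetchConsumer"}),
    ("RequestsPerSec", {"request": "Produce"}),
    ("ReplicaManager", {"application": "PartitionCount"}),
    ("ReplicaManager", {"application": "UnderReplicatedPartitions"}),
    ("ReplicaManager", {"application": "LeaderCount"}),
]


def series_by_measurement(queries):
    """Map each MEASUREMENT (resultAlias) to the tag pairs stored under it."""
    grouped = defaultdict(list)
    for alias, tags in queries:
        for key, value in sorted(tags.items()):
            grouped[alias].append(f"{key}={value}")
    return dict(grouped)


layout = series_by_measurement(QUERIES)
for measurement, tag_pairs in layout.items():
    print(measurement, "->", ", ".join(tag_pairs))
```

RequestsPerSec and ReplicaManager each hold three different metrics, so without the tags a Grafana query could not pick out a single one of them.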
V. Installation
1. Kafka
I won't cover installing the Kafka cluster here, only how to start Kafka. Because we collect Kafka's monitoring data over JMX, the JMX port has to be enabled when Kafka starts:
cd /data/kafka/bin/
JMX_PORT=9999 nohup ./kafka-server-start.sh ../config/server.properties >/dev/null 2>&1 &
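After starting the brokers this way, it can be useful to confirm from the jmxtrans host that port 9999 is reachable. The sketch below is only a plain TCP check and does not speak the JMX protocol, so a success proves that something is listening, not that JMX is healthy. The function name is made up for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_brokers(brokers, port=9999):
    """Return {broker_ip: reachable} for the given JMX port."""
    return {ip: port_is_open(ip, port) for ip in brokers}
```

Calling check_brokers(["10.10.20.14", "10.10.20.15", "10.10.20.16", "10.10.20.17"]) from the jmxtrans machine returns one True or False per node.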
2. InfluxDb
yum -y install influxdb ## install
/etc/init.d/influxdb start ## start the service
[iyunv@ip-10-10-10-10 jmxtrans]# influx
Connected to http://localhost:8086 version 1.3.2
InfluxDB shell version: 1.3.2
> CREATE USER "root" WITH PASSWORD '123456' WITH ALL PRIVILEGES ## add an account
>
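InfluxDb 1.x also answers queries over HTTP on the same port 8086, through its /query endpoint with the parameters q, db, u, and p. A minimal sketch of building such a URL, for example to list users or databases from another machine; the helper name is hypothetical, and the host and credentials are the ones shown in this post.

```python
from urllib.parse import urlencode


def influx_query_url(base_url, query, username=None, password=None, db=None):
    """Build a URL for the InfluxDb 1.x HTTP /query endpoint."""
    params = {"q": query}
    if db is not None:
        params["db"] = db
    if username is not None:
        params["u"] = username
        params["p"] = password
    return base_url.rstrip("/") + "/query?" + urlencode(params)


url = influx_query_url("http://10.10.10.10:8086/", "SHOW USERS", "root", "123456")
print(url)
```

The resulting URL can be opened with curl or urllib.request.urlopen; the response is JSON.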
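3. Grafana
yum -y install grafana ## install
/etc/init.d/grafana-server start ## start the service
4. jmxtrans
VI. Configuration
This section covers how to write the jmxtrans config files for data collection and what to watch for when configuring Grafana graphs. Kafka and InfluxDb configuration is not covered.
1. jmxtrans
a. By default jmxtrans reads the config files under /var/lib/jmxtrans to collect data, so we put all the Kafka collection config files in that directory. My config files are named as follows:
[iyunv@ip-10-10-30-69 jmxtrans]# ll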
total 96
-rw-r--r-- 1 root root 1657 Aug 18 17:03 article-feedback-10min-json_14.json
-rw-r--r-- 1 root root 1657 Aug 18 17:03 article-feedback-10min-json_15.json
-rw-r--r-- 1 root root 1657 Aug 18 17:04 article-feedback-10min-json_16.json
-rw-r--r-- 1 root root 1657 Aug 18 17:04 article-feedback-10min-json_17.json
-rw-r--r-- 1 root root 8430 Aug 22 08:24 base_10.10.20.14.json
-rw-r--r-- 1 root root 8431 Aug 22 08:24 base_10.10.20.15.json
-rw-r--r-- 1 root root 8431 Aug 22 08:25 base_10.10.20.16.json
-rw-r--r-- 1 root root 8431 Aug 22 08:25 base_10.10.20.17.json
-rw-r--r-- 1 root root 2027 Aug 21 16:19 falcon_monitor_us_14.json
-rw-r--r-- 1 root root 2027 Aug 21 16:20 falcon_monitor_us_15.json
-rw-r--r-- 1 root root 2484 Aug 21 20:58 falcon_monitor_us_16.json
-rw-r--r-- 1 root root 2027 Aug 21 16:20 falcon_monitor_us_17.json
-rw-r--r-- 1 root root 2147 Aug 21 17:43 highgmp-articles-through-primary_14.json
-rw-r--r-- 1 root root 2147 Aug 21 17:46 highgmp-articles-through-primary_15.json
-rw-r--r-- 1 root root 2147 Aug 21 17:46 highgmp-articles-through-primary_16.json
-rw-r--r-- 1 root root 2147 Aug 21 17:47 highgmp-articles-through-primary_17.json
[iyunv@ip-10-10-30-69 jmxtrans]# pwd
/var/lib/jmxtrans
b. Global monitoring config file, using 10.10.20.14 as an example:
[iyunv@ip-10-10-30-69 jmxtrans]# cat base_10.10.20.14.json
{
"servers" : [ {
"port" : "9999",
"host" : "10.10.20.14",
"queries" : [ {
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
"attr" : [ "Count","OneMinuteRate" ],
"resultAlias":"BytesInPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesInPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec",
"attr" : [ "Count","OneMinuteRate" ],
"resultAlias":"BytesOutPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesOutPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec",
"attr" : [ "Count","OneMinuteRate" ],
"resultAlias":"BytesRejectedPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesRejectedPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
"attr" : [ "Count","OneMinuteRate" ],
"resultAlias":"MessagesInPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "MessagesInPerSec"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"request" : "FetchConsumer"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"request" : "FetchFollower"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce",
"attr" : [ "Count" ],
"resultAlias":"RequestsPerSec",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"request" : "Produce"}
} ]
},
{
"obj" : "java.lang:type=Memory",
"attr" : [ "HeapMemoryUsage", "NonHeapMemoryUsage" ],
"resultAlias":"MemoryUsage",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "MemoryUsage"}
} ]
},
{
"obj" : "java.lang:type=GarbageCollector,name=*",
"attr" : [ "CollectionCount","CollectionTime" ],
"resultAlias":"GC",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "GC"}
} ]
},
{
"obj" : "java.lang:type=Threading",
"attr" : [ "PeakThreadCount","ThreadCount" ],
"resultAlias":"Thread",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "Thread"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica",
"attr" : [ "Value" ],
"resultAlias":"ReplicaFetcherManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "MaxLag"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaManager,name=PartitionCount",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "PartitionCount"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "UnderReplicatedPartitions"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaManager,name=LeaderCount",
"attr" : [ "Value" ],
"resultAlias":"ReplicaManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "LeaderCount"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "FetchConsumer"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "FetchFollower"}
} ]
},
{
"obj" : "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce",
"attr" : [ "Count","Max" ],
"resultAlias":"TotalTimeMs",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "Produce"}
} ]
},
{
"obj" : "kafka.server:type=ReplicaManager,name=IsrShrinksPerSec",
"attr" : [ "Count" ],
"resultAlias":"ReplicaManager",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "IsrShrinksPerSec"}
} ]
}
]
} ]
}
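Since every query repeats the same outputWriters block, a file like this can also be generated instead of edited by hand. The following is a sketch, not part of jmxtrans: the metric table holds only three of the queries above, the function name is made up, and the writer settings default to the values used in this post.

```python
import json

WRITER_CLASS = "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory"

# (obj, attr, resultAlias, tags): a subset of the queries in base_10.10.20.14.json.
GLOBAL_METRICS = [
    ("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
     ["Count", "OneMinuteRate"], "BytesInPerSec", {"application": "BytesInPerSec"}),
    ("kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce",
     ["Count", "Max"], "TotalTimeMs", {"application": "Produce"}),
    ("kafka.server:type=ReplicaManager,name=LeaderCount",
     ["Value"], "ReplicaManager", {"application": "LeaderCount"}),
]


def build_config(host, metrics, port="9999", url="http://10.10.10.10:8086/",
                 username="root", password="root", database="jmxDB"):
    """Return a config dict with the same shape as the file shown above."""
    queries = []
    for obj, attrs, alias, tags in metrics:
        queries.append({
            "obj": obj,
            "attr": attrs,
            "resultAlias": alias,
            "outputWriters": [{
                "@class": WRITER_CLASS,
                "url": url,
                "username": username,
                "password": password,
                "database": database,
                "tags": tags,
            }],
        })
    return {"servers": [{"port": port, "host": host, "queries": queries}]}


config = build_config("10.10.20.14", GLOBAL_METRICS)
print(json.dumps(config, indent=2))
```

Running the same function once per node and writing each result to its own file keeps all nodes identical apart from the host value.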
c. Topic monitoring config file, using falcon_monitor_us on node 10.10.20.14 as an example:
[iyunv@ip-10-10-30-69 jmxtrans]# cat falcon_monitor_us_14.json
{
"servers" : [ {
"port" : "9999",
"host" : "10.10.20.14",
"queries" : [ {
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesInPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "BytesOutPerSec"}
} ]
},
{
"obj" : "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=falcon_monitor_us",
"attr" : [ "Count" ],
"resultAlias":"falcon_monitor_us",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "MessagesInPerSec"}
} ]
},
{
"obj" : "kafka.log:type=Log,name=LogEndOffset,topic=falcon_monitor_us,partition=*",
"attr" : [ "Value" ],
"resultAlias":"falcon_monitor_us",
"outputWriters" : [ {
"@class" : "com.googlecode.jmxtrans.model.output.InfluxDbWriterFactory",
"url" : "http://10.10.10.10:8086/",
"username" : "root",
"password" : "root",
"database" : "jmxDB",
"tags" : {"application" : "LogEndOffset"}
} ]
}
]
} ]
}
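Config files built from copied blocks make it easy to change obj and forget the matching tags value. The sketch below flags queries whose obj names a request type that does not appear in the writer's tags. It is a hypothetical helper, and the sample config, including its deliberately wrong tag, is made up for the demonstration.

```python
def find_tag_mismatches(config):
    """Return (obj, tags) for queries whose tags omit the request type named in obj."""
    problems = []
    for server in config["servers"]:
        for query in server["queries"]:
            obj = query["obj"]
            # "domain:key1=v1,key2=v2" -> {"key1": "v1", "key2": "v2"}
            props = dict(p.split("=", 1) for p in obj.split(":", 1)[1].split(","))
            request = props.get("request")
            if request is None:
                continue  # nothing to cross-check for this query
            for writer in query.get("outputWriters", []):
                tags = writer.get("tags", {})
                if request not in tags.values():
                    problems.append((obj, tags))
    return problems


# Made-up sample: the second query is tagged with the wrong request type.
sample = {"servers": [{"host": "10.10.20.14", "port": "9999", "queries": [
    {"obj": "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce",
     "outputWriters": [{"tags": {"application": "Produce"}}]},
    {"obj": "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower",
     "outputWriters": [{"tags": {"application": "FetchConsumer"}}]},
]}]}

problems = find_tag_mismatches(sample)
for obj, tags in problems:
    print("tag mismatch:", obj, tags)
```

To check real files, load each one under /var/lib/jmxtrans with json.load and pass the result to find_tag_mismatches.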
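2. Grafana configuration
a. Add a data source
The Url, Database, User, and Password must match the values in the jmxtrans collection config files. Then click Save & Test; a success message means it is working.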
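To find out which values to type into the data source form, the InfluxDb settings can be read back from a jmxtrans config. A minimal sketch; the function name is hypothetical, and the sample dict mirrors the structure of the config files in this post.

```python
def influx_settings(config):
    """Collect the distinct (url, database, username, password) tuples in a config."""
    found = set()
    for server in config["servers"]:
        for query in server["queries"]:
            for writer in query.get("outputWriters", []):
                found.add((writer["url"], writer["database"],
                           writer["username"], writer["password"]))
    return found


# Same shape as the jmxtrans config files, trimmed to the relevant keys.
sample = {"servers": [{"host": "10.10.20.14", "port": "9999", "queries": [
    {"obj": "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
     "outputWriters": [{"url": "http://10.10.10.10:8086/", "username": "root",
                        "password": "root", "database": "jmxDB"}]},
    {"obj": "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec",
     "outputWriters": [{"url": "http://10.10.10.10:8086/", "username": "root",
                        "password": "root", "database": "jmxDB"}]},
]}]}

settings = influx_settings(sample)
print(settings)
```

If the set contains more than one tuple, the config writes to more than one destination, and each destination needs its own Grafana data source.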
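b. Create a dashboard, then configure a graph in it for each metric.
c. Key points
1. For metrics collected as Count, Grafana has to do a calculation to get the value we actually want. Take BytesInPerSec: its value is a cumulative total, so to get traffic per second we compute (current sample - previous sample) / 60, because jmxtrans collects data once per minute. The original screenshot of the query editor is not reproduced here; the settings are:
Because we collect data once per minute, set both group by and derivative to 1 minute. Because we want traffic per second, divide by 60 in math.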
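The same arithmetic as a worked example. The sample values are made up, and the query string is only an assumption about how the settings above would look in InfluxQL: it assumes the field is named "Count" and that one series is selected.

```python
def per_second_rate(previous, current, interval_seconds=60):
    """Per-second rate from two cumulative Count samples taken interval_seconds apart."""
    delta = current - previous
    if delta < 0:
        # The counter went backwards (for example after a broker restart),
        # so no meaningful rate can be derived from this pair.
        return None
    return delta / interval_seconds


# Made-up cumulative byte counts, one sample per minute.
samples = [1_200_000, 1_800_000, 2_700_000]
rates = [per_second_rate(a, b) for a, b in zip(samples, samples[1:])]
print(rates)  # [10000.0, 15000.0]

# Assumed InfluxQL equivalent of "group by 1m, derivative 1m, math / 60".
# The field name "Count" is an assumption; $timeFilter is filled in by Grafana.
ASSUMED_QUERY = (
    'SELECT derivative(mean("Count"), 1m) / 60 FROM "BytesInPerSec" '
    'WHERE $timeFilter GROUP BY time(1m)'
)
```

The first pair differs by 600,000 bytes over 60 seconds, which gives 10,000 bytes per second.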
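2. Choosing the Y-axis unit, such as a traffic unit, a time unit, or no unit for messages per second. One example of each:
Traffic unit: click the graph, choose "Edit" to open the edit page, switch to the Axes tab, then Unit -> data (Metric) -> bytes
Time unit: click the graph, choose "Edit" to open the edit page, switch to the Axes tab, then Unit -> time -> milliseconds (ms)
Raw value with no unit: click the graph, choose "Edit" to open the edit page, switch to the Axes tab, then Unit -> none -> none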
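VII. Lessons learned
1. Finding out which Kafka metrics JMX exposes, and what type each value has, took me down many wrong paths. I searched Google and Baidu and found lists that other people had compiled, but trying them one by one showed that many did not work, either because they were written incorrectly or because the syntax differs between versions. Eventually I found jconsole, which can connect to a local or remote JMX port and shows every metric being exposed. On Windows, install the JDK and you will find the tool in the bin directory.
2. About consumer lag: the official documentation describes metrics with type=consumer-fetch-manager-metrics, but I could not find them when connecting with jconsole, no matter what I tried. If anyone using this monitoring setup can help clear this up, thanks in advance. The official monitoring metrics are listed here: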
http://kafka.apache.org/documentation/#monitoring