Zabbix Performance Monitoring

sxyzy · Posted on 2019-1-17 13:22:52

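When monitoring the internal performance of Zabbix itself, we usually rely on the following metrics to gauge the health of the service:
nvps, queue, update percent, process busy, pending send data, and cache.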

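Adding monitoring for these metrics surfaces Zabbix performance problems effectively, so that optimization can be targeted at the actual bottleneck.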

A brief explanation of each follows:
1. nvps: the number of new values processed per second. This is a theoretical value.

SQL to obtain the value:

Whole cluster:
SELECT SUM(1.0/i.delay) AS qps FROM items i, hosts h WHERE i.status='0' AND i.hostid=h.hostid AND h.status='0' AND i.delay<>0;

Broken down by proxy:
SELECT h.proxy_hostid, SUM(1.0/i.delay) AS qps FROM items i, hosts h WHERE i.status='0' AND i.hostid=h.hostid AND h.status='0' AND i.delay<>0 AND h.proxy_hostid IS NOT NULL GROUP BY h.proxy_hostid;
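To make the arithmetic behind the queries above concrete, here is a minimal Python sketch of the same sum. The dictionary layout and the sample items are hypothetical, and skipping items whose delay is 0 (so that the division is defined) is my reading of the filter, not something the post states explicitly.

```python
def theoretical_nvps(items):
    """Sum 1/delay over enabled items on monitored hosts, skipping delay == 0."""
    return sum(
        1.0 / it["delay"]
        for it in items
        if it["item_status"] == 0 and it["host_status"] == 0 and it["delay"] != 0
    )


# Hypothetical sample rows; field names are illustrative, not the Zabbix schema.
sample_items = [
    {"delay": 60, "item_status": 0, "host_status": 0},
    {"delay": 30, "item_status": 0, "host_status": 0},
    {"delay": 60, "item_status": 1, "host_status": 0},  # disabled item, ignored
    {"delay": 0, "item_status": 0, "host_status": 0},   # zero delay, ignored
]
nvps = theoretical_nvps(sample_items)  # 1/60 + 1/30
```

An item polled every 60s contributes 1/60 of a value per second, which is why the result is only a theoretical rate: it assumes every item is collected exactly on schedule.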

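2. queue: how delayed the data is. For example, if an item's interval is set to 60s but the item is only updated at around 70s, it is delayed by 10s.
The larger the queue value, the more likely it is that Zabbix has an internal performance problem. The most common cause is busy poller and trapper processes.
This is an internal check; you can create the following items: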
zabbix[queue]
zabbix[queue,5m]
zabbix[queue,10m]
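The delay described above can be sketched in a few lines of Python. This only illustrates the idea of "overdue by N seconds"; it is not how the Zabbix server computes its queue, and the item layout and the 5-second threshold are hypothetical.

```python
def seconds_overdue(last_update, interval, now):
    """Seconds past the expected update time (0 if the item is still on time)."""
    return max(0, now - (last_update + interval))


def count_delayed(items, now, at_least):
    """Count items that are overdue by at least `at_least` seconds."""
    return sum(
        1 for it in items
        if seconds_overdue(it["last_update"], it["interval"], now) >= at_least
    )


# Hypothetical sample: timestamps are plain seconds, not real Zabbix data.
sample_items = [
    {"last_update": 0, "interval": 60},   # expected at 60, so 10s late at now=70
    {"last_update": 30, "interval": 60},  # expected at 90, still on time
]
late_by = seconds_overdue(0, 60, 70)
delayed = count_delayed(sample_items, now=70, at_least=5)
```

Counting the same items against several thresholds corresponds to creating one queue item per delay bucket, as in the keys listed above.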

3. update percent:
Measures how up to date item values are. A low percentage indicates that data is delayed or that the data from some agents is abnormal.

1) Whole cluster:
select a.aa/b.bb from
(select count(*) as aa from items
where lastclock > UNIX_TIMESTAMP()-1800 and delay < 900
and hostid in (select hostid from hosts where status=0)
and status = 0
) a,
(select count(*) as bb from items
where delay < 900 and status = 0
and hostid in (select hostid from hosts where status=0)
) b

2) Per proxy:
select a.aa/b.bb from
(select count(*) as aa from items
where lastclock > UNIX_TIMESTAMP()-1800 and delay < 900
and hostid in (select hostid from hosts where status=0 and proxy_hostid = 10100)
and status = 0
) a,
(select count(*) as bb from items
where delay < 900 and status = 0
and hostid in (select hostid from hosts where status=0 and proxy_hostid = 10100)
) b　　

Here proxy_hostid is the ID of the corresponding proxy.

3) Per host, which pinpoints the hosts whose values are not updating properly (more accurate than the unreachable alert):
select b.hostname ,c.ip,a.update_percent as uppercent from
(select a.hostid,round(a.aa*100/b.bb,2) as update_percent from
(select hostid,count(*) as aa from items
where lastclock > UNIX_TIMESTAMP()-1800 and delay < 900
and hostid in (select hostid from hosts where status=0)
and status = 0 group by hostid
) a,
(select hostid,count(*) as bb from items
where delay < 900 and status = 0
and hostid in (select hostid from hosts where status=0) group by hostid
) b where a.hostid=b.hostid)a,(select hostid,lower(host) as hostname from hosts where status=0)b,
(select hostid,ip from interface where type='1')c
where a.hostid=b.hostid and b.hostid=c.hostid  having(a.update_percent) < 80 order by uppercent;　　
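The per-host calculation above can be sketched in Python as follows. The 1800-second window, the 900-second delay cutoff and the 80% threshold come from the query; the item layout and sample data are hypothetical.

```python
def update_percent_by_host(items, now, window=1800, max_delay=900):
    """Percent of each host's items (delay < max_delay) updated within the window."""
    totals, fresh = {}, {}
    for it in items:
        if it["delay"] >= max_delay:
            continue  # long-interval items are excluded, as in the query
        host = it["host"]
        totals[host] = totals.get(host, 0) + 1
        if it["lastclock"] > now - window:
            fresh[host] = fresh.get(host, 0) + 1
    return {h: round(fresh.get(h, 0) * 100 / totals[h], 2) for h in totals}


# Hypothetical sample data: with now=10000, an item is fresh if lastclock > 8200.
now = 10_000
sample_items = [
    {"host": "A", "delay": 60, "lastclock": 9900},
    {"host": "A", "delay": 60, "lastclock": 5000},
    {"host": "A", "delay": 60, "lastclock": 4000},
    {"host": "A", "delay": 60, "lastclock": 3000},
    {"host": "B", "delay": 60, "lastclock": 9950},
    {"host": "B", "delay": 300, "lastclock": 9000},
    {"host": "C", "delay": 3600, "lastclock": 100},  # delay >= 900, skipped
]
percents = update_percent_by_host(sample_items, now)
stale = {h: p for h, p in percents.items() if p < 80}
```

One difference worth noting: the query joins the "fresh" and "total" subqueries on hostid, so a host with no recently updated items at all never appears in its output. This sketch reports such a host as 0%, which may be the more useful behavior when hunting for hosts that have stopped updating entirely.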

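4. Busy state of internal processes:
The state of Zabbix's worker processes lets you quickly locate internal performance bottlenecks. These are internal checks.
For example: zabbix[process,housekeeper,avg,busy], zabbix[process,http poller,avg,busy], zabbix[process,poller,avg,busy], and so on.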
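As a small illustration, the keys above follow one pattern and can be generated per process type. The helper names, the sample readings and the 75% threshold below are hypothetical; pick a threshold that suits your own installation.

```python
def busy_key(process_type, mode="avg"):
    """Build the internal item key for a process type's busy percentage."""
    return f"zabbix[process,{process_type},{mode},busy]"


def overloaded(busy_by_type, threshold=75.0):
    """Return the process types whose busy percentage exceeds the threshold."""
    return sorted(t for t, pct in busy_by_type.items() if pct > threshold)


keys = [busy_key(t) for t in ("housekeeper", "http poller", "poller")]
# Hypothetical readings, as if collected from the keys above.
hot = overloaded({"housekeeper": 12.0, "http poller": 3.5, "poller": 91.0})
```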

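5. Pending send data on the proxy
Measures how well the proxy is keeping up with sending data to the server; the smaller the value, the faster the data is being sent.
SQL to obtain the value: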
　　

SELECT ((SELECT MAX(proxy_history.id) FROM proxy_history)-nextid) FROM ids WHERE field_name='history_lastid'　　
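To show what the query above returns, the following sketch runs it against an in-memory SQLite database. The two tables are heavily simplified stand-ins for the proxy's real schema (only the columns the query touches), and the row counts are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical versions of the proxy tables used by the query.
conn.executescript("""
CREATE TABLE proxy_history (id INTEGER PRIMARY KEY);
CREATE TABLE ids (table_name TEXT, field_name TEXT, nextid INTEGER);
""")
# 100 collected values, of which the first 90 have already been sent.
conn.executemany(
    "INSERT INTO proxy_history (id) VALUES (?)", [(i,) for i in range(1, 101)]
)
conn.execute("INSERT INTO ids VALUES ('proxy_history', 'history_lastid', 90)")

pending = conn.execute(
    "SELECT ((SELECT MAX(proxy_history.id) FROM proxy_history) - nextid) "
    "FROM ids WHERE field_name = 'history_lastid'"
).fetchone()[0]
conn.close()
```

The result is the gap between the newest row collected by the proxy and the last row it has sent, so a value that keeps growing suggests the proxy is falling behind.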

　　

6. cache: also internal checks.

For example: zabbix[wcache,history,pfree], zabbix[wcache,text,pfree]

　　
