359025439 posted on 2019-1-1 09:46:02

Monitoring HAProxy status with Zabbix

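  We deploy front-end load balancing and high availability for our game servers with HAProxy + Keepalived, so we need to monitor HAProxy's status in real time.
  The HAProxy version used in this article is 1.4.24.
  See section 9, "Statistics and monitoring", of the official documentation at http://cbonte.github.io/haproxy-dconv/configuration-1.4.html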
  

  https://github.com/olindata/tribily-zabbix-templates/tree/master/App_HAProxy
  https://github.com/jlyheden/zabbix_scripts/tree/master/haproxy
  

  

  1. How the monitoring works
  

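  HAProxy exposes its status through an HTTP page and through a Unix stats socket, and both can export the data in CSV format.
  

  The HTTP page can be viewed at a URL such as http://10.10.41.100/status;csv
  The Unix socket can be queried with
  echo "show info;show stat" | sudo socat stdio unix-connect:/tmp/haproxy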

  

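  This article mainly uses the second method to collect HAProxy's status.
  Configure the stats socket in haproxy.cfg:
  

  stats socket /tmp/haproxy level admin
  level can be followed by one of user, operator or admin:
  user is the lowest privilege level and can only see non-sensitive information
  operator can see all information but can only change non-sensitive settings
  admin can see and change everything, so use it with care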
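
  For readers who prefer not to depend on socat, the same query can be sketched in Python. This is only a minimal sketch: the function name query_haproxy is made up for illustration, the socket path /tmp/haproxy is the one configured above, and the calling user is assumed to have permission to open the socket (the article itself uses sudo for that).

```python
import socket


def query_haproxy(command, socket_path="/tmp/haproxy", timeout=5.0):
    """Send one command to the HAProxy stats socket and return the reply text.

    query_haproxy is a hypothetical helper, not part of HAProxy or Zabbix.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(socket_path)
        # Commands are newline-terminated, as with `echo "show info" | socat ...`
        sock.sendall((command + "\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # in non-interactive mode the peer closes after replying
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks).decode("ascii", "replace")


# Example call (needs a running HAProxy and access to the socket):
#   print(query_haproxy("show info"))
```

  Reading until the peer closes the connection mirrors what socat does with stdio: one command goes in, and the whole reply is collected before it is parsed.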
  

  $ echo "show help" | sudo socat stdio unix-connect:/tmp/haproxy
  Unknown command. Please enter one of the following commands only :
  clear counters : clear max statistics counters (add 'all' for all counters)
  help         : this message
  prompt         : toggle interactive mode with prompt
  quit         : disconnect
  show info      : report information about the running process
  show stat      : report counters for each proxy and server
  show errors    : report last request and response errors for each proxy
  show sess : report the list of current sessions or dump this session
  get weight   : report a server's current weight
  set weight   : change a server's weight
  set timeout    : change a timeout setting
  disable server : set a server in maintenance mode
  enable server: re-enable a server that was previously in maintenance mode
  

  

  

  

  

  

  show info reports information about the running HAProxy process

  

  
  Name: HAProxy
  Version: 1.4.24
  Release_date: 2013/06/17
  Nbproc: 1
  Process_num: 1
  Pid: 7020
  Uptime: 110d 16h25m55s
  Uptime_sec: 9563155
  Memmax_MB: 0
  Ulimit-n: 131101
  Maxsock: 131101
  Maxconn: 65536
  Maxpipes: 0
  CurrConns: 14
  PipesUsed: 0
  PipesFree: 0
  Tasks: 26
  Run_queue: 1
  node: master_loadbalance1
  description: lb1
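
  The "Key: value" layout above is easy to turn into a lookup table. The following is a minimal sketch, assuming the output format shown above; parse_show_info is a name invented for this example, and the sample text is copied from part of the output above.

```python
def parse_show_info(text):
    """Turn the 'Key: value' lines printed by `show info` into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain spaces
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and anything without a colon
            info[key.strip()] = value.strip()
    return info


sample = """Name: HAProxy
Version: 1.4.24
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
CurrConns: 14
"""

info = parse_show_info(sample)
print(info["Version"])
print(info["Uptime"])
print(info["Uptime_sec"])
```

  Looking keys up exactly keeps Uptime and Uptime_sec apart, which a plain substring search would not.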
  

  show stat reports the counters for each proxy and server

  # pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
  login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,121969199234,0,0,171448,,,,,OPEN,,,,,,,,,1,1,0,,,,0,95,0,628,,,,0,195071390,0,1619236,28338,2034,,93,611,196721000,,,
  login_pool,web1_80,0,0,0,38,2000,8333681,2356031055,2827436427,,0,,0,3,2211,11,UP,30,1,0,902,0,9558963,0,,1,2,1,,8329209,,2,1,,199,L7OK,200,1,20,7967292,0,361648,7,0,0,,,,136,0,
  login_pool,web2_80,0,0,0,63,2000,8333998,2358035705,2826639220,,0,,1,6,2281,13,UP,30,1,0,861,0,9558963
  

0. pxname: proxy name                     
1. svname: service name (FRONTEND for frontend, BACKEND for backend, any name
    for server)
2. qcur: current queued requests
3. qmax: max queued requests
4. scur: current sessions
5. smax: max sessions
6. slim: sessions limit
7. stot: total sessions
8. bin: bytes in
9. bout: bytes out
10. dreq: denied requests
11. dresp: denied responses
12. ereq: request errors
13. econ: connection errors
14. eresp: response errors (among which srv_abrt)
15. wretr: retries (warning)
16. wredis: redispatches (warning)
17. status: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)
18. weight: server weight (server), total weight (backend)
19. act: server is active (server), number of active servers (backend)
20. bck: server is backup (server), number of backup servers (backend)
21. chkfail: number of failed checks
22. chkdown: number of UP->DOWN transitions
23. lastchg: last status change (in seconds)
24. downtime: total downtime (in seconds)
25. qlimit: queue limit
26. pid: process id (0 for first instance, 1 for second, ...)
27. iid: unique proxy id
28. sid: service id (unique inside a proxy)
29. throttle: warm up status
30. lbtot: total number of times a server was selected
31. tracked: id of proxy/server if tracking is enabled
32. type (0=frontend, 1=backend, 2=server, 3=socket)
33. rate: number of sessions per second over last elapsed second
34. rate_lim: limit on new sessions per second
35. rate_max: max number of new sessions per second
36. check_status: status of last health check, one of:
      UNK   -> unknown
      INI   -> initializing
      SOCKERR -> socket error
      L4OK    -> check passed on layer 4, no upper layers testing enabled
      L4TMOUT -> layer 1-4 timeout
      L4CON   -> layer 1-4 connection problem, for example
                   "Connection refused" (tcp rst) or "No route to host" (icmp)
      L6OK    -> check passed on layer 6
      L6TOUT-> layer 6 (SSL) timeout
      L6RSP   -> layer 6 invalid response - protocol error
      L7OK    -> check passed on layer 7
      L7OKC   -> check conditionally passed on layer 7, for example 404 with
                   disable-on-404
      L7TOUT-> layer 7 (HTTP/SMTP) timeout
      L7RSP   -> layer 7 invalid response - protocol error
      L7STS   -> layer 7 response error, for example HTTP 5xx
37. check_code: layer5-7 code, if available
38. check_duration: time in ms took to finish last health check
39. hrsp_1xx: http responses with 1xx code
40. hrsp_2xx: http responses with 2xx code
41. hrsp_3xx: http responses with 3xx code
42. hrsp_4xx: http responses with 4xx code
43. hrsp_5xx: http responses with 5xx code
44. hrsp_other: http responses with other codes (protocol error)
45. hanafail: failed health checks details
46. req_rate: HTTP requests per second over last elapsed second
47. req_rate_max: max number of HTTP requests per second observed
48. req_tot: total number of HTTP requests received
49. cli_abrt: number of data transfers aborted by the client
50. srv_abrt: number of data transfers aborted by the server (inc. in eresp)  
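
  Because the first line of the show stat output names every column, the CSV can be read by field name instead of by position. A minimal sketch, assuming the header starts with "# " and ends with a trailing comma as in the output above; parse_show_stat is a made-up name, and the sample is shortened to the first six columns, with values taken from the rows above.

```python
import csv


def parse_show_stat(text):
    """Map (pxname, svname) to a dict of column name -> value (all strings)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        return {}
    # Drop the leading "# " and the trailing comma before splitting the header
    header = lines[0][2:].rstrip(",").split(",")
    stats = {}
    for row in csv.reader(lines[1:]):
        record = dict(zip(header, row))  # zip ignores the empty trailing field
        stats[(record["pxname"], record["svname"])] = record
    return stats


# Shortened sample: only the first six columns of the real output
sample = """# pxname,svname,qcur,qmax,scur,smax,
login_pool,web1_80,0,0,0,38,
login_pool,web2_80,0,0,0,63,
"""

stats = parse_show_stat(sample)
print(stats[("login_pool", "web1_80")]["smax"])
print(stats[("login_pool", "web2_80")]["smax"])
```

  Addressing fields by name avoids hard-coding column numbers, which matters if a different HAProxy version adds columns.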

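  Note that if HAProxy runs in multi-process mode (nbproc set to a value other than 1), each process can answer on the socket with its own status, so the output you see switches between processes from one query to the next.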
  

  

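  2. Writing the monitoring scripts
  There are three monitoring scripts:

  haproxy_info.sh                   collects basic HAProxy information
  haproxy_pool_discovery.py         lets Zabbix discover each pool:server pair (for example login_pool:BACKEND or login_pool:web1_80) through low-level discovery (LLD), so the backend hosts defined in the configuration file are picked up and monitored dynamically
  haproxy_stat.sh                   sends the show stat command to the stats socket and returns the value of each counter. The script checks the second field (svname), because some fields exist only for FRONTEND or BACKEND rows, while others exist for everything except FRONTEND and BACKEND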
  

  haproxy_info.sh
#!/bin/bash
# This script gets HAProxy info such as version, uptime and number of processes
metric=$1
stats_socket=/tmp/haproxy
info_file=/tmp/haproxy_info.csv
# socat takes two addresses: stdio and the stats socket
echo "show info" | /usr/bin/sudo /usr/bin/socat stdio "unix-connect:$stats_socket" > "$info_file"
# match the key exactly, so that Uptime does not also match Uptime_sec
awk -F': ' -v m="$metric" '$1 == m {print $2}' "$info_file"

  haproxy_pool_discovery.py
  socat must be installed, and the Zabbix agent user needs sudo permission to run socat.
  Run visudo and make the following changes:
#
# Disable "ssh hostname sudo ", because it will show the password in clear.
#         You have to run "ssh -t hostname sudo ".
#
Defaults    !requiretty

zabbixagent   ALL=(root)      NOPASSWD:/usr/bin/socat  

#!/usr/bin/python
# This script discovers the HAProxy pool:server pairs for Zabbix LLD
import subprocess
import json
args='''echo "show stat"|sudo socat stdio unix-connect:/tmp/haproxy|egrep -v '^#|^$'|awk -F',' '{print $1":"$2}' '''
# communicate() returns (stdout, stderr); universal_newlines=True yields text rather than bytes
t=subprocess.Popen(args,shell=True,stdout=subprocess.PIPE,universal_newlines=True).communicate()[0]
pools=[]
for pool in t.split('\n'):
    if len(pool) != 0:
        pools.append({'{#POOL_NAME}':pool})
print(json.dumps({'data':pools},indent=4,separators=(',',':')))

  Output
{
    "data":[
      {
            "{#POOL_NAME}":"login_game_pool:FRONTEND"
      },
      {
            "{#POOL_NAME}":"login_pool:web1_80"
      },
      {
            "{#POOL_NAME}":"login_pool:web2_80"
      },
      {
            "{#POOL_NAME}":"login_pool:BACKEND"
      }
    ]
}  
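
  The discovery step can also be written without the egrep/awk pipeline, which makes it testable against saved output. A minimal sketch, assuming the show stat text has already been read from the socket; build_lld is a name made up for this example, and the sample rows are shortened to their first two fields.

```python
import json


def build_lld(stat_text):
    """Build the Zabbix LLD JSON document from raw `show stat` CSV text."""
    pools = []
    for line in stat_text.splitlines():
        # Skip blank lines and the "# pxname,svname,..." header
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.strip().split(",")
        if len(fields) >= 2:
            pools.append({"{#POOL_NAME}": fields[0] + ":" + fields[1]})
    return json.dumps({"data": pools}, indent=4)


# Shortened sample rows: only pxname and svname are kept
sample = """# pxname,svname,
login_game_pool,FRONTEND,
login_pool,web1_80,
login_pool,web2_80,
login_pool,BACKEND,
"""

print(build_lld(sample))
```

  Since json.dumps produces the document, the result is always valid JSON and never ends a list with a trailing comma.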

  haproxy_stat.sh
#!/bin/bash
# Usage: haproxy_stat.sh <pool_name:server_name> <metric>, e.g. login_game_pool:FRONTEND scur
pool_name=$(echo "$1"|awk -F':' '{print $1}')
server_name=$(echo "$1"|awk -F':' '{print $2}')
metric=$2
stat_socket=/tmp/haproxy
stat_file=/tmp/haproxy_stat.csv
echo "show stat"|sudo socat stdio unix-connect:$stat_socket > $stat_file
case $metric in
          qcur)
            #current queued requests
            if [ "$server_name" != "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $3}' $stat_file
            else
                  echo 0
            fi
             ;;
          qmax)
            #max queued requests
            if [ "$server_name" != "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $4}' $stat_file
            else
                  echo 0
            fi
             ;;
          scur)
            #current sessions
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $5}' $stat_file
             ;;
          smax)
            #max sessions
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $6}' $stat_file
             ;;
          slim)
            #sessions limit
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $7}' $stat_file
             ;;
          stot|stol)
            #total sessions (the CSV field is named stot; stol, the original spelling in this script, is still accepted)
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $8}' $stat_file
             ;;
         bin)
            #bytes in
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $9}' $stat_file
             ;;
          bout)
            #bytes out
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $10}' $stat_file
             ;;
          dreq)
            #denied requests
            #only FRONTEND and BACKEND have this field
            if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $11}' $stat_file
            else
               echo 0
            fi
             ;;
         dresp)
            #denied responses
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $12}' $stat_file
             ;;
          ereq)
            #request errors
            #only FRONTEND has this field
            if [ "$server_name" == "FRONTEND" ];then
               awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $13}' $stat_file
            else
               echo 0
            fi
             ;;
          econ)
            #connection errors
            #FRONTEND does not have this field
            if [ "$server_name" != "FRONTEND" ];then
               awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $14}' $stat_file
            else
               echo 0
            fi
             ;;
         eresp)
            #response errors
            #FRONTEND does not have this field
            if [ "$server_name" != "FRONTEND" ];then
               awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $15}' $stat_file
            else
               echo 0
            fi
             ;;
      status)
            #status
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $18}' $stat_file
            ;;
       chkfail)
            #number of failed checks
            #FRONTEND and BACKEND do not have this field
            if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
               echo 0
            else
               awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $22}' $stat_file
            fi
            ;;
       chkdown)
            #number of UP->DOWN transitions
            #FRONTEND does not have this field, so return 0
            if [ "$server_name" != "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $23}' $stat_file
            else
               echo 0
            fi
            ;;
       lastchg)
            #last status change in seconds
            #FRONTEND does not have this field, so return 0
            if [ "$server_name" != "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $24}' $stat_file
            else
               echo 0
            fi
            ;;
      downtime)
            #total downtime in seconds
            #FRONTEND does not have this field, so return 0
            if [ "$server_name" != "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $25}' $stat_file
            else
               echo 0
            fi
            ;;
         lbtot)
            #total number of times a server was selected
            #FRONTEND does not have this field
            if [ "$server_name" != "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $31}' $stat_file
            else
               echo 0
            fi
            ;;
          rate)
            #number of sessions per second over last elapsed second
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $34}' $stat_file
            ;;
    rate_lim|rate_limit)
            #limit on new sessions per second
            #only FRONTEND has this field
            if [ "$server_name" == "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $35}' $stat_file
            else
                  echo 0
            fi
            ;;
      rate_max)
            #max number of new sessions per second
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $36}' $stat_file
            ;;
check_status)
            #status of last health check
            if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
               echo "NULL"
            else
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $37}' $stat_file
            fi
            ;;
      hrsp_1xx)
            #http response with 1xx code
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $40}' $stat_file
            ;;
      hrsp_2xx)
            #http response with 2xx code
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $41}' $stat_file
            ;;
      hrsp_3xx)
            #http response with 3xx code
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $42}' $stat_file
            ;;
      hrsp_4xx)
            #http response with 4xx code
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $43}' $stat_file
            ;;
      hrsp_5xx)
            #http response with 5xx code
            awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $44}' $stat_file
            ;;
      req_rate)
            #HTTP requests per second over last elapsed second
            #only FRONTEND has this field,others will return 0
            if [ "$server_name" == "FRONTEND" ];then
               awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $47}' $stat_file
            else
               echo 0
            fi
            ;;
req_rate_max)
            #max number of HTTP requests per second observed
            #only FRONTEND has this field,others will return 0
            if [ "$server_name" == "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $48}' $stat_file
            else
                  echo 0
            fi
            ;;
       req_tot)
            #total number of HTTP requests received
            #only FRONTEND has this field,others will return 0
            if [ "$server_name" == "FRONTEND" ];then
                  awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $49}' $stat_file
            else
                  echo 0
            fi
            ;;
             *)
               echo "please input the correct argument"
            ;;
esac  
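
  The long case statement repeats one pattern: pick a column, and return 0 when the row type does not carry that field. The same rules can be expressed as a lookup table. This is a sketch of an alternative layout, not the script used in this article: get_metric, row_kind and METRICS are names invented here, only a few metrics are listed, and the sample rows are the first fields of the rows shown in section 1.

```python
FRONTEND, BACKEND, SERVER = "FRONTEND", "BACKEND", "SERVER"

# metric name -> (1-based CSV column, row types that carry the field)
METRICS = {
    "qcur": (3, {BACKEND, SERVER}),
    "scur": (5, {FRONTEND, BACKEND, SERVER}),
    "dreq": (11, {FRONTEND, BACKEND}),
    "ereq": (13, {FRONTEND}),
    "econ": (14, {BACKEND, SERVER}),
    "chkfail": (22, {SERVER}),
    "req_rate": (47, {FRONTEND}),
}


def row_kind(svname):
    """Classify a row by svname: FRONTEND, BACKEND, or an ordinary server."""
    return svname if svname in (FRONTEND, BACKEND) else SERVER


def get_metric(stat_text, pool_name, server_name, metric):
    """Return the metric as a string, or "0" when the row type lacks the field."""
    column, kinds = METRICS[metric]
    if row_kind(server_name) not in kinds:
        return "0"
    for line in stat_text.splitlines():
        fields = line.strip().split(",")
        if fields[:2] == [pool_name, server_name]:
            return fields[column - 1] if column <= len(fields) else ""
    return ""  # no matching row


# First fields of the FRONTEND and web1_80 rows from the show stat output
sample = """login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,121969199234,0,0,171448
login_pool,web1_80,0,0,0,38,2000,8333681,2356031055,2827436427,,0,,0
"""

print(get_metric(sample, "login_game_pool", "FRONTEND", "scur"))
print(get_metric(sample, "login_game_pool", "FRONTEND", "qcur"))
print(get_metric(sample, "login_pool", "web1_80", "econ"))
```

  With this layout, adding a metric means adding one table entry rather than a new case branch.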

  3. Changes to the Zabbix agent configuration
  Add haproxy_status.conf:
### Option: UserParameter
#User-defined parameter to monitor. There can be several user-defined parameters.
#Format: UserParameter=<key>,<shell command>
#See 'zabbix_agentd' directory for examples.
#
# Mandatory: no
# Default:
# UserParameter=
UserParameter=haproxy.info[*],/usr/local/zabbix/bin/haproxy_info.sh $1
UserParameter=haproxy.discovery,/usr/bin/python /usr/local/zabbix/bin/haproxy_pool_discovery.py
UserParameter=haproxy.stat[*],/usr/local/zabbix/bin/haproxy_stat.sh $1 $2

  4. Add the Zabbix template
http://s3.运维网.com/wyfs02/M00/4D/9E/wKioL1RU6yCwo6HsAAJMAXEBVvY840.jpg
http://s3.运维网.com/wyfs02/M02/4D/9F/wKiom1RU6sTQaSk2AAH-WY1IBXA742.jpg
http://s3.运维网.com/wyfs02/M01/4D/9E/wKioL1RU6yLB7wssAAmhFXoTb80477.jpg
http://s3.运维网.com/wyfs02/M02/4D/9F/wKiom1RU7UCxhsOpAAKLdGBec0c603.jpg
http://s3.运维网.com/wyfs02/M01/4D/9F/wKiom1RU7bGRxuSaAATxqLkCbCE437.jpg
  

  See the attachment for the full template.
  

  



