这里举几个例子。 http连接并发监测在某地的双向系统中,两台服务器有一个httpd服务器,通过负载均衡承担4万多的机顶盒的首页面访问。因此,需要监测每台服务器的http连接数量。对并发量关注一段时期,如果并发量在设置的值之内,则不需要进行并发量的调整,而如果并发量比较大的话,则需要进行调整httpd参数。 http的连接数量可以通过ps -ef | grep httpd | wc -l来获取。Centos5版本默认的文件打开数量为1024。对/etc/httpd/conf/httpd.conf进行参数调整,使httpd可以承担1000个并发: 查询默认的服务方式: # httpd -l Compiled in modules: core.c prefork.c http_core.c mod_so.c # 默认的服务方式为prefork,调整/etc/httpd/conf/httpd.conf参数(具体含义可到网上查找): <IfModule prefork.c> StartServers 10 MinSpareServers 50 MaxSpareServers 100 ServerLimit 3000 MaxClients 3000 MaxRequestsPerChild 10000 </IfModule> 重启httpd后,进行并发测试: 这里共发送10000个请求,每次发送1000个,也就是每次并发1000个,共分10次发送。 如果是并发2000,则报错: 查询httpd的服务方式为: ps -ef | grep httpd | wc -l 编写监测脚本: #!/bin/bash STATE_OK=0 STATE_WARNING=1 STATE_CRITICAL=2 STATE_UNKNOWN=3 count=`ps -ef | grep httpd | wc -l` let "count =count-2" if [ $count -gt 800 ]; then echo "Critical ! httpd processes is $count!|HttpCounts=$count;600;800;0" exit $STATE_CRITICAL elif [ $count -gt 600 ]; then echo "Warning ! httpd processes is $count!|HttpCounts=$count;600;800;0" exit $STATE_WARNING elif [ 600 -gt $count ]; then echo "Ok ! httpd processes is $count!|HttpCounts=$count;600;800;0" exit $STATE_OK else echo "Unknow" exit $STATE_UNKNOWN fi 脚本中echo一行最后添加的“|HttpCounts=$count;600;800;0”是pnp画图所需,否则不能画出图。其中600为预警,800为严重告警。 TIME_WAIT监测默认配置的httpd,在访问量比较大的话,使用命令: # netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}' TIME_WAIT=581 ESTABLISHED=1 SYN_RECV=1 就会有大量的time_wait。可以通过调整系统参数得到改善: http://blog.iyunv.com/sunvince/article/details/6622796 http://chembo.iteye.com/blog/1503770 这里,在/etc/sysctl.conf增加以下4个参数: net.ipv4.tcp_syncookies = 1 //这个参数如果已经添加了的话,就不需要再次添加
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30 然后执行 /sbin/sysctl -p 让参数生效。 继续使用netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}',观察到TIME_WAIT=3了。 可以根据下节中监测物理内存的check_mem脚本规范编写监测 TIME_WAIT的脚本: #more /usr/local/nagios/libexec/check_timewait #!/bin/bash #netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}' #TIME_WAIT 641 #ESTABLISHED 1 #SYN_RECV 1 if [ $# != 4 ];then echo "Usage:$0 -w num1 -c num2" exit fi time_wait=`netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'|grep TIME_WAIT|awk '{print $2}'` established=`netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'|grep ESTABLISHED|awk '{print $2}'` syn_recv=`netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'|grep SYN_RECV|awk '{print $2}'` if [ $time_wait -gt $4 ];then echo "Critical - TIME_WAIT=$time_wait ESTABLISHED=$established SYN_RECV=$syn_recv|TIME_WAIT=$time_wait;$2;$4;0 ESTABLISHED=$established;;; SYN_RECV=$syn_recv;;;" exit 2 fi if [ $time_wait -le $4 -a $time_wait -ge $2 ];then echo "Warning - TIME_WAIT=$time_wait ESTABLISHED=$established SYN_RECV=$syn_recv |TIME_WAIT=$time_wait;$2;$4;0 ESTABLISHED=$established;;; SYN_RECV=$syn_recv;;;" exit 1 fi if [ $time_wait -lt $2 ];then echo "OK - TIME_WAIT=$time_wait ESTABLISHED=$established SYN_RECV=$syn_recv |TIME_WAIT=$time_wait;$2;$4;0 ESTABLISHED=$established;;; SYN_RECV=$syn_recv;;;" exit 0 fi ~ 路由监测另外,由于这台机器不能丢失一个默认的路由,否则机顶盒无法获取首页面。默认的路由也可以编写脚本进行监测: #!/bin/bash STATE_OK=0 STATE_WARNING=1 STATE_CRITICAL=2 STATE_UNKNOWN=3 default_route=`route |grep default` if [ "default 172.16.100.1 0.0.0.0 UG 0 0 0 bond0"x = "$default_route"x ]; then echo "Ok! Default route is \"$default_route\"!|defaultRoute=1;0;0" exit $STATE_OK else echo "Critical! Default route is \"$default_route\"!|defaultRoute=0;0;0" exit $STATE_CRITICAL fi 物理内存监测默认的check_swap为监测虚拟内存,如果要监测物理内存,则可以编写以下脚本: #cd /usr/local/nagios/libexec #vi check_mem #!/bin/bash #memory if [ $# != 4 ];then echo "Usage:$0 -w num1 -c num2" exit fi total_mem=`free -m |grep Mem|awk '{print $2}'` free_mem=`free -m |grep Mem|awk '{print $4}'` used_mem=`free -m |grep Mem|awk '{print $3}'` if [ $free_mem -gt $2 ];then echo "OK - total memory $total_mem MB used $used_mem MB free $free_mem MB|free_mem=$free_mem;$2;$4;0" exit 0 fi if [ $free_mem -ge $4 -a $free_mem -le $1 ];then echo "Warning - total memory $total_mem MB used $used_mem MB free $free_mem MB|free_mem=$free_mem;$2;$4;0" exit 1 fi if [ $free_mem -lt $4 ];then echo "Critical - total memory $total_mem MB used $used_mem MB free $free_mem MB|free_mem=$free_mem;$2;$4;0" exit 2 fi #chown -R nagios.nagios check_mem #chmod a+x check_mem #./check_mem -w 200 -c 100 //后面两个参数表示剩余内存容量,单位为兆。 #script to check real memory usage # L.Gill 02/05/06 - V.1.0 # ------------------------------------------ # ######## Script Modifications ########## # ------------------------------------------ # Who When What # --- ---- ---- # LGill 17/05/06 "$percent" lt 1% fix - sed edits dc result beggining with "." # # #!/bin/bash USAGE="`basename $0` [-w|--warning]<percent free> [-c|--critical]<percent free>" THRESHOLD_USAGE="WARNING threshold must be greater than CRITICAL: `basename $0` $*" calc=/tmp/memcalc percent_free=/tmp/mempercent critical="" warning="" STATE_OK=0 STATE_WARNING=1 STATE_CRITICAL=2 STATE_UNKNOWN=3 # print usage if [[ $# -lt 4 ]] then echo "" echo "Wrong Syntax: `basename $0` $*" echo "" echo "Usage: $USAGE" echo "" exit 0 fi # read input while [[ $# -gt 0 ]] do case "$1" in -w|--warning) shift warning=$1 ;; -c|--critical) shift critical=$1 ;; esac shift done # verify input if [[ $warning -eq $critical || $warning -lt $critical ]] then echo "" echo "$THRESHOLD_USAGE" echo "" echo "Usage: $USAGE" echo "" exit 0 fi # Total memory available total=`free -m | head -2 |tail -1 |gawk '{print $2}'` # Total memory used used=`free -m | head -2 |tail -1 |gawk '{print $3}'` # Calc total minus used free=`free -m | head -2 |tail -1 |gawk '{print $4+$7}'` # normal values #echo "$total"MB total #echo "$used"MB used #echo "$free"MB free # make it into % percent free = ((free mem / total mem) * 100) echo "5" > $calc # decimal accuracy echo "k" >> $calc # commit echo "100" >> $calc # multiply echo "$free" >> $calc # division integer echo "$total" >> $calc # division integer echo "/" >> $calc # division sign echo "*" >> $calc # multiplication sign echo "p" >> $calc # print percent=`/usr/bin/dc $calc|/bin/sed 's/^\./0./'|/usr/bin/tr "." " "|/usr/bin/gawk {'print $1'}` #percent1=`/usr/bin/dc $calc` #echo "$percent1" if [[ "$percent" -le $critical ]] then echo "CRITICAL - $free MB ($percent%) Free Memory" exit 2 fi if [[ "$percent" -le $warning ]] then echo "WARNING - $free MB ($percent%) Free Memory" exit 1 fi if [[ "$percent" -gt $warning ]] then echo "OK - $free MB ($percent%) Free Memory" exit 0 fi 用以下命令执行: # ./check_memory -w 36 -c 10 OK - 185 MB (37%) Free Memory # ./check_memory -w 37 -c 10 WARNING - 185 MB (37%) Free Memory # ./check_memory -w 40 -c 37 CRITICAL - 184 MB (37%) Free Memory # -w表示低于多少百分比就预警,-c表示低于多少百分比就严重告警
|