Troubleshooting High CPU Utilization on Switches
An earlier check showed that most of the switches in the plant network were suffering from high CPU utilization; the figure (omitted here) shows the CPU utilization graph of router81.
After logging in to several other switches, we found that each one showed a CPU utilization curve similar to router81's. Comparing the graphs carefully, the moments when CPU utilization hit 100% appeared to follow a pattern: a spike occurred once every 4 hours.
A regular pattern like this usually points to one of two causes:
[*]A virus or worm, although one that flares up exactly every four hours is rare to the point of being practically unheard of.
[*]Some application on the network polling the devices on a schedule.
The network had been running the CiscoWorks management software for a long time; at scheduled times it collects information from the devices on the network. However, after logging in to CiscoWorks we found that its collection interval was once per day, not every 4 hours. Even so, a scheduled application of this kind remained the prime suspect.
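Rather than watching the CLI manually, the switch itself can flag the spikes. A minimal sketch (the thresholds here are illustrative values, and command availability varies by IOS version and platform):

```
! Log a syslog message when total CPU utilization rises above 80%
! (sampled over 5 seconds) and again when it falls back below 40%.
process cpu threshold type total rising 80 interval 5 falling 40 interval 5

! Optionally emit the corresponding SNMP trap as well:
snmp-server enable traps cpu threshold
```

With this in place, the syslog timestamps alone would have revealed the 4-hour periodicity without anyone having to sit at the console.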
As the next expected 100%-CPU window approached, we repeatedly ran sh proc cpu | exclude 0.00% to watch for the offending process, and eventually caught it. The output:
router#sh proc cpu | exclude 0.00%
CPU utilization for five seconds: 99%/0%; one minute: 22%; five minutes: 12%
 PID Runtime(ms)     Invoked    uSecs    5Sec    1Min   5Min TTY Process
   7   242041611   413611390      585   0.15%  0.28%  0.18%   0 ARP Input
  47    48808959   402940967      121   0.15%  0.43%  0.14%   0 hrpc <- response
  75    41891982   222693622      188   0.47%  0.09%  0.05%   0 hpm counter proc
  76   119702441   695594464      172   0.47%  0.14%  0.11%   0 HRPC pm-counters
 104    23705675  2550016548        9   2.71%  0.57%  0.29%   0 Hulc LED Process
 113   435571097   170696079     2551   0.31%  0.29%  0.33%   0 HRPC qos request
 213     3351842    26277319      127   0.31%  0.23%  0.05%   0 PDU DISPATCHER
 214   103717798   112667119      920  88.67% 13.09%  2.95%   0 SNMP ENGINE
router#sh proc cpu | exclude 0.00%
CPU utilization for five seconds: 97%/0%; one minute: 28%; five minutes: 13%
 PID Runtime(ms)     Invoked    uSecs    5Sec    1Min   5Min TTY Process
   7   242041619   413611406      585   0.31%  0.29%  0.19%   0 ARP Input
  47    48809027   402941360      121   0.95%  0.47%  0.15%   0 hrpc <- response
  75    41892006   222693627      188   0.47%  0.12%  0.05%   0 hpm counter proc
 104    23705773  2550016582        9   2.71%  0.74%  0.33%   0 Hulc LED Process
 113   435571106   170696089     2551   0.31%  0.29%  0.33%   0 HRPC qos request
 163    25180871   318757351       78   0.15%  0.07%  0.01%   0 IPC LC Message H
 212    13495429    52209019      258   0.15%  0.46%  0.12%   0 IP SNMP
 213     3351850    26277333      127   0.15%  0.22%  0.05%   0 PDU DISPATCHER
 214   103722934   112667518      920  85.96% 18.92%  4.32%   0 SNMP ENGINE
 223       71723        8617     8323   0.47%  0.68%  2.30%   6 Virtual Exec
router#sh proc cpu | exclude 0.00%
CPU utilization for five seconds: 99%/1%; one minute: 39%; five minutes: 16%
 PID Runtime(ms)     Invoked    uSecs    5Sec    1Min   5Min TTY Process
  43    11044286  1442524058        7   0.15%  0.04%  0.01%   0 Fifo Error Detec
  75    41892031   222693643      188   0.31%  0.14%  0.06%   0 hpm counter proc
 104    23705884  2550016621        9   1.39%  0.84%  0.36%   0 Hulc LED Process
 112    37219603    21371638     1741   0.15%  0.06%  0.01%   0 HQM Stack Proces
 113   435571123   170696096     2551   0.31%  0.28%  0.32%   0 HRPC qos request
 155    33439834   264047680      126   0.15%  0.11%  0.07%   0 IP Input
 214   103730064   112667558      920  91.01% 29.82%  7.14%   0 SNMP ENGINE
router#sh proc cpu | exclude 0.00%
CPU utilization for five seconds: 99%/0%; one minute: 44%; five minutes: 17%
 PID Runtime(ms)     Invoked    uSecs    5Sec    1Min   5Min TTY Process
   4   111221167    12664628     8782   0.45%  0.12%  0.09%   0 Check heaps
   7   242041627   413611434      585   0.15%  0.23%  0.18%   0 ARP Input
  43    11044302  1442524087        7   0.30%  0.06%  0.01%   0 Fifo Error Detec
  75    41892048   222693648      188   0.30%  0.15%  0.07%   0 hpm counter proc
  76   119702457   695594557      172   0.15%  0.12%  0.11%   0 HRPC pm-counters
 104    23705951  2550016650        9   1.22%  0.87%  0.38%   0 Hulc LED Process
 112    37219611    21371639     1741   0.15%  0.07%  0.01%   0 HQM Stack Proces
 113   435571165   170696104     2551   0.61%  0.31%  0.33%   0 HRPC qos request
 155    33439842   264047697      126   0.15%  0.11%  0.07%   0 IP Input
 212    13495437    52209041      258   0.15%  0.38%  0.12%   0 IP SNMP
 214   103735535   112667588      920  90.91% 34.71%  8.53%   0 SNMP ENGINE
 223       71782        8627     8320   0.45%  0.60%  2.20%   6 Virtual Exec
router#sh proc cpu | exclude 0.00%
CPU utilization for five seconds: 99%/0%; one minute: 48%; five minutes: 19%
 PID Runtime(ms)     Invoked    uSecs    5Sec    1Min   5Min TTY Process
  43    11044302  1442524119        7   0.15%  0.07%  0.01%   0 Fifo Error Detec
  75    41892080   222693663      188   0.47%  0.18%  0.07%   0 hpm counter proc
 104    23706087  2550016683        9   1.59%  0.93%  0.40%   0 Hulc LED Process
 112    37219628    21371641     1741   0.15%  0.08%  0.01%   0 HQM Stack Proces
 113   435571209   170696119     2551   0.63%  0.33%  0.33%   0 HRPC qos request
 212    13495446    52209047      258   0.15%  0.36%  0.12%   0 IP SNMP
 214   103741824   112667620      920  90.71% 39.20%  9.89%   0 SNMP ENGINE
 223       71816        8632     8319   0.63%  0.61%  2.17%   6 Virtual Exec
router#sh proc cpu | exclude 0.00%
CPU utilization for five seconds: 100%/0%; one minute: 52%; five minutes: 20%
 PID Runtime(ms)     Invoked    uSecs    5Sec    1Min   5Min TTY Process
  30    40950596    21373388     1915   0.15%  0.04%  0.05%   0 Compute load avg
  47    48809210   402942725      121   0.15%  0.35%  0.15%   0 hrpc <- response
  75    41892089   222693669      188   0.31%  0.19%  0.08%   0 hpm counter proc
 104    23706183  2550016786        9   2.07%  1.02%  0.43%   0 Hulc LED Process
 112    37219636    21371642     1741   0.15%  0.08%  0.02%   0 HQM Stack Proces
 113   435571225   170696127     2551   0.31%  0.33%  0.33%   0 HRPC qos request
 163    25180880   318757422       78   0.15%  0.05%  0.01%   0 IPC LC Message H
 214   103745101   112668974      920  90.57% 43.31% 11.23%   0 SNMP ENGINE
 223       71841        8637     8317   0.47%  0.59%  2.15%   6 Virtual Exec
In the output above, the process consuming the most CPU is clearly SNMP ENGINE (PID 214, holding 85-91% of the CPU in the 5Sec column and climbing steadily in the 1Min column). A show run on each switch confirmed that every one of them carried SNMP configuration. After SNMP was disabled, CPU utilization on the switches dropped substantially.
A packet capture likewise showed SNMP traffic being exchanged on the network. The culprit behind the spikes had finally shown itself.
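The check-and-disable step looks roughly like this (a sketch; only remove the agent if you have confirmed no management station depends on it):

```
! Inspect what SNMP configuration is present:
router#show running-config | include snmp-server

! From global configuration mode, remove the SNMP agent entirely:
router(config)#no snmp-server
```

After the change, re-running sh proc cpu at the next 4-hour mark should show the SNMP ENGINE process gone from the top consumers.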
结论:
SNMP is a legitimate network management application, and its configuration should be planned according to the actual needs of the network. If nothing on the network uses SNMP, we recommend not configuring it at all.
The packet capture showed SNMP exchanges between headquarters and the devices in the plant. We recommend identifying exactly which application at headquarters is using SNMP. If it is genuinely needed, keep the current SNMP configuration; if not, a policy can confine SNMP traffic to the local network instead of letting it reach headquarters. SNMP makes network management easier, but it can also introduce risks of its own.
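One way to confine SNMP to the local environment is to bind the community string to an access list, so the agent answers only local management stations. A minimal sketch; the subnet and community string below are placeholders, not values from this network:

```
! Permit only the local NMS subnet (hypothetical 192.168.100.0/24)
! and log any other source that attempts SNMP access:
access-list 10 remark SNMP-allowed-hosts
access-list 10 permit 192.168.100.0 0.0.0.255
access-list 10 deny   any log

! Bind the read-only community to that ACL:
snmp-server community MYCOMMUNITY RO 10
```

Polling from headquarters would then be rejected (and logged), while local monitoring keeps working, which also makes it easy to see whether anything remote still tries to query the devices.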