Monitoring a Storm Cluster with Ansible
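Original work. Reposting is permitted, provided that the original source, the author information, and this notice are indicated with a hyperlink. Otherwise legal action will be pursued. http://sofar.blog.iyunv.com/353572/1579897

1. My hosts configuration
# vim /etc/hosts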
192.168.1.100 storm_zk1
192.168.1.101 storm_zk2
192.168.1.102 storm_zk3

192.168.1.103 storm_nimbus

192.168.1.104 storm_supervisor1
192.168.1.105 storm_supervisor2
192.168.1.106 storm_supervisor3
192.168.1.107 storm_supervisor4
192.168.1.108 storm_supervisor5
192.168.1.109 storm_supervisor6

2. My storm configuration
# vim /usr/local/storm/conf/storm.yaml
drpc.servers:
    - "storm_supervisor1"
    - "storm_supervisor2"
    - "storm_supervisor3"

storm.zookeeper.servers:
    - "storm_zk1"
    - "storm_zk2"
    - "storm_zk3"

storm.local.dir: "/data/storm/workdir"
nimbus.host: "storm_nimbus"
nimbus.thrift.port: 6627
nimbus.thrift.max_buffer_size: 1048576
nimbus.childopts: "-Xmx1024m"
nimbus.task.timeout.secs: 30
nimbus.supervisor.timeout.secs: 60
nimbus.monitor.freq.secs: 10
nimbus.cleanup.inbox.freq.secs: 600
nimbus.inbox.jar.expiration.secs: 3600
nimbus.task.launch.secs: 240
nimbus.reassign: true
nimbus.file.copy.expiration.secs: 600
nimbus.topology.validator: "backtype.storm.nimbus.DefaultTopologyValidator"
storm.zookeeper.port: 2181
storm.zookeeper.root: "/data/storm/zkinfo"
storm.cluster.mode: "distributed"
storm.local.mode.zmq: false
ui.port: 8080
ui.childopts: "-Xmx768m"
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
    - 6704
    - 6705
    - 6706
    - 6707
    - 6708
    - 6709

supervisor.childopts: "-Xmx2048m"
supervisor.worker.start.timeout.secs: 240
supervisor.worker.timeout.secs: 30
supervisor.monitor.frequency.secs: 3
supervisor.heartbeat.frequency.secs: 5
supervisor.enable: true
worker.childopts: "-Xmx4096m"
topology.max.spout.pending: 5000
storm.zookeeper.session.timeout: 5000
storm.zookeeper.connection.timeout: 3000
storm.zookeeper.retry.times: 6
storm.zookeeper.retry.interval: 2000
storm.zookeeper.retry.intervalceiling.millis: 30000
storm.thrift.transport: "backtype.storm.security.auth.SimpleTransportPlugin"
storm.messaging.transport: "backtype.storm.messaging.netty.Context"
storm.messaging.netty.server_worker_threads: 50
storm.messaging.netty.client_worker_threads: 50
storm.messaging.netty.buffer_size: 20971520
storm.messaging.netty.max_retries: 100
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100

3. Nimbus node deployment
# vim /data/scripts/monitor_status_for_storm.sh
#!/bin/bash
# bash rather than sh: the script relies on [[ ]], (( )) and the function keyword
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin
. /etc/profile

## Address and port of the monitoring page (Storm UI)
MON_SRV_IPADDR="192.168.1.103"
MON_SRV_PORT="8080"

## Set to 1 once the port scan succeeds
SCAN_FLAG=0

## Base working directory
BASE_PATH="/data/scripts"

## Inventory file listing the hosts whose storm supervisor has failed
FAIL_SUPERVISOR_LIST="${BASE_PATH}/fail_supervisor.txt"

#---------------------------------------------------------------------------------------------------
## Restart the storm nimbus and ui services
function restart_storm_nimbus_server() {
    [[ -n `ps aux | grep java | grep storm` ]] && kill -9 `ps aux | grep java | grep storm | awk '{print $2}'`
    nohup /usr/local/storm/bin/storm nimbus >/dev/null 2>&1 &
    nohup /usr/local/storm/bin/storm ui >/dev/null 2>&1 &
    sleep 30
}

#---------------------------------------------------------------------------------------------------
## 1. Check that the monitoring page is reachable (the case where port 8080 is down)
for ((i=0; i<3; i++)); do
    RETVAL=`/usr/bin/nmap -n -sS -p ${MON_SRV_PORT} ${MON_SRV_IPADDR} | grep open`
    ## Stop at the first successful scan, otherwise wait and retry.
    ## (Written as "&& SCAN_FLAG=1;break || sleep 10", the loop always broke after one pass.)
    if [[ -n "${RETVAL}" ]]; then
        SCAN_FLAG=1
        break
    fi
    sleep 10
done
[[ ${SCAN_FLAG} -ne 1 ]] && restart_storm_nimbus_server

#---------------------------------------------------------------------------------------------------
## 2. Compare the names scraped from the monitoring page with the local list
##    to determine whether any storm supervisor service has failed
curl -s http://${MON_SRV_IPADDR}:${MON_SRV_PORT}/ | sed 's/<td>/<td>\n/g' | awk -F '<' '/^storm_/{print $1}' | awk '!/nimbus/{print}' | sort > ${BASE_PATH}/supervisor_list_from_page.txt

## An empty result from the monitoring page means the storm nimbus service is unhealthy
[[ -z `sed '/^$/d' ${BASE_PATH}/supervisor_list_from_page.txt` ]] && restart_storm_nimbus_server

sort -nr ${BASE_PATH}/supervisor_list_from_page.txt ${BASE_PATH}/supervisor_list.txt | uniq -u > ${BASE_PATH}/supervisor_list_for_failed.txt
[[ -z `sed '/^$/d' ${BASE_PATH}/supervisor_list_for_failed.txt` ]] && rm -f ${BASE_PATH}/supervisor_list_for_failed.txt && exit 0

#---------------------------------------------------------------------------------------------------
## 3. Build the list of IP addresses of the failed storm supervisors
## The ansible call in step 4 targets the group "fail_supervisor", so the inventory
## needs this group header (the published listing shows an empty string here)
echo "[fail_supervisor]" >> ${FAIL_SUPERVISOR_LIST}
for SUPERVISOR_NAMEADDR in `cat ${BASE_PATH}/supervisor_list_for_failed.txt`
do
    TEMP_IPADDR=`grep -w ${SUPERVISOR_NAMEADDR} /etc/hosts | grep -v '#' | awk '{print $1}' | tail -1`
    echo "${TEMP_IPADDR}" >> ${FAIL_SUPERVISOR_LIST}
    IPLIST="${IPLIST} ${TEMP_IPADDR}"
done

#---------------------------------------------------------------------------------------------------
## 4. Restart the storm supervisor service remotely
/usr/local/bin/ansible -i ${FAIL_SUPERVISOR_LIST} fail_supervisor -m shell -a "/data/scripts/restart_storm_service.sh"
rm -f ${FAIL_SUPERVISOR_LIST}

# vim /data/scripts/supervisor_list.txt
storm_supervisor1
storm_supervisor2
storm_supervisor3
storm_supervisor4
storm_supervisor5
storm_supervisor6

# touch /var/run/check_storm.lock
# crontab -e
*/2 * * * * (flock --timeout=0 /var/run/check_storm.lock /data/scripts/monitor_status_for_storm.sh >/dev/null 2>&1)
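
Step 2 of the monitoring script compares the two lists with `sort ... | uniq -u`, which keeps every name that appears in only one of them. That is a symmetric difference, so a host shown on the page but missing from supervisor_list.txt would also be treated as failed. The following is a minimal sketch of a one-directional comparison and of the inventory generation in step 3. The function names `missing_supervisors` and `build_inventory` are hypothetical helpers introduced for illustration and are not part of the original script; the group name is taken from the ansible call in step 4.

```shell
#!/bin/sh
# Hypothetical helpers, shown only as a sketch; not part of the original script.

# Print each name found in the expected list ($1) but absent from the
# observed list ($2). Blank lines are ignored in both files.
# FILENAME == ARGV[1] is used instead of the common NR == FNR test so that
# an empty observed file (nimbus page returned nothing) is handled correctly.
missing_supervisors() {
    awk 'FILENAME == ARGV[1] { if ($0 != "") seen[$0] = 1; next }
         $0 != "" && !($0 in seen)' "$2" "$1"
}

# Read host names on stdin and write an INI-style Ansible inventory to stdout.
# $1 is the group name, $2 the hosts file to resolve against (default /etc/hosts).
# As in the original script, the last matching non-comment entry wins.
build_inventory() {
    echo "[$1]"
    while read -r name; do
        [ -z "${name}" ] && continue
        awk -v n="${name}" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) ip = $1 }
            END { if (ip != "") print ip }
        ' "${2:-/etc/hosts}"
    done
    return 0
}
```

Under these assumptions, step 3 could be reduced to a single pipeline such as `missing_supervisors supervisor_list.txt supervisor_list_from_page.txt | build_inventory fail_supervisor > fail_supervisor.txt`. Whether ignoring unexpected hosts on the page is the desired behaviour depends on the cluster.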
4. Supervisor node deployment
# vim /data/scripts/restart_storm_service.sh
#!/bin/bash
# bash rather than sh: the script relies on [[ ]]
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/sbin
. /etc/profile

# Kill every running storm java process, then start the supervisor again
[[ -n `ps aux | grep java | grep storm` ]] && kill -9 `ps aux | grep java | grep storm | awk '{print $2}'`
nohup /usr/local/storm/bin/storm supervisor >/dev/null 2>&1 &
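
The `ps aux | grep java | grep storm` filter matches every java process whose command line mentions storm, which on a supervisor node also includes the worker processes. That may or may not be intended. As a minimal sketch of a narrower selection, the hypothetical helper `pids_for_class` below picks only the java processes started with a given main class. The class name `backtype.storm.daemon.supervisor` in the usage note is an assumption based on the `backtype.storm.*` naming used in the configuration above and should be checked against the installed Storm version.

```shell
#!/bin/sh
# Hypothetical helper, shown only as a sketch; not part of the original script.

# Reads "PID COMMAND ARGS..." records on stdin (the layout produced by
# `ps -eo pid=,args=`) and prints the PID of every process whose executable
# is java and whose arguments contain the main class $1 as a separate word.
pids_for_class() {
    awk -v cls="$1" '
        {
            is_java = ($2 ~ /(^|\/)java$/)
            has_cls = 0
            for (i = 3; i <= NF; i++) if ($i == cls) has_cls = 1
            if (is_java && has_cls) print $1
        }'
}
```

A possible use is `ps -eo pid=,args= | pids_for_class backtype.storm.daemon.supervisor`, passing the resulting PIDs to `kill` instead of killing every match of the broader filter.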
This article comes from the blog “人生理想在于坚持不懈”. Please be sure to keep this source reference: http://sofar.blog.iyunv.com/353572/1579897