Reposted from: http://wiki.saltstack.cn/salt-keepalive
Salt Keepalive Analysis
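While running salt 0.11 I ran into a problem: after the salt master restarted unexpectedly (the server rebooted after a power failure, and the salt master service was then started again), netstat on the minion still showed the connection as ESTABLISHED, yet the master could not reach the minion. Searching the salt Issues showed that others had reported the same problem, and upstream said it would be fixed in a later release. I happen to have a salt 0.13 environment, and the minion configuration file there already contains keepalive settings, so I ran some tests on Salt keepalive.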
Test environment
Role    IP           OS                 Version
Master  172.16.0.26  CentOS 6.3 X86_64  0.13.1
Minion  172.16.0.27  CentOS 6.3 X86_64  0.13.1
The keepalive settings in the minion configuration are left at their defaults.
Tests
(1) Run test.ping
# salt '*' test.ping
salt-test:
    True
A result of True means that communication with the minion is working normally.
(2) Capture packets with tcpdump
# tcpdump host 172.16.0.27 and port 4505 -nnn
After 300s, the following packets were captured:
15:31:03.828271 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2495920406 ecr 2495615334], length 0
15:31:03.828303 IP 172.16.0.26.4505 > 172.16.0.27.43263: Flags [.], ack 1, win 227, options [nop,nop,TS val 2495915334 ecr 2495620406], length 0
This shows that keepalive is enabled on the minion by default and that tcp_keepalive_idle defaults to 300s.
(3) Block the minion's connection
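To make the three keepalive knobs concrete, here is a minimal Python sketch of how they look on a plain TCP socket. It is only an illustration: Salt configures keepalive through its own transport layer, not through this code, and the function name `enable_keepalive` is hypothetical. The values 300, 75 and 9 are the ones observed in this test.

```python
import socket


def enable_keepalive(sock, idle=300, intvl=75, cnt=9):
    """Turn on TCP keepalive for sock and, where supported, tune its timers.

    idle  - seconds of silence before the first probe is sent
    intvl - seconds between unanswered probes
    cnt   - unanswered probes allowed before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are not defined on every platform,
    # so each one is applied only if this Python build exposes it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", intvl),
                        ("TCP_KEEPCNT", cnt)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    # A non-zero value means keepalive is switched on for this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

When the timer options are left unset, the socket falls back to the operating system's keepalive settings.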
# iptables -A INPUT -s 172.16.0.27 -p tcp --dport 4505 -j DROP
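This simulates the salt master going down suddenly. At this point the corresponding TCP connection on the minion still shows as ESTABLISHED.
After 300s, tcpdump captured new packets: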
15:36:03.828237 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496220406 ecr 2495915334], length 0
15:37:18.828341 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496295406 ecr 2495915334], length 0
15:38:33.828316 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496370406 ecr 2495915334], length 0
15:39:48.828255 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496445406 ecr 2495915334], length 0
15:41:03.828220 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496520406 ecr 2495915334], length 0
15:42:18.828249 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496595406 ecr 2495915334], length 0
15:43:33.828236 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496670406 ecr 2495915334], length 0
15:44:48.828231 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496745406 ecr 2495915334], length 0
15:46:03.829814 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [.], ack 163, win 547, options [nop,nop,TS val 2496820406 ecr 2495915334], length 0
15:47:18.828229 IP 172.16.0.27.43263 > 172.16.0.26.4505: Flags [R.], seq 1, ack 163, win 547, options [nop,nop,TS val 2496895406 ecr 2495915334], length 0
15:47:19.029054 IP 172.16.0.27.34080 > 172.16.0.26.4505: Flags [S], seq 337868058, win 14600, options [mss 1460,sackOK,TS val 2496895606 ecr 0,nop,wscale 6], length 0
15:47:20.028250 IP 172.16.0.27.34080 > 172.16.0.26.4505: Flags [S], seq 337868058, win 14600, options [mss 1460,sackOK,TS val 2496896606 ecr 0,nop,wscale 6], length 0
15:47:22.028223 IP 172.16.0.27.34080 > 172.16.0.26.4505: Flags [S], seq 337868058, win 14600, options [mss 1460,sackOK,TS val 2496898606 ecr 0,nop,wscale 6], length 0
15:47:26.028357 IP 172.16.0.27.34080 > 172.16.0.26.4505: Flags [S], seq 337868058, win 14600, options [mss 1460,sackOK,TS val 2496902606 ecr 0,nop,wscale 6], length 0
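The first packet is a keepalive probe. Because of the firewall rule on the master, communication between the minion and the master is cut off, so the probe gets no reply.
Packets 2 through 9 are also keepalive probes, each sent 75s after the previous one, which shows that tcp_keepalive_intvl defaults to 75s.
After 9 probes went unanswered, the minion sent an RST packet (the tenth packet), which shows that tcp_keepalive_cnt defaults to 9.
The salt minion then keeps sending SYN packets to the master, trying to re-establish the connection.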
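The timing in the capture can be checked with a little arithmetic. The sketch below is only an illustration: the helper names `detection_time` and `gaps` are hypothetical, and the timestamps are copied from the two captures above, truncated to whole seconds. It computes how long the minion takes to give up on a dead connection, which is the idle time plus one interval per unanswered probe.

```python
from datetime import datetime


def detection_time(idle, intvl, cnt):
    """Seconds from the last traffic until the connection is reset."""
    return idle + intvl * cnt


def gaps(timestamps):
    """Whole-second gaps between consecutive HH:MM:SS timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Last answered ACK, the nine keepalive probes, then the RST.
capture = [
    "15:31:03",
    "15:36:03", "15:37:18", "15:38:33", "15:39:48", "15:41:03",
    "15:42:18", "15:43:33", "15:44:48", "15:46:03",
    "15:47:18",
]

if __name__ == "__main__":
    print(gaps(capture))                # one 300s gap, then nine 75s gaps
    print(detection_time(300, 75, 9))   # 975 seconds, i.e. 16 min 15 s
```

With the values seen here, a dead master is only noticed 975s after the last traffic, which matches the span from 15:31:03 to the RST at 15:47:18.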
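Conclusion
Salt communicates over long-lived connections. Thanks to the keepalive feature added in newer versions, a salt minion no longer stays stuck with a dead connection that it cannot recover from. However, in the window before keepalive detects the failure and the minion reconnects, the TCP connection is effectively broken and the master and minion cannot communicate.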
According to feedback, some users still see the minion fail to reconnect on CentOS 5, while CentOS 6 works correctly.