OpenStack: troubleshooting a compute node that would not start
After rebooting the OpenStack controller node and compute node, I checked the state of the compute services on the controller node:
# openstack compute service list
+----+------------------+------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                   | Zone     | Status  | State | Updated At                 |
+----+------------------+------------------------+----------+---------+-------+----------------------------+
| 1  | nova-conductor   | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:00:40.000000 |
| 2  | nova-scheduler   | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:00:41.000000 |
| 3  | nova-consoleauth | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:00:45.000000 |
| 7  | nova-compute     | linux-node1.wanwan.com | nova     | enabled | up    | 2017-03-10T03:00:38.000000 |
| 8  | nova-compute     | linux-node2.wanwan.com | nova     | enabled | down  | 2017-03-10T02:28:39.000000 |
+----+------------------+------------------------+----------+---------+-------+----------------------------+
Unexpectedly, nova-compute on the compute node node2 (linux-node2.wanwan.com) had not come up, as shown above. I checked the service status on the compute node:
# systemctl status openstack-nova-compute.service
● openstack-nova-compute.service - OpenStack Nova Compute Server
Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
Active: activating (start) since Fri 2017-03-10 10:49:08 CST; 12min ago
Main PID: 2261 (nova-compute)
CGroup: /system.slice/openstack-nova-compute.service
└─2261 /usr/bin/python2 /usr/bin/nova-compute
Mar 10 10:49:08 linux-node2.wanwan.com systemd: Starting OpenStack Nova Compute Server...
# systemctl start openstack-nova-compute.service
Starting the service by hand simply hung, so I looked at the log file:
-f101b84fa432] AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 10:58:19.846 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 10:58:51.944 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 10:59:24.076 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 10:59:56.191 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 11:00:28.302 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 11:01:00.411 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 11:01:33.521 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
The log reports that the AMQP server is unreachable, so I began to suspect the message queue itself and checked it on the controller node:
# lsof -i :5672
COMMAND    PID     USER  FD TYPE DEVICE SIZE/OFF NODE NAME
nova-cons 1171     nova  5u IPv4  30613      0t0  TCP linux-node1:40614->linux-node1:amqp (ESTABLISHED)
beam.smp  1173 rabbitmq 52u IPv6  29124      0t0  TCP *:amqp (LISTEN)
beam.smp  1173 rabbitmq 53u IPv6  31152      0t0  TCP linux-node1:amqp->linux-node1:40614 (ESTABLISHED)
beam.smp  1173 rabbitmq 54u IPv6  31176      0t0  TCP linux-node1:amqp->linux-node1:40624 (ESTABLISHED)
beam.smp  1173 rabbitmq 55u IPv6  31180      0t0  TCP linux-node1:amqp->linux-node1:40626 (ESTABLISHED)
beam.smp  1173 rabbitmq 56u IPv6  31183      0t0  TCP linux-node1:amqp->linux-node1:40628 (ESTABLISHED)
beam.smp  1173 rabbitmq 57u IPv6  31193      0t0  TCP linux-node1:amqp->linux-node1:40630 (ESTABLISHED)
beam.smp  1173 rabbitmq 58u IPv6  31197      0t0  TCP linux-node1:amqp->linux-node1:40632 (ESTABLISHED)
beam.smp  1173 rabbitmq 59u IPv6  31255      0t0  TCP linux-node1:amqp->linux-node1:40640 (ESTABLISHED)
beam.smp  1173 rabbitmq 60u IPv6  31321      0t0  TCP linux-node1:amqp->linux-node1:40646 (ESTABLISHED)
beam.smp  1173 rabbitmq 61u IPv6  31355      0t0  TCP linux-node1:amqp->linux-node1:40654 (ESTABLISHED)
beam.smp  1173 rabbitmq 62u IPv6  35079      0t0  TCP linux-node1:amqp->linux-node1:40670 (ESTABLISHED)
nova-sche 1186     nova  7u IPv4  31192      0t0  TCP linux-node1:40630->linux-node1:amqp (ESTABLISHED)
nova-comp 2091     nova  4u IPv4  31168      0t0  TCP linux-node1:40624->linux-node1:amqp (ESTABLISHED)
nova-comp 2091     nova  5u IPv4  31179      0t0  TCP linux-node1:40626->linux-node1:amqp (ESTABLISHED)
nova-comp 2091     nova 21u IPv4  31898      0t0  TCP linux-node1:40654->linux-node1:amqp (ESTABLISHED)
nova-comp 2091     nova 22u IPv4  35882      0t0  TCP linux-node1:40670->linux-node1:amqp (ESTABLISHED)
nova-cond 3265     nova  7u IPv4  31196      0t0  TCP linux-node1:40632->linux-node1:amqp (ESTABLISHED)
nova-cond 3265     nova  8u IPv4  31833      0t0  TCP linux-node1:40646->linux-node1:amqp (ESTABLISHED)
nova-cond 3267     nova  7u IPv4  30623      0t0  TCP linux-node1:40628->linux-node1:amqp (ESTABLISHED)
nova-cond 3267     nova  8u IPv4  31750      0t0  TCP linux-node1:40640->linux-node1:amqp (ESTABLISHED)
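The lsof listing only shows connections that start on linux-node1 itself, so it cannot tell whether port 5672 is reachable from the compute node. A minimal sketch of a probe that could be run on linux-node2, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available. The function name check_tcp is made up for illustration and is not part of OpenStack or RabbitMQ.

```shell
# check_tcp HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
# Hypothetical helper; assumes bash and coreutils `timeout` are installed.
check_tcp() {
    local host="$1" port="$2" secs="${3:-3}"
    # bash opens a TCP socket when a redirection targets /dev/tcp/HOST/PORT.
    # timeout ends the attempt if packets are silently dropped.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example: probe the AMQP endpoint named in the nova-compute log.
#   if check_tcp 10.10.10.11 5672; then echo reachable; else echo unreachable; fi
```

A failure here together with a healthy listing on the controller would suggest looking at what sits between the two nodes rather than at RabbitMQ.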
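The RabbitMQ listing showed nothing abnormal: the broker was listening on port 5672 and the local services were connected to it. That pointed to something between the two nodes rather than RabbitMQ itself (a firewall REJECT rule with icmp-host-prohibited is one common cause of EHOSTUNREACH), so I tried flushing iptables: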
# iptables -F
# iptables -X
# iptables -Z
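Flushing every chain is a blunt tool, and the rules may come back when the firewall service reloads its saved rule set. A minimal sketch of a narrower approach, assuming the blocking rule is a REJECT or DROP rule in the INPUT chain of the node that runs RabbitMQ (the post does not say which node or chain was involved). The helper name blocking_rules is made up for illustration.

```shell
# blocking_rules: read `iptables -S` output on stdin and print the rules
# whose target is REJECT or DROP. Hypothetical helper.
# Chain policies (lines such as "-P INPUT DROP") are not matched.
blocking_rules() {
    grep -E -- '-j (REJECT|DROP)( |$)'
}

# Usage on the RabbitMQ node (needs root):
#   iptables -S INPUT | blocking_rules
# Narrower than flushing: accept AMQP traffic ahead of the blocking rule.
#   iptables -I INPUT -p tcp --dport 5672 -j ACCEPT
```

Inserting with -I puts the ACCEPT rule at the top of the chain, so it is evaluated before any later REJECT rule.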
Then I checked again:
# openstack compute service list
+----+------------------+------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                   | Zone     | Status  | State | Updated At                 |
+----+------------------+------------------------+----------+---------+-------+----------------------------+
| 1  | nova-conductor   | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:08:40.000000 |
| 2  | nova-scheduler   | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:08:41.000000 |
| 3  | nova-consoleauth | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:08:45.000000 |
| 7  | nova-compute     | linux-node1.wanwan.com | nova     | enabled | up    | 2017-03-10T03:08:48.000000 |
| 8  | nova-compute     | linux-node2.wanwan.com | nova     | enabled | up    | 2017-03-10T03:08:40.000000 |
+----+------------------+------------------------+----------+---------+-------+----------------------------+
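The before-and-after comparison of the service list can also be scripted. A minimal sketch, assuming the openstack client's value formatter (-f value) and column selection (-c) are used so that each row is printed as "Binary Host State". The helper name down_services is made up for illustration.

```shell
# down_services: read "Binary Host State" rows on stdin and print the
# services whose state is "down". Hypothetical helper.
down_services() {
    awk '$3 == "down" { print $1 " on " $2 }'
}

# Usage on the controller node, with admin credentials loaded:
#   openstack compute service list -f value -c Binary -c Host -c State | down_services
```

An empty result means every listed service reports its state as up.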
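As the second listing shows, the compute node is back to normal. The lesson: remember to clear the iptables rules, or at least make sure they allow the AMQP port 5672.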