sm702 posted on 2018-5-31 09:11:19

OpenStack: troubleshooting a compute node that would not start

  After rebooting both the OpenStack control node and the compute node, I checked the state of the compute services from the control node:
# openstack compute service list
+----+------------------+------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                   | Zone     | Status  | State | Updated At                 |
+----+------------------+------------------------+----------+---------+-------+----------------------------+
| 1  | nova-conductor   | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:00:40.000000 |
| 2  | nova-scheduler   | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:00:41.000000 |
| 3  | nova-consoleauth | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:00:45.000000 |
| 7  | nova-compute     | linux-node1.wanwan.com | nova     | enabled | up    | 2017-03-10T03:00:38.000000 |
| 8  | nova-compute     | linux-node2.wanwan.com | nova     | enabled | down  | 2017-03-10T02:28:39.000000 |
+----+------------------+------------------------+----------+---------+-------+----------------------------+

  To my surprise, nova-compute on the compute node linux-node2 had not come up (State is down in the last row), so I checked the service status on the compute node itself:
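
When there are more than a handful of services, picking the down rows out of the table by eye gets tedious. Below is a minimal sketch of a filter for that. It is not part of the original session: the function name `down_services` is made up for illustration, and it assumes the output is requested in the CLI's plain `value` format with only the Binary, Host and State columns.

```shell
#!/usr/bin/env bash
# Print "binary host" for every compute service whose State is not "up".
# Input (stdin): one service per line as "Binary Host State", for example from
#   openstack compute service list -f value -c Binary -c Host -c State
# down_services is a hypothetical helper name, not an OpenStack command.
down_services() {
    awk '$3 != "up" { print $1, $2 }'
}

# Example usage on the control node (admin credentials must be loaded first):
#   openstack compute service list -f value -c Binary -c Host -c State | down_services
```

Run against the table above, this should print only the nova-compute entry for linux-node2.wanwan.com.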

# systemctl status openstack-nova-compute.service
● openstack-nova-compute.service - OpenStack Nova Compute Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled; vendor preset: disabled)
   Active: activating (start) since Fri 2017-03-10 10:49:08 CST; 12min ago
 Main PID: 2261 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
         └─2261 /usr/bin/python2 /usr/bin/nova-compute
Mar 10 10:49:08 linux-node2.wanwan.com systemd: Starting OpenStack Nova Compute Server...
# systemctl start openstack-nova-compute.service
Starting the service by hand just hung in the same "activating" state, so I looked at the nova-compute log file:
-f101b84fa432] AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 10:58:19.846 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 10:58:51.944 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 10:59:24.076 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 10:59:56.191 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 11:00:28.302 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
2017-03-10 11:01:00.411 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None
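2017-03-10 11:01:33.521 2261 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 10.10.10.11:5672 is unreachable: EHOSTUNREACH. Trying again in 32 seconds. Client port: None

  The log says the AMQP server is unreachable. At this point I began to suspect the message queue itself, so I checked whether it was healthy on the control node: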
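
Before digging into RabbitMQ, it can help to confirm from the compute node which endpoint is failing and whether a plain TCP connection to it works at all. The following is a rough sketch under a few assumptions: the helper names `amqp_unreachable` and `can_connect` are invented here, and the log path in the usage comment (`/var/log/nova/nova-compute.log`) is the usual packaged location, which the post does not actually state.

```shell
#!/usr/bin/env bash
# List the distinct "host:port errno" pairs that oslo.messaging reported as
# unreachable. Reads nova-compute log lines on stdin.
# (amqp_unreachable is a hypothetical helper name.)
amqp_unreachable() {
    sed -n 's/.*AMQP server on \([^ ]*\) is unreachable: \([A-Z]*\)\..*/\1 \2/p' | sort -u
}

# Try a plain TCP connect to host:port, giving up after 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
# Returns 0 if the connection opens, non-zero otherwise.
# (can_connect is a hypothetical helper name.)
can_connect() {
    local host=$1 port=$2
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example usage on the compute node (log path is an assumption):
#   amqp_unreachable < /var/log/nova/nova-compute.log
#   can_connect 10.10.10.11 5672 && echo reachable || echo blocked
```

If the connect test fails from the compute node while the broker is clearly listening, the problem is more likely on the network path or in a firewall than in RabbitMQ.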

# lsof -i :5672
COMMAND    PID     USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
nova-cons 1171     nova    5u  IPv4  30613      0t0  TCP linux-node1:40614->linux-node1:amqp (ESTABLISHED)
beam.smp  1173 rabbitmq   52u  IPv6  29124      0t0  TCP *:amqp (LISTEN)
beam.smp  1173 rabbitmq   53u  IPv6  31152      0t0  TCP linux-node1:amqp->linux-node1:40614 (ESTABLISHED)
beam.smp  1173 rabbitmq   54u  IPv6  31176      0t0  TCP linux-node1:amqp->linux-node1:40624 (ESTABLISHED)
beam.smp  1173 rabbitmq   55u  IPv6  31180      0t0  TCP linux-node1:amqp->linux-node1:40626 (ESTABLISHED)
beam.smp  1173 rabbitmq   56u  IPv6  31183      0t0  TCP linux-node1:amqp->linux-node1:40628 (ESTABLISHED)
beam.smp  1173 rabbitmq   57u  IPv6  31193      0t0  TCP linux-node1:amqp->linux-node1:40630 (ESTABLISHED)
beam.smp  1173 rabbitmq   58u  IPv6  31197      0t0  TCP linux-node1:amqp->linux-node1:40632 (ESTABLISHED)
beam.smp  1173 rabbitmq   59u  IPv6  31255      0t0  TCP linux-node1:amqp->linux-node1:40640 (ESTABLISHED)
beam.smp  1173 rabbitmq   60u  IPv6  31321      0t0  TCP linux-node1:amqp->linux-node1:40646 (ESTABLISHED)
beam.smp  1173 rabbitmq   61u  IPv6  31355      0t0  TCP linux-node1:amqp->linux-node1:40654 (ESTABLISHED)
beam.smp  1173 rabbitmq   62u  IPv6  35079      0t0  TCP linux-node1:amqp->linux-node1:40670 (ESTABLISHED)
nova-sche 1186     nova    7u  IPv4  31192      0t0  TCP linux-node1:40630->linux-node1:amqp (ESTABLISHED)
nova-comp 2091     nova    4u  IPv4  31168      0t0  TCP linux-node1:40624->linux-node1:amqp (ESTABLISHED)
nova-comp 2091     nova    5u  IPv4  31179      0t0  TCP linux-node1:40626->linux-node1:amqp (ESTABLISHED)
nova-comp 2091     nova   21u  IPv4  31898      0t0  TCP linux-node1:40654->linux-node1:amqp (ESTABLISHED)
nova-comp 2091     nova   22u  IPv4  35882      0t0  TCP linux-node1:40670->linux-node1:amqp (ESTABLISHED)
nova-cond 3265     nova    7u  IPv4  31196      0t0  TCP linux-node1:40632->linux-node1:amqp (ESTABLISHED)
nova-cond 3265     nova    8u  IPv4  31833      0t0  TCP linux-node1:40646->linux-node1:amqp (ESTABLISHED)
nova-cond 3267     nova    7u  IPv4  30623      0t0  TCP linux-node1:40628->linux-node1:amqp (ESTABLISHED)
nova-cond 3267     nova    8u  IPv4  31750      0t0  TCP linux-node1:40640->linux-node1:amqp (ESTABLISHED)
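Nothing looked wrong here: RabbitMQ was listening on *:amqp and the local nova services were connected, although every established connection came from linux-node1 and none from linux-node2. So I tried flushing iptables: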
# iptables -F
# iptables -X
# iptables -Z
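
Flushing everything works, but it also throws away any rules you did want. A narrower approach is to look for the rule doing the blocking and open only the AMQP port. The sketch below is not what was run in this session. It assumes the blocking rule lives in the INPUT chain of the host that owns 10.10.10.11 (presumably the control node; the post does not say where the flush was run), `blocking_rules` is a made-up helper name, and the 10.10.10.0/24 subnet is a guess based on the broker address.

```shell
#!/usr/bin/env bash
# Print rules that reject or drop traffic.
# Reads `iptables -S` output on stdin. Chain policies ("-P INPUT DROP")
# are not matched, only explicit "-j REJECT" / "-j DROP" rules.
# (blocking_rules is a hypothetical helper name.)
blocking_rules() {
    grep -E -- '-j (REJECT|DROP)( |$)'
}

# Inspect first (as root):
#   iptables -S INPUT | blocking_rules
#
# Then, instead of flushing, allow only the AMQP port ahead of the reject rule.
# 10.10.10.0/24 is an ASSUMED management subnet, adjust it to your network:
#   iptables -I INPUT -p tcp -s 10.10.10.0/24 --dport 5672 -j ACCEPT
```

A catch-all `REJECT --reject-with icmp-host-prohibited` rule in that listing would be consistent with the EHOSTUNREACH errors in the nova-compute log, since the client sees that ICMP reply as "host unreachable".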
Then I checked again:
# openstack compute service list
+----+------------------+------------------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host                   | Zone     | Status  | State | Updated At                 |
+----+------------------+------------------------+----------+---------+-------+----------------------------+
| 1  | nova-conductor   | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:08:40.000000 |
| 2  | nova-scheduler   | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:08:41.000000 |
| 3  | nova-consoleauth | linux-node1.wanwan.com | internal | enabled | up    | 2017-03-10T03:08:45.000000 |
| 7  | nova-compute     | linux-node1.wanwan.com | nova     | enabled | up    | 2017-03-10T03:08:48.000000 |
| 8  | nova-compute     | linux-node2.wanwan.com | nova     | enabled | up    | 2017-03-10T03:08:40.000000 |
+----+------------------+------------------------+----------+---------+-------+----------------------------+

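  As shown above, the compute node is back to normal. The AMQP port was being blocked by firewall rules rather than by any fault in RabbitMQ itself, so the lesson is: after a reboot, remember to check and clear the iptables policy.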
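
nova-compute keeps retrying the AMQP connection on its own (the log shows a retry every 32 seconds), so the State column may take a little while to flip to up after the firewall change. A small polling loop saves re-running the list command by hand. This is only a sketch: `wait_for_compute_up` and its default retry count and delay are invented for illustration.

```shell
#!/usr/bin/env bash
# Poll until nova-compute on the given host reports State "up".
# Arguments: host, max_tries (default 10), delay in seconds (default 6).
# Returns 0 once the service is up, 1 if it never comes up.
# (wait_for_compute_up is a hypothetical helper name.)
wait_for_compute_up() {
    local host=$1 max_tries=${2:-10} delay=${3:-6}
    local i state
    for ((i = 0; i < max_tries; i++)); do
        state=$(openstack compute service list --service nova-compute \
                    --host "$host" -f value -c State)
        [ "$state" = "up" ] && return 0
        sleep "$delay"
    done
    return 1
}

# Example usage on the control node:
#   wait_for_compute_up linux-node2.wanwan.com && echo "compute node is up"
```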