Summary of Common OpenStack Errors

Posted by 吸毒的虫子 on 2018-6-2 12:51:16

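The problems below were mostly encountered during installation and deployment. OpenStack's components differ from release to release, so the installation procedure is completely different between versions. After testing, I can now deploy both Essex and Grizzly successfully. Between them sits Folsom, which I did not manage to deploy and did not spend much time on: the quantum component it uses was still immature and had many network connectivity problems, there are few success stories online, and most people still run Folsom with nova-network.
Only in Grizzly did quantum become reasonably stable and usable. I spent a lot of time studying it and can now deploy a multi-node environment. The problems below come from both Essex and Grizzly. There is little material on this on Chinese sites, and much of what I found came from foreign sites. In many cases the log shows the same error message while the underlying cause differs, so you have to analyze how the components work before you can locate the problem accurately. Errors are nothing to fear: troubleshooting them deepens your understanding of the system, which is a good thing.
As for installation, there are automated one-click deployment tools online, such as devstack and onestack. If you are a beginner I do not recommend them, because you will learn nothing from using them. If everything works, congratulations for now; but as soon as an error appears at some intermediate step you will be lost, with no idea where it went wrong, and later maintenance will be just as hard. You may end up spending more time on troubleshooting, because you do not know which steps were performed or which settings are required. Most of these tools exist to stand up development environments quickly; a real production environment still has to be built step by step, which also makes problems quick to locate.
This post only summarizes some of the error messages seen during deployment and gives solutions. These cases were encountered and solved in my environment; details may differ in yours, so treat them as a reference only.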
1. Check that the services are healthy:

root@control:~# nova-manage service list
Binary            Host     Zone      Status   State  Updated_At
nova-cert         control  internal  enabled  :-)    2013-04-26 02:29:44
nova-conductor    control  internal  enabled  :-)    2013-04-26 02:29:42
nova-consoleauth  control  internal  enabled  :-)    2013-04-26 02:29:44
nova-scheduler    control  internal  enabled  :-)    2013-04-26 02:29:47
nova-compute      node-01  nova      enabled  :-)    2013-04-26 02:29:46
nova-compute      node-02  nova      enabled  :-)    2013-04-26 02:29:46
nova-compute      node-03  nova      enabled  :-)    2013-04-26 02:29:42

If every service shows a smiley, the nova services are healthy. If XXX appears, check that service's log under /var/log/nova/; the log usually reveals the cause.
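The check above can also be scripted. The following is a minimal sketch, assuming the column layout shown in the listing (binary, host, zone, status, state, timestamp); the function name down_services and the sample text are illustrative and not part of nova.

```python
def down_services(service_list_output):
    """Return (binary, host) pairs whose State column is not the ':-)' smiley."""
    down = []
    for line in service_list_output.splitlines():
        fields = line.split()
        # Data rows have seven whitespace-separated fields:
        # binary, host, zone, status, state, date, time.
        # Shorter lines (blank lines, the shell prompt) and the header are skipped.
        if len(fields) < 5 or fields[0] == "Binary":
            continue
        binary, host, state = fields[0], fields[1], fields[4]
        if state != ":-)":
            down.append((binary, host))
    return down


SAMPLE = """\
Binary           Host     Zone      Status   State  Updated_At
nova-cert        control  internal  enabled  :-)    2013-04-26 02:29:44
nova-compute     node-01  nova      enabled  XXX    2013-04-26 02:29:46
"""

print(down_services(SAMPLE))
```

In practice the text would come from running nova-manage service list on the controller; each pair returned tells you which log under /var/log/nova/ to read first.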
2. libvirt error

python2.7/dist-packages/nova/virt/libvirt/connection.py", line 338, in _connect
2013-03-09 17:05:42 TRACE nova return libvirt.openAuth(uri, auth, 0)
2013-03-09 17:05:42 TRACE nova File "/usr/lib/python2.7/dist-packages/libvirt.py", line 102, in openAuth
2013-03-09 17:05:42 TRACE nova if ret is None:raise libvirtError('virConnectOpenAuth() failed')
2013-03-09 17:05:42 TRACE nova libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
2013-03-09 22:05:41.909+0000: 12466: info : libvirt version: 0.9.8
2013-03-09 22:05:41.909+0000: 12466: error : virNetServerMDNSStart:460 : internal error Failed to create mDNS client: Daemon not running

Solution:
For this kind of error, first look at /var/log/libvirt/libvirtd.log, which will show: libvirt-bin service will not start without dbus installed.
Then run ps -ea | grep dbus to confirm that dbus is running, and run apt-get install lxc.
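Before digging into the logs, it can help to confirm whether the socket named in the traceback exists at all. This is a small sketch using only the Python standard library; the helper name is_unix_socket is my own, and the path is the one from the error message.

```python
import os
import stat


def is_unix_socket(path):
    """Return True if path exists and is a UNIX domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file or no permission: treat as "not available".
        return False
    return stat.S_ISSOCK(mode)


print(is_unix_socket("/var/run/libvirt/libvirt-sock"))
```

If this prints False, libvirtd is most likely not running, which matches the "No such file or directory" message in the nova log.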
3. Failed to add image

Error:
Failed to add image. Got error: The request returned 500 Internal Server Error

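Solution:
This is an environment variable problem. Set the variables by adding the following to /etc/profile (they must be exported so that the client commands can see them):

export OS_AUTH_KEY="openstack"
export OS_AUTH_URL="http://localhost:5000/v2.0/"
export OS_PASSWORD="openstack"
export OS_TENANT_NAME="admin"
export OS_USERNAME="admin"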

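Then run source /etc/profile. You can also set the variables without putting them in the profile, but then they only apply to the current session and are lost after a reboot, so writing them into the profile saves a lot of trouble.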
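A quick way to see whether the variables actually reached the current shell is to check the environment from a script. This is a minimal sketch; the variable names are the five listed above, and missing_credentials is a hypothetical helper name.

```python
import os

# The five variables this post adds to /etc/profile.
REQUIRED = ("OS_AUTH_KEY", "OS_AUTH_URL", "OS_PASSWORD",
            "OS_TENANT_NAME", "OS_USERNAME")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


print(missing_credentials())
```

An empty list means every variable is visible to child processes; any name that is printed was either never set or was set without being exported.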
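4. How zombie instances arise
Zombie instances usually appear when nova or the underlying virtual machine is shut down improperly, or when an instance in an error state cannot be deleted. Use virsh list to check whether the underlying virtual machine is still running; if it is, stop it, and then delete the instance directly in the database.

Nova instance not found
Local file storage of the image files.
Error:
2013-03-09 17:58:08 TRACE nova raise exception.InstanceNotFound(instance_id=instance_name)
2013-03-09 17:58:08 TRACE nova InstanceNotFound: Instance instance-00000002 could not be found.
2013-03-09 17:58:08 TRACE nova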

Solution:
Delete the zombie instance from the database, or drop the database and recreate it:
a. Drop and recreate the database:

$ mysql -u root -p
DROP DATABASE nova;
-- Recreate the DB (strip formatting if you copy and paste any of this):
CREATE DATABASE nova;
GRANT ALL PRIVILEGES ON nova.* TO 'novadbadmin'@'%' IDENTIFIED BY '<password>';
quit

Then resync the DB, for example with nova-manage db sync.

b. Delete the instance from the database:

#!/bin/bash
# Usage: sh delete_instance.sh <instance uuid>
mysql -uroot -pmysql <<_ESXU_
use nova;
-- instance_uuid joins to instances.uuid (older schemas use instance_id joined to instances.id)
DELETE a FROM nova.security_group_instance_association
AS a INNER JOIN nova.instances AS b
ON a.instance_uuid=b.uuid WHERE b.uuid='$1';
DELETE FROM nova.instance_info_caches WHERE instance_uuid='$1';
DELETE FROM nova.instances WHERE uuid='$1';
_ESXU_

Save the script above as delete_instance.sh, then run sh delete_instance.sh instance_id;
The instance_id can be looked up with nova list.
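Because the script pastes its argument straight into SQL, it seems worth checking that the argument really is an instance UUID before running it. The sketch below uses only the standard uuid module; normalize_instance_uuid is a hypothetical helper and not part of nova.

```python
import uuid


def normalize_instance_uuid(value):
    """Return value in canonical lowercase UUID form, or raise ValueError."""
    return str(uuid.UUID(value))


for candidate in ("6F9619FF-8B86-D011-B42D-00C04FC964FF", "1'; DROP TABLE instances; --"):
    try:
        print("ok:", normalize_instance_uuid(candidate))
    except ValueError:
        print("rejected:", candidate)
```

Only a value that passes this check should be handed to the delete script; anything else is either a typo or something you do not want inside a DELETE statement.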
5. Keystone NoHandlers

Error
root@openstack-dev-r910:/home/brent/openstack# ./keystone_data.sh
No handlers could be found for logger "keystoneclient.client"
Unable to authorize user
No handlers could be found for logger "keystoneclient.client"
Unable to authorize user
No handlers could be found for logger "keystoneclient.client"
Unable to authorize user

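Solution:
In most cases this error is caused by a mistake in keystone_data.sh: the admin_token in the script must be identical to the one in /etc/keystone/keystone.conf. Then confirm that keystone.conf contains the following settings:
driver = keystone.catalog.backends.templated.TemplatedCatalog
template_file = /etc/keystone/default_catalog.templates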
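The token comparison can be done mechanically. This is a minimal sketch, assuming admin_token lives in the [DEFAULT] section of keystone.conf; read_admin_token is a hypothetical helper, and the token value shown is only sample data.

```python
import configparser


def read_admin_token(conf_text):
    """Return admin_token from the [DEFAULT] section of an INI-style text, or None."""
    # interpolation=None keeps '%' characters in values from being interpreted;
    # strict=False tolerates options that appear more than once.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.defaults().get("admin_token")


SAMPLE_CONF = """\
[DEFAULT]
admin_token = openstack
"""

script_token = "openstack"  # the value used in keystone_data.sh (sample)
print(read_admin_token(SAMPLE_CONF) == script_token)
```

On a real host you would pass the contents of /etc/keystone/keystone.conf instead of SAMPLE_CONF; a False result points at the mismatch described above.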
6. Wipe the components and reinstall:

#!/bin/bash
mysql -uroot -popenstack -e "drop database nova;"
mysql -uroot -popenstack -e "drop database glance;"
mysql -uroot -popenstack -e "drop database keystone;"
apt-get purge nova-api nova-cert nova-common nova-compute \
    nova-compute-kvm nova-doc nova-network nova-objectstore \
    nova-scheduler nova-vncproxy nova-volume python-nova python-novaclient
apt-get autoremove
rm -rf /var/lib/glance
rm -rf /var/lib/keystone/
rm -rf /var/lib/nova/
rm -rf /var/lib/mysql

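Running the script above uninstalls the installed components and wipes the databases, which saves you from reinstalling the operating system.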
7. Access denied for user 'keystone'@'localhost' (using password: YES)

# keystone-manage db_sync
File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 187, in __init__
super(Connection, self).__init__(*args, **kwargs2)
sqlalchemy.exc.OperationalError: (OperationalError) (1045, "Access denied for user 'keystone'@'openstack1' (using password: YES)") None None

Solution:
Check whether the database connection string in keystone.conf is wrong. The correct form is:

connection = mysql://keystone:openstack@localhost:3306/keystone
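When the error persists, it helps to split the connection string into its parts and compare each one with the MySQL account and grants. The sketch below uses urllib.parse from the standard library; describe_connection is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def describe_connection(url):
    """Split a mysql://user:password@host:port/database URL into its parts."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


print(describe_connection("mysql://keystone:openstack@localhost:3306/keystone"))
```

Note that the host in the traceback above is openstack1 while the URL says localhost; MySQL grants are per user and host, so both the host part and the password need to match an existing grant.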

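8. nova-compute going down and time synchronization
Quite often nova-compute goes down or misbehaves, and nova-manage shows its state as XXX.
The usual cause is that the clock on the nova-compute host disagrees with the clock on the controller. nova-compute periodically updates a timestamp in the services table of the database, and that timestamp is taken from the nova-compute host's clock.
The controller decides whether nova-compute is alive by subtracting nova-compute's last update time from the controller's own time. If the difference exceeds a threshold, nova-compute is judged to be down. (The exact value is in the code. I remember it as about 15 seconds, although as far as I know it is controlled by the service_down_time option in nova.conf, whose default is 60 seconds.)
At that point nova-manage shows nova-compute as XXX, and if you create a virtual machine, nova-scheduler.log reports that no valid host was found. The other services behave the same way, because this is how nova's heartbeat mechanism works. Keeping the clocks of all nodes in a nova environment synchronized is therefore essential.
If the nova-compute state on the dashboard keeps flipping between red and green, synchronize the clocks strictly, or raise the threshold mentioned above.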
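The liveness rule described above can be sketched in a few lines. This is an illustration of the idea rather than nova's actual code; the function name, the 60-second default threshold, and the timestamps are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def is_service_up(last_heartbeat, controller_now, service_down_time=60):
    """Treat a service as alive if its last heartbeat is within the threshold."""
    elapsed = (controller_now - last_heartbeat).total_seconds()
    # abs() makes a compute clock that runs ahead count as skew too.
    return abs(elapsed) <= service_down_time


controller_now = datetime(2013, 4, 26, 2, 30, 0)

# Clocks in sync: the heartbeat was written 8 seconds ago.
in_sync = controller_now - timedelta(seconds=8)
# Same heartbeat, but the compute node's clock is 5 minutes slow.
skewed = in_sync - timedelta(minutes=5)

print(is_service_up(in_sync, controller_now))
print(is_service_up(skewed, controller_now))
```

The second call shows why a perfectly healthy nova-compute can be reported as XXX: the process is writing heartbeats on time, but the timestamps it writes look stale from the controller's point of view.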
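9. noVNC cannot connect to an instance
noVNC causes quite a few problems, and there are many configuration guides for it online. The configuration itself is not complicated: there are only four parameters, and if they are right there is usually no major issue. Even so, I ran into several problems during installation.
a. The message "Connection Refused"
The controller node may be unable to resolve the compute node's hostname when it receives the VNC request, so it cannot connect to the instance on that compute node. The fix is to add the IP-to-hostname mappings of the compute nodes to /etc/hosts on the controller node.
Another possibility is that the current browser is not supported or cannot reach the service.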
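To verify that the controller's hosts file covers every compute node, the mappings can be checked with a short script. This is a sketch that works on the text of a hosts file; the node names follow the service listing earlier in this post, while the IP addresses and the helper name missing_hosts are made up for illustration.

```python
def missing_hosts(hosts_text, expected):
    """Return names from expected (name -> IP) that hosts_text does not map to that IP."""
    mapped = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapped.setdefault(name, ip)  # the first entry for a name wins
    return [name for name, ip in expected.items() if mapped.get(name) != ip]


HOSTS = """\
127.0.0.1      localhost
192.168.80.31  node-01   # compute node
"""

EXPECTED = {"node-01": "192.168.80.31", "node-02": "192.168.80.32"}
print(missing_hosts(HOSTS, EXPECTED))
```

On the controller you would read /etc/hosts and pass the real name-to-IP table of your compute nodes; every name returned needs a line added.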
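b. The message "failed connect to server"
This error has many possible causes, including a mistake in the configuration files. In our environment it appeared because the online package repository had been updated, which left inconsistent versions installed and broke the components; the fix was to use a local repository. Note also that noVNC requires a browser that supports WebSocket and HTML5. Google Chrome is recommended.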
10. cinder error: cannot log in to the dashboard.
The following error appears:

TypeError at /admin/
hasattr(): attribute name must be string
Request Method: GET
Request URL: http://192.168.80.21/horizon/admin/
Django Version: 1.4.5
Exception Type: TypeError
Exception Value:
hasattr(): attribute name must be string
Exception Location: /usr/lib/python2.7/dist-packages/cinderclient/client.py in __init__, line 78
Python Executable: /usr/bin/python
Python Version: 2.7.3
Python Path:
['/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../..',
 '/usr/lib/python2.7',
 '/usr/lib/python2.7/plat-linux2',
 '/usr/lib/python2.7/lib-tk',
 '/usr/lib/python2.7/lib-old',
 '/usr/lib/python2.7/lib-dynload',
 '/usr/local/lib/python2.7/dist-packages',
 '/usr/lib/python2.7/dist-packages',
 '/usr/share/openstack-dashboard/',
 '/usr/share/openstack-dashboard/openstack_dashboard']
Server time: Fri, 29 Mar 2013 12:51:09 +0000
Solution
The apache2 error log reports the following:

ERROR:django.request:Internal Server Error: /horizon/admin/
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 38, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 86, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 54, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 38, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 86, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/generic/base.py", line 48, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/generic/base.py", line 69, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/horizon/tables/views.py", line 155, in get
    handled = self.construct_tables()
  File "/usr/lib/python2.7/dist-packages/horizon/tables/views.py", line 146, in construct_tables
    handled = self.handle_table(table)
  File "/usr/lib/python2.7/dist-packages/horizon/tables/views.py", line 118, in handle_table
    data = self._get_data_dict()
  File "/usr/lib/python2.7/dist-packages/horizon/tables/views.py", line 182, in _get_data_dict
    self._data = {self.table_class._meta.name: self.get_data()}
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/dashboards/admin/overview/views.py", line 41, in get_data
    data = super(GlobalOverview, self).get_data()
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/views.py", line 34, in get_data
    self.usage.get_quotas()
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/base.py", line 115, in get_quotas
    _("Unable to retrieve quota information."))
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/base.py", line 112, in get_quotas
    self.quotas = quotas.tenant_quota_usages(self.request)
  File "/usr/lib/python2.7/dist-packages/horizon/utils/memoized.py", line 33, in __call__
    value = self.func(*args)
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/quotas.py", line 115, in tenant_quota_usages
    disabled_quotas=disabled_quotas):
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/quotas.py", line 98, in get_tenant_quota_data
    tenant_id=tenant_id)
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/quotas.py", line 80, in _get_quota_data
    quotasets.append(getattr(cinder, method_name)(request, tenant_id))
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/api/cinder.py", line 123, in tenant_quota_get
    c_client = cinderclient(request)
  File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/api/cinder.py", line 59, in cinderclient
    http_log_debug=settings.DEBUG)
  File "/usr/lib/python2.7/dist-packages/cinderclient/v1/client.py", line 69, in __init__
    cacert=cacert)
  File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 78, in __init__
    if hasattr(requests, logging):
TypeError: hasattr(): attribute name must be string

The error message shows that the attribute name passed to hasattr() at line 78 of cinderclient's client.py must be a string.
Edit the code:

# vim /usr/lib/python2.7/dist-packages/cinderclient/client.py
 78 if hasattr(requests, logging): # change to: if hasattr(requests, 'logging'):
 79 requests.logging.getLogger(requests.__name__).addHandler(ch)

Restart apache2:
/etc/init.d/apache2 restart
The dashboard now loads without errors, and creating a volume works as well.
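The bug is easy to reproduce outside of cinderclient. The sketch below uses a stand-in object instead of the real requests module (requests_stub is purely illustrative) and shows the difference between passing the logging module and passing the string 'logging' to hasattr().

```python
import logging
import types

# Stand-in for the `requests` module: an object that has a `logging` attribute.
requests_stub = types.SimpleNamespace(logging=logging)


def has_logging_buggy(module):
    # Passes the logging *module* as the attribute name, as in the faulty line 78.
    return hasattr(module, logging)


def has_logging_fixed(module):
    # Passes the attribute name as a string, as in the corrected line.
    return hasattr(module, "logging")


try:
    has_logging_buggy(requests_stub)
except TypeError as exc:
    print("buggy call raised TypeError:", exc)

print("fixed call returned:", has_logging_fixed(requests_stub))
```

The exact wording of the TypeError differs between Python versions, but the cause is the same in all of them: the second argument of hasattr() has to be a string.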
11. Unable to attach cinder volume to VM
While testing the volume service in OpenStack, attaching an LVM volume to a virtual machine instance failed. This is not actually a cinder error but an iSCSI attachment problem.
The error log from nova-compute.log on the compute node:

2012-07-24 14:33:08 TRACE nova.rpc.amqp ProcessExecutionError: Unexpected error while running command.
2012-07-24 14:33:08 TRACE nova.rpc.amqp Command: sudo nova-rootwrap iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000011 -p 192.168.0.23:3260 --rescan
2012-07-24 14:33:08 TRACE nova.rpc.amqp Exit code: 255
2012-07-24 14:33:08 TRACE nova.rpc.amqp Stdout: ''
2012-07-24 14:33:08 TRACE nova.rpc.amqp Stderr: 'iscsiadm: No portal found.\n'

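The error above means that the storage exported by the iSCSI target could not be found. Much of the OpenStack material I found says to add the following two parameters:
iscsi_ip_prefix=192.168.80 # internal network segment of the OpenStack environment
iscsi_ip_address=192.168.80.22 # internal IP of the volume host
That did not solve the problem. I later noticed that attaching failed whenever iscsi_helper=tgtadm was set in nova.conf.
Testing and reading the logs showed why: with iscsi_helper=tgtadm you must use the tgt service; if you use the iscsitarget service instead, set iscsi_helper=ietadm.
In my test environment both tgt and iscsitarget were installed and running (installing nova-common also pulls in tgt, which is easy to miss). nova.conf had iscsi_helper=tgtadm, but port 3260 was held by the iscsitarget service, so attaching failed. Choose whichever target service suits your setup and keep only one pair: tgt with iscsi_helper=tgtadm, or iscsitarget with iscsi_helper=ietadm.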
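The pairing rule can be written down as a small consistency check. This is a sketch of the reasoning only: the helper-to-service mapping comes from the text above, while check_iscsi_setup and the way the set of running services is obtained are assumptions left to the reader.

```python
# Which target service each iscsi_helper value expects, per the text above.
HELPER_TO_SERVICE = {"tgtadm": "tgt", "ietadm": "iscsitarget"}


def check_iscsi_setup(iscsi_helper, running_services):
    """Return a list of problems with the helper/service combination (empty if none)."""
    wanted = HELPER_TO_SERVICE.get(iscsi_helper)
    if wanted is None:
        return ["unknown iscsi_helper: %s" % iscsi_helper]
    problems = []
    if wanted not in running_services:
        problems.append("%s is not running but iscsi_helper=%s needs it"
                        % (wanted, iscsi_helper))
    for other in HELPER_TO_SERVICE.values():
        if other != wanted and other in running_services:
            problems.append("%s is also running and may hold port 3260" % other)
    return problems


# The situation from my test environment: both services running, helper set to tgtadm.
print(check_iscsi_setup("tgtadm", {"tgt", "iscsitarget"}))
```

An empty list means the helper and the running target service agree; anything else names the service to stop or the setting to change.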
12. glance index error:

Authorization Failed: Unable to communicate with identity service: {"error": {"message": "An unexpected error prevented the server from fulfilling your request. Command 'openssl' returned non-zero exit status 3", "code": 500, "title": "Internal Server Error"}}. (HTTP 500)

I got the error above in Grizzly when testing glance index. It says that glance failed to authenticate against keystone. The keystone log shows:

2013-03-04 12:40:58 ERROR Signing error: Error opening signer certificate /etc/keystone/ssl/certs/signing_cert.pem
139803495638688:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('/etc/keystone/ssl/certs/signing_cert.pem','r')
139803495638688:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
unable to load certificate
2013-03-04 12:40:58 ERROR Command 'openssl' returned non-zero exit status 3
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/keystone/common/wsgi.py", line 231, in __call__
    result = method(context, **params)
  File "/usr/lib/python2.7/dist-packages/keystone/token/controllers.py", line 118, in authenticate
    CONF.signing.keyfile)
  File "/usr/lib/python2.7/dist-packages/keystone/common/cms.py", line 140, in cms_sign_token
    output = cms_sign_text(text, signing_cert_file_name, signing_key_file_name)
  File "/usr/lib/python2.7/dist-packages/keystone/common/cms.py", line 135, in cms_sign_text
    raise subprocess.CalledProcessError(retcode, "openssl")
CalledProcessError: Command 'openssl' returned non-zero exit status 3

In Grizzly, keystone's default token format is PKI, which requires signing certificates; earlier releases used UUID. Change keystone.conf:
token_format = UUID
Try again and the error is gone.
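13. Building images
The focus here is on building Windows images, which is more troublesome because drivers have to be loaded.
Download the virtio drivers first: Windows does not support virtio by default, but virtual machines managed through OpenStack need it. Two virtio driver images are required, one for the disk and one for the network card: virtio-win-1.1.16.vfd and virtio-win-0.1-30.iso respectively. Two steps deserve attention: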
1. Create the image:

kvm -m 512 -boot d -drive file=win2003server.img,cache=writeback,if=virtio,boot=on -fda virtio-win-1.1.16.vfd -cdrom windows2003_x64.iso -vnc :10

2. Boot the system:

kvm -m 1024 -drive file=win2003server.img,if=virtio,boot=on -cdrom virtio-win-0.1-30.iso -net nic,model=virtio -net user -boot c -nographic -vnc :8

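The points to watch are if=virtio,boot=on -fda virtio-win-1.1.16.vfd in the first command and virtio-win-0.1-30.iso in the boot command: these are the disk driver and the network card driver respectively. Without the disk driver, the installer cannot find the hard disk; without the network driver, instances created from the finished image have no working network card. That is why, after the image has been installed, you need to boot it again and update the network card driver to virtio.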
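14. Deleting zombie volumes
If the cinder service is unhealthy, creating volumes can leave zombie volumes behind. If they cannot be deleted in horizon, they have to be removed manually on the server.
Command: lvremove /dev/nova-volumes/volume-000002
Be sure to give the full path here, otherwise the deletion fails. If the command reports
"Can't remove open logical volume", try stopping the related services and then delete again. Afterwards, also remove the corresponding records from the volumes table of the cinder database.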
