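Summary of Common OpenStack Errors
The issues below were mostly encountered during installation and deployment. Because components differ between OpenStack releases, the installation procedure also differs completely from one release to the next. After testing, I can now deploy both Essex and Grizzly successfully. The release in between, Folsom, I never got working and did not spend much time on: the quantum component in Folsom was still immature and had many network connectivity problems, there were few success stories online, and most people ran Folsom with nova-network instead. Quantum only became reasonably stable and usable in Grizzly; I spent a lot of time studying it and can now deploy a multi-node environment successfully. The problems below cover both Essex and Grizzly. There is little material on this topic on Chinese sites, and much of what I found came from sites abroad. In many cases the error message in the log is identical while the underlying cause is not, so you have to think carefully about how the pieces work in order to pinpoint the problem. Errors are nothing to be afraid of; tracking them down deepens our understanding of the system, which is a good thing.
As for installation and deployment, there are automated one-click deployment tools such as devstack and onestack. If you are a beginner, I do not recommend them: you learn nothing that way. If everything works, congratulations for now; but once an error appears somewhere along the way you will likely be completely lost, with no idea what went wrong, and later maintenance is also quite difficult. You may well spend more time troubleshooting, because you do not know which steps were performed or which settings were needed. Most of these tools are meant for quickly setting up a development environment; a real production environment still has to be built step by step, which also lets you locate and troubleshoot problems quickly.
This article only summarizes some error messages seen during deployment and gives fixes. I hit these cases in my own environment and resolved them there; details may differ in other environments, so treat this as a reference only.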
1. Check whether the services are healthy:
    root@control:~# nova-manage service list
    Binary            Host     Zone      Status   State  Updated_At
    nova-cert         control  internal  enabled  :-)    2013-04-26 02:29:44
    nova-conductor    control  internal  enabled  :-)    2013-04-26 02:29:42
    nova-consoleauth  control  internal  enabled  :-)    2013-04-26 02:29:44
    nova-scheduler    control  internal  enabled  :-)    2013-04-26 02:29:47
    nova-compute      node-01  nova      enabled  :-)    2013-04-26 02:29:46
    nova-compute      node-02  nova      enabled  :-)    2013-04-26 02:29:46
    nova-compute      node-03  nova      enabled  :-)    2013-04-26 02:29:42
If every service shows a smiley, nova's services are healthy. If a service shows XXX, check that service's log under /var/log/nova/; the log usually reveals the cause.
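The check above can be automated. The following is a minimal Python sketch, not part of nova, that scans the text printed by nova-manage service list and reports every service whose State column is XXX. The function name down_services and the sample listing are made up for illustration, and the sketch assumes the column order shown above (State in the fifth column).

```python
def down_services(listing):
    """Return (binary, host) pairs whose State column is XXX.

    Assumes the column order shown above:
    Binary Host Zone Status State Updated_At
    """
    down = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the shell prompt line and the header row.
        if len(fields) < 5 or not fields[0].startswith("nova-"):
            continue
        if fields[4] == "XXX":
            down.append((fields[0], fields[1]))
    return down


# Illustrative listing; the XXX row is invented for the example.
sample = """\
Binary Host Zone Status State Updated_At
nova-scheduler control internal enabled :-) 2013-04-26 02:29:47
nova-compute node-01 nova enabled XXX 2013-04-26 02:10:03
"""
print(down_services(sample))
```

In practice the listing would come from running the command itself, for example through subprocess, and each reported pair tells you which log under /var/log/nova/ to read first.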
2. libvirt error
    python2.7/dist-packages/nova/virt/libvirt/connection.py", line 338, in _connect
    2013-03-09 17:05:42 TRACE nova return libvirt.openAuth(uri, auth, 0)
    2013-03-09 17:05:42 TRACE nova File "/usr/lib/python2.7/dist-packages/libvirt.py", line 102, in openAuth
    2013-03-09 17:05:42 TRACE nova if ret is None:raise libvirtError('virConnectOpenAuth() failed')
    2013-03-09 17:05:42 TRACE nova libvirtError: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory
    2013-03-09 22:05:41.909+0000: 12466: info : libvirt version: 0.9.8
    2013-03-09 22:05:41.909+0000: 12466: error : virNetServerMDNSStart:460 : internal error Failed to create mDNS client: Daemon not running
Solution:
When this error appears, first check /var/log/libvirt/libvirtd.log. The log will show: libvirt-bin service will not start without dbus installed.
Next run ps -ea | grep dbus to confirm that dbus is running, then run apt-get install lxc.
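Before and after applying the fix, it can help to check whether the socket named in the error message actually exists. The snippet below is a small Python sketch of such a check; the helper name socket_ready is hypothetical, and the default path is simply the one quoted in the traceback above.

```python
import os
import stat


def socket_ready(path="/var/run/libvirt/libvirt-sock"):
    """Return True if path exists and is a Unix domain socket."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file or unreadable directory: libvirtd is probably not running.
        return False
    return stat.S_ISSOCK(mode)


print("libvirt socket present:", socket_ready())
```

A False result matches the "No such file or directory" message in the log and suggests that libvirtd itself did not start.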
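3. Failed to add image
    Error:
    Failed to add image. Got error: The request returned 500 Internal Server Error
Solution:
This is an environment variable problem. Set the variables by adding the following to /etc/profile (export makes them visible to the command-line clients):
    export OS_AUTH_KEY="openstack"
    export OS_AUTH_URL="http://localhost:5000/v2.0/"
    export OS_PASSWORD="openstack"
    export OS_TENANT_NAME="admin"
    export OS_USERNAME="admin"
Then run source /etc/profile. You can also set the variables without putting them in profile, but then they only last for the current session and have to be set again after every reboot, so writing them to profile saves a lot of trouble.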
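To confirm that the variables really reached the current environment, a quick check along these lines can be used. This is a sketch only: the helper name missing_vars is hypothetical, and the list of names is copied from the settings above rather than from any client's documentation.

```python
import os

# Variable names taken from the /etc/profile settings listed above.
REQUIRED = ("OS_AUTH_KEY", "OS_AUTH_URL", "OS_PASSWORD",
            "OS_TENANT_NAME", "OS_USERNAME")


def missing_vars(env=None):
    """Return the names from REQUIRED that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_vars({"OS_USERNAME": "admin", "OS_PASSWORD": "openstack"}))
```

Calling missing_vars() with no argument inspects the real environment; an empty list means all five names are set.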
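4. How zombie instances arise
Zombie instances usually result from shutting down nova or the underlying virtual machine improperly, or from an instance in an error state that cannot be deleted. Use virsh list to check whether the underlying VM is still running; if it is, stop it, then remove the record directly from the database.
    Nova instance not found
    Local file storage of the image files.
    Error:
    2013-03-09 17:58:08 TRACE nova raise exception.InstanceNotFound(instance_id=instance_name)
    2013-03-09 17:58:08 TRACE nova InstanceNotFound: Instance instance-00000002 could not be found.
    2013-03-09 17:58:08 TRACE nova
Solution:
Delete the zombie instance from the database, or drop the database and recreate it: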
a. Drop the database:
    $ mysql -u root -p
    DROP DATABASE nova;
    Recreate the DB:
    CREATE DATABASE nova; (strip formatting if you copy and paste any of this)
    GRANT ALL PRIVILEGES ON nova.* TO 'novadbadmin'@'%' IDENTIFIED BY '<password>';
    Quit

    Resync DB (for example with nova-manage db sync)
b. Delete the instance from the database:
    #!/bin/bash
    # Usage: sh delete_instance.sh <instance uuid>
    mysql -uroot -pmysql <<_ESXU_
    use nova;
    DELETE a FROM nova.security_group_instance_association
    AS a INNER JOIN nova.instances AS b
    ON a.instance_uuid=b.id where b.uuid='$1';
    DELETE FROM nova.instance_info_caches WHERE instance_uuid='$1';
    DELETE FROM nova.instances WHERE uuid='$1';
    _ESXU_
Save the script above as delete_instance.sh, then run sh delete_instance.sh instance_id;
where instance_id can be found with nova list.
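The script above pastes its argument straight into SQL, so a mistyped or malformed ID goes to the database unchecked. One cautious option is to validate the ID first. The sketch below uses Python's standard uuid module for that; the helper name checked_uuid and the sample values are invented for illustration, and the sketch assumes the ID passed to the script is a UUID as the uuid columns in the queries suggest.

```python
import uuid


def checked_uuid(text):
    """Return text in canonical UUID form, or raise ValueError."""
    return str(uuid.UUID(text))


# Made-up inputs: one well-formed ID and one string that must be refused.
for candidate in ("0f0f0f0f-1111-2222-3333-444444444444", "x' OR '1'='1"):
    try:
        print("ok:", checked_uuid(candidate))
    except ValueError:
        print("rejected:", candidate)
```

Only a value that passes this check would then be handed to the delete script.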
5. Keystone NoHandlers
    Error
    root@openstack-dev-r910:/home/brent/openstack# ./keystone_data.sh
    No handlers could be found for logger "keystoneclient.client"
    Unable to authorize user
    No handlers could be found for logger "keystoneclient.client"
    Unable to authorize user
    No handlers could be found for logger "keystoneclient.client"
    Unable to authorize user
Solution:
This error is usually caused by a mistake in keystone_data.sh: the admin_token in the script must be identical to the one in /etc/keystone/keystone.conf. Then confirm that keystone.conf contains the following settings:
    driver = keystone.catalog.backends.templated.TemplatedCatalog
    template_file = /etc/keystone/default_catalog.templates
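The token comparison can be done by eye, but a small script avoids overlooking a stray character. The following Python sketch reads admin_token out of keystone.conf text with the standard configparser module. The helper name admin_token, the sample file content and the token value ADMIN are all made up, and placing admin_token in the [DEFAULT] section is an assumption about the file layout.

```python
import configparser


def admin_token(conf_text):
    """Return admin_token from the [DEFAULT] section, or None if absent."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.defaults().get("admin_token")


# Illustrative content; a real run would read /etc/keystone/keystone.conf.
sample_conf = """\
[DEFAULT]
admin_token = ADMIN
"""
script_token = "ADMIN"  # value assumed to be set in keystone_data.sh
print(admin_token(sample_conf) == script_token)
```

If the comparison is False, fix whichever side is wrong before running keystone_data.sh again.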
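6. Wipe the installed components and reinstall:
    #!/bin/bash
    # Drop the OpenStack databases (MySQL root password here: openstack).
    mysql -uroot -popenstack -e "drop database nova;"
    mysql -uroot -popenstack -e "drop database glance;"
    mysql -uroot -popenstack -e "drop database keystone;"
    # Remove the nova packages; the trailing backslashes continue the command.
    apt-get purge nova-api nova-cert nova-common nova-compute \
        nova-compute-kvm nova-doc nova-network nova-objectstore \
        nova-scheduler nova-vncproxy nova-volume python-nova python-novaclient
    apt-get autoremove
    rm -rf /var/lib/glance
    rm -rf /var/lib/keystone/
    rm -rf /var/lib/nova/
    # Caution: this deletes all MySQL data, not only the OpenStack databases.
    rm -rf /var/lib/mysql
Running the script above uninstalls the installed components and clears the databases, which saves you the trouble of reinstalling the operating system.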
7. Access denied for user 'keystone'@'localhost' (using password: YES)
    # keystone-manage db_sync
    File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 187, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
    sqlalchemy.exc.OperationalError: (OperationalError) (1045, "Access denied for user 'keystone'@'openstack1' (using password: YES)") None None
Solution:
Check whether the database connection string in keystone.conf is wrong. A correct one looks like this:
    connection = mysql://keystone:openstack@localhost:3306/keystone
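When checking the connection string, it helps to split it into its parts and compare each one with the MySQL account that was actually created. The sketch below does this with urlsplit from Python's standard library; the helper name describe_connection is hypothetical.

```python
from urllib.parse import urlsplit


def describe_connection(url):
    """Split a connection URL of the form shown above into its parts."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


info = describe_connection("mysql://keystone:openstack@localhost:3306/keystone")
print(info["user"], info["host"], info["port"], info["database"])
```

The user, password and host printed here have to match a grant in MySQL; the error above names the user and client host that MySQL saw, which is the pair to compare against.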
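8. nova-compute going down and its relation to time synchronization
Quite often nova-compute turns out to be down or misbehaving, and nova-manage shows its state as XXX.
The usual cause is that the clock on the nova-compute host differs from the clock on the controller. nova-compute periodically writes an update time into the services table in the database, and that timestamp comes from the nova-compute host's clock.
The controller checks whether nova-compute is alive by subtracting nova-compute's update time from the controller's own time; if the difference exceeds a threshold (the exact value is in the code; I believe it is 15 seconds), nova-compute is judged to be down.
At that point nova-manage shows nova-compute as XXX, and if you create a virtual machine, nova-scheduler.log reports that no valid host was found. The same applies to the other service nodes; this is how nova's heartbeat mechanism works. Time synchronization between all nodes in a nova environment is therefore very important. Make sure the clocks are in sync!
If the nova-compute state in the dashboard keeps flipping between red and green, synchronize the clocks strictly, or find the code and increase the 15-second threshold mentioned above.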
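The effect of clock skew on the heartbeat check described above can be shown with a few lines of Python. This is a simplified model, not nova's code: the function name is_up and the timestamps are invented, the 15-second default is the value recalled in the text rather than a verified setting, and treating skew in either direction as stale is an assumption of the sketch.

```python
from datetime import datetime, timedelta


def is_up(last_update, controller_now, down_time=15):
    """Model of the check above: alive if the last update is recent enough."""
    elapsed = (controller_now - last_update).total_seconds()
    # abs() makes a clock that runs ahead count as stale too (assumption).
    return abs(elapsed) <= down_time


controller_now = datetime(2013, 4, 26, 2, 30, 0)
# Heartbeat written 5 seconds ago by a host whose clock is in sync.
heartbeat_in_sync = controller_now - timedelta(seconds=5)
# The same heartbeat, stamped by a host whose clock runs 60 seconds slow.
heartbeat_skewed = heartbeat_in_sync - timedelta(seconds=60)

print(is_up(heartbeat_in_sync, controller_now))
print(is_up(heartbeat_skewed, controller_now))
```

The service is equally healthy in both cases; only the timestamp differs, which is why the state can flip between up and down when the clocks drift around the threshold.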
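9. noVNC cannot connect to an instance
noVNC causes quite a few problems, and there are many configuration guides for it online. The configuration itself is not complicated: there are only four parameters, and with those set correctly there is usually no serious trouble. Even so, I ran into a number of problems during installation.
a. The message "Connection Refused"
The controller node may be unable to resolve the compute node's hostname when it receives the VNC request, so it cannot establish a connection to the instance on the compute node.
Another possibility is that the current browser is unsupported or cannot reach the service. For the name resolution problem, add the compute nodes' IP-to-hostname mappings to /etc/hosts on the controller node.
b. The message "failed connect to server"
This error has many possible causes, including mistakes in the configuration files. In our environment it occurred because the network package repository had been updated, which left mismatched versions installed and made the components unusable; the fix was to use a local repository. Note also that noVNC requires a browser that supports WebSocket and HTML5; Google Chrome is recommended.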
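For the name resolution problem in item a, the controller's hosts file can be checked against the list of compute nodes. The sketch below is one way to do that in Python; the helper name missing_hosts is hypothetical, and the addresses and host names in the sample are invented for the example.

```python
def missing_hosts(hosts_text, names):
    """Return the names that have no entry in the given /etc/hosts content."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop comments, then split into "address name [alias ...]".
        fields = line.split("#", 1)[0].split()
        known.update(fields[1:])
    return [name for name in names if name not in known]


# Illustrative content; a real run would read /etc/hosts on the controller.
hosts = """\
127.0.0.1 localhost
192.168.80.21 control
192.168.80.31 node-01  # compute node
"""
print(missing_hosts(hosts, ["node-01", "node-02", "node-03"]))
```

Every name the function returns needs an IP-to-hostname line added on the controller node.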
10. cinder error: cannot log in to the dashboard
The following error appears:
    TypeError at /admin/
    hasattr(): attribute name must be string
    Request Method: GET
    Request URL: http://192.168.80.21/horizon/admin/
    Django Version: 1.4.5
    Exception Type: TypeError
    Exception Value:
    hasattr(): attribute name must be string
    Exception Location: /usr/lib/python2.7/dist-packages/cinderclient/client.py in __init__, line 78
    Python Executable: /usr/bin/python
    Python Version: 2.7.3
    Python Path:
    ['/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../..',
     '/usr/lib/python2.7',
     '/usr/lib/python2.7/plat-linux2',
     '/usr/lib/python2.7/lib-tk',
     '/usr/lib/python2.7/lib-old',
     '/usr/lib/python2.7/lib-dynload',
     '/usr/local/lib/python2.7/dist-packages',
     '/usr/lib/python2.7/dist-packages',
     '/usr/share/openstack-dashboard/',
     '/usr/share/openstack-dashboard/openstack_dashboard']
    Server time: Fri, 29 Mar 2013 12:51:09 +0000
Solution:
Check the apache2 error log, which reports the following:
    ERROR:django.request:Internal Server Error: /horizon/admin/
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
        response = callback(request, *callback_args, **callback_kwargs)
      File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 38, in dec
        return view_func(request, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 86, in dec
        return view_func(request, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 54, in dec
        return view_func(request, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 38, in dec
        return view_func(request, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/horizon/decorators.py", line 86, in dec
        return view_func(request, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/django/views/generic/base.py", line 48, in view
        return self.dispatch(request, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/django/views/generic/base.py", line 69, in dispatch
        return handler(request, *args, **kwargs)
      File "/usr/lib/python2.7/dist-packages/horizon/tables/views.py", line 155, in get
        handled = self.construct_tables()
      File "/usr/lib/python2.7/dist-packages/horizon/tables/views.py", line 146, in construct_tables
        handled = self.handle_table(table)
      File "/usr/lib/python2.7/dist-packages/horizon/tables/views.py", line 118, in handle_table
        data = self._get_data_dict()
      File "/usr/lib/python2.7/dist-packages/horizon/tables/views.py", line 182, in _get_data_dict
        self._data = {self.table_class._meta.name: self.get_data()}
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/dashboards/admin/overview/views.py", line 41, in get_data
        data = super(GlobalOverview, self).get_data()
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/views.py", line 34, in get_data
        self.usage.get_quotas()
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/base.py", line 115, in get_quotas
        _("Unable to retrieve quota information."))
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/base.py", line 112, in get_quotas
        self.quotas = quotas.tenant_quota_usages(self.request)
      File "/usr/lib/python2.7/dist-packages/horizon/utils/memoized.py", line 33, in __call__
        value = self.func(*args)
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/quotas.py", line 115, in tenant_quota_usages
        disabled_quotas=disabled_quotas):
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/quotas.py", line 98, in get_tenant_quota_data
        tenant_id=tenant_id)
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/usage/quotas.py", line 80, in _get_quota_data
        quotasets.append(getattr(cinder, method_name)(request, tenant_id))
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/api/cinder.py", line 123, in tenant_quota_get
        c_client = cinderclient(request)
      File "/usr/share/openstack-dashboard/openstack_dashboard/wsgi/../../openstack_dashboard/api/cinder.py", line 59, in cinderclient
        http_log_debug=settings.DEBUG)
      File "/usr/lib/python2.7/dist-packages/cinderclient/v1/client.py", line 69, in __init__
        cacert=cacert)
      File "/usr/lib/python2.7/dist-packages/cinderclient/client.py", line 78, in __init__
        if hasattr(requests, logging):
    TypeError: hasattr(): attribute name must be string
The error message points out that on line 78 of Cinderclient's client.py, the attribute name passed to hasattr() must be a string.
Modify the code:
    # vim /usr/lib/python2.7/dist-packages/cinderclient/client.py
    78 if hasattr(requests, logging): # change to: if hasattr(requests, 'logging'):
    79     requests.logging.getLogger(requests.__name__).addHandler(ch)
Restart apache2:
    /etc/init.d/apache2 restart
This time the dashboard loads without errors, and creating a volume works as well.
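The difference between the two forms of the call can be reproduced outside of cinderclient. The sketch below is written for Python 3 (the traceback above is from Python 2.7, which raises the same kind of error) and uses a stand-in object named fake_requests instead of the real requests module, so nothing here is cinderclient code.

```python
import logging
import types

# Stand-in for the `requests` module; only the attribute lookup matters here.
fake_requests = types.SimpleNamespace(logging=logging)

try:
    # Buggy form: the logging module object is passed as the attribute name.
    hasattr(fake_requests, logging)
    buggy_raises = False
except TypeError:
    buggy_raises = True

# Fixed form: the attribute name is given as a string.
fixed_result = hasattr(fake_requests, "logging")

print(buggy_raises, fixed_result)
```

hasattr() needs the name of the attribute, not the object it might refer to, which is why quoting 'logging' is the whole fix.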
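11. Unable to attach cinder volume to VM
While testing the volume service in OpenStack, attaching an LVM volume to a VM instance failed. This is actually not a cinder error but an iSCSI attachment problem.
Below is the error from nova-compute.log on the compute node:
    2012-07-24 14:33:08 TRACE nova.rpc.amqp ProcessExecutionError: Unexpected error while running command.
    2012-07-24 14:33:08 TRACE nova.rpc.amqp Command: sudo nova-rootwrap iscsiadm -m node -T iqn.2010-10.org.openstack:volume-00000011 -p 192.168.0.23:3260 --rescan
    2012-07-24 14:33:08 TRACE nova.rpc.amqp Exit code: 255
    2012-07-24 14:33:08 TRACE nova.rpc.amqp Stdout: ''
    2012-07-24 14:33:08 TRACE nova.rpc.amqp Stderr: 'iscsiadm: No portal found.\n'
The error above means the storage exported by the iSCSI target was not found. A lot of OpenStack material says to add the following two parameters:
    iscsi_ip_prefix=192.168.80 # internal network prefix of the OpenStack environment
    iscsi_ip_address=192.168.80.22 # internal IP of the volume host
That did not solve the problem, however. I later found that the attachment failed whenever iscsi_helper=tgtadm was set in nova.conf.
Testing along these lines and reading the logs showed that iscsi_helper=tgtadm requires the tgt service, whereas the iscsitarget service must be paired with iscsi_helper=ietadm.
The problem in my test environment was that both tgt and iscsitarget were installed and running (installing nova-common also pulls in tgt, which is easy to miss). With iscsi_helper=tgtadm set in nova.conf, port 3260 turned out to be held by the iscsitarget service, so the attachment failed. Choose whichever target service suits your setup, and keep only one pairing: either tgt with iscsi_helper=tgtadm, or iscsitarget with iscsi_helper=ietadm.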
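The pairing rule above is easy to encode as a consistency check. The following Python sketch is illustrative only: the function name check_pairing and the message texts are made up, the set of running services is supplied by hand rather than detected, and the two pairings are exactly the ones stated in the text.

```python
# Pairings stated in the text above.
HELPER_TO_SERVICE = {"tgtadm": "tgt", "ietadm": "iscsitarget"}


def check_pairing(iscsi_helper, running_services):
    """Return a list of problems for the helper/target combination."""
    expected = HELPER_TO_SERVICE.get(iscsi_helper)
    if expected is None:
        return ["unknown iscsi_helper: %s" % iscsi_helper]
    problems = []
    if expected not in running_services:
        problems.append("%s needs the %s service" % (iscsi_helper, expected))
    for other in HELPER_TO_SERVICE.values():
        if other != expected and other in running_services:
            problems.append("%s is also running and may hold port 3260" % other)
    return problems


# The situation described above: both target services are running.
print(check_pairing("tgtadm", {"tgt", "iscsitarget"}))
# A clean setup: one helper, one matching service.
print(check_pairing("ietadm", {"iscsitarget"}))
```

An empty list means the helper named in nova.conf and the running target service agree.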
12. glance index fails:
    Authorization Failed: Unable to communicate with identity service: {"error": {"message": "An unexpected error prevented the server from fulfilling your request. Command 'openssl' returned non-zero exit status 3", "code": 500, "title": "Internal Server Error"}}. (HTTP 500)
I hit this error on Grizzly while testing glance index. The message says that glance could not authenticate against keystone. The keystone log shows the following:
    2013-03-04 12:40:58 ERROR Signing error: Error opening signer certificate /etc/keystone/ssl/certs/signing_cert.pem
    139803495638688:error:02001002:system library:fopen:No such file or directory:bss_file.c:398:fopen('/etc/keystone/ssl/certs/signing_cert.pem','r')
    139803495638688:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
    unable to load certificate

    2013-03-04 12:40:58 ERROR Command 'openssl' returned non-zero exit status 3
    Traceback (most recent call last):
      File "/usr/lib/python2.7/dist-packages/keystone/common/wsgi.py", line 231, in __call__
        result = method(context, **params)
      File "/usr/lib/python2.7/dist-packages/keystone/token/controllers.py", line 118, in authenticate
        CONF.signing.keyfile)
      File "/usr/lib/python2.7/dist-packages/keystone/common/cms.py", line 140, in cms_sign_token
        output = cms_sign_text(text, signing_cert_file_name, signing_key_file_name)
      File "/usr/lib/python2.7/dist-packages/keystone/common/cms.py", line 135, in cms_sign_text
        raise subprocess.CalledProcessError(retcode, "openssl")
    CalledProcessError: Command 'openssl' returned non-zero exit status 3
In Grizzly, keystone's default token format is PKI, which requires signing certificates; earlier releases used UUID. Change keystone.conf:
    token_format = UUID
Try again and the error is gone.
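13. Building images
The emphasis here is on building Windows images, which is more troublesome because drivers have to be loaded.
Download the virtio drivers: Windows does not support virtio out of the box, yet virtual machines managed through OpenStack need it. Two virtio driver packages are required, one for the disk and one for the network card: virtio-win-0.1-30.iso and virtio-win-1.1.16.vfd. Two points deserve emphasis:
1. Create the image:
    kvm -m 512 -boot d -drive file=win2003server.img,cache=writeback,if=virtio,boot=on -fda virtio-win-1.1.16.vfd -cdrom windows2003_x64.iso -vnc :10
2. Boot the system:
    kvm -m 1024 -drive file=win2003server.img,if=virtio,boot=on -cdrom virtio-win-0.1-30.iso -net nic,model=virtio -net user -boot c -nographic -vnc :8
The points to note are if=virtio,boot=on -fda virtio-win-1.1.16.vfd and the virtio-win-0.1-30.iso used when booting the system: these two are the disk driver and the network card driver respectively. Without them, the installer cannot find the hard disk, and instances created from the finished image cannot find a network card driver. So after the installation finishes, boot the image again and update the network card driver to virtio.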
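14. Deleting zombie volumes
If the cinder service is not working properly, creating volumes can leave zombie volumes behind. If they cannot be deleted in horizon, remove them manually on the server.
Command: lvremove /dev/nova-volumes/volume-000002
Be sure to give the full path here, otherwise the deletion fails. If the deletion reports
"Can't remove open logical volume", try stopping the related services and then delete again. Afterwards, also clear the corresponding records from the volumes table in the cinder database.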