[Experience sharing] [Repost] KVM/QEMU hypervisor driver

Posted on 2015-4-10 13:58:18
KVM/QEMU hypervisor driver


  • Project Links
  • Deployment pre-requisites
  • Connections to QEMU driver
  • Driver security architecture

    • Driver instances
    • POSIX users/groups
    • Linux process capabilities
    • SELinux basic confinement
    • SELinux sVirt confinement
    • AppArmor sVirt confinement
    • Cgroups device ACLs


  • Import and export of libvirt domain XML configs

    • Converting from QEMU args to domain XML
    • Converting from domain XML to QEMU args


  • Pass-through of arbitrary qemu commands
  • Example domain XML config
  The libvirt KVM/QEMU driver can manage any QEMU emulator from version 0.8.1 or later. It can also manage Xenner, which provides the same QEMU command line syntax and monitor interaction.

Project Links


  • The KVM Linux hypervisor
  • The QEMU emulator

Deployment pre-requisites


  • QEMU emulators: The driver will probe /usr/bin for the presence of qemu, qemu-system-x86_64, qemu-system-microblaze, qemu-system-microblazeel, qemu-system-mips, qemu-system-mipsel, qemu-system-sparc and qemu-system-ppc. The results of this probe can be seen in the capabilities XML output.
  • KVM hypervisor: The driver will probe /usr/bin for the presence of qemu-kvm and for the /dev/kvm device node. If both are found, then fully virtualized, hardware accelerated KVM guests will be available.
  • Xenner hypervisor: The driver will probe /usr/bin for the presence of xenner and for the /dev/kvm device node. If both are found, then Xen paravirtualized guests can be run using the KVM hardware acceleration.

Connections to QEMU driver
  The libvirt QEMU driver is a multi-instance driver, providing a single system wide privileged driver (the "system" instance), and per-user unprivileged drivers (the "session" instance). The URI driver protocol is "qemu". Some example connection URIs for the libvirt driver are:

qemu:///session                      (local access to per-user instance)
qemu+unix:///session                 (local access to per-user instance)
qemu:///system                       (local access to system instance)
qemu+unix:///system                  (local access to system instance)
qemu://example.com/system            (remote access, TLS/x509)
qemu+tcp://example.com/system        (remote access, SASL/Kerberos)
qemu+ssh://root@example.com/system   (remote access, SSH tunnelled)
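
As a rough illustration of how the URI forms above break down, the following shell sketch splits a connection URI into transport, host and instance. The function name describe_qemu_uri is hypothetical and not part of libvirt; mapping a bare qemu:// scheme to "unix" for local URIs and "tls" for remote ones simply follows the annotations in the list above.

```shell
# Classify a libvirt QEMU connection URI (hypothetical helper, not a libvirt tool).
# Prints: <transport> <host or "local"> <instance>
describe_qemu_uri() {
    uri=$1
    scheme=${uri%%://*}      # e.g. "qemu" or "qemu+ssh"
    rest=${uri#*://}         # e.g. "/session" or "root@example.com/system"
    host=${rest%%/*}         # empty for local URIs
    instance=${rest#*/}      # "system" or "session"
    case $scheme in
        qemu)
            # No explicit transport: local socket if no host, TLS otherwise.
            if [ -z "$host" ]; then transport=unix; else transport=tls; fi
            ;;
        qemu+*)
            transport=${scheme#qemu+}
            ;;
        *)
            echo "not a qemu URI: $uri" >&2
            return 1
            ;;
    esac
    printf '%s %s %s\n' "$transport" "${host:-local}" "$instance"
}
```

Any of the URIs in the list can be handed to virsh through its -c option, for example virsh -c qemu:///system list --all.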

Driver security architecture
  There are multiple layers to security in the QEMU driver, allowing for flexibility in the use of QEMU based virtual machines.

Driver instances
  As explained above there are two ways to access the QEMU driver in libvirt. The "qemu:///session" family of URIs connect to a libvirtd instance running as the same user/group ID as the client application. Thus the QEMU instances spawned from this driver will share the same privileges as the client application. The intended use case for this driver is desktop virtualization, with virtual machines storing their disk images in the user's home directory and being managed from the local desktop login session.
  The "qemu:///system" family of URIs connect to a libvirtd instance running as the privileged system account 'root'. Thus the QEMU instances spawned from this driver may have much higher privileges than the client application managing them. The intended use case for this driver is server virtualization, where the virtual machines may need to be connected to host resources (block, PCI, USB, network devices) whose access requires elevated privileges.

POSIX users/groups
  In the "session" instance, the POSIX users/groups model restricts QEMU virtual machines (and libvirtd in general) to only have access to resources with the same user/group ID as the client application. There is no finer level of configuration possible for the "session" instances.
  In the "system" instance, libvirt releases from 0.7.0 onwards allow control over the user/group that the QEMU virtual machines are run as. A build of libvirt with no configuration parameters set will still run QEMU processes as root:root. It is possible to change this default by using the --with-qemu-user=$USERNAME and --with-qemu-group=$GROUPNAME arguments to 'configure' during build. It is strongly recommended that vendors build with both of these arguments set to 'qemu'. Regardless of this build time default, administrators can set a per-host default setting in the /etc/libvirt/qemu.conf configuration file via the user=$USERNAME and group=$GROUPNAME parameters. When a non-root user or group is configured, the libvirt QEMU driver will change uid/gid to match immediately before executing the QEMU binary for a virtual machine.
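
To check which account a host is configured to use, a small helper along the following lines can read the setting back. This is only a sketch: the function name qemu_conf_get is hypothetical, and it assumes each setting is written on its own line in the form key = "value", with commented-out lines starting with '#'.

```shell
# Print the value of a `key = "value"` setting from a qemu.conf-style file.
# Lines starting with '#' are ignored; nothing is printed if the key is unset.
# (Hypothetical helper; the default path is the file named in the text.)
qemu_conf_get() {
    key=$1
    file=${2:-/etc/libvirt/qemu.conf}
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*\$/\1/p" "$file" | tail -n 1
}
```

Running qemu_conf_get user and qemu_conf_get group on a host would then show the per-host override, if any; empty output means the build time default applies.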
  If QEMU virtual machines from the "system" instance are being run as non-root, there will be greater restrictions on what host resources the QEMU process will be able to access. The libvirtd daemon will attempt to manage permissions on resources to minimise the likelihood of unintentional security denials, but the administrator / application developer must be aware of some of the consequences / restrictions.


  •   The directories /var/run/libvirt/qemu/, /var/lib/libvirt/qemu/ and /var/cache/libvirt/qemu/ must all have their ownership set to match the user / group ID that QEMU guests will be run as. If the vendor has set a non-root user/group for the QEMU driver at build time, the permissions should be set automatically at install time. If a host administrator customizes user/group in /etc/libvirt/qemu.conf, they will need to manually set the ownership on these directories.

  •   When attaching USB and PCI devices to a QEMU guest, QEMU will need to access files in /dev/bus/usb and /sys/bus/pci/devices respectively. The libvirtd daemon will automatically set the ownership on specific devices that are assigned to a guest at start time. There should not be any need for administrator changes in this respect.

  •   Any files/devices used as guest disk images must be accessible to the user/group ID that QEMU guests are configured to run as. The libvirtd daemon will automatically set the ownership of the file/device path to the correct user/group ID. Applications / administrators must be aware though that the parent directory permissions may still deny access. The directories containing disk images must either have their ownership set to match the user/group configured for QEMU, or their UNIX file permissions must have the 'execute/search' bit enabled for 'others'.
      The simplest option is the latter: enabling the 'execute/search' bit. For any directory used to store disk images, this can be achieved by running the following command on the directory itself and on each of its parent directories:

    chmod o+x /path/to/directory

      In particular note that if using the "system" instance and attempting to store disk images in a user home directory, the default permissions on $HOME are typically too restrictive to allow access.
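
Because every parent directory matters, it can help to list the ones that would still block access. The sketch below walks from a directory up to / and prints each directory lacking the 'execute/search' bit for 'others'. The function name list_unsearchable_dirs is hypothetical, and the check relies on the tenth character of the ls -ld mode string.

```shell
# Walk from a directory up to /, printing every directory that lacks the
# 'execute/search' bit for 'others' (hypothetical helper).
list_unsearchable_dirs() {
    dir=$(cd "$1" && pwd -P) || return 1
    while :; do
        # ls -ld starts with a mode string such as "drwxr-x--x";
        # character 10 is the execute/search bit for 'others'.
        mode=$(ls -ld "$dir" | cut -c10)
        case $mode in
            x|t) ;;                        # searchable by others
            *) printf '%s\n' "$dir" ;;     # would deny access
        esac
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
}
```

Each directory it prints is a candidate for the chmod o+x command shown above, or for a change of ownership instead.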


Linux process capabilities
  The libvirt QEMU driver has a build time option allowing it to use the libcap-ng library to manage process capabilities. If this build option is enabled, then the QEMU driver will use this to ensure that all process capabilities are dropped before executing a QEMU virtual machine. Process capabilities are what give the 'root' account its high power; in particular, the CAP_DAC_OVERRIDE capability is what allows a process running as 'root' to access files owned by any user.
  If the QEMU driver is configured to run virtual machines as non-root, then they will already lose all their process capabilities at time of startup. The Linux capability feature is thus aimed primarily at the scenario where the QEMU processes are running as root. In this case, before launching a QEMU virtual machine, libvirtd will use libcap-ng APIs to drop all process capabilities. It is important for administrators to note that this implies the QEMU process will only be able to access files owned by root, and not files owned by any other user.
  Thus, if a vendor / distributor has configured their libvirt package to run as 'qemu' by default, a number of changes will be required before an administrator can change a host to run guests as root. In particular it will be necessary to change ownership on the directories /var/run/libvirt/qemu/, /var/lib/libvirt/qemu/ and /var/cache/libvirt/qemu/ back to root, in addition to changing the /etc/libvirt/qemu.conf settings.

SELinux basic confinement
  The basic SELinux protection for QEMU virtual machines is intended to protect the host OS from a compromised virtual machine process. There is no protection between guests.
  In the basic model, all QEMU virtual machines run under the confined domain root:system_r:qemu_t. It is required that any disk image assigned to a QEMU virtual machine is labelled with system_u:object_r:virt_image_t. In a default deployment, package vendors/distributors will typically ensure that the directory /var/lib/libvirt/images has this label, such that any disk images created in this directory will automatically inherit the correct labelling. If attempting to use disk images in another location, the user/administrator must ensure the directory has been given this requisite label. Likewise physical block devices must be labelled system_u:object_r:virt_image_t.
  Not all filesystems allow for labelling of individual files. In particular NFS, VFat and NTFS have no support for labelling. In these cases administrators must use the 'context' option when mounting the filesystem to set the default label to system_u:object_r:virt_image_t. In the case of NFS, there is an alternative option, of enabling the virt_use_nfs SELinux boolean.
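
The labelling steps described above might look like the commands printed by the sketch below. It only prints the commands instead of running them, since they need root and an SELinux-enabled host. The function name print_labelling_commands and the example paths are hypothetical; chcon -t and the context= mount option are standard SELinux tooling, and hosts with MLS/MCS enabled may need a level such as ":s0" appended to the label.

```shell
# Label taken from the text above; MLS/MCS hosts may need ":s0" appended.
VIRT_IMAGE_LABEL="system_u:object_r:virt_image_t"

# Print (without running) a command that makes an image location usable by
# confined QEMU guests (hypothetical helper).
#   dir PATH                  local directory on a filesystem with labels
#   mount FSTYPE SRC DEST     filesystem without per-file labels
print_labelling_commands() {
    case $1 in
        dir)
            # Only the type part of the label is changed.
            printf 'chcon -R -t virt_image_t %s\n' "$2"
            ;;
        mount)
            printf 'mount -t %s -o context="%s" %s %s\n' \
                "$2" "$VIRT_IMAGE_LABEL" "$3" "$4"
            ;;
        *)
            echo "usage: print_labelling_commands dir PATH | mount FSTYPE SRC DEST" >&2
            return 1
            ;;
    esac
}
```

For NFS, the alternative mentioned above is to turn on the boolean with setsebool -P virt_use_nfs on.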

SELinux sVirt confinement
  The SELinux sVirt protection for QEMU virtual machines builds on the basic level of protection, to also allow individual guests to be protected from each other.
  In the sVirt model, each QEMU virtual machine runs under its own confined domain, which is based on system_u:system_r:svirt_t:s0 with a unique category appended, e.g. system_u:system_r:svirt_t:s0:c34,c44. The rules are set up such that a domain can only access files which are labelled with the matching category level, e.g. system_u:object_r:svirt_image_t:s0:c34,c44. This prevents one QEMU process from accessing any file resources that belong to another QEMU process.
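
The matching rule can be illustrated by comparing the level and category part of a process label with that of a file label. This is a deliberately simplified sketch: the real access decision is made by the kernel's SELinux policy engine, not by string comparison, and the function names label_level and same_svirt_level are hypothetical.

```shell
# Print the level/category part of a label, i.e. everything after
# user:role:type (hypothetical helper).
label_level() {
    printf '%s\n' "$1" | cut -d: -f4-
}

# Succeed if a process label and a file label carry the same level and
# categories. Simplification: the kernel applies the full policy rules.
same_svirt_level() {
    [ "$(label_level "$1")" = "$(label_level "$2")" ]
}
```

With the labels from the text, same_svirt_level system_u:system_r:svirt_t:s0:c34,c44 system_u:object_r:svirt_image_t:s0:c34,c44 succeeds, while a file carrying different categories fails the comparison.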
  There are two ways of assigning labels to virtual machines under sVirt. In the default setup, if sVirt is enabled, guests will get an automatically assigned unique label each time they are booted. The libvirtd daemon will also automatically relabel exclusive access disk images to match this label. Disks that are marked as shared will get a generic label system_u:system_r:svirt_image_t:s0 allowing all guests read/write access to them, while disks marked as read-only will get a generic label system_u:system_r:svirt_content_t:s0 which allows all guests read-only access.
  With statically assigned labels, the application should include the desired guest and file labels in the XML at time of creating the guest with libvirt. In this scenario the application is responsible for ensuring the disk images and similar resources are suitably labelled to match; libvirtd will not attempt any relabelling.
  If the sVirt security model is active, then the node capabilities XML will include its details. If a virtual machine is currently protected by the security model, then the guest XML will include its assigned labels. If enabled at compile time, the sVirt security model will always be activated if SELinux is available on the host OS. To disable sVirt, and revert to the basic level of SELinux protection (host protection only), the /etc/libvirt/qemu.conf file can be used to change the setting to security_driver="none".

AppArmor sVirt confinement
  When using basic AppArmor protection for the libvirtd daemon and QEMU virtual machines, the intention is to protect the host OS from a compromised virtual machine process. There is no protection between guests.
  The AppArmor sVirt protection for QEMU virtual machines builds on this basic level of protection, to also allow individual guests to be protected from each other.
  In the sVirt model, if a profile is loaded for the libvirtd daemon, then each qemu:///system QEMU virtual machine will have a profile created for it when the virtual machine is started if one does not already exist. This generated profile uses a profile name based on the UUID of the QEMU virtual machine and contains rules allowing access to only the files it needs to run, such as its disks, pid file and log files. Just before the QEMU virtual machine is started, the libvirtd daemon will change into this unique profile, preventing the QEMU process from accessing any file resources that are present in another QEMU process or the host machine.
  The AppArmor sVirt implementation is flexible in that it allows an administrator to customize the template file in /etc/apparmor.d/libvirt/TEMPLATE for site-specific access for all newly created QEMU virtual machines. Also, when a new profile is generated, two files are created: /etc/apparmor.d/libvirt/libvirt- and /etc/apparmor.d/libvirt/libvirt-.files. The former can be fine-tuned by the administrator to allow custom access for this particular QEMU virtual machine, and the latter will be updated appropriately when required file access changes, such as when a disk is added. This flexibility allows for situations such as having one virtual machine in complain mode with all others in enforce mode.
  While users can define their own AppArmor profile scheme, a typical configuration will include a profile for /usr/sbin/libvirtd, /usr/lib/libvirt/virt-aa-helper (a helper program which the libvirtd daemon uses instead of manipulating AppArmor directly), and an abstraction to be included by /etc/apparmor.d/libvirt/TEMPLATE (typically /etc/apparmor.d/abstractions/libvirt-qemu). An example profile scheme can be found in the examples/apparmor directory of the source distribution.
  If the sVirt security model is active, then the node capabilities XML will include its details. If a virtual machine is currently protected by the security model, then the guest XML will include its assigned profile name. If enabled at compile time, the sVirt security model will be activated if AppArmor is available on the host OS and a profile for the libvirtd daemon is loaded when libvirtd is started. To disable sVirt, and revert to the basic level of AppArmor protection (host protection only), the /etc/libvirt/qemu.conf file can be used to change the setting to security_driver="none".

Cgroups device ACLs
  Recent Linux kernels have a capability known as "cgroups" which is used for resource management. It is implemented via a number of "controllers", each controller covering a specific task/functional area. One of the available controllers is the "devices" controller, which is able to set up whitelists of block/character devices that a cgroup should be allowed to access. If the "devices" controller is mounted on a host, then libvirt will automatically create a dedicated cgroup for each QEMU virtual machine and set up the device whitelist so that the QEMU process can only access shared devices and explicitly assigned disk images backed by block devices.
  The list of shared devices a guest is allowed access to is:

/dev/null, /dev/full, /dev/zero,
/dev/random, /dev/urandom,
/dev/ptmx, /dev/kvm, /dev/kqemu,
/dev/rtc, /dev/hpet, /dev/net/tun

  In the event of unanticipated needs arising, this can be customized via the /etc/libvirt/qemu.conf file. To mount the cgroups device controller, the following commands should be run as root, prior to starting libvirtd:

mkdir /dev/cgroup
mount -t cgroup none /dev/cgroup -o devices

  libvirt will then place each virtual machine in a cgroup at /dev/cgroup/libvirt/qemu/$VMNAME/
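
The shared device list and the per-guest cgroup path lend themselves to a small lookup sketch. The function names is_shared_device and vm_cgroup_dir are hypothetical; the device list and the /dev/cgroup mount point are taken from the text above, and a different mount point can be passed as a second argument.

```shell
# Shared devices every guest may access, as listed above.
SHARED_DEVICES="/dev/null /dev/full /dev/zero /dev/random /dev/urandom /dev/ptmx /dev/kvm /dev/kqemu /dev/rtc /dev/hpet /dev/net/tun"

# Succeed if the given device path is on the shared list (hypothetical helper).
is_shared_device() {
    case " $SHARED_DEVICES " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the cgroup directory libvirt would use for a guest (hypothetical helper).
#   $1 = guest name, $2 = cgroup mount point (defaults to /dev/cgroup)
vm_cgroup_dir() {
    printf '%s/libvirt/qemu/%s/\n' "${2:-/dev/cgroup}" "$1"
}
```

On a host where the controller is mounted as shown, the effective whitelist for a guest can be read from the devices.list file inside that directory.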

Import and export of libvirt domain XML configs
  The QEMU driver currently supports a single native config format known as qemu-argv. The data for this format is expected to be a single line: first a list of environment variables, then the QEMU binary name, and finally the QEMU command line arguments.
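
The layout of a qemu-argv line can be made concrete with a small splitter. This is a sketch under simplifying assumptions: the function name split_qemu_argv is hypothetical, words are split on plain whitespace (quoted arguments containing spaces are not handled), and the sample line in the usage note is invented for illustration.

```shell
# Split a qemu-argv line into its three parts: leading NAME=value words are
# environment variables, the next word is the QEMU binary, and the rest are
# its arguments. Quoting is not handled (hypothetical helper).
split_qemu_argv() {
    set -f                 # avoid glob expansion while word splitting
    set -- $1
    set +f
    env_vars=
    while [ $# -gt 0 ]; do
        case $1 in
            [A-Za-z_]*=*) env_vars="${env_vars:+$env_vars }$1"; shift ;;
            *) break ;;
        esac
    done
    binary=$1
    [ $# -gt 0 ] && shift
    printf 'env: %s\nbinary: %s\nargs: %s\n' "$env_vars" "$binary" "$*"
}
```

For example, split_qemu_argv 'LC_ALL=C PATH=/bin /usr/bin/qemu-kvm -m 512 -name demo' reports two environment variables, the binary /usr/bin/qemu-kvm and the arguments -m 512 -name demo.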

Converting from QEMU args to domain XML
  The virsh domxml-from-native command provides a way to convert an existing set of QEMU args into a guest description using libvirt Domain XML that can then be used by libvirt. Please note that this command is intended for converting existing QEMU guests, previously started from the command line, so that they can be managed through libvirt. It should not be used as a method of creating new guests from scratch. New guests should be created using an application calling the libvirt APIs (see the libvirt applications page for some examples) or by manually crafting XML to pass to virsh.

$ cat > demo.args
