KVM nested virtualization with Intel finally works for me on Fedora 18. All three layers, L0 (physical host) -> L1 (regular guest/guest hypervisor) -> L2 (nested guest), are running successfully as of this writing.
Previously, nested KVM virtualization on Intel was discussed here and here. This time, on Fedora 18, I was able to successfully boot and use a nested guest with reasonable performance. (Although I still have to do more formal tests to produce meaningful performance numbers.)
Test setup information
Config info about the physical host, the regular guest/guest hypervisor, and the nested guest (all of them Fedora 18, x86_64):
- Physical Host (Host hypervisor/Bare metal)
- Node info and some version info
#--------------------#
# virsh nodeinfo
CPU model:           x86_64
CPU(s):              4
CPU frequency:       1995 MHz
CPU socket(s):       1
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         10242692 KiB
#--------------------#
# cat /etc/redhat-release ; uname -r ; arch ; rpm -q qemu-kvm libvirt-daemon-kvm
Fedora release 18 (Spherical Cow)
3.6.7-5.fc18.x86_64
x86_64
qemu-kvm-1.3.0-9.fc18.x86_64
libvirt-daemon-kvm-1.0.2-1.fc18.x86_64
#
#--------------------#
- Regular Guest (Guest Hypervisor)
- A 20GB qcow2 disk image with cache='none' enabled in the libvirt xml
#--------------------#
# virsh nodeinfo
CPU model:           x86_64
CPU(s):              4
CPU frequency:       1994 MHz
CPU socket(s):       4
Core(s) per socket:  1
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         4049888 KiB
#--------------------#
# cat /etc/redhat-release ; uname -r ; arch ; rpm -q qemu-kvm libvirt-daemon-kvm
Fedora release 18 (Spherical Cow)
3.6.10-4.fc18.x86_64
x86_64
qemu-kvm-1.2.2-6.fc18.x86_64
libvirt-daemon-kvm-0.10.2.3-1.fc18.x86_64
#--------------------#
- Config: 2GB Memory; 2 vcpus; 6GB sparse qcow2 disk image
Setting up guest hypervisor and nested guest
Refer to the notes linked above to get the nested guest up and running:
- Create a regular guest/guest hypervisor:
# ./create-regular-f18-guest.bash
- Expose Intel VMX extensions inside the guest hypervisor by adding the 'cpu' element to the regular guest's libvirt xml file
- Shut down the regular guest, redefine it ( virsh define /etc/libvirt/qemu/regular-guest-f18.xml ), then start it ( virsh start regular-guest-f18 )
- Now, install virtualization packages inside the guest hypervisor:
# yum install libvirt-daemon-kvm libvirt-daemon-config-network libvirt-daemon-config-nwfilter python-virtinst -y
# systemctl start libvirtd.service && systemctl status libvirtd.service
# ./create-nested-f18-guest.bash
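For reference, the 'cpu' element added to the regular guest's XML can be a minimal sketch like the following. The core2duo model here is just one vmx-capable choice (it matches the qemu-kvm invocation shown further below); any model that carries the vmx flag should do:

```xml
<!-- In the regular guest's libvirt XML: expose Intel VMX to L1 so it
     can act as a hypervisor itself. Adjust the model to taste. -->
<cpu match='exact'>
  <model>core2duo</model>
  <feature policy='require' name='vmx'/>
</cpu>
```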
The scripts and reference libvirt xmls I used for this demonstration are posted on github.
qemu-kvm invocation of bare-metal and guest hypervisors
qemu-kvm invocation of the regular guest (guest hypervisor), indicating vmx extensions:
# ps -ef | grep -i qemu-kvm | egrep -i 'regular-guest-f18|vmx'
qemu 15768 1 19 13:33 ? 01:01:52 /usr/bin/qemu-kvm -name regular-guest-f18 -S -M pc-1.3
 -cpu core2duo,+vmx -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1
 -uuid 9a7fd95b-7b4c-743b-90de-fa186bb5c85f -nographic -no-user-config -nodefaults
 -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest-f18.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
 -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
 -drive file=/export/vmimgs/regular-guest-f18.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a6:ff:96,bus=pci.0,addr=0x3
 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
Running virt-host-validate (part of the libvirt-client package) on the bare metal host, indicating the host is configured to run KVM:
# virt-host-validate
  QEMU: Checking for hardware virtualization : PASS
  QEMU: Checking for device /dev/kvm         : PASS
  QEMU: Checking for device /dev/vhost-net   : PASS
  QEMU: Checking for device /dev/net/tun     : PASS
   LXC: Checking for Linux >= 2.6.26         : PASS
#
Networking Info
- The regular guest is using the bare metal host's bridge device 'br0'
- The nested guest is using libvirt's default bridge 'virbr0'
Caveat: If NAT'd networking is used on both the bare metal host and the guest hypervisor, both default to the 192.168.122.0/24 subnet (unless explicitly changed), which will mangle the networking setup. Bridging on L0 (bare metal host) and NAT on L1 (guest hypervisor) avoids this.
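If you do want NAT at both levels, one workaround is to move the default network on the guest hypervisor to a different subnet. A sketch (the 192.168.123.x addresses are just an example):

```shell
# On the L1 guest hypervisor: move libvirt's default NAT network off
# 192.168.122.0/24 so it no longer collides with L0's subnet.
virsh net-edit default      # change <ip address='192.168.122.1' ...> to e.g. 192.168.123.1
virsh net-destroy default   # stop the network...
virsh net-start default     # ...and restart it with the new subnet
```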
Notes
- Ensure the serial console is enabled in both the L1 and L2 guests; it is very handy for debugging. If you use the kickstart file mentioned here, it's taken care of. The magic lines to add to the kernel cmd line are console=tty0 console=ttyS0,115200
- Once the nested guest was created, I tried to set the hostname, and it turned out that for some reason ext4 had remounted the file system read-only:
# hostnamectl set-hostname nested-guest-f18.foo.bar.com
Failed to issue method call: Read-only file system
Then I saw these I/O errors in /var/log/messages:
. . .
Feb 12 04:22:31 localhost kernel: [  724.080207] end_request: I/O error, dev vda, sector 9553368
Feb 12 04:22:31 localhost kernel: [  724.080922] Buffer I/O error on device dm-1, logical block 33467
Feb 12 04:22:31 localhost kernel: [  724.080922] Buffer I/O error on device dm-1, logical block 33468
At this point I tried to reboot the guest, only to be thrown into a dracut repair shell. I ran fsck a couple of times and tried rebooting the nested guest again, to no avail. Then I force powered off the nested guest:
# virsh destroy nested-guest-f18
Now it boots just fine. While trying to get to the bottom of the I/O errors, I discussed this behaviour with Rich Jones, and he suggested running some more I/O activity inside the nested guest to see if I could trigger those errors again:
# find / -exec md5sum {} \; > /dev/null
# find / -xdev -exec md5sum {} \; > /dev/null
After the above commands ran for more than 15 minutes, the I/O errors could not be triggered any more.
- A test for the libguestfs program (from rwmj) would be to run it on the host and the first-level guest and compare. The command needs to be run several times, discarding the first few results, to get a hot cache:
# time guestfish -a /dev/null run
- Another libguestfs test Rich suggested is to disable nested virt and measure guestfish running in the guest, to find the speed-up nested virtualization gives over pure software emulation.
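A rough way to run that comparison (a sketch, not a rigorous benchmark): run the loop below on the bare metal host, inside the L1 guest with nested virt, and inside L1 with nested virt disabled, then compare the later iterations once the appliance cache is hot:

```shell
# Time the guestfish appliance launch several times; discard the first
# run or two so the appliance cache is warm before comparing numbers.
for i in $(seq 1 5); do
    time guestfish -a /dev/null run
done
```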
Next: run more useful workloads in these nested vmx guests.
And for those who are on Debian:
http://alexander.holbreich.org/2013/03/qemu-kvm-introduction/