Some context: In regular virtualization, your physical linux host is the hypervisor, and runs multiple operating systems. Nested Virtualization let’s you run a guest inside a regular guest(essentially a Guest hypervisor).For AMD there is nested-support available since a while, and some people reported success w/ nesting KVM guests. For Intel arch., there is support available recently, an year-ish, and some in progress work, so thought I’d give it a whirl when Adam Young started discussion about it in context of openstack project.
Some of the common use-cases for that are being discussed for nested-virtualization
- For instance, a cloud user gets a beefy, Regualar Guest(which she completely controls). Now, this user can turn regular guest into a hypervisor, and can cheerfully run/manage multiple guests for developing or testing w/o the hassle and intervention of the cloud provider.
- Possibility of having a many instances of virtualization setup (hypervisor and its guests) on one single Bare metal.
- Ability to debug and test hypervisor software
I have immediate access to a moderately beefy Intel hardware, and rest of the post is based on Intel’s CPU virt extensions. Before proceeding, let’s settle on some terminology for clarity:
- Physical Host (Host hypervisor/Bare metal)
- Config: Intel(R) Xeon(R) CPU(4 cores/socket); 10GB Memory; CPU Freq – 2GHz; Running latest Fedora-16(Minimal foot-print, @core only with Virt pkgs;x86_64; kernel-3.1.8-2.fc16.x86_64
- Regualr Guest (Or Guest Hypervisor)
- Config: 4GB Memory; 4vCPU; 20GB Raw disk image with cache =’none’ to have decent I/O; Minimal, @core F16; And same virt-packages as Physical Host; x86_64
- Nested Guest (Guest installed inside the Regular Guest)
- Config: 2GB Memory; 1vCPU; Minimal(@core only) F16; x86_64
Enabling Nesting on the Physical Host
Node Info of the Physical Host.
# virsh nodeinfo
CPU model: x86_64
CPU frequency: 1994 MHz
CPU socket(s): 1
Core(s) per socket: 4
Thread(s) per core: 1
NUMA cell(s): 1
Memory size: 10242864 kB
Let us first ensure kvm_intel kernel module has nesting enabled. By default, it’s disabled for Intel arch[ but enabled for AMD -- SVM (secure virtual machine) extensions arch.]
# modinfo kvm_intel | grep -i nested
And, we need to pass this kvm-intel.nested=1 on kernel commandline while rebooting the host to enable nesting for the Intel KVM kernel module. Which can be verified after boot by doing:
# cat /sys/module/kvm_intel/parameters/nested
# systool -m kvm_intel -v | grep -i nested
nested = "Y"
Or alternatively, Adam Young identified that nesting can be enabled by adding this directive kvm_intel.nested=1 to the end of /etc/modprobe.d/dist.conf file and reboot the host so it persists.
Set up the Regular Guest(or Guest hypervisor)
Install a regular guest using virt-install or oz tool or any other preferred way. I made a quick script here. And ensure to have cache=’none’ in the disk attribute of the Guest Hypervisor’s xml file. (observation: Install via virt-install tool didn’t seem have this option picked by default.) Here is the ‘drive’ attribute libvirt xml snippet:
<disk type='file' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
Now, let’s try to enable Intel VMX(Virtual Machine Extensions) in the regular guest’s CPU. We can do it by running the below on the Physical host(aka Host Hypervisor), and adding the ‘cpu’ attribute to the regular-guest’s libvirt xml file, and start the guest.
# virsh capabilities | virsh cpu-baseline /dev/stdin
<feature policy='require' name='dca'/>
<feature policy='require' name='xtpr'/>
<feature policy='require' name='tm2'/>
<feature policy='require' name='vmx'/>
<feature policy='require' name='ds_cpl'/>
<feature policy='require' name='monitor'/>
<feature policy='require' name='pbe'/>
<feature policy='require' name='tm'/>
<feature policy='require' name='ht'/>
<feature policy='require' name='ss'/>
<feature policy='require' name='acpi'/>
<feature policy='require' name='ds'/>
<feature policy='require' name='vme'/>
The o/p of the above cmd has a variety of options. Since we need only vmx extensions, I tried the simple way by adding to the regular-guest’s libvirt xml(virsh edit ..) and started it.
<feature policy='require' name='vmx'/>
Thanks to Jiri Denemark for the above hint. Also note that, there is a very detailed and informative post from Dan P Berrange on host/guest CPU models in libvirt.
As we enabled vmx in the guest-hypervisor, let’s confirm that vmx is exposed in the emulated CPU by ensuring qemu-kvm is invoked with -cpu core2duo,+vmx :
[root@physical-host ~]# ps -ef | grep qemu-kvm
qemu 17102 1 4 22:29 ? 00:00:34 /usr/bin/qemu-kvm -S -M pc-0.14
-cpu core2duo,+vmx -enable-kvm -m 3072
-smp 3,sockets=3,cores=1,threads=1 -name f16test1
-uuid f6219dbd-f515-f3c8-a7e8-832b99a24b5d -nographic -nodefconfig
-nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f16test1.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-netdev tap,fd=21,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e6:cc:4e,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
Now, let’s attempt to create a nested guest
Here comes the more interesting part, the nested-guest config. will be 2G RAM; 1vcpu; 8GB virtual disk. And let’s invoke a virt-install cmdline with a minimal kickstart install:
[root@regular-guest ~]# virt-install --connect=qemu:///system \
--extra-args=ks=file:/fed.ks console=tty0 console=ttyS0,115200 serial rd_NO_PLYMOUTH \
--name=nested-guest --disk path=/var/lib/libvirt/images/nested-guest.img,size=6 \
--ram 2048 \
Retrieving file .treeinfo... | 1.7 kB 00:00 ...
Retrieving file vmlinuz... | 7.9 MB 00:08 ...
Retrieving file initrd.img... 28% [============== ] 647 kB/s | 38 MB 02:25 ETA
virt-install proceeds fine(to a certain extent), doing all regular things like getting access to network, create devices, create file-systems, dep checks performed, and finally package install proceeds:
Welcome to Fedora for x86_64
┌─────────────────────┤ Package Installation ├──────────────────────┐
│ 24% │
│ Packages completed: 52 of 390 │
│ Installing glibc-common-2.14.90-14.x86_64 (112 MB) │
│ Common binaries and locale data for glibc │
And now, it’s stuck like that for ever. Doesn’t budge, trying to install pkgs for eternity. Let’s try to see what’s the state of the guest in a seperate terminal
[root@regular-guest ~]# virsh list
Id Name State
1 nested-guest paused
[root@regular-guest ~]# virsh domstate nested-guest --reason
So our nested-guest seems to be paused, And package install on the nested-guest’s serial console is still hung. I gave up at this point. Need to try if I can get any helpful info w/ virt-dmesg tool aor any other ways to debug this further.
Just to note, there is enough disk space and memory on the ‘regular-guest’, so that case is ruled out here. And, I tried to destroy the broken nested-guest, and attempted to create a fresh one(repeated twice). Still no dice.
So not much luck yet with Intel arch, I’d have to try on an AMD machine.
UPDATE(on Intel arch): After trying a couple of times, I was finally able to ssh to the nested guest, but, after a reboot, the nested-guest loses the IP rendering it inaccessible.(Info: the regular-guest has a bridged IP, and nested-guest has a NATed IP) . And I couldn’t login via serial-console, as it’s broken due to a regression(which has a workaround). Also, refer to comments below for further discussion on NATed networking caveats.
UPDATE2: The correct syntax to be added to /etc/modprobe.conf/dist.conf is
options kvm-intel nested=y