Tag Archives: Virtualization

Cubietruck: QEMU, KVM and Fedora

Rich Jones previoulsy wrote here on how he got KVM working on Cubietruck — it was Fedora-19 timeframe. It wasn’t quite straight forwad then: you had to build a custom Kernel, custom U-Boot (Universal-Boot), etc.

Recently I got a Cubietruck for testing. First thing I wanted to try was to boot a Fedora-21 KVM guest. It worked just fine — ensure to supply -cpu host parameter to QEMU invocation (Rich actually mentions this at the end of his post linked above). Why? CubieTruck has a Cortex-A7 processor, for which QEMU doesn’t have specific support, so you’d need need to use the same CPU as the host to boot a KVM guest. Peter Maydell, one of the QEMU upstream maintainers, explains a little more here on the why (a Kernel limitation).

Below is what I ended up doing to successfully boot a Fedora 21 guest on Cubietruck.

  • I downloaded the Fedora 21-Beta ARM image and wrote it to a microSD card. (I used this procedure.)
  • The above Fedora ARM image didn’t have KVM module. Josh Boyer on IRC said I might need a (L)PAE ARM Kernel. Installing it (3.17.4-302.fc21.armv7hl+lpae) did it — this is built with KVM module enabled.
  • Resized the root filesystem of Fedora 21 on the microSD card. (Procedure I used.)
  • Then, I tried a quick test to boot a KVM guest with libguestfs-test-tool. But the guest boot failed with:
    “kvm_init_vcpu failed: Invalid argument”. So, I filed a libguestfs bug to track this. After a bit of trial and error on IRC, I made a small edit to libguestfs source so that the KVM appliance is created with -cpu host. Rich suggested I sumbit this as a patch — which resulted in this libguestfs commit (resolving the bug I filed).
  • With this in place, a KVM guest boots successfully. (Complete output of boot via libguestfs appliance is here.) Also, I tested importing a Cirros-0.3.3 disk image into libvirt and run it succesfully.

[Trivia: Somehow the USB to TTL CP2102 serial converter cable I bought didn’t show boot (nor any other) messages when I accessed it via UART (with ‘ screen /dev/ttyUSB0 115200‘). Fortunately, the regular cable that was shipped with the Cubietruck worked just fine. Still, I’d like to find what’s up with the CP2012 cable, since it was deteced as a device in my systemd logs. ]



Filed under Uncategorized

libvirt blockcommit: shorten disk image chain by live merging the current active disk content

When using QCOW2-based external snapshots, it is desirable to reduce an entire disk image chain to a single disk to retain performance and increase while the guest is running. Upstream QEMU and libvirt has recently acquired the ability to do that. Relevant git commits for QEMU (Jeff Cody) and libvirt (Eric Blake).

This is best illustrated with a quick example.

Let’s start with the below disk image chain as below for a guest called vm1. For simplicity’s sake:

[base] <-- [sn1] <-- [sn2] <-- [current] (live QEMU)

Once live active block commit operation is complete (step 5 below), the result will be a flattened disk image chain where data from sn1, sn2 and current are live commited into base:

 [base] (live QEMU)

(1) List the current active image in use:

$ virsh domblklist vm1
Target     Source
vda        /export/images/base.qcow2

(2) For a quick test, create external snapshots. (And, repeat the above operation two more times, so we have the chain: [base] <– [sn1] <– [sn2] <– [current] )

$ virsh snapshot-create-as \
   --domain vm1 snap1 \
   --diskspec vda,file=/export/images/sn1.qcow2 \
   --disk-only --atomic

(3) Enumerate the backing file chain:

$ qemu-img info --backing-chain current.qcow2
[. . .] # output discarded for brevity

(4) Again, check the current active disk image:

$ virsh domblklist vm1
Target     Source
vda        /export/images/current.qcow2

(5) Live Active commit an entire chain, including pivot:

$ virsh blockcommit vm1 vda \
   --active --pivot --verbose
Block Commit: [100 %]
Successfully pivoted


  • –active: It performs a two stage operation: first stage – it commits the contents from top images into base (i.e. sn1, sn2, current into base); in the second stage, the block operation remains awake to synchronize any further changes (from top images into base), here the user can take two actions: cancel the job, or pivot the job, i.e. adjust the base image as the current active image.
  • –pivot: Once data is committed from sn1, sn2 and current into base, it pivots the live QEMU to use base as the active image.
  • –verbose: Displays a progress of block operation.
  • Finally, the disk image backing chain is shortened to a single disk image.

(6) Optionally, list the current active image in use. It’s now back to ‘base’ which has all the contents from current, sn2, sn1):

$ virsh domblklist vm1
Target     Source
vda        /export/images/base.qcow2


Filed under Uncategorized

libvirt: default network conflicts (not anymore)

Increasingly there’s a need for libvirt networking to work inside a virtual machine that is already running on the default network ( The immediate practical case where this comes up is while testing nested virtualization: start a guest (L1) with default libvirt networking, and if you need to install libvirt again on it to run a (nested) guest (L2), there’ll be routing conflict because of the existing default route — Up until now, I tried to avoid this by creating a new libvirt network with a different IP range (or manually edit the default libvirt network).

To alleviate this routing conflict, Laine Stump (libvirt developer) now pushed a patch (with a tiny follow up) to upstream libvirt git. (Relevant libvirt bug with discussion.)

I ended up testing the patch last night, it works well.

Assuming your physical host (L0) has the default libvirt network route:

$ ip route show | grep virbr dev virbr0  proto kernel  scope link  src

Now, start a guest (L1) and when you install libvirt (which has the said fix) on it, it notices the existing route of and creates the default network on the next free network range (starting its search with, thus avoiding the routing conflict.

 $ ip route show
  default via dev ens2  proto static  metric 1024 dev ens2  proto kernel  scope link  src dev virbr0  proto kernel  scope link  src

Relevant snippet of the default libvirt network (you can notice the new network range):

  $ virsh net-dumpxml default | grep "ip address" -A4
    <ip address='' netmask=''>
        <range start='' end=''/>

So, please test it (build RPMs locally from git master or should be available in the next upstream libvirt release, early October) for your use cases and report bugs, if any.

[Update: On Fedora, this fix is available from version libvirt-1.2.8-2.fc21 onwards.]


Filed under Uncategorized

Live disk migration with libvirt blockcopy

[08-JAN-2015 Update: Correct the blockcopy CLI and update the final step to re-use the copy to be consistent with the scenario outlined at the beginning. Corrections pointed out by Gary R Cook at the end of the comments.]
[17-NOV-2014 Update: With recent libvirt/QEMU improvements, another way (which is relatively faster) to take a live disk backup via libvirt blockcommit, here’s an example]

QEMU and libvirt projects has had a lot of block layer improvements in its last few releases (libvirt 1.2.6 & QEMU 2.1). This post discusses a method to do live disk storage migration with libvirt’s blockcopy.

Context on libvirt blockcopy
Simply put, blockcopy facilitates virtual machine live disk image copying (or mirroring) — primarily useful for different use cases of storage migration:

  • Live disk storage migration
  • Live backup of a disk image and its associated backing chain
  • Efficient non-shared storage migration (with a combination of virsh operations snapshort-create-as+blockcopy+blockcommit)
  • As of IceHouse release, OpenStack Nova project also uses a variation of libvirt blockcopy, through its Python API virDomainBlockRebase, to create live snapshots, nova image-create. (More details on this in an upcoming blog post).

A blockcopy operation has two phases: (a) All of source disk content is copied (or mirrored) to the destination, this operation can be canceled to revert to the source disk (b) Once libvirt gets a signal indicating source and destination content are equal, the mirroring job remains awake until an explicit call to virsh blockjob [. . .] --abort is issued to end the mirroring operation gracefully . If desired, this explicit call to abort can be avoided by supplying --finish option. virsh manual page for verbose details.

Scenario: Live disk storage migration

To illustrate a simple case of live disk storage migration, we’ll use a disk image chain of depth 2:

base <-- snap1 <-- snap2 (Live QEMU) 

Once live blockcopy is complete, the resulting status of disk image chain ends up as below:

base <-- snap1 <-- snap2
          '------- copy (Live QEMU, pivoted)

I.e. once the operation finishes, ‘copy’ will share the backing file chain of ‘snap1’ and ‘base’. And, live QEMU is now pivoted to use the ‘copy’.

Prepare disk images, backing chain & define the libvirt guest

[For simplicity, all virtual machine disks are QCOW2 images.]

Create the base image:

 $ qemu-img create -f qcow2 base 1G

Edit the base disk image using guestfish, create a partition, make a file-system, add a file to the base image so that we distinguish its contents from its qcow2 overlay disk images:

$ guestfish -a base.qcow2 
[. . .]
><fs> run 
><fs> part-disk /dev/sda mbr
><fs> mkfs ext4 /dev/sda1
><fs> mount /dev/sda1 /
><fs> touch /foo
><fs> ls /
><fs> exit

Create another QCOW2 overlay snapshot ‘snap1’, with backing file as ‘base’:

$ qemu-img create -f qcow2 -b base.qcow2 \
  -o backing_fmt=qcow2 snap1.qcow2

Add a file to snap1.qcow2:

$ guestfish -a snap1.qcow2 
[. . .]
><fs> run
><fs> part-disk /dev/sda mbr
><fs> mkfs ext4 /dev/sda1
><fs> mount /dev/sda1 /
><fs> touch /bar
><fs> ls /
><fs> exit

Create another QCOW2 overlay snapshot ‘snap2’, with backing file as ‘snap1’:

$ qemu-img create -f qcow2 -b snap1.qcow2 \
  -o backing_fmt=qcow2 snap2.qcow2

Add another test file ‘baz’ into snap2.qcow2 using guestfish (refer to previous examples above) to distinguish contents of base, snap1 and snap2.

Create a simple libvirt XML file as below, with source file pointing to snap2.qcow2 — which will be the active block device (i.e. it tracks all new guest writes):

$ cat <<EOF > /etc/libvirt/qemu/testvm.xml
<domain type='kvm'>
  <memory unit='MiB'>512</memory>   
    <type arch='x86_64'>hvm</type>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/export/vmimages/snap2.qcow2'/>
      <target dev='vda' bus='virtio'/>

Define the guest and start it:

$ virsh define etc/libvirt/qemu/testvm.xml
  Domain testvm defined from /etc/libvirt/qemu/testvm.xml
$ virsh start testvm
Domain testvm started

Perform live disk migration
Undefine the running libvirt guest to make it transient[*]:

$ virsh dumpxml --inactive testvm > /var/tmp/testvm.xml
$ virsh undefine testvm

Check what is the current block device before performing live disk migration:

$ virsh domblklist testvm
Target     Source
vda        /export/vmimages/snap2.qcow2

Optionally, display the backing chain of snap2.qcow2:

$ qemu-img info --backing-chain /export/vmimages/snap2.qcow2
[. . .] # Output removed for brevity

Initiate blockcopy (live disk mirroring):

$ virsh blockcopy --domain testvm vda \
  /export/blockcopy-test/backups/copy.qcow2 \
  --wait --verbose --shallow \

Details of the above command: It creates copy.qcow2 file in the specified path; performs a --shallow blockcopy (i.e. the ‘copy’ shares the backing chain) of the current block device (vda); –pivot will pivot the live QEMU to the ‘copy’.

Confirm that QEMU has pivoted to the ‘copy’ by enumerating the current block device in use:

$ virsh domblklist testvm
Target     Source
vda        /export/vmimages/copy.qcow2

Again, display the backing chain of ‘copy’, it should be the resultant chain as noted in the Scenario section above).

$ qemu-img info --backing-chain /export/vmimages/copy.qcow2

Enumerate the contents of copy.qcow2:

$ guestfish -a copy.qcow2 
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> ls /
><fs> quit

(You can notice above: all the content from base.qcow2, snap1.qcow2, and snap2.qcow2 mirrored into copy.qcow2.)

Edit the libvirt guest XML to use the copy.qcow2, and define it:

$ virsh edit testvm
# Replace the <source file='/export/vmimages/snap2.qcow2'/> 
# with <source file='/export/vmimages/copy.qcow2'/>
[. . .] 

$ virsh define /var/tmp/testvm.xml

[*] Reason for the undefining and defining the guest again: As of writing this, QEMU has to support persistent dirty bitmap — this enables us to restart a QEMU process with disk mirroring intact. There are some in-progress patches upstream for a while. Until they are in main line QEMU, the current approach (as illustrated above) is: make a running libvirt guest transient temporarily, perform live blockcopy, and make the guest persistent again. (Thanks to Eric Blake, one of libvirt project’s principal developers, for this detail.)


Filed under Uncategorized

Resize a fedora 19 guest with libguestfs tools

The inimitable Rich Jones writes some incredibly useful software. I can’t count how many times they helped me debugging disk images, or gave great insights into different aspects of linux virtualization. One of those instances again — I had to resize my OpenStack Fedora guest as its root file system merely had 5.4 GB to start with. So, I wanted to add atleast 15 GB more. After a bit of trial & error, here’s how I got it working. I’m using Fedora-19 in this case, but any other distro which supports libguestfs should be just fine.

Firstly, let’s check disk space inside the guest:

$ df -hT
    Filesystem              Type      Size  Used Avail Use% Mounted on
    /dev/mapper/fedora-root ext4      5.4G  5.4G     0 100% /
    devtmpfs                devtmpfs  4.7G     0  4.7G   0% /dev
    tmpfs                   tmpfs     4.7G     0  4.7G   0% /dev/shm
    tmpfs                   tmpfs     4.7G  392K  4.7G   1% /run
    tmpfs                   tmpfs     4.7G     0  4.7G   0% /sys/fs/cgroup
    tmpfs                   tmpfs     4.7G  472K  4.7G   1% /tmp
    /dev/vda1               ext4      477M   87M  365M  20% /boot
    /dev/loop0              ext4      4.6G   10M  4.4G   1% /srv/node/device1
    /dev/loop1              ext4      4.6G   10M  4.4G   1% /srv/node/device2
    /dev/loop2              ext4      4.6G   10M  4.4G   1% /srv/node/device3
    /dev/loop3              ext4      4.6G   10M  4.4G   1% /srv/node/device4
    /dev/vdb                ext4       17G   44M   16G   1% /mnt/newdisk

Print the libvirt XML to get the source of the disk

$ virsh dumpxml f19-test | grep -i source
      <source file='/var/lib/libvirt/images/f19-test.qcow2'/>

Above, I’m using a qcow2 disk image, I converted it to raw:

$ qemu-img convert -f qcow2 -O raw \
  /var/lib/libvirt/images/f19-test.qcow2 \

List the filesystems, partitions, block devices inside the raw disk image:

$ virt-filesystems --long --all -h -a \ 
Name              Type        VFS   Label  MBR  Size  Parent
/dev/sda1         filesystem  ext4  -      -    500M  -
/dev/fedora/root  filesystem  ext4  -      -    5.6G  -
/dev/fedora/swap  filesystem  swap  -      -    3.9G  -
/dev/fedora/root  lv          -     -      -    5.6G  /dev/fedora
/dev/fedora/swap  lv          -     -      -    3.9G  /dev/fedora
/dev/fedora       vg          -     -      -    9.5G  /dev/sda2
/dev/sda2         pv          -     -      -    9.5G  -
/dev/sda1         partition   -     -      83   500M  /dev/sda
/dev/sda2         partition   -     -      8e   9.5G  /dev/sda
/dev/sda          device      -     -      -    10G   -

Now, extend the file size of the raw disk image, using truncate:

# Create a new file based on original
$ truncate -r f19-test.raw f19-test.raw.new
# Adjust the new file size to 15G
$ truncate -s +15G f19-test.raw.new

List the file system partition info to find out the block device name:

$ virt-filesystems --partitions \
  --long -h -a f19-test.raw
Name       Type       MBR  Size  Parent
/dev/sda1  partition  83   500M  /dev/sda
/dev/sda2  partition  8e   9.5G  /dev/sda

Now, resize the new disk image using virt-resize. Note that, the --lv-expand option expands the root file system (thx Rich!):

$ virt-resize --expand /dev/sda2 --lv-expand \
  /dev/fedora/root f19-test.raw f19-test.raw.new
    Examining f19-test.raw ...
    Summary of changes:
    /dev/sda1: This partition will be left alone.
    /dev/sda2: This partition will be resized from 9.5G to 24.5G.  The LVM 
        PV on /dev/sda2 will be expanded using the 'pvresize' method.
    /dev/fedora/root: This logical volume will be expanded to maximum size. 
         The filesystem ext4 on /dev/fedora/root will be expanded using the 
        'resize2fs' method.
    Setting up initial partition table on f19-test.raw.new ...
    Copying /dev/sda1 ...
     100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
    Copying /dev/sda2 ...
     100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
    Expanding /dev/sda2 using the 'pvresize' method ...
    Expanding /dev/fedora/root using the 'resize2fs' method ...
    Resize operation completed with no errors.  Before deleting the old 
    disk, carefully check that the resized disk boots and works correctly.

Now, the size of the both new guests

$ ls -lash f19-test.raw f19-test.raw.new
2.7G -rw-r--r--. 1 qemu qemu 10G Apr 10 11:12 f19-test.raw
11G -rw-r--r--. 1 root root 25G Apr 10 12:13 f19-test.raw.new

Replace the old one w/ new one (you might want to take backup of the old one here, just in case):

$ mv f19-test.raw.new f19-test.raw

Also, update the libvirt XML file of the guest to reflect the raw disk image:

# Update source file path
$ virsh edit f19-test 
# grep the xml file to ensure.
$ grep source /etc/libvirt/qemu/f19-test.xml 

List file systems inside the newly created guest:

$ virt-filesystems --partitions --long -h -a f19-test.raw
Name       Type       MBR  Size  Parent
/dev/sda1  partition  83   500M  /dev/sda
/dev/sda2  partition  8e   25G   /dev/sda

Start the guest & ensure if everything looks sane:

$ virsh start f19-test --console

Leave a comment

Filed under Uncategorized

Multiple ways to access QEMU Machine Protocol (QMP)

Once QEMU is built, to get a finer understanding of it, or even for plain old debugging, having familiarity with QMP (QEMU Monitor Protocol) is quite useful. QMP allows applications — like libvirt — to communicate with a running QEMU’s instance. There are a few different ways to access the QEMU monitor to query the guest, get device (eg: PCI, block, etc) information, modify the guest state (useful to understand the block layer operations) using QMP commands. This post discusses a few aspects of it.

Access QMP via libvirt’s qemu-monitor-command
Libvirt had this capability for a long time, and this is the simplest. It can be invoked by virsh — on a running guest, in this case, called ‘devstack’:

$ virsh qemu-monitor-command devstack \
--pretty '{"execute":"query-kvm"}'
    "return": {
        "enabled": true,
        "present": true
    "id": "libvirt-8"

In the above example, I ran the simple command query-kvm which checks if (1) the host is capable of running KVM (2) and if KVM is enabled. Refer below for a list of possible ‘qeury’ commands.

QMP via telnet
To access monitor via any other way, we need to have qemu instance running in control mode, via telnet:

$ ./x86_64-softmmu/qemu-system-x86_64 \
  --enable-kvm -smp 2 -m 1024 \
  /export/images/el6box1.qcow2 \
  -qmp tcp:localhost:4444,server --monitor stdio
QEMU waiting for connection on: tcp::,server
VNC server running on `'
QEMU 1.4.50 monitor - type 'help' for more information

And, from a different shell, connect to that listening port 4444 via telnet:

$ telnet localhost 4444

Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}

We have to first enable QMP capabilities. This needs to be run before invoking any other commands, do:

{ "execute": "qmp_capabilities" }

QMP via unix socket
First, invoke the qemu binary in control mode using qmp, and create a unix socket as below:

$ ./x86_64-softmmu/qemu-system-x86_64 \
  --enable-kvm -smp 2 -m 1024 \
  /export/images/el6box1.qcow2 \
  -qmp unix:./qmp-sock,server --monitor stdio
QEMU waiting for connection on: unix:./qmp-sock,server

A few different ways to connect to the above qemu instance running in control mode, vi QMP:

  1. Firstly, via nc :

    $ nc -U ./qmp-sock
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}
  2. But, with the above, you have to manually enable the QMP capabilities, and type each command in JSON syntax. It’s a bit cumbersome, & no history of commands typed is saved.

  3. Next, a more simpler way — a python script called qmp-shell is located in the QEMU source tree, under qemu/scripts/qmp/qmp-shell, which hides some details — like manually running the qmp_capabilities.

    Connect to the unix socket using the qmp-shell script:

    $ ./qmp-shell ../qmp-sock 
    Welcome to the QMP low-level shell!
    Connected to QEMU 1.4.50

    Then, just hit the key, and all the possible commands would be listed. To see a list of query commands:

    (QEMU) query-<TAB>
    query-balloon               query-commands              query-kvm                   query-migrate-capabilities  query-uuid
    query-block                 query-cpu-definitions       query-machines              query-name                  query-version
    query-block-jobs            query-cpus                  query-mice                  query-pci                   query-vnc
    query-blockstats            query-events                query-migrate               query-status                
    query-chardev               query-fdsets                query-migrate-cache-size    query-target                
  4. Finally, we can also acess the unix socket using socat and rlwrap. Thanks to upstream qemu developer Markus Armbruster for this hint.

    Invoke it this way, also execute a couple of commands — qmp_capabilities, and query-kvm, to view the response from the server.

    $ rlwrap -H ~/.qmp_history \
      socat UNIX-CONNECT:./qmp-sock STDIO
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}
    {"return": {}}
    { "execute": "query-kvm" }
    {"return": {"enabled": true, "present": true}}

    Where, qmp_history contains recently ran QMP commands in JSON syntax. And rlwrap adds decent editing capabilities, recursive search & history. So, once you run all your commands, the ~/.qmp_history has a neat stack of all the QMP commands in JSON syntax.

    For instance, this is what my ~/.qmp_history file contains as I write this:

    $ cat ~/.qmp_history
    { "execute": "qmp_capabilities" }
    { "execute": "query-version" }
    { "execute": "query-events" }
    { "execute": "query-chardev" }
    { "execute": "query-block" }
    { "execute": "query-blockstats" }
    { "execute": "query-cpus" }
    { "execute": "query-pci" }
    { "execute": "query-kvm" }
    { "execute": "query-mice" }
    { "execute": "query-vnc" }
    { "execute": "query-spice " }
    { "execute": "query-uuid" }
    { "execute": "query-migrate" }
    { "execute": "query-migrate-capabilities" }
    { "execute": "query-balloon" }

To illustrate, I ran a few query commands (noted above) which provides an informative response from the server — no change is done to the state of the guest — so these can be executed safely.

I personally prefer the libvirt way, & accessing via unix socket with socat & rlwrap.

NOTE: To try each of the above variants, fisrst quit — type quit on the (qemu) shell — the qemu instance running in control mode, reinvoke it, then access it via one of the 3 different ways.


Filed under Uncategorized

Nested virtualization with KVM and Intel on Fedora-18

KVM nested virtualization with Intel finally works for me on Fedora-18. All three layers L0 (physical host) -> L1(regular-guest/guest-hypervisor) -> L2 (nested-guest) are running successfully as of writing this.

Previously, nested KVM virtualization on Intel was discussed here and here. This time on Fedora-18, I was able to successfully boot and use nested guest with resonable performance. (Although, I still have to do more formal tests to show some meaningful performance results).

Test setup information

Config info about the physical host, regular-guest/guest hypervisor and nested-guest. (All of them are Fedora-18; x86_64)

  • Physical Host (Host hypervisor/Bare metal)
    • Node info and some version info
      # virsh nodeinfo
      CPU model:           x86_64
      CPU(s):              4
      CPU frequency:       1995 MHz
      CPU socket(s):       1
      Core(s) per socket:  4
      Thread(s) per core:  1
      NUMA cell(s):        1
      Memory size:         10242692 KiB
      # cat /etc/redhat-release ; uname -r ; arch ; rpm -q qemu-kvm libvirt-daemon-kvm
      Fedora release 18 (Spherical Cow)
  • Regualr Guest (Guest Hypervisor)
    • A 20GB qcow2 disk image w/ cache=’none’ enabled in the libvirt xml
      # virsh nodeinfo
      CPU model:           x86_64
      CPU(s):              4
      CPU frequency:       1994 MHz
      CPU socket(s):       4
      Core(s) per socket:  1
      Thread(s) per core:  1
      NUMA cell(s):        1
      Memory size:         4049888 KiB
      # cat /etc/redhat-release ; uname -r ; arch ; rpm -q qemu-kvm libvirt-daemon-kvm
      Fedora release 18 (Spherical Cow)
  • Nested Guest
    • Config: 2GB Memory; 2 vcpus; 6GB sparse qcow2 disk image

Setting up guest hypervisor and nested guest

Refer the notes linked above to get the nested guest up and running:

  • Create a regular guest/guest-hypervisor —
     # ./create-regular-f18-guest.bash 
  • Expose intel VMX extensions inside the guest-hypervisor by adding the cpu’ attribute to the regular-guest’s libvirt xml file
  • Shutdown regular guest, Redefine it ( virsh define /etc/libvirt/qemu/regular-guest-f18.xml ) ; Start the guest ( virsh start regular-guest-f18 )
  • Now, install virtualization packages inside the guest-hypervisor
  •  # yum install libvirt-daemon-kvm libvirt-daemon-config-network libvirt-daemon-config-nwfilter python-virtinst -y 
  • Start libvirtd service —
     # systemctl start libvirtd.service && systemctl status libvirtd.service  
  • Create a nested guest
     # ./create-nested-f18-guest.bash 

The scripts, and reference libvirt xmls I used for this demonstration are posted on github .

qemu-kvm invocation of bare-metal and guest hypervisors

qemu-kvm invocation of regular guest (guest hypervisor) indicating vmx extensions

# ps -ef | grep -i qemu-kvm | egrep -i 'regular-guest-f18|vmx'
qemu     15768     1 19 13:33 ?        01:01:52 /usr/bin/qemu-kvm -name regular-guest-f18 -S -M pc-1.3 -cpu core2duo,+vmx -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 -uuid 9a7fd95b-7b4c-743b-90de-fa186bb5c85f -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest-f18.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/export/vmimgs/regular-guest-f18.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a6:ff:96,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Running virt-host-validate (it’s part of libvirt-client package) on bare-metal host indicting the host is configured to run KVM

# virt-host-validate 
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking for device /dev/kvm                                         : PASS
  QEMU: Checking for device /dev/vhost-net                                   : PASS
  QEMU: Checking for device /dev/net/tun                                     : PASS
   LXC: Checking for Linux >= 2.6.26                                         : PASS

Networking Info
– The regular guest is using the bare metal host’s bridge device ‘br0’
– The nested guest is using libvirt’s default bridge ‘virbr0’

Caveat : If NAT’d networking is used on both bare metal & guest hypervisor, both, by default have network subnet (unless explicitly changed), and will mangle the networking setup. Bridging on L0 (bare metal host), and NAT on L1 (guest hypervisor) avoids this.


  • Ensure to have serial console enabled in the both L1 and L2 guests, very handy for debugging. If you use the kickstart file mentioned here, it’s taken care of. The magic lines to be added to kernel cmd line are console=tty0 console=ttyS0,115200
  • Once the nested guest was created, I tried to set the hostname and it turns out for some reason ext4 has made the file system read-only :
    	#  hostnamectl set-hostname nested-guest-f18.foo.bar.com
    Failed to issue method call: Read-only file system

    The I see these I/O errors from /var/log/messages:

    Feb 12 04:22:31 localhost kernel: [  724.080207] end_request: I/O error, dev vda, sector 9553368
    Feb 12 04:22:31 localhost kernel: [  724.080922] Buffer I/O error on device dm-1, logical block 33467
    Feb 12 04:22:31 localhost kernel: [  724.080922] Buffer I/O error on device dm-1, logical block 33468

    At this point, I tried to reboot the guest, only to be thrown at a dracut repair shell. I tried fsck a couple of times, & then tried to reboot the nested guest, to no avail. Then I force powered-off the nested-guest:

    #virsh destroy nested-guest-f18

    Now, it boots just fine — just while I was trying to get to the bottom of the I/O errors. I was discussing this behaviour with Rich Jones, and he suggested to try some more I/O activity inside the nested guest to see if I can trigger those errors again.

    # find / -exec md5sum {} \; > /dev/null
    # find / -xdev -exec md5sum {} \; > /dev/null

    After the above commands ran for more than 15 minutes, the I/O errors can’t be triggered any more,

  • A test for libugestfs program (from rwmj) would be on the host & first level guest to compare. The command needs to be ran several times and discard the first few results, to get a hot cache.
    # time guestfish -a /dev/null run' 
  • Another libguestfs test Rich suggested is to disable nested virt and measure guestfish running in the guest to find the speed-up from nested virtualization in contrast to pure software emulation.

Next, to run more useful work loads in these nested vmx guests.

1 Comment

Filed under Uncategorized