Tag Archives: kvm

Documentation of QEMU Block Device Operations

QEMU Block Layer currently (as of QEMU 2.10) supports four major kinds of live block device jobs – stream, commit, mirror, and backup. These can be used to manipulate disk image chains to accomplish certain tasks, e.g.: live copy data from backing files into overlays; shorten long disk image chains by merging data from overlays into backing files; live synchronize data from a disk image chain (including current active disk) to another target image; and point-in-time (and incremental) backups of a block device.
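
To give a flavor of what driving one of these jobs looks like at the QMP level, below is a minimal sketch of issuing a block-stream job and polling its progress. The device name "virtio0" and the job ID are hypothetical examples, not taken from the documentation itself:

{ "execute": "block-stream",
  "arguments": { "device": "virtio0", "job-id": "stream-job0" } }
{ "execute": "query-block-jobs" }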

To that end, recently I have written documentation (thanks to the QEMU Block Layer maintainers & developers for the reviews) of the usage of the following commands:

  • block-stream
  • block-commit
  • drive-mirror (and blockdev-mirror)
  • drive-backup (and blockdev-backup)

For each of the above block device jobs, QMP (QEMU Machine Protocol) invocation examples are documented.

Here’s the source. And here’s the Sphinx-rendered HTML version.

This documentation can be handy in those (debugging) scenarios when it’s instructive to look at what is happening behind the scenes of QEMU. For example, live storage migration (without shared storage setup) is one of the most common use-cases that takes advantage of the QMP drive-mirror command and QEMU’s built-in Network Block Device (NBD) server. Here’s the QMP-level workflow for it — this is the flow libvirt internally implements (with some additional niceties).
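
To make that concrete, here is a rough, abbreviated sketch of the QMP exchange for that use-case. The device name "drive-virtio-disk0", the host "dst-host" and the port 49153 are placeholders; the linked workflow is the authoritative reference:

# On the destination QEMU (started with a blank target image), start the
# built-in NBD server and export the target device:
{ "execute": "nbd-server-start",
  "arguments": { "addr": { "type": "inet",
                           "data": { "host": "dst-host", "port": "49153" } } } }
{ "execute": "nbd-server-add",
  "arguments": { "device": "drive-virtio-disk0", "writable": true } }

# On the source QEMU, mirror the active disk to that NBD export:
{ "execute": "drive-mirror",
  "arguments": { "device": "drive-virtio-disk0",
                 "target": "nbd:dst-host:49153:exportname=drive-virtio-disk0",
                 "sync": "full", "format": "raw", "mode": "existing" } }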


QEMU Advent Calendar 2016

The QEMU Advent Calendar 2016 website features a QEMU disk image each day from 01-DEC to 24-DEC. Each day a new package (in tar.xz format) becomes available for download; it contains a file describing the image (readme.txt or similar) and a little 'run' shell script that starts QEMU with the recommended command-line parameters for the disk image.

“The disk images contain interesting operating systems and software that run under the QEMU emulator. Some of them are well-known or not-so-well-known operating systems, old and new, others are custom demos and neat algorithms.” [From the About section.]

This is brought to you by Thomas Huth (his initial announcement here) and yours truly.


Explore the last five days of images from the 2016 edition here! [Extract the download with, e.g. for Day 05: tar -xf day05.tar.xz]

PS: We still have a few open slots, so please don’t hesitate to get in touch if you have any fun disk image(s) to contribute.


Minimal DevStack with OpenStack Neutron networking

This post discusses a way to set up a minimal DevStack (OpenStack development environment from git sources) with Neutron networking, in a virtual machine.

(a) Setup minimal DevStack environment in a VM

Prepare VM, take a snapshot

Assuming you have a Linux (Fedora 21 or above, or any of the Debian variants) virtual machine set up (with at least 8GB of memory and 40GB of disk space), take a quick snapshot. The below creates a QCOW2 internal snapshot (meaning your disk image should be a QCOW2 image); you can invoke it live or offline:

 $ virsh snapshot-create-as devstack-vm cleanslate

That way, if something goes wrong, you can revert to this clean state by simply doing:

 $ virsh snapshot-revert devstack-vm cleanslate
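
To double-check that the image is indeed QCOW2 and that the snapshot was recorded, something like the below works (the disk image path here is just an example):

 # Confirm the image format is qcow2
 $ qemu-img info /var/lib/libvirt/images/devstack-vm.qcow2

 # List the snapshots recorded for the VM
 $ virsh snapshot-list devstack-vm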

Setup DevStack

There are plenty of configuration variants to set up DevStack. The upstream documentation has its own recommendation for a minimal configuration. The configuration below has a much smaller footprint, enabling only the Nova (Compute, Scheduler, API and Conductor), Keystone, Neutron and Glance (Image service) services.

$ mkdir -p $HOME/sources/cloud
$ git clone https://git.openstack.org/openstack-dev/devstack
$ chmod go+rx $HOME
$ cat << EOF > local.conf
[[local|localrc]]
DEST=$HOME/sources/cloud
DATA_DIR=$DEST/data
SERVICE_DIR=$DEST/status
SCREEN_LOGDIR=$DATA_DIR/logs/
LOGFILE=/$DATA_DIR/logs/devstacklog.txt
VERBOSE=True
USE_SCREEN=True
LOG_COLOR=True
RABBIT_PASSWORD=secret
MYSQL_PASSWORD=secret
SERVICE_TOKEN=secret
SERVICE_PASSWORD=secret
ADMIN_PASSWORD=secret
ENABLED_SERVICES=g-api,g-reg,key,n-api,n-cpu,n-sch,n-cond,mysql,rabbit,dstat,quantum,q-svc,q-agt,q-dhcp,q-l3,q-meta
SERVICE_HOST=127.0.0.1
NETWORK_GATEWAY=10.1.0.1
FIXED_RANGE=10.1.0.0/24
FIXED_NETWORK_SIZE=256
FORCE_CONFIG_DRIVE=always
VIRT_DRIVER=libvirt
# To use nested KVM, un-comment the below line
# LIBVIRT_TYPE=kvm
IMAGE_URLS="http://download.cirros-cloud.net/0.3.3/cirros-0.3.3-x86_64-disk.img"
# If you have the `dnf` package manager, use it to speed up DevStack build/tear down
export YUM=dnf
EOF

NOTE: If you’re using KVM-based virtualization under the hood, refer to this upstream documentation on setting it up with DevStack, so that the VMs in your OpenStack cloud (i.e. Nova instances) run relatively faster than with plain QEMU emulation. So, if you have the relevant hardware, you might want to set that up before proceeding further.

Invoke the install script:

 $ ./stack.sh 

[27MAR2015 Update]: Don’t forget to systemctl enable the below services so they start on reboot — this allows you to successfully start all OpenStack services when you reboot your DevStack VM:

 $ systemctl enable openvswitch mariadb rabbitmq-server 

(b) Configure Neutron networking

Once DevStack installation completes successfully, let’s setup Neutron networking.

Set Neutron router and add security group rules

(1) Source the user tenant (‘demo’ user) credentials:

 $ . openrc demo

(2) Enumerate Neutron security group rules:

$ neutron security-group-list

(3) Create a few environment variables, for convenience, capturing the IDs of the Neutron public network, private network and router:

$ PUB_NET=$(neutron net-list | grep public | awk '{print $2;}')
$ PRIV_NET=$(neutron net-list | grep private | awk '{print $2;}')
$ ROUTER_ID=$(neutron router-list | grep router1 | awk '{print $2;}')

(4) Set the Neutron gateway for the router:

$ neutron router-gateway-set $ROUTER_ID $PUB_NET

(5) Add security group rules to enable ping and ssh:

$ neutron security-group-rule-create --protocol icmp \
    --direction ingress --remote-ip-prefix 0.0.0.0/0 default
$ neutron security-group-rule-create --protocol tcp  \
    --port-range-min 22 --port-range-max 22 --direction ingress default

Boot a Nova instance

Source the ‘demo’ user’s Keystone credentials, add a Nova key pair, and boot an ‘m1.small’ flavored CirrOS instance:

$ . openrc demo
$ nova keypair-add oskey1 > oskey1.priv
$ chmod 600 oskey1.priv
$ nova boot --image cirros-0.3.3-x86_64-disk \
    --nic net-id=$PRIV_NET --flavor m1.small \
    --key_name oskey1 cirrvm1 --security_groups default

Create a floating IP and assign it to the Nova instance

The below sequence of commands enumerates the Nova instances, finds the Neutron port ID for a specific instance, creates a floating IP, and associates it with the Nova instance. Then it enumerates the Nova instances again, so you can see both the floating and fixed IPs for it:

 
$ nova list
$ neutron port-list --device-id $NOVA-INSTANCE-UUID
$ neutron floatingip-create public
$ neutron floatingip-associate $FLOATING-IP-UUID $PORT-ID-OF-NOVA-INSTANCE
$ nova list

A new tenant network creation can be trivially done with a script like this.
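
For reference, a minimal sketch of what such a script does with the (old) neutron CLI; the network name, subnet name and CIDR below are made up for illustration:

$ neutron net-create demo-net2
$ neutron subnet-create --name demo-subnet2 demo-net2 10.2.0.0/24
$ neutron router-interface-add $ROUTER_ID demo-subnet2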

Optionally, test the networking setup by trying to ping or ssh into the CirrOS Nova instance.
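
For example (the floating IP below is hypothetical; CirrOS images accept the 'cirros' user with the key pair injected above):

$ ping -c 3 172.24.4.3
$ ssh -i oskey1.priv cirros@172.24.4.3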


Given the procedural nature of the above, all of this can be trivially scripted to suit your needs — in fact, upstream OpenStack Infrastructure does use such automated DevStack environments to gate (as part of CI) every change that is submitted to any of the various OpenStack projects.

Finally, to find out how minimal it really is, one way to test is to check the memory footprint inside the DevStack VM, using the ps_mem tool, and compare that with a different DevStack environment with more OpenStack services enabled. Edit: a quick memory profile of a minimal DevStack environment is here — 1.3GB of memory without any Nova instances running (but with OpenStack services running).


Cubietruck: QEMU, KVM and Fedora

Rich Jones previously wrote here on how he got KVM working on the Cubietruck — that was in the Fedora 19 timeframe. It wasn’t quite straightforward then: you had to build a custom kernel, a custom U-Boot (Universal Boot Loader), etc.

Recently I got a Cubietruck for testing. The first thing I wanted to try was to boot a Fedora 21 KVM guest. It worked just fine — ensure you supply the -cpu host parameter to the QEMU invocation (Rich actually mentions this at the end of his post linked above). Why? The Cubietruck has a Cortex-A7 processor, for which QEMU doesn’t have specific support, so you need to use the same CPU as the host to boot a KVM guest. Peter Maydell, one of the QEMU upstream maintainers, explains a little more here on the why (a kernel limitation).
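
For illustration only, a hand-rolled invocation would look roughly like the sketch below. The machine type, kernel path and disk image name are assumptions; the point is simply the -cpu host parameter:

$ qemu-system-arm -enable-kvm -M virt -cpu host \
    -m 512 -nographic \
    -kernel zImage -append "console=ttyAMA0 root=/dev/vda" \
    -drive file=f21-guest.img,if=none,id=hd0,format=raw \
    -device virtio-blk-device,drive=hd0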

Below is what I ended up doing to successfully boot a Fedora 21 guest on Cubietruck.

  • I downloaded the Fedora 21-Beta ARM image and wrote it to a microSD card. (I used this procedure.)
  • The above Fedora ARM image didn’t have the KVM module. Josh Boyer on IRC said I might need an LPAE ARM kernel. Installing it (3.17.4-302.fc21.armv7hl+lpae) did it — this kernel is built with the KVM module enabled.
  • Resized the root filesystem of Fedora 21 on the microSD card. (Procedure I used.)
  • Then, I tried a quick test to boot a KVM guest with libguestfs-test-tool. But the guest boot failed with:
    “kvm_init_vcpu failed: Invalid argument”. So, I filed a libguestfs bug to track this. After a bit of trial and error on IRC, I made a small edit to the libguestfs source so that the KVM appliance is created with -cpu host. Rich suggested I submit this as a patch — which resulted in this libguestfs commit (resolving the bug I filed).
  • With this in place, a KVM guest boots successfully. (Complete output of the boot via the libguestfs appliance is here.) Also, I tested importing a CirrOS 0.3.3 disk image into libvirt and running it successfully.


[Trivia: Somehow the USB-to-TTL CP2102 serial converter cable I bought didn’t show boot (or any other) messages when I accessed it via UART (with ‘screen /dev/ttyUSB0 115200‘). Fortunately, the regular cable that shipped with the Cubietruck worked just fine. Still, I’d like to find out what’s up with the CP2102 cable, since it was detected as a device in my systemd logs.]


Notes for building KVM-based virtualization components from upstream git

I frequently need to have the latest KVM, QEMU, libvirt and libguestfs while testing with OpenStack RDO. I either build from the upstream git master branch or from Fedora Rawhide (mostly this suffices). Below I describe the exact sequence I use to build from git. These instructions are available in some form in the README files of the said packages; I’m just noting them here explicitly for convenience. My primary development/test environment is Fedora, but it should be similar on other distributions. (Maybe I should just script it all.)

Build KVM from git

I think it’s worth noting the distinction (from the traditional master branch) of these KVM git branches: remotes/origin/queue and remotes/origin/next. The queue and next branches are the same most of the time, with the distinction that KVM queue is the branch where patches are usually tested before moving them to the KVM next branch. And commits from the next branch are submitted (as a PULL request) to Linus during the next kernel merge window. (I recall this from an old conversation on IRC with Gleb Natapov (thank you), one of the previous KVM maintainers.)

# Clone the repo
$ git clone \
  git://git.kernel.org/pub/scm/virt/kvm/kvm.git

# To test out of tree patches,
# it's cleaner to do in a new branch
$ git checkout -b test_branch

# Make a config file
$ make defconfig

# Compile
$ make -j4 && make bzImage && make modules

# Install
$ sudo -i
$ make modules_install && make install
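
If you want to build from the next branch mentioned above instead of master, the checkout step would look like this (then continue with the same config/compile/install steps):

# List the remote branches, then switch to 'next'
$ git branch -r
$ git checkout -b kvm-next origin/next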

Build QEMU from git

To build QEMU (only x86_64 target) from its git:

# Install build dependencies of QEMU
$ yum-builddep qemu

# Clone the repo
$ git clone git://git.qemu.org/qemu.git

# Create a build directory to isolate source directory 
# from build directory
$ mkdir -p ~/build/qemu && cd ~/build/qemu

# Run the configure script
$ ~/src/qemu/./configure --target-list=x86_64-softmmu \
  --disable-werror --enable-debug 

# Compile
$ make -j4
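
A quick way to sanity-check the freshly built binary, without installing it, is to run it directly from the build directory (layout as produced by the configure invocation above):

$ ./x86_64-softmmu/qemu-system-x86_64 --version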

I previously discussed QEMU building here.

Build libvirt from git

To build libvirt from its upstream git:

# Install build dependencies of libvirt
$ yum-builddep libvirt

# Clone the libvirt repo
$ git clone git://libvirt.org/libvirt.git && cd libvirt

# Create a build directory to isolate source directory
# from build directory
$ mkdir -p ~/build/libvirt && cd ~/build/libvirt

# Run the autogen script
$ ../src/libvirt/autogen.sh

# Compile
$ make -j4

# Run tests
$ make check

# Invoke libvirt programs without having to install them
$ ./run tools/virsh [. . .]

[Or, prepare RPMs and install them]

# Make RPMs (assumes Fedora `rpmbuild` setup
# is properly configured)
$ make rpm

# Install/update
$ yum update *.rpm

Build libguestfs from git

To build libguestfs from its upstream git:

# Install build dependencies of libguestfs
$ yum-builddep libguestfs

# Clone the libguestfs repo
$ git clone git://github.com/libguestfs/libguestfs.git \
   && cd libguestfs

# Run the autogen script
$ ./autogen.sh

# Compile
$ make -j4

# Run tests
$ make check

# Invoke libguestfs programs without having to install them
$ ./run guestfish [. . .]

If you’d prefer libguestfs to use the custom QEMU built from git (as noted above), QEMU wrappers are useful for this.
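
A minimal sketch of such a wrapper is below. The QEMU path assumes the build directory used earlier; depending on the libguestfs version, the environment variable is LIBGUESTFS_HV or, for older releases, LIBGUESTFS_QEMU:

$ cat ~/bin/qemu-wrapper.sh
#!/bin/sh
# Hand all arguments off to the custom-built QEMU
exec $HOME/build/qemu/x86_64-softmmu/qemu-system-x86_64 "$@"

$ chmod +x ~/bin/qemu-wrapper.sh
$ export LIBGUESTFS_HV=$HOME/bin/qemu-wrapper.sh
$ ./run guestfish [. . .]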

As an alternative to building from upstream git, if you’d prefer to build the above components locally from Fedora master, here are some instructions.


Nested Virtualization — KVM, Intel, with VMCS Shadowing

[Previous installments on Nested Virtualization with KVM and Intel.]

This is part of some recent testing that I’ve been doing with upstream KVM (for 3.10.1). The threads linked here have initial tests benchmarking kernel compile times (with make defconfig, a default config file) in L2, and some minimal guestfish appliance start-up timings in L1.

Some details:

  • Setup information to test with VMCS (Virtual Machine Control Structure) Shadowing. In brief, VMCS Shadowing — a processor-specific feature — as described upstream, can reduce the overhead of nested virtualization by reducing the number of VMExits from L1 to L0. (A quick check for whether it is enabled on the host is sketched after this list.)
  • Simple scripts used to create L1 and L2.
  • Libvirt XMLs of L1, L2 guests, for reference.
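
A quick way to check whether the running kvm_intel module has VMCS shadowing enabled (parameter name as exposed by kvm_intel in kernels that support it; assumes the module is loaded):

$ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs
Y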

The gritty details of the reasons for VMExits are described in the Intel architecture manuals, Volume 3b, APPENDIX 1.


Search for a specific patch from an upstream PULL request (KVM) in Fedora Rawhide Kernel

Before using the latest Rawhide kernel for a test, I had to ensure that some specific patches made it in from a KVM upstream git PULL request. This is how I went about it:

Fetch the Rawhide Kernel SRPM

Install the rawhide SRPM:

 
# Move into SRPMS directory
$ cd ~/rpmbuild/SRPMS 

# Fetch the SRPM
$ wget http://kojipkgs.fedoraproject.org//packages/kernel/3.10.0/0.rc0.git23.1.fc20/src/kernel-3.10.0-0.rc0.git23.1.fc20.src.rpm 

# Install the kernel SRPM
$ rpm -ivh kernel-3.10.0-0.rc0.git23.1.fc20.src.rpm 

First, the simpler way

git-describe to the rescue: each Fedora Rawhide kernel changelog has the precise git-describe output of Linus’ tree (thanks jwb!). So, let’s run it on the upstream kernel tree.

 
# Clone Linus' tree (or traverse to its path if it already exists)
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux 

# Run:
$ git describe
v3.9-11789-ge0fd9af

The above output can be understood as follows (from the git-describe man page): the current head of the branch is based on v3.9 and is 11789 commits ahead of it, with the suffix ge0fd9af, where the leading “g” stands for “git” and e0fd9af is the 7-character abbreviation of the tip commit (e0fd9affeb64088eff407dfc98bbd3a5c17ea479).

Now, compare the above tag with the tag mentioned in the changelog of the Fedora Rawhide kernel:

$ rpm -q kernel --changelog | head -2 
* Thu May 09 2013 Josh Boyer  - 3.10.0-0.rc0.git23.1
- Linux v3.9-11789-ge0fd9af

Second, a more convoluted way to ensure a specific patch is in

Check the upstream KVM tree for the latest commit, db6ae6158186a17165ef990bda2895ae7594b039:

# Clone the upstream KVM git
$ git clone git://git.kernel.org/pub/scm/virt/kvm/kvm.git

# Traverse into its directory
$ cd kvm 

# Get the latest commit ID
$ git log tags/kvm-3.10-1 | head -1
db6ae6158186a17165ef990bda2895ae7594b039 

Now, get the details of the above commit. Its short description says: “kvm: Add compat_ioctl for device control API”

$  git log -p db6ae6158186a17165ef990bda2895ae7594b039 
.....
+       .compat_ioctl = kvm_device_ioctl, 

(that’s the first line of change)

Let’s grep for that string in the just-installed kernel source tree:

$ cd ~/rpmbuild/SOURCES/ 

# Copy the patch file archive into a temp directory
$  cp patch-3.9-git23.xz /var/tmp/test/ 

# Extract & list it
$ unxz patch-3.9-git23.xz 
$ ls
patch-3.9-git23 

# grep for that specific line
$ grep ".compat_ioctl = kvm_device_ioctl" patch-3.9-git23 
+       .compat_ioctl = kvm_device_ioctl,

So, the above confirms that the patch (and thus the KVM git PULL request) is contained in the Koji Rawhide build.

And the entire PULL request for KVM updates is merged with this commit ID – 01227a889ed56ae53aeebb9f93be9d54dd8b2de8.


Multiple ways to access QEMU Machine Protocol (QMP)

Once QEMU is built, for a finer understanding of it, or even for plain old debugging, familiarity with QMP (QEMU Machine Protocol) is quite useful. QMP allows applications — like libvirt — to communicate with a running QEMU instance. There are a few different ways to access the QEMU monitor to query the guest, get device (e.g. PCI, block) information, or modify the guest state (useful to understand the block layer operations) using QMP commands. This post discusses a few aspects of it.

Access QMP via libvirt’s qemu-monitor-command
Libvirt has had this capability for a long time, and this is the simplest way. It can be invoked via virsh — on a running guest, in this case called ‘devstack’:

$ virsh qemu-monitor-command devstack \
--pretty '{"execute":"query-kvm"}'
{
    "return": {
        "enabled": true,
        "present": true
    },
    "id": "libvirt-8"
}

In the above example, I ran the simple command query-kvm, which checks whether (1) the host is capable of running KVM and (2) KVM is enabled. Refer below for a list of possible ‘query’ commands.

QMP via telnet
To access the monitor in any other way, we need to have a QEMU instance running in control mode. First, via telnet:

$ ./x86_64-softmmu/qemu-system-x86_64 \
  --enable-kvm -smp 2 -m 1024 \
  /export/images/el6box1.qcow2 \
  -qmp tcp:localhost:4444,server --monitor stdio
QEMU waiting for connection on: tcp:127.0.0.1:4444,server
VNC server running on `127.0.0.1:5900'
QEMU 1.4.50 monitor - type 'help' for more information
(qemu)

And, from a different shell, connect to that listening port 4444 via telnet:

$ telnet localhost 4444

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}

We have to first enable QMP capabilities; this needs to be run before invoking any other commands:

{ "execute": "qmp_capabilities" }

QMP via unix socket
First, invoke the QEMU binary in control mode using QMP, creating a unix socket as below:

$ ./x86_64-softmmu/qemu-system-x86_64 \
  --enable-kvm -smp 2 -m 1024 \
  /export/images/el6box1.qcow2 \
  -qmp unix:./qmp-sock,server --monitor stdio
QEMU waiting for connection on: unix:./qmp-sock,server

There are a few different ways to connect to the above QEMU instance running in control mode, via QMP:

  1. First, via nc:

    $ nc -U ./qmp-sock
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}
    
  2. But with the above, you have to manually enable the QMP capabilities and type each command in JSON syntax. It’s a bit cumbersome, and no history of the commands typed is saved.

  3. Next, a simpler way: a Python script called qmp-shell, located in the QEMU source tree under qemu/scripts/qmp/qmp-shell, hides some details — like manually running qmp_capabilities.

    Connect to the unix socket using the qmp-shell script:

    $ ./qmp-shell ../qmp-sock 
    Welcome to the QMP low-level shell!
    Connected to QEMU 1.4.50
    
    (QEMU) 
    

    Then, just hit the TAB key, and all the possible commands will be listed. To see a list of query commands:

    (QEMU) query-<TAB>
    query-balloon               query-commands              query-kvm                   query-migrate-capabilities  query-uuid
    query-block                 query-cpu-definitions       query-machines              query-name                  query-version
    query-block-jobs            query-cpus                  query-mice                  query-pci                   query-vnc
    query-blockstats            query-events                query-migrate               query-status                
    query-chardev               query-fdsets                query-migrate-cache-size    query-target                
    (QEMU) 
    
  4. Finally, we can also access the unix socket using socat and rlwrap. Thanks to upstream QEMU developer Markus Armbruster for this hint.

    Invoke it this way, and also execute a couple of commands (qmp_capabilities and query-kvm) to view the response from the server.

    $ rlwrap -H ~/.qmp_history \
      socat UNIX-CONNECT:./qmp-sock STDIO
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}
    {"execute":"qmp_capabilities"}
    {"return": {}}
    { "execute": "query-kvm" }
    {"return": {"enabled": true, "present": true}}
    

    Here, ~/.qmp_history contains recently run QMP commands in JSON syntax, and rlwrap adds decent editing capabilities, history search and history persistence. So, once you run all your commands, ~/.qmp_history has a neat stack of all the QMP commands in JSON syntax.

    For instance, this is what my ~/.qmp_history file contains as I write this:

    $ cat ~/.qmp_history
    { "execute": "qmp_capabilities" }
    { "execute": "query-version" }
    { "execute": "query-events" }
    { "execute": "query-chardev" }
    { "execute": "query-block" }
    { "execute": "query-blockstats" }
    { "execute": "query-cpus" }
    { "execute": "query-pci" }
    { "execute": "query-kvm" }
    { "execute": "query-mice" }
    { "execute": "query-vnc" }
    { "execute": "query-spice " }
    { "execute": "query-uuid" }
    { "execute": "query-migrate" }
    { "execute": "query-migrate-capabilities" }
    { "execute": "query-balloon" }
    

To illustrate, I ran a few query commands (noted above), which provide an informative response from the server — no change is made to the state of the guest — so these can be executed safely.

I personally prefer the libvirt way, and accessing the unix socket with socat and rlwrap.

NOTE: To try each of the above variants, first quit the QEMU instance running in control mode (type quit in the (qemu) shell), reinvoke it, and then access it via one of the three different ways.


Nested Virtualization with KVM Intel

Some context: in regular virtualization, your physical Linux host is the hypervisor and runs multiple operating systems. Nested virtualization lets you run a guest inside a regular guest (essentially a guest hypervisor). For AMD, nested support has been available for a while, and some people have reported success with nesting KVM guests. For the Intel arch, support has become available more recently (about a year ago), with some work still in progress, so I thought I’d give it a whirl when Adam Young started a discussion about it in the context of the OpenStack project.

Some of the common use-cases being discussed for nested virtualization:
– For instance, a cloud user gets a beefy, regular guest (which she completely controls). Now, this user can turn the regular guest into a hypervisor and can cheerfully run/manage multiple guests for developing or testing without the hassle and intervention of the cloud provider.
– The possibility of having many instances of a virtualization setup (hypervisor and its guests) on a single bare metal machine.
– The ability to debug and test hypervisor software.

I have immediate access to moderately beefy Intel hardware, and the rest of this post is based on Intel’s CPU virt extensions. Before proceeding, let’s settle on some terminology for clarity:

  • Physical Host (Host hypervisor/Bare metal)
    • Config: Intel(R) Xeon(R) CPU (4 cores/socket); 10GB memory; CPU freq – 2GHz; running latest Fedora 16 (minimal footprint, @core only with virt packages); x86_64; kernel-3.1.8-2.fc16.x86_64
  • Regular Guest (or Guest Hypervisor)
    • Config: 4GB memory; 4 vCPUs; 20GB raw disk image with cache='none' to have decent I/O; minimal @core F16; the same virt packages as the Physical Host; x86_64
  • Nested Guest (guest installed inside the Regular Guest)
    • Config: 2GB memory; 1 vCPU; minimal (@core only) F16; x86_64

Enabling Nesting on the Physical Host

Node Info of the Physical Host.

 
# virsh nodeinfo
CPU model:           x86_64
CPU(s):              4
CPU frequency:       1994 MHz
CPU socket(s):       1
Core(s) per socket:  4
Thread(s) per core:  1
NUMA cell(s):        1
Memory size:         10242864 kB

Let us first ensure the kvm_intel kernel module has nesting enabled. By default, it’s disabled for the Intel arch [but enabled for AMD — SVM (Secure Virtual Machine) extensions].

 
# modinfo kvm_intel | grep -i nested
parm:           nested:bool
# 

And we need to pass kvm-intel.nested=1 on the kernel command line while rebooting the host to enable nesting for the Intel KVM kernel module. This can be verified after boot by doing:

 
# cat /sys/module/kvm_intel/parameters/nested 
Y
# systool -m kvm_intel -v   | grep -i nested
    nested              = "Y"
# 

Alternatively, Adam Young identified that nesting can be enabled by adding the directive options kvm-intel nested=y to the end of the /etc/modprobe.d/dist.conf file and rebooting the host so that it persists.
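
In other words, the file would end with a line like this (taking effect on the next reboot or module reload):

# /etc/modprobe.d/dist.conf
options kvm-intel nested=y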

Set up the Regular Guest (or Guest Hypervisor)
Install a regular guest using virt-install, the oz tool, or any other preferred way. I made a quick script here. Ensure you have cache='none' in the disk attribute of the guest hypervisor’s XML file. (Observation: installing via the virt-install tool didn’t seem to have this option picked by default.) Here is the ‘disk’ element snippet of the libvirt XML:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/regular-guest.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

Now, let’s try to enable Intel VMX (Virtual Machine Extensions) in the regular guest’s CPU. We can do that by running the below on the physical host (aka host hypervisor), adding the ‘cpu’ element to the regular guest’s libvirt XML file, and starting the guest.

# virsh  capabilities | virsh cpu-baseline /dev/stdin 
<cpu match='exact'>
  <model>Penryn</model>
  <vendor>Intel</vendor>
  <feature policy='require' name='dca'/>
  <feature policy='require' name='xtpr'/>
  <feature policy='require' name='tm2'/>
  <feature policy='require' name='vmx'/>
  <feature policy='require' name='ds_cpl'/>
  <feature policy='require' name='monitor'/>
  <feature policy='require' name='pbe'/>
  <feature policy='require' name='tm'/>
  <feature policy='require' name='ht'/>
  <feature policy='require' name='ss'/>
  <feature policy='require' name='acpi'/>
  <feature policy='require' name='ds'/>
  <feature policy='require' name='vme'/>
</cpu>

The output of the above command has a variety of options. Since we need only the vmx extension, I took the simple route by adding the below to the regular guest’s libvirt XML (virsh edit ..) and started it:

<cpu match='exact'>
  <model>core2duo</model>
 <feature policy='require' name='vmx'/>
</cpu>

Thanks to Jiri Denemark for the above hint. Also note that, there is a very detailed and informative post from Dan P Berrange on host/guest CPU models in libvirt.

As we enabled vmx in the guest hypervisor, let’s confirm that vmx is exposed in the emulated CPU by ensuring qemu-kvm is invoked with -cpu core2duo,+vmx:


[root@physical-host ~]# ps -ef | grep qemu-kvm
qemu     17102     1  4 22:29 ?        00:00:34 /usr/bin/qemu-kvm -S -M pc-0.14 
-cpu core2duo,+vmx -enable-kvm -m 3072
-smp 3,sockets=3,cores=1,threads=1 -name f16test1 
-uuid f6219dbd-f515-f3c8-a7e8-832b99a24b5d -nographic -nodefconfig 
-nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/f16test1.monitor,server,nowait 
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-drive file=/export/vmimgs/f16test1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=21,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e6:cc:4e,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
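
Another quick check, from inside the regular guest once it has booted, is to confirm that the vmx flag shows up in its /proc/cpuinfo (the count matches the 3 vCPUs above):

[root@regular-guest ~]# grep -c vmx /proc/cpuinfo
3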

Now, let’s attempt to create a nested guest

Here comes the more interesting part: the nested guest config will be 2GB RAM, 1 vCPU and an 8GB virtual disk. Let’s invoke a virt-install command line with a minimal kickstart install:


[root@regular-guest ~]# virt-install --connect=qemu:///system \
    --network=bridge:virbr0 \
    --initrd-inject=/root/fed.ks \
    --extra-args="ks=file:/fed.ks console=tty0 console=ttyS0,115200 serial rd_NO_PLYMOUTH" \
    --name=nested-guest --disk path=/var/lib/libvirt/images/nested-guest.img,size=6 \
    --ram 2048 \
    --vcpus=1 \
    --check-cpu \
    --hvm \
    --location=http://download.foo.bar.com/pub/fedora/linux/releases/16/Fedora/x86_64/os/ \
    --nographics

Starting install...
Retrieving file .treeinfo...                                                                                                 | 1.7 kB     00:00 ... 
Retrieving file vmlinuz...                                                                                                   | 7.9 MB     00:08 ... 
Retrieving file initrd.img...                               28% [==============                                   ] 647 kB/s |  38 MB     02:25 ETA 

virt-install proceeds fine (to a certain extent), doing all the regular things: getting access to the network, creating devices and file systems, and performing dependency checks. Finally the package install proceeds:


Welcome to Fedora for x86_64



     ┌─────────────────────┤ Package Installation ├──────────────────────┐
     │                                                                   │
     │                                                                   │
     │                                 24%                               │
     │                                                                   │
     │                   Packages completed: 52 of 390                   │
     │                                                                   │
     │ Installing glibc-common-2.14.90-14.x86_64 (112 MB)                │
     │ Common binaries and locale data for glibc                         │
     │                                                                   │
     │                                                                   │
     │                                                                   │
     └───────────────────────────────────────────────────────────────────┘

And now it’s stuck like that forever. It doesn’t budge, trying to install packages for eternity. Let’s look at the state of the guest from a separate terminal:


[root@regular-guest ~]# virsh list
 Id Name                 State
----------------------------------
  1 nested-guest         paused

[root@regular-guest ~]# 
[root@regular-guest ~]#  virsh domstate nested-guest --reason
paused (unknown)

[root@regular-guest ~]# 

So our nested guest seems to be paused, and the package install on the nested guest’s serial console is still hung. I gave up at this point. I need to see if I can get any helpful info with the virt-dmesg tool or any other way to debug this further.

Just to note, there is enough disk space and memory on the ‘regular-guest’, so that case is ruled out here. I also tried to destroy the broken nested guest and attempted to create a fresh one (repeated twice). Still no dice.

So, not much luck yet with the Intel arch; I’ll have to try on an AMD machine.

UPDATE (on Intel arch): After trying a couple of times, I was finally able to ssh to the nested guest, but after a reboot the nested guest loses its IP, rendering it inaccessible. (Info: the regular guest has a bridged IP, and the nested guest has a NATed IP.) And I couldn’t log in via the serial console, as it’s broken due to a regression (which has a workaround). Also, refer to the comments below for further discussion of NATed networking caveats.
UPDATE 2: The correct syntax to be added to /etc/modprobe.d/dist.conf is options kvm-intel nested=y
