Tag Archives: OpenStack

Documentation of QEMU Block Device Operations

QEMU Block Layer currently (as of QEMU 2.10) supports four major kinds of live block device jobs – stream, commit, mirror, and backup. These can be used to manipulate disk image chains to accomplish certain tasks, e.g.: live copy data from backing files into overlays; shorten long disk image chains by merging data from overlays into backing files; live synchronize data from a disk image chain (including current active disk) to another target image; and point-in-time (and incremental) backups of a block device.

To that end, I have recently written documentation (thanks to the QEMU Block Layer maintainers & developers for the reviews) of the usage of the following commands:

  • block-stream
  • block-commit
  • drive-mirror (and blockdev-mirror)
  • drive-backup (and blockdev-backup)

For each of the above block device jobs, QMP (QEMU Machine Protocol) invocation examples are documented.

Here’s the source. And here’s the Sphinx-rendered HTML version.

This documentation can be handy in those (debugging) scenarios when it’s instructive to look at what is happening behind the scenes of QEMU. For example, live storage migration (without shared storage setup) is one of the most common use-cases that takes advantage of the QMP drive-mirror command and QEMU’s built-in Network Block Device (NBD) server. Here’s the QMP-level workflow for it — this is the flow libvirt internally implements (with some additional niceties).
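
To give a flavor of it, below is a hedged sketch of the source-side QMP invocation (the device name 'virtio0', the destination host name and the NBD port are illustrative assumptions): the destination QEMU is started paused with an NBD server exporting the target disk, and the source then mirrors into that export:

{ "execute": "drive-mirror",
  "arguments": { "device": "virtio0",
                 "target": "nbd:dest-host:49153:exportname=virtio0",
                 "sync": "full",
                 "format": "raw",
                 "mode": "existing" } }

Once QEMU emits the BLOCK_JOB_READY event, source and destination are in sync; the remaining VM state is then migrated, and the mirroring job is ended with block-job-cancel (or block-job-complete, depending on the workflow).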


QEMU Advent Calendar 2016

The QEMU Advent Calendar 2016 website features a QEMU disk image each day from 01-DEC to 24-DEC. Each day a new package (in tar.xz format) becomes available for download; it contains a file describing the image (readme.txt or similar) and a little run shell script that starts QEMU with the recommended command-line parameters for the disk image.

“The disk images contain interesting operating systems and software that run under the QEMU emulator. Some of them are well-known or not-so-well-known operating systems, old and new, others are custom demos and neat algorithms.” [From the About section.]

This is brought to you by Thomas Huth (his initial announcement here) and yours truly.


Explore the last five days of images from the 2016 edition here! [Extract the download with, e.g. for Day 05: tar -xf day05.tar.xz]

PS: We still have a few open slots, so please don’t hesitate to contact us if you have any fun disk image(s) to contribute.


LinuxCon talk slides: “A Practical Look at QEMU’s Block Layer Primitives”

Last week I spent time at LinuxCon (and the co-located KVM Forum) in Toronto. I presented a talk on QEMU’s block layer primitives; specifically, the QMP primitives block-commit, drive-mirror, drive-backup, and QEMU’s built-in NBD (Network Block Device) server.

Here are the slides.


Minimal DevStack with OpenStack Neutron networking

This post discusses a way to set up a minimal DevStack (OpenStack development environment from git sources) with Neutron networking, in a virtual machine.

(a) Setup minimal DevStack environment in a VM

Prepare VM, take a snapshot

Assuming you have a Linux virtual machine set up (Fedora 21 or above, or any of the Debian variants) with at least 8GB of memory and 40GB of disk space, take a quick snapshot. The command below creates a QCOW2 internal snapshot (which means your disk image should be a QCOW2 image); you can invoke it live or offline:

 $ virsh snapshot-create-as devstack-vm cleanslate

So that if something goes wrong, you can revert to this clean state by simply doing:

 $ virsh snapshot-revert devstack-vm cleanslate

Setup DevStack

There are plenty of configuration variants for setting up DevStack. The upstream documentation has its own recommended minimal configuration. The configuration below has a much smaller footprint, enabling only the Nova (Compute, Scheduler, API and Conductor), Keystone, Neutron and Glance (Image service) services.

$ mkdir -p $HOME/sources/cloud
$ git clone https://git.openstack.org/openstack-dev/devstack
$ chmod go+rx $HOME
$ cd devstack
$ cat << 'EOF' > local.conf
[[local|localrc]]
DEST=$HOME/sources/cloud
DATA_DIR=$DEST/data
SERVICE_DIR=$DEST/status
SCREEN_LOGDIR=$DATA_DIR/logs/
LOGFILE=$DATA_DIR/logs/devstacklog.txt
VERBOSE=True
USE_SCREEN=True
LOG_COLOR=True
RABBIT_PASSWORD=secret
MYSQL_PASSWORD=secret
SERVICE_TOKEN=secret
SERVICE_PASSWORD=secret
ADMIN_PASSWORD=secret
ENABLED_SERVICES=g-api,g-reg,key,n-api,n-cpu,n-sch,n-cond,mysql,rabbit,dstat,quantum,q-svc,q-agt,q-dhcp,q-l3,q-meta
SERVICE_HOST=127.0.0.1
NETWORK_GATEWAY=10.1.0.1
FIXED_RANGE=10.1.0.0/24
FIXED_NETWORK_SIZE=256
FORCE_CONFIG_DRIVE=always
VIRT_DRIVER=libvirt
# To use nested KVM, un-comment the below line
# LIBVIRT_TYPE=kvm
IMAGE_URLS="http://download.cirros-cloud.net/0.3.3/cirros-0.3.3-x86_64-disk.img"
# If you have the `dnf` package manager, use it to speed up DevStack build/tear down
export YUM=dnf
EOF

NOTE: If you’re using KVM-based virtualization under the hood, refer to this upstream documentation on setting it up with DevStack, so that the VMs in your OpenStack cloud (i.e. Nova instances) can run relatively faster than with plain QEMU emulation. So, if you have the relevant hardware, you might want to set that up before proceeding further.
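
A quick sketch of that check (assuming an Intel host; on AMD the module and parameter are kvm_amd/nested), run on the physical host:

 $ cat /sys/module/kvm_intel/parameters/nested
 # If the above prints 'N', enable nested KVM and reload the module
 # (with no VMs running), or simply reboot:
 $ echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm_intel.conf
 $ sudo rmmod kvm_intel && sudo modprobe kvm_intel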

Invoke the install script:

 $ ./stack.sh 

[27MAR2015 Update]: Don’t forget to systemctl enable the below services so they start on reboot; this allows all OpenStack services to come up when you reboot your DevStack VM:

 $ systemctl enable openvswitch mariadb rabbitmq-server 

(b) Configure Neutron networking

Once the DevStack installation completes successfully, let’s set up Neutron networking.

Set Neutron router and add security group rules

(1) Source the user tenant (‘demo’ user) credentials:

 $ . openrc demo

(2) Enumerate Neutron security group rules:

$ neutron security-group-list

(3) Create a couple of environment variables, for convenience, capturing the IDs of the Neutron public network, private network, and router:

$ PUB_NET=$(neutron net-list | grep public | awk '{print $2;}')
$ PRIV_NET=$(neutron net-list | grep private | awk '{print $2;}')
$ ROUTER_ID=$(neutron router-list | grep router1 | awk '{print $2;}')

(4) Set the Neutron gateway for the router:

$ neutron router-gateway-set $ROUTER_ID $PUB_NET

(5) Add security group rules to enable ping and ssh:

$ neutron security-group-rule-create --protocol icmp \
    --direction ingress --remote-ip-prefix 0.0.0.0/0 default
$ neutron security-group-rule-create --protocol tcp  \
    --port-range-min 22 --port-range-max 22 --direction ingress default
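
Optionally, double-check that the rules took effect by listing them (a quick sanity check, not strictly required):

$ neutron security-group-rule-list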

Boot a Nova instance

Source the ‘demo’ user’s Keystone credentials, add a Nova key pair, and boot an ‘m1.small’ flavored CirrOS instance:

$ . openrc demo
$ nova keypair-add oskey1 > oskey1.priv
$ chmod 600 oskey1.priv
$ nova boot --image cirros-0.3.3-x86_64-disk \
    --nic net-id=$PRIV_NET --flavor m1.small \
    --key_name oskey1 cirrvm1 --security_groups default

Create a floating IP and assign it to the Nova instance

The below sequence of commands lists the Nova instances, finds the Neutron port ID for a specific instance, creates a floating IP and associates it with the Nova instance, and then lists the Nova instances again so you can see both the floating and fixed IPs for it:

$ nova list
$ neutron port-list --device-id $NOVA-INSTANCE-UUID
$ neutron floatingip-create public
$ neutron floatingip-associate $FLOATING-IP-UUID $PORT-ID-OF-NOVA-INSTANCE
$ nova list

A new tenant network can be created trivially with a script like this.
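
Purely as an illustration, here is a rough sketch of what such a script might do (the names demo-net2/demo-subnet2 and the 10.2.0.0/24 range are hypothetical):

$ . openrc demo
$ neutron net-create demo-net2
$ neutron subnet-create --name demo-subnet2 demo-net2 10.2.0.0/24
$ neutron router-interface-add router1 demo-subnet2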

Optionally, test the networking setup by trying to ping or ssh into the CirrOS Nova instance.
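
For example (a sketch; $FLOATING_IP stands for the floating IP you created above):

$ ping -c 3 $FLOATING_IP
$ ssh -i oskey1.priv cirros@$FLOATING_IP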


Given the procedural nature of the above, all of this can be trivially scripted to suit your needs. In fact, upstream OpenStack Infrastructure uses such automated DevStack environments to gate (as part of CI) every change submitted to any of the various OpenStack projects.

Finally, to find out how minimal it really is, one way is to check the memory footprint inside the DevStack VM using the ps_mem tool, and compare it with a different DevStack environment that has more OpenStack services enabled. Edit: a quick memory profile of a minimal DevStack environment is here: 1.3GB of memory without any Nova instances running (but with all the OpenStack services running).
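
A rough sketch of such a check (assuming you have fetched the ps_mem.py script from its upstream git repository):

$ sudo ./ps_mem.py
[. . .] # output trimmed; the summary line at the bottom shows total RAM in use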


Notes from a talk on “Advanced snapshots with libvirt and QEMU”

I just did a talk at a small local conference (Infrastructure.Next, co-located with cfgmgmtcamp.eu) about Advanced Snapshots in libvirt and QEMU.

Slides are here; they contain URLs to more examples.

And there are some more related notes here.


LinuxCon/KVMForum/CloudOpen Eu 2014

While the Linux Foundation’s colocated events (LinuxCon/KVMForum/CloudOpen, Plumbers and a bunch of others) are still in progress in Düsseldorf, Germany, I thought I’d quickly write a note here.

Some slides and demo notes on managing snapshots/disk image chains with libvirt/QEMU. And, some additional examples with a bit of commentary. (Thanks to Eric Blake, of libvirt project, for reviewing some of the details there.)


libvirt blockcommit: shorten disk image chain by live merging the current active disk content

When using QCOW2-based external snapshots, it is desirable to reduce an entire disk image chain to a single disk, to retain performance and keep the chain manageable, while the guest is running. Upstream QEMU and libvirt have recently acquired the ability to do that. Relevant git commits for QEMU (Jeff Cody) and libvirt (Eric Blake).

This is best illustrated with a quick example.

For simplicity’s sake, let’s start with the below disk image chain for a guest called vm1:

[base] <-- [sn1] <-- [sn2] <-- [current] (live QEMU)

Once the live active block commit operation is complete (step 5 below), the result will be a flattened disk image chain where data from sn1, sn2 and current is live-committed into base:

 [base] (live QEMU)

(1) List the current active image in use:

$ virsh domblklist vm1
Target     Source
------------------------------------------------
vda        /export/images/base.qcow2

(2) For a quick test, create an external snapshot. (Then repeat the operation two more times, so that we have the chain: [base] <-- [sn1] <-- [sn2] <-- [current].)

$ virsh snapshot-create-as \
   --domain vm1 snap1 \
   --diskspec vda,file=/export/images/sn1.qcow2 \
   --disk-only --atomic

(3) Enumerate the backing file chain:

$ qemu-img info --backing-chain /export/images/current.qcow2
[. . .] # output discarded for brevity

(4) Again, check the current active disk image:

$ virsh domblklist vm1
Target     Source
------------------------------------------------
vda        /export/images/current.qcow2

(5) Perform a live active commit of the entire chain, including pivot:

$ virsh blockcommit vm1 vda \
   --active --pivot --verbose
Block Commit: [100 %]
Successfully pivoted

Explanation:

  • --active: performs a two-stage operation. In the first stage, the contents of the top images (sn1, sn2, current) are committed into base; in the second stage, the block job remains awake to synchronize any further changes from the top images into base. At this point the user can take one of two actions: cancel the job, or pivot it, i.e. make base the current active image (see the sketch after this list).
  • --pivot: once data is committed from sn1, sn2 and current into base, pivots the live QEMU to use base as the active image.
  • --verbose: displays the progress of the block operation.
  • Finally, the disk image backing chain is shortened to a single disk image.
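
If you’d rather not pivot automatically, the two stages can be driven by hand. A minimal sketch (for the same guest vm1 and disk vda): start the commit without --pivot, then finish it explicitly with virsh blockjob:

$ virsh blockcommit vm1 vda --active --wait --verbose
$ virsh blockjob vm1 vda --pivot    # or: --abort, to cancel and keep using 'current'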

(6) Optionally, list the current active image in use; it’s now back to ‘base’, which has all the contents from current, sn2 and sn1:

$ virsh domblklist vm1
Target     Source
------------------------------------------------
vda        /export/images/base.qcow2


libvirt: default network conflicts (not anymore)

Increasingly there’s a need for libvirt networking to work inside a virtual machine that is itself running on the default network (192.168.122.0/24). The immediate practical case where this comes up is while testing nested virtualization: start a guest (L1) with default libvirt networking, and if you need to install libvirt on it again to run a (nested) guest (L2), there will be a routing conflict because of the existing default route, 192.168.122.0/24. Up until now, I tried to avoid this by creating a new libvirt network with a different IP range (or by manually editing the default libvirt network).
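
For reference, that manual workaround looks roughly like this inside the L1 guest (a sketch; the 192.168.123.0/24 range is just an arbitrary unused choice):

$ virsh net-edit default
# in the editor, change the <ip address='192.168.122.1' .../> element and the
# DHCP <range> to an unused subnet, e.g. 192.168.123.0/24
$ virsh net-destroy default && virsh net-start default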

To alleviate this routing conflict, Laine Stump (libvirt developer) now pushed a patch (with a tiny follow up) to upstream libvirt git. (Relevant libvirt bug with discussion.)

I ended up testing the patch last night; it works well.

Assuming your physical host (L0) has the default libvirt network route:

$ ip route show | grep virbr
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1

Now, start a guest (L1) and when you install libvirt (which has the said fix) on it, it notices the existing route of 192.168.122.0/24 and creates the default network on the next free network range (starting its search with 192.168.124.0/24), thus avoiding the routing conflict.

 $ ip route show
  default via 192.168.122.1 dev ens2  proto static  metric 1024 
  192.168.122.0/24 dev ens2  proto kernel  scope link  src 192.168.122.62 
  192.168.124.0/24 dev virbr0  proto kernel  scope link  src 192.168.124.1

Relevant snippet of the default libvirt network (you can notice the new network range):

  $ virsh net-dumpxml default | grep "ip address" -A4
    <ip address='192.168.124.1' netmask='255.255.255.0'>
      <dhcp>
        <range start='192.168.124.2' end='192.168.124.254'/>
      </dhcp>
    </ip>

So, please test it (build RPMs locally from git master, or wait for the next upstream libvirt release, due in early October) for your use cases and report bugs, if any.

[Update: On Fedora, this fix is available from version libvirt-1.2.8-2.fc21 onwards.]


Live disk migration with libvirt blockcopy

[08-JAN-2015 Update: Correct the blockcopy CLI and update the final step to re-use the copy to be consistent with the scenario outlined at the beginning. Corrections pointed out by Gary R Cook at the end of the comments.]
[17-NOV-2014 Update: With recent libvirt/QEMU improvements, there is another (relatively faster) way to take a live disk backup, via libvirt blockcommit; here’s an example.]

The QEMU and libvirt projects have had a lot of block layer improvements in their last few releases (libvirt 1.2.6 and QEMU 2.1). This post discusses a method to do live disk storage migration with libvirt’s blockcopy.

Context on libvirt blockcopy
Simply put, blockcopy facilitates virtual machine live disk image copying (or mirroring) — primarily useful for different use cases of storage migration:

  • Live disk storage migration
  • Live backup of a disk image and its associated backing chain
  • Efficient non-shared storage migration (with a combination of the virsh operations snapshot-create-as + blockcopy + blockcommit)
  • As of the IceHouse release, the OpenStack Nova project also uses a variation of libvirt blockcopy, via the libvirt Python API virDomainBlockRebase, to create live snapshots (nova image-create). (More details on this in an upcoming blog post.)

A blockcopy operation has two phases: (a) all of the source disk content is copied (or mirrored) to the destination; this operation can be canceled to revert to the source disk; (b) once libvirt gets a signal indicating that source and destination content are equal, the mirroring job remains awake until an explicit call to virsh blockjob [. . .] --abort is issued to end the mirroring operation gracefully. If desired, this explicit abort call can be avoided by supplying the --finish option. See the virsh manual page for verbose details.
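
To make the two phases concrete, here is a minimal sketch (the guest name vm1, disk vda and the destination path are placeholders): start the mirroring without --pivot, then end it explicitly once the job reports it is ready:

$ virsh blockcopy --domain vm1 vda /export/backups/copy.qcow2 --wait --verbose
$ virsh blockjob vm1 vda --info    # reports job progress; shows 100 % once fully mirrored
$ virsh blockjob vm1 vda --abort   # gracefully end mirroring; the guest keeps using the source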

Scenario: Live disk storage migration

To illustrate a simple case of live disk storage migration, we’ll use a disk image chain of depth 2:

base <-- snap1 <-- snap2 (Live QEMU) 

Once live blockcopy is complete, the resulting status of disk image chain ends up as below:

base <-- snap1 <-- snap2
          ^
          |
          '------- copy (Live QEMU, pivoted)

I.e. once the operation finishes, ‘copy’ will share the backing file chain of ‘snap1’ and ‘base’. And, live QEMU is now pivoted to use the ‘copy’.

Prepare disk images, backing chain & define the libvirt guest

[For simplicity, all virtual machine disks are QCOW2 images.]

Create the base image:

 $ qemu-img create -f qcow2 base.qcow2 1G

Edit the base disk image using guestfish: create a partition, make a filesystem, and add a file to the base image so that we can distinguish its contents from its QCOW2 overlay disk images:

$ guestfish -a base.qcow2 
[. . .]
><fs> run 
><fs> part-disk /dev/sda mbr
><fs> mkfs ext4 /dev/sda1
><fs> mount /dev/sda1 /
><fs> touch /foo
><fs> ls /
foo
><fs> exit

Create another QCOW2 overlay snapshot ‘snap1’, with backing file as ‘base’:

$ qemu-img create -f qcow2 -b base.qcow2 \
  -o backing_fmt=qcow2 snap1.qcow2

Add a file to snap1.qcow2:

$ guestfish -a snap1.qcow2 
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> touch /bar
><fs> ls /
bar
foo
lost+found
><fs> exit

Create another QCOW2 overlay snapshot ‘snap2’, with backing file as ‘snap1’:

$ qemu-img create -f qcow2 -b snap1.qcow2 \
  -o backing_fmt=qcow2 snap2.qcow2

Add another test file ‘baz’ into snap2.qcow2 using guestfish (refer to previous examples above) to distinguish contents of base, snap1 and snap2.
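
That session follows the same pattern as before; a sketch:

$ guestfish -a snap2.qcow2
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> touch /baz
><fs> ls /
bar
baz
foo
lost+found
><fs> exit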

Create a simple libvirt XML file as below, with source file pointing to snap2.qcow2 — which will be the active block device (i.e. it tracks all new guest writes):

$ cat <<EOF > /etc/libvirt/qemu/testvm.xml
<domain type='kvm'>
  <name>testvm</name>
  <memory unit='MiB'>512</memory>   
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/export/vmimages/snap2.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>   
  </devices>
</domain>
EOF

Define the guest and start it:

$ virsh define /etc/libvirt/qemu/testvm.xml
  Domain testvm defined from /etc/libvirt/qemu/testvm.xml
$ virsh start testvm
Domain testvm started

Perform live disk migration
Undefine the running libvirt guest to make it transient[*]:

$ virsh dumpxml --inactive testvm > /var/tmp/testvm.xml
$ virsh undefine testvm

Check the current block device before performing the live disk migration:

$ virsh domblklist testvm
Target     Source
------------------------------------------------
vda        /export/vmimages/snap2.qcow2

Optionally, display the backing chain of snap2.qcow2:

$ qemu-img info --backing-chain /export/vmimages/snap2.qcow2
[. . .] # Output removed for brevity

Initiate blockcopy (live disk mirroring):

$ virsh blockcopy --domain testvm vda \
  /export/blockcopy-test/backups/copy.qcow2 \
  --wait --verbose --shallow \
  --pivot

Details of the above command: it creates the copy.qcow2 file in the specified path; performs a --shallow blockcopy (i.e. the ‘copy’ shares the backing chain) of the current block device (vda); and --pivot pivots the live QEMU to use the ‘copy’.
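
Relatedly, if you want the destination to live on specific pre-provisioned storage, virsh blockcopy also accepts --reuse-external; a hedged sketch (paths follow the example above), where the copy is pre-created with the same backing file so that --shallow still applies:

$ qemu-img create -f qcow2 -b /export/vmimages/snap1.qcow2 \
  -o backing_fmt=qcow2 /export/blockcopy-test/backups/copy.qcow2
$ virsh blockcopy --domain testvm vda \
  /export/blockcopy-test/backups/copy.qcow2 \
  --wait --verbose --shallow --pivot --reuse-external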

Confirm that QEMU has pivoted to the ‘copy’ by enumerating the current block device in use:

$ virsh domblklist testvm
Target     Source
------------------------------------------------
vda        /export/vmimages/copy.qcow2

Again, display the backing chain of ‘copy’; it should be the resultant chain as noted in the Scenario section above:

$ qemu-img info --backing-chain /export/vmimages/copy.qcow2

Enumerate the contents of copy.qcow2:

$ guestfish -a copy.qcow2 
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> ls /
bar
baz
foo
lost+found
><fs> quit

(Notice above that all the content from base.qcow2, snap1.qcow2, and snap2.qcow2 has been mirrored into copy.qcow2.)

Edit the saved libvirt guest XML (/var/tmp/testvm.xml) to use copy.qcow2, and define the guest again:

$ $EDITOR /var/tmp/testvm.xml
# Replace the <source file='/export/vmimages/snap2.qcow2'/>
# with <source file='/export/vmimages/copy.qcow2'/>
[. . .]

$ virsh define /var/tmp/testvm.xml

[*] Reason for undefining and defining the guest again: as of this writing, QEMU does not yet support persistent dirty bitmaps, which would enable restarting a QEMU process with disk mirroring intact. There have been some in-progress patches upstream for a while. Until they are in mainline QEMU, the current approach (as illustrated above) is: temporarily make the running libvirt guest transient, perform the live blockcopy, and make the guest persistent again. (Thanks to Eric Blake, one of libvirt project’s principal developers, for this detail.)


On bug reporting. . .

[This has been sitting in my drafts for a while; time to hit send. A long time ago I wrote something like this to a Red Hat internal mailing list, and thought I’d put it out here too.]

A week ago, I was participating in one of the regular upstream OpenStack bug triage sessions (for the Nova project) to keep up with the insane flow of bugs. All open source projects are grateful for reported bugs. But more often than not, a good chunk of bugs in such large projects ends up being bereft of details.

Writing a good bug report is hard. Writing a coherent, reproducible bug report is much harder (and takes more time), especially when you’re involved in complex cloud projects like OpenStack, where you may have to care about everything from the kernel to user space.

A few things to consider including in a bug report. The following are all dead-obvious points (and are usually part of a high-level template in issue trackers like Bugzilla); that said, however many times they are repeated, it’s never enough:

  • Summary. A concise summary — is it a bug? an RFE? a tracker-bug?
  • Description. A clear description of the issue and its symptoms in chronological order.
  • Version details. E.g. Havana? IceHouse? Hypervisor details, API versions, and anything else that’s relevant.
  • Crystal clear details to reproduce the bug. Bonus points for a reproducer script.
  • Test environment details. (This is crucial.)
    • Most “cloud” software testing is dearly dependent on the test environment. The clearer the details, the fewer the round-trips between developers, test engineers, packagers, triagers, etc.
    • Note down if any special hardware is needed for testing, e.g. an exotic NAS, etc.
    • If you altered anything — config files, especially code located in /usr/lib/pythonx.x/site-packages/, or any other dependent lower-layer project configurations (e.g. libvirt’s config file), please say so.
  • Actual results. Post the exact, unedited details of what happens when the bug in question is triggered.
  • Expected results. Describe the desired behavior.
  • Additional investigative details (where appropriate).
    • If you’ve done a lot of digging into an issue, writing a detailed summary (even better: a blog post) while it’s fresh in memory is very useful. Include additional info such as configuration settings, caveats, relevant log fragments, the stderr of a script or command being executed, and trace details, all of which are useful for archival purposes (and years later, the context will come in very handy).

Why?

  • Useful for new test engineers who do not have all the context of a bug.
  • Useful for documentation writers to help them write correct errata text/release notes.
  • Useful for non-technical folks reading the bugs/RFEs. Clear information saves a heck of a lot of time.
  • Useful for folks like product and program managers who aren’t always in the trenches.
  • Useful for downstream support organizations.
  • Should there be a regression years later, having all the info to test/reproduce right there in the bug makes your day!
  • Reduces needless round-trips of NEEDINFO.
  • Useful for new users referring to these bugs in a different context.

Overall, it makes for a very fine bug-report reading experience.

You get the drift!
