Tag Archives: libvirt

Documentation of QEMU Block Device Operations

QEMU Block Layer currently (as of QEMU 2.10) supports four major kinds of live block device jobs – stream, commit, mirror, and backup. These can be used to manipulate disk image chains to accomplish certain tasks, e.g.: live copy data from backing files into overlays; shorten long disk image chains by merging data from overlays into backing files; live synchronize data from a disk image chain (including current active disk) to another target image; and point-in-time (and incremental) backups of a block device.

To that end, I have recently written documentation (thanks to the QEMU Block Layer maintainers & developers for the reviews) on the usage of the following commands:

  • block-stream
  • block-commit
  • drive-mirror (and blockdev-mirror)
  • drive-backup (and blockdev-backup)

For each of the above block device jobs, QMP (QEMU Machine Protocol) invocation examples are documented.
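
For instance, a block-commit invocation at the QMP level looks roughly like the sketch below (the device name and file paths are made up for illustration; see the documentation linked below for the exact, current syntax):

{ "execute": "block-commit",
  "arguments": { "device": "virtio0",
                 "top": "/export/images/sn2.qcow2",
                 "base": "/export/images/base.qcow2" } }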

Here’s the source. And here’s the Sphinx-rendered HTML version.

This documentation can be handy in those (debugging) scenarios when it’s instructive to look at what is happening behind the scenes of QEMU. For example, live storage migration (without shared storage setup) is one of the most common use-cases that takes advantage of the QMP drive-mirror command and QEMU’s built-in Network Block Device (NBD) server. Here’s the QMP-level workflow for it — this is the flow libvirt internally implements (with some additional niceties).
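
For reference, a rough sketch of that QMP-level flow follows; the device name, host, and port are made up for illustration, and the authoritative sequence is in the linked document:

# On the destination: start QEMU with an empty target image,
# then export that image over the built-in NBD server
{ "execute": "nbd-server-start",
  "arguments": { "addr": { "type": "inet",
                 "data": { "host": "dst-host", "port": "49153" } } } }
{ "execute": "nbd-server-add",
  "arguments": { "device": "virtio0", "writable": true } }

# On the source: mirror the active disk to the NBD export
{ "execute": "drive-mirror",
  "arguments": { "device": "virtio0",
                 "target": "nbd:dst-host:49153:exportname=virtio0",
                 "sync": "full", "format": "raw", "mode": "existing" } }

# After the BLOCK_JOB_READY event, migrate the VM state; then end
# the mirroring job on the source and stop the NBD server on the
# destination
{ "execute": "block-job-cancel", "arguments": { "device": "virtio0" } }
{ "execute": "nbd-server-stop" }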


LinuxCon talk slides: “A Practical Look at QEMU’s Block Layer Primitives”

Last week I spent time at LinuxCon (and the co-located KVM Forum) in Toronto. I presented a talk on QEMU’s block layer primitives. Specifically, the QMP primitives block-commit, drive-mirror, drive-backup, and QEMU’s built-in NBD (Network Block Device) server.

Here are the slides.


libvirt blockcommit: shorten disk image chain by live merging the current active disk content

When using QCOW2-based external snapshots, it is desirable to reduce an entire disk image chain to a single disk image, to retain performance, while the guest is running. Upstream QEMU and libvirt have recently acquired the ability to do that. Relevant git commits for QEMU (Jeff Cody) and libvirt (Eric Blake).

This is best illustrated with a quick example.

For simplicity’s sake, let’s start with the below disk image chain for a guest called vm1:

[base] <-- [sn1] <-- [sn2] <-- [current] (live QEMU)

Once the live active block commit operation is complete (step 5 below), the result will be a flattened disk image chain where data from sn1, sn2 and current is live committed into base:

 [base] (live QEMU)

(1) List the current active image in use:

$ virsh domblklist vm1
Target     Source
------------------------------------------------
vda        /export/images/base.qcow2

(2) For a quick test, create an external snapshot. (And repeat the operation two more times, as sketched after the command below, so we have the chain: [base] <-- [sn1] <-- [sn2] <-- [current].)

$ virsh snapshot-create-as \
   --domain vm1 snap1 \
   --diskspec vda,file=/export/images/sn1.qcow2 \
   --disk-only --atomic
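
The two further snapshots would be created the same way, roughly as below; the snapshot names here (snap2, snap3) are just placeholders:

$ virsh snapshot-create-as \
   --domain vm1 snap2 \
   --diskspec vda,file=/export/images/sn2.qcow2 \
   --disk-only --atomic

$ virsh snapshot-create-as \
   --domain vm1 snap3 \
   --diskspec vda,file=/export/images/current.qcow2 \
   --disk-only --atomic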

(3) Enumerate the backing file chain:

$ qemu-img info --backing-chain current.qcow2
[. . .] # output discarded for brevity

(4) Again, check the current active disk image:

$ virsh domblklist vm1
Target     Source
------------------------------------------------
vda        /export/images/current.qcow2

(5) Live-commit the entire chain, including the pivot:

$ virsh blockcommit vm1 vda \
   --active --pivot --verbose
Block Commit: [100 %]
Successfully pivoted

Explanation:

  • --active: Performs a two-stage operation: in the first stage, the contents of the top images (i.e. sn1, sn2, current) are committed into base; in the second stage, the block job remains active to synchronize any further changes from the top images into base. At this point the user can take one of two actions: cancel the job, or pivot the job, i.e. adopt base as the current active image.
  • --pivot: Once data is committed from sn1, sn2 and current into base, pivots the live QEMU over to use base as the active image.
  • --verbose: Displays the progress of the block operation.
  • Finally, the disk image backing chain is shortened to a single disk image.
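
If you’d rather not pivot right away, a possible variation (just a sketch, using the same guest and disk) is to run the commit without --pivot and then finish the job with virsh blockjob:

$ virsh blockcommit vm1 vda --active --wait --verbose

# The job has reached the synchronized phase and stays active;
# finish it either way:
$ virsh blockjob vm1 vda --pivot   # adopt 'base' as the active image
$ virsh blockjob vm1 vda --abort   # or keep the current top image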

(6) Optionally, list the current active image in use. It’s now back to ‘base’, which has all the contents from current, sn2 and sn1:

$ virsh domblklist vm1
Target     Source
------------------------------------------------
vda        /export/images/base.qcow2


libvirt: default network conflicts (not anymore)

Increasingly there’s a need for libvirt networking to work inside a virtual machine that is itself running on the default network (192.168.122.0/24). The immediate practical case where this comes up is while testing nested virtualization: start a guest (L1) with default libvirt networking, and if you then install libvirt on it to run a (nested) guest (L2), there’ll be a routing conflict because of the existing default route — 192.168.122.0/24. Up until now, I avoided this by creating a new libvirt network with a different IP range (or by manually editing the default libvirt network).
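
That manual workaround looks something like the sketch below (the network name, bridge name, and IP range here are arbitrary):

$ cat > /var/tmp/nested-net.xml <<EOF
<network>
  <name>nested</name>
  <forward mode='nat'/>
  <bridge name='virbr1'/>
  <ip address='192.168.123.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.123.2' end='192.168.123.254'/>
    </dhcp>
  </ip>
</network>
EOF
$ virsh net-define /var/tmp/nested-net.xml
$ virsh net-start nested
$ virsh net-autostart nested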

To alleviate this routing conflict, Laine Stump (libvirt developer) now pushed a patch (with a tiny follow up) to upstream libvirt git. (Relevant libvirt bug with discussion.)

I ended up testing the patch last night; it works well.

Assuming your physical host (L0) has the default libvirt network route:

$ ip route show | grep virbr
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1

Now, start a guest (L1) and when you install libvirt (which has the said fix) on it, it notices the existing route of 192.168.122.0/24 and creates the default network on the next free network range (starting its search with 192.168.124.0/24), thus avoiding the routing conflict.

 $ ip route show
  default via 192.168.122.1 dev ens2  proto static  metric 1024 
  192.168.122.0/24 dev ens2  proto kernel  scope link  src 192.168.122.62 
  192.168.124.0/24 dev virbr0  proto kernel  scope link  src 192.168.124.1

Relevant snippet of the default libvirt network (you can notice the new network range):

  $ virsh net-dumpxml default | grep "ip address" -A4
    <ip address='192.168.124.1' netmask='255.255.255.0'>
      <dhcp>
        <range start='192.168.124.2' end='192.168.124.254'/>
      </dhcp>
    </ip>

So, please test it (build RPMs locally from git master, or wait for the next upstream libvirt release, due early October) for your use cases and report bugs, if any.

[Update: On Fedora, this fix is available from version libvirt-1.2.8-2.fc21 onwards.]


Live disk migration with libvirt blockcopy

[08-JAN-2015 Update: Correct the blockcopy CLI and update the final step to re-use the copy to be consistent with the scenario outlined at the beginning. Corrections pointed out by Gary R Cook at the end of the comments.]
[17-NOV-2014 Update: With recent libvirt/QEMU improvements, there is another (relatively faster) way to take a live disk backup, via libvirt blockcommit; here’s an example]

The QEMU and libvirt projects have had a lot of block layer improvements in their last few releases (libvirt 1.2.6 & QEMU 2.1). This post discusses a method to do live disk storage migration with libvirt’s blockcopy.

Context on libvirt blockcopy
Simply put, blockcopy facilitates virtual machine live disk image copying (or mirroring) — primarily useful for different use cases of storage migration:

  • Live disk storage migration
  • Live backup of a disk image and its associated backing chain
  • Efficient non-shared storage migration (with a combination of the virsh operations snapshot-create-as+blockcopy+blockcommit)
  • As of the IceHouse release, the OpenStack Nova project also uses a variation of libvirt blockcopy, via the virDomainBlockRebase API (through its Python bindings), to create live snapshots — nova image-create. (More details on this in an upcoming blog post.)

A blockcopy operation has two phases: (a) all of the source disk content is copied (or mirrored) to the destination; this operation can be canceled to revert to the source disk. (b) Once libvirt gets a signal indicating that source and destination content are equal, the mirroring job remains active until an explicit call to virsh blockjob [. . .] --abort is issued to end the mirroring operation gracefully. If desired, this explicit abort can be avoided by supplying the --finish option. Refer to the virsh manual page for verbose details.
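
As a sketch of phase (b), with a hypothetical guest and paths matching the example later in this post, you could start the copy without --pivot, check on the job, and then end it explicitly:

$ virsh blockcopy --domain testvm vda \
  /export/blockcopy-test/backups/copy.qcow2 \
  --wait --verbose --shallow

# The mirroring job is still alive at this point; inspect it
$ virsh blockjob testvm vda --info

# End it gracefully: pivot to the copy (or use --abort to keep
# using the source disk)
$ virsh blockjob testvm vda --pivot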

Scenario: Live disk storage migration

To illustrate a simple case of live disk storage migration, we’ll use a disk image chain of depth 2:

base <-- snap1 <-- snap2 (Live QEMU) 

Once live blockcopy is complete, the resulting disk image chain ends up as below:

base <-- snap1 <-- snap2
          ^
          |
          '------- copy (Live QEMU, pivoted)

I.e. once the operation finishes, ‘copy’ will share the backing file chain of ‘snap1’ and ‘base’. And, live QEMU is now pivoted to use the ‘copy’.

Prepare disk images, backing chain & define the libvirt guest

[For simplicity, all virtual machine disks are QCOW2 images.]

Create the base image:

 $ qemu-img create -f qcow2 base.qcow2 1G

Edit the base disk image using guestfish: create a partition, make a filesystem, and add a file to the base image so that we can distinguish its contents from those of its QCOW2 overlay disk images:

$ guestfish -a base.qcow2 
[. . .]
><fs> run 
><fs> part-disk /dev/sda mbr
><fs> mkfs ext4 /dev/sda1
><fs> mount /dev/sda1 /
><fs> touch /foo
><fs> ls /
foo
lost+found
><fs> exit

Create a QCOW2 overlay snapshot ‘snap1’, with ‘base’ as its backing file:

$ qemu-img create -f qcow2 -b base.qcow2 \
  -o backing_fmt=qcow2 snap1.qcow2

Add a file to snap1.qcow2:

$ guestfish -a snap1.qcow2 
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> touch /bar
><fs> ls /
bar
foo
lost+found
><fs> exit

Create another QCOW2 overlay snapshot ‘snap2’, with ‘snap1’ as its backing file:

$ qemu-img create -f qcow2 -b snap1.qcow2 \
  -o backing_fmt=qcow2 snap2.qcow2

Add another test file ‘baz’ into snap2.qcow2 using guestfish (a sketch follows) to distinguish the contents of base, snap1 and snap2.
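
That session would look something like this (same pattern as the earlier ones):

$ guestfish -a snap2.qcow2
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> touch /baz
><fs> ls /
bar
baz
foo
lost+found
><fs> exit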

Create a simple libvirt XML file as below, with source file pointing to snap2.qcow2 — which will be the active block device (i.e. it tracks all new guest writes):

$ cat <<EOF > /etc/libvirt/qemu/testvm.xml
<domain type='kvm'>
  <name>testvm</name>
  <memory unit='MiB'>512</memory>   
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/export/vmimages/snap2.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>   
  </devices>
</domain>
EOF

Define the guest and start it:

$ virsh define /etc/libvirt/qemu/testvm.xml
  Domain testvm defined from /etc/libvirt/qemu/testvm.xml
$ virsh start testvm
Domain testvm started

Perform live disk migration
Undefine the running libvirt guest to make it transient[*]:

$ virsh dumpxml --inactive testvm > /var/tmp/testvm.xml
$ virsh undefine testvm

Check the current block device in use before performing the live disk migration:

$ virsh domblklist testvm
Target     Source
------------------------------------------------
vda        /export/vmimages/snap2.qcow2

Optionally, display the backing chain of snap2.qcow2:

$ qemu-img info --backing-chain /export/vmimages/snap2.qcow2
[. . .] # Output removed for brevity

Initiate blockcopy (live disk mirroring):

$ virsh blockcopy --domain testvm vda \
  /export/blockcopy-test/backups/copy.qcow2 \
  --wait --verbose --shallow \
  --pivot

Details of the above command: it creates the copy.qcow2 file at the specified path; performs a --shallow blockcopy (i.e. the ‘copy’ shares the backing chain) of the current block device (vda); and --pivot pivots the live QEMU over to the ‘copy’.

Confirm that QEMU has pivoted to the ‘copy’ by enumerating the current block device in use:

$ virsh domblklist testvm
Target     Source
------------------------------------------------
vda        /export/vmimages/copy.qcow2

Again, display the backing chain of ‘copy’; it should be the resultant chain as noted in the Scenario section above:

$ qemu-img info --backing-chain /export/vmimages/copy.qcow2

Enumerate the contents of copy.qcow2:

$ guestfish -a copy.qcow2 
[. . .]
><fs> run
><fs> mount /dev/sda1 /
><fs> ls /
bar
baz
foo
lost+found
><fs> quit

(Notice above that all the content from base.qcow2, snap1.qcow2, and snap2.qcow2 has been mirrored into copy.qcow2.)

Edit the saved guest XML (/var/tmp/testvm.xml) to use copy.qcow2, and define the guest again:

$ vi /var/tmp/testvm.xml
# Replace the <source file='/export/vmimages/snap2.qcow2'/> 
# with <source file='/export/vmimages/copy.qcow2'/>
[. . .] 

$ virsh define /var/tmp/testvm.xml

[*] Reason for undefining and defining the guest again: as of this writing, QEMU does not yet support persistent dirty bitmaps, which would enable restarting a QEMU process with disk mirroring intact. There have been in-progress patches upstream for a while. Until they land in mainline QEMU, the current approach (as illustrated above) is: temporarily make the running libvirt guest transient, perform the live blockcopy, and make the guest persistent again. (Thanks to Eric Blake, one of libvirt project’s principal developers, for this detail.)


Notes for building KVM-based virtualization components from upstream git

I frequently need to have the latest KVM, QEMU, libvirt and libguestfs while testing with OpenStack RDO. I either build from the upstream git master branch or from Fedora Rawhide (mostly this suffices). Below I describe the exact sequence I follow to build from git. These instructions are available in some form in the README files of the said packages; I’m just noting them here explicitly for convenience. My primary development/test environment is Fedora, but it should be similar on other distributions. (Maybe I should just script it all.)

Build KVM from git

I think it’s worth noting the distinction (from the traditional master branch) of these KVM git branches: remotes/origin/queue and remotes/origin/next. The queue and next branches are the same most of the time, the distinction being that queue is the branch where patches are usually tested before moving them to next. Commits from the next branch are then submitted (as a PULL request) to Linus during the next kernel merge window. (I recall this from an old conversation on IRC with Gleb Natapov (thank you), one of the previous KVM maintainers.)

# Clone the repo
$ git clone \
  git://git.kernel.org/pub/scm/virt/kvm/kvm.git
$ cd kvm

# To test out of tree patches,
# it's cleaner to do in a new branch
$ git checkout -b test_branch

# Make a config file
$ make defconfig

# Compile
$ make -j4 && make bzImage && make modules

# Install
$ sudo -i
$ make modules_install && make install
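
After rebooting into the freshly built kernel, a quick sanity check could look like the below (kvm_intel is for Intel hosts; AMD hosts would show kvm_amd):

# Confirm the new kernel is running
$ uname -r

# Confirm the KVM modules are loaded
$ lsmod | grep kvm
kvm_intel             [. . .]
kvm                   [. . .]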

Build QEMU from git

To build QEMU (only x86_64 target) from its git:

# Install build dependencies of QEMU
$ yum-builddep qemu

# Clone the repo
$ git clone git://git.qemu.org/qemu.git ~/src/qemu

# Create a build directory to isolate source directory 
# from build directory
$ mkdir -p ~/build/qemu && cd ~/build/qemu

# Run the configure script
$ ~/src/qemu/configure --target-list=x86_64-softmmu \
  --disable-werror --enable-debug 

# Compile
$ make -j4
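
A quick way to sanity-check the freshly built binary (run from the build directory):

# Confirm the just-built QEMU runs and reports its version
$ ./x86_64-softmmu/qemu-system-x86_64 --version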

I previously discussed QEMU building here.

Build libvirt from git

To build libvirt from its upstream git:

# Install build dependencies of libvirt
$ yum-builddep libvirt

# Clone the libvirt repo
$ git clone git://libvirt.org/libvirt.git ~/src/libvirt

# Create a build directory to isolate source directory
# from build directory
$ mkdir -p ~/build/libvirt && cd ~/build/libvirt

# Run the autogen script
$ ~/src/libvirt/autogen.sh

# Compile
$ make -j4

# Run tests
$ make check

# Invoke libvirt programs without having to install them
$ ./run tools/virsh [. . .]

[Or, prepare RPMs and install them]

# Make RPMs (assumes Fedora `rpmbuild` setup
# is properly configured)
$ make rpm

# Install/update
$ yum update *.rpm

Build libguestfs from git
To build libguestfs from its upstream git:

# Install build dependencies of libguestfs
$ yum-builddep libguestfs

# Clone the libguestfs repo
$ git clone git://github.com/libguestfs/libguestfs.git \
   && cd libguestfs

# Run the autogen script
$ ./autogen.sh

# Compile
$ make -j4

# Run tests
$ make check

# Invoke libguestfs programs without having to install them
$ ./run guestfish [. . .]

If you’d prefer libguestfs to use the custom QEMU built from git (as noted above), QEMU wrappers are useful here.
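
A minimal wrapper might look like the sketch below; the path to the built binary and the use of the LIBGUESTFS_QEMU environment variable are assumptions on my part, so check the libguestfs documentation for the exact mechanism your version supports:

$ cat > ~/bin/qemu-wrapper.sh <<'EOF'
#!/bin/sh
# Hand everything off to the QEMU binary built from git
exec $HOME/build/qemu/x86_64-softmmu/qemu-system-x86_64 "$@"
EOF
$ chmod +x ~/bin/qemu-wrapper.sh

# Point libguestfs at the wrapper
$ export LIBGUESTFS_QEMU=~/bin/qemu-wrapper.sh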

As an alternative to building from upstream git, if you’d prefer to build the above components locally from Fedora master, here are some instructions.


Configuring Libvirt guests with an Open vSwitch bridge

In the context of OpenStack networking, I was trying to explore Open vSwitch. I felt it was better to go one step back and try it with a pure libvirt guest before trying it with OpenStack networking.

Why Open vSwitch compared to a regular Linux bridge?

  • In short (as Thomas Graf, Kernel networking subsystem developer, put it) — Software Defined Networking (SDN)
  • Open vSwitch’s upstream documentation provides a more detailed explanation.

Here’s a simple scenario: the machine under test has a single physical NIC, obtaining its IP address from DHCP, and runs KVM guests managed via libvirt.

Install Open vSwitch

Install the Open vSwitch package (this is on Fedora 19):

$ yum install openvswitch -y

Enable the openvswitch systemd unit file, and start the daemon:

$ systemctl enable openvswitch.service
$ systemctl start openvswitch.service

Check the status of the Open vSwitch service, to ensure it’s ‘Active’:

$ systemctl status openvswitch.service

Configure Open vSwitch (OVS) bridge
Before you proceed, ensure you have physical access (or access via a serial console) to the machine, because associating a physical interface with an Open vSwitch bridge will result in lost connectivity.
The reasoning is here, under the ‘Configuration problems’ section.

Add an OVS bridge device:

$ ovs-vsctl add-br ovsbr0

Associate the OVS bridge device with eth0 (or em1). (At this point, network connectivity will be lost.)

$ ovs-vsctl add-port ovsbr0 eth0

My host was obtaining its IP address from DHCP, so I first cleared it from the physical interface, and then associated the IP with the Open vSwitch bridge device (ovsbr0):

$ ifconfig eth0 0.0.0.0
$ ifconfig ovsbr0 10.xx.yyy.zzz

I killed the existing dhclient instance on ‘eth0’, and initiated it on ovsbr0:

$ dhclient ovsbr0 &
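
Note that the above ifconfig/dhclient steps do not persist across reboots. One way to make them stick is via network-scripts ifcfg files, sketched below; the exact keywords come from Open vSwitch's ifcfg integration on Fedora/RHEL, so treat them as an assumption and double-check against the openvswitch package's README:

$ cat /etc/sysconfig/network-scripts/ifcfg-ovsbr0
DEVICE=ovsbr0
DEVICETYPE=ovs
TYPE=OVSBridge
OVSBOOTPROTO=dhcp
OVSDHCPINTERFACES=eth0
ONBOOT=yes

$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=ovsbr0
ONBOOT=yes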

List the OVS database contents:

$ ovs-vsctl show
    3dc7f3e3-5872-47c0-ba6f-1cb12065f4d0
        Bridge "ovsbr0"
            Port "eth0"
                Interface "eth0"
            Port "ovsbr0"
                Interface "ovsbr0"
                    type: internal
        ovs_version: "1.10.0"

Update libvirt guest’s bridge source

I have an existing KVM guest, managed by libvirt, with its default network source associated with libvirt’s ‘virbr0’. Let’s modify its network to use the Open vSwitch bridge.

Edit the libvirt guest XML:

$ virsh edit f18vm

The interface element should look as below (take note of the source bridge and virtualport type attributes):

[...]
    <interface type='bridge'>
      <mac address='52:54:00:fb:00:01'/>
      <source bridge='ovsbr0'/>
      <virtualport type='openvswitch'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
[...]

Once the guest XML is edited and saved, dump its contents to stdout; you’ll notice an additional interfaceid attribute added automatically:

    $ virsh dumpxml f18vm | grep bridge -A8
       <interface type='bridge'>
         <mac address='52:54:00:fb:00:01'/>
         <source bridge='ovsbr0'/>
         <virtualport type='openvswitch'>
           <parameters interfaceid='74b6858e-8012-4caa-85c7-b64902a19605'/>
         </virtualport>
         <model type='virtio'/>
         <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
       </interface>
       <serial type='pty'>
         <target port='0'/>

Start the guest, and check if its IP address matches the host subnet:

$ virsh start f18vm --console
$ ifconfig eth0


Multiple ways to access QEMU Machine Protocol (QMP)

Once QEMU is built, to get a finer understanding of it, or even for plain old debugging, familiarity with QMP (QEMU Machine Protocol) is quite useful. QMP allows applications — like libvirt — to communicate with a running QEMU instance. There are a few different ways to access the QEMU monitor to query the guest, get device (e.g. PCI, block) information, or modify the guest state (useful for understanding the block layer operations) using QMP commands. This post discusses a few of them.

Access QMP via libvirt’s qemu-monitor-command
Libvirt has had this capability for a long time, and this is the simplest way. It can be invoked via virsh — here on a running guest called ‘devstack’:

$ virsh qemu-monitor-command devstack \
--pretty '{"execute":"query-kvm"}'
{
    "return": {
        "enabled": true,
        "present": true
    },
    "id": "libvirt-8"
}

In the above example, I ran the simple command query-kvm, which checks whether (1) the host is capable of running KVM, and (2) KVM is enabled. Refer below for a list of possible query commands.

QMP via telnet
To access the monitor in other ways, we need to have the qemu instance running in control mode. First, let’s expose QMP over TCP so we can connect with telnet:

$ ./x86_64-softmmu/qemu-system-x86_64 \
  --enable-kvm -smp 2 -m 1024 \
  /export/images/el6box1.qcow2 \
  -qmp tcp:localhost:4444,server --monitor stdio
QEMU waiting for connection on: tcp:127.0.0.1:4444,server
VNC server running on `127.0.0.1:5900'
QEMU 1.4.50 monitor - type 'help' for more information
(qemu)

And, from a different shell, connect to that listening port 4444 via telnet:

$ telnet localhost 4444

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}

We first have to enable QMP capabilities. This needs to be run before invoking any other command:

{ "execute": "qmp_capabilities" }
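
The server acknowledges with an empty return, after which further commands use the same JSON syntax. For example (response abridged and illustrative):

{"return": {}}
{ "execute": "query-status" }
{"return": {"status": "running", "singlestep": false, "running": true}}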

QMP via unix socket
First, invoke the qemu binary in control mode using qmp, and create a unix socket as below:

$ ./x86_64-softmmu/qemu-system-x86_64 \
  --enable-kvm -smp 2 -m 1024 \
  /export/images/el6box1.qcow2 \
  -qmp unix:./qmp-sock,server --monitor stdio
QEMU waiting for connection on: unix:./qmp-sock,server

A few different ways to connect to the above qemu instance running in control mode, via QMP:

  1. Firstly, via nc :

    $ nc -U ./qmp-sock
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}
    
  2. But, with the above, you have to manually enable the QMP capabilities and type each command in JSON syntax. It’s a bit cumbersome, and no history of the commands typed is saved.

  3. Next, a simpler way — a Python script called qmp-shell is located in the QEMU source tree, under qemu/scripts/qmp/qmp-shell, which hides some details — like manually running qmp_capabilities.

    Connect to the unix socket using the qmp-shell script:

    $ ./qmp-shell ../qmp-sock 
    Welcome to the QMP low-level shell!
    Connected to QEMU 1.4.50
    
    (QEMU) 
    

    Then, just hit the TAB key, and all the possible commands will be listed. To see a list of query commands:

    (QEMU) query-<TAB>
    query-balloon               query-commands              query-kvm                   query-migrate-capabilities  query-uuid
    query-block                 query-cpu-definitions       query-machines              query-name                  query-version
    query-block-jobs            query-cpus                  query-mice                  query-pci                   query-vnc
    query-blockstats            query-events                query-migrate               query-status                
    query-chardev               query-fdsets                query-migrate-cache-size    query-target                
    (QEMU) 
    
  4. Finally, we can also access the unix socket using socat and rlwrap. Thanks to upstream qemu developer Markus Armbruster for this hint.

    Invoke it this way, and execute a couple of commands — qmp_capabilities and query-kvm — to view the responses from the server:

    $ rlwrap -H ~/.qmp_history \
      socat UNIX-CONNECT:./qmp-sock STDIO
    {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 1}, "package": ""}, "capabilities": []}}
    {"execute":"qmp_capabilities"}
    {"return": {}}
    { "execute": "query-kvm" }
    {"return": {"enabled": true, "present": true}}
    

    Where ~/.qmp_history contains recently run QMP commands in JSON syntax, and rlwrap adds decent editing capabilities, recursive search & history. So, once you’ve run all your commands, ~/.qmp_history has a neat stack of all the QMP commands in JSON syntax.

    For instance, this is what my ~/.qmp_history file contains as I write this:

    $ cat ~/.qmp_history
    { "execute": "qmp_capabilities" }
    { "execute": "query-version" }
    { "execute": "query-events" }
    { "execute": "query-chardev" }
    { "execute": "query-block" }
    { "execute": "query-blockstats" }
    { "execute": "query-cpus" }
    { "execute": "query-pci" }
    { "execute": "query-kvm" }
    { "execute": "query-mice" }
    { "execute": "query-vnc" }
    { "execute": "query-spice " }
    { "execute": "query-uuid" }
    { "execute": "query-migrate" }
    { "execute": "query-migrate-capabilities" }
    { "execute": "query-balloon" }
    

To illustrate, I ran a few query commands (noted above) which provide informative responses from the server — no change is made to the state of the guest — so these can be executed safely.

I personally prefer the libvirt way, and accessing via the unix socket with socat & rlwrap.

NOTE: To try each of the above variants, first quit — type quit on the (qemu) shell — the qemu instance running in control mode, reinvoke it, and then access it via one of the three different ways.


OpenStack — nova image-create, under the hood

NOTE (05-OCT-2016): This post is outdated — the current code (in the snapshot() and _live_snapshot() methods in libvirt/driver.py) for cold snapshot (where the guest is paused) & live snapshot has changed quite a bit. In short, they now use the libvirt APIs managedSave(), for cold snapshots, and blockRebase(), for live snapshots.


I was trying to understand what kind of image nova image-create creates. It’s not entirely obvious from its help output, which says — Creates a new image by taking a snapshot of a running server. But what kind of snapshot? Let’s figure it out.

nova image-create operations

The command is invoked as below:

 $  nova image-create fed18 "snap1-of-fed18" --poll 

Drilling into nova’s source code — nova/virt/libvirt/driver.py — this is what image-create does:

  1. If the guest — from which the snapshot is to be taken — is running, nova calls libvirt’s virsh managedsave, which saves and stops a running guest, to be restarted later from the saved state.
  2. Next, it creates a qcow2 internal disk snapshot of the guest (now offline).
  3. Then, it extracts the internal named snapshot from the qcow2 file, exports it to RAW format, and temporarily places it in $instances_path/snapshots.
  4. Deletes the internal named snapshot from the qcow2 file.
  5. Finally, it uploads that image into the OpenStack glance service — which can be confirmed by running glance image-list.

Update: Steps 2 and 4 above are now effectively removed with this upstream change.

A simple test
To get a bit more clarity, let’s try Nova’s actions on a single qcow2 disk — with a running Fedora 18 OS — using libvirt’s shell virsh and QEMU’s qemu-img:

 
# Save the state and stop a running guest
$ virsh managedsave fed18 

# Create a qemu internal snapshot
$ qemu-img snapshot -c snap1 fed18.qcow2 

# Get information about the disk
$ qemu-img info fed18.qcow2 

# Extract the internal snapshot, 
# convert it to raw and export it a file
$ qemu-img convert -f qcow2 -O raw -s \
    snap1 fed18.qcow2 snap1-fed18.img 

# Get information about the new image
# extracted from the snapshot
$ qemu-img info snap1-fed18.img 

# List out file sizes of the original 
# and the snapshot
$ ls -lash fed18.qcow2 snap1-fed18.img

# Delete the internal snapshot 
# from the original disk
$ qemu-img snapshot -d snap1 fed18.qcow2 

# Again, get information of the original disk
$ qemu-img info fed18.qcow2 

# Start the guest again
$ virsh start fed18 

Thanks to Nikola Dipanov for helping me on where to look.

Update: A few things I missed to mention (thanks again for comments from Nikola) — I was using libvirt, kvm as underlying hypervisor technologies, with OpenStack Folsom release.


External (and Live) snapshots with libvirt

Previously, I posted about snapshots here, which briefly discussed different types of snapshots. In this post, let’s explore how external snapshots work. Just to quickly rehash: external snapshots are a type of snapshot where there’s a base image (the original disk image), and its difference/delta (aka the snapshot image) is stored in a new QCOW2 file. Once the snapshot is taken, the original disk image will be in a ‘read-only’ state, which can be used as a backing file for other guests.

It’s worth mentioning here that:

  • The original disk image can be either in RAW format or QCOW2 format. When a snapshot is taken, ‘the difference’ will be stored in a separate QCOW2 file.
  • The virtual machine has to be running, live. Also, with live snapshots, no guest downtime is experienced when a snapshot is taken.
  • At this moment, external (live) snapshots work for ‘disk-only’ snapshots (and not VM state). Work for both disk and VM state (and also for reverting to an external disk snapshot state) is in progress upstream (slated for libvirt-0.10.2).

Before we go ahead, here’s some version info: I’m testing on Fedora 17 (host), and the guest (named ‘testvm’) is running Fedora 18 (Test Compose):

$ rpm -q libvirt qemu-kvm ; uname -r
libvirt-0.10.1-3.fc17.x86_64
qemu-kvm-1.2-0.2.20120806git3e430569.fc17.x86_64
3.5.2-3.fc17.x86_64
$ 

External disk-snapshots (live) using QCOW2 as the original image:
Let’s see an illustration of external (live) disk-only snapshots. First, let’s ensure the guest is running:

$ virsh list
 Id    Name                           State
----------------------------------------------------
 3     testvm                          running


$ 

Then, list all the block devices associated with the guest:

$ virsh domblklist testvm --details
Type       Device     Target     Source
------------------------------------------------
file       disk       vda        /export/vmimgs/testvm.qcow2

$ 

Next, let’s create a disk-only snapshot of the guest this way, while the guest is running:

$ virsh snapshot-create-as testvm snap1-testvm "snap1 description" \
  --diskspec vda,file=/export/vmimgs/snap1-testvm.qcow2 \
  --disk-only --atomic

Some details of the flags used:

  • Passing a ‘--diskspec’ parameter adds the ‘disk’ elements to the snapshot XML file
  • The ‘--disk-only’ parameter takes a snapshot of only the disk
  • ‘--atomic’ ensures the snapshot either runs completely or fails without making any changes

Let’s check the information about the just taken snapshot by running qemu-img:

$ qemu-img info /export/vmimgs/snap1-testvm.qcow2 
image: /export/vmimgs/snap1-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 2.5M
cluster_size: 65536
backing file: /export/vmimgs/testvm.qcow2
$ 

Apart from the above, I created two more snapshots (with the same syntax as above) for illustration purposes. Now the snapshot tree looks like this:

$ virsh snapshot-list testvm --tree

snap1-testvm
  |
  +- snap2-testvm
      |
      +- snap3-testvm
        

$ 

For the above example, the image file chain [base <- snap1 <- snap2 <- snap3] has to be read as: snap3 has snap2 as its backing file, snap2 has snap1 as its backing file, and snap1 has the base image as its backing file. We can see the backing file info from qemu-img:

#--------------------------------------------#
$ qemu-img info /export/vmimgs/snap3-testvm.qcow2
image: /export/vmimgs/snap3-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 129M
cluster_size: 65536
backing file: /export/vmimgs/snap2-testvm.qcow2
#--------------------------------------------#
$ qemu-img info /export/vmimgs/snap2-testvm.qcow2
image: /export/vmimgs/snap2-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 3.6M
cluster_size: 65536
backing file: /export/vmimgs/snap1-testvm.qcow2
#--------------------------------------------#
$ qemu-img info /export/vmimgs/snap1-testvm.qcow2
image: /export/vmimgs/snap1-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 2.5M
cluster_size: 65536
backing file: /export/vmimgs/testvm.qcow2
$
#--------------------------------------------#

Now, if we do not need snap2 any more and want to shorten the chain so that snap1 becomes snap3’s backing file, we can do a virsh blockpull operation as below, which pulls the data contained in snap2 into snap3:

#--------------------------------------------#
$ virsh blockpull --domain testvm \
  --path /export/vmimgs/snap3-testvm.qcow2 \
  --base /export/vmimgs/snap1-testvm.qcow2 \
  --wait --verbose
Block Pull: [100 %]
Pull complete
#--------------------------------------------#

Where --path is the path to the active snapshot file, and --base is the path to the backing file that should remain in the chain; data from the images above it (here, snap2) is pulled into snap3. The backing file chain is thus flattened, leaving snap1 as snap3’s backing file, which can be confirmed by running qemu-img again:

$ qemu-img info /export/vmimgs/snap3-testvm.qcow2
image: /export/vmimgs/snap3-testvm.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 145M
cluster_size: 65536
backing file: /export/vmimgs/snap1-testvm.qcow2
$ 

A couple of things to note here, after discussion with Eric Blake (thank you):

  • If we do a listing of the snapshot tree again (now that the ‘snap2-testvm.qcow2’ backing file is no longer in use),
$ virsh snapshot-list testvm --tree
snap1-testvm
  |
  +- snap2-testvm
      |
      +- snap3-testvm
$

one might wonder: why is snap3 still pointing to snap2? The thing to note here is that the above is the snapshot chain, which is independent of each virtual disk’s backing file chain. So ‘virsh snapshot-list’ is still listing the information accurately as of the time of snapshot creation (and not what we’ve done after creating the snapshots). From the above snapshot tree, if we were to revert to snap1 or snap2 (when reverting to disk snapshots becomes available), it’d still be possible to do that, meaning:

It’s possible to go from this state:
base <- snap1 <- snap3 (data from snap2 pulled into snap3)

we can still revert to:

base <- snap1 (thus undoing the changes in snap2 & snap3)

External disk-snapshots (live) using RAW as the original image:
With external disk-snapshots, the backing file can be RAW as well (unlike ‘internal snapshots’, which only work with QCOW2 files, where the snapshots and deltas are all stored in a single QCOW2 file).

A quick illustration below; the commands are self-explanatory. Note the change (from RAW to QCOW2) in the block disk associated with the guest, before & after taking the disk snapshot (when the virsh domblklist command is executed):

#-------------------------------------------------#
$ virsh list | grep f17btrfs2
 7     f17btrfs2                      running
$
#-------------------------------------------------#
$ qemu-img info /export/vmimgs/f17btrfs2.img
image: /export/vmimgs/f17btrfs2.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 1.5G
$ 
#-------------------------------------------------#
$ virsh domblklist f17btrfs2 --details
Type       Device     Target     Source
------------------------------------------------
file       disk       hda        /export/vmimgs/f17btrfs2.img

$ 
#-------------------------------------------------#
$ virsh snapshot-create-as f17btrfs2 snap1-f17btrfs2 \
  "snap1-f17btrfs2-description" \
  --diskspec hda,file=/export/vmimgs/snap1-f17btrfs2.qcow2 \
  --disk-only --atomic
Domain snapshot snap1-f17btrfs2 created
$ 
#-------------------------------------------------#
$ qemu-img info /export/vmimgs/snap1-f17btrfs2.qcow2
image: /export/vmimgs/snap1-f17btrfs2.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: /export/vmimgs/f17btrfs2.img
$ 
#-------------------------------------------------#
$ virsh domblklist f17btrfs2 --details
Type       Device     Target     Source
------------------------------------------------
file       disk       hda        /export/vmimgs/snap1-f17btrfs2.qcow2
$ 
#-------------------------------------------------#

Also note: all the snapshot XML files, where libvirt tracks the metadata of snapshots, are located under /var/lib/libvirt/qemu/snapshots/$guestname (and the original libvirt XML file is located under /etc/libvirt/qemu/$guestname.xml).
