Author Archives: kashyapc

Building qemu from upstream git source

From time to time, I test upstream qemu to try out specific areas I’m interested in (lately, block layer operations). Here’s how I build qemu from upstream git source for a single target, x86_64.

Install the development tools and a couple of other development libraries (I’m using a Fedora 18 machine here):

$ sudo yum groupinstall "Development Tools"
$ sudo yum install glib2-devel zlib-devel -y

Clone the upstream qemu source tree:

$  git clone git://git.qemu.org/qemu.git  
$  cd qemu 

Run the configure script for the x86_64 target, then run make:

$ ./configure --target-list=x86_64-softmmu \
--disable-werror --enable-debug 
$  make -j5 

Note: when I first invoked the configure script, it failed with an error asking me to either install the pixman package or fetch its git submodule.

I fetched the pixman submodule, then re-ran configure and make:

$ git submodule update --init pixman
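
Putting these steps in the order that avoids the configure error, a small build helper could look like the sketch below. It is my own wrapper, not part of qemu. It assumes you run it from the top of the qemu checkout and that a system pixman would be visible to pkg-config as `pixman-1`; DRY_RUN is a variable I made up so the commands are printed instead of run. It also uses `make -j$(nproc)` instead of a fixed -j5.

```shell
#!/bin/sh
# Sketch: build upstream qemu for the x86_64-softmmu target only, fetching the
# bundled pixman submodule first when no system pixman is found.
# Assumptions: run from the top of a qemu git checkout; DRY_RUN is a variable of
# this sketch (not a qemu option) that makes it print the commands instead.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_qemu_x86_64() {
    # pixman-1 is the pkg-config module name a pixman-devel package would install (assumption).
    if ! command -v pkg-config >/dev/null 2>&1 || ! pkg-config --exists pixman-1; then
        run git submodule update --init pixman || return 1
    fi
    run ./configure --target-list=x86_64-softmmu --disable-werror --enable-debug || return 1
    run make -j"$(nproc)"
}
```

For example, `cd qemu && DRY_RUN=1 build_qemu_x86_64` shows the commands, and `build_qemu_x86_64` runs them.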

If you have an OS disk image handy, you can invoke the newly built qemu binary to boot a guest with its serial console on stdio, assuming the guest has a serial console enabled, i.e. console=tty0 console=ttyS0,115200 on its kernel command line (in /etc/grub2.cfg):

$ ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm \
-smp 2 -m 1024 /export/images/fedora18.qcow2 \
-nographic

Note: with the above invocation, the --enable-kvm option makes qemu use the kvm kernel module already loaded on the host system.


OpenStack nova — dealing with unused base images

Nova uses disk image cache management to handle unused base disk images, validate their integrity, etc. In this entry, let’s find out how unused base images are cleaned up. For context on OpenStack’s libvirt base images, take a look at Pádraig’s excellent write-up on the life of an OpenStack libvirt image, and at a prior related post on disk image usage by nova.

Set the relevant config directives
Let’s set periodic_interval=1, the number of seconds between runs of periodic tasks, and image_cache_manager_interval=1, the number of periodic scheduler ticks to wait between runs of the image cache manager. I’ve set both of them to 1 for testing purposes, and grepped the file to make sure the change is there:


#-----#
[tbox ~keystone_user1]$ echo -e "periodic_interval=1\nimage_cache_manager_interval=1" | sudo tee -a /etc/nova/nova.conf > /dev/null
#-----#
[tbox ~keystone_user1]$ sudo egrep '^image_cache_manager_interval|^periodic_interval' /etc/nova/nova.conf 
periodic_interval=1
image_cache_manager_interval=1
[tbox ~keystone_user1]$ 
#-----#
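
Appending with echo adds a duplicate line every time the step is repeated. A small helper that replaces an existing active `key=value` line, or appends one if the key is absent, keeps nova.conf tidy across repeated experiments. This is a sketch of my own, not a nova tool; like `>>`, it appends new keys at the end of the file, i.e. in whatever section comes last.

```shell
#!/bin/sh
# Sketch: set "key=value" in an ini-style file such as /etc/nova/nova.conf,
# replacing an active line for that key or appending one if there is none.
# Commented-out lines ("#key=...") are left untouched. Values containing
# '|' or '&' would need escaping for the sed expression below.
set_conf_opt() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        # '|' as the s/// delimiter, since values may contain '/'.
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Make sure the file ends with a newline before appending.
        if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Run as root for the real file, e.g. `set_conf_opt /etc/nova/nova.conf periodic_interval 1`.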

Restart all openstack nova services, so the config changes take effect:


[root@tbox ~(keystone_user1)]# for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep running ; done | awk '{print $1}'` ; do service $j restart ; done
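
The one-liner above is dense. A more readable sketch of the same loop is below. It assumes, as the original does, that each openstack-nova-* init script prints a status line whose first word is the service name (e.g. "openstack-nova-api (pid 1234) is running..."); I match "is running" rather than "running" so that a "not running" line is skipped. RESTART_CMD is a hypothetical knob of this sketch so it can be dry-run with echo.

```shell
#!/bin/sh
# Sketch: restart every openstack-nova-* SysV service whose status says "is running".
# Like the original one-liner, it takes the service name from the first word of
# that status line, e.g. "openstack-nova-api (pid  1234) is running...".
# RESTART_CMD is a hypothetical knob of this sketch (default: service), so that
# RESTART_CMD=echo gives a dry run.
restart_running_nova_services() {
    local initdir="${1:-/etc/init.d}" restart_cmd="${RESTART_CMD:-service}" script name
    for script in "$initdir"/openstack-nova-*; do
        [ -x "$script" ] || continue   # also skips the unexpanded glob if nothing matches
        name=$("$script" status 2>/dev/null | awk '/is running/ { print $1; exit }')
        if [ -n "$name" ]; then
            "$restart_cmd" "$name" restart
        fi
    done
}
```

As root, `restart_running_nova_services` restarts the services; `RESTART_CMD=echo restart_running_nova_services` only prints what it would do.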

Delete a running nova instance, verify logs to see the effect of config changes

Let’s stop and delete the only running nova instance, so that nova’s image cache manager knows the image is no longer in use and the config options enabled above can take effect.


#-----#
[tbox ~keystone_user1]$ nova list
+--------------------------------------+---------+--------+-------------------+
| ID                                   | Name    | Status | Networks          |
+--------------------------------------+---------+--------+-------------------+
| 11c1997a-b1ff-46ef-b96c-f329576de8d6 | fedora4 | ACTIVE | net1=10.xx.yyy.zz |
+--------------------------------------+---------+--------+-------------------+
[tbox ~keystone_user1]$ nova stop 11c1997a-b1ff-46ef-b96c-f329576de8d6
#-----#
[tbox ~keystone_user1]$ nova delete 11c1997a-b1ff-46ef-b96c-f329576de8d6
[tbox ~keystone_user1]$
#-----#
[tbox ~keystone_user1]$ nova list 
#-----#
[tbox ~keystone_user1]$ sudo virsh list --all
 Id    Name                           State
----------------------------------------------------

List the base images (they should now be unused):


[tbox ~keystone_user1]$ sudo ls -lash /var/lib/nova/instances/_base/
total 1.7G
4.0K drwxr-xr-x. 2 nova nova 4.0K Feb 18 00:41 .
4.0K drwxr-xr-x. 3 nova nova 4.0K Feb 18 04:01 ..
739M -rw-r--r--. 1 nova nova 9.8G Feb 18 04:01 06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3
741M -rw-r--r--. 1 nova nova  20G Feb 18 04:01 06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
241M -rw-r--r--. 1 nova nova 241M Feb 18 00:41 06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3.part
[tbox ~keystone_user1]$ 

At this point, the now-unused base images should’ve been deleted. Let’s look at nova’s compute log, /var/log/nova/compute.log:


.
.
2012-02-18 04:13:03 37747 INFO nova.compute.resource_tracker [-] Compute_service record updated for interceptor.foo.bar.com 
2012-02-18 04:13:03 37747 WARNING nova.virt.libvirt.imagecache [-] Unknown base file: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3
2012-02-18 04:13:03 37747 WARNING nova.virt.libvirt.imagecache [-] Unknown base file: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
2012-02-18 04:13:03 37747 INFO nova.virt.libvirt.imagecache [-] Removable base files: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3 /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
[tbox ~keystone_user1]$

A few minutes later, it says the base files are too young to remove:


.
.
2012-02-18 04:16:46 37747 INFO nova.virt.libvirt.imagecache [-] Base file too young to remove: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3
2012-02-18 04:16:46 37747 INFO nova.virt.libvirt.imagecache [-] Base file too young to remove: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
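
Rather than paging through compute.log, the image cache manager’s decisions can be pulled out with a filter along these lines. It only relies on the log format visible in the excerpts above (date, time, pid, level, "nova.virt.libvirt.imagecache [-]", message); other nova releases may format or word these lines differently.

```shell
#!/bin/sh
# Sketch: summarise nova image cache manager decisions from a compute log.
# Prints "<time> <message>" for each imagecache line, e.g.
#   04:16:46 Base file too young to remove: /var/lib/nova/instances/_base/...
imagecache_events() {
    local log="${1:-/var/log/nova/compute.log}"
    grep 'nova.virt.libvirt.imagecache' "$log" |
        sed -n 's/^[0-9-]* \([0-9:]*\) .*nova\.virt\.libvirt\.imagecache \[-\] \(.*\)$/\1 \2/p'
}
```

For example, `sudo sh -c '. ./imagecache.sh; imagecache_events'` (with the function saved as imagecache.sh) lists every "Unknown", "Removable", "too young" and "Removing" decision in order.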

Let’s drill down a bit further by searching the upstream nova code for how young ‘too young’ is:


kashyap@test$ git clone git://github.com/openstack/nova.git && cd nova
kashyap@nova$ grep FLAGS.remove_unused_original_minimum_age_seconds nova/virt/libvirt/imagecache.py
            maxage = FLAGS.remove_unused_original_minimum_age_seconds
kashyap@nova$
kashyap@nova$ git grep remove_unused_original_minimum_age_seconds
etc/nova/nova.conf.sample:# remove_unused_original_minimum_age_seconds=86400
nova/virt/libvirt/imagecache.py:    cfg.IntOpt('remove_unused_original_minimum_age_seconds',
nova/virt/libvirt/imagecache.py:            maxage = FLAGS.remove_unused_original_minimum_age_seconds
kashyap@nova$
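
Roughly, the "too young" decision compares a base file’s age (now minus its modification time) with such a threshold. From my reading of the Folsom-era imagecache code, resized base files (the `_20`-style ones) are compared against remove_unused_resized_minimum_age_seconds instead, which I believe defaults to 3600 seconds; treat both that default and the suffix rule below as assumptions. A shell sketch of the check under those assumptions:

```shell
#!/bin/sh
# Sketch of the age test behind "Base file too young to remove".
# Assumptions: names ending in _<digits> (e.g. ..._20) are resized copies and use
# the resized threshold; everything else uses the original-image threshold.
# 86400 s is the default from nova.conf.sample above; 3600 s for resized
# images is my understanding of nova's default, not verified here.
ORIGINAL_MIN_AGE="${ORIGINAL_MIN_AGE:-86400}"
RESIZED_MIN_AGE="${RESIZED_MIN_AGE:-3600}"

# Exit status 0 if the file is old enough to remove, 1 if too young, 2 on error.
base_file_old_enough() {
    local file="$1" now mtime age maxage
    now=$(date +%s)
    mtime=$(stat -c %Y -- "$file") || return 2
    age=$((now - mtime))
    if printf '%s\n' "$file" | grep -Eq '_[0-9]+$'; then
        maxage=$RESIZED_MIN_AGE
    else
        maxage=$ORIGINAL_MIN_AGE
    fi
    [ "$age" -ge "$maxage" ]
}
```

For example, `ORIGINAL_MIN_AGE=60 base_file_old_enough /var/lib/nova/instances/_base/<hash>` mimics the 60-second setting used next.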

OK, 86400 seconds (24 hours) appears to be the default. Let’s set that value to 60 seconds in nova.conf:


#-----#
[tbox ~keystone_user1]$ echo "remove_unused_original_minimum_age_seconds=60" | sudo tee -a /etc/nova/nova.conf > /dev/null
#-----#
[tbox ~keystone_user1]$ sudo egrep '^image_cache_manager_interval|^periodic_interval|^remove_unused_original_minimum_age_seconds' /etc/nova/nova.conf 
periodic_interval=1
image_cache_manager_interval=1
remove_unused_original_minimum_age_seconds=60
[tbox ~keystone_user1]$ 
#-----#

Restart all nova services again:


[tbox ~keystone_user1]$ sudo -i
[root@tbox ~(keystone_user1)]# for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep running ; done | awk '{print $1}'` ; do service $j restart ; done

Now, /var/log/nova/compute.log shows one of the base files being removed, which is good:


.
.
2012-02-18 04:24:17 41389 WARNING nova.virt.libvirt.imagecache [-] Unknown base file: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
2012-02-18 04:24:17 41389 INFO nova.virt.libvirt.imagecache [-] Removable base files: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3 /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
2012-02-18 04:24:17 41389 INFO nova.virt.libvirt.imagecache [-] Removing base file: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3

Let’s check the nova base images directory. Two base images still remain:


[tbox ~keystone_user1]$ sudo ls -1 /var/lib/nova/instances/_base/
06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3.part
[tbox ~keystone_user1]$ 

Back to nova’s compute.log: it still says ‘Unknown base file’ and ‘Base file too young to remove’:


.
.
2012-02-18 04:50:45 41389 WARNING nova.virt.libvirt.imagecache [-] Unknown base file: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
2012-02-18 04:50:45 41389 INFO nova.virt.libvirt.imagecache [-] Removable base files: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
2012-02-18 04:50:45 41389 INFO nova.virt.libvirt.imagecache [-] Base file too young to remove: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20

Checking nova’s compute.log again, it now shows another of the remaining unused base images being removed, about 10 minutes later judging from the time-stamps:


.
.
2012-02-18 05:01:24 41389 INFO nova.compute.resource_tracker [-] Compute_service record updated for interceptor.foo.bar.com 
2012-02-18 05:01:24 41389 WARNING nova.virt.libvirt.imagecache [-] Unknown base file: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
2012-02-18 05:01:24 41389 INFO nova.virt.libvirt.imagecache [-] Removable base files: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
2012-02-18 05:01:24 41389 INFO nova.virt.libvirt.imagecache [-] Removing base file: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20

Now, the only remaining file is a .part image:


[tbox ~keystone_user1]$ sudo ls -1 /var/lib/nova/instances/_base/
06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3.part
[tbox ~keystone_user1]$

From further investigation by Pádraig, the stale .part files turned out to be a different bug (for which he posted a fix upstream): when converting a non-raw image (qcow2, in this case) to raw, libvirt leaves the original non-raw image behind on disk.

Finally
Let’s set the config directives back to sane defaults: restore remove_unused_original_minimum_age_seconds to 86400, and comment out periodic_interval and image_cache_manager_interval. Then restart all nova services so the config change takes effect.


#-----#
[tbox ~keystone_user1]$ sudo sed -i 's/^remove_unused_original_minimum_age_seconds=60/remove_unused_original_minimum_age_seconds=86400/' /etc/nova/nova.conf
#-----#
[tbox ~keystone_user1]$ sudo sed -i 's/^periodic_interval=1/#periodic_interval=1/;s/^image_cache_manager_interval=1/#image_cache_manager_interval=1/' /etc/nova/nova.conf
#-----#
[root@tbox ~(keystone_user1)]# for j in `for i in $(ls -1 /etc/init.d/openstack-nova-*) ; do $i status | grep running ; done | awk '{print $1}'` ; do service $j restart ; done
#-----#
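
To undo test settings like these without writing a separate sed pattern per option, a generic helper can comment out any active key. A sketch of my own (not part of nova); lines already commented out are left alone, so it is safe to re-run:

```shell
#!/bin/sh
# Sketch: comment out every active "key=..." line for each key given.
# Already-commented lines ("#key=...") do not match "^key=", so they are untouched.
comment_out_opts() {
    local file="$1" key
    shift
    for key in "$@"; do
        sed -i "s|^${key}=|#${key}=|" "$file"
    done
}
```

For example, as root: `comment_out_opts /etc/nova/nova.conf periodic_interval image_cache_manager_interval remove_unused_original_minimum_age_seconds`, which leaves nova on its built-in defaults for all three.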

UPDATE: I also had to set remove_unused_resized_minimum_age_seconds=60 in nova.conf to remove the resized raw images (resized according to the chosen flavor, m1.small in this case) from the _base directory.


A couple of caveats while setting up devstack on Fedora-18

It took more than a couple of attempts to get a running devstack instance on Fedora-18. I was following danpb’s notes. Here are a few tweaks I had to make to get it working.

This is the localrc file I had before proceeding:


$ cat localrc 
DESTDIR=$HOME/src/openstack
DATA_DIR=$DESTDIR/data

LOGFILE=$DATA_DIR/logs/stack.log
SCREEN_LOGDIR=$DATA_DIR/logs

# Switch to use QPid instead of RabbitMQ 
disable_service rabbit
enable_service qpid

# Replace with your primary interface name
HOST_IP_IFACE=eth0
PUBLIC_INTERFACE=eth0
VLAN_INTERFACE=eth0
FLAT_INTERFACE=eth0

# Replace with whatever password you wish to use
MYSQL_PASSWORD=testpwd
SERVICE_TOKEN=testpwd
SERVICE_PASSWORD=testpwd
ADMIN_PASSWORD=testpwd

# Pre-populate glance with a minimal image and a Fedora 17 image
IMAGE_URLS="http://launchpad.net/cirros/trunk/0.3.0/+download/cirros-0.3.0-x86_64-uec.tar.gz,http://berrange.fedorapeople.org/images/2012-11-15/f17-x86_64-openstack-sda.qcow2"
$ 

After running stack.sh, you might see some failures, such as the tgtd service failing to start. Below is what I had in my tgtd.conf and stack.conf:


====
# rpm -q scsi-target-utils
scsi-target-utils-1.0.32-2.fc18.x86_64
====
$ cat /etc/tgt/tgtd.conf 
# The default config file
include /etc/tgt/targets.conf

# Config files from other packages etc.
#include /etc/tgt/conf.d/*.conf

# Explicitly import the stack.conf file
include /etc/tgt/conf.d/stack.conf
====
$ cat /etc/tgt/conf.d/stack.conf 
include /home/tuser1/src/openstack/data/cinder/volumes/*
$
====

Then restart the tgtd service:


$ sudo systemctl restart tgtd.service
$ systemctl status tgtd.service
tgtd.service - tgtd iSCSI target daemon
          Loaded: loaded (/usr/lib/systemd/system/tgtd.service; disabled)
          Active: active (running) since Wed 2013-02-27 06:28:13 EST; 5s ago
         Process: 26826 ExecStop=/usr/sbin/tgtadm --op delete --mode system (code=exited, status=0/SUCCESS)
         Process: 26822 ExecStop=/usr/sbin/tgt-admin --update ALL -c /dev/null (code=exited, status=0/SUCCESS)
         Process: 26820 ExecStop=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
         Process: 26888 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v ready (code=exited, status=0/SUCCESS)
         Process: 26882 ExecStartPost=/usr/sbin/tgt-admin -e -c $TGTD_CONFIG (code=exited, status=0/SUCCESS)
         Process: 26880 ExecStartPost=/usr/sbin/tgtadm --op update --mode sys --name State -v offline (code=exited, status=0/SUCCESS)
        Main PID: 26879 (tgtd)
          CGroup: name=systemd:/system/tgtd.service
                  └─26879 /usr/sbin/tgtd -f

Feb 27 06:28:12 devstack-fedenglabfoobarcom tgtd[26879]: librdmacm: Warning: couldn't read ABI version.
Feb 27 06:28:12 devstack-fedenglabfoobarcom tgtd[26879]: librdmacm: Warning: assuming: 4
Feb 27 06:28:12 devstack-fedenglabfoobarcom tgtd[26879]: librdmacm: Fatal: unable to get RDMA device list
Feb 27 06:28:12 devstack-fedenglabfoobarcom tgtd[26879]: tgtd: iser_ib_init(3376) Failed to initialize RDMA; load kernel modules?
Feb 27 06:28:12 devstack-fedenglabfoobarcom tgtd[26879]: tgtd: work_timer_start(146) use timer_fd based scheduler
Feb 27 06:28:12 devstack-fedenglabfoobarcom tgtd[26879]: tgtd: bs_init(313) use signalfd notification
Feb 27 06:28:13 devstack-fedenglabfoobarcom systemd[1]: Started tgtd iSCSI target daemon.

Source the openrc file, which does a few things: it sets up the keystone authentication credentials, nova’s compute API version, etc.:


$ . openrc 
$ 

Let’s list images in Glance:


$ glance image-list
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+
| ID                                   | Name                            | Disk Format | Container Format | Size      | Status |
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+
| fa1adc47-1ff3-4394-99b3-e782a27762b0 | cirros-0.3.0-x86_64-uec         | ami         | ami              | 25165824  | active |
| b6137e19-fe48-40a1-81d8-ccbbae6bfc2b | cirros-0.3.0-x86_64-uec-kernel  | aki         | aki              | 4731440   | active |
| 3780a3bd-e3ad-4323-908e-8f22789e44e1 | cirros-0.3.0-x86_64-uec-ramdisk | ari         | ari              | 2254249   | active |
| 0c445018-f23c-4e5a-876b-912ea8f6c636 | f17-x86_64-openstack-sda        | qcow2       | bare             | 251985920 | active |
+--------------------------------------+---------------------------------+-------------+------------------+-----------+--------+
$ 

Add a new key pair:


$ nova keypair-add oskey > oskey.priv
$ ls
accrc    eucarc      exercises    extras.d  functions    lib      localrc  oskey.priv  rejoin-stack.sh  stackrc         stack.sh  tools
AUTHORS  exerciserc  exercise.sh  files     HACKING.rst  LICENSE  openrc   README.md   samples          stack-screenrc  tests     unstack.sh
$ 

Boot an instance with the m1.tiny flavor:


$ nova boot --key-name oskey --image f17-x86_64-openstack-sda  --flavor m1.tiny f17demo1                                                                           
+-----------------------------+--------------------------------------+
| Property                    | Value                                |
+-----------------------------+--------------------------------------+
| status                      | BUILD                                |
| updated                     | 2013-02-27T11:58:58Z                 |
| OS-EXT-STS:task_state       | scheduling                           |
| key_name                    | oskey                                |
| image                       | f17-x86_64-openstack-sda             |
| hostId                      |                                      |
| OS-EXT-STS:vm_state         | building                             |
| flavor                      | m1.tiny                              |
| id                          | 5eaf23b5-3558-4b07-8195-9d24734beced |
| security_groups             | [{u'name': u'default'}]              |
| user_id                     | f28638e924a645e1b69650b8acc99283     |
| name                        | f17demo1                             |
| adminPass                   | e9LZqPFdBvjq                         |
| tenant_id                   | 9b54eaecedb149de9d1d48c43210d463     |
| created                     | 2013-02-27T11:58:58Z                 |
| OS-DCF:diskConfig           | MANUAL                               |
| metadata                    | {}                                   |
| accessIPv4                  |                                      |
| accessIPv6                  |                                      |
| progress                    | 0                                    |
| OS-EXT-STS:power_state      | 0                                    |
| OS-EXT-AZ:availability_zone | None                                 |
| config_drive                |                                      |
+-----------------------------+--------------------------------------+
$ 

Now, list the running nova instances:

 
$ nova list
+--------------------------------------+----------+--------+----------+
| ID                                   | Name     | Status | Networks |
+--------------------------------------+----------+--------+----------+
| 5eaf23b5-3558-4b07-8195-9d24734beced | f17demo1 | ERROR  |          |
+--------------------------------------+----------+--------+----------+
$ 

Oh, it says ERROR. Let’s take a look at the logs.

NOTE1: Log files to look at: screen-n-cpu.log and screen-n-sch.log (located in $HOME/src/openstack/data/logs, the SCREEN_LOGDIR set in the localrc above).
NOTE2: Use ‘less -r’ to view these log files, so they’re presented in a readable format. Otherwise, control characters (ESC, carriage return) are displayed in caret notation, which is unpleasant to look at.

It turns out I had forgotten to set the execute bit on my $HOME directory, so let’s set it:


$ sudo chmod +x /home/tuser1/
$ 

Let’s delete the old instance, boot a new one with the same flavor (m1.tiny, with 512 MB of memory), and list the running nova instances to make sure it’s ACTIVE:

 
#----------#
$ nova delete 5eaf23b5-3558-4b07-8195-9d24734beced                                                                                                                 
$ 
#----------#
$ nova boot --key-name oskey --image f17-x86_64-openstack-sda  --flavor m1.tiny f17demo1
#----------#
$ nova list
+--------------------------------------+----------+--------+------------------+
| ID                                   | Name     | Status | Networks         |
+--------------------------------------+----------+--------+------------------+
| 5710ba30-499e-4c2a-872c-327217e229c6 | f17demo1 | ACTIVE | private=10.0.0.3 |
+--------------------------------------+----------+--------+------------------+
$ 
#----------#
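
When scripting these steps, it can help to wait for an instance to leave BUILD instead of re-running nova list by hand. A sketch that pulls the Status column for a given instance name out of nova list’s table output; the table layout is taken from the output above, while the helper names, the 5-second interval and the retry count are my own choices.

```shell
#!/bin/sh
# Sketch: read `nova list` table output on stdin and print the Status column of
# the row whose Name column equals $1 (cells are split on '|').
instance_status() {
    awk -F'|' -v name="$1" '
        NF >= 5 {
            n = $3; s = $4
            gsub(/^ +| +$/, "", n)
            gsub(/^ +| +$/, "", s)
            if (n == name) { print s; exit }
        }'
}

# Poll `nova list` until the instance is ACTIVE or ERROR; give up after $2 tries
# (default 30), sleeping 5 seconds between tries. Prints the final status.
wait_for_instance() {
    local name="$1" tries="${2:-30}" status
    while [ "$tries" -gt 0 ]; do
        status=$(nova list | instance_status "$name")
        case $status in
            ACTIVE|ERROR) echo "$status"; return 0 ;;
        esac
        tries=$((tries - 1))
        sleep 5
    done
    return 1
}
```

With openrc sourced, `wait_for_instance f17demo1` would print ERROR for the first attempt above and ACTIVE for the second.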

List the running openstack services:

 
$ openstack-status 
== Support services ==
mysqld:                       active (disabled on boot)
libvirtd:                     active
qpidd:                        active
$ 


Nova’s way of using a disk image when it boots a guest for the first time

Let’s see how Nova, the compute component of OpenStack, uses a disk image from the time it is initially imported into Glance (OpenStack’s repository for virtual machine disk images) until a new virtual machine instance is booted and running.

I started by downloading a Fedora-17 disk image from here (note the size of the image: 241 MB):


[tuser1@interceptor ~(keystone_user1)]$ ls -lash f17-x86_64-openstack-sda.qcow2 
241M -rw-rw-r--. 1 tuser1 tuser1 241M Jan 13  2012 f17-x86_64-openstack-sda.qcow2
[tuser1@interceptor ~(keystone_user1)]$ 

Import the above Fedora 17 disk image into Glance:


[tuser1@interceptor ~(keystone_admin)]$ glance image-create --name="fedora-17" --is-public=true \
--disk-format=qcow2 --container-format bare < f17-x86_64-openstack-sda.qcow2


(Note that I had to source the Keystone admin credentials to list images in Glance.)

Let’s list the current images in Glance:


[tuser1@interceptor ~(keystone_admin)]$ glance image-list
+--------------------------------------+-----------+-------------+------------------+------------+--------+
| ID                                   | Name      | Disk Format | Container Format | Size       | Status |
+--------------------------------------+-----------+-------------+------------------+------------+--------+
| 1e6292f9-82bd-4cdb-969e-c863cb1c6692 | fedora-17 | qcow2       | bare             | 251985920  | active |
| acc4c853-9153-4e80-b3c8-e253451ae983 | rhel63    | qcow2       | bare             | 1074135040 | active |
+--------------------------------------+-----------+-------------+------------------+------------+--------+
[tuser1@interceptor ~(keystone_admin)]$ 

List the currently running Nova instances:


[tuser1@interceptor ~(keystone_user1)]$ nova list
+--------------------------------------+-----------+--------+-------------------+
| ID                                   | Name      | Status | Networks          |
+--------------------------------------+-----------+--------+-------------------+
| 08d616a9-87a1-4c0d-b986-7d6aa5ed6780 | fedora-t1 | ACTIVE | net1=ww.xx.yyy.zz |
| 3e487977-37e8-4f26-9443-d65ecbdf83c9 | fedora-t2 | ACTIVE | net1=ww.xx.yyy.zz |
| 48d9e518-a91f-48db-9d9b-965b243e7113 | fedora-t4 | ACTIVE | net1=ww.xx.yyy.zz |
+--------------------------------------+-----------+--------+-------------------+
[tuser1@interceptor ~(keystone_user1)]$ 

Let’s also list the running guests using libvirt’s virsh:


[tuser1@interceptor ~(keystone_user1)]$ sudo virsh list
 Id    Name                           State
----------------------------------------------------
 12    instance-0000000c              running
 13    instance-0000000d              running
 22    instance-00000012              running

[tuser1@interceptor ~(keystone_user1)]$ 

Find the block device in use for one of the running instances to examine it further — instance-0000000c, in this case:


[tuser1@interceptor ~(keystone_user1)]$ sudo virsh domblklist instance-0000000c
Target     Source
------------------------------------------------
vda        /var/lib/nova/instances/instance-0000000c/disk

[tuser1@interceptor ~(keystone_user1)]$ 

Let’s get information about the disk in use by the above nova instance; specifically, find its backing file:


[tuser1@interceptor ~(keystone_user1)]$ qemu-img info /var/lib/nova/instances/instance-0000000c/disk
image: /var/lib/nova/instances/instance-0000000c/disk
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 149M
cluster_size: 65536
backing file: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20

Now, get information about the “backing file” used by the overlay (used by the running Nova instance above):


[tuser1@interceptor ~(keystone_user1)]$ qemu-img info /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
image: /var/lib/nova/instances/_base/06a057b9c7b0b27e3b496f53d1e88810a0d1d5d3_20
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 740M
[tuser1@interceptor ~(keystone_user1)]$ 
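
To follow an overlay to its base image without copying paths by hand, the "backing file:" line of qemu-img info can be parsed, as in the sketch below (helper names are mine). Newer qemu-img releases can print the whole chain themselves with `qemu-img info --backing-chain`; the sketch only relies on the output format shown above.

```shell
#!/bin/sh
# Sketch: follow an image's backing-file chain using `qemu-img info`.
# backing_file_of reads qemu-img info text on stdin and prints the path after
# "backing file: " (dropping a trailing " (actual path: ...)" if present).
backing_file_of() {
    sed -n 's/^backing file: \(.*\)$/\1/p' | sed 's/ (actual path: .*)$//'
}

# Prints the image, its backing file, that file's backing file, and so on.
# Assumes absolute backing-file paths, as in the nova output above.
backing_chain() {
    local img="$1"
    while [ -n "$img" ]; do
        echo "$img"
        img=$(qemu-img info "$img" | backing_file_of)
    done
}
```

For the instance above, `sudo sh -c '. ./chain.sh; backing_chain /var/lib/nova/instances/instance-0000000c/disk'` (with the functions saved as chain.sh) should print the overlay followed by the _base file.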


It’s worth noting that the disk image originally uploaded into Glance was a qcow2 image, whereas the base image used by Nova is a raw disk image.

From this, we can see that when booting a virtual machine instance for the first time, Nova does a few things:

  1. Makes a copy of the original qcow2 disk image from Glance, converts it into a sparse ‘raw’ image, and makes it a base image (located in /var/lib/nova/instances/_base). For the reason non-raw images are turned into raw, see the “Background info” section below.
  2. Expands the base image to 20GB, because I used Nova’s m1.small flavor when initially booting the image.
  3. Uses this base image as the backing file of a copy-on-write (qcow2) overlay, from which the nova instance boots.
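
The _base file names encode part of this. As far as I can tell from the Folsom-era libvirt driver, the cached original is named after the SHA-1 hex digest of the Glance image ID, and the resized copy appends `_<root disk size in GB>` (20 for m1.small here). Treat that naming rule as an assumption; a sketch that computes the expected names:

```shell
#!/bin/sh
# Sketch: expected _base file names for a Glance image ID, assuming nova names the
# cached original sha1(image_id) and the resized copy "<hash>_<root_gb>".
base_file_names() {
    local image_id="$1" root_gb="$2" hash
    hash=$(printf '%s' "$image_id" | sha1sum | cut -d' ' -f1)
    echo "$hash"
    if [ "${root_gb:-0}" -gt 0 ]; then
        echo "${hash}_${root_gb}"
    fi
}
```

For example, `base_file_names <glance-image-id> 20` should list the two names you would expect to see in _base, if the assumption holds.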

For this post, I used a single-node Red Hat OpenStack Folsom setup on RHEL 6.4, running nova, glance, keystone and cinder, with KVM as the underlying hypervisor.

Background info:

[1] In nova’s git log, the conversion of non-‘raw’ images to ‘raw’ can be traced to this commit:

commit ff9d353b2f4fee469e530fbc8dc231a41f6fed84
Author: Scott Moser 
Date:   Mon Sep 19 16:57:44 2011 -0400

    convert images that are not 'raw' to 'raw' during caching to node
    
    This uses 'qemu-img' to convert images that are not 'raw' to be 'raw'.
    By doing so, it
     a.) refuses to run uploaded images that have a backing image reference
         (LP: #853330, CVE-2011-3147)
     b.) ensures that when FLAGS.use_cow_images is False, and the libvirt
         xml written specifies 'driver_type="raw"' that the disk referenced
         is also raw format. (LP: #837102)
     c.) removes compression that might be present to avoid cpu bottlenecks
         (LP: #837100)
    
    It does have the negative side affect of using more space in the case where
    the user uploaded a qcow2 (or other advanced image format) that could have
    been used directly by the hypervisor.  That could, later, be remedied by
    another 'qemu-img convert' being done to the "preferred" format of the
    hypervisor.

[2] Pádraig Brady pointed out (thank you!) this bug, https://bugs.launchpad.net/nova/+bug/932180, which notes further reasons for converting non-raw images to raw.

As an aside, more information on upstream disk image allocation and performance improvements: https://blueprints.launchpad.net/nova/+spec/preallocated-images

UPDATE: Pádraig Brady describes the life of an OpenStack libvirt image in excellent detail: http://www.pixelbeat.org/docs/openstack_libvirt_images/


Nested virtualization with KVM and Intel on Fedora-18

KVM nested virtualization with Intel finally works for me on Fedora-18. All three layers, L0 (physical host) -> L1 (regular-guest/guest-hypervisor) -> L2 (nested-guest), are running successfully as of this writing.

Previously, nested KVM virtualization on Intel was discussed here and here. This time, on Fedora-18, I was able to boot and use a nested guest with reasonable performance. (I still have to do more formal tests to show meaningful performance results.)

Test setup information

Config info about the physical host, regular-guest/guest hypervisor and nested-guest. (All of them are Fedora-18; x86_64)

  • Physical Host (Host hypervisor/Bare metal)
    • Node info and some version info
      
      #--------------------#
      # virsh nodeinfo
      CPU model:           x86_64
      CPU(s):              4
      CPU frequency:       1995 MHz
      CPU socket(s):       1
      Core(s) per socket:  4
      Thread(s) per core:  1
      NUMA cell(s):        1
      Memory size:         10242692 KiB
      
      #--------------------#
      # cat /etc/redhat-release ; uname -r ; arch ; rpm -q qemu-kvm libvirt-daemon-kvm
      Fedora release 18 (Spherical Cow)
      3.6.7-5.fc18.x86_64
      x86_64
      qemu-kvm-1.3.0-9.fc18.x86_64
      libvirt-daemon-kvm-1.0.2-1.fc18.x86_64
      #
      #--------------------# 
      
  • Regular Guest (Guest Hypervisor)
    • A 20GB qcow2 disk image w/ cache=’none’ enabled in the libvirt xml
    • Node info and some version info
      #--------------------# 
      # virsh nodeinfo
      CPU model:           x86_64
      CPU(s):              4
      CPU frequency:       1994 MHz
      CPU socket(s):       4
      Core(s) per socket:  1
      Thread(s) per core:  1
      NUMA cell(s):        1
      Memory size:         4049888 KiB
      #--------------------# 
      # cat /etc/redhat-release ; uname -r ; arch ; rpm -q qemu-kvm libvirt-daemon-kvm
      Fedora release 18 (Spherical Cow)
      3.6.10-4.fc18.x86_64
      x86_64
      qemu-kvm-1.2.2-6.fc18.x86_64
      libvirt-daemon-kvm-0.10.2.3-1.fc18.x86_64
      #--------------------# 
      
  • Nested Guest
    • Config: 2GB Memory; 2 vcpus; 6GB sparse qcow2 disk image

Setting up guest hypervisor and nested guest

Refer to the notes linked above to get the nested guest up and running:

  • Create a regular guest/guest-hypervisor –
     # ./create-regular-f18-guest.bash 
  • Expose Intel VMX extensions inside the guest hypervisor by adding the ‘cpu’ element to the regular guest’s libvirt XML file
  • Shut down the regular guest, redefine it (virsh define /etc/libvirt/qemu/regular-guest-f18.xml), and start it again (virsh start regular-guest-f18)
  • Now, install the virtualization packages inside the guest hypervisor:
     # yum install libvirt-daemon-kvm libvirt-daemon-config-network libvirt-daemon-config-nwfilter python-virtinst -y 
  • Start libvirtd service –
     # systemctl start libvirtd.service && systemctl status libvirtd.service  
  • Create a nested guest
     # ./create-nested-f18-guest.bash 
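
Before creating the nested guest, it is worth confirming both ends: on L0, the kvm_intel module’s `nested` parameter (/sys/module/kvm_intel/parameters/nested) should be enabled, and inside L1, /proc/cpuinfo should list the vmx flag. A sketch that checks both; the parameter is reported as Y/N on the kernels I know of, and the sketch also accepts 1 in case a kernel reports it numerically. The file paths are arguments only so the checks can be exercised on copies.

```shell
#!/bin/sh
# Sketch: sanity checks for nested KVM on Intel.
# Run on L0 (bare-metal host): is nested virtualization enabled in kvm_intel?
nested_enabled_on_host() {
    local param="${1:-/sys/module/kvm_intel/parameters/nested}" value
    [ -r "$param" ] || return 1          # kvm_intel not loaded (or not an Intel host)
    value=$(cat "$param")
    case $value in
        Y|y|1) return 0 ;;
        *)     return 1 ;;
    esac
}

# Run on L1 (guest hypervisor): does the virtual CPU expose the vmx flag?
vmx_visible_in_guest() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep -qw vmx "$cpuinfo"
}
```

For example, `nested_enabled_on_host && echo "L0 ok"` on the physical host, and `vmx_visible_in_guest && echo "L1 ok"` inside the regular guest.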

The scripts and reference libvirt XMLs I used for this demonstration are posted on GitHub.

qemu-kvm invocation of bare-metal and guest hypervisors

The qemu-kvm invocation of the regular guest (guest hypervisor), showing the vmx extension (-cpu core2duo,+vmx):


# ps -ef | grep -i qemu-kvm | egrep -i 'regular-guest-f18|vmx'
qemu     15768     1 19 13:33 ?        01:01:52 /usr/bin/qemu-kvm -name regular-guest-f18 -S -M pc-1.3 -cpu core2duo,+vmx -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 -uuid 9a7fd95b-7b4c-743b-90de-fa186bb5c85f -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest-f18.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/export/vmimgs/regular-guest-f18.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:a6:ff:96,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

Running virt-host-validate (part of the libvirt-client package) on the bare-metal host indicates that the host is configured to run KVM:


# virt-host-validate 
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking for device /dev/kvm                                         : PASS
  QEMU: Checking for device /dev/vhost-net                                   : PASS
  QEMU: Checking for device /dev/net/tun                                     : PASS
   LXC: Checking for Linux >= 2.6.26                                         : PASS
# 

Networking Info
- The regular guest is using the bare metal host’s bridge device ‘br0′
- The nested guest is using libvirt’s default bridge ‘virbr0′

Caveat: If NAT’d networking is used on both the bare-metal host and the guest hypervisor, both default to the 192.168.122.0/24 subnet (unless explicitly changed), and the overlapping subnets will break the networking setup. Using bridging on L0 (the bare-metal host) and NAT on L1 (the guest hypervisor) avoids this.
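The clash described in the caveat can be checked before setting up the networks. Below is a minimal Python sketch that uses the standard `ipaddress` module to test whether the L0 and L1 NAT subnets overlap. The helper name `nat_subnets_clash` is hypothetical. The 192.168.122.0/24 default comes from the caveat, and 192.168.123.0/24 is only an example alternative.

```python
import ipaddress

# libvirt's default NAT network, as mentioned in the caveat above
LIBVIRT_DEFAULT_NAT = "192.168.122.0/24"


def nat_subnets_clash(l0_subnet: str, l1_subnet: str) -> bool:
    """Return True if the L0 and L1 NAT subnets overlap (hypothetical helper)."""
    l0 = ipaddress.ip_network(l0_subnet)
    l1 = ipaddress.ip_network(l1_subnet)
    return l0.overlaps(l1)
```

With both hypervisors left on the default, `nat_subnets_clash(LIBVIRT_DEFAULT_NAT, LIBVIRT_DEFAULT_NAT)` is True. Moving L1 to a different /24, such as "192.168.123.0/24", makes it False.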

Notes

  • Make sure the serial console is enabled in both the L1 and L2 guests; it’s very handy for debugging. If you use the kickstart file mentioned here, this is taken care of. The parameters to add to the kernel command line are console=tty0 console=ttyS0,115200
  • Once the nested guest was created, I tried to set the hostname, and it turned out that, for some reason, ext4 had made the file system read-only:
    
    # hostnamectl set-hostname nested-guest-f18.foo.bar.com
    Failed to issue method call: Read-only file system
    

    Then I saw these I/O errors in /var/log/messages:

    
    .
    .
    .
    Feb 12 04:22:31 localhost kernel: [  724.080207] end_request: I/O error, dev vda, sector 9553368
    Feb 12 04:22:31 localhost kernel: [  724.080922] Buffer I/O error on device dm-1, logical block 33467
    Feb 12 04:22:31 localhost kernel: [  724.080922] Buffer I/O error on device dm-1, logical block 33468
    

    At this point, I tried to reboot the guest, only to be dropped into a dracut repair shell. I ran fsck a couple of times and then tried to reboot the nested guest, to no avail. So I forcibly powered off the nested guest:

    # virsh destroy nested-guest-f18

    After that, it booted just fine, which left me trying to get to the bottom of the I/O errors. I discussed this behaviour with Rich Jones, and he suggested generating more I/O activity inside the nested guest to see if I could trigger those errors again.

    
    # find / -exec md5sum {} \; > /dev/null
    # find / -xdev -exec md5sum {} \; > /dev/null
    

    The above commands ran for more than 15 minutes, and the I/O errors could not be triggered again.

  • A libguestfs test (from rwmj) is to run the command below on both the host and the first-level guest and compare the results. Run it several times and discard the first few results, so that the cache is hot.
    
    # time guestfish -a /dev/null run
    
  • Another libguestfs test Rich suggested: disable nested virtualization and time guestfish running in the guest, to measure the speed-up nested virtualization gives over pure software emulation.
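The “run it several times and discard the first few results” procedure for the guestfish timing test can be scripted. Below is a minimal Python sketch. The function `time_runs` and its warm-up and run counts are illustrative choices, not part of libguestfs. For real measurements, it assumes `guestfish` is on the PATH.

```python
import subprocess
import time


def time_runs(cmd, runs=5, warmup=2):
    """Run cmd (warmup + runs) times and return the wall-clock seconds of
    the measured runs only; the first `warmup` (cold-cache) runs are dropped."""
    timings = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return timings
```

For example, run `statistics.median(time_runs(["guestfish", "-a", "/dev/null", "run"]))` once on the host and once in the L1 guest, then compare the two medians. The median is less sensitive to a single slow outlier than the mean.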

Next up: running more useful workloads in these nested vmx guests.


Live backup with external disk snapshots and libvirt’s blockpull

There’s been some interesting work going on in the block layer for libvirt/QEMU/KVM. I have a few pending posts related to that, so I thought I’d start with one about live backup using external disk snapshots and libvirt’s blockpull. External snapshots were previously discussed here.

[1] List the block device associated with a guest.


                # virsh domblklist f17-base
                Target     Source
                ---------------------------------------------
                vda        /export/vmimages/f17-base.qcow2

                #

[2] Create an external disk-only snapshot (while the guest is running).


                # virsh snapshot-create-as --domain f17-base snap1 snap1-desc \
                --disk-only --diskspec vda,snapshot=external,file=/export/vmimages/sn1-of-f17-base.qcow2 \
                --atomic
                Domain snapshot snap1 created
                #

[3] Now, list the block device associated with the guest again (using the
command from step [1]) to ensure it shows the new overlay image as
the current block device in use.


                # virsh domblklist f17-base
                Target     Source
                ----------------------------------------------------
                vda        /export/vmimages/sn1-of-f17-base.qcow2

                #
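Step [3] can also be checked from a script by parsing the `virsh domblklist` output. Below is a sketch that assumes the plain two-column layout shown above: Target and Source, then a dashed separator. It does not handle the four-column `--details` layout. The helper `parse_domblklist` is hypothetical. In real use, the text would come from running `virsh domblklist f17-base`.

```python
def parse_domblklist(output):
    """Map target device (e.g. 'vda') to source path from `virsh domblklist`."""
    disks = {}
    for line in output.splitlines():
        fields = line.split(None, 1)  # target, then the rest of the line
        if len(fields) != 2 or fields[0] == "Target":
            continue  # blank line, dashed separator, or header row
        disks[fields[0]] = fields[1].strip()
    return disks


# Sample output copied from step [3] above
SAMPLE = """Target     Source
----------------------------------------------------
vda        /export/vmimages/sn1-of-f17-base.qcow2
"""
```

After step [2], `parse_domblklist(SAMPLE)["vda"]` should name the overlay (sn1-of-f17-base.qcow2) rather than the original f17-base.qcow2.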

[4] Now, back up the original disk image /export/vmimages/f17-base.qcow2
using rsync or any other preferred method, while the guest keeps
running (and writes all its new changes into sn1-of-f17-base.qcow2).
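If rsync isn't handy for step [4], a plain copy followed by a checksum comparison also works. Once the overlay is active, the guest no longer writes to the base image. Below is a minimal Python sketch. `backup_image` and `sha256sum` are hypothetical helpers, and the sketch assumes the backup destination has enough free space. Unlike `rsync --sparse`, `shutil.copy2` does not try to preserve holes in sparse files.

```python
import hashlib
import shutil


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_image(src, dst):
    """Copy src to dst (preserving metadata) and verify the copy by checksum."""
    shutil.copy2(src, dst)
    if sha256sum(src) != sha256sum(dst):
        raise RuntimeError(f"checksum mismatch copying {src} -> {dst}")
    return dst
```

For example, `backup_image("/export/vmimages/f17-base.qcow2", "/backup/f17-base.qcow2")`, where /backup is only a placeholder destination.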

[5] Now, let’s say changes have accumulated over time in
sn1-of-f17-base.qcow2 (the current active layer). We can now merge
the contents of the original disk image (f17-base) into the current disk using ‘blockpull’.

Two cases are possible here:

(a) If we have the chain below:

 [base] <-- [sn1] # the current active layer

We can make sn1-of-f17-base.qcow2 a standalone image by
pulling/merging contents from the original disk image (f17-base.qcow2), so that sn1-of-f17-base becomes a self-contained disk image without any backing file:


                 # virsh blockpull --domain f17-base --path \
                 /export/vmimages/sn1-of-f17-base.qcow2 --verbose --wait

The resultant chain will be:

 [sn1]  

which is an active, self-contained image with all the data from [base] pulled into it.

(b) If we have more snapshots:


                Say, we start with this chain,

                [base] <-- [sn1] <-- [sn2] <-- [active],

                it can be shortened to:

                [base] <-- [sn1] <-- [active]

                by doing:

                # virsh blockpull --domain f17-base --path \
                /var/lib/libvirt/images/active.qcow2 --base /var/lib/libvirt/images/sn1.qcow2 --verbose --wait

The above command pulls the data from [sn2] into [active], and changes the backing file of [active] to [sn1].

As an optional next step, we can perform another 'blockpull' operation
on the shortened chain, making [active] a standalone image:


                # virsh blockpull --domain f17-base --path \
                /var/lib/libvirt/images/active.qcow2 --verbose --wait

The above command pulls data from [sn1] and [base]
into [active] and makes it a self-contained image without any backing file,
resulting in a chain with a single image: [active].

NOTE: 'blockpull' needs to be done for every disk image that was
snapshotted, optionally supplying --base. If --base is not supplied,
the current disk image pulls data from its entire backing chain and
becomes standalone, without any backing file.
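Cases (a) and (b) can be modelled as a small Python sketch of what blockpull does to a backing chain. The sketch only manipulates names, not real images. The chain is a list ordered from [base] up to the active layer (--path), and `blockpull` here is a hypothetical function. When a base is given, it stays as the new backing file, and only the images between it and the active layer are pulled. Without a base, everything below the active layer is pulled.

```python
def blockpull(chain, base=None):
    """Model `virsh blockpull` on a backing chain (names only).

    chain: image names ordered bottom-up, e.g. ["base", "sn1", "sn2", "active"];
           the last element is the active layer (--path).
    base:  optional image that becomes the new backing file (--base); if None,
           the whole chain is pulled and the active layer becomes standalone.

    Returns (new_chain, pulled), where pulled lists the images whose data
    is copied into the active layer.
    """
    active = chain[-1]
    if base is None:
        keep = 0
    else:
        if base not in chain[:-1]:
            raise ValueError(f"{base!r} is not a backing file of {active!r}")
        keep = chain.index(base) + 1
    pulled = chain[keep:-1]
    return chain[:keep] + [active], pulled
```

Case (a) is `blockpull(["base", "sn1"])`, which gives (["sn1"], ["base"]). Case (b) is `blockpull(["base", "sn1", "sn2", "active"], base="sn1")`, which gives (["base", "sn1", "active"], ["sn2"]).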

[6] Once the 'blockpull' operation above is complete, we can clean up
libvirt's tracking of the snapshots (metadata) to reflect the new reality:

              
                # virsh snapshot-delete $dom $name --metadata

There are a couple of other ways of doing live guest backups using libvirt’s newer capabilities, such as the ‘blockcopy’ mechanism. An unpolished example is here (this test was done by building libvirt and QEMU from git a couple of months ago; I haven’t yet gotten around to testing the newest QEMU/libvirt).


LinuxCon EU-2012 virt snapshots demo notes

My LinuxCon 2012 slides/demo (and a more elaborate technical article) on virtual machine snapshots with libvirt/QEMU are here: http://kashyapc.fedorapeople.org/virt/lc-2012/


LinuxCon EU & KVM Forum 2012

LinuxCon, KVM Forum, and the oVirt and Gluster workshops are co-located events in Barcelona, from 5 Nov 2012 to 9 Nov 2012. I’ll also be presenting a talk/demo on virtual machine snapshots at LinuxCon.

A very informative and diverse schedule has shaped up across the virtualization area. The schedule for LinuxCon is here, and the one for the KVM Forum/oVirt workshop is here.


Creating rapid thin-provisioned guests using QEMU backing files

Provisioning virtual machines very rapidly is highly desirable, especially when deploying a large number of them. With QEMU’s backing files, we can instantiate several clones by creating a single base image and then sharing it (read-only) across multiple guests. When these guests are modified, each writes all its changes to its own disk image.

To exemplify:

First, let’s create a minimal Fedora 17 guest (I used this script) and copy the resulting qcow2 disk image as base-f17.qcow2. base-f17.qcow2 now has Fedora 17 on it and serves as our base image. Let’s look at its info:


[root@localhost vmimages]# qemu-img info base-f17.qcow2 
image: base-f17.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 5.0G
cluster_size: 65536
[root@localhost vmimages]# 

Now, let’s use the above F17 base image to quickly instantiate 2 more Fedora 17 virtual machines. First, create a new qcow2 file (f17vm2-with-b.qcow2) using base-f17.qcow2 as its backing file:


[root@localhost vmimages]# qemu-img create -b /home/kashyap/vmimages/base-f17.qcow2 -f qcow2 /home/kashyap/vmimages/f17vm2-with-b.qcow2
Formatting '/home/kashyap/vmimages/f17vm2-with-b.qcow2', fmt=qcow2 size=5368709120 backing_file='/home/kashyap/vmimages/base-f17.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 

Now, let’s see some information about the newly created disk image. Note the ‘backing file’ attribute below, pointing to our base image (base-f17.qcow2):


[root@localhost vmimages]# qemu-img info /home/kashyap/vmimages/f17vm2-with-b.qcow2
image: /home/kashyap/vmimages/f17vm2-with-b.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 196K
cluster_size: 65536
backing file: /home/kashyap/vmimages/base-f17.qcow2
[root@localhost vmimages]# 
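The ‘backing file’ line is what tells an overlay apart from a standalone image, so it can be useful to extract these fields programmatically. Below is a sketch that parses the human-readable `qemu-img info` text shown above into a dict. `parse_qemu_img_info` is hypothetical, and it assumes the “key: value” layout printed by this qemu-img version. Newer versions also offer `qemu-img info --output=json`, which is more robust to parse.

```python
def parse_qemu_img_info(text):
    """Turn `qemu-img info` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines without a "key: value" shape
            info[key.strip()] = value.strip()
    return info


# Sample output copied from the qemu-img info run above
SAMPLE = """image: /home/kashyap/vmimages/f17vm2-with-b.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 196K
cluster_size: 65536
backing file: /home/kashyap/vmimages/base-f17.qcow2
"""
```

A simple overlay check is then `"backing file" in parse_qemu_img_info(SAMPLE)`.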

Now we’re set: our ‘f17vm2-with-b.qcow2’ is ready to use. We can verify it in two ways:

  1. To verify quickly, we can invoke qemu-kvm directly (not recommended in production). This boots our new guest on stdio and gives us a serial console (NOTE: base-f17.qcow2 had ‘console=tty0 console=ttyS0,115200’ on its kernel command line, so that it could provide a serial console):
    
    [root@localhost vmimages]# qemu-kvm -enable-kvm -m 1024 f17vm2-with-b.qcow2 -nographic
    
                              GNU GRUB  version 2.00~beta4
    
     +--------------------------------------------------------------------------+
     |Fedora Linux                                                              | 
     |Advanced options for Fedora Linux                                         |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          |
     |                                                                          | 
     +--------------------------------------------------------------------------+
    
          Use the ^ and v keys to select which entry is highlighted.      
          Press enter to boot the selected OS, `e' to edit the commands      
          before booting or `c' for a command-line.      
                                                                                   
                                                                                   
    Loading Linux 3.3.4-5.fc17.x86_64 ...
    Loading initial ramdisk ...
    [    0.000000] Initializing cgroup subsys cpuset
    .
    .
    .
    (none) login: root
    Password: 
    Last login: Thu Oct  4 07:07:54 on ttyS0
    [root@(none) ~]# 
    
  2. The other, more traditional way (so that libvirt can track and manage the guest) is to copy a similar (F17) libvirt XML file, update the name, UUID, disk path, and MAC address, then define and start the guest via ‘virsh’:
    
    [root@localhost qemu]# virsh define f17vm2-with-b.xml 
    [root@localhost qemu]# virsh start f17vm2-with-b --console
    [root@localhost qemu]# virsh list 
     Id    Name                           State
    ----------------------------------------------------
     9     f17vm2-with-b                  running
    

Now, let’s quickly check the disk image size of our new thin-provisioned guest. The size is quite small (14M), because only the delta from the original backing file is written to this image.


[root@localhost vmimages]# ls -lash f17vm2-with-b.qcow2 
14M -rw-r--r--. 1 root root 14M Oct  4 06:30 f17vm2-with-b.qcow2
[root@localhost vmimages]#
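The first column of `ls -s` is the space actually allocated on disk, while the size column is the apparent file size. Python's `os.stat` exposes both. `st_size` is the apparent size, and `st_blocks` counts 512-byte units, so multiplying by 512 gives the allocated bytes. Below is a minimal sketch with a hypothetical helper name, assuming a Linux/Unix host where `st_blocks` is available.

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a file, like `ls -ls`."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

For the thin overlay above, both numbers should stay small (around the 14M seen with ls) until the guest writes more data. The 5.0G virtual size lives only in the qcow2 header.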

To instantiate our second F17 guest (say, f17vm3-with-b), again create a new qcow2 file (f17vm3-with-b.qcow2) with our base image base-f17.qcow2 as its backing file, and then check the disk image info using the ‘qemu-img’ tool:


#----------------------------------------------------------#
[root@localhost vmimages]# qemu-img create -b /home/kashyap/vmimages/base-f17.qcow2 -f qcow2 /home/kashyap/vmimages/f17vm3-with-b.qcow2
Formatting '/home/kashyap/vmimages/f17vm3-with-b.qcow2', fmt=qcow2 size=5368709120 backing_file='/home/kashyap/vmimages/base-f17.qcow2' encryption=off cluster_size=65536 lazy_refcounts=off 
#----------------------------------------------------------#
[root@localhost qemu]# qemu-img info /home/kashyap/vmimages/f17vm3-with-b.qcow2 
image: /home/kashyap/vmimages/f17vm3-with-b.qcow2
file format: qcow2
virtual size: 5.0G (5368709120 bytes)
disk size: 196K
cluster_size: 65536
backing file: /home/kashyap/vmimages/base-f17.qcow2
[root@localhost qemu]# 
#----------------------------------------------------------#

[It’s worth noting here that we’re pointing to the same base image, so multiple guests are using it as a backing file.]

Again check the disk image size of the thin-provisioned guest:


[root@localhost vmimages]# ls -lash f17vm3-with-b.qcow2 
14M -rw-r--r--. 1 qemu qemu 14M Oct  4 07:18 f17vm3-with-b.qcow2

Needless to say, the second F17 guest also has its own XML file, defined with its unique attributes just like the first F17 guest.


[root@localhost qemu]# virsh list 
 Id    Name                           State
----------------------------------------------------
 9     f17vm2-with-b                  running
 10    f17vm3-with-b                  running

For reference, I’ve posted the XML file I used for the ‘f17vm3-with-b’ guest here.

To summarize, by sharing a single, common base-image, we can quickly deploy multiple thin-provisioned virtual machines.


                      .----------------------.
                      | base-image-f17.qcow2 |
                      |                      |
                      '----------------------'
                         /       |         \
                        /        |          \
                       /         |           \
                      /          |            \
         .-----------v--.  .-----v--------.  .-v------------.
         | f17vm2.qcow2 |  | f17vm3.qcow2 |  | f17vmN.qcow2 |
         |              |  |              |  |              |
         '--------------'  '--------------'  '--------------'
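Deploying many clones from one base is easy to script. Below is a minimal Python sketch that only builds the `qemu-img create` command lines used above, one per clone, following this post's `f17vmN-with-b` naming. `clone_commands` and its default numbering are illustrative. Each list could be passed to `subprocess.run(cmd, check=True)` to actually create the overlays. Recent qemu-img releases also expect the backing file's format to be given explicitly (`-F qcow2`). Each guest still needs its own XML definition, as described above.

```python
import os


def clone_commands(base, outdir, first=2, count=3):
    """Return one `qemu-img create -b BASE -f qcow2 OVERLAY` argv per clone."""
    cmds = []
    for n in range(first, first + count):
        overlay = os.path.join(outdir, f"f17vm{n}-with-b.qcow2")
        cmds.append(["qemu-img", "create", "-b", base, "-f", "qcow2", overlay])
    return cmds
```

For example, `clone_commands("/home/kashyap/vmimages/base-f17.qcow2", "/home/kashyap/vmimages")` yields the commands for f17vm2, f17vm3 and f17vm4, all sharing the same read-only base.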
            


External (and Live) snapshots with libvirt

Previously, I posted about snapshots here, briefly discussing the different types of snapshots. In this post, let’s explore how external snapshots work. To quickly recap: with external snapshots, there’s a base image (the original disk image), and its difference/delta (aka the snapshot image) is stored in a new QCOW2 file. Once the snapshot is taken, the original disk image is in a ‘read-only’ state and can be used as a backing file for other guests.

It’s worth mentioning here that:

  • The original disk image can be in either RAW or QCOW2 format. When a snapshot is taken, ‘the difference’ is stored in a separate QCOW2 file.
  • The virtual machine has to be running (live). With live snapshots, the guest experiences no downtime when a snapshot is taken.
  • At the moment, external (live) snapshots work only for ‘disk-only’ snapshots (not VM state). Work on snapshotting both disk and VM state (and on reverting to an external disk snapshot state) is in progress upstream (slated for libvirt-0.10.2).

Before we go ahead, here’s some version info. I’m testing on Fedora 17 (host), and the guest (named ‘daisy’) is running Fedora 18 (Test Compose):


[root@moon ~]# rpm -q libvirt qemu-kvm ; uname -r
libvirt-0.10.1-3.fc17.x86_64
qemu-kvm-1.2-0.2.20120806git3e430569.fc17.x86_64
3.5.2-3.fc17.x86_64
[root@moon ~]# 

External disk snapshots (live) using QCOW2 as the original image:
Let’s see an illustration of external (live) disk-only snapshots. First, let’s ensure the guest is running:


[root@moon qemu]# virsh list
 Id    Name                           State
----------------------------------------------------
 3     daisy                          running


[root@moon qemu]# 

Then, list all the block devices associated with the guest:


[root@moon ~]# virsh domblklist daisy --details
Type       Device     Target     Source
------------------------------------------------
file       disk       vda        /export/vmimgs/daisy.qcow2

[root@moon ~]# 

Next, let’s create a disk-only snapshot of the guest while it is running:


[root@moon ~]# virsh snapshot-create-as daisy snap1-daisy "snap1 description" \
 --diskspec vda,file=/export/vmimgs/snap1-daisy.qcow2 --disk-only --atomic

Some details of the flags used:
- Passing a ‘--diskspec’ parameter adds the ‘disk’ elements to the snapshot XML file
- ‘--disk-only’ takes a snapshot of only the disk
- ‘--atomic’ ensures that the snapshot either completes fully or fails without making any changes
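For a guest with more than one disk, each disk gets its own --diskspec, and --atomic makes the whole set succeed or fail together. Below is a sketch that builds such a command line in Python, mirroring the syntax used in this and the previous post. `snapshot_command` is hypothetical, and the second disk in the usage example (`vdb`) is made up for illustration.

```python
def snapshot_command(domain, name, description, overlays):
    """Build a `virsh snapshot-create-as` argv for an external, disk-only,
    atomic snapshot. overlays maps target device -> new overlay file path."""
    cmd = ["virsh", "snapshot-create-as", domain, name, description]
    for target, path in overlays.items():
        cmd += ["--diskspec", f"{target},snapshot=external,file={path}"]
    cmd += ["--disk-only", "--atomic"]
    return cmd
```

For example, `snapshot_command("daisy", "snap1-daisy", "snap1 description", {"vda": "/export/vmimgs/snap1-daisy.qcow2", "vdb": "/export/vmimgs/snap1-daisy-vdb.qcow2"})` produces one --diskspec per disk. Passing the result to `subprocess.run` as a list sidesteps shell quoting of the description.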

Let’s check the information about the snapshot we just took, using qemu-img:


[root@moon ~]# qemu-img info /export/vmimgs/snap1-daisy.qcow2 
image: /export/vmimgs/snap1-daisy.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 2.5M
cluster_size: 65536
backing file: /export/vmimgs/daisy.qcow2
[root@moon qemu]# 

Apart from the above, I created 2 more snapshots (with the same syntax as above) for illustration purposes. Now the snapshot tree looks like this:


[root@moon ~]# virsh snapshot-list daisy --tree

snap1-daisy
  |
  +- snap2-daisy
      |
      +- snap3-daisy
        

[root@moon ~]# 

The above example image file chain [ base <- snap1 <- snap2 <- snap3 ] should be read as: snap3 has snap2 as its backing file, snap2 has snap1 as its backing file, and snap1 has the base image as its backing file. We can see the backing file info from qemu-img:


#--------------------------------------------#
[root@moon ~]# qemu-img info /export/vmimgs/snap3-daisy.qcow2
image: /export/vmimgs/snap3-daisy.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 129M
cluster_size: 65536
backing file: /export/vmimgs/snap2-daisy.qcow2
#--------------------------------------------#
[root@moon ~]# qemu-img info /export/vmimgs/snap2-daisy.qcow2
image: /export/vmimgs/snap2-daisy.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 3.6M
cluster_size: 65536
backing file: /export/vmimgs/snap1-daisy.qcow2
#--------------------------------------------#
[root@moon ~]# qemu-img info /export/vmimgs/snap1-daisy.qcow2
image: /export/vmimgs/snap1-daisy.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 2.5M
cluster_size: 65536
backing file: /export/vmimgs/daisy.qcow2
[root@moon ~]#
#--------------------------------------------#
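Walking the chain by hand like this gets tedious for longer chains. Below is a sketch that follows the ‘backing file’ entries from the active image down to the base. To stay self-contained and testable, it takes a `get_info` callable (a hypothetical parameter) that returns the `qemu-img info` text for a path. In real use, `get_info` could wrap `subprocess.run(["qemu-img", "info", path], capture_output=True, text=True).stdout`. qemu-img can also print the whole chain in one go with `qemu-img info --backing-chain`.

```python
def backing_chain(active, get_info):
    """Return [active, its backing file, ..., base] by following the
    'backing file:' lines of `qemu-img info` output."""
    chain = [active]
    seen = {active}
    while True:
        backing = None
        for line in get_info(chain[-1]).splitlines():
            if line.startswith("backing file:"):
                backing = line.split(":", 1)[1].strip()
                break
        if backing is None:
            return chain  # reached an image with no backing file
        if backing in seen:
            raise ValueError(f"backing chain loop at {backing}")
        seen.add(backing)
        chain.append(backing)
```

For the daisy example, starting from /export/vmimgs/snap3-daisy.qcow2 should give snap3, snap2, snap1 and finally daisy.qcow2. This is the reverse of the bottom-up [ base <- snap1 <- snap2 <- snap3 ] notation.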

Now, if we no longer need snap2 and want to pull its data into snap3, making snap1 snap3’s backing file, we can do a virsh blockpull operation as below:


#--------------------------------------------#
[root@moon ~]# virsh blockpull --domain daisy  --path /export/vmimgs/snap3-daisy.qcow2 \
 --base /export/vmimgs/snap1-daisy.qcow2 --wait --verbose
Block Pull: [100 %]
Pull complete
#--------------------------------------------#

Here, --path is the path to the snapshot file (the active layer), and --base is the backing file that should become the new backing file of --path; the data from the images between --base and --path in the chain is pulled into --path. So, in the above example, we’re pulling the data from snap2 into snap3, shortening the backing file chain so that snap1 becomes snap3’s backing file, which can be seen by running qemu-img again:


[root@moon ~]# qemu-img info /export/vmimgs/snap3-daisy.qcow2 
image: /export/vmimgs/snap3-daisy.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 145M
cluster_size: 65536
backing file: /export/vmimgs/snap1-daisy.qcow2
[root@moon ~]# 

A couple of things to note here, after a discussion with Eric Blake (thank you):

- If we list the snapshot tree again (now that the ‘snap2-daisy.qcow2’ backing file is no longer in use):


[root@moon ~]# virsh snapshot-list daisy --tree
snap1-daisy
  |
  +- snap2-daisy
      |
      +- snap3-daisy
[root@moon ~]#

one might wonder why snap3 is still pointing to snap2. The thing to note here is that the above is the snapshot chain, which is independent of each virtual disk’s backing file chain. So ‘virsh snapshot-list’ still accurately lists the information as it was at the time of snapshot creation (not what we’ve done since). Hence, from the above snapshot tree, if we wanted to revert to snap1 or snap2 (once revert-to-disk-snapshots is available), it would still be possible, meaning:

Even from this state:
base <- snap1 <- snap3 (with the data from snap2 pulled into snap3)

we can still revert to:

base <- snap1 (thus undoing the changes in snap2 & snap3)

External disk snapshots (live) using RAW as the original image:
With external disk snapshots, the backing file can be RAW as well (unlike ‘internal snapshots’, which only work with QCOW2 files, where the snapshots and delta are all stored in a single QCOW2 file).

A quick illustration is below; the commands are self-explanatory. Note the change (from RAW to QCOW2) in the block device associated with the guest before and after taking the disk snapshot, as shown by the virsh domblklist command:


#-------------------------------------------------#
[root@moon ~]# virsh list | grep f17btrfs2
 7     f17btrfs2                      running
[root@moon ~]#
#-------------------------------------------------#
[root@moon ~]# qemu-img info /export/vmimgs/f17btrfs2.img
image: /export/vmimgs/f17btrfs2.img
file format: raw
virtual size: 20G (21474836480 bytes)
disk size: 1.5G
[root@moon ~]# 
#-------------------------------------------------#
[root@moon qemu]# virsh domblklist f17btrfs2 --details
Type       Device     Target     Source
------------------------------------------------
file       disk       hda        /export/vmimgs/f17btrfs2.img

[root@moon qemu]# 
#-------------------------------------------------#
[root@moon qemu]# virsh snapshot-create-as f17btrfs2 snap1-f17btrfs2 "snap1-f17btrfs2-description" \
--diskspec hda,file=/export/vmimgs/snap1-f17btrfs2.qcow2 --disk-only --atomic
Domain snapshot snap1-f17btrfs2 created
[root@moon qemu]# 
#-------------------------------------------------#
[root@moon qemu]# qemu-img info /export/vmimgs/snap1-f17btrfs2.qcow2 
image: /export/vmimgs/snap1-f17btrfs2.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 196K
cluster_size: 65536
backing file: /export/vmimgs/f17btrfs2.img
[root@moon qemu]# 
#-------------------------------------------------#
[root@moon qemu]# virsh domblklist f17btrfs2 --details
Type       Device     Target     Source
------------------------------------------------
file       disk       hda        /export/vmimgs/snap1-f17btrfs2.qcow2
[root@moon qemu]# 
#-------------------------------------------------#
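The RAW-to-QCOW2 switch can also be spotted without qemu-img. A QCOW2 file begins with the 4-byte magic `QFI\xfb`, whereas a RAW image has no header at all. Below is a rough Python heuristic, assuming local read access to the image files. `looks_like_qcow2` is hypothetical. A RAW image whose guest data happens to start with those same bytes would fool it, so `qemu-img info` remains the authoritative check.

```python
QCOW2_MAGIC = b"QFI\xfb"  # first 4 bytes of a QCOW2 image header


def looks_like_qcow2(path):
    """Heuristic: True if the file starts with the QCOW2 magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == QCOW2_MAGIC
```

Applied to the files above, it should return False for f17btrfs2.img (RAW) and True for snap1-f17btrfs2.qcow2.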

Also note: all the snapshot XML files, in which libvirt tracks snapshot metadata, are located under /var/lib/libvirt/qemu/snapshots/$guestname (and the original libvirt XML file is located at /etc/libvirt/qemu/$guestname.xml).
