[08-JAN-2015 Update: Corrected the blockcopy CLI and updated the final step to re-use the copy, to be consistent with the scenario outlined at the beginning. Corrections pointed out by Gary R Cook at the end of the comments.]
[17-NOV-2014 Update: With recent libvirt/QEMU improvements, there is another (relatively faster) way to take a live disk backup, via libvirt blockcommit; here’s an example: http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit]
The QEMU and libvirt projects have had a lot of block layer improvements in their last few releases (libvirt 1.2.6 & QEMU 2.1). This post discusses a method to do live disk storage migration with libvirt’s blockcopy.
Context on libvirt blockcopy
Simply put, blockcopy facilitates virtual machine live disk image copying (or mirroring), which is primarily useful for different use cases of storage migration:
- Live disk storage migration
- Live backup of a disk image and its associated backing chain
- Efficient non-shared storage migration (with a combination of virsh operations: snapshot-create-as + blockcopy + blockcommit)
- As of the Icehouse release, the OpenStack Nova project also uses a variation of libvirt blockcopy, through its Python API virDomainBlockRebase, to create live snapshots (nova image-create). (More details on this in an upcoming blog post.)
A blockcopy operation has two phases: (a) All of the source disk content is copied (or mirrored) to the destination; this operation can be canceled to revert to the source disk. (b) Once libvirt gets a signal indicating that source and destination content are equal, the mirroring job remains active until an explicit call to virsh blockjob [. . .] --abort is issued to end the mirroring operation gracefully. If desired, this explicit call to abort can be avoided by supplying the --finish option. Refer to the virsh manual page for verbose details.
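As a rough illustration of how phase (b) can end, the sketch below only prints the virsh blockjob command for each outcome; it does not run anything. The function name end_mirror_cmd is hypothetical, and the domain and disk names (testvm, vda) are the ones used later in this post.

```shell
#!/usr/bin/env bash
# Print (do not run) the virsh command that ends the mirroring phase.
#   abort: stop mirroring; the guest keeps using the source disk.
#   pivot: stop mirroring; the guest switches over to the copy.
end_mirror_cmd() {
    local domain=$1 disk=$2 mode=$3
    case "$mode" in
        abort|pivot)
            printf 'virsh blockjob %s %s --%s\n' "$domain" "$disk" "$mode"
            ;;
        *)
            echo "unknown mode: $mode (expected abort or pivot)" >&2
            return 1
            ;;
    esac
}

end_mirror_cmd testvm vda abort
end_mirror_cmd testvm vda pivot
```

Either command is only meaningful once the job has reached the mirroring phase; passing --finish or --pivot to virsh blockcopy itself makes this separate call unnecessary.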
Scenario: Live disk storage migration
To illustrate a simple case of live disk storage migration, we’ll use a disk image chain of depth 2:
base <-- snap1 <-- snap2 (Live QEMU)
Once live blockcopy is complete, the resulting status of disk image chain ends up as below:
    base <-- snap1 <-- snap2
              ^
              |
              '------- copy (Live QEMU, pivoted)
I.e. once the operation finishes, ‘copy’ will share the backing file chain of ‘snap1’ and ‘base’. And, live QEMU is now pivoted to use the ‘copy’.
Prepare disk images, backing chain & define the libvirt guest
[For simplicity, all virtual machine disks are QCOW2 images.]
Create the base image:
    $ qemu-img create -f qcow2 base.qcow2 1G
Edit the base disk image using guestfish: create a partition, make a file system, and add a file to the base image so that we can distinguish its contents from its qcow2 overlay disk images:
    $ guestfish -a base.qcow2
    [. . .]
    ><fs> run
    ><fs> part-disk /dev/sda mbr
    ><fs> mkfs ext4 /dev/sda1
    ><fs> mount /dev/sda1 /
    ><fs> touch /foo
    ><fs> ls /
    foo
    lost+found
    ><fs> exit
Create another QCOW2 overlay snapshot ‘snap1’, with backing file as ‘base’:
    $ qemu-img create -f qcow2 -b base.qcow2 \
        -o backing_fmt=qcow2 snap1.qcow2
Add a file to snap1.qcow2:
    $ guestfish -a snap1.qcow2
    [. . .]
    ><fs> run
    ><fs> mount /dev/sda1 /
    ><fs> touch /bar
    ><fs> ls /
    bar
    foo
    lost+found
    ><fs> exit
Create another QCOW2 overlay snapshot ‘snap2’, with backing file as ‘snap1’:
    $ qemu-img create -f qcow2 -b snap1.qcow2 \
        -o backing_fmt=qcow2 snap2.qcow2
Add another test file ‘baz’ into snap2.qcow2 using guestfish (refer to the previous examples above) to distinguish the contents of base, snap1 and snap2.
Create a simple libvirt XML file as below, with source file pointing to snap2.qcow2 — which will be the active block device (i.e. it tracks all new guest writes):
    $ cat <<EOF > /etc/libvirt/qemu/testvm.xml
    <domain type='kvm'>
      <name>testvm</name>
      <memory unit='MiB'>512</memory>
      <vcpu>1</vcpu>
      <os>
        <type arch='x86_64'>hvm</type>
      </os>
      <devices>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2'/>
          <source file='/export/vmimages/snap2.qcow2'/>
          <target dev='vda' bus='virtio'/>
        </disk>
      </devices>
    </domain>
    EOF
Define the guest and start it:
    $ virsh define /etc/libvirt/qemu/testvm.xml
    Domain testvm defined from /etc/libvirt/qemu/testvm.xml
    $ virsh start testvm
    Domain testvm started
Perform live disk migration
Undefine the running libvirt guest to make it transient[*]:
    $ virsh dumpxml --inactive testvm > /var/tmp/testvm.xml
    $ virsh undefine testvm
Check the current block device before performing live disk migration:
    $ virsh domblklist testvm
    Target     Source
    ------------------------------------------------
    vda        /export/vmimages/snap2.qcow2
Optionally, display the backing chain of snap2.qcow2:
    $ qemu-img info --backing-chain /export/vmimages/snap2.qcow2
    [. . .] # Output removed for brevity
Initiate blockcopy (live disk mirroring):
    $ virsh blockcopy --domain testvm vda \
        /export/vmimages/copy.qcow2 \
        --wait --verbose --shallow \
        --pivot
Details of the above command: it creates the copy.qcow2 file in the specified path; performs a --shallow blockcopy (i.e. the ‘copy’ shares the backing chain) of the current block device (vda); and --pivot pivots the live QEMU to the ‘copy’.
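Since --pivot and --finish are mutually exclusive in virsh blockcopy, a wrapper script may want to reject that combination before calling virsh at all. The following is a minimal sketch under that assumption; the function name blockcopy_flags_ok is hypothetical and it only inspects the arguments it is given.

```shell
#!/usr/bin/env bash
# blockcopy_flags_ok ARGS...: return 1 if both --pivot and --finish appear
# among the arguments, 0 otherwise. Nothing is executed.
blockcopy_flags_ok() {
    local pivot=0 finish=0 arg
    for arg in "$@"; do
        case "$arg" in
            --pivot)  pivot=1 ;;
            --finish) finish=1 ;;
        esac
    done
    if [ "$pivot" -eq 1 ] && [ "$finish" -eq 1 ]; then
        echo "error: --pivot and --finish cannot be combined" >&2
        return 1
    fi
    return 0
}

if blockcopy_flags_ok --wait --verbose --shallow --pivot; then
    echo "flags look fine"
fi
```

Use --pivot when the guest should move to the copy (storage migration) and --finish when the guest should stay on the original disk (backup).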
Confirm that QEMU has pivoted to the ‘copy’ by enumerating the current block device in use:
    $ virsh domblklist testvm
    Target     Source
    ------------------------------------------------
    vda        /export/vmimages/copy.qcow2
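To script this check rather than read the table by eye, the source path for a given target can be pulled out of the domblklist output. This is a minimal sketch: the function name disk_source is hypothetical, and it assumes the two-line header shown above and source paths without whitespace.

```shell
#!/usr/bin/env bash
# disk_source TARGET: read `virsh domblklist` output on stdin and print the
# source path of TARGET. The header and separator lines are skipped (NR > 2).
# Assumption: source paths contain no whitespace.
disk_source() {
    awk -v target="$1" 'NR > 2 && $1 == target { print $2 }'
}

# Sample output, copied from the step above.
sample='Target     Source
------------------------------------------------
vda        /export/vmimages/copy.qcow2'

printf '%s\n' "$sample" | disk_source vda
```

On a live host, the same function could be fed directly: virsh domblklist testvm | disk_source vda.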
Again, display the backing chain of ‘copy’; it should be the resultant chain noted in the Scenario section above:
    $ qemu-img info --backing-chain /export/vmimages/copy.qcow2
Enumerate the contents of copy.qcow2:
    $ guestfish --ro -a copy.qcow2
    [. . .]
    ><fs> run
    ><fs> mount /dev/sda1 /
    ><fs> ls /
    bar
    foo
    baz
    lost+found
    ><fs> quit
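(Notice above that all the content from base.qcow2, snap1.qcow2, and snap2.qcow2 is visible through copy.qcow2. Because the copy was made with --shallow, only the contents of snap2.qcow2 were mirrored into it; the rest comes from the shared backing chain.)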
Edit the saved guest XML (/var/tmp/testvm.xml) so that it uses copy.qcow2, and define the guest again:
    $ vi /var/tmp/testvm.xml
    # Replace <source file='/export/vmimages/snap2.qcow2'/>
    # with    <source file='/export/vmimages/copy.qcow2'/>
    [. . .]
    $ virsh define /var/tmp/testvm.xml
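The same substitution can be done non-interactively on the saved XML. The sketch below is one possible way, using sed; the function name repoint_disk is hypothetical, and it assumes the source element is written exactly as in the XML shown earlier (single quotes, no extra attributes). It prints the result instead of editing the file in place.

```shell
#!/usr/bin/env bash
# repoint_disk OLD NEW XMLFILE: print XMLFILE with the disk source path OLD
# replaced by NEW. Assumption: the element reads <source file='OLD'/> exactly,
# and neither path contains the '|' character used as the sed delimiter.
repoint_disk() {
    sed "s|<source file='$1'/>|<source file='$2'/>|" "$3"
}

# Small demonstration on a throwaway file.
demo=$(mktemp)
printf "%s\n" "<source file='/export/vmimages/snap2.qcow2'/>" > "$demo"
repoint_disk /export/vmimages/snap2.qcow2 /export/vmimages/copy.qcow2 "$demo"
rm -f "$demo"
```

Redirect the output to a new file and check it before passing that file to virsh define.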
[*] Reason for undefining and defining the guest again: as of this writing, QEMU has yet to support persistent dirty bitmaps, which would enable us to restart a QEMU process with disk mirroring intact. There have been some in-progress patches upstream for a while. Until they are in mainline QEMU, the current approach (as illustrated above) is: make a running libvirt guest transient temporarily, perform live blockcopy, and make the guest persistent again. (Thanks to Eric Blake, one of the libvirt project’s principal developers, for this detail.)
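Because a transient guest disappears once it is shut down, it can be worth confirming that the guest is persistent again after the final define. virsh dominfo reports this in its "Persistent:" field; the sketch below parses that field from text on stdin. The function name is_persistent is hypothetical, and the sample text is a trimmed, illustrative stand-in for real dominfo output.

```shell
#!/usr/bin/env bash
# is_persistent: read `virsh dominfo` output on stdin; succeed (exit 0) only
# if it contains the line "Persistent: yes".
is_persistent() {
    awk -F': *' '$1 == "Persistent" { found = ($2 == "yes") } END { exit !found }'
}

# Trimmed, illustrative sample of dominfo output for a transient guest.
sample='Name:           testvm
State:          running
Persistent:     no'

if printf '%s\n' "$sample" | is_persistent; then
    echo "testvm is persistent"
else
    echo "testvm is transient; define it from the saved XML"
fi
```

On a live host the input would come from virsh dominfo testvm instead of the sample text.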
Hi Kashyapc,
I have looked for some resources (your blog included) to do a simple live backup of a guest disk without stopping the guest.
And it is proving difficult for me to find an “official”, “simple” and elegant way of doing it. All the options end with the guest running on another snapshot disk.
For ex -> http://kashyapc.com/2013/01/22/live-backup-with-external-disk-snapshots-and-libvirts-blockpull/
# virsh domblklist hermes
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/hermes.hermes-snap1
…and I must remove that snapshot, undefine and define the guest again, and “restart” it to get back to the original disk (and that’s what I don’t want to do).
So, if I am not wrong, the new libvirt block layer process that you describe for live disk migration could be applied to do a very simple live guest backup (disk only).
For ex:
1- Make it transient
# virsh dumpxml --inactive hermes > /var/tmp/hermes.xml
# virsh undefine hermes
2- Run blockcopy without the pivot option, so that the guest keeps its original disk:
# virsh blockcopy --domain hermes vda \
/mnt/network/nas/backups/libvirt/images/hermes_test.qcow2 \
--wait --verbose --shallow \
--finish
3- Define it again:
# virsh define /var/tmp/hermes.xml
and done…
Am I missing something? Is this disk consistent? Is this not the correct “official” way to do a disk backup without stopping the VM?
I have tested it, and it seems to be working.
Thanks in advance
A couple of points:
- blockcopy works fine, but as you can see it requires the guest to be undefined & redefined.
- --quiesce option with virsh snapshot-create-as [. . .].
- blockcommit (which is also relatively much faster than blockpull): http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit. I updated the above blogpost with a link to it.
Sorry. And now I have found this -> https://kashyapc.fedorapeople.org/virt/lc-2012/live-backup-with-blockcopy.txt
So seems that this is a good way of doing it :)
That’s one way, but also refer to the wiki page I posted in my previous comment, which describes a method to do it via blockcommit, which can be a bit more efficient.
Yes! That was exactly what I was looking for. With blockcommit, the --quiesce option, and quite recent versions of libvirt and qemu (available in debian wheezy-backports), I am done.
Thank you very much kashyapc
Two problems here (testing with libvirt 1.2.11 and qemu 2.1.0).
One: the virsh options --pivot and --finish are mutually exclusive. You should fix your command line above to be one that actually works.
Two: you’ve made a copy of a disk. So what? As soon as you define the domain again it goes back to using the pre-blockcopy files. The XML you dumped does not contain a reference to copy.qcow2, so what, exactly, does this exercise accomplish in the long term? Your example would best include the steps required to redefine the domain with the new file in use. At the very least, dump its XML and use that to re-define the domain.
I know it’s crazy, but I find a context much more helpful when demonstrating a new technique. It’s good to solve problems.
Thanks for the review, Gary. You’re right on both counts: (a) --pivot and --finish should not be mixed. It was a copy/paste mistake on my part, I guess; fixed now. (b) About the copy of the disk: it’s just meant to be a backup copy. If one wants to pivot to the new copy, of course, they should redefine the libvirt XML, pointing the source disk to the copy. If not, just make a copy of the disk, and redefine the old libvirt guest XML.
To that effect, the changes I made:
- Corrected the CLI to use --pivot and removed --finish.
- Made the edit to the XML to reuse the copy, so that it matches the ASCII diagram.
- Adjusted the final step to redefine the XML to use the ‘copy’.
When I write further examples, I will add more details and context.
I have a better workaround! :) tar -S copies sparse files very quickly if they have a lot of free space inside. And a temporary external snapshot makes the whole job safe.
#!/bin/bash
VM=$1
TARGET=$2
STOR="/home/guest_images/"
cd $STOR
# make temporary external disk snapshot named "mig"
virsh snapshot-create-as $VM mig --disk-only --atomic
# remove the snapshot metadata, because virsh migrate doesn't like existing snapshots
virsh snapshot-delete $VM mig --metadata
# copy base image
tar --totals --checkpoint=.8192 -Scvf - $VM.qcow2 | ssh $TARGET "tar -C $STOR -xf -"
# suspend VM
virsh suspend $VM
# copy snapshot image
tar --totals -Scvf - $VM.mig | ssh $TARGET "tar -C $STOR -xf -"
# live migrate
virsh migrate --live --undefinesource --persistent --verbose $VM qemu+ssh://$TARGET/system
# merge snapshot to base image file
# and make base image primary in VM config
virsh -c qemu+ssh://$TARGET/system blockcommit $VM vda --active --pivot --verbose
# resume VM
virsh -c qemu+ssh://$TARGET/system resume $VM
# remove orphaned snapshot file
ssh $TARGET "cd $STOR; rm -f $VM.mig"
# remove local disk files if necessary
# rm -f $VM.*
SSH key-based auth between the source and destination servers must be configured, of course.
This method of undefining the machine first, then copying the hard drive, and finally redefining the machine seems to have one big, big flaw.
A colleague of mine used this method in a job to mirror several of our VMs from one piece of hardware to another. But in the past few days an error occurred during that job on several (hardware) machines: “### Undefine ‘name_of_vm’…
error: Failed to undefine domain name_of_vm
error: Requested operation is not valid: cannot undefine transient domain”
I did not find much information regarding the error message “cannot undefine transient domain”.
But I found an alarming (sic!) result: as soon as I shut down the domain which couldn’t be undefined previously, it was undefined! The XML was deleted. And since my colleague, before undefining, dumped the machine to /tmp/copy_of_name_of_vm.xml, after a reboot of the hardware the domain’s definition was gone to hell!
Fortunately I could restore it, but I don’t understand why the failed undefine command succeeded after shutting down the virtual machine. It seems that the undefine command tries a simple rm command, which under Linux deletes the node but leaves it visible until it’s no longer accessed.
I think your confusion is caused by this: undefining a running domain causes it to become a transient domain, which you cannot undefine a second time.
So, before making a guest transient (which is currently needed to perform a successful live blockcopy), please ensure that you have a backup of the guest XML. Also, if you’ve missed the second update I added at the top of the blog post, here’s another way that you can use (that doesn’t involve making a domain transient): http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit
Next, about transient versus persistent guests, in short:
(a) “Transient domains only exist until the domain is shutdown or when the host server is restarted.”
(b) “Persistent domains last indefinitely.”
You can read more about them here.
Finally, the good news is that there’s work in progress (with “persistent dirty bitmaps”) in upstream QEMU to avoid having to undefine/redefine the guest.
Thanks for your comment and your hint to use blockcommit. In my case it seems that an earlier error had occurred, which had been overlooked due to a little design flaw in my colleague’s backup job: the job sends a mail each time it runs, with a subject that shows the job’s state (error or success) as the last word of a very, very long subject line. Most mail clients tend to cut overlong subject lines…
So I guess that some time ago a redefine failed with an error (for whatever reason; I have no log entries, and the error mails are long deleted) and, due to that little design flaw, all the resulting error messages were ignored by the staff. Until I made a closer inspection a few days ago.
Fortunately, during my investigations, the vanishing of the domains happened before my eyes, so that I could respond immediately and no loss happened.
I think I will follow your hint and rewrite the backup job to use the snapshotting and blockcommit method you described. And I will change the subject line (make it even longer… ;) of its notification mail.
Ah-ha, good to hear that your investigation found the root cause. And, blockcommit, as shown in the example, is also a relatively quicker operation.
Good luck.
as a last step you write “vi snap2.qcow2” did you mean “vi /var/tmp/testvm.xml”?
Whoops! Glaring mistake, I didn’t notice over the time.
Fixed now. Thanks for reporting that.
Instead of vi-ing the file, it’s better to use virsh edit vmname, which will also do some basic error checking.
I think “virsh edit vmname” alters the system xml file. The vi method makes more sense to me.