By Dave Scott - 2014-02-02
[ updated 2014-02-01 for mirage.1.1.0 and xen-disk.1.2.1 ]
This page describes how to create a synthetic, high-performance virtual disk implementation for Xen based on the MirageOS libraries.
The protocols used by Xen disk and network devices are designed to permit fast and efficient software implementations, avoiding the inefficiencies inherent in emulating physical hardware in software. The protocols are based on two primitives:
In the disk block protocol, the protocol starts with the client ("frontend" in Xen jargon) sharing a page with the server ("backend"). This single page will contain the request/response metadata, arranged as a circular buffer or "ring". The client ("frontend") can then start sharing pages containing disk blocks with the backend and pushing request structures to the ring, updating shared pointers as it goes. The client will give the server end a kick via an event channel signal and then both ends start running simultaneously. There are no locks in the protocol so updates to the shared metadata must be handled carefully, using write memory barriers to ensure consistency.
Like everything else in MirageOS, Xen disk devices are implemented as libraries. The following libraries are used:
Note that all these libraries work equally well in userspace (for development and debug) and kernelspace (for production): the target is chosen at link-time.
Userspace MirageOS apps are ideal for development, since they have access to the full suite of Unix debug and profiling tools. Once written, the exact same code can be relinked and run directly in kernelspace for maximum performance.
The xen-disk demonstrates how to create a synthetic Xen virtual disk. To compile it, first, install Xen (including the -dev, or -devel packages), OCaml and OPAM.
Second initialise your system:
opam init
eval `opam config env`
Third install the unmodified xen-disk
package, this will ensure all the build
dependencies are installed:
opam install xen-disk
When this completes it will have installed a command-line tool called
xen-disk
. If you start a VM using your Xen toolstack of choice
("xl create ..." or "xe vm-install ..." or "virsh create ...") then you
should be able to run:
xen-disk connect <vmname>
which will hotplug a fresh block device into the VM "vmname" using the "discard" backend, which returns "success" to all read and write requests, but actually throws all data away. Obviously this backend should only be used for basic testing!
Assuming that worked ok, clone and build the source for xen-disk
yourself:
git clone https://github.com/mirage/xen-disk
cd xen-disk
make
The xen-disk
program can use any MirageOS disk implementation satisfying
Mirage
BLOCK signature.
The key functions are:
By default xen-disk
uses the following disk implementations:
Ok ()
to all requests without doing any work (typically used for
performance testing the ring code)Let's make a virtual disk implementation which uses an existing disk image file as a "gold image", but uses copy-on-write so that no writes persist. This is a common configuration in Virtual Desktop Infrastructure deployments and is generally handy when you want to test a change quickly, and revert it cleanly afterwards. A useful Unix technique for file I/O is to "memory map" an existing file: this associates the file contents with a range of virtual memory addresses so that reading and writing within this address range will actually read or write the file contents. The "mmap" C function has a number of flags, which can be used to request "copy on write" behaviour. Reading the OCaml manual Bigarray.map_file it says:
If shared is true, all modifications performed on the array are reflected in the file. This requires that fd be opened with write permissions. If shared is false, modifications performed on the array are done in memory only, using copy-on-write of the modified pages; the underlying file is not affected.
So we should be able to make a virtual disk implementation which memory maps the image file and achieves copy-on-write by setting "shared" to false. For extra safety we can also open the file read-only.
Luckily there is already an
"mmap" implementation
in xen-disk
; all we need to do is tweak it slightly.
In the "connect" function we simply need to set "shared" to "false" to
achieve the behaviour we want i.e.
let connect id =
let fd = Unix.openfile (filename_of_id id) [ Unix.O_RDONLY ] 0o0 in
let stats = Unix.LargeFile.fstat fd in
let mmap = Cstruct.of_bigarray (Lwt_bytes.map_file ~fd ~shared:false ()) in
Unix.close fd;
let size = stats.Unix.LargeFile.st_size in
return (`Ok { id; size; mmap })
The read and write functions can be left as they are:
let forall offset bufs f =
let rec loop offset = function
| [] -> ()
| b :: bs ->
f offset b;
loop (offset + (Cstruct.len b)) bs in
loop (Int64.to_int offset * 512) bufs;
return (`Ok ())
let read t offset bufs =
forall offset bufs
(fun offset buf ->
Cstruct.blit t.mmap offset buf 0 (Cstruct.len buf)
)
let write t offset bufs =
forall offset bufs
(fun offset buf ->
Cstruct.blit buf 0 t.mmap offset (Cstruct.len buf)
)
Now if we rebuild and run something like:
dd if=/dev/zero of=disk.raw bs=1M seek=1024 count=1
dist/build/xen-disk/xen-disk connect <myvm> --path disk.raw --backend mmap
Inside the VM we should be able to do some basic speed testing:
djs@ubuntu1310:~$ sudo dd if=/dev/xvdg of=/dev/null bs=1M
16+0 records in
16+0 records out
16777216 bytes (17 MB) copied, 0.0276625 s, 606 MB/s
Plus we should be able to mount the filesystem inside the VM, make changes and then disconnect (send SIGINT to xen-disk by hitting Control+C on your terminal) without disturbing the underlying disk contents.