MirageOS 2014 review: IPv6, TLS, Irmin, Jitsu and community growth

This work funded in part by the EU FP7 User-Centric Networking project, Grant No. 611001.

An action-packed year has flown by for MirageOS, and it's time for a little recap of what's been happening and the plans for the new year. We announced MirageOS 1.0 just over a year ago, and 2014 also saw a major 2.0 summer release and the growth of a developer community that has been building support for IPv6, Transport Layer Security, on-demand spawning, profiling and much more. There have been 205 individual library releases, 25 presentations, and lots of online chatter through the year, so here follows a summary of our major recent activities.

Clean-Slate Transport Layer Security

David Kaloper and Hannes Mehnert started 2014 by getting interested in writing a safer and cleaner TLS stack in OCaml, and ended the year with a complete demonstration and talk last week at 31C3, the premier hacker conference! Their blog posts over the summer remain an excellent introduction to the new stack:

"OCaml-TLS: Introducing transport layer security (TLS) in pure OCaml" presents the motivation and architecture behind our clean-slate implementation of the protocol.
"OCaml-TLS: building the nocrypto library core" talks about the cryptographic primitives that form the heart of TLS confidentiality guarantees, and how they expose safe interfaces to the rest of the stack.
"OCaml-TLS: adventures in X.509 certificate parsing and validation" explains how authentication and chain-of-trust verification are implemented in our stack.
"OCaml-TLS: ASN.1 and notation embedding" introduces the libraries needed for handling ASN.1 grammars, the wire representation of messages in TLS.
"OCaml-TLS: the protocol implementation and mitigations to known attacks" concludes with the implementation of the core TLS protocol logic itself.

By summer, the stack was complete enough to connect to the majority of TLS 1.0+ sites on the Internet, and work progressed to integration with the remainder of the MirageOS libraries. By November, the Conduit network library had Unix support for both the OpenSSL/Lwt bindings and the pure OCaml stack, with the ability to dynamically select them. You can now deploy and test the pure OCaml TLS stack on a webserver simply by:

opam install lwt tls cohttp
export CONDUIT_TLS=native
cohttp-server-lwt -c <certfile> -p <port> <directory>

This will spin up an HTTPS server that serves the contents of <directory> to you over TLS. At the same time, we were also working on integrating the TLS stack into the Xen unikernel backend, so we could run completely standalone. This required some surgery:

The nocrypto crypto core is written in C, so we had to improve support for linking in external C libraries. Since the Xen unikernel is a single address-space custom kernel, we also need to be careful to compile it with the correct compilation flags or else risk subtle bugs. Thomas Leonard completely rearranged the MirageOS compilation pipeline to support separate compilation of C stubs, and we had the opportunity to remove lots of duplicated code within mirage-platform as a result of this work.
Meanwhile, the problem of gathering entropy in a virtual machine reared its head. We created a mirage-entropy device driver, and an active discussion ensued about how best to gather reliable randomness from Xen. Dave Scott built the best solution: xentropyd, which proxies entropy from dom0 to a unikernel VM.
David Kaloper also ported the nocrypto library to use the OCaml-Ctypes library, which increases the safety of the C bindings significantly. This is described in more detail in the "Modular foreign function bindings" blog post from the summer. This forms the basis for allowing Xen unikernels to communicate with C code, and integration with the MirageOS toolchain will continue to improve next year.

You can see Hannes and David present OCaml-TLS at CCC online. It's been a real pleasure watching their work develop in the last 12 months with such precision and attention to detail!

HTTP and JavaScript

Rudi Grinberg got sufficiently irked with the poor state of documentation for the CoHTTP library that he began gently contributing fixes towards the end of 2013, and rapidly became one of the maintainers. He also began improving the ecosystem around the web stack by building an HTTP routing layer, described in his blog posts:

Type Safe Routing - Baby Steps: type-safe routing of URLs to avoid dangling links
Introducing Opium: middleware for REST services
Middleware in Opium: a walkthrough of the Opium HTTP middleware model
Introducing Humane-Re: more friendly regular expression interfaces

Meanwhile, Andy Ray started developing HardCaml (a register transfer level hardware design system) in OCaml, and built the iocamljs interactive browser notebook. This uses js_of_ocaml to port the entire OCaml compilation toolstack to JavaScript, including ocamlfind, Lwt threading and dynamic loading support. The results are browsable online, and it is now easy to generate a JavaScript-driven interactive page for many MirageOS libraries.

An interesting side effect of Andy's patches was the addition of a JavaScript port to the CoHTTP library. For those not familiar with the innards, CoHTTP uses the OCaml module system to build a very portable HTTP implementation that can be mapped to different I/O models (Lwt or Async cooperative threading or POSIX blocking I/O), and to different operating systems (e.g. Unix or MirageOS). The JavaScript support mapped the high-level modules in CoHTTP to the XMLHttpRequest native to JavaScript, allowing the same OCaml HTTP client code to run efficiently on Unix, Windows and now an IOCamlJS browser instance.
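To make the module-system trick a little more concrete, here is a minimal sketch of the general pattern: protocol logic written once as a functor over an abstract I/O signature, then instantiated with a particular backend. All of the names here (IO, Make_client, Blocking_io, send_get) are hypothetical and chosen for illustration; this is not CoHTTP's actual interface.

```ocaml
(* Hypothetical sketch: illustrative names only, not CoHTTP's real API. *)

(* The I/O model is abstracted as a monad plus the one operation we need. *)
module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val bind : 'a t -> ('a -> 'b t) -> 'b t

  type oc (* an output channel *)
  val write : oc -> string -> unit t
end

(* Protocol logic is written once, against the abstract IO signature. *)
module Make_client (IO : IO) = struct
  let send_get oc ~host ~path =
    let ( >>= ) = IO.bind in
    IO.write oc (Printf.sprintf "GET %s HTTP/1.1\r\n" path) >>= fun () ->
    IO.write oc (Printf.sprintf "Host: %s\r\n" host) >>= fun () ->
    IO.write oc "\r\n"
end

(* One possible backend: plain blocking writes into an in-memory buffer. *)
module Blocking_io = struct
  type 'a t = 'a
  let return x = x
  let bind x f = f x

  type oc = Buffer.t
  let write oc s = Buffer.add_string oc s
end

module Client = Make_client (Blocking_io)

(* Render a request using the blocking backend. *)
let request_text =
  let buf = Buffer.create 64 in
  Client.send_get buf ~host:"example.org" ~path:"/index.html";
  Buffer.contents buf
```

A cooperative-threading or browser backend would supply another module matching the same IO signature, and Make_client would not need to change.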

MirageOS uses a number of libraries developed by the Ocsigen team at IRILL in Paris, and so I was thrilled to deliver a talk there in December. Romain Calascibetta started integrating Ocsigen and MirageOS over the summer, and the inevitable plotting over beer in Paris led Gabriel Radanne to kick off an effort to integrate the complete Ocsigen web stack into MirageOS. Head to ocsigen/ocsigenserver#54 if you're interested in seeing this happen in 2015! I also expect the JavaScript and MirageOS integration to continue to improve in 2015, thanks to large industrial users such as Facebook adopting js_of_ocaml in their open-source tools such as Hack and Flow.

IPv6

We've wanted IPv6 support in MirageOS since its inception, and several people contributed to making this possible. At the start of the year, Hugo Heuzard and David Sheets got IPv6 parsing support into the ipaddr library (with me watching bemusedly at how insanely complex IPv6 address parsing is versus IPv4).

Meanwhile, Nicolas Ojeda Bar had been building OCaml networking libraries independently for some time, such as an IMAP client, Maildir handler, and a BitTorrent client. He became interested in the networking layer of MirageOS, and performed a comprehensive cleanup that resulted in a more modular stack that now supports both IPv4 and IPv6!

The addition of IPv6 support also forced us to consider how to simplify the configuration frontend to MirageOS unikernels that was originally written by Thomas Gazagnaire and described here by Mindy Preston. Nicolas has proposed a declarative extension to the configuration that allows applications to extend the mirage command-line more easily, thus unifying the "built-in" MirageOS compilation modes (such as choosing between Xen or Unix) and protocol-specific choices (such as configuring IPv4 and IPv6).

The new approach opens up the possibility of writing more user-friendly configuration frontends that can render the available options as text- or web-based selectors, which is really important as more real-world uses of MirageOS are being created. It should be possible in 2015 to solve common problems such as web or DNS serving without having to write a single line of OCaml code.

Profiling

One of the benefits touted by our CACM article on unikernels at the start of the year was the improved tooling from the static linking of an entire application stack with an operating system layer. Thomas Leonard joined the project this year after publishing a widely read blog series on his experiences of switching from Python to OCaml. Aside from leading (and upstreaming to Xen) the port of MirageOS to ARM, he also explored how to add profiling throughout the unikernel stack.

The support is now comprehensive and integrated into the MirageOS trees: the Lwt cooperative threading engine has hooks for thread switching, most of the core libraries register named events, traces are dumped into shared memory buffers in the CTF file format used by the Linux trace toolkit, and there are JavaScript and GTK+ GUI frontends that can parse them.

You can find the latest instructions on Tracing and Profiling on this website, and Thomas' original blog posts on the subject give further background.

Irmin

Thomas Gazagnaire spent most of the year furiously hacking away at the storage layer in Irmin, which is a clean-slate storage stack that uses a Git-like branching model as the basis for distributed unikernel storage. Irmin 0.9.0 was released in December with efficiency improvements and a sufficiently portable set of dependencies to make JavaScript compilation practical.

"Introducing Irmin: Git-like distributed, branchable storage" describes the concepts and high-level architecture of the system.
"Using Irmin to add fault-tolerance to the Xenstore database" shows how Irmin is used in a real-world application: the security-critical Xen toolstack that manages hosts full of virtual machines (video).

There have been several other early adopters of Irmin for their own projects (independent of MirageOS). One of the most exciting is by Gregory Tsipenyuk, who has been developing a version-controlled Irmin-based IMAP server that offers a very different model for e-mail management. Expect to see more of this in the new year!

We also had the pleasure of Benjamin Farinier and Matthieu Journault joining us as summer interns. Both of them did a great job improving the internals of Irmin, and Benjamin's work on Mergeable Persistent Datastructures will be presented at JFLA 2015.
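One way to picture a Git-like branching model for storage is the three-way merge: two branches are reconciled key by key against their common ancestor. The sketch below illustrates that idea over a plain string-keyed map. The names (Store, outcome, merge_value, merge) are hypothetical, and this is a simplified illustration of the technique rather than Irmin's real API or merge semantics.

```ocaml
(* Hypothetical sketch of a three-way merge; not Irmin's real API. *)

module Store = Map.Make (String)

type 'a outcome = Merged of 'a | Conflict

(* Merge one key, given its value in the common ancestor and in both
   branches. [None] means the key is absent in that version. *)
let merge_value ~ancestor ~ours ~theirs =
  if ours = theirs then Merged ours (* both sides agree *)
  else if ours = ancestor then Merged theirs (* only theirs changed *)
  else if theirs = ancestor then Merged ours (* only ours changed *)
  else Conflict (* both sides changed, differently *)

(* Merge two branches of a store. Returns the merged store together with
   the conflicting keys; a conflicting key keeps "our" value. *)
let merge ~ancestor ~ours ~theirs =
  let conflicts = ref [] in
  let merge_key key o t =
    let a = Store.find_opt key ancestor in
    match merge_value ~ancestor:a ~ours:o ~theirs:t with
    | Merged v -> v
    | Conflict ->
      conflicts := key :: !conflicts;
      o
  in
  let merged = Store.merge merge_key ours theirs in
  (merged, List.rev !conflicts)
```

Keeping "our" value on conflict is just one simple policy for the sketch; a real system would more likely let each data type supply its own merge function.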

Jitsu

Magnus Skjegstad returned to Cambridge and got interested in the rapid dynamic provisioning of unikernels. He built Jitsu, a DNS server that spawns unikernels in response to DNS requests and boots them in real-time with no perceptible lag to the end user. The longer term goal behind this is to enable a community cloud of ARM-based Cubieboard2 boards that serve user content without requiring centralised data centers, but with the ease-of-use of existing systems.
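The core control loop behind this idea can be sketched in a few lines: look up the queried name, boot the matching unikernel if it is not already running, then answer with its address. Everything below is a hypothetical illustration (the service table, the example name and address, and the boot stand-in are all invented), not Jitsu's actual code or API.

```ocaml
(* Hypothetical sketch of on-demand spawning; not Jitsu's real code. *)

type vm_state = Stopped | Running

type service = {
  ip : string; (* address returned in the DNS answer *)
  mutable state : vm_state;
  mutable boots : int; (* how many times this VM has been booted *)
}

(* DNS name -> service record. The entry below is made up for illustration. *)
let services : (string, service) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace services "www.example.org"
    { ip = "192.0.2.10"; state = Stopped; boots = 0 }

(* Stand-in for asking the Xen toolstack to start a unikernel. *)
let boot svc =
  svc.boots <- svc.boots + 1;
  svc.state <- Running

(* Answer a query: boot the unikernel first if it is not already running. *)
let resolve name =
  match Hashtbl.find_opt services name with
  | None -> None (* unknown name: no answer *)
  | Some svc ->
    if svc.state = Stopped then boot svc;
    Some svc.ip
```

In this sketch a second query for the same name finds the service already running and answers immediately, so the boot cost is paid only by the first request.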

Building Jitsu and hitting our goal of extremely low latency management of unikernels required a huge amount of effort from across the MirageOS team.

Dave Scott and Jon Ludlam (two of the Xen maintainers at Citrix) improved the Xen xl toolstack by removing serialisation from the VM startup chain, shaving hundreds of milliseconds off every operation.
Thomas Leonard drove the replacement of our forked Xen MiniOS with a library version that is being fed upstream (including ARM support). This made the delta between Xen and MirageOS much smaller and therefore made reducing end-to-end latency tractable.
David Sheets built a test harness to boot unikernel services and measure their latency under very different conditions, including contrasting boot times versus Docker containers. In many instances, we ended up booting faster than containers due to not touching disk at all with a standalone unikernel. Ian Leslie built us some custom power measurement hardware that came in handy to figure out how to drive down the energy cost of unikernels running on ARM boards.
Thomas Gazagnaire, Balraj Singh and Magnus Skjegstad built the synjitsu proxy server that intercepts and proxies TCP connections to mask the couple of hundred milliseconds of unikernel boot time, ensuring that no TCP connections ever require retransmission from the client.
Dave Scott and I built out the vchan shared memory transport that supports low-latency communication between unikernels and/or Unix processes. This is rapidly heading into a Plan9-like model, with the additional twist of using Git instead of a flat filesystem hierarchy as its coordination basis.
Amir Chaudhry and Richard Mortier documented the Git-based (and eventually Irmin-based) workflow behind managing the unikernels themselves, so that they can easily be deployed to distant ARM devices simply by running git pull. You can read more about this in Amir's From Jekyll to Unikernels post.

All of this work was hastily crammed into a USENIX NSDI 2015 paper that got submitted at 4am on a bright autumn morning. Here is the published paper, and we're planning a blog post describing how you can deploy this infrastructure for yourself.

Community

All of the above work was only possible due to the vastly improved tooling and infrastructure around the project. Our community manager Amir Chaudhry led the minuted calls every two weeks that tied the efforts together, and we established some pioneer projects for newcomers to tackle.

The OPAM package manager continued to be the frontend for all MirageOS tools, with releases of libraries happening regularly. Because of the modular nature of MirageOS code, most of the libraries can also be used as normal Unix-based libraries, meaning that we aren't just limited to MirageOS users but can benefit from the entire OCaml community. The graph to the right shows the growth of the total package database since the project started to give you a sense of how much activity there is.

The major OPAM 1.2 release also added a number of new features that made MirageOS code easier to develop, including a Git-based library pinning workflow that works superbly with GitHub, and easier Travis integration for continuous integration. Nik Sultana also improved is-mirage-broken to give us a cron-driven prod if a library update caused an end-to-end failure in building the MirageOS website or other self-hosted infrastructure.

Our favourite random idiot, Mindy Preston, wrote up a superb blog series about her experiences in the spring of 2014 with moving her homepage to be hosted on MirageOS. This was followed up by Thomas Leonard, Phil Tomson, Ian Wilkinson, Toby Moore, and many others that we've tried to record in our link log. We really appreciate the hundreds of bug reports filed by users and folk trying out MirageOS; by taking the trouble to do this, you've helped us refine and polish the frontend. One challenge for 2015 that we could use help on is to pull together many of these distributed blogged instructions and merge them back into the main documentation (get in touch if interested!).

OCaml has come a long way in the last year in terms of tooling, and another task my research group OCaml Labs works on at Cambridge is the development of the OCaml Platform. I'll be blogging separately about our OCaml-specific activities in a few days, but all of this work has a direct impact on MirageOS itself since it lets us establish a local feedback loop between MirageOS and OCaml developers to rapidly iterate on large-scale development. The regular OCaml compiler hacking sessions organised by Jeremy Yallop and Leo White have been a great success this year, attracting a wide variety of people from academic (Cambridge, London universities and Microsoft Research) and industrial (Jane Street, Citrix and Facebook among others) backgrounds, as well as locally interested folk. One very important project that has had a lot of work put into it in 2014 (but isn't quite ready for a public release yet) is Assemblage, which will remove much of the boilerplate currently needed to build and release an OCaml library to OPAM.

We also had a great time working with open-source summer programs. Thanks to the Xen Foundation and GNOME for their support here, and we hope to do this again next summer! The roundup posts were:

OPW FIN by Mindy Preston: on her FOSS Outreach Program work.
Amazon Adventures by Jyotsna Prakash: on her Google Summer of Code 2014 efforts on EC2 bindings.

Upcoming features

So what's coming up for our unikernels in 2015? Our focus heading into the new year is very much on improving the ease-of-use and deployability of MirageOS and fleshing out the feature set for the early adopters such as the XAPI project, Galois, and the Nymote personal data project. Here are some of the highlights:

Dust Clouds: The work on Jitsu is leading to the construction of what we term "dust clouds": on-demand scaling of unikernel services within milliseconds of requests coming in, terminated right beside the user on local ARM devices. The model supports existing clouds as well, and so we are improving support for cloud APIs via Jyotsna Prakash's EC2 bindings, XenAPI, and (volunteers needed) OpenStack support. If you're interested in tracking this work, head over to the Nymote site for updates.
Portability: Beyond Xen, there are several efforts afoot to port MirageOS to bare metal targets. One promising effort is to use Rump Kernels as the boot infrastructure and MirageOS as the application stack. We hope to have a Raspberry Pi and other ARM targets fairly soon. Meanwhile, at the other end of the spectrum is mobile computing, which was part of the original multiscale vision for starting the project. The JavaScript, iOS and Android ports are all progressing (mainly thanks to community contributions around OCaml support for this space, such as Jeff Psellos' hard work on OCaml-IOS).
Protocol Development: There are a huge number of protocols being developed independently, and more are always welcome. Luke Dunstan is hacking on multicast DNS support, we have an IMAP client and server, Dominic Price has built a series of social network APIs for Facebook or Tumblr, and Masoud Koleini has been extending Haris Rotsos' work to achieve a line-rate and type-safe OpenFlow switch and controller based on the Frenetic project. Hannes is also developing Jackline, which uses his MirageOS libraries to assemble a trustworthy communication client. Daniel Buenzli also continues to release a growing set of high-quality, modular libraries that we depend on throughout MirageOS.
Storage: All storage services from the unikernels will be Git-based (e.g. logging, command-and-control, key-value retrieval). Expect to see Xen toolstack extensions that make this support seamless, so a single Linux VM will be able to control a large army of unikernels via persistent data structures.

Want to get involved?

This is a really fun time to get involved with unikernels and the MirageOS project. The year of 2014 has seen lots of discussion about the potential of unikernels and we'll see some of the first big deployments involving them in 2015. If you wish to learn more, check out the pioneer projects, watch out for Amir's meeting notes and join the voice calls if you want a more interactive discussion, and engage on the mailing lists with any questions you might have.

For me personally, it's been a real privilege to spend the year working with and learning from the friendly, intelligent and diverse community that is springing up around the project. The progression from experiment to reality has been a lot of work, but the unikernel dream is finally coming together rather nicely thanks to everyone's hard work and enthusiasm. I'd also like to thank all of our funding bodies and the Linux Foundation and the Xen Project (especially Lars Kurth and Russell Pavlicek) for their support throughout the year that made all this work possible. Happy new year, everyone!