The MirageOS Blog: on building functional operating systems

Jul 18, 2016

MirageOS Summer 2016 Hackathon roundup

By Gemma Gordon, Anil Madhavapeddy

Our first Cambridge-based MirageOS hackathon took place yesterday - and what a fantastic day it was! The torrential rain may have halted our punting plans, but it didn't stop progress in the Old Library! Darwin College was a fantastic venue, complete with private islands linked by picturesque wooden bridges and an unwavering wifi connection.

People naturally formed groups to work on similar projects, and we had a handful of brand new users keen to get started with OCaml and Mirage. The major tasks that emerged were:

  • new hypervisor target: the integration of the Solo5 KVM-based hypervisor backend, bringing the number of officially supported targets up to 3 (Xen, Unix and KVM)
  • build system template: establishing a new topkg template for MirageOS libraries, to prepare us for building a unified API documentation bundle that works across the entire project.
  • CPU portability: improving ARM support via a better base OS image.
  • libraries breadth: hacking on all the things to fill in the blanks, such as btree support for bare-metal Irmin, or a peer-to-peer layer for the DataKit.

We'll write about all of this in more detail, but for now here are the hackathon notes hot off the press...

Solo5/MirageOS integration (KVM-based backend)

Progress on the Solo5 project has been steaming ahead since January, and this was the perfect opportunity to get everyone together to plan its integration with MirageOS. Dan Williams from IBM Research flew over to join us for the week, and Martin Lucina headed to Cambridge to prepare for the upstreaming of the recent Solo5 work. This included deciding on naming and ownership of the repositories, detailing the relationships between repositories and getting ready to publish the mirage-solo5 packages to OPAM. Mindy Preston, our MirageOS 3.0 release manager, and Anil Madhavapeddy and Thomas Gazagnaire (OPAM minions) were on hand to help plan this smoothly.

See their updates from the day on Canopy and the related blog posts.

Onboarding new MirageOS/OCaml users

Our tutorials and onboarding guides really needed a facelift and an update, so Gemma Gordon spent the morning with some of our new users to observe their installation process and tried to pinpoint blockers and areas of misunderstanding. Providing the simple, concise instructions needed in a guide together with alternatives for every possible system and version requirement is a tricky combination to get right, but we made some changes to the installation guide that we hope will help. The next task is to do the same for our other popular tutorials, reconfigure the layout for easy reading and centralise the information as much as possible between the OPAM, MirageOS and OCaml guides. Thank you to Marwan Aljubeh for his insight into this process.

Other industrial users are also steaming ahead with their own MirageOS deployments. Amir Chaudhry spent the hackathon blogging about NFV Platforms with MirageOS unikernels, which details how Ericsson Silicon Valley has been using MirageOS to build lightweight routing kernels.

Packaging

Thomas Gazagnaire was frenetically converting functoria, mirage, mirage-types and mirage-console to use topkg, and the feedback prompted fixes and a new release from Daniel Buenzli.

ARM and Cubieboards

Ian Campbell implemented a (slightly hacky) way to get Alpine Linux onto some Cubieboard2 boxes and provided notes on his process, including how to tailor the base for KVM and Xen respectively.

Meanwhile, Qi Li worked on testing and adapting simple-nat and mirage-nat to provide connectivity control for unikernels on ARM Cubieboards to act as network gateways.

Hannes Mehnert recently published a purely functional ARP package and continued refining it (with code coverage via bisect-ppx) during the hackathon.

MirageOS 3.0 API changes

Our MirageOS release manager, Mindy Preston, was on hand to talk with everyone about their PRs in preparation for the 3.0 release, along with some patches for deprecating out-of-date code. There has been a lot of discussion on the development list. One focus was to address time handling properly in the interfaces: Matthew Gray came up from London to finish his extensive revision of the CLOCK interface, and Hannes developed a new duration library to handle time-unit conversions safely and remove the need for floating point arithmetic. We are aiming to minimise floating point in external interfaces, since that simplifies compilation for embedded hardware that only has soft floats (particularly for something as ubiquitous as time handling).
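In the spirit of that duration work, time-unit conversions can be done entirely with integer arithmetic over nanoseconds stored in an int64, so no floats are ever involved. This is an illustrative stdlib-only sketch, not the actual API of the duration library:

```ocaml
(* Illustrative sketch: represent durations as nanoseconds in an int64,
   so conversions never touch floating point. *)
let ns_of_us us = Int64.mul (Int64.of_int us) 1_000L
let ns_of_ms ms = Int64.mul (Int64.of_int ms) 1_000_000L
let ns_of_sec s = Int64.mul (Int64.of_int s) 1_000_000_000L

(* Integer division truncates towards zero, which is fine for a
   whole-seconds view of a duration. *)
let sec_of_ns ns = Int64.to_int (Int64.div ns 1_000_000_000L)
```

On a soft-float target this costs only 64-bit integer multiplication and division, which the compiler can lower without any floating point emulation.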

Error logging

Thomas Leonard continued the work he started in Marrakech by updating the error reporting patches (also here) to work with the latest version of MirageOS (which has a different logging system, based on Daniel Buenzli's Logs). See the original post for more details.

Ctypes 0.7.0 release

Jeremy released the foreign function interface library Ctypes 0.7.0 which, along with bug fixes, adds the following features:

  • Support for bytecode-only architectures (#410)
  • A new sint type corresponding to a full-range C integer and updated errno support for its use (#411)

See the full changelog online.

P2P key-value store over DataKit

KC Sivaramakrishnan and Philip Dexter took on the challenge of grabbing the Docker DataKit release and started building a distributed key-value store that features flexible JSON synching and merging. Their raw notes are in a Gist -- get in touch with them if you want to help hack on the sync system backed by Git.

Developer experience improvements

The OCaml Labs undergraduate interns are spending their summers working on user improvements and CI logs with MirageOS, and used the time at the hackathon to focus on these issues.

Ciaran Lawlor is working on an editor implementation, specifically getting the IOcaml kernel working with the Hydrogen plugin for the Atom editor. This will allow developers to run OCaml code directly in Atom, and eventually interactively build unikernels!

Joel Jakubovic used Angstrom (a fast parser combinator library developed by Spiros Eliopoulos) to convert ANSI escape codes, usually displayed as colours and styles, into HTML for use in viewing CI logs.

Windows Support

Most of the Mirage libraries already work on Windows thanks to lots of work in the wider OCaml community, but other features don't have full support yet.

Dave Scott from Docker worked on ocaml-wpcap: a ctypes binding to the Windows winpcap.dll which lets OCaml programs send and receive ethernet frames on Windows. The ocaml-wpcap library will hopefully let us run the Mirage TCP/IP stack and all the networking applications too.

David Allsopp continued his OPAM-Windows support by fine-tuning the 80 native Windows OCaml versions - these will hopefully form part of OPAM 2.0. As it turns out, he's not the only person still interested in being able to run OCaml 3.07...if you are, get in touch!

General Libraries and utilities

Olivier Nicole is working on an implementation of macros in OCaml and started working on the HTML and XML templates using this system. The objective is to have the same behaviour as the Pa_tyxml syntax extension, but in a type-safe and more maintainable way without requiring PPX extensions. This project could be contributed to the development of Ocsigen once implemented.

Nick Betteridge teamed up with Dave Scott to look at using ocaml-btree as a backend for Irmin/xen and spent the day looking at different approaches.

Anil Madhavapeddy built a Docker wrapper for the CI system and spun up a big cluster to run OPAM bulk builds. Several small utilities, such as jsontee, an immutable log collection server and the bulk build scripts, will be released in the next few weeks once the builds are running stably, and will be reusable by other OPAM-based projects for their own tests.

Christophe Troestler is spending a month at OCaml Labs in Cambridge this summer, and spent the hack day working on implementing a library to allow seamless application switching from HTTP to FastCGI. Christophe has initiated work on a client and server for this protocol using CoHTTP so that it is unikernel-friendly.


Jun 29, 2016

MirageOS Summer 2016 Hackathon announcement, and talk roundup

By Gemma Gordon, Anil Madhavapeddy

As summer starts to shine over an obstinately rainy England, we are organising the second MirageOS hackathon in Cambridge! It will be held on Weds 13th July at the lovely Darwin College from 9am-11pm, with snacks, teas, coffees and a servery lunch provided (thanks to sponsorship from Docker and OCaml Labs).

Anyone is welcome at all skill levels, but we'd appreciate you filling out the Doodle so that we can plan refreshments. We will be working on a variety of projects from improving ARM support, to continuous integration tests, the new Solo5 backend and improving the suite of protocol libraries. If you have something in particular that interests you, please drop a note to the mailing list or check out the full list of Pioneer Projects.

Some other events of note recently:

  • After several years of scribing awesome notes about our development, Amir has handed over the reins to Enguerrand. Enguerrand joined OCaml Labs as an intern, and has built an IRC-to-Git logging bot which records our meetings over IRC and commits them directly to a repository which is available online. Thanks Amir and Enguerrand for all their hard work on recording the growing amount of development in MirageOS. Gemma Gordon has also joined the project and been coordinating the meetings. The next one is in a few hours, so please join us on #mirage on Freenode IRC at 4pm British time if you would like to participate or are just curious!

  • Our participation in the Outreachy program for 2016 has begun, and the irrepressible Gina Marie Maini (aka wiredsister) has been hacking on syslogd, mentored by Mindy Preston. She has already started blogging (about syslog and OCaml love), as well as podcasting with the stars. Welcome to the crew, Gina!

  • The new Docker for Mac and Docker for Windows products have entered open beta! They use a number of libraries from MirageOS (including most of the network stack) and provide a fast way of getting started with containers and unikernel builds on Mac and Windows. You can find talks about it from the recent JS London meetup, along with my slides. I also spoke about it at OSCON 2016, but those videos aren't online yet.

There have also been a number of talks in the past couple of months about MirageOS and its libraries:

  • Dan Williams from IBM Research delivered a paper at USENIX HotCloud 2016 about Unikernel Monitors. This explains the basis of his work on Solo5, which we are currently integrating into MirageOS as a KVM-based boot backend to complement the Xen port. You can also find his talk slides online.
  • Amir Chaudhry has given several talks and demos recently: check out his slides and detailed writeups about GlueCon 2016 and CraftConf 2016 in particular, as they come with instructions on how to reproduce his Mirage/ARM on-stage demonstrations of unikernels.
  • Sean Grove is speaking at Polyconf 2016 next week in Poland. If you are in the region, he would love to meet up with you as well -- his talk abstract is below:

    With libraries like Mirage, js_of_ocaml, & ARM compiler output, OCaml apps can operate at such a low level that we don't even need operating systems on the backend anymore (removing 15 million lines of memory-unsafe code), while at the same time writing UIs is easier & more reliable than ever before, with lightweight type-checked code sharing between server, browser clients, & native mobile apps. We'll look at what's enabled by new tech like unikernels, efficient JS/ARM output, & easy host interop.

May 4, 2016

MirageOS Spring 2016 Hackathon!

By Gemma Gordon

Hackathon Trip Reports

We're looking forward to the next MirageOS hackathon already! We've collected some reports from those who were present at our 2016 Spring hackathon to share our excitement! Thanks to the folks who put in the time and effort to organize the event and our wonderful hosts, and a huge thanks to everyone who documented their hackathon experience!

More information is also available at the Canopy site developed and used for information sharing during the hackathon!

Trip Report

by David Kaloper

roof-flash

Last month, the MirageOS community saw its first community-organized, international hackathon. It took place between 11th and 16th March 2016. The venue? Riad Priscilla, Marrakech, Morocco.

The place turned out to be ideal for a community building exercise. A city bursting with life, scents and colors, a relaxed and friendly hostel with plenty of space, warm and sunny weather -- all the elements of a good get-together were there. This is where some 25 hackers from all over the world convened, with various backgrounds and specialties, all sharing an interest in MirageOS.

Not wanting to limit ourselves to breaking only those conventions, we added another layer: the hackathon was set up as a classical anti-conference, with a bare minimum of structure, no pre-defined program, and a strong focus on one-on-one work, teaching, and collaboration.

As this was the first hackathon, this time the focus was on building up the nascent community that already exists around MirageOS. Faces were put to online handles, stories were exchanged, and connections were forged. Meeting in person helped bring a new level of cohesion to the online community around the project, as witnessed by the flurry of online conversations between people that were present, and have continued after the event ended.

One particularly useful (however inglorious) activity proved to be introducing people to the tool chain. Even though the MirageOS website has a documentation section with various documents on the architecture of the project, technical blog posts and a series of examples to get newcomers started, a number of people found it difficult to juggle all the concepts and tools involved. Where is the line dividing ocamlfind from opam and what exactly constitutes an OCaml library? What is the correct way to declare dependencies not covered by the declarative configuration language? When should one use add_to_opam_packages, and when add_to_ocamlfind_libraries? Will the mirage tool take care of installing missing packages so declared?

Although these questions either have answers scattered throughout the docs, or are almost obvious to an experienced MirageOS developer, such getting-started issues proved to be an early obstacle for a number of hackathon participants. While our project documentation certainly could -- and will! -- be improved with the perspective of a new developer in mind, this was an opportunity to help participants get a more comprehensive overview of the core tooling in an efficient, one-to-one setting. As a result, we saw a number of developers go from trying to get the examples to compile to making their own unikernels within a day, something pretty awesome to witness!
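To make the tooling questions above concrete, here is a hedged sketch of a pre-functoria config.ml declaring dependencies by hand. The function names are taken from the questions above, but their exact signatures varied between mirage releases, and the `re` / `re.str` package names are placeholders for illustration:

```ocaml
(* config.ml sketch: declare extra dependencies not covered by the
   declarative configuration language. Names and signatures are
   indicative only. *)
open Mirage

let () =
  add_to_opam_packages ["re"];            (* opam package to install *)
  add_to_ocamlfind_libraries ["re.str"];  (* ocamlfind library to link *)
  register "example"
    [ foreign "Unikernel.Main" (console @-> job) $ default_console ]
```

Whether the mirage tool would then install the declared opam packages itself differed between releases, which is exactly the sort of confusion the one-to-one sessions helped clear up.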

Another fun thread was dogfooding the network stack. The network itself was provided by our venue, Priscilla, but we brought our own routers and access points. DHCP on site was served by Charrua, which stood up to the task admirably. We were able to access arbitrary domains on the Internet, almost all of the time!

A group of hackers had a strong web background, and decided to focus their efforts there. Perhaps the most interesting project to come out of this is Canopy. Canopy is best described as the first dynamic offering in the space of static web site generators! It combines Irmin with TyXML, COW, and Mirage HTTP, to create a simple, one-stop solution for putting content on the web. A Canopy unikernel boots, pulls a series of markdown files from a git repository, renders them, and serves them via HTTP. Expect more to come in this space, as Canopy has already proved to be a handy tool to simply put something on the web.

At the same time, the atmosphere was conducive for discussing how OCaml in general, and MirageOS in particular, fits in the web development ecosystem. As a language originally honed in different contexts, it's the opinion of a number of practicing web developers that the current OCaml ecosystem is not as conducive to supporting their day-to-day work as it could be. These brainstorming sessions led to a writeup which tries to summarize the current state and plot the course forward.

Another group of hackers was more focused on security and privacy technology. MirageOS boasts its own cryptographic core and a TLS stack, providing a solid base for development of cryptographic protocols. We saw coordinated work on improving the cryptographic layer; implementations of a few key-derivation functions (Scrypt and PBKDF); and even a beginning of an IKEv2 implementation.

A further common topic was networking, which is not entirely surprising for a network-centric unikernel platform. Amidst the enthusiasm, hackers in attendance started several projects related to general networking. These include a SWIM membership protocol implementation, the beginnings of telnet for Mirage, SOCKS4 packet handling, DNS wildcard matching, Charrua updates, and more.

In between these threads of activity, people used the time to get general MirageOS work done. This resulted in lots of progress including: making AFL, already supported by OCaml, run against MirageOS unikernels; a comprehensive update of error reporting across the stack; a concentrated push to move away from Camlp4 and adopt PPX; and producing a prototype unikernel displaying rich terminal output via telnet.

Partially motivated by the need to improve the experience of building and running unikernels, a number of hackers worked on improvements to error reporting and logging. Improving the experience when things go wrong will be an important part of helping folks make unikernels with MirageOS.

For more, and less structured, details of what went on, check out the blog some of us kept, or the meeting notes from the few short morning meetings we had.

It seems that when surrounded by like-minded, skilled people, in a pleasant atmosphere, and with absolutely nothing to do, people's curiosity will reliably kick in. In between lying in the sun (sunscreen was a hot commodity!), sinking into the midday heat, and talking to other hackers, not a single person failed to learn, practice, or produce something new.

In this way, the first MirageOS hackathon was a resounding success. Friendships were forged, skills shared, and courses plotted. And although the same venue has already been booked for next year's event, there is ongoing chit-chat about cutting the downtime in half with a summer edition!

heat


MirageOS hackathon in Marrakech

Text and images by Enguerrand Decorne

Setting up and settling in

The first MirageOS hackathon was held from March 11th-16th 2016, at Priscilla, Queen of the Medina, Marrakech. It successfully gathered around 30 Mirage enthusiasts, some already familiar with the MirageOS ecosystem, and others new to the community. People travelled from Europe and further afield for a week of sun, tajine and hacking.

Main room

Getting to the guesthouse was an adventure, and once there we prepared by quickly setting up a nice internet hotspot then organised groups to head to the souk to meet new arrivals. Soon enough the guest house was filled with people, and various new projects and ideas began to emerge. Having a few books and experienced OCaml developers around helped the OCaml newcomers get stuck in, and it didn't take long to get their first unikernel or OCaml library up and running. Daily meetings were arranged at noon on the rooftop in order to allow the exchange of project ideas and questions, and we used the hackathon notepad to loosely pair projects and people together. Our DHCP server enabled extensive dogfooding and successfully fulfilled our project-testing needs.

Participants found a wide range of activities to keep themselves occupied during the event: contributing to the MirageOS Pioneer Projects, starting new projects and libraries, improving the MirageOS ecosystem and core components, discussing new ideas... or simply enjoying the sun, delicious tajine, or walking around Marrakech itself. Some expeditions were also (non)organised during the week, allowing sightseeing of the nicest local spots, or negotiating with local stallholders to get the best prices on souvenirs and fresh fruits to enjoy during hard hacking sessions.

Food

My week inside the camel's nest

A few days before heading up to Marrakech (in a very non-organised fashion, having been offered a hackathon place only two days before!), the idea of writing some kind of notebook using Mirage had been floating around - we wanted to allow people at the hackathon to exchange ideas, and to keep those not physically present updated about progress. I decided to write a simple blog unikernel, Canopy, which relies on Irmin's capabilities to synchronise remote git repositories. By describing new pages in a format similar to Jekyll (using Markdown) in a git repository, new content pushed there would be pulled to the website and displayed nicely. This allowed every participant to report on their current projects, and to see the content displayed on the notepad after a simple git push.

The project was well received, and new ideas started to emerge to turn it into a CMS enabling users to easily describe a new website with a simple git repository. A huge thank you to Michele for his awesome contributions, as well as to everyone involved with answering questions about the Mirage ecosystem along the way. This project also allowed me to dive a little further into various libraries, report a few issues, discuss features and discover new concepts... A week well spent, that I would be glad to repeat at the next MirageOS hackathon :)

Conclusion

Rooftop

This hackathon was a huge success and allowed the MirageOS community to combine sun and high productivity in a crazy yet very relaxing week. We hope (and plan) to see more events like this, so anyone interested in OCaml, Mirage - expert or not - is more than welcome to join us next time!

Cats


MirageOS + OCaml Newcomers

by Alfredo and Sonia

Our experience in Marrakesh was great. We really enjoyed the place, the weather, the food, the people and the atmosphere! I think the setting was a great win: there was a lot of open space where you could find a quiet spot to concentrate while programming, as well as a place with lots of people coding, a place where you could talk about anything while enjoying the sun, or just hang out and get lost for a while in Marrakesh's nice old city.

We had already learnt some OCaml, but we are both quite new to OCaml and MirageOS, so we decided to work on a project with a low entry barrier so we could get into the loop more easily. Nevertheless, we had to invest some time getting familiar with the specifics of the OCaml environment (libraries, packaging, testing frameworks, etc.). Hannes kindly helped us get started, showing us a library (ocaml-hkdf) we could use to understand all this better, and from there we could start writing some code. Having most of the authors (Thomas, David, Hannes...) of the libraries we used (nocrypto, cstruct, alcotest, opam...) there with us was also a win. Finally we managed to release a pair of libraries with key derivation functions (ocaml-pbkdf and ocaml-scrypt-kdf), so we are quite happy with the outcome.

The only downside of the hackathon we can think of, if any, is that we didn't get too deep into the MirageOS specifics (something we are surely willing to fix!), but we wanted to stay focused to keep productive and had enough new things to learn.


Hackathon Projects

by Ximin Luo

Here's a list of things I did during the hackathon:

  • Read into ocaml-tls and ocaml-otr implementations, as well as David's "nqsb" TLS paper
  • Talked with David about developing a general pattern for implementing protocols, that allows one to compose components more easily and consistently. He pointed me to many resources that I could learn from and build on top of.
  • Read documents on "Extensible Effects", "Freer Monads" and "Iteratee pattern" by Oleg Kiselyov.
  • Read documents and source code of the Haskell Pipes library by Gabriel Gonzalez.
  • Sent some PRs to Hannes' jackline IM client, for better usability under some graphical environments.
  • Showed some people my ocaml-hello "minimal build scripts" example, and my ocaml emacs scripts.
  • Tested the Solo5 system that runs MirageOS on KVM as an alternative to Xen.

I'm continuing with the following work in my spare time:

  • Read documents and source code of the opam monadlib library with a view to extending this and unifying it with other libraries such as lwt.
  • Using the approach of the Haskell Pipes library to develop a general protocol handler framework. I'm experimenting initially in Haskell, but I'd also like to do it in OCaml when the ideas are more solid.

In terms of the event it was great - everything worked out very well, I don't have any suggestions for improvements :)


May 3, 2016

MirageOS security advisory 00: mirage-net-xen

By Hannes Mehnert

MirageOS Security Advisory 00 - memory disclosure in mirage-net-xen

  • Module: mirage-net-xen
  • Announced: 2016-05-03
  • Credits: Enguerrand Decorne, Thomas Leonard, Hannes Mehnert, Mindy Preston
  • Affects: mirage-net-xen <1.4.2
  • Corrected: 2016-01-08 1.5.0 release, 2016-05-03 1.4.2 release

For general information regarding MirageOS Security Advisories, please visit https://mirage.io/security.

Hannes published a blog article about the analysis of this issue.

Background

MirageOS is a library operating system using cooperative multitasking, which can be executed as a guest of the Xen hypervisor. Virtual devices, such as a network device, share memory between MirageOS and the hypervisor. MirageOS allocates and grants the hypervisor access to a ringbuffer containing pages to be sent on the network device, and another ringbuffer with pages to be filled with received data. A write on the MirageOS side consists of filling the page with the packet data, submitting a write request to the hypervisor, and awaiting a response from the hypervisor. To correlate the request with the response, a 16-bit identifier is used.

Problem Description

Generation of this 16-bit identifier was not done in a unique manner. When multiple pages share an identifier and are queued for transmission on the wire, the first successful response will mark all pages with this identifier free, even those still waiting to be transmitted. Once a page is marked free, the MirageOS application fills it with another chunk of data. This leads to corrupted packets being sent, and can lead to disclosure of memory intended for another recipient.

Impact

This issue discloses memory intended for another recipient. All versions before mirage-net-xen 1.4.2 are affected. The receiving side uses a similar mechanism, which may lead to corrupted incoming data (eventually even mutated while being processed).

Version 1.5.0, released on 8th January, already assigns unique identifiers for transmission. Received pages are copied into freshly allocated buffers before being passed to the next layer. When 1.5.0 was released, the impact was not clear to us. Version 1.6.1 now additionally ensures that received pages have a unique identifier.

Workaround

No workaround is available.

Solution

Identifiers are now generated uniquely, using a monotonic counter.
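A stdlib-only sketch (not the actual mirage-net-xen code) of the counter-based scheme: truncating a monotonic counter to 16 bits guarantees that identifiers are distinct as long as fewer than 65536 requests are in flight at once.

```ocaml
(* Illustrative sketch: hand out 16-bit request identifiers from a
   monotonic counter, so that concurrently in-flight requests never
   share an id (assuming fewer than 65536 are outstanding at once). *)
let counter = ref 0

let next_id () =
  let id = !counter land 0xFFFF in
  incr counter;
  id
```

Compare this with picking identifiers arbitrarily: two outstanding requests could collide, and the first response would free both pages, which is exactly the bug described above.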

Transmitting corrupt data and disclosing memory is fixed in versions 1.4.2 and above.

The recommended way to upgrade is:

opam update
opam upgrade mirage-net-xen

Or, explicitly:

opam upgrade
opam reinstall mirage-net-xen=1.4.2

Affected releases have been marked uninstallable in the opam repository.

Correction details

The following list contains the correction revision numbers for each affected branch.

Memory disclosure on transmit:

master: 47de2edfad9c56110d98d0312c1a7e0b9dcc8fbf

1.4: ec9b1046b75cba5ae3473b2d3b223c3d1284489d

Corrupt data while receiving:

master: 0b1e53c0875062a50e2d5823b7da0d8e0a64dc37

1.4: 6daad38af2f0b5c58d6c1fb24252c3eed737ede4

References

mirage-net-xen

You can find the latest version of this advisory online at https://mirage.io/blog/MSA00.

This advisory is signed using OpenPGP; you can verify the signature by downloading our public key from a keyserver (gpg --recv-key 4A732D757C0EDA74), downloading the raw markdown source of this advisory from GitHub, and executing gpg --verify 00.md.asc.


Feb 29, 2016

Introducing Functoria

By Gabriel Radanne

For the last few months, I've been working with Thomas on improving the mirage tool and I'm happy to present Functoria, a library to create arbitrary MirageOS-like DSLs. Functoria is independent from mirage and will replace the core engine, which was somewhat bolted on to the tool until now.

This introduces a few breaking changes, so please consult the breaking changes page to see what is different and how to fix things if needed. The good news is that it will be much simpler to use, much more flexible, and will even produce pretty pictures!

Configuration

For people unfamiliar with MirageOS, the mirage tool handles configuration of mirage unikernels by reading an OCaml file describing the various pieces and dependencies of the project. Based on this configuration it will use opam to install the dependencies, handle various configuration tasks and emit a build script.

A very simple configuration file looks like this:

open Mirage
let main = foreign "Unikernel.Main" (console @-> job)
let () = register "console" [main $ default_console]

It declares a new functor, Unikernel.Main, which takes a console as an argument, and instantiates it with the default_console. For more details about unikernel configuration, please read the hello-world tutorial.

Keys

A much-demanded feature has been the ability to define so-called bootvars: variables whose value is set either at configure time or at startup time.

A good example of a bootvar would be the IP address of the HTTP stack. For example, you may wish to:

  • Set a good default directly in the config.ml
  • Provide a value at configure time, if you are already aware of deployment conditions.
  • Provide a value at startup time, for last minute changes.

All of this is now possible using keys. A key is composed of:

  • name — The name of the value in the program.
  • description — How it should be displayed/serialized.
  • stage — Is the key available only at runtime, at configure time, or both?
  • documentation — This is not optional, so you have to write it.

Imagine we are building a multilingual unikernel and we want to pass the default language as a parameter. The language parameter is an optional string, so we use the `opt` and `string` combinators. We want to be able to define it both at configure and run time, so we use the stage `Both. This gives us the following code:

let lang_key =
  let doc = Key.Arg.info
      ~doc:"The default language for the unikernel." [ "l" ; "lang" ]
  in
  Key.(create "language" Arg.(opt ~stage:`Both string "en" doc))

Here, we defined both a long option --lang and a short one -l (the format is similar to the one used by Cmdliner). In the unikernel, the value is retrieved with Key_gen.language ().

The option is also documented in the --help option for both mirage configure (at configure time) and ./my_unikernel (at startup time).

       -l VAL, --lang=VAL (absent=en)
           The default language for the unikernel.

A simple example of a unikernel with a key is available in mirage-skeleton in the `hello` directory.

Switching implementation

We can do much more with keys, for example we can use them to switch devices at configure time. To illustrate, let us take the example of dynamic storage, where we want to choose between a block device and a crunch device with a command line option. In order to do that, we must first define a boolean key:

let fat_key =
  let doc = Key.Arg.info
      ~doc:"Use a fat device if true, crunch otherwise." [ "fat" ]
  in
  Key.(create "fat" Arg.(opt ~stage:`Configure bool false doc))

We can use the `if_impl` combinator to choose between two devices depending on the value of the key.

let dynamic_storage =
  if_impl (Key.value fat_key)
    (kv_ro_of_fs my_fat_device)
    (my_crunch_device)

We can now use this device as a normal storage device of type kv_ro impl! The key is also documented in mirage configure --help:

       --fat=VAL (absent=false)
           Use a fat device if true, crunch otherwise.

It is also possible to compute on keys before giving them to if_impl, combining multiple keys in order to compute a value, and so on. For more details, see the API and the various examples available in mirage and mirage-skeleton.
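The key-computation API is not shown here, but a simple way to branch on more than one key is to nest if_impl, since its result is itself an implementation. In this hedged sketch, fat_key is as above, while mem_key and the three devices are hypothetical:

```ocaml
(* Hedged sketch: nesting if_impl to select among three storage devices
   with two boolean keys. mem_key, my_fat_device, my_mem_device and
   my_crunch_device are hypothetical names. *)
let storage =
  if_impl (Key.value fat_key)
    (kv_ro_of_fs my_fat_device)
    (if_impl (Key.value mem_key)
       my_mem_device
       my_crunch_device)
```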

Switching keys opens various possibilities, for example a generic_stack combinator is now implemented in mirage that will switch between socket stack, direct stack with DHCP, and direct stack with static IP, depending on command line arguments.

Drawing unikernels

All these keys and dynamic implementations make for complicated unikernels. In order to clarify what is going on and help to configure our unikernels, we have a new command: describe.

Let us consider the console example in mirage-skeleton:

open Mirage

let main = foreign "Unikernel.Main" (console @-> job)
let () = register "console" [main $ default_console]

This is fairly straightforward: we define a Unikernel.Main functor using a console and we instantiate it with the default console. If we execute mirage describe --dot in this directory, we will get the following output.

A console unikernel

As you can see, there are already quite a few things going on! Rectangles are the various devices and you'll notice that the default_console is actually two consoles: the one on Unix and the one on Xen. We use the if_impl construction — represented as a circular node — to choose between the two during configuration.

The key device handles the runtime keys. It relies on an argv device, which is similar to console. These devices are present in all unikernels.

The mirage device brings all the jobs together (and, on the hypervisor, binds them).

Data dependencies

You may have noticed dashed lines in the previous diagram, in particular from mirage to Unikernel.Main. Those lines are data dependencies. For example, the bootvar device has a dependency on the argv device. This means that argv is configured and run first, returns some data — an array of strings — then bootvar is configured and run.

If your unikernel has a data dependency — say, initializing the entropy — you can use the ~deps argument on Mirage.foreign. The start function of the unikernel will receive one extra argument for each dependency.

As an example, let us look at the `app_info` device. This device makes the configuration information available at runtime. We can declare a dependency on it:

let main =
  foreign "Unikernel.Main" ~deps:[abstract app_info] (console @-> job)

A unikernel with info

The only difference from the previous unikernel is the data dependency — represented by a dashed arrow — going from Unikernel.Main to Info_gen. This means that Unikernel.Main.start will take an extra argument of type Mirage_info.t which we can, for example, print:

name: console
libraries: [functoria.runtime; lwt.syntax; mirage-console.unix;
            mirage-types.lwt; mirage.runtime; sexplib]
packages: [functoria.0.1; lwt.2.5.0; mirage-console.2.1.3; mirage-unix.2.3.1;
           sexplib.113.00.00]
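A start function that produces a listing like the one above might look like the following hedged sketch. The argument order for dependencies and the existence of a Mirage_info serializer are assumptions, not the confirmed API; treat this as an illustration of the shape, not a definitive implementation.

```ocaml
(* Hedged sketch: a unikernel whose start function receives the app_info
   dependency. Both the position of the info argument and the
   Mirage_info.sexp_of_t serializer are assumptions about the API. *)
module Main (C : V1_LWT.CONSOLE) = struct
  let start info c =
    C.log c (Sexplib.Sexp.to_string_hum (Mirage_info.sexp_of_t info));
    Lwt.return_unit
end
```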

The complete example is available in mirage-skeleton in the `app_info` directory.

Sharing

Since we have a way to draw unikernels, we can now observe the sharing between various pieces. For example, the direct stack with static IP yields this diagram:

A stack unikernel

You can see that all the sub-parts of the stack have been properly shared. To be merged, two devices must have the same name, keys, dependencies and functor arguments. To force non-sharing of two devices, it is enough to give them different names.

This sharing also works up to switching keys. The generic stack gives us this diagram:

A dynamic stack unikernel

If you look closely, you'll notice that there are actually three stacks in the last example: the socket stack, the direct stack with DHCP, and the direct stack with IP. All controlled by switching keys.

All your functors are belong to us

There is more to be said about the new capabilities offered by functoria, in particular on how to define new devices. You can discover them by looking at the mirage implementation.

However, to wrap up this blog post, I offer you a visualization of the MirageOS website itself (brace yourself). Enjoy!

Thanks to Mort, Mindy, Amir and Jeremy for their comments on earlier drafts.


Jan
7
2016
18
10

Run Mirage Unikernels on KVM/QEMU with Solo5

By Dan Williams

I'm excited to announce the release of Solo5! Solo5 is essentially a kernel library that bootstraps the hardware and forms a base (similar to Mini-OS) from which unikernels can be built. It runs on fully virtualized x86 hardware (e.g., KVM/QEMU), using virtio device interfaces.

Importantly, Solo5 is integrated (to some extent) with the MirageOS toolstack, so the Solo5 version of the Mirage toolstack can build Mirage unikernels that run directly on KVM/QEMU instead of Xen. As such, Solo5 can be considered an alternative to Mini-OS in the Mirage stack. Try it out today!

In the rest of this post, I'll give a bit of motivation about why I think the lowest layer of the unikernel is interesting and important, as well as a rough overview of the steps I took to create Solo5.

Why focus so far down the software stack?

When people think about Mirage unikernels, one of the first things that comes to mind is the use of a high-level language (OCaml). Indeed, the Mirage community has invested lots of time and effort producing implementations of traditional system components (e.g., an entire TCP stack) in OCaml. The pervasive use of OCaml contributes to security arguments for Mirage unikernels (strong type systems are good) and is an interesting design choice well worth exploring.

But underneath all of that OCaml goodness is a little kernel layer written in C. This layer has a direct impact on:

  • What environments the unikernel can run on. Mini-OS, for example, assumes a paravirtualized (Xen) machine, whereas Solo5 targets full x86 hardware virtualization with virtio devices.

  • Boot time. "Hardware" initialization (or lack of it in a paravirtualized case) is a major factor in achieving the 20 ms unikernel boot times that are changing the way people think about elasticity in the cloud.

  • Memory layout and protection. Hardware "features" like page-level write protection must be exposed by the lowest layer for techniques like memory tracing to be performed. Also, software-level strategies like address space layout randomization require cooperation of this lowest layer.

  • Low-level device interfacing. As individual devices (e.g., NICs) gain virtualization capabilities, the lowest software layer is an obvious place to interface directly with hardware.

  • Threads/events. The low-level code must ensure that device I/O is asynchronous and/or fits with the higher-level synchronization primitives.

The most popular existing code providing this low-level kernel layer is called Mini-OS. Mini-OS was (I believe) originally written as a vehicle to demonstrate the paravirtualized interface offered by Xen, both as a reference for people porting their kernels and as a base for new kernel builders to build specialized Xen domains. Mini-OS is a popular base for MirageOS, ClickOS, and other unikernels. Other software that implements a unikernel base includes Rumprun and OSv.

I built Solo5 from scratch (rather than adapting Mini-OS, for example) primarily as an educational (and fun!) exercise to explore and really understand the role of the low-level kernel layer in a unikernel. To provide applications, Solo5 supports the Mirage stack. It is my hope that Solo5 can be a useful base for others; even if only at this point to run some Mirage applications on KVM/QEMU!

Solo5: Building a Unikernel Base from Scratch

At a high level, there are roughly 3 parts to building a unikernel base that runs on KVM/QEMU and supports Mirage:

  • Typical kernel hardware initialization. The kernel must know how to load things into memory at the desired locations and prepare the processor to operate in the correct mode (e.g., 64-bit). Unlike typical kernels, most setup is one-time and simplified. The kernel must set up a memory map, stack, interrupt vectors, and provide primitives for basic memory allocation. At its simplest, a unikernel base kernel does not need to worry about user address spaces, threads, or many other things typical kernels need.

  • Interact with virtio devices. virtio is a paravirtualized device standard supported by some hypervisors, including KVM/QEMU and Virtualbox. As far as devices go, virtio devices are simple: I was able to write (very simple/unoptimized) virtio drivers for Solo5 from scratch in C. At some point it may be interesting to write them in OCaml like the Xen device drivers in Mirage, but for someone who doesn't know OCaml (like me) a simple C implementation seemed like a good first step. I should note that even though the drivers themselves are written in C, Solo5 does include some OCaml code to call out to the drivers so it can connect with Mirage.

  • Appropriately link Mirage binaries/build system. A piece of software called mirage-platform performs the binding between Mini-OS and the rest of the Mirage stack. Building a new unikernel base means that this "cut point" will have lots of undefined dependencies which can either be implemented in the new unikernel base, stubbed out, or reused. Other "cut points" involve device drivers: the console, network and block devices. Finally, the mirage tool needs to output appropriate Makefiles for the new target and an overall Makefile needs to put everything together.
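To make the "cut point" idea concrete, here is a hedged sketch of how OCaml code can call into C drivers through external declarations. The stub names and signatures here are illustrative, not Solo5's actual symbols:

```ocaml
(* Hedged sketch: OCaml bindings to hypothetical C driver stubs.
   The symbol names "solo5_stub_*" are illustrative only. *)
external console_write : string -> int -> unit = "solo5_stub_console_write"
external net_write     : Bytes.t -> int -> int = "solo5_stub_net_write"

(* A thin OCaml wrapper Mirage-level code could call. *)
let log msg = console_write msg (String.length msg)
```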

Each of these steps carries complexity and gotchas, and I have certainly made many mistakes along the way. The hardware initialization process is needlessly complex, and the overall Makefile reflects my ignorance of OCaml and its build and packaging systems. It's a work in progress!

Next Steps and Getting Involved

In addition to the aforementioned clean up, I'm currently exploring the boot time in this environment. So far I've found that generating a bootable iso with GRUB as a bootloader and relying on QEMU to emulate BIOS calls to load the kernel is, by the nature of emulation, inefficient and something that should be avoided.

If you find the lowest layer of the unikernel interesting, please don't hesitate to contact me or get involved. I've packaged the build and test environment for Solo5 into a Docker container to reduce the dependency burden in playing around with it. Check out the repo for the full instructions!

I'll be talking about Solo5 at the upcoming 2016 Unikernels and More: Cloud Innovators Forum event to be held on January 22, 2016 at SCALE 14X in Pasadena, CA USA. I look forward to meeting some of you there!

Discuss this post on devel.unikernel.org

Thanks to Amir, Mort, and Jeremy, for taking the time to read and comment on earlier drafts.



Dec
29
2015
15
30

Introducing Charrua — a DHCP implementation

By Christiano Haesbaert

Almost every network needs to support DHCP (Dynamic Host Configuration Protocol), that is, a way for clients to request network parameters from the environment. Common parameters are an IP address, a network mask, a default gateway and so on.

DHCP can be seen as a critical security component, since it usually deals with unauthenticated/unknown peers; it is therefore of special interest to run a server as a self-contained MirageOS unikernel.

Charrua is a DHCP implementation written in OCaml and it started off as an excuse to learn more about the language. While in development it got picked up on the MirageOS mailing lists and became one of the Pioneer Projects.

The name Charrua is a reference to the, now extinct, semi-nomadic people of southern South America — nowadays it is also used to refer to Uruguayan nationals. The logic is that DHCP handles dynamic (hence nomadic) clients.

The library is platform agnostic and works outside of MirageOS as well. It provides two main modules: Dhcp_wire and Dhcp_server.

Dhcp_wire

Dhcp_wire provides basic functions for dealing with the protocol, essentially marshalling/unmarshalling and helpers for dealing with the various DHCP options.

The central record type of Dhcp_wire is a pkt, which represents a full DHCP packet, including layer 2 and layer 3 data as well as the many possible DHCP options. The most important functions are:

val pkt_of_buf : Cstruct.t -> int -> [> `Error of string | `Ok of pkt ]
val buf_of_pkt : pkt -> Cstruct.t

pkt_of_buf takes a Cstruct.t buffer and a length and it then attempts to build a DHCP packet. Unknown DHCP options are ignored, invalid options or malformed data are not accepted and you get an `Error of string.

buf_of_pkt is the mirror function, but it never fails. It could for instance fail in case of two duplicate DHCP options, but that would imply too much policy in a marshalling function.

The DHCP options from RFC2132 are implemented in dhcp_option. There are more, but the most common ones look like this:

type dhcp_option =
  | Subnet_mask of Ipaddr.V4.t
  | Time_offset of int32
  | Routers of Ipaddr.V4.t list
  | Time_servers of Ipaddr.V4.t list
  | Name_servers of Ipaddr.V4.t list
  | Dns_servers of Ipaddr.V4.t list
  | Log_servers of Ipaddr.V4.t list

Dhcp_server

Dhcp_server provides a library for building a DHCP server and is divided into two sub-modules: Config, which handles the building of a suitable DHCP server configuration record, and Input, which handles the input of DHCP packets.

The logic is modelled in a pure functional style and Dhcp_server does not perform any IO of its own. It takes an input packet and a configuration, and returns a possible reply to be sent by the caller, or an error/warning:

Input

type result = 
| Silence                 (* Input packet didn't belong to us, normal nop event. *)
| Reply of Dhcp_wire.pkt  (* A reply packet to be sent on the same subnet. *)
| Warning of string       (* An odd event, could be logged. *)
| Error of string         (* Input packet is invalid, or some other error occurred. *)

val input_pkt : Dhcp_server.Config.t -> Dhcp_server.Config.subnet ->
   Dhcp_wire.pkt -> float -> result
(** input_pkt config subnet pkt time Inputs packet pkt, the resulting action
    should be performed by the caller, normally a Reply packet is returned and
    must be sent on the same subnet. time is a float representing time as in
    Unix.time or MirageOS's Clock.time. **)

A typical main server loop would work by:

  1. Reading a packet from the network.
  2. Unmarshalling with Dhcp_wire.pkt_of_buf.
  3. Inputting the result with Dhcp_server.Input.input_pkt.
  4. Sending the reply, or logging the event from the Dhcp_server.Input.input_pkt call.

A mainloop example can be found in mirage-skeleton:

  let input_dhcp c net config subnet buf =
    let open Dhcp_server.Input in
    match (Dhcp_wire.pkt_of_buf buf (Cstruct.len buf)) with
    | `Error e -> Lwt.return (log c (red "Can't parse packet: %s" e))
    | `Ok pkt ->
      match (input_pkt config subnet pkt (Clock.time ())) with
      | Silence -> Lwt.return_unit
      | Warning w -> Lwt.return (log c (yellow "%s" w))
      | Error e -> Lwt.return (log c (red "%s" e))
      | Reply reply ->
        log c (blue "Received packet %s" (Dhcp_wire.pkt_to_string pkt));
        N.write net (Dhcp_wire.buf_of_pkt reply)
        >>= fun () ->
        log c (blue "Sent reply packet %s" (Dhcp_wire.pkt_to_string reply));
        Lwt.return_unit

As stated, Dhcp_server.Input.input_pkt does not perform any IO of its own; it only deals with the logic of analyzing a DHCP packet and building a possible answer, which should then be sent by the caller. This allows a design where all the side effects are controlled in one small chunk, which makes the state transitions easier to understand since they are made explicit.

At the time of this writing, Dhcp_server.Input.input_pkt is not side-effect free, as it manipulates a database of leases; this will be changed in the next version to be pure as well.

Storing leases in permanent storage is also unsupported at this time and should be available soon, with Irmin and other backends. The main idea is to always return a new lease database for each input, or maybe just the updates to be applied; in this scenario, the caller would be able to store the database in permanent storage as it sees fit.

Configuration

This project started independently of MirageOS and at that time, the best configuration I could think of was the well known ISC dhcpd.conf. Therefore, the configuration uses the same format but it does not support the myriad of options of the original one.

  type t = {
    addresses : (Ipaddr.V4.t * Macaddr.t) list;
    subnets : subnet list;
    options : Dhcp_wire.dhcp_option list;
    hostname : string;
    default_lease_time : int32;
    max_lease_time : int32;
  }

  val parse : string -> (Ipaddr.V4.Prefix.addr * Macaddr.t) list -> t
  (** [parse cf l] Creates a server configuration by parsing [cf] as an ISC
      dhcpd.conf file, currently only the options at [sample/dhcpd.conf] are
      supported. [l] is a list of network addresses, each pair is the output
      address to be used for building replies and each must match a [network
      section] of [cf]. A normal usage would be a list of all interfaces
      configured in the system *)
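A hedged usage sketch of parse, pairing a dhcpd.conf-style string with the server's own address on each network. The addresses and the configuration text here are illustrative only:

```ocaml
(* Hedged sketch: building a server configuration from an ISC-style
   string. The subnet declaration and addresses are made up for
   illustration and must match a network section of the config. *)
let config =
  let conf = {|
    subnet 192.168.1.0 netmask 255.255.255.0 {
      range 192.168.1.10 192.168.1.100;
    }
  |} in
  Dhcp_server.Config.parse conf
    [ (Ipaddr.V4.of_string_exn "192.168.1.1",
       Macaddr.of_string_exn "aa:bb:cc:dd:ee:ff") ]
```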

Although it is a great format, it doesn't exactly play nice with MirageOS and OCaml: the unikernel needs to parse a string at runtime to build the configuration, which requires a file IO backend and brings other complications. The next version should provide OCaml helpers for building the configuration, which would drop the requirement of a file IO backend and facilitate writing tests.

Building a simple server

The easiest way is to follow the mirage-skeleton DHCP README.

Future

The next steps would be:

  • Provide helpers for building the configuration.
  • Expose the lease database in an immutable structure, possibly a Map, adding also support/examples for Irmin.
  • Use Functoria to pass down the configuration in mirage-skeleton. Currently it is awkward since the user has to edit unikernel.ml and config.ml, with Functoria we would be able to have it much nicer and only touch config.ml.
  • Convert MirageOS DHCP client code to use Dhcp_wire, or perhaps add a client logic functionality to Charrua.

Finishing words

This is my first real project in OCaml and I'm more or less a newcomer to functional programming as well; my background is mostly kernel hacking as an ex-OpenBSD developer. I'd love to hear how people are actually using it and any problems they're finding, so please do let me know via the issue tracker!

Prior to this project I had no contact with any of the MirageOS folks, but I'm amazed at how easy the interaction and communication with the community has been; everyone has been incredibly friendly and supportive. I'd say MirageOS is a gold project for anyone wanting to work with smart people and hack OCaml.

My many thanks to Anil, Richard, Hannes, Amir, Scott, Gabriel and others. Thanks also to Thomas and Christophe for comments on this post. I also would like to thank my employer for letting me work on this project in our hackathons.


Dec
17
2015
12
0

Unikernel.org

By Amir Chaudhry

Unikernels are specialised single address space machine images that are constructed by using library operating systems. With MirageOS, we've taken a clean-slate approach to unikernels with a focus on safety. This involved writing protocol libraries from the ground up and it also afforded the ability to use clean, modern APIs.

Other unikernel implementations have made trade-offs different to those made by MirageOS. Some excel at handling legacy applications by making the most of existing OS codebases rather than building clean-slate implementations. Some target a wide array of possible environments, or environments complementary to those supported by MirageOS currently. All of these implementations ultimately help developers construct unikernels that match their specific needs and constraints.

As word about unikernels in general is spreading, more people are trying to learn about this new approach to programming the cloud and embedded devices. Since information is spread across multiple sites, it can be tricky to know where to get an overview and how to get started quickly. So to help people get on board, there's a new community website at unikernel.org!

The unikernel.org community site aims to collate information about the various projects and provide a focal point for early adopters to understand more about the technology and become involved in the projects themselves.

Over time, it will also become a gathering place for common infrastructure to form and be shared across projects. Early examples of this include the scripts for booting on Amazon EC2, which began with MirageOS contributors but were used and improved by Rump Kernel contributors. You can follow the email threads where the script was first proposed and ultimately provided EC2 support for Rumprun. Continuing to work together to make such advances will ease the process of bringing in new users and contributors across all the projects.

Please do visit the site and contribute stories about how you're using and improving unikernels!

Edit: discuss this post on devel.unikernel.org

Thanks to Anil, Jeremy and Mindy for comments on an earlier draft.


Oct
23
2015
11
0

Videos from around the world!

By Amir Chaudhry

Word about Unikernels and MirageOS is spreading and as the community grows, more people have been giving talks at user groups and conferences. Below is a selection of those that have been recorded, which together amount to about 5 hours of content. The topics are wide ranging and include discussions about where unikernels fit in the ecosystem, all the way down to networking topics.

I hope you enjoy these and if you'd like to give a talk somewhere or share one of your videos, please do get in touch!

Videos of recent talks

Anil Madhavapeddy at Esper Technologies - May 2015
'Unikernels: Functional Infrastructure with MirageOS'

Russell Pavlicek at SouthEast LinuxFest - June 2015
'Next Generation Cloud'

Russ has also been at many other Linuxfests this year!

Amir Chaudhry at PolyConf - July 2015
'Unikernels!'

There's more information about this talk in the blog post at: http://amirchaudhry.com/unikernels-polyconf-2015

Hannes Mehnert at Source_Code Berlin - Aug 2015
'Leaving legacy behind — A clean-slate approach to operating systems'

Steve Jones at DevopsDays Pittsburgh - Aug 2015
'The Incredible Shrinking Operating System!'

Mindy Preston at Strangeloop - Sep 2015
'Non-Imperative Network Programming'

Mindy also wrote a post that has some information and links: http://somerandomidiot.com/blog/2015/10/07/ocaml-workshop-and-strange-loop-talks

Matt Bajor at Strangeloop - Sep 2015
'Look ma, no OS! Unikernels and their applications'

Gareth Rushgrove at Operability - Sep 2015
'Taking the Operating System out of Operations'

Garett Smith at CityCode - Oct 2015
'Rainbows and Unikernels'


Oct
15
2015
19
0

Getting Started Screencasts

By Amir Chaudhry

We put together some quick screencasts to make it easier for people to get started with MirageOS. They're all in a playlist below, which is around 10 minutes in total.

There are currently 4 videos which walk through some of the typical steps. The first three cover installation, building a 'hello world', and building a Xen unikernel on an Ubuntu machine. The fourth video gives an overview of the development workflows that are possible with OPAM and Git.

These should give everyone a clear idea of what it's like to work with the tools before leaping in and installing things!

If anyone would like to help us make more of these screencasts, please do get in touch on the mailing list — I've also listed it as one of our many Pioneer Projects!


Jul
22
2015
17
0

Organized chaos: managing randomness

By David Kaloper

This post gives a bit of background on the Random Number Generator (RNG) in the recent MirageOS v2.5 release.

First we give background about why RNGs are really critical for security. Then we try to clarify the often-confused concepts of "randomness" and "entropy", as used in this context. Finally, we explore the challenges of harvesting good-quality entropy in a unikernel environment.

Playing dice

Security software must play dice.

It must do so to create secrets, for example. Secrets can then serve as the keys that protect communication, like the Diffie-Hellman key exchanged between two TLS endpoints. Proof of the knowledge of a particular secret can be used to verify the identity of someone on the Internet, as in the case of verifying the possession of the secret RSA key associated with an X.509 certificate. As an attacker guessing a secret can have disastrous consequences, it must be chosen in a manner that is realistically unpredictable by anyone else — we need it to be random.

There are other reasons to use randomness. A number of algorithms require a unique value every time they are invoked and badly malfunction when this assumption is violated, with random choice being one way to provide a value likely to be unique. For example, repeating the k-parameter in DSA digital signatures compromises the secret key, while reusing a GCM nonce negates both confidentiality and authenticity. Other algorithms are probabilistic, in that they generate random values before operating on an input, and then store the chosen values in the output, such as the OAEP padding mode for RSA. This is done in order to confuse the relationship between the input and the output and defeat a clever attacker who tries to manipulate the input to gain knowledge about secrets by looking at the output. Still other algorithms pick random numbers to internally change their operation and hide the physical amount of time they need to execute, to avoid revealing information about the secrets they operate on. This is known as blinding, and is one way to counter timing side-channel attacks.

Randomness is therefore quite pervasive in a security context. In fact, many cryptographic algorithms are designed under the assumption of a readily available source of randomness, termed a random oracle. The security analysis of those algorithms is conditional on the oracle; we know that they have certain security characteristics, like the difficulty of guessing the correct message or impersonating somebody, only given an ideal random oracle.

And so security software has a problem here. Computers are inherently deterministic, made to behave reproducibly given a known program and starting state. How to go about solving this?

Random failures

Before taking a look at how we try to solve this problem, let's instead consider what happens if we fail to do so. There is even a Wikipedia page about this, which is a nice starting point. Some of the highlights:

The first public release of Netscape's original SSL, version 2.0, was broken several months after its release. The weakness was in initializing the RNG with the current time, the process ID and the parent process ID of the browser. The time stamp can be guessed to a certain precision, leaving only its sub-second part and the two PIDs unknown. This relatively small unknown space of initial values can be brute-forced.

About a decade later, Debian patched their version of OpenSSL and reduced RNG initialization to the current PID. As a result, only 32767 random sequences were possible. This flaw went undetected for two years and became known as the Debian fiasco. Personal reports indicate that some of the 32767 distinct secret keys that could be generated with OpenSSL on a Debian system during that time are still in circulation.

Computing the greatest common divisor of a pair of numbers is much faster than discovering all the prime divisors of a particular number. RSA public keys contain a number, and secret keys contain its factors. An RSA key is usually generated by randomly picking the factors. If a pool of keys was generated with a heavily biased random number generator, such that factors are likely to repeat, it is possible to search for common factors in all pairs and crack the affected keys, a technique which produces spectacular results.
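The common-factor attack can be illustrated in a few lines: if two moduli share a prime factor, Euclid's algorithm recovers it instantly. This toy sketch uses tiny numbers purely for illustration; real moduli have hundreds of digits.

```ocaml
(* Toy illustration of the common-factor attack on RSA moduli generated
   with a biased RNG. Euclid's algorithm finds the shared prime without
   factoring either modulus. *)
let rec gcd a b = if b = 0 then a else gcd b (a mod b)

let n1 = 101 * 103  (* two "public keys" whose factors were picked... *)
let n2 = 101 * 107  (* ...by a biased RNG, repeating the factor 101 *)

let () =
  (* gcd n1 n2 recovers the shared secret factor. *)
  Printf.printf "shared factor: %d\n" (gcd n1 n2)
```

Running this prints `shared factor: 101`, which immediately yields the other factor of each modulus by division.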

Recently, a bitcoin application for Android was discovered to be downloading its random initial value from a website. It wasn't even necessary to intercept this unencrypted traffic, because the website started serving a redirect page and the Android application was left initializing its RNG with the text of the redirection message. It therefore started generating the same private ECDSA key and the associated bitcoin address for every affected user, an issue which reportedly cost some users their bitcoins.

Playstation 3 game signatures can be forged. Sony reused a single k-parameter, which is supposed to be "unique, unpredictable and secret", for every ECDSA signature they made. This led to complete compromise of the signing keys. Admittedly, this is not really an RNG problem in itself, but it shows where such a malfunction can lead.

These are only some of the most spectacular failures related to random numbers. For example, it is widely known in security circles that RNGs of embedded devices tend to be predictable, leading to widespread use of weak keys on routers and similar equipment, amongst other things. So when implementing a unikernel operating system, you don't want to end up on that Wikipedia page either.

Random sequences and stuff

But what are random numbers, really? Intuitively, we tend to think about them as somehow "dancing around", or being "jiggly" in a sense. If we have a software component that keeps producing random outputs, these outputs form a sequence, and we hope this to be a random sequence.

But such a thing is notoriously difficult to define. The above linked page opens with the following quote:

A random sequence is a vague notion... in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests traditional with statisticians.

The intuitive jigglyness is captured by statistical randomness. We require each output, taken independently, to come from the same distribution (and in fact we want it to be the uniform distribution). That is, when we take a long sequence of outputs, we want them to cover the entire range, we want them to cover it evenly, and we want the evenness to increase as the number of outputs increases — which constitutes a purely frequentist definition of randomness. In addition, we want the absence of clear patterns between outputs. We don't want the sequence to look like 7, 8, 9, 10, ..., even with a bit of noise, and we don't want correlation between outputs. The problem here is that no-one really knows what "having patterns" means; it is entirely possible that we only searched for patterns too simple, and that in fact there is a pattern that fully explains the sequence lurking just around the complexity corner.

Nonetheless, there is a well-established battery of tests for checking the statistical randomness of RNG outputs, called the Diehard tests, which serves as the de facto standard for testing random number generators. Here's the beginning of a certain sequence that passes them with flying colors:

3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6, 2, 6, 4, 3, 3, 8, 3, ...

We still would not recommend using the digits of π as a secret key, and certainly not releasing software for everyone to study that uses this sequence to generate its secrets. So what went wrong?

This brings us to the other concept of randomness. Roughly, a random sequence should not be predictable to anyone with any knowledge other than the sequence itself. In other words, it cannot be compressed no matter how hard we try, and in the extreme, this means that it cannot be generated by a program. While the latter restriction is obviously a little too strong for our purpose, it highlights a deep distinction in what people mean by "random". Jumping around is one thing. Being actually unpredictable is a wholly different matter.

There are many other simple mathematical processes which generate sequences with high statistical randomness. Many of those are used to produce "random" sequences for various purposes. But these are still completely deterministic processes that exhibit random behaviour only in the statistical sense. Instead of being random, they are pseudo-random, and we call such generators Pseudo-Random Number Generators (PRNGs).

We can look for something approaching a "true" random sequence in nature. The current agreement is that the nature of quantum processes is random in this sense, and random sequences based on this idea are readily available for download. Or we can use the microphone and keep recording; the lowest-order bits of the signal are pretty unpredictable. But we cannot write a program to generate an actually random sequence.

Still, we need to compromise. The real problem of common PRNGs is that knowing the rule and observing some of the outputs is enough to predict the rest of the sequence. The entire future behavior of the Mersenne Twister, one of the most commonly used generators in various programming packages, can be predicted after observing only 624 outputs in a row. A step up from such a process is a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG). Its key property is that it is computationally prohibitively expensive to distinguish its outputs from a "true" random sequence. This also means that it is computationally prohibitively expensive to reconstruct its internal state just by looking at its outputs. In a sense, someone trying to predict the outputs cannot take shortcuts, and is instead forced to perform the laborious task of starting the generator with all the possible states and checking if the output matches the observed sequence. This is how we can quantify a CSPRNG's unpredictability: it takes trying about half of all the possibilities to guess the state.
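The Mersenne Twister attack is easy to reproduce, since Python's stdlib `random` module is a Mersenne Twister: after observing 624 consecutive 32-bit outputs, the internal state can be recovered by inverting the output ("tempering") function, and every future output predicted. A sketch (the helper names are our own; `setstate`/`getrandbits` are the real stdlib API):

```python
import random

def undo_xor_rshift(y, shift):
    # Invert y ^= y >> shift by iterating until it converges.
    res = y
    for _ in range(32 // shift + 1):
        res = y ^ (res >> shift)
    return res

def undo_xor_lshift_mask(y, shift, mask):
    # Invert y ^= (y << shift) & mask, converging from the low bits up.
    res = y
    for _ in range(32 // shift + 1):
        res = y ^ ((res << shift) & mask)
    return res

def untemper(y):
    # Undo MT19937's tempering steps in reverse order.
    y = undo_xor_rshift(y, 18)
    y = undo_xor_lshift_mask(y, 15, 0xEFC60000)
    y = undo_xor_lshift_mask(y, 7, 0x9D2C5680)
    y = undo_xor_rshift(y, 11)
    return y

victim = random.Random(1234)
observed = [victim.getrandbits(32) for _ in range(624)]

clone = random.Random()
clone.setstate((3, tuple(untemper(o) for o in observed) + (624,), None))

# The clone now predicts the victim's future outputs exactly.
assert [clone.getrandbits(32) for _ in range(100)] == \
       [victim.getrandbits(32) for _ in range(100)]
```

Nothing here involves guessing or brute force: 624 outputs determine the state completely, which is precisely why such generators must never be used for secrets.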

MirageOS' security stack contains a CSPRNG, a design called Fortuna. What it really does is encrypt the simple sequence 0, 1, 2, 3, ... with AES in counter mode (AES-CTR) using a secret key. This makes it as resistant to prediction as AES is to known-plaintext attacks. After each output, it generates a bit more, hashes that, and uses the result as the next key. This is not to improve the statistical randomness, which is already guaranteed by AES. Rather, it's a form of forward secrecy: an attacker who learns the secret key at some point would need to mount a preimage attack on the hash function to figure out the earlier key and reconstruct the earlier outputs.
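The generator half of Fortuna can be sketched in a few lines. The real design uses AES-CTR; the toy below substitutes SHA-256 as the keyed block function (an illustration of ours, not the actual MirageOS code), but it shows the two essential moves: produce output by transforming a counter under a secret key, then immediately generate and install a fresh key so a captured state cannot be run backwards:

```python
import hashlib

class ToyFortunaGenerator:
    """Fortuna-style generator sketch. SHA-256(key || counter) stands
    in for AES-CTR; the rekey-after-every-request step is the point."""

    def __init__(self, seed: bytes):
        self.key = hashlib.sha256(seed).digest()
        self.counter = 0

    def _block(self) -> bytes:
        out = hashlib.sha256(self.key + self.counter.to_bytes(16, "big")).digest()
        self.counter += 1
        return out

    def generate(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += self._block()
        # Rekey for forward secrecy: learning the new key reveals
        # nothing about blocks produced under the old one.
        self.key = self._block()
        return out[:n]

g = ToyFortunaGenerator(b"seed material")
first = g.generate(32)
second = g.generate(32)
```

Given the same seed, the stream is fully deterministic — which is exactly why everything then hinges on where the seed's unpredictability comes from, the subject of the next section.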

Entropy

Although resistant to prediction based solely on the outputs, just like any other software RNG, Fortuna is still just a deterministic PRNG. Its entire output is as unpredictable as its initial value, which we call the seed. From the information perspective, a PRNG can only transform what was unpredictable about its initial seed into an equally unpredictable sequence. In other words, we typically use PRNGs to stretch the unpredictability inherent in the initial seed into an infinite stream. The best PRNGs do not give out more hints about their starting position, but they can never out-race the amount of unpredictability that they started with.

We often call this quality of unpredictability entropy. In a sense, by employing an algorithmic generator, we have just shifted the burden of being unpredictable to the beginning. But now we're cornered and have to search for entropy in the only place where a computer can find it: in the physical world.

A typical (kernel-level) RNG-system reaches out into the world around it through hardware interaction: as hardware events happen, various drivers tend to emit small packets of data, such as the time, or hardware-specific state. These events are a product of the user interactions with the keyboard and mouse, of network packets arriving at an interface, of the hard drive asserting interrupts to signal the end of a DMA transfer, and the like. They are combined together and used to seed the internal (CS-)PRNG.

In fact, describing them as a seed from which the entire sequence is unfolded is a deliberate oversimplification: what really happens is that the PRNG is continuously fed with random events, which change its state as they arrive, and the requests for random bytes are served from the PRNG. The PRNG is used to "mix" the unpredictability inherent in its input, that is, to smooth out various timestamps and similar values into a statistically well-behaved sequence.

Do Virtual Machines Dream of Electric Sheep?

Our problem here is that a virtual machine (VM) in a typical configuration barely sees any physical hardware. Users do not interact with VMs in server scenarios using a directly-connected keyboard and mouse. VMs make use of a virtualized network interface and virtualized disks. Even the CPU features can be intercepted and virtualized. Virtual environments are entropy-starved.

This is a known problem and various analyses of the weakness of random outputs in virtual environments have been published. The problem is especially severe right after boot. The gradual trickle of unpredictability from hardware events slowly moves the pseudo-random stream into an increasingly unpredictable state, but at the very start, it still tends to be fairly predictable. Typically, operating systems store some of their PRNG output on shutdown and use it to quickly reseed their PRNG on the next boot, in order to reuse whatever entropy was contained in its state. Unfortunately, it is common to boot several machines from the same system image, or from a pristine image lacking a seed, making random outputs in a virtual machine vulnerable to prediction close to the startup phase.

To help solve these problems, we employ several sources of entropy in MirageOS unikernels. The case of a Unix executable is simple, as we reuse the system's own RNG, as exposed via /dev/urandom, as the source of our entropy. This is because the kernel is in a much better position to enter an unpredictable state than any single process running under its supervision. The case of Xen unikernels is harder. Here, we group the entropy sources into those that originate within the unikernel itself, and those that originate externally.

In the external case, we again rely on the kernel interacting with the hardware, but this time it's the dom0 kernel. We have a background service, Xentropyd, which runs in dom0, reads the RNG and serves its output to other domains through the Xen Console. The problem is that in many scenarios, like hosting on popular cloud providers, we cannot expect this degree of cooperation from dom0. A bigger problem is that although most of the code is present, we haven't fully fleshed out this design and it remains disabled in MirageOS 2.5.0.

So we need to be able to achieve unpredictability relying purely on what is available inside a unikernel. A unikernel has no direct exposure to the hardware, but it is of course interacting with the outside world. To tap into this ambient entropy, we have to continuously sample all inter-event timings in its event loop. This process is analogous to what happens in a full-blown OS kernel, except our events lack the extra hardware context, and our timers are potentially less granular (for example, on ARM). This makes our interaction-based events somewhat more predictable, or in other words, they have a little less entropy.

Recent Intel chips come with an on-die random generator, which ultimately derives from thermal readings, and is available through RDRAND and (more directly) RDSEED instructions. The community has expressed concern that relying exclusively on this generator might not be a wise choice: it could silently malfunction, and its design is hidden in the hardware, which raises concerns about potential intentional biases in the output — a scheme not unheard of. However, since entropy is additive, its output can never reduce whatever unpredictability the system already has. Therefore, if available, we continuously sample this on-die RNG, and inject its outputs into our PRNG.

The combination of event timings and a built-in RNG does have good unpredictability in the long run, especially if our unikernel is running on a multi-tenant host and competing for CPU with other instances. But the entropy in each individual event is still relatively low: we can assume that a determined attacker can guess each individual time stamp up to a certain precision that we don't know, but which is potentially quite high. This creates the following problem: imagine that an attacker knows the current PRNG state, and can measure the time of the next event, but not with sufficient precision to know the last two bits of the timestamp. To this attacker, our event contains two bits of entropy. If we immediately update the PRNG, the attacker only has to observe some of the output and check four candidate states against it, to fully recover knowledge about the state and negate our entropy addition. On the other hand, if we decide to wait and try to accumulate many more events before updating the PRNG, we keep generating a fully predictable sequence in the meantime.
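The premature-reseed problem from the last paragraph is easy to demonstrate. Suppose, purely for illustration, that the PRNG output is a hash of its state and that reseeding XORs an event timestamp into the state; if the attacker knows the current state and all but the low two bits of the timestamp, four guesses recover the new state (all names and the toy construction are ours):

```python
import hashlib

def output(state: bytes) -> bytes:
    # Toy PRNG: the output is just a hash of the current state.
    return hashlib.sha256(state).digest()

def reseed(state: bytes, event: int) -> bytes:
    # Mix a 64-bit event timestamp into the first bytes of the state.
    return bytes(a ^ b for a, b in zip(state, event.to_bytes(8, "big") + bytes(24)))

known_state = bytes(32)      # the attacker already knows the current state
secret_event = 0xDEADBEEF    # only its low two bits are unknown to the attacker
new_state = reseed(known_state, secret_event)
observed = output(new_state)

# Enumerate the four candidate timestamps and match against the output.
approx = secret_event & ~0x3
candidates = [reseed(known_state, approx | low) for low in range(4)]
recovered = [s for s in candidates if output(s) == observed]
assert recovered == [new_state]   # two bits of entropy: four guesses suffice
```

Reseeding immediately thus wastes the two bits entirely; the attacker regains full knowledge of the state at the cost of four hash computations.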

And here is where Fortuna really shines. It keeps accumulating events in a number of internal pools in a round-robin fashion. These pools are constantly being activated, but with an exponentially decreasing frequency. The pools activated too frequently are wasted, but one of them is activated with just the right frequency to contain enough entropy to make it prohibitively expensive for an attacker to enumerate all the possibilities. This design was shown to be within a constant factor of optimal entropy use, and in particular, it scales robustly with the actual amount of entropy inherent in the input events.

This leaves us with the problem of boot-time entropy. Not only can the saved random seed be reused by cloning the disk image, but in many cases, a MirageOS unikernel is running without such storage at all!

Following the design of Whirlwind RNG, we employ an entropy bootstrapping loop. It's an iterated computation, which measures the time it took to perform the previous iteration, and then performs the amount of work that depends on the time, many times over. In this way, it creates a feedback loop with a fragile dependency on any non-determinism in the physical execution on the CPU, such as any contention or races in the CPU state. Even on ARM, which currently uses a less fine-grained timer and whose design is not as parallel as Intel's, this yields an initial value which varies wildly between boots. We use this value to kickstart the PRNG, giving it quick divergence, and ensuring that the state is unpredictable from the very start.
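A Whirlwind-style bootstrap loop can be sketched as follows (a toy illustration of the idea, not the MirageOS implementation): each iteration times the previous one and performs a data-dependent amount of work, so any jitter in the timer or the CPU feeds back into both the digest and the loop's own future timing:

```python
import hashlib
import time

def bootstrap_seed(iterations: int = 1000) -> int:
    """Derive an initial seed from timing feedback between iterations."""
    h = hashlib.sha256()
    last = time.perf_counter_ns()
    for _ in range(iterations):
        now = time.perf_counter_ns()
        elapsed = now - last
        last = now
        h.update(elapsed.to_bytes(8, "little"))
        # Work proportional to the measured jitter closes the feedback
        # loop: timing noise changes how long the next iteration takes.
        for _ in range(elapsed % 64):
            h.update(h.digest())
    return int.from_bytes(h.digest()[:8], "little")

seed = bootstrap_seed()
```

The resulting value varies between runs on real hardware because the loop amplifies whatever scheduling and microarchitectural noise is present, which is exactly the fragile dependency the text describes.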

Parting words

While some of our techniques (in particular bootstrapping on ARM) need a little more exposure before we place our full confidence in them — and users should probably avoid generating long-term private keys in unikernels running on bare Xen just yet — the combination of bootstrapping, continuous reseeding, and robust accumulation gives us a hopefully comprehensive solution to generating randomness in a unikernel environment.

We intend to re-evaluate the effectiveness of this design after getting some experience with how it works in the wild. To this end, we would particularly appreciate community feedback; you can reach us through our mailing list, or hop onto freenode and join #mirage.

Thanks to Daniel, Mort and Amir for their comments on earlier drafts.


Jul
7
2015
12
0

Easy HTTPS Unikernels with mirage-seal

By Mindy Preston, Hannes Mehnert

Unikernels: HTTP -> HTTPS

Building a static website is one of the better-supported user stories for MirageOS, but it currently results in an HTTP-only site, with no capability for TLS. Although there's been a great TLS stack available for a while now, it was a bit fiddly to assemble the pieces of TLS, Cohttp, and the MirageOS frontend tool in order to construct an HTTPS unikernel. With MirageOS 2.5, that's changed! Let's celebrate by building an HTTPS-serving unikernel of our very own.

Prerequisites

Get a Certificate

To serve HTTPS, we'll need a certificate to present to clients (i.e., browsers) for authentication and establishing asymmetric encryption. For just testing things out, or when it's okay to cause a big scary warning message to appear for anyone browsing a site, we can just use a self-signed certificate. Alternatively, the domain name registrar or hosting provider for a site will be happy to sell (or in some cases, give!) a certificate -- both options are explained in more detail below.

Whichever option you choose, you'll need to install certify to get started (assuming you'd like to avoid using openssl). To do so, pin the package in opam:

opam pin add certify https://github.com/yomimono/ocaml-certify.git
opam install certify

Self-Signed

It's not strictly necessary to get someone else to sign a certificate. We can create and sign our own certificates with the selfsign command-line tool. The following invocation will create a secret key in secrets/server.key and a public certificate for the domain totallyradhttpsunikernel.example in secrets/server.pem. The certificate will be valid for 365 days, so if you choose this option, it's a good idea to set a calendar reminder to renew it if the service will be up for longer than that. The key generated will be a 2048-bit RSA key, although it's possible to create certificates with different validity periods and key sizes -- check selfsign --help for more information.

selfsign -c secrets/server.pem -k secrets/server.key -d 365 totallyradhttpsunikernel.example

We can now use this key and certificate with mirage-seal! See "Packaging Up an HTTPS Site with Mirage-Seal" below.

Signed by Someone Else

Although there are many entities that can sign a certificate with different processes, most have the following in common:

1. You generate a request to have a certificate made for a domain.
2. The signing entity requests that you prove your ownership over that domain.
3. Once verified, the signing entity generates a certificate for you.

Generating a Certificate-Signing Request

No matter whom we ask to sign a certificate, we'll need to generate a certificate signing request so the signer knows what to create. The csr command-line tool can do this. The line below will generate a CSR (saved as server.csr) signed with a 2048-bit RSA key (which will be saved as server.key), for the organization "Rad Unikernel Construction, Ltd." and the common name "totallyradhttpsunikernel.example". For more information on csr, try csr --help.

csr -c server.csr -k server.key totallyradhttpsunikernel.example "Rad Unikernel Construction, Ltd."

csr will generate a server.csr that contains the certificate signing request for submission elsewhere.

Example: Gandi.net

My domain is registered through the popular registrar Gandi.net, who happen to give a free TLS certificate for one year with domain registration, so I elected to have them sign a certificate for me (Gandi did not pay a promotional consideration for this mention). Most of this process is managed through their web GUI and a fairly large chunk is automatically handled behind the scenes. Here's how you can do it too:

Log in to the web interface available through the registrar's website. You can start the certificate-signing process from the "services" tab, which exposes an "SSL" subtab. Click that (Gandi doesn't need to know that we intend only to support TLS, not SSL), then hit the "Get an SSL Certificate" button. Standard SSL is fine. Even if you're entitled to a free certificate, it will appear that you need to pay here; at checkout, however, the total amount due will be 0 in your preferred currency. Ask for a single address and, if you want to pay nothing, a validity period of 1 year.

Copy the content of the certificate-signing request you generated earlier and paste it into the web form. Gandi will also ask you to identify your TLS stack; unfortunately ocaml-tls isn't in the drop-down menu, so choose OTHER (and perhaps send them a nice note asking them to add the hottest TLS stack on the block to their list). Click "submit" and click through the order form.

If you're buying a certificate for a domain you have registered through Gandi (via the registered account), the rest of the process is pretty automatic. You should shortly receive an e-mail with a subject like "Procedure for the validation of your Standard SSL certificate", which explains the process in more detail, but really all you need to do is wait a while (about 30 minutes, for me). After the certificate has been generated, Gandi will notify you by e-mail, and you can download your certificate from the SSL management screen. Click the magnifying glass next to the name of the domain for which you generated the cert to do so.

Once you've downloaded your certificate, you may also wish to append the intermediate certificates. Here's a help page on gathering intermediate certificates. Equipped with the intermediate certificates, append them to the signed certificate downloaded for your site to provide a full certificate chain:

cat signed_cert.pem intermediate_certs.pem > server.pem
Example: StartSSL.com

Another free TLS certificate provider is StartSSL. During online registration, StartSSL will generate a TLS client certificate for you, which is used to authenticate you to their service.

You need to validate that you own the domain you want to request a certificate for. This is done via the "Validations Wizard", which lets you choose to validate a domain via "Domain Name Validation". There you enter your domain name and receive an email containing a token, which you then enter into the web interface.

Once done, run csr to create a key and a certificate signing request. Go to the "Certificates Wizard", select "Web Server SSL/TLS Certificate", skip the generation of the private key (you already generated one with csr), copy and paste your certificate signing request (only the public key of that CSR is used, everything else is ignored), select a domain name, and immediately receive your certificate.

Make sure to also download their intermediate CA certificate and append it:

cat intermediate.pem cert.pem > server.pem

Packaging Up an HTTPS Site with Mirage-Seal

Equipped with a private key and a certificate, let's make an HTTPS unikernel! First, use opam to install mirage-seal. If opam or other MirageOS tooling aren't set up yet, check out the instructions for getting started.

opam install mirage-seal

mirage-seal has a few required arguments.

  • --data: one directory containing all the content that should be served by the unikernel. Candidates for such a directory are the top-level output directory of a static site generator (such as public for octopress), the DocumentRoot of an Apache configuration, or the root of an nginx configuration.
  • --keys: one directory containing the certificate (server.pem) and key (server.key) for the site.

There are also a number of configurable parameters for IP settings. By default, mirage-seal will use DHCP to configure the network at boot. To set static IP information, use the --ip, --nm, and --gw arguments.

You'll find more thorough documentation by looking at mirage-seal --help or mirage-seal's README file.

To build a Xen unikernel, select the Xen mode with -t xen. In full, for a unikernel that will configure its network via DHCP:

mirage-seal --data=/home/me/coolwebsite/public --keys=/home/me/coolwebsite/secrets -t xen

mirage-seal will then generate a unikernel mir-seal.xen and a Xen configuration file seal.xl in the current working directory. To boot it and open the console (on a machine running Xen), invoke xl create on the configuration file with the -c option:

sudo xl create seal.xl -c

Via the console, we can see the sealed unikernel boot and obtain an IP through DHCP. Congratulations -- you made a static site unikernel browsable over HTTPS!


Jun
29
2015
16
0

Reviewing the Bitcoin Pinata

By Hannes Mehnert

TL;DR: Nobody took our BTC. Random people from the Internet even donated into our BTC wallet. We showed the feasibility of a transparent self-service bounty. In the style of Dijkstra: security bounties can be a very effective way to show the presence of vulnerabilities, but they are hopelessly inadequate for showing their absence.

What are you talking about?

Earlier this year, we released a Bitcoin Piñata. The Piñata was a security bounty containing 10 BTC and it's been online since 10th February 2015. Upon successful mutual authentication, where the Piñata has only a single trust anchor, it sends the private key of the Bitcoin address.

It is open source, and exposes both the client and server side of ocaml-tls, running as an 8.2MB MirageOS unikernel. You can see the code manifest to find out which libraries are involved. We put this online and invited people to attack it.

Any approach was permitted in attacking the Piñata: the host system, the MirageOS TCP/IP stack, our TLS, X.509 and ASN.1 implementations, as well as the Piñata code. A successful attacker could do whatever they want with the BTC, no questions asked (though we would notice the transaction).

The exposed server could even be short-circuited to the exposed client: you could proxy a TLS connection in which the (encrypted!) secret was transmitted via your machine.

This post summarises what we've seen so far and what we've learned about attempts people have made to take the BTC.

Accesses

There were 50,000 unique IP addresses that accessed the website. 1000 unique IP addresses initiated more than 20,000 TLS connections to the Piñata, trying to break it. Cumulative numbers of the HTTP and TLS accesses are shown in the diagram:

Cumulative Piñata accesses

There were more than 9000 failing and 12000 successful TLS sessions, the latter comprising the short-circuits described earlier and our own tests.

No X.509 certificate was presented in 1200 of the failed TLS connections. Another 1000 failed due to invalid input as the first bytes. This includes attempts using telnet — I'm looking at you, xx.xxx.74.126 please give key (on 10th February at 16:00) and xx.xxx.166.143 hi give me teh btcs (on 11th February at 05:57)!

We are not talking to everybody

Our implementation first parses the record version of a client hello, and if it fails, an unknown record version is reported. This happened in 10% of all TLS connections (including the 1000 with invalid input in the last section).

Another big class, 6%, were attempted Heartbeat packets (popular due to Heartbleed), which we do not implement.

Recently, issues in the state machines of TLS implementations were published in smacktls (and CCS injection). 3% of the Piñata connections received an unexpected handshake record at some point, which the Piñata handled correctly by shutting down the connection.

In 2009, the renegotiation attack on the TLS protocol was published, which allowed a person in the middle to inject prefix bytes, because a renegotiated handshake was not authenticated with data from the previous handshake. OCaml-TLS closes a connection if the renegotiation extension is not present, which happened in 2% of the connections. Another 2% did not propose a ciphersuite supported by OCaml-TLS; yet another 2% tried to talk SSL version 3 with us, which we do not implement (for good reasons, such as POODLE).

In various other (old versions of) TLS implementations, these connections would have been successful and insecure!

Attempts worth noting

Among the more interesting failures were 31 connections that sent too many or too few bytes, leading to parse errors.

TLS requires each communication partner who authenticates themselves to present a certificate. To prove ownership of the private key of the certificate, a hash of the concatenated handshake records needs to be signed and transmitted over the wire. 22 of our TLS traces had invalid signatures. Not verifying such signatures was the problem of Apple's famous goto fail.

Another 100 failure traces tested our X.509 validation: The majority of these failures (58) sent us certificates which were not signed by our trust anchor, such as CN=hacker/emailAddress=hacker@hacker and CN=Google Internal SNAX Authority and various Apple and Google IDs -- we're still trying to figure out what SNAX is, Systems Network Architecture maybe?

Several certificates contained invalid X.509 extensions: we require that a server certificate does not contain the BasicConstraints = true extension, which marks the certificate as a certificate authority, allowing it to sign other certificates. While not explicitly forbidden, best practices (e.g. from Mozilla) reject such certificates. No sensible systems administrator would accept a CA certificate as a server certificate.

Several other certificates were self-signed or contained an invalid signature: one certificate was our client certificate, but with a different RSA public key, thus the signature on the certificate was invalid; another one had a different RSA public key, and the signature was zeroed out.

Some certificates were not of X.509 version 3, or were expired. Several certificate chains were not pairwise signed, a common attack vector.

Two traces contained certificate structures which our ASN.1 parser rejected.

Another two connections (both initiated by ourselves) threw an exception which led to a shutdown of the connection: there was an out-of-bounds access while parsing handshake records. This did not lead to arbitrary code execution.

Conclusion

The BTC Piñata was the first transparent self-service bounty, and it was a success: people showed interest in the topic; some even donated BTC; we enjoyed setting it up and running it; we fixed a non-critical out of bounds access in our implementation; a large fraction of our stack has been covered by the recorded traces.

There are several points to improve a future Piñata: attestation that the code running is the open sourced code, attestation that the service owns the private key (maybe by doing transactions or signatures with input from any user).

There are several applications using OCaml-TLS, using MirageOS as well as Unix:

  • mirage-seal compiles to a unikernel container which serves a given directory over https;
  • tlstunnel is a (stud like) TLS proxy, forwarding to a backend server;
  • jackline is a (alpha version) terminal-based XMPP client;
  • conduit is an abstraction over network connections -- to make it use OCaml-TLS, set CONDUIT_TLS=native.

Again, a big thank you to IPredator for hosting our BTC Piñata and lending us the BTC!


Jun
26
2015
16
0

MirageOS v2.5 with full TLS support

By Amir Chaudhry, Thomas Gazagnaire

Today we're announcing the new release of MirageOS v2.5, which includes first-class support for SSL/TLS in the MirageOS configuration language. We introduced the pure OCaml implementation of transport layer security (TLS) last summer and have been working since then to improve the integration and create a robust framework. The recent releases allow developers to easily build and deploy secure unikernel services and we've also incorporated numerous bug-fixes and major stability improvements (especially in the network stack). The full list of changes is available on the releases page and the breaking API changes now have their own page.

Over the coming week, we'll share more about the TLS stack by diving into the results of the Bitcoin Piñata, describing a new workflow for building secure static sites, and discussing insights on entropy in virtualised environments.

In the rest of this post, we'll cover why OCaml-TLS matters (and link to some tools), mention our new domain name, and mention our security advisory process.

Why OCaml-TLS matters

The last year has seen a slew of security flaws, some of which even reached the mainstream news. These flaws are often the result of implementation errors and stem from the underlying challenges of interpreting ambiguous specifications, the complexities of large APIs and code bases, and the use of unsafe programming practices. Re-engineering security-critical software allows the opportunity to use modern approaches to prevent these recurring issues. In a separate post, we cover some of the benefits of re-engineering TLS in OCaml.

TLS Unix Tools

To make it even easier to start benefiting from OCaml-TLS, we've also made a collection of TLS unix tools. These are designed to make it really easy to use a good portion of the stack without having to use Xen. For example, Unix tlstunnel is being used on https://realworldocaml.org. If you have stunnel or stud in use somewhere, then replacing it with the tlstunnel binary is an easy way to try things out. Please do give this a go and send us feedback!

openmirage.org -> mirage.io

We've also switched our domain over to https://mirage.io, which is a unikernel running the full stack. We've been discussing this transition for a while on our fortnightly calls and have actually been running this unikernel in parallel for a while. Setting things up this way has allowed us to stress test things in the wild and we've made big improvements to the networking stack as a result.

We now have end-to-end deployments for our secure-site unikernels, which is largely automated -- going from git push all the way to live site. You can get an idea of the workflows we have set up by looking over the following links:

Security disclosure process

Since we're incorporating more security features, it's important to consider the process of disclosing issues to us. Many bugs can be reported as usual on our issue tracker but if you think you've discovered a security vulnerability, the best way to inform us is described on a new page at https://mirage.io/security.

Get started!

As usual, MirageOS v2.5 and its ever-growing collection of libraries are packaged with the OPAM package manager, so look over the installation instructions and run opam install mirage to get the command-line tool. To update from a previously installed version of MirageOS, simply use the normal workflow to upgrade your packages by using opam update -u (you should do this regularly to benefit from ongoing fixes). If you're looking for inspiration, you can check out the examples on mirage-skeleton or ask on the mailing list. Please do be aware that existing config.ml files using the conduit and http constructors might need to be updated -- we've made a page of backward incompatible changes to explain what you need to do.

We would love to hear your feedback on this release, either on our issue tracker or our mailing lists!


Jun
26
2015
14
0

Why OCaml-TLS?

By Amir Chaudhry

TLS implementations have a history of security flaws, which are often the result of implementation errors. These security flaws stem from the underlying challenges of interpreting ambiguous specifications, the complexities of large APIs and code bases, and the use of unsafe programming practices.

Re-engineering security-critical software allows the opportunity to use modern approaches to prevent these recurring issues. Creating the TLS stack in OCaml offers a range of benefits, including:

Robust memory safety: Lack of memory safety was the largest single source of vulnerabilities in various TLS stacks throughout 2014, including Heartbleed (CVE-2014-0160). OCaml-TLS avoids this class of issues entirely due to OCaml's automatic memory management, safety guarantees and the use of a pure-functional programming style.

Improved certificate validation: Implementation errors in other stacks allowed validation to be skipped under certain conditions, leaving users exposed (e.g. CVE-2014-0092). In our TLS stack, we return errors explicitly as values and handle all possible variants. The OCaml toolchain and compile-time checks ensure that this has taken place.
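The "errors as values" approach is easy to see in miniature. The sketch below is illustrative rather than the actual ocaml-tls API: validation outcomes are ordinary values of a variant type, so every caller must pattern-match on them, and the compiler flags any case left unhandled.

```ocaml
(* Hypothetical error variants for illustration only *)
type validation_error =
  | Expired
  | SelfSigned
  | NameMismatch
  | UntrustedChain

type validation_result =
  | Valid
  | Invalid of validation_error

(* Every constructor must be handled: deleting any branch below
   produces a non-exhaustive-match warning at compile time, so a
   validation path cannot be silently skipped. *)
let describe = function
  | Valid -> "certificate accepted"
  | Invalid Expired -> "certificate expired"
  | Invalid SelfSigned -> "self-signed certificate"
  | Invalid NameMismatch -> "hostname does not match certificate"
  | Invalid UntrustedChain -> "chain does not lead to a trusted root"

let () = print_endline (describe (Invalid Expired))
```

Contrast this with a C-style API returning an int status code, where a forgotten check compiles without complaint.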

Mitigation of state machine errors: Errors such as Apple's GoTo Fail (CVE-2014-1266) involved code being skipped and a default 'success' value being returned, even though signatures were never verified. Our approach encodes the state machine explicitly, while state transitions default to failure. The code structure also makes clear the need to consider preconditions.
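A toy version of this encoding (again a sketch, not the real ocaml-tls state machine) shows why a GoTo Fail-style bug cannot arise: the only transitions that succeed are the ones listed explicitly, and everything else falls through to failure.

```ocaml
(* Simplified handshake states and events, for illustration *)
type state =
  | ClientHelloSent
  | ServerHelloReceived
  | SignatureVerified
  | Established

type event = ServerHello | ValidSignature | Finished

(* Only the listed (state, event) pairs advance the handshake.
   The wildcard makes failure the default: there is no path to
   Established that skips the signature check. *)
let transition state event =
  match state, event with
  | ClientHelloSent, ServerHello -> Ok ServerHelloReceived
  | ServerHelloReceived, ValidSignature -> Ok SignatureVerified
  | SignatureVerified, Finished -> Ok Established
  | _ -> Error "unexpected message: aborting handshake"
```

In Apple's bug, a duplicated `goto fail;` skipped the verification step yet still returned success; here, an out-of-order `Finished` simply cannot produce `Ok Established`.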

Elimination of downgrade attacks: Legacy requirements forced other TLS stacks to incorporate weaker 'EXPORT' encryption ciphers. Despite the environment changing, this code still exists and leads to attacks such as FREAK (CVE-2015-0204) and Logjam (CVE-2015-4000). Our TLS server does not support weaker EXPORT cipher suites so was never vulnerable to such attacks. In addition, our stack never supported SSLv3, which was known to be the cause of many vulnerabilities and is only now in the process of being deprecated (RFC 7568).

Greatly reduced TCB: The size of the trusted computing base (TCB) of a system, measured in lines of code, is a widely accepted approximation of the size of its attack surface. Our secure Bitcoin Piñata, a unikernel built using our TLS stack, is less than 4% the size of an equivalent, traditional stack (102 kloc as opposed to 2560 kloc).

These are just some of the benefits of re-engineering critical software using modern techniques.


Feb
10
2015

Smash the Bitcoin Pinata for fun and profit!

By Amir Chaudhry

Last summer we announced the beta release of a clean-slate implementation of TLS in pure OCaml, alongside a series of blog posts that described the libraries and the thinking behind them. It took two hackers six months — starting on the beach — to get the stack to that point and their demo server is still going strong. Since then, the team has continued working and recently presented at the 31st Chaos Communication Congress.

The authors are putting their stack to the test again and this time they've built a Bitcoin Piñata! Essentially, they've hidden a private key to a bitcoin address within a unikernel running on Xen. If you're able to smash your way in, then you get to keep the spoils.

There's more context around this in my Piñata post and you can see the details on the site itself. Remember that the codebase is all open (as well as issues) so there's nothing to reverse engineer. Have fun!


Dec
31
2014

MirageOS 2014 review: IPv6, TLS, Irmin, Jitsu and community growth

By Anil Madhavapeddy

This work funded in part by the EU FP7 User-Centric Networking project, Grant No. 611001.

An action-packed year has flown by for MirageOS, and it's time for a little recap of what's been happening and the plans for the new year. We announced MirageOS 1.0 just over a year ago, and 2014 also saw a major 2.0 summer release and the growth of a developer community that has been building support for IPv6, Transport Layer Security, on-demand spawning, profiling and much more. There have been 205 individual library releases, 25 presentations, and lots of online chatter through the year, so here follows a summary of our major recent activities.

Clean-Slate Transport Layer Security

David Kaloper and Hannes Mehnert started 2014 by getting interested in writing a safer and cleaner TLS stack in OCaml, and ended the year with a complete demonstration and talk last week at 31C3, the premier hacker conference! Their blog posts over the summer remain an excellent introduction to the new stack:

By summer, the stack was complete enough to connect to the majority of TLS 1.0+ sites on the Internet, and work progressed to integration with the remainder of the MirageOS libraries. By November, the Conduit network library had Unix support for both the OpenSSL/Lwt bindings and the pure OCaml stack, and the ability to dynamically select between them. You can now deploy and test the pure OCaml TLS stack on a webserver simply by:

opam install lwt tls cohttp
export CONDUIT_TLS=native
cohttp-server-lwt -c <certfile> -p <port> <directory>

This will spin up an HTTPS server that serves the contents of <directory> to you over TLS. At the same time, we were also working on integrating the TLS stack into the Xen unikernel backend, so we could run completely standalone. This required some surgery:

  • The nocrypto crypto core is written in C, so we had to improve support for linking in external C libraries. Since the Xen unikernel is a single address-space custom kernel, we also needed to be careful to compile it with the correct compilation flags or else risk subtle bugs. Thomas Leonard completely rearranged the MirageOS compilation pipeline to support separate compilation of C stubs, and we had the opportunity to remove lots of duplicated code within mirage-platform as a result of this work.
  • Meanwhile, the problem of gathering entropy in a virtual machine reared its head. We created a mirage-entropy device driver, and an active discussion ensued about how best to gather reliable randomness from Xen. Dave Scott built the best solution -- the xenentropyd that proxies entropy from dom0 to a unikernel VM.
  • David Kaloper also ported the nocrypto library to use the OCaml-Ctypes library, which increases the safety of the C bindings significantly. This is described in more detail in the "Modular foreign function bindings" blog post from the summer. This forms the basis for allowing Xen unikernels to communicate with C code, and integration with the MirageOS toolchain will continue to improve next year.

You can see Hannes and David present OCaml-TLS at CCC online. It's been a real pleasure watching their work develop in the last 12 months with such precision and attention to detail!

HTTP and JavaScript

Rudi Grinberg got sufficiently irked with the poor state of documentation for the CoHTTP library that he began gently contributing fixes towards the end of 2013, and rapidly became one of the maintainers. He also began improving the ecosystem around the web stack by building an HTTP routing layer, described in his blog posts:

Meanwhile, Andy Ray started developing HardCaml (a register transfer level hardware design system) in OCaml, and built the iocamljs interactive browser notebook. This uses js_of_ocaml to port the entire OCaml compilation toolstack to JavaScript, including ocamlfind, Lwt threading and dynamic loading support. The results are browsable online, and it is now easy to generate a JavaScript-driven interactive page for many MirageOS libraries.

An interesting side effect of Andy's patches was the addition of a JavaScript port to the CoHTTP library. For those not familiar with the innards, CoHTTP uses the OCaml module system to build a very portable HTTP implementation that can be mapped to different I/O models (Lwt or Async cooperative threading, or POSIX blocking I/O) and to different operating systems (e.g. Unix or MirageOS). The JavaScript support maps the high-level modules in CoHTTP to the XMLHttpRequest API native to JavaScript, allowing the same OCaml HTTP client code to run efficiently on Unix, Windows and now an IOCamlJS browser instance.
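To illustrate the pattern (with hypothetical module and function names, not CoHTTP's actual API): the protocol logic is written once as a functor over an abstract IO signature, and each platform supplies its own instantiation. A minimal sketch:

```ocaml
(* An abstract IO signature: 'a t is a deferred value for Lwt/Async,
   or just 'a for blocking POSIX IO. *)
module type IO = sig
  type 'a t
  val return : 'a -> 'a t
  val ( >>= ) : 'a t -> ('a -> 'b t) -> 'b t
  val read_line : unit -> string t
end

(* The HTTP logic is written once against the signature... *)
module Make_request_reader (IO : IO) = struct
  open IO
  (* parse a request line such as "GET /path HTTP/1.1" *)
  let read_request () =
    read_line () >>= fun line ->
    match String.split_on_char ' ' line with
    | [ meth; path; _version ] -> return (Some (meth, path))
    | _ -> return None
end

(* ...and instantiated per platform; here, a trivial blocking "monad"
   with a canned request line, standing in for a real transport. *)
module Blocking_io = struct
  type 'a t = 'a
  let return x = x
  let ( >>= ) x f = f x
  let read_line () = "GET /index.html HTTP/1.1"
end

module Reader = Make_request_reader (Blocking_io)
```

The same `Make_request_reader` functor could equally be applied to an Lwt- or XMLHttpRequest-backed IO module, which is how one codebase serves Unix, Xen and the browser.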

MirageOS uses a number of libraries developed by the Ocsigen team at IRILL in Paris, and so I was thrilled to deliver a talk there in December. Romain Calascibetta started integrating Ocsigen and MirageOS over the summer, and the inevitable plotting over beer in Paris led Gabriel Radanne to kick off an effort to integrate the complete Ocsigen web stack into MirageOS. Head to ocsigen/ocsigenserver#54 if you're interested in seeing this happen in 2015! I also expect the JavaScript and MirageOS integration to continue to improve in 2015, thanks to large industrial users such as Facebook adopting js_of_ocaml in their open-source tools such as Hack and Flow.

IPv6

We've wanted IPv6 support in MirageOS since its inception, and several people contributed to making this possible. At the start of the year, Hugo Heuzard and David Sheets got IPv6 parsing support into the ipaddr library (with me watching bemusedly at how much more complex IPv6 address parsing is than IPv4).

Meanwhile, Nicolas Ojeda Bar had been building OCaml networking libraries independently for some time, such as an IMAP client, a Maildir handler, and a Bittorrent client. He became interested in the networking layer of MirageOS, and performed a comprehensive cleanup that resulted in a more modular stack that now supports both IPv4 and IPv6!

The addition of IPv6 support also forced us to consider how to simplify the configuration frontend to MirageOS unikernels that was originally written by Thomas Gazagnaire and described here by Mindy Preston. Nicolas has proposed a declarative extension to the configuration that allows applications to extend the mirage command-line more easily, thus unifying the "built-in" MirageOS compilation modes (such as choosing between Xen or Unix) and protocol-specific choices (such as configuring IPv4 and IPv6).

The new approach opens up the possibility of writing more user-friendly configuration frontends that can render them as text- or web-based selectors, which is really important as more real-world uses of MirageOS are being created. It should be possible in 2015 to solve common problems such as web or DNS serving without having to write a single line of OCaml code.
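For context, here is roughly what a minimal config.ml looked like in the mirage 2.x DSL of this era -- treat it as a hedged sketch, since the exact combinator names shifted between releases:

```ocaml
(* config.ml: a declarative description of a unikernel.  The same file
   drives both `mirage configure --unix` and `mirage configure --xen`. *)
open Mirage

(* Unikernel.Main is a functor over a console device *)
let main = foreign "Unikernel.Main" (console @-> job)

(* wire the default console into the job and register it *)
let () = register "hello" [ main $ default_console ]
```

Nicolas's declarative extension generalises exactly this kind of description, so that protocol choices such as IPv4 versus IPv6 can surface as command-line or web-based configuration options rather than hand-edited OCaml.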

Profiling

One of the benefits touted by our CACM article on unikernels at the start of the year was the improved tooling from the static linking of an entire application stack with an operating system layer. Thomas Leonard joined the project this year after publishing a widely read blog series on his experiences from switching from Python to OCaml. Aside from leading (and upstreaming to Xen) the port of MirageOS to ARM, he also explored how to add profiling throughout the unikernel stack.

The support is now comprehensive and integrated into the MirageOS trees: the Lwt cooperative threading engine has hooks for thread switching, most of the core libraries register named events, traces are dumped into shared memory buffers in the CTF file format used by the Linux trace toolkit, and there are JavaScript and GTK+ GUI frontends that can parse them.

You can find the latest instructions on Tracing and Profiling on this website, and here are Thomas' original blog posts on the subject:

Irmin

Thomas Gazagnaire spent most of the year furiously hacking away at the storage layer in Irmin, which is a clean-slate storage stack that uses a Git-like branching model as the basis for distributed unikernel storage. Irmin 0.9.0 was released in December with efficiency improvements and a sufficiently portable set of dependencies to make JavaScript compilation practical.

We also had the pleasure of Benjamin Farinier and Matthieu Journault join us as summer interns. Both of them did a great job improving the internals of Irmin, and Benjamin's work on Mergeable Persistent Datastructures will be presented at JFLA 2015.

Jitsu

Magnus Skjegstad returned to Cambridge and got interested in the rapid dynamic provisioning of unikernels. He built Jitsu, a DNS server that spawns unikernels in response to DNS requests and boots them in real-time with no perceptible lag to the end user. The longer term goal behind this is to enable a community cloud of ARM-based Cubieboard2 boards that serve user content without requiring centralised data centers, but with the ease-of-use of existing systems.

Building Jitsu and hitting our goal of extremely low latency management of unikernels required a huge amount of effort from across the MirageOS team.

  • Dave Scott and Jon Ludlam (two of the Xen maintainers at Citrix) improved the Xen xl toolstack to deserialise the VM startup chain to shave 100s of milliseconds off every operation.
  • Thomas Leonard drove the removal of our forked Xen MiniOS with a library version that is being fed upstream (including ARM support). This made the delta between Xen and MirageOS much smaller and therefore made reducing end-to-end latency tractable.
  • David Sheets built a test harness to boot unikernel services and measure their latency under very different conditions, including contrasting boot times against Docker containers. In many instances, we ended up booting faster than containers because a standalone unikernel doesn't touch the disk at all. Ian Leslie built us some custom power-measurement hardware that came in handy for figuring out how to drive down the energy cost of unikernels running on ARM boards.
  • Thomas Gazagnaire, Balraj Singh and Magnus Skjegstad built the synjitsu proxy server that intercepts and proxies TCP connections to mask the few hundred milliseconds of unikernel boot time, ensuring that no TCP connections ever require retransmission from the client.
  • Dave Scott and I built out the vchan shared memory transport that supports low-latency communication between unikernels and/or Unix processes. This is rapidly heading towards a Plan 9-like model, with the additional twist of using Git instead of a flat filesystem hierarchy as its coordination basis.
  • Amir Chaudhry and Richard Mortier documented the Git-based (and eventually Irmin-based) workflow behind managing the unikernels themselves, so that they can easily be deployed to distant ARM devices simply by running git pull. You can read more about this in his From Jekyll to Unikernels post.

All of this work was hastily crammed into a USENIX NSDI 2015 paper that got submitted at 4am on a bright autumn morning. Here is the published paper, and we're planning a blog post describing how you can deploy this infrastructure for yourself.

Community

All of the above work was only possible due to the vastly improved tooling and infrastructure around the project. Our community manager Amir Chaudhry led the minuted calls every two weeks that tied the efforts together, and we established some pioneer projects for newcomers to tackle.

The OPAM package manager continued to be the frontend for all MirageOS tools, with releases of libraries happening regularly. Because of the modular nature of MirageOS code, most of the libraries can also be used as normal Unix-based libraries, meaning that we aren't just limited to MirageOS users but can benefit from the entire OCaml community. The graph to the right shows the growth of the total package database since the project started to give you a sense of how much activity there is.

The major OPAM 1.2 release also added a number of new features that made MirageOS code easier to develop, including a Git-based library pinning workflow that works superbly with GitHub, and easier Travis integration for continuous integration. Nik Sultana also improved the is-mirage-broken tests to give us a cron-driven prod if a library update caused an end-to-end failure in building the MirageOS website or other self-hosted infrastructure.

Our favourite random idiot, Mindy Preston, wrote up a superb blog series about her experiences in the spring of 2014 with moving her homepage to be hosted on MirageOS. This was followed up by Thomas Leonard, Phil Tomson, Ian Wilkinson, Toby Moore, and many others that we've tried to record in our link log. We really appreciate the hundreds of bug reports filed by users and folk trying out MirageOS; by taking the trouble to do this, you've helped us refine and polish the frontend. One challenge for 2015 that we could use help on is to pull together many of these distributed blogged instructions and merge them back into the main documentation (get in touch if interested!).

OCaml has come a long way in the last year in terms of tooling, and another task my research group OCaml Labs works on at Cambridge is the development of the OCaml Platform. I'll be blogging separately about our OCaml-specific activities in a few days, but all of this work has a direct impact on MirageOS itself, since it lets us establish a local feedback loop between MirageOS and OCaml developers to rapidly iterate on large-scale development. The regular OCaml compiler hacking sessions organised by Jeremy Yallop and Leo White have been a great success this year, drawing a wide variety of people from academia (Cambridge, the London universities and Microsoft Research), industry (Jane Street, Citrix and Facebook, among others) and locally interested folk. One very important project that has had a lot of work put into it in 2014 (but isn't quite ready for a public release yet) is Assemblage, which will remove much of the boilerplate currently needed to build and release an OCaml library to OPAM.

We also had a great time working with open-source summer programs. Thanks to the Xen Foundation and GNOME for their support here, and we hope to do this again next summer! The roundup posts were:

Upcoming features

So what's coming up for our unikernels in 2015? Our focus heading into the new year is very much on improving the ease-of-use and deployability of MirageOS and fleshing out the feature set for the early adopters such as the XAPI project, Galois, and the Nymote personal data project. Here are some of the highlights:

  • Dust Clouds: The work on Jitsu is leading to the construction of what we term "dust clouds": on-demand scaling of unikernel services within milliseconds of requests coming in, terminated right beside the user on local ARM devices. The model supports existing clouds as well, and so we are improving support for cloud APIs such as EC2 via Jyotsna Prakash's bindings, XenAPI, and (volunteers needed) OpenStack support. If you're interested in tracking this work, head over to the Nymote site for updates.

  • Portability: Beyond Xen, there are several efforts afoot to port MirageOS to bare-metal targets. One promising effort is to use Rump Kernels as the boot infrastructure and MirageOS as the application stack. We hope to have a Raspberry Pi and other ARM targets fairly soon. Meanwhile, at the other end of the spectrum is mobile computing, which was part of the original multiscale vision for starting the project. The JavaScript, iOS and Android ports are all progressing (mainly thanks to community contributions around OCaml support for this space, such as Jeff Psellos' hard work on OCaml-IOS).

  • Protocol Development: There are a huge number of protocols being developed independently, and more are always welcome. Luke Dunstan is hacking on multicast DNS support, we have an IMAP client and server, Dominic Price has built a series of social network APIs for Facebook and Tumblr, and Masoud Koleini has been extending Haris Rotsos' work to achieve a line-rate, type-safe OpenFlow switch and controller based on the Frenetic project. Hannes is also developing Jackline, which uses his MirageOS libraries to assemble a trustworthy communication client. Daniel Buenzli also continues to release a growing set of high-quality, modular libraries that we depend on throughout MirageOS.

  • Storage: All storage services from the unikernels will be Git-based (e.g. logging, command-and-control, key-value retrieval). Expect to see Xen toolstack extensions that make this support seamless, so a single Linux VM will be able to control a large army of unikernels via persistent data structures.

Want to get involved?

This is a really fun time to get involved with unikernels and the MirageOS project. 2014 saw lots of discussion about the potential of unikernels, and we'll see some of the first big deployments involving them in 2015. If you wish to learn more, check out the pioneer projects, watch out for Amir's meeting notes, join the voice calls if you want a more interactive discussion, and engage on the mailing lists with any questions you might have.

For me personally, it's been a real privilege to spend the year working with and learning from the friendly, intelligent and diverse community that is springing up around the project. The progression from experiment to reality has been a lot of work, but the unikernel dream is finally coming together rather nicely thanks to everyone's hard work and enthusiasm. I'd also like to thank all of our funding bodies and the Linux Foundation and the Xen Project (especially Lars Kurth and Russell Pavlicek) for their support throughout the year that made all this work possible. Happy new year, everyone!


Jul
22
2014

MirageOS v2.0: a recap of the new features

By Anil Madhavapeddy

This work funded in part by the EU FP7 User-Centric Networking project, Grant No. 611001.

The first release of MirageOS back in December 2013 introduced the prototype of the unikernel concept, which realised the promise of a safe, flexible mechanism to build highly optimized software stacks purpose-built for deployment in the public cloud (more background on this). Since then, we've been hard at work using and extending MirageOS for real projects and the community has been steadily growing.

We're thrilled to announce the release of MirageOS v2.0 today! Over the past few weeks the team has been hard at work blogging about all the new features in this latest release, coordinated by the tireless Amir Chaudhry:

All the libraries required for these new features are regularly released into the OPAM package manager, so just follow the installation instructions to give them a spin. A release this size probably introduces minor hiccups that may cause build failures, so we very much encourage bug reports on our issue tracker or questions to our mailing lists. Don't be shy: no question is too basic, and we'd love to hear of any weird and wacky uses you put this new release to! And finally, the lifeblood of MirageOS is about sharing and publishing libraries that add new functionality to the framework, so do get involved and open-source your own efforts.

Breaking news: Richard Mortier and I will be speaking at OSCON this week on Thursday morning about the new features, in the Cloud Track. Come along if you are in rainy Portland at the moment!


Jul
22
2014

Building an ARMy of Xen unikernels

By Thomas Leonard

Mirage has just gained the ability to compile unikernels for the Xen/arm32 platform, allowing Mirage guests to run under the Xen hypervisor on ARM devices such as the Cubieboard 2 and CubieTruck.

Introduction

The ARMv7 architecture introduced the (optional) Virtualization Extensions, providing hardware support for running virtual machines on ARM devices, and Xen's ARM Hypervisor uses this to support hardware accelerated ARM guests.

Mini-OS is a tiny OS kernel designed specifically for running under Xen. It provides code to initialise the CPU, display messages on the console, allocate memory (malloc), and not much else. It is used as the low-level core of Mirage's Xen implementation.

Mirage v1 was built on an old version of Mini-OS which didn't support ARM. For Mirage v2, we have added ARM support to the current Mini-OS (completing Karim Allah Ahmed's initial ARM port) and made Mirage depend on it as an external library. This means that Mirage will automatically gain support for other architectures that get added later. We are currently working with the Xen developers to get our Mini-OS fork upstreamed.

In a similar way, we have replaced Mirage v1's bundled maths library with a dependency on the external OpenLibm, which we also extended with ARM support (this was just a case of fixing the build system; the code is from FreeBSD's libm, which already supported ARM).

Mirage v1 also bundled dietlibc to provide its standard C library. A nice side-effect of this work came when we were trying to separate out the dietlibc headers from the old Mini-OS headers in Mirage. These had rather grown together over time and the work was proving difficult, until we discovered that we no longer needed a libc at all, as almost everything that used it had been replaced with pure OCaml versions! The only exception was the printf code for formatting floating point numbers, which OCaml uses in its printf implementation. We replaced that by taking the small fmt_fp function from musl libc.

Here's the final diffstat of the changes to mirage-platform adding ARM support:

778 files changed, 1949 insertions(+), 59689 deletions(-)

Trying it out

You'll need an ARM device with the Virtualization Extensions. I've been testing using the Cubieboard 2 (and CubieTruck):

Cubieboard2

The first step is to install Xen. Running Xen on the Cubieboard2 documents the manual installation process, but you can now also use mirage/xen-arm-builder to build an SDcard image automatically. Copy the image to the SDcard, connect the network cable and power, and the board will boot Xen.

Once booted you can ssh to Dom0, the privileged Linux domain used to manage the system, install Mirage, and build your unikernel just as on x86. Currently, you need to select the Git versions of some components. The following commands will install the necessary versions if you're using the xen-arm-builder image:

$ opam init
$ opam install mirage-xen-minios
$ opam remote add mirage-dev git://github.com/mirage/mirage-dev
$ opam install mirage

Technical details

One of the pleasures of unikernels is that you can comprehend the whole system with relatively little effort, and those wishing to understand, debug or contribute to the ARM support may find the following technical sections interesting. However, you don't need to know the details of the ARM port to use it, as Mirage abstracts away the details of the underlying platform.

The boot process

An ARM Mirage unikernel uses the Linux zImage format, though it is not actually compressed. Xen will allocate some RAM for the image and load the kernel at the offset 0x8000 (32 KB).

Execution begins in arm32.S, with the r2 register pointing to a Flattened Device Tree (FDT) describing details of the virtual system. This assembler code performs a few basic boot tasks:

  1. Configuring the MMU, which maps virtual addresses to physical addresses (see next section).
  2. Turning on caching and branch prediction.
  3. Setting up the exception vector table (this says how to handle interrupts and deal with various faults, such as reading from an invalid address).
  4. Setting up the stack pointer and calling the C function arch_init.

arch_init makes some calls to the hypervisor to set up support for the console and interrupt controller, and then calls start_kernel.

start_kernel (in libxencaml) sets up a few more features (events, malloc, time-keeping and grant tables), then calls caml_startup.

caml_startup (in libocaml) initialises the garbage collector and calls caml_program, which is your application's main.ml.

The address space

With the Virtualization Extensions, there are two stages to converting a virtual memory address (used by application code) to a physical address in RAM. The first stage is under the control of the guest VM, mapping the virtual address to what the guest believes is the physical address (this address is referred to as the Intermediate Physical Address or IPA). The second stage, under the control of Xen, maps the IPA to the real physical address. The tables holding these mappings are called translation tables.

Mirage's memory needs are simple: most of the RAM should be used for the garbage-collected OCaml heap, with a few pages used for interacting with Xen (these don't go on the OCaml heap because they must be page aligned and must not move around).

Xen does not commit to using a fixed address as the IPA of the RAM, but the C code needs to run from a known location. To solve this problem the assembler code in arm32.S detects where it is running from and sets up a virtual-to-physical mapping that will make it appear at the expected location, by adding a fixed offset to each virtual address. For example, on Xen/unstable, we configure the beginning of the virtual address space to look like this (on Xen 4.4, the physical addresses would start at 80000000 instead):

Virtual address   Physical address (IPA)   Purpose
400000            40000000                 Stack (16 KB)
404000            40004000                 Translation tables (16 KB)
408000            40008000                 Kernel image

The physical address is always at a fixed offset from the virtual address and the addresses wrap around, so virtual address c0400000 maps back to physical address 0 (in this example).
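The wraparound arithmetic can be checked directly. This small sketch (not part of the Mirage sources) uses OCaml's Int32, whose addition wraps modulo 2^32 just like the 32-bit address space:

```ocaml
(* With virtual 400000 mapped to IPA 40000000, the fixed offset is
   3fc00000.  Int32 addition wraps modulo 2^32, mirroring the MMU. *)
let offset = Int32.sub 0x40000000l 0x00400000l  (* = 0x3fc00000 *)

let to_physical virt = Int32.add virt offset

let () =
  assert (to_physical 0x00400000l = 0x40000000l);  (* stack *)
  assert (to_physical 0xc0400000l = 0x00000000l)   (* wraps to 0 *)
```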

The stack, which grows downwards, is placed at the start of RAM so that a stack overflow will trigger a fault rather than overwriting other data.

The 16 KB translation table is an array of 4-byte entries each mapping 1 MB of the virtual address space, so the 16 KB table is able to map the entire 32-bit address space (4 GB). Each entry can either give the physical section address directly (which is what we do) or point to a second-level table mapping individual 4 KB pages. By using only the top-level table we reduce possible delays due to TLB misses.
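The sizes above follow from simple arithmetic, sketched here for illustration: a 1 MB section is addressed by the top 12 bits of a 32-bit address, so 4096 four-byte entries cover the whole 4 GB space in 16 KB.

```ocaml
(* Each top-level entry maps one 1 MB section, so the index is the
   virtual address shifted right by 20 bits. *)
let section_index virt = virt lsr 20

(* 4 GB of address space in 1 MB sections -> 4096 entries of 4 bytes *)
let entries = (4 * 1024 * 1024 * 1024) / (1024 * 1024)
let table_bytes = entries * 4   (* 16 KB *)

let () =
  assert (section_index 0x00400000 = 4);
  assert (table_bytes = 16 * 1024)
```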

After the kernel code comes the data (constants and global variables), then the bss section (data that is initially zero, and therefore doesn't need to be stored in the kernel image), and finally the rest of the RAM, which is handed over to the malloc system.

Contact

The current version seems to be working well on Xen 4.4 (stable) and the 4.5 development version, but has only been lightly tested. If you have any problems or questions, or get it working on other devices, please let us know!


Jul
21
2014
11
0

Using Irmin to add fault-tolerance to the Xenstore database

By Dave Scott

This is the second in a series of posts that introduces the Irmin distributed storage engine. You might like to begin with the introductory post.

Xenstore is a critical service found on all hosts running Xen. Xenstore is necessary to

  • configure all VM I/O devices such as disk controllers and network interface cards;
  • share performance statistics and OS version information; and
  • signal VMs during shutdown, suspend, resume, migrate etc.

Xenstore must be reliable: if it fails then the host is unmanageable and must be rebooted.

Xenstore must be secure: if it is compromised by a VM then that VM can access data belonging to other VMs.

The current version of Xenstore is already written in OCaml and documented in the paper OXenstored: an efficient hierarchical and transactional database using functional programming with reference cell comparisons, presented at ICFP 2009. The existing code works very reliably, but there is always room to improve the debuggability of such a complex system component. This is where Irmin, the storage layer of Mirage 2.0, can help.

But first, a quick Xenstore primer:

Xen and Xenstore in 30 seconds

The Xen hypervisor focuses on isolating VMs from each other; the hypervisor provides a virtual CPU scheduler and a memory allocator but does not perform I/O on behalf of guest VMs. On a Xen host, privileged server VMs perform I/O on behalf of client VMs. The configuration describing which server VM services requests for which client VM is stored in Xenstore, as key/value pairs.

The following diagram shows a Xen host with a single client and server VM, with a single virtual device in operation. Disk blocks and network packets flow via shared memory between Xen-aware drivers in the VMs, shown in the lower-half. The control-plane, shown in the upper-half, contains the metadata about the datapath: how the device should appear in the client VM; where the I/O should go in the server VM; where the shared memory control structures are etc.

Device configuration is stored in Xenstore as key=value pairs.

The Xenstore device attach protocol insists that all device keys are added through atomic transactions, i.e. partial updates are never visible to clients and transactions cannot interfere with each other. A Xenstore server must abort transactions whose operations were not successfully isolated from other transactions. After an abort, the client is expected to retry. Each key=value write is communicated to the server as a single request/response, so transactions comprising multiple writes are open for multiple round-trip times. This protocol is baked into guest VM kernels (including Linux, FreeBSD, Mirage, ...) and won't change anytime soon.

Xenstore is used heavily when lots of VMs are starting in parallel. Each VM typically has several devices, each of these devices is added in a parallel transaction and therefore many transactions are open at once. If the server aborts too many of these transactions, causing the clients to retry, the system will make little progress and may appear to live-lock. The challenge for a Xenstore implementation is to minimise the number of aborted transactions and retries, without compromising on the isolation guarantee.

Irmin Xenstore design goals

The design goals of the Irmin-based Mirage Xenstore server are:

  1. safely restart after a crash;
  2. make system debugging easy; and
  3. go really fast!

How does Irmin help achieve these goals?

Restarting after crashes

The Xenstore service is a reliable component and very rarely crashes. However, if a crash does occur, the impact is severe on currently running virtual machines. There is no protocol for a running VM to close its connection to a Xenstore and open a new one, so if Xenstore crashes then running VMs are simply left orphaned. VMs in this state are impossible to manage properly: there is no way to shut them down cleanly, to suspend/resume or migrate, or to configure any disk or network interfaces. If Xenstore crashes, the host must be rebooted shortly after.

Irmin helps make Xenstore recoverable after a crash by providing a library that applications can use to persist and synchronise distributed data structures on disk and in memory. By using Irmin to persist all our state somewhere sensible, and taking care to manage our I/O carefully, the server process becomes stateless and can be restarted at will.

To make Xenstore use Irmin, the first task is to enumerate all the different kinds of state in the running process. This includes the obvious key-value pairs used for VM configuration, as well as data currently hidden away in the OCaml heap: the addresses in memory of established communication rings, per-domain quotas, pending watch events, watch registrations and so on. Once the state has been enumerated it must be mapped onto key-value pairs which can be stored in Irmin. Rather than using ad-hoc mappings everywhere, the Mirage Irmin server has persistent Maps, persistent Sets, persistent Queues and persistent reference cells.
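As an illustration of the idea (this is a hypothetical sketch, not the actual Mirage Xenstore code; the `KV` signature and all names here are invented), a persistent reference cell can be layered over any string key-value store:

```ocaml
(* Illustrative sketch: a persistent reference cell built over a plain
   string key-value store. [KV], [PRef] and all names are invented for
   this example and are not the real Mirage Xenstore modules. *)
module type KV = sig
  type t
  val read  : t -> string -> string option
  val write : t -> string -> string -> unit
end

module PRef (S : KV) = struct
  type t = { store : S.t; key : string }

  (* the cell lives at a fixed key in the store, so its contents
     survive a restart of the server process *)
  let create store key ~initial =
    S.write store key initial;
    { store; key }

  let get t = S.read t.store t.key
  let set t v = S.write t.store t.key v
end
```

The other persistent containers (Maps, Sets, Queues) follow the same pattern, with their state mapped onto one or more keys in the store.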

Irmin applications are naturally written as functors, with the details of the persistence kept abstract. The following Irmin-inspired signature represents what Xenstore needs from Irmin:

module type VIEW = sig
  type t

  val create: unit -> t Lwt.t
  (** Create a fresh VIEW from the current state of the store.
      A VIEW tracks state queries and updates and acts like a branch
      which has an explicit [merge]. *)

  val read: t -> Protocol.Path.t -> 
    [ `Ok of Node.contents | `Enoent of Protocol.Path.t ] Lwt.t
  (** Read a single key *)

  val list: t -> Protocol.Path.t -> 
    [ `Ok of string list | `Enoent of Protocol.Path.t ] Lwt.t
  (** List all the children of a key *)

  val write: t -> Protocol.Path.t -> Node.contents -> 
    [ `Ok of unit ] Lwt.t
  (** Update a single key *)

  val mem: t -> Protocol.Path.t -> bool Lwt.t
  (** Check whether a key exists *)

  val rm: t -> Protocol.Path.t -> [ `Ok of unit ] Lwt.t
  (** Remove a key *)

  val merge: t -> string -> bool Lwt.t
  (** Merge this VIEW into the current state of the store *)
end

The main 'business logic' of Xenstore can then be functorised over this signature relatively easily. All we need is to instantiate the functor using Irmin to persist the data somewhere sensible. Eventually we will need two instantiations: one which runs as a userspace application and which writes to the filesystem; and a second which will run as a native Xen kernel (known as a xenstore stub domain) and which will write to a fixed memory region (like a ramdisk). The choice of which to use is left to the system administrator. Currently most (if not all) distribution packagers choose to run Xenstore in userspace. Administrators who wish to further secure their hosts are encouraged to run the kernelspace version to isolate Xenstore from other processes (where a VM offers more isolation than a container, which offers more isolation than a chroot). Note this choice is invisible to the guest VMs.

So far in the Irmin Xenstore integration only the userspace instantiation has been implemented. One of the most significant user-visible features is that all of the operations done through Irmin can be inspected using the standard git command line tool. The runes to configure Irmin to write git format data to the filesystem are as follows:

    let open Irmin_unix in
    let module Git = IrminGit.FS(struct
      let root = Some filename
      let bare = true
    end) in
    let module DB = Git.Make(IrminKey.SHA1)(IrminContents.String)(IrminTag.String) in
    DB.create () >>= fun db ->

where keys and values will be mapped into OCaml strings, and our VIEW.t is simply an Irmin DB.View.t. All that remains is to implement read, list, write, rm by

  1. mapping Xenstore Protocol.Path.t values onto Irmin keys; and
  2. mapping Xenstore Node.contents records onto Irmin values.

As it happens Xenstore and Irmin have similar notions of "paths" so the first mapping is easy. We currently use sexplib to map Node.contents values onto strings for Irmin.
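A hedged illustration of that second mapping (the real Node.contents record has more fields, as the git diffs below show with creator, perms and value; this cut-down version just demonstrates the round-trip, written with today's `[@@deriving sexp]` syntax from ppx_sexp_conv rather than the 2014-era camlp4 `with sexp`):

```ocaml
(* Illustrative only: a simplified Node.contents mapped to and from a
   string via sexplib, since Irmin stores plain strings here. *)
open Sexplib.Std

type contents = {
  creator : int;
  value   : string;
} [@@deriving sexp]

let to_irmin_value (c : contents) : string =
  Sexplib.Sexp.to_string (sexp_of_contents c)

let of_irmin_value (s : string) : contents =
  contents_of_sexp (Sexplib.Sexp.of_string s)
```

The serialised strings have the same s-expression shape as the `((creator 0)...(value halt))` forms visible in the git diff later in this post.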

The resulting Irmin glue module looks like:

    let module V = struct
      type t = DB.View.t
      let create = DB.View.create
      let write t path contents =
        DB.View.update t (value_of_filename path) (Sexp.to_string (Node.sexp_of_contents contents))
      (* omit read,list,write,rm for brevity *)
      let merge t origin =
        let origin = IrminOrigin.create "%s" origin in
        DB.View.merge_path ~origin db [] t >>= function
        | `Ok () -> return true
        | `Conflict msg ->
          info "Conflict while merging database view: %s" msg;
          return false
    end in

The write function simply calls through to Irmin's update function, while the merge function calls Irmin's merge_path. If Irmin cannot merge the transaction then our merge function will return false and this will be signalled to the client, which is expected to retry the high-level operation (e.g. hotplugging or unplugging a device).

Now all that remains is to carefully adjust the I/O code so that effects (reading and writing packets along the persistent connections) are interleaved properly with persisted state changes and voilà: we now have a Xenstore which can recover after a restart.

Easy system debugging with Git

When something goes wrong on a Xen system it's standard procedure to

  1. take a snapshot of the current state of Xenstore; and
  2. examine the log files for signs of trouble.

Unfortunately, by the time this is done, interesting Xenstore state has usually been deleted, so the first task of the human operator is to evaluate the logged actions by hand, in reverse, to figure out what the state actually was when the problem happened. This is tedious, error-prone and not always possible, since the log statements are ad-hoc and don't always include the data you need to know.

In the new Irmin-powered Xenstore the history is preserved in a git-format repository, and can be explored using your favourite git viewing tool. Each store update has a compact one-line summary, a more verbose multi-line explanation and (of course) the full state change is available on demand.

For example you can view the history in a highly-summarised form with:

$ git log --pretty=oneline --abbrev-commit --graph
* 2578013 Closing connection -1 to domain 0
* d4728ba Domain 0: rm /bench/local/domain/0/backend/vbd/10 = ()
* 4b55c99 Domain 0: directory /bench/local/domain/0/backend = [ vbd ]
* a71a903 Domain 0: rm /bench/local/domain/10 = ()
* f267b31 Domain 0: rm /bench/vss/uuid-10 = ()
* 94df8ce Domain 0: rm /bench/vm/uuid-10 = ()
* 0abe6b0 Domain 0: directory /bench/vm/uuid-10/domains = [  ]
* 06ddd3b Domain 0: rm /bench/vm/uuid-10/domains/10 = ()
* 1be2633 Domain 0: read /bench/local/domain/10/vss = /bench/vss/uuid-10
* 237a8e4 Domain 0: read /bench/local/domain/10/vm = /bench/vm/uuid-10
* 49d70f6 Domain 0: directory /bench/local/domain/10/device = [  ]
*   ebf4935 Merge view to /
|\
| * e9afd9f Domain 0: read /bench/local/domain/10 =
* | c4e0fa6 Domain 0: merging transaction 375
|/

The summarised form shows both individual operations as well as isolated transactions which are represented as git branches. You can then 'zoom in' and show the exact state change with commands like:

$ git show bd44e03
commit bd44e0388696380cafd048eac49474f68d41bd3a
Author: 448 <irminsule@openmirage.org>
Date:   Thu Jan 1 00:09:26 1970 +0000

    Domain 0: merging transaction 363

diff --git a/*0/bench.dir/local.dir/domain.dir/7.dir/control.dir/shutdown.value b/*0/bench.dir/local.dir/domain.dir/7.dir/control.dir/shutdown.value
new file mode 100644
index 0000000..aa38106
--- /dev/null
+++ b/*0/bench.dir/local.dir/domain.dir/7.dir/control.dir/shutdown.value
@@ -0,0 +1 @@
+((creator 0)(perms((owner 7)(other NONE)(acl())))(value halt))

Last but not least, you can git checkout to the exact time the problem occurred and examine the state of the store.

Going really fast

Xenstore is part of the control-plane of a Xen system and is most heavily stressed when lots of VMs are being started in parallel. Each VM has multiple devices and each device is added in a separate transaction. These transactions remain open for multiple client-server round-trips, as each individual operation is sent to Xenstore as a separate RPC. To provide isolation, each Xenstore transaction is represented by an Irmin VIEW.t which is persisted on disk as a git branch. When starting lots of VMs in parallel, lots of branches are created and must be merged back together. If a branch cannot be merged then an abort signal is sent to the client and it must retry.

Earlier versions of Xenstore had naive transaction merging algorithms which aborted many of these transactions, causing the clients to re-issue them. This led to a live-lock where clients were constantly reissuing the same transactions again and again.

Happily Irmin's default merging strategy is much better: by default Irmin records the results of every operation and replays the operations on merge (similar to git rebase). Irmin will only generate a Conflict and signal an abort if the client would now see different results to those it has already received (imagine reading a key twice within an isolated transaction and seeing two different values). In the case of parallel VM starts, the keys are disjoint by construction so all transactions are merged trivially; clients never receive abort signals; and therefore the system makes steady, predictable progress starting the VMs.

Trying it out

The Irmin Xenstore is under active development but you can try it by:

Install basic development tools along with the xen headers and xenstore tools (NB you don't actually have to run Xen):

  sudo apt-get install libxen-dev xenstore-utils opam build-essential m4

Initialise opam (if you haven't already). Make sure you have OCaml 4.01:

  opam init
  opam update
  opam switch 4.01.0

Install the OCaml build dependencies:

  opam install lwt irmin git sexplib cstruct uri cmdliner xen-evtchn shared-memory-ring io-page ounit

Clone the code and build it:

  git clone git://github.com/mirage/ocaml-xenstore-server
  cd ocaml-xenstore-server
  make

Run a server (as a regular user):

  ./main.native --database /tmp/db --enable-unix --path /tmp/xenstored

In a separate terminal, perform some operations:

  export XENSTORED_PATH=/tmp/xenstored
  xenstore-write -s /one/two/three 4 /five/six/seven 8
  xenstore-ls -s /

Next check out the git repo generated by Irmin:

  cd /tmp/db
  git log

Comments and/or contributions are welcome: join the Mirage email list and say hi!


Jul
18
2014

Introducing Irmin: Git-like distributed, branchable storage

By Thomas Gazagnaire

This is the first post in a series which will describe Irmin, the new Git-like storage layer for Mirage OS 2.0. This post gives a high-level description on Irmin and its overall architecture, and later posts will detail how to use Irmin in real systems.

Irmin is a library to persist and synchronize distributed data structures both on-disk and in-memory. It enables a style of programming very similar to the Git workflow, where distributed nodes fork, fetch, merge and push data between each other. The general idea is that you want every active node to get a local (partial) copy of a global database and always be very explicit about how and when data is shared and migrated.

Irmin is not, strictly speaking, a full database engine. It is, as are all other components of Mirage OS, a collection of libraries designed to solve different flavours of the challenges raised by the CAP theorem. Each application can select the right combination of libraries to solve its particular distributed problem. More precisely, Irmin consists of a core of well-defined low-level data structures that specify how data should be persisted and be shared across nodes. It defines algorithms for efficient synchronization of those distributed low-level constructs. It also builds a collection of higher-level data structures, like persistent mergeable queues, that can be used by developers without having to know precisely how Irmin works underneath.

Since it's a part of Mirage OS, Irmin does not make strong assumptions about the OS environment that it runs in. This makes the system very portable, and the details below hold for in-memory databases as well as for slower persistent serialization such as SSDs, hard drives, web browser local storage, or even the Git file format.

Persistent Data Structures

Persistent data structures are well known and used pervasively in many different areas. The programming language community has investigated the concepts widely (and this is not limited to functional programming), and in the meantime, the systems community experimented with various persistent strategies such as copy-on-write filesystems. In most of these systems, the main concern is how to optimize the space complexity by maximizing the sharing of immutable sub-structures.

The Irmin design ideas share roots with previous works on persistent data structures, as it provides an efficient way to fork data structures, but it also explores new strategies and mechanisms to be able to efficiently merge back these forked structures. This offers programming constructs very similar to the Git workflow.

Irmin focuses on two main aspects:

  • Semantics: what properties the resulting merged objects should verify.

  • Complexity: how to design efficient merge and synchronization primitives, taking advantage of the immutable nature of the underlying objects.

Although it is pervasively used, data persistence has a very broad and fuzzy meaning. In this blog post, I will refer to data persistence as a way for:

  • a single process to lazily populate its memory on startup. You need this when you want the process to be able to resume, holding part of its previous state, after a crash

  • concurrent processes to share references between objects living in a global pool of data. Sharing references, as opposed to sharing values, reduces memory copies and allows different processes to concurrently update a shared store.

In both cases, you need a global pool of data (the Irmin block store) and a way to name values in that pool (the Irmin tag store).

The Block Store: a Virtual Heap

Even high-level data structures need to be allocated in memory, and it is the purpose of the runtime to map such high-level constructs into low-level memory graph blocks. One of the strengths of OCaml is the very simple and deterministic mapping from high-level data structures to low-level block representations (the heap): see for instance, the excellent series of blog posts on OCaml internals by Richard W. Jones, or Chapter 20: Memory Representation of Values in Real World OCaml.

An Irmin block store can be seen as a virtual OCaml heap that uses a more abstract way of connecting heap blocks. Instead of using the concrete physical memory addresses of blocks, Irmin uses the hash of the block contents as an address. As for any content-addressable storage, this gives Irmin block stores a lot of nice properties and greatly simplifies the way distributed stores can be synchronized.
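The content-addressing idea can be shown with a toy store. This sketch is illustrative only: Irmin uses SHA1 hashes and pluggable backends, and its `add` is pure (returning a new store), whereas this version uses the OCaml stdlib's Digest (MD5) and an in-memory, mutable Hashtbl for brevity:

```ocaml
(* Toy content-addressable block store: a block's address is the hash
   of its contents, so identical blocks share a single address. *)
module Blocks = struct
  type key = string            (* hex digest of the contents *)
  type value = string
  type t = (key, value) Hashtbl.t

  let create () : t = Hashtbl.create 16

  (* adding the same contents twice yields the same key: storage is
     deduplicated for free *)
  let add (t : t) (v : value) : key =
    let k = Digest.to_hex (Digest.string v) in
    Hashtbl.replace t k v;
    k

  let read (t : t) (k : key) : value option = Hashtbl.find_opt t k
end
```

Because addresses are derived from contents, two stores can be synchronized by exchanging only the keys the other side is missing, which is exactly what makes distributed synchronization simple.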

Persistent data structures are immutable, and once a block is created in the block store, its contents will never change again. Updating an immutable data structure means returning a completely new structure, while trying to share common sub-parts to avoid the cost of making new allocations as much as possible. For instance, modifying a value in a persistent tree means creating a chain of new blocks, from the root of the tree to the modified leaf. For convenience, Irmin only considers acyclic block graphs -- it is difficult in a non-lazy pure language to generate complex cyclic values with reasonable space usage.

Conceptually, an Irmin block store has the following signature:

type t
(** The type for Irmin block store. *)

type key
(** The type for Irmin pointers *)

type value = ...
(** The type for Irmin blocks *)

val read: t -> key -> value option
(** [read t k] is the block stored at the location [k] of the
store. It is [None] if no block is available at that location. *)

val add: t -> key -> value -> t
(** [add t k v] is the *new* store storing the block [v] at the
location [k]. *)

Persistent data structures are very efficient to store in memory and on disk as you do not need write barriers, and updates can be written sequentially instead of requiring random access into the data structure.

The Tag Store: Controlled Mutability and Concurrency

So far, we have only discussed purely functional data structures, where updating a structure means returning a pointer to a new structure in the heap that shares most of its contents with the previous one. This style of programming is appealing when implementing complex protocols as it leads to better compositional properties.

Irmin Stores

However, this makes sharing information between processes much more difficult, as you need a way to "inject" the state of one structure into another process's memory. In order to do so, Irmin borrows the concept of branches from Git by relating every operation to a branch name, and modifying the tip of the branch if it has side-effects. The Irmin tag store is the only mutable part of the whole system and is responsible for mapping some global (branch) names to blocks in the block store. These tag names can then be used to pass block references between different processes.

A block store and a tag store can be combined to build a higher-level store (the Irmin store) with fine concurrency control and atomicity guarantees. As mutation happens only in the tag store, we can ensure that as long as a given tag is not updated, no change made in the block store will be visible to anyone. This also gives a nice story for concurrency: as in Git, creating a concurrent view of the store is the straightforward operation of creating a new tag that denotes a new branch. All concurrent operations can then happen on different branches:

type t
(** The type for Irmin store. *)

type tag
(** Mutable tags *)

type key = ...
(** The type for user-defined keys (for instance a list of strings) *)

type value = ...
(** The type for user-defined values *)

val read: t -> ?branch:tag -> key -> value option
(** [read t ?branch k] reads the contents of the key [k] in the branch
[branch] of the store [t]. If no branch is specified, then use the
["HEAD"] one. *)

val update: t -> ?branch:tag -> key -> value -> unit
(** [update t ?branch k v] *updates* the branch [branch] of the store
[t] with the association of the key [k] to the value [v]. *)

Interactions between concurrent processes are completely explicit and need to happen via synchronization points and merge events (more on this below). It is also possible to emulate the behaviour of transactions by recording the sequence of operations (read and update) on a given branch -- that sequence is used before a merge to check that all the operations are valid (i.e. that all reads in the transaction still return the same result on the current tip of the store) and it can be discarded after the merge takes place.
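That transaction emulation can be sketched concretely. This is an illustrative toy, not the Irmin implementation: we log each read together with the value it observed, and before merging we re-check every logged read against the current tip of the store:

```ocaml
(* Toy transaction validation: a branch records its operations, and a
   merge is only valid if every read would still see the same value
   against the current tip. Names and types are invented for this
   example. *)
type op =
  | Read of string * string option   (* key and the value observed *)
  | Update of string * string        (* key and the value written *)

let valid_against ~(tip_read : string -> string option) (log : op list) =
  List.for_all
    (function
      | Read (k, v) -> tip_read k = v  (* the read must still hold *)
      | Update _ -> true)              (* writes never invalidate *)
    log
```

If `valid_against` returns false the merge would change results the client has already seen, so the transaction must be aborted and retried.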

Merging Data Structures

To merge two data structures in a consistent way, one has to compute the sequence of operations which leads, from an initial common state, to two diverging states (the ones that you want to merge). Once these two sequences of operations have been found, they must be combined (if possible) in a sensible way and then applied again back on the initial state, in order to get the new merged state. This mechanism sounds nice, but in practice it has two major drawbacks:

  • It does not specify how we find the initial state from two diverging states -- this is generally not possible (think of diverging counters); and
  • It means we need to compute the sequence of update operations that leads from one state to another. This is easier than finding the common initial state between two branches, but is still generally not very efficient.

In Irmin, we solve these problems using two mechanisms.

First of all, an interesting observation is that we can model the sequence of store tips as a purely functional data structure. We model the partial order of tips as a directed acyclic graph whose nodes are the tips, with an edge between two tips if either (i) one is the result of applying a sequence of updates to the other, or (ii) one is the result of a merge operation between the other and some other tips. Practically speaking, this means that every tip contains the list of its predecessors as well as the actual data associated with it. As the graph is purely functional, we can (and do) store it in an Irmin block store.

Finding a common ancestor

Having a persistent and immutable history is good for various obvious reasons, such as forensic access to past states if an error occurs, and snapshot and rollback features for free. A less obvious but useful property is that we can now find the greatest common ancestors of two data structures without an expensive global search.

The second mechanism is that we require the data structures used in Irmin to be equipped with a well-defined 3-way merge operation, which takes two diverging states and the corresponding initial state (computed using the previous mechanism) and returns either a new state or a conflict (similar to the EAGAIN error that you get when you try to commit a conflicting transaction in more traditional transactional databases). Having access to the common ancestors makes a great difference when designing new merge functions, as usually no modification is required to the data structure itself. In contrast, the conventional approach is more invasive as it requires the data structure to carry more information about the operation history (for instance conflict-free replicated data types, which rely on unbounded vector clocks).

We have thus been designing interesting data structures equipped with a 3-way merge, such as counters, queues and ropes.

This is what the implementation of distributed and mergeable counters looks like:

type t = int
(** distributed counters are just normal integers! *)

let merge ~old t1 t2 = old + (t1-old) + (t2-old)
(** Merging counters means:
   - computing the increments of the two states [t1] and [t2]
     relatively to the initial state [old]; and
   - and add these two increments to [old]. *)
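To see the counter merge in action: suppose two branches fork when the counter is 5; one branch adds 2 (reaching 7) and the other adds 4 (reaching 9). Replaying both increments on top of the common ancestor gives 11, with no conflict:

```ocaml
(* the 3-way merge for counters, as defined above *)
let merge ~old t1 t2 = old + (t1 - old) + (t2 - old)

let () =
  (* two branches fork at 5; one ends at 7 (+2), the other at 9 (+4) *)
  let merged = merge ~old:5 7 9 in
  assert (merged = 11)  (* 5 + 2 + 4 *)
```

Note that this merge never conflicts: any pair of concurrent increments (or decrements) composes, which is exactly why counters make a good first example of a mergeable structure.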

Next steps, how to git at your data

From a design perspective, having access to the history makes it easier to design complex data structures with good compositional properties to use in unikernels. Moreover, as we made few assumptions about how the substrate of the low-level constructs needs to be implemented, the Irmin engine can be ported to many exotic backends such as JavaScript, or anywhere else that Mirage OS runs: this is just a matter of implementing a rather trivial signature.

From a developer perspective, this means that the full history of operations is available to inspect, and that the history model is very similar to the increasingly familiar Git workflow. So similar, in fact, that we've developed a bidirectional mapping between Irmin data structures and the Git format, permitting the git command-line tool to interact with Irmin stores directly.

The next post in our series explains what Dave Scott has been doing with the new version of the Xenstore database that powers every Xen host, where the entire database is stored in a prefix-tree Irmin data-structure and exposed as a Git repository which is live-updated! Here's a sneak preview...


Jul
17
2014

Fitting the modular MirageOS TCP/IP stack together

By Mindy Preston

A critical part of any unikernel is its network stack -- it's difficult to think of a project that needs a cloud platform or runs on a set-top box with no network communications.

Mirage provides a number of module types that abstract interfaces at different layers of the network stack, allowing unikernels to customise their own stack based on their deployment needs. Depending on the abstractions your unikernel uses, you can fulfill these abstract interfaces with implementations ranging from the venerable and much-imitated Unix sockets API to a clean-slate Mirage TCP/IP stack written from the ground up in pure OCaml!

A Mirage unikernel will not use all these interfaces, but will pick those that are appropriate for the particular application at hand. If your unikernel just needs a standard TCP/IP stack, the STACKV4 abstraction will be sufficient. However, if you want more control over the implementation of the different layers in the stack or you don't need TCP support, you might construct your stack by hand using just the NETWORK, ETHIF, IPV4 and UDPV4 interfaces.

How a Stack Looks to a Mirage Application

Mirage provides a high-level interface to a TCP/IP network stack through the module type STACKV4. (Currently this can be included with open V1_LWT; once Mirage 2.0 is released, open V2_LWT will also bring this module type into scope.)

(** Single network stack *)                                                     
module type STACKV4 = STACKV4                                                   
  with type 'a io = 'a Lwt.t                                                    
   and type ('a,'b,'c) config = ('a,'b,'c) stackv4_config                       
   and type ipv4addr = Ipaddr.V4.t                                              
   and type buffer = Cstruct.t 

STACKV4 has useful high-level functions, a subset of which are reproduced below:

    val listen_udpv4 : t -> port:int -> UDPV4.callback -> unit
    val listen_tcpv4 : t -> port:int -> TCPV4.callback -> unit
    val listen : t -> unit io

as well as submodules that include functions for data transmission:

    module UDPV4 :
      sig
        type callback =
            src:ipv4addr -> dst:ipv4addr -> src_port:int -> buffer -> unit io
        val input :
          listeners:(dst_port:int -> callback option) -> t -> ipv4input
        val write :
          ?source_port:int ->
          dest_ip:ipv4addr -> dest_port:int -> t -> buffer -> unit io
    module TCPV4 :
      sig
        type flow
        type callback = flow -> unit io
        val read : flow -> [ `Eof | `Error of error | `Ok of buffer ] io
        val write : flow -> buffer -> unit io
        val close : flow -> unit io
        val create_connection :
          t -> ipv4addr * int -> [ `Error of error | `Ok of flow ] io
        val input : t -> listeners:(int -> callback option) -> ipv4input

These should look rather familiar if you've used the Unix sockets API before, with one notable difference: the stack accepts functional callbacks to react to events such as a new connection request. This permits callers of the library to define the precise data structures that are used to store intermediate state (such as active connections). This becomes important when building very scalable systems that have to deal with lots of concurrent connections efficiently.

Configuring a Stack

The STACKV4 signature shown so far is just a module signature, and you need to find a concrete module that satisfies that signature. The known implementations can be found in the mirage CLI frontend, which provides the configuration API for unikernels. There are currently two implementations of STACKV4: direct and socket.

module STACKV4_direct: CONFIGURABLE with                                        
  type t = console impl * network impl * [`DHCP | `IPV4 of ipv4_config]         
                                                                                
module STACKV4_socket: CONFIGURABLE with                                        
  type t = console impl * Ipaddr.V4.t list  

The socket implementations rely on an underlying OS kernel to provide the transport, network, and data link layers, and therefore can't be used for a Xen guest VM deployment. Currently, the only way to use socket is by configuring your Mirage project for Unix with mirage configure --unix. This is the mode you will most often use when developing high-level application logic that doesn't need to delve into the innards of the network stack (e.g. a REST website).

The direct implementations use the mirage-tcpip implementations of the transport, network, and data link layers. When you use this stack, all the network traffic from the Ethernet level up will be handled in pure OCaml. This means that the direct stack will work with either a Xen guest VM (provided there's a valid network configuration for the unikernel's running environment, of course) or a Unix program if there's a valid tuntap interface. direct works with both mirage configure --xen and mirage configure --unix, as long as there is a corresponding available device when the unikernel is run.

There are a few Mirage functions that provide IPv4 (and UDP/TCP) stack implementations (of type stackv4 impl), usable from your application code. The stackv4 impl is generated in config.ml by some logic set when the program is mirage configure'd - often by matching an environment variable. This means it's easy to flip between different stack implementations when developing an application, just by recompiling it. The config.ml below allows the developer to build socket code with NET=socket make and direct code with NET=direct make.

let main = foreign "Services.Main" (console @-> stackv4 @-> job)

let net =
  try match Sys.getenv "NET" with
    | "direct" -> `Direct
    | "socket" -> `Socket
    | _        -> `Direct
  with Not_found -> `Direct

let dhcp =
  try match Sys.getenv "ADDR" with
    | "dhcp"   -> `Dhcp
    | "static" -> `Static
    | _ -> `Dhcp
  with Not_found -> `Dhcp

let stack console =
  match net, dhcp with
  | `Direct, `Dhcp   -> direct_stackv4_with_dhcp console tap0
  | `Direct, `Static -> direct_stackv4_with_default_ipv4 console tap0
  | `Socket, _       -> socket_stackv4 console [Ipaddr.V4.any]

let () =
  register "services" [
    main $ default_console $ stack default_console
  ]

Moreover, it's possible to configure multiple stacks individually for use in the same program, and to register multiple modules from the same config.ml. This means functions can be written such that they're aware of the network stack they ought to be using, and no other - a far cry from developing network code over most socket interfaces, where it can be quite difficult to separate concerns nicely.

let client = foreign "Unikernel.Client" (console @-> stackv4 @-> job)
let server = foreign "Unikernel.Server" (console @-> stackv4 @-> job) 

let client_netif = (netif "0")
let server_netif = (netif "1") 

let client_stack = direct_stackv4_with_dhcp default_console client_netif
let server_stack = direct_stackv4_with_dhcp default_console server_netif

let () = 
  register "unikernel" [
    main $ default_console $ client_stack;
    server $ default_console $ server_stack 
  ]

Acting on Stacks

Most network applications will either want to listen for incoming connections and respond to that traffic with information, or to connect to some remote host, execute a query, and receive information. STACKV4 offers simple ways to define functions implementing either of these patterns.

Establishing and Communicating Across Connections

STACKV4 offers listen_tcpv4 and listen_udpv4 functions for establishing listeners on specific ports. Both take a stack impl, a named port, and a callback function.

For UDP listeners, which are datagram-based rather than connection-based, callback is a function of the source IP, destination IP, source port, and the Cstruct.t that contains the payload data. Applications that wish to respond to incoming UDP packets with their own UDP responses (e.g., DNS servers) can use this information to construct reply packets and send them with UDPV4.write from within the callback function.
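The shape of this datagram callback pattern can be modelled in plain OCaml. The following is only a sketch of the idea, with strings and ints standing in for the real Cstruct.t and Ipaddr.V4.t types, and invented function names; it is not the mirage-tcpip API:

```ocaml
(* Toy model of the UDP listener pattern: a callback receives the
   source/destination addresses, source port, and payload, and may
   build a reply. All names and types here are illustrative only. *)
type callback =
  src:string -> dst:string -> src_port:int -> payload:string -> string option

let listeners : (int, callback) Hashtbl.t = Hashtbl.create 7

let listen_udpv4 ~port (cb : callback) = Hashtbl.replace listeners port cb

(* Stand-in for the stack delivering an incoming datagram: look up
   the listener registered on the destination port and run it. *)
let deliver ~src ~dst ~src_port ~dst_port ~payload =
  match Hashtbl.find_opt listeners dst_port with
  | None -> None
  | Some cb -> cb ~src ~dst ~src_port ~payload

let () =
  (* An echo-style listener that replies with the request payload. *)
  listen_udpv4 ~port:7 (fun ~src:_ ~dst:_ ~src_port:_ ~payload -> Some payload);
  match deliver ~src:"10.0.0.5" ~dst:"10.0.0.1" ~src_port:40000 ~dst_port:7
          ~payload:"ping" with
  | Some reply -> print_endline reply
  | None -> print_endline "no listener"
```

In the real stack the callback would construct the reply packet and send it with UDPV4.write rather than returning it, but the port-indexed dispatch is the same idea.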

For TCP listeners, callback is a function of TCPV4.flow -> unit Lwt.t. STACKV4.TCPV4 offers read, write, and close on flows for application writers to build higher-level protocols on top of.

TCPV4 also offers create_connection, which allows client application code to establish TCP connections with remote servers. In success cases, create_connection returns a TCPV4.flow, which can be acted on just as the data in a callback above. There's also a polymorphic variant for error conditions, such as an unreachable remote server.
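The success/error convention can be sketched in self-contained OCaml. This is a toy model of the pattern, with invented names and a plain record for the flow; the real mirage-tcpip signatures differ:

```ocaml
(* Toy model of the create_connection convention: success carries a
   flow value, failure is a polymorphic variant. Illustrative only. *)
type flow = { remote : string }

let create_connection ~reachable remote =
  if reachable then `Ok { remote }
  else `Error (`Unknown "no route to host")

let describe = function
  | `Ok flow -> "connected to " ^ flow.remote
  | `Error (`Unknown msg) -> "connection failed: " ^ msg

let () =
  print_endline (describe (create_connection ~reachable:true "10.0.0.2"));
  print_endline (describe (create_connection ~reachable:false "10.0.0.3"))
```

Client code matches on the result, acting on the flow in the `Ok case exactly as a listener's callback would.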

A Simple Example

Some very simple examples of user-level TCP code are included in mirage-tcpip/examples. config.ml is identical to the first configuration example above, and will build a direct stack by default.

Imagine a very simple application - one which simply repeats any data back to the sender, until the sender gets bored and wanders off (RFC 862, for the curious).

open Lwt
open V1_LWT

module Main (C: V1_LWT.CONSOLE) (S: V1_LWT.STACKV4) = struct
  let report_and_close c flow message =
    C.log c message;
    S.TCPV4.close flow

  let rec echo c flow =
    S.TCPV4.read flow >>= fun result ->
    match result with
    | `Eof -> report_and_close c flow "Echo connection closure initiated."
    | `Error e ->
      let message = match e with
        | `Timeout -> "Echo connection timed out; closing.\n"
        | `Refused -> "Echo connection refused; closing.\n"
        | `Unknown s -> Printf.sprintf "Echo connection error: %s\n" s
      in
      report_and_close c flow message
    | `Ok buf ->
      S.TCPV4.write flow buf >>= fun () -> echo c flow

  let start c s = 
    S.listen_tcpv4 s ~port:7 (echo c);
    S.listen s

end

All the application programmer needs to do is define functionality in relation to flow for sending and receiving data, establish this function as a callback with listen_tcpv4, and start a listening thread with listen.

More Complex Uses

An OCaml HTTP server, Cohttp, is currently powering this very blog. A simple static webserver using Cohttp is included in mirage-skeleton.

The OCaml-TLS demonstration server announced here just a few days ago is also running atop Cohttp - source is available on Github.

The future

Mirage's TCP/IP stack is under active development! Some low-level details are still stubbed out, and we're working on implementing some of the trickier corners of TCP, as well as doing automated testing on the stack. We welcome testing tools, bug reports, bug fixes, and new protocol implementations!


Jul 16, 2014

Vchan: Low-latency inter-VM communication channels

By Jon Ludlam

Today's post is an update to Vincent Bernardoff's introducing vchan blog post, updated to use the modern build scheme for Mirage.

Unless you are familiar with Xen's source code, there is little chance that you've ever heard of the vchan library or protocol. Documentation about it is very scarce: a description can be found on vchan's public header file, that I quote here for convenience:

Originally borrowed from the Qubes OS Project, this code (i.e. libvchan) has been substantially rewritten [...] This is a library for inter-domain communication. A standard Xen ring buffer is used, with a datagram-based interface built on top. The grant reference and event channels are shared in XenStore under a user-specified path.

This protocol uses shared memory for inter-domain communication, i.e. between two VMs residing in the same Xen host, and uses Xen's mechanisms -- more specifically, ring buffers and event channels -- in order to achieve its aims. The term datagram-based interface simply means that the interface resembles UDP, although there is support for stream-based communication (like TCP) as well.

The vchan protocol is an important feature in MirageOS 2.0 since it forms the foundational communication mechanism for building distributed clusters of unikernels that cooperate to solve problems that are beyond the power of a single node. Instead of forcing communication between nodes via a conventional wire protocol like TCP, it permits highly efficient low-overhead communication to nodes that are colocated on the same Xen host machine.

Before diving into vchan, I thought I'd also take the opportunity to describe the Ubuntu-Trusty environment for developing and running Xen unikernels.

Installing Xen on Ubuntu

Ubuntu 14.04 has good support for running Xen 4.4, the most recent release (at time of writing). For running VMs it's a good idea to install Ubuntu on an LVM volume rather than directly on a partition, which allows the use of LVs as the virtual disks for your VMs. On my system I have a 40 Gig partition for '/', an 8 Gig swap partition and the rest is free for my VMs:

$ sudo lvs
   LV     VG      Attr      LSize  Pool Origin Data%  Move Log Copy%  Convert
   root   st28-vg -wi-ao--- 37.25g
   swap_1 st28-vg -wi-ao---  7.99g

In this particular walkthrough I won't be using disks, but later posts will. Install Xen via the meta-package. This brings in all you will need to run VMs:

$ sudo apt-get install xen-system-amd64

It used to be necessary to reorder the grub entries to make sure Xen was started by default, but this is no longer the case. Once the machine has rebooted, you should be able to verify you're running virtualized by invoking 'xl':

$ sudo xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  7958     6     r-----       9.7

My machine has 8 Gigs of memory, and this list shows that it's all being used by my dom0, so I'll need to either balloon down dom0 or reboot with a lower maximum memory. Ballooning is the most straightforward:

$ sudo xenstore-write /local/domain/0/memory/target 4096000
$ sudo xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  4000     6     r-----      12.2

This is handy for quick testing, but is discouraged by the Xen folks. So alternatively, change the xen command line by editing /etc/default/grub and add the line:

GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=4096M,max:4096M"

Once again, update-grub and reboot.

Mirage

Now let's get Mirage up and running. Install OCaml and opam, and set up the opam environment:

$ sudo apt-get install ocaml opam ocaml-native-compilers camlp4-extra
...
$ opam init
...
$ eval `opam config env`

Don't forget the ocaml-native-compilers, as without this we can't compile the unikernels. Now we are almost ready to install Mirage; we need two more dependencies, and then we're good to go.

$ sudo apt-get install m4 libxen-dev
$ opam install mirage mirage-xen mirage-unix vchan

Here, m4 is needed for ocamlfind, and libxen-dev is required to compile the Unix variants of the xen-evtchn and xen-gnt libraries. Without these, installing vchan will complain that there is no xen-evtchn.lwt library installed.

This second line installs the various Mirage and vchan libraries, but doesn't build the demo unikernel and Unix CLI. To get them, clone the ocaml-vchan repository:

$ git clone https://github.com/mirage/ocaml-vchan

The demo unikernel is a very straightforward capitalizing echo server. The main function simply consists of

let (>>=) = Lwt.bind

let (>>|=) m f = m >>= function
| `Ok x -> f x
| `Eof -> Lwt.fail (Failure "End of file")
| `Error (`Not_connected state) ->
    Lwt.fail (Failure (Printf.sprintf "Not in a connected state: %s"
      (Sexplib.Sexp.to_string (Node.V.sexp_of_state state))))

let rec echo vch =
  Node.V.read vch >>|= fun input_line ->
  let line = String.uppercase (Cstruct.to_string input_line) in
  let buf = Cstruct.create (String.length line) in
  Cstruct.blit_from_string line 0 buf 0 (String.length line);
  Node.V.write vch buf >>|= fun () ->
  echo vch

where we've defined an error-handling monadic bind (>>|=) which is then used to sequence the read and write operations.

Building the CLI is done simply via make.

$ make
...
$ ls -l node_cli.native
lrwxrwxrwx 1 jludlam jludlam 52 Jul 14 14:56 node_cli.native -> /home/jludlam/ocaml-vchan/_build/cli/node_cli.native

Building the unikernel is done via the mirage tool:

$ cd test
$ mirage configure --xen
...
$ make depend
...
$ make
...
$ ls -l mir-echo.xen echo.xl
-rw-rw-r-- 1 jludlam jludlam     596 Jul 14 14:58 echo.xl
-rwxrwxr-x 1 jludlam jludlam 3803982 Jul 14 14:59 mir-echo.xen

This makes both the unikernel binary (the mir-echo.xen file) and a convenient xl script to run it. To run, we use the xl tool, passing '-c' to connect directly to the console so we can see what's going on:

$ sudo xl create -c echo.xl
Parsing config from echo.xl
kernel.c: Mirage OS!
kernel.c:   start_info: 0x11cd000(VA)
kernel.c:     nr_pages: 0x10000
kernel.c:   shared_inf: 0xdf2f6000(MA)
kernel.c:      pt_base: 0x11d0000(VA)
kernel.c: nr_pt_frames: 0xd
kernel.c:     mfn_list: 0x114d000(VA)
kernel.c:    mod_start: 0x0(VA)
kernel.c:      mod_len: 0
kernel.c:        flags: 0x0
kernel.c:     cmd_line:
x86_setup.c:   stack:      0x144f40-0x944f40
mm.c: MM: Init
x86_mm.c:       _text: 0x0(VA)
x86_mm.c:      _etext: 0xb8eec(VA)
x86_mm.c:    _erodata: 0xde000(VA)
x86_mm.c:      _edata: 0x1336f0(VA)
x86_mm.c: stack start: 0x144f40(VA)
x86_mm.c:        _end: 0x114d000(VA)
x86_mm.c:   start_pfn: 11e0
x86_mm.c:     max_pfn: 10000
x86_mm.c: Mapping memory range 0x1400000 - 0x10000000
x86_mm.c: setting 0x0-0xde000 readonly
x86_mm.c: skipped 0x1000
mm.c: MM: Initialise page allocator for 0x1256000 -> 0x10000000
mm.c: MM: done
x86_mm.c: Pages to allocate for p2m map: 2
x86_mm.c: Used 2 pages for map
x86_mm.c: Demand map pfns at 10001000-2010001000.
Initialising timer interface
Initializing Server domid=0 xs_path=data/vchan
gnttab_stubs.c: gnttab_table mapped at 0x10001000
Server: right_order = 13, left_order = 13
allocate_buffer_locations: gntref = 9
allocate_buffer_locations: gntref = 10
allocate_buffer_locations: gntref = 11
allocate_buffer_locations: gntref = 12
Writing config into the XenStore
Shared page is:

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0d 00 0d 00 02 01 01 00 09 00 00 00 0a 00 00 00
0b 00 00 00 0c 00 00 00
Initialization done!

Vchan is domain-to-domain communication, and relies on Xen's grant tables to share the memory. The entries in the grant tables have domain-level access control, so we need to know the domain ID of the client and server in order to set up the communications. The test unikernel server is hard-coded to talk to domain 0, so we only need to know the domain ID of our echo server. In another terminal,

$ sudo xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0  4095     6     r-----    1602.9
echo                                         2   256     1     -b----       0.0

In this case, the domain ID is 2, so we invoke the CLI as follows:

$ sudo ./node_cli.native 2
Client initializing: Received gntref = 8, evtchn = 4
Mapped the ring shared page:

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0d 00 0d 00 02 01 01 00 09 00 00 00 0a 00 00 00
0b 00 00 00 0c 00 00 00
Correctly bound evtchn number 71

We're now connected via vchan to the Mirage domain. The test server is simply a capitalisation service:

hello from dom0
HELLO FROM DOM0

Ctrl-C to get out of the CLI, and destroy the domain with an xl destroy:

$ sudo xl destroy echo

vchan is a very low-level communication mechanism, and so our next post on this topic will address how to use it in combination with a name resolver to intelligently map connection requests to use vchan if available, and otherwise fall back to normal TCP or TCP+TLS.


Jul 15, 2014

Modular foreign function bindings

By Jeremy Yallop

One of the most frequent questions about MirageOS from developers is "do I really need to write all my code in OCaml"? There are, of course, very good reasons to build the core system in pure OCaml: the module system permits reusing algorithmic abstractions at scale, and OCaml's static type checking makes it possible to enforce lightweight invariants across interfaces. However, it's ultimately necessary to support interfacing to existing code, and this blog post will describe what we're doing to make this possible this without sacrificing the security benefits afforded by unikernels.

A MirageOS application works by abstracting the logic of the application from the details of the platform it is compiled for. The mirage CLI tool parses a configuration file that represents the desired hardware target, which can be a Unix binary or a specialized Xen guest OS. Our foreign function interface design elaborates on these design principles by separating the description of the C foreign functions from how we link to that code. For instance, a Unix unikernel could use the normal ld.so to connect to a shared library, while in Xen we would need to interface to that C library through some other mechanism (for instance, a separate VM could be spawned to run the untrusted OpenSSL code). If you're curious about how this works, this blog post is for you!

Introducing ctypes

ocaml-ctypes ("ctypes" for short) is a library for gluing together OCaml code and C code without writing any C. This post introduces the ctypes library with a couple of simple examples, and outlines how OCaml's module system makes it possible to write high-level bindings to C that are independent of any particular linking mechanism.

Hello, C

Binding a C function using ctypes involves two steps.

  • First, construct an OCaml value that represents the type of the function
  • Second, use the type representation and the function name to resolve and bind the function

For example, here's a binding to C's puts function, which prints a string to standard output and returns the number of characters written:

let puts = foreign "puts" (string @-> returning int)

After the call to foreign the bound function is available to OCaml immediately. Here's a call to puts from the interactive top level:

# puts "Hello, world";;
Hello, world
- : int = 13


Now that we've had a taste of ctypes, let's look at a more realistic example: a program that defines bindings to the expat XML parsing library, then uses them to display the structure of an XML document.

We'll start by describing the types used by expat. Since ctypes represents C types as OCaml values, each of the types we need becomes a value binding in our OCaml program. The parser object involves an incomplete (abstract) struct definition and a typedef for a pointer to a struct:

struct xml_ParserStruct;
typedef xml_ParserStruct *xml_Parser;

In ctypes these become calls to the structure and ptr functions:

let parser_struct : [`XML_ParserStruct] structure typ = structure "xml_ParserStruct"
let xml_Parser = ptr parser_struct

Next, we'll use the type representations to bind some functions. The XML_ParserCreate and XML_ParserFree functions construct and destroy parser objects. As with puts, each function binding involves a simple call to foreign:

let parser_create = foreign "XML_ParserCreate"
  (ptr void @-> returning xml_Parser)
let parser_free = foreign "XML_ParserFree"
  (xml_Parser @-> returning void)

Expat operates primarily through callbacks: when start and end elements are encountered the parser invokes user-registered functions, passing the tag names and attributes (along with a piece of user data):

typedef void (*start_handler)(void *, char *, char **);
typedef void (*end_handler)(void *, char *);

In ctypes function pointer types are built using the funptr function:

let start_handler =
  funptr (ptr void @-> string @-> ptr string @-> returning void)
let end_handler =
  funptr (ptr void @-> string @-> returning void)

We can use the start_handler and end_handler type representations to bind XML_SetElementHandler, the callback-registration function:

let set_element_handler = foreign "XML_SetElementHandler"
  (xml_Parser @-> start_handler @-> end_handler @-> returning void)

The type that OCaml infers for set_element_handler reveals that the function accepts regular OCaml functions as arguments, since the argument types are normal OCaml function types:

val set_element_handler :
  [ `XML_ParserStruct ] structure ptr ->
  (unit ptr -> string -> string ptr -> unit) ->
  (unit ptr -> string -> unit) -> unit

There's one remaining function to bind, then we're ready to use the library. The XML_Parse function performs the actual parsing, invoking the callbacks when tags are encountered:

let parse = foreign "XML_Parse"
  (xml_Parser @-> string @-> int @-> int @-> returning int)

As before, all the functions that we've bound are available for use immediately. We'll start by using them to define a more idiomatic OCaml entry point to the library. The parse_string function accepts the start and end callbacks as labelled arguments, along with a string to parse:

let parse_string ~start_handler ~end_handler s =
  let p = parser_create null in
  let () = set_element_handler p start_handler end_handler in
  let _ = parse p s (String.length s) 1 in
  parser_free p

Using parse_string we can write a program that prints out the names of each element in an XML document, indented according to nesting depth:

let depth = ref 0

let start_handler _ name _ =
  Printf.printf "%*s%s\n" (!depth * 3) "" name;
  incr depth

let end_handler _ _ =
  decr depth

let () =
  parse_string ~start_handler ~end_handler (In_channel.input_all stdin)

The full source of the program is available on github.

Here's the program in action:

$ ocamlfind opt -thread -package core,ctypes.foreign expat_example.ml \
   -linkpkg -cclib -lexpat -o expat_example
$ wget -q http://openmirage.org/blog/atom.xml -O /dev/stdout \
  | ./expat_example
feed
   id
   title
   subtitle
   rights
   updated
   link
   link
   contributor
      email
      uri
      name
[...]

Since this is just a high-level overview we've passed over a number of details. The interested reader can find a more comprehensive introduction to using ctypes in Chapter 19: Foreign Function Interface of Real World OCaml.

Dynamic vs static

Up to this point we've been using a single function, foreign, to make C functions available to OCaml. Although foreign is simple to use, there's quite a lot going on behind the scenes. The two arguments to foreign are used to dynamically construct an OCaml function value that wraps the C function: the name is used to resolve the code for the C function, and the type representation is used to construct a call frame appropriate to the C types involved and to the underlying platform.

The dynamic nature of foreign that makes it convenient for interactive use also makes it unsuitable for some environments. There are three main drawbacks:

  • Binding functions dynamically involves a certain loss of safety: since C libraries typically don't maintain information about the types of the functions they contain, there's no way to check whether the type representation passed to foreign matches the actual type of the C function.

  • Dynamically constructing calls introduces a certain interpretative overhead. In mitigation, this overhead is much less than might be supposed, since much of the work can be done when the function is bound rather than when the call is made, and foreign has been used to bind C functions in performance-sensitive applications without problems.

  • The implementation of foreign uses a low-level library, libffi, to deal with calling conventions across platforms. While libffi is mature and widely supported, it's not appropriate for use in every environment. For example, introducing such a (relatively) large and complex library into Mirage would compromise many of the benefits of writing the rest of the system in OCaml.

Happily, there's a solution at hand. As the introduction hints, foreign is one of a number of binding strategies, and OCaml's module system makes it easy to defer the choice of which strategy to use when writing the actual code. Placing the expat bindings in a functor (parameterised module) makes it possible to abstract over the linking strategy:

module Bindings(F : FOREIGN) =
struct
  let parser_create = F.foreign "XML_ParserCreate"
    (ptr void @-> returning xml_Parser)
  let parser_free = F.foreign "XML_ParserFree"
    (xml_Parser @-> returning void)
  let set_element_handler = F.foreign "XML_SetElementHandler"
    (xml_Parser @-> start_handler @-> end_handler @-> returning void)
  let parse = F.foreign "XML_Parse"
    (xml_Parser @-> string @-> int @-> int @-> returning int)
end

The Bindings module accepts a single parameter of type FOREIGN, which encodes the binding strategy to use. Instantiating Bindings with a module containing the foreign function used above recovers the dynamically-constructed bindings that we've been using so far. However, there are now other possibilities available. In particular, we can instantiate Bindings with code generators that output code to expose the bound functions to OCaml. The actual instantiation is hidden behind a couple of convenient functions, write_c and write_ml, which accept Bindings as a parameter and write to a formatter:

Cstubs.write_c formatter ~prefix:"expat" ~bindings:(module Bindings)
Cstubs.write_ml formatter ~prefix:"expat" ~bindings:(module Bindings)

Generating code in this way eliminates the concerns associated with constructing calls dynamically:

  • The C compiler checks the types of the generated calls against the C headers (the API), so the safety concerns associated with linking directly against the C library binaries (the ABI) don't apply.

  • There's no interpretative overhead, since the generated code is (statically) compiled.

  • The dependency on libffi disappears altogether.

How easy is it in practice to switch between dynamic and static binding strategies? It turns out that it's quite straightforward, even for code that was originally written without parameterisation. Bindings written using early releases of ctypes used the dynamic strategy exclusively, since dynamic binding was then the only option available. The commit logs for projects that switched over to static generation and linking (e.g. ocaml-lz4 and async-ssl) when it became available show that moving to the new approach involved only straightforward and localised changes.

Local vs remote

Generating code is safer than constructing calls dynamically, since it allows the C compiler to check the types of function calls against declarations. However, there are some safety problems that even C's type checking doesn't detect. For instance, the following call is type correct (given suitable definitions of p and q), but is likely to misbehave at run time:

memcpy(p, q, SIZE_MAX)

In contrast, code written purely in OCaml detects and prevents attempts to write beyond the bounds of allocated objects:

# StringLabels.blit ~src ~dst ~src_pos:0 ~dst_pos:0 ~len:max_int;;
Exception: Invalid_argument "String.blit".

It seems a shame to weaken OCaml's safety guarantees by linking in C code that can potentially write to any region of memory, but what is the alternative?

One possibility is to use privilege separation to separate trusted OCaml code from untrusted C functions. The modular design of ctypes means that privilege separation can be treated as one more linking strategy: we can run C code in an entirely separate process (or for Mirage/Xen, in a separate virtual machine), and instantiate Bindings with a strategy that forwards calls to the process using standard inter-process communication. The remote calling strategy is not supported in the current release of ctypes, but it's scheduled for a future version. As with the switch from dynamic to static bindings, we anticipate that updating existing bindings to use cross-process calls will be straightforward.

This introductory post should give you a sense of the power of the unikernel approach in Mirage. By turning the FFI into just another library (for the C interface description) and protocol (for the linkage model), we can use code generation to map application logic onto the privilege model most suitable for the target hardware platform. This starts with Unix processes, continues onto Xen paravirtualization, and could even extend into CHERI fine-grained compartmentalization.

Further examples

Although ctypes is a fairly new library, it's already in use in a number of projects across a variety of domains: graphics, multimedia, compression, cryptography, security, geospatial data, communication, and many others. Further resources (documentation, forums, etc.) are available via the home page.


Jul 14, 2014

OCaml-TLS: the protocol implementation and mitigations to known attacks

By David Kaloper, Hannes Mehnert

This is the fifth in a series of posts that introduce new libraries for a pure OCaml implementation of TLS. You might like to begin with the introduction.

ocaml-tls is the new, clean-slate implementation of TLS in OCaml that we've been working on for the past six months. In this post we try to document some of its internal design, the reasons for the decisions we made, and the current security status of that work. Try our live interactive demonstration server which visualises TLS sessions.

The OCaml-TLS architecture

The OCaml ecosystem has several distinct ways of interacting with the outside world (and the network in particular): straightforward unix interfaces and the asynchronous programming libraries lwt and async. One of the early considerations was not to restrict ourselves to any of those -- we wanted to support them all.

There were also two distinct basic "platforms" we wanted to target from the outset: the case of a simple executable, and the case of Mirage unikernels.

So one of the first questions we faced was deciding how to represent interactions with the network in a portable way. This can be done by systematically abstracting out the API boundary which gives access to network operations, but we had a third thing in mind as well: we wanted to exploit the functional nature of OCaml to its fullest extent!

Our various prior experiences with Haskell and Idris convinced us to adopt a "purely functional" technique. We believe it to be an approach which first forces the programmer to give principled answers to all the difficult design questions (errors and global data-flow) in advance, and then leads to far cleaner and composable code later on. A purely functional system has all the data paths made completely explicit in the form of function arguments and results. There are no unaccounted-for interactions between components mediated by shared state, and all the activity of the parts of the system is exposed through types since, after all, it's only about computing values from values.
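The state-in, state-out style can be illustrated with a tiny self-contained sketch, where every operation consumes a state and returns a fresh one together with its output, and nothing is mutated. This is purely an illustration of the style, not the ocaml-tls types:

```ocaml
(* A minimal pure "session": handling input produces a new state and
   some output; the old state is left untouched and could, in
   principle, still be inspected (though only one state stays in sync
   with the peer). *)
type state = { seq : int }

let handle (st : state) (input : string) : state * string =
  ({ seq = st.seq + 1 }, Printf.sprintf "%d:%s" st.seq input)

let () =
  let st0 = { seq = 0 } in
  let st1, out1 = handle st0 "hello" in
  let _st2, out2 = handle st1 "world" in
  print_endline out1;
  print_endline out2
```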

For these reasons, the library is split into two parts: the directory /lib (and the corresponding findlib package tls) contains the core TLS logic, and /mirage and /lwt (packaged as tls.mirage and tls.lwt respectively) contain front-ends that tie the core to Mirage and Lwt_unix.

Core

The core library is purely functional. A TLS session is represented by the abstract type Tls.Engine.state, and various functions consume this session type together with raw bytes (Cstruct.t -- which is by itself mutable, but ocaml-tls eschews this) and produce new session values and resulting buffers.

The central entry point is handle_tls, which transforms an input state and a buffer to an output state, a (possibly empty) buffer to send to the communication partner, and an optional buffer of data intended to be received by the application:

type state

type ret = [
  | `Ok of [ `Ok of state | `Eof | `Alert of alert ] *
      [ `Response of Cstruct.t ] * [ `Data of Cstruct.t option ]
  | `Fail of alert * [ `Response of Cstruct.t ]
]

val handle_tls : state -> Cstruct.t -> ret

As the signature shows, errors are signalled through the ret type, which is a polymorphic variant. This reflects the actual internal structure: all the errors are represented as values, and operations are composed using an error monad.
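The error-monad style of composition can be sketched in a few lines of self-contained OCaml. The checks and names below are invented for illustration and far simpler than the real record-processing pipeline, but the short-circuiting shape is the same:

```ocaml
(* Each step returns [`Ok v] or [`Fail msg]; the bind short-circuits
   on the first failure, so the happy path reads top to bottom. *)
let ( >>= ) m f = match m with `Ok v -> f v | `Fail _ as e -> e

let check_version v =
  if v = (3, 3) then `Ok () else `Fail "unsupported record version"

let decrypt payload =
  if payload = "" then `Fail "empty record"
  else `Ok (String.uppercase_ascii payload)

let handle_record v payload =
  check_version v >>= fun () ->
  decrypt payload >>= fun plaintext ->
  `Ok plaintext

let () =
  match handle_record (3, 3) "hello" with
  | `Ok s -> print_endline s
  | `Fail e -> print_endline ("failure: " ^ e)
```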

Other entry points share the same basic behaviour: they transform the prior state and input bytes into the later state and output bytes.

Here's a rough outline of what happens in handle_tls:

  • TLS packets consist of a header, which contains the protocol version, length, and content type, and the payload of the given content type. Once inside our main handler, we separate the buffer into TLS records, and process each individually. We first check that the version number is correct, then decrypt, and finally verify the MAC.

  • Decrypted data is then dispatched to one of four sub-protocol handlers (Handshake, Change Cipher Spec, Alert and Application Data). Each handler can return a new handshake state, outgoing data, application data, the new decryption state or an error (with the outgoing data being an interleaved list of buffers and new encryption states).

  • The outgoing buffers and the encryption states are traversed to produce the final output to be sent to the communication partner, and the final encryption, decryption and handshake states are combined into a new overall state which is returned to the caller.

Handshake is (by far) the most complex TLS sub-protocol, with an elaborate state machine. Our client and server encode this state as a "flat" sum type, with exactly one incoming message allowed per state. The handlers first parse the handshake packet (which fails in case of malformed or unknown data) and then dispatch it to the handling function. The handshake state is carried around and a fresh one is returned from the handler in case it needs updates. It consists of a protocol version, the handshake state, configuration, renegotiation data, and possibly a handshake fragment.
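A flat sum type with one admissible message per state can be sketched as follows. The states and messages are invented and vastly simpler than TLS; the point is only that any out-of-order message falls through to an error case:

```ocaml
(* A toy handshake state machine encoded as a flat sum type: each
   state accepts exactly one incoming message, everything else is
   rejected. *)
type hs_state = Await_client_hello | Await_finished | Established

type message = Client_hello | Finished

let handle st msg =
  match st, msg with
  | Await_client_hello, Client_hello -> Ok Await_finished
  | Await_finished, Finished -> Ok Established
  | _ -> Error "unexpected handshake message in this state"

let () =
  match handle Await_client_hello Client_hello with
  | Ok Await_finished -> print_endline "hello accepted"
  | _ -> print_endline "rejected"
```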

The logic of both handshake handlers is very localised, and does not mutate any global data structures.

Core API

OCaml permits the implementation of a module to be exported via a more abstract signature that hides the internal representation details. Our public API for the core library consists of the Tls.Engine and Tls.Config modules.

Tls.Engine contains the basic reactive function handle_tls, mentioned above, which processes incoming data and optionally produces a response, together with several operations that allow one to initiate message transfer like send_application_data (which processes application-level messages for sending), send_close_notify (for sending the ending message) and reneg (which initiates full TLS renegotiation).

The module also contains the only two ways to obtain the initial state:

val client : Config.client -> (state * Cstruct.t)
val server : Config.server -> state

That is, one needs a configuration value to create it. The Cstruct.t that client emits is the initial Client Hello since in TLS, the client starts the session.

Tls.Config synthesizes configurations, separately for client and server endpoints, through the functions client_exn and server_exn. They take a number of parameters that define a TLS session, check them for consistency, and return the sanitized config value which can be used to create a state and, thus, a session. If the check fails, they raise an exception.

The parameters include the pair of a certificate and its private key for the server, and an X509.Authenticator.t for the client, both produced by our ocaml-x509 library and described in a previous article.

This design reflects our attempts to make the API as close to "fire and forget" as we could, given the complexity of TLS: we wanted the library to be relatively straightforward to use, have a minimal API footprint and, above all, fail very early and very loudly when misconfigured.

Effectful front-ends

Clearly, reading and writing network data does change the state of the world. Having a pure value describing the state of a TLS session is not really useful once we write something onto the network; it is certainly not the case that we can use more than one distinct state to process further data, as only one value is in sync with the other endpoint at any given time.

Therefore we wrap the core types into stateful structures loosely inspired by sockets and provide IO operations on those. The structures of mirage and lwt front-ends mirror one another.

In both cases, the structure is pull-based in the sense that no processing is done until the client requires a read, as opposed to a callback-driven design where the client registers a callback and the library starts spinning in a listening loop and invoking it as soon as there is data to be processed. We do this because in an asynchronous context, it is easy to create a callback-driven interface from a demand-driven one, but the opposite is possible only with unbounded buffering of incoming data.
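The asymmetry between the two directions can be made concrete with a toy sketch (the names are hypothetical): wrapping a demand-driven read into a callback loop takes a few lines, whereas going the other way requires buffering an unbounded amount of data between reads.

```ocaml
(* Sketch: deriving a callback-driven interface from a demand-driven
   one.  [read] is a hypothetical pull operation returning [None] at
   end of stream; [callback] is invoked once per chunk. *)
let on_data (read : unit -> string option) (callback : string -> unit) =
  let rec loop () =
    match read () with
    | Some buf -> callback buf; loop ()
    | None -> ()
  in
  loop ()
```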

One exception to demand-driven design is the initial session creation: the library will only yield the connection after the first handshake is over, ensuring the invariant that it is impossible to interact with a connection if it hasn't already been fully established.

Mirage

The Mirage interface matches the FLOW signature (with additional TLS-specific operations). We provide a functor that needs to be applied to an underlying TCP module, to obtain a TLS transport on top. For example:

module Server (Stack: STACKV4) (Entropy: ENTROPY) (KV: KV_RO) =
struct

  module TLS  = Tls_mirage.Make (Stack.TCPV4) (Entropy)
  module X509 = Tls_mirage.X509 (KV) (Clock)

  let accept conf flow =
    TLS.server_of_tcp_flow conf flow >>= function
    | `Ok tls ->
      TLS.read tls >>= function
      | `Ok buf ->
        TLS.write tls buf >>= fun () -> TLS.close tls

  let start stack e kv =
    TLS.attach_entropy e >>= fun () ->
    lwt cert          = X509.certificate kv `Default in
    let conf          = Tls.Config.server_exn ~certificate:cert () in
    Stack.listen_tcpv4 stack 4433 (accept conf) ;
    Stack.listen stack

end

Lwt

The lwt interface has two layers. Tls_lwt.Unix is loosely based on read/write operations from Lwt_unix and provides in-place update of buffers. read, for example, takes a Cstruct.t to write into and returns the number of bytes read. The surrounding module, Tls_lwt, provides a simpler, Lwt_io-compatible API built on top:

let main host port =
  Tls_lwt.rng_init () >>= fun () ->
  lwt authenticator = X509_lwt.authenticator (`Ca_dir nss_trusted_ca_dir) in
  lwt (ic, oc)      = Tls_lwt.connect ~authenticator (host, port) in
  let req = String.concat "\r\n" [
    "GET / HTTP/1.1" ; "Host: " ^ host ; "Connection: close" ; "" ; ""
  ] in
  Lwt_io.(write oc req >>= fun () -> read ic >>= print)

We have further plans to provide wrappers for `Async` and plain `Unix` in a similar vein.

Attacks on TLS

TLS is the most widely deployed security protocol on the Internet and, at over 15 years old, is also showing its age. A flaw in it is therefore a valuable commodity, due to the commercially sensitive nature of the data encrypted with TLS. Vulnerabilities have been found at various layers of TLS: heartbleed and others are implementation specific; advances in cryptanalysis, such as collisions of MD5, lead to vulnerabilities; yet others are due to incorrect usage of TLS (truncation attack or BREACH). Finally, some weaknesses are in the protocol itself. Extensive overviews of attacks on TLS are available.

We now look at protocol-level attacks on TLS and how ocaml-tls implements mitigations against them. The TLS 1.2 RFC provides an overview of attacks and mitigations, and we track our progress in covering them. It is slightly out of date, since the RFC is roughly six years old and more attacks have been published in the meantime, such as the renegotiation flaw.

As already mentioned, we track all our mitigated and open security issues on our GitHub issue tracker.

By choosing OCaml, a memory-managed programming language, we rule out entire bug classes, namely violations of temporal and spatial memory safety.

Cryptanalysis and improvement of computational power weaken some ciphers, such as RC4 and 3DES (see issue 8 and issue 10). If we phase these two ciphers out, there wouldn't be any matching ciphersuite left to communicate with some compliant TLS-1.0 implementations, such as Windows XP, that do not support AES.

Timing attacks

When the timing characteristics between the common case and the error case are different, this might potentially leak confidential information. Timing is a very prominent side-channel and there are a huge variety of timing attacks on different layers, which are observable by different attackers. Small differences in timing behaviour might initially be exploitable only by a local attacker, but advancements to the attack (e.g. increasing the number of tests) might allow a remote attacker to filter the noise and exploit the different timing behaviour.

Timing of cryptographic primitives

We already mentioned cache timing attacks on our AES implementation, and that we use blinding techniques to mitigate RSA timing attacks.

By using a memory managed programming language, we open the attack vector of garbage collector (GC) timing attacks (also mentioned in our nocrypto introduction).

Furthermore, research has been done on virtual machine side channels (l3, cross vm and cache timing), which we will need to study and mitigate appropriately.

For the time being, we suggest not running the stack on a multi-tenant shared host, or on any shared host to which malicious users might have access.

Bleichenbacher

In 1998, Daniel Bleichenbacher discovered a timing flaw in the PKCS1 encoding of the premaster secret: the TLS server failed faster when the padding was wrong than when the decryption failed. Using this timing, an attacker can run an adaptive chosen ciphertext attack and find out the plain text of a PKCS1 encrypted message. In TLS, when RSA is used as the key exchange method, this leads to discovery of the premaster secret, which is used to derive the keys for the current session.

The mitigation is to have both padding and decryption failures use the exact same amount of time, thus there should not be any data-dependent branches or different memory access patterns in the code. We implemented this mitigation in Handshake_server.

Padding oracle and CBC timing

Vaudenay discovered a vulnerability involving block ciphers: if an attacker can distinguish between bad mac and bad padding, recovery of the plaintext is possible (within an adaptive chosen ciphertext attack). Another approach using the same issue is to use timing information instead of separate error messages. Further details are described here.

The countermeasure, which we implement here, is to continue with the mac computation even though the padding is incorrect. Furthermore, we send the same alert (bad_record_mac) independent of whether the padding is malformed or the mac is incorrect.
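In outline, the countermeasure looks like this. The helpers `check_padding` and `compute_mac` are hypothetical; the real code operates on Cstruct buffers inside the record layer.

```ocaml
(* Sketch of the CBC mitigation with hypothetical helpers
   [check_padding] and [compute_mac]: the MAC is computed even when
   the padding is already known to be bad, and both failure modes
   collapse into the single [`Bad_record_mac] alert. *)
let verify ~compute_mac ~check_padding ~expected_mac plaintext =
  let padding_ok = check_padding plaintext in
  let mac = compute_mac plaintext in   (* computed unconditionally *)
  if padding_ok && String.equal mac expected_mac then Ok plaintext
  else Error `Bad_record_mac
```

The point is that neither the control flow nor the alert sent on the wire distinguishes between the two failure causes.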

Lucky 13

An advancement of the CBC timing attack was discovered in 2013, named Lucky 13. Due to the fact that the mac is computed over the plaintext without padding, there is a slight (but measurable) difference in timing between computing the mac of the plaintext and computing the fake mac of the ciphertext. This leaks information. We do not have proper mitigation against Lucky 13 in place yet. You can find further discussion in issue 7 and pull request 49.

Renegotiation not authenticated

In 2009, Marsh Ray published a vulnerability of the TLS protocol which lets an attacker prepend arbitrary data to a session due to unauthenticated renegotiation. The attack exploits the fact that a renegotiation of ciphers and key material is possible within a session, and this renegotiated handshake is not authenticated by the previous handshake. A man in the middle can initiate a session with a server, send some data, and hand over the session to a client. Neither the client nor the server can detect the man in the middle.

A fix for this issue is the secure renegotiation extension, which embeds authenticated data of the previous handshake into the client and server hello messages. Now, if a man in the middle initiates a renegotiation, the server will not complete it due to missing authentication data (the client believes this is the first handshake).

We implement and require the secure renegotiation extension by default, but it is possible to configure ocaml-tls to not require it -- to be able to communicate with servers and clients which do not support this extension.

Implementation of the mitigation is on the server side in ensure_reneg and on the client side in validate_reneg. The data required for the secure renegotiation is stored in `handshake_state` while sending and receiving Finished messages. You can find further discussion in issue 3.

TLS 1.0 and known-plaintext (BEAST)

TLS 1.0 reuses the last ciphertext block as IV in CBC mode. If an attacker has a (partially) known plaintext, she can find the remaining plaintext. This is known as the BEAST attack and there is a long discussion about mitigations. Our mitigation is to prepend each TLS-1.0 application data fragment with an empty fragment to randomize the IV. We do this exactly here. There is further discussion in issue 2.

Our mitigation is slightly different from the 1/(n-1) splitting proposed here: we split every application data frame into a 0 byte and an n byte frame, whereas they split it into a 1 byte and an (n-1) byte frame.
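For comparison, both splitting strategies can be sketched on plain strings (the real code operates on Cstruct fragments inside the engine):

```ocaml
(* 0/n split: each TLS-1.0 application-data payload is emitted as an
   empty fragment followed by the full fragment, so the empty record's
   MAC randomizes the IV for the real data. *)
let split_0_n (data : string) : string list =
  [ ""; data ]

(* The alternative 1/(n-1) split mentioned above. *)
let split_1_n1 (data : string) : string list =
  if String.length data <= 1 then [ data ]
  else [ String.sub data 0 1; String.sub data 1 (String.length data - 1) ]
```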

Researchers have exploited this vulnerability in 2011, although it was known since 2006. TLS versions 1.1 and 1.2 use an explicit IV, instead of reusing the last cipher block on the wire.

Compression and information leakage (CRIME)

When compression is applied to attacker-chosen plaintext before encryption, the ciphertext length can leak information; this is known as CRIME. BREACH furthermore exploits application-layer compression, such as HTTP compression. We mitigate CRIME by not providing any TLS compression support, while we cannot do anything to mitigate BREACH.

Traffic analysis

Because the amount of padding is limited, the actual size of transmitted data can be recovered. The mitigation is to implement length-hiding policies; this is tracked as issue 162.

Version rollback

SSL-2.0 is insecure, and a man in the middle can downgrade the version to SSL-2.0. Our mitigation is that we do not support SSL-2.0, and thus cannot be downgraded. We also check that the version of the client hello matches the first two bytes of the premaster secret here. You can find further discussion in issue 5.
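The premaster-secret check can be sketched as follows (the helper name is hypothetical; the real check lives in the server-side handshake code):

```ocaml
(* Rollback check sketch: the client's offered protocol version is
   echoed in the first two bytes of the RSA premaster secret, and the
   server verifies they match the version from the client hello. *)
let check_premaster_version ~client_hello_version premaster =
  String.length premaster >= 2
  && String.sub premaster 0 2 = client_hello_version
```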

Triple handshake

A vulnerability including session resumption and renegotiation was discovered by the miTLS team, named triple handshake. Mitigations include disallowing renegotiation, disallowing modification of the certificate during renegotiation, or a hello extension. Since we do not support session resumption yet, we have not yet implemented any of the mentioned mitigations. There is further discussion in issue 9.

Alert attack

A fragment of an alert can be sent by a man in the middle during the initial handshake. If the fragment is not cleared once the handshake is finished, the authentication of alerts is broken. This was discovered in 2012; our mitigation is to discard fragmented alerts.

EOF.

Within six months, two hackers managed to develop a clean-slate TLS stack, together with required crypto primitives, ASN.1, and X.509 handling, in a high-level pure language. We interoperate with widely deployed TLS stacks, as shown by our demo server. The code size is nearly two orders of magnitude smaller than OpenSSL, the most widely used open source library (written in C, which a lot of programming languages wrap instead of providing their own TLS implementation). Our code base seems to be robust -- the demo server successfully finished over 22500 sessions in less than a week, with only 11 failing traces.

There is a huge need for high quality TLS implementations, because several TLS implementations suffered this year from severe security problems, such as heartbleed, goto fail, session id, Bleichenbacher, change cipher suite and GCM DoS. The main cause is implementation complexity due to lack of abstraction, and memory safety issues.

We still need to address some security issues, and improve our performance. We invite people to do rigorous code audits (both manual and automated) and try testing our code in their services.

Please be aware that this release is a beta and is missing external code audits. It is not yet intended for use in any security critical applications.

Acknowledgements

Since this is the final post in our series, we would like to thank all people who reported issues so far: Anil Madhavapeddy, Török Edwin, Daniel Bünzli, Andreas Bogk, Gregor Kopf, Graham Steel, Jerome Vouillon, Amir Chaudhry, Ashish Agarwal. Additionally, we want to thank the miTLS team (especially Cedric and Karthikeyan) for fruitful discussions, as well as the OCaml Labs and Mirage teams. And thanks to Peter Sewell and Richard Mortier for funding within the REMS, UCN, and Horizon projects. The software was started in Aftas beach house in Mirleft, Morocco.

Aftas Beach


Posts in this TLS series:


Jul
11
2014

OCaml-TLS: ASN.1 and notation embedding

By David Kaloper

This is the fourth in a series of posts that introduce new libraries for a pure OCaml implementation of TLS. You might like to begin with the introduction.

asn1-combinators is a library that allows one to express ASN.1 grammars directly in OCaml, manipulate them as first-class entities, combine them with one of several ASN encoding rules and use the result to parse or serialize values.

It is the parsing and serialization backend for our X.509 certificate library, which in turn provides certificate handling for ocaml-tls. We wrote about the X.509 certificate handling yesterday.

What is ASN.1, really?

ASN.1 (Abstract Syntax Notation, version one) is a way to describe on-the-wire representation of messages. It is split into two components: a way to describe the content of a message, i.e. a notation for its abstract syntax, and a series of standard encoding rules that define the exact byte representations of those syntaxes. It is defined in ITU-T standards X.680-X.683 and X.690-X.695.

The notation itself contains primitive grammar elements, such as BIT STRING or GeneralizedTime, and constructs that allow for creation of compound grammars from other grammars, like SEQUENCE. The notation is probably best introduced through a real-world example:

-- Simple name bindings
UniqueIdentifier ::= BIT STRING

-- Products
Validity ::= SEQUENCE {
  notBefore Time,
  notAfter  Time
}

-- Sums
Time ::= CHOICE {
  utcTime     UTCTime,
  generalTime GeneralizedTime
}

(Example from RFC 5280, the RFC that describes X.509 certificates which heavily rely on ASN.)

The first definition shows that we can introduce an alias for any existing ASN grammar fragment, in this case the primitive BIT STRING. The second and third definitions are, at least morally, a product and a sum.

At their very core, ASN grammars look roughly like algebraic data types, with a range of pre-defined primitive grammar fragments like BIT STRING, INTEGER, NULL, BOOLEAN or even GeneralizedTime, and a number of combining constructs that can be understood as denoting sums and products.

Definitions such as the above are arranged into named modules. The standard even provides for some abstractive capabilities: initially just a macro facility, and later a form of parameterized interfaces.

To facilitate actual message transfer, a grammar needs to be coupled with an encoding. By far the most relevant ones are Basic Encoding Rules (BER) and Distinguished Encoding Rules (DER), although other encodings exist.

BER and DER are tag-length-value (TLV) encodings, meaning that every value is encoded as a triplet containing a tag that gives the interpretation of its contents, a length field, and the actual contents which can in turn contain other TLV triplets.
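To make the triplet structure concrete, here is a toy TLV encoder covering only the short (single-byte) length form; real BER also needs long-form lengths, high tag numbers, constructed/primitive bits and so on. The tag values 0x01 and 0x30 are the standard ones for BOOLEAN and SEQUENCE.

```ocaml
(* Toy TLV encoder: tag byte, length byte, contents. *)
let tlv ~tag contents =
  let len = String.length contents in
  assert (len < 0x80);   (* short length form only *)
  Printf.sprintf "%c%c%s" (Char.chr tag) (Char.chr len) contents

(* Nesting works by using another TLV triplet as the contents. *)
let boolean b = tlv ~tag:0x01 (if b then "\xff" else "\x00")
let sequence elems = tlv ~tag:0x30 (String.concat "" elems)
```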

Let's drop the time from the example above, as time encoding is a little involved, and assume a simpler version for a moment:

Pair ::= SEQUENCE {
  car Value,
  cdr Value
}

Value ::= CHOICE {
  v_str UTF8String,
  v_int INTEGER
}

Then two possible BER encodings of a Pair ("foo", 42) are:

  30         - SEQUENCE            30         - SEQUENCE
  08         - length              0c         - length
  [ 0c       - UTF8String          [ 2c       - UTF8String, compound
    03       - length                07       - length
    [ 66     - 'f'                   [ 0c     - UTF8String
      6f     - 'o'                     01     - length
      6f ]   - 'o'                     [ 66 ] - 'f'
    02       - INTEGER                 0c     - UTF8String
    01       - length                  02     - length
    [ 2a ] ] - 42                      [ 6f   - 'o'
                                         6f ] - 'o'
                                     02       - INTEGER
                                     01       - length
                                     [ 2a ] ] - 42

The left one is also the only valid DER encoding of this value: BER allows certain freedoms in encoding, while DER is just a BER subset without those freedoms. The property of DER that any value has exactly one encoding is useful, for example, when trying to digitally sign a value.

If this piqued your curiosity about ASN, you might want to take a detour and check out this excellent writeup.

A bit of history

The description above paints a picture of a technology a little like Google's Protocol Buffers or Apache Thrift: a way to declaratively specify the structure of a set of values and derive parsers and serializers, with the addition of multiple concrete representations.

But the devil is in the detail. For instance, the examples above intentionally gloss over the fact that often concrete tag values leak into the grammar specifications for various disambiguation reasons. And ASN has more than 10 different string types, most of which use long-obsolete character encodings. Not to mention that the full standard is close to 200 pages of relatively dense language and quite difficult to follow. In general, ASN seems to have too many features for the relatively simple task it is solving, and its specification has evolved over decades, apparently trying to address various other semi-related problems, such as providing a general Interface Description Language.

Which is to say, ASN is probably not what you are looking for. So why implement it?

Developed in the context of the telecom industry around 30 years ago, modified several times after that and apparently suffering from a lack of a coherent goal, by the early 90s ASN was still probably the only universal, machine- and architecture-independent external data representation.

So it came easily to hand around the time RSA Security started publishing its series of PKCS standards, aimed at the standardization of cryptographic material exchange. RSA keys and digital signatures are often exchanged ASN-encoded.

At roughly the same time, ITU-T started publishing the X.500 series of standards which aimed to provide a comprehensive directory service. Much of this work ended up as LDAP, but one little bit stands out in particular: the X.509 PKI certificate.

So a few years later, when Netscape tried to build an authenticated and confidential layer to tunnel HTTP through, they based it on -- amongst other things -- X.509 certificates. Their work went through several revisions as SSL and was finally standardized as TLS. Modern TLS still requires X.509.

Thus, even though TLS uses ASN only for encoding certificates (and the odd PKCS1 signature), every implementation needs to know how to deal with ASN. In fact, many other general cryptographic libraries also need to deal with ASN, as various PKCS standards mandate ASN as the encoding for exchange of cryptographic material.

The grammar of the grammar

As its name implies, ASN was meant to be used with a specialized compiler. ASN is really a standard for writing down abstract syntaxes, and ASN compilers provided with the target encoding will generate code in your programming language of choice that, when invoked, parses to or serializes from ASN.

As long as your programming language of choice is C, C++, Java or C#, obviously -- there doesn't seem to be one freely available that targets OCaml. In any case, generating code for such a high-level language feels wrong somehow. In its effort to be language-neutral, ASN needs to deal with things like modules, abstraction and composition. At this point, most functional programmers reading this are screaming: "I already have a language that can deal with modules, abstraction and composition perfectly well!"

So we're left with implementing ASN in OCaml.

One strategy is to provide utility functions for parsing elements of ASN and simply invoke them in the appropriate order, as imposed by the target grammar. This amounts to hand-writing the parser and is what TLS libraries in C typically do.

As of release 1.3.7, PolarSSL includes ~7,500 lines of rather beautifully written C that implement a specialized parser for dealing with X.509. OpenSSL's libcrypto contains ~50,000 lines of C in its 'asn1', 'x509' and 'x509v3' directories, and primarily deals with X.509 specifically as required by TLS.

In both cases, low-level control flow is intertwined with the parsing logic and, above the ASN parsing level, the code that deals with interpreting the ASN structure is not particularly concise. It is certainly a far cry from the (relatively) simple grammar description ASN itself provides.

Since in BER every value fully describes itself, another strategy is to parse the input stream without reference to the grammar. This produces a value that belongs to the general type of all ASN-encoded trees, after which we need to process the structure according to the grammar. This is similar to a common treatment of JSON or XML, where one decouples parsing of bytes from the higher-level concerns about the actual structure contained therein. The problem here is that either the downstream client of such a parser needs to constantly re-check whether the parts of the structure it's interacting with are really formed according to the grammar (probably leading to a tedium of pattern-matches), or we have to turn around and solve the parsing problem again, mapping the uni-typed contents of a message to the actual, statically known structure we require the message to have.
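The "general type of all ASN-encoded trees" mentioned above might look like the sketch below (a hypothetical simplification with just three node kinds). Every consumer must then re-check the shape at each use site, which is exactly the tedium of pattern-matches the text describes.

```ocaml
(* Uni-typed ASN tree: parsing produces one of these regardless of
   the grammar the message was supposed to follow. *)
type asn =
  | Int of int
  | Str of string
  | Seq of asn list

(* A downstream client extracting a two-element "Validity"-like
   structure has to handle the ill-formed cases itself, every time. *)
let validity_of_asn = function
  | Seq [ t1; t2 ] -> Some (t1, t2)
  | _ -> None
```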

Surely we can do better?

LAMBDA: The Ultimate Declarative

Again, ASN is a language with a number of built-in primitives, a few combining constructs, (recursive) name-binding and a module system. Our target language is a language with a perfectly good module system and it can certainly express combining constructs. It includes an abstraction mechanism arguably far simpler and easier to use than those of ASN, namely, functions. And the OCaml compilers can already parse OCaml sources. So why not just reuse this machinery?

The idea is familiar. Creating embedded languages for highly declarative descriptions within narrowly defined problem spaces is the staple of functional programming. In particular, combinatory parsing has been known, studied and used for decades.

However, we also have to diverge from traditional parser combinators in two major ways. Firstly, a single grammar expression needs to be able to generate different concrete parsers, corresponding to different ASN encodings. More importantly, we desire our grammar descriptions to act bidirectionally, producing both parsers and complementary deserializers.

The second point severely restricts the signatures we can support. The usual monadic parsers are off the table because an expression such as:

( (pa : a t) >>= fun (a : a) ->
  (pb : b t) >>= fun (b : b) ->
  return (b, b, a) ) : (b * b * a) t

... "hides" parts of the parser inside the closures, especially the method of mapping the parsed values into the output values, and cannot be run "in reverse" [1].

We have a similar problem with applicative functors:

( (fun a b -> (b, b, a))
  <$> (pa : a t)
  <*> (pb : b t) ) : (b * b * a) t

(Given the usual <$> : ('a -> 'b) -> 'a t -> 'b t and <*> : ('a -> 'b) t -> 'a t -> 'b t.) Although the elements of ASN syntax are now exposed, the process of going from intermediate parsing results to the result of the whole is still not accessible.

Fortunately, due to the regular structure of ASN, we don't really need the full expressive power of monadic parsing. The only occurrence of sequential parsing is within SEQUENCE and related constructs, and we don't need look-ahead. All we need to do is provide a few specialized combinators to handle those cases -- combinators the likes of which would be derived in a more typical setting.

So if we imagine we had a few values, like:

val gen_time : gen_time t
val utc_time : utc_time t
val choice   : 'a t -> 'b t -> ('a, 'b) choice t
val sequence : 'a t -> 'b t -> ('a * 'b) t

Assuming appropriate OCaml types gen_time and utc_time that reflect their ASN counterparts, and a simple sum type choice, we could express the Validity grammar above using:

type time = (gen_time, utc_time) choice
let time     : time t          = choice gen_time utc_time
let validity : (time * time) t = sequence time time

In fact, ASN maps quite well to algebraic data types. Its SEQUENCE corresponds to n-ary products and CHOICE to sums. ASN SET is a lot like SEQUENCE, except the elements can come in any order; and SEQUENCE_OF and SET_OF are just lifting an 'a-grammar into an 'a list-grammar.

A small wrinkle is that SEQUENCE allows for more contextual information on its components (so does CHOICE in reality, but we ignore that): elements can carry labels (which are not used for parsing) and can be marked as optional. So instead of working directly on the grammars, our sequence must work on their annotated versions. A second wrinkle is the arity of the sequence combinator.

Thus we introduce the type of annotated grammars, 'a element, which corresponds to one ,-delimited syntactic element in ASN's own SEQUENCE grammar, and the type 'a sequence, which describes the entire contents ({ ... }) of a SEQUENCE definition:

val required : 'a t -> 'a element
val optional : 'a t -> 'a option element
val ( -@ )   : 'a element -> 'b element -> ('a * 'b) sequence
val ( @ )    : 'a element -> 'b sequence -> ('a * 'b) sequence
val sequence : 'a sequence -> 'a t

The following are then equivalent:

Triple ::= SEQUENCE {
  a INTEGER,
  b BOOLEAN,
  c BOOLEAN OPTIONAL
}
let triple : (int * (bool * bool option)) t =
  sequence (
      required int
    @ required bool
   -@ optional bool
  )

We can also re-introduce functions, but in a controlled manner:

val map : ('a -> 'b) -> ('b -> 'a) -> 'a t -> 'b t

Keeping in line with the general theme of bidirectionality, we require functions to come in pairs. The deceptively called map could also be called iso, and comes with a nice property: if the two functions are truly inverses, the serialization process is fully reversible, and so is parsing, under single-representation encodings (DER)!
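A minimal, self-contained model of this bidirectional `map` can be written as a parser/serializer pair of functions; this is a hypothetical sketch of the idea, not the library's actual representation.

```ocaml
(* A grammar as a decode/encode pair; [map f g] applies the iso in
   both directions.  If [f] and [g] are true inverses, a round trip
   through [enc] then [dec] is the identity. *)
type 'a t = { dec : string -> 'a; enc : 'a -> string }

let map f g gr =
  { dec = (fun s -> f (gr.dec s)); enc = (fun v -> gr.enc (g v)) }

let int_gr = { dec = int_of_string; enc = string_of_int }

(* An iso between ints and booleans: 0 <-> false, anything else <-> true. *)
let bool_gr = map (fun i -> i <> 0) (fun b -> if b then 1 else 0) int_gr
```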

ASTs of ASNs

To go that last mile, we should probably also implement what we discussed.

Traditional parser combinators look a little like this:

type 'a p = string -> 'a * string

let bool : bool p = fun str -> (str.[0] <> '\000', tail_of_string str)

Usually, the values inhabiting the parser type are the actual parsing functions, and their composition directly produces larger parsing functions. We would probably need to represent them with 'a p * 'a s, pairs of a parser and its inverse, but the same general idea applies.

Nevertheless, we don't want to do this. The grammars need to support more than one concrete parser/serializer, and composing what is common between them and extracting out what is not would probably turn into a tangled mess. That is one reason. The other is that if we encode the grammar purely as (non-function) value, we can traverse it for various other purposes.

So we turn from what is sometimes called "shallow embedding" to "deep embedding" and try to represent the grammar purely as an algebraic data type.

Let's try to encode the parser for bools, boolean : bool t:

type 'a t =
  | Bool
  ...

let boolean : bool t = Bool

Unfortunately our constructor is fully polymorphic, of type 'a. 'a t. We can constrain it for the users, but once we traverse it there is nothing left to prove its intended association with booleans!

Fortunately, starting with the release of OCaml 4.00.0, OCaml joined the ranks of languages equipped with what is probably the supreme tool of deep embedding, GADTs. Using them, we can do things like:

type _ t =
  | Bool   : bool t
  | Pair   : ('a t * 'b t) -> ('a * 'b) t
  | Choice : ('a t * 'b t) -> ('a, 'b) choice t
  ...

In fact, this is very close to how the library is actually implemented.
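A tiny, self-contained version of the idea shows why the GADT matters: the constructors index the grammar by the OCaml type it describes, so a single traversal can encode and decode in a type-safe way. The `value` "wire format" here is a toy stand-in, not the library's representation.

```ocaml
(* Grammar GADT: each constructor records the OCaml type it denotes. *)
type _ t =
  | Bool : bool t
  | Pair : 'a t * 'b t -> ('a * 'b) t

(* A toy uni-typed wire format to interpret the grammar against. *)
type value = B of bool | P of value * value

let rec encode : type a. a t -> a -> value = fun gr v ->
  match gr, v with
  | Bool, b -> B b
  | Pair (ga, gb), (a, b) -> P (encode ga a, encode gb b)

let rec decode : type a. a t -> value -> a = fun gr v ->
  match gr, v with
  | Bool, B b -> b
  | Pair (ga, gb), P (va, vb) -> (decode ga va, decode gb vb)
  | _, _ -> failwith "grammar mismatch"
```

Matching on the constructor refines the result type in each branch, so `decode (Pair (Bool, Bool))` is statically known to produce a `bool * bool`.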

There is only one thing left to worry about: ASN definitions can be recursive. We might try something like:

let rec list = choice null (pair int list)

But this won't work. Being just trees of applications, our definitions never contain statically constructive parts -- this expression could never terminate in a strict language.

We can get around that by wrapping grammars in Lazy.t (or just closures), but this would be too awkward to use. Like many other similar libraries, we need to provide a fixpoint combinator:

val fix : ('a t -> 'a t) -> 'a t

And get to write:

let list = fix @@ fun list -> choice null (pair int list)

This introduces a small problem. So far we simply reused binding inherited from OCaml without ever worrying about identifiers and references, but with a fixpoint, the grammar encodings need to be able to somehow express a cycle.

Borrowing an idea from higher-order abstract syntax, we can represent the entire fixpoint node using exactly the function provided to define it, re-using OCaml's own binding and identifier resolution:

type _ t =
  | Fix : ('a t -> 'a t) -> 'a t
  ...

This treatment completely sidesteps the problems with variables. We need no binding environments or De Bruijn indices, and need not care about the desired scoping semantics. A little trade-off is that with this simple encoding it becomes more difficult to track cycles (when traversing the AST, if we keep applying a Fix node to itself while descending into it, it looks like an infinite tree), but with a little opportunistic caching it all plays out well [2].

The parser and serializer proper then emerge as interpreters for this little language of typed trees, traversing them with an input string, and parsing it in a fully type-safe manner.

How does it play out?

The entire ASN library comes down to ~1,700 lines of OCaml, with around ~1,100 more in tests, giving a mostly-complete treatment of BER and DER.

Its main use so far is in the context of the X.509 library (discussed yesterday). It allowed the grammar of certificates and RSA keys, together with a number of transformations from the raw types to more pleasant, externally facing ones, to be written in ~900 lines of OCaml. And the code looks a lot like the actual standards the grammars were taken from -- the fragment from the beginning of this article becomes:

let unique_identifier = bit_string_cs

let time =
  map (function `C1 t -> t | `C2 t -> t) (fun t -> `C2 t)
      (choice2 utc_time generalized_time)

let validity =
  sequence2
    (required ~label:"not before" time)
    (required ~label:"not after"  time)

We added ~label to 'a element-forming injections, and have:

val choice2 : 'a t -> 'b t -> [ `C1 of 'a | `C2 of 'b ] t

To get a sense of how the resulting system eases the translation of standardized ASN grammars into working code, it is particularly instructive to compare these two definitions.

Reversibility was a major simplifying factor during development. Since the grammars are traversable, it is easy to generate random inhabitants, encode them, parse the result and check that the round trip returns the original value. This cannot prove that the parsing/serializing pair is correct with respect to ASN, but it gives us a simple way to generate large numbers of test cases and gain confidence that the two are mutually inverse. A number of hand-written cases then check conformance to the actual ASN.

As for security, there were two concerns we were aware of. There is a history of catastrophic buffer overruns in some ASN.1 implementations, but -- assuming our compiler and runtime system are correct -- we are immune to these, since every access is bounds-checked. And there are documented problems with the security of X.509 certificate verification due to overflows of numbers in ASN OID types, which we explicitly guard against.

You can check our security status on our issue tracker.

Footnotes

  1. In fact, the problem with embedding functions in combinator languages, and the fact that in a functional language it is not possible to extract information from a function other than by applying it, was discussed more than a decade ago. Such discussions led to the development of Arrows, amongst other things.

  2. Actually, a version of the library used the more proper encoding to be able to inject results of reducing referred-to parts of the AST into the referring sites directly, roughly like Fix : ('r -> ('a, 'r) t) -> ('a, 'r) t. This approach was abandoned because terms need to be polymorphic in 'r, and this becomes impossible to hide from the user of the library, creating unwelcome noise.




Jul
10
2014
13
0

OCaml-TLS: Adventures in X.509 certificate parsing and validation

By Hannes Mehnert

This is the third in a series of posts that introduce new libraries for a pure OCaml implementation of TLS. You might like to begin with the introduction.

The problem of authentication

The authenticity of the remote server needs to be verified while establishing a secure connection to it; otherwise a man-in-the-middle (MITM) attacker between the client and the server can eavesdrop on the transmitted data. To the best of our knowledge, authentication cannot be done solely in-band, but needs external infrastructure. The most common methods used in practice rely on public-key cryptography.

Web of trust (used by OpenPGP) is a decentralised public key infrastructure. It relies on out-of-band verification of public keys and transitivity of trust. If Bob signed Alice's public key, and Charlie trusts Bob (and signed his public key), then Charlie can trust that Alice's public key is hers.

Public key infrastructure (used by TLS) relies on trust anchors which are communicated out-of-band (e.g. distributed with the client software). In order to authenticate a server, a chain of trust between a trust anchor and the server certificate (public key) is established. Only those clients which have the trust anchor deployed can verify the authenticity of the server.

X.509 public key infrastructure

X.509 is an ITU standard for a public key infrastructure, developed in 1988. Amongst other things, it specifies the format of certificates, their attributes, revocation lists, and a path validation algorithm. X.509 certificates are encoded using abstract syntax notation one (ASN.1).

A certificate contains a public key, a subject (server name), a validity period, a purpose (i.e. key usage), an issuer, and possibly other extensions. All components mentioned in the certificate are signed by an issuer.

A certificate authority (CA) receives a certificate signing request from a server operator. It verifies that this signing request is legitimate (e.g. requested server name is owned by the server operator) and signs the request. The CA certificate must be trusted by all potential clients. A CA can also issue intermediate CA certificates, which are allowed to sign certificates.

When a server certificate or intermediate CA certificate is compromised, the CA publishes this certificate in its certificate revocation list (CRL), which each client should poll periodically.

The following certificates are exchanged before a TLS session:

  • CA -> Client: CA certificate, installed as trust anchor on the client
  • Server -> CA: certificate request, to be signed by the CA
  • CA -> Server: signed server certificate

During the TLS handshake the server sends the certificate chain to the client. When a client wants to verify a certificate, it has to verify the signatures of the entire chain, and find a trust anchor which signed the outermost certificate. Further constraints, such as the maximum chain length and the validity period, are checked as well. Finally, the server name in the server certificate is checked to match the expected identity. For an example, you can see the sequence diagram of the TLS handshake your browser makes when you visit our demonstration server.

Example code for verification

OpenSSL implements RFC5280 path validation, but there is no implementation to validate the identity of a certificate. This has to be implemented by each client, which is rather complex (e.g. in libfetch it spans more than 300 lines). A client of the ocaml-x509 library (such as our http-client) has to write only two lines of code:

lwt authenticator = X509_lwt.authenticator (`Ca_dir ca_cert_dir) in
lwt (ic, oc) =
  Tls_lwt.connect_ext
    (Tls.Config.client_exn ~authenticator ())
    (host, port)

The authenticator uses the default directory where trust anchors are stored ('ca_cert_dir'), and this authenticator is passed to the 'connect_ext' function. This initiates the TLS handshake, and passes the trust anchors and the hostname to the TLS library.

During the client handshake, when the certificate chain is received from the server, the given authenticator and hostname are used to authenticate the certificate chain (in 'validate_chain'):

match
 X509.Authenticator.authenticate ?host:server_name authenticator stack
with
 | `Fail SelfSigned         -> fail Packet.UNKNOWN_CA
 | `Fail NoTrustAnchor      -> fail Packet.UNKNOWN_CA
 | `Fail CertificateExpired -> fail Packet.CERTIFICATE_EXPIRED
 | `Fail _                  -> fail Packet.BAD_CERTIFICATE
 | `Ok                      -> return server_cert

Internally, ocaml-x509 extracts the hostname list from a certificate in 'cert_hostnames', and the wildcard or strict matcher compares it to the input. In total, this is less than 50 lines of pure OCaml code.
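The wildcard rule is small enough to sketch. The helper below is hypothetical, in the spirit of RFC 6125 rather than ocaml-x509's actual matcher: labels are compared case-insensitively, and '*' may only stand for the whole left-most label of the pattern.

```ocaml
(* Hypothetical sketch of RFC 6125-style hostname matching; NOT the
   ocaml-x509 implementation. *)
let hostname_matches ~host pattern =
  let labels s = String.split_on_char '.' (String.lowercase_ascii s) in
  match labels pattern, labels host with
  | "*" :: ps, _ :: hs -> ps <> [] && ps = hs   (* wildcard: one left-most label *)
  | ps, hs             -> ps = hs               (* strict, label-wise match *)
```

So "*.example.com" matches "www.example.com" but neither "example.com" (no label to consume) nor "a.b.example.com" (the wildcard covers exactly one label).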

Problems in X.509 verification

Several weaknesses in the verification of X.509 certificates have been discovered, ranging from cryptographic attacks made practical by collisions in hash algorithms, to misinterpretation of the name in the certificate (a C string is terminated by a null byte), to GnuTLS unconditionally treating X.509 version 1 certificates as trust anchors.

An empirical study of software that does certificate verification showed that badly designed APIs are the root cause of vulnerabilities in this area. The authors tested various implementations using lists of certificates that did not form a chain, were self-signed, or carried a different server name, and which therefore should not have authenticated.

Another recent empirical study (Frankencert) generated random certificates and validated these with various stacks. They found lots of small issues in nearly all certificate verification stacks.

Our implementation mitigates some of the known attacks: we require a complete valid chain, check the extensions of a certificate, and implement hostname checking as specified in RFC6125. We have a test suite with over 3200 tests and multiple CAs. We do not yet discard certificates which use MD5 as their hash algorithm. Our TLS stack requires certificates to have at least 1024-bit RSA keys.

X.509 library internals

The x509 library uses asn-combinators to parse X.509 certificates and the nocrypto library for signature verification (which we wrote about previously). At the moment we do not yet expose certificate builders from the library, but focus on certificate parsing and certificate authentication.

The x509 library provides modules which parse PEM-encoded data (pem), certificates (Cert) and private keys (Pk), as well as an authenticator module (Authenticators).

So far we have two authenticators implemented:

  • 'chain_of_trust', which implements the basic path validation algorithm from RFC5280 (section 6) and the hostname validation from RFC6125. To construct such an authenticator, a timestamp and a list of trust anchors are needed.
  • 'null', which always returns success.

The method 'authenticate', to be called when a certificate stack should be verified, receives an authenticator, a hostname and the certificate stack. It returns either Ok or Fail.

Our certificate type is very similar to the described structure in the RFC:

type tBSCertificate = {
  version    : [ `V1 | `V2 | `V3 ] ;
  serial     : Z.t ;
  signature  : Algorithm.t ;
  issuer     : Name.dn ;
  validity   : Time.t * Time.t ;
  subject    : Name.dn ;
  pk_info    : PK.t ;
  issuer_id  : Cstruct.t option ;
  subject_id : Cstruct.t option ;
  extensions : (bool * Extension.t) list
}

type certificate = {
  tbs_cert       : tBSCertificate ;
  signature_algo : Algorithm.t ;
  signature_val  : Cstruct.t
}

The certificate itself wraps the to-be-signed part ('tBSCertificate'), the signature algorithm used, and the actual signature. The to-be-signed part consists of a version, serial number, issuer, validity, subject, public key information, optional issuer and subject identifiers, and a list of extensions -- only version 3 certificates may have extensions.

The 'certificate' module implements the actual authentication of certificates, and provides some useful getters such as 'cert_type', 'cert_usage', and 'cert_extended_usage'. The main entry for authentication is 'verify_chain_of_trust', which checks correct signatures of the chain, extensions and validity of each certificate, and the hostname of the server certificate.

The grammar of X.509 certificates is developed in the 'asn_grammars' module, and the object identifiers are gathered in the 'registry' module.

Implementation of certificate verification

We provide the function 'valid_cas', which takes a timestamp and a list of certificate authorities. Each certificate authority is checked to be valid, self-signed, correctly signed, and to have proper X.509 v3 extensions. As mentioned above, version 1 and version 2 certificates do not contain extensions. For a version 3 certificate, 'validate_ca_extensions' is called: the basic constraints extension must be present and its value must be true; key usage must be present and the certificate must be allowed to sign certificates. Finally, we reject the certificate if any extension other than these two is marked critical.

When we have a list of validated CA certificates, we can use these to verify the chain of trust, which takes a hostname, a timestamp, a list of trust anchors and a certificate chain as input. It first checks that the server certificate is valid, that the intermediate certificates are valid, and that the chain is complete (the pathlen constraint is not validated) and rooted in a trust anchor. A server certificate is valid if its validity period includes the current timestamp, the given hostname matches its subject alternative name extension or common name (either by wildcard or strict matching, RFC6125), and it does not have a basic constraints extension whose value is true.
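Stripped of the validity, extension and hostname checks, the chain walk itself looks roughly like this (hypothetical record type and an abstract signature check, not the library's actual code): each certificate must name the next one as issuer and be signed by it, and the last link must be signed by a trust anchor.

```ocaml
(* Hypothetical, heavily reduced certificate type for this sketch. *)
type cert = { subject : string ; issuer : string }

(* [signs issuer cert] abstracts the cryptographic signature check.
   Returns true when the chain is linked and rooted in an anchor. *)
let verify_chain ~signs ~anchors chain =
  let rec walk = function
    | []  -> false
    | [c] -> List.exists (fun a -> a.subject = c.issuer && signs a c) anchors
    | c :: (i :: _ as rest) ->
        c.issuer = i.subject && signs i c && walk rest
  in
  walk chain
```

The real 'verify_chain_of_trust' performs this walk while also checking each certificate's validity period and extensions, and the server certificate's hostname.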

Current status of ocaml-x509

We currently support only RSA certificates. We do not check revocation lists or use the online certificate status protocol (OCSP). Our implementation does not handle name constraints and policies. However, if any of these extensions is marked critical, we refuse to validate the chain. To keep our main authentication free of side effects, it currently uses the timestamp from when the authenticator was created rather than from when it is used (this isn't a problem if the lifetime of the OCaml-TLS process is comparatively short: in the worst case, the effective lifetime of the certificates is extended by the lifetime of the process).

We invite people to read through the certificate verification and the ASN.1 parsing. We welcome discussion on the mirage-devel mailing list and bug reports on the GitHub issue tracker.




Jul
9
2014
16
0

OCaml-TLS: building the nocrypto library core

By David Kaloper

This is the second in a series of posts that introduce new libraries for a pure OCaml implementation of TLS. You might like to begin with the introduction.

What is nocrypto?

nocrypto is the small cryptographic library behind the ocaml-tls project. It is built to be straightforward to use, adhere to functional programming principles and able to run in a Xen-based unikernel. Its major use-case is ocaml-tls, which we announced yesterday, but we do intend to provide sufficient features for it to be more widely applicable.

"Wait, you mean you wrote your own crypto library?"

"Never write your own crypto"

Everybody seems to recognize that cryptography is horribly difficult. Building cryptography, it is all too easy to fall off the deep end and end up needing to make decisions only a few, select specialists can make. Worse, any mistake is difficult to uncover but completely compromises the security of the system. Or in Bruce Schneier's words:

Building a secure cryptographic system is easy to do badly, and very difficult to do well. Unfortunately, most people can't tell the difference. In other areas of computer science, functionality serves to differentiate the good from the bad: a good compression algorithm will work better than a bad one; a bad compression program will look worse in feature-comparison charts. Cryptography is different. Just because an encryption program works doesn't mean it is secure.

Obviously, it would be far wiser not to attempt to do this and instead reuse good, proven work done by others. And with the wealth of free cryptographic libraries around, one gets to take their pick.

So to begin with, we turned to cryptokit, the more-or-less standard cryptographic library in the OCaml world. It has a decent coverage of the basics: some stream ciphers (ARC4), some block ciphers (AES, 3DES and Blowfish) the core hashes (MD5, SHA, the SHA2 family and RIPEMD) and the public-key primitives (Diffie-Hellman and RSA). It is also designed with composability in mind, exposing various elements as stream-transforming objects that can be combined on top of one another.

Unfortunately, its API was a little difficult to use. Suppose you have a secret key, an IV and want to use AES-128 in CBC mode to encrypt a bit of data. You do it like this:

let key = "abcd1234abcd1234"
and iv  = "1234abcd1234abcd"
and msg = "fire the missile"

let aes     = new Cryptokit.Block.aes_encrypt key
let aes_cbc = new Cryptokit.Block.cbc_encrypt ~iv aes

let cip =
  let size =
    int_of_float (ceil (float String.(length msg) /. 16.) *. 16.) in
  String.create size

let () = aes_cbc#transform msg 0 cip 0

At this point, cip contains our secret message. This being CBC, both msg and the string the output will be written into (cip) need to have a size that is a multiple of the underlying block size. If they do not, bad things will happen -- silently.

There is also the curious case of hashing-object states:

let md5 = Cryptokit.Hash.md5 ()

let s1 = Cryptokit.hash_string md5 "bacon"
let s2 = Cryptokit.hash_string md5 "bacon"
let s3 = Cryptokit.hash_string md5 "bacon"

(*
  s1 = "x\019%\142\248\198\1822\221\232\204\128\246\189\166/"
  s2 = "'\\F\017\234\172\196\024\142\255\161\145o\142\128\197"
  s3 = "'\\F\017\234\172\196\024\142\255\161\145o\142\128\197"
*)

The error here is to carry a single instantiated hashing object around while trying to compute hashes of distinct strings. But given the convergence after the second step, the semantics of the hashing object still remain unclear to us.

One can fairly easily overcome the API style mismatches by making a few specialized wrappers, of course, except for two major problems:

  • Cryptokit is pervasively stateful. While this is almost certainly a result of performance considerations combined with its goals of ease of compositionality, it directly clashes with the fundamental design property of the TLS library we wanted to use it in: our ocaml-tls library is stateless. We need to be able to represent the state the encryption engine is in as a value.

  • Cryptokit operates on strings. As a primary target of ocaml-tls was Mirage, and Mirage uses separate, non-managed regions of memory to store network data in, we need to be able to handle foreign-allocated storage. This means Bigarray (as exposed by Cstruct), and it seems just plain wrong to negate all the careful zero-copy architecture of the stack below by copying everything into and out of strings.

There are further problems. For example, Cryptokit makes no attempts to combat well-known timing vulnerabilities. It has no support for elliptic curves. And it depends on the system-provided random number generator, which does not exist when running in the context of a unikernel.

At this point, with the de facto choice off the table, it's probably worth thinking about writing OCaml bindings to a rock-solid cryptographic library written in C.

NaCl is a modern, well-regarded crypto implementation, created by a group of pretty famous and equally well-regarded cryptographers, and was the first choice. Or at least its more approachable and packageable fork was, which already had OCaml bindings. Unfortunately, NaCl provides a narrow selection of implementations of various cryptographic primitives, the ones its authors thought were best-of-breed (for example, the only symmetric ciphers it implements are (X-)Salsa and AES in CTR mode). And they are probably right (in some aspects they are certainly right), but NaCl is best used for implementations of newly-designed security protocols. It is simply too opinionated to support an old, standardized behemoth like TLS.

Then there is crypto, the library OpenSSL is built on top of. It is quite famous and provides optimized implementations of a wide range of cryptographic algorithms. It also contains upwards of 200,000 lines of C and a very large API footprint, and it's unclear whether it would be possible to run it in the unikernel context. Recently, the parent project it is embedded in has become highly suspect, with one high-profile vulnerability piling on top of another and at least two forks so far attempting to clean the code base. It just didn't feel like a healthy code base to build a new project on.

There are other free cryptographic libraries in C one could try to bind, but at a certain point we faced the question: is the work required to become intimately familiar with the nuances and the API of an existing code base, and create bindings for it in OCaml, really that much smaller than writing one from scratch? When using a full library one commits to its security decisions and starts depending on its authors' time to keep it up to date -- maybe this effort is better spent in writing one in the first place.

Tantalizingly, the length of the single OCaml source file in Cryptokit is 2260 lines.

Maybe if we made zero decisions ourselves, informed all our work by published literature and research, and wrote the bare minimum of code needed, it might not even be dead-wrong to do it ourselves?

And that is the basic design principle. Do nothing fancy. Do only documented things. Don't write too much code. Keep up to date with security research. Open up and ask people.

The anatomy of a simple crypto library

nocrypto uses bits of C, similarly to other cryptographic libraries written in high-level languages.

This was actually less of a performance concern and more of a security one: for the low-level primitives which are tricky to implement, and for which known, compact and widely used code already exists, the implementation is probably better reused. The major pitfall we hoped to avoid this way is side-channel attacks.

We use public domain (or BSD-licensed) C sources for the simple cores of AES, 3DES, MD5, SHA and SHA2. The impact of errors in this code is constrained: it contains no recursion and performs no allocation, simply filling in a caller-supplied fixed-size buffer with the appropriate bytes.

The block implementations in C have a simple API that requires us to provide the input and output buffers and a key, writing the single encrypted (or decrypted) block of data into the buffer. Like this:

void rijndaelEncrypt(const unsigned long *rk, int nrounds,
  const unsigned char plaintext[16], unsigned char ciphertext[16]);

void rijndaelDecrypt(const unsigned long *rk, int nrounds,
  const unsigned char ciphertext[16], unsigned char plaintext[16]);

The hashes can initialize a provided buffer to serve as an empty accumulator, hash a single chunk of data into that buffer and convert its contents into a digest, which is written into a provided fixed buffer.

In other words, all the memory management happens exclusively in OCaml and all the buffers passed into the C layer are tracked by the garbage collector (GC).

Symmetric ciphers

So far, the only provided ciphers are AES, 3DES and ARC4, with ARC4 implemented purely in OCaml (and provided only for TLS compatibility and for testing).

AES and 3DES are based on core C code, on top of which we built some standard modes of operation in OCaml. At the moment we support ECB, CBC and CTR. There is also a nascent GCM implementation which is, at the time of writing, known not to be optimal and possibly prone to timing attacks, and which we are still working on.

The exposed API strives to be simple and value-oriented. Each mode of each cipher is packaged up as a module with a similar signature, with a pair of functions for encryption and decryption. Each of those essentially takes a key and a byte buffer and yields the resulting byte buffer, minimising hassle.

This is how you encrypt a message:

open Nocrypto.Block

let key = AES.CBC.of_secret Cstruct.(of_string "abcd1234abcd1234")
and iv  = Cstruct.of_string "1234abcd1234abcd"
and msg = Cstruct.of_string "fire the missile"

let { AES.CBC.message ; iv } = AES.CBC.encrypt ~key ~iv msg

The hashes implemented are just MD5, SHA and the SHA2 family. Mirroring the block ciphers, they are based on C cores, with the HMAC construction provided in OCaml. The API is similarly simple: each hash is a separate module with the same signature, providing a function that takes a byte buffer to its digest, together with several stateful operations for incremental computation of digests.

Of special note is that our current set of C sources will probably soon be replaced. AES uses code that is vulnerable to a timing attack, stemming from the fact that substitution tables are loaded into the CPU cache as-needed. The code does not take advantage of the AES-NI instructions present in modern CPUs that allow AES to be hardware-assisted. SHA and SHA2 cores turned out to be (comparatively) ill-performing, and static analysis already uncovered some potential memory issues, so we are looking for better implementations.

Public-key cryptography

Bignum arithmetic is provided by the excellent zarith library, which in turn uses GMP. This might create some portability problems later on, but as GMP is a widely used and well-rounded code base which also includes some of the needed auxiliary number-theoretical functions (a slightly extended Miller-Rabin probabilistic primality test and a fast next-prime-scanning function), it seemed like a much saner choice than redoing it from scratch.

The RSA module provides the basics: raw encryption and decryption, PKCS1-padded versions of the same operations, and PKCS1 signing and signature verification. It can generate RSA keys, which it does simply by finding two large primes, in line with Rivest's own recommendation.
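As a rough illustration of the keygen step, here is a textbook sketch on top of zarith. This is hypothetical code, not nocrypto's implementation: the `rand_z` helper and the fixed public exponent 65537 are assumptions of this sketch, and a real generator must also reject an exponent that is not coprime with the totient.

```ocaml
(* Assemble a random number from 16-bit chunks; the leading 1 forces the
   result to have at least [bits] bits. Sketch only -- not a CSPRNG. *)
let rand_z bits =
  let rec go acc n =
    if n <= 0 then acc
    else go (Z.logor (Z.shift_left acc 16) (Z.of_int (Random.int 65536)))
            (n - 16)
  in
  go Z.one bits

(* Textbook RSA keygen: find two large primes, multiply, and invert the
   public exponent modulo the totient. *)
let generate ~bits =
  let p = Z.nextprime (rand_z (bits / 2))
  and q = Z.nextprime (rand_z (bits / 2)) in
  let n = Z.mul p q in
  let e = Z.of_int 65537 in
  let phi = Z.mul (Z.sub p Z.one) (Z.sub q Z.one) in
  let d = Z.invert e phi in
  ((e, n), (d, n))                      (* public key, private key *)
```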

Notably, RSA implements the standard blinding technique which can mitigate some side-channel attacks, such as timing or acoustic cryptanalysis. It seems to foil even stronger, cache eviction based attacks, but as of now, we are not yet completely sure.
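The blinding idea itself is compact. Below is a textbook sketch with zarith (a hypothetical helper, not nocrypto's code): instead of exponentiating the ciphertext directly, mask it with r^e first and strip r afterwards, so the timing of the exponentiation is decorrelated from the attacker-chosen input. The blinding factor r must be invertible modulo n and is passed in explicitly here.

```ocaml
(* Decrypt [c] as (c * r^e)^d * r^(-1) mod n.  Since (c * r^e)^d =
   m * r (mod n), multiplying by r^(-1) recovers the plaintext m. *)
let blinded_decrypt ~e ~d ~n ~r c =
  let c' = Z.(rem (c * powm r e n) n) in     (* mask:   c * r^e mod n *)
  Z.(rem (powm c' d n * invert r n) n)       (* unmask: (m * r) * r^-1 *)
```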

The Diffie-Hellman module is also relatively basic. We implement some widely recommended checks on the incoming public key to mitigate some possible MITM attacks, the module can generate strong DH groups (using safe primes) with guaranteed large prime-order subgroup, and we provide a catalogue of published DH groups ready for use.
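A minimal sketch of such an incoming-key check using zarith (a hypothetical helper, not nocrypto's API): reject any received public value outside the interval (1, p-1) before computing the shared secret, since those values force the result into a trivial subgroup.

```ocaml
(* [dh_shared ~p ~x gy] is [Some (gy^x mod p)] when the peer's public
   value [gy] lies strictly between 1 and p-1, and [None] otherwise. *)
let dh_shared ~p ~x gy =
  if Z.leq gy Z.one || Z.geq gy (Z.sub p Z.one) then None
  else Some (Z.powm gy x p)
```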

Randomness

Random number generation used to be a chronically overlooked part of cryptographic libraries, so much so that nowadays one of the first questions about a crypto library is, indeed, "Where does it get randomness from?"

It's an important question. A cryptographic system needs unpredictability in many places, and violating this causes catastrophic failures.

nocrypto contains its own implementation of Fortuna. Like Yarrow, Fortuna uses a strong block cipher in CTR mode (AES in our case) to produce the pseudo-random stream, a technique that is considered as unbreakable as the underlying cipher.

The stream is both self-rekeyed, and rekeyed with the entropy gathered into its accumulator pool. Unlike the earlier designs, however, Fortuna is built without entropy estimators, which usually help the PRNG decide when to actually convert the contents of an entropy pool into the new internal state. Instead, Fortuna uses a design where the pools are fed round-robin, but activated with an exponential backoff. There is recent research showing this design is essentially sound: after a state compromise, Fortuna wastes no more than a constant factor of incoming entropy -- whatever the amount of entropy is -- before coming back to an unpredictable state. The resulting design is both simple, and robust in terms of its usage of environmental entropy.

The above paper also suggests a slight improvement to the accumulator regime, yielding a factor-of-2 improvement in entropy usage over the original. We still haven't implemented this, but certainly intend to.
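The round-robin-with-exponential-backoff schedule described above can be sketched on its own: pool i takes part in reseed number k exactly when 2^i divides k, so pool 0 is drained at every reseed, pool 1 at every second one, and so on. This is a sketch of the schedule only, not of our Fortuna implementation.

```ocaml
(* Indices of the (up to 32) pools that participate in reseed number [k]:
   pool i is included iff 2^i divides k. *)
let pools_for_reseed k =
  let rec go i acc =
    if i >= 32 || k mod (1 lsl i) <> 0 then List.rev acc
    else go (i + 1) (i :: acc)
  in
  go 0 []
```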

A PRNG needs to be fed with some actual entropy to be able to produce unpredictable streams. The library itself contains no provisions for doing this and its PRNG needs to be fed by the user before any output can be produced. We are working with the Mirage team on exposing environmental entropy sources and connecting them to our implementation of Fortuna.

Above & beyond

nocrypto is still very small, providing the bare minimum cryptographic services to support TLS and related X.509 certificate operations. One of the goals is to flesh it out a bit, adding some more widely deployed algorithms, in hopes of making it more broadly usable.

There are several specific problems with the library at this stage:

C code - As mentioned, we are seeking to replace some of the C code we use. The hash cores underperform by about a factor of 2 compared to some other implementations. The AES implementation is, on the one hand, vulnerable to a timing attack and, on the other, makes no use of hardware acceleration for this workhorse primitive -- without it we lose about an order of magnitude of performance.

Several options were explored, ranging from looking into the murky waters of OpenSSL and trying to exploit their heavily optimized primitives, to bringing AES-NI into OCaml and redoing AES in OCaml. At this point, it is not clear which path we'll take.

ECC - Looking further, the library still lacks support for elliptic curve cryptography and we have several options for solving this. Since it is used by TLS, ECC is probably the missing feature we will concentrate on first.

Entropy on Xen - The entropy gathering on Xen is incomplete. The current prototype uses the current time as the random seed, and the effort to expose noisier sources like interrupt timings and the RNG from dom0's kernel is still ongoing. Dave Scott, for example, has submitted patches to upstream Xen to make it easier to establish low-bandwidth channels to supply guest VMs with strong entropy from a privileged domain that has access to physical devices and hence to high-quality entropy sources.

GC timing attacks? - There is the question of GC and timing attacks: whether doing cryptography in a high-level language opens up a completely new surface for timing attacks, given that GC runs are very visible in the timing profile. The basic approach is to leave the core routines which we know are potentially timing-sensitive (like AES) and for which we don't have explicit timing mitigations (like RSA) to C, and invoke them atomically from the perspective of the GC. So far, it's an open question whether the constructions built on top of them expose further side-channels.

Still, we believe that the whole package is a pleasant library to work with. Its simplicity contributes to the comparative simplicity of the entire TLS library, and we are actively seeking input on areas that need further improvement. Although we are obviously biased, we believe it is the best cryptographic base library available for this project, and it might be equally suited for your next project too!

We are striving to be open about the current security status of our code. You are free to check out our issue tracker and invited to contribute comments, ideas, and especially audits and code.




Jul
8
2014
15
0

Introducing transport layer security (TLS) in pure OCaml

By Hannes Mehnert, David Kaloper

We announce a beta release of ocaml-tls, a clean-slate implementation of Transport Layer Security (TLS) in OCaml.

What is TLS?

Transport Layer Security (TLS) is probably the most widely deployed security protocol on the Internet. It provides communication privacy to prevent eavesdropping, tampering, and message forgery. Furthermore, it optionally provides authentication of the involved endpoints. TLS is commonly deployed for securing web services (HTTPS), emails, virtual private networks, and wireless networks.

TLS uses asymmetric cryptography to exchange a symmetric key, and optionally authenticate (using X.509) either or both endpoints. It provides algorithmic agility, which means that the key exchange method, symmetric encryption algorithm, and hash algorithm are negotiated.

TLS in OCaml

Our implementation ocaml-tls is already able to interoperate with existing TLS implementations, and supports several important TLS extensions such as server name indication (RFC4366, enabling virtual hosting) and secure renegotiation (RFC5746).

Our demonstration server runs ocaml-tls and renders exchanged TLS messages in nearly real time by receiving a trace of the TLS session setup. If you encounter any problems, please give us feedback.

ocaml-tls and all dependent libraries are available via OPAM (opam install tls). The source is available under a BSD license. We are primarily working towards completeness of protocol features, such as client authentication, session resumption, elliptic curve and GCM cipher suites, and have not yet optimised for performance.

ocaml-tls depends on the following independent libraries: ocaml-nocrypto implements the cryptographic primitives, ocaml-asn1-combinators provides ASN.1 parsers/unparsers, and ocaml-x509 implements the X509 grammar and certificate validation (RFC5280). ocaml-tls implements TLS (1.0, 1.1 and 1.2; RFC2246, RFC4346, RFC5246).

We invite the community to audit and run our code, and we are particularly interested in discussion of our APIs. Please use the mirage-devel mailing list for discussions.

Please be aware that this release is a beta and is missing external code audits. It is not yet intended for use in any security critical applications.

In our issue tracker we transparently document known attacks against TLS and our mitigations (checked and unchecked). We have not yet implemented mitigations against either the Lucky13 timing attack or traffic analysis (e.g. length-hiding padding).

Trusted code base

Because it is designed to run on Mirage, the trusted code base of ocaml-tls is small. It includes the libraries already mentioned: `ocaml-tls`, `ocaml-asn1-combinators`, `ocaml-x509`, and `ocaml-nocrypto` (which uses C implementations of block ciphers and hash algorithms). For the arbitrary precision integers needed in asymmetric cryptography, we rely on `zarith`, which wraps `libgmp`. As the underlying byte array structure we use `cstruct` (which uses OCaml Bigarray as storage).

We should also mention the OCaml runtime, the OCaml compiler, the operating system on which the source is compiled and the binary is executed, and the underlying hardware. Two effectful frontends to the pure TLS core are implemented, dealing with side effects such as reading from and writing to the network: Lwt_unix and Mirage, so applications can run directly as Xen unikernels.

Why a new TLS implementation?

Update: Thanks to the Frama-C team for pointing out that CVE-2014-1266 and CVE-2014-0224 are not memory safety issues but logic errors. This article previously stated otherwise.

There are only a few TLS implementations publicly available and most programming languages bind to OpenSSL, an open source implementation written in C. There are valid reasons to interface with an existing TLS library, rather than developing one from scratch, including protocol complexity and compatibility with different TLS versions and implementations. But from our perspective the disadvantage of most existing libraries is that they are written in C, leading to:

  • Memory safety issues, as recently demonstrated by the Heartbleed and GnuTLS session identifier memory corruption (CVE-2014-3466) bugs;
  • Control flow complexity (Apple's goto fail, CVE-2014-1266);
  • And difficulty in encoding state machines (OpenSSL change cipher suite attack, CVE-2014-0224).

Our main reason for ocaml-tls is that OCaml is a modern functional language that allows concise, declarative descriptions of complex protocol logic and provides type safety and memory safety to help guard against programming errors. Its functional nature is employed extensively in our code: the core of the protocol is written in a purely functional style, without any side effects.
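To make the pure style concrete, here is a toy sketch (the state machine below is invented for illustration; it is not the ocaml-tls engine): the core is a function from a state and an input to a new state plus an output, and all I/O happens outside it.

```ocaml
(* A made-up two-state protocol in the pure style: no side effects,
   just state in, (state, output) out. *)
type state = Expect_hello | Established
type input = Hello | Data of string
type output = Send of string | Drop

let handle state input =
  match state, input with
  | Expect_hello, Hello -> (Established, Send "hello-ack")
  | Established, Data d -> (Established, Send d)
  | _ -> (state, Drop)

(* An effectful frontend (Lwt_unix or Mirage) loops: read input off the
   network, call [handle], and perform whatever output it returns. *)
```

Because `handle` is a plain function, the whole protocol can be exercised in a test harness or a REPL without touching a socket.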

Subsequent blog posts over the coming days will examine in more detail the design and implementation of the four libraries, as well as the security trade-offs and some TLS attacks and our mitigations against them. For now though, we invite you to try out our demonstration server running our stack over HTTPS. We're particularly interested in feedback on our issue tracker about clients that fail to connect, and any queries from anyone reviewing the source code of the constituent libraries.




Jul
8
2014
11
0

MirageOS 1.2 released and the 2.0 runup begins

By Anil Madhavapeddy

Summer is in full swing here in MirageOS HQ with torrential rainstorms, searing sunshine, and our OSCON 2014 talk rapidly approaching in just a few weeks. We've been steadily releasing point releases since the first release back in December, and today's MirageOS 1.2.0 is the last of the 1.x series. The main improvements are usability-oriented:

  • The Mirage frontend tool now generates a Makefile with a make depend target, instead of directly invoking OPAM as part of mirage configure. This greatly improves usability on slow platforms such as ARM, since the output of OPAM as it builds can be inspected more easily. Users will now need to run make depend to ensure they have the latest package set before building their unikernel.

  • Improved formatting of the mirage output, including pretty colours! This makes it easier to distinguish complex unikernel configurations that have lots of deployment options. The generated files are built more verbosely to facilitate debugging, with debug symbols and backtraces enabled by default.

  • Added several device module types, including ENTROPY for random noise, FLOW for stream-oriented connections, and exposed the IPV4 device in the STACKV4 TCP/IP stack type.

  • Significant bugfixes in supporting libraries such as the TCP/IP stack (primarily thanks to Mindy Preston's fuzz testing, which found some good zingers). There are too many library releases to list individually here, but you can browse the changelog for more details.

Towards MirageOS 2.0

We've also been working hard on the MirageOS 2.x series, which introduces a number of new features and usability improvements that emerged from actually using the tools in practical projects. Since there have been so many new contributors recently, Amir Chaudhry is coordinating a series of blog posts in the runup to OSCON that explains the new work in depth. Once the release rush has subsided, we'll be working on integrating these posts into our documentation properly.

The new 2.0 features include the Irmin branch-consistent distributed storage library, the pure OCaml TLS stack, Xen/ARM support and the Conduit I/O subsystem for mapping names to connections. Also included in the blog series are some sample usecases on how these tie together for real applications (as a teaser, here's a video of Xen VMs booting using Irmin thanks to Dave Scott and Thomas Gazagnaire!)

Upcoming talks and tutorials

Richard Mortier and myself will be gallivanting around the world to deliver a few talks this summer:

  • The week of OSCON on July 20th-24th. Please get in touch via the conference website or a direct e-mail, or attend our talk on Thursday morning. There's a Real World OCaml book signing on Tuesday morning for the super keen as well.
  • The ECOOP summer school in beautiful Uppsala in Sweden on Weds 30th July.
  • I'll be presenting the Irmin and Xen integration at Xen Project Developer Summit in Chicago on Aug 18th (as part of LinuxCon North America). Mort and Mindy (no jokes please) will be joining the community panel about GSoC/OPW participation.

As always, if there are any particular topics you would like to see more on, then please comment on the tracking issue or get in touch directly. There will be a lot of releases coming out in the next few weeks (including a beta of the new version of OPAM), so bug reports are very much appreciated for those things that slip past Travis CI!


May
8
2014
18
0

Welcome to the summer MirageOS hackers

By Anil Madhavapeddy

Following our participation in the Google Summer of Code program, we've now finalised selections. We've also got a number of other visitors joining us to hack on Mirage over the summer, so here are the introductions!

  • SSL support: Hannes Mehnert and David Kaloper have been working hard on a safe OCaml TLS implementation. They're going to hack on integrating it all so that it works under Xen, so we can make HTTPS requests (and our Twitter bot will finally be able to tweet!). Both are also interested in formal verification of the result, and several loooong conversations with Peter Sewell will magically transform into machine specifications by summer's end, I'm reliably informed.
  • Cloud APIs: Jyotsna Prakash will spend her summer break as part of Google Summer of Code working on improving cloud provider APIs in OCaml (modelled from her notes on how the GitHub bindings are built). This will let the mirage command-line tool have much more natural integration with remote cloud providers for executing the unikernels straight from a command-line. If you see Jyotsna wandering around aimlessly muttering darkly about HTTP, JSON and REST, then the project is going well.
  • Network Stack fuzzing: Mindy Preston joins us for the summer after her Hacker School stay, courtesy of the OPW program. She's been delving into the network stack running on EC2 and figuring out how to debug issues when the unikernel is running a cloud far, far away (see the post series here: 1, 2, 3, 4).
  • Visualization: Daniel Buenzli returns to Cambridge this summer to continue his work on extremely succinctly named graphics libraries. His Vz, Vg and Gg libraries build a set of primitives for 2D graphics programming. Since the libraries compile to JavaScript, we're planning to use this as the basis for visualization of Mirage applications via a built-in webserver.
  • Modular implicits: Frederic Bour, author of the popular Merlin IDE tool, is also in Cambridge this summer working on adding modular implicits to the core OCaml language. Taking inspiration from modular type classes and Scala's implicits, modular implicits allow functions to take implicit module arguments which will be filled in by the compiler by searching the environment for a module with the appropriate type. This enables ad-hoc polymorphism in a very similar way to Haskell's type classes.
  • Irmin storage algorithms: Benjamin Farinier (from ENS Lyon) and Matthieu Journault (from ENS Cachan) will work on data structures for the Irmin storage system that the next version of Mirage will use. They'll be grabbing copies of the Okasaki classic text and porting some of its data structures into a branch-consistent form.

Of course, work continues apace by the rest of the team as usual, with a steady stream of releases that are building up to some exciting new features. We'll be blogging about ARM support, PVHVM, Irmin storage and SSL integration just as soon as they're pushed into the stable branches. As always, get in touch via the IRC channel (#mirage on Freenode) or the mailing lists with questions.


Feb
25
2014
18
0

MirageOS is in Google Summer of Code 2014

By Anil Madhavapeddy

MirageOS will be part of the Google Summer of Code 2014 program, thanks to the Xen Project's participation! It's been a few years since I've mentored for GSoC, but I still have fond memories of some great projects in the past (such as the legendary Quake testing we were forced to do for hours on end). I've already received a number of queries about this year's program from potential students, so here are a few things to note to become a successful applicant.

  • Students still need to apply and be accepted. Your chances of being selected are much higher if you demonstrate some participation and code contributions (even minor) before submitting an application. Thus, even if you don't have a copy of Xen around, roll up your sleeves and head over to the installation instructions.

  • Contributions do not have to be just code. They can be documentation, help with packaging, wiki posts about a particular use, or test cases to improve code coverage.

  • It's unlikely that we'll get students who are very familiar with both OCaml and Xen (if you are, definitely get in touch with us!). You should therefore look over the project ideas as a set of guidelines rather than firm suggestions. If you have a particular thing you'd like to do with Mirage (for example, work on the JavaScript backend, an IPython interface or a particular protocol implementation such as XMPP), then that's fine. Just get in touch with us on the mailing lists or directly via e-mail, and we can work through it.

  • Under some circumstances, we can provide resources such as a login to a Xen machine, or delegated credits on a cloud provider. Don't let that stop you from applying for a project idea. In general though, it's best to only depend on your own computer resources if practical to do so.


Feb
11
2014
16
0

MirageOS 1.1.0: the eat-your-own-dogfood release

By Anil Madhavapeddy

We've just released MirageOS 1.1.0 into OPAM. Once the live site updates, you should be able to run opam update -u and get the latest version. This release is the "eat our own dogfood" release; as I mentioned earlier in January, a number of the MirageOS developers have decided to shift our own personal homepages onto MirageOS. There's nothing better than using our own tools to find all the little annoyances and shortcomings, and so MirageOS 1.1.0 contains some significant usability and structural improvements for building unikernels.

Functional combinators to build device drivers

MirageOS separates the application logic from the concrete backend in use by writing the application as an OCaml functor that is parameterized over module types that represent the device driver signature. All of the module types used in MirageOS can be browsed in one source file.

In MirageOS 1.1.0, Thomas Gazagnaire implemented a combinator library that makes it easy to separate the definition of application logic from the details of the device drivers that actually execute the code (be it a Unix binary or a dedicated Xen kernel). It lets us write code of this form (taken from mirage-skeleton/block):

let () =
  let main = foreign "Unikernel.Block_test" (console @-> block @-> job) in
  let img = block_of_file "disk.img" in
  register "block_test" [main $ default_console $ img]

In this configuration fragment, our unikernel is defined as a functor over a console and a block device by using console @-> block @-> job. We then define a concrete version of this job by applying the functor (using the $ combinator) to a default console and a file-backed disk image.

The combinator approach lets us express complex assemblies of device driver graphs by writing normal OCaml code, and the mirage command line tool parses this at build-time and generates a main.ml file that has all the functors applied to the right device drivers. Any mismatches in module signatures will result in a build error, thus helping to spot nonsensical combinations (such as using a Unix network socket in a Xen unikernel).
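A rough sketch of how such combinators can be encoded (a simplified toy, not the real mirage `typ`/`impl` machinery; all names here are illustrative): `@->` builds up a typed description of a functor's argument list, and `$` records an application, so the OCaml type checker itself rejects assemblies whose shapes don't match.

```ocaml
(* Toy encoding of the configuration combinators as a GADT. *)
type _ typ =
  | Base : string -> 'a typ                     (* a device type, e.g. console *)
  | Arrow : 'a typ * 'b typ -> ('a -> 'b) typ   (* a functor argument *)

let ( @-> ) a b = Arrow (a, b)

type _ impl =
  | Dev : string * 'a typ -> 'a impl            (* a concrete device *)
  | App : ('a -> 'b) impl * 'a impl -> 'b impl  (* functor application *)

let ( $ ) f x = App (f, x)
let foreign name ty = Dev (name, ty)

(* Walk an assembled job and list the device names it mentions. *)
let rec names : type a. a impl -> string list = function
  | Dev (n, _) -> [ n ]
  | App (f, x) -> names f @ names x

type job
let job : job typ = Base "job"
let console = Base "console"
let block = Base "block"

let main = foreign "Unikernel.Block_test" (console @-> block @-> job)
let block_test = main $ Dev ("default_console", console) $ Dev ("disk.img", block)
```

The real tool additionally checks module signatures and generates the `main.ml` that performs the functor applications; this toy only captures the shape-checking idea.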

This new feature is covered in the tutorial, which now walks you through several skeleton examples to explain all the different deployment scenarios. It's followed by the website tutorial, which explains how this website works and how our Travis autodeployment throws the result onto the public Internet.

Who will win the race to get our website up and running first? Sadly for Anil, Mort is currently in the lead with an all-singing, all-dancing shiny new website. Will he finish in the lead though? Stay tuned!

Less magic in the build

Something that's more behind-the-scenes, but important for easier development, is a simplification in how we build libraries. In MirageOS 1.0, we had several packages that couldn't be installed simultaneously, as they had to be compiled in just the right order to satisfy dependencies.

With MirageOS 1.1.0, this is all a thing of the past. All the libraries can be installed fully in parallel, including the network stack. The 1.1.0 TCP/IP stack is now built in the style of the venerable FoxNet network stack, and is parameterized over its network dependencies. This means that one can quickly assemble a custom network stack from modular components, such as this little fragment below from mirage-skeleton/ethifv4/:

module Main (C: CONSOLE) (N: NETWORK) = struct

  module E = Ethif.Make(N)
  module I = Ipv4.Make(E)
  module U = Udpv4.Make(I)
  module T = Tcpv4.Flow.Make(I)(OS.Time)(Clock)(Random)
  module D = Dhcp_clientv4.Make(C)(OS.Time)(Random)(E)(I)(U)

  (* ... the start function wiring these together is elided ... *)
end

This functor stack starts with a NETWORK (i.e. Ethernet) device, and then applies functors until it ends up with a UDPv4, TCPv4 and DHCPv4 client. See the full file to see how the rest of the logic works, but this serves to illustrate how MirageOS makes it possible to build custom network stacks out of modular components. The functors also make it easier to embed the network stack in non-MirageOS applications, and the tcpip OPAM package installs pre-applied Unix versions for your toplevel convenience.

To show just how powerful the functor approach is, the same stack can also be mapped onto a version that uses kernel sockets simply by abstracting the lower-level components into an equivalent that uses the Unix kernel to provide the same functionality. We explain how to swap between these variants in the tutorials.
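The idea can be sketched in a few lines of plain OCaml (toy signatures, not the real mirage-types ones): the application is a functor over a NETWORK signature, and either a direct stack or a socket-backed stack can be plugged in without touching the application code.

```ocaml
(* Toy illustration of stack swapping via functors. *)
module type NETWORK = sig
  val send : string -> string   (* returns what was put on the wire *)
end

module Direct_stack : NETWORK = struct
  let send payload = "ethif:" ^ payload    (* stand-in for the OCaml TCP/IP path *)
end

module Socket_stack : NETWORK = struct
  let send payload = "socket:" ^ payload   (* stand-in for kernel sockets *)
end

(* The application only ever sees the NETWORK signature. *)
module App (N : NETWORK) = struct
  let run () = N.send "hello"
end

module Direct_app = App (Direct_stack)
module Socket_app = App (Socket_stack)
```

Because the application is checked against the signature, not an implementation, a mismatched stack fails at compile time rather than at boot.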

Lots of library releases

While doing the 1.1.0 release in January, we've also released quite a few libraries into OPAM. Here are some of the highlights.

Low-level libraries:

  • mstruct is a streaming layer for handling lists of memory buffers with a simpler read/write interface.
  • nbd is an implementation of the Network Block Device protocol for block drivers.

Networking and web libraries:

  • ipaddr now has IPv6 parsing support thanks to Hugo Heuzard and David Sheets. This is probably the hardest bit of adding IPv6 support to our network stack!
  • cowabloga is slowly emerging as a library to handle the details of rendering Zurb Foundation websites. It's still in active development, but is already used for a few of our personal websites as well as this one.
  • cohttp has had several releases thanks to external contributions, particularly from Rudy Grinberg, who added s-expression support and several other improvements.
  • uri features performance improvements and the elimination of Scanf (considered rather slow by OCaml standards).
  • cow continues its impossible push to make coding HTML and CSS a pleasant experience, with better support for Markdown now.
  • The github bindings are now also in use as part of an experiment to make upstream OCaml development easier for newcomers, thanks to Gabriel Scherer.

Dave Scott led the splitting up of several low-level Xen libraries as part of the build simplification. These now compile on both Xen (using the direct hypercall interface) and Unix (using the dom0 /dev devices) where possible.

  • xen-evtchn for the event notification mechanism. There are a couple of wiki posts that explain how event channels and suspend/resume work in MirageOS/Xen guests.
  • xen-gnt for the grant table mechanism that controls inter-process memory.
  • The io-page library no longer needs Unix and Xen variants, as the interface has been standardized to work in both.

All of Dave's hacking on Xen device drivers is showcased in this xen-disk wiki post that explains how you can synthesize your own virtual disk backends using MirageOS. Xen uses a split device model, and now MirageOS lets us build backend device drivers that service VMs as well as the frontends!

Last, but not least, Thomas Gazagnaire has been building a brand new storage system for MirageOS guests that uses git-style branches under the hood to help coordinate clusters of unikernels. We'll talk about how this works in a future update, but there are some cool libraries and prototypes available on OPAM for the curious.

  • lazy-trie is a lazy version of the Trie data structure, useful for exposing Git graphs.
  • git is a now-fairly complete implementation of the Git protocol in pure OCaml, which can interoperate with normal Git servers via the ogit command-line tool that it installs.
  • irmin is the main library that abstracts Git DAGs into an OCaml programming API. The homepage has instructions on how to play with the command-line frontend to experiment with the database.
  • git2fat converts a Git checkout into a FAT block image, useful when bundling up unikernels.

We'd also like to thank several conference organizers for giving us the opportunity to demonstrate MirageOS. The talk video from QCon SF is now live, and we also had a great time at FOSDEM recently (summarized by Amir here). So lots of activities, and no doubt little bugs lurking in places (particularly around installation). As always, please do let us know of any problem by reporting bugs, or feel free to contact us via our e-mail lists or IRC. Next stop: our unikernel homepages!


Jan
3
2014
16
0

Presenting Decks

By Richard Mortier

A few months ago, partly as a stunt, mostly because we could, Anil and I put together a presentation for OSCON'13 about Mirage in Mirage. That is, as a self-hosting Mirage web application that served up slides using RevealJS. It was a bit of a hack, but it was cool (we thought!) and it worked. Several more presentations were written and given this way, at venues ranging from the XenSummit 2013 to ACM FOCI 2013 to the Cambridge Computer Lab's MSc in Advanced Computer Science.

With the release of Mirage 1.0, CoHTTP, Cowabloga and the new Zurb Foundation based website, it was time to refresh them and as a little seasonal gift, give them a shiny new index with some actual CSS styling. So here they are, a set of presentations that have been given by various members of the Mirage team over the last 6 months or so. They cover a range of topics, from general introductions to the Xen roadmap to more detailed technical background. And, of course, as Mirage is under constant rapid development, some of the older content may already be outdated. But the code for the site itself serves as another example of a simple -- somewhat simpler than the Mirage website in fact -- Mirage web application.


Dec
19
2013
23
0

MirageOS 1.0.3 released; tutorial on building this website available

By Anil Madhavapeddy

We've had a lot of people trying out MirageOS since the 1.0 release last week, and so we've been steadily cutting point releases and new libraries to OPAM as they're done. The most common build error by far has been people using outdated OPAM packages. Do make sure that you have at least OPAM 1.1 installed, and that you've run opam update -u to get the latest package lists from the package repository.

MirageOS 1.0.3 improves Xen configuration generation, cleans up HTTP support, and adds support for FAT filesystems. Here are some of the libraries we've released this week to go along with it:

  • mirage-www (update): the live website now runs on the 1.0 tools. Explanation of how to build it in various configurations is available here.
  • alcotest (new): a lightweight and colourful test framework built over oUnit. The interface is simpler to facilitate writing tests quickly, and it formats test results nicely.
  • mirage-block-xen.1.0.0 (new): the stable release of the Xen Blkfront driver for block devices. The library supports both frontend and backend operation, but only the frontend is plumbed through to Mirage for now (although the backend can be manually configured).
  • mirage-block-unix.1.2.0 (update): fixed some concurrency bugs and added support for buffered I/O to improve performance.
  • fat-filesystem.0.10.0 (update): copes with more sector sizes, uses buffered I/O on Unix, and adds a KV_RO key/value interface as well as a more complicated filesystem one.
  • mirage-fs-unix.1.0.0 (update): implements the KV_RO signature as a passthrough to a Unix filesystem. This is convenient during development to avoid recompile cycles while changing data.
  • mirage-xen.1.0.0 (update): removed several distracting-but-harmless linker warnings about code bloat.
  • cohttp.0.9.14 (update): supports Server-Side Events via better channel flushing, has a complete set of HTTP codes autogenerated from httpstatus.es and exposes a platform-independent Cohttp_lwt module.
  • cow.0.8.1 (update): switch to the Omd library for Markdown parsing, which is significantly more compatible with other parsers.
  • ezjsonm.0.2.0 (new): a combinator library to parse, select and manipulate JSON structures.
  • ezxmlm.1.0.0 (new): a combinator library to parse, select and transform XML tags and attributes.
  • mirage-http-xen and mirage-http-unix provide the HTTP drivers on top of Cohttp for MirageOS. Although they are very similar at the moment, they will diverge as the Unix variant gains options to use kernel sockets instead of only the network stack.

We're making great progress on moving our personal homepages over to MirageOS. The first two introductory wiki posts are also now available:

  • Building a hello world example takes you through the basic steps to build a Unix and Xen binary.
  • Building the MirageOS website lets you build this website with several variants, demonstrating the Unix passthrough filesystem, the OCaml FAT filesystem library, and how to attach a network stack to your application.

As always, please feel free to report any issues via the bug tracker and ask questions on the mailing list.


Dec
9
2013
12
0

MirageOS 1.0: not just a hallucination!

By Anil Madhavapeddy

First: read the overview and technical background behind the project.

When we started hacking on MirageOS back in 2009, it started off looking like a conventional OS, except written in OCaml. The monolithic repository contained all the libraries and boot code, and exposed a big OS module for applications to use. We used this to do several fun tutorials at conferences such as ICFP/CUFP and get early feedback.

As development continued though, we started to understand what it is we were building: a "library operating system". As the number of libraries grew, putting everything into one repository just wasn't scaling, and it made it hard to work with third-party code. We spent some time developing tools to make Mirage fit into the broader OCaml ecosystem.

Three key things have emerged from this effort:

  • OPAM, a source-based package manager for OCaml. It supports multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow. Since releasing 1.0 in March 2013 and 1.1 in October, the community has leapt in to contribute over 1800 packages in this short time. All of the Mirage libraries are now tracked using it, including the Xen libraries.
  • The build system for embedded programming (such as the Xen target) is a difficult one to get right. After several experiments, Mirage provides a single command-line tool that combines configuration directives (also written in OCaml) with OPAM to make building Xen unikernels as easy as Unix binaries.
  • All of the Mirage-compatible libraries satisfy a set of module type signatures in a single file. This is where Mirage lives up to its name: we've gone from the early monolithic repository to a single, standalone interface file that describes the interfaces. Of course, we also have libraries to go along with this signature, and they all live in the MirageOS GitHub organization.

With these components, I'm excited to announce that MirageOS 1.0 is finally ready to see the light of day! Since it consists of so many libraries, we've decided not to have a "big bang" release where we dump fifty complex libraries on the open-source community. Instead, we're going to spend the month of December writing a series of blog posts that explain how the core components work, leading up to several use cases:

  • The development team have all decided to shift our personal homepages to be Mirage kernels running on Xen as a little Christmas present to ourselves, so we'll work through that step-by-step how to build a dedicated unikernel and maintain and deploy it (spoiler: see this repo). This will culminate in a webservice that our colleagues at Horizon have been building using Android apps and an HTTP backend.
  • The XenServer crew at Citrix are using Mirage to build custom middlebox VMs such as block device caches.
  • For teaching purposes, the Cambridge Computer Lab team want a JavaScript backend, so we'll explain how to port Mirage to this target (which is rather different from either Unix or Xen, and serves to illustrate the portability of our approach).

How to get involved

Bear with us while we update all the documentation and start the blog posts off today (the final libraries for the 1.0 release are all being merged into OPAM while I write this, and the usually excellent Travis continuous integration system is down due to a bug on their side). I'll edit this post to contain links to the future posts as they happen.

Since we're now also a proud Xen and Linux Foundation incubator project, our mailing list is shifting to mirageos-devel@lists.xenproject.org, and we very much welcome comments and feedback on our efforts over there. The #mirage channel on FreeNode IRC is also growing increasingly popular, as is simply reporting issues on the main Mirage GitHub repository.

Several people have also commented that they want to learn OCaml properly to start using Mirage. I've just co-published an O'Reilly book called Real World OCaml that's available for free online and also as hardcopy/ebook. Our Cambridge colleague John Whittington has also written an excellent introductory text, and you can generally find more resources online. Feel free to ask beginner OCaml questions on our mailing lists and we'll help as best we can!


Aug
23
2013
17
43

Introducing vchan

By Vincent Bernardoff

Editor: Note that some of the toolchain details of this blog post are now out-of-date with Mirage 1.1, so we will update this shortly.

Unless you are familiar with Xen's source code, there is little chance that you've ever heard of the vchan library or protocol. Documentation about it is very scarce: a description can be found in vchan's public header file, which I quote here for convenience:

Originally borrowed from the Qubes OS Project, this code (i.e. libvchan) has been substantially rewritten [...] This is a library for inter-domain communication. A standard Xen ring buffer is used, with a datagram-based interface built on top. The grant reference and event channels are shared in XenStore under a user-specified path.

This protocol uses shared memory for inter-domain communication, i.e. between two VMs residing on the same Xen host, and uses Xen's mechanisms -- more specifically, ring buffers and event channels -- in order to achieve its aims. A datagram-based interface simply means that the interface resembles UDP, although there is also support for stream-based communication (like TCP).
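For a feel of the underlying mechanism, here is a toy single-producer ring buffer in plain OCaml (real vchan rings live in shared memory pages and pair the read/write cursors with Xen event channel notifications; this sketch only shows the data shape):

```ocaml
(* Toy fixed-size ring buffer with wrap-around cursors. *)
type ring = {
  buf : bytes;
  mutable rd : int;   (* read cursor *)
  mutable wr : int;   (* write cursor *)
}

let create n = { buf = Bytes.create n; rd = 0; wr = 0 }
let size r = Bytes.length r.buf
let used r = (r.wr - r.rd + size r) mod size r
let free r = size r - 1 - used r   (* one slot kept empty to tell full from empty *)

(* Copy as much of [s] as fits; return how many bytes were written. *)
let write r s =
  let n = min (String.length s) (free r) in
  for i = 0 to n - 1 do
    Bytes.set r.buf ((r.wr + i) mod size r) s.[i]
  done;
  r.wr <- (r.wr + n) mod size r;
  n

(* Read up to [n] bytes from the ring. *)
let read r n =
  let n = min n (used r) in
  let out = Bytes.create n in
  for i = 0 to n - 1 do
    Bytes.set out i (Bytes.get r.buf ((r.rd + i) mod size r))
  done;
  r.rd <- (r.rd + n) mod size r;
  Bytes.to_string out
```

In the real protocol the two sides share the ring via a grant reference and signal "data available" over an event channel instead of polling the cursors.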

Over the last two months or so, I worked on a pure OCaml implementation of this library, meaning that Mirage-based unikernels can now take full advantage of vchan to communicate with neighboring VMs! If your endpoint -- a Linux VM or another unikernel -- is on the same host, it is much faster and more efficient to use vchan than the network stack, although it is currently incompatible with existing programs written against the socket library under UNIX or the Flow module of Mirage (this will improve). It also provides a higher level of security compared to network sockets, as messages never leave the host's shared memory.

Building the vchan echo domain

Provided that you have a Xen-enabled machine, do the following from dom0:

    opam install mirari mirage-xen mirage vchan

This will install the library and its dependencies. mirari is necessary to build the echo unikernel:

    git clone git://github.com/mirage/ocaml-vchan
    cd test
    mirari configure --xen --no-install
    mirari build --xen
    sudo mirari run --xen

This will boot a vchan echo domain for dom0, with connection parameters stored in xenstore at /local/domain/<domid>/data/vchan, where <domid> is the domain id of the vchan echo domain. The echo domain is simply a unikernel hosting a vchan server that accepts connections from dom0 and echoes everything sent to it.

The command xl list will give you the domain id of the echo server.

Building the vchan CLI from Xen's sources

You can try it using a vchan client that can be found in Xen's sources at tools/libvchan: Just type make in this directory. It will compile the executable vchan-node2 that you can use to connect to our freshly created echo domain:

    ./vchan-node2 client <domid> /local/domain/<domid>/data/vchan

If everything goes well, what you type in there will be echoed.

You can obtain the full API documentation for ocaml-vchan by doing a cd ocaml-vchan && make doc. If you are doing network programming under UNIX, vchan's interface will not surprise you. If you are already using vchan for a C project, you will see that the OCaml API is nearly identical to what you are used to.

Please let us know if you use or plan to use this library in any way! If you need tremendous speed or extra security, this might fit your needs.


Aug
8
2013
16
0

MirageOS travels to OSCON'13: a trip report

By Richard Mortier

Now that Mirage OS is rapidly converging on a Developer Preview Release 1, we took it for a first public outing at OSCON'13, the O'Reilly Open Source Conference. OSCON is in its 15th year now, and is a meeting place for developers, business people and investors. It was a great opportunity to show MirageOS off to some of the movers and shakers in the OSS world.

Partly because MirageOS is about synthesising extremely specialised guest kernels from high-level code, and partly because both Anil and I are constitutionally incapable of taking the easy way out, we self-hosted the slide deck on Mirage: after some last-minute hacking -- on content not Mirage I should add! -- we built a self-contained unikernel of the talk.

This was what you might call a "full stack" presentation: the custom unikernel (flawlessly!) ran a type-safe network device driver, OCaml TCP/IP stack supporting an OCaml HTTP framework that served slides rendered using reveal.js. The slide deck, including the turbo-boosted screencast of the slide deck compilation, is hosted as another MirageOS virtual machine at decks.openmirage.org. We hope to add more slide decks there soon, including resurrecting the tutorial! The source code for all this is in the mirage-decks GitHub repo.

The Talk

The talk went down pretty well -- given we were in a graveyard slot on Friday after many people had left, attendance was fairly high (around 30-40), and the feedback scores have been positive (averaging 4.7/5) with comments including "excellent content and well done" and "one of the most excited projects I heard about" (though we are suspicious that just refers to Anil's usual high-energy presentation style...).

Probably the most interesting chat after the talk was with the Rust authors at Mozilla (@pcwalton and @brson) about combining the Mirage unikernel techniques with the Rust runtime. But perhaps the most surprising feedback was when Anil and I were stopped in the street while walking back from some well-earned sushi, by a cyclist who loudly declared that he'd really enjoyed the talk and thought it was a really exciting project -- never done something that achieved public acclaim from the streets before :)

Book Signing and Xen.org

Anil also took some time to sit in a book signing for his forthcoming Real World OCaml O'Reilly book. This is really important to making OCaml easier to learn, especially given that all the Mirage libraries are using it. Most of the dev team (and especially thanks to Heidi Howard who bravely worked through really early alpha revisions) have been giving us feedback as the book is written, using the online commenting system.

The Xen.org booth was also huge, and we spent quite a while plotting the forthcoming Mirage/Xen/ARM backend. We're pretty much just waiting for the Cubieboard2 kernel patches to be upstreamed (keep an eye here) so that we can boot Xen/ARM VMs on tiny ARM devices. There's a full report about this on the xen.org blog post about OSCon.

Galois and HalVM

We also stopped by the Galois booth to chat with Adam Wick, who leads the HalVM project at Galois. This is a similar project to Mirage but, since it's written in Haskell, it has more of a focus on elegant compositional semantics than on the more brutal performance and predictability that Mirage currently has at its lower levels.

The future of all this ultimately lies in making it easier for these multi-lingual unikernels to be managed and for all of them to communicate more easily, so we chatted about code sharing and common protocols (such as vchan) to help interoperability. Expect to see more of this once our respective implementations get more stable.

All-in-all OSCON'13 was a fun event and definitely one that we look forward to returning to with a more mature version of MirageOS, to build on the momentum begun this year! Portland was an amazing host city too, but what happens in Portland, stays in Portland...


Jul
18
2013
11
20

Creating Xen block devices with MirageOS

By Dave Scott

MirageOS is a unikernel or "library operating system" that allows us to build applications which can be compiled to very diverse environments: the same code can be linked to run as a regular Unix app, relinked to run as a FreeBSD kernel module, and even linked into a self-contained kernel which can run on the Xen hypervisor.

Mirage has access to an extensive suite of pure OCaml libraries, covering everything from Xen block and network virtual device drivers, a TCP/IP stack, OpenFlow learning switches and controllers, to SSH and HTTP server implementations.

I normally use Mirage to deploy applications as kernels on top of a XenServer hypervisor. I start by first using the Mirage libraries within a normal Unix userspace application -- where I have access to excellent debugging tools -- and then finally link my app as a high-performance Xen kernel for production.

However Mirage is great for more than simply building Xen kernels. In this post I'll describe how I've been using Mirage to create experimental virtual disk devices for existing Xen VMs (which may themselves be Linux, *BSD, Windows or even Mirage kernels). The Mirage libraries let me easily experiment with different backend file formats and protocols, all while writing only type-safe OCaml code that runs in userspace in a normal Linux domain 0.

Disk devices under Xen

The protocols used by Xen disk and network devices are designed to permit fast and efficient software implementations, avoiding the inefficiencies inherent in emulating physical hardware in software. The protocols are based on two primitives:

  • shared memory pages: used for sharing both data and metadata
  • event channels: similar to interrupts, these allow one side to signal the other

In the disk block protocol, the client ("frontend" in Xen jargon) starts by sharing a page with the server ("backend"). This single page will contain the request/response metadata, arranged as a circular buffer or "ring". The client can then share pages containing disk blocks with the backend and push request structures onto the ring, updating the shared pointers as it goes. The client gives the server a kick via an event channel signal and then both ends run simultaneously. There are no locks in the protocol, so updates to the shared metadata must be handled carefully, using write memory barriers to ensure consistency.
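
The counter mechanics of such a ring can be modelled in a few lines of plain OCaml. This is a toy, single-threaded sketch of the producer/consumer counters only (the real shared-memory-ring library lays the ring out in a shared page and issues memory barriers around the counter updates; none of that is shown here):

```ocaml
(* Toy model of a request ring: a fixed-size circular buffer of slots
   plus two monotonically increasing counters. The frontend advances
   req_prod when pushing a request; the backend advances req_cons when
   consuming one. Slot indices are the counters modulo the ring size. *)
type 'a ring = {
  slots : 'a option array;
  mutable req_prod : int;  (* frontend: next free request slot *)
  mutable req_cons : int;  (* backend: next unhandled request *)
}

let create n = { slots = Array.make n None; req_prod = 0; req_cons = 0 }

(* Frontend pushes a request; on real Xen it would then signal the
   backend over the event channel. *)
let push r x =
  let n = Array.length r.slots in
  if r.req_prod - r.req_cons >= n then failwith "ring full";
  r.slots.(r.req_prod mod n) <- Some x;
  r.req_prod <- r.req_prod + 1

(* Backend pops the next outstanding request, if any. *)
let pop r =
  if r.req_cons = r.req_prod then None
  else begin
    let n = Array.length r.slots in
    let x = r.slots.(r.req_cons mod n) in
    r.req_cons <- r.req_cons + 1;
    x
  end

let () =
  let r = create 4 in
  push r "read sector 0";
  push r "write sector 1";
  (match pop r with Some s -> print_endline s | None -> ());
  (match pop r with Some s -> print_endline s | None -> ())
```

In the real protocol the two counters are updated by different domains running on different CPU cores, which is exactly why the write memory barriers mentioned above are needed.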

Xen disk devices in MirageOS

Like everything else in Mirage, Xen disk devices are implemented as libraries. The ocamlfind library called "xenctrl" provides support for manipulating blocks of raw memory pages, "granting" access to them to other domains and signalling event channels. There are two implementations of "xenctrl": one that invokes Xen "hypercalls" directly and one which uses the Xen userspace library libxc. Both implementations satisfy a common signature, so it's easy to write code which will work in both userspace and kernelspace.

The ocamlfind library shared-memory-ring provides functions to create and manipulate request/response rings in shared memory as used by the disk and network protocols. This library is a mix of 99.9% OCaml and 0.1% asm, where the asm is only needed to invoke memory barrier operations to ensure that metadata writes issued by one CPU core appear in the same order when viewed from another CPU core.

Finally the ocamlfind library xenblock provides functions to hotplug and hotunplug disk devices, together with an implementation of the disk block protocol itself.

Making custom virtual disk servers with MirageOS

Let's experiment with making our own virtual disk server based on the Mirage example program, xen-disk.

First, install Xen, OCaml and OPAM. Second initialise your system:

  opam init
  eval `opam config env`

At the time of writing, not all the libraries were released as upstream OPAM packages, so it was necessary to add some extra repositories. This should not be necessary after the Mirage developer preview at OSCON 2013.

  opam remote add mirage-dev git://github.com/mirage/opam-repo-dev
  opam remote add xapi-dev git://github.com/xapi-project/opam-repo-dev

Install the unmodified xen-disk package, this will ensure all the build dependencies are installed:

  opam install xen-disk

When this completes it will have installed a command-line tool called xen-disk. If you start a VM using your Xen toolstack of choice ("xl create ..." or "xe vm-install ..." or "virsh create ...") then you should be able to run:

  xen-disk connect <vmname>

which will hotplug a fresh block device into the VM "<vmname>" using the "discard" backend, which returns "success" to all read and write requests, but actually throws all data away. Obviously this backend should only be used for basic testing!

Assuming that worked ok, clone and build the source for xen-disk yourself:

  git clone git://github.com/mirage/xen-disk
  cd xen-disk
  make

Making a custom virtual disk implementation

The xen-disk program has a set of simple built-in virtual disk implementations. Each one satisfies a simple signature, contained in src/storage.mli:

type configuration = {
  filename: string;      (** path where the data will be stored *)
  format: string option; (** format of physical data *)
}
(** Information needed to "open" a disk *)

module type S = sig
  (** A concrete mechanism to access and update a virtual disk. *)

  type t
  (** An open virtual disk *)

  val open_disk: configuration -> t option Lwt.t
  (** Given a configuration, attempt to open a virtual disk *)

  val size: t -> int64
  (** [size t] is the size of the virtual disk in bytes. The actual
      number of bytes stored on media may be different. *)

  val read: t -> Cstruct.t -> int64 -> int -> unit Lwt.t
  (** [read t buf offset_sectors len_sectors] copies [len_sectors]
      sectors beginning at sector [offset_sectors] from [t] into [buf] *)

  val write: t -> Cstruct.t -> int64 -> int -> unit Lwt.t
  (** [write t buf offset_sectors len_sectors] copies [len_sectors]
      sectors from [buf] into [t] beginning at sector [offset_sectors]. *)
end
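
As a concrete (if trivial) example, the "discard" backend mentioned earlier can be written against a signature like this. The sketch below is simplified so it stands alone: it drops the Lwt and Cstruct dependencies, using plain Bytes and synchronous functions, so the types differ slightly from storage.mli:

```ocaml
(* A simplified sketch of the "discard" backend: reads return zeroes,
   writes are acknowledged and then thrown away. The size_bytes value
   is an arbitrary illustrative choice (1 GiB). *)
module Discard = struct
  type t = { size_bytes : int64 }

  let sector_size = 512

  (* "Opening" the disk just records a fixed virtual size. *)
  let open_disk () = Some { size_bytes = 1_073_741_824L }

  let size t = t.size_bytes

  (* Fill the destination buffer with zeroes. *)
  let read _t buf _offset_sectors len_sectors =
    Bytes.fill buf 0 (len_sectors * sector_size) '\000'

  (* Accept the write and drop the data. *)
  let write _t _buf _offset_sectors _len_sectors = ()
end

let () =
  match Discard.open_disk () with
  | None -> assert false
  | Some t ->
    let buf = Bytes.make (2 * Discard.sector_size) 'x' in
    Discard.read t buf 0L 2;
    assert (Bytes.get buf 0 = '\000');
    assert (Bytes.get buf (2 * Discard.sector_size - 1) = '\000');
    Discard.write t buf 0L 2;
    assert (Discard.size t = 1_073_741_824L);
    print_endline "discard backend ok"
```

The real implementation returns Lwt threads from read and write so that many requests can be in flight at once, but the shape of the module is the same.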

Let's make a virtual disk implementation which uses an existing disk image file as a "gold image", but uses copy-on-write so that no writes persist. This is a common configuration in Virtual Desktop Infrastructure deployments and is generally handy when you want to test a change quickly, and revert it cleanly afterwards.

A useful Unix technique for file I/O is to "memory map" an existing file: this associates the file contents with a range of virtual memory addresses so that reading and writing within this address range will actually read or write the file contents. The "mmap" C function has a number of flags, which can be used to request "copy on write" behaviour. Reading the OCaml manual Bigarray.map_file it says:

If shared is true, all modifications performed on the array are reflected in the file. This requires that fd be opened with write permissions. If shared is false, modifications performed on the array are done in memory only, using copy-on-write of the modified pages; the underlying file is not affected.

So we should be able to make a virtual disk implementation which memory maps the image file and achieves copy-on-write by setting "shared" to false. For extra safety we can also open the file read-only.

Luckily there is already an "mmap" implementation in xen-disk; all we need to do is tweak it slightly. Note that the xen-disk program uses a co-operative threading library called lwt, which replaces potentially-blocking functions from the OCaml standard library with non-blocking variants. In particular lwt provides Lwt_bytes.map_file as a wrapper for the Bigarray.Array1.map_file function. In the open_disk function we simply need to set "shared" to "false" to achieve the behaviour we want:

  let open_disk configuration =
    let fd = Unix.openfile configuration.filename [ Unix.O_RDONLY ] 0o0 in
    let stats = Unix.LargeFile.fstat fd in
    let mmap = Lwt_bytes.map_file ~fd ~shared:false () in
    Unix.close fd;
    return (Some (stats.Unix.LargeFile.st_size, Cstruct.of_bigarray mmap))

The read and write functions can be left as they are:

  let read (_, mmap) buf offset_sectors len_sectors =
    let offset_sectors = Int64.to_int offset_sectors in
    let len_bytes = len_sectors * sector_size in
    let offset_bytes = offset_sectors * sector_size in
    Cstruct.blit mmap offset_bytes buf 0 len_bytes;
    return ()

  let write (_, mmap) buf offset_sectors len_sectors =
    let offset_sectors = Int64.to_int offset_sectors in
    let offset_bytes = offset_sectors * sector_size in
    let len_bytes = len_sectors * sector_size in
    Cstruct.blit buf 0 mmap offset_bytes len_bytes;
    return () 

Now if we rebuild and run something like:

  dd if=/dev/zero of=disk.raw bs=1M seek=1024 count=1
  losetup /dev/loop0 disk.raw
  mkfs.ext3 /dev/loop0
  losetup -d /dev/loop0

  dist/build/xen-disk/xen-disk connect <myvm> --path disk.raw

Inside the VM we should be able to do some basic speed testing:

  # dd if=/dev/xvdb of=/dev/null bs=1M iflag=direct count=100
  100+0 records in
  100+0 records out
  104857600 bytes (105 MB) copied, 0.125296 s, 837 MB/s

Plus we should be able to mount the filesystem inside the VM, make changes and then disconnect (send SIGINT to xen-disk by hitting Control+C on your terminal) without disturbing the underlying disk contents.

So what else can we do?

Thanks to Mirage it's now really easy to experiment with custom storage types for your existing VMs. If you have a cunning scheme where you want to hash block contents and use the hashes as keys in some distributed data structure -- go ahead, it's all easy to do. If you have ideas for improving the low-level block access protocol then Mirage makes those experiments very easy too.

If you come up with a cool example with Mirage, then send us a pull request or send us an email to the Mirage mailing list -- we'd love to hear about it!


May
20
2013
16
20

The road to a developer preview at OSCON 2013

By Anil Madhavapeddy

There's been a crazy stream of activity since the start of the year, but the most important news is that we have a release target for an integrated developer preview of the Mirage stack: a talk at O'Reilly OSCon in July! Do turn up there and find Dave Scott and Anil Madhavapeddy showing off interactive demonstrations.

Meanwhile, another significant announcement has been that Xen is joining the Linux Foundation as a collaborative project. This is great news for Mirage: as a library operating system, we can operate just as easily under other hypervisors, and even on bare-metal devices such as the Raspberry Pi. We're very much looking forward to getting the Xen-based developer release done, and interacting with the wider Linux community (and FreeBSD, for that matter, thanks to Gabor Pali's kFreeBSD backend).

Here's some other significant news from the past few months:

  • OPAM 1.0 was released, giving Mirage a solid package manager for handling the many libraries required to glue an application together. Vincent Bernardoff joined the team at Citrix and has been building a Mirage build-frontend called Mirari to hide much of the system complexity from a user who isn't too familiar with either Xen or OCaml.

  • A new group called the OCaml Labs has started up in the Cambridge Computer Laboratory, and is working on improving the OCaml toolchain and platform. This gives Mirage a big boost, as we can re-use several of the documentation, build and test improvements in our own releases. You can read up on the group's activities via the monthly updates, or browse through the various projects. One of the more important projects is the OCamlot continuous build infrastructure, which will also be testing Mirage kernels as one of the supported backends.

  • As we head into release mode, we've started weekly meetings to coordinate all the activities. We're keeping notes as we go along, so you should be able to skim the notes and mailing list archives to get a feel for the overall activities. Anil is maintaining a release checklist for the summer developer preview.

  • Anil (along with Yaron Minsky and Jason Hickey) is finishing up an O'Reilly book on Real World OCaml, which will be a useful guide to using OCaml for systems and network programming. If you'd like to review an early copy, please get in touch. The final book is anticipated to be released towards the end of the year, with a Rough Cut at the end of the summer.

  • The core system was described in an ASPLOS 2013 paper, which should help you understand the background behind library operating systems. Some of the Mirage libraries are also currently being integrated into the next-generation Windsor release of the Xen Cloud Platform, which means that several of the libraries will be used in production and hence move beyond research-quality code.

In the next few months, the installation notes and getting started guides will all be revamped to match the reality of the new tooling, so expect some flux there. If you want to take an early try of Mirage beforehand, don't forget to hop on the #mirage IRC channel on Freenode and ping us with questions directly. We will also be migrating some of the project infrastructure to be fully self-hosted on Mirage and Xen, and placing some of the services onto the new xenproject.org infrastructure.


Oct
17
2012
17
30

Breaking up is easy to do (with OPAM)

By Anil Madhavapeddy

When we first started developing Mirage in 2009, we were rewriting huge chunks of operating system and runtime code in OCaml. This ranged from low-level device drivers to higher-level networking protocols such as TCP/IP or HTTP. The changes weren't just straight rewrites of C code either, but also involved experimenting with interfaces such as iteratees and lightweight threading to take advantage of OCaml's static type system. To make all of this easy to work with, we decided to lump everything into a single Git repository that would bootstrap the entire system with a single make invocation.

Nowadays though, Mirage is self-hosting, the interfaces are settling down, the number of libraries are growing every day, and portions of it are being used in the Xen Cloud Platform. So for the first developer release, we wanted to split up the monolithic repository into more manageable chunks, but still make it as easy as possible for the average OCaml developer to try out Mirage.

Thanks to much hard work from Thomas and his colleagues at OCamlPro, we now have OPAM: a fully-fledged package manager for Mirage! OPAM is a source-based package manager that supports a growing number of community OCaml libraries. More importantly for Mirage, it can also switch between multiple compiler installations, and so support cross-compiled runtimes and modified standard libraries.

OPAM includes compiler variants for Mirage-friendly environments for Xen and the UNIX tuntap backends. The installation instructions now explain how to use OPAM, and the old monolithic repository is considered deprecated. We're still working on full documentation for the first beta release, but all the repositories are on the Mirage organisation on GitHub, with some of the important ones being:

  • mirage-platform has the core runtime for Xen and UNIX, implemented as the OS module.
  • mirage-net has the TCP/IP networking stack.
  • ocaml-cstruct has the camlp4 extension to manipulate memory like C structs, but with type-safe accessors in OCaml.
  • ocaml-xenstore has a portable implementation of the Xenstore protocol to communicate with the Xen management stack from a VM (or even act as a server in a stub domain).
  • ocaml-dns is a pure OCaml implementation of the DNS protocol, including a server and stub resolver.
  • ocaml-re is a pure OCaml version of several regular expression engines, including Perl compatibility.
  • ocaml-uri handles parsing the surprisingly complex URI strings.
  • ocaml-cohttp is a portable HTTP parser, with backends for Mirage, Lwt and Core/Async. This is a good example of how to factor out OS-specific concerns using the OCaml type system (and I plan to blog more about this soon).
  • ocaml-cow is a set of syntax extensions for JSON, CSS, XML and XHTML, which are explained here, and used by this site.
  • ocaml-dyntype uses camlp4 to generate dynamic types and values from OCaml type declarations.
  • ocaml-orm auto-generates SQL schemas from OCaml types via Dyntype, and currently supports SQLite.
  • ocaml-openflow implements an OCaml switch and controller for the Openflow protocol.

There are quite a few more that are still being hacked for release by the team, but we're getting there very fast now. We also have the Mirage ports of SSH to integrate before the first release this year, and Haris has got some interesting DNSSEC code! If you want to get involved, join the mailing list or IRC channel!


Sep
12
2012
0
0

Building a "Xenstore stub domain" with MirageOS

By Dave Scott

[ *Due to continuing development, some of the details in this blog post are now out-of-date. It is archived here.* ]

On all hosts running Xen, there is a critical service called xenstore. Xenstore is used to allow untrusted user VMs to communicate with trusted system VMs, so that

  • virtual disk and network connections can be established
  • performance statistics and OS version information can be shared
  • VMs can be remotely power-cycled, suspended, resumed, snapshotted and migrated.

If the xenstore service fails then at best the host cannot be controlled (i.e. no VM start or shutdown) and at worst VM isolation is compromised since an untrusted VM will be able to gain unauthorised access to disks or networks. This blog post examines how to disaggregate xenstore from the monolithic domain 0, and run it as an independent stub domain.

Recently in the Xen community, Daniel De Graaf and Alex Zeffertt have added support for xenstore stub domains where the xenstore service is run directly as an OS kernel in its own isolated VM. In the world of Xen, a running VM is a "domain" and a "stub" implies a single-purpose OS image rather than a general-purpose machine. Previously if something bad happened in "domain 0" (the privileged general-purpose OS where xenstore traditionally runs) such as an out-of-memory event or a performance problem, then the critical xenstore process might become unusable or fail altogether. Instead if xenstore is run as a "stub domain" then it is immune to such problems in domain 0. In fact, it will even allow us to reboot domain 0 in future (along with all other privileged domains) without incurring any VM downtime during the reset!

The new code in xen-unstable.hg lays the necessary groundwork (Xen and domain 0 kernel changes) and ports the original C xenstored to run as a stub domain.

Meanwhile, thanks to Vincent Hanquez and Thomas Gazagnaire, we also have an OCaml implementation of xenstore which, as well as offering memory-safety, also supports a high-performance transaction engine, necessary for surviving a stressful "VM bootstorm" event on a large server in the cloud. Vincent and Thomas' code is Linux/POSIX only.

Ideally we would have the best of both worlds:

  • a fast, memory-safe xenstored written in OCaml,
  • running directly as a Xen stub domain i.e. as a specialised kernel image without Linux or POSIX

We can now do both, using Mirage! If you're saying, "that sounds great! How do I do that?" then read on...

Step 1: remove dependency on POSIX/Linux

If you read through the existing OCaml xenstored code, it becomes obvious that the main uses of POSIX APIs are for communication with clients, both Unix sockets and for a special Xen inter-domain shared memory interface. It was a fairly painless process to extract the required socket-like IO signature and turn the bulk of the server into a functor. The IO signature ended up looking approximately like:

    type t
    val read: t -> string -> int -> int -> int Lwt.t
    val write: t -> string -> int -> int -> unit Lwt.t
    val destroy: t -> unit Lwt.t

For now the dependency on Lwt is explicit but in future I'll probably make it more abstract so we can use Core Async too.
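
To illustrate the functor pattern, here is a self-contained sketch (not the actual ocaml-xenstore code): a Server functor parameterised over a direct-style version of the IO signature above, instantiated with an in-memory transport for testing. The names IO, Server, handle and String_io are illustrative only, and Lwt is replaced by synchronous calls so the example stands alone:

```ocaml
(* Direct-style rendering of the socket-like IO signature. *)
module type IO = sig
  type t
  val read : t -> bytes -> int -> int -> int
  val write : t -> bytes -> int -> int -> unit
  val destroy : t -> unit
end

(* The bulk of a server written only against IO: here it just echoes,
   where the real xenstored would parse and answer xenstore packets. *)
module Server (Io : IO) = struct
  let handle (conn : Io.t) =
    let buf = Bytes.create 1024 in
    let n = Io.read conn buf 0 (Bytes.length buf) in
    Io.write conn buf 0 n;
    Io.destroy conn
end

(* An in-memory IO instance, standing in for Unix sockets or the Xen
   shared-memory ring. *)
module String_io = struct
  type t = { input : string; mutable output : string }
  let read t buf off len =
    let n = min len (String.length t.input) in
    Bytes.blit_string t.input 0 buf off n;
    n
  let write t buf off len =
    t.output <- t.output ^ Bytes.sub_string buf off len
  let destroy _ = ()
end

module S = Server (String_io)

let () =
  let conn = { String_io.input = "hello"; output = "" } in
  S.handle conn;
  print_endline conn.String_io.output
```

Swapping the transport is then just a matter of applying Server to a different IO module, which is exactly what lets the same server core run over Unix sockets in userspace and over shared-memory rings in a stub domain.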

Step 2: add a Mirage Xen IO implementation

In a stub-domain all communication with other domains is via shared memory pages and "event channels". Mirage already contains extensive support for using these primitives, and uses them to create fast network and block virtual device drivers. To extend the code to cover the Xenstore stub domain case, only a few tweaks were needed to add the "server" side of a xenstore ring communication, in addition to the "client" side which was already present.

In Xen, domains share memory by a system of explicit "grants", where a client (called "frontend") tells the hypervisor to allow a server (called "backend") access to specific memory pages. Mirage already had code to create such grants, all that was missing was a few simple functions to receive grants from other domains.

These changes are all in the current mirage-platform tree.

Step 3: add a Mirage Xen "main" module and Makefile

The Mirage "main" module necessary for a stub domain looks pretty similar to the normal Unix userspace case except that it:

  • arranges to log messages via the VM console (rather than a file or the network, since a disk or network device cannot be created without a working xenstore, and it's important not to introduce a bootstrap problem here)
  • instantiates the server functor with the shared memory inter-domain IO module.

The Makefile looks like a regular Makefile, invoking ocamlbuild. The whole lot is built with OASIS with a small extension added by Anil to set a few options required for building Xen kernels rather than regular binaries.

... and it all works!

The code is in two separate repositories:

  • ocaml-xenstore: contains all the generic stuff
  • ocaml-xenstore-xen: contains the unix userspace and xen stub domain IO modules and "main" functions
  • (optional) To regenerate the OASIS file, grab the add-xen branch from this OASIS fork.

Example build instructions

If you want to try building it yourself, try the following on a modern 64-bit OS. I've tested these instructions on a fresh install of Debian Wheezy.

First install OCaml and the usual build tools:

    apt-get install ocaml build-essential git curl rsync

Then install the OCamlPro opam package manager to simplify the installation of extra packages

    git clone git://github.com/OCamlPro/opam.git
    cd opam
    make
    make install
    cd ..

Initialise OPAM with the default packages:

    opam --yes init
    eval `opam config -env`

Add the "mirage" development package source (this step will not be needed once the package definitions are upstreamed)

    opam remote -add dev git://github.com/mirage/opam-repo-dev

Switch to the special "mirage" version of the OCaml compiler

    opam --yes switch -install 3.12.1+mirage-xen
    opam --yes switch 3.12.1+mirage-xen
    eval `opam config -env`

Install the generic Xenstore protocol libraries

    opam --yes install xenstore

Install the Mirage development libraries

    opam --yes install mirage

If this fails with "+ runtime/dietlibc/lib/atof.c:1: sorry, unimplemented: 64-bit mode not compiled in" it means you need a 64-bit build environment. Next, clone the xen stubdom tree

    git clone git://github.com/djs55/ocaml-xenstore-xen

Build the Xen stubdom

    cd ocaml-xenstore-xen
    make

The binary now lives in xen/_build/src/server_xen.xen

Deploying on a Xen system

Running a stub Xenstored is a little tricky because it depends on the latest and greatest Xen and Linux PVops kernel. In the future it'll become much easier (and probably the default) but for now you need the following:

  • xen-4.2 with XSM (Xen Security Modules) turned on
  • A XSM/FLASK policy which allows the stubdom to call the "domctl getdomaininfo". For the moment it's safe to skip this step with the caveat that xenstored will leak connections when domains die.
  • a Xen-4.2-compatible toolstack (either the bundled xl/libxl or xapi with some patches)
  • Linux-3.5 PVops domain 0 kernel
  • the domain builder binary init-xenstore-domain from xen-4.2/tools/xenstore.

To turn the stub xenstored on, you need to edit whichever init.d script is currently starting xenstore and modify it to call

    init-xenstore-domain /path/to/server_xen.xen 256 flask_label

Feb
29
2012
11
10

Connected Cloud Control: OpenFlow in MirageOS

By Richard Mortier

Due to continuing development, some of the details in this blog post are now out-of-date. It is archived here.

Something we've been working on for a little while now that we're pretty excited about is an OpenFlow implementation for MirageOS. For those who're not networking types, in short, OpenFlow is a protocol and framework for devolving network control to software running on platforms other than the network elements themselves. It consists of three main parts:

  • a controller, responsible for exercising control over the network;
  • switches, consisting of switching hardware, with flow tables that apply forwarding behaviours to matching packets; and
  • the protocol, by which controllers and switches communicate.

For more -- and far clearer! -- explanations, see any of the many online OpenFlow resources such as OpenFlowHub.

Within MirageOS we have an OpenFlow implementation in two parts: individual libraries that provide controller and switch functionality. Linking the switch library enables your application to become a software-based OpenFlow switch. Linking in the controller library enables your application to exercise direct control over OpenFlow network elements.

The controller is modelled after the NOX open-source controller and currently provides only relatively low-level access to the OpenFlow primitives: a very cool thing to build on top of it would be a higher-level abstraction such as that provided by Nettle or Frenetic.

The switch is primarily intended as an experimental platform -- it is hopefully easier to extend than some of the existing software switches while still being sufficiently high performance to be interesting!

By way of a sample of how it fits together, here's a skeleton for a simple controller application:

type mac_switch = {
  addr: OP.eaddr; 
  switch: OP.datapath_id;
}

type switch_state = {
  mutable mac_cache: 
        (mac_switch, OP.Port.t) Hashtbl.t;
  mutable dpid: OP.datapath_id list
}

let switch_data = {
  mac_cache = Hashtbl.create 7; 
  dpid = [];
} 

let join_cb controller dpid evt =
  let dp = match evt with
      | OE.Datapath_join c -> c
      | _ -> invalid_arg "bogus datapath_join"
  in 
  switch_data.dpid <- switch_data.dpid @ [dp]

let packet_in_cb controller dpid evt =
  (* algorithm details omitted for space *)

let init ctrl = 
  OC.register_cb ctrl OE.DATAPATH_JOIN join_cb;
  OC.register_cb ctrl OE.PACKET_IN packet_in_cb

let main () =
  Net.Manager.create (fun mgr interface id ->
    let port = 6633 in 
    OC.listen mgr (None, port) init
  )

We've written up some of the gory details of the design, implementation and performance in a short paper submitted to the ICC Software Defined Networking workshop. Thanks to some sterling work by Haris and Balraj, the headline numbers are pretty good: the unoptimised Mirage controller is only 30--40% slower than the highly optimised NOX destiny-fast branch, which drops most of the programmability and flexibility of NOX, and about six times faster than the fully flexible current NOX release. The switch's performance running as a domU virtual machine is indistinguishable from the current Open vSwitch release.

For more details see the paper or contact Mort, Haris or Anil. Please do get in touch if you've any comments or questions, or you do anything interesting with it!


Sep
29
2011
11
10

An Outing to CUFP 2011

By Anil Madhavapeddy

The team signed up to do a tutorial at CUFP on the topic of Building a Functional OS, which meant zooming off to Tokyo! This was the first public show of the project, and resulted in a furious flurry of commits from the whole team to get it ready. The 45-strong crowd at the tutorial were really full of feedback, and particular thanks to Michael for organising the event, and Yaron, Marius, Steve, Wil, Adrian and the rest for shouting out questions regularly!

  • The tutorial is a Mirage application, so you can clone it and view it locally through your web browser. The content is mirrored at tutorial.openmirage.org, although it does require cleanup to make it suitable for an online audience. The SVG integration is awkward and it only works on Chrome/Safari, so I will probably rewrite it using deck.js soon. The tutorial is a good showcase of Mirage, as it compiles to Xen, UNIX (both kernel sockets and direct tuntap) with a RAMdisk or external filesystem, and is a good way to mess around with application synthesis (look at the Makefile targets in slides/).

  • Installation: instructions have been simplified, and we now only require OCaml on the host and include everything else in-tree. Thomas has also made Emacs and Vim plugins that are compatible with the ocamlbuild layout.

  • Lwt: a new tutorial which walks you through the cooperative threading library we use, along with exercises (all available in mirage-tutorial). Raphael and Balraj are looking for feedback on this, so get in touch!

  • Javascript: via node.js did not work in time for the tutorial, as integrating I/O is a tangled web that will take some time to sort out. Raphael is working on this in a separate tree for now. As part of this effort though, he integrated a pure OCaml regular expression library that does not require C bindings, and is surprisingly fast.

  • Devices: we can now synthesise binaries that share common code but have very different I/O interfaces. This is due to a new device manager, and David also heroically wrote a complete FAT12/16/32 library that we demonstrated. Yaron Minsky suggested a different approach to the device manager using first-class modules instead of objects, so I am experimentally trying this before writing documentation on it.

  • TCP: the notorious Mirage stack is far more robust due to our resident networking guru Balraj hunting down last-minute bugs. Although it held together with sticky tape during the tutorial, he is now adding retransmission and congestion control to make it actually standards-compliant. Still, if you don't have any packet loss, the unikernel version of this website does actually serve pages.

  • OpenFlow: is a new standard for Software Defined Networking, and Haris and Mort have been hacking away at a complete implementation directly in Mirage! We will be giving a tutorial on this at the OFELIA summer school in November (it is summer somewhere, I guess). The prospect of a high-speed unikernel switching fabric for the cloud, programmed in a functional style, is something I am really looking forward to seeing!

  • Jane Street Core: preceding us was Yaron's Core tutorial. Since Mirage provides its own complete standard library, we can adopt portions of Core that do not require OS threads or UNIX-specific features. I really like the idea that Mirage enforces a discipline on writing portable interfaces, as dependencies on OS-specific features do sneak in insidiously and make switching to different platforms very difficult (e.g. Windows support). Incidentally, Yaron's ACM Queue article is a great introduction to OCaml.

So as you can see, it has been a busy few months! Much of the core of Mirage is settling down now, and we are writing a paper with detailed performance benchmarks of our various backends. Keep an eye on the Github milestone for the preview release, join our new mailing list, or follow the newly sentient openmirage on twitter!


Aug
12
2011
15
0

Portable Regular Expressions

By Raphael Proust

MirageOS targets different backends: micro-kernels for the Xen hypervisor, Unix executables and Javascript programs. The recent inclusion of the Javascript backend makes many C bindings unsuitable. In order to push backend incompatibilities closer to the application level, it is necessary to either reimplement the C bindings in Javascript or OCaml, or remove them completely. This is particularly important for the standard library.

The Str module has to go!

Str provides regular expressions in a non-reentrant, non-functional fashion. Although the OCaml distribution ships it in otherlibs, it is installed by default and therefore widely used, and it is implemented under the hood via a C library. Regular expressions are used in several places in MirageOS, mainly for small operations (splitting, getting an offset, etc.), so having a portable fallback written in pure OCaml would be very useful.

There are several possible ways to replace the Str module, each with its own set of perks and drawbacks:

  • Use a backend-neutral regexp library which "translates" to either Str or Pcre for the Xen and Unix backends or Javascript native regexps for the Javascript backend. This solution may be hard to maintain, especially if a fourth backend is to be included. Moreover each regexp library uses a slightly different convention for regexps (e.g. see the magic option in vim) which means that a lot of translation code might be needed.
  • Do string processing without regexps (using String.index and the like). This solution is portable and potentially efficient. However, that efficiency comes at the cost of writing low-level code by hand.
  • Use an OCaml regexp library without C bindings. We expected such a library to be slower than Str, and needed an estimate of the performance cost in order to assess the practicality of the solution.
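As a small illustration of the second option, many of the "small operations" mentioned above can be done with just the standard library. The sketch below (not code from the MirageOS tree) splits a string on a separator character using only String.index_from:

```ocaml
(* Split [s] on every occurrence of the character [sep], using only
   the standard library. A sketch of regexp-free string processing;
   not taken from the MirageOS tree. *)
let split_char sep s =
  let rec aux start acc =
    try
      let i = String.index_from s start sep in
      aux (i + 1) (String.sub s start (i - start) :: acc)
    with Not_found ->
      List.rev (String.sub s start (String.length s - start) :: acc)
  in
  aux 0 []

let () =
  assert (split_char ':' "a:b:c" = ["a"; "b"; "c"])
```

This is fast and portable, but each new parsing task needs its own bespoke loop, which is exactly the maintenance burden the second option trades for.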

Benchmarking Str

There is a pure OCaml regexp library readily available, called Regexp and developed by Claude Marché at the LRI laboratory. You can find the documentation and the source on the associated webpage. After getting rid of mutexes (which are of no use in MirageOS, given its Lwt-based cooperative concurrency), we benchmarked it against Str. We also included the widely used Pcre (Perl Compatible Regular Expressions) library.

The benchmark (available on github) is really simple and measures three different factors:

  • regexp construction: the transformation of a string (or another representation available to the programmer) into the internal representation of regexps used by the library
  • regexp usage: the execution of operations using regexps
  • string size: the length of the string being matched

MirageOS uses regexps in a specific pattern: a limited number of regexp constructions with a potentially high number of invocations (e.g. HTTP header parsing). The size of the strings on which regexps are used may vary. Because of this pattern, our benchmark does not take regexp construction overhead into account.
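Concretely, the measurement loop amounts to something like the following simplified sketch (the real harness is in the GitHub repository linked below; the helper name here is our own):

```ocaml
(* Time [iters] invocations of [f], deliberately excluding whatever
   setup (such as regexp construction) happened before the call.
   A simplified sketch of the benchmark loop, not the real harness. *)
let time_invocations ~iters f =
  let t0 = Sys.time () in
  for _i = 1 to iters do
    ignore (f ())
  done;
  Sys.time () -. t0

let () =
  (* e.g. time ~35000 invocations of a cheap pre-built operation *)
  let elapsed =
    time_invocations ~iters:35_000
      (fun () -> String.contains "HTTP/1.1 200 OK" ':')
  in
  Printf.printf "35000 invocations: %.6fs\n" elapsed
```

Constructing the regexp once outside the timed loop is what lets the comparison focus on matching speed alone.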

Here are the execution times of approximately 35000 string matching operations on strings 20 to 60 bytes long.

Quite surprisingly, for the string matching operation the C-based Str module is less efficient than the pure OCaml Regexp. The Pcre results were even worse than Str's. Why?

A simple library for a simple task

The Regexp library is lightweight, which is why it outperforms its C-based counterparts. One of the features Regexp lacks is "group capture": the ability to refer back to blocks of a previously matched string. In Pcre it is possible to explicitly and selectively turn group capturing off via a special syntax instead of the regular parentheses. Str does not offer this, and thus imposes the runtime cost of capture even when it is not needed. In other words, the slowdown from group capturing "is not a feature, it's a bug!"

The MirageOS Regexp library

With the introduction of Regexp into the tree, the libraries available to MirageOS applications are now Str-free and safer to use across multiple backends. The main drawback is a slight increase in verbosity of some parts of the code. Benchmarking the substitution operation is also necessary to assess the performance gain/loss (which we will do shortly).

In addition to cosmetic and speed considerations, it is important to consider the portability increase: MirageOS's standard library is Node.js compatible, a feature we will explore shortly!


Jun
18
2011
15
47

Delimited Continuations vs Lwt for Threads

By Anil Madhavapeddy

MirageOS is a fully event-driven system, with no support for conventional preemptive threads. Instead, programs are woken by events such as incoming network packets, and event callbacks execute until they themselves need to block (due to I/O or timers) or complete their task.

Event-driven systems are simple to implement, scalable to lots of network clients, and very hip due to frameworks like node.js. However, programming event callbacks directly leads to the control logic being scattered across many small functions, and so we need some abstractions to hide the interruptions of registering and waiting for an event to trigger.

OCaml has the excellent Lwt threading library that utilises a monadic approach to solving this. Consider this simplified signature:

  val return : 'a -> 'a Lwt.t 
  val bind : 'a Lwt.t -> ('a -> 'b Lwt.t) -> 'b Lwt.t
  val run : 'a Lwt.t -> 'a

Threads have the type 'a Lwt.t, which means that the thread will have a result of type 'a when it finishes. The return function is the simplest way to construct such a thread from an OCaml value.

If we then wish to use the value of a thread, we must compose a function that will be called in the future when the thread completes. This is what the bind function above is for. For example, assume we have a function that will let us sleep for some time:

  val sleep: int -> unit Lwt.t

We can now use the bind function to do something after the sleep is complete:

  let x = sleep 5 in
  let y = bind x (fun () -> print_endline "awake!") in
  run y

x has the type unit Lwt.t, and the closure passed to bind will eventually be called with unit when the sleep finishes. Note that we also need a function to actually begin evaluating an Lwt thread, which is the run function.
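To see how those three functions fit together, here is a deliberately tiny model of the signature above, with a thread represented as a suspended computation. This is a toy for exposition only: real Lwt threads are resumable and driven by an event loop, which this model does not capture.

```ocaml
(* A toy model of the simplified Lwt signature. A "thread" here is
   just a suspended computation; real Lwt threads can block and be
   resumed by the event loop. Illustrates only how return/bind/run
   compose. *)
module Toy = struct
  type 'a t = unit -> 'a
  let return v = fun () -> v
  let bind t f = fun () -> f (t ()) ()
  let run t = t ()
end

let () =
  let x = Toy.return 5 in
  let y = Toy.bind x (fun n -> Toy.return (n * 2)) in
  assert (Toy.run y = 10)
```

The essential point survives the simplification: nothing happens until run forces the chain, and bind is the only way to sequence one computation after another.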

Concerns

MirageOS currently uses Lwt extensively, and we have been very happy with using it to build a network stack. However, I was surprised to hear a lot of debate at the 2011 OCaml Users Group meeting that Lwt is not to everyone's tastes. There are a few issues:

  • The monadic style means that existing code will not just work. Any code that might block must be adapted to use return and bind, which makes integrating third-party code problematic.

  • More concerning, any potential blocking point requires the allocation of a closure. This allocation is very cheap in OCaml, but still not free. Jun Furuse noted during the development of his Planck parser that combinator-based systems are slower.

Lwt addresses the first problem via a comprehensive syntax extension which provides Lwt equivalents for many common operations. For example, the above example with sleep can be written as:

  lwt x = sleep 5 in
  print_endline "awake"

The lwt keyword indicates the result of the expression should be passed through bind, and this makes it possible to write code that looks more OCaml-like. There are also other keywords like for_lwt and match_lwt that similarly help with common control flow constructs.

Fibers

After the meeting, I did get thinking about using alternatives to Lwt in MirageOS. One exciting option is the delimcc library which implements delimited continuations for OCaml. These can be used to implement restartable exceptions: a program can raise an exception which can be invoked to resume the execution as if the exception had never happened. Delimcc can be combined with Lwt very elegantly, and Jake Donham did just this with the Lwt_fiber library. His post also has a detailed explanation of how delimcc works.

The interface for fibers is also simple:

  val start: (unit -> 'a) -> 'a Lwt.t
  val await : 'a Lwt.t -> 'a

A fiber can be launched with start, and during its execution can block on another thread with await. When it does block, a restartable exception saves the program stack back until the point that start was called, and it will be resumed when the thread it blocked on completes.

Benchmarks

I put together a few microbenchmarks to try out the performance of Lwt threads versus fibers. The fiber test looks like this:

  module Fiber = struct
    let basic fn yields =
      for i = 1 to 15000 do
        for x = 1 to yields do
          Lwt_fiber.await (fn ())
        done
      done

    let run fn yields =
      Lwt_fiber.start (fun () -> basic fn yields)
  end

We invoke the run function with two arguments: a thread to use for blocking and the number of times we should yield serially (so we can confirm that an increasing number of yields scales linearly). The Lwt version is pretty similar:

  module LWT = struct
    let basic fn yields =
      for_lwt i = 1 to 15000 do
        for_lwt x = 1 to yields do
          fn ()
        done
      done
  
    let run = basic
  end

We do not need to do anything special to launch a thread since we are already in the Lwt main loop, and the syntax extension makes the for loops look like the Fiber example above.

The choice of blocking function is important. The first test runs using a fast Lwt.return () that returns immediately:

The x-axis on the above graph represents the number of yields in each loop. Both Lwt_fiber and pure Lwt optimise the case where a thread returns immediately, and so this graph simply tells us that the fast path is working (which is nice!). The next test replaces the blocking function with two alternatives that force the thread to yield:

There are two blocking functions used in the graph above:

  • the "slow" version is Lwt_unix.sleep 0.0 which forces the registration of a timeout.
  • the "medium" version is Lwt.pause () which causes the thread to pause and drop into the thread scheduler. In the case of Lwt_fiber, this causes an exception to be raised so we can benchmark the cost of using a delimited continuation.

Interestingly, using a fiber is slower than normal Lwt here, even though our callstack is not very deep. I would have hoped that fibers would be significantly cheaper with a small callstack, as the amount of backtracking should be quite low. Let's confirm that fibers do in fact slow down as the size of the callstack increases via this test:

  module Fiber = struct
    let recurse fn depth =
      let rec sum n = 
        Lwt_fiber.await (fn ());
        match n with
        |0 -> 0
        |n -> n + (sum (n-1)) 
      in
      for i = 1 to 15000 do
        ignore(sum depth)
      done

    let run fn depth = 
      Lwt_fiber.start (fun () -> recurse fn depth)
  end

The recurse function is deliberately not tail-recursive, so that the callstack increases as the depth parameter grows. The Lwt equivalent is slightly more clunky as we have to rewrite the loop to bind and return:

  module LWT = struct
    let recurse fn depth =
      let rec sum n =
        lwt () = fn () in
        match n with
        |0 -> return 0
        |n ->
          lwt n' = sum (n-1) in 
          return (n + n')
      in
      for_lwt i = 1 to 15000 do
        lwt res = sum depth in
        return ()
      done

    let run = recurse
  end

We then run the experiment using the slow Lwt_unix.sleep 0.0 function, and get this graph:

The above graph shows the recursive Lwt_fiber getting slower as the recursion depth increases, with normal Lwt staying linear. The graph also overlays the non-recursing versions as a guideline (*-basic-slow).

Thoughts

This first benchmark was a little surprising for me:

  • I would have thought that delimcc would be ahead of Lwt when dealing with functions with a small call-depth and a small amount of blocking (i.e. the traffic pattern that loaded network servers see). The cost of taking a restartable exception seems quite high, however.
  • The fiber tests still use the Lwt machinery to manage the callback mechanism (i.e. a select loop and the timer priority queue). It may be possible to create a really light-weight version just for delimcc, but the Lwt UNIX backend is already pretty lean and mean and uses libev to interface with the OS.
  • The problem of having to rewrite code to be Lwt-like still exists unfortunately, but it is getting better as the pa_lwt syntax extension matures and is integrated into my favourite editor (thanks Raphael!)
  • Finally, by far the biggest benefit of Lwt is that it can be compiled straight into Javascript using the js_of_ocaml compiler, opening up the possibility of cool browser visualisations and tickets to cool node.js parties that I don't normally get invited to.

I need to stress that these benchmarks are very micro, and do not take into account other things like memory allocation. The standalone code for the tests is online at Github, and I would be delighted to hear any feedback.

Retesting recursion [18th Jun 2011]

Jake Donham comments:

I speculated in my post that fibers might be faster if the copy/restore were amortized over a large stack. I wonder if you would repeat the experiment with versions where you call fn only in the base case of sum, instead of at every call. I think you're getting N^2 behavior here because you're copying and restoring the stack on each iteration.

When writing the test, I figured that calling the thread waiting function more often wouldn't alter the result (careless). So I modified the test suite to have a recurse test that only waits a single time at the end of a long call stack (see below) as well as the original N^2 version (now called recurse2).

  module Fiber = struct
    let recurse fn depth =
      let rec sum n = 
        match n with
        |0 -> Lwt_fiber.await (fn ()); 0
        |n -> n + (sum (n-1)) 
      in
      for i = 1 to 15000 do
        ignore(sum depth)
      done

    let run fn depth = 
      Lwt_fiber.start (fun () -> recurse fn depth)
  end

The N^2 version below of course looks the same as the previously run tests, with delimcc getting much worse as it yields more often:

However, when we run the recurse test with a single yield at the end of the long callstack, the situation reverses itself and now delimcc is faster. Note that this test ran with more iterations than the recurse2 test to make the results scale, and so the absolute time taken cannot be compared.

The reason Lwt is slower here becomes clearer when we examine what the code looks like after it has been passed through the pa_lwt syntax extension. The code before looks like:

  let recurse fn depth =
    let rec sum n =
      match n with
      | 0 -> 
          fn () >> return 0
      | n ->
          lwt n' = sum (n-1) in 
          return (n + n') in

and after pa_lwt macro-expands it:

  let recurse fn depth =
    let rec sum n =
      match n with
      | 0 ->
          Lwt.bind (fn ()) (fun _ -> return 0)
      | n ->
          let __pa_lwt_0 = sum (n - 1)
          in Lwt.bind __pa_lwt_0 (fun n' -> return (n + n')) in

Every iteration of the recursive loop requires the allocation of a closure (the Lwt.bind call). In the delimcc case, the function operates as a normal recursive function that uses the stack, until the very end when it needs to save the stack in one pass.
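The same contrast can be seen without Lwt at all: a direct-style recursion keeps its intermediate state on the stack, while the equivalent continuation-passing style allocates one closure per step, just like the chain of Lwt.bind calls above. A toy comparison (not the benchmark code):

```ocaml
(* Direct style: each step is a stack frame, no heap allocation. *)
let rec sum_direct n =
  if n = 0 then 0 else n + sum_direct (n - 1)

(* Continuation-passing style: each step allocates a closure,
   mirroring the Lwt.bind chain produced by pa_lwt. *)
let rec sum_cps n k =
  if n = 0 then k 0
  else sum_cps (n - 1) (fun acc -> k (n + acc))

let () =
  assert (sum_direct 100 = sum_cps 100 (fun x -> x))
```

Delimcc effectively lets the direct-style version block mid-computation, paying for stack capture only at the single point where it actually yields.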

Overall, I'm convinced now that the performance difference is insignificant for the purposes of choosing one thread system over the other for MirageOS. Instead, the question of code interoperability is more important. Lwt-enabled protocol code will work unmodified in Javascript, and Delimcc code helps migrate existing code over.

Interestingly, Javascript 1.7 introduces a yield operator, which has been shown to have comparable expressive power to the shift-reset delimcc operators. Perhaps convergence isn't too far away after all...


Apr
11
2011
15
0

A Spring Wiki Cleaning

By Anil Madhavapeddy

We've been plugging away on Mirage for the last few months, and things are starting to take shape nicely. As the older blog entries were out-of-date, we have shifted the descriptive material to a new wiki section instead. What else has been happening?

  • The Xen unikernel backend is fully event-driven (no interrupts) and very stable under stress testing now. The TCP stack is also complete enough to self-host this website, and you can try it out by navigating to xen.openmirage.org. The stack doesn't actually do retransmissions yet, so your user experience may "vary". Check out the installation and hello world guides to try it out for yourself.
  • Richard Mortier has put together a performance testing framework that lets us analyse the performance of Mirage applications on different backends (e.g. UNIX vs Xen), and against other conventional applications (e.g. BIND for DNS serving). Read more in the wiki here.
  • Thomas Gazagnaire has rewritten the website to use the COW syntax extensions. He has also started a new job with OCamlPro doing consultancy on OCaml, so congratulations are in order!
  • Thomas has also started integrating experimental Node.js support to fill in our buzzword quota for the year (and more seriously, to explore alternative VM backends for Mirage applications).
  • The build system (often a bugbear of such OS projects) now fully uses ocamlbuild for all OCaml and C dependencies, and so the whole OS can be rebuilt with different compilers (e.g. LLVM) or flags with a single invocation.

There are some exciting developments coming up later this year too!


Oct
11
2010
15
0

Self-hosting MirageOS website

By Anil Madhavapeddy

Welcome to the new self-hosting website for the Mirage project! As we go about preparing a release for later in the year, this blog will contain technical musings and work-in-progress reports of various bits of the operating system as they mature. Since there's so much to talk about, we decided to start with a blog format, and eventually collect things into a proper document as they stabilise.

Feel free to subscribe to the Atom feed to keep up-to-date with our progress, or just e-mail us or comment on individual posts with any queries.