By Hannes Mehnert - 2019-03-05
We are happy to announce our MirageOS 3.5.0 release. We didn't announce post 3.0.0 releases too well -- that's why this post tries to summarize the changes in the MirageOS ecosystem over the past two years. MirageOS consists of over 100 opam packages, many of which are reused in other OCaml projects and deployments without MirageOS. These opam packages are maintained and developed further by many developers.
On the OCaml tooling side, since MirageOS 3.0.0 we made several major changes:
- pin-depends in config.ml. pin-depends allows you to depend on a development branch of any opam package for your unikernel,
- the mirage command-line utility now emits lower and upper bounds of opam packages, allowing uncompromising deprecation of packages,
- safe-string is enabled by default. Strings are immutable now!!,
- result package, which has been incorporated into Pervasives since OCaml 4.03.0.

The 3.5.0 release contains several API improvements of different MirageOS interfaces - if you're developing your own MirageOS unikernels, you may want to read this post to adjust to the new APIs.
- type t constrained to unit as of 2.0.0;
- ETHIF module type to the clearer ETHERNET. As of 2.0.0 it also contains keep-alive support, complies with recent TCP/IP layering rework (see below), and IPv4 now supports reassembly and fragmentation;

We improved the key-value store API, and added a read-write store. There is also ongoing work which implements the read-write interface using irmin, a branchable persistent storage that can communicate via the git protocol. Motivations for these changes were the development of CalDAV, but also the development of wodan, a flash-friendly, safe and flexible filesystem. The goal is to EOL the mirage-fs interface in favour of the key-value store.
Major API improvements (in this PR, since 2.0.0):
- key is now a path (list of segments) instead of a string
- value type is now a string
- list : t -> key -> (string * [`Value|`Dictionary], error) result io was added
- get : t -> key -> (value, error) result io is now provided (it used to be named read and required offset and length parameters)
- last_modified : t -> key -> (int * int64, error) result io and digest : t -> key -> (string, error) result io have been introduced
- size was removed.
- RW for read-write key-value stores extends RO with three functions set, remove, and batch

There is now a non-persistent in-memory implementation of a read-write key-value store available. Other implementations (such as crunch, mirage-kv-unix, mirage-fs, tar) have been adapted, as well as clients of mirage-kv (dns, cohttp, tls).
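To make the shape of the revised interface more concrete, here is a minimal sketch in plain OCaml. It is not the mirage-kv API: the io monad is dropped, the error type is reduced to a single hypothetical `Not_found case, last_modified, digest and batch are omitted, and list is assumed to return a list of entries. The module name Mem and its create function are made up for this illustration.

```ocaml
(* Hypothetical, simplified model of the revised key-value interface.
   NOT the mirage-kv API: no io monad, one error case only. *)
type key = string list                     (* a path: list of segments *)
type error = [ `Not_found of key ]

module type RO = sig
  type t
  val get : t -> key -> (string, error) result
  val list :
    t -> key -> ((string * [ `Value | `Dictionary ]) list, error) result
end

module type RW = sig
  include RO
  val set : t -> key -> string -> (unit, error) result
  val remove : t -> key -> (unit, error) result
end

(* Non-persistent in-memory store backed by a hash table. *)
module Mem : sig
  include RW
  val create : unit -> t
end = struct
  type t = (key, string) Hashtbl.t

  let create () : t = Hashtbl.create 16

  let get t k =
    match Hashtbl.find_opt t k with
    | Some v -> Ok v
    | None -> Error (`Not_found k)

  (* [strip prefix k] returns the segments of [k] below [prefix], if any. *)
  let rec strip prefix k =
    match prefix, k with
    | [], rest -> Some rest
    | p :: ps, s :: ss when String.equal p s -> strip ps ss
    | _ -> None

  (* Direct children of [prefix]: a child is a value when it is the last
     segment of a stored key, and a dictionary otherwise. *)
  let list t prefix =
    let entries =
      Hashtbl.fold
        (fun k _ acc ->
          match strip prefix k with
          | Some [ name ] -> (name, `Value) :: acc
          | Some (name :: _) -> (name, `Dictionary) :: acc
          | Some [] | None -> acc)
        t []
    in
    match List.sort_uniq compare entries with
    | [] -> Error (`Not_found prefix)
    | l -> Ok l

  let set t k v = Hashtbl.replace t k v; Ok ()
  let remove t k = Hashtbl.remove t k; Ok ()
end
```

With this sketch, Mem.set s ["etc"; "hosts"] "localhost" followed by Mem.list s [] reports "etc" as a dictionary, and Mem.list s ["etc"] reports "hosts" as a value.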
The IPv4 implementation now supports fragment reassembly. Each incoming IPv4 fragment is checked for the "more fragments" and "offset" fields. If either is non-zero, the fragment is processed by the fragment cache, which uses a least recently used data structure with a maximum size of 256kB of content, shared by all incoming fragments. If there is any overlap in fragments, the entire packet is dropped (avoiding security issues). Fragments may arrive out of order. The code is heavily unit-tested. Each IPv4 packet may consist of at most 16 fragments (to minimise CPU DoS with lots of small fragments), and the timeout between the first and last fragment is 10 seconds.
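The reassembly rules above (out-of-order arrival, drop on overlap, at most 16 fragments) can be illustrated with a small standalone sketch. This is not the code from the tcpip package: the fragment record, the byte-granular offset, and the reassemble function are assumptions made for this example, and the LRU cache and the 10 second timeout are left out.

```ocaml
(* Hypothetical fragment model: [offset] is in bytes here, and [more]
   mirrors the "more fragments" flag. *)
type fragment = { offset : int; more : bool; data : string }

let max_fragments = 16

(* [Ok payload] once the fragments form one complete packet,
   [Error `Incomplete] while pieces are still missing, and
   [Error `Drop] on overlap, on data after the final fragment, or when
   there are more than [max_fragments] fragments. *)
let reassemble frags =
  let sorted = List.sort (fun a b -> compare a.offset b.offset) frags in
  let stop f = f.offset + String.length f.data in
  (* Any two neighbours overlapping, or a non-final fragment claiming to
     be the last one, makes the whole packet invalid. *)
  let rec inconsistent = function
    | a :: (b :: _ as rest) ->
      stop a > b.offset || not a.more || inconsistent rest
    | _ -> false
  in
  (* Complete means: contiguous from offset 0, ending in a fragment
     without the "more fragments" flag. *)
  let rec complete expected = function
    | [] -> false
    | [ last ] -> last.offset = expected && not last.more
    | f :: rest -> f.offset = expected && complete (stop f) rest
  in
  if List.length frags > max_fragments || inconsistent sorted then
    Error `Drop
  else if complete 0 sorted then
    Ok (String.concat "" (List.map (fun f -> f.data) sorted))
  else Error `Incomplete
```

Sorting by offset before checking is what makes arrival order irrelevant in this sketch; a cache-based implementation would instead insert each fragment as it arrives.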
The layering and allocation discipline has been revised. ethernet (now encapsulating and decapsulating Ethernet) and arp (the address resolution protocol) are separate opam packages, and no longer part of tcpip.
At the lowest layer, mirage-net is the network device. This interface is implemented by our different backends (xen, solo5, unix, macos, and vnetif). Some backends require buffers to be page-aligned when they are passed to the host system. This was previously not really ensured: while the abstract type page_aligned_buffer was required, write (and writev) took the abstract buffer type (always constrained to Cstruct.t by mirage-net-lwt). The mtu (maximum transmission unit) used to be an optional connect argument to the Ethernet layer, but now it is a function which needs to be provided by mirage-net.
The Mirage_net.write function now has a signature that is explicit about ownership and lifetime: val write : t -> size:int -> (buffer -> int) -> (unit, error) result io.
It requires a requested size argument and a fill function, which is called with an allocated buffer that satisfies the backend's demands. The fill function is supposed to write to the buffer and return the length of the frame to be sent out. It can neither error (who should handle such an error anyway?), nor is it in the IO monad. The fill function should not save any references to the buffer, since this is the network device's memory, and may be reused. The writev function has been removed.
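The ownership discipline can be sketched with a toy device in plain OCaml. Everything here is hypothetical: the device record, create, the `Invalid_length error, and the use of Bytes.t instead of the real buffer type. The io monad is omitted, and treating an out-of-range return value from fill as an error is a choice of this sketch, not a statement about the real backends.

```ocaml
(* Hypothetical device that records the frames it "sent". *)
type device = { mtu : int; mutable sent : string list }

let create ~mtu = { mtu; sent = [] }

(* Simplified analogue of write: the device allocates the buffer, the
   caller only gets to fill it and report how many bytes it wrote. *)
let write dev ~size fill =
  if size < 0 || size > dev.mtu then Error `Invalid_length
  else begin
    (* A real backend would hand out a suitably aligned view of its own
       memory; this sketch allocates a zeroed scratch buffer instead. *)
    let buf = Bytes.make size '\000' in
    let len = fill buf in
    if len < 0 || len > size then Error `Invalid_length
    else begin
      dev.sent <- Bytes.sub_string buf 0 len :: dev.sent;
      Ok ()
    end
  end

(* Example caller: the fill function is pure with respect to the device
   and keeps no reference to [buf] after returning. *)
let send_string dev s =
  write dev ~size:(String.length s) (fun buf ->
      Bytes.blit_string s 0 buf 0 (String.length s);
      String.length s)
```

Because the caller never allocates, the device is free to choose alignment and to reuse memory between calls, which is the point of the revised signature.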
The Ethernet layer does encapsulation and decapsulation now. Its write function has the following signature:
val write: t -> ?src:macaddr -> macaddr -> Ethernet.proto -> ?size:int -> (buffer -> int) -> (unit, error) result io.
It fills in the Ethernet header with the given source address (defaults to the device's own MAC address) and destination address, and Ethernet protocol. The size argument is optional, and defaults to the MTU. The buffer that is passed to the fill function is usable from offset 0 on. The Ethernet header is not visible at higher layers.
The IP layer also embeds a revised write signature:
val write: t -> ?fragment:bool -> ?ttl:int -> ?src:ipaddr -> ipaddr -> Ip.proto -> ?size:int -> (buffer -> int) -> buffer list -> (unit, error) result io.
This is similar to the Ethernet signature - it writes the IPv4 header and sends a packet. It also supports fragmentation (including setting the do-not-fragment bit for path MTU discovery) -- whenever the payload is too big for a single frame, it is sent as multiple fragmented IPv4 packets. Additionally, setting the time-to-live is now supported, meaning we now can implement traceroute!
The API used to include two functions, allocate_frame and write, where only buffers allocated by the former should be used in the latter. This has been combined into a single function that takes a fill function and a list of payloads. This change is for maximum flexibility: a higher layer can either construct its header and payload, and pass it to write as payload argument (the buffer list), which is then copied into the buffer(s) allocated by the network device, or the upper layer can provide the callback fill function to assemble its data into the buffer allocated by the network device, to avoid copying. Of course, both can be used - the outgoing packet contains the IPv4 header, and possibly the buffer until the offset returned by fill, and afterwards the payload.
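The combination of a fill callback and a payload list can be sketched as follows. This is an illustration under heavy assumptions, not the tcpip code: view is a made-up stand-in for a buffer window, the header is a placeholder string rather than a real IPv4 header, results are returned as plain strings instead of going through the io monad, and fragmentation is ignored.

```ocaml
(* Minimal stand-in for a buffer view: a window into shared bytes. *)
type view = { bytes : Bytes.t; off : int; len : int }

let shift v n = { v with off = v.off + n; len = v.len - n }

(* Copy [s] to the start of the window [v]. *)
let blit_string s v =
  assert (String.length s <= v.len);
  Bytes.blit_string s 0 v.bytes v.off (String.length s)

(* Lower layer: allocates the frame, lets [fill] write into it, and
   returns the bytes that would go on the wire. *)
let lower_write ~size fill =
  let bytes = Bytes.make size '\000' in
  let n = fill { bytes; off = 0; len = size } in
  Bytes.sub_string bytes 0 n

(* Upper layer: writes its own header, offers the space behind it to
   the caller's [fill] (which sees it starting at offset 0 of its
   view), then copies [payloads] after whatever [fill] produced. *)
let upper_write ~header ~size fill payloads =
  let hlen = String.length header in
  let plen = List.fold_left (fun n p -> n + String.length p) 0 payloads in
  lower_write ~size:(hlen + size + plen) (fun v ->
      blit_string header v;
      let body = shift v hlen in
      let filled = fill body in
      ignore
        (List.fold_left
           (fun v p -> blit_string p v; shift v (String.length p))
           (shift body filled) payloads);
      hlen + filled + plen)
```

A caller that wants to avoid copying writes directly through fill and passes an empty payload list; a caller that already holds assembled buffers passes them as payloads and uses a fill function that returns 0.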
The TCP implementation has preliminary keepalive support.
The ukvm target was renamed to hvt, where solo5-hvt is the monitoring process.
The default random device from the OCaml standard library is now properly seeded using mirage-entropy. In the future, we plan to make the fortuna RNG the default random number generator.
The semantics of arguments passed to a MirageOS unikernel used to vary between different backends; now they're the same everywhere: all arguments are concatenated using the whitespace character as separator, and split on the whitespace character again by parse-argv. To pass a whitespace character in an argument, the whitespace now needs to be escaped: --hello=foo\\ bar.
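The concatenate-then-split behaviour can be illustrated with a short standalone sketch. This is not the parse-argv implementation: split_args and unikernel_args are hypothetical names, only the space character is treated as a separator, and a backslash is assumed to escape whatever character follows it.

```ocaml
(* Split [line] on unescaped spaces; a backslash keeps the next
   character (typically a space) as part of the current argument. *)
let split_args line =
  let buf = Buffer.create 16 in
  (* Push the argument collected so far, if any, onto [acc]. *)
  let flush acc =
    if Buffer.length buf = 0 then acc
    else begin
      let arg = Buffer.contents buf in
      Buffer.clear buf;
      arg :: acc
    end
  in
  let rec go i acc =
    if i >= String.length line then List.rev (flush acc)
    else
      match line.[i] with
      | '\\' when i + 1 < String.length line ->
        Buffer.add_char buf line.[i + 1];
        go (i + 2) acc
      | ' ' -> go (i + 1) (flush acc)
      | c ->
        Buffer.add_char buf c;
        go (i + 1) acc
  in
  go 0 []

(* Model of what the unikernel sees: the host-side arguments are joined
   with a space and then split again. *)
let unikernel_args argv = split_args (String.concat " " argv)
```

In this model, a shell invocation with --hello=foo\\ bar reaches the host process as the two arguments --hello=foo\ and bar; joining them restores the escaped space, so the unikernel sees the single argument --hello=foo bar.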
You may also want to read the MirageOS 3.2.0 announcement and the MirageOS 3.3.0 announcement.
We are working on further changes which switch the mirage internal build system to dune. At the moment it uses ocamlbuild, ocamlfind, pkg-config, and make. The goal of this change is to make MirageOS more developer-friendly. On the horizon we have MirageOS unikernel monorepos, incremental builds, pain-free cross-compilation, documentation generation, ...
Several other MirageOS ecosystem improvements are on the schedule for 2019, including an irmin 2.0 release, a seccomp target for Solo5, and easier deployment and multiple interfaces in Solo5.