By Romain Calascibetta - 2022-04-01
The security of communications poses a seemingly never-ending challenge across Cyberspace. From sorting through mountains of spam to protecting our private messages from malicious hackers, cybersecurity has never been more important than it is today. It takes considerable technical skills and dependable infrastructure to run an email service, and sadly, most companies with the ability to handle the billions of emails sent daily make money off mining your sensitive data.
Five years ago, we set out to explore how to securely send and receive email. It was my final year in an internship at Cambridge, and the goal was to develop an OCaml library that could parse and craft emails. Thus, Mr. MIME was born. I even gave a presentation on it at ICFP 2016 and introduced Mr. MIME in a previous post. Mr. MIME was also selected by the [NGI DAPSI initiative](https://tarides.com/blog/2022-03-08-secure-virtual-messages-in-a-bottle-with-scop) last year.
I'm thrilled to shine a spotlight on Mr. MIME as part of the MirageOS 4 release! It was essential to create several small libraries when building and testing Mr. MIME. I've included some samples of how to use Mr. MIME to parse and serialise emails in OCaml, as well as receiving and sending SMTP messages. I then explain how to use all of this via CLI tools. Since unikernels were the foundation on which I built Mr. MIME, the final section explains how to deploy unikernels to handle email traffic.
The following libraries were created to support Mr. MIME:
pecu as the quoted-printable serialiser/deserialiser. First, if we strictly consider standards, email transmission can use a 7-bit channel, so different encodings were made in order to safely transmit 8-bit contents via such channels. quoted-printable is one of them, where any non-ASCII characters are encoded. Another encoding is the famous UTF-7 (the one from RFC2152, not the one from RFC2060 section 5.1.3), which is available in the yuscii library. Please note, Yugoslavian engineers created the YUSCII encoding to replace the imperial ASCII one.
rosetta is a little library that normalises some inputs such as KOI8-{U,R} or ISO-8859-* to Unicode. This ability permits mrmime to produce only UTF-8 results, which removes the encoding problem. Then, according to RFC6532 and Postel's law, Mr. MIME can produce only UTF-8 emails.
ke is a small library that implements a ring buffer with bigarray. This library has only one purpose: to restrict a transmission's memory consumption via a ring buffer, like Xen's famous shared-memory ring buffer.
emile may be the most useful library for many users. It parses and re-encodes an email address according to standards. Email addresses are hard! Many details exist, and some of them have meaning while others don't. emile proposes the most standardised way to parse email addresses, and it has the smallest dependency cone, so it could be used by any project, regardless of size.
unstrctrd may be the most obscure library, but it's the essential piece of Mr. MIME. From archeological research into the multiple standards that have described emails over time, we discovered the most generic form of any value available in your header: the unstructured form. At the very least, email addresses, Date (RFC822), and DKIM-Signature follow this form. More generally, a form such as this can be found in the Debian package description (the RFC822 form). unstrctrd implements a decoder for it.
prettym is the last library developed in this context. It's like the Format module with ke, and it produces a continuation, which fills a fixed-length buffer. prettym describes how to encode emails while complying with the 80-columns rule, so any emails generated by Mr. MIME fit into a cathode-ray monitor! More importantly, along with the 7-bit limitation, this rule comes from the MTU limitation of routers, and it's required from the standard point-of-view.
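To give an idea of what the 80-columns rule means in practice, here is a minimal sketch that uses only the OCaml standard library. It is not prettym's API: fold and to_wire are hypothetical helpers, and the default width of 78 assumes the usual reading of the rule (78 characters plus the CRLF pair). The sketch only breaks between words and starts continuation lines with a space, as folded header fields do.

```ocaml
(* Hypothetical helper, not part of prettym: fold a list of words into lines
   of at most [width] characters. Lines are only broken between words, so a
   single word longer than [width] still ends up on a line of its own. *)
let fold ?(width = 78) (words : string list) : string list =
  let rec go line acc = function
    | [] -> List.rev (if line = "" then acc else line :: acc)
    | word :: rest when line = "" -> go word acc rest
    | word :: rest ->
      if String.length line + 1 + String.length word <= width
      then go (line ^ " " ^ word) acc rest
      (* Continuation lines start with a space. *)
      else go (" " ^ word) (line :: acc) rest in
  go "" [] words

(* Join the folded lines with CRLF, as they would appear on the wire. *)
let to_wire (lines : string list) : string =
  String.concat "\r\n" lines ^ "\r\n"

let () =
  let words = [ "Subject:"; "A"; "simple"; "email" ] in
  print_string (to_wire (fold ~width:10 words))
```

A real encoder also has to respect the structure of each field (an email address cannot be broken anywhere), which is the part prettym takes care of.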
From all of these, we developed mrmime, a library that can transform your email into an OCaml value and create an email from it. This work also provides pieces that are necessary in other contexts, especially the multipart format. We decided to extract the relevant piece of software and make a new library more specialised for HTTP (which shares many things with email), then integrate it into Dream. For an example, see multipart_form.
A huge amount of work has been done on mrmime to ensure a kind of isomorphism, such as x = decode(encode(x)). For this goal, we created a fuzzer that can generate emails. Next, we tried to encode each one and then decode the result. Finally, we compared the results and checked that they were semantically equal. This enables us to generate many emails and confirm that Mr. MIME won't alter their values.
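The round-trip property can be sketched with the standard library alone. The code below is only an illustration of the idea, not the fuzzer used for mrmime: encode and decode are a toy hexadecimal codec standing in for the email encoder and decoder, and random_string and roundtrip_holds are hypothetical names.

```ocaml
(* Toy codec standing in for an email encoder/decoder: hexadecimal. *)
let encode (s : string) : string =
  let buf = Buffer.create (2 * String.length s) in
  String.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
    s;
  Buffer.contents buf

let hex_value = function
  | '0' .. '9' as c -> Some (Char.code c - Char.code '0')
  | 'a' .. 'f' as c -> Some (Char.code c - Char.code 'a' + 10)
  | _ -> None

(* Returns [None] on malformed input instead of raising. *)
let decode (s : string) : string option =
  let n = String.length s in
  if n mod 2 <> 0 then None
  else
    let buf = Buffer.create (n / 2) in
    let rec go i =
      if i >= n then Some (Buffer.contents buf)
      else
        match hex_value s.[i], hex_value s.[i + 1] with
        | Some hi, Some lo ->
          Buffer.add_char buf (Char.chr ((hi * 16) + lo));
          go (i + 2)
        | _ -> None in
    go 0

(* Generate an arbitrary byte string of fewer than 64 bytes. *)
let random_string state =
  let len = Random.State.int state 64 in
  String.init len (fun _ -> Char.chr (Random.State.int state 256))

(* Check x = decode (encode x) on [count] generated inputs. *)
let roundtrip_holds ~seed ~count =
  let state = Random.State.make [| seed |] in
  let rec go i =
    i >= count
    || (let x = random_string state in
        decode (encode x) = Some x && go (i + 1)) in
  go 0

let () =
  if roundtrip_holds ~seed:42 ~count:1000 then print_endline "round-trip holds"
  else print_endline "round-trip broken"
```

For emails, byte equality is too strict a comparison (folding and encodings may legitimately differ), which is why the text above speaks of semantic equality.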
We also produced a large corpus of emails (a million) that follows the standards. It's really interesting work because it offers the community a free corpus of emails where implementations can check their reliability through Mr. MIME. For a long time after we released Mr. MIME, users wondered how to confirm that what they decoded is what they wanted. It's easy! Just do as we did! Give a million emails to Mr. MIME and see for yourself. It never fails to decode them all!
At first, we discovered a problem with this implementation because we couldn't verify that Mr. MIME correctly parsed the emails, but we fixed that through our work on hamlet.
hamlet proposes a large corpus of emails, which proves the reliability of Mr. MIME, and mrmime can parse any of these emails. They can be re-encoded, and mrmime doesn't alter anything at any step. We ensure correspondence between the parser and the encoder, and we can finally say that mrmime gives us the expected result after parsing an email.
It's pretty easy to manipulate and craft an email with Mr. MIME, and from our work (especially on hamlet), we are convinced it's reliable. Here are some examples of Mr. MIME in OCaml to show you how to create an email and how to introspect & analyse an email:
open Mrmime

(* Sender and recipient mailboxes. *)
let romain_calascibetta =
  let open Mailbox in
  Local.[ w "romain"; w "calascibetta" ] @ Domain.(domain, [ a "gmail"; a "com" ])

let tarides =
  let open Mailbox in
  Local.[ w "contact" ] @ Domain.(domain, [ a "tarides"; a "com" ])

let date = Date.of_ptime ~zone:Date.Zone.GMT (Ptime_clock.now ())

let content_type =
  Content_type.(make `Text (Subtype.v `Text "plain") Parameters.empty)

let subject =
  let open Unstructured.Craft in
  compile [ v "A"; sp 1; v "simple"; sp 1; v "email" ]

(* The header of the part; the body is encoded according to it. *)
let header =
  let open Header in
  empty
  |> add Field_name.date Field.(Date, date)
  |> add Field_name.subject Field.(Unstructured, subject)
  |> add Field_name.from Field.(Mailboxes, [ romain_calascibetta ])
  |> add (Field_name.v "To") Field.(Addresses, Address.[ mailbox tarides ])
  |> add Field_name.content_encoding Field.(Encoding, `Quoted_printable)

(* Feed the body line by line from the standard input. *)
let stream_of_stdin () = match input_line stdin with
  | line -> Some (line, 0, String.length line)
  | exception _ -> None

let v =
  let part = Mt.part ~header stream_of_stdin in
  Mt.make Header.empty Mt.simple part

(* Serialise the email to the standard output. *)
let () =
  let stream = Mt.to_stream v in
  let rec go () = match stream () with
    | Some (str, off, len) ->
      output_substring stdout str off len ;
      go ()
    | None -> () in
  go ()

(* $ ocamlfind opt -linkpkg -package mrmime,ptime.clock.os in.ml -o in.exe
   $ echo "Hello World\!" | ./in.exe > mail.eml
*)
In the example above, we wanted to create a simple email with an incoming body using the standard input. It shows that mrmime is able to encode the body correctly according to the given header. For instance, we used the quoted-printable encoding (implemented by pecu).
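To make the quoted-printable step more concrete, here is a minimal standard-library sketch of the encoding rules from RFC 2045. It is not pecu's API: encode_line is a hypothetical helper, and it assumes its input is a single line, so any CR or LF byte is simply escaped like other non-printable bytes.

```ocaml
(* Hypothetical helper, not pecu's API: quoted-printable encoding of one
   line. Printable ASCII is kept as is, except '='; every other byte becomes
   "=XX" in uppercase hexadecimal. Encoded lines never exceed 76 characters:
   longer ones are cut with a soft line break ("=" followed by CRLF). *)
let encode_line (s : string) : string =
  let out = Buffer.create (String.length s) in
  let column = ref 0 in
  let emit token =
    (* Keep one column free for the "=" of a soft line break. *)
    if !column + String.length token > 75 then begin
      Buffer.add_string out "=\r\n";
      column := 0
    end;
    Buffer.add_string out token;
    column := !column + String.length token in
  let last = String.length s - 1 in
  String.iteri
    (fun i c ->
      let literal =
        match c with
        | '=' -> false
        | ' ' | '\t' -> i < last (* trailing whitespace must be escaped *)
        | '!' .. '~' -> true
        | _ -> false in
      if literal then emit (String.make 1 c)
      else emit (Printf.sprintf "=%02X" (Char.code c)))
    s;
  Buffer.contents out

let () =
  (* "café" encoded in UTF-8: the two bytes of "é" are escaped. *)
  print_endline (encode_line "caf\xc3\xa9")
```

A complete codec also needs the decoding direction and proper handling of hard line breaks in the input, which this sketch leaves out.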
In the example below, we read an email from the standard input, extract its header, and collect the email addresses (from the From, To, Cc, Bcc and Sender fields). Then, we print them:
open Mrmime

(* Header fields we care about, with the witness of the value each one holds. *)
let ps =
  let open Field_name in
  Map.empty
  |> Map.add from Field.(Witness Mailboxes)
  |> Map.add (v "To") Field.(Witness Addresses)
  |> Map.add cc Field.(Witness Addresses)
  |> Map.add bcc Field.(Witness Addresses)
  |> Map.add sender Field.(Witness Mailbox)

let parse ic =
  let decoder = Hd.decoder ps in
  let rec go (addresses : Emile.mailbox list) =
    match Hd.decode decoder with
    | `Malformed err -> failwith err
    | `Field field ->
      ( match Location.prj field with
      | Field (_, Mailboxes, vs) ->
        go (vs @ addresses)
      | Field (_, Mailbox, v) ->
        go (v :: addresses)
      | Field (_, Addresses, vs) ->
        (* Flatten groups into their mailboxes. *)
        let vs =
          let f = function
            | `Group { Emile.mailboxes; _ } ->
              mailboxes
            | `Mailbox m -> [ m ] in
          List.(concat (map f vs)) in
        go (vs @ addresses)
      | _ -> go addresses )
    | `End _ -> addresses
    | `Await -> match input_line ic with
      | "" -> go addresses
      | line
        when String.length line >= 1
          && line.[String.length line - 1] = '\r' ->
        (* The line already ends with CR: only the LF is missing. *)
        Hd.src decoder (line ^ "\n") 0
          (String.length line + 1) ;
        go addresses
      | line ->
        (* input_line strips the LF: restore a full CRLF. *)
        Hd.src decoder (line ^ "\r\n") 0
          (String.length line + 2) ;
        go addresses
      | exception _ ->
        (* No more input: give the decoder an empty source. *)
        Hd.src decoder "" 0 0 ;
        go addresses in
  go []

let () =
  let vs = parse stdin in
  List.iter (Format.printf "%a\n%!" Emile.pp_mailbox) vs

(* $ ocamlfind opt -linkpkg -package mrmime out.ml -o out.exe
   $ echo "Hello World\!" | ./in.exe | ./out.exe
   romain.calascibetta@gmail.com
   contact@tarides.com
*)
From this library, we're able to process emails correctly and verify some meta-information, or we can include some meta-data, such as the Received: field.
Of course, when we talk about email, we must talk about SMTP (described by RFC5321). This protocol is an old one (see RFC821 - 1982), and it comes with many things such as:
Throughout this protocol's history, we tried to pay attention to CVEs like:
A reimplementation of the SMTP protocol becomes an archeological job where we must be aware of its history through the evolution of its standards, usages, and experiments, so we tried to find the best way to implement the protocol.
We decided to implement a simple framework in order to describe the state machine of an SMTP server that can upgrade its flow to TLS, so we created colombe as a simple library to implement the foundations of the protocol. In the spirit of MirageOS projects, colombe doesn't depend on lwt, async, or any specific TCP/IP stack, so we keep the ability to handle the incoming/outgoing flow ourselves during the process, especially when we want to test/mock our state machine.
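The benefit of keeping the state machine free of any I/O can be illustrated with a small sketch. This is not colombe's API: the command, action, and state types and the step and run functions are hypothetical, and the sketch covers only a handful of commands. The reply codes are the usual ones from RFC 5321 and RFC 3207. Because step is a pure function, a whole session can be mocked as a list of commands.

```ocaml
(* Hypothetical types, not colombe's API. *)
type command =
  | Ehlo of string
  | Starttls
  | Mail_from of string
  | Rcpt_to of string
  | Data
  | Quit

(* What the caller, who owns the actual flow, is asked to do. *)
type action =
  | Reply of int * string
  | Upgrade_to_tls                        (* perform the TLS handshake *)
  | Read_body of string * string list     (* sender and recipients *)

type state =
  | Start of { tls : bool }               (* waiting for EHLO *)
  | Ready of { tls : bool }               (* EHLO accepted *)
  | Envelope of { sender : string; rcpts : string list } (* only under TLS *)
  | Closed

(* One pure transition: no socket, no scheduler, no TLS library involved. *)
let step (state : state) (command : command) : state * action list =
  match state, command with
  | Closed, _ -> Closed, []
  | _, Quit -> Closed, [ Reply (221, "Bye") ]
  | Start { tls }, Ehlo _ -> Ready { tls }, [ Reply (250, "Hello") ]
  | Ready { tls = false }, Starttls ->
    (* After the upgrade, the client has to send EHLO again. *)
    Start { tls = true }, [ Reply (220, "Ready to start TLS"); Upgrade_to_tls ]
  | Ready { tls = false }, Mail_from _ ->
    state, [ Reply (530, "Must issue a STARTTLS command first") ]
  | Ready { tls = true }, Mail_from sender ->
    Envelope { sender; rcpts = [] }, [ Reply (250, "OK") ]
  | Envelope { sender; rcpts }, Rcpt_to rcpt ->
    Envelope { sender; rcpts = rcpt :: rcpts }, [ Reply (250, "OK") ]
  | Envelope { sender; rcpts = (_ :: _ as rcpts) }, Data ->
    ( Ready { tls = true },
      [ Reply (354, "Start mail input"); Read_body (sender, List.rev rcpts) ] )
  | _, _ -> state, [ Reply (503, "Bad sequence of commands") ]

(* Drive the machine with a mocked session and collect every action. *)
let run (commands : command list) : state * action list =
  List.fold_left
    (fun (state, log) command ->
      let state, actions = step state command in
      state, log @ actions)
    (Start { tls = false }, [])
    commands
```

In a real server, the caller would interpret each action with whatever flow it has (a Unix socket, a MirageOS TCP/IP stack, a test buffer), which is the separation the paragraph above describes.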
With such a design, it becomes easy to integrate a TLS stack. We decided to provide (by default) the SMTP protocol with the STARTTLS command via the great ocaml-tls project. Of course, the end user can choose something else if they want.
From all the above, we recently implemented sendmail (and its derivation with STARTTLS), which is currently used by some projects such as letters and Sihl or Dream, to send an email to some existing services (see Mailgun or Sendgrid). Thanks to these outsiders for using our work!
mrmime is the bedrock of our email stack. With mrmime, it's possible to manipulate emails as the user wants, so we developed several tools to help the user manipulate emails:
ocaml-dkim provides a tool to verify and sign an email. This tool is interesting because we put a lot of effort into ensuring that the verification is really memory-bound. Indeed, many tools that verify the DKIM signature do two passes: one to extract the signature and the second to verify. However, it's possible to combine these two steps into one and ensure that such verification can be "piped" into a larger process (such as an SMTP reception server).
uspf provides a verification tool for meta-information about the email's source (such as the IP address of the sender), and ensures that the email didn't come from an untrusted source. Like ocaml-dkim, it's a simple tool that can be "piped" into a larger process.
ocaml-maildir is a MirageOS project that manipulates a maildir "store." Similar to MirageOS, ocaml-maildir provides a multitude of backends, depending on your context. Of course, the default backend is Unix, but we planned to use ocaml-maildir with Irmin.
ocaml-dmarc is finally the tool which aggregates SPF and DKIM meta-information to verify an incoming email (if it comes from an expected authority and wasn't altered).
spamtacus is a tool which analyses the incoming email to determine if it's spam or not. It filters incoming emails and rejects spam.
conan is an experimental tool that re-implements the command file to recognise the MIME type of a given file. Its status is still experimental, but outcomes are promising! We hope to continue the development of it to improve the whole MirageOS stack.
blaze is the end-user tool. It aggregates many small programs in the Unix spirit. Every tool can be used with "pipe" (|) and allows the user to do something more complex with their emails. It permits an introspection of our emails in order to aggregate some information, and it proposes a "functional" way to craft and send an email, as you can see below:
$ blaze.make --from din@osau.re \
  | blaze.make wrap --mixed \
  | blaze.make put --type image/png --encoding base64 image.png \
  | blaze.submit --sender din@osau.re --password ****** osau.re
Currently, our development mainly follows the same pattern:
blaze is a part of this workflow where you can find:
blaze.dkim which uses ocaml-dkim
blaze.spf which uses uspf
blaze.mdir which uses ocaml-maildir
blaze.recv to produce a graph of the route of our email
blaze.send/blaze.submit to send an email to a recipient/an authority
blaze.srv which launches a simple SMTP server to receive one email
blaze.descr which describes the structure of your email
It's interesting to split and prioritise goals of all email possibilities instead of making a monolithic tool which supports far too wide a range of features, although that could also be useful. We ensure a healthy separation between all functionalities and make the user responsible through a self-learning experience, because the most useful black-box does not really help.
As previously mentioned, we developed all of these libraries in the spirit of MirageOS. This mainly means that they should work everywhere, given that we gave great attention to dependencies and abstractions. The goal is to provide a full SMTP stack that's able to send and receive emails.
This work was funded by the NGI DAPSI project, which was jointly funded by the EU's Horizon 2020 research and innovation programme (contract No. 871498) and the Commissioned Research of National Institute of Information.
Such an endeavour takes a huge amount of work on the MirageOS side in order to "scale-up" our infrastructure and deploy many unikernels automatically, so we can propose a coherent final service. We currently use:
albatross as the daemon which deploys unikernels
ocurrent as the Continuous Integration pipeline that compiles unikernels from the source and asks albatross to deploy them
We have a self-contained infrastructure. It does not require extra resources, and you can bootstrap a full SMTP service from what we did with required layouts for SPF, DKIM, and DMARC. Our SMTP stack requires a DNS stack already developed and used by mirageos.org. From that, we provide a submit service and a receiver that redirects incoming emails to their real identities.
This graph shows our infrastructure:
As you can see, we have seven unikernels:
foo@<my-domain>) from the Git database, the relay knows that the real address is foo@gmail.com. Thus, it will retransfer the incoming email to the correct SMTP service.
An eighth unikernel can help provide a Let's Encrypt certificate under your domain name. This ensures a secure TLS connection from a recognised authority. At the boot of the submission server and the receiver, they ask this unikernel to obtain and use a certificate. Users can now submit emails in a secure way, and senders can transmit their emails in a secure way, too.
The SMTP stack is pretty complex, but any of these unikernels can be used separately from the others. Finally, a full tutorial to deploy this stack from scratch is available here, and the development of unikernels is available in the ptt (Poste, Télégraphe, and Téléphone) repository.