11th Mar 2015: More racing, Testing, Networking, Large merges and Cleanups
Attendees: Amir Chaudhry (chair), Thomas Gazagnaire, David Kaloper, Thomas Leonard, Jon Ludlam, Anil Madhavapeddy, Hannes Mehnert, Richard Mortier and Dave Scott
The startOfDay race between netback and netfront, discussed on the last call, is still unresolved. It remains difficult to reproduce, but we now have a minimal example of the problem, which should help greatly. It only seems to surface on a live Xen host, which is what makes it tricky to deal with. It would be very useful if anyone could provide other examples of this problem, especially to establish whether the issue exists on x86 but hasn't surfaced yet.
In the meantime, we might need to put a 'sleep' into mirage-net-xen as a stopgap to get it working. Ironically, this issue didn't occur until ThomasL's optimisations, which in a way means they were good enough to surface it (in other words, this is still progress!).
The testing breakout squad we proposed didn't quite convene, but testing is still getting better. Anil has been fixing breakages reported via is-mirage-broken, and things look good now that mirage-skeleton builds successfully across all platforms. cohttp and TLS are also part of those tests, and some of them are in a known-broken state (due to other outstanding work). We need to get this up to a 100% pass rate before we do much more. There are still some more tweaks to commit, but things look much better than they did before.
Magnus has been working on a Mirage net front end (including pcap), so we don't need a real network to test. There is also the testing repo from Masoud, which Mort will try out. We need a Xen machine to do this, so we will have to discuss where this can take place.
Amir noticed a QuickCheck library in OPAM, so perhaps we could use it. However, Anil pointed out that there is likely to be a fair amount of plumbing and scaffolding involved in getting to something useful (which still needs someone to put in the effort). Dave has been using BISECT and coveralls, which he's been happy with (and slightly addicted to). You can see examples of this in the shared-block-ring repo (which is at 81% coverage and climbing!). It would be good to get some notes written down and sent out to the list about this process and about hooking it together with Travis.
So far you'll notice that we've mainly discussed issues relating to the network stack. It's obvious that this is a critical piece of infrastructure, so we need to continue the improvements we've been making. However, we also need to be charting this progress, which means gathering baseline numbers before doing releases. We'll be able to do much more functional testing when Magnus' work is complete, and Anil will also talk to Mindy about her work. Overall, things are going well, we're continually improving, and it would be good to monitor that progress.
There are a couple of wide-ranging changes that are being tracked and releases of various libraries need to be co-ordinated appropriately.
Removing DEVICE.connect (mirage/mirage#350) — As of yesterday, all the
relevant libraries have been updated and released so the
changes have now been merged. This involved coordinated changes and releases
of at least 15 libraries. This also allowed us to update the old repos with
new Travis scripts and improvements to OPAM metadata. As a result of this and
other activity, we released Mirage 2.3.0 into the upstream OPAM
TLS Support (mirage/mirage-dev#52) — This is still awaiting patch reviews and merges across a number of different libraries, which are nearly complete. The remaining pieces are changes to Conduit and OCaml-TLS, and a release of Ctypes 0.4.0. Once TLS support is complete, we will be able to release Mirage 2.4.0.
Some informal feedback from a few people has been that MirageOS is still an intimidating project to get involved with. To some extent this is unavoidable as certain aspects require experience of systems/protocols before diving in (or the will and patience to acquire that experience), and we're still building out our set of simpler Pioneer Projects. However, there are certain things we can do to help make it easier for people who may be looking around but are still reluctant to email the list. For example, we could work on the following over the next few months:
GitHub Org cleanup — We have a large number of repos accumulating on the
Mirage GitHub Org and a number of them are no longer relevant (cf.
We could identify these libraries and move them out into a separate Org
(say, mirage-attic) so that only current repos remain in the primary
organisation — we'd have to also update any OPAM-repo metadata as part of this
(it's highly unlikely that anyone is using these libs — it's just so that we do
this properly). Doing this would allow a newcomer to be sure that any of the
libraries in the GitHub org are worth looking at. In addition to this, we
should also clean up stale issues across all the remaining repos so that they
reflect current reality.
Improve documentation — Due to separate efforts towards an OCaml Platform, we now have new documentation generation tools available to us. These are still in alpha but they go some way to improving the docs we can produce for collections of libraries (CSS notwithstanding). We'd encourage people to begin using these tools (and submitting patches if possible). In addition, there have been a lot of blog posts about how to do things with Mirage and it's time we turned those into actual documentation on the website. In other words, sweep through the material we've created and backport it onto the site. This makes it significantly easier to maintain as the underlying tools progress.
Neither of the above activities is particularly challenging, but it's more about setting aside the time to complete them. In the past we've proposed having a 'doc-day' of our own (similar to Xen's document days), so that we can get together and just tick things off. We may need to schedule a few days like that to get these things done.
We would like to move the Mirage site to be HTTPS-only, which needs TLS. Hannes is keen to do this and Anil has all the necessary certificates. This will also need TLS support in Conduit. Running the website this way will test the whole stack. Amir notes that the website is still the canary: when something breaks end-to-end, that's the first place we notice. We need to fix things quickly when they do go down.
Irmin feedback: There have been some discussions on the mailing list about the API, along with feedback from early users (ThomasL and KC). It turns out that some aspects of Dave's work on shared-block-ring can help with some of ThomasG's activities.