That's a great point, but I think the goal of what Bryan has been doing is to make Linux work with zones (and DTrace).
That's a primitive Joyent has wanted to upstream into the Linux kernel for a long time, but it has never been able to build the necessary consensus around it (similar to OpenVZ's troubles getting its work upstreamed).
In short, this is sort of a hack to give you zones on Linux without needing to get zones into upstream. Yes, there's no Linux code, but making something like this work requires a lot of understanding of Linux code.
It's kinda amazing that they got 64-bit Linux to run on top of illumos, right? I did not see that coming, and maybe that's because I'm ignorant in some capacity, but it's been a pleasant surprise.
Emulations have been a Unix feature for a long time, actually. NetBSD has had 64-bit Linux emulation for ages, for example, but it's not very complete because no one has cared enough to implement more. AFAIK, illumos is the first system to emulate epoll. The Linux API is huge, and historically the process has just been fixing whatever a particular binary someone wants to run needs. It is very tedious work...
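That fix-it-per-binary workflow can be pictured as a dispatch table with a fallback. The sketch below is a toy in Python, not code from NetBSD or illumos; `SyscallTable`, `implement`, and `dispatch` are hypothetical names. The one real convention it borrows is that Linux reports an unimplemented system call as `-ENOSYS`; counting those misses is one way to decide what to implement next.

```python
import errno
from collections import Counter
from typing import Callable

Handler = Callable[..., int]


class SyscallTable:
    """Hypothetical toy model of a Linux-emulation dispatch table keyed by
    syscall name. Missing entries fail with -ENOSYS (the Linux convention
    for an unimplemented syscall) and are counted, so the gaps that real
    binaries actually hit surface first."""

    def __init__(self) -> None:
        self.handlers: dict[str, Handler] = {}
        self.missing: Counter[str] = Counter()

    def implement(self, name: str, handler: Handler) -> None:
        # Whack-a-mole step: fill in one syscall once a binary needs it.
        self.handlers[name] = handler

    def dispatch(self, name: str, *args) -> int:
        handler = self.handlers.get(name)
        if handler is None:
            self.missing[name] += 1
            return -errno.ENOSYS
        return handler(*args)


if __name__ == "__main__":
    table = SyscallTable()
    table.implement("getpid", lambda: 1234)
    table.dispatch("getpid")
    table.dispatch("epoll_create1", 0)
    print(table.missing.most_common())
```

Running a binary, reading `missing`, and implementing the top entry is the tedious loop described above.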
I don't really see it as zones on Linux. More a gateway drug for non-Linux.
Hey Justin -- do you have insight into how hard any particular remapping (e.g., epoll) is to perform? I was talking to @bcantrill about their effort at a Docker meetup and mentioned the NetBSD emulation (he said "Oh! Of course!"), but what's interesting (in retrospect) is that they (Joyent) just tried running stuff and played whack-a-mole with unimplemented APIs... How tough would it be for "us" (NetBSD) to occasionally implement pieces?
It is just tedious, and you need motivation. Especially as Linux has a lot of interfaces, many of which are frustratingly annoying: there are three file change notification interfaces, from different eras. In fact, there are at least two of everything!
I imagine much of the Joyent code could be easily ported to NetBSD/FreeBSD (which, as of a few months back, also has a 64-bit interface). epoll may well be the most difficult (it has edge- and level-triggered events, among other annoyances), but a not-very-performant version should be doable.
Mostly, few people have been interested. I have a decent test suite, though (rump-based), so email me if you are interested...
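The edge- versus level-triggered distinction is one thing any epoll emulation has to reproduce exactly. A minimal sketch using Python's `select.epoll` wrapper (Linux only); `count_wakeups` is a hypothetical helper written for illustration. It leaves one unread byte in a pipe and polls repeatedly: a level-triggered registration reports the pipe on every wait, while `EPOLLET` reports it once and then stays quiet until new data arrives.

```python
import os
import select


def count_wakeups(edge_triggered: bool, polls: int = 3) -> int:
    """Register a readable pipe with epoll and count how many of `polls`
    consecutive non-blocking waits report it, without ever reading."""
    r, w = os.pipe()
    ep = select.epoll()
    try:
        os.write(w, b"x")  # make the read end readable before registering
        mask = select.EPOLLIN
        if edge_triggered:
            mask |= select.EPOLLET
        ep.register(r, mask)
        wakeups = 0
        for _ in range(polls):
            # timeout=0 returns immediately; the byte stays in the pipe
            if ep.poll(0):
                wakeups += 1
        return wakeups
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print("level-triggered:", count_wakeups(edge_triggered=False))
    print("edge-triggered: ", count_wakeups(edge_triggered=True))
```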
Speaking without familiarity with NetBSD, I think it depends on what kernel facilities the system happens to have; speaking for SmartOS/illumos, in many cases we were able to slightly rephrase Linux facilities in terms of extant facilities -- saving a considerable amount of time and effort. For example, the big realization with epoll was just how naive it is -- so much so, in fact, that it actually looks very similar to a pre-ports mechanism (/dev/poll) that we developed nearly 20 years ago (!!) and later deprecated in favor of ports. epoll would have been much nastier without /dev/poll -- which is likely the greatest service that /dev/poll has ever provided anyone...
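The idea of rephrasing epoll in terms of an older register-once, wait-many facility can be sketched in Python. This is not the illumos implementation (which lives in the kernel and is written in C); `EpollShim` is a hypothetical class that maps the three `epoll_ctl` operations onto a stateful `select.poll` object, which has the same shape as /dev/poll: the interest set lives in one object, is edited incrementally, and is then waited on repeatedly. It is level-triggered only and deliberately ignores `EPOLLET`, `EPOLLONESHOT`, and the fork/close semantics.

```python
import os
import select

# Linux epoll_ctl operation codes, as defined in <sys/epoll.h>.
EPOLL_CTL_ADD, EPOLL_CTL_DEL, EPOLL_CTL_MOD = 1, 2, 3


class EpollShim:
    """Hypothetical level-triggered epoll built on a stateful poll set."""

    def __init__(self) -> None:
        self._poller = select.poll()
        self._interest: set[int] = set()

    def ctl(self, op: int, fd: int, events: int = 0) -> None:
        if op == EPOLL_CTL_ADD:
            if fd in self._interest:
                raise FileExistsError(fd)    # epoll_ctl fails with EEXIST
            self._poller.register(fd, events)
            self._interest.add(fd)
        elif op == EPOLL_CTL_MOD:
            if fd not in self._interest:
                raise FileNotFoundError(fd)  # epoll_ctl fails with ENOENT
            self._poller.modify(fd, events)
        elif op == EPOLL_CTL_DEL:
            if fd not in self._interest:
                raise FileNotFoundError(fd)
            self._poller.unregister(fd)
            self._interest.discard(fd)
        else:
            raise ValueError(f"unknown op {op}")

    def wait(self, timeout_ms: int = -1) -> list[tuple[int, int]]:
        # poll() already yields (fd, revents) pairs, which is all a
        # level-triggered epoll_wait needs to report.
        return self._poller.poll(timeout_ms)


if __name__ == "__main__":
    r, w = os.pipe()
    ep = EpollShim()
    ep.ctl(EPOLL_CTL_ADD, r, select.POLLIN)
    os.write(w, b"x")
    print(ep.wait(0))
```

Most of the work in such a mapping is argument checking and error codes; the event loop itself comes for free from the older facility.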
Yes, NetBSD added some Linux-like facilities (and generally missing functions) where that made sense. No one did epoll, as kqueue is a bit of a mismatch and we never had /dev/poll...
A lot of the issue is just testing: NetBSD does not have any in-tree tests for compat. I have some out-of-tree ones, which help a lot.
Hi Bryan -- I'm also aware that epoll may have been a bad example on my part: isn't it subject to some nasty fork/share bugs with respect to the handle itself and which file is actually associated with it? A parent can get notifications on a handle it doesn't have or, worse, notifications for a socket it does have that is not really the same handle that's issuing the event.
In cases like that, did you end up trying to be bug-compatible, or did you make a design decision to clear up the trouble?
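The Linux behavior asked about here is easy to reproduce. In this sketch (Linux only), `os.dup` stands in for the implicit duplication that fork(2) performs, since both leave two descriptors sharing one open file description; `events_after_close` is a hypothetical helper. Because Linux removes an interest-list entry only when every descriptor referring to the description is closed, the event still arrives tagged with a descriptor number the process no longer has open.

```python
import os
import select


def events_after_close() -> tuple[int, list[tuple[int, int]]]:
    """Return (closed_fd, events) after closing an epoll-registered fd
    whose open file description is kept alive by a duplicate."""
    r, w = os.pipe()
    r_dup = os.dup(r)          # second fd, same open file description
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN)
        os.close(r)            # close the fd that was actually registered
        os.write(w, b"x")      # make the shared description readable
        # The registration survives because r_dup keeps the description
        # open, so the event is reported under the now-closed fd number.
        return r, ep.poll(0)
    finally:
        ep.close()
        os.close(r_dup)
        os.close(w)


if __name__ == "__main__":
    closed_fd, events = events_after_close()
    print(f"closed fd {closed_fd}, events: {events}")
```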
Funny you should mention that one in particular -- from our (SmartOS's) epoll(5) man page:
While a best effort has been made to mimic the Linux semantics, there
are some semantics that are too peculiar or ill-conceived to merit
accommodation. In particular, the Linux epoll facility will -- by
design -- continue to generate events for closed file descriptors
where/when the underlying file description remains open. For example,
if one were to fork(2) and subsequently close an actively epoll'd file
descriptor in the parent, any events generated in the child on the
implicitly duplicated file descriptor will continue to be delivered to
the parent -- despite the fact that the parent itself no longer has any
notion of the file description! This epoll facility refuses to honor
these semantics; closing the EPOLL_CTL_ADD'd file descriptor will
always result in no further events being generated for that event
description.
So while we do aspire to be bug-compatible, we're not about to compromise our principles over it. More details (or some of them, anyway) can be found in the talk on LX-branded zones that I gave at illumos Day at Surge 2014.[1][2]
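The difference between the two close-time rules fits in a few lines of model code. This is a toy in Python, not kernel code: `FileDescription`, `FdTable`, `InterestList`, and `fork_then_close` are hypothetical names, and the model covers only the rule the man page excerpt describes, namely what happens to an interest-list entry when the parent closes the registered fd while a forked child still holds the description.

```python
class FileDescription:
    """An open file description; fds in several processes may share one."""

    def __init__(self) -> None:
        self.refs = 0


class FdTable:
    """Per-process table mapping fd numbers to file descriptions."""

    def __init__(self) -> None:
        self.fds: dict[int, FileDescription] = {}

    def install(self, fd: int, desc: FileDescription) -> None:
        desc.refs += 1
        self.fds[fd] = desc

    def close(self, fd: int) -> FileDescription:
        desc = self.fds.pop(fd)
        desc.refs -= 1
        return desc


class InterestList:
    """Toy epoll interest list owned by one process.

    drop_on_fd_close=False models Linux: an entry lives until the last
    reference to its description is closed.
    drop_on_fd_close=True models the policy in the SmartOS man page:
    closing the registered fd removes the entry."""

    def __init__(self, table: FdTable, drop_on_fd_close: bool) -> None:
        self.table = table
        self.drop_on_fd_close = drop_on_fd_close
        self.entries: dict[int, FileDescription] = {}

    def add(self, fd: int) -> None:
        self.entries[fd] = self.table.fds[fd]

    def close_fd(self, fd: int) -> None:
        # The owning process closes fd; decide whether the entry survives.
        desc = self.table.close(fd)
        if self.drop_on_fd_close or desc.refs == 0:
            self.entries.pop(fd, None)

    def events_for(self, desc: FileDescription) -> list[int]:
        # fd numbers reported to the owner when `desc` becomes ready.
        return [fd for fd, d in self.entries.items() if d is desc]


def fork_then_close(drop_on_fd_close: bool) -> list[int]:
    parent, child = FdTable(), FdTable()
    sock = FileDescription()
    parent.install(3, sock)
    ep = InterestList(parent, drop_on_fd_close)
    ep.add(3)
    child.install(3, sock)      # fork(2): the child inherits fd 3
    ep.close_fd(3)              # the parent closes its copy
    return ep.events_for(sock)  # the child's activity makes sock ready


if __name__ == "__main__":
    print("Linux rule:  ", fork_then_close(drop_on_fd_close=False))
    print("SmartOS rule:", fork_then_close(drop_on_fd_close=True))
```

Under the Linux rule the parent is still told about fd 3 after closing it; under the SmartOS rule it hears nothing.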
Agreed. The application is what matters... In the end, most people don't care what operating system their apps run on.
I've felt for a long time that, with the right tooling, SmartOS/illumos would make an ideal container OS. Glad to see that they're moving hard in this direction.