This highlights the importance of tradition in modern computing.
You can't and don't "bootstrap" even your bread, which is one of the simplest and oldest human technologies. Bread is said to take flour, yeast, salt and heat. You don't "bootstrap" your yeast culture and your grain culture - you get them from some source. You are handed the recipe and these living cultures by people who have been doing it before you, and you start from that (you can experiment with changes, of course). The technology changes, but there's still a tradition, and there always will be.
What matters practically is that we have a variety of living software cultures and traditions to choose from to consume, to conserve and to pass on.
It's very common (and easy) to "bootstrap" a sourdough starter; it's made from wild yeast and lactobacillus, which are present both in the environment and in flour.
> You don't "bootstrap" your yeast culture and your grain culture - you get them from some source.
You do not have to buy them. You get them from the grain. But I agree with you: it's basically bootstrapping, because they are everywhere around us.
The bread analogy is fantastic because it makes people think that they bootstrapped it, when in fact they didn't. Same as people who think they make things from "scratch". And I don't want to diminish the effort, in fact I'd like to promote it. But scratch is relative.
When I was little my dad used to tell me how he would get up every day at 5 o'clock and feed scratch to the chickens. So when my mom made something "from scratch", I thought she literally made it out of whatever it is that chickens eat.
Of course it doesn't mean that at all. We often use it to mean "from nothing", but it doesn't mean that either. It means "from the beginning" or "from the starting line" (which might be a scratch in the ground). And of course that starting mark is movable, so one can get closer or further away from the ideal of "from nothing" and still make something "from scratch".
> You don't "bootstrap" your yeast culture and your grain culture - you get them from some source.
Wait, what? It is super simple to bootstrap your own yeast culture. You just need wheat, water, and time. You can easily grow your own wheat. The only way this is not bootstrapping is if I need to create my own water out of hydrogen and oxygen and my own wheat by… actually I’m not sure how one would bootstrap wheat. Find some of the original wild plants that served as the original source of our current domesticated wheat and re-domesticate it? But that would literally be impossible, thus making the term bootstrap meaningless in this case.
Growing from seed. I think growing your own wheat from seed can be considered bootstrapping, since nobody could go back to the original wild plants and re-domesticate them into anything like modern wheat - certainly not within a single person's lifetime. So I'd say growing from seed is realistically the closest we can come to bootstrapping it.
So you either take your starter from somebody else or you fall back to a lower-level prerequisite of dormant living organisms present in your flour, and you hope they are alive and good for your end product.
It's not that hard to imagine that these bacteria are killed by irradiation, or are "compromised by a perfectly placed state-level adversary".
The GNU Project was already an established thing when Torvalds started Linux. I think it would be fair to argue the GNU bootstrapped Linux even if you don't buy into the whole "you must call it GNU/Linux" argument.
And the whole GNU project was bootstrapped by what was going on at the MIT AI Lab etc. etc.
The first Linux was written on Minix and used the Minix filesystem and the GCC that was available on Minix.
So it was built on an established software stack and kernel.
As I said in another comment branch, in the case of sourdough it seems you just make use of the yeast cultures already present in your flour. So you kinda just get both in one package. Does this make sense to you?
It's usually on the grain and not from the air. That's where the sour part comes from too; it's not uncommon to use raw unmalted grain to inoculate beer wort to create a sour beer.
The parallels with how biological life started are tantalizing. Life also "optimized away" its origins after the initial bootstrap, through increasing layers of sophistication.
AFAIK research into the origins of life is stuck somewhere around the GNU Mes level now. We know of the weird mutual dependency between proteins and nucleic acids (RNA and ribosomes in particular), but no one has figured out how to kickstart the process from one turtle deeper yet.
Biology is still waiting for its "stage0".
It makes me excited that we're still so close to the origin (of computing) that such questions are vaguely tractable.
> The same holds for the genome. To create a new ‘binary’ of a specimen, a living copy is required. The genome needs an elaborate toolchain in order to deliver a living thing. The code itself is impotent. This toolchain is commonly called ‘your parents’.
Do you think a time could ever come when we still have computers (and know how to make new ones), but history has forgotten how they came to be in the first place?
Despite being a pragmatist, I find it fun to imagine what's possible as a thought experiment:
You’re a western signals intelligence agency with a virtually unlimited budget and access to some of the brightest mathematical minds on the planet. Your goal is to evaluate the viability of systemically compromising virtually all compiler infrastructure used in modern computing, to the point where verifiability becomes all but impossible, as theorized in the article.
How would you even go about that? What would the modifications look like? How much foresight would you need? Is it even a worthwhile goal, let alone a feasible one?
In practice, you’re probably just going to compromise the proprietary platform security modules instead. There, you can use undercover or otherwise sanctioned employees at semiconductor companies backed up by offensive cyber on a continuous basis. From that offensive vantage point, you’re able to compromise most compilers in use today.
If those miraculous Mes binaries devoid of any available source were in fact compromised, I suspect at best they’d be relics of a chess game that started long ago and is still being played today, with the binaries themselves no longer relevant.
Rather than some deliciously diabolical compromise at the lowest levels of the software stack from which all further compromise flows (a single turtle), a more pragmatic view might hold that systemic compromise may be more akin to architecting turtle contagions and ensuring their continuous delivery. Compilers are just one vector in achieving that end.
An even more pragmatic view might hold systemic compiler compromise (i.e. compilers indiscriminately compromising other compilers) is not undertaken for all manner of reasons.
"In practice, you’re probably just going to compromise the proprietary platform security modules instead."
That's the real problem. It's practical to bootstrap software in surprisingly few steps. There are people who can write, in assembler, a basic C compiler, from which we can get a full C compiler, from which the world is our oyster. I'm not that good in assembler myself, but I could bootstrap from nothing in another couple of intermediate steps (and a whole bunch of time). Then on top of that you have things like NixOS and Debian pushing for full reproducibility, which isn't a panacea on its own, but it means we get further visibility into what builds produce, and it makes a full software bootstrap more worthwhile because it extends the assurance out that much farther. If there is in fact an ancient buried trapdoor in our compiler stacks, whoever put it there had better start trying to remove it, because it's probably getting close to being discovered. I don't think there is one, but the amount of room for it to hide in is rapidly shrinking.
But that does nothing about the hardware that has a backdoor into reading the contents of my harddrive and shoveling them out over Wifi to some arbitrary network destination if the right network packets arrive at my Wifi adapter. My software will never even see these magic packets because the hardware will never deliver them.
I bootstrapped Go 1.14 for FreeBSD 8, with a couple of patches for older syscalls, but I didn't go as far as bootstrapping GCC. I wanted to build a tool that would automatically bootstrap Go (gc and gccgo), but I lost momentum before finishing.
More recently, I bootstrapped Rebol 3, which is a self-hosted compiler, but I couldn't get to the bottom turtle since old versions were closed-source.
One way out of the Rebol situation is to implement a minimal interpreter/compiler for the self-hosted language in another language, build the language using that, and then rebuild the language using itself. The Bootstrappable Builds folks are working on that for a few things, e.g. for Scala:
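The shape of that escape hatch, in miniature (an entirely hypothetical example, not the actual Scala bootstrap): a throwaway interpreter for a tiny stack machine, just barely capable of running the self-hosted compiler once. After that first run you rebuild the real implementation with itself and discard the shim.

    /* hypothetical-vm.c - a throwaway stack-machine interpreter.
       All opcodes and the demo program are made up for illustration. */
    #include <stdio.h>

    enum { PUSH, ADD, MUL, PRINT, HALT };

    static void run(const int *code) {
        int stack[256], sp = 0;
        for (int pc = 0;; pc++) {
            switch (code[pc]) {
            case PUSH:  stack[sp++] = code[++pc]; break;       /* push literal  */
            case ADD:   sp--; stack[sp-1] += stack[sp]; break; /* pop b, a; a+b */
            case MUL:   sp--; stack[sp-1] *= stack[sp]; break; /* pop b, a; a*b */
            case PRINT: printf("%d\n", stack[--sp]); break;
            case HALT:  return;
            }
        }
    }

    int main(void) {
        /* (2 + 3) * 7 -- stands in for "run the self-hosted compiler once" */
        const int prog[] = { PUSH, 2, PUSH, 3, ADD, PUSH, 7, MUL, PRINT, HALT };
        run(prog);
        return 0;
    }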
You can do this by following Ben Eater’s 8-bit computer build, then use that to build and compile everything else. That’s as close to bare wires as you can get.
But to be fair, the components you use could be "spiked". Imagine resistors, capacitors, transistors that can find out when they are used in a particular configuration and change their parameters accordingly. /s
Building your own transistors is out of the question for most of us, but we could build our own magnetic logic and scale that up to a computer. You could verify everything with an analog oscilloscope.
You need insulated wire, saturable cores, various passive components, and a large two-phase power source. (You could likely use a modified car alternator, then upscale to about 1 MHz clock rates.)
I imagine a tiny Forth, which interprets a larger Forth, which implements enough C to compile an old TinyCC/GCC and start the chain of rebuilds (as explained in stage0).
As for hardware, a schematic for a Forth machine would be simpler than a schematic for a generic RAM machine. But if not, an intermediate minimal RISC-style machine language is still fine.
Steps to reach that level of hardware are already described in NandToTetris, so in the end anything capable of implementing NAND/fanout/wiring can be used to run the bootstrap chain.
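A minimal C sketch of that last point (the names and the half-adder demo are mine, not from NandToTetris): every gate below is derived from NAND alone, so any substrate that gives you NAND plus fan-out and wiring can host the bottom of the chain.

    /* Everything below is derived from NAND alone. */
    #include <stdio.h>

    static int nand(int a, int b)  { return !(a && b); }
    static int g_not(int a)        { return nand(a, a); }
    static int g_and(int a, int b) { return g_not(nand(a, b)); }
    static int g_or(int a, int b)  { return nand(g_not(a), g_not(b)); }
    static int g_xor(int a, int b) { return g_and(g_or(a, b), nand(a, b)); }

    int main(void) {
        /* half adder: sum = a XOR b, carry = a AND b */
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                printf("%d+%d -> sum=%d carry=%d\n",
                       a, b, g_xor(a, b), g_and(a, b));
        return 0;
    }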
Then we only have to make sure HW and SW implementers don't introduce bugs.
The stage0 project has a Forth implementation of the middle layers (between the pure hex writer and Mes), but according to the author Forth wasn't as easy for them as straight assembler or Scheme. They say the Forth code is sitting there, waiting for a Forth expert to come and prove them wrong. Have at it! :P
I guess it is largely forgotten, but computers used to have a front panel with enough switches to enable you to input a full word into memory, and step forward. This is how bootstraps could be entered if you didn’t have a bootstrap device.
Even the first hobby computer had a front panel. The Apple 1 was a user-friendly improvement in that regard.
Let’s get to the bottom of the mistrust we are trying to overcome, shall we? Whether via compiler, firmware, or rogue NIC, the attack vector which undermines trust would take the shape of a system that could modify and/or exfiltrate data that the machine processes, or perhaps metadata about when and how the machine is used. How could one test whether that attack was actually taking place? I would probably start by analyzing what packets a network card emits, or perhaps what is written to disk. If there were a ‘ground truth’ device that could be used as a baseline, such a comparison would be possible. Perhaps an older device that predates the modern security state, or some means of bypassing Secure Boot, would suffice. From there, one could ascend the stack of turtles, verifying the outputs at each layer of abstraction.
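A minimal starting point for that packet-level comparison, assuming libpcap (compile with -lpcap; "eth0" is a placeholder interface): log the length and a short hex prefix of every frame the NIC hands us, so the same workload can be diffed against the ground-truth machine. The obvious caveat is that a deep enough compromise could hide traffic from pcap itself, which is exactly why the baseline device matters.

    /* baseline-capture.c - compile with: cc baseline-capture.c -lpcap */
    #include <pcap.h>
    #include <stdio.h>

    static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                          const u_char *bytes) {
        (void)user;
        /* timestamp, length, and first 16 bytes: enough to diff two runs */
        printf("%ld.%06ld len=%u ", (long)h->ts.tv_sec,
               (long)h->ts.tv_usec, h->len);
        for (unsigned i = 0; i < 16 && i < h->caplen; i++)
            printf("%02x", bytes[i]);
        putchar('\n');
    }

    int main(void) {
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t *p = pcap_open_live("eth0", 65535, 1, 1000, errbuf);
        if (!p) { fprintf(stderr, "pcap: %s\n", errbuf); return 1; }
        pcap_loop(p, -1, on_packet, NULL);   /* capture until interrupted */
        pcap_close(p);
        return 0;
    }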
If you want really secure computing, build a system that consists of a grid of 4x4-bit lookup tables, hooked up so that each has 4 inputs and 4 outputs: up, left, down, right.
Clock them in a checkerboard pattern to eliminate race conditions:
A B A B A B A B
B A B A B A B A
A B A B A B A B
B A B A B A B A
The logic will just work; there are no program counters, etc. You can flip, rotate, split, or fold logic to work around bad gates, and you can isolate an input or output by wrapping it in a fence of logic.
However, you give up everything you're used to inheriting from von Neumann in the process.
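A toy software simulation of the idea (my sketch, simplified so each cell has a single output bit that all four neighbors read): two update phases, A then B, so a cell never reads a neighbor that is changing in the same phase.

    /* lut-grid.c - toy simulation of a checkerboard-clocked LUT grid.
       Every LUT here happens to compute the parity of its 4 inputs. */
    #include <stdio.h>
    #include <stdint.h>

    #define N 8
    static uint8_t  state[N][N];   /* one output bit per cell */
    static uint16_t lut[N][N];     /* 16-entry truth table per cell */

    static uint8_t nb(int r, int c) {
        return state[(r + N) % N][(c + N) % N];  /* wrap at the edges */
    }

    static void step(int phase) {  /* phase 0 = "A" cells, 1 = "B" cells */
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++) {
                if ((r + c) % 2 != phase) continue;  /* checkerboard */
                int in = nb(r-1, c) | nb(r, c-1) << 1
                       | nb(r+1, c) << 2 | nb(r, c+1) << 3;
                state[r][c] = (lut[r][c] >> in) & 1;
            }
    }

    int main(void) {
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                lut[r][c] = 0x6996;            /* parity of 4 inputs */
        state[0][0] = 1;                       /* seed a single bit */
        for (int t = 0; t < 4; t++) { step(0); step(1); }
        for (int r = 0; r < N; r++, putchar('\n'))
            for (int c = 0; c < N; c++)
                putchar('0' + state[r][c]);
        return 0;
    }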
I think that the early versions of the Pascal compiler were written so as to be bootstrappable on diverse systems. I believe you had to write a simple virtual machine (for P-code) to bootstrap the compiler. I cannot find a source for that; I heard it in a compilers course.
This was more of a necessity than something done for verifiability though. Systems were more diverse back then and you couldn't just target "a Unix with a C compiler".
Using a Precursor [1] seems like it might be useful here, since you can compile and verify your own CPU on an FPGA. But then it comes down to whether the FPGA software can understand and intercept what you're compiling, I suppose.
That requires the FPGA software to not only solve the halting problem fast enough that you don't notice the lag, but also to invent backdoor code for arbitrary made-up CPU architectures. Even if it was connecting to a godlike AGI in the cloud, I doubt it could pull this off reliably.
In reality, it only requires the FPGA software to backdoor the most commonly used random number generator constructions found in open-source projects based off of the Precursor. Writing such a backdoor would probably take a developer about a single day.
Bastardized TLDR: It's practically impossible to bootstrap a completely pure and trusted compiler (and hence computing environment) on modern hardware as all sorts of unseen firmware gets invoked before your first, manually-verified instruction.
I think one could keep following the turtles right down to the silicon...
I don't understand why people stop their trust analysis at flashable firmware. Somehow people can't trust their hardware's firmware, but they can blindly trust the hardware itself? Why? Would you even be able to tell if the hardware had firmware in it you didn't know about, let alone the fact that the hardware itself could be malicious?
This is why the whole thing seems like an exercise in futility to me. Just trust a reasonable base (e.g., including the OS) and call it a day. If you can't even trust your vendor to give you trustworthy firmware then find some way to invest $N Billion into your own fabrication labs to make your own chips in front of your face.
I did go that far down, and there is a purpose. Reducing the scope of attack to "you must own a fab" is pretty great, honestly. Sure, it won't stop a perfectly placed nation-state from mounting a bespoke attack just for you by twiddling silicon doping on a wafer... But that's quite a bit harder and more expensive than "install an SMM rootkit".
And, if you do care about trustable hardware... There are bootstrapping and verification paths available there as well, depending on your threat model.
Or you can of course give up and declare all of computing fundamentally untrustable but still useful for some purposes. Like I said in the post, I'm glad for the existence of both the purists and the pragmatists in this space.
I think it is something of an exaggeration to say that the scope of attack has been reduced to "you must own a fab". At best, it is the scope of the bootstrap problem that has been reduced, but there is still the problem of securing and verifying all the source code for the software you are going to need to do something useful (including, but by no means limited to, the toolchain and the operating system which hosts it.)
Solving the hardest problem (or what appears to be it) does not mean that everything else is tractable. In this case, the sheer size of the problem means that it is beyond the scope of one person [1], so the problem becomes one of who you trust, not what you trust.
I think we all knew it was going to come down to this; how does bootstrap.org deal with it?
[1] I'm putting aside the problem of verifying the design of what the fab makes, of the fab itself, and the trustworthiness of the people building and operating it, which is, as you suggest, 'just' another heap of turtles.
I still don't understand in what scenario someone could trust their vendor's hardware but not firmware. Somehow the firmware is malicious but the hardware is trusted? Why/how? Either you're getting the product directly from a vendor you trust, or you're not. If you are, then the firmware and the hardware are one thing together. If you're not, you need your own fab. And mind you, whoever is supposedly intercepting your shipments (or whatever) doesn't need a fab to pull off any attack, so I'm not sure what the scope reduction is here...
Yes I remember it from the Snowden days, that's why I mentioned it myself. But I don't get the threat model. So supposedly the NSA planted something in your device's firmware. How exactly would it help you if you could "see" the manufacturer's firmware (say it was open-source)? You still wouldn't know what's running on the chip. Even if you flash it, the chip could just be lying in some part of the process. Conversely, the entire firmware could be encrypted and you could still verify it (without knowing what it's doing) if the chip had an un-tamperable-with "dump out a hash of my firmware" instruction to let you match against the manufacturer's provided hash. Or an instruction to verify that its hash is what you expect in a manner that can't be tampered with. Either way, I don't see how your knowledge of the firmware that's supposed to be on it is necessary or sufficient.
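To make that hash-matching step concrete, here is a host-side sketch, assuming you already have a firmware dump you believe (which is of course the whole debate): hash the image with OpenSSL (compile with -lcrypto) and compare against the vendor-published digest. The file name and digest on the command line are placeholders.

    /* fwverify.c - compile with: cc fwverify.c -lcrypto */
    #include <openssl/evp.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <dump.bin> <expected-sha256-hex>\n", argv[0]);
            return 2;
        }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 2; }

        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, EVP_sha256(), NULL);
        unsigned char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            EVP_DigestUpdate(ctx, buf, n);
        fclose(f);

        unsigned char md[EVP_MAX_MD_SIZE];
        unsigned int mdlen;
        EVP_DigestFinal_ex(ctx, md, &mdlen);
        EVP_MD_CTX_free(ctx);

        char hex[2 * EVP_MAX_MD_SIZE + 1];
        for (unsigned i = 0; i < mdlen; i++)
            sprintf(hex + 2 * i, "%02x", md[i]);
        printf("computed: %s\n", hex);
        puts(strcmp(hex, argv[2]) == 0 ? "MATCH" : "MISMATCH");
        return 0;
    }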
At the minimum, I'd want to be aware that the firmware is not what the manufacturer had intended to provide me. Perhaps it's not the NSA after me, but some other actor or competitor or ransomware agency.
Yeah and to do that you need some mechanism to check what's on the device. It wouldn't help you to have 'open' firmware since it still wouldn't tell you what's on the device.
I hadn’t presumed that the firmware needed to be open, though. Just a mechanism to verify. Being open, with the ability to compile from source and install it myself, would be even better.
Keep in mind that in the 'Trusting Trust' example, the compiler has to be smart enough to realize that you are building another compiler (and only then insert the backdoor). I can imagine that a backdoored GCC would recognize when you are building another GCC, but it would be hard for an old version of GCC to recognize, e.g., a modern LLVM or even GHC, I'd say.
Similarly, the lower down you go, the harder it is to put that kind of smarts in.
That might be one reason to stop at this point? (Not sure.)
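A caricature of the trigger, in C (illustrative only; this is not Thompson's actual code): the poisoned compiler fingerprints what it is compiling and lies only on a match. A fixed fingerprint works against the one compiler it was written for, which is exactly why recognizing a compiler that didn't exist yet is so hard.

    /* trusting-trust.c - a toy, not Thompson's code. The "compiler"
       matches fixed fingerprints and misbehaves only when it sees them. */
    #include <stdio.h>
    #include <string.h>

    static const char *compile(const char *source) {
        if (strstr(source, "login("))    /* the classic target */
            return "binary where login() accepts a master password";
        if (strstr(source, "compile("))  /* "am I building a compiler?" */
            return "binary that re-inserts both of these checks";
        return "faithful binary";
    }

    int main(void) {
        printf("%s\n", compile("int login(const char *u, const char *p);"));
        printf("%s\n", compile("const char *compile(const char *src);"));
        printf("%s\n", compile("int main(void) { return 0; }"));
        return 0;
    }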
Okay strange idea, but if you are wanting to bootstrap using your own transistors or core rope memory, why not define your baseline architecture instructions in such a way that the boot loader instructions are defined by some physical constants you could measure from your environment (star positions?) or some constant you could calculate (first N digits of PI?) that is as long as you can verify the underlying hardware, you can manually verify the "firmware"... or something, not sure this makes any sense. At least it would a whole new layer of mysitcism to the process :)
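The nice property of the "first N digits of pi" variant is that anyone can recompute the constant on independent hardware, or even by hand. A sketch using the classic Rabinowitz-Wagon spigot algorithm (my clean rewrite; DIGITS is arbitrary):

    /* pi-spigot.c - first DIGITS decimal digits of pi, recomputable
       and checkable anywhere (Rabinowitz-Wagon spigot algorithm). */
    #include <stdio.h>

    #define DIGITS 50
    #define LEN ((DIGITS / 4 + 1) * 14)

    int main(void) {
        static int a[LEN];
        int nines = 0, predigit = 0;
        for (int i = 0; i < LEN; i++)
            a[i] = 2;   /* pi = 2 + 1/3*(2 + 2/5*(2 + 3/7*(2 + ...))) */

        for (int j = 0; j < DIGITS; j++) {
            int q = 0;
            for (int i = LEN; i > 0; i--) {
                int x = 10 * a[i-1] + q * i;
                a[i-1] = x % (2 * i - 1);
                q = x / (2 * i - 1);
            }
            a[0] = q % 10;
            q /= 10;
            if (q == 9) nines++;
            else if (q == 10) {                 /* carry ripples back */
                printf("%d", predigit + 1);
                for (; nines > 0; nines--) printf("0");
                predigit = 0;
            } else {
                if (j) printf("%d", predigit);  /* skip leading dummy 0 */
                predigit = q;
                for (; nines > 0; nines--) printf("9");
            }
        }
        printf("%d\n", predigit);               /* prints 3141592653... */
        return 0;
    }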
With homomorphic encryption you could distribute a trusted platform for execution on untrusted platforms; but you still need a trusted hardware platform on which to execute the initial compilation.
It'd be nice if there was some form of homomorphic compute analogue to public-key/private-key encryption to allow for IO channels, but I'm not sure if that's possible.
Somewhat uselessly, if ever you do actually find the bottom turtle, it's impossible to tell that it really is the bottom - and that holds true right down to the universe itself.