From: Luke Kenneth Casson Leighton
Date: Sun, 13 Jan 2019 16:30:32 +0000 (+0000)
Subject: add spectre update
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=4b1def0cfbf32fa5c1b0e3279cd179ba90759f19;p=crowdsupply.git

add spectre update
---

diff --git a/updates/010_spectre_mitigation.mdwn b/updates/010_spectre_mitigation.mdwn
new file mode 100644
index 0000000..ac0a3d0
--- /dev/null
+++ b/updates/010_spectre_mitigation.mdwn
@@ -0,0 +1,88 @@
+# Spectre: timing attacks of untrusted code
+
+Just when you thought everything was going swimmingly, the innocent
+question was asked:
+[how do you deal with spectre attacks?](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000317.html)
+This is not exactly a show-stopper; it is, however, a massive
+spanner in the works.
+
+Spectre is basically a timing attack: untrusted code runs instructions that
+are interspersed with trusted ones, and, if there is no mitigation in place,
+the time that it takes for untrusted instructions to be issued (and complete)
+can reveal information about the instructions under attack.
+
+The only way to ensure that this does not happen is to design the processor
+so that it is literally impossible for untrusted instructions to affect
+whether other instructions start or complete.
+
+The problem: the fundamental basis of an out-of-order micro-architecture
+is to allow exactly those two things to occur: (a) instructions
+being delayed based on available resources, and (b) instructions completing
+in arbitrary time.
+
+An in-order micro-architecture does not suffer from this type of problem
+because the instructions are issued in a (pretty much) guaranteed order,
+and they complete in a (pretty much) guaranteed order. Instructions
+get pushed into the pipeline(s), and, with a few exceptions, they absolutely
+do not "stall" based on what is already in the pipeline.
+Occasionally, roll-back or cancellation has to occur (exceptions, for example),
+or there has to be a "pipeline bubble", known as a "stall": a stage runs "empty".
+However, the key design factor is that, for the most part, in an in-order
+micro-architecture, no past or future instruction will cause the present
+one to take longer to complete or be delayed from issue, and vice-versa.
+
+Fascinatingly, there is one other type of architecture which meets this
+design criterion: the Mill Architecture. Of particular note is that its
+resistance to Spectre-style timing attacks is
+accidental! The design of the Mill Architecture *pre-dates* Spectre,
+yet an analysis
+[showed it to be immune](https://groups.google.com/d/msg/comp.arch/mzXXTU2GUSo/5ROndUEMEgAJ).
+
+This boils down to a couple of factors. Firstly, results are generated
+in constant time and are pushed onto the "belt" (as it is called).
+Secondly, if any operation generates an exception, an invalid result,
+or a "None", that is *still pushed onto the belt*. Any operation that
+requires that as an input operand will *also generate an invalid result*.
+Once these "null" results reach the end of the belt, they "fall off" just like
+any other result.
+
+The primary reason for the lack of blocking on instruction issue in the Mill
+Architecture seems to be that the designers noted that
+arithmetic operations are cheap in terms of gates, whilst moving data around
+is expensive. They therefore hugely over-duplicated the number of ALUs,
+the end result being that there is no stalling: no resource starvation.
+
+Contrast this with the fact that in any other micro-architecture
+it is essential to provide significant internal bus bandwidth to move
+data around. If that bandwidth is insufficient it becomes a
+bottleneck, and a bottleneck is an effective means of
+mounting Spectre timing attacks.
+
+In the design of the Libre RISC-V SoC, there are a number of places
+where opportunities for resource starvation come up. Some of them
+are [described here](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000345.html).
+There are more: the virtual register lookup table, for example, would,
+if not large enough, result in instructions blocking.
+
+This is a really serious issue, as even plain external JavaScript,
+executed from an arbitrary untrusted web site, could result in
+information leakage!
+
+So it is going to need a lot of thought. Essentially, the lesson from
+both the Mill Architecture and in-order designs is that they
+do not suffer from Spectre timing attacks because no instruction may
+cause resource starvation such that information leaks out about other
+instructions being processed at the time. These characteristics are
+what have to be replicated in an out-of-order design.
+
+It may mean huge over-allocation of resources, and it may mean "dialing
+back" on the number of instructions issued per cycle. It may also
+mean simply identifying processes (or instruction groups) that are
+vulnerable, and sandboxing them. In this way, arbitrary untrusted
+code may only "compromise itself". In practical terms it would mean
+clearing out the machine state whenever untrusted code is to be run.
+Is that viable? Honestly, I don't know.
+
+There is so much to look at here that it is going to take time to evaluate.
+Enough designs have made mistakes: it's generally a good idea to learn
+from them.
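
Note on the update text: the resource-starvation argument can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not the Libre RISC-V SoC design and not how the Mill works internally. It assumes a pool of identical functional units, a "victim" whose number of operations depends on a secret bit, and an "attacker" instruction queued behind them. The names `attacker_issue_delay`, `victim_ops_for` and `observed_delays` are hypothetical, as are the operation counts.

```python
def attacker_issue_delay(victim_ops: int, num_units: int, latency: int = 1) -> int:
    """Cycles the attacker's instruction waits before a unit is free.

    Toy model (assumption): victim operations are queued ahead of the
    attacker and each goes to whichever unit becomes free first.
    """
    busy_until = [0] * num_units  # cycle at which each unit becomes free
    for _ in range(victim_ops):
        unit = min(range(num_units), key=lambda u: busy_until[u])
        busy_until[unit] += latency
    # The attacker issues as soon as any unit is free.
    return min(busy_until)


def victim_ops_for(secret_bit: int) -> int:
    """Hypothetical victim: does more work when the secret bit is set."""
    return 4 if secret_bit else 1


def observed_delays(num_units: int) -> dict:
    """Attacker-visible issue delay for each possible secret bit."""
    return {bit: attacker_issue_delay(victim_ops_for(bit), num_units)
            for bit in (0, 1)}


if __name__ == "__main__":
    # Scarce units: the delay differs with the secret, so timing leaks it.
    print("2 units:", observed_delays(2))
    # Over-allocated units: the delay is identical, so nothing leaks here.
    print("8 units:", observed_delays(8))
```

With two units the attacker's delay depends on the secret bit; with eight units it does not. That is the over-allocation idea in miniature, under the stated assumptions. Real attacks also involve speculation and caches, which this model deliberately ignores.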