# Spectre: timing attacks by untrusted code

Just when you thought everything was going swimmingly, the innocent
question was asked:
[how do you deal with spectre attacks?](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000317.html)
Fortunately, this is not a show-stopper; unfortunately, it is a massive
spanner in the works.

Spectre is basically a timing attack: untrusted code runs instructions that
are interspersed with trusted ones and, if there is no mitigation in place,
the time it takes for the untrusted instructions to be issued (and to complete)
can reveal information about the trusted instructions under attack.
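A toy model makes the leak concrete. It assumes one shared, non-pipelined execution unit that the trusted (victim) code occupies only when a secret bit is 1; the attacker simply times one of its own operations on that unit. All names and cycle counts are hypothetical and no real core is this simple, but the mechanism is the same: the attacker never reads the secret, it only measures how long it was kept waiting.

```python
# Toy model: one shared, non-pipelined execution unit. Every name and
# cycle count here is a hypothetical value chosen for illustration.
VICTIM_BUSY_CYCLES = 10  # how long the victim holds the unit when secret == 1
PROBE_LATENCY = 3        # attacker's own operation latency on an idle unit


def probe_latency(secret_bit: int) -> int:
    """Cycles the attacker observes for one operation issued at cycle 0.

    The victim (trusted code) issued just before the attacker and, only when
    its secret bit is 1, occupies the shared unit until cycle VICTIM_BUSY_CYCLES.
    """
    unit_free_at = VICTIM_BUSY_CYCLES if secret_bit else 0
    issue_cycle = 0
    start = max(issue_cycle, unit_free_at)  # the attacker stalls until the unit is free
    return (start + PROBE_LATENCY) - issue_cycle


def guess_secret(observed: int) -> int:
    """Anything slower than an idle-unit operation means the victim was busy."""
    return 1 if observed > PROBE_LATENCY else 0


for secret in (0, 1):
    observed = probe_latency(secret)
    print(f"secret={secret} observed={observed} guess={guess_secret(observed)}")
```

Running it prints `secret=0 observed=3 guess=0` and `secret=1 observed=13 guess=1`: the attacker recovers the bit purely from timing.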

The only way to ensure that this does not happen is to design the processor
so that it is literally impossible for untrusted instructions to affect
whether, or when, other instructions start or complete.

The problem is that an out-of-order micro-architecture is fundamentally
based on allowing exactly those two things to happen: (a) instructions
being delayed according to the resources available, and (b) instructions
completing after an arbitrary amount of time.

An in-order micro-architecture does not suffer from this type of problem
because instructions are issued in a (pretty much) guaranteed order,
and they complete in a (pretty much) guaranteed order. Instructions
get pushed into the pipeline(s) and, with a few exceptions, they absolutely
do not "stall" based on what is already in the pipeline. Occasionally,
roll-back or cancellation has to occur (on exceptions, for example), or
there has to be a "pipeline bubble", also known as a "stall", where a stage
runs "empty". However, the key design factor is that, for the most part,
in an in-order micro-architecture no past or future instruction will cause
the present one to be delayed from issue or to take longer to complete,
and vice-versa.
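The difference can be checked mechanically as a non-interference property: enumerate every possible sequence of victim instructions and collect the latencies an attacker's instruction can observe. If that set holds a single value, nothing leaks. The sketch below compares an idealised, fully pipelined in-order pipeline with the same issue order plus one shared, non-pipelined multiplier that later multiplies must queue for. The two-instruction ISA, its latencies and the function names are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical fixed latencies for a two-instruction toy ISA.
LATENCY = {"add": 1, "mul": 3}


def in_order_completion(program):
    """Idealised in-order pipeline: instruction i issues at cycle i and every
    unit is fully pipelined, so completion never depends on other instructions."""
    return [i + LATENCY[op] for i, op in enumerate(program)]


def shared_unit_completion(program):
    """Same issue order, but with one non-pipelined multiplier: a "mul" waits
    until the previous "mul" has left the unit (resource starvation)."""
    completion, mul_free_at = [], 0
    for i, op in enumerate(program):
        start = i
        if op == "mul":
            start = max(i, mul_free_at)
            mul_free_at = start + LATENCY[op]
        completion.append(start + LATENCY[op])
    return completion


def probe_observations(model, victim_len=3):
    """Set of latencies the attacker's final "mul" can observe, over every
    possible sequence of victim instructions that precedes it."""
    seen = set()
    for victim in product(LATENCY, repeat=victim_len):
        program = list(victim) + ["mul"]
        issue_cycle = len(program) - 1
        seen.add(model(program)[-1] - issue_cycle)
    return seen


print(probe_observations(in_order_completion))     # a single value: nothing leaks
print(probe_observations(shared_unit_completion))  # several values: the victim's mix leaks
```

The in-order model yields `{3}` whatever the victim does; the shared-multiplier model yields six different latencies, so the victim's instruction mix is visible to the attacker.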

Fascinatingly, there is one other type of architecture that shares this
design criterion: the Mill Architecture. Of particular note is that its
resistance to Spectre-style timing attacks is accidental! The design of
the Mill Architecture *pre-dates* Spectre, yet an analysis
[showed it to be immune](https://groups.google.com/d/msg/comp.arch/mzXXTU2GUSo/5ROndUEMEgAJ).

This boils down to a couple of factors. Firstly, results are generated
in constant time and are pushed onto the "belt" (as it is called).
Secondly, if an operation raises an exception or produces an invalid
result or a "None", that result is *still pushed onto the belt*. Any
operation that uses it as an input operand will *also generate an invalid
result*. Once these "null" results reach the end of the belt, they "fall
off" just like any other result.
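A minimal sketch of the two belt properties described above, assuming a simplified model: the belt is a fixed-length FIFO (position 0 is the newest result), every operation pushes exactly one result, and a single hypothetical `INVALID` sentinel stands in for exceptions, NaR and "None". The real Mill distinguishes these cases, and also gives each operation a fixed latency, which is not modelled here. The point is that a fault never stalls anything: it just becomes data that propagates and eventually falls off.

```python
from collections import deque


class _Invalid:
    """Stand-in for an invalid result (exception, NaR or "None").
    Simplification: the real Mill distinguishes these cases."""

    def __repr__(self):
        return "INVALID"


INVALID = _Invalid()


class Belt:
    """Toy belt: a fixed-length FIFO of results; position 0 is the newest."""

    def __init__(self, length=8):
        # A bounded deque drops its oldest entry once full, i.e. results
        # "fall off" the end of the belt.
        self._slots = deque(maxlen=length)

    def push(self, value):
        self._slots.appendleft(value)

    def __getitem__(self, pos):
        return self._slots[pos]

    def __len__(self):
        return len(self._slots)

    def __iter__(self):
        return iter(self._slots)


def _binary_op(belt, a_pos, b_pos, fn):
    """Every operation pushes exactly one result, valid or not: no stall and
    no trap at this point, just data on the belt."""
    a, b = belt[a_pos], belt[b_pos]
    if a is INVALID or b is INVALID:
        belt.push(INVALID)  # invalid inputs propagate to the output
        return
    try:
        belt.push(fn(a, b))
    except ZeroDivisionError:
        belt.push(INVALID)  # the fault itself becomes an invalid result


def add(belt, a_pos, b_pos):
    _binary_op(belt, a_pos, b_pos, lambda a, b: a + b)


def div(belt, a_pos, b_pos):
    _binary_op(belt, a_pos, b_pos, lambda a, b: a // b)


belt = Belt(length=4)
belt.push(10)     # [10]
belt.push(0)      # [0, 10]
div(belt, 1, 0)   # 10 // 0 faults -> [INVALID, 0, 10]
add(belt, 0, 2)   # INVALID + 10 -> [INVALID, INVALID, 0, 10]
belt.push(1)      # belt is full, so 10 falls off -> [1, INVALID, INVALID, 0]
print(list(belt))
```

The final `print` shows `[1, INVALID, INVALID, 0]`: the divide-by-zero and everything that consumed it stayed on the belt as ordinary entries, and the oldest value simply fell off.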

The primary reason that instruction issue never blocks in the Mill
Architecture seems to be that its designers noted that arithmetic
operations are cheap in terms of gates, whilst moving data around is
expensive. They therefore hugely over-provisioned the number of ALUs,
with the end result that there is no stalling: no resource starvation.
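Over-provisioning can be illustrated with a toy count of stall cycles. It assumes fully pipelined ALUs that each accept one operation per cycle, and a hypothetical issue width of 4 ready ALU operations per cycle; neither number comes from the Mill. With fewer ALUs than the issue width, whether an operation waits depends on what else is ready in the same cycle; with at least as many ALUs as the issue width, no mix of work can cause a stall.

```python
from itertools import product


def stall_cycles(demand_per_cycle, num_alus):
    """Count cycles in which at least one ready ALU operation had to wait.

    Toy model (assumption): each ALU is fully pipelined and accepts one
    operation per cycle; operations that do not get an ALU queue up.
    """
    backlog = stalls = 0
    for demand in demand_per_cycle:
        backlog += demand
        backlog -= min(backlog, num_alus)  # issue as many as there are ALUs
        if backlog:
            stalls += 1
    return stalls


ISSUE_WIDTH = 4  # hypothetical: at most 4 ALU ops become ready per cycle

# Under-provisioned: whether anyone waits depends on what else is running.
print(stall_cycles([4, 0, 4, 0], num_alus=2))  # 2
print(stall_cycles([1, 1, 1, 1], num_alus=2))  # 0

# Provisioned for the worst case: no mix of demand can ever cause a stall.
assert all(
    stall_cycles(mix, num_alus=ISSUE_WIDTH) == 0
    for mix in product(range(ISSUE_WIDTH + 1), repeat=4)
)
```

The exhaustive `assert` at the end checks all 625 four-cycle demand patterns up to the issue width.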

Contrast this with other micro-architectures, where it is essential to
provide significant internal bus bandwidth to move data around. If that
bandwidth is insufficient it becomes a bottleneck, and a bottleneck is an
effective means of mounting a Spectre timing attack.

In the design of the Libre RISC-V SoC, there are a number of places
where opportunities for resource starvation come up. Some of them
are [described here](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000345.html).
There are more: the virtual register lookup table, for example, would
result in instructions blocking if it were not large enough.
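The register lookup table case can be sketched the same way. This toy assumes a hypothetical table with a fixed number of entries, each held by an in-flight instruction from issue until completion; an instruction that finds the table full must wait for an entry to free up. When the table is shared, the attacker's issue delay reveals whether the victim has filled it. For comparison the sketch also shows static per-context partitioning, a standard mitigation swapped in here purely for illustration: it is not necessarily what the Libre RISC-V SoC will do.

```python
class RegisterLookupTable:
    """Toy virtual-register lookup table: a fixed number of entries; each
    in-flight instruction holds one entry from issue until it completes."""

    def __init__(self, entries):
        self.entries = entries
        self.busy_until = []  # completion cycle of each occupied entry

    def issue(self, cycle, latency):
        """Return the cycle at which the instruction actually issues."""
        live = [t for t in self.busy_until if t > cycle]
        if len(live) >= self.entries:
            cycle = min(live)  # block until the earliest entry frees
            live = [t for t in live if t > cycle]
        live.append(cycle + latency)
        self.busy_until = live
        return cycle


def probe_delay(victim_in_flight, shared, entries=4):
    """Cycles the attacker's instruction (wanting to issue at cycle 1) is held
    back, after the victim has issued `victim_in_flight` long-latency
    instructions starting at cycle 0."""
    if shared:
        victim_table = attacker_table = RegisterLookupTable(entries)
    else:
        # Hypothetical static partition: each context gets half the entries.
        victim_table = RegisterLookupTable(entries // 2)
        attacker_table = RegisterLookupTable(entries - entries // 2)
    cycle = 0
    for _ in range(victim_in_flight):
        cycle = victim_table.issue(cycle, latency=20)
    return attacker_table.issue(1, latency=1) - 1


print([probe_delay(n, shared=True) for n in range(5)])   # [0, 0, 0, 0, 19]
print([probe_delay(n, shared=False) for n in range(5)])  # [0, 0, 0, 0, 0]
```

Shared, the attacker's delay jumps as soon as the victim fills the table; partitioned, it is always 0. The price is that each context only gets part of the table, so the table has to be larger overall to keep the same performance.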

This is a really serious issue, because even plain JavaScript, executed
from an arbitrary untrusted web site, could result in information leakage!

So it is going to need a lot of thought. Essentially, the lesson from
both the Mill Architecture and in-order designs is that they do not
suffer from Spectre timing attacks because no instruction can cause
resource starvation that leaks information about other instructions
being processed at the same time. These characteristics are what have
to be replicated in an out-of-order design.

It may mean huge over-allocation of resources, and it may mean "dialing
back" on the number of instructions issued per cycle. It may also
mean simply identifying vulnerable processes (or instruction groups) and
sandboxing them, so that arbitrary untrusted code may only "compromise
itself". In practical terms that would mean clearing out the machine
state whenever untrusted code is to be run. Is that viable? Honestly,
I don't know.
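A toy model of "clearing out the machine state" is sketched below. It uses a set of cached lines as a stand-in for any shared microarchitectural state (caches, predictors, tables); the cache, the cycle costs and all the names are assumptions. Trusted code makes one secret-dependent access; untrusted code then times an access to every candidate line. Without a flush, the single fast access gives the secret away; flushing before the untrusted code runs leaves nothing to observe.

```python
HIT_CYCLES, MISS_CYCLES = 1, 100  # hypothetical access costs


class ToyCache:
    """Stand-in for any shared microarchitectural state: the set of cached lines."""

    def __init__(self):
        self.lines = set()

    def access(self, line):
        cost = HIT_CYCLES if line in self.lines else MISS_CYCLES
        self.lines.add(line)
        return cost

    def flush(self):
        self.lines.clear()


def trusted_code(cache, secret):
    cache.access(secret)  # a secret-dependent memory access leaves a footprint


def untrusted_code(cache, candidates):
    """Time an access to every candidate line; a single fast one is the secret."""
    fast = [line for line in candidates if cache.access(line) == HIT_CYCLES]
    return fast[0] if len(fast) == 1 else None


def run(secret, flush_on_switch):
    cache = ToyCache()
    trusted_code(cache, secret)
    if flush_on_switch:
        cache.flush()  # clear machine state before untrusted code runs
    return untrusted_code(cache, range(16))


print(run(secret=5, flush_on_switch=False))  # 5: the secret leaks
print(run(secret=5, flush_on_switch=True))   # None: nothing left to observe
```

The cost shows up in the model too: after every flush, every access starts as a miss, and that performance penalty is what makes the viability question a real one.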

There is so much to look at here that it is going to take time to
evaluate. Enough designs have made mistakes: it's generally a good idea
to learn from them.