updates/010_2019jan15_spectre_mitigation.mdwn

# Spectre: timing attacks of untrusted code

Just when you thought everything was going swimmingly, the innocent
question was asked:
[how do you deal with spectre attacks?](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000317.html)
Fortunately, this is not exactly a show-stopper; it is, however, a massive
spanner in the works.

Spectre is basically a timing attack: untrusted code runs instructions that
are interspersed with trusted ones, and, if there is no mitigation in place,
the time that it takes for the untrusted instructions to be issued (and to
complete) can reveal information about the trusted instructions under attack.

The only way to ensure that this does not happen is to design the processor
so that it is literally impossible for untrusted instructions to affect
when other instructions start or complete.

The problem: an out-of-order micro-architecture is fundamentally based
on allowing exactly those two things to occur: (a) instructions being
delayed according to available resources, and (b) instructions completing
in arbitrary time.

An in-order micro-architecture does not suffer from this type of problem
because the instructions are issued in a (pretty much) guaranteed order,
and they complete in a (pretty much) guaranteed order.  Instructions
get pushed into the pipeline(s), and, with a few exceptions, they absolutely
do not "stall" based on what is already in the pipeline.  Occasionally,
roll-back or cancellation has to occur (exceptions, for example), or
there has to be a "pipeline bubble", known as a "stall", where a stage runs
"empty".  However, the key design factor is that, for the most part, in an
in-order micro-architecture, no past or future instruction will cause the
present one to take longer to complete or be delayed from issue, and
vice-versa.

Fascinatingly, there is one other type of architecture that shares this
design criterion: the Mill Architecture.  Of particular note is that its
resistance to Spectre-style timing attacks is accidental!  The design of
the Mill Architecture *pre-dates* Spectre, yet an analysis
[showed it to be immune](https://groups.google.com/d/msg/comp.arch/mzXXTU2GUSo/5ROndUEMEgAJ).

This boils down to a couple of factors.  Firstly, results are generated
in constant time and are pushed onto the "belt" (as it is called).
Secondly, if any operation generates an exception, an invalid result,
or a "None", that is *still pushed onto the belt*.  Any operation that
requires such a result as an input operand will *also generate an invalid
result*.  Once these "null" results reach the end of the belt, they "fall
off" just like any other result.
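
The belt behaviour described above can be illustrated with a small toy model.
This is a sketch under stated assumptions, not the Mill's actual semantics:
the `Belt` class, the belt length of 8 and the use of Python's `None` to
stand for an invalid ("null") result are all hypothetical.  What it shows is
that a faulting operation neither traps nor stalls; it simply pushes an
invalid result, which propagates to any consumer and eventually falls off
the end of the belt.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical toy model: the belt length and None-as-invalid are
# assumptions, not the real Mill encoding.
BELT_LENGTH = 8


class Belt:
    def __init__(self, length: int = BELT_LENGTH) -> None:
        # A deque with maxlen drops its oldest entry when a new one is pushed
        # onto a full belt: the oldest result "falls off the end".
        self.slots: "deque[Optional[int]]" = deque(maxlen=length)

    def push(self, value: Optional[int]) -> None:
        # The newest result goes to the front (position 0).
        self.slots.appendleft(value)

    def get(self, pos: int) -> Optional[int]:
        # Operands are named by their position on the belt, newest first.
        return self.slots[pos]

    def op(self, fn: Callable[..., int], *positions: int) -> None:
        args = [self.get(p) for p in positions]
        if any(a is None for a in args):
            # An invalid operand yields an invalid result: no trap, no stall.
            self.push(None)
            return
        try:
            self.push(fn(*args))
        except ZeroDivisionError:
            # A faulting operation also just pushes an invalid result.
            self.push(None)


belt = Belt()
belt.push(10)
belt.push(0)
belt.op(lambda a, b: a // b, 1, 0)  # 10 // 0 faults: None is pushed
belt.op(lambda a: a + 1, 0)         # consumes the None: None again
print(list(belt.slots))             # [None, None, 0, 10]
```

In this model every operation performs exactly one push and never raises
(only division by zero is treated as a fault here), so a fault does not
change the sequence of steps taken, which is the property that matters for
timing attacks.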

The primary reason for the lack of blocking on instruction issue in the
Mill Architecture seems to be that the designers noted that arithmetic
operations are cheap in terms of gates, whilst moving data around is
expensive.  They therefore hugely over-provisioned the number of ALUs,
the end result being that there is no stalling: no resource starvation.

Contrast this with any other micro-architecture, where it is essential
to provide significant internal bus bandwidth to move data around.  If
that bandwidth is insufficient, it becomes a bottleneck, and a bottleneck
is an effective means of mounting a Spectre timing attack.
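
The resource-starvation argument can be made concrete with a cycle-counting
toy model.  A pool of identical functional units stands in for any shared,
finite resource (ALUs, internal bus bandwidth, lookup table entries).  This
is a hypothetical sketch: the function names, the rule that trusted
operations claim units first, and the secret-dependent demand pattern are
assumptions for illustration.  It shows that when the pool is too small, the
time untrusted code takes to issue its *own* operations depends on a trusted
secret, and that provisioning enough units for the worst-case combined
demand makes that time constant.

```python
from typing import List


def trusted_demand(secret_bit: int, cycles: int = 4) -> List[int]:
    # Hypothetical victim: uses two units per cycle if the secret bit is 1,
    # one unit per cycle if it is 0.
    return [2 if secret_bit else 1] * cycles


def untrusted_cycles(demand: List[int], units: int, untrusted_ops: int) -> int:
    """Cycles the attacker needs to issue `untrusted_ops` single-cycle ops.

    Assumptions: trusted ops claim units first, and the attacker issues at
    most one op per cycle, stalling whenever no unit is free.
    """
    if units <= 0:
        raise ValueError("need at least one functional unit")
    issued = 0
    cycle = 0
    while issued < untrusted_ops:
        busy = demand[cycle] if cycle < len(demand) else 0
        if units - busy > 0:
            issued += 1  # a unit was free: the attacker's op issues
        cycle += 1       # otherwise the attacker stalls for this cycle
    return cycle


def leaks(units: int) -> bool:
    # The attacker learns the secret if its own timing differs between secrets.
    t0 = untrusted_cycles(trusted_demand(0), units, untrusted_ops=4)
    t1 = untrusted_cycles(trusted_demand(1), units, untrusted_ops=4)
    return t0 != t1


for units in (2, 3):
    print(units, leaks(units))
```

With two units the attacker's four operations take 4 cycles for one secret
and 8 for the other; with three units (peak trusted demand plus the
attacker's one operation) they take 4 cycles either way.  The price is
exactly the kind of over-provisioning the Mill designers chose.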

In the design of the Libre RISC-V SoC, there are a number of places
where opportunities for resource starvation come up.  Some of them
are [described here](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000345.html).
There are more: the virtual register lookup table, for example, would
result in instructions blocking if it were not large enough.

This is a really serious issue: even plain external JavaScript, executed
from an arbitrary untrusted web site, could result in information leakage!

So it is going to need a lot of thought.  Essentially, as learned from
both the Mill Architecture and in-order designs, those two designs do not
suffer from Spectre timing attacks because no instruction may cause
resource starvation such that information leaks out about other
instructions being processed at the time.  These characteristics are
what have to be replicated in an out-of-order design.

It may mean huge over-allocation of resources, and it may mean "dialing
back" on the number of instructions issued per cycle.  It may also
mean simply identifying processes (or instruction groups) that are
vulnerable, and sandboxing them.  In this way, arbitrary untrusted
code may only "compromise itself".  In practical terms it would mean
clearing out the machine state whenever untrusted code is to be run.
Is that viable?  Honestly, I don't know.
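
The "clear out the machine state" option can be sketched the same way.  In
this hypothetical model a single set of "warm" addresses stands in for any
microarchitectural state shared between trust domains (caches, predictors,
lookup tables); the hit and miss latencies, the class and function names,
and the secret-dependent access are all assumptions.  Without a flush, the
untrusted probe's latencies reveal which line the trusted code touched; with
a flush on every switch to untrusted code, the probe sees the same latencies
whatever the secret.

```python
from typing import Set, Tuple

HIT_CYCLES = 1    # hypothetical latencies for the toy model
MISS_CYCLES = 20


class SharedState:
    """Toy stand-in for microarchitectural state shared across trust domains."""

    def __init__(self) -> None:
        self.warm: Set[int] = set()

    def access(self, line: int) -> int:
        # Fast if the line is already warm; either way it is warm afterwards.
        latency = HIT_CYCLES if line in self.warm else MISS_CYCLES
        self.warm.add(line)
        return latency

    def flush(self) -> None:
        self.warm.clear()


def observe(secret_bit: int, flush_on_switch: bool) -> Tuple[int, int]:
    state = SharedState()
    state.access(secret_bit)   # trusted code: secret-dependent access
    if flush_on_switch:
        state.flush()          # clear machine state before untrusted code runs
    # Untrusted code times an access to each candidate line.
    return state.access(0), state.access(1)


for flush in (False, True):
    print(flush, observe(0, flush), observe(1, flush))
```

The flush closes this particular channel, but it also discards the trusted
code's warm state, which would then have to be rebuilt after every switch;
that cost is part of what makes the viability question hard to answer.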

There is so much to look at here that it is going to take time to
evaluate.  Enough designs have made mistakes: it's generally a good idea
to learn from them.