This document contains the Requirements Specification for the Libre RISC-V
micro-architectural design. It shall meet the target of 5-6 32-bit GFLOPs,
-150 M-Pixels/sec, 30 Million Triangles/sec, and minimum video decode
+150 M-Pixels/sec, 30 Million Triangles/sec, and minimum video decode
capability of 720p @ 30fps to a 1920x1080 framebuffer, in under 2.5 watts
at an 800MHz clock rate. Exceeding this target is acceptable if the
power budget is not exceeded. Exceeding this target "just because we can"
corresponding 8/16-bit Function Unit(s) for that register, and vice-versa.
8/16-bit operations will however **not** block the remaining
(unallocated) bytes of the same register from being utilised.
+* Spectre timing attacks will be dealt with by ensuring that there
+ are no side-channels between cores in the usual ways (no shared
+ DIV unit, correct use of L1 cache). In addition, a "Speculation
+ Fence" instruction (or hint) will be added that resets the internal
+ state to a known quiescent state. This involves cancellation of all
+ speculation, cancellation of "nameless" registers, committing
+ outstanding register writes to the register file, and cancelling
+ all Function Units waiting for read hazards. This is to be done
+ automatically on any exception or interrupt.
# Register File
# Function Units
+## Commit Phase (instruction order preservation)
# 6600 Scoreboards
## Branch Speculation
-![shadow branch][shadow_issue_flipflops.png]
-
+Branch speculation is done by preventing instructions from becoming
+"writeable" until the Branch Unit has resolved the branch.
+This is done with the addition of "Shadow" lines, as shown below:
+
+This image is reproduced with kind permission, Copyright (C) Mitch Alsup
+[[!img shadow_issue_flipflops.png]]
+
+Note that there are multiple "Shadow" signals, coming not just from Branch
+Speculation but also from predication and exception shadows.
+
+On a "Failed" signal, the instruction is told to "Go Die". This is
+passed to the Computation Unit as well. When all "Success" signals
+are raised the instruction is permitted to enter "Writeable".
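
The "Shadow" behaviour described above can be sketched as a small state model. This is only an illustration under stated assumptions, not the hardware design: the class `ShadowedInstruction`, the `State` names and the shadow source labels (`"branch0"`, `"exc_load3"`) are all hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    SHADOWED = "shadowed"    # write withheld while any shadow is held
    WRITEABLE = "writeable"  # all shadows released, result may be written
    DEAD = "dead"            # instruction was told to "Go Die"


@dataclass
class ShadowedInstruction:
    # Labels of the shadow sources still holding this instruction
    # (hypothetical names, e.g. "branch0" or "exc_load3").
    shadows: set = field(default_factory=set)
    state: State = State.SHADOWED

    def __post_init__(self):
        # An instruction issued under no shadow is writeable at once.
        if not self.shadows:
            self.state = State.WRITEABLE

    def success(self, source):
        """A shadow source raised "Success": release its hold."""
        if self.state is State.DEAD:
            return
        self.shadows.discard(source)
        if not self.shadows:
            self.state = State.WRITEABLE

    def fail(self, source):
        """A shadow source raised "Failed": cancel if it held us."""
        if source in self.shadows:
            self.state = State.DEAD


# One instruction under both a branch shadow and an exception shadow.
insn = ShadowedInstruction(shadows={"branch0", "exc_load3"})
insn.success("branch0")      # still held by the exception shadow
insn.success("exc_load3")    # last shadow released
print(insn.state)
```

The model treats branch, predication and exception shadows identically: an instruction becomes writeable only once every source has signalled "Success", and a single "Failed" from any source holding it is enough to cancel it.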
+
+## Exceptions
+
+Exceptions shall be handled by each instruction that *may* throw an
+exception having and holding a "Shadow" wire over all dependent
+Function Units, in exactly the same way as Branch Speculation.
+Likewise, dependent instructions are prevented and prohibited from
+entering the "Writeable" state.
+
+Dependent downstream instructions, if the exception is thrown,
+shall have the "Failed" bit ASSERTED (by the Function Unit throwing
+the exception) such that the down-stream dependent instruction is told
+to "Go Die".
+
+If the point is reached at which the instruction knows that the
+Exception cannot possibly occur, the "Success" signal is raised
+instead, thus cancelling the "hold" over dependent downstream
+instructions - again in exactly the same way as Branch Speculation
+"Success".
+
+Exceptions may **only** be actually raised if they are at the front of
+the instruction queue, i.e. if they are free of write hazards.
+See section on "Function Unit Commit" phase, as the Function Units
+have a "link bit" that preserves the instruction issue order, which
+must also be respected.
+
+# Spectre-style timing mitigation
+
+Spectre-style timing attacks are defined by one instruction issue
+affecting the completion time of past **and future** instructions.
+The key insight to mitigation against such attacks is to note that
+arbitrary untrusted instructions must not be permitted to affect
+trusted instructions. Consequently as long as there is a firebreak
+(a "Fence") between trusted and untrusted, timing attacks can be
+held off.
+
+Two instructions ("hints") shall therefore be added:
+
+* One that stops speculation, multi-issue and any out-of-order
+ resource allocation for a minimum of 16 instructions.
+* Another that **cancels** all speculation and reservations,
+ cancels "nameless" registers, waits for and ensures that all
+ outstanding instructions have completed and committed, before
+ permitting the processor to continue further.
+
+This latter shall occur unconditionally without requiring a special
+instruction to be called, on ECALL as well as all exceptions and
+interrupts.
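
As a rough behavioural sketch of the two hints, the toy model below tracks only which instructions are speculative, outstanding or committed. Everything in it is an assumption made for illustration: the class `ToyCore`, its method names and the constant `SERIALISE_WINDOW` are hypothetical, "nameless" registers are not modelled, and no encoding for the hints is implied.

```python
SERIALISE_WINDOW = 16  # minimum window length stated in the text


class ToyCore:
    def __init__(self):
        self.serialise_remaining = 0
        self.speculative = []  # issued under speculation, may be cancelled
        self.outstanding = []  # issued, not speculative, not yet committed
        self.committed = []

    def hint_serialise(self):
        """First hint: no speculation or out-of-order issue for a while."""
        self.serialise_remaining = max(self.serialise_remaining,
                                       SERIALISE_WINDOW)

    def issue(self, insn, speculative=False):
        if self.serialise_remaining > 0:
            # Inside the window: in-order, single issue, never speculative.
            self.serialise_remaining -= 1
            self.committed.append(insn)
        elif speculative:
            self.speculative.append(insn)
        else:
            self.outstanding.append(insn)

    def hint_fence(self):
        """Second hint: cancel speculation, then drain and commit."""
        self.speculative.clear()
        self.committed.extend(self.outstanding)
        self.outstanding.clear()

    def trap(self):
        """ECALL, exceptions and interrupts imply the second hint."""
        self.hint_fence()


core = ToyCore()
core.issue("a")
core.issue("b", speculative=True)
core.trap()
print(core.committed, core.speculative, core.outstanding)
```

After the trap the core holds no speculative or outstanding work, which is the quiescent state that the firebreak between trusted and untrusted code relies on.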
+
+# ALU design
+
+There is a separate pipelined ALU for fdiv/fsqrt/frsqrt/idiv/irem
+that is possibly shared between 2 or 4 cores.
+
+The main ALUs are each a unified ALU for i8-i64/f16-f64 where the
+ALU is split into lanes with separate instructions for each 32-bit half.
+So, the multiplier should be capable of 64-bit fmadd, 2x32-bit fmadd,
+4x16-bit fmadd, 1x32-bit fmadd + 2x16-bit fmadd (in either order), and all
+(8/16/32/64) sizes of integer mul/mulhsu/mulh/mulhu in 2 groups of 32-bits.
+We can implement fmul using fmadd with an addend of 0, provided the sign
+of that zero is chosen so that the result has the right sign bit in all
+rounding modes.
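
The sign-of-zero concern for fmul-via-fmadd can be demonstrated in a few lines. This is a sketch, not the ALU design: Python floats only offer round-to-nearest and the multiply-add here is unfused, but the sign rule for an exactly-zero sum is the same one IEEE 754 applies to a fused multiply-add. The helper names (`sign_bit`, `fmul_via_add_zero`, `addend_for`) are hypothetical; the rounding mode labels follow the RISC-V names, and the RDN case reflects the IEEE 754 rule that an exact zero sum of opposite-signed operands is -0 only when rounding toward negative.

```python
import math


def sign_bit(x):
    """True if the IEEE 754 sign bit of x is set (works for -0.0)."""
    return math.copysign(1.0, x) < 0


def fmul_via_add_zero(a, b, addend):
    """Model fmul as a*b + addend (unfused here, fused in hardware)."""
    return a * b + addend


def addend_for(rounding_mode):
    """Zero addend that preserves the sign of a zero product.

    -0.0 works when opposite-signed zeros sum to +0 (RNE, RTZ, RUP, RMM).
    Under RDN such sums give -0, so +0.0 must be used instead.
    """
    return 0.0 if rounding_mode == "RDN" else -0.0


# -1.0 * 0.0 is -0.0; a +0.0 addend loses the sign in round-to-nearest.
print(sign_bit(fmul_via_add_zero(-1.0, 0.0, 0.0)))
# A -0.0 addend keeps it.
print(sign_bit(fmul_via_add_zero(-1.0, 0.0, -0.0)))
```

Only exactly-zero products are affected: for any nonzero product, adding a zero of either sign leaves the value and its sign unchanged.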
+
+# Rowhammer Mitigation
+
+* <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-March/000699.html>
+* <https://arxiv.org/pdf/1903.00446.pdf>