# Requirements Specification

This document contains the Requirements Specification for the Libre RISC-V
micro-architectural design. The design shall meet a target of 5-6 GFLOPs
(32-bit), 150 MPixels/sec, 30 million triangles/sec, and a minimum video
decode capability of 720p @ 30fps to a 1920x1080 framebuffer, in under
2.5 watts at an 800 MHz clock rate. Exceeding this target is acceptable
provided that the power budget is not exceeded. Exceeding it "just because
we can" is also acceptable, as long as doing so does not disrupt meeting
the minimum performance and power requirements.
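As a rough sanity check, the headline targets can be turned into per-cycle and per-watt figures. The sketch below is back-of-envelope arithmetic only: the variable names are illustrative, and 720p is assumed to mean 1280x720.

```python
# Back-of-envelope figures derived from the stated targets (illustrative only).
CLOCK_HZ = 800e6              # 800 MHz clock rate
GFLOPS_TARGET = (5e9, 6e9)    # 5-6 GFLOPs, 32-bit
POWER_BUDGET_W = 2.5          # upper bound on power

# FP32 operations that must complete per clock cycle: (6.25, 7.5)
flops_per_cycle = tuple(g / CLOCK_HZ for g in GFLOPS_TARGET)

# Pixels decoded per second at 720p @ 30fps, assuming 1280x720: 27,648,000
video_pixels_per_sec = 1280 * 720 * 30

# Minimum efficiency needed if the whole power budget is used
gflops_per_watt = tuple(g / 1e9 / POWER_BUDGET_W for g in GFLOPS_TARGET)
```

In other words, the design has to sustain roughly six to eight 32-bit floating-point operations every cycle to stay within the stated clock rate.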

# General Architectural Design Principle

The general design basis is an augmented and enhanced variant of the
original CDC 6600 scoreboard system. It is not well known that the 6600
includes operand forwarding and register renaming. Precise exceptions,
precise in-order commit, branch speculation, "nameless" registers
(results detected as not needing to be written because they have been
overwritten by another instruction), predication and vectorisation
will all be added by overloading write hazards.

An overview of the design is as follows:

* 3D and Video primitives (operations) will only be added as strictly
  necessary to achieve the minimum power and performance target.
* Identified so far is a 4xFP32 ARGB Quad to 1xINT32 ARGB pixel
  conversion opcode (part of the Vulkan API). It will write directly
  to a separate "tile buffer" (SRAM), not to the integer register
  file. The instruction will be scalar and will be inherently and
  automatically parallelised by SV, just like all other scalar opcodes.
* xBitManip opcodes will be required to deal with VPU workloads.
* The register files will be stratified into 4-way 2R1W banks,
  with *separate* and distinct byte-level write-enable lines on all four
  bytes of all four banks.
* 6600-style scoreboards will be augmented with "shadow" wires
  and write hazard capability on exceptions, branch speculation,
  LD/ST and predication.
* Each "shadow" capability of each type will be provided by a separate
  Function Unit. For example, if it is to be possible to roll
  ahead through two speculative branches, then two **separate**
  Branch-speculative Function Units will be required: each will
  hold its own separate and distinct "shadow" (Go-Die wire) and
  write hazard over the instructions that depend on its branch.
* Likewise for predication, which shall place a "hold" on
  the Function Units that depend on it until the register used
  as a predicate mask has been read and decoded: there will be
  separate Function Units waiting for each predication mask register.
  Bits in the mask that are "0" will result in "Go-Die" signals being
  sent to the Function Units previously (speculatively) allocated for that
  (now cancelled) element operation. Bits that are "1" will cancel
  their Write-Hazard and allow the Function Unit to proceed with that
  element's operation.
* The 6600 "Q-Table", which records, for each register, the last Function
  Unit (in instruction issue order) that is to write its result to that
  register, shall be augmented with a "history" capability that assists
  in the "rollback" of "nameless" registers, should an exception
  or interrupt occur. "History" is simply a (short) queue (stack)
  that preserves, in instruction-issue order, a record of the previous
  Function Unit(s) that targeted each register as a destination.
* Function Units will have both src and destination Reservation
  Stations (latches) in order to buffer incoming and outgoing data.
  This is to make best use of (limited) inter-Function-Unit bus bandwidth.
* Crossbar Routing from the Register File will be on the **source**
  registers **only**: Function Units will route **directly** to,
  and be hard-wired to, one of four register banks.
* Additional "Operand Forwarding" crossbar(s) will be added that
  **bypass** the register file entirely, to be used exclusively
  for registers that have specifically been identified as "nameless".
* Function Units will be the *front-end* to **shared** pipelined
  concurrent ALUs. The input src registers will come from the
  latches associated with the Function Unit, and the ALU will put the
  result **back** into the destination latch associated with that
  **same** Function Unit.
* **Pairs** of 32-bit Function Units will handle 64-bit operations,
  with the 32-bit src Reservation Stations (latches) "teaming up"
  to store 64-bit src register values, and likewise the 32-bit
  destination latches for the same (paired) Function Units.
* 32-bit Function Units will handle 8 and 16 bit operations in
  cases where batches of operations may be (easily, conveniently)
  allocated to a 32-bit-wide SIMD-style (predicated) ALU.
* Additional 8-bit Function Units (in groups of 4) will handle
  8-bit operations as well as pair up to handle 16-bit operations
  in cases where neither 8 nor 16 bit operations can be (conveniently,
  easily) allocated to parallel (SIMD-like) ALUs. This is to handle
  corner-cases and to avoid jamming up the 32-bit Function Units with
  single-byte operations (resulting in only 25% utilisation).
* Allocation of an operation to a 32-bit ALU will block the
  corresponding 8/16-bit Function Unit(s) for that register, and vice-versa.
  8/16-bit operations will however **not** block the remaining
  (unallocated) bytes of the same register from being utilised.
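The predication behaviour described in the list above can be sketched as a small software model. This is only an illustration under assumptions: `ElementFU` and `apply_predicate_mask` are hypothetical names, and the real design would express this in hardware signals rather than Python objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ElementFU:
    """Hypothetical model of a Function Unit speculatively allocated to one element."""
    element: int
    write_hazard: bool = True   # "hold" kept until the predicate mask is decoded
    go_die: bool = False        # set when the element operation is cancelled


def apply_predicate_mask(mask: int, fus: List[ElementFU]) -> None:
    """Decode the predicate mask and release or cancel each element's FU."""
    for fu in fus:
        if (mask >> fu.element) & 1:
            fu.write_hazard = False   # bit is 1: drop the hold, FU may proceed
        else:
            fu.go_die = True          # bit is 0: send "Go-Die" to this FU


# Four elements, mask 0b0101: elements 0 and 2 proceed, 1 and 3 are cancelled.
fus = [ElementFU(element=i) for i in range(4)]
apply_predicate_mask(0b0101, fus)
```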

# Register File

There shall be two 127-entry 64-bit register files: one for floating-point,
the other for integer operations. Each shall have byte-level write-enable
lines, and shall be divided into 4-way 2R1W banks that are split by
odd-even register number and further split into hi-32 and lo-32 bits.

In this way, 2 simultaneous 64-bit operations may write to the register
file (as long as one destination is odd-numbered and the other
even-numbered), or likewise 4 simultaneous 32-bit operations. Byte-level
write-enable is provided so that writes may be performed down to the
16-bit and 8-bit level without requiring additional reads.

Additionally, if a read is requested for a register that is currently
being written, the written value shall be "passed through" on the same
cycle, such that the register file may effectively be used as an
"Operand Forwarding" Channel.
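A behavioural sketch of the banking, byte-level write-enable and same-cycle pass-through may help make this concrete. It assumes the four banks are (odd/even) x (hi-32/lo-32) with one write port each, and it does not model the two-read-port limit. The `RegisterFile` class and its method names are hypothetical.

```python
class RegisterFile:
    """Toy model: 64-bit registers in four banks (odd/even x hi/lo 32 bits)."""

    def __init__(self, num_regs=127):
        self.regs = [0] * num_regs

    @staticmethod
    def banks(regnum, byte_enable):
        """Banks touched by a write; bit i of byte_enable enables byte i."""
        parity = regnum & 1
        touched = set()
        if byte_enable & 0x0F:
            touched.add((parity, "lo"))
        if byte_enable & 0xF0:
            touched.add((parity, "hi"))
        return touched

    def cycle(self, writes, reads):
        """Apply one cycle of writes, then service the reads.

        writes: list of (regnum, value, byte_enable); reads: list of regnum.
        Raises ValueError if two writes need the same single-write-port bank.
        """
        used = set()
        for regnum, _, byte_enable in writes:
            wanted = self.banks(regnum, byte_enable)
            if used & wanted:
                raise ValueError("write-port conflict on %s" % sorted(used & wanted))
            used |= wanted
        for regnum, value, byte_enable in writes:
            current = self.regs[regnum]
            for byte in range(8):
                if (byte_enable >> byte) & 1:
                    lane = 0xFF << (8 * byte)
                    # Only enabled bytes change, so no read-modify-write is needed.
                    current = (current & ~lane) | (value & lane)
            self.regs[regnum] = current
        # Reads observe this cycle's writes: the "pass-through" behaviour.
        return [self.regs[r] for r in reads]
```

For example, a 64-bit write to register 2 and a 32-bit write to register 3 can share a cycle, whereas two 64-bit writes to registers 2 and 4 (both even) would conflict under this model.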

# Function Units

## Commit Phase (instruction order preservation)

# 6600 Scoreboards

6600 Scoreboards are usually viewed as incomplete: an inability to perform
register renaming and a lack of precise exceptions are two of the perceived
flaws. These flaws do not exist; however, showing why takes some explaining.

## Q-Table (FU to Register Lookup)

The Q-Table is a lookup table that records the last Function Unit
that, in instruction issue order, is to write to any given register.
The original 6600 stored this in binary form; however, a unary
(bit-wise) form, with N Function Unit bits and M register bits, can be
recommended.

To support "nameless" registers, however, the Q-Table shall support
*multiple* (historical) entries, recording the history of the
*previous* Function Units that were to write to each register.
When historic entries exist (non-empty), the following shall occur:

* All Function Units with historic entries shall **not** commit
  their values to the register file, even if they are free to do so.
* All Function Units with historic entries shall hold a "write hazard"
  against their dependencies that are waiting for that "nameless" result.
* When a dependent Function Unit has cleared all possibility of an
  Exception being raised, it shall **drop** the write hazard on the
  "nameless" source.
* If a "nameless" Function Unit needs to generate an Exception, it
  does so in the standard way (see "Exceptions"). **However**,
  doing so will also result in a **roll back** of the Q-Table for
  **any and all** cancelled Function Units, to the *previous* (historic)
  Q-Table values for the relevant destination registers. Once
  rolled back, the Function Unit must store its result in the register
  file, prior to permitting the Exception to proceed.
* Likewise, if a dependent Function Unit has to generate an exception,
  and its source Function Units are "nameless", the "nameless"
  Function Units must also "roll back", store their results, and
  finally permit the Exception to trigger.
* Likewise, all other "nameless" results must also be "rolled back",
  except that, unlike the Function Units triggering the exception, they may
  roll back to the newest "nameless" historical Q-Table entry
  (if they have not already been cancelled by the FU triggering the
  exception).

Bear in mind that exceptions (like all operations that are ready to
commit) may only occur in-order (following a FU-to-FU "link" bit),
and may only occur if the Function Unit is entirely free of write hazards.
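The history mechanism can be illustrated with a minimal software model, assuming each register keeps a short stack of Function Unit ids in issue order. The `QTable` class and its method names are hypothetical. The sketch covers only the bookkeeping, not the write-hazard signalling or the store-before-exception step.

```python
class QTable:
    """Toy Q-Table: per-register stack of FU ids, oldest writer first."""

    def __init__(self, num_regs):
        self.history = [[] for _ in range(num_regs)]

    def issue(self, fu, dest):
        """Record fu as the newest writer of register dest."""
        self.history[dest].append(fu)

    def current_writer(self, reg):
        """Last FU (in issue order) due to write reg, or None."""
        entries = self.history[reg]
        return entries[-1] if entries else None

    def is_nameless(self, fu, reg):
        """True if fu's result for reg has been superseded by a later writer."""
        return fu in self.history[reg][:-1]

    def rollback(self, cancelled):
        """Drop cancelled FUs so that the previous (historic) writer is restored."""
        for entries in self.history:
            entries[:] = [fu for fu in entries if fu not in cancelled]


q = QTable(num_regs=8)
q.issue(fu=0, dest=5)      # FU 0 will write r5
q.issue(fu=1, dest=5)      # FU 1 overwrites r5, so FU 0's result is "nameless"
was_nameless = q.is_nameless(0, 5)
q.rollback({1})            # FU 1 is cancelled by an exception
```

After the rollback, FU 0 is once again the current writer of r5 and its result must reach the register file.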

## FU-to-FU Dependency Matrix

The Function-Unit to Function-Unit Dependency Matrix expresses the
read and write hazards (dependencies) between Function Units.
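One way to picture such a matrix is as one bit-vector row per Function Unit, with bit j set while that unit must wait on Function Unit j. The sketch below is an assumption-laden illustration: it does not distinguish read hazards from write hazards, and the `FUDependencyMatrix` name and its methods are hypothetical.

```python
class FUDependencyMatrix:
    """Toy N x N matrix: waits_on[i] has bit j set when FU i must wait for FU j."""

    def __init__(self, num_fus):
        self.num_fus = num_fus
        self.waits_on = [0] * num_fus

    def add_dependency(self, waiter, producer):
        """Record that `waiter` has a hazard against `producer`."""
        self.waits_on[waiter] |= 1 << producer

    def complete(self, producer):
        """Clear the producer's column once it no longer poses a hazard."""
        keep = ~(1 << producer)
        for fu in range(self.num_fus):
            self.waits_on[fu] &= keep

    def ready(self, fu):
        """An FU may proceed when its row is entirely clear."""
        return self.waits_on[fu] == 0


m = FUDependencyMatrix(num_fus=4)
m.add_dependency(waiter=2, producer=0)
m.add_dependency(waiter=2, producer=1)
m.complete(0)              # FU 2 still waits on FU 1
still_blocked = not m.ready(2)
m.complete(1)              # now FU 2 is free of hazards
```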

## Branch Speculation

Branch speculation is done by preventing instructions from becoming
"writeable" until the Branch Unit knows the outcome of the branch.
This is done with the addition of "Shadow" lines, as shown below:

This image is reproduced with kind permission, Copyright (C) Mitch Alsup.
[[!img shadow_issue_flipflops.png]]

Note that there are multiple "Shadow" signals, coming not just from Branch
Speculation but also from predication and exception shadows.

On a "Failed" signal, the instruction is told to "Go Die". This is
passed to the Computation Unit as well. When all "Success" signals
are raised, the instruction is permitted to enter "Writeable".
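The "Success" / "Failed" behaviour can be modelled as a small state machine. This is a software sketch only, not the flip-flop arrangement in the image: `ShadowedInstruction`, the state names and the shadow labels are all hypothetical.

```python
class ShadowedInstruction:
    """Toy model of an instruction held under one or more "Shadow" lines."""

    def __init__(self, shadows):
        self.pending = set(shadows)   # shadows still held over this instruction
        self.state = "shadowed" if self.pending else "writeable"

    def success(self, shadow):
        """A shadow source signals "Success": release that shadow."""
        if self.state == "dead":
            return
        self.pending.discard(shadow)
        if not self.pending:
            self.state = "writeable"   # all shadows released

    def failed(self, shadow):
        """A shadow source signals "Failed": the instruction must "Go Die"."""
        if shadow in self.pending:
            self.state = "dead"


insn = ShadowedInstruction({"branch0", "pred1"})
insn.success("branch0")
after_first = insn.state       # one shadow is still pending
insn.success("pred1")
after_second = insn.state      # every shadow has been released
```

A single "Failed" on any pending shadow is enough to cancel the instruction, whereas every shadow must signal "Success" before it may write.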

## Exceptions

Exceptions shall be handled by having each instruction that *may* throw an
exception hold a "Shadow" wire over all dependent Function Units, in
exactly the same way as Branch Speculation. Likewise, dependent
instructions are prevented from entering the "Writeable" state.

If the exception is thrown, dependent downstream instructions
shall have the "Failed" bit ASSERTED (by the Function Unit throwing
the exception), such that each downstream dependent instruction is told
to "Go Die".

If the point is reached at which the instruction knows that the
Exception cannot possibly occur, the "Success" signal is raised
instead, thus cancelling the "hold" over dependent downstream
instructions, again in exactly the same way as Branch Speculation
"Success".

Exceptions may **only** actually be raised if the instruction is at the
front of the instruction queue, i.e. if it is free of write hazards.
See the "Commit Phase" section under "Function Units": the Function Units
have a "link bit" that preserves the instruction issue order, which
must also be respected.
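The condition for actually raising an exception can be written as a simple predicate. In this sketch the "link bit" chain is stood in for by an explicit list of Function Unit ids in issue order. That list, the `write_hazards` mapping and the function name are assumptions made for illustration.

```python
def may_raise_exception(fu, issue_order, write_hazards):
    """Return True if `fu` is allowed to actually raise its exception.

    issue_order   -- FU ids, oldest-issued first (stand-in for the "link bit" chain)
    write_hazards -- dict mapping an FU id to the set of FUs holding a
                     write hazard over it
    """
    at_front = bool(issue_order) and issue_order[0] == fu
    hazard_free = not write_hazards.get(fu)
    return at_front and hazard_free


# FU 2 was issued first and nothing holds a hazard over it; FU 5 depends on FU 2.
order = [2, 5]
hazards = {5: {2}}
```

Here `may_raise_exception(2, order, hazards)` holds, while FU 5 must wait until FU 2 has either committed or released its hold.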