# Requirements Specification

This document contains the Requirements Specification for the Libre RISC-V
micro-architectural design. It shall meet the target of 5-6 GFLOPS (32-bit
floating-point), 150 M-Pixels/sec, 30 Million Triangles/sec, and a minimum
video decode capability of 720p @ 30fps to a 1920x1080 framebuffer, in under
2.5 watts at an 800 MHz clock rate. Exceeding this target is acceptable if
the power budget is not exceeded. Exceeding this target "just because we can"
is also acceptable, as long as it does not disrupt meeting the minimum
performance and power requirements.
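
As a rough sanity check, the headline targets can be converted into per-cycle budgets at the stated clock rate. The sketch below only does arithmetic on the figures quoted above; the variable names are illustrative, and 720p is taken to mean 1280x720.

```python
# Figures quoted from the requirements above.
CLOCK_HZ = 800e6            # 800 MHz clock rate
FLOPS_MIN = 5e9             # lower end of the 5-6 GFLOPS target
FLOPS_MAX = 6e9             # upper end of the 5-6 GFLOPS target
PIXELS_PER_SEC = 150e6      # 150 M-Pixels/sec
TRIANGLES_PER_SEC = 30e6    # 30 Million Triangles/sec

# FP32 operations that must complete per clock cycle, on average.
flops_per_cycle_min = FLOPS_MIN / CLOCK_HZ
flops_per_cycle_max = FLOPS_MAX / CLOCK_HZ

# Clock cycles available per pixel and per triangle.
cycles_per_pixel = CLOCK_HZ / PIXELS_PER_SEC
cycles_per_triangle = CLOCK_HZ / TRIANGLES_PER_SEC

# 720p is 1280x720; at 30fps the decoder must produce this many pixels/sec.
decode_pixels_per_sec = 1280 * 720 * 30

print(f"FLOPs per cycle:     {flops_per_cycle_min} to {flops_per_cycle_max}")
print(f"Cycles per pixel:    {cycles_per_pixel:.2f}")
print(f"Cycles per triangle: {cycles_per_triangle:.2f}")
print(f"Decode pixels/sec:   {decode_pixels_per_sec}")
```

In other words, the design needs to sustain between 6.25 and 7.5 32-bit floating-point operations per cycle, with a little over 5 cycles available per pixel.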

# General Architectural Design Principle

The design is based on an augmented and enhanced variant of the original
CDC 6600 scoreboard system. It is not well-known that the 6600 includes
operand forwarding and register renaming. Precise exceptions, precise
in-order commit, branch speculation, "nameless" registers (results that
are detected as not needing to be written because they have been
overwritten by another instruction), predication and vectorisation
will all be added by overloading write hazards.

An overview of the design is as follows:

* 3D and Video primitives (operations) will only be added as strictly
  necessary to achieve the minimum power and performance target.
* Identified so far is a 4xFP32 ARGB Quad to 1xINT32 ARGB pixel
  conversion opcode (part of the Vulkan API). It will write directly
  to a separate "tile buffer" (SRAM), not to the integer register
  file. The instruction will be scalar and will be inherently and
  automatically parallelised by SV, just like all other scalar opcodes.
* The register files will be stratified into 4-way 2R1W banks,
  with byte-level write-enable on all banks.
* 6600-style scoreboards will be augmented with "shadow" wires
  and write hazard capability on exceptions, branch speculation,
  LD/ST and predication.
* Each "shadow" capability will be provided by a separate Function
  Unit. If it is to be possible to roll ahead through two speculative
  branches, then two **separate** Branch-speculative Function Units
  will be required: each will hold its own separate and distinct
  "shadow" (Go-Die wire) and write-hazard over instructions on which
  the branch depends.
* Likewise for predication, which shall place a "hold" on
  the Function Units that depend on it until the register used
  as a predicate mask has been read and decoded. Bits in the
  mask that are "zero" will result in "Go-Die" signals being
  sent to the Function Units previously (speculatively) allocated
  for that (now cancelled) element operation.
* The 6600 "Q-Table", which records, for each register, the last Function
  Unit (in instruction issue order) that is to write its result to that
  register, shall be augmented with "history" capability that assists
  in "rollback" of "nameless" registers, should an exception
  or interrupt occur.
* Function Units will have both src and destination Reservation
  Stations (latches) in order to buffer incoming and outgoing data.
  This is to make best use of (limited) inter-Function-Unit bus bandwidth.
* Crossbar Routing from the Register File will be on the **source**
  registers **only**: Function Units will route **directly** to,
  and be hard-wired to, one of four register banks.
* Additional "Operand Forwarding" crossbar(s) will be added that
  **bypass** the register file entirely, to be used exclusively
  for registers that have specifically been identified as "nameless".
* Function Units will be the *front-end* to **shared** pipelined
  concurrent ALUs. The input src registers will come from the
  latches associated with the Function Unit, and the ALU will put the
  result **back** into the destination latch associated with that
  **same** Function Unit.
* **Pairs** of 32-bit Function Units will handle 64-bit operations,
  with the 32-bit src Reservation Stations (latches) "teaming up"
  to store 64-bit src register values, and likewise the 32-bit
  destination latches for the same (paired) Function Units.
* 32-bit Function Units will handle 8 and 16 bit operations in
  cases where batches of operations may be (easily, conveniently)
  allocated to a 32-bit-wide SIMD-style (predicated) ALU.
* Additional 8-bit Function Units (in groups of 4) will handle
  8-bit operations as well as pair up to handle 16-bit operations
  in cases where neither 8 nor 16 bit operations can be (conveniently,
  easily) allocated to parallel (SIMD-like) ALUs. This is to handle
  corner-cases and to avoid jamming up the 32-bit Function Units with
  single-byte operations (resulting in only 25% utilisation).
* Allocation of an operation to a 32-bit ALU will block the
  corresponding 8/16-bit Function Unit(s) for that register, and vice-versa.
  8/16-bit operations will however **not** block the remaining
  (unallocated) bytes of the same register from being utilised.
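
The predication item above can be illustrated with a small behavioural model. This is a minimal sketch, not the hardware design: it assumes one speculatively allocated Function Unit per vector element, and the function name, the FU labels and the convention that bit `i` of the mask governs element `i` are all hypothetical choices made for the example.

```python
def resolve_predicate(allocated_fus, predicate_mask):
    """Split speculatively allocated FUs once the predicate mask is decoded.

    allocated_fus  -- list; entry i is the FU label allocated to element i,
                      or None if no FU was allocated for that element.
    predicate_mask -- integer; bit i set means element i is enabled
                      (assumed convention for this sketch).

    Returns (proceed, go_die): FUs released from their "hold", and FUs
    that receive a Go-Die signal because their mask bit is zero.
    """
    proceed, go_die = [], []
    for element, fu in enumerate(allocated_fus):
        if fu is None:
            continue
        if (predicate_mask >> element) & 1:
            proceed.append(fu)   # mask bit is one: operation goes ahead
        else:
            go_die.append(fu)    # mask bit is zero: element is cancelled
    return proceed, go_die


# Four element operations were allocated while the mask was still unread.
proceed, go_die = resolve_predicate(["FU0", "FU1", "FU2", "FU3"], 0b0101)
print("proceed:", proceed)
print("go-die: ", go_die)
```

With a mask of `0b0101`, elements 0 and 2 proceed and the Function Units allocated to elements 1 and 3 are cancelled.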

# Register File

There shall be two 127-entry 64-bit register files: one for floating-point,
the other for integer operations. Each shall have byte-level write-enable
lines, and shall be divided into 4-way 2R1W banks that are split into
odd-even register numbers and further split into hi-32 and lo-32 bits.

In this way, 2 simultaneous 64-bit operations may write to the register
file (as long as one destination is odd-numbered and the other is
even-numbered), or 4 simultaneous 32-bit operations likewise. Byte-level
write-enable is provided so that writes may be performed down to the
16-bit and 8-bit level without requiring additional reads.
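
The banking rule can be sketched as a simple conflict check. The model below is an assumption-laden illustration: it treats each bank as having a single write port (the "1W" in 2R1W), identifies a bank by (odd/even register number, hi/lo half), and assumes that byte-enable bits 0-3 select the lo-32 half and bits 4-7 the hi-32 half. The function names are hypothetical.

```python
def banks_touched(regnum, byte_enable):
    """Return the set of banks written by one write request.

    regnum      -- destination register number.
    byte_enable -- 8-bit mask, one bit per byte of the 64-bit register
                   (bits 0-3 = lo-32, bits 4-7 = hi-32: assumed layout).
    """
    parity = "odd" if regnum & 1 else "even"
    banks = set()
    if byte_enable & 0x0F:
        banks.add((parity, "lo"))
    if byte_enable & 0xF0:
        banks.add((parity, "hi"))
    return banks


def can_write_same_cycle(writes):
    """True if no two writes in the list need the same single-write-port bank."""
    used = set()
    for regnum, byte_enable in writes:
        touched = banks_touched(regnum, byte_enable)
        if used & touched:
            return False         # two writes collide on one bank
        used |= touched
    return True


# Two 64-bit writes: allowed only when one is odd and the other even.
print(can_write_same_cycle([(2, 0xFF), (3, 0xFF)]))
print(can_write_same_cycle([(2, 0xFF), (4, 0xFF)]))
# Four 32-bit writes, one per bank.
print(can_write_same_cycle([(2, 0x0F), (4, 0xF0), (3, 0x0F), (5, 0xF0)]))
```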

Additionally, if a read is requested for a register that is currently
being written, the written value shall be "passed through" on the same
cycle, such that the register file may effectively be used as an
"Operand Forwarding" Channel.
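
A behavioural sketch of the pass-through requirement follows. It models whole-register writes only and ignores banking and byte-enables; the class name, the `cycle` method and the 8-register size are hypothetical choices made to keep the example short.

```python
class PassThroughRegFile:
    """Register file whose reads see same-cycle writes (write-through)."""

    def __init__(self, num_regs):
        self.regs = [0] * num_regs

    def cycle(self, writes, reads):
        """Simulate one clock cycle.

        writes -- dict mapping register number to the value being written.
        reads  -- list of register numbers being read this cycle.
        Returns the values seen by the read ports, in order.
        """
        # A read of a register that is being written this cycle returns
        # the value in flight, not the stale stored value.
        results = [writes.get(r, self.regs[r]) for r in reads]
        # The stored state is then updated with the written values.
        for regnum, value in writes.items():
            self.regs[regnum] = value
        return results


rf = PassThroughRegFile(8)
same_cycle = rf.cycle(writes={3: 42}, reads=[3, 4])
next_cycle = rf.cycle(writes={}, reads=[3])
print(same_cycle, next_cycle)
```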

# Function Units

# 6600 Scoreboards

6600 Scoreboards are usually viewed as incomplete: an inability to perform
register renaming and a lack of precise exceptions are two of the perceived
flaws. These flaws do not exist, although explaining why takes some effort.

## Q-Table (FU to Register Lookup)

The Q-Table is a lookup table that records the last Function Unit
that, in instruction issue order, is to write to any given
register. The original 6600 stored this in binary form, however
a unary bit-wise form (N Function Unit bits by M register bits)
can be recommended.
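
The unary form can be pictured as one bit-vector per register with at most one bit set. The following is a minimal sketch under that reading; the class and method names are hypothetical, and clearing the entry when the recorded Function Unit completes is an assumption of the example rather than something stated above.

```python
class QTable:
    """Unary Q-Table: one N-bit vector (N = number of FUs) per register."""

    def __init__(self, num_fus, num_regs):
        self.num_fus = num_fus
        # rows[r] has bit f set when FU f is the last (in issue order)
        # to target register r; zero means no write is pending.
        self.rows = [0] * num_regs

    def issue(self, fu, dest_reg):
        # A newly issued instruction replaces any earlier writer.
        self.rows[dest_reg] = 1 << fu

    def last_writer(self, reg):
        row = self.rows[reg]
        return row.bit_length() - 1 if row else None

    def complete(self, fu, dest_reg):
        # Only clear the entry if this FU is still the recorded writer.
        if self.rows[dest_reg] == 1 << fu:
            self.rows[dest_reg] = 0


q = QTable(num_fus=4, num_regs=8)
q.issue(fu=1, dest_reg=5)
q.issue(fu=3, dest_reg=5)      # a later instruction targets the same register
print(q.last_writer(5))        # FU 3 is now the last writer
q.complete(fu=1, dest_reg=5)   # FU 1 finishing does not disturb the entry
print(q.last_writer(5))
```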

However, to support "nameless" registers, the Q-Table shall support
*multiple* (historical) entries, recording the history of the
*previous* Function Units that were to write to each register.
When historic entries exist (non-empty), the following shall occur:

* All Function Units with historic entries shall **not** commit
  their values to the register file, even if they are free to do so.
* All Function Units with historic entries shall hold a "write hazard"
  against their dependencies that are waiting for that "nameless" result.
* When a dependent Function Unit has cleared all possibility of an
  Exception being raised, it shall **drop** the write hazard on the
  "nameless" source.
* If a "nameless" Function Unit needs to generate an Exception, it
  does so in the standard way (see "Exceptions"), **however**,
  in doing so it will also result in a **roll back** of the Q-Table for
  **all and any** cancelled Function Units, to *previous* (historic)
  Q-Table values for the relevant destination registers. Once
  rolled back, the Function Unit must store its result in the register
  file, prior to permitting the Exception to proceed.
* Likewise, if a dependent Function Unit has to generate an exception,
  and its source Function Units are "nameless", the "nameless"
  Function Units must also "roll back", store their results, and
  finally permit the Exception to trigger.
* Likewise, all other "nameless" results must also be "rolled back",
  except that, unlike the Function Units triggering the exception, they may
  roll back to the newest "nameless" historical Q-Table entry
  (if they have not already been cancelled by the FU triggering the
  exception).

Bear in mind that exceptions (like all operations that are ready to
commit) may only occur in-order (following a FU-to-FU "link" bit),
and may only occur if the Function Unit is entirely free of write hazards.
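
The history and rollback behaviour can be sketched with a per-register list of pending writers. This is only one possible reading of the rules above, offered as an illustration: the class name, its methods, and the choice to model "cancelled" as a set of FU numbers are assumptions of the example, and the write hazards and in-order commit logic are deliberately left out.

```python
class HistoryQTable:
    """Q-Table that keeps earlier ("nameless") writers for each register."""

    def __init__(self, num_regs):
        # history[r] lists FUs targeting register r in issue order;
        # the last entry is the current writer, earlier ones are historic.
        self.history = [[] for _ in range(num_regs)]

    def issue(self, fu, dest_reg):
        self.history[dest_reg].append(fu)

    def current_writer(self, reg):
        return self.history[reg][-1] if self.history[reg] else None

    def nameless_fus(self, reg):
        # FUs whose result has been superseded and need not be written.
        return self.history[reg][:-1]

    def rollback(self, cancelled):
        # Remove every cancelled FU; the newest surviving entry for each
        # register becomes its current writer again.
        for entries in self.history:
            entries[:] = [fu for fu in entries if fu not in cancelled]


q = HistoryQTable(num_regs=8)
q.issue(fu=0, dest_reg=5)
q.issue(fu=2, dest_reg=5)          # FU 0's result for r5 becomes "nameless"
print(q.nameless_fus(5), q.current_writer(5))

q.rollback(cancelled={2})          # an exception cancels FU 2
print(q.nameless_fus(5), q.current_writer(5))
```

After the rollback, FU 0 is once again the recorded writer of register 5, so in this model it would have to store its result in the register file before the exception proceeds.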

## FU-to-FU Dependency Matrix

The Function-Unit to Function-Unit Dependency Matrix expresses the
read and write hazards (dependencies) between Function Units.