# SPRs <a name="sprs"></a>

## SVSTATE SPR

The format of the SVSTATE SPR is as follows:

| Field | Name     | Description                 |
| ----- | -------- | --------------------------- |
| 0:6   | maxvl    | Max Vector Length           |
| 7:13  | vl       | Vector Length               |
| 14:20 | srcstep  | for srcstep = 0..VL-1       |
| 21:27 | dststep  | for dststep = 0..VL-1       |
| 28:29 | dsubstep | for substep = 0..SUBVL-1    |
| 30:31 | ssubstep | for substep = 0..SUBVL-1    |
| 32:33 | mi0      | REMAP RA/FRA/BFA SVSHAPE0-3 |
| 34:35 | mi1      | REMAP RB/FRB/BFB SVSHAPE0-3 |
| 36:37 | mi2      | REMAP RC/FRT SVSHAPE0-3     |
| 38:39 | mo0      | REMAP RT/FRT/BF SVSHAPE0-3  |
| 40:41 | mo1      | REMAP EA/RS/FRS SVSHAPE0-3  |
| 42:46 | SVme     | REMAP enable (RA-RT)        |
| 47:52 | rsvd     | reserved                    |
| 53    | pack     | PACK (srcstep reorder)      |
| 54    | unpack   | UNPACK (dststep order)      |
| 55:61 | hphint   | Horizontal Hint             |
| 62    | RMpst    | REMAP persistence           |
| 63    | vfirst   | Vertical First mode         |

Notes:

* The entries are truncated to be within range. Attempts to set VL to
  greater than MAXVL will truncate VL.
* Setting srcstep, dststep to 64 or greater, or VL or MVL to greater
  than 64 is reserved and will cause an illegal instruction trap.
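
The field layout in the table above can be sketched as a small Python
decoder. This is an illustrative model only, not part of the specification:
the helper names (`SVSTATE_FIELDS`, `get_field`, `set_field`,
`decode_svstate`) are hypothetical, and the sketch assumes that the bit
numbers in the table follow the Power ISA MSB0 convention (bit 0 is the
most significant bit of the 64-bit SPR).

```python
# Field positions copied from the SVSTATE table: name -> (first bit, last bit).
# ASSUMPTION: MSB0 numbering, i.e. bit 0 is the most significant of 64 bits.
SVSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (7, 13),
    "srcstep": (14, 20), "dststep": (21, 27),
    "dsubstep": (28, 29), "ssubstep": (30, 31),
    "mi0": (32, 33), "mi1": (34, 35), "mi2": (36, 37),
    "mo0": (38, 39), "mo1": (40, 41),
    "SVme": (42, 46), "rsvd": (47, 52),
    "pack": (53, 53), "unpack": (54, 54),
    "hphint": (55, 61), "RMpst": (62, 62), "vfirst": (63, 63),
}

MASK64 = (1 << 64) - 1


def _shift_and_mask(name):
    """Convert an MSB0 bit range into an LSB0 shift and a field-width mask."""
    first, last = SVSTATE_FIELDS[name]
    width = last - first + 1
    shift = 63 - last
    return shift, (1 << width) - 1


def get_field(svstate, name):
    """Extract one named field from a 64-bit SVSTATE value."""
    shift, mask = _shift_and_mask(name)
    return (svstate >> shift) & mask


def set_field(svstate, name, value):
    """Return a copy of svstate with one named field replaced."""
    shift, mask = _shift_and_mask(name)
    if not 0 <= value <= mask:
        raise ValueError(f"{name} value {value} does not fit in its field")
    return (svstate & ~(mask << shift) & MASK64) | (value << shift)


def decode_svstate(svstate):
    """Decode every field of SVSTATE into a dictionary."""
    return {name: get_field(svstate, name) for name in SVSTATE_FIELDS}
```

For example, `set_field(set_field(0, "maxvl", 8), "vl", 5)` builds a value
whose decoded `maxvl` and `vl` entries are 8 and 5, with all other fields
zero. The sketch only models field packing: it does not model the
truncation and reserved-value rules given in the Notes.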
35
**SVSTATE Fields**

SVSTATE is a standard SPR that (if REMAP is not activated) contains sufficient
self-contained information for a full context save/restore.
SVSTATE contains (and permits setting of):

* MVL (the Maximum Vector Length) - declares (statically) how
  much of a regfile is to be reserved for Vector elements
* VL - Vector Length
* dststep - the destination element offset of the current parallel
  instruction being executed
* srcstep - for twin-predication, the source element offset as well.
* ssubstep - the source subvector element offset of the current
  parallel instruction being executed
* dsubstep - the destination subvector element offset of the current
  parallel instruction being executed
* vfirst - Vertical First mode. srcstep, dststep and substep
  **do not advance** unless explicitly requested to do so with `svstep`
* RMpst - REMAP persistence. REMAP will apply only to the following
  instruction unless this bit is set, in which case REMAP "persists".
  Reset (cleared) on use of the `setvl` instruction if used to
  alter VL or MVL.
* Pack - if set then srcstep/ssubstep VL/SUBVL loop-ordering is inverted.
* UnPack - if set then dststep/dsubstep VL/SUBVL loop-ordering is inverted.
* hphint - Horizontal Parallelism Hint. Indicates that no Hazards exist
  between groups of elements in sequential multiples of this number
  (before REMAP). By definition: elements for which `FLOOR(srcstep/hphint)` is
  equal *before REMAP* are in the same parallelism "group". In Vertical First
  Mode hardware **MUST ONLY** process elements in the same group, and must stop
  Horizontal Issue at the last element of a given group. Set to zero to
  indicate "no hint".
* SVme - REMAP enable bits, indicating which register is to be
  REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
  associated with each bit, with RA being the LSB and EA being the MSB.
  See table below for ordering. When `SVme` is zero (0b00000) REMAP
  is **fully disabled and inactive** regardless of the contents of
  `SVSTATE`, `mi0-mi2/mo0-mo1`, or the four `SVSHAPEn` SPRs
* mi0-mi2/mo0-mo1 - these indicate the SVSHAPE (0-3) that the corresponding
  register (RA etc) should use, as long as the register's corresponding
  SVme bit is set
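
The interaction between `SVme` and the `mi0-mi2/mo0-mo1` selectors described
in the list above can be sketched as follows. This is a minimal model under
stated assumptions, not the normative behaviour: the function and table names
are hypothetical, the pairing of selectors with registers (mi0 with RA, mi1
with RB, mi2 with RC, mo0 with RT, mo1 with EA) is read off the SVSTATE
table, and `SVme` is treated as a plain 5-bit integer whose least significant
bit corresponds to RA.

```python
# Canonical register names in SVme bit order: RA is the LSB, EA the MSB.
# ASSUMPTION: SVme is handled here as an ordinary integer in the range 0..31.
REMAP_REGS = ["RA", "RB", "RC", "RT", "EA"]
REMAP_SELECTORS = ["mi0", "mi1", "mi2", "mo0", "mo1"]


def active_remaps(svme, selectors):
    """Return {register: SVSHAPE index or None} for one instruction.

    svme      -- 5-bit REMAP enable field
    selectors -- dict mapping "mi0".."mo1" to an SVSHAPE number 0..3
    A register maps to None when its SVme bit is clear, so svme == 0
    leaves REMAP entirely inactive whatever the selectors contain.
    """
    result = {}
    for bit, (reg, sel) in enumerate(zip(REMAP_REGS, REMAP_SELECTORS)):
        enabled = (svme >> bit) & 1
        result[reg] = selectors[sel] if enabled else None
    return result
```

With `svme = 0b01001` only RA and RT are REMAPed, using the SVSHAPE numbers
held in `mi0` and `mo0` respectively. With `svme = 0` every entry is `None`,
matching the statement that REMAP is fully disabled in that case.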
75
Programmer's Note: the fact that REMAP is entirely dormant when `SVme` is zero
allows establishment of REMAP context well in advance, followed by utilising
`svremap` at a precise (or the very last) moment. Some implementations may
exploit this to cache (or take some time to prepare caches) in the background
whilst other (unrelated) instructions are being executed. This is particularly
important to bear in mind when using `svindex` which will require hardware to
perform (and cache) additional GPR reads.

Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
that REMAP is enabled and to additionally save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
(and Trap Handler) responsibility. Callees (and Trap Handlers) **MUST**
avoid using any and all SVP64 instructions during the period where state
could be adversely affected. SVP64 purely relies on Scalar instructions,
so Scalar instructions (except the SVP64 Management ones and mtspr and
mfspr) are 100% guaranteed to have zero impact on SVP64 state.

**Max Vector Length (maxvl)** <a name="mvl" />

MAXVECTORLENGTH is a static (immediate-operand only) compile-time declaration
of the maximum number of elements in a Vector. MVL is limited to 7 bits
(in the first version of SVP64) and consequently the maximum number of
elements is limited to between 0 and 127.

MAXVL is normally (in other True-Scalable Vector ISAs) an Architecturally-defined
quantity related indirectly to the total available number of bits in the Vector
Register File. Cray Vectors had a Hardware-Architectural set limit of MAXVL=64.
RISC-V RVV has MAXVL defined in terms of a Silicon-Partner-selectable fixed number
of bits. MAXVL in Simple-V is set in terms of the number of *elements* and
may change at runtime.

Programmer's Note: Except by directly using `mtspr` on SVSTATE, which may
result in performance penalties on some hardware implementations, SVSTATE's `maxvl`
field may only be set **statically** as an immediate, by the `setvl` instruction.
It may **NOT** be set dynamically from a register. Compiler writers and assembly
programmers are expected to perform static register file analysis, subdivision,
and allocation and only utilise `setvl`. Direct writing to SVSTATE in order to
"bypass" this Note could, in less-advanced implementations, potentially cause stalling,
particularly if SVP64 instructions are issued directly after the `mtspr` to SVSTATE.

**Vector Length (vl)** <a name="vl" />

The actual Vector Length, the number of elements in a "Vector", is held in
`SVSTATE.vl` and may be set entirely dynamically at runtime from a number of
sources. `setvl` is the primary instruction for setting Vector Length.
`setvl` is conceptually similar to, but different from, the Cray, SX Aurora,
and RISC-V RVV equivalents. Similar to RVV, VL is set to be within
the range 0 <= VL <= MVL. Unlike RVV, VL is set **exactly** according to the
following:

```
VL = (RT|0) = MIN(vlen, MVL)
```

where `0 <= MVL <= 127`, and vlen may come from an immediate, `RA`, or from the `CTR` SPR,
depending on options selected with the `setvl` instruction.
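
The `MIN(vlen, MVL)` rule can be illustrated with a short Python model of a
strip-mined loop. This is a sketch only: `setvl_sketch` and `stripmine` are
hypothetical names, the model covers only the VL calculation (not the `RT`
write, the source-selection options of `setvl`, or any reserved-value
trapping), and the range check simply mirrors the `0 <= MVL <= 127` bound
stated above.

```python
def setvl_sketch(vlen, mvl):
    """Model of the VL calculation only: VL = MIN(vlen, MVL)."""
    if not 0 <= mvl <= 127:
        raise ValueError("MVL must fit in 7 bits (0..127)")
    if vlen < 0:
        raise ValueError("vlen must be non-negative")
    return min(vlen, mvl)


def stripmine(n, mvl):
    """Split n elements into successive VL-sized chunks, as a loop would.

    Each iteration requests the remaining element count and receives
    exactly min(remaining, MVL) back, so the chunk sizes are deterministic.
    """
    if mvl == 0:
        raise ValueError("MVL of zero would never make progress")
    chunks = []
    remaining = n
    while remaining > 0:
        vl = setvl_sketch(remaining, mvl)
        chunks.append(vl)
        remaining -= vl
    return chunks
```

For instance `stripmine(10, 4)` yields the chunk sizes `[4, 4, 2]`: because
VL is set exactly, software can predict every iteration's VL in advance.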
134
Programmer's Note: conceptual understanding of Cray-style Vectors is far beyond the scope
of the Power ISA Technical Reference. Guidance on the 50-year-old Cray Vector paradigm is
best sought elsewhere: good studies include Academic Courses given on the 1970s
Cray Supercomputers over at least the past three decades.

**Horizontal Parallelism**

Hardware may be unable to detect opportunities for parallelism, and lack of
overlap between loops, that a programmer (or compiler) knows of, despite
these being easy for a compiler to statically detect and potentially express.
`hphint` is such an expression, declaring that elements within a batch are
independent of each other (no Register *or Memory* Hazards).

Elements are considered to be in the same source batch if they have
the same value of `FLOOR(srcstep/hphint)`. Likewise, elements are in the same
destination batch if they have the same value of `FLOOR(dststep/hphint)`.
Four key observations here:

1. Predication is **not** involved here. The number of actual elements
   involved is considered *before* predicate masks are applied.
2. Twin predication can result in srcstep and dststep being in different
   batches.
3. Batch evaluation is done *before* REMAP, making Hazard elimination easier
   for Multi-Issue systems.
4. `hphint` is *not* limited to power-of-two. Hardware implementors may choose
   a lower parallelism hint up to `hphint` and may find power-of-two more
   convenient. Actual parallelism (Dependency Hazard relaxation) must **never**
   exceed `hphint`.
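
The batch rule `FLOOR(step/hphint)` can be sketched in Python as below. This
is an illustrative model, not a hardware description: the function names are
hypothetical, steps are the pre-REMAP, pre-predication element indices, and
treating `hphint = 0` ("no hint") as "every element is its own batch" is a
conservative modelling assumption made here.

```python
def same_batch(step_a, step_b, hphint):
    """True if two element steps are declared independent by hphint.

    ASSUMPTION: hphint == 0 means no independence has been declared,
    so no two distinct elements are ever reported as sharing a batch.
    """
    if hphint == 0:
        return False
    return step_a // hphint == step_b // hphint


def batches(vl, hphint):
    """Group element steps 0..vl-1 into hphint-sized batches."""
    if hphint == 0:
        return [[step] for step in range(vl)]
    return [
        list(range(start, min(start + hphint, vl)))
        for start in range(0, vl, hphint)
    ]
```

With `vl = 8` and `hphint = 3` (not a power of two) the batches are
`[[0, 1, 2], [3, 4, 5], [6, 7]]`: steps 4 and 5 share a batch, whereas
steps 2 and 3 do not and so must still be ordered with respect to each other.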
164
*Hardware Architect note: each element within the same group may be treated as
100% independent from any other element within that group, and therefore
neither Register Hazards nor Memory Hazards exist between elements of the
same group (but inter-group Hazards definitely do). This makes
implementation far easier on resources because the Hazard Dependencies are
effectively at a much coarser granularity than a single register.*

`hphint` may legitimately be set greater than `MAXVL`. This indicates to Multi-Issue
hardware that even though MAXVL is relatively small the batches are *still independent*
and therefore if Multi-Issue hardware chooses to allocate several batches up to
`MAXVL` in size they are still independent. This helps greatly simplify Multi-Issue
systems by significantly reducing Hazards.

**Considerable care** must be taken when setting `hphint`. Matrix Outer Product
could produce corrupted results if `hphint` is set to greater than the innermost
loop depth. Parallel Reduction, DCT and FFT REMAP are all similarly critically
affected by `hphint`: used correctly it greatly increases ease of parallelism,
but used incorrectly it will result in data corruption. Reduction/Iteration
also requires care to correctly declare in `hphint` how many elements are
independent. In the case of most Reduction use-cases the answer is almost certainly
"none".

`hphint` must definitely not be set on Atomic Memory operations, Cache-Inhibited
Memory operations, or Load-Reservation Store-Conditional. Also if Load-with-Update
Data-Dependent Fail-First is ever used for linked-list pointer-chasing, `hphint`
should again definitely be disabled.

`hphint` may only be ignored by Hardware Implementors as long as full element-level
Register and Memory Hazards are implemented *in full* (including right down to individual
bytes of each register for when elwidth=8/16/32). In other words if `hphint` is to
be ignored then implementations must be made as if `hphint=0`.

**Horizontal Parallelism in Vertical-First Mode**

Setting `hphint` with Vertical-First is perfectly legitimate. Under these circumstances
the single-element strict Program Execution Order must be preserved at all times, but
should there be a small enough program loop, then Out-of-Order Hardware may *merge*
consecutive element-based instructions into the *same Reservation Stations*, for
multiple operations to be passed to massively-wide back-end SIMD ALUs or Vector-Chaining ALUs.
**Only** elements within the same `hphint` group (across multiple such looped instructions)
may be treated this way.

Note that if the loop of Vertical-First instructions cannot fit entirely into Reservation
Stations then Hardware clearly cannot exploit the above optimisation opportunity, but at
least there is no harm done: the loop is still correctly executed as Scalar instructions.
Programmers do need to be aware though that short loops on some Hardware Implementations
can be made considerably faster than on other Implementations.
212
## SVLR

SV Link Register, exactly analogous to LR (Link Register), may
be used for temporary storage of SVSTATE, and, in particular,
Vectorised Branch-Conditional instructions may interchange
SVLR and SVSTATE whenever LR and NIA are.

Note that there is no equivalent Link variant of SVREMAP or
SVSHAPE0-3 (it would be too costly), so SVLR has limited applicability:
REMAP SPRs must be saved and restored explicitly.

-----------

[[!tag standards]]