<!-- hide -->
# Links
* <https://bugs.libre-soc.org/show_bug.cgi?id=213>
<!-- show -->
# svstep: Vertical-First Stepping and status reporting

SVL-Form

* svstep RT,SVi,vf (Rc=0)
* svstep. RT,SVi,vf (Rc=1)

| 0-5|6-10|11-15|16-22| 23-25    | 26-30 |31| Form     |
|----|----|-----|-----|----------|-------|--|--------- |
|PO  | RT | /   | SVi | / / vf   | XO    |Rc| SVL-Form |

Pseudo-code:

```
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    # Vertical-First explicit stepping.
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
```
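
The bit selections in the pseudo-code use Power ISA MSB0 numbering, where bit 0 is the most significant bit of a field. As a minimal sketch, assuming `SVi` is the 7-bit field in instruction bits 16-22, the `SVi[3:4]` test and the two bits copied into SVSTATE may be modelled in Python as below. The helper names `msb0_field`, `is_pack_unpack_request` and `pack_unpack_bits` are illustrative only and are not part of the specification.

```python
SVI_WIDTH = 7  # assumption: SVi occupies instruction bits 16-22 (7 bits)


def msb0_field(value: int, width: int, first: int, last: int) -> int:
    """Extract bits first..last (inclusive, MSB0 numbering) of a width-bit value."""
    nbits = last - first + 1
    shift = width - 1 - last  # distance of bit `last` from the LSB
    return (value >> shift) & ((1 << nbits) - 1)


def is_pack_unpack_request(svi: int) -> bool:
    """True when SVi[3:4] = 0b11, the first branch of the pseudo-code."""
    return msb0_field(svi, SVI_WIDTH, 3, 4) == 0b11


def pack_unpack_bits(svi: int) -> tuple:
    """Return (SVi[5], SVi[6]), the values written to SVSTATE[53] and SVSTATE[54]."""
    return (msb0_field(svi, SVI_WIDTH, 5, 5),
            msb0_field(svi, SVI_WIDTH, 6, 6))
```

Under these assumptions `SVi=0b1100` through `SVi=0b1111` take the first branch, whilst `SVi=5` and `SVi=8` fall through to `SVSTATE_NEXT`.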

Special Registers Altered:

CR0 (if Rc=1)

**Description**

`svstep` may be used to enquire about the REMAP Schedule, and it may be
used to alter Vectorization State. When `vf=1`, stepping occurs.
When `vf=0`, the enquiry is performed without altering internal state.
If `SVi=0, Rc=0, vf=0` the instruction is a `nop`.

The following Modes exist:

* `SVi=0`: appropriately step srcstep, dststep, ssubstep and dsubstep
  to the next element, taking pack and unpack into consideration.
* When `SVi` is 1-4 the REMAP Schedule for a given SVSHAPE may be
  returned in `RT`. SVi=1 selects SVSHAPE0 current state,
  through to SVi=4 which selects SVSHAPE3.
* When `SVi` is 5, `SVSTATE.srcstep` is returned.
* When `SVi` is 6, `SVSTATE.dststep` is returned.
* When `SVi` is 7, `SVSTATE.ssubstep` is returned.
* When `SVi` is 8, `SVSTATE.dsubstep` is returned.
* When `SVi` is 0b1100, pack and unpack in SVSTATE are both cleared.
* When `SVi` is 0b1101, pack in SVSTATE is set and unpack is cleared.
* When `SVi` is 0b1110, unpack in SVSTATE is set and pack is cleared.
* When `SVi` is 0b1111, pack and unpack in SVSTATE are both set.
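
The list of Modes above can be summarised as a small lookup. The following Python sketch is illustrative only: the function name `describe_svi` and the returned strings are hypothetical, and any value of `SVi` not listed above is simply reported as unlisted rather than given a meaning.

```python
def describe_svi(svi: int) -> str:
    """Return a human-readable summary of the svstep Mode selected by SVi."""
    if svi == 0:
        return "step to next element"
    if 1 <= svi <= 4:
        # SVi=1 selects SVSHAPE0, through to SVi=4 selecting SVSHAPE3
        return f"return REMAP Schedule for SVSHAPE{svi - 1}"
    fields = {5: "srcstep", 6: "dststep", 7: "ssubstep", 8: "dsubstep"}
    if svi in fields:
        return f"return SVSTATE.{fields[svi]}"
    if 0b1100 <= svi <= 0b1111:
        pack = svi & 1           # 0b1101 and 0b1111 set pack
        unpack = (svi >> 1) & 1  # 0b1110 and 0b1111 set unpack
        return f"set pack={pack}, unpack={unpack}"
    return "not listed"
```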

As this is a Single-Predicated (1P) instruction, predication may be applied
to skip (or zero) elements.

* Vertical-First Mode will return the requested index
  (and move to the next state if `vf=1`)
* Horizontal-First Mode can be used to return all indices,
  i.e. walks through all possible states.

**Vectorization of svstep itself**

As a 32-bit instruction, `svstep` may itself be Vector-Prefixed, as
`sv.svstep`. This will work as well in Horizontal-First Mode
as it will in Vertical-First Mode, although there are caveats for
the Deterministic use of looping with Sub-Vectors in Vertical-First mode.

Example: to obtain the full set of possible computed element
indices use `sv.svstep *RT,SVi,1` which will store all computed element
indices, starting from RT. If Rc=1 then a co-result Vector of CR Fields
will also be returned, comprising the "loop end-points" of each of the inner
loops when either Matrix Mode or DCT/FFT is set. For example, when the
`xdim` inner loop reaches its end, such that the next iteration
begins again at zero, the CR Field `EQ` bit will be set.
With a maximum of three loops within both Matrix and DCT/FFT Modes,
the CR Field's EQ bit will be set at the end of the first inner loop,
the LT bit for the second, the GT bit for the outermost loop and the
SO bit set on the very last element, when all loops reach their maximum
extent.

*Programmer's note: VL in some situations, particularly larger
Matrices (5x7x3 will set MAXVL=105), will cause `sv.svstep` to return a
considerable number of values. Under such circumstances `sv.svstep/ew=8`
is recommended.*
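
To illustrate why a narrower element width helps here, the following Python sketch counts how many registers a returned Schedule would occupy. It assumes (this is not stated above) that results are packed contiguously into 64-bit registers at the selected element width. `regs_needed` is a hypothetical helper name.

```python
import math


def regs_needed(num_elements: int, elwidth_bits: int, reg_bits: int = 64) -> int:
    """Registers occupied by num_elements results packed at elwidth_bits each."""
    return math.ceil(num_elements * elwidth_bits / reg_bits)


maxvl = 5 * 7 * 3                  # the 5x7x3 Matrix example: 105 elements
default = regs_needed(maxvl, 64)   # one index per 64-bit register
narrow = regs_needed(maxvl, 8)     # eight 8-bit indices per register
```

Under that assumption the 105 indices, all of which fit in 8 bits, occupy 14 registers rather than 105.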

*Programmer's note: having conveniently obtained a pre-computed Schedule
with `sv.svstep`, it may then be used as the input to Indexed REMAP
Mode to achieve the exact same Schedule. Before use, however, some of the
Indices may be arbitrarily altered as desired.
`sv.svstep` helps the programmer avoid having to manually recreate
Indices for certain types of common Loop patterns. In its simplest form,
without REMAP (SVi=5 or SVi=6), it is equivalent to the `iota` instruction
found in other Vector ISAs.*
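
The `iota` equivalence, and the idea of altering Indices before re-use, can be sketched in Python. This is a simplified model under stated assumptions: `iota` stands in for `sv.svstep` without REMAP, and `apply_indices` models index-driven element selection as a plain gather, which is only an approximation of Indexed REMAP. Both function names are hypothetical.

```python
def iota(vl: int) -> list:
    """Model of the simplest form: element i receives the value i, for 0..VL-1."""
    return list(range(vl))


def apply_indices(data: list, indices: list) -> list:
    """Illustrative gather: element i of the result is data[indices[i]]."""
    return [data[i] for i in indices]


indices = iota(4)                                  # [0, 1, 2, 3]
indices[1], indices[2] = indices[2], indices[1]    # arbitrarily alter the Schedule
result = apply_indices([10, 20, 30, 40], indices)  # elements 1 and 2 swapped
```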

**Vertical First Mode**

Vertical First is effectively like an implicit single bit predicate
applied to every SVP64 instruction. **ONLY** one element in each SVP64
Vector instruction is executed; srcstep and dststep do **not** increment
automatically on completion of one instruction, and the Program Counter
progresses **immediately** to the next instruction just as it would for
any standard scalar v3.0B instruction.

The `SVi=0` mode of `svstep` is then called to move srcstep and dststep
on to the next element, still respecting predicate masks.

In other words, where normal SVP64 Vectorization acts "horizontally"
by looping first through 0 to VL-1 and only then moving the PC to the
next instruction, Vertical-First moves the PC onwards (vertically)
through multiple instructions **with the same srcstep and dststep**,
then an explicit instruction is used to advance srcstep/dststep. An outer
loop (using a branch instruction) is expected to be used to complete a series
of Vector operations.
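
The difference in execution order can be sketched in Python. This is a simplified model that ignores predication, REMAP and Sub-Vectors: each function returns the order in which `(instruction, step)` pairs would execute. The function names are illustrative only.

```python
def horizontal_first(instrs: list, vl: int) -> list:
    """Each instruction loops through 0..VL-1 before the PC moves on."""
    return [(ins, step) for ins in instrs for step in range(vl)]


def vertical_first(instrs: list, vl: int) -> list:
    """The PC walks through all instructions with the same step, then the
    step is explicitly advanced (the role of svstep) and the loop repeats."""
    order = []
    for step in range(vl):   # outer loop: branch back after the explicit step
        for ins in instrs:   # one element per instruction, same srcstep/dststep
            order.append((ins, step))
    return order
```

For two instructions `a` and `b` with VL=2, the horizontal order is `a0 a1 b0 b1` whereas the vertical order is `a0 b0 a1 b1`.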

Testing any end condition of any loop of any REMAP state allows branches
to be used to create loops.

*Programmer's note: when Predicate Non-Zeroing is used this indicates to
the underlying hardware that any masked-out element must be skipped.
**This includes in Vertical-First Mode**, and programmers should be
keenly aware that srcstep or dststep or both **may** jump by more than
one as a result, because the actual request under these circumstances
was to execute on the first available next **non-masked-out** element.
It should be evident that it is the `sv.svstep` instruction that must
be Predicated in order for the **entire** loop to use the Predicate
correctly, and it is strongly recommended for all instructions within
the same Vertical-First Loop to utilise the exact same Predicate Mask(s).*
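
The "jump by more than one" behaviour can be sketched in Python. This is a simplified illustration of skipping masked-out elements only, not the full iteration algorithm given in the Appendix. `next_unmasked` is a hypothetical helper, and returning `None` at the end of the loop is a convention of this sketch.

```python
def next_unmasked(step: int, mask: int, vl: int):
    """Return the next element index after `step` whose predicate bit is set,
    or None when no further unmasked element exists below VL."""
    nxt = step + 1
    while nxt < vl:
        if (mask >> nxt) & 1:  # predicate bit set: element is not masked out
            return nxt
        nxt += 1               # masked out: skip it
    return None


mask = 0b10011  # elements 0, 1 and 4 enabled; elements 2 and 3 masked out
```

With this mask and VL=5, stepping from element 1 lands directly on element 4.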

Programmers should be aware that VL, srcstep and dststep and the SUBVL
substeps are global in nature. Nested looping with different schedules
is perfectly possible, as is calling of functions, however SVSTATE
(and any associated SVSHAPEs if REMAP is being used) should obviously
be stored on the stack in order to achieve this benefit not normally
found in Vector ISAs.

**Use of svstep with Vertical-First sub-vectors**

Incrementing and iteration through subvector state ssubstep and dsubstep is
possible with `sv.svstep/vecN` where, as expected, N may be 2/3/4. However it
is necessary to use the exact same Sub-Vector qualifier on any Prefixed
instructions within any given Vertical-First loop: `vec2/3/4` is **not**
automatically applied to all instructions, it must be explicitly applied on
a per-instruction basis. Not specifying a Sub-vector qualifier at all is
also valid, but it is critically important to note that
operations will be repeated. For example, if `sv.svstep/vec2` is used but
`vec2` is not applied to `sv.addi` then each Vector element operation is
repeated twice. The reason is that whilst svstep will be
iterating through both the SUBVL and VL loops, the addi instruction
only uses `srcstep` and `dststep` (not ssubstep or dsubstep). Illustrated below:

```
def offset():
    for step in range(VL):
        for substep in range(SUBVL):  # SUBVL=2
            yield step, substep
for i, j in offset():
    vec2_offs = i * SUBVL + j # calculate vec2 offset
    addi RT+i, RA+i, 1 # but sv.addi is not vec2!
    muli/vec2 RT+vec2_offs, RA+vec2_offs, 2 # this is
```

Actual assembler would be:

```
loop:
    setvl VF=1, CTRmode
    sv.addi *RT, *RA, 1 # no vec2
    sv.muli/vec2 *RT, *RA, 2 # vec2
    sv.svstep/vec2 # must match the muli
    sv.bc CTRmode, loop # subtracts VL from CTR
```

This illustrates the correct but seemingly-anomalous behaviour: `sv.svstep/vec2`
is being requested to update `SVSTATE` to follow a vec2 loop construct. The anomalous
`sv.addi` is not prohibited as it may in fact be desirable to execute operations twice,
or to re-load data that was overwritten, and many other possibilities.
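
The repetition can be made concrete with a runnable Python version of the illustration. This is a sketch only: it records which element offsets each instruction would touch rather than performing any arithmetic, and the names `offsets` and `trace` are hypothetical.

```python
def offsets(vl: int, subvl: int):
    """Iterate (step, substep) as sv.svstep/vecN does: SUBVL is the inner loop."""
    for step in range(vl):
        for substep in range(subvl):
            yield step, substep


def trace(vl: int = 2, subvl: int = 2):
    """Record the element offset used by a non-vec2 op and by a vec2 op."""
    addi_offsets = []  # sv.addi (no vec2): uses the step only
    muli_offsets = []  # sv.muli/vec2: uses both step and substep
    for i, j in offsets(vl, subvl):
        addi_offsets.append(i)
        muli_offsets.append(i * subvl + j)
    return addi_offsets, muli_offsets
```

With VL=2 and SUBVL=2 the non-vec2 operation sees offsets `[0, 0, 1, 1]`, so each element is processed twice, whereas the vec2 operation sees `[0, 1, 2, 3]`.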

-------------

\newpage{}

# Appendix

**src_iterate**

Note that `srcstep` and `ssubstep` are not the absolute final Element
(and Sub-Element) offsets. `srcstep` still has to go through individual
`REMAP` translation before becoming a per-operand (RA, RB, RC, RT, RS)
Element-level Source offset.

Note also critically that `PACK` mode simply inverts the outer/inner
loops, making SUBVL the outer loop and VL the inner.

```
# source-stepping iterator
subvl = SVSTATE.subvl
vl = SVSTATE.vl
pack = SVSTATE.pack
unpack = SVSTATE.unpack
ssubstep = SVSTATE.ssubstep
end_ssub = ssubstep == subvl
end_src = SVSTATE.srcstep == vl-1
# first source step.
srcstep = SVSTATE.srcstep
# used below:
# sz - from RM.MODE, source-zeroing
# srcmask - from RM.MODE, the source predicate
if pack:
    # pack advances subvl in *outer* loop
    while True:
        assert srcstep <= vl-1
        end_src = srcstep == vl-1
        if end_src:
            if end_ssub:
                loopend = True
            else:
                SVSTATE.ssubstep += 1
            srcstep = 0 # reset
            break
        else:
            srcstep += 1 # advance srcstep
            if not sz:
                break
            if ((1 << srcstep) & srcmask) != 0:
                break
else:
    # advance subvl in *inner* loop
    if end_ssub:
        while True:
            assert srcstep <= vl-1
            end_src = srcstep == vl-1
            if end_src: # end-point
                loopend = True
                srcstep = 0
                break
            else:
                srcstep += 1
            if not sz:
                break
            if ((1 << srcstep) & srcmask) != 0:
                break
            else:
                log(" sskip", bin(srcmask), bin(1 << srcstep))
        SVSTATE.ssubstep = 0b00 # reset
    else:
        # advance ssubstep
        SVSTATE.ssubstep += 1

SVSTATE.srcstep = srcstep
```
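
The loop-order inversion performed by `PACK` can be sketched in Python, ignoring predication and REMAP. In this sketch `subvl` is assumed to be the number of sub-elements (for example 2 for vec2), which may differ from how the `SVSTATE.subvl` field is encoded. `step_order` is a hypothetical helper name.

```python
def step_order(vl: int, subvl: int, pack: bool) -> list:
    """Return the sequence of (srcstep, ssubstep) pairs visited."""
    if pack:
        # pack: SUBVL is the outer loop, VL the inner
        return [(step, sub) for sub in range(subvl) for step in range(vl)]
    # default: VL is the outer loop, SUBVL the inner
    return [(step, sub) for step in range(vl) for sub in range(subvl)]
```

For VL=2 and vec2, the default order is `(0,0) (0,1) (1,0) (1,1)` whilst `PACK` gives `(0,0) (1,0) (0,1) (1,1)`.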

-------------

\newpage{}

**dest_iterate**

Note that `dststep` and `dsubstep` are not the absolute final Element
(and Sub-Element) offsets. `dststep` still has to go through individual
`REMAP` translation before becoming a per-operand (RT, RS/EA) destination
Element-level offset, and `dsubstep` may also go through `(f)mv.swizzle`
reordering.

Note also critically that `UNPACK` mode simply inverts the outer/inner
loops, making SUBVL the outer loop and VL the inner.

```
# dest step iterator
vl = SVSTATE.vl
subvl = SVSTATE.subvl
unpack = SVSTATE.unpack
dsubstep = SVSTATE.dsubstep
end_dsub = dsubstep == subvl
dststep = SVSTATE.dststep
end_dst = dststep == vl-1
# used below:
# dz - from RM.MODE, destination-zeroing
# dstmask - from RM.MODE, the destination predicate
if unpack:
    # unpack advances subvl in *outer* loop
    while True:
        assert dststep <= vl-1
        end_dst = dststep == vl-1
        if end_dst:
            if end_dsub:
                loopend = True
            else:
                SVSTATE.dsubstep += 1
            dststep = 0 # reset
            break
        else:
            dststep += 1 # advance dststep
            if not dz:
                break
            if ((1 << dststep) & dstmask) != 0:
                break
else:
    # advance subvl in *inner* loop
    if end_dsub:
        while True:
            assert dststep <= vl-1
            end_dst = dststep == vl-1
            if end_dst: # end-point
                loopend = True
                dststep = 0
                break
            else:
                dststep += 1
            if not dz:
                break
            if ((1 << dststep) & dstmask) != 0:
                break
        SVSTATE.dsubstep = 0b00 # reset
    else:
        # advance dsubstep
        SVSTATE.dsubstep += 1

SVSTATE.dststep = dststep
```

-------------

\newpage{}

**SVSTATE_NEXT**

```
if SVi = 1 then return REMAP SVSHAPE0 current offset
if SVi = 2 then return REMAP SVSHAPE1 current offset
if SVi = 3 then return REMAP SVSHAPE2 current offset
if SVi = 4 then return REMAP SVSHAPE3 current offset
if SVi = 5 then return SVSTATE.srcstep # VL source step
if SVi = 6 then return SVSTATE.dststep # VL dest step
if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step

# SVi=0, explicit iteration requested
src_iterate();
dest_iterate();
return 0
```

**at_loopend**

Both Vertical-First and Horizontal-First may use this algorithm to
determine if the "end-of-looping" (end of Sub-Program-Counter) has
been reached. Horizontal-First Mode will immediately move to the
next instruction, where `svstep.` will set `CR0.EQ` to 1.

```
# tells if this is the last possible element.
subvl = SVSTATE.subvl
vl = SVSTATE.vl
end_ssub = SVSTATE.ssubstep == subvl
end_dsub = SVSTATE.dsubstep == subvl
if SVSTATE.srcstep == vl-1 and end_ssub:
    return True
if SVSTATE.dststep == vl-1 and end_dsub:
    return True
return False
```
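
For experimentation, the `at_loopend` test can be wrapped in a small self-contained Python model. This is a sketch under stated assumptions: the `SVState` dataclass is a hypothetical stand-in holding only the fields the algorithm reads, and `subvl` is compared directly against the substeps exactly as in the pseudo-code above, without interpreting how the real `SVSTATE.subvl` field is encoded.

```python
from dataclasses import dataclass


@dataclass
class SVState:
    """Hypothetical container for the SVSTATE fields used by at_loopend."""
    vl: int
    subvl: int
    srcstep: int = 0
    dststep: int = 0
    ssubstep: int = 0
    dsubstep: int = 0


def at_loopend(s: SVState) -> bool:
    """True when either the source or the destination is at its last element."""
    end_ssub = s.ssubstep == s.subvl
    end_dsub = s.dsubstep == s.subvl
    if s.srcstep == s.vl - 1 and end_ssub:
        return True
    if s.dststep == s.vl - 1 and end_dsub:
        return True
    return False
```

Note that the test succeeds as soon as either side reaches its end, so a source that has been stepped further than the destination (for example through predicate skipping) ends the loop on its own.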

[[!tag standards]]

-------------

\newpage{}