# svstep: Vertical-First Stepping and status reporting

SVL-Form

* svstep RT,SVi,vf (Rc=0)
* svstep. RT,SVi,vf (Rc=1)

| 0-5|6-10|11-15|16-22| 23-25    | 26-30 |31|   Form    |
|----|----|-----|-----|----------|-------|--|-----------|
|PO  | RT | /   | SVi | / / vf   | XO    |Rc| SVL-Form  |

Pseudo-code:

```
if SVi[3:4] = 0b11 then
    # store pack and unpack in SVSTATE
    SVSTATE[53] <- SVi[5]
    SVSTATE[54] <- SVi[6]
    RT <- [0]*62 || SVSTATE[53:54]
else
    # Vertical-First explicit stepping.
    step <- SVSTATE_NEXT(SVi, vf)
    RT <- [0]*57 || step
```

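The pack/unpack branch of the pseudo-code can be sketched in Python as
follows. This is an illustrative model only: it assumes `SVi` is the 7-bit
field shown in the encoding table (bits 16-22) with MSB0 bit numbering,
and the helper names `svi_bits` and `pack_unpack_result` are hypothetical,
not part of the specification.

```python
def svi_bits(svi, start, end):
    """Extract bits start..end (inclusive, MSB0 numbering) of a 7-bit SVi.

    Assumption: bit 0 is the most significant bit and bit 6 the least.
    """
    width = end - start + 1
    shift = 6 - end
    return (svi >> shift) & ((1 << width) - 1)


def pack_unpack_result(svi):
    """Model the `SVi[3:4] = 0b11` branch of the pseudo-code.

    Returns the value that would be placed in RT (the two bits written
    to SVSTATE[53:54]), or None when SVi selects the stepping branch.
    """
    if svi_bits(svi, 3, 4) != 0b11:
        return None
    bit53 = svi_bits(svi, 5, 5)  # SVSTATE[53] <- SVi[5]
    bit54 = svi_bits(svi, 6, 6)  # SVSTATE[54] <- SVi[6]
    return (bit53 << 1) | bit54
```

Under these assumptions `pack_unpack_result(0b1100)` gives 0 (both bits
cleared) and `pack_unpack_result(0b1111)` gives 3 (both bits set), whilst
values such as `SVi=5` fall through to the stepping branch.
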
Special Registers Altered:

    CR0                     (if Rc=1)

**Description**

`svstep` may be used to enquire about the REMAP Schedule, and it may also
be used to alter Vectorization State. When `vf=1`, stepping occurs.
When `vf=0`, the enquiry is performed without altering internal state.
If `SVi=0, Rc=0, vf=0` the instruction is a `nop`.

The following Modes exist:

* `SVi=0`: appropriately step srcstep, dststep, subsrcstep and subdststep
  to the next element, taking pack and unpack into consideration.
* When `SVi` is 1-4 the REMAP Schedule for a given SVSHAPE may be
  returned in `RT`. SVi=1 selects SVSHAPE0 current state,
  through to SVi=4 which selects SVSHAPE3.
* When `SVi` is 5, `SVSTATE.srcstep` is returned.
* When `SVi` is 6, `SVSTATE.dststep` is returned.
* When `SVi` is 7, `SVSTATE.ssubstep` is returned.
* When `SVi` is 8, `SVSTATE.dsubstep` is returned.
* When `SVi` is 0b1100, pack and unpack in SVSTATE are both cleared.
* When `SVi` is 0b1101, pack in SVSTATE is set and unpack is cleared.
* When `SVi` is 0b1110, unpack in SVSTATE is set and pack is cleared.
* When `SVi` is 0b1111, pack and unpack in SVSTATE are both set.

As this is a Single-Predicated (1P) instruction, predication may be applied
to skip (or zero) elements.

* Vertical-First Mode will return the requested index
  (and move to the next state if `vf=1`)
* Horizontal-First Mode can be used to return all indices,
  i.e. walks through all possible states.

**Vectorization of svstep itself**

As a 32-bit instruction, `svstep` may itself be Vector-Prefixed, as
`sv.svstep`. This will work just as well in Horizontal-First
as it will in Vertical-First Mode, although there are caveats for
the Deterministic use of looping with Sub-Vectors in Vertical-First mode.

Example: to obtain the full set of possible computed element
indices use `sv.svstep *RT,SVi,1`, which will store all computed element
indices, starting from RT. If Rc=1 then a co-result Vector of CR Fields
will also be returned, comprising the "loop end-points" of each of the inner
loops when either Matrix Mode or DCT/FFT is set. For example, when the
`xdim` inner loop reaches the end, such that on the next iteration it
will begin again at zero, the CR Field `EQ` bit will be set.
With a maximum of three loops within both Matrix and DCT/FFT Modes,
the CR Field's EQ bit will be set at the end of the first inner loop,
the LT bit for the second, the GT bit for the outermost loop and the
SO bit on the very last element, when all loops reach their maximum
extent.

*Programmer's note: VL in some situations, particularly larger
Matrices (5x7x3 will set MAXVL=105), will cause `sv.svstep` to return a
considerable number of values. Under such circumstances `sv.svstep/ew=8`
is recommended.*

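As a rough illustration of the scale involved, the sketch below
enumerates a plain row-major 5x7x3 schedule in Python. It is a simplified
model that assumes no REMAP permutation or offsets are applied, and it
assumes a 64-bit default element width for the size comparison;
`matrix_schedule` is a hypothetical helper name.

```python
from itertools import product


def matrix_schedule(xdim, ydim, zdim):
    """Flat element indices for a row-major 3-D loop, x innermost."""
    return [(z * ydim + y) * xdim + x
            for z, y, x in product(range(zdim), range(ydim), range(xdim))]


indices = matrix_schedule(5, 7, 3)
count = len(indices)                 # one returned value per element
fits_in_8_bits = max(indices) < 256  # every index fits in one byte
bytes_at_ew8 = count * 1             # storage at 8-bit element width
bytes_at_ew64 = count * 8            # storage at an assumed 64-bit width
```

In this model 105 indices are produced, all below 256, so storing them
at 8 bits each takes 105 bytes instead of 840.
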
*Programmer's note: having conveniently obtained a pre-computed Schedule
with `sv.svstep`, it may then be used as the input to Indexed REMAP
Mode to achieve the exact same Schedule. It is evident however that
before use some of the Indices may be arbitrarily altered as desired.
`sv.svstep` helps the programmer avoid having to manually recreate
Indices for certain types of common Loop patterns. In its simplest form,
without REMAP (SVi=5 or SVi=6), it is equivalent to the `iota` instruction
found in other Vector ISAs.*

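The idea of capturing a Schedule, altering it, and then replaying it can
be modelled in Python. This sketch is only an analogy: `iota` stands in
for the values returned with SVi=5 and no REMAP active, and
`apply_indexed` is a hypothetical stand-in that treats Indexed REMAP as a
simple gather. Neither name comes from the specification.

```python
def iota(vl):
    """Indices 0..vl-1, as an un-REMAPed step sequence would produce."""
    return list(range(vl))


def apply_indexed(data, indices):
    """Gather elements of `data` in the order given by `indices`."""
    return [data[i] for i in indices]


schedule = iota(4)
# Arbitrarily alter the pre-computed schedule before use: swap two entries.
schedule[1], schedule[2] = schedule[2], schedule[1]
result = apply_indexed(["a", "b", "c", "d"], schedule)
```

Here the altered schedule `[0, 2, 1, 3]` reorders the data to
`["a", "c", "b", "d"]`.
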
**Vertical First Mode**

Vertical First is effectively like an implicit single bit predicate
applied to every SVP64 instruction. **ONLY** one element in each SVP64
Vector instruction is executed; srcstep and dststep do **not** increment
automatically on completion of one instruction, and the Program Counter
progresses **immediately** to the next instruction just as it would for
any standard scalar v3.0B instruction.

A mode of `svstep` (SVi=0) may be called which moves srcstep and dststep
on to the next element, still respecting predicate masks.

In other words, where normal SVP64 Vectorization acts "horizontally"
by looping first through 0 to VL-1 and only then moving the PC to the
next instruction, Vertical-First moves the PC onwards (vertically)
through multiple instructions **with the same srcstep and dststep**,
then an explicit instruction is used to advance srcstep/dststep. An outer
loop is expected to be used (branch instruction) which completes a series
of Vector operations.

Testing any end condition of any loop of any REMAP state allows branches
to be used to create loops.

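The difference in execution order can be sketched with a small Python
model. It is a simplification that assumes `vl >= 1`, no predication and
no REMAP; the function names and the trace format are hypothetical.

```python
def horizontal_first(instrs, vl):
    """Each instruction loops over all elements before the PC advances."""
    return [(name, step) for name in instrs for step in range(vl)]


def vertical_first(instrs, vl):
    """Each instruction executes one element; an explicit step advances."""
    trace = []
    step = 0
    while True:
        for name in instrs:        # PC moves through the loop body
            trace.append((name, step))
        step += 1                  # models the explicit svstep
        if step == vl:             # models the loop-end branch test
            break
    return trace
```

For two instructions and `vl=2`, `horizontal_first(["ld", "add"], 2)`
executes both elements of `ld` before any `add`, whereas
`vertical_first(["ld", "add"], 2)` executes `ld` then `add` on element 0
before moving on to element 1.
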
*Programmer's note: when Predicate Non-Zeroing is used this indicates to
the underlying hardware that any masked-out element must be skipped.
*This includes in Vertical-First Mode*, and programmers should be
keenly aware that srcstep or dststep or both *may* jump by more than
one as a result, because the actual request under these circumstances
was to execute on the first available next *non-masked-out* element.
It should be evident that it is the `sv.svstep` instruction that must
be Predicated in order for the **entire** loop to use the Predicate
correctly, and it is strongly recommended for all instructions within
the same Vertical-First Loop to utilise the exact same Predicate Mask(s).*

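A minimal Python sketch of why the step may jump by more than one: the
next step is the first later element whose predicate bit is set. The
function `next_unmasked` is hypothetical, and the mask is assumed to use
bit `n` for element `n`, as in the appendix pseudo-code.

```python
def next_unmasked(step, mask, vl):
    """Return the next step after `step` whose mask bit is set.

    Returns None when no unmasked element remains before vl.
    """
    for candidate in range(step + 1, vl):
        if (mask >> candidate) & 1:
            return candidate
    return None
```

With `mask=0b1001` and `vl=4`, stepping from element 0 lands directly on
element 3, skipping the two masked-out elements in between.
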
Programmers should be aware that VL, srcstep and dststep and the SUBVL
substeps are global in nature. Nested looping with different schedules
is perfectly possible, as is calling of functions; however, SVSTATE
(and any associated SVSHAPEs if REMAP is being used) should obviously
be stored on the stack in order to achieve this benefit, which is not
normally found in Vector ISAs.

**Use of svstep with Vertical-First sub-vectors**

Incrementing and iteration through the subvector state `ssubstep` and
`dsubstep` is possible with `sv.svstep/vecN` where, as expected, N may be
2/3/4. However it is necessary to use the exact same Sub-Vector qualifier
on any Prefixed instructions within any given Vertical-First loop:
`vec2/3/4` is **not** automatically applied to all instructions, it must
be explicitly applied on a per-instruction basis. Not specifying a
Sub-vector qualifier at all is also valid, but it is critically important
to note that operations will then be repeated. For example, if
`sv.svstep/vec2` is used but `vec2` is not applied to `sv.addi`, then each
Vector element operation of the `sv.addi` is repeated twice. The reason
is that whilst `svstep` will be iterating through both the SUBVL and VL
loops, the `addi` instruction only uses `srcstep` and `dststep` (not
`ssubstep` or `dsubstep`). Illustrated below:

```
def offset():
    for step in range(VL):
        for substep in range(SUBVL):  # SUBVL=2
            yield step, substep
for i, j in offset():
    vec2_offs = i * SUBVL + j  # calculate vec2 offset
    addi RT+i, RA+i, 1  # but sv.addi is not vec2!
    muli/vec2 RT+vec2_offs, RA+vec2_offs, 2  # this is
```

Actual assembler would be:

```
loop:
    setvl VF=1, CTRmode
    sv.addi *RT, *RA, 1       # no vec2
    sv.muli/vec2 *RT, *RA, 2  # vec2
    sv.svstep/vec2            # must match the muli
    sv.bc CTRmode, loop       # subtracts VL from CTR
```

This illustrates the correct but seemingly-anomalous behaviour:
`sv.svstep/vec2` is being requested to update `SVSTATE` to follow a vec2
loop construct. The anomalous `sv.addi` is not prohibited as it may in
fact be desirable to execute operations twice, or to re-load data that
was overwritten, and many other possibilities.

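The repetition can be checked with a small runnable Python model of the
loop. It is a sketch under simplifying assumptions (no predication, no
REMAP, no pack/unpack), and `vf_loop` is a hypothetical name: it records
which element offset each instruction touches on every iteration.

```python
from collections import Counter


def vf_loop(vl, subvl):
    """Record element offsets touched by a non-vec2 and a vec2 instruction."""
    addi_elements = []  # sv.addi without vec2: uses step only
    muli_elements = []  # sv.muli/vec2: uses step and substep
    for step in range(vl):
        for substep in range(subvl):
            addi_elements.append(step)
            muli_elements.append(step * subvl + substep)
    return addi_elements, muli_elements


addi_elements, muli_elements = vf_loop(vl=3, subvl=2)
addi_counts = Counter(addi_elements)
```

In this model every `sv.addi` element is visited twice, whilst the
`sv.muli/vec2` visits each of its six sub-elements exactly once.
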
-------------

\newpage{}

# Appendix

**src_iterate**

Note that `srcstep` and `ssubstep` are not the absolute final Element
(and Sub-Element) offsets. `srcstep` still has to go through individual
`REMAP` translation before becoming a per-operand (RA, RB, RC, RT, RS)
Element-level Source offset.

Note also critically that `PACK` mode simply inverts the outer/inner
loops, making SUBVL the outer loop and VL the inner.

```
# source-stepping iterator
subvl = SVSTATE.subvl
vl = SVSTATE.vl
pack = SVSTATE.pack
unpack = SVSTATE.unpack
ssubstep = SVSTATE.ssubstep
end_ssub = ssubstep == subvl
end_src = SVSTATE.srcstep == vl-1
# first source step.
srcstep = SVSTATE.srcstep
# used below:
# sz - from RM.MODE, source-zeroing
# srcmask - from RM.MODE, the source predicate
if pack:
    # pack advances subvl in *outer* loop
    while True:
        assert srcstep <= vl-1
        end_src = srcstep == vl-1
        if end_src:
            if end_ssub:
                loopend = True
            else:
                SVSTATE.ssubstep += 1
            srcstep = 0  # reset
            break
        else:
            srcstep += 1  # advance srcstep
            if not sz:
                break
            if ((1 << srcstep) & srcmask) != 0:
                break
else:
    # advance subvl in *inner* loop
    if end_ssub:
        while True:
            assert srcstep <= vl-1
            end_src = srcstep == vl-1
            if end_src:  # end-point
                loopend = True
                srcstep = 0
                break
            else:
                srcstep += 1
            if not sz:
                break
            if ((1 << srcstep) & srcmask) != 0:
                break
            else:
                log(" sskip", bin(srcmask), bin(1 << srcstep))
        SVSTATE.ssubstep = 0b00  # reset
    else:
        # advance ssubstep
        SVSTATE.ssubstep += 1

SVSTATE.srcstep = srcstep
```

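To see the effect of `PACK` on iteration order, the source-stepping
pseudo-code can be condensed into a runnable Python sketch. Several
things here are assumptions rather than specification: the `SVState`
class and the `walk` driver are hypothetical, `subvl` is assumed to hold
SUBVL-1 (so that `ssubstep == subvl` marks the last sub-element), and the
defaults `sz=False, srcmask=-1` model an unpredicated loop.

```python
from dataclasses import dataclass


@dataclass
class SVState:
    vl: int
    subvl: int          # assumed encoding: SUBVL-1
    srcstep: int = 0
    ssubstep: int = 0
    pack: bool = False
    loopend: bool = False


def src_iterate(st, sz=False, srcmask=-1):
    """Advance srcstep/ssubstep by one position, following the pseudo-code."""
    srcstep = st.srcstep
    end_ssub = st.ssubstep == st.subvl
    if st.pack:
        # pack: VL is the inner loop, SUBVL the outer
        while True:
            assert srcstep <= st.vl - 1
            if srcstep == st.vl - 1:
                if end_ssub:
                    st.loopend = True
                else:
                    st.ssubstep += 1
                srcstep = 0
                break
            srcstep += 1
            if not sz or ((1 << srcstep) & srcmask) != 0:
                break
    elif end_ssub:
        # non-pack: SUBVL is the inner loop, and it has just finished
        while True:
            assert srcstep <= st.vl - 1
            if srcstep == st.vl - 1:
                st.loopend = True
                srcstep = 0
                break
            srcstep += 1
            if not sz or ((1 << srcstep) & srcmask) != 0:
                break
        st.ssubstep = 0
    else:
        st.ssubstep += 1
    st.srcstep = srcstep


def walk(vl, subvl, pack):
    """Collect every (srcstep, ssubstep) pair visited until loopend."""
    st = SVState(vl=vl, subvl=subvl, pack=pack)
    seen = [(st.srcstep, st.ssubstep)]
    while True:
        src_iterate(st)
        if st.loopend:
            break
        seen.append((st.srcstep, st.ssubstep))
    return seen
```

With `vl=3` and a vec2 (`subvl=1`), `walk(3, 1, pack=False)` visits
`(0,0), (0,1), (1,0), (1,1), (2,0), (2,1)`, whereas `walk(3, 1, pack=True)`
visits `(0,0), (1,0), (2,0), (0,1), (1,1), (2,1)`: the same positions with
the two loops exchanged.
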
-------------

\newpage{}

**dest_iterate**

Note that `dststep` and `dsubstep` are not the absolute final Element
(and Sub-Element) offsets. `dststep` still has to go through individual
`REMAP` translation before becoming a per-operand (RT, RS/EA) destination
Element-level offset, and `dsubstep` may also go through `(f)mv.swizzle`
reordering.

Note also critically that `UNPACK` mode simply inverts the outer/inner
loops, making SUBVL the outer loop and VL the inner.

```
# dest step iterator
vl = SVSTATE.vl
subvl = SVSTATE.subvl
unpack = SVSTATE.unpack
dsubstep = SVSTATE.dsubstep
end_dsub = dsubstep == subvl
dststep = SVSTATE.dststep
end_dst = dststep == vl-1
# used below:
# dz - from RM.MODE, destination-zeroing
# dstmask - from RM.MODE, the destination predicate
if unpack:
    # unpack advances subvl in *outer* loop
    while True:
        assert dststep <= vl-1
        end_dst = dststep == vl-1
        if end_dst:
            if end_dsub:
                loopend = True
            else:
                SVSTATE.dsubstep += 1
            dststep = 0  # reset
            break
        else:
            dststep += 1  # advance dststep
            if not dz:
                break
            if ((1 << dststep) & dstmask) != 0:
                break
else:
    # advance subvl in *inner* loop
    if end_dsub:
        while True:
            assert dststep <= vl-1
            end_dst = dststep == vl-1
            if end_dst:  # end-point
                loopend = True
                dststep = 0
                break
            else:
                dststep += 1
            if not dz:
                break
            if ((1 << dststep) & dstmask) != 0:
                break
        SVSTATE.dsubstep = 0b00  # reset
    else:
        # advance dsubstep
        SVSTATE.dsubstep += 1

SVSTATE.dststep = dststep
```

-------------

\newpage{}

**SVSTATE_NEXT**

```
if SVi = 1 then return REMAP SVSHAPE0 current offset
if SVi = 2 then return REMAP SVSHAPE1 current offset
if SVi = 3 then return REMAP SVSHAPE2 current offset
if SVi = 4 then return REMAP SVSHAPE3 current offset
if SVi = 5 then return SVSTATE.srcstep # VL source step
if SVi = 6 then return SVSTATE.dststep # VL dest step
if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step

# SVi=0, explicit iteration requested
src_iterate();
dst_iterate();
return 0
```

**at_loopend**

Both Vertical-First and Horizontal-First may use this algorithm to
determine if the "end-of-looping" (end of Sub-Program-Counter) has
been reached. Horizontal-First Mode will immediately move to the
next instruction, where `svstep.` will set `CR0.EQ` to 1.

```
# tells if this is the last possible element.
subvl = SVSTATE.subvl
vl = SVSTATE.vl
end_ssub = SVSTATE.ssubstep == subvl
end_dsub = SVSTATE.dsubstep == subvl
if SVSTATE.srcstep == vl-1 and end_ssub:
    return True
if SVSTATE.dststep == vl-1 and end_dsub:
    return True
return False
```

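For experimentation, the same test can be wrapped as a standalone Python
function. This is a direct transcription of the pseudo-code above with
the SVSTATE fields passed as plain arguments; the signature is a
hypothetical convenience, and `subvl` is assumed to hold SUBVL-1.

```python
def at_loopend(srcstep, dststep, ssubstep, dsubstep, vl, subvl):
    """Return True if either the source or destination is on its last element."""
    end_ssub = ssubstep == subvl
    end_dsub = dsubstep == subvl
    if srcstep == vl - 1 and end_ssub:
        return True
    if dststep == vl - 1 and end_dsub:
        return True
    return False
```

Note that with a vec2 (`subvl=1`), being on the last element with
`ssubstep=0` is not yet the end: the second sub-element remains.
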
[[!tag standards]]

-------------

\newpage{}
