# setvl: Set Vector Length

* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html>
* <https://bugs.libre-soc.org/show_bug.cgi?id=535>
* <https://bugs.libre-soc.org/show_bug.cgi?id=587>
* <https://bugs.libre-soc.org/show_bug.cgi?id=914> TODO: setvl should not set SO
* <https://bugs.libre-soc.org/show_bug.cgi?id=568> TODO
* <https://bugs.libre-soc.org/show_bug.cgi?id=927> bug - RT>=32
* <https://bugs.libre-soc.org/show_bug.cgi?id=862> VF Predication
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions>
* pseudocode [[openpower/isa/simplev]]

Add the following section to the Simple-V Chapter

| 0-5 | 6-10 | 11-15 | 16-22 | 23 24 25 | 26-30 | 31 | FORM     |
| --- | ---- | ----- | ----- | -------- | ----- | -- | -------- |
| PO  | RT   | RA    | SVi   | ms vs vf | XO    | Rc | SVL-Form |

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

    overflow <- 0b0 # sets CR.SO if set and if Rc=1
    if ms = 1 then MVL <- VLimm[0:6]
    else MVL <- SVSTATE[0:6]
    if vs = 0 then VL <- SVSTATE[7:13]
    if (RA) >u 0b1111111 then
    else VL <- (RA)[57:63]
    else if _RT = 0 then VL <- VLimm[0:6]
    else if CTR >u 0b1111111 then
    # limit VL to within MVL
    GPR(_RT) <- [0]*57 || VL
    # MAXVL is a static "state-reset" opportunity so VF is only set then.
    SVSTATE[63] <- vf # set Vertical-First mode
    SVSTATE[62] <- 0b0 # clear persist bit

Special Registers Altered:

* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode".

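The field layout above can be illustrated with a minimal Python sketch that splits a 32-bit instruction word into the SVL-Form fields. It assumes the Power ISA convention that bit 0 is the most significant bit of the word; the helper names `field` and `decode_svl` are hypothetical, and the PO and XO values are left at zero because they are not given here.

```python
def field(insn: int, start: int, end: int) -> int:
    """Return bits start..end (inclusive) of a 32-bit word, MSB0 numbering."""
    width = end - start + 1
    return (insn >> (31 - end)) & ((1 << width) - 1)


def decode_svl(insn: int) -> dict:
    """Split a 32-bit word into the SVL-Form fields listed in the table."""
    return {
        "PO": field(insn, 0, 5),
        "RT": field(insn, 6, 10),
        "RA": field(insn, 11, 15),
        "SVi": field(insn, 16, 22),
        "ms": field(insn, 23, 23),
        "vs": field(insn, 24, 24),
        "vf": field(insn, 25, 25),
        "XO": field(insn, 26, 30),
        "Rc": field(insn, 31, 31),
    }


# Hand-built word: RT=5, RA=0, SVi field=7, vs=1, Rc=1 (PO and XO left as 0,
# which is an illustration only, not the real opcode assignment).
word = (5 << 21) | (7 << 9) | (1 << 7) | 1
fields = decode_svl(word)
```

Keeping the bit ranges identical to the table makes it easy to check the sketch against the encoding by eye.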
Note that in immediate setting mode VL and MVL start from **one** but that
this is compensated for in the assembly notation. i.e. that an immediate
value of 1 in assembler notation actually places the value 0b0000000 in
the `SVi` field bits: on execution the `setvl` instruction adds one to
the decoded `SVi` field bits, resulting in VL/MVL being set to 1. In future
this will allow VL to be set to values ranging from 1 to 128 with only 7 bits
instead of 8. Setting VL/MVL to 0 would result in all Vector operations
becoming `nop`. If this is truly desired (nop behaviour) then setting
VL and MVL to zero is to be done via the [[SVSTATE SPR|sv/sprs]].

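The off-by-one relationship between assembler notation and the `SVi` field can be sketched as follows. This is an illustration only: the function names are hypothetical, and the accepted range of 1 to 128 is the one the text describes as a future possibility rather than a statement about what any current implementation accepts.

```python
def encode_svi(value: int) -> int:
    """Map an assembler-level VL/MVL immediate (1..128) to the 7-bit SVi field."""
    if not 1 <= value <= 128:
        raise ValueError("immediate VL/MVL must be in the range 1..128")
    return value - 1


def decode_svi(svi_field: int) -> int:
    """Recover the immediate that setvl uses: the 7-bit field plus one."""
    return (svi_field & 0x7F) + 1
```

For instance, `encode_svi(1)` gives a field of zero, and zero is never representable as an immediate, which is why a VL of zero has to go through the SVSTATE SPR instead.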
Note that `setmvli` is a pseudo-op, based on RA/RT=0, and `setvli` likewise:

    setvli VL=8     : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli. VL=8    : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli MVL=8   : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8  : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1

Additional pseudo-op for obtaining VL without modifying it (or any state):

    getvl r5        : setvl  r5, r0, vf=0, vs=0, ms=0
    getvl. r5       : setvl. r5, r0, vf=0, vs=0, ms=0

Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in
the same instruction. Doing so would require two instructions.

Use of setvl results in changes to the SVSTATE SPR. See [[sv/sprs]].

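As a rough illustration of which SVSTATE bits `setvl` touches, the sketch below packs MVL, VL, the persist bit and the Vertical-First bit at the positions used in the pseudocode (`SVSTATE[0:6]`, `SVSTATE[7:13]`, `SVSTATE[62]`, `SVSTATE[63]`). It assumes SVSTATE is a 64-bit register with bit 0 as the most significant bit; the helpers `get_bits` and `set_bits` are hypothetical names, and [[sv/sprs]] remains the authoritative description of the register.

```python
MASK64 = (1 << 64) - 1


def get_bits(reg: int, start: int, end: int) -> int:
    """Read bits start..end (inclusive) of a 64-bit value, MSB0 numbering."""
    width = end - start + 1
    return (reg >> (63 - end)) & ((1 << width) - 1)


def set_bits(reg: int, start: int, end: int, value: int) -> int:
    """Return reg with bits start..end replaced by value, MSB0 numbering."""
    width = end - start + 1
    shift = 63 - end
    mask = ((1 << width) - 1) << shift
    return ((reg & ~mask) | ((value << shift) & mask)) & MASK64


svstate = 0
svstate = set_bits(svstate, 0, 6, 8)    # MVL, as in SVSTATE[0:6]
svstate = set_bits(svstate, 7, 13, 5)   # VL, as in SVSTATE[7:13]
svstate = set_bits(svstate, 63, 63, 1)  # Vertical-First bit
svstate = set_bits(svstate, 62, 62, 0)  # persist bit cleared
```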
**Selecting sources for VL**

There is considerable opcode pressure. Consequently, the source from which
VL is set is selected by the combination of `vs`, `RA` and `RT`, as follows:

| condition            | effect                     |
| -------------------- | -------------------------- |
| `vs=1, RA=0, RT!=0`  | VL,RT set to MIN(MVL, CTR) |
| `vs=1, RA=0, RT=0`   | VL set to MIN(MVL, SVi+1)  |
| `vs=1, RA!=0, RT=0`  | VL set to MIN(MVL, RA)     |
| `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA)  |

The reasoning here is that the opportunity to set RT equal to the
immediate `SVi+1` is sacrificed in favour of setting from CTR.

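The selection table can be restated as a small Python function. This is a sketch of the table only, not of the full instruction: the function name and parameter names are hypothetical, `ra` and `rt` are register numbers while `ra_value` and `ctr` are register contents, and the `vs=0` branch (VL taken unchanged from SVSTATE) follows the pseudocode rather than the table.

```python
def select_vl(vs: int, ra: int, rt: int, ra_value: int, ctr: int,
              svi: int, mvl: int, current_vl: int) -> int:
    """Return the new VL according to the source-selection table."""
    if vs == 0:
        source = current_vl   # VL is read back from SVSTATE, not set
    elif ra != 0:
        source = ra_value     # RA != 0: VL comes from the RA register
    elif rt != 0:
        source = ctr          # RA = 0, RT != 0: VL comes from CTR
    else:
        source = svi + 1      # RA = 0, RT = 0: VL comes from the immediate
    return min(mvl, source)   # VL is always limited to MVL
```

Reading the branches top to bottom shows the trade-off described above: with `RA=0`, a non-zero `RT` selects CTR, so the immediate can only be used when `RT=0`.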
**Unusual Rc=1 behaviour**

Normally, the return result from an instruction is in `RT`. With it
being possible for `RT=0` to mean that `CTR` mode is to be read, some
different semantics are needed.

CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, if set either from an immediate or from `CTR`,
may not exceed `MAXVL`, and if it would, `CR0.SO` must be set.

In reality it is **`VL`** being set. Therefore, rather than `CR0`
testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GE is set if `VL`

Sub-vector elements are not to be considered "Vertical". The vec2/3/4
is to be considered as if it were a "single element". Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled, due
to the order in which VL and SUBVL loops are applied being swapped
(outer-inner becomes inner-outer).

### Core concept loop

This example illustrates the Cray-style Loop concept. However, where most Cray
Vectors have a Max Vector Length hard-coded into the architecture, Simple-V
allows MVL to be set, but only as a static immediate, so that compilers may
embed the register resource allocation statically at compile-time.

    setvl a3, a0, MVL=8    # update a3 with vl
                           # (# of elements this iteration)
                           # set a3=VL=MIN(a0,MVL)
    # do vector operations at up to 8 length (MVL=8)
    sub. a0, a0, a3        # Decrement count by vl, set CR0.eq
    bnez a0, loop          # Any more?

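The strip-mining pattern in the loop above can be modelled in a few lines of Python. This is only a sketch of the control flow: `strip_mine` is a hypothetical name, the vector operations are replaced by recording the chunk length, and the loop tests the remaining count at the bottom, as the assembly does.

```python
def strip_mine(count: int, mvl: int = 8) -> list:
    """Return the VL used on each iteration when processing count elements."""
    chunks = []
    while True:
        vl = min(count, mvl)   # setvl: VL = MIN(remaining count, MVL)
        chunks.append(vl)      # stand-in for the vector operations
        count -= vl            # sub.: decrement the count by VL
        if count == 0:         # bnez: loop again only while elements remain
            break
    return chunks
```

With 20 elements and `MVL=8`, for instance, the loop runs with lengths 8, 8 and 4. Because the test is at the bottom, a starting count of zero still executes one iteration, with `VL=0`.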
In this example, the `setvl.` instruction enabled Rc=1, which
sets CR0.eq when VL becomes zero. Testing of `r4` (cmpi) is thus redundant,
saving one instruction.

    setvli. r4, r3, MVL=64

### Load/Store-Multi (selective)

Up to 64 FPRs will be loaded, here. One bit of `r3` is set for each
FP register required to be loaded. The block of memory from which the
registers are loaded is contiguous (no gaps): any FP register which has
a corresponding zero bit in `r3` is *unaltered*. In essence this is a
selective LD-multi with "Scatter" (`VCOMPRESS`) capability.

    setvli r0, MVL=64, VL=64
    sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers

Up to 64 FPRs will be saved, here. Again, `r3` specifies which
registers are set in a `VEXPAND` fashion.

    setvli r0, MVL=64, VL=64
    sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
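
The effect of the two masked sequences can be sketched in Python. The sketch makes several assumptions that are not stated above: bit `i` of the mask (counting from the least significant bit) corresponds to register `i`, the register file is shortened to 8 entries for readability, and the function names are hypothetical. It models only the data movement, not the instructions themselves.

```python
def selective_load(regs: list, memory: list, mask: int) -> list:
    """Fill registers whose mask bit is set from consecutive memory words."""
    out = list(regs)
    src = 0
    for i in range(len(out)):
        if (mask >> i) & 1:
            out[i] = memory[src]   # memory is read with no gaps
            src += 1
    return out                     # registers with a zero bit are unaltered


def selective_store(regs: list, mask: int) -> list:
    """Write registers whose mask bit is set to consecutive memory words."""
    return [regs[i] for i in range(len(regs)) if (mask >> i) & 1]


mask = 0b10100101                  # registers 0, 2, 5 and 7 selected
memory = [1.0, 2.0, 3.0, 4.0]
loaded = selective_load([0.0] * 8, memory, mask)
stored = selective_store(loaded, mask)
```

Storing with the same mask that was used for loading reproduces the original contiguous block, which is what makes the pair usable as a selective save and restore.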