# setvl: Set Vector Length

* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html>
* <https://bugs.libre-soc.org/show_bug.cgi?id=535>
* <https://bugs.libre-soc.org/show_bug.cgi?id=587>
* <https://bugs.libre-soc.org/show_bug.cgi?id=914> TODO: setvl should not set SO
* <https://bugs.libre-soc.org/show_bug.cgi?id=568> TODO
* <https://bugs.libre-soc.org/show_bug.cgi?id=927> bug - RT>=32
* <https://bugs.libre-soc.org/show_bug.cgi?id=862> VF Predication
* <https://bugs.libre-soc.org/show_bug.cgi?id=1222> Rc=1 enhancement needed
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions>
* pseudocode [[openpower/isa/simplev]]

Add the following section to the Simple-V Chapter

| 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31|   FORM   |
| -- | -- | --- | ---- |----------| ----- |--|----------|
|PO  | RT | RA  | SVi  | ms vs vf | XO    |Rc| SVL-Form |

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

    overflow <- 0b0    # sets CR.SO if set and if Rc=1
    VLimm <- SVi + 1
    if ms = 1 then MVL <- VLimm[0:6]
    else           MVL <- SVSTATE[0:6]
    if vs = 0                then VL <- SVSTATE[7:13]
    else if _RA != 0         then
        if (RA) >u 0b1111111 then
            VL <- 0b1111111
            overflow <- 0b1
        else                      VL <- (RA)[57:63]
    else if _RT = 0          then VL <- VLimm[0:6]
    else if CTR >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else                          VL <- CTR[57:63]
    # limit VL to within MVL
    if VL >u MVL then
        overflow <- 0b1
        VL <- MVL
    SVSTATE[0:6] <- MVL
    SVSTATE[7:13] <- VL
    if _RT != 0 then
        GPR(_RT) <- [0]*57 || VL
    # MAXVL is a static "state-reset" opportunity so VF is only set then.
    if ms = 1 then
        SVSTATE[63] <- vf   # set Vertical-First mode
        SVSTATE[62] <- 0b0  # clear persist bit

Special Registers Altered:

    CR0                     (if Rc=1)
    SVSTATE

* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode".

Note that in immediate setting mode VL and MVL start from **one**, but that
this is compensated for in the assembly notation: an immediate
value of 1 in assembler notation actually places the value 0b0000000 in
the `SVi` field bits, and on execution the `setvl` instruction adds one to
the decoded `SVi` field bits, resulting in VL/MVL being set to 1. In future
this will allow VL to be set to values ranging from 1 to 128 with only 7 bits
instead of 8. Setting VL/MVL to 0 would result in all Vector operations
becoming `nop`. If this is truly desired (nop behaviour) then setting
VL and MVL to zero is to be done via the [[SVSTATE SPR|sv/sprs]].
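
The off-by-one between assembler notation and the `SVi` field can be sketched in Python as follows. This is only an illustration: the helper names `encode_svi` and `decode_svi` are hypothetical and not part of any Libre-SOC tool. The 1 to 128 range reflects what the 7-bit field can carry; as the text above notes, 128 is a future possibility rather than something the current 7-bit VL/MVL can hold.

```python
def encode_svi(value: int) -> int:
    """Assembler-level immediate (1..128) -> 7-bit SVi field (0..127).

    Hypothetical helper: models only the "subtract one" done by the assembler.
    """
    if not 1 <= value <= 128:
        raise ValueError("immediate must be in the range 1..128")
    return value - 1


def decode_svi(field: int) -> int:
    """7-bit SVi field (0..127) -> immediate value as seen by setvl."""
    if not 0 <= field <= 0b1111111:
        raise ValueError("SVi is a 7-bit field")
    return field + 1  # setvl adds one on execution


if __name__ == "__main__":
    # An assembler immediate of 1 places 0b0000000 in the SVi field.
    print(format(encode_svi(1), "07b"))
    print(decode_svi(encode_svi(8)))
```

Because zero is never encodable this way, a zero VL or MVL has to come from another route, which matches the SVSTATE SPR remark above.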

Note that `setmvli` is a pseudo-op, based on RA/RT=0, and `setvli` likewise:

    setvli VL=8    : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli. VL=8   : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli MVL=8  : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8 : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1

Additional pseudo-op for obtaining VL without modifying it (or any state):

    getvl r5  : setvl  r5, r0, vf=0, vs=0, ms=0
    getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0

Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in
the same instruction. Doing so would require two instructions.

Use of `setvl` results in changes to the SVSTATE SPR. See [[sv/sprs]].

**Selecting sources for VL**

There is considerable opcode pressure; consequently, the source from
which VL is set is selected by `vs`, `RA` and `RT` as follows:

| condition             | effect                        |
| --------------------- | ----------------------------- |
| `vs=1, RA=0, RT!=0`   | VL,RT set to MIN(MVL, CTR)    |
| `vs=1, RA=0, RT=0`    | VL set to MIN(MVL, SVi+1)     |
| `vs=1, RA!=0, RT=0`   | VL set to MIN(MVL, RA)        |
| `vs=1, RA!=0, RT!=0`  | VL,RT set to MIN(MVL, RA)     |

The reasoning here is that the opportunity to set RT equal to the
immediate `SVi+1` is sacrificed in favour of setting from CTR.
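
As an illustration of the selection rules in the table, here is a minimal Python sketch. It is a behavioural model under stated assumptions, not the reference implementation: the `Machine` class and its field names are hypothetical, `vf` and `Rc` handling are left out, and `SVi` fields above 126 are rejected because an immediate of 128 does not fit the current 7-bit VL/MVL.

```python
from dataclasses import dataclass, field

VL_MAX = 0b1111111  # largest value a 7-bit VL or MVL can hold


@dataclass
class Machine:
    """Hypothetical container for the state that setvl reads and writes."""
    mvl: int = 0
    vl: int = 0
    ctr: int = 0
    gpr: list = field(default_factory=lambda: [0] * 32)


def setvl(m: Machine, rt: int, ra: int, svi: int, vs: int, ms: int) -> bool:
    """Model VL/MVL selection only; returns the overflow flag."""
    if not 0 <= svi <= 126:
        raise ValueError("this model covers SVi fields 0..126 only")
    overflow = False
    vlimm = svi + 1                  # immediates start from one
    mvl = vlimm if ms else m.mvl     # set or keep MVL
    if not vs:
        vl = m.vl                    # VL is left as it was
    elif ra != 0:
        vl = m.gpr[ra]               # VL requested via GPR RA
    elif rt == 0:
        vl = vlimm                   # VL requested via the immediate
    else:
        vl = m.ctr                   # VL requested via CTR
    if vl > VL_MAX:                  # request does not fit in 7 bits
        vl, overflow = VL_MAX, True
    if vl > mvl:                     # limit VL to within MVL
        vl, overflow = mvl, True
    m.mvl, m.vl = mvl, vl
    if rt != 0:
        m.gpr[rt] = vl               # RT receives the final VL
    return overflow


if __name__ == "__main__":
    m = Machine(mvl=8, ctr=5)
    setvl(m, rt=3, ra=0, svi=0, vs=1, ms=0)   # vs=1, RA=0, RT!=0: from CTR
    print(m.vl, m.gpr[3])
```

Note how `rt` does double duty in the model: it is both the destination register number and, when `ra` is zero, the selector between the immediate and `CTR`.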

**Unusual Rc=1 behaviour**

Normally, the return result from an instruction is in `RT`. With
`RT=0` being a valid encoding that helps select the source of `VL`
rather than naming a destination register, some different semantics
are needed.

CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, if set either from an immediate or from `CTR`,
may not exceed `MAXVL`, and if the requested value does, `CR0.SO` must be set.

In reality it is **`VL`** being set. Therefore, rather than `CR0`
testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GE is set if `VL`

Sub-vector elements are not to be considered "Vertical". The vec2/3/4
is to be considered as if it were a single element. Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled, due
to the order in which VL and SUBVL loops are applied being swapped
(outer-inner becomes inner-outer).

### Core concept loop

This example illustrates the Cray-style Loop concept. However, where most
Cray-style Vector architectures have a Maximum Vector Length hard-coded into
the architecture, Simple-V allows MVL to be set, but only as a static
immediate, so that compilers may embed the register resource allocation
statically at compile-time.

    loop:
        setvl a3, a0, MVL=8    # update a3 with vl
                               # (# of elements this iteration)
                               # set a3=VL=MIN(a0,MVL)
        # do vector operations at up to 8 length (MVL=8)
        sub. a0, a0, a3        # Decrement count by vl, set CR0.eq
        bnez a0, loop          # Any more?
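
The control flow of the loop above can be mirrored in Python as a rough sketch. The function name `strip_mined_add` and the choice of an element-wise add as the "vector operation" are illustrative assumptions; `remaining` plays the role of `a0` and `vl` the role of `a3`.

```python
def strip_mined_add(a, b, mvl=8):
    """Element-wise add processed in chunks of at most `mvl` elements.

    Returns the result list and the VL used on each iteration.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0] * len(a)
    chunks = []
    pos = 0
    remaining = len(a)            # plays the role of a0
    while True:                   # bottom-tested, like the assembly loop
        vl = min(remaining, mvl)  # setvl: VL = MIN(a0, MVL), plays the role of a3
        for i in range(vl):       # "vector operation" on up to MVL elements
            out[pos + i] = a[pos + i] + b[pos + i]
        chunks.append(vl)
        pos += vl
        remaining -= vl           # sub. a0, a0, a3
        if remaining == 0:        # loop again only while elements remain
            break
    return out, chunks


if __name__ == "__main__":
    result, chunks = strip_mined_add(list(range(20)), [1] * 20)
    print(chunks)
```

With 20 elements and `mvl=8` the loop body runs three times, the last pass handling the remaining 4 elements without any separate tail code.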

In this example, the `setvl.` instruction enables Rc=1, which
sets CR0.eq when VL becomes zero. Testing of `r4` (cmpi) is thus redundant,
saving one instruction.

    setvli. r4, r3, MVL=64

### Load/Store-Multi (selective)

Up to 64 FPRs will be loaded here. `r3` holds one bit for each
FP register required to be loaded. The block of memory from which the
registers are loaded is contiguous (no gaps): any FP register which has
a corresponding zero bit in `r3` is *unaltered*. In essence this is a
selective LD-multi with "Scatter" (`VCOMPRESS`) capability.

    setvli r0, MVL=64, VL=64
    sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers

Up to 64 FPRs will be saved here. Again, `r3` specifies which
registers are stored, in a `VEXPAND` fashion.

    setvli r0, MVL=64, VL=64
    sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
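
The effect of the mask in `r3` on the two sequences above can be pictured with a small Python sketch. Several things here are assumptions made for illustration only: the function names are hypothetical, bit `i` of the mask (counting from the least significant bit) is taken to correspond to register `i`, and the stored block is taken to be contiguous in the same way as the loaded one.

```python
def selective_load(regs, memory, mask):
    """Load consecutive memory words into the registers whose mask bit is set.

    Registers with a zero mask bit are left unaltered.
    """
    out = list(regs)
    src = 0                        # memory is read without gaps
    for i in range(len(out)):
        if (mask >> i) & 1:        # assumed mapping: bit i <-> register i
            out[i] = memory[src]
            src += 1
    return out


def selective_store(regs, mask):
    """Return the contiguous block produced by storing only selected registers."""
    return [regs[i] for i in range(len(regs)) if (mask >> i) & 1]


if __name__ == "__main__":
    fprs = [0.0] * 8
    mask = 0b10010010              # registers 1, 4 and 7 selected
    loaded = selective_load(fprs, [1.0, 2.0, 3.0], mask)
    print(loaded)
    print(selective_store(loaded, mask))
```

Under these assumptions a store followed by a load with the same mask round-trips the selected registers through a gap-free block of memory.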