# setvl: Set Vector Length

<!-- hide -->
See links:

* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html>
* <https://bugs.libre-soc.org/show_bug.cgi?id=535>
* <https://bugs.libre-soc.org/show_bug.cgi?id=587>
* <https://bugs.libre-soc.org/show_bug.cgi?id=914> TODO: setvl should not set SO
* <https://bugs.libre-soc.org/show_bug.cgi?id=568> TODO
* <https://bugs.libre-soc.org/show_bug.cgi?id=927> bug - RT>=32
* <https://bugs.libre-soc.org/show_bug.cgi?id=862> VF Predication
* <https://bugs.libre-soc.org/show_bug.cgi?id=1222> Rc=1 enhancement needed
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions>
* [[sv/svstep]]
* pseudocode [[openpower/isa/simplev]]
<!-- show -->

Add the following section to the Simple-V Chapter.

## setvl

SVL-Form

| 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM |
| -- | -- | --- | ---- |----------| ----- |--|----------|
|PO | RT | RA | SVi | ms vs vf | XO |Rc| SVL-Form |

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

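The SVL-Form table uses the Power ISA's MSB0 bit numbering, in which bit 0 is the most significant bit of the 32-bit instruction word. The Python below is a sketch of how the fields pack together: it encodes and decodes an SVL-Form word. The names `SVL_FORM`, `encode_svl` and `decode_svl` are hypothetical helpers. This section does not give the actual `PO` and `XO` values, so the caller has to supply them.

```python
# SVL-Form field layout in MSB0 bit numbering: name -> (first_bit, last_bit).
SVL_FORM = {
    "PO": (0, 5),
    "RT": (6, 10),
    "RA": (11, 15),
    "SVi": (16, 22),
    "ms": (23, 23),
    "vs": (24, 24),
    "vf": (25, 25),
    "XO": (26, 30),
    "Rc": (31, 31),
}


def encode_svl(**fields: int) -> int:
    """Pack SVL-Form fields into a 32-bit word (fields not given default to 0)."""
    unknown = set(fields) - set(SVL_FORM)
    if unknown:
        raise KeyError(f"unknown SVL-Form fields: {sorted(unknown)}")
    word = 0
    for name, (first, last) in SVL_FORM.items():
        width = last - first + 1
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        # MSB0: a field ending at bit `last` is shifted left by 31 - last.
        word |= value << (31 - last)
    return word


def decode_svl(word: int) -> dict:
    """Unpack a 32-bit SVL-Form word into its named fields."""
    out = {}
    for name, (first, last) in SVL_FORM.items():
        width = last - first + 1
        out[name] = (word >> (31 - last)) & ((1 << width) - 1)
    return out
```
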
Pseudo-code:

```
overflow <- 0b0    # sets CR.SO if set and if Rc=1
VLimm <- SVi + 1
# set or get MVL
if ms = 1 then MVL <- VLimm[0:6]
else           MVL <- SVSTATE[0:6]
# set or get VL
if vs = 0 then VL <- SVSTATE[7:13]
else if _RA != 0 then
    if (RA) >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else VL <- (RA)[57:63]
else if _RT = 0 then VL <- VLimm[0:6]
else if CTR >u 0b1111111 then
    VL <- 0b1111111
    overflow <- 0b1
else VL <- CTR[57:63]
# limit VL to within MVL
if VL >u MVL then
    overflow <- 0b1
    VL <- MVL
SVSTATE[0:6] <- MVL
SVSTATE[7:13] <- VL
if _RT != 0 then
    GPR(_RT) <- [0]*57 || VL
# MAXVL is a static "state-reset" opportunity so VF is only set then.
if ms = 1 then
    SVSTATE[63] <- vf   # set Vertical-First mode
    SVSTATE[62] <- 0b0  # clear persist bit
```

Special Registers Altered:

```
CR0 (if Rc=1)
SVSTATE
```

* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode" (takes effect only when `ms=1`).

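The following hypothetical Python transliteration can be used to cross-check the pseudo-code and the source-selection rules. It is a sketch under simplifying assumptions:

* `State` models only the SVSTATE fields touched here (MVL, VL, the persist bit and VF), plus `CTR` and the GPRs.
* Register contents are treated as unsigned 64-bit values.
* `rt` and `ra` are the `_RT` and `_RA` register numbers taken from the instruction fields.
* The function returns the `overflow` flag rather than modelling CR0.

```python
from dataclasses import dataclass, field
from typing import List

MASK7 = 0b1111111  # largest value held by the 7-bit MVL and VL fields


@dataclass
class State:
    mvl: int = 0          # SVSTATE[0:6]
    vl: int = 0           # SVSTATE[7:13]
    persist: int = 0      # SVSTATE[62]
    vf: int = 0           # SVSTATE[63], Vertical-First mode
    ctr: int = 0
    gpr: List[int] = field(default_factory=lambda: [0] * 32)


def setvl(s: State, rt: int, ra: int, svi: int, vf: int, vs: int, ms: int) -> int:
    """Sketch of setvl; returns the overflow flag (CR0.SO source when Rc=1)."""
    overflow = 0
    # Assumption: VLimm is kept to 7 bits, so SVi=127 would wrap to 0 here.
    vlimm = (svi + 1) & MASK7
    # set or get MVL
    mvl = vlimm if ms else s.mvl
    # set or get VL
    if not vs:
        vl = s.vl
    elif ra != 0:
        if s.gpr[ra] > MASK7:
            vl, overflow = MASK7, 1
        else:
            vl = s.gpr[ra]
    elif rt == 0:
        vl = vlimm
    elif s.ctr > MASK7:
        vl, overflow = MASK7, 1
    else:
        vl = s.ctr
    # limit VL to within MVL
    if vl > mvl:
        overflow, vl = 1, mvl
    s.mvl, s.vl = mvl, vl
    if rt != 0:
        s.gpr[rt] = vl    # zero-extended VL written to RT
    if ms:
        s.vf = vf         # VF only changes when MVL is set
        s.persist = 0     # clear persist bit
    return overflow
```
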
Note that in immediate setting mode VL and MVL start from **one**, and that
this is compensated for in the assembly notation: an immediate value of 1 in
assembler notation actually places the value 0b0000000 in the `SVi` field
bits. On execution the `setvl` instruction adds one to the decoded `SVi`
field bits, resulting in VL/MVL being set to 1. In future this will allow VL
to be set to values ranging from 1 to 128 with only 7 bits instead of 8.
Setting VL/MVL to 0 would result in all Vector operations becoming `nop`. If
this is truly desired (nop behaviour), then setting VL and MVL to zero is to
be done via the [[SVSTATE SPR|sv/sprs]].

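The off-by-one immediate encoding can be sketched in Python as follows. `svi_from_asm` and `vl_from_svi` are hypothetical helper names. The sketch assumes that only 1 to 127 are accepted today, because VL and MVL are held in 7-bit SVSTATE fields and the text describes 128 only as a future possibility.

```python
def svi_from_asm(n: int) -> int:
    """Assembler immediate (VL or MVL, counted from one) -> SVi field bits."""
    # Assumption: 128 is rejected because VL/MVL live in 7-bit SVSTATE fields.
    if not 1 <= n <= 127:
        raise ValueError("immediate VL/MVL must be in the range 1..127")
    return n - 1


def vl_from_svi(svi: int) -> int:
    """What setvl computes on execution from the decoded field: SVi + 1."""
    return svi + 1
```
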
Note that `setmvli` and `setvli` are pseudo-ops, both based on `RA=0` and `RT=0`:

```
setvli   VL=8  : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
setvli.  VL=8  : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
setmvli  MVL=8 : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
setmvli. MVL=8 : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1
```

Additional pseudo-op for obtaining VL without modifying it (or any other
state, apart from CR0 in the `Rc=1` form):

```
getvl  r5 : setvl  r5, r0, vf=0, vs=0, ms=0
getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0
```

Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in
the same instruction: doing so requires two instructions.

Use of setvl results in changes to the SVSTATE SPR; see [[sv/sprs]].

**Selecting sources for VL**

Because of considerable opcode pressure, there are no separate fields for
choosing where VL comes from. Instead the source is selected by `vs`
together with whether `RA` and `RT` are zero, as follows:

| condition | effect |
| - | - |
| `vs=0` | VL kept (limited to MVL); RT, if non-zero, set to VL |
| `vs=1, RA=0, RT!=0` | VL,RT set to MIN(MVL, CTR) |
| `vs=1, RA=0, RT=0` | VL set to MIN(MVL, SVi+1) |
| `vs=1, RA!=0, RT=0` | VL set to MIN(MVL, RA) |
| `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA) |

The reasoning here is that the opportunity to set RT equal to the
immediate `SVi+1` is sacrificed in favour of setting it from CTR.

**Unusual Rc=1 behaviour**

Normally, the return result from an instruction is in `RT`. Here, however,
`RT=0` means that no GPR is written at all, and `RT` also takes part in
selecting the source of VL (`CTR` is read when `RA=0` and `RT!=0`), so
different semantics are needed.

When `Rc=1`, CR Field 0 may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, whether set from an immediate, from `RA` or from
`CTR`, may not exceed `MAXVL`, and if the requested value would exceed it,
`CR0.SO` must be set.

In reality it is **`VL`** that is being set. Therefore, rather than testing
`RT` when `Rc=1`, CR0.EQ is set if `VL=0` and CR0.GE is set if `VL` is
non-zero.

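Below is a minimal Python sketch of the CR0 rule above. It assumes `vl` is the value just written to SVSTATE and `overflow` is the flag computed in the pseudo-code. The bit names follow the text (EQ, GE, SO), and `setvl_cr0` is a hypothetical name. Whether SO should be set at all is still an open question (see the bug 914 link at the top of this page).

```python
def setvl_cr0(vl: int, overflow: int) -> dict:
    """CR0 for setvl. (Rc=1): derived from the new VL, not from RT."""
    return {
        "EQ": int(vl == 0),   # VL became zero
        "GE": int(vl != 0),   # VL is non-zero (bit name as used in the text)
        "SO": overflow,       # requested VL was clipped
    }
```
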
**SUBVL**

Sub-vector elements are not to be considered "Vertical": a vec2/3/4 is to
be treated as if it were a single element. Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled, because
the order in which the VL and SUBVL loops are applied is then swapped
(outer-inner becomes inner-outer).

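The Python sketch below visualises the loop-order swap. It lists (element, sub-element) index pairs in the order they would be visited. `element_order` and its `swapped` flag are hypothetical: `swapped=True` stands in for Pack/Unpack being enabled, and nothing else (register numbering, predication) is modelled. In the normal order each vec2/3/4 group completes before the next element starts, which is what treating it as a single element means.

```python
from typing import Iterator, Tuple


def element_order(vl: int, subvl: int, swapped: bool = False) -> Iterator[Tuple[int, int]]:
    """Yield (element, sub-element) pairs in the order they are processed."""
    if not swapped:
        for i in range(vl):          # outer loop: VL
            for j in range(subvl):   # inner loop: SUBVL
                yield i, j
    else:
        for j in range(subvl):       # outer loop: SUBVL (Pack/Unpack case)
            for i in range(vl):      # inner loop: VL
                yield i, j
```
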
## Examples

### Core concept loop

This example illustrates the Cray-style loop concept. However, whereas most
Cray-style Vector architectures have a Max Vector Length hard-coded into the
architecture, Simple-V allows MVL to be set, but only as a static immediate,
so that compilers may embed the register resource allocation statically at
compile-time.

```
loop:
    setvl a3, a0, MVL=8    # update a3 with vl
                           # (# of elements this iteration)
                           # set MVL to 8 and
                           # set a3=VL=MIN(a0,MVL)
    # do vector operations at up to 8 length (MVL=8)
    # ...
    sub. a0, a0, a3        # Decrement count by vl, set CR0.eq
    bnez a0, loop          # Any more?
```

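The strip-mining behaviour of this loop can be simulated in a few lines of Python. This is a hypothetical sketch, not an implementation. `strip_mine` only tracks the VL chosen on each pass, using `MIN(a0, MVL)` as in the `vs=1, RA!=0, RT!=0` row of the source-selection table. The 127 cap on `(RA)` never matters here, because MVL is at most 127. Like the assembly, it is a do-while loop, so a count of zero still executes one pass with VL=0.

```python
from typing import List


def strip_mine(n: int, mvl: int) -> List[int]:
    """Return the VL used on each pass of the Cray-style loop."""
    if mvl < 1:
        raise ValueError("MVL must be at least 1")
    vls = []
    remaining = n                     # a0: elements still to process
    while True:
        vl = min(remaining, mvl)      # setvl a3, a0, MVL=mvl
        vls.append(vl)                # vector ops would run on vl elements
        remaining -= vl               # sub. a0, a0, a3
        if remaining == 0:            # bnez a0, loop
            break
    return vls
```
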
### Loop using Rc=1

In this example, the `setvl.` instruction has Rc=1 enabled, which sets
CR0.eq when VL becomes zero. A separate test of `r4` (with `cmpi`) is
therefore redundant, saving one instruction.

```
my_fn:
    li r3, 1000
    b test
loop:
    sub r3, r3, r4
    ...
test:
    setvl. r4, r3, MVL=64
    bne cr0, loop
end:
    blr
```

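The same control flow can be sketched in Python: branch straight to `test`, then loop while `setvl.` reports a non-zero VL. `rc1_loop` is a hypothetical name. The `while` condition stands in for `bne cr0, loop`, so termination relies on CR0.EQ being set when VL is zero rather than on a separate compare of `r4`.

```python
from typing import List


def rc1_loop(total: int, mvl: int) -> List[int]:
    """Follow my_fn: jump to test first, then loop while VL != 0."""
    if mvl < 1:
        raise ValueError("MVL must be at least 1")
    vls = []
    r3 = total                 # li r3, total
    r4 = min(r3, mvl)          # test: setvl. r4, r3, MVL=mvl
    while r4 != 0:             # bne cr0, loop (CR0.EQ is set when VL == 0)
        vls.append(r4)         # loop body (vector ops) runs with VL = r4
        r3 -= r4               # sub r3, r3, r4
        r4 = min(r3, mvl)      # test: setvl. again
    return vls
```
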
### Load/Store-Multi (selective)

Up to 64 FPRs are loaded here. `r3` holds one bit per FP register, set for
each register that is to be loaded. The block of memory from which the
registers are loaded is contiguous (no gaps): any FP register whose
corresponding bit in `r3` is zero is *unaltered*. In essence this is a
selective LD-multi with "Scatter" (`VEXPAND`) capability.

```
setvli r0, MVL=64, VL=64
sv.fld/dm=r3 *r0, 0(r30)    # selective load 64 FP registers
```

Up to 64 FPRs are stored here. Again, `r3` specifies which registers are
stored, this time in a `VCOMPRESS` fashion: the selected registers are
written to contiguous memory.

```
setvli r0, MVL=64, VL=64
sv.stfd/sm=r3 *fp0, 0(r30)  # selective store 64 FP registers
```

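The selective load and store above can be modelled in Python as follows. This sketch rests on assumptions the text does not state:

* Bit `i` of `r3`, counting from the least-significant bit, selects register `i`, counting from the first register of the vector.
* Memory is a plain Python sequence of values.
* `selective_load` and `selective_store` are hypothetical names.
* Predication details beyond a single mask are not modelled.

```python
from typing import List, Sequence


def selective_load(regs: List[float], mask: int, memory: Sequence[float], vl: int = 64) -> None:
    """sv.fld/dm=r3: contiguous memory -> registers whose mask bit is set."""
    src = 0
    for i in range(vl):
        if (mask >> i) & 1:
            regs[i] = memory[src]
            src += 1            # memory is consumed without gaps
        # registers with a zero mask bit are left unaltered


def selective_store(regs: Sequence[float], mask: int, vl: int = 64) -> List[float]:
    """sv.stfd/sm=r3: registers whose mask bit is set -> contiguous memory."""
    return [regs[i] for i in range(vl) if (mask >> i) & 1]
```
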
[[!tag standards]]

------

\newpage{}
