# setvl: Set Vector Length

<!-- hide -->
See links:

* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html>
* <https://bugs.libre-soc.org/show_bug.cgi?id=535>
* <https://bugs.libre-soc.org/show_bug.cgi?id=587>
* <https://bugs.libre-soc.org/show_bug.cgi?id=914> TODO: setvl should not set SO
* <https://bugs.libre-soc.org/show_bug.cgi?id=568> TODO
* <https://bugs.libre-soc.org/show_bug.cgi?id=927> bug - RT>=32
* <https://bugs.libre-soc.org/show_bug.cgi?id=862> VF Predication
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions>
* [[sv/svstep]]
* pseudocode [[openpower/isa/simplev]]
<!-- show -->

Add the following section to the Simple-V Chapter

## setvl

SVL-Form

| 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM |
| -- | -- | --- | ---- |----------| ----- |--|----------|
|PO | RT | RA | SVi | ms vs vf | XO |Rc| SVL-Form |

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

```
overflow <- 0b0 # sets CR.SO if set and if Rc=1
VLimm <- SVi + 1
# set or get MVL
if ms = 1 then MVL <- VLimm[0:6]
else           MVL <- SVSTATE[0:6]
# set or get VL
if vs = 0                then VL <- SVSTATE[7:13]
else if _RA != 0         then
    if (RA) >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else                      VL <- (RA)[57:63]
else if _RT = 0          then VL <- VLimm[0:6]
else if CTR >u 0b1111111 then
    VL <- 0b1111111
    overflow <- 0b1
else                          VL <- CTR[57:63]
# limit VL to within MVL
if VL >u MVL then
    overflow <- 0b1
    VL <- MVL
SVSTATE[0:6] <- MVL
SVSTATE[7:13] <- VL
if _RT != 0 then
    GPR(_RT) <- [0]*57 || VL
# MAXVL is a static "state-reset" opportunity so VF is only set then.
if ms = 1 then
    SVSTATE[63] <- vf   # set Vertical-First mode
    SVSTATE[62] <- 0b0  # clear persist bit
```

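The pseudo-code above can be mirrored in a small Python model to experiment with corner cases such as clamping and overflow. This is a minimal sketch, not the reference implementation: the `State` class, its field names and the `setvl` function signature are hypothetical, register contents are plain non-negative integers, and `SVi=127` is rejected because the width of `VLimm` in that case is not modelled here. As an example of use, calling it with `ms=1`, `vs=1`, `svi=7` and a source register holding 1000 leaves both MVL and VL at 8 and reports overflow.

```python
from dataclasses import dataclass, field

MASK7 = 0b1111111  # VL and MVL are 7-bit fields of SVSTATE


@dataclass
class State:
    """Hypothetical container for the state touched by the pseudo-code."""
    mvl: int = 0      # SVSTATE[0:6]
    vl: int = 0       # SVSTATE[7:13]
    persist: int = 0  # SVSTATE[62]
    vf: int = 0       # SVSTATE[63]
    ctr: int = 0      # CTR
    gpr: list = field(default_factory=lambda: [0] * 32)


def setvl(st, rt, ra, svi, vf, vs, ms):
    """Apply one setvl to `st` and return the overflow flag.

    `rt` and `ra` are register numbers, `svi` is the raw SVi field.
    """
    if not 0 <= svi < MASK7:
        raise ValueError("sketch only models SVi in the range 0..126")
    overflow = False
    vlimm = svi + 1
    # set or get MVL
    mvl = vlimm if ms else st.mvl
    # set or get VL
    if not vs:
        vl = st.vl
    elif ra != 0:
        if st.gpr[ra] > MASK7:
            vl, overflow = MASK7, True
        else:
            vl = st.gpr[ra]
    elif rt == 0:
        vl = vlimm
    elif st.ctr > MASK7:
        vl, overflow = MASK7, True
    else:
        vl = st.ctr
    # limit VL to within MVL
    if vl > mvl:
        overflow, vl = True, mvl
    st.mvl, st.vl = mvl, vl
    if rt != 0:
        st.gpr[rt] = vl
    # VF is only set (and persist cleared) when MVL is being set
    if ms:
        st.vf = vf
        st.persist = 0
    return overflow
```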
Special Registers Altered:

```
CR0 (if Rc=1)
SVSTATE
```

* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode".

Note that in immediate setting mode VL and MVL start from **one**, but that
this is compensated for in the assembly notation: an immediate
value of 1 in assembler notation actually places the value 0b0000000 in
the `SVi` field bits. On execution the `setvl` instruction adds one to
the decoded `SVi` field bits, resulting in VL/MVL being set to 1. In future
this will allow VL to be set to values ranging from 1 to 128 with only 7 bits
instead of 8. Setting VL/MVL to 0 would result in all Vector operations
becoming `nop`. If nop behaviour is truly desired, VL and MVL must be
set to zero via the [[SVSTATE SPR|sv/sprs]].

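As an illustration of the off-by-one encoding described above, the following sketch converts between the assembler-visible immediate and the raw `SVi` field. The function names are hypothetical, and the range is restricted to 1..127 on the assumption that VL and MVL are currently 7-bit quantities.

```python
def encode_svi(value):
    """Assembler-visible immediate (1..127) to raw 7-bit SVi field."""
    if not 1 <= value <= 127:
        raise ValueError("immediate must be in the range 1..127")
    return value - 1


def decode_svi(svi):
    """Raw SVi field to the value setvl uses: the field plus one."""
    if not 0 <= svi <= 126:
        raise ValueError("SVi field must be in the range 0..126")
    return svi + 1
```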
Note that `setmvli` and `setvli` are both pseudo-ops, based on RA/RT=0:

```
setvli   VL=8   : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
setvli.  VL=8   : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
setmvli  MVL=8  : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
setmvli. MVL=8  : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1
```

Additional pseudo-op for obtaining VL without modifying it (or any state):

```
getvl  r5 : setvl  r5, r0, vf=0, vs=0, ms=0
getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0
```

Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in
the same instruction. Doing so requires two instructions.

Use of setvl results in changes to the SVSTATE SPR. See [[sv/sprs]].

**Selecting sources for VL**

There is considerable opcode pressure. Consequently, the source from
which VL is set (and limited to MVL) is selected as follows:

| condition | effect |
| - | - |
| `vs=1, RA=0, RT!=0` | VL,RT set to MIN(MVL, CTR) |
| `vs=1, RA=0, RT=0` | VL set to MIN(MVL, SVi+1) |
| `vs=1, RA!=0, RT=0` | VL set to MIN(MVL, RA) |
| `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA) |

The reasoning here is that the opportunity to set RT equal to the
immediate `SVi+1` is sacrificed in favour of setting from CTR.

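The table can be read as a small decision function. The sketch below is illustrative only: `vl_source` and `select_vl` are hypothetical names, register numbers are passed as integers, and only the `vs=1` cases from the table are covered.

```python
def vl_source(ra, rt):
    """Return (source_name, rt_is_written) for vs=1, following the table."""
    if ra != 0:
        return ("RA", rt != 0)
    if rt != 0:
        return ("CTR", True)
    return ("SVi+1", False)


def select_vl(mvl, ra, rt, ra_value, ctr, svi):
    """Take the value from the selected source and limit it to MVL."""
    source, _ = vl_source(ra, rt)
    candidate = {"RA": ra_value, "CTR": ctr, "SVi+1": svi + 1}[source]
    return min(mvl, candidate)
```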
**Unusual Rc=1 behaviour**

Normally, the return result from an instruction is in `RT`. With it
being possible for `RT=0` to mean that `CTR` mode is to be read, some
different semantics are needed.

CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, if set either from an immediate or from `CTR`,
may not exceed `MAXVL`, and if it does, `CR0.SO` must be set.

In reality it is **`VL`** being set. Therefore, rather than `CR0`
testing `RT` when `Rc=1`, `CR0.EQ` is set if `VL=0` and `CR0.GE` is set
if `VL` is non-zero.

**SUBVL**

Sub-vector elements are not to be considered "Vertical". The vec2/3/4
is to be considered as if it were the "single element". Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled, due
to the order in which VL and SUBVL loops are applied being swapped
(outer-inner becomes inner-outer).

## Examples

### Core concept loop

This example illustrates the Cray-style Loop concept. However, where most Cray
Vectors have a Max Vector Length hard-coded into the architecture, Simple-V
allows MVL to be set, but only as a static immediate, so that compilers may
embed the register resource allocation statically at compile-time.

```
loop:
    setvl a3, a0, MVL=8    # update a3 with vl
                           # (# of elements this iteration)
                           # set MVL to 8 and
                           # set a3=VL=MIN(a0,MVL)
    # do vector operations at up to 8 length (MVL=8)
    # ...
    sub. a0, a0, a3        # Decrement count by vl, set CR0.eq
    bnez a0, loop          # Any more?
```

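The strip-mining pattern in the loop above can be sketched in Python, with list slices standing in for vector registers. This is an illustration under assumptions: `strip_mined_add` is a hypothetical name, element-wise addition stands in for the unspecified "vector operations", and `min(remaining, mvl)` models what `setvl` writes to VL.

```python
def strip_mined_add(a, b, mvl=8):
    """Add two equal-length lists in chunks of at most `mvl` elements."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    pos = 0
    remaining = len(a)              # plays the role of a0
    while remaining != 0:
        vl = min(remaining, mvl)    # setvl a3, a0, MVL=8
        # one "vector operation" of length vl
        out.extend(x + y for x, y in zip(a[pos:pos + vl], b[pos:pos + vl]))
        pos += vl
        remaining -= vl             # sub. a0, a0, a3
    return out
```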
### Loop using Rc=1

In this example, the `setvl.` instruction enables Rc=1, which
sets CR0.eq when VL becomes zero. Testing of `r4` (cmpi) is thus redundant,
saving one instruction.

```
my_fn:
    li r3, 1000
    b test
loop:
    sub r3, r3, r4
    ...
test:
    setvli. r4, r3, MVL=64
    bne cr0, loop
end:
    blr
```

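The control flow of the `Rc=1` loop can be traced in Python to see how many iterations result. The sketch assumes that the `setvl.` step sets VL to MIN(r3, MVL) and that the loop exits once CR0.eq is set, that is, once VL is zero; `trace_loop` is a hypothetical helper. For 1000 elements with MVL=64 it yields fifteen chunks of 64 followed by one chunk of 40.

```python
def trace_loop(total, mvl=64):
    """Return the list of VL values seen by the loop body."""
    remaining = total               # li r3, 1000
    chunks = []
    while True:
        vl = min(remaining, mvl)    # setvli. r4, r3, MVL=64
        cr0_eq = (vl == 0)          # Rc=1: CR0.eq set when VL is zero
        if cr0_eq:                  # bne cr0, loop falls through
            break
        remaining -= vl             # sub r3, r3, r4
        chunks.append(vl)           # loop body runs at this VL
    return chunks
```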
### Load/Store-Multi (selective)

Up to 64 FPRs will be loaded, here. `r3` has one bit set for each
FP register required to be loaded. The block of memory from which the
registers are loaded is contiguous (no gaps): any FP register which has
a corresponding zero bit in `r3` is *unaltered*. In essence this is a
selective LD-multi with "Scatter" (`VCOMPRESS`) capability.

```
setvli r0, MVL=64, VL=64
sv.fld/dm=r3 *r0, 0(r30) # selective load 64 FP registers
```

Up to 64 FPRs will be saved, here. Again, `r3` specifies which
registers are set in a `VEXPAND` fashion.

```
setvli r0, MVL=64, VL=64
sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
```

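The selective load and store can be modelled in Python to make the pairing between mask bits and memory words explicit. This is a sketch under assumptions: the function names are hypothetical, bit `i` of the mask (counting from the least significant bit) is assumed to correspond to register `i`, and memory is a plain list of contiguous words.

```python
def selective_load(regs, mem, mask):
    """Load the next contiguous memory word into each register whose
    mask bit is set; registers with a zero bit are left unaltered."""
    out = list(regs)
    src = 0                          # index of the next memory word
    for i in range(len(regs)):
        if (mask >> i) & 1:
            out[i] = mem[src]
            src += 1
    return out


def selective_store(regs, mask):
    """Return the contiguous block of words written to memory:
    only registers whose mask bit is set are stored, with no gaps."""
    return [regs[i] for i in range(len(regs)) if (mask >> i) & 1]
```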
[[!tag standards]]

------

\newpage{}
