# setvl: Set Vector Length

<!-- hide -->
See links:

* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001366.html>
* <https://bugs.libre-soc.org/show_bug.cgi?id=535>
* <https://bugs.libre-soc.org/show_bug.cgi?id=587>
* <https://bugs.libre-soc.org/show_bug.cgi?id=568> TODO
* <https://bugs.libre-soc.org/show_bug.cgi?id=927> bug - RT>=32
* <https://bugs.libre-soc.org/show_bug.cgi?id=862> VF Predication
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vsetvlivsetvl-instructions>
* [[sv/svstep]]
* pseudocode [[openpower/isa/simplev]]
<!-- show -->

Add the following section to the Simple-V Chapter

## setvl

SVL-Form

| 0-5|6-10|11-15|16-22 | 23 24 25 | 26-30 |31| FORM |
| -- | -- | --- | ---- |----------| ----- |--|----------|
|PO | RT | RA | SVi | ms vs vf | XO |Rc| SVL-Form |

* setvl RT,RA,SVi,vf,vs,ms (Rc=0)
* setvl. RT,RA,SVi,vf,vs,ms (Rc=1)

Pseudo-code:

```
    overflow <- 0b0    # sets CR.SO if set and if Rc=1
    VLimm <- SVi + 1
    # set or get MVL
    if ms = 1 then MVL <- VLimm[0:6]
    else           MVL <- SVSTATE[0:6]
    # set or get VL
    if vs = 0                then VL <- SVSTATE[7:13]
    else if _RA != 0         then
        if (RA) >u 0b1111111 then
            VL <- 0b1111111
            overflow <- 0b1
        else                      VL <- (RA)[57:63]
    else if _RT = 0          then VL <- VLimm[0:6]
    else if CTR >u 0b1111111 then
        VL <- 0b1111111
        overflow <- 0b1
    else                          VL <- CTR[57:63]
    # limit VL to within MVL
    if VL >u MVL then
        overflow <- 0b1
        VL <- MVL
    SVSTATE[0:6] <- MVL
    SVSTATE[7:13] <- VL
    if _RT != 0 then
        GPR(_RT) <- [0]*57 || VL
    # MAXVL is a static "state-reset" opportunity so VF is only set then.
    if ms = 1 then
        SVSTATE[63] <- vf   # set Vertical-First mode
        SVSTATE[62] <- 0b0  # clear persist bit
```

Special Registers Altered:

```
    CR0 (if Rc=1)
    SVSTATE
```

* `SVi` - bits 16-22 - an immediate operand for setting MVL and/or VL
* `ms` - bit 23 - allows for setting of MVL
* `vs` - bit 24 - allows for setting of VL
* `vf` - bit 25 - sets "Vertical First Mode".
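
For readers who prefer an executable form, the pseudo-code above can be approximated in Python. This is only a sketch under stated assumptions, not the reference implementation: the `SVState` class, its field names and the `_clamp7` helper are hypothetical, registers are held as plain unsigned Python integers, and `SVi + 1` is assumed to be kept to 7 bits.

```python
from dataclasses import dataclass, field

MASK7 = 0b1111111  # VL and MVL are 7-bit fields


@dataclass
class SVState:
    """Hypothetical container for the state that setvl reads and writes."""
    gpr: list = field(default_factory=lambda: [0] * 32)
    ctr: int = 0      # Count Register
    mvl: int = 0      # SVSTATE[0:6]
    vl: int = 0       # SVSTATE[7:13]
    persist: int = 0  # SVSTATE[62]
    vfirst: int = 0   # SVSTATE[63]


def _clamp7(value):
    """Saturate an unsigned value to 7 bits, flagging overflow."""
    if value > MASK7:
        return MASK7, 1
    return value, 0


def setvl(st, rt, ra, svi, vf, vs, ms):
    """Apply setvl to st; return the overflow flag (CR0.SO when Rc=1)."""
    overflow = 0
    vlimm = (svi + 1) & MASK7      # assumption: result kept to 7 bits
    mvl = vlimm if ms else st.mvl  # set or get MVL
    if not vs:                     # get VL
        vl = st.vl
    elif ra != 0:                  # VL from GPR RA
        vl, overflow = _clamp7(st.gpr[ra])
    elif rt == 0:                  # VL from the immediate
        vl = vlimm
    else:                          # VL from CTR
        vl, overflow = _clamp7(st.ctr)
    if vl > mvl:                   # limit VL to within MVL
        vl, overflow = mvl, 1
    st.mvl, st.vl = mvl, vl
    if rt != 0:
        st.gpr[rt] = vl
    if ms:                         # VF is only updated when MVL is set
        st.vfirst = vf
        st.persist = 0
    return overflow
```

With `gpr[3] = 1000`, calling `setvl(st, rt=4, ra=3, svi=7, vf=0, vs=1, ms=1)` leaves MVL, VL and `gpr[4]` all at 8 and returns an overflow of 1, because the requested length exceeds MVL.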

Note that in immediate setting mode VL and MVL start from **one**, but that
this is compensated for in the assembly notation: an immediate
value of 1 in assembler notation actually places the value 0b0000000 in
the `SVi` field bits. On execution the `setvl` instruction adds one to
the decoded `SVi` field bits, resulting in VL/MVL being set to 1. In future
this will allow VL to be set to values ranging from 1 to 128 with only 7 bits
instead of 8. Setting VL/MVL to 0 would result in all Vector operations
becoming `nop`. If nop behaviour is truly desired, then VL and MVL
are to be set to zero via the [[SVSTATE SPR|sv/sprs]].
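
The off-by-one relationship between the assembler immediate and the `SVi` field can be sketched as a pair of helpers. The function names are hypothetical, and the upper bound of 127 is an assumption that reflects VL and MVL currently being 7-bit fields.

```python
def encode_svi(value):
    """Map an assembler immediate to the 7-bit SVi field (value - 1)."""
    if not 1 <= value <= 127:  # assumption: 128 is not yet representable
        raise ValueError("immediate must be in the range 1..127")
    return value - 1


def decode_svi(svi):
    """Recover the VL/MVL value that setvl derives from the SVi field."""
    return svi + 1
```

An immediate of 0 is rejected here because it has no encoding, which matches the text: zero can only be reached through the SVSTATE SPR.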

Note that `setmvli` is a pseudo-op, based on RA/RT=0, and `setvli` likewise:

```
    setvli   VL=8  : setvl  r0, r0, VL=8, vf=0, vs=1, ms=0
    setvli.  VL=8  : setvl. r0, r0, VL=8, vf=0, vs=1, ms=0
    setmvli  MVL=8 : setvl  r0, r0, MVL=8, vf=0, vs=0, ms=1
    setmvli. MVL=8 : setvl. r0, r0, MVL=8, vf=0, vs=0, ms=1
```

Additional pseudo-op for obtaining VL without modifying it (or any state):

```
    getvl  r5 : setvl  r5, r0, vf=0, vs=0, ms=0
    getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0
```

Note that whilst it is possible to set both MVL and VL from the same
immediate, it is not possible to set them to different immediates in
the same instruction. Doing so would require two instructions.

Use of setvl results in changes to the SVSTATE SPR. See [[sv/sprs]].

**Selecting sources for VL**

There is considerable opcode pressure. Consequently, the source from
which VL is set (and whether RT is also written) is selected as follows:

| condition | effect |
| - | - |
| `vs=1, RA=0, RT!=0` | VL,RT set to MIN(MVL, CTR) |
| `vs=1, RA=0, RT=0` | VL set to MIN(MVL, SVi+1) |
| `vs=1, RA!=0, RT=0` | VL set to MIN(MVL, RA) |
| `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA) |

The reasoning here is that the opportunity to set RT equal to the
immediate `SVi+1` is sacrificed in favour of setting from CTR.
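
The table can be read as a small decision function. The sketch below is illustrative only: `vl_source` is a hypothetical name, and the `vs=0` case (VL read back from SVSTATE) is taken from the pseudo-code rather than from the table.

```python
def vl_source(vs, ra, rt):
    """Return (source of VL, whether RT is written) for the given fields."""
    writes_rt = rt != 0
    if not vs:
        return "SVSTATE", writes_rt  # VL is only read back, not set
    if ra != 0:
        return "RA", writes_rt
    if rt == 0:
        return "SVi+1", False        # immediate: no GPR is written
    return "CTR", True               # RA=0 with RT!=0 selects CTR
```

In every `vs=1` case the value taken from the selected source is then limited to MVL before being written to VL (and to RT when `RT!=0`).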

**Unusual Rc=1 behaviour**

Normally, the return result from an instruction is in `RT`. Here, however,
`RT=0` does not name a destination register: together with `RA=0` it selects
between the immediate and `CTR` as the source for VL, so some
different semantics are needed.

CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
overflow may occur: `VL`, whether set from an immediate, from `RA` or from
`CTR`, may not exceed `MAXVL`. If the requested value does, `VL` is limited
to `MAXVL` and `CR0.SO` must be set.

In reality it is **`VL`** being set. Therefore, rather than `CR0`
testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, and CR0.GE is set if `VL`
is non-zero.
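
As a rough illustration of the rule above, CR0 can be derived from VL and the overflow flag alone. This sketch makes two assumptions that the text does not confirm: that "CR0.GE" corresponds to the GT bit of the CR field, and that LT is always clear because VL is unsigned. The function name is hypothetical.

```python
def setvl_cr0(vl, overflow):
    """Return the CR0 bits as a dict, derived from VL rather than RT."""
    return {
        "LT": 0,                    # assumption: VL is unsigned, never negative
        "GT": int(vl != 0),         # assumption: the text's "GE" maps to GT
        "EQ": int(vl == 0),         # set when VL has reached zero
        "SO": int(bool(overflow)),  # set when VL had to be limited
    }
```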

**SUBVL**

Sub-vector elements are not to be considered "Vertical". A vec2/3/4
is to be treated as if it were a single element. Caveats exist for
[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled, due
to the order in which VL and SUBVL loops are applied being swapped
(outer-inner becomes inner-outer).

## Examples

### Core concept loop

This example illustrates the Cray-style Loop concept. However, where most Cray
Vectors have a Max Vector Length hard-coded into the architecture, Simple-V
allows MVL to be set, but only as a static immediate, so that compilers may
embed the register resource allocation statically at compile-time.

```
loop:
    setvl a3, a0, MVL=8    # update a3 with vl
                           # (# of elements this iteration)
                           # set MVL to 8 and
                           # set a3=VL=MIN(a0,MVL)
    # do vector operations at up to 8 length (MVL=8)
    # ...
    sub. a0, a0, a3        # Decrement count by vl, set CR0.eq
    bnez a0, loop          # Any more?
```
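
The control flow of this loop can be mimicked in Python to show how an arbitrary element count is consumed in chunks of at most MVL. This is a sketch only: `stripmine_add` is a hypothetical name, element-wise addition stands in for "vector operations", and Python lists stand in for memory.

```python
def stripmine_add(a, b, mvl=8):
    """Add two equal-length lists in chunks of at most mvl elements.

    Returns the result list and the VL used on each iteration.
    """
    assert len(a) == len(b)
    out, vls = [], []
    remaining, pos = len(a), 0
    while True:
        vl = min(remaining, mvl)  # setvl a3, a0, MVL=8
        vls.append(vl)
        # vector operation on vl elements (a no-op when vl is 0)
        out.extend(x + y for x, y in zip(a[pos:pos + vl], b[pos:pos + vl]))
        pos += vl
        remaining -= vl           # sub. a0, a0, a3
        if remaining == 0:        # bnez a0, loop
            break
    return out, vls
```

For 20 elements and MVL=8 the per-iteration VL values are 8, 8 and 4. As in the assembly, the loop body runs once even when the count is zero, with VL=0 turning the vector operation into a no-op.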

### Loop using Rc=1

In this example, the `setvl.` instruction has Rc=1 enabled, which
sets CR0.eq when VL becomes zero.

```
my_fn:
    li r3, 1000
    b test
loop:
    sub r3, r3, r4
    ...
test:
    setvl. r4, r3, MVL=64
    bne cr0, loop
end:
    blr
```

### Load/Store-Multi (selective)

Up to 64 FPRs will be loaded here. `r3` has one bit set for each
FP register required to be loaded. The block of memory from which the
registers are loaded is contiguous (no gaps): any FP register which has
a corresponding zero bit in `r3` is *unaltered*. In essence this is a
selective LD-multi with "Scatter" capability.

```
    setvli r0, MVL=64, VL=64
    sv.lfd/dm=r3 *fp0, 0(r30) # selective load 64 FP registers
```

Up to 64 FPRs will be saved here. Again, `r3` specifies which
registers are saved, in a `VEXPAND` fashion.

```
    setvli r0, MVL=64, VL=64
    sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
```
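
The effect of the predicate mask on both operations can be sketched in Python. This is a behavioural illustration under assumptions, not a model of the actual instructions: the function names are hypothetical, memory is a plain list of values, and bit `i` of the mask (tested as `1 << i`) is assumed to correspond to register `i`.

```python
def selective_load(regs, mask, memory, vl=64):
    """Load contiguous memory values into the registers selected by mask.

    Registers whose mask bit is zero are left unaltered.
    Returns the number of values consumed from memory.
    """
    src = 0
    for i in range(vl):
        if (mask >> i) & 1:
            regs[i] = memory[src]  # memory advances only on active elements
            src += 1
    return src


def selective_store(regs, mask, vl=64):
    """Return the selected registers packed contiguously, as stored."""
    return [regs[i] for i in range(vl) if (mask >> i) & 1]
```

With a mask of `0b1011`, three values are read from memory into registers 0, 1 and 3, register 2 keeps its previous contents, and a store with the same mask writes those three registers back with no gaps.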

[[!tag standards]]

------

\newpage{}