[[!tag standards]]
-# SV Vector Operations.
+# SV Vector-assist Operations.
Links:
* [[discussion]]
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-register-gather-instructions>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-May/004884.html>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=865> implementation in simulator
* <https://bugs.libre-soc.org/show_bug.cgi?id=213>
* <https://bugs.libre-soc.org/show_bug.cgi?id=142> specialist vector ops
out of scope for this document; [[openpower/sv/3d_vector_ops]]
contains pseudocode for sof, sif and sbf
* <https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)>
-The core Power ISA was designed as scalar: SV provides a level of abstraction to add variable-length element-independent parallelism.
-Therefore there are not that many cases where *actual* Vector
-instructions are needed. If they are, they are more "assistance"
-functions. Two traditional Vector instructions were initially
-considered (conflictd and vmiota) however they may be synthesised
-from existing SVP64 instructions: details in [[discussion]]
+The core Power ISA was designed as scalar: SV provides a level of
+abstraction to add variable-length element-independent parallelism.
+Therefore there are not many cases where *actual* Vector instructions
+are needed, and those that are needed are "assistance" functions. Two
+traditional Vector instructions were initially considered (conflictd and
+vmiota); however, both may be synthesised from existing SVP64 instructions.
+vmiota, for example, may use [[svstep]]. Details are in [[discussion]].
Notes:
-* Instructions suited to 3D GPU workloads (dotproduct, crossproduct, normalise) are out of scope: this document is for more general-purpose instructions that underpin and are critical to general-purpose Vector workloads (including GPU and VPU)
-* Instructions related to the adaptation of CRs for use as predicate masks are covered separately, by crweird operations. See [[sv/cr_int_predication]].
+* Instructions suited to 3D GPU workloads (dotproduct, crossproduct,
+ normalise) are out of scope: this document is for more general-purpose
+ instructions that underpin and are critical to general-purpose Vector
+ workloads (including GPU and VPU).
+* Instructions related to the adaptation of CRs for use as
+ predicate masks are covered separately by the crweird operations.
+ See [[sv/cr_int_predication]].
-# sbfm
+## Mask-suited Bitmanipulation
- sbfm RT, RA, RB!=0
-Example
+BM2-Form
- 7 6 5 4 3 2 1 0 Bit index
+|0..5 |6..10|11..15|16..20|21..25|26|27..31| Form |
+|------|-----|------|------|-----|--|------|------|
+| PO | RT | RA | RB |bm |L | XO | BM2-Form |
- 1 0 0 1 0 1 0 0 v3 contents
- vmsbf.m v2, v3
- 0 0 0 0 0 0 1 1 v2 contents
+* bmask RT,RA,RB,bm,L
- 1 0 0 1 0 1 0 1 v3 contents
- vmsbf.m v2, v3
- 0 0 0 0 0 0 0 0 v2
-
- 0 0 0 0 0 0 0 0 v3 contents
- vmsbf.m v2, v3
- 1 1 1 1 1 1 1 1 v2
-
- 1 1 0 0 0 0 1 1 RB vcontents
- 1 0 0 1 0 1 0 0 v3 contents
- vmsbf.m v2, v3, v0.t
- 0 1 x x x x 1 1 v2 contents
-
-The vmsbf.m instruction takes a mask register as input and writes results to a mask register. The instruction writes a 1 to all active mask elements before the first source element that is a 1, then writes a 0 to that element and all following active elements. If there is no set bit in the source vector, then all active elements in the destination are written with a 1.
-
-Executable pseudocode demo:
-
-```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sbf.py"]]
-```
-
-# sifm
-
-The vector mask set-including-first instruction is similar to set-before-first, except it also includes the element with a set bit.
-
- sifm RT, RA, RB!=0
-
- # Example
-
- 7 6 5 4 3 2 1 0 Bit number
-
- 1 0 0 1 0 1 0 0 v3 contents
- vmsif.m v2, v3
- 0 0 0 0 0 1 1 1 v2 contents
-
- 1 0 0 1 0 1 0 1 v3 contents
- vmsif.m v2, v3
- 0 0 0 0 0 0 0 1 v2
-
- 1 1 0 0 0 0 1 1 RB vcontents
- 1 0 0 1 0 1 0 0 v3 contents
- vmsif.m v2, v3, v0.t
- 1 1 x x x x 1 1 v2 contents
-
-Executable pseudocode demo:
+Pseudo-code:
```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sif.py"]]
+ if _RB = 0 then mask <- [1] * XLEN
+ else mask <- (RB)
+ ra <- (RA) & mask
+ # pattern A: x or ~x
+ a1 <- ra
+ if bm[4] = 0 then a1 <- ¬ra
+ # pattern C: (~x)+1, x-1, x+1 or ~(x+1)
+ mode2 <- bm[2:3]
+ if mode2 = 0 then a2 <- (¬ra)+1
+ if mode2 = 1 then a2 <- ra-1
+ if mode2 = 2 then a2 <- ra+1
+ if mode2 = 3 then a2 <- ¬(ra+1)
+ a1 <- a1 & mask
+ a2 <- a2 & mask
+ # select operator (pattern B)
+ mode3 <- bm[0:1]
+ if mode3 = 0 then result <- a1 | a2
+ if mode3 = 1 then result <- a1 & a2
+ if mode3 = 2 then result <- a1 ^ a2
+ if mode3 = 3 then result <- undefined([0]*XLEN)
+ # mask output
+ result <- result & mask
+ # optionally restore masked-out bits
+ if L = 1 then
+     result <- result | ((RA) & ¬mask)
+ RT <- result
```
-# vmsof
-
-The vector mask set-only-first instruction is similar to set-before-first, except it only sets the first element with a bit set, if any.
-
- sofm RT, RA, RB
-
-Example
+* first pattern A (`bm[4]`): two options, `x` or `~x`
+* second pattern B (`bm[0:1]`): three options, `|`, `&` or `^`
+* third pattern C (`bm[2:3]`): four options, `x+1`, `x-1`, `~(x+1)` or `(~x)+1`
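As an illustration of how the three patterns combine, the Python sketch below evaluates three combinations that reproduce the RVV `vmsbf.m`, `vmsif.m` and `vmsof.m` masks (set-before-first, set-including-first, set-only-first), treating bit 0 as element 0. The helper names `sbf`, `sif` and `sof` are hypothetical, `XLEN = 64` is an assumption, and RB masking and `L` are ignored here.

```python
XLEN = 64                      # assumed register width
ALL_ONES = (1 << XLEN) - 1


def sbf(x: int) -> int:
    """Set-before-first: A = ~x, C = x-1, B = &."""
    return (~x & (x - 1)) & ALL_ONES


def sif(x: int) -> int:
    """Set-including-first: A = x, C = x-1, B = ^."""
    return (x ^ (x - 1)) & ALL_ONES


def sof(x: int) -> int:
    """Set-only-first: A = x, C = (~x)+1, B = &."""
    return (x & ((~x) + 1)) & ALL_ONES


if __name__ == "__main__":
    x = 0b10010100
    print(f"{sbf(x):08b} {sif(x):08b} {sof(x):08b}")  # 00000011 00000111 00000100
```

With no bit set in the source, `sbf` and `sif` return all ones and `sof` returns zero, which matches the RVV definitions of those mask operations.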
- 7 6 5 4 3 2 1 0 Bit number
- 1 0 0 1 0 1 0 0 v3 contents
- vmsof.m v2, v3
- 0 0 0 0 0 1 0 0 v2 contents
+If bits 0:1 of `bm` (the operator select) are 0b11, the encoding is
+`RESERVED` and an illegal instruction trap must be raised.
- 1 0 0 1 0 1 0 1 v3 contents
- vmsof.m v2, v3
- 0 0 0 0 0 0 0 1 v2
-
- 1 1 0 0 0 0 1 1 RB vcontents
- 1 1 0 1 0 1 0 0 v3 contents
- vmsof.m v2, v3, v0.t
- 0 1 x x x x 0 0 v2 content
-
-Executable pseudocode demo:
+Special Registers Altered:
```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sof.py"]]
+ None
```
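For experimentation, the bmask pseudo-code can be modelled directly in Python. This is a sketch, not the libre-soc simulator implementation. It assumes `XLEN = 64`, uses `rb=None` to stand for the `_RB = 0` (no mask) case, and assumes Power ISA MSB0 numbering of the 5-bit `bm` field, so `bm[0:1]` are its two most-significant bits. The `ValueError` is a stand-in for the illegal-instruction trap.

```python
from typing import Optional

XLEN = 64                      # assumed register width
ALL_ONES = (1 << XLEN) - 1


def bmask(ra_val: int, rb: Optional[int], bm: int, L: int) -> int:
    """Sketch of bmask RT,RA,RB,bm,L; rb=None models the _RB = 0 case."""
    mask = ALL_ONES if rb is None else rb & ALL_ONES
    ra = ra_val & mask
    # assumed MSB0 decode of the 5-bit bm field
    mode3 = (bm >> 3) & 0b11   # bm[0:1]: operator (pattern B)
    mode2 = (bm >> 1) & 0b11   # bm[2:3]: second operand (pattern C)
    sel_a = bm & 1             # bm[4]: first operand x or ~x (pattern A)
    a1 = ra if sel_a else ~ra
    if mode2 == 0:
        a2 = (~ra) + 1
    elif mode2 == 1:
        a2 = ra - 1
    elif mode2 == 2:
        a2 = ra + 1
    else:
        a2 = ~(ra + 1)
    # Python ints are unbounded; ANDing with mask gives two's-complement bits
    a1 &= mask
    a2 &= mask
    if mode3 == 0:
        result = a1 | a2
    elif mode3 == 1:
        result = a1 & a2
    elif mode3 == 2:
        result = a1 ^ a2
    else:
        raise ValueError("bm[0:1] = 0b11 is RESERVED (illegal instruction)")
    result &= mask
    if L == 1:
        # restore the masked-out bits of RA
        result |= ra_val & ~mask & ALL_ONES
    return result
```

Under these assumptions `bm = 0b01010` selects `~x`, `x-1` and `&` (set-before-first): `bmask(0b10010100, None, 0b01010, 0)` gives `0b11`. With `rb = 0b11000011` the result is `0b01000011`. With `L = 1`, the masked-out bits of RA are restored, giving `0b01010111`.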
-# Carry-lookahead
+## Carry-lookahead
-used not just for carry lookahead, also a special type of predication mask operation.
+As a single 32-bit scalar instruction, `cprop` computes up to 64
+carry-propagation bits. When its output is then used as a predicate mask,
+it can selectively perform the "add carry" step of big-integer arithmetic
+with `sv.addi/sm=rN RT.v, RA.v, 1`.
-* <https://www.geeksforgeeks.org/carry-look-ahead-adder/>
-* <https://media.geeksforgeeks.org/wp-content/uploads/digital_Logic6.png>
-* <https://electronics.stackexchange.com/questions/20085/whats-the-difference-with-carry-look-ahead-generator-block-carry-look-ahead-ge>
-* <https://i.stack.imgur.com/QSLKY.png>
-* <https://stackoverflow.com/questions/27971757/big-integer-addition-code>
- `((P|G)+G)^P`
-* <https://en.m.wikipedia.org/wiki/Carry-lookahead_adder>
+* cprop RT,RA,RB (Rc=0)
+* cprop. RT,RA,RB (Rc=1)
-From QLSKY.png:
+Pseudo-code:
```
- x0 = nand(CIn, P0)
- C0 = nand(x0, ~G0)
-
- x1 = nand(CIn, P0, P1)
- y1 = nand(G0, P1)
- C1 = nand(x1, y1, ~G1)
-
- x2 = nand(CIn, P0, P1, P2)
- y2 = nand(G0, P1, P2)
- z2 = nand(G1, P2)
- C1 = nand(x2, y2, z2, ~G2)
-
- # Gen*
- x3 = nand(G0, P1, P2, P3)
- y3 = nand(G1, P2, P3)
- z3 = nand(G2, P3)
- G* = nand(x3, y3, z3, ~G3)
+ P <- (RA)
+ G <- (RB)
+ RT <- ((P|G)+G)^P
```
-```
- P = (A | B) & Ci
- G = (A & B)
-```
-
-Stackoverflow algorithm `((P|G)+G)^P` works on the cumulated bits of P and G from associated vector units (P and G are integers here). The result of the algorithm is the new carry-in which already includes ripple, one bit of carry per element.
-
-```
- At each id, compute C[id] = A[id]+B[id]+0
- Get G[id] = C[id] > radix -1
- Get P[id] = C[id] == radix-1
- Join all P[id] together, likewise G[id]
- Compute newC = ((P|G)+G)^P
- result[id] = (C[id] + newC[id]) % radix
-```
+X-Form
-two versions: scalar int version and CR based version.
+| 0:5|6:10|11:15|16:20| 21:30 |31| name | Form |
+| -- | -- | --- | --- | --------- |--| ---- | ------- |
+| PO | RT | RA | RB | XO |Rc| cprop | X-Form |
-scalar int version acts as a scalar carry-propagate, reading XER.CA as input, P and G as regs, and taking a radix argument. the end bits go into XER.CA and CR0.ge
-
-vector version takes CR0.so as carry in, stores in CR0.so and CR.ge end bits.
-
-if zero (no propagation) then CR0.eq is zero
+`cprop` is used not just for carry lookahead: it is also a special type of
+predication mask operation.
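The following is a minimal Python sketch of that use. Each per-limb sum is classified as either generate (G: the sum overflowed the limb) or propagate (P: the sum is exactly all ones). `cprop` then turns P and G into one carry-in bit per limb, and a predicated add of 1 (the role played by `sv.addi/sm=rN`) applies those carries. The helper `bigint_add`, the little-endian 64-bit limbs and the 63-limb limit (which keeps the final carry-out inside the 64-bit mask) are assumptions for illustration. The formula relies on P and G never both being set for the same limb, and this classification guarantees that.

```python
XLEN = 64
ALL_ONES = (1 << XLEN) - 1


def cprop(p: int, g: int) -> int:
    """Model of cprop: RT <- ((P|G)+G)^P, truncated to XLEN bits.
    Bit i of the result is the carry into element i."""
    return (((p | g) + g) ^ p) & ALL_ONES


def bigint_add(a: list[int], b: list[int], limb_bits: int = 64) -> tuple[list[int], int]:
    """Hypothetical helper: add two little-endian limb vectors."""
    n = len(a)
    assert len(b) == n and n < XLEN, "assumed limit: carry-out must fit in the mask"
    radix = 1 << limb_bits
    p = g = 0
    partial = []
    for i, (x, y) in enumerate(zip(a, b)):
        s = x + y                  # per-limb sum, ignoring carry-in
        if s > radix - 1:
            g |= 1 << i            # limb generates a carry
        elif s == radix - 1:
            p |= 1 << i            # limb passes an incoming carry through
        partial.append(s % radix)
    carries = cprop(p, g)
    # predicated "add 1": only limbs whose mask bit is set are incremented
    result = [(s + ((carries >> i) & 1)) % radix for i, s in enumerate(partial)]
    return result, (carries >> n) & 1
```

For example, `bigint_add([ALL_ONES] * 3, [1, 0, 0])` returns `([0, 0, 0], 1)`. The generate bit in limb 0 ripples through both propagate limbs in a single `cprop`.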
-CR based version, TODO.