https://bugs.libre-soc.org/show_bug.cgi?id=985
[[!tag standards]]

# SV Vector-assist Operations

Links:

* [[discussion]]
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-register-gather-instructions>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-May/004884.html>
* <https://bugs.libre-soc.org/show_bug.cgi?id=865> implementation in simulator
* <https://bugs.libre-soc.org/show_bug.cgi?id=213>
* <https://bugs.libre-soc.org/show_bug.cgi?id=142> specialist vector ops,
  out of scope for this document: [[openpower/sv/3d_vector_ops]]
* [[simple_v_extension/specification/bitmanip]] previous version,
  containing pseudocode for sof, sif, sbf
* <https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)>

The core Power ISA was designed as scalar: SV provides a level of
abstraction to add variable-length element-independent parallelism.
Therefore there are not many cases where *actual* Vector instructions
are needed, and those that are needed are more in the nature of
"assistance" functions. Two traditional Vector instructions were
initially considered (conflictd and vmiota); however, they may be
synthesised from existing SVP64 instructions (vmiota, for example, may
use [[svstep]]). Details are in [[discussion]].

Notes:

* Instructions suited to 3D GPU workloads (dotproduct, crossproduct,
  normalise) are out of scope: this document is for more general-purpose
  instructions that underpin and are critical to general-purpose Vector
  workloads (including GPU and VPU).
* Instructions related to the adaptation of CRs for use as
  predicate masks are covered separately, by crweird operations.
  See [[sv/cr_int_predication]].

## Mask-suited Bitmanipulation

BM2-Form

| 0..5 | 6..10 | 11..15 | 16..20 | 21..25 | 26 | 27..31 | Form     |
|------|-------|--------|--------|--------|----|--------|----------|
| PO   | RT    | RA     | RB     | bm     | L  | XO     | BM2-Form |

* bmask RT,RA,RB,bm,L
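
To make the MSB0 bit numbering of the BM2-Form table concrete, here is a minimal Python sketch that packs the fields into a 32-bit instruction word. The `PO` and `XO` values are not given on this page, so the sketch takes them as caller-supplied placeholders, and the helper names `set_field` and `encode_bm2` are hypothetical. In MSB0 numbering a field occupying bits `start..end` is shifted left by `31 - end`.

```python
def set_field(word: int, value: int, start: int, end: int) -> int:
    """OR value into bits start..end (inclusive, MSB0 numbering) of a 32-bit word."""
    width = end - start + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return word | (value << (31 - end))


def encode_bm2(po: int, rt: int, ra: int, rb: int, bm: int, l_bit: int, xo: int) -> int:
    """Pack BM2-Form fields; po and xo are placeholders supplied by the caller.
    rt is the bits 6..10 field, the register written as RT by the pseudo-code."""
    word = 0
    for value, start, end in ((po, 0, 5), (rt, 6, 10), (ra, 11, 15),
                              (rb, 16, 20), (bm, 21, 25), (l_bit, 26, 26),
                              (xo, 27, 31)):
        word = set_field(word, value, start, end)
    return word


# placeholder PO/XO of 0, RT=3, RA=4, RB=0, bm=0b01011, L=0
print(hex(encode_bm2(0, 3, 4, 0, 0b01011, 0, 0)))  # 0x6402c0
```

The range check in `set_field` catches a value that would otherwise spill into a neighbouring field.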

Pseudo-code:

```
if _RB = 0 then mask <- [1] * XLEN
else mask <- (RB)
ra <- (RA) & mask
# pattern A: x or ~x
a1 <- ra
if bm[4] = 0 then a1 <- ¬ra
# pattern C: arithmetic variant
mode2 <- bm[2:3]
if mode2 = 0 then a2 <- (¬ra)+1
if mode2 = 1 then a2 <- ra-1
if mode2 = 2 then a2 <- ra+1
if mode2 = 3 then a2 <- ¬(ra+1)
a1 <- a1 & mask
a2 <- a2 & mask
# select operator (pattern B)
mode3 <- bm[0:1]
if mode3 = 0 then result <- a1 | a2
if mode3 = 1 then result <- a1 & a2
if mode3 = 2 then result <- a1 ^ a2
if mode3 = 3 then result <- undefined([0]*XLEN)
# mask output
result <- result & mask
# optionally restore masked-out bits
if L = 1 then
    result <- result | ((RA) & ¬mask)
RT <- result
```
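
The following minimal Python model of the pseudo-code is useful for checking examples by hand. It assumes XLEN=64, reads `bm` as a 5-bit integer with MSB0 numbering (so `bm[0]` is its most significant bit), and passes the `_RB` field number separately from the contents of RB. The function names are hypothetical and are not taken from the libre-soc simulator. Because the pseudo-code leaves `mode3 = 3` undefined, the model raises an exception there instead, as a stand-in for the illegal-instruction trap.

```python
# Hypothetical model of the bmask pseudo-code (names are illustrative only).
XLEN = 64
ALL_ONES = (1 << XLEN) - 1


def bm_bits(bm: int, start: int, end: int) -> int:
    """Return bm[start:end] (inclusive, MSB0 numbering) of the 5-bit bm field."""
    width = end - start + 1
    return (bm >> (4 - end)) & ((1 << width) - 1)


def bmask(ra_val: int, rb_val: int, rb_field: int, bm: int, L: int) -> int:
    """ra_val/rb_val are register contents; rb_field is the RB field number."""
    ra_val &= ALL_ONES
    # _RB = 0 means no mask: all XLEN bits take part
    mask = ALL_ONES if rb_field == 0 else rb_val & ALL_ONES
    ra = ra_val & mask
    # pattern A: x when bm[4] = 1, ~x when bm[4] = 0
    a1 = ra if bm_bits(bm, 4, 4) else ~ra
    # pattern C: arithmetic variant selected by mode2 = bm[2:3]
    mode2 = bm_bits(bm, 2, 3)
    if mode2 == 0:
        a2 = (~ra) + 1
    elif mode2 == 1:
        a2 = ra - 1
    elif mode2 == 2:
        a2 = ra + 1
    else:
        a2 = ~(ra + 1)
    # Python ints are unbounded: & mask wraps back to XLEN bits
    # (and clears the masked-out bits)
    a1 &= mask
    a2 &= mask
    # pattern B: operator selected by mode3 = bm[0:1]
    mode3 = bm_bits(bm, 0, 1)
    if mode3 == 0:
        result = a1 | a2
    elif mode3 == 1:
        result = a1 & a2
    elif mode3 == 2:
        result = a1 ^ a2
    else:
        # RESERVED encoding: model the illegal-instruction trap as an exception
        raise ValueError("bm[0:1] = 0b11 is RESERVED")
    result &= mask
    # optionally restore the masked-out bits of RA
    if L == 1:
        result |= ra_val & ~mask & ALL_ONES
    return result


x = 0b10110100
print(bin(bmask(x, 0, 0, 0b01011, 0)))  # x & (x-1): 0b10110000
```

With `_RB = 0` and `bm = 0b01011` (`a1 = x`, `a2 = x-1`, operator `&`) the model computes `x & (x-1)`, which clears the lowest set bit.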

* pattern A, selected by `bm[4]`: two options, `x` (`bm[4]=1`) or `~x` (`bm[4]=0`)
* pattern B, selected by `bm[0:1]` (`mode3`): three options, `|`, `&` or `^`
* pattern C, selected by `bm[2:3]` (`mode2`): four options, `(~x)+1`, `x-1`, `x+1` or `~(x+1)`

Setting `bm[0:1]` (`mode3`) to 0b11 is `RESERVED`: an illegal instruction
trap must be raised.
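
Combined, the three patterns can express many of the x86 trailing-bit operations from the linked Wikipedia page. The following Python sketch is a hypothetical helper, not part of the specification: it packs a choice of patterns A, B and C into the 5-bit `bm` field, assuming MSB0 numbering (`bm[0:1]` = `mode3`, `bm[2:3]` = `mode2`, `bm[4]` = pattern A), and tabulates the resulting `bm` value for each operation when `_RB = 0` (no mask).

```python
# Hypothetical helper (not part of the spec): pack the bmask pattern choices
# into bm, assuming bm[0:1] = mode3, bm[2:3] = mode2, bm[4] = pattern A (MSB0).
OPS = {"|": 0, "&": 1, "^": 2}                          # pattern B; 3 is RESERVED
ARITH = {"(~x)+1": 0, "x-1": 1, "x+1": 2, "~(x+1)": 3}  # pattern C


def encode_bm(invert_x: bool, op: str, arith: str) -> int:
    a = 0 if invert_x else 1  # bm[4] = 0 selects ~x, bm[4] = 1 selects x
    return (OPS[op] << 3) | (ARITH[arith] << 1) | a


# x86 BMI1/TBM operations written as (pattern A, pattern B, pattern C)
TBM = {
    "blsr":    encode_bm(False, "&", "x-1"),     # x & (x-1)
    "blsi":    encode_bm(False, "&", "(~x)+1"),  # x & -x
    "blsmsk":  encode_bm(False, "^", "x-1"),     # x ^ (x-1)
    "blsfill": encode_bm(False, "|", "x-1"),     # x | (x-1)
    "blsic":   encode_bm(True,  "|", "x-1"),     # ~x | (x-1)
    "tzmsk":   encode_bm(True,  "&", "x-1"),     # ~x & (x-1)
    "blcfill": encode_bm(False, "&", "x+1"),     # x & (x+1)
    "blcs":    encode_bm(False, "|", "x+1"),     # x | (x+1)
    "blcmsk":  encode_bm(False, "^", "x+1"),     # x ^ (x+1)
    "blcic":   encode_bm(True,  "&", "x+1"),     # ~x & (x+1)
    "t1mskc":  encode_bm(True,  "|", "x+1"),     # ~x | (x+1)
    "blci":    encode_bm(False, "|", "~(x+1)"),  # x | ~(x+1)
}

for name, bm in TBM.items():
    print(f"{name:8} bm=0b{bm:05b}")
```

None of these operations needs the RESERVED `mode3 = 3` encoding.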

Special Registers Altered:

```
None
```

## Carry-lookahead

As a single 32-bit scalar instruction, `cprop` may compute up to 64
carry-propagation bits. When the output is then used as a predicate mask,
it can selectively perform the "add carry" step of biginteger math, with
`sv.addi/sm=rN RT.v, RA.v, 1`.
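
A minimal Python sketch of that biginteger use follows, assuming XLEN=64, little-endian limbs (limb 0 least significant) and that predicate bit i, counted from the LSB, governs element i. The names `bigint_add`, `to_limbs` and `from_limbs` are hypothetical. The limbs are first added without carries; the per-limb generate (overflow) and propagate (all-ones) bits are collected into G and P; the `cprop` formula `((P|G)+G)^P` turns them into a mask of the limbs that receive a carry-in; finally a predicated add of 1, standing in for `sv.addi/sm=rN RT.v, RA.v, 1`, applies every carry in one step.

```python
import random

XLEN = 64
MASK = (1 << XLEN) - 1


def cprop(p: int, g: int) -> int:
    # cprop pseudo-code: ((P|G)+G)^P, truncated to XLEN bits
    return (((p | g) + g) ^ p) & MASK


def to_limbs(x: int, n: int) -> list:
    return [(x >> (XLEN * i)) & MASK for i in range(n)]


def from_limbs(limbs: list) -> int:
    return sum(limb << (XLEN * i) for i, limb in enumerate(limbs))


def bigint_add(a: list, b: list) -> tuple:
    """Add two equal-length little-endian limb lists (1 to XLEN limbs).
    Returns (sum_limbs, carry_out)."""
    n = len(a)
    assert 1 <= n <= XLEN and n == len(b)
    # element-wise add with no carry-in (an unpredicated vector add)
    sums = [(x + y) & MASK for x, y in zip(a, b)]
    # G: limbs whose add overflowed; P: limbs that are all ones
    g = sum(1 << i for i in range(n) if a[i] + b[i] > MASK)
    p = sum(1 << i for i in range(n) if sums[i] == MASK)
    # bit i of the predicate mask is the carry into limb i
    carries = cprop(p, g)
    # predicated add of 1, standing in for sv.addi/sm=rN RT.v, RA.v, 1
    result = [(s + 1) & MASK if (carries >> i) & 1 else s
              for i, s in enumerate(sums)]
    top = n - 1
    carry_out = ((g >> top) & 1) | ((p >> top) & 1 & (carries >> top) & 1)
    return result, carry_out


a, b = 2**128 - 1, 1
limbs, cout = bigint_add(to_limbs(a, 3), to_limbs(b, 3))
print(from_limbs(limbs) == a + b, cout)  # True 0

rng = random.Random(0)
for _ in range(100):
    a, b = rng.getrandbits(256), rng.getrandbits(256)
    limbs, cout = bigint_add(to_limbs(a, 4), to_limbs(b, 4))
    assert from_limbs(limbs) + (cout << 256) == a + b
```

G and P never overlap here: a limb whose add overflowed is at most `2**64 - 2`, so it cannot also be all ones.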

* cprop RT,RA,RB (Rc=0)
* cprop. RT,RA,RB (Rc=1)

Pseudo-code:

```
P <- (RA)
G <- (RB)
RT <- ((P|G)+G)^P
```
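
Why the formula works: adding G to `P|G` starts a carry at every generate bit, and that carry ripples through each run of propagate bits exactly as in a carry-lookahead chain, `c[i+1] = G[i] | (P[i] & c[i])`. The final XOR with P strips the propagate bits back out, leaving only the carries. This holds when P and G have no bits in common, which is always the case for biginteger limbs, since a limb that overflowed cannot also be all ones. The Python sketch below, assuming XLEN=64 and using hypothetical function names, checks the formula against a bit-serial ripple model.

```python
import random

XLEN = 64
MASK = (1 << XLEN) - 1


def cprop(p: int, g: int) -> int:
    # the cprop pseudo-code: RT <- ((P|G)+G)^P, truncated to XLEN bits
    return (((p | g) + g) ^ p) & MASK


def ripple_carries(p: int, g: int) -> int:
    """Bit i of the result is the carry INTO bit i, computed serially with
    c[i+1] = g[i] | (p[i] & c[i]) and c[0] = 0."""
    c, out = 0, 0
    for i in range(XLEN):
        out |= c << i
        c = ((g >> i) & 1) | ((p >> i) & 1 & c)
    return out


# G starts a carry out of bit 0; P carries it through bits 1 and 2
print(bin(cprop(0b0110, 0b0001)), bin(ripple_carries(0b0110, 0b0001)))  # 0b1110 0b1110

rng = random.Random(1)
for _ in range(1000):
    p = rng.getrandbits(XLEN)
    g = rng.getrandbits(XLEN) & ~p  # keep P and G disjoint
    assert cprop(p, g) == ripple_carries(p, g)
```

If P and G do overlap, the result is `(P & G) ^ carries`, so callers outside the biginteger case should keep the two inputs disjoint.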

X-Form

| 0:5|6:10|11:15|16:20| 21:30 |31| name | Form |
| -- | -- | --- | --- | --------- |--| ---- | ------- |
| PO | RT | RA | RB | XO |Rc| cprop | X-Form |

`cprop` is used not just for carry lookahead: it is also a special type
of predication mask operation.
