* [[discussion]]
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-register-gather-instructions>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-May/004884.html>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=865> implementation in simulator
* <https://bugs.libre-soc.org/show_bug.cgi?id=213>
* <https://bugs.libre-soc.org/show_bug.cgi?id=142> specialist vector ops
out of scope for this document [[openpower/sv/3d_vector_ops]]
* [[simple_v_extension/specification/bitmanip]] previous version,
contains pseudocode for sof, sif, sbf
-* https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)
+* <https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)>
The core Power ISA was designed as scalar: SV provides a level of abstraction to add variable-length element-independent parallelism.
Therefore there are not that many cases where *actual* Vector
instructions are needed. If they are, they are more "assistance"
functions. Two traditional Vector instructions were initially
considered (conflictd and vmiota) however they may be synthesised
-from existing SVP64 instructions: details in [[discussion]]
+from existing SVP64 instructions: vmiota may use [[svstep]].
+Details in [[discussion]]
Notes:
* Instructions suited to 3D GPU workloads (dotproduct, crossproduct, normalise) are out of scope: this document is for more general-purpose instructions that underpin and are critical to general-purpose Vector workloads (including GPU and VPU)
* Instructions related to the adaptation of CRs for use as predicate masks are covered separately, by crweird operations. See [[sv/cr_int_predication]].
-# sbfm
+# Mask-suited Bitmanipulation
- sbfm RT, RA, RB!=0
+Based on the RVV masked set-before-first, set-after-first etc.
+instructions and on the Intel and AMD Bitmanip instructions,
+generalised and then extended to include masks, this is a single
+instruction covering 24 individual instructions in other ISAs.
+*(sbf/sof/sif moved to [[discussion]])*
-Example
+BM2-Form
- 7 6 5 4 3 2 1 0 Bit index
+|0..5 |6..10|11..15|16..20|21..25|26|27..31| Form |
+|------|-----|------|------|-----|--|------|------|
+| PO | RS | RA | RB |bm |L | XO | BM2-Form |
- 1 0 0 1 0 1 0 0 v3 contents
- vmsbf.m v2, v3
- 0 0 0 0 0 0 1 1 v2 contents
+* bmask RT,RA,RB,bm,L
- 1 0 0 1 0 1 0 1 v3 contents
- vmsbf.m v2, v3
- 0 0 0 0 0 0 0 0 v2
+The patterns within the pseudocode for AMD TBM and x86 BMI1 are
+as follows:
- 0 0 0 0 0 0 0 0 v3 contents
- vmsbf.m v2, v3
- 1 1 1 1 1 1 1 1 v2
+* first pattern A: `x / ~x`
+* second pattern B: `| / & / ^`
+* third pattern C: `x+1 / x-1 / ~(x+1) / (~x)+1`
- 1 1 0 0 0 0 1 1 RB vcontents
- 1 0 0 1 0 1 0 0 v3 contents
- vmsbf.m v2, v3, v0.t
- 0 1 x x x x 1 1 v2 contents
-
-The vmsbf.m instruction takes a mask register as input and writes results to a mask register. The instruction writes a 1 to all active mask elements before the first source element that is a 1, then writes a 0 to that element and all following active elements. If there is no set bit in the source vector, then all active elements in the destination are written with a 1.
+Thus it makes sense to create a single instruction
+that covers all of these. A crucial addition, essential
+for Scalable Vector usage as Predicate Masks, is the second mask parameter
+(RB). The additional parameter, L, if set, leaves the bits of RA masked
+by RB unaltered; otherwise those bits are set to zero. Note that when `RB=0`
+the mask is set to all ones instead of being read from the register file.
Executable pseudocode demo:
```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sbf.py"]]
+[[!inline pages="openpower/sv/bmask.py" quick="yes" raw="yes" ]]
```
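
As an illustration of how the three patterns combine, the following is a minimal Python sketch and not the contents of `bmask.py`. The function name `bmask_sketch`, the string keys used to select each pattern, and the choice to AND the input with the mask before applying the patterns are all assumptions made for this example. The real encoding of `bm` is not reproduced here. Two choices for A, three for B and four for C give the 24 combinations mentioned above.

```python
XLEN = 64
ALL_ONES = (1 << XLEN) - 1

# Pattern tables, keyed by hypothetical names (not the real bm encoding).
A_PATTERNS = {"x": lambda x: x, "~x": lambda x: ~x}
B_PATTERNS = {
    "|": lambda a, c: a | c,
    "&": lambda a, c: a & c,
    "^": lambda a, c: a ^ c,
}
C_PATTERNS = {
    "x+1": lambda x: x + 1,
    "x-1": lambda x: x - 1,
    "~(x+1)": lambda x: ~(x + 1),
    "(~x)+1": lambda x: (~x) + 1,
}


def bmask_sketch(ra, a, b, c, mask=None, L=False):
    """Combine pattern A and pattern C of `ra` with operator B.

    mask=None stands in for the RB=0 case (mask of all ones).
    Assumption: the input is masked before the patterns are applied.
    """
    m = ALL_ONES if mask is None else mask & ALL_ONES
    x = ra & m
    first = A_PATTERNS[a](x) & m
    third = C_PATTERNS[c](x) & m
    result = B_PATTERNS[b](first, third) & m
    if L:
        # L set: bits of RA outside the mask pass through unaltered.
        result |= ra & ~m & ALL_ONES
    return result


# x86 BMI1 BLSR (clear lowest set bit) is x & (x-1)
blsr = bmask_sketch(0b10100, "x", "&", "x-1")
# x86 BMI1 BLSI (isolate lowest set bit) is x & ((~x)+1)
blsi = bmask_sketch(0b10100, "x", "&", "(~x)+1")
# AMD TBM TZMSK (mask of trailing zeros) is ~x & (x-1)
tzmsk = bmask_sketch(0b10100, "~x", "&", "x-1")
```

With a mask supplied, only the bits selected by the mask take part; whether the remaining bits of RA are zeroed or passed through depends on `L`.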
-# sifm
-
-The vector mask set-including-first instruction is similar to set-before-first, except it also includes the element with a set bit.
-
- sifm RT, RA, RB!=0
-
- # Example
-
- 7 6 5 4 3 2 1 0 Bit number
-
- 1 0 0 1 0 1 0 0 v3 contents
- vmsif.m v2, v3
- 0 0 0 0 0 1 1 1 v2 contents
-
- 1 0 0 1 0 1 0 1 v3 contents
- vmsif.m v2, v3
- 0 0 0 0 0 0 0 1 v2
-
- 1 1 0 0 0 0 1 1 RB vcontents
- 1 0 0 1 0 1 0 0 v3 contents
- vmsif.m v2, v3, v0.t
- 1 1 x x x x 1 1 v2 contents
-
-Executable pseudocode demo:
-
-```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sif.py"]]
-```
-
-# vmsof
-
-The vector mask set-only-first instruction is similar to set-before-first, except it only sets the first element with a bit set, if any.
-
- sofm RT, RA, RB
-
-Example
-
- 7 6 5 4 3 2 1 0 Bit number
+# Carry-lookahead
- 1 0 0 1 0 1 0 0 v3 contents
- vmsof.m v2, v3
- 0 0 0 0 0 1 0 0 v2 contents
+As a single scalar 32-bit instruction, up to 64 carry-propagation bits
+may be computed. When the output is then used as a Predicate mask it can
+be used to selectively perform the "add carry" of biginteger math, with
+`sv.addi/sm=rN RT.v, RA.v, 1`.
- 1 0 0 1 0 1 0 1 v3 contents
- vmsof.m v2, v3
- 0 0 0 0 0 0 0 1 v2
+* cprop RT,RA,RB
+* cprop. RT,RA,RB
- 1 1 0 0 0 0 1 1 RB vcontents
- 1 1 0 1 0 1 0 0 v3 contents
- vmsof.m v2, v3, v0.t
- 0 1 x x x x 0 0 v2 content
+pseudocode:
-Executable pseudocode demo:
+ P = (RA)
+ G = (RB)
+ RT = ((P|G)+G)^P
-```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sof.py"]]
-```
+X-Form
-# Carry-lookahead
+| 0..5|6..10|11..15|16..20| 21..30 |31| name | Form |
+| -- | -- | --- | --- | --------- |--| ---- | ------- |
+| NN | RT | RA | RB | 0110001110 |Rc| cprop | X-Form |
Used not just for carry lookahead, but also as a special type of predication mask operation.
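
The `cprop` pseudocode can be tried out with a short Python sketch. This is an illustration under stated assumptions, not the simulator implementation: the helper `bigint_add_sketch`, the 8-bit limb size, and the little-endian limb order are hypothetical choices for the example. Bit `i` of G is set when limb `i` generates a carry, bit `i` of P is set when limb `i` would propagate an incoming carry, and the result has bit `i` set when limb `i` receives a carry-in.

```python
def cprop(p, g, width=64):
    """RT = ((P|G)+G)^P, truncated to `width` bits."""
    mask = (1 << width) - 1
    p &= mask
    g &= mask
    return (((p | g) + g) ^ p) & mask


def bigint_add_sketch(a, b, limb_bits=8):
    """Add two equal-length lists of limbs (least significant first).

    Hypothetical helper: the per-limb adds stand in for a vector add,
    and the final loop stands in for the predicated add of 1.
    Assumes fewer than 64 limbs so the carry-out bit fits in the mask.
    """
    limb_mask = (1 << limb_bits) - 1
    sums = [x + y for x, y in zip(a, b)]
    g = 0
    p = 0
    for i, s in enumerate(sums):
        if s > limb_mask:
            g |= 1 << i          # limb i generates a carry
        if (s & limb_mask) == limb_mask:
            p |= 1 << i          # limb i propagates an incoming carry
    carry_in = cprop(p, g)
    # Add 1 only to the limbs selected by the carry-in mask.
    limbs = [(s + ((carry_in >> i) & 1)) & limb_mask
             for i, s in enumerate(sums)]
    carry_out = (carry_in >> len(sums)) & 1
    return limbs, carry_out


# 0x00FFFF + 0x000001 as three 8-bit limbs
limbs, carry_out = bigint_add_sketch([0xFF, 0xFF, 0x00], [0x01, 0x00, 0x00])
```

Here limb 0 generates a carry and limb 1 propagates it, so the mask selects limbs 1 and 2, which is what a predicated add of 1 would then act on.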