1 # RFC ls014 Advanced Scalar Bitmanipulation
5 * <https://libre-soc.org/openpower/sv/rfc/ls014/>
6 * <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
7 * <https://bugs.libre-soc.org/show_bug.cgi?id=1065>
19 **Books and Section affected**:
22 Book I Fixed-Point Instructions
23 Appendix E Power ISA sorted by opcode
24 Appendix F Power ISA sorted by version
25 Appendix G Power ISA sorted by Compliancy Subset
26 Appendix H Power ISA sorted by mnemonic
32 Instructions added: bmask, grevlut, grevluti
35 **Submitter**: Luke Leighton (Libre-SOC)
37 **Requester**: Libre-SOC
39 **Impact on processor**:
42 Addition of new GPR-based instructions
45 **Impact on software**:
48 Requires support for new instructions in assembler, debuggers,
55 LUTs, Bitmanipulation, GPR
60 Other high-end ISAs have had scalar bit-manipulation (BMI) subsets for over a decade.
61 Their use and benefits are well understood, and compiler integration is well established.
62 `bmask` brings *twenty-four* BMI instructions to the Power ISA.
64 `grevlut`, on the other hand, is highly experimental and extremely powerful. Normally
65 only `grev` (Generalised Reverse) and occasionally `gor` are added to a Bitmanip-strong
66 ISA: `grevlut` utilises LUTs and inversion to add 512 Generalised Reverse instructions.
67 This yields desirable savings in overall binary size.
69 **Notes and Observations**:
71 1. `bmask` is a synthesis and generalisation of every "TBM" instruction, with additional
72 options not found in any other ISA BMI group.
73 2. `grevluti`, as a 32-bit Defined Word, is capable of generating over a thousand useful
74 regular-patterned 64-bit "magic constants" that otherwise require either a Load
75 or several instructions to synthesise.
76 3. Word, halfword, byte, nibble, 2-bit and 1-bit reversal at multiple levels are all achieved
77 with `grevlut`. Some of these instructions were explicitly added in Power ISA v3.1,
78 but `grevlut` is akin to `xxeval`.
79 4. `grevlut` can be expensive in hardware (estimated 20,000 gates) but,
80 like `xxeval`, provides 512 equivalent instructions.
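
To make the "reversal at multiple levels" in note 3 concrete, here is a minimal Python sketch of plain Generalised Reverse (`grev`) on a 64-bit value. It is an illustration only: it does not model the LUT or inversion options of `grevlut`, and the names `grev64`, `MASK64` and `_STAGE_MASKS` are hypothetical, not taken from the RFC. Each set bit of `k` enables one swap stage, from adjacent single bits (bit 0) up to the two 32-bit words (bit 5).

```python
MASK64 = (1 << 64) - 1

# Each mask selects the lower half of every pair of blocks swapped at that stage.
_STAGE_MASKS = (
    0x5555555555555555,  # stage 0: swap adjacent bits
    0x3333333333333333,  # stage 1: swap adjacent 2-bit pairs
    0x0F0F0F0F0F0F0F0F,  # stage 2: swap adjacent nibbles
    0x00FF00FF00FF00FF,  # stage 3: swap adjacent bytes
    0x0000FFFF0000FFFF,  # stage 4: swap adjacent halfwords
    0x00000000FFFFFFFF,  # stage 5: swap the two words
)


def grev64(x, k):
    """Generalised reverse of a 64-bit value; bit i of k enables stage i."""
    x &= MASK64
    for stage, m in enumerate(_STAGE_MASKS):
        if k & (1 << stage):
            shift = 1 << stage
            # Move the lower blocks up and the upper blocks down.
            x = ((x & m) << shift) | ((x >> shift) & m)
    return x
```

With this sketch, `k = 0b111000` gives a full byte reversal, `k = 0b000100` swaps the nibbles within each byte, and `k = 0b111111` reverses all 64 bits. Applying the same `k` twice returns the original value.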
84 Add the following entries to:
86 * the Appendices of Book I
87 * Book I 3.3.13 Fixed-Point Logical Instructions
88 * Book I 1.6.1 and 1.6.2
98 Based on RVV masked set-before-first, set-after-first etc.,
99 and on Intel and AMD Bitmanip instructions, generalised and then
100 advanced further to include masks, `bmask` is a single instruction
101 covering 24 individual instructions in other ISAs.
103 The patterns within the pseudocode for AMD TBM and x86 BMI1 are:
106 * first pattern A: two options `x` or `~x`
107 * second pattern B: three options `|` `&` or `^`
108 * third pattern C: four options `x+1`, `x-1`, `~(x+1)` or `(~x)+1`
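
The three patterns above combine as `A(x) B C(x)`, giving 2 × 3 × 4 = 24 combinations. The following Python sketch enumerates them and shows how a few well-known BMI1 and TBM operations fall out as particular choices. The table and helper names (`A_OPTIONS`, `B_OPTIONS`, `C_OPTIONS`, `combine`, `KNOWN`) are hypothetical and purely illustrative; they say nothing about how `bmask` actually encodes its mode bits.

```python
from itertools import product

MASK64 = (1 << 64) - 1

# Pattern A: the first operand is x or ~x.
A_OPTIONS = {"x": lambda x: x, "~x": lambda x: ~x}
# Pattern B: the operator joining the two operands.
B_OPTIONS = {
    "|": lambda a, c: a | c,
    "&": lambda a, c: a & c,
    "^": lambda a, c: a ^ c,
}
# Pattern C: the second operand, derived from x by add/subtract/invert.
C_OPTIONS = {
    "x+1": lambda x: x + 1,
    "x-1": lambda x: x - 1,
    "~(x+1)": lambda x: ~(x + 1),
    "(~x)+1": lambda x: (~x) + 1,
}


def combine(x, a, b, c):
    """Evaluate A(x) B C(x), truncated to 64 bits."""
    return B_OPTIONS[b](A_OPTIONS[a](x), C_OPTIONS[c](x)) & MASK64


# A few familiar x86 operations expressed as (A, B, C) choices.
KNOWN = {
    "blsr": ("x", "&", "x-1"),      # reset lowest set bit
    "blsi": ("x", "&", "(~x)+1"),   # isolate lowest set bit
    "blsmsk": ("x", "^", "x-1"),    # mask up to lowest set bit
    "blcfill": ("x", "&", "x+1"),   # fill from lowest clear bit
    "blcs": ("x", "|", "x+1"),      # set lowest clear bit
    "tzmsk": ("~x", "&", "x-1"),    # mask from trailing zeros
}

ALL_COMBINATIONS = list(product(A_OPTIONS, B_OPTIONS, C_OPTIONS))
```

For example, `combine(0b10110000, *KNOWN["blsr"])` clears the lowest set bit, and `len(ALL_COMBINATIONS)` is 24.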
110 Thus it makes sense to create a single instruction
111 that covers all of these. A crucial addition, essential
112 for Scalable Vector usage as Predicate Masks, is the second mask parameter
113 (RB). The additional parameter, L, if set, leaves bits of RA masked
114 by RB unaltered; otherwise those bits are set to zero. Note that when `RB=0`,
115 the mask is set to all ones instead of being read from the register file.
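
The role of the mask and of L can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the RFC's pseudocode (that is the `bmask.py` demo inlined below). In particular, the function name `bmask_sketch`, its keyword arguments, and the use of `rb_mask=None` to stand for the `RB=0` case are all hypothetical, and the sketch assumes that the bits L preserves are those *outside* the RB mask.

```python
MASK64 = (1 << 64) - 1

# Hypothetical names for the four "pattern C" options.
_C = {
    "x+1": lambda x: x + 1,
    "x-1": lambda x: x - 1,
    "~(x+1)": lambda x: ~(x + 1),
    "(~x)+1": lambda x: (~x) + 1,
}


def bmask_sketch(ra, rb_mask=None, invert_a=False, op="&", c="x-1", L=False):
    """Masked A(x) op C(x) on 64 bits; rb_mask=None models the RB=0 case."""
    # RB=0: do not read a register, use an all-ones mask instead.
    mask = MASK64 if rb_mask is None else rb_mask & MASK64
    x = ra & mask
    a = (~x if invert_a else x) & mask
    b = _C[c](x) & mask
    if op == "|":
        result = a | b
    elif op == "&":
        result = a & b
    elif op == "^":
        result = a ^ b
    else:
        raise ValueError("op must be one of '|', '&', '^'")
    result &= mask
    if L:
        # Assumption: L restores the original RA bits outside the mask.
        result |= ra & ~mask & MASK64
    return result
```

With `ra = 0b10110110` and `rb_mask = 0b11110000`, the lowest set bit inside the mask is cleared; the low four bits come back as zero when `L` is false and as the original `0110` when `L` is true.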
118 Executable pseudocode demo:
121 [[!inline pages="openpower/sv/bmask.py" quick="yes" raw="yes" ]]
129 [[!inline pages="openpower/sv/vector_ops" raw=yes ]]
135 # Instruction Formats
137 Add the following entries to Book I 1.6.1 Word Instruction Formats:
142 |0 |6 |11 |16 |21 |24 |25 |31 |
143 | PO | FRT | FRA | FRB | FMM | XO | Rc |
144 | PO | RT | RA | RB | MMM | / | XO | Rc |
147 Add the following new fields to Book I 1.6.2 Word Instruction Fields:
151 Field used to specify minimum/maximum mode for fminmax[s].
156 Field used to specify minimum/maximum mode for integer minmax.
161 Add `MM` to the `Formats:` list for all of `FRT`, `FRA`, `FRB`, `XO (25:30)`,
162 `Rc`, `RT`, `RA` and `RB`.
170 Appendix E Power ISA sorted by opcode
171 Appendix F Power ISA sorted by version
172 Appendix G Power ISA sorted by Compliancy Subset
173 Appendix H Power ISA sorted by mnemonic
175 | Form | Book | Page | Version | Mnemonic | Description |
176 |------|------|------|---------|----------|-------------|
177 | MM | I | # | 3.2B | fminmax | Floating Minimum/Maximum |
178 | MM | I | # | 3.2B | fminmaxs | Floating Minimum/Maximum Single |
179 | MM | I | # | 3.2B | minmax | Minimum/Maximum |