# RFC ls007 Ternary/Binary GPR and CR Field bit-operations

* <https://libre-soc.org/openpower/sv/bitmanip/>
* <https://libre-soc.org/openpower/sv/rfc/ls007/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1017>
* <https://git.openpower.foundation/isa/PowerISA/issues/todo>
**Books and Section affected**: **UPDATE**

* Book I 2.5.1 Condition Register Logical Instructions
* Book I 3.3.13 Fixed-Point Logical Instructions
* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

* `ternlogi` -- Ternary Logic Immediate
* `crternlogi` -- Condition Register Ternary Logic Immediate
* `binlog` -- Dynamic Binary Logic
* `crbinlog` -- Condition Register Dynamic Binary Logic

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

* Addition of two new GPR-based instructions
* Addition of two new CR-field-based instructions

**Impact on software**:

* Requires support for new instructions in assembler, debuggers,

GPR, CR-Field, bit-manipulation, ternary, binary, dynamic, look-up-table (LUT), FPGA
* `ternlogi` is similar to the existing `and`/`or`/`xor`/etc. instructions, but
  allows any arbitrary 3-input 1-output bitwise operation. This can be used to
  combine several instructions into one: for example, `A ^ (~B & (C | A))`
  becomes a single instruction, as does the bitwise MUX `(A & B) | (~A & C)`.
* `binlog` is like `ternlogi` except that it supports any arbitrary 2-input
  1-output bitwise operation, where the operation can be selected dynamically
  at runtime. It operates similarly to a Programmable LUT in an FPGA.
* `crternlogi` is like `ternlogi` except that it works with CR Fields instead of GPRs.
* `crbinlog` is like `binlog` except that it works with CR Fields instead of GPRs.
  Likewise, it is similar to a Programmable LUT in an FPGA.
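As a rough illustration of how one 8-bit immediate can stand in for an arbitrary 3-input bitwise expression, the Python sketch below tabulates an expression over all eight input combinations. It is not part of the proposal: the helper name `truth_table_imm` is hypothetical, and the bit order (entry `j` counted from the most significant bit of the immediate, with the first input as the high bit of `j`) is an assumption borrowed from the `xxeval`-style description of `ternlogi`.

```python
def truth_table_imm(f):
    """Tabulate a 3-input bitwise function f(a, b, c) into an 8-bit immediate.

    Assumption: entry j = a||b||c is stored at MSB0 position j, i.e. at
    bit (7 - j) when counting from the least significant bit.
    """
    imm = 0
    for j in range(8):
        a, b, c = (j >> 2) & 1, (j >> 1) & 1, j & 1
        if f(a, b, c) & 1:  # "& 1" discards the sign bits Python's ~ produces
            imm |= 1 << (7 - j)
    return imm


# The two expressions quoted in the motivation text
combined = truth_table_imm(lambda a, b, c: a ^ (~b & (c | a)))
mux = truth_table_imm(lambda a, b, c: (a & b) | (~a & c))
print(f"{combined:#010b} {mux:#010b}")
```

Each expression collapses to a single 8-bit table, which is why one instruction with an 8-bit immediate can replace a chain of two-input logical instructions.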
**Notes and Observations**:

* `ternlogi` is like the existing `xxeval` instruction, except that it operates on
  GPRs instead of VSRs and does not require VSX/VMX. Compliancy Subsets that cannot
  rely on `xxeval`, such as SFS and SFFS, are therefore less powerful without it.
* `crternlogi` is similar to the group of CR Operations (`crand`, `cror`, etc.) that
  have been identified as a Binary Lookup Group, except that an 8-bit
  immediate is used instead of a 4-bit one, and up to 4 bits of a CR Field may
  be computed at once, saving up to 3 CR operations.
* `crbinlog` is similar to the Binary Lookup Group of CR Operations, except that the
  4-bit lookup table comes from a CR Field instead of from an immediate. Also,
  like `crternlogi`, up to 4 bits may be computed at once.
Add the following entries to:

* Book I 2.5.1 Condition Register Logical Instructions
* Book I 3.3.13 Fixed-Point Logical Instructions
* Book I 1.6.1 and 1.6.2

Add the following CRB-Form section to Book I 1.6.1

| 0  | 6  | 9   | 12  | 15  | 18  | 21  | 29 | 31  |
|----|----|-----|-----|-----|-----|-----|----|-----|
| PO | BF | BFA | BFB | BFC | msk | TLI | XO | msk |

Add the following TLI-Form section to Book I 1.6.1

| 0  | 6  | 11 | 16 | 21  | 29 | 31 |
|----|----|----|----|-----|----|----|
| PO | RT | RA | RB | TLI | XO | Rc |
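To make the TLI-Form bit positions concrete, here is a minimal Python sketch that packs the seven fields into a 32-bit word, most significant field first. The function name `encode_tli_form` is hypothetical, and the `PO` and `XO` values in the usage line are placeholders only: this document does not assign opcode numbers.

```python
def encode_tli_form(po, rt, ra, rb, tli, xo, rc):
    """Pack TLI-Form fields into a 32-bit instruction word.

    Field start bits (MSB0): PO=0, RT=6, RA=11, RB=16, TLI=21, XO=29, Rc=31.
    """
    fields = [  # (value, width in bits), most significant field first
        (po, 6), (rt, 5), (ra, 5), (rb, 5), (tli, 8), (xo, 2), (rc, 1),
    ]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# PO=5 and XO=0 are placeholders, not opcode assignments from this RFC
word = encode_tli_form(po=5, rt=3, ra=4, rb=6, tli=0b11011000, xo=0, rc=0)
print(f"{word:#010x}")
```

The widths sum to 32, so shifting each field in turn reproduces the column boundaries of the table without hard-coding shift amounts.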
Add the following entry to VA-FORM in Book I 1.6.1.12

| 0  | 6  | 11 | 16 | 21 | 26 | 27 |
|----|----|----|----|----|----|----|
| PO | RT | RA | RB | RC | nh | XO |
# Word Instruction Fields

Add the following to Book I 1.6.2

Field used by crternlogi to decide which CR bits to modify.

Nibble High. Field used by binlog to decide if the look-up-table should
be taken from bits 60:63 or 56:59 of RC.

Field used by the ternlogi instruction as the

Field used by the crternlogi instruction as the

Extended opcode field.

Extended opcode field.

* Add `TLI` to the `Formats:` list of all of `RA`, `RB`, `RT`, and `Rc`.
* Add `CRB` to the `Formats:` list of all of `BF`, `BFA`, `BFB`, and `BFC`.
* Add `TLI` to the `Formats:` list of `XO (29:30)`.
* Add `CRB` to the `Formats:` list of `XO (29:30)`.
* Add `VA` to the `Formats:` list of `XO (27:31)`.

# Ternary Logic Immediate

Add this section to Book I 3.3.13

* `ternlogi RT, RA, RB, TLI` (`Rc=0`)
* `ternlogi. RT, RA, RB, TLI` (`Rc=1`)

| 0-5 | 6-10 | 11-15 | 16-20 | 21-28 | 29-30 | 31 | Form     |
|-----|------|-------|-------|-------|-------|----|----------|
| PO  | RT   | RA    | RB    | TLI   | XO    | Rc | TLI-Form |

    result <- (~RT&~RA&~RB & TLI[0]*XLEN) |
              (~RT&~RA& RB & TLI[1]*XLEN) |
              (~RT& RA&~RB & TLI[2]*XLEN) |
              (~RT& RA& RB & TLI[3]*XLEN) |
              ( RT&~RA&~RB & TLI[4]*XLEN) |
              ( RT&~RA& RB & TLI[5]*XLEN) |
              ( RT& RA&~RB & TLI[6]*XLEN) |
              ( RT& RA& RB & TLI[7]*XLEN)
    RT <- result

For each integer value i, 0 to XLEN-1, do the following.

Let j be the value of the concatenation of the
contents of bit i of RT, bit i of RA, bit i of RB.
The value of bit j of TLI is placed into bit i of RT.

See Table 145, "xxeval(A, B, C, TLI) Equivalent
Functions," on page 968 for the equivalent function
evaluated by this instruction for any given value of TLI.
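The per-bit description above can be modelled in a few lines of Python. This is an illustrative sketch rather than a reference implementation: it assumes that `TLI[j]` uses Power ISA MSB0 numbering (so `TLI[0]` is the most significant bit of the immediate, as in `xxeval`), and it walks the bit positions from the least significant end, which gives the same result because every operand is read at the same position.

```python
def ternlogi(rt, ra, rb, tli, xlen=64):
    """Model of the ternlogi result for XLEN-bit register values."""
    result = 0
    for pos in range(xlen):  # same position in every operand, so bit order is irrelevant
        j = (((rt >> pos) & 1) << 2) | (((ra >> pos) & 1) << 1) | ((rb >> pos) & 1)
        bit = (tli >> (7 - j)) & 1  # TLI[j] under the assumed MSB0 numbering
        result |= bit << pos
    return result


rt, ra, rb = 0x0123456789ABCDEF, 0xFFFF0000FFFF0000, 0x00FF00FF00FF00FF
# 0b00000001 sets only TLI[7], the entry for RT=1, RA=1, RB=1: a 3-way AND
print(hex(ternlogi(rt, ra, rb, 0b00000001)))
# 0b01101001 sets TLI[1], TLI[2], TLI[4] and TLI[7]: odd parity, a 3-way XOR
print(hex(ternlogi(rt, ra, rb, 0b01101001)))
```

Note that `RT` is both a source and the destination, which is how three inputs fit into an encoding with only three register fields.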

Special registers altered:

# Condition Register Ternary Logic Immediate

Add this section to Book I 2.5.1

* `crternlogi BF, BFA, BFB, BFC, TLI, msk`

| 0-5 | 6-8 | 9-11 | 12-14 | 15-17 | 18-20 | 21-28 | 29-30 | 31  | Form     |
|-----|-----|------|-------|-------|-------|-------|-------|-----|----------|
| PO  | BF  | BFA  | BFB   | BFC   | msk   | TLI   | XO    | msk | CRB-Form |

    a <- CR[4*BFA+32:4*BFA+35]
    b <- CR[4*BFB+32:4*BFB+35]
    c <- CR[4*BFC+32:4*BFC+35]
    do i = 0 to 3
        idx <- a[i] || b[i] || c[i]   # compute index from current bits
        result <- TLI[7 - idx]        # subtract from 7 to index in LSB0 order
        if msk[i] = 1 then
            CR[4*BF+32+i] <- result
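
A small Python model may help when checking immediates by hand. It is a sketch under stated assumptions: the CR is represented as a list of eight 4-bit integers, bit `i` of a field and of `msk` is taken in MSB0 order (`i = 0` is the value-8 bit), and a set `msk` bit is assumed to enable the write of the corresponding bit of `BF`.

```python
def crternlogi(cr_fields, bf, bfa, bfb, bfc, tli, msk):
    """Return a copy of cr_fields (eight 4-bit ints) after a crternlogi.

    Bit i of a field or of msk is in MSB0 order: i = 0 is the value-8 bit.
    """
    a, b, c = cr_fields[bfa], cr_fields[bfb], cr_fields[bfc]
    target = cr_fields[bf]
    for i in range(4):
        shift = 3 - i  # MSB0 bit i sits at this LSB0 position
        idx = (((a >> shift) & 1) << 2) | (((b >> shift) & 1) << 1) | ((c >> shift) & 1)
        result = (tli >> idx) & 1  # TLI[7 - idx] in MSB0 terms is LSB0 bit idx
        if (msk >> shift) & 1:  # assumed: a set mask bit enables the write
            target = (target & ~(1 << shift)) | (result << shift)
    out = list(cr_fields)
    out[bf] = target
    return out


cr = [0b0000, 0b1100, 0b1010, 0b0110, 0, 0, 0, 0]
# 0b11101000 is the majority function of (a, b, c); msk = 0b1111 writes all four bits
print(crternlogi(cr, bf=0, bfa=1, bfb=2, bfc=3, tli=0b11101000, msk=0b1111))
```

The source fields are read before any bit is written, so the model also behaves sensibly when `BF` names one of the source fields.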

Special registers altered:

# Dynamic Binary Logic

Add this section to Book I 3.3.13

* `binlog RT, RA, RB, RC, nh`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26 | 27-31 | Form    |
|-----|------|-------|-------|-------|----|-------|---------|
| PO  | RT   | RA    | RB    | RC    | nh | XO    | VA-Form |

    idx <- (RB)[i] || (RA)[i]   # compute index from current bits
    result[i] <- lut[3 - idx]   # subtract from 3 to index in LSB0 order
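
For experimentation, the lookup can be sketched in Python as follows. The model assumes a 64-bit `RC`, so that bits 60:63 are the least significant nibble and bits 56:59 the next one up, and it assumes that `nh = 1` selects bits 56:59, in line with the field name "Nibble High". It is an illustration, not normative pseudocode.

```python
def binlog(ra, rb, rc, nh, xlen=64):
    """Model of binlog: a 2-input bitwise function chosen by a 4-bit table in RC."""
    # nh = 1 takes RC bits 56:59 (MSB0), nh = 0 takes RC bits 60:63
    lut = (rc >> 4) & 0xF if nh else rc & 0xF
    result = 0
    for pos in range(xlen):
        idx = (((rb >> pos) & 1) << 1) | ((ra >> pos) & 1)  # (RB)[i] || (RA)[i]
        result |= ((lut >> idx) & 1) << pos  # lut[3 - idx] in MSB0 terms is LSB0 bit idx
    return result


ra, rb = 0x00000000FFFF0F0F, 0x0000FFFF0000FF33
table = 0b0110_1000  # low nibble 0b1000 acts as AND, high nibble 0b0110 as XOR
print(hex(binlog(ra, rb, table, nh=0)), hex(binlog(ra, rb, table, nh=1)))
```

Because the table lives in a register, the same instruction can compute a different 2-input function on each execution without any self-modifying code or branching.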

Special registers altered:

**Programming Note**:

Dynamic Ternary Logic may be emulated by an appropriate combination of `binlog` and
`ternlogi`, using the `nh` (Nibble High) operand to select the first and second nibble:

    # compute r3 = ternlog(r4, r5, r6, table=r7)
    # compute the values for when r6[i] = 0:
    binlog r3, r4, r5, r7, 0 # takes look-up-table from LSB 4 bits
    # compute the values for when r6[i] = 1:
    binlog r4, r4, r5, r7, 1 # takes look-up-table from second-to-LSB 4 bits
    # mux the two results together: r3 = (r3 & ~r6) | (r4 & r6)
    ternlogi r3, r4, r6, 0b11011000
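
The three-instruction sequence can be checked against a direct 8-entry lookup with a short Python sketch. The helper names are hypothetical. The two `binlog` steps follow the `lut[3 - idx]` convention of the `binlog` pseudocode, and the final step is modelled directly as the mux stated in the comment, `(r3 & ~r6) | (r4 & r6)`, rather than by decoding the `ternlogi` immediate.

```python
def lut2(table4, ra, rb, xlen=64):
    """One binlog-style lookup: per bit, idx = rb||ra selects LSB0 bit idx of table4."""
    out = 0
    for pos in range(xlen):
        idx = (((rb >> pos) & 1) << 1) | ((ra >> pos) & 1)
        out |= ((table4 >> idx) & 1) << pos
    return out


def dynamic_ternlog(r4, r5, r6, r7, xlen=64):
    """Follow the three-instruction sequence of the Programming Note."""
    r3 = lut2(r7 & 0xF, r4, r5, xlen)         # binlog r3, r4, r5, r7, 0
    r4 = lut2((r7 >> 4) & 0xF, r4, r5, xlen)  # binlog r4, r4, r5, r7, 1
    return (r3 & ~r6) | (r4 & r6)             # the mux stated in the note's comment


def lut3_reference(r4, r5, r6, table8, xlen=64):
    """Direct 8-entry lookup with idx = r6||r5||r4, for comparison."""
    out = 0
    for pos in range(xlen):
        idx = (((r6 >> pos) & 1) << 2) | (((r5 >> pos) & 1) << 1) | ((r4 >> pos) & 1)
        out |= ((table8 >> idx) & 1) << pos
    return out


# 0xAA, 0xCC, 0xF0 together cover all eight input combinations in 8 bits
print(hex(dynamic_ternlog(0xAA, 0xCC, 0xF0, 0b10010110, xlen=8)))
```

Under these assumptions `r6` acts as the most significant index bit of the 8-entry table held in the low byte of `r7`, with `r5` and `r4` below it.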

With ternary (LUT3) dynamic instructions being very costly,
and CR Fields being only 4 bits wide, a binary (LUT2) variant is better.

| 0-5 | 6-8 | 9-11 | 12-14 | 15-17 | 18-21 | 22-30     | 31 |
|-----|-----|------|-------|-------|-------|-----------|----|
| NN  | BT  | BA   | BB    | BC    | m0-m3 | 000101110 | 0  |

    for i in range(4):
        a, b = CRs[BA][i], CRs[BB][i]
        if mask[i]: CRs[BT][i] = lut2(CRs[BC], a, b)

When SVP64 Vectorised, any of the 4 operands may be Scalar or
Vector, including `BC`, meaning that multiple different dynamic
lookups may be performed with a single instruction.
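
The following Python sketch models the scalar operation and gives a loose picture of the all-Vector case. Several points are assumptions made for illustration only: `lut2` is taken to index the 4-bit table with `b||a` in LSB0 order, bit `i` of a field and of the mask is counted from the least significant end, and Vector operands are modelled simply as consecutive CR Field numbers, which glosses over the actual SVP64 register numbering.

```python
def lut2(table4, a, b):
    """Assumed helper: index the 4-bit table with b||a, LSB0 order."""
    return (table4 >> ((b << 1) | a)) & 1


def crbinlog(crs, bt, ba, bb, bc, mask):
    """Return a copy of crs (a list of 4-bit ints) after one crbinlog."""
    table4, src_a, src_b, target = crs[bc], crs[ba], crs[bb], crs[bt]
    for i in range(4):  # bit i counted from the least significant bit (assumed)
        a, b = (src_a >> i) & 1, (src_b >> i) & 1
        if (mask >> i) & 1:
            target = (target & ~(1 << i)) | (lut2(table4, a, b) << i)
    out = list(crs)
    out[bt] = target
    return out


def crbinlog_vector(crs, bt, ba, bb, bc, mask, vl):
    """Simplified all-Vector case: element n uses fields bt+n, ba+n, bb+n, bc+n."""
    for n in range(vl):
        crs = crbinlog(crs, bt + n, ba + n, bb + n, bc + n, mask)
    return crs


# Fields 6 and 7 hold two different tables: 0b1000 (AND) and 0b0110 (XOR)
crs = [0, 0, 0b1100, 0b1100, 0b1010, 0b1010, 0b1000, 0b0110]
print(crbinlog_vector(crs, bt=0, ba=2, bb=4, bc=6, mask=0b1111, vl=2)[:2])
```

With `BC` treated as a Vector, each element picks up its own table, so the two elements in the usage line compute different functions within what would be a single Vectorised instruction.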

*Programmer's note: just as with `binlog` and `ternlogi`, a pair
of `crbinlog` instructions followed by a merging `crternlogi` may
be deployed to synthesise dynamic ternary (LUT3) CR Field operations.*

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description             |
|------|------|------|---------|----------|-------------------------|
| TLI  | I    | #    | 3.2B    | ternlogi | Ternary Logic Immediate |