# RFC ls007 Ternary/Binary GPR and CR Field bit-operations

**URLs**:

*
*
*
*

**Severity**: Major

**Status**: New

**Date**: 20 Oct 2022

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**: **UPDATE**

* Book I 2.5.1 Condition Register Logical Instructions
* Book I 3.3.13 Fixed-Point Logical Instructions
* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

**Summary**

Instructions added

* `ternlogi` -- Ternary Logic Immediate
* `crternlogi` -- Condition Register Ternary Logic Immediate
* `binlog` -- Dynamic Binary Logic
* `crbinlog` -- Condition Register Dynamic Binary Logic

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

* Addition of two new GPR-based instructions
* Addition of two new CR-field-based instructions

**Impact on software**:

* Requires support for new instructions in assembler, debuggers, and related tools.

**Keywords**:

```
GPR, CR-Field, bit-manipulation, ternary, binary, dynamic, look-up-table (LUT), FPGA
```

**Motivation**

* `ternlogi` is similar to existing `and`/`or`/`xor`/etc. instructions, but allows any arbitrary 3-input 1-output bitwise operation. This can be used to combine several instructions into one. E.g. `A ^ (~B & (C | A))` can become one instruction. This can also be used to have one instruction for bitwise MUX `(A & B) | (~A & C)`.
* `binlog` is like `ternlogi` except it supports any arbitrary 2-input 1-output bitwise operation, where the operation can be selected dynamically at runtime. This operates similarly to a Programmable LUT in an FPGA.
* `crternlogi` is like `ternlogi` except it works with CRs instead of GPRs.
* `crbinlog` is like `binlog` except it works with CRs instead of GPRs. Likewise it is similar to a Programmable LUT in an FPGA.

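The look-up-table view behind these instructions can be illustrated with a short Python sketch. It is not part of the proposal: the helper names `truth_table` and `apply_lut` are hypothetical, and the table is held as a plain Python list indexed by the input bits. The sketch deliberately says nothing about how table entries map onto immediate or register bits, since that is defined by each instruction's pseudocode.

```python
from itertools import product

MASK = (1 << 64) - 1


def truth_table(f, n_inputs):
    """Evaluate f on every combination of single-bit inputs.

    Entry k of the returned list is the output for the inputs whose bits,
    read as a binary number with the first argument most significant, equal k.
    """
    return [f(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)]


def apply_lut(table, *words, width=64):
    """Apply a truth table independently at each bit position of the words."""
    result = 0
    for i in range(width):
        idx = 0
        for w in words:
            idx = (idx << 1) | ((w >> i) & 1)  # first word is most significant
        result |= table[idx] << i
    return result


# The two 3-input examples from the Motivation list: 8 table entries each.
combined = truth_table(lambda a, b, c: a ^ (~b & (c | a)), 3)
mux = truth_table(lambda a, b, c: (a & b) | (~a & c), 3)

# A 2-input operation needs only 4 entries, so the table can be ordinary
# run-time data, which is the idea behind the dynamic variants.
xor2 = truth_table(lambda a, b: a ^ b, 2)
```

Any 3-input bitwise expression collapses to one 8-entry table in this way, which is why a single 8-bit immediate is enough to select the operation.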
**Notes and Observations**:

* `ternlogi` is like the existing `xxeval` instruction, except that it operates on GPRs instead of VSRs and does not require VSX/VMX. Without it, SFS and SFFS are therefore less powerful.
* `crternlogi` is similar to the group of CR Operations (`crand`, `cror`, etc.) which have been identified as a Binary Lookup Group, except that an 8-bit immediate is used instead of a 4-bit one, and up to 4 bits of a CR Field may be computed at once, saving 3 CR operations.
* `crbinlog` is similar to the Binary Lookup Group of CR Operations, except that the 4-bit lookup table comes from a CR Field instead of from an immediate. As with `crternlogi`, up to 4 bits may be computed at once.

**Changes**

Add the following entries to:

* Book I 2.5.1 Condition Register Logical Instructions
* Book I 3.3.13 Fixed-Point Logical Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# CRB-FORM

Add the following section to Book I 1.6.1

```
    |0     |6    |9    |12   |15   |18   |21   |29   |31   |
    | PO   |  BF | BFA | BFB | BFC | msk | TLI | XO  | msk |
```

# TLI-FORM

Add the following section to Book I 1.6.1

```
    |0     |6    |11   |16   |21   |29   |31   |
    | PO   |  RT |  RA |  RB | TLI |  XO | Rc  |
```

# VA-FORM

Add the following entry to VA-FORM in Book I 1.6.1.12

```
    |0     |6    |11   |16   |21|22   |26|27    |
    | PO   |  RT |  RA |  RB |   RC   |nh|  XO  |
```

# Word Instruction Fields

Add the following to Book I 1.6.2

```
    msk (9:10,14:15)
        Field used by crternlogi to decide which CR bits to modify.
        Formats: CRB
    nh (26)
        Nibble High. Field used by binlog to decide if the look-up-table
        should be taken from bits 60:63 or 56:59 of RC.
        Formats: VA
    TLI (21:28)
        Field used by the ternlogi instruction as the look-up table.
        Formats: TLI
    TLI (21:25,19:20,31)
        Field used by the crternlogi instruction as the look-up table.
        Formats: CRB
    XO (29:30)
        Extended opcode field.
        Formats: TLI
    XO (26:30)
        Extended opcode field.
        Formats: CRB
```

* Add `TLI` to the `Formats:` list of all of `RA`, `RB`, `RT`, and `Rc`.
* Add `CRB` to the `Formats:` list of all of `BF`, `BFA`, `BFB`, and `BFC`.
* Add `TLI` to the `Formats:` list of `XO (29:30)`.
* Add `CRB` to the `Formats:` list of `XO (26:31)`.
* Add `VA` to the `Formats:` list of `XO (27:31)`.

----------

\newpage{}

# Ternary Logic Immediate

TLI-form

Add this section to Book I 3.3.13

* `ternlogi RT, RA, RB, TLI` (`Rc=0`)
* `ternlogi. RT, RA, RB, TLI` (`Rc=1`)

| 0-5 | 6-10 | 11-15 | 16-20 | 21-28 | 29-30 | 31 | Form     |
|-----|------|-------|-------|-------|-------|----|----------|
| PO  | RT   | RA    | RB    | TLI   | XO    | Rc | TLI-Form |

Pseudocode:

```
    result <- (~RT&~RA&~RB & TLI[0]*XLEN) |
              (~RT&~RA& RB & TLI[1]*XLEN) |
              (~RT& RA&~RB & TLI[2]*XLEN) |
              (~RT& RA& RB & TLI[3]*XLEN) |
              ( RT&~RA&~RB & TLI[4]*XLEN) |
              ( RT&~RA& RB & TLI[5]*XLEN) |
              ( RT& RA&~RB & TLI[6]*XLEN) |
              ( RT& RA& RB & TLI[7]*XLEN)
    RT <- result
```

For each integer value i, 0 to XLEN-1, do the following.

Let j be the value of the concatenation of the contents of bit i of RT, bit i of RA, bit i of RB. The value of bit j of TLI is placed into bit i of RT.

See Table 145, "xxeval(A, B, C, TLI) Equivalent Functions," on page 968 for the equivalent function evaluated by this instruction for any given value of TLI.

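The per-bit lookup described above can be modelled in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name and the `msb0` parameter are hypothetical. How `TLI[j]` is numbered is treated as an assumption, so both readings are offered: `msb0=True` takes `TLI[0]` to be the most significant bit of the 8-bit immediate, and `msb0=False` takes entry j to be bit j counted from the least significant bit (the reading expressed as `TLI[7 - idx]` in the `crternlogi` pseudocode).

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def ternlogi(rt, ra, rb, tli, msb0=True):
    """Model of the lookup j = RT[i] || RA[i] || RB[i]; RT[i] <- TLI[j].

    rt, ra, rb are XLEN-bit integers and tli is the 8-bit immediate.
    The numbering of TLI bits is an assumption selected by msb0.
    """
    result = 0
    for i in range(XLEN):
        # The operation is purely bitwise, so the register bit order is irrelevant.
        j = (((rt >> i) & 1) << 2) | (((ra >> i) & 1) << 1) | ((rb >> i) & 1)
        shift = 7 - j if msb0 else j
        result |= ((tli >> shift) & 1) << i
    return result & MASK
```

Under the LSB-first reading, `0b11011000` selects RA where RB is 1 and RT elsewhere, which is the mux used in the `binlog` programming note. Under the MSB0 reading the same function is `0b00011011`.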
Special registers altered:

```
    CR0                    (if Rc=1)
```

----------

\newpage{}

# Condition Register Ternary Logic Immediate

CRB-form

Add this section to Book I 2.5.1

* `crternlogi BF, BFA, BFB, BFC, TLI, msk`

| 0-5 | 6-8 | 9-11 | 12-14 | 15-17 | 18-20 | 21-28 | 29-30 | 31  | Form     |
|-----|-----|------|-------|-------|-------|-------|-------|-----|----------|
| PO  | BF  | BFA  | BFB   | BFC   | msk   | TLI   | XO    | msk | CRB-Form |

Pseudocode:

```
    a <- CR[4*BFA+32:4*BFA+35]
    b <- CR[4*BFB+32:4*BFB+35]
    c <- CR[4*BFC+32:4*BFC+35]
    do i = 0 to 3
      idx <- a[i] || b[i] || c[i] # compute index from current bits
      result <- TLI[7 - idx]      # subtract from 7 to index in LSB0 order
      if msk[i] = 1 then
        CR[4*BF+32+i] <- result
```

Special registers altered:

```
    CR field BF
```

----------

\newpage{}

# Dynamic Binary Logic

VA-form

Add this section to Book I 3.3.13

* `binlog RT, RA, RB, RC, nh`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26 | 27-31 | Form    |
|-----|------|-------|-------|-------|----|-------|---------|
| PO  | RT   | RA    | RB    | RC    | nh | XO    | VA-Form |

Pseudocode:

```
    if nh = 1 then lut <- (RC)[56:59]
    else           lut <- (RC)[60:63]
    do i = 0 to 63
      idx <- (RB)[i] || (RA)[i] # compute index from current bits
      result[i] <- lut[3 - idx] # subtract from 3 to index in LSB0 order
    RT <- result
```

Special registers altered:

```
    None
```

**Programming Note**:

Dynamic Ternary Logic may be emulated by an appropriate combination of `binlog` and `ternlogi`, using the `nh` (Nibble High) operand to select the first and second nibble of the table:

```
    # compute r3 = ternlog(r4, r5, r6, table=r7)
    # compute the values for when r6[i] = 0:
    binlog r3, r4, r5, r7, 0  # takes look-up-table from LSB 4 bits
    # compute the values for when r6[i] = 1:
    binlog r4, r4, r5, r7, 1  # takes look-up-table from second-to-LSB 4 bits
    # mux the two results together: r3 = (r3 & ~r6) | (r4 & r6)
    ternlogi r3, r4, r6, 0b11011000
```

----------

\newpage{}

## crbinlog

Because dynamic ternary (LUT3) instructions are very costly and CR Fields are only 4 bits wide, a binary (LUT2) variant is the better choice.

| 0-5 | 6-8 | 9-11 | 12-14 | 15-17 | 18-21 | 22-30     | 31 |
|-----|-----|------|-------|-------|-------|-----------|----|
| NN  | BT  | BA   | BB    | BC    | m0-m3 | 000101110 | 0  |

    mask = m0..m3
    for i in range(4):
        a, b = CRs[BA][i], CRs[BB][i]
        if mask[i]:
            CRs[BT][i] = lut2(CRs[BC], a, b)

When SVP64 Vectorised, any of the 4 operands may be Scalar or Vector, including `BC`, meaning that multiple different dynamic lookups may be performed with a single instruction.

*Programmer's note: just as with `binlog` and `ternlogi`, a pair of `crbinlog` instructions followed by a merging `crternlogi` may be deployed to synthesise dynamic ternary (LUT3) CR Field manipulation.*

----------

\newpage{}

----------

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description             |
|------|------|------|---------|----------|-------------------------|
| TLI  | I    | #    | 3.2B    | ternlogi | Ternary Logic Immediate |

----------------

[[!tag opf_rfc]]