# RFC ls004 v2 Shift-And-Add and LD/ST-Shifted * Funded by NLnet under the Privacy and Enhanced Trust Programme, EU Horizon2020 Grant 825310, and NGI0 Entrust No 101069594 * * * feedback: **Changes**: * initial shift-and-add * add saddw: * consider LD/ST-Shifted **Severity**: Major **Status**: New **Date**: 07 Feb 2024 **Target**: v3.2B **Source**: v3.0B **Books and Section affected**: ``` Book I Fixed-Point Shift Instructions 3.3.14.2 Appendix E Power ISA sorted by opcode Appendix F Power ISA sorted by version Appendix G Power ISA sorted by Compliancy Subset Appendix H Power ISA sorted by mnemonic ``` **Summary** ``` Instructions added sadd - Shift and Add saddw - Shift and Add Signed Word sadduw - Shift and Add Unsigned Word Also LD/ST-Indexed-Shifted (Fixed and Floating) ``` **Submitter**: Luke Leighton (Libre-SOC) **Requester**: Libre-SOC **Impact on processor**: ``` Addition of three new GPR-based instructions ``` **Impact on software**: ``` Requires support for new instructions in assembler, debuggers, and related tools. ``` **Keywords**: ``` GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing ``` **Motivation** Power ISA is missing LD/ST Indexed with shift, which is present in both ARM and x86. Adding more LD/ST is thirty eight instructions, a compromise is to add shift-and-add. Replaces a pair of explicit instructions in hot-loops. Adding actual LD/ST Shifted saves even further. **Notes and Observations**: 1. `sadd` and `sadduw` operate on unsigned integers. 2. `sadduw` is intended for performing address offsets, as the second operand is constrained to lower 32-bits and zero-extended. 3. All three are 2-in 1-out instructions. 4. shift-add operations are present in both x86 and aarch64, since they are useful for both general arithmetic and for computing addresses even when not immediately followed with a load/store. 5. `saddw` is often more useful than `sadduw` because C/C++ programmers like to use `int` for array indexing. for additional details see . 6. Even Motorola 68000 has LD/ST-Indexed-Shifted 7. should average-shift-add also be included? what about CA-in / CA-out? **Changes** Add the following entries to: * the Appendices of Book I * Instructions of Book I added to Section 3.3.14.2 ---------------- \newpage{} # Table of LD/ST-Indexed-Shift The following demonstrates the alternative instructions that could be considered to be added. They are all 9-bit XO: * 12 Load Indexed Shifted (with Update) * 3 Load Indexed Shifted Byte-reverse * 8 Store Indexed Shifted (with Update) * 3 Store Indexed Shifted Byte-reverse * 6 Floating-Point Load Indexed Shifted (with Update) * 6 Floating-Point Store Indexed Shifted (with Update) * 6 Load Indexed Shifted Update Post-Increment * 4 Store Indexed Shifted Update Post-Increment * 2 Floating-Point Load Indexed Shifted Update Post-Increment * 2 Floating-Point Store Indexed Shifted Update Post-Increment Total count: 51 new 9-bit XO instructions, for an approximate total XO cost of 3 bits within a single Primary Opcode. With the savings that these instructions represent in hot-loops, as evidenced by their inclusion in top-end ISAs such as x86 and ARM, the cost may be considered justifiable. However there is no point in placing the 38 Shifted-only group in EXT2xx, they need to be in EXT0xx, because if added as 64-bit Encoding the benefit reduction in binary size is not achieved. Post-Increment-Shifted on the other hand could reasonably be proposed in EXT2xx. **LD/ST-Shifted** | 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction | |-------|------|-------|-------|-------|-------|----------------------| | PO | RT | RA | RB | SH | XO | lbzsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lhzsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lhasx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lwzsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lwasx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | ldsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lhbrsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lwbrsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | ldbrsx RT,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stbsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | sthsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stwsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stdsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | sthbrsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stwbrsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stdbrsx RS,RA,RB,SH | | PO | FRT | RA | RB | SH | XO | lfsxs FRT,RA,RB,SH | | PO | FRT | RA | RB | SH | XO | lfdxs FRT,RA,RB,SH | | PO | FRT | RA | RB | SH | XO | lfiwaxs FRT,RA,RB,SH | | PO | FRT | RA | RB | SH | XO | lfiwzxs FRT,RA,RB,SH | | PO | FRS | RA | RB | SH | XO | stfsxs FRS,RA,RB,SH | | PO | FRS | RA | RB | SH | XO | stfdxs FRS,RA,RB,SH | | PO | FRS | RA | RB | SH | XO | stfiwxs FRS,RA,RB,SH | **LD/ST-Shifted-Update** | 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction | |-------|------|-------|-------|-------|-------|----------------------| | PO | RT | RA | RB | SH | XO | lbzusx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lhzusx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lhausx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lwzusx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lwausx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | ldusx RT,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stbusx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | sthusx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stwusx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stdusx RS,RA,RB,SH | | PO | FRT | RA | RB | SH | XO | lfsuxs FRT,RA,RB,SH | | PO | FRT | RA | RB | SH | XO | lfduxs FRT,RA,RB,SH | | PO | FRS | RA | RB | SH | XO | stfsuxs FRS,RA,RB,SH | | PO | FRS | RA | RB | SH | XO | stfduxs FRS,RA,RB,SH | **Post-Increment-Update LD/ST-Shifted** | 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction | |-------|------|-------|-------|-------|-------|----------------------| | PO | RT | RA | RB | SH | XO | lbzupsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lhzupsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lhaupsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lwzupsx RT,RA,RB,SH | | PO | RT | RA | RB | SH | XO | lwaupsx RT,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stbupsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | sthupsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stwupsx RS,RA,RB,SH | | PO | RS | RA | RB | SH | XO | stdupsx RS,RA,RB,SH | | PO | RT | RA | RB | SH | XO | ldupsx RT,RA,RB,SH | | PO | FRT | RA | RB | SH | XO | lfdupxs FRT,RA,RB,SH | | PO | FRT | RA | RB | SH | XO | lfsupxs FRT,RA,RB,SH | | PO | FRS | RA | RB | SH | XO | stfdupxs FRS,RA,RB,SH | | PO | FRS | RA | RB | SH | XO | stfsupxs FRS,RA,RB,SH | ---------------- \newpage{} # Shift-and-Add `sadd RT, RA, RB, SH` | 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form | |-------|------|-------|-------|-------|-------|----|----------| | PO | RT | RA | RB | SH | XO | Rc | Z23-Form | Pseudocode: ``` shift <- SH + 1 # Shift is between 1-4 sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA RT <- sum # Result stored in RT ``` When `SH` is zero, the contents of register RB are multiplied by 2, added to the contents of register RA, and the result stored in RT. `SH` is a 2-bit bit-field, and allows multiplication of RB by 2, 4, 8, 16. Operands RA and RB, and the result RT are all 64-bit, unsigned integers. **NEED EXAMPLES (not sure how to embed SH)!!!** Examples: ``` # adds r1 to (r2*8) sadd r4, r1, r2, 3 ``` # Shift-and-Add Signed Word `saddw RT, RA, RB, SH` | 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form | |-------|------|-------|-------|-------|-------|----|----------| | PO | RT | RA | RB | SH | XO | Rc | Z23-Form | Pseudocode: ``` shift <- SH + 1 # Shift is between 1-4 n <- EXTS64((RB)[32:63]) # Only use lower 32-bits of RB sum[0:63] <- (n << shift) + (RA) # Shift n, add RA RT <- sum # Result stored in RT ``` When `SH` is zero, the lower word contents of register RB are multiplied by 2, added to the contents of register RA, and the result stored in RT. `SH` is a 2-bit bit-field, and allows multiplication of RB by 2, 4, 8, 16. Operands RA and RB, and the result RT are all 64-bit, signed integers. *Programmer's Note: The advantage of this instruction is doing address offsets. RA is the base 64-bit address. RB is the offset into data structure limited to 32-bit.* Examples: ``` # r4 = r1 + (r2*16) saddw r4, r1, r2, 3 ``` ---------------- \newpage{} # Shift-and-Add Unsigned Word `sadduw RT, RA, RB, SH` | 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form | |-------|------|-------|-------|-------|-------|----|----------| | PO | RT | RA | RB | SH | XO | Rc | Z23-Form | Pseudocode: ``` shift <- SH + 1 # Shift is between 1-4 n <- (RB)[32:63] # Only use lower 32-bits of RB sum[0:63] <- (n << shift) + (RA) # Shift n, add RA RT <- sum # Result stored in RT ``` When `SH` is zero, the lower word contents of register RB are multiplied by 2, added to the contents of register RA, and the result stored in RT. `SH` is a 2-bit bit-field, and allows multiplication of RB by 2, 4, 8, 16. Operands RA and RB, and the result RT are all 64-bit, unsigned integers. *Programmer's Note: The advantage of this instruction is doing address offsets. RA is the base 64-bit address. RB is the offset into data structure limited to 32-bit.* Examples: ``` # sadduw r4, r1, r2, 2 ``` \newpage{} [[!inline pages="openpower/isa/fixedloadshift" raw=yes ]] \newpage{} [[!inline pages="openpower/isa/fixedstoreshift" raw=yes ]] \newpage{} [[!inline pages="openpower/isa/fploadshift" raw=yes ]] \newpage{} [[!inline pages="openpower/isa/fpstoreshift" raw=yes ]] \newpage{} [[!inline pages="openpower/isa/pifixedloadshift" raw=yes ]] \newpage{} [[!inline pages="openpower/isa/pifixedstoreshift" raw=yes ]] \newpage{} [[!inline pages="openpower/isa/pifploadshift" raw=yes ]] \newpage{} [[!inline pages="openpower/isa/pifpstoreshift" raw=yes ]] \newpage{} # Instruction Formats **Add the following to Book I 1.6.1** Z23-Form: ``` | 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form | |-------|------|-------|-------|-------|-------|----|----------| | PO | RT | RA | RB | SH | XO | Rc | Z23-Form | | PO | RS | RA | RB | SH | XO | Rc | Z23-Form | | PO | FRT | RA | RB | SH | XO | Rc | Z23-Form | | PO | FRS | RA | RB | SH | XO | Rc | Z23-Form | ``` # Instruction Fields Add Z23 to the following Formats in Book I 1.6.2: `FRS FRT RT RA RB XO Rc` Add the following new fields: ``` SH (21:22) Field used to specify a shift amount. Formats: Z23 ``` # Appendices Appendix E Power ISA sorted by opcode Appendix F Power ISA sorted by version Appendix G Power ISA sorted by Compliancy Subset Appendix H Power ISA sorted by mnemonic | Form | Book | Page | Version | mnemonic | Description | |------|------|------|---------|----------|-------------| | Z23 | I | # | 3.0B | sadd | Shift-and-Add | | Z23 | I | # | 3.0B | saddw | Shift-and-Add Signed Word | | Z23 | I | # | 3.0B | sadduw | Shift-and-Add Unsigned Word | [[!tag opf_rfc]]