# RFC ls011 LD/ST-Update-PostIncrement

* Funded by NLnet under the Privacy and Enhanced Trust Programme, EU
  Horizon2020 Grant 825310, and NGI0 Entrust No 101069594

**Severity**: Major

**Status**: New

**Date**: 21 Apr 2023.

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Chapter 2 Book I, new Fixed-Point Load / Store Sections 3.3.2 3.3.3
Chapter 4 Book I, new Floating-Point Load / Store Sections 4.6.2 4.6.3
```

**Summary**

```
TODO
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of new Load/Store Fixed and Floating Point instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers, and related tools.
Reduces instructions in hot-loops
```

**Keywords**:

```
```

**Motivation**

Moving the update of RA to *after* the Memory operation saves on instruction
count both outside and inside hot-loops. strncpy may be reduced to 11 Vector
instructions, 3 of which are the zeroing loop, 5 of which are the copy.
Percentage-wise LD/ST Update Post-Increment represents a massive 20% reduction.

**Notes and Observations**:

These types of instructions are already present in x86 (sort-of).

* x86 chose that store should be pre-indexed and load should be post-indexed
* Power ISA chose everything to be pre-indexed
* Motorola 68000 (decades old) has pre- and post- indexed

**Changes**

Add the following entries to:

* New Load/Store Sections
* Appendices

[[!tag opf_rfc]]

--------

\newpage{}

TODO (key stub notes below)

The LD/ST-Immediate-Post-Increment instructions are all Primary Opcode:
there are 13 of these. LD/ST-Indexed-Post-Increment are all effectively
9-bit XO and consequently may easily fit into one single Primary Opcode.
EXT2xx is recommended.

One alternative idea is that bit 31 could be allocated (retrospectively)
to Post-Increment.
Although it may be too late for Scalar Power ISA, it **may** be possible
to consider this for SVP64Single and/or SVP64-Vector, but doing so risks
creating a non-orthogonal ISA.

```
# LD/ST-Postincrement
lbzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lbzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lhzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lhzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lhaup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lhaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lwzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lwzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lwaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
ldup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
ldupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
stbup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stbupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
sthup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
sthupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stwup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stwupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
# FP LD/ST-Postincrement
lfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
lfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
lfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
stfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
stfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
stfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
# LD/ST-Shifted-Postincrement
lbzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lhzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lhaupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lwzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lwaupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
ldupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
stbupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
sthupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stwupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
# FP LD/ST-Shifted-Postincrement
lfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
lfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
stfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
stfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
```

# Example

Here is an annotated example in which the pseudo-code changes to use just
`RA` as the address and otherwise remains the same. No actual change to the
Effective Address computation itself occurs in any of the Post-Update
instructions.
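As an informal illustration alongside the specification pseudo-code, the
difference between pre-update and post-update can be sketched as a small
executable model. This is only a sketch under stated assumptions: XLEN is
taken as 64, `D` as a 16-bit signed immediate, memory as a flat Python `bytes`
object, and the helper names (`exts`, `sum_bytes_pre`, `sum_bytes_post`) and
the choice of registers 3 and 4 are hypothetical, not part of this RFC.

```python
MASK64 = (1 << 64) - 1  # assumption: XLEN = 64


def exts(value, bits=16):
    """Sign-extend a `bits`-wide field, modelling EXTS(D) for a 16-bit D."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return value - (1 << bits) if value & sign else value


def lbzu(gpr, mem, rt, d, ra):
    """Model of the existing pre-update form: the address includes D."""
    ea = (gpr[ra] + exts(d)) & MASK64
    gpr[rt] = mem[ea]                    # zero-extended byte load
    gpr[ra] = ea                         # RA receives the address used


def lbzup(gpr, mem, rt, d, ra):
    """Model of the proposed post-update form: the address is just (RA)."""
    ea = gpr[ra]
    gpr[rt] = mem[ea]                    # load first, from the unmodified RA
    gpr[ra] = (ea + exts(d)) & MASK64    # ea still holds the old (RA)


def sum_bytes_pre(mem, base, n):
    """Hypothetical hot-loop using lbzu: the pointer must be biased first."""
    gpr = [0] * 32
    gpr[3] = (base - 1) & MASK64         # extra setup step before the loop
    total = 0
    for _ in range(n):
        lbzu(gpr, mem, rt=4, d=1, ra=3)
        total += gpr[4]
    return total, gpr[3]


def sum_bytes_post(mem, base, n):
    """Hypothetical hot-loop using lbzup: the pointer is used as-is."""
    gpr = [0] * 32
    gpr[3] = base
    total = 0
    for _ in range(n):
        lbzup(gpr, mem, rt=4, d=1, ra=3)
        total += gpr[4]
    return total, gpr[3]
```

In this model both loops read the same bytes, but the pre-update loop needs
its pointer biased by one element before entry and finishes with `RA` on the
last element read, whereas the post-update loop starts from the buffer address
directly and finishes with `RA` one element past the end.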
**Load Byte and Zero with Post-Update** D-Form

* lbzup RT,D(RA)

Pseudo-code:

```
EA <- (RA)                            # EA just RA
RT <- ([0] * (XLEN-8)) || MEM(EA, 1)  # then load
RA <- (RA) + EXTS(D)                  # then update RA after
```

Special Registers Altered:

```
None
```

For comparison, the corresponding pseudo-code for the existing `lbzu` is:

```
EA <- (RA) + EXTS(D)                  # EA includes D
RT <- ([0] * (XLEN-8)) || MEM(EA, 1)  # load from RA+D
RA <- EA                              # and update RA
```

-----

\newpage{}

# Fixed-Point Load with Post-Update

Add the following additional Section to Fixed-Point Load: Book I 3.3.2.1

[[!inline pages="openpower/isa/pifixedload" raw=yes ]]

-----

\newpage{}

# Fixed-Point Store Post-Update

Add the following as a new section in Fixed-Point Store, Book I

[[!inline pages="openpower/isa/pifixedstore" raw=yes ]]

-----

\newpage{}

# Floating-Point Load Post-Update

Add the following as a new section in Floating-Point Load, Book I 4.6.2

[[!inline pages="openpower/isa/fpload" raw=yes ]]

-----

\newpage{}

# Floating-Point Store Post-Update

Add the following as a new section in Floating-Point Store, Book I 4.6.3

[[!inline pages="openpower/isa/fpstore" raw=yes ]]

-----

\newpage{}

# Fixed-Point Load Shifted Post-Update

Add the following as a new section in Fixed-Point Load: Book I

[[!inline pages="openpower/isa/pifixedloadshift" raw=yes ]]

-----

\newpage{}

# Fixed-Point Store Shifted Post-Update

Add the following as a new section in Fixed-Point Store: Book I

[[!inline pages="openpower/isa/pifixedstoreshift" raw=yes ]]

-----

\newpage{}

# Floating-Point Load Shifted Post-Update

Add the following as a new section in Floating-Point Load: Book I

[[!inline pages="openpower/isa/pifploadshift" raw=yes ]]

-----

\newpage{}

# Floating-Point Store Shifted Post-Update

Add the following as a new section in Floating-Point Store: Book I

[[!inline pages="openpower/isa/pifpstoreshift" raw=yes ]]

[[!tag opf_rfc]]