# RFC ls011 LD/ST-Update-PostIncrement

* Funded by NLnet under the Privacy and Enhanced Trust Programme, EU
  Horizon2020 Grant 825310, and NGI0 Entrust No 101069594
* <https://bugs.libre-soc.org/show_bug.cgi?id=1048>
* <https://libre-soc.org/openpower/sv/rfc/ls011/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1045>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>

**Date**: 21 Apr 2023.

**Books and Section affected**:

    Chapter 3 Book I, new Fixed-Point Load / Store Sections 3.3.2 3.3.3
    Chapter 4 Book I, new Floating-Point Load / Store Sections 4.6.2 4.6.3

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

    Addition of new Load/Store Fixed and Floating Point instructions

**Impact on software**:

    Requires support for new instructions in assembler, debuggers, and related tools.
    Reduces instructions in hot-loops

Moving the update of RA to *after* the Memory operation saves on instruction count
both outside and inside hot-loops. `strncpy` may be reduced to 11 Vector instructions,
3 of which are the zeroing loop and 5 of which are the copy. Percentage-wise, LD/ST
Update Post-Increment represents a 20% reduction in instruction count for this example.

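
The saving outside the loop can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the strncpy code referred to above: the addresses `SRC_BASE` and `DST_BASE`, the function names, and the instruction tally are all hypothetical, and loop-control instructions are not counted. With pre-update accesses both pointers must first be biased back by one element, which costs two extra instructions before the loop; with post-update accesses they can start at the true base addresses.

```python
SRC_BASE, DST_BASE = 0x1000, 0x2000  # arbitrary illustrative addresses


def copy_pre_update(data):
    """Byte copy using pre-update accesses (lbzu/stbu style)."""
    mem = {SRC_BASE + i: b for i, b in enumerate(data)}
    executed = 0
    # Pre-update adds the displacement before the access, so both
    # pointers are biased back by one element first (two addi).
    src, dst = SRC_BASE - 1, DST_BASE - 1
    executed += 2
    for _ in data:
        src += 1              # RA updated first ...
        value = mem[src]      # ... then the load uses the new RA
        executed += 1
        dst += 1
        mem[dst] = value
        executed += 1
    copied = bytes(mem[DST_BASE + i] for i in range(len(data)))
    return copied, executed


def copy_post_update(data):
    """Byte copy using post-update accesses (lbzup/stbup style)."""
    mem = {SRC_BASE + i: b for i, b in enumerate(data)}
    executed = 0
    src, dst = SRC_BASE, DST_BASE  # no bias needed
    for _ in data:
        value = mem[src]      # the load uses RA as-is ...
        src += 1              # ... then RA is updated
        executed += 1
        mem[dst] = value
        dst += 1
        executed += 1
    copied = bytes(mem[DST_BASE + i] for i in range(len(data)))
    return copied, executed
```

For a five-byte input, `copy_pre_update(b"hello")` returns `(b"hello", 12)` and `copy_post_update(b"hello")` returns `(b"hello", 10)` in this model: the same data is copied, and the difference is the two pointer-bias instructions.
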
**Notes and Observations**:

These types of instructions are already present in x86 (sort-of).

* x86 chose that store should be pre-indexed and load should be post-indexed
* Power ISA chose everything to be pre-indexed
* Motorola 68000 (decades old) has both pre- and post-indexed

<https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>

<https://azeria-labs.com/memory-instructions-load-and-store-part-4/>

Add the following entries to:

* New Load/Store Sections

TODO (key stub notes below)

The LD/ST-Immediate-Post-Increment instructions are all Primary
Opcode instructions: there are 13 of these. The LD/ST-Indexed-Post-Increment
instructions are all effectively 9-bit XO and consequently may easily
fit into one single Primary Opcode. EXT2xx is recommended.

One alternative idea is that bit 31 could be allocated (retrospectively)
to Post-Increment. Although it may be too late for Scalar Power ISA,
it **may** be possible to consider this for SVP64Single and/or SVP64-Vector,
but doing so risks creating a non-Orthogonal ISA.


    # LD/ST-Postincrement
    lbzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lbzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lhzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lhzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lhaup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lhaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lwzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lwzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lwaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    ldup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    ldupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    stbup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stbupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    sthup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    sthupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stwup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stwupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W


    # FP LD/ST-Postincrement
    lfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    stfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W


    # LD/ST-Shifted-Postincrement
    lbzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lhzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lhaupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lwzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lwaupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    ldupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    stbupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    sthupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stwupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

    # FP LD/ST-Shifted-Postincrement
    lfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    stfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

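
The mnemonics in the table follow a regular naming pattern, and the count of 13 Primary Opcode instructions can be cross-checked with a short Python sketch. The list names below are illustrative only, and the sketch assumes that the single-precision FP indexed load is named `lfsupx`, by analogy with the other mnemonics.

```python
# Immediate (Primary Opcode) forms, as listed with "PO" in the table.
IMM_FIXED_LOADS = ["lbzup", "lhzup", "lhaup", "lwzup", "ldup"]
IMM_FIXED_STORES = ["stbup", "sthup", "stwup", "stdup"]
IMM_FP = ["lfdup", "lfsup", "stfdup", "stfsup"]

immediate = IMM_FIXED_LOADS + IMM_FIXED_STORES + IMM_FP

# Indexed forms append "x"; lwaupx is listed without an immediate
# counterpart, so it is added explicitly.
indexed = [name + "x" for name in immediate] + ["lwaupx"]

# Shifted forms replace the trailing "x" with "sx".
shifted = [name[:-1] + "sx" for name in indexed]

counts = {
    "immediate": len(immediate),
    "indexed": len(indexed),
    "shifted": len(shifted),
}
```

Under these assumptions `counts` is `{"immediate": 13, "indexed": 14, "shifted": 14}`, for 41 mnemonics in total.
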
Here is an annotated example in which the pseudo-code changes to
use just `RA` as the address, otherwise remaining the same.
In all of the Post-Update instructions the sum added to `RA` is
computed exactly as before: it is simply written back to `RA` after
the Memory access instead of being used as the Effective Address.

**Load Byte and Zero with Post-Update**

    EA <- (RA)                           # EA just RA
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # then load
    RA <- (RA) + EXTS(D)                 # then update RA after

Special Registers Altered:

where the corresponding pseudocode for the existing `lbzu` is:

    EA <- (RA) + EXTS(D)                 # EA includes D
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # load from RA+D
    RA <- EA                             # and update RA

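
The difference between the two forms can also be modelled in a few lines of Python. This is a minimal sketch, not the reference implementation: the register file is a plain list, memory is a dict of bytes, `exts16` is a hypothetical helper that sign-extends a 16-bit `D`, and invalid forms (such as `RA` equal to `RT`) are not modelled.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def exts16(d):
    """Sign-extend a 16-bit immediate to a Python int (helper, assumed)."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def lbzu(gpr, mem, rt, ra, d):
    """Pre-update: EA includes D, and RA receives EA."""
    ea = (gpr[ra] + exts16(d)) & MASK
    gpr[rt] = mem.get(ea, 0) & 0xFF   # byte load, zero-extended
    gpr[ra] = ea


def lbzup(gpr, mem, rt, ra, d):
    """Post-update: EA is just RA, and RA is updated after the load."""
    ea = gpr[ra]
    gpr[rt] = mem.get(ea, 0) & 0xFF   # byte load, zero-extended
    gpr[ra] = (gpr[ra] + exts16(d)) & MASK


def demo():
    """Run both forms from the same starting state and return the results."""
    mem = {0x1000: 0xAA, 0x1001: 0xBB}
    pre = [0] * 32
    post = [0] * 32
    pre[4] = post[4] = 0x1000
    lbzu(pre, mem, 3, 4, 1)
    lbzup(post, mem, 3, 4, 1)
    return (pre[3], pre[4]), (post[3], post[4])
```

In this model `demo()` returns `((0xBB, 0x1001), (0xAA, 0x1001))`: both forms leave `RA` at the same value, but `lbzup` loads from the original address and `lbzu` from the displaced one.
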
# Fixed-Point Load Post-Update

Add the following additional Section to Fixed-Point Load: Book I 3.3.2.1

[[!inline pages="openpower/isa/pifixedload" raw=yes ]]

# Fixed-Point Store Post-Update

Add the following as a new section in Fixed-Point Store, Book I

[[!inline pages="openpower/isa/pifixedstore" raw=yes ]]

# Floating-Point Load Post-Update

Add the following as a new section in Floating-Point Load, Book I 4.6.2

[[!inline pages="openpower/isa/fpload" raw=yes ]]

# Floating-Point Store Post-Update

Add the following as a new section in Floating-Point Store, Book I 4.6.3

[[!inline pages="openpower/isa/fpstore" raw=yes ]]

# Fixed-Point Load Shifted Post-Update

Add the following as a new section in Fixed-Point Load: Book I

[[!inline pages="openpower/isa/pifixedloadshift" raw=yes ]]

# Fixed-Point Store Shifted Post-Update

Add the following as a new section in Fixed-Point Store: Book I

[[!inline pages="openpower/isa/pifixedstoreshift" raw=yes ]]

# Floating-Point Load Shifted Post-Update

Add the following as a new section in Floating-Point Load: Book I

[[!inline pages="openpower/isa/pifploadshift" raw=yes ]]

# Floating-Point Store Shifted Post-Update

Add the following as a new section in Floating-Point Store: Book I

[[!inline pages="openpower/isa/pifpstoreshift" raw=yes ]]