# RFC ls011 LD/ST-Update-PostIncrement

* Funded by NLnet under the Privacy and Enhanced Trust Programme, EU
  Horizon2020 Grant 825310, and NGI0 Entrust No 101069594
* <https://bugs.libre-soc.org/show_bug.cgi?id=1048>
* <https://libre-soc.org/openpower/sv/rfc/ls011/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1045>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>

**Date**: 21 Apr 2023.

**Books and Section affected**:

* Chapter 3 Book I, new Fixed-Point Load / Store Sections 3.3.2 and 3.3.3
* Chapter 4 Book I, new Floating-Point Load / Store Sections 4.6.2 and 4.6.3

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

Addition of new Load/Store Fixed and Floating Point instructions

**Impact on software**:

* Requires support for new instructions in assembler, debuggers, and related tools.
* Reduces instructions in hot-loops

Moving the update of RA to *after* the Memory operation saves on instruction count
both outside and inside hot-loops. For example, strncpy may be reduced to 11 Vector
instructions, 3 of which are the zeroing loop and 5 of which are the copy.
Percentage-wise, LD/ST Update Post-Increment represents a 20% reduction in
instruction count.

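
The saving outside the loop can be illustrated with a minimal Python sketch. It is a toy model rather than an ISA simulator: the register names, the helper functions and the `count` bookkeeping are assumptions made for illustration, loop-control instructions are deliberately not counted, and the store forms are assumed to mirror the load semantics. A byte-copy loop using the existing pre-update `lbzu`/`stbu` needs both pointers biased by -1 beforehand, whereas the proposed `lbzup`/`stbup` forms do not.

```python
# Toy model only: "regs" is a dict of named registers, "mem" a bytearray.
# lbzu/stbu are existing Power ISA instructions; lbzup/stbup are the
# post-update forms proposed in this RFC (store semantics assumed here).

def lbzu(regs, mem, rt, ra, d):
    ea = regs[ra] + d            # pre-update: EA includes D
    regs[rt] = mem[ea]
    regs[ra] = ea

def stbu(regs, mem, rs, ra, d):
    ea = regs[ra] + d
    mem[ea] = regs[rs] & 0xFF
    regs[ra] = ea

def lbzup(regs, mem, rt, ra, d):
    regs[rt] = mem[regs[ra]]     # post-update: EA is just RA
    regs[ra] += d                # RA is updated after the access

def stbup(regs, mem, rs, ra, d):
    mem[regs[ra]] = regs[rs] & 0xFF
    regs[ra] += d

def copy_pre_update(mem, src, dst, n):
    """Copy n bytes with lbzu/stbu; returns the instructions executed."""
    regs = {"src": src, "dst": dst, "tmp": 0}
    regs["src"] -= 1             # addi src, src, -1 (bias so first EA == src)
    regs["dst"] -= 1             # addi dst, dst, -1
    count = 2
    for _ in range(n):           # loop-control instructions not counted
        lbzu(regs, mem, "tmp", "src", 1)
        stbu(regs, mem, "tmp", "dst", 1)
        count += 2
    return count

def copy_post_update(mem, src, dst, n):
    """Copy n bytes with lbzup/stbup; no pointer bias is needed."""
    regs = {"src": src, "dst": dst, "tmp": 0}
    count = 0
    for _ in range(n):
        lbzup(regs, mem, "tmp", "src", 1)
        stbup(regs, mem, "tmp", "dst", 1)
        count += 2
    return count

mem_a = bytearray(b"hello" + bytes(5))
mem_b = bytearray(mem_a)
pre = copy_pre_update(mem_a, 0, 5, 5)
post = copy_post_update(mem_b, 0, 5, 5)
assert mem_a == mem_b == bytearray(b"hellohello")
print(pre, post)  # 12 10
```

This sketch only shows the setup instructions that disappear outside the loop; savings inside a loop depend on the surrounding code and are not modelled here.
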
**Notes and Observations**:

These types of instructions are already present in x86 (sort-of).

* x86 chose that stores should be pre-indexed and loads should be post-indexed
* Power ISA chose everything to be pre-indexed
* Motorola 68000 (decades old) has both pre- and post-indexed

<https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>

<https://azeria-labs.com/memory-instructions-load-and-store-part-4/>

Add the following entries to:

* New Load/Store Sections

TODO (key stub notes below)

The LD/ST-Immediate-Post-Increment instructions each take a Primary
Opcode: there are 13 of these. The LD/ST-Indexed-Post-Increment
instructions are all effectively 9-bit XO and consequently may easily
fit into one single Primary Opcode. EXT2xx is recommended.

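
As a rough cross-check of the opcode budget, the Python sketch below tallies the mnemonics listed in this RFC's instruction tables. It assumes, purely for illustration, that every indexed instruction consumes exactly one XO value; the real encoding may reserve additional bits (for example for a shift amount), so this is an upper-level sanity check and not an allocation proposal.

```python
# Base mnemonics taken from the instruction tables in this RFC.
FIXED_LOADS = ["lbz", "lhz", "lha", "lwz", "lwa", "ld"]
FIXED_STORES = ["stb", "sth", "stw", "std"]
FP_LOADS = ["lfd", "lfs"]
FP_STORES = ["stfd", "stfs"]
BASES = FIXED_LOADS + FIXED_STORES + FP_LOADS + FP_STORES

# Immediate forms: every base except lwa, which the tables list
# only in indexed form.
immediate = [b + "up" for b in BASES if b != "lwa"]
# Indexed and shifted-indexed forms exist for every base.
indexed = [b + "upx" for b in BASES]
shifted = [b + "upsx" for b in BASES]

XO_BITS = 9                  # "effectively 9-bit XO", as stated in the text
xo_slots = 1 << XO_BITS      # XO values available under one Primary Opcode

# ASSUMPTION: one XO value per indexed or shifted-indexed instruction.
xo_needed = len(indexed) + len(shifted)

print(len(immediate), xo_needed, xo_slots)  # 13 28 512
```

Under that assumption the indexed forms use only a small fraction of a single Primary Opcode, whereas the 13 immediate forms are the expensive part of the proposal.
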
One alternative idea is that bit 31 could be allocated (retrospectively)
to Post-Increment. Although it may be too late for Scalar Power ISA,
it **may** be possible to consider this for SVP64Single and/or SVP64-Vector,
but this risks creating a non-Orthogonal ISA.


    # LD/ST-Postincrement
    lbzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lbzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lhzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lhzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lhaup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lhaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lwzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lwzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lwaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    ldup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    ldupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    stbup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stbupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    sthup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    sthupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stwup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stwupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W


    # FP LD/ST-Postincrement
    lfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    stfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W


    # LD/ST-Shifted-Postincrement
    lbzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lhzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lhaupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lwzupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lwaupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    ldupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    stbupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    sthupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stwupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W


    # FP LD/ST-Shifted-Postincrement
    lfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    stfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

Here is an annotated example where the pseudo-code changes to
just use `RA` as the address, otherwise remaining the same.
The address arithmetic itself (for example `(RA) + EXTS(D)`) does not
change in any of the Post-Update instructions: only the ordering does,
with the memory access using the original `(RA)` and the sum being
written back to `RA` afterwards.

**Load Byte and Zero with Post-Update**

    EA <- (RA)                           # EA just RA
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # then load
    RA <- (RA) + EXTS(D)                 # then update RA after

Special Registers Altered:

For comparison, the existing pseudo-code for `lbzu` is:

    EA <- (RA) + EXTS(D)                 # EA includes D
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # load from RA+D
    RA <- EA                             # and update RA

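
The difference between the two forms can be made concrete with a small executable model. The following Python sketch is an illustration only: the `gpr` list, the `mem` byte string and the helper names are assumptions, `D` is treated as a 16-bit field, and invalid-form checks are omitted. It transcribes the `lbzu` and `lbzup` pseudo-code line by line.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def exts(value, bits=16):
    """Sign-extend a `bits`-wide field (D is assumed to be 16 bits wide)."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def lbzu(gpr, mem, rt, ra, d):
    """Existing pre-update form."""
    ea = (gpr[ra] + exts(d)) & MASK   # EA includes D
    gpr[rt] = mem[ea]                 # one byte, already zero-extended
    gpr[ra] = ea                      # and update RA

def lbzup(gpr, mem, rt, ra, d):
    """Proposed post-update form."""
    ea = gpr[ra]                      # EA just RA
    gpr[rt] = mem[ea]                 # then load
    gpr[ra] = (gpr[ra] + exts(d)) & MASK  # then update RA after

mem = bytes(range(0x10, 0x20))        # mem[i] == 0x10 + i

gpr = [0] * 32
gpr[4] = 8
lbzu(gpr, mem, 3, 4, 1)
pre_result = (gpr[3], gpr[4])         # byte from address 9, RA becomes 9

gpr = [0] * 32
gpr[4] = 8
lbzup(gpr, mem, 3, 4, 1)
post_result = (gpr[3], gpr[4])        # byte from address 8, RA becomes 9

print(pre_result, post_result)        # (25, 9) (24, 9)
```

Both forms leave the same value in `RA`; they differ only in which address the byte is fetched from.
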
# Fixed-Point Load with Post-Update

Add the following additional Section to Fixed-Point Load: Book I 3.3.2.1

TODO: move the inline import to pifixedload here... (separate commit).

# Fixed-Point Store Post-Update

Add the following as a new section in Fixed-Point Store, Book I

## Store Byte with Update

    |0 |6 |9 |10 |11 |16 |31 |

    MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]

Special Registers Altered:

## Store Byte with Update Indexed

    |0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
    | PO | RS | RA | RB | XO | / |

    MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]

Special Registers Altered:

## Store Halfword with Update

    |0 |6 |9 |10 |11 |16 |31 |

    MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]

Special Registers Altered:

## Store Halfword with Update Indexed

    |0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
    | PO | RS | RA | RB | XO | / |

    MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]

Special Registers Altered:

## Store Word with Update

    |0 |6 |9 |10 |11 |16 |31 |

    MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]

Special Registers Altered:

## Store Word with Update Indexed

    |0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
    | PO | RS | RA | RB | XO | / |

    MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]

Special Registers Altered:

## Store Doubleword with Update

    EA <- (RA) + EXTS(DS || 0b00)

Special Registers Altered:

## Store Doubleword with Update Indexed

    |0 |6 |7|8|9 |10 |11|12|13 |15|16|17 |20|21 |31 |
    | PO | RS | RA | RB | XO | / |

Special Registers Altered:

[[!inline pages="openpower/isa/fixedload" raw=yes ]]

[[!inline pages="openpower/isa/fixedstore" raw=yes ]]

[[!inline pages="openpower/isa/fpload" raw=yes ]]

[[!inline pages="openpower/isa/fpstore" raw=yes ]]

[[!inline pages="openpower/isa/pifixedload" raw=yes ]]

[[!inline pages="openpower/isa/pifixedstore" raw=yes ]]