# RFC ls011 LD/ST-Update-PostIncrement

* <https://bugs.libre-soc.org/show_bug.cgi?id=1048>
* <https://libre-soc.org/openpower/sv/rfc/ls011/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1045>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>

**Date**: 21 Apr 2023.

**Books and Section affected**:

    Chapter 3 Book I, new Fixed-Point Load / Store Sections 3.3.2 3.3.3
    Chapter 4 Book I, new Floating-Point Load / Store Sections 4.6.2 4.6.3

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

**Impact on software**:

    Requires support for new instructions in assembler, debuggers, and related tools.
    Reduces instruction count in hot loops.

**Notes and Observations**:

Add the following entries to:

* A new "Vector Looping" Book
* New Vector-Looping Chapters
* New Vector-Looping Appendices

TODO (key stub notes below)

The following instructions are proposed to be added in EXT2xx,
duplicating LD/ST-Update functionality but moving the update
of RA to *after* the Memory operation. Instructions of this
type already exist in other ISAs (in x86, only sort-of):

* x86 chose that store should be pre-indexed and load should be post-indexed
* Power ISA chose everything to be pre-indexed
* Motorola 68000 (decades old) has both pre- and post-indexed

<https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>

<https://azeria-labs.com/memory-instructions-load-and-store-part-4/>

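
The x86 convention in the first bullet is most visible in its stack instructions: `PUSH` adjusts the stack pointer before storing, and `POP` loads before adjusting it. The following is a minimal Python sketch of those two addressing styles. The `Machine` class, its 8-byte word size and its dict-based memory are illustrative assumptions, not part of any ISA definition.

```python
class Machine:
    """Toy model (hypothetical): a stack pointer plus sparse memory."""

    WORD = 8  # assumed word size in bytes

    def __init__(self, sp):
        self.sp = sp
        self.mem = {}

    def push(self, value):
        # pre-indexed store: adjust the pointer first, then store
        self.sp -= self.WORD
        self.mem[self.sp] = value

    def pop(self):
        # post-indexed load: load first, then adjust the pointer
        value = self.mem[self.sp]
        self.sp += self.WORD
        return value


m = Machine(sp=0x1000)
m.push(11)
m.push(22)
popped = [m.pop(), m.pop()]  # values come back in reverse order
```

Because the store is pre-indexed and the load is post-indexed, the pointer returns to its starting value without any separate adjustment instruction.
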
The 13 LD/ST-Immediate-Post-Increment instructions each require a
Primary Opcode. The LD/ST-Indexed-Post-Increment instructions
are all effectively 9-bit XO and consequently fit easily
into a single Primary Opcode. EXT2xx is recommended.

One alternative idea is that bit 31 could be allocated (retrospectively)
to Post-Increment. Although it may be too late for Scalar Power ISA,
it **may** be possible to consider this for SVP64Single and/or SVP64-Vector,
but doing so risks creating a non-Orthogonal ISA.

    # LD/ST-Postincrement
    lbzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lbzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lhzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lhzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lhaup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lhaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lwzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lwzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lwaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    ldup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    ldupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    stbup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stbupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    sthup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    sthupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stwup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stwupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

    # FP LD/ST-Postincrement
    lfdu, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lfsu, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lfdux, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lfsux, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    stfdu, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stfsu, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stfdux, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stfsux, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

    # LD/ST-Shifted-Postincrement
    lbzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lhzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lhauspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lwzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lwauspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lduspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    stbuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    sthuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stwuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stduspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

    # FP LD/ST-Shifted-Postincrement
    lfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    stfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

Here is an annotated example: the pseudo-code changes to use
just `RA` as the address and otherwise remains the same.
No change to the Effective Address computation itself
occurs in any of the Post-Update instructions.

**Load Byte and Zero with Post-Update**

    EA <- (RA)                            # EA just RA
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)  # then load
    RA <- (RA) + EXTS(D)                  # then update RA after

Special Registers Altered:

where the equivalent pseudocode for the existing `lbzu` is:

    EA <- (RA) + EXTS(D)                  # EA includes D
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)  # load from RA+D
    RA <- EA                              # and update RA

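
The two pseudocode fragments above can be compared side by side with a small executable model. This is only a sketch: the register file as a Python list, the memory as a `bytes` object, and the helper names `exts` and `walk` are assumptions made for illustration, and details such as invalid instruction forms are ignored.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def exts(value, bits):
    """Sign-extend a `bits`-wide field to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def lbzu(gpr, mem, rt, ra, d):
    ea = (gpr[ra] + exts(d, 16)) & MASK       # EA includes D
    gpr[rt] = mem[ea]                         # load zero-extended byte from RA+D
    gpr[ra] = ea                              # and update RA


def lbzup(gpr, mem, rt, ra, d):
    ea = gpr[ra]                              # EA is just RA
    gpr[rt] = mem[ea]                         # load first
    gpr[ra] = (gpr[ra] + exts(d, 16)) & MASK  # then update RA


mem = bytes([10, 20, 30, 40])  # four bytes at addresses 0..3


def walk(insn, start):
    """Read every byte of `mem` by repeating one instruction with D=1."""
    gpr = [0] * 32
    gpr[3] = start & MASK
    out = []
    for _ in range(len(mem)):
        insn(gpr, mem, rt=4, ra=3, d=1)
        out.append(gpr[4])
    return out, gpr[3]


post = walk(lbzup, 0)   # pointer starts at the first element
pre = walk(lbzu, -1)    # pointer must start one element early
```

Both walks read the same four bytes, but the pre-update form needs the pointer biased by one stride before the loop begins, whereas the post-update form starts at the first element and finishes pointing one past the last.
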
# Fixed-Point Load with Post-Update

Add the following additional Section to Fixed-Point Load, Book I

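
As an informal reading aid for the pseudocode in this section, the two operations it relies on, `EXTS` and the DS-form displacement `DS || 0b00`, can be modelled in a few lines of Python. The function names, the little-endian byte order, and the assumption that every load here uses `(RA)` as its address (following the `lbzup` pattern) belong to this sketch and are not normative.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def exts(value, bits):
    """Sign-extend a `bits`-wide field to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def mem_read(mem, ea, nbytes):
    # byte order is an assumption of this sketch
    return int.from_bytes(mem[ea:ea + nbytes], "little")


def lhaup(gpr, mem, rt, ra, d):
    ea = gpr[ra]                                     # assumed: EA <- (RA)
    gpr[rt] = exts(mem_read(mem, ea, 2), 16) & MASK  # RT <- EXTS(MEM(EA, 2))
    gpr[ra] = (gpr[ra] + exts(d, 16)) & MASK         # RA <- (RA) + EXTS(D)


def ldup(gpr, mem, rt, ra, ds):
    ea = gpr[ra]                                     # assumed: EA <- (RA)
    gpr[rt] = mem_read(mem, ea, 8)                   # load a full doubleword
    # DS is a 14-bit field; appending 0b00 gives a 16-bit, 4-byte-aligned offset
    gpr[ra] = (gpr[ra] + exts(ds << 2, 16)) & MASK   # RA <- (RA) + EXTS(DS || 0b00)


# a halfword 0xFFFE at address 0 and a doubleword at address 8
mem = (0xFFFE).to_bytes(2, "little") + bytes(6) \
    + (0x0102030405060708).to_bytes(8, "little")
gpr = [0] * 32
lhaup(gpr, mem, rt=4, ra=3, d=8)       # sign-extending load, then RA += 8
ldup(gpr, mem, rt=5, ra=3, ds=0x3FFE)  # DS || 0b00 = 0xFFF8, i.e. -8
```
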
## Load Byte and Zero with Post-Update

    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)

Special Registers Altered:

## Load Byte and Zero with Post-Update Indexed

    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)

Special Registers Altered:

## Load Halfword and Zero with Post-Update

    RT <- ([0] * (XLEN-16)) || MEM(EA, 2)

Special Registers Altered:

## Load Halfword and Zero with Post-Update Indexed

    RT <- ([0] * (XLEN-16)) || MEM(EA, 2)

Special Registers Altered:

## Load Halfword Algebraic with Post-Update

    RT <- EXTS(MEM(EA, 2))

Special Registers Altered:

## Load Halfword Algebraic with Post-Update Indexed

    RT <- EXTS(MEM(EA, 2))

Special Registers Altered:

## Load Word and Zero with Post-Update

    RT <- [0] * 32 || MEM(EA, 4)

Special Registers Altered:

## Load Word and Zero with Post-Update Indexed

    RT <- [0] * 32 || MEM(EA, 4)

Special Registers Altered:

## Load Word Algebraic with Post-Update Indexed

    RT <- EXTS(MEM(EA, 4))

Special Registers Altered:

## Load Doubleword with Post-Update

    RA <- (RA) + EXTS(DS || 0b00)

Special Registers Altered:

## Load Doubleword with Post-Update Indexed

Special Registers Altered:

# Fixed-Point Store Post-Update

Add the following as a new section in Fixed-Point Store, Book I

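
The store pseudocode in this section uses two names: lower-case `ea` as the address of the memory access and upper-case `EA` as the displaced address. The sketch below models Store Doubleword with Post-Update under one reading of that split. Treating `ea` as `(RA)` and writing `EA` back to RA are assumptions inferred from the load example earlier in this RFC, as are the little-endian byte order and the Python names (`new_ra` stands in for `EA`).

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def exts(value, bits):
    """Sign-extend a `bits`-wide field to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def stdup(gpr, mem, rs, ra, ds):
    ea = gpr[ra]                                     # assumed: ea <- (RA)
    new_ra = (gpr[ra] + exts(ds << 2, 16)) & MASK    # EA <- (RA) + EXTS(DS || 0b00)
    mem[ea:ea + 8] = gpr[rs].to_bytes(8, "little")   # MEM(ea, 8) <- (RS), byte order assumed
    gpr[ra] = new_ra                                 # assumed: RA <- EA


mem = bytearray(16)
gpr = [0] * 32
gpr[4] = 0x1122334455667788
stdup(gpr, mem, rs=4, ra=3, ds=2)   # stores at address 0, then RA becomes 8
gpr[4] = 0x99AABBCCDDEEFF00
stdup(gpr, mem, rs=4, ra=3, ds=2)   # stores at address 8, then RA becomes 16
first = int.from_bytes(mem[0:8], "little")
second = int.from_bytes(mem[8:16], "little")
```
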
## Store Byte with Post-Update

    MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]

Special Registers Altered:

## Store Byte with Post-Update Indexed

    MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]

Special Registers Altered:

## Store Halfword with Post-Update

    MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]

Special Registers Altered:

## Store Halfword with Post-Update Indexed

    MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]

Special Registers Altered:

## Store Word with Post-Update

    MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]

Special Registers Altered:

## Store Word with Post-Update Indexed

    MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]

Special Registers Altered:

## Store Doubleword with Post-Update

    EA <- (RA) + EXTS(DS || 0b00)

Special Registers Altered:

## Store Doubleword with Post-Update Indexed

Special Registers Altered: