# RFC ls011 LD/ST-Update-PostIncrement

* <https://bugs.libre-soc.org/show_bug.cgi?id=1048>
* <https://libre-soc.org/openpower/sv/rfc/ls011/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1045>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>

**Date**: 21 Apr 2023.

**Books and Section affected**:

    Chapter 3 Book I, new Fixed-Point Load / Store Sections 3.3.2 3.3.3
    Chapter 4 Book I, new Floating-Point Load / Store Sections 4.6.2 4.6.3

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

    Addition of new Load/Store Fixed and Floating Point instructions

**Impact on software**:

    Requires support for new instructions in assembler, debuggers, and related tools.
    Reduces instruction count in hot-loops.

Moving the update of RA to *after* the Memory operation saves on instruction count
both outside and inside hot-loops. strncpy may be reduced to 11 Vector instructions,
3 of which are the zeroing loop and 5 of which are the copy loop. Percentage-wise,
LD/ST Update Post-Increment represents a 20% reduction in instruction count
for this example.
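
The saving outside the loop can be illustrated with a minimal Python sketch. It is not taken from the RFC: the function names, the `bytearray` memory model and the fixed stride of 1 are assumptions made purely for illustration. With pre-update addressing both pointers have to be biased before the loop starts, whereas post-update addressing can use the pointers as given.

```python
# Hypothetical model: memory is a bytearray and pointers are plain integers.

def copy_pre_update(mem, src, dst, n):
    """Copy n bytes using pre-update semantics (address = pointer + 1)."""
    # Pre-update accesses pointer+1, so both pointers must be biased first.
    # This is the extra setup work outside the loop.
    src -= 1
    dst -= 1
    for _ in range(n):
        src += 1            # lbzu-style: EA <- (RA) + 1 ; RA <- EA
        byte = mem[src]
        dst += 1            # stbu-style: EA <- (RA) + 1 ; RA <- EA
        mem[dst] = byte
    # Undo the bias to obtain one-past-the-end pointers.
    return src + 1, dst + 1

def copy_post_update(mem, src, dst, n):
    """Copy n bytes using post-update semantics (address = pointer)."""
    for _ in range(n):
        byte = mem[src]     # post-update load: EA <- (RA)
        src += 1            #                   RA <- (RA) + 1
        mem[dst] = byte     # post-update store: EA <- (RA)
        dst += 1            #                    RA <- (RA) + 1
    return src, dst
```

Both functions leave memory in the same state; only the pre-update version needs the pointer adjustments before and after the loop.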

**Notes and Observations**:

These types of instructions are already present in x86 (sort-of).

* x86 chose that store should be pre-indexed and load should be post-indexed
* Power ISA chose everything to be pre-indexed
* Motorola 68000 (decades old) has pre- and post- indexed

<https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>

<https://azeria-labs.com/memory-instructions-load-and-store-part-4/>

Add the following entries to:

* New Load/Store Sections

TODO (key stub notes below)

The LD/ST-Immediate-Post-Increment instructions are all Primary
Opcode only: there are 13 of these. The LD/ST-Indexed-Post-Increment
instructions are all effectively 9-bit XO and consequently may easily
fit into one single Primary Opcode. EXT2xx is recommended.

One alternative idea is that bit 31 could be allocated (retrospectively)
to Post-Increment. Although it may be too late for Scalar Power ISA,
it **may** be possible to consider this for SVP64Single and/or SVP64-Vector,
but it risks creating a non-orthogonal ISA.

    # LD/ST-Postincrement
    lbzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lbzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lhzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lhzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lhaup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lhaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lwzup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lwzupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lwaupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    ldup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    ldupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    stbup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stbupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    sthup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    sthupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stwup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stwupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

    # FP LD/ST-Postincrement
    lfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedload, 1R2W
    lfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    lfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedload, 2R2W
    stfdup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stfsup, ls011, high, PO, yes, EXT2xx, no, isa/pifixedstore, 2R1W
    stfdupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W
    stfsupx, ls011, high, 10, yes, EXT2xx, no, isa/pifixedstore, 3R1W

    # LD/ST-Shifted-Postincrement
    lbzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lhzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lhauspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lwzuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lwauspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lduspx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    stbuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    sthuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stwuspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stduspx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

    # FP LD/ST-Shifted-Postincrement
    lfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    lfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 2R2W
    stfdupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W
    stfsupsx, ls011, med, 10, yes, EXT2xx, no, ls011, 3R1W

Here is an annotated example. The pseudo-code changes only in that `RA`
alone is used as the address: everything else remains the same.
The address arithmetic itself (the sum written back to RA) is unchanged
in all of the Post-Update instructions.

**Load Byte and Zero with Post-Update**

    EA <- (RA)                           # EA just RA
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # then load
    RA <- (RA) + EXTS(D)                 # then update RA after

Special Registers Altered:

For comparison, the existing pseudo-code for `lbzu` is:

    EA <- (RA) + EXTS(D)                 # EA includes D
    RT <- ([0] * (XLEN-8)) || MEM(EA, 1) # load from RA+D
    RA <- EA                             # and update RA
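
The two pseudo-code listings can be compared side by side with a small executable model. This is a sketch only: the register file as a Python list, memory as a `dict` of byte values, and the helper name `exts16` are assumptions for illustration, not part of the RFC.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def exts16(d):
    """Sign-extend a 16-bit D field to a Python integer."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def lbzu(regs, mem, rt, ra, d):
    """Existing pre-update form: the address includes D."""
    ea = (regs[ra] + exts16(d)) & MASK   # EA <- (RA) + EXTS(D)
    regs[rt] = mem[ea] & 0xFF            # zero-extended byte load
    regs[ra] = ea                        # RA <- EA

def lbzup(regs, mem, rt, ra, d):
    """Proposed post-update form: the address is RA alone."""
    ea = regs[ra]                                # EA <- (RA)
    regs[rt] = mem[ea] & 0xFF                    # zero-extended byte load
    regs[ra] = (regs[ra] + exts16(d)) & MASK     # RA <- (RA) + EXTS(D)
```

Starting from the same RA and D, both forms leave the same value in RA; they differ only in which byte is loaded.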

# Fixed-point Load with Post-Update

Add the following additional Section to Fixed-Point Load: Book I 3.3.2.1

## Load Byte and Zero with Post-Update

|0 |6 |9 |10 |11 |16 |31 |

    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)

Let the effective address (EA) be (RA).
The byte in storage addressed by EA is loaded into
RT[56:63]. RT[0:55] are set to 0.

The sum (RA)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.
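
The invalid-form rule can be expressed as a simple decode-time check. The sketch below is illustrative only: the function name is hypothetical, and `rt` and `ra` are assumed to be the 5-bit register numbers taken from the instruction fields.

```python
def post_update_load_form_is_valid(rt: int, ra: int) -> bool:
    """Return True if the register fields form a valid post-update load.

    RA=0 is invalid, and RA=RT is invalid because the loaded value and
    the updated address would both be written to the same register.
    """
    return ra != 0 and ra != rt
```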

Special Registers Altered:

## Load Byte and Zero with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RT | RA | RB | XO | / |

    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)

Let the effective address (EA) be (RA).
The byte in storage addressed by EA is loaded into
RT[56:63]. RT[0:55] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.
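
As a sketch of how the Indexed form might be used, the following Python model gathers every fourth byte of a buffer by holding the stride in RB. The register-file list, the `bytes` memory and the chosen register numbers are assumptions for illustration only.

```python
MASK64 = (1 << 64) - 1

def lbzupx(regs, mem, rt, ra, rb):
    """Model of Load Byte and Zero with Post-Update Indexed."""
    if ra == 0 or ra == rt:
        raise ValueError("invalid instruction form")
    ea = regs[ra]                                  # EA <- (RA)
    regs[rt] = mem[ea] & 0xFF                      # zero-extended byte
    regs[ra] = (regs[ra] + regs[rb]) & MASK64      # RA <- (RA) + (RB)

# Walk a 16-byte buffer with a stride of 4 held in r5.
regs = [0] * 32
mem = bytes(range(16))
regs[3] = 0      # r3: start address
regs[5] = 4      # r5: stride
gathered = []
for _ in range(4):
    lbzupx(regs, mem, 4, 3, 5)
    gathered.append(regs[4])
```

No pointer adjustment is needed before the loop, because the first access uses the start address unmodified.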

Special Registers Altered:

## Load Halfword and Zero with Post-Update

|0 |6 |9 |10 |11 |16 |31 |

    RT <- ([0] * (XLEN-16)) || MEM(EA, 2)

Let the effective address (EA) be (RA).
The halfword in storage addressed by EA is loaded into
RT[48:63]. RT[0:47] are set to 0.

The sum (RA)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

## Load Halfword and Zero with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RT | RA | RB | XO | / |

    RT <- ([0] * (XLEN-16)) || MEM(EA, 2)

Let the effective address (EA) be (RA).
The halfword in storage addressed by EA is loaded into
RT[48:63]. RT[0:47] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

## Load Halfword Algebraic with Post-Update

|0 |6 |9 |10 |11 |16 |31 |

    RT <- EXTS(MEM(EA, 2))
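
The Algebraic forms differ from the "and Zero" forms only in how the loaded value is widened. A minimal Python sketch of `EXTS` follows; the function signature is an assumption for illustration, and the result is returned as an unsigned XLEN-bit pattern.

```python
XLEN = 64

def exts(value, bits, xlen=XLEN):
    """Sign-extend the low `bits` bits of `value` to an xlen-bit pattern."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    # XOR-then-subtract moves the sign bit into Python's integer sign,
    # and the final mask turns it back into an unsigned register image.
    return ((value ^ sign) - sign) & ((1 << xlen) - 1)

halfword = 0x8000                    # most significant bit set
algebraic = exts(halfword, 16)       # what an Algebraic load would place in RT
zero_extended = halfword & 0xFFFF    # what an "and Zero" load would place in RT
```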

Special Registers Altered:

## Load Halfword Algebraic with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RT | RA | RB | XO | / |

    RT <- EXTS(MEM(EA, 2))

Special Registers Altered:

## Load Word and Zero with Post-Update

|0 |6 |9 |10 |11 |16 |31 |

    RT <- [0] * 32 || MEM(EA, 4)

Let the effective address (EA) be (RA).
The word in storage addressed by EA is loaded into
RT[32:63]. RT[0:31] are set to 0.

The sum (RA)+D is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

## Load Word and Zero with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RT | RA | RB | XO | / |

    RT <- [0] * 32 || MEM(EA, 4)

Let the effective address (EA) be (RA).
The word in storage addressed by EA is loaded into
RT[32:63]. RT[0:31] are set to 0.

The sum (RA)+(RB) is placed into register RA.

If RA=0 or RA=RT, the instruction form is invalid.

Special Registers Altered:

## Load Word Algebraic with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RT | RA | RB | XO | / |

    RT <- EXTS(MEM(EA, 4))

Special Registers Altered:

## Load Doubleword with Post-Update

    RA <- (RA) + EXTS(DS || 0b00)
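
The expression `EXTS(DS || 0b00)` appends two zero bits to the 14-bit DS field and sign-extends the resulting 16-bit value, so the displacement is always a multiple of 4. A minimal Python sketch, with a hypothetical function name:

```python
def ds_displacement(ds):
    """Compute EXTS(DS || 0b00) for a 14-bit DS field."""
    value = (ds & 0x3FFF) << 2          # DS || 0b00 gives a 16-bit value
    # Sign-extend from bit 15 of the 16-bit result.
    return value - 0x10000 if value & 0x8000 else value

def post_update_ra(ra_value, ds):
    """New RA value after a DS-form post-update, wrapped to 64 bits."""
    return (ra_value + ds_displacement(ds)) & ((1 << 64) - 1)
```

For instance, a DS field of 2 advances RA by 8 bytes, which is one doubleword.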

Special Registers Altered:

## Load Doubleword with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RT | RA | RB | XO | / |

Special Registers Altered:

# Fixed-Point Store Post-Update

Add the following as a new section in Fixed-Point Store, Book I

## Store Byte with Post-Update

|0 |6 |9 |10 |11 |16 |31 |

    MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]
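
Power ISA numbers bits from the most significant end, so `(RS)[XLEN-8:XLEN-1]` selects the least significant byte of RS. The Python sketch below models that slice and a byte store with post-update. The helper names, the `dict` memory, and passing `d` as an already sign-extended integer are assumptions for illustration.

```python
XLEN = 64

def msb0_slice(value, start, end, xlen=XLEN):
    """Extract bits start..end inclusive, where bit 0 is the most significant."""
    width = end - start + 1
    shift = xlen - 1 - end
    return (value >> shift) & ((1 << width) - 1)

def stbup(regs, mem, rs, ra, d):
    """Model of a byte store with post-update (d already sign-extended)."""
    ea = regs[ra]                                         # ea <- (RA)
    mem[ea] = msb0_slice(regs[rs], XLEN - 8, XLEN - 1)    # low byte of RS
    regs[ra] = (regs[ra] + d) & ((1 << XLEN) - 1)         # RA <- (RA) + EXTS(D)
```

The same slice helper covers the halfword and word stores by changing the start bit to `XLEN-16` or `XLEN-32`.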

Special Registers Altered:

## Store Byte with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RS | RA | RB | XO | / |

    MEM(ea, 1) <- (RS)[XLEN-8:XLEN-1]

Special Registers Altered:

## Store Halfword with Post-Update

|0 |6 |9 |10 |11 |16 |31 |

    MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]

Special Registers Altered:

## Store Halfword with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RS | RA | RB | XO | / |

    MEM(ea, 2) <- (RS)[XLEN-16:XLEN-1]

Special Registers Altered:

## Store Word with Post-Update

|0 |6 |9 |10 |11 |16 |31 |

    MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]

Special Registers Altered:

## Store Word with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RS | RA | RB | XO | / |

    MEM(ea, 4) <- (RS)[XLEN-32:XLEN-1]

Special Registers Altered:

## Store Doubleword with Post-Update

    EA <- (RA) + EXTS(DS || 0b00)

Special Registers Altered:

## Store Doubleword with Post-Update Indexed

|0 |6 |11 |16 |21 |31 |
| PO | RS | RA | RB | XO | / |

Special Registers Altered: