**Motivation**
-Power ISA is missing LD/ST with shift, which is present in both ARM and x86.
-Adding more LD/ST is too complex, a compromise is to add shift-and-add.
-Replaces a pair of explicit instructions in hot-loops.
+Power ISA is missing LD/ST Indexed with shift, which is present in both ARM
+and x86. Adding the full set of shifted LD/ST would take thirty-eight
+instructions; a compromise is to add shift-and-add, which replaces a pair of
+explicit instructions in hot-loops.
**Notes and Observations**:
and zero-extended.
3. Both are 2-in 1-out instructions.
-TODO: signed 32-bit shift-and-add should be added, this needs to be addressed before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
+TODO: signed 32-bit shift-and-add should be added, this needs to be addressed
+before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
**Changes**
\newpage{}
+# Table of LD/ST-Indexed-Shift
+
+The following table lists the alternative instructions that could be
+considered for addition. All use a 9-bit XO, which is not hugely
+costly. The totals are:
+
+* 12 Load Indexed Shifted (with Update)
+* 3 Load Indexed Shifted Byte-reverse
+* 8 Store Indexed Shifted (with Update)
+* 3 Store Indexed Shifted Byte-reverse
+* 6 Floating-Point Load Indexed Shifted (with Update)
+* 6 Floating-Point Store Indexed Shifted (with Update)
+
+Total count: 38 new 9-bit XO instructions, for an approximate total
+XO cost of 3 bits within a single Primary Opcode. Given the savings
+that these instructions represent in hot-loops, as evidenced by their
+inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
+justifiable. However, there is no point in placing these in EXT2xx: they
+need to be in EXT0xx, because if they are added as 64-bit encodings the
+benefit of reduced binary size is not achieved.
+
+| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
+|-------|------|-------|-------|-------|-------|----------|
+| PO | RT | RA | RB | sm | XO | lbzsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lbzusx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhzsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhzusx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhasx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhausx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwzsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwzusx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwasx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwausx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | ldsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | ldusx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhbrsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwbrsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | ldbrsx RT,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stbsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stbusx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | sthsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | sthusx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stwsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stwusx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stdsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stdusx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | sthbrsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stwbrsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stdbrsx RS,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfsxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfsuxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfdxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfduxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfiwaxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfiwzxs FRT,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfsxs FRS,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfsuxs FRS,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfdxs FRS,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfduxs FRS,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfiwxs FRS,RA,RB,sm |
+
+----------------
+
+\newpage{}
+
# Shift-and-Add
`shadd RT, RA, RB, sm`
Pseudocode:
+```
shift <- sm + 1 # Shift is between 1-4
sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
RT <- sum # Result stored in RT
+```
When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.
Examples:
```
-# adds r1 to (r2*8)
-shadd r4, r1, r2, 3
+ # adds r1 to (r2*8): sm=2 gives a shift of 3
+ shadd r4, r1, r2, 2
```
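The pseudocode above can be modelled in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `shadd` and the constant `MASK64` are illustrative only, and it assumes that the sum wraps modulo 2^64 (as `sum[0:63]` suggests) and that `sm` is the raw 2-bit field value 0-3.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as 64-bit unsigned values


def shadd(ra: int, rb: int, sm: int) -> int:
    """Model of shadd: RT = ((RB) << (sm + 1)) + (RA), truncated to 64 bits."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    shift = sm + 1                       # shift amount is between 1 and 4
    return ((rb << shift) + ra) & MASK64  # assumed wrap-around at 64 bits


# sm = 0 multiplies RB by 2 before adding RA
print(shadd(ra=100, rb=5, sm=0))      # 110
# sm = 2 gives a shift of 3, i.e. RB * 8
print(shadd(ra=0x1000, rb=3, sm=2))   # 4120
```

Masking with `MASK64` stands in for the fixed register width; Python integers are otherwise unbounded.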
# Shift-and-Add Unsigned Word
Pseudocode:
+```
shift <- sm + 1 # Shift is between 1-4
n <- (RB)[32:63] # Only use lower 32-bits of RB
sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
RT <- sum # Result stored in RT
+```
When `sm` is zero, the lower word (32 bits) of register RB is multiplied by 2,
added to the contents of register RA, and the result stored in RT.
*Programmer's Note:
This instruction is useful for computing address offsets. RA is the base 64-bit
-address. RB is the offset into data structure limited to 32-bit.
+address. RB is the offset into the data structure, limited to 32 bits.*
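
The address-offset use described in the note can be illustrated with a small Python model. This is a sketch under stated assumptions: the function name `shadd_unsigned_word` is hypothetical (no mnemonic is implied), the sum is assumed to wrap at 64 bits, and the base and index values are invented for illustration.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def shadd_unsigned_word(ra: int, rb: int, sm: int) -> int:
    """Model: RT = (zero-extended low 32 bits of RB << (sm + 1)) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    shift = sm + 1                       # shift amount is between 1 and 4
    n = rb & MASK32                      # (RB)[32:63]: low word, zero-extended
    return ((n << shift) + ra) & MASK64  # assumed wrap-around at 64 bits


base = 0x0000_7FFF_0000_0000             # illustrative 64-bit base address
index = 0xFFFF_FFFF_0000_0010            # upper 32 bits of RB are ignored
print(hex(shadd_unsigned_word(base, index, sm=2)))  # 0x7fff00000080
```

With `sm=2` the 32-bit index 0x10 is scaled by 8, which matches indexing an array of 8-byte elements from a 64-bit base.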
Examples: