# RFC ls004 Shift-And-Add
**URLs**:
*
*
* shift-and-add
* add shaddw:
**Severity**: Major
**Status**: New
**Date**: 31 Oct 2022
**Target**: v3.2B
**Source**: v3.0B
**Books and Section affected**:
```
Book I Fixed-Point Shift Instructions 3.3.14.2
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```
**Summary**
```
Instructions added
shadd - Shift and Add
shaddw - Shift and Add Signed Word
shadduw - Shift and Add Unsigned Word
Also under consideration LD/ST-Indexed-Shifted
```
**Submitter**: Luke Leighton (Libre-SOC)
**Requester**: Libre-SOC
**Impact on processor**:
```
Addition of three new GPR-based instructions
```
**Impact on software**:
```
Requires support for new instructions in assembler, debuggers,
and related tools.
```
**Keywords**:
```
GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```
**Motivation**
Power ISA lacks LD/ST-Indexed with shift, which is present in both ARM
and x86. Adding shifted LD/ST-Indexed variants would cost thirty-eight
instructions, so as a compromise this RFC proposes adding shift-and-add.
Each shift-and-add replaces a pair of explicit instructions (a shift followed
by an add) in hot-loops.
**Notes and Observations**:
1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets,
as its second operand is restricted to the lower 32 bits of RB,
zero-extended.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
since they are useful for both general arithmetic and for
computing addresses even when not immediately followed
by a load/store.
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
to use `int` for array indexing. For additional details see
.
6. Even Motorola 68000 has LD/ST-Indexed-Shifted.
7. Should average-add also be included? What about CA?
**Changes**
Add the following entries to:
* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2
----------------
\newpage{}
# Table of LD/ST-Indexed-Shift
The following table lists the alternative LD/ST-Indexed-Shifted instructions
that could be considered for addition. They all use a 9-bit XO:
* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment
Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot-loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing the 38 instructions of the
Shifted-only group in EXT2xx. They need to be in EXT0xx, because a 64-bit
encoding would forfeit the reduction in binary size that is their benefit.
Post-Increment-Shifted, on the other hand, could reasonably be proposed
in EXT2xx.
| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | sm | XO | lbzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lbzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhasx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhausx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwzsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwzusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwasx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwausx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldusx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhbrsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwbrsx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | ldbrsx RT,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stbsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stbusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdusx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthbrsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwbrsx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stdbrsx RS,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfsxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfsuxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfdxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfduxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfiwaxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfiwzxs FRT,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfsxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfsuxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfdxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfduxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfiwxs FRS,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lbzuspx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhzuspx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lhauspx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwzuspx RT,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lwauspx RT,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stbuspx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | sthuspx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stwuspx RS,RA,RB,sm |
| PO | RS | RA | RB | sm | XO | stduspx RS,RA,RB,sm |
| PO | RT | RA | RB | sm | XO | lduspx RT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfdupxs FRT,RA,RB,sm |
| PO | FRT | RA | RB | sm | XO | lfsupxs FRT,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfdupxs FRS,RA,RB,sm |
| PO | FRS | RA | RB | sm | XO | stfsupxs FRS,RA,RB,sm |
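
The field layout in the table above can be sketched as a small packing routine. This is only an illustration in Python, not part of the proposal: the function name `encode_ldst_shifted` is invented here, and the PO and XO values are placeholders because this RFC does not assign opcodes. Power ISA numbers bits from the most significant end, so PO (bits 0-5) lands in the top six bits of the word.

```python
# Sketch: pack the LD/ST-Indexed-Shifted fields into one 32-bit word.
# Field order and widths follow the table: PO(0-5) RT(6-10) RA(11-15)
# RB(16-20) sm(21-22) XO(23-31), with bit 0 as the most significant bit.

def encode_ldst_shifted(po, rt, ra, rb, sm, xo):
    """Return the 32-bit instruction word for the given field values."""
    fields = [(po, 6), (rt, 5), (ra, 5), (rb, 5), (sm, 2), (xo, 9)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        # Append each field below the ones already packed (MSB-first order).
        word = (word << width) | value
    return word

# HYPOTHETICAL values: the RFC does not assign a Primary Opcode or XO.
PO_PLACEHOLDER = 31
XO_PLACEHOLDER = 1

word = encode_ldst_shifted(PO_PLACEHOLDER, rt=3, ra=4, rb=5, sm=2,
                           xo=XO_PLACEHOLDER)
print(f"{word:#010x}")
print("sm field read back:", (word >> 9) & 0b11)
```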
----------------
\newpage{}
# Shift-and-Add
`shadd RT, RA, RB, sm`
| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | sm | XO | Rc | Z23-Form |
Pseudocode:
```
shift <- sm + 1 # Shift is between 1-4
sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
RT <- sum # Result stored in RT
```
When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.
`sm` is a 2-bit field that selects multiplication of RB by 2, 4, 8 or 16
(a left shift by `sm + 1` bits).
Operands RA and RB and the result RT are all 64-bit unsigned integers.
**NEED EXAMPLES (not sure how to embed sm)!!!**
Examples:
```
# r4 = r1 + (r2*16): sm=3 selects a shift of 4
shadd r4, r1, r2, 3
```
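As an informal cross-check of the pseudocode, the following Python sketch models `shadd` on 64-bit values and compares it with the two-step shift-then-add sequence it is meant to replace. The function names and the explicit 64-bit masking are assumptions made for this illustration. The pseudocode above remains the definition.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as 64-bit unsigned values

def shadd(ra, rb, sm):
    """Model of shadd: RT = ((RB) << (sm + 1)) + (RA), modulo 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    shift = sm + 1                      # shift amount is 1 to 4
    return ((rb << shift) + ra) & MASK64

def shift_then_add(ra, rb, shift):
    """The explicit pair of operations that a single shadd replaces."""
    tmp = (rb << shift) & MASK64        # first instruction: shift left
    return (tmp + ra) & MASK64          # second instruction: add

base, index = 0x1000, 5
# sm=2 means a shift of 3, i.e. an 8-byte element size.
assert shadd(base, index, 2) == shift_then_add(base, index, 3)
print(hex(shadd(base, index, 2)))      # 0x1028
```

With `sm=2` the index is scaled by 8, which is the typical case of indexing an array of 64-bit elements.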
# Shift-and-Add Signed Word
`shaddw RT, RA, RB, sm`
| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | sm | XO | Rc | Z23-Form |
Pseudocode:
```
shift <- sm + 1 # Shift is between 1-4
n <- EXTS64((RB)[32:63]) # Only use lower 32-bits of RB
sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
RT <- sum # Result stored in RT
```
When `sm` is zero, the lower word of register RB is sign-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result is
stored in RT.
`sm` is a 2-bit field that selects multiplication of the sign-extended word
by 2, 4, 8 or 16.
Operand RA and the result RT are 64-bit signed integers. Only the lower
32 bits of RB are used, interpreted as a signed integer.
*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit base
address. RB is a signed offset into a data structure, limited to 32 bits.*
Examples:
```
# r4 = r1 + (r2*16)
shaddw r4, r1, r2, 3
```
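The sign extension matters when the index is a negative C `int`. The Python sketch below models `shaddw` under the assumption that registers hold two's-complement bit patterns. The helper name `exts64_word` is invented for this illustration and stands in for `EXTS64((RB)[32:63])` in the pseudocode.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as 64-bit bit patterns

def exts64_word(rb):
    """Sign-extend the lower 32 bits of rb to a 64-bit bit pattern."""
    low = rb & 0xFFFFFFFF
    if low & 0x80000000:                # sign bit of the word is set
        low |= 0xFFFFFFFF00000000       # replicate it into the upper word
    return low

def shaddw(ra, rb, sm):
    """Model of shaddw: RT = (EXTS64(low word of RB) << (sm + 1)) + (RA)."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    shift = sm + 1
    n = exts64_word(rb)
    return ((n << shift) + ra) & MASK64

base = 0x1000
# Index -1 held as a 32-bit int; the upper word of the register is ignored.
index = 0xDEADBEEF_FFFFFFFF
print(hex(shaddw(base, index, 1)))      # 0xffc, i.e. base - 4
```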
----------------
\newpage{}
# Shift-and-Add Unsigned Word
`shadduw RT, RA, RB, sm`
| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | sm | XO | Rc | Z23-Form |
Pseudocode:
```
shift <- sm + 1 # Shift is between 1-4
n <- (RB)[32:63] # Only use lower 32-bits of RB
sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
RT <- sum # Result stored in RT
```
When `sm` is zero, the lower word of register RB is zero-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result is
stored in RT.
`sm` is a 2-bit field that selects multiplication of the zero-extended word
by 2, 4, 8 or 16.
Operand RA and the result RT are 64-bit unsigned integers. Only the lower
32 bits of RB are used, interpreted as an unsigned integer.
*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit base
address. RB is an unsigned offset into a data structure, limited to 32 bits.*
Examples:
```
# r4 = r1 + (zero-extended low word of r2 * 8): sm=2 selects a shift of 3
shadduw r4, r1, r2, 2
```
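For comparison, a Python sketch of `shadduw` follows. As before, the function name and the masking are assumptions of this illustration rather than part of the proposal. The only step specific to this instruction is that the lower word of RB is zero-extended, so the upper 32 bits of RB never contribute.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as 64-bit unsigned values

def shadduw(ra, rb, sm):
    """Model of shadduw: RT = (zero-extended low word of RB << (sm + 1)) + (RA)."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    shift = sm + 1
    n = rb & 0xFFFFFFFF                 # keep only the lower 32 bits of RB
    return ((n << shift) + ra) & MASK64

base = 0x1000
# Matches the example above: sm=2 scales the index by 8.
print(hex(shadduw(base, 3, 2)))                      # 0x1018
# Low word 0xFFFFFFFF is treated as 4294967295, not as -1.
print(hex(shadduw(base, 0xDEADBEEF_FFFFFFFF, 1)))    # 0x400000ffc
```

The second call shows why the choice between the signed and unsigned word forms matters: a low word of `0xFFFFFFFF` read as a signed `int` would mean -1, whereas here it produces a large positive offset.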
# Appendices
* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23 | I | # | 3.0B | shadd | Shift-and-Add |
| Z23 | I | # | 3.0B | shaddw | Shift-and-Add Signed Word |
| Z23 | I | # | 3.0B | shadduw | Shift-and-Add Unsigned Word |
[[!tag opf_rfc]]