# RFC ls004  Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* <https://git.openpower.foundation/isa/PowerISA/issues/125>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    shadd - Shift and Add
    shaddw - Shift and Add Signed Word
    shadduw - Shift and Add Unsigned Word
    Also under consideration LD/ST-Indexed-Shifted
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of three new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA lacks LD/ST Indexed with shift, an addressing mode present in both
ARM and x86.  Adding shifted variants of the LD/ST Indexed instructions would
require thirty-seven new instructions; adding shift-and-add is a compromise.
It replaces a pair of explicit instructions (a shift followed by an add) in
hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets,
    as the second operand (RB) is constrained to its lower 32 bits
    and zero-extended.
3. All three are 2-in 1-out instructions.
4. shift-add operations are present in both x86 and aarch64,
    since they are useful for both general arithmetic and for
    computing addresses even when not immediately followed
    with a load/store.
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers
    commonly use `int` for array indexing. For additional details see
    <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
6. Even the Motorola 68000 family (from the MC68020 onwards) has LD/ST-Indexed-Shifted: <https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>
7. Should average-add also be included? What about CA?

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following demonstrates the alternative instructions that could
be considered to be added. They are all 9-bit XO:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment

Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode.  Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable.  However, there is no point in placing the 37
Shifted-only instructions in EXT2xx: they need to be in EXT0xx, because if
they are added as 64-bit encodings the benefit of reduced binary size is not
achieved.  The Post-Increment-Shifted group, on the other hand, could
reasonably be proposed in EXT2xx.

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhasx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhausx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwasx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwausx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldsx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldusx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhbrsx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwbrsx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldbrsx RT,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthbrsx RS,RA,RB,sm  |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwbrsx RS,RA,RB,sm  |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdbrsx RS,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsxs FRT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsuxs FRT,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdxs FRT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfduxs FRT,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwaxs FRT,RA,RB,sm |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwzxs FRT,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsxs FRS,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsuxs FRS,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdxs FRS,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfduxs FRS,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfiwxs FRS,RA,RB,sm |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzuspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzuspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhauspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzuspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwauspx RT,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbuspx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthuspx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwuspx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stduspx RS,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lduspx RT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdupxs FRT,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsupxs FRT,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdupxs FRS,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsupxs FRS,RA,RB,sm |

----------------

\newpage{}

# Shift-and-Add

`shadd RT, RA, RB, sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
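
The field layout in the table above can be sketched as a small Python packer, which also shows where `sm` sits in the instruction word. This is an illustrative sketch only: the helper names `encode_z23` and `decode_sm` are hypothetical, and the PO and XO values passed in are placeholders, since this RFC does not assign them here. Bit positions follow the MSB0 numbering used in the table.

```python
# Sketch only: PO and XO values below are placeholders, not assigned opcodes.

def encode_z23(po, rt, ra, rb, sm, xo, rc=0):
    """Pack Z23-Form fields into a 32-bit word (MSB0 bit numbering)."""
    fields = [
        (po, 6),  # bits 0-5
        (rt, 5),  # bits 6-10
        (ra, 5),  # bits 11-15
        (rb, 5),  # bits 16-20
        (sm, 2),  # bits 21-22
        (xo, 8),  # bits 23-30
        (rc, 1),  # bit 31
    ]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        # Fields are listed most-significant first, so shift then append.
        word = (word << width) | value
    return word


def decode_sm(word):
    """Extract sm: MSB0 bits 21-22 are LSB0 bits 10-9."""
    return (word >> 9) & 0b11


# Placeholder PO=1 and XO=0x55; operands as in "shadd r4, r1, r2, 3".
word = encode_z23(po=1, rt=4, ra=1, rb=2, sm=3, xo=0x55)
print(decode_sm(word))  # 3
```

The point of the sketch is that `sm` is an ordinary 2-bit immediate between RB and XO, so an assembler would take it as the fourth operand.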

Pseudocode:

```
    shift <- sm + 1                     # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
    RT <- sum                           # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit field, and allows multiplication of RB by 2, 4, 8, or 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.
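
As a cross-check of the pseudocode, the behaviour of `shadd` can be modelled in a few lines of Python. This is a minimal sketch, assuming that the `sum[0:63]` assignment simply discards any carry out of bit 0 (MSB0); the function name and the `MASK64` constant are illustrative only.

```python
MASK64 = (1 << 64) - 1


def shadd(ra, rb, sm):
    """Model of shadd: RT = ((RB) << (sm + 1)) + (RA), truncated to 64 bits."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1  # shift amount is 1..4
    return (((rb & MASK64) << shift) + (ra & MASK64)) & MASK64


# sm=2 selects a shift of 3, so RB is multiplied by 8.
print(hex(shadd(0x1000, 5, 2)))  # 0x1028
```

With `sm` equal to 0, 1, 2 and 3 the model scales RB by 2, 4, 8 and 16 respectively, which is the same result as a shift-left-immediate followed by an add.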

Examples:

```
    # r4 = r1 + (r2*16): sm=3 selects a shift of 4
    shadd r4, r1, r2, 3
```

# Shift-and-Add Signed Word

`shaddw RT, RA, RB, sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                  # Shift is between 1-4
    n <- EXTS64((RB)[32:63])         # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
    RT <- sum                        # Result stored in RT
```

When `sm` is zero, the low-order word of register RB is sign-extended to
64 bits, multiplied by 2, and added to the contents of register RA; the
result is stored in RT.

`sm` is a 2-bit field, and allows multiplication of the sign-extended
low-order word of RB by 2, 4, 8, or 16.

Operand RA and the result RT are 64-bit integers. Only the low-order 32 bits
of RB are used, interpreted as a signed integer.

*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit base
address. RB is the offset into the data structure, limited to a signed 32-bit value.*
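
The effect of the sign extension can be seen in a short Python model of the pseudocode. This is a sketch under the same assumption as the pseudocode's `sum[0:63]` (the result wraps modulo 2^64); `exts32`, `shaddw` and `MASK64` are illustrative names, not part of the proposal.

```python
MASK64 = (1 << 64) - 1


def exts32(value):
    """Sign-extend the low 32 bits of value, returning a Python int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def shaddw(ra, rb, sm):
    """Model of shaddw: RT = (EXTS64(RB[32:63]) << (sm + 1)) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1
    n = exts32(rb)  # upper 32 bits of RB are ignored
    return ((n << shift) + (ra & MASK64)) & MASK64


# A C `int` index of -1 (0xFFFFFFFF in the low word), scaled by 4:
print(hex(shaddw(0x2000, 0xFFFFFFFF, 1)))  # 0x1ffc
```

Here a negative 32-bit index moves the address backwards from the base, which is the behaviour expected when indexing with a signed `int`.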

Examples:

```
# r4 = r1 + (r2*16)
shaddw r4, r1, r2, 3
```

----------------

\newpage{}


# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                  # Shift is between 1-4
    n <- (RB)[32:63]                 # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
    RT <- sum                        # Result stored in RT
```

When `sm` is zero, the low-order word of register RB is zero-extended to
64 bits, multiplied by 2, and added to the contents of register RA; the
result is stored in RT.

`sm` is a 2-bit field, and allows multiplication of the zero-extended
low-order word of RB by 2, 4, 8, or 16.

Operand RA and the result RT are 64-bit, unsigned integers. Only the
low-order 32 bits of RB are used, interpreted as an unsigned integer.

*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit base
address. RB is the offset into the data structure, limited to an unsigned 32-bit value.*
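
For comparison with the signed variant, a Python model of the `shadduw` pseudocode is sketched here. It assumes that the 32-bit value `n` is zero-extended before the shift and that the result wraps modulo 2^64; the names `shadduw` and `MASK64` are illustrative only.

```python
MASK64 = (1 << 64) - 1


def shadduw(ra, rb, sm):
    """Model of shadduw: RT = (RB[32:63] << (sm + 1)) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1
    n = rb & 0xFFFFFFFF  # keep the low word only; upper bits become zero
    return ((n << shift) + (ra & MASK64)) & MASK64


# The same low-word bit pattern 0xFFFFFFFF is now a large positive offset:
print(hex(shadduw(0x2000, 0xFFFFFFFF, 1)))  # 0x400001ffc
```

The low-word pattern `0xFFFFFFFF` is treated as 4294967295 rather than -1, so the result lies above the base address.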

Examples:

```
# r4 = r1 + (zero-extended low word of r2)*8
shadduw r4, r1, r2, 2
```

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add |
| Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |

[[!tag opf_rfc]]