From: Luke Kenneth Casson Leighton
Date: Wed, 12 Apr 2023 08:41:00 +0000 (+0100)
Subject: update ls004, add table of 38 LD/ST shift-indexed instructions
X-Git-Tag: opf_rfc_ls010_v1~59
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=9ca0101bb441ed8f81b1e2f0f86538942a73d0c0;p=libreriscv.git

update ls004, add table of 38 LD/ST shift-indexed instructions
---

diff --git a/openpower/sv/rfc/ls004.mdwn b/openpower/sv/rfc/ls004.mdwn
index 008ab56f9..fe8c1513e 100644
--- a/openpower/sv/rfc/ls004.mdwn
+++ b/openpower/sv/rfc/ls004.mdwn
@@ -62,9 +62,9 @@

 **Motivation**

-Power ISA is missing LD/ST with shift, which is present in both ARM and x86.
-Adding more LD/ST is too complex, a compromise is to add shift-and-add.
-Replaces a pair of explicit instructions in hot-loops.
+Power ISA is missing LD/ST Indexed with shift, which is present in both ARM
+and x86. Adding more LD/ST is thirty eight instructions, a compromise is to
+add shift-and-add. Replaces a pair of explicit instructions in hot-loops.

 **Notes and Observations**:

@@ -74,7 +74,8 @@ Replaces a pair of explicit instructions in hot-loops.
 and zero-extended.
 3. Both are 2-in 1-out instructions.

-TODO: signed 32-bit shift-and-add should be added, this needs to be addressed before submitting the RFC:
+TODO: signed 32-bit shift-and-add should be added, this needs to be addressed
+before submitting the RFC:

 **Changes**

@@ -87,6 +88,71 @@ Add the following entries to:

 \newpage{}

+# Table of LD/ST-Indexed-Shift
+
+The following demonstrates the alternative instructions that could
+be considered to be added. They are all 9-bit XO which is not hugely
+costly. The totals are
+
+* 12 Load Indexed Shifted (with Update)
+* 3 Load Indexed Shifted Byte-reverse
+* 8 Store Indexed Shifted (with Update)
+* 3 Store Indexed Shifted Byte-reverse
+* 6 Floating-Point Load Indexed Shifted (with Update)
+* 6 Floating-Point Store Indexed Shifted (with Update)
+
+Total count: 38 new 9-bit XO instructions, for an approximate total
+XO cost of 3 bits within a single Primary Opcode. With the savings
+that these instructions represent in hot-loops, as evidenced by their
+inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
+justifiable. However there is no point in placing these in EXT2xx, they
+need to be in EXT0xx, because if added as 64-bit Encoding the benefit
+reduction in binary size is not achieved.
+
+| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
+|-------|------|-------|-------|-------|-------|----------|
+| PO | RT | RA | RB | sm | XO | lbzsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lbzusx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhzsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhzusx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhasx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhausx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwzsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwzusx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwasx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwausx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | ldsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | ldusx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lhbrsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | lwbrsx RT,RA,RB,sm |
+| PO | RT | RA | RB | sm | XO | ldbrsx RT,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stbus RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stbusx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | sthsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | sthusx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stwsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stwusx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stdsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stdusx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | sthbrsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stwbrsx RS,RA,RB,sm |
+| PO | RS | RA | RB | sm | XO | stdbrsx RS,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfsxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfsuxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfdxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfduxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfiwaxs FRT,RA,RB,sm |
+| PO | FRT | RA | RB | sm | XO | lfiwzxs FRT,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfsxs FRS,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfsuxs FRS,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfdxs FRS,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfduxs FRS,RA,RB,sm |
+| PO | FRS | RA | RB | sm | XO | stfiwxs FRS,RA,RB,sm |
+
+----------------
+
+\newpage{}
+
 # Shift-and-Add

 `shadd RT, RA, RB`
@@ -97,9 +163,11 @@ Add the following entries to:

 Pseudocode:

+```
 shift <- sm + 1 # Shift is between 1-4
 sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
 RT <- sum # Result stored in RT
+```

 When `sm` is zero, the contents of register RB are multiplied by 2,
 added to the contents of register RA, and the result stored in RT.
@@ -112,8 +180,8 @@ Operands RA and RB, and the result RT are all 64-bit, unsigned integers.
 Examples:

 ```
-# adds r1 to (r2*8)
-shadd r4, r1, r2, 3
+ # adds r1 to (r2*8)
+ shadd r4, r1, r2, 3
 ```

 # Shift-and-Add Unsigned Word
@@ -126,10 +194,12 @@ shadd r4, r1, r2, 3

 Pseudocode:

+```
 shift <- sm + 1 # Shift is between 1-4
 n <- (RB)[32:63] # Only use lower 32-bits of RB
 sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
 RT <- sum # Result stored in RT
+```

 When `sm` is zero, the lower word contents of register RB are multiplied by 2,
 added to the contents of register RA, and the result stored in RT.
@@ -140,7 +210,7 @@ Operands RA and RB, and the result RT are all 64-bit, unsigned integers.

 *Programmer's Note:
 The advantage of this instruction is doing address offsets. RA is the base 64-bit
-address. RB is the offset into data structure limited to 32-bit.
+address. RB is the offset into data structure limited to 32-bit.*

 Examples:

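Review note on the two pseudocode blocks in this patch: the following is a minimal Python sketch of the shift-and-add semantics as literally written there, offered as a cross-check rather than as a reference model. The function names `shadd` and `shadduw` and the constant `MASK64` are illustrative only, and truncation of the sum to 64 bits is an assumption read from `sum[0:63]`.

```python
MASK64 = (1 << 64) - 1  # assumption: sum[0:63] means the result wraps at 64 bits


def shadd(ra: int, rb: int, sm: int) -> int:
    """Sketch of: shift <- sm + 1; RT <- ((RB) << shift) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field, expected 0..3")
    shift = sm + 1  # shift amount is between 1 and 4
    return ((rb << shift) + ra) & MASK64


def shadduw(ra: int, rb: int, sm: int) -> int:
    """Sketch of the unsigned-word form: only the low 32 bits of RB are used."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field, expected 0..3")
    shift = sm + 1
    n = rb & 0xFFFF_FFFF  # (RB)[32:63] in MSB0 numbering is the low word
    return ((n << shift) + ra) & MASK64


if __name__ == "__main__":
    base = 0x1000
    print(hex(shadd(base, 3, 0)))    # base + 3*2
    print(hex(shadd(base, 3, 2)))    # base + 3*8
    print(hex(shadduw(base, 0xFFFF_FFFF_0000_0002, 0)))  # upper word of RB ignored
```

Under this literal reading, `sm=2` scales RB by 8 and `sm=3` scales it by 16. The patch does not state whether the fourth assembler operand is `sm` itself or the shift amount, so the sketch takes `sm` directly.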