update ls004, add table of 38 LD/ST shift-indexed instructions

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Wed, 12 Apr 2023 08:41:00 +0000 (09:41 +0100)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Wed, 12 Apr 2023 08:41:00 +0000 (09:41 +0100)
diff --git a/openpower/sv/rfc/ls004.mdwn b/openpower/sv/rfc/ls004.mdwn

index 008ab56f92e45b678bfb91bfd5ebcd591b5c5d0e..fe8c1513e2fb331c4f27c68e9e8a0bf061df057f 100644
--- a/openpower/sv/rfc/ls004.mdwn
+++ b/openpower/sv/rfc/ls004.mdwn
@@ -62,9 +62,9 @@
  
  **Motivation**
  
-Power ISA is missing LD/ST with shift, which is present in both ARM and x86.
-Adding more LD/ST is too complex, a compromise is to add shift-and-add.
-Replaces a pair of explicit instructions in hot-loops.
+Power ISA is missing LD/ST Indexed with shift, which is present in both ARM
+and x86.  Adding these would take thirty-eight new instructions, so a compromise
+is to add shift-and-add, which replaces a pair of explicit instructions in hot-loops.
  
  **Notes and Observations**:
  
@@ -74,7 +74,8 @@ Replaces a pair of explicit instructions in hot-loops.
     and zero-extended.
  3. Both are 2-in 1-out instructions.
  
-TODO: signed 32-bit shift-and-add should be added, this needs to be addressed before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
+TODO: signed 32-bit shift-and-add should be added, this needs to be addressed
+before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
  
  **Changes**
  
@@ -87,6 +88,71 @@ Add the following entries to:
  
  \newpage{}
  
+# Table of LD/ST-Indexed-Shift
+
+The following table lists the alternative LD/ST instructions that could
+be considered for addition. They all use a 9-bit XO, which is not hugely
+costly.  The totals are:
+
+* 12 Load Indexed Shifted (with Update)
+* 3 Load Indexed Shifted Byte-reverse
+* 8 Store Indexed Shifted (with Update)
+* 3 Store Indexed Shifted Byte-reverse
+* 6 Floating-Point Load Indexed Shifted (with Update)
+* 6 Floating-Point Store Indexed Shifted (with Update)
+
+Total count: 38 new 9-bit XO instructions, for an approximate total
+XO cost of 3 bits within a single Primary Opcode.  With the savings
+that these instructions represent in hot-loops, as evidenced by their
+inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
+justifiable.  However, there is no point in placing these in EXT2xx: they
+need to be in EXT0xx, because if added as 64-bit Encoding the benefit
+of reduced binary size is not achieved.
+
+|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction     |
+|-------|------|-------|-------|-------|-------|----------|
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzsx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzusx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzsx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzusx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhasx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhausx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzsx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzusx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwasx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwausx RT,RA,RB,sm  |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldsx RT,RA,RB,sm   |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldusx RT,RA,RB,sm   |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhbrsx RT,RA,RB,sm   |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwbrsx RT,RA,RB,sm   |
+|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldbrsx RT,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbsx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbusx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthsx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthusx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwsx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwusx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdsx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdusx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthbrsx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwbrsx RS,RA,RB,sm   |
+|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdbrsx RS,RA,RB,sm   |
+|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsxs FRT,RA,RB,sm  |
+|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsuxs FRT,RA,RB,sm  |
+|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdxs FRT,RA,RB,sm  |
+|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfduxs FRT,RA,RB,sm  |
+|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwaxs FRT,RA,RB,sm  |
+|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwzxs FRT,RA,RB,sm  |
+|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsxs FRS,RA,RB,sm   |
+|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsuxs FRS,RA,RB,sm   |
+|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdxs FRS,RA,RB,sm   |
+|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfduxs FRS,RA,RB,sm   |
+|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfiwxs FRS,RA,RB,sm   |
+
+----------------
+
+\newpage{}
+
  # Shift-and-Add
  
  `shadd RT, RA, RB`
@@ -97,9 +163,11 @@ Add the following entries to:
  
  Pseudocode:
  
+```
      shift <- sm + 1                                # Shift is between 1-4
      sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
      RT <- sum                                      # Result stored in RT
+```
  
  When `sm` is zero, the contents of register RB are multiplied by 2,
  added to the contents of register RA, and the result stored in RT.
@@ -112,8 +180,8 @@ Operands RA and RB, and the result RT are all 64-bit, unsigned integers.
  Examples:
  
  ```
-# adds r1 to (r2*8)
-shadd r4, r1, r2, 3
+    # adds r1 to (r2*16): sm=3 gives shift = sm + 1 = 4
+    shadd r4, r1, r2, 3
  ```
  
  # Shift-and-Add Unsigned Word
@@ -126,10 +194,12 @@ shadd r4, r1, r2, 3
  
  Pseudocode:
  
+```
      shift <- sm + 1                                    # Shift is between 1-4
      n <- (RB)[32:63]                           # Only use lower 32-bits of RB
      sum[0:63] <- (n << shift) + (RA)    # Shift n, add RA
      RT <- sum                                      # Result stored in RT
+```
  
  When `sm` is zero, the lower word contents of register RB are multiplied by 2,
  added to the contents of register RA, and the result stored in RT.
@@ -140,7 +210,7 @@ Operands RA and RB, and the result RT are all 64-bit, unsigned integers.
  
  *Programmer's Note:
  The advantage of this instruction is doing address offsets. RA is the base 64-bit
-address. RB is the offset into data structure limited to 32-bit.
+address. RB is the offset into data structure limited to 32-bit.*
  
  Examples:
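
The two pseudocode blocks touched by this commit can be modelled in a few lines of Python, which may help when checking the shift range and the 32-bit truncation. This is a minimal sketch, not part of the RFC: the function names `shadd` and `shadduw` mirror the mnemonics, but the Python signatures, the mask constants and the `ValueError` on an out-of-range `sm` are assumptions made for illustration. It also assumes `sm` is the 2-bit field shown in the table (values 0 to 3) and that the sum wraps at 64 bits.

```python
MASK64 = (1 << 64) - 1  # 64-bit register width (assumed wrap-around on overflow)
MASK32 = (1 << 32) - 1  # lower word, i.e. (RB)[32:63] in Power ISA bit numbering


def _shift_amount(sm):
    """Return the shift (1-4) for a 2-bit sm field (0-3)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is assumed to be a 2-bit field (0-3)")
    return sm + 1


def shadd(ra, rb, sm):
    """Model of shadd: RT <- ((RB) << (sm + 1)) + (RA)."""
    shift = _shift_amount(sm)
    return (((rb & MASK64) << shift) + (ra & MASK64)) & MASK64


def shadduw(ra, rb, sm):
    """Model of shadduw: only the lower 32 bits of RB are shifted."""
    shift = _shift_amount(sm)
    n = rb & MASK32  # zero-extended lower word of RB
    return ((n << shift) + (ra & MASK64)) & MASK64
```

With this model, `shadd(base, index, 2)` computes `base + index * 8`, the address of an 8-byte array element, which is the pair of shift and add instructions that the proposal folds into one.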