add example of GPR-wrapping

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Mon, 3 Apr 2023 14:06:20 +0000 (15:06 +0100)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Mon, 3 Apr 2023 14:06:20 +0000 (15:06 +0100)
diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn

index a4980c339eb077e2246c3553cc6f4cc0a1908849..b439f5130b642764eef230437f917a570f73b4f3 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -233,9 +233,24 @@ From this Canonical definition it should be clear that sequential elements begin
  at the LSB end of any given underlying Scalar GPR, progress to the MSB end, and
  then to the LSB end of the *next numerically-larger Scalar GPR*.  In the
  example above if VL=5 and RT=1 then the contents of GPR(1) and GPR(2) will
-be as follows:
+be as follows.  For clarity in the table below:
  
+* Both MSB0-ordered *and* LSB0-ordered bit numbering are shown
+* The GPR numbering is considered LSB0-ordered
+* The element numbering (result0 to result4) is LSB0-ordered
  
+```
+    | MSB0:  | 0:15    | 16:31   | 32:47   | 48:63   |
+    | LSB0:  | 63:48   | 47:32   | 31:16   | 15:0    |
+    |--------|---------|---------|---------|---------|
+    | GPR(1) | result3 | result2 | result1 | result0 |
+    | GPR(2) | same    | same    | same    | result4 |
+```
+
+Note that the upper 48 bits of GPR(2) would **not** be modified, because
+the example has VL=5.  Thus on "wrapping" (sequential progression from
+GPR(1) into GPR(2)) the 5th result modifies
+**only** the bottom 16 bits of GPR(2).
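
The packing in the example above (VL=5, RT=1, 16-bit elements) can be sketched in Python as follows. This is a minimal illustration, assuming XLEN=64 and element widths that divide 64 evenly; the names `element_location`, `lsb0_to_msb0` and `write_elements` are hypothetical and do not come from the specification or from any simulator. The masked update stands in for the byte-level write-enables, so bits outside each element are left untouched.

```python
XLEN = 64  # width of one Scalar GPR in bits (assumed)


def element_location(rt: int, idx: int, elwidth: int) -> tuple[int, int, int]:
    """Hypothetical helper: return (gpr, lsb0_low, lsb0_high) for element idx."""
    per_gpr = XLEN // elwidth        # how many elements fit in one GPR
    gpr = rt + idx // per_gpr        # wrap into the next numerically-larger GPR
    low = (idx % per_gpr) * elwidth  # elements start at the LSB end of each GPR
    return gpr, low, low + elwidth - 1


def lsb0_to_msb0(low: int, high: int) -> tuple[int, int]:
    """Convert an LSB0 bit range to the equivalent MSB0 bit range."""
    return XLEN - 1 - high, XLEN - 1 - low


def write_elements(regs: list[int], rt: int, results: list[int],
                   elwidth: int) -> list[int]:
    """Write each result into its slot, leaving all other bits unmodified."""
    field = (1 << elwidth) - 1
    for idx, value in enumerate(results):
        gpr, low, _ = element_location(rt, idx, elwidth)
        mask = field << low
        # clear only this element's bits, then insert the new value
        regs[gpr] = (regs[gpr] & ~mask) | ((value & field) << low)
    return regs


if __name__ == "__main__":
    for i in range(5):
        gpr, lo, hi = element_location(1, i, 16)
        m_lo, m_hi = lsb0_to_msb0(lo, hi)
        print(f"result{i}: GPR({gpr}) LSB0 {hi}:{lo} MSB0 {m_lo}:{m_hi}")
```

Starting from registers filled with all-ones and writing results `0x0000, 0x1111, 0x2222, 0x3333, 0x4444` with RT=1 leaves GPR(2) as `0xFFFF_FFFF_FFFF_4444`: only its bottom 16 bits change.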
  
  Hardware Architectural note: to avoid a Read-Modify-Write at the register
  file it is strongly recommended to implement byte-level write-enable lines
@@ -249,7 +264,8 @@ predicate mask bit corresponds directly with one single byte-level
  write-enable line.  It is up to the Hardware Architect to then amortise
  (merge) elements together into both PredicatedSIMD Pipelines as well
  as simultaneous non-overlapping Register File writes, to achieve High
-Performance designs.
+Performance designs.  Overall it helps to think of the register files
+as being akin to a byte-addressable SRAM.
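
The byte-level write-enable idea can be sketched as follows. This is a hedged illustration rather than a description of any real hardware. It assumes 64-bit GPRs, element widths that are a multiple of 8 bits, and one predicate bit per element, and `byte_write_enables` is a hypothetical name. Each active element asserts `elwidth/8` byte-enable lines, and enables landing in the same GPR are merged so that one register-file write can carry several non-overlapping elements.

```python
XLEN_BYTES = 8  # bytes per 64-bit GPR (assumed)


def byte_write_enables(rt: int, vl: int, elwidth: int, pred: int) -> dict[int, int]:
    """Hypothetical helper: map GPR number -> 8-bit byte write-enable mask."""
    bytes_per_el = elwidth // 8
    enables: dict[int, int] = {}
    for idx in range(vl):
        if not (pred >> idx) & 1:
            continue  # masked-out element: no write-enable lines asserted
        first_byte = idx * bytes_per_el          # byte offset from start of GPR(rt)
        gpr = rt + first_byte // XLEN_BYTES      # wrap into the next GPR
        lanes = ((1 << bytes_per_el) - 1) << (first_byte % XLEN_BYTES)
        enables[gpr] = enables.get(gpr, 0) | lanes  # merge into one write
    return enables
```

With 8-bit elements the predicate bits map one-to-one onto byte enables; for example, `byte_write_enables(1, 5, 8, 0b11111)` asserts the five lowest byte lanes of GPR(1).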
  
  **Comparative equivalent using VSR registers**