From 491f5fbf3a480a3be4018e03a0ba195d65902199 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Thu, 11 May 2023 17:16:36 +0100
Subject: [PATCH]

---
 openpower/sv/svp64.mdwn | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn
index 0f198f13e..e0c0a9ee0 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -372,21 +372,6 @@ the example having VL=5.  Thus on "wrapping" - sequential progression
 from GPR(1) into GPR(2) - the 5th result modifies **only** the bottom
 16 LSBs of GPR(1).
 
-*Engineering note: to avoid a Read-Modify-Write at the register
-file it is strongly recommended to implement byte-level write-enable lines
-exactly as has been implemented in DRAM ICs for many decades. Additionally
-the predicate mask bit is advised to be associated with the element
-operation and alongside the result ultimately passed to the register file.
-When element-width is set to 64-bit the relevant predicate mask bit
-may be repeated eight times and pull all eight write-port byte-level
-lines HIGH. Clearly when element-width is set to 8-bit the relevant
-predicate mask bit corresponds directly with one single byte-level
-write-enable line.  It is up to the Hardware Architect to then amortise
-(merge) elements together into both PredicatedSIMD Pipelines as well
-as simultaneous non-overlapping Register File writes, to achieve High
-Performance designs.  Overall it helps to think of the GPR and FPR
-register files as being much more akin to a 64-bit-wide byte-level-addressable SRAM.*
-
 If the 16-bit operation were to be followed up with a 32-bit Vectorised
 Operation, the exact same contents would be viewed as follows:
 
@@ -410,6 +395,21 @@ form because `MSR.LE` is directly in control of the Memory-to-Register
 byte-ordering. This section is exclusively about how to correctly perceive
 Simple-V-Augmented **Register** Files.
 
+*Engineering note: to avoid a Read-Modify-Write at the register
+file it is strongly recommended to implement byte-level write-enable lines,
+exactly as DRAM ICs have done for many decades. Additionally, the predicate
+mask bit should be associated with the element operation and travel
+alongside the result all the way to the register file.
+When element-width is set to 64-bit the relevant predicate mask bit
+may be repeated eight times, pulling all eight byte-level write-enable
+lines of the write port HIGH. When element-width is set to 8-bit the
+predicate mask bit corresponds directly to one single byte-level
+write-enable line, and 16-bit or 32-bit elements drive two or four lines.
+It is then up to the Hardware Architect to amortise (merge) elements
+together into both Predicated SIMD Pipelines and simultaneous
+non-overlapping Register File writes, to achieve High Performance designs.
+Overall it helps to think of the GPR and FPR register files as being much more akin to a 64-bit-wide byte-level-addressable SRAM.*
+
 **Comparative equivalent using VSR registers**
 
 For a comparative data point the VSR Registers may be expressed in the
@@ -425,7 +425,6 @@ element (numbered zero) being at the bitwise-numbered **LSB** end of the
 register, where VSX does the reverse: places the numerically-*highest*
 (last-numbered) element at the LSB end of the register.
 
-
 ```
     #pragma pack
     typedef union {
-- 
2.30.2