from GPR(1) into GPR(2) - the 5th result modifies **only** the bottom
16 LSBs of GPR(1).
-*Engineering note: to avoid a Read-Modify-Write at the register
-file it is strongly recommended to implement byte-level write-enable lines
-exactly as has been implemented in DRAM ICs for many decades. Additionally
-the predicate mask bit is advised to be associated with the element
-operation and alongside the result ultimately passed to the register file.
-When element-width is set to 64-bit the relevant predicate mask bit
-may be repeated eight times and pull all eight write-port byte-level
-lines HIGH. Clearly when element-width is set to 8-bit the relevant
-predicate mask bit corresponds directly with one single byte-level
-write-enable line. It is up to the Hardware Architect to then amortise
-(merge) elements together into both PredicatedSIMD Pipelines as well
-as simultaneous non-overlapping Register File writes, to achieve High
-Performance designs. Overall it helps to think of the GPR and FPR
-register files as being much more akin to a 64-bit-wide byte-level-addressable SRAM.*
-
If the 16-bit operation were to be followed up with a 32-bit Vectorised
Operation, the exact same contents would be viewed as follows:
byte-ordering. This section is exclusively about how to correctly perceive
Simple-V-Augmented **Register** Files.
+*Engineering note: to avoid a Read-Modify-Write at the register
+file it is strongly recommended to implement byte-level write-enable lines,
+exactly as DRAM ICs have done for many decades. Additionally,
+it is advised that the predicate mask bit travel with the element
+operation and, alongside the result, ultimately be passed to the register file.
+When element-width is set to 64-bit the relevant predicate mask bit
+may be repeated eight times to pull all eight write-port byte-level
+lines HIGH. When element-width is set to 8-bit the relevant
+predicate mask bit corresponds directly to a single byte-level
+write-enable line. It is then up to the Hardware Architect to amortise
+(merge) elements together, both into PredicatedSIMD Pipelines and
+into simultaneous non-overlapping Register File writes, to achieve High
+Performance designs. Overall it helps to think of the GPR and FPR
+register files as being much more akin to a 64-bit-wide byte-level-addressable SRAM.*
+
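The engineering note above can be illustrated with a small software model. This is a minimal sketch, not part of the specification: the helper names `byte_write_enables` and `write_row` are hypothetical, element 0 is assumed to sit at the LSB end of the 64-bit row, and the element width is assumed to be 1, 2, 4 or 8 bytes. The software model necessarily reads the old row in order to merge bytes; in hardware the write-enable lines perform the same selection without any read.

```c
#include <stdint.h>

/* Hypothetical helper: expand per-element predicate bits into the eight
 * byte-level write-enable lines of one 64-bit register row.
 * Assumes elwidth_bytes is 1, 2, 4 or 8 and element 0 is at the LSB end. */
static uint8_t byte_write_enables(uint8_t pred_bits, unsigned elwidth_bytes)
{
    uint8_t wen = 0;
    unsigned n_elements = 8 / elwidth_bytes;
    /* One predicate bit is repeated across every byte of its element. */
    uint8_t lanes = (uint8_t)((1u << elwidth_bytes) - 1u);

    for (unsigned el = 0; el < n_elements; el++) {
        if ((pred_bits >> el) & 1u)
            wen |= (uint8_t)(lanes << (el * elwidth_bytes));
    }
    return wen;
}

/* Hypothetical helper: model a register-file write port in which only
 * the bytes whose write-enable line is HIGH take the new value. */
static uint64_t write_row(uint64_t old_row, uint64_t result, uint8_t wen)
{
    uint64_t byte_mask = 0;

    for (unsigned b = 0; b < 8; b++) {
        if ((wen >> b) & 1u)
            byte_mask |= (uint64_t)0xFF << (8 * b);
    }
    return (old_row & ~byte_mask) | (result & byte_mask);
}
```

With a 16-bit element width and only predicate bit 0 set, `byte_write_enables(0x1, 2)` gives `0x03`, so `write_row` replaces just the bottom 16 bits of the row. With a 64-bit element width the single predicate bit expands to `0xFF` and the whole row is written.
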
**Comparative equivalent using VSR registers**
For a comparative data point the VSR Registers may be expressed in the
register, where VSX does the reverse: places the numerically-*highest*
(last-numbered) element at the LSB end of the register.
-
```
#pragma pack
typedef union {