from GPR(1) into GPR(2): the 5th result modifies **only** the bottom 16
bits (LSBs) of GPR(2), leaving the upper 48 bits of GPR(2) untouched.
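The placement arithmetic can be checked with a minimal Python sketch. It assumes that elements are packed contiguously, starting at the least-significant end of the base register, that the registers are 64 bits wide, and that the vector has VL=5 with 16-bit elements starting at GPR(1). `element_location` is a hypothetical helper and is not defined by the specification. Bit numbers here are LSB0 (bit 0 is the least significant bit), unlike the MSB0 numbering used in the Power ISA.

```python
def element_location(base_reg: int, elwidth_bits: int, index: int):
    """Hypothetical helper: return (register, lsb, msb) occupied by element
    `index` of a vector packed LSB-first from bit 0 of GPR(base_reg).
    Assumes 64-bit registers; bit numbers are LSB0."""
    if elwidth_bits not in (8, 16, 32, 64):
        raise ValueError("elwidth must be 8, 16, 32 or 64")
    bit_offset = index * elwidth_bits      # position in the packed byte array
    reg = base_reg + bit_offset // 64      # which 64-bit register it lands in
    lsb = bit_offset % 64                  # offset within that register
    return reg, lsb, lsb + elwidth_bits - 1


# Assumed example: elwidth=16, VL=5, vector starting at GPR(1)
for i in range(5):
    reg, lsb, msb = element_location(1, 16, i)
    print(f"element {i}: GPR({reg}) bits {lsb}-{msb}")
# element 0: GPR(1) bits 0-15
# element 1: GPR(1) bits 16-31
# element 2: GPR(1) bits 32-47
# element 3: GPR(1) bits 48-63
# element 4: GPR(2) bits 0-15
```

Elements 0 to 3 exactly fill GPR(1). Element 4 is the first element to spill into GPR(2), and it occupies only that register's low 16 bits.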
*Engineering note: to avoid a Read-Modify-Write at the register
file, it is strongly recommended to implement byte-level write-enable lines,
exactly as has been done in DRAM ICs for many decades. Additionally,
each predicate mask bit should be associated with the write-enable lines
of its corresponding element. It is then up to the Hardware Architect to
amortise (merge) elements together, both into Predicated SIMD pipelines
and into simultaneous non-overlapping Register File writes, to achieve
high-performance designs. Overall it helps to think of the GPR and FPR
register files as being much more akin to a 64-bit-wide byte-level-addressable SRAM.*
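The recommendation above can be illustrated with a small software model. This is a hypothetical sketch, not a hardware description and not the specification's reference code. `ByteEnableRegFile` and `write_vector` are illustrative names. The model assumes 64-bit rows, little-endian byte lanes, and elements packed LSB-first from the base register. Each predicate bit gates the byte enables of its own element. All elements that land in the same row are merged into a single byte-enabled row write, so the old row contents are never read.

```python
class ByteEnableRegFile:
    """Hypothetical model: register file as byte-addressable SRAM made of
    64-bit (8-byte) rows, written through per-byte write-enable lines."""

    BYTES_PER_REG = 8

    def __init__(self, num_regs: int) -> None:
        self.mem = bytearray(num_regs * self.BYTES_PER_REG)

    def write_row(self, reg: int, data: bytes, byte_enable: int) -> None:
        # One write-port operation: bytes whose enable bit is 0 keep their
        # old value, and the old row is never read (no Read-Modify-Write).
        base = reg * self.BYTES_PER_REG
        for i in range(self.BYTES_PER_REG):
            if byte_enable & (1 << i):
                self.mem[base + i] = data[i]

    def read_reg(self, reg: int) -> int:
        base = reg * self.BYTES_PER_REG
        return int.from_bytes(self.mem[base:base + self.BYTES_PER_REG], "little")


def write_vector(rf, base_reg, elwidth_bits, values, pred_mask):
    """Merge all elements destined for the same row into one byte-enabled
    write; returns the number of row writes actually issued."""
    nbytes = elwidth_bits // 8
    rows = {}  # reg -> (row data, byte-enable mask)
    for i, value in enumerate(values):
        byte_offset = i * nbytes
        reg = base_reg + byte_offset // 8
        lane = byte_offset % 8
        data, enable = rows.get(reg, (bytearray(8), 0))
        elem = (value & ((1 << elwidth_bits) - 1)).to_bytes(nbytes, "little")
        data[lane:lane + nbytes] = elem
        if (pred_mask >> i) & 1:
            # the predicate bit drives this element's byte write-enables
            enable |= ((1 << nbytes) - 1) << lane
        rows[reg] = (data, enable)
    writes = 0
    for reg, (data, enable) in rows.items():
        if enable:  # a fully masked-out row needs no write at all
            rf.write_row(reg, bytes(data), enable)
            writes += 1
    return writes


# Assumed example: elwidth=16, VL=5 from GPR(1), predicate mask 0b10111
rf = ByteEnableRegFile(num_regs=4)
rf.write_row(2, b"\xff" * 8, 0xFF)  # pre-fill GPR(2) with all ones
n = write_vector(rf, 1, 16, [0x1000 + i for i in range(5)], 0b10111)
print(n)                    # 2
print(hex(rf.read_reg(1)))  # 0x100210011000 (element 3 masked out)
print(hex(rf.read_reg(2)))  # 0xffffffffffff1004 (upper 48 bits untouched)
```

Five element results collapse into two row writes. The masked-out element and the upper 48 bits of GPR(2) keep their previous values purely because their byte enables are low.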
If the 16-bit operation were to be followed up with a 32-bit Vectorised
Operation, the exact same contents would be viewed as follows: