at the LSB end of any given underlying Scalar GPR, progress to the MSB end, and
then to the LSB end of the *next numerically-larger Scalar GPR*. In the
example above if VL=5 and RT=1 then the contents of GPR(1) and GPR(2) will
-be as follows:
+be as follows. For clarity in the table below:
+* Both MSB0-ordered *and* LSB0-ordered bit-numbering are shown
+* The GPR-numbering is considered LSB0-ordered
+* The Element-numbering (result0-result4) is LSB0-ordered
+* "same" indicates that the field is left unmodified
+```
+ | MSB0:  | 0:15    | 16:31   | 32:47   | 48:63   |
+ | LSB0:  | 63:48   | 47:32   | 31:16   | 15:0    |
+ |--------|---------|---------|---------|---------|
+ | GPR(1) | result3 | result2 | result1 | result0 |
+ | GPR(2) | same    | same    | same    | result4 |
+```
+
+Note that the upper 48 bits of GPR(2) would **not** be modified because
+the example has VL=5. Thus on "wrapping" - sequential progression from
+GPR(1) into GPR(2) - the 5th result modifies
+**only** the 16 LSBs of GPR(2).
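
The element ordering described above (LSB end first, then on to the next numerically-larger GPR) can be sketched in Python. This is an illustrative model only, not the specification's pseudocode: the function name `write_elements`, the list-based register file and the sample result values are all assumptions made for the example, with 16-bit elements, VL=5 and RT=1.

```python
MASK64 = (1 << 64) - 1

def write_elements(regfile, rt, results, elwidth=16):
    """Write each result into successive element slots starting at GPR(rt).

    Hypothetical helper: regfile is a plain list of 64-bit integers.
    """
    per_reg = 64 // elwidth            # elements that fit in one GPR
    elmask = (1 << elwidth) - 1
    for i, value in enumerate(results):    # i is the LSB0 element number
        reg = rt + i // per_reg            # wraps into the next GPR
        shift = (i % per_reg) * elwidth    # LSB0 bit offset inside that GPR
        cleared = regfile[reg] & ~(elmask << shift) & MASK64
        regfile[reg] = cleared | ((value & elmask) << shift)
    return regfile

# All-ones registers make untouched ("same") fields easy to spot.
regs = [MASK64] * 4
write_elements(regs, rt=1, results=[0x1110, 0x2221, 0x3332, 0x4443, 0x5554])
print(hex(regs[1]))   # result3..result0, MSB end to LSB end
print(hex(regs[2]))   # only the 16 LSBs hold result4
```

Because only `VL` elements are written, the upper 48 bits of the second register keep their previous contents.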
Hardware Architectural note: to avoid a Read-Modify-Write at the register
file it is strongly recommended to implement byte-level write-enable lines
write-enable line. It is up to the Hardware Architect to then amortise
(merge) elements together into both PredicatedSIMD Pipelines as well
as simultaneous non-overlapping Register File writes, to achieve High
-Performance designs.
+Performance designs. Overall it helps to think of the register files
+as being much more akin to a byte-level-addressable SRAM.
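
To illustrate the byte-level-addressable view, the following is a minimal sketch of a register file backed by a flat byte array, where each write carries an 8-bit byte-enable mask and so needs no Read-Modify-Write. The class `ByteRegFile`, the helper `element_write` and the choice of byte 0 as the LSB of each GPR are assumptions for the example, not part of the specification; the element width is assumed to divide 64 so that no element straddles two GPRs.

```python
class ByteRegFile:
    """Hypothetical model: 64-bit GPRs stored as 8 bytes each, LSB first."""

    def __init__(self, nregs=4):
        self.mem = bytearray(nregs * 8)

    def write(self, reg, data, byte_en):
        """Store only the bytes of `data` whose bit in `byte_en` is set."""
        for lane in range(8):
            if (byte_en >> lane) & 1:
                self.mem[reg * 8 + lane] = (data >> (8 * lane)) & 0xFF

    def read(self, reg):
        return int.from_bytes(self.mem[reg * 8:reg * 8 + 8], "little")

def element_write(rf, rt, index, value, elwidth=16):
    """Write one element using byte-enables, never reading the old value."""
    nbytes = elwidth // 8
    offset = index * nbytes                  # byte offset from GPR(rt)
    reg, lane = rt + offset // 8, offset % 8
    byte_en = ((1 << nbytes) - 1) << lane    # one enable bit per byte
    data = (value & ((1 << elwidth) - 1)) << (8 * lane)
    rf.write(reg, data, byte_en)

rf = ByteRegFile()
rf.mem[:] = b"\xff" * len(rf.mem)            # pre-fill so untouched bytes show
element_write(rf, rt=1, index=4, value=0x5554)
print(hex(rf.read(2)))                       # only the two lowest bytes change
```

In this model the fifth 16-bit element asserts only two of the eight byte-enable lines of its register, which is why the neighbouring bytes are preserved without a prior read.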
**Comparative equivalent using VSR registers**