| MSB0: | 0:15 | 16:31 | 32:47 | 48:63 |
| LSB0: | 63:48 | 47:32 | 31:16 | 15:0 |
|--------|---------|---------|---------|---------|
- | GPR(0) | result3 | result2 | result1 | result0 |
- | GPR(1) | same | same | same | result4 |
+ | GPR(0) | same | same | same | same |
+ | GPR(1) | result3 | result2 | result1 | result0 |
+ | GPR(2) | same | same | same | result4 |
+ | GPR(3) | same | same | same | same |
+ | ... | ... | ... | ... | ... |
+ | ... | ... | ... | ... | ... |
```
-Note that the upper 48 bits of GPR(1) would **not** be modified because
+Note that the upper 48 bits of GPR(2) would **not** be modified because
the example has VL=5. Thus on "wrapping" - sequential progression from
-GPR(0) into GPR(1) - the 5th result modifies
+GPR(1) into GPR(2) - the 5th result modifies
-**only** the bottom 16 LSBs of GPR(1).
+**only** the bottom 16 LSBs of GPR(2).
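The packing in the table can be illustrated with a rough Python sketch. It models only the architectural effect, meaning which bits change, and not the hardware write path. The assumptions are:

- 64-bit GPRs are plain Python ints.
- 16-bit elements are placed in LSB0 order, with element 0 in bits 15:0.
- The starting register is GPR(1), as on the `+` side of the table.

The names `write_packed`, `ELWIDTH`, `PER_REG` and `ELMASK` are hypothetical and are not taken from any SVP64 specification or simulator.

```python
# Hypothetical model of packed 16-bit element writes into 64-bit GPRs.
# Assumption: models only the architectural result (which bits change),
# not how hardware avoids a read-modify-write.
ELWIDTH = 16                    # element width in bits
GPR_BITS = 64                   # register width in bits
PER_REG = GPR_BITS // ELWIDTH   # 4 elements fit in one GPR
ELMASK = (1 << ELWIDTH) - 1


def write_packed(gpr, base, results):
    """Pack `results` (one per element, VL = len(results)) into `gpr`
    starting at register `base`, LSB0 order: element 0 goes in bits 15:0.
    Only the 16-bit slots covered by an element are modified."""
    for i, value in enumerate(results):
        reg = base + i // PER_REG          # wrap into the next GPR every 4 elements
        shift = (i % PER_REG) * ELWIDTH    # LSB0 bit offset within that GPR
        cleared = gpr[reg] & ~(ELMASK << shift)
        gpr[reg] = cleared | ((value & ELMASK) << shift)
    return gpr


if __name__ == "__main__":
    gpr = [0xFFFF_FFFF_FFFF_FFFF] * 4      # all-ones so untouched bits stay visible
    write_packed(gpr, 1, [0x1000, 0x1001, 0x1002, 0x1003, 0x1004])  # VL=5
    for n, r in enumerate(gpr):
        print(f"GPR({n}) = {r:#018x}")
```

With VL=5 starting at GPR(1), the results fill GPR(1) as follows:

- result0 to result3 occupy all of GPR(1) (`0x1003100210011000`).
- result4 changes only bits 15:0 of GPR(2), giving `0xffffffffffff1004`.
- GPR(0) and GPR(3) keep their previous contents.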
Hardware Architectural note: to avoid a Read-Modify-Write at the register