For `sv.lfdup`, RA is Scalar so continuously accumulates
additions of the immediate (8) but only *after* RA has been used
-as the Effective Address.
+as the Effective Address each time.
The last write to RA is the address for
-the next block (the next time round the CTR loop).
+the next block (the next time round the CTR loop).
+
To understand this it is necessary to appreciate that
SVP64 is as if a sequence of loop-unrolled scalar instructions were
issued. With every instruction in that sequence writing the new value
of RA before the next one reads it, the result is identical
in effect to Element-Strided, except that RA ends up pointing to the start
of the next batch.
+If `sv.lfdup` were not available, `sv.lfdu` could be used to the same
+effect, but RA would have to be *pre-subtracted by one element*, outside
+of the loop. Due to the compactness of this highly hardware-parallelizable
+algorithm, that one additional instruction would increase the implementation
+code size by 5 percent! This helps explain why Post-Increment Update
+Load/Store instructions are so important.
+
Use of Element-Strided on `sv.lfd/els`
ensures the Immediate (8) results in a contiguous LD
*without* modifying RA.
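
The three addressing behaviours described above can be compared with a
small software model. This is only a sketch under simplifying assumptions:
memory is a Python dict keyed by byte address, every element is 8 bytes,
and the function names (`lfdup_scalar_ra`, `lfdu_scalar_ra`, `lfd_els`)
are hypothetical helpers invented for illustration, not part of any
simulator or of the ISA.

```python
# Simplified model (assumption): memory is a dict of byte address -> value,
# one 8-byte element per address. All function names are hypothetical.
ELEM = 8  # the immediate used in the text: one double-precision element


def lfdup_scalar_ra(mem, ra, vl, imm=ELEM):
    """Post-increment update with Scalar RA: RA is the EA, then RA += imm."""
    loaded = []
    for _ in range(vl):        # one unrolled scalar instruction per element
        loaded.append(mem[ra])  # EA = RA, used *before* the update
        ra += imm               # new RA is written for the next element
    return loaded, ra


def lfdu_scalar_ra(mem, ra, vl, imm=ELEM):
    """Pre-increment update with Scalar RA: EA = RA + imm, then RA = EA."""
    loaded = []
    for _ in range(vl):
        ra += imm               # update happens *before* the access
        loaded.append(mem[ra])
    return loaded, ra


def lfd_els(mem, ra, vl, imm=ELEM):
    """Element-Strided: EA = RA + i * imm for element i, RA is not modified."""
    return [mem[ra + i * imm] for i in range(vl)], ra


BASE = 0x1000
VL = 4
mem = {BASE + i * ELEM: float(i) for i in range(8)}

post_vals, post_ra = lfdup_scalar_ra(mem, BASE, VL)
# With pre-increment, RA must be pre-subtracted by one element first.
pre_vals, pre_ra = lfdu_scalar_ra(mem, BASE - ELEM, VL)
els_vals, els_ra = lfd_els(mem, BASE, VL)
```

In this model all three variants load the same contiguous elements. They
differ only in where RA ends up: the post-increment form leaves RA at the
start of the next batch, the pre-increment form leaves it one element
behind (so the pre-subtraction carries over to the next batch), and
Element-Strided leaves RA untouched.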