If `sv.lfdup` were not available, `sv.lfdu` could be used to the same
effect, but RA would have to be *pre-subtracted by one element*, outside
of the loop. Due to the compactness of this highly hardware-parallelizable
algorithm, that one additional instruction would increase the implementation
code size by 5 percent! This helps explain why Post-Increment Update
Load/Store instructions are so important.
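
The addressing difference can be illustrated with a small behavioural model. This is only a sketch, not real SVP64 assembler: it assumes that an `lfdu`-style load advances the address first and then loads, while an `lfdup`-style load reads from the current address and advances afterwards. The helper names `load_pre_update`, `load_post_update` and `walk` are hypothetical, and memory is modelled as a plain dictionary.

```python
# Behavioural sketch only (assumed semantics, hypothetical helper names).

def load_pre_update(mem, ra, step):
    """lfdu-style: compute EA = RA + step, load from EA, write EA back to RA."""
    ea = ra + step
    return mem[ea], ea

def load_post_update(mem, ra, step):
    """lfdup-style: load from RA, then write RA + step back to RA."""
    return mem[ra], ra + step

def walk(load, mem, ra, step, count):
    """Load `count` consecutive elements using the given update-load variant."""
    values = []
    for _ in range(count):
        value, ra = load(mem, ra, step)
        values.append(value)
    return values

# Four 8-byte elements, keyed by byte address.
mem = {0: 1.0, 8: 2.0, 16: 3.0, 24: 4.0}
base, step = 0, 8

# Post-increment: the loop can start directly at the base address.
post = walk(load_post_update, mem, base, step, 4)

# Pre-increment: RA must first be moved back by one element, outside the loop.
pre = walk(load_pre_update, mem, base - step, step, 4)
```

In this model both variants return the same four elements, but only the pre-increment form needs the extra `base - step` adjustment before the loop, which corresponds to the one additional instruction discussed above.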