writing to CR8 (TBD evaluate) and increase sequentially from there. Vectorised FP
results, when Rc=1, start from CR32 (TBD evaluate). This is so that:
-* implementations may rely on the Vector CRs being aligned to 8. This means that CRs may be read or written in aligned batches of 32 bits, for high performance implementations.
+* implementations may rely on the Vector CRs being aligned to 8. This means that high performance implementations may read or write CRs in aligned batches of 32 bits (8 CRs of 4 bits each per batch).
* scalar Rc=1 operation (CR0, CR1) and callee-saved CRs (CR2-4) are not overwritten by vector Rc=1 operations except for very large VL
* Vector FP and Integer Rc=1 operations do not overwrite each other except for large VL.
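
The arithmetic behind these points can be sketched as follows. This is only an illustration, assuming the start points stay at CR8 and CR32 (both still marked TBD above) and that element `i` of a vector result writes to the start CR plus `i`. The function and constant names are hypothetical and are not part of the specification.

```python
CR_BITS = 4        # each CR field is 4 bits wide
INT_CR_BASE = 8    # assumed start for vectorised integer Rc=1 results (TBD)
FP_CR_BASE = 32    # assumed start for vectorised FP Rc=1 results (TBD)
BATCH_CRS = 8      # 8 CR fields * 4 bits = one aligned 32-bit batch


def vector_cr(base, element):
    """CR field written by a given vector element, counting up from base."""
    return base + element


def cr_batch(cr):
    """Index of the aligned 32-bit batch (group of 8 CRs) containing cr."""
    return cr // BATCH_CRS


def int_range_reaches_fp(vl):
    """True if integer Rc=1 results for this VL reach the FP start CR."""
    return vl > 0 and vector_cr(INT_CR_BASE, vl - 1) >= FP_CR_BASE
```

Under these assumptions both start points fall on a batch boundary, and integer Rc=1 results stay below CR32 for any VL up to 24; only a larger VL makes the integer range run into the FP range.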