writing to CR8 (TBD evaluate) and increase sequentially from there. Vectorised FP
results, when Rc=1, start from CR32 (TBD evaluate). This is so that:
* implementations may rely on the Vector CRs being aligned to 8. This means that CRs may be read or written in aligned batches of 32 bits (8 CR fields of 4 bits each), which benefits high-performance implementations.
* scalar Rc=1 operations (CR0, CR1) and callee-saved CRs (CR2-4) are not overwritten by vector Rc=1 operations, except for very large VL.
* Vector FP and Integer Rc=1 operations do not overwrite each other except for large VL.
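
The numbering above can be illustrated with a minimal sketch. It assumes the tentative (TBD) starting points of CR8 for Vectorised Integer results and CR32 for Vectorised FP results, with one CR field per element, allocated sequentially. All function and constant names here are hypothetical and chosen only for illustration.

```python
# Tentative base CR fields taken from the text (both marked "TBD evaluate").
INT_CR_BASE = 8     # Vectorised Integer Rc=1 results start at CR8
FP_CR_BASE = 32     # Vectorised FP Rc=1 results start at CR32
CR_FIELD_BITS = 4   # each CR field is 4 bits wide
BATCH_FIELDS = 8    # "aligned to 8": 8 fields form one 32-bit batch


def vector_cr_field(element, fp=False):
    """Return the CR field written by a given vector element when Rc=1."""
    base = FP_CR_BASE if fp else INT_CR_BASE
    return base + element


def int_overlaps_fp(vl):
    """True if an Integer vector of length vl reaches the FP CR range."""
    return vl > 0 and vector_cr_field(vl - 1) >= FP_CR_BASE


def batch_of(cr_field):
    """Index of the aligned 32-bit batch that holds the given CR field."""
    return cr_field // BATCH_FIELDS
```

Under these assumptions, Integer results for VL up to 24 occupy CR8 to CR31 and stay clear of the FP range; at VL=25 the last Integer result would land on CR32, which is the "large VL" overlap referred to above. Both base values sit on a batch boundary, since 8 and 32 are multiples of 8.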