be unnecessary, under these circumstances.
However, it is not possible for `fadds` to fit two elements into
-64-bit: bear in mind that the FP32 bits are spread out across a 64
+64-bit: that breaks the simplicity of SVP64.
+Bear in mind that the FP32 bits are spread out across a 64
bit register in FP64 format. The solution here was to consider the
"s" at the end of each instruction
to mean "half of the element's width". Thus, `sv.fadds/ew=32`
actually stores an FP16 spread out across the 32 bits of an
element, in FP32 format, where `sv.fadd/ew=32` stores a full
FP32 result into the full 32 bits.
+
+Where this breaks down is when attempting to do half-width on
+BF16 or FP16 operations: there does not exist a BF8 or an IEEE754 FP8
+format, so these should be avoided.