go as low as 8-bit arithmetic, even 8-bit Floating-Point for
high-performance AI. Rather than waste opcode space adding all
such operations at different bitwidths, let the prefix
- *redefine* the element width.
+ *redefine* (override) the element width, without actually altering
+ the Scalar ISA at all.
* "Reordering" of the assumption of linear sequential element
access, for Matrices, rotations, transposition, Convolutions,
DCT, FFT, Parallel Prefix-Sum and other common transformations