Traditional Vector ISAs have vastly more (and more complex) addressing
modes: unit strided, element strided, Indexed, Structure Packing. All
-of these had to be jammed in on top of **existing Scalar instructions
-without modifying the Scalar instructions**. A small conceptual
+of these had to be jammed in on top of existing Scalar instructions
+**without modifying the Scalar instructions**. A small conceptual
"cheat" was therefore needed. The Immediate (D) is in some Modes
multiplied by the element index, which gives us element-strided.
For unit-strided the width of the operation (`ld`, 8 byte) is taken
-as the multiplier. Hardware-level modifications to support this
+as the multiplier. Modifications to support this
"cheat" on top of pre-existing Scalar HDL (and Simulators)
-have both turned out to be minimal.
+have both turned out to be minimal.[^mul]
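
The stride "cheat" described above can be sketched as a small model of the Effective Address calculation. This is an illustrative sketch only, not the specification's pseudocode: the function name `effective_addresses`, the mode strings, and the choice to add the Immediate once in unit-strided mode are all assumptions made for the example.

```python
def effective_addresses(base, imm, vl, mode, op_width=8):
    """Model the per-element byte addresses of a Vector LD/ST-immediate.

    base     -- value of the base register (hypothetical name)
    imm      -- the Immediate (D) field
    vl       -- Vector Length, the number of elements
    mode     -- "element-strided" or "unit-strided" (hypothetical labels)
    op_width -- width of the operation in bytes (8 for `ld`)
    """
    if mode == "element-strided":
        # The Immediate is multiplied by the element index.
        return [base + imm * i for i in range(vl)]
    if mode == "unit-strided":
        # The operation width is the multiplier of the element index.
        # ASSUMPTION: the Immediate is added once, as in the Scalar form.
        return [base + imm + op_width * i for i in range(vl)]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("element-strided", "unit-strided"):
        addrs = effective_addresses(0x1000, 24, 4, mode)
        print(mode, [hex(a) for a in addrs])
```

With `vl=1` both modes collapse to `base + imm` for unit-strided and `base` for element-strided in this model, which is one reason the treatment of the Immediate is flagged as an assumption rather than taken as definitive.
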
Also added was the option to perform signed or unsigned Effective
Address calculation, which comes into play only on LD/ST Indexed,
For DCT and FFT, normally it is very expensive to perform the
"bit-inversion" needed for address calculation and/or reordering
of elements. DCT in particular needs both bit-inversion *and
-Gray-Coding* offsets. DCT/FFT REMAP **automatically** performs
+Gray-Coding* offsets (a complexity that often "justifies" full
+assembler loop-unrolling). DCT/FFT REMAP **automatically** performs
the required offset adjustment to get data loaded and stored in
the required order. Matrix REMAP can likewise perform up to 3
Dimensions of reordering (on both Immediate and Indexed), and
Overall the LD/ST Modes available are extremely powerful, especially
when combining arithmetic (lharx) with saturation, element-width overrides,
vec2/3/4 Structure Packing *and* REMAP, the combinations far exceed anything
-seen in any other Vector ISA in history.
+seen in any other Vector ISA in history, yet are really nothing more
+than concepts abstracted out in pure RISC form.[^ldstcisc]
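
For readers unfamiliar with the "bit-inversion" and Gray-Coding offsets mentioned for DCT and FFT, the two index transformations can be sketched as below. This is a minimal illustration of the arithmetic only: the helper names are hypothetical, and the way DCT/FFT REMAP actually combines these offsets into a schedule is not shown here.

```python
def bit_reverse(index, nbits):
    """Reverse the lowest nbits bits of index."""
    result = 0
    for _ in range(nbits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Element order for a power-of-two length n, as used by radix-2 FFT."""
    nbits = n.bit_length() - 1
    if n < 1 or n != 1 << nbits:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, nbits) for i in range(n)]


def gray_code(index):
    """Binary-reflected Gray code of index."""
    return index ^ (index >> 1)


if __name__ == "__main__":
    print(bit_reversed_order(8))             # [0, 4, 2, 6, 1, 5, 3, 7]
    print([gray_code(i) for i in range(8)])  # [0, 1, 3, 2, 6, 7, 5, 4]
```

Done in software, each of these costs several instructions per element just to form an address offset, which is the expense the text refers to.
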
# SVP64Single 24-bits
[^hphint]: intended for use when the compiler has determined the extent of Memory or register aliases in loops: `a[i] += a[i+4]` would necessitate a Vertical-First hphint of 4
[^svshape]: although SVSHAPE0-3 should, realistically, be regarded as high a priority as SVSTATE, and given corresponding SVSRR and SVLR equivalents, it was felt that having to context-switch **five** SPRs on Interrupts and function calls was too much.
[^whoops]: two efforts were made to mix non-uniform encodings into Simple-V space: one deliberate to see how it would go, and one accidental. They both went extremely badly, the deliberate one costing two months to add then remove.
+[^mul]: Setting this "multiplier" to 1 remarkably leaves pre-existing Scalar behaviour completely intact as a degenerate case.
+[^ldstcisc]: At least the CISC "auto-increment" modes from the CDC 6600 and Motorola 68000 are not present! Although these would be fun to introduce, they unfortunately make for 3-in 3-out register profiles, all 64-bit, which explains why the 6600 and 68000 had separate special dedicated address regfiles.