From 84175a7cfc34beb917975c789a95831644b998d1 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 18 Sep 2022 10:51:53 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls001.mdwn | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 6f02468e1..37acef3f7 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -344,14 +344,14 @@ mask `sv.bc/pm=r3`.
 
 Traditional Vector ISAs have vastly more (and more complex) addressing
 modes: unit strided, element strided, Indexed, Structure Packing. All
-of these had to be jammed in on top of **existing Scalar instructions
-without modifying the Scalar instructions**. A small conceptual
+of these had to be jammed in on top of existing Scalar instructions
+**without modifying the Scalar instructions**. A small conceptual
 "cheat" was therefore needed. The Immediate (D) is in some Modes
 multiplied by the element index, which gives us element-strided.
 For unit-strided the width of the operation (`ld`, 8 byte) is taken
-as the multiplier. Hardware-level modifications to support this
+as the multiplier. Modifications to support this
 "cheat" on top of pre-existing Scalar HDL (and Simulators)
-have both turned out to be minimal.
+have both turned out to be minimal.[^mul]
 
 Also added was the option to perform signed or unsigned Effective Address
 calculation, which comes into play only on LD/ST Indexed,
@@ -374,7 +374,8 @@ Effective Address* in one instruction.
 For DCT and FFT, normally it is very expensive to perform the
 "bit-inversion" needed for address calculation and/or reordering
 of elements. DCT in particular needs both bit-inversion *and
-Gray-Coding* offsets. DCT/FFT REMAP **automatically** performs
+Gray-Coding* offsets (a complexity that often "justifies" full
+assembler loop-unrolling). DCT/FFT REMAP **automatically** performs
 the required offset adjustment to get data loaded and stored in
 the required order. Matrix REMAP can likewise perform up to 3
 Dimensions of reordering (on both Immediate and Indexed), and
@@ -384,7 +385,8 @@ four dimensions (four nested fixed size loops).
 Overall the LD/ST Modes available are extremely powerful, especially
 when combining arithmetic (lharx) with saturation, element-width overrides,
 vec2/3/4 Structure Packing *and* REMAP, the combinations far exceed anything
-seen in any other Vector ISA in history.
+seen in any other Vector ISA in history, yet are really nothing more
+than concepts abstracted out in pure RISC form.[^ldstcisc]
 
 # SVP64Single 24-bits
 
@@ -1179,3 +1181,5 @@ operations.
 [^hphint]: intended for use when the compiler has determined the extent of Memory or register aliases in loops: `a[i] += a[i+4]` would necessitate a Vertical-First hphint of 4
 [^svshape]: although SVSHAPE0-3 should, realistically, be regarded as high a priority as SVSTATE, and given corresponding SVSRR and SVLR equivalents, it was felt that having to context-switch **five** SPRs on Interrupts and function calls was too much.
 [^whoops]: two efforts were made to mix non-uniform encodings into Simple-V space: one deliberate to see how it would go, and one accidental. They both went extremely badly, the deliberate one costing two months to add then remove.
+[^mul]: Setting this "multiplier" to 1 remarkably leaves pre-existing Scalar behaviour completely intact as a degenerate case.
+[^ldstcisc]: At least the CISC "auto-increment" modes from the CDC 6600 and Motorola 68000 are not present! Although these would be fun to introduce, they do unfortunately make for 3-in 3-out register profiles, all 64-bit, which explains why the 6600 and 68000 had separate, special, dedicated address regfiles.
-- 
2.30.2
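The address arithmetic described in this patch can be illustrated with a short Python sketch. It is a simplified, hypothetical model, not the SVP64 specification pseudocode: the function names (`ea_scaled`, `element_strided`, `unit_strided`, `bit_reverse`, `gray`) are invented for illustration, and the unit-strided formula (D added once, with the operation width scaling the element index) is an assumption. It shows the Immediate D scaled by the element index for element-strided access, a multiplier of 1 reducing to the plain Scalar `RA + D` (the degenerate case in the `[^mul]` footnote), and the bit-reversed and Gray-coded index sequences that FFT/DCT REMAP would otherwise leave to hand-written address calculation.

```python
"""Simplified, hypothetical model of the LD/ST address "cheat" described
in the ls001 patch. Names are illustrative, not SVP64 pseudocode."""


def ea_scaled(ra, d, multiplier):
    """EA = RA + D * multiplier.

    multiplier == 1 gives the ordinary Scalar D-form address (RA + D);
    multiplier == element index gives element-strided addressing.
    """
    return ra + d * multiplier


def element_strided(ra, d, vl):
    # The Immediate D is multiplied by the element index i.
    return [ea_scaled(ra, d, i) for i in range(vl)]


def unit_strided(ra, d, width, vl):
    # Assumption: the operation width (8 bytes for ld) scales the element
    # index, and D is added once exactly as in the Scalar instruction.
    return [ra + d + width * i for i in range(vl)]


def bit_reverse(i, bits):
    # Reverse the low `bits` bits of i: the FFT "bit-inversion" reordering.
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def gray(i):
    # Binary-reflected Gray code of i.
    return i ^ (i >> 1)


if __name__ == "__main__":
    print(element_strided(0x1000, 24, 4))         # [4096, 4120, 4144, 4168]
    print(unit_strided(0x1000, 0, 8, 4))          # [4096, 4104, 4112, 4120]
    print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
    print([gray(i) for i in range(8)])            # [0, 1, 3, 2, 6, 7, 5, 4]
```

The point of routing everything through a single multiplier is that the Scalar D-form path is left untouched: Scalar operation is just the special case where the multiplier is 1.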