[[!toc]]
-SVP64 is designed around these fundamental and inviolate RISC principles:
+SVP64 is designed around fundamental and inviolate RISC principles.
+This gives a uniformity and regularity to the ISA, which is why RISC
+as a concept became popular. It is just that nobody has ever considered
+applying the RISC concept to a *Vector* ISA before.
1. There are no actual Vector instructions: Scalar instructions
are the sole exclusive bedrock.
(RVV, NEC SX Aurora, Cray)
they always comprise (a) a for-loop around (b) element-based operations.
It is perfectly reasonable and rational to separate (a) from (b)
-and find a powerful Supercomputing-class ISA that qualifies for (b).
+then find a powerful pre-existing
+Supercomputing-class ISA that qualifies for (b).
There are a few exceptional places where these rules get
bent, and others where the rules take some explaining,
-and this page tracks them.
+and this page tracks them all.
The modification caveat in (2) above semantically
exempts element width overrides,
which still do not actually modify the meaning of the instruction:
an add remains an add, even if its override makes it an 8-bit add rather than
a 64-bit add. Even add-with-carry remains an add-with-carry: it's just
-that when elwidth=8 in the Prefix it's an *8-bit* add-with-carry,
-where the 9th bit becomes Carry-out, not the 65th bit.
+that when elwidth=8 in the Prefix it's an *8-bit* add-with-carry
+where the 9th bit becomes Carry-out (not the 65th bit).
In other words, elwidth overrides **definitely** do not fundamentally
alter the actual
Scalar v3.0 ISA encoding itself. Consequently we can still, in
-the strictest sense, not be breaking rule (2).
+the strictest semantic sense, not be breaking rule (2).
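
As a minimal sketch of the point about carry-out, the following Python model treats the element width as a parameter of an otherwise unchanged add-with-carry. The function name `add_with_carry` and its signature are hypothetical, chosen for illustration only; this is not the specification pseudocode.

```python
def add_with_carry(a, b, carry_in, elwidth=64):
    """Add two elwidth-bit values plus a carry-in.

    Returns (result, carry_out). Illustrative model only.
    """
    mask = (1 << elwidth) - 1
    total = (a & mask) + (b & mask) + (carry_in & 1)
    # Carry-out is the bit just above the element: the 9th bit when
    # elwidth=8, the 65th bit when elwidth=64.
    carry_out = (total >> elwidth) & 1
    return total & mask, carry_out


# Same operation, same operands: only the element width differs.
res8, co8 = add_with_carry(0xFF, 0x01, 0, elwidth=8)
res64, co64 = add_with_carry(0xFF, 0x01, 0, elwidth=64)
```

With these operands the 8-bit add wraps to zero and produces a carry-out, whereas the 64-bit add does not: the operation is an add-with-carry in both cases.
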
Likewise, other "modifications" such as saturation or Data-dependent
Fail-First are actually post-augmentation or post-analysis, and do
Predication is also unlike other Vector ISAs because it allows zeroing
on both the source and destination. This takes some explaining.
-In Vector ISAs, there is a choice of actions when a Predicate Mask bit
+In Vector ISAs, there is a Predicate Mask that applies to the
+destination only, and there
+is a choice of actions when a Predicate Mask bit
is zero:
* set the destination element to zero
-* skip that element operation entirely, leaving the result unmodified
+* skip that element operation entirely, leaving the destination unmodified
+The problem arises if the underlying register file SRAM has, say, 64-bit
+write granularity but the Vector elements are, say, 8 bits wide.
Some Vector ISAs strongly advocate Zeroing because to leave one single
element at a small bitwidth in amongst other elements where the register
file does not have the prerequisite access granularity is very expensive,
requiring a Read-Modify-Write cycle to preserve the untouched elements.
Putting zero into the destination avoids that Read.
+
This is technically very easy to solve: use a Register File that does
in fact have the smallest element-level write-enable granularity.
+If the elements are 8 bit then allow 8-bit writes!
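
The two destination behaviours can be sketched as follows. This is an illustrative Python model only, assuming eight 8-bit elements held in one 64-bit register, modelled here as a list of bytes; `apply_dest_predicate` is a hypothetical name, not part of any specification.

```python
def apply_dest_predicate(dest, results, mask, zeroing):
    """Apply a destination predicate mask to a vector of 8-bit elements.

    dest    : current destination elements (list of ints)
    results : newly computed element results
    mask    : predicate mask, bit i controls element i
    zeroing : True for zeroing, False for skipping (non-zeroing)
    """
    out = list(dest)
    for i, value in enumerate(results):
        if (mask >> i) & 1:
            out[i] = value & 0xFF  # mask bit set: normal element write
        elif zeroing:
            out[i] = 0             # zeroing: the old value is never needed
        # Skipping: out[i] keeps its old value. With only 64-bit write
        # granularity that old value must first be read back
        # (Read-Modify-Write); with 8-bit write-enables the byte is
        # simply not written.
    return out


old = [0x11] * 8
new = [1, 2, 3, 4, 5, 6, 7, 8]
zeroed = apply_dest_predicate(old, new, 0b00000101, zeroing=True)
skipped = apply_dest_predicate(old, new, 0b00000101, zeroing=False)
```

In the zeroing case every byte of the word is overwritten, so no read of the old contents is required; in the skipping case the masked-out bytes retain `0x11`.
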
With that technical issue solved there is nothing in the way of choosing
to support both zeroing and non-zeroing (skipping) at the ISA level:
-SV chooses to support both *on both the source and destination*.
+SV chooses to further support both *on both the source and destination*.
This can result in the source and destination
element indices getting "out-of-sync" even though the Predicate Mask
is the same because the behaviour is different when zeros in the
Vector ISA of the past forty years. To explain how normal Single-predication
is applied in a standard Vector ISA:
-* Predication on the **destination** of a LOAD instruction creates something
+* Predication on the **source** of a LOAD instruction creates something
called "Vector Compressed Load" (VCOMPRESS).
-* Predication on the **source** of a STORE instruction creates something
+* Predication on the **destination** of a STORE instruction creates something
called "Vector Expanded Store" (VEXPAND).
* SVP64 allows the two to be put back-to-back: one on source, one on
destination.
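
A minimal sketch of the back-to-back arrangement, covering the skipping (non-zeroing) case only. It assumes that the source and destination element indices each step past their own zero mask bits independently; the function name and the simple element copy are hypothetical, used purely to illustrate the index behaviour.

```python
def twin_predicated_copy(src, dest, srcmask, dstmask, vl):
    """Copy elements under separate source and destination predicates.

    Skipping (non-zeroing) behaviour only. Illustrative model.
    """
    out = list(dest)
    i = j = 0  # independent source and destination element indices
    while i < vl and j < vl:
        # Step past source elements whose predicate bit is zero.
        while i < vl and not (srcmask >> i) & 1:
            i += 1
        # Step past destination elements whose predicate bit is zero.
        while j < vl and not (dstmask >> j) & 1:
            j += 1
        if i == vl or j == vl:
            break
        out[j] = src[i]
        i += 1
        j += 1
    return out


src = [10, 11, 12, 13]
blank = [0, 0, 0, 0]
compressed = twin_predicated_copy(src, blank, 0b1010, 0b1111, 4)
expanded = twin_predicated_copy(src, blank, 0b1111, 0b1010, 4)
both = twin_predicated_copy(src, blank, 0b1010, 0b0101, 4)
```

With only the source mask active the selected elements are packed together (the compress effect); with only the destination mask active consecutive elements are spread out (the expand effect); with both active the two effects are combined in a single pass.
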
to the destination register, not just on Load/Stores but on *arithmetic*
operations.
-No other Vector ISA in the world has this capability. All true Vector
+No other Vector ISA in the world has this back-to-back
+capability. All true Vector
ISAs have Predicate Masks: it is an absolutely essential characteristic.
However none of them have abstracted dual predicates out to the extent
where this VCOMPRESS-VEXPAND effect is applicable *in general* to a
Also worth a special mention: all Load/Store operations are Twin-Predicated.
The underlying key to understanding:
-* one Predicate applies to the Array of Memory *Addresses*,
-* the other Predicate applies to the Array of Memory *Data*.
+* one Predicate effectively applies to the Array of Memory *Addresses*,
+* the other Predicate effectively applies to the Array of Memory *Data*.
# CR weird instructions