From 3d4103cfbda79863270dd2386c2ae4407285a30a Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 10 Jun 2022 12:25:27 +0100
Subject: [PATCH]

---
 openpower/sv/svp64_quirks.mdwn | 39 ++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/openpower/sv/svp64_quirks.mdwn b/openpower/sv/svp64_quirks.mdwn
index fafb38b9d..8a40cceac 100644
--- a/openpower/sv/svp64_quirks.mdwn
+++ b/openpower/sv/svp64_quirks.mdwn
@@ -2,7 +2,10 @@
 
 [[!toc]]
 
-SVP64 is designed around these fundamental and inviolate RISC principles:
+SVP64 is designed around fundamental and inviolate RISC principles.
+This gives a uniformity and regularity to the ISA, which is why RISC
+as a concept became popular. It is just that nobody has ever considered
+applying the RISC concept to a *Vector* ISA before.
 
 1. There are no actual Vector instructions: Scalar instructions
    are the sole exclusive bedrock.
@@ -20,23 +23,24 @@ had also been added.
 Looking at the pseudocode of any Vector ISA (RVV, NEC SX Aurora, Cray)
 they always comprise (a) a for-loop around (b) element-based operations.
 It is perfectly reasonable and rational to separate (a) from (b)
-and find a powerful Supercomputing-class ISA that qualifies for (b).
+then find a powerful pre-existing
+Supercomputing-class ISA that qualifies for (b).
 
 There are a few exceptional places where these rules get bent, and
 others where the rules take some explaining,
-and this page tracks them.
+and this page tracks them all.
 
 The modification caveat in (2) above semantically exempts element width
 overrides, which still do not actually modify the meaning of the instruction:
 an add remains an add, even if its override makes it an 8-bit add
 rather than a 64-bit add. Even add-with-carry remains an add-with-carry:
 it's just
-that when elwidth=8 in the Prefix it's an *8-bit* add-with-carry,
-where the 9th bit becomes Carry-out, not the 65th bit.
+that when elwidth=8 in the Prefix it's an *8-bit* add-with-carry
+where the 9th bit becomes Carry-out (not the 65th bit).
 In other words, elwidth overrides **definitely** do not
 fundamentally alter the actual Scalar v3.0 ISA encoding itself.
 Consequently we can still, in
-the strictest sense, not be breaking rule (2).
+the strictest semantic sense, not be breaking rule (2).
 
 Likewise, other "modifications" such as saturation or Data-dependent
 Fail-First likewise are actually post-augmentation or post-analysis, and do
@@ -222,23 +226,29 @@ So named because there is a Twin Predication concept as well,
 Single Predication is also unlike other Vector ISAs because it allows
 zeroing on both the source and destination. This takes some explaining.
 
-In Vector ISAs, there is a choice of actions when a Predicate Mask bit
+In Vector ISAs, there is a Predicate Mask; it applies to the
+destination only, and there
+is a choice of actions when a Predicate Mask bit
 is zero:
 
 * set the destination element to zero
-* skip that element operation entirely, leaving the result unmodified
+* skip that element operation entirely, leaving the destination unmodified
 
+The problem comes if the underlying register file SRAM has, say, 64-bit
+write granularity but the Vector elements are, say, 8-bit wide.
 Some Vector ISAs strongly advocate Zeroing because to leave one single
 element at a small bitwidth in amongst other elements where the
 register file does not have the prerequisite access granularity
 is very expensive, requiring a Read-Modify-Write cycle to preserve
 the untouched elements.
 Putting zero into the destination avoids that Read.
+
 This is technically very easy to solve: use a Register File that does
 in fact have the smallest element-level write-enable granularity.
+If the elements are 8-bit then allow 8-bit writes!
 With that technical issue solved there is nothing in the way of choosing
 to support both zeroing and non-zeroing (skipping) at the ISA level:
-SV chooses to support both *on both the source and destination*.
+SV chooses to further support both *on both the source and destination*.
 This can result in the source and destination element indices
 getting "out-of-sync" even though the Predicate Mask is the same
 because the behaviour is different when zeros in the
@@ -250,9 +260,9 @@
 Twin Predication is an entirely new concept not present in any commercial
 Vector ISA of the past forty years. To explain how normal
 Single-predication is applied in a standard Vector ISA:
-* Predication on the **destination** of a LOAD instruction creates something
+* Predication on the **source** of a LOAD instruction creates something
   called "Vector Compressed Load" (VCOMPRESS).
-* Predication on the **source** of a STORE instruction creates something
+* Predication on the **destination** of a STORE instruction creates something
   called "Vector Expanded Store" (VEXPAND).
 * SVP64 allows the two to be put back-to-back:
   one on source, one on destination.
@@ -265,7 +275,8 @@
 the source element array) and another *completely separate* predicate
 to the destination register, not just on Load/Stores but on *arithmetic*
 operations.
-No other Vector ISA in the world has this capability. All true Vector
+No other Vector ISA in the world has this back-to-back
+capability. All true Vector
 ISAs have Predicate Masks: it is an absolutely essential characteristic.
 However none of them have abstracted dual predicates out to the extent
 where this VCOMPRESS-VEXPAND effect is applicable *in general* to a
@@ -283,8 +294,8 @@ up in the ISA Tables whether it is 1P or 2P. caveat emptor!
 
 Also worth a special mention: all Load/Store operations are Twin-Predicated.
 The underlying key to understanding:
-* one Predicate applies to the Array of Memory *Addresses*,
-* the other Predicate applies to the Array of Memory *Data*.
+* one Predicate effectively applies to the Array of Memory *Addresses*,
+* the other Predicate effectively applies to the Array of Memory *Data*.
 
 # CR weird instructions
 
-- 
2.30.2
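
As an illustration of the single-predication behaviour described in the patched
text above: a minimal sketch in plain Python, not the normative SVP64
pseudocode, with invented function and parameter names. When a Predicate Mask
bit is zero the destination element is either set to zero (zeroing) or skipped
entirely and left unmodified.

```python
def single_predicated_op(op, src, dest, pmask, VL, zeroing):
    """Apply dest[i] = op(src[i]) for i in 0..VL-1 under one predicate mask."""
    for i in range(VL):
        if pmask[i]:
            dest[i] = op(src[i])  # predicate bit set: perform the element op
        elif zeroing:
            dest[i] = 0           # zeroing: write zero, no read of dest needed
        # else: skip entirely, leaving dest[i] unmodified -- which may cost a
        # Read-Modify-Write if the register file write granularity is coarser
        # than the element width
    return dest


if __name__ == "__main__":
    double = lambda x: 2 * x
    print(single_predicated_op(double, [1, 2, 3, 4], [9, 9, 9, 9],
                               [1, 0, 1, 0], 4, zeroing=True))   # [2, 0, 6, 0]
    print(single_predicated_op(double, [1, 2, 3, 4], [9, 9, 9, 9],
                               [1, 0, 1, 0], 4, zeroing=False))  # [2, 9, 6, 9]
```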
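
Similarly, a minimal sketch (again plain Python with invented names, not the
SVP64 specification pseudocode) of the back-to-back twin-predication idea: one
predicate on the source, a completely separate predicate on the destination,
with non-zeroing (skip) behaviour on both. This is what produces the combined
VCOMPRESS/VEXPAND effect and lets the source and destination element indices
drift "out-of-sync".

```python
def twin_pred_mv(src, dest, psrc, pdst, VL):
    """Copy src elements into dest under two independent predicate masks.

    Masked-out source elements are stepped over (the VCOMPRESS effect);
    masked-out destination slots are stepped over (the VEXPAND effect).
    Skipped destination elements are left unmodified (non-zeroing).
    """
    i = 0  # source element index
    j = 0  # destination element index
    while i < VL and j < VL:
        # advance i past source elements whose predicate bit is zero
        while i < VL and not psrc[i]:
            i += 1
        # advance j past destination slots whose predicate bit is zero
        while j < VL and not pdst[j]:
            j += 1
        if i >= VL or j >= VL:
            break
        dest[j] = src[i]  # the element-level scalar operation (here, a copy)
        i += 1
        j += 1
    return dest


if __name__ == "__main__":
    # source predicate selects elements 0 and 2; destination predicate selects
    # slots 1 and 3, so the two element indices drift apart as described.
    print(twin_pred_mv([10, 20, 30, 40], [0, 0, 0, 0],
                       [1, 0, 1, 0], [0, 1, 0, 1], VL=4))  # [0, 10, 0, 30]
```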