From e506bbfe7a44a0baa63ecd557ee776936b3d2b56 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 26 Jun 2022 11:46:41 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 42 ++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 3ff57f151..75a08fa8c 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -537,6 +537,10 @@ Vector Reduce Mode issues a deterministic tree-reduction schedule to the underly
 (Power ISA v3.0B) operation is leveraged, unmodified, to give the
 *appearance* and *effect* of Reduction.
 
+Vector-result reduction **requires**
+the destination to be a Vector, which will be used to store
+intermediary results.
+
 Given that the tree-reduction schedule is deterministic,
 Interrupts and exceptions
 can therefore also be precise. The final result will be in the first
@@ -556,6 +560,44 @@ not make sense. Many 3-input instructions (madd, fmadd) unlike Scalar Reduction
 in particular do not make sense, but `ternlogi`, if used with care,
 would.
 
+**Parallel-Reduction with Predication**
+
+To avoid breaking the strict RISC-paradigm, keeping the Issue-Schedule
+completely separate from the actual element-level (scalar) operations,
+Move operations are **not** included in the Schedule. This means that
+the Schedule leaves the final (scalar) result in the first-non-masked
+element of the Vector used. With the predicate mask being dynamic
+(but deterministic) this result could be anywhere.
+
+If that result is needed to be moved to a (single) scalar register
+then a follow-up `sv.mv/sm=predicate rt, ra.v` instruction will be
+needed to get it, where the predicate is the exact same predicate used
+in the prior Parallel-Reduction instruction. For *some* implementations
+this may be a slow operation. It may be better to perform a pre-copy
+of the values, compressing them (VREDUCE-style) into a contiguous block,
+which will guarantee that the result goes into the very first element
+of the destination vector.
+
+**Usage conditions**
+
+The simplest usage is to perform an overwrite, specifying all three
+register operands the same.
+
+    setvl VL=6
+    sv.add/vr 8.v, 8.v, 8.v
+
+The Reduction Schedule will issue the Parallel Tree Reduction spanning
+registers 8 through 13, by adjusting the offsets to RT, RA and RB as
+necessary (see "Parallel Reduction algorithm" in a later section).
+
+A non-overwrite is possible as well but just as with the overwrite
+version, only those destination elements necessary for storing
+intermediary computations will be written to: the remaining elements
+will **not** be overwritten and will **not** be zero'd.
+
+    setvl VL=4
+    sv.add/vr 0.v, 8.v, 8.v
+
 ## Sub-Vector Horizontal Reduction
 
 Note that when SVM is clear and SUBVL!=1 the sub-elements are
-- 
2.30.2