From 94ade358c5cb79892662df04949004dad1977b5c Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 14 Aug 2022 03:04:23 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index c51e85d0d..4a15058a9 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -587,18 +587,9 @@ will **not** be overwritten and will **not** be zero'd.
 
 ## Sub-Vector Horizontal Reduction
 
-Note that when SVM is clear and SUBVL!=1 the sub-elements are
-*independent*, i.e. they are mapreduced per *sub-element* as a result.
-illustration with a vec2, assuming RA==RT, e.g `sv.add/mr/vec2 r4, r4, r16.v`
-
-    for i in range(0, VL):
-        # RA==RT in the instruction. does not have to be
-        iregs[RT].x = op(iregs[RT].x, iregs[RB+i].x)
-        iregs[RT].y = op(iregs[RT].y, iregs[RB+i].y)
-
-Thus logically there is nothing special or unanticipated about
-`SVM=0`: it is expected behaviour according to standard SVP64
-Sub-Vector rules.
+Note that when SVM is clear and SUBVL!=1 a Parallel Reduction is performed
+on all first Subvector elements, followed by another separate independent
+Parallel Reduction on all the second Subvector elements and so on.
 
 By contrast, when SVM is set and SUBVL!=1, a Horizontal
 Subvector mode is enabled, applying the Parallel Reduction
@@ -617,9 +608,15 @@ Note that as this is a Parallel Reduction, for best results
 it should be an overwrite operation, where the result for
 the Horizontal Reduction of each Subvector will be in the
 first Subvector element.
-
 Also note that use of Rc=1 is `UNDEFINED` behaviour.
 
+In essence what is happening here is that Structure Packing is being
+combined with Parallel Reduction. If the Subvector elements may be
+laid out as a 2D matrix, with the Subvector elements on rows,
+and Parallel Reduction is applied per row, then if `SVM` is **clear**
+the Matrix is transposed (like Pack/Unpack)
+before still applying the Parallel Reduction to the **row**.
+
 # Fail-on-first
 
 Data-dependent fail-on-first has two distinct variants: one for LD/ST
-- 
2.30.2
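
Note on the matrix view in the added paragraph: it can be illustrated with a minimal Python sketch. This is not taken from the SVP64 specification or its simulator. The names `reduce_rows` and `subvector_reduce` are hypothetical, and a plain left fold stands in for the actual Parallel Reduction schedule, which is only equivalent for associative operations such as integer add. Register files, predication and the "result in the first Subvector element" placement are deliberately not modelled.

```python
from functools import reduce
import operator


def reduce_rows(matrix, op):
    # Stand-in for Parallel Reduction: a simple left fold over each row.
    # (Assumption: op is associative, so fold order does not matter.)
    return [reduce(op, row) for row in matrix]


def subvector_reduce(elements, op, svm):
    # elements: VL subvectors, each a list of SUBVL sub-elements,
    # i.e. a VL x SUBVL matrix with one subvector per row.
    if not svm:
        # SVM clear: transpose first, so that row k holds the k-th
        # sub-element of every subvector, then reduce per row as usual.
        elements = [list(col) for col in zip(*elements)]
    # SVM set: reduce each subvector (row) on its own.
    return reduce_rows(elements, op)


# VL=3 elements, SUBVL=2 (vec2)
vec2s = [[1, 10], [2, 20], [3, 30]]

per_subelement = subvector_reduce(vec2s, operator.add, svm=False)
horizontal = subvector_reduce(vec2s, operator.add, svm=True)

print(per_subelement)  # one result per sub-element position
print(horizontal)      # one result per subvector
```

With `svm=False` the sketch yields one result per sub-element position (SUBVL results); with `svm=True` it yields one result per subvector (VL results). This matches the "transpose, then still reduce the row" description.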