(no commit message)

author lkcl <lkcl@web>

Sun, 14 Aug 2022 02:04:23 +0000 (03:04 +0100)

committer IkiWiki <ikiwiki.info>

Sun, 14 Aug 2022 02:04:23 +0000 (03:04 +0100)
diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn

index c51e85d0dc58698c55988e6686e9f35e02dc9415..4a15058a9b2147121354a1baa6acdd27c4d057c7 100644 (file)
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -587,18 +587,9 @@ will **not** be overwritten and will **not** be zero'd.
  
  ## Sub-Vector Horizontal Reduction
  
-Note that when SVM is clear and SUBVL!=1 the sub-elements are
-*independent*, i.e. they are mapreduced per *sub-element* as a result.
-illustration with a vec2, assuming RA==RT, e.g `sv.add/mr/vec2 r4, r4, r16.v`
-
-    for i in range(0, VL):
-        # RA==RT in the instruction. does not have to be
-        iregs[RT].x = op(iregs[RT].x, iregs[RB+i].x)
-        iregs[RT].y = op(iregs[RT].y, iregs[RB+i].y)
-
-Thus logically there is nothing special or unanticipated about
-`SVM=0`: it is expected behaviour according to standard SVP64
-Sub-Vector rules.
+Note that when SVM is clear and SUBVL!=1 a Parallel Reduction is performed
+on all first Subvector elements, followed by another separate independent
+Parallel Reduction on all the second Subvector elements and so on.
  
  By contrast, when SVM is set and SUBVL!=1, a Horizontal
  Subvector mode is enabled, applying the Parallel Reduction
@@ -617,9 +608,15 @@ Note that as this is a Parallel Reduction, for best results
  it should be an overwrite operation, where the result for
  the Horizontal Reduction of each Subvector will be in the
  first Subvector element.
-
  Also note that use of Rc=1 is `UNDEFINED` behaviour.
  
+In essence, Structure Packing is being combined with Parallel
+Reduction.  If the elements are laid out as a 2D matrix, with each
+Subvector occupying one row, and Parallel Reduction is applied per
+row, then when `SVM` is **clear** the matrix is first transposed
+(like Pack/Unpack) and the Parallel Reduction is still applied to
+each **row**.
+
  # Fail-on-first <a name="fail-first"> </a>
  
  Data-dependent fail-on-first has two distinct variants: one for LD/ST
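
Note on the Sub-Vector Horizontal Reduction hunks: the "transpose, then reduce each row" description can be sketched in a few lines of Python. This is only an illustrative model, not the specification's pseudocode. The function name `subvector_reduce` and its parameters are hypothetical, and a plain sequential `functools.reduce` stands in for the actual Parallel Reduction schedule, which gives the same value only for associative operations such as integer add. Register numbering, predication and the overwrite behaviour of the result are not modelled.

```python
from functools import reduce
from operator import add


def subvector_reduce(elements, subvl, op=add, svm=False):
    """Model the two reduction modes on a flat list of Subvector elements.

    Hypothetical helper for illustration only: 'elements' is a flat list
    whose length is a multiple of 'subvl'.
    """
    if len(elements) % subvl != 0:
        raise ValueError("element count must be a multiple of subvl")
    # One row per Subvector: rows[i] holds the subvl elements of Subvector i.
    rows = [elements[i:i + subvl] for i in range(0, len(elements), subvl)]
    if not svm:
        # SVM clear: transpose, so each row holds the same-position
        # sub-elements (all first elements, all second elements, ...).
        rows = [list(col) for col in zip(*rows)]
    # In both modes the reduction is then applied per row.
    return [reduce(op, row) for row in rows]


vec2_data = [1, 2, 3, 4, 5, 6]  # three vec2 Subvectors: (1,2) (3,4) (5,6)
per_position = subvector_reduce(vec2_data, subvl=2, svm=False)
per_subvector = subvector_reduce(vec2_data, subvl=2, svm=True)
print(per_position)   # [9, 12]
print(per_subvector)  # [3, 7, 11]
```

With `svm=False` there is one result per sub-element position (two for a vec2), whereas with `svm=True` there is one result per Subvector, matching the statement that each Subvector's Horizontal Reduction result ends up in its first element.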