From 8524cda6a12782d8819868c5929a166eea110fdd Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 26 Jun 2022 15:26:24 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index fc15904a7..365d8a8f5 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -604,7 +604,7 @@ will **not** be overwritten and will **not** be zero'd.
 Note that when SVM is clear and SUBVL!=1 the sub-elements are
 *independent*, i.e. they are mapreduced per *sub-element* as a result.
 
-illustration with a vec2, assuming RA==RT, e.g `sv.add/mr/vec2 r4, r4, r16`
+illustration with a vec2, assuming RA==RT, e.g `sv.add/mr/vec2 r4, r4, r16.v`
 
     for i in range(0, VL):
         # RA==RT in the instruction. does not have to be
@@ -622,28 +622,33 @@ like a traditional Vector Processor Reduction instruction.
 Example for a vec2:
 
     for i in range(VL):
-        iregs[RT+i] = op(iregs[RA+i].x, iregs[RA+i].y)
+        iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
 
 Example for a vec3:
 
     for i in range(VL):
-        iregs[RT+i] = op(iregs[RA+i].x, iregs[RA+i].y)
-        iregs[RT+i] = op(iregs[RT+i]  , iregs[RA+i].z)
+        iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
+        iregs[RT+i] = op(iregs[RT+i]  , iregs[RB+i].z)
 
 Example for a vec4:
 
     for i in range(VL):
-        iregs[RT+i] = op(iregs[RA+i].x, iregs[RA+i].y)
-        iregs[RT+i] = op(iregs[RT+i]  , iregs[RA+i].z)
-        iregs[RT+i] = op(iregs[RT+i]  , iregs[RA+i].w)
+        iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
+        iregs[RT+i] = op(iregs[RT+i]  , iregs[RB+i].z)
+        iregs[RT+i] = op(iregs[RT+i]  , iregs[RB+i].w)
 
 In this mode, when Rc=1 the Vector of CRs is as normal: each result
 element creates a corresponding CR element (for the final, reduced, result).
 
-Note that the destination (RT) is automatically used as an "Accumulator"
-register, and consequently the Sub-Vector Loop is interruptible.
-If RT is a Scalar then as usual the main VL Loop terminates at the
-first predicated element (or the first element if unpredicated).
+Note:
+
+1. that the destination (RT) is inherently used as an "Accumulator"
+   register, and consequently the Sub-Vector Loop is interruptible.
+   If RT is a Scalar then as usual the main VL Loop terminates at the
+   first predicated element (or the first element if unpredicated).
+2. that the Sub-Vector designation applies to RA and RB *but not RT*.
+3. that the number of operations executed is one less than the Sub-vector
+   length
 
 # Fail-on-first
 
-- 
2.30.2
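
Review note: the patched vec2/vec3/vec4 pseudocode can be exercised with a small Python model. This is only a sketch under stated assumptions, not the reference simulator: `subvector_reduce` is a hypothetical name, each register element is modelled as a Python tuple of `subvl` sub-elements, and predication, Rc=1 CR generation and interruption are not modelled. It follows the patched form, taking the first sub-element from RA and the remaining ones from RB, with RT as the scalar accumulator.

```python
from operator import add


def subvector_reduce(op, ra, rb, subvl):
    """Model of the patched Sub-Vector horizontal reduction (illustrative only).

    ra, rb: lists of VL elements, each a tuple of subvl sub-elements.
    Returns (rt, op_count): one scalar result per element, plus the
    total number of times `op` was applied.
    """
    if subvl < 2:
        raise ValueError("a sub-vector reduction needs SUBVL of 2, 3 or 4")
    rt = []
    op_count = 0
    for a, b in zip(ra, rb):
        # First step: op(RA.x, RB.y), written to the accumulator RT.
        acc = op(a[0], b[1])
        op_count += 1
        # Remaining steps: op(RT, RB.z), then op(RT, RB.w).
        for j in range(2, subvl):
            acc = op(acc, b[j])
            op_count += 1
        rt.append(acc)
    return rt, op_count


# Usage: VL=2, vec4, integer add.
ra = [(1, 2, 3, 4), (10, 20, 30, 40)]
rb = [(5, 6, 7, 8), (50, 60, 70, 80)]
results, ops = subvector_reduce(add, ra, rb, subvl=4)
print(results, ops)  # [22, 220] 6
```

Each element costs `subvl - 1` applications of `op`, which matches note 3 in the patch, and the result list holds scalars rather than sub-vectors, which matches note 2.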