From ea249820c037d53bc7fb6b97c96a953298b544c7 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 11 Dec 2023 02:43:20 +0000
Subject: [PATCH]

---
 openpower/sv/svp64_quirks.mdwn | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/svp64_quirks.mdwn b/openpower/sv/svp64_quirks.mdwn
index b87bf0da4..fa4a32a31 100644
--- a/openpower/sv/svp64_quirks.mdwn
+++ b/openpower/sv/svp64_quirks.mdwn
@@ -484,7 +484,7 @@ effect a change of VL:

 ```
 for i in range(VL):
-    result = element_operation(GPR(RA+i), GPR(RB+i))
+    GPR(RT+i) = result = operation(GPR(RA+i), GPR(RB+i))
     if test(result):
         VL = i
         break
@@ -501,6 +501,37 @@ beyond the Vector Truncation point.
 In-order systems will have a slightly harder time and may choose
 to execute one element only at a time, reducing performance as a result.

+# Data-Dependent Fail-First implicit mapreduce mode
+
+This mode is best illustrated first with pseudocode, which should
+be compared with the above. It is crucial to note that both
+RT and RA are scalar: only RB is Vector, yet just as with
+mapreduce mode looping *continues*.
+
+```
+for i in range(VL):
+    GPR(RT) = result = operation(GPR(RA), GPR(RB+i))
+    if test(result):
+        VL = i
+        break
+```
+
+The "normal" rule for SV Looping is that looping
+terminates at the first scalar result (if destination is
+set to scalar). This rule is *disabled* for mapreduce mode,
+allowing a scalar to be used as an "accumulator" by
+setting the result (RT, FRT, BF) to be the exact same
+register as one of the sources.
+
+It turned out to be extremely useful to have *conditional*
+termination of such "mapreducing" style accumulation,
+for example to terminate and truncate dotproduct
+accumulation should the arithmetic accumulator overflow.
+Or, in the [[openpower/sv/cookbook/fortran_maxloc]]
+example, to terminate the parallel max-search at the
+first instance where the element currently tested is
+no longer greater than that previously found.
+
 # OE=1

 The hardware cost of Sticky Overflow in a parallel environment is immense.
-- 
2.30.2
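
Note on the patch: the Data-Dependent Fail-First mapreduce pseudocode it adds can be tried out with a minimal runnable Python sketch. This is only an illustrative model under stated assumptions, not the Simulator's implementation: the function name `ddffirst_mapreduce` and its parameters are hypothetical, the register file is replaced by a plain Python value and list, and RT is assumed to be the same register as RA so that it acts as the accumulator. As in the pseudocode, the accumulator is written before the test, so it holds the failing result on early exit.

```python
from operator import add


def ddffirst_mapreduce(acc, vec, operation, test):
    """Hypothetical model of the fail-first mapreduce loop.

    acc       -- scalar accumulator (stands in for GPR(RT), with RT == RA)
    vec       -- list standing in for the Vector GPR(RB+i), i in range(VL)
    operation -- two-input element operation
    test      -- predicate on the result; True means "fail" (truncate VL)

    Returns (acc, vl): the accumulator as last written and the new VL.
    """
    vl = len(vec)
    for i in range(vl):
        # GPR(RT) = result = operation(GPR(RA), GPR(RB+i)), RT == RA
        acc = result = operation(acc, vec[i])
        if test(result):
            vl = i  # truncate VL at the failing element
            break
    return acc, vl


# Example: stop accumulating once the running sum no longer fits in 8 bits.
total, new_vl = ddffirst_mapreduce(0, [100, 100, 100, 100], add,
                                   lambda r: r > 255)
```

With the inputs shown, the third addition exceeds 255, so the loop exits with VL truncated to 2. Details such as the VLi option or element-width handling are deliberately left out of this sketch.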