From 74b926479ed177c522aaeac68d67d345bf8e5e99 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 23 Dec 2020 01:06:59 +0000
Subject: [PATCH]

---
 openpower/sv/svp_rewrite/svp64.mdwn | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/svp_rewrite/svp64.mdwn b/openpower/sv/svp_rewrite/svp64.mdwn
index d90c33e30..9548f67d8 100644
--- a/openpower/sv/svp_rewrite/svp64.mdwn
+++ b/openpower/sv/svp_rewrite/svp64.mdwn
@@ -681,14 +681,19 @@ are mapreduced per *sub-element* as a result. illustration with a vec2:
     result.x = op(result.x, iregs[RA+i].x)
     result.y = op(result.y, iregs[RA+i].y)
 
-When SVM is set and SUBVL!=1, another variant is enabled.
+Note here that Rc=1 does not make sense when SVM is clear and SUBVL!=1.
+
+
+When SVM is set and SUBVL!=1, another variant is enabled: horizontal subvector mode.
 Example for a vec3:
     for i in range(VL):
         result = op(iregs[RA+i].x, iregs[RA+i].x)
-        result = op(result, iregs[RA+i].z)
+        result = op(result, iregs[RA+i].y)
         result = op(result, iregs[RA+i].z)
         iregs[RT+i] = result
 
+In this mode, when Rc=1 the Vector of CRs is as normal: each result element creates a corresponding CR element.
+
 ## Fail-on-first
 
 Data-dependent fail-on-first has two distinct variants: one for LD/ST,
@@ -730,6 +735,10 @@ One extremely important aspect of ffirst is:
 vectorised operations are effectively `nops` which is
 *precisely the desired and intended behaviour*.
 
+Another aspect is that for ffirst LD/STs, VL may be truncated arbitrarily to a nonzero value for any implementation-specific reason. For example: it is perfectly reasonable for implementations to alter VL when ffirst LD or ST operations are initiated on a nonaligned boundary, such that within a loop the subsequent iteration of that loop begins subsequent ffirst LD/ST operations on an aligned boundary. Likewise, to reduce workloads or balance resources.
+
+CR-based data-dependent first on the other hand MUST NOT truncate VL arbitrarily. This is because it is a precise test on which algorithms will rely.
+
 ## pred-result mode
 
 This mode merges common CR testing with predication, saving on instruction count. Below is the pseudocode excluding predicate zeroing and elwidth overrides.
-- 
2.30.2