From 992155539a8434dbe62c91f42cf608d3263fd9cf Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 26 Oct 2020 01:58:56 +0000
Subject: [PATCH]

---
 openpower/openpower/sv/predication.mdwn | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/openpower/openpower/sv/predication.mdwn b/openpower/openpower/sv/predication.mdwn
index 819735379..4ecc61ca6 100644
--- a/openpower/openpower/sv/predication.mdwn
+++ b/openpower/openpower/sv/predication.mdwn
@@ -49,7 +49,7 @@ a big advantage of this is that unpredicated operations just set the predicate t
 This idea has several disadvantages.
 
 * the single DM entry for the entire 64 bits creates a read hazard that has to be resolved through the addition of a special Shadowing Function Unit. Only when the entire predicate is available can the die-cancel/ok be pulled on the FU elements each bit covers
-* this situation is exacerbated if one vector creates a predicate mask that is then used to mask immediately following instructions. Ordinarily, Cray-styke "chaining" would be possible. The single DM entry for the entire predicate mask prohibits this.
+* this situation is exacerbated if one vector creates a predicate mask that is then used to mask immediately following instructions. Ordinarily (i.e. without the predicate involved), Cray-style "chaining" would be possible. The single DM entry for the entire predicate mask prohibits this because the subsequent operations can only proceed when the *entire* mask has been computed.
 * Allocation of bits to FUs gets particularly complex for SIMD (elwidth overrides) requiring shift and mask logic that is simply not needed compared to "one-for-one" schemes (above)
 
 Overall there is very little in favour of this concept.
@@ -78,4 +78,6 @@ Not only that but it is even more complex when trying to bring in virtual regist
 
 Out-of-order systems, to be effective, require several operations to be "in-flight" (POWER10 has up to 1,000 in-flight instructions) and if every predicated vector operation needed one 8-chunked scalar register each it becomes exceedingly complex very quickly.
 
+Even more than that, when computing the mask from a vector "compare", the groupings are troublesome to think through how to implement, which is itself a bad sign. It is suspected that chaining will be complex or adversely affected by certain combinations of element width.
+
 Overall this idea which initially seems to save resources brings together all the least favourable aspects of other proposals and combines all of them!
-- 
2.30.2