(no commit message)

author lkcl <lkcl@web>

Mon, 26 Oct 2020 01:58:56 +0000 (01:58 +0000)

committer IkiWiki <ikiwiki.info>

Mon, 26 Oct 2020 01:58:56 +0000 (01:58 +0000)
diff --git a/openpower/openpower/sv/predication.mdwn b/openpower/openpower/sv/predication.mdwn

index 819735379b8849ee9013ba01cad50c3a884c3b2d..4ecc61ca67b171f286b0d4dc8006b6da4cc7c991 100644
--- a/openpower/openpower/sv/predication.mdwn
+++ b/openpower/openpower/sv/predication.mdwn
@@ -49,7 +49,7 @@ a big advantage of this is that unpredicated operations just set the predicate t
  This idea has several disadvantages.
  
  * the single DM entry for the entire 64 bits creates a read hazard that has to be resolved through the addition of a special Shadowing Function Unit.  Only when the entire predicate is available can the die-cancel/ok be pulled on the FU elements each bit covers
-* this situation is exacerbated if one vector creates a predicate mask that is then used to mask immediately following instructions.  Ordinarily, Cray-styke "chaining" would be possible.  The single DM entry for the entire predicate mask prohibits this.
+* this situation is exacerbated if one vector creates a predicate mask that is then used to mask immediately following instructions.  Ordinarily (i.e. without the predicate involved), Cray-style "chaining" would be possible.  The single DM entry for the entire predicate mask prohibits this because the subsequent operations can only proceed when the *entire* mask has been computed.
  * Allocation of bits to FUs gets particularly complex for SIMD (elwidth overrides) requiring shift and mask logic that is simply not needed compared to "one-for-one" schemes (above)
  
  Overall there is very little in favour of this concept.
@@ -78,4 +78,6 @@ Not only that but it is even more complex when trying to bring in virtual regist
  
  Out-of-order systems, to be effective, require several operations to be "in-flight" (POWER10 has up to 1,000 in-flight instructions) and if every predicated vector operation needed one 8-chunked scalar register each it becomes exceedingly complex very quickly.
  
+Even more than that, when computing the mask from a vector "compare", the groupings are troublesome to think through how to implement, which is itself a bad sign.  It is suspected that chaining will be complex or adversely affected by certain combinations of element width.
+
  Overall this idea which initially seems to save resources brings together all the least favourable aspects of other proposals and combines all of them!
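
Reviewer note on the first hunk: the chaining argument can be illustrated with a toy timing model. The sketch below is not taken from the page or from any Libre-SOC code. It assumes a vector compare that yields one predicate bit per cycle and a dependent masked operation that issues one element per cycle. The function name `finish_times` and its parameters are hypothetical.

```python
def finish_times(vl, per_bit_tracking, latency=1):
    """Cycle in which each element of the dependent masked op completes.

    Toy model (assumptions, not from the source): the compare delivers
    predicate bit i at the end of cycle i + 1, and the dependent op
    issues at most one element per cycle, taking `latency` cycles each.
    """
    mask_ready = [i + 1 for i in range(vl)]
    whole_mask_ready = max(mask_ready)  # single entry covers every bit
    done = []
    next_issue = 0
    for i in range(vl):
        # Per-bit tracking waits only for bit i; a single dependency
        # entry for the whole mask waits for the last bit to arrive.
        ready = mask_ready[i] if per_bit_tracking else whole_mask_ready
        start = max(ready, next_issue)
        next_issue = start + 1
        done.append(start + latency)
    return done


chained = finish_times(4, per_bit_tracking=True)
blocked = finish_times(4, per_bit_tracking=False)
print("chained:", chained)
print("blocked:", blocked)
```

Under these assumptions the per-bit case overlaps the dependent operation with the compare, whereas the whole-mask case cannot start any element until the final predicate bit exists. That is the effect the amended bullet describes.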