From d6e469ad3a41a03964401dc8b510a8119d471ff1 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 26 Oct 2020 01:50:12 +0000
Subject: [PATCH]

---
 openpower/openpower/sv/predication.mdwn | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/openpower/openpower/sv/predication.mdwn b/openpower/openpower/sv/predication.mdwn
index 5e204bd60..3df29fbca 100644
--- a/openpower/openpower/sv/predication.mdwn
+++ b/openpower/openpower/sv/predication.mdwn
@@ -63,3 +63,17 @@ The Dependency Matrix logic from the CR proposal favourably applies equally to t
 * A solution is to introduce a virtual register naming scheme however this slso introduces huge complexity as the register cache has to be capable of swapping reservations from 64 bitlevel to full 64bit scalar level *and* keep the Dependency Matrices synchronised
 
 it is enormously complex and likely to result in debugging, verification and ongoing maintenance difficulties.
+
+## Schemes which split integer regs into chunks
+
+These ideas are based on the principle that each chunk of 8 (or 16) bits of a scalar integer register may be covered by its own DM row: splitting a 64-bit register into 8 chunks of 8 bits would, for example, require 8 DM entries.
+
+This would, for vector sizes of 8, solve the "chaining" problem reasonably well, even when two FUs (or two clock cycles) were required to deal with 4 elements at a time. The "compare" that generated the predicate would be ready to go into the first "chunk" of predicate bits whilst the second compare was still being issued.
+
+The problems start when trying to allocate bits of predicate to units. Just like the single-DM-row per entire scalar reg case, a shadow-capable Predicate Function Unit is now required (already determined to be costly), except that now, if there are 8 chunks requiring 8 Predicate FUs, *the problem is made 8x worse*.
+
+Not only that, but it becomes even more complex when trying to bring in virtual register caching in order to bring down the overall FU-REGs DM row count. The numbers are much lower (8x 8-bit chunks of a scalar int require only 8 DM Rows and 8 virtual subdivisions), however *this is per in-flight register*.
+
+Out-of-order systems, to be effective, require several operations to be "in-flight" (POWER10 has up to 1,000 in-flight instructions), and if every predicated vector operation needed one 8-chunked scalar register each, it becomes exceedingly complex very quickly.
+
+Overall this idea, which initially seems to save resources, brings together all the least favourable aspects of the other proposals and combines them!
-- 
2.30.2
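
The scaling argument in the added section can be sketched as simple arithmetic. This is only a rough illustration, assuming (as the text states) one DM row per chunk per in-flight register and one Predicate FU per chunk. The function name `chunked_predicate_cost` and its parameters are hypothetical and not part of the patched file or any existing codebase.

```python
def chunked_predicate_cost(reg_bits=64, chunk_bits=8, in_flight_regs=1):
    """Estimate resources for a chunked predicate register scheme.

    Assumptions (illustrative only): one DM row per chunk per
    in-flight register, and one Predicate FU per chunk.
    """
    if chunk_bits <= 0 or reg_bits % chunk_bits != 0:
        raise ValueError("chunk_bits must evenly divide reg_bits")
    chunks = reg_bits // chunk_bits
    return {
        "chunks_per_reg": chunks,
        "predicate_fus": chunks,             # one Predicate FU per chunk
        "dm_rows": chunks * in_flight_regs,  # rows multiply per in-flight reg
    }


if __name__ == "__main__":
    # Compare 8-bit and 16-bit chunking for a few in-flight register counts.
    for chunk_bits in (8, 16):
        for in_flight in (1, 4, 16):
            print(chunk_bits, in_flight,
                  chunked_predicate_cost(64, chunk_bits, in_flight))
```

Under these assumptions the per-register cost looks small (8 rows for 8-bit chunks), but the `dm_rows` figure grows linearly with the number of in-flight predicate registers, which is the concern the section raises.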