From: Luke Kenneth Casson Leighton
Date: Thu, 3 Jan 2019 06:04:34 +0000 (+0000)
Subject: update predicate discussion
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=29327cb2bd1e3c1a2ab0f37541dd57045abd3c2c;p=crowdsupply.git

update predicate discussion
---

diff --git a/updates/007_2018dec25_predication.mdwn b/updates/007_2018dec25_predication.mdwn
index 8942af1..9571fe2 100644
--- a/updates/007_2018dec25_predication.mdwn
+++ b/updates/007_2018dec25_predication.mdwn
@@ -83,3 +83,49 @@ skip some of those element-based instructions! Talk about complex! Simple-V is supposed to be simple! No wonder chip designers go for SIMD and let the software sort out the mess...
+
+# Placeholder instructions: predication shadow
+
+Recall from earlier updates that Mitch Alsup describes, in two unpublished
+book chapters, some augmentations and modernisations to the 6600 Scoreboard
+system, providing speculative branch execution as well as precise exceptions.
+Both are based on the same idea of adding a "schroedinger" wire that may
+be used to kill off future instructions, alongside an additional
+**non-register-based** Write Hazard dependency that prevents register
+writes from committing, **without** preventing the instruction from actually
+calculating the result that is to be written (if and when permitted).
+
+Mentioned above is the idea of issuing "place-holder" instructions. These
+are basically instructions which are waiting for their relevant predicate
+bit to become *available*. They could hypothetically still be
+executed (or at least begin execution). They would however **not** be
+permitted to commit the results to the register file, and they would be
+"shadowed" by the above-proposed "Predication Calculating Function Unit".
+
+This ineptly-named Function Unit would have the relevant predication register
+as its src, just like any other Function Unit with dependent source registers.
+It would similarly have a "schroedinger" wire, and it would similarly
+cast a write-block shadow over the Vectorised instructions that were waiting
+for predication bits.
+
+Once the predicate register is available, the Predicate-computing FU would
+begin "farming out" individual bits of the predicate, asserting "Go\_Die"
+schroedinger signals on those Vectorised instructions where their associated
+predicate bit is zero (or, when zeroing is enabled, turning them into
+"zero result" instructions), and, for those instructions where the predicate
+bit is set, cancelling the write-block shadow.
+
+Whether this is a wise utilisation of resources is another matter. If
+50% or fewer of the predicate bits are routinely set, a significant portion
+of the Vectorised Function Units could hypothetically be calculating results
+that are *known* to be discarded almost immediately. Also, the whole point
+of using a multi-issue execution engine was to save resources by not allocating
+instructions *at all* where the predication bit for that Vectorised operation
+is zero.
+
+However, it is better than the alternatives, and it's possible to
+keep to a multi-issue micro-architecture as well, which is important in
+order to achieve the target performance. Ultimately, simulations can tell us
+whether the GPU and VPU workloads will have significant predication better
+than guessing will: we'll just have to see how it goes.
+
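The shadow-resolution step described in the patch can be sketched as a small software model. This is only an illustration under stated assumptions, not the hardware design: the names `ElementOp`, `State` and `resolve_shadow`, the dictionary register file, and the register numbers are all hypothetical. Each element operation computes its result while shadowed. Once the predicate arrives, a set bit cancels the shadow and commits the write. A clear bit either kills the operation ("Go\_Die") or, in zeroing mode, writes a zero. The return value counts results that were calculated and then thrown away, which is the resource cost the patch worries about.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    SHADOWED = auto()   # result may be computed, but the write is blocked
    COMMITTED = auto()  # shadow cancelled, result written to the register file
    KILLED = auto()     # "Go_Die" asserted, result discarded
    ZEROED = auto()     # zeroing mode: a zero is written instead of the result


@dataclass
class ElementOp:
    """One element of a Vectorised instruction, issued as a place-holder.

    Hypothetical model: field names are illustrative, not from the source.
    """
    dest: int                     # destination register number
    result: int                   # value computed speculatively under the shadow
    state: State = State.SHADOWED


def resolve_shadow(ops, predicate, regfile, zeroing=False):
    """Farm out predicate bits: bit i decides the fate of ops[i].

    Returns the number of speculatively computed results that were discarded.
    """
    for i, op in enumerate(ops):
        if (predicate >> i) & 1:
            op.state = State.COMMITTED      # cancel the write-block shadow
            regfile[op.dest] = op.result
        elif zeroing:
            op.state = State.ZEROED         # becomes a "zero result" instruction
            regfile[op.dest] = 0
        else:
            op.state = State.KILLED         # Go_Die: no register write at all
    return sum(op.state is not State.COMMITTED for op in ops)


# Four element operations targeting registers 4..7, predicate 0b0101.
regfile = {r: -1 for r in range(8)}
ops = [ElementOp(dest=4 + i, result=10 * (i + 1)) for i in range(4)]
wasted = resolve_shadow(ops, predicate=0b0101, regfile=regfile)
print(wasted, regfile)
```

With half of the predicate bits clear, half of the computed results are discarded, matching the concern raised about workloads where predication is routinely 50% or less.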