From fc28ccd3eaf3e0038c484ebae89c2026b0935728 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 25 Dec 2018 12:56:22 +0000
Subject: [PATCH] update

---
 updates/007_2018dec25_predication.mdwn | 53 +++++++++++++++++++++++++-
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/updates/007_2018dec25_predication.mdwn b/updates/007_2018dec25_predication.mdwn
index ca0eb39..45a3047 100644
--- a/updates/007_2018dec25_predication.mdwn
+++ b/updates/007_2018dec25_predication.mdwn
@@ -29,4 +29,55 @@ issue phase itself has to become a Function Unit.
 Let me repeat that again: the instruction issue phase *itself* has to
 have its own scoreboard Dependency Matrix entry.
 
-This brings some quite fascinating (read: scary)
+This brings some quite fascinating (read: scary) challenges and opportunities.
+If handled incorrectly, it means that the entire idea of using a multi-issue
+instruction FIFO is toast, as there will be guaranteed stalling whenever
+a predicated vectorised instruction is encountered.
+
+Normally, a multi-issue engine has a guaranteed regular number of instructions
+to process and place in the queue. Even branches do not stop the flow
+of placement into the FIFO, as branch prediction (speculative execution) can
+guess with high accuracy where the branch will go. Predicated vectorised
+instruction issue is completely different: we have *no idea* - in advance -
+if the issued element-based instruction is actually going to be executed
+or not. We do not have the predicate source register (yet) because it
+hasn't been calculated, because the prior instruction (which is being
+executed out-of-order, and is **itself** dependent on prior instruction
+completion) hasn't even been started yet.
+
+Perhaps - thinking out loud - it would be okay to have a place-holder,
+waiting for the predicate bits to arrive. Perhaps it is as simple as
+adding an extra source register (predicate source) to every single Function
+Unit. So instead of each Function Unit having src1 and src2, it has
+src1, src2, predicate "bit". Given that it is just a single bit that each
+Function Unit would be waiting for, it does seem somewhat gratuitous,
+and a huge complication of an otherwise extremely simple scoreboard
+(at present, there are no CAMs and no multi-wire I/Os in any of the
+cells of either the FU-to-FU Matrix or the FU-to-Register Dependency Matrix).
+Therefore, having **separate** Function Unit(s) which wait for the
+predication register to be available, that are themselves plumbed in to
+the actual Scoreboard system, decoding and issuing further instructions only
+once the predicate register is ready, seems to be a reasonable avenue to
+explore.
+
+However, the last thing that we need is to stall execution completely,
+so a lot more thought is going to be needed. The nice thing about having
+a predicated vectorisation "Issue" Function Unit is: some of the more
+complex decoding (particularly REMAP) can hypothetically be pipelined.
+However that is **guaranteed** to result in stalled execution, as the
+out-of-order system is going to critically depend on knowing what the
+dependencies **are**! Perhaps it may be possible to put in temporary
+"blank" entries that are filled in later? Issue place-holder instructions
+into the Dependency Matrix, where we know that the registers on which
+the instruction will depend will be known at a later date?
+
+Why that needs to be considered is: remember that the whole basis of
+Simple-V is: you issue multiple *sequential* instructions. Unfortunately,
+REMAP jumbles up the word "sequential" using a 1D/2D/3D/offset algorithm,
+such that the actual register (or part-register in the case of 8/16/32-bit
+element widths) needs a calculation to be performed in order to determine
+which register is to be used. And, secondly, predication can entirely
+skip some of those element-based instructions!
+
+Talk about complex! Simple-V is supposed to be simple! No wonder
+chip designers go for SIMD and let the software sort out the mess...
-- 
2.30.2
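
The place-holder idea floated in the patch text above can be sketched in software terms. This is only a rough illustration, not the actual scoreboard design: `ElementOp`, `issue_placeholders` and `resolve` are hypothetical names invented here, registers are assumed to be plain sequential numbers (no REMAP, no element-width packing), and the predicate is assumed to be a simple bitmask with bit i controlling element i.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ElementOp:
    """One element-level operation expanded from a vector instruction (hypothetical)."""
    index: int                     # element number within the vector
    dest: int                      # destination register (sequential, no REMAP assumed)
    src1: int
    src2: int
    active: Optional[bool] = None  # None = place-holder, predicate bit not yet known


def issue_placeholders(dest: int, src1: int, src2: int, vl: int) -> List[ElementOp]:
    """Expand one vector op into vl place-holder element ops."""
    return [ElementOp(i, dest + i, src1 + i, src2 + i) for i in range(vl)]


def resolve(ops: List[ElementOp], predicate: int) -> List[ElementOp]:
    """Fill in the place-holders once the predicate value arrives.

    Bit i of `predicate` decides whether element i executes; masked-out
    elements are dropped, which models predication skipping elements.
    """
    for op in ops:
        op.active = bool((predicate >> op.index) & 1)
    return [op for op in ops if op.active]


# Issue four place-holders, then resolve them with predicate 0b0101.
pending = issue_placeholders(dest=8, src1=16, src2=24, vl=4)
ready = resolve(pending, predicate=0b0101)
print([op.dest for op in ready])  # prints [8, 10]
```

The point of the sketch is only that the element operations can exist as entries before their predicate bit is known, and that resolving the predicate later either confirms or drops each one. Whether a hardware Dependency Matrix can tolerate such late-filled entries without stalling is exactly the open question raised in the text.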