From 181ac8dd9bbcae8ccf7cac77ef852509373e7e82 Mon Sep 17 00:00:00 2001
From: Andrey Miroshnikov
Date: Wed, 9 Aug 2023 18:13:40 +0000
Subject: [PATCH] inorder_model: Document single-issue in-order model task.

---
 3d_gpu/architecture/inorder_model.mdwn | 108 +++++++++++++++++++++++++
 1 file changed, 108 insertions(+)
 create mode 100644 3d_gpu/architecture/inorder_model.mdwn

diff --git a/3d_gpu/architecture/inorder_model.mdwn b/3d_gpu/architecture/inorder_model.mdwn
new file mode 100644
index 000000000..59f8d5abb
--- /dev/null
+++ b/3d_gpu/architecture/inorder_model.mdwn
@@ -0,0 +1,108 @@
+# Single-Issue, In-Order Processor Core
+
+* First steps for a newbie developer [[docs/firststeps]]
+* bugreport
+
+At present *[Update when this is no longer the case]*, the Libre-SOC core
+utilises a Finite-State Machine (FSM) to control the fetch/dec/issue/exec
+pipelines, with only one pipeline being active at any given time. This is good
+for debugging the HDL, but severely restricts performance as a single
+instruction will take tens of clock cycles to complete.
+
+A Single-Issue In-Order control unit will allow every pipeline to be active,
+and raises the ideal maximum throughput to 1 instruction per clock cycle,
+barring any register hazards.
+
+This control unit has not been written in HDL yet; however, there is currently
+a task to develop the model for the simulator first. The model will be used to
+determine performance (and eventually to write the HDL).
+
+# The Model
+## Brief [src](https://bugs.libre-soc.org/show_bug.cgi?id=1039)
+The model for the Single-Issue In-Order core needs to be added to the in-house
+Python simulator (`ISACaller`, called by `pypowersim`), which will allow basic
+*performance estimates*.
+
+For now, this model resides outside the simulator, and
+is *completely standalone*.
+
+Eventually, Cavatools code will be studied to extract its power consumption
+estimation and re-implement it in Python.
+
+## Task given [src](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c1) [src](https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45)
+An offline instruction ordering analyser needs to be written that models a
+(simple, initially V3.0-only) **in-order core** and gives an estimate of
+instructions per clock (IPC).
+
+Hazard protection should be a straightforward, simple bit vector:
+
+- Take the write result register number: set the corresponding bit
+- For all read registers, check the corresponding bit. If the bit is set, STALL
+  (fake/model-stall)
+
+A stall is defined as a delay in execution of an instruction in order to
+resolve a hazard (i.e. trying to read a register while it is being written to).
+See the [Wikipedia article on Pipeline Stall](https://en.wikipedia.org/wiki/Pipeline_stall)
+
+Input should be:
+
+- An instruction with its operands (as an assembler listing)
+- Optionally, a memory address and whether it is read or written
+
+The input will come as trace output from the ISACaller simulator,
+[see bug comments #7-#16](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c7)
+
+Some classes are needed which "model" the pipeline stages: fetch, decode,
+issue, and execute.
+
+One global "STALL" flag will cause all buses to stop:
+
+- Fetch stops fetching.
+- Decode stops (either because it is empty, or because it holds an instruction
+  whose read registers are being written to).
+- Issue stops.
+- Execute (pipelines) runs as an empty slot (except for the initial instruction
+  causing the stall).
+
+Example (PC chosen arbitrarily):
+
+    addi 3, 4, 5      #PC=8
+    cmpi 1, 0, 3, 4   #PC=12
+    ld 1, 2(3)        #PC=16 EA=0x12345678
+
+The third operand of `cmpi` is the register to use in the comparison, so
+register 3 needs to be read. However, `addi` will be writing to this register,
+and thus a STALL will occur when `cmpi` is in the decode phase.
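The bit-vector hazard check described above can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual code: the class name `HazardVector` and its methods are hypothetical, and the read/write register sets for `addi` and `cmpi` are decoded by hand here rather than taken from an ISACaller trace.

```python
class HazardVector:
    """Hypothetical helper: one bit per register with a write in flight."""

    def __init__(self):
        self.pending = 0  # bit n set => register n is currently being written

    def must_stall(self, read_regs):
        """Return True if any register about to be read has its bit set."""
        return any(self.pending & (1 << reg) for reg in read_regs)

    def mark_write(self, reg):
        """Record that an in-flight instruction will write to `reg`."""
        self.pending |= 1 << reg

    def clear_write(self, reg):
        """Record that the write to `reg` has completed."""
        self.pending &= ~(1 << reg)


hazards = HazardVector()

# addi 3, 4, 5: reads r4, writes r3 (operands decoded by hand, an assumption)
assert not hazards.must_stall([4])
hazards.mark_write(3)

# cmpi 1, 0, 3, 4: needs to read r3 while the write from addi is still pending
stall_before = hazards.must_stall([3])

# once addi has executed, its write bit is cleared and cmpi may proceed
hazards.clear_write(3)
stall_after = hazards.must_stall([3])

print(stall_before, stall_after)
```

A plain Python integer serves as the bit vector, so the same check would extend to other register files by giving each its own vector; whether the real model does this is not stated in the task.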
+
+The output diagram will look like this:
+
+| clk # | fetch | decode | issue | execute |
+| 1 | addi 3, 4, 5 | | | |
+| 2 | cmpi 1, 0, 3, 4 | addi 3, 4, 5 | | |
+| 3 | STALL | cmpi 1, 0, 3, 4 | addi 3, 4, 5 | |
+| 4 | STALL | cmpi 1, 0, 3, 4 | | addi 3, 4, 5 |
+| 5 | ld 1, 2(3) | | cmpi 1, 0, 3, 4 | |
+| 6 | | ld 1, 2(3) | | cmpi 1, 0, 3, 4 |
+| 7 | | | ld 1, 2(3) | |
+| 8 | | | | ld 1, 2(3) |
+
+Explanation:
+
+ 1: Fetched `addi`.
+ 2: Decoded `addi`, fetched `cmpi`.
+ 3: Issued `addi`, decoded `cmpi`, must stall decode phase, stop fetching.
+ 4: Executed `addi`, everything else stalled.
+ 5: Issued `cmpi`, fetched `ld`.
+ 6: Executed `cmpi`, decoded `ld`.
+ 7: Issued `ld`.
+ 8: Executed `ld`.
+
+For this initial model, it is assumed that all instructions take one cycle to
+execute (not the case for mul/div etc., but this will be dealt with later).
+
+**In-progress TODO**
+
+# Code Explanation
+
+Source code:
+
-- 
2.30.2