# Single-Issue, In-Order Processor Core

* First steps for a newbie developer [[docs/firststeps]]
* bugreport <http://bugs.libre-riscv.org/show_bug.cgi?id=1039>

At present *[Update when this is no longer the case]*, the Libre-SOC core
utilises a Finite-State Machine (FSM) to control the fetch/decode/issue/execute
pipelines, with only one pipeline active at any given time. This is good for
debugging the HDL, but severely restricts performance, as a single instruction
takes tens of clock cycles to complete.

A Single-Issue In-Order control unit will allow every pipeline to be active at
once, raising the ideal maximum throughput to 1 instruction per clock cycle,
barring register hazards.

This control unit has not yet been written in HDL; the current task is to
develop a model for the simulator first. The model will be used to estimate
performance (and eventually to guide writing the HDL).

# The Model
## Brief [src](https://bugs.libre-soc.org/show_bug.cgi?id=1039)
The model for the Single-Issue In-Order core needs to be added to the in-house
Python simulator (`ISACaller`, called by `pypowersim`), which will allow basic
*performance estimates*.

For now, this model resides outside the simulator and is *completely standalone*.

Eventually, the Cavatools code will be studied so that its power-consumption
estimation can be extracted and re-implemented in Python.

## Task given [src](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c1) [src](https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45)
An offline instruction-ordering analyser needs to be written that models a
(simple, initially V3.0-only) **in-order core** and gives an estimate of
instructions per clock (IPC).

Hazard protection should be straightforward, using a simple bit vector:

- Take the write-result register number: set its bit.
- For all read registers, check the corresponding bit. If a bit is set, STALL
  (fake/model-stall).

A stall is a delay in the execution of an instruction in order to resolve a
hazard (e.g. trying to read a register while it is still being written to).
See the [Wikipedia article on Pipeline Stall](https://en.wikipedia.org/wiki/Pipeline_stall).

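A minimal sketch of the bit-vector check described above, in Python (the simulator's language). It assumes one bit per GPR and that a bit is cleared once the corresponding result has been written back; the text above does not say when bits are cleared, and the `Scoreboard` class and its method names are hypothetical rather than taken from the `cyclemodel` source. Note that a single bit cannot track two in-flight writes to the same register; a per-register counter would be needed for that case.

```python
class Scoreboard:
    """Hypothetical bit-vector hazard tracker: bit N set means GPR N has a
    write in flight."""

    def __init__(self) -> None:
        self.busy = 0

    def set_write(self, reg: int) -> None:
        # Take the write-result register number and set its bit.
        self.busy |= 1 << reg

    def clear_write(self, reg: int) -> None:
        # Assumption: the bit is cleared once the result has been written back.
        self.busy &= ~(1 << reg)

    def must_stall(self, read_regs) -> bool:
        # Check the bit of every read register; any set bit means STALL.
        return any((self.busy >> reg) & 1 for reg in read_regs)


# addi 3, 4, 5 writes r3; cmpi 1, 0, 3, 4 then reads r3.
sb = Scoreboard()
sb.set_write(3)
print(sb.must_stall([3]))  # True: cmpi must stall
sb.clear_write(3)
print(sb.must_stall([3]))  # False: r3 has been written back
```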
Input should be:

- Instruction with its operands (as assembler listing)
- plus an optional memory address and whether it is read or written.

The input will come from trace output produced by the ISACaller simulator;
[see bug comments #7-#16](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c7).

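A sketch of reading one input line, assuming a hypothetical format modelled on the example listing on this page (`mnemonic operands #PC=<decimal>` with an optional `EA=0x<hex>`). The real ISACaller trace format is discussed in the bug comments linked above and may well differ. The read/written flag is not shown in the example listing, so this sketch does not parse it. `TraceEntry` and `parse_trace_line` are illustrative names only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceEntry:
    mnemonic: str
    operands: List[str]
    pc: int
    ea: Optional[int] = None  # effective address, present only for loads/stores


# Hypothetical line format, modelled on the example listing on this page:
#   <mnemonic> <op>, <op>, ...  #PC=<decimal> [EA=0x<hex>]
TRACE_LINE = re.compile(
    r"^\s*(?P<mnemonic>[a-z][a-z0-9.]*)\s+(?P<operands>[^#]*?)\s*"
    r"#PC=(?P<pc>\d+)(?:\s+EA=(?P<ea>0x[0-9a-fA-F]+))?\s*$"
)


def parse_trace_line(line: str) -> TraceEntry:
    m = TRACE_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised trace line: {line!r}")
    # Split the operand field on commas, dropping empty pieces.
    operands = [op.strip() for op in m.group("operands").split(",") if op.strip()]
    ea = int(m.group("ea"), 16) if m.group("ea") else None
    return TraceEntry(m.group("mnemonic"), operands, int(m.group("pc")), ea)


entry = parse_trace_line("ld   1, 2(3)    #PC=16 EA=0x12345678")
print(entry.mnemonic, entry.operands, entry.pc, hex(entry.ea))
# ld ['1', '2(3)'] 16 0x12345678
```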
Some classes are needed to "model" the pipeline stages: fetch, decode, issue
and execute.

One global "STALL" flag will cause all buses to stop:

- Fetch is told to stop fetching.
- Decode stops (either because it is empty, or because it holds an instruction
  whose read registers are being written to).
- Issue stops.
- Execute (the pipelines) runs with an empty slot (except for the initial
  instruction causing the stall).

Example (PC chosen arbitrarily):

    addi 3, 4, 5    #PC=8
    cmpi 1, 0, 3, 4 #PC=12
    ld   1, 2(3)    #PC=16 EA=0x12345678

The third operand of `cmpi` is the register to use in the comparison, so
register 3 must be read. However, `addi` writes to this register, so a STALL
occurs while `cmpi` is in the decode phase.

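To apply that check, the model must know which registers each instruction reads and writes. The sketch below covers only the three instructions in the example, using the Power ISA operand orders (`addi RT,RA,SI`, `cmpi BF,L,RA,SI`, `ld RT,DS(RA)`) and the rule that an RA of 0 in `addi`/`ld` means the value 0 rather than GPR 0. The function name `reg_usage` and the `r`/`cr` register naming are hypothetical, and implicit operands such as XER[SO] (which `cmpi` copies into the CR field) are ignored.

```python
from typing import List, Set, Tuple


def _ra_or_zero(ra: int) -> Set[str]:
    # (RA|0): RA=0 selects the value 0, not GPR 0, so no register is read.
    return set() if ra == 0 else {f"r{ra}"}


def reg_usage(mnemonic: str, operands: List[str]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: return (registers read, registers written)."""
    if mnemonic == "addi":  # addi RT,RA,SI : RT <- (RA|0) + SI
        rt, ra, _si = (int(op) for op in operands)
        return _ra_or_zero(ra), {f"r{rt}"}
    if mnemonic == "cmpi":  # cmpi BF,L,RA,SI : CR field BF <- compare (RA) with SI
        bf, _l, ra, _si = (int(op) for op in operands)
        return {f"r{ra}"}, {f"cr{bf}"}  # XER[SO] read is ignored here
    if mnemonic == "ld":    # ld RT,DS(RA) : RT <- MEM((RA|0) + DS, 8)
        rt = int(operands[0])
        ra = int(operands[1].split("(")[1].rstrip(")"))
        return _ra_or_zero(ra), {f"r{rt}"}
    raise NotImplementedError(f"no register model for {mnemonic!r}")


cmpi_reads, _ = reg_usage("cmpi", ["1", "0", "3", "4"])
_, addi_writes = reg_usage("addi", ["3", "4", "5"])
print(cmpi_reads & addi_writes)  # {'r3'}: overlap, so cmpi must STALL
```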
The output diagram will look like this:

| clk # | fetch | decode | issue | execute |
|---|---|---|---|---|
| 1 | addi 3, 4, 5 | | | |
| 2 | cmpi 1, 0, 3, 4 | addi 3, 4, 5 | | |
| 3 | STALL | cmpi 1, 0, 3, 4 | addi 3, 4, 5 | |
| 4 | STALL | cmpi 1, 0, 3, 4 | | addi 3, 4, 5 |
| 5 | ld 1, 2(3) | | cmpi 1, 0, 3, 4 | |
| 6 | | ld 1, 2(3) | | cmpi 1, 0, 3, 4 |
| 7 | | | ld 1, 2(3) | |
| 8 | | | | ld 1, 2(3) |

Explanation:

- Clock 1: Fetched `addi`.
- Clock 2: Decoded `addi`, fetched `cmpi`.
- Clock 3: Issued `addi`, decoded `cmpi`; decode must stall, so fetching stops.
- Clock 4: Executed `addi`; everything else stalled.
- Clock 5: Issued `cmpi`, fetched `ld`.
- Clock 6: Executed `cmpi`, decoded `ld`.
- Clock 7: Issued `ld`.
- Clock 8: Executed `ld`.

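The clock-by-clock behaviour above can be checked with a small cycle model. This is a sketch under stated assumptions, not the `cyclemodel` code: the names `Instr`, `conflicts` and `simulate` are hypothetical, register read/write sets are supplied by hand (`r` = GPR, `cr` = CR field), every instruction spends one cycle per stage, and a result counts as written back at the end of its execute cycle. The STALL flag is raised whenever the instruction in decode reads a register written by the instruction in issue or execute; fetch stops while it is set, and decode holds (sending an empty slot to issue) while its producer is moving into execute. Under these assumptions it reproduces the diagram above: 3 instructions in 8 clocks, i.e. IPC = 3/8 = 0.375.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str               # assembler listing, e.g. "addi 3, 4, 5"
    reads: FrozenSet[str]   # registers read, e.g. {"r4"}
    writes: FrozenSet[str]  # registers written, e.g. {"r3"}


Row = Tuple[str, str, str, str]  # (fetch, decode, issue, execute) for one clock


def conflicts(reader: Optional[Instr], writer: Optional[Instr]) -> bool:
    """True if `reader` reads a register that `writer` has yet to write back."""
    if reader is None or writer is None:
        return False
    return bool(reader.reads & writer.writes)


def _text(instr: Optional[Instr]) -> str:
    return instr.text if instr is not None else ""


def simulate(program: List[Instr]) -> List[Row]:
    rows: List[Row] = []
    pc = 0
    fetch: Optional[Instr] = None
    decode: Optional[Instr] = None
    issue: Optional[Instr] = None
    execute: Optional[Instr] = None
    while True:
        # Execute takes one cycle: its previous occupant retires (writes back)
        # and the issued instruction moves into execute.
        execute = issue
        if conflicts(decode, execute):
            # Decode holds its instruction and issue gets an empty slot.
            issue = None
        else:
            issue, decode, fetch = decode, fetch, None
        # Global STALL flag: decode reads a register that the instruction in
        # issue or execute has not written back yet. While set, fetch stops.
        stall = conflicts(decode, issue) or conflicts(decode, execute)
        if not stall and fetch is None and pc < len(program):
            fetch = program[pc]
            pc += 1
        if pc >= len(program) and all(
            slot is None for slot in (fetch, decode, issue, execute)
        ):
            return rows
        rows.append(("STALL" if stall else _text(fetch),
                     _text(decode), _text(issue), _text(execute)))


program = [
    Instr("addi 3, 4, 5", frozenset({"r4"}), frozenset({"r3"})),
    Instr("cmpi 1, 0, 3, 4", frozenset({"r3"}), frozenset({"cr1"})),
    Instr("ld 1, 2(3)", frozenset({"r3"}), frozenset({"r1"})),
]
rows = simulate(program)
for clk, row in enumerate(rows, start=1):
    print(clk, row)
print("IPC =", len(program) / len(rows))  # IPC = 0.375 (3 instructions, 8 clocks)
```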
For this initial model, it is assumed that all instructions take one cycle to
execute (this is not the case for mul/div etc., but will be dealt with later).

**In-progress TODO**

# Code Explanation

Source code: <https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/cyclemodel>
