# Single-Issue, In-Order Processor Core

* First steps for a newbie developer [[docs/firststeps]]
* bugreport <http://bugs.libre-riscv.org/show_bug.cgi?id=1039>

The Libre-SOC TestIssuer core
utilises a Finite-State Machine (FSM) to control the fetch/decode/issue/execute
pipelines, with only one pipeline active at any given time. This is good
for debugging the HDL, but it severely restricts performance, as a single
instruction takes tens of clock cycles to complete. In development
(Andrey to research and link to the relevant bug report) is an in-order
core, and following on from that will be an out-of-order core.

A Single-Issue In-Order control unit will allow every pipeline to be active,
and raises the ideal maximum throughput to 1 instruction per clock cycle,
barring any register hazards.

A first version of this control unit was already written in HDL 18 months ago:
it is in soc/, and there are options in the Makefile to enable it. However,
there is currently a task to develop a model for the simulator first. The
model will be used to estimate performance.

The diagram below, drawn by Luke, compares pipelines and FSMs. The arrangement
it shows allows a transition from FSM to in-order to out-of-order, and also
allows "Micro-Coding".

[[!img /3d_gpu/pipeline_vs_fsms.jpg size="600x"]]

# The Model
## Brief

* [Bug description](https://bugs.libre-soc.org/show_bug.cgi?id=1039)

The model for the Single-Issue In-Order core needs to be added to the in-house
Python simulator (`ISACaller`, called by `pypowersim`), which will allow basic
*performance estimates*.

For now, this model resides outside the simulator, and
is *completely standalone*.

Eventually, the Cavatools code will be studied so that its power consumption
estimation can be extracted and re-implemented in Python.

## Task given

* [Bug comment #1](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c1)
* [IRC log](https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45)

An offline instruction-ordering analyser needs to be written that models a
(simple, initially V3.0-only) **in-order core** and gives an estimate of
instructions per clock (IPC).

Hazard protection should be straightforward, using a simple bit vector:

- Take the write (result) register number: set its bit.
- For all read registers, check the corresponding bit. If the bit is set,
  STALL (fake/model stall).
- Clear the bit again once the result has been written, otherwise the stall
  would never end.

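As a rough illustration, the bit vector can be sketched in Python with a plain `int`, one bit per register. This assumes GPR numbers 0-31 only, and the names `HazardBitVector`, `mark_write`, `clear_write` and `must_stall` are hypothetical, not taken from the cyclemodel sources. Note that a single bit per register cannot tell apart two outstanding writes to the same register (clearing one would clear both); a per-register counter would be needed for that case.

```python
from typing import Iterable


class HazardBitVector:
    """Hypothetical sketch of the hazard bit vector (GPRs 0-31 only)."""

    def __init__(self) -> None:
        self.pending = 0  # bit N set => GPR N has a result still to be written

    def mark_write(self, reg: int) -> None:
        # Instruction with result register `reg` has been issued: set its bit.
        self.pending |= 1 << reg

    def clear_write(self, reg: int) -> None:
        # Result has been written back: clear the bit so readers may proceed.
        self.pending &= ~(1 << reg)

    def must_stall(self, read_regs: Iterable[int]) -> bool:
        # STALL if any register that is about to be read has its bit set.
        return any(self.pending & (1 << r) for r in read_regs)


hv = HazardBitVector()
hv.mark_write(3)             # addi 3, 4, 5 will write r3
print(hv.must_stall([3]))    # cmpi reads r3 -> True, so stall
hv.clear_write(3)            # addi has completed
print(hv.must_stall([3]))    # -> False, cmpi may now proceed
```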
A stall is a delay in the execution of an instruction in order to
resolve a hazard (e.g. trying to read a register while it is still being written to).
See the [Wikipedia article on Pipeline Stall](https://en.wikipedia.org/wiki/Pipeline_stall).

Input should be:

- Instruction with its operands (as an assembler listing)
- plus an optional memory address and whether it is read or written.

The input will come from a trace output by the ISACaller simulator;
[see bug comments #7-#16](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c7).

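The real trace format is the one agreed in the bug comments linked above. As a placeholder only, the Python sketch below assumes lines shaped like the example listing further down this page: a mnemonic, comma-separated operands, then `#PC=` and an optional `EA=`. `TraceEntry`, `parse_trace_line` and `LINE_RE` are hypothetical names, and treating mnemonics that start with `st` as memory writes (everything else with an EA as a read) is a simplifying assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed (hypothetical) line format, e.g. "ld   1, 2(3)    #PC=16 EA=0x12345678"
LINE_RE = re.compile(
    r"^\s*(?P<mnem>[a-z][a-z0-9.]*)\s*(?P<ops>[^#]*?)\s*"
    r"#PC=(?P<pc>0x[0-9a-fA-F]+|\d+)"
    r"(?:\s+EA=(?P<ea>0x[0-9a-fA-F]+))?\s*$"
)


@dataclass(frozen=True)
class TraceEntry:
    mnemonic: str
    operands: Tuple[str, ...]
    pc: int
    ea: Optional[int] = None   # effective address, memory instructions only
    mem_write: bool = False    # assumption: stores are the "st*" mnemonics


def _to_int(text: str) -> int:
    # "0x"-prefixed values are hex; anything else is read as decimal.
    return int(text, 16) if text.startswith("0x") else int(text)


def parse_trace_line(line: str) -> TraceEntry:
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised trace line: {line!r}")
    mnemonic = m.group("mnem")
    ops = m.group("ops").strip()
    ea = _to_int(m.group("ea")) if m.group("ea") else None
    return TraceEntry(
        mnemonic=mnemonic,
        operands=tuple(op.strip() for op in ops.split(",")) if ops else (),
        pc=_to_int(m.group("pc")),
        ea=ea,
        mem_write=ea is not None and mnemonic.startswith("st"),
    )


for text in ("addi 3, 4, 5    #PC=8",
             "ld   1, 2(3)    #PC=16 EA=0x12345678"):
    print(parse_trace_line(text))
```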
Some classes are needed which "model" the pipeline stages: fetch, decode,
issue and execute.

One global "STALL" flag will cause all buses to stop:

- Fetch is told to stop fetching.
- Decode stops (either because it is empty, or because it holds an instruction
  whose read registers are being written to).
- Issue stops.
- Execute (the pipelines) runs as an empty slot (except for the instruction
  that initially caused the stall).

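The Python sketch below shows one way the four stages and the single global STALL flag could interact. For brevity the stages are plain variables rather than the per-stage classes the task asks for, and the timing rules are assumptions chosen so that the example diagram on this page is reproduced: every stage takes one cycle, STALL is raised while the instruction in decode reads a register written by an instruction in issue or execute, and a result can be read in the cycle after it leaves execute. `Insn`, `simulate` and the `r3`/`cr1` register naming are hypothetical. IPC is then instructions divided by clock cycles.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass(frozen=True)
class Insn:
    text: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()


def _show(insn: Optional[Insn]) -> str:
    return insn.text if insn is not None else ""


def simulate(program: Sequence[Insn]) -> List[List[str]]:
    """Return one [fetch, decode, issue, execute] row per clock cycle."""
    fetch = decode = issue = execute = None
    pc = 0
    rows: List[List[str]] = []
    while True:
        # Execute takes whatever was issued; the old execute slot completes.
        new_execute = issue
        # Decode may advance unless it reads a register written by the
        # instruction that is only now moving into execute.
        blocked = (decode is not None and issue is not None
                   and bool(decode.reads & issue.writes))
        if blocked:
            new_issue, new_decode = None, decode
        else:
            new_issue, new_decode = decode, fetch
        # Global STALL flag for the new cycle: decode reads a register that
        # an instruction in issue or execute has yet to write.
        busy = set()
        for insn in (new_issue, new_execute):
            if insn is not None:
                busy |= insn.writes
        stall = new_decode is not None and bool(new_decode.reads & busy)
        # STALL stops fetch.  (If decode was blocked, the previous cycle was
        # also stalled, so the fetch slot was already empty.)
        if stall or pc >= len(program):
            new_fetch = None
        else:
            new_fetch = program[pc]
            pc += 1
        fetch, decode, issue, execute = new_fetch, new_decode, new_issue, new_execute
        if fetch is None and decode is None and issue is None and execute is None:
            return rows  # pipeline has drained
        rows.append(["STALL" if stall else _show(fetch), _show(decode),
                     _show(issue), _show(execute)])


program = [
    Insn("addi 3,4,5", reads=frozenset({"r4"}), writes=frozenset({"r3"})),
    Insn("cmpi 1,0,3,4", reads=frozenset({"r3"}), writes=frozenset({"cr1"})),
    Insn("ld 1,2(3)", reads=frozenset({"r3"}), writes=frozenset({"r1"})),
]
rows = simulate(program)
for clk, row in enumerate(rows, start=1):
    print(clk, row)
print("IPC =", len(program) / len(rows))  # 3 instructions in 8 clocks
```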
Example (PC chosen arbitrarily):

    addi 3, 4, 5    #PC=8
    cmpi 1, 0, 3, 4 #PC=12
    ld   1, 2(3)    #PC=16 EA=0x12345678

The third operand of `cmpi` is the register to use in the comparison, so
register 3 needs to be read. However, `addi` will be writing to this register,
and thus a STALL will occur when `cmpi` is in the decode phase.

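How operand fields map onto read and write registers can be sketched in Python for just the three instructions in the example. The sketch follows the Power ISA operand order (`addi RT,RA,SI`, `cmpi BF,L,RA,SI`, `ld RT,DS(RA)`), where an RA of 0 in `addi` and `ld` means the value zero rather than r0. The function `regs_used` and the `r`/`cr` naming are hypothetical; a fuller model would presumably take this information from the simulator's decoder rather than from hand-written cases.

```python
from typing import FrozenSet, Sequence, Tuple

Regs = FrozenSet[str]


def _gpr_or_zero(ra: int) -> Regs:
    # RA=0 means "the value 0" in addi and ld, so no register is read.
    return frozenset() if ra == 0 else frozenset({f"r{ra}"})


def regs_used(mnemonic: str, operands: Sequence[str]) -> Tuple[Regs, Regs]:
    """Return (registers read, registers written) for addi, cmpi and ld."""
    if mnemonic == "addi":                       # addi RT,RA,SI
        rt, ra, _si = (int(op) for op in operands)
        return _gpr_or_zero(ra), frozenset({f"r{rt}"})
    if mnemonic == "cmpi":                       # cmpi BF,L,RA,SI
        bf, _l, ra, _si = (int(op) for op in operands)
        return frozenset({f"r{ra}"}), frozenset({f"cr{bf}"})
    if mnemonic == "ld":                         # ld RT,DS(RA)
        rt = int(operands[0])
        ra = int(operands[1].split("(")[1].rstrip(")"))
        return _gpr_or_zero(ra), frozenset({f"r{rt}"})
    raise NotImplementedError(f"no register table for {mnemonic!r}")


_, addi_writes = regs_used("addi", ["3", "4", "5"])
cmpi_reads, _ = regs_used("cmpi", ["1", "0", "3", "4"])
print(sorted(cmpi_reads & addi_writes))  # ['r3'] -> cmpi must STALL
```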
The output diagram will look like this:

TODO: move this to a separate file, then *include it twice*, once with triple-quotes
and once without. grep for "inline raw=yes" for examples of how to include files in mdwn.

```
| clk # |    fetch     |    decode    |   issue      |   execute    |
|:-----:|:------------:|:------------:|:------------:|:------------:|
|   1   | addi 3,4,5   |              |              |              |
|   2   | cmpi 1,0,3,4 | addi 3,4,5   |              |              |
|   3   | STALL        | cmpi 1,0,3,4 | addi 3,4,5   |              |
|   4   | STALL        | cmpi 1,0,3,4 |              | addi 3,4,5   |
|   5   | ld 1,2(3)    |              | cmpi 1,0,3,4 |              |
|   6   |              | ld 1,2(3)    |              | cmpi 1,0,3,4 |
|   7   |              |              | ld 1,2(3)    |              |
|   8   |              |              |              | ld 1,2(3)    |
```

Explanation:

    1: Fetched addi.
    2: Decoded addi, fetched cmpi.
    3: Issued addi, decoded cmpi; cmpi must stall in decode, so fetching stops.
    4: Executed addi, everything else stalled.
    5: Issued cmpi, fetched ld.
    6: Executed cmpi, decoded ld.
    7: Issued ld.
    8: Executed ld.

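Since the TODO above asks for this diagram to live in its own file, it may help to generate it rather than maintain it by hand. A minimal Python sketch (the function `render_table` is hypothetical) that turns per-clock `[fetch, decode, issue, execute]` rows into the same markdown table layout; the column widths are copied from the diagram above, and the header cells are left-aligned rather than hand-centred, which makes no difference once rendered.

```python
from typing import List, Sequence

STAGES = ("fetch", "decode", "issue", "execute")


def render_table(rows: Sequence[Sequence[str]]) -> str:
    """Render per-clock [fetch, decode, issue, execute] rows as a table."""
    def line(clk: str, cells: Sequence[str]) -> str:
        # Clock column centred in 7 characters, stage columns left-aligned in 14.
        return "|" + f"{clk:^7}" + "|" + "|".join(f" {c:<13}" for c in cells) + "|"

    out: List[str] = [line("clk #", STAGES),
                      "|:-----:|" + "|".join([":" + "-" * 12 + ":"] * 4) + "|"]
    out += [line(str(n), row) for n, row in enumerate(rows, start=1)]
    return "\n".join(out)


rows = [
    ["addi 3,4,5", "", "", ""],
    ["cmpi 1,0,3,4", "addi 3,4,5", "", ""],
    ["STALL", "cmpi 1,0,3,4", "addi 3,4,5", ""],
]
print(render_table(rows))
```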
For this initial model, it is assumed that all instructions take one cycle to
execute (this is not the case for mul/div etc., but that will be dealt with later).

**In-progress TODO**

# Code Explanation

Source code: <https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/cyclemodel>