# Single-Issue, In-Order Processor Core

* First steps for a newbie developer [[docs/firststeps]]
* bugreport <http://bugs.libre-riscv.org/show_bug.cgi?id=1039>

The Libre-SOC TestIssuer core
utilises a Finite-State Machine (FSM) to control the fetch/decode/issue/execute
pipelines, with only one pipeline being active at any given time. This is good
for debugging the HDL, but severely restricts performance, as a single
instruction takes tens of clock cycles to complete. An in-order core is in
development (Andrey to research and link to the relevant bugreport), and
following on from that will be an out-of-order core.

A Single-Issue In-Order control unit will allow every pipeline to be active,
and will raise the ideal maximum throughput to 1 instruction per clock cycle,
barring any register hazards.

A first version of this control unit was in fact written in HDL 18 months ago:
it is in `soc/`, and there are options in the Makefile to enable it. However,
there is currently a task to develop the model for the simulator first. The
model will be used to determine performance.

The diagram below, drawn by Luke, compares pipelines and FSMs. The arrangement
it shows allows for a transition from FSM to in-order to out-of-order, and also
allows "Micro-Coding".

[[!img /3d_gpu/pipeline_vs_fsms.jpg size="600x"]]

# The Model
## Brief

* [Bug description](https://bugs.libre-soc.org/show_bug.cgi?id=1039)

The model for the Single-Issue In-Order core needs to be added to the in-house
Python simulator (`ISACaller`, called by `pypowersim`), which will allow basic
*performance estimates*.

For now, this model resides outside the simulator, and
is *completely standalone*.

Eventually, the Cavatools code will be studied in order to extract its power
consumption estimation and re-implement it in Python.

## Task given

* [Bug comment #1](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c1)
* [IRC log](https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45)

An offline instruction ordering analyser needs to be written that models a
(simple, initially V3.0-only) **in-order core** and gives an estimate of
instructions per clock (IPC).

Hazard protection should be straightforward, using a simple bit vector:

- Take the write (result) register number: set its bit.
- For all read registers, check the corresponding bit. If the bit is set,
  STALL (fake/model-stall).

A stall is defined as a delay in the execution of an instruction in order to
resolve a hazard (i.e. trying to read a register while it is being written to).
See the [wikipedia article on Pipeline Stall](https://en.wikipedia.org/wiki/Pipeline_stall).
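
A minimal Python sketch of the bit-vector check described above, assuming GPRs numbered 0-31 and a plain Python `int` as the bit vector. The class and method names (`HazardVector`, `mark_write`, `clear_write`, `must_stall`) are hypothetical and are not taken from the `cyclemodel` source. Clearing the bit once the writer has completed is also an assumption, as the description above does not say when the bit is reset.

```python
class HazardVector:
    """One bit per GPR: a set bit means a write to that register is pending."""

    def __init__(self):
        self.bits = 0

    def mark_write(self, reg):
        # an instruction writing `reg` is in flight: set its bit
        self.bits |= 1 << reg

    def clear_write(self, reg):
        # assumption: called once the writer has completed
        self.bits &= ~(1 << reg)

    def must_stall(self, read_regs):
        # STALL if any register to be read has its bit set
        return any(self.bits & (1 << reg) for reg in read_regs)


hv = HazardVector()
hv.mark_write(3)            # addi 3, 4, 5 will write r3
print(hv.must_stall([3]))   # cmpi 1, 0, 3, 4 reads r3 -> True (STALL)
hv.clear_write(3)           # addi has completed
print(hv.must_stall([3]))   # -> False, cmpi may proceed
```

Using an `int` keeps the check to one AND per source register. For a model, a Python `set` of pending register numbers would work just as well.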

Input should be:

- Instruction with its operands (as an assembler listing)
- plus an optional memory address and whether it is read or written.

The input will come as a trace output from the ISACaller simulator;
[see bug comments #7-#16](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c7).
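
The sketch below shows one possible in-memory form of an input record. It also includes a parser for lines written like the example listing further down this page, where a `#PC=` annotation and an optional `EA=` annotation follow the assembler text. The real ISACaller trace format is the one discussed in the bug comments linked above and may well differ. `TraceEntry`, `parse_trace_line` and the field names are hypothetical. The example lines do not state whether a memory access is a read or a write, so the parser leaves that field as `None`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceEntry:
    mnemonic: str                     # e.g. "ld"
    operands: List[str]               # e.g. ["1", "2(3)"]
    pc: Optional[int] = None          # program counter, if given
    ea: Optional[int] = None          # effective (memory) address, if any
    mem_write: Optional[bool] = None  # True=write, False=read, None=unknown


# assumed annotation format: "PC=<decimal>" optionally followed by "EA=<hex>"
_ANNOT = re.compile(r"PC=(?P<pc>\d+)(?:\s+EA=(?P<ea>0x[0-9a-fA-F]+))?")


def parse_trace_line(line: str) -> TraceEntry:
    asm, _, annot = line.partition("#")   # split off the "#PC=..." comment
    parts = asm.split(None, 1)
    if not parts:
        raise ValueError(f"no instruction in {line!r}")
    rest = parts[1] if len(parts) > 1 else ""
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    entry = TraceEntry(parts[0], operands)
    m = _ANNOT.search(annot)
    if m:
        entry.pc = int(m.group("pc"))
        if m.group("ea"):
            entry.ea = int(m.group("ea"), 16)
    return entry


print(parse_trace_line("ld 1, 2(3)       #PC=16 EA=0x12345678"))
```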

Some classes are needed which "model" the pipeline stages: fetch, decode,
issue, execute.

One global "STALL" flag will cause all buses to stop:

- Tells fetch to stop fetching.
- Decode stops (either because it is empty, or because it holds an instruction
  whose read registers are being written to).
- Issue stops.
- Execute (the pipelines) runs as an empty slot (except for the initial
  instruction causing the stall).

Example (PC chosen arbitrarily):

    addi 3, 4, 5     #PC=8
    cmpi 1, 0, 3, 4  #PC=12
    ld 1, 2(3)       #PC=16 EA=0x12345678

The third operand of `cmpi` is the register to use in the comparison, so
register 3 needs to be read. However, `addi` will be writing to this register,
and thus a STALL will occur when `cmpi` is in the decode phase.
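
To drive the hazard check, the model needs to know which registers each instruction reads and writes. Below is a hand-written sketch covering only the three instructions in the example, using the Power ISA operand orders `addi RT,RA,SI`, `cmpi BF,L,RA,SI` and `ld RT,DS(RA)`. `reg_usage` is a hypothetical helper and only a stand-in for proper instruction decoding. For `addi` and `ld`, RA=0 means the value 0 rather than register 0, so no register is read in that case.

```python
import re


def reg_usage(asm):
    """Return (reads, writes) as sets of names like "r3" or "cr1".

    Hypothetical helper: only addi, cmpi and ld are handled.
    """
    parts = asm.split(None, 1)
    mnemonic = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    ops = [int(x) for x in re.findall(r"-?\d+", rest)]
    if mnemonic == "addi":            # addi RT, RA, SI
        rt, ra, _si = ops
        reads = {f"r{ra}"} if ra != 0 else set()   # RA=0 means literal 0
        return reads, {f"r{rt}"}
    if mnemonic == "cmpi":            # cmpi BF, L, RA, SI
        bf, _l, ra, _si = ops
        return {f"r{ra}"}, {f"cr{bf}"}             # result goes to CR field BF
    if mnemonic == "ld":              # ld RT, DS(RA)
        rt, _ds, ra = ops
        reads = {f"r{ra}"} if ra != 0 else set()   # RA=0 means literal 0
        return reads, {f"r{rt}"}
    raise NotImplementedError(mnemonic)


_, addi_writes = reg_usage("addi 3, 4, 5")
cmpi_reads, _ = reg_usage("cmpi 1, 0, 3, 4")
print(addi_writes & cmpi_reads)   # the register behind the STALL
```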

The output diagram will look like this:

TODO: move this to a separate file, then *include it twice*, once with
triple-quotes and once without. grep for "inline raw=yes" for examples of how
to include a file in mdwn.

```
| clk # | fetch        | decode       | issue        | execute      |
|:-----:|:------------:|:------------:|:------------:|:------------:|
| 1     | addi 3,4,5   |              |              |              |
| 2     | cmpi 1,0,3,4 | addi 3,4,5   |              |              |
| 3     | STALL        | cmpi 1,0,3,4 | addi 3,4,5   |              |
| 4     | STALL        | cmpi 1,0,3,4 |              | addi 3,4,5   |
| 5     | ld 1,2(3)    |              | cmpi 1,0,3,4 |              |
| 6     |              | ld 1,2(3)    |              | cmpi 1,0,3,4 |
| 7     |              |              | ld 1,2(3)    |              |
| 8     |              |              |              | ld 1,2(3)    |
```

Explanation:

- clk 1: fetched `addi`.
- clk 2: decoded `addi`, fetched `cmpi`.
- clk 3: issued `addi`, decoded `cmpi`; `cmpi` reads r3, which `addi` writes,
  so the decode phase must stall and fetching stops.
- clk 4: executed `addi`; everything else stalled.
- clk 5: `addi` has completed, so the stall is released: issued `cmpi`,
  fetched `ld`.
- clk 6: executed `cmpi`, decoded `ld`.
- clk 7: issued `ld`.
- clk 8: executed `ld`.
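
The IPC estimate for this example can be read off the table. Assume the four stages shown, S = 4, and one-cycle execution. Under those assumptions, N instructions need N + (S - 1) cycles to pass through the pipeline, plus one extra cycle per stall. The table has two stall cycles, at clk 3 and 4:

```latex
\text{cycles} = N + (S - 1) + N_{\text{stall}} = 3 + (4 - 1) + 2 = 8
\qquad
\text{IPC} = \frac{N}{\text{cycles}} = \frac{3}{8} = 0.375
```

Over a long, hazard-free instruction stream the fill term S - 1 becomes negligible, and IPC approaches the ideal of 1 instruction per clock.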

For this initial model, it is assumed that all instructions take one cycle to
execute (not the case for mul/div etc., but this will be dealt with later).
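
The following is a minimal, standalone Python sketch of the stage and STALL rules described above. It assumes one-cycle execution and that each instruction arrives with its register reads and writes already decoded. For brevity it keeps all four stages as slots in one hypothetical `InOrderModel` class rather than using one class per stage. It is not the code in `cyclemodel`. An instruction leaves decode only when no instruction in execute writes one of its sources. Fetch stops while the decoded instruction is waiting on issue or execute. Run on the example listing, these rules reproduce the cycle table above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Insn:
    asm: str                                # text shown in the cycle table
    reads: FrozenSet[str] = frozenset()     # e.g. {"r3"}
    writes: FrozenSet[str] = frozenset()    # e.g. {"cr1"}


class InOrderModel:
    """Hypothetical 4-stage single-issue in-order model, 1-cycle execute."""

    STAGES = ("fetch", "decode", "issue", "execute")

    def __init__(self, program: List[Insn]):
        self.program = list(program)        # instructions not yet fetched
        self.fetch: Optional[Insn] = None
        self.decode: Optional[Insn] = None
        self.issue: Optional[Insn] = None
        self.execute: Optional[Insn] = None
        self.stall = False                  # the single global STALL flag

    @staticmethod
    def _hazard(insn: Insn, writers) -> bool:
        # True if insn reads a register that an in-flight instruction writes
        return any(w is not None and bool(insn.reads & w.writes) for w in writers)

    def step(self) -> None:
        # issue -> execute; the previous execute occupant retires (1 cycle)
        self.execute, self.issue = self.issue, None
        # decode -> issue, unless a source register is still being written
        if self.decode is not None and not self._hazard(self.decode, [self.execute]):
            self.issue, self.decode = self.decode, None
        # fetch -> decode whenever decode is free
        if self.decode is None:
            self.decode, self.fetch = self.fetch, None
        # global STALL: decoded instruction waits on issue/execute results
        self.stall = self.decode is not None and self._hazard(
            self.decode, [self.issue, self.execute])
        # fetch only when not stalled
        if not self.stall and self.fetch is None and self.program:
            self.fetch = self.program.pop(0)

    def run(self) -> List[List[str]]:
        """Return one row per clock: [fetch, decode, issue, execute]."""
        rows = []
        while True:
            self.step()
            slots = [getattr(self, name) for name in self.STAGES]
            if not self.program and all(s is None for s in slots):
                return rows
            row = [s.asm if s is not None else "" for s in slots]
            if self.stall and self.fetch is None:
                row[0] = "STALL"
            rows.append(row)


program = [
    Insn("addi 3,4,5", frozenset({"r4"}), frozenset({"r3"})),
    Insn("cmpi 1,0,3,4", frozenset({"r3"}), frozenset({"cr1"})),
    Insn("ld 1,2(3)", frozenset({"r3"}), frozenset({"r1"})),
]
rows = InOrderModel(program).run()
for clk, row in enumerate(rows, start=1):
    print(clk, row)
print("IPC =", len(program) / len(rows))
```

The hazard is released as soon as the writer leaves execute, which is exactly the one-cycle assumption. Multi-cycle mul/div would need the writer to keep its register marked for longer.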

**In-progress TODO**

# Code Explanation

Source code: <https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/cyclemodel>