# Single-Issue, In-Order Processor Core

* First steps for a newbie developer [[docs/firststeps]]
* Bug report: <http://bugs.libre-riscv.org/show_bug.cgi?id=1039>

The Libre-SOC TestIssuer core
utilises a Finite-State Machine (FSM) to control the fetch/decode/issue/execute
pipelines, with only one pipeline being active at any given time. This is good
for debugging the HDL, but severely restricts performance, as a single
instruction will take tens of clock cycles to complete. In development
(Andrey to research and link to the relevant bug report) is an in-order
core, and following on from that will be an out-of-order core.

A Single-Issue In-Order control unit will allow every pipeline to be active,
and raises the ideal maximum throughput to 1 instruction per clock cycle,
barring any register hazards.

A first version of this control unit was written in HDL about 18 months ago:
it is in `soc/`, and there are options in the Makefile to enable it. There is
also currently a task to develop a model for the simulator. The model will be
used to determine performance.

Luke drew the following diagram comparing pipelines and FSMs. The approach it
shows allows for a transition from FSM to in-order to out-of-order, and also
allows "Micro-Coding".

[[!img /3d_gpu/pipeline_vs_fsms.jpg size="600x"]]

* [Bug description](https://bugs.libre-soc.org/show_bug.cgi?id=1039)

The model for the Single-Issue In-Order core needs to be added to the in-house
Python simulator (`ISACaller`, called by `pypowersim`), which will allow basic
*performance estimates*.

For now, this model resides outside the simulator, and
is *completely standalone*.

Eventually, the Cavatools code will be studied so that its power consumption
estimation can be extracted and re-implemented in Python.

* [Bug comment #1](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c1)
* [IRC log](https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45)

An offline instruction ordering analyser needs to be written that models a
(simple, initially V3.0-only) **in-order core** and gives an estimate of
instructions per clock (IPC).

Hazard protection should be straightforward, using a simple bit vector:

- Take the write result register number: set bit
- For all read registers, check corresponding bit. If bit is set, STALL (fake/

A stall is defined as a delay in execution of an instruction in order to
resolve a hazard (for example, trying to read a register while it is being
written to). See the
[Wikipedia article on Pipeline Stall](https://en.wikipedia.org/wiki/Pipeline_stall).
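
The bit-vector check could be sketched as follows. This is only an illustration, not the code in the repository: the class name `HazardVector` and its method names are hypothetical, and clearing the bit once the write has completed is an assumption (the list above only describes setting and checking it).

```python
class HazardVector:
    """Hypothetical register-hazard tracker: one bit per register number."""

    def __init__(self):
        self.busy = 0  # bit N set => register N has a write in flight

    def set_write(self, regnum):
        # Take the write result register number: set bit
        self.busy |= 1 << regnum

    def clear_write(self, regnum):
        # Assumption: the bit is cleared once the result has been written
        self.busy &= ~(1 << regnum)

    def must_stall(self, read_regs):
        # For all read registers, check the corresponding bit
        return any((self.busy >> r) & 1 for r in read_regs)


hazards = HazardVector()
hazards.set_write(3)                     # an instruction will write register 3
stall_before = hazards.must_stall([3])   # a later instruction reads register 3
hazards.clear_write(3)                   # the write has completed
stall_after = hazards.must_stall([3])
```

A plain Python integer is used as the bit vector, so the same sketch works for any number of registers without sizing it in advance.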

- Instruction with its operands (as assembler listing)
- plus an optional memory-address and whether it is read or written.

The input will come as a trace output from the ISACaller simulator,
[see bug comments #7-#16](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c7).
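
As a rough sketch of reading such a trace, the snippet below parses one line. The exact trace format is defined in the bug comments, not here, so the shape is an assumption taken from the example on this page: an assembler instruction followed by a `#` comment carrying `PC=` and optionally `EA=`. The names `TraceEntry` and `parse_trace_line` are hypothetical, and the read/write flag is not handled.

```python
from typing import List, NamedTuple, Optional


class TraceEntry(NamedTuple):
    """Hypothetical record for one trace line."""
    mnemonic: str
    operands: List[str]
    pc: Optional[int]
    ea: Optional[int]  # effective address, only present for memory accesses


def parse_trace_line(line):
    # Assumed shape: "<mnemonic> <op>, <op>, ... #PC=<n> EA=<n>"
    insn, _, comment = line.partition("#")
    mnemonic, _, rest = insn.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    # Collect KEY=VALUE tokens from the comment part
    fields = dict(tok.split("=", 1) for tok in comment.split() if "=" in tok)
    # int(x, 0) accepts both decimal ("16") and hex ("0x12345678")
    pc = int(fields["PC"], 0) if "PC" in fields else None
    ea = int(fields["EA"], 0) if "EA" in fields else None
    return TraceEntry(mnemonic, operands, pc, ea)


entry = parse_trace_line("ld 1, 2(3) #PC=16 EA=0x12345678")
```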

Some classes are needed which "model" pipeline stages: fetch, decode, issue,

One global "STALL" flag will cause all buses to stop:

- Tells fetch to stop fetching
- Decode stops (either because empty, or has instruction whose read reg's and
- Execute (pipelines) run as an empty slot (except for the initial instruction

Example (PC chosen arbitrarily):

    addi 3, 4, 5
    cmpi 1, 0, 3, 4 #PC=12
    ld 1, 2(3) #PC=16 EA=0x12345678

The third operand of `cmpi` is the register to use in the comparison, so
register 3 needs to be read. However, `addi` will be writing to this register,
and thus a STALL will occur when `cmpi` is in the decode phase.
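
Detecting this hazard depends on knowing which registers each instruction reads and writes. The sketch below covers only the three instructions of the example. The function name `regs_used` and the `rN`/`crN` naming are hypothetical. The operand orders follow the Power ISA forms `addi RT,RA,SI`, `cmpi BF,L,RA,SI` and `ld RT,DS(RA)`, where `RA=0` in `addi` and `ld` means the literal value zero rather than register 0.

```python
def regs_used(mnemonic, operands):
    """Return (reads, writes) as sets of register names (hypothetical helper)."""
    if mnemonic == "addi":            # addi RT, RA, SI
        rt, ra, _si = operands
        reads = set() if ra == "0" else {"r" + ra}
        return reads, {"r" + rt}
    if mnemonic == "cmpi":            # cmpi BF, L, RA, SI
        bf, _l, ra, _si = operands
        # The result goes to a Condition Register field, not to a GPR
        return {"r" + ra}, {"cr" + bf}
    if mnemonic == "ld":              # ld RT, DS(RA)
        rt, mem = operands
        ra = mem[mem.index("(") + 1:mem.index(")")]
        reads = set() if ra == "0" else {"r" + ra}
        return reads, {"r" + rt}
    raise ValueError("instruction not covered by this sketch: " + mnemonic)


addi_reads, addi_writes = regs_used("addi", ["3", "4", "5"])
cmpi_reads, cmpi_writes = regs_used("cmpi", ["1", "0", "3", "4"])
hazard = bool(cmpi_reads & addi_writes)   # cmpi reads what addi writes
```

Keeping the Condition Register field in a separate namespace (`cr1` rather than `r1`) matters here: otherwise the `1` in `cmpi 1, 0, 3, 4` would be mistaken for a write to GPR 1.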

The output diagram will look like this:

TODO, move this to a separate file then *include it twice*, once with triple-quotes
and once without. grep "inline raw=yes" for examples on how to include in mdwn

| clk # | fetch        | decode       | issue        | execute      |
|:-----:|:------------:|:------------:|:------------:|:------------:|
| 1     | addi 3,4,5   |              |              |              |
| 2     | cmpi 1,0,3,4 | addi 3,4,5   |              |              |
| 3     | STALL        | cmpi 1,0,3,4 | addi 3,4,5   |              |
| 4     | STALL        | cmpi 1,0,3,4 |              | addi 3,4,5   |
| 5     | ld 1,2(3)    |              | cmpi 1,0,3,4 |              |
| 6     |              | ld 1,2(3)    |              | cmpi 1,0,3,4 |
| 7     |              |              | ld 1,2(3)    |              |
| 8     |              |              |              | ld 1,2(3)    |

- 2: Decoded addi, fetched cmpi.
- 3: Issued addi, decoded cmpi, must stall decode phase, stop fetching.
- 4: Executed addi, everything else stalled.
- 5: Issued cmpi, fetched ld.
- 6: Executed cmpi, decoded ld.

For this initial model, it is assumed that all instructions take one cycle to
execute (this is not the case for mul/div etc., but that will be dealt with
later).
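
Under that one-cycle assumption, a four-stage model with a global stall can be sketched in a few lines. This is not the code in the repository: `Insn` and `simulate` are hypothetical names, and the exact hold and stall rules are inferred from the table on this page rather than taken from any specification. Registers are plain strings, with the Condition Register field written by `cmpi` named `cr1` so that it does not clash with GPR 1.

```python
from collections import namedtuple

# Hypothetical record: display text, sets of registers read and written
Insn = namedtuple("Insn", "text reads writes")


def simulate(program):
    """Return one (fetch, decode, issue, execute) row of text per clock."""
    pending = list(program)
    fetch = decode = issue = None
    rows = []
    while pending or fetch or decode or issue:
        execute = issue  # every instruction executes in exactly one cycle
        # Decode holds its instruction while the producer of one of its
        # source registers is only now entering execute.
        hold = bool(decode and execute and decode.reads & execute.writes)
        if hold:
            issue = None
        else:
            issue, decode = decode, fetch
        # Global STALL: decode needs a register whose write is still in flight
        in_flight = [i for i in (issue, execute) if i]
        stall = bool(decode) and any(decode.reads & i.writes for i in in_flight)
        fetch = pending.pop(0) if pending and not stall else None
        label = "STALL" if stall else (fetch.text if fetch else "")
        stages = tuple(i.text if i else "" for i in (decode, issue, execute))
        rows.append((label,) + stages)
    return rows


program = [
    Insn("addi 3,4,5", {"r4"}, {"r3"}),
    Insn("cmpi 1,0,3,4", {"r3"}, {"cr1"}),
    Insn("ld 1,2(3)", {"r3"}, {"r1"}),
]
rows = simulate(program)
ipc = len(program) / len(rows)  # instructions per clock for this trace
for clk, row in enumerate(rows, start=1):
    print(clk, *("%-12s" % cell for cell in row))
```

For the three-instruction example this takes 8 clocks, giving an IPC of 3/8 = 0.375. Most of that is pipeline fill and drain, which would be amortised over a longer trace.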

Source code: <https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/cyclemodel>