From 181ac8dd9bbcae8ccf7cac77ef852509373e7e82 Mon Sep 17 00:00:00 2001
From: Andrey Miroshnikov <andrey@technepisteme.xyz>
Date: Wed, 9 Aug 2023 18:13:40 +0000
Subject: [PATCH] inorder_model: Document single-issue in-order model task.

---
 3d_gpu/architecture/inorder_model.mdwn | 108 +++++++++++++++++++++++++
 1 file changed, 108 insertions(+)
 create mode 100644 3d_gpu/architecture/inorder_model.mdwn

diff --git a/3d_gpu/architecture/inorder_model.mdwn b/3d_gpu/architecture/inorder_model.mdwn
new file mode 100644
index 000000000..59f8d5abb
--- /dev/null
+++ b/3d_gpu/architecture/inorder_model.mdwn
@@ -0,0 +1,108 @@
+# Single-Issue, In-Order Processor Core
+
+* First steps for a newbie developer [[docs/firststeps]]
+* bugreport <http://bugs.libre-riscv.org/show_bug.cgi?id=1039>
+
+At present *[Update when this is no longer the case]*, the Libre-SOC core
+utilises a Finite-State Machine (FSM) to control the fetch/dec/issue/exec
+pipelines, with only one pipeline being active at any given time. This is good
+for debugging the HDL, but severely restricts performance as a single
+instruction will take tens of clock cycles to complete.
+
+A Single-Issue In-Order control unit will allow every pipeline to be active at
+the same time, raising the ideal maximum throughput to 1 instruction per clock
+cycle, barring any register hazards.
+
+This control unit has not been written in HDL yet; however, there is currently
+a task to develop the model for the simulator first. The model will be used to
+determine performance (and eventually to write the HDL).
+
+# The Model
+## Brief [src](https://bugs.libre-soc.org/show_bug.cgi?id=1039)
+The model for the Single-Issue In-Order core needs to be added to the in-house
+Python simulator (`ISACaller`, called by `pypowersim`), which will allow basic
+*performance estimates*.
+
+For now, this model resides outside the simulator, and
+is *completely standalone*.
+
+Eventually, the Cavatools code will be studied so that its power consumption
+estimation can be extracted and re-implemented in Python.
+
+## Task given [src](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c1) [src](https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45)
+An offline instruction ordering analyser needs to be written that models a
+(simple, initially V3.0-only) **in-order core** and gives an estimate of
+instructions per clock (IPC).
+
+Hazard protection should be straightforward, using a simple bit vector:
+
+- Take the write result register number: set the corresponding bit
+- For all read registers, check the corresponding bit. If the bit is set, STALL
+  (a fake, i.e. modelled, stall)
+
+A stall is defined as a delay in execution of an instruction in order to
+resolve a hazard (i.e. trying to read a register while it is being written to).
+See the [Wikipedia article on Pipeline Stall](https://en.wikipedia.org/wiki/Pipeline_stall).
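+
+A minimal Python sketch of the bit-vector check described above. The class and
+method names (`HazardVector`, `mark_write`, `clear_write`, `must_stall`) are
+hypothetical and are not taken from the cyclemodel source. Clearing the bit
+once the write has completed is an assumption, as is tracking a single
+register file (GPRs) only.
+
+```python
+class HazardVector:
+    """One bit per register: a set bit marks a write still in flight."""
+
+    def __init__(self):
+        self.pending = 0
+
+    def mark_write(self, reg):
+        # Take the write result register number: set bit.
+        self.pending |= 1 << reg
+
+    def clear_write(self, reg):
+        # Assumption: the bit is cleared once the result has been written.
+        self.pending &= ~(1 << reg)
+
+    def must_stall(self, read_regs):
+        # For all read registers, check the corresponding bit.
+        return any((self.pending >> reg) & 1 for reg in read_regs)
+
+
+hazards = HazardVector()
+hazards.mark_write(3)                    # e.g. an instruction writing r3
+stall_before = hazards.must_stall([3])   # a reader of r3 has to stall
+hazards.clear_write(3)                   # the write has completed
+stall_after = hazards.must_stall([3])    # the reader may now proceed
+```
+
+A Python integer is used as the bit vector here because it grows as needed, so
+the sketch does not have to fix the number of registers up front.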
+
+Input should be:
+
+- Instruction with its operands (as assembler listing)
+- plus an optional memory-address and whether it is read or written.
+
+The input will come as trace output from the ISACaller simulator,
+[see bug comments #7-#16](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c7)
+
+Some classes are needed to "model" the pipeline stages: fetch, decode, issue,
+execute.
+
+One global "STALL" flag will cause all buses to stop:
+
+- Tells fetch to stop fetching
+- Decode stops (either because it is empty, or because it holds an instruction
+  whose read registers are being written to).
+- Issue stops.
+- Execute (pipelines) runs with an empty slot (except for the initial
+  instruction causing the stall).
+
+Example (PC chosen arbitrarily):
+
+    addi 3, 4, 5    #PC=8
+    cmpi 1, 0, 3, 4 #PC=12
+    ld   1, 2(3)    #PC=16 EA=0x12345678
+
+The third operand of `cmpi` is the register to use in the comparison, so
+register 3 needs to be read. However, `addi` will be writing to this register,
+and thus a STALL will occur when `cmpi` is in the decode phase.
+
+The output diagram will look like this:
+
+| clk # | fetch           | decode          | issue           | execute         |
+|-------|-----------------|-----------------|-----------------|-----------------|
+| 1     | addi 3, 4, 5    |                 |                 |                 |
+| 2     | cmpi 1, 0, 3, 4 | addi 3, 4, 5    |                 |                 |
+| 3     | STALL           | cmpi 1, 0, 3, 4 | addi 3, 4, 5    |                 |
+| 4     | STALL           | cmpi 1, 0, 3, 4 |                 | addi 3, 4, 5    |
+| 5     | ld 1, 2(3)      |                 | cmpi 1, 0, 3, 4 |                 |
+| 6     |                 | ld 1, 2(3)      |                 | cmpi 1, 0, 3, 4 |
+| 7     |                 |                 | ld 1, 2(3)      |                 |
+| 8     |                 |                 |                 | ld 1, 2(3)      |
+
+Explanation:
+
+    1: Fetched `addi`.
+    2: Decoded `addi`, fetched `cmpi`.
+    3: Issued `addi`, decoded `cmpi`, must stall decode phase, stop fetching.
+    4: Executed `addi`, everything else stalled.
+    5: Issued `cmpi`, fetched `ld`.
+    6: Executed `cmpi`, decoded `ld`.
+    7: Issued `ld`.
+    8: Executed `ld`.
+
+For this initial model, it is assumed that all instructions take one cycle to
+execute (not the case for mul/div etc., but this will be dealt with later).
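+
+The example above can be reproduced with a small standalone sketch. This is an
+illustration only, not the code in the cyclemodel directory: the `Insn` record
+and the `simulate` function are hypothetical names. It assumes that the write
+bit is set when an instruction moves from decode to issue, that it is cleared
+in the cycle after the instruction has been in execute, and that only GPRs are
+tracked (the CR field written by `cmpi` is not modelled).
+
+```python
+from collections import namedtuple
+
+# Hypothetical record: assembler text, GPRs written, GPRs read.
+Insn = namedtuple("Insn", "text writes reads")
+
+
+def simulate(program):
+    """Return one (fetch, decode, issue, execute) row per clock cycle."""
+    todo = list(program)
+    pending = 0                      # hazard bit vector, one bit per GPR
+    fetch = decode = issue = execute = None
+    rows = []
+
+    def hazard(insn):
+        return any((pending >> reg) & 1 for reg in insn.reads)
+
+    while True:
+        if execute is not None:      # result is written: clear its bits
+            for reg in execute.writes:
+                pending &= ~(1 << reg)
+        execute = issue
+        if decode is not None and not hazard(decode):
+            issue, decode = decode, None
+            for reg in issue.writes: # mark the write as in flight
+                pending |= 1 << reg
+        else:
+            issue = None             # nothing to issue: empty slot
+        if decode is None:           # decode is free: take the fetched insn
+            decode, fetch = fetch, None
+        stall = decode is not None and hazard(decode)
+        if not stall:                # the global STALL flag stops fetch
+            fetch = todo.pop(0) if todo else None
+        stages = (fetch, decode, issue, execute)
+        if not stall and all(stage is None for stage in stages):
+            break                    # pipeline has drained
+        names = [stage.text if stage else "" for stage in stages]
+        if stall:
+            names[0] = "STALL"
+        rows.append(tuple(names))
+    return rows
+
+
+program = [
+    Insn("addi 3, 4, 5", writes=(3,), reads=(4,)),
+    Insn("cmpi 1, 0, 3, 4", writes=(), reads=(3,)),  # CR write not modelled
+    Insn("ld 1, 2(3)", writes=(1,), reads=(3,)),
+]
+rows = simulate(program)
+ipc = len(program) / len(rows)       # instructions per clock for this trace
+for clk, row in enumerate(rows, start=1):
+    print(clk, *row, sep=" | ")
+```
+
+For the three-instruction example this produces the eight rows of the table
+above, i.e. an IPC of 3/8. Write-after-write hazards and multi-cycle
+instructions are deliberately left out of the sketch.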
+
+**In-progress TODO**
+
+# Code Explanation
+
+Source code: <https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/cyclemodel>
+
-- 
2.30.2