# Single-Issue, In-Order Processor Core

Note: as of the time of writing, this task is 95-98% complete and requires approximately 10-15 lines of Python code to get a first unit test actually running.

* First steps for a newbie developer [[docs/firststeps]]
* bugreport

The Libre-SOC TestIssuer core utilises a Finite-State Machine (FSM) to control the fetch/dec/issue/exec Computational Units, with only one such CompUnit (an FSM or a pipeline) being active at any given time. This is good for debugging the HDL, but severely restricts performance, as a single instruction will take tens of clock cycles to complete.

In development (Andrey to research and link to the relevant bugreport) is an in-order core, and following on from that will be an out-of-order core.

A Single-Issue In-Order control unit (written 12+ months ago) will allow every pipeline to be active, and raises the ideal maximum throughput to 1 instruction per clock cycle, barring any register hazards.

This control unit has not been written in HDL yet (incorrect: the first version was written 12+ months ago, and is in `soc/`, and there are options in the Makefile to enable it); however, there is currently a task to develop the model for the simulator first. The model will be used to determine performance.

Diagram that Luke drew comparing pipelines and FSMs, which allows for a transition from FSM to in-order to out-of-order and also allows "Micro-Coding":

[[!img /3d_gpu/pipeline_vs_fsms.jpg size="600x"]]

# The Model

## Brief

* [Bug description](https://bugs.libre-soc.org/show_bug.cgi?id=1039)

The model for the Single-Issue In-Order core needs to be added to the in-house Python simulator (`ISACaller`, called by `pypowersim`), which will allow basic *performance estimates*.

INCORRECT - pypowersim *outputs an execution trace log* which **after the fact** may be passed to **any** model, of which the in-order model is **just the very first**.
For now, this model resides outside the simulator, and is *completely standalone* **and will ALWAYS remain standalone**.

A subtask to be carried out **as incremental development** is that the avatools source code will need to be studied to extract power consumption estimation and add that into the inorder model.

## Task given

* [Bug comment #1](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c1)
* [IRC log](https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45)

The offline instruction ordering analyser needs to be **COMPLETED** (it is currently 98% complete). It models a (simple, initially V3.0-only) **in-order core** and gives an estimate of instructions per clock (IPC).

Hazard Protection **WHICH IS ALREADY COMPLETED** is a straightforward, simple bit vector (WRONG: it is a "length of pipeline countdown until result is ready", which models the clock cycles needed in the ACTUAL pipeline(s). The "bit" you refer to is "is there an entry in the python set() for this register, yes or no"):

- Take the write result register number: set bit (WRONG: "add num-cycles-until-ready to the set()").
- For all read registers, check the corresponding bit (WRONG: call the function that checks if there is an entry in the "python set() of expected outstanding results to be written"). If the bit is set, STALL (fake/model-stall).

A stall is defined as a delay in execution of an instruction in order to resolve a hazard (i.e. trying to read a register while it is being written to). See the [wikipedia article on Pipeline Stall](https://en.wikipedia.org/wiki/Pipeline_stall).

Input **IS** (98% completed, remember?):

- Instruction with its operands (as assembler listing)
- plus an optional memory-address and whether it is read or written.

The input will come as a trace output from the ISACaller simulator, [see bug comments #7-#16](https://bugs.libre-soc.org/show_bug.cgi?id=1039#c7).

Some classes needed (WRONG: ALREADY WRITTEN) which "model" pipeline stages: fetch, decode, issue, execute.
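
The countdown-style hazard tracking described in the corrections above can be illustrated with a small, self-contained sketch. This is not the code from the actual model: the names `PendingWrites` and `count_stalls`, the `latency` parameter, and the representation of registers as `(file, number)` pairs are all assumptions made for illustration.

```python
class PendingWrites:
    """Hypothetical helper: maps a register to the cycles left until its result is ready."""

    def __init__(self):
        self.countdown = {}

    def expect_write(self, reg, cycles):
        # record that `reg` will be written `cycles` ticks from now
        self.countdown[reg] = cycles

    def must_stall(self, read_regs):
        # a read hazard exists if any register to be read is still outstanding
        return any(reg in self.countdown for reg in read_regs)

    def tick(self):
        # one clock passes: decrement every counter, dropping results now ready
        self.countdown = {r: c - 1 for r, c in self.countdown.items() if c > 1}


def count_stalls(program, latency):
    """program is a list of (insn, read_regs, write_regs); returns the stall cycles."""
    pending = PendingWrites()
    stalls = 0
    for insn, reads, writes in program:
        while pending.must_stall(reads):
            stalls += 1          # model-stall: wait one cycle and re-check
            pending.tick()
        for reg in writes:
            pending.expect_write(reg, latency)
        pending.tick()           # the instruction itself occupies one cycle
    return stalls


# assumed register usage for three instructions (illustrative only)
program = [
    ("addi 3, 4, 5",    {("GPR", 4)}, {("GPR", 3)}),
    ("cmpi 1, 0, 3, 4", {("GPR", 3)}, {("CR", 1)}),
    ("ld 1, 2(3)",      {("GPR", 3)}, {("GPR", 1)}),
]
print(count_stalls(program, latency=3))
```

With an assumed latency of one cycle the same program produces no stalls, which is a quick way to sanity-check such a model.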
One global "STALL" flag will cause all buses to stop:

- Tells fetch to stop fetching.
- Decode stops (either because it is empty, or because it holds an instruction whose read registers are being written to).
- Issue stops.
- Execute (pipelines) run as an empty slot (except for the initial instruction causing the stall).

Example (PC chosen arbitrarily):

    addi 3, 4, 5 #PC=8
    cmpi 1, 0, 3, 4 #PC=12
    ld 1, 2(3) #PC=16 EA=0x12345678

The third operand of `cmpi` is the register to use in the comparison, so register 3 needs to be read. However, `addi` will be writing to this register, and thus a STALL will occur when `cmpi` is in the decode phase.

The output diagram will look like this:

TODO, move this to a separate file then *include it twice*, once with triple-quotes and once without. grep "inline raw=yes" for examples on how to include in mdwn

```
| clk # | fetch        | decode       | issue        | execute      |
|:-----:|:------------:|:------------:|:------------:|:------------:|
| 1     | addi 3,4,5   |              |              |              |
| 2     | cmpi 1,0,3,4 | addi 3,4,5   |              |              |
| 3     | STALL        | cmpi 1,0,3,4 | addi 3,4,5   |              |
| 4     | STALL        | cmpi 1,0,3,4 |              | addi 3,4,5   |
| 5     | ld 1,2(3)    |              | cmpi 1,0,3,4 |              |
| 6     |              | ld 1,2(3)    |              | cmpi 1,0,3,4 |
| 7     |              |              | ld 1,2(3)    |              |
| 8     |              |              |              | ld 1,2(3)    |
```

Explanation:

1. Fetched addi.
2. Decoded addi, fetched cmpi.
3. Issued addi, decoded cmpi, must stall decode phase, stop fetching.
4. Executed addi, everything else stalled.
5. Issued cmpi, fetched ld.
6. Executed cmpi, decoded ld.
7. Issued ld.
8. Executed ld.

For this initial model, it is assumed that all instructions take one cycle to execute (not the case for mul/div etc., but this will be dealt with later). **In-progress TODO**

# Code Explanation - *IN PROGRESS*

*(Not all of the code has been explained, just the general classes.)*

Source code:

## `Hazard` namedtuple data structure

A `namedtuple` object stores the attributes of the register access. The Python `namedtuple` is immutable (like a normal tuple), while also allowing elements to be accessed by predefined names.
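
As a brief illustration of these two properties, the sketch below builds a `Hazard`-like `namedtuple`. The field names are taken from the example trace output shown further down this page; the actual definition in the source may differ, so treat this as an assumption.

```python
from collections import namedtuple

# field names assumed from the example trace output on this page
Hazard = namedtuple("Hazard", ["action", "target", "ident", "offs", "elwid"])

h = Hazard(action="w", target="GPR", ident="1", offs="0", elwid="64")

# elements can be read by name or by position
by_name = (h.target, h.ident)
by_index = h[0]

# attempting to modify a field raises AttributeError, as with any tuple
immutable = False
try:
    h.ident = "2"
except AttributeError:
    immutable = True

# being hashable, equal instances collapse to one entry when stored in a set
pending = {h, Hazard("w", "GPR", "1", "0", "64")}
```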
Immutability is great because the register access attributes won't change from the fetch to the execute stage, which is why a normal `list` or `dict` wouldn't be appropriate. A `namedtuple` also has a fixed field order (the order in which the fields were defined), which a normal dictionary only guarantees from Python 3.7 onwards.

See the [python wiki on `namedtuple`](https://docs.python.org/3.7/library/collections.html#collections.namedtuple), [online namedtuple tutorial](https://realpython.com/python-namedtuple/), [sta].

`namedtuple` instances can also be stored in sets, which is exactly how they are used with the `RegisterWrite` class.

One instruction trace may contain zero or more `Hazard` register access objects (depending on whether registers are needed for the instruction).

## `HazardProfiles`

A dictionary of currently supported register file types. Each entry (register file type) defines the number of read and write ports, written as a tuple, with the first entry being the number of read ports and the second entry being the number of write ports.

Having multiple read and/or write ports means that multiple **different** entries in the same register file can be read from and/or written to in the same clock cycle. This doesn't prevent a stall if the same register entry is used by a consecutive instruction, even if a spare port is available (Read-after-Write hazard).

## Parsing trace file dump using `read_file` function

The `CPU` model class takes as input a single instruction trace `list` object. This trace `list` object is produced by the function `read_file`, which itself reads an instruction trace file from the modified `ISACaller` ([link to code needed](LINK)). From now on, the trace `list` object will simply be referred to as `trace`.

Each line of the trace dump is of the form `[{rw}:FILE:regnum:offset:width]* # insn` where:

- `rw` is the access type: `r` if the register is read (operands), or `w` if it is written (to store result, condition codes, etc.).
- `FILE` is the register file type (GPR/integer, FPR/floating-point, etc.;
see the Additional Information section at the end of this page). *(TODO: use section reference link instead)*.
- `regnum` is the register number.
- `offset` *TODO: Perhaps the offset of data in bytes??? no idea (right now not important, as examples all show 0 offset)*
- `width` is the length in bits of the data to be accessed from the register.
- `insn` is the full instruction written in PowerISA assembler.

The block `[{rw}:FILE:regnum:offset:width]` is used zero or more times, based on the total number of read and write registers used by the instruction.

Example trace file with three instructions:

    r:GPR:0:0:64 w:GPR:1:0:64 # addi 1, 0, 0x0010
    r:GPR:0:0:64 w:GPR:2:0:64 # addi 2, 0, 0x1234
    r:GPR:1:0:64 r:GPR:2:0:64 # stw 2, 0(1)

The instruction trace file is processed line by line, where each line is split into the register access attributes (from which a new namedtuple is created using `_make()` and the `Hazard` definition; see [python wiki on _make() method](https://docs.python.org/3.7/library/collections.html#collections.somenamedtuple._make)).

Each line is converted to a `trace` object of the form: `[insn, Hazard(...), Hazard(...), ...]`. An example trace looks like this:

    ['addi 1, 0, 0x0010',
     Hazard(action='r', target='GPR', ident='0', offs='0', elwid='64'),
     Hazard(action='w', target='GPR', ident='1', offs='0', elwid='64')]

The function `read_file` yields (see [python wiki on yield]()) a single `trace` for each line of the trace file. To process every trace, all the user needs to do is call `read_file` with the filename of the `ISACaller` instruction trace dump and iterate over the result (or pass it to `list()` to obtain a list of `trace` objects, ready to be iterated over for the CPU model).

## RegisterWrite

A class which is based on a Python set, and is used to keep track of the registers currently awaiting a write (for detecting Read-after-Write Hazards).
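
To make the idea concrete, here is a minimal sketch of Read-after-Write detection using nothing but a plain Python set. The variable names and the `(file, number)` register representation are assumptions made for illustration, not the model's own data format.

```python
# registers whose results are still in flight (hypothetical representation)
pending_writes = set()

# an issued "addi 3, 4, 5" is going to write GPR 3
pending_writes.update({("GPR", 3)})

# "cmpi 1, 0, 3, 4" wants to read GPR 3: a non-empty intersection means stall
cmpi_reads = {("GPR", 3)}
must_stall = len(pending_writes.intersection(cmpi_reads)) != 0

# once the result has been written to the register file, retire the entry
pending_writes.difference_update({("GPR", 3)})
stall_after_retire = len(pending_writes.intersection(cmpi_reads)) != 0
```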
A Python set (see the [python wiki on sets](https://docs.python.org/3.7/tutorial/datastructures.html#sets)) is an unordered collection with **no duplicate elements**. By checking whether the next instruction's read registers match any of the write registers in the `RegisterWrite` set, the model can raise a STALL.

Anything in the set **MUST STALL** at the Decode phase, because the currently issued/executed instruction's result has not yet been written to the register(s) needed by the following instruction.

### Methods

    def __init__(self):
        self.storage = set()

Initialise the `RegisterWrite` set.

    def expect_write(self, regs):
        return self.storage.update(regs)

If there are new registers to be written to, add them to the current `RegisterWrite` set.

    def write_expected(self, regs):
        return (len(self.storage.intersection(regs)) != 0)

Boolean flag which is true if any of the given (read) registers is still waiting to be written to by a previous instruction.

    def retire_write(self, regs):
        return self.storage.difference_update(regs)

Remove the given registers from the `RegisterWrite` set once they have been written.

## `get_input_regs` and `get_output_regs` functions

## CPU class

The `CPU` class models the in-order, single-issue core. It contains the `RegisterWrite` set for tracking Read-after-Write Hazards, the fetch, decode, issue, and execute stages, as well as a `stall` flag indicating whether the CPU is currently stalled.

The input to the model is a trace `list` object.

The main method used while running the model is `process_instructions()`, which is called every time an instruction trace `list` object is read from a trace file.

### Methods

    def __init__(self):
        self.regs = RegisterWrite()
        self.fetch = Fetch(self)
        self.decode = Decode(self)
        self.issue = Issue(self)
        self.exe = Execute(self)
        self.stall = False

    def reads_possible(self, regs):
        # TODO: subdivide this down by GPR FPR CR-field.
        # currently assumes total of 3 regs are readable at one time
        possible = set()
        r = regs.copy()  # copy, so that the caller's set is left untouched
        while len(possible) < 3 and len(r) > 0:
            possible.add(r.pop())
        return possible

    def writes_possible(self, regs):
        # TODO: subdivide this down by GPR FPR CR-field.
        # currently assumes total of 1 reg is possible regardless of what it is
        possible = set()
        r = regs.copy()  # copy, so that the caller's set is left untouched
        while len(possible) < 1 and len(r) > 0:
            possible.add(r.pop())
        return possible

    def process_instructions(self):
        # the stall flag is passed through every stage in turn
        stall = self.stall
        stall = self.fetch.process_instructions(stall)
        stall = self.decode.process_instructions(stall)
        stall = self.issue.process_instructions(stall)
        stall = self.exe.process_instructions(stall)
        self.stall = stall
        if not stall:
            self.fetch.tick()
            self.decode.tick()
            self.issue.tick()
            self.exe.tick()

## Execute class

The `Execute` class models the execute phase of the processor. It contains a list of stages (`self.stages`), where each entry holds the instructions that are that many clock cycles away from completing.

### Methods

    def __init__(self, cpu):
        self.stages = []
        self.cpu = cpu

    def add_stage(self, cycles_away, stage):
        # grow the list of future stages as needed
        # NOTE: this loop stops once len(self.stages) == cycles_away, so the
        # index below is out of range in that case (">=" may be intended)
        while cycles_away > len(self.stages):
            self.stages.append([])
        self.stages[cycles_away].append(stage)

    def add_instruction(self, insn, writeregs):
        self.add_stage(2, {'insn': insn, 'writes': writeregs})

    def tick(self):
        self.stages.pop(0)  # tick drops anything at time "zero"

    def process_instructions(self, stall):
        instructions = self.stages[0]  # get list of instructions
        to_write = set()               # need to know total writes
        for instruction in instructions:
            to_write.update(instruction['writes'])
        # see if all writes can be done, otherwise stall
        writes_possible = self.cpu.writes_possible(to_write)
        if writes_possible != to_write:
            stall = True
        # retire the writes that are possible in this cycle (regfile writes)
        self.cpu.regs.retire_write(writes_possible)
        # and now go through the instructions, removing those regs written
        for instruction in instructions:
            instruction['writes'].difference_update(writes_possible)
        return stall

# Additional Information

## On register file types

Currently (20th Aug 2023), the following register
files are included in the CPU model:

- General Purpose Registers (GPR) - stores integers (registers 0-31 in default PowerISA, 0-127 for Libre-SOC with SVP64)
- Floating Point Registers (FPR) - stores floating-point numbers
- Condition Register (CR) - broken up into 4-bit fields
- Condition Register Fields (CRf) - stores arithmetic condition of an operation (less than, greater than, equal to zero, overflow)
- Fixed-Point Exception Register (XER)
- Machine State Register (MSR)
- Floating-Point Status and Control Register (FPSCR)
- Program Counter (PC); the PowerISA spec primarily calls this *Current Instruction Address (CIA)*. See PowerISA v3.1, section 1.3.4 Description of Instruction Operation
- Slow Special Purpose Registers (SPRs)
- Fast SPR (SPRf)

*TODO: Special Purpose Registers and fields need better explanation. The initial writer of this page (Andrey) has very little understanding of whether SPR is actually a register, or if it's just a category of registers (XER, etc.)*

See the [PowerISA 3.1 spec](LINK) for detailed information on register files (Book I, Chapters 1.3.4, 2.3, 3.2, 4.2, 5.2, 5.3).
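
Since the `FILE` field of each trace entry names one of the register files listed above, the following sketch shows how a single trace line could be split into `Hazard` tuples and the register files it touches collected. The helper names `parse_trace_line` and `regfiles_used` are hypothetical, and the `Hazard` field names are assumed from the example trace output on this page; the real `read_file` may be structured differently.

```python
from collections import namedtuple

# field names assumed from the example trace output on this page
Hazard = namedtuple("Hazard", ["action", "target", "ident", "offs", "elwid"])


def parse_trace_line(line):
    """Hypothetical helper: turn one trace line into [insn, Hazard, Hazard, ...]."""
    regs, _, insn = line.partition("#")
    trace = [insn.strip()]
    for field in regs.split():
        # each field has the form rw:FILE:regnum:offset:width
        trace.append(Hazard._make(field.split(":")))
    return trace


def regfiles_used(trace):
    """Return the set of register file names referenced by one trace."""
    return {hazard.target for hazard in trace[1:]}


example = "r:GPR:0:0:64 w:GPR:1:0:64 # addi 1, 0, 0x0010"
trace = parse_trace_line(example)
files = regfiles_used(trace)
```

All fields are kept as strings here, matching the example trace output; converting `ident` or `elwid` to integers would be a separate design decision.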