got dia files to work, they're autoconverted to tex

[libreriscv.git] / 3d_gpu / architecture / decoder.mdwn
diff --git a/3d_gpu/architecture/decoder.mdwn b/3d_gpu/architecture/decoder.mdwn

index 4ca718256531c1f189ca8690ebd10ee9800fb596..5d4c5cb473fcfd6d01ab62860eae28b55a259c10 100644 (file)
--- a/3d_gpu/architecture/decoder.mdwn
+++ b/3d_gpu/architecture/decoder.mdwn
@@ -1,20 +1,258 @@
  # Decoder
  
-The decoder is in charge of translating the RISCV or POWER instruction stream into operations that can be handled by our backend. It will have an extra input bit, set via a MSR that will switch which architecture it treats an instruction as.
+* Context and walkthrough <https://libre-soc.org/irclog/%23libre-soc.2021-07-13.log.html>
+* First steps for a newbie developer [[docs/firststeps]]
+* bugreport <http://bugs.libre-riscv.org/show_bug.cgi?id=186>
+
+The decoder is in charge of translating the POWER instruction stream into operations that can be handled by the backend.
  
  Source code: <https://git.libre-riscv.org/?p=soc.git;a=tree;f=src/soc/decoder;hb=HEAD>
  
-## POWER
+# POWER
+
+The decoder has been written in python, to parse straight CSV files and other information taken directly from the Power ISA Standards PDF files. This significantly reduces the possibility of manual transcription errors and greatly reduces code size.  Based on Anton Blanchard's excellent microwatt design, these tables are in [[openpower/isatables]] which includes links to download the csv files.
+
+The top level decoder object recursively drops through progressive levels of case statement groups, covering additional portions of the incoming instruction bits.  More on this technique - for which python and nmigen were *specifically* and strategically chosen - is outlined here <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004882.html>
+
+The PowerDecoder2, on encountering for example an ADD
+operation, needs to know whether Rc=0/1, whether OE=0/1, whether
+RB is to be read, whether an immediate is to be read and so on.
+With all of this information being specified in the CSV files, on
+a per-instruction basis, it is simply a matter of expanding that
+information out into a data structure called Decode2ToExecute1Type.
+From there it becomes easily possible for other parts of the processor
+to take appropriate action.
+
+* [Decode2ToExecute1Type](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/decode2execute1.py;hb=HEAD)
+
+## Link to Function Units
+
+The Decoder (PowerDecode2) knows which registers are needed, however what
+it does not know is:
+
+* which Register file ports to connect to (this is defined by regspecs)
+* the order of those regfile ports (again: defined by regspecs)
+
+Neither do the Phase-aware Function Units (derived from MultiCompUnit)
+themselves know anything about the PowerDecoder, and they certainly
+do not know when a given instruction will need to tell *them* to read
+RA, or RB.  For example: negation of RA only requires one operand,
+where add RA, RB requires two.  Who tells whom that information, when
+the ALU's job is simply to add, and the Decoder's job is simply to decode?
+
+This is where a special function called "rdflags()" comes into play.
+rdflags works closely in conjunction with regspecs and the PowerDecoder2,
+in each Function Unit's "pipe\_data.py" file.  It defines the flags that
+determine, from current instruction, whether the Function Unit actually
+*wants* any given Register Read Ports activated or not.
+
+That dynamically-determined information will then actively disable
+(or allow) Register file Read requests (rd.req) on a per-port basis.
+
+Example:
+
+    class ALUInputData(IntegerData):
+        regspec = [('INT', 'ra', '0:63'), # RA
+                   ('INT', 'rb', '0:63'), # RB/immediate
+                   ('XER', 'xer_so', '32'), # XER bit 32: SO
+                   ('XER', 'xer_ca', '34,45')] # XER bit 34/45: CA/CA32
+
+This shows us that, for the ALU pipeline, it expects two INTEGER
+operands (RA and RB) both 64-bit, and it expects XER SO, CA and CA32
+bits.  However this information - as to which operands are required -
+is *dynamic*.
+
+Continuing from the OP_ADD example, where inspection of the CSV files
+(or the ISA tables) shows that we optionally need xer_so (OE=1),
+optionally need xer_ca (Rc=1), and even optionally need RB (add with
+immediate), we begin to understand that a dynamic system linking the
+PowerDecoder2 information to the Function Units is needed.  This is
+where power\_regspec\_map.py comes into play.
+
+    def regspec_decode_read(e, regfile, name):
+        if regfile == 'INT':
+            # Int register numbering is *unary* encoded
+            if name == 'ra': # RA
+                return e.read_reg1.ok, 1<<e.read_reg1.data
+            if name == 'rb': # RB
+                return e.read_reg2.ok, 1<<e.read_reg2.data
+
+Here we can see that, for INTEGER registers, if the Function Unit
+has a connection (an incoming operand) named "RA", the tuple returned
+contains two crucial pieces of information:
+
+1. The field from PowerDecoder2 which tells us if RA is even actually
+  required by this (decoded) instruction
+2. The INTEGER Register file read port activation signal (its read-enable
+  line-activation) which, if sent to the INTEGER Register file, will
+  request the actual register required by this current (decoded) 
+  instruction.
+
+Thus we have the *dynamic* information - not hardcoded in RTL but
+specified in *python* - encoding both if (first item of tuple) and
+what (second item of tuple) each Function Unit receives, and this
+for each and every operand.  A corresponding process exists for write,
+as well.
+
+* [[architecture/regfile]]
+* [CompUnits](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/compunits/compunits.py;hb=HEAD)
+* Example [ALU pipe_data.py specification](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/pipe_data.py;hb=HEAD)
+* [power_regspec_map.py](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_regspec_map.py;hb=HEAD)
+
+## Fixed point instructions
+
+ - addi, addis, mulli - fairly straightforward - extract registers and immediate and translate to the appropriate op
+ - addic, addic., subfic - similar to above, but now carry needs to be saved somewhere
+ - add[o][.], subf[o][.], adde\*, subfe\*, addze\*, neg\*, mullw\*, divw\* - These are more fun. They need to set the carry (if . is present) and overflow (if o is present) flags, as well as taking in the carry flag for the extended versions.
+ - addex - uses the overflow flag as a carry in, and if CY is set to 1, sets overflow like it would carry.
+ - cmp, cmpi - sets bits of the selected comparison result register based on whether the comparison result was greater than, less than, or equal to
+ - andi., ori, andis., oris, xori, xoris - similar to above, though the and versions set the flags in CR0
+ - and\*, or\*, xor\*, nand\*, eqv\*, andc\*, orc\* - similar to the register-register arithmetic instructions above
+
+# Decoder internals
+
+The Decoder uses a class called PowerOp which get instantiated
+for every instruction. PowerOp class instantiation has member signals
+whose values get set respectively for each instruction.
+
+We use Python Enums to help with common decoder values.
+Below is the POWER add insruction.
+
+| opcode       | unit | internal op | in1 | in2 | in3  | out | CR in | CR out | inv A | inv out | cry in | cry out | ldst len | BR | sgn ext | upd | rsrv | 32b | sgn | rc | lk | sgl pipe | comment | form |
+|--------------|------|-------------|-----|-----|------|-----|-------|--------|-------|---------|--------|---------|----------|----|---------|-----|------|-----|-----|----|----|----------|---------|------|
+| 0b0100001010 | ALU  | OP_ADD      | RA  | RB  | NONE | RT  | 0     | 0      | 0     | 0       | ZERO   | 0       | NONE     | 0  | 0       | 0   | 0    | 0   | 0   | RC | 0  | 0        | add     | XO   |
+
+Here is an example of a toy multiplexer that sets various fields in the
+PowerOP signal class to the correct values for the add instruction when
+select is set equal to 1.  This should give you a feel for how we work with
+enums and PowerOP.
+
+    from nmigen import Module, Elaboratable, Signal, Cat, Mux
+    from soc.decoder.power_enums import (Function, Form, InternalOp,
+                             In1Sel, In2Sel, In3Sel, OutSel, RC, LdstLen,
+                             CryIn, get_csv, single_bit_flags,
+                             get_signal_name, default_values)
+    from soc.decoder.power_fields import DecodeFields
+    from soc.decoder.power_fieldsn import SigDecode, SignalBitRange
+    from soc.decoder.power_decoder import PowerOp
+    
+    class Op_Add_Example(Elaboratable):
+        def __init__(self):
+            self.select = Signal(reset_less=True)
+            self.op_add = PowerOp()
+        
+        def elaborate(self, platform):
+            m = Module()
+            op_add = self.op_add
+    
+            with m.If(self.select == 1):
+                m.d.comb += op_add.function_unit.eq(Function.ALU)
+                m.d.comb += op_add.form.eq(Form.XO)
+                m.d.comb += op_add.internal_op.eq(InternalOp.OP_ADD)
+                m.d.comb += op_add.in1_sel.eq(In1Sel.RA)
+                m.d.comb += op_add.in2_sel.eq(In2Sel.RB)
+                m.d.comb += op_add.in3_sel.eq(In3Sel.NONE)
+                m.d.comb += op_add.out_sel.eq(OutSel.RT)
+                m.d.comb += op_add.rc_sel.eq(RC.RC)
+                m.d.comb += op_add.ldst_len.eq(LdstLen.NONE)
+                m.d.comb += op_add.cry_in.eq(CryIn.ZERO)
+            
+            return m
+    
+    from nmigen.back import verilog
+    verilog_file = "op_add_example.v"
+    top = Op_Add_Example()
+    f = open(verilog_file, "w")
+    verilog = verilog.convert(top, name='top', strip_internal_attrs=True,
+                              ports=top.op_add.ports())
+    f.write(verilog)
+    print(f"Verilog Written to: {verilog_file}")
+
+The [actual POWER9 Decoder](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;hb=HEAD)
+uses this principle, in conjunction with reading the information shown
+in the table above from CSV files (as opposed to hardcoding them in
+python source).  These [[CSV files|openpower/isatables]],
+being machine-readable in a wide variety
+of programming languages, are conveniently available for use by
+other projects well beyond just this SOC.
+
+This also demonstrates one of the design aspects taken in this project: to
+*combine* the power of python's full capabilities in order to create
+advanced dynamically generated HDL, rather than (as done with MyHDL)
+limit python code to a subset of its full capabilities.
+
+The CSV Files are loaded by
+[power_decoder.py](https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder.py;hb=HEAD)
+and are used to construct a hierarchical cascade of switch statements.  The original code came from
+[microwatt](https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl)
+where the original hardcoded cascade can be seen.
+
+The docstring for power_decoder.py gives more details: each level in the hierarchy, just as in the original decode1.vhdl, will take slices of the instruction bitpattern, match against it, and if successful will continue with further subdecoders until a line is met that contains the required Operand Information (a PowerOp) exactly as shown at the top of this page.
+
+In this way, different sections of the instruction are successively decoded (major opcode, then minor opcode, then sub-patterns under those) until the required instruction is fully recognised, and the hierarchical cascade of switch patterns results in a flat interpretation being produced that is useful internally. 
+
+# second explanation / walkthrough
+
+the general idea here is to minimise the actual amount of work
+by using human-and-machine-readable files as much as possible,
+and performing automated translation (compilation) into executable
+form.
+
+we (manually) extracted the pseudo-code from the v3.0B specification:
+<https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/isa/fixedlogical.mdwn;hb=HEAD>
+
+then wrote a parser and language translator (aka compiler) to convert
+those code-fragments to python:
+<https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/decoder/pseudo;hb=HEAD>
+
+then went to a lot of trouble over the course of several months to
+co-simulate them, update them, and make them accurate according to the
+actual spec:
+<https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/isa/fixedarith.mdwn;h=470a833ca2b8a826f5511c4122114583ef169e55;hb=HEAD#l721>
+
+and created a fully-functioning python-based OpenPOWER ISA simulator:
+<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;hb=HEAD>
+
+there is absolutely no reason why this language-translator (aka compiler)
+here
+<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/pseudo/parser.py;hb=HEAD>
+
+should not be joined by another compiler, targetting c for use inside
+the linux kernel or, another compiler which auto-generates c++ for use
+inside power-gem5, such that this:
+<https://github.com/power-gem5/gem5/blob/cae53531103ebc5bccddf874db85f2659b64000a/src/arch/power/isa/decoder.isa#L1214>
+
+becomes an absolute breeze to update.
+
+note that we maintain a decoder which is based on Microwatt: we extracted
+microwatt's decode1.vhdl into CSV files, and parse them in python as
+hierarchical recursive data structures:
+<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder.py;hb=HEAD>
+
+where the actual CSV files that it reads are here:
+<https://git.libre-soc.org/?p=libreriscv.git;a=tree;f=openpower/isatables;hb=HEAD>
+
+this is then combined with *another* table that was extracted from the
+OpenPOWER v3.0B PDF:
+<https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/isatables/fields.text;hb=HEAD>
+
+(the parser for that recognises "vertical bars" as being
+field-separators):
+<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_fields.py;hb=HEAD>
+
+and FINALLY - and this is about the only major piece of code that
+actually involves any kind of manual code - again it is based on Microwatt
+decode2.vhdl - we put everything together to turn a binary opcode into
+"something that needs to be executed":
+<https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;hb=HEAD>
  
-### Fixed point instructions
- - `addi`, `addis`, `mulli` - fairly straightforward - extract registers and immediate and translate to the appropriate op
- - `addic`, `addic.`, `subfic` - similar to above, but now carry needs to be saved somewhere
- - `add[o][.]`, `subf[o][.]`, `adde*`, `subfe*`, `addze*`, `neg*`, `mullw*`, `divw*` - These are more fun. They need to set the carry (if `.` is present) and overflow (if `o` is present) flags, as well as taking in the carry flag for the `e`xtended versions.
- - `addex` - uses the overflow flag as a carry in, and if `CY` is set to 1, sets overflow like it would carry.
- - `cmp`, `cmpi` - sets bits of the selected comparison result register based on whether the comparison result was greater than, less than, or equal to
- - `andi.`, `ori`, `andis.`, `oris`, `xori`, `xoris` - similar to above, though the `and` versions set the flags in `CR0`
- - `and*`, `or*`, `xor*`, `nand*`, `eqv*`, `andc*`, `orc*` - similar to the register-register arithmetic instructions above
+so our OpenPOWER simulator is actually based on:
  
+* machine-readable CSV files
+* machine-readable Field-Form files
+* machine-readable spec-accurate pseudocode files
  
+the only reason we haven't used those to turn it into HDL is because
+doing so is a massive research project, where a first pass would be
+highly likely to generate sub-optimal HDL
  
-## RISCV