From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Mon, 23 Mar 2020 15:40:13 +0000 (+0000)
Subject: add section on mem-mem dependency matrix
X-Git-Tag: convert-csv-opcode-to-binary~3080
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=97af19eeb1cfb981ff46fded237a1ceebbe6e704;p=libreriscv.git

add section on mem-mem dependency matrix
---
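Note (commentary only, not part of the commit): the behaviour that the new
"Memory-Memory Dependency Matrix" section in the diff below describes can be
illustrated with a small behavioural model.  This is a minimal sketch under
stated assumptions, not the code in mem_fu_matrix.py: the class and method
names (MemDepMatrix, issue, can_proceed, complete) are hypothetical, each
SR Latch is modelled as a plain boolean, and clearing a unit's column when it
completes is an assumption about when the latches reset.  Like the matrix
itself, it does not look at addresses at all.

```python
class MemDepMatrix:
    """Behavioural model only: all names here are hypothetical."""

    def __init__(self, n_units):
        self.n = n_units
        # Operation currently pending in each LD/ST unit: None, "LD" or "ST".
        self.pending = [None] * n_units
        # raw[i][j]: unit i (a LD) was issued while unit j (a ST) was pending.
        self.raw = [[False] * n_units for _ in range(n_units)]
        # war[i][j]: unit i (a ST) was issued while unit j (a LD) was pending.
        self.war = [[False] * n_units for _ in range(n_units)]

    def issue(self, unit, op):
        """Record a new LD or ST, setting latches against pending operations."""
        assert op in ("LD", "ST") and self.pending[unit] is None
        for other, other_op in enumerate(self.pending):
            if op == "LD" and other_op == "ST":
                self.raw[unit][other] = True   # Read After Write
            elif op == "ST" and other_op == "LD":
                self.war[unit][other] = True   # Write After Read
        self.pending[unit] = op

    def can_proceed(self, unit):
        """A LD waits on its RAW latches, a ST on its WAR latches."""
        op = self.pending[unit]
        if op == "LD":
            return not any(self.raw[unit])
        if op == "ST":
            return not any(self.war[unit])
        return False  # nothing pending in this unit

    def complete(self, unit):
        """Assumed reset rule: clear every latch that was waiting on `unit`."""
        self.pending[unit] = None
        for row in range(self.n):
            self.raw[row][unit] = False
            self.war[row][unit] = False


m = MemDepMatrix(4)
m.issue(0, "LD")
m.issue(1, "LD")
m.issue(2, "ST")   # must wait for the batch of LDs in units 0 and 1
m.issue(3, "LD")   # must wait for the ST in unit 2
ready = [u for u in range(4) if m.can_proceed(u)]  # [0, 1]
```

In this model the two LDs form one batch, the ST forms the next, and the
final LD forms a third, so the order between batches is kept without any
address comparison.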

diff --git a/3d_gpu/address_dep_matrix.png b/3d_gpu/address_dep_matrix.png
deleted file mode 100644
index c73dc40fe..000000000
Binary files a/3d_gpu/address_dep_matrix.png and /dev/null differ
diff --git a/3d_gpu/architecture/6600scoreboard.mdwn b/3d_gpu/architecture/6600scoreboard.mdwn
index 48dcc2ec2..69f22cab1 100644
--- a/3d_gpu/architecture/6600scoreboard.mdwn
+++ b/3d_gpu/architecture/6600scoreboard.mdwn
@@ -30,31 +30,6 @@ Source:
 * [Priority Pickers](https://git.libre-riscv.org/?p=nmutil.git;a=blob;f=src/nmutil/picker.py;hb=HEAD)
 * [ALU Comp Units](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/compalu.py;h=f7b5e411a739e770777ceb71d7bd09fe4e70e8c0;hb=b08dee1c3e8cf0d635820693fe50cd0518caeed2)
 
-# LD/ST Computation Unit
-
-The Load/Store Computation Unit is a little more complex, involving
-three functions: LOAD, STORE, and INT Addition.  The SR Latches create
-a cyclic chain (just as with the ALU Computation Unit) however here
-there are three possible chains.
-
-* INT Addition mode will activate Issue, GoRead, GoWrite
-* LD Mode will activate Issue, GoRead, GoAddr then finally GoWrite
-* ST Mode will activate Issue, GoRead, GoAddr then GoStore.
-
-These signals will be allowed to activate when the correct "Req" lines
-are active.  Cyclically respecting these request-response signals results in
-the SR Latches never going into "unstable / unknown" states.
-
-Note: there is an error in the diagram, compared to the source code.
-It was necessary to capture src2 (op2) separate from src1 (op1), so that
-for the ST, op2 goes into the STORE as the data, not op1.
-
-Source:
-
-* [LD/ST Comp Units](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/compldst.py;h=206f44876b00b6c1d94716e624a03e81208120d4;hb=a0e1af6c5dab5c324a8bf3a7ce6eb665d26a65c1)
-
-[[!img ld_st_comp_unit.png]]
-
 # Multi-in cascading Priority Picker
 
 Using the Group Picker as a fundamental unit, a cascading chain is created,
@@ -155,7 +130,20 @@ a DFF (register).
 
 [[!img shadow.jpg]]
 
-# Store Computation Unit
+# LD/ST Computation Unit
+
+The Load/Store Computation Unit is a little more complex, involving
+three functions: LOAD, STORE, and INT Addition.  The SR Latches create
+a cyclic chain (just as with the ALU Computation Unit); here, however,
+there are three possible chains.
+
+* INT Addition mode will activate Issue, GoRead, GoWrite
+* LD Mode will activate Issue, GoRead, GoAddr then finally GoWrite
+* ST Mode will activate Issue, GoRead, GoAddr then GoStore.
+
+These signals will be allowed to activate when the correct "Req" lines
+are active.  Cyclically respecting these request-response signals results in
+the SR Latches never going into "unstable / unknown" states.
 
 * Issue will close the opcode latch and OPEN the operand latch AND
 trigger "Request-Read" (and set "Busy")
@@ -166,5 +154,50 @@ AND trigger "Request Write"
 * Go-Write will close the result latch and OPEN the opcode latch, and
 reset BUSY back to OFF, ready for a new cycle.
 
-[[!img st_comp_unit.jpg]]
+Note: there is an error in the diagram, compared to the source code.
+It was necessary to capture src2 (op2) separately from src1 (op1), so that
+for the ST, op2 goes into the STORE as the data, not op1.
+
+Source:
+
+* [LD/ST Comp Units](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/compldst.py;h=206f44876b00b6c1d94716e624a03e81208120d4;hb=a0e1af6c5dab5c324a8bf3a7ce6eb665d26a65c1)
+
+[[!img ld_st_comp_unit.png]]
+
+# Memory-Memory Dependency Matrix
+
+Due to the possibility of more than one LD/ST being in flight, it is necessary
+to determine which memory operations conflict, and to preserve a
+semblance of order.  It turns out that as long as there is no *possibility*
+of overlap (note this wording carefully), and LOADs are done separately
+from STOREs, this is sufficient.
+
+The first step then is to ensure that only a mutually-exclusive batch of LDs
+*or* STs (not both) is detected, with the order between such batches being
+preserved.  This is what the memory-memory dependency matrix does.
+
+"WAR" stands for "Write After Read" and is an SR Latch.  "RAW" stands for
+"Read After Write" and likewise is an SR Latch.  Any LD which comes in
+when a ST is pending will result in the relevant RAW SR Latch going active.
+Likewise, any ST which comes in when a LD is pending results in the
+relevant WAR SR Latch going active.
+
+A LD can thus be prevented from proceeding while any of its dependent RAW
+hazards are active, and likewise a ST while any of its dependent WAR hazards
+are active.  The matrix also ensures that ordering is preserved.
+
+Note however that this is the equivalent of an ALU "FU-FU" Matrix.  A
+separate Register-Mem Dependency Matrix is *still needed* in order to
+preserve the **register** read/write dependencies that occur between
+instructions, whereas the Mem-Mem Matrix only protects against memory
+hazards.
+
+Note also that it does not detect address clashes: that is the responsibility
+of the Address Match Matrix.
+
+Source:
+
+* [Memory-Dependency Row](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/scoreboard/mem_dependence_cell.py;h=2958d864cec75480b97a0725d9b3c44f53d2e7a0;hb=a0e1af6c5dab5c324a8bf3a7ce6eb665d26a65c1)
+* [Memory-Dependency Matrix](https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/scoreboard/mem_fu_matrix.py;h=6b9ce140312290a26babe2e3e3d821ae3036e3ab;hb=a0e1af6c5dab5c324a8bf3a7ce6eb665d26a65c1)
 
+[[!img ld_st_dep_matrix.png]]
diff --git a/3d_gpu/ld_st_dep_matrix.png b/3d_gpu/ld_st_dep_matrix.png
new file mode 100644
index 000000000..c73dc40fe
Binary files /dev/null and b/3d_gpu/ld_st_dep_matrix.png differ