working on compared_to_register_renaming.mdwn

author Jacob Lifshay <programmerjake@gmail.com>

Tue, 27 Oct 2020 04:54:35 +0000 (21:54 -0700)

committer Jacob Lifshay <programmerjake@gmail.com>

Tue, 27 Oct 2020 04:58:11 +0000 (21:58 -0700)
diff --git a/3d_gpu/architecture/compared_to_register_renaming.mdwn b/3d_gpu/architecture/compared_to_register_renaming.mdwn
new file mode 100644
index 0000000..08c7838
--- /dev/null
+++ b/3d_gpu/architecture/compared_to_register_renaming.mdwn
@@ -0,0 +1,103 @@
+One critical difference between the 6600-derived architecture and traditional register-renaming out-of-order (OoO) speculative processors is how many writes to a single ISA-level register can complete per clock cycle. In the 6600-derived architecture, writes to any one ISA-level register max out at 1 per clock cycle (without special measures to improve that). A register-renaming processor can easily handle multiple writes to the same ISA-level register per clock cycle, because each of those writes goes to a different physical register.
+
+The following tables assume that fetch, decode, branch prediction, and register renaming can each handle 4 instructions per clock cycle (typical of Intel's processors for many generations). They also assume that `ldu` writes its updated address register in its 2nd clock cycle of execution and its destination register in its 4th clock cycle of execution (achievable by splitting `ldu` into 2 separate micro-ops). The numbered columns in each table are clock cycles.
+
+The following C program is used:
+
+```C
+#include <stdint.h>
+
+void f(uint64_t *r3, uint64_t r4) {
+    uint64_t ctr, r9;
+    ctr = r4;
+    do {
+        r9 = *++r3;
+        r9 += 100;
+        *r3 = r9;
+    } while(--ctr != 0);
+}
+```
+
+[See on Compiler Explorer](https://gcc.godbolt.org/z/hzf7d7)
+
+It produces the following Power instructions (edited for style):
+
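+```
+f:
+    mtctr r4            # ctr = r4
+.L2:
+    ldu r9, 8(r3)       # r3 += 8; r9 = *r3 (load with update)
+    addi r9, r9, 100    # r9 += 100
+    std r9, 0(r3)       # *r3 = r9
+    bdnz .L2            # if (--ctr != 0) goto .L2
+    blr                 # return
+```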
+
+## Register Renaming
+
+Renamed hardware registers are named `h0`, `h1`, `h2`, ...
+
+The syntax `ldu h7, 8(h5 -> h8)` means that the address is read from `h5` and the updated address is written to `h8`.
+
+The register rename table starts out as follows:
+
+| `r3` | `r4` |
+|------|------|
+| `h0` | `h1` |
+
+| ISA-level instruction | Renamed Instruction     | 0     | 1      | 2                   | 3            | 4                   | 5                    | 6                     | 7            | 8                    | 9                     | 10                     | 11           | 12           | 13     | 14 |
+|-----------------------|-------------------------|-------|--------|---------------------|--------------|---------------------|----------------------|-----------------------|--------------|----------------------|-----------------------|------------------------|--------------|--------------|--------|----|
+| `mtctr r4`            | `mtctr h1`              | Fetch | Decode | Ex: Rd `h1`         | Ex: Wr `ctr` | Retire              |                      |                       |              |                      |                       |                        |              |              |        |    |
+| `ldu r9, 8(r3)`       | `ldu h2, 8(h0 -> h3)`   | Fetch | Decode | Ex: Rd `h0`         | Ex: Wr `h3`  | Ex                  | Ex: Wr `h2`          | Retire                |              |                      |                       |                        |              |              |        |    |
+| `addi r9, r9, 100`    | `addi h4, h2, 100`      | Fetch | Decode | Wait: `h2`          | Wait: `h2`   | Wait: `h2`          | Ex: Rd `h2`          | Ex: Wr `h4`           | Retire       |                      |                       |                        |              |              |        |    |
+| `std r9, 0(r3)`       | `std h4, 0(h3)`         | Fetch | Decode | Wait: `h3` and `h4` | Wait: `h4`   | Wait: `h4`          | Wait: `h4`           | Ex: Rd `h3` and `h4`  | Ex           | Ex                   | Retire                |                        |              |              |        |    |
+| `bdnz .L2`            | `bdnz .L2`              |       | Fetch  | Decode              | Ex: Rd `ctr` | Ex: Wr `ctr`        | Wait: Retire         | Wait: Retire          | Wait: Retire | Wait: Retire         | Retire                |                        |              |              |        |    |
+| `ldu r9, 8(r3)`       | `ldu h5, 8(h3 -> h6)`   |       |        | Fetch               | Decode       | Ex: Rd `h3`         | Ex: Wr `h6`          | Ex                    | Ex: Wr `h5`  | Wait: Retire         | Retire                |                        |              |              |        |    |
+| `addi r9, r9, 100`    | `addi h7, h5, 100`      |       |        | Fetch               | Decode       | Wait: `h5`          | Wait: `h5`           | Wait: `h5`            | Ex: Rd `h5`  | Ex: Wr `h7`          | Retire                |                        |              |              |        |    |
+| `std r9, 0(r3)`       | `std h7, 0(h6)`         |       |        | Fetch               | Decode       | Wait: `h6` and `h7` | Wait: `h7`           | Wait: `h7`            | Wait: `h7`   | Ex: Rd `h6` and `h7` | Ex                    | Ex                     | Retire       |              |        |    |
+| `bdnz .L2`            | `bdnz .L2`              |       |        | Fetch               | Decode       | Ex: Rd `ctr`        | Ex: Wr `ctr`         | Wait: Retire          | Wait: Retire | Wait: Retire         | Wait: Retire          | Wait: Retire           | Retire       |              |        |    |
+| `ldu r9, 8(r3)`       | `ldu h8, 8(h6 -> h9)`   |       |        |                     | Fetch        | Decode              | Ex: Rd `h6`          | Ex: Wr `h9`           | Ex           | Ex: Wr `h8`          | Wait: Retire          | Wait: Retire           | Retire       |              |        |    |
+| `addi r9, r9, 100`    | `addi h10, h8, 100`     |       |        |                     | Fetch        | Decode              | Wait: `h8`           | Wait: `h8`            | Wait: `h8`   | Ex: Rd `h8`          | Ex: Wr `h10`          | Wait: Retire           | Retire       |              |        |    |
+| `std r9, 0(r3)`       | `std h10, 0(h9)`        |       |        |                     | Fetch        | Decode              | Wait: `h9` and `h10` | Wait: `h10`           | Wait: `h10`  | Wait: `h10`          | Ex: Rd `h9` and `h10` | Ex                     | Ex           | Retire       |        |    |
+| `bdnz .L2`            | `bdnz .L2`              |       |        |                     | Fetch        | Decode              | Ex: Rd `ctr`         | Ex: Wr `ctr`          | Wait: Retire | Wait: Retire         | Wait: Retire          | Wait: Retire           | Wait: Retire | Retire       |        |    |
+| `ldu r9, 8(r3)`       | `ldu h11, 8(h9 -> h12)` |       |        |                     |              | Fetch               | Decode               | Ex: Rd `h9`           | Ex: Wr `h12` | Ex                   | Ex: Wr `h11`          | Wait: Retire           | Wait: Retire | Retire       |        |    |
+| `addi r9, r9, 100`    | `addi h13, h11, 100`    |       |        |                     |              | Fetch               | Decode               | Wait: `h11`           | Wait: `h11`  | Wait: `h11`          | Ex: Rd `h11`          | Ex: Wr `h13`           | Wait: Retire | Retire       |        |    |
+| `std r9, 0(r3)`       | `std h13, 0(h12)`       |       |        |                     |              | Fetch               | Decode               | Wait: `h12` and `h13` | Wait: `h13`  | Wait: `h13`          | Wait: `h13`           | Ex: Rd `h12` and `h13` | Ex           | Ex           | Retire |    |
+| `bdnz .L2`            | `bdnz .L2`              |       |        |                     |              | Fetch               | Decode               | Ex: Rd `ctr`          | Ex: Wr `ctr` | Wait: Retire         | Wait: Retire          | Wait: Retire           | Wait: Retire | Wait: Retire | Retire |    |
+| ...                   | ...                     | ...   | ...    | ...                 | ...          | ...                 | ...                  | ...                   | ...          | ...                  | ...                   | ...                    | ...          | ...          | ...    |    |
+
+## 6600-derived
+
+For the following table:
+- Assumes that `ldu` instructions are split into two micro-ops in the decode stage.
+- Assumes that a mechanism is in place for forwarding a value from a functional unit's (FU's) result latch to a waiting operation, without having to wait until the result can be written to the register file.
+- "Av `r3`" denotes that the value to be written to `r3` is computed and is available for forwarding but can't yet be written to the register file.
+- "SW: #4" denotes that the instruction is waiting on the shadow produced by instruction #4.
+- "Rf #5:`r5`" denotes that the instruction reads the result latch for instruction #5's new value for `r5` through the forwarding mechanism.
+
+TODO(programmerjake): finish
+
+| ISA-level instruction | Num      | 0     | 1      | 2                | 3            | 4                    | 5           | 6                    | 7           | 8      | 9      |
+|-----------------------|----------|-------|--------|------------------|--------------|----------------------|-------------|----------------------|-------------|--------|--------|
+| `mtctr r4`            | #0       | Fetch | Decode | Ex: Rd `r4`      | Ex: Wr `ctr` | Finish               |             |                      |             |        |        |
+| `ldu r9, 8(r3)`       | #1.addr  | Fetch | Decode | Ex: Rd `r3`      | Ex: Av `r3`  | SW: #1.mem           | Ex: Wr `r3` | Finish               |             |        |        |
+| `ldu r9, 8(r3)`       | #1.mem   |       | Decode | Wait: #1.addr    | Ex           | Ex                   | Ex: Wr `r9` | Finish               |             |        |        |
+| `addi r9, r9, 100`    | #2       | Fetch | Decode | Wait: #1.mem     | Wait: #1.mem | Wait: #1.mem         | Ex: Rd `r9` | Ex: Wr `r9`          | Finish      |        |        |
+| `std r9, 0(r3)`       | #3       | Fetch | Decode | Wait: #1.addr #2 | Wait: #2     | Wait: #2             | Wait: #2    | Ex: Rd `r3` and `r9` | Ex          | Ex     | Finish |
+| `bdnz .L2`            | #4       |       | Fetch  | Decode           | Ex: Rd `ctr` | Ex: result available | SW: #3      | SW: #3               | SW: #3      | SW: #3 | Finish |
+| `ldu r9, 8(r3)`       | #5.addr  |       |        | Fetch            | Decode       | Ex: Rf #1.addr:`r3`  | Ex: Av `r3` | SW: #5.mem           | Ex: Wr `r3` |        |        |
+| `ldu r9, 8(r3)`       | #5.mem   |       |        |                  | Decode       | Wait: #5.addr        | Ex          |                      |             |        |        |
+| `addi r9, r9, 100`    | #6       |       |        | Fetch            | Decode       |                      |             |                      |             |        |        |
+| `std r9, 0(r3)`       | #7       |       |        | Fetch            | Decode       |                      |             |                      |             |        |        |
+| `bdnz .L2`            | #8       |       |        | Fetch            | Decode       |                      |             |                      |             |        |        |
+| `ldu r9, 8(r3)`       | #9.addr  |       |        |                  | Fetch        | Decode               |             |                      |             |        |        |
+| `ldu r9, 8(r3)`       | #9.mem   |       |        |                  |              | Decode               |             |                      |             |        |        |
+| `addi r9, r9, 100`    | #10      |       |        |                  | Fetch        | Decode               |             |                      |             |        |        |
+| `std r9, 0(r3)`       | #11      |       |        |                  | Fetch        | Decode               |             |                      |             |        |        |
+| `bdnz .L2`            | #12      |       |        |                  | Fetch        | Decode               |             |                      |             |        |        |
+| `ldu r9, 8(r3)`       | #13.addr |       |        |                  |              | Fetch                | Decode      |                      |             |        |        |
+| `ldu r9, 8(r3)`       | #13.mem  |       |        |                  |              |                      | Decode      |                      |             |        |        |
+| `addi r9, r9, 100`    | #14      |       |        |                  |              | Fetch                | Decode      |                      |             |        |        |
+| `std r9, 0(r3)`       | #15      |       |        |                  |              | Fetch                | Decode      |                      |             |        |        |
+| `bdnz .L2`            | #16      |       |        |                  |              | Fetch                | Decode      |                      |             |        |        |
+| ...                   | ...      | ...   | ...    | ...              | ...          | ...                  | ...         |                      |             |        |        |