--- /dev/null
+One critical difference between the 6600-derived architecture and traditional register-renaming out-of-order (OoO) speculative processors is write throughput to a single ISA-level register. In the 6600-derived architecture, writes to any one particular ISA-level register max out at 1 per clock cycle unless special measures are taken to improve that. The register-renamed version can easily handle multiple such register writes per clock cycle, since the writes are spread out across multiple physical registers.
+
+The following diagrams assume that fetch, decode, branch prediction, and register renaming can handle 4 instructions per clock cycle (as has been usual on Intel's processors for many generations). They also assume that `ldu` can write the address register after 1 clock cycle of execution and the destination register after 4 clock cycles of execution (this can be achieved by splitting it into 2 separate micro-ops).
+
+The following C program is used:
+
+```C
+#include <stdint.h>
+
+void f(uint64_t *r3, uint64_t r4) {
+    uint64_t ctr, r9;
+    ctr = r4;
+    do {
+        r9 = *++r3;
+        r9 += 100;
+        *r3 = r9;
+    } while(--ctr != 0);
+}
+```
+
+[See on Compiler Explorer](https://gcc.godbolt.org/z/hzf7d7)
+
+It produces the following Power instructions (edited for style):
+
+```
+f:
+ mtctr r4
+.L2:
+ ldu r9, 8(r3)
+ addi r9, r9, 100
+ std r9, 0(r3)
+ bdnz .L2
+ blr
+```
+
+## Register Renaming
+
+Renamed hardware registers are named `h0`, `h1`, `h2`, ...
+
+The syntax `ldu h7, 8(h5 -> h8)` will be used to mean that the address read comes from `h5` and the address write goes to `h8`.
+
+The register rename table starts out as follows:
+
+| `r3` | `r4` |
+|------|------|
+| `h0` | `h1` |
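+
+As a rough sketch of how the "Renamed Instruction" column in the table below can be derived: each instruction reads its sources through the current rename table, then allocates a fresh hardware register for every register it writes. The `RenameState` struct and the `rename_*` helpers are hypothetical names made up for this illustration; `ctr` is left unrenamed to match the table, and recycling of freed hardware registers is not modelled.
+
+```C
+#include <stdio.h>
+
+/* Hypothetical rename state covering only the registers the loop touches.
+ * ctr is deliberately not renamed, matching the table. */
+typedef struct {
+    int r3;   /* hardware register currently holding r3 */
+    int r9;   /* hardware register currently holding r9 (-1: none yet) */
+    int next; /* next unallocated hardware register */
+} RenameState;
+
+/* Initial table from the text: r3 -> h0, r4 -> h1, so h2 is the first free one. */
+RenameState rename_init(void) {
+    RenameState s = { 0, -1, 2 };
+    return s;
+}
+
+/* ldu r9, 8(r3): reads the old r3, then allocates the data destination
+ * followed by the updated address register (the order used in the table). */
+void rename_ldu(RenameState *s, char *out, size_t n) {
+    int old_r3 = s->r3;
+    s->r9 = s->next++;
+    s->r3 = s->next++;
+    snprintf(out, n, "ldu h%d, 8(h%d -> h%d)", s->r9, old_r3, s->r3);
+}
+
+/* addi r9, r9, 100: reads the current r9 mapping, writes a fresh register. */
+void rename_addi(RenameState *s, char *out, size_t n) {
+    int src = s->r9;
+    s->r9 = s->next++;
+    snprintf(out, n, "addi h%d, h%d, 100", s->r9, src);
+}
+
+/* std r9, 0(r3): only reads registers, so nothing is allocated. */
+void rename_std(const RenameState *s, char *out, size_t n) {
+    snprintf(out, n, "std h%d, 0(h%d)", s->r9, s->r3);
+}
+```
+
+Calling `rename_ldu`, `rename_addi` and `rename_std` in loop order, starting from `rename_init()`, yields `ldu h2, 8(h0 -> h3)`, `addi h4, h2, 100`, `std h4, 0(h3)`, then `ldu h5, 8(h3 -> h6)` and so on.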
+
+| ISA-level instruction | Renamed Instruction | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
+|-----------------------|-------------------------|-------|--------|---------------------|--------------|---------------------|----------------------|-----------------------|--------------|----------------------|-----------------------|------------------------|--------------|--------------|--------|----|
+| `mtctr r4` | `mtctr h1` | Fetch | Decode | Ex: Rd `h1` | Ex: Wr `ctr` | Retire | | | | | | | | | | |
+| `ldu r9, 8(r3)` | `ldu h2, 8(h0 -> h3)` | Fetch | Decode | Ex: Rd `h0` | Ex: Wr `h3` | Ex | Ex: Wr `h2` | Retire | | | | | | | | |
+| `addi r9, r9, 100` | `addi h4, h2, 100` | Fetch | Decode | Wait: `h2` | Wait: `h2` | Wait: `h2` | Ex: Rd `h2` | Ex: Wr `h4` | Retire | | | | | | | |
+| `std r9, 0(r3)` | `std h4, 0(h3)` | Fetch | Decode | Wait: `h3` and `h4` | Wait: `h4` | Wait: `h4` | Wait: `h4` | Ex: Rd `h3` and `h4` | Ex | Ex | Retire | | | | | |
+| `bdnz .L2` | `bdnz .L2` | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | | | | |
+| `ldu r9, 8(r3)` | `ldu h5, 8(h3 -> h6)` | | | Fetch | Decode | Ex: Rd `h3` | Ex: Wr `h6` | Ex | Ex: Wr `h5` | Wait: Retire | Retire | | | | | |
+| `addi r9, r9, 100` | `addi h7, h5, 100` | | | Fetch | Decode | Wait: `h5` | Wait: `h5` | Wait: `h5` | Ex: Rd `h5` | Ex: Wr `h7` | Retire | | | | | |
+| `std r9, 0(r3)` | `std h7, 0(h6)` | | | Fetch | Decode | Wait: `h6` and `h7` | Wait: `h7` | Wait: `h7` | Wait: `h7` | Ex: Rd `h6` and `h7` | Ex | Ex | Retire | | | |
+| `bdnz .L2` | `bdnz .L2` | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | | |
+| `ldu r9, 8(r3)` | `ldu h8, 8(h6 -> h9)` | | | | Fetch | Decode | Ex: Rd `h6` | Ex: Wr `h9` | Ex | Ex: Wr `h8` | Wait: Retire | Wait: Retire | Retire | | | |
+| `addi r9, r9, 100` | `addi h10, h8, 100` | | | | Fetch | Decode | Wait: `h8` | Wait: `h8` | Wait: `h8` | Ex: Rd `h8` | Ex: Wr `h10` | Wait: Retire | Retire | | | |
+| `std r9, 0(r3)` | `std h10, 0(h9)` | | | | Fetch | Decode | Wait: `h9` and `h10` | Wait: `h10` | Wait: `h10` | Wait: `h10` | Ex: Rd `h9` and `h10` | Ex | Ex | Retire | | |
+| `bdnz .L2` | `bdnz .L2` | | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | |
+| `ldu r9, 8(r3)` | `ldu h11, 8(h9 -> h12)` | | | | | Fetch | Decode | Ex: Rd `h9` | Ex: Wr `h12` | Ex | Ex: Wr `h11` | Wait: Retire | Wait: Retire | Retire | | |
+| `addi r9, r9, 100` | `addi h13, h11, 100` | | | | | Fetch | Decode | Wait: `h11` | Wait: `h11` | Wait: `h11` | Ex: Rd `h11` | Ex: Wr `h13` | Wait: Retire | Retire | | |
+| `std r9, 0(r3)` | `std h13, 0(h12)` | | | | | Fetch | Decode | Wait: `h12` and `h13` | Wait: `h13` | Wait: `h13` | Wait: `h13` | Ex: Rd `h12` and `h13` | Ex | Ex | Retire | |
+| `bdnz .L2` | `bdnz .L2` | | | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | |
+| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
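+
+To make the write-throughput point concrete, the `Ex: Wr` events in the table above can be tallied per ISA-level register per clock cycle. The sketch below hard-codes the `r3` and `r9` writes transcribed from the four loop iterations shown; the `WriteEvent` struct and the `max_writes_per_cycle` helper are hypothetical names used only for this illustration.
+
+```C
+#include <stddef.h>
+#include <string.h>
+
+/* One register write taken from the table: the cycle it happens in, the
+ * ISA-level register being written, and the hardware register it lands in. */
+typedef struct {
+    int cycle;
+    const char *isa_reg;
+    int hw_reg;
+} WriteEvent;
+
+/* Transcribed from the table, one row of events per loop iteration. */
+const WriteEvent table_writes[] = {
+    {3, "r3", 3},  {5, "r9", 2},  {6, "r9", 4},
+    {5, "r3", 6},  {7, "r9", 5},  {8, "r9", 7},
+    {6, "r3", 9},  {8, "r9", 8},  {9, "r9", 10},
+    {7, "r3", 12}, {9, "r9", 11}, {10, "r9", 13},
+};
+const size_t table_writes_len = sizeof table_writes / sizeof table_writes[0];
+
+enum { MAX_CYCLE = 15 }; /* the table has columns 0 through 14 */
+
+/* Returns the largest number of writes to isa_reg that fall in one cycle. */
+int max_writes_per_cycle(const WriteEvent *ev, size_t n, const char *isa_reg) {
+    int count[MAX_CYCLE] = {0};
+    int max = 0;
+    for (size_t i = 0; i < n; i++) {
+        if (strcmp(ev[i].isa_reg, isa_reg) != 0)
+            continue;
+        if (ev[i].cycle < 0 || ev[i].cycle >= MAX_CYCLE)
+            continue;
+        if (++count[ev[i].cycle] > max)
+            max = count[ev[i].cycle];
+    }
+    return max;
+}
+```
+
+With this data, `r9` is written twice in cycle 8 (`h7` and `h8`) and twice in cycle 9 (`h10` and `h11`), which works only because the writes land in different hardware registers; `r3` is never written more than once per cycle.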
+
+## 6600-derived
+
+The following table uses these assumptions and notation:
+- `ldu` instructions are split into two micro-ops in the decode stage.
+- A mechanism is in place for forwarding from a functional unit's (FU's) result latch to a waiting operation, without having to wait until the result can be written to the register file.
+- "Av `r3`" denotes that the value to be written to `r3` has been computed and is available for forwarding but can't yet be written to the register file.
+- "SW: #4" denotes that the instruction is waiting on the shadow produced by instruction #4.
+- "Rf #5:`r5`" denotes that the instruction reads the result latch for instruction #5's new value for `r5` through the forwarding mechanism.
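+
+The split of `ldu` into an address micro-op and a memory micro-op can be sketched functionally as below. This only illustrates the data flow between the two halves, not any actual decoder or timing: `ldu_addr`, `ldu_mem` and `loop_body` are hypothetical names, and memory is modelled as a plain `uint64_t` array reached through a pointer.
+
+```C
+#include <stdint.h>
+
+/* #n.addr: computes the updated address. Its result is both the new r3
+ * and the address consumed by #n.mem. */
+uint64_t *ldu_addr(uint64_t *r3) {
+    return r3 + 1; /* 8(r3): 8 bytes is one uint64_t */
+}
+
+/* #n.mem: loads from the address produced by #n.addr; the result is r9. */
+uint64_t ldu_mem(const uint64_t *addr) {
+    return *addr;
+}
+
+/* One iteration of the loop in f(), with ldu expressed as two micro-ops. */
+uint64_t *loop_body(uint64_t *r3) {
+    r3 = ldu_addr(r3);         /* #n.addr */
+    uint64_t r9 = ldu_mem(r3); /* #n.mem, depends on #n.addr */
+    r9 += 100;                 /* addi r9, r9, 100 */
+    *r3 = r9;                  /* std r9, 0(r3) */
+    return r3;
+}
+```
+
+Because the address result exists before the load completes, a later instruction that only needs `r3` (such as the next iteration's address micro-op) does not have to wait for the memory access.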
+
+TODO(programmerjake): finish
+
+| ISA-level instruction | Num | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+|-----------------------|----------|-------|--------|------------------|--------------|----------------------|-------------|----------------------|-------------|--------|--------|
+| `mtctr r4` | #0 | Fetch | Decode | Ex: Rd `r4` | Ex: Wr `ctr` | Finish | | | | | |
+| `ldu r9, 8(r3)` | #1.addr | Fetch | Decode | Ex: Rd `r3` | Ex: Av `r3` | SW: #1.mem | Ex: Wr `r3` | Finish | | | |
+| `ldu r9, 8(r3)` | #1.mem | | Decode | Wait: #1.addr | Ex | Ex | Ex: Wr `r9` | Finish | | | |
+| `addi r9, r9, 100` | #2 | Fetch | Decode | Wait: #1.mem | Wait: #1.mem | Wait: #1.mem | Ex: Rd `r9` | Ex: Wr `r9` | Finish | | |
+| `std r9, 0(r3)` | #3 | Fetch | Decode | Wait: #1.addr #2 | Wait: #2 | Wait: #2 | Wait: #2 | Ex: Rd `r3` and `r9` | Ex | Ex | Finish |
+| `bdnz .L2` | #4 | | Fetch | Decode | Ex: Rd `ctr` | Ex: result available | SW: #3 | SW: #3 | SW: #3 | SW: #3 | Finish |
+| `ldu r9, 8(r3)` | #5.addr | | | Fetch | Decode | Ex: Rf #1.addr:`r3` | Ex: Av `r3` | SW: #5.mem | Ex: Wr `r3` | | |
+| `ldu r9, 8(r3)` | #5.mem | | | | Decode | Wait: #5.addr | Ex | | | | |
+| `addi r9, r9, 100` | #6 | | | Fetch | Decode | | | | | | |
+| `std r9, 0(r3)` | #7 | | | Fetch | Decode | | | | | | |
+| `bdnz .L2` | #8 | | | Fetch | Decode | | | | | | |
+| `ldu r9, 8(r3)` | #9.addr | | | | Fetch | Decode | | | | | |
+| `ldu r9, 8(r3)` | #9.mem | | | | | Decode | | | | | |
+| `addi r9, r9, 100` | #10 | | | | Fetch | Decode | | | | | |
+| `std r9, 0(r3)` | #11 | | | | Fetch | Decode | | | | | |
+| `bdnz .L2` | #12 | | | | Fetch | Decode | | | | | |
+| `ldu r9, 8(r3)` | #13.addr | | | | | Fetch | Decode | | | | |
+| `ldu r9, 8(r3)` | #13.mem | | | | | | Decode | | | | |
+| `addi r9, r9, 100` | #14 | | | | | Fetch | Decode | | | | |
+| `std r9, 0(r3)` | #15 | | | | | Fetch | Decode | | | | |
+| `bdnz .L2` | #16 | | | | | Fetch | Decode | | | | |
+| ... | ... | ... | ... | ... | ... | ... | ... | | | | |