From 0f41c4f615fd84f28b00f37ebb580dc9bbcfa490 Mon Sep 17 00:00:00 2001
From: Jacob Lifshay
Date: Tue, 27 Oct 2020 12:43:14 -0700
Subject: [PATCH] finish compared_to_register_renaming.mdwn

---
 .../compared_to_register_renaming.mdwn | 104 ++++++++++--------
 1 file changed, 57 insertions(+), 47 deletions(-)

diff --git a/3d_gpu/architecture/compared_to_register_renaming.mdwn b/3d_gpu/architecture/compared_to_register_renaming.mdwn
index 70dd18983..fb8871071 100644
--- a/3d_gpu/architecture/compared_to_register_renaming.mdwn
+++ b/3d_gpu/architecture/compared_to_register_renaming.mdwn
@@ -62,59 +62,69 @@ The register rename table starts out as following:
 |------|------|
 | `h0` | `h1` |

-| ISA-level instruction | Renamed Instruction | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
-|-----------------------|-------------------------|-------|--------|---------------------|--------------|---------------------|----------------------|-----------------------|--------------|----------------------|-----------------------|------------------------|--------------|--------------|--------|----|
-| `mtctr r4` | `mtctr h1` | Fetch | Decode | Ex: Rd `h1` | Ex: Wr `ctr` | Retire | | | | | | | | | | |
-| `ldu r9, 8(r3)` | `ldu h2, 8(h0 -> h3)` | Fetch | Decode | Ex: Rd `h0` | Ex: Wr `h3` | Ex | Ex: Wr `h2` | Retire | | | | | | | | |
-| `addi r9, r9, 100` | `addi h4, h2, 1` | Fetch | Decode | Wait: `h2` | Wait: `h2` | Wait: `h2` | Ex: Rd `h2` | Ex: Wr `h4` | Retire | | | | | | | |
-| `std r9, 0(r3)` | `std h4, 0(h3)` | Fetch | Decode | Wait: `h3` and `h4` | Wait: `h4` | Wait: `h4` | Wait: `h4` | Ex: Rd `h3` and `h4` | Ex | Ex | Retire | | | | | |
-| `bdnz .L2` | `bdnz .L2` | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | | | | |
-| `ldu r9, 8(r3)` | `ldu h5, 8(h3 -> h6)` | | | Fetch | Decode | Ex: Rd `h3` | Ex: Wr `h6` | Ex | Ex: Wr `h5` | Wait: Retire | Retire | | | | | |
-| `addi r9, r9, 100` | `addi h7, h5, 100` | | | Fetch | Decode | Wait: `h5` | Wait: `h5` | Wait: `h5` | Ex: Rd `h5` | Ex: Wr `h7` | Retire | | | | | |
-| `std r9, 0(r3)` | `std h7, 0(h6)` | | | Fetch | Decode | Wait: `h6` and `h7` | Wait: `h7` | Wait: `h7` | Wait: `h7` | Ex: Rd `h6` and `h7` | Ex | Ex | Retire | | | |
-| `bdnz .L2` | `bdnz .L2` | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | | |
-| `ldu r9, 8(r3)` | `ldu h8, 8(h6 -> h9)` | | | | Fetch | Decode | Ex: Rd `h6` | Ex: Wr `h9` | Ex | Ex: Wr `h8` | Wait: Retire | Wait: Retire | Retire | | | |
-| `addi r9, r9, 100` | `addi h10, h8, 100` | | | | Fetch | Decode | Wait: `h8` | Wait: `h8` | Wait: `h8` | Ex: Rd `h8` | Ex: Wr `h10` | Wait: Retire | Retire | | | |
-| `std r9, 0(r3)` | `std h10, 0(h9)` | | | | Fetch | Decode | Wait: `h9` and `h10` | Wait: `h10` | Wait: `h10` | Wait: `h10` | Ex: Rd `h9` and `h10` | Ex | Ex | Retire | | |
-| `bdnz .L2` | `bdnz .L2` | | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | |
-| `ldu r9, 8(r3)` | `ldu h11, 8(h9 -> h12)` | | | | | Fetch | Decode | Ex: Rd `h9` | Ex: Wr `h12` | Ex | Ex: Wr `h11` | Wait: Retire | Wait: Retire | Retire | | |
-| `addi r9, r9, 100` | `addi h13, h11, 100` | | | | | Fetch | Decode | Wait: `h11` | Wait: `h11` | Wait: `h11` | Ex: Rd `h11` | Ex: Wr `h13` | Wait: Retire | Retire | | |
-| `std r9, 0(r3)` | `std h13, 0(h12)` | | | | | Fetch | Decode | Wait: `h12` and `h13` | Wait: `h13` | Wait: `h13` | Wait: `h13` | Ex: Rd `h12` and `h13` | Ex | Ex | Retire | |
-| `bdnz .L2` | `bdnz .L2` | | | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | |
-| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | |
+
+| ISA-level instruction | Num | Renamed Instruction | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
+|-----------------------|-----|--------------------------|-------|--------|---------------------|--------------|---------------------|----------------------|-----------------------|-----------------------|----------------------|-----------------------|------------------------|------------------------|--------------|--------------|--------|
+| `mtctr r4` | #0 | `mtctr h1` | Fetch | Decode | Ex: Rd `h1` | Ex: Wr `ctr` | Retire | | | | | | | | | | |
+| `ldu r9, 8(r3)` | #1 | `ldu h2, 8(h0 -> h3)` | Fetch | Decode | Ex: Rd `h0` | Ex: Wr `h3` | Ex | Ex: Wr `h2` | Retire | | | | | | | | |
+| `addi r9, r9, 100` | #2 | `addi h4, h2, 1` | Fetch | Decode | Wait: `h2` | Wait: `h2` | Wait: `h2` | Ex: Rd `h2` | Ex: Wr `h4` | Retire | | | | | | | |
+| `std r9, 0(r3)` | #3 | `std h4, 0(h3)` | Fetch | Decode | Wait: `h3` and `h4` | Wait: `h4` | Wait: `h4` | Wait: `h4` | Ex: Rd `h3` and `h4` | Ex | Ex | Retire | | | | | |
+| `bdnz .L2` | #4 | `bdnz .L2` | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | | | | |
+| `ldu r9, 8(r3)` | #5 | `ldu h5, 8(h3 -> h6)` | | | Fetch | Decode | Ex: Rd `h3` | Ex: Wr `h6` | Ex | Ex: Wr `h5` | Wait: Retire | Retire | | | | | |
+| `addi r9, r9, 100` | #6 | `addi h7, h5, 100` | | | Fetch | Decode | Wait: `h5` | Wait: `h5` | Wait: `h5` | Ex: Rd `h5` | Ex: Wr `h7` | Retire | | | | | |
+| `std r9, 0(r3)` | #7 | `std h7, 0(h6)` | | | Fetch | Decode | Wait: `h6` and `h7` | Wait: `h7` | Wait: `h7` | Wait: `h7` | Ex: Rd `h6` and `h7` | Ex | Ex | Retire | | | |
+| `bdnz .L2` | #8 | `bdnz .L2` | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | | |
+| `ldu r9, 8(r3)` | #9 | `ldu h8, 8(h6 -> h9)` | | | | Fetch | Decode | Ex: Rd `h6` | Ex: Wr `h9` | Ex | Ex: Wr `h8` | Wait: Retire | Wait: Retire | Retire | | | |
+| `addi r9, r9, 100` | #10 | `addi h10, h8, 100` | | | | Fetch | Decode | Wait: `h8` | Wait: `h8` | Wait: `h8` | Ex: Rd `h8` | Ex: Wr `h10` | Wait: Retire | Retire | | | |
+| `std r9, 0(r3)` | #11 | `std h10, 0(h9)` | | | | Fetch | Decode | Wait: `h9` and `h10` | Wait: `h10` | Wait: `h10` | Wait: `h10` | Ex: Rd `h9` and `h10` | Ex | Ex | Retire | | |
+| `bdnz .L2` | #12 | `bdnz .L2` | | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | | |
+| `ldu r9, 8(r3)` | #13 | `ldu h11, 8(h9 -> h12)` | | | | | Fetch | Decode | Ex: Rd `h9` | Ex: Wr `h12` | Ex | Ex: Wr `h11` | Wait: Retire | Wait: Retire | Retire | | |
+| `addi r9, r9, 100` | #14 | `addi h13, h11, 100` | | | | | Fetch | Decode | Wait: `h11` | Wait: `h11` | Wait: `h11` | Ex: Rd `h11` | Ex: Wr `h13` | Wait: Retire | Retire | | |
+| `std r9, 0(r3)` | #15 | `std h13, 0(h12)` | | | | | Fetch | Decode | Wait: `h12` and `h13` | Wait: `h13` | Wait: `h13` | Wait: `h13` | Ex: Rd `h12` and `h13` | Ex | Ex | Retire | |
+| `bdnz .L2` | #16 | `bdnz .L2` | | | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire | |
+| `ldu r9, 8(r3)` | #17 | `ldu h14, 8(h12 -> h15)` | | | | | | Fetch | Decode | Ex: Rd `h12` | Ex: Wr `h15` | Ex | Ex: Wr `h14` | Wait: Retire | Wait: Retire | Retire | |
+| `addi r9, r9, 100` | #18 | `addi h16, h14, 100` | | | | | | Fetch | Decode | Wait: `h14` | Wait: `h14` | Wait: `h14` | Ex: Rd `h14` | Ex: Wr `h16` | Wait: Retire | Retire | |
+| `std r9, 0(r3)` | #19 | `std h16, 0(h15)` | | | | | | Fetch | Decode | Wait: `h15` and `h16` | Wait: `h16` | Wait: `h16` | Wait: `h16` | Ex: Rd `h15` and `h16` | Ex | Ex | Retire |
+| `bdnz .L2` | #20 | `bdnz .L2` | | | | | | Fetch | Decode | Ex: Rd `ctr` | Ex: Wr `ctr` | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Wait: Retire | Retire |
+| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |

 ## 6600-derived

+Notice how the WaR Waits on `r9` cause 2 instructions to finish per cycle (5 micro-ops per 2 cycles) instead of the 4 per cycle for the Register Renaming version, this means the processor's resources will eventually be full, limiting total throughput to 2 instructions/clock.
+
 For the following table:
-- Assumes that `ldu` instructions are split into two micro-ops in the decode stage.
+- Assumes that `ldu` instructions are split into two micro-ops in the decode stage. The address computation is denoted "#5.a" and the memory read is denoted "#5.m".
 - Assumes that a mechanism for forwarding from a FU's result latch to a waiting operation is in place, without having to wait until the result can be written to the register file.
 - "Av `r3`" denotes that the value to be written to `r3` is computed and is available for forwarding but can't yet be written to the register file.
 - "SW: #4" denotes that the instruction is waiting on the shadow produced by instruction #4.
 - "Rf #5:`r5`" denotes that the instruction reads the result latch for instruction #5's new value for `r5` through the forwarding mechanism.

-TODO(programmerjake): finish
-
-| ISA-level instruction | Num | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
-|-----------------------|----------|-------|--------|------------------|--------------|----------------------|-------------|----------------------|-------------|--------|--------|
-| `mtctr r4` | #0 | Fetch | Decode | Ex: Rd `r4` | Ex: Wr `ctr` | Finish | | | | | |
-| `ldu r9, 8(r3)` | #1.addr | Fetch | Decode | Ex: Rd `r3` | Ex: Av `r3` | SW: #1.mem | Ex: Wr `r3` | Finish | | | |
-| `ldu r9, 8(r3)` | #1.mem | | Decode | Wait: #1.addr | Ex | Ex | Ex: Wr `r9` | Finish | | | |
-| `addi r9, r9, 100` | #2 | Fetch | Decode | Wait: #1.mem | Wait: #1.mem | Wait: #1.mem | Ex: Rd `r9` | Ex: Wr `r9` | Finish | | |
-| `std r9, 0(r3)` | #3 | Fetch | Decode | Wait: #1.addr #2 | Wait: #2 | Wait: #2 | Wait: #2 | Ex: Rd `r3` and `r9` | Ex | Ex | Finish |
-| `bdnz .L2` | #4 | | Fetch | Decode | Ex: Rd `ctr` | Ex: result available | SW: #3 | SW: #3 | SW: #3 | SW: #3 | Finish |
-| `ldu r9, 8(r3)` | #5.addr | | | Fetch | Decode | Ex: Rf #1.addr:`r3` | Ex: Av `r3` | SW: #5.mem | Ex: Wr `r3` | | |
-| `ldu r9, 8(r3)` | #5.mem | | | | Decode | Wait: #5.addr | Ex | | | | |
-| `addi r9, r9, 100` | #6 | | | Fetch | Decode | | | | | | |
-| `std r9, 0(r3)` | #7 | | | Fetch | Decode | | | | | | |
-| `bdnz .L2` | #8 | | | Fetch | Decode | | | | | | |
-| `ldu r9, 8(r3)` | #9.addr | | | | Fetch | Decode | | | | | |
-| `ldu r9, 8(r3)` | #9.mem | | | | | Decode | | | | | |
-| `addi r9, r9, 100` | #10 | | | | Fetch | Decode | | | | | |
-| `std r9, 0(r3)` | #11 | | | | Fetch | Decode | | | | | |
-| `bdnz .L2` | #12 | | | | Fetch | Decode | | | | | |
-| `ldu r9, 8(r3)` | #13.addr | | | | | Fetch | Decode | | | | |
-| `ldu r9, 8(r3)` | #13.mem | | | | | | Decode | | | | |
-| `addi r9, r9, 100` | #14 | | | | | Fetch | Decode | | | | |
-| `std r9, 0(r3)` | #15 | | | | | Fetch | Decode | | | | |
-| `bdnz .L2` | #16 | | | | | Fetch | Decode | | | | |
-| ... | ... | ... | ... | ... | ... | ... | ... | | | | |
+| ISA-level instruction | Num | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
+|-----------------------|-------|-------|--------|---------------|--------------|------------------|------------------|------------------|-------------------|------------------|---------------------------|----------------------------|----------------------------|----------------|----------------|----------------|----------------|-------------|--------|
+| `mtctr r4` | #0 | Fetch | Decode | Ex: Rd `r4` | Ex: Wr `ctr` | Finish | | | | | | | | | | | | | |
+| `ldu r9, 8(r3)` | #1.a | Fetch | Decode | Ex: Rd `r3` | Ex: Av `r3` | SW: #1.m | Ex: Wr `r3` | Finish | | | | | | | | | | | |
+| `ldu r9, 8(r3)` | #1.m | | Decode | Wait: #1.a | Ex | Ex | Ex: Wr `r9` | Finish | | | | | | | | | | | |
+| `addi r9, r9, 100` | #2 | Fetch | Decode | Wait: #1.m | Wait: #1.m | Wait: #1.m | Ex: Rd `r9` | Ex: Wr `r9` | Finish | | | | | | | | | | |
+| `std r9, 0(r3)` | #3 | Fetch | Decode | Wait: #1.a #2 | Wait: #2 | Wait: #2 | Wait: #2 | Ex: Rd `r3` `r9` | Ex | Ex | Finish | | | | | | | | |
+| `bdnz .L2` | #4 | | Fetch | Decode | Ex: Rd `ctr` | Ex: Av `ctr` | SW: #3 | SW: #3 | SW: #3 | SW: #3 | Ex: Wr `ctr` | Finish | | | | | | | |
+| `ldu r9, 8(r3)` | #5.a | | | Fetch | Decode | Ex: Rf #1.a:`r3` | Ex: Av `r3` | SW: #5.m | SW: #3 | SW: #3 | Ex: Wr `r3` | Finish | | | | | | | |
+| `ldu r9, 8(r3)` | #5.m | | | | Decode | Wait: #5.a | Ex | Ex | Ex: Av `r9` | SW: #3 | Ex: Wr `r9` | Finish | | | | | | | |
+| `addi r9, r9, 100` | #6 | | | Fetch | Decode | Wait: #5.m | Wait: #5.m | Wait: #5.m | Ex: Rf #5.m:`r9` | Ex: Av `r9` | WaR Wait: `r9` | Ex: Wr `r9` | Finish | | | | | | |
+| `std r9, 0(r3)` | #7 | | | Fetch | Decode | Wait: #5.a #6 | Wait: #6 | Wait: #6 | Wait: #6 | Ex: Rf #6:`r9` | Ex | Ex | Finish | | | | | | |
+| `bdnz .L2` | #8 | | | Fetch | Decode | Ex: Rf #4:`ctr` | Ex: Av `ctr` | SW: #7 | SW: #7 | SW: #7 | SW: #7 | SW: #7 | Ex: Wr `ctr` | Finish | | | | | |
+| `ldu r9, 8(r3)` | #9.a | | | | Fetch | Decode | Ex: Rf #5.m:`r3` | Ex: Av `r3` | SW: #9.m | SW: #7 | SW: #7 | SW: #7 | Ex: Wr `r3` | Finish | | | | | |
+| `ldu r9, 8(r3)` | #9.m | | | | | Decode | Wait: #9.a | Ex | Ex | Ex: Av `r9` | SW: #7 | SW: #7 | Ex: Wr `r9` | Finish | | | | | |
+| `addi r9, r9, 100` | #10 | | | | Fetch | Decode | Wait: #9.m | Wait: #9.m | Wait: #9.m | Ex: Rf #9.m:`r9` | Ex: Av `r9` | SW: #7 | WaR Wait: `r9` | Ex: Wr `r9` | Finish | | | | |
+| `std r9, 0(r3)` | #11 | | | | Fetch | Decode | Wait: #9.a #10 | Wait: #10 | Wait: #10 | Wait: #10 | Ex: Rf #9.a:`r3` #10:`r9` | Ex | Ex | Finish | | | | | |
+| `bdnz .L2` | #12 | | | | Fetch | Decode | Ex: Rf `ctr` | Ex: Av `ctr` | SW: #11 | SW: #11 | SW: #11 | SW: #11 | SW: #11 | Ex: Wr `ctr` | Finish | | | | |
+| `ldu r9, 8(r3)` | #13.a | | | | | Fetch | Decode | Ex: Rf #9.a:`r3` | Ex: Av `r3` | SW: #13.m | SW: #11 | SW: #11 | SW: #11 | Ex: Wr `r3` | Finish | | | | |
+| `ldu r9, 8(r3)` | #13.m | | | | | | Decode | Wait: #13.a | Ex | Ex | Ex: Av `r9` | SW: #11 | SW: #11 | WaR Wait: `r9` | Ex: Wr `r9` | Finish | | | |
+| `addi r9, r9, 100` | #14 | | | | | Fetch | Decode | Wait: #13.m | Wait: #13.m | Wait: #13.m | Ex: Rf #13.m:`r9` | Ex: Av `r9` | SW: #11 | WaR Wait: `r9` | WaR Wait: `r9` | Ex: Wr `r9` | Finish | | |
+| `std r9, 0(r3)` | #15 | | | | | Fetch | Decode | Wait: #13.a #14 | Wait: #14 | Wait: #14 | Wait: #14 | Ex: Rf #13.a:`r3` #14:`r9` | Ex | Ex | Finish | | | | |
+| `bdnz .L2` | #16 | | | | | Fetch | Decode | Ex: Rf #12:`ctr` | Ex: Av `ctr` | SW: #15 | SW: #15 | SW: #15 | SW: #15 | SW: #15 | Ex: Wr `ctr` | Finish | | | |
+| `ldu r9, 8(r3)` | #17.a | | | | | | Fetch | Decode | Ex: Rf #13.a:`r3` | Ex: Av `r3` | SW: #17.m | SW: #15 | SW: #15 | SW: #15 | Ex: Wr `r3` | Finish | | | |
+| `ldu r9, 8(r3)` | #17.m | | | | | | | Decode | Wait: #17.a | Ex | Ex | Ex: Av `r9` | SW: #15 | SW: #15 | WaR Wait: `r9` | WaR Wait: `r9` | Ex: Wr `r9` | Finish | |
+| `addi r9, r9, 100` | #18 | | | | | | Fetch | Decode | Wait: #17.m | Wait: #17.m | Wait: #17.m | Ex: Rf #17.m:`r9` | Ex: Av `r9` | SW: #15 | WaR Wait: `r9` | WaR Wait: `r9` | WaR Wait: `r9` | Ex: Wr `r9` | Finish |
+| `std r9, 0(r3)` | #19 | | | | | | Fetch | Decode | Wait: #17.a #18 | Wait: #18 | Wait: #18 | Wait: #18 | Ex: Rf #17.a:`r3` #18:`r9` | Ex | Ex | Finish | | | |
+| `bdnz .L2` | #20 | | | | | | Fetch | Decode | Ex: Rf #16:`ctr` | Ex: Av `ctr` | SW: #19 | SW: #19 | SW: #19 | SW: #19 | SW: #19 | Finish | | | |
+| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
--
2.30.2