From c764d23911afaee98b24e890b87371ec8991ad32 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 27 Oct 2020 12:59:23 +0000
Subject: [PATCH] add link to tomasulo_transformation for notes on "nameless" registers

---
 .../compared_to_register_renaming.mdwn | 23 ++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/3d_gpu/architecture/compared_to_register_renaming.mdwn b/3d_gpu/architecture/compared_to_register_renaming.mdwn
index 08c7838ab..70dd18983 100644
--- a/3d_gpu/architecture/compared_to_register_renaming.mdwn
+++ b/3d_gpu/architecture/compared_to_register_renaming.mdwn
@@ -1,6 +1,23 @@
-One critical difference between the 6600-derived architecture and traditional register-renaming OoO speculative processors is that writes to any one particular ISA-level register max out at 1 per clock cycle (without special measures to improve that) in the 6600-derived architecture, whereas the register-renamed version can easily handle multiple such register writes per clock cycle since the register writes are spread out across multiple physical registers.
-
-The following diagrams are assuming that the fetch, decode, branch prediction, and register renaming can handle 4 instructions per clock cycle (usual on Intel's processors for many generations). They assume that `ldu` can write the address register after 1 clock cycle of execution and the destination register after 4 clock cycles of execution (can be achieved by splitting into 2 separate micro-ops).
+One critical difference between the 6600-derived architecture and
+traditional register-renaming OoO speculative processors is that
+writes to any one particular ISA-level register max out at 1 per clock
+cycle (without special measures to improve that) in the 6600-derived
+architecture, whereas the register-renamed version can easily handle
+multiple such register writes per clock cycle since the register writes
+are spread out across multiple physical registers.
+
+(Note from lkcl: 6600 Reservation Stations *are* "register-renaming"
+stations. unlike in the Tomasulo Algorithm, they're just not given
+"names" because Cray and Thornton solved a problem they didn't realise
+everyone else would have. See [[tomasulo_transformation]] and
+)
+
+The following diagrams are assuming that the fetch, decode, branch
+prediction, and register renaming can handle 4 instructions per clock
+cycle (usual on Intel's processors for many generations). They assume that
+`ldu` can write the address register after 1 clock cycle of execution
+and the destination register after 4 clock cycles of execution (can be
+achieved by splitting into 2 separate micro-ops).
 
 The following C program is used:
 
-- 
2.30.2