From 4be8001cf3a7b774ce27348f783cbc8f77052004 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Thu, 31 Dec 2020 22:07:28 +0000
Subject: [PATCH]

---
 openpower/sv/overview.mdwn | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/openpower/sv/overview.mdwn b/openpower/sv/overview.mdwn
index fae284f92..066e005b6 100644
--- a/openpower/sv/overview.mdwn
+++ b/openpower/sv/overview.mdwn
@@ -376,6 +376,14 @@
 of the destination. The only situation where a full overwrite occurs is on "default" behaviour. This is extremely important to consider the register file as a byte-level store, not a 64-bit-level store.
 
+## Why LE regfile?
+
+The concept of having a regfile where the byte ordering of the underlying SRAM matters seems utter nonsense. Surely, a hardware implementation gets to choose the order, right? The bytes come in, all registers are 64 bit and it's just wiring, right?
+
+The assumption in that question was, "all registers are 64 bit". SV allows SIMD-style packing of vectors into the 64 bit registers, and consequently it becomes critically important to decide on a byte order. That decision was - arbitrarily - LE mode. Actually it wasn't arbitrary at all: it was such hell to implement CRs and LD/ST in LibreSOC, with arbitrary insertions of 7-index here and 3-bitindex there, that the decision to pick LE was extremely easy.
+
+Without such a decision, if two words are packed as elements into a 64 bit register, what does this mean? Should they be inverted so that the lower indexed element goes into the HI or the LO word? Should the 8 bytes of each register be inverted? Should the bytes in each element be inverted? The decision was therefore made: the C typedef union is, in an LE context, the definitive canonical definition, and implementations may choose whatever internal HDL wire order they like as long as the results conform to the elwidth pseudocode.
+
 ## Source and Destination overrides
 
 A minor fly in the ointment: what happens if the source and destination are over-ridden to different widths? For example, FP16 arithmetic is not accurate enough and may introduce rounding errors when up-converted to FP32 output. The rule is therefore set:
-- 
2.30.2
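
As a rough illustration of the rule stated in "Why LE regfile?", the sketch below models the register file as a flat byte-level store and addresses every element least-significant-byte first, which is the layout a C typedef union would give on an LE host. The names `sk_regfile_t`, `sk_set_elem` and `sk_get_elem`, and the 4-register size, are hypothetical and chosen only for this example; they are not taken from the SV specification or the elwidth pseudocode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NUM_REGS 4   /* hypothetical: tiny regfile, just for the sketch */

/* The regfile as a byte-level store: 8 bytes per 64-bit register. */
typedef struct {
    uint8_t bytes[SK_NUM_REGS * 8];
} sk_regfile_t;

/* Write element `idx` of `elwidth` bytes (1, 2, 4 or 8), counting elements
 * from byte 0 of register `reg`. The least significant byte is stored at
 * the lowest byte address (LE), independent of the host's endianness. */
static void sk_set_elem(sk_regfile_t *rf, size_t reg, size_t elwidth,
                        size_t idx, uint64_t value)
{
    size_t base = reg * 8 + idx * elwidth;
    assert(elwidth == 1 || elwidth == 2 || elwidth == 4 || elwidth == 8);
    assert(base + elwidth <= sizeof rf->bytes);
    for (size_t k = 0; k < elwidth; k++)
        rf->bytes[base + k] = (uint8_t)(value >> (8 * k));
}

/* Read element `idx` of `elwidth` bytes back, using the same LE layout. */
static uint64_t sk_get_elem(const sk_regfile_t *rf, size_t reg,
                            size_t elwidth, size_t idx)
{
    size_t base = reg * 8 + idx * elwidth;
    uint64_t value = 0;
    assert(elwidth == 1 || elwidth == 2 || elwidth == 4 || elwidth == 8);
    assert(base + elwidth <= sizeof rf->bytes);
    for (size_t k = 0; k < elwidth; k++)
        value |= (uint64_t)rf->bytes[base + k] << (8 * k);
    return value;
}
```

Under these assumptions, storing `0x11223344` as 32-bit element 0 and `0xAABBCCDD` as 32-bit element 1 of the same register, then reading the register back as a single 64-bit element, yields `0xAABBCCDD11223344`: the lower-indexed element sits in the LO word, and no bytes are inverted at either the element or the register level.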