## Why LE regfile?
-The concept of having a regfile where the byte ordering of the underlying SRAM seems utter nonsense. Surely, a hardware implementation gets to choose the order, right? The bytes come in, all registers are 64 bit and it's just wiring, right?
+The concept of having a regfile where the byte ordering of the underlying SRAM matters seems utter nonsense. Surely a hardware implementation gets to choose the order, right? It's only in memory that LE/BE matters, right? The bytes come in, all registers are 64 bit and it's just wiring, right?
-The assumption in that question was, "all registers are 64 bit". SV allows SIMD-style packing of vectors into the 64 bit registers, and consequently it becomes critically important to decide a byte-order. That decision was - arbitrarily - LE mode. Actually it wasn't arbitrary at all: it was such hell to implement CRs and LD/ST in LibreSOC, based on a terse spec that provides indufficient clarity and assumes significant working knowledge of OpenPOWER, with arbitrary insertions of 7-index here and 3-bitindex there that the decision to pick LE was extremely easy.
+Ordinarily this would be 100% correct, in both a scalar ISA and a Cray-style Vector one. The assumption in that last question was, however, "all registers are 64 bit". SV allows SIMD-style packing of vectors into the 64 bit registers, and consequently it becomes critically important to decide a byte-order. That decision was - arbitrarily - LE mode. Actually it wasn't arbitrary at all: it was such hell to implement CRs and LD/ST in LibreSOC, based on a terse spec that provides insufficient clarity and assumes significant working knowledge of OpenPOWER, with arbitrary insertions of 7-index here and 3-bitindex there, that the decision to pick LE was extremely easy.
Without such a decision, if two words are packed as elements into a 64 bit register, what does this mean? Should they be inverted so that the lower indexed element goes into the HI or the LO word? Should the 8 bytes of each register be inverted? Should the bytes in each element be inverted? The decision was therefore made: the C typedef union is the definitive canonical definition, and its members are defined as being in LE order. From there, implementations may choose whatever internal HDL wire order they like, as long as the results conform to the elwidth pseudocode.
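
As a rough illustration of what "LE order" means for packed elements, here is a minimal sketch in C. It is not the canonical union from the spec: the names `sv_reg_t`, `sv_set_elem` and `sv_get_elem` are hypothetical, and only a single 64 bit register is modelled. The accessors go through the byte array, so they give the LE-defined result regardless of the endianness of the host running the model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of ONE 64 bit register (names are illustrative only).
 * b[0] is defined as the least significant byte: LE order.
 * The h/w/d members alias the same storage, but they only match the
 * LE definition when this is compiled on a little-endian host. */
typedef union {
    uint8_t  b[8]; /* elwidth 8:  eight elements */
    uint16_t h[4]; /* elwidth 16: four elements  */
    uint32_t w[2]; /* elwidth 32: two elements   */
    uint64_t d;    /* elwidth 64: one element    */
} sv_reg_t;

/* Store element idx of the given width in bytes (1, 2, 4 or 8).
 * Lower-indexed elements occupy lower-numbered bytes, and each
 * element is stored least significant byte first. */
static void sv_set_elem(sv_reg_t *r, unsigned width_bytes, unsigned idx,
                        uint64_t val)
{
    assert(width_bytes == 1 || width_bytes == 2 ||
           width_bytes == 4 || width_bytes == 8);
    assert(idx < 8 / width_bytes);
    for (unsigned i = 0; i < width_bytes; i++)
        r->b[idx * width_bytes + i] = (uint8_t)(val >> (8 * i));
}

/* Load element idx of the given width in bytes, host-independent. */
static uint64_t sv_get_elem(const sv_reg_t *r, unsigned width_bytes,
                            unsigned idx)
{
    assert(width_bytes == 1 || width_bytes == 2 ||
           width_bytes == 4 || width_bytes == 8);
    assert(idx < 8 / width_bytes);
    uint64_t val = 0;
    for (unsigned i = 0; i < width_bytes; i++)
        val |= (uint64_t)r->b[idx * width_bytes + i] << (8 * i);
    return val;
}
```

Under these assumptions, packing two words means element 0 lands in the LO word and element 1 in the HI word. Reading the same register back at a different element width simply reinterprets the same LE byte sequence, which is the property the union is meant to pin down.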