The expected results for every single element regardless of BE or LE
order will be `0x0a05` **not** `0x00a5` in BE Mode.
-This will be down to the Architects having chosen a **fixed**
-bit-ordering and **fixed** byte ordering in the underlying
-hardware without revealing that fact.
+This anticipated result will be down to the Architects having chosen
+a **fixed** bit-ordering and **fixed** byte ordering internally
+in the underlying hardware (without revealing that fact).
+
+Microwatt and Libre-SOC perform the following in hardware:
+
+ LD -> brev(XOR(MSR.BE, ldbrx/ld)) -> GPR
+
+such that internally there is no confusion: arithmetic operations
+behave identically regardless of how data came to
+be stored in memory, and completely irrespective of the setting
+of MSR.BE:
+
+ GPR -> add -> GPR
+
+not:
+
+ LD -> brev(ldbrx/ld) -> GPR
+ GPR -> brev(MSR.BE) -> add -> brev(MSR.BE) -> GPR
+ GPR -> brev(stbrx/st) -> ST
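
As a rough illustration of the first data path (`LD -> brev(XOR(MSR.BE, ldbrx/ld)) -> GPR`), the following is a minimal software sketch, not the Microwatt or Libre-SOC implementation. The names `brev64`, `load64`, `gpr_add`, `msr_be` and `is_ldbrx` are hypothetical and chosen only for this example. It assumes memory is presented as eight bytes, lowest address first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-reverse a 64-bit value (the "brev" of the text). */
static uint64_t brev64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Hypothetical model of LD -> brev(XOR(MSR.BE, ldbrx/ld)) -> GPR.
 * msr_be and is_ldbrx are assumed names, not taken from any real core. */
static uint64_t load64(const uint8_t *mem, bool msr_be, bool is_ldbrx)
{
    assert(mem != NULL);
    uint64_t v = 0;
    /* Assemble the doubleword with the lowest-addressed byte as the LSB. */
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)mem[i] << (8 * i);
    /* One conditional reversal, applied once at the memory boundary. */
    return (msr_be != is_ldbrx) ? brev64(v) : v;
}

/* Arithmetic is simply GPR -> add -> GPR, with no knowledge of MSR.BE. */
static uint64_t gpr_add(uint64_t ra, uint64_t rb)
{
    return ra + rb;
}
```

In this sketch the XOR collapses the two sources of byte-reversal into a single decision, so `gpr_add` never needs to see `msr_be` at all.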
+
+In a fixed-width (64-bit) architecture the above addition of
+byte-reversing in front of the read-ports and write-ports of the regfile
+is pointless: it makes no difference, and the gates would be completely
+wasted. However for element-width overrides (which amount
+to what happens to the typedef c struct union if compiled on BE
+hardware) it turns out that it does matter.
+
+The choice for the **register file** to be **as if** it is an
+LE-ordered byte-addressable typedef c struct union comes down to **not**
+having to do the above double-byte-reversing trick on all Register
+reads/writes in order to artificially preserve an order-flexibility
+that should not have been allowed on the contents of the register file
+in the first place.
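
The "as if LE-ordered, byte-addressable" view of a register can be sketched in portable C as below. This is only an assumed model for illustration: `gpr_t` and the `gpr_*` helpers are hypothetical names, and explicit shifts are used instead of a real `union` so that the result does not depend on the endianness of the machine compiling the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one 64-bit GPR held as eight bytes in fixed LE order,
 * so b[0] is always the least significant byte. */
typedef struct { uint8_t b[8]; } gpr_t;

static void gpr_write64(gpr_t *r, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        r->b[i] = (uint8_t)(v >> (8 * i));
}

static uint64_t gpr_read64(const gpr_t *r)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)r->b[i] << (8 * i);
    return v;
}

/* Element idx at 8-bit element width: element 0 is the lowest byte. */
static uint8_t gpr_elem8(const gpr_t *r, int idx)
{
    assert(idx >= 0 && idx < 8);
    return r->b[idx];
}

/* Element idx at 16-bit element width: element 0 is the lowest halfword. */
static uint16_t gpr_elem16(const gpr_t *r, int idx)
{
    assert(idx >= 0 && idx < 4);
    return (uint16_t)(r->b[2 * idx] | ((uint16_t)r->b[2 * idx + 1] << 8));
}
```

With this fixed layout, element 0 at any width always starts at the least significant byte of the 64-bit value, which is the property the paragraph above relies on.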
+
+The alternative architecture, which causes huge problems in an element-based
+context, is:
+
+ LD -> brev(ldbrx/ld) -> GPR
+ GPR -> brev(MSR.BE) -> add -> brev(MSR.BE) -> GPR
+
+The reason is that, as in the above 8/16 table, the ordering of bytes, which
+become synonymous with elements, becomes inverted.
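
The inversion can be demonstrated with a small sketch, assuming 8-bit elements numbered from the least significant byte. The helper names `elem8` and `elem8_alt` and the `msr_be` flag are hypothetical; `elem8_alt` models only the `brev(MSR.BE)` stage placed in front of the ALU in the alternative architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-reverse a 64-bit value (the "brev" of the text). */
static uint64_t brev64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Element idx (0..7) at 8-bit width, counted from the least significant byte. */
static uint8_t elem8(uint64_t reg, int idx)
{
    assert(idx >= 0 && idx < 8);
    return (uint8_t)(reg >> (8 * idx));
}

/* Hypothetical alternative: the operand passes through brev(MSR.BE)
 * before element selection, as in GPR -> brev(MSR.BE) -> add. */
static uint8_t elem8_alt(uint64_t reg, int idx, int msr_be)
{
    return elem8(msr_be ? brev64(reg) : reg, idx);
}
```

Under these assumptions, with `msr_be` set, `elem8_alt(reg, i, 1)` returns what `elem8(reg, 7 - i)` returns: element 0 becomes element 7 and so on, so the same element index selects different data depending on MSR.BE.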