subsystem but the number of available bits is already under
pressure so some care is needed. Even better would be the 24-bit
RM but that is also precious and under pressure.
+
+*In theory* there is one bit free:
+
+| 0-1 | 2 | 3 4 | description |
+| --- | --- |---------|-------------------------- |
+| 00 | 0 | dz sz | simple mode |
+| 00 | 1 | 0 RG | scalar reduce mode (mapreduce) |
+| 00 | 1 | 1 / | reserved |
+
+which could be utilised to instruct the hardware between the
+regfile and the ALUs to perform **element**-level
+byte-reversing:
+
+ GPR -> brev(RM.normal.br) -> add -> brev(RM.normal.br) -> GPR
+
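The pipeline above can be sketched in Python. This is only an illustrative model, not the hardware or the simulator's implementation: the helper names (`brev_elements`, `add_elements`, `add_with_brev`) are hypothetical, the `br` flag stands in for the proposed `RM.normal.br` bit, and element 0 is assumed to occupy the least-significant bits of the 64-bit register.

```python
def brev_elements(reg, elwidth, enable=True):
    """Byte-reverse each elwidth-bit element of a 64-bit register value."""
    if not enable:
        return reg
    nbytes = elwidth // 8
    mask = (1 << elwidth) - 1
    out = 0
    for i in range(64 // elwidth):
        elem = (reg >> (i * elwidth)) & mask
        # reinterpret the element's bytes in the opposite order
        swapped = int.from_bytes(elem.to_bytes(nbytes, "big"), "little")
        out |= swapped << (i * elwidth)
    return out


def add_elements(a, b, elwidth):
    """Element-wise add, each element wrapping modulo 2**elwidth."""
    mask = (1 << elwidth) - 1
    out = 0
    for i in range(64 // elwidth):
        shift = i * elwidth
        total = ((a >> shift) & mask) + ((b >> shift) & mask)
        out |= (total & mask) << shift
    return out


def add_with_brev(ra, rb, elwidth, br):
    """GPR -> brev(br) -> add -> brev(br) -> GPR (hypothetical model)."""
    a = brev_elements(ra, elwidth, br)
    b = brev_elements(rb, elwidth, br)
    return brev_elements(add_elements(a, b, elwidth), elwidth, br)


# With br set, 0xFF00 and 0x0100 are treated as 0x00FF and 0x0001:
# their sum 0x0100 is byte-reversed back to 0x0001.
result_br = add_with_brev(0xFF00, 0x0100, 16, True)
# Without br, 0xFF00 + 0x0100 wraps to 0x0000 in a 16-bit element.
result_plain = add_with_brev(0xFF00, 0x0100, 16, False)
```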
+However, even this does not solve the problem caused by loading
+the data as 8-byte quantities (`ld`/`ldbrx`) and then **accessing** it
+as element-width-overridden half/word/byte elements: this is the
+situation shown in the 8/16 table, above.
+
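As a rough illustration of why the 8-byte load matters, the sketch below views the same eight memory bytes as 16-bit elements after a big-endian-ordered load and after a byte-reversed load (which of `ld` and `ldbrx` produces which depends on the endian mode). The `elements` helper is hypothetical, and it assumes element 0 sits in the least-significant bits of the register.

```python
def elements(reg, elwidth):
    """Split a 64-bit register value into elements, element 0 first."""
    mask = (1 << elwidth) - 1
    return [(reg >> (i * elwidth)) & mask for i in range(64 // elwidth)]


mem = bytes(range(8))                    # memory bytes 00 01 02 ... 07

reg_be = int.from_bytes(mem, "big")      # 0x0001020304050607
reg_le = int.from_bytes(mem, "little")   # 0x0706050403020100

# Same memory, two different 16-bit element views:
halfwords_be = elements(reg_be, 16)      # [0x0607, 0x0405, 0x0203, 0x0001]
halfwords_le = elements(reg_le, 16)      # [0x0100, 0x0302, 0x0504, 0x0706]
```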
+In addition, it would **still be necessary** to instruct Hardware
+Architects to ensure that the Memory-Load to Regfile byte-order
+remains **strictly** defined (architecturally, regardless of the
+actual implementation).
+
+There is also the completely separate issue of how to describe this
+in MSB0 numbering, which becomes a nightmare all on its own: one that
+has to be solved in the ISACaller Simulator when elwidth overrides are
+completed (recall that the ISACaller Simulator uses a Python class where
+numbers are indeed strictly MSB0-defined, arithmetically).
+
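To give a feel for the MSB0 bookkeeping, here is a minimal sketch of converting between MSB0 and LSB0 bit positions and locating an element in MSB0 terms. It is not the ISACaller class: all three function names are hypothetical, and element 0 is again assumed to live in the least-significant bits of a 64-bit register.

```python
def msb0_to_lsb0(bit, width=64):
    """MSB0 bit 0 is the most significant bit, i.e. LSB0 bit width-1."""
    return width - 1 - bit


def get_field_msb0(value, start, end, width=64):
    """Extract the inclusive MSB0 bit range start..end from value."""
    nbits = end - start + 1
    shift = width - 1 - end      # LSB0 position of the field's last bit
    return (value >> shift) & ((1 << nbits) - 1)


def element_msb0_range(index, elwidth, width=64):
    """MSB0 (start, end) of element `index`, element 0 least significant."""
    start = width - (index + 1) * elwidth
    return start, start + elwidth - 1


reg = 0x0001020304050607
first = element_msb0_range(0, 16)        # (48, 63)
last = element_msb0_range(3, 16)         # (0, 15)
elem0 = get_field_msb0(reg, *first)      # 0x0607
elem3 = get_field_msb0(reg, *last)       # 0x0001
```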