[Flaws](https://bugs.libre-soc.org/show_bug.cgi?id=216#c24)
in the above were detected, and needed correction.
+Notes:
+
+* The flaw detected above is that for each pair of LD/ST operations
+ coming from the Function Unit (to cover mis-aligned requests),
+ the Addr[4] bit is **mutually-exclusive**, i.e. it is **guaranteed**
+ that Addr[4] for the first FU port's LD/ST request will **never**
+ equal that of the second.
+* Therefore, if the two requests are split into left/right separate L0
+ Cache/Buffers, the advantages and optimisations for XOR-comparison
+ of bits 12-48 of the address **may not take place**.
+* Solution: merge both L0-left and L0-right into one L0 Cache/Buffer,
+ with twin left/right banks in the same L0 Cache/Buffer.
+* This then means that the number of rows may be reduced to 8.
+* It also means that Addr[12:48] may be stored (and compared) only once.
+* It does however mean that the reservation on the row has to wait for
+ *both* ports (left and right) to clear out their LD/ST operation(s).
+* Addr[4] still selects whether the request goes into the left or right bank.
+
+Other than that, the design remains the same, as does the algorithm to
+merge the bytemasks. This remains as follows:
+
+* PriorityPicker selects one row.
+* For all rows greater than the selected row, if Addr[5:48] matches
+ that of the selected row, then the bytemask is "merged" into the
+ output-bytemask-selector.
+* The output-bytemask-selector is used as a "byte-enable" line on
+ a single 128-bit byte-level read-or-write (never both).
+
+Twin 128-bit requests (read-or-write) are then passed directly through
+to a pair of L1 Caches.
+
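The bank selection and bytemask merge described above can be modelled in software. The following is a minimal behavioural sketch in Python, not the hardware implementation. It rests on several assumptions that the text does not state: that Addr[0:3] is the byte offset within a 128-bit bank line, that Addr[4]=0 maps to the left bank, that the PriorityPicker chooses the lowest-numbered valid row, and that each row holds one bytemask per bank. The names `Row`, `split_request` and `merge_bytemasks` are hypothetical.

```python
from dataclasses import dataclass

LINE_BYTES = 16   # one 128-bit bank line
BANK_BIT = 4      # Addr[4] selects the left/right bank
TAG_SHIFT = 5     # Addr[5:48] is compared between rows


@dataclass
class Row:
    """One L0 row (hypothetical layout): a shared tag and two bytemasks."""
    tag: int = 0             # Addr[5:48], stored once for both banks
    masks: tuple = (0, 0)    # assumed order: (left, right), 16 bits each
    valid: bool = False


def split_request(addr, nbytes):
    """Split a (possibly misaligned) access into (tag, bank, bytemask) pieces."""
    pieces = []
    while nbytes > 0:
        offset = addr % LINE_BYTES
        n = min(nbytes, LINE_BYTES - offset)      # bytes that fit in this line
        mask = ((1 << n) - 1) << offset           # one bit per enabled byte
        pieces.append((addr >> TAG_SHIFT, (addr >> BANK_BIT) & 1, mask))
        addr += n
        nbytes -= n
    return pieces


def merge_bytemasks(rows):
    """Pick one row, then OR in the bytemasks of later rows with the same tag."""
    # Assumption: the PriorityPicker selects the lowest-numbered valid row.
    picked = next((i for i, r in enumerate(rows) if r.valid), None)
    if picked is None:
        return None
    tag = rows[picked].tag
    left, right = rows[picked].masks
    merged = [picked]
    for i in range(picked + 1, len(rows)):        # rows greater than the pick
        r = rows[i]
        if r.valid and r.tag == tag:
            left |= r.masks[0]
            right |= r.masks[1]
            merged.append(i)
    # (left, right) act as byte-enables for the twin 128-bit requests
    return tag, (left, right), merged


# An 8-byte access at 0x100C crosses a 16-byte boundary, so it splits into
# two pieces whose Addr[4] values differ.
pieces = split_request(0x100C, 8)

rows = [Row() for _ in range(8)]
rows[0] = Row(tag=0x80, masks=(0xF000, 0x000F), valid=True)
rows[1] = Row(tag=0x07, masks=(0x00FF, 0x0000), valid=True)
rows[2] = Row(tag=0x80, masks=(0x0003, 0x0000), valid=True)
result = merge_bytemasks(rows)
```

In this sketch row 1 is left untouched because its tag differs, while rows 0 and 2 are combined into one pair of byte-enable masks. Under the stated assumptions, the two pieces of a split request always land in different banks of the same merged buffer, which is the property the single twin-bank L0 Cache/Buffer relies on.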
[[!img twin_l0_cache_buffer.jpg size="600x"]]
# Multi-input/output Dependency Cell and Computation Unit