Notice how this still has an overall circuit latency that is essentially equivalent to grev's latency (or shift/rotate's latency). Also notice how this circuit allows specifying much more than just `grev` or `gorc` instructions. Layers of XOR gates can be added at the input and output, allowing it to function as a `gandc` instruction too, requiring a total of 6-bits of immediate (1 bit for inverting the input, 1 bit for inverting the output, 4 bits for the look-up-tables).
We will also want versions of `grev` that have the shift amount be an immediate (needed for bitwise reverse and byte reversals and other similar instructions.) The immediate-shift-amount version can be specified to always do a `grev` (or maybe only `grev`/`gorc`) operation to save encoding space, since I'd guess it's much more common than any of the other immediate-shift variants.
+
+# Twin LUT4s
+
+gate-saving of the AND/OR (AOI) can be applied to grevlut. TODO, version
+of diagram in SVG/DIA
+
+<img src="https://ftp.libre-soc.org/2022-05-17_11-05.png" width=800 />