The design is derived from a circuit for `grev` made with muxes:
-<img src="../grev_made_with_muxes.svg" width="50%" height="50%"/>
+<img src="../grev_made_with_muxes.svg" width="100%" height="100%"/>
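Below is a rough software model of the mux circuit. It is a sketch that assumes a 32-bit register width and uses illustrative helper names (`mux`, `grev`). In each of the log2(XLEN) stages, output bit `i` either keeps input bit `i` or takes its partner bit `i ^ (1 << k)`, depending on bit `k` of `SH`:

```python
XLEN = 32  # assumed register width (illustrative)


def mux(sel: int, a: int, b: int) -> int:
    """2:1 mux on single bits: returns a when sel is 1, else b."""
    return a if sel else b


def grev(x: int, sh: int) -> int:
    """Generalized reverse modeled as log2(XLEN) layers of 2:1 muxes.

    In stage k, output bit i selects input bit i ^ (1 << k) when bit k of sh
    is set, and keeps input bit i otherwise.
    """
    bits = [(x >> i) & 1 for i in range(XLEN)]
    for k in range(XLEN.bit_length() - 1):  # 5 stages for XLEN = 32
        s = (sh >> k) & 1
        bits = [mux(s, bits[i ^ (1 << k)], bits[i]) for i in range(XLEN)]
    return sum(b << i for i, b in enumerate(bits))
```

For example, `grev(x, 24)` reverses the bytes of a 32-bit word and `grev(x, 31)` reverses all of its bits.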
First, we convert that circuit to use And-Or-Invert (AOI) gates, since AOI gates are an efficient way to implement the muxes:
-<img src="../grev_made_with_aoi_gates.svg" width="50%" height="50%" />
+<img src="../grev_made_with_aoi_gates.svg" width="100%" height="100%" />
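The gate-level identity behind this conversion can be checked exhaustively. In this sketch (the names `aoi22` and `mux` are illustrative), feeding a select bit and its complement into a 2-2 AOI gate produces the mux output inverted. The text does not say how the design handles that inversion, so the sketch only verifies the identity:

```python
def aoi22(a: int, b: int, c: int, d: int) -> int:
    """2-2 And-Or-Invert gate on single bits: NOT((a AND b) OR (c AND d))."""
    return 1 ^ ((a & b) | (c & d))


def mux(sel: int, x: int, y: int) -> int:
    """Reference 2:1 mux on single bits: x when sel is 1, else y."""
    return x if sel else y


# Driving the two AND terms with sel and ~sel gives an inverted mux
# for every combination of inputs.
for sel in (0, 1):
    for x in (0, 1):
        for y in (0, 1):
            assert aoi22(sel, x, 1 ^ sel, y) == 1 ^ mux(sel, x, y)
```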
Notice that each And-Or-Invert gate takes both a bit of `SH` and its complement `~SH` as inputs. Those two inputs can instead be driven separately: each is computed from the corresponding bit of `SH` through one of a pair of 2-bit look-up tables supplied by the instruction's immediate. This requires 4 bits of immediate.
This gives us our final design:
-<img src="../grev_gorc_combination.svg" width="50%" height="50%" />
+<img src="../grev_gorc_combination.svg" width="100%" height="100%" />
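A behavioral sketch of the LUT-controlled stages follows. It assumes a 32-bit width, ignores the AOI output polarity, and uses a hypothetical immediate layout (bits 1:0 drive the input that used to be `~SH`, bits 3:2 the input that used to be `SH`). `grev_gorc`, `GREV`, and `GORC` are illustrative names, not part of the design:

```python
XLEN = 32  # assumed register width (illustrative)
STAGES = XLEN.bit_length() - 1  # log2(XLEN) = 5 stages


def lut2(table: int, s: int) -> int:
    """2-entry, 1-bit look-up table: returns entry s of a 2-bit table."""
    return (table >> s) & 1


def grev_gorc(x: int, sh: int, imm: int) -> int:
    """Hypothetical model of the combined circuit, ignoring AOI polarity.

    imm[1:0] is the LUT driving the 'keep' input (formerly ~SH),
    imm[3:2] is the LUT driving the 'swap' input (formerly SH).
    """
    keep_lut, swap_lut = imm & 0b11, (imm >> 2) & 0b11
    bits = [(x >> i) & 1 for i in range(XLEN)]
    for k in range(STAGES):
        s = (sh >> k) & 1
        keep, swap = lut2(keep_lut, s), lut2(swap_lut, s)
        bits = [(keep & bits[i]) | (swap & bits[i ^ (1 << k)]) for i in range(XLEN)]
    return sum(b << i for i, b in enumerate(bits))


GREV = 0b10_01  # keep = ~s, swap = s
GORC = 0b10_11  # keep = 1,  swap = s
```

With `GREV` the LUTs reproduce `SH` and `~SH`, giving `grev`. With `GORC` the "keep" input is always on, so each enabled stage ORs a bit with its partner, giving `gorc`; for example, `grev_gorc(x, 7, GORC)` sets each byte to `0xFF` if any of its bits is set.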
Notice that the overall circuit latency is still essentially the same as that of `grev` (or of a shift/rotate). Also notice that this circuit can specify far more operations than just `grev` and `gorc`. Adding a final layer of XOR gates at the input and at the output lets it function as a `gandc` instruction too, for a total of 6 bits of immediate.
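The XOR layers can be modeled by conditionally inverting the input and output words. The sketch below is self-contained and again assumes a 32-bit width and a hypothetical immediate layout (bits 3:0 hold the two 2-bit LUTs, bit 4 inverts the input, bit 5 inverts the output); `gen_combine`, `partner`, and the constants are illustrative names. By De Morgan's law, inverting both sides of the OR-combine stages turns them into AND-combine stages:

```python
XLEN = 32  # assumed register width (illustrative)
MASK = (1 << XLEN) - 1


def partner(x: int, k: int) -> int:
    """Word whose bit i is bit i ^ (1 << k) of x (swaps adjacent 2**k-bit blocks)."""
    m = sum(1 << i for i in range(XLEN) if not (i >> k) & 1)
    s = 1 << k
    return ((x & m) << s) | ((x >> s) & m)


def gen_combine(x: int, sh: int, imm: int) -> int:
    """Hypothetical model: LUT-controlled stages wrapped in input/output XOR layers."""
    keep_lut, swap_lut = imm & 0b11, (imm >> 2) & 0b11
    inv_in, inv_out = (imm >> 4) & 1, (imm >> 5) & 1
    x ^= MASK * inv_in  # input XOR layer
    for k in range(XLEN.bit_length() - 1):
        s = (sh >> k) & 1
        keep = MASK * ((keep_lut >> s) & 1)
        swap = MASK * ((swap_lut >> s) & 1)
        x = (x & keep) | (partner(x, k) & swap)
    return x ^ (MASK * inv_out)  # output XOR layer


GREV = 0b00_10_01   # no inversion, keep = ~s, swap = s
GORC = 0b00_10_11   # no inversion, keep = 1,  swap = s
GANDC = 0b11_10_11  # invert input and output around the gorc stages
```

For example, `gen_combine(x, 7, GANDC)` leaves a byte as `0xFF` only if all of its bits are set and clears it otherwise.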