From d3a2483e42b0f0615b06158959e8edaaf77039c2 Mon Sep 17 00:00:00 2001
From: Jacob Lifshay
Date: Mon, 16 May 2022 19:26:13 -0700
Subject: [PATCH] add grev/gorc design doc

---
 openpower/sv/bitmanip.mdwn                  |  2 ++
 openpower/sv/bitmanip/grev_gorc_design.mdwn | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 openpower/sv/bitmanip/grev_gorc_design.mdwn

diff --git a/openpower/sv/bitmanip.mdwn b/openpower/sv/bitmanip.mdwn
index fbedb496b..2f208d36d 100644
--- a/openpower/sv/bitmanip.mdwn
+++ b/openpower/sv/bitmanip.mdwn
@@ -421,6 +421,8 @@ uint_xlen_t bmextrev(RA, RB, sh)
 
 # grevlut
 
+([3x lower latency alternative](bitmanip/grev_gorc_design.mdwn))
+
 generalised reverse combined with a pair of LUT2s and allowing
 a constant `0b0101...0101` when RA=0, and an option to invert
 (including when RA=0, giving a constant 0b1010...1010 as the
diff --git a/openpower/sv/bitmanip/grev_gorc_design.mdwn b/openpower/sv/bitmanip/grev_gorc_design.mdwn
new file mode 100644
index 000000000..9d2a2f60a
--- /dev/null
+++ b/openpower/sv/bitmanip/grev_gorc_design.mdwn
@@ -0,0 +1,19 @@
+# GRev/GOrC combination instruction design
+
+The design is derived from a circuit for GRev made with muxes:
+
+![grev_made_with_muxes.svg](grev_made_with_muxes.svg)
+
+First, we convert that circuit to use And-Or-Invert gates, since that's an efficient way to implement the muxes:
+
+![grev_made_with_aoi_gates.svg](grev_made_with_aoi_gates.svg)
+
+Notice how each And-Or-Invert gate has both a bit of `SH` and `~SH` as inputs? Those can be converted to separate inputs, controlled by the bits of `SH` using the instruction's immediate as a pair of 2-bit look-up-tables. This requires 4 bits of immediate.
+
+This gives us our final design:
+
+![grev_gorc_combination.svg](grev_gorc_combination.svg)
+
+Notice how this still has an overall circuit latency that is essentially equivalent to grev's latency (or shift/rotate's latency). Also notice how this circuit allows specifying much more than just `grev` or `gorc` instructions. A final layer of XOR gates can be added at the input and output, allowing it to function as a `gandc` instruction too, requiring a total of 6 bits of immediate.
+
+We will also want versions of `grev` that have the shift amount be an immediate (needed for bitwise reverse, byte reversals, and other similar instructions). The immediate-shift-amount version can be specified to always do a `grev` (or maybe only `grev`/`gorc`) operation to save encoding space, since I'd guess it's much more common than any of the other immediate-shift variants.
-- 
2.30.2
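
The conversion from mux select lines to a pair of 2-bit look-up-tables, as described in the design doc added by this patch, can be sketched as a functional model in Python. This is only an illustration under stated assumptions, not the circuit in the SVG figures: it models each stage as a plain AND-OR (the inversions of real And-Or-Invert gates are ignored), it leaves out the optional XOR layers, and the immediate bit layout (`imm[1:0]` as the "swap" LUT, `imm[3:2]` as the "keep" LUT) and all function names are hypothetical.

```python
XLEN = 64
LOG2_XLEN = 6
MASK = (1 << XLEN) - 1


def swap_blocks(x, step):
    """Swap every adjacent pair of `step`-bit blocks (bit j <-> bit j ^ step)."""
    low = 0
    for j in range(XLEN):
        if not j & step:
            low |= 1 << j  # select the lower block of each pair
    return (((x & low) << step) | ((x >> step) & low)) & MASK


def grev(x, sh):
    """Reference generalised reverse: stage i swaps blocks if SH bit i is set."""
    for i in range(LOG2_XLEN):
        if (sh >> i) & 1:
            x = swap_blocks(x, 1 << i)
    return x


def gorc(x, sh):
    """Reference generalised OR-combine: stage i ORs in the swapped value."""
    for i in range(LOG2_XLEN):
        if (sh >> i) & 1:
            x |= swap_blocks(x, 1 << i)
    return x


# Hypothetical immediate encodings (layout is an assumption, see lead-in).
IMM_GREV = 0b0110  # keep LUT = ~SH bit (0b01), swap LUT = SH bit (0b10)
IMM_GORC = 0b1110  # keep LUT = always 1 (0b11), swap LUT = SH bit (0b10)


def grev_gorc_combined(x, sh, imm):
    """Each stage ANDs the swapped and unswapped data with LUT outputs, then ORs."""
    swap_lut = imm & 0b11
    keep_lut = (imm >> 2) & 0b11
    for i in range(LOG2_XLEN):
        sh_bit = (sh >> i) & 1
        swap_en = (swap_lut >> sh_bit) & 1  # replaces the mux's SH input
        keep_en = (keep_lut >> sh_bit) & 1  # replaces the mux's ~SH input
        swapped = swap_blocks(x, 1 << i)
        x = (swapped if swap_en else 0) | (x if keep_en else 0)
    return x
```

With these assumed encodings, `grev_gorc_combined(x, sh, IMM_GREV)` matches `grev(x, sh)` and `grev_gorc_combined(x, sh, IMM_GORC)` matches `gorc(x, sh)`. The other 14 immediate values select further operations, which is one way to read the remark that the circuit can specify much more than just `grev` or `gorc`.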