From 43402fdf11b969453cd6c2a5de03648f2fea0e06 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 15 May 2022 20:41:36 +0100
Subject: [PATCH]

---
 openpower/sv/bitmanip.mdwn | 54 +++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git a/openpower/sv/bitmanip.mdwn b/openpower/sv/bitmanip.mdwn
index faeebfe7c..8588e6e91 100644
--- a/openpower/sv/bitmanip.mdwn
+++ b/openpower/sv/bitmanip.mdwn
@@ -97,7 +97,7 @@ TODO: convert all instructions to use RT and not RS
 | NN | RT | RA | RB | im0-4 | im5-7 00 |1 | grevlog |
 | NN | | | | | ----- 01 |m3| crternlog |
 | NN | RT | RA | RB | RC | mode 010 |Rc| bitmask\* |
-| NN | | | | | 00 011 | | rsvd |
+| NN | RT | RA | RB | RC | 00 011 |nh| binlut |
 | NN | | | | | 01 011 |0 | svshape |
 | NN | | | | | 01 011 |1 | svremap |
 | NN | | | | | 10 011 |Rc| svstep |
@@ -153,7 +153,7 @@ double check that instructions didn't need 3 inputs.
 | NN | RT | RA | RB | 1 | 11 | 1110 110 |Rc| clmulh | X-Form |
 | NN | | | | | | --11 110 |Rc| rsvd | |
 
-# ternlog bitops
+# binary and ternary bitops
 
 Similar to FPGA LUTs: for every bit perform a lookup into a table using an 8bit immediate, or in another register.
 
@@ -172,34 +172,28 @@ Like the x86 AVX512F [vpternlogd/vpternlogq](https://www.felixcloutier.com/x86/v
     for i in range(64):
         RT[i] = lut3(imm, RB[i], RA[i], RT[i])
 
-## ternlogv
-
-also, another possible variant involving swizzle-like selection
-and masking, this only requires 3 64 bit registers (RA, RS, RB) and
-only 16 LUT3s.
-
-Note however that unless XLEN matches sz, this instruction
-is a Read-Modify-Write: RS must be read as a second operand
-and all unmodified bits preserved. SVP64 may provide limited
-alternative destination for RS from RS-as-source, but again
-all unmodified bits must still be copied.
-
- | 0.5|6.10|11.15|16.20|21.28 | 29.30 |31|
-| -- | -- | --- | --- | ---- | ----- |--|
-| NN | RS | RA | RB |idx0-3| 01 |sz|
-
-    SZ = (1+sz) * 8 # 8 or 16
-    raoff = MIN(XLEN, idx0 * SZ)
-    rboff = MIN(XLEN, idx1 * SZ)
-    rcoff = MIN(XLEN, idx2 * SZ)
-    rsoff = MIN(XLEN, idx3 * SZ)
-    imm = RB[0:8]
-    for i in range(MIN(XLEN, SZ)):
-        ra = RA[raoff:+i]
-        rb = RA[rboff+i]
-        rc = RA[rcoff+i]
-        res = lut3(imm, ra, rb, rc)
-        RS[rsoff+i] = res
+## binlut
+
+Binary lookup is a dynamic LUT2 version of ternlogi. Firstly, the
+lookup table is 4 bits wide not 8 bits, and secondly the lookup
+table comes from a register not an immediate.
+
+| 0.5|6.10|11.15|16.20| 21..25|26..30|31|
+| -- | -- | --- | --- | ----- | ---- |--|
+| NN | RT | RA | RB | RC |00011 |nh|
+
+    lut2(imm, a, b):
+        idx = b << 1 | a
+        return imm[idx] # idx by LSB0 order
+
+    imm = (RC>>(nh*4))&0b1111
+    for i in range(64):
+        RT[i] = lut2(imm, RB[i], RA[i])
+
+*Programmer's note: a dynamic ternary lookup may be synthesised from
+a pair of `binlut` instructions followed by a `ternlogi` to select which
+to merge. Use `nh` to select which nibble to use as the lookup table
+from the RC source register (`nh=1` nibble high)*
 
 ## ternlogcr
 
-- 
2.30.2