From 437e153b5d9707254e284695b519bb45a2913bf5 Mon Sep 17 00:00:00 2001
From: Jacob Lifshay
Date: Wed, 17 Nov 2021 11:13:08 -0800
Subject: [PATCH] rename ternary* -> ternlog* and add link to x86 instructions

---
 openpower/sv/bitmanip.mdwn | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/openpower/sv/bitmanip.mdwn b/openpower/sv/bitmanip.mdwn
index 1ca77613f..16618964f 100644
--- a/openpower/sv/bitmanip.mdwn
+++ b/openpower/sv/bitmanip.mdwn
@@ -9,7 +9,7 @@ Vectorisation Context is provided by [[openpower/sv]].
 
 When combined with SV, scalar variants of bitmanip operations found in VSX are added so that VSX may be retired as "legacy" in the far future (10 to 20 years). Also, VSX is hundreds of opcodes, requires 128 bit pathways, and is wholly unsuited to low power or embedded scenarios.
 
-ternaryv is experimental and is the only operation that may be considered a "Packed SIMD". It is added as a variant of the already well-justified ternary operation (done in AVX512 as an immediate only) "because it looks fun". As it is based on the LUT4 concept it will allow accelerated emulation of FPGAs. Other vendors of ISAs are buying FPGA companies to achieve similar objectives.
+ternlogv is experimental and is the only operation that may be considered a "Packed SIMD". It is added as a variant of the already well-justified ternlog operation (done in AVX512 as an immediate only) "because it looks fun". As it is based on the LUT4 concept it will allow accelerated emulation of FPGAs. Other vendors of ISAs are buying FPGA companies to achieve similar objectives.
 
 general-purpose Galois Field operations are added so as to avoid huge custom opcode proliferation across many areas of Computer Science. however for convenience and also to avoid setup costs, some of the more common operations (clmul, crc32) are also added. The expectation is that these operations would all be covered by the same pipeline.
 
@@ -27,12 +27,12 @@ minor opcode allocation
 
  | 28.30 |31| name |
  | ------ |--| --------- |
- | 00 |Rc| ternaryi |
- | 001 |Rc| ternary |
+ | 00 |Rc| ternlogi |
+ | 001 |Rc| ternlog |
  | 010 |Rc| bitmask |
  | 011 |Rc| gf* |
- | 101 |1 | ternaryv |
- | 101 |0 | ternarycr |
+ | 101 |1 | ternlogv |
+ | 101 |0 | ternlogcr |
  | 110 |Rc| 1/2-op |
  | 111 |Rc| 3-op |
 
@@ -62,13 +62,13 @@ minor opcode allocation
 3 ops
 
 * bitmask set/extract
-* ternary bitops
+* ternlog bitops
 * GF
 
 | 0.5|6.10|11.15|16.20|21..25 | 26....30 |31| name |
 | -- | -- | --- | --- | ----- | -------- |--| ------ |
-| NN | RT | RA | RB | RC | mode 001 |Rc| ternary |
-| NN | RT | RA | RB | im0-4 | im5-7 00 |Rc| ternaryi |
+| NN | RT | RA | RB | RC | mode 001 |Rc| ternlog |
+| NN | RT | RA | RB | im0-4 | im5-7 00 |Rc| ternlogi |
 | NN | RS | RA | RB | RC | 00 011 |Rc| gfmul |
 | NN | RS | RA | RB | RC | 01 011 |Rc| gfadd |
 | NN | RT | RA | RB | deg | 10 011 |Rc| gfinv |
@@ -77,11 +77,11 @@ minor opcode allocation
 
 | 0.5|6.10|11.15| 16.23 |24.27 | 28.30 |31| name |
 | -- | -- | --- | ----- | ---- | ----- |--| ------ |
-| NN | RT | RA | imm | mask | 101 |1 | ternaryv |
+| NN | RT | RA | imm | mask | 101 |1 | ternlogv |
 
 | 0.5|6.8 | 9.11|12.14|15|16.23|24.27 | 28.30|31| name |
 | -- | -- | --- | --- |- |-----|----- | -----|--| -------|
-| NN | BA | BB | BC |0 |imm | mask | 101 |0 | ternarycr |
+| NN | BA | BB | BC |0 |imm | mask | 101 |0 | ternlogcr |
 
 ops (note that av avg and abs as well as vec scalar mask are included here)
 
@@ -228,9 +228,11 @@ uint_xlen_t maxu(uint_xlen_t rs1, uint_xlen_t rs2)
 ```
 
 
-# ternary bitops
+# ternlog bitops
 
-Similar to FPGA LUTs: for every bit perform a lookup into a table using an 8bit immediate, or in another register
+Similar to FPGA LUTs: for every bit perform a lookup into a table using an 8bit immediate, or in another register.
+
+Like the x86 AVX512F [vpternlogd/vpternlogq](https://www.felixcloutier.com/x86/vpternlogd:vpternlogq) instructions.
 
 | 0.5|6.10|11.15|16.20| 21..25| 26..30 |31|
 | -- | -- | --- | --- | ----- | -------- |--|
@@ -510,7 +512,7 @@ uint64_t gorc64(uint64_t RA, uint64_t RB)
 
 # cmix
 
-based on RV bitmanip, covered by ternary bitops
+based on RV bitmanip, covered by ternlog bitops
 
 ```
 uint_xlen_t cmix(uint_xlen_t RA, uint_xlen_t RB, uint_xlen_t RC) {
-- 
2.30.2
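
Note for readers of this patch: the per-bit table lookup that the renamed `ternlog bitops` section describes can be sketched in C, the language the page already uses for its pseudocode. This is only a minimal sketch and not text from the patch. The names `ternlog64` and `ternlog64_examples`, the fixed 64-bit width, and the index order (first operand as the high index bit, as in x86 `vpternlogd`/`vpternlogq`) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch.
 * For every bit position i, build a 3-bit index from the three operands
 * and use it to pick one bit out of the 8-bit lookup table imm8.
 * Assumed index order: a supplies bit 2, b bit 1, c bit 0. */
uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1) << 2) |
                                  (((b >> i) & 1) << 1) |
                                  ((c >> i) & 1));
        result |= (uint64_t)((imm8 >> idx) & 1) << i;
    }
    return result;
}

/* Usage sketch: two well-known truth tables under the assumed index order. */
void ternlog64_examples(void)
{
    uint64_t a = 0xF0F0F0F0F0F0F0F0ULL;
    uint64_t b = 0xCCCCCCCCCCCCCCCCULL;
    uint64_t c = 0xAAAAAAAAAAAAAAAAULL;

    assert(ternlog64(a, b, c, 0x96) == (a ^ b ^ c));          /* 3-way XOR */
    assert(ternlog64(a, b, c, 0xCA) == ((a & b) | (~a & c))); /* bitwise select */
}
```

Under this assumed ordering, `imm8 = 0xCA` yields `(a & b) | (~a & c)`, a bitwise select of the kind the `cmix` section says is covered by ternlog bitops.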