* ternlogi <https://bugs.libre-soc.org/show_bug.cgi?id=745>
* grev <https://bugs.libre-soc.org/show_bug.cgi?id=755>
* GF2^M <https://bugs.libre-soc.org/show_bug.cgi?id=782>
-
+* binutils <https://bugs.libre-soc.org/show_bug.cgi?id=836>
+* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
# bitmanipulation
* <https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_for_coders>
* <https://maths-people.anu.edu.au/~brent/pd/rpb232tr.pdf>
+* <https://gist.github.com/animetosho/d3ca95da2131b5813e16b5bb1b137ca0>
+* <https://github.com/HJLebbink/asm-dude/wiki/GF2P8AFFINEINVQB>
-# summary
-
-two major opcodes are needed
-
-ternlog has its own major opcode
-
-| 29.30 |31| name | Form |
-| ------ |--| --------- | ---- |
-| 0 0 |Rc| ternlogi | TLI-Form |
-| 0 1 | | crternlogi | TLI-Form |
-| 1 iv | | grevlogi | TLI-Form |
-
-2nd major opcode for other bitmanip: minor opcode allocation
-
-| 28.30 |31| name |
-| ------ |--| --------- |
-| -00 |0 | xpermi |
-| -00 |1 | binary lut |
-| -01 |0 | grevlog |
-| -01 |1 | swizzle mv/fmv |
-| 010 |Rc| bitmask |
-| 011 | | SVP64 |
-| 110 |Rc| 1/2-op |
-| 111 | | bmrevi |
-
-
-1-op and variants
-
-| dest | src1 | subop | op |
-| ---- | ---- | ----- | -------- |
-| RT | RA | .. | bmatflip |
-
-2-op and variants
-
-| dest | src1 | src2 | subop | op |
-| ---- | ---- | ---- | ----- | -------- |
-| RT | RA | RB | or | bmatflip |
-| RT | RA | RB | xor | bmatflip |
-| RT | RA | RB | | grev |
-| RT | RA | RB | | clmul\* |
-| RT | RA | RB | | gorc |
-| RT | RA | RB | shuf | shuffle |
-| RT | RA | RB | unshuf| shuffle |
-| RT | RA | RB | width | xperm |
-| RT | RA | RB | type | av minmax |
-| RT | RA | RB | | av abs avgadd |
-| RT | RA | RB | type | vmask ops |
-| RT | RA | RB | type | abs accumulate (overwrite) |
-
-3 ops
-
-* grevlog[w]
-* GF mul-add
-* bitmask-reverse
-
-TODO: convert all instructions to use RT and not RS
-
-| 0.5|6.10|11.15|16.20 |21..25 | 26....30 |31| name | Form |
-| -- | -- | --- | --- | ----- | -------- |--| ------ | -------- |
-| NN | RT | RA |it/im57|im0-4 | 0 00 00 |0 | xpermi | TODO-Form |
-| NN | | | | | - -- 00 |0 | rsvd | rsvd |
-| NN | RT | RA | RB | RC | nh 00 00 |1 | binlut | VA-Form |
-| NN | RT | RA | RB | /BFA/ | 0 01 00 |1 | bincrflut | VA-Form |
-| NN | | | | | 1 01 00 |1 | svindex | SVI-Form |
-| NN | RT | RA | RB | mode | L 10 00 |1 | bmask | BM2-Form |
-| NN | | | | | 0 11 00 |1 | svshape | SVM-Form |
-| NN | | | | | 1 11 00 |1 | svremap | SVRM-Form |
-| NN | RT | RA | RB | im0-4 | im5-7 01 |0 | grevlog | TLI-Form |
-| NN | RT | RA | RB | im0-4 | im5-7 01 |1 | grevlogw | TLI-Form |
-| NN | RT | RA | RB | RC | mode 010 |Rc| bitmask\* | VA2-Form |
-| NN |FRS | d1 | d0 | d0 | 00 011 |d2| fmvis | DX-Form |
-| NN |FRS | d1 | d0 | d0 | 01 011 |d2| fishmv | DX-Form |
-| NN | | | | | 10 011 |Rc| svstep | SVL-Form |
-| NN | | | | | 11 011 |Rc| setvl | SVL-Form |
-| NN | | | | | ---- 110 | | 1/2 ops | other table [1] |
-| NN | RT | RA | RB | RC | 11 110 |Rc| bmrev | VA2-Form |
-| NN | RT | RA | RB | sh0-4 | sh5 1 111 |Rc| bmrevi | MDS-Form |
-
-[1] except bmrev
-
-ops (note that av avg and abs as well as vec scalar mask
-are included here [[sv/vector_ops]], and
-the [[sv/av_opcodes]])
-
-| 0.5|6.10|11.15|16.20| 21 | 22.23 | 24....30 |31| name | Form |
-| -- | -- | --- | --- | -- | ----- | -------- |--| ---- | ------- |
-| NN | RS | me | sh | SH | ME 0 | nn00 110 |Rc| bmopsi | BM-Form |
-| NN | RS | RA | sh | SH | 0 1 | nn00 110 |Rc| bmopsi | XB-Form |
-| NN | RS | RA |im04 | im5| 1 1 | im67 00 110 |Rc| bmatxori | TODO |
-| NN | RT | RA | RB | 1 | 00 | 0001 110 |Rc| cldiv | X-Form |
-| NN | RT | RA | RB | 1 | 01 | 0001 110 |Rc| clmod | X-Form |
-| NN | RT | RA | | 1 | 10 | 0001 110 |Rc| clmulh | X-Form |
-| NN | RT | RA | RB | 1 | 11 | 0001 110 |Rc| clmul | X-Form |
-| NN | RT | RA | RB | 0 | 00 | 0001 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 0 | 01 | 0001 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 0 | 10 | 0001 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 0 | 11 | 0001 110 |Rc| vec cprop | X-Form |
-| NN | | | | | 00 | 0101 110 |0 | crfbinlog | {TODO} |
-| NN | | | | | 00 | 0101 110 |1 | rsvd | |
-| NN | | | | | 10 | 0101 110 |Rc| rsvd | |
-| NN | | | | | -1 | 0101 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 0 | itype | 1001 110 |Rc| av minmax | X-Form |
-| NN | RT | RA | RB | 1 | 00 | 1001 110 |Rc| av abss | X-Form |
-| NN | RT | RA | RB | 1 | 01 | 1001 110 |Rc| av absu | X-Form |
-| NN | RT | RA | RB | 1 | 10 | 1001 110 |Rc| av avgadd | X-Form |
-| NN | RT | RA | RB | 1 | 11 | 1001 110 |Rc| grevlutr | X-Form |
-| NN | RT | RA | RB | 0 | itype | 1101 110 |Rc| shadd | X-Form |
-| NN | RT | RA | RB | 1 | itype | 1101 110 |Rc| shadduw | X-Form |
-| NN | RT | RA | RB | 0 | 00 | 0010 110 |Rc| rsvd | |
-| NN | RS | RA | sh | SH | 00 | 1010 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 0 | 00 | 0110 110 |Rc| rsvd | |
-| NN | RS | RA | SH | 0 | 00 | 1110 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 1 | 00 | 1110 110 |Rc| absds | X-Form |
-| NN | RT | RA | RB | 0 | 01 | 0010 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 1 | 01 | 0010 110 |Rc| clmulr | X-Form |
-| NN | RS | RA | sh | SH | 01 | 1010 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 0 | 01 | 0110 110 |Rc| rsvd | |
-| NN | RS | RA | SH | 0 | 01 | 1110 110 |Rc| rsvd | |
-| NN | RT | RA | RB | 1 | 01 | 1110 110 |Rc| absdu | X-Form |
-| NN | RS | RA | RB | 0 | 10 | 0010 110 |Rc| bmator | X-Form |
-| NN | RS | RA | RB | 0 | 10 | 0110 110 |Rc| bmatand | X-Form |
-| NN | RS | RA | RB | 0 | 10 | 1010 110 |Rc| bmatxor | X-Form |
-| NN | RS | RA | RB | 0 | 10 | 1110 110 |Rc| bmatflip | X-Form |
-| NN | RT | RA | RB | 1 | 10 | 0010 110 |Rc| xpermn | X-Form |
-| NN | RT | RA | RB | 1 | 10 | 0110 110 |Rc| xpermb | X-Form |
-| NN | RT | RA | RB | 1 | 10 | 1010 110 |Rc| xpermh | X-Form |
-| NN | RT | RA | RB | 1 | 10 | 1110 110 |Rc| xpermw | X-Form |
-| NN | RT | RA | RB | 0 | 11 | 1110 110 |Rc| absdacs | X-Form |
-| NN | RT | RA | RB | 1 | 11 | 1110 110 |Rc| absdacu | X-Form |
-| NN | | | | | | --11 110 |Rc| bmrev | VA2-Form |
+[[!inline pages="openpower/sv/draft_opcode_tables" quick="yes" raw="yes" ]]
# binary and ternary bitops
SVP64 designation from RS-as-dest. This gives a limited range of
non-overwrite capability.
-# shift-and-add
+# shift-and-add <a name="shift-add"> </a>
Power ISA is missing LD/ST with shift, which is present in both ARM and x86.
Adding further LD/ST variants would be too complex, so the compromise is
a shift-and-add instruction, which replaces a pair of explicit instructions
in hot-loops.
```
-uint_xlen_t shadd(uint_xlen_t rs1, uint_xlen_t rs2, uint8_t sh) {
- return (rs1 << (sh+1)) + rs2;
+# 1.6.27 Z23-FORM
+ |0 |6 |11 |16 |21 |23 |31 |
+ | PO | RT | RA | RB |sm | XO |Rc |
+```
+
+Pseudo-code (shadd):
+
+ shift <- sm + 1 # sm is 0-3, so shift is between 1-4
+ sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
+ RT <- sum # Result stored in RT
+
+Pseudo-code (shadduw):
+
+ shift <- sm + 1 # sm is 0-3, so shift is between 1-4
+ n <- (RB)[XLEN/2:XLEN-1] # Limit RB to lower word (32 bits, MSB0 numbering)
+ sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
+ RT <- sum # Result stored in RT
+
+```
+uint_xlen_t shadd(uint_xlen_t RA, uint_xlen_t RB, uint8_t sm) {
+ sm = sm & 0x3;
+ return (RB << (sm+1)) + RA;
}
-uint_xlen_t shadduw(uint_xlen_t rs1, uint_xlen_t rs2, uint8_t sh) {
- uint_xlen_t rs1z = rs1 & 0xFFFFFFFF;
- return (rs1z << (sh+1)) + rs2;
+uint_xlen_t shadduw(uint_xlen_t RA, uint_xlen_t RB, uint8_t sm) {
+ uint_xlen_t n = RB & 0xFFFFFFFF;
+ sm = sm & 0x3;
+ return (n << (sm+1)) + RA;
}
```
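
As an illustration of the hot-loop use case, the following is a minimal sketch of array-element address generation with the two operations. It assumes XLEN=64 (so `uint_xlen_t` is `uint64_t`), and `element_address` is a hypothetical helper that is not part of the proposal. Element sizes are `2 << sm` bytes, because the effective shift is `sm+1`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t uint_xlen_t; /* assumption: XLEN = 64 */

/* shadd as described above: RT = (RB << (sm+1)) + RA, shift amount 1..4 */
static uint_xlen_t shadd(uint_xlen_t RA, uint_xlen_t RB, uint8_t sm) {
    sm = sm & 0x3;
    return (RB << (sm + 1)) + RA;
}

/* shadduw: as shadd, but only the low 32 bits of RB are used (zero-extended) */
static uint_xlen_t shadduw(uint_xlen_t RA, uint_xlen_t RB, uint8_t sm) {
    uint_xlen_t n = RB & 0xFFFFFFFF;
    sm = sm & 0x3;
    return (n << (sm + 1)) + RA;
}

/* Hypothetical helper: address of element `index` in an array of
 * (2 << sm)-byte elements starting at `base`. */
static uint_xlen_t element_address(uint_xlen_t base, uint_xlen_t index,
                                   uint8_t sm) {
    assert(sm <= 3); /* only four shift amounts are encodable */
    return shadd(base, index, sm);
}
```

With `sm=2` the shift is 3, so `shadd(base, i, 2)` yields the address of the `i`-th 8-byte element. `shadduw` gives the same result when the index is held in a register whose upper 32 bits contain unrelated data.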
# grevlut <a name="grevlut"> </a>
-([3x lower latency alternative](grev_gorc_design/) which is
-not equivalent and has limited constant-generation capability)
-
generalised reverse combined with a pair of LUT2s and allowing
a constant `0b0101...0101` when RA=0, and an option to invert
(including when RA=0, giving a constant 0b1010...1010 as the
| -- | -- | --- | --- | ----- | -----|--| ------ | ----- |
| NN | RT | RA | s0-4 | im0-7 | 1 iv |s5| grevlogi | |
| NN | RT | RA | RB | im0-7 | 01 |0 | grevlog | |
-| NN | RT | RA | RB | im0-7 | 01 |1 | grevlogw | |
+
+An equivalent to `grevlogw` may be synthesised by setting the
+appropriate bits in RB to set the top half of RT to zero.
+Thus an explicit grevlogw instruction is not necessary.
# xperm