* ternlogi <https://bugs.libre-soc.org/show_bug.cgi?id=745>
* grev <https://bugs.libre-soc.org/show_bug.cgi?id=755>
* GF2^M <https://bugs.libre-soc.org/show_bug.cgi?id=782>
-
+* binutils <https://bugs.libre-soc.org/show_bug.cgi?id=836>
+* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
# bitmanipulation
* <https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_for_coders>
* <https://maths-people.anu.edu.au/~brent/pd/rpb232tr.pdf>
+* <https://gist.github.com/animetosho/d3ca95da2131b5813e16b5bb1b137ca0>
+* <https://github.com/HJLebbink/asm-dude/wiki/GF2P8AFFINEINVQB>
-[[!inline quick="yes" raw="yes" pages="/openpower/sv/draft_opcode_tables.mdwn"]]
-
+[[!inline pages="openpower/sv/draft_opcode_tables" quick="yes" raw="yes" ]]
# binary and ternary bitops
SVP64 designation from RS-as-dest. This gives a limited range of
non-overwrite capability.
-# shift-and-add
+# shift-and-add <a name="shift-add"> </a>
Power ISA is missing LD/ST with shift, which is present in both ARM and x86.
Adding more LD/ST instructions would be too complex, so a compromise is
to add shift-and-add, which replaces a pair of explicit instructions in hot loops.
```
-uint_xlen_t shadd(uint_xlen_t rs1, uint_xlen_t rs2, uint8_t sh) {
- return (rs1 << (sh+1)) + rs2;
+# 1.6.27 Z23-FORM
+ |0     |6     |11    |15 |16   |21 |23  |31 |
+ | PO   |  RT  |  RA      |  RB |sm | XO |Rc |
+```
+
+Pseudo-code (shadd):
+
+ shift <- shift + 1 # Shift is between 1-4
+ sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
+ RT <- sum # Result stored in RT
+
+Pseudo-code (shadduw):
+
+ shift <- shift + 1 # Shift is between 1-4
+ n <- (RB)[XLEN/2:XLEN-1] # Limit RB to lower word (32 bits, MSB0 numbering)
+ sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
+ RT <- sum # Result stored in RT
+
+```
+uint_xlen_t shadd(uint_xlen_t RA, uint_xlen_t RB, uint8_t sm) {
+ sm = sm & 0x3;
+ return (RB << (sm+1)) + RA;
}
-uint_xlen_t shadduw(uint_xlen_t rs1, uint_xlen_t rs2, uint8_t sh) {
- uint_xlen_t rs1z = rs1 & 0xFFFFFFFF;
- return (rs1z << (sh+1)) + rs2;
+uint_xlen_t shadduw(uint_xlen_t RA, uint_xlen_t RB, uint8_t sm) {
+ uint_xlen_t n = RB & 0xFFFFFFFF;
+ sm = sm & 0x3;
+ return (n << (sm+1)) + RA;
}
```
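
As an illustration of the LD/ST-with-shift use case mentioned above, the following is a minimal sketch of scaled array indexing built on the two reference functions. It assumes XLEN=64 (hence the `uint64_t` typedef), and the helper names `elem_addr_u64` and `elem_addr_u32_uw` are hypothetical, introduced only for this example.

```c
#include <stdint.h>

/* Assumption for this sketch: XLEN = 64. */
typedef uint64_t uint_xlen_t;

/* Reference behaviour, copied from the definitions above. */
uint_xlen_t shadd(uint_xlen_t RA, uint_xlen_t RB, uint8_t sm) {
    sm = sm & 0x3;                  /* only two bits of sm are encoded */
    return (RB << (sm + 1)) + RA;   /* shift amount is 1..4 */
}

uint_xlen_t shadduw(uint_xlen_t RA, uint_xlen_t RB, uint8_t sm) {
    uint_xlen_t n = RB & 0xFFFFFFFF; /* keep the low 32 bits of RB */
    sm = sm & 0x3;
    return (n << (sm + 1)) + RA;
}

/* Hypothetical helper: address of element idx in an array of
 * 8-byte elements. sm=2 gives a shift of 3, i.e. idx * 8. */
uint_xlen_t elem_addr_u64(uint_xlen_t base, uint_xlen_t idx) {
    return shadd(base, idx, 2);
}

/* Hypothetical helper: address of element idx in an array of
 * 4-byte elements, where idx is an unsigned 32-bit value held in a
 * 64-bit register whose upper half may contain garbage.
 * sm=1 gives a shift of 2, i.e. (idx & 0xFFFFFFFF) * 4. */
uint_xlen_t elem_addr_u32_uw(uint_xlen_t base, uint_xlen_t idx) {
    return shadduw(base, idx, 1);
}
```

Each helper corresponds to a single shift-and-add instruction followed by an ordinary load or store, in place of a separate shift and add before the memory access.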
# grevlut <a name="grevlut"> </a>
-([3x lower latency alternative](grev_gorc_design/) which is
-not equivalent and has limited constant-generation capability)
-
generalised reverse combined with a pair of LUT2s and allowing
a constant `0b0101...0101` when RA=0, and an option to invert
(including when RA=0, giving a constant 0b1010...1010 as the
| -- | -- | --- | --- | ----- | -----|--| ------ | ----- |
| NN | RT | RA | s0-4 | im0-7 | 1 iv |s5| grevlogi | |
| NN | RT | RA | RB | im0-7 | 01 |0 | grevlog | |
-| NN | RT | RA | RB | im0-7 | 01 |1 | grevlogw | |
+
+An equivalent to `grevlogw` may be synthesised by setting the
+appropriate bits in RB to set the top half of RT to zero.
+Thus an explicit grevlogw instruction is not necessary.
# xperm