[[!tag standards]]

# sof/sif/sbfm etc.

It turns out that all of these can be done as bitmanip operations of the form
`x/~x &/|/^ (x / -x / x+1 / x-1)`, so they are being superseded.
The pseudocode below still needs some work.

```
if _RB = 0 then mask <- [1] * XLEN
else            mask <- (RB)
a1 <- (RA) & mask
if mode[1] then a1 <- ¬ra
mode2 <- mode[2:3]
if mode2 = 0 then a2 <- (¬ra)+1
if mode2 = 1 then a2 <- ra-1
if mode2 = 2 then a2 <- ra+1
if mode2 = 3 then a2 <- ¬(ra+1)
a1 <- a1 & mask
a2 <- a2 & mask
# select operator
mode3 <- mode[3:4]
if mode3 = 0 then result <- a1 | a2
if mode3 = 1 then result <- a1 & a2
if mode3 = 2 then result <- a1 ^ a2
if mode3 = 3 then result <- UNDEFINED
result <- result & mask
# optionally restore masked-out bits
if L = 1 then
    result <- result | (RA & ¬mask)
RT <- result

SBF = 0b01010 # set before first
SOF = 0b01001 # set only first
SIF = 0b10000 # set including first 10011 also works no idea why yet
```

## OBSOLETE sbfm

    sbfm RT, RA, RB!=0

Example

     7 6 5 4 3 2 1 0   Bit index

     1 0 0 1 0 1 0 0   v3 contents
                       vmsbf.m v2, v3
     0 0 0 0 0 0 1 1   v2 contents

     1 0 0 1 0 1 0 1   v3 contents
                       vmsbf.m v2, v3
     0 0 0 0 0 0 0 0   v2

     0 0 0 0 0 0 0 0   v3 contents
                       vmsbf.m v2, v3
     1 1 1 1 1 1 1 1   v2

     1 1 0 0 0 0 1 1   RB contents
     1 0 0 1 0 1 0 0   v3 contents
                       vmsbf.m v2, v3, v0.t
     0 1 x x x x 1 1   v2 contents

The vmsbf.m instruction takes a mask register as input and writes
results to a mask register. The instruction writes a 1 to all active
mask elements before the first source element that is a 1, then
writes a 0 to that element and all following active elements. If
there is no set bit in the source vector, then all active elements in
the destination are written with a 1.

Executable pseudocode demo:

```
[[!inline quick="yes" raw="yes" pages="openpower/sv/sbf.py"]]
```

## OBSOLETE sifm

The vector mask set-including-first instruction is similar to
set-before-first, except it also includes the element with a set bit.

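
As a rough illustration of the "x/~x &/|/^ (x-1 / -x)" idea from the top of this page, the following Python sketch reproduces the RVV-style bit patterns shown in the examples. The function names, the `width` parameter and the choice of identities (`~x & (x-1)`, `x ^ (x-1)`, `x & -x`) are assumptions made for illustration: they are standard bit tricks, not derived from the `SBF`/`SOF`/`SIF` mode encodings. Masked-out bits, shown as `x` in the examples, are simply returned as 0 here.

```python
def _prep(x, mask, width):
    """Clamp the mask to `width` bits and clear inactive source bits."""
    full = (1 << width) - 1
    mask = full if mask is None else mask & full
    return x & mask, mask

def sbf(x, mask=None, width=8):
    """Set-before-first: ones strictly below the lowest active set bit."""
    x, mask = _prep(x, mask, width)
    return ~x & (x - 1) & mask

def sif(x, mask=None, width=8):
    """Set-including-first: ones up to and including the lowest active set bit."""
    x, mask = _prep(x, mask, width)
    return (x ^ (x - 1)) & mask

def sof(x, mask=None, width=8):
    """Set-only-first: only the lowest active set bit, if any."""
    x, mask = _prep(x, mask, width)
    return x & -x & mask

if __name__ == "__main__":
    for name, fn in (("sbf", sbf), ("sif", sif), ("sof", sof)):
        print(name, format(fn(0b10010100), "08b"))
```

With a source of all zeros, `sbf` and `sif` return every active bit set and `sof` returns zero, which agrees with the vmsbf.m description of the no-set-bit case.
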
    sifm RT, RA, RB!=0

Example

     7 6 5 4 3 2 1 0   Bit number

     1 0 0 1 0 1 0 0   v3 contents
                       vmsif.m v2, v3
     0 0 0 0 0 1 1 1   v2 contents

     1 0 0 1 0 1 0 1   v3 contents
                       vmsif.m v2, v3
     0 0 0 0 0 0 0 1   v2

     1 1 0 0 0 0 1 1   RB contents
     1 0 0 1 0 1 0 0   v3 contents
                       vmsif.m v2, v3, v0.t
     1 1 x x x x 1 1   v2 contents

Executable pseudocode demo:

```
[[!inline quick="yes" raw="yes" pages="openpower/sv/sif.py"]]
```

## OBSOLETE vmsof

The vector mask set-only-first instruction is similar to
set-before-first, except it only sets the first element with a bit
set, if any.

    sofm RT, RA, RB

Example

     7 6 5 4 3 2 1 0   Bit number

     1 0 0 1 0 1 0 0   v3 contents
                       vmsof.m v2, v3
     0 0 0 0 0 1 0 0   v2 contents

     1 0 0 1 0 1 0 1   v3 contents
                       vmsof.m v2, v3
     0 0 0 0 0 0 0 1   v2

     1 1 0 0 0 0 1 1   RB contents
     1 1 0 1 0 1 0 0   v3 contents
                       vmsof.m v2, v3, v0.t
     0 1 x x x x 0 0   v2 contents

Executable pseudocode demo:

```
[[!inline quick="yes" raw="yes" pages="openpower/sv/sof.py"]]
```

# SV Vector Operations not added

Links:

* conflictd example

Both of these instructions may be synthesised from SVP64 Vector
instructions. conflictd is an O(N^2) instruction based on
`sv.cmpi` and iota is an O(N) instruction based on `sv.addi`
with the appropriate predication.

# conflictd

moved to [[sv/cookbook/conflictd]]

# iota

Based on RVV vmiota. vmiota may be viewed as a cumulative variant
of popcount, generating multiple results. Successive iterations
include more and more bits of the bitstream being tested.

When masked, only the bits not masked out are included in the count process.

    viota RT/v, RA, RB

Note that when RA=0 this indicates to test against all 1s, resulting
in the instruction generating a vector sequence [0, 1, 2... VL-1].
This will be equivalent to RVV vid.m which is a pseudo-op, here (RA=0).

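
The "cumulative popcount" behaviour can be sketched in Python as follows. This is only a model of the RVV-style semantics described above, assuming the source is a scalar bitmask; the function name `iota` and its parameters (`src`, `vl`, `mask`, `dest`) are hypothetical and do not correspond to register fields. Each active element receives the number of active set bits strictly below it, and masked-out elements keep their previous value.

```python
def iota(src, vl, mask=None, dest=None):
    """Exclusive running popcount of `src` over the active elements.

    src  : integer bitmask being tested (bit i belongs to element i)
    vl   : vector length
    mask : integer predicate mask, all ones when None
    dest : previous destination contents, zeros when None
    """
    if mask is None:
        mask = (1 << vl) - 1
    result = list(dest) if dest is not None else [0] * vl
    count = 0
    for i in range(vl):
        if not (mask >> i) & 1:
            continue  # masked out: keep old value, ignore this source bit
        result[i] = count
        count += (src >> i) & 1
    return result

if __name__ == "__main__":
    # Element 0 is printed first, i.e. the reverse of the 7..0 layout.
    print(iota(0b10010001, 8))
    print(iota(0b11111111, 8))  # all 1s gives the sequence 0..VL-1
```

Testing against all 1s yields `[0, 1, 2, ..., VL-1]`, which is the vid.m-like behaviour noted for RA=0.
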
Example

     7 6 5 4 3 2 1 0   Element number

     1 0 0 1 0 0 0 1   v2 contents
                       viota.m v4, v2 # Unmasked
     2 2 2 1 1 1 1 0   v4 result

     1 1 1 0 1 0 1 1   v0 contents
     1 0 0 1 0 0 0 1   v2 contents
     2 3 4 5 6 7 8 9   v4 contents
                       viota.m v4, v2, v0.t # Masked
     1 1 1 5 1 7 1 0   v4 results

    def iota(RT, RA, RB):
        mask = RB ? iregs[RB] : 0b111111...1
        val = RA ? iregs[RA] : 0b111111...1
        for i in range(VL):
            if RA.scalar:
                testmask = (1<

* `((P|G)+G)^P`

From QLSKY.png:

```
x0 = nand(CIn, P0)
C0 = nand(x0, ~G0)

x1 = nand(CIn, P0, P1)
y1 = nand(G0, P1)
C1 = nand(x1, y1, ~G1)

x2 = nand(CIn, P0, P1, P2)
y2 = nand(G0, P1, P2)
z2 = nand(G1, P2)
C2 = nand(x2, y2, z2, ~G2)

# Gen*
x3 = nand(G0, P1, P2, P3)
y3 = nand(G1, P2, P3)
z3 = nand(G2, P3)
G* = nand(x3, y3, z3, ~G3)
```

```
P = (A | B) & Ci
G = (A & B)
```

The Stackoverflow algorithm `((P|G)+G)^P` works on the cumulated bits of
P and G from associated vector units (P and G are integers here).
The result of the algorithm is the new carry-in, which already
includes ripple, with one bit of carry per element.

```
At each id, compute C[id] = A[id]+B[id]+0
Get G[id] = C[id] > radix -1
Get P[id] = C[id] == radix-1
Join all P[id] together, likewise G[id]
Compute newC = ((P|G)+G)^P
result[id] = (C[id] + newC[id]) % radix
```

There are two versions: a scalar int version and a CR-based version.

The scalar int version acts as a scalar carry-propagate, reading XER.CA as
input, P and G as regs, and taking a radix argument. The end bits go into
XER.CA and CR0.ge.

The vector version takes CR0.so as carry in, and stores the end bits in
CR0.so and CR.ge. If zero (no propagation) then CR0.eq is zero.

CR-based version: TODO.
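
The `((P|G)+G)^P` steps listed above can be tried out with a small Python sketch. It is a plain software model under stated assumptions: the name `add_limbs`, the little-endian limb lists and the returned carry-out are all hypothetical, there is no carry-in (matching the `+0` in the steps), and none of the XER.CA or CR0 behaviour is modelled. Since P and G never overlap, `(P|G)+G` equals `P + (G<<1)`, so each generated carry enters the next limb and ripples through any run of propagate bits.

```python
def add_limbs(a, b, radix):
    """Add two equal-length little-endian limb lists.

    Returns (result_limbs, carry_out).
    """
    n = len(a)
    # Step 1: independent per-limb sums, no carry-in.
    c = [x + y for x, y in zip(a, b)]
    # Steps 2-4: gather generate (G) and propagate (P) bits into integers.
    g = 0
    p = 0
    for i, s in enumerate(c):
        if s > radix - 1:
            g |= 1 << i       # this limb overflows on its own
        elif s == radix - 1:
            p |= 1 << i       # this limb overflows only if a carry arrives
    # Step 5: bit i of new_c is the carry into limb i, ripple included.
    new_c = ((p | g) + g) ^ p
    # Step 6: apply the per-limb carries.
    result = [(c[i] + ((new_c >> i) & 1)) % radix for i in range(n)]
    carry_out = (new_c >> n) & 1
    return result, carry_out

if __name__ == "__main__":
    # 1999 + 1 in radix 10, least significant limb first.
    print(add_limbs([9, 9, 9, 1], [1, 0, 0, 0], 10))
```

In the 1999 + 1 case, limb 0 generates a carry and limbs 1 and 2 propagate it, so the single integer addition in step 5 resolves the whole ripple chain at once.
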