From 73790e850462545dc53bd13976123c4808d700d6 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 20 Jun 2022 17:54:17 +0100
Subject: [PATCH]

---
 openpower/sv/vector_ops/discussion.mdwn | 168 +-----------------------
 1 file changed, 1 insertion(+), 167 deletions(-)

diff --git a/openpower/sv/vector_ops/discussion.mdwn b/openpower/sv/vector_ops/discussion.mdwn
index 9efe21825..55c8bec8d 100644
--- a/openpower/sv/vector_ops/discussion.mdwn
+++ b/openpower/sv/vector_ops/discussion.mdwn
@@ -1,6 +1,6 @@
 [[!tag standards]]

-# SV Vector Operations.
+# SV Vector Operations not added

 Links:

@@ -8,18 +8,6 @@ Links:
 * conflictd example
 *
 *
-* specialist vector ops
-  out of scope for this document [[openpower/sv/3d_vector_ops]]
-* [[simple_v_extension/specification/bitmanip]] previous version,
-  contains pseudocode for sof, sif, sbf
-
-The core OpenPOWER ISA was designed as scalar: SV provides a level of abstraction to add variable-length element-independent parallelism. However, certain classes of instructions only make sense in a Vector context: AVX512 conflictd for example. This section includes such examples. Many of them are from the RISC-V Vector ISA (with thanks to the efforts of RVV's contributors)
-
-Notes:
-
-* Some of these actually could be added to a scalar ISA as bitmanipulation instructions. These are separated out into their own section.
-* Instructions suited to 3D GPU workloads (dotproduct, crossproduct, normalise) are out of scope: this document is for more general-purpose instructions that underpin and are critical to general-purpose Vector workloads (including GPU and VPU)
-* Instructions related to the adaptation of CRs for use as predicate masks are covered separately, by crweird operations. See [[sv/cr_int_predication]].

 # Vector

@@ -146,157 +134,3 @@ mask-out effect of resetting the count back to zero.

 However close examination shows that the above may actually be `sv.addi/mr/sm=EQ/dz r0.v, r0.v, 1`

-# Scalar
-
-These may all be viewed as suitable for fitting into a scalar bitmanip extension.
-
-## sbfm
-
-    sbfm RT, RA, RB!=0
-
-Example
-
-    7 6 5 4 3 2 1 0 Bit index
-
-    1 0 0 1 0 1 0 0 v3 contents
-    vmsbf.m v2, v3
-    0 0 0 0 0 0 1 1 v2 contents
-
-    1 0 0 1 0 1 0 1 v3 contents
-    vmsbf.m v2, v3
-    0 0 0 0 0 0 0 0 v2
-
-    0 0 0 0 0 0 0 0 v3 contents
-    vmsbf.m v2, v3
-    1 1 1 1 1 1 1 1 v2
-
-    1 1 0 0 0 0 1 1 RB vcontents
-    1 0 0 1 0 1 0 0 v3 contents
-    vmsbf.m v2, v3, v0.t
-    0 1 x x x x 1 1 v2 contents
-
-The vmsbf.m instruction takes a mask register as input and writes results to a mask register. The instruction writes a 1 to all active mask elements before the first source element that is a 1, then writes a 0 to that element and all following active elements. If there is no set bit in the source vector, then all active elements in the destination are written with a 1.
-
-Executable demo:
-
-```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sbf.py"]]
-```
-
-## sifm
-
-The vector mask set-including-first instruction is similar to set-before-first, except it also includes the element with a set bit.
-
-    sifm RT, RA, RB!=0
-
-    # Example
-
-    7 6 5 4 3 2 1 0 Bit number
-
-    1 0 0 1 0 1 0 0 v3 contents
-    vmsif.m v2, v3
-    0 0 0 0 0 1 1 1 v2 contents
-
-    1 0 0 1 0 1 0 1 v3 contents
-    vmsif.m v2, v3
-    0 0 0 0 0 0 0 1 v2
-
-    1 1 0 0 0 0 1 1 RB vcontents
-    1 0 0 1 0 1 0 0 v3 contents
-    vmsif.m v2, v3, v0.t
-    1 1 x x x x 1 1 v2 contents
-
-Executable demo:
-
-```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sif.py"]]
-```
-
-## vmsof
-
-The vector mask set-only-first instruction is similar to set-before-first, except it only sets the first element with a bit set, if any.
-
-    sofm RT, RA, RB
-
-Example
-
-    7 6 5 4 3 2 1 0 Bit number
-
-    1 0 0 1 0 1 0 0 v3 contents
-    vmsof.m v2, v3
-    0 0 0 0 0 1 0 0 v2 contents
-
-    1 0 0 1 0 1 0 1 v3 contents
-    vmsof.m v2, v3
-    0 0 0 0 0 0 0 1 v2
-
-    1 1 0 0 0 0 1 1 RB vcontents
-    1 1 0 1 0 1 0 0 v3 contents
-    vmsof.m v2, v3, v0.t
-    0 1 x x x x 0 0 v2 content
-
-Executable demo:
-
-```
-[[!inline quick="yes" raw="yes" pages="openpower/sv/sof.py"]]
-```
-
-# Carry-lookahead
-
-used not just for carry lookahead, also a special type of predication mask operation.
-
-*
-*
-*
-*
-*
-  `((P|G)+G)^P`
-*
-
-From QLSKY.png:
-
-```
-    x0 = nand(CIn, P0)
-    C0 = nand(x0, ~G0)
-
-    x1 = nand(CIn, P0, P1)
-    y1 = nand(G0, P1)
-    C1 = nand(x1, y1, ~G1)
-
-    x2 = nand(CIn, P0, P1, P2)
-    y2 = nand(G0, P1, P2)
-    z2 = nand(G1, P2)
-    C1 = nand(x2, y2, z2, ~G2)
-
-    # Gen*
-    x3 = nand(G0, P1, P2, P3)
-    y3 = nand(G1, P2, P3)
-    z3 = nand(G2, P3)
-    G* = nand(x3, y3, z3, ~G3)
-```
-
-```
-    P = (A | B) & Ci
-    G = (A & B)
-```
-
-Stackoverflow algorithm `((P|G)+G)^P` works on the cumulated bits of P and G from associated vector units (P and G are integers here). The result of the algorithm is the new carry-in which already includes ripple, one bit of carry per element.
-
-```
-    At each id, compute C[id] = A[id]+B[id]+0
-    Get G[id] = C[id] > radix -1
-    Get P[id] = C[id] == radix-1
-    Join all P[id] together, likewise G[id]
-    Compute newC = ((P|G)+G)^P
-    result[id] = (C[id] + newC[id]) % radix
-```
-
-two versions: scalar int version and CR based version.
-
-scalar int version acts as a scalar carry-propagate, reading XER.CA as input, P and G as regs, and taking a radix argument. the end bits go into XER.CA and CR0.ge
-
-vector version takes CR0.so as carry in, stores in CR0.so and CR.ge end bits.
-
-if zero (no propagation) then CR0.eq is zero
-
-CR based version, TODO.
--
2.30.2