From 010bad250be020198c84de3dc3ece8c38b15cadc Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 20 Jun 2022 17:52:41 +0100
Subject: [PATCH]

---
 openpower/sv/vector_ops.mdwn | 143 +++--------------------------------
 1 file changed, 10 insertions(+), 133 deletions(-)

diff --git a/openpower/sv/vector_ops.mdwn b/openpower/sv/vector_ops.mdwn
index faca26c12..06c8f0634 100644
--- a/openpower/sv/vector_ops.mdwn
+++ b/openpower/sv/vector_ops.mdwn
@@ -4,6 +4,7 @@
 
 Links:
 
+* [[discussion]]
 *
 * conflictd example
 *
@@ -13,7 +14,12 @@ Links:
 * [[simple_v_extension/specification/bitmanip]] previous version, contains pseudocode for sof, sif, sbf
 
-The core OpenPOWER ISA was designed as scalar: SV provides a level of abstraction to add variable-length element-independent parallelism. However, certain classes of instructions only make sense in a Vector context: AVX512 conflictd for example. This section includes such examples. Many of them are from the RISC-V Vector ISA (with thanks to the efforts of RVV's contributors)
+The core OpenPOWER ISA was designed as scalar: SV provides a level of abstraction to add variable-length element-independent parallelism.
+Therefore there are not that many cases where *actual* Vector
+instructions are needed. If they are, they are more "assistance"
+functions. Two traditional Vector instructions were initially
+considered (conflictd and vmiota) however they may be synthesised
+from existing SVP64 instructions and have been moved to [[discussion]]
 
 Notes:
 
@@ -21,136 +27,7 @@ Notes:
 * Instructions suited to 3D GPU workloads (dotproduct, crossproduct, normalise) are out of scope: this document is for more general-purpose instructions that underpin and are critical to general-purpose Vector workloads (including GPU and VPU)
 * Instructions related to the adaptation of CRs for use as predicate masks are covered separately, by crweird operations. See [[sv/cr_int_predication]].
 
-# Vector
-
-Both of these instructions may be synthesised from SVP64 Vector
-instructions. conflictd is an O(N^2) instruction based on
-`sv.cmpi` and iota is an O(N) instruction based on `sv.addi`
-with the appropriate predication
-
-## conflictd
-
-This is based on the AVX512 conflict detection instruction. Internally the logic is used to detect address conflicts in multi-issue LD/ST operations. Two arrays of values are given: the indices are compared and duplicates reported in a triangular fashion. the instruction may be used for histograms (computed in parallel)
-
-    input = [100, 100, 3, 100, 5, 100, 100, 3]
-    conflict result = [
-        0b00000000, // Note: first element always zero
-        0b00000001, // 100 is present on #0
-        0b00000000,
-        0b00000011, // 100 is present on #0 and #1
-        0b00000000,
-        0b00001011, // 100 is present on #0, #1, #3
-        0b00011011, // .. and #4
-        0b00000100 // 3 is present on #2
-    ]
-
-Pseudocode:
-
-    for i in range(VL):
-        for j in range(1, i):
-            if src1[i] == src2[j]:
-                result[j] |= 1<
-*
-
-## iota
-
-Based on RVV vmiota. vmiota may be viewed as a cumulative variant of popcount, generating multiple results. successive iterations include more and more bits of the bitstream being tested.
-
-When masked, only the bits not masked out are included in the count process.
-
-    viota RT/v, RA, RB
-
-Note that when RA=0 this indicates to test against all 1s, resulting in the instruction generating a vector sequence [0, 1, 2... VL-1]. This will be equivalent to RVV vid.m which is a pseudo-op, here (RA=0).
-
-Example
-
-     7 6 5 4 3 2 1 0   Element number
-
-     1 0 0 1 0 0 0 1   v2 contents
-                       viota.m v4, v2 # Unmasked
-     2 2 2 1 1 1 1 0   v4 result
-
-     1 1 1 0 1 0 1 1   v0 contents
-     1 0 0 1 0 0 0 1   v2 contents
-     2 3 4 5 6 7 8 9   v4 contents
-                       viota.m v4, v2, v0.t # Masked
-     1 1 1 5 1 7 1 0   v4 results
-
-    def iota(RT, RA, RB):
-        mask = RB ? iregs[RB] : 0b111111...1
-        val = RA ? iregs[RA] : 0b111111...1
-        for i in range(VL):
-            if RA.scalar:
-                testmask = (1<