[[!tag svp64_cookbook ]]

<https://libre-soc.org/openpower/sv/vector_ops/discussion/>

This is based on the AVX512 conflict detection instruction. Internally
the logic is used to detect address conflicts in multi-issue LD/ST
operations. Two arrays of values are given: the indices are compared
and duplicates reported in a triangular fashion, so that bit j of
result element i is set when element i matches an earlier element j.
The instruction may be used for histograms (computed in parallel).
The example below uses a single input for both arrays.

    input = [100, 100, 3, 100, 5, 100, 100, 3]
    conflict result = [
        0b00000000, // Note: first element always zero
        0b00000001, // 100 is present on #0
        0b00000000,
        0b00000011, // 100 is present on #0 and #1
        0b00000000,
        0b00001011, // 100 is present on #0, #1, #3
        0b00101011, // .. and #5
        0b00000100  // 3 is present on #2
    ]

Pseudocode:

    # result[] starts as all zeros
    for i in range(VL):
        for j in range(i): # only elements before i
            if src1[i] == src2[j]:
                result[i] |= 1<<j
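The pseudocode can be checked against the example with a small reference model. This is a minimal Python sketch and not part of the SVP64 specification: the function name `conflictd` is hypothetical, and it assumes both inputs have the same length, which stands in for `VL`.

```python
def conflictd(src1, src2):
    """Return one bitmask per element: bit j of result[i] is set
    when src1[i] equals src2[j] for some j < i."""
    vl = len(src1)  # assumption: len(src2) == len(src1) == VL
    result = [0] * vl
    for i in range(vl):
        for j in range(i):  # triangular: only lower-numbered elements
            if src1[i] == src2[j]:
                result[i] |= 1 << j
    return result


values = [100, 100, 3, 100, 5, 100, 100, 3]
masks = conflictd(values, values)
for m in masks:
    print(f"0b{m:08b}")
```

Passing the same list for both arguments reproduces the single-input behaviour of the example above.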

Idea 1: implement this as a Triangular Schedule, Vertical-First Mode,
using `mfcrweird` and `cmpi`. The first triangular schedule is on src1,
the second on src2.

Idea 2: implement using an outer loop with a varying `setvl`, in
Horizontal-First Mode, with a `1<<r3` predicate mask to extract one
element as a scalar, which is compared against the other vector. This
creates a CR Field vector, which is transferred into an INT with
`mfcrweird` and then ORed into the result.

```
li r3, target
li result, 0
for i in range(target):
    setvl target
    addi r3, r3, -1 # shift 1<<r3 predicate down by one
    sv.addi/sm=1<<r3 t0, src1.v, 0 # copy src1[i]
    sv.cmpi src2.v, t0 # compare src2 vector to scalar
    sv.mfcrweird t1, cr0.v, eq # copy CR eq result bits to t1
    or result, result, t1
```

See [[sv/cr_int_predication]] for full details on the crweird instructions:
the primary important aspect here is that the EQ bits of a Vector of CR
Fields are transferred into a single GPR. The secondary important aspect
is that VL is adjusted in each loop iteration, testing successively more
of the input vector against a given scalar each time.
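The two aspects described here can be modelled in plain Python. This is only a sketch under stated assumptions, not a description of the actual instructions: `pack_eq_bits` and `conflict_outer_loop` are hypothetical names, the bit ordering (element j's EQ bit lands in bit j of the integer) is assumed rather than taken from the crweird specification, and VL is assumed to grow by one per iteration.

```python
def pack_eq_bits(eq_bits):
    """Model of the CR-to-GPR transfer: pack a vector of EQ bits
    into one integer (assumed order: element j -> bit j)."""
    packed = 0
    for j, eq in enumerate(eq_bits):
        if eq:
            packed |= 1 << j
    return packed


def conflict_outer_loop(src1, src2):
    """Outer loop over elements; each pass compares one scalar
    against a progressively longer prefix of the other vector."""
    result = [0] * len(src1)
    for i in range(len(src1)):
        vl = i                # assumed: VL grows by one each iteration
        scalar = src1[i]      # the element a single-bit predicate would select
        eq_bits = [src2[j] == scalar for j in range(vl)]  # vector compare
        result[i] |= pack_eq_bits(eq_bits)                # OR into the result
    return result


values = [100, 100, 3, 100, 5, 100, 100, 3]
print([f"0b{m:08b}" for m in conflict_outer_loop(values, values)])
```

The model keeps the compare and the bit-packing as separate steps because that is the split the text describes: a vector of comparison results first, then a single integer per pass.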

To investigate:

* <https://stackoverflow.com/questions/39266476/how-to-speed-up-this-histogram-of-lut-lookups>
* <https://stackoverflow.com/questions/39913707/how-do-the-conflict-detection-instructions-make-it-easier-to-vectorize-loops>
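As a starting point for that investigation, the histogram use mentioned in the introduction can be sketched in Python. This shows one common way of using conflict masks and is not taken from this page: the helper names `conflict_masks` and `histogram` and the chunk size `vl` are hypothetical. Within each chunk, only the last occurrence of a bin index updates the histogram, adding the number of earlier duplicates (the popcount of its mask) plus one.

```python
def conflict_masks(chunk):
    """Bit j of masks[i] is set when chunk[i] == chunk[j] for j < i."""
    return [
        sum(1 << j for j in range(i) if chunk[j] == chunk[i])
        for i in range(len(chunk))
    ]


def histogram(indices, nbins, vl=8):
    """Histogram of bin indices, processed vl elements at a time."""
    hist = [0] * nbins
    for base in range(0, len(indices), vl):
        chunk = indices[base:base + vl]
        masks = conflict_masks(chunk)
        # OR of all masks: bit j set means lane j is repeated by a later lane
        repeated_later = 0
        for m in masks:
            repeated_later |= m
        for lane, idx in enumerate(chunk):
            if (repeated_later >> lane) & 1:
                continue  # a later lane accounts for this occurrence
            # popcount = earlier duplicates in this chunk, +1 for this lane
            hist[idx] += bin(masks[lane]).count("1") + 1
    return hist


hist = histogram([100, 100, 3, 100, 5, 100, 100, 3], nbins=101)
print(hist[100], hist[3], hist[5])
```

Because each surviving lane in a chunk targets a distinct bin, the per-lane updates no longer conflict with each other, which is the property that lets them be carried out in parallel.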