# Scalar OpenPOWER Audio and Video Opcodes

The fundamental principle of SV is a hardware for-loop. The first place (and in nearly 100% of cases the only place) to put Vector operations is therefore the *scalar* ISA. However, only by analysing those scalar opcodes *in* an SV Vectorisation context does it become clear why they are needed and how they may be designed.

This page therefore has an accompanying discussion at <https://bugs.libre-soc.org/show_bug.cgi?id=230> on the evolution of suitable opcodes.

# Audio

The fundamental principle for these instructions is:

* identify the scalar primitive
* assume that longer runs of scalars will have Simple-V vectorisation applied
* assume that "swizzle" may be applied at the (vec2 - SUBVL=2) Vector level,
  in order to perform the necessary HI/LO selection normally hard-coded
  into SIMD ISAs.

Thus for example, where OpenPOWER VSX has vpkswss, this would be achieved in SV with simply:

* addition of a scalar ext/clamp instruction
* 1st op, swizzle-selection vec2 "select X only" from source to dest:
  dest.X = extclamp(src.X)
* 2nd op, swizzle-selection vec2 "select Y only" from source to dest:
  dest.Y = extclamp(src.Y)

Macro-op fusion may be used to detect that these two interleave cleanly, overlapping the vec2.X with vec2.Y to produce a single vec2.XY operation.
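
The two-op sequence above can be modelled in software. The following Python sketch is illustrative only: `extclamp`, `apply_lane` and the list-of-pairs layout for vec2 elements are hypothetical names and choices made for this example, not part of SV or OpenPOWER, and signed saturation to 16 bits is assumed as the ext/clamp behaviour.

```python
X, Y = 0, 1  # lane indices within a vec2 (SUBVL=2) element


def extclamp(value, bits=16):
    """Saturate a signed integer to the signed range of `bits` bits (assumed behaviour)."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def apply_lane(dest, src, lane, op):
    """Model one swizzle-selected op: only `lane` of each vec2 is read and written."""
    for d, s in zip(dest, src):
        d[lane] = op(s[lane])


src = [[70000, -5], [-70000, 123]]  # two vec2 elements holding 32-bit values
dest = [[0, 0] for _ in src]

apply_lane(dest, src, X, extclamp)  # 1st op: dest.X = extclamp(src.X)
apply_lane(dest, src, Y, extclamp)  # 2nd op: dest.Y = extclamp(src.Y)
print(dest)
```

Because the two ops touch disjoint lanes, applying them in either order gives the same result, which is the property a fused vec2.XY operation would rely on.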

## Scalar element operations

* clamping / saturation for signed and unsigned. best done similarly to FP rounding modes, i.e. with an SPR.
* average-add: result = (src1 + src2 + 1) >> 1
* abs-diff: result = (src1 > src2) ? (src1-src2) : (src2-src1)
* signed min/max

# Video

TODO

* DCT <https://users.cs.cf.ac.uk/Dave.Marshall/Multimedia/node231.html>

# VSX SIMD

## vpkpx

vpkpx is a 32-bit to 16-bit pixel conversion, 8888 into 1555

SV notes:

a single 32-bit to 16-bit operation should suffice, fitting cleanly into one single scalar op:

    dest[0]     = src[7]
    dest[1:5]   = src[8:12]
    dest[6:10]  = src[16:20]
    dest[11:15] = src[24:28]
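
The bit mapping above can be checked with a short Python sketch. It assumes OpenPOWER MSB0 numbering (bit 0 is the most significant bit), so `src[7]` is the lowest bit of the top byte and each 5-bit field is the upper 5 bits of the following bytes. The function name `pack_8888_to_1555` is hypothetical.

```python
def pack_8888_to_1555(src):
    """Pack one 32-bit 8888 pixel into a 16-bit 1555 pixel (MSB0 numbering assumed)."""
    f1 = (src >> 24) & 0x1    # src[7]     -> dest[0]
    f2 = (src >> 19) & 0x1F   # src[8:12]  -> dest[1:5]
    f3 = (src >> 11) & 0x1F   # src[16:20] -> dest[6:10]
    f4 = (src >> 3) & 0x1F    # src[24:28] -> dest[11:15]
    return (f1 << 15) | (f2 << 10) | (f3 << 5) | f4


print(hex(pack_8888_to_1555(0x01FF0000)))
```

The low 3 bits of each of the last three bytes are simply discarded, as are the upper 7 bits of the first byte.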

## vpks[\*][\*]s

signed and unsigned, these are N-to-M (N=64/32/16, M=32/16/8) chop/clamp/sign/zero-extend operations

## vupkhpx / vupklpx

these are 16-bit to 32-bit pixel conversions, 1555 to 8888
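
As a rough illustration of the reverse direction, the Python sketch below unpacks one 1555 pixel into 8888. The page does not spell out the extension rules, so the sketch assumes (to be checked against the Power ISA) that the 1-bit field is sign-extended to 8 bits and that each 5-bit field is zero-extended to 8 bits. The name `unpack_1555_to_8888` is hypothetical.

```python
def unpack_1555_to_8888(px):
    """Unpack one 16-bit 1555 pixel into a 32-bit 8888 pixel (extension rules assumed)."""
    f1 = (px >> 15) & 0x1
    f2 = (px >> 10) & 0x1F
    f3 = (px >> 5) & 0x1F
    f4 = px & 0x1F
    f1_ext = 0xFF if f1 else 0x00  # assumption: 1-bit field sign-extended to 8 bits
    # assumption: 5-bit fields are zero-extended, i.e. placed in the low bits of each byte
    return (f1_ext << 24) | (f2 << 16) | (f3 << 8) | f4


print(hex(unpack_1555_to_8888(0xFC00)))
```

Under these assumptions the operation is again a single scalar 16-bit to 32-bit op, with the high/low half selection left to the Vector level.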

## vavgs\*

signed and unsigned, 8/16/32: these are all of the form:

    result = truncate((a + b + 1) >> 1)
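
A minimal Python sketch of the formula above, showing why the intermediate sum needs one more bit than the element width before the shift. Results are returned as raw bit patterns of the element width. The helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def avg_unsigned(a, b, bits):
    """Rounded average of two unsigned elements, truncated to `bits` bits."""
    mask = (1 << bits) - 1
    return (((a & mask) + (b & mask) + 1) >> 1) & mask


def avg_signed(a, b, bits):
    """Rounded average of two signed elements, returned as a `bits`-wide bit pattern."""
    total = to_signed(a, bits) + to_signed(b, bits) + 1
    return (total >> 1) & ((1 << bits) - 1)  # >> is an arithmetic shift in Python


# 255 + 255 + 1 = 511 does not fit in 8 bits, but the shifted result does
print(avg_unsigned(255, 255, 8))
```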

## vabsdu\*

unsigned 8/16/32: these are all of the form:

    result = (src1 > src2) ? truncate(src1-src2) :
                             truncate(src2-src1)
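
The compare-then-subtract form matters because a plain wrapped subtraction gives the wrong magnitude when src1 < src2. A small Python sketch, with the hypothetical name `absdiff_unsigned`:

```python
def absdiff_unsigned(src1, src2, bits):
    """Absolute difference of two unsigned elements of width `bits`."""
    mask = (1 << bits) - 1
    a, b = src1 & mask, src2 & mask
    # always subtract the smaller from the larger, so the result never wraps
    return (a - b) if a > b else (b - a)


wrapped = (3 - 250) & 0xFF           # naive modular subtract, for comparison
correct = absdiff_unsigned(3, 250, 8)
print(wrapped, correct)
```

The result always fits in the element width, so here the truncation never discards information.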

## vmaxs\* / vmaxu\* (and min)

signed and unsigned, 8/16/32: these are all of the form:

    result = (src1 > src2) ? src1 : src2 # max
    result = (src1 < src2) ? src1 : src2 # min
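
The only difference between the signed and unsigned variants is how the comparison interprets the element bits. The Python sketch below illustrates this on 8-bit patterns. The helper names are hypothetical.

```python
def element_key(value, bits, signed):
    """Numeric value of a `bits`-wide pattern under signed or unsigned interpretation."""
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value -= 1 << bits
    return value


def elem_max(src1, src2, bits, signed):
    mask = (1 << bits) - 1
    if element_key(src1, bits, signed) > element_key(src2, bits, signed):
        return src1 & mask
    return src2 & mask


def elem_min(src1, src2, bits, signed):
    mask = (1 << bits) - 1
    if element_key(src1, bits, signed) < element_key(src2, bits, signed):
        return src1 & mask
    return src2 & mask


# 0xFF is 255 when unsigned but -1 when signed, so the winner changes
print(hex(elem_max(0xFF, 0x01, 8, signed=False)), hex(elem_max(0xFF, 0x01, 8, signed=True)))
```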

## vmerge operations

these take two src vectors of various widths and splice them together. the best technique to cover these is a simple straightforward predicated pair of mv operations, inverting the predicate in the second case, or, alternatively, to use a pair of vec2 (SUBVL=2) swizzled operations.

in the swizzle case the first instruction would be destvec2.X = srcvec2.X and the second would swizzle-select Y. macro-op fusion in both the predicated variant and the swizzle variant would interleave the two into the same SIMD backend ALUs.

with twin predication the elwidth can be overridden on both src and dest such that either straight scalar mv or extsw/b/h can be used to provide the combinations of coverage needed, with only 2 actual instructions (plus vector prefixing)
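
The predicated pair of mv operations can be sketched in Python as follows. This is a simplified software model written for this page, not taken from the SV specification: it assumes twin predication pairs the i-th enabled source element with the i-th enabled destination element, and the names `twin_pred_mv` and `merge_interleave` are hypothetical. Elwidth overrides are not modelled.

```python
def twin_pred_mv(dest, src, src_mask, dest_mask):
    """Copy the i-th enabled source element to the i-th enabled dest element."""
    src_idx = [i for i, m in enumerate(src_mask) if m]
    dst_idx = [i for i, m in enumerate(dest_mask) if m]
    for s, d in zip(src_idx, dst_idx):
        dest[d] = src[s]


def merge_interleave(a, b):
    """Splice two equal-length vectors together using two predicated mv ops."""
    n = len(a)
    dest = [None] * (2 * n)
    even = [i % 2 == 0 for i in range(2 * n)]
    odd = [not m for m in even]          # inverted predicate for the second mv
    all_on = [True] * n
    twin_pred_mv(dest, a, all_on, even)  # 1st mv: a fills the even slots
    twin_pred_mv(dest, b, all_on, odd)   # 2nd mv: b fills the odd slots
    return dest


print(merge_interleave([1, 2, 3, 4], [10, 20, 30, 40]))
```

Since the two destination predicates are exact inverses, the two mv ops never write the same slot, which is what would allow them to be interleaved into the same backend ALUs.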