# Scalar OpenPOWER Audio and Video Opcodes

The fundamental principle of SV is a hardware for-loop. Therefore the first (and in nearly 100% of cases the only) place to put Vector operations is in the *scalar* ISA. However, only by analysing those scalar opcodes *in* an SV Vectorisation context does it become clear why they are needed and how they may be designed.

This page therefore has an accompanying discussion at <https://bugs.libre-soc.org/show_bug.cgi?id=230> for the evolution of suitable opcodes.
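
The hardware for-loop principle can be illustrated with a minimal sketch. This is not the SV specification: the register file is modelled as a plain Python list, and the names `sv_loop`, `rt`, `ra`, `rb` and `vl` are hypothetical, chosen only for illustration.

```python
def sv_loop(regs, op, rt, ra, rb, vl):
    """Apply one scalar operation to vl consecutive registers.

    Hypothetical model: regs is a plain list standing in for the
    register file, and op is any two-operand scalar operation.
    """
    for i in range(vl):
        regs[rt + i] = op(regs[ra + i], regs[rb + i])
    return regs


regs = [0] * 32
regs[8:12] = [1, 2, 3, 4]
regs[16:20] = [10, 20, 30, 40]

# A scalar add, repeated over four elements by the loop.
sv_loop(regs, lambda a, b: a + b, rt=24, ra=8, rb=16, vl=4)
```

The point of the sketch is that only the scalar operation (`op`) needs to be defined: the loop is supplied separately.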

# Audio

The fundamental principle for these instructions is:

* identify the scalar primitive
* assume that longer runs of scalars will have Simple-V vectorisation applied
* assume that "swizzle" may be applied at the (vec2 - SUBVL=2) Vector level

Thus for example, where OpenPOWER VSX has vpkswss, this would be achieved in SV with simply:

* addition of a scalar ext/clamp instruction
* 1st op, swizzle-selection vec2 "select X only" from source to dest:
  dest.X = extclamp(src.X)
* 2nd op, swizzle-selection vec2 "select Y only" from source to dest:
  dest.Y = extclamp(src.Y)

Macro-op fusion may be used to detect that these two interleave cleanly.
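
The two-op sequence above can be sketched as follows. This assumes that `extclamp` saturates a signed value to a signed 16-bit range (as vpkswss does for 32-bit words); the exact semantics of the proposed scalar instruction are not defined on this page, so `extclamp`, its `bits` parameter and `pack_vec2` are illustrative only.

```python
def extclamp(value, bits=16):
    """Clamp a signed integer to the signed range of `bits` bits.

    Assumed behaviour for illustration; the real instruction may differ.
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def pack_vec2(src):
    """Model the two swizzled scalar ops on a vec2 with fields X and Y."""
    dest = {"X": 0, "Y": 0}
    dest["X"] = extclamp(src["X"])  # 1st op: swizzle selects X only
    dest["Y"] = extclamp(src["Y"])  # 2nd op: swizzle selects Y only
    return dest
```

Because the two ops touch disjoint fields of `dest`, neither depends on the other, which is what would make them candidates for fusion.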

# Video

TODO

## VSX SIMD

### vpkpx

vpkpx is a 32-bit to 16-bit conversion from 8888 to 1555 pixel format.

SV notes:

A single 32-bit to 16-bit operation should suffice, fitting cleanly into one scalar op:

    dest[0] = src[7]
    dest[1 : 5] = src[8 :12]
    dest[6 :10] = src[16:20]
    dest[11:15] = src[24:28]
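
The mapping above uses Power ISA bit numbering (MSB0, where bit 0 is the most significant bit). A minimal sketch of the same scalar operation using ordinary shifts and masks follows; the function name `pack_8888_to_1555` is hypothetical.

```python
def pack_8888_to_1555(src):
    """Pack one 32-bit 8888 pixel into one 16-bit 1555 pixel.

    MSB0 bit k of a 32-bit word is LSB0 bit (31 - k), so each
    MSB0 field is extracted with a right shift and a mask.
    """
    src &= 0xFFFFFFFF
    top = (src >> 24) & 0x1   # src[7]: lowest bit of the top byte
    c1 = (src >> 19) & 0x1F   # src[8:12]: top 5 bits of byte 1
    c2 = (src >> 11) & 0x1F   # src[16:20]: top 5 bits of byte 2
    c3 = (src >> 3) & 0x1F    # src[24:28]: top 5 bits of byte 3
    # dest[0], dest[1:5], dest[6:10], dest[11:15] in MSB0 order
    return (top << 15) | (c1 << 10) | (c2 << 5) | c3
```

For example, `pack_8888_to_1555(0x00F80000)` gives `0x7C00`: only the first 5-bit channel is set.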

### vpks[*][*]s

Signed and unsigned, these are N-to-M (N=64/32/16, M=32/16/8) chop/clamp/sign/zero-extend operations.
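
One way to picture the clamping variants as a single scalar primitive is sketched below. The function `pack_clamp` and its parameters are hypothetical: the sketch assumes the source bit pattern is first interpreted as signed or unsigned, then saturated to the destination range.

```python
def pack_clamp(raw, src_bits, dst_bits, src_signed, dst_signed):
    """Saturate a src_bits-wide value into dst_bits and return its bit pattern."""
    raw &= (1 << src_bits) - 1
    # Interpret the source bit pattern as signed or unsigned.
    if src_signed and raw >> (src_bits - 1):
        value = raw - (1 << src_bits)
    else:
        value = raw
    # Select the destination range to clamp to.
    if dst_signed:
        lo, hi = -(1 << (dst_bits - 1)), (1 << (dst_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << dst_bits) - 1
    clamped = max(lo, min(hi, value))
    return clamped & ((1 << dst_bits) - 1)
```

With `src_bits=32, dst_bits=16`, setting both flags to `True` corresponds to the signed-to-signed case, `src_signed=True, dst_signed=False` to signed-to-unsigned, and both `False` to unsigned-to-unsigned.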

### vupkhpx / vupklpx

These are 16-bit to 32-bit conversions from 1555 to 8888 pixel format.
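
A possible scalar form is sketched below. It rests on an assumption not stated on this page: that, as in the Power ISA definition of these instructions, the 1-bit field is sign-extended to 8 bits and each 5-bit field is zero-extended to 8 bits. The name `unpack_1555_to_8888` is hypothetical. The high/low distinction between vupkhpx and vupklpx only selects which half of the source vector is read, so a single scalar op would cover both.

```python
def unpack_1555_to_8888(src):
    """Unpack one 16-bit 1555 pixel into one 32-bit 8888 pixel.

    Assumption: 1-bit field sign-extended, 5-bit fields zero-extended.
    """
    src &= 0xFFFF
    top = 0xFF if (src >> 15) & 0x1 else 0x00  # sign-extend 1 bit to 8
    c1 = (src >> 10) & 0x1F                    # zero-extended to 8 bits
    c2 = (src >> 5) & 0x1F
    c3 = src & 0x1F
    return (top << 24) | (c1 << 16) | (c2 << 8) | c3
```

Under this assumption the unpack is not the exact inverse of the vpkpx packing, which keeps the top 5 bits of each byte.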

### vavgs*

Signed and unsigned, 8/16/32: these are all of the form:

    result = truncate((a + b + 1) >> 1)
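
The formula above only works if the intermediate sum is held in more than the element width, otherwise `a + b + 1` can overflow. A minimal sketch follows; `avg_round` and its parameters are hypothetical names, and Python's unbounded integers stand in for the wider intermediate.

```python
def avg_round(a, b, bits, signed):
    """Rounded average of two bits-wide elements, returned as a bit pattern."""
    mask = (1 << bits) - 1

    def to_int(raw):
        # Interpret a bit pattern as a signed or unsigned integer.
        raw &= mask
        if signed and raw >> (bits - 1):
            return raw - (1 << bits)
        return raw

    total = to_int(a) + to_int(b) + 1  # wider than bits: no overflow
    return (total >> 1) & mask         # arithmetic shift, then truncate
```

For example, with `bits=8` and `signed=False`, averaging 255 and 255 gives 255, which a naive 8-bit addition would get wrong.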

### vabsdu*

Unsigned 8/16/32: these are all of the form:

    result = (src1 > src2) ? truncate(src1-src2) :
                             truncate(src2-src1)
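
The same selection can be written as a short runnable sketch. The name `absdiff_unsigned` and the `bits` parameter are hypothetical.

```python
def absdiff_unsigned(src1, src2, bits):
    """Absolute difference of two unsigned bits-wide elements."""
    mask = (1 << bits) - 1
    src1 &= mask
    src2 &= mask
    # Always subtract the smaller from the larger, so the
    # difference is never negative and always fits in bits.
    diff = src1 - src2 if src1 > src2 else src2 - src1
    return diff & mask
```

Ordering the subtraction this way means no intermediate value is ever negative, so no wider intermediate is needed.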

### vmaxs* / vmaxu* (and min)

Signed and unsigned, 8/16/32: these are all of the form:

    result = (src1 > src2) ? src1 : src2 # max
    result = (src1 < src2) ? src1 : src2 # min
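
The signed and unsigned variants differ only in how the bit patterns are interpreted before the comparison. A minimal sketch, with hypothetical names `to_int`, `elem_max` and `elem_min`:

```python
def to_int(raw, bits, signed):
    """Interpret a bits-wide bit pattern as a signed or unsigned integer."""
    raw &= (1 << bits) - 1
    if signed and raw >> (bits - 1):
        return raw - (1 << bits)
    return raw


def elem_max(src1, src2, bits, signed):
    """Return the bit pattern of the larger element."""
    mask = (1 << bits) - 1
    a, b = to_int(src1, bits, signed), to_int(src2, bits, signed)
    return (src1 if a > b else src2) & mask


def elem_min(src1, src2, bits, signed):
    """Return the bit pattern of the smaller element."""
    mask = (1 << bits) - 1
    a, b = to_int(src1, bits, signed), to_int(src2, bits, signed)
    return (src1 if a < b else src2) & mask
```

For 8-bit elements, `0xFF` is 255 when unsigned but -1 when signed, so `elem_max(0xFF, 0x01, 8, False)` returns `0xFF` while `elem_max(0xFF, 0x01, 8, True)` returns `0x01`.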