# Scalar OpenPOWER Audio and Video Opcodes

The fundamental principle of SV is a hardware for-loop. Therefore the first (and in nearly 100% of cases, the only) place to put Vector operations is in the *scalar* ISA. However, only by analysing those scalar opcodes *in* an SV Vectorisation context does it become clear why they are needed and how they may be designed.

This page therefore has accompanying discussion at <https://bugs.libre-soc.org/show_bug.cgi?id=230> on the evolution of suitable opcodes.
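
To make the hardware for-loop concrete, here is a minimal Python sketch. It assumes a deliberately simplified model in which SV just re-issues one scalar operation VL times over consecutive registers. `sv_loop`, `add64` and the flat `regs` list are hypothetical names for illustration only; predication, element-width overrides, SUBVL and the rest of SV are ignored.

```python
from typing import Callable, List

MASK64 = (1 << 64) - 1


def add64(a: int, b: int) -> int:
    """Scalar 64-bit add, wrapping on overflow like a hardware register."""
    return (a + b) & MASK64


def sv_loop(regs: List[int], op: Callable[[int, int], int],
            rt: int, ra: int, rb: int, vl: int) -> None:
    """Hypothetical SV model: issue the same scalar op once per element, VL times.

    Each iteration steps the register numbers by one, so a "vector" is
    simply a run of consecutive scalar registers.
    """
    for i in range(vl):
        regs[rt + i] = op(regs[ra + i], regs[rb + i])


regs = [0] * 32
regs[8:12] = [1, 2, 3, 4]
regs[12:16] = [10, 20, 30, 40]
sv_loop(regs, add64, rt=16, ra=8, rb=12, vl=4)
print(regs[16:20])  # [11, 22, 33, 44]
```

Note that `add64` itself is unchanged: vectorising it needs only the loop, which is why the scalar primitive is designed first.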

# Audio

The fundamental principle for these instructions is:

* identify the scalar primitive
* assume that longer runs of scalars will have Simple-V vectorisation applied
* assume that "swizzle" may be applied at the (vec2 - SUBVL=2) Vector level

Thus, for example, where OpenPOWER VSX has vpkswss, this would be achieved in SV simply with:

* addition of a scalar ext/clamp instruction
* 1st op, swizzle-select vec2 "select X only" from source to dest:
  dest.X = extclamp(src.X)
* 2nd op, swizzle-select vec2 "select Y only" from source to dest:
  dest.Y = extclamp(src.Y)
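
A minimal Python sketch of those two swizzled ops, assuming that `extclamp` for the vpkswss case means saturating a signed 32-bit word to the signed 16-bit range. The `Vec2` type and `pack_vec2` function are hypothetical and only model the element selection, not any real encoding.

```python
from typing import NamedTuple


class Vec2(NamedTuple):
    """Hypothetical vec2 (SUBVL=2) element: X and Y as plain signed ints."""
    x: int
    y: int


def extclamp(value: int) -> int:
    """Assumed semantics: clamp a signed 32-bit value to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, value))


def pack_vec2(src: Vec2) -> Vec2:
    # 1st op: swizzle "select X only" -> dest.X = extclamp(src.X)
    x = extclamp(src.x)
    # 2nd op: swizzle "select Y only" -> dest.Y = extclamp(src.Y)
    y = extclamp(src.y)
    return Vec2(x, y)


print(pack_vec2(Vec2(100000, -100000)))  # Vec2(x=32767, y=-32768)
```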

Macro-op fusion may be used to detect that these two ops interleave cleanly.

# Video

TODO

## VSX SIMD

### vpkpx

vpkpx is a 32-bit to 16-bit pixel pack: an 8888 into 1555 conversion.

SV notes:

A single 32-bit to 16-bit operation should suffice, fitting cleanly into one scalar op (bits are numbered MSB0, bit 0 being the most significant, as in the Power ISA):

    dest[0]     = src[7]
    dest[1 : 5] = src[8 :12]
    dest[6 :10] = src[16:20]
    dest[11:15] = src[24:28]
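
The mapping above can be checked with a small Python model of a single pixel. It assumes MSB0 bit numbering (bit 0 is the most significant), as used throughout the Power ISA; `msb0_field` and `pkpx` are hypothetical helper names, not existing instructions or library calls.

```python
def msb0_field(value: int, width: int, start: int, end: int) -> int:
    """Extract bits start..end (inclusive, MSB0 numbering) of a width-bit value."""
    length = end - start + 1
    shift = width - 1 - end
    return (value >> shift) & ((1 << length) - 1)


def pkpx(src: int) -> int:
    """Pack one 32-bit 8888 pixel into a 16-bit 1555 pixel."""
    a = msb0_field(src, 32, 7, 7)    # dest[0]     = src[7]
    r = msb0_field(src, 32, 8, 12)   # dest[1:5]   = src[8:12]
    g = msb0_field(src, 32, 16, 20)  # dest[6:10]  = src[16:20]
    b = msb0_field(src, 32, 24, 28)  # dest[11:15] = src[24:28]
    # dest[0] is the MSB of the 16-bit result, so it sits at LSB0 bit 15.
    return (a << 15) | (r << 10) | (g << 5) | b


print(hex(pkpx(0x00800000)))  # 0x4000
```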

### vpks[*][*]s

Signed and unsigned: these are N-to-M (N=64/32/16, M=32/16/8) chop/clamp/sign/zero-extend operations.
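
As a sketch of the clamp step only, assuming a signed N-bit source (as the vpks prefix implies) and an M-bit destination that is either signed or unsigned, the helpers below saturate and then chop the result to M bits. `to_signed` and `pack_saturate` are hypothetical names.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack_saturate(value: int, n: int, m: int, signed_dest: bool) -> int:
    """Saturate a signed n-bit element to an m-bit signed or unsigned range."""
    src = to_signed(value, n)
    if signed_dest:
        lo, hi = -(1 << (m - 1)), (1 << (m - 1)) - 1
    else:
        lo, hi = 0, (1 << m) - 1
    clamped = max(lo, min(hi, src))
    # Chop to m bits; negative results become their two's-complement pattern.
    return clamped & ((1 << m) - 1)


print(hex(pack_saturate(0xFFFF0000, 32, 16, signed_dest=True)))  # 0x8000
```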

### vupkhpx / vupklpx

These are 16-bit to 32-bit 1555 to 8888 conversions, unpacking the high-order and low-order halves of the source register respectively.
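
A Python sketch of one pixel's unpack, assuming the Power ISA behaviour in which the 1-bit alpha is sign-extended (replicated) into all 8 bits and each 5-bit colour field is zero-extended to 8 bits. `upkpx` is a hypothetical name.

```python
def upkpx(src: int) -> int:
    """Unpack one 16-bit 1555 pixel into a 32-bit 8888 word (assumed layout)."""
    a = (src >> 15) & 0x1   # MSB0 bit 0
    r = (src >> 10) & 0x1F  # MSB0 bits 1:5
    g = (src >> 5) & 0x1F   # MSB0 bits 6:10
    b = src & 0x1F          # MSB0 bits 11:15
    alpha = 0xFF if a else 0x00  # sign-extend the 1-bit alpha to 8 bits
    # Zero-extend each 5-bit field into the low bits of its byte.
    return (alpha << 24) | (r << 16) | (g << 8) | b


print(hex(upkpx(0xFFFF)))  # 0xff1f1f1f
```

Because each 5-bit field lands in the low bits of its byte, while the vpkpx mapping above takes the high 5 bits of each byte, an unpack followed by a pack does not in general return the original 16-bit value.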
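
### vavgs* / vavgu*

Signed and unsigned, 8/16/32: these are all of the form below, where the sum is computed at least one bit wider than the operands so that the carry is not lost before the shift:

    result = truncate((a + b + 1) >> 1)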
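
A Python sketch for one element, assuming `truncate` means keeping the low N bits of the result. Python integers are unbounded, so the `a + b + 1` sum cannot overflow, which models the extra carry bit the hardware needs. `to_signed` and `vavg` are hypothetical names.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vavg(a: int, b: int, bits: int, signed: bool) -> int:
    """Rounded average of two `bits`-wide elements, returned as a raw bit pattern."""
    mask = (1 << bits) - 1
    if signed:
        a, b = to_signed(a, bits), to_signed(b, bits)
    else:
        a, b = a & mask, b & mask
    # Python's >> on a negative int is an arithmetic (flooring) shift.
    return ((a + b + 1) >> 1) & mask


print(hex(vavg(0xFF, 0x00, 8, signed=False)))  # 0x80
print(hex(vavg(0xFF, 0x00, 8, signed=True)))   # 0x0
```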

### vabsdu*

Unsigned 8/16/32: these are all of the form:

    result = (src1 > src2) ? truncate(src1-src2) :
                             truncate(src2-src1)
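
A Python sketch for one unsigned element; `vabsdu` is a hypothetical name. Since the smaller operand is always subtracted from the larger, the difference already fits in N bits, so the `truncate` in the pseudocode only masks.

```python
def vabsdu(src1: int, src2: int, bits: int) -> int:
    """Unsigned absolute difference of two `bits`-wide elements."""
    mask = (1 << bits) - 1
    a, b = src1 & mask, src2 & mask
    # truncate(): keep the low `bits` bits (a no-op here, as the result fits)
    return (a - b) & mask if a > b else (b - a) & mask


print(vabsdu(3, 250, 8))  # 247
```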

### vmaxs* / vmaxu* (and min)

Signed and unsigned, 8/16/32: these are all of the form below, with the comparison signed for vmaxs* / vmins* and unsigned for vmaxu* / vminu*:

    result = (src1 > src2) ? src1 : src2 # max
    result = (src1 < src2) ? src1 : src2 # min

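A Python sketch showing why both signed and unsigned forms are needed: the same bit patterns compare differently depending on signedness. `to_signed`, `vmax` and `vmin` are hypothetical names, and results are returned as raw N-bit patterns.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def _operands(src1: int, src2: int, bits: int, signed: bool) -> tuple:
    """Interpret both elements as signed or unsigned `bits`-wide integers."""
    mask = (1 << bits) - 1
    if signed:
        return to_signed(src1, bits), to_signed(src2, bits)
    return src1 & mask, src2 & mask


def vmax(src1: int, src2: int, bits: int, signed: bool) -> int:
    a, b = _operands(src1, src2, bits, signed)
    return (a if a > b else b) & ((1 << bits) - 1)


def vmin(src1: int, src2: int, bits: int, signed: bool) -> int:
    a, b = _operands(src1, src2, bits, signed)
    return (a if a < b else b) & ((1 << bits) - 1)


# 0x80 is -128 when signed but 128 when unsigned
print(hex(vmax(0x80, 0x7F, 8, signed=True)))   # 0x7f
print(hex(vmax(0x80, 0x7F, 8, signed=False)))  # 0x80
```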