# Scalar OpenPOWER Audio and Video Opcodes

The fundamental principle of SV is a hardware for-loop. Therefore the first (and in nearly 100% of cases the only) place to put Vector operations is in the *scalar* ISA. However, only by analysing those scalar opcodes *in* an SV Vectorisation context does it become clear why they are needed and how they may be designed.

This page therefore has an accompanying discussion for the evolution of suitable opcodes.

# Audio

The fundamental principle for these instructions is:

* identify the scalar primitive
* assume that longer runs of scalars will have Simple-V vectorisation applied
* assume that "swizzle" may be applied at the (vec2 - SUBVL=2) Vector level, in order to perform the necessary HI/LO selection normally hard-coded into SIMD ISAs.

Thus for example, where OpenPOWER VSX has vpkswss, this would be achieved in SV with simply:

* addition of a scalar ext/clamp instruction
* 1st op, swizzle-selection vec2 "select X only" from source to dest: dest.X = extclamp(src.X)
* 2nd op, swizzle-selection vec2 "select Y only" from source to dest: dest.Y = extclamp(src.Y)

Macro-op fusion may be used to detect that these two interleave cleanly, overlapping the vec2.X with vec2.Y to produce a single vec2.XY operation.

## Scalar element operations

* clamping / saturation for signed and unsigned. Best done in a similar way to FP rounding modes, i.e. with an SPR.
* average-add: result = (src1 + src2 + 1) >> 1
* abs-diff: result = (src1 > src2) ? (src1-src2) : (src2-src1)
* signed min/max

# Video

TODO

# VSX SIMD

## vpkpx

vpkpx is a 32-bit to 16-bit 8888 into 1555 conversion.

SV notes: a single 32-bit to 16-bit operation should suffice, fitting cleanly into one single scalar op:

    dest[0] = src[7]
    dest[1 : 5] = src[8 :12]
    dest[6 :10] = src[16:20]
    dest[11:15] = src[24:28]

## vpks[\*][\*]s

signed and unsigned, these are N-to-M (N=64/32/16, M=32/16/8) chop/clamp/sign/zero-extend operations

## vupkhpx / vupklpx

these are 16-bit to 32-bit 1555 to 8888 conversions

## vavgs\*

signed and unsigned, 8/16/32: these are all of the form:

    result = truncate((a + b + 1) >> 1)

## vabsdu\*

unsigned 8/16/32: these are all of the form:

    result = (src1 > src2) ? truncate(src1-src2) :
                             truncate(src2-src1)

## vmaxs\* / vmaxu\* (and min)

signed and unsigned, 8/16/32: these are all of the form:

    result = (src1 > src2) ? src1 : src2 # max
    result = (src1 < src2) ? src1 : src2 # min
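
The scalar primitives on this page (clamp/saturate, average-add, abs-diff, min/max and the vpkpx bit selection) can be sketched as a small Python reference model. This is an illustrative sketch only, not the proposed instruction semantics: every function name here is hypothetical, `truncate` is assumed to mean "keep the low N bits", the vpkpx bit ranges are assumed to use Power ISA MSB0 numbering (bit 0 is the most significant bit), and saturation width is passed as an argument rather than read from an SPR. `sv_loop` models the "hardware for-loop" idea by applying one scalar op across elements.

```python
def truncate(value, bits):
    """Keep only the low `bits` bits of value (assumed meaning of truncate)."""
    return value & ((1 << bits) - 1)


def sat_clamp(value, bits, signed=True):
    """Saturate value to the signed or unsigned range of a `bits`-wide element."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))


def avg_add(a, b, bits=32):
    """Average-add: truncate((a + b + 1) >> 1).

    Python integers are unbounded, so the intermediate sum cannot overflow.
    """
    return truncate((a + b + 1) >> 1, bits)


def abs_diff(a, b, bits=32):
    """Abs-diff: (a > b) ? truncate(a - b) : truncate(b - a)."""
    return truncate(a - b, bits) if a > b else truncate(b - a, bits)


def elem_max(a, b):
    """max: (a > b) ? a : b"""
    return a if a > b else b


def elem_min(a, b):
    """min: (a < b) ? a : b"""
    return a if a < b else b


def vpkpx_scalar(src):
    """Pack one 32-bit 8888 pixel into 16-bit 1555.

    Assumes MSB0 bit numbering, so src[7] is LSB0 bit 24 of the word.
    """
    top = (src >> 24) & 0x1    # src[7]     -> dest[0]
    c1 = (src >> 19) & 0x1F    # src[8:12]  -> dest[1:5]
    c2 = (src >> 11) & 0x1F    # src[16:20] -> dest[6:10]
    c3 = (src >> 3) & 0x1F     # src[24:28] -> dest[11:15]
    return (top << 15) | (c1 << 10) | (c2 << 5) | c3


def pack_vec2(src, bits):
    """Two scalar ext/clamp ops on a vec2, one per swizzle selection."""
    dest = [0, 0]
    dest[0] = sat_clamp(src[0], bits)  # 1st op: select X only
    dest[1] = sat_clamp(src[1], bits)  # 2nd op: select Y only
    return dest


def sv_loop(op, *vectors):
    """Model of the hardware for-loop: apply a scalar op element by element."""
    return [op(*elems) for elems in zip(*vectors)]
```

For example, `sv_loop(abs_diff, [3, 10], [10, 3])` applies the scalar abs-diff to each element pair in turn, and `pack_vec2([100000, -5], 16)` clamps X and Y separately, which is the pair of operations that macro-op fusion could merge into a single vec2.XY operation.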