+[[!tag standards]]
+
# Scalar OpenPOWER Audio and Video Opcodes
The fundamental principle of SV is a hardware for-loop. Therefore the first (and in nearly 100% of cases the only) place to put Vector operations is in the *scalar* ISA. However, only by analysing those scalar opcodes *in* an SV Vectorisation context does it become clear why they are needed and how they may be designed.
-This page therefore has acompanying discussion at <https://bugs.libre-soc.org/show_bug.cgi?id=230> for evolution of suitable opcodes.
+This page therefore has accompanying discussion at <https://bugs.libre-soc.org/show_bug.cgi?id=230> for evolution of suitable opcodes.
+
+Links
+
+* <https://bugs.libre-soc.org/show_bug.cgi?id=915> add overflow to maxmin.
+* <https://bugs.libre-soc.org/show_bug.cgi?id=863> add pseudocode etc.
+* <https://bugs.libre-soc.org/show_bug.cgi?id=234> hardware implementation
+* <https://bugs.libre-soc.org/show_bug.cgi?id=910> mins/maxs zero-option?
+* <https://bugs.libre-soc.org/show_bug.cgi?id=1057> move all int/fp min/max to ls013
+* [[vpu]]
+* [[sv/int_fp_mv]]
+* [[openpower/isa/av]] pseudocode
+* [[av_opcodes/analysis]]
+* TODO review HP 1994-6 PA-RISC MAX <https://en.m.wikipedia.org/wiki/Multimedia_Acceleration_eXtensions>
+* <https://en.m.wikipedia.org/wiki/Sum_of_absolute_differences>
+* List of MMX instructions <https://cs.fit.edu/~mmahoney/cse3101/mmx.html>
+
+# Summary
+
+In summary, the base scalar operations that need to be added are:
+
+| instruction     | pseudocode                                      |
+| --------------- | ----------------------------------------------- |
+| average-add     | result = (src1 + src2 + 1) >> 1                 |
+| abs-diff        | result = abs (src1-src2)                        |
+| abs-accumulate  | result += abs (src1-src2)                       |
+| (un)signed min  | result = (src1 < src2) ? src1 : src2 [[ls013]]  |
+| (un)signed max  | result = (src1 > src2) ? src1 : src2 [[ls013]]  |
+| bitwise sel     | (a ? b : c) - use [[sv/bitmanip]] ternary       |
+| int/fp move     | covered by REMAP and Pack/Unpack                |
+
+Implemented at the [[openpower/isa/av]] pseudocode page.
+
+All other capabilities (saturate in particular) are achieved with [[sv/svp64]] modes and swizzle. Note that minmax and ternary are added in bitmanip.
+
+# Instructions
+
+## Average Add
+
+X-Form
+
+* avgadd RT,RA,RB (Rc=0)
+* avgadd. RT,RA,RB (Rc=1)
+
+Pseudo-code:
+
+    a <- [0] * (XLEN+1)
+    b <- [0] * (XLEN+1)
+    a[1:XLEN] <- (RA)
+    b[1:XLEN] <- (RB)
+    r <- (a + b + 1)
+    RT <- r[0:XLEN-1]
+
+Special Registers Altered:
-# Audio
+ CR0 (if Rc=1)
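
The avgadd pseudo-code can be modelled in a few lines of Python as a sanity check. This is only a sketch, not the reference implementation: the function name `avgadd` and the `xlen` parameter are illustrative, and it assumes (as the zero-filled top bit in the pseudo-code suggests) that both operands are zero-extended to XLEN+1 bits, so that taking bits 0 to XLEN-1 in MSB0 numbering is the same as a right-shift by one.

```python
def avgadd(ra: int, rb: int, xlen: int = 64) -> int:
    """Illustrative model of avgadd: (RA + RB + 1) >> 1 without overflow."""
    mask = (1 << xlen) - 1
    a = ra & mask          # zero-extended to XLEN+1 bits
    b = rb & mask
    r = a + b + 1          # always fits in XLEN+1 bits
    # r[0:XLEN-1] in MSB0 numbering drops the least significant bit
    return (r >> 1) & mask


print(hex(avgadd(0xFF, 0x01, xlen=8)))   # carry out of bit 7 is not lost
print(hex(avgadd(0xFF, 0xFF, xlen=8)))
```

The extra bit is the whole point of the intermediate: a plain XLEN-bit add would lose the carry before the shift.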
-The fundamental principle for these instructions is:
+## Absolute Signed Difference
-* identify the scalar primitive
-* assume that longer runs of scalars will have Simple-V vectorisatin applied
-* assume that "swizzle" may be applied at the (vec2 - SUBVL=2) Vector level,
- in order to perform the necessary HI/LO selection normally hard-coded
- into SIMD ISAs.
+X-Form
-Thus for example, where OpenPOWER VSX has vpkswss, this would be achieved in SV with simply:
+* absds RT,RA,RB (Rc=0)
+* absds. RT,RA,RB (Rc=1)
-* addition of a scalar ext/clamp instruction
-* 1st op, swizzle-selection vec2 "select X only" from source to dest:
- dest.X = extclamp(src.X)
-* 2nd op, swizzle-select vec2 "select Y only" from source to dest
- dest.Y = extclamp(src.Y)
+Pseudo-code:
-Macro-op fusion may be used to detect that these two interleave cleanly, overlapping the vec2.X with vec2.Y to produce a single vec2.XY operation.
+    if (RA) < (RB) then RT <- ¬(RA) + (RB) + 1
+    else                RT <- ¬(RB) + (RA) + 1
-## Scalar element operations
+Special Registers Altered:
-* clamping / saturation for signed and unsigned. best done similar to FP rounding modes, i.e. with an SPR.
-* average-add. result = (src1 + src2 + 1) >> 1
-* abs-diff: result = (src1 > src2) ? (src1-src2) : (src2-src1)
-* signed min/max
+ CR0 (if Rc=1)
-# Video
+## Absolute Unsigned Difference
-TODO
+X-Form
-# VSX SIMD
+* absdu RT,RA,RB (Rc=0)
+* absdu. RT,RA,RB (Rc=1)
-## vpkpx
+Pseudo-code:
-vpkpx is a 32-bit to 16-bit 8888 into 1555 conversion
+    if (RA) <u (RB) then RT <- ¬(RA) + (RB) + 1
+    else                 RT <- ¬(RB) + (RA) + 1
-SV notes:
+Special Registers Altered:
-a single 32-bit to 16-bit operation should suffice, fitting cleanly into one single scalar op:
+ CR0 (if Rc=1)
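
The signed and unsigned difference pseudo-code differs only in the comparison. A minimal Python sketch makes that visible. The names `absds`, `absdu`, `to_signed` and the `xlen` parameter are illustrative only; registers are modelled as unsigned Python integers truncated to `xlen` bits, and `¬(x) + y + 1` is written out literally as `~x + y + 1`.

```python
def to_signed(x: int, xlen: int) -> int:
    """Interpret an xlen-bit register value as two's complement."""
    x &= (1 << xlen) - 1
    return x - (1 << xlen) if x >> (xlen - 1) else x


def absdu(ra: int, rb: int, xlen: int = 64) -> int:
    """Illustrative model of absdu: unsigned compare, then subtract."""
    mask = (1 << xlen) - 1
    a, b = ra & mask, rb & mask
    if a < b:
        return (~a + b + 1) & mask   # RB - RA
    return (~b + a + 1) & mask       # RA - RB


def absds(ra: int, rb: int, xlen: int = 64) -> int:
    """Illustrative model of absds: signed compare, same subtraction."""
    mask = (1 << xlen) - 1
    a, b = ra & mask, rb & mask
    if to_signed(a, xlen) < to_signed(b, xlen):
        return (~a + b + 1) & mask
    return (~b + a + 1) & mask


# 0xFF is 255 unsigned but -1 signed, so the two results differ
print(absdu(0xFF, 0x01, xlen=8), absds(0xFF, 0x01, xlen=8))
```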
- dest[0] = src[7]
- dest[1 : 5] = src[8 :12]
- dest[6 :10] = src[16:20]
- dest[11:15] = src[24:28]
+## Absolute Accumulate Unsigned Difference
-## vpks[\*][\*]s
+X-Form
-signed and unsigned, these are N-to-M (N=64/32/16, M=32/16/8) chop/clamp/sign/zero-extend operations
-
-## vupkhpx / vupklpx
+* absdacu RT,RA,RB (Rc=0)
+* absdacu. RT,RA,RB (Rc=1)
-these are 16-bit to 32-bit 1555 to 8888 conversion
+Pseudo-code:
-## vavgs\*
+    if (RA) <u (RB) then r <- ¬(RA) + (RB) + 1
+    else                 r <- ¬(RB) + (RA) + 1
+    RT <- (RT) + r
-signed and unsigned, 8/16/32: these are all of the form:
+Special Registers Altered:
- result = truncate((a + b + 1) >> 1))
+ CR0 (if Rc=1)
-## vabsdu\*
+## Absolute Accumulate Signed Difference
-unsigned 8/16/32: these are all of the form:
+X-Form
- result = (src1 > src2) ? truncate(src1-src2) :
- truncate(src2-src1)
+* absdacs RT,RA,RB (Rc=0)
+* absdacs. RT,RA,RB (Rc=1)
-## vmaxs\* / vmaxu\* (and min)
+Pseudo-code:
-signed and unsigned, 8/16/32: these are all of the form:
+    if (RA) < (RB) then r <- ¬(RA) + (RB) + 1
+    else                r <- ¬(RB) + (RA) + 1
+    RT <- (RT) + r
- result = (src1 > src2) ? src1 : src2 # max
- result = (src1 < src2) ? src1 : src2 # min
+Special Registers Altered:
+ CR0 (if Rc=1)
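
As a rough illustration of why the accumulating forms are useful, the sketch below models the unsigned accumulate in Python and uses it in a Sum of Absolute Differences loop. The names `absdacu` and `sad` and the `xlen` parameter are hypothetical, and the plain Python `for` loop merely stands in for whatever looping the hardware applies; nothing here describes how SV itself schedules the accumulation.

```python
def absdacu(rt: int, ra: int, rb: int, xlen: int = 64) -> int:
    """Illustrative model of absdacu: RT <- (RT) + |RA - RB|, unsigned."""
    mask = (1 << xlen) - 1
    a, b = ra & mask, rb & mask
    r = (b - a) if a < b else (a - b)
    return (rt + r) & mask           # accumulation wraps at xlen bits


def sad(xs, ys, xlen: int = 64) -> int:
    """Sum of absolute differences built from repeated absdacu steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = absdacu(acc, x, y, xlen)
    return acc


print(sad([1, 5, 9], [4, 5, 2]))     # 3 + 0 + 7
```

Note that the accumulator in this model wraps silently at `xlen` bits.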