(no commit message)

diff --git a/openpower/sv/av_opcodes.mdwn b/openpower/sv/av_opcodes.mdwn

index 089644a8fce47602980456c09a8bf460f0206d97..9614d0090532bd0329826cb1b05c0b4224708bfb 100644
--- a/openpower/sv/av_opcodes.mdwn
+++ b/openpower/sv/av_opcodes.mdwn
@@ -1,80 +1,128 @@
+[[!tag standards]]
+
  # Scalar OpenPOWER Audio and Video Opcodes
  
  the fundamental principle of SV is a hardware for-loop. therefore the first (and in nearly 100% of cases only) place to put Vector operations is first and foremost in the *scalar* ISA.  However only by analysing those scalar opcodes *in* a SV Vectorisation context does it become clear why they are needed and how they may be designed.
  
-This page therefore has acompanying discussion at <https://bugs.libre-soc.org/show_bug.cgi?id=230> for evolution of suitable opcodes.
+This page therefore has accompanying discussion at <https://bugs.libre-soc.org/show_bug.cgi?id=230> for evolution of suitable opcodes.
+
+Links:
+
+* <https://bugs.libre-soc.org/show_bug.cgi?id=915> add overflow to maxmin.
+* <https://bugs.libre-soc.org/show_bug.cgi?id=863> add pseudocode etc.
+* <https://bugs.libre-soc.org/show_bug.cgi?id=234> hardware implementation
+* <https://bugs.libre-soc.org/show_bug.cgi?id=910> mins/maxs zero-option?
+* <https://bugs.libre-soc.org/show_bug.cgi?id=1057> move all int/fp min/max to ls013
+* [[vpu]]
+* [[sv/int_fp_mv]]
+* [[openpower/isa/av]] pseudocode
+* [[av_opcodes/analysis]]
+* TODO review HP 1994-6 PA-RISC MAX <https://en.m.wikipedia.org/wiki/Multimedia_Acceleration_eXtensions>
+* <https://en.m.wikipedia.org/wiki/Sum_of_absolute_differences>
+* List of MMX instructions <https://cs.fit.edu/~mmahoney/cse3101/mmx.html>
+
+# Summary
+
+In advance, the base scalar operations that need to be added are:
+
+| instruction     | pseudocode                                      |
+| --------------- | ----------------------------------------------- |
+| average-add     | result = (src1 + src2 + 1) >> 1                 |
+| abs-diff        | result = abs(src1 - src2)                       |
+| abs-accumulate  | result += abs(src1 - src2)                      |
+| (un)signed min  | result = (src1 < src2) ? src1 : src2 [[ls013]]  |
+| (un)signed max  | result = (src1 > src2) ? src1 : src2 [[ls013]]  |
+| bitwise sel     | (a ? b : c) - use [[sv/bitmanip]] ternary       |
+| int/fp move     | covered by REMAP and Pack/Unpack                |
+
+These are implemented on the [[openpower/isa/av]] pseudocode page.
+
+All other capabilities (saturate in particular) are achieved with [[sv/svp64]] modes and swizzle.  Note that minmax and ternary are added in bitmanip.
+
+# Instructions
+
+## Average Add
+
+X-Form
+
+* avgadd  RT,RA,RB (Rc=0)
+* avgadd. RT,RA,RB (Rc=1)
+
+Pseudo-code:
+
+    a <- [0] * (XLEN+1)
+    b <- [0] * (XLEN+1)
+    a[1:XLEN] <- (RA)
+    b[1:XLEN] <- (RB)
+    r <- (a + b + 1)
+    RT <- r[0:XLEN-1]
+
+Special Registers Altered:
  
-# Audio
+    CR0                     (if Rc=1)
  
-The fundamental principle for these instructions is:
+## Absolute Signed Difference
  
-* identify the scalar primitive
-* assume that longer runs of scalars will have Simple-V vectorisatin applied
-* assume that "swizzle" may be applied at the (vec2 - SUBVL=2) Vector level,
-  in order to perform the necessary HI/LO selection normally hard-coded
-  into SIMD ISAs.
+X-Form
  
-Thus for example, where OpenPOWER VSX has vpkswss, this would be achieved in SV with simply:
+* absds  RT,RA,RB (Rc=0)
+* absds. RT,RA,RB (Rc=1)
  
-* addition of a scalar ext/clamp instruction
-* 1st op, swizzle-selection vec2 "select X only" from source to dest:
-  dest.X = extclamp(src.X)
-* 2nd op, swizzle-select vec2 "select Y only" from source to dest
-  dest.Y = extclamp(src.Y)
+Pseudo-code:
  
-Macro-op fusion may be used to detect that these two interleave cleanly, overlapping the vec2.X with vec2.Y to produce a single vec2.XY operation.
+    if (RA) < (RB) then RT <- ¬(RA) + (RB) + 1
+    else                RT <- ¬(RB) + (RA) + 1
  
-## Scalar element operations
+Special Registers Altered:
  
-* clamping / saturation for signed and unsigned.  best done similar to FP rounding modes, i.e. with an SPR.
-* average-add.  result = (src1 + src2 + 1) >> 1
-* abs-diff: result = (src1 > src2) ? (src1-src2) : (src2-src1)
-* signed min/max
+    CR0                     (if Rc=1)
  
-# Video
+## Absolute Unsigned Difference
  
-TODO
+X-Form
  
-# VSX SIMD
+* absdu  RT,RA,RB (Rc=0)
+* absdu. RT,RA,RB (Rc=1)
  
-## vpkpx
+Pseudo-code:
  
-vpkpx is a 32-bit to 16-bit 8888 into 1555 conversion
+    if (RA) <u (RB) then RT <- ¬(RA) + (RB) + 1
+    else                RT <- ¬(RB) + (RA) + 1
  
-SV notes:
+Special Registers Altered:
  
-a single 32-bit to 16-bit operation should suffice, fitting cleanly into one single scalar op:
+    CR0                     (if Rc=1)
  
-    dest[0]     = src[7]
-    dest[1 : 5] = src[8 :12]
-    dest[6 :10] = src[16:20]
-    dest[11:15] = src[24:28]
+## Absolute Accumulate Unsigned Difference
  
-## vpks[\*][\*]s
+X-Form
  
-signed and unsigned, these are N-to-M (N=64/32/16, M=32/16/8) chop/clamp/sign/zero-extend operations
- 
-## vupkhpx / vupklpx
+* absdacu  RT,RA,RB (Rc=0)
+* absdacu. RT,RA,RB (Rc=1)
  
-these are 16-bit to 32-bit 1555 to 8888 conversion
+Pseudo-code:
  
-## vavgs\*
+    if (RA) <u (RB) then r <- ¬(RA) + (RB) + 1
+    else                 r <- ¬(RB) + (RA) + 1
+    RT <- (RT) + r
  
-signed and unsigned, 8/16/32: these are all of the form:
+Special Registers Altered:
  
-    result = truncate((a + b + 1) >> 1))
+    CR0                     (if Rc=1)
  
-## vabsdu\*
+## Absolute Accumulate Signed Difference
  
-unsigned 8/16/32: these are all of the form:
+X-Form
  
-    result = (src1 > src2) ? truncate(src1-src2) :
-                             truncate(src2-src1)
+* absdacs  RT,RA,RB (Rc=0)
+* absdacs. RT,RA,RB (Rc=1)
  
-## vmaxs\* / vmaxu\* (and min)
+Pseudo-code:
  
-signed and unsigned, 8/16/32: these are all of the form:
+    if (RA) < (RB) then r <- ¬(RA) + (RB) + 1
+    else                r <- ¬(RB) + (RA) + 1
+    RT <- (RT) + r
  
-    result = (src1 > src2) ? src1 : src2 # max
-    result = (src1 < src2) ? src1 : src2 # min
+Special Registers Altered:
  
+    CR0                     (if Rc=1)
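
The pseudocode added in this diff can be sanity-checked with a small Python model. This is a minimal sketch and not the Libre-SOC simulator: `XLEN = 64` is an assumption (the pseudocode is parameterised by XLEN), the function names simply mirror the mnemonics, the helper `to_signed` is hypothetical, and the Rc=1 update of CR0 is not modelled. Register values are treated as XLEN-bit patterns held in Python integers.

```python
XLEN = 64                  # assumed register width
MASK = (1 << XLEN) - 1     # all-ones XLEN-bit pattern


def to_signed(x):
    """Interpret an XLEN-bit pattern as a two's-complement integer (hypothetical helper)."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def avgadd(ra, rb):
    """Zero-extend to XLEN+1 bits, add with +1 rounding, keep the top XLEN bits."""
    r = (ra & MASK) + (rb & MASK) + 1   # fits in XLEN+1 bits
    return r >> 1                       # r[0:XLEN-1] in MSB0 numbering


def absdu(ra, rb):
    """Unsigned compare, then subtract the smaller from the larger via ~x + y + 1."""
    ra &= MASK
    rb &= MASK
    if ra < rb:
        return (~ra + rb + 1) & MASK
    return (~rb + ra + 1) & MASK


def absds(ra, rb):
    """Same subtraction as absdu, but the comparison is signed."""
    ra &= MASK
    rb &= MASK
    if to_signed(ra) < to_signed(rb):
        return (~ra + rb + 1) & MASK
    return (~rb + ra + 1) & MASK


def absdacu(rt, ra, rb):
    """Accumulate the unsigned absolute difference into RT, wrapping at XLEN bits."""
    return (rt + absdu(ra, rb)) & MASK


def absdacs(rt, ra, rb):
    """Accumulate the signed absolute difference into RT, wrapping at XLEN bits."""
    return (rt + absds(ra, rb)) & MASK
```

With these definitions, `absdu(MASK, 1)` treats the all-ones pattern as a large unsigned value, while `absds(MASK, 1)` treats it as -1 and returns 2, which is the only difference between the signed and unsigned variants.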