From 56888c0e3424849aef9b7cdb25d25920ec5ef399 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Thu, 10 Dec 2020 20:30:01 +0000
Subject: [PATCH]

---
 openpower/sv/av_opcodes.mdwn | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/av_opcodes.mdwn b/openpower/sv/av_opcodes.mdwn
index c03ac65cc..8ddf828b4 100644
--- a/openpower/sv/av_opcodes.mdwn
+++ b/openpower/sv/av_opcodes.mdwn
@@ -47,6 +47,8 @@ Useful parts of VSX, and how they might map.
 signed and unsigned, these are N-to-M (N=64/32/16, M=32/16/8) chop/clamp/sign/zero-extend operations. May be implemented by a clamped move to a smaller elwidth.
 
 The other direction, vec_unpack widening ops, may need some way to tell whether to sign-extend or zero-extend.
+
+*scalar extsw/b/h gives one set, mv gives another.  src elwidth override and dest elwidth override provide the pack/unpack*
  
 ## vavgs\* (vec_avg)
 
@@ -54,6 +56,8 @@ signed and unsigned, 8/16/32: these are all of the form:
 
     result = truncate((a + b + 1) >> 1))
 
+*These do not exist in the scalar ISA and would need to be added.  Essentially this is a type of post-processing involving the CA bit, so it could be included in the existing scalar pipeline ALU.*
+
 ## vabsdu\* (vec_abs)
 
 unsigned 8/16/32: these are all of the form:
@@ -61,6 +65,8 @@ unsigned 8/16/32: these are all of the form:
     result = (src1 > src2) ? truncate(src1-src2) :
                              truncate(src2-src1)
 
+*These do not exist in the scalar ISA and would need to be added*
+
 ## vmaxs\* / vmaxu\* (and min)
 
 signed and unsigned, 8/16/32: these are all of the form:
@@ -68,6 +74,8 @@ signed and unsigned, 8/16/32: these are all of the form:
     result = (src1 > src2) ? src1 : src2 # max
     result = (src1 < src2) ? src1 : src2 # min
 
+*These do not exist in the scalar INTEGER ISA and would need to be added*
+
 ## vmerge operations
 
 Their main point was to work around the odd/even multiplies. SV swizzles and mv.x should handle all cases.
@@ -102,7 +110,7 @@ This should be separated to a horizontal multiply and a horizontal add. How a ho
     a.x + a.y + a.z ...
     a.x * a.y * a.z ...
 
-*This would realistically need to be done with a loop doing a mapreduce.  I looked very early on at doing this type of operation and concluded it would be better done with a series of halvings each time, as separate instructions:  VL=16 then VL=8 then 4 then 2 and finally one scalar.  An OoO multi-issue engine woukd be more than capable of desling with the Dependencies.*
+*This would realistically need to be done with a loop doing a mapreduce sequence.  I looked very early on at doing this type of operation and concluded it would be better done with a series of halvings each time, as separate instructions:  VL=16 then VL=8 then 4 then 2 and finally one scalar.  An OoO multi-issue engine would be more than capable of dealing with the dependencies.*
 
 ## vec_mul*
 
@@ -140,3 +148,5 @@ Bit counts.
     ctz - count trailing zeroes
     clz - count leading zeroes
     popcnt - count set bits
+
+*These all exist in the scalar ISA*
-- 
2.30.2