From 56888c0e3424849aef9b7cdb25d25920ec5ef399 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Thu, 10 Dec 2020 20:30:01 +0000
Subject: [PATCH]

---
 openpower/sv/av_opcodes.mdwn | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/av_opcodes.mdwn b/openpower/sv/av_opcodes.mdwn
index c03ac65cc..8ddf828b4 100644
--- a/openpower/sv/av_opcodes.mdwn
+++ b/openpower/sv/av_opcodes.mdwn
@@ -47,6 +47,8 @@ Useful parts of VSX, and how they might map.
 signed and unsigned, these are N-to-M (N=64/32/16, M=32/16/8) chop/clamp/sign/zero-extend operations. May be implemented by a clamped move to a smaller elwidth.
 
 The other direction, vec_unpack widening ops, may need some way to tell whether to sign-extend or zero-extend.
+
+*Scalar extsw/b/h gives one set, mv gives another. The src elwidth override and dest elwidth override provide the pack/unpack.*
 
 ## vavgs\* (vec_avg)
 
@@ -54,6 +56,8 @@ signed and unsigned, 8/16/32: these are all of the form:
 
     result = truncate((a + b + 1) >> 1))
 
+*These do not exist in the scalar ISA and would need to be added. Essentially this is a type of post-processing involving the CA bit, so it could be included in the existing scalar pipeline ALU.*
+
 ## vabsdu\* (vec_abs)
 
 unsigned 8/16/32: these are all of the form:
@@ -61,6 +65,8 @@ unsigned 8/16/32: these are all of the form:
     result = (src1 > src2) ? truncate(src1-src2) :
                              truncate(src2-src1)
 
+*These do not exist in the scalar ISA and would need to be added.*
+
 ## vmaxs\* / vmaxu\* (and min)
 
 signed and unsigned, 8/16/32: these are all of the form:
@@ -68,6 +74,8 @@ signed and unsigned, 8/16/32: these are all of the form:
     result = (src1 > src2) ? src1 : src2 # max
     result = (src1 < src2) ? src1 : src2 # min
 
+*These do not exist in the scalar INTEGER ISA and would need to be added.*
+
 ## vmerge operations
 
 Their main point was to work around the odd/even multiplies. SV swizzles and mv.x should handle all cases.
@@ -102,7 +110,7 @@ This should be separated to a horizontal multiply and a horizontal add. How a ho
     a.x + a.y + a.z ...
     a.x * a.y * a.z ...
 
-*This would realistically need to be done with a loop doing a mapreduce. I looked very early on at doing this type of operation and concluded it would be better done with a series of halvings each time, as separate instructions: VL=16 then VL=8 then 4 then 2 and finally one scalar. An OoO multi-issue engine woukd be more than capable of desling with the Dependencies.*
+*This would realistically need to be done with a loop doing a mapreduce sequence. I looked very early on at doing this type of operation and concluded it would be better done with a series of halvings each time, as separate instructions: VL=16 then VL=8 then 4 then 2 and finally one scalar. An OoO multi-issue engine would be more than capable of dealing with the dependencies.*
 
 ## vec_mul*
 
@@ -140,3 +148,5 @@ Bit counts.
     ctz - count trailing zeroes
     clz - count leading zeroes
     popcnt - count set bits
+
+*These all exist in the scalar ISA*
-- 
2.30.2
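
The series of halvings described in the horizontal add/multiply note can be sketched in Python as follows. This is only an illustration of the idea, not SV code: `halving_reduce` and its `op` parameter are hypothetical names, each loop pass stands in for one separate vector instruction at half the previous length, and a power-of-two element count is assumed.

```python
import operator


def halving_reduce(vec, op):
    """Reduce vec to a single scalar by repeated halving.

    Hypothetical helper: each loop pass models one separate vector
    instruction whose length is half that of the previous pass.
    """
    vl = len(vec)
    # Assumption of this sketch: a non-zero power-of-two element count.
    if vl == 0 or vl & (vl - 1):
        raise ValueError("sketch assumes a non-zero power-of-two length")
    work = list(vec)
    while vl > 1:
        vl //= 2
        # One pass: combine element i of the lower half with element i
        # of the upper half. The vl operations within a pass do not
        # depend on each other; only successive passes do.
        work = [op(work[i], work[i + vl]) for i in range(vl)]
    return work[0]


# Horizontal add: a.x + a.y + a.z ...
total = halving_reduce([1, 2, 3, 4, 5, 6, 7, 8], operator.add)

# Horizontal multiply: a.x * a.y * a.z ...
product = halving_reduce([1, 2, 3, 4], operator.mul)
```

Only log2(N) passes depend on one another, which is what lets an out-of-order multi-issue engine overlap the work within each pass.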