Thus for example, where OpenPOWER VSX has vpkswss, this would be achieved in SV with simply:
-* addition of a scalar ext/clamp instruction
+* applying saturation to maxu (sv.maxu/satu)
* 1st op, swizzle-selection vec2 "select X only" from source to dest:
dest.X = extclamp(src.X)
* 2nd op, swizzle-select vec2 "select Y only" from source to dest
Macro-op fusion may be used to detect that these two interleave cleanly, overlapping the vec2.X with vec2.Y to produce a single vec2.XY operation.
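
As a rough illustration of the two single-lane operations and the fused form they could collapse into, here is a minimal Python sketch. It is a behavioural model only, not SV assembly: `extclamp` is assumed to mean signed saturation into a 16-bit range (matching the signed-word to signed-halfword saturation of vpkswss), and the helper names `pack_vec2` and `pack_vec2_fused` are hypothetical.

```python
# Assumed target range: signed 16-bit, as in a signed-saturating word pack.
INT16_MIN = -(1 << 15)
INT16_MAX = (1 << 15) - 1


def extclamp(value, lo=INT16_MIN, hi=INT16_MAX):
    """Saturate a signed integer into [lo, hi] (assumed meaning of extclamp)."""
    return max(lo, min(hi, value))


def pack_vec2(src):
    """Model the two separate ops, each writing one lane of a vec2."""
    dest = [0, 0]
    dest[0] = extclamp(src[0])  # 1st op: swizzle "select X only"
    dest[1] = extclamp(src[1])  # 2nd op: swizzle "select Y only"
    return dest


def pack_vec2_fused(src):
    """Model the single vec2.XY op that fusion could produce."""
    return [extclamp(lane) for lane in src]
```

Because the X-only and Y-only operations write disjoint lanes, the two-step and fused versions give the same result for any input, which is the property macro-op fusion would rely on.
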
+Alternatively, Twin-Predication may be applied, with every even bit set in
+the source mask and every odd bit set in the destination mask:
+
+ r3=0b10101010
+ r10=0b01010101
+ r0=0x00007fff # or other limit
+ sv.maxu/satu/sm=r3/dm=r10/ew=32 *r20,*r20,r0
+
## Scalar element operations
* clamping / saturation for signed and unsigned. best done similar to FP rounding modes, i.e. with an SPR.