(no commit message)

[libreriscv.git] / openpower / sv / svp_rewrite / svp64 / discussion.mdwn
diff --git a/openpower/sv/svp_rewrite/svp64/discussion.mdwn b/openpower/sv/svp_rewrite/svp64/discussion.mdwn

index 89e8b343ac8e40880ed4a18a16ece4292e017859..1b03c6a54fbb7dc5f688a8817efbb11f97317a70 100644 (file)
--- a/openpower/sv/svp_rewrite/svp64/discussion.mdwn
+++ b/openpower/sv/svp_rewrite/svp64/discussion.mdwn
@@ -1,19 +1,86 @@
+# Links
+
+* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-December/001498.html>
+
  # Notes on requirements for bit allocations
  
+do not try to jam VL or MAXVL in.  go with the flow of 24 bits spare.
+
  * 2: SUBVL
  * 2: elwidth
  * 2: twin-predication (src, dest) elwidth
  * 1: select INT or CR predication
  * 3: predicate selection and inversion (QTY 2 for tpred)
  * 4x2 or 3x3: src1/2/3/dest Vector/Scalar reg
+* 3: saturate mode
  
-totals: 22 bits leaving 2 spare for further modes.
+totals: 22 bits (dest elwidth shared)
  
  http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-December/001434.html
  
+## twin predication
+
+twin predication and twin elwidth overrides are extremely important: they make it possible to override both the src and dest elwidth while keeping the underlying scalar operation intact.  for example, mr with elwidth=8 and VL=8 on the src will take one byte at a time from a single 64-bit reg and place each byte into one of 8x 64-bit regs, zero-extended.  more complex operations involve SUBVL and Audio/Video DSP operations, see [[av_opcodes]]
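
a rough sketch of the mr example above, as a plain Python model rather than anything from the spec.  the function name `mr_elwidth8` is made up for illustration, and the element order (least significant byte first) is an assumption: the notes do not say which end of the register element 0 comes from.

```python
from typing import List


def mr_elwidth8(src_reg: int, vl: int = 8) -> List[int]:
    """Model 'mr' with a src elwidth of 8 and a default (64-bit) dest elwidth.

    Hypothetical helper: each of the VL source elements is one byte of a
    single 64-bit register, and each result is that byte zero-extended
    into its own 64-bit destination register.
    """
    if not 0 <= src_reg < (1 << 64):
        raise ValueError("src_reg must fit in 64 bits")
    if not 1 <= vl <= 8:
        raise ValueError("only 8 bytes are available in one 64-bit reg")
    # ASSUMPTION: element i is byte i counting from the least significant end.
    return [(src_reg >> (8 * i)) & 0xFF for i in range(vl)]


dest_regs = mr_elwidth8(0x0807060504030201)
print(dest_regs)
```

with that element-order assumption, `dest_regs` holds the eight bytes of the source as eight separate zero-extended values.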
+
+something like:
+
+| 0   1 | 2 3 | 4 5 | 6    | 7  9 | 10 12 | 13 18 | 19 20 |
+| ----- | --- | --- | ---- | ---- | ----- | ----- | ----- |
+| subvl | sew | dew | ptyp | psrc | pdst  | vspec | sat   |
+
+* subvl - 1 to 4 scalar / vec2 / vec3 / vec4
+* sew / dew - DEFAULT / 8 / 16 / 32 element width
+* ptyp - predication INT / CR
+* psrc / pdst - predicate mask selector and inversion
+* vspec - 3 bit src / dest scalar-vector extension
+* sat - 2 bit saturation mode (signed / unsigned)
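
the layout in the table above can be sketched as a pack / unpack pair.  this is only an illustration: the field positions are copied from the table, but `pack`, `unpack` and `FIELDS` are made-up names, and the use of MSB0 numbering (bit 0 is the most significant of the 24 bits, as in the Power ISA convention) is an assumption.

```python
# (name, first bit, last bit), inclusive, copied from the table above.
FIELDS = [
    ("subvl", 0, 1),
    ("sew", 2, 3),
    ("dew", 4, 5),
    ("ptyp", 6, 6),
    ("psrc", 7, 9),
    ("pdst", 10, 12),
    ("vspec", 13, 18),
    ("sat", 19, 20),
]
WIDTH = 24  # the 24 spare bits mentioned in the notes


def pack(**values: int) -> int:
    """Pack named field values into a 24-bit word (missing fields are 0)."""
    unknown = set(values) - {name for name, _, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, start, end in FIELDS:
        nbits = end - start + 1
        value = values.get(name, 0)
        if not 0 <= value < (1 << nbits):
            raise ValueError(f"{name} does not fit in {nbits} bits")
        # ASSUMPTION: MSB0 numbering, so bit `end` sits WIDTH-1-end above the LSB.
        word |= value << (WIDTH - 1 - end)
    return word


def unpack(word: int) -> dict:
    """Inverse of pack(): split a 24-bit word back into named fields."""
    return {
        name: (word >> (WIDTH - 1 - end)) & ((1 << (end - start + 1)) - 1)
        for name, start, end in FIELDS
    }


word = pack(subvl=1, sew=2, dew=3, ptyp=1, psrc=5, pdst=2, vspec=0b101010, sat=1)
print(f"{word:06x}", unpack(word))
```

the fields occupy bits 0 to 20, so under this layout bits 21 to 23 stay unused.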
+
+## twin predication, CR based.
+
+separate src and dest predicates are a critical part of SV for provision of VGATHER, VSCATTER, VREDUCE, VSPLAT and many more operations.
+
+Twin CR predication could be done in two ways:
+
+* start from different CRs for the src and dest
+* start from the same CR.
+
+With different bits being selectable (CR[0..3]), starting from the same CR makes some sense.
+
+
+# standard arith ops (single predication)
+
+these are of the form res = op(src1, src2, ...)
+
+| 0   1 | 2 3 | 4 5 | 6    | 7  9 | 10 18 | 19 20 |
+| ----- | --- | --- | ---- | ---- | ----- | ----- |
+| subvl | sew | dew | ptyp | pred | vspec | sat   |
+
+* subvl - 1 to 4 scalar / vec2 / vec3 / vec4
+* sew / dew - DEFAULT / 8 / 16 / 32 element width
+* ptyp - predication INT / CR
+* pred - predicate mask selector and inversion
+* vspec - 2/3 bit src / dest scalar-vector extension
+* sat - 2 bit saturation mode (signed / unsigned)
+
+
+For 2 op (dest/src1/src2) the vspec may be 3 bits per reg: total 9 bits.  for 3 op (dest/src1/2/3) the vspec may be 2 bits per reg: total 8 bits.
+
+Note:
+
+* for saturation the operation is done at the **source** width
+  (this is different from normal elwidth overrides which
+   are done at the **dest** width)
+* saturation is done on the result at the **dest** elwidth
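
one possible reading of the two points above, as a small Python model.  everything here is an assumption for illustration: `saturating_add` is a made-up name, add stands in for "the operation", and "done at the source width" is taken to mean that the operands are interpreted at the src elwidth and the exact result is then clamped to the dest elwidth range.

```python
def saturating_add(a: int, b: int, src_width: int, dest_width: int,
                   signed: bool = True):
    """Add two src-width elements, then saturate to the dest width.

    Returns (result, saturated), where `saturated` says whether clamping
    changed the value.  Hypothetical model, not taken from the spec.
    """
    def to_elem(x: int) -> int:
        # Interpret the low src_width bits as a signed or unsigned element.
        x &= (1 << src_width) - 1
        if signed and x >> (src_width - 1):
            x -= 1 << src_width
        return x

    # Python ints do not wrap, so this is the exact (unclamped) result.
    exact = to_elem(a) + to_elem(b)

    if signed:
        lo, hi = -(1 << (dest_width - 1)), (1 << (dest_width - 1)) - 1
    else:
        lo, hi = 0, (1 << dest_width) - 1

    clamped = max(lo, min(hi, exact))
    return clamped, clamped != exact


print(saturating_add(0x7FFF, 1, src_width=16, dest_width=8))
print(saturating_add(100, 100, src_width=16, dest_width=8, signed=False))
```

returning the flag alongside the value is a design choice of this sketch; how a real implementation would report saturation is not decided here.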
+
+# Notes about rounding, clamp and saturate
+
+One of the issues with vector ops is that in integer DSP ops, for example in Audio, the operation must clamp or saturate rather than overflow (or ignore the upper bits and become a modulo operation).  For Audio this is extremely important, as is providing an indicator of whether saturation occurred.  see [[av_opcodes]].
+
+If there are spare bits it would be very good to look at using some of them to specify the mode, because otherwise an SPR has to be used, which will need to be set and unset.  This can get costly.
+
  # Notes about Swizzle
  
-Basically, there isn't enough room to try to fit two src src1/2 swizzle, and SV, even into 64 bit.
+Basically, there isn't enough room to fit swizzle for two srcs (src1/2), and SV, even into 64 bit (actually 24) without severely compromising on the number of bits allocated to either swizzle, or SV, or both.
  
  therefore the strategy proposed is:
  
@@ -25,3 +92,10 @@ with 2x12 this would mean no need to have complex encoding of swizzle.
  
  if we really do need 2 bits spare then the complex encoder of swizzle could be deployed.
  
+# note about INT predicate
+
+001    ALWAYS (implicit)       Operation is not masked
+
+this means by default that 001 will always be in nonpredicated ops, which seems anomalous.  would 000 be better to indicate "no predication"?
+
+000 would indicate "the predicate is an immediate of all 1s" i.e. "no operation is masked out"