openpower/sv/svp_rewrite/svp64/discussion.mdwn

# Links

* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-December/001498.html>

# Notes on requirements for bit allocations

Do not try to jam VL or MAXVL in. Go with the flow of the 24 spare bits.

* 2: SUBVL
* 2: elwidth
* 2: twin-predication (src, dest) elwidth
* 1: select INT or CR predication
* 3: predicate selection and inversion (QTY 2 for tpred)
* 4x2 or 3x3: src1/2/3/dest Vector/Scalar reg
* 2: saturate mode

Total: 24 bits (dest elwidth shared)

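As a quick arithmetic check of the budget listed above, here is a minimal sketch. It assumes the 3-bit predicate field is counted twice (the twin-predication case) and that the two elwidth entries are counted separately; the dictionary keys and the `total_bits` helper are illustrative names, not part of any specification. Under those assumptions the 3x3 register-tag option fills all 24 bits and the 4x2 option leaves one bit spare.

```python
# Field widths copied from the list above; key names are illustrative only.
COMMON = {
    "subvl": 2,
    "elwidth": 2,
    "twin_pred_elwidth": 2,
    "ptype": 1,           # INT or CR predication
    "predicate": 3 * 2,   # 3 bits each, QTY 2 for twin predication (assumption)
    "saturate": 2,
}

def total_bits(vspec_bits):
    """Sum the common fields plus the Vector/Scalar register tag bits."""
    return sum(COMMON.values()) + vspec_bits

four_regs = total_bits(4 * 2)   # src1/2/3/dest at 2 bits each
three_regs = total_bits(3 * 3)  # three registers at 3 bits each
print(four_regs, three_regs)
```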
<http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-December/001434.html>

## twin predication

Twin predication and twin elwidth overrides are extremely important: they make it possible to override both the src and dest elwidth while keeping the underlying scalar operation intact. For example, mr with elwidth=8 and VL=8 on the src will take one byte at a time from a single 64-bit reg and place each byte, zero-extended, into one of 8 64-bit regs. More complex operations involve SUBVL and Audio/Video DSP operations; see [[av_opcodes]].

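The mr example can be modelled in a few lines of Python. This is only a sketch of the intended behaviour, not an implementation: the function name `mr_unpack_bytes` is hypothetical, and it assumes element 0 is the least-significant byte of the source register, which the notes here do not specify.

```python
MASK64 = (1 << 64) - 1

def mr_unpack_bytes(src_reg, vl=8, src_elwidth=8):
    """Model mr with a source elwidth override: one narrow element per dest reg."""
    elmask = (1 << src_elwidth) - 1
    dest = []
    for i in range(vl):
        # Assumption: element 0 is the least-significant byte of the source.
        element = (src_reg >> (i * src_elwidth)) & elmask
        # Zero-extension: the upper 56 bits of each 64-bit dest reg stay clear.
        dest.append(element & MASK64)
    return dest

regs = mr_unpack_bytes(0x0807060504030201)
print([hex(r) for r in regs])
```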
The prefix layout would be something like:

| 0   1 | 2 3 | 4 5 | 6    | 7  9 | 10 12 | 13 18 | 19 21 |
| ----- | --- | --- | ---- | ---- | ----- | ----- | ----- |
| subvl | sew | dew | ptyp | psrc | pdst  | vspec | sat   |

* subvl - 1 to 4: scalar / vec2 / vec3 / vec4
* sew / dew - DEFAULT / 8 / 16 / 32 element width
* ptyp - predication INT / CR
* psrc / pdst - predicate mask selector and inversion
* vspec - 3 bit src / dest scalar-vector extension
* sat: 0bSU - S=1 signed, U=1 unsigned, 0b11 reserved

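To make the field positions concrete, the following sketch packs and unpacks the twin-predication layout from the table. It assumes MSB0 numbering (bit 0 is the most significant bit, as is conventional for Power ISA encodings) within a 24-bit field; that numbering, and the helper names `extract`, `decode` and `encode`, are assumptions made for illustration.

```python
# Field positions copied from the table: name -> (first bit, last bit).
TWIN_PRED_FIELDS = {
    "subvl": (0, 1),
    "sew": (2, 3),
    "dew": (4, 5),
    "ptyp": (6, 6),
    "psrc": (7, 9),
    "pdst": (10, 12),
    "vspec": (13, 18),
    "sat": (19, 21),
}

PREFIX_BITS = 24  # the 24 spare bits

def extract(prefix, first, last, width=PREFIX_BITS):
    """Extract bits first..last, assuming MSB0 numbering (bit 0 is the MSB)."""
    nbits = last - first + 1
    shift = width - 1 - last
    return (prefix >> shift) & ((1 << nbits) - 1)

def decode(prefix, fields=TWIN_PRED_FIELDS):
    """Split a prefix value into a dict of named fields."""
    return {name: extract(prefix, lo, hi) for name, (lo, hi) in fields.items()}

def encode(values, fields=TWIN_PRED_FIELDS, width=PREFIX_BITS):
    """Pack a dict of named field values into a single prefix value."""
    prefix = 0
    for name, (first, last) in fields.items():
        nbits = last - first + 1
        if not 0 <= values[name] < (1 << nbits):
            raise ValueError(f"{name} does not fit in {nbits} bits")
        prefix |= values[name] << (width - 1 - last)
    return prefix

example = {"subvl": 1, "sew": 2, "dew": 0, "ptyp": 1,
           "psrc": 5, "pdst": 3, "vspec": 0b101010, "sat": 1}
roundtrip = decode(encode(example))
```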
# standard arith ops (single predication)

These are of the form res = op(src1, src2, ...)

| 0   1 | 2 3 | 4 5 | 6    | 7  9 | 10 18 | 19 21 |
| ----- | --- | --- | ---- | ---- | ----- | ----- |
| subvl | sew | dew | ptyp | pred | vspec | sat   |

* subvl - 1 to 4: scalar / vec2 / vec3 / vec4
* sew / dew - DEFAULT / 8 / 16 / 32 element width
* ptyp - predication INT / CR
* pred - predicate mask selector and inversion
* vspec - 2/3 bit src / dest scalar-vector extension
* sat: 0bSU - S=1 signed, U=1 unsigned, 0b11 reserved

For 2-op (dest/src1/src2) the vspec may be 3 bits per reg: total 9 bits. For 3-op (dest/src1/2/3) the vspec may be 2 bits per reg: total 8 bits.

Note:

* saturation is done on the result at the **source** elwidth
* signed-saturation causes sign-extension from source to dest elwidths **after** saturation

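The ordering in the note (saturate at the source elwidth first, then sign-extend to the dest elwidth) can be sketched as follows. The add operation and the function names are hypothetical, chosen only to show the ordering; Python integers stand in for register contents.

```python
def saturate_signed(value, elwidth):
    """Clamp a Python int to the signed range of `elwidth` bits."""
    lo = -(1 << (elwidth - 1))
    hi = (1 << (elwidth - 1)) - 1
    return max(lo, min(hi, value))

def sign_extend_to(value, dest_elwidth):
    """Two's-complement bit pattern of a signed value in `dest_elwidth` bits."""
    return value & ((1 << dest_elwidth) - 1)

def sat_add(a, b, src_elwidth, dest_elwidth):
    """Hypothetical saturating add: saturate at source width, then widen."""
    result = saturate_signed(a + b, src_elwidth)   # step 1: source elwidth
    return sign_extend_to(result, dest_elwidth)    # step 2: sign-extend to dest

pos = sat_add(100, 100, 8, 16)     # 200 saturates to 127
neg = sat_add(-100, -100, 8, 16)   # -200 saturates to -128
print(hex(pos), hex(neg))
```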
# Notes about rounding, clamp and saturate

One of the issues with vector ops is that in integer DSP ops, for example in Audio, the operation must clamp or saturate rather than overflow, or ignore the upper bits and become a modulo operation. For Audio this is extremely important, as is providing an indicator of whether saturation occurred. See [[av_opcodes]].

If there are spare bits it would be very good to look at using some of them to specify the mode, because otherwise an SPR has to be used, which will need to be set and unset. This can get costly.

Idea: 2 bits for clamping mode? Similar to elwidth:

* 0b00 default (no clamp)
* 0b01 8 bit (signed: -128/127, unsigned: 0/255)
* 0b10 16 bit
* 0b11 32 bit

This is not the same *as* elwidth.

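A minimal sketch of the clamping-mode idea, including the saturation indicator mentioned earlier. The mode table mirrors the list above; the `clamp` function, its `signed` parameter and the returned flag are illustrative assumptions. In mode 0b00 the sketch simply passes the value through, whereas real hardware would wrap modulo the element width.

```python
# Clamp widths for the proposed 2-bit mode field; None means no clamping.
CLAMP_WIDTH = {0b00: None, 0b01: 8, 0b10: 16, 0b11: 32}

def clamp(value, mode, signed=True):
    """Return (clamped value, saturated flag) for the given clamp mode."""
    width = CLAMP_WIDTH[mode]
    if width is None:
        return value, False
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    clamped = max(lo, min(hi, value))
    # The flag records whether clamping changed the value.
    return clamped, clamped != value

print(clamp(300, 0b01))
print(clamp(300, 0b01, signed=False))
print(clamp(-5, 0b10))
```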
# Notes about Swizzle

Basically, there isn't enough room to fit two src (src1/2) swizzles and SV, even into 64 bit (actually 24), without severely compromising on the number of bits allocated to swizzle, or SV, or both.

Therefore the strategy proposed is:

* design 16-bit scalar ops
* use the old 11-bit SV prefix to create 32-bit insns
* when those are embedded into a v3.1B 64-bit prefix, the 24 bits are entirely allocated to swizzle

With 2x12 bits there would be no need for a complex encoding of swizzle.

If we really do need 2 bits spare then the complex swizzle encoder could be deployed.

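To illustrate why 2x12 bits would avoid a complex encoding, here is a sketch that assumes each 12-bit swizzle is four independent 3-bit lane selectors. That split, the selector values in `SELECTORS`, and the placement of src1 in the upper 12 bits are all hypothetical; nothing in these notes fixes them.

```python
# Hypothetical 3-bit selector values; the real assignment is not decided here.
SELECTORS = {0: "x", 1: "y", 2: "z", 3: "w", 4: "const0", 5: "const1"}

def split_swizzles(prefix24):
    """Split 24 prefix bits into two 12-bit swizzles (src1 high, src2 low)."""
    return (prefix24 >> 12) & 0xFFF, prefix24 & 0xFFF

def decode_swizzle(swizzle12):
    """Decode one 12-bit swizzle as four 3-bit lane selectors, lane 0 lowest."""
    lanes = []
    for lane in range(4):
        sel = (swizzle12 >> (3 * lane)) & 0b111
        lanes.append(SELECTORS.get(sel, "reserved"))
    return lanes

def apply_swizzle(vec4, swizzle12):
    """Build a new vec4 by picking source lanes or constants per selector."""
    source = {"x": vec4[0], "y": vec4[1], "z": vec4[2], "w": vec4[3],
              "const0": 0, "const1": 1}
    names = decode_swizzle(swizzle12)
    if "reserved" in names:
        raise ValueError("reserved selector in swizzle")
    return [source[name] for name in names]

# Reverse the lanes: lane0 <- w, lane1 <- z, lane2 <- y, lane3 <- x.
WZYX = 3 | (2 << 3) | (1 << 6) | (0 << 9)
reversed_vec = apply_swizzle([10, 20, 30, 40], WZYX)
```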