openpower/sv/mv.vec.mdwn

[[!tag standards]]

# Vector Pack/Unpack operations

In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of v3.0B describe a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context to add saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Pack and unpack may be covered by [[sv/remap]] using Matrix 2D layouts on either source or destination, but doing so is quite expensive. Additionally,
with pressure on the Scalar 32-bit opcode space, it is more appropriate to
compromise by adding the required capability in SVP64 on top of a
pre-existing base Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently
unusual to justify a base Scalar 32-bit instruction, but pack/unpack is not.
Both may benefit from using the `RM.EXTRA` field to provide an
additional mode that may be applied to vec2/3/4.

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets. At the very
least, a predicate mask could potentially be used to select which elements
are moved.

* If a single src-dest mv is used, then it potentially requires
  two separate REMAPs and two separate sv.mv instructions: remap-even, sv.mv,
  remap-odd, sv.mv.
* Adding twin-src and twin-dest variants is a lot of instructions,
  particularly if triple is added as well, and for both FPR mv and GPR mv.
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?

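For comparison with the costs listed above, the predicate-mask fallback can be modelled as two twin-predicated moves: one selects the even source elements, the other the odd ones, and each writes a different half of the destination. This is a minimal sketch under a simplified, assumed model of twin predication (source and destination indices each skip masked-out elements; no zeroing, elwidth overrides or REMAP). `twin_pred_mv` and its parameters are hypothetical names, and registers are modelled as Python lists.

```python
def twin_pred_mv(ra, rt, vl, srcmask, dstmask):
    """Simplified, assumed model of a twin-predicated sv.mv.

    The source index s and destination index d advance independently:
    each skips elements whose mask bit is 0, and one element is moved
    whenever both point at an enabled element.
    """
    s = d = 0
    while s < vl and d < vl:
        if not (srcmask >> s) & 1:
            s += 1          # source element masked out: skip it
        elif not (dstmask >> d) & 1:
            d += 1          # destination element masked out: skip it
        else:
            rt[d] = ra[s]   # both enabled: move one element
            s += 1
            d += 1


ra = list(range(8))         # vec2 pairs (0 1) (2 3) (4 5) (6 7)
rt = [None] * 8
twin_pred_mv(ra, rt, 8, srcmask=0b01010101, dstmask=0b00001111)  # evens
twin_pred_mv(ra, rt, 8, srcmask=0b10101010, dstmask=0b11110000)  # odds
print(rt)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

The result is a 2-element unpack, but it still takes two separate moves (plus mask setup), which is the overhead that motivates the alternatives below.
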
How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source? One spare bit in the
EXTRA RM area would say whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or an unpack (RS-as-dest=RT+MAXVL).

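To make the implicit-RS idea concrete, here is a minimal sketch for vec2 only. It assumes a flat Python list as the register file, and it assumes that pack interleaves RA+i and RS+i into consecutive pairs at RT, with unpack doing the reverse. The function names and the interleave order are hypothetical illustrations, not taken from any specification.

```python
def pack_rs(regs, ra, rt, vl, maxvl):
    """Hypothetical vec2 pack: the second source is RS = RA + MAXVL.
    Element i of RA and element i of RS become the pair RT+2i, RT+2i+1."""
    rs = ra + maxvl
    for i in range(vl):
        regs[rt + 2 * i] = regs[ra + i]
        regs[rt + 2 * i + 1] = regs[rs + i]


def unpack_rs(regs, ra, rt, vl, maxvl):
    """Hypothetical vec2 unpack: the second destination is RS = RT + MAXVL.
    The pair RA+2i, RA+2i+1 is split into RT+i and RS+i."""
    rs = rt + maxvl
    for i in range(vl):
        regs[rt + i] = regs[ra + 2 * i]
        regs[rs + i] = regs[ra + 2 * i + 1]


regs = [None] * 32
regs[8:12] = ["x0", "x1", "x2", "x3"]    # RA = 8
regs[12:16] = ["y0", "y1", "y2", "y3"]   # RS = RA + MAXVL = 12
pack_rs(regs, ra=8, rt=16, vl=4, maxvl=4)
print(regs[16:24])
# ['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3']
unpack_rs(regs, ra=16, rt=24, vl=4, maxvl=4)
print(regs[24:28], regs[28:32])
# ['x0', 'x1', 'x2', 'x3'] ['y0', 'y1', 'y2', 'y3']
```

Only one register number per operand needs encoding; the second operand is implied by MAXVL, which is what avoids the twin-src/twin-dest variants.
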
Alternatively, given that Matrix supports up to 3 Dimensions, there is no
need to be concerned about RS at all: simply use one of those dimensions to
span the packing:

Example 1:

* RA set to linear
* RT set to YX, ydim=2, xdim=4
* VL=MAXVL=8

The indices match up as follows:

    | RA | (0 1) (2 3) (4 5) (6 7) |
    | RT |   0 2 4 6     1 3 5 7   |

This results in a 2-element "unpack".

Example 2:

* RT set to linear
* RA set to YX, ydim=3, xdim=3
* VL=MAXVL=9

The indices match up as follows:

    | RA |  0 1 2   3 4 5   6 7 8  |
    | RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack".

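The two Matrix examples can be checked with a small model of a remapped sv.mv. This is a sketch under stated assumptions: "YX" is taken to mean that the y dimension is the inner (fastest-changing) loop, each (x, y) maps to element offset x + y*xdim, and step i of the move reads RA+src_map[i] and writes RT+dst_map[i]. The helper names are hypothetical and do not correspond to any `svshape` encoding. For Example 2 the YX mapping is placed on RA with RT linear; because a 3x3 transpose is its own inverse, putting it on RT instead gives the same result.

```python
def yx_remap(xdim, ydim):
    """Assumed model of a 2D Matrix REMAP with "YX" ordering:
    y is the inner loop, x the outer, offset = x + y*xdim."""
    return [x + y * xdim for x in range(xdim) for y in range(ydim)]


def linear(vl):
    """No REMAP: step i maps to element offset i."""
    return list(range(vl))


def remapped_mv(ra, src_map, dst_map):
    """Model of sv.mv with REMAP: step i copies RA+src_map[i] to RT+dst_map[i]."""
    rt = [None] * len(ra)
    for s, d in zip(src_map, dst_map):
        rt[d] = ra[s]
    return rt


# Example 1: RA linear, RT YX with ydim=2, xdim=4, VL=MAXVL=8
print(remapped_mv(list(range(8)), linear(8), yx_remap(xdim=4, ydim=2)))
# [0, 2, 4, 6, 1, 3, 5, 7]

# Example 2: RT linear, RA YX with ydim=3, xdim=3, VL=MAXVL=9
print(remapped_mv(list(range(9)), yx_remap(xdim=3, ydim=3), linear(9)))
# [0, 3, 6, 1, 4, 7, 2, 5, 8]
```
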
Both examples become particularly fun when Twin Predication is thrown
into the mix.

There is room within the `svshape` instruction of [[sv/remap]]
to request some alternative Matrix mappings, and there is also
room within the reserved bits of `svremap`.

# RM Pack/unpack

This RM format is also used by [[sv/mv.swizzle]].

MVRM-2P-1S1D:

| Field Name   | Field bits | Description                          |
|--------------|------------|--------------------------------------|
| Rdest_EXTRA2 | `10:11`    | extends Rdest (R\*\_EXTRA2 Encoding) |
| Rsrc_EXTRA2  | `12:13`    | extends Rsrc  (R\*\_EXTRA2 Encoding) |
| PACK_en      | `14`       | Enable pack                          |
| UNPACK_en    | `15`       | Enable unpack                        |
| MASK_SRC     | `16:18`    | Execution Mask for Source            |

The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
room for 2 extra bits that separately enable "packing" and "unpacking"
on the subvectors vec2/3/4.

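The field layout in the table above can be sketched as a decoder. This assumes that RM is held as a 24-bit integer using Power ISA MSB0 bit numbering (bit 0 is the most significant), so RM bit k sits at integer bit position 23-k. `rm_field` and `decode_mvrm_2p_1s1d` are hypothetical helper names.

```python
def rm_field(rm, start, end):
    """Extract RM bits start..end (inclusive, MSB0 numbering) of a 24-bit RM."""
    width = end - start + 1
    shift = 23 - end          # MSB0: bit 0 is the most significant of 24
    return (rm >> shift) & ((1 << width) - 1)


def decode_mvrm_2p_1s1d(rm):
    """Hypothetical decoder for the MVRM-2P-1S1D fields."""
    return {
        "Rdest_EXTRA2": rm_field(rm, 10, 11),
        "Rsrc_EXTRA2": rm_field(rm, 12, 13),
        "PACK_en": rm_field(rm, 14, 14),
        "UNPACK_en": rm_field(rm, 15, 15),
        "MASK_SRC": rm_field(rm, 16, 18),
    }


# PACK_en set (bit 14) and MASK_SRC = 0b101 (bits 16:18)
rm = (1 << (23 - 14)) | (0b101 << (23 - 18))
print(decode_mvrm_2p_1s1d(rm))
# {'Rdest_EXTRA2': 0, 'Rsrc_EXTRA2': 0, 'PACK_en': 1, 'UNPACK_en': 0, 'MASK_SRC': 5}
```

Bits 10:18 together form the 9-bit EXTRA area; the two EXTRA2 fields use 4 of those bits instead of the 6 an EXTRA3 pair would need, which is where `PACK_en` and `UNPACK_en` come from.
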
Illustrating a
"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):

    # normal element walk: VL is the outer loop, SUBVL the inner
    def index():
        for i in range(VL):
            for j in range(SUBVL):
                yield i*SUBVL+j

    # one operation per (vector element, subvector element) pair
    for idx in index():
        operation_on(RA+idx)

For pack/unpack (again, no elwidth overrides):

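    # yield element offsets for a VL x SUBVL walk. Both layouts visit
    # (i, j) in the same order so that the source and destination
    # FSMs stay in lock-step; only the offset formula differs.
    # outer=True: subvector element j of every vector element i is
    # grouped together (offset i+VL*j, i.e. SUBVL separate vectors).
    # outer=False: the normal interleaved layout (offset i*SUBVL+j).
    def index_p(outer):
        for i in range(VL):
            for j in range(SUBVL):
                if outer:
                    yield i+VL*j
                else:
                    yield i*SUBVL+j

    # walk through both source and dest indices simultaneously
    for src_idx, dst_idx in zip(index_p(PACK), index_p(UNPACK)):
        move_operation(RT+dst_idx, RA+src_idx)
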
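A runnable sketch of the pack/unpack walk, with the pseudocode's globals (`VL`, `SUBVL`, `PACK`, `UNPACK`, `RA`, `RT`) turned into function parameters and the register file modelled as a Python list. `sv_mv` is a hypothetical name; predication, saturation and elwidth overrides are ignored. Both layouts visit (i, j) in the same order and only the offset formula differs: if the two branches used different loop orders as well, both would simply count 0, 1, 2, ... and the move would degenerate into a plain copy.

```python
def index_p(outer, vl, subvl):
    """Element offsets for a VL x SUBVL walk, (i, j) always in the same order.
    outer=True: separate-vectors layout, offset i + vl*j.
    outer=False: interleaved subvector layout, offset i*subvl + j."""
    for i in range(vl):
        for j in range(subvl):
            if outer:
                yield i + vl * j
            else:
                yield i * subvl + j


def sv_mv(ra, vl, subvl, pack, unpack):
    """Model of sv.mv with PACK_en/UNPACK_en; returns the new RT contents."""
    rt = [None] * (vl * subvl)
    for src_idx, dst_idx in zip(index_p(pack, vl, subvl),
                                index_p(unpack, vl, subvl)):
        rt[dst_idx] = ra[src_idx]
    return rt


# unpack of vec2, VL=4, SUBVL=2: interleaved pairs -> two separate vectors
print(sv_mv(list(range(8)), vl=4, subvl=2, pack=False, unpack=True))
# [0, 2, 4, 6, 1, 3, 5, 7]

# pack to vec3, VL=3, SUBVL=3: three separate vectors -> interleaved triples
print(sv_mv(list(range(9)), vl=3, subvl=3, pack=True, unpack=False))
# [0, 3, 6, 1, 4, 7, 2, 5, 8]
```

These are the same element orderings as the 2-element unpack and 3-element pack Matrix examples, obtained here from SUBVL alone with no REMAP.
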
Python's `yield` is used here for simplicity and clarity.
The two Finite State Machines that generate the source
and destination element offsets progress incrementally, in
lock-step.

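To illustrate the point about Finite State Machines, the same offset generation can be written without `yield`, as explicit counters that advance by one step per element. `OffsetFSM` and `run` are hypothetical names for this sketch; j (SUBVL) is the inner counter and i (VL) the outer, and `outer=True` selects the i+VL*j layout.

```python
class OffsetFSM:
    """Hypothetical incremental offset generator with explicit (i, j) state."""

    def __init__(self, outer, vl, subvl):
        self.outer, self.vl, self.subvl = outer, vl, subvl
        self.i = self.j = 0

    def done(self):
        return self.i >= self.vl

    def offset(self):
        if self.outer:
            return self.i + self.vl * self.j
        return self.i * self.subvl + self.j

    def step(self):
        # advance the inner counter j; on wrap-around, advance i
        self.j += 1
        if self.j == self.subvl:
            self.j = 0
            self.i += 1


def run(vl, subvl, pack, unpack):
    """Step source and destination FSMs in lock-step, collecting offset pairs."""
    src = OffsetFSM(pack, vl, subvl)
    dst = OffsetFSM(unpack, vl, subvl)
    pairs = []
    while not src.done():
        pairs.append((src.offset(), dst.offset()))
        src.step()
        dst.step()
    return pairs


print(run(vl=2, subvl=2, pack=True, unpack=False))
# [(0, 0), (2, 1), (1, 2), (3, 3)]
```

Each pair is (source offset, destination offset) for one element move; since both machines share the same counter sequence, they only need a single shared "step" signal.
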
Setting both `PACK_en` and `UNPACK_en` is neither prohibited nor
`UNDEFINED`, because the reordering is fully deterministic, and
additional REMAP reordering may be applied on top. For Matrix this would
potentially give up to 4 Dimensions of reordering.