openpower/sv/mv.vec.mdwn

[[!tag standards]]

# Vector Pack/Unpack operations

In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of v3.0B define a series of pack and unpack operations. This page covers those and more.  [[svp64]] provides the Vector Context, which adds saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Pack and unpack may be covered by [[sv/remap]], using Matrix 2D layouts on
either source or destination, but doing so is quite expensive.  Additionally,
with pressure on the Scalar 32-bit opcode space it is more appropriate to
compromise by adding the required capability in SVP64 on top of a
pre-existing base Scalar mv instruction.  [[sv/mv.swizzle]] is sufficiently
unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.
Both may benefit from use of the `RM.EXTRA` field to provide an
additional mode that may be applied to vec2/3/4.

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets. At the very
least, a predicate mask can potentially be used.

* If a single src-dest mv is used, then it potentially requires
  two separate REMAPs and two separate sv.mvs: remap-even, sv.mv,
  remap-odd, sv.mv
* Adding twin-src and twin-dest is a lot of instructions,
  particularly if triple is added as well (FPR mv and GPR mv variants)
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?
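
The first bullet can be pictured with a toy model. The sketch below is an
illustration under stated assumptions, not the REMAP specification:
`strided_mv` is a hypothetical helper standing in for one REMAP setup
followed by one `sv.mv`, "remap-even" and "remap-odd" are read as selecting
the even- and odd-indexed source elements, and the register numbers are
arbitrary.

```python
def strided_mv(regs, dst, src, vl, stride, start):
    """Hypothetical stand-in for one 'REMAP + sv.mv' pass: copy vl elements,
    reading the source at start, start+stride, start+2*stride, ..."""
    for i in range(vl):
        regs[dst + i] = regs[src + start + i * stride]

# Toy 32-entry register file with an 8-element source vector at r8.
regs = [0] * 32
regs[8:16] = [10, 11, 20, 21, 30, 31, 40, 41]

strided_mv(regs, dst=16, src=8, vl=4, stride=2, start=0)  # remap-even, sv.mv
strided_mv(regs, dst=20, src=8, vl=4, stride=2, start=1)  # remap-odd, sv.mv

print(regs[16:20])  # [10, 20, 30, 40]
print(regs[20:24])  # [11, 21, 31, 41]
```

Two full passes are needed for a 2-way split, which is the cost the bullet
is pointing at.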

How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source?  One spare bit in the
EXTRA RM area says whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or unpack (RS-as-dest=RT+MAXVL).

Alternatively, given that Matrix covers up to 3 Dimensions, there is no
need to be concerned about RS at all: simply use one of those dimensions
to span the packing:

Example 1:

* RA set to linear
* RT set to YX, ydim=2, xdim=4
* VL=MAXVL=8

The indices match up as follows:

    | RA | (0 1) (2 3) (4 5) (6 7) |
    | RT |   0 2 4 6     1 3 5 7   |

This results in a 2-element "unpack".

Example 2:

* RA set to linear
* RT set to YX, ydim=3, xdim=3
* VL=MAXVL=9

The indices match up as follows:

    | RA |  0 1 2   3 4 5   6 7 8  |
    | RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack".
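
The index rows of both examples can be reproduced with a few lines of
Python. This is only a sketch under stated assumptions: `yx_source_order`
and `remapped_mv` are hypothetical names, not part of `svshape` or
`svremap`, and each RT row is read as listing, for every destination slot
in turn, the source element that lands there (for VL=8 the source indices
run from 0 to 7).

```python
def yx_source_order(xdim, ydim):
    """Hypothetical helper: for each destination slot in turn, yield the
    index of the source element that lands there under a YX layout."""
    for y in range(ydim):
        for x in range(xdim):
            yield x * ydim + y

def remapped_mv(src, xdim, ydim):
    """Model the mv: destination slot p receives src[order[p]]."""
    return [src[s] for s in yx_source_order(xdim, ydim)]

# Example 1: ydim=2, xdim=4, VL=MAXVL=8
print(list(yx_source_order(xdim=4, ydim=2)))  # [0, 2, 4, 6, 1, 3, 5, 7]
pairs = ["x0", "y0", "x1", "y1", "x2", "y2", "x3", "y3"]
print(remapped_mv(pairs, xdim=4, ydim=2))
# ['x0', 'x1', 'x2', 'x3', 'y0', 'y1', 'y2', 'y3']

# Example 2: ydim=3, xdim=3, VL=MAXVL=9
print(list(yx_source_order(xdim=3, ydim=3)))  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```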

Both examples become particularly fun when Twin Predication is thrown
into the mix.

There exists room within the `svshape` instruction of [[sv/remap]]
to request some alternative Matrix mappings, and there is also
room within the reserved bits of `svremap`.

# RM Pack/unpack

Also used on [[sv/mv.swizzle]].

`RM-2P-1S1D-PU` Mode is applicable to all mv operations
(fmv etc.) and to Indexed LD/ST.

The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
room for 2 extra bits that enable either "packing" or "unpacking"
on the subvectors vec2/3/4.

The following illustrates a
"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):

    def index():
        for i in range(VL):         # outer loop: vector elements
            for j in range(SUBVL):  # inner loop: subvector elements
                yield i*SUBVL+j

    for idx in index():
        operation_on(RA+idx)

For pack/unpack (again, no elwidth overrides):

    # yield offsets with SUBVL as either the outer or the inner loop
    def index_p(outer):
        if outer:
            # SUBVL outermost: visit subvector element j of every
            # vector element before moving on to j+1
            for j in range(SUBVL):
                for i in range(VL):
                    yield i*SUBVL+j
        else:
            # VL outermost: the normal element order
            for i in range(VL):
                for j in range(SUBVL):
                    yield i*SUBVL+j

    # walk through both source and dest indices simultaneously
    for src_idx, dst_idx in zip(index_p(PACK_en), index_p(UNPACK_en)):
        move_operation(RT+dst_idx, RA+src_idx)

"yield" from Python is used here for simplicity and clarity.
The two Finite State Machines for the generation of the source
and destination element offsets progress incrementally in
lock-step.
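
To make the two lock-stepped generators concrete, the following sketch runs
them on a small vec2 vector. It rests on assumptions: the outer-SUBVL order
is taken to visit offset `i*SUBVL+j` with `j` outermost, `VL` and `SUBVL`
are passed as explicit arguments, registers are modelled as a Python list,
and `mv` is a hypothetical stand-in for the SVP64 mv rather than a real API.

```python
def index_p(outer, vl, subvl):
    """Yield element offsets with SUBVL as the outer or the inner loop."""
    if outer:
        for j in range(subvl):
            for i in range(vl):
                yield i * subvl + j
    else:
        for i in range(vl):
            for j in range(subvl):
                yield i * subvl + j

def mv(src, vl, subvl, pack_en, unpack_en):
    """Hypothetical model: the source index generator follows pack_en and
    the destination one follows unpack_en, advancing in lock-step."""
    dst = [None] * (vl * subvl)
    for s, d in zip(index_p(pack_en, vl, subvl),
                    index_p(unpack_en, vl, subvl)):
        dst[d] = src[s]
    return dst

xy = ["x0", "y0", "x1", "y1", "x2", "y2"]  # VL=3, SUBVL=2

planar = mv(xy, 3, 2, pack_en=True, unpack_en=False)
print(planar)  # ['x0', 'x1', 'x2', 'y0', 'y1', 'y2']

# Running the result back through with only unpack_en set restores xy.
print(mv(planar, 3, 2, pack_en=False, unpack_en=True))
# ['x0', 'y0', 'x1', 'y1', 'x2', 'y2']
```

With neither bit set, both generators produce the normal order and the
operation is a plain element-by-element copy.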

Setting both `PACK_en` and `UNPACK_en` is neither prohibited nor
`UNDEFINED` because the reordering is fully deterministic, and
additional REMAP reordering may be applied. For Matrix this would
give potentially up to 4 Dimensions of reordering.
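
As a quick check of the case where both bits are set, the sketch below
lists the (source, destination) offset pairs when both generators run in
the outer-SUBVL order. It assumes that this order visits offset `i*SUBVL+j`
with `j` outermost; `outer_subvl` is a hypothetical helper name. Before any
REMAP is applied, every element goes to its own offset and only the order
of the steps changes.

```python
def outer_subvl(vl, subvl):
    """Hypothetical helper: offsets with SUBVL as the outer loop."""
    for j in range(subvl):
        for i in range(vl):
            yield i * subvl + j

vl, subvl = 3, 2
# Source and destination generators advance in lock-step.
steps = list(zip(outer_subvl(vl, subvl), outer_subvl(vl, subvl)))
print(steps)  # [(0, 0), (2, 2), (4, 4), (1, 1), (3, 3), (5, 5)]
```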