[[!tag standards]]

# Vector Pack/Unpack operations

In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of Power ISA v3.0B define a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context that adds saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Pack and unpack may be covered by [[sv/remap]] by using Matrix 2D layouts on either source or destination, but doing so is quite expensive. Additionally,
with pressure on the Scalar 32-bit opcode space it is more appropriate to
compromise by adding the required capability in SVP64 on top of a
pre-existing base Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently
unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.
Both may benefit from use of the `RM.EXTRA` field to provide an
additional mode that may be applied to vec2/3/4.

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets. At the very least, a predicate mask can potentially
be used.

* If a single src-dest mv is used, then it potentially requires
  two separate REMAPs and two separate sv.mvs: remap-even, sv.mv,
  remap-odd, sv.mv
* Adding twin-src and twin-dest means a lot of instructions,
  particularly if triple is added as well (FPR mv, GPR mv)
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?
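
The single src-dest option in the first bullet can be modelled in a few lines of Python. This is only a sketch: `remap_even`, `remap_odd` and `sv_mv` are hypothetical stand-ins for the REMAP setup and the `sv.mv` instruction (they are not Simple-V APIs), and registers are modelled as a plain list.

```python
# Sketch only: models "remap-even, sv.mv, remap-odd, sv.mv" on Python lists.
# remap_even, remap_odd and sv_mv are hypothetical names, not Simple-V APIs.

def remap_even(vl):
    """Source offsets selecting elements 0, 2, 4, ..."""
    return [2 * i for i in range(vl)]

def remap_odd(vl):
    """Source offsets selecting elements 1, 3, 5, ..."""
    return [2 * i + 1 for i in range(vl)]

def sv_mv(src, src_offsets):
    """One vector move: gather src through the remapped offsets."""
    return [src[o] for o in src_offsets]

def unpack_pairs(src):
    """Split interleaved pairs using two REMAP + sv.mv passes."""
    vl = len(src) // 2
    first = sv_mv(src, remap_even(vl))   # remap-even, sv.mv
    second = sv_mv(src, remap_odd(vl))   # remap-odd, sv.mv
    return first, second

xs, ys = unpack_pairs(["x0", "y0", "x1", "y1", "x2", "y2"])
print(xs, ys)
```

Even in this toy model a single two-way split takes four steps, which is the cost the bullet describes.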

How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source? One spare bit in the
EXTRA RM area would say whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or an unpack (RS-as-dest=RT+MAXVL).
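
As a rough illustration of that idea, the implicit register could be computed as below. Register numbers are plain integers here, and `rs_for_mv` is a hypothetical helper invented for this sketch; the single `pack` argument stands in for the spare EXTRA RM bit.

```python
# Sketch only: register numbers are plain integers; rs_for_mv is a
# hypothetical helper, not part of any Simple-V tool.

def rs_for_mv(rt, ra, maxvl, pack):
    """Return (role, register number) of the implicit RS operand.

    pack=True  -> RS is an extra source at RA+MAXVL
    pack=False -> RS is an extra destination at RT+MAXVL (unpack)
    """
    if pack:
        return ("src", ra + maxvl)
    return ("dest", rt + maxvl)

print(rs_for_mv(rt=8, ra=16, maxvl=4, pack=True))    # ('src', 20)
print(rs_for_mv(rt=8, ra=16, maxvl=4, pack=False))   # ('dest', 12)
```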

Alternatively, given that Matrix supports up to 3 Dimensions, there is no need
even to be concerned about RS: simply use one of those dimensions to
span the packing:

Example 1:

* RA set to linear
* RT set to YX, ydim=2, xdim=4
* VL=MAXVL=8

The indices match up as follows:

| RA | (0 1) (2 3) (4 5) (6 7) |
| RT | 0 2 4 6 1 3 5 7 |

This results in a 2-element "unpack".

Example 2:

* RA set to linear
* RT set to YX, ydim=3, xdim=3
* VL=MAXVL=9

The indices match up as follows:

| RA | 0 1 2 3 4 5 6 7 8 |
| RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack".
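
The RT index rows of both examples can be regenerated with a small generator. This is a sketch under assumptions: `yx_indices` is a hypothetical helper written for this page, not the `svshape` algorithm, and it simply walks the block with y outermost and a stride of ydim. For Example 1 it yields 0 2 4 6 1 3 5 7, so every offset stays below VL=8.

```python
# Sketch only: reproduces the RT index rows of Examples 1 and 2.
# yx_indices is a hypothetical helper, not the svshape algorithm.

def yx_indices(ydim, xdim):
    """Walk a ydim*xdim block with y outermost, striding by ydim."""
    for y in range(ydim):
        for x in range(xdim):
            yield x * ydim + y

print(list(yx_indices(ydim=2, xdim=4)))  # [0, 2, 4, 6, 1, 3, 5, 7]
print(list(yx_indices(ydim=3, xdim=3)))  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```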

Both examples become particularly fun when Twin Predication is thrown
into the mix.

There exists room within the `svshape` instruction of [[sv/remap]]
to request some alternative Matrix mappings, and there is also
room within the reserved bits of `svremap` as well.

# RM Pack/unpack

Also used on [[sv/mv.swizzle]]

MVRM-2P-1S1D:

| Field Name | Field bits | Description |
|------------|------------|----------------------------|
| Rdest_EXTRA2 | `10:11` | extends Rdest (R\*\_EXTRA2 Encoding) |
| Rsrc_EXTRA2 | `12:13` | extends Rsrc (R\*\_EXTRA2 Encoding) |
| PACK_en | `14` | Enable pack |
| UNPACK_en | `15` | Enable unpack |
| MASK_SRC | `16:18` | Execution Mask for Source |

The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
room for 2 extra bits that enable either "packing" or "unpacking"
on the subvectors vec2/3/4.
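
To make the bit positions in the table concrete, the fields can be pulled out of an RM value as sketched below. The sketch assumes a 24-bit RM field numbered MSB0 (bit 0 is the most significant bit); `rm_bits` and `decode_mvrm` are hypothetical helpers invented for illustration.

```python
# Sketch only: assumes a 24-bit RM field numbered MSB0 (bit 0 is the
# most significant bit). rm_bits and decode_mvrm are hypothetical helpers.

RM_WIDTH = 24

def rm_bits(rm, start, end):
    """Extract MSB0-numbered bits start..end (inclusive) from rm."""
    width = end - start + 1
    shift = RM_WIDTH - 1 - end
    return (rm >> shift) & ((1 << width) - 1)

def decode_mvrm(rm):
    """Split out the MVRM-2P-1S1D fields listed in the table."""
    return {
        "Rdest_EXTRA2": rm_bits(rm, 10, 11),
        "Rsrc_EXTRA2": rm_bits(rm, 12, 13),
        "PACK_en": rm_bits(rm, 14, 14),
        "UNPACK_en": rm_bits(rm, 15, 15),
        "MASK_SRC": rm_bits(rm, 16, 18),
    }

# Build an RM value with PACK_en=1 and MASK_SRC=0b101, everything else 0.
rm = (1 << (RM_WIDTH - 1 - 14)) | (0b101 << (RM_WIDTH - 1 - 18))
print(decode_mvrm(rm))
```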

Illustrating a
"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):

    def index():
        for i in range(VL):
            for j in range(SUBVL):
                yield i*SUBVL+j

    for idx in index():
        operation_on(RA+idx)

For pack/unpack (again, no elwidth overrides):

    # yield offsets with either SUBVL or VL as the outer loop
    def index_p(outer):
        if outer:
            for j in range(SUBVL):      # SUBVL is outer
                for i in range(VL):     # VL is inner
                    yield i*SUBVL+j     # element j of each subvector in turn
        else:
            for i in range(VL):         # VL is outer
                for j in range(SUBVL):  # SUBVL is inner
                    yield i*SUBVL+j     # linear order

    # walk through both source and dest indices simultaneously
    for src_idx, dst_idx in zip(index_p(PACK), index_p(UNPACK)):
        move_operation(RT+dst_idx, RA+src_idx)

Python's `yield` is used here for simplicity and clarity.
The two Finite State Machines that generate the source
and destination element offsets progress incrementally in
lock-step.
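
A runnable version of the loop helps to check the resulting element order. The sketch below makes several assumptions: VL and SUBVL are passed as explicit parameters, the register file is a Python list, the outer-SUBVL ordering visits element j of every subvector in turn (offset `i*SUBVL+j`), and `pack_unpack_mv` is a hypothetical name for the modelled move.

```python
# Sketch only: VL and SUBVL are explicit parameters and the register
# file is a Python list. Assumes the outer-SUBVL ordering visits
# element j of every subvector in turn (offset i*SUBVL+j).

def index_p(outer, vl, subvl):
    """Yield element offsets with SUBVL (outer=True) or VL outermost."""
    if outer:
        for j in range(subvl):
            for i in range(vl):
                yield i * subvl + j
    else:
        for i in range(vl):
            for j in range(subvl):
                yield i * subvl + j

def pack_unpack_mv(src, vl, subvl, pack=False, unpack=False):
    """Move src to a new list, stepping both offset generators in lock-step."""
    dst = [None] * (vl * subvl)
    for s, d in zip(index_p(pack, vl, subvl), index_p(unpack, vl, subvl)):
        dst[d] = src[s]
    return dst

xy = ["x0", "y0", "x1", "y1", "x2", "y2"]   # VL=3 subvectors of vec2
planar = pack_unpack_mv(xy, vl=3, subvl=2, pack=True)
print(planar)   # ['x0', 'x1', 'x2', 'y0', 'y1', 'y2']
print(pack_unpack_mv(planar, vl=3, subvl=2, unpack=True) == xy)   # True
```

In this simplified model, setting both flags moves every element to its own offset and only the visiting order changes, so any visible reordering would have to come from an additional REMAP.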

Setting both `PACK_en` and `UNPACK_en` is neither prohibited nor
`UNDEFINED` because the reordering is fully deterministic, and
additional REMAP reordering may be applied. For Matrix this would
potentially give up to 4 Dimensions of reordering.