[[!tag standards]]

# Vector Pack/Unpack operations

In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of Power ISA v3.0B define a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context, which adds saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Pack and unpack may be covered by [[sv/remap]], using Matrix 2D layouts
on either source or destination, but doing so is quite expensive. Additionally,
with pressure on the Scalar 32-bit opcode space it is more appropriate to
compromise by adding the required capability in SVP64 on top of a
pre-existing base Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently
unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.
Both may benefit from use of the `RM.EXTRA` field to provide an
additional mode that may be applied to vec2/3/4.

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets. At the very
least, a predicate mask can potentially be used.

* If a single src-dest mv is used, then it potentially requires
  two separate REMAPs and two separate sv.mvs: remap-even, sv.mv,
  remap-odd, sv.mv
* If adding twin-src and twin-dest, that is a lot of instructions,
  particularly if triple is added as well (FPR mv, GPR mv)
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?

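The first bullet above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: `regs` stands in for the register file, `remap_even` and `remap_odd` are hypothetical stand-ins for the two REMAP schedules, and `sv_mv` is a plain element-by-element copy rather than the real instruction. It illustrates why a single src-dest mv would need two passes to split one vector of interleaved pairs into two destinations.

```python
# Hypothetical model: all names and register numbers are illustrative only.

def remap_even(i):
    """Source offset of the i-th even-numbered element."""
    return 2 * i

def remap_odd(i):
    """Source offset of the i-th odd-numbered element."""
    return 2 * i + 1

def sv_mv(regs, rt, ra, vl, src_remap):
    """Copy vl elements into regs[rt..] from remapped offsets of regs[ra..]."""
    for i in range(vl):
        regs[rt + i] = regs[ra + src_remap(i)]

regs = [0] * 32
regs[0:8] = [10, 20, 11, 21, 12, 22, 13, 23]  # four interleaved pairs at r0

sv_mv(regs, rt=8, ra=0, vl=4, src_remap=remap_even)   # remap-even, sv.mv
sv_mv(regs, rt=12, ra=0, vl=4, src_remap=remap_odd)   # remap-odd, sv.mv

print(regs[8:12])   # first element of each pair:  [10, 11, 12, 13]
print(regs[12:16])  # second element of each pair: [20, 21, 22, 23]
```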
How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source? One spare bit in the
EXTRA RM area would say whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or an unpack (RS-as-dest=RT+MAXVL).

Alternatively, given that Matrix supports up to 3 Dimensions, do not even
be concerned about RS: simply use one of those dimensions to
span the packing:

Example 1:

* RA set to linear
* RT set to YX, ydim=2, xdim=4
* VL=MAXVL=8

The indices match up as follows:

    | RA | (0 1) (2 3) (4 5) (6 7) |
    | RT | 0 2 4 6 1 3 5 7         |

This results in a 2-element "unpack".

Example 2:

* RA set to linear
* RT set to YX, ydim=3, xdim=3
* VL=MAXVL=9

The indices match up as follows:

    | RA | 0 1 2 3 4 5 6 7 8       |
    | RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack".

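The index rows of both examples can be reproduced with a short Python sketch. It assumes that "YX" means the y dimension is walked in the outer loop while the element offset is `x*ydim + y`; that formula was chosen only because it matches the two examples and is not taken from the `svshape` definition. In Example 2 the linear side is taken to be RA, as in its table. The names `linear`, `yx` and `remapped_mv` are hypothetical.

```python
def linear(vl):
    """Unmodified element order: 0, 1, 2, ..."""
    return list(range(vl))

def yx(ydim, xdim):
    """Assumed YX order: y is the outer loop, offset is x*ydim + y."""
    return [x * ydim + y for y in range(ydim) for x in range(xdim)]

def remapped_mv(src, src_idx, dst_idx):
    """Model of one mv: dst[dst_idx[k]] = src[src_idx[k]] at each step k."""
    dst = [None] * len(src)
    for s, d in zip(src_idx, dst_idx):
        dst[d] = src[s]
    return dst

# Example 1: RA linear, RT YX with ydim=2, xdim=4, VL=8
ex1_rt = yx(2, 4)
print(ex1_rt)  # [0, 2, 4, 6, 1, 3, 5, 7]

# Example 2: RA linear, RT YX with ydim=3, xdim=3, VL=9
ex2_rt = yx(3, 3)
print(ex2_rt)  # [0, 3, 6, 1, 4, 7, 2, 5, 8]

# Effect of Example 1 on eight source elements named a0..a7
ex1_data = remapped_mv([f"a{i}" for i in range(8)], linear(8), ex1_rt)
print(ex1_data)  # ['a0', 'a4', 'a1', 'a5', 'a2', 'a6', 'a3', 'a7']
```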
Both examples become particularly fun when Twin Predication is thrown
into the mix.

There exists room within the `svshape` instruction of [[sv/remap]]
to request some alternative Matrix mappings, and there is also
room within the reserved bits of `svremap`.

# RM Pack/unpack

Also used on [[sv/mv.swizzle]].

`RM-2P-1S1D-PU` Mode is applicable to all mv operations
(fmv etc.) and to Indexed LD/ST.

The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
room for 2 extra bits that enable either "packing" or "unpacking"
on the subvectors vec2/3/4.

Illustrating a
"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):

    def index():
        for i in range(VL):
            for j in range(SUBVL):
                yield i*SUBVL+j

    for idx in index():
        operation_on(RA+idx)

For pack/unpack (again, no elwidth overrides):

    # yield element offsets with SUBVL as either the outer or the inner loop
    def index_p(outer):
        if outer:
            # SUBVL is the outer loop, VL the inner loop
            for j in range(SUBVL):
                for i in range(VL):
                    yield i*SUBVL+j
        else:
            # VL is the outer loop, SUBVL the inner loop
            for i in range(VL):
                for j in range(SUBVL):
                    yield i*SUBVL+j

    # walk through both source and dest indices simultaneously
    for src_idx, dst_idx in zip(index_p(PACK_en), index_p(UNPACK_en)):
        move_operation(RT+dst_idx, RA+src_idx)

Python's "yield" is used here for simplicity and clarity.
The two Finite State Machines that generate the source
and destination element offsets progress incrementally in
lock-step.

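A runnable sketch of the two lock-stepped generators follows, with small concrete values so that the effect is visible. It assumes that element `(i, j)` always lives at offset `i*SUBVL+j` and that each flag only swaps the loop nesting. `regs`, the register numbers and `sv_mv_pu` are hypothetical names introduced for illustration, not part of any specification.

```python
VL, SUBVL = 2, 3  # two vec3 subvectors (illustrative values)

def index_p(outer):
    """Element offsets, with SUBVL as the outer loop when `outer` is set."""
    if outer:
        for j in range(SUBVL):
            for i in range(VL):
                yield i * SUBVL + j
    else:
        for i in range(VL):
            for j in range(SUBVL):
                yield i * SUBVL + j

def sv_mv_pu(regs, rt, ra, pack_en, unpack_en):
    """Model of the move loop: source and destination advance in lock-step."""
    for src_idx, dst_idx in zip(index_p(pack_en), index_p(unpack_en)):
        regs[rt + dst_idx] = regs[ra + src_idx]

regs = [0] * 32
regs[0:6] = [10, 11, 12, 13, 14, 15]  # source vector at r0

# pack only: source is walked SUBVL-outermost, destination linearly
sv_mv_pu(regs, rt=8, ra=0, pack_en=True, unpack_en=False)
packed = regs[8:14]
print(packed)    # [10, 13, 11, 14, 12, 15]

# unpack only: the opposite pairing, which undoes the move above
sv_mv_pu(regs, rt=16, ra=8, pack_en=False, unpack_en=True)
restored = regs[16:22]
print(restored)  # [10, 11, 12, 13, 14, 15]
```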
Setting both `PACK_en` and `UNPACK_en` is neither prohibited nor
`UNDEFINED`, because the reordering is fully deterministic and
additional REMAP reordering may be applied. For Matrix this would
give potentially up to 4 Dimensions of reordering.
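The determinism claim can be checked with a small sketch that enumerates all four combinations of the two bits. It assumes that element `(i, j)` stays at offset `i*SUBVL+j` and that each bit only selects SUBVL as the outer loop; `schedule` is a hypothetical helper. Under those assumptions every combination reads and writes each offset exactly once, and with both bits set the source and destination follow the same transposed schedule.

```python
from itertools import product

VL, SUBVL = 2, 3  # illustrative values

def index_p(outer):
    """Offsets i*SUBVL+j, visited with SUBVL outermost when `outer` is set."""
    if outer:
        order = [(i, j) for j in range(SUBVL) for i in range(VL)]
    else:
        order = [(i, j) for i in range(VL) for j in range(SUBVL)]
    return [i * SUBVL + j for i, j in order]

def schedule(pack_en, unpack_en):
    """List of (src_idx, dst_idx) pairs in the order they are executed."""
    return list(zip(index_p(pack_en), index_p(unpack_en)))

for pack_en, unpack_en in product([False, True], repeat=2):
    pairs = schedule(pack_en, unpack_en)
    # every offset is read exactly once and written exactly once
    assert sorted(s for s, _ in pairs) == list(range(VL * SUBVL))
    assert sorted(d for _, d in pairs) == list(range(VL * SUBVL))
    print(pack_en, unpack_en, pairs)

both = schedule(True, True)
print(both)  # [(0, 0), (3, 3), (1, 1), (4, 4), (2, 2), (5, 5)]
```

In this model, setting both bits leaves each element at its own offset and changes only the order in which elements are visited.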