# Dynamic Partitioned Slice (`SimdSlice`)

To match the semantics of nmigen's `Slice` class, each element of a
`SimdSlice` result must have exactly the same `Shape` as the result of
slicing the corresponding element of the input `SimdSignal`.

## Example code:

```python
a_s = SimdSignal(...)
a = a_s.sig # shorthand to make table smaller
b_s = a_s[3:6]
b = b_s.sig # shorthand to make table smaller
```

## `a`'s Elements:

(TODO 1: shrink to only 4 partitions. TODO 2: convert to markdown)

<table>
    <tr class="text-right">
        <th scope="row" class="text-left">Bit #</th>
        <td>63&#8288;&hellip;&#8288;56</td>
        <td>55&#8288;&hellip;&#8288;48</td>
        <td>47&#8288;&hellip;&#8288;40</td>
        <td>39&#8288;&hellip;&#8288;32</td>
        <td>31&#8288;&hellip;&#8288;24</td>
        <td>23&#8288;&hellip;&#8288;16</td>
        <td>15&#8288;&hellip;&#8288;8</td>
        <td>7&#8288;&hellip;&#8288;0</td>
    </tr>
    <tr class="text-right">
        <th scope="row" class="text-left">ElWid: 8-bit</th>
        <td><code>a[56:64]</code></td>
        <td><code>a[48:56]</code></td>
        <td><code>a[40:48]</code></td>
        <td><code>a[32:40]</code></td>
        <td><code>a[24:32]</code></td>
        <td><code>a[16:24]</code></td>
        <td><code>a[8:16]</code></td>
        <td><code>a[0:8]</code></td>
    </tr>
    <tr class="text-right">
        <th scope="row" class="text-left">ElWid: 16-bit</th>
        <td colspan="2"><code>a[48:64]</code></td>
        <td colspan="2"><code>a[32:48]</code></td>
        <td colspan="2"><code>a[16:32]</code></td>
        <td colspan="2"><code>a[0:16]</code></td>
    </tr>
    <tr class="text-right">
        <th scope="row" class="text-left">ElWid: 32-bit</th>
        <td colspan="4"><code>a[32:64]</code></td>
        <td colspan="4"><code>a[0:32]</code></td>
    </tr>
    <tr class="text-right">
        <th scope="row" class="text-left">ElWid: 64-bit</th>
        <td colspan="8"><code>a[0:64]</code></td>
    </tr>
</table>

So, slicing bits `3:6` of a 32-bit element of `a` must produce a 3-bit element in order to match nmigen. That might seem like no problem. However, slicing bits `3:6` of a 16-bit element of a 64-bit `SimdSignal` must *also* produce a 3-bit element. To get a `SimdSignal` in which *all* elements are 3 bits wide, as `SimdSlice`'s output requires, we have to introduce padding:

## `b`'s Elements:

(TODO 1: shrink to only 4 partitions. TODO 2: convert to markdown)

<table>
    <tr class="text-right">
        <th scope="row" class="text-left">Bit #</th>
        <td>23&#8288;&hellip;&#8288;21</td>
        <td>20&#8288;&hellip;&#8288;18</td>
        <td>17&#8288;&hellip;&#8288;15</td>
        <td>14&#8288;&hellip;&#8288;12</td>
        <td>11&#8288;&hellip;&#8288;9</td>
        <td>8&#8288;&hellip;&#8288;6</td>
        <td>5&#8288;&hellip;&#8288;3</td>
        <td>2&#8288;&hellip;&#8288;0</td>
    </tr>
    <tr class="text-right">
        <th scope="row" class="text-left">ElWid: 8-bit</th>
        <td><code>b[21:24]</code></td>
        <td><code>b[18:21]</code></td>
        <td><code>b[15:18]</code></td>
        <td><code>b[12:15]</code></td>
        <td><code>b[9:12]</code></td>
        <td><code>b[6:9]</code></td>
        <td><code>b[3:6]</code></td>
        <td><code>b[0:3]</code></td>
    </tr>
    <tr class="text-right">
        <th scope="row" class="text-left">ElWid: 16-bit</th>
        <td class="text-center"><i>Padding</i></td>
        <td><code>b[18:21]</code></td>
        <td class="text-center"><i>Padding</i></td>
        <td><code>b[12:15]</code></td>
        <td class="text-center"><i>Padding</i></td>
        <td><code>b[6:9]</code></td>
        <td class="text-center"><i>Padding</i></td>
        <td><code>b[0:3]</code></td>
    </tr>
    <tr class="text-right">
        <th scope="row" class="text-left">ElWid: 32-bit</th>
        <td colspan="3" class="text-center"><i>Padding</i></td>
        <td><code>b[12:15]</code></td>
        <td colspan="3" class="text-center"><i>Padding</i></td>
        <td><code>b[0:3]</code></td>
    </tr>
    <tr class="text-right">
        <th scope="row" class="text-left">ElWid: 64-bit</th>
        <td colspan="7" class="text-center"><i>Padding</i></td>
        <td><code>b[0:3]</code></td>
    </tr>
</table>

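The padded layout in the table above can be reproduced with a few lines of plain Python. This is only an illustrative model working on integers, not the `SimdSlice` implementation: the helper names `sliced_layout` and `simd_slice` are hypothetical, and both the placement of each element at the bottom of its group of output lanes and the choice to leave padding bits at zero are assumptions read off the table.

```python
def sliced_layout(lanes, lane_bits, out_bits):
    """Return {input element width: [(start, stop), ...]} for the sliced signal.

    lanes     -- number of smallest-size partitions in the input (8 in the table)
    lane_bits -- width of the smallest input element (8)
    out_bits  -- width of every sliced element (3 for a_s[3:6])
    """
    layout = {}
    lanes_per_elem = 1
    while lanes_per_elem <= lanes:
        # One output element per input element; it occupies the lowest
        # out_bits of the output lanes covered by that input element.
        starts = range(0, lanes * out_bits, lanes_per_elem * out_bits)
        layout[lanes_per_elem * lane_bits] = [(s, s + out_bits) for s in starts]
        lanes_per_elem *= 2
    return layout


def simd_slice(a, elwid_bits, start=3, stop=6, total_bits=64, lane_bits=8):
    """Slice [start:stop] out of every elwid_bits-wide element of integer a."""
    out_bits = stop - start
    lanes = total_bits // lane_bits
    mask = (1 << out_bits) - 1
    positions = sliced_layout(lanes, lane_bits, out_bits)[elwid_bits]
    b = 0
    for i, (lo, _hi) in enumerate(positions):
        elem = (a >> (i * elwid_bits)) & ((1 << elwid_bits) - 1)
        b |= ((elem >> start) & mask) << lo   # padding bits are left at zero
    return b


layout = sliced_layout(lanes=8, lane_bits=8, out_bits=3)
print(layout[16])   # [(0, 3), (6, 9), (12, 15), (18, 21)]
print(layout[32])   # [(0, 3), (12, 15)]
print(layout[64])   # [(0, 3)]
```

Each printed list matches the non-padding cells of the corresponding `ElWid` row: every other range in that row is padding.
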
<style>
/* duplicated from bootstrap so text editors can see it
   -- ignored by ikiwiki */
.text-left {
    text-align: left !important
}

.text-right {
    text-align: right !important
}

.text-center {
    text-align: center !important
}
</style>

# Partitioned SIMD Design implications

Slice is the first sub-module in the entire Partitioned SimdSignal suite
that requires (and propagates) fixed element widths. Until this point, all
other sub-modules have had a fixed *overall* width, with the element widths
adapting to completely fill the underlying Signal.

(**This includes [[dynamic_simd/eq]] and the other comparators, as well as
[[dynamic_simd/logicops]], which very deliberately propagate the LSB boolean
value in each partition throughout the entire partition on a per-element
basis in order to make Mux and Switch function correctly**)

Given that this new width context is then passed through to other SimdSignals,
the entire SimdSignal suite has to adapt to this change in requirements.
It is, however, not as big an adaptation as it first seems, because ultimately
SimdSignals use PartitionPoints (and a PartType) to decide what to do.
To illustrate that SimdSignal uses PartitionPoints to make its decisions
at the low level, consider an add example using `b` and a new SimdSignal `c`
with an overall width of 8 bits (and fixed element widths of 2):

(TODO: add an example of how this would then do e.g. an add to another
SimdSignal of only 8 bits in length or so, with all element widths being
2 in all partitions, but having the exact same PartitionPoints)

Questions raised by the add example:

* after performing a Slice, which creates an entirely new
  (padded) set of PartitionPoints, where do `c`'s PartitionPoints
  come from?
* how should a SimdSignal that does not contain the same
  padding be add()ed to a Slice()d SimdSignal that *does*
  contain padding, and so has a completely different set of PartitionPoints?
* what happens when a fixed element width Slice()d source `b` is
  add()ed to a fixed *overall* width SimdSignal of width 8 that
  permits variable-length (max available space) elements?

Illustrating the case of adding a SimdSignal with padding to one that
does not have padding:

(TODO: add a second example of how this would then do e.g. an add to another
SimdSignal of only 8 bits in length or so, but having a **different**
style of PartitionPoints, with no padding this time)

Take signal a, of 16 bits, each bit being numbered in hexadecimal:

            |        |        |
    AfAeAdAc AbAaA9A8 A7A6A5A4 A3A2A1A0

and take a slice a[0:3] to create 3-bit values, where padding is
specified by "x", at each elwid:

    elwid         |        |        |
    0b00  x x x x  x x x x  x x x x  x A2A1A0
    0b01  x x x x  x AaA9A8 x x x x  x A2A1A0
    0b10  x AeAdAc x AaA9A8 x A6A5A4 x A2A1A0

The presence of "x" unused portions actually requires some additional
partition points:

    elwid   |      |  |      |  |      |  |
    0b00  x  x x x  x  x x x  x  x x x  x  A2A1A0
    0b01  x  x x x  x  AaA9A8 x  x x x  x  A2A1A0
    0b10  x  AeAdAc x  AaA9A8 x  A6A5A4 x  A2A1A0

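As a rough sketch of where the extra partition points come from, the following plain-Python helper lists the split positions for a signal with and without padding. The function name `partition_points` and its parameters are hypothetical, chosen for this illustration only, and the real PartitionPoints class is not used here.

```python
def partition_points(total_bits, lane_bits, elem_bits=None):
    """Bit positions at which a signal of total_bits may be split.

    lane_bits -- width of the smallest partition (lane)
    elem_bits -- width of the useful data at the bottom of each lane;
                 None means the data fills the whole lane (no padding)
    """
    points = set(range(lane_bits, total_bits, lane_bits))  # lane boundaries
    if elem_bits is not None and elem_bits < lane_bits:
        # extra points separating each element from the padding above it
        points |= {lo + elem_bits for lo in range(0, total_bits, lane_bits)}
    return sorted(points)


print(partition_points(16, 4))      # [4, 8, 12]
print(partition_points(16, 4, 3))   # [3, 4, 7, 8, 11, 12, 15]
```

The first call gives the three points of the unpadded 16-bit signal. The second gives the seven points needed once each 4-bit lane holds a 3-bit element plus one bit of padding.
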
Now let us take a signal, b, made up of 2-bit elements,
and attempt to perform an add operation:

    elwid     |    |    |
    0b00  x x  x x  x x  B1B0
    0b01  x x  B5B4 x x  B1B0
    0b10  B7B6 B5B4 B3B2 B1B0

This is not immediately possible (at least not
obviously so), and consequently b needs expanding
to the same padding and PartitionPoints:

    elwid   |      |  |      |  |      |  |
    0b00  x  x x x  x  x x x  x  x x x  x  0 B1B0
    0b01  x  x x x  x  0 B5B4 x  x x x  x  0 B1B0
    0b10  x  0 B7B6 x  0 B5B4 x  0 B3B2 x  0 B1B0

Note here that zero-extension also had to occur to
bring b up to the same element width in each partition.
At that point, with the "x" padding being ignored, a straight
PartitionedAdd may be deployed because both the overall
width and the positions of the PartitionPoints are exactly
matched.

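The expand-then-add step can be modelled on plain integers, as a minimal sketch only. The helpers `pack` and `partitioned_add` are hypothetical names, not the PartitionedAdd module, and the sketch assumes that each sum wraps within its 3-bit element and that padding bits are left at zero. Only elwid `0b10` (four lanes of 4 bits) is shown.

```python
def pack(elems, lane_bits, elem_bits):
    """Place each value (zero-extended to elem_bits) at the bottom of its lane."""
    word = 0
    for i, value in enumerate(elems):
        assert 0 <= value < (1 << elem_bits)
        word |= value << (i * lane_bits)
    return word


def partitioned_add(x, y, lanes, lane_bits, elem_bits):
    """Add lane by lane; carries never cross a partition point."""
    mask = (1 << elem_bits) - 1
    result = 0
    for i in range(lanes):
        shift = i * lane_bits
        total = ((x >> shift) & mask) + ((y >> shift) & mask)
        result |= (total & mask) << shift   # wrap within the element
    return result


# elwid 0b10 of the example: four 4-bit lanes, 3-bit elements plus 1 pad bit
a_sliced = pack([0b101, 0b111, 0b001, 0b110], lane_bits=4, elem_bits=3)
# b's 2-bit elements, zero-extended to 3 bits and moved to the same layout
b_expanded = pack([0b11, 0b01, 0b10, 0b00], lane_bits=4, elem_bits=3)
print(hex(partitioned_add(a_sliced, b_expanded, 4, 4, 3)))   # 0x6300
```

The two lowest lanes overflow (5 + 3 and 7 + 1) and wrap to zero without disturbing their neighbours, which is the property the matched PartitionPoints are there to guarantee.
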
Another example: Cat() on the same 2 signals. Here at least we
know that the end result is elements of 5 bits each, because
all "a" slices are 3 bits and all "b" elements are 2 bits:

    elwid   |          |  |          |  |          |  |
    0b00  x  x x x x x  x  x x x x x  x  x x x x x  x  B1B0A2A1A0
    0b01  x  x x x x x  x  B5B4AaA9A8 x  x x x x x  x  B1B0A2A1A0
    0b10  x  B7B6AeAdAc x  B5B4AaA9A8 x  B3B2A6A5A4 x  B1B0A2A1A0

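A minimal integer model of this Cat() layout follows. The function name `simd_cat` is hypothetical, and the single bit of padding per lane is an assumption taken from the elwid `0b10` row of the table, not from the implementation.

```python
def simd_cat(a_elems, b_elems, a_bits=3, b_bits=2, pad_bits=1):
    """Concatenate element pairs (a in the low bits, b above) into padded lanes.

    Returns the packed word together with the element and lane widths.
    """
    elem_bits = a_bits + b_bits          # 5-bit result elements
    lane_bits = elem_bits + pad_bits     # 6-bit lanes in the 24-bit result
    word = 0
    for i, (a, b) in enumerate(zip(a_elems, b_elems)):
        assert 0 <= a < (1 << a_bits) and 0 <= b < (1 << b_bits)
        word |= (a | (b << a_bits)) << (i * lane_bits)
    return word, elem_bits, lane_bits


word, elem_bits, lane_bits = simd_cat([0b101, 0b111, 0b001, 0b110],
                                      [0b11, 0b01, 0b10, 0b00])
print(elem_bits, lane_bits)      # 5 6
print(format(word, "024b"))
```

Reading the printed bit string in groups of six from the right shows one padding bit followed by two `b` bits and three `a` bits in every lane.
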
Illustrating the case where a Sliced (fixed element width) SimdSignal
is added to one which has variable-length elements that take up the
entirety of the partition (overall fixed width):

(TODO: third example)
231
232 (TODO: third example)