# SimdShape

A logical extension of the nmigen `ast.Shape` concept, `SimdShape`
provides sufficient context both to define overrides for individual
lengths on a per-mask basis and to "upcast" back to a SimdSignal, in
exactly the same way that C++ virtual base class upcasting works when
RTTI (Run Time Type Information) is available.

By deriving from `ast.Shape`, both `width` and `signed` are already
provided, leaving the `SimdShape` class with the responsibility of
additionally defining lengths for each mask basis. This is best
illustrated with an example.

The Libre-SOC IEEE754 ALUs need to be converted to SIMD Partitioning
but without massively disruptive code-duplication or intrusive explicit
coding as outlined in the worst of the techniques documented in
[[dynamic_simd]]. This in turn implies that Signals need to be declared
for both mantissa and exponent that **change width to non-power-of-two
sizes** depending on Partition Mask Context.

Mantissa:

* when the context is 1xFP64 the mantissa is 52 bits (excluding guard,
  round and sticky bits)
* when the context is 2xFP32 there are **two** mantissas of 23 bits
* when the context is 4xFP16 there are **four** mantissas of 10 bits
* when the context is 4xBF16 there are **four** mantissas of 7 bits

Exponent:

* 1xFP64: 11 bits, one exponent
* 2xFP32: 8 bits, two exponents
* 4xFP16: 5 bits, four exponents
* 4xBF16: 8 bits, four exponents

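The exponent list above can be written down as a small lookup table. The sketch below is plain Python with hypothetical names (`EXPONENT_WIDTHS`, `total_used_bits`); it is not the `SimdShape` API, only an illustration of the per-context data that such a class would have to carry, and of how the occupied width differs from one context to the next.

```python
# Hypothetical table, NOT the SimdShape API: partition context ->
# (number of elements, exponent bits per element), taken from the list above.
EXPONENT_WIDTHS = {
    "1xFP64": (1, 11),
    "2xFP32": (2, 8),
    "4xFP16": (4, 5),
    "4xBF16": (4, 8),
}


def total_used_bits(widths):
    """Return, per context, the number of bits actually occupied."""
    return {ctx: count * elwid for ctx, (count, elwid) in widths.items()}


# The occupied width changes with the context: 11, 16, 20 and 32 bits.
print(total_used_bits(EXPONENT_WIDTHS))
```
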
`SimdShape` needs this information in addition to the normal
information (width, sign) in order to create the partitions
that allow standard nmigen operations to **transparently**
and naturally take place at **all** of these non-uniform
widths, as if they were in fact scalar Signals *at* those
widths.

A minor wrinkle which emerges from deep analysis is that the overall
available width (`Shape.width`) does in fact need to be explicitly
declared, and the sub-partitions need to fit onto power-of-two
boundaries, in order to allow straight wire-connections rather than
allowing the SimdSignal to be arbitrary-sized (compact). Although on
shallow inspection this would seem to imply large unused
sub-partitions (padding partitions), these gates can in fact be
eliminated with a "blanking" mask, created from static analysis of the
SimdShape context.

Example:

* all 32-bit and 16-bit values are actually to be truncated to 11 bits
* all 8-bit values are to be truncated to 5 bits

From these we can write out the allocations, bearing in mind that
in each partition the sub-signal must start on a power-of-two boundary,
and that "x" marks unused (padding) portions:

          |31      24|23      16|15       8|7        0|
    32bit | x                       |10 ............ 0|
    16bit | x |26 ........... 16| x |10 ............ 0|
    8bit  | x |28..24| x |20..16| x |12..8 | x | 4..0 |

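The allocation in the table can be reproduced with a few lines of plain Python. This is only a sketch: `CONTEXTS` and `used_ranges` are hypothetical names chosen for illustration and are not part of nmigen or of the `SimdShape` API. It assumes that the overall width is 32 bits and that each element starts at a multiple of `width // count`, which is a power of two for the contexts used here.

```python
WIDTH = 32

# Hypothetical context table: name -> (number of elements, used bits each).
CONTEXTS = {"32bit": (1, 11), "16bit": (2, 11), "8bit": (4, 5)}


def used_ranges(width, count, elwid):
    """Return (msb, lsb) pairs of the used bits, most significant first."""
    stride = width // count          # start-to-start distance of elements
    assert elwid <= stride, "element does not fit in its partition"
    return [(i * stride + elwid - 1, i * stride)
            for i in reversed(range(count))]


for name, (count, elwid) in CONTEXTS.items():
    print(name, used_ranges(WIDTH, count, elwid))
# 32bit [(10, 0)]
# 16bit [(26, 16), (10, 0)]
# 8bit [(28, 24), (20, 16), (12, 8), (4, 0)]
```
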
Thus, we deduce, we *actually* need breakpoints at these positions,
and the unused portions common to **all** cases can be deduced
and marked "x":

      |28|26|24|  |20|16|  |12|10|8|  |4    0
    x                    x
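
The same static analysis can be sketched in plain Python, again with hypothetical names (`CONTEXTS`, `used_ranges`, `used_mask`) that are not part of nmigen or `SimdShape`. The sketch assumes that the breakpoints in the diagram are the end-points (most and least significant bit) of every used range, which matches the positions listed above; how SimdSignal actually encodes its partition points may differ. The "blanking" mask is taken to be the set of bits that no context ever uses.

```python
WIDTH = 32

# Hypothetical context table: name -> (number of elements, used bits each).
CONTEXTS = {"32bit": (1, 11), "16bit": (2, 11), "8bit": (4, 5)}


def used_ranges(width, count, elwid):
    """Return (msb, lsb) pairs of the used bits of every element."""
    stride = width // count
    return [(i * stride + elwid - 1, i * stride) for i in range(count)]


def used_mask(width, count, elwid):
    """Return an integer with a 1 in every bit position that is used."""
    mask = 0
    for msb, lsb in used_ranges(width, count, elwid):
        mask |= ((1 << (msb - lsb + 1)) - 1) << lsb
    return mask


all_used = 0
endpoints = set()
for count, elwid in CONTEXTS.values():
    all_used |= used_mask(WIDTH, count, elwid)
    for msb, lsb in used_ranges(WIDTH, count, elwid):
        endpoints.update((msb, lsb))

# Bits that are padding in *every* context: 31..29 and 15..13.
blanking = ~all_used & ((1 << WIDTH) - 1)

print(hex(blanking))                    # 0xe000e000
print(sorted(endpoints, reverse=True))  # [28, 26, 24, 20, 16, 12, 10, 8, 4, 0]
```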