A logical extension of the nmigen `ast.Shape` concept, `SimdShape`
provides sufficient context both to define overrides for individual lengths
on a per-mask basis and to "upcast"
back to a SimdSignal, in exactly the same way that C++ virtual base
class upcasting works when RTTI (Run Time Type Information) is enabled.
By deriving from `ast.Shape`, both `width` and `signed` are provided
already, leaving the `SimdShape` class with the responsibility to
additionally define lengths for each mask basis. This is best illustrated
with a worked example.
The Libre-SOC IEEE754 ALUs need to be converted to SIMD Partitioning
but without massive disruptive code-duplication or intrusive explicit
coding, as outlined in the worst of the techniques documented in
[[dynamic_simd]]. This in turn implies that Signals need to be declared
for both mantissa and exponent that **change width to non-power-of-two
sizes** depending on Partition Mask Context.
Taking the mantissa first:

* when the context is 1xFP64 the mantissa is 54 bits (excluding guard,
  round and sticky bits)
* when the context is 2xFP32 there are **two** mantissas of 23 bits
* when the context is 4xFP16 there are **four** mantissas of 10 bits
* when the context is 4xBF16 there are **four** mantissas of 7 bits.
Likewise, the exponent widths are:

* 1xFP64: 11 bits, one exponent
* 2xFP32: 8 bits, two exponents
* 4xFP16: 5 bits, four exponents
* 4xBF16: 8 bits, four exponents
`SimdShape` needs this information in addition to the normal
information (width, sign) in order to create the partitions
that allow standard nmigen operations to **transparently**
and naturally take place at **all** of these non-uniform
widths, as if they were in fact scalar Signals *at* those
smaller widths.
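As a plain-Python sketch of the idea (the `SimdShapeSketch` class and its
`lane_widths` argument here are illustrative inventions, not the actual
nmigen-derived API), a shape can carry width and sign plus, per partition
context, how many lanes there are and how wide each one is:

```python
class SimdShapeSketch:
    """Illustrative stand-in for SimdShape: width/signed plus
    per-context lane counts and (non-power-of-two) lane widths."""
    def __init__(self, width, signed, lane_widths):
        self.width = width
        self.signed = signed
        self.lane_widths = lane_widths   # context -> (lanes, bits per lane)

    def lanes(self, context):
        """Per-lane element widths for a given partition context."""
        n, bits = self.lane_widths[context]
        return [bits] * n

# the exponent requirements above, all held in one 32-bit container
exp_shape = SimdShapeSketch(32, signed=False, lane_widths={
    "1xFP64": (1, 11),
    "2xFP32": (2, 8),
    "4xFP16": (4, 5),
    "4xBF16": (4, 8),
})
print(exp_shape.lanes("4xFP16"))   # [5, 5, 5, 5]
```

The point of the sketch is only that one object answers both "how wide is
the whole container?" and "how wide is each element in context X?".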
A minor wrinkle which emerges from deep analysis is that the overall
available width (`Shape.width`) does in fact need to be explicitly
declared, and the sub-partitions made to fit onto power-of-two boundaries,
in order to allow
straight wire-connections rather than allow the SimdSignal to be
arbitrary-sized (compact). Although on shallow inspection this
would initially seem to imply that it would result in large unused
sub-partitions (padding partitions), these gates can in fact be eliminated
with a "blanking" mask, created from static analysis of the SimdShape
context.
As a worked example, take a 32-bit signal with element-width overrides
of 32, 16 and 8 bits, where:

* all 32 and 16-bit values are actually to be truncated to 11 bits
* all 8-bit values to 5 bits
From these we can write out the allocations, bearing in mind that
in each partition the sub-signal must start on a power-of-two boundary,
and that "x" marks unused (padding) portions:
          |31|  |  |  | 16|15|  | 8|7      0 |
    32bit | x| x| x|    | x| x| x|10 .... 0  |
    16bit | x| x|26 ... 16 | x| x|10 .... 0  |
    8bit  | x|28 .. 24| 20.16| x|12 .. 8|x|4.. 0 |
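The allocation rule behind the table (each sub-signal sits at the bottom of
an equally-sized power-of-two partition) can be sketched as follows; the
`allocate` helper is an invention for illustration, but it reproduces the
bit ranges above:

```python
def allocate(n_lanes, lane_bits, total_width=32):
    """Place n_lanes values of lane_bits each, one per equal
    power-of-two partition of total_width, each starting at the
    bottom of its partition.  Returns (start, end) ranges, end
    exclusive; bits outside these ranges are padding ("x")."""
    stride = total_width // n_lanes   # partition size (power of two)
    assert lane_bits <= stride, "value must fit inside its partition"
    return [(i * stride, i * stride + lane_bits) for i in range(n_lanes)]

# reproduce the three rows of the table
print(allocate(1, 11))  # [(0, 11)]            -> bits 10..0
print(allocate(2, 11))  # [(0, 11), (16, 27)]  -> bits 26..16 and 10..0
print(allocate(4, 5))   # 28..24, 20..16, 12..8, 4..0
```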
Thus we deduce that we *actually* need breakpoints at these positions,
and that the unused portions common to **all** cases can also be deduced:

      |  |28|26|24|  |20|16|  |12|10|8|  |4 |
These 100% unused "x"s therefore define the "blanking" mask, and in
these sub-portions it is unnecessary to allocate computational gates.
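That static analysis can be sketched directly: take the (start, end) bit
allocations read off the table above, OR together the bits that each
context actually uses, and complement the result (the helper names here
are invented for illustration):

```python
def used_mask(lanes):
    """Bitmask of the positions carrying data in one context."""
    m = 0
    for start, end in lanes:            # end exclusive
        m |= ((1 << (end - start)) - 1) << start
    return m

cases = {                               # allocations from the table
    "32bit": [(0, 11)],
    "16bit": [(0, 11), (16, 27)],
    "8bit":  [(0, 5), (8, 13), (16, 21), (24, 29)],
}

# a bit belongs in the blanking mask if *no* context ever uses it
used_any = 0
for lanes in cases.values():
    used_any |= used_mask(lanes)
blanking = ~used_any & (2**32 - 1)
print(hex(blanking))   # 0xe000e000: bits 15..13 and 31..29 never carry data
```

Gates driven only by blanked bits can be removed outright, since no
partition configuration can ever route data through them.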
Also in order to save gates: in the example above there are only three
cases (32-bit, 16-bit, 8-bit), and therefore only three sets of logic
are required to construct the larger overall computational result
from the "smaller chunks". At first glance, with there
being 9 actual partition breakpoints (28, 26, 24, 20, 16, 12, 10, 8, 4), it
would appear that 2^9 (512!) cases were required, where in fact
there are only three.
These facts also need to be communicated to both the SimdSignal
and the submodules implementing its core functionality: the
add operation and other arithmetic behaviour,
[[dynamic_simd/cat]], and others.
In addition to that, there is a "convenience" that emerged
from technical discussions as desirable
to have: it should be possible to perform
rudimentary arithmetic operations *on a SimdShape itself*, which preserve
or adapt the Partition context, with the arithmetic
occurring on `Shape.width`:
    >>> XLEN = SimdShape(64, signed=True, ...)
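A sketch of what such shape arithmetic could look like (the `PartShape`
class and its `lane_widths` argument are hypothetical stand-ins for
`SimdShape`, since the real API is not shown here): dividing the shape
scales `Shape.width` *and* every per-context element width together, so
the partition context is preserved.

```python
class PartShape:
    """Hypothetical stand-in for SimdShape: an overall width plus
    per-context element widths.  Arithmetic scales both together,
    preserving the set of partition contexts."""
    def __init__(self, width, signed=False, lane_widths=None):
        self.width = width
        self.signed = signed
        self.lane_widths = dict(lane_widths or {})

    def __floordiv__(self, divisor):
        # shrink the container *and* every element width, keeping
        # the same partition contexts
        return PartShape(self.width // divisor, self.signed,
                         {ctx: bits // divisor
                          for ctx, bits in self.lane_widths.items()})

XLEN = PartShape(64, signed=True,
                 lane_widths={"1x": 64, "2x": 32, "4x": 16})
HALF = XLEN // 2   # still has 1x/2x/4x contexts, all scaled down
print(HALF.width, HALF.lane_widths)
```

The same pattern extends naturally to `*`, `+` and so on, each deciding
how the partition context should adapt.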
With this capability it becomes possible to use the Liskov Substitution
Principle in dynamically compiling code that switches between scalar and
SIMD contexts at compile time:
    from types import SimpleNamespace  # plain object() cannot take attributes

    # scalar context: standard nmigen Signals
    scalarctx = scl = SimpleNamespace()
    scl.XLEN = 64
    scl.SigKls = Signal          # standard nmigen Signal

    # SIMD context: partitioned SimdSignals
    simdctx = sdc = SimpleNamespace()
    sdc.XLEN = SimdShape(64, ....)
    sdc.SigKls = SimdSignal      # advanced SIMD Signal
    sdc.elwidth = Signal(2)

    # select the context at compile time
    if compiletime_switch == 'SIMD':
        ctx = simdctx
    else:
        ctx = scalarctx

    # exact same code, switching context at compile time
    x = ctx.SigKls(ctx.XLEN)
    m.d.comb += x.eq(Const(3))
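With stub classes standing in for nmigen's `Signal` and the `SimdSignal`
(both minimal inventions here, not the real implementations), the
substitution pattern can be exercised end-to-end: because `SimdSignal` is
a `Signal` subclass, code written against the scalar context runs
unmodified against the SIMD one.

```python
from types import SimpleNamespace

class Signal:                      # stub for nmigen's Signal
    def __init__(self, shape):
        # accept either a plain integer width or a shape-like object
        self.width = shape if isinstance(shape, int) else shape.width

class SimdSignal(Signal):          # stub: a Signal subclass, so LSP holds
    pass

# two interchangeable contexts, as in the fragment above
scalarctx = SimpleNamespace(SigKls=Signal, XLEN=64)
simdctx = SimpleNamespace(SigKls=SimdSignal,
                          XLEN=SimpleNamespace(width=64))

def build(ctx):
    # identical construction code regardless of context
    return ctx.SigKls(ctx.XLEN)

x = build(scalarctx)   # a plain Signal of width 64
y = build(simdctx)     # a SimdSignal of width 64
print(type(x).__name__, type(y).__name__)
```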