# SimdShape

A logical extension of the nmigen `ast.Shape` concept, `SimdShape`
provides sufficient context both to define overrides for individual lengths
on a per-mask basis and to "upcast" back to a SimdSignal, in exactly
the same way that C++ virtual base class upcasting works when RTTI
(Run Time Type Information) is enabled.

By deriving from `ast.Shape`, both `width` and `signed` are already
provided, leaving the `SimdShape` class with the responsibility of
additionally defining lengths for each mask basis. This is best
illustrated with an example.

The Libre-SOC IEEE754 ALUs need to be converted to SIMD Partitioning,
but without the massive, disruptive code-duplication or intrusive explicit
coding outlined in the worst of the techniques documented in
[[dynamic_simd]]. This in turn implies that Signals need to be declared
for both mantissa and exponent that **change width to non-power-of-two
sizes** depending on Partition Mask Context.

Mantissa:

* when the context is 1xFP64 the mantissa is 54 bits (excluding guard,
  round and sticky)
* when the context is 2xFP32 there are **two** mantissas of 23 bits
* when the context is 4xFP16 there are **four** mantissas of 10 bits
* when the context is 4xBF16 there are **four** mantissas of 5 bits

Exponent:

* 1xFP64: 11 bits, one exponent
* 2xFP32: 8 bits, two exponents
* 4xFP16: 5 bits, four exponents
* 4xBF16: 8 bits, four exponents

`SimdShape` needs this information in addition to the normal
information (width, sign) in order to create the partitions
that allow standard nmigen operations to **transparently**
and naturally take place at **all** of these non-uniform
widths, as if they were in fact scalar Signals *at* those
widths.

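As a rough illustration of the idea (an assumption for this page only,
not the actual Libre-SOC `SimdShape` API): a class deriving from nmigen's
`ast.Shape` gains `width` and `signed` for free, and only has to record
one element length per Partition Mask Context, shown here using the
exponent widths from the list above.

    from nmigen.hdl.ast import Shape

    class SimdShapeSketch(Shape):
        """Illustrative sketch only: the real SimdShape lives in the
        Libre-SOC codebase and its constructor differs."""
        def __init__(self, width=1, signed=False, lengths=None):
            super().__init__(width, signed)   # Shape provides width and signed
            # one element length per Partition Mask Context (names made up)
            self.lengths = lengths or {}

    # the exponent example: overall width 64, one length per context
    exp = SimdShapeSketch(width=64, signed=False,
                          lengths={"1xFP64": 11, "2xFP32": 8,
                                   "4xFP16": 5, "4xBF16": 8})
    print(exp.width, exp.signed, exp.lengths["2xFP32"])   # 64 False 8
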
A minor wrinkle which emerges from deeper analysis is that the overall
available width (`Shape.width`) does in fact need to be explicitly
declared, and the sub-partitions need to fit onto power-of-two
boundaries, in order to allow straight wire-connections rather than
allowing the SimdSignal to be arbitrarily-sized (compact). Although on
shallow inspection this would initially seem to imply large unused
sub-partitions (padding partitions), these gates can in fact be
eliminated with a "blanking" mask, created from static analysis of the
SimdShape context.

Example:

* all 32 and 16-bit values are actually to be truncated to 11 bits
* all 8-bit values to 5 bits

From these we can write out the allocations, bearing in mind that
in each partition the sub-signal must start on a power-of-two boundary,
and that "x" marks unused (padding) portions:

          |31|        |       |    16|15|       | 8|7       0|
    32bit | x|    x   |   x   |   x  | x|   x   | x|10 .... 0|
    16bit | x|    x   |26 ....     16| x|   x   |10 ....... 0|
    8bit  | x|28 .. 24|   x   |20..16| x|12 .. 8| x|4 ..... 0|

Thus we deduce that we *actually* need breakpoints at these positions,
and that the unused portions common to **all** cases can be deduced
and marked "x":

    | |28|26|24| |20|16| |12|10|8| |4 |
     x                  x

These 100% unused "x"s therefore define the "blanking" mask, and in
these sub-portions it is unnecessary to allocate computational gates.

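As a rough sketch of that static analysis (illustrative only, not the
actual Libre-SOC code; the layout data is simply transcribed from the
allocation table above), the blanking mask is the complement of the
union of every bit used in any supported case:

    # used sub-signals per case as (start, length), read off the table above
    cases = {
        32: [(0, 11)],
        16: [(0, 11), (16, 11)],
        8:  [(0, 5), (8, 5), (16, 5), (24, 5)],
    }

    WIDTH = 32
    used = 0
    for ranges in cases.values():
        for start, length in ranges:
            used |= ((1 << length) - 1) << start

    # bits never used by any case need no computational gates at all
    blanking_mask = ~used & ((1 << WIDTH) - 1)
    print(f"{blanking_mask:032b}")   # ones at bits 31-29 and 15-13
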
Also, in order to save gates: in the example above there are only three
cases (32-bit, 16-bit, 8-bit), therefore only three sets of logic
are required to construct the larger overall computational result
from the "smaller chunks". At first glance, with there being
nine actual partition breakpoints (28, 26, 24, 20, 16, 12, 10, 8, 4),
it would appear that 2^9 (512!) cases were required, when in fact
there are only three.

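A small sketch (again illustrative only, with the per-case break
positions read off the layouts above) makes that explicit: each
supported element width activates a fixed, statically-known subset of
the nine break positions, so only three distinct partition patterns
ever occur:

    # the nine candidate break positions from the example above
    breakpoints = (4, 8, 10, 12, 16, 20, 24, 26, 28)

    # break positions actually exercised by each of the three cases
    active = {
        32: {10},
        16: {10, 16, 26},
        8:  {4, 8, 12, 16, 20, 24, 28},
    }

    for elwidth, brks in active.items():
        pattern = "".join("1" if b in brks else "0" for b in breakpoints)
        print(f"{elwidth:2}-bit: {pattern}")
    # only these three patterns occur, never 2**9 == 512 arbitrary combinations
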
These facts also need to be communicated both to the SimdSignal
and to the submodules implementing its core functionality: the add
operation and other arithmetic behaviour, as well as
[[dynamic_simd/cat]] and others.

In addition to that, there is a "convenience" that emerged
from technical discussions as desirable
to have, which is that it should be possible to perform
rudimentary arithmetic operations *on a SimdShape* which preserve
or adapt the Partition context, where the arithmetic operations
occur on `Shape.width`:

    >>> XLEN = SimdShape(64, signed=True, ...)
    >>> x2 = XLEN // 2
    >>> print(x2.width)
    32
    >>> print(x2.signed)
    True

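A hypothetical sketch of how such an operator might behave (this is not
the Libre-SOC implementation; it simply extends the illustrative
`SimdShapeSketch` class from earlier): the arithmetic is applied to
`Shape.width`, the per-context lengths are scaled to match, and a new
shape is returned so that the Partition context is preserved.

    from nmigen.hdl.ast import Shape

    class SimdShapeSketch(Shape):
        """Illustrative stand-in for SimdShape: shape arithmetic only."""
        def __init__(self, width=1, signed=False, lengths=None):
            super().__init__(width, signed)
            self.lengths = lengths or {}     # per-context element lengths

        def __floordiv__(self, divisor):
            # operate on Shape.width, scale the lengths, keep signedness
            return SimdShapeSketch(self.width // divisor, self.signed,
                                   {k: v // divisor
                                    for k, v in self.lengths.items()})

    XLEN = SimdShapeSketch(64, signed=True, lengths={1: 64, 2: 32, 4: 16})
    x2 = XLEN // 2
    print(x2.width, x2.signed)   # 32 True
    print(x2.lengths)            # {1: 32, 2: 16, 4: 8}
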
With this capability it becomes possible to use the Liskov Substitution
Principle in dynamically compiling code that switches between scalar and
SIMD transparently:

    from types import SimpleNamespace   # object() cannot hold attributes,
                                         # so use SimpleNamespace in this sketch
    from nmigen import Module, Signal, Const
    # SimdShape and SimdSignal are the Libre-SOC classes described here
    # (import path omitted); compiletime_switch is set by the build config

    # scalar context
    scalarctx = scl = SimpleNamespace()
    scl.XLEN = 64
    scl.SigKls = Signal                  # standard nmigen Signal
    # SIMD context
    simdctx = sdc = SimpleNamespace()
    sdc.XLEN = SimdShape(64, ...)
    sdc.SigKls = SimdSignal              # advanced SIMD Signal
    sdc.elwidth = Signal(2)
    # select one
    if compiletime_switch == 'SIMD':
        ctx = simdctx
    else:
        ctx = scalarctx

    # exact same code switching context at compile time
    m = Module()
    with ctx:
        x = ctx.SigKls(ctx.XLEN)
        ...
        m.d.comb += x.eq(Const(3))
130