# SimdShape

A logical extension of the nmigen `ast.Shape` concept, `SimdShape`
provides sufficient context both to define overrides for individual lengths
on a per-mask basis and to "upcast" back to a SimdSignal, in exactly the
same way that C++ virtual base class upcasting works when RTTI
(Run Time Type Information) is available.

By deriving from `ast.Shape`, both `width` and `signed` are already
provided, leaving the `SimdShape` class with the responsibility of
additionally defining lengths on a per-mask basis. This is best illustrated
with an example.

The Libre-SOC IEEE754 ALUs need to be converted to SIMD Partitioning,
but without massively disruptive code duplication or intrusive explicit
coding as outlined in the worst of the techniques documented in
[[dynamic_simd]].  This in turn implies that Signals need to be declared
for both mantissa and exponent that **change width to non-power-of-two
sizes** depending on Partition Mask Context.

Mantissa:

* when the context is 1xFP64 the mantissa is 52 bits (excluding guard,
  round and sticky)
* when the context is 2xFP32 there are **two** mantissas of 23 bits
* when the context is 4xFP16 there are **four** mantissas of 10 bits
* when the context is 4xBF16 there are **four** mantissas of 7 bits

Exponent:

* 1xFP64: 11 bits, one exponent
* 2xFP32: 8 bits, two exponents
* 4xFP16: 5 bits, four exponents
* 4xBF16: 8 bits, four exponents

`SimdShape` needs this information in addition to the normal
information (width, sign) in order to create the partitions
that allow standard nmigen operations to **transparently**
and naturally take place at **all** of these non-uniform
widths, as if they were in fact scalar Signals *at* those
widths.

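The per-context information can be pictured as plain data. The following is a minimal sketch only, not the `SimdShape` API: the names `MANTISSA`, `EXPONENT`, `min_packed_width` and `non_power_of_two` are hypothetical, and the mantissa entries assume the stored fraction widths of the standard formats (52, 23, 10 and 7 bits).

```python
# Hypothetical illustration only: plain data, not the SimdShape API.
# Each partition context maps to (number of elements, bits per element).
MANTISSA = {
    "1xFP64": (1, 52),
    "2xFP32": (2, 23),
    "4xFP16": (4, 10),
    "4xBF16": (4, 7),
}
EXPONENT = {
    "1xFP64": (1, 11),
    "2xFP32": (2, 8),
    "4xFP16": (4, 5),
    "4xBF16": (4, 8),
}


def min_packed_width(spec):
    """Smallest width that holds every context if elements were packed tightly."""
    return max(count * bits for count, bits in spec.values())


def non_power_of_two(spec):
    """Element widths in the spec that are not a power of two."""
    return sorted({bits for _, bits in spec.values() if bits & (bits - 1)})


if __name__ == "__main__":
    print(min_packed_width(MANTISSA), non_power_of_two(MANTISSA))
    print(min_packed_width(EXPONENT), non_power_of_two(EXPONENT))
```

Most of the element widths are not powers of two, which is why a single `width` and `signed` pair is not enough to describe such a Signal.
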
A minor wrinkle which emerges from deep analysis is that the overall
available width (`Shape.width`) does in fact need to be explicitly
declared, and the sub-partitions need to fit onto power-of-two
boundaries, in order to allow straight wire-connections rather than
allowing the SimdSignal to be arbitrary-sized (compact).  Although on
shallow inspection this would initially seem to imply large unused
sub-partitions (padding partitions), these gates can in fact be eliminated
with a "blanking" mask, created from static analysis of the SimdShape
context.

Example:

* all 32 and 16-bit values are actually to be truncated to 11 bits
* all 8-bit values to 5 bits

From these we can write out the allocations, bearing in mind that
in each partition the sub-signal must start on a power-of-two boundary,
and that "x" marks unused (padding) portions:

          |31|  |  |     16|15|  |   8|7     0 |
    32bit | x| x| x|      x| x| x|10 ....    0 |
    16bit | x| x|26 ... 16 | x| x|10 ....    0 |
    8bit  | x|28.24|  20.16| x|12 .. 8|x|4.. 0 |

Thus, we deduce, we *actually* need breakpoints at these positions,
and the unused portions common to **all** cases can be deduced
and marked "x":

            |28|26|24| |20|16| |12|10|8|   |4   0
           x                  x
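
The deduction above can be reproduced mechanically. This is a rough sketch under stated assumptions rather than the actual `SimdShape` implementation: the 32-bit overall width and the three contexts come from the example, while the names `WIDTH`, `CONTEXTS`, `lane_ranges` and `used_bits` are hypothetical. The final mask is one possible encoding of the "blanking" idea.

```python
WIDTH = 32  # overall width, as in the 32-bit example

# Hypothetical table: context -> (number of lanes, used bits per lane).
CONTEXTS = {"32bit": (1, 11), "16bit": (2, 11), "8bit": (4, 5)}


def lane_ranges(count, used):
    """(lsb, msb) of the used bits in each lane; lanes start WIDTH // count apart."""
    lane = WIDTH // count
    return [(i * lane, i * lane + used - 1) for i in range(count)]


def used_bits(count, used):
    """Set of every bit position occupied by an element in this context."""
    return {b for lsb, msb in lane_ranges(count, used) for b in range(lsb, msb + 1)}


# Bit positions where an element starts or ends in at least one context.
positions = sorted(
    {b for ctx in CONTEXTS.values() for rng in lane_ranges(*ctx) for b in rng}
)

# Bits unused in every context: candidates for the "blanking" mask.
all_used = set().union(*(used_bits(*ctx) for ctx in CONTEXTS.values()))
blank = sorted(set(range(WIDTH)) - all_used)
blank_mask = sum(1 << b for b in blank)

if __name__ == "__main__":
    print(positions)        # [0, 4, 8, 10, 12, 16, 20, 24, 26, 28]
    print(blank)            # [13, 14, 15, 29, 30, 31]
    print(hex(blank_mask))  # 0xe000e000
```

The positions match the breakpoints listed in the diagram, and the blank bits (15 down to 13, and 31 down to 29) are the two regions marked "x".
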