# Dynamic Partitioned SIMD

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=458> m.If/Switch
* <https://bugs.libre-soc.org/show_bug.cgi?id=115> top level DIMD
* <https://bugs.libre-soc.org/show_bug.cgi?id=594> RFC
* <https://bugs.libre-soc.org/show_bug.cgi?id=565> Formal proof

To save hugely on gate count, the usual practice of having separate scalar ALUs and separate SIMD ALUs is not followed.

Instead, a suite of "partition points" is deployed, in the same fashion as the Aspex Microelectronics ASP (Associative String Processor) architecture.

Basic principle: when all partition gates are open, the ALU is subdivided into isolated and independent 8-bit SIMD ALUs. Whenever any one gate is closed, the 8-bit "part-results" on either side of it are chained together in a downstream cascade, so that combinations of closed gates create 16-bit, 32-bit, 64-bit and 128-bit compound results.
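
The basic principle can be modelled behaviourally in plain Python. This is a minimal sketch under stated assumptions, not the libre-soc gate-level implementation: it assumes a 64-bit ALU made of eight 8-bit lanes, and a hypothetical 7-bit `gates` mask in which bit *i* set means the gate between lane *i* and lane *i+1* is open (lanes isolated), while a clear bit means the gate is closed (the carry cascades into the next lane). The name `partitioned_add` is illustrative only.

```python
# Behavioural sketch of a 64-bit partitioned adder (hypothetical names).
# Assumption: bit i of `gates` set = gate between byte lane i and lane i+1
# is OPEN, so the carry out of lane i is discarded; clear = gate CLOSED,
# so the carry flows into lane i+1.

LANE_BITS = 8
NUM_LANES = 8
LANE_MASK = (1 << LANE_BITS) - 1


def partitioned_add(a: int, b: int, gates: int) -> int:
    """Add two 64-bit values as a cascade of 8-bit part-results."""
    result = 0
    carry = 0
    for lane in range(NUM_LANES):
        shift = lane * LANE_BITS
        # 8-bit part-result for this lane, plus any carry from the lane below
        part = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK) + carry
        result |= (part & LANE_MASK) << shift
        carry = part >> LANE_BITS            # carry out of this lane (0 or 1)
        if lane < NUM_LANES - 1 and (gates >> lane) & 1:
            carry = 0                        # open gate: lanes stay isolated
    return result
```

With `gates = 0` (all closed) this matches an ordinary 64-bit add modulo 2^64. With `gates = 0b1111111` (all open) every byte wraps independently, so `partitioned_add(0xFFFFFFFFFFFFFFFF, 1, 0b1111111)` gives `0xFFFFFFFFFFFFFF00` rather than `0`. With `gates = 0b0001000` only the middle gate is open, giving a 2x32 layout.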

The pages below describe the basic features of each partitioned operation and track the relevant bug reports.

* [[dynamic_simd/eq]]
* [[dynamic_simd/gt]]
* [[dynamic_simd/add]]
* [[dynamic_simd/mul]]
* [[dynamic_simd/shift]]
* [[dynamic_simd/logicops]] (some, all, xor)

# Integration with nmigen

Dynamic partitioning of signals is not enough on its own. Normal nmigen programs also involve conditional decisions, which means if statements (`m.If`) and switch statements (`m.Switch`).

With the `PartitionedSignal` class, basic operations such as `x + y` are functional, producing results of 1x64 bits, 2x32, 4x16, 8x8, or any combination in between. But what about control and decisions? Here is the "normal" way in which SIMD decisions are performed:

    # hand-written SIMD decisions: one branch per partition layout
    if partitions == "1x64":
        with m.If(x > y):
            ...  # do something
    elif partitions == "2x32":
        with m.If(x[0:32] > y[0:32]):
            ...  # do something on 1st half
        with m.If(x[32:64] > y[32:64]):
            ...  # do something on 2nd half
    elif ...:
        ...
    # many more lines of repeated, laborious, hand-written
    # SIMD nonsense, all exactly the same except for the
    # for-loop and sizes.

Clearly this is a totally unmaintainable nightmare of worthless crud. Continued throughout a large project that has 40,000 lines of code when written without SIMD, it would turn those 40,000 lines into 400,000 lines of unreadable spaghetti and completely destroy any chance of the project succeeding.

A much more intelligent approach is needed. What we actually want is:

    with m.If(x > y):  # a partitioned compare happens here
        ...  # do something dynamic here

where the laborious for-loops shown above are (conceptually) created and hidden behind the scenes, while `x` and `y` look, to all intents and purposes, exactly like any other nmigen `Signal`.
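
A plain-Python sketch of what that hidden for-loop conceptually computes for `x > y`: one comparison per partition, giving one result bit per partition rather than a single boolean. It assumes a 64-bit value made of eight 8-bit lanes, a hypothetical 7-bit `gates` mask in which bit *i* set means the gate between lane *i* and lane *i+1* is open, and unsigned comparison. `lane_layout` and `partitioned_gt` are hypothetical helpers, not part of `PartitionedSignal`.

```python
# Hypothetical behavioural model of a partitioned "x > y" (unsigned).


def lane_layout(gates: int, width: int = 64, lane_bits: int = 8) -> list:
    """Return (start_bit, width) for each partition implied by the gates."""
    layout = []
    start = 0
    num_lanes = width // lane_bits
    for lane in range(num_lanes):
        end = (lane + 1) * lane_bits
        last = lane == num_lanes - 1
        # an open gate after this lane (or the top of the word) ends a partition
        if last or (gates >> lane) & 1:
            layout.append((start, end - start))
            start = end
    return layout


def partitioned_gt(x: int, y: int, gates: int) -> list:
    """One unsigned x > y result per partition: the 'hidden for-loop'."""
    results = []
    for start, w in lane_layout(gates):
        mask = (1 << w) - 1
        results.append(((x >> start) & mask) > ((y >> start) & mask))
    return results
```

For example, `lane_layout(0b0001010)` yields `[(0, 16), (16, 16), (32, 32)]`, a mixed 2x16 + 1x32 layout. In a partitioned `m.If`, each partition's result bit would then have to enable the body for that partition only.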

This means that nmigen needs to "understand" the partitioning in `m.If`, `m.Else` and `m.Switch`, at the bare minimum.

Analysis of the internals of nmigen shows that `m.If`, `m.Else` and `m.Switch` are all redirected to `Value.cases`. Within that function, `Mux` and other "global" functions (similar to Python's `operator` functions) are used. The hypothesis is therefore that if `Value.mux` is added, in the same way that `operator.add` calls `__add__`, this may turn out to be all (or most) of what is needed.
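
The dispatch pattern behind that hypothesis can be shown in plain Python. In this sketch a free function `mux(sel, val1, val0)` (argument order as in nmigen's `Mux`) looks up a hook on the selector's type, much as `operator.add(a, b)` looks up `__add__` on the type of `a`, and falls back to ordinary behaviour when no hook exists. The names `mux`, `__mux__` and `PartitionedValue` are hypothetical, chosen for illustration; they are not existing nmigen APIs, and lists stand in for partitioned signals.

```python
# Hypothetical sketch of operator-style dispatch for a "mux" function.
# None of these names are real nmigen APIs.


def mux(sel, val1, val0):
    """Return val1 if sel is true, else val0, unless sel's type overrides it."""
    hook = getattr(type(sel), "__mux__", None)
    if hook is not None:
        return hook(sel, val1, val0)   # let the selector's class decide
    return val1 if sel else val0       # default scalar behaviour


class PartitionedValue:
    """Toy stand-in for a partitioned selector: one bool per partition."""

    def __init__(self, lanes):
        self.lanes = list(lanes)

    def __mux__(self, val1, val0):
        # select independently in each partition instead of once overall
        return [a if s else b for s, a, b in zip(self.lanes, val1, val0)]
```

`mux(True, 1, 0)` returns `1`, while `mux(PartitionedValue([True, False]), [10, 20], [30, 40])` returns `[10, 40]`: the selector's class, not the caller, decides how the selection is performed, which is what would let `m.If` code stay unchanged.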