# SV Context Propagation

Context Propagation is intended for a future version of SV.

[[sv/svp64]] context is 24 bits long, and Swizzle is 12 bits.  These
are enormous and not sustainable as far as power consumption is
concerned.  Also, the same contexts are frequently repeated across
different instructions.  An idea therefore is to add a level of
indirection that allows these contexts to be applied to multiple
instructions.

The basic principle is to have a suite of 40 indices in a shift register
that indicate which one of seven Contexts shall be applied to upcoming
32 bit v3.0B instructions.  The Least Significant Index in the shift
register is the one that is applied.  One of those indices is 0b000,
which indicates "no prefix applied".

A special instruction in an svp64 context takes a copy of the `RM[0..23]`
bits, alongside a 21 bit suite that indicates which of up to 20 upcoming
32 bit instructions will have that `RM` applied to them, as well as an
index to associate with the `RM`.  If there are already indices set
within the shift register then the new entries are placed after the end
of the highest-indexed one.

| 0..5 | 6..8 | 9..10 | 11..31 |  name   |
| ---- | ---- | ----- | ------ | ------- |
| OP   | MMM  |       |        | ?-Form  |
| OP   | 000  | idx   | imm    |         |
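
As a minimal sketch of the field layout in the table above, the hypothetical helper `pack_rm_propagation` below assembles an `RM`-form Context Propagation word.  It assumes the usual Power ISA MSB0 numbering (bit 0 is the most significant bit of the 32 bit word).  The primary opcode `OP` is not given here, so it is simply a parameter.

```python
def pack_rm_propagation(op: int, idx: int, imm: int, mmm: int = 0b000) -> int:
    """Pack the RM-form Context Propagation word (MSB0 bit numbering).

    Fields: OP = bits 0..5, MMM = bits 6..8, idx = bits 9..10,
    imm = bits 11..31.  Hypothetical helper, not part of any spec.
    """
    fields = [(op, 6), (mmm, 3), (idx, 2), (imm, 21)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        # shift the fields packed so far left, then append this one
        word = (word << width) | value
    return word
```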

Three different types of contexts are available so far: svp64 RM,
swizzle and Remap.  Their formats are as follows when stored in SPRs:

| 0..4  | 5..7 | 8..31       |  name     |
| ----- | ---- | ----------- | --------- |
| 00000 | 000  | `RM[0:23]`  |  svp64 RM |
| 00001 | mask | swiz1 swiz2 |  swizzle  |
| 00010 | 000  | sh0-3 ms0-3 |  Remap    |
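
The stored format can be unpacked with a sketch like the one below.  It again assumes MSB0 numbering, and assumes that swiz1 precedes swiz2 and sh0-3 precede ms0-3 within bits 8..31, as their order in the table suggests.  The function name and dictionary keys are made up for illustration.

```python
def decode_context(ctx: int) -> dict:
    """Split a 32 bit stored Context into its fields (MSB0 numbering).

    bits 0..4 = type, bits 5..7 = mask (or 000), bits 8..31 = payload.
    """
    ctype = (ctx >> 27) & 0x1F   # bits 0..4
    mask = (ctx >> 24) & 0x7     # bits 5..7
    payload = ctx & 0xFFFFFF     # bits 8..31
    if ctype == 0b00000:
        return {"kind": "svp64 RM", "RM": payload}
    if ctype == 0b00001:
        # assumed: swiz1 in bits 8..19, swiz2 in bits 20..31
        return {"kind": "swizzle", "mask": mask,
                "swiz1": payload >> 12, "swiz2": payload & 0xFFF}
    if ctype == 0b00010:
        # assumed: sh0-3 in bits 8..15, ms0-3 in bits 16..31
        return {"kind": "Remap", "sh": payload >> 16, "masks": payload & 0xFFFF}
    raise ValueError(f"unknown context type {ctype:#07b}")
```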

There are four 64 bit SPRs used for storing Context, and the data is
stored as follows:

* Seven 32 bit contexts are stored, indexed from 0b001 to 0b111,
  two per 64 bit SPR and one in the 4th.
* Starting from bit 32 of the 4th SPR, the Shift Registers are stored
  in batches of 40 bits.
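
A sketch of where each Context would live, assuming (the text does not say) that the contexts are packed in index order, with the lower index in MSB0 bits 0..31 of each SPR.  Under that assumption context 0b111 lands in bits 0..31 of the 4th SPR, which leaves bit 32 onwards free for the Shift Registers.  The function name is hypothetical, and SPRs are numbered 0..3 here purely for illustration.

```python
def context_location(index: int) -> tuple[int, int, int]:
    """Return (spr_number, first_bit, last_bit) for Context `index` (1..7).

    Assumed layout: index order, lower index in MSB0 bits 0..31 of an SPR.
    """
    if not 1 <= index <= 7:
        raise ValueError("Context indices run from 0b001 to 0b111")
    slot = index - 1           # 0..6
    spr = slot // 2            # two contexts per 64 bit SPR
    first = (slot % 2) * 32    # MSB0 bit 0 or bit 32
    return spr, first, first + 31
```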

Whenever the LSB of any one of the seven Shift Registers is nonzero,
the corresponding Contexts are looked up and merged (ORed) together.
Contexts for different purposes may not be mixed, however: an illegal
instruction exception is raised if this occurs.

The reason for merging the contexts is so that different aspects may be
applied.  For example, some `RM` contexts may indicate that predication
is to be applied to an instruction, whilst another context may contain
the svp64 Mode.  Combining the two allows the predication aspect to be
merged and shared, making for better packing.
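
The merge itself can be sketched as follows.  Each active Context is represented as a hypothetical `(type, value)` pair using the 5 bit type codes from the stored-format table, and a Python exception stands in for the illegal instruction trap.

```python
from typing import Optional


class IllegalInstruction(Exception):
    """Stands in for the illegal instruction trap (hypothetical name)."""


def merge_contexts(active: list[tuple[int, int]]) -> Optional[tuple[int, int]]:
    """OR together the (type, value) Contexts selected for one instruction.

    type is the 5 bit code (0b00000 svp64 RM, 0b00001 swizzle,
    0b00010 Remap).  Returns None when no Context applies.
    """
    if not active:
        return None
    kinds = {kind for kind, _ in active}
    if len(kinds) != 1:
        # Contexts for different purposes may not be mixed
        raise IllegalInstruction(f"mixed context types: {sorted(kinds)}")
    merged = 0
    for _, value in active:
        merged |= value
    return kinds.pop(), merged
```

For example, merging `(0, 0x0000F0)` with `(0, 0x00000F)` yields `(0, 0x0000FF)`, whereas merging an svp64 RM Context with a swizzle Context raises `IllegalInstruction`.  These values are arbitrary and do not correspond to real `RM` fields.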

These changes occur on a precise schedule: compilers should not have
difficulty statically allocating the Context Propagation, as long as
certain conventions are followed, such as not allowing the context to
propagate through branch targets reached by more than one incoming path,
or through variable-length loops.

Variable-length loops are excluded because, if the setup of the shift
registers does not precisely match the number of instructions, the
meaning of those instructions will change as the bits in the shift
registers run out!  However, if the loops are of fixed size and small
enough (40 instructions maximum) then it is perfectly reasonable to
insert repeated patterns into the shift registers, enough to cover every
iteration.  Ordinarily, however, the Context Propagation instructions
should be placed inside the loop, and it is the responsibility of the
compiler and assembler writer to ensure that the shift registers reach
zero before any loop jump-back point.
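
As a sketch of the fixed-size loop case: if the per-instruction pattern for one pass through the loop body is known, it can simply be repeated for every iteration, provided the total fits in the 40 bit shift register.  The helper name is hypothetical.  The pattern is assumed to cover every instruction executed in the body, including the loop-closing branch, on the assumption that it also shifts the registers.

```python
def repeat_for_loop(body_pattern: list[int], iterations: int,
                    capacity: int = 40) -> list[int]:
    """Repeat one loop body's per-instruction bits for every iteration.

    body_pattern[i] is 1 if instruction i of the body takes the Context.
    The result must fit in one shift register, otherwise later
    instructions would change meaning once the bits run out.
    """
    bits = body_pattern * iterations
    if len(bits) > capacity:
        raise ValueError(f"{len(bits)} entries exceed the {capacity} bit shift register")
    return bits
```

Since each Context Propagation instruction carries at most 20 bits, a pattern longer than 20 entries needs more than one of them to set up.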

## Pseudocode

The internal data structures need not precisely match the SPRs.  Here are
some possible internal data structures:

    bit sreg[7][40]      # seven 40 bit shift registers
    bit context[7][24]   # seven contexts
    int sregoffs[7]      # where the last bits were placed in each sreg

The Context Propagation instruction then inserts bits into the selected
stream:

    # imm is the 21 bit suite: its lowest set bit marks the end, and the
    # count bits above that marker, imm[0:count], are the payload
    count = 20-count_trailing_zeros(imm)
    context[idx] = new_context
    start = sregoffs[idx]
    sreg[idx][start:start+count] = imm[0:count]
    sregoffs[idx] += count
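
Below is a runnable Python rendering of the insertion pseudocode, offered as a sketch.  It assumes each shift register is held as an integer whose bit 0 (LSB) is the next entry to apply, whilst `imm` keeps MSB0 numbering, so `imm[j]` is integer bit `20 - j`.  The checks for a zero `imm` and for overflowing the 40 bit register are additions: the text does not say what happens in either case.

```python
SREG_BITS = 40  # each shift register holds 40 entries
IMM_BITS = 21   # width of the imm suite


def count_trailing_zeros(x: int) -> int:
    """Number of trailing zero bits in a nonzero integer."""
    return (x & -x).bit_length() - 1


def propagate(sreg: list[int], sregoffs: list[int], context: list[int],
              idx: int, new_context: int, imm: int) -> None:
    """Insert the payload of `imm` into shift register `idx`.

    sreg[i] is an int whose bit 0 (LSB) is the next entry to be applied;
    imm uses MSB0 numbering, so imm[j] is integer bit (IMM_BITS - 1 - j).
    """
    if imm <= 0 or imm >> IMM_BITS:
        # not specified in the text: this sketch rejects it
        raise ValueError("imm must be a nonzero 21 bit value")
    # the lowest set bit is the end marker; the count bits above it are payload
    count = (IMM_BITS - 1) - count_trailing_zeros(imm)
    start = sregoffs[idx]
    if start + count > SREG_BITS:
        # overflow behaviour is not specified either: this sketch rejects it
        raise OverflowError("shift register would overflow")
    context[idx] = new_context
    for j in range(count):
        bit = (imm >> (IMM_BITS - 1 - j)) & 1  # imm[j] in MSB0 terms
        sreg[idx] |= bit << (start + j)         # sreg[idx][start + j]
    sregoffs[idx] += count
```

Because payload bit `imm[j]` lands at position `start + j`, the most significant payload bit applies to the first instruction after any entries that were already pending.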

Because each shift register is maintained independently, the new bits
are dropped in where the last ones end.  The Context to be applied to
each instruction is determined as follows:

    apply_context = 0
    for i in range(7):
        if sreg[i][0]:
            apply_context |= context[i]
        sreg[i] = sreg[i] >> 1
        if sregoffs[i] > 0:   # an empty register stays at offset 0
            sregoffs[i] -= 1
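
The per-instruction step can likewise be sketched in Python, using the same integer representation (LSB first).  It returns the merged Context for the current instruction.  The check for mixing Contexts of different types is left out of this sketch for brevity.

```python
def step(sreg: list[int], sregoffs: list[int], context: list[int]) -> int:
    """Compute the merged Context for one instruction, then shift down.

    Each sreg[i] is an int whose LSB selects whether context[i] applies
    to the current instruction; every register then shifts down by one.
    """
    apply_context = 0
    for i in range(len(sreg)):
        if sreg[i] & 1:
            apply_context |= context[i]
        sreg[i] >>= 1
        if sregoffs[i] > 0:
            sregoffs[i] -= 1
    return apply_context
```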

Note that it is the LSB that says which context is to be applied.

# Swizzle Propagation

Swizzle Contexts follow the same schedule, except that there is a mask
specifying to which registers the swizzle is to be applied, and there
is only a 17 bit suite to indicate the instructions to which the
swizzle applies.

The bits in the svp64 `RM` field are interpreted as a pair of 12 bit
swizzles.

| 0..5 | 6..8 | 9..11 | 12..14 | 15..31 |  name   |
| ---- | ---- | ----- | ------ | ------ | ------- |
| OP   | MMM  |       | mask   |        | ?-Form  |
| OP   | 001  | idx   | mask   | imm    |         |
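
For comparison with the `RM` form, here is a sketch of packing the swizzle form, with `MMM` fixed at 001, a 3 bit `idx`, the 3 bit `mask` and the 17 bit suite.  As before, the helper name is hypothetical, MSB0 numbering is assumed and `OP` is left as a parameter.

```python
def pack_swizzle_propagation(op: int, idx: int, mask: int, imm: int) -> int:
    """Pack the swizzle-form Context Propagation word (MSB0 numbering).

    Fields: OP = bits 0..5, MMM = 001 in bits 6..8, idx = bits 9..11,
    mask = bits 12..14, imm = bits 15..31 (the 17 bit suite).
    """
    fields = [(op, 6), (0b001, 3), (idx, 3), (mask, 3), (imm, 17)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value
    return word
```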

Note, however, that swizzle applies only to svp64-encoded instructions,
so Swizzle Shift Registers only activate (and shift down) on svp64
instructions.  *This includes Context-propagated ones!*
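
A sketch of that difference: the swizzle registers are only consulted and shifted when the current instruction is svp64, where an instruction counts as svp64 if it is either natively prefixed or has had an `RM` Context propagated onto it.  The function name and the `is_svp64` flag are hypothetical; a decoder would compute the flag that way.

```python
def step_swizzle(sreg: list[int], context: list[int], is_svp64: bool) -> int:
    """Advance the Swizzle Shift Registers for one instruction.

    Unlike the RM registers, these only activate and shift down when the
    instruction is svp64 (natively prefixed or Context-propagated).
    """
    if not is_svp64:
        return 0  # plain v3.0B instruction: no swizzle and no shift
    merged = 0
    for i in range(len(sreg)):
        if sreg[i] & 1:
            merged |= context[i]
        sreg[i] >>= 1
    return merged
```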

The mask is encoded as follows:

* bit 0 indicates that src1 is swizzled
* bit 1 indicates that src2 is swizzled
* bit 2 indicates that src3 is swizzled

When the compiler creates Swizzle Contexts it is important to recall
that the Contexts will be ORed together.  Thus one Context may specify
a mask whilst another Context specifies the swizzles: ORing different
mask Contexts with different swizzle Contexts allows more combinations
than would normally fit into seven Contexts.

More than one bit is permitted to be set in the mask: swiz1 is applied
to the first src operand specified by the mask, and swiz2 is applied to
the second.
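
The assignment of swiz1 and swiz2 to operands can be sketched as follows.  To avoid assuming how the 3 bit mask field is ordered, the mask is passed as a list of three booleans indexed by mask bit number.  The helper name is hypothetical, and rejecting a mask with all three bits set is this sketch's own choice, since only two swizzles are available.

```python
def assign_swizzles(mask_bits: list[bool], swiz1: int, swiz2: int) -> dict[str, int]:
    """Map swiz1/swiz2 onto the source operands selected by the mask.

    mask_bits[0], [1], [2] are mask bits 0, 1, 2 (src1, src2, src3).
    swiz1 goes to the first selected operand and swiz2 to the second.
    """
    operands = ["src1", "src2", "src3"]
    selected = [op for op, bit in zip(operands, mask_bits) if bit]
    if len(selected) > 2:
        # the text does not say what a third selected operand receives
        raise ValueError("only two swizzles are available")
    return dict(zip(selected, (swiz1, swiz2)))
```

For example, a mask with bits 0 and 2 set applies swiz1 to src1 and swiz2 to src3.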

# 2D/3D Matrix Remap

*Based on the old version [[simple_v_extension/remap]], the Shape CSRs
remain the same, as does the algorithm that performs the remapping.*

Remap allows up to four Vectors (all four arguments of `fma`, for example)
to be remapped algorithmically, in arbitrary ways, via 1D, 2D or 3D
reshaping.

Vectors may be remapped such that a Matrix multiply of any arbitrary size
is performed in one Vectorised `fma` instruction, as long as the total
number of elements does not exceed 64 (the maximum for VL).

There are four possible Shapes.  Unlike swizzle contexts, this one
requires the external remap Shape SPRs, because the state information is
too large to fit into the Context itself.  Thus the Remap Context says
which Shapes apply to which registers.

The instruction format is the same as for `RM`, and thus uses 21 bits of
immediate, up to 20 of which are dropped into the indexed Shift Register:

| 0..5 | 6..8 | 9..10 | 11..31 |  name   |
| ---- | ---- | ----- | ------ | ------- |
| OP   | MMM  |       |        | ?-Form  |
| OP   | 010  | idx   | imm    |         |

Again it is the 24 bit `RM` that is interpreted differently:

| 0..7  | 8..23   |
| ----- | ------- |
| sh0-3 | mask0-3 |

The shape indices sh0-3 are binary numbers, each selecting one of the
four Shapes (0-3), whilst mask0-3 are bitmasks that indicate the src or
dest operands to which the associated shape is to be applied.  A zero
mask indicates that the Shape is not to be applied.  Note that whilst
the masks are unary encoded, the Shape indices sh0-3 are not: this must
be taken into consideration when ORing occurs.
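
Here is a small worked example of why this matters, using made-up field values.  ORing two bitmasks simply unions the operand sets, but ORing two binary shape indices can select a Shape that neither Context named.

```python
def or_fields(a: int, b: int) -> int:
    """Merge two context fields the way Context merging does: bitwise OR."""
    return a | b


# Two Remap Contexts naming different shapes for the same sh slot.
sh_from_ctx_a = 0b01  # shape 1
sh_from_ctx_b = 0b10  # shape 2
merged_sh = or_fields(sh_from_ctx_a, sh_from_ctx_b)  # 0b11: shape 3, named by neither

# Masks are bitmasks, so ORing unions the operand sets (two different
# operand bits here; which bit means which operand is not assumed).
mask_a = 0b0001
mask_b = 0b0100
merged_mask = or_fields(mask_a, mask_b)  # 0b0101: both operands selected
```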

The mask is encoded as follows:

* bit 0 indicates that dest is reshaped
* bit 1 indicates that src1 is reshaped
* bit 2 indicates that src2 is reshaped
* bit 3 indicates that src3 is reshaped
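
Putting the layout and the mask encoding together, a 24 bit Remap `RM` could be decoded as below.  This sketch assumes MSB0 numbering throughout, which the text does not spell out: sh0 in RM bits 0..1 through sh3 in bits 6..7, mask0 in bits 8..11 through mask3 in bits 20..23, and, within each mask, bit 0 (dest) as the most significant bit.  The function name is hypothetical.

```python
OPERANDS = ["dest", "src1", "src2", "src3"]  # mask bits 0..3


def decode_remap_rm(rm: int) -> list[tuple[int, list[str]]]:
    """Decode a 24 bit Remap RM into (shape, operands) for slots 0..3.

    Assumes MSB0 numbering (see the lead-in).  Slots whose mask is zero
    are omitted, since that Shape is not applied.
    """
    if rm >> 24:
        raise ValueError("RM is 24 bits")
    result = []
    for slot in range(4):
        shape = (rm >> (22 - 2 * slot)) & 0b11   # sh<slot>: 2 bit binary index
        mask = (rm >> (12 - 4 * slot)) & 0b1111  # mask<slot>: 4 bit bitmask
        if mask == 0:
            continue
        ops = [name for bit, name in enumerate(OPERANDS)
               if mask & (0b1000 >> bit)]        # mask bit 0 is the MSB
        result.append((shape, ops))
    return result
```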