(no commit message)

[libreriscv.git] / openpower / sv / propagation.mdwn
diff --git a/openpower/sv/propagation.mdwn b/openpower/sv/propagation.mdwn

index 85b59f8f56a24805d573314e0fb7c2adffcb17bf..1d6e8242a66e5e39bff09f1d0a5cf3da1e0b3023 100644 (file)
--- a/openpower/sv/propagation.mdwn
+++ b/openpower/sv/propagation.mdwn
@@ -4,8 +4,20 @@
  
  [[!toc]]
  
-Context Propagation is for a future version of SV
+Context Propagation is for a future version of SV.  It requires one
+Major opcode in some cases.
  
+The purpose of Context Propagation is a hardware compression algorithm
+for 64-bit prefix-suffix ISAs.  The prefix is *separated* from the suffix
+and, on the reasonable assumption  that the exact same prefix will need to 
+be applied to multiple suffixes, a bit-level FIFO is given to indicate
+when a particular prefix shall be applied to future instructions.
+
+In this way, with the suffixes being only 32 bit and multiple 32-bit
+instructions having the exact same prefix applied to them, the ISA is
+much more compact.
+
+Put another way:
  [[sv/svp64]] context is 24 bits long, and Swizzle is 12.  These
  are enormous and not sustainable as far as power consumption is
  concerned.  Also, there is repetition of the same contexts to different
@@ -16,7 +28,7 @@ The basic principle is to have a suite of 40 indices in a shift register
  that indicate one of seven Contexts shall be applied to upcoming 32 bit
  v3.0B instructions.  The Least Significant Index in the shift register is
  the one that is applied.  One of those indices is 0b000 which indicates
-"no prefix applied".
+"no prefix applied".  Effectively this is a bit-level FIFO.
  
  A special instruction in an svp64 context takes a copy of the `RM[0..23]`
  bits, alongside a 21 bit suite that indicates up to 20 32 bit instructions
@@ -29,15 +41,16 @@ then the new entries are placed after the end of the highest-indexed one.
  | OP |     | MMM |     | ?-Form  |
  | OP | idx | 000 | imm |         |
  
-Three different types of contexts are available so far: svp64 RM and
+Four different types of contexts are available so far: svp64 RM, setvl, Remap and
  swizzle. Their format is as follows when stored in SPRs:
  
  | 0..3 | 4..7   | 8........31 |  name     |
  | ---- | ----   | ----------- | --------- |
-| 0000 | 0000   | `RM[0:23]`  |  svp64 RM |
+| 0000 | 0000   | `RM[0:23]`  |  [[sv/svp64]] RM |
+| 0000 | 0001   |`setvl[0:23]`|  [[sv/setvl]] VL |
  | 0001 | 0 mask | swiz1 swiz2 |  swizzle  |
-| 0010 | brev   | sh0-3 ms0-3 |  Remap    |
-
+| 0010 | brev   | sh0-4 ms0-5 |  [Remap](sv/remap)    |
+| 0011 | brev   | sh0-4 ms0-4 |  [SubVL Remap](sv/remap)    |
  
  There are 4 64 bit SPRs used for storing Context, and the data is stored
  as follows:
@@ -45,7 +58,20 @@ as follows:
  * 7 32 bit contexts are stored, each indexed from 0b001 to 0b111,
    2 per 64 bit SPR and 1 in the 4th.
  * Starting from bit 32 of the 4th SPR, in batches of 40 bits the Shift
-  Registers are stored.
+  Registers (bit-level FIFOs) are stored.
+
+```
+             0            31 32         63
+    SVCTX0   context 0       context 1
+    SVCTX1   context 2       context 3
+    SVCTX2   context 4       context 5
+    SVCTX3   context 6       FIFO0[0..31]
+    SVCTX4   FIFO0[32:39]   FIFO1[0:39] FIFO2[0:15]
+    SVCTX5   FIFO2[16:39]   FIFO3[0:39] FIFO4[0:7]
+    SVCTX5   FIFO4[8:39]    FIFO5[0:39] FIFO5[0:15]
+    SVCTX6   FIFO5[16:39]   FIFO6[0:39] FIFO7[0:7
+    SVCTX7   FIFO7[16:39]
+```
  
  When each LSB is nonzero in any one of the seven Shift Registers
  the corresponding Contexts are looked up and merged (ORed) together.
@@ -67,7 +93,7 @@ and variable-length loops.
  Loops, clearly, because if the setup of the shift registers does
  not precisely match the number of instructions, the meaning of those
  instructions will change as the bits in the shift registers run out!
-However if the loops are of fixed size and small enough (40 instructions
+However if the loops are of fixed static size, with no conditional early exit,  and small enough (40 instructions
  maximum) then it is perfectly reasonable to insert repeated patterns into
  the shift registers, enough to cover all the loops.  Ordinarily however
  the use of the Context Propagation instructions should be inside the
@@ -113,7 +139,7 @@ for specifying to which registers the swizzle is to be applied, and
  there is only 17 bit suite to indicate the instructions to which the
  swizzle applies.
  
-The bits in rhe svp64 `RM` field are interpreted as a pair of 12 bit
+The bits in the svp64 `RM` field are interpreted as a pair of 12 bit
  swizzles
  
  | 0.5| 6.8 | 9.11| 12.14 | 15.31 |  name   |
@@ -143,17 +169,15 @@ the second.
  
  # 2D/3D Matrix Remap
  
-*Based on the old version [[simple_v_extension/remap]], the Shape CSRs
-remain the same as does the algorithm that performs the remapping*.
-
-Remap allows up to four Vectors (all four arguments of `fma` for example)
+[[sv/remap]] allows up to four Vectors (all four arguments of `fma` for example)
  to be algorithmically arbitrarily remapped via 1D, 2D or 3D reshaping.
+The amount of information needed to do so is however quite large: consequently it is only practical to apply indirectly, via Context propagation.
  
  Vectors may be remapped such that Matrix multiply of any arbitrary size
  is performed in one Vectorised `fma` instruction as long as the total
  number of elements is less than 64 (maximum for VL).
  
-Additionally it may be used to perform "zipping" and "unzipping" of
+Additionally, in a fashion known as "Structure Packing" in NEON and RVV, it may be used to perform "zipping" and "unzipping" of
  elements in a regular fashion of any arbitrary size and depth: RGB
  or Audio channel data may be split into separate contiguous lanes of
  registers, for example.
@@ -169,10 +193,13 @@ immediate, 29 of which are dropped into the indexed Shift Register
  | 0.5| 6.8 | 9.10| 11.14 | 15.31|  name   |
  | -- | --- | --- | ----  | ---- | ------- |
  | OP |     | MM  |       |      | ?-Form  |
-| OP | idx | 01  | brev  | imm  |         |
+| OP | idx | 10  | brev  | imm  | Remap        |
+| OP | idx | 11  | brev  | imm  | SUBVL Remap    |
+
+SUBVL Remap applies the remapping even into the SUBVL Elements, for a total of `VL\*SUBVL` Elements.  **swizzle may be applied on top as a second phase** after SUBVL Remap.
  
  brev field, which also applied down to SUBVL elements (not to the whole
-vec2/3/4, that would be handled by swizzle reordering)
+vec2/3/4, that would be handled by swizzle reordering):
  
  * bit 0 indicates that dest elements are byte-reversed
  * bit 1 indicates that src1 elements are byte-reversed
@@ -181,19 +208,21 @@ vec2/3/4, that would be handled by swizzle reordering)
  
  Again it is the 24 bit `RM` that is interpreted differently:
  
-| 0...7 | 8....23 |
-| ----- | ------- |
-| sh0-3 | mask0-3 |
+| 0  | 2  | 4  | 6  | 8  | 10.14 | 15..23 |
+| -- | -- | -- | -- | -- | ----- | ------ |
+|mi0 |mi1 |mi2 |mo0 |mo1 | en0-4 | rsv    |
  
-The shape indices 0-3 are numbered 0-3 whilst the masks are bitmasks
-that indicate src or dest to which the associated shape (0-3) is to
-be applied.  A zero mask indicates that the Shape is not to be applied.
-Note that whilst the masks are unary encoded the Shape indices sh0-3
-are not: this must be taken into consideration when ORing occurs.
+si0-2 and so0-1 each select SVSHAPE0-3 to apply to a given register.
+si0-2 apply to RA, RB, RC respectively, as input registers, and
+likewise so0-1 apply to output registers. en0-4 indicate whether the
+SVSHAPE is actively applied or not.
  
-The mask is encoded as follows:
+# setvl
+
+Fitting into 22 bits with 2 reserved and 2 for future
+expansion of SV Vector Length is a total of 24 bits
+which is exactly the same size as SVP64 RM
  
-* bit 0 indicates that dest is reshaped
-* bit 1 indicates that src1 is reshaped
-* bit 2 indicates that src2 is reshaped
-* bit 3 indicates that src3 is reshaped
+| 0.5|6.10| 11..18 | 19..20 |21| 22.23 |
+| -- | -- | ------ | ------ |--| ----- |
+| RT | RA | SVi // | vs ms  |Rc| rsvd  |