(no commit message)

[libreriscv.git] / simple_v_extension / vblock_format / discussion.mdwn
diff --git a/simple_v_extension/vblock_format/discussion.mdwn b/simple_v_extension/vblock_format/discussion.mdwn

index 98a1820c9c8fc74245c0995d9d8f423607604e50..c3a4b52d07d877709bd4f447512100f7f626036c 100644 (file)
--- a/simple_v_extension/vblock_format/discussion.mdwn
+++ b/simple_v_extension/vblock_format/discussion.mdwn
@@ -4,7 +4,7 @@ This VBLOCK mode effectively extends [[sv_prefix_proposal]] to cover multiple
  registers.  The basic principle: the "prefix" specifies which of source and
  destination registers are to be considered "vectors" (or scalars), however
  where in SVPrefix that applies to only one instruction, the "vector" tag
-designations *continue to cascade* into subsequent instructions within the
+designations *continue a limited cascade* into subsequent instructions within the
  VBLOCK.
  
  Its advantage over the main format is that the main format requires
@@ -17,27 +17,33 @@ explicit naming of the registers to be tagged (taking up 5 bits each time).
  | SVPMode  | 1:0  |
  | -------  | ---  |
  | non-SVP  | 0b00 |
-| P48 Mode | 0b00 |
-| P64 Mode | 0b00 |
-| Twin-SVP | 0b00 |
+| P48 Mode | 0b01 |
+| P64 Mode | 0b10 |
+| Twin-SVP | 0b11 |
  
  non-SVP mode uses the extended format (see main VBLOCK spec [[vblock_format]])
  
-When P48 Mode is enabled (0b01), the P48 prefix follows the VBLOCK header
+When P48 Mode is enabled (0b01), the P48 prefix follows the VBLOCK header, and an additional itype may be applied to the src operand(s).
  
  | 15:11 | 10:0       |
  | -     | ---------- |
-| rsvd  | P48-Prefix |
+| ioffs | P48-Prefix |
  
  When P64 Mode is enabled (0b10), the P64 prefix also follows:
  
  | 31:16      | 15:11 | 10:0       |
  | ---------- | -     | ---------- |
-| P64-prefix | rsvd  | P48-Prefix |
+| P64-prefix | ioffs | P48-Prefix |
  
-When Twin-SVP Mode is enabled (0b11), a *second* P48-P64 prefix pair follows
-in the VBLOCK, which applies vector-context from the *second* instruction's
-registers.
+When Twin-SVP Mode is enabled (0b11), a *second* P48 prefix follows after a P48-P64 pair,
+in the VBLOCK (another 16 bits after the 32 bit P48/P64 block), which applies vector-context from the *second* instruction's
+registers. The reason why Twin-SVP's prefix is only P48 is because P64 can change VL and MVL. It makes no srnse to try to reset VL/MVL twice in succession.
+
+VL/MVL from a P64 prefix is applied as if a [[specification/sv.setvl]] instruction had been executed as a hidden (first, implicit) instruction in the VBLOCK. This *includes* modification of SV CSR STATE.
+
+ioffs is the instruction counter in multiples of 16 bits (matching PCVBLK) at which the prefix "activates". When PCVBLK matches ioffs, the Prefix applies. It is ignored on all instructions in the VBBLOCK prior to that point. This allows a degree of fine-grain control over which registers are to be "vectorised". 
+
+itype is described in [[sv_prefix_proposal]]. The additional itype on the src operand(s) allows, for example, a LD of 8 bit vectors to be auto-converted to 16 bit signed in a single instruction.  More examples on elwidth polymorphism is in the [[appendix]].
  
  # Rules
  
@@ -52,14 +58,15 @@ registers.
    a given P48/P64 prefix, an "implicit" field is created for that src or
    dest register in the form of a bitwise "OR" of all present vs#/vd# fields.
    *This rule continues to apply* to the instructions following the first
-  in the VBLOCK, cascading throughout the subsequent instructions, on an
-  ongoing basis.
+  (and second, if applicable)
+  in the VBLOCK, however the ORing rule
+  *stops* i.e does not cascade via rd in the following instructions.
  * If an instruction is used where registers are implicitly determined to be
    scalars, they *remain* scalars when used in subsequent instructions.
  
  Example (contrived):
  
-    * VBLOCK, P48 prefix only (SVP0=1), vs1=1, vs2=0
+    * VBLOCK, P48 prefix only (SVPMode=0b01), vs1=1, vs2=0
      * 1st instruction in VBLOCK: ADD x3, x5, x12
      * 2nd instruction in VBLOCK: ADD x7, x5, x3
      * 3rd instruction in VBLOCK: ADD x9, x4, x4
@@ -72,29 +79,37 @@ Example (contrived):
    calculated as "vd = vs1 | vs2" and is therefore set to "1".
  * The "full" specification for the 1st add is therefore
    "ADD vector-x3, vector-x5, scalar-x12".
-* The second instruction also uses x5, however x3 is also now considered
-  a "vector", and, consequently, so is x7.
+* The second instruction also uses x5, and x3 was determined by the OR rule as
+  a "vector". A second apication of the "OR" rule, as it is not listed as an operand in the first instruction, gives x7 also as a vector.
  * The "full" specification for the 2nd add is therefore
    "ADD vector-x7, vector-x5, vector-x3".
  * The 3rd instruction has no context applied to any of its registers, therefore
    x9 and x4 are determined to be "scalar"
  * The specification for the 3rd add is therefore
    "ADD scalar-x9, scalar-x4, scalar-x4"
-* The determination of x4 as "scalar" is now frozen for this VBLOCK, such
-  that it *remains* a scalar for its use in the 4th instruction.
+* The 4th instruction. **despite** determining x7 as vector in instruction 2, x7 is **not** listed in the 1st instruction's operands. Likewise for x4. Therefore the "OR" rule applies to them.
+* x5 on the other hand *is* in the 1st instruction's operands, and, given that x4 and x7 have the "OR" rule applied, are also marked as "vector" *despite x4 being formerly scalar in the 3rd instruction*.
  * Therefore, the "full" specification for the 4th add is:
-  "ADD vector-x7, vector-x5, scalar-x4"
+  "ADD vector-x7, vector-x5, vector-x4"
  
  Writing those out separately, for clarity:
  
-  ADD vector-x3, vector-x5, scalar-x12 # from vs1=1, vs2=0, vd=vs1|vs2
-  ADD vector-x7, vector-x5, vector-x3  # x7: v-x5 | v-x3
-  ADD scalar-x9, scalar-x4, scalar-x4  # x9, x4 not prefixed, therefore scalar
-  ADD vector-x7, vector-x5, scalar-x4  # x4 marked as scalar, x7, x5 vector
+    ADD vector-x3, vector-x5, scalar-x12 # from vs1=1, vs2=0, vd=vs1|vs2  
+    ADD vector-x7, vector-x5, vector-x3  # x7: v-x5 | v-x3  
+    ADD scalar-x9, scalar-x4, scalar-x4  # x9, x4 not prefixed, therefore scalar  
+    ADD vector-x7, vector-x5, vector-x4  # x4, x7, x5 vector  
+
+This kind of counterintuitive weirdness (for x4) is important for reducing the amount of state for context switching.
+
+Twin-SVP mode allows even more registers to be explicitly marked, including some specifically as "scalar",
+where the rules might otherwise start to cascade through and cause
+registers to be come undesirably marked as "vectors", but also to give more opportunity to mark registers that would otherwise flip between scalar and vector.
+
+If ultimately a compiler determines that the rules cannot be applied to get the desired cascading, another VBLOCK can always be started. They are a lot more compact than use of CSR setup and teardown (VBLOCKs end and the context is revoked automatically), requiring only between 32 and 80 bytes to establish, where CSRs, due to the OP32 overhead of the CSR instruction(s) themselves, the teardown cost, and the lack of a long immediate instruction in RISC-V could easily require 80 to 160 bytes to achieve the same task.
+
+The reason why the OR rule cannot cascade onwards is because if a trap occurs and the context has to be reestablished on return, it may be reestablished purely with the VBLOCK header and by decoding the first (and second) instruction.
  
-Twin-SVP mode allows certain registers to be explicitly marked as "scalar",
-where some of the rules might otherwise start to cascade through and cause
-registers to be come undesirably marked as "vectors".
+If the cascade of what was marked "vector" was allowed to continue, it would require re-reading of every opcode up to the point where execution of the VBLOCK left off, in order to reestablish the full cascade context.
  
  # Discussion