sync_up: Add link from discussion page

diff --git a/simple_v_extension/appendix.mdwn b/simple_v_extension/appendix.mdwn
index 779fe63d877adb714f27ed5a9f619bb9d29cfdb8..c29044cfea6b9772be22c43d9b8dc3d968f819ee 100644
--- a/simple_v_extension/appendix.mdwn
+++ b/simple_v_extension/appendix.mdwn
@@ -1,4 +1,8 @@
-# Simple-V (Parallelism Extension Proposal) Appendix
+[[!oldstandards]]
+
+# Simple-V (Parallelism Extension Proposal) Appendix (OBSOLETE)
+
+**OBSOLETE**
  
  * Copyright (C) 2017, 2018, 2019 Luke Kenneth Casson Leighton
  * Status: DRAFTv0.6
@@ -194,34 +198,52 @@ comprehensive in its effect on instructions.
  Branch operations are augmented slightly to be a little more like FP
  Compares (FEQ, FNE etc.), by permitting the cumulation (and storage)
  of multiple comparisons into a register (taken indirectly from the predicate
-table).  As such, "ffirst" - fail-on-first - condition mode can be enabled.
+table), and by enabling them to branch "consensually" depending on
+the outcome of *multiple* tests.  "ffirst" (fail-on-first) condition
+mode can also be enabled, to terminate the comparisons early.
  See ffirst mode in the Predication Table section.
  
-There are two registers for the comparison operation, therefore there is
-the opportunity to associate two predicate registers.  The first is a
-"normal" predicate register, which acts just as it does on any other
-single-predicated operation: masks out elements where a bit is zero,
-applies an inversion to the predicate mask, and enables zeroing / non-zeroing
-mode.
-
-The second is utilised to indicate where the results of each comparison
-are to be stored, as a bitmask.  Additionally, the behaviour of the branch
-- when it occurs - may also be modified depending on whether the predicate
-"invert" bit is set.
-
-* If the "invert" bit is zero, then the branch will occur if and only
-  all tests pass
-* If the "invert" bit is set, the branch will occur if and only if all
-  tests *fail*.
-
-This inversion capability, with some careful boolean logic manipulation,
-covers AND, OR, NAND and NOR branching based on multiple element comparisons.
-Note that unlike normal computer programming early-termination of chains
-of AND or OR conditional tests, the chain does *not* terminate early except
-if fail-on-first is set, and even then ffirst ends on the first data-dependent
-zero.  When ffirst mode is not set, *all* conditional element tests must be
-performed (and the result optionally stored in the result mask), with a
-"post-analysis" phase carried out which checks whether to branch.
+There are two registers for the comparison operation, so there is
+the opportunity to associate two predicate registers with it (note:
+not in the same way as twin-predication).  The first is a "normal" predicate
+register, which acts just as it does on any other single-predicated
+operation: masks out elements where a bit is zero, applies an inversion
+to the predicate mask, and enables zeroing / non-zeroing mode.
+
+The second (not to be confused with a twin-predication 2nd register)
+is utilised to indicate where the results of each comparison are to
+be stored, as a bitmask.  Additionally, the condition under which the
+branch is taken depends on whether the 2nd predicate's "invert" and
+"zeroing" bits are set.  The four combinations of these two bits give
+four "consensual branches": cbranch.ifnone (NOR), cbranch.ifany (OR),
+cbranch.ifall (AND) and cbranch.ifnotall (NAND).
+
+| invert | zeroing | description                 | operation | cbranch  |
+| ------ | ------- | --------------------------- | --------- | -------- |
+| 0      | 0       | branch if all pass          | AND       | ifall    |
+| 1      | 0       | branch if any fails         | NAND      | ifnotall |
+| 0      | 1       | branch if any passes        | OR        | ifany    |
+| 1      | 1       | branch if all fail          | NOR       | ifnone   |
+
+The invert and zeroing bits together cover AND, OR, NAND and NOR
+branching based on multiple element comparisons.  Without the full set
+of four, it is necessary to use two-instruction branch sequences: one
+conditional branch and one unconditional branch.
+
+Note that, unlike the short-circuit (early-terminating) evaluation of
+chains of AND or OR conditional tests in normal computer programming,
+the chain does *not* terminate early unless fail-on-first is set, and
+even then ffirst ends on the first data-dependent zero.  When ffirst
+mode is not set, *all* conditional element tests must be performed (and
+the result optionally stored in the result mask), with a "post-analysis"
+phase carried out which checks whether to branch.
+
+Note also that whilst it may seem excessive to have all four (because
+conditional comparisons may be inverted by swapping src1 and src2),
+data-dependent fail-on-first is *not* invertible and *only* terminates
+on the first zero condition encountered.  Additionally, it may be inconvenient
+to have to swap the predicate registers associated with src1 and src2,
+because this involves a new VBLOCK Context.
  
  ### Standard Branch <a name="standard_branch"></a>
  
@@ -290,9 +312,9 @@ complex), this becomes:
  
      ffirst_mode, zeroing = get_pred_flags(rs1)
      if exists(rd):
-        pred_inversion = get_pred_invert(rs2)
+        pred_inversion, pred_zeroing = get_pred_flags(rs2)
      else
-        pred_inversion = False
+        pred_inversion, pred_zeroing = False, False
  
      if not exists(rd) or zeroing:
          result = (1<<VL)-1 # all 1s
@@ -316,11 +338,23 @@ complex), this becomes:
          preg[rd] = result # store in destination
  
      if pred_inversion:
-        if result == 0:
-            goto branch
+        if pred_zeroing:
+            # NOR
+            if result == 0:
+                goto branch
+        else:
+            # NAND
+            if (result & ps) != result:
+                goto branch
      else:
-        if (result & ps) == result:
-            goto branch
+        if pred_zeroing:
+            # OR
+            if result != 0:
+                goto branch
+        else:
+            # AND
+            if (result & ps) == result:
+                goto branch
  
  Notes:
  
@@ -1066,7 +1100,7 @@ Note:
    is also marked as scalar, this is how the compatibility with
    standard RV LOAD/STORE is preserved by this algorithm.
  
-### Example Tables showing LOAD elements
+### Example Tables showing LOAD elements <a name="load_example"></a>
  
  This section contains examples of vectorised LOAD operations, showing
  how the two stage process works (three if zero/sign-extension is included).
@@ -1426,7 +1460,7 @@ circumstances it is perfectly fine to simply have the lanes
  "inactive" for predicated elements, even though it results in
  less than 100% ALU utilisation.
  
-## Twin-predication (based on source and destination register)
+## Twin-predication (based on source and destination register) <a name="tpred"></a>
  
  Twin-predication is not that much different, except that that
  the source is independently zero-predicated from the destination.
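The consensual-branch table in the second hunk above (the 2nd predicate's "invert" and "zeroing" bits selecting AND, NAND, OR or NOR) can be illustrated with a small Python sketch. This is a simplified, hypothetical model of that table only, not the spec pseudocode. It assumes that each element's comparison outcome is already known (`passes`), that `pred` is the first ("normal") predicate mask, and that only active elements take part in the vote. It does not model ffirst mode, and it does not reproduce the `ps`-based tests in the pseudocode hunk, because `ps` is not defined in this excerpt. The names `cbranch_taken`, `passes` and `pred` are illustrative.

```python
from typing import Sequence, Tuple


def cbranch_taken(passes: Sequence[bool], pred: int,
                  invert: bool, zeroing: bool) -> Tuple[bool, int]:
    """Hypothetical model of the invert/zeroing consensual-branch table.

    passes[i] is the comparison outcome for element i (VL = len(passes)).
    pred is the first ("normal") predicate: only elements whose bit is set
    are tested.  Returns (branch_taken, result_bitmask).
    """
    vl = len(passes)
    active = pred & ((1 << vl) - 1)      # clip the predicate to VL elements
    result = 0
    for i, ok in enumerate(passes):      # no early exit: every test is run
        if (active >> i) & 1 and ok:
            result |= 1 << i             # record passing elements as a bitmask
    all_pass = result == active          # every active element passed
    any_pass = result != 0               # at least one active element passed
    if not invert and not zeroing:
        return all_pass, result          # AND  -> cbranch.ifall
    if invert and not zeroing:
        return not all_pass, result      # NAND -> cbranch.ifnotall
    if not invert and zeroing:
        return any_pass, result          # OR   -> cbranch.ifany
    return not any_pass, result          # NOR  -> cbranch.ifnone


if __name__ == "__main__":
    tests = [True, False, True, True]    # element 1 fails its comparison
    for inv in (False, True):
        for zer in (False, True):
            print(inv, zer, cbranch_taken(tests, 0b1111, inv, zer))
```

With all four elements active, the single failing element makes AND not taken and NAND taken, while OR is taken and NOR is not. Masking element 1 out with `pred=0b1101` makes AND taken, which shows how the first predicate limits which elements vote. With no active elements at all, this model treats AND and NOR as vacuously taken. That is a modelling choice, and the spec may define that case differently.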