(no commit message)

[libreriscv.git] / openpower / sv / normal.mdwn
diff --git a/openpower/sv/normal.mdwn b/openpower/sv/normal.mdwn

index 98d9c192fcddf4d962fba206492fa80f05dd1db4..b7999c11e1c296d3531bcf28862436650c1bd8e8 100644
--- a/openpower/sv/normal.mdwn
+++ b/openpower/sv/normal.mdwn
@@ -30,8 +30,10 @@ Modes apply to Arithmetic and Logical SVP64 operations:
  and FP.
  * **reduce mode**. a mapreduce is performed.  the result is a scalar.  a result vector however is required, as the upper elements may be used to store intermediary computations.  the result of the mapreduce is in the first element with a nonzero predicate bit.  see [[svp64/appendix]]
    note that there are comprehensive caveats when using this mode.
-* **pred-result** will test the result (CR testing selects a bit of CR and inverts it, just like branch testing) and if the test fails it is as if the 
-*destination* predicate bit was zero.  When Rc=1 the CR element however is still stored in the CR regfile, even if the test failed.  See appendix for details.
+* **pred-result** will test the result (CR testing selects a bit of CR and inverts it, just like branch conditional testing) and if the test fails it
+is as if the 
+*destination* predicate bit was zero even before starting the operation. 
+When Rc=1 the CR element however is still stored in the CR regfile, even if the test failed.  See appendix for details.
  
  Note that ffirst and reduce modes are not anticipated to be high-performance in some implementations.  ffirst due to interactions with VL, and reduce due to it requiring additional operations to produce a result.  normal, saturate and pred-result are however inter-element independent and may easily be parallelised to give high performance, regardless of the value of VL.
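The pred-result behaviour described above can be illustrated with a minimal Python sketch. This is not taken from the specification: the function name, the dictionary used as a CR field, and the compare-against-zero rule are all assumptions made for illustration, and element-width handling is omitted entirely.

```python
def pred_result_add(dest, src_a, src_b, dest_mask, cr_bit="eq", invert=False):
    """Toy model of pred-result applied to an element-wise add.

    Every name here is hypothetical; this is a sketch, not the
    normative pseudocode.
    """
    cr_fields = [None] * len(dest)
    for i in range(len(dest)):
        if not (dest_mask >> i) & 1:
            continue  # ordinary destination predication: element skipped
        result = src_a[i] + src_b[i]
        # Simplified CR field (assumption): compare the result against zero.
        cr = {"lt": result < 0, "gt": result > 0, "eq": result == 0}
        cr_fields[i] = cr  # with Rc=1 the CR element is stored regardless
        passed = cr[cr_bit] != invert  # select one CR bit, optionally invert
        if not passed:
            continue  # failed test: as if the predicate bit had been zero
        dest[i] = result
    return cr_fields
```

Because each element is tested independently of its neighbours, the loop iterations could run in any order, which is consistent with the remark that pred-result is inter-element independent.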
  
@@ -93,26 +95,30 @@ the hugely detrimental effect it has on parallel processing, XER.SO is
  overflow bit is therefore simply set to zero if saturation did not occur,
  and to one if it did.
  
-Note also that saturate on operations that produce a carry output are
-prohibited due to the conflicting use of the CR.so bit for storing if
-saturation occurred.
-
-Post-analysis of the Vector of CRs to find out if any given element hit
-saturation may be done using a mapreduced CR op (cror), or by using the
-new crweird instruction, transferring the relevant CR bits to a scalar
-integer and testing it for nonzero.  see [[sv/cr_int_predication]]
+Note also that saturate on operations that set OE=1 must raise an
+Illegal Instruction due to the conflicting use of the CR.so bit for
+storing whether saturation occurred. For Integer Operations that produce
+a Carry-Out (CA, CA32), these two bits will be `UNDEFINED` if saturation
+is also requested.
  
  Note that the operation takes place at the maximum bitwidth (max of
  src and dest elwidth) and that truncation occurs to the range of the
  dest elwidth.
  
+*Programmer's Note: Post-analysis of the Vector of CRs to find out if any given element hit
+saturation may be done using a mapreduced CR op (cror), or by using the
+new crrweird instruction with Rc=1, which will transfer the required
+CR bits to a scalar integer and update CR0, which will allow testing
+the scalar integer for nonzero.  see [[sv/cr_int_predication]]*
+
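As a rough illustration of the saturation rules in this section, the following Python sketch performs the arithmetic at unbounded width (standing in for the maximum of source and destination elwidth), clamps to the destination range, and records a per-element overflow bit. Signed saturation, the 8-bit default, and all function names are assumptions for the example; `any_saturated` merely imitates the effect of OR-reducing the Vector of CR overflow bits and is not a model of any specific instruction.

```python
def sat_add_signed(a, b, dest_bits):
    """Add, then clamp to the signed range of the destination width."""
    lo = -(1 << (dest_bits - 1))
    hi = (1 << (dest_bits - 1)) - 1
    full = a + b  # Python ints are unbounded: models the wide operation
    clamped = max(lo, min(hi, full))
    overflow = int(clamped != full)  # 1 if saturation occurred, else 0
    return clamped, overflow


def vector_sat_add(va, vb, dest_bits=8):
    """Apply the saturating add per element, collecting overflow bits."""
    results, so_bits = [], []
    for a, b in zip(va, vb):
        r, so = sat_add_signed(a, b, dest_bits)
        results.append(r)
        so_bits.append(so)
    return results, so_bits


def any_saturated(so_bits):
    """OR-reduce the per-element overflow bits into one scalar flag."""
    return int(any(so_bits))
```

For example, `vector_sat_add([100, -100, 5], [100, -100, 5])` clamps the first two elements to the 8-bit limits and leaves the third untouched, so the reduced flag is nonzero.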
  # Reduce mode
  
  Reduction in SVP64 is similar in essence to other Vector Processing
  ISAs, but leverages the underlying scalar Base v3.0B operations.
  Thus it is more a convention that the programmer may utilise to give
-the appearance and effect of a Horizontal Vector Reduction.
-Details are in the [[svp64/appendix]]
+the appearance and effect of a Horizontal Vector Reduction. Due
+to the unusual decoupling it is also possible to perform
+prefix-sum in certain circumstances. Details are in the [[svp64/appendix]]
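
The idea that reduction is a convention layered on scalar operations can be sketched with a toy register file and a strictly sequential element loop. This is only an illustration under stated assumptions: the `*_step` parameters (0 for a scalar operand, 1 for a vector operand) and the function name are hypothetical, and the normative rules remain those of the appendix.

```python
def element_loop_add(regs, rt, ra, rb, vl, rt_step=1, ra_step=1, rb_step=1):
    """Run a scalar add once per element, in sequential element order.

    A step of 0 models a scalar operand, a step of 1 a vector operand
    (an assumption of this sketch).
    """
    for i in range(vl):
        regs[rt + i * rt_step] = regs[ra + i * ra_step] + regs[rb + i * rb_step]
    return regs


# Reduction: the scalar accumulator at regs[0] is both source and destination.
regs = [0, 1, 2, 3, 4]
element_loop_add(regs, rt=0, ra=0, rb=1, vl=4, rt_step=0, ra_step=0)
total = regs[0]

# Prefix-sum: the destination vector overlaps the source, offset by one,
# so each element sees the result written by the previous iteration.
regs2 = [1, 2, 3, 4]
element_loop_add(regs2, rt=1, ra=0, rb=1, vl=3)
```

Both effects depend on the elements being processed in order, which is why the sketch uses a plain sequential loop rather than anything parallel.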
  
  # Fail-on-first
  
@@ -121,7 +127,6 @@ the other for arithmetic operations (actually, CR-driven).  Note in each
  case the assumption is that vector elements are required to appear to be
  executed in sequential Program Order, element 0 being the first.
  
-
  * Data-driven (CR-driven) fail-on-first activates when Rc=1 or other
    CR-creating operation produces a result (including cmp).  Similar to
    branch, an analysis of the CR is performed and if the test fails, the
@@ -153,11 +158,12 @@ of strncpy, to include the terminating zero.
  
  In CR-based data-driven fail-on-first there is only the option to select
  and test one bit of each CR (just as with branch BO).  For more complex
-tests this may be insufficient.  If that is the case, a vectorised crops
+tests this may be insufficient.  If that is the case, a vectorised crop
  (crand, cror) may be used, and ffirst applied to the crop instead of to
-the arithmetic vector.
+the arithmetic vector. Note that crops are covered by 
+the [[sv/cr_ops]] Mode format.
  
-One extremely important aspect of ffirst is:
+Two extremely important aspects of ffirst are:
  
  * LDST ffirst may never set VL equal to zero.  This is because on the first
    element an exception must be raised "as normal".
@@ -168,7 +174,12 @@ One extremely important aspect of ffirst is:
    vectorised operations are effectively `nops` which is
    *precisely the desired and intended behaviour*.
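
A minimal sketch of data-driven truncation and its effect on following operations is given below. It assumes that VL is cut at the first element whose test fails, and that a failure on element 0 leaves VL at zero; the `inclusive` flag is a hypothetical stand-in for the strncpy-style case where the failing element is kept. None of the names come from the specification.

```python
def ffirst_truncate(values, test, vl, inclusive=False):
    """Return the new VL after a sequential, data-driven fail-first scan."""
    for i in range(vl):
        if not test(values[i]):
            # Truncation happens only because a test explicitly failed.
            return i + 1 if inclusive else i
    return vl  # no test failed: VL is unchanged


def vector_double(regs, vl):
    """A following vectorised operation: touches only the first VL elements."""
    for i in range(vl):
        regs[i] *= 2
    return regs
```

With `vl` truncated to zero, `vector_double` executes no iterations at all, which matches the description of subsequent vectorised operations behaving as `nops`.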
  
-CR-based data-dependent first on the other hand MUST not truncate VL
+The second crucial aspect, compared to LDST Ffirst:
+
+* LD/ST Failfirst may (beyond the initial first element
+  conditions) truncate VL for any architecturally
+  suitable reason.
+* CR-based data-dependent first on the other hand MUST NOT truncate VL
  arbitrarily to a length decided by the hardware: VL MUST only be
  truncated based explicitly on whether a test fails.
  This is because it is a precise test on which algorithms