(no commit message)

[libreriscv.git] / openpower / sv / svp64 / appendix.mdwn
diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn

index 2f422ea599e9572f443584d8d8004571a9334020..bad606195af4b8d3cfba31a94e8e7d9aa398c053 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -223,6 +223,12 @@ This is equivalent to
  `llvm.masked.compressstore.*`
  followed by
  `llvm.masked.expandload.*`
+with a single instruction.
+
+This extreme power and flexibility come down to the fact that SVP64
+is not actually a Vector ISA: it is a loop-abstraction concept that
+is applied *in general* to Scalar operations, just like the x86
+`REP` prefix (if put on steroids).
  
  # Reduce modes
  
@@ -260,8 +266,9 @@ as a simple and natural relaxation of the usual restriction on the Vector
  Looping which would terminate if the destination was marked as a Scalar.
  Scalar Reduction by contrast *keeps issuing Vector Element Operations*
  even though the destination register is marked as scalar.
-Thus it is up to the programmer to be aware of this and observe some
-conventions.
+It is therefore up to the programmer to be aware of this and to observe
+some conventions in order to achieve the desired outcome of scalar
+reduction.
  
  It is also important to appreciate that there is no
  actual imposition or restriction on how this mode is utilised: there
@@ -330,6 +337,7 @@ Using the same register as both the source and destination, with Vectors
  of different offsets masks and values to be inserted has multiple
  applications including Video, cryptography and JIT compilation.
  
+Due to the Deterministic Scheduling,
  Subtract and Divide are still permitted to be executed in this mode,
  although from an algorithmic perspective it is strongly discouraged.
  It would be better to use addition followed by one final subtract,
@@ -419,8 +427,9 @@ executed in sequential Program Order, element 0 being the first.
  * Data-driven (CR-driven) fail-on-first activates when Rc=1 or other
    CR-creating operation produces a result (including cmp).  Similar to
    branch, an analysis of the CR is performed and if the test fails, the
-  vector operation terminates and discards all element operations at and
-  above the current one, and VL is truncated to either
+  vector operation terminates and discards all element operations
+  above the current one (and the current one if VLi is not set),
+  and VL is truncated to either
    the *previous* element or the current one, depending on whether
    VLi (VL "inclusive") is set.
  
@@ -782,6 +791,10 @@ For modes:
  
  # Proposed Parallel-reduction algorithm
  
+**This algorithm contains an MV operation and may NOT be used.  The MV
+operation may be removed by using index-redirection, as was done in
+DCT and FFT REMAP.**
+
  ```
  /// reference implementation of proposed SimpleV reduction semantics.
  ///
@@ -801,7 +814,9 @@ def reduce(vl, vec, pred):
              if pred[i] && other_pred
                  vec[i] += vec[other];
              else if other_pred
-                vec[i] = vec[other];
+                XXX VIOLATION OF SVP64 DESIGN XXX
+                XXX vec[i] = vec[other];      XXX
+                XXX VIOLATION OF SVP64 DESIGN XXX
              pred[i] |= other_pred;
  ```
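Two behaviours touched on by this patch can be sketched in Python. The first is Scalar Reduction, which keeps issuing element operations into a scalar destination. The second is data-driven fail-on-first, which truncates VL to the previous or the current element depending on VLi. This is a hypothetical model only. The flat register-file layout, the function names `scalar_reduce_add` and `ddffirst_new_vl`, and the use of a Python predicate in place of a real CR test are all assumptions made for illustration. They are not taken from the SVP64 specification.

```python
# Hypothetical model of two SVP64 looping behaviours (illustration only).
# Assumption: registers are a flat Python list, and element i of a Vector
# starting at register r lives at regs[r + i]. This is not the official spec.

from typing import Callable, List, Sequence


def scalar_reduce_add(regs: List[int], rt: int, rb: int,
                      vl: int, mask: Sequence[bool]) -> int:
    """Scalar Reduction sketch: RT is scalar, RB is a Vector.

    The loop does not stop after the first element even though the
    destination is scalar: every active element is accumulated into
    the same register RT, in Program Order.
    """
    for i in range(vl):
        if mask[i]:  # masked-out elements are skipped
            regs[rt] = regs[rt] + regs[rb + i]
    return regs[rt]


def ddffirst_new_vl(vl: int, results: Sequence[int],
                    test: Callable[[int], bool], vli: bool) -> int:
    """Data-driven fail-on-first sketch: return the truncated VL.

    Elements are examined in Program Order (element 0 first). At the
    first element whose test fails, all elements above it are discarded;
    the failing element itself is kept only when VLi is set.
    """
    for i in range(vl):
        if not test(results[i]):
            return i + 1 if vli else i
    return vl  # no element failed: VL is unchanged
```

For example, with `regs = [0, 1, 2, 3, 4]`, `scalar_reduce_add(regs, 0, 1, 4, [True] * 4)` accumulates all four elements into register 0. With results `[5, 7, 0, 9]` and a "non-zero" test, VL becomes 2 without VLi and 3 with it.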