def m2(a): # array a
    m, nm, i, n = 0, 0, 0, len(a)
    while i<n:
-        while i<n and a[i]<=m: i += 1 # skip whilst smaller
-        while i<n and a[i]> m: m, nm, i = a[i], i, i+1
-    return nm;
+        while i<n and a[i]<=m: i += 1 # skip whilst smaller/equal
+        while i<n and a[i]> m: m,nm,i = a[i],i,i+1 # only whilst bigger
+    return nm
```
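As a quick sanity check, here is a minimal sketch that compares `m2` against a naive "index of the first maximum" reference. The helper `first_maxloc` is hypothetical and introduced only for illustration. Indices are 0-based, unlike Fortran's 1-based `MAXLOC`.

```python
def m2(a):  # array a
    m, nm, i, n = 0, 0, 0, len(a)
    while i < n:
        while i < n and a[i] <= m: i += 1  # skip whilst smaller/equal
        while i < n and a[i] > m: m, nm, i = a[i], i, i + 1  # only whilst bigger
    return nm


def first_maxloc(a):
    # Hypothetical naive reference: 0-based index of the first occurrence
    # of the maximum value.
    return a.index(max(a))


for sample in ([3, 5, 5, 1], [1, 2, 3, 4], [4, 3, 2, 1], [0, 7, 2, 7, 9, 9]):
    assert m2(sample) == first_maxloc(sample)
```

`m` starts at 0, so `m2` only agrees with the reference when the maximum is non-negative. For example, `m2([-5, -1])` returns 0, whereas the maximum is at index 1.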
# Implementation in SVP64 Assembler
The algorithm works by excluding previously-processed elements using `i-in-unary`,
combined with VL being truncated by Data-Dependent Fail-First.
-What therefore happens for example on the `sv.com/ff=gt/m=ge` operation
+What therefore happens for example on the `sv.cmp/ff=gt/m=ge` operation
is that it is *VL* (the Vector Length) that gets truncated to only
contain those elements that are less than or equal to the current largest value
found (`m` aka `r4`). Calling `sv.creqv` then sets **only** the
CR field bits of those elements, which excludes them from later operations
because the Predicate Mask is `m=ge` (an element is active only if its CR field bit is
**zero**).
+Data-Dependent Fail-First therefore attempts the operation on elements
+*up to* the current Vector Length and, on detecting the first failure,
+truncates VL at that point. In effect this is speculative sequential
+execution of `while (i<n and a[i]<=m) : i += 1`.
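The skip step can be modelled in plain Python. This is a minimal sketch under assumed semantics. Elements whose bit is already set in the unary mask are skipped. VL is truncated at the first active element that compares greater than `m`. `sv.creqv` then marks every element below the new VL. The function name `ff_skip_smaller` and its parameters are hypothetical. The real instructions operate on CR fields and registers rather than Python lists.

```python
def ff_skip_smaller(a, m, done, vl):
    """Hypothetical model of sv.cmp/ff=gt/m=ge followed by sv.creqv.

    a    : vector elements (list of ints)
    m    : current largest value (the scalar held in r4)
    done : per-element "already processed" bits (the i-in-unary mask)
    vl   : incoming Vector Length
    Returns the truncated VL and the updated mask.
    """
    new_vl = vl
    for i in range(vl):
        if done[i]:
            continue            # predicate m=ge: bit set, so element is skipped
        if a[i] > m:            # first "greater-than" result triggers fail-first
            new_vl = i          # VL is truncated at this element
            break
    # creqv: mark every element below the (truncated) VL as processed
    done = [True if i < new_vl else d for i, d in enumerate(done)]
    return new_vl, done


vl, done = ff_skip_smaller([3, 1, 7, 2], 5, [False] * 4, 4)
```

With `m=5`, VL is truncated to 2 and the first two elements are marked. This is the same position `i` reaches in the scalar `while (i<n and a[i]<=m)` loop.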
+
+Next comes `sv.minmax.`, which covers the `while (i<n and a[i]>m)` loop,
+again in a single instruction, but this time it is a little more
+involved. Firstly, mapreduce mode is used with `r4` as both source
+and destination, so that `r4` acts as the sequential accumulator. Secondly,
+it is again masked (`m=ge`), which excludes testing of previously-tested
+elements. The next few instructions extract the information provided
+by the Vector Length (VL) being truncated, potentially even to zero!
+(Note that `mtcrf 128,0` takes care of the possibility of VL=0: if that
+happens, CR0 would otherwise be left in its previous state, which is
+very much undesirable behaviour!)
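The masked mapreduce step can be sketched the same way. This sketch makes an assumption about the effect of fail-first: it stops at the first active element that is not strictly bigger than the accumulator, mirroring `while (i<n and a[i]>m)`. It does not describe the exact CR test the hardware performs. The name `ff_take_bigger` is hypothetical. A `count` of zero corresponds to the VL=0 case.

```python
def ff_take_bigger(a, m, nm, done, vl):
    """Hypothetical model of the masked mapreduce sv.minmax. step.

    Skipping elements already marked in `done`, keep taking elements while
    each one is strictly bigger than the running maximum m (r4 acting as the
    accumulator), recording its index in nm. Stop at the first active element
    that is not bigger. Returns (m, nm, count), where count is the number of
    elements taken, i.e. the truncated VL (possibly zero).
    """
    count = 0
    for i in range(vl):
        if done[i]:
            continue            # excluded by the m=ge predicate mask
        if a[i] > m:
            m, nm = a[i], i     # accumulate: new maximum and its index
            count += 1
        else:
            break               # first element not bigger: VL truncated here
    return m, nm, count


m, nm, count = ff_take_bigger([3, 1, 7, 9, 8], 5, 0, [True, True, False, False, False], 5)
```

In this example the accumulator climbs from 5 to 7 to 9. It then stops at 8, leaving `m=9`, `nm=3` and two elements taken.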
+
+`crternlogi 0,1,2,127` will combine the setting of CR0.EQ and CR0.LT
+to give us a true Greater-than-or-equal, including when VL=0.
+The `sv.crand` will then take a copy of the `i-in-unary`
+mask, but only when CR0.EQ is set. This is why the third operand `BB`
+is a Scalar, not a Vector (BT=16/Vector, BA=19/Vector, BB=0/Scalar):
+it effectively performs a broadcast-splat AND, as follows:
+
+```
+ CR4.SO = CR4.EQ AND CR0.EQ (if VL >= 1)
+ CR5.SO = CR5.EQ AND CR0.EQ (if VL >= 2)
+ CR6.SO = CR6.EQ AND CR0.EQ (if VL >= 3)
+ CR7.SO = CR7.EQ AND CR0.EQ (if VL = 4)
+```
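The broadcast-splat AND in the table above can be modelled as follows. In this minimal sketch, CR bits are Python booleans and the function name `crand_splat` is hypothetical. It shows that the single Scalar operand is reused for every element. Only the first VL destination bits are written, and the rest are left unmodified.

```python
def crand_splat(dest, src, scalar, vl):
    """Hypothetical model of sv.crand with Vector BT/BA and Scalar BB.

    dest   : current destination bits (e.g. CR4.SO .. CR7.SO)
    src    : per-element source bits (e.g. CR4.EQ .. CR7.EQ)
    scalar : the one Scalar bit (e.g. CR0.EQ), broadcast to every element
    vl     : Vector Length; elements at or beyond vl are not written
    """
    out = list(dest)
    for i in range(vl):
        out[i] = src[i] and scalar  # same Scalar bit ANDed into each element
    return out


so_bits = crand_splat([False] * 4, [True, False, True, True], True, 3)
```

With VL=3 only the first three destination bits are updated, matching the `(if VL >= n)` conditions in the table.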
+
[[!tag svp64_cookbook ]]