\begin{itemize}
\item FORTRAN MAXLOC - find the index of largest number
- \item notoriously difficult to optimally implement for SIMD
+ notoriously difficult to optimally implement for SIMD
\item algorithms include \textit{depth-first} recursive
descent (!) mapreduce-style, offsetting the
locally-computed largest index (plus value) which
are then tested in upper level(s)
- \item SVP64 through Data-Dependent Fail-First can perform
- each of the two key while-loop tests with
- \textit{single instructions}.
+ \item SVP64: note below the sv.cmp (first while-loop),
+ sv.minmax. (second while-loop), and the sv.crnand, which
+ through Predicate masking is a 3-in 1-out CR op,
+ not the usual 2-in 1-out
\item There is however quite a bit of "housekeeping".
Full analysis: \\
https://libre-soc.org/openpower/sv/cookbook/fortran\_maxloc
\frame{\frametitle{maxloc assembler}
\lstinputlisting[language={}]{maxloc.s}
-
}
\frame{\frametitle{Summary}
# nm = i: count masked bits. could use crweirds
sv.svstep/mr/m=so 1,0,6,1 # get vector dststep
sv.creqv *16,*16,*16 # set mask on already-tested
-bc 12,0,-0x40 # CR0 lt clear, branch back
+bc 12,0,-0x3c # CR0 lt clear, branch back