identical to general-purpose Simple-V DD-FFirst...
\end{itemize}
-}Po
-
-\frame{\frametitle{maxloc}
- \begin{itemize}
- \item "TODO
- \end{itemize}
}
\frame{\frametitle{Pospopcount}
}
-\frame{\frametitle{Pospopcount.s}
+\frame{\frametitle{pospopcount assembler}
\lstinputlisting[language={}]{pospopcount.s}
}
-
\frame{\frametitle{strncpy}
\lstinputlisting[language={}]{strncpy.c}
\begin{itemize}
- \item two simple-looking for-loops, unfortunately sequentially
+ \item two simple-looking for-loops,
data-dependent in the first.
- \item Power ISA added a hard-coded variant of this inner
- data-dependent capacity into VSX - only for strcpy!
+ \item sv.cmpi stops at the first zero; /vli includes the zero
+ in VL.
+ \item note the post-increment Load/Store: saves
+ pre-decrementing
+ \item a Vector of CRs is produced, which then gets tested
+ by the sv.bc/all instruction, counting down CTR
+ per item tested.
+ \item Power ISA added hard-coded data-dependent capacity
+ into vstribr, whereas in SVP64 it is generic (applies
+ to any instruction)
\item even the null-ing part is not straightforward as
it could be mis-aligned compared to the VSX width.
- \item end-result is that assembler-optimised strncpy on Power
- ISA v3.0 is a whopping 240 instructions. SVP64 is 10
+ \item end-result: assembler-optimised strncpy on Power
+ ISA v3.0 is a whopping 240 instructions. SVP64 is 10,
+ and parallel in HW
\end{itemize}
}
\end{center}
}
+\frame{\frametitle{maxloc}
+ \begin{itemize}
+ \item TODO
+ \end{itemize}
+}
+
\frame{\frametitle{Summary}
\begin{itemize}