{} % input encoding is utf8 by default
{\usepackage[utf8]{inputenc}} % switch to utf8
-\usepackage[USenglish]{babel}
+\usepackage[UKenglish]{babel}
%
% if BibLaTeX is used
GPR(RT+i) = GPR(RA+i) + GPR(RB+i)
\end{verbatim}
-On top of these very basic beginnings, Predication and Conditional-Exit
+Note immediately that, as a direct consequence of defining
+\textbf{VL} in terms of \textit{Elements} instead
+of the Vector Register bit-width, the known issue of always
+having to have a Vector Loop in order to guarantee
+Binary-executable Portability goes away: SVP64 programmers may write
+simple loops - including SIMD ones, freed from Power-of-Two
+limitations - in just two lines of
+code\footnote[1]{with the proviso that the Programmer must
+be mindful of both the starting point and what they set MAXVL to.
+Hardware will helpfully remind them of any Register File overruns
+by happily throwing an Illegal Instruction}.
+
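The element loop above can be modelled in a few lines of Python. This is
only an illustrative sketch, not the SVP64 specification: the register file
size of 128, the clamping of \textbf{VL} to MAXVL, the up-front overrun
check and all function names are assumptions made for the example.

```python
NUM_GPRS = 128  # assumption: size of the modelled register file


class IllegalInstruction(Exception):
    """Stands in for the trap raised on a Register File overrun."""


def setvl(requested, maxvl):
    # Assumption: VL is the requested element count, clamped to MAXVL.
    return min(requested, maxvl)


def sv_add(gpr, rt, ra, rb, vl):
    # Assumption: an overrun is detected before any element is written.
    if max(rt, ra, rb) + vl > len(gpr):
        raise IllegalInstruction("register file overrun")
    # GPR(RT+i) = GPR(RA+i) + GPR(RB+i), for i in 0..VL-1
    for i in range(vl):
        gpr[rt + i] = (gpr[ra + i] + gpr[rb + i]) & 0xFFFFFFFFFFFFFFFF


gpr = [0] * NUM_GPRS
gpr[8:11] = [1, 2, 3]
gpr[16:19] = [10, 20, 30]
vl = setvl(3, 4)          # three elements requested, MAXVL of 4
sv_add(gpr, 24, 8, 16, vl)
```

Here \textbf{VL} counts elements, so nothing forces the length to be a
power of two, and a starting register too close to the end of the file is
rejected rather than silently wrapping.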
+On top of these very basic but
+already-profound\footnote[2]{with hardware and ISA Architectural
+requirements that deal with the increased Dependency
+Hazard Management, too detailed to list in full in
+this document, the most important being that the total number of
+registers be a fixed \textbf{and mandatory} Standards-defined quantity}
+beginnings, Predication and Conditional-Exit
can be added. Predication is found in every GPU ISA, and Conditional-Exit
is a 50-year-old invention dating back to Zilog Z80 CPIR and LDIR.
\section{strncpy}
-strncpy presents some unique challenges for an ISA and hardware,
+strncpy\cite{libresoc-strncpy}
+presents some unique challenges for an ISA and hardware,
the primary being that in a SIMD (parallel) context, strncpy
operates in bytes where SIMD operates in power-of-two multiples
-only. PackedSIMD is the worst offender: PredicatedSIMD is better.
-If SIMD Load and Store has to start on an Aligned Memory location
-things get even worse. The operations that were supposed to speed
+only. PackedSIMD is the worst offender: PredicatedSIMD is marginally
+better\footnote[3]{caveat: if extended properly, as was
+done successfully, with huge beneficial effect, in ARM SVE}.
+If SIMD Load and Store has to start on an Aligned Memory location,
+which is a common limitation, things get even worse.
+The operations that were supposed to speed
up algorithms have to have ``preamble'' and ``postamble'' to take care
of the corner-cases.
in full. If the strncpy subroutine happens to copy from the last
few bytes in memory, SIMD LOADs are the worst thing to use.
We need a way to Conditionally terminate the LOAD and inform the
-Programmer, and this is where Load-Fault-First comes into play.
+Programmer, and this is where (as in ARM SVE)
+Load-Fault-First comes into play.
However even this is not enough: once LOADed it is necessary to
first spot the NUL character, and once identified to then begin
fail, followed by another instruction that explicitly truncates
the Vector Length, followed finally by the actual STORE.
-\textit{All of the sequential-search-and-truncate} is part of
-the Data-Dependent Fail-First Mode that is a first-order construct
-in SVP64. When applied to the \textbf{sv.cmpi} instruction,
-which produces a Vector of Condition Codes ()as opposed to just
-one for the Scalar \textbf{cmpi} instruction),
-the search for the NUL character truncates the Vector Length
-at the required point, such that the next instruction (STORE)
-is already set up to copy up to and including the NUL
-(if one was indeed found).
-
\begin{verbatim}
mtspr 9, 3 # move r3 to CTR
addi 0,0,0 # initialise r0 to zero
sv.bc 16, *0, -0xc
\end{verbatim}
+\textit{All of the sequential-search-and-truncate} is part of
+the Data-Dependent Fail-First Mode that is a first-order construct
+in SVP64. When applied to the \textbf{sv.cmpi} instruction,
+which produces a Vector of Condition Codes (as opposed to just
+one for the Scalar \textbf{cmpi} instruction),
+the search for the NUL character truncates the Vector Length
+at the required point, such that the next instruction (STORE)
+is already set up to copy up to and including the NUL
+(if one was indeed found).
+
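The search-and-truncate behaviour described above can be sketched as a
simplified Python model. It is not the SVP64 pseudocode: the function names
are hypothetical, memory is modelled as a flat byte string, Load-Fault-First
is reduced to truncating at the end of that string, and the NUL padding that
a real strncpy performs is omitted.

```python
def load_fault_first(mem, addr, vl):
    # Simplified model: truncate the load at the end of memory
    # instead of faulting on the elements that lie beyond it.
    avail = max(0, len(mem) - addr)
    vl = min(vl, avail)
    return mem[addr:addr + vl], vl


def cmpi_fail_first(data, vl):
    # Sequential search for NUL; VL is truncated to include the NUL.
    for i in range(vl):
        if data[i] == 0:
            return i + 1
    return vl  # no NUL found: VL is unchanged


def strncpy_model(mem, src, n, maxvl=4):
    out = bytearray()
    while n > 0:
        vl = min(n, maxvl)
        data, vl = load_fault_first(mem, src, vl)
        if vl == 0:
            raise MemoryError("first element faulted")
        vl = cmpi_fail_first(data, vl)
        out += data[:vl]          # the STORE uses the truncated VL
        if data[vl - 1] == 0:
            break                 # NUL copied: stop
        src += vl
        n -= vl
    return bytes(out)


result = strncpy_model(b"hello\0junk", 0, 16)
```

The point of the model is that the STORE step needs no separate
scan or explicit truncation: it simply uses whatever Vector Length the
compare left behind.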
The next most important addition to SVP64 is a Vector-aware
Branch-Conditional instruction. Where \textbf{sv.cmpi} had
created a Vector of Condition Codes, \textbf{sv.bc/all}
\section{Conclusion}
Our goal as part of NGI Search is to validate that the approach
-taken above works across multiple algorithms. VectorScan was
+taken above works across multiple algorithms.
+VectorScan\cite{vectorscan} was
chosen as a high-value library due to the sheer overwhelming
complexity needed for other ISAs. libc6 was also chosen as it
is such a low-level library that any Search algorithm utilising
%% IEEE Periodicals, Piscataway,
%% NJ, USA, Oct. 2014, pp. 34--52.
- \bibitem{journal-abbreviations}
- \url{https://woodward.library.ubc.ca/researchhelp/journal-abbreviations/}
+ \bibitem{libresoc-strncpy}
+ \url{https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_ldst.py;hb=HEAD}
\end{thebibliography}
} % end \ifboolexpr