From: Luke Kenneth Casson Leighton
Date: Thu, 18 May 2023 17:54:13 +0000 (+0000)
Subject: big amendments to opensearch2023.tex
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=c99af8f9710d3a4f6911b32b76e50e3691ac0f7f;p=libreriscv.git

big amendments to opensearch2023.tex
---
diff --git a/conferences/opensearch2023/opensearch2023.tex b/conferences/opensearch2023/opensearch2023.tex
index bcc3a3418..75daaa614 100644
--- a/conferences/opensearch2023/opensearch2023.tex
+++ b/conferences/opensearch2023/opensearch2023.tex
@@ -47,7 +47,7 @@
 {} % input encoding is utf8 by default
 {\usepackage[utf8]{inputenc}} % switch to utf8
 
-\usepackage[USenglish]{babel}
+\usepackage[UKenglish]{babel}
 
 %
 % if BibLaTeX is used
@@ -113,7 +113,25 @@ This is what SVP64 looks like:
     GPR(RT+i) = GPR(RA+i) + GPR(RB+i)
 \end{verbatim}
 
-On top of these very basic beginnings, Predication and Conditional-Exit
+Note immediately that, as a direct consequence of defining
+\textbf{VL} in terms of \textit{Elements} instead
+of the Vector Register bit-width, the known issue of always
+needing a Vector Loop in order to guarantee
+Binary-executable Portability is avoided: SVP64 programmers may write
+simple loops (including SIMD ones, freed from Power-of-Two
+limitations) in just two lines of
+code\footnote[1]{with the proviso that the Programmer must
+be mindful of both the starting point and what they set MAXVL to.
+Hardware will helpfully remind them of any Register File overruns
+by happily raising an Illegal Instruction exception}.
+
+On top of these very basic but
+already-profound\footnote[2]{with hardware and ISA Architectural
+requirements that deal with the increased Dependency
+Hazard Management, too detailed to list in full in
+this document, the most important being that the total number of
+registers be a fixed \textbf{and mandatory} Standards-defined quantity}
+beginnings, Predication and Conditional-Exit
 can be added.
 Predication is found in every GPU ISA, and Conditional-Exit
 is a 50-year invention dating back to Zilog Z80 CPIR and LDIR.
@@ -161,12 +179,16 @@ leveraged for any other purpose, goes away.
 
 \section{strncpy}
 
-strncpy presents some unique challenges for an ISA and hardware,
+strncpy\cite{libresoc-strncpy}
+presents some unique challenges for an ISA and hardware,
 the primary being that in a SIMD (parallel) context, strncpy
 operates in bytes where SIMD operates in power-of-two multiples
-only. PackedSIMD is the worst offender: PredicatedSIMD is better.
-If SIMD Load and Store has to start on an Aligned Memory location
-things get even worse. The operations that were supposed to speed
+only. PackedSIMD is the worst offender: PredicatedSIMD is marginally
+better\footnote[3]{caveat: if extended properly, as was
+done successfully, with huge beneficial effect, in ARM SVE}.
+If SIMD Load and Store have to start on an Aligned Memory location,
+which is a common limitation, things get even worse.
+The operations that were supposed to speed
 up algorithms have to have "preamble" and "postamble" to take care
 of the corner-cases.
 
@@ -175,7 +197,8 @@ Worse, a naive SIMD ISA cannot have Conditional inter-relationships.
 in full. If the strncpy subroutine happens to copy from the last
 few bytes in memory, SIMD LOADs are the worst thing to use. We need
 a way to Conditionally terminate the LOAD and inform the
-Programmer, and this is where Load-Fault-First comes into play.
+Programmer, and this is where (as in ARM SVE)
+Load-Fault-First comes into play.
 
 However even this is not enough: once LOADed it is necessary to
 first spot the NUL character, and once identified to then begin
@@ -195,16 +218,6 @@ by an instruction that then searches sequentially for the first
 fail, followed by another instruction that explicitly truncates
 the Vector Length, followed finally by the actual STORE.
 
-\textit{All of the sequential-search-and-truncate} is part of
-the Data-Dependent Fail-First Mode that is a first-order construct
-in SVP64. When applied to the \textbf{sv.cmpi} instruction,
-which produces a Vector of Condition Codes ()as opposed to just
-one for the Scalar \textbf{cmpi} instruction),
-the search for the NUL character truncates the Vector Length
-at the required point, such that the next instruction (STORE)
-is already set up to copy up to and including the NUL
-(if one was indeed found).
-
 \begin{verbatim}
     mtspr 9, 3 # move r3 to CTR
     addi 0,0,0 # initialise r0 to zero
@@ -232,6 +245,16 @@ is already set up to copy up to and including the NUL
     sv.bc 16, *0, -0xc
 \end{verbatim}
 
+\textit{All of the sequential-search-and-truncate} is part of
+the Data-Dependent Fail-First Mode that is a first-order construct
+in SVP64. When applied to the \textbf{sv.cmpi} instruction,
+which produces a Vector of Condition Codes (as opposed to just
+one for the Scalar \textbf{cmpi} instruction),
+the search for the NUL character truncates the Vector Length
+at the required point, such that the next instruction (STORE)
+is already set up to copy up to and including the NUL
+(if one was indeed found).
+
 The next most important addition to SVP64 is a Vector-aware
 Branch-Conditional instruction. Where \textbf{sv.cmpi} had created
 a Vector of Condition Codes, \textbf{sv.bc/all}
@@ -262,7 +285,8 @@ saving on power consumption as well as potential operational delays.
 \section{Conclusion}
 
 Our goal as part of NGI Search is to validate that the approach
-taken above works across multiple algorithms. VectorScan was
+taken above works across multiple algorithms.
+VectorScan\cite{vectorscan} was
 chosen as a high-value library due to the sheer overwhelming
 complexity needed for other ISAs. libc6 was also chosen as it is
 such a low-level library that any Search algorithm utilising
@@ -304,8 +328,8 @@ paradigm.
 %% IEEE Periodicals, Piscataway,
 %% NJ, USA, Oct. 2014, pp. 34--52.
 
-  \bibitem{journal-abbreviations}
-  \url{https://woodward.library.ubc.ca/researchhelp/journal-abbreviations/}
+  \bibitem{libresoc-strncpy}
+  \url{https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_ldst.py;hb=HEAD}
 
 \end{thebibliography}
 } % end \ifboolexpr
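The commit defines VL in terms of Elements: GPR(RT+i) = GPR(RA+i) + GPR(RB+i), with a Register File overrun raising an Illegal Instruction. A small C model can illustrate this. It is a sketch under stated assumptions, not the SVP64 specification. The following are all hypothetical choices made for illustration:

- the names cpu_t, setvl and sv_add;
- the 128-entry GPR file;
- clamping VL to MAXVL;
- checking for an overrun before any element is written.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: a fixed, mandatory 128-entry GPR file. */
#define NGPRS 128

/* Hypothetical CPU state: scalar GPRs plus the MAXVL and VL values. */
typedef struct {
    uint64_t gpr[NGPRS];
    unsigned maxvl; /* upper bound chosen by the programmer */
    unsigned vl;    /* current Vector Length, counted in elements */
} cpu_t;

/* Loose model of setvl: VL = min(requested, MAXVL). */
static unsigned setvl(cpu_t *c, unsigned requested)
{
    c->vl = requested < c->maxvl ? requested : c->maxvl;
    return c->vl;
}

/* Loose model of a vectorised add:
       for i in 0 .. VL-1: GPR(RT+i) = GPR(RA+i) + GPR(RB+i)
   Returns -1 (standing in for an Illegal Instruction exception) when any
   element would run off the end of the register file; nothing is written. */
static int sv_add(cpu_t *c, unsigned rt, unsigned ra, unsigned rb)
{
    unsigned top = rt > ra ? rt : ra;
    if (rb > top)
        top = rb;
    assert(c->vl <= c->maxvl);
    if (c->vl > 0 && top + c->vl > NGPRS)
        return -1;
    for (unsigned i = 0; i < c->vl; i++)
        c->gpr[rt + i] = c->gpr[ra + i] + c->gpr[rb + i];
    return 0;
}
```

In this model, the "two lines" of a simple loop are one setvl call followed by one sv_add call. With MAXVL = 8, setvl(&c, 100) yields VL = 8. A vector starting at r126 with VL = 3 is rejected, because its third element would fall outside a 128-entry register file.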
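The paper's strncpy sequence can be sketched as a scalar C model. Each pass does a Fault-First LOAD, then a Data-Dependent Fail-First compare that truncates VL at the NUL, then a STORE of exactly VL bytes, and then loops. This model is hypothetical and is not the SVP64 assembly from the paper. The names ldff, ddff_cmp_nul and sv_strncpy are invented for illustration, as are MAXVL = 4 and the mapped parameter (only mem[0 .. mapped-1] is readable). A fault on element 0 is reported as -1 rather than as a real trap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXVL 4 /* arbitrary, hypothetical hardware vector length (bytes) */

/* Model of a Fault-First vector LOAD: read up to vl bytes from mem[pos].
   Only mem[0 .. mapped-1] is readable; instead of faulting, VL is truncated
   at the first unreadable element. Returns the (possibly shortened) VL;
   0 means element 0 itself would fault, which real hardware traps. */
static size_t ldff(unsigned char *v, const unsigned char *mem,
                   size_t mapped, size_t pos, size_t vl)
{
    size_t i;
    assert(vl <= MAXVL);
    for (i = 0; i < vl && pos + i < mapped; i++)
        v[i] = mem[pos + i];
    return i;
}

/* Model of a Data-Dependent Fail-First compare against zero: VL is
   truncated so that it ends on (and includes) the first NUL, if any. */
static size_t ddff_cmp_nul(const unsigned char *v, size_t vl, int *found)
{
    for (size_t i = 0; i < vl; i++) {
        if (v[i] == 0) {
            *found = 1;
            return i + 1;
        }
    }
    *found = 0;
    return vl;
}

/* strncpy(dst, mem + src, n) built from the two primitives above.
   Returns 0 on success, -1 if a LOAD faulted on its first element. */
static int sv_strncpy(unsigned char *dst, const unsigned char *mem,
                      size_t mapped, size_t src, size_t n)
{
    unsigned char v[MAXVL];
    size_t i = 0;
    int found = 0;

    while (i < n && !found) {
        size_t vl = n - i < MAXVL ? n - i : MAXVL; /* VL = min(left, MAXVL) */
        vl = ldff(v, mem, mapped, src + i, vl);    /* Fault-First LOAD */
        if (vl == 0)
            return -1;                             /* would trap for real */
        vl = ddff_cmp_nul(v, vl, &found);          /* truncate at the NUL */
        memcpy(dst + i, v, vl);                    /* STORE exactly VL bytes */
        i += vl;
    }
    memset(dst + i, 0, n - i);                     /* zero-pad like strncpy */
    return 0;
}
```

The model never copies anything after the NUL, and it never reads an unreadable byte. With three readable bytes and n = 2, the copy succeeds without touching the third byte. When no NUL appears before the end of readable memory, the fault surfaces only on a LOAD whose very first element is unreadable. That is the same point at which a byte-at-a-time strncpy would also have faulted.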