big amendments to opensearch2023.tex

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Thu, 18 May 2023 17:54:13 +0000 (17:54 +0000)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Thu, 18 May 2023 23:15:41 +0000 (23:15 +0000)
diff --git a/conferences/opensearch2023/opensearch2023.tex b/conferences/opensearch2023/opensearch2023.tex

index bcc3a34188bd15df92ab4fb9c96243dc339d3019..75daaa6149a9ec7371346f3f1470512f37a02494 100644
--- a/conferences/opensearch2023/opensearch2023.tex
+++ b/conferences/opensearch2023/opensearch2023.tex
@@ -47,7 +47,7 @@
   {}                                      % input encoding is utf8 by default
   {\usepackage[utf8]{inputenc}}           % switch to utf8
  
-\usepackage[USenglish]{babel}
+\usepackage[UKenglish]{babel}
  
  %
  % if BibLaTeX is used
@@ -113,7 +113,25 @@ This is what SVP64 looks like:
         GPR(RT+i) = GPR(RA+i) + GPR(RB+i)
  \end{verbatim}
  
-On top of these very basic beginnings, Predication and Conditional-Exit
+Note immediately that, as a direct consequence of defining
+\textbf{VL} directly in terms of \textit{Elements} instead
+of the Vector Register bit-width, the known issue of always
+having to have a Vector Loop in order to guarantee
+Binary-executable Portability goes away: SVP64 programmers may write
+simple loops - including SIMD ones, freed from Power-of-Two
+limitations - in just two lines of
+code\footnote[1]{with the proviso that the Programmer must
+be mindful of both the starting point and what they set MAXVL to.
+Hardware will helpfully remind them of any Register File overruns
+by happily throwing an Illegal Instruction}.
+
+On top of these very basic but
+already-profound\footnote[2]{with hardware and ISA Architectural
+requirements that deal with the increased Dependency
+Hazard Management, too detailed to list in full in
+this document, the most important being that the total number of
+registers be a fixed \textbf{and mandatory} Standards-defined quantity}
+beginnings, Predication and Conditional-Exit
  can be added. Predication is found in every GPU ISA, and Conditional-Exit
  is a 50-year-old invention dating back to Zilog Z80 CPIR and LDIR.
  
@@ -161,12 +179,16 @@ leveraged for any other purpose, goes away.
  
  \section{strncpy}
  
-strncpy presents some unique challenges for an ISA and hardware,
+strncpy\cite{libresoc-strncpy}
+presents some unique challenges for an ISA and hardware,
  the primary being that in a SIMD (parallel) context, strncpy
  operates in bytes where SIMD operates in power-of-two multiples
-only.  PackedSIMD is the worst offender: PredicatedSIMD is better.
-If SIMD Load and Store has to start on an Aligned Memory location
-things get even worse.  The operations that were supposed to speed
+only.  PackedSIMD is the worst offender: PredicatedSIMD is marginally
+better\footnote[3]{caveat: if extended properly, as was
+done successfully, with huge beneficial effect, in ARM SVE}.
+If SIMD Load and Store has to start on an Aligned Memory location,
+which is a common limitation, things get even worse.
+The operations that were supposed to speed
  up algorithms have to have "preamble" and "postamble" to take care
  of the corner-cases.
  
@@ -175,7 +197,8 @@ Worse, a naive SIMD ISA cannot have Conditional inter-relationships.
  in full.  If the strncpy subroutine happens to copy from the last
  few bytes in memory, SIMD LOADs are the worst thing to use.
  We need a way to Conditionally terminate the LOAD and inform the
-Programmer, and this is where Load-Fault-First comes into play.
+Programmer, and this is where (as in ARM SVE)
+Load-Fault-First comes into play.
  
  However even this is not enough: once LOADed it is necessary to
  first spot the NUL character, and once identified to then begin
@@ -195,16 +218,6 @@ by an instruction that then searches sequentially for the first
  fail, followed by another instruction that explicitly truncates
  the Vector Length, followed finally by the actual STORE.
  
-\textit{All of the sequential-search-and-truncate} is part of
-the Data-Dependent Fail-First Mode that is a first-order construct
-in SVP64.  When applied to the \textbf{sv.cmpi} instruction,
-which produces a Vector of Condition Codes ()as opposed to just
-one for the Scalar \textbf{cmpi} instruction),
-the search for the NUL character truncates the Vector Length
-at the required point, such that the next instruction (STORE)
-is already set up to copy up to and including the NUL
-(if one was indeed found).
-
  \begin{verbatim}
       mtspr 9, 3   # move r3 to CTR
       addi 0,0,0   # initialise r0 to zero
@@ -232,6 +245,16 @@ is already set up to copy up to and including the NUL
       sv.bc 16, *0, -0xc
  \end{verbatim}
  
+\textit{All of the sequential-search-and-truncate} is part of
+the Data-Dependent Fail-First Mode that is a first-order construct
+in SVP64.  When applied to the \textbf{sv.cmpi} instruction,
+which produces a Vector of Condition Codes (as opposed to just
+one for the Scalar \textbf{cmpi} instruction),
+the search for the NUL character truncates the Vector Length
+at the required point, such that the next instruction (STORE)
+is already set up to copy up to and including the NUL
+(if one was indeed found).
+
  The next most important addition to SVP64 is a Vector-aware
  Branch-Conditional instruction.  Where \textbf{sv.cmpi} had
  created a Vector of Condition Codes, \textbf{sv.bc/all}
@@ -262,7 +285,8 @@ saving on power consumption as well as potential operational delays.
  
  \section{Conclusion}
  Our goal as part of NGI Search is to validate that the approach
-taken above works across multiple algorithms.  VectorScan was
+taken above works across multiple algorithms. 
+VectorScan\cite{vectorscan} was
  chosen as a high-value library due to the sheer overwhelming
  complexity needed for other ISAs.  libc6 was also chosen as it
  is such a low-level library that any Search algorithm utilising
@@ -304,8 +328,8 @@ paradigm.
         %%      IEEE Periodicals, Piscataway,
         %%      NJ, USA, Oct. 2014, pp. 34--52.
  
-       \bibitem{journal-abbreviations}
-       \url{https://woodward.library.ubc.ca/researchhelp/journal-abbreviations/}
+       \bibitem{libresoc-strncpy}
+       \url{https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_ldst.py;hb=HEAD}
  
         \end{thebibliography}
  } % end \ifboolexpr
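
Reviewer sketch (not part of the commit): the element-based loop and the
sequential-search-and-truncate behaviour described in the strncpy section
can be modelled in a few lines of Python. This is a behavioural illustration
only, under stated assumptions: the function names, the default maxvl of 8
and the chunking policy are all hypothetical, it does not model Load-Fault-First
or Condition Codes, and unlike the C library strncpy it does not pad the
destination with NUL bytes.

```python
def vector_add(gpr, rt, ra, rb, vl):
    """Element-wise add over vl elements, mirroring the loop in the paper:
    GPR(RT+i) = GPR(RA+i) + GPR(RB+i)."""
    for i in range(vl):
        gpr[rt + i] = gpr[ra + i] + gpr[rb + i]


def ffirst_truncate(elements, vl):
    """Hypothetical model of sequential-search-and-truncate.

    Scans the first vl elements in order and returns a truncated vector
    length that stops just after the first zero (NUL) element. If no zero
    is found, vl is returned unchanged.
    """
    for i in range(vl):
        if elements[i] == 0:
            return i + 1  # include the NUL itself
    return vl


def copy_until_nul(src, n, maxvl=8):
    """Copy at most n bytes from src, in chunks of at most maxvl elements,
    stopping after the first NUL. NUL padding is deliberately not modelled."""
    dst = bytearray()
    pos = 0
    while pos < n:
        vl = min(maxvl, n - pos, len(src) - pos)
        if vl <= 0:
            break  # ran out of source bytes
        chunk = src[pos:pos + vl]
        new_vl = ffirst_truncate(chunk, vl)
        dst += chunk[:new_vl]  # the "STORE" uses the truncated length
        if chunk[new_vl - 1] == 0:
            break  # NUL copied: finished
        pos += new_vl
    return bytes(dst)
```

Note that the vector length is not required to be a power of two at any
point: a chunk of 3 or 5 elements is handled exactly like a chunk of 4 or 8,
which is the property the paper attributes to defining VL in Elements.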