minor rewording

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Wed, 29 Jun 2022 08:48:03 +0000 (09:48 +0100)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Wed, 29 Jun 2022 08:48:07 +0000 (09:48 +0100)
diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index 4d55a301613e2d2b72cee4031951d57afd196cb9..050e8c847fd37c015ed1dd2a3fe096b00092ac94 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -3,6 +3,7 @@ The proposed \acs{SV} is a Scalable Vector Specification for a hardware for-loop
  ONLY uses scalar instructions}.
  
  \begin{itemize}
+\itemsep 0em
  \item The Power \acs{ISA} v3.1 Specification is not altered in any way.
    v3.1 Code-compatibility is guaranteed.
  \item Does not require sacrificing 32-bit Major Opcodes.
@@ -38,7 +39,6 @@ All areas investigated so far consistently showed reductions in executable
  size, which as outlined in \cite{SIMD_HARM} has an indirect reduction in
  power consumption due to less I-Cache/TLB pressure and also Issue remaining
  idle for long periods.
-
  Simple-V has been specifically and carefully crafted to respect
  the Power ISA's Supercomputing pedigree.
  
@@ -92,6 +92,7 @@ Five digit Opcode proliferation (10,000 instructions) is overwhelming.
  The following are just some of the reasons why SIMD is unsustainable as
  the number of instructions increase:
  \begin{itemize}
+    \itemsep 0em
         \item Hardware design, ASIC routing etc.
         \item Compiler design
         \item Documentation of the ISA
@@ -115,11 +116,13 @@ elements simultaneously, taking advantage of multiple pipelines.\par
  Typically, today's vector processors can execute two, four, or eight
  64-bit elements per clock cycle.
  \cite{SIMD_HARM}.
-Such processors can also deal with (in hardware) fringe cases where the vector
-length is not a multiple of the number of elements. The element data width
-is variable (just like in SIMD) but it is the \textit{number} of elements being
-variable under control of a "setvl" instruction that makes Vector ISAs
-"Scalable"
+Vector ISAs are specifically designed to deal with (in hardware) fringe
+cases where an algorithm's element count is not a multiple of the
+underlying hardware "Lane" width. The element data width
+is variable (8 to 64-bit just like in SIMD)
+but it is the \textit{number} of elements being
+variable under control of a "setvl" instruction that specifically
+makes Vector ISAs "Scalable"
  \par
  
  \acs{RVV} supports a VL of up to $2^{16}$ or $65536$ bits,
@@ -136,12 +139,16 @@ Floating Point registers, similar to \acs{MMX}.
         \label{fig:cray_vector_regs}
  \end{figure}
  
-Simple-V's "Vector" Registers are specifically designed to fit on top of
+Simple-V's "Vector" Registers (a misnomer) are specifically designed to fit
+on top of
  the Scalar (GPR, FPR) register files, which are extended from the default
  of 32, to 128 entries in the high-end Compliancy Levels.  This is a primary
  reason why Simple-V can be added on top of an existing Scalar ISA, and
-\textit{in particular} why there is no need to add Vector Registers or
-Vector instructions.
+\textit{in particular} why there is no need to add explicit Vector
+Registers or
+Vector instructions.  The diagram below shows \textit{conceptually}
+how a Vector's elements are sequentially and linearly mapped onto the
+\textit{Scalar} register file:
  
  \begin{figure}[ht]
      \centering
@@ -150,6 +157,8 @@ Vector instructions.
         \label{fig:svp64_regs}
  \end{figure}
  
+\pagebreak
+
  \subsection{Simple Vectorisation}
  \acs{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
  VPU, 3D?).  Includes features normally found only on Cray-style Supercomputers
@@ -162,12 +171,14 @@ the SV Vectorisation Context for the 32-bit Scalar Suffix.
  \vspace{10pt}
  Main design principles
  \begin{itemize}
+    \itemsep 0em
         \item Introduce by implementing on top of existing Power ISA
         \item Effectively a \textbf{hardware for-loop}, pauses main PC,
               issues multiple scalar operations
-       \item Preserves underlying scalar execution dependencies as if
+       \item Strictly preserves (leverages) underlying scalar execution
+          dependencies as if
               the for-loop had been expanded into actual scalar instructions
-        ("preserving Program Order")
+          ("preserving Program Order")
         \item Augments existing instructions by adding "tags" - provides
            Vectorisation "context" rather than adding new opcodes.
         \item Does not modify or deviate from the underlying scalar
@@ -181,6 +192,7 @@ Main design principles
  
  Advantages include:
  \begin{itemize}
+    \itemsep 0em
         \item Easy to create first (and sometimes only) implementation
               as a literal for-loop in hardware, simulators, and compilers.
         \item Obliterates SIMD opcode proliferation
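
Reviewer note: the reworded text in this commit describes Simple-V as a hardware for-loop that maps a Vector's elements sequentially onto the Scalar register file, with a "setvl" instruction handling element counts that are not a multiple of the hardware width. The sketch below is one way to picture that behaviour. It is an illustrative model only, not the SVP64 pseudocode: the names `MAXVL`, `setvl`, `sv_add` and `vector_add`, the value of `MAXVL`, and the register numbers are all assumptions made for this example.

```python
# Illustrative model only. MAXVL, setvl, sv_add and vector_add are
# hypothetical names for this sketch, not taken from the SVP64 specification.

MAXVL = 4          # assumed maximum vector length for this sketch
NUM_REGS = 128     # scalar register file extended to 128 entries, as in the text


def setvl(remaining, maxvl=MAXVL):
    """Return how many elements to process this pass (never more than maxvl)."""
    return min(remaining, maxvl)


def sv_add(regs, rt, ra, rb, vl):
    """Hardware for-loop: issue one scalar add per element, in program order."""
    for i in range(vl):
        # Element i lives in the scalar register at (base + i).
        regs[rt + i] = regs[ra + i] + regs[rb + i]


def vector_add(a, b):
    """Add two equal-length lists, processing at most MAXVL elements per pass."""
    assert len(a) == len(b)
    out = []
    done = 0
    while done < len(a):
        vl = setvl(len(a) - done)      # the last pass may be shorter ("fringe" case)
        regs = [0] * NUM_REGS
        regs[8:8 + vl] = a[done:done + vl]     # operands in consecutive registers
        regs[16:16 + vl] = b[done:done + vl]
        sv_add(regs, rt=24, ra=8, rb=16, vl=vl)
        out.extend(regs[24:24 + vl])
        done += vl
    return out
```

For example, calling `vector_add` on two 7-element lists runs one pass with `vl = 4` and a final pass with `vl = 3`, with no separate clean-up code for the leftover elements.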