clarify summary

[libreriscv.git] / svp64-primer / summary.tex
diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index d71686a80fa4a90ebb853b20d7a674d29ad0e07f..0b1b172cc22f029eb2bc2dc0c97c5ba24f1aed66 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -8,6 +8,8 @@ ONLY uses scalar instructions}.
  \item Does not require sacrificing 32-bit Major Opcodes.
  \item Does not require adding duplicates of instructions
        (popcnt, popcntw, popcntd, vpopcntb, vpopcnth, vpopcntw, vpopcntd)
+\item Fully abstracted: does not create Micro-architectural dependencies
+      (no fixed "Lane" size).
  \item Specifically designed to be easily implemented
        on top of an existing Micro-architecture (especially
        Superscalar Out-of-Order Multi-issue) without
@@ -24,7 +26,8 @@ ONLY uses scalar instructions}.
        dramatically reduced instruction count, and power consumption expected
        to greatly reduce. Normally found only in high-end \acs{VLIW} \acs{DSP}
        (TI MSP, Qualcomm Hexagon)
-\item Fail-First Load/Store allows strncpy to be implemented in around 14
+\item Fail-First Load/Store allows Vectorised high performance
+      strncpy to be implemented in around 14
        instructions (hand-optimised \acs{VSX} assembler is 240).
  \item Inner loop of MP3 implemented in under 100 instructions
        (gcc produces 450 for the same function on POWER9).
@@ -53,8 +56,9 @@ the Power ISA's Supercomputing pedigree.
  registers of 64-bit length into smaller 8-, 16-, 32-bit pieces.
  \cite{SIMD_HARM}\cite{SIMD_HPC}
  These partitions can then be operated on simultaneously, and the initial values 
-and results being stored as entire 64-bit registers. The SIMD instruction opcode
- includes the data width and the operation to perform.
+and results being stored as entire 64-bit registers (\acs{SWAR}).
+The SIMD instruction opcode
+includes the data width and the operation to perform.
  \par
  
  \begin{figure}[hb]
@@ -97,7 +101,7 @@ the number of instructions increase:
         Multi-issue decoding
  \end{itemize}
  
-\subsection{Vector Architectures}
+\subsection{Scalable Vector Architectures}
  An older alternative exists to utilise data parallelism - vector
  architectures. Vector CPUs collect operands from the main memory, and
  store them in large, sequential vector registers.\par
@@ -133,7 +137,7 @@ Floating Point registers, similar to \acs{MMX}.
  
  Simple-V's "Vector" Registers are specifically designed to fit on top of
  the Scalar (GPR, FPR) register files, which are extended from the default
-of 32, to 128 entries in the Libre-SOC implementation.  This is a primary
+of 32, to 128 entries in the high-end Compliancy Levels.  This is a primary
  reason why Simple-V can be added on top of an existing Scalar ISA, and
  \textit{in particular} why there is no need to add Vector Registers or
  Vector instructions.