restore references, remove inline acronym substitution,

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Mon, 20 Jun 2022 19:33:43 +0000 (20:33 +0100)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Mon, 20 Jun 2022 19:33:43 +0000 (20:33 +0100)
diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index 4d33ad6a6723324e8e9336d411d7016eea07c206..806fccf854b94db29fad2957ba7b624d860042a9 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -1,9 +1,9 @@
  \section{Summary}
-The proposed \ac{SV} is a Scalable Vector Specification for a hardware for-loop \textbf{that
+The proposed \acs{SV} is a Scalable Vector Specification for a hardware for-loop \textbf{that
  ONLY uses scalar instructions}.
  
  \begin{itemize}
-\item The Power \ac{ISA} v3.1 Specification is not altered in any way.
+\item The Power \acs{ISA} v3.1 Specification is not altered in any way.
    v3.1 Code-compatibility is guaranteed.
  \item Does not require sacrificing 32-bit Major Opcodes.
  \item Does not require adding duplicates of instructions
@@ -14,18 +14,18 @@ ONLY uses scalar instructions}.
        disruptive full architectural redesigns.
  \item Divided into Compliancy Levels to suit differing needs.
  \item At the highest Compliancy Level only requires five instructions
-      (SVE2 requires appx 9,000. \ac{AVX-512} around 10,000. \ac{RVV} around
+      (SVE2 requires appx 9,000. \acs{AVX-512} around 10,000. \acs{RVV} around
        300).
  \item Predication, an often-requested feature, is added cleanly
        (without modifying the v3.1 Power ISA)
  \item In-registers arbitrary-sized Matrix Multiply is achieved in three
        instructions (without adding any v3.1 Power ISA instructions)
-\item Full \ac{DCT} and \ac{FFT} RADIX2 Triple-loops are achieved with
+\item Full \acs{DCT} and \acs{FFT} RADIX2 Triple-loops are achieved with
        dramatically reduced instruction count, and power consumption expected
-      to greatly reduce. Normally found only in high-end \ac{VLIW} \ac{DSP}
+      to greatly reduce. Normally found only in high-end \acs{VLIW} \acs{DSP}
        (TI MSP, Qualcomm Hexagon)
  \item Fail-First Load/Store allows strncpy to be implemented in around 14
-      instructions (hand-optimised \ac{VSX} assembler is 240).
+      instructions (hand-optimised \acs{VSX} assembler is 240).
  \item Inner loop of MP3 implemented in under 100 instructions
        (gcc produces 450 for the same function on POWER9).
  \end{itemize}
@@ -49,9 +49,9 @@ the Power ISA's Supercomputing pedigree.
  
  \subsection{What is SIMD?}
  
-\ac{SIMD} is a way of partitioning existing \ac{CPU}
+\acs{SIMD} is a way of partitioning existing \acs{CPU}
  registers of 64-bit length into smaller 8-, 16-, 32-bit pieces.
-%\cite{SIMD_HARM}\cite{SIMD_HPC}
+\cite{SIMD_HARM}\cite{SIMD_HPC}
  These partitions can then be operated on simultaneously, and the initial values 
  and results being stored as entire 64-bit registers. The SIMD instruction opcode
   includes the data width and the operation to perform.
@@ -67,7 +67,7 @@ and results being stored as entire 64-bit registers. The SIMD instruction opcode
  This method can have a huge advantage for rapid processing of
  vector-type data (image/video, physics simulations, cryptography,
  etc.),
-%\cite{SIMD_WASM},
+\cite{SIMD_WASM},
   and thus on paper is very attractive compared to
  scalar-only instructions.
  \textit{As long as the data width fits the workload, everything is fine}.
@@ -102,13 +102,6 @@ An older alternative exists to utilise data parallelism - vector
  architectures. Vector CPUs collect operands from the main memory, and
  store them in large, sequential vector registers.\par
  
-\begin{figure}[hb]
-    \centering
-       \includegraphics[width=0.6\linewidth]{cray_vector_regs}
-       \caption{Cray Vector registers: 8 registers, 64 elements each}
-       \label{fig:cray_vector_regs}
-\end{figure}
-
  A simple vector processor might operate on one element at a time,
  however as the element operations are usually independent,
  a processor could be made to compute all of the vector's
@@ -116,7 +109,7 @@ elements simultaneously, taking advantage of multiple pipelines.\par
  
  Typically, today's vector processors can execute two, four, or eight
  64-bit elements per clock cycle.
-%\cite{SIMD_HARM}.
+\cite{SIMD_HARM}.
  Such processors can also deal with (in hardware) fringe cases where the vector
  length is not a multiple of the number of elements. The element data width
  is variable (just like in SIMD) but it is the \textit{number} of elements being
@@ -124,12 +117,19 @@ variable under control of a "setvl" instruction that makes Vector ISAs
  "Scalable"
  \par
  
-\ac{RVV} supports a VL of up to $2^{16}$ or $65536$ bits,
+\acs{RVV} supports a VL of up to $2^{16}$ or $65536$ bits,
  which can fit 1024 64-bit words.
-%\cite{riscv-v-spec}.
+\cite{riscv-v-spec}.
  The Cray-1 had 8 Vector Registers with up to 64 elements (64-bit each).
  An early Draft of RVV supported overlaying the Vector Registers onto the
-Floating Point registers, similar to \ac{MMX}.
+Floating Point registers, similar to \acs{MMX}.
+
+\begin{figure}[hb]
+    \centering
+       \includegraphics[width=0.6\linewidth]{cray_vector_regs}
+       \caption{Cray Vector registers: 8 registers, 64 elements each}
+       \label{fig:cray_vector_regs}
+\end{figure}
  
  Simple-V's "Vector" Registers are specifically designed to fit on top of
  the Scalar (GPR, FPR) register files, which are extended from the default
@@ -146,7 +146,7 @@ Vector instructions.
  \end{figure}
  
  \subsection{Simple Vectorisation}
-\ac{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
+\acs{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
  VPU, 3D?).  Includes features normally found only on Cray-style Supercomputers
  (Cray-1, NEC SX-Aurora) and GPUs.  Keeps to a strict uniform RISC paradigm,
  leveraging a scalar ISA by using "Prefixing".
diff --git a/svp64-primer/svp64-primer.tex b/svp64-primer/svp64-primer.tex
index 78c3eb641c89a41ebdbb002a3ee45dfde0ac0c4d..ff782669b073a3b5405d936ee9048c711e1deb61 100644
--- a/svp64-primer/svp64-primer.tex
+++ b/svp64-primer/svp64-primer.tex
@@ -25,7 +25,7 @@
  \input{summary}
  %\input{...}
  
-%\section{References}
+\section{References}
  %\textit{(All references and sources are available on request)}
  \bibliography{references}
  \bibliographystyle{ieeetr}
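
Note on the \ac to \acs change above: the following is a minimal standalone sketch of the difference, assuming the primer uses the LaTeX `acronym` package (the diff does not show the preamble, so this is an assumption). The expansions given here are illustrative and are not taken from the primer's own acronym list. With that package, \ac prints the long form on first use, while \acs prints only the short form, which is presumably the "inline acronym substitution" this commit removes.

```latex
% Minimal sketch, assuming the `acronym` package. The expansions below are
% illustrative only and are not copied from the primer's acronym list.
\documentclass{article}
\usepackage{acronym}
\begin{document}

\begin{acronym}
  \acro{SIMD}{Single Instruction Multiple Data}
  \acro{SV}{Simple-V}  % assumed expansion, for illustration only
\end{acronym}

% \ac prints the long form with the short form in parentheses on first use,
% then only the short form on later uses.
First use: \ac{SIMD}.  Later use: \ac{SIMD}.

% \acs prints only the short form, with no inline expansion.
Short form only: \acs{SV}.

\end{document}
```

Switching every \ac to \acs therefore keeps running text such as the Summary bullet list compact, at the cost of readers having to consult the acronym list for the expansions.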