From 7115d15c2488b574fe6104d740759812bb47b411 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Mon, 20 Jun 2022 20:33:43 +0100
Subject: [PATCH] restore references, remove inline acronym substitution, move
 images to fit properly (LaTeX "decides" where to put them, sigh)

---
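Illustrative note for reviewers, not part of the patch: a minimal C sketch of the SIMD partitioning that summary.tex describes. It assumes a 64-bit register split into four 16-bit lanes. simd_add16x4 and lane16 are hypothetical names, and the add is done in software (SWAR) only to show what a packed 16-bit SIMD add opcode does in hardware: all lanes are added at once, and carries stop at lane boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read 16-bit lane i (0 = least significant) of v. */
static uint16_t lane16(uint64_t v, unsigned i)
{
    assert(i < 4);
    return (uint16_t)((v >> (16u * i)) & 0xFFFFu);
}

/* Add four 16-bit lanes packed into one 64-bit word. Each lane wraps
   modulo 2^16 and never carries into its neighbour, which models a
   packed 16-bit SIMD add. */
static uint64_t simd_add16x4(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8000800080008000ULL; /* top bit of every lane */
    /* Add the low 15 bits of each lane: the sum fits in 16 bits, so no
       carry can cross into the next lane. */
    uint64_t low = (a & ~H) + (b & ~H);
    /* Each lane's top bit is a_top XOR b_top XOR the carry from 'low'. */
    return low ^ ((a ^ b) & H);
}
```

For example, simd_add16x4(0xFFFF000300020001, 0x0001000100010001) gives 0x0000000400030002. The lanes 1, 2 and 3 each gain 1. The top lane wraps from 0xFFFF to 0 instead of carrying out, because the data width is fixed by the opcode.
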
 svp64-primer/summary.tex      | 42 +++++++++++++++++------------------
 svp64-primer/svp64-primer.tex |  2 +-
 2 files changed, 22 insertions(+), 22 deletions(-)

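A second illustrative sketch, also not part of the patch, is a C model of the "setvl" strip-mining loop in the vector-processor paragraphs. MAXVL = 4, setvl() and vec_axpy() are hypothetical stand-ins, not the Simple-V or RVV instruction semantics. Each pass of the inner for-loop represents what a single vector instruction would do with VL elements. The point is the fringe case: with n = 10, the model grants VL = 4, 4, then 2, so no separate scalar tail loop is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXVL 4 /* hypothetical hardware maximum vector length */

/* Model of a "setvl"-style instruction: grant VL = min(remaining, MAXVL). */
static size_t setvl(size_t remaining)
{
    return remaining < MAXVL ? remaining : MAXVL;
}

/* y[i] += a * x[i] for i in [0, n), processed VL elements per pass.
   The final pass simply receives a shorter VL. Returns the pass count. */
static size_t vec_axpy(int64_t a, const int64_t *x, int64_t *y, size_t n)
{
    size_t passes = 0;
    while (n > 0) {
        size_t vl = setvl(n);
        assert(vl >= 1 && vl <= MAXVL);
        /* This inner loop stands in for ONE vector instruction. */
        for (size_t i = 0; i < vl; i++)
            y[i] += a * x[i];
        x += vl;
        y += vl;
        n -= vl;
        passes++;
    }
    return passes;
}
```

The element width is fixed here (int64_t). Only the number of elements per instruction varies, which is what the summary means by a "Scalable" Vector ISA.
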
diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index 4d33ad6a6..806fccf85 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -1,9 +1,9 @@
 \section{Summary}
-The proposed \ac{SV} is a Scalable Vector Specification for a hardware for-loop \textbf{that
+The proposed \acs{SV} is a Scalable Vector Specification for a hardware for-loop \textbf{that
 ONLY uses scalar instructions}.
 
 \begin{itemize}
-\item The Power \ac{ISA} v3.1 Specification is not altered in any way.
+\item The Power \acs{ISA} v3.1 Specification is not altered in any way.
   v3.1 Code-compatibility is guaranteed.
 \item Does not require sacrificing 32-bit Major Opcodes.
 \item Does not require adding duplicates of instructions
@@ -14,18 +14,18 @@ ONLY uses scalar instructions}.
       disruptive full architectural redesigns.
 \item Divided into Compliancy Levels to suit differing needs.
 \item At the highest Compliancy Level only requires five instructions
-      (SVE2 requires appx 9,000. \ac{AVX-512} around 10,000. \ac{RVV} around
+      (SVE2 requires approximately 9,000, \acs{AVX-512} around 10,000, \acs{RVV} around
       300).
 \item Predication, an often-requested feature, is added cleanly
       (without modifying the v3.1 Power ISA)
 \item In-registers arbitrary-sized Matrix Multiply is achieved in three
       instructions (without adding any v3.1 Power ISA instructions)
-\item Full \ac{DCT} and \ac{FFT} RADIX2 Triple-loops are achieved with
+\item Full \acs{DCT} and \acs{FFT} RADIX2 Triple-loops are achieved with
       dramatically reduced instruction count, and power consumption expected
-      to greatly reduce. Normally found only in high-end \ac{VLIW} \ac{DSP}
+      to be greatly reduced. Normally found only in high-end \acs{VLIW} \acs{DSP}s
       (TI MSP, Qualcomm Hexagon)
 \item Fail-First Load/Store allows strncpy to be implemented in around 14
-      instructions (hand-optimised \ac{VSX} assembler is 240).
+      instructions (hand-optimised \acs{VSX} assembler is 240).
 \item Inner loop of MP3 implemented in under 100 instructions
       (gcc produces 450 for the same function on POWER9).
 \end{itemize}
@@ -49,9 +49,9 @@ the Power ISA's Supercomputing pedigree.
 
 \subsection{What is SIMD?}
 
-\ac{SIMD} is a way of partitioning existing \ac{CPU}
+\acs{SIMD} is a way of partitioning existing \acs{CPU}
 registers of 64-bit length into smaller 8-, 16-, 32-bit pieces.
-%\cite{SIMD_HARM}\cite{SIMD_HPC}
+\cite{SIMD_HARM,SIMD_HPC}
 These partitions can then be operated on simultaneously, and the initial values 
 and results being stored as entire 64-bit registers. The SIMD instruction opcode
  includes the data width and the operation to perform.
@@ -67,7 +67,7 @@ and results being stored as entire 64-bit registers. The SIMD instruction opcode
 This method can have a huge advantage for rapid processing of
 vector-type data (image/video, physics simulations, cryptography,
 etc.),
-%\cite{SIMD_WASM},
+\cite{SIMD_WASM},
  and thus on paper is very attractive compared to
 scalar-only instructions.
 \textit{As long as the data width fits the workload, everything is fine}.
@@ -102,13 +102,6 @@ An older alternative exists to utilise data parallelism - vector
 architectures. Vector CPUs collect operands from the main memory, and
 store them in large, sequential vector registers.\par
 
-\begin{figure}[hb]
-    \centering
-	\includegraphics[width=0.6\linewidth]{cray_vector_regs}
-	\caption{Cray Vector registers: 8 registers, 64 elements each}
-	\label{fig:cray_vector_regs}
-\end{figure}
-
 A simple vector processor might operate on one element at a time,
 however as the element operations are usually independent,
 a processor could be made to compute all of the vector's
@@ -116,7 +109,7 @@ elements simultaneously, taking advantage of multiple pipelines.\par
 
 Typically, today's vector processors can execute two, four, or eight
 64-bit elements per clock cycle.
-%\cite{SIMD_HARM}.
+\cite{SIMD_HARM}
 Such processors can also deal with (in hardware) fringe cases where the vector
 length is not a multiple of the number of elements. The element data width
 is variable (just like in SIMD) but it is the \textit{number} of elements being
@@ -124,12 +117,19 @@ variable under control of a "setvl" instruction that makes Vector ISAs
 "Scalable"
 \par
 
-\ac{RVV} supports a VL of up to $2^{16}$ or $65536$ bits,
+\acs{RVV} supports a VLEN (vector register length) of up to $2^{16}$ or $65536$ bits,
 which can fit 1024 64-bit words.
-%\cite{riscv-v-spec}.
+\cite{riscv-v-spec}
 The Cray-1 had 8 Vector Registers with up to 64 elements (64-bit each).
 An early Draft of RVV supported overlaying the Vector Registers onto the
-Floating Point registers, similar to \ac{MMX}.
+Floating Point registers, similar to \acs{MMX}.
+
+\begin{figure}[hb]
+    \centering
+    \includegraphics[width=0.6\linewidth]{cray_vector_regs}
+    \caption{Cray Vector registers: 8 registers, 64 elements each}
+    \label{fig:cray_vector_regs}
+\end{figure}
 
 Simple-V's "Vector" Registers are specifically designed to fit on top of
 the Scalar (GPR, FPR) register files, which are extended from the default
@@ -146,7 +146,7 @@ Vector instructions.
 \end{figure}
 
 \subsection{Simple Vectorisation}
-\ac{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
+\acs{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
 VPU, 3D?).  Includes features normally found only on Cray-style Supercomputers
 (Cray-1, NEC SX-Aurora) and GPUs.  Keeps to a strict uniform RISC paradigm,
 leveraging a scalar ISA by using "Prefixing".
diff --git a/svp64-primer/svp64-primer.tex b/svp64-primer/svp64-primer.tex
index 78c3eb641..ff782669b 100644
--- a/svp64-primer/svp64-primer.tex
+++ b/svp64-primer/svp64-primer.tex
@@ -25,7 +25,7 @@
 \input{summary}
 %\input{...}
 
-%\section{References}
+\section{References}
 %\textit{(All references and sources are available on request)}
 \bibliography{references}
 \bibliographystyle{ieeetr}
-- 
2.30.2