From 7115d15c2488b574fe6104d740759812bb47b411 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Mon, 20 Jun 2022 20:33:43 +0100
Subject: [PATCH] restore references, remove inline acronym substitution, move images to fit properly (latex "decides" where to put them sigh)

---
 svp64-primer/summary.tex      | 42 +++++++++++++++++------------------
 svp64-primer/svp64-primer.tex |  2 +-
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index 4d33ad6a6..806fccf85 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -1,9 +1,9 @@
 \section{Summary}
-The proposed \ac{SV} is a Scalable Vector Specification for a hardware for-loop \textbf{that
+The proposed \acs{SV} is a Scalable Vector Specification for a hardware for-loop \textbf{that
 ONLY uses scalar instructions}.
 
 \begin{itemize}
-\item The Power \ac{ISA} v3.1 Specification is not altered in any way.
+\item The Power \acs{ISA} v3.1 Specification is not altered in any way.
   v3.1 Code-compatibility is guaranteed.
 \item Does not require sacrificing 32-bit Major Opcodes.
 \item Does not require adding duplicates of instructions
@@ -14,18 +14,18 @@ ONLY uses scalar instructions}.
   disruptive full architectural redesigns.
 \item Divided into Compliancy Levels to suit differing needs.
 \item At the highest Compliancy Level only requires five instructions
-  (SVE2 requires appx 9,000. \ac{AVX-512} around 10,000. \ac{RVV} around
+  (SVE2 requires appx 9,000. \acs{AVX-512} around 10,000. \acs{RVV} around
   300).
 \item Predication, an often-requested feature, is added cleanly
   (without modifying the v3.1 Power ISA)
 \item In-registers arbitrary-sized Matrix Multiply is achieved in three
   instructions (without adding any v3.1 Power ISA instructions)
-\item Full \ac{DCT} and \ac{FFT} RADIX2 Triple-loops are achieved with
+\item Full \acs{DCT} and \acs{FFT} RADIX2 Triple-loops are achieved with
   dramatically reduced instruction count, and power consumption expected
-  to greatly reduce. Normally found only in high-end \ac{VLIW} \ac{DSP}
+  to greatly reduce. Normally found only in high-end \acs{VLIW} \acs{DSP}
   (TI MSP, Qualcomm Hexagon)
 \item Fail-First Load/Store allows strncpy to be implemented in around 14
-  instructions (hand-optimised \ac{VSX} assembler is 240).
+  instructions (hand-optimised \acs{VSX} assembler is 240).
 \item Inner loop of MP3 implemented in under 100 instructions
   (gcc produces 450 for the same function on POWER9).
 \end{itemize}
@@ -49,9 +49,9 @@ the Power ISA's Supercomputing pedigree.
 
 \subsection{What is SIMD?}
 
-\ac{SIMD} is a way of partitioning existing \ac{CPU}
+\acs{SIMD} is a way of partitioning existing \acs{CPU}
 registers of 64-bit length into smaller 8-, 16-, 32-bit pieces.
-%\cite{SIMD_HARM}\cite{SIMD_HPC}
+\cite{SIMD_HARM}\cite{SIMD_HPC}
 These partitions can then be operated on simultaneously, and the initial values
 and results being stored as entire 64-bit registers. The SIMD instruction opcode includes the data width and the operation to perform.
 
@@ -67,7 +67,7 @@ and results being stored as entire 64-bit registers. The SIMD instruction opcode
 
 This method can have a huge advantage for rapid processing of vector-type data
 (image/video, physics simulations, cryptography, etc.),
-%\cite{SIMD_WASM},
+\cite{SIMD_WASM},
 and thus on paper is very attractive compared to scalar-only instructions.
 \textit{As long as the data width fits the workload, everything is fine}.
 
@@ -102,13 +102,6 @@ An older alternative exists to utilise data parallelism - vector architectures.
 Vector CPUs collect operands from the main memory, and store them in large,
 sequential vector registers.\par
 
-\begin{figure}[hb]
-  \centering
-  \includegraphics[width=0.6\linewidth]{cray_vector_regs}
-  \caption{Cray Vector registers: 8 registers, 64 elements each}
-  \label{fig:cray_vector_regs}
-\end{figure}
-
 A simple vector processor might operate on one element at a time,
 however as the element operations are usually independent,
 a processor could be made to compute all of the vector's
@@ -116,7 +109,7 @@ elements simultaneously, taking advantage of multiple pipelines.\par
 
 Typically, today's vector processors can execute two, four,
 or eight 64-bit elements per clock cycle.
-%\cite{SIMD_HARM}.
+\cite{SIMD_HARM}.
 Such processors can also deal with (in hardware) fringe cases where the vector
 length is not a multiple of the number of elements. The element data width
 is variable (just like in SIMD) but it is the \textit{number} of elements being
@@ -124,12 +117,19 @@ variable under control of a "setvl" instruction that makes Vector ISAs "Scalable"
 \par
 
 
-\ac{RVV} supports a VL of up to $2^{16}$ or $65536$ bits,
+\acs{RVV} supports a VL of up to $2^{16}$ or $65536$ bits,
 which can fit 1024 64-bit words.
-%\cite{riscv-v-spec}.
+\cite{riscv-v-spec}.
 The Cray-1 had 8 Vector Registers with up to 64 elements (64-bit each).
 An early Draft of RVV supported overlaying the Vector Registers onto the
-Floating Point registers, similar to \ac{MMX}.
+Floating Point registers, similar to \acs{MMX}.
+
+\begin{figure}[hb]
+  \centering
+  \includegraphics[width=0.6\linewidth]{cray_vector_regs}
+  \caption{Cray Vector registers: 8 registers, 64 elements each}
+  \label{fig:cray_vector_regs}
+\end{figure}
 
 Simple-V's "Vector" Registers are specifically designed to fit on top of
 the Scalar (GPR, FPR) register files, which are extended from the default
@@ -146,7 +146,7 @@ Vector instructions.
 \end{figure}
 
 \subsection{Simple Vectorisation}
-\ac{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
+\acs{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
 VPU, 3D?). Includes features normally found only on Cray-style Supercomputers
 (Cray-1, NEC SX-Aurora) and GPUs. Keeps to a strict uniform RISC paradigm,
 leveraging a scalar ISA by using "Prefixing".
diff --git a/svp64-primer/svp64-primer.tex b/svp64-primer/svp64-primer.tex
index 78c3eb641..ff782669b 100644
--- a/svp64-primer/svp64-primer.tex
+++ b/svp64-primer/svp64-primer.tex
@@ -25,7 +25,7 @@
 \input{summary}
 %\input{...}
 
-%\section{References}
+\section{References}
 %\textit{(All references and sources are available on request)}
 \bibliography{references}
 \bibliographystyle{ieeetr}
-- 
2.30.2