From a2f91d0a86b78c5da762d4a5cc162d4b0f309e51 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Wed, 29 Jun 2022 09:48:03 +0100
Subject: [PATCH] minor rewording

---
 svp64-primer/summary.tex | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index 4d55a3016..050e8c847 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -3,6 +3,7 @@ The proposed \acs{SV} is a Scalable Vector Specification for a hardware for-loop
 ONLY uses scalar instructions}.
 
 \begin{itemize}
+\itemsep 0em
 \item The Power \acs{ISA} v3.1 Specification is not altered in any way.
 v3.1 Code-compatibility is guaranteed.
 \item Does not require sacrificing 32-bit Major Opcodes.
@@ -38,7 +39,6 @@ All areas investigated so far consistently showed reductions in executable
 size, which as outlined in \cite{SIMD_HARM} has an indirect reduction in
 power consumption due to less I-Cache/TLB pressure and also Issue remaining
 idle for long periods.
-
 Simple-V has been specifically and carefully crafted to respect
 the Power ISA's Supercomputing pedigree.
 
@@ -92,6 +92,7 @@ Five digit Opcode proliferation (10,000 instructions) is overwhelming.
 The following are just some of the reasons why SIMD is unsustainable as
 the number of instructions increase:
 \begin{itemize}
+ \itemsep 0em
 \item Hardware design, ASIC routing etc.
 \item Compiler design
 \item Documentation of the ISA
@@ -115,11 +116,13 @@ elements simultaneously, taking advantage of multiple pipelines.\par
 Typically, today's vector processors can execute two, four, or eight 64-bit
 elements per clock cycle. \cite{SIMD_HARM}.
 
-Such processors can also deal with (in hardware) fringe cases where the vector
-length is not a multiple of the number of elements. The element data width
-is variable (just like in SIMD) but it is the \textit{number} of elements being
-variable under control of a "setvl" instruction that makes Vector ISAs
-"Scalable"
+Vector ISAs are specifically designed to deal with (in hardware) fringe
+cases where an algorithm's element count is not a multiple of the
+underlying hardware "Lane" width. The element data width
+is variable (8 to 64-bit just like in SIMD)
+but it is the \textit{number} of elements being
+variable under control of a "setvl" instruction that specifically
+makes Vector ISAs "Scalable"
 \par
 
 \acs{RVV} supports a VL of up to $2^{16}$ or $65536$ bits,
@@ -136,12 +139,16 @@ Floating Point registers, similar to \acs{MMX}.
 \label{fig:cray_vector_regs}
 \end{figure}
 
-Simple-V's "Vector" Registers are specifically designed to fit on top of
+Simple-V's "Vector" Registers (a misnomer) are specifically designed to fit
+on top of
 the Scalar (GPR, FPR) register files, which are extended from the default
 of 32, to 128 entries in the high-end Compliancy Levels. This is a primary
 reason why Simple-V can be added on top of an existing Scalar ISA, and
-\textit{in particular} why there is no need to add Vector Registers or
-Vector instructions.
+\textit{in particular} why there is no need to add explicit Vector
+Registers or
+Vector instructions. The diagram below shows \textit{conceptually}
+how a Vector's elements are sequentially and linearly mapped onto the
+\textit{Scalar} register file:
 
 \begin{figure}[ht]
 \centering
@@ -150,6 +157,8 @@ Vector instructions.
 \label{fig:svp64_regs}
 \end{figure}
 
+\pagebreak
+
 \subsection{Simple Vectorisation}
 \acs{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
 VPU, 3D?). Includes features normally found only on Cray-style Supercomputers
@@ -162,12 +171,14 @@ the SV Vectorisation Context for the 32-bit Scalar Suffix.
 \vspace{10pt}
 Main design principles
 \begin{itemize}
+ \itemsep 0em
 \item Introduce by implementing on top of existing Power ISA
 \item Effectively a \textbf{hardware for-loop}, pauses main PC,
 issues multiple scalar operations
- \item Preserves underlying scalar execution dependencies as if
+ \item Strictly preserves (leverages) underlying scalar execution
+ dependencies as if
 the for-loop had been expanded into actual scalar instructions
- ("preserving Program Order")
+ ("preserving Program Order")
 \item Augments existing instructions by adding "tags" - provides
 Vectorisation "context" rather than adding new opcodes.
 \item Does not modify or deviate from the underlying scalar
@@ -181,6 +192,7 @@ Main design principles
 
 Advantages include:
 \begin{itemize}
+ \itemsep 0em
 \item Easy to create first (and sometimes only) implementation
 as a literal for-loop in hardware, simulators, and compilers.
 \item Obliterates SIMD opcode proliferation
-- 
2.30.2