ONLY uses scalar instructions}.
\begin{itemize}
+\itemsep 0em
\item The Power \acs{ISA} v3.1 Specification is not altered in any way.
v3.1 Code-compatibility is guaranteed.
\item Does not require sacrificing 32-bit Major Opcodes.
size, which as outlined in \cite{SIMD_HARM} indirectly reduces
power consumption through lower I-Cache/TLB pressure and through Issue
remaining idle for long periods.
-
Simple-V has been specifically and carefully crafted to respect
the Power ISA's Supercomputing pedigree.
The following are just some of the reasons why SIMD is unsustainable as
the number of instructions increases:
\begin{itemize}
+ \itemsep 0em
\item Hardware design, ASIC routing etc.
\item Compiler design
\item Documentation of the ISA
Typically, today's vector processors can execute two, four, or eight
64-bit elements per clock cycle.
\cite{SIMD_HARM}.
-Such processors can also deal with (in hardware) fringe cases where the vector
-length is not a multiple of the number of elements. The element data width
-is variable (just like in SIMD) but it is the \textit{number} of elements being
-variable under control of a "setvl" instruction that makes Vector ISAs
-"Scalable"
+Vector ISAs are specifically designed to deal with (in hardware) fringe
+cases where an algorithm's element count is not a multiple of the
+underlying hardware "Lane" width. The element data width
+is variable (8 to 64-bit, just like in SIMD),
+but it is the \textit{number} of elements being
+variable, under control of a "setvl" instruction, that specifically
+makes Vector ISAs "Scalable".
\par
\acs{RVV} supports a VL of up to $2^{16}$ or $65536$ bits,
\label{fig:cray_vector_regs}
\end{figure}
-Simple-V's "Vector" Registers are specifically designed to fit on top of
+Simple-V's "Vector" Registers (a misnomer) are specifically designed to fit
+on top of
the Scalar (GPR, FPR) register files, which are extended from the default
of 32, to 128 entries in the high-end Compliancy Levels. This is a primary
reason why Simple-V can be added on top of an existing Scalar ISA, and
-\textit{in particular} why there is no need to add Vector Registers or
-Vector instructions.
+\textit{in particular} why there is no need to add explicit Vector
+Registers or
+Vector instructions. The diagram below shows \textit{conceptually}
+how a Vector's elements are sequentially and linearly mapped onto the
+\textit{Scalar} register file:
\begin{figure}[ht]
\centering
\label{fig:svp64_regs}
\end{figure}
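+
+As a rough illustration of the mapping in Figure \ref{fig:svp64_regs},
+the following pseudocode sketches how a single scalar \texttt{add} might
+be repeated across a Vector. It is a conceptual sketch only: it assumes
+the default 64-bit element width and ignores predication and element-width
+overrides, and the names \texttt{GPR}, \texttt{RT}, \texttt{RA},
+\texttt{RB} and \texttt{VL} are illustrative rather than a definition
+of the hardware.
+\begin{verbatim}
+# conceptual only: a scalar "add RT, RA, RB" with Vector Length VL
+for i in range(VL):
+    # element i occupies the i-th scalar register after the base
+    GPR[RT+i] = GPR[RA+i] + GPR[RB+i]
+\end{verbatim}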
+\pagebreak
+
\subsection{Simple Vectorisation}
\acs{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU,
VPU, 3D?). It includes features normally found only on Cray-style Supercomputers
\vspace{10pt}
Main design principles
\begin{itemize}
+ \itemsep 0em
\item Introduce by implementing on top of existing Power ISA
\item Effectively a \textbf{hardware for-loop}, pauses main PC,
issues multiple scalar operations
- \item Preserves underlying scalar execution dependencies as if
+ \item Strictly preserves (leverages) underlying scalar execution
+ dependencies as if
the for-loop had been expanded into actual scalar instructions
- ("preserving Program Order")
+ ("preserving Program Order")
\item Augments existing instructions by adding "tags" - provides
Vectorisation "context" rather than adding new opcodes.
\item Does not modify or deviate from the underlying scalar
Advantages include:
\begin{itemize}
+ \itemsep 0em
\item Easy to create first (and sometimes only) implementation
as a literal for-loop in hardware, simulators, and compilers.
\item Obliterates SIMD opcode proliferation