\begin{figure}[h]
\includegraphics[width=\linewidth]{simd_axb}
\caption{SIMD multiplication}
- \label{simd_axb}
+ \label{fig:simd_axb}
\end{figure}
This approach can offer a large speed-up when processing vector-type data (image/video, physics simulations, cryptography, etc.)\cite{SIMD_WASM}, and is therefore attractive, at least on paper, compared to scalar-only instructions.\par
A simple vector processor might operate on one element at a time. However, because the element operations are independent by definition \textbf{(where is this from?)}, a processor could instead compute all of the vector's elements simultaneously.\par
-Typically, today's vector processors can execute two, four, or eight 64-bit elements per clock cycle\cite{SIMD_HARM}. Such processors can also deal with (in hardware) fringe cases where the vector length is not a multiple of the number of elements. The element data width is variable (just like in SIMD). Fig \ref{vl_reg_n} shows the relationship between number of elements, data width and register vector length.
+Typically, today's vector processors can process two, four, or eight 64-bit elements per clock cycle\cite{SIMD_HARM}. Such processors can also handle, in hardware, the fringe cases where the vector length is not a multiple of the number of elements processed per cycle. As in SIMD, the element data width is variable. Fig.~\ref{fig:vl_reg_n} shows the relationship between the number of elements, the data width and the vector register length.
\begin{figure}[h]
\includegraphics[width=\linewidth]{vl_reg_n}
\caption{Vector length, data width, number of elements}
- \label{vl_reg_n}
+ \label{fig:vl_reg_n}
\end{figure}
The RISC-V Vector extension allows a vector register length (VLEN) of up to $2^{16} = 65536$ bits, so a single register can hold up to $2^{16}/64 = 1024$ 64-bit elements \cite{riscv-v-spec}.
\end{itemize}
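The following Python sketch illustrates the ideas above: independent element operations, a fixed number of elements per register ($\mathrm{VLEN}/\mathrm{SEW}$), and a shorter final pass for the fringe elements. It is not RVV code. The 256-bit register width and the names \texttt{VLEN\_BITS}, \texttt{SEW\_BITS}, \texttt{LANES} and \texttt{vec\_mul\_stripmined} are hypothetical, chosen only to keep the example small.

```python
# Hypothetical parameters (not from the text): a 256-bit vector register
# holding 64-bit elements, giving VLEN / SEW = 4 lanes.
VLEN_BITS = 256
SEW_BITS = 64
LANES = VLEN_BITS // SEW_BITS


def vec_mul_stripmined(a, b, lanes=LANES):
    """Element-wise c[i] = a[i] * b[i], computed `lanes` elements at a time.

    Each pass of the outer loop models one vector multiply. The elements in a
    pass are independent, so hardware could compute them simultaneously. The
    last pass is shorter when len(a) is not a multiple of `lanes` (the fringe
    case). Returns the result list and the number of vector operations issued.
    """
    assert len(a) == len(b)
    c = [0] * len(a)
    ops = 0
    for base in range(0, len(a), lanes):
        vl = min(lanes, len(a) - base)  # active elements in this pass
        for i in range(base, base + vl):
            c[i] = a[i] * b[i]
        ops += 1
    return c, ops


c, ops = vec_mul_stripmined(list(range(10)), [2] * 10)
print(LANES, ops, c)       # 4 3 [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(65536 // SEW_BITS)   # 1024 64-bit elements at the maximum RVV VLEN
```

Ten elements on four lanes need three passes, the last of which has only two active elements; the same arithmetic at the maximum VLEN gives 1024 elements per register.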
\subsection{Simple Vectorisation}
-\ac{SV} is a an extension to a scalar ISA, designed to be as simple as possible, with no dedicated vector instructions. Effectively a hardware for-loop.
+\ac{SV} is a Scalable Vector ISA designed for hybrid workloads (CPU, GPU, VPU, 3D?).
+It includes features normally found only in vector supercomputers (Cray-1, NEC SX-Aurora) and GPUs.
+It keeps a strictly simple RISC design by building on the existing scalar ISA through ``prefixing''.
+SV adds no dedicated vector opcodes.
+
+Main design principles:
+\begin{itemize}
+ \item Introduced by implementing it on top of the existing Power ISA
+ \item Effectively a \textbf{hardware for-loop}: it pauses the main PC and issues multiple scalar operations
+ \item Preserves the underlying scalar execution dependencies, as if the for-loop had been expanded into actual scalar instructions (``preserving Program Order'')
+ \item Augments existing instructions by adding ``tags'' that provide Vectorisation ``context'', rather than adding new opcodes
+ \item Does not modify or deviate from the underlying scalar Power ISA unless doing so gives a significant performance boost or other advantage in the vector space (see Section~\ref{subsubsec:add_to_pow_isa})
+ \item Aimed at supercomputing: avoids creating significant \textit{sequential dependency hazards}, allowing \textbf{high performance superscalar microarchitectures} to be deployed
+\end{itemize}
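A minimal Python model of the ``hardware for-loop'' and ``preserving Program Order'' principles, assuming that every element operation behaves exactly like the corresponding scalar instruction issued in sequence. The helpers \texttt{scalar\_addi}, \texttt{sv\_addi} and \texttt{snapshot\_addi} are hypothetical and do not reflect the real SVP64 encoding or register-numbering rules. Overlapping source and destination registers are used deliberately, because that is where sequential semantics and read-all-then-write (SIMD-style) semantics give different results.

```python
# Minimal model of SV as a "hardware for-loop" (hypothetical helper names;
# not the real SVP64 encoding). Registers are a list of integers, and a
# vectorised instruction with vector length vl is issued as vl scalar
# element operations on consecutive registers, strictly in program order.

def scalar_addi(regs, rt, ra, imm):
    """Ordinary scalar add-immediate: RT <- RA + imm."""
    regs[rt] = regs[ra] + imm


def sv_addi(regs, rt, ra, imm, vl):
    """Vectorised addi: the main PC is held while element i is issued as
    scalar_addi(rt + i, ra + i). Each element sees the results of earlier
    elements, exactly as if the loop had been unrolled by hand."""
    for i in range(vl):
        scalar_addi(regs, rt + i, ra + i, imm)


def snapshot_addi(regs, rt, ra, imm, vl):
    """Contrast: read all sources first, then write (SIMD-style semantics)."""
    sources = [regs[ra + i] for i in range(vl)]
    for i in range(vl):
        regs[rt + i] = sources[i] + imm


regs = [0] * 32
sv_addi(regs, rt=4, ra=3, imm=1, vl=4)   # destinations overlap sources

unrolled = [0] * 32
for rt, ra in [(4, 3), (5, 4), (6, 5), (7, 6)]:
    scalar_addi(unrolled, rt, ra, 1)     # the same loop, expanded by hand

simd = [0] * 32
snapshot_addi(simd, rt=4, ra=3, imm=1, vl=4)

print(regs[3:8])   # [0, 1, 2, 3, 4]  identical to the hand-unrolled sequence
print(simd[3:8])   # [0, 1, 1, 1, 1]  differs from program order
```

Because the for-loop result always equals the hand-unrolled scalar sequence, existing scalar dependency-tracking hardware can in principle be reused unchanged.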
+
+Advantages include:
+\begin{itemize}
+ \item The first (and sometimes only) implementation is easy to create as a literal for-loop in hardware, simulators, and compilers.
+ \item Hardware architects may understand and implement SV as an extra pipeline stage inserted between decode and issue: essentially a simple for-loop issuing element-level sub-instructions.
+ \item More complex HDL implementations can replicate existing scalar ALUs and pipelines as blocks, leveraging existing multi-issue infrastructure.
+ \item SV is mostly high-level ``context'' that does not significantly deviate from the scalar Power ISA and, in its purest form, is simply a ``for-loop around scalar instructions''. SV is therefore minimally disruptive and has a reasonable chance of broad community adoption and acceptance.
+ \item Eliminates SIMD opcode proliferation ($O(N^6)$\textbf{[source?]}) as well as the need for dedicated Vectorisation ISAs: no separate vector instructions are required.
+\end{itemize}
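The ``extra pipeline stage between decode and issue'' view can be sketched as a generator that expands one tagged instruction into element-level scalar sub-instructions. The classes \texttt{SVContext} and \texttt{Decoded} and the function \texttt{sv\_expand} are hypothetical; a real implementation would also have to handle scalar/vector operand selection, predication and the actual prefix format.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class SVContext:
    """Hypothetical SV "tag": vectorisation context attached to a scalar op."""
    vl: int  # number of elements to issue


@dataclass(frozen=True)
class Decoded:
    """A decoded scalar instruction, e.g. add RT, RA, RB."""
    op: str
    rt: int
    ra: int
    rb: int
    sv: Optional[SVContext] = None  # None means a plain scalar instruction


def sv_expand(insn: Decoded) -> Iterator[Decoded]:
    """Extra stage between decode and issue: a simple for-loop that emits
    element-level scalar sub-instructions. Untagged instructions pass through
    unchanged, so the scalar path is unaffected."""
    if insn.sv is None:
        yield insn
        return
    for i in range(insn.sv.vl):
        yield Decoded(insn.op, insn.rt + i, insn.ra + i, insn.rb + i)


tagged = Decoded("add", rt=8, ra=16, rb=24, sv=SVContext(vl=3))
for sub in sv_expand(tagged):
    print(sub.op, sub.rt, sub.ra, sub.rb)
# add 8 16 24
# add 9 17 25
# add 10 18 26
```

The issue stage downstream only ever sees ordinary scalar \texttt{add} operations, which is why existing ALUs and multi-issue logic can be reused.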
+
+\subsubsection{Deviations from the Power ISA}
+\label{subsubsec:add_to_pow_isa}
+\textit{(TODO: EXPAND)}
+For example, dropping XER.SO.
\subsubsection{Prefix 64 - SVP64}