\pagebreak
\subsection{What is SIMD?}
-\textit{(for clarity only 64-bit registers will be discussed here,
- however 128-, 256-, and 512-bit implementations also exist)}
\ac{SIMD} is a way of partitioning existing \ac{CPU}
registers of 64-bit length into smaller 8-, 16-, 32-bit pieces
\cite{SIMD_HARM}\cite{SIMD_HPC}. These partitions can then be operated
on simultaneously, with the initial values and results stored as
entire 64-bit registers. The SIMD instruction opcode includes the data
-width and the operation to perform.\par
+width and the operation to perform.
+\par
-\begin{figure}[h]
- \includegraphics[width=\linewidth]{simd_axb}
+\begin{figure}[hb]
+ \centering
+ \includegraphics[width=0.6\linewidth]{simd_axb}
\caption{SIMD multiplication}
\label{fig:simd_axb}
\end{figure}
This method can have a huge advantage for rapid processing of
vector-type data (image/video, physics simulations, cryptography,
etc.)\cite{SIMD_WASM}, and thus on paper is very attractive compared to
-scalar-only instructions.\par
+scalar-only instructions.
+\textit{As long as the data width fits the workload, everything is fine}.
+\par
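
As an illustration only, the partitioned operation described above can be
emulated in scalar C. The sketch below assumes four 16-bit lanes packed into
one 64-bit word, with each product truncated to 16 bits; the helper name
\texttt{simd\_mul\_4x16} is hypothetical and does not correspond to any
particular ISA. Real SIMD hardware would perform all four lane multiplications
with a single instruction rather than a loop.

```c
#include <stdint.h>

/* Hypothetical helper (not from any ISA): multiply four 16-bit lanes
 * packed into 64-bit words. Each product is truncated to 16 bits.
 * The loop emulates, lane by lane, what one SIMD instruction would do. */
static uint64_t simd_mul_4x16(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Extract the 16-bit partition for this lane from each operand. */
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        /* Multiply in 32 bits, then keep only the low 16 bits. */
        uint16_t p = (uint16_t)((uint32_t)x * y);
        /* Write the product back into the same lane of the result. */
        result |= (uint64_t)p << (16 * lane);
    }
    return result;
}
```

For example, with lanes $(1, 2, 3, 4)$ in \texttt{a} and $(5, 6, 7, 8)$ in
\texttt{b}, the result holds $(5, 12, 21, 32)$, one product per lane.
\par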
SIMD registers are of a fixed length and thus to achieve greater
performance, CPU architects typically increase the width of registers
architectures. Vector CPUs collect operands from the main memory, and
store them in large, sequential vector registers.\par
+\begin{figure}[hb]
+ \centering
+ \includegraphics[width=0.6\linewidth]{cray_vector_regs}
+ \caption{Cray Vector registers: 8 registers, 64 elements each}
+ \label{fig:cray_vector_regs}
+\end{figure}
+
A simple vector processor might operate on one element at a time;
however, as the element operations are usually independent,
a processor could be made to compute all of the vector's
64-bit elements per clock cycle\cite{SIMD_HARM}. Such processors can also
deal with (in hardware) fringe cases where the vector length is not a
multiple of the number of elements. The element data width is variable
-(just like in SIMD). Fig \ref{fig:vl_reg_n} shows the relationship
-between number of elements, data width and register vector length.
-
-\begin{figure}[h]
- \includegraphics[width=\linewidth]{vl_reg_n}
- \caption{Vector length, data width, number of elements}
- \label{fig:vl_reg_n}
+(just like in SIMD), but it is the \textit{number} of elements being
+variable, under control of a ``setvl'' instruction, that makes Vector ISAs
+``Scalable''.
+\par
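
The role of a variable element count can be sketched as a strip-mined loop in
C. This is a minimal model under stated assumptions, not a description of any
real ISA: \texttt{MAXVL} is an assumed hardware maximum of 8 elements, and the
function \texttt{setvl} is a hypothetical stand-in for the instruction of the
same name. The inner loop models a single vector instruction operating on
\texttt{vl} elements.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXVL 8  /* assumed hardware maximum vector length, in elements */

/* Hypothetical stand-in for a "setvl" instruction: returns the number of
 * elements the next vector operation will process. */
static size_t setvl(size_t remaining)
{
    return remaining < MAXVL ? remaining : MAXVL;
}

/* Computes c[i] = a[i] + b[i] for n elements.
 * n does not need to be a multiple of MAXVL. */
static void vec_add(uint64_t *c, const uint64_t *a, const uint64_t *b,
                    size_t n)
{
    while (n > 0) {
        size_t vl = setvl(n);
        /* This inner loop models one vector add over vl elements. */
        for (size_t i = 0; i < vl; i++)
            c[i] = a[i] + b[i];
        /* Advance all three pointers past the elements just processed. */
        a += vl;
        b += vl;
        c += vl;
        n -= vl;
    }
}
```

With $n = 11$ the loop runs twice, first with \texttt{vl} $= 8$ and then with
\texttt{vl} $= 3$, so the leftover elements need no separate clean-up code.
\par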
+
+The RISC-V Vector extension (RVV) supports a VL of up to $2^{16}$ or $65536$ bits,
+which can fit 1024 64-bit words \cite{riscv-v-spec}. The Cray-1 had
+8 Vector Registers with up to 64 elements. An early Draft of RVV supported
+overlaying the Vector Registers onto the Floating Point registers, similar
+to x86 ``MMX''.
+
+Simple-V's "Vector" Registers are specifically designed to fit
+on top of the Scalar (GPR, FPR) register files, which are extended from
+32 to 128 entries. This is a primary reason why Simple-V can be added
+on top of an existing Scalar ISA, and \textit{in particular} why there
+is no need to add Vector Registers or Vector instructions.
+
+\begin{figure}[hb]
+ \centering
+ \includegraphics[width=0.6\linewidth]{svp64_regs}
+ \caption{Three instructions, same vector length, different element widths}
+ \label{fig:svp64_regs}
\end{figure}
-RISC-V Vector extension supports a VL of up to $2^{16}$ or $65536$ bits,
-which can fit 1024 64-bit words \cite{riscv-v-spec}.
-
\subsection{Comparison Between SIMD and Vector}
\textit{(Need to add more here, use example from \cite{SIMD_HARM}?)}