\section{Summary}
-Specification for hardware for-loop that ONLY uses scalar instructions
+Simple-V is a Scalable Vector Specification for a hardware for-loop that
+ONLY uses scalar instructions. Advantages:
+
+\begin{itemize}
+\item The v3.1 Specification is not altered in any way.
+\item Specifically designed to be easily implemented
+ on top of an existing Micro-architecture (especially
+ Superscalar Out-of-Order Multi-issue) without
+ disruptive full architectural redesigns.
+\item Divided into Compliancy Levels to suit differing needs.
+\item Even at the highest Compliancy Level, only four instructions are
+ required (SVE2 requires approximately 9,000, AVX-512 around 10,000,
+ and RVV around 300).
+\item Predication, an often-requested feature, is added cleanly to the
+ Power ISA (without modifying the v3.1 Power ISA).
+\item In-register, arbitrary-sized Matrix Multiply is achieved in three
+ instructions (without adding any v3.1 Power ISA instructions).
+\item Full DCT and FFT RADIX2 Triple-loops are achieved with a dramatically
+ reduced instruction count, and power consumption is expected to fall
+ significantly. Such capability is normally found only in high-end
+ VLIW DSPs (TI MSP, Qualcomm Hexagon).
+\item Fail-First Load/Store allows strncpy to be implemented in around 14
+ instructions (optimised VSX assembler takes 240).
+\item The inner loop of MP3 is implemented in under 100 instructions
+ (gcc produces 450 for the same function).
+\end{itemize}
+
+All areas investigated so far have consistently shown reductions in
+executable size which, as outlined in \cite{SIMD_HARM}, indirectly reduce
+power consumption, due both to lower I-Cache/TLB pressure and to Issue
+remaining idle.
+
\subsection{What is SIMD?}
\textit{(for clarity only 64-bit registers will be discussed here,