add features summary at top of primer

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Sat, 18 Jun 2022 12:35:04 +0000 (13:35 +0100)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Sat, 18 Jun 2022 12:35:08 +0000 (13:35 +0100)
diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index c503247d10075ab50621382486c1aef4ae296877..d9c8c59aadbe6145ad9d7800bff8015b15d8549d 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -1,5 +1,36 @@
  \section{Summary}
-Specification for hardware for-loop that ONLY uses scalar instructions
+Simple-V is a Scalable Vector Specification for a hardware for-loop that
+ONLY uses scalar instructions. Advantages:
+
+\begin{itemize}
+\item The v3.1 Power ISA Specification is not altered in any way.
+\item Specifically designed to be easily implemented
+  on top of an existing Micro-architecture (especially
+  Superscalar Out-of-Order Multi-issue) without
+  disruptive full architectural redesigns.
+\item Divided into Compliancy Levels to suit differing needs.
+\item At the highest Compliancy Level, only four instructions are required
+  (SVE2 requires approximately 9,000, AVX-512 around 10,000, and RVV
+  around 300).
+\item Predication, an often-requested feature, is added cleanly to the
+  Power ISA (without modifying the v3.1 Power ISA)
+\item In-registers arbitrary-sized Matrix Multiply is achieved in three
+  instructions (without adding any v3.1 Power ISA instructions)
+\item Full DCT and FFT RADIX2 Triple-loops are achieved with a dramatically
+  reduced instruction count, and power consumption is expected to drop
+  greatly. Such capability is normally found only in high-end VLIW DSPs
+  (TI MSP, Qualcomm Hexagon).
+\item Fail-First Load/Store allows strncpy to be implemented in around 14
+  instructions (optimised VSX assembler requires 240).
+\item The inner loop of MP3 is implemented in under 100 instructions
+  (gcc produces 450 for the same function).
+\end{itemize}
+
+All areas investigated so far have consistently shown reductions in executable
+size which, as outlined in \cite{SIMD_HARM}, indirectly reduces power
+consumption, due both to lower I-Cache/TLB pressure and to Issue remaining
+idle.
+
  
  \subsection{What is SIMD?}
  \textit{(for clarity only 64-bit registers will be discussed here,
diff --git a/svp64-primer/svp64-primer.tex b/svp64-primer/svp64-primer.tex
index 1f001e392602bab6768ec55d7a3e9364cceb3cf2..7ae6f89e830c1b3d23c3ec520c4fef0fa2269ef4 100644
--- a/svp64-primer/svp64-primer.tex
+++ b/svp64-primer/svp64-primer.tex
@@ -13,6 +13,7 @@
  
  
  \input{acronyms}
+\pagebreak
  \input{summary}
  %\input{...}