From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Sat, 18 Jun 2022 12:35:04 +0000 (+0100)
Subject: add features summary at top of primer
X-Git-Tag: opf_rfc_ls005_v1~1718
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=57ac938a4074d54c86267a74fda14ecbb1a7b086;p=libreriscv.git

add features summary at top of primer
https://bugs.libre-soc.org/show_bug.cgi?id=858#c10
---

diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index c503247d1..d9c8c59aa 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -1,5 +1,36 @@
 \section{Summary}
-Specification for hardware for-loop that ONLY uses scalar instructions
+Simple-V is a Scalable Vector Specification for a hardware for-loop that
+ONLY uses scalar instructions. Advantages:
+
+\begin{itemize}
+\item The Power ISA v3.1 Specification is not altered in any way.
+\item Specifically designed to be easily implemented
+  on top of an existing Micro-architecture (especially
+  Superscalar Out-of-Order Multi-issue) without
+  disruptive full architectural redesigns.
+\item Divided into Compliancy Levels to suit differing needs.
+\item The highest Compliancy Level requires only four instructions
+  (SVE2 requires approx.\ 9,000, AVX-512 around 10,000, and RVV around
+  300).
+\item Predication, an often-requested feature, is added cleanly to the
+  Power ISA (without modifying the v3.1 Power ISA).
+\item In-registers arbitrary-sized Matrix Multiply is achieved in three
+  instructions (without adding any v3.1 Power ISA instructions).
+\item Full DCT and FFT RADIX2 Triple-loops are achieved with a dramatically
+  reduced instruction count, and power consumption is expected to fall
+  greatly. Such capability is normally found only in high-end VLIW DSPs
+  (TI MSP, Qualcomm Hexagon).
+\item Fail-First Load/Store allows strncpy to be implemented in around 14
+  instructions (Optimised VSX assembler is 240).
+\item The inner loop of MP3 is implemented in under 100 instructions
+  (gcc produces 450 for the same function).
+\end{itemize}
+
+All areas investigated so far have consistently shown reductions in executable
+size, which, as outlined in \cite{SIMD_HARM}, indirectly reduces power
+consumption, due both to lower I-Cache/TLB pressure and to Issue remaining
+idle.
+
 
 \subsection{What is SIMD?}
 \textit{(for clarity only 64-bit registers will be discussed here,
diff --git a/svp64-primer/svp64-primer.tex b/svp64-primer/svp64-primer.tex
index 1f001e392..7ae6f89e8 100644
--- a/svp64-primer/svp64-primer.tex
+++ b/svp64-primer/svp64-primer.tex
@@ -13,6 +13,7 @@
 
 
 \input{acronyms}
+\pagebreak
 \input{summary}
 %\input{...}