From: Luke Kenneth Casson Leighton
Date: Sat, 18 Jun 2022 12:35:04 +0000 (+0100)
Subject: add features summary at top of primer
X-Git-Tag: opf_rfc_ls005_v1~1718
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=57ac938a4074d54c86267a74fda14ecbb1a7b086;p=libreriscv.git

add features summary at top of primer

https://bugs.libre-soc.org/show_bug.cgi?id=858#c10
---

diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index c503247d1..d9c8c59aa 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -1,5 +1,36 @@
 \section{Summary}
-Specification for hardware for-loop that ONLY uses scalar instructions
+Simple-V is a Scalable Vector Specification for a hardware for-loop that
+ONLY uses scalar instructions. Advantages:
+
+\begin{itemize}
+\item The v3.1 Specification is not altered in any way.
+\item Specifically designed to be easily implemented
+      on top of an existing Micro-architecture (especially
+      Superscalar Out-of-Order Multi-issue) without
+      disruptive full architectural redesigns.
+\item Divided into Compliancy Levels to suit differing needs.
+\item At the highest Compliancy Level only requires four instructions
+      (SVE2 requires appx 9,000. AVX-512 around 10,000. RVV around
+      300).
+\item Predication, an often-requested feature, is added cleanly to the
+      Power ISA (without modifying the v3.1 Power ISA)
+\item In-registers arbitrary-sized Matrix Multiply is achieved in three
+      instructions (without adding any v3.1 Power ISA instructions)
+\item Full DCT and FFT RADIX2 Triple-loops are achieved with dramatically
+      reduced instruction count, and power consumption expected to greatly
+      reduce. Normally found only in high-end VLIW DSPs (TI MSP, Qualcomm
+      Hexagon)
+\item Fail-First Load/Store allows strncpy to be implemented in around 14
+      instructions (Optimised VSX assembler is 240).
+\item Inner loop of MP3 implemented in under 100 instructions
+      (gcc produces 450 for the same function)
+\end{itemize}
+
+All areas investigated so far consistently showed reductions in executable
+size, which as outlined in \cite{SIMD_HARM} has an indirect reduction in
+power consumption due both to less I-Cache/TLB pressure and Issue remaining
+idle.
+
 \subsection{What is SIMD?}
 
 \textit{(for clarity only 64-bit registers will be discussed here,
diff --git a/svp64-primer/svp64-primer.tex b/svp64-primer/svp64-primer.tex
index 1f001e392..7ae6f89e8 100644
--- a/svp64-primer/svp64-primer.tex
+++ b/svp64-primer/svp64-primer.tex
@@ -13,6 +13,7 @@
 
 \input{acronyms}
 
+\pagebreak
 \input{summary}
 
 %\input{...}