From: Luke Kenneth Casson Leighton
Date: Sat, 18 Jun 2022 14:19:26 +0000 (+0100)
Subject: summary clarification
X-Git-Tag: opf_rfc_ls005_v1~1705
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=9b2519b08f3b3cd3aa7a330c5a5969de1533ca08;p=libreriscv.git

summary clarification
---

diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index 48dacea03..a2fc94f0c 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -4,16 +4,18 @@ ONLY uses scalar instructions.
 
 \begin{itemize}
 \item The Power ISA v3.1 Specification is not altered in any way.
+  v3.1 Code-compatibility is guaranteed.
+\item Does not require sacrificing 32-bit Major Opcodes.
 \item Specifically designed to be easily implemented
   on top of an existing Micro-architecture (especially
   Superscalar Out-of-Order Multi-issue) without
   disruptive full architectural redesigns.
 \item Divided into Compliancy Levels to suit differing needs.
-\item At the highest Compliancy Level only requires four instructions
+\item At the highest Compliancy Level only requires five instructions
   (SVE2 requires appx 9,000. AVX-512 around 10,000. RVV around
   300).
-\item Predication, an often-requested feature, is added cleanly to the
-  Power ISA (without modifying the v3.1 Power ISA)
+\item Predication, an often-requested feature, is added cleanly
+  (without modifying the v3.1 Power ISA)
 \item In-registers arbitrary-sized Matrix Multiply is achieved in three
   instructions (without adding any v3.1 Power ISA instructions)
 \item Full DCT and FFT RADIX2 Triple-loops are achieved with dramatically
@@ -21,7 +23,7 @@ ONLY uses scalar instructions.
   reduce. Normally found only in high-end VLIW DSPs (TI MSP, Qualcomm
   Hexagon)
 \item Fail-First Load/Store allows strncpy to be implemented in around 14
-  instructions (Optimised VSX assembler is 240).
+  instructions (hand-optimised VSX assembler is 240).
 \item Inner loop of MP3 implemented in under 100 instructions
   (gcc produces 450 for the same function)
 \end{itemize}
@@ -179,6 +181,6 @@ SIMD implementations by:
 -for loop, increment registers RT, RA, RB
 -few instructions, easier to implement and maintain
 -example assembly code
--ARM has already started to add to libC SVE2 support
+-ARM has already started to add to libC SVE2 support
 
 1970 x86 comparison