From: Luke Kenneth Casson Leighton
Date: Mon, 20 Jun 2022 20:34:16 +0000 (+0100)
Subject: clarify summary
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=55e6246befd5804e98390b5bf2b9f624eeb60cd8;p=libreriscv.git

clarify summary
---

diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index 3c56ea286..0b1b172cc 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -8,6 +8,8 @@ ONLY uses scalar instructions}.
 \item Does not require sacrificing 32-bit Major Opcodes.
 \item Does not require adding duplicates of instructions
   (popcnt, popcntw, popcntd, vpopcntb, vpopcnth, vpopcntw, vpopcntd)
+\item Fully abstracted: does not create Micro-architectural dependencies
+  (no fixed "Lane" size).
 \item Specifically designed to be easily implemented
   on top of an existing Micro-architecture (especially
   Superscalar Out-of-Order Multi-issue) without
@@ -24,7 +26,8 @@ ONLY uses scalar instructions}.
   dramatically reduced instruction count, and power consumption expected
   to greatly reduce. Normally found only in high-end \acs{VLIW} \acs{DSP}
   (TI MSP, Qualcomm Hexagon)
-\item Fail-First Load/Store allows strncpy to be implemented in around 14
+\item Fail-First Load/Store allows Vectorised high performance
+  strncpy to be implemented in around 14
   instructions (hand-optimised \acs{VSX} assembler is 240).
 \item Inner loop of MP3 implemented in under 100 instructions
   (gcc produces 450 for the same function on POWER9).