From 55e6246befd5804e98390b5bf2b9f624eeb60cd8 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Mon, 20 Jun 2022 21:34:16 +0100
Subject: [PATCH] clarify summary

---
 svp64-primer/summary.tex | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/svp64-primer/summary.tex b/svp64-primer/summary.tex
index 3c56ea286..0b1b172cc 100644
--- a/svp64-primer/summary.tex
+++ b/svp64-primer/summary.tex
@@ -8,6 +8,8 @@ ONLY uses scalar instructions}.
 \item Does not require sacrificing 32-bit Major Opcodes.
 \item Does not require adding duplicates of instructions
   (popcnt, popcntw, popcntd, vpopcntb, vpopcnth, vpopcntw, vpopcntd)
+\item Fully abstracted: does not create Micro-architectural dependencies
+  (no fixed "Lane" size).
 \item Specifically designed to be easily implemented
   on top of an existing Micro-architecture (especially
   Superscalar Out-of-Order Multi-issue) without
@@ -24,7 +26,8 @@ ONLY uses scalar instructions}.
   dramatically reduced instruction count, and power consumption expected
   to greatly reduce. Normally found only in high-end \acs{VLIW} \acs{DSP}
   (TI MSP, Qualcomm Hexagon)
-\item Fail-First Load/Store allows strncpy to be implemented in around 14
+\item Fail-First Load/Store allows Vectorised high performance
+  strncpy to be implemented in around 14
   instructions (hand-optimised \acs{VSX} assembler is 240).
 \item Inner loop of MP3 implemented in under 100 instructions
   (gcc produces 450 for the same function on POWER9).
-- 
2.30.2