From 039cf537e73ca2cb47402286d04f3d0ef0939472 Mon Sep 17 00:00:00 2001
From: Jacob Lifshay
Date: Mon, 3 Oct 2022 17:05:10 -0700
Subject: [PATCH] add motivation for SVP64 with VL=1 override

---
 openpower/sv/svp64/discussion.mdwn | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/svp64/discussion.mdwn b/openpower/sv/svp64/discussion.mdwn
index 894d8b3ff..b53a53e7a 100644
--- a/openpower/sv/svp64/discussion.mdwn
+++ b/openpower/sv/svp64/discussion.mdwn
@@ -313,7 +313,7 @@ interfered with, except that, again, RT may be set as a vector destination.
 
 ## answers to 4, loops/uses
 
-**REMAP**
+### REMAP
 
 A REMAP would redirect operations from the first nonmasked predicated
 element to the first **REMAPped** element, and combined
@@ -329,7 +329,7 @@ answer: use at least one vector source. this solves the predication issue.
 question: does this impact LD/ST which has special overrides and
 mode-selection based on RA.isvec?
 
-**predication**
+### predication
 
 with nonzeroing the application of a predicate mask to an all-scalar
 operation effectively tests **ALL** relevant bits 0..VL-1 as nonzero in the
@@ -338,3 +338,29 @@ decision-making, whereas VL=1 will only test the first.
 a need for merging all bits into a single alternative predicate
 mask (single-bit) is the sort of thing we can probably live with.
 
+
+### fast traditional packed SIMD
+
+A major motivation for changing SVP64 so that an instruction whose operands are all scalar (all isvec=0) temporarily overrides VL to 1 is to support traditional packed SIMD, where the element size (and therefore the vector length) changes from one instruction to the next, without needing a `setvl` every few instructions.
+
+Example use cases:
+* WebAssembly's [128-bit packed SIMD extension](https://github.com/WebAssembly/spec/blob/8a352708cffeb71206ca49a0f743bdc57269fb1a/proposals/simd/SIMD.md) (which is becoming a de-facto standard for WebAssembly on the Web and on servers)
+* 128-bit packed SIMD in Java, C#, JavaScript, etc.
+* Cross-compiling x86 SSE2/AVX2, ARM NEON, or VSX/VMX code to SVP64.
+
+128-bit packed SIMD can be implemented without constantly issuing `setvl` instructions as follows.
+
+First, set VL=4 on entry to the code.
+
+Then, every 128-bit packed SIMD type can be emulated without any additional `setvl` instructions:
+
+| 128-bit SIMD type            | SVP64 vector add                                            |
+|------------------------------|-------------------------------------------------------------|
+| `u8x16`/`i8x16`              | sv.add/subvl=4/elwid=8 RT.vector, RA.vector, RB.vector      |
+| `u16x8`/`i16x8`              | sv.add/subvl=2/elwid=16 RT.vector, RA.vector, RB.vector     |
+| `u32x4`/`i32x4`              | sv.add/elwid=32 RT.vector, RA.vector, RB.vector             |
+| `u64x2`/`i64x2`              | sv.add/subvl=2 RT.scalar, RA.scalar, RB.scalar              |
+| `bf16x8` (not in base SVP64) | sv.fadd/subvl=2/elwid=8 FRT.vector, FRA.vector, FRB.vector  |
+| `f16x8`                      | sv.fadd/subvl=2/elwid=16 FRT.vector, FRA.vector, FRB.vector |
+| `f32x4`                      | sv.fadd/elwid=32 FRT.vector, FRA.vector, FRB.vector         |
+| `f64x2`                      | sv.fadd/subvl=2 FRT.scalar, FRA.scalar, FRB.scalar          |
-- 
2.30.2
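As a sanity check on the table of SVP64 encodings above, the following is a minimal Python sketch of the lane arithmetic. It rests on a few assumptions: the element count is the effective VL times `subvl`, VL is overridden to 1 only when every operand is scalar (the proposed rule), a missing `elwid` means the default 64-bit width, and `elwid=8` on an FP op selects bf16, as the `bf16x8` row implies. `Encoding`, `effective_vl`, `element_bits` and `check` are hypothetical names chosen for illustration and are not part of any SVP64 tooling.

```python
from dataclasses import dataclass
from typing import List, Optional

BASE_VL = 4        # VL set once on entry to the packed-SIMD code
DEFAULT_BITS = 64  # element width when no elwid override is given


@dataclass
class Encoding:
    simd_type: str
    lanes: int                  # lane count implied by the SIMD type name
    lane_bits: int              # lane width implied by the SIMD type name
    subvl: int = 1
    elwid: Optional[int] = None
    all_scalar: bool = False    # True when every operand is .scalar (isvec=0)
    is_fp: bool = False


def effective_vl(enc: Encoding) -> int:
    # Proposed rule (assumption): all-scalar operands temporarily override VL to 1.
    return 1 if enc.all_scalar else BASE_VL


def element_bits(enc: Encoding) -> int:
    if enc.elwid is None:
        return DEFAULT_BITS
    # Assumption implied by the bf16x8 row: elwid=8 on an FP op selects bf16.
    if enc.is_fp and enc.elwid == 8:
        return 16
    return enc.elwid


def check(enc: Encoding) -> int:
    """Return the total bits covered, raising if the lanes do not match the type."""
    count = effective_vl(enc) * enc.subvl
    bits = element_bits(enc)
    if count != enc.lanes or bits != enc.lane_bits:
        raise ValueError(f"{enc.simd_type}: got {count} x {bits}-bit lanes")
    return count * bits


TABLE: List[Encoding] = [
    Encoding("u8x16", 16, 8, subvl=4, elwid=8),
    Encoding("u16x8", 8, 16, subvl=2, elwid=16),
    Encoding("u32x4", 4, 32, elwid=32),
    Encoding("u64x2", 2, 64, subvl=2, all_scalar=True),
    Encoding("bf16x8", 8, 16, subvl=2, elwid=8, is_fp=True),
    Encoding("f16x8", 8, 16, subvl=2, elwid=16, is_fp=True),
    Encoding("f32x4", 4, 32, elwid=32, is_fp=True),
    Encoding("f64x2", 2, 64, subvl=2, all_scalar=True, is_fp=True),
]

if __name__ == "__main__":
    for enc in TABLE:
        print(f"{enc.simd_type:7} -> {check(enc)} bits")
```

Every row should come out at 128 bits. The `u64x2` and `f64x2` rows only do so because of the proposed VL=1 override: with VL left at 4, `subvl=2` would give 8 lanes of 64 bits instead of 2.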