From 039cf537e73ca2cb47402286d04f3d0ef0939472 Mon Sep 17 00:00:00 2001
From: Jacob Lifshay <programmerjake@gmail.com>
Date: Mon, 3 Oct 2022 17:05:10 -0700
Subject: [PATCH] add motivation for SVP64 with VL=1 override

---
 openpower/sv/svp64/discussion.mdwn | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/svp64/discussion.mdwn b/openpower/sv/svp64/discussion.mdwn
index 894d8b3ff..b53a53e7a 100644
--- a/openpower/sv/svp64/discussion.mdwn
+++ b/openpower/sv/svp64/discussion.mdwn
@@ -313,7 +313,7 @@ interfered with, except that, again, RT may be set as a vector destination.
 
 ## answers to 4, loops/uses
 
-**REMAP**
+### REMAP
 
 A REMAP would redirect operations from the first nonmasked
 predicated element to the first **REMAPped** element, and combined
@@ -329,7 +329,7 @@ answer: use at least one vector source.  this solves the predication issue.
 question: does this impact LD/ST which has special overrides
 and mode-selection based on RA.isvec?
 
-**predication**
+### predication
 
 with nonzeroing the application of a predicate mask to an all-scalar
 operation effectively tests **ALL** relevant bits 0..VL-1 as nonzero in the
@@ -338,3 +338,29 @@ decision-making, whereas VL=1 will only test the first.
 a need for
 merging all bits into a single alternative predicate mask (single-bit)
 is the sort of thing we can probably live with.
+
+### fast traditional packed SIMD
+
+A major motivation for making SVP64 instructions with all isvec=0 temporarily override VL to 1 is to support traditional packed SIMD, where element sizes (and therefore vector lengths) vary constantly, without needing `setvl` every few instructions.
+
+Examples of use cases:
+* WebAssembly's [128-bit packed SIMD extension](https://github.com/WebAssembly/spec/blob/8a352708cffeb71206ca49a0f743bdc57269fb1a/proposals/simd/SIMD.md) (which is becoming a de facto standard for WebAssembly on the web and on servers)
+* Java/C#/JavaScript/etc. 128-bit packed SIMD
+* Cross-compiling x86 SSE2/AVX2 or ARM NEON or VSX/VMX code to SVP64.
+
+128-bit packed SIMD can be implemented without constantly needing `setvl` instructions:
+
+Set VL=4 once on entry to the code.
+
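+Then all 128-bit packed SIMD types can be emulated without additional `setvl` instructions. The 64-bit rows mark every operand as scalar (all isvec=0), so VL is temporarily overridden to 1 and subvl=2 yields the two 64-bit elements: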
+
+| 128-bit SIMD type            | SVP64 vector add                                            |
+|------------------------------|-------------------------------------------------------------|
+| `u8x16`/`i8x16`              | sv.add/subvl=4/elwid=8 RT.vector, RA.vector, RB.vector      |
+| `u16x8`/`i16x8`              | sv.add/subvl=2/elwid=16 RT.vector, RA.vector, RB.vector     |
+| `u32x4`/`i32x4`              | sv.add/elwid=32 RT.vector, RA.vector, RB.vector             |
+| `u64x2`/`i64x2`              | sv.add/subvl=2 RT.scalar, RA.scalar, RB.scalar              |
+| `bf16x8` (not in base SVP64) | sv.fadd/subvl=2/elwid=8 FRT.vector, FRA.vector, FRB.vector  |
+| `f16x8`                      | sv.fadd/subvl=2/elwid=16 FRT.vector, FRA.vector, FRB.vector |
+| `f32x4`                      | sv.fadd/elwid=32 FRT.vector, FRA.vector, FRB.vector         |
+| `f64x2`                      | sv.fadd/subvl=2 FRT.scalar, FRA.scalar, FRB.scalar          |
-- 
2.30.2