## answers to 4, loops/uses
-**REMAP**
+### REMAP
A REMAP would redirect operations from the first nonmasked
predicated element to the first **REMAPped** element, and combined
question: does this impact LD/ST which has special overrides
and mode-selection based on RA.isvec?
-**predication**
+### predication
with nonzeroing the application of a predicate mask to an all-scalar
operation effectively tests **ALL** relevant bits 0..VL-1 as nonzero in the
a need for
merging all bits into a single alternative predicate mask (single-bit)
is the sort of thing we can probably live with.
+
+### fast traditional packed SIMD
+
+A major motivation for changing SVP64 so that an operation with all isvec=0 temporarily overrides VL to 1 is to support traditional packed SIMD, where element sizes (and therefore vector lengths) vary constantly, without needing a `setvl` every few instructions.
+
+Examples of use cases:
+* WebAssembly's [128-bit packed SIMD extension](https://github.com/WebAssembly/spec/blob/8a352708cffeb71206ca49a0f743bdc57269fb1a/proposals/simd/SIMD.md) (which is becoming a de facto standard for WebAssembly on the Web and on servers)
+* Java/C#/JavaScript/etc. 128-bit packed SIMD
+* Cross-compiling x86 SSE2/AVX2, ARM NEON, or VSX/VMX code to SVP64
+
+Implementing 128-bit packed SIMD can be done without constantly needing `setvl` instructions by setting VL=4 once on entry to the code.
+
+Then, all 128-bit packed SIMD types can be emulated without additional `setvl` instructions:
+
+| 128-bit SIMD type            | SVP64 vector add                                              |
+|------------------------------|---------------------------------------------------------------|
+| `u8x16`/`i8x16`              | `sv.add/subvl=4/elwid=8 RT.vector, RA.vector, RB.vector`      |
+| `u16x8`/`i16x8`              | `sv.add/subvl=2/elwid=16 RT.vector, RA.vector, RB.vector`     |
+| `u32x4`/`i32x4`              | `sv.add/elwid=32 RT.vector, RA.vector, RB.vector`             |
+| `u64x2`/`i64x2`              | `sv.add/subvl=2 RT.scalar, RA.scalar, RB.scalar`              |
+| `bf16x8` (not in base SVP64) | `sv.fadd/subvl=2/elwid=8 FRT.vector, FRA.vector, FRB.vector`  |
+| `f16x8`                      | `sv.fadd/subvl=2/elwid=16 FRT.vector, FRA.vector, FRB.vector` |
+| `f32x4`                      | `sv.fadd/elwid=32 FRT.vector, FRA.vector, FRB.vector`         |
+| `f64x2`                      | `sv.fadd/subvl=2 FRT.scalar, FRA.scalar, FRB.scalar`          |
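
The arithmetic behind the table can be sanity-checked with a small model. This is only a sketch, not the SVP64 specification: it assumes the element count of one instruction is `effective VL * subvl`, where the effective VL is 1 when every operand is scalar (the proposed override) and the programmed VL=4 otherwise. The helper names (`lanes`, `register_bits`, `ROWS`) are hypothetical, and the `bf16x8` row assumes that `elwid=8` on `sv.fadd` selects a 16-bit bf16 element.

```python
VL = 4  # set once on entry to the code, as in the text above

def lanes(subvl=1, all_scalar=False, vl=VL):
    """Elements touched by one instruction in this simplified model."""
    # Proposed behaviour: all-scalar operands temporarily override VL to 1.
    effective_vl = 1 if all_scalar else vl
    return effective_vl * subvl

def register_bits(bits, subvl, all_scalar):
    """Total bits covered: element count times storage bits per element."""
    return lanes(subvl, all_scalar) * bits

# (SIMD type, storage bits per element, subvl, all operands scalar?)
ROWS = [
    ("u8x16/i8x16", 8, 4, False),
    ("u16x8/i16x8", 16, 2, False),
    ("u32x4/i32x4", 32, 1, False),
    ("u64x2/i64x2", 64, 2, True),
    ("bf16x8", 16, 2, False),  # assumption: FP elwid=8 means bf16 (16 bits)
    ("f16x8", 16, 2, False),
    ("f32x4", 32, 1, False),
    ("f64x2", 64, 2, True),
]

for name, bits, subvl, all_scalar in ROWS:
    print(name, lanes(subvl, all_scalar), register_bits(bits, subvl, all_scalar))
```

Under these assumptions every row covers exactly 128 bits with VL left at 4, which is the point of the table: only the 64-bit rows need the all-scalar VL=1 override to avoid an extra `setvl`.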