full review needed, answering question:
- if sv.op RT.scalar RA.scalar RB.scalar is changed to temporarily override VL to be 1, is anything lost?
+ if sv.op RT.scalar RA.scalar RB.scalar is changed to temporarily
+ override VL to be 1, is anything lost?
four aspects:
* sat mode or saturation:
* pred-result mode
-simple mode is fine including on predication but has a CHANGE OF BEHAVIOUR. first bit of src/dest is used when zeroing is on, but first ENABLED bit of predicate is used when VL>1.
+simple mode is fine including on predication but has a CHANGE OF
+BEHAVIOUR. first bit of src/dest is used when zeroing is on, but first
+ENABLED bit of predicate is used when VL>1.
reduce mode is unaffected (meaningless) on a scalar operation
-fail-first with or without VLI, should be unaffected, but what should VL be truncated to? the override VL=1? (or VL=0 when VLI is set?)
+fail-first, with or without VLI, should be unaffected, but what should
+VL be truncated to? the override VL=1? (or VL=0 when VLI is set?)
saturation mode is fine
-predicate-result should be fine as well (aside from change in predicate behaviour)
+predicate-result should be fine as well (aside from change in predicate
+behaviour)
**LD/ST Immediate**
with predication (single bit or otherwise) selection of a
single element is achieved.
-(but hang on, does it actually? answer: no. scalar sources are not REMAPped regardless of VL)
+(but hang on, does it actually? answer: no. scalar sources are not
+REMAPped regardless of VL)
biggest concern: how to achieve the same effect?
operation effectively tests **ALL** relevant bits 0..VL-1 as nonzero in the
decision-making, whereas VL=1 will only test the first.
-a need for
-merging all bits into a single alternative predicate mask (single-bit)
-is the sort of thing we can probably live with.
+a need for merging all bits into a single alternative predicate mask
+(single-bit) is the sort of thing we can probably live with.
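
The concern above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical (they are not part of any SVP64 specification or simulator), and it assumes that "tests all relevant bits 0..VL-1 as nonzero" means the operation goes ahead if any of those predicate bits is set.

```python
def op_enabled_full_vl(pred_mask: int, vl: int) -> bool:
    """Behaviour described above for VL>1: bits 0..VL-1 all take part.

    Assumption: the operation is enabled if any of those bits is set.
    """
    return (pred_mask & ((1 << vl) - 1)) != 0


def op_enabled_vl1_override(pred_mask: int) -> bool:
    """Proposed behaviour: VL is overridden to 1, so only bit 0 is tested."""
    return (pred_mask & 1) != 0


def merge_to_single_bit(pred_mask: int, vl: int) -> int:
    """Hypothetical workaround: OR-reduce bits 0..VL-1 into a one-bit mask."""
    return 1 if op_enabled_full_vl(pred_mask, vl) else 0


# Only bit 2 of the predicate is set, with VL=4.
mask, vl = 0b0100, 4
full = op_enabled_full_vl(mask, vl)           # bit 2 is seen
overridden = op_enabled_vl1_override(mask)    # bit 2 is missed
merged = op_enabled_vl1_override(merge_to_single_bit(mask, vl))
```

Under these assumptions the two behaviours disagree whenever bit 0 is clear but a later bit is set, and an explicit merge step into a single-bit mask would restore the original decision at the cost of one extra operation.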
### fast traditional packed SIMD
-A major motivation for changing SVP64 with all isvec=0 to temporarily override VL to 1 is to allow supporting traditional SIMD that has constantly varying element sizes (and therefore vector lengths too) without needing setvl every few instructions.
+A major motivation for changing SVP64 with all isvec=0 to temporarily
+override VL to 1 is to support traditional packed SIMD, whose element
+sizes (and therefore vector lengths) vary constantly, without needing
+`setvl` every few instructions.
Examples of use cases:
* WebAssembly's [128-bit packed SIMD extension](https://github.com/WebAssembly/spec/blob/8a352708cffeb71206ca49a0f743bdc57269fb1a/proposals/simd/SIMD.md) (which is becoming a de-facto standard for WebAssembly on the Web and on Servers)
* Java/C#/JavaScript/etc. 128-bit packed SIMD
* Cross-compiling x86 SSE2/AVX2 or ARM NEON or VSX/VMX code to SVP64.
-Implementing 128-bit packed SIMD can be done without constantly needing `setvl` instructions by:
+Implementing 128-bit packed SIMD can be done without constantly needing
+`setvl` instructions by:
Setting VL=4 on entry to the code.
-Then, all 128-bit packed SIMD types can be emulated without additional `setvl` instructions:
+Then, all 128-bit packed SIMD types can be emulated without additional
+`setvl` instructions:
| 128-bit SIMD type | SVP64 vector add |
|------------------------------|-------------------------------------------------------------|