* **Structure Packing** - covered in SV by [[sv/remap]] and Pack/Unpack Mode.
*Despite being constructed from Scalar LD/ST none of these Modes exist
-or make sense in any Scalar ISA. They **only** exist in Vector ISAs*
+or make sense in any Scalar ISA. They **only** exist in Vector ISAs
+and are a critical part of its value*.
Also included in SVP64 LD/ST is both signed and unsigned Saturation,
as well as Element-width overrides and Twin-Predication.
-Note also that Indexed [[sv/remap]] mode may be applied to both v3.0
-LD/ST Immediate instructions *and* v3.0 LD/ST Indexed instructions.
+Note also that Indexed [[sv/remap]] mode may be applied to both Scalar
+LD/ST Immediate Defined Words *and* LD/ST Indexed Defined Words.
LD/ST-Indexed should not be conflated with Indexed REMAP mode:
clarification is provided below.
modes make sense:
* saturation
-* predicate-result would be useful but is lower priority than Data-Dependent Fail-First
* simple (no augmentation)
* fail-first (where Vector Indexed is banned)
* Signed Effective Address computation (Vector Indexed only)
| 0 | 0 | 0 | zz els | simple mode |
| 0 | 0 | 1 | PI LF | post-increment and Fault-First |
| 1 | 0 | N | zz els | sat mode: N=0/1 u/s |
-|VLi| 1 | inv | CR-bit | Rc=1: ffirst CR sel |
-|VLi| 1 | inv | els RC1 | Rc=0: ffirst z/nonz |
+|VLi| 1 | inv | CR-bit | ffirst CR sel |
The `els` bit is only relevant when `RA.isvec` is clear: this indicates
whether stride is unit or element:
| 0 | 1 | 2 | 3 4 | description |
|---|---| --- |---------|--------------------------- |
|els| 0 | SEA | dz sz | simple mode |
-|VLi| 1 | inv | CR-bit | Rc=1: ffirst CR sel |
-|VLi| 1 | inv | els RC1 | Rc=0: ffirst z/nonz |
+|VLi| 1 | inv | CR-bit | ffirst CR sel |
Vector Indexed Strided Mode is qualified as follows:
Thus it can be seen that the use of Indexed REMAP saves copying
and manual reordering of the Vector of RB offsets.
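
The saving described above can be illustrated with a deliberately simplified
Python model. It is only a sketch: `ld_indexed`, `remap_idx` and the memory
layout are hypothetical names invented for this illustration, and REMAP setup,
element-width overrides and predication are not modelled.

```python
def ld_indexed(mem, base, rb_offsets, vl, remap_idx=None):
    """Simplified model of a Vector Indexed load.

    Element i normally reads mem[base + RB[i]].  When remap_idx is given
    (standing in for an Indexed REMAP Schedule), element i instead reads
    mem[base + RB[remap_idx[i]]], leaving the RB vector untouched.
    """
    result = []
    for i in range(vl):
        j = remap_idx[i] if remap_idx is not None else i
        result.append(mem[base + rb_offsets[j]])
    return result


# hypothetical memory contents and offsets, for illustration only
mem = {100 + k: k * 10 for k in range(8)}
rb = [0, 1, 2, 3]          # Vector of RB offsets
order = [3, 1, 0, 2]       # desired access order

# without REMAP: software must first build a reordered copy of RB
rb_copy = [rb[j] for j in order]
manual = ld_indexed(mem, 100, rb_copy, 4)

# with the modelled Indexed REMAP: the original RB vector is used directly
remapped = ld_indexed(mem, 100, rb, 4, remap_idx=order)
```

Both calls load the same elements in the same order; the second avoids
creating `rb_copy`.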
-## LD/ST ffirst
+## LD/ST ffirst (Fault-First)
LD/ST ffirst treats the first LD/ST in a vector (element 0 if REMAP
is not active) as an ordinary one, with all behaviour with respect to
## Data-Dependent Fail-First (not Fail/Fault-First)
Not to be confused with Fail/Fault First, Data-Fail-First performs an
-additional check on the data into a Condition Register Field and if a test
-on the CR Field fails then VL is truncated and further looping terminates.
+additional check on the data, and if the test
+fails then VL is truncated and further looping terminates.
This is precisely the same as Arithmetic Data-Dependent Fail-First,
-the only difference being that the result comes from the LD/ST.
+the only difference being that the result comes from the LD/ST
+rather than from an Arithmetic operation.
+
+There is also a crucial difference between Arithmetic and LD/ST Data-Dependent
+Fail-First: except for Store-Conditional, a 4-bit Condition Register Field
+is computed for the test
+*but not stored* (thus there is no RC1 Mode as there is in Arithmetic).
+The CR Field is not stored because Load/Store, particularly
+the Update instructions, is already expensive in register terms,
+and adding an extra Vector write would be too costly in hardware.
+
+*Programmer's note: Programmers
+may use Data-Dependent Load with a test to truncate VL, and may then
+follow up with a `sv.cmpi` or other operation. The important aspect is
+that the Vector Load truncated on finding a NULL pointer, for example.*
+
+*Programmer's note: Load-with-Update may be used to update
+the register used in Effective Address computation of the
+next element. This may be used to perform single-linked-list
+walking, where Data-Dependent Fail-First terminates and
+truncates the Vector at the first NULL.*
In the case of Store operations there is a quirk when VLi (VL inclusive
is "Valid") is clear. Bear in mind the criteria is that the truncated
VLi is set the *current failed test* is permitted to be included. Thus,
the actual update (store) to Memory is **not permitted to take place**
should the test fail. Therefore, on testing the value to be stored,
-and after updating the corresponding CR Field Element, when VLi=0 and
-finding that the test fails the Memory store must **not** occur.
+when VLi=0 and finding that the test fails the Memory store must **not** occur.
Additionally, when VLi=0 and a test fails then RA does **not** receive a
copy of the Effective Address. Hardware implementations with Out-of-Order
If however VLi=0 it will *exclude* the NULL pointer by truncating VL to
one Element earlier.
+*Programmer's Note: by setting the RC1 qualifier as well as
+VLi=1 it is possible to establish a Predicate Mask such that the first
+zero in the predicate will be the NULL pointer.*
+
```
RT=1 # vec - deliberately overlaps by one with RA
RA=0 # vec - first one is valid, contains ptr
imm = 8 # offset_of(ptr->next)
for i in range(VL):
+ # this part is the Scalar Defined Word (standard scalar ld operation)
EA = GPR(RA+i) + imm # ptr + offset(next)
data = MEM(EA, 8) # 64-bit address of ptr->next
GPR(RT+i) = data # happens to be read on next loop!
- # was a normal ld up to this point. now the Data-Fail-First
- CR.field(i) = conditions(data)
- if CR.field(i).EQ == testbit: # check if zero
- if VLI then VL = i+1 # update VL, inclusive
- else VL = i # update VL
- break # stop looping
+ # was a normal vector-ld up to this point. now the Data-Fail-First
+ cr_test = conditions(data)
+ if Rc=1 or RC1: CR.field(i) = cr_test # only store if Rc=1/RC1
+ if cr_test.EQ == testbit: # check if zero
+ if VLI then VL = i+1 # update VL, inclusive
+ else VL = i # update VL, exclusive current
+ break # stop looping
```
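
The pseudocode above can be turned into a small runnable Python sketch of
linked-list walking. This is a simplified model under stated assumptions:
`sv_ld_ddffirst`, the list-based register file and the dictionary memory are
hypothetical, the test is fixed to "data is zero", and no CR Field is stored.

```python
def sv_ld_ddffirst(gpr, mem, RT, RA, imm, VL, VLi):
    """Model of a Data-Dependent Fail-First vector load; returns the new VL."""
    for i in range(VL):
        ea = gpr[RA + i] + imm      # ptr + offset_of(ptr->next)
        data = mem[ea]              # 64-bit address held in ptr->next
        gpr[RT + i] = data          # read as RA+i+1 on the next iteration
        if data == 0:               # assumed test: fail on a NULL pointer
            return i + 1 if VLi else i   # inclusive / exclusive truncation
    return VL                       # no test failed: VL is unchanged


def make_state():
    # hypothetical three-node list at 0x1000 -> 0x2000 -> 0x3000 -> NULL,
    # with the `next` field at byte offset 8 of each node
    mem = {0x1008: 0x2000, 0x2008: 0x3000, 0x3008: 0}
    gpr = [0] * 8
    gpr[0] = 0x1000                 # RA element 0: pointer to the first node
    return gpr, mem


gpr_incl, mem_incl = make_state()
vl_inclusive = sv_ld_ddffirst(gpr_incl, mem_incl, RT=1, RA=0, imm=8, VL=6, VLi=True)

gpr_excl, mem_excl = make_state()
vl_exclusive = sv_ld_ddffirst(gpr_excl, mem_excl, RT=1, RA=0, imm=8, VL=6, VLi=False)
```

In this model the three node pointers end up in `gpr[0:3]`. With VLi set the
truncated VL counts the element holding the NULL pointer, and with VLi clear
it stops one Element earlier.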
**Data-Dependent Fail-First on Store-Conditional (Rc=1)**
There are very few instructions that allow Rc=1 for Load/Store:
one of those is the `stdcx.` and other Atomic Store-Conditional
instructions. With Simple-V being a loop around Scalar instructions
-strictly obeying Scalar Program Order a Fail-First loop on an
-Atomic Store-Conditional will always fail the second and all other
-Store-Conditional instructions in Horizontal-First Mode because
+strictly obeying Scalar Program Order a Horizontal-First Fail-First loop
+on an Atomic Store-Conditional will always fail the second and all other
+Store-Conditional instructions because
Load-Reservation and Store-Conditional are required to be executed
in pairs.
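
The pairing requirement can be sketched with a toy single-threaded Python
model. It is illustrative only: the `Reservation` class and the loop functions
are hypothetical, a single reservation is assumed, and the model assumes a
Store-Conditional succeeds only when its address matches the reservation.

```python
class Reservation:
    """Toy model of one Load-Reservation / Store-Conditional reservation."""

    def __init__(self):
        self.addr = None

    def load_reserve(self, mem, addr):
        self.addr = addr            # establish the reservation
        return mem[addr]

    def store_conditional(self, mem, addr, value):
        ok = self.addr == addr      # assumed success criterion
        if ok:
            mem[addr] = value
        self.addr = None            # reservation is lost either way
        return ok


def horizontal_first(mem, addrs, values):
    # one Load-Reservation, then every element of the vector store in turn
    res = Reservation()
    res.load_reserve(mem, addrs[0])
    return [res.store_conditional(mem, a, v) for a, v in zip(addrs, values)]


def vertical_first(mem, addrs, values):
    # each element issues its own Load-Reservation / Store-Conditional pair
    res = Reservation()
    results = []
    for a, v in zip(addrs, values):
        res.load_reserve(mem, a)
        results.append(res.store_conditional(mem, a, v))
    return results


addrs = [0x100, 0x108, 0x110]
h_results = horizontal_first({a: 0 for a in addrs}, addrs, [1, 2, 3])
v_results = vertical_first({a: 0 for a in addrs}, addrs, [1, 2, 3])
```

In the horizontal loop only the first store finds a reservation, so the second
and all later elements fail; issuing the operations as pairs lets every
element succeed.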
By contrast, in Vertical-First Mode it is in fact possible to issue
the pairs, and consequently allowing Vectorised Data-Dependent Fail-First is
-useful. Care should be taken however when VL is truncated in Vertical-First
-Mode.
+useful.
+
+Programmer's note: Care should be taken when VL is truncated in
+Vertical-First Mode.
+
+**Future potential**
+
+Although Rc=1 on LD/ST is a rare occurrence at present, future versions
+of Power ISA *might* conceivably have Rc=1 LD/ST Scalar instructions, and
+with the SVP64 Vectorisation Prefixing being a RISC-paradigm that
+is fully independent of the Scalar Suffix Defined Words, prohibiting
+the possibility of Rc=1 Data-Dependent Mode on future potential LD/ST
+operations is not strategically sound.
## LOAD/STORE Elwidths <a name="elwidth"></a>
Structure Packing, at the vec2/vec3/vec4 granularity level. Beyond that,
REMAP will need to be used.
+**Parallel Reduction REMAP**
+
+No REMAP Schedule is prohibited in SVP64 because the RISC-paradigm Prefix
+is completely separate from the RISC-paradigm Scalar Defined Words. Although
+obscure, there does exist the outside possibility that
+Parallel Reduction Schedules on LD/ST would find a use in Computer Science.
+Readers are invited to contact the authors of this document if one is ever
+found.
+
--------
[[!tag standards]]