# SV Load and Store
+<!-- hide -->
Links:
* <https://bugs.libre-soc.org/show_bug.cgi?id=561>
* <https://llvm.org/devmtg/2016-11/Slides/Emerson-ScalableVectorizationinLLVMIR.pdf>
* <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-loads-and-stores>
* [[ldst/discussion]]
+<!-- show -->
## Rationale
RB, and before adding to RA in order to calculate the Effective Address,
if SEA is set RB is sign-extended from elwidth bits to the full 64 bits.
For other Modes (ffirst, saturate), all EA computation with elwidth
-overrides is unsigned.
+overrides is unsigned. RA is *not* altered (not truncated)
+by element-width overrides.
Note that cache-inhibited LD/ST when VSPLAT is activated will perform
**multiple** LD/ST operations, sequentially. Even with scalar src
walking, where Data-Dependent Fail-First terminates and
truncates the Vector at the first NULL.*
+**Load/Store Data-Dependent Fail-First, VLi=0**
+
In the case of Store operations there is a quirk when VLi (VL inclusive
is "Valid") is clear. Bear in mind the criterion is that the truncated
Vector of results, when VLi is clear, must all pass the "test", but when
VLi is set the *current failed test* is permitted to be included. Thus,
the actual update (store) to Memory is **not permitted to take place**
-should the test fail. Therefore, on testing the value to be stored,
-when VLi=0 and finding that the test fails the Memory store must **not** occur.
+should the test fail.
-Additionally, when VLi=0 and a test fails then RA does **not** receive a
+Additionally, in any Load/Store with Update instruction,
+when VLi=0 and a test fails then RA does **not** receive a
copy of the Effective Address. Hardware implementations with Out-of-Order
Micro-Architectures should use speculative Shadow-Hold and Cancellation
-when the test fails.
+(or other Transactional Rollback mechanism) when the test fails.
+
+**Load/Store Data-Dependent Fail-First, VLi=1**
-By contrast if VLi=1 and the test fails, Store may proceed *and then*
-looping terminates. In this way, when non-Inclusive, the Vector of
-Truncated results contains only Stores that passed the test (and RA=EA
-updates if any), and when Inclusive the Vector of Truncated results
-contains the first-failed data.
+By contrast, if VLi=1 and the test fails, the Store may proceed *and then*
+looping terminates. In this way, when Inclusive, the Vector of Truncated results
+contains the first-failed data (including RA on Updates).
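
The difference between the two VLi settings for Store-with-Update can be
illustrated with a small Python model. This is only a sketch under stated
assumptions, not the specification's pseudocode: `ddff_store_update`, the
dict-based `mem`, the list-based `gpr`, the per-element `RA+i` numbering and
the `test_fails` callback are all hypothetical names and simplifications
introduced for illustration.

```python
def ddff_store_update(mem, gpr, rs, ra, imm, vl, vli, test_fails):
    """Illustrative model of Data-Dependent Fail-First Store-with-Update.

    mem        : dict of address -> value (hypothetical memory model)
    gpr        : list of register values, modified in place
    test_fails : callable(data) -> bool, stands in for the CR test
    Returns the (possibly truncated) Vector Length.
    """
    for i in range(vl):
        ea = gpr[ra + i] + imm      # Effective Address for element i
        data = gpr[rs + i]          # value to be stored
        failed = test_fails(data)
        if failed and not vli:
            # VLi=0: failing element is excluded, so no Memory store
            # and no copy of EA into RA for this element.
            return i
        mem[ea] = data              # store takes place
        gpr[ra + i] = ea            # Update form: RA receives EA
        if failed:
            # VLi=1: failing element is included, then looping stops.
            return i + 1
    return vl
```

With VLi=0 the returned length covers only elements that passed the test;
with VLi=1 it additionally covers the first failing element, whose store and
RA update have both been performed.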
Below is an example of loading the starting addresses of Linked-List
nodes. If VLi=1 it will load the NULL pointer into the Vector of results.
# this part is the Scalar Defined Word (standard scalar ld operation)
EA = GPR(RA+i) + imm # ptr + offset(next)
data = MEM(EA, 8) # 64-bit address of ptr->next
- GPR(RT+i) = data # happens to be read on next loop!
# was a normal vector-ld up to this point. now the Data-Fail-First
cr_test = conditions(data)
if Rc=1 or RC1: CR.field(i) = cr_test # only store if Rc=1/RC1
+ action_load = True
+ stop = False # only set once the test fails
if cr_test.EQ == testbit: # check if zero
- if VLI then VL = i+1 # update VL, inclusive
- else VL = i # update VL, exclusive current
- break # stop looping
+ if VLI then
+ VL = i+1 # update VL, inclusive
+ else
+ VL = i # update VL, exclusive current
+ action_load = False # current load excluded
+ stop = True # stop looping
+ if action_load:
+ GPR(RT+i) = data # happens to be read on next loop!
+ if stop: break
```
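
The loop above can also be expressed as a runnable Python model, which makes
the VLi=0 / VLi=1 difference easy to check. This is a sketch only: the
function name `ddff_linked_list_load`, the dict-based `mem`, the list-based
`gpr` and the "data is zero" stand-in for `cr_test.EQ` are assumptions made
for illustration. The walk relies on `rt = ra + 1`, so that the pointer
loaded for element `i` is the address read for element `i+1`.

```python
def ddff_linked_list_load(mem, gpr, ra, rt, imm, vl, vli, testbit=1):
    """Illustrative model of Data-Dependent Fail-First load.

    mem : dict of address -> 64-bit value (hypothetical memory model)
    gpr : list of register values, modified in place
    Returns the (possibly truncated) Vector Length.
    """
    for i in range(vl):
        ea = gpr[ra + i] + imm          # ptr + offset(next)
        data = mem[ea]                  # 64-bit address of ptr->next
        eq = 1 if data == 0 else 0      # stand-in for cr_test.EQ
        action_load = True
        stop = False
        new_vl = vl
        if eq == testbit:               # check if zero (NULL)
            if vli:
                new_vl = i + 1          # inclusive of the failing element
            else:
                new_vl = i              # exclusive of the failing element
                action_load = False     # current load excluded
            stop = True
        if action_load:
            gpr[rt + i] = data          # read again on the next iteration
        if stop:
            return new_vl
    return vl
```

For a three-node list whose last `next` pointer is NULL, the model returns
VL=2 with VLi=0 (the NULL is not written to the result Vector) and VL=3 with
VLi=1 (the NULL is written as the final element).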
-**Data-Dependent Fault-First on Store-Conditional (Rc=1)**
+**Data-Dependent Fail-First on Store-Conditional (Rc=1)**
There are very few instructions that allow Rc=1 for Load/Store:
one of those is the `stdcx.` and other Atomic Store-Conditional