From: lkcl <lkcl@web>
Date: Sat, 6 May 2023 14:52:57 +0000 (+0100)
Subject: (no commit message)
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=abbb26f589b89e900e7b11d0c71f12ceabe425ed;p=libreriscv.git

---

diff --git a/openpower/sv/ldst.mdwn b/openpower/sv/ldst.mdwn
index d486556ae..8b68d7472 100644
--- a/openpower/sv/ldst.mdwn
+++ b/openpower/sv/ldst.mdwn
@@ -1,5 +1,6 @@
 # SV Load and Store
 
+<!-- hide -->
 Links:
 
 * <https://bugs.libre-soc.org/show_bug.cgi?id=561>
@@ -10,6 +11,7 @@ Links:
 * <https://llvm.org/devmtg/2016-11/Slides/Emerson-ScalableVectorizationinLLVMIR.pdf>
 * <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-loads-and-stores>
 * [[ldst/discussion]]
+<!-- show -->
 
 ## Rationale
 
@@ -204,7 +206,8 @@ Mode, when elwidth overrides are applied.  The source override applies to
 RB, and before adding to RA in order to calculate the Effective Address,
 if SEA is set RB is sign-extended from elwidth bits to the full 64 bits.
 For other Modes (ffirst, saturate), all EA computation with elwidth
-overrides is unsigned.
+overrides is unsigned.  RA is *not* altered (not truncated)
+by element-width overrides.
 
 Note that cache-inhibited LD/ST  when VSPLAT is activated will perform
 **multiple** LD/ST operations, sequentially.  Even with scalar src
@@ -470,24 +473,26 @@ next element.  This may be used to perform single-linked-list
 walking, where Data-Dependent Fail-First terminates and
 truncates the Vector at the first NULL.*
 
+**Load/Store Data-Dependent Fail-First, VLi=0**
+
 In the case of Store operations there is a quirk when VLi (VL inclusive
 is "Valid") is clear. Bear in mind the criteria is that the truncated
 Vector of results, when VLi is clear, must all pass the "test", but when
 VLi is set the *current failed test* is permitted to be included.  Thus,
 the actual update (store) to Memory is **not permitted to take place**
-should the test fail. Therefore, on testing the value to be stored,
-when VLi=0 and finding that the test fails the Memory store must **not** occur.
+should the test fail.
 
-Additionally, when VLi=0 and a test fails then RA does **not** receive a
+Additionally, in any Load/Store-with-Update instruction,
+when VLi=0 and a test fails then RA does **not** receive a
 copy of the Effective Address.  Hardware implementations with Out-of-Order
 Micro-Architectures should use speculative Shadow-Hold and Cancellation
-when the test fails.
+(or another Transactional Rollback mechanism) when the test fails.
+
+**Load/Store Data-Dependent Fail-First, VLi=1**
 
-By contrast if VLi=1 and the test fails, Store may proceed *and then*
-looping terminates.  In this way, when non-Inclusive, the Vector of
-Truncated results contains only Stores that passed the test (and RA=EA
-updates if any), and when Inclusive the Vector of Truncated results
-contains the first-failed data.
+By contrast, if VLi=1 and the test fails, the Store may proceed *and then*
+looping terminates.  In this way, when Inclusive, the Vector of Truncated results
+contains the first-failed data (including RA on Updates).
 
 Below is an example of loading the starting addresses of Linked-List
 nodes.  If VLi=1 it will load the NULL pointer into the Vector of results.
@@ -506,17 +511,23 @@ zero in the predicate will be the NULL pointer*
        # this part is the Scalar Defined Word (standard scalar ld operation)
        EA = GPR(RA+i) + imm          # ptr + offset(next)
        data = MEM(EA, 8)             # 64-bit address of ptr->next
-       GPR(RT+i) = data              # happens to be read on next loop!
        # was a normal vector-ld up to this point. now the Data-Fail-First
        cr_test = conditions(data)
        if Rc=1 or RC1: CR.field(i) = cr_test # only store if Rc=1/RC1
+       action_load, stop = True, False       # defaults: do the load, keep looping
        if cr_test.EQ == testbit:             # check if zero
-           if VLI then   VL = i+1            # update VL, inclusive
-           else          VL = i              # update VL, exclusive current
-           break                             # stop looping
+           if VLI then
+              VL = i+1            # update VL, inclusive
+           else
+              VL = i              # update VL, exclusive current
+              action_load = False # current load excluded
+           stop = True            # stop looping
+       if action_load:
+          GPR(RT+i) = data        # happens to be read on next loop!
+       if stop: break
 ```
 
-**Data-Dependent Fault-First on Store-Conditional (Rc=1)**
+**Data-Dependent Fail-First on Store-Conditional (Rc=1)**
 
 There are very few instructions that allow Rc=1 for Load/Store:
 one of those is the `stdcx.` and other Atomic Store-Conditional