(no commit message)

author lkcl <lkcl@web>

Sat, 6 May 2023 14:52:57 +0000 (15:52 +0100)

committer IkiWiki <ikiwiki.info>

Sat, 6 May 2023 14:52:57 +0000 (15:52 +0100)
diff --git a/openpower/sv/ldst.mdwn b/openpower/sv/ldst.mdwn
index d486556ae33354953f02aa6de5ef599897027dd7..8b68d7472b3bbbb505945aaac766b819bd63d110 100644
--- a/openpower/sv/ldst.mdwn
+++ b/openpower/sv/ldst.mdwn
@@ -1,5 +1,6 @@
  # SV Load and Store
  
+<!-- hide -->
  Links:
  
  * <https://bugs.libre-soc.org/show_bug.cgi?id=561>
@@ -10,6 +11,7 @@ Links:
  * <https://llvm.org/devmtg/2016-11/Slides/Emerson-ScalableVectorizationinLLVMIR.pdf>
  * <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-loads-and-stores>
  * [[ldst/discussion]]
+<!-- show -->
  
  ## Rationale
  
@@ -204,7 +206,8 @@ Mode, when elwidth overrides are applied.  The source override applies to
  RB, and before adding to RA in order to calculate the Effective Address,
  if SEA is set RB is sign-extended from elwidth bits to the full 64 bits.
  For other Modes (ffirst, saturate), all EA computation with elwidth
-overrides is unsigned.
+overrides is unsigned.  RA is *not* altered (not truncated)
+by element-width overrides.
  
  Note that cache-inhibited LD/ST  when VSPLAT is activated will perform
  **multiple** LD/ST operations, sequentially.  Even with scalar src
@@ -470,24 +473,26 @@ next element.  This may be used to perform single-linked-list
  walking, where Data-Dependent Fail-First terminates and
  truncates the Vector at the first NULL.*
  
+**Load/Store Data-Dependent Fail-First, VLi=0**
+
  In the case of Store operations there is a quirk when VLi (VL inclusive
  is "Valid") is clear. Bear in mind the criteria is that the truncated
  Vector of results, when VLi is clear, must all pass the "test", but when
  VLi is set the *current failed test* is permitted to be included.  Thus,
  the actual update (store) to Memory is **not permitted to take place**
-should the test fail. Therefore, on testing the value to be stored,
-when VLi=0 and finding that the test fails the Memory store must **not** occur.
+should the test fail.
  
-Additionally, when VLi=0 and a test fails then RA does **not** receive a
+Additionally in any Load/Store with Update instruction,
+when VLi=0 and a test fails then RA does **not** receive a
  copy of the Effective Address.  Hardware implementations with Out-of-Order
  Micro-Architectures should use speculative Shadow-Hold and Cancellation
-when the test fails.
+(or other Transactional Rollback mechanism) when the test fails.
+
+**Load/Store Data-Dependent Fail-First, VLi=1**
  
-By contrast if VLi=1 and the test fails, Store may proceed *and then*
-looping terminates.  In this way, when non-Inclusive, the Vector of
-Truncated results contains only Stores that passed the test (and RA=EA
-updates if any), and when Inclusive the Vector of Truncated results
-contains the first-failed data.
+By contrast if VLi=1 and the test fails, the Store may proceed *and then*
+looping terminates.  In this way, when Inclusive the Vector of Truncated results
+contains the first-failed data (including RA on Updates)
  
  Below is an example of loading the starting addresses of Linked-List
  nodes.  If VLi=1 it will load the NULL pointer into the Vector of results.
@@ -506,17 +511,23 @@ zero in the predicate will be the NULL pointer*
         # this part is the Scalar Defined Word (standard scalar ld operation)
         EA = GPR(RA+i) + imm          # ptr + offset(next)
         data = MEM(EA, 8)             # 64-bit address of ptr->next
-       GPR(RT+i) = data              # happens to be read on next loop!
         # was a normal vector-ld up to this point. now the Data-Fail-First
         cr_test = conditions(data)
         if Rc=1 or RC1: CR.field(i) = cr_test # only store if Rc=1/RC1
+       action_load = True
         if cr_test.EQ == testbit:             # check if zero
-           if VLI then   VL = i+1            # update VL, inclusive
-           else          VL = i              # update VL, exclusive current
-           break                             # stop looping
+           if VLI then
+              VL = i+1            # update VL, inclusive
+           else
+              VL = i              # update VL, exclusive current
+              action_load = False # current load excluded
+           stop = True            # stop looping
+       if action_load:
+          GPR(RT+i) = data        # happens to be read on next loop!
+       if stop: break
  ```
  
-**Data-Dependent Fault-First on Store-Conditional (Rc=1)**
+**Data-Dependent Fail-First on Store-Conditional (Rc=1)**
  
  There are very few instructions that allow Rc=1 for Load/Store:
  one of those is the `stdcx.` and other Atomic Store-Conditional
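
The revised pseudocode in the last hunk can be exercised with a small Python model. This is only a sketch under stated assumptions, not the specification's reference implementation: the function name `ddff_load`, the dict-based `mem`, the list-based `gpr`, and modelling the `EQ` bit as `data == 0` are all hypothetical choices made for illustration. Predication and element-width overrides are left out, and `stop` is initialised explicitly on every iteration, which the pseudocode above leaves implicit.

```python
def ddff_load(gpr, mem, RA, RT, imm, VL, VLI, testbit=True):
    """Model of a vector ld with Data-Dependent Fail-First.

    gpr: list of register values, mem: dict mapping address -> 64-bit value.
    Returns the (possibly truncated) VL. All names here are illustrative.
    """
    for i in range(VL):
        ea = gpr[RA + i] + imm        # ptr + offset(next)
        data = mem.get(ea, 0)         # 64-bit address of ptr->next
        eq = (data == 0)              # assumed model of cr_test.EQ
        action_load = True
        stop = False                  # assumption: reset on each element
        if eq == testbit:             # the test failed on this element
            if VLI:
                VL = i + 1            # inclusive: keep the failing element
            else:
                VL = i                # exclusive: drop the failing element
                action_load = False   # current load excluded
            stop = True
        if action_load:
            gpr[RT + i] = data        # read as GPR(RA+i+1) when RT == RA+1
        if stop:
            break
    return VL


def make_state():
    """Three-node linked list at 0x100 -> 0x200 -> 0x300 -> NULL, next at +8."""
    mem = {0x108: 0x200, 0x208: 0x300, 0x308: 0}
    gpr = [0xDEAD] * 16               # sentinel shows which registers were written
    gpr[4] = 0x100                    # head pointer in RA
    return gpr, mem
```

With `RT = RA + 1`, each loaded pointer becomes the base address for the next element. In this model a walk with `VLI=False` ends with VL=2 and leaves the register that would have received the NULL pointer untouched, while `VLI=True` ends with VL=3 and writes the NULL pointer into it.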