(no commit message)

author lkcl <lkcl@web>

Fri, 5 Jan 2024 16:04:36 +0000 (16:04 +0000)

committer IkiWiki <ikiwiki.info>

Fri, 5 Jan 2024 16:04:36 +0000 (16:04 +0000)
diff --git a/openpower/sv/cookbook/daxpy_example.mdwn b/openpower/sv/cookbook/daxpy_example.mdwn

index 459fc233f6bf85cbafa5d7d3f2a95c68df7b9a5e..0519714a1502b899143c947719011e70568e693c 100644 (file)
--- a/openpower/sv/cookbook/daxpy_example.mdwn
+++ b/openpower/sv/cookbook/daxpy_example.mdwn
@@ -37,9 +37,10 @@ having to pre-subtract an offset before running the loop.
  
  For `sv.lfdup`, RA is Scalar so continuously accumulates
  additions of the immediate (8) but only *after* RA has been used
-as the Effective Address.
+as the Effective Address each time.
  The last write to RA is the address for
-the next block (the next time round the CTR loop). 
+the next block (the next time round the CTR loop).
+
  To understand this it is necessary to appreciate that
  SVP64 is as if a sequence of loop-unrolled scalar instructions were
  issued.  With that sequence all writing the new version of RA
@@ -47,6 +48,13 @@ before the next element-instruction, the end result is identical
  in effect to Element-Strided, except that RA points to the start
  of the next batch.
  
+If `sv.lfdup` were not available, `sv.lfdu` could be used to the same
+effect, but RA would have to be *pre-subtracted by one element*, outside
+of the loop. Due to the compactness of this highly hardware-parallelizable
+algorithm, that one additional instruction would increase the implementation
+code size by 5 percent! This helps explain why Post-Increment Update
+Load/Store instructions are so important.
+
  Use of Element-Strided on `sv.lfd/els`
  ensures the Immediate (8) results in a contiguous LD
  *without* modifying RA.
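
The difference between the two update forms described in this change can be sketched with a small Python model. This is only an illustration under stated assumptions, not the Libre-SOC simulator: the helper names `lfdu_batch` and `lfdup_batch`, the dictionary used as memory, and the address `BASE` are all hypothetical. It assumes, as the text describes, that RA is Scalar, so each element sees the RA written by the previous element.

```python
# Hypothetical model of a Scalar-RA update load, one element at a time.
# Memory is a dict of byte address -> double; BASE is an arbitrary address.
BASE = 0x1000
mem = {BASE + 8 * i: float(i) for i in range(8)}


def lfdu_batch(mem, ra, imm, vl):
    """Pre-increment update: EA = RA + imm, load from EA, then RA = EA."""
    loaded = []
    for _ in range(vl):
        ea = ra + imm
        loaded.append(mem[ea])
        ra = ea
    return loaded, ra


def lfdup_batch(mem, ra, imm, vl):
    """Post-increment update: EA = RA, load from EA, then RA = RA + imm."""
    loaded = []
    for _ in range(vl):
        ea = ra
        loaded.append(mem[ea])
        ra = ra + imm
    return loaded, ra


# Post-increment: RA starts at the first element, no adjustment needed.
first, ra_post = lfdup_batch(mem, BASE, 8, 4)
second, ra_post = lfdup_batch(mem, ra_post, 8, 4)

# Pre-increment: RA must be pre-subtracted by one element outside the loop.
first_u, ra_pre = lfdu_batch(mem, BASE - 8, 8, 4)
second_u, ra_pre = lfdu_batch(mem, ra_pre, 8, 4)

# Both forms load the same contiguous elements across two batches.
assert first == first_u and second == second_u
print(first, second, hex(ra_post), hex(ra_pre))
```

In this model both variants load the same contiguous doubles across two batches. The only difference is the one-off `BASE - 8` adjustment that the pre-increment form needs before the loop, which is the extra instruction the text refers to.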