(no commit message)

author lkcl <lkcl@web>

Fri, 6 May 2022 09:13:30 +0000 (10:13 +0100)

committer IkiWiki <ikiwiki.info>

Fri, 6 May 2022 09:13:30 +0000 (10:13 +0100)
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn

index ec20d02f91747573f89508fd3a0471e8fa566b7c..2b47344fca6cbdc43cd0803477bf1690420cfd4c 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -223,7 +223,7 @@ Packed SIMD explicitly smashes that width right in the face of the
  programmer and expects them to like it.  As the article immediately
  demonstrates, an arbitrary-sized data set has to contend with
  an insane power-of-two Packed SIMD cascade at both setup and teardown
-that can add literally an order
+that routinely adds literally an order
  of magnitude increase in the number of hand-written lines of assembler
  compared to a well-designed Cray-style Vector ISA with a `setvl`
  instruction.
@@ -393,7 +393,9 @@ astounding 240 hand-coded assembler instructions where it is around
  for Massive FFTs) the L1 I-Cache becomes completely ineffective, and in
  the case of the IBM POWER9 a little-known design flaw this results in
  contention between the L1 D and I Caches at the L2 Bus, slowing down
-execution even further.
+execution even further.  Power ISA 3.1 MMA (Matrix-Multiply-Assist)
+requires loop-unrolling to contend with non-power-of-two Matrix
+sizes: SVP64 does not, as hinted at below.
  
  Additional savings come in the form of `SVREMAP`. This is a hardware
  index transformation system where the normally sequentially-linear
@@ -403,7 +405,7 @@ DCT, or FFT.  A full in-register-file 5x7 Matrix Multiply or a 3x4 or
  2x6 may be performed in as little as 4 instructions, one of which
  is to zero-initialise the accumulator Vector used to store the result.
  If addition to another Matrix is also required then it is only three
-instructions.  Not only that, but because the "Schedule" is an abstract
+instructions. Not only that, but because the "Schedule" is an abstract
  concept separated from the mathematical operation, there is no reason
  why Matrix Multiplication Schedules may not be applied to Integer
  Mul-and-Accumulate, Galois Field Mul-and-Accumulate, or Logical