(no commit message)

author lkcl <lkcl@web>

Sat, 7 May 2022 20:17:17 +0000 (21:17 +0100)

committer IkiWiki <ikiwiki.info>

Sat, 7 May 2022 20:17:17 +0000 (21:17 +0100)
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index 36dc950c97b43d736f054b2c72dc92ce4af47bad..aecdc58efbfb177ed0fa3fe81e51b7ecc5ccbb60 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -435,9 +435,14 @@ normally otherwise encountered this results in
  contention between the L1 D and I Caches at the L2 Bus, slowing down
  execution even further.  Power ISA 3.1 MMA (Matrix-Multiply-Assist)
  requires loop-unrolling to contend with non-power-of-two Matrix
-sizes: SVP64 does not, as hinted at below.
-
-Additional savings come in the form of `SVREMAP`. This is a hardware
+sizes: SVP64 does not (as hinted at below).
+[Figures 8 and 9](https://arxiv.org/abs/2104.03142)
+illustrate the process of concatenating copies of data in order
+to match RADIX2 limitations of MMA.
+
+Additional savings come in the form of `SVREMAP`. Like the
+hardware-assist of Google's TPU mentioned on p9 of the above MMA paper,
+`SVREMAP` is a hardware
  index transformation system where the normally sequentially-linear
  Vector element access may be "Re-Mapped" to limited but algorithmic-tailored
  commonly-used deterministic schedules, for example Matrix Multiply,