From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Sun, 18 Sep 2022 21:28:46 +0000 (+0100)
Subject: clarify REMAP modes
X-Git-Tag: opf_rfc_ls005_v1~364
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=427e43729a44cb4ac3960124f3accdf4b9b777d5;p=libreriscv.git

clarify REMAP modes
---
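Note on the Parallel Reduction paragraph added in the last hunk: a minimal
sketch of why a Schedule issued as Scalar operations can be interrupted and
resumed at any step. This is an illustration only, not the SVP64 REMAP
algorithm. The pairwise tree ordering and the names `reduction_schedule`,
`issue` and `reduce_with_interrupt` are assumptions made for this example.

```python
from operator import add


def reduction_schedule(vl):
    """Yield (dest, src) index pairs for a pairwise tree reduction.

    Hypothetical ordering, chosen for illustration: the real REMAP
    Parallel Reduction Schedule may order its steps differently.
    The sequence depends only on vl, so it is fully deterministic.
    """
    step = 1
    while step < vl:
        for i in range(0, vl - step, step * 2):
            yield (i, i + step)
        step *= 2


def issue(regs, schedule, op, start, stop):
    """Issue schedule entries [start, stop) one scalar operation at a time."""
    for dest, src in schedule[start:stop]:
        regs[dest] = op(regs[dest], regs[src])
    return stop


def reduce_with_interrupt(values, op, interrupt_at):
    """Reduce values, pausing after interrupt_at scalar operations."""
    regs = list(values)
    schedule = list(reduction_schedule(len(regs)))
    # Run until the (simulated) interrupt arrives.
    resume = issue(regs, schedule, op, 0, interrupt_at)
    # Interrupt serviced here. Only the step index and the register
    # contents need to survive: nothing is discarded or re-run.
    issue(regs, schedule, op, resume, len(schedule))
    return regs[0]
```

Whatever value `interrupt_at` takes, the result matches the uninterrupted
reduction, because each step is an ordinary scalar operation on register
state that is already architecturally visible.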

diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 697791c33..8ae49f789 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -150,7 +150,8 @@ complexity to achieve high throughput, even on a single-issue in-order
 microarchitecture. As usually becomes quickly apparent with in-order, its
 limitations extend also to when Simple-V is deployed, which is why
 Multi-Issue Out-of-Order is the recommended (but not mandatory) Scalar
-Micro-architecture.
+Micro-architecture.  Byte-level write-enable regfiles (like SRAMs) are
+strongly recommended, to avoid a Read-Modify-Write cycle.
 
 The only major concern is in the upper SV Extension Levels: the Hazard
 Management for increased number of Scalar Registers to 128 (in current
@@ -524,8 +525,13 @@ are 100% Deterministic from their point of declaration,
 making it possible to forward-plan
 Issue, Memory access and Register Hazard Management
 in Multi-Issue Micro-architectures.
+
 If combined with Vertical-First then much more complex operations may exploit
-REMAP Schedules, such as Complex Number FFTs.
+REMAP Schedules, such as Complex Number FFTs, by using Scalar temporary
+registers for intermediate values when computing results that have a Vector
+destination.  Contrast this with a Standard Horizontal-First Vector ISA, where
+the only way to perform Vectorised Complex Arithmetic would be to add Complex
+Vector Arithmetic operations.
 
 * **DCT/FFT** REMAP brings more capability than TI's MSP-Series DSPs and
   Qualcom Hexagon DSPs, and is not restricted to Integer or FP.
@@ -544,6 +550,16 @@ REMAP Schedules, such as Complex Number FFTs.
 * **Parallel Reduction** REMAP, performs an automatic map-reduce using
   *any suitable scalar operation*.
 
+All REMAP Schedules are Precise-Interruptible. The Schedule being
+Parallel-Reduction, for example, causes no latency penalty.  The operations
+are Issued (Deterministically) as **Scalar** operations, so the only latency
+incurred is that of **Scalar** operation Issue, exactly as in a **Scalar**
+Micro-architecture.  Contrast this with a Standard Vector ISA,
+where frequently either there is considerable interrupt latency because
+a Parallel Reduction must complete in full, or partial results
+must be discarded and the operation restarted should a high-priority
+Interrupt occur in the middle.
+
 Note that predication is possible on REMAP but is hard to use effectively.
 It is often best to make copies of data (`VCOMPRESS`) then apply REMAP.