From 427e43729a44cb4ac3960124f3accdf4b9b777d5 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Sun, 18 Sep 2022 22:28:46 +0100
Subject: [PATCH] clarify REMAP modes

---
 openpower/sv/rfc/ls001.mdwn | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 697791c33..8ae49f789 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -150,7 +150,8 @@ complexity to achieve high throughput, even on a single-issue in-order
 microarchitecture. As usually becomes quickly apparent with in-order, its
 limitations extend also to when Simple-V is deployed, which is why
 Multi-Issue Out-of-Order is the recommended (but not mandatory) Scalar
-Micro-architecture.
+Micro-architecture. Byte-level write-enable regfiles (like SRAMs) are
+strongly recommended, to avoid a Read-Modify-Write cycle.
 
 The only major concern is in the upper SV Extension Levels: the Hazard
 Management for increased number of Scalar Registers to 128 (in current
@@ -524,8 +525,13 @@ are 100% Deterministic from their point of declaration,
 making it possible to forward-plan
 Issue, Memory access and Register Hazard Management
 in Multi-Issue Micro-architectures.
+
 If combined with Vertical-First then much more complex operations may exploit
-REMAP Schedules, such as Complex Number FFTs.
+REMAP Schedules, such as Complex Number FFTs, by using Scalar intermediary
+temporary registers to compute results that have a Vector destination.
+Contrast this with a Standard Horizontal-First Vector ISA where the only
+way to perform Vectorised Complex Arithmetic would be to add Complex Vector
+Arithmetic operations.
 
 * **DCT/FFT** REMAP brings more capability than TI's MSP-Series DSPs and
   Qualcom Hexagon DSPs, and is not restricted to Integer or FP.
@@ -544,6 +550,16 @@ REMAP Schedules, such as Complex Number FFTs.
 * **Parallel Reduction** REMAP, performs an automatic map-reduce using
   *any suitable scalar operation*.
 
+All REMAP Schedules are Precise-Interruptible. No latency penalty is caused by
+the fact that the Schedule is Parallel-Reduction, for example. The operations
+are Issued (Deterministically) as **Scalar** operations and thus any latency
+associated with **Scalar** operation Issue exactly as in a **Scalar**
+Micro-architecture will result. Contrast this with a Standard Vector ISA
+where frequently there is either considerable interrupt latency due to
+requiring a Parallel Reduction to complete in full, or partial results
+to be discarded and re-started should a high-priority Interrupt occur
+in the middle.
+
 Note that predication is possible on REMAP but is hard to use effectively.
 It is often best to make copies of data (`VCOMPRESS`) then apply REMAP.
 
-- 
2.30.2
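
Note on the Precise-Interruptible paragraph added in the final hunk: the property can be illustrated with a minimal Python sketch. It is not the actual REMAP Parallel Reduction algorithm. The tree ordering and the names `reduction_schedule` and `run` are assumptions made purely for illustration. What it tries to show is that a Deterministic schedule issued one scalar operation at a time can be stopped after any step and resumed from a single step counter, with no partial results discarded.

```python
import operator
from typing import Callable, Iterator, List, Optional, Tuple


def reduction_schedule(n: int) -> Iterator[Tuple[int, int]]:
    """Yield (dest, src) index pairs for a tree reduction over n elements.

    Hypothetical ordering, for illustration only: the real REMAP
    Parallel Reduction schedule may order its steps differently.
    The sequence depends only on n, so it is fully deterministic.
    """
    stride = 1
    while stride < n:
        for dest in range(0, n - stride, 2 * stride):
            yield dest, dest + stride
        stride *= 2


def run(vec: List[int], op: Callable[[int, int], int],
        start: int = 0, stop: Optional[int] = None) -> int:
    """Issue schedule steps [start, stop) as individual scalar operations.

    Returns the index of the next step to issue. In this sketch that
    counter, plus the register contents in vec, is all the state
    needed to resume after an interrupt.
    """
    steps = list(reduction_schedule(len(vec)))
    stop = len(steps) if stop is None else min(stop, len(steps))
    for dest, src in steps[start:stop]:
        vec[dest] = op(vec[dest], vec[src])  # one scalar operation
    return stop
```

For example, `run(v, operator.add)` on `v = [1, 2, 3, 4, 5, 6, 7]` leaves the sum in `v[0]`. Calling `run(v, operator.add, 0, 3)` and then `run(v, operator.add, 3)` on a fresh copy gives the same final contents, which models an interrupt taken after the third scalar operation.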