From 4a19fb4b7fa4092d3f1c145b331e18cdf0e6b607 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 9 Jun 2021 17:41:00 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 3e9955c47..ed9cdf7d5 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -259,6 +259,13 @@ well permitted: both still meet the qualifying characteristics that one
 source operand can also be the destination, which allows the "accumulator"
 to be identified.
 
+If an interrupt or exception occurs in the middle of the scalar mapreduce,
+the scalar destination register **MUST** be updated with the current
+(intermediate) result, because this is how ```Program Order``` is
+preserved (Vector Loops are to be considered just another instruction
+being executed in Program Order). In this way, after return from interrupt,
+the scalar mapreduce may continue where it left off.
+
 ## Vector result reduce mode
 
 1. limited to single predicated dual src operations (add RT, RA, RB).
@@ -291,6 +298,17 @@ to be identified.
    unaltered (not used for the purposes of intermediary storage);
    the scalar result is placed in the first available unmasked element.
 
+Note: Programs **MUST NOT** rely on the contents of the intermediate results:
+they may change from hardware implementation to hardware implementation.
+Some implementations may perform an incremental update, whilst others
+may choose to use the available Vector space for a binary tree reduction.
+If an incremental Vector is required (```x[i] = x[i-1] + y[i]```) then
+a *straight* SVP64 Vector instruction can be issued, where the source and
+destination registers overlap: ```sv.add 1.v, 9.v, 2.v```. Because
+respecting ```Program Order``` is mandatory in SVP64, hardware **MUST**
+detect this case and issue an incremental sequence of scalar
+element instructions.
+
 Pseudocode for the case where RA==RB:
 
     result = op(iregs[RA], iregs[RA+1])
-- 
2.30.2
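
The two behaviours this patch adds to the text can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names `scalar_mapreduce` and `sv_add`, the `regs` list standing in for the integer register file, the `start` and `interrupt_at` parameters, and the register numbers used are all hypothetical and chosen purely for illustration. The first function models a scalar mapreduce whose accumulator is written back on every element, so that an interrupted loop can resume; the second models a plain element-by-element Vector add executed strictly in Program Order, which yields an incremental (prefix-sum) result when the destination overlaps a source offset by one register.

```python
# Hypothetical model only: `regs` stands in for the integer register file.

def scalar_mapreduce(regs, rt, rb, vl, start=0, interrupt_at=None):
    """Model a scalar reduction where scalar RT is both source and destination.

    The accumulator regs[rt] is updated after every element, so if an
    interrupt arrives between elements the intermediate result is already
    in the destination register. Returns the element index to resume from
    (equal to vl once the reduction has completed).
    """
    for i in range(start, vl):
        if interrupt_at is not None and i == interrupt_at:
            return i  # interrupted: regs[rt] holds the partial result
        regs[rt] = regs[rt] + regs[rb + i]
    return vl


def sv_add(regs, rt, ra, rb, vl):
    """Model a plain Vector add issued as scalar element operations, in order.

    Because element i completes before element i+1 starts, a destination
    that overlaps a source one register higher sees earlier results.
    """
    for i in range(vl):
        regs[rt + i] = regs[ra + i] + regs[rb + i]
```

With `y` held in `regs[9..12]` and the running value starting in `regs[1]`, calling `sv_add(regs, rt=2, ra=9, rb=1, vl=4)` in this model leaves the running sums in `regs[2..5]`. A binary tree reduction would produce the same final scalar for an associative operation but different intermediate values, which is why programs cannot depend on them.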