From 875cfd9dc46f3961819c0a500c22231dcfeef1af Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 15 Sep 2021 14:50:32 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 37 ++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 90fa26488..1a528809d 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -201,22 +201,27 @@ dest elwidth.
 
 # Reduce mode
 
-There are two variants here. The first is when the destination is scalar
-and at least one of the sources is Vector. The second is more complex
-and involves map-reduction on vectors.
-
-The first defining characteristic distinguishing Scalar-dest reduce mode
-from Vector reduce mode is that Scalar-dest reduce issues VL element
-operations, whereas Vector reduce mode performs an actual map-reduce
-(tree reduction): typically `O(VL log VL)` actual computations.
-
-The second defining characteristic of scalar-dest reduce mode is that it
-is, in simplistic and shallow terms *serial and sequential in nature*,
-whereas the Vector reduce mode is definitely inherently paralleliseable.
-
-The reason why scalar-dest reduce mode is "simplistically" serial and
-sequential is that in certain circumstances (such as an `OR` operation
-or a MIN/MAX operation) it may be possible to parallelise the reduction.
+Reduction in SVP64 is deterministic and somewhat of a misnomer. A normal
+Vector ISA would have explicit Reduce opcodes with defined characteristics
+per operation: in SX Aurora there is even an additional scalar argument
+containing the initial reduction value. SVP64 fundamentally has to
+utilise *existing* Scalar Power ISA v3.0B operations, which presents some
+unique challenges.
+
+The solution turns out to be to simply define reduction as permitting
+deterministic element-based schedules to be issued using the base Scalar
+operations, and to rely on the underlying microarchitecture to resolve
+Register Hazards at the element level. This goes back to
+the fundamental principle that SV is nothing more than a Sub-Program-Counter
+sitting between the Decode and Issue phases.
+
+Microarchitectures *may* take opportunities to parallelise the reduction
+but only if in doing so they preserve Program Order at the Element Level.
+Opportunities where this is possible include an `OR` operation
+or a MIN/MAX operation, where the result is the same regardless of
+the order of evaluation. For Floating Point, however, parallelisation
+is not permitted, because different results would be obtained if the
+reduction were not executed in strict sequential order.
 
 ## Scalar result reduce mode
 
-- 
2.30.2
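
As an illustration of the deterministic element-level schedule that the
new text describes, below is a minimal Python-style sketch of one plausible
schedule for a scalar-result reduce using integer add, assuming the
accumulator form in which the scalar destination is also a source operand.
The names `iregs`, `VL`, `RT` and `RA` are placeholders for illustration
only, not the specification's exact pseudocode:

    # illustrative sketch only: VL element operations are issued using the
    # base Scalar "add", all reading and writing the same scalar RT
    def scalar_result_reduce_add(iregs, RT, RA, VL):
        for i in range(VL):
            # each element operation creates a Read-after-Write hazard on
            # RT, which the microarchitecture resolves at the element
            # level, preserving Program Order
            iregs[RT] = iregs[RT] + iregs[RA + i]
        return iregs[RT]

With such a schedule, an associative and commutative operation (`OR`,
MIN/MAX) could legally be evaluated in any order, since the final value of
RT is identical; a Floating Point add could not, which is why strict
sequential execution is required in that case.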