From 6061f370c0b153de663a1b302f50c8392be91887 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 17 Jan 2021 16:10:03 +0000
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index b7bbfd939..1a5b88596 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -147,6 +147,31 @@ Note that the operation takes place at the maximum bitwidth (max of src and dest
 
 # Reduce mode
 
+There are two variants here. The first is when the destination is scalar and at least one of the sources is a Vector. The second is more complex and involves reduction on vectors.
+
+The defining characteristic distinguishing Scalar-dest reduce mode from Vector reduce mode is that Scalar-dest reduce issues VL element operations, whereas Vector reduce mode performs an actual map-reduce (tree reduction), typically `O(VL log VL)` actual computations.
+
+## Scalar result reduce mode
+
+Scalar reduction is characterised by:
+
+* at least one of the sources is a Vector
+* the destination is a scalar
+* optionally, but most usefully, one source register is also the destination
+
+Typical applications include simple operations such as `ADD r3, r10.v, r3`
+where, clearly, r3 is being used to accumulate the addition of all elements in the vector starting at r10.
+
+    # add RT, RA, RB but when RT==RA
+    for i in range(VL):
+        iregs[RA] += iregs[RB+i] # RT==RA
+
+However, *unless* the operation is marked as "mapreduce", SV **terminates** at the first scalar operation. Only by marking the operation as "mapreduce" will it continue to issue multiple sub-looped (element) instructions in `Program Order`.
+
+Other examples include shift-mask operations where a Vector of inserts into a single destination register is required, as a way to construct a value quickly from multiple arbitrary bit-ranges and bit-offsets.
+Using the same register as both the source and destination, with Vectors of different offsets, masks and values to be inserted, has multiple applications including video, cryptography and JIT compilation.
+
+## Vector result reduce mode
 
 1. limited to single predicated dual src operations (add RT, RA, RB). triple source operations are prohibited (fma).
 2. limited to operations that make sense. divide is excluded, as is
-- 
2.30.2
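
Reviewer note, not part of the patch: the difference between the two reduce variants described in the added text can be sketched in plain Python. This is a minimal illustration only. `scalar_dest_reduce` follows the patch's own pseudocode (RT==RA accumulating over VL elements), while `tree_reduce` is a generic pairwise reduction written for comparison; its name and structure are assumptions and do not reflect how Vector result reduce mode is actually specified.

```python
def scalar_dest_reduce(iregs, RA, RB, VL):
    """Scalar-dest reduce: VL sequential element adds, with RT == RA
    acting as the accumulator (mirrors the pseudocode in the patch)."""
    for i in range(VL):
        iregs[RA] += iregs[RB + i]
    return iregs[RA]


def tree_reduce(values):
    """Hypothetical pairwise (tree) reduction: neighbours are combined
    level by level until a single value remains."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An odd element out is carried up to the next level unchanged.
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else 0


# Illustrative register file: r3 is the accumulator, r10..r13 hold the vector.
iregs = [0] * 32
iregs[3] = 100
iregs[10:14] = [1, 2, 3, 4]
result = scalar_dest_reduce(iregs, RA=3, RB=10, VL=4)
```

In this sketch the scalar-dest form issues exactly VL additions in program order, which is the behaviour the "mapreduce" marking is described as enabling; the tree form reaches the same sum for an associative operation but combines elements in a different order.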