## Scalar result reduce mode
In this mode, which is suited to operations involving carry or overflow,
-one register is identified as being the "accumulator".
+one register must be identified by the programmer as being the "accumulator".
Scalar reduction is thus categorised by:
* One of the sources is a Vector
* optionally but most usefully when one source register is also the destination
* That the source register type is the same as the destination register
type identified as the "accumulator". scalar reduction on `cmp`,
- `setb` or `isel` is not possible for example because of the mixture
+ `setb` or `isel` makes no sense for example because of the mixture
between CRs and GPRs.
Typical applications include simple operations such as `ADD r3, r10.v,
operation as "mapreduce" will it continue to issue multiple sub-looped
(element) instructions in `Program Order`.
+To perform the loop in reverse order, the ```RG``` (reverse gear) bit must be set. This is useful for leaving a cumulative suffix sum in reverse order:
+
+    for i in (VL-1 downto 0):
+        # RT-1 = RA gives a suffix sum
+        iregs[RT+i] = iregs[RA+i] - iregs[RB+i]
+
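The reverse-gear loop above can be tried out with a small Python model. This is only a sketch: the flat `iregs` list, the function name `sub_reverse_gear` and the register numbers are hypothetical. In particular, placing RA one register above RT is an assumption made here so that each element reads the result written by the element above it; it should be checked against the register relationship the specification intends.

```python
def sub_reverse_gear(iregs, RT, RA, RB, VL):
    """Element loop of a vector subtract, issued from VL-1 down to 0."""
    for i in reversed(range(VL)):
        iregs[RT + i] = iregs[RA + i] - iregs[RB + i]
    return iregs


# Hypothetical register numbers (assumption): RA = RT + 1, so element i
# reads the result that element i + 1 has just written.
iregs = [0] * 16
RT, RA, RB, VL = 4, 5, 10, 3
iregs[RT + VL] = 100            # seed value read by the highest element
iregs[RB:RB + VL] = [1, 2, 3]   # values to subtract cumulatively

sub_reverse_gear(iregs, RT, RA, RB, VL)
print(iregs[RT:RT + VL])        # [94, 95, 97]
```

Running the same register numbers in forward order would read `iregs[RA+i]` before it had been updated, so no cumulative chain would form. That is why the direction of the loop matters here.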
Other examples include shift-mask operations where a Vector of inserts
into a single destination register is required, as a way to construct
a value quickly from multiple arbitrary bit-ranges and bit-offsets.
If an interrupt or exception occurs in the middle of the scalar mapreduce,
the scalar destination register **MUST** be updated with the current
(intermediate) result, because this is how ```Program Order``` is
-preserved (Vector Loops are to be considered to be just another instruction
-being executed in Program Order). In this way, after return from interrupt,
+preserved (Vector Loops are to be considered to be just another way of issuing instructions
+in Program Order). In this way, after return from interrupt,
the scalar mapreduce may continue where it left off. This provides
"precise" exception behaviour.