* When bit 0 of `invxyz` is set, the order of the indices
in the inner for-loop is reversed. This has the side-effect
- of placing the final reduced result in the last element.
+ of placing the final reduced result in the last-predicated element.
+ It also has the indirect side-effect of swapping the source
+ registers: Left-operand index numbers will always exceed
+ Right-operand indices.
+ When clear, the reduced result will be in the first-predicated
+ element, and Left-operand indices will always be *less* than
+ Right-operand ones.
* When bit 1 of `invxyz` is set, the order of the outer loop
step is inverted: stepping begins at the nearest power of two
to half of the vector length and reduces by half each time.
+ When clear, the step begins at 2 and doubles after each
+ complete pass of the inner loop.
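
The effect of the two `invxyz` bits described above can be illustrated with a minimal sketch. It is not the reference implementation: the names `reduction_schedule` and `apply_schedule` are hypothetical, predication is ignored (every element is assumed enabled), and addition stands in for whatever operation is being reduced. Each pair is `(left, right)`, with the result written back to the left index.

```python
def reduction_schedule(vl, invxyz=0):
    """Return the (left, right) index pairs of a tree reduction.

    Assumption: no predicate mask, so "first/last-predicated element"
    is simply element 0 or element vl-1.
    """
    # Bit 0: reverse the index order used by the inner loop.
    ix = list(range(vl))
    if invxyz & 0b01:
        ix.reverse()

    # Outer loop steps: 2, 4, 8, ... until the step covers the vector.
    steps = []
    step = 1
    while step < vl:
        step *= 2
        steps.append(step)
    # Bit 1: walk the outer-loop steps from largest to smallest.
    if invxyz & 0b10:
        steps.reverse()

    pairs = []
    for step in steps:
        for i in range(0, vl, step):
            other = i + step // 2
            if other < vl:  # skip pairs that fall off the end
                pairs.append((ix[i], ix[other]))
    return pairs


def apply_schedule(values, pairs):
    """Apply the schedule using addition as the (assumed) reduction op."""
    out = list(values)
    for left, right in pairs:
        out[left] = out[left] + out[right]
    return out


if __name__ == "__main__":
    for mode in range(4):
        print(f"invxyz={mode:02b}", reduction_schedule(4, mode))
```

With bit 0 clear, the left index of every pair is below the right index and the total accumulates in element 0; with bit 0 set, the indices are mirrored, so the left index is always the larger one and the total accumulates in the last element. Setting bit 1 only changes the order in which the outer-loop steps are visited.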
## FFT/DCT mode