|xdimsz|ydimsz| zdimsz | permute | invxyz |offset|skip |mode |Matrix |
|xdimsz|ydimsz|SVGPR | 11/ |sk1/invxy|offset|elwidth|0b00 |Indexed|
|xdimsz|mode | zdimsz | submode2| invxyz |offset|submode|0b01 |DCT/FFT|
-| rsvd |rsvd |xdimsz | rsvd | invxyz |offset|submode|0b10 |Preduce|
+| rsvd |rsvd |xdimsz | rsvd | invxyz |offset|submode|0b10 |Red/Sum|
| | | | | | | |0b11 |rsvd |
`mode` sets different behaviours (straight matrix multiply, FFT, DCT).
* **mode=0b00** sets straight Matrix Mode
* **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
* **mode=0b01** sets "FFT/DCT" mode and activates submodes
-* **mode=0b10** sets "Parallel Reduction" Schedules.
+* **mode=0b10** sets "Parallel Reduction or Prefix-Sum" Schedules.
*Architectural Resource Allocation note: the four SVSHAPE SPRs are best
allocated sequentially and contiguously in order that `sv.mtspr` may
-be used*
+be used. This is safe to do as long as `SVSTATE.SVme=0`.*
-## Parallel Reduction Mode
+## Parallel Reduction / Prefix-Sum Mode
-Creates the Schedules for Parallel Tree Reduction.
+Creates the Schedules for Parallel Tree Reduction and Prefix-Sum.
-* **submode=0b00** selects the left operand index
-* **submode=0b01** selects the right operand index
+* **submode=0b00** selects the left operand index for Reduction
+* **submode=0b01** selects the right operand index for Reduction
+* **submode=0b10** selects the left operand index for Prefix-Sum
+* **submode=0b11** selects the right operand index for Prefix-Sum
* When bit 0 of `invxyz` is set, the order of the indices
in the inner for-loop are reversed. This has the side-effect