From c1790df2b83c038bfcc7aec9831e207fc4811572 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 1 May 2023 13:26:04 +0100
Subject: [PATCH]

---
 openpower/sv/remap.mdwn | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/openpower/sv/remap.mdwn b/openpower/sv/remap.mdwn
index 64bff0b5b..d5b622537 100644
--- a/openpower/sv/remap.mdwn
+++ b/openpower/sv/remap.mdwn
@@ -636,7 +636,7 @@ disabled: the register's elements are a linear (1D) vector.
 |xdimsz|ydimsz| zdimsz | permute | invxyz |offset|skip |mode |Matrix |
 |xdimsz|ydimsz|SVGPR | 11/ |sk1/invxy|offset|elwidth|0b00 |Indexed|
 |xdimsz|mode | zdimsz | submode2| invxyz |offset|submode|0b01 |DCT/FFT|
-| rsvd |rsvd |xdimsz | rsvd | invxyz |offset|submode|0b10 |Preduce|
+| rsvd |rsvd |xdimsz | rsvd | invxyz |offset|submode|0b10 |Red/Sum|
 | | | | | | | |0b11 |rsvd |
 
 `mode` sets different behaviours (straight matrix multiply, FFT, DCT).
@@ -644,18 +644,20 @@ disabled: the register's elements are a linear (1D) vector.
 * **mode=0b00** sets straight Matrix Mode
 * **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
 * **mode=0b01** sets "FFT/DCT" mode and activates submodes
-* **mode=0b10** sets "Parallel Reduction" Schedules.
+* **mode=0b10** sets "Parallel Reduction or Prefix-Sum" Schedules.
 
 *Architectural Resource Allocation note: the four SVSHAPE SPRs are best
 allocated sequentially and contiguously in order that `sv.mtspr` may
-be used*
+be used. This is safe to do as long as `SVSTATE.SVme=0`*
 
-## Parallel Reduction Mode
+## Parallel Reduction / Prefix-Sum Mode
 
-Creates the Schedules for Parallel Tree Reduction.
+Creates the Schedules for Parallel Tree Reduction and Prefix-Sum
 
-* **submode=0b00** selects the left operand index
-* **submode=0b01** selects the right operand index
+* **submode=0b00** selects the left operand index for Reduction
+* **submode=0b01** selects the right operand index for Reduction
+* **submode=0b10** selects the left operand index for Prefix-Sum
+* **submode=0b11** selects the right operand index for Prefix-Sum
 
 * When bit 0 of `invxyz` is set, the order of the indices
   in the inner for-loop are reversed. This has the side-effect
-- 
2.30.2
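
Reviewer note: the four submodes each emit one half of a (left, right) operand-index pair, so an illustration of what such paired schedules look like may help readers of the patched section. The sketch below is a generic textbook illustration and is not taken from the specification: the function names (`reduction_schedule`, `prefix_sum_schedule`, `run_schedule`) are hypothetical, the reduction is a plain pairwise tree, and the prefix-sum uses the Hillis-Steele scan, which may well differ from the ordering the real SVSHAPE schedules produce. Predication, `offset` and `invxyz` are deliberately ignored.

```python
# Hypothetical illustration only: generic (left, right) index schedules,
# NOT the ordering defined by the SVSHAPE specification.

def reduction_schedule(n):
    """Return (left, right) pairs for a pairwise tree reduction of n elements.

    Applying vals[left] = vals[left] + vals[right] for each pair in order
    leaves the full reduction in vals[0].
    """
    pairs = []
    step = 1
    while step < n:
        # Combine elements that are `step` apart; skip pairs that run off the end.
        for left in range(0, n - step, step * 2):
            pairs.append((left, left + step))
        step *= 2
    return pairs


def prefix_sum_schedule(n):
    """Return (left, right) pairs for an in-place Hillis-Steele inclusive scan."""
    pairs = []
    step = 1
    while step < n:
        # Descending order: vals[left - step] has not been updated yet in
        # this step, so sequential in-place execution stays correct.
        for left in range(n - 1, step - 1, -1):
            pairs.append((left, left - step))
        step *= 2
    return pairs


def run_schedule(pairs, values):
    """Execute a schedule with addition as the (assumed) binary operation."""
    vals = list(values)
    for left, right in pairs:
        vals[left] = vals[left] + vals[right]
    return vals


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5]
    red = reduction_schedule(len(data))
    # The two index streams that separate "left" and "right" submodes would carry.
    print("reduction left :", [l for l, _ in red])
    print("reduction right:", [r for _, r in red])
    print("reduced        :", run_schedule(red, data)[0])
    print("prefix-sum     :", run_schedule(prefix_sum_schedule(len(data)), data))
```

Splitting each pair list into its left and right columns, as the `__main__` block does, mirrors the idea of one submode per operand index: two schedules that advance in lockstep and together select the operands of each step.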