From: lkcl
Date: Tue, 21 Jun 2022 15:11:25 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~1623
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=df08ebaa5f5e2d9f05ec2b98b7bbbeee66d83018;p=libreriscv.git

---

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index a8b9c1644..626c3640a 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -963,13 +963,13 @@ For modes:
 - mr OR crm: "normal" map-reduce mode or CR-mode.
 - mr.svm OR crm.svm: when vec2/3/4 set, sub-vector mapreduce is enabled
 
-# Proposed Parallel-reduction algorithm
+# Parallel-reduction algorithm
 
 The principle of SVP64 is that SVP64 is a fully-independent
 Abstraction of hardware-looping in between issue and execute phases
 that has no relation to the operation it issues. Additional state
 cannot be saved on context-switching beyond that
-of SVSTATE.
+of SVSTATE, making things slightly tricky.
 
 Executable demo pseudocode, full version
 [here](https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/preduce.py;hb=HEAD)
@@ -997,10 +997,24 @@ def preducei(vl, vec, pred):
     return vec
 ```
 
-In this version the need for an explicit MV is made unnecessary by instead
-leaving elements *in situ*. The internal modifications to the predicate may,
-due to the reduction being entirely deterministic, be "reconstructed"
-on a context-switch. This may make some implementations slower.
+This algorithm works by noting when data remains in-place rather than
+being reduced, and referring to that alternative position on subsequent
+layers of reduction. It is re-entrant. If however interrupted and
+restored, some implementations may take longer to re-establish the
+context.
+
+Its application by default is that:
+
+* RA, FRA or BFA is the first register as the first operand
+  (ci index offset in the above pseudocode)
+* RB, FRB or BFB is the second (co index offset)
+* RT (result) also uses ci **if RA==RT**
+
+For more complex applications a REMAP Schedule must be used
+
+*Programmers's note:
+if passed a predicate mask with only one bit set, this algorithm
+takes no action, similar to when a predicate mask is all zero.*
 
 *Implementor's Note: many SIMD-based Parallel Reduction Algorithms are
 implemented in hardware with MVs that ensure lane-crossing is minimised.
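
Reviewer's sketch (not part of the patch): the added text describes a reduction that leaves data in place and tracks the "alternative position" on later layers. The following is a minimal illustration of that idea, reconstructed from the description only and not copied from `preduce.py`. The function name `preduce_in_place`, the index table `ix`, the second-operand index name `oi`, and the choice of `+` as the reduction operator are all assumptions made for illustration.

```python
def preduce_in_place(vl, vec, pred):
    """Hypothetical sketch: predicated tree reduction without explicit MVs.

    vl   -- vector length
    vec  -- list of vl values
    pred -- list of vl booleans (the predicate mask)
    Returns a copy of vec with the reduced value left at the first
    active element; all other elements are whatever the tree left there.
    """
    vec = list(vec)        # work on a copy so the caller's data is untouched
    ix = list(range(vl))   # ix[i]: where the live value for slot i currently sits
    step = 1
    while step < vl:
        step *= 2
        for i in range(0, vl, step):
            other = i + step // 2
            ci = ix[i]                              # first-operand index
            oi = ix[other] if other < vl else None  # second-operand index
            other_pred = other < vl and pred[oi]
            if pred[ci] and other_pred:
                # both sides active: reduce into the first operand's position
                vec[ci] += vec[oi]
            elif other_pred:
                # only the second side is active: leave its data where it is
                # and remember that position instead of moving the value
                ix[i] = oi
    return vec


if __name__ == "__main__":
    print(preduce_in_place(4, [1, 2, 3, 4], [True, True, True, True]))
    print(preduce_in_place(4, [1, 2, 3, 4], [False, True, True, True]))
    print(preduce_in_place(4, [1, 2, 3, 4], [False, True, False, False]))
```

In this sketch a mask with a single bit set never satisfies the "both sides active" branch, so the vector comes back unchanged, which agrees with the Programmer's note in the patch. Because `ix` is derived deterministically from `vl` and `pred`, it could in principle be recomputed after an interrupt rather than saved, at the cost of extra work on resume.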