From: lkcl
Date: Mon, 11 Apr 2022 08:14:01 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2793
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=51ea7133be63382cb1a7026b2ff643bbcb40a812;p=libreriscv.git
---

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 18be50fba..6f23727ba 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -836,7 +836,9 @@ achieved in DCT and FFT REMAP**
 // reduction operation -- we still use this algorithm even
 // if the reduction operation isn't associative or
 // commutative.
-/// `temp_pred` is a user-visible Vector Condition register
+ XXX VIOLATION OF SVP64 DESIGN PRINCIPLES XXXX
+/// XXX `pred` is a user-visible Vector Condition register XXXX
+ XXX VIOLATION OF SVP64 DESIGN PRINCIPLES XXXX
 ///
 /// all input arrays have length `vl`
 def reduce(vl, vec, pred):
@@ -855,7 +857,7 @@ def reduce(vl, vec, pred):
             pred[i] |= other_pred;
 ```
 
-The principle in SVP64 being violated is that SVP64 is a fully-independent
+The first principle in SVP64 being violated is that SVP64 is a fully-independent
 Abstraction of hardware-looping in between issue and execute phases
 that has no relation to the operation it issues. The above pseudocode
 conditionally changes not only the type of element operation issued
@@ -863,7 +865,17 @@ conditionally changes not only the type of element operation issued
 At the very least, for Vertical-First Mode this will result in
 unanticipated and unexpected behaviour (maximise "surprises" for
 programmers) in the middle of loops, that will be far too hard to explain.
-An alternative algorithm is therefore required that does not perform MVs.
+The second principle being violated by the above algorithm is the expectation
+that temporary storage is available for a modified predicate: there is no
+such space. SVP64 is founded on the principle that all operations are
+"re-entrant" with respect to interrupts and exceptions: SVSTATE must
+be saved and restored alongside PC and MSR, but nothing more. It is perfectly
+fine to have context-switching back to the operation be somewhat slower,
+through "reconstruction" of temporary internal state based on what SVSTATE
+contains, but nothing more.
+
+An alternative algorithm is therefore required that does not perform MVs,
+and does not require additional state to be saved on context-switching.
 
 ```
 def reduce( vl, vec, pred, pred,):
@@ -878,15 +890,21 @@ def reduce( vl, vec, pred, pred,):
         halfstep = step // 2
         for i in (0..vl).step_by(step)
             other = vi[i + halfstep]
-            i = vi[i]
+            ir = vi[i]
             other_pred = other < vl && pred[other]
             if pred[i] && other_pred
-                vec[i] += vec[other]
-                pred[i] |= other_pred
+                vec[ir] += vec[other]
+            else if other_pred:
+                vi[ir] = vi[other] # index redirection, no MV
+            pred[ir] |= other_pred # reconstructed on context-switch
         step *= 2
-
 ```
 
+In this version the need for an explicit MV is made unnecessary by instead
+leaving elements *in situ*. The internal modifications to the predicate may,
+due to the reduction being entirely deterministic, be "reconstructed"
+on a context-switch. This may make some implementations slower.
+
 *Implementor's Note: many SIMD-based Parallel Reduction Algorithms are
 implemented in hardware with MVs that ensure lane-crossing is minimised.
 In SIMD ISAs the internal SIMD Architectural design is exposed and imposed on the programmer. Cray-style Vector ISAs on the other hand provide convenient,