From: lkcl
Date: Tue, 14 Jun 2022 15:17:56 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~1789
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=150a2f38ba4306ae15f476a62e79c8ef3fee7feb;p=libreriscv.git

---
diff --git a/openpower/sv/mv.swizzle.mdwn b/openpower/sv/mv.swizzle.mdwn
index 26ab13877..0426f45da 100644
--- a/openpower/sv/mv.swizzle.mdwn
+++ b/openpower/sv/mv.swizzle.mdwn
@@ -131,35 +131,15 @@
 quantities as the default is lifted on `sv.mv.swiz`.
 Additionally, in order to make life easier for implementers, some of
 whom may wish, especially for Embedded GPUs, to use multi-cycle Micro-coding,
-the usual strict Element-level Program Order is relaxed but only for
-Horizontal-First Mode:
-
-* In Horizontal-First Mode, an overlap between all and any Vectorised
-  sources and destination Elements for the entirety of
-  the Vector Loop `0..VL-1` is `UNDEFINED` behaviour.
-* In Vertical-First Mode, an overlap on any given one execution of
-  the Swizzle instruction requires that all Swizzled source elements be
-  copied into intermediary buffers (in-flight Reservation Stations,
-  pipeline registers) **before* being swapped and placed in
-  destinations. In-place (RT=RA) is required to work correctly.
-  Strict Program Order is required in full.
-
-*Implementor's note: the cost of Vertical-First Mode in an Embedded design
-of storing four 64-bit in-flight elements may be considered
-too high. If this is the
-case it is acceptable to throw an Illegal Instruction Trap, and emulate
-the instruction in software. Performance will obviously be adversely affected.
-See [[sv/compliancy_levels]]: all aspects of
-Swizzle are entirely optional in hardware at the Embedded Level.*
-
-Implementors must consider Swizzle instructions to be atomically indivisible,
-even if implemented as Micro-coded. The rest of SVP64 permits element-level
-operations to be Precise-Interrupted: *Swizzle moves do not* because
-the multiple moves are, strictly speaking, one instruction. All XYZW
-elements *must* be completed in full before any Trap or Interrupt is
-permitted
-to be serviced. Out-of-Order Micro-architectures may of course cancel
-the in-flight instruction as usual if the Interrupt requires fast servicing.
+the usual strict Element-level Program Order is relaxed.
+An overlap between all and any Vectorised
+sources and destination Elements for the entirety of
+the Vector Loop `0..VL-1` is `UNDEFINED` behaviour.
+
+This in turn implies that Traps and Exceptions are, as usual,
+permitted in between element-level moves, because due to there
+being no overlap there is no risk of destroying a source with
+an overwrite.
 
 Determining the source and destination subvector lengths is tricky.
 Swizzle Pseudocode:
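
The overlap rule introduced by the added lines can be sketched in Python. This is an illustration only and is separate from the Swizzle pseudocode that the page itself goes on to give. It assumes a hypothetical flat register model in which element `i` of a vector occupies indices `base + i*subvl .. base + i*subvl + subvl-1`; the helper names (`touched`, `overlaps`, `swizzle_move`) and the representation of a swizzle as a list of source lane numbers are invented for this sketch, and constants, skipped lanes and the real SVP64 register layout are not modelled. Under those assumptions it shows why, when sources and destinations do not overlap, stopping after one element-level move and resuming later gives the same result as running the whole Vector Loop without interruption.

```python
def touched(base, vl, subvl):
    """Set of flat element indices covered by vl subvectors of length subvl."""
    return set(range(base, base + vl * subvl))


def overlaps(src_base, src_subvl, dst_base, dst_subvl, vl):
    """True if any source element is also a destination element over 0..VL-1."""
    return bool(touched(src_base, vl, src_subvl) & touched(dst_base, vl, dst_subvl))


def swizzle_move(regs, src_base, src_subvl, dst_base, swiz, vl, start=0, stop=None):
    """Copy swizzled subvectors for vector elements start..stop-1.

    swiz[d] is the source lane copied into destination lane d (assumed model).
    """
    stop = vl if stop is None else stop
    dst_subvl = len(swiz)
    for i in range(start, stop):
        for d, s in enumerate(swiz):
            regs[dst_base + i * dst_subvl + d] = regs[src_base + i * src_subvl + s]
    return regs


WZYX = [3, 2, 1, 0]  # reverse the four lanes of each subvector

# Uninterrupted run over the whole Vector Loop (VL=2), no overlap.
whole = swizzle_move(list(range(32)), 0, 4, 16, WZYX, vl=2)

# Same operation, "interrupted" after element 0 and resumed at element 1.
resumed = list(range(32))
swizzle_move(resumed, 0, 4, 16, WZYX, vl=2, stop=1)
swizzle_move(resumed, 0, 4, 16, WZYX, vl=2, start=1)
```

Here `overlaps(0, 4, 16, 4, 2)` is false and the two runs agree, whereas an in-place request such as `overlaps(0, 4, 0, 4, 2)` is true, which is the case the new text declares `UNDEFINED`.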