From 88a7f2521f6ebc89eb612cc9e2187c8106ffd373 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 12 Jun 2022 13:20:20 +0100
Subject: [PATCH]

---
 openpower/sv/mv.swizzle.mdwn | 41 ++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/openpower/sv/mv.swizzle.mdwn b/openpower/sv/mv.swizzle.mdwn
index 8a5305870..4095f02e5 100644
--- a/openpower/sv/mv.swizzle.mdwn
+++ b/openpower/sv/mv.swizzle.mdwn
@@ -31,21 +31,21 @@ registers:
 
 | swizzle name | source | dest | half |
 |-- | -- | -- | -- |
-| X | RS | RT | lo-half |
-| Y | RS | RT | hi-half |
-| Z | RS+1 | RT+1 | lo-half |
-| W | RS+1 | RT+1 | hi-half |
+| X | RA | RT | lo-half |
+| Y | RA | RT | hi-half |
+| Z | RA+1 | RT+1 | lo-half |
+| W | RA+1 | RT+1 | hi-half |
 
-When RS=RT (in-place swizzle) any portion of RT not covered by
+When RA=RT (in-place swizzle) any portion of RT not covered by
 the Swizzle is unmodified. For example a Swizzle of "..XY"
-will copy the contents RS+1 into RT but leave RT+1 unmodified.
+will copy the contents of RA+1 into RT but leave RT+1 unmodified.
 
-When RS!=RT any part of RT or RT+1 not set as a destination by
+When RA!=RT any part of RT or RT+1 not set as a destination by
 the Swizzle will be set to zero. A Swizzle of "..XY" would
-copy the contents RS+1 into RT, but set RT+1 to zero.
+copy the contents of RA+1 into RT, but set RT+1 to zero.
 
 Also, making life easier, RT and RA are only permitted to be even
-(no overlapping can occur). This makes RT (and RS) a "pair" exactly
+(no overlapping can occur). This makes RT (and RA) a "pair" exactly
 like `lq` and `stq`
 
 **SVP64 Vectorised**
@@ -58,12 +58,31 @@ would be impractical in a smaller Scalar-only Micro-architecture.
 Therefore the restriction imposed on the Scalar `mv.swiz` to 32-bit
 quantities as the default is lifted on `sv.mv.swiz`.
 
+Additionally, in order to make life easier for implementers, some of
+whom may wish, especially for Embedded GPUs, to use multi-cycle Micro-coding,
+the usual strict Element-level Program Order is relaxed, but only for
+Horizontal-First Mode:
+
+* In Horizontal-First Mode, any overlap between any Vectorised source
+  Elements and any destination Elements across the entirety of
+  the Vector Loop `0..VL-1` is `UNDEFINED` behaviour.
+* In Vertical-First Mode, any overlap within a single execution of
+  the Swizzle instruction requires that all Swizzled source elements be
+  copied into intermediary buffers (in-flight Reservation Stations,
+  pipeline registers) **before** being swapped and placed in
+  destinations. Strict Program Order is required in full.
+
+*Implementor's note: in Vertical-First Mode, the cost of storing four
+64-bit in-flight elements may be too high for an Embedded design. If
+so, it is acceptable to throw an Illegal Instruction Trap.
+See [[sv/compliancy_levels]]*
+
 # Format
 
 | 0.5 |6.10|11.15|16.27|28.31| name |
 |-----|----|-----|-----|-----|--------------|
-|PO | RTp| RSp |imm | 0011| mv.swiz |
-|PO | RTp| RSp |imm | 1011| fmv.swiz |
+|PO | RTp| RAp |imm | 0011| mv.swiz |
+|PO | RTp| RAp |imm | 1011| fmv.swiz |
 
 this gives a 12 bit immediate across bits 16 to 27.
 Each swizzle mnemonic (XYZW), commonly known from 3D GPU programming,
-- 
2.30.2
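The register-pair and 32-bit-half rules in this patch can be modelled in a few lines. The following is a minimal Python sketch under stated assumptions, not the reference pseudocode. It assumes 64-bit GPRs held in a list. It represents a swizzle as a hypothetical dict that maps each destination name (X, Y, Z, W) to a source name, rather than decoding the 12-bit immediate. The names `mv_swiz`, `SLOTS`, `get_half`, `set_half` and `gpr` are illustrative only. The sketch reads every selected source half before writing any destination, so an in-place (RA=RT) swizzle such as an X/Y swap is well defined.

```python
# Hypothetical model of the scalar mv.swiz rules described in the patch.
# Assumption: swizzles are given as a dict {dest_name: source_name}
# instead of being decoded from the 12-bit immediate.

MASK32 = 0xFFFF_FFFF

# (register offset, half) for each swizzle name, per the table:
# X = reg lo-half, Y = reg hi-half, Z = reg+1 lo-half, W = reg+1 hi-half
SLOTS = {"X": (0, 0), "Y": (0, 1), "Z": (1, 0), "W": (1, 1)}


def get_half(value: int, half: int) -> int:
    """Return the lo (half=0) or hi (half=1) 32 bits of a 64-bit value."""
    return (value >> (32 * half)) & MASK32


def set_half(value: int, half: int, data: int) -> int:
    """Return value with its lo or hi 32 bits replaced by data."""
    shift = 32 * half
    return (value & ~(MASK32 << shift)) | ((data & MASK32) << shift)


def mv_swiz(gpr: list, rt: int, ra: int, mapping: dict) -> None:
    """Apply a swizzle from the pair (RA, RA+1) to the pair (RT, RT+1)."""
    if rt % 2 or ra % 2:
        raise ValueError("RT and RA must be even (register pairs)")
    # Capture every selected source half before writing anything, so an
    # in-place swizzle (RA == RT) always sees the original values.
    captured = {
        dest: get_half(gpr[ra + SLOTS[src][0]], SLOTS[src][1])
        for dest, src in mapping.items()
    }
    if ra != rt:
        # Out-of-place: any part of RT/RT+1 not written below becomes zero.
        gpr[rt] = 0
        gpr[rt + 1] = 0
    # In-place: halves not named in mapping are simply left unmodified.
    for dest, data in captured.items():
        offset, half = SLOTS[dest]
        gpr[rt + offset] = set_half(gpr[rt + offset], half, data)


gpr = [0] * 8
gpr[2] = 0x1111_1111_2222_2222  # RA:   hi=0x11111111, lo=0x22222222
gpr[3] = 0x3333_3333_4444_4444  # RA+1: hi=0x33333333, lo=0x44444444
gpr[4] = gpr[5] = 0xFFFF_FFFF_FFFF_FFFF

# In-place: swap the two halves of RA; RT+1 (= RA+1) is untouched.
mv_swiz(gpr, rt=2, ra=2, mapping={"X": "Y", "Y": "X"})
# gpr[2] is now 0x2222_2222_1111_1111, gpr[3] unchanged

# Out-of-place: copy the lo-half of RA+1 into the lo-half of RT.
mv_swiz(gpr, rt=4, ra=2, mapping={"X": "Z"})
# gpr[4] is now 0x4444_4444, gpr[5] is now 0
```

Because RT and RA must both be even, RA!=RT implies the two register pairs are disjoint. The out-of-place path can therefore zero RT and RT+1 without clobbering a source.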