From 48728cb99759525f285e80c32d0fe519734d36ab Mon Sep 17 00:00:00 2001
From: lkcl
Date: Tue, 14 Jun 2022 16:24:28 +0100
Subject: [PATCH]

---
 openpower/sv/mv.swizzle.mdwn | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/openpower/sv/mv.swizzle.mdwn b/openpower/sv/mv.swizzle.mdwn
index 0426f45da..d18899d44 100644
--- a/openpower/sv/mv.swizzle.mdwn
+++ b/openpower/sv/mv.swizzle.mdwn
@@ -108,8 +108,12 @@ copy the contents RA+1 into RT, but set RT+1 to zero.
 
 Also, making life easier, RT and RA are only permitted to be even
 (no overlapping can occur). This makes RT (and RA) a "pair" exactly
-as in `lq` and `stq`. Swizzle instructions must be atomically indivisible:
-an Exception or Interrupt may not occur during the pair of Moves.
+as in `lq` and `stq`. Scalar Swizzle instructions must be atomically
+indivisible: an Exception or Interrupt may not occur during the Moves.
+
+Note that unlike the Vectorised variant, when `RT=RA` the Scalar variant
+*must* buffer (read) both 64-bit RA registers before writing to the
+RT pair. This ensures that register file corruption does not occur.
 
 **SVP64 Vectorised**
 
@@ -139,7 +143,8 @@ the Vector Loop `0..VL-1` is `UNDEFINED` behaviour.
 This in turn implies that Traps and Exceptions are, as usual,
 permitted in between element-level moves, because due to there
 being no overlap there is no risk of destroying a source with
-an overwrite.
+an overwrite. This is *unlike* the Scalar variant which, when
+`RT=RA`, must buffer both halves of the RT pair.
 
 Determining the source and destination subvector lengths is tricky.
 Swizzle Pseudocode:
@@ -162,7 +167,8 @@ source and destination subvector lengths, by exploiting redundancy in
 the Swizzle Immediate. With the Swizzles marking what goes into
 each destination position, the marker "0b001" may be used to indicate
 the end. If no marker is present then the destination subvector length
-may be assumed to be 4.
+may be assumed to be 4. SUBVL is considered to be the "source" subvector
+length.
 
 ```
 def index_src():
-- 
2.30.2
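
Reviewer note: the `RT=RA` buffering requirement added in the first hunk can be illustrated with a minimal Python sketch. The function names, the `picks` tuple and the list-based register file are hypothetical and chosen only for illustration. They are not the Swizzle Immediate encoding or the page's pseudocode.

```python
# Hypothetical model: regs is a list of 64-bit integer registers.
# picks[i] selects what goes into RT+i: 0 or 1 chooses a half of the
# RA pair, None writes zero. This is an illustrative stand-in for a
# swizzle, not the real immediate encoding.

def pair_move_buffered(regs, rt, ra, picks):
    """Read both halves of the RA pair first, then write the RT pair."""
    assert rt % 2 == 0 and ra % 2 == 0  # pairs must start on even registers
    src = (regs[ra], regs[ra + 1])      # buffer (read) both source registers
    for i, pick in enumerate(picks):
        regs[rt + i] = 0 if pick is None else src[pick]

def pair_move_unbuffered(regs, rt, ra, picks):
    """Read each source lazily: unsafe when RT == RA."""
    assert rt % 2 == 0 and ra % 2 == 0
    for i, pick in enumerate(picks):
        regs[rt + i] = 0 if pick is None else regs[ra + pick]

initial = [0] * 8
initial[4], initial[5] = 0xAAAA, 0xBBBB

# Swap the two halves in place (RT == RA == 4).
good = list(initial)
pair_move_buffered(good, 4, 4, (1, 0))

bad = list(initial)
pair_move_unbuffered(bad, 4, 4, (1, 0))

# "Copy RA+1 into RT, but set RT+1 to zero", also with RT == RA.
zeroed = list(initial)
pair_move_buffered(zeroed, 4, 4, (1, None))
```

In this sketch the unbuffered version overwrites register 4 before it is read as the source for register 5, so the original value of register 4 is lost. That is the register file corruption the added paragraph rules out.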