From cfcf0bfc5f63e5f30193ca54a52be023a3b187fa Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 12 Jun 2022 17:07:52 +0100
Subject: [PATCH]

---
 openpower/sv/mv.swizzle.mdwn | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/openpower/sv/mv.swizzle.mdwn b/openpower/sv/mv.swizzle.mdwn
index 0c9e66c11..b54ceac5a 100644
--- a/openpower/sv/mv.swizzle.mdwn
+++ b/openpower/sv/mv.swizzle.mdwn
@@ -17,6 +17,15 @@ A compromise is to provide a Swizzle "Move".
 The encoding for this instruction embeds static predication into the
 swizzle as well as constants 1/1.0 and 0/0.0
 
+Although conceptually similar to `vpermd` of Packed SIMD VSX,
+Swizzle Moves come in immediate-only form with only up to four
+selectors, where VSX refers to individual bytes and may not
+copy constants to the destination.
+3D Shader programs commonly use the letters "XYZW"
+when referring to the four swizzle indices, and also often
+use the letters "RGBA"
+if referring to pixel data.
+
 # Format
 
 | 0.5 |6.10|11.15|16.27|28.31| name | Form |
@@ -32,6 +41,7 @@ to each:
 | imm |0.2 |3.5 |6.8|9.11|
 |-------|----|----|---|----|
 |swizzle|X | Y | Z | W |
+|pixel |R | G | B | A |
 |index |0 | 1 | 2 | 3 |
 
 the options for each Swizzle are:
@@ -70,7 +80,7 @@ mv.swiz and fmv.swiz operate on four 32-bit quantities,
 reducing this instruction to 2-in, 2-out
 pairs of 64-bit registers:
 
-| swizzle name | source | dest | half |
+| swizzle name | source | dest | half |
 |-- | -- | -- | -- |
 | X | RA | RT | lo-half |
 | Y | RA | RT | hi-half |
@@ -92,6 +102,11 @@ an Exception or Interrupt may not occur during the pair of Moves.
 
 **SVP64 Vectorised**
 
+Vectorised Swizzle may be considered to be an extended static predicate
+mask for subvectors (SUBVL=2/3/4). SUBVL (and SRC_SUBVL, see later section)
+must be set in order to aid in determining source and destination subvector
+lengths.
+
 When Vectorised, given the use-case is for a High-performance GPU,
 the fundamental assumption is that Micro-coding or
 other technique will
@@ -130,6 +145,12 @@ permitted to be serviced. Out-of-Order Micro-architectures may of course
 cancel the in-flight instruction as usual if the Interrupt requires
 fast servicing.
 
+Swizzle Pseudocode (when SRC_SUBVL=SUBVL):
+
+```
+```
+
+
 # RM Mode Concept:
 
 MVRM-2P-2S1D:
@@ -171,9 +192,14 @@ For a separate source/dest SUBVL (again, no elwidth overrides):
             for i in range(VL):
                 yield i+VL*j
 
-    # walk through both source and dest indices simultaneously
-    for src_idx, dst_idx in zip(index_src(), index_dst()):
-        move_operation(RT+dst_idx, RA+src_idx)
+    # inner looping when SUBVLs are equal
+    if SRC_SUBVL == SUBVL:
+        for idx in index():
+            move_operation(RT+idx, RA+idx)
+    else:
+        # walk through both source and dest indices simultaneously
+        for src_idx, dst_idx in zip(index_src(), index_dst()):
+            move_operation(RT+dst_idx, RA+src_idx)
 
 "yield" from python is used here for simplicity and clarity.
 The two Finite State Machines for the generation of the source
-- 
2.30.2
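
The "extended static predicate mask" view of a Vectorised Swizzle described in this patch can be illustrated with a small model. What follows is only a sketch under stated assumptions, not the specification's pseudocode: the selector names (`"X"`, `"Y"`, `"Z"`, `"W"`, `"0"`, `"1"`, `"-"`), the function `swizzle_move`, the flat `regs` list and the element-packed layout (`i*SUBVL+j`) are all hypothetical and chosen for readability; the real 3-bit selector encodings are not shown here. It covers only the SRC_SUBVL=SUBVL case, with no elwidth overrides.

```python
# Hypothetical selector names: the actual 3-bit encodings are not given here.
SOURCE_INDEX = {"X": 0, "Y": 1, "Z": 2, "W": 3}
CONSTANTS = {"0": 0, "1": 1}
SKIP = "-"  # assumed marker: destination element is left untouched


def swizzle_move(regs, rt, ra, swizzle, vl, subvl):
    """Model a swizzle move over VL subvectors of SUBVL elements each.

    Assumes SRC_SUBVL == SUBVL and an element-packed layout, where
    element j of subvector i lives at offset i*subvl + j.
    """
    if len(swizzle) != subvl:
        raise ValueError("one selector per destination element is required")
    for sel in swizzle:
        if sel in SOURCE_INDEX and SOURCE_INDEX[sel] >= subvl:
            raise ValueError(f"selector {sel} is outside a SUBVL={subvl} source")

    result = list(regs)  # leave the caller's register file unmodified
    for i in range(vl):
        base = i * subvl
        # Read the whole source subvector first, so overlapping RT/RA is safe.
        src = [regs[ra + base + k] for k in range(subvl)]
        for j, sel in enumerate(swizzle):
            if sel == SKIP:
                continue  # static predication: this element is masked out
            if sel in CONSTANTS:
                result[rt + base + j] = CONSTANTS[sel]
            else:
                result[rt + base + j] = src[SOURCE_INDEX[sel]]
    return result


# Two 4-element subvectors at RA=8, moved to RT=0 with swizzle "W Y 1 -".
regs = [0] * 8 + [10, 11, 12, 13, 20, 21, 22, 23]
out = swizzle_move(regs, rt=0, ra=8, swizzle=["W", "Y", "1", "-"], vl=2, subvl=4)
print(out[:8])  # [13, 11, 1, 0, 23, 21, 1, 0]
```

The fourth destination element of each subvector keeps its old value (0 here), which is the sense in which the swizzle doubles as a predicate mask over the subvector.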