From de442bcfa03103954b50c005c8eafac22ecce13c Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 12 Jun 2022 14:33:47 +0100
Subject: [PATCH]

---
 openpower/sv/mv.swizzle.mdwn | 53 ++++++++++++++++++++++++++++++++----
 1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/openpower/sv/mv.swizzle.mdwn b/openpower/sv/mv.swizzle.mdwn
index 9e747abc0..d4c141251 100644
--- a/openpower/sv/mv.swizzle.mdwn
+++ b/openpower/sv/mv.swizzle.mdwn
@@ -121,12 +121,13 @@ the instruction in software. Performance will obviously be adversely affected.
 See [[sv/compliancy_levels]]: all aspects of Swizzle are entirely optional
 in hardware at the Embedded Level.*
 
-Implementors must consider `SUBVL` to have been implicitly set by
-the Swizzle instructions. Hardware may statically calculate `SUBVL`
-from the immediate. "W.0Z" is SUBVL=4, where "X0Z." is SUBVL=3,
-and ".W.." sets SUBVL=2. Setting `SUBVL` has a different meaning
-in Swizzle Move instructions,
-as explained below.
+Implementors must consider Swizzle instructions to be atomically indivisible,
+even if implemented as Micro-coded. The rest of SVP64 permits element-level
+operations to be Precise-Interrupted: *Swizzle moves do not*. All XYZW
+elements *must* be completed in full before any Trap or Interrupt is
+permitted
+to be serviced. Out-of-Order Micro-architectures may of course cancel
+the in-flight instruction as usual if the Interrupt requires fast servicing.
 
 # RM Mode Concept:
 
@@ -143,3 +144,43 @@ The inclusion of a separate src SUBVL allows
 `sv.mv.swiz RT.vecN RA.vecN` to mean zip/unzip (pack/unpack).
 This is conceptually achieved by having both source and destination
 SUBVL be "outer" loops instead of inner loops.
+
+Illustrating a
+"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):
+
+    def index():
+        for i in range(VL):
+            for j in range(SUBVL):
+                yield i*SUBVL+j
+
+    for idx in index():
+        operation_on(RA+idx)
+
+For a separate source/dest SUBVL (again, no elwidth overrides):
+
+    # yield an outer-SUBVL, inner VL loop with SRC SUBVL
+    def index_src():
+        for j in range(SRC_SUBVL):
+            for i in range(VL):
+                yield i*SRC_SUBVL+j
+
+    # yield an outer-SUBVL, inner VL loop with DEST SUBVL
+    def index_dest():
+        for j in range(SUBVL):
+            for i in range(VL):
+                yield i*SUBVL+j
+
+    # walk through both source and dest indices simultaneously
+    for src_idx, dst_idx in zip(index_src(), index_dest()):
+        move_operation(RT+dst_idx, RA+src_idx)
+
+"yield" from Python is used here for simplicity and clarity.
+The two Finite State Machines for the generation of the source
+and destination element offsets progress incrementally in
+lock-step.
+
+Although not prohibited, it is not expected that
+Software would set both source and destination SUBVL at the
+same time. Usually, either `SRC_SUBVL=1, SUBVL=2/3/4` to give
+a "pack" effect, or `SUBVL=1, SRC_SUBVL=2/3/4` to give an
+"unpack".
-- 
2.30.2
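
The index generators added by this patch can be exercised directly. The following is a minimal, runnable sketch and not part of the patch: the function names `normal_index`, `outer_index` and `move_offsets` are hypothetical, and `VL`, `SUBVL` and `SRC_SUBVL` are passed as plain arguments rather than read from machine state. It only reproduces the element-offset ordering of the pseudocode above, with no elwidth overrides and no register file.

```python
def normal_index(vl, subvl):
    """Inner-SUBVL ordering used by a "normal" SVP64 operation."""
    for i in range(vl):
        for j in range(subvl):
            yield i * subvl + j


def outer_index(vl, subvl):
    """Outer-SUBVL, inner-VL ordering used for both src and dest offsets."""
    for j in range(subvl):
        for i in range(vl):
            yield i * subvl + j


def move_offsets(vl, src_subvl, subvl):
    """Pair up (src, dest) offsets in lock-step, as the patch's zip() does.

    Note: Python's zip() stops at the shorter sequence. Whether hardware
    terminates the same way is an assumption, not something the patch states.
    """
    return list(zip(outer_index(vl, src_subvl), outer_index(vl, subvl)))


if __name__ == "__main__":
    print(list(normal_index(2, 3)))  # [0, 1, 2, 3, 4, 5]
    print(list(outer_index(2, 3)))   # [0, 3, 1, 4, 2, 5]
    print(move_offsets(2, 1, 3))     # [(0, 0), (1, 3)]
```

With `VL=2, SUBVL=3` the outer-loop form visits offsets in the transposed order `0, 3, 1, 4, 2, 5`, which is the reordering that the zip/unzip description relies on.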