From dea7b7cb09a046e37593b9bf4577b8a482813d65 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 12 Jun 2022 23:16:55 +0100
Subject: [PATCH]

---
 openpower/sv/mv.vec.mdwn | 57 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/openpower/sv/mv.vec.mdwn b/openpower/sv/mv.vec.mdwn
index ec398bcb8..e13e4294c 100644
--- a/openpower/sv/mv.vec.mdwn
+++ b/openpower/sv/mv.vec.mdwn
@@ -69,7 +69,9 @@ There exists room within the `svshape` instruction of [[sv/remap]] to request
 some alternative Matrix mappings, and there is also room within the
 reserved bits of `svremap` as well.
 
-# RM Mode Concept:
+# RM Pack/unpack
+
+Similar to [[sv/mv.swizzle]]
 
 MVRM-2P-2S1D:
 
@@ -80,6 +82,53 @@ MVRM-2P-2S1D:
 | src_SUBVL | `14:15` | SUBVL for Source |
 | MASK_SRC | `16:18` | Execution Mask for Source |
 
-The inclusion of a separate src SUBVL would allow either
-`sv.mv RT.vecN RA.vecN` to mean contiguous sequential copy
-or it could mean zip/unzip (pack/unpack).
+The inclusion of a separate src SUBVL allows
+`sv.mv.swiz RT.vecN RA.vecN` to mean zip/unzip (pack/unpack).
+This is conceptually achieved by having both source and
+destination SUBVL be "outer" loops instead of inner loops.
+
+Illustrating a
+"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):
+
+    def index():
+        for i in range(VL):
+            for j in range(SUBVL):
+                yield i*SUBVL+j
+
+    for idx in index():
+        operation_on(RA+idx)
+
+For a separate source/dest SUBVL (again, no elwidth overrides):
+
+    # yield an outer-SUBVL, inner VL loop with SRC SUBVL
+    def index_src():
+        for j in range(SRC_SUBVL):
+            for i in range(VL):
+                yield i+VL*j
+
+    # yield an outer-SUBVL, inner VL loop with DEST SUBVL
+    def index_dest():
+        for j in range(SUBVL):
+            for i in range(VL):
+                yield i+VL*j
+
+    # inner looping when SUBVLs are equal
+    if SRC_SUBVL == SUBVL:
+        for idx in index():
+            move_operation(RT+idx, RA+idx)
+    else:
+        # walk through both source and dest indices simultaneously
+        for src_idx, dst_idx in zip(index_src(), index_dest()):
+            move_operation(RT+dst_idx, RA+src_idx)
+
+"yield" from python is used here for simplicity and clarity.
+The two Finite State Machines for the generation of the source
+and destination element offsets progress incrementally in
+lock-step.
+
+Normal usage, `SRC_SUBVL=1, SUBVL=2/3/4` gives
+a "pack" effect, and `SUBVL=1, SRC_SUBVL=2/3/4` gives an
+"unpack". Setting both SUBVL and SRC_SUBVL to greater than
+1 will, unlike [[sv/mv.swizzle]], produce defined deterministic results,
+even if a little hard to understand. Loops run through
+`MIN(SUBVL, SRC_SUBVL) * VL` elements.
-- 
2.30.2
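
For anyone wanting to experiment with the index generation in this patch, the following is a minimal runnable sketch of the pseudocode, not a reference implementation. It assumes `VL`, `SUBVL` and `SRC_SUBVL` are passed in as plain integers, and the helper names `index_outer` and `mv_pairs` are hypothetical (they do not appear in the page). Instead of performing a move, it returns the `(dest_offset, src_offset)` pairs that `move_operation` would receive.

```python
def index(vl, subvl):
    """Normal SVP64 ordering: VL is the outer loop, SUBVL the inner loop."""
    for i in range(vl):
        for j in range(subvl):
            yield i * subvl + j


def index_outer(vl, subvl):
    """Ordering from the patch: SUBVL is the outer loop, VL the inner loop.

    Hypothetical helper standing in for both index_src() and index_dest(),
    which differ only in which SUBVL they are given.
    """
    for j in range(subvl):
        for i in range(vl):
            yield i + vl * j


def mv_pairs(vl, subvl, src_subvl):
    """Return the (dest_offset, src_offset) pairs in the order they are issued."""
    if src_subvl == subvl:
        # equal SUBVLs: ordinary inner looping, same offset on both sides
        return [(idx, idx) for idx in index(vl, subvl)]
    # zip() stops at the shorter generator, which is where the
    # MIN(SUBVL, SRC_SUBVL) * VL element count comes from
    return list(zip(index_outer(vl, subvl), index_outer(vl, src_subvl)))
```

Calling `mv_pairs(4, 2, 1)` corresponds to `VL=4, SUBVL=2, SRC_SUBVL=1` and returns four pairs, matching `MIN(SUBVL, SRC_SUBVL) * VL`. Note that, taken literally, both outer-loop generators enumerate offsets 0, 1, 2, ... in ascending order, so in this sketch only the number of elements differs between the source and destination sides.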