See [[sv/compliancy_levels]]: all aspects of
Swizzle are entirely optional in hardware at the Embedded Level.*
-Implementors must consider `SUBVL` to have been implicitly set by
-the Swizzle instructions. Hardware may statically calculate `SUBVL`
-from the immediate. "W.0Z" is SUBVL=4, where "X0Z." is SUBVL=3,
-and ".W.." sets SUBVL=2. Setting `SUBVL` has a different meaning
-in Swizzle Move instructions,
-as explained below.
+Implementors must consider Swizzle instructions to be atomically indivisible,
+even if implemented in Micro-code. The rest of SVP64 permits element-level
+operations to be Precise-Interrupted: *Swizzle moves do not permit this*. All XYZW
+elements *must* be completed in full before any Trap or Interrupt is
+permitted to be serviced. Out-of-Order Micro-architectures may of course cancel
+the in-flight instruction as usual if the Interrupt requires fast servicing.
# RM Mode Concept:
`sv.mv.swiz RT.vecN RA.vecN` to mean zip/unzip (pack/unpack).
This is conceptually achieved by having both source and
destination SUBVL be "outer" loops instead of inner loops.
+
+Illustrating a
+"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):
+
+    def index():
+        for i in range(VL):
+            for j in range(SUBVL):
+                yield i*SUBVL+j
+
+    for idx in index():
+        operation_on(RA+idx)
+
+For a separate source/dest SUBVL (again, no elwidth overrides):
+
+    # yield an outer-SUBVL, inner VL loop with SRC SUBVL
+    def index_src():
+        for j in range(SRC_SUBVL):
+            for i in range(VL):
+                yield i*SRC_SUBVL+j
+
+    # yield an outer-SUBVL, inner VL loop with DEST SUBVL
+    def index_dest():
+        for j in range(SUBVL):
+            for i in range(VL):
+                yield i*SUBVL+j
+
+    # walk through both source and dest indices simultaneously
+    for src_idx, dst_idx in zip(index_src(), index_dest()):
+        move_operation(RT+dst_idx, RA+src_idx)
+
+Python's "yield" is used here for simplicity and clarity.
+The two Finite State Machines for the generation of the source
+and destination element offsets progress incrementally in
+lock-step.
+
+Although not prohibited, it is not expected that
+Software would set both source and destination SUBVL at the
+same time. Usually, either `SRC_SUBVL=1, SUBVL=2/3/4` to give
+a "pack" effect, or `SUBVL=1, SRC_SUBVL=2/3/4` to give an
+"unpack".
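
As a rough illustration only (not part of the patch), the difference between the "normal" inner-SUBVL order and the outer-SUBVL order described under RM Mode Concept can be seen by enumerating the element offsets for a small case. The function names `inner_subvl_offsets` and `outer_subvl_offsets` are hypothetical, chosen for this sketch, and `VL=3, SUBVL=2` is an arbitrary example with no elwidth overrides assumed:

```python
def inner_subvl_offsets(vl, subvl):
    """Normal SVP64 order: VL is the outer loop, SUBVL the inner loop."""
    for i in range(vl):
        for j in range(subvl):
            yield i * subvl + j


def outer_subvl_offsets(vl, subvl):
    """Order sketched for RM Mode Concept: SUBVL outer, VL inner."""
    for j in range(subvl):
        for i in range(vl):
            yield i * subvl + j


# Arbitrary illustrative values (assumption, not taken from the spec text)
VL, SUBVL = 3, 2

print(list(inner_subvl_offsets(VL, SUBVL)))  # [0, 1, 2, 3, 4, 5]
print(list(outer_subvl_offsets(VL, SUBVL)))  # [0, 2, 4, 1, 3, 5]
```

Both orders visit the same set of offsets; only the sequence differs. The outer-SUBVL order walks all first sub-elements, then all second sub-elements, and so on.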