For a separate source/dest SUBVL (again, no elwidth overrides):
- # yield an outer-SUBVL, inner VL loop with SRC SUBVL
- def index_src():
-     for j in range(SUBVL):
+ # yield an outer-SUBVL/inner-VL loop, or an outer-VL/inner-SUBVL loop, with DEST SUBVL
+ def index_dest(outer):
+     if outer:
+         for j in range(dst_subvl):
+             for i in range(VL):
+                 ....
+     else:
          for i in range(VL):
-             yield i+VL*j
+             for j in range(dst_subvl):
+                 ....
- # yield an outer-SUBVL, inner VL loop with DEST SUBVL
- def index_dest():
-     for j in range(dst_subvl):
+ # yield an outer-SUBVL/inner-VL loop, or an outer-VL/inner-SUBVL loop, with SRC SUBVL
+ def index_src(outer):
+     if outer:
+         for j in range(SUBVL):
+             for i in range(VL):
+                 ....
+     else:
          for i in range(VL):
-             yield i+VL*j
+             for j in range(SUBVL):
+                 ....
"yield" from python is used here for simplicity and clarity.
The two Finite State Machines for the generation of the source
  if PACK_en and UNPACK_en:
      num_runs = 1 # both are outer loops
  for substep in range(num_runs):
-     (src_idx, offs) = yield from index_src()
-     dst_idx = yield from index_dst()
+     (src_idx, offs) = yield from index_src(UNPACK_en)
+     dst_idx = yield from index_dest(PACK_en)
      move_operation(RT+dst_idx, RA+src_idx+offs)
```
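The index generators and the lock-step move loop above can be modelled as a small runnable sketch. It rests on assumptions the text does not state: source and destination share a single SUBVL (the separate `dst_subvl` case is not modelled), sub-element `j` of element `i` sits at offset `i*SUBVL + j`, and the elided `....` loop bodies yield that offset. The names `subvl_indices` and `pack_unpack` are hypothetical, and the `offs` term and the `RT`/`RA` register bases are left out.

```python
from typing import Iterator, List


def subvl_indices(vl: int, subvl: int, outer: bool) -> Iterator[int]:
    """Yield element offsets for a VL x SUBVL vector (hypothetical helper).

    Assumption: sub-element j of element i lives at offset i*subvl + j.
    outer=True:  SUBVL is the outer loop, VL the inner loop.
    outer=False: VL is the outer loop, SUBVL the inner loop.
    """
    if outer:
        for j in range(subvl):
            for i in range(vl):
                yield i * subvl + j
    else:
        for i in range(vl):
            for j in range(subvl):
                yield i * subvl + j


def pack_unpack(src: List[str], vl: int, subvl: int,
                pack_en: bool, unpack_en: bool) -> List[str]:
    """Hypothetical model of the move loop: the source walks an outer-SUBVL
    loop when unpack_en is set, the destination when pack_en is set."""
    dst: List[str] = [""] * (vl * subvl)
    src_idx = subvl_indices(vl, subvl, outer=unpack_en)
    dst_idx = subvl_indices(vl, subvl, outer=pack_en)
    # zip advances both generators one step per move, i.e. in lock-step
    for s, d in zip(src_idx, dst_idx):
        dst[d] = src[s]
    return dst


packed = ["x0", "y0", "z0", "x1", "y1", "z1"]
unpacked = pack_unpack(packed, vl=2, subvl=3, pack_en=False, unpack_en=True)
print(unpacked)  # ['x0', 'x1', 'y0', 'y1', 'z0', 'z1']
print(pack_unpack(unpacked, vl=2, subvl=3, pack_en=True, unpack_en=False))
# ['x0', 'y0', 'z0', 'x1', 'y1', 'z1']
```

In this model, unpacking gathers each sub-element lane into its own run, and packing reverses it. When both flags are set, source and destination walk the same outer-SUBVL order, so the move degenerates to a plain element-for-element copy.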