     if outer:
         for j in range(dst_subvl):
             for i in range(VL):
-                ....
+                yield j*VL+i
     else:
         for i in range(VL):
             for j in range(dst_subvl):
-                ....
+                yield i*dst_subvl+j
 # yield an outer-SUBVL or inner VL loop with SUBVL
 def index_src(outer):
     if outer:
         for j in range(SUBVL):
             for i in range(VL):
-                ....
+                yield j*VL+i
     else:
         for i in range(VL):
             for j in range(SUBVL):
-                ....
+                yield i*SUBVL+j
```
Python's "yield" (generator) syntax is used here for simplicity and clarity.
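
As a minimal runnable sketch of how such a generator is consumed, the `index_src` loop can be written as a standalone function. Passing `VL` and `SUBVL` as parameters, and the sample values 3 and 2, are assumptions made for this illustration only; the pseudocode above reads them from the enclosing scope.

```python
def index_src(outer, VL, SUBVL):
    """Yield flat element indices for a VL x SUBVL loop nest.

    VL and SUBVL are explicit parameters here (an assumption for this
    sketch); the pseudocode takes them from the surrounding context.
    """
    if outer:
        # outer-SUBVL: j (sub-element) is the outer loop, i varies fastest
        for j in range(SUBVL):
            for i in range(VL):
                yield j*VL + i
    else:
        # inner-SUBVL: i (element) is the outer loop, j varies fastest
        for i in range(VL):
            for j in range(SUBVL):
                yield i*SUBVL + j


# Hypothetical sample sizes, chosen only to keep the output short.
outer_order = list(index_src(True, VL=3, SUBVL=2))
inner_order = list(index_src(False, VL=3, SUBVL=2))
print(outer_order)  # [0, 1, 2, 3, 4, 5]
print(inner_order)  # [0, 1, 2, 3, 4, 5]
```

With these index formulas both modes walk the flat indices in ascending order. What differs is which of `i` or `j` varies fastest, and therefore which (element, sub-element) pair each flat index corresponds to.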