# RFC ls009 Simple-V REMAP Subsystem

* <https://libre-soc.org/openpower/sv/rfc/ls009/>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1042>
* <https://git.openpower.foundation/isa/PowerISA/issues/124>

**Date**: 26 Mar 2023. v2 TODO
**Books and Sections affected**:

* Book I, new Zero-Overhead-Loop Chapter.
* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

**Summary**:

* svremap - Re-Mapping of Register Element Offsets
* svindex - General-purpose setting of SHAPEs to be re-mapped
* svshape - Hardware-level setting of SHAPEs for element re-mapping
* svshape2 - Hardware-level setting of SHAPEs for element re-mapping (v2)
**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

* Addition of four new "Zero-Overhead-Loop-Control" DSP-style Vector-style
  Management Instructions which provide advanced features such as Matrix,
  FFT and DCT Hardware-Assist Schedules and general-purpose Index
  re-ordering.

**Impact on software**:

* Requires support for the new instructions in assemblers and debuggers.

**Keywords**:

Cray Supercomputing, Vectorisation, Zero-Overhead-Loop-Control (ZOLC),
Scalable Vectors, Multi-Issue Out-of-Order, Sequential Programming Model,
Digital Signal Processing (DSP)
These REMAP Management instructions provide state-of-the-art advanced
capabilities which dramatically decrease instruction count and reduce power
consumption, whilst retaining unprecedented general-purpose capability and a
standard Sequential Execution Model.
**Notes and Observations**:

1. Although costly, the alternatives in the SIMD paradigm result in huge
   software algorithmic complexity and associated increases in power
   consumption. Loop-unrolling compilers are prevalent, as are sequences of
   thousands to tens of thousands of instructions.
2. Core inner kernels of Matrix, DCT, DFT, FFT and NTT algorithms are
   dramatically reduced to a handful of instructions.
3. With the exception of Indexed REMAP, no REMAP instruction relies on
   registers for the establishment of REMAP capability.
4. Future EXT1xx variants and SVP64/VSX will dramatically expand the power
   and flexibility of REMAP.
5. The Simple-V Compliancy Subsets make REMAP optional except in the
   Advanced Levels. Like v3.1 MMA it is **not** necessary for **all**
   hardware to implement REMAP.
**Changes**:

Add the following entries to:

* the Appendices of SV Book
* Instructions of SV Book as a new Section
* SVI, SVM, SVM2, SVRM Forms of Book I Section 1.6.1.6 and 1.6.2

[[!inline pages="openpower/sv/remap" raw=yes ]]

Add `SVI, SVM, SVM2, SVRM` to the `XO (26:31)` Field in Book I, 1.6.2.
Add the following to Book I, 1.6.1, SVI-Form

|0     |6    |11   |16   |21 |23  |24|25|26    31|
| PO   | SVG |rmm  | SVd |ew |SVyx|mm|sk|   XO   |

Add the following to Book I, 1.6.1, SVM-Form

|0     |6     |11    |16    |21    |25 |26    31|
| PO   | SVxd | SVyd | SVzd | SVrm |vf |   XO   |

Add the following to Book I, 1.6.1, SVM2-Form

|0     |6   |10  |11   |16   |21 |24|25|26    31|
| PO   | SVo|SVyx| rmm | SVd |XO |mm|sk|   XO   |

Add the following to Book I, 1.6.1, SVRM-Form

|0     |6     |11  |13   |15   |17   |19   |21  |22  |26    31|
| PO   | SVme |mi0 | mi1 | mi2 | mo0 | mo1 |pst |/// |   XO   |
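The bit positions in the Forms above use IBM (big-endian) numbering, in which
bit 0 is the most-significant bit of the 32-bit word. As an illustrative
sketch only (the helper names `ibm_bits` and `decode_svrm` are not part of the
specification), the SVRM-Form fields may be extracted as follows:

```python
def ibm_bits(insn, start, end, width=32):
    """Extract bits start..end (inclusive) of an unsigned 'width'-bit
    value, using IBM/Power big-endian bit numbering (bit 0 is the MSB)."""
    nbits = end - start + 1
    shift = width - 1 - end
    return (insn >> shift) & ((1 << nbits) - 1)

def decode_svrm(insn):
    """Split a 32-bit SVRM-Form instruction word into its fields,
    using the boundaries from the layout above (sketch only)."""
    return {
        "PO":   ibm_bits(insn, 0, 5),
        "SVme": ibm_bits(insn, 6, 10),
        "mi0":  ibm_bits(insn, 11, 12),
        "mi1":  ibm_bits(insn, 13, 14),
        "mi2":  ibm_bits(insn, 15, 16),
        "mo0":  ibm_bits(insn, 17, 18),
        "mo1":  ibm_bits(insn, 19, 20),
        "pst":  ibm_bits(insn, 21, 21),
        "XO":   ibm_bits(insn, 26, 31),
    }
```

The same helper applies to the SVI, SVM and SVM2 layouts by substituting
their field boundaries.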
Add the following to Book I, 1.6.2

`mi0`

Field used in REMAP to select the SVSHAPE for the 1st input register

`mi1`

Field used in REMAP to select the SVSHAPE for the 2nd input register

`mi2`

Field used in REMAP to select the SVSHAPE for the 3rd input register

`mm`

Field used to specify the meaning of the rmm field for SVI-Form
and SVM2-Form

`mo0`

Field used in REMAP to select the SVSHAPE for the 1st output register

`mo1`

Field used in REMAP to select the SVSHAPE for the 2nd output register

`pst`

Field used in REMAP to indicate "persistence" mode (REMAP
continues to apply to multiple instructions)

`rmm`

REMAP Mode field for SVI-Form and SVM2-Form

`sk`

Field used to specify dimensional skipping in svindex

`SVd`

Immediate field used to specify the size of the REMAP dimension
in the svindex and svshape2 instructions

`SVDS`

Immediate field used to specify a 9-bit signed
two's complement integer which is concatenated
on the right with 0b00 and sign-extended to 64 bits.

`SVG`

Field used to specify a GPR to be used as a source for indexing

`SVi`

Simple-V immediate field for setting VL or MVL

`SVme`

Simple-V "REMAP" map-enable bits (0-4)

`SVo`

Field used by the svshape2 instruction as an offset

`SVrm`

Simple-V "REMAP" Mode

`SVxd`

Simple-V "REMAP" x-dimension size

`SVyd`

Simple-V "REMAP" y-dimension size

`SVzd`

Simple-V "REMAP" z-dimension size

`XO`

Extended opcode field. Note that bit 21 must be 1, 22 and 23
must be zero, and bits 26-31 must be exactly the same as used
for svshape.
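The concatenate-and-sign-extend rule in the 9-bit immediate field description
above can be sketched as follows (the function name is illustrative, not from
the specification):

```python
def decode_imm9_shifted(imm9):
    """Decode a 9-bit two's-complement immediate that is concatenated
    on the right with 0b00 and sign-extended to 64 bits (a sketch of
    the field description above)."""
    val = imm9 & 0x1FF               # keep the 9 immediate bits
    if val & 0x100:                  # sign bit set: value is negative
        val -= 0x200
    val <<= 2                        # concatenate 0b00 on the right
    return val & 0xFFFFFFFFFFFFFFFF  # as a 64-bit two's-complement value
```

The right-concatenation with 0b00 makes the effective value a multiple of 4,
in the same style as the Power ISA DS displacement field.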
Add the following entries to:

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| SVRM | I    | #    | 3.0B    | svremap  | REMAP enabling instruction |
| SVM  | I    | #    | 3.0B    | svshape  | REMAP shape instruction |
| SVM2 | I    | #    | 3.0B    | svshape2 | REMAP shape instruction (2) |
| SVI  | I    | #    | 3.0B    | svindex  | REMAP General-purpose Indexing |

[[!inline pages="openpower/sv/remap/appendix" raw=yes ]]
Written in python3, the following stand-alone executable source code is the
Canonical Specification for each REMAP. Vectors of "loopends" are returned
when Rc=1 in Vectors of CR Fields on `sv.svstep.`, or, in Vertical-First
Mode, in a single CR Field (CR0) on `svstep.`. The `SVSTATE.srcstep` or
`SVSTATE.dststep` sequential offset is put through each algorithm to
determine the actual Element Offset. Alternative implementations producing
different orderings are prohibited, as software will be critically relying
on these Deterministic Schedules.
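In other words a REMAP Schedule is simply a function from the sequential
`srcstep`/`dststep` to an Element Offset. A minimal sketch of that principle
(the helper name `apply_remap` is illustrative and not part of the
specification; any generator yielding `(offset, loopends)` tuples may be
passed in):

```python
def apply_remap(schedule, srcstep):
    """Return the Element Offset that a REMAP schedule produces for a
    given sequential step. 'schedule' is any iterator of
    (offset, loopends) tuples (sketch only)."""
    for step, (offset, loopends) in enumerate(schedule):
        if step == srcstep:
            return offset
    raise ValueError("srcstep out of range")

# with REMAP disabled the schedule is simply the identity:
# offset i at step i, with no loop-end flags
identity_schedule = ((i, 0) for i in range(8))
```

Hardware does not literally iterate from zero each time: the point is only
that the Element Offset is a deterministic function of the sequential step.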
### REMAP 2D/3D Matrix

The following stand-alone executable source code is the Canonical
Specification for Matrix (2D/3D) REMAP. Hardware implementations are
achievable with simple cascading counter-and-compares.
```
# python "yield" can be iterated. use this to make it clear how
# the indices are generated by using natural-looking nested loops
def iterate_indices(SVSHAPE):
    # get indices to iterate over, in the required order
    xd = SVSHAPE.lims[0]
    yd = SVSHAPE.lims[1]
    zd = SVSHAPE.lims[2]
    # create lists of indices to iterate over in each dimension
    x_r = list(range(xd))
    y_r = list(range(yd))
    z_r = list(range(zd))
    # invert the indices if needed
    if SVSHAPE.invxyz[0]: x_r.reverse()
    if SVSHAPE.invxyz[1]: y_r.reverse()
    if SVSHAPE.invxyz[2]: z_r.reverse()
    # start an infinite (wrapping) loop
    step = 0 # track src/dst step
    while True:
        for z in z_r:   # loop over 1st order dimension
            z_end = z == z_r[-1]
            for y in y_r:       # loop over 2nd order dimension
                y_end = y == y_r[-1]
                for x in x_r:           # loop over 3rd order dimension
                    x_end = x == x_r[-1]
                    # ok work out which order to construct things in.
                    # start by creating a list of tuples of the dimension
                    # limits and indices
                    vals = [(SVSHAPE.lims[0], x, "x"),
                            (SVSHAPE.lims[1], y, "y"),
                            (SVSHAPE.lims[2], z, "z")
                           ]
                    # now select those by order. this allows us to
                    # create schedules for [z][x], [x][y], or [y][z]
                    # for matrix multiply.
                    vals = [vals[SVSHAPE.order[0]],
                            vals[SVSHAPE.order[1]],
                            vals[SVSHAPE.order[2]]
                           ]
                    # ok now we can construct the result, using bits of
                    # "order" to say which ones get stacked on
                    result = 0
                    mult = 1
                    for i in range(3):
                        lim, idx, dbg = vals[i]
                        # some of the dimensions can be "skipped". the order
                        # was actually selected above on all 3 dimensions,
                        # e.g. [z][x][y] or [y][z][x]. "skip" allows one of
                        # those to be knocked out
                        if SVSHAPE.skip == i+1: continue
                        idx *= mult   # shifts up by previous dimension(s)
                        result += idx # adds on this dimension
                        mult *= lim   # for the next dimension

                    loopends = (x_end |
                               ((y_end and x_end)<<1) |
                               ((y_end and x_end and z_end)<<2))

                    yield result + SVSHAPE.offset, loopends
                    step += 1

def demo():
    # set the dimension sizes here
    xdim = 3
    ydim = 2
    zdim = 1

    # set total (can repeat, e.g. VL=x*y*z*4)
    VL = xdim * ydim * zdim

    # set up an SVSHAPE
    class SVSHAPE:
        pass
    SVSHAPE0 = SVSHAPE()
    SVSHAPE0.lims = [xdim, ydim, zdim]
    SVSHAPE0.order = [1,0,2]  # experiment with different permutations, here
    SVSHAPE0.skip = 0b00
    SVSHAPE0.offset = 0       # experiment with different offset, here
    SVSHAPE0.invxyz = [0,0,0] # inversion if desired

    # enumerate over the iterator function, getting new indices
    for idx, (new_idx, end) in enumerate(iterate_indices(SVSHAPE0)):
        if idx >= VL:
            break
        print ("%d->%d" % (idx, new_idx), "end", bin(end)[2:])

# run the demo
if __name__ == '__main__':
    demo()
```
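For example, with `xdim=3`, `ydim=2` and `order=[1,0,2]` the Matrix schedule
walks a 3x2 matrix column-first, producing the transposed Element Offsets.
The index arithmetic can be re-derived independently of the canonical
generator as a cross-check (a sketch only; the helper name `matrix_remap` is
not part of the specification, and the z dimension is fixed at 1):

```python
def matrix_remap(xd, yd, order):
    """Re-derive the 2D Matrix REMAP offsets (z dimension = 1) using
    plain index arithmetic: each offset is the sum of idx * (product
    of the limits of earlier dimensions), dimensions taken in 'order'."""
    lims = [xd, yd, 1]
    out = []
    for y in range(yd):          # 2nd order dimension
        for x in range(xd):      # 3rd order (innermost) dimension
            vals = [(lims[0], x), (lims[1], y), (lims[2], 0)]
            vals = [vals[order[0]], vals[order[1]], vals[order[2]]]
            result, mult = 0, 1
            for lim, idx in vals:
                result += idx * mult  # stack this dimension on
                mult *= lim           # scale for the next dimension
            out.append(result)
    return out
```

With the identity order `[0,1,2]` the offsets come out sequentially; with
`[1,0,2]` the same loop nest yields the transpose ordering.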
### REMAP Parallel Reduction pseudocode

The python3 program below is stand-alone executable and is the Canonical
Specification for Parallel Reduction REMAP. The Algorithm below is not
limited to RADIX2 sizes. Predicate sources, unlike in Matrix REMAP, apply
to the Element Indices **after** REMAP has been applied, not before.
MV operations are not required: the algorithm tracks the positions of
elements that would normally be moved, and when applying an Element
Reduction Operation sources the operands from their last-known (tracked)
position.
```
# a "yield" version of the Parallel Reduction REMAP algorithm.
# the algorithm is in-place. it does not perform "MV" operations.
# instead, where a masked-out value *should* be read from is tracked

def iterate_indices(SVSHAPE, pred=None):
    # get indices to iterate over, in the required order
    xd = SVSHAPE.lims[0]
    # create lists of indices to iterate over in each dimension
    ix = list(range(xd))
    # invert the indices if needed
    if SVSHAPE.invxyz[0]: ix.reverse()
    # start a loop from the lowest step
    step = 1
    steps = []
    while step < xd:
        step *= 2
        steps.append(step)
    # invert the indices if needed
    if SVSHAPE.invxyz[1]: steps.reverse()
    for step in steps:
        stepend = (step == steps[-1]) # note end of steps
        idxs = list(range(0, xd, step))
        results = []
        for i in idxs:
            other = i + step // 2
            ci = ix[i]
            oi = ix[other] if other < xd else None
            other_pred = other < xd and (pred is None or pred[oi])
            if (pred is None or pred[ci]) and other_pred:
                if SVSHAPE.skip == 0b00:   # submode 00
                    result = ci
                elif SVSHAPE.skip == 0b01: # submode 01
                    result = oi
                results.append([result + SVSHAPE.offset, 0])
            elif other_pred:
                ix[i] = oi # track the masked-out element's position
        if results:
            results[-1][1] = (stepend<<1) | 1 # notify end of loops
        yield from results

def demo():
    # set the dimension sizes here
    xdim = 9

    # set up the SVSHAPEs
    class SVSHAPE:
        pass
    # left-operand selection
    SVSHAPE0 = SVSHAPE()
    SVSHAPE0.lims = [xdim, 0, 0]
    SVSHAPE0.order = [0,1,2]
    SVSHAPE0.skip = 0b00
    SVSHAPE0.offset = 0       # experiment with different offset, here
    SVSHAPE0.invxyz = [0,0,0] # inversion if desired
    # right-operand selection
    SVSHAPE1 = SVSHAPE()
    SVSHAPE1.lims = [xdim, 0, 0]
    SVSHAPE1.order = [0,1,2]
    SVSHAPE1.skip = 0b01
    SVSHAPE1.offset = 0       # experiment with different offset, here
    SVSHAPE1.invxyz = [0,0,0] # inversion if desired

    # enumerate over the iterator function, getting new indices
    shapes = list(iterate_indices(SVSHAPE0)), \
             list(iterate_indices(SVSHAPE1))
    for idx in range(len(shapes[0])):
        (l_idx, lend) = shapes[0][idx]
        (r_idx, rend) = shapes[1][idx]
        print ("%d->%d:%d" % (idx, l_idx, r_idx),
               "end", bin(lend)[2:], bin(rend)[2:])

# run the demo
if __name__ == '__main__':
    demo()
```
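With no predication the schedule reduces n elements in ceil(log2(n))
doubling steps, pairing element i with element i+step/2 at each step. A
plain (non-REMAP) reference for those unpredicated operand pairs is useful
as a cross-check against the tracked schedule above (the helper name is
illustrative, not from the specification):

```python
def tree_reduce_pairs(n):
    """Operand index pairs for an unpredicated in-place binary-tree
    reduction of n elements: a non-REMAP reference sketch. n need not
    be a power of two; out-of-range partners are simply skipped."""
    pairs = []
    step = 1
    while step < n:
        step *= 2
        for i in range(0, n, step):
            other = i + step // 2
            if other < n:
                pairs.append((i, other))
    return pairs
```

An n-element reduction always performs exactly n-1 pairwise operations,
whatever the tree shape.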
### REMAP FFT pseudocode

The FFT REMAP is RADIX2 only.
```
# a "yield" version of the REMAP algorithm, for FFT Tukey-Cooley schedules
# original code for the FFT Tukey-Cooley schedule:
# https://www.nayuki.io/res/free-small-fft-in-multiple-languages/fft.py
"""
    # Radix-2 decimation-in-time FFT (real, not complex)
    size = 2
    while size <= n:
        halfsize = size // 2
        tablestep = n // size
        for i in range(0, n, size):
            k = 0
            for j in range(i, i + halfsize):
                jl, jh = j, j + halfsize
                temp1 = vec[jh] * exptable[k]
                temp2 = vec[jl]
                vec[jh] = temp2 - temp1
                vec[jl] = temp2 + temp1
                k += tablestep
        size *= 2
"""

# python "yield" can be iterated. use this to make it clear how
# the indices are generated by using natural-looking nested loops
def iterate_butterfly_indices(SVSHAPE):
    # get indices to iterate over, in the required order
    n = SVSHAPE.lims[0]
    stride = SVSHAPE.lims[2] # stride-multiplier on reg access
    # creating lists of indices to iterate over in each dimension
    # has to be done dynamically, because it depends on the size
    # first, the size-based loop (which can be done statically)
    x_r = []
    size = 2
    while size <= n:
        x_r.append(size)
        size *= 2
    # invert order if requested
    if SVSHAPE.invxyz[0]: x_r.reverse()

    if len(x_r) == 0:
        return

    # start an infinite (wrapping) loop
    while True:
        for size in x_r:           # loop over 3rd order dimension (size)
            x_end = size == x_r[-1]
            # y_r schedule depends on size
            halfsize = size // 2
            tablestep = n // size
            y_r = []
            for i in range(0, n, size):
                y_r.append(i)
            # invert if requested
            if SVSHAPE.invxyz[1]: y_r.reverse()
            for i in y_r:       # loop over 2nd order dimension
                y_end = i == y_r[-1]
                # two lists of indices: one for vec[j], one for exptable[k]
                j_r = []
                k_r = []
                k = 0
                for j in range(i, i+halfsize):
                    j_r.append(j)
                    k_r.append(k)
                    k += tablestep
                # invert if requested
                if SVSHAPE.invxyz[2]: k_r.reverse()
                if SVSHAPE.invxyz[2]: j_r.reverse()
                for j, k in zip(j_r, k_r):   # loop over 1st order dimension
                    z_end = j == j_r[-1]
                    # now depending on MODE return the index
                    if SVSHAPE.skip == 0b00:
                        result = j              # for vec[j]
                    elif SVSHAPE.skip == 0b01:
                        result = j + halfsize   # for vec[j+halfsize]
                    elif SVSHAPE.skip == 0b10:
                        result = k              # for exptable[k]

                    loopends = (z_end |
                               ((y_end and z_end)<<1) |
                               ((y_end and x_end and z_end)<<2))

                    yield (result * stride) + SVSHAPE.offset, loopends

def demo():
    # set the dimension sizes here
    n = xdim = 8
    ydim = 0 # not needed
    zdim = 1 # stride must be set to 1

    # set total. err don't know how to calculate how many there are...
    # do it manually for now
    VL = 0
    size = 2
    while size <= n:
        halfsize = size // 2
        tablestep = n // size
        for i in range(0, n, size):
            for j in range(i, i + halfsize):
                VL += 1
        size *= 2

    # set up the SVSHAPEs
    class SVSHAPE:
        pass
    # j schedule
    SVSHAPE0 = SVSHAPE()
    SVSHAPE0.lims = [xdim, ydim, zdim]
    SVSHAPE0.order = [0,1,2]  # experiment with different permutations, here
    SVSHAPE0.skip = 0b00
    SVSHAPE0.offset = 0       # experiment with different offset, here
    SVSHAPE0.invxyz = [0,0,0] # inversion if desired
    # j+halfstep schedule
    SVSHAPE1 = SVSHAPE()
    SVSHAPE1.lims = [xdim, ydim, zdim]
    SVSHAPE1.order = [0,1,2]  # experiment with different permutations, here
    SVSHAPE1.skip = 0b01
    SVSHAPE1.offset = 0       # experiment with different offset, here
    SVSHAPE1.invxyz = [0,0,0] # inversion if desired
    # k schedule
    SVSHAPE2 = SVSHAPE()
    SVSHAPE2.lims = [xdim, ydim, zdim]
    SVSHAPE2.order = [0,1,2]  # experiment with different permutations, here
    SVSHAPE2.skip = 0b10
    SVSHAPE2.offset = 0       # experiment with different offset, here
    SVSHAPE2.invxyz = [0,0,0] # inversion if desired

    # enumerate over the iterator function, getting new indices
    schedule = []
    for idx, (jl, jh, k) in enumerate(zip(iterate_butterfly_indices(SVSHAPE0),
                                          iterate_butterfly_indices(SVSHAPE1),
                                          iterate_butterfly_indices(SVSHAPE2))):
        if idx >= VL:
            break
        schedule.append((jl, jh, k))

    # ok now pretty-print the results, with some debug output
    size = 2
    idx = 0
    while size <= n:
        halfsize = size // 2
        tablestep = n // size
        print ("size %d halfsize %d tablestep %d" % \
                (size, halfsize, tablestep))
        for i in range(0, n, size):
            prefix = "i %d\t" % i
            k = 0
            for j in range(i, i + halfsize):
                (jl, je), (jh, he), (ks, ke) = schedule[idx]
                print ("  %-3d\t%s j=%-2d jh=%-2d k=%-2d -> "
                       "j[jl=%-2d] j[jh=%-2d] ex[k=%d]" % \
                            (idx, prefix, j, j+halfsize, k,
                             jl, jh, ks),
                       "end", bin(je)[2:], bin(he)[2:], bin(ke)[2:])
                k += tablestep
                idx += 1
        size *= 2

# run the demo
if __name__ == '__main__':
    demo()
```
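A useful sanity check on the manually-computed VL above: a RADIX2 schedule
over n points produces exactly (n/2)*log2(n) butterfly triples. The count
can be re-derived directly from the Cooley-Tukey loop structure (a sketch,
independent of the canonical code; the function name is illustrative):

```python
def fft_butterfly_count(n):
    """Count the (j, j+halfsize, k) triples produced by the radix-2
    Cooley-Tukey loop nest for an n-point transform, n a power of 2."""
    count = 0
    size = 2
    while size <= n:
        halfsize = size // 2
        for i in range(0, n, size):
            for j in range(i, i + halfsize):
                count += 1   # one butterfly per (i, j) pair
        size *= 2
    return count
```

Each of the log2(n) size-stages visits every point once in pairs, hence
n/2 butterflies per stage.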
### REMAP DCT pseudocode

DCT REMAP is RADIX2 only. Convolutions may be applied as usual to create
non-RADIX2 DCTs. Combined with appropriate Twin-butterfly instructions,
the algorithm below (written in python3) becomes part of an in-place,
in-registers Vectorised DCT. The algorithms work by loading data such
that, as the nested loops progress, the result is sorted into correct
sequential order.
```
# DCT "REMAP" scheduler to create an in-place iterative DCT.

# bits of the integer 'val' of width 'width' are reversed
def reverse_bits(val, width):
    result = 0
    for _ in range(width):
        result = (result << 1) | (val & 1)
        val >>= 1
    return result


# iterative version of [recursively-applied] half-reversing
# turns out this is Gray-Encoding.
def halfrev2(vec, pre_rev=True):
    res = []
    for i in range(len(vec)):
        if pre_rev:
            res.append(vec[i ^ (i>>1)])
        else:
            ri = i
            bl = i.bit_length()
            for ji in range(1, bl):
                ri ^= (i >> ji)
            res.append(vec[ri])
    return res
```
```
def iterate_dct_inner_halfswap_loadstore(SVSHAPE):
    # get indices to iterate over, in the required order
    n = SVSHAPE.lims[0]
    mode = SVSHAPE.lims[1]
    stride = SVSHAPE.lims[2]

    # reference list for not needing to do data-swaps, just swap what
    # *indices* are referenced (two levels of indirection at the moment)
    # pre-reverse the data-swap list so that it *ends up* in the order 0123..
    ji = list(range(n))

    levels = n.bit_length() - 1
    ri = [reverse_bits(i, levels) for i in range(n)]

    if SVSHAPE.mode == 0b01: # FFT, bitrev only
        ji = [ji[ri[i]] for i in range(n)]
    elif SVSHAPE.submode2 == 0b001:
        ji = [ji[ri[i]] for i in range(n)]
        ji = halfrev2(ji, True)
    else:
        ji = halfrev2(ji, False)
        ji = [ji[ri[i]] for i in range(n)]

    # invert order if requested
    if SVSHAPE.invxyz[0]:
        ji.reverse()

    for i, jl in enumerate(ji):
        y_end = jl == ji[-1]
        yield jl * stride, (0b111 if y_end else 0b000)
```
```
def iterate_dct_inner_costable_indices(SVSHAPE):
    # get indices to iterate over, in the required order
    n = SVSHAPE.lims[0]
    mode = SVSHAPE.lims[1]
    stride = SVSHAPE.lims[2]
    # creating lists of indices to iterate over in each dimension
    # has to be done dynamically, because it depends on the size
    # first, the size-based loop (which can be done statically)
    x_r = []
    size = 2
    while size <= n:
        x_r.append(size)
        size *= 2
    # invert order if requested
    if SVSHAPE.invxyz[0]:
        x_r.reverse()

    if len(x_r) == 0:
        return

    # start an infinite (wrapping) loop
    z_end = 1 # doesn't exist in this, only 2 loops
    k = 0
    while True:
        for size in x_r:           # loop over 3rd order dimension (size)
            x_end = size == x_r[-1]
            # y_r schedule depends on size
            halfsize = size // 2
            y_r = []
            for i in range(0, n, size):
                y_r.append(i)
            # invert if requested
            if SVSHAPE.invxyz[1]: y_r.reverse()
            # two lists of half-range indices, e.g. j 0123, jr 7654
            j = list(range(0, halfsize))
            # invert if requested
            if SVSHAPE.invxyz[2]: j.reverse()
            # loop over 1st order dimension
            for ci, jl in enumerate(j):
                y_end = jl == j[-1]
                # now depending on MODE return the index. inner butterfly
                if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
                    result = k # offset into COS table
                elif SVSHAPE.skip == 0b10: #
                    result = ci # coefficient helper
                elif SVSHAPE.skip == 0b11: #
                    result = size # coefficient helper

                loopends = (z_end |
                           ((y_end and z_end)<<1) |
                           ((y_end and x_end and z_end)<<2))

                yield (result * stride) + SVSHAPE.offset, loopends
                k += 1
```
```
def iterate_dct_inner_butterfly_indices(SVSHAPE):
    # get indices to iterate over, in the required order
    n = SVSHAPE.lims[0]
    mode = SVSHAPE.lims[1]
    stride = SVSHAPE.lims[2]
    # creating lists of indices to iterate over in each dimension
    # has to be done dynamically, because it depends on the size
    # first, the size-based loop (which can be done statically)
    x_r = []
    size = n
    while size >= 2:
        x_r.append(size)
        size //= 2
    # invert order if requested
    if SVSHAPE.invxyz[0]:
        x_r.reverse()

    if len(x_r) == 0:
        return

    # reference (read/write) the in-place data in *reverse-bit-order*
    ri = list(range(n))
    if SVSHAPE.submode2 == 0b01:
        levels = n.bit_length() - 1
        ri = [ri[reverse_bits(i, levels)] for i in range(n)]

    # reference list for not needing to do data-swaps, just swap what
    # *indices* are referenced (two levels of indirection at the moment)
    # pre-reverse the data-swap list so that it *ends up* in the order 0123..
    ji = list(range(n))
    inplace_mode = True
    if inplace_mode and SVSHAPE.submode2 == 0b01:
        ji = halfrev2(ji, True)
    if inplace_mode and SVSHAPE.submode2 == 0b11:
        ji = halfrev2(ji, False)

    # start an infinite (wrapping) loop
    while True:
        k = 0
        k_start = 0
        for size in x_r:           # loop over 3rd order dimension (size)
            x_end = size == x_r[-1]
            # y_r schedule depends on size
            halfsize = size // 2
            y_r = []
            for i in range(0, n, size):
                y_r.append(i)
            # invert if requested
            if SVSHAPE.invxyz[1]: y_r.reverse()
            for i in y_r:       # loop over 2nd order dimension
                y_end = i == y_r[-1]
                # two lists of half-range indices, e.g. j 0123, jr 7654
                j = list(range(i, i + halfsize))
                jr = list(range(i+halfsize, i + size))
                jr.reverse()
                # invert if requested
                if SVSHAPE.invxyz[2]:
                    j.reverse()
                    jr.reverse()
                hz2 = halfsize // 2 # zero stops reversing 1-item lists
                # loop over 1st order dimension
                k = k_start
                for ci, (jl, jh) in enumerate(zip(j, jr)):
                    z_end = jl == j[-1]
                    # now depending on MODE return the index. inner butterfly
                    if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
                        if SVSHAPE.submode2 == 0b11: # iDCT
                            result = ji[ri[jl]]          # lower half
                        else:
                            result = ri[ji[jl]]          # lower half
                    elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
                        if SVSHAPE.submode2 == 0b11: # iDCT
                            result = ji[ri[jl+halfsize]] # upper half
                        else:
                            result = ri[ji[jh]]          # upper half
                    elif mode == 4:
                        # COS table pre-generated mode
                        if SVSHAPE.skip == 0b10: #
                            result = k # cos table offset
                    else:
                        # COS table generated on-demand ("Vertical-First") mode
                        if SVSHAPE.skip == 0b10: #
                            result = ci # coefficient helper
                        elif SVSHAPE.skip == 0b11: #
                            result = size # coefficient helper

                    loopends = (z_end |
                               ((y_end and z_end)<<1) |
                               ((y_end and x_end and z_end)<<2))

                    yield (result * stride) + SVSHAPE.offset, loopends
                    k += 1

                # now the in-place swap of the referenced indices
                if inplace_mode:
                    for ci, (jl, jh) in enumerate(zip(j[:hz2], jr[:hz2])):
                        jlh = jl+halfsize
                        tmp1, tmp2 = ji[jlh], ji[jh]
                        ji[jlh], ji[jh] = tmp2, tmp1

            # new k_start point for cos tables( runs inside x_r loop NOT i loop)
            k_start += halfsize
```
```
# python "yield" can be iterated. use this to make it clear how
# the indices are generated by using natural-looking nested loops
def iterate_dct_outer_butterfly_indices(SVSHAPE):
    # get indices to iterate over, in the required order
    n = SVSHAPE.lims[0]
    mode = SVSHAPE.lims[1]
    stride = SVSHAPE.lims[2]
    # creating lists of indices to iterate over in each dimension
    # has to be done dynamically, because it depends on the size
    # first, the size-based loop (which can be done statically)
    x_r = []
    size = n // 2
    while size >= 2:
        x_r.append(size)
        size //= 2
    # invert order if requested
    if SVSHAPE.invxyz[0]:
        x_r.reverse()

    if len(x_r) == 0:
        return

    # I-DCT, reference (read/write) the in-place data in *reverse-bit-order*
    ri = list(range(n))
    if SVSHAPE.submode2 in [0b11, 0b01]:
        levels = n.bit_length() - 1
        ri = [ri[reverse_bits(i, levels)] for i in range(n)]

    # reference list for not needing to do data-swaps, just swap what
    # *indices* are referenced (two levels of indirection at the moment)
    # pre-reverse the data-swap list so that it *ends up* in the order 0123..
    ji = list(range(n))
    inplace_mode = False # need the space... SVSHAPE.skip in [0b10, 0b11]
    if SVSHAPE.submode2 == 0b11:
        ji = halfrev2(ji, False)

    # start an infinite (wrapping) loop
    while True:
        k = 0
        k_start = 0
        for size in x_r:           # loop over 3rd order dimension (size)
            halfsize = size // 2
            x_end = size == x_r[-1]
            y_r = list(range(0, halfsize))
            # invert if requested
            if SVSHAPE.invxyz[1]: y_r.reverse()
            for i in y_r:       # loop over 2nd order dimension
                y_end = i == y_r[-1]
                # one list to create iterative-sum schedule
                jr = list(range(i+halfsize, i+n-halfsize, size))
                # invert if requested
                if SVSHAPE.invxyz[2]: jr.reverse()
                hz2 = halfsize // 2 # zero stops reversing 1-item lists
                k = k_start
                for ci, jh in enumerate(jr): # loop over 1st order dimension
                    z_end = jh == jr[-1]
                    if mode == 4:
                        # COS table pre-generated mode
                        if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
                            if SVSHAPE.submode2 == 0b11: # iDCT
                                result = ji[ri[jh]]      # upper half
                            else:
                                result = ri[ji[jh]]      # lower half
                        elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
                            if SVSHAPE.submode2 == 0b11: # iDCT
                                result = ji[ri[jh+size]] # upper half
                            else:
                                result = ri[ji[jh+size]] # upper half
                        elif SVSHAPE.skip == 0b10: #
                            result = k # cos table offset
                    else:
                        # COS table generated on-demand ("Vertical-First") mode
                        if SVSHAPE.skip == 0b00: # in [0b00, 0b10]:
                            if SVSHAPE.submode2 == 0b11: # iDCT
                                result = ji[ri[jh]]      # lower half
                            else:
                                result = ri[ji[jh]]      # lower half
                        elif SVSHAPE.skip == 0b01: # in [0b01, 0b11]:
                            if SVSHAPE.submode2 == 0b11: # iDCT
                                result = ji[ri[jh+size]] # upper half
                            else:
                                result = ri[ji[jh+size]] # upper half
                        elif SVSHAPE.skip == 0b10: #
                            result = ci # coefficient helper
                        elif SVSHAPE.skip == 0b11: #
                            result = size # coefficient helper

                    loopends = (z_end |
                               ((y_end and z_end)<<1) |
                               ((y_end and x_end and z_end)<<2))

                    yield (result * stride) + SVSHAPE.offset, loopends
                    k += 1

            # new k_start point for cos tables( runs inside x_r loop NOT i loop)
            k_start += halfsize
```
Selecting which REMAP Schedule to use is shown by the pseudocode below.
Each SVSHAPE 0-3 goes through this selection process.

```
if SVSHAPEn.mode == 0b00: iterate_fn = iterate_indices
if SVSHAPEn.mode == 0b10: iterate_fn = iterate_preduce_indices
if SVSHAPEn.mode in [0b01, 0b11]:
    # further sub-selection
    if SVSHAPEn.ydimsz == 1: iterate_fn = iterate_butterfly_indices
    if SVSHAPEn.ydimsz == 2: iterate_fn = iterate_dct_inner_butterfly_indices
    if SVSHAPEn.ydimsz == 3: iterate_fn = iterate_dct_outer_butterfly_indices
    if SVSHAPEn.ydimsz in [5, 13]: iterate_fn = iterate_dct_inner_costable_indices
    if SVSHAPEn.ydimsz in [6, 14, 15]: iterate_fn = iterate_dct_inner_halfswap_loadstore
```
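The same selection can be expressed as a self-contained function, convenient
for unit-testing an implementation (a sketch only; the `mode`/`ydimsz`
encodings are taken verbatim from the pseudocode above, and the function
returns the generator *name* rather than the generator itself):

```python
def select_iterate_fn(mode, ydimsz):
    """Mirror of the REMAP Schedule selection pseudocode, returning
    the name of the Schedule generator to use (sketch)."""
    if mode == 0b00:
        return "iterate_indices"
    if mode == 0b10:
        return "iterate_preduce_indices"
    if mode in [0b01, 0b11]:
        # further sub-selection on the y-dimension size field
        if ydimsz == 1:
            return "iterate_butterfly_indices"
        if ydimsz == 2:
            return "iterate_dct_inner_butterfly_indices"
        if ydimsz == 3:
            return "iterate_dct_outer_butterfly_indices"
        if ydimsz in [5, 13]:
            return "iterate_dct_inner_costable_indices"
        if ydimsz in [6, 14, 15]:
            return "iterate_dct_inner_halfswap_loadstore"
    return None  # no Schedule selected
```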