There are four "shape" SPRs, SHAPE0-3, 32-bits in each,
which have the same format.
-[[!inline pages="openpower/sv/shape_table_format" raw="yes" ]]
-
+Each SHAPE is 32 bits. When SHAPE is set entirely to zeros, remapping is
+disabled: the register's elements are a linear (1D) vector.
+
+|31..30|29..28 |27..24| 23..21 | 20..18 | 17..12 |11..6 |5..0 | Mode |
+|---- |------ |------| ------ | ------- | ------- |----- |----- | ----- |
+|0b00 |skip |offset| invxyz | permute | zdimsz |ydimsz|xdimsz|Matrix |
+|0b00 |elwidth|offset|sk1/invxy|0b110/0b111|SVGPR|ydimsz|xdimsz|Indexed|
+|0b01 |submode|offset| invxyz | submode2| rsvd |rsvd |xdimsz|DCT/FFT|
+|0b10 | | | | | | | |rsvd |
+|0b11 | | | | | | | |rsvd |
+
+The mode field selects different behaviours (straight matrix multiply, FFT, DCT).
+
+* **mode=0b00** sets straight Matrix Mode
+* **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
+* **mode=0b01** sets "FFT/DCT" mode and activates submodes
+
+## FFT/DCT mode
+
+submode2=0 is for FFT. For FFT submode the following schedules may be
+selected:
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop
+ of Cooley-Tukey
+* **submode=0b10** selects the ``j+halfsize`` offset of the innermost for-loop
+ of Cooley-Tukey
+* **submode=0b11** selects the ``k`` of exptable (which coefficient)
+
+When submode2 is 1 or 2, for DCT inner butterfly submode the following
+schedules may be selected. When submode2 is 1, additional bit-reversing
+is also performed.
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
+ in-place
+* **submode=0b01** selects the ``j+halfsize`` offset of the innermost for-loop,
+ in reverse-order, in-place
+* **submode=0b10** selects the ``ci`` count of the innermost for-loop,
+ useful for calculating the cosine coefficient
+* **submode=0b11** selects the ``size`` offset of the outermost for-loop,
+ useful for the cosine coefficient ``cos((ci + 0.5) * pi / size)``
+
+When submode2 is 3 or 4, for DCT outer butterfly submode the following
+schedules may be selected. When submode2 is 3, additional bit-reversing
+is also performed.
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop
+* **submode=0b01** selects the ``j+1`` offset of the innermost for-loop
+
+## Matrix Mode
+
+In Matrix Mode, skip excludes one dimension from the resultant output
+index. This allows sequences to be repeated:
+```0 0 0 1 1 1 2 2 2 ...``` or, in the case of skip=0b11, produces the
+modulo sequence ```0 1 2 0 1 2 ...```.
+
+* **skip=0b00** indicates no dimensions to be skipped
+* **skip=0b01** sets "skip 1st dimension"
+* **skip=0b10** sets "skip 2nd dimension"
+* **skip=0b11** sets "skip 3rd dimension"
+
+invxyz will invert the start index of each of x, y or z. If invxyz[0] is
+zero then x-dimensional counting begins from 0 and increments, otherwise
+it begins from the last x index (the x dimension size minus 1) and counts
+down to zero. Likewise for y and z.
+
+offset will have the effect of offsetting the result by ```offset``` elements:
+
+    for i in 0..VL-1:
+        GPR(RT + remap(i) + SVSHAPE.offset) = ....
+
+This appears redundant, because a compiler could simply change the
+register RT, until element-width overrides are introduced. Also
+bear in mind that, unlike a register chosen statically by a compiler,
+SVSHAPE.offset may be set dynamically at runtime.
+
+xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
+that the array dimensionality for that dimension is 1. Any dimension
+not intended to be used must have its value set to 0 (dimensionality
+of 1). A value of xdimsz=2 would indicate that in the first dimension
+there are 3 elements in the array. For example, to create a 2D array
+X,Y of dimensionality X=3 and Y=2, set xdimsz=2, ydimsz=1 and zdimsz=0.
+
+The format of the array is therefore as follows:
+
+    array[xdimsz+1][ydimsz+1][zdimsz+1]
+
+However whilst illustrative of the dimensionality, that does not take the
+"permute" setting into account. "permute" may be any one of six values
+(0-5, with values of 6 and 7 indicating "Indexed" Mode). The table
+below shows how the permutation dimensionality order works:
+
+| permute | order | array format |
+| ------- | ----- | ------------------------ |
+| 000 | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
+| 001 | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
+| 010 | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
+| 011 | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
+| 100 | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
+| 101 | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
+| 110 | 0,1 | Indexed (xdim+1)(ydim+1) |
+| 111 | 1,0 | Indexed (ydim+1)(xdim+1) |
+
+In other words, the "permute" option changes the order in which
+nested for-loops over the array would be done. See executable
+python reference code for further details.
+
+*Note: permute=0b110 and permute=0b111 enable Indexed REMAP Mode,
+described below*
+
+## Indexed Mode
+
+Indexed Mode activates reading of the element indices from the GPR
+and includes optional limited 2D reordering.
+In its simplest form (without elwidth overrides or other modes):
+
+```
+def index_remap(i):
+    return GPR((SVSHAPE.SVGPR<<1)+i+SVSHAPE.offset)
+
+for i in 0..VL-1:
+    element_result = ....
+    GPR(RT + index_remap(i)) = element_result
+```
+
+With element-width overrides included, and using the pseudocode
+from the SVP64 [[sv/svp64/appendix#elwidth]] elwidth section
+this becomes:
+
+```
+def index_remap(i):
+    svreg = SVSHAPE.SVGPR << 1
+    srcwid = elwid_to_bitwidth(SVSHAPE.elwid)
+    offs = SVSHAPE.offset
+    return get_polymorphed_reg(svreg, srcwid, i) + offs
+
+for i in 0..VL-1:
+    element_result = ....
+    rt_idx = index_remap(i)
+    set_polymorphed_reg(RT, destwid, rt_idx, element_result)
+```
+
+Matrix-style reordering still applies to the indices, except limited
+to up to 2 Dimensions (X,Y). Ordering is therefore limited to (X,Y) or
+(Y,X). Only one dimension may optionally be skipped. Inversion of either
+X or Y or both is possible. Pseudocode for Indexed Mode (including elwidth
+overrides) may be written in terms of Matrix Mode, specifically
+purposed to ensure that the 3rd dimension (Z) has no effect:
+
+```
+def index_remap(ISHAPE, i):
+    MSHAPE.skip = 0b0 || ISHAPE.sk1
+    MSHAPE.invxyz = 0b0 || ISHAPE.invxy
+    MSHAPE.xdimsz = ISHAPE.xdimsz
+    MSHAPE.ydimsz = ISHAPE.ydimsz
+    MSHAPE.zdimsz = 0 # disabled
+    if ISHAPE.permute = 0b110 # 0,1
+        MSHAPE.permute = 0b000 # 0,1,2
+    if ISHAPE.permute = 0b111 # 1,0
+        MSHAPE.permute = 0b010 # 1,0,2
+    el_idx = remap_matrix(MSHAPE, i)
+    svreg = ISHAPE.SVGPR << 1
+    srcwid = elwid_to_bitwidth(ISHAPE.elwid)
+    offs = ISHAPE.offset
+    return get_polymorphed_reg(svreg, srcwid, el_idx) + offs
+```
+
+The most important observation above is that the Matrix-style
+remapping occurs first and the Index lookup second. Thus it
+becomes possible to perform in-place Transpose of Indices which
+may have been costly to set up or costly to duplicate
+(wasting register file space).
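
The "Matrix remap first, Index lookup second" ordering can be illustrated
with a small Python sketch. This is not the project's reference code: the
helper `remap_matrix_2d`, the `yx_order` flag (standing in for
permute=0b111) and the plain list standing in for the GPR-held indices are
all hypothetical, elwidth overrides and skip are left out, and the reading
of "order 1,0" as "y varies fastest" is an assumption taken from the
nested for-loop description above. The executable reference code remains
authoritative and may stack the dimensions differently.

```python
def remap_matrix_2d(i, xdimsz, ydimsz, yx_order=False, invx=False, invy=False):
    """Map step i to a position in a flat (xdimsz+1) by (ydimsz+1) vector."""
    xdim, ydim = xdimsz + 1, ydimsz + 1    # fields hold "size minus 1"
    if not 0 <= i < xdim * ydim:
        raise ValueError("step lies outside the 2D array")
    if yx_order:                           # assumed meaning of order 1,0
        y, x = i % ydim, i // ydim         # y varies fastest
    else:                                  # assumed meaning of order 0,1
        x, y = i % xdim, i // xdim         # x varies fastest
    if invx:
        x = xdim - 1 - x                   # count x down from its last index
    if invy:
        y = ydim - 1 - y                   # count y down from its last index
    return y * xdim + x


def index_remap(index_vector, i, offset=0, **shape):
    """Matrix-style remap first, index lookup second, then add offset."""
    el_idx = remap_matrix_2d(i, **shape)
    return index_vector[el_idx] + offset


# Made-up indices for an X=3 by Y=2 arrangement (xdimsz=2, ydimsz=1).
index_vector = [10, 11, 12, 20, 21, 22]
shape = dict(xdimsz=2, ydimsz=1)
plain = [index_remap(index_vector, i, **shape) for i in range(6)]
transposed = [index_remap(index_vector, i, yx_order=True, **shape)
              for i in range(6)]
print(plain)
print(transposed)
```

The same `index_vector` is read in both cases; only the order in which its
entries are visited changes, which is why the transpose needs no second
copy of the indices.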
# svshape instruction <a name="svshape"> </a>
`svshape` is a convenience instruction that reduces instruction
+++ /dev/null
-Shape is 32-bits. When SHAPE is set entirely to zeros, remapping is
-disabled: the register's elements are a linear (1D) vector.
-
-|31.30|29..28 |27..24| 23..21 | 20..18 | 17..12 |11..6 |5..0 | Mode |
-|---- |------ |------| ------ | ------- | ------- |----- |----- | ----- |
-|0b00 |skip |offset| invxyz | permute | zdimsz |ydimsz|xdimsz|Matrix |
-|0b00 |elwidth|offset|sk1/invxy|0b110/0b111|SVGPR|ydimsz|xdimsz|Indexed|
-|0b01 |submode|offset| invxyz | submode2| rsvd |rsvd |xdimsz|DCT/FFT|
-|0b10 | | | | | | | |rsvd |
-|0b11 | | | | | | | |rsvd |
-
-mode sets different behaviours (straight matrix multiply, FFT, DCT).
-
-* **mode=0b00** sets straight Matrix Mode
-* **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
-* **mode=0b01** sets "FFT/DCT" mode and activates submodes
-
-## FFT/DCT mode
-
-submode2=0 is for FFT. For FFT submode the following schedules may be
-selected:
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop
- of Tukey-Cooley
-* **submode=0b10** selects the ``j+halfsize`` offset of the innermost for-loop
- of Tukey-Cooley
-* **submode=0b11** selects the ``k`` of exptable (which coefficient)
-
-When submode2 is 1 or 2, for DCT inner butterfly submode the following
-schedules may be selected. When submode2 is 1, additional bit-reversing
-is also performed.
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
- in-place
-* **submode=0b010** selects the ``j+halfsize`` offset of the innermost for-loop,
- in reverse-order, in-place
-* **submode=0b10** selects the ``ci`` count of the innermost for-loop,
- useful for calculating the cosine coefficient
-* **submode=0b11** selects the ``size`` offset of the outermost for-loop,
- useful for the cosine coefficient ``cos(ci + 0.5) * pi / size``
-
-When submode2 is 3 or 4, for DCT outer butterfly submode the following
-schedules may be selected. When submode is 3, additional bit-reversing
-is also performed.
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
-* **submode=0b01** selects the ``j+1`` offset of the innermost for-loop,
-
-## Matrix Mode
-
-In Matrix Mode, skip allows dimensions to be skipped from being included
-in the resultant output index. this allows sequences to be repeated:
-```0 0 0 1 1 1 2 2 2 ...``` or in the case of skip=0b11 this results in
-modulo ```0 1 2 0 1 2 ...```
-
-* **skip=0b00** indicates no dimensions to be skipped
-* **skip=0b01** sets "skip 1st dimension"
-* **skip=0b10** sets "skip 2nd dimension"
-* **skip=0b11** sets "skip 3rd dimension"
-
-invxyz will invert the start index of each of x, y or z. If invxyz[0] is
-zero then x-dimensional counting begins from 0 and increments, otherwise
-it begins from xdimsz-1 and iterates down to zero. Likewise for y and z.
-
-offset will have the effect of offsetting the result by ```offset``` elements:
-
- for i in 0..VL-1:
- GPR(RT + remap(i) + SVSHAPE.offset) = ....
-
-this appears redundant because the register RT could simply be changed by a compiler, until element width overrides are introduced. also
-bear in mind that unlike a static compiler SVSHAPE.offset may
-be set dynamically at runtime.
-
-xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
-that the array dimensionality for that dimension is 1. any dimension
-not intended to be used must have its value set to 0 (dimensionality
-of 1). A value of xdimsz=2 would indicate that in the first dimension
-there are 3 elements in the array. For example, to create a 2D array
-X,Y of dimensionality X=3 and Y=2, set xdimsz=2, ydimsz=1 and zdimsz=0
-
-The format of the array is therefore as follows:
-
- array[xdimsz+1][ydimsz+1][zdimsz+1]
-
-However whilst illustrative of the dimensionality, that does not take the
-"permute" setting into account. "permute" may be any one of six values
-(0-5, with values of 6 and 7 indicating "Indexed" Mode). The table
-below shows how the permutation dimensionality order works:
-
-| permute | order | array format |
-| ------- | ----- | ------------------------ |
-| 000 | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
-| 001 | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
-| 010 | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
-| 011 | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
-| 100 | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
-| 101 | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
-| 110 | 0,1 | Indexed (xdim+1)(ydim+1) |
-| 111 | 1,0 | Indexed (ydim+1)(xdim+1) |
-
-In other words, the "permute" option changes the order in which
-nested for-loops over the array would be done. See executable
-python reference code for further details.
-
-*Note: permute=0b110 and permute=0b111 enable Indexed REMAP Mode,
-described below*
-
-## Indexed Mode
-
-Indexed Mode activates reading of the element indices from the GPR
-and includes optional limited 2D reordering.
-In its simplest form (without elwidth overrides or other modes):
-
-```
-def index_remap(i):
- return GPR((SVSHAPE.SVGPR<<1)+i+SVSHAPE.offset)
-
-for i in 0..VL-1:
- element_result = ....
- GPR(RT + indexed_remap(i)) = element_result
-```
-
-With element-width overrides included, and using the pseudocode
-from the SVP64 [[sv/svp64/appendix#elwidth]] elwidth section
-this becomes:
-
-```
-def index_remap(i):
- svreg = SVSHAPE.SVGPR << 1
- srcwid = elwid_to_bitwidth(SVSHAPE.elwid)
- offs = SVSHAPE.offset
- return get_polymorphed_reg(svreg, srcwid, i) + offs
-
-for i in 0..VL-1:
- element_result = ....
- rt_idx = indexed_remap(i)
- set_polymorphed_reg(RT, destwid, rt_idx, element_result)
-```
-
-Matrix-style reordering still applies to the indices, except limited
-to up to 2 Dimensions (X,Y). Ordering is therefore limited to (X,Y) or
-(Y,X). Only one dimension may optionally be skipped. Inversion of either
-X or Y or both is possible. Pseudocode for Indexed Mode (including elwidth
-overrides) may be written in terms of Matrix Mode, specifically
-purposed to ensure that the 3rd dimension (Z) has no effect:
-
-```
-def index_remap(ISHAPE, i):
- MSHAPE.skip = 0b0 || ISHAPE.sk1
- MSHAPE.invxyz = 0b0 || ISHAPE.invxy
- MSHAPE.xdimsz = ISHAPE.xdimsz
- MSHAPE.ydimsz = ISHAPE.ydimsz
- MSHAPE.zdimsz = 0 # disabled
- if ISHAPE.permute = 0b110 # 0,1
- MSHAPE.permute = 0b000 # 0,1,2
- if ISHAPE.permute = 0b111 # 1,0
- MSHAPE.permute = 0b010 # 1,0,2
- el_idx = remap_matrix(MSHAPE, i)
- svreg = ISHAPE.SVGPR << 1
- srcwid = elwid_to_bitwidth(ISHAPE.elwid)
- offs = ISHAPE.offset
- return get_polymorphed_reg(svreg, srcwid, el_idx) + offs
-```
-
-The most important observation above is that the Matrix-style
-remapping occurs first and the Index lookup second. Thus it
-becomes possible to perform in-place Transpose of Indices which
-may have been costly to set up or costly to duplicate
-(waste register file space).