From 8972fcf39518341c9190ee17e825f7c15d990120 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Mon, 22 Aug 2022 12:44:26 +0100
Subject: [PATCH] inline shape table format

---
 openpower/sv/remap.mdwn              | 171 ++++++++++++++++++++++++++-
 openpower/sv/shape_table_format.mdwn | 169 --------------------------
 2 files changed, 169 insertions(+), 171 deletions(-)
 delete mode 100644 openpower/sv/shape_table_format.mdwn

diff --git a/openpower/sv/remap.mdwn b/openpower/sv/remap.mdwn
index 45e5f9c44..b9f244b1e 100644
--- a/openpower/sv/remap.mdwn
+++ b/openpower/sv/remap.mdwn
@@ -279,8 +279,175 @@ instruction which matches the above SPR:
 There are four "shape" SPRs, SHAPE0-3, 32-bits in each,
 which have the same format.
 
-[[!inline pages="openpower/sv/shape_table_format" raw="yes" ]]
-
+Shape is 32-bits. When SHAPE is set entirely to zeros, remapping is
+disabled: the register's elements are a linear (1D) vector.
+
+|31..30|29..28 |27..24| 23..21 | 20..18 | 17..12 |11..6 |5..0 | Mode |
+|----- |------ |------| ------ | ------- | ------- |----- |----- | ----- |
+|0b00 |skip |offset| invxyz | permute | zdimsz |ydimsz|xdimsz|Matrix |
+|0b00 |elwidth|offset|sk1/invxy|0b110/0b111|SVGPR|ydimsz|xdimsz|Indexed|
+|0b01 |submode|offset| invxyz | submode2| rsvd |rsvd |xdimsz|DCT/FFT|
+|0b10 | | | | | | | |rsvd |
+|0b11 | | | | | | | |rsvd |
+
+mode sets different behaviours (straight matrix multiply, FFT, DCT).
+
+* **mode=0b00** sets straight Matrix Mode
+* **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
+* **mode=0b01** sets "FFT/DCT" mode and activates submodes
+
+## FFT/DCT mode
+
+submode2=0 is for FFT.
+For FFT submode the following schedules may be
+selected:
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop
+  of Cooley-Tukey
+* **submode=0b10** selects the ``j+halfsize`` offset of the innermost for-loop
+  of Cooley-Tukey
+* **submode=0b11** selects the ``k`` of exptable (which coefficient)
+
+When submode2 is 1 or 2, for DCT inner butterfly submode the following
+schedules may be selected. When submode2 is 1, additional bit-reversing
+is also performed.
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
+  in-place
+* **submode=0b01** selects the ``j+halfsize`` offset of the innermost for-loop,
+  in reverse-order, in-place
+* **submode=0b10** selects the ``ci`` count of the innermost for-loop,
+  useful for calculating the cosine coefficient
+* **submode=0b11** selects the ``size`` offset of the outermost for-loop,
+  useful for the cosine coefficient ``cos((ci + 0.5) * pi / size)``
+
+When submode2 is 3 or 4, for DCT outer butterfly submode the following
+schedules may be selected. When submode2 is 3, additional bit-reversing
+is also performed.
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop
+* **submode=0b01** selects the ``j+1`` offset of the innermost for-loop
+
+## Matrix Mode
+
+In Matrix Mode, skip allows dimensions to be skipped from being included
+in the resultant output index. This allows sequences to be repeated:
+```0 0 0 1 1 1 2 2 2 ...``` or in the case of skip=0b11 this results in
+modulo ```0 1 2 0 1 2 ...```
+
+* **skip=0b00** indicates no dimensions to be skipped
+* **skip=0b01** sets "skip 1st dimension"
+* **skip=0b10** sets "skip 2nd dimension"
+* **skip=0b11** sets "skip 3rd dimension"
+
+invxyz will invert the start index of each of x, y or z. If invxyz[0] is
+zero then x-dimensional counting begins from 0 and increments, otherwise
+it begins from xdimsz-1 and iterates down to zero. Likewise for y and z.
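A minimal Python sketch of the two Matrix Mode behaviours described above, written as a reader's annotation rather than as part of the patch. The names `dim_counts` and `skip_illustration` are hypothetical, and `n` is taken to be the number of elements in a dimension. The sketch only reproduces the two example sequences; it does not model how the `skip` field values map onto particular dimensions.

```python
def dim_counts(n, invert=False):
    """Indices visited along one dimension that holds n elements.

    ASSUMPTION: n is the element count of the dimension; invert plays
    the role of one bit of invxyz.
    """
    if invert:
        return list(range(n - 1, -1, -1))  # n-1 down to 0
    return list(range(n))                  # 0 up to n-1


def skip_illustration(n_inner, n_outer):
    """Flat sequences produced when one dimension of a 2D walk is left
    out of the output index (hypothetical helper, not the skip encoding).
    """
    walk = [(outer, inner)
            for outer in range(n_outer)
            for inner in range(n_inner)]
    inner_dropped = [outer for outer, _ in walk]   # each value repeats
    outer_dropped = [inner for _, inner in walk]   # values wrap (modulo)
    return inner_dropped, outer_dropped


if __name__ == "__main__":
    print(dim_counts(3), dim_counts(3, invert=True))
    repeated, modulo = skip_illustration(3, 3)
    print(repeated)
    print(modulo)
```

For a 3x3 walk the first sequence repeats each value three times and the second wraps around, matching the two patterns quoted in the text.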
+
+offset will have the effect of offsetting the result by ```offset``` elements:
+
+    for i in 0..VL-1:
+        GPR(RT + remap(i) + SVSHAPE.offset) = ....
+
+This appears redundant, because a compiler could simply change the
+register RT, until element width overrides are introduced. Also
+bear in mind that, unlike a register number chosen statically by a
+compiler, SVSHAPE.offset may be set dynamically at runtime.
+
+xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
+that the array dimensionality for that dimension is 1. Any dimension
+not intended to be used must have its value set to 0 (dimensionality
+of 1). A value of xdimsz=2 would indicate that in the first dimension
+there are 3 elements in the array. For example, to create a 2D array
+X,Y of dimensionality X=3 and Y=2, set xdimsz=2, ydimsz=1 and zdimsz=0.
+
+The format of the array is therefore as follows:
+
+    array[xdimsz+1][ydimsz+1][zdimsz+1]
+
+However whilst illustrative of the dimensionality, that does not take the
+"permute" setting into account. "permute" may be any one of six values
+(0-5, with values of 6 and 7 indicating "Indexed" Mode). The table
+below shows how the permutation dimensionality order works:
+
+| permute | order | array format             |
+| ------- | ----- | ------------------------ |
+| 000     | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
+| 001     | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
+| 010     | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
+| 011     | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
+| 100     | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
+| 101     | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
+| 110     | 0,1   | Indexed (xdim+1)(ydim+1) |
+| 111     | 1,0   | Indexed (ydim+1)(xdim+1) |
+
+In other words, the "permute" option changes the order in which
+nested for-loops over the array would be done. See executable
+python reference code for further details.
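The dimension sizes, the `permute` order and `offset` can be sketched together in Python. This is a reader's illustration under stated assumptions, not the executable reference code: `matrix_indices` and `PERMUTE_ORDER` are hypothetical names, the flat layout is assumed to be x-fastest, and the first entry of each order is assumed to name the innermost loop. `skip` and `invxyz` are deliberately left out.

```python
from itertools import product

# permute field value -> dimension order, copied from the table above
# (0 = x, 1 = y, 2 = z). Values 6 and 7 select Indexed Mode instead.
PERMUTE_ORDER = {
    0b000: (0, 1, 2), 0b001: (0, 2, 1), 0b010: (1, 0, 2),
    0b011: (1, 2, 0), 0b100: (2, 0, 1), 0b101: (2, 1, 0),
}


def matrix_indices(xdimsz, ydimsz, zdimsz, permute=0b000, offset=0):
    """Return the element indices visited for one full pass of the shape."""
    # The fields hold "size minus one", so add 1 to get element counts.
    sizes = (xdimsz + 1, ydimsz + 1, zdimsz + 1)
    # ASSUMPTION: flat layout with x varying fastest, then y, then z.
    strides = (1, sizes[0], sizes[0] * sizes[1])
    # ASSUMPTION: order[0] is the innermost (fastest-changing) loop.
    # itertools.product varies its last argument fastest, so list the
    # loops outermost first.
    outer_first = tuple(reversed(PERMUTE_ORDER[permute]))
    loops = [range(sizes[dim]) for dim in outer_first]
    indices = []
    for coords in product(*loops):
        flat = sum(c * strides[dim] for c, dim in zip(coords, outer_first))
        indices.append(flat + offset)
    return indices


if __name__ == "__main__":
    print(matrix_indices(2, 1, 0))                  # X=3, Y=2, linear walk
    print(matrix_indices(2, 1, 0, permute=0b010))   # same array, y innermost
```

Under these assumptions the X=3, Y=2 example from the text is walked as 0 1 2 3 4 5 with `permute=0b000` and as 0 3 1 4 2 5 with `permute=0b010`, which is the transposed visiting order.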
+
+*Note: permute=0b110 and permute=0b111 enable Indexed REMAP Mode,
+described below*
+
+## Indexed Mode
+
+Indexed Mode activates reading of the element indices from the GPR
+and includes optional limited 2D reordering.
+In its simplest form (without elwidth overrides or other modes):
+
+```
+def index_remap(i):
+    return GPR((SVSHAPE.SVGPR<<1)+i) + SVSHAPE.offset
+
+for i in 0..VL-1:
+    element_result = ....
+    GPR(RT + index_remap(i)) = element_result
+```
+
+With element-width overrides included, and using the pseudocode
+from the SVP64 [[sv/svp64/appendix#elwidth]] elwidth section
+this becomes:
+
+```
+def index_remap(i):
+    svreg = SVSHAPE.SVGPR << 1
+    srcwid = elwid_to_bitwidth(SVSHAPE.elwid)
+    offs = SVSHAPE.offset
+    return get_polymorphed_reg(svreg, srcwid, i) + offs
+
+for i in 0..VL-1:
+    element_result = ....
+    rt_idx = index_remap(i)
+    set_polymorphed_reg(RT, destwid, rt_idx, element_result)
+```
+
+Matrix-style reordering still applies to the indices, except limited
+to up to 2 Dimensions (X,Y). Ordering is therefore limited to (X,Y) or
+(Y,X). Only one dimension may optionally be skipped. Inversion of either
+X or Y or both is possible. Pseudocode for Indexed Mode (including elwidth
+overrides) may be written in terms of Matrix Mode, specifically
+purposed to ensure that the 3rd dimension (Z) has no effect:
+
+```
+def index_remap(ISHAPE, i):
+    MSHAPE.skip = 0b0 || ISHAPE.sk1
+    MSHAPE.invxyz = 0b0 || ISHAPE.invxy
+    MSHAPE.xdimsz = ISHAPE.xdimsz
+    MSHAPE.ydimsz = ISHAPE.ydimsz
+    MSHAPE.zdimsz = 0 # disabled
+    if ISHAPE.permute = 0b110 # 0,1
+        MSHAPE.permute = 0b000 # 0,1,2
+    if ISHAPE.permute = 0b111 # 1,0
+        MSHAPE.permute = 0b010 # 1,0,2
+    el_idx = remap_matrix(MSHAPE, i)
+    svreg = ISHAPE.SVGPR << 1
+    srcwid = elwid_to_bitwidth(ISHAPE.elwid)
+    offs = ISHAPE.offset
+    return get_polymorphed_reg(svreg, srcwid, el_idx) + offs
+```
+
+The most important observation above is that the Matrix-style
+remapping occurs first and the Index lookup second.
+Thus it
+becomes possible to perform in-place Transpose of Indices which
+may have been costly to set up or costly to duplicate
+(waste register file space).
 # svshape instruction
 
 `svshape` is a convenience instruction that reduces instruction
diff --git a/openpower/sv/shape_table_format.mdwn b/openpower/sv/shape_table_format.mdwn
deleted file mode 100644
index 894f96733..000000000
--- a/openpower/sv/shape_table_format.mdwn
+++ /dev/null
@@ -1,169 +0,0 @@
-Shape is 32-bits. When SHAPE is set entirely to zeros, remapping is
-disabled: the register's elements are a linear (1D) vector.
-
-|31.30|29..28 |27..24| 23..21 | 20..18 | 17..12 |11..6 |5..0 | Mode |
-|---- |------ |------| ------ | ------- | ------- |----- |----- | ----- |
-|0b00 |skip |offset| invxyz | permute | zdimsz |ydimsz|xdimsz|Matrix |
-|0b00 |elwidth|offset|sk1/invxy|0b110/0b111|SVGPR|ydimsz|xdimsz|Indexed|
-|0b01 |submode|offset| invxyz | submode2| rsvd |rsvd |xdimsz|DCT/FFT|
-|0b10 | | | | | | | |rsvd |
-|0b11 | | | | | | | |rsvd |
-
-mode sets different behaviours (straight matrix multiply, FFT, DCT).
-
-* **mode=0b00** sets straight Matrix Mode
-* **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
-* **mode=0b01** sets "FFT/DCT" mode and activates submodes
-
-## FFT/DCT mode
-
-submode2=0 is for FFT. For FFT submode the following schedules may be
-selected:
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop
-  of Tukey-Cooley
-* **submode=0b10** selects the ``j+halfsize`` offset of the innermost for-loop
-  of Tukey-Cooley
-* **submode=0b11** selects the ``k`` of exptable (which coefficient)
-
-When submode2 is 1 or 2, for DCT inner butterfly submode the following
-schedules may be selected. When submode2 is 1, additional bit-reversing
-is also performed.
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
-  in-place
-* **submode=0b010** selects the ``j+halfsize`` offset of the innermost for-loop,
-  in reverse-order, in-place
-* **submode=0b10** selects the ``ci`` count of the innermost for-loop,
-  useful for calculating the cosine coefficient
-* **submode=0b11** selects the ``size`` offset of the outermost for-loop,
-  useful for the cosine coefficient ``cos(ci + 0.5) * pi / size``
-
-When submode2 is 3 or 4, for DCT outer butterfly submode the following
-schedules may be selected. When submode is 3, additional bit-reversing
-is also performed.
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
-* **submode=0b01** selects the ``j+1`` offset of the innermost for-loop,
-
-## Matrix Mode
-
-In Matrix Mode, skip allows dimensions to be skipped from being included
-in the resultant output index. this allows sequences to be repeated:
-```0 0 0 1 1 1 2 2 2 ...``` or in the case of skip=0b11 this results in
-modulo ```0 1 2 0 1 2 ...```
-
-* **skip=0b00** indicates no dimensions to be skipped
-* **skip=0b01** sets "skip 1st dimension"
-* **skip=0b10** sets "skip 2nd dimension"
-* **skip=0b11** sets "skip 3rd dimension"
-
-invxyz will invert the start index of each of x, y or z. If invxyz[0] is
-zero then x-dimensional counting begins from 0 and increments, otherwise
-it begins from xdimsz-1 and iterates down to zero. Likewise for y and z.
-
-offset will have the effect of offsetting the result by ```offset``` elements:
-
-    for i in 0..VL-1:
-        GPR(RT + remap(i) + SVSHAPE.offset) = ....
-
-this appears redundant because the register RT could simply be changed by a compiler, until element width overrides are introduced. also
-bear in mind that unlike a static compiler SVSHAPE.offset may
-be set dynamically at runtime.
-
-xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
-that the array dimensionality for that dimension is 1.
-any dimension
-not intended to be used must have its value set to 0 (dimensionality
-of 1). A value of xdimsz=2 would indicate that in the first dimension
-there are 3 elements in the array. For example, to create a 2D array
-X,Y of dimensionality X=3 and Y=2, set xdimsz=2, ydimsz=1 and zdimsz=0
-
-The format of the array is therefore as follows:
-
-    array[xdimsz+1][ydimsz+1][zdimsz+1]
-
-However whilst illustrative of the dimensionality, that does not take the
-"permute" setting into account. "permute" may be any one of six values
-(0-5, with values of 6 and 7 indicating "Indexed" Mode). The table
-below shows how the permutation dimensionality order works:
-
-| permute | order | array format             |
-| ------- | ----- | ------------------------ |
-| 000     | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
-| 001     | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
-| 010     | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
-| 011     | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
-| 100     | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
-| 101     | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
-| 110     | 0,1   | Indexed (xdim+1)(ydim+1) |
-| 111     | 1,0   | Indexed (ydim+1)(xdim+1) |
-
-In other words, the "permute" option changes the order in which
-nested for-loops over the array would be done. See executable
-python reference code for further details.
-
-*Note: permute=0b110 and permute=0b111 enable Indexed REMAP Mode,
-described below*
-
-## Indexed Mode
-
-Indexed Mode activates reading of the element indices from the GPR
-and includes optional limited 2D reordering.
-In its simplest form (without elwidth overrides or other modes):
-
-```
-def index_remap(i):
-    return GPR((SVSHAPE.SVGPR<<1)+i+SVSHAPE.offset)
-
-for i in 0..VL-1:
-    element_result = ....
-    GPR(RT + indexed_remap(i)) = element_result
-```
-
-With element-width overrides included, and using the pseudocode
-from the SVP64 [[sv/svp64/appendix#elwidth]] elwidth section
-this becomes:
-
-```
-def index_remap(i):
-    svreg = SVSHAPE.SVGPR << 1
-    srcwid = elwid_to_bitwidth(SVSHAPE.elwid)
-    offs = SVSHAPE.offset
-    return get_polymorphed_reg(svreg, srcwid, i) + offs
-
-for i in 0..VL-1:
-    element_result = ....
-    rt_idx = indexed_remap(i)
-    set_polymorphed_reg(RT, destwid, rt_idx, element_result)
-```
-
-Matrix-style reordering still applies to the indices, except limited
-to up to 2 Dimensions (X,Y). Ordering is therefore limited to (X,Y) or
-(Y,X). Only one dimension may optionally be skipped. Inversion of either
-X or Y or both is possible. Pseudocode for Indexed Mode (including elwidth
-overrides) may be written in terms of Matrix Mode, specifically
-purposed to ensure that the 3rd dimension (Z) has no effect:
-
-```
-def index_remap(ISHAPE, i):
-    MSHAPE.skip = 0b0 || ISHAPE.sk1
-    MSHAPE.invxyz = 0b0 || ISHAPE.invxy
-    MSHAPE.xdimsz = ISHAPE.xdimsz
-    MSHAPE.ydimsz = ISHAPE.ydimsz
-    MSHAPE.zdimsz = 0 # disabled
-    if ISHAPE.permute = 0b110 # 0,1
-        MSHAPE.permute = 0b000 # 0,1,2
-    if ISHAPE.permute = 0b111 # 1,0
-        MSHAPE.permute = 0b010 # 1,0,2
-    el_idx = remap_matrix(MSHAPE, i)
-    svreg = ISHAPE.SVGPR << 1
-    srcwid = elwid_to_bitwidth(ISHAPE.elwid)
-    offs = ISHAPE.offset
-    return get_polymorphed_reg(svreg, srcwid, el_idx) + offs
-```
-
-The most important observation above is that the Matrix-style
-remapping occurs first and the Index lookup second. Thus it
-becomes possible to perform in-place Transpose of Indices which
-may have been costly to set up or costly to duplicate
-(waste register file space).
-- 
2.30.2