From 8972fcf39518341c9190ee17e825f7c15d990120 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Mon, 22 Aug 2022 12:44:26 +0100
Subject: [PATCH] inline shape table format

---
 openpower/sv/remap.mdwn              | 171 ++++++++++++++++++++++++++-
 openpower/sv/shape_table_format.mdwn | 169 --------------------------
 2 files changed, 169 insertions(+), 171 deletions(-)
 delete mode 100644 openpower/sv/shape_table_format.mdwn

diff --git a/openpower/sv/remap.mdwn b/openpower/sv/remap.mdwn
index 45e5f9c44..b9f244b1e 100644
--- a/openpower/sv/remap.mdwn
+++ b/openpower/sv/remap.mdwn
@@ -279,8 +279,175 @@ instruction which matches the above SPR:
 There are four "shape" SPRs, SHAPE0-3, 32-bits in each,
 which have the same format.  
 
-[[!inline pages="openpower/sv/shape_table_format" raw="yes" ]]
-
+SHAPE is 32 bits.  When SHAPE is set entirely to zeros, remapping is
+disabled: the register's elements are a linear (1D) vector.
+
+|31..30|29..28 |27..24| 23..21 | 20..18  | 17..12  |11..6 |5..0  | Mode  |
+|---- |------ |------| ------ | ------- | ------- |----- |----- | ----- |
+|0b00 |skip   |offset| invxyz | permute | zdimsz  |ydimsz|xdimsz|Matrix |
+|0b00 |elwidth|offset|sk1/invxy|0b110/0b111|SVGPR|ydimsz|xdimsz|Indexed|
+|0b01 |submode|offset| invxyz | submode2| rsvd    |rsvd  |xdimsz|DCT/FFT|
+|0b10 |       |      |        |         |         |      |      |rsvd   |
+|0b11 |       |      |        |         |         |      |      |rsvd   |
+
+mode sets different behaviours (straight matrix multiply, FFT, DCT).
+
+* **mode=0b00** sets straight Matrix Mode
+* **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
+* **mode=0b01** sets "FFT/DCT" mode and activates submodes
+
+## FFT/DCT mode
+
+submode2=0 is for FFT. For FFT submode the following schedules may be 
+selected:
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop
+  of Cooley-Tukey
+* **submode=0b10** selects the ``j+halfsize`` offset of the innermost for-loop
+  of Cooley-Tukey
+* **submode=0b11** selects the ``k`` of exptable (which coefficient)
+
+When submode2 is 1 or 2, for DCT inner butterfly submode the following
+schedules may be selected.  When submode2 is 1, additional bit-reversing
+is also performed.
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
+    in-place
+* **submode=0b01** selects the ``j+halfsize`` offset of the innermost for-loop,
+  in reverse-order, in-place
+* **submode=0b10** selects the ``ci`` count of the innermost for-loop,
+  useful for calculating the cosine coefficient
+* **submode=0b11** selects the ``size`` offset of the outermost for-loop,
+  useful for the cosine coefficient ``cos((ci + 0.5) * pi / size)``
+
+When submode2 is 3 or 4, for DCT outer butterfly submode the following
+schedules may be selected.  When submode2 is 3, additional bit-reversing
+is also performed.
+
+* **submode=0b00** selects the ``j`` offset of the innermost for-loop
+* **submode=0b01** selects the ``j+1`` offset of the innermost for-loop
+
+## Matrix Mode
+
+In Matrix Mode, skip allows dimensions to be skipped from being included
+in the resultant output index.  This allows sequences to be repeated:
+```0 0 0 1 1 1 2 2 2 ...``` or in the case of skip=0b11 this results in
+modulo ```0 1 2 0 1 2 ...```
+
+* **skip=0b00** indicates no dimensions to be skipped
+* **skip=0b01** sets "skip 1st dimension"
+* **skip=0b10** sets "skip 2nd dimension"
+* **skip=0b11** sets "skip 3rd dimension"
+
+invxyz will invert the start index of each of x, y or z. If invxyz[0] is
+zero then x-dimensional counting begins from 0 and increments, otherwise
+it begins from xdimsz-1 and iterates down to zero. Likewise for y and z.
+
+offset will have the effect of offsetting the result by ```offset``` elements:
+
+    for i in 0..VL-1:
+        GPR(RT + remap(i) + SVSHAPE.offset) = ....
+
+This appears redundant, because a compiler could simply change the register RT,
+until element-width overrides are introduced.  Also bear in mind that, unlike
+a register number fixed by a static compiler, SVSHAPE.offset may be set dynamically at runtime.
+
+xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
+that the array dimensionality for that dimension is 1. Any dimension
+not intended to be used must have its value set to 0 (dimensionality
+of 1).  A value of xdimsz=2 would indicate that in the first dimension
+there are 3 elements in the array.  For example, to create a 2D array
+X,Y of dimensionality X=3 and Y=2, set xdimsz=2, ydimsz=1 and zdimsz=0.
+
+The format of the array is therefore as follows:
+
+    array[xdimsz+1][ydimsz+1][zdimsz+1]
+
+However whilst illustrative of the dimensionality, that does not take the
+"permute" setting into account.  "permute" may be any one of six values
+(0-5, with values of 6 and 7 indicating "Indexed" Mode).  The table
+below shows how the permutation dimensionality order works:
+
+| permute | order | array format             |
+| ------- | ----- | ------------------------ |
+| 000     | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
+| 001     | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
+| 010     | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
+| 011     | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
+| 100     | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
+| 101     | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
+| 110     | 0,1   | Indexed (xdim+1)(ydim+1) |
+| 111     | 1,0   | Indexed (ydim+1)(xdim+1) |
+
+In other words, the "permute" option changes the order in which
+nested for-loops over the array would be done.  See executable
+python reference code for further details.
+
+*Note: permute=0b110 and permute=0b111 enable Indexed REMAP Mode,
+described below*
+
+## Indexed Mode
+
+Indexed Mode activates reading of the element indices from the GPR
+and includes optional limited 2D reordering.
+In its simplest form (without elwidth overrides or other modes):
+
+```
+def index_remap(i):
+    return GPR((SVSHAPE.SVGPR<<1)+i+SVSHAPE.offset)
+
+for i in 0..VL-1:
+    element_result = ....
+    GPR(RT + index_remap(i)) = element_result
+```
+
+With element-width overrides included, and using the pseudocode
+from the SVP64 [[sv/svp64/appendix#elwidth]] elwidth section
+this becomes:
+
+```
+def index_remap(i):
+    svreg = SVSHAPE.SVGPR << 1
+    srcwid = elwid_to_bitwidth(SVSHAPE.elwid)
+    offs = SVSHAPE.offset
+    return get_polymorphed_reg(svreg, srcwid, i) + offs
+
+for i in 0..VL-1:
+    element_result = ....
+    rt_idx = index_remap(i)
+    set_polymorphed_reg(RT, destwid, rt_idx, element_result)
+```
+
+Matrix-style reordering still applies to the indices, except limited
+to up to 2 Dimensions (X,Y). Ordering is therefore limited to (X,Y) or
+(Y,X). Only one dimension may optionally be skipped. Inversion of either
+X or Y or both is possible. Pseudocode for Indexed Mode (including elwidth
+overrides) may be written in terms of Matrix Mode, specifically
+purposed to ensure that the 3rd dimension (Z) has no effect:
+
+```
+def index_remap(ISHAPE, i):
+    MSHAPE.skip   = 0b0 || ISHAPE.sk1
+    MSHAPE.invxyz = 0b0 || ISHAPE.invxy
+    MSHAPE.xdimsz = ISHAPE.xdimsz
+    MSHAPE.ydimsz = ISHAPE.ydimsz
+    MSHAPE.zdimsz = 0 # disabled
+    if ISHAPE.permute = 0b110 # 0,1
+       MSHAPE.permute = 0b000 # 0,1,2
+    if ISHAPE.permute = 0b111 # 1,0
+       MSHAPE.permute = 0b010 # 1,0,2
+    el_idx = remap_matrix(MSHAPE, i)
+    svreg = ISHAPE.SVGPR << 1
+    srcwid = elwid_to_bitwidth(ISHAPE.elwid)
+    offs = ISHAPE.offset
+    return get_polymorphed_reg(svreg, srcwid, el_idx) + offs
+```
+
+The most important observation above is that the Matrix-style
+remapping occurs first and the Index lookup second.  Thus it
+becomes possible to perform an in-place Transpose of Indices which
+may have been costly to set up or costly to duplicate
+(wasting register file space).
 # svshape instruction  <a name="svshape"> </a>
 
 `svshape` is a convenience instruction that reduces instruction
diff --git a/openpower/sv/shape_table_format.mdwn b/openpower/sv/shape_table_format.mdwn
deleted file mode 100644
index 894f96733..000000000
--- a/openpower/sv/shape_table_format.mdwn
+++ /dev/null
@@ -1,169 +0,0 @@
-Shape is 32-bits.  When SHAPE is set entirely to zeros, remapping is
-disabled: the register's elements are a linear (1D) vector.
-
-|31.30|29..28 |27..24| 23..21 | 20..18  | 17..12  |11..6 |5..0  | Mode  |
-|---- |------ |------| ------ | ------- | ------- |----- |----- | ----- |
-|0b00 |skip   |offset| invxyz | permute | zdimsz  |ydimsz|xdimsz|Matrix |
-|0b00 |elwidth|offset|sk1/invxy|0b110/0b111|SVGPR|ydimsz|xdimsz|Indexed|
-|0b01 |submode|offset| invxyz | submode2| rsvd    |rsvd  |xdimsz|DCT/FFT|
-|0b10 |       |      |        |         |         |      |      |rsvd   |
-|0b11 |       |      |        |         |         |      |      |rsvd   |
-
-mode sets different behaviours (straight matrix multiply, FFT, DCT).
-
-* **mode=0b00** sets straight Matrix Mode
-* **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
-* **mode=0b01** sets "FFT/DCT" mode and activates submodes
-
-## FFT/DCT mode
-
-submode2=0 is for FFT. For FFT submode the following schedules may be 
-selected:
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop
-  of Tukey-Cooley
-* **submode=0b10** selects the ``j+halfsize`` offset of the innermost for-loop
-  of Tukey-Cooley
-* **submode=0b11** selects the ``k`` of exptable (which coefficient)
-
-When submode2 is 1 or 2, for DCT inner butterfly submode the following
-schedules may be selected.  When submode2 is 1, additional bit-reversing
-is also performed.
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
-    in-place
-* **submode=0b010** selects the ``j+halfsize`` offset of the innermost for-loop,
-  in reverse-order, in-place
-* **submode=0b10** selects the ``ci`` count of the innermost for-loop,
-  useful for calculating the cosine coefficient
-* **submode=0b11** selects the ``size`` offset of the outermost for-loop,
-  useful for the cosine coefficient ``cos(ci + 0.5) * pi / size``
-
-When submode2 is 3 or 4, for DCT outer butterfly submode the following
-schedules may be selected.  When submode is 3, additional bit-reversing
-is also performed.
-
-* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
-* **submode=0b01** selects the ``j+1`` offset of the innermost for-loop,
-
-## Matrix Mode
-
-In Matrix Mode, skip allows dimensions to be skipped from being included
-in the resultant output index.  this allows sequences to be repeated:
-```0 0 0 1 1 1 2 2 2 ...``` or in the case of skip=0b11 this results in
-modulo ```0 1 2 0 1 2 ...```
-
-* **skip=0b00** indicates no dimensions to be skipped
-* **skip=0b01** sets "skip 1st dimension"
-* **skip=0b10** sets "skip 2nd dimension"
-* **skip=0b11** sets "skip 3rd dimension"
-
-invxyz will invert the start index of each of x, y or z. If invxyz[0] is
-zero then x-dimensional counting begins from 0 and increments, otherwise
-it begins from xdimsz-1 and iterates down to zero. Likewise for y and z.
-
-offset will have the effect of offsetting the result by ```offset``` elements:
-
-    for i in 0..VL-1:
-        GPR(RT + remap(i) + SVSHAPE.offset) = ....
-
-this appears redundant because the register RT could simply be changed by a compiler, until element width overrides are introduced.  also
-bear in mind that unlike a static compiler SVSHAPE.offset may
-be set dynamically at runtime.
-
-xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
-that the array dimensionality for that dimension is 1. any dimension
-not intended to be used must have its value set to 0 (dimensionality
-of 1).  A value of xdimsz=2 would indicate that in the first dimension
-there are 3 elements in the array.  For example, to create a 2D array
-X,Y of dimensionality X=3 and Y=2, set xdimsz=2, ydimsz=1 and zdimsz=0
-
-The format of the array is therefore as follows:
-
-    array[xdimsz+1][ydimsz+1][zdimsz+1]
-
-However whilst illustrative of the dimensionality, that does not take the
-"permute" setting into account.  "permute" may be any one of six values
-(0-5, with values of 6 and 7 indicating "Indexed" Mode).  The table
-below shows how the permutation dimensionality order works:
-
-| permute | order | array format             |
-| ------- | ----- | ------------------------ |
-| 000     | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
-| 001     | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
-| 010     | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
-| 011     | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
-| 100     | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
-| 101     | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
-| 110     | 0,1   | Indexed (xdim+1)(ydim+1) |
-| 111     | 1,0   | Indexed (ydim+1)(xdim+1) |
-
-In other words, the "permute" option changes the order in which
-nested for-loops over the array would be done.  See executable
-python reference code for further details.
-
-*Note: permute=0b110 and permute=0b111 enable Indexed REMAP Mode,
-described below*
-
-## Indexed Mode
-
-Indexed Mode activates reading of the element indices from the GPR
-and includes optional limited 2D reordering.
-In its simplest form (without elwidth overrides or other modes):
-
-```
-def index_remap(i):
-    return GPR((SVSHAPE.SVGPR<<1)+i+SVSHAPE.offset)
-
-for i in 0..VL-1:
-    element_result = ....
-    GPR(RT + indexed_remap(i)) = element_result
-```
-
-With element-width overrides included, and using the pseudocode
-from the SVP64 [[sv/svp64/appendix#elwidth]] elwidth section
-this becomes:
-
-```
-def index_remap(i):
-    svreg = SVSHAPE.SVGPR << 1
-    srcwid = elwid_to_bitwidth(SVSHAPE.elwid)
-    offs = SVSHAPE.offset
-    return get_polymorphed_reg(svreg, srcwid, i) + offs
-
-for i in 0..VL-1:
-    element_result = ....
-    rt_idx = indexed_remap(i)
-    set_polymorphed_reg(RT, destwid, rt_idx, element_result)
-```
-
-Matrix-style reordering still applies to the indices, except limited
-to up to 2 Dimensions (X,Y). Ordering is therefore limited to (X,Y) or
-(Y,X). Only one dimension may optionally be skipped. Inversion of either
-X or Y or both is possible. Pseudocode for Indexed Mode (including elwidth
-overrides) may be written in terms of Matrix Mode, specifically
-purposed to ensure that the 3rd dimension (Z) has no effect:
-
-```
-def index_remap(ISHAPE, i):
-    MSHAPE.skip   = 0b0 || ISHAPE.sk1
-    MSHAPE.invxyz = 0b0 || ISHAPE.invxy
-    MSHAPE.xdimsz = ISHAPE.xdimsz
-    MSHAPE.ydimsz = ISHAPE.ydimsz
-    MSHAPE.zdimsz = 0 # disabled
-    if ISHAPE.permute = 0b110 # 0,1
-       MSHAPE.permute = 0b000 # 0,1,2
-    if ISHAPE.permute = 0b111 # 1,0
-       MSHAPE.permute = 0b010 # 1,0,2
-    el_idx = remap_matrix(MSHAPE, i)
-    svreg = ISHAPE.SVGPR << 1
-    srcwid = elwid_to_bitwidth(ISHAPE.elwid)
-    offs = ISHAPE.offset
-    return get_polymorphed_reg(svreg, srcwid, el_idx) + offs
-```
-
-The most important observation above is that the Matrix-style
-remapping occurs first and the Index lookup second.  Thus it
-becomes possible to perform in-place Transpose of Indices which
-may have been costly to set up or costly to duplicate
-(waste register file space).
-- 
2.30.2