SHAPE is 32 bits wide. When SHAPE is set entirely to zeros, remapping is
disabled: the register's elements are a linear (1D) vector.

|31..30|29..28 |27..24|23..21   |20..18     |17..12|11..6 |5..0  |Mode   |
|------|-------|------|---------|-----------|------|------|------|-------|
|0b00  |skip   |offset|invxyz   |permute    |zdimsz|ydimsz|xdimsz|Matrix |
|0b00  |elwidth|offset|sk1/invxy|0b110/0b111|SVGPR |ydimsz|xdimsz|Indexed|
|0b01  |submode|offset|invxyz   |submode2   |rsvd  |rsvd  |xdimsz|DCT/FFT|
|0b10  |       |      |         |           |      |      |      |rsvd   |
|0b11  |       |      |         |           |      |      |      |rsvd   |

The mode field selects between different behaviours (straight matrix
multiply, FFT, DCT).

* **mode=0b00** sets straight Matrix Mode
* **mode=0b00** with permute=0b110 or 0b111 sets Indexed Mode
* **mode=0b01** sets "FFT/DCT" mode and activates submodes

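The field layout in the table can be illustrated with a small decoder. This is only a sketch: it assumes the bit numbers in the table header count from the least-significant bit (bit 0 is the LSB), and `decode_shape` and its dictionary keys are hypothetical names chosen for illustration. The 3-bit sk1/invxy field is returned unsplit because the table does not say which bits belong to which.

```python
def decode_shape(shape):
    """Split a 32-bit SHAPE value into its named fields (LSB0 assumed)."""
    def bits(hi, lo):
        # extract bits hi..lo inclusive, bit 0 being the least significant
        return (shape >> lo) & ((1 << (hi - lo + 1)) - 1)

    mode = bits(31, 30)
    # an all-zero SHAPE means remapping is disabled
    fields = {"remap_enabled": shape != 0, "offset": bits(27, 24),
              "xdimsz": bits(5, 0)}
    if mode == 0b00 and bits(20, 18) in (0b110, 0b111):
        fields.update(kind="Indexed", elwidth=bits(29, 28),
                      sk1_invxy=bits(23, 21), permute=bits(20, 18),
                      SVGPR=bits(17, 12), ydimsz=bits(11, 6))
    elif mode == 0b00:
        fields.update(kind="Matrix", skip=bits(29, 28), invxyz=bits(23, 21),
                      permute=bits(20, 18), zdimsz=bits(17, 12),
                      ydimsz=bits(11, 6))
    elif mode == 0b01:
        fields.update(kind="FFT/DCT", submode=bits(29, 28),
                      invxyz=bits(23, 21), submode2=bits(20, 18))
    else:
        fields = {"remap_enabled": True, "kind": "reserved"}
    return fields
```
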
## FFT/DCT mode

submode2=0 selects FFT. In FFT mode the following schedules may be
selected:

* **submode=0b00** selects the ``j`` offset of the innermost for-loop
  of Cooley-Tukey
* **submode=0b10** selects the ``j+halfsize`` offset of the innermost for-loop
  of Cooley-Tukey
* **submode=0b11** selects the ``k`` of exptable (which coefficient)

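To make the three FFT schedules concrete, the sketch below walks the loop nest of a conventional iterative radix-2 transform and records ``j``, ``j+halfsize`` and ``k`` at each butterfly. It assumes the usual textbook loop structure. `fft_schedules`, `size` and `tablestep` are names chosen here, and the bit-reversal of the input data is not modelled. Whether a hardware schedule emits the indices in exactly this sequence is not something this sketch establishes.

```python
def fft_schedules(n):
    """Return the j, j+halfsize and k index sequences of an n-point
    iterative radix-2 loop nest (n must be a power of two)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    j_sched, jh_sched, k_sched = [], [], []
    size = 2
    while size <= n:                      # outermost loop: butterfly size
        halfsize = size // 2
        tablestep = n // size
        for i in range(0, n, size):       # middle loop: each block
            k = 0
            for j in range(i, i + halfsize):   # innermost loop
                j_sched.append(j)              # first butterfly operand
                jh_sched.append(j + halfsize)  # second butterfly operand
                k_sched.append(k)              # index into exptable
                k += tablestep
        size *= 2
    return j_sched, jh_sched, k_sched
```

For a 4-point transform this yields ``j`` = 0, 2, 0, 1, ``j+halfsize`` = 1, 3, 2, 3 and ``k`` = 0, 0, 0, 1.
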
When submode2 is 1 or 2, the DCT inner butterfly is selected and the
following schedules may be chosen. When submode2 is 1, additional
bit-reversing is also performed.

* **submode=0b00** selects the ``j`` offset of the innermost for-loop,
  in-place
* **submode=0b01** selects the ``j+halfsize`` offset of the innermost for-loop,
  in reverse-order, in-place
* **submode=0b10** selects the ``ci`` count of the innermost for-loop,
  useful for calculating the cosine coefficient
* **submode=0b11** selects the ``size`` offset of the outermost for-loop,
  useful for the cosine coefficient ``cos((ci + 0.5) * pi / size)``

When submode2 is 3 or 4, the DCT outer butterfly is selected and the
following schedules may be chosen. When submode2 is 3, additional
bit-reversing is also performed.

* **submode=0b00** selects the ``j`` offset of the innermost for-loop
* **submode=0b01** selects the ``j+1`` offset of the innermost for-loop

## Matrix Mode

In Matrix Mode, skip allows dimensions to be excluded from the resultant
output index. This allows sequences to be repeated, for example
as ```0 0 0 1 1 1 2 2 2 ...``` or, in the case of skip=0b11, as
modulo ```0 1 2 0 1 2 ...```

* **skip=0b00** indicates no dimensions to be skipped
* **skip=0b01** sets "skip 1st dimension"
* **skip=0b10** sets "skip 2nd dimension"
* **skip=0b11** sets "skip 3rd dimension"

invxyz will invert the start index of each of x, y or z. If invxyz[0] is
zero then x-dimensional counting begins from 0 and increments, otherwise
it begins from the last x index (the x dimension size minus 1) and
iterates down to zero. Likewise for y and z.

offset will have the effect of offsetting the result by ```offset``` elements:

    for i in 0..VL-1:
        GPR(RT + remap(i) + SVSHAPE.offset) = ....

This appears redundant, because the register RT could simply be changed
by a compiler, until element-width overrides are introduced. Also
bear in mind that, unlike a value fixed by a static compiler,
SVSHAPE.offset may be set dynamically at runtime.

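The point about element-width overrides can be illustrated as follows. With 64-bit elements an offset of 3 is the same as adding 3 to RT, but with 8-bit elements the same offset lands part-way into a register, which no change of RT can express. The sketch assumes 64-bit registers with elements packed contiguously from the least-significant end. `element_location` is a hypothetical helper, not part of the specification.

```python
def element_location(rt, elwidth_bits, remapped_idx, offset):
    """Return (register number, bit position) of a destination element.

    Assumption for illustration: 64-bit registers, elements packed
    contiguously with element 0 in the least-significant bits.
    """
    idx = remapped_idx + offset          # offset is counted in elements
    per_reg = 64 // elwidth_bits         # elements held by one register
    return rt + idx // per_reg, (idx % per_reg) * elwidth_bits
```

Here `element_location(8, 64, 0, 3)` gives register 11 at bit 0, whereas `element_location(8, 8, 0, 3)` stays in register 8 at bit 24.
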
xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
that the array dimensionality for that dimension is 1. Any dimension
not intended to be used must have its value set to 0 (dimensionality
of 1). A value of xdimsz=2 would indicate that in the first dimension
there are 3 elements in the array. For example, to create a 2D array
X,Y of dimensionality X=3 and Y=2, set xdimsz=2, ydimsz=1 and zdimsz=0.

The format of the array is therefore as follows:

    array[xdimsz+1][ydimsz+1][zdimsz+1]

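As a small worked example of the offset-by-1 encoding, the helper below converts array dimensions into field values. `dims_to_fields` is a hypothetical name. The upper limit of 64 follows from each field being 6 bits wide in the table.

```python
def dims_to_fields(x, y=1, z=1):
    """Convert array dimensions into (xdimsz, ydimsz, zdimsz) field values."""
    for name, dim in (("x", x), ("y", y), ("z", z)):
        # a 6-bit field stores dim-1, so dimensions run from 1 to 64
        if not 1 <= dim <= 64:
            raise ValueError(f"{name} dimension {dim} out of range 1..64")
    return x - 1, y - 1, z - 1
```

For the 2D example in the text, `dims_to_fields(3, 2)` returns `(2, 1, 0)`.
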
However, whilst illustrative of the dimensionality, that does not take the
"permute" setting into account. "permute" may be any one of six values
(0-5, with values of 6 and 7 indicating "Indexed" Mode). The table
below shows how the permutation dimensionality order works:

| permute | order | array format             |
| ------- | ----- | ------------------------ |
| 000     | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
| 001     | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
| 010     | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
| 011     | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
| 100     | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
| 101     | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
| 110     | 0,1   | Indexed (xdim+1)(ydim+1) |
| 111     | 1,0   | Indexed (ydim+1)(xdim+1) |

In other words, the "permute" option changes the order in which
nested for-loops over the array would be done. See the executable
Python reference code for further details.

*Note: permute=0b110 and permute=0b111 enable Indexed REMAP Mode,
described below*

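The interaction of the dimension sizes, invxyz, permute, skip and offset can be sketched as a generator. This is one possible reading of the text above and is not the reference code. It assumes that the loops always nest z outermost and x innermost, that "permute" selects the order in which the counters are combined into the output index, and that "skip" removes the 1st, 2nd or 3rd entry of that permuted order from the index. `remap_matrix` and `PERMUTE_ORDER` are hypothetical names, and invxyz is passed as a tuple of three bits for readability.

```python
# "order" column of the permute table, indexed by the 3-bit permute value
PERMUTE_ORDER = {
    0b000: (0, 1, 2), 0b001: (0, 2, 1), 0b010: (1, 0, 2),
    0b011: (1, 2, 0), 0b100: (2, 0, 1), 0b101: (2, 1, 0),
}

def remap_matrix(xdimsz, ydimsz, zdimsz, permute=0,
                 invxyz=(0, 0, 0), skip=0, offset=0):
    """Yield one remapped element index per loop step (sketch only)."""
    dims = (xdimsz + 1, ydimsz + 1, zdimsz + 1)   # fields are offset by 1
    # per-dimension counting order: ascending, or descending if inverted
    ranges = [list(range(d))[::-1] if inv else list(range(d))
              for d, inv in zip(dims, invxyz)]
    order = PERMUTE_ORDER[permute]
    for z in ranges[2]:
        for y in ranges[1]:
            for x in ranges[0]:
                counters = (x, y, z)
                result, mult = 0, 1
                for pos, dim in enumerate(order):
                    if skip == pos + 1:
                        continue      # skipped dimension adds nothing
                    result += counters[dim] * mult
                    mult *= dims[dim]
                yield result + offset
```

Under these assumptions a 3x2 array with permute=0b010 is walked as 0, 2, 4, 1, 3, 5 (a transposed traversal), and a 3x3 array with skip=0b01 produces the repeating sequence 0 0 0 1 1 1 2 2 2 mentioned earlier.
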
## Indexed Mode

Indexed Mode activates reading of the element indices from the GPR
and includes optional limited 2D reordering.
In its simplest form (without elwidth overrides or other modes):

```
def index_remap(i):
    return GPR((SVSHAPE.SVGPR<<1)+i) + SVSHAPE.offset

for i in 0..VL-1:
    element_result = ....
    GPR(RT + index_remap(i)) = element_result
```

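A runnable model of the simplest form is sketched below, with the register file represented as a plain Python list. It assumes that offset is added to the index value read from the GPR, matching the element-width version of the pseudocode. `scatter` and the explicit `gpr` argument are hypothetical names introduced for the example.

```python
def index_remap(gpr, svgpr, offset, i):
    """Read the i-th element index from the GPRs starting at SVGPR<<1."""
    return gpr[(svgpr << 1) + i] + offset

def scatter(gpr, rt, svgpr, offset, results):
    """Store each result at RT plus its remapped element index."""
    for i, value in enumerate(results):
        gpr[rt + index_remap(gpr, svgpr, offset, i)] = value

# usage: indices 2, 0, 1 stored from GPR 8 onwards (SVGPR=4)
gpr = [0] * 32
gpr[8:11] = [2, 0, 1]
scatter(gpr, rt=16, svgpr=4, offset=0, results=[10, 20, 30])
# gpr[16:19] is now [20, 30, 10]
```
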
With element-width overrides included, and using the pseudocode
from the SVP64 [[sv/svp64/appendix#elwidth]] elwidth section,
this becomes:

```
def index_remap(i):
    svreg = SVSHAPE.SVGPR << 1
    srcwid = elwid_to_bitwidth(SVSHAPE.elwid)
    offs = SVSHAPE.offset
    return get_polymorphed_reg(svreg, srcwid, i) + offs

for i in 0..VL-1:
    element_result = ....
    rt_idx = index_remap(i)
    set_polymorphed_reg(RT, destwid, rt_idx, element_result)
```

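The effect of an element-width override on the index vector can be sketched as reading narrow indices packed into 64-bit registers. `get_packed_element` is a hypothetical stand-in for `get_polymorphed_reg`. It assumes elements are packed contiguously with element 0 in the least-significant bits, and takes the width directly in bits rather than as an encoded elwid value.

```python
def get_packed_element(gpr, reg, width_bits, i):
    """Read element i of width_bits from 64-bit registers starting at reg.

    Assumption for illustration: element 0 occupies the least-significant
    bits, and elements continue into the next register when one is full.
    """
    per_reg = 64 // width_bits
    word = gpr[reg + i // per_reg]
    shift = (i % per_reg) * width_bits
    return (word >> shift) & ((1 << width_bits) - 1)

def index_remap(gpr, svgpr, width_bits, offset, i):
    """Indexed remap using narrow indices, offset added after the read."""
    return get_packed_element(gpr, svgpr << 1, width_bits, i) + offset
```

With 8-bit indices, a single 64-bit register holds eight of them, so `index_remap(gpr, 2, 8, 0, i)` for `i` in 0..7 reads successive bytes of GPR 4.
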
Matrix-style reordering still applies to the indices, but is limited
to at most 2 dimensions (X,Y). Ordering is therefore limited to (X,Y) or
(Y,X). Only one dimension may optionally be skipped. Inversion of either
X or Y or both is possible. Pseudocode for Indexed Mode (including elwidth
overrides) may be written in terms of Matrix Mode, specifically
arranged to ensure that the 3rd dimension (Z) has no effect:

```
def index_remap(ISHAPE, i):
    MSHAPE.skip   = 0b0 || ISHAPE.sk1    # || concatenates bits
    MSHAPE.invxyz = 0b0 || ISHAPE.invxy
    MSHAPE.xdimsz = ISHAPE.xdimsz
    MSHAPE.ydimsz = ISHAPE.ydimsz
    MSHAPE.zdimsz = 0 # disabled
    if ISHAPE.permute == 0b110: # 0,1
        MSHAPE.permute = 0b000 # 0,1,2
    if ISHAPE.permute == 0b111: # 1,0
        MSHAPE.permute = 0b010 # 1,0,2
    el_idx = remap_matrix(MSHAPE, i)
    svreg = ISHAPE.SVGPR << 1
    srcwid = elwid_to_bitwidth(ISHAPE.elwid)
    offs = ISHAPE.offset
    return get_polymorphed_reg(svreg, srcwid, el_idx) + offs
```