From b493297f7c3a63932b92849521be2bdc441dc935 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 7 Oct 2019 15:47:35 +0100
Subject: [PATCH]

---
 simple_v_extension/remap.mdwn | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/simple_v_extension/remap.mdwn b/simple_v_extension/remap.mdwn
index a9558db05..f9dafa572 100644
--- a/simple_v_extension/remap.mdwn
+++ b/simple_v_extension/remap.mdwn
@@ -37,14 +37,16 @@ There are three "shape" CSRs, SHAPE0, SHAPE1, SHAPE2, 32-bits in each, which
 have the same format. When each SHAPE CSR is set entirely to zeros,
 remapping is disabled: the register's elements are a linear (1D) vector.
 
-| 29..24 | 23..21 | 20..18 | 17..12 | 11..6 | 5..0 |
-| ------ | ------- | ------- | ------- | -------- | ------- |
-| modulo | invxyz | permute | zdimsz | ydimsz | xdimsz |
+| 31..30 | 29..24 | 23..21 | 20..18 | 17..12 | 11..6 | 5..0 |
+| -------- | ------ | ------- | ------- | ------- | -------- | ------- |
+| applydim |modulo | invxyz | permute | zdimsz | ydimsz | xdimsz |
 
-modulo will cause the output to wrap and remain within the range 0 to modulo. The value zero disables modulus application.
+applydim will set to zero the dimensions less than this. applydim=0 applies all three. applydim=1 applies y and z. applydim=2 applys only z. applydim=3 is reserved.
 
 invxyz will invert the start index of each of x, y or z. If invxyz[0] is zero then x-dimensional counting begins from 0 and increments, otherwise it begins from xdimsz-1 and iterates down to zero. Likewise for y and z.
 
+modulo will cause the output to wrap and remain within the range 0 to modulo. The value zero disables modulus application. Note that modulo arithmetic is applied after all other remapping calculations.
+
 xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
 that the array dimensionality for that dimension is 1.
 A value of xdimsz=2 would indicate that in the first dimension there are 3 elements in the
@@ -79,12 +81,14 @@ shows this more clearly, and may be executed as a python program:
     idxs = [0,0,0] # starting indices
     order = [1,0,2] # experiment with different permutations, here
     modulo = 64 # experiment with different modulus, here
+    applydim=0
     invxyz = [0,0,0]
 
     for idx in range(xdim * ydim * zdim):
         ix = [0] * 3
         for i in range(3):
-            ix[i] = idxs[i]
+            if i >= applydim:
+                ix[i] = idxs[i]
             if invxyz[i]:
                 ix[i] = lims[i] - ix[i]
         new_idx = ix[0] + ix[1] * xdim + ix[2] * xdim * ydim
@@ -198,4 +202,4 @@ At the same time, VL will, because there is no SHAPE on f8, increment straight s
 
 The only other instruction required is to ensure that f4-f7 are initialised (usually to zero).
 
-It should be clear that a 4x4 by 4x4 Matrix Multiply, being effectively the same technique applied to four independent vectors, can be done by setting VL=64, using an extra dimension on the SHAPE CSRs and applying a rotating SHAPE CSR to f8 in order to get it to apply four times to compute the four columns worth of vectors.
+It should be clear that a 4x4 by 4x4 Matrix Multiply, being effectively the same technique applied to four independent vectors, can be done by setting VL=64, using an extra dimension on the SHAPE0 and SHAPE1 CSRs, and applying a rotating 1D SHAPE CSR of xdim=16 to f8 in order to get it to apply four times to compute the four columns worth of vectors.
-- 
2.30.2
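
For review purposes, the behaviour this patch describes can be sketched as a small standalone Python 3 program. This is not the code from `remap.mdwn`, only an illustration under stated assumptions: the names `decode_shape` and `remap_indices` are hypothetical; `invxyz[0]` is assumed to be the lowest bit of the 23..21 field; the permute field is left undecoded because its encoding is not shown in the patch; the counter-advance step (with `order[0]` counting fastest) is assumed, since the hunk does not include it; inversion uses `dim - 1 - ix`, following the prose ("begins from xdimsz-1") rather than the `lims[i] - ix[i]` context line; and modulo is taken as a plain `%` applied last.

```python
def decode_shape(csr):
    """Split a 32-bit SHAPE CSR value into the fields of the patched table."""
    return {
        "xdim": (csr & 0x3F) + 1,          # bits 5..0, stored offset by 1
        "ydim": ((csr >> 6) & 0x3F) + 1,   # bits 11..6, stored offset by 1
        "zdim": ((csr >> 12) & 0x3F) + 1,  # bits 17..12, stored offset by 1
        "permute": (csr >> 18) & 0x7,      # bits 20..18, left undecoded here
        # bits 23..21; assumption: invxyz[0] (x) is the lowest of the three
        "invxyz": [(csr >> (21 + i)) & 1 for i in range(3)],
        "modulo": (csr >> 24) & 0x3F,      # bits 29..24, zero disables
        "applydim": (csr >> 30) & 0x3,     # bits 31..30, value 3 is reserved
    }


def remap_indices(dims, order=(0, 1, 2), invxyz=(0, 0, 0),
                  applydim=0, modulo=0):
    """Yield remapped element indices for a shape of (xdim, ydim, zdim)."""
    xdim, ydim, zdim = dims
    idxs = [0, 0, 0]                       # running x, y, z counters
    for _ in range(xdim * ydim * zdim):
        ix = [0, 0, 0]
        for i in range(3):
            if i >= applydim:              # dimensions below applydim stay zero
                ix[i] = idxs[i]
            if invxyz[i]:                  # count down from dim-1 instead of up
                ix[i] = dims[i] - 1 - ix[i]
        new_idx = ix[0] + ix[1] * xdim + ix[2] * xdim * ydim
        if modulo:                         # applied after everything else
            new_idx %= modulo
        yield new_idx
        # Assumed counter advance: order[0] is the fastest-moving dimension.
        for i in range(3):
            d = order[i]
            idxs[d] += 1
            if idxs[d] < dims[d]:
                break
            idxs[d] = 0


if __name__ == "__main__":
    print(list(remap_indices((3, 4, 1), order=(1, 0, 2))))
    print(list(remap_indices((3, 4, 1), applydim=1)))
```

With `applydim=1` the x counter still advances but contributes nothing to the output, so each y row index is repeated `xdim` times. That matches the patch's statement that `applydim=1` applies only y and z.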