From: lkcl Date: Sun, 16 Apr 2023 10:02:39 +0000 (+0100) Subject: (no commit message) X-Git-Tag: opf_rfc_ls009_v1~37 X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=fe0e85650b2eeaad55bae3ed815769159cbc4c92;p=libreriscv.git --- diff --git a/openpower/sv/remap/appendix.mdwn b/openpower/sv/remap/appendix.mdwn index d4b99c5a2..655c81490 100644 --- a/openpower/sv/remap/appendix.mdwn +++ b/openpower/sv/remap/appendix.mdwn @@ -157,127 +157,6 @@ setting VL=64, using an extra dimension on the SHAPE0 and SHAPE1 SPRs, and applying a rotating 1D SHAPE SPR of xdim=16 to f8 in order to get it to apply four times to compute the four columns worth of vectors. -# Warshall transitive closure algorithm - -TODO move to [[sv/remap/discussion]] page, copied from here -http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-July/003286.html - -with thanks to Hendrik. - - - -> Just a note: interpreting + as 'or', and * as 'and', -> operating on Boolean matrices, -> and having result, X, and Y be the exact same matrix, -> updated while being used, -> gives the traditional Warshall transitive-closure -> algorithm, if the loops are nested exactly in thie order. - -this can be done with the ternary instruction which has -an in-place triple boolean input: - -``` - RT = RT | (RA & RB) -``` - -and also has a CR Field variant of the same - -notes from conversations: - -> > for y in y_r: -> > for x in x_r: -> > for z in z_r: -> > result[y][x] += -> > a[y][z] * -> > b[z][x] - -> This nesting of loops works for matrix multiply, but not for transitive -> closure. - -> > it can be done: -> > -> >   for z in z_r: -> >    for y in y_r: -> >     for x in x_r: -> >       result[y][x] += -> >          a[y][z] * -> >          b[z][x] -> -> And this ordering of loops *does* work for transitive closure, when a, -> b, and result are the very same matrix, updated while being used. -> -> By the way, I believe there is a graph algorithm that does the -> transitive closure thing, but instead of using boolean, "and", and "or", -> they use real numbers, addition, and minimum.  I think that one computes -> shortest paths between vertices. -> -> By the time the z'th iteration of the z loop begins, the algorithm has -> already peocessed paths that go through vertices numbered < z, and it -> adds paths that go through vertices numbered z. -> -> For this to work, the outer loop has to be the one on the subscript that -> bridges a and b (which in this case are teh same matrix, of course). - -# SUBVL Remap - -Remapping of SUBVL (vec2/3/4) elements is not permitted: the vec2/3/4 -itself must be considered to be the "element". To perform REMAP -on the elements of a vec2/3/4, either use [[sv/mv.swizzle]], or, -due to the sub-elements themselves being contiguous, treat them as -such and use Indexing, or add one -extra dimension to Matrix REMAP, the inner dimension being the size -of the Subvector (2, 3, or 4). - -Note that Swizzle on Sub-vectors may be applied on top of REMAP. -Where this is appropriate is the Rijndael MixColumns -stage: - - - -Assuming that the bytes are stored `a00 a01 a02 a03 a10 .. a33` -a 2D REMAP allows: - -* the column bytes (as a vec4) to be iterated over as an inner loop, - progressing vertically (`a00 a10 a20 a30`) -* the columns themselves to be iterated as an outer loop -* a 32 bit `GF(256)` Matrix Multiply on the vec4 to be performed. - -This entirely in-place without special 128-bit opcodes. Below is -the pseudocode for [[!wikipedia Rijndael MixColumns]] - -``` -void gmix_column(unsigned char *r) { - unsigned char a[4]; - unsigned char b[4]; - unsigned char c; - unsigned char h; - // no swizzle here but vec4 byte-level - // elwidth overrides can be done though. - for (c = 0; c < 4; c++) { - a[c] = r[c]; - h = (unsigned char)((signed char)r[c] >> 7); - b[c] = r[c] << 1; - b[c] ^= 0x1B & h; /* Rijndael's Galois field */ - } - // These may then each be 4x 8bit Swizzled - // r0.vec4 = b.vec4 - // r0.vec4 ^= a.vec4.WXYZ - // r0.vec4 ^= a.vec4.ZWXY - // r0.vec4 ^= b.vec4.YZWX ^ a.vec4.YZWX - r[0] = b[0] ^ a[3] ^ a[2] ^ b[1] ^ a[1]; - r[1] = b[1] ^ a[0] ^ a[3] ^ b[2] ^ a[2]; - r[2] = b[2] ^ a[1] ^ a[0] ^ b[3] ^ a[3]; - r[3] = b[3] ^ a[2] ^ a[1] ^ b[0] ^ a[0]; -} -``` - -The application of the swizzles allows the remapped vec4 a, b and r -variables to perform four straight linear 32 bit XOR operations where a -scalar processor would be required to perform 16 byte-level individual -operations. Given wide enough SIMD backends in hardware these 3 bit -XORs may be done as single-cycle operations across the entire 128 bit -Rijndael Matrix. - -The other alternative is to simply perform the actual 4x4 GF(256) Matrix -Multiply using the MDS Matrix. +--------- +\newpage{}