From b45c5a52d33405fb5a057205c7a41debfaa46290 Mon Sep 17 00:00:00 2001
From: Konstantinos Margaritis
Date: Thu, 27 Apr 2023 09:32:36 +0000
Subject: [PATCH] fix tables

---
 openpower/sv/cookbook/chacha20.mdwn | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/openpower/sv/cookbook/chacha20.mdwn b/openpower/sv/cookbook/chacha20.mdwn
index fec2b730f..959326e39 100644
--- a/openpower/sv/cookbook/chacha20.mdwn
+++ b/openpower/sv/cookbook/chacha20.mdwn
@@ -138,6 +138,7 @@ to the beginning of the loop does not occur automatically though, a branch instr
 
 Let's assume the values `x` in the registers 24-36
 
+|--------|-----|-----|-----|-----|
 | GPR 24 | x0 | x1 | x2 | x3 |
 | GPR 28 | x4 | x5 | x6 | x7 |
 | GPR 32 | x8 | x9 | x10 | x11 |
@@ -146,6 +147,7 @@ Let's assume the values `x` in the registers 24-36
 So for the addition in Vertical-First mode, `RT` (and `RA` as they are the
 same) indices are (in terms of x):
 
+|----|----|----|----|----|----|----|----|
 | 0 | 8 | 0 | 8 | 1 | 9 | 1 | 9 |
 | 2 | 10 | 2 | 10 | 3 | 11 | 3 | 11 |
 | 0 | 10 | 0 | 10 | 1 | 11 | 1 | 11 |
@@ -156,10 +158,12 @@ register for a single index value is a waste so we will compress them, 8
 indices in a 64-bit register:
 So, `RT` indices will fit inside these 4 registers (in Little Endian format):
 
- SVSHAPE0: | 0x901090108000800 | 0xb030b030a020a02 | 0xb010b010a000a00 | 0x903090308020802 |
+ |-----------|-------------------|-------------------|-------------------|-------------------|
+ | SVSHAPE0: | 0x901090108000800 | 0xb030b030a020a02 | 0xb010b010a000a00 | 0x903090308020802 |
 
 Similarly we find the RB indices:
 
+|----|----|----|----|----|----|----|----|
 | 4 | 12 | 4 | 12 | 5 | 13 | 5 | 13 |
 | 6 | 14 | 6 | 14 | 7 | 15 | 7 | 15 |
 | 5 | 15 | 5 | 15 | 6 | 12 | 6 | 12 |
@@ -167,7 +171,8 @@ Similarly we find the RB indices:
 
 Using a similar method, we find the final 4 registers with the `RB` indices:
 
- SVSHAPE1: | 0xd050d050c040c04 | 0xf070f070e060e06 | 0xc060c060f050f05 | 0xe040e040d070d07 |
+ |-----------|-------------------|-------------------|-------------------|-------------------|
+ | SVSHAPE1: | 0xd050d050c040c04 | 0xf070f070e060e06 | 0xc060c060f050f05 | 0xe040e040d070d07 |
 
 Now, we can construct the Vertical First loop:
 
@@ -278,6 +283,7 @@ We will need to create another set of indices for the `XOR` instructions. We
 will only need one set as the other set of indices is the same as `RT`
 for `sv.add` (`SHAPE0`). So, remembering that our
 
+|----|----|----|----|----|----|----|----|
 | 12 | 4 | 12 | 4 | 13 | 5 | 13 | 5 |
 | 14 | 6 | 14 | 6 | 15 | 7 | 15 | 7 |
 | 15 | 5 | 15 | 5 | 12 | 6 | 12 | 6 |
@@ -285,7 +291,8 @@ for `sv.add` (`SHAPE0`). So, remembering that our
 
 Again, we find
 
- SVSHAPE2: | 0x50d050d040c040c | 0x70f070f060e060e | 0x60c060c050f050f | 0x40e040e070d070d |
+ |-----------|-------------------|-------------------|-------------------|-------------------|
+ | SVSHAPE2: | 0x50d050d040c040c | 0x70f070f060e060e | 0x60c060c050f050f | 0x40e040e070d070d |
 
 The next operation is the `ROTATE` which takes as operand the result of the
 `XOR` and a shift argument. You can easily see that the indices used in this
@@ -315,7 +322,8 @@ So, in a similar fashion, we instruct `XOR` (`sv.xor`) to use `SVSHAPE2` for
 (the shift values, which cycle every 4 elements). Note that the actual
 indices for `SVSHAPE3` will have to be in 32-bit elements:
 
- SHIFTS: | 0x0000000c00000010 | 0x0000000700000008 |
+ |---------|--------------------|--------------------|
+ | SHIFTS: | 0x0000000c00000010 | 0x0000000700000008 |
 
 The complete algorithm for a loop with 10 iterations is as follows:
 
-- 
2.30.2
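
The packed register values in the tables touched by this patch can be cross-checked against the index tables with a short script. The following is a minimal Python sketch and is not part of the patch. `pack_indices` is a hypothetical helper name. The sketch assumes that "Little Endian format" means the first index of each row occupies the least significant element of the register, with 8-bit elements for the `SVSHAPE0`-`SVSHAPE2` rows and 32-bit elements for the shift values. The rotate amounts 16, 12, 8, 7 are the standard ChaCha20 quarter-round rotations, inferred here from the `SHIFTS` constants rather than stated in the hunks.

```python
def pack_indices(indices, width_bits=8):
    """Pack small integers into one register value.

    The first index lands in the least significant element
    (assumed meaning of "Little Endian format" in the text).
    """
    value = 0
    for position, index in enumerate(indices):
        if not 0 <= index < (1 << width_bits):
            raise ValueError(f"index {index} does not fit in {width_bits} bits")
        value |= index << (position * width_bits)
    return value


# Rows of the index tables as they appear in the diff context
# (the fourth row of each table lies outside the hunks and is omitted).
rt_rows = [
    [0, 8, 0, 8, 1, 9, 1, 9],
    [2, 10, 2, 10, 3, 11, 3, 11],
    [0, 10, 0, 10, 1, 11, 1, 11],
]
rb_rows = [
    [4, 12, 4, 12, 5, 13, 5, 13],
    [6, 14, 6, 14, 7, 15, 7, 15],
    [5, 15, 5, 15, 6, 12, 6, 12],
]

for name, rows in (("SVSHAPE0", rt_rows), ("SVSHAPE1", rb_rows)):
    packed = [hex(pack_indices(row)) for row in rows]
    print(name, packed)

# Shift values use 32-bit elements, two per 64-bit register.
shifts = [pack_indices([16, 12], 32), pack_indices([8, 7], 32)]
print("SHIFTS", [f"0x{value:016x}" for value in shifts])
```

Under these assumptions the first three values printed for each of `SVSHAPE0` and `SVSHAPE1` agree with the first three columns of the corresponding table rows in the patch, and the `SHIFTS` values agree with both constants.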