| GPR 24 | x0 | x1 | x2 | x3 |
| GPR 28 | x4 | x5 | x6 | x7 |
| GPR 32 | x8 | x9 | x10 | x11 |
So, for the addition in Vertical-First mode, the `RT` indices (and the `RA`
indices, since they are the same) are, in terms of x:
| 0 | 8 | 0 | 8 | 1 | 9 | 1 | 9 |
| 2 | 10 | 2 | 10 | 3 | 11 | 3 | 11 |
| 0 | 10 | 0 | 10 | 1 | 11 | 1 | 11 |
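
The index rows above can be reproduced with a short Python sketch. It assumes the standard ChaCha20 double round (four column quarter rounds followed by four diagonal ones), which this section does not list explicitly; the names `QUARTER_ROUNDS` and `add_rt_indices` are hypothetical helpers introduced only for illustration.

```python
# Standard ChaCha20 double round: four column rounds, then four diagonal
# rounds. Each tuple is (a, b, c, d), indices into the 16-word state x.
# (Assumption: taken from the ChaCha20 definition, not from this section.)
QUARTER_ROUNDS = [
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
]


def add_rt_indices(quarter_rounds=QUARTER_ROUNDS):
    """Return the destination index of every addition, in execution order.

    A quarter round performs a += b, c += d, a += b, c += d (interleaved
    with XORs and rotates), so the destinations are a, c, a, c.
    """
    indices = []
    for a, _b, c, _d in quarter_rounds:
        indices.extend([a, c, a, c])
    return indices


rt = add_rt_indices()

# Print 8 indices per row, in the same layout as the table.
for start in range(0, len(rt), 8):
    print("| " + " | ".join(str(i) for i in rt[start:start + 8]) + " |")
```

The first three rows it prints match the rows of the table above; because `RA` uses the same indices as `RT` for the addition, no separate list is needed for it.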
8 indices in a 64-bit register:
So, `RT` indices will fit inside these 4 registers (in Little Endian format):
| 4 | 12 | 4 | 12 | 5 | 13 | 5 | 13 |
| 6 | 14 | 6 | 14 | 7 | 15 | 7 | 15 |
| 5 | 15 | 5 | 15 | 6 | 12 | 6 | 12 |
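
The packing of 8 indices per 64-bit register mentioned earlier can be sketched in Python, using the rows above as input. This is a minimal illustration, assuming each index occupies one byte (8 x 8 bits = 64 bits) and that "Little Endian" means the first index of a row lands in the least significant byte; `pack_indices_le` is a hypothetical helper, not part of any toolchain.

```python
def pack_indices_le(indices, per_register=8, bits=8):
    """Pack small index values into 64-bit words, first element lowest.

    Assumption: one byte per index, first index in the least significant
    byte of each word.
    """
    if per_register * bits != 64:
        raise ValueError("a register must hold exactly 64 bits")
    registers = []
    for start in range(0, len(indices), per_register):
        word = 0
        for pos, idx in enumerate(indices[start:start + per_register]):
            if not 0 <= idx < (1 << bits):
                raise ValueError(f"index {idx} does not fit in {bits} bits")
            word |= idx << (bits * pos)
        registers.append(word)
    return registers


# The three rows of indices shown in the table above.
rb_rows = [
    4, 12, 4, 12, 5, 13, 5, 13,
    6, 14, 6, 14, 7, 15, 7, 15,
    5, 15, 5, 15, 6, 12, 6, 12,
]

packed = pack_indices_le(rb_rows)
for word in packed:
    print(f"0x{word:016x}")
```

Each table row becomes one 64-bit constant; the first row, for example, packs to `0x0d050d050c040c04`. The same result can be obtained with `int.from_bytes(bytes(row), "little")`.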
will only need one set as the other set of indices is the same as `RT`
for `sv.add` (`SHAPE0`). So, remembering that our
| 12 | 4 | 12 | 4 | 13 | 5 | 13 | 5 |
| 14 | 6 | 14 | 6 | 15 | 7 | 15 | 7 |
| 15 | 5 | 15 | 5 | 12 | 6 | 12 | 6 |
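
The observation that the XOR needs only one new set of indices can be checked with a small sketch. It assumes the standard ChaCha20 quarter-round step order (`a += b; d ^= a; c += d; b ^= c`, performed twice, with the rotates left out here); `QUARTER_ROUNDS`, `QR_STEPS` and `operand_indices` are hypothetical names used only for this illustration.

```python
# (a, b, c, d) state indices for the column and diagonal quarter rounds.
# Assumption: standard ChaCha20 double round, not listed in this section.
QUARTER_ROUNDS = [
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
]

# Add and XOR steps of one quarter round as (operation, destination, source).
# The rotate after each XOR is omitted because it uses no second state index.
QR_STEPS = [
    ("add", "a", "b"), ("xor", "d", "a"),
    ("add", "c", "d"), ("xor", "b", "c"),
    ("add", "a", "b"), ("xor", "d", "a"),
    ("add", "c", "d"), ("xor", "b", "c"),
]


def operand_indices(op):
    """Return (destination, source) index lists for every `op` step."""
    dest, src = [], []
    for qr in QUARTER_ROUNDS:
        names = dict(zip("abcd", qr))
        for kind, d, s in QR_STEPS:
            if kind == op:
                dest.append(names[d])
                src.append(names[s])
    return dest, src


add_rt, add_rb = operand_indices("add")
xor_rt, xor_rb = operand_indices("xor")

# The XOR's second operand walks the same indices as the addition's RT.
print(xor_rb == add_rt)
print(xor_rt[:8])
```

Under these assumptions the XOR source indices coincide with the addition's `RT` indices, so only the XOR destination indices (the rows shown above) form a new set.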
The next operation is the `ROTATE`, which takes as operands the result of the
`XOR` and a shift argument. You can easily see that the indices used in this
(the shift values, which cycle every 4 elements). Note that the actual
indices for `SVSHAPE3` will have to be in 32-bit elements: