|    |    |    |    |    |    |    |    |
|----|----|----|----|----|----|----|----|
| 0  | 8  | 0  | 8  | 1  | 9  | 1  | 9  |
| 2  | 10 | 2  | 10 | 3  | 11 | 3  | 11 |
| 0  | 10 | 0  | 10 | 1  | 11 | 1  | 11 |
| 2  | 8  | 2  | 8  | 3  | 9  | 3  | 9  |

However, since the indices are small values, using a whole 64-bit
register for each index is wasteful, so we compress them, packing 8
one-byte indices into each 64-bit register (first index in the least
significant byte):

|           |                   |                   |                   |                   |
|-----------|-------------------|-------------------|-------------------|-------------------|
| SVSHAPE0: | 0x901090108000800 | 0xb030b030a020a02 | 0xb010b010a000a00 | 0x903090308020802 |

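
The packing can be checked with a short Python sketch. It assumes, as the register values suggest, one byte per index with the first index in the least significant byte. The helper name `pack_indices` is illustrative only and is not part of any existing tool.

```python
def pack_indices(indices):
    """Pack 8 small indices into one 64-bit value.

    Assumption: one byte per index, first index in the least
    significant byte (little-endian).
    """
    assert len(indices) == 8 and all(0 <= i < 256 for i in indices)
    return int.from_bytes(bytes(indices), "little")

# Rows of the first index table, copied from the text above
rows = [
    [0, 8, 0, 8, 1, 9, 1, 9],
    [2, 10, 2, 10, 3, 11, 3, 11],
    [0, 10, 0, 10, 1, 11, 1, 11],
    [2, 8, 2, 8, 3, 9, 3, 9],
]

svshape0 = [pack_indices(row) for row in rows]
for word in svshape0:
    # Printed with all 16 hex digits; the table omits the leading zero
    print(f"{word:#018x}")
```

Printing all 16 hex digits makes the byte layout easier to read: each pair of digits, read from right to left, is one index.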
Similarly we find the RB indices:

|    |    |    |    |    |    |    |    |
|----|----|----|----|----|----|----|----|
| 4  | 12 | 4  | 12 | 5  | 13 | 5  | 13 |
| 6  | 14 | 6  | 14 | 7  | 15 | 7  | 15 |
| 5  | 15 | 5  | 15 | 6  | 12 | 6  | 12 |
| 7  | 13 | 7  | 13 | 4  | 14 | 4  | 14 |

Using a similar method, we find the final 4 registers with the `RB` indices:

|           |                   |                   |                   |                   |
|-----------|-------------------|-------------------|-------------------|-------------------|
| SVSHAPE1: | 0xd050d050c040c04 | 0xf070f070e060e06 | 0xc060c060f050f05 | 0xe040e040d070d07 |

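
Both index tables match the access pattern of a ChaCha20 double round (four column quarter-rounds followed by four diagonal ones, as in RFC 8439). The sketch below derives them from that pattern, assuming that this is the algorithm being vectorised and that each quarter-round performs its additions in the order `a += b`, `c += d`, `a += b`, `c += d`. The function names are illustrative.

```python
# Quarter-rounds (a, b, c, d) of one ChaCha20 double round:
# four column rounds, then four diagonal rounds (RFC 8439).
QUARTER_ROUNDS = [
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
]

def add_indices():
    """Return (dest, src) index lists for the 32 additions.

    Each quarter-round does a += b, c += d, a += b, c += d, so the
    destinations are a, c, a, c and the sources are b, d, b, d.
    """
    dest, src = [], []
    for a, b, c, d in QUARTER_ROUNDS:
        dest += [a, c, a, c]
        src += [b, d, b, d]
    return dest, src

def pack(indices):
    """Pack indices 8 per 64-bit word, one byte each, little-endian."""
    return [int.from_bytes(bytes(indices[i:i + 8]), "little")
            for i in range(0, len(indices), 8)]

dest, src = add_indices()
print([hex(w) for w in pack(dest)])  # compare with SVSHAPE0
print([hex(w) for w in pack(src)])   # compare with SVSHAPE1
```

Deriving the tables this way is also a quick check on hand-written index rows, since a single wrong index changes the packed register value.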
Now, we can construct the Vertical First loop:

The `XOR` needs one more set of indices, for the values it updates:

|    |    |    |    |    |    |    |    |
|----|----|----|----|----|----|----|----|
| 12 | 4  | 12 | 4  | 13 | 5  | 13 | 5  |
| 14 | 6  | 14 | 6  | 15 | 7  | 15 | 7  |
| 15 | 5  | 15 | 5  | 12 | 6  | 12 | 6  |
| 13 | 7  | 13 | 7  | 14 | 4  | 14 | 4  |

Again, we find the packed registers:

|           |                   |                   |                   |                   |
|-----------|-------------------|-------------------|-------------------|-------------------|
| SVSHAPE2: | 0x50d050d040c040c | 0x70f070f060e060e | 0x60c060c050f050f | 0x40e040e070d070d |

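The next operation is the `ROTATE`, which takes as operands the result of the
`XOR` and a shift argument. You can easily see that the indices used in this
case are the same as for the `XOR`. The shift amounts cycle through 16, 12, 8
and 7, and are packed as 32-bit values into two 64-bit registers: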

|         |                    |                    |
|---------|--------------------|--------------------|
| SHIFTS: | 0x0000000c00000010 | 0x0000000700000008 |

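
As a sanity check on all four tables, the following Python sketch models one pass of 32 add/xor/rotate steps driven purely by the packed registers, and compares it with a plain ChaCha20 double round. It is a behavioural model under stated assumptions, not the SVP64 code: it assumes byte-wide indices and 32-bit shift values stored little-endian, that the second `XOR` operand is the result of the preceding addition (as in the RFC 8439 quarter-round), and that the shift value is selected by the step number modulo 4. All function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def unpack(words, width):
    """Split each 64-bit word into little-endian fields of `width` bytes."""
    fields = []
    for word in words:
        raw = word.to_bytes(8, "little")
        for i in range(0, 8, width):
            fields.append(int.from_bytes(raw[i:i + width], "little"))
    return fields

# Register contents copied from the tables above
SVSHAPE0 = unpack([0x901090108000800, 0xb030b030a020a02,
                   0xb010b010a000a00, 0x903090308020802], 1)
SVSHAPE1 = unpack([0xd050d050c040c04, 0xf070f070e060e06,
                   0xc060c060f050f05, 0xe040e040d070d07], 1)
SVSHAPE2 = unpack([0x50d050d040c040c, 0x70f070f060e060e,
                   0x60c060c050f050f, 0x40e040e070d070d], 1)
SHIFTS = unpack([0x0000000c00000010, 0x0000000700000008], 4)

def rotl32(value, count):
    """Rotate a 32-bit value left by `count` bits."""
    return ((value << count) | (value >> (32 - count))) & MASK32

def double_round_indexed(state):
    """Run add, xor, rotate once per step, driven by the index tables."""
    x = list(state)
    for step in range(32):
        t, b, d = SVSHAPE0[step], SVSHAPE1[step], SVSHAPE2[step]
        x[t] = (x[t] + x[b]) & MASK32
        # Assumption: the xor combines x[d] with the fresh sum x[t]
        x[d] = rotl32(x[d] ^ x[t], SHIFTS[step % 4])
    return x

def quarter_round(x, a, b, c, d):
    """Reference ChaCha20 quarter-round (RFC 8439), updating x in place."""
    for dst, src, tgt, count in ((a, b, d, 16), (c, d, b, 12),
                                 (a, b, d, 8), (c, d, b, 7)):
        x[dst] = (x[dst] + x[src]) & MASK32
        x[tgt] = rotl32(x[tgt] ^ x[dst], count)

def double_round_reference(state):
    """Four column quarter-rounds followed by four diagonal ones."""
    x = list(state)
    for a, b, c, d in ((0, 4, 8, 12), (1, 5, 9, 13),
                       (2, 6, 10, 14), (3, 7, 11, 15),
                       (0, 5, 10, 15), (1, 6, 11, 12),
                       (2, 7, 8, 13), (3, 4, 9, 14)):
        quarter_round(x, a, b, c, d)
    return x

# Arbitrary 16-word test state (not a real ChaCha20 key setup)
state = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(16)]
print(double_round_indexed(state) == double_round_reference(state))
```

The comparison prints `True` when the tables reproduce the reference double round. Repeating `double_round_indexed` 10 times then gives the 20 rounds of ChaCha20.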
The complete algorithm for a loop with 10 iterations is as follows: