Let's assume the values `x` are laid out in the registers 24-36 as follows:
-| h1 | r0 | r1 | r2 | r3 |
+| GPR # | | | | |
|--------|-----|-----|-----|-----|
-| GPR 24 | x0 | x1 | x2 | x3 |
-| GPR 28 | x4 | x5 | x6 | x7 |
-| GPR 32 | x8 | x9 | x10 | x11 |
-| GPR 36 | x12 | x13 | x14 | x15 |
+| 24 | x0 | x1 | x2 | x3 |
+| 28 | x4 | x5 | x6 | x7 |
+| 32 | x8 | x9 | x10 | x11 |
+| 36 | x12 | x13 | x14 | x15 |
So for the addition in Vertical-First mode, the `RT` indices (and the `RA`
indices, as they are the same) are, in terms of `x`:
+| | | | | | | | |
|----|----|----|----|----|----|----|----|
| 0 | 8 | 0 | 8 | 1 | 9 | 1 | 9 |
|----|----|----|----|----|----|----|----|
8 indices in a 64-bit register:
So, the `RT` indices will fit inside these 4 registers (in little-endian format):
+| | | | | |
|-----------|-------------------|-------------------|-------------------|-------------------|
| SVSHAPE0: | 0x901090108000800 | 0xb030b030a020a02 | 0xb010b010a000a00 | 0x903090308020802 |
|-----------|-------------------|-------------------|-------------------|-------------------|
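The packing above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `pack_indices` and its parameters are hypothetical and not part of any Libre-SOC tooling. It assumes 8-bit indices, with the first index placed in the least significant byte of each 64-bit register.

```python
def pack_indices(indices, per_reg=8, width_bits=8):
    """Pack small index values into 64-bit words.

    Hypothetical helper: the first index of each group goes into the
    least significant byte (little-endian element order).
    """
    mask = (1 << width_bits) - 1
    regs = []
    for base in range(0, len(indices), per_reg):
        word = 0
        for pos, idx in enumerate(indices[base:base + per_reg]):
            word |= (idx & mask) << (pos * width_bits)
        regs.append(word)
    return regs


# First eight RT indices from the table above
rt_first = [0, 8, 0, 8, 1, 9, 1, 9]
print([hex(r) for r in pack_indices(rt_first)])  # ['0x901090108000800']
```

Note that `hex()` drops the leading zero of the top byte, which is why the values in the table have 15 hex digits rather than 16.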
Similarly, we find the `RB` indices:
+| | | | | | | | |
|----|----|----|----|----|----|----|----|
| 4 | 12 | 4 | 12 | 5 | 13 | 5 | 13 |
|----|----|----|----|----|----|----|----|
Using a similar method, we find the final 4 registers with the `RB` indices:
+| | | | | |
|-----------|-------------------|-------------------|-------------------|-------------------|
| SVSHAPE1: | 0xd050d050c040c04 | 0xf070f070e060e06 | 0xc060c060f050f05 | 0xe040e040d070d07 |
|-----------|-------------------|-------------------|-------------------|-------------------|
will only need one set as the other set of indices is the same as `RT`
for `sv.add` (`SHAPE0`). So, remembering that our
+| | | | | | | | |
|----|----|----|----|----|----|----|----|
| 12 | 4 | 12 | 4 | 13 | 5 | 13 | 5 |
|----|----|----|----|----|----|----|----|
Again, we find
+| | | | | |
|-----------|-------------------|-------------------|-------------------|-------------------|
| SVSHAPE2: | 0x50d050d040c040c | 0x70f070f060e060e | 0x60c060c050f050f | 0x40e040e070d070d |
|-----------|-------------------|-------------------|-------------------|-------------------|
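As a sanity check on the two tables, the packed values can be decoded back into index lists. The sketch below uses a hypothetical helper, `unpack_indices`, and relies on an observation drawn from the listed values themselves rather than a rule stated in the text: each `SVSHAPE2` register holds the same indices as the corresponding `SVSHAPE1` register, with every adjacent pair swapped.

```python
def unpack_indices(word):
    """Decode a 64-bit word into eight 8-bit indices, least significant byte first."""
    return list(word.to_bytes(8, "little"))


def swap_pairs(values):
    """Swap each adjacent pair: [a, b, c, d] -> [b, a, d, c]."""
    out = list(values)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


svshape1 = [0xd050d050c040c04, 0xf070f070e060e06, 0xc060c060f050f05, 0xe040e040d070d07]
svshape2 = [0x50d050d040c040c, 0x70f070f060e060e, 0x60c060c050f050f, 0x40e040e070d070d]

print(unpack_indices(svshape1[0]))  # [4, 12, 4, 12, 5, 13, 5, 13]
print(unpack_indices(svshape2[0]))  # [12, 4, 12, 4, 13, 5, 13, 5]

# Observation only: every SVSHAPE2 word is its SVSHAPE1 counterpart with pairs swapped
for s1, s2 in zip(svshape1, svshape2):
    assert swap_pairs(unpack_indices(s1)) == unpack_indices(s2)
```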
(the shift values, which cycle every 4 elements). Note that the actual
indices for `SVSHAPE3` will have to be in 32-bit elements:
+| | | |
|---------|--------------------|--------------------|
| SHIFTS: | 0x0000000c00000010 | 0x0000000700000008 |
|---------|--------------------|--------------------|
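To see what the two `SHIFTS` registers contain, they can be split into 32-bit elements, lowest element first. This is a minimal sketch with a hypothetical helper name, `unpack_elements`. The decoded values are the familiar ChaCha20 quarter-round rotation amounts.

```python
def unpack_elements(word, width_bits=32):
    """Split a 64-bit word into elements of width_bits, least significant first."""
    mask = (1 << width_bits) - 1
    return [(word >> shift) & mask for shift in range(0, 64, width_bits)]


shifts = []
for reg in (0x0000000c00000010, 0x0000000700000008):
    shifts.extend(unpack_elements(reg))

print(shifts)  # [16, 12, 8, 7]
```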