four separate registers in any given operation may be simultaneously
remapped:
- function op_add(rd, rs1, rs2) # add not VADD!
+```
+ function op_add(RT, RA, RB) # add not VADD!
...
...
- for (i = 0; i < VL; i++)
- xSTATE.srcoffs = i # save context
- if (predval & 1<<i) # predication uses intregs
- ireg[rd+remap1(id)] <= ireg[rs1+remap2(irs1)] +
- ireg[rs2+remap3(irs2)];
- if (!int_vec[rd ].isvector) break;
- if (int_vec[rd ].isvector) { id += 1; }
- if (int_vec[rs1].isvector) { irs1 += 1; }
- if (int_vec[rs2].isvector) { irs2 += 1; }
+ for (i=0,id=0,irs1=0,irs2=0; i < VL; i++)
+   SVSTATE.srcstep = i # save context
+   if (predval & 1<<i) # predication mask
+     GPR[RT+remap1(id)] <= GPR[RA+remap2(irs1)] +
+                           GPR[RB+remap3(irs2)];
+     if (!int_vec[RT].isvector) break;
+   if (int_vec[RT].isvector) { id += 1; }
+   if (int_vec[RA].isvector) { irs1 += 1; }
+   if (int_vec[RB].isvector) { irs2 += 1; }
+```
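The loop above can also be sketched in Python to make the three independent index streams concrete. This is only an illustration, not a reference model: `remapped_add`, the `is_vector` dictionary and the `identity`/`reverse` remap functions are hypothetical names, and each remap is assumed to be a plain function from element index to register offset.

```python
def remapped_add(gpr, RT, RA, RB, VL, is_vector, remap1, remap2, remap3,
                 predval=None):
    """Element loop of a scalar add, with one remap per operand.

    gpr       -- list modelling the integer register file
    is_vector -- dict: register number -> True if marked as a vector
    remapN    -- hypothetical remap functions: element index -> offset
    predval   -- predicate bitmask (defaults to all elements enabled)
    """
    if predval is None:
        predval = (1 << VL) - 1
    i_d = i_rs1 = i_rs2 = 0
    for i in range(VL):
        if predval & (1 << i):
            gpr[RT + remap1(i_d)] = (gpr[RA + remap2(i_rs1)] +
                                     gpr[RB + remap3(i_rs2)])
            if not is_vector[RT]:
                break  # scalar destination: stop after the first result
        # each operand steps its own element index only if it is a vector
        if is_vector[RT]:
            i_d += 1
        if is_vector[RA]:
            i_rs1 += 1
        if is_vector[RB]:
            i_rs2 += 1
    return gpr


VL = 4
identity = lambda i: i
reverse = lambda i: VL - 1 - i   # assumed remap: walk the operand backwards

gpr = list(range(32))            # register n initially holds the value n
all_vectors = {8: True, 16: True, 24: True}
remapped_add(gpr, 8, 16, 24, VL, all_vectors, identity, reverse, identity)
print(gpr[8:12])                 # RA is read in reverse, RT and RB in order
```

Only the RA operand is remapped here, so each result element pairs `gpr[16 + 3 - i]` with `gpr[24 + i]`; swapping the functions passed as `remap1` to `remap3` changes the access pattern without touching the loop itself.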
By changing remappings, 2D matrices may be transposed "in-place" for one
operation, followed by setting a different permutation order without
straight sequentially through the 16 values f8-f23 in the Matrix. The
equivalent sequence issued is thus:
+```
fmac f4, f0, f8, f4
fmac f5, f0, f9, f5
fmac f6, f0, f10, f6
fmac f5, f3, f21, f5
fmac f6, f3, f22, f6
fmac f7, f3, f23, f7
+```
The only other instruction required is to ensure that f4-f7 are
initialised (usually to zero).
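
The pattern behind that sequence can be reproduced with a short Python sketch. It assumes the register layout the visible lines suggest: f4-f7 as accumulators, f0-f3 as the stepped operand, and f8-f23 as the 16 matrix values read in straight sequential order. The function name `fmac_sequence` and its parameters are hypothetical, and the middle of the sequence is inferred from the first and last instructions shown rather than taken from the text.

```python
def fmac_sequence(acc_base=4, vec_base=0, mat_base=8, size=4):
    """Return the assumed expanded fmac instructions as strings."""
    ops = []
    for j in range(size):        # outer step: f0, f1, f2, f3
        for i in range(size):    # inner step: accumulators f4..f7
            acc = acc_base + i
            mat = mat_base + j * size + i   # sequential walk f8..f23
            ops.append("fmac f%d, f%d, f%d, f%d"
                       % (acc, vec_base + j, mat, acc))
    return ops


for line in fmac_sequence():
    print(line)
```

Under these assumptions the first and last three generated lines match the instructions listed above, and each accumulator f4-f7 is both read and written four times, which is why it needs initialising first.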
this can be done with the ternary instruction which has
an in-place triple boolean input:
+```
RT = RT | (RA & RB)
+```
and also has a CR Field variant of the same