From 6b631bef514c3005a04ba29f1426dc8fccdff493 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Fri, 25 Dec 2020 16:46:29 +0000
Subject: [PATCH] rename regs to RA, RT, RB, RS

---
 openpower/sv/overview.mdwn | 74 +++++++++++++++++++-------------------
 1 file changed, 37 insertions(+), 37 deletions(-)

diff --git a/openpower/sv/overview.mdwn b/openpower/sv/overview.mdwn
index f87e7e98b..3d8f3b5e0 100644
--- a/openpower/sv/overview.mdwn
+++ b/openpower/sv/overview.mdwn
@@ -72,16 +72,16 @@ All of this is *without modifying the OpenPOWER v3.0B ISA*, except to add "wrapp
 The first augmentation to the simple loop is to add the option for all source and destinations to all be either scalar or vector. As a FSM this is where our "simple" loop gets its first complexity.
 
-    function op_add(rd, rs1, rs2) # add not VADD!
+    function op_add(RT, RA, RB) # add not VADD!
       int id=0, irs1=0, irs2=0;
       for i = 0 to VL-1:
-        ireg[rd+id] <= ireg[rs1+irs1] + ireg[rs2+irs2];
-        if (!rd.isvec) break;
-        if (rd.isvec) { id += 1; }
-        if (rs1.isvec) { irs1 += 1; }
-        if (rs2.isvec) { irs2 += 1; }
+        ireg[RT+id] <= ireg[RA+irs1] + ireg[RB+irs2];
+        if (!RT.isvec) break;
+        if (RT.isvec) { id += 1; }
+        if (RA.isvec) { irs1 += 1; }
+        if (RB.isvec) { irs2 += 1; }
 
-With some walkthroughs it is clear that the loop exits immediately after the first scalar destination result is written, and that when the destination is a Vector the loop proceeds to fill up the register file, sequentially, starting at `rd` and ending at `rd+VL-1`. The two source registers will, independently, either remain pointing at `rs1` or `rs2` respectively, or, if marked as Vectors, will march incrementally in lockstep, producing element results along the way, as the destination also progresses through elements.
+With some walkthroughs it is clear that the loop exits immediately after the first scalar destination result is written, and that when the destination is a Vector the loop proceeds to fill up the register file, sequentially, starting at `rd` and ending at `rd+VL-1`. The two source registers will, independently, either remain pointing at `RB` or `RA` respectively, or, if marked as Vectors, will march incrementally in lockstep, producing element results along the way, as the destination also progresses through elements.
 
 In this way all the eight permutations of Scalar and Vector behaviour are covered, although without predication the scalar-destination ones are reduced in usefulness. It does however clearly illustrate the principle.
 
@@ -91,16 +91,16 @@ Note in particular: there is no separate Scalar add instruction and separate Vec
 The next step is to add a single predicate mask. This is where it gets interesting. Predicate masks are a bitvector, each bit specifying, in order, whether the element operation is to be skipped ("masked out") or allowed. If there is no predicate, it is set to all 1s, which is effectively the same as "no predicate".
 
-    function op_add(rd, rs1, rs2) # add not VADD!
+    function op_add(RT, RA, RB) # add not VADD!
       int id=0, irs1=0, irs2=0;
       predval = get_pred_val(FALSE, rd);
       for i = 0 to VL-1:
         if (predval & 1<
@@ -235,7 +235,7 @@ In SV given the percentage of operations that also involve initialisation to 0.0
         remap = (swizzle >> 3*s) & 0b111
         if remap < 4:
            sm = id*SUBVL + remap
-           ireg[rd+s] <= ireg[rs1+sm]
+           ireg[rd+s] <= ireg[RA+sm]
         elif remap == 4:
            ireg[rd+s] <= 0.0
         elif remap == 5:
@@ -249,15 +249,15 @@ Some 3D GPU ISAs also allow for two-operand subvector swizzles. These are suffi
 Twin Predication is cool. Essentially it is a back-to-back VCOMPRESS-VEXPAND (a multiple sequentially ordered VINSERT). The compress part is covered by the source predicate and the expand part by the destination predicate.
 Of course, if either of those is all 1s then the operation degenerates *to* VCOMPRESS or VEXPAND, respectively.
 
-    function op(rd, rs):
-      ps = get_pred_val(FALSE, rs); # predication on src
-      pd = get_pred_val(FALSE, rd); # ... AND on dest
+    function op(RT, RS):
+      ps = get_pred_val(FALSE, RS); # predication on src
+      pd = get_pred_val(FALSE, RT); # ... AND on dest
       for (int i = 0, int j = 0; i < VL && j < VL;):
-        if (rs.isvec) while (!(ps & 1<
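
As a reading aid for the rename, the unpredicated scalar/vector loop from the first hunk can be sketched in runnable Python. This is only a minimal sketch under stated assumptions, not the project's implementation: the `Reg` class, the flat `ireg` list and the explicit `VL` argument are hypothetical names introduced here for illustration, while `RT`, `RA` and `RB` follow the patched pseudocode.

```python
class Reg:
    """Hypothetical operand descriptor: a register number plus a vector flag."""

    def __init__(self, num, isvec=False):
        self.num = num
        self.isvec = isvec


def op_add(ireg, RT, RA, RB, VL):
    """Element loop mirroring the patched op_add pseudocode (no predication)."""
    i_d = i_a = i_b = 0  # per-operand element offsets (id, irs1, irs2 in the patch)
    for _ in range(VL):
        ireg[RT.num + i_d] = ireg[RA.num + i_a] + ireg[RB.num + i_b]
        if not RT.isvec:
            break  # scalar destination: stop after the first result is written
        i_d += 1
        if RA.isvec:
            i_a += 1  # vector source advances with the destination
        if RB.isvec:
            i_b += 1  # scalar source stays on the same register
    return ireg


# Vector destination, vector RA, scalar RB: four results land at registers 8..11.
regs = list(range(16))
op_add(regs, Reg(8, True), Reg(0, True), Reg(4, False), VL=4)
```

With a scalar `RT` the same call writes a single result and exits, which is the behaviour the walkthrough paragraph in the hunk describes.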