+[[!tag standards]]
+
# Simple-V (Parallelism Extension Proposal) Appendix
* Copyright (C) 2017, 2018, 2019 Luke Kenneth Casson Leighton
Branch operations are augmented slightly to be a little more like FP
Compares (FEQ, FNE etc.), by permitting the cumulation (and storage)
of multiple comparisons into a register (taken indirectly from the predicate
-table). As such, "ffirst" - fail-on-first - condition mode can be enabled.
+table) and enhancing them to branch "consensually" depending on *multiple*
+tests. "ffirst" - fail-on-first - condition mode can also be enabled,
+to terminate the comparisons early.
See ffirst mode in the Predication Table section.
+There are two registers for the comparison operation, therefore there
+is the opportunity to associate two predicate registers (note: not in
+the same way as twin-predication). The first is a "normal" predicate
+register, which acts just as it does on any other single-predicated
+operation: masks out elements where a bit is zero, applies an inversion
+to the predicate mask, and enables zeroing / non-zeroing mode.
+
+The second (not to be confused with a twin-predication 2nd register)
+is utilised to indicate where the results of each comparison are to
+be stored, as a bitmask. Additionally, the behaviour of the branch -
+when it occurs - may also be modified depending on whether the 2nd predicate's
+"invert" and "zeroing" bits are set. These four combinations result
+in "consensual branches", cbranch.ifnone (NOR), cbranch.ifany (OR),
+cbranch.ifall (AND), cbranch.ifnotall (NAND).
+
+| invert | zeroing | description | operation | cbranch |
+| ------ | ------- | --------------------------- | --------- | ------- |
+| 0 | 0 | branch if all pass | AND | ifall |
+| 1 | 0 | branch if any fail | NAND | ifnotall |
+| 0 | 1 | branch if any pass | OR | ifany |
+| 1 | 1 | branch if all fail | NOR | ifnone |
+
+This inversion capability covers AND, OR, NAND and NOR branching
+based on multiple element comparisons. Without the full set of four,
+a two-instruction branch sequence would be necessary: one conditional, one
+unconditional.
+
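The four combinations in the table above can be illustrated with a minimal Python sketch. It is not part of the specification: the function name `consensual_branch` and the use of a list of booleans for the per-element results are assumptions made purely for illustration, and predicate masking of inactive elements is deliberately left out.

```python
def consensual_branch(results, invert, zeroing):
    """Decide whether to branch from per-element comparison results.

    results : list of bools, one per tested element (True = test passed).
    invert, zeroing : the second predicate's two bits, used here only
    as mode selectors, following the table (hypothetical helper).
    """
    if zeroing:
        any_pass = any(results)
        # zeroing=1: OR (ifany), or NOR (ifnone) when inverted
        return (not any_pass) if invert else any_pass
    all_pass = all(results)
    # zeroing=0: AND (ifall), or NAND (ifnotall) when inverted
    return (not all_pass) if invert else all_pass


if __name__ == "__main__":
    mixed = [True, False, True]
    for invert in (0, 1):
        for zeroing in (0, 1):
            print(invert, zeroing, consensual_branch(mixed, invert, zeroing))
```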
+Note that, unlike the early termination of chains of AND or OR conditional
+tests in normal computer programming, the chain does *not* terminate early
+unless fail-on-first is set, and even then ffirst ends only on the first
+data-dependent zero. When ffirst mode is not set, *all* conditional
+element tests must be performed (and the result optionally stored in
+the result mask), followed by a "post-analysis" phase which checks
+whether to branch.
+
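The difference between the two behaviours described above can be sketched in Python under simplifying assumptions: the name `compare_elements` is hypothetical, sources are plain lists, the result mask starts at zero, and no predicate masking is applied. With `ffirst` unset every element is tested. With it set, the loop stops at the first failing (zero) comparison. The decision whether to branch would then be taken on the returned mask.

```python
def compare_elements(src1, src2, cmp, ffirst=False):
    """Return (result_mask, tests_performed) for element-wise cmp.

    Hypothetical helper: bit i of result_mask is set when
    cmp(src1[i], src2[i]) passes. No predicate masking is modelled.
    """
    result = 0
    tests = 0
    for i, (a, b) in enumerate(zip(src1, src2)):
        tests += 1
        if cmp(a, b):
            result |= 1 << i
        elif ffirst:
            # fail-on-first: the first data-dependent zero ends the chain
            break
    return result, tests


if __name__ == "__main__":
    data = [1, 2, 0, 4]
    zeros = [0, 0, 0, 0]
    not_equal = lambda a, b: a != b
    print(compare_elements(data, zeros, not_equal))               # all four tested
    print(compare_elements(data, zeros, not_equal, ffirst=True))  # stops at the zero
```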
+Note also that whilst it may seem excessive to have all four (because
+conditional comparisons may be inverted by swapping src1 and src2),
+data-dependent fail-on-first is *not* invertible and *only* terminates
+on the first zero-condition encountered. Additionally it may be inconvenient
+to have to swap the predicate registers associated with src1 and src2,
+because this involves a new VBLOCK Context.
+
### Standard Branch <a name="standard_branch"></a>
Branch operations use standard RV opcodes that are reinterpreted to
Note that just as with the standard (scalar, non-predicated) branch
operations, BLE, BGT, BLEU and BGTU may be synthesised by inverting
-src1 and src2.
+src1 and src2. Note, however, that in doing so the predicate table
+setup must also be correspondingly adjusted.
In Hwacha EECS-2015-262 Section 6.7.2 the following pseudocode is given
for predicated compare operations of function "cmp":
ps = get_pred_val(I/F==INT, rs1);
rd = get_pred_val(I/F==INT, rs2); # this may not exist
+ ffirst_mode, zeroing = get_pred_flags(rs1)
+ if exists(rd):
+ pred_inversion, pred_zeroing = get_pred_flags(rs2)
+ else:
+ pred_inversion, pred_zeroing = False, False
+
if not exists(rd) or zeroing:
result = (1<<VL)-1 # all 1s
else
result |= 1<<i;
else
result &= ~(1<<i);
+ if ffirst_mode:
+ break
- if not exists(rd)
- if result == ps
- goto branch
- else
+ if exists(rd):
preg[rd] = result # store in destination
- if preg[rd] == ps
- goto branch
+
+ if pred_inversion:
+ if pred_zeroing:
+ # NOR
+ if result == 0:
+ goto branch
+ else:
+ # NAND
+ if (result & ps) != result:
+ goto branch
+ else:
+ if pred_zeroing:
+ # OR
+ if result != 0:
+ goto branch
+ else:
+ # AND
+ if (result & ps) == result:
+ goto branch
Notes:
RVV version:
strncpy:
- mv a3, a0 # Copy dst
+ c.mv a3, a0 # Copy dst
loop:
setvli x0, a2, vint8 # Vectors of bytes.
vlbff.v v1, (a1) # Get src bytes
vmfirst a4, v0 # Zero found?
vmsif.v v0, v0 # Set mask up to and including zero byte.
vsb.v v1, (a3), v0.t # Write out bytes
- bgez a4, exit # Done
+ c.bgez a4, exit # Done
csrr t1, vl # Get number of bytes fetched
- add a1, a1, t1 # Bump src pointer
- sub a2, a2, t1 # Decrement count.
- add a3, a3, t1 # Bump dst pointer
- bnez a2, loop # Anymore?
+ c.add a1, a1, t1 # Bump src pointer
+ c.sub a2, a2, t1 # Decrement count.
+ c.add a3, a3, t1 # Bump dst pointer
+ c.bnez a2, loop # Anymore?
exit:
- ret
+ c.ret
SV version (WIP):
strncpy:
- mv a3, a0
- RegCSR[a3] = 8bit, a3, scalar
- RegCSR[a1] = 8bit, a1, scalar
- RegCSR[t0] = 8bit, t0, vector
- PredTb[t0] = ffirst, x0, inv
+ c.mv a3, a0
+ VBLK.RegCSR[t0] = 8bit, t0, vector
+ VBLK.PredTb[t0] = ffirst, x0, inv
loop:
- SETVLI a2, t4, 8 # t4 and VL now 1..8 (MVL=8)
- ldb t0, (a1) # t0 fail first mode
- bne t0, x0, allnonzero # still ff
- # VL points to last nonzero
- GETVL t4 # from bne tests
- addi t4, t4, 1 # include zero
- SETVL t4 # set exactly to t4
- stb t0, (a3) # store incl zero
- ret # end subroutine
+ VBLK.SETVLI a2, t4, 8 # t4 and VL now 1..8 (MVL=8)
+ c.ldb t0, (a1) # t0 fail first mode
+ c.bne t0, x0, allnonzero # still ff
+ # VL (t4) points to last nonzero
+ c.addi t4, t4, 1 # include zero
+ c.stb t0, (a3) # store incl zero
+ c.ret # end subroutine
allnonzero:
- stb t0, (a3) # VL legal range
- GETVL t4 # from bne tests
- add a1, a1, t4 # Bump src pointer
- sub a2, a2, t4 # Decrement count.
- add a3, a3, t4 # Bump dst pointer
- bnez a2, loop # Anymore?
+ c.stb t0, (a3) # VL legal range
+ c.add a1, a1, t4 # Bump src pointer
+ c.sub a2, a2, t4 # Decrement count.
+ c.add a3, a3, t4 # Bump dst pointer
+ c.bnez a2, loop # Anymore?
exit:
- ret
+ c.ret
Notes:
number of 16-bit instruction words: 11.
* Total: 14 16-bit words. By contrast, RVV requires around 18 16-bit words.
+## BigInt add <a name="bigadd"></a>
+
+[[!inline raw="yes" pages="simple_v_extension/bigadd_example" ]]