From ea16f6910fc33ed0dacb0bd3913a94ded5ff28a1 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 14 Sep 2022 18:00:34 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls001.mdwn | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 483c07745..0b618f5ce 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -203,7 +203,8 @@ The primary options are:
 * vec2/3/4 "Packing" and "Unpacking" (similar to VSX `vpack` and `vpkss`)
   accessible in a way that is easier than REMAP, added for the same reasons
   that drove `vpack` and `vpkss` etc. to be added: pixel, audio, and 3D
-  data manipulation.
+  data manipulation. With Pack/Unpack being part of SVSTATE it can be
+  applied *in-place* saving register file space (no copy/mv needed).
 * Load/Store speculative "fault-first" behaviour, identical to ARM and RVV
   Fault-first: provides auto-truncation of a speculative LD/ST helping
   solve the "SIMD Considered Harmful" stripmining problem from a Memory
@@ -239,6 +240,29 @@ be suitably adapted to each category.
 * CR Field ops
 * Branch-Conditional - saves on instruction count in 3D parallel if/else
 
+**Vectorised Branch-Conditional**
+
+As mentioned in the introduction this is the one sole instruction group
+that
+is different pseudocode from its scalar equivalent. However even there
+its various Mode bits and options can be set such that in the degenerate
+case the behaviour becomes identical to Scalar Branch-Conditional.
+
+The two additional Modes within Vectorised Branch-Conditional, both of
+which may be combined, are `CTR-Mode` and `VLI-Test` (aka "Data Fail First").
+CTR Mode extends the way that CTR may be decremented unconditionally
+within Scalar Branch-Conditional, and not only makes it conditional but
+also interacts with predication. VLI-Test provides the same option
+as Data-Dependent Fault-First to Deterministically truncate the Vector
+Length at the fail **or success** point.
+
+Boolean Logic rules on sets (treating the Vector of CR Fields to be tested by
+`BO` as a set) dictate that the Branch should take place on either 'ALL'
+tests succeeding (or failing) or whether 'SOME' tests succeed (or fail).
+These options provide the ability to cover the majority of Parallel
+3D GPU Conditions, saving a not inconsiderable number of instructions
+especially given the close interaction with CTR in hot-loops.
+
 **SVP64Single**
 
 The `SVP64-Single` 24-bit encoding focusses primarily on ensuring that
-- 
2.30.2