From: lkcl
Date: Thu, 13 Apr 2023 05:00:34 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls010_v1~19
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=952f42cdcfb8850079a82d3f22ee639cb2d678ba;p=libreriscv.git

---
diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn
index a7e53d90d..95dbffd1e 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -199,12 +199,17 @@ calculations.
 An additional caveat involves Condition Register Fields when also used
 as Predicate Masks. An operation that overwrites the same CR Fields
 that are simultaneously
-being used as a Predicate Mask is `UNDEFINED` behaviour.
+being used as a Predicate Mask is `UNDEFINED` behaviour
+if the overwritten CR field element was needed by a
+subsequent Element for its Predicate Mask bit.
 
 This allows implementations to relax some of the otherwise-draconian
 Register Hazards that would otherwise occur, and to consider
 internal cacheing of the CR-based Predicate
-bits.
+bits, but some implementations *may not necessarily
+perform pre-reading* and consequently the risk of
+overwrite is the responsibility of the Programmer.
+Special care is particularly needed here when using REMAP.
 
 ## Register files, elements, and Element-width Overrides
 
@@ -833,11 +838,12 @@ used for both src and dest, or different regs (one for src, one for dest).
 Likewise CR based twin predication has a second set of 3 bits, allowing
 a different test to be applied.
 
-Note that it is assumed that Predicate Masks (whether INT or CR) are
-read *before* the operations proceed. In practice (for CR Fields)
-this creates an unnecessary block on parallelism. Therefore, it is up
-to the programmer to ensure that the CR fields used as Predicate Masks
-are not being written to by any parallel Vector Loop. Doing so results
+Note that it cannot necessarily be assumed that Predicate Masks
+(whether INT or CR) are read in full *before* the operations proceed. In practice (for CR Fields)
+this creates an unnecessary block on parallelism, prohibiting
+"Vector Chaining". Therefore, it is up
+to the programmer to ensure that the CR field Elements used as Predicate Masks
+are not overwritten by any parallel Vector Loop. Doing so results
 in **UNDEFINED** behaviour, according to the definition outlined in the
 Power ISA v3.0B Specification.
 
@@ -846,6 +852,14 @@ of individual CR fields until the actual predicated element operation
 needs to take place, safe in the knowledge that no programmer will have
 issued a Vector Instruction where previous elements could have overwritten
 (destroyed) not-yet-executed CR-Predicated element operations.
+This particularly is an issue when using REMAP, as the order in
+which CR-Field-based Predicate Mask bits could be read on a per-element
+execution basis could well conflict with the order in which prior
+elements wrote to the very same CR Field.
+
+Additionally Programmers should avoid using r3 r10 or r30
+as destination registers when these are also used as a Predicate
+Mask. Doing so is again UNDEFINED behaviour.
 
 ### Integer Predication (MASKMODE=0)
 
@@ -869,6 +883,7 @@ following meaning:
 r10 and r30 are at the high end of temporary and unused registers,
 so as not to interfere with register allocation from ABIs.
 
+
 ### CR-based Predication (MASKMODE=1)
 
 When the predicate mode bit is one the 3 bits are interpreted as below.
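
The overwrite hazard described in this change can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the SVP64 pseudocode: the function `run_vector_loop`, its parameters `write_offset` and `preread`, and the one-bit-per-CR-field simplification are all hypothetical names and choices made for illustration. It contrasts an implementation that pre-reads the whole Predicate Mask with one that reads each CR field only when its element executes.

```python
def run_vector_loop(cr_fields, write_offset, preread):
    """Toy model of a CR-predicated Vector Loop (hypothetical, not the spec).

    Element i is enabled by cr_fields[i] (one bit per CR field here).
    An enabled element clears the CR field at index i + write_offset,
    standing in for an operation that overwrites CR fields.
    Returns the list of element indices that actually executed.
    """
    cr = list(cr_fields)
    vl = len(cr)
    # A pre-reading implementation snapshots every mask bit up front.
    snapshot = list(cr) if preread else None
    executed = []
    for i in range(vl):
        # A lazy implementation reads the mask bit at execution time.
        mask_bit = snapshot[i] if preread else cr[i]
        if not mask_bit:
            continue
        executed.append(i)
        target = i + write_offset
        if 0 <= target < vl:
            cr[target] = 0  # overwrite a CR field that may also be a mask bit
    return executed


cr_initial = [1, 1, 1, 1]

# Hazard: each element overwrites the mask bit of the NEXT element.
eager = run_vector_loop(cr_initial, write_offset=1, preread=True)
lazy = run_vector_loop(cr_initial, write_offset=1, preread=False)
print(eager, lazy)  # [0, 1, 2, 3] [0, 2]

# No hazard: each element overwrites a mask bit that was already consumed.
safe_eager = run_vector_loop(cr_initial, write_offset=-1, preread=True)
safe_lazy = run_vector_loop(cr_initial, write_offset=-1, preread=False)
print(safe_eager, safe_lazy)  # [0, 1, 2, 3] [0, 1, 2, 3]
```

In this model the two implementations disagree only when an element overwrites a CR field that a subsequent element still needs as its mask bit, which is the case the revised text declares `UNDEFINED`. With REMAP the element order would no longer be a simple `range(vl)`, so which writes count as "prior" would change accordingly.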