* Predication on both source and destination
* Two different sources of predication: INT and CR Fields
* SV Modes including saturation (for Audio, Video and DSP), mapreduce,
- fail-first and predicate-result mode.
+ and fail-first mode.
Different classes of operations require different formats. The earlier
sections cover the common formats and the four separate modes follow:
use of SVP64 Prefixed instructions to perform the necessary
save/restore of Simple-V Architectural State.
This capability also allows nested function calls to be made from
-inside Vector loops, which is very rare for Vector ISAs.
+inside Vertical-First Vector loops, which is very rare for Vector ISAs.
Strict Program Order is also preserved by the Parallel Reduction
REMAP Schedule, but only at the cost of requiring the destination
will force implementations to perform divide and modulo
calculations.
+An additional caveat involves Condition Register Fields
+when also used as Predicate Masks. An operation that
+overwrites the same CR Fields that are simultaneously
+being used as a Predicate Mask is `UNDEFINED` behaviour
+if the overwritten CR field element was needed by a
+subsequent Element for its Predicate Mask bit.
+This allows implementations to relax some of the
+otherwise-draconian Register Hazards that would
+occur, and to consider internal caching of the CR-based
+Predicate bits. However, some implementations *may not necessarily
+perform pre-reading*, and consequently the risk of
+overwrite is the responsibility of the Programmer.
+Special care is particularly needed here when using REMAP.
+
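The hazard described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the function `run`, the `writes` table and the `preread` flag are hypothetical names invented for illustration, a plain list of bits stands in for the CR Fields, and nothing here models a real implementation. It shows why the outcome differs depending on whether the Predicate Mask is read up-front or per element.

```python
def run(pred, writes, preread):
    """Toy model of a predicated Vector loop (hypothetical, not SVP64 itself).

    pred    : list of predicate bits, one per element (stand-in for CR Fields)
    writes  : dict mapping element index -> (target index, new bit); executing
              that element overwrites the predicate bit of the target element
    preread : True  = the whole mask is captured before the loop starts
              False = each bit is read just before its element executes
    """
    live = list(pred)       # predicate bits as they change during the loop
    snapshot = list(pred)   # predicate bits as they were before the loop
    executed = []
    for i in range(len(pred)):
        bit = snapshot[i] if preread else live[i]
        if bit:
            executed.append(i)
            if i in writes:
                target, value = writes[i]
                live[target] = value
    return executed


pred = [1, 1, 1, 1]
writes = {0: (2, 0)}  # element 0 clears the bit that element 2 still needs

eager = run(pred, writes, preread=True)
lazy = run(pred, writes, preread=False)
print(eager, lazy)
```

In this model the two strategies disagree about whether element 2 executes, which is the kind of divergence that the text leaves as the Programmer's responsibility to avoid.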
## Register files, elements, and Element-width Overrides
The relationship between register files, elements, and element-width
sub-vector. SUBVL=2 represents a vec2, its encoding is 0b01, therefore
this may be considered to be elements 0b00 to 0b01 inclusive.
+Effectively, SUBVL is like a SIMD multiplier: instead of just 1
+element operation issued, SUBVL element operations are issued (as an inner loop).
+The key difference between VL looping and SUBVL looping
+is that predication bits are applied per
+**group**, rather than by individual element.
+
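The per-group predication described above can be sketched as a nested loop. This is a minimal illustration, not the specification's pseudocode: `sv_loop` and its parameters are hypothetical names, the element numbering `i * subvl + j` is an assumption, and `pack`/`unpack` are ignored.

```python
def sv_loop(vl, subvl, pred_mask, op):
    """Toy model: one predicate bit gates a whole SUBVL group.

    The outer loop runs over VL and tests one predicate bit per iteration;
    the inner loop runs over SUBVL with no further predicate test.
    """
    results = []
    for i in range(vl):
        if not (pred_mask >> i) & 1:
            continue  # the entire sub-vector (group) is skipped
        for j in range(subvl):
            # assumed element numbering: groups are laid out contiguously
            results.append(op(i * subvl + j))
    return results


# VL=4, SUBVL=2 (vec2): predicate bits 0 and 2 are set, so groups 0 and 2 run
executed = sv_loop(vl=4, subvl=2, pred_mask=0b0101, op=lambda idx: idx)
print(executed)
```

Each set predicate bit enables both elements of its vec2 and each clear bit masks out both, so no group is ever partially executed.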
+Directly related to `subvl` are the `pack` and `unpack` Mode bits of `SVSTATE`.
+
## MASK/MASK_SRC & MASKMODE Encoding
One bit (`MASKMODE`) indicates the mode: CR or Int predication. The two
Likewise CR based twin predication has a second set of 3 bits, allowing
a different test to be applied.
-Note that it is assumed that Predicate Masks (whether INT or CR) are
-read *before* the operations proceed. In practice (for CR Fields)
-this creates an unnecessary block on parallelism. Therefore, it is up
-to the programmer to ensure that the CR fields used as Predicate Masks
-are not being written to by any parallel Vector Loop. Doing so results
+Note that it cannot necessarily be assumed that Predicate Masks
+(whether INT or CR) are read in full *before* the operations proceed. In practice (for CR Fields)
+reading them in full beforehand would create an unnecessary block on parallelism, prohibiting
+"Vector Chaining". Therefore, it is up
+to the programmer to ensure that the CR field Elements used as Predicate Masks
+are not overwritten by any parallel Vector Loop. Overwriting them results
in **UNDEFINED** behaviour, according to the definition outlined in the
Power ISA v3.0B Specification.
needs to take place, safe in the knowledge that no programmer will have
issued a Vector Instruction where previous elements could have overwritten
(destroyed) not-yet-executed CR-Predicated element operations.
+This is particularly an issue when using REMAP, as the order in
+which CR-Field-based Predicate Mask bits are read on a per-element
+execution basis could well conflict with the order in which prior
+elements wrote to the very same CR Field.
+
+Additionally, Programmers should avoid using r3, r10 or r30
+as destination registers when these are also used as a Predicate
+Mask. Doing so is again UNDEFINED behaviour.
### Integer Predication (MASKMODE=0)
r10 and r30 are at the high end of temporary and unused registers,
so as not to interfere with register allocation from ABIs.
+
### CR-based Predication (MASKMODE=1)
When the predicate mode bit is one the 3 bits are interpreted as below.