how best to process the exact same equivalent loop-unrolled instruction
stream.*
+## Horizontal-Parallelism Hint
+
+`SVSTATE.hphint` is an indicator to hardware of how many elements are 100%
+fully independent. Hardware is permitted to assume that groups of elements
+up to `hphint` in size need not have Register (or Memory) Hazards created
+between them (including when `hphint > VL`).
+
+If care is not taken in setting `hphint` correctly it may wreak havoc.
+For example Matrix Outer Product relies on the innermost loop computations
+being independent. If `hphint` is set to greater than the Outer Product
+depth then data corruption is guaranteed to occur.
+
+Likewise on FFTs it is assumed that each layer of the RADIX2 triple-loop
+is independent, but that there is strict *inter-layer* Register Hazards.
+Therefore if `hphint` is set to greater than the RADIX2 width of the FFT,
+data corruption is guaranteed.
+
+Thus the key message is that setting `hphint` requires in-depth knowledge
+of the REMAP Algorithm Schedules, given in the Appendix.
+
## REMAP types
This section summarises the motivation for each REMAP Schedule
uses overlapping registers as accumulators. Thus the Register Hazard
Management needed by Indexed REMAP *has* to be in place anyway.
+*Programmer's Note: `hphint` may be used to help hardware identify
+parallelism opportunities but it is critical to remember that the
+groupings are by `FLOOR(step/MAXVL)` not `FLOOR(REMAP(step)/MAXVL)`.*
+
The cost compared to Matrix and other REMAPs (and Pack/Unpack) is
clearly that of the additional reading of the GPRs to be used as Indices,
plus the setup cost associated with creating those same Indices.
can therefore also be precise. The final result will be in the first
non-predicate-masked-out destination element, but due again to
the deterministic schedule programmers may find uses for the intermediate
-results.
+results, even for non-commutative Defined Word operations.
When Rc=1 a corresponding Vector of co-resultant CRs is also
created. No special action is taken: the result *and its CR Field*