Programmer's Note: when REMAP is activated it becomes necessary on any
context-switch (Interrupt or Function call) to detect (or know in advance)
-that REMAP is enabled and to additionally save/restore the four SVSHAPE
+that REMAP is enabled and to additionally explicitly save/restore the four SVSHAPE
SPRs, SVSHAPE0-3. Given that this is expected to be a rare occurrence it was
deemed unreasonable to burden every context-switch or function call with
mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
for Multi-Issue systems.
4. `hphint` is *not* limited to power-of-two. Hardware implementors may choose
a lower parallelism hint up to `hphint` and may find power-of-two more
- convenient. Actual parallelism (Dependency Hazard relaxation) must **never**
- exceed `hphint`.
+ convenient.
+
+Regarding (4): if a smaller hint is chosen by hardware, actual parallelism
+(Dependency Hazard relaxation) must **never**
+exceed `hphint` and must still respect the batch boundaries, even if this results
+in just one element being considered Hazard-independent. Even under these
+circumstances Multi-Issue Register-renaming is possible, to introduce parallelism
+by a different route.
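
As a rough illustration of point (4), the following Python sketch splits a vector of `vl` elements into the runs that hardware could treat as Hazard-independent when it picks a narrower width than `hphint`. The function name `hardware_subbatches` and the parameter `hw_width` are hypothetical. The sketch also assumes that batches are consecutive runs of `hphint` elements starting at element 0, which the text above does not spell out.

```python
def hardware_subbatches(vl, hphint, hw_width):
    """Return lists of element indices that may be issued as independent runs.

    Assumptions (not taken from the specification text):
    - batches are consecutive runs of `hphint` elements starting at element 0
    - `hw_width` is the parallelism the hardware chose (e.g. a power of two)
    """
    if hw_width < 1:
        raise ValueError("hw_width must be at least 1")
    if hphint == 0:
        # No hint: every element is its own run (full Hazard checking).
        return [[i] for i in range(vl)]
    # Actual parallelism must never exceed hphint.
    width = min(hw_width, hphint)
    runs = []
    for start in range(0, vl, hphint):
        end = min(start + hphint, vl)  # batch boundary, never crossed
        for sub in range(start, end, width):
            runs.append(list(range(sub, min(sub + width, end))))
    return runs


# hphint=5 with a power-of-two hardware width of 4 leaves a lone element
# at the end of each batch rather than crossing the batch boundary.
print(hardware_subbatches(10, 5, 4))
```

With `hphint=5` and `hw_width=4` the runs are `[0..3]`, `[4]`, `[5..8]`, `[9]`: the single-element runs are the "just one element" case mentioned above.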
*Hardware Architect note: each element within the same group may be treated as
100% independent from any other element within that group, and therefore
-neither Register Hazards nor Memory Hazards inter-element exist
-(but inter-group definitely does). This makes
+neither Register Hazards nor Memory Hazards exist inter-element,
+but crucially inter-group Hazards definitely remain. This makes
implementation far easier on resources because the Hazard Dependencies are
effectively at a much coarser granularity than a single register.
With element-width overrides extending down to the byte level reducing Dependency
`hphint` may legitimately be set greater than `MAXVL`. This indicates to Multi-Issue
hardware that even though MAXVL is relatively small the batches are *still independent*
and therefore if Multi-Issue hardware chooses to allocate several batches up to
-`MAXVL` in size they are still independent. This helps greatly simplify Multi-Issue
-systems by significantly reducing Hazards.
+`MAXVL` in size they are still independent, even if Register-renaming is deployed.
+This helps greatly simplify Multi-Issue systems by significantly reducing Hazards.
**Considerable care** must be taken when setting `hphint`. Matrix Outer Product
could produce corrupted results if `hphint` is set to greater than the innermost
independent. In the case of most Reduction use-cases the answer is almost certainly
"none".
-`hphint` must definitely not be set on Atomic Memory operations, Cache-Inhibited
+`hphint` must never be set on Atomic Memory operations, Cache-Inhibited
Memory operations, or Load-Reservation Store-Conditional. Also if Load-with-Update
Data-Dependent Fail-First is ever used for linked-list pointer-chasing, `hphint`
-should again definitely be disabled.
+should again definitely be disabled. Failure to do so results in `UNDEFINED`
+behaviour.
`hphint` may only be ignored by Hardware Implementors as long as full element-level
Register and Memory Hazards are implemented *in full* (including right down to individual
bytes of each register for when elwidth=8/16/32). In other words if `hphint` is to
-be ignored then implementations must be made as if `hphint=0`.
+be ignored then implementations must consider the situation as if `hphint=0`.
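
The relationship between `hphint=0` and full Hazard checking can be sketched as a simple predicate. This is only an illustrative model: the helper name `same_hphint_group` is hypothetical, and the mapping of element index to group by integer division is an assumption rather than something stated above.

```python
def same_hphint_group(i, j, hphint):
    """True if elements i and j may skip inter-element Hazard checks.

    Assumption: element n belongs to group n // hphint.
    hphint == 0 means no relaxation at all, so full element-level
    Register and Memory Hazards apply between every pair of elements.
    """
    if hphint == 0:
        return False
    return i // hphint == j // hphint


print(same_hphint_group(0, 3, 4))  # same group of four
print(same_hphint_group(3, 4, 4))  # straddles a group boundary
print(same_hphint_group(0, 1, 0))  # hint ignored or zero: always check
```

An implementation that ignores `hphint` behaves, in this model, as if the predicate always returned `False`.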
**Horizontal Parallelism in Vertical-First Mode**
Setting `hphint` with Vertical-First is perfectly legitimate. Under these circumstances
-the single-element strict Program Execution Order must be preserved at all times, but
-should there be a small enough program loop, than Out-of-Order Hardware may *merge*
+single-element strict Program Execution Order must be preserved at all times, but
+should there be a small enough program loop, then Out-of-Order Hardware may
+take the opportunity to *merge*
consecutive element-based instructions into the *same Reservation Stations*, for
multiple operations to be passed to massive-wide back-end SIMD ALUs or Vector-Chaining ALUs.
**Only** elements within the same `hphint` group (across multiple such looped instructions)
-may be treated such.
+may be treated as mergeable in this fashion.
Note that if the loop of Vertical-First instructions cannot fit entirely into Reservation
Stations then Hardware clearly cannot exploit the above optimisation opportunity, but at