From ca54ac20d6a5181033948846b7b5da103a7dd705 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 23 Apr 2023 14:33:41 +0100
Subject: [PATCH]

---
 openpower/sv/sprs.mdwn | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/openpower/sv/sprs.mdwn b/openpower/sv/sprs.mdwn
index e0eabf38b..62aafc181 100644
--- a/openpower/sv/sprs.mdwn
+++ b/openpower/sv/sprs.mdwn
@@ -85,7 +85,7 @@ cache) additional GPR reads.
 
 Programmer's Note: when REMAP is activated it becomes necessary on any
 context-switch (Interrupt or Function call) to detect (or know in advance)
-that REMAP is enabled and to additionally save/restore the four SVSHAPE
+that REMAP is enabled and to additionally explicitly save/restore the four SVSHAPE
 SPRs, SVHAPE0-3. Given that this is expected to be a rare occurrence it was
 deemed unreasonable to burden every context-switch or function call with
 mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
@@ -161,13 +161,19 @@ batches for Multi-Issue systems.
 
 4. `hphint` is *not* limited to power-of-two. Hardware implementors may choose
    a lower parallelism hint up to `hphint` and may find power-of-two more
-   convenient. Actual parallelism (Dependency Hazard relaxation) must **never**
-   exceed `hphint`.
+   convenient.
+
+Regarding (4): if a smaller hint is chosen by hardware, actual parallelism
+(Dependency Hazard relaxation) must **never**
+exceed `hphint` and must still respect the batch boundaries, even if this results
+in just one element being considered Hazard-independent. Even under these
+circumstances Multi-Issue Register-renaming is possible, to introduce parallelism
+by a different route.
 
 *Hardware Architect note: each element within the same group may be treated as
 100% independent from any other element within that group, and therefore
-neither Register Hazards nor Memory Hazards inter-element exist
-(but inter-group definitely does). This makes
+neither Register Hazards nor Memory Hazards inter-element exist,
+but crucially inter-group definitely remains. This makes
 implementation far easier on resources because the Hazard Dependencies are
 effectively at a much coarser granularity than a single register. With
 element-width overrides extending down to the byte level reducing Dependency
@@ -176,8 +182,8 @@ Hazard hardware complexity becomes even more important.*
 `hphint` may legitimately be set greater than `MAXVL`. This indicates to Multi-Issue
 hardware that even though MAXVL is relatively small the batches are *still independent*
 and therefore if Multi-Issue hardware chooses to allocate several batches up to
-`MAXVL` in size they are still independent. This helps greatly simplify Multi-Issue
-systems by significantly reducing Hazards.
+`MAXVL` in size they are still independent, even if Register-renaming is deployed.
+This helps greatly simplify Multi-Issue systems by significantly reducing Hazards.
 
 **Considerable care** must be taken when setting `hphint`. Matrix Outer Product
 could produce corrupted results if `hphint` is set to greater than the innermost
@@ -188,25 +194,27 @@ also requires care to correctly declare in `hphint` how many elements are
 independent. In the case of most Reduction use-cases the answer is almost certainly
 "none".
 
-`hphint` must definitely not be set on Atomic Memory operations, Cache-Inhibited
+`hphint` must never be set on Atomic Memory operations, Cache-Inhibited
 Memory operations, or Load-Reservation Store-Conditional. Also if Load-with-Update
 Data-Dependent Fail-First is ever used for linked-list pointer-chasing, `hphint`
-should again definitely be disabled.
+should again definitely be disabled. Failure to do so results in `UNDEFINED`
+behaviour.
 
 `hphint` may only be ignored by Hardware Implementors as long as full element-level
 Register and Memory Hazards are implemented *in full* (including right down to individual
 bytes of each register for when elwidth=8/16/32). In other words if `hphint` is to
-be ignored then implementations must be made as if `hphint=0`.
+be ignored then implementations must consider the situation as if `hphint=0`.
 
 **Horizontal Parallelism in Vertical-First Mode**
 
 Setting `hphint` with Vertical-First is perfectly legitimate. Under these circumstances
-the single-element strict Program Execution Order must be preserved at all times, but
-should there be a small enough program loop, than Out-of-Order Hardware may *merge*
+single-element strict Program Execution Order must be preserved at all times, but
+should there be a small enough program loop, than Out-of-Order Hardware may
+take the opportunity to *merge*
 consecutive element-based instructions into the *same Reservation Stations*, for
 multiple operations to be passed to massive-wide back-end SIMD ALUs or Vector-Chaining ALUs.
 **Only** elements within the same `hphint` group (across multiple such looped instructions)
-may be treated such.
+may be treated as mergeable in this fashion.
 
 Note that if the loop of Vertical-First instructions cannot fit entirely into Reservation
 Stations then Hardware clearly cannot exploit the above optimisation opportunity, but at
-- 
2.30.2