(no commit message)

[libreriscv.git] / openpower / sv / sprs.mdwn
diff --git a/openpower/sv/sprs.mdwn b/openpower/sv/sprs.mdwn

index e6e9d17b349ba1a372dd35d6c7e78d1fb3955223..243f7ebd4dd406f52564ac8df6394a94d799fe04 100644
--- a/openpower/sv/sprs.mdwn
+++ b/openpower/sv/sprs.mdwn
@@ -1,7 +1,20 @@
  # SPRs  <a name="sprs"></a>
  
-## SVSTATE SPR 
+The full list of SPRs for Simple-V is:
+
+| SPR           | Width   | Description   |
+|---------------|---------|---------------------------------|
+| **SVSTATE**   | 64-bit  | Zero-Overhead Loop Architectural State   |
+| **SVLR**      | 64-bit  | SVSTATE equivalent of LR-to-PC  |
+| **SVSHAPE0**  | 32-bit  |  REMAP Shape 0   |
+| **SVSHAPE1**  | 32-bit  |  REMAP Shape 1   |
+| **SVSHAPE2**  | 32-bit  |  REMAP Shape 2   |
+| **SVSHAPE3**  | 32-bit  |  REMAP Shape 3   |
  
+Future versions of Simple-V will have at least 7 more SVSTATE SPRs, in a small
+"stack", as part of a full Zero-Overhead Loop Control subsystem.
+
+## SVSTATE SPR 
  
  The format of the SVSTATE SPR is as follows:
  
@@ -59,10 +72,12 @@ SVSTATE contains (and permits setting of):
  * UnPack - if set then dststep/dsubstep VL/SUBVL loop-ordering is inverted.
  * hphint - Horizontal Parallelism Hint. Indicates that
    no Hazards exist between groups of elements in sequential multiples of this number
-   (before REMAP).  By definition: elements for which `FLOOR(srcstep/hphint)` is
-   equal *before REMAP* are in the same parallelism "group". In Vertical First Mode
-   hardware **MUST ONLY** process elements in the same group, and must stop
-   Horizontal Issue at the last element of a given group. Set to zero to indicate "no hint".
+   (before REMAP).  By definition: elements for which `FLOOR(step/hphint)` is
+   equal *before REMAP* are in the same parallelism "group", for both
+   `srcstep` and `dststep`. In Vertical First Mode
+   hardware **MUST** respect Strict Program Order but is permitted to
+   merge multiple scalar loops into parallel batches, if Reservation Station resources
+   are sufficient.  Set to zero to indicate "no hint".
  * SVme - REMAP enable bits, indicating which register is to be
     REMAPed: RA, RB, RC, RT and EA are the canonical (typical) register names
     associated with each bit, with RA being the LSB and EA being the MSB.
@@ -83,7 +98,7 @@ cache) additional GPR reads.
  
  Programmer's Note: when REMAP is activated it becomes necessary on any
  context-switch (Interrupt or Function call) to detect (or know in advance)
-that REMAP is enabled and to additionally save/restore the four SVSHAPE
+that REMAP is enabled and to additionally explicitly save/restore the four SVSHAPE
  SPRs, SVSHAPE0-3.  Given that this is expected to be a rare occurrence it was
  deemed unreasonable to burden every context-switch or function call with
  mandatory save/restore of SVSHAPEs, and consequently it is a *callee*
@@ -93,6 +108,22 @@ could be adversely affected.  SVP64 purely relies on Scalar instructions,
  so Scalar instructions (except the SVP64 Management ones and mtspr and
  mfspr) are 100% guaranteed to have zero impact on SVP64 state.
  
+**SVme REMAP area**
+
+Each bit of `SVSTATE.SVme` indicates whether the SVSHAPE (0-3) is active and to which register
+the REMAP applies.  The application goes by *assembler operand names* on a per-mnemonic
+basis.  Some instructions may have `RT` as a source and as a destination: REMAP applies
+**separately** to each use in this case.  For Load/Store with Update, the Effective
+Address (stored in EA) may also be separately REMAPed from RA as a source operand.
+
+| bit|applies|register applied|
+|----|-------|----------------|
+| 46 | mi0 | source RA / FRA / BA / BFA / RT / FRT |
+| 45 | mi1 | source RB / FRB / BB|
+| 44 | mi2 | source RC / FRC / BC|
+| 43 | mo0 | result RT / FRT / BT / BF|
+| 42 | mo1 | result Effective Address (RA) / FRS / RS|
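
The bit assignments in the table above can be sketched as a small decoder. This is an illustrative sketch only, not part of the specification: the helper names `msb0_bit` and `decode_svme` are hypothetical, and the code assumes the bit numbers are in Power ISA MSB0 order within the 64-bit SVSTATE (bit 0 being the most significant), which is consistent with RA (`mi0`) being the LSB of the field and EA (`mo1`) the MSB.

```python
# Bit positions copied from the SVme table; MSB0 numbering is an assumption.
SVME_BITS = {
    "mi0": 46,  # source RA / FRA / BA / BFA / RT / FRT
    "mi1": 45,  # source RB / FRB / BB
    "mi2": 44,  # source RC / FRC / BC
    "mo0": 43,  # result RT / FRT / BT / BF
    "mo1": 42,  # result Effective Address (RA) / FRS / RS
}


def msb0_bit(value, bit, width=64):
    """Return one bit of `value` using MSB0 numbering (bit 0 is the MSB)."""
    return (value >> (width - 1 - bit)) & 1


def decode_svme(svstate):
    """Map each SVme name to True if its REMAP enable bit is set."""
    return {name: bool(msb0_bit(svstate, bit)) for name, bit in SVME_BITS.items()}


if __name__ == "__main__":
    # Hypothetical SVSTATE value with only mi0 and mo0 enabled.
    example = (1 << (63 - 46)) | (1 << (63 - 43))
    print(decode_svme(example))
```

Under these assumptions an operating system could use such a check to decide whether REMAP is active and the SVSHAPE SPRs therefore need saving.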
+
  **Max Vector Length (maxvl)** <a name="mvl" />
  
  MAXVECTORLENGTH is a static (immediate-operand only) compile-time declaration
@@ -159,21 +190,29 @@ batches
     for Multi-Issue systems.
  4. `hphint` is *not* limited to power-of-two. Hardware implementors may choose
     a lower parallelism hint up to `hphint` and may find power-of-two more
-   convenient. Actual parallelism (Dependency Hazard relaxation) must **never**
-   exceed `hphint`.
+   convenient.
+
+Regarding (4): if a smaller hint is chosen by hardware, actual parallelism
+(Dependency Hazard relaxation) must **never**
+exceed `hphint` and must still respect the batch boundaries, even if this results
+in just one element being considered Hazard-independent.  Even under these
+circumstances Multi-Issue Register-renaming is possible, to introduce parallelism
+by a different route.
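
The grouping rule `FLOOR(step/hphint)` and the batch-boundary constraint above can be illustrated with a short model. This is a hedged sketch rather than a description of any real implementation: `hphint_group`, `batches` and the `hw_width` parameter (the number of elements a given piece of hardware chooses to treat together) are hypothetical names, and `hphint=0` is modelled as "every element is its own batch", i.e. no Hazard relaxation at all.

```python
def hphint_group(step, hphint):
    """Parallelism group of an element step (before REMAP); None if no hint."""
    if hphint == 0:
        return None
    return step // hphint


def batches(vl, hphint, hw_width):
    """Split steps 0..vl-1 into batches that may be treated as Hazard-independent.

    Batches never exceed hphint elements and never cross a group boundary,
    even when hw_width does not divide hphint evenly.
    """
    if hw_width < 1:
        raise ValueError("hw_width must be at least 1")
    if hphint == 0:
        # No hint: no relaxation, so each element stands alone.
        return [[step] for step in range(vl)]
    width = min(hw_width, hphint)
    result = []
    for group_start in range(0, vl, hphint):
        group_end = min(group_start + hphint, vl)
        for start in range(group_start, group_end, width):
            result.append(list(range(start, min(start + width, group_end))))
    return result


if __name__ == "__main__":
    # hphint=3 with hardware that prefers power-of-two batches of 2.
    print(batches(vl=7, hphint=3, hw_width=2))
```

With `vl=7`, `hphint=3` and `hw_width=2` the model yields `[[0, 1], [2], [3, 4], [5], [6]]`: element 2 ends up alone because merging it with element 3 would cross a group boundary.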
  
  *Hardware Architect note: each element within the same group may be treated as
  100% independent from any other element within that group, and therefore
-neither Register Hazards nor Memory Hazards inter-element exist
-(but inter-group definitely does).  This makes
+neither Register Hazards nor Memory Hazards exist inter-element,
+but crucially inter-group Hazards definitely remain.  This makes
  implementation far easier on resources because the Hazard Dependencies are
-effectively at a much coarser granularity than a single register.*
+effectively at a much coarser granularity than a single register.
+With element-width overrides extending down to the byte level, reducing Dependency
+Hazard hardware complexity becomes even more important.*
  
  `hphint` may legitimately be set greater than `MAXVL`. This indicates to Multi-Issue
  hardware that even though MAXVL is relatively small the batches are *still independent*
  and therefore if Multi-Issue hardware chooses to allocate several batches up to
-`MAXVL` in size they are still independent.  This helps greatly simplify Multi-Issue
-systems by significantly reducing Hazards.
+`MAXVL` in size they are still independent, even if Register-renaming is deployed.
+This helps greatly simplify Multi-Issue systems by significantly reducing Hazards.
  
  **Considerable care** must be taken when setting `hphint`. Matrix Outer Product
  could produce corrupted results if `hphint` is set to greater than the innermost
@@ -184,25 +223,27 @@ also requires care to correctly declare in `hphint` how many elements are
  independent. In the case of most Reduction use-cases the answer is almost certainly
  "none".
  
-`hphint` must definitely not be set on Atomic Memory operations, Cache-Inhibited
+`hphint` must never be set on Atomic Memory operations, Cache-Inhibited
  Memory operations, or Load-Reservation Store-Conditional. Also if Load-with-Update
  Data-Dependent Fail-First is ever used for linked-list pointer-chasing, `hphint`
-should again definitely be disabled.
+should again definitely be disabled. Failure to do so results in `UNDEFINED`
+behaviour.
  
  `hphint` may only be ignored by Hardware Implementors as long as full element-level
  Register and Memory Hazards are implemented *in full* (including right down to individual
  bytes of each register for when elwidth=8/16/32). In other words if `hphint` is to
-be ignored then implementations must be made as if `hphint=0`.
+be ignored then implementations must consider the situation as if `hphint=0`.
  
  **Horizontal Parallelism in Vertical-First Mode**
  
  Setting `hphint` with Vertical-First is perfectly legitimate.  Under these circumstances
-the single-element strict Program Execution Order must be preserved at all times, but
-should there be a small enough program loop, than Out-of-Order Hardware may *merge*
+single-element strict Program Execution Order must be preserved at all times, but
+should there be a small enough program loop, then Out-of-Order Hardware may
+take the opportunity to *merge*
  consecutive element-based instructions into the *same Reservation Stations*, for
  multiple operations to be passed to massive-wide back-end SIMD ALUs or Vector-Chaining ALUs.
  **Only** elements within the same `hphint` group (across multiple such looped instructions)
-may be treated such.
+may be treated as mergeable in this fashion.
  
  Note that if the loop of Vertical-First instructions cannot fit entirely into Reservation
  Stations then Hardware clearly cannot exploit the above optimisation opportunity, but at
@@ -221,5 +262,7 @@ Note that there is no equivalent Link variant of SVREMAP or
  SVSHAPE0-3 (it would be too costly), so SVLR has limited applicability:
  REMAP SPRs must be saved and restored explicitly.
  
+-----------
+
  [[!tag standards]]