(no commit message)

diff --git a/openpower/sv/svp64.mdwn b/openpower/sv/svp64.mdwn

index 2a9c6f2c0ec9ab8d92c942397636d9c2f7b657f4..8d732f474f1c1aad0c79fddfaa746fc9c5ecf542 100644
--- a/openpower/sv/svp64.mdwn
+++ b/openpower/sv/svp64.mdwn
@@ -47,12 +47,7 @@ Table of contents
  Simple-V is a type of Vectorization best described as a "Prefix Loop
  Subsystem" similar to the 5 decades-old Zilog Z80 `LDIR`[^bib_ldir] instruction and
  to the 8086 `REP`[^bib_rep] Prefix instruction.  More advanced features are similar
-to the Z80 `CPIR`[^bib_cpir] instruction. If naively viewed one-dimensionally as an
-actual Vector ISA it introduces over 1.5 million 64-bit True-Scalable
-Vector instructions on the SFFS Subset and closer to 10 million 64-bit
-True-Scalable Vector instructions if introduced on VSX.  SVP64, the
-instruction format used by Simple-V, is therefore best viewed as an
-orthogonal RISC-paradigm "Prefixing" subsystem instead.
+to the Z80 `CPIR`[^bib_cpir] instruction.
  
  [^bib_ldir]:  [Zilog Z80 LDIR](http://z80-heaven.wikidot.com/instructions-set:ldir)
  [^bib_cpir]:  [Zilog Z80 CPIR](http://z80-heaven.wikidot.com/instructions-set:cpir)
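The first hunk describes Simple-V as a "Prefix Loop Subsystem" in the spirit of Z80 `LDIR` and x86 `REP`. The Python model below is a minimal sketch of that idea and is not the SVP64 specification's pseudocode. It assumes a toy 32-entry register file in which every operand is a Vector that steps by one register per element. `scalar_add` and `prefix_loop` are hypothetical names. The scalar operation is never modified; the prefix only supplies the loop around it.

```python
from typing import Callable, List

MASK64 = (1 << 64) - 1


def scalar_add(a: int, b: int) -> int:
    """Stand-in for an unmodified scalar instruction (64-bit add)."""
    return (a + b) & MASK64


def prefix_loop(op: Callable[[int, int], int], regs: List[int],
                rt: int, ra: int, rb: int, vl: int) -> None:
    """Hypothetical prefix loop: repeat `op` VL times, stepping each
    register number by one element per iteration, much as LDIR or REP
    repeat a single instruction."""
    for i in range(vl):
        regs[rt + i] = op(regs[ra + i], regs[rb + i])


regs = list(range(32))          # toy register file: r0..r31 hold 0..31
prefix_loop(scalar_add, regs, rt=16, ra=0, rb=8, vl=4)
print(regs[16:20])              # [8, 10, 12, 14]
```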
@@ -70,18 +65,11 @@ considered an independent "Defined Word-instruction"[^dwi] that augments the beh
  the following instruction (also a Defined Word-instruction), but does **not** change the actual Decoding
  of that following instruction just because it is Prefixed.  Unlike EXT100-163,
  where the Suffix is considered an entirely new Opcode Space,
-SVP64-Prefixed instructions  **MUST NEVER** be treated or regarded
+SVP64-Prefixed instructions must never be treated or regarded
  as a different Opcode Space.
  
  [^dwi]: Defined Word-instruction: Power ISA v3.1 Section 1.6
  
-*Architectural note: Treating the SVP64 Prefix as an "Independent" 64-bit Encoding Space and attempting
-to allocate non-Orthogonal Opcodes within it will result
-in catastrophic unviability of Simple-V. The Orthogonality of the Scalar vs Prefixed-Scalar
-spaces has to be considered inviolate, to the extent that even RESERVED spaces must be
-kept identical. The complexity at the Decode Phase by violating the RISC paradigm inherent
-in Simple-V will be unimplementable*
-
  Two apparent exceptions to the above hard rule exist: SV
  Branch-Conditional operations and LD/ST-update "Post-Increment"
  Mode.  Post-Increment was considered sufficiently high priority
@@ -105,7 +93,8 @@ greatly complexifying Parallel Instruction-Length Detection.
  Therefore it has to be prohibited to accept RFCs
  which fundamentally violate the following hard requirement: **under no circumstances**
  must the use of SVP64 24-bit Suffixes **also** imply a different Opcode space
-from **any** non-prefixed Word, even RESERVED or Illegal Words.*
+from **any** non-prefixed Word. Even RESERVED or Illegal Words must be
+Orthogonal.*
  
  Subset implementations in hardware are permitted, as long as certain
  rules are followed, allowing for full soft-emulation including future
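The hard requirement above says that a Suffix is decoded exactly as the same Word would be without a Prefix. The sketch below illustrates that Orthogonality, assuming a deliberately simplified decoder that extracts only the primary opcode (Power ISA bits 0:5, MSB0, of a 32-bit Word). `Decoded`, `decode_scalar_word` and `decode` are hypothetical names, and the 24-bit RM value passed in is an arbitrary placeholder. Because there is a single scalar decode path, RESERVED and Illegal Words behave identically whether or not they are Prefixed.

```python
from typing import NamedTuple, Optional


class Decoded(NamedTuple):
    primary_opcode: int           # bits 0:5 (MSB0) of the 32-bit Word
    svp64_rm: Optional[int]       # 24-bit Prefix field, or None if unprefixed


def decode_scalar_word(word: int) -> int:
    """The one and only scalar decode path: top 6 bits of the Word."""
    return (word >> 26) & 0x3F


def decode(word: int, svp64_rm: Optional[int] = None) -> Decoded:
    # The Prefix is carried alongside as loop context. It never selects
    # a different Opcode Space, so there is no second decode table.
    return Decoded(decode_scalar_word(word), svp64_rm)


ADD_R1_R2_R3 = 0x7C221A14          # the 32-bit Word for `add r1,r2,r3`
plain = decode(ADD_R1_R2_R3)
prefixed = decode(ADD_R1_R2_R3, svp64_rm=0x000001)   # placeholder RM value
assert plain.primary_opcode == prefixed.primary_opcode == 31
```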
@@ -129,8 +118,11 @@ only 24 bits:
  Different classes of operations require different formats. The earlier
  sections cover the common formats and the five separate modes have their own
  section later:
-CR operations (crops), Arithmetic/Logical (termed "normal"), Load/Store
-Immediate, Load/Store Indexed, and Branch-Conditional.
+* CR operations (crops),
+* Arithmetic/Logical (termed "normal"),
+* Load/Store Immediate,
+* Load/Store Indexed,
+* Branch-Conditional.
  
  ## Definition of Reserved in this spec.
  
@@ -198,17 +190,17 @@ execution of instructions, Simple-V requires a corresponding guarantee for Eleme
  because in Simple-V Execution of Elements is synonymous with Execution of
  instructions.
  
+[^ieo]: Strict Instruction Execution Order is defined in Public v3.1 Book I Section 2.2
+
  ## Precise Interrupt Guarantees
  
-Strict Instruction Execution Order[^ieo] is defined as giving the appearance, as far
+Strict Instruction Execution Order is defined as giving the appearance, as far
  as programs are concerned, that instructions were executed
  strictly in the sequence that they occurred.  A "Precise"
  out-of-order
  Micro-architecture goes to considerable lengths to ensure that
  this is the case.
  
-[^ieo]: Strict Instruction Execution Order is defined in Public v3.1 Book I Section 2.2
-
  Many Vector ISAs allow interrupts to occur in the middle of
  processing of large Vector operations, only under the condition
  that partial results are cleanly discarded, and continuation on return
@@ -221,11 +213,11 @@ accumulator than the registers.
  
  Simple-V operates on an entirely different paradigm from traditional
  Vector ISAs: as a "Sub-Execution Context", where "Elements" are synonymous
-with Scalar instructions. With this in mind it is critical for
-implementations to observe Strict **Element**-Level Execution Order[^svp64_eeo]
+with Scalar instructions. With this in mind
+implementations must observe Strict **Element**-Level Execution Order[[#svp64_eeo]]
  at all times.
-*Any* element is Interruptible and Architectural State may
-be fully preserved and restored regardless of that same State
+*Any* element is Interruptible, and Architectural State may
+be fully preserved and restored regardless of that same State.
  
  *Engineering note: implementations are permitted have higher latency to
  perform context-switching  (particularly if REMAP
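The hunk above requires Strict Element-Level Execution Order and states that *any* element is Interruptible. The toy model below is a minimal sketch of what that guarantees, assuming the loop's only progress state is a single element-step counter. This is a hypothetical stand-in for part of `SVSTATE`, not its real layout. The test interrupts after every possible element, saves and restores the state, and checks that the result always matches an uninterrupted run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoopState:
    vl: int
    step: int = 0        # hypothetical stand-in for SVSTATE's element step


def run(state: LoopState, src: List[int], dst: List[int],
        stop_after: Optional[int] = None) -> bool:
    """Execute elements strictly in order; return True when complete.
    `stop_after` simulates an interrupt after that many elements."""
    done = 0
    while state.step < state.vl:
        if stop_after is not None and done == stop_after:
            return False             # interrupted: state.step is precise
        dst[state.step] = src[state.step] * 2
        state.step += 1
        done += 1
    return True


src = [1, 2, 3, 4, 5]
reference = [0] * 5
run(LoopState(vl=5), src, reference)

for k in range(5):                      # interrupt after every possible element
    dst, st = [0] * 5, LoopState(vl=5)
    assert not run(st, src, dst, stop_after=k)
    saved = LoopState(st.vl, st.step)   # "trap handler" copies the state out
    assert run(saved, src, dst)         # ...and later restores and resumes
    assert dst == reference
```

Since each element is a whole scalar operation, no partial result ever needs discarding. The saved step alone is sufficient to resume.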
@@ -234,19 +226,21 @@ is active).*
  Interrupts still only save `MSR` and `PC` in `SRR0` and `SRR1`
  but the full SVP64 Architectural State may be saved and
  restored through manual copying of `SVSTATE` (and the four
-REMAP SPRs if in use at the time)
-
-*Programmer's note: Trap Handlers (and function call stack save/restore)
-may avoid the
-use of SVP64 Prefixed instructions to perform the necessary
-save/restore of Simple-V Architectural State.
-This capability also allows nested function calls to be made from
-inside Vertical-First Vector loops, which is very rare for Vector ISAs.
-
-Strict Program Order is also preserved by the Parallel Reduction
-REMAP Schedule, but only at the cost of requiring the destination
-Vector to be used (Deterministically) to store partial progress of the
-Parallel Reduction.
+REMAP SPRs if in use at the time, which may be determined by
+`SVSTATE[32:46]` being non-zero).
+
+*Programmer's note: Trap Handlers (and any stack-based context save/restore)
+must avoid the use of SVP64 Prefixed instructions to perform the necessary
+save/restore of Simple-V Architectural State (SPR SVSTATE),
+just as use of FPRs and VSRs is presently avoided.
+However, once SVSTATE has been saved and set to a known-good value,
+SVP64 Prefixed instructions may be used to save/restore GPRs, SPRs, FPRs and other state.*
+
+*Programmer's note: SVSHAPE0-3 alter Element Execution Order, but only
+if activated in SVSTATE. It is therefore technically possible in a Trap
+Handler to save SVSTATE (`mfspr t0, SVSTATE`), then clear bits 32-46.
+At this point it becomes safe to use SVP64 to save sequential batches
+of SPRs (`setvli MAXVL=VL=4; sv.mfspr *t0, *SVSHAPE0`).*
  
  The only major caveat for REMAP is that
  after an explicit change to
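The new Programmer's notes in the hunk above say that the REMAP SPRs need saving only when `SVSTATE[32:46]` is non-zero. A Trap Handler may then clear those bits before using SVP64 for sequential saves. The helper functions below are a minimal sketch of that bit arithmetic. They assume that SVSTATE is a 64-bit SPR using the usual Power ISA MSB0 bit numbering (bit 0 is the most significant), so the field occupies LSB0 bits 31 down to 17. `msb0_field_mask`, `remap_in_use` and `clear_remap` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def msb0_field_mask(first: int, last: int, width: int = 64) -> int:
    """Mask for MSB0 bits first..last (inclusive) of a `width`-bit register."""
    nbits = last - first + 1
    return ((1 << nbits) - 1) << (width - 1 - last)


REMAP_MASK = msb0_field_mask(32, 46)   # assumed MSB0 numbering of SVSTATE


def remap_in_use(svstate: int) -> bool:
    """True if SVSHAPE0-3 must also be saved (SVSTATE[32:46] non-zero)."""
    return (svstate & REMAP_MASK) != 0


def clear_remap(svstate: int) -> int:
    """Clear SVSTATE[32:46] so later SVP64 saves run in sequential order."""
    return svstate & ~REMAP_MASK & MASK64


assert REMAP_MASK == 0x0000_0000_FFFE_0000
```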
@@ -256,7 +250,7 @@ it easier to take longer to calculate where in a given Schedule
  the re-mapping Indices were.  Obvious examples include Interrupts occuring
  in the middle of a non-RADIX2 Matrix Multiply Schedule (5x3 by 3x3
  for example), which
-will force implementations to perform divide and modulo
+will force some implementations to perform divide and modulo
  calculations.
  
  An additional caveat involves Condition Register Fields
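The last hunk notes that an interrupt in the middle of a non-RADIX2 Matrix Multiply Schedule (5x3 by 3x3) will force some implementations to perform divide and modulo. The sketch below shows why, assuming the Schedule position is held as one flat element step. The dimension order chosen here is also an assumption for illustration, not the REMAP specification's own algorithm. Recovering per-dimension indices needs `divmod` for arbitrary dimensions, whereas power-of-two dimensions reduce to shift and mask. `indices_from_step` and `indices_radix2` are hypothetical names.

```python
def indices_from_step(step: int, dims: tuple) -> tuple:
    """Recover per-dimension indices from a flat element step
    using repeated divmod, innermost dimension first."""
    out = []
    for d in dims:
        step, idx = divmod(step, d)
        out.append(idx)
    return tuple(out)


def indices_radix2(step: int, log2_dims: tuple) -> tuple:
    """Same result for power-of-two dimensions, using only shift and mask."""
    out = []
    for lg in log2_dims:
        out.append(step & ((1 << lg) - 1))
        step >>= lg
    return tuple(out)


# 5x3 by 3x3 multiply: 5 * 3 * 3 = 45 multiply-accumulate steps.
dims = (3, 3, 5)   # assumed order: result column, inner index, result row
schedule = [indices_from_step(s, dims) for s in range(3 * 3 * 5)]
assert len(set(schedule)) == 45
print(indices_from_step(31, dims))   # (1, 1, 3)
```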