add Forms to ls007, missing 1.6.2 fields

diff --git a/openpower/sv.mdwn b/openpower/sv.mdwn

index 223d3befb4815771f007810d32f678c4be0f85ff..732d7e51228d4b735f35578426f8aab28c2b4b1c 100644
--- a/openpower/sv.mdwn
+++ b/openpower/sv.mdwn
@@ -38,7 +38,7 @@ Fundamental design principles:
    (termed "preserving Program Order")
  * Specifically designed to be Precise-Interruptible at all times
    (many Vector ISAs have operations which, due to higher internal
-  accuracy or other complexity, must be effectively atomic for
+  accuracy or other complexity, must be effectively atomic only for
    the full Vector operation's duration, adversely affecting interrupt
    response latency, or be abandoned and started again)
  * Augments ("tags") existing instructions, providing Vectorisation
@@ -83,7 +83,7 @@ Comparative instruction count:
  * ARM NEON SIMD: around 2,000 instructions, prerequisite: ARM Scalar.
  * ARM SVE: around 4,000 instructions, prerequisite: NEON and ARM Scalar 
  * ARM SVE2: around 1,000 instructions, prerequisite: SVE, NEON, and
-  ARM Scalar
+  ARM Scalar for a grand total of well over 7,000 instructions.
  * Intel AVX-512: around 4,000 instructions, prerequisite AVX, AVX2,
    AVX-128 and AVX-256 which in turn critically rely on the rest of
    x86, for a grand total of well over 10,000 instructions.
@@ -138,6 +138,7 @@ Pages being developed and examples
    or are not immediately apparent despite the RISC paradigm
  * [[opcode_regs_deduped]] autogenerated table of SVP64 decoder augmentation
  * [[sv/sprs]] SPRs
+* [[sv/rfc]] RFCs to the [OPF ISA WG](https://openpower.foundation/isarfc/)
  
  SVP64 "Modes":
  
@@ -174,15 +175,20 @@ It requires certain guarantees to be provided.
  
  * Firstly: that instructions will never be ambiguously-defined.
  * Secondly, that no instruction shall change meaning to produce
-  different results on different hardware (present or future)
-* Thirdly, that implementors are not permitted to either add
+  different results on different hardware (present or future).
+* Thirdly, that Scalar "defined words" (32 bit instruction
+  encodings) if Vectorised will also always be implemented as
+  identical Scalar instructions (the sole semi-exception being
+  Vectorised Branch-Conditional)
+* Fourthly, that implementors are not permitted to either add
    arbitrary features nor implement features in an incompatible
    way. *(Performance may differ, but differing results are
    not permitted)*.
-* Fourthly, that any part of Simple-V not implemented by
+* Fifthly, that any part of Simple-V not implemented by
    a lower Compliancy Level is *required* to raise an illegal
-  instruction trap (allowing soft-emulation).
-* Fifthly, that any `UNDEFINED` behaviour for practical implementation
+  instruction trap (allowing soft-emulation), including if
+  Simple-V is not implemented at all.
+* Sixthly, that any `UNDEFINED` behaviour for practical implementation
    reasons is clearly documented for both programmers and hardware
    implementors.
  
@@ -193,12 +199,14 @@ different results on different hardware depending on a
  "Silicon Partner" hardware choice. This choice catastrophically
  and irrevocably causes binary non-interoperability *despite being
  a "feature"*.  Explained in <https://m.youtube.com/watch?v=HNEm8zmkjBU>
+it is the exact same binary-incompatibility issue faced by Power ISA
+on its 32- to 64-bit transition: 32-bit hardware was **unable** to
+trap-and-emulate 64-bit binaries because the opcodes were (are) the same.
  
  It is therefore *guaranteed* that extensions to the register file
  width and quantity in Simple-V shall only be made in future by
  explicit means, ensuring binary compatibility.
  
-
  # Optional Scalar instructions
  
  **Additional Instructions for specific purposes (not SVP64)**
@@ -235,6 +243,7 @@ Stand-alone Scalar Instructions:
  * [[sv/fclass]] detect class of FP numbers
  * [[sv/int_fp_mv]] Move and convert GPR <-> FPR, needed for !VSX
  * [[sv/av_opcodes]] scalar opcodes for Audio/Video
+* [[prefix_codes]] Decode/encode prefix-codes, used by JPEG, DEFLATE, etc.
  * TODO: OpenPOWER adaptation [[openpower/transcendentals]]
  
  Twin targetted instructions (two registers out, one implicit, just like
@@ -247,6 +256,95 @@ Load-with-Update).
  Explanation of the rules for twin register targets
  (implicit RS, FRS) explained in SVP64 [[svp64/appendix]]
  
+# Architectural Note
+
+This section is primarily for the ISA Working Group and for IBM
+in their capacity and responsibility for allocating "Architectural
+Resources" (opcodes), but it is also useful for general understanding
+of Simple-V.
+
+Simple-V is effectively a type of "Zero-Overhead Loop Control" to which
+an entire 24 bits are exclusively dedicated in a fully RISC-abstracted
+manner. Within those 24-bits there are no Scalar instructions, and
+no Vector instructions: there is *only* "Loop Control".
+
+This is why there are no actual Vector operations in Simple-V: *all* suitable
+Scalar Operations are Vectorised or not at all.  This has some extremely
+important implications when considering adding new instructions, and
+especially when allocating the Opcode Space for them.
+To protect SVP64 from damage, a "Hard Rule" has to be set:
+
+     Scalar Instructions must be simultaneously added in the corresponding
+     SVP64 opcode space with the exact same 32-bit "Defined Word" or they
+     must not be added at all.  Likewise, instructions planned for addition
+     in what is considered (wrongly) to be the exclusive "Vector" domain
+     must correspondingly be added in the Scalar space with the exact same
+     32-bit "Defined Word", or they must not be added at all.
+
+Some explanation of the above is needed.  Firstly, "Defined Word" is a term
+used in Section 1.6.3 of the Power ISA v3.1 Book I: it means, in short,
+"a 32 bit instruction", which can then be Prefixed by EXT001 to extend it
+to 64-bit (named EXT100-163).
+Prefixed-Prefixed (96-bit Variable-Length) encodings are
+prohibited in v3.1 and they are just as prohibited in Simple-V: it's too
+complex in hardware.  This means that **only** 32-bit "Defined Words"
+may be Vectorised, and in particular it means that no 64-bit instruction
+(EXT100-163) may **ever** be Vectorised.
+
+Secondly, the term "Vectoriseable" was used.  This refers to "instructions
+which if SVP64-Prefixed are actually meaningful". `sc` is meaningless
+to Vectorise, for example, as are `sync` and `mtmsr` (there is only ever
+going to be one MSR).
+
+The problem comes if the rationale is applied, "if unused,
+Unvectoriseable opcodes
+can therefore be allocated to alternative instructions mixing inside
+the SVP64
+Opcode space",
+which unfortunately results in huge inadvisable complexity in HDL at the
+Decode Phase, attempting to discern between the two types.  Worse than that,
+if the alternate 64-bit instruction is Vectoriseable but the 32-bit Scalar
+"Defined Word" is already allocated, how can there ever be a Scalar version
+of the alternate instruction? It would have to be added as a **completely
+different** 32-bit "Defined Word", and things go rapidly downhill in the
+Decoder as well as the ISA from there.
+
+Therefore to avoid risk and long-term damage to the Power ISA:
+
+* *even Unvectoriseable* "Defined Words" (`mtmsr`) must have the
+  corresponding SVP64 Prefixed Space `RESERVED`, permanently requiring
+  Illegal Instruction to be raised (the 64-bit encoding corresponding
+  to an illegal `sv.mtmsr` if ever incorrectly attempted must be
+  **defined** to raise an Exception)
+* *Even instructions that may not be Scalar* (although for various
+  practical reasons this is extremely rare if not impossible,
+  if not just generally "strongly discouraged")
+  which have no meaning or use as a 32-bit Scalar "Defined Word", **must**
+  still have the Scalar "Defined Word" `RESERVED` in the scalar
+  opcode space, as an Illegal Instruction.
+
+A good example of the former is `mtmsr` because there is only one
+MSR register (`sv.mtmsr` is meaningless, as is `sv.sc`),
+and a good example of the latter is [[sv/mv.x]]
+which is so deeply problematic to add to any Scalar ISA that it was
+rejected outright and an alternative route taken (Indexed REMAP).
+
+Another good example would be Cross Product which has no meaning
+at all in a Scalar ISA (Cross Product as a concept only applies
+to Mathematical Vectors). If any such Vector operation were ever added,
+it would be **critically** important to reserve the exact same *Scalar*
+opcode with the exact same "Defined Word" in the *Scalar* Power ISA
+opcode space, as an Illegal Instruction.  There are
+good reasons why Cross Product has not been proposed, but it serves
+to illustrate the point as far as Architectural Resource Allocation is
+concerned.
+
+Bottom line is that whilst this seems wasteful the alternatives are a
+destabilisation of the Power ISA and impractically-complex Hardware
+Decoders. With the Scalar Power ISA (v3.0, v3.1) already being comprehensive
+in the number of instructions, keeping further Decode complexity down is a 
+high priority.
+
  # Other Scalable Vector ISAs
  
  These Scalable Vector ISAs are listed to aid in understanding and
@@ -350,13 +448,16 @@ However when combined with the bitmanip scalar instructions
  requiring two Major opcodes this would come to a grand total of 3 precious
  Major opcodes. On balance, then, sacrificing 25% of EXT001 is the "least
  difficult" choice.
+Alternative locations for SVP64
+Prefixing include EXT006 and EXT017, with EXT006 being most favourable
+as there is room for future expansion.
  
  Note also that EXT022, the Official Architectural Sandbox area
  available for "Custom non-approved purposes" according to the Power
  ISA Spec,
  is under severe design pressure as it is insufficient to hold
  the full extent of the instruction additions required to create
-a Hybrid 3D CPU-VPU-GPU.  Akthough the wording of the Power ISA
+a Hybrid 3D CPU-VPU-GPU.  Although the wording of the Power ISA
  Specification leaves open the *possibility* of not needing to
  propose ISA Extensions to the ISA WG, it is clear that EXT022
  is an inappropriate location for a large high-profile Extension