bug 1034: update spec page on bin/tern lut2/lut3

diff --git a/openpower/sv.mdwn b/openpower/sv.mdwn

index b5ddcba94130944141b6e2fdc36adf34717cbb3f..23b9e811b5641af5f37dc558f0e0a4f80e51dd79 100644
--- a/openpower/sv.mdwn
+++ b/openpower/sv.mdwn
@@ -38,7 +38,7 @@ Fundamental design principles:
    (termed "preserving Program Order")
  * Specifically designed to be Precise-Interruptible at all times
    (many Vector ISAs have operations which, due to higher internal
-  accuracy or other complexity, must be effectively atomic for
+  accuracy or other complexity, must be effectively atomic only for
    the full Vector operation's duration, adversely affecting interrupt
    response latency, or be abandoned and started again)
  * Augments ("tags") existing instructions, providing Vectorisation
@@ -83,7 +83,7 @@ Comparative instruction count:
  * ARM NEON SIMD: around 2,000 instructions, prerequisite: ARM Scalar.
  * ARM SVE: around 4,000 instructions, prerequisite: NEON and ARM Scalar 
  * ARM SVE2: around 1,000 instructions, prerequisite: SVE, NEON, and
-  ARM Scalar
+  ARM Scalar for a grand total of well over 7,000 instructions.
  * Intel AVX-512: around 4,000 instructions, prerequisite AVX, AVX2,
    AVX-128 and AVX-256 which in turn critically rely on the rest of
    x86, for a grand total of well over 10,000 instructions.
@@ -132,12 +132,16 @@ Pages being developed and examples
  * [[sv/compliancy_levels]] for minimum subsets through to Advanced
    Supercomputing.
  * [[sv/implementation]] implementation planning and coordination
+* [[sv/po9_encoding]] a new DRAFT 64-bit space similar to EXT1xx,
+  introducing new areas EXT232-63 and EXT300-363
  * [[sv/svp64]] contains the packet-format *only*, the [[svp64/appendix]]
    contains explanations and further details
+* [[sv/svp64-single]] still under development
  * [[sv/svp64_quirks]] things in SVP64  that slightly break the rules
    or are not immediately apparent despite the RISC paradigm
  * [[opcode_regs_deduped]] autogenerated table of SVP64 decoder augmentation
  * [[sv/sprs]] SPRs
+* [[sv/rfc]] RFCs to the [OPF ISA WG](https://openpower.foundation/isarfc/)
  
  SVP64 "Modes":
  
@@ -166,7 +170,7 @@ Core SVP64 instructions:
  Beyond this point are additional **Scalar** instructions related to
  specific workloads that have nothing to do with the SV Specification*
  
-# Guarantees in Simple-V
+# Stability Guarantees in Simple-V
  
  Providing long-term stability in an ISA is extremely challenging
  but critically important.
  
  Providing long-term stability in an ISA is extremely challenging
  but critically important.
@@ -174,27 +178,38 @@ It requires certain guarantees to be provided.
  
  * Firstly: that instructions will never be ambiguously-defined.
  * Secondly, that no instruction shall change meaning to produce
-  different results on different hardware (present or future)
-* Thirdly, that implementors are not permitted to either add
+  different results on different hardware (present or future).
+* Thirdly, that Scalar "defined words" (32-bit instruction
+  encodings), if Vectorised, will also always be implemented as
+  identical Scalar instructions (the sole semi-exception being
+  Vectorised Branch-Conditional).
+* Fourthly, that implementors are not permitted to either add
    arbitrary features nor implement features in an incompatible
-  way.
-* Fourthly, that any part of Simple-V not implemented by
+  way. *(Performance may differ, but differing results are
+  not permitted)*.
+* Fifthly, that any part of Simple-V not implemented by
    a lower Compliancy Level is *required* to raise an illegal
-  instruction trap.
+  instruction trap (allowing soft-emulation), including if
+  Simple-V is not implemented at all.
+* Sixthly, that any `UNDEFINED` behaviour for practical implementation
+  reasons is clearly documented for both programmers and hardware
+  implementors.
  
  In particular, given the strong recent emphasis and interest in
  "Scalable Vector" ISAs, it is most unfortunate that both ARM SVE
  and RISC-V RVV permit the exact same instruction to produce
  different results on different hardware depending on a
  "Silicon Partner" hardware choice. This choice catastrophically
-and irrevocably causes binary non-interoperability despite being
-a "feature".  Explained in <https://m.youtube.com/watch?v=HNEm8zmkjBU>
+and irrevocably causes binary non-interoperability *despite being
+a "feature"*.  As explained in <https://m.youtube.com/watch?v=HNEm8zmkjBU>,
+it is the exact same binary-incompatibility issue faced by the Power ISA
+in its 32- to 64-bit transition: 32-bit hardware was **unable** to
+trap-and-emulate 64-bit binaries because the opcodes were (and are) the same.
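The trap-and-emulate argument can be illustrated with a toy model. This is a minimal sketch under stated assumptions: the opcodes `add`/`mul`, the "cores" and the handler are all hypothetical and stand in for real hardware and an OS trap handler. A core that does not implement an opcode *must* trap, so software can supply the single architected result; a core that reuses the same opcode with different (here 32-bit) semantics never traps, so nothing can fix the result up.

```python
from typing import Callable, Dict, List, Tuple

MASK64 = (1 << 64) - 1
Op = Callable[[int, int], int]


class IllegalInstruction(Exception):
    pass


# The one architected definition of each (hypothetical) opcode.
SPEC: Dict[str, Op] = {
    "add": lambda a, b: (a + b) & MASK64,
    "mul": lambda a, b: (a * b) & MASK64,
}


def execute(op: str, a: int, b: int, hw: Dict[str, Op]) -> int:
    if op not in hw:
        raise IllegalInstruction(op)  # unimplemented => required to trap
    return hw[op](a, b)


def run(program: List[Tuple[str, int, int]], hw: Dict[str, Op]) -> List[int]:
    results = []
    for op, a, b in program:
        try:
            results.append(execute(op, a, b, hw))
        except IllegalInstruction:
            # Toy "OS" trap handler: soft-emulate from the architected definition.
            results.append(SPEC[op](a, b))
    return results


program = [("add", 2, 3), ("mul", 1 << 40, 4)]
small_core = {"add": SPEC["add"]}   # lower level: no "mul" in hardware, traps
full_core = dict(SPEC)              # implements everything in hardware
# Same opcode, narrower (32-bit) results: no trap ever happens, so the
# handler never runs and the binary silently computes something different.
narrow_core = {"add": SPEC["add"], "mul": lambda a, b: (a * b) & 0xFFFF_FFFF}
```

Here `small_core` and `full_core` produce identical results (performance differs, results do not), whereas `narrow_core` does not.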
  
  It is therefore *guaranteed* that extensions to the register file
  width and quantity in Simple-V shall only be made in future by
  explicit means, ensuring binary compatibility.
  
-
  # Optional Scalar instructions
  
  **Additional Instructions for specific purposes (not SVP64)**
@@ -210,10 +225,10 @@ Some of these Scalar instructions happen also designed to make
  Scalable Vector binaries more efficient, such
  as the crweird group.  Others are to bring the Scalar Power ISA
  up-to-date within specific workloads,
-such as a Javascript Rounding instruction
-(which saves 35 instructions including 5 branches). None of them are strictly
-necessary but performance and power consumption may be (or, is already)
-compromised
+such as a JavaScript Rounding instruction
+(which saves 32 scalar instructions including seven branch instructions).
+None of them is strictly necessary, but performance and power consumption may
+be (or already are) compromised
  in certain workloads and use-cases without them.
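The page does not spell out what the JavaScript Rounding instruction computes. Assuming it targets ECMAScript's ToInt32 conversion (the operation behind JavaScript's `x | 0`), the following is a minimal sketch of those semantics, i.e. what a scalar sequence would otherwise have to implement with compares and branches. The function name is hypothetical and this is not the instruction's specified pseudocode.

```python
import math


def js_to_int32(x: float) -> int:
    """ECMAScript ToInt32: truncate toward zero, wrap modulo 2**32, reinterpret as signed."""
    # NaN and +/-Infinity convert to 0 rather than trapping or saturating.
    if math.isnan(x) or math.isinf(x):
        return 0
    n = math.trunc(x)   # sign(x) * floor(abs(x))
    n %= 2 ** 32        # Python's % always yields a non-negative result here
    # Reinterpret the low 32 bits as a two's-complement signed integer.
    return n - 2 ** 32 if n >= 2 ** 31 else n
```

Each special case (NaN, infinities, wrap-around, sign reinterpretation) is a place where a pure scalar sequence needs extra compares or branches.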
  
  Vector-related but still Scalar:
@@ -231,6 +246,7 @@ Stand-alone Scalar Instructions:
  * [[sv/fclass]] detect class of FP numbers
  * [[sv/int_fp_mv]] Move and convert GPR <-> FPR, needed for !VSX
  * [[sv/av_opcodes]] scalar opcodes for Audio/Video
+* [[prefix_codes]] Decode/encode prefix-codes, used by JPEG, DEFLATE, etc.
  * TODO: OpenPOWER adaptation [[openpower/transcendentals]]
  
  Twin targetted instructions (two registers out, one implicit, just like
@@ -243,6 +259,95 @@ Load-with-Update).
  Explanation of the rules for twin register targets
  (implicit RS, FRS) explained in SVP64 [[svp64/appendix]]
  
+# Architectural Note
+
+This section is primarily for the ISA Working Group and for IBM
+in their capacity and responsibility for allocating "Architectural
+Resources" (opcodes), but it is also useful for general understanding
+of Simple-V.
+
+Simple-V is effectively a type of "Zero-Overhead Loop Control" to which
+an entire 24 bits are exclusively dedicated in a fully RISC-abstracted
+manner. Within those 24 bits there are no Scalar instructions, and
+no Vector instructions: there is *only* "Loop Control".
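The "Loop Control only" idea can be sketched as follows. This is a deliberately simplified, hypothetical model: no predication, no element-width overrides, and the Scalar/Vector marking of each operand reduced to a boolean flag. The point it shows is that the prefix adds no new arithmetic: the *unmodified* Scalar semantics run VL times, and only the register numbers step.

```python
from typing import Callable, List

MASK64 = (1 << 64) - 1


def addi(ra_value: int, si: int) -> int:
    """The unmodified Scalar semantics of addi (64-bit wrap-around)."""
    return (ra_value + si) & MASK64


def sv_loop(regs: List[int], vl: int, rt: int, ra: int, si: int,
            op: Callable[[int, int], int] = addi,
            rt_vector: bool = True, ra_vector: bool = True) -> None:
    """Hypothetical model of the 24-bit prefix acting purely as loop control."""
    for i in range(vl):
        # Only operands marked as Vector have their register number stepped.
        dst = rt + i if rt_vector else rt
        src = ra + i if ra_vector else ra
        regs[dst] = op(regs[src], si)


regs = list(range(32))                      # toy register file: GPR n holds n
sv_loop(regs, vl=4, rt=8, ra=16, si=100)    # a Vectorised addi, RT and RA stepped
```

After the call, GPRs 8 to 11 hold 116 to 119; with `ra_vector=False` the same Scalar source would instead be broadcast to every destination element.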
+
+This is why there are no actual Vector operations in Simple-V: *all* suitable
+Scalar Operations are Vectorised or not at all.  This has some extremely
+important implications when considering adding new instructions, and
+especially when allocating the Opcode Space for them.
+To protect SVP64 from damage, a "Hard Rule" has to be set:
+
+     Scalar Instructions must be simultaneously added in the corresponding
+     SVP64 opcode space with the exact same 32-bit "Defined Word" or they
+     must not be added at all.  Likewise, instructions planned for addition
+     in what is considered (wrongly) to be the exclusive "Vector" domain
+     must correspondingly be added in the Scalar space with the exact same
+     32-bit "Defined Word", or they must not be added at all.
+
+Some explanation of the above is needed.  Firstly, "Defined Word" is a term
+used in Section 1.6.3 of the Power ISA v3.1 Book I: it means, in short,
+"a 32 bit instruction", which can then be Prefixed by EXT001 to extend it
+to 64-bit (named EXT100-163).
+Prefixed-Prefixed (96-bit Variable-Length) encodings are
+prohibited in v3.1 and they are just as prohibited in Simple-V: it's too
+complex in hardware.  This means that **only** 32-bit "Defined Words"
+may be Vectorised, and in particular it means that no 64-bit instruction
+(EXT100-163) may **ever** be Vectorised.
+
+Secondly, the term "Vectoriseable" was used.  This refers to "instructions
+which if SVP64-Prefixed are actually meaningful". `sc` is meaningless
+to Vectorise, for example, as are `sync` and `mtmsr` (there is only ever
+going to be one MSR).
+
+The problem comes if the following rationale is applied: "if unused,
+Unvectoriseable opcodes can therefore be allocated to alternative
+instructions mixing inside the SVP64 Opcode space". This unfortunately
+results in huge, inadvisable complexity in HDL at the Decode Phase,
+which must then discern between the two types.  Worse than that,
+if the alternate 64-bit instruction is Vectoriseable but the 32-bit Scalar
+"Defined Word" is already allocated, how can there ever be a Scalar version
+of the alternate instruction? It would have to be added as a **completely
+different** 32-bit "Defined Word", and things go rapidly downhill in the
+Decoder as well as the ISA from there.
+
+Therefore to avoid risk and long-term damage to the Power ISA:
+
+* *Even Unvectoriseable* "Defined Words" (such as `mtmsr`) must have the
+  corresponding SVP64 Prefixed Space `RESERVED`, permanently requiring
+  Illegal Instruction to be raised (the 64-bit encoding corresponding
+  to an illegal `sv.mtmsr` if ever incorrectly attempted must be
+  **defined** to raise an Exception)
+* *Even instructions that may not be Scalar* (although for various
+  practical reasons this is extremely rare if not impossible,
+  if not just generally "strongly discouraged")
+  which have no meaning or use as a 32-bit Scalar "Defined Word", **must**
+  still have the Scalar "Defined Word" `RESERVED` in the scalar
+  opcode space, as an Illegal Instruction.
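A toy decoder can illustrate why these rules keep Decode simple. This is a minimal sketch, not the Libre-SOC decoder. ASSUMPTION: the SVP64 prefix is modelled as the quarter of EXT001 in which MSB0 bits 7 and 9 are both set. The two-entry table (`addi` is primary opcode 14, `sc` is 17) is a hypothetical subset that ignores extended-opcode fields and other EXT001 (EXT100-163) prefixes. The suffix goes through the *same* Scalar table, a prefixed suffix (96 bits) is rejected, and an Unvectoriseable suffix traps instead of being reallocated.

```python
from typing import Dict, List, Tuple


class IllegalInstruction(Exception):
    """Raised for any encoding this toy decoder must reject."""


def primary_opcode(word: int) -> int:
    # Power ISA numbers bits MSB0: the primary opcode is bits 0-5,
    # i.e. the top six bits of the 32-bit "Defined Word".
    return (word >> 26) & 0x3F


def is_svp64_prefix(word: int) -> bool:
    # ASSUMPTION: EXT001 with MSB0 bits 7 and 9 set (LSB0 bits 24 and 22).
    return (primary_opcode(word) == 1
            and (word >> 24) & 1 == 1
            and (word >> 22) & 1 == 1)


# Hypothetical Scalar decode table: primary opcode -> (name, vectoriseable).
SCALAR_TABLE: Dict[int, Tuple[str, bool]] = {
    14: ("addi", True),
    17: ("sc", False),   # meaningful as Scalar, meaningless to Vectorise
}


def scalar_decode(word: int) -> Tuple[str, bool]:
    entry = SCALAR_TABLE.get(primary_opcode(word))
    if entry is None:
        # Includes non-SVP64 EXT001 prefixes, which this toy does not model.
        raise IllegalInstruction(hex(word))
    return entry


def decode(words: List[int]) -> Tuple[str, int]:
    """Return (mnemonic, length in 32-bit words)."""
    first = words[0]
    if not is_svp64_prefix(first):
        return scalar_decode(first)[0], 1
    suffix = words[1]
    if primary_opcode(suffix) == 1:
        # A prefixed suffix would form a 96-bit instruction: prohibited.
        raise IllegalInstruction("prefix-on-prefix")
    # One table serves both Scalar and Vectorised decode.
    name, vectoriseable = scalar_decode(suffix)
    if not vectoriseable:
        # The SVP64 slot stays RESERVED: it traps and is never reallocated.
        raise IllegalInstruction("sv." + name)
    return "sv." + name, 2
```

Because the Vectorised form is defined purely by prefix-plus-Scalar-word, adding an instruction in only one of the two spaces would immediately split this single table in two.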
+
+A good example of the former is `mtmsr` because there is only one
+MSR register (`sv.mtmsr` is meaningless, as is `sv.sc`),
+and a good example of the latter is [[sv/mv.x]]
+which is so deeply problematic to add to any Scalar ISA that it was
+rejected outright and an alternative route taken (Indexed REMAP).
+
+Another good example would be Cross Product which has no meaning
+at all in a Scalar ISA (Cross Product as a concept only applies
+to Mathematical Vectors). If any such Vector operation were ever added,
+it would be **critically** important to reserve the exact same *Scalar*
+opcode with the exact same "Defined Word" in the *Scalar* Power ISA
+opcode space, as an Illegal Instruction.  There are
+good reasons why Cross Product has not been proposed, but it serves
+to illustrate the point as far as Architectural Resource Allocation is
+concerned.
+
+The bottom line is that whilst this seems wasteful, the alternatives are
+destabilisation of the Power ISA and impractically-complex Hardware
+Decoders. With the Scalar Power ISA (v3.0, v3.1) already being comprehensive
+in the number of instructions, keeping further Decode complexity down is a
+high priority.
+
  # Other Scalable Vector ISAs
  
  These Scalable Vector ISAs are listed to aid in understanding and
@@ -338,21 +443,12 @@ would become a whopping 96-bit long instruction. Avoiding this
  situation is a high priority which in turn by necessity puts pressure
  on the 32-bit Major Opcode space.
  
-SVP64 itself is already under pressure, being only 24 bits.  If it is
-not permitted to take up 25% of EXT001 then it would have to be proposed
-in its own Major Opcode, which on first consideration would be beneficial
-for SVP64 due to the availability of 2 extra bits.
-However when combined with the bitmanip scalar instructions
-requiring two Major opcodes this would come to a grand total of 3 precious
-Major opcodes. On balance, then, sacrificing 25% of EXT001 is the "least
-difficult" choice.
-
  Note also that EXT022, the Official Architectural Sandbox area
  available for "Custom non-approved purposes" according to the Power
  ISA Spec,
  is under severe design pressure as it is insufficient to hold
  the full extent of the instruction additions required to create
-a Hybrid 3D CPU-VPU-GPU.  Akthough the wording of the Power ISA
+a Hybrid 3D CPU-VPU-GPU.  Although the wording of the Power ISA
  Specification leaves open the *possibility* of not needing to
  propose ISA Extensions to the ISA WG, it is clear that EXT022
  is an inappropriate location for a large high-profile Extension