* Preserving the underlying scalar execution dependencies as if the
for-loop had been expanded as actual scalar instructions
(termed "preserving Program Order")
+* Specifically designed to be Precise-Interruptible at all times
+ (many Vector ISAs have operations which, due to higher internal
+ accuracy or other complexity, must either be effectively atomic for
+ the full Vector operation's duration, adversely affecting interrupt
+ response latency, or be abandoned and started again)
* Augments ("tags") existing instructions, providing Vectorisation
"context" rather than adding new instructions.
* Strictly does not interfere with or alter the non-Scalable Power ISA
* ARM NEON SIMD: around 2,000 instructions, prerequisite: ARM Scalar.
* ARM SVE: around 4,000 instructions, prerequisite: NEON and ARM Scalar
* ARM SVE2: around 1,000 instructions, prerequisite: SVE, NEON, and
- ARM Scalar
+ ARM Scalar for a grand total of well over 7,000 instructions.
* Intel AVX-512: around 4,000 instructions, prerequisite AVX, AVX2,
AVX-128 and AVX-256 which in turn critically rely on the rest of
x86, for a grand total of well over 10,000 instructions.
* RISC-V RVV: 192 instructions, prerequisite 96 Scalar RV64GC instructions
-* SVP64: **five** instructions, 24-bit prefixing of
+* SVP64: **six** instructions, two of which are in the same space
+ (svshape, svshape2), with 24-bit prefixing of
prerequisite SFS (150) or
- SFFS (214) Compliancy Subsets
+ SFFS (214) Compliancy Subsets.
+ **There are no dedicated Vector instructions, only Scalar-prefixed**.
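
The totals implied by the list above can be tallied with a short sketch. It uses only the approximate figures quoted in the list; prerequisites whose size is not stated there (ARM Scalar, the rest of x86) are deliberately left out, so the results are lower bounds, and the dictionary keys are labels chosen purely for illustration.

```python
# Approximate counts quoted in the list above. Prerequisites whose size
# is not stated in the text (ARM Scalar, the rest of x86) are omitted,
# so every total here is a lower bound, not an exact figure.
QUOTED = {
    "ARM NEON + SVE + SVE2": [2000, 4000, 1000],
    "RISC-V RVV + RV64GC": [192, 96],
    "SVP64 + SFS": [6, 150],
    "SVP64 + SFFS": [6, 214],
}


def lower_bound_totals(counts):
    """Sum the quoted figures for each ISA family."""
    return {name: sum(parts) for name, parts in counts.items()}


TOTALS = lower_bound_totals(QUOTED)

if __name__ == "__main__":
    for name, total in TOTALS.items():
        print(f"{name}: at least {total} instructions")
```

On these figures the SVP64 rows come to 156 and 220 instructions, against 288 for RVV and at least 7,000 for the ARM family.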
+
+Comparative Basic Design Principle:
+
+* ARM NEON and VSX: PackedSIMD. No instruction-overloaded meaning
+ (every instruction is unique for a given register bitwidth,
+ guaranteeing binary interoperability)
+* Intel AVX-512 (and below): Hybrid Packed-Predicated SIMD with no
+ instruction-overloading, guaranteeing binary interoperability
+ but at the same time penalising the ISA with runaway
+ opcode proliferation.
+* ARM SVE/SVE2: Hybrid Packed-Predicated SIMD with instruction-overloading
+ that destroys binary interoperability. This is hidden behind the
+ misuse of the word "Scalable" and is **permitted under License**
+ by "Silicon Partners".
+* RISC-V RVV: Cray-style Scalable Vector but with instruction-overloading
+ **permitted by the specification** that destroys binary interoperability.
+* SVP64: Cray-style Scalable Vector with no instruction-overloaded
+ meanings. The regfile numbers and bitwidths shall **not** change
+ in a future revision (for the same instruction encoding):
+ "Silicon Partner" Scaling is prohibited,
+ in order to guarantee binary interoperability. Future revisions
+ of SVP64 may extend VSX instructions to achieve larger regfiles, and
+ non-interoperability on the same will likewise be prohibited.
SV comprises several [[sv/compliancy_levels]] suited to Embedded, Energy
efficient High-Performance Compute, Distributed Computing and Advanced
* [[sv/compliancy_levels]] for minimum subsets through to Advanced
Supercomputing.
* [[sv/implementation]] implementation planning and coordination
+* [[sv/po9_encoding]] a new DRAFT 64-bit space similar to EXT1xx,
+ introducing new areas EXT232-63 and EXT300-363
* [[sv/svp64]] contains the packet-format *only*, the [[svp64/appendix]]
contains explanations and further details
+* [[sv/svp64-single]] still under development
* [[sv/svp64_quirks]] things in SVP64 that slightly break the rules
or are not immediately apparent despite the RISC paradigm
* [[opcode_regs_deduped]] autogenerated table of SVP64 decoder augmentation
* [[sv/sprs]] SPRs
+* [[sv/rfc]] RFCs to the [OPF ISA WG](https://openpower.foundation/isarfc/)
SVP64 "Modes":
Vertical-First Mode and also providing traditional "Vector Iota"
capability.
-*Please note: there are only five instructions in the whole of SV.
+*Please note: there are only six instructions in the whole of SV.
Beyond this point are additional **Scalar** instructions related to
specific workloads that have nothing to do with the SV Specification*
+# Stability Guarantees in Simple-V
+
+Providing long-term stability in an ISA is extremely challenging
+but critically important.
+It requires certain guarantees to be provided.
+
+* Firstly, that instructions will never be ambiguously-defined.
+* Secondly, that no instruction shall change meaning to produce
+ different results on different hardware (present or future).
+* Thirdly, that Scalar "defined words" (32 bit instruction
+ encodings) if Vectorised will also always be implemented as
+ identical Scalar instructions (the sole semi-exception being
+ Vectorised Branch-Conditional)
+* Fourthly, that implementors are not permitted either to add
+ arbitrary features or to implement features in an incompatible
+ way. *(Performance may differ, but differing results are
+ not permitted)*.
+* Fifthly, that any part of Simple-V not implemented by
+ a lower Compliancy Level is *required* to raise an illegal
+ instruction trap (allowing soft-emulation), including if
+ Simple-V is not implemented at all.
+* Sixthly, that any `UNDEFINED` behaviour for practical implementation
+ reasons is clearly documented for both programmers and hardware
+ implementors.
+
+In particular, given the strong recent emphasis and interest in
+"Scalable Vector" ISAs, it is most unfortunate that both ARM SVE
+and RISC-V RVV permit the exact same instruction to produce
+different results on different hardware depending on a
+"Silicon Partner" hardware choice. This choice catastrophically
+and irrevocably causes binary non-interoperability *despite being
+a "feature"*. As explained in <https://m.youtube.com/watch?v=HNEm8zmkjBU>,
+it is the exact same binary-incompatibility issue faced by Power ISA
+on its 32- to 64-bit transition: 32-bit hardware was **unable** to
+trap-and-emulate 64-bit binaries because the opcodes were (are) the same.
+
+It is therefore *guaranteed* that extensions to the register file
+width and quantity in Simple-V shall only be made in future by
+explicit means, ensuring binary compatibility.
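
The interoperability concern can be illustrated with a deliberately simplified toy model. It is not the semantics of any real SVE or RVV instruction: `reverse_whole_register`, `reverse_explicit` and the `hw_lanes` parameter are hypothetical names. The first function lets a hardware choice decide the result; the second takes the length as an explicit operand, so hardware width can only affect how many steps are needed.

```python
def reverse_whole_register(data, hw_lanes):
    """Toy op whose span is the hardware register width (hypothetical)."""
    out = []
    for start in range(0, len(data), hw_lanes):
        out.extend(reversed(data[start:start + hw_lanes]))
    return out


def reverse_explicit(data, vl, hw_lanes):
    """Toy op whose span is an explicit operand; hw_lanes only sets the
    number of elements processed per step and never changes the result."""
    out = []
    for start in range(0, len(data), vl):
        chunk = data[start:start + vl]
        rev = [None] * len(chunk)
        for step in range(0, len(chunk), hw_lanes):
            for i in range(step, min(step + hw_lanes, len(chunk))):
                rev[len(chunk) - 1 - i] = chunk[i]
        out.extend(rev)
    return out


DATA = list(range(8))
NARROW = reverse_whole_register(DATA, hw_lanes=4)  # [3, 2, 1, 0, 7, 6, 5, 4]
WIDE = reverse_whole_register(DATA, hw_lanes=8)    # [7, 6, 5, 4, 3, 2, 1, 0]
SLOW = reverse_explicit(DATA, vl=4, hw_lanes=2)
FAST = reverse_explicit(DATA, vl=4, hw_lanes=8)
```

In this model the same "binary" gives `NARROW` on one machine and `WIDE` on another, whereas `SLOW` and `FAST` are identical: performance may differ, results may not.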
+
# Optional Scalar instructions
**Additional Instructions for specific purposes (not SVP64)**
They are all entirely designed as Scalar instructions that, as
Scalar instructions, stand on their own merit. Considerable
efforts have been made to provide justifications for each of these
-*Scalar* instructions.
+*Scalar* instructions in a *Scalar* context, completely independently
+of SVP64.
-Some of these Scalar instructions are specifically designed to make
+Some of these Scalar instructions also happen to be designed to make
Scalable Vector binaries more efficient, such
as the crweird group. Others are to bring the Scalar Power ISA
up-to-date within specific workloads,
-such as a Javascript Rounding instruction
-(which saves 35 instructions including 5 branches). None of them are strictly
-necessary but performance and power consumption may be (or, is already)
-compromised
+such as a JavaScript Rounding instruction
+(which saves 32 scalar instructions including seven branch instructions).
+None of them are strictly necessary but performance and power consumption may
+be (or already are) compromised
in certain workloads and use-cases without them.
Vector-related but still Scalar:
* [[sv/fclass]] detect class of FP numbers
* [[sv/int_fp_mv]] Move and convert GPR <-> FPR, needed for !VSX
* [[sv/av_opcodes]] scalar opcodes for Audio/Video
+* [[prefix_codes]] Decode/encode prefix-codes, used by JPEG, DEFLATE, etc.
* TODO: OpenPOWER adaptation [[openpower/transcendentals]]
Twin targetted instructions (two registers out, one implicit, just like
Explanation of the rules for twin register targets
(implicit RS, FRS) explained in SVP64 [[svp64/appendix]]
+# Architectural Note
+
+This section is primarily for the ISA Working Group and for IBM
+in their capacity and responsibility for allocating "Architectural
+Resources" (opcodes), but it is also useful for general understanding
+of Simple-V.
+
+Simple-V is effectively a type of "Zero-Overhead Loop Control" to which
+an entire 24 bits are exclusively dedicated in a fully RISC-abstracted
+manner. Within those 24 bits there are no Scalar instructions, and
+no Vector instructions: there is *only* "Loop Control".
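
The "Loop Control" idea can be sketched as a for-loop wrapped around an unmodified Scalar operation. This is a minimal model under stated assumptions: all three operands are treated as Vectors that step by one register per element, and features such as predication, element-width overrides and REMAP are ignored. `scalar_add` and `sv_prefixed` are hypothetical names, not part of any specification.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def scalar_add(regs, rt, ra, rb):
    """An ordinary 64-bit Scalar add: regs[rt] = regs[ra] + regs[rb]."""
    regs[rt] = (regs[ra] + regs[rb]) & MASK64


def sv_prefixed(scalar_op, regs, vl, rt, ra, rb):
    """Hypothetical model of a prefix: repeat the Scalar op VL times,
    strictly in Program Order, stepping each register number by one."""
    for i in range(vl):
        scalar_op(regs, rt + i, ra + i, rb + i)


def independent_example():
    regs = [0] * 32
    regs[8:12] = [1, 2, 3, 4]
    regs[16:20] = [10, 20, 30, 40]
    sv_prefixed(scalar_add, regs, 4, rt=24, ra=8, rb=16)
    return regs[24:28]


def dependent_example():
    # Destination overlaps the sources: each element reads the result of
    # the previous one, exactly as the expanded Scalar sequence would.
    regs = [0] * 32
    regs[0] = 1
    sv_prefixed(scalar_add, regs, 3, rt=1, ra=0, rb=0)
    return regs[0:4]
```

The second example is the case where preserving Program Order matters: the loop yields `[1, 2, 4, 8]` because each iteration depends on the one before it.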
+
+This is why there are no actual Vector operations in Simple-V: *all* suitable
+Scalar Operations are Vectorised or not at all. This has some extremely
+important implications when considering adding new instructions, and
+especially when allocating the Opcode Space for them.
+To protect SVP64 from damage, a "Hard Rule" has to be set:
+
+ Scalar Instructions must be simultaneously added in the corresponding
+ SVP64 opcode space with the exact same 32-bit "Defined Word" or they
+ must not be added at all. Likewise, instructions planned for addition
+ in what is considered (wrongly) to be the exclusive "Vector" domain
+ must correspondingly be added in the Scalar space with the exact same
+ 32-bit "Defined Word", or they must not be added at all.
+
+Some explanation of the above is needed. Firstly, "Defined Word" is a term
+used in Section 1.6.3 of the Power ISA v3.1 Book I: it means, in short,
+"a 32 bit instruction", which can then be Prefixed by EXT001 to extend it
+to 64-bit (named EXT100-163).
+Prefixed-Prefixed (96-bit Variable-Length) encodings are
+prohibited in v3.1 and they are just as prohibited in Simple-V: it's too
+complex in hardware. This means that **only** 32-bit "Defined Words"
+may be Vectorised, and in particular it means that no 64-bit instruction
+(EXT100-163) may **ever** be Vectorised.
+
+Secondly, the term "Vectoriseable" was used. This refers to "instructions
+which if SVP64-Prefixed are actually meaningful". `sc` is meaningless
+to Vectorise, for example, as is `sync` and `mtmsr` (there is only ever
+going to be one MSR).
+
+The problem comes if the following rationale is applied: "if unused,
+Unvectoriseable opcodes can therefore be allocated to alternative
+instructions mixing inside the SVP64 Opcode space". This unfortunately
+results in huge, inadvisable complexity in HDL at the
+Decode Phase, attempting to discern between the two types. Worse than that,
+if the alternate 64-bit instruction is Vectoriseable but the 32-bit Scalar
+"Defined Word" is already allocated, how can there ever be a Scalar version
+of the alternate instruction? It would have to be added as a **completely
+different** 32-bit "Defined Word", and things go rapidly downhill in the
+Decoder as well as the ISA from there.
+
+Therefore to avoid risk and long-term damage to the Power ISA:
+
+* *even Unvectoriseable* "Defined Words" (`mtmsr`) must have the
+ corresponding SVP64 Prefixed Space `RESERVED`, permanently requiring
+ Illegal Instruction to be raised (the 64-bit encoding corresponding
+ to an illegal `sv.mtmsr` if ever incorrectly attempted must be
+ **defined** to raise an Exception)
+* *Even instructions that may not be Scalar* (although for various
+ practical reasons this is extremely rare if not impossible,
+ if not just generally "strongly discouraged")
+ which have no meaning or use as a 32-bit Scalar "Defined Word", **must**
+ still have the Scalar "Defined Word" `RESERVED` in the scalar
+ opcode space, as an Illegal Instruction.
+
+A good example of the former is `mtmsr` because there is only one
+MSR register (`sv.mtmsr` is meaningless, as is `sv.sc`),
+and a good example of the latter is [[sv/mv.x]]
+which is so deeply problematic to add to any Scalar ISA that it was
+rejected outright and an alternative route taken (Indexed REMAP).
+
+Another good example would be Cross Product which has no meaning
+at all in a Scalar ISA (Cross Product as a concept only applies
+to Mathematical Vectors). If any such Vector operation were ever added,
+it would be **critically** important to reserve the exact same *Scalar*
+opcode with the exact same "Defined Word" in the *Scalar* Power ISA
+opcode space, as an Illegal Instruction. There are
+good reasons why Cross Product has not been proposed, but it serves
+to illustrate the point as far as Architectural Resource Allocation is
+concerned.
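
The "Hard Rule" can be expressed as a consistency check over an allocation table. The sketch below is illustrative only: the table layout, the `RESERVED` sentinel and the entries (including the `crossprod` and `other_op` names) are hypothetical and do not reflect any actual opcode assignment.

```python
RESERVED = None  # slot permanently raises Illegal Instruction

# Hypothetical table: Defined Word -> (Scalar meaning, SVP64-prefixed meaning)
ALLOCATION = {
    "add": ("add", "add"),                 # Vectoriseable: same meaning in both
    "mtmsr": ("mtmsr", RESERVED),          # Unvectoriseable: prefixed slot reserved
    "crossprod": (RESERVED, "crossprod"),  # Vector-only: Scalar slot reserved
}


def hard_rule_violations(table):
    """Return Defined Words whose two spaces hold *different* instructions."""
    return [
        word
        for word, (scalar, prefixed) in table.items()
        if scalar is not RESERVED and prefixed is not RESERVED and scalar != prefixed
    ]


def decode(word, prefixed, table=ALLOCATION):
    """One lookup decides the instruction; the prefix only adds looping."""
    scalar, vector = table.get(word, (RESERVED, RESERVED))
    meaning = vector if prefixed else scalar
    if meaning is RESERVED:
        return "illegal instruction"
    return ("sv." if prefixed else "") + meaning


# A table that reuses the unused prefixed slot of mtmsr for something else.
BAD = dict(ALLOCATION, mtmsr=("mtmsr", "other_op"))
```

With the rule enforced, `hard_rule_violations(ALLOCATION)` is empty and `decode("mtmsr", prefixed=True)` is an illegal instruction; the `BAD` table shows the kind of mixed allocation that the rule forbids.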
+
+The bottom line is that, whilst this seems wasteful, the alternatives are a
+destabilisation of the Power ISA and impractically-complex Hardware
+Decoders. With the Scalar Power ISA (v3.0, v3.1) already being comprehensive
+in the number of instructions, keeping further Decode complexity down is a
+high priority.
+
# Other Scalable Vector ISAs
These Scalable Vector ISAs are listed to aid in understanding and
AVX-512 and SVE2 truly "Scalable".* [[sv/comparison_table]] in tabular
form.
-# Major opcodes summary
+# Major opcodes summary <a name="major_op_summary"> </a>
-Simple-V itself only requires five instructions with 6-bit Minor XO
+Simple-V itself only requires six instructions with 6-bit Minor XO
(bits 26-31), and the SVP64 Prefix Encoding requires
25% space of the EXT001 Major Opcode.
There are **no** Vector Instructions and consequently **no further
is considerable concern that because there is not yet any two-way
day-to-day communication established with the OPF ISA WG, we have
no idea if any of these are conflicting with future plans by any OPF
-Members. **The External ISA WG RFC Process is yet to be ratified
-and Libre-SOC may not join the OPF as an entity because it does
+Members. **The External ISA WG RFC Process has now been ratified
+but Libre-SOC may not join the OPF as an entity because it does
not exist except in name. Even if it existed it would be a conflict
of interest to join the OPF, due to our funding remit from NLnet**.
We therefore proceed on the basis of making public the intention to
situation is a high priority which in turn by necessity puts pressure
on the 32-bit Major Opcode space.
-SVP64 itself is already under pressure, being only 24 bits. If it is
-not permitted to take up 25% of EXT001 then it would have to be proposed
-in its own Major Opcode, which on first consideration would be beneficial
-for SVP64 due to the availability of 2 extra bits.
-However when combined with the bitmanip scalar instructions
-requiring two Major opcodes this would come to a grand total of 3 precious
-Major opcodes. On balance, then, sacrificing 25% of EXT001 is the "least
-difficult" choice.
-
Note also that EXT022, the Official Architectural Sandbox area
available for "Custom non-approved purposes" according to the Power
ISA Spec,
is under severe design pressure as it is insufficient to hold
the full extent of the instruction additions required to create
-a Hybrid 3D CPU-VPU-GPU. Akthough the wording of the Power ISA
+a Hybrid 3D CPU-VPU-GPU. Although the wording of the Power ISA
Specification leaves open the *possibility* of not needing to
propose ISA Extensions to the ISA WG, it is clear that EXT022
is an inappropriate location for a large high-profile Extension
therefore be made to work with the OPF ISA WG to
submit SVP64 via the External RFC Process.
-**Whilst SVP64 is only 5 instructions
+**Whilst SVP64 is only 6 instructions
the heavy focus on VSX for the past 12 years has left the SFFS Level
anaemic and out-of-date compared to ARM and x86.**
This is very much
Examples experiments future ideas discussion:
+* [Scalar register access](https://bugs.libre-soc.org/show_bug.cgi?id=905)
+ above r31 and CR7.
* [[sv/propagation]] Context propagation including svp64, swizzle and remap
* [[sv/masked_vector_chaining]]
* [[sv/discussion]]