X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=openpower%2Fsv.mdwn;h=23b9e811b5641af5f37dc558f0e0a4f80e51dd79;hb=83e17db9000ab78bc559eed77c9f06743551bd18;hp=25c607f2370ce2e7dcae882836afd309d4e01929;hpb=4d0f89ce1fb3d2408469e9bc9c6ad7ebd79c0096;p=libreriscv.git diff --git a/openpower/sv.mdwn b/openpower/sv.mdwn index 25c607f23..23b9e811b 100644 --- a/openpower/sv.mdwn +++ b/openpower/sv.mdwn @@ -38,7 +38,7 @@ Fundamental design principles: (termed "preserving Program Order") * Specifically designed to be Precise-Interruptible at all times (many Vector ISAs have operations which, due to higher internal - accuracy or other complexity, must be effectively atomic for + accuracy or other complexity, must be effectively atomic only for the full Vector operation's duration, adversely affecting interrupt response latency, or be abandoned and started again) * Augments ("tags") existing instructions, providing Vectorisation @@ -83,7 +83,7 @@ Comparative instruction count: * ARM NEON SIMD: around 2,000 instructions, prerequisite: ARM Scalar. * ARM SVE: around 4,000 instructions, prerequisite: NEON and ARM Scalar * ARM SVE2: around 1,000 instructions, prerequisite: SVE, NEON, and - ARM Scalar + ARM Scalar for a grand total of well over 7,000 instructions. * Intel AVX-512: around 4,000 instructions, prerequisite AVX, AVX2, AVX-128 and AVX-256 which in turn critically rely on the rest of x86, for a grand total of well over 10,000 instructions. @@ -132,12 +132,16 @@ Pages being developed and examples * [[sv/compliancy_levels]] for minimum subsets through to Advanced Supercomputing. 
* [[sv/implementation]] implementation planning and coordination +* [[sv/po9_encoding]] a new DRAFT 64-bit space similar to EXT1xx, + introducing new areas EXT232-63 and EXT300-363 * [[sv/svp64]] contains the packet-format *only*, the [[svp64/appendix]] contains explanations and further details +* [[sv/svp64-single]] still under development * [[sv/svp64_quirks]] things in SVP64 that slightly break the rules or are not immediately apparent despite the RISC paradigm * [[opcode_regs_deduped]] autogenerated table of SVP64 decoder augmentation * [[sv/sprs]] SPRs +* [[sv/rfc]] RFCs to the [OPF ISA WG](https://openpower.foundation/isarfc/) SVP64 "Modes": @@ -166,7 +170,7 @@ Core SVP64 instructions: Beyond this point are additional **Scalar** instructions related to specific workloads that have nothing to do with the SV Specification* -# Guarantees in Simple-V +# Stability Guarantees in Simple-V Providing long-term stability in an ISA is extremely challenging but critically important. @@ -174,27 +178,38 @@ It requires certain guarantees to be provided. * Firstly: that instructions will never be ambiguously-defined. * Secondly, that no instruction shall change meaning to produce - different results on different hardware (present or future) -* Thirdly, that implementors are not permitted to either add + different results on different hardware (present or future). +* Thirdly, that Scalar "defined words" (32 bit instruction + encodings) if Vectorised will also always be implemented as + identical Scalar instructions (the sole semi-exception being + Vectorised Branch-Conditional) +* Fourthly, that implementors are not permitted to either add arbitrary features nor implement features in an incompatible - way. -* Fourthly, that any part of Simple-V not implemented by + way. *(Performance may differ, but differing results are + not permitted)*. +* Fifthly, that any part of Simple-V not implemented by a lower Compliancy Level is *required* to raise an illegal - instruction trap. 
+ instruction trap (allowing soft-emulation), including if + Simple-V is not implemented at all. +* Sixthly, that any `UNDEFINED` behaviour for practical implementation + reasons is clearly documented for both programmers and hardware + implementors. In particular, given the strong recent emphasis and interest in "Scalable Vector" ISAs, it is most unfortunate that both ARM SVE and RISC-V RVV permit the exact same instruction to produce different results on different hardware depending on a "Silicon Partner" hardware choice. This choice catastrophically -and irrevocably causes binary non-interoperability despite being -a "feature". Explained in +and irrevocably causes binary non-interoperability *despite being +a "feature"*. Explained in +it is the exact same binary-incompatibility issue faced by Power ISA +on its 32- to 64-bit transition: 32-bit hardware was **unable** to +trap-and-emulate 64-bit binaries because the opcodes were (are) the same. It is therefore *guaranteed* that extensions to the register file width and quantity in Simple-V shall only be made in future by explicit means, ensuring binary compatibility. - # Optional Scalar instructions **Additional Instructions for specific purposes (not SVP64)** @@ -210,10 +225,10 @@ Some of these Scalar instructions happen also designed to make Scalable Vector binaries more efficient, such as the crweird group. Others are to bring the Scalar Power ISA up-to-date within specific workloads, -such as a Javascript Rounding instruction -(which saves 35 instructions including 5 branches). None of them are strictly -necessary but performance and power consumption may be (or, is already) -compromised +such as a JavaScript Rounding instruction +(which saves 32 scalar instructions including seven branch instructions). +None of them are strictly necessary but performance and power consumption may +be (or, is already) compromised in certain workloads and use-cases without them. 
Vector-related but still Scalar: @@ -231,6 +246,7 @@ Stand-alone Scalar Instructions: * [[sv/fclass]] detect class of FP numbers * [[sv/int_fp_mv]] Move and convert GPR <-> FPR, needed for !VSX * [[sv/av_opcodes]] scalar opcodes for Audio/Video +* [[prefix_codes]] Decode/encode prefix-codes, used by JPEG, DEFLATE, etc. * TODO: OpenPOWER adaptation [[openpower/transcendentals]] Twin targetted instructions (two registers out, one implicit, just like @@ -243,6 +259,95 @@ Load-with-Update). Explanation of the rules for twin register targets (implicit RS, FRS) explained in SVP64 [[svp64/appendix]] +# Architectural Note + +This section is primarily for the ISA Working Group and for IBM +in their capacity and responsibility for allocating "Architectural +Resources" (opcodes), but it is also useful for general understanding +of Simple-V. + +Simple-V is effectively a type of "Zero-Overhead Loop Control" to which +an entire 24 bits are exclusively dedicated in a fully RISC-abstracted +manner. Within those 24-bits there are no Scalar instructions, and +no Vector instructions: there is *only* "Loop Control". + +This is why there are no actual Vector operations in Simple-V: *all* suitable +Scalar Operations are Vectorised or not at all. This has some extremely +important implications when considering adding new instructions, and +especially when allocating the Opcode Space for them. +To protect SVP64 from damage, a "Hard Rule" has to be set: + + Scalar Instructions must be simultaneously added in the corresponding + SVP64 opcode space with the exact same 32-bit "Defined Word" or they + must not be added at all. Likewise, instructions planned for addition + in what is considered (wrongly) to be the exclusive "Vector" domain + must correspondingly be added in the Scalar space with the exact same + 32-bit "Defined Word", or they must not be added at all. + +Some explanation of the above is needed. 
Firstly, "Defined Word" is a term +used in Section 1.6.3 of the Power ISA v3.1 Book I: it means, in short, +"a 32-bit instruction", which can then be Prefixed by EXT001 to extend it +to 64-bit (named EXT100-163). +Prefixed-Prefixed (96-bit Variable-Length) encodings are +prohibited in v3.1 and they are just as prohibited in Simple-V: it's too +complex in hardware. This means that **only** 32-bit "Defined Words" +may be Vectorised, and in particular it means that no 64-bit instruction +(EXT100-163) may **ever** be Vectorised. + +Secondly, the term "Vectoriseable" was used. This refers to "instructions +which if SVP64-Prefixed are actually meaningful". `sc` is meaningless +to Vectorise, for example, as are `sync` and `mtmsr` (there is only ever +going to be one MSR). + +The problem comes if the rationale is applied, "if unused, +Unvectoriseable opcodes +can therefore be allocated to alternative instructions mixing inside +the SVP64 +Opcode space", +which unfortunately results in huge, inadvisable complexity in HDL at the +Decode Phase, attempting to discern between the two types. Worse than that, +if the alternate 64-bit instruction is Vectoriseable but the 32-bit Scalar +"Defined Word" is already allocated, how can there ever be a Scalar version +of the alternate instruction? It would have to be added as a **completely +different** 32-bit "Defined Word", and things go rapidly downhill in the +Decoder as well as the ISA from there.
+ +Therefore to avoid risk and long-term damage to the Power ISA: + +* *even Unvectoriseable* "Defined Words" (`mtmsr`) must have the + corresponding SVP64 Prefixed Space `RESERVED`, permanently requiring + Illegal Instruction to be raised (the 64-bit encoding corresponding + to an illegal `sv.mtmsr` if ever incorrectly attempted must be + **defined** to raise an Exception) +* *Even instructions that may not be Scalar* (although for various + practical reasons this is extremely rare if not impossible, + if not just generally "strongly discouraged") + which have no meaning or use as a 32-bit Scalar "Defined Word", **must** + still have the Scalar "Defined Word" `RESERVED` in the scalar + opcode space, as an Illegal Instruction. + +A good example of the former is `mtmsr` because there is only one +MSR register (`sv.mtmsr` is meaningless, as is `sv.sc`), +and a good example of the latter is [[sv/mv.x]] +which is so deeply problematic to add to any Scalar ISA that it was +rejected outright and an alternative route taken (Indexed REMAP). + +Another good example would be Cross Product which has no meaning +at all in a Scalar ISA (Cross Product as a concept only applies +to Mathematical Vectors). If any such Vector operation were ever added, +it would be **critically** important to reserve the exact same *Scalar* +opcode with the exact same "Defined Word" in the *Scalar* Power ISA +opcode space, as an Illegal Instruction. There are +good reasons why Cross Product has not been proposed, but it serves +to illustrate the point as far as Architectural Resource Allocation is +concerned. + +Bottom line is that whilst this seems wasteful the alternatives are a +destabilisation of the Power ISA and impractically-complex Hardware +Decoders. With the Scalar Power ISA (v3.0, v3.1) already being comprehensive +in the number of instructions, keeping further Decode complexity down is a +high priority. 
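Reviewer's illustration (not part of the patch): the "Hard Rule" and the two `RESERVED` bullets above can be read as a simple allocation check. The Python sketch below is an assumption-laden model rather than anything from the Simple-V specification. The names `DefinedWord`, `Slot`, `allocate` and `violates_hard_rule` are hypothetical, and instructions are identified by mnemonic strings, not by real opcode encodings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Slot(Enum):
    INSTRUCTION = "instruction"  # the encoding performs the named operation
    RESERVED = "reserved"        # the encoding must raise Illegal Instruction


@dataclass(frozen=True)
class DefinedWord:
    """One 32-bit Defined Word, modelled by mnemonic only (hypothetical)."""
    mnemonic: str
    scalar_meaningful: bool  # does it make sense as a plain 32-bit instruction?
    vectoriseable: bool      # is it meaningful when SVP64-Prefixed?


def allocate(word: DefinedWord) -> Dict[str, Slot]:
    """What the Scalar space and the SVP64-Prefixed space must hold
    for the same Defined Word under the Hard Rule."""
    return {
        "scalar": Slot.INSTRUCTION if word.scalar_meaningful else Slot.RESERVED,
        "svp64": Slot.INSTRUCTION if word.vectoriseable else Slot.RESERVED,
    }


def violates_hard_rule(scalar_op: Optional[str], svp64_op: Optional[str]) -> bool:
    """None means RESERVED. The rule is broken only when both spaces are
    populated for one Defined Word but with different operations."""
    return (scalar_op is not None
            and svp64_op is not None
            and scalar_op != svp64_op)


# Examples taken from the text: mtmsr is Unvectoriseable, Cross Product
# (never actually proposed) would have no Scalar meaning.
mtmsr = allocate(DefinedWord("mtmsr", scalar_meaningful=True, vectoriseable=False))
cross = allocate(DefinedWord("crossprod", scalar_meaningful=False, vectoriseable=True))
```

In this model `sv.mtmsr` maps to `RESERVED`, the Scalar slot of a vector-only operation maps to `RESERVED`, and reusing an Unvectoriseable word's SVP64 slot for some other instruction is flagged as a violation, which is the Decoder complexity the section argues against.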
+ # Other Scalable Vector ISAs These Scalable Vector ISAs are listed to aid in understanding and @@ -291,8 +396,8 @@ Please be advised that even though SV is entirely DRAFT status, there is considerable concern that because there is not yet any two-way day-to-day communication established with the OPF ISA WG, we have no idea if any of these are conflicting with future plans by any OPF -Members. **The External ISA WG RFC Process is yet to be ratified -and Libre-SOC may not join the OPF as an entity because it does +Members. **The External ISA WG RFC Process has now been ratified +but Libre-SOC may not join the OPF as an entity because it does not exist except in name. Even if it existed it would be a conflict of interest to join the OPF, due to our funding remit from NLnet**. We therefore proceed on the basis of making public the intention to @@ -338,21 +443,12 @@ would become a whopping 96-bit long instruction. Avoiding this situation is a high priority which in turn by necessity puts pressure on the 32-bit Major Opcode space. -SVP64 itself is already under pressure, being only 24 bits. If it is -not permitted to take up 25% of EXT001 then it would have to be proposed -in its own Major Opcode, which on first consideration would be beneficial -for SVP64 due to the availability of 2 extra bits. -However when combined with the bitmanip scalar instructions -requiring two Major opcodes this would come to a grand total of 3 precious -Major opcodes. On balance, then, sacrificing 25% of EXT001 is the "least -difficult" choice. - Note also that EXT022, the Official Architectural Sandbox area available for "Custom non-approved purposes" according to the Power ISA Spec, is under severe design pressure as it is insufficient to hold the full extent of the instruction additions required to create -a Hybrid 3D CPU-VPU-GPU. Akthough the wording of the Power ISA +a Hybrid 3D CPU-VPU-GPU. 
Although the wording of the Power ISA Specification leaves open the *possibility* of not needing to propose ISA Extensions to the ISA WG, it is clear that EXT022 is an inappropriate location for a large high-profile Extension