Fundamental design principles:
* Simplicity of introduction and implementation on the existing OpenPOWER ISA
-* Effectively a hardware for-loop, pausing PC, issuing multiple scalar operations
-* Preserving the underlying scalar execution dependencies as if the for-loop had been expanded as actual scalar instructions
+* Effectively a hardware for-loop, pausing PC, issuing multiple scalar
+ operations
+* Preserving the underlying scalar execution dependencies as if the
+ for-loop had been expanded as actual scalar instructions
(termed "preserving Program Order")
-* Augments ("tags") existing instructions, providing Vectorisation "context" rather than adding new ones.
-* Does not modify or deviate from the underlying scalar OpenPOWER ISA unless it provides significant performance or other advantage to do so in the Vector space (dropping XER.SO and OE=1 for example)
+* Augments ("tags") existing instructions, providing Vectorisation
+ "context" rather than adding new ones.
+* Does not modify or deviate from the underlying scalar OpenPOWER ISA
+ unless it provides significant performance or other advantage to do so
+ in the Vector space (dropping XER.SO and OE=1 for example)
* Designed for Supercomputing: avoids creating significant sequential
-dependency hazards, allowing high performance superscalar microarchitectures to be deployed.
+ dependency hazards, allowing high performance superscalar
+ microarchitectures to be deployed.
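
The "hardware for-loop" principle above can be sketched in a few lines. This is only an illustrative model, not the SVP64 specification: the names `sv_loop`, `add`, `regs`, `rd`, `ra`, `rb` and `vl` are hypothetical, and the register file is modelled as a plain Python list.

```python
def add(a, b):
    """Stand-in for a scalar operation (hypothetical, for illustration)."""
    return a + b


def sv_loop(op, regs, rd, ra, rb, vl):
    """Issue the scalar `op` once per element, strictly in element order.

    The program counter is conceptually paused while the loop runs, and
    each element-level operation sees the results of the earlier ones,
    exactly as if the loop had been unrolled into scalar instructions.
    """
    for i in range(vl):
        regs[rd + i] = op(regs[ra + i], regs[rb + i])
    return regs


# Overlapping source and destination registers: element i+1 reads the
# result written by element i, so Program Order is observable.
regs = [1] * 8
sv_loop(add, regs, rd=1, ra=0, rb=1, vl=3)
print(regs[:4])
```

Because the destination overlaps the first source by one register, the loop produces a running sum, which is the behaviour a sequence of three scalar adds would give.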
Advantages of these design principles:
-* It is therefore easy to create a first (and sometimes only) implementation as literally a for-loop in hardware, simulators, and compilers.
+* It is therefore easy to create a first (and sometimes only)
+ implementation as literally a for-loop in hardware, simulators, and
+ compilers.
* Hardware Architects may understand and implement SV as being an
extra pipeline stage, inserted between decode and issue, that is
a simple for-loop issuing element-level sub-instructions.
* More complex HDL can be done by repeating existing scalar ALUs and
pipelines as blocks and leveraging existing Multi-Issue Infrastructure
-* As (mostly) a high-level "context" that does not (significantly) deviate from scalar OpenPOWER ISA and, in its purest form being "a for loop around scalar instructions", it is minimally-disruptive and consequently stands a reasonable chance of broad community adoption and acceptance
+* As (mostly) a high-level "context" that does not (significantly) deviate
+ from scalar OpenPOWER ISA and, in its purest form being "a for loop around
+ scalar instructions", it is minimally-disruptive and consequently stands
+ a reasonable chance of broad community adoption and acceptance
* Completely wipes not just SIMD opcode proliferation off the
map (SIMD is O(N^6) opcode proliferation)
but off of Vectorisation ISAs as well. No more separate Vector
* [[sv/mv.swizzle]]
* [[sv/mv.x]]
* SVP64 "Modes":
- - For condition register operations see [[sv/cr_ops]] - SVP64 Condition Register ops: Guidelines
- on Vectorisation of any v3.0B base operations which return
- or modify a Condition Register bit or field.
+ - For condition register operations see [[sv/cr_ops]] - SVP64 Condition
+ Register ops: Guidelines
+ on Vectorisation of any v3.0B base operations which return
+ or modify a Condition Register bit or field.
- For LD/ST Modes, see [[sv/ldst]].
- - For Branch modes, see [[sv/branches]] - SVP64 Conditional Branch behaviour: All/Some Vector CRs
+ - For Branch modes, see [[sv/branches]] - SVP64 Conditional Branch
+ behaviour: All/Some Vector CRs
- For arithmetic and logical, see [[sv/normal]]
* [[sv/fcvt]] FP Conversion (due to OpenPOWER Scalar FP32)
* [[sv/fclass]] detect class of FP numbers
Required Background Reading:
============================
-These are all, deep breath, basically... required reading, *as well as and in addition* to a full and comprehensive deep technical understanding of the Power ISA, in order to understand the depth and background on SVP64 as a 3D GPU and VPU Extension.
+These are all, deep breath, basically... required reading, *as well as
+and in addition* to a full and comprehensive deep technical understanding
+of the Power ISA, in order to understand the depth and background on
+SVP64 as a 3D GPU and VPU Extension.
-I am keenly aware that each of them is 300 to 1,000 pages (just like the Power ISA itself).
+I am keenly aware that each of them is 300 to 1,000 pages (just like
+the Power ISA itself).
This is just how it is.
-Given the sheer overwhelming size and scope of SVP64 we have gone to CONSIDERABLE LENGTHS to provide justification and rationalisation for adding the various sub-extensions to the Base Scalar Power ISA.
+Given the sheer overwhelming size and scope of SVP64 we have gone to
+**considerable lengths** to provide justification and rationalisation for
+adding the various sub-extensions to the Base Scalar Power ISA.
-* Scalar bitmanipulation is justifiable for the exact same reasons the extensions are justifiable for other ISAs. The additional justification for their inclusion where some instructions are already (sort-of) present in VSX is that VSX is not mandatory, and the complexity of implementation of VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
+* Scalar bitmanipulation is justifiable for the exact same reasons the
+ extensions are justifiable for other ISAs. The additional justification
+ for their inclusion where some instructions are already (sort-of) present
+ in VSX is that VSX is not mandatory, and the complexity of implementation
+ of VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
+* Scalar FP-to-INT conversions, likewise. ARM has a JavaScript conversion
+ instruction; Power ISA does not (and it costs a ridiculous 45 instructions
+ to implement, including 6 branches!)
+* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable
+ for High-Performance Compute workloads.
-* Scalar FP-to-INT conversions, likewise. ARM has a javascript conversion instruction, Power ISA does not (and it costs a ridiculous 45 instructions to implement, including 6 branches!)
+It also has to be pointed out that normally this work would be covered by
+multiple separate full-time Workgroups with multiple Members contributing
+their time and resources!
-* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable for High-Performance Compute workloads.
-
-It also has to be pointed out that normally this work would be covered by multiple separate full-time Workgroups with multiple Members contributing their time and resources!
-
-Overall the contributions that we are developing take the Power ISA out of the specialist highly-focussed market it is presently best known for, and expands it into areas with much wider general adoption and broader uses.
+Overall the contributions that we are developing take the Power ISA out of
+the specialist highly-focussed market it is presently best known for, and
+expand it into areas with much wider general adoption and broader uses.
---
-OpenCL specifications are linked here, these are relevant when we get to a 3D GPU / High Performance Compute ISA WG RFC:
+OpenCL specifications are linked here; these are relevant when we get
+to a 3D GPU / High Performance Compute ISA WG RFC:
[[openpower/transcendentals]]
-(Failure to add Transcendentals to a 3D GPU is directly equivalent to *willfully* designing a product that is 100% destined for commercial failure.)
+(Failure to add Transcendentals to a 3D GPU is directly equivalent to
+*willfully* designing a product that is 100% destined for commercial
+failure.)
-I mention these because they will be encountered in every single commercial GPU ISA, but they're not part of the "Base" (core design) of a Vector Processor. Transcendentals can be added as a sub-RFC.
+I mention these because they will be encountered in every single
+commercial GPU ISA, but they're not part of the "Base" (core design)
+of a Vector Processor. Transcendentals can be added as a sub-RFC.
---
* Cray ISA
<http://www.bitsavers.org/pdf/cray/CRAY_Y-MP/HR-04001-0C_Cray_Y-MP_Computer_Systems_Functional_Description_Jun90.pdf>
-
* RISC-V RVV
<https://github.com/riscv/riscv-v-spec>
-
* MRISC32 ISA Manual (under active development)
<https://github.com/mrisc32/mrisc32/tree/master/isa-manual>
-
-* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from Mitch on direct contact with him. It is a different approach from the others, which may be termed "Cray-Style Horizontal-First" Vectorisation. 66000 is a *Vertical-First* Vector ISA.
-
-The term Horizontal or Vertical alludes to the Matrix "Row-First" or "Column-First" technique, where:
-
-* Horizontal-First processes all elements in a Vector before moving on to the next instruction
-* Vertical-First processes *ONE* element per instruction, and requires loop constructs to explicitly step to the next element.
+* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from
+ Mitch on direct contact with him. It is a different approach from the
+ others, which may be termed "Cray-Style Horizontal-First" Vectorisation.
+ 66000 is a *Vertical-First* Vector ISA.
+
+The term Horizontal or Vertical alludes to the Matrix "Row-First" or
+"Column-First" technique, where:
+
+* Horizontal-First processes all elements in a Vector before moving on
+ to the next instruction
+* Vertical-First processes *ONE* element per instruction, and requires
+ loop constructs to explicitly step to the next element.
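
The difference between the two orderings can be sketched as a pair of nested loops with the nesting swapped. This is a minimal model under stated assumptions: `horizontal_first`, `vertical_first`, `add` and `dbl` are hypothetical names, each "instruction" is a Python function acting on one element, and the returned trace simply records the issue order.

```python
def add(v, i):
    """Element-level op: c[i] = a[i] + b[i] (hypothetical example)."""
    v["c"][i] = v["a"][i] + v["b"][i]


def dbl(v, i):
    """Element-level op: d[i] = c[i] * 2 (hypothetical example)."""
    v["d"][i] = v["c"][i] * 2


def horizontal_first(program, v, vl):
    """Each instruction processes all elements before the next one starts."""
    trace = []
    for op in program:
        for i in range(vl):
            op(v, i)
            trace.append((op.__name__, i))
    return trace


def vertical_first(program, v, vl):
    """Each instruction processes ONE element; an explicit outer loop
    steps to the next element."""
    trace = []
    for i in range(vl):
        for op in program:
            op(v, i)
            trace.append((op.__name__, i))
    return trace


def fresh():
    """Build a small set of example vectors."""
    return {"a": [1, 2], "b": [10, 20], "c": [0, 0], "d": [0, 0]}


h, v = fresh(), fresh()
h_trace = horizontal_first([add, dbl], h, vl=2)
v_trace = vertical_first([add, dbl], v, vl=2)
print(h_trace)
print(v_trace)
```

For this program, which has no dependencies between different elements, both orderings compute the same `d`; only the order in which element-level operations are issued differs.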
Vector-type Support by Architecture
[[!table data="""