From d22a0e7ed01de001077607bfb9aa5e719c2c0539 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 31 May 2022 10:56:55 +0100
Subject: [PATCH] whitespace cleanup

---
 openpower/sv.mdwn | 102 +++++++++++++++++++++++++++++++---------------
 1 file changed, 69 insertions(+), 33 deletions(-)

diff --git a/openpower/sv.mdwn b/openpower/sv.mdwn
index 399073e5a..e96ac72a2 100644
--- a/openpower/sv.mdwn
+++ b/openpower/sv.mdwn
@@ -16,23 +16,34 @@ explicit Vector opcode exists in SV, at all**.
 
 Fundamental design principles:
 
 * Simplicity of introduction and implementation on the existing OpenPOWER ISA
-* Effectively a hardware for-loop, pausing PC, issuing multiple scalar operations
-* Preserving the underlying scalar execution dependencies as if the for-loop had been expanded as actual scalar instructions
+* Effectively a hardware for-loop, pausing PC, issuing multiple scalar
+  operations
+* Preserving the underlying scalar execution dependencies as if the
+  for-loop had been expanded as actual scalar instructions
   (termed "preserving Program Order")
-* Augments ("tags") existing instructions, providing Vectorisation "context" rather than adding new ones.
-* Does not modify or deviate from the underlying scalar OpenPOWER ISA unless it provides significant performance or other advantage to do so in the Vector space (dropping XER.SO and OE=1 for example)
+* Augments ("tags") existing instructions, providing Vectorisation
+  "context" rather than adding new ones.
+* Does not modify or deviate from the underlying scalar OpenPOWER ISA
+  unless it provides significant performance or other advantage to do so
+  in the Vector space (dropping XER.SO and OE=1 for example)
 * Designed for Supercomputing: avoids creating significant sequential
-dependency hazards, allowing high performance superscalar microarchitectures to be deployed.
+  dependency hazards, allowing high performance superscalar
+  microarchitectures to be deployed.
 
 Advantages of these design principles:
 
-* It is therefore easy to create a first (and sometimes only) implementation as literally a for-loop in hardware, simulators, and compilers.
+* It is therefore easy to create a first (and sometimes only)
+  implementation as literally a for-loop in hardware, simulators, and
+  compilers.
 * Hardware Architects may understand and implement SV as being an
   extra pipeline stage, inserted between decode and issue, that is a
   simple for-loop issuing element-level sub-instructions.
 * More complex HDL can be done by repeating existing scalar ALUs and
   pipelines as blocks and leveraging existing Multi-Issue Infrastructure
-* As (mostly) a high-level "context" that does not (significantly) deviate from scalar OpenPOWER ISA and, in its purest form being "a for loop around scalar instructions", it is minimally-disruptive and consequently stands a reasonable chance of broad community adoption and acceptance
+* As (mostly) a high-level "context" that does not (significantly) deviate
+  from scalar OpenPOWER ISA and, in its purest form being "a for loop around
+  scalar instructions", it is minimally-disruptive and consequently stands
+  a reasonable chance of broad community adoption and acceptance
 * Completely wipes not just SIMD opcode proliferation off the map (SIMD
   is O(N^6) opcode proliferation) but off of Vectorisation ISAs as well.
   No more separate Vector
@@ -98,11 +109,13 @@ Pages being developed and examples
 * [[sv/mv.swizzle]]
 * [[sv/mv.x]]
 * SVP64 "Modes":
-  - For condition register operations see [[sv/cr_ops]] - SVP64 Condition Register ops: Guidelines
-    on Vectorisation of any v3.0B base operations which return
-    or modify a Condition Register bit or field.
+  - For condition register operations see [[sv/cr_ops]] - SVP64 Condition
+    Register ops: Guidelines
+    on Vectorisation of any v3.0B base operations which return
+    or modify a Condition Register bit or field.
   - For LD/ST Modes, see [[sv/ldst]].
-  - For Branch modes, see [[sv/branches]] - SVP64 Conditional Branch behaviour: All/Some Vector CRs
+  - For Branch modes, see [[sv/branches]] - SVP64 Conditional Branch
+    behaviour: All/Some Vector CRs
   - For arithmetic and logical, see [[sv/normal]]
 * [[sv/fcvt]] FP Conversion (due to OpenPOWER Scalar FP32)
 * [[sv/fclass]] detect class of FP numbers
@@ -147,33 +160,53 @@ Additional links:
 
 Required Background Reading:
 ============================
 
-These are all, deep breath, basically... required reading, *as well as and in addition* to a full and comprehensive deep technical understanding of the Power ISA, in order to understand the depth and background on SVP64 as a 3D GPU and VPU Extension.
+These are all, deep breath, basically... required reading, *as well as
+and in addition* to a full and comprehensive deep technical understanding
+of the Power ISA, in order to understand the depth and background on
+SVP64 as a 3D GPU and VPU Extension.
 
-I am keenly aware that each of them is 300 to 1,000 pages (just like the Power ISA itself).
+I am keenly aware that each of them is 300 to 1,000 pages (just like
+the Power ISA itself).
 
 This is just how it is.
 
-Given the sheer overwhelming size and scope of SVP64 we have gone to CONSIDERABLE LENGTHS to provide justification and rationalisation for adding the various sub-extensions to the Base Scalar Power ISA.
+Given the sheer overwhelming size and scope of SVP64 we have gone to
+**considerable lengths** to provide justification and rationalisation for
+adding the various sub-extensions to the Base Scalar Power ISA.
 
-* Scalar bitmanipulation is justifiable for the exact same reasons the extensions are justifiable for other ISAs. The additional justification for their inclusion where some instructions are already (sort-of) present in VSX is that VSX is not mandatory, and the complexity of implementation of VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
+* Scalar bitmanipulation is justifiable for the exact same reasons the
+  extensions are justifiable for other ISAs. The additional justification
+  for their inclusion where some instructions are already (sort-of) present
+  in VSX is that VSX is not mandatory, and the complexity of implementation
+  of VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
+* Scalar FP-to-INT conversions, likewise. ARM has a javascript conversion
+  instruction, Power ISA does not (and it costs a ridiculous 45 instructions
+  to implement, including 6 branches!)
+* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable
+  for High-Performance Compute workloads.
 
-* Scalar FP-to-INT conversions, likewise. ARM has a javascript conversion instruction, Power ISA does not (and it costs a ridiculous 45 instructions to implement, including 6 branches!)
+It also has to be pointed out that normally this work would be covered by
+multiple separate full-time Workgroups with multiple Members contributing
+their time and resources!
 
-* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable for High-Performance Compute workloads.
-
-It also has to be pointed out that normally this work would be covered by multiple separate full-time Workgroups with multiple Members contributing their time and resources!
-Overall the contributions that we are developing take the Power ISA out of the specialist highly-focussed market it is presently best known for, and expands it into areas with much wider general adoption and broader uses.
+Overall the contributions that we are developing take the Power ISA out of
+the specialist highly-focussed market it is presently best known for, and
+expands it into areas with much wider general adoption and broader uses.
 
 ---
 
-OpenCL specifications are linked here, these are relevant when we get to a 3D GPU / High Performance Compute ISA WG RFC:
+OpenCL specifications are linked here, these are relevant when we get
+to a 3D GPU / High Performance Compute ISA WG RFC:
 
 [[openpower/transcendentals]]
 
-(Failure to add Transcendentals to a 3D GPU is directly equivalent to *willfully* designing a product that is 100% destined for commercial failure.)
+(Failure to add Transcendentals to a 3D GPU is directly equivalent to
+*willfully* designing a product that is 100% destined for commercial
+failure.)
 
-I mention these because they will be encountered in every single commercial GPU ISA, but they're not part of the "Base" (core design) of a Vector Processor. Transcendentals can be added as a sub-RFC.
+I mention these because they will be encountered in every single
+commercial GPU ISA, but they're not part of the "Base" (core design)
+of a Vector Processor. Transcendentals can be added as a sub-RFC.
 
 ---
 
@@ -208,19 +241,22 @@ Actual Vector Processor Architectures and ISAs:
 
 * Cray ISA
-
 * RISC-V RVV
-
 * MRISC32 ISA Manual (under active development)
-
-* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from Mitch on direct contact with him. It is a different approach from the others, which may be termed "Cray-Style Horizontal-First" Vectorisation. 66000 is a *Vertical-First* Vector ISA.
-
-The term Horizontal or Vertical alludes to the Matrix "Row-First" or "Column-First" technique, where:
-
-* Horizontal-First processes all elements in a Vector before moving on to the next instruction
-* Vertical-First processes *ONE* element per instruction, and requires loop constructs to explicitly step to the next element.
+* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from
+  Mitch on direct contact with him. It is a different approach from the
+  others, which may be termed "Cray-Style Horizontal-First" Vectorisation.
+  66000 is a *Vertical-First* Vector ISA.
+
+The term Horizontal or Vertical alludes to the Matrix "Row-First" or
+"Column-First" technique, where:
+
+* Horizontal-First processes all elements in a Vector before moving on
+  to the next instruction
+* Vertical-First processes *ONE* element per instruction, and requires
+  loop constructs to explicitly step to the next element.
 
 Vector-type Support by Architecture
 
 [[!table data="""
-- 
2.30.2
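The first hunk above describes SV as a hardware for-loop issuing element-level scalar operations, and the final hunk contrasts Horizontal-First with Vertical-First Vectorisation. The Python sketch below is a minimal illustrative model of those two sequencing styles only; the names used (`VL`, `regs`, `element_add`) are invented for this example and are not part of the patch, of SVP64 state, or of the Power ISA.

```python
# Illustrative sketch only: VL, regs and element_add are invented names,
# not SVP64 state or Power ISA registers.

VL = 4                  # Vector Length: number of elements per vectorised op
regs = list(range(32))  # toy scalar register file

def element_add(rt, ra, rb, i):
    # one element-level scalar operation: regs[rt+i] = regs[ra+i] + regs[rb+i]
    regs[rt + i] = regs[ra + i] + regs[rb + i]

def horizontal_first_add(rt, ra, rb):
    # Horizontal-First (Cray-style): the "hardware for-loop" issues all VL
    # element sub-operations before the PC moves on to the next instruction.
    for i in range(VL):
        element_add(rt, ra, rb, i)

def vertical_first_add(rt, ra, rb, i):
    # Vertical-First: each instruction processes exactly ONE element; an
    # explicit loop construct must step the element index.
    element_add(rt, ra, rb, i)

# Horizontal-First: a single vectorised add covers the whole vector.
horizontal_first_add(8, 16, 24)

# Vertical-First: the surrounding loop makes the element stepping explicit.
for i in range(VL):
    vertical_first_add(8, 16, 24, i)
```

In both models the per-element work is an ordinary scalar add; only the sequencing differs, which is why the patched text can describe a minimal SV implementation as literally a for-loop inserted between decode and issue.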