X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=openpower%2Fsv.mdwn;h=732e38d3faf842bd68b7507228c6ff070ac5c9ba;hb=35bbedd4f0a77290c27730201112c2222636ee6a;hp=ad3d107a61e5a6dfe7350ffb3e5a4e107304120f;hpb=2f1b476e21577f2de81c7866ac62ad42fbb96793;p=libreriscv.git diff --git a/openpower/sv.mdwn b/openpower/sv.mdwn index ad3d107a6..732e38d3f 100644 --- a/openpower/sv.mdwn +++ b/openpower/sv.mdwn @@ -1,20 +1,170 @@ +[[!tag standards]] + # Simple-V Vectorisation for the OpenPOWER ISA +**SV is in DRAFT STATUS**. SV has not yet been submitted to the OpenPOWER Foundation ISA WG for review. + + + Fundamental design principles: * Simplicity of introduction and implementation on the existing OpenPOWER ISA * Effectively a hardware for-loop, pausing PC, issuing multiple scalar operations +* Preserving the underlying scalar execution dependencies as if the for-loop had been expanded as actual scalar instructions + (termed "preserving Program Order") * Augments ("tags") existing instructions, providing Vectorisation "context" rather than adding new ones. +* Does not modify or deviate from the underlying scalar OpenPOWER ISA unless it provides significant performance or other advantage to do so in the Vector space (dropping XER.SO and OE=1 for example) +* Designed for Supercomputing: avoids creating significant sequential +dependency hazards, allowing high performance superscalar microarchitectures to be deployed. Advantages of these design principles: * It is therefore easy to create a first (and sometimes only) implementation as literally a for-loop in hardware, simulators, and compilers. * More complex HDL can be done by repeating existing scalar ALUs and pipelines as blocks. -* As (mostly) a high-level "context" that does not (significantly) deviate from scalar OpenPOWER ISA it is minimally-disruptive and consequently stands a reasonable chance of broad community adoption and acceptance +* As (mostly) a high-level "context" that does not (significantly) deviate from scalar OpenPOWER ISA and, in its purest form being "a for loop around scalar instructions", it is minimally-disruptive and consequently stands a reasonable chance of broad community adoption and acceptance +* Completely wipes not just SIMD opcode proliferation off the + map (SIMD is O(N^6) opcode proliferation) + but off of Vectorisation ISAs as well. No more separate Vector + instructions. Pages being developed and examples -* [[openpower/sv/predication]] -* [[simple_v_extension/masked_vector_chaining]] +* [[sv/overview]] explaining the basics. +* [[sv/implementation]] implementation planning and coordination +* [[sv/svp64]] contains the packet-format *only* +* [[sv/setvl]] the Cray-style "Vector Length" instruction +* [[sv/predication]] discussion on predication concepts +* [[sv/cr_int_predication]] instructions needed for effective predication +* [[sv/masked_vector_chaining]] * [[sv/discussion]] * [[sv/example_dep_matrices]] +* [[sv/major_opcode_allocation]] +* [[opcode_regs_deduped]] +* [[sv/vector_swizzle]] +* [[sv/vector_ops]] +* [[sv/register_type_tags]] +* [[sv/mv.swizzle]] +* [[sv/mv.x]] +* [[sv/branches]] - SVP64 Conditional Branch behaviour: All/Some Vector CRs +* [[sv/cr_ops]] - SVP64 Condition Register ops: Guidelines + on Vectorisation of any v3.0B base operations which return + or modify a Condition Register bit or field. +* [[sv/fcvt]] FP Conversion (due to OpenPOWER Scalar FP32) +* [[sv/fclass]] detect class of FP numbers +* [[sv/int_fp_mv]] Move and convert GPR <-> FPR, needed for !VSX +* [[sv/mv.vec]] move to and from vec2/3/4 +* [[sv/16_bit_compressed]] experimental +* [[sv/toc_data_pointer]] experimental +* [[sv/ldst]] Load and Store +* [[sv/sprs]] SPRs +* [[sv/bitmanip]] +* [[sv/remap]] "Remapping" for Matrix Multiply and RGB "Structure Packing" +* [[sv/propagation]] Context propagation including svp64, swizzle and remap +* [[sv/vector_ops]] Vector ops needed to make a "complete" Vector ISA +* [[sv/av_opcodes]] scalar opcodes for Audio/Video +* [[sv/byteswap]] +* TODO: OpenPOWER [[openpower/transcendentals]] + +Additional links: + +* +* [[simple_v_extension]] old (deprecated) version +* [[openpower/sv/llvm]] +* [[openpower/sv/effect-of-more-decode-stages-on-reg-renaming]] + +=== + +Required Background Reading: +============================ + +These are all, deep breath, basically... required reading, *as well as and in addition* to a full and comprehensive deep technical understanding of the Power ISA, in order to understand the depth and background on SVP64 as a 3D GPU and VPU Extension. + +I am keenly aware that each of them is 300 to 1,000 pages (just like the Power ISA itself). + +This is just how it is. + +Given the sheer overwhelming size and scope of SVP64 we have gone to CONSIDERABLE LENGTHS to provide justification and rationalisation for adding the various sub-extensions to the Base Scalar Power ISA. + +* Scalar bitmanipulation is justifiable for the exact same reasons the extensions are justifiable for other ISAs. The additional justification for their inclusion where some instructions are already (sort-of) present in VSX is that VSX is not mandatory, and the complexity of implementation of VSX is too high a price to pay at the Embedded SFFS Compliancy Level. + +* Scalar FP-to-INT conversions, likewise. ARM has a javascript conversion instruction, Power ISA does not (and it costs a ridiculous 45 instructions to implement, including 6 branches!) + +* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable for High-Performance Compute workloads. + +It also has to be pointed out that normally this work would be covered by multiple separate full-time Workgroups with multiple Members contributing their time and resources! + +Overall the contributions that we are developing take the Power ISA out of the specialist highly-focussed market it is presently best known for, and expands it into areas with much wider general adoption and broader uses. + + +--- + +OpenCL specifications are linked here, these are relevant when we get to a 3D GPU / High Performance Compute ISA WG RFC: +[[openpower/transcendentals]] + +(Failure to add Transcendentals to a 3D GPU is directly equivalent to *willfully* designing a product that is 100% destined for commercial failure.) + +I mention these because they will be encountered in every single commercial GPU ISA, but they're not part of the "Base" (core design) of a Vector Processor. Transcendentals can be added as a sub-RFC. + +--- + +Actual 3D GPU Architectures and ISAs: +------------------------------------- + +* Broadcom Videocore + + +* Etnaviv + + +* Nyuzi + + +* MALI + + +* AMD + + + +* MIAOW which is *NOT* a 3D GPU, it is a processor which happens to implement a subset of the AMDGPU ISA (Southern Islands), aka a "GPGPU" + + + +Actual Vector Processor Architectures and ISAs: +----------------------------------------------- + +* NEC SX Aurora + + +* Cray ISA + + +* RISC-V RVV + + +* MRISC32 ISA Manual (under active development) + + +* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from Mitch on direct contact with him. It is a different approach from the others, which may be termed "Cray-Style Horizontal-First" Vectorisation. 66000 is a *Vertical-First* Vector ISA. + +The term Horizontal or Vertical alludes to the Matrix "Row-First" or "Column-First" technique, where: + +* Horizontal-First processes all elements in a Vector before moving on to the next instruction +* Vertical-First processes *ONE* element per instruction, and requires loop constructs to explicitly step to the next element. + +Vector-type Support by Architecture +[[!table data=""" +Architecture | Horizontal | Vertical +MyISA 66000 | | X +Cray | X | +SX Aurora | X | +RVV | X | +SVP64 | X | X +"""]] + +=== + +Obligatory Dilbert: + + +