(no commit message)

[libreriscv.git] / openpower / sv.mdwn
diff --git a/openpower/sv.mdwn b/openpower/sv.mdwn

index a630b184dd30a6f744580140f8f22693be4ac144..732e38d3faf842bd68b7507228c6ff070ac5c9ba 100644 (file)
--- a/openpower/sv.mdwn
+++ b/openpower/sv.mdwn
@@ -13,7 +13,9 @@ Fundamental design principles:
  * Preserving the underlying scalar execution dependencies as if the for-loop had been expanded as actual scalar instructions
    (termed "preserving Program Order")
  * Augments ("tags") existing instructions, providing Vectorisation "context" rather than adding new ones.
-* Does not modify or deviate from the underly scalar OpenPOWER ISA unless it provides significant performance or other advantage to do so in the Vector space (dropping XER.SO and OE=1 for example)
+* Does not modify or deviate from the underlying scalar OpenPOWER ISA unless it provides significant performance or other advantage to do so in the Vector space (dropping XER.SO and OE=1 for example)
+* Designed for Supercomputing: avoids creating significant sequential
+dependency hazards, allowing high performance superscalar microarchitectures to be deployed.
  
  Advantages of these design principles:
  
@@ -28,26 +30,35 @@ Advantages of these design principles:
  Pages being developed and examples
  
  * [[sv/overview]] explaining the basics.
+* [[sv/implementation]] implementation planning and coordination
+* [[sv/svp64]] contains the packet-format *only*
+* [[sv/setvl]] the Cray-style "Vector Length" instruction
  * [[sv/predication]] discussion on predication concepts
+* [[sv/cr_int_predication]] instructions needed for effective predication
  * [[sv/masked_vector_chaining]]
  * [[sv/discussion]]
  * [[sv/example_dep_matrices]]
-* [[sv/prefix]]
  * [[sv/major_opcode_allocation]]
  * [[opcode_regs_deduped]]
  * [[sv/vector_swizzle]]
+* [[sv/vector_ops]]
+* [[sv/register_type_tags]]
  * [[sv/mv.swizzle]]
  * [[sv/mv.x]]
+* [[sv/branches]] - SVP64 Conditional Branch behaviour: All/Some Vector CRs
+* [[sv/cr_ops]] - SVP64 Condition Register ops: Guidelines
+ on Vectorisation of any v3.0B base operations which return
+ or modify a Condition Register bit or field.
  * [[sv/fcvt]] FP Conversion (due to OpenPOWER Scalar FP32)
+* [[sv/fclass]] detect class of FP numbers
+* [[sv/int_fp_mv]] Move and convert GPR <-> FPR, needed for !VSX
  * [[sv/mv.vec]] move to and from vec2/3/4
-* [[sv/16_bit_compressed]]
-* [[sv/toc_data_pointer]]
-* [[sv/cr_int_predication]]
-* [[sv/setvl]]
-* [[sv/svp64]]
+* [[sv/16_bit_compressed]] experimental
+* [[sv/toc_data_pointer]] experimental
  * [[sv/ldst]] Load and Store
  * [[sv/sprs]] SPRs
  * [[sv/bitmanip]]
+* [[sv/remap]] "Remapping" for Matrix Multiply and RGB "Structure Packing"
  * [[sv/propagation]] Context propagation including svp64, swizzle and remap
  * [[sv/vector_ops]] Vector ops needed to make a "complete" Vector ISA
  * [[sv/av_opcodes]] scalar opcodes for Audio/Video
@@ -58,3 +69,102 @@ Additional links:
  
  * <https://www.sigarch.org/simd-instructions-considered-harmful/>
  * [[simple_v_extension]] old (deprecated) version
+* [[openpower/sv/llvm]]
+* [[openpower/sv/effect-of-more-decode-stages-on-reg-renaming]]
+
+===
+
+Required Background Reading:
+============================
+
+These are all, deep breath, basically... required reading, *as well as and in addition* to a full and comprehensive deep technical understanding of the Power ISA, in order to understand the depth and background on SVP64 as a 3D GPU and VPU Extension.
+
+I am keenly aware that each of them is 300 to 1,000 pages (just like the Power ISA itself).
+
+This is just how it is.
+
+Given the sheer overwhelming size and scope of SVP64 we have gone to CONSIDERABLE LENGTHS to provide justification and rationalisation for adding the various sub-extensions to the Base Scalar Power ISA.
+
+* Scalar bitmanipulation is justifiable for the exact same reasons the extensions are justifiable for other ISAs.  The additional justification for their inclusion where some instructions are already (sort-of) present in VSX is that VSX is not mandatory, and the complexity of implementation of VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
+
+* Scalar FP-to-INT conversions, likewise.  ARM has a javascript conversion instruction, Power ISA does not (and it costs a ridiculous 45 instructions to implement, including 6 branches!)
+
+* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable for High-Performance Compute workloads.
+
+It also has to be pointed out that normally this work would be covered by multiple separate full-time Workgroups with multiple Members contributing their time and resources!
+
+Overall the contributions that we are developing take the Power ISA out of the specialist highly-focussed market it is presently best known for, and expands it into areas with much wider general adoption and broader uses.
+
+
+---
+
+OpenCL specifications are linked here, these are relevant when we get to a 3D GPU / High Performance Compute ISA WG RFC:
+[[openpower/transcendentals]]
+
+(Failure to add Transcendentals to a 3D GPU is directly equivalent to *willfully* designing a product that is 100% destined for commercial failure.)
+
+I mention these because they will be encountered in every single commercial GPU ISA, but they're not part of the "Base" (core design) of a Vector Processor. Transcendentals can be added as a sub-RFC.
+
+---
+
+Actual 3D GPU Architectures and ISAs:
+-------------------------------------
+
+* Broadcom Videocore
+  <https://github.com/hermanhermitage/videocoreiv>
+
+* Etnaviv
+  <https://github.com/etnaviv/etna_viv/tree/master/doc>
+
+* Nyuzi
+  <http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf>
+
+* MALI
+  <https://github.com/cwabbott0/mali-isa-docs>
+
+* AMD
+  <https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf>  
+  <https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf>
+
+* MIAOW which is *NOT* a 3D GPU, it is a processor which happens to implement a subset of the AMDGPU ISA (Southern Islands), aka a "GPGPU"
+  <https://miaowgpu.org/>
+
+
+Actual Vector Processor Architectures and ISAs:
+-----------------------------------------------
+
+* NEC SX Aurora
+  <https://www.hpc.nec/documents/guide/pdfs/Aurora_ISA_guide.pdf>
+
+* Cray ISA
+  <http://www.bitsavers.org/pdf/cray/CRAY_Y-MP/HR-04001-0C_Cray_Y-MP_Computer_Systems_Functional_Description_Jun90.pdf>
+
+* RISC-V RVV
+  <https://github.com/riscv/riscv-v-spec>
+
+* MRISC32 ISA Manual (under active development)
+  <https://github.com/mrisc32/mrisc32/tree/master/isa-manual>
+
+* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from Mitch on direct contact with him.  It is a different approach from the others, which may be termed "Cray-Style Horizontal-First" Vectorisation.  66000 is a *Vertical-First* Vector ISA.
+
+The term Horizontal or Vertical alludes to the Matrix "Row-First" or "Column-First" technique, where:
+
+* Horizontal-First processes all elements in a Vector before moving on to the next instruction
+* Vertical-First processes *ONE* element per instruction, and requires loop constructs to explicitly step to the next element.
+
+Vector-type Support by Architecture
+[[!table  data="""
+Architecture | Horizontal | Vertical
+MyISA 66000  |            | X
+Cray         | X          |
+SX Aurora    | X          |
+RVV          | X          |
+SVP64        | X          | X
+"""]]
+
+===
+
+Obligatory Dilbert:
+
+<img src="https://assets.amuniversal.com/7fada35026ca01393d3d005056a9545d" width="600" />
+