X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=openpower%2Fsv.mdwn;h=732e38d3faf842bd68b7507228c6ff070ac5c9ba;hb=35bbedd4f0a77290c27730201112c2222636ee6a;hp=ad3d107a61e5a6dfe7350ffb3e5a4e107304120f;hpb=2f1b476e21577f2de81c7866ac62ad42fbb96793;p=libreriscv.git

diff --git a/openpower/sv.mdwn b/openpower/sv.mdwn
index ad3d107a6..732e38d3f 100644
--- a/openpower/sv.mdwn
+++ b/openpower/sv.mdwn
@@ -1,20 +1,170 @@
+[[!tag standards]]
+
 # Simple-V Vectorisation for the OpenPOWER ISA
 
+**SV is in DRAFT STATUS**. SV has not yet been submitted to the OpenPOWER Foundation ISA WG for review.
+
+<https://bugs.libre-soc.org/show_bug.cgi?id=213>
+
 Fundamental design principles:
 
 * Simplicity of introduction and implementation on the existing OpenPOWER ISA
 * Effectively a hardware for-loop, pausing PC, issuing multiple scalar operations
+* Preserving the underlying scalar execution dependencies as if the for-loop had been expanded as actual scalar instructions
+  (termed "preserving Program Order")
 * Augments ("tags") existing instructions, providing Vectorisation "context" rather than adding new ones.
+* Does not modify or deviate from the underlying scalar OpenPOWER ISA unless it provides significant performance or other advantage to do so in the Vector space (dropping XER.SO and OE=1 for example)
+* Designed for Supercomputing: avoids creating significant sequential
+dependency hazards, allowing high performance superscalar microarchitectures to be deployed.
 
 Advantages of these design principles:
 
 * It is therefore easy to create a first (and sometimes only) implementation as literally a for-loop in hardware, simulators, and compilers.
 * More complex HDL can be done by repeating existing scalar ALUs and pipelines as blocks.
-* As (mostly) a high-level "context" that does not (significantly) deviate from scalar OpenPOWER ISA it is minimally-disruptive and consequently stands a reasonable chance of broad community adoption and acceptance
+* As (mostly) a high-level "context" that does not (significantly) deviate from scalar OpenPOWER ISA and, in its purest form being "a for loop around scalar instructions", it is minimally-disruptive and consequently stands a reasonable chance of broad community adoption and acceptance
+* Completely wipes not just SIMD opcode proliferation off the
+  map (SIMD is O(N^6) opcode proliferation)
+  but off of Vectorisation ISAs as well.  No more separate Vector
+  instructions.
 
 Pages being developed and examples
 
-* [[openpower/sv/predication]]
-* [[simple_v_extension/masked_vector_chaining]]
+* [[sv/overview]] explaining the basics.
+* [[sv/implementation]] implementation planning and coordination
+* [[sv/svp64]] contains the packet-format *only*
+* [[sv/setvl]] the Cray-style "Vector Length" instruction
+* [[sv/predication]] discussion on predication concepts
+* [[sv/cr_int_predication]] instructions needed for effective predication
+* [[sv/masked_vector_chaining]]
 * [[sv/discussion]]
 * [[sv/example_dep_matrices]]
+* [[sv/major_opcode_allocation]]
+* [[opcode_regs_deduped]]
+* [[sv/vector_swizzle]]
+* [[sv/vector_ops]]
+* [[sv/register_type_tags]]
+* [[sv/mv.swizzle]]
+* [[sv/mv.x]]
+* [[sv/branches]] - SVP64 Conditional Branch behaviour: All/Some Vector CRs
+* [[sv/cr_ops]] - SVP64 Condition Register ops: Guidelines
+ on Vectorisation of any v3.0B base operations which return
+ or modify a Condition Register bit or field.
+* [[sv/fcvt]] FP Conversion (due to OpenPOWER Scalar FP32)
+* [[sv/fclass]] detect class of FP numbers
+* [[sv/int_fp_mv]] Move and convert GPR <-> FPR, needed for !VSX
+* [[sv/mv.vec]] move to and from vec2/3/4
+* [[sv/16_bit_compressed]] experimental
+* [[sv/toc_data_pointer]] experimental
+* [[sv/ldst]] Load and Store
+* [[sv/sprs]] SPRs
+* [[sv/bitmanip]]
+* [[sv/remap]] "Remapping" for Matrix Multiply and RGB "Structure Packing"
+* [[sv/propagation]] Context propagation including svp64, swizzle and remap
+* [[sv/vector_ops]] Vector ops needed to make a "complete" Vector ISA
+* [[sv/av_opcodes]] scalar opcodes for Audio/Video
+* [[sv/byteswap]]
+* TODO: OpenPOWER [[openpower/transcendentals]]
+
+Additional links:
+
+* <https://www.sigarch.org/simd-instructions-considered-harmful/>
+* [[simple_v_extension]] old (deprecated) version
+* [[openpower/sv/llvm]]
+* [[openpower/sv/effect-of-more-decode-stages-on-reg-renaming]]
+
+===
+
+Required Background Reading:
+============================
+
+These are all, deep breath, basically... required reading, *as well as and in addition* to a full and comprehensive deep technical understanding of the Power ISA, in order to understand the depth and background on SVP64 as a 3D GPU and VPU Extension.
+
+I am keenly aware that each of them is 300 to 1,000 pages (just like the Power ISA itself).
+
+This is just how it is.
+
+Given the sheer overwhelming size and scope of SVP64 we have gone to CONSIDERABLE LENGTHS to provide justification and rationalisation for adding the various sub-extensions to the Base Scalar Power ISA.
+
+* Scalar bitmanipulation is justifiable for the exact same reasons the extensions are justifiable for other ISAs.  The additional justification for their inclusion where some instructions are already (sort-of) present in VSX is that VSX is not mandatory, and the complexity of implementation of VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
+
+* Scalar FP-to-INT conversions, likewise.  ARM has a javascript conversion instruction, Power ISA does not (and it costs a ridiculous 45 instructions to implement, including 6 branches!)
+
+* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable for High-Performance Compute workloads.
+
+It also has to be pointed out that normally this work would be covered by multiple separate full-time Workgroups with multiple Members contributing their time and resources!
+
+Overall the contributions that we are developing take the Power ISA out of the specialist highly-focussed market it is presently best known for, and expands it into areas with much wider general adoption and broader uses.
+
+
+---
+
+OpenCL specifications are linked here, these are relevant when we get to a 3D GPU / High Performance Compute ISA WG RFC:
+[[openpower/transcendentals]]
+
+(Failure to add Transcendentals to a 3D GPU is directly equivalent to *willfully* designing a product that is 100% destined for commercial failure.)
+
+I mention these because they will be encountered in every single commercial GPU ISA, but they're not part of the "Base" (core design) of a Vector Processor. Transcendentals can be added as a sub-RFC.
+
+---
+
+Actual 3D GPU Architectures and ISAs:
+-------------------------------------
+
+* Broadcom Videocore
+  <https://github.com/hermanhermitage/videocoreiv>
+
+* Etnaviv
+  <https://github.com/etnaviv/etna_viv/tree/master/doc>
+
+* Nyuzi
+  <http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf>
+
+* MALI
+  <https://github.com/cwabbott0/mali-isa-docs>
+
+* AMD
+  <https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf>  
+  <https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf>
+
+* MIAOW which is *NOT* a 3D GPU, it is a processor which happens to implement a subset of the AMDGPU ISA (Southern Islands), aka a "GPGPU"
+  <https://miaowgpu.org/>
+
+
+Actual Vector Processor Architectures and ISAs:
+-----------------------------------------------
+
+* NEC SX Aurora
+  <https://www.hpc.nec/documents/guide/pdfs/Aurora_ISA_guide.pdf>
+
+* Cray ISA
+  <http://www.bitsavers.org/pdf/cray/CRAY_Y-MP/HR-04001-0C_Cray_Y-MP_Computer_Systems_Functional_Description_Jun90.pdf>
+
+* RISC-V RVV
+  <https://github.com/riscv/riscv-v-spec>
+
+* MRISC32 ISA Manual (under active development)
+  <https://github.com/mrisc32/mrisc32/tree/master/isa-manual>
+
+* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from Mitch on direct contact with him.  It is a different approach from the others, which may be termed "Cray-Style Horizontal-First" Vectorisation.  66000 is a *Vertical-First* Vector ISA.
+
+The term Horizontal or Vertical alludes to the Matrix "Row-First" or "Column-First" technique, where:
+
+* Horizontal-First processes all elements in a Vector before moving on to the next instruction
+* Vertical-First processes *ONE* element per instruction, and requires loop constructs to explicitly step to the next element.
+
+Vector-type Support by Architecture
+[[!table  data="""
+Architecture | Horizontal | Vertical
+MyISA 66000  |            | X
+Cray         | X          |
+SX Aurora    | X          |
+RVV          | X          |
+SVP64        | X          | X
+"""]]
+
+===
+
+Obligatory Dilbert:
+
+<img src="https://assets.amuniversal.com/7fada35026ca01393d3d005056a9545d" width="600" />
+