sync_up: Formatting fixes

diff --git a/simple_v_extension/abridged_spec.mdwn b/simple_v_extension/abridged_spec.mdwn

index f2dff669e549aa8844302539532336257e09a5f8..1c1a180f9463b0a60d86eed8267160fce539dc16 100644 (file)
--- a/simple_v_extension/abridged_spec.mdwn
+++ b/simple_v_extension/abridged_spec.mdwn
@@ -1,3 +1,4 @@
+
  # Simple-V (Parallelism Extension Proposal) Specification (Abridged)
  
  * Copyright (C) 2017, 2018, 2019 Luke Kenneth Casson Leighton
@@ -13,6 +14,11 @@ Simple-V is a uniform parallelism API for RISC-V hardware that allows
  the Program Counter to enter "sub-contexts" in which, ultimately, standard
  RISC-V scalar opcodes are executed.
  
+Regardless of the actual amount of hardware parallelism (if any is
+added at all by the implementor),
+and in direct contrast to SIMD,
+hardware parallelism is entirely transparent to software.
+
  The sub-context execution is "nested" in "re-entrant" form, in the
  following order:
  
@@ -20,7 +26,7 @@ following order:
  * VBLOCK sub-execution context (PCVBLK increments whilst PC is paused).
  * VL element loops (STATE srcoffs and destoffs increment, PC and PCVBLK pause).
    Predication bits may be individually applied per element.
-* SUBVL element loops (STATE svdestoffs increments, VL pauses).
+* Optional SUBVL element loops (STATE svdestoffs increments, VL pauses).
    Individual predicate bits from VL loops apply to the *group* of SUBVL
    elements.
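
The nesting order of the VL and SUBVL loops listed above can be sketched in Python. This is only an illustration under stated assumptions: the function name `sv_element_order` and the use of an integer bitmask for the predicate are hypothetical, and masked-out groups are simply skipped (zeroing is not modelled).

```python
def sv_element_order(vl, subvl, predicate):
    """Return (vl_index, subvl_index, flat_offset) for each active sub-element.

    predicate is an integer bitmask (an assumption of this sketch):
    bit i enables or disables the whole group of SUBVL elements at VL index i.
    """
    visited = []
    for i in range(vl):                    # VL element loop
        if not (predicate >> i) & 1:       # one predicate bit covers the group
            continue
        for s in range(subvl):             # SUBVL loop; the VL index is paused
            visited.append((i, s, i * subvl + s))
    return visited


order = sv_element_order(vl=3, subvl=2, predicate=0b101)
```

With `predicate=0b101` the group at VL index 1 is skipped as a unit, so both of its sub-elements are absent from `order`.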
  
@@ -36,12 +42,19 @@ and Register or Predicate over-ride tables may be empty: under such
  circumstances the behaviour becomes effectively identical to standard
  RV execution; however, SV is never truly "off".
  
-Note: **there are *no* new opcodes**. The scheme works *entirely*
+Note: **there are *no* new vector opcodes**. The scheme works *entirely*
  on hidden context that augments (nests) *scalar* RISC-V instructions.
  Thus it may cover existing, future and custom scalar extensions, turning
  all existing, all future and all custom scalar operations parallel,
  without requiring any special (identical, parallel variant) opcodes to do so.
  
+Associated proposals for use with 3D and HPC:
+
+* [[specification/sv.setvl]] - replaces the use of CSRs to set VL (saves
+  32 bits)
+* [[specification/mv.x]] - provides MV.swizzle and MVX (reg[rd] = reg[reg[rs]])
+* [[ztrans_proposal]] - provides trigonometric and transcendental operations
+
  # CSRs <a name="csrs"></a>
  
  There are five CSRs, available in any privilege level:
@@ -86,7 +99,7 @@ where 1 <= MVL <= XLEN
  
  ## SUBVL - Sub Vector Length
  
-This is a "group by quantity" that effectivrly asks each iteration
+This is a "group by quantity" that effectively asks each iteration
  of the hardware loop to load SUBVL elements of width elwidth at a
  time. Effectively, SUBVL is like a SIMD multiplier: instead of just 1
  operation issued, SUBVL operations are issued.
@@ -216,6 +229,55 @@ Pseudocode for predication:
  [[!inline raw="yes" pages="simple_v_extension/pred_table" ]]
  [[!inline raw="yes" pages="simple_v_extension/get_pred_value" ]]
  
+## Swizzle Table <a name="swizzle_table"></a>
+
+The swizzle table is a key-value store that indicates (if a given
+register is used, and SUBVL is 2, 3 or 4) that the sub-elements are to
+be re-ordered according to the indices in the Swizzle format.
+Like the Predication Table, it is an indirect lookup: use of a
+source or destination register in any given operation, if that register
+occurs in the table, "activates" sub-vector element swizzling for
+that register.  Note that the target is taken from the "Register Table"
+(regidx).
+
+Source vectors are free to have the swizzle indices point to the same
+sub-vector element.  However when using swizzling on destination vectors,
+the swizzle **must** be a permutation (no two swizzle indices point to
+the same sub-element).  An illegal instruction exception must be raised
+if a destination swizzle is not a permutation.
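
The source/destination rule can be illustrated with a short Python check. This is a sketch, not part of the specification: `IllegalInstruction` and `check_swizzle` are hypothetical names, and the exception class merely stands in for the hardware trap.

```python
class IllegalInstruction(Exception):
    """Hypothetical stand-in for the illegal instruction exception."""


def check_swizzle(indices, is_destination):
    """Return indices unchanged, or raise if a destination swizzle repeats an index."""
    # Source vectors may point several indices at the same sub-element.
    # Destination vectors must not: every index has to be distinct.
    if is_destination and len(set(indices)) != len(indices):
        raise IllegalInstruction("destination swizzle is not a permutation")
    return indices


source_ok = check_swizzle([0, 0, 1, 1], is_destination=False)
dest_ok = check_swizzle([3, 2, 1, 0], is_destination=True)
```

A repeated index such as `[0, 0, 1, 2]` is accepted for a source but raises `IllegalInstruction` when `is_destination` is true.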
+
+[[!inline raw="yes" pages="simple_v_extension/swizzle_table_format" ]]
+
+Simplified pseudocode example, when SUBVL=4 and swizzle is set on rd:
+
+    # default indices if no swizzling table entry present
+    x, y, z, w = 0, 1, 2, 3
+
+    # lookup swizzling in table for rd
+    if swizzle_table[rd].active:
+        swizzle = swizzle_table[rd].swizzle
+
+        # decode the swizzle table entry for rd
+        x = swizzle[0:1] # sub-element 0
+        y = swizzle[2:3] # sub-element 1
+        z = swizzle[4:5] # sub-element 2
+        w = swizzle[6:7] # sub-element 3
+
+    # redirect register numbers through Register Table
+    rd  = int_vec[rd ].isvector ? int_vec[rd ].regidx : rd;
+    rs1 = int_vec[rs1].isvector ? int_vec[rs1].regidx : rs1;
+    rs2 = int_vec[rs2].isvector ? int_vec[rs2].regidx : rs2;
+
+    # loop on VL: SUBVL loop is unrolled (SUBVL=4)
+    for (i = 0; i < VL; i++)
+        ireg[rd+i*4+x] = OPERATION(ireg[rs1+i*4+0], ireg[rs2+i*4+0])
+        ireg[rd+i*4+y] = OPERATION(ireg[rs1+i*4+1], ireg[rs2+i*4+1])
+        ireg[rd+i*4+z] = OPERATION(ireg[rs1+i*4+2], ireg[rs2+i*4+2])
+        ireg[rd+i*4+w] = OPERATION(ireg[rs1+i*4+3], ireg[rs2+i*4+3])
+
+For more information on swizzling, see the Khronos wiki page
+<https://www.khronos.org/opengl/wiki/Data_Type_(GLSL)#Swizzling>
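
As a runnable counterpart to the pseudocode above, the following Python sketch applies a destination swizzle with SUBVL=4. Several things here are assumptions rather than the specification: the names `decode_swizzle` and `swizzled_op` are hypothetical, `swizzle[0:1]` is taken to mean the two least-significant bits, the register file is a plain list, and neither the Register Table redirection nor the permutation check is modelled.

```python
import operator


def decode_swizzle(swizzle):
    """Split an 8-bit swizzle into four 2-bit indices.

    Assumption: bits 0-1 select the target for sub-element 0,
    bits 2-3 for sub-element 1, and so on.
    """
    return [(swizzle >> (2 * n)) & 0b11 for n in range(4)]


def swizzled_op(ireg, rd, rs1, rs2, vl, swizzle, op):
    """Apply op element-wise, writing each result to its swizzled slot in rd."""
    idx = decode_swizzle(swizzle)
    for i in range(vl):            # VL loop
        for n in range(4):         # SUBVL loop (SUBVL=4)
            ireg[rd + i * 4 + idx[n]] = op(ireg[rs1 + i * 4 + n],
                                           ireg[rs2 + i * 4 + n])


regs = [0] * 24
regs[8:12] = [1, 2, 3, 4]
regs[16:20] = [10, 20, 30, 40]
# x=3, y=2, z=1, w=0 reverses the order of the four results.
swizzled_op(regs, rd=0, rs1=8, rs2=16, vl=1, swizzle=0b00011011,
            op=operator.add)
```

With this reversing swizzle the four sums land in `regs[0:4]` in reverse order, while the source registers are left untouched.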
+
  ## Fail-on-First Mode <a name="ffirst-mode"></a>
  
  ffirst is a special data-dependent predicate mode.  There are two