(no commit message)

[libreriscv.git] / openpower / sv / vector_swizzle.mdwn
diff --git a/openpower/sv/vector_swizzle.mdwn b/openpower/sv/vector_swizzle.mdwn

index c04c8644a01b3024c785cf01a2d0914d387be80e..34322e8a4677a4674e8f0cd7f61a182893beced0 100644 (file)
--- a/openpower/sv/vector_swizzle.mdwn
+++ b/openpower/sv/vector_swizzle.mdwn
@@ -1,14 +1,19 @@
+[[!tag standards]]
+
  # SV Vector Prefix Swizzle
  
+* <https://bugs.libre-soc.org/show_bug.cgi?id=139>
+* <https://libre-soc.org/simple_v_extension/specification/mv.x/>
+
  3D GPU operations on batches of vec2, vec3 and vec4 often require re-ordering of the elements in an "out of lane" fashion with respect to standard high performance non-GPU-centric Vector Processors.  Examples include:
  
  * Normalisation of Vectors of XYZ with respect to one dimension
-* Alteration of ARGB pixel vectors wuth respect to opacity (A)
+* Alteration of ARGB pixel vectors with respect to opacity (A)
  * Adjustment of YUV vectors with respect to luminosity
  
  and many more.  Lane-based Vector Processors not having the 2/3/4 inter-lane crossing have some difficulty processing such data and require it to be pushed into memory and retrieved, which is prohibitively costly in both instructions, time, and power consumption.
  
-The cost is so great and the requirement so common that it easily justifies augmenting the ISA of a GPU to be able to specify the reordering of vec2/3/4 elements, often drastically increasing the instruction size in the process.
+The lane reordering cost is so great and the requirement so common that it easily justifies augmenting the ISA of a GPU to be able to specify the reordering of vec2/3/4 elements, often drastically increasing the instruction size in the process.
  
  The reason for the dramatic increase is that the reordering of each element in vec4 requires 2 bits per element, plus a predicate mask.  This means a minimum of 3 bits per element: 12 bits for a vec4, and if there are 2 src operands this is a whopping 24 bits of immediate data, per instruction.
  
@@ -27,7 +32,7 @@ There is also benefit to encoding some useful immediates into src operands, on a
  * However index selection is 2 bits per element
  * Therefore the src SUBVL must be separate and distinct from the dest SUBVL
  
-# Predication mixed with immediates and indices
+## Predication mixed with immediates and indices
  
  * Three bits per element.
  * One encoding (0b000) indicates "mask"
@@ -36,3 +41,9 @@ There is also benefit to encoding some useful immediates into src operands, on a
    - 0 (or 0.0)
    - 1 (or 1.0)
    - -1 (or -1.0) or some other option?
+
+# mv.swizzle
+
+is definitely needed.  TBD encoding.  requires 1 src, 1 dest, and 12 bits immediate minimum.
+
+[[sv/mv.swizzle]]