4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5 for describing shaders. Since Gallium is inherently shaderful, shaders are
6 an important part of the API. TGSI is the only intermediate representation
12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13 floating-point four-component vectors. An opcode may have up to one
14 destination register, known as *dst*, and between zero and three source
15 registers, called *src0* through *src2*, or simply *src* if there is only
18 Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19 components as integers. Other instructions permit using registers as
20 two-component vectors with double precision; see :ref:`doubleopcodes`.
22 When an instruction has a scalar result, the result is usually copied into
23 each of the components of *dst*. When this happens, the result is said to be
24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
29 TGSI supports modifiers on inputs (as well as saturate and precise modifier
32 For arithmetic instruction having a precise modifier certain optimizations
33 which may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
34 optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
37 For inputs which have a floating point type, both absolute value and
38 negation modifiers are supported (with absolute value being applied
39 first). The only source of TGSI_OPCODE_MOV and the second and third
40 sources of TGSI_OPCODE_UCMP are considered to have float type for
43 For inputs which have signed or unsigned type only the negate modifier is
50 ^^^^^^^^^^^^^^^^^^^^^^^^^
52 These opcodes are guaranteed to be available regardless of the driver being
55 .. opcode:: ARL - Address Register Load
59 dst.x = (int) \lfloor src.x\rfloor
61 dst.y = (int) \lfloor src.y\rfloor
63 dst.z = (int) \lfloor src.z\rfloor
65 dst.w = (int) \lfloor src.w\rfloor
68 .. opcode:: MOV - Move
81 .. opcode:: LIT - Light Coefficients
86 dst.y &= max(src.x, 0) \\
87 dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0 \\
91 .. opcode:: RCP - Reciprocal
93 This instruction replicates its result.
100 .. opcode:: RSQ - Reciprocal Square Root
102 This instruction replicates its result. The results are undefined for src <= 0.
106 dst = \frac{1}{\sqrt{src.x}}
109 .. opcode:: SQRT - Square Root
111 This instruction replicates its result. The results are undefined for src < 0.
118 .. opcode:: EXP - Approximate Exponential Base 2
122 dst.x &= 2^{\lfloor src.x\rfloor} \\
123 dst.y &= src.x - \lfloor src.x\rfloor \\
124 dst.z &= 2^{src.x} \\
128 .. opcode:: LOG - Approximate Logarithm Base 2
132 dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
133 dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
134 dst.z &= \log_2{|src.x|} \\
138 .. opcode:: MUL - Multiply
142 dst.x = src0.x \times src1.x
144 dst.y = src0.y \times src1.y
146 dst.z = src0.z \times src1.z
148 dst.w = src0.w \times src1.w
151 .. opcode:: ADD - Add
155 dst.x = src0.x + src1.x
157 dst.y = src0.y + src1.y
159 dst.z = src0.z + src1.z
161 dst.w = src0.w + src1.w
164 .. opcode:: DP3 - 3-component Dot Product
166 This instruction replicates its result.
170 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
173 .. opcode:: DP4 - 4-component Dot Product
175 This instruction replicates its result.
179 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
182 .. opcode:: DST - Distance Vector
187 dst.y &= src0.y \times src1.y\\
192 .. opcode:: MIN - Minimum
196 dst.x = min(src0.x, src1.x)
198 dst.y = min(src0.y, src1.y)
200 dst.z = min(src0.z, src1.z)
202 dst.w = min(src0.w, src1.w)
205 .. opcode:: MAX - Maximum
209 dst.x = max(src0.x, src1.x)
211 dst.y = max(src0.y, src1.y)
213 dst.z = max(src0.z, src1.z)
215 dst.w = max(src0.w, src1.w)
218 .. opcode:: SLT - Set On Less Than
222 dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
224 dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
226 dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
228 dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
231 .. opcode:: SGE - Set On Greater Equal Than
235 dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
237 dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
239 dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
241 dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
244 .. opcode:: MAD - Multiply And Add
246 Perform a * b + c. The implementation is free to decide whether there is an
247 intermediate rounding step or not.
251 dst.x = src0.x \times src1.x + src2.x
253 dst.y = src0.y \times src1.y + src2.y
255 dst.z = src0.z \times src1.z + src2.z
257 dst.w = src0.w \times src1.w + src2.w
260 .. opcode:: LRP - Linear Interpolate
264 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
266 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
268 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
270 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
273 .. opcode:: FMA - Fused Multiply-Add
275 Perform a * b + c with no intermediate rounding step.
279 dst.x = src0.x \times src1.x + src2.x
281 dst.y = src0.y \times src1.y + src2.y
283 dst.z = src0.z \times src1.z + src2.z
285 dst.w = src0.w \times src1.w + src2.w
288 .. opcode:: DP2A - 2-component Dot Product And Add
292 dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
294 dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
296 dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
298 dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
301 .. opcode:: FRC - Fraction
305 dst.x = src.x - \lfloor src.x\rfloor
307 dst.y = src.y - \lfloor src.y\rfloor
309 dst.z = src.z - \lfloor src.z\rfloor
311 dst.w = src.w - \lfloor src.w\rfloor
314 .. opcode:: FLR - Floor
318 dst.x = \lfloor src.x\rfloor
320 dst.y = \lfloor src.y\rfloor
322 dst.z = \lfloor src.z\rfloor
324 dst.w = \lfloor src.w\rfloor
327 .. opcode:: ROUND - Round
340 .. opcode:: EX2 - Exponential Base 2
342 This instruction replicates its result.
349 .. opcode:: LG2 - Logarithm Base 2
351 This instruction replicates its result.
358 .. opcode:: POW - Power
360 This instruction replicates its result.
364 dst = src0.x^{src1.x}
366 .. opcode:: XPD - Cross Product
370 dst.x = src0.y \times src1.z - src1.y \times src0.z
372 dst.y = src0.z \times src1.x - src1.z \times src0.x
374 dst.z = src0.x \times src1.y - src1.x \times src0.y
379 .. opcode:: DPH - Homogeneous Dot Product
381 This instruction replicates its result.
385 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
388 .. opcode:: COS - Cosine
390 This instruction replicates its result.
397 .. opcode:: DDX, DDX_FINE - Derivative Relative To X
399 The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
400 advertised. When it is, the fine version guarantees one derivative per row
401 while DDX is allowed to be the same for the entire 2x2 quad.
405 dst.x = partialx(src.x)
407 dst.y = partialx(src.y)
409 dst.z = partialx(src.z)
411 dst.w = partialx(src.w)
414 .. opcode:: DDY, DDY_FINE - Derivative Relative To Y
416 The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
417 advertised. When it is, the fine version guarantees one derivative per column
418 while DDY is allowed to be the same for the entire 2x2 quad.
422 dst.x = partialy(src.x)
424 dst.y = partialy(src.y)
426 dst.z = partialy(src.z)
428 dst.w = partialy(src.w)
431 .. opcode:: PK2H - Pack Two 16-bit Floats
433 This instruction replicates its result.
437 dst = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16
440 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
445 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
450 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
455 .. opcode:: SEQ - Set On Equal
459 dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
461 dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
463 dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
465 dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
468 .. opcode:: SGT - Set On Greater Than
472 dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
474 dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
476 dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
478 dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
481 .. opcode:: SIN - Sine
483 This instruction replicates its result.
490 .. opcode:: SLE - Set On Less Equal Than
494 dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
496 dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
498 dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
500 dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
503 .. opcode:: SNE - Set On Not Equal
507 dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
509 dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
511 dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
513 dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
516 .. opcode:: TEX - Texture Lookup
518 for array textures src0.y contains the slice for 1D,
519 and src0.z contain the slice for 2D.
521 for shadow textures with no arrays (and not cube map),
522 src0.z contains the reference value.
524 for shadow textures with arrays, src0.z contains
525 the reference value for 1D arrays, and src0.w contains
526 the reference value for 2D arrays and cube maps.
528 for cube map array shadow textures, the reference value
529 cannot be passed in src0.w, and TEX2 must be used instead.
535 shadow_ref = src0.z or src0.w (optional)
539 dst = texture\_sample(unit, coord, shadow_ref)
542 .. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
544 this is the same as TEX, but uses another reg to encode the
555 dst = texture\_sample(unit, coord, shadow_ref)
560 .. opcode:: TXD - Texture Lookup with Derivatives
572 dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
575 .. opcode:: TXP - Projective Texture Lookup
579 coord.x = src0.x / src0.w
581 coord.y = src0.y / src0.w
583 coord.z = src0.z / src0.w
589 dst = texture\_sample(unit, coord)
592 .. opcode:: UP2H - Unpack Two 16-Bit Floats
596 dst.x = f16\_to\_f32(src0.x \& 0xffff)
598 dst.y = f16\_to\_f32(src0.x >> 16)
600 dst.z = f16\_to\_f32(src0.x \& 0xffff)
602 dst.w = f16\_to\_f32(src0.x >> 16)
606 Considered for removal.
608 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
614 Considered for removal.
616 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
622 Considered for removal.
624 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
630 Considered for removal.
633 .. opcode:: ARR - Address Register Load With Round
637 dst.x = (int) round(src.x)
639 dst.y = (int) round(src.y)
641 dst.z = (int) round(src.z)
643 dst.w = (int) round(src.w)
646 .. opcode:: SSG - Set Sign
650 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
652 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
654 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
656 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
659 .. opcode:: CMP - Compare
663 dst.x = (src0.x < 0) ? src1.x : src2.x
665 dst.y = (src0.y < 0) ? src1.y : src2.y
667 dst.z = (src0.z < 0) ? src1.z : src2.z
669 dst.w = (src0.w < 0) ? src1.w : src2.w
672 .. opcode:: KILL_IF - Conditional Discard
674 Conditional discard. Allowed in fragment shaders only.
678 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
683 .. opcode:: KILL - Discard
685 Unconditional discard. Allowed in fragment shaders only.
688 .. opcode:: SCS - Sine Cosine
701 .. opcode:: TXB - Texture Lookup With Bias
703 for cube map array textures and shadow cube maps, the bias value
704 cannot be passed in src0.w, and TXB2 must be used instead.
706 if the target is a shadow texture, the reference value is always
707 in src.z (this prevents shadow 3d and shadow 2d arrays from
708 using this instruction, but this is not needed).
724 dst = texture\_sample(unit, coord, bias)
727 .. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
729 this is the same as TXB, but uses another reg to encode the
730 lod bias value for cube map arrays and shadow cube maps.
731 Presumably shadow 2d arrays and shadow 3d targets could use
732 this encoding too, but this is not legal.
734 shadow cube map arrays are neither possible nor required.
744 dst = texture\_sample(unit, coord, bias)
747 .. opcode:: DIV - Divide
751 dst.x = \frac{src0.x}{src1.x}
753 dst.y = \frac{src0.y}{src1.y}
755 dst.z = \frac{src0.z}{src1.z}
757 dst.w = \frac{src0.w}{src1.w}
760 .. opcode:: DP2 - 2-component Dot Product
762 This instruction replicates its result.
766 dst = src0.x \times src1.x + src0.y \times src1.y
769 .. opcode:: TEX_LZ - Texture Lookup With LOD = 0
771 This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
772 pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
773 There is no way to override those two in shaders.
789 dst = texture\_sample(unit, coord, lod)
792 .. opcode:: TXL - Texture Lookup With explicit LOD
794 for cube map array textures, the explicit lod value
795 cannot be passed in src0.w, and TXL2 must be used instead.
797 if the target is a shadow texture, the reference value is always
798 in src.z (this prevents shadow 3d / 2d array / cube targets from
799 using this instruction, but this is not needed).
815 dst = texture\_sample(unit, coord, lod)
818 .. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)
820 this is the same as TXL, but uses another reg to encode the
822 Presumably shadow 3d / 2d array / cube targets could use
823 this encoding too, but this is not legal.
825 shadow cube map arrays are neither possible nor required.
835 dst = texture\_sample(unit, coord, lod)
838 .. opcode:: PUSHA - Push Address Register On Stack
847 Considered for cleanup.
851 Considered for removal.
853 .. opcode:: POPA - Pop Address Register From Stack
862 Considered for cleanup.
866 Considered for removal.
869 .. opcode:: CALLNZ - Subroutine Call If Not Zero
875 Considered for cleanup.
879 Considered for removal.
883 ^^^^^^^^^^^^^^^^^^^^^^^^
885 These opcodes are primarily provided for special-use computational shaders.
886 Support for these opcodes indicated by a special pipe capability bit (TBD).
888 XXX doesn't look like most of the opcodes really belong here.
890 .. opcode:: CEIL - Ceiling
894 dst.x = \lceil src.x\rceil
896 dst.y = \lceil src.y\rceil
898 dst.z = \lceil src.z\rceil
900 dst.w = \lceil src.w\rceil
903 .. opcode:: TRUNC - Truncate
916 .. opcode:: MOD - Modulus
920 dst.x = src0.x \bmod src1.x
922 dst.y = src0.y \bmod src1.y
924 dst.z = src0.z \bmod src1.z
926 dst.w = src0.w \bmod src1.w
929 .. opcode:: UARL - Integer Address Register Load
931 Moves the contents of the source register, assumed to be an integer, into the
932 destination register, which is assumed to be an address (ADDR) register.
935 .. opcode:: SAD - Sum Of Absolute Differences
939 dst.x = |src0.x - src1.x| + src2.x
941 dst.y = |src0.y - src1.y| + src2.y
943 dst.z = |src0.z - src1.z| + src2.z
945 dst.w = |src0.w - src1.w| + src2.w
948 .. opcode:: TXF - Texel Fetch
950 As per NV_gpu_shader4, extract a single texel from a specified texture
951 image or PIPE_BUFFER resource. The source sampler may not be a CUBE or
953 four-component signed integer vector used to identify the single texel
954 accessed. 3 components + level. If the texture is multisampled, then
955 the fourth component indicates the sample, not the mipmap level.
956 Just like texture instructions, an optional
957 offset vector is provided, which is subject to various driver restrictions
958 (regarding range, source of offsets). This instruction ignores the sampler
961 TXF(uint_vec coord, int_vec offset).
964 .. opcode:: TXF_LZ - Texel Fetch
966 This is the same as TXF with level = 0. Like TXF, it obeys
967 pipe_sampler_view::u.tex.first_level.
970 .. opcode:: TXQ - Texture Size Query
972 As per NV_gpu_program4, retrieve the dimensions of the texture depending on
973 the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
974 depth), 1D array (width, layers), 2D array (width, height, layers).
975 Also return the number of accessible levels (last_level - first_level + 1)
978 For components which don't return a resource dimension, their value
985 dst.x = texture\_width(unit, lod)
987 dst.y = texture\_height(unit, lod)
989 dst.z = texture\_depth(unit, lod)
991 dst.w = texture\_levels(unit)
994 .. opcode:: TXQS - Texture Samples Query
996 This retrieves the number of samples in the texture, and stores it
997 into the x component as an unsigned integer. The other components are
998 undefined. If the texture is not multisampled, this function returns
999 (1, undef, undef, undef).
1003 dst.x = texture\_samples(unit)
1006 .. opcode:: TG4 - Texture Gather
1008 As per ARB_texture_gather, gathers the four texels to be used in a bi-linear
1009 filtering operation and packs them into a single register. Only works with
1010 2D, 2D array, cubemaps, and cubemaps arrays. For 2D textures, only the
1011 addressing modes of the sampler and the top level of any mip pyramid are
1012 used. Set W to zero. It behaves like the TEX instruction, but a filtered
1013 sample is not generated. The four samples that contribute to filtering are
1014 placed into xyzw in clockwise order, starting with the (u,v) texture
1015 coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -),
1016 where the magnitude of the deltas are half a texel.
1018 PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
1019 depth compares, single component selection, and a non-constant offset. It
1020 doesn't allow support for the GL independent offset to get i0,j0. This would
1021 require another CAP is hw can do it natively. For now we lower that before
1030 dst = texture\_gather4 (unit, coord, component)
1032 (with SM5 - cube array shadow)
1040 dst = texture\_gather (uint, coord, compare)
1042 .. opcode:: LODQ - level of detail query
1044 Compute the LOD information that the texture pipe would use to access the
1045 texture. The Y component contains the computed LOD lambda_prime. The X
1046 component contains the LOD that will be accessed, based on min/max lod's
1053 dst.xy = lodq(uint, coord);
1055 .. opcode:: CLOCK - retrieve the current shader time
1057 Invoking this instruction multiple times in the same shader should
1058 cause monotonically increasing values to be returned. The values
1059 are implicitly 64-bit, so if fewer than 64 bits of precision are
1060 available, to provide expected wraparound semantics, the value
1061 should be shifted up so that the most significant bit of the time
1062 is the most significant bit of the 64-bit value.
1070 ^^^^^^^^^^^^^^^^^^^^^^^^
1071 These opcodes are used for integer operations.
1072 Support for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
1075 .. opcode:: I2F - Signed Integer To Float
1077 Rounding is unspecified (round to nearest even suggested).
1081 dst.x = (float) src.x
1083 dst.y = (float) src.y
1085 dst.z = (float) src.z
1087 dst.w = (float) src.w
1090 .. opcode:: U2F - Unsigned Integer To Float
1092 Rounding is unspecified (round to nearest even suggested).
1096 dst.x = (float) src.x
1098 dst.y = (float) src.y
1100 dst.z = (float) src.z
1102 dst.w = (float) src.w
1105 .. opcode:: F2I - Float to Signed Integer
1107 Rounding is towards zero (truncate).
1108 Values outside signed range (including NaNs) produce undefined results.
1121 .. opcode:: F2U - Float to Unsigned Integer
1123 Rounding is towards zero (truncate).
1124 Values outside unsigned range (including NaNs) produce undefined results.
1128 dst.x = (unsigned) src.x
1130 dst.y = (unsigned) src.y
1132 dst.z = (unsigned) src.z
1134 dst.w = (unsigned) src.w
1137 .. opcode:: UADD - Integer Add
1139 This instruction works the same for signed and unsigned integers.
1140 The low 32bit of the result is returned.
1144 dst.x = src0.x + src1.x
1146 dst.y = src0.y + src1.y
1148 dst.z = src0.z + src1.z
1150 dst.w = src0.w + src1.w
1153 .. opcode:: UMAD - Integer Multiply And Add
1155 This instruction works the same for signed and unsigned integers.
1156 The multiplication returns the low 32bit (as does the result itself).
1160 dst.x = src0.x \times src1.x + src2.x
1162 dst.y = src0.y \times src1.y + src2.y
1164 dst.z = src0.z \times src1.z + src2.z
1166 dst.w = src0.w \times src1.w + src2.w
1169 .. opcode:: UMUL - Integer Multiply
1171 This instruction works the same for signed and unsigned integers.
1172 The low 32bit of the result is returned.
1176 dst.x = src0.x \times src1.x
1178 dst.y = src0.y \times src1.y
1180 dst.z = src0.z \times src1.z
1182 dst.w = src0.w \times src1.w
1185 .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
1187 The high 32bits of the multiplication of 2 signed integers are returned.
1191 dst.x = (src0.x \times src1.x) >> 32
1193 dst.y = (src0.y \times src1.y) >> 32
1195 dst.z = (src0.z \times src1.z) >> 32
1197 dst.w = (src0.w \times src1.w) >> 32
1200 .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
1202 The high 32bits of the multiplication of 2 unsigned integers are returned.
1206 dst.x = (src0.x \times src1.x) >> 32
1208 dst.y = (src0.y \times src1.y) >> 32
1210 dst.z = (src0.z \times src1.z) >> 32
1212 dst.w = (src0.w \times src1.w) >> 32
1215 .. opcode:: IDIV - Signed Integer Division
1217 TBD: behavior for division by zero.
1221 dst.x = \frac{src0.x}{src1.x}
1223 dst.y = \frac{src0.y}{src1.y}
1225 dst.z = \frac{src0.z}{src1.z}
1227 dst.w = \frac{src0.w}{src1.w}
1230 .. opcode:: UDIV - Unsigned Integer Division
1232 For division by zero, 0xffffffff is returned.
1236 dst.x = \frac{src0.x}{src1.x}
1238 dst.y = \frac{src0.y}{src1.y}
1240 dst.z = \frac{src0.z}{src1.z}
1242 dst.w = \frac{src0.w}{src1.w}
1245 .. opcode:: UMOD - Unsigned Integer Remainder
1247 If second arg is zero, 0xffffffff is returned.
1251 dst.x = src0.x \bmod src1.x
1253 dst.y = src0.y \bmod src1.y
1255 dst.z = src0.z \bmod src1.z
1257 dst.w = src0.w \bmod src1.w
1260 .. opcode:: NOT - Bitwise Not
1273 .. opcode:: AND - Bitwise And
1277 dst.x = src0.x \& src1.x
1279 dst.y = src0.y \& src1.y
1281 dst.z = src0.z \& src1.z
1283 dst.w = src0.w \& src1.w
1286 .. opcode:: OR - Bitwise Or
1290 dst.x = src0.x | src1.x
1292 dst.y = src0.y | src1.y
1294 dst.z = src0.z | src1.z
1296 dst.w = src0.w | src1.w
1299 .. opcode:: XOR - Bitwise Xor
1303 dst.x = src0.x \oplus src1.x
1305 dst.y = src0.y \oplus src1.y
1307 dst.z = src0.z \oplus src1.z
1309 dst.w = src0.w \oplus src1.w
1312 .. opcode:: IMAX - Maximum of Signed Integers
1316 dst.x = max(src0.x, src1.x)
1318 dst.y = max(src0.y, src1.y)
1320 dst.z = max(src0.z, src1.z)
1322 dst.w = max(src0.w, src1.w)
1325 .. opcode:: UMAX - Maximum of Unsigned Integers
1329 dst.x = max(src0.x, src1.x)
1331 dst.y = max(src0.y, src1.y)
1333 dst.z = max(src0.z, src1.z)
1335 dst.w = max(src0.w, src1.w)
1338 .. opcode:: IMIN - Minimum of Signed Integers
1342 dst.x = min(src0.x, src1.x)
1344 dst.y = min(src0.y, src1.y)
1346 dst.z = min(src0.z, src1.z)
1348 dst.w = min(src0.w, src1.w)
1351 .. opcode:: UMIN - Minimum of Unsigned Integers
1355 dst.x = min(src0.x, src1.x)
1357 dst.y = min(src0.y, src1.y)
1359 dst.z = min(src0.z, src1.z)
1361 dst.w = min(src0.w, src1.w)
1364 .. opcode:: SHL - Shift Left
1366 The shift count is masked with 0x1f before the shift is applied.
1370 dst.x = src0.x << (0x1f \& src1.x)
1372 dst.y = src0.y << (0x1f \& src1.y)
1374 dst.z = src0.z << (0x1f \& src1.z)
1376 dst.w = src0.w << (0x1f \& src1.w)
1379 .. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1381 The shift count is masked with 0x1f before the shift is applied.
1385 dst.x = src0.x >> (0x1f \& src1.x)
1387 dst.y = src0.y >> (0x1f \& src1.y)
1389 dst.z = src0.z >> (0x1f \& src1.z)
1391 dst.w = src0.w >> (0x1f \& src1.w)
1394 .. opcode:: USHR - Logical Shift Right
1396 The shift count is masked with 0x1f before the shift is applied.
1400 dst.x = src0.x >> (unsigned) (0x1f \& src1.x)
1402 dst.y = src0.y >> (unsigned) (0x1f \& src1.y)
1404 dst.z = src0.z >> (unsigned) (0x1f \& src1.z)
1406 dst.w = src0.w >> (unsigned) (0x1f \& src1.w)
1409 .. opcode:: UCMP - Integer Conditional Move
1413 dst.x = src0.x ? src1.x : src2.x
1415 dst.y = src0.y ? src1.y : src2.y
1417 dst.z = src0.z ? src1.z : src2.z
1419 dst.w = src0.w ? src1.w : src2.w
1423 .. opcode:: ISSG - Integer Set Sign
1427 dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1429 dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1431 dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1433 dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1437 .. opcode:: FSLT - Float Set On Less Than (ordered)
1439 Same comparison as SLT but returns integer instead of 1.0/0.0 float
1443 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1445 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1447 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1449 dst.w = (src0.w < src1.w) ? \sim 0 : 0
1452 .. opcode:: ISLT - Signed Integer Set On Less Than
1456 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1458 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1460 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1462 dst.w = (src0.w < src1.w) ? \sim 0 : 0
1465 .. opcode:: USLT - Unsigned Integer Set On Less Than
1469 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1471 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1473 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1475 dst.w = (src0.w < src1.w) ? \sim 0 : 0
1478 .. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
1480 Same comparison as SGE but returns integer instead of 1.0/0.0 float
1484 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1486 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1488 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1490 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1493 .. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1497 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1499 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1501 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1503 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1506 .. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1510 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1512 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1514 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1516 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1519 .. opcode:: FSEQ - Float Set On Equal (ordered)
1521 Same comparison as SEQ but returns integer instead of 1.0/0.0 float
1525 dst.x = (src0.x == src1.x) ? \sim 0 : 0
1527 dst.y = (src0.y == src1.y) ? \sim 0 : 0
1529 dst.z = (src0.z == src1.z) ? \sim 0 : 0
1531 dst.w = (src0.w == src1.w) ? \sim 0 : 0
1534 .. opcode:: USEQ - Integer Set On Equal
1538 dst.x = (src0.x == src1.x) ? \sim 0 : 0
1540 dst.y = (src0.y == src1.y) ? \sim 0 : 0
1542 dst.z = (src0.z == src1.z) ? \sim 0 : 0
1544 dst.w = (src0.w == src1.w) ? \sim 0 : 0
1547 .. opcode:: FSNE - Float Set On Not Equal (unordered)
1549 Same comparison as SNE but returns integer instead of 1.0/0.0 float
1553 dst.x = (src0.x != src1.x) ? \sim 0 : 0
1555 dst.y = (src0.y != src1.y) ? \sim 0 : 0
1557 dst.z = (src0.z != src1.z) ? \sim 0 : 0
1559 dst.w = (src0.w != src1.w) ? \sim 0 : 0
1562 .. opcode:: USNE - Integer Set On Not Equal
1566 dst.x = (src0.x != src1.x) ? \sim 0 : 0
1568 dst.y = (src0.y != src1.y) ? \sim 0 : 0
1570 dst.z = (src0.z != src1.z) ? \sim 0 : 0
1572 dst.w = (src0.w != src1.w) ? \sim 0 : 0
1575 .. opcode:: INEG - Integer Negate
1590 .. opcode:: IABS - Integer Absolute Value
1604 These opcodes are used for bit-level manipulation of integers.
1606 .. opcode:: IBFE - Signed Bitfield Extract
1608 Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
1609 sign-extends them if the high bit of the extracted window is set.
1613 def ibfe(value, offset, bits):
1614 if offset < 0 or bits < 0 or offset + bits > 32:
1616 if bits == 0: return 0
1617 # Note: >> sign-extends
1618 return (value << (32 - offset - bits)) >> (32 - bits)
1620 .. opcode:: UBFE - Unsigned Bitfield Extract
1622 Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
1627 def ubfe(value, offset, bits):
1628 if offset < 0 or bits < 0 or offset + bits > 32:
1630 if bits == 0: return 0
1631 # Note: >> does not sign-extend
1632 return (value << (32 - offset - bits)) >> (32 - bits)
1634 .. opcode:: BFI - Bitfield Insert
1636 Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
1641 def bfi(base, insert, offset, bits):
1642 if offset < 0 or bits < 0 or offset + bits > 32:
1644 # << defined such that mask == ~0 when bits == 32, offset == 0
1645 mask = ((1 << bits) - 1) << offset
1646 return ((insert << offset) & mask) | (base & ~mask)
1648 .. opcode:: BREV - Bitfield Reverse
1650 See SM5 instruction BFREV. Reverses the bits of the argument.
1652 .. opcode:: POPC - Population Count
1654 See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
1656 .. opcode:: LSB - Index of lowest set bit
1658 See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
1659 bit of the argument. Returns -1 if none are set.
1661 .. opcode:: IMSB - Index of highest non-sign bit
1663 See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
1664 non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
1665 highest 1 bit for positive numbers). Returns -1 if all bits are the same
1666 (i.e. for inputs 0 and -1).
1668 .. opcode:: UMSB - Index of highest set bit
1670 See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
1671 set bit of the argument. Returns -1 if none are set.
1674 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1676 These opcodes are only supported in geometry shaders; they have no meaning
1677 in any other type of shader.
1679 .. opcode:: EMIT - Emit
1681 Generate a new vertex for the current primitive into the specified vertex
1682 stream using the values in the output registers.
1685 .. opcode:: ENDPRIM - End Primitive
1687 Complete the current primitive in the specified vertex stream (consisting of
1688 the emitted vertices), and start a new one.
1694 These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1695 opcodes is determined by a special capability bit, ``GLSL``.
1696 Some require glsl version 1.30 (UIF/BREAKC/SWITCH/CASE/DEFAULT/ENDSWITCH).
1698 .. opcode:: CAL - Subroutine Call
1704 .. opcode:: RET - Subroutine Call Return
1709 .. opcode:: CONT - Continue
1711 Unconditionally moves the point of execution to the instruction after the
1712 last bgnloop. The instruction must appear within a bgnloop/endloop.
1716 Support for CONT is determined by a special capability bit,
1717 ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1720 .. opcode:: BGNLOOP - Begin a Loop
1722 Start a loop. Must have a matching endloop.
1725 .. opcode:: BGNSUB - Begin Subroutine
1727 Starts definition of a subroutine. Must have a matching endsub.
1730 .. opcode:: ENDLOOP - End a Loop
1732 End a loop started with bgnloop.
1735 .. opcode:: ENDSUB - End Subroutine
1737 Ends definition of a subroutine.
1740 .. opcode:: NOP - No Operation
1745 .. opcode:: BRK - Break
1747 Unconditionally moves the point of execution to the instruction after the
1748 next endloop or endswitch. The instruction must appear within a loop/endloop
1749 or switch/endswitch.
1752 .. opcode:: BREAKC - Break Conditional
1754 Conditionally moves the point of execution to the instruction after the
1755 next endloop or endswitch. The instruction must appear within a loop/endloop
1756 or switch/endswitch.
1757 Condition evaluates to true if src0.x != 0 where src0.x is interpreted
1758 as an integer register.
1762 Considered for removal as it's quite inconsistent wrt other opcodes
1763 (could emulate with UIF/BRK/ENDIF).
1766 .. opcode:: IF - Float If
1768 Start an IF ... ELSE .. ENDIF block. Condition evaluates to true if
1772 where src0.x is interpreted as a floating point register.
1775 .. opcode:: UIF - Bitwise If
1777 Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
1781 where src0.x is interpreted as an integer register.
1784 .. opcode:: ELSE - Else
1786 Starts an else block, after an IF or UIF statement.
1789 .. opcode:: ENDIF - End If
1791 Ends an IF or UIF block.
1794 .. opcode:: SWITCH - Switch
1796 Starts a C-style switch expression. The switch consists of one or multiple
1797 CASE statements, and at most one DEFAULT statement. Execution of a statement
1798 ends when a BRK is hit, but just like in C falling through to other cases
1799 without a break is allowed. Similarly, DEFAULT label is allowed anywhere not
1800 just as last statement, and fallthrough is allowed into/from it.
1801 CASE src arguments are evaluated at bit level against the SWITCH src argument.
1807 (some instructions here)
1810 (some instructions here)
1813 (some instructions here)
1818 .. opcode:: CASE - Switch case
1820 This represents a switch case label. The src arg must be an integer immediate.
1823 .. opcode:: DEFAULT - Switch default
1825 This represents the default case in the switch, which is taken if no other
1829 .. opcode:: ENDSWITCH - End of switch
1831 Ends a switch expression.
1837 The interpolation instructions allow an input to be interpolated in a
1838 different way than its declaration. This corresponds to the GLSL 4.00
1839 interpolateAt* functions. The first argument of each of these must come from
1840 ``TGSI_FILE_INPUT``.
1842 .. opcode:: INTERP_CENTROID - Interpolate at the centroid
1844 Interpolates the varying specified by src0 at the centroid
1846 .. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
1848 Interpolates the varying specified by src0 at the sample id specified by
1849 src1.x (interpreted as an integer)
1851 .. opcode:: INTERP_OFFSET - Interpolate at the specified offset
1853 Interpolates the varying specified by src0 at the offset src1.xy from the
1854 pixel center (interpreted as floats)
1862 The double-precision opcodes reinterpret four-component vectors into
1863 two-component vectors with doubled precision in each component.
1865 .. opcode:: DABS - Absolute
1873 .. opcode:: DADD - Add
1877 dst.xy = src0.xy + src1.xy
1879 dst.zw = src0.zw + src1.zw
1881 .. opcode:: DSEQ - Set on Equal
1885 dst.x = src0.xy == src1.xy ? \sim 0 : 0
1887 dst.z = src0.zw == src1.zw ? \sim 0 : 0
1889 .. opcode:: DSNE - Set on Equal
1893 dst.x = src0.xy != src1.xy ? \sim 0 : 0
1895 dst.z = src0.zw != src1.zw ? \sim 0 : 0
1897 .. opcode:: DSLT - Set on Less than
1901 dst.x = src0.xy < src1.xy ? \sim 0 : 0
1903 dst.z = src0.zw < src1.zw ? \sim 0 : 0
1905 .. opcode:: DSGE - Set on Greater equal
1909 dst.x = src0.xy >= src1.xy ? \sim 0 : 0
1911 dst.z = src0.zw >= src1.zw ? \sim 0 : 0
1913 .. opcode:: DFRAC - Fraction
1917 dst.xy = src.xy - \lfloor src.xy\rfloor
1919 dst.zw = src.zw - \lfloor src.zw\rfloor
1921 .. opcode:: DTRUNC - Truncate
1925 dst.xy = trunc(src.xy)
1927 dst.zw = trunc(src.zw)
1929 .. opcode:: DCEIL - Ceiling
1933 dst.xy = \lceil src.xy\rceil
1935 dst.zw = \lceil src.zw\rceil
1937 .. opcode:: DFLR - Floor
1941 dst.xy = \lfloor src.xy\rfloor
1943 dst.zw = \lfloor src.zw\rfloor
1945 .. opcode:: DROUND - Fraction
1949 dst.xy = round(src.xy)
1951 dst.zw = round(src.zw)
1953 .. opcode:: DSSG - Set Sign
1957 dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
1959 dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
1961 .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
1963 Like the ``frexp()`` routine in many math libraries, this opcode stores the
1964 exponent of its source to ``dst0``, and the significand to ``dst1``, such that
1965 :math:`dst1 \times 2^{dst0} = src` .
1969 dst0.xy = exp(src.xy)
1971 dst1.xy = frac(src.xy)
1973 dst0.zw = exp(src.zw)
1975 dst1.zw = frac(src.zw)
1977 .. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1979 This opcode is the inverse of :opcode:`DFRACEXP`. The second
1980 source is an integer.
1984 dst.xy = src0.xy \times 2^{src1.x}
1986 dst.zw = src0.zw \times 2^{src1.y}
1988 .. opcode:: DMIN - Minimum
1992 dst.xy = min(src0.xy, src1.xy)
1994 dst.zw = min(src0.zw, src1.zw)
1996 .. opcode:: DMAX - Maximum
2000 dst.xy = max(src0.xy, src1.xy)
2002 dst.zw = max(src0.zw, src1.zw)
2004 .. opcode:: DMUL - Multiply
2008 dst.xy = src0.xy \times src1.xy
2010 dst.zw = src0.zw \times src1.zw
2013 .. opcode:: DMAD - Multiply And Add
2017 dst.xy = src0.xy \times src1.xy + src2.xy
2019 dst.zw = src0.zw \times src1.zw + src2.zw
2022 .. opcode:: DFMA - Fused Multiply-Add
2024 Perform a * b + c with no intermediate rounding step.
2028 dst.xy = src0.xy \times src1.xy + src2.xy
2030 dst.zw = src0.zw \times src1.zw + src2.zw
2033 .. opcode:: DDIV - Divide
2037 dst.xy = \frac{src0.xy}{src1.xy}
2039 dst.zw = \frac{src0.zw}{src1.zw}
2042 .. opcode:: DRCP - Reciprocal
2046 dst.xy = \frac{1}{src.xy}
2048 dst.zw = \frac{1}{src.zw}
2050 .. opcode:: DSQRT - Square Root
2054 dst.xy = \sqrt{src.xy}
2056 dst.zw = \sqrt{src.zw}
2058 .. opcode:: DRSQ - Reciprocal Square Root
2062 dst.xy = \frac{1}{\sqrt{src.xy}}
2064 dst.zw = \frac{1}{\sqrt{src.zw}}
2066 .. opcode:: F2D - Float to Double
2070 dst.xy = double(src0.x)
2072 dst.zw = double(src0.y)
2074 .. opcode:: D2F - Double to Float
2078 dst.x = float(src0.xy)
2080 dst.y = float(src0.zw)
2082 .. opcode:: I2D - Int to Double
2086 dst.xy = double(src0.x)
2088 dst.zw = double(src0.y)
2090 .. opcode:: D2I - Double to Int
2094 dst.x = int(src0.xy)
2096 dst.y = int(src0.zw)
2098 .. opcode:: U2D - Unsigned Int to Double
2102 dst.xy = double(src0.x)
2104 dst.zw = double(src0.y)
2106 .. opcode:: D2U - Double to Unsigned Int
2110 dst.x = unsigned(src0.xy)
2112 dst.y = unsigned(src0.zw)
2117 The 64-bit integer opcodes reinterpret four-component vectors into
2118 two-component vectors with 64-bits in each component.
2120 .. opcode:: I64ABS - 64-bit Integer Absolute Value
2128 .. opcode:: I64NEG - 64-bit Integer Negate
2138 .. opcode:: I64SSG - 64-bit Integer Set Sign
2142 dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
2144 dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
2146 .. opcode:: U64ADD - 64-bit Integer Add
2150 dst.xy = src0.xy + src1.xy
2152 dst.zw = src0.zw + src1.zw
2154 .. opcode:: U64MUL - 64-bit Integer Multiply
2158 dst.xy = src0.xy * src1.xy
2160 dst.zw = src0.zw * src1.zw
2162 .. opcode:: U64SEQ - 64-bit Integer Set on Equal
2166 dst.x = src0.xy == src1.xy ? \sim 0 : 0
2168 dst.z = src0.zw == src1.zw ? \sim 0 : 0
2170 .. opcode:: U64SNE - 64-bit Integer Set on Not Equal
2174 dst.x = src0.xy != src1.xy ? \sim 0 : 0
2176 dst.z = src0.zw != src1.zw ? \sim 0 : 0
2178 .. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
2182 dst.x = src0.xy < src1.xy ? \sim 0 : 0
2184 dst.z = src0.zw < src1.zw ? \sim 0 : 0
2186 .. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
2190 dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2192 dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2194 .. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
2198 dst.x = src0.xy < src1.xy ? \sim 0 : 0
2200 dst.z = src0.zw < src1.zw ? \sim 0 : 0
2202 .. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
2206 dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2208 dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2210 .. opcode:: I64MIN - Minimum of 64-bit Signed Integers
2214 dst.xy = min(src0.xy, src1.xy)
2216 dst.zw = min(src0.zw, src1.zw)
2218 .. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
2222 dst.xy = min(src0.xy, src1.xy)
2224 dst.zw = min(src0.zw, src1.zw)
2226 .. opcode:: I64MAX - Maximum of 64-bit Signed Integers
2230 dst.xy = max(src0.xy, src1.xy)
2232 dst.zw = max(src0.zw, src1.zw)
2234 .. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
2238 dst.xy = max(src0.xy, src1.xy)
2240 dst.zw = max(src0.zw, src1.zw)
2242 .. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
2244 The shift count is masked with 0x3f before the shift is applied.
2248 dst.xy = src0.xy << (0x3f \& src1.x)
2250 dst.zw = src0.zw << (0x3f \& src1.y)
2252 .. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
2254 The shift count is masked with 0x3f before the shift is applied.
2258 dst.xy = src0.xy >> (0x3f \& src1.x)
2260 dst.zw = src0.zw >> (0x3f \& src1.y)
2262 .. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
2264 The shift count is masked with 0x3f before the shift is applied.
2268 dst.xy = src0.xy >> (unsigned) (0x3f \& src1.x)
2270 dst.zw = src0.zw >> (unsigned) (0x3f \& src1.y)
2272 .. opcode:: I64DIV - 64-bit Signed Integer Division
2276 dst.xy = \frac{src0.xy}{src1.xy}
2278 dst.zw = \frac{src0.zw}{src1.zw}
2280 .. opcode:: U64DIV - 64-bit Unsigned Integer Division
2284 dst.xy = \frac{src0.xy}{src1.xy}
2286 dst.zw = \frac{src0.zw}{src1.zw}
2288 .. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
2292 dst.xy = src0.xy \bmod src1.xy
2294 dst.zw = src0.zw \bmod src1.zw
2296 .. opcode:: I64MOD - 64-bit Signed Integer Remainder
2300 dst.xy = src0.xy \bmod src1.xy
2302 dst.zw = src0.zw \bmod src1.zw
2304 .. opcode:: F2U64 - Float to 64-bit Unsigned Int
2308 dst.xy = (uint64_t) src0.x
2310 dst.zw = (uint64_t) src0.y
2312 .. opcode:: F2I64 - Float to 64-bit Int
2316 dst.xy = (int64_t) src0.x
2318 dst.zw = (int64_t) src0.y
2320 .. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
2322 This is a zero extension.
2326 dst.xy = (uint64_t) src0.x
2328 dst.zw = (uint64_t) src0.y
2330 .. opcode:: I2I64 - Signed Integer to 64-bit Integer
2332 This is a sign extension.
2336 dst.xy = (int64_t) src0.x
2338 dst.zw = (int64_t) src0.y
2340 .. opcode:: D2U64 - Double to 64-bit Unsigned Int
2344 dst.xy = (uint64_t) src0.xy
2346 dst.zw = (uint64_t) src0.zw
2348 .. opcode:: D2I64 - Double to 64-bit Int
2352 dst.xy = (int64_t) src0.xy
2354 dst.zw = (int64_t) src0.zw
2356 .. opcode:: U642F - 64-bit unsigned integer to float
2360 dst.x = (float) src0.xy
2362 dst.y = (float) src0.zw
2364 .. opcode:: I642F - 64-bit Int to Float
2368 dst.x = (float) src0.xy
2370 dst.y = (float) src0.zw
2372 .. opcode:: U642D - 64-bit unsigned integer to double
2376 dst.xy = (double) src0.xy
2378 dst.zw = (double) src0.zw
2380 .. opcode:: I642D - 64-bit Int to double
2384 dst.xy = (double) src0.xy
2386 dst.zw = (double) src0.zw
2388 .. _samplingopcodes:
2390 Resource Sampling Opcodes
2391 ^^^^^^^^^^^^^^^^^^^^^^^^^
2393 Those opcodes follow very closely semantics of the respective Direct3D
2394 instructions. If in doubt double check Direct3D documentation.
2395 Note that the swizzle on SVIEW (src1) determines texel swizzling
2400 Using provided address, sample data from the specified texture using the
2401 filtering mode identified by the given sampler. The source data may come from
2402 any resource type other than buffers.
2404 Syntax: ``SAMPLE dst, address, sampler_view, sampler``
2406 Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
2408 .. opcode:: SAMPLE_I
2410 Simplified alternative to the SAMPLE instruction. Using the provided
2411 integer address, SAMPLE_I fetches data from the specified sampler view
2412 without any filtering. The source data may come from any resource type
2415 Syntax: ``SAMPLE_I dst, address, sampler_view``
2417 Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
2419 The 'address' is specified as unsigned integers. If the 'address' is out of
2420 range [0...(# texels - 1)] the result of the fetch is always 0 in all
2421 components. As such the instruction doesn't honor address wrap modes, in
2422 cases where that behavior is desirable 'SAMPLE' instruction should be used.
2423 address.w always provides an unsigned integer mipmap level. If the value is
2424 out of the range then the instruction always returns 0 in all components.
2425 address.yz are ignored for buffers and 1d textures. address.z is ignored
2426 for 1d texture arrays and 2d textures.
2428 For 1D texture arrays address.y provides the array index (also as unsigned
2429 integer). If the value is out of the range of available array indices
2430 [0... (array size - 1)] then the opcode always returns 0 in all components.
2431 For 2D texture arrays address.z provides the array index, otherwise it
2432 exhibits the same behavior as in the case for 1D texture arrays. The exact
2433 semantics of the source address are presented in the table below:
2435 +---------------------------+----+-----+-----+---------+
2436 | resource type | X | Y | Z | W |
2437 +===========================+====+=====+=====+=========+
2438 | ``PIPE_BUFFER`` | x | | | ignored |
2439 +---------------------------+----+-----+-----+---------+
2440 | ``PIPE_TEXTURE_1D`` | x | | | mpl |
2441 +---------------------------+----+-----+-----+---------+
2442 | ``PIPE_TEXTURE_2D`` | x | y | | mpl |
2443 +---------------------------+----+-----+-----+---------+
2444 | ``PIPE_TEXTURE_3D`` | x | y | z | mpl |
2445 +---------------------------+----+-----+-----+---------+
2446 | ``PIPE_TEXTURE_RECT`` | x | y | | mpl |
2447 +---------------------------+----+-----+-----+---------+
2448 | ``PIPE_TEXTURE_CUBE`` | not allowed as source |
2449 +---------------------------+----+-----+-----+---------+
2450 | ``PIPE_TEXTURE_1D_ARRAY`` | x | idx | | mpl |
2451 +---------------------------+----+-----+-----+---------+
2452 | ``PIPE_TEXTURE_2D_ARRAY`` | x | y | idx | mpl |
2453 +---------------------------+----+-----+-----+---------+
2455 Where 'mpl' is a mipmap level and 'idx' is the array index.
2457 .. opcode:: SAMPLE_I_MS
2459 Just like SAMPLE_I but allows fetch data from multi-sampled surfaces.
2461 Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
2463 .. opcode:: SAMPLE_B
2465 Just like the SAMPLE instruction with the exception that an additional bias
2466 is applied to the level of detail computed as part of the instruction
2469 Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
2471 Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2473 .. opcode:: SAMPLE_C
2475 Similar to the SAMPLE instruction but it performs a comparison filter. The
2476 operands to SAMPLE_C are identical to SAMPLE, except that there is an
2477 additional float32 operand, reference value, which must be a register with
2478 single-component, or a scalar literal. SAMPLE_C makes the hardware use the
2479 current samplers compare_func (in pipe_sampler_state) to compare reference
2480 value against the red component value for the surce resource at each texel
2481 that the currently configured texture filter covers based on the provided
2484 Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
2486 Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2488 .. opcode:: SAMPLE_C_LZ
2490 Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
2493 Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
2495 Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2498 .. opcode:: SAMPLE_D
2500 SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
2501 the source address in the x direction and the y direction are provided by
2504 Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
2506 Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
2508 .. opcode:: SAMPLE_L
2510 SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
2511 directly as a scalar value, representing no anisotropy.
2513 Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
2515 Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2519 Gathers the four texels to be used in a bi-linear filtering operation and
2520 packs them into a single register. Only works with 2D, 2D array, cubemaps,
2521 and cubemaps arrays. For 2D textures, only the addressing modes of the
2522 sampler and the top level of any mip pyramid are used. Set W to zero. It
2523 behaves like the SAMPLE instruction, but a filtered sample is not
2524 generated. The four samples that contribute to filtering are placed into
2525 xyzw in counter-clockwise order, starting with the (u,v) texture coordinate
2526 delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
2527 magnitude of the deltas are half a texel.
2530 .. opcode:: SVIEWINFO
2532 Query the dimensions of a given sampler view. dst receives width, height,
2533 depth or array size and number of mipmap levels as int4. The dst can have a
2534 writemask which will specify what info is the caller interested in.
2536 Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
2538 Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
2540 src_mip_level is an unsigned integer scalar. If it's out of range then
2541 returns 0 for width, height and depth/array size but the total number of
2542 mipmap is still returned correctly for the given sampler view. The returned
2543 width, height and depth values are for the mipmap level selected by the
2544 src_mip_level and are in the number of texels. For 1d texture array width
2545 is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is
2546 still in dst.w. In contrast to d3d10 resinfo, there's no way in the tgsi
2547 instruction encoding to specify the return type (float/rcpfloat/uint), hence
2548 always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1
2549 resinfo allowing swizzling dst values is ignored (due to the interaction
2550 with rcpfloat modifier which requires some swizzle handling in the state
2553 .. opcode:: SAMPLE_POS
2555 Query the position of a sample in the given resource or render target
2556 when per-sample fragment shading is in effect.
2558 Syntax: ``SAMPLE_POS dst, source, sample_index``
2560 dst receives float4 (x, y, undef, undef) indicated where the sample is
2561 located. Sample locations are in the range [0, 1] where 0.5 is the center
2564 source is either a sampler view (to indicate a shader resource) or temp
2565 register (to indicate the render target). The source register may have
2566 an optional swizzle to apply to the returned result
2568 sample_index is an integer scalar indicating which sample position is to
2571 If per-sample shading is not in effect or the source resource or render
2572 target is not multisampled, the result is (0.5, 0.5, undef, undef).
2574 NOTE: no driver has implemented this opcode yet (and no state tracker
2575 emits it). This information is subject to change.
2577 .. opcode:: SAMPLE_INFO
2579 Query the number of samples in a multisampled resource or render target.
2581 Syntax: ``SAMPLE_INFO dst, source``
2583 dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
2584 resource or the render target.
2586 source is either a sampler view (to indicate a shader resource) or temp
2587 register (to indicate the render target). The source register may have
2588 an optional swizzle to apply to the returned result
2590 If per-sample shading is not in effect or the source resource or render
2591 target is not multisampled, the result is (1, 0, 0, 0).
2593 NOTE: no driver has implemented this opcode yet (and no state tracker
2594 emits it). This information is subject to change.
2596 .. _resourceopcodes:
2598 Resource Access Opcodes
2599 ^^^^^^^^^^^^^^^^^^^^^^^
2601 For these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
2603 .. opcode:: LOAD - Fetch data from a shader buffer or image
2605 Syntax: ``LOAD dst, resource, address``
2607 Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``
2609 Using the provided integer address, LOAD fetches data
2610 from the specified buffer or texture without any
2613 The 'address' is specified as a vector of unsigned
2614 integers. If the 'address' is out of range the result
2617 Only the first mipmap level of a resource can be read
2618 from using this instruction.
2620 For 1D or 2D texture arrays, the array index is
2621 provided as an unsigned integer in address.y or
2622 address.z, respectively. address.yz are ignored for
2623 buffers and 1D textures. address.z is ignored for 1D
2624 texture arrays and 2D textures. address.w is always
2627 A swizzle suffix may be added to the resource argument
2628 this will cause the resource data to be swizzled accordingly.
2630 .. opcode:: STORE - Write data to a shader resource
2632 Syntax: ``STORE resource, address, src``
2634 Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``
2636 Using the provided integer address, STORE writes data
2637 to the specified buffer or texture.
2639 The 'address' is specified as a vector of unsigned
2640 integers. If the 'address' is out of range the result
2643 Only the first mipmap level of a resource can be
2644 written to using this instruction.
2646 For 1D or 2D texture arrays, the array index is
2647 provided as an unsigned integer in address.y or
2648 address.z, respectively. address.yz are ignored for
2649 buffers and 1D textures. address.z is ignored for 1D
2650 texture arrays and 2D textures. address.w is always
2653 .. opcode:: RESQ - Query information about a resource
2655 Syntax: ``RESQ dst, resource``
2657 Example: ``RESQ TEMP[0], BUFFER[0]``
2659 Returns information about the buffer or image resource. For buffer
2660 resources, the size (in bytes) is returned in the x component. For
2661 image resources, .xyz will contain the width/height/layers of the
2662 image, while .w will contain the number of samples for multi-sampled
2665 .. opcode:: FBFETCH - Load data from framebuffer
2667 Syntax: ``FBFETCH dst, output``
2669 Example: ``FBFETCH TEMP[0], OUT[0]``
2671 This is only valid on ``COLOR`` semantic outputs. Returns the color
2672 of the current position in the framebuffer from before this fragment
2673 shader invocation. May return the same value from multiple calls for
2674 a particular output within a single invocation. Note that result may
2675 be undefined if a fragment is drawn multiple times without a blend
2679 .. _threadsyncopcodes:
2681 Inter-thread synchronization opcodes
2682 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2684 These opcodes are intended for communication between threads running
2685 within the same compute grid. For now they're only valid in compute
2688 .. opcode:: MFENCE - Memory fence
2690 Syntax: ``MFENCE resource``
2692 Example: ``MFENCE RES[0]``
2694 This opcode forces strong ordering between any memory access
2695 operations that affect the specified resource. This means that
2696 previous loads and stores (and only those) will be performed and
2697 visible to other threads before the program execution continues.
2700 .. opcode:: LFENCE - Load memory fence
2702 Syntax: ``LFENCE resource``
2704 Example: ``LFENCE RES[0]``
2706 Similar to MFENCE, but it only affects the ordering of memory loads.
2709 .. opcode:: SFENCE - Store memory fence
2711 Syntax: ``SFENCE resource``
2713 Example: ``SFENCE RES[0]``
2715 Similar to MFENCE, but it only affects the ordering of memory stores.
2718 .. opcode:: BARRIER - Thread group barrier
2722 This opcode suspends the execution of the current thread until all
2723 the remaining threads in the working group reach the same point of
2724 the program. Results are unspecified if any of the remaining
2725 threads terminates or never reaches an executed BARRIER instruction.
2727 .. opcode:: MEMBAR - Memory barrier
2731 This opcode waits for the completion of all memory accesses based on
2732 the type passed in. The type is an immediate bitfield with the following
2735 Bit 0: Shader storage buffers
2736 Bit 1: Atomic buffers
2738 Bit 3: Shared memory
2741 These may be passed in in any combination. An implementation is free to not
2742 distinguish between these as it sees fit. However these map to all the
2743 possibilities made available by GLSL.
2750 These opcodes provide atomic variants of some common arithmetic and
2751 logical operations. In this context atomicity means that another
2752 concurrent memory access operation that affects the same memory
2753 location is guaranteed to be performed strictly before or after the
2754 entire execution of the atomic operation. The resource may be a BUFFER,
2755 IMAGE, or MEMORY. In the case of an image, the offset works the same as for
2756 ``LOAD`` and ``STORE``, specified above. These atomic operations may
2757 only be used with 32-bit integer image formats.
2759 .. opcode:: ATOMUADD - Atomic integer addition
2761 Syntax: ``ATOMUADD dst, resource, offset, src``
2763 Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2765 The following operation is performed atomically:
2769 dst_x = resource[offset]
2771 resource[offset] = dst_x + src_x
2774 .. opcode:: ATOMXCHG - Atomic exchange
2776 Syntax: ``ATOMXCHG dst, resource, offset, src``
2778 Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2780 The following operation is performed atomically:
2784 dst_x = resource[offset]
2786 resource[offset] = src_x
2789 .. opcode:: ATOMCAS - Atomic compare-and-exchange
2791 Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2793 Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``
2795 The following operation is performed atomically:
2799 dst_x = resource[offset]
2801 resource[offset] = (dst_x == cmp_x ? src_x : dst_x)
2804 .. opcode:: ATOMAND - Atomic bitwise And
2806 Syntax: ``ATOMAND dst, resource, offset, src``
2808 Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2810 The following operation is performed atomically:
2814 dst_x = resource[offset]
2816 resource[offset] = dst_x \& src_x
2819 .. opcode:: ATOMOR - Atomic bitwise Or
2821 Syntax: ``ATOMOR dst, resource, offset, src``
2823 Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2825 The following operation is performed atomically:
2829 dst_x = resource[offset]
2831 resource[offset] = dst_x | src_x
2834 .. opcode:: ATOMXOR - Atomic bitwise Xor
2836 Syntax: ``ATOMXOR dst, resource, offset, src``
2838 Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2840 The following operation is performed atomically:
2844 dst_x = resource[offset]
2846 resource[offset] = dst_x \oplus src_x
2849 .. opcode:: ATOMUMIN - Atomic unsigned minimum
2851 Syntax: ``ATOMUMIN dst, resource, offset, src``
2853 Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2855 The following operation is performed atomically:
2859 dst_x = resource[offset]
2861 resource[offset] = (dst_x < src_x ? dst_x : src_x)
2864 .. opcode:: ATOMUMAX - Atomic unsigned maximum
2866 Syntax: ``ATOMUMAX dst, resource, offset, src``
2868 Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2870 The following operation is performed atomically:
2874 dst_x = resource[offset]
2876 resource[offset] = (dst_x > src_x ? dst_x : src_x)
2879 .. opcode:: ATOMIMIN - Atomic signed minimum
2881 Syntax: ``ATOMIMIN dst, resource, offset, src``
2883 Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2885 The following operation is performed atomically:
2889 dst_x = resource[offset]
2891 resource[offset] = (dst_x < src_x ? dst_x : src_x)
2894 .. opcode:: ATOMIMAX - Atomic signed maximum
2896 Syntax: ``ATOMIMAX dst, resource, offset, src``
2898 Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2900 The following operation is performed atomically:
2904 dst_x = resource[offset]
2906 resource[offset] = (dst_x > src_x ? dst_x : src_x)
2909 .. _interlaneopcodes:
2914 These opcodes reduce the given value across the shader invocations
2915 running in the current SIMD group. Every thread in the subgroup will receive
2916 the same result. The BALLOT operations accept a single-channel argument that
2917 is treated as a boolean and produce a 64-bit value.
2919 .. opcode:: VOTE_ANY - Value is set in any of the active invocations
2921 Syntax: ``VOTE_ANY dst, value``
2923 Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
2926 .. opcode:: VOTE_ALL - Value is set in all of the active invocations
2928 Syntax: ``VOTE_ALL dst, value``
2930 Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
2933 .. opcode:: VOTE_EQ - Value is the same in all of the active invocations
2935 Syntax: ``VOTE_EQ dst, value``
2937 Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
2940 .. opcode:: BALLOT - Lanemask of whether the value is set in each active
2943 Syntax: ``BALLOT dst, value``
2945 Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
2947 When the argument is a constant true, this produces a bitmask of active
2948 invocations. In fragment shaders, this can include helper invocations
2949 (invocations whose outputs and writes to memory are discarded, but which
2950 are used to compute derivatives).
2953 .. opcode:: READ_FIRST - Broadcast the value from the first active
2954 invocation to all active lanes
2956 Syntax: ``READ_FIRST dst, value``
2958 Example: ``READ_FIRST TEMP[0], TEMP[1]``
2961 .. opcode:: READ_INVOC - Retrieve the value from the given invocation
2962 (need not be uniform)
2964 Syntax: ``READ_INVOC dst, value, invocation``
2966 Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
2968 invocation.x controls the invocation number to read from for all channels.
2969 The invocation number must be the same across all active invocations in a
2970 sub-group; otherwise, the results are undefined.
2973 Explanation of symbols used
2974 ------------------------------
2981 :math:`|x|` Absolute value of `x`.
2983 :math:`\lceil x \rceil` Ceiling of `x`.
2985 clamp(x,y,z) Clamp x between y and z.
2986 (x < y) ? y : (x > z) ? z : x
2988 :math:`\lfloor x\rfloor` Floor of `x`.
2990 :math:`\log_2{x}` Logarithm of `x`, base 2.
2992 max(x,y) Maximum of x and y.
2995 min(x,y) Minimum of x and y.
2998 partialx(x) Derivative of x relative to fragment's X.
3000 partialy(x) Derivative of x relative to fragment's Y.
3002 pop() Pop from stack.
3004 :math:`x^y` `x` to the power `y`.
3006 push(x) Push x on stack.
3010 trunc(x) Truncate x, i.e. drop the fraction bits.
3017 discard Discard fragment.
3021 target Label of target instruction.
3032 Declares a register that is will be referenced as an operand in Instruction
3035 File field contains register file that is being declared and is one
3038 UsageMask field specifies which of the register components can be accessed
3039 and is one of TGSI_WRITEMASK.
3041 The Local flag specifies that a given value isn't intended for
3042 subroutine parameter passing and, as a result, the implementation
3043 isn't required to give any guarantees of it being preserved across
3044 subroutine boundaries. As it's merely a compiler hint, the
3045 implementation is free to ignore it.
3047 If Dimension flag is set to 1, a Declaration Dimension token follows.
3049 If Semantic flag is set to 1, a Declaration Semantic token follows.
3051 If Interpolate flag is set to 1, a Declaration Interpolate token follows.
3053 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
3055 If Array flag is set to 1, a Declaration Array token follows.
3058 ^^^^^^^^^^^^^^^^^^^^^^^^
3060 Declarations can optional have an ArrayID attribute which can be referred by
3061 indirect addressing operands. An ArrayID of zero is reserved and treated as
3062 if no ArrayID is specified.
3064 If an indirect addressing operand refers to a specific declaration by using
3065 an ArrayID only the registers in this declaration are guaranteed to be
3066 accessed, accessing any register outside this declaration results in undefined
3067 behavior. Note that for compatibility the effective index is zero-based and
3068 not relative to the specified declaration
3070 If no ArrayID is specified with an indirect addressing operand the whole
3071 register file might be accessed by this operand. This is strongly discouraged
3072 and will prevent packing of scalar/vec2 arrays and effective alias analysis.
3073 This is only legal for TEMP and CONST register files.
3075 Declaration Semantic
3076 ^^^^^^^^^^^^^^^^^^^^^^^^
3078 Vertex and fragment shader input and output registers may be labeled
3079 with semantic information consisting of a name and index.
3081 Follows Declaration token if Semantic bit is set.
3083 Since its purpose is to link a shader with other stages of the pipeline,
3084 it is valid to follow only those Declaration tokens that declare a register
3085 either in INPUT or OUTPUT file.
3087 SemanticName field contains the semantic name of the register being declared.
3088 There is no default value.
3090 SemanticIndex is an optional subscript that can be used to distinguish
3091 different register declarations with the same semantic name. The default value
3094 The meanings of the individual semantic names are explained in the following
3097 TGSI_SEMANTIC_POSITION
3098 """"""""""""""""""""""
3100 For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
3101 output register which contains the homogeneous vertex position in the clip
3102 space coordinate system. After clipping, the X, Y and Z components of the
3103 vertex will be divided by the W value to get normalized device coordinates.
3105 For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
3106 fragment shader input (or system value, depending on which one is
3107 supported by the driver) contains the fragment's window position. The X
3108 component starts at zero and always increases from left to right.
3109 The Y component starts at zero and always increases but Y=0 may either
3110 indicate the top of the window or the bottom depending on the fragment
3111 coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
3112 The Z coordinate ranges from 0 to 1 to represent depth from the front
3113 to the back of the Z buffer. The W component contains the interpolated
3114 reciprocal of the vertex position W component (corresponding to gl_Fragcoord,
3115 but unlike d3d10 which interpolates the same 1/w but then gives back
3116 the reciprocal of the interpolated value).
3118 Fragment shaders may also declare an output register with
3119 TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows
3120 the fragment shader to change the fragment's Z position.
3127 For vertex shader outputs or fragment shader inputs/outputs, this
3128 label indicates that the register contains an R,G,B,A color.
3130 Several shader inputs/outputs may contain colors so the semantic index
3131 is used to distinguish them. For example, color[0] may be the diffuse
3132 color while color[1] may be the specular color.
3134 This label is needed so that the flat/smooth shading can be applied
3135 to the right interpolants during rasterization.
3139 TGSI_SEMANTIC_BCOLOR
3140 """"""""""""""""""""
3142 Back-facing colors are only used for back-facing polygons, and are only valid
3143 in vertex shader outputs. After rasterization, all polygons are front-facing
3144 and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
3145 so all BCOLORs effectively become regular COLORs in the fragment shader.
3151 Vertex shader inputs and outputs and fragment shader inputs may be
3152 labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
3153 a fog coordinate. Typically, the fragment shader will use the fog coordinate
3154 to compute a fog blend factor which is used to blend the normal fragment color
3155 with a constant fog color. But fog coord really is just an ordinary vec4
3156 register like regular semantics.
3162 Vertex shader input and output registers may be labeled with
3163 TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size
3164 in the form (S, 0, 0, 1). The point size controls the width or diameter
3165 of points for rasterization. This label cannot be used in fragment
3168 When using this semantic, be sure to set the appropriate state in the
3169 :ref:`rasterizer` first.
3172 TGSI_SEMANTIC_TEXCOORD
3173 """"""""""""""""""""""
3175 Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3177 Vertex shader outputs and fragment shader inputs may be labeled with
3178 this semantic to make them replaceable by sprite coordinates via the
3179 sprite_coord_enable state in the :ref:`rasterizer`.
3180 The semantic index permitted with this semantic is limited to <= 7.
3182 If the driver does not support TEXCOORD, sprite coordinate replacement
3183 applies to inputs with the GENERIC semantic instead.
3185 The intended use case for this semantic is gl_TexCoord.
3188 TGSI_SEMANTIC_PCOORD
3189 """"""""""""""""""""
3191 Only available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3193 Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
3194 that the register contains sprite coordinates in the form (x, y, 0, 1), if
3195 the current primitive is a point and point sprites are enabled. Otherwise,
3196 the contents of the register are undefined.
3198 The intended use case for this semantic is gl_PointCoord.
3201 TGSI_SEMANTIC_GENERIC
3202 """""""""""""""""""""
3204 All vertex/fragment shader inputs/outputs not labeled with any other
3205 semantic label can be considered to be generic attributes. Typical
3206 uses of generic inputs/outputs are texcoords and user-defined values.
3209 TGSI_SEMANTIC_NORMAL
3210 """"""""""""""""""""
3212 Indicates that a vertex shader input is a normal vector. This is
3213 typically only used for legacy graphics APIs.
3219 This label applies to fragment shader inputs (or system values,
3220 depending on which one is supported by the driver) and indicates that
3221 the register contains front/back-face information.
3223 If it is an input, it will be a floating-point vector in the form (F, 0, 0, 1),
3224 where F will be positive when the fragment belongs to a front-facing polygon,
3225 and negative when the fragment belongs to a back-facing polygon.
3227 If it is a system value, it will be an integer vector in the form (F, 0, 0, 1),
3228 where F is 0xffffffff when the fragment belongs to a front-facing polygon and
3229 0 when the fragment belongs to a back-facing polygon.
3232 TGSI_SEMANTIC_EDGEFLAG
3233 """"""""""""""""""""""
3235 For vertex shaders, this sematic label indicates that an input or
3236 output is a boolean edge flag. The register layout is [F, x, x, x]
3237 where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader
3238 simply copies the edge flag input to the edgeflag output.
3240 Edge flags are used to control which lines or points are actually
3241 drawn when the polygon mode converts triangles/quads/polygons into
3245 TGSI_SEMANTIC_STENCIL
3246 """""""""""""""""""""
3248 For fragment shaders, this semantic label indicates that an output
3249 is a writable stencil reference value. Only the Y component is writable.
3250 This allows the fragment shader to change the fragments stencilref value.
3253 TGSI_SEMANTIC_VIEWPORT_INDEX
3254 """"""""""""""""""""""""""""
3256 For geometry shaders, this semantic label indicates that an output
3257 contains the index of the viewport (and scissor) to use.
3258 This is an integer value, and only the X component is used.
3260 If PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
3261 supported, then this semantic label can also be used in vertex or
3262 tessellation evaluation shaders, respectively. Only the value written in the
3263 last vertex processing stage is used.
3269 For geometry shaders, this semantic label indicates that an output
3270 contains the layer value to use for the color and depth/stencil surfaces.
3271 This is an integer value, and only the X component is used.
3272 (Also known as rendertarget array index.)
3274 If PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
3275 supported, then this semantic label can also be used in vertex or
3276 tessellation evaluation shaders, respectively. Only the value written in the
3277 last vertex processing stage is used.
3280 TGSI_SEMANTIC_CULLDIST
3281 """"""""""""""""""""""
3283 Used as distance to plane for performing application-defined culling
3284 of individual primitives against a plane. When components of vertex
3285 elements are given this label, these values are assumed to be a
3286 float32 signed distance to a plane. Primitives will be completely
3287 discarded if the plane distance for all of the vertices in the
3288 primitive are < 0. If a vertex has a cull distance of NaN, that
3289 vertex counts as "out" (as if its < 0);
3290 The limits on both clip and cull distances are bound
3291 by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
3292 the maximum number of components that can be used to hold the
3293 distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
3294 which specifies the maximum number of registers which can be
3295 annotated with those semantics.
3298 TGSI_SEMANTIC_CLIPDIST
3299 """"""""""""""""""""""
3301 Note this covers clipping and culling distances.
3303 When components of vertex elements are identified this way, these
3304 values are each assumed to be a float32 signed distance to a plane.
3307 Primitive setup only invokes rasterization on pixels for which
3308 the interpolated plane distances are >= 0.
3311 Primitives will be completely discarded if the plane distance
3312 for all of the vertices in the primitive are < 0.
3313 If a vertex has a cull distance of NaN, that vertex counts as "out"
3316 Multiple clip/cull planes can be implemented simultaneously, by
3317 annotating multiple components of one or more vertex elements with
3318 the above specified semantic.
3319 The limits on both clip and cull distances are bound
3320 by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
3321 the maximum number of components that can be used to hold the
3322 distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
3323 which specifies the maximum number of registers which can be
3324 annotated with those semantics.
3325 The properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
3326 are used to divide up the 2 x vec4 space between clipping and culling.
3328 TGSI_SEMANTIC_SAMPLEID
3329 """"""""""""""""""""""
3331 For fragment shaders, this semantic label indicates that a system value
3332 contains the current sample id (i.e. gl_SampleID) as an unsigned int.
3333 Only the X component is used. If per-sample shading is not enabled,
3334 the result is (0, undef, undef, undef).
3336 Note that if the fragment shader uses this system value, the fragment
3337 shader is automatically executed at per sample frequency.
3339 TGSI_SEMANTIC_SAMPLEPOS
3340 """""""""""""""""""""""
3342 For fragment shaders, this semantic label indicates that a system
3343 value contains the current sample's position as float4(x, y, undef, undef)
3344 in the render target (i.e. gl_SamplePosition) when per-fragment shading
3345 is in effect. Position values are in the range [0, 1] where 0.5 is
3346 the center of the fragment.
3348 Note that if the fragment shader uses this system value, the fragment
3349 shader is automatically executed at per sample frequency.
3351 TGSI_SEMANTIC_SAMPLEMASK
3352 """"""""""""""""""""""""
3354 For fragment shaders, this semantic label can be applied to either a
3355 shader system value input or output.
3357 For a system value, the sample mask indicates the set of samples covered by
3358 the current primitive. If MSAA is not enabled, the value is (1, 0, 0, 0).
3360 For an output, the sample mask is used to disable further sample processing.
3362 For both, the register type is uint[4] but only the X component is used
3363 (i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
3364 to 32x MSAA is supported).
3366 TGSI_SEMANTIC_INVOCATIONID
3367 """"""""""""""""""""""""""
3369 For geometry shaders, this semantic label indicates that a system value
3370 contains the current invocation id (i.e. gl_InvocationID).
3371 This is an integer value, and only the X component is used.
3373 TGSI_SEMANTIC_INSTANCEID
3374 """"""""""""""""""""""""
3376 For vertex shaders, this semantic label indicates that a system value contains
3377 the current instance id (i.e. gl_InstanceID). It does not include the base
3378 instance. This is an integer value, and only the X component is used.
3380 TGSI_SEMANTIC_VERTEXID
3381 """"""""""""""""""""""
3383 For vertex shaders, this semantic label indicates that a system value contains
3384 the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
3385 base vertex. This is an integer value, and only the X component is used.
3387 TGSI_SEMANTIC_VERTEXID_NOBASE
3388 """""""""""""""""""""""""""""""
3390 For vertex shaders, this semantic label indicates that a system value contains
3391 the current vertex id without including the base vertex (this corresponds to
3392 d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
3393 == TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
3396 TGSI_SEMANTIC_BASEVERTEX
3397 """"""""""""""""""""""""
3399 For vertex shaders, this semantic label indicates that a system value contains
3400 the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
3401 this contains the first (or start) value instead.
3402 This is an integer value, and only the X component is used.
3404 TGSI_SEMANTIC_PRIMID
3405 """"""""""""""""""""
3407 For geometry and fragment shaders, this semantic label indicates the value
3408 contains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
3409 and only the X component is used.
3410 FIXME: This right now can be either a ordinary input or a system value...
3416 For tessellation evaluation/control shaders, this semantic label indicates a
3417 generic per-patch attribute. Such semantics will not implicitly be per-vertex
3420 TGSI_SEMANTIC_TESSCOORD
3421 """""""""""""""""""""""
3423 For tessellation evaluation shaders, this semantic label indicates the
3424 coordinates of the vertex being processed. This is available in XYZ; W is
3427 TGSI_SEMANTIC_TESSOUTER
3428 """""""""""""""""""""""
3430 For tessellation evaluation/control shaders, this semantic label indicates the
3431 outer tessellation levels of the patch. Isoline tessellation will only have XY
3432 defined, triangle will have XYZ and quads will have XYZW defined. This
3433 corresponds to gl_TessLevelOuter.
3435 TGSI_SEMANTIC_TESSINNER
3436 """""""""""""""""""""""
3438 For tessellation evaluation/control shaders, this semantic label indicates the
3439 inner tessellation levels of the patch. The X value is only defined for
3440 triangle tessellation, while quads will have XY defined. This is entirely
3441 undefined for isoline tessellation.
3443 TGSI_SEMANTIC_VERTICESIN
3444 """"""""""""""""""""""""
3446 For tessellation evaluation/control shaders, this semantic label indicates the
3447 number of vertices provided in the input patch. Only the X value is defined.
3449 TGSI_SEMANTIC_HELPER_INVOCATION
3450 """""""""""""""""""""""""""""""
3452 For fragment shaders, this semantic indicates whether the current
3453 invocation is covered or not. Helper invocations are created in order
3454 to properly compute derivatives, however it may be desirable to skip
3455 some of the logic in those cases. See ``gl_HelperInvocation`` documentation.
3457 TGSI_SEMANTIC_BASEINSTANCE
3458 """"""""""""""""""""""""""
3460 For vertex shaders, the base instance argument supplied for this
3461 draw. This is an integer value, and only the X component is used.
3463 TGSI_SEMANTIC_DRAWID
3464 """"""""""""""""""""
3466 For vertex shaders, the zero-based index of the current draw in a
3467 ``glMultiDraw*`` invocation. This is an integer value, and only the X
3471 TGSI_SEMANTIC_WORK_DIM
3472 """"""""""""""""""""""
3474 For compute shaders started via opencl this retrieves the work_dim
3475 parameter to the clEnqueueNDRangeKernel call with which the shader
3479 TGSI_SEMANTIC_GRID_SIZE
3480 """""""""""""""""""""""
3482 For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3483 of a grid of thread blocks.
3486 TGSI_SEMANTIC_BLOCK_ID
3487 """"""""""""""""""""""
3489 For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3490 current block inside of the grid.
3493 TGSI_SEMANTIC_BLOCK_SIZE
3494 """"""""""""""""""""""""
3496 For compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3497 of a block in threads.
3500 TGSI_SEMANTIC_THREAD_ID
3501 """""""""""""""""""""""
3503 For compute shaders, this semantic indicates the (x, y, z) coordinates of the
3504 current thread inside of the block.
3507 TGSI_SEMANTIC_SUBGROUP_SIZE
3508 """""""""""""""""""""""""""
3510 This semantic indicates the subgroup size for the current invocation. This is
3511 an integer of at most 64, as it indicates the width of lanemasks. It does not
3512 depend on the number of invocations that are active.
3515 TGSI_SEMANTIC_SUBGROUP_INVOCATION
3516 """""""""""""""""""""""""""""""""
3518 The index of the current invocation within its subgroup.
3521 TGSI_SEMANTIC_SUBGROUP_EQ_MASK
3522 """"""""""""""""""""""""""""""
3524 A bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3525 ``1 << subgroup_invocation`` in arbitrary precision arithmetic.
3528 TGSI_SEMANTIC_SUBGROUP_GE_MASK
3529 """"""""""""""""""""""""""""""
3531 A bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3532 ``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
3533 in arbitrary precision arithmetic.
3536 TGSI_SEMANTIC_SUBGROUP_GT_MASK
3537 """"""""""""""""""""""""""""""
3539 A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3540 ``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
3541 in arbitrary precision arithmetic.
3544 TGSI_SEMANTIC_SUBGROUP_LE_MASK
3545 """"""""""""""""""""""""""""""
3547 A bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3548 ``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
3551 TGSI_SEMANTIC_SUBGROUP_LT_MASK
3552 """"""""""""""""""""""""""""""
3554 A bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3555 ``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
3558 Declaration Interpolate
3559 ^^^^^^^^^^^^^^^^^^^^^^^
3561 This token is only valid for fragment shader INPUT declarations.
3563 The Interpolate field specifes the way input is being interpolated by
3564 the rasteriser and is one of TGSI_INTERPOLATE_*.
3566 The Location field specifies the location inside the pixel that the
3567 interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
3568 when per-sample shading is enabled, the implementation may choose to
3569 interpolate at the sample irrespective of the Location field.
3571 The CylindricalWrap bitfield specifies which register components
3572 should be subject to cylindrical wrapping when interpolating by the
3573 rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component
3574 should be interpolated according to cylindrical wrapping rules.
3577 Declaration Sampler View
3578 ^^^^^^^^^^^^^^^^^^^^^^^^
3580 Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
3582 DCL SVIEW[#], resource, type(s)
3584 Declares a shader input sampler view and assigns it to a SVIEW[#]
3587 resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
3589 type must be 1 or 4 entries (if specifying on a per-component
3590 level) out of UNORM, SNORM, SINT, UINT and FLOAT.
3592 For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
3593 which take an explicit SVIEW[#] source register), there may be optionally
3594 SVIEW[#] declarations. In this case, the SVIEW index is implied by the
3595 SAMP index, and there must be a corresponding SVIEW[#] declaration for
3596 each SAMP[#] declaration. Drivers are free to ignore this if they wish.
3597 But note in particular that some drivers need to know the sampler type
3598 (float/int/unsigned) in order to generate the correct code, so cases
3599 where integer textures are sampled, SVIEW[#] declarations should be
3602 NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
3605 Declaration Resource
3606 ^^^^^^^^^^^^^^^^^^^^
3608 Follows Declaration token if file is TGSI_FILE_RESOURCE.
3610 DCL RES[#], resource [, WR] [, RAW]
3612 Declares a shader input resource and assigns it to a RES[#]
3615 resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
3618 If the RAW keyword is not specified, the texture data will be
3619 subject to conversion, swizzling and scaling as required to yield
3620 the specified data type from the physical data format of the bound
3623 If the RAW keyword is specified, no channel conversion will be
3624 performed: the values read for each of the channels (X,Y,Z,W) will
3625 correspond to consecutive words in the same order and format
3626 they're found in memory. No element-to-address conversion will be
3627 performed either: the value of the provided X coordinate will be
3628 interpreted in byte units instead of texel units. The result of
3629 accessing a misaligned address is undefined.
3631 Usage of the STORE opcode is only allowed if the WR (writable) flag
3636 ^^^^^^^^^^^^^^^^^^^^^^^^
3638 Properties are general directives that apply to the whole TGSI program.
3643 Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
3644 The default value is UPPER_LEFT.
3646 If UPPER_LEFT, the position will be (0,0) at the upper left corner and
3647 increase downward and rightward.
3648 If LOWER_LEFT, the position will be (0,0) at the lower left corner and
3649 increase upward and rightward.
3651 OpenGL defaults to LOWER_LEFT, and is configurable with the
3652 GL_ARB_fragment_coord_conventions extension.
3654 DirectX 9/10 use UPPER_LEFT.
3656 FS_COORD_PIXEL_CENTER
3657 """""""""""""""""""""
3659 Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
3660 The default value is HALF_INTEGER.
3662 If HALF_INTEGER, the fractionary part of the position will be 0.5
3663 If INTEGER, the fractionary part of the position will be 0.0
3665 Note that this does not affect the set of fragments generated by
3666 rasterization, which is instead controlled by half_pixel_center in the
3669 OpenGL defaults to HALF_INTEGER, and is configurable with the
3670 GL_ARB_fragment_coord_conventions extension.
3672 DirectX 9 uses INTEGER.
3673 DirectX 10 uses HALF_INTEGER.
3675 FS_COLOR0_WRITES_ALL_CBUFS
3676 """"""""""""""""""""""""""
3677 Specifies that writes to the fragment shader color 0 are replicated to all
3678 bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
3679 fragData is directed to a single color buffer, but fragColor is broadcast.
3682 """"""""""""""""""""""""""
3683 If this property is set on the program bound to the shader stage before the
3684 fragment shader, user clip planes should have no effect (be disabled) even if
3685 that shader does not write to any clip distance outputs and the rasterizer's
3686 clip_plane_enable is non-zero.
3687 This property is only supported by drivers that also support shader clip
3689 This is useful for APIs that don't have UCPs and where clip distances written
3690 by a shader cannot be disabled.
3695 Specifies the number of times a geometry shader should be executed for each
3696 input primitive. Each invocation will have a different
3697 TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
3700 VS_WINDOW_SPACE_POSITION
3701 """"""""""""""""""""""""""
3702 If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
3703 is assumed to contain window space coordinates.
3704 Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
3705 directly taken from the 4-th component of the shader output.
3706 Naturally, clipping is not performed on window coordinates either.
3707 The effect of this property is undefined if a geometry or tessellation shader
3713 The number of vertices written by the tessellation control shader. This
3714 effectively defines the patch input size of the tessellation evaluation shader
3720 This sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``,
3721 ``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no
3722 separate isolines settings, the regular lines is assumed to mean isolines.)
3727 This sets the spacing mode of the tessellation generator, one of
3728 ``PIPE_TESS_SPACING_*``.
3733 This sets the vertex order to be clockwise if the value is 1, or
3734 counter-clockwise if set to 0.
3739 If set to a non-zero value, this turns on point mode for the tessellator,
3740 which means that points will be generated instead of primitives.
3742 NUM_CLIPDIST_ENABLED
3743 """"""""""""""""""""
3745 How many clip distance scalar outputs are enabled.
3747 NUM_CULLDIST_ENABLED
3748 """"""""""""""""""""
3750 How many cull distance scalar outputs are enabled.
3752 FS_EARLY_DEPTH_STENCIL
3753 """"""""""""""""""""""
3755 Whether depth test, stencil test, and occlusion query should run before
3756 the fragment shader (regardless of fragment shader side effects). Corresponds
3757 to GLSL early_fragment_tests.
3762 Which shader stage will MOST LIKELY follow after this shader when the shader
3763 is bound. This is only a hint to the driver and doesn't have to be precise.
3764 Only set for VS and TES.
3766 CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
3767 """""""""""""""""""""""""""""""""""""
3769 Threads per block in each dimension, if known at compile time. If the block size
3770 is known all three should be at least 1. If it is unknown they should all be set
3776 The MUL TGSI operation (FP32 multiplication) will return 0 if either
3777 of the operands are equal to 0. That means that 0 * Inf = 0. This
3778 should be set the same way for an entire pipeline. Note that this
3779 applies not only to the literal MUL TGSI opcode, but all FP32
3780 multiplications implied by other operations, such as MAD, FMA, DP2,
3781 DP3, DP4, DPH, DST, LOG, LRP, XPD, and possibly others. If there is a
3782 mismatch between shaders, then it is unspecified whether this behavior
3785 FS_POST_DEPTH_COVERAGE
3786 """"""""""""""""""""""
3788 When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
3789 that have failed the depth/stencil tests. This is only valid when
3790 FS_EARLY_DEPTH_STENCIL is also specified.
3793 Texture Sampling and Texture Formats
3794 ------------------------------------
3796 This table shows how texture image components are returned as (x,y,z,w) tuples
3797 by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
3798 :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
3801 +--------------------+--------------+--------------------+--------------+
3802 | Texture Components | Gallium | OpenGL | Direct3D 9 |
3803 +====================+==============+====================+==============+
3804 | R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) |
3805 +--------------------+--------------+--------------------+--------------+
3806 | RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) |
3807 +--------------------+--------------+--------------------+--------------+
3808 | RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) |
3809 +--------------------+--------------+--------------------+--------------+
3810 | RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) |
3811 +--------------------+--------------+--------------------+--------------+
3812 | A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) |
3813 +--------------------+--------------+--------------------+--------------+
3814 | L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) |
3815 +--------------------+--------------+--------------------+--------------+
3816 | LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) |
3817 +--------------------+--------------+--------------------+--------------+
3818 | I | (i, i, i, i) | (i, i, i, i) | N/A |
3819 +--------------------+--------------+--------------------+--------------+
3820 | UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) |
3821 | | | [#envmap-bumpmap]_ | |
3822 +--------------------+--------------+--------------------+--------------+
3823 | Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) |
3824 | | | [#depth-tex-mode]_ | |
3825 +--------------------+--------------+--------------------+--------------+
3826 | S | (s, s, s, s) | unknown | unknown |
3827 +--------------------+--------------+--------------------+--------------+
3829 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
3830 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
3831 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.