4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5 for describing shaders. Since Gallium is inherently shaderful, shaders are
6 an important part of the API. TGSI is the only intermediate representation
12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13 floating-point four-component vectors. An opcode may have up to one
14 destination register, known as *dst*, and between zero and three source
15 registers, called *src0* through *src2*, or simply *src* if there is only
18 Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19 components as integers. Other instructions permit using registers as
20 two-component vectors with double precision; see :ref:`doubleopcodes`.
22 When an instruction has a scalar result, the result is usually copied into
23 each of the components of *dst*. When this happens, the result is said to be
24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
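A minimal sketch of replication, assuming a register is modelled as a Python 4-tuple. The helper name ``rcp`` is illustrative and not part of any TGSI API:

```python
def rcp(src):
    """Model RCP: compute 1/src.x once and replicate it to every component."""
    x = src[0]
    result = 1.0 / x
    return (result, result, result, result)


dst = rcp((4.0, 2.0, 1.0, 0.5))
print(dst)
```

Only ``src.x`` influences the result; the other source components are ignored.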
TGSI supports modifiers on inputs (as well as a saturate modifier on instructions).
31 For inputs which have a floating point type, both absolute value and negation
32 modifiers are supported (with absolute value being applied first).
33 TGSI_OPCODE_MOV is considered to have float input type for applying modifiers.
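The ordering of the float modifiers can be sketched as follows. The function name and keyword arguments are hypothetical, chosen only for illustration:

```python
def apply_float_modifiers(value, absolute=False, negate=False):
    """Apply source modifiers to one float component.

    Absolute value is applied first, then negation, so enabling both
    always yields a non-positive result.
    """
    if absolute:
        value = abs(value)
    if negate:
        value = -value
    return value


print(apply_float_modifiers(-3.0, absolute=True, negate=True))
```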
35 For inputs which have signed or unsigned type only the negate modifier is
42 ^^^^^^^^^^^^^^^^^^^^^^^^^
44 These opcodes are guaranteed to be available regardless of the driver being
47 .. opcode:: ARL - Address Register Load
51 dst.x = (int) \lfloor src.x\rfloor
53 dst.y = (int) \lfloor src.y\rfloor
55 dst.z = (int) \lfloor src.z\rfloor
57 dst.w = (int) \lfloor src.w\rfloor
60 .. opcode:: MOV - Move
73 .. opcode:: LIT - Light Coefficients
78 dst.y &= max(src.x, 0) \\
dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0 \\
83 .. opcode:: RCP - Reciprocal
85 This instruction replicates its result.
92 .. opcode:: RSQ - Reciprocal Square Root
94 This instruction replicates its result. The results are undefined for src <= 0.
98 dst = \frac{1}{\sqrt{src.x}}
101 .. opcode:: SQRT - Square Root
103 This instruction replicates its result. The results are undefined for src < 0.
110 .. opcode:: EXP - Approximate Exponential Base 2
114 dst.x &= 2^{\lfloor src.x\rfloor} \\
115 dst.y &= src.x - \lfloor src.x\rfloor \\
116 dst.z &= 2^{src.x} \\
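As a rough sketch of how the three components shown for EXP relate, assuming ordinary double-precision arithmetic rather than a hardware approximation (the function name ``exp_components`` is illustrative):

```python
import math


def exp_components(x):
    """Return (dst.x, dst.y, dst.z) of EXP for a scalar src.x."""
    floor_x = math.floor(x)
    dst_x = 2.0 ** floor_x   # power of two of the integer part
    dst_y = x - floor_x      # fractional part, in [0, 1)
    dst_z = 2.0 ** x         # full result
    return (dst_x, dst_y, dst_z)


print(exp_components(2.5))
```

The components satisfy ``dst.x * 2**dst.y == dst.z`` up to rounding, which is why a cheap implementation can derive the full result from the two partial ones.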
120 .. opcode:: LOG - Approximate Logarithm Base 2
124 dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
125 dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
126 dst.z &= \log_2{|src.x|} \\
130 .. opcode:: MUL - Multiply
134 dst.x = src0.x \times src1.x
136 dst.y = src0.y \times src1.y
138 dst.z = src0.z \times src1.z
140 dst.w = src0.w \times src1.w
143 .. opcode:: ADD - Add
147 dst.x = src0.x + src1.x
149 dst.y = src0.y + src1.y
151 dst.z = src0.z + src1.z
153 dst.w = src0.w + src1.w
156 .. opcode:: DP3 - 3-component Dot Product
158 This instruction replicates its result.
162 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
165 .. opcode:: DP4 - 4-component Dot Product
167 This instruction replicates its result.
171 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
174 .. opcode:: DST - Distance Vector
179 dst.y &= src0.y \times src1.y\\
184 .. opcode:: MIN - Minimum
188 dst.x = min(src0.x, src1.x)
190 dst.y = min(src0.y, src1.y)
192 dst.z = min(src0.z, src1.z)
194 dst.w = min(src0.w, src1.w)
197 .. opcode:: MAX - Maximum
201 dst.x = max(src0.x, src1.x)
203 dst.y = max(src0.y, src1.y)
205 dst.z = max(src0.z, src1.z)
207 dst.w = max(src0.w, src1.w)
210 .. opcode:: SLT - Set On Less Than
214 dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
216 dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
218 dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
220 dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
223 .. opcode:: SGE - Set On Greater Equal Than
227 dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
229 dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
231 dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
233 dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
236 .. opcode:: MAD - Multiply And Add
240 dst.x = src0.x \times src1.x + src2.x
242 dst.y = src0.y \times src1.y + src2.y
244 dst.z = src0.z \times src1.z + src2.z
246 dst.w = src0.w \times src1.w + src2.w
249 .. opcode:: SUB - Subtract
253 dst.x = src0.x - src1.x
255 dst.y = src0.y - src1.y
257 dst.z = src0.z - src1.z
259 dst.w = src0.w - src1.w
262 .. opcode:: LRP - Linear Interpolate
266 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
268 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
270 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
272 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
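A small per-component sketch of LRP, assuming registers are 4-tuples of floats (the helper name ``lrp`` is illustrative). Note that src0 is the interpolation weight: a weight of 1 selects src1 and a weight of 0 selects src2.

```python
def lrp(src0, src1, src2):
    """Per-component linear interpolation: src0 * src1 + (1 - src0) * src2."""
    return tuple(w * a + (1 - w) * b for w, a, b in zip(src0, src1, src2))


weights = (0.0, 1.0, 0.5, 0.25)
print(lrp(weights, (10.0,) * 4, (20.0,) * 4))
```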
275 .. opcode:: FMA - Fused Multiply-Add
277 Perform a * b + c with no intermediate rounding step.
281 dst.x = src0.x \times src1.x + src2.x
283 dst.y = src0.y \times src1.y + src2.y
285 dst.z = src0.z \times src1.z + src2.z
287 dst.w = src0.w \times src1.w + src2.w
290 .. opcode:: DP2A - 2-component Dot Product And Add
294 dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
296 dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
298 dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
300 dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
303 .. opcode:: FRC - Fraction
307 dst.x = src.x - \lfloor src.x\rfloor
309 dst.y = src.y - \lfloor src.y\rfloor
311 dst.z = src.z - \lfloor src.z\rfloor
313 dst.w = src.w - \lfloor src.w\rfloor
316 .. opcode:: CLAMP - Clamp
320 dst.x = clamp(src0.x, src1.x, src2.x)
322 dst.y = clamp(src0.y, src1.y, src2.y)
324 dst.z = clamp(src0.z, src1.z, src2.z)
326 dst.w = clamp(src0.w, src1.w, src2.w)
329 .. opcode:: FLR - Floor
333 dst.x = \lfloor src.x\rfloor
335 dst.y = \lfloor src.y\rfloor
337 dst.z = \lfloor src.z\rfloor
339 dst.w = \lfloor src.w\rfloor
342 .. opcode:: ROUND - Round
355 .. opcode:: EX2 - Exponential Base 2
357 This instruction replicates its result.
364 .. opcode:: LG2 - Logarithm Base 2
366 This instruction replicates its result.
373 .. opcode:: POW - Power
375 This instruction replicates its result.
379 dst = src0.x^{src1.x}
381 .. opcode:: XPD - Cross Product
385 dst.x = src0.y \times src1.z - src1.y \times src0.z
387 dst.y = src0.z \times src1.x - src1.z \times src0.x
389 dst.z = src0.x \times src1.y - src1.x \times src0.y
394 .. opcode:: ABS - Absolute
407 .. opcode:: DPH - Homogeneous Dot Product
409 This instruction replicates its result.
413 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
416 .. opcode:: COS - Cosine
418 This instruction replicates its result.
425 .. opcode:: DDX, DDX_FINE - Derivative Relative To X
427 The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
428 advertised. When it is, the fine version guarantees one derivative per row
429 while DDX is allowed to be the same for the entire 2x2 quad.
433 dst.x = partialx(src.x)
435 dst.y = partialx(src.y)
437 dst.z = partialx(src.z)
439 dst.w = partialx(src.w)
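The difference between the two variants can be sketched on a single 2x2 quad. This assumes derivatives are formed as differences between horizontally adjacent pixels, which is a common approach but not something this document mandates; the function names and the ``quad[row][column]`` layout are illustrative.

```python
def ddx_fine(quad):
    """One derivative per row of the 2x2 quad."""
    return [[row[1] - row[0]] * 2 for row in quad]


def ddx_coarse(quad):
    """One permitted coarse result: the top-row derivative for the whole quad."""
    d = quad[0][1] - quad[0][0]
    return [[d, d], [d, d]]


quad = [[1.0, 3.0],
        [10.0, 15.0]]
print(ddx_fine(quad))
print(ddx_coarse(quad))
```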
442 .. opcode:: DDY, DDY_FINE - Derivative Relative To Y
444 The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
445 advertised. When it is, the fine version guarantees one derivative per column
446 while DDY is allowed to be the same for the entire 2x2 quad.
450 dst.x = partialy(src.x)
452 dst.y = partialy(src.y)
454 dst.z = partialy(src.z)
456 dst.w = partialy(src.w)
459 .. opcode:: PK2H - Pack Two 16-bit Floats
464 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
469 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
474 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
479 .. opcode:: SEQ - Set On Equal
483 dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
485 dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
487 dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
489 dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
492 .. opcode:: SGT - Set On Greater Than
496 dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
498 dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
500 dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
502 dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
505 .. opcode:: SIN - Sine
507 This instruction replicates its result.
514 .. opcode:: SLE - Set On Less Equal Than
518 dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
520 dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
522 dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
524 dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
527 .. opcode:: SNE - Set On Not Equal
531 dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
533 dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
535 dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
537 dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
540 .. opcode:: TEX - Texture Lookup
For array textures, src0.y contains the slice for 1D arrays and src0.z
contains the slice for 2D arrays.

For shadow textures without arrays (and not cube maps),
src0.z contains the reference value.

For shadow textures with arrays, src0.z contains
the reference value for 1D arrays, and src0.w contains
the reference value for 2D arrays and cube maps.

For cube map array shadow textures, the reference value
cannot be passed in src0.w, and TEX2 must be used instead.
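The shadow reference rules above can be summarised as a lookup. This is a sketch only: the target strings are illustrative names modelled on the ``TGSI_TEXTURE_*`` shadow targets, and the helper is hypothetical.

```python
# Component of src0 that carries the shadow reference value for TEX.
SHADOW_REF_COMPONENT = {
    "SHADOW1D": "z",
    "SHADOW2D": "z",
    "SHADOWRECT": "z",
    "SHADOW1D_ARRAY": "z",
    "SHADOW2D_ARRAY": "w",
    "SHADOWCUBE": "w",
}


def tex_shadow_ref_component(target):
    """Return which src0 component holds the reference value for TEX."""
    if target == "SHADOWCUBE_ARRAY":
        # All four components are taken by coordinates and the slice.
        raise ValueError("shadow cube map arrays must use TEX2")
    return SHADOW_REF_COMPONENT[target]


print(tex_shadow_ref_component("SHADOW2D_ARRAY"))
```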
559 shadow_ref = src0.z or src0.w (optional)
563 dst = texture\_sample(unit, coord, shadow_ref)
566 .. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
568 this is the same as TEX, but uses another reg to encode the
579 dst = texture\_sample(unit, coord, shadow_ref)
584 .. opcode:: TXD - Texture Lookup with Derivatives
596 dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
599 .. opcode:: TXP - Projective Texture Lookup
603 coord.x = src0.x / src0.w
605 coord.y = src0.y / src0.w
607 coord.z = src0.z / src0.w
613 dst = texture\_sample(unit, coord)
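The projective divide can be sketched as below, covering only the three coordinate components shown above. The helper name ``txp_coord`` is illustrative.

```python
def txp_coord(src0):
    """Divide x, y and z by w before the texture lookup."""
    x, y, z, w = src0
    return (x / w, y / w, z / w)


print(txp_coord((2.0, 4.0, 6.0, 2.0)))
```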
616 .. opcode:: UP2H - Unpack Two 16-Bit Floats
622 Considered for removal.
624 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
630 Considered for removal.
632 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
638 Considered for removal.
640 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
646 Considered for removal.
649 .. opcode:: ARR - Address Register Load With Round
653 dst.x = (int) round(src.x)
655 dst.y = (int) round(src.y)
657 dst.z = (int) round(src.z)
659 dst.w = (int) round(src.w)
662 .. opcode:: SSG - Set Sign
666 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
668 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
670 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
672 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
675 .. opcode:: CMP - Compare
679 dst.x = (src0.x < 0) ? src1.x : src2.x
681 dst.y = (src0.y < 0) ? src1.y : src2.y
683 dst.z = (src0.z < 0) ? src1.z : src2.z
685 dst.w = (src0.w < 0) ? src1.w : src2.w
688 .. opcode:: KILL_IF - Conditional Discard
690 Conditional discard. Allowed in fragment shaders only.
694 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
699 .. opcode:: KILL - Discard
701 Unconditional discard. Allowed in fragment shaders only.
704 .. opcode:: SCS - Sine Cosine
717 .. opcode:: TXB - Texture Lookup With Bias
For cube map array textures and shadow cube maps, the bias value
cannot be passed in src0.w, and TXB2 must be used instead.

If the target is a shadow texture, the reference value is always
in src.z (this prevents shadow 3D and shadow 2D arrays from
using this instruction, but this is not needed).
740 dst = texture\_sample(unit, coord, bias)
743 .. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
This is the same as TXB, but uses another register to encode the
LOD bias value for cube map arrays and shadow cube maps.
Presumably shadow 2D arrays and shadow 3D targets could use
this encoding too, but this is not legal.

Shadow cube map arrays are neither possible nor required.
760 dst = texture\_sample(unit, coord, bias)
763 .. opcode:: DIV - Divide
767 dst.x = \frac{src0.x}{src1.x}
769 dst.y = \frac{src0.y}{src1.y}
771 dst.z = \frac{src0.z}{src1.z}
773 dst.w = \frac{src0.w}{src1.w}
776 .. opcode:: DP2 - 2-component Dot Product
778 This instruction replicates its result.
782 dst = src0.x \times src1.x + src0.y \times src1.y
785 .. opcode:: TXL - Texture Lookup With explicit LOD
For cube map array textures, the explicit LOD value
cannot be passed in src0.w, and TXL2 must be used instead.

If the target is a shadow texture, the reference value is always
in src.z (this prevents shadow 3D / 2D array / cube targets from
using this instruction, but this is not needed).
808 dst = texture\_sample(unit, coord, lod)
811 .. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)
813 this is the same as TXL, but uses another reg to encode the
Presumably shadow 3D / 2D array / cube targets could use
this encoding too, but this is not legal.

Shadow cube map arrays are neither possible nor required.
828 dst = texture\_sample(unit, coord, lod)
831 .. opcode:: PUSHA - Push Address Register On Stack
840 Considered for cleanup.
844 Considered for removal.
846 .. opcode:: POPA - Pop Address Register From Stack
855 Considered for cleanup.
859 Considered for removal.
862 .. opcode:: CALLNZ - Subroutine Call If Not Zero
868 Considered for cleanup.
872 Considered for removal.
876 ^^^^^^^^^^^^^^^^^^^^^^^^
878 These opcodes are primarily provided for special-use computational shaders.
Support for these opcodes is indicated by a special pipe capability bit (TBD).
881 XXX doesn't look like most of the opcodes really belong here.
883 .. opcode:: CEIL - Ceiling
887 dst.x = \lceil src.x\rceil
889 dst.y = \lceil src.y\rceil
891 dst.z = \lceil src.z\rceil
893 dst.w = \lceil src.w\rceil
896 .. opcode:: TRUNC - Truncate
909 .. opcode:: MOD - Modulus
913 dst.x = src0.x \bmod src1.x
915 dst.y = src0.y \bmod src1.y
917 dst.z = src0.z \bmod src1.z
919 dst.w = src0.w \bmod src1.w
922 .. opcode:: UARL - Integer Address Register Load
924 Moves the contents of the source register, assumed to be an integer, into the
925 destination register, which is assumed to be an address (ADDR) register.
928 .. opcode:: SAD - Sum Of Absolute Differences
932 dst.x = |src0.x - src1.x| + src2.x
934 dst.y = |src0.y - src1.y| + src2.y
936 dst.z = |src0.z - src1.z| + src2.z
938 dst.w = |src0.w - src1.w| + src2.w
941 .. opcode:: TXF - Texel Fetch
As per NV_gpu_shader4, extract a single texel from a specified texture
image. The source sampler may not be a CUBE or SHADOW target. src0 is a
four-component signed integer vector that identifies the single texel
accessed: three coordinate components plus the level. As with other texture
instructions, an optional offset vector may be provided, which is subject to
various driver restrictions (regarding range and the source of offsets).
The form is TXF(uint_vec coord, int_vec offset).
952 .. opcode:: TXQ - Texture Size Query
954 As per NV_gpu_program4, retrieve the dimensions of the texture depending on
955 the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
956 depth), 1D array (width, layers), 2D array (width, height, layers).
957 Also return the number of accessible levels (last_level - first_level + 1)
960 For components which don't return a resource dimension, their value
968 dst.x = texture\_width(unit, lod)
970 dst.y = texture\_height(unit, lod)
972 dst.z = texture\_depth(unit, lod)
974 dst.w = texture\_levels(unit)
976 .. opcode:: TG4 - Texture Gather
As per ARB_texture_gather, gathers the four texels to be used in a bi-linear
filtering operation and packs them into a single register. Only works with
2D, 2D array, cube map, and cube map array targets. For 2D textures, only the
addressing modes of the sampler and the top level of any mip pyramid are
used. Set W to zero. It behaves like the TEX instruction, but a filtered
sample is not generated. The four samples that contribute to filtering are
placed into xyzw in clockwise order, starting with the (u,v) texture
coordinate delta at the following locations: (-, +), (+, +), (+, -), (-, -),
where the magnitude of the deltas is half a texel.
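The sample ordering described above can be sketched as follows, assuming normalized coordinates and a texel size passed in explicitly. The function and parameter names are hypothetical.

```python
def tg4_sample_points(u, v, texel_w, texel_h):
    """Return the four gather locations in x, y, z, w order.

    The sign pattern (-, +), (+, +), (+, -), (-, -) is applied to a
    delta of half a texel in each direction.
    """
    half_w = texel_w / 2
    half_h = texel_h / 2
    signs = [(-1, +1), (+1, +1), (+1, -1), (-1, -1)]
    return [(u + sx * half_w, v + sy * half_h) for sx, sy in signs]


for point in tg4_sample_points(0.5, 0.5, 0.25, 0.25):
    print(point)
```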
PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
depth compares, single component selection, and a non-constant offset. It
doesn't allow support for the GL independent offset to get i0,j0. This would
require another CAP if the hardware can do it natively. For now we lower that before
1000 dst = texture\_gather4 (unit, coord, component)
1002 (with SM5 - cube array shadow)
dst = texture\_gather (unit, coord, compare)
1012 .. opcode:: LODQ - level of detail query
1014 Compute the LOD information that the texture pipe would use to access the
1015 texture. The Y component contains the computed LOD lambda_prime. The X
1016 component contains the LOD that will be accessed, based on min/max lod's
dst.xy = lodq(unit, coord)
1026 ^^^^^^^^^^^^^^^^^^^^^^^^
1027 These opcodes are used for integer operations.
Support for these opcodes is indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
1031 .. opcode:: I2F - Signed Integer To Float
1033 Rounding is unspecified (round to nearest even suggested).
1037 dst.x = (float) src.x
1039 dst.y = (float) src.y
1041 dst.z = (float) src.z
1043 dst.w = (float) src.w
1046 .. opcode:: U2F - Unsigned Integer To Float
1048 Rounding is unspecified (round to nearest even suggested).
1052 dst.x = (float) src.x
1054 dst.y = (float) src.y
1056 dst.z = (float) src.z
1058 dst.w = (float) src.w
1061 .. opcode:: F2I - Float to Signed Integer
1063 Rounding is towards zero (truncate).
1064 Values outside signed range (including NaNs) produce undefined results.
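Round-toward-zero differs from floor for negative inputs. A minimal sketch for in-range values (the helper name ``f2i`` is illustrative; out-of-range inputs and NaNs are deliberately not modelled because their results are undefined):

```python
import math


def f2i(value):
    """Model F2I for in-range inputs: int() discards the fractional part."""
    return int(value)


print(f2i(2.9), f2i(-2.9), math.floor(-2.9))
```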
1077 .. opcode:: F2U - Float to Unsigned Integer
1079 Rounding is towards zero (truncate).
1080 Values outside unsigned range (including NaNs) produce undefined results.
1084 dst.x = (unsigned) src.x
1086 dst.y = (unsigned) src.y
1088 dst.z = (unsigned) src.z
1090 dst.w = (unsigned) src.w
1093 .. opcode:: UADD - Integer Add
1095 This instruction works the same for signed and unsigned integers.
The low 32 bits of the result are returned.
1100 dst.x = src0.x + src1.x
1102 dst.y = src0.y + src1.y
1104 dst.z = src0.z + src1.z
1106 dst.w = src0.w + src1.w
1109 .. opcode:: UMAD - Integer Multiply And Add
1111 This instruction works the same for signed and unsigned integers.
The multiplication returns the low 32 bits (as does the result itself).
1116 dst.x = src0.x \times src1.x + src2.x
1118 dst.y = src0.y \times src1.y + src2.y
1120 dst.z = src0.z \times src1.z + src2.z
1122 dst.w = src0.w \times src1.w + src2.w
1125 .. opcode:: UMUL - Integer Multiply
1127 This instruction works the same for signed and unsigned integers.
The low 32 bits of the result are returned.
1132 dst.x = src0.x \times src1.x
1134 dst.y = src0.y \times src1.y
1136 dst.z = src0.z \times src1.z
1138 dst.w = src0.w \times src1.w
1141 .. opcode:: IMUL_HI - Signed Integer Multiply High Bits
The high 32 bits of the multiplication of two signed integers are returned.
1147 dst.x = (src0.x \times src1.x) >> 32
1149 dst.y = (src0.y \times src1.y) >> 32
1151 dst.z = (src0.z \times src1.z) >> 32
1153 dst.w = (src0.w \times src1.w) >> 32
1156 .. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
The high 32 bits of the multiplication of two unsigned integers are returned.
1162 dst.x = (src0.x \times src1.x) >> 32
1164 dst.y = (src0.y \times src1.y) >> 32
1166 dst.z = (src0.z \times src1.z) >> 32
1168 dst.w = (src0.w \times src1.w) >> 32
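Because Python integers are unbounded, the full 64-bit product is available directly, which makes the high-half opcodes easy to sketch. The helper names are illustrative; ``to_signed32`` reinterprets a 32-bit pattern as a signed value.

```python
def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= 0xffffffff
    return value - (1 << 32) if value & 0x80000000 else value


def imul_hi(a, b):
    """High 32 bits of the signed 64-bit product."""
    return to_signed32((to_signed32(a) * to_signed32(b)) >> 32)


def umul_hi(a, b):
    """High 32 bits of the unsigned 64-bit product."""
    return ((a & 0xffffffff) * (b & 0xffffffff)) >> 32


print(hex(umul_hi(0xffffffff, 0xffffffff)))
print(imul_hi(-1, 1))
```

The same bit patterns give different results in the two opcodes: ``0xffffffff`` is a large positive number for UMUL_HI but -1 for IMUL_HI.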
1171 .. opcode:: IDIV - Signed Integer Division
1173 TBD: behavior for division by zero.
dst.x = \frac{src0.x}{src1.x}
dst.y = \frac{src0.y}{src1.y}
dst.z = \frac{src0.z}{src1.z}
dst.w = \frac{src0.w}{src1.w}
1186 .. opcode:: UDIV - Unsigned Integer Division
1188 For division by zero, 0xffffffff is returned.
dst.x = \frac{src0.x}{src1.x}
dst.y = \frac{src0.y}{src1.y}
dst.z = \frac{src0.z}{src1.z}
dst.w = \frac{src0.w}{src1.w}
1201 .. opcode:: UMOD - Unsigned Integer Remainder
1203 If second arg is zero, 0xffffffff is returned.
dst.x = src0.x \bmod src1.x
dst.y = src0.y \bmod src1.y
dst.z = src0.z \bmod src1.z
dst.w = src0.w \bmod src1.w
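The zero-divisor behaviour stated for UDIV and UMOD can be sketched per component as below, assuming both inputs are already unsigned 32-bit values. The helper names are illustrative.

```python
ALL_ONES = 0xffffffff


def udiv(a, b):
    """Unsigned division; a zero divisor yields 0xffffffff."""
    return ALL_ONES if b == 0 else a // b


def umod(a, b):
    """Unsigned remainder; a zero divisor yields 0xffffffff."""
    return ALL_ONES if b == 0 else a % b


print(udiv(7, 2), umod(7, 2), hex(udiv(7, 0)), hex(umod(7, 0)))
```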
1216 .. opcode:: NOT - Bitwise Not
1229 .. opcode:: AND - Bitwise And
1233 dst.x = src0.x \& src1.x
1235 dst.y = src0.y \& src1.y
1237 dst.z = src0.z \& src1.z
1239 dst.w = src0.w \& src1.w
1242 .. opcode:: OR - Bitwise Or
1246 dst.x = src0.x | src1.x
1248 dst.y = src0.y | src1.y
1250 dst.z = src0.z | src1.z
1252 dst.w = src0.w | src1.w
1255 .. opcode:: XOR - Bitwise Xor
1259 dst.x = src0.x \oplus src1.x
1261 dst.y = src0.y \oplus src1.y
1263 dst.z = src0.z \oplus src1.z
1265 dst.w = src0.w \oplus src1.w
1268 .. opcode:: IMAX - Maximum of Signed Integers
1272 dst.x = max(src0.x, src1.x)
1274 dst.y = max(src0.y, src1.y)
1276 dst.z = max(src0.z, src1.z)
1278 dst.w = max(src0.w, src1.w)
1281 .. opcode:: UMAX - Maximum of Unsigned Integers
1285 dst.x = max(src0.x, src1.x)
1287 dst.y = max(src0.y, src1.y)
1289 dst.z = max(src0.z, src1.z)
1291 dst.w = max(src0.w, src1.w)
1294 .. opcode:: IMIN - Minimum of Signed Integers
1298 dst.x = min(src0.x, src1.x)
1300 dst.y = min(src0.y, src1.y)
1302 dst.z = min(src0.z, src1.z)
1304 dst.w = min(src0.w, src1.w)
1307 .. opcode:: UMIN - Minimum of Unsigned Integers
1311 dst.x = min(src0.x, src1.x)
1313 dst.y = min(src0.y, src1.y)
1315 dst.z = min(src0.z, src1.z)
1317 dst.w = min(src0.w, src1.w)
1320 .. opcode:: SHL - Shift Left
1322 The shift count is masked with 0x1f before the shift is applied.
1326 dst.x = src0.x << (0x1f \& src1.x)
1328 dst.y = src0.y << (0x1f \& src1.y)
1330 dst.z = src0.z << (0x1f \& src1.z)
1332 dst.w = src0.w << (0x1f \& src1.w)
1335 .. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1337 The shift count is masked with 0x1f before the shift is applied.
1341 dst.x = src0.x >> (0x1f \& src1.x)
1343 dst.y = src0.y >> (0x1f \& src1.y)
1345 dst.z = src0.z >> (0x1f \& src1.z)
1347 dst.w = src0.w >> (0x1f \& src1.w)
1350 .. opcode:: USHR - Logical Shift Right
1352 The shift count is masked with 0x1f before the shift is applied.
1356 dst.x = src0.x >> (unsigned) (0x1f \& src1.x)
1358 dst.y = src0.y >> (unsigned) (0x1f \& src1.y)
1360 dst.z = src0.z >> (unsigned) (0x1f \& src1.z)
1362 dst.w = src0.w >> (unsigned) (0x1f \& src1.w)
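The three shift opcodes can be sketched as follows. The helper names are illustrative, and 32-bit wraparound is emulated explicitly because Python integers do not overflow.

```python
MASK32 = 0xffffffff


def to_signed32(value):
    """Reinterpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def shl(a, count):
    """Shift left; bits shifted past bit 31 are discarded."""
    return (a << (count & 0x1f)) & MASK32


def ishr(a, count):
    """Arithmetic shift right: the sign bit is replicated."""
    return to_signed32(a) >> (count & 0x1f)


def ushr(a, count):
    """Logical shift right: zeros are shifted in."""
    return (a & MASK32) >> (count & 0x1f)


print(shl(1, 33), ishr(0x80000000, 31), ushr(0x80000000, 31))
```

A count of 33 behaves like a count of 1 because only the low five bits of the count are used.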
1365 .. opcode:: UCMP - Integer Conditional Move
1369 dst.x = src0.x ? src1.x : src2.x
1371 dst.y = src0.y ? src1.y : src2.y
1373 dst.z = src0.z ? src1.z : src2.z
1375 dst.w = src0.w ? src1.w : src2.w
1379 .. opcode:: ISSG - Integer Set Sign
1383 dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1385 dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1387 dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1389 dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1393 .. opcode:: FSLT - Float Set On Less Than (ordered)
1395 Same comparison as SLT but returns integer instead of 1.0/0.0 float
1399 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1401 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1403 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1405 dst.w = (src0.w < src1.w) ? \sim 0 : 0
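A sketch of why the all-ones result is convenient: it is nonzero, so it can drive an integer select such as UCMP directly. The helper names are illustrative, and NaN handling follows Python's ordered ``<`` comparison.

```python
import math

TRUE = 0xffffffff  # the value written as ~0 above, as a 32-bit pattern


def fslt(a, b):
    """Ordered float less-than returning an integer mask."""
    return TRUE if a < b else 0


def ucmp(cond, if_true, if_false):
    """Integer conditional move: any nonzero condition selects if_true."""
    return if_true if cond != 0 else if_false


mask = fslt(1.0, 2.0)
print(hex(mask), ucmp(mask, 10, 20))
print(fslt(math.nan, 1.0))
```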
1408 .. opcode:: ISLT - Signed Integer Set On Less Than
1412 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1414 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1416 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1418 dst.w = (src0.w < src1.w) ? \sim 0 : 0
1421 .. opcode:: USLT - Unsigned Integer Set On Less Than
1425 dst.x = (src0.x < src1.x) ? \sim 0 : 0
1427 dst.y = (src0.y < src1.y) ? \sim 0 : 0
1429 dst.z = (src0.z < src1.z) ? \sim 0 : 0
1431 dst.w = (src0.w < src1.w) ? \sim 0 : 0
1434 .. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
1436 Same comparison as SGE but returns integer instead of 1.0/0.0 float
1440 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1442 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1444 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1446 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1449 .. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1453 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1455 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1457 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1459 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1462 .. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1466 dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1468 dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1470 dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1472 dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1475 .. opcode:: FSEQ - Float Set On Equal (ordered)
1477 Same comparison as SEQ but returns integer instead of 1.0/0.0 float
1481 dst.x = (src0.x == src1.x) ? \sim 0 : 0
1483 dst.y = (src0.y == src1.y) ? \sim 0 : 0
1485 dst.z = (src0.z == src1.z) ? \sim 0 : 0
1487 dst.w = (src0.w == src1.w) ? \sim 0 : 0
1490 .. opcode:: USEQ - Integer Set On Equal
1494 dst.x = (src0.x == src1.x) ? \sim 0 : 0
1496 dst.y = (src0.y == src1.y) ? \sim 0 : 0
1498 dst.z = (src0.z == src1.z) ? \sim 0 : 0
1500 dst.w = (src0.w == src1.w) ? \sim 0 : 0
1503 .. opcode:: FSNE - Float Set On Not Equal (unordered)
1505 Same comparison as SNE but returns integer instead of 1.0/0.0 float
1509 dst.x = (src0.x != src1.x) ? \sim 0 : 0
1511 dst.y = (src0.y != src1.y) ? \sim 0 : 0
1513 dst.z = (src0.z != src1.z) ? \sim 0 : 0
1515 dst.w = (src0.w != src1.w) ? \sim 0 : 0
1518 .. opcode:: USNE - Integer Set On Not Equal
1522 dst.x = (src0.x != src1.x) ? \sim 0 : 0
1524 dst.y = (src0.y != src1.y) ? \sim 0 : 0
1526 dst.z = (src0.z != src1.z) ? \sim 0 : 0
1528 dst.w = (src0.w != src1.w) ? \sim 0 : 0
1531 .. opcode:: INEG - Integer Negate
1546 .. opcode:: IABS - Integer Absolute Value
1560 These opcodes are used for bit-level manipulation of integers.
1562 .. opcode:: IBFE - Signed Bitfield Extract
1564 See SM5 instruction of the same name. Extracts a set of bits from the input,
1565 and sign-extends them if the high bit of the extracted window is set.
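    def ibfe(value, offset, bits):
        offset = offset & 0x1f
        bits = bits & 0x1f
        if bits == 0:
            return 0
        # Note: >> sign-extends (value is a signed 32-bit integer)
        if bits + offset < 32:
            # Keep the low 32 bits of the left shift, then reinterpret as signed
            shifted = (value << (32 - offset - bits)) & 0xffffffff
            if shifted & 0x80000000:
                shifted -= 1 << 32
            return shifted >> (32 - bits)
        else:
            return value >> offset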
1579 .. opcode:: UBFE - Unsigned Bitfield Extract
1581 See SM5 instruction of the same name. Extracts a set of bits from the input,
1582 without any sign-extension.
    def ubfe(value, offset, bits):
        offset = offset & 0x1f
        bits = bits & 0x1f
        if bits == 0:
            return 0
        # Note: >> does not sign-extend (value is an unsigned 32-bit integer)
        if bits + offset < 32:
            # Mask to 32 bits so the left shift discards bits above the field
            return ((value << (32 - offset - bits)) & 0xffffffff) >> (32 - bits)
        else:
            return value >> offset
1596 .. opcode:: BFI - Bitfield Insert
1598 See SM5 instruction of the same name. Replaces a bit region of 'base' with
1599 the low bits of 'insert'.
    def bfi(base, insert, offset, bits):
        offset = offset & 0x1f
        bits = bits & 0x1f
        mask = ((1 << bits) - 1) << offset
        # Keep the result within 32 bits
        return (((insert << offset) & mask) | (base & ~mask)) & 0xffffffff
1609 .. opcode:: BREV - Bitfield Reverse
1611 See SM5 instruction BFREV. Reverses the bits of the argument.
1613 .. opcode:: POPC - Population Count
1615 See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
1617 .. opcode:: LSB - Index of lowest set bit
1619 See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
1620 bit of the argument. Returns -1 if none are set.
1622 .. opcode:: IMSB - Index of highest non-sign bit
1624 See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
1625 non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
1626 highest 1 bit for positive numbers). Returns -1 if all bits are the same
1627 (i.e. for inputs 0 and -1).
1629 .. opcode:: UMSB - Index of highest set bit
1631 See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
1632 set bit of the argument. Returns -1 if none are set.
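The bit-query opcodes above can be sketched in Python as follows. The helper names are illustrative, and the sketch assumes bit indices are counted from the least significant bit (bit 0).

```python
def brev(value):
    """Reverse the order of the 32 bits."""
    return int(f"{value & 0xffffffff:032b}"[::-1], 2)


def popc(value):
    """Count the set bits."""
    return bin(value & 0xffffffff).count("1")


def lsb(value):
    """Index of the lowest set bit, or -1 if none are set."""
    value &= 0xffffffff
    return (value & -value).bit_length() - 1


def umsb(value):
    """Index of the highest set bit, or -1 if none are set."""
    return (value & 0xffffffff).bit_length() - 1


def imsb(value):
    """Index of the highest bit that differs from the sign bit, or -1."""
    value &= 0xffffffff
    if value & 0x80000000:
        value = ~value & 0xffffffff  # look for the highest 0 bit instead
    return value.bit_length() - 1


print(hex(brev(1)), popc(0xf0f0), lsb(8), umsb(0x80000000), imsb(-2))
```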
1635 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1637 These opcodes are only supported in geometry shaders; they have no meaning
1638 in any other type of shader.
1640 .. opcode:: EMIT - Emit
1642 Generate a new vertex for the current primitive into the specified vertex
1643 stream using the values in the output registers.
1646 .. opcode:: ENDPRIM - End Primitive
1648 Complete the current primitive in the specified vertex stream (consisting of
1649 the emitted vertices), and start a new one.
1655 These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1656 opcodes is determined by a special capability bit, ``GLSL``.
Some require GLSL version 1.30 (UIF/BREAKC/SWITCH/CASE/DEFAULT/ENDSWITCH).
1659 .. opcode:: CAL - Subroutine Call
1665 .. opcode:: RET - Subroutine Call Return
1670 .. opcode:: CONT - Continue
1672 Unconditionally moves the point of execution to the instruction after the
1673 last bgnloop. The instruction must appear within a bgnloop/endloop.
1677 Support for CONT is determined by a special capability bit,
1678 ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1681 .. opcode:: BGNLOOP - Begin a Loop
1683 Start a loop. Must have a matching endloop.
1686 .. opcode:: BGNSUB - Begin Subroutine
1688 Starts definition of a subroutine. Must have a matching endsub.
1691 .. opcode:: ENDLOOP - End a Loop
1693 End a loop started with bgnloop.
1696 .. opcode:: ENDSUB - End Subroutine
1698 Ends definition of a subroutine.
1701 .. opcode:: NOP - No Operation
1706 .. opcode:: BRK - Break
1708 Unconditionally moves the point of execution to the instruction after the
1709 next endloop or endswitch. The instruction must appear within a loop/endloop
1710 or switch/endswitch.
1713 .. opcode:: BREAKC - Break Conditional
1715 Conditionally moves the point of execution to the instruction after the
1716 next endloop or endswitch. The instruction must appear within a loop/endloop
1717 or switch/endswitch.
1718 Condition evaluates to true if src0.x != 0 where src0.x is interpreted
1719 as an integer register.
1723 Considered for removal as it's quite inconsistent wrt other opcodes
1724 (could emulate with UIF/BRK/ENDIF).
1727 .. opcode:: IF - Float If
Start an IF ... ELSE ... ENDIF block. Condition evaluates to true if
1733 where src0.x is interpreted as a floating point register.
1736 .. opcode:: UIF - Bitwise If
Start a UIF ... ELSE ... ENDIF block. Condition evaluates to true if
1742 where src0.x is interpreted as an integer register.
1745 .. opcode:: ELSE - Else
1747 Starts an else block, after an IF or UIF statement.
1750 .. opcode:: ENDIF - End If
1752 Ends an IF or UIF block.
1755 .. opcode:: SWITCH - Switch
Starts a C-style switch expression. The switch consists of one or more
CASE statements and at most one DEFAULT statement. Execution of a statement
ends when a BRK is hit, but, just as in C, falling through to other cases
without a break is allowed. Similarly, the DEFAULT label is allowed anywhere,
not just as the last statement, and fallthrough is allowed into and from it.
CASE src arguments are evaluated at bit level against the SWITCH src argument.
1768 (some instructions here)
1771 (some instructions here)
1774 (some instructions here)
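The fallthrough rules can be sketched with a tiny interpreter. Everything here is hypothetical scaffolding: the tuple encoding of instructions and the name ``run_switch`` are illustration only, not part of TGSI.

```python
def run_switch(selector, body):
    """Return the names of the ("OP", name) entries that would execute.

    body is a list of ("CASE", value), ("DEFAULT",), ("BRK",) and
    ("OP", name) tuples. Execution starts at the matching CASE, or at
    DEFAULT if no CASE matches, and runs until a BRK or the end.
    """
    start = next((i for i, ins in enumerate(body)
                  if ins == ("CASE", selector)), None)
    if start is None:
        start = next((i for i, ins in enumerate(body)
                      if ins == ("DEFAULT",)), None)
    executed = []
    if start is None:
        return executed
    for ins in body[start:]:
        if ins[0] == "BRK":
            break
        if ins[0] == "OP":
            executed.append(ins[1])  # labels are skipped: fallthrough
    return executed


body = [("CASE", 1), ("OP", "a"), ("BRK",),
        ("DEFAULT",), ("OP", "d"),
        ("CASE", 2), ("OP", "b"), ("BRK",)]
print(run_switch(1, body), run_switch(2, body), run_switch(7, body))
```

In this example DEFAULT is not the last label, and an unmatched selector falls through from DEFAULT into the body of ``CASE 2``.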
1779 .. opcode:: CASE - Switch case
1781 This represents a switch case label. The src arg must be an integer immediate.
1784 .. opcode:: DEFAULT - Switch default
1786 This represents the default case in the switch, which is taken if no other
1790 .. opcode:: ENDSWITCH - End of switch
1792 Ends a switch expression.
1798 The interpolation instructions allow an input to be interpolated in a
1799 different way than its declaration. This corresponds to the GLSL 4.00
1800 interpolateAt* functions. The first argument of each of these must come from
1801 ``TGSI_FILE_INPUT``.
1803 .. opcode:: INTERP_CENTROID - Interpolate at the centroid
Interpolates the varying specified by src0 at the centroid.
1807 .. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
Interpolates the varying specified by src0 at the sample id specified by
src1.x (interpreted as an integer).
1812 .. opcode:: INTERP_OFFSET - Interpolate at the specified offset
Interpolates the varying specified by src0 at the offset src1.xy from the
pixel center (interpreted as floats).
1823 The double-precision opcodes reinterpret four-component vectors into
1824 two-component vectors with doubled precision in each component.
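The reinterpretation can be pictured as one 64-bit double occupying two adjacent 32-bit components. The sketch below uses Python's ``struct`` module; placing the low word in the first component is an assumption made for illustration, and the helper names are hypothetical.

```python
import struct


def double_to_components(value):
    """Split a double into two 32-bit words (low word first, by assumption)."""
    low, high = struct.unpack("<II", struct.pack("<d", value))
    return low, high


def components_to_double(low, high):
    """Reassemble a double from two 32-bit words."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]


low, high = double_to_components(1.0)
print(hex(low), hex(high), components_to_double(low, high))
```

A four-component register therefore holds two doubles: one in ``xy`` and one in ``zw``.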
1826 .. opcode:: DABS - Absolute
1831 .. opcode:: DADD - Add
1835 dst.xy = src0.xy + src1.xy
1837 dst.zw = src0.zw + src1.zw
1839 .. opcode:: DSEQ - Set on Equal
1843 dst.x = src0.xy == src1.xy ? \sim 0 : 0
1845 dst.z = src0.zw == src1.zw ? \sim 0 : 0
1847 .. opcode:: DSNE - Set on Not Equal
1851 dst.x = src0.xy != src1.xy ? \sim 0 : 0
1853 dst.z = src0.zw != src1.zw ? \sim 0 : 0
1855 .. opcode:: DSLT - Set on Less than
1859 dst.x = src0.xy < src1.xy ? \sim 0 : 0
1861 dst.z = src0.zw < src1.zw ? \sim 0 : 0
1863 .. opcode:: DSGE - Set on Greater equal
1867 dst.x = src0.xy >= src1.xy ? \sim 0 : 0
1869 dst.z = src0.zw >= src1.zw ? \sim 0 : 0
1871 .. opcode:: DFRAC - Fraction
1875 dst.xy = src.xy - \lfloor src.xy\rfloor
1877 dst.zw = src.zw - \lfloor src.zw\rfloor
1879 .. opcode:: DTRUNC - Truncate
1883 dst.xy = trunc(src.xy)
1885 dst.zw = trunc(src.zw)
1887 .. opcode:: DCEIL - Ceiling
1891 dst.xy = \lceil src.xy\rceil
1893 dst.zw = \lceil src.zw\rceil
1895 .. opcode:: DFLR - Floor
1899 dst.xy = \lfloor src.xy\rfloor
1901 dst.zw = \lfloor src.zw\rfloor
1903 .. opcode:: DROUND - Round
1907 dst.xy = round(src.xy)
1909 dst.zw = round(src.zw)
1911 .. opcode:: DSSG - Set Sign
1915 dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
1917 dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
1919 .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
1921 Like the ``frexp()`` routine in many math libraries, this opcode stores the
1922 exponent of its source to ``dst0``, and the significand to ``dst1``, such that
1923 :math:`dst1 \times 2^{dst0} = src`.
1927 dst0.xy = exp(src.xy)
1929 dst1.xy = frac(src.xy)
1931 dst0.zw = exp(src.zw)
1933 dst1.zw = frac(src.zw)
1935 .. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1937 This opcode is the inverse of :opcode:`DFRACEXP`. The second
1938 source is an integer.
1942 dst.xy = src0.xy \times 2^{src1.x}
1944 dst.zw = src0.zw \times 2^{src1.y}
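The relationship between DFRACEXP and DLDEXP can be checked with Python's ``math.frexp`` and ``math.ldexp``, which follow the same ``frexp()``/``ldexp()`` convention. This is a sketch for a single double channel; the function names are hypothetical, and the significand range of [0.5, 1) is Python's convention, which is assumed rather than stated by the text above.

```python
import math


def dfracexp(src):
    """Return (dst0, dst1) = (exponent, significand) for one double."""
    significand, exponent = math.frexp(src)
    return exponent, significand


def dldexp(src0, src1):
    """Return src0 * 2**src1 for a double src0 and an integer src1."""
    return math.ldexp(src0, src1)


dst0, dst1 = dfracexp(3.75)
# dst1 * 2**dst0 reproduces the source value.
print(dst0, dst1, dldexp(dst1, dst0))
```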
1946 .. opcode:: DMIN - Minimum
1950 dst.xy = min(src0.xy, src1.xy)
1952 dst.zw = min(src0.zw, src1.zw)
1954 .. opcode:: DMAX - Maximum
1958 dst.xy = max(src0.xy, src1.xy)
1960 dst.zw = max(src0.zw, src1.zw)
1962 .. opcode:: DMUL - Multiply
1966 dst.xy = src0.xy \times src1.xy
1968 dst.zw = src0.zw \times src1.zw
1971 .. opcode:: DMAD - Multiply And Add
1975 dst.xy = src0.xy \times src1.xy + src2.xy
1977 dst.zw = src0.zw \times src1.zw + src2.zw
1980 .. opcode:: DFMA - Fused Multiply-Add
1982 Perform a * b + c with no intermediate rounding step.
1986 dst.xy = src0.xy \times src1.xy + src2.xy
1988 dst.zw = src0.zw \times src1.zw + src2.zw
1991 .. opcode:: DRCP - Reciprocal
1995 dst.xy = \frac{1}{src.xy}
1997 dst.zw = \frac{1}{src.zw}
1999 .. opcode:: DSQRT - Square Root
2003 dst.xy = \sqrt{src.xy}
2005 dst.zw = \sqrt{src.zw}
2007 .. opcode:: DRSQ - Reciprocal Square Root
2011 dst.xy = \frac{1}{\sqrt{src.xy}}
2013 dst.zw = \frac{1}{\sqrt{src.zw}}
2015 .. opcode:: F2D - Float to Double
2019 dst.xy = double(src0.x)
2021 dst.zw = double(src0.y)
2023 .. opcode:: D2F - Double to Float
2027 dst.x = float(src0.xy)
2029 dst.y = float(src0.zw)
2031 .. opcode:: I2D - Int to Double
2035 dst.xy = double(src0.x)
2037 dst.zw = double(src0.y)
2039 .. opcode:: D2I - Double to Int
2043 dst.x = int(src0.xy)
2045 dst.y = int(src0.zw)
2047 .. opcode:: U2D - Unsigned Int to Double
2051 dst.xy = double(src0.x)
2053 dst.zw = double(src0.y)
2055 .. opcode:: D2U - Double to Unsigned Int
2059 dst.x = unsigned(src0.xy)
2061 dst.y = unsigned(src0.zw)
2063 .. _samplingopcodes:
2065 Resource Sampling Opcodes
2066 ^^^^^^^^^^^^^^^^^^^^^^^^^
2068 These opcodes closely follow the semantics of the corresponding Direct3D
2069 instructions. If in doubt, check the Direct3D documentation.
2070 Note that the swizzle on SVIEW (src1) determines texel swizzling
2075 Using the provided address, sample data from the specified texture using the
2076 filtering mode identified by the given sampler. The source data may come from
2077 any resource type other than buffers.
2079 Syntax: ``SAMPLE dst, address, sampler_view, sampler``
2081 Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
2083 .. opcode:: SAMPLE_I
2085 Simplified alternative to the SAMPLE instruction. Using the provided
2086 integer address, SAMPLE_I fetches data from the specified sampler view
2087 without any filtering. The source data may come from any resource type
2090 Syntax: ``SAMPLE_I dst, address, sampler_view``
2092 Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
2094 The 'address' is specified as unsigned integers. If the 'address' is out of
2095 range [0...(# texels - 1)] the result of the fetch is always 0 in all
2096 components. As such the instruction doesn't honor address wrap modes; in
2097 cases where that behavior is desirable, the SAMPLE instruction should be used.
2098 address.w always provides an unsigned integer mipmap level. If the value is
2099 out of range then the instruction always returns 0 in all components.
2100 address.yz are ignored for buffers and 1D textures. address.z is ignored
2101 for 1D texture arrays and 2D textures.
2103 For 1D texture arrays address.y provides the array index (also as unsigned
2104 integer). If the value is out of the range of available array indices
2105 [0... (array size - 1)] then the opcode always returns 0 in all components.
2106 For 2D texture arrays address.z provides the array index, otherwise it
2107 exhibits the same behavior as in the case for 1D texture arrays. The exact
2108 semantics of the source address are presented in the table below:
2110 +---------------------------+----+-----+-----+---------+
2111 | resource type | X | Y | Z | W |
2112 +===========================+====+=====+=====+=========+
2113 | ``PIPE_BUFFER`` | x | | | ignored |
2114 +---------------------------+----+-----+-----+---------+
2115 | ``PIPE_TEXTURE_1D`` | x | | | mpl |
2116 +---------------------------+----+-----+-----+---------+
2117 | ``PIPE_TEXTURE_2D`` | x | y | | mpl |
2118 +---------------------------+----+-----+-----+---------+
2119 | ``PIPE_TEXTURE_3D`` | x | y | z | mpl |
2120 +---------------------------+----+-----+-----+---------+
2121 | ``PIPE_TEXTURE_RECT`` | x | y | | mpl |
2122 +---------------------------+----+-----+-----+---------+
2123 | ``PIPE_TEXTURE_CUBE`` | not allowed as source |
2124 +---------------------------+----+-----+-----+---------+
2125 | ``PIPE_TEXTURE_1D_ARRAY`` | x | idx | | mpl |
2126 +---------------------------+----+-----+-----+---------+
2127 | ``PIPE_TEXTURE_2D_ARRAY`` | x | y | idx | mpl |
2128 +---------------------------+----+-----+-----+---------+
2130 Where 'mpl' is a mipmap level and 'idx' is the array index.
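The table and the out-of-range rules can be summarised in a small Python sketch. It is a model of the addressing rules only, under stated assumptions: ``decode_address``, ``LAYOUT`` and the ``level_sizes`` argument (one size tuple per mipmap level) are hypothetical names, and returning ``None`` stands for "the fetch yields 0 in all components".

```python
# Number of coordinate components per resource type, and whether the next
# component carries an array index, following the table above.
LAYOUT = {
    "PIPE_BUFFER":           (1, False),
    "PIPE_TEXTURE_1D":       (1, False),
    "PIPE_TEXTURE_2D":       (2, False),
    "PIPE_TEXTURE_3D":       (3, False),
    "PIPE_TEXTURE_RECT":     (2, False),
    "PIPE_TEXTURE_1D_ARRAY": (1, True),
    "PIPE_TEXTURE_2D_ARRAY": (2, True),
}


def decode_address(res_type, level_sizes, array_size, address):
    """Split an unsigned (x, y, z, w) address into (coords, idx, mpl).

    Returns None when any component that is actually used is out of
    range; unused components are ignored.
    """
    if res_type not in LAYOUT:
        raise ValueError(res_type + " is not allowed as a source")
    ncoords, has_index = LAYOUT[res_type]
    # address.w is ignored for buffers and selects the mipmap level otherwise.
    mpl = 0 if res_type == "PIPE_BUFFER" else address[3]
    if mpl >= len(level_sizes):
        return None
    coords = tuple(address[:ncoords])
    if any(c >= size for c, size in zip(coords, level_sizes[mpl])):
        return None
    idx = address[ncoords] if has_index else None
    if has_index and idx >= array_size:
        return None
    return coords, idx, mpl


levels_2d = [(4, 4), (2, 2)]
print(decode_address("PIPE_TEXTURE_2D", levels_2d, 1, (3, 1, 99, 0)))
print(decode_address("PIPE_TEXTURE_2D", levels_2d, 1, (3, 1, 99, 1)))
```

In the second call the x coordinate 3 is valid for level 0 but not for the 2x2 level 1, so the sketch reports an out-of-range fetch.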
2132 .. opcode:: SAMPLE_I_MS
2134 Just like SAMPLE_I but allows fetching data from multi-sampled surfaces.
2136 Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
2138 .. opcode:: SAMPLE_B
2140 Just like the SAMPLE instruction with the exception that an additional bias
2141 is applied to the level of detail computed as part of the instruction
2144 Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
2146 Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2148 .. opcode:: SAMPLE_C
2150 Similar to the SAMPLE instruction but it performs a comparison filter. The
2151 operands to SAMPLE_C are identical to SAMPLE, except that there is an
2152 additional float32 operand, the reference value, which must be a
2153 single-component register or a scalar literal. SAMPLE_C makes the hardware use the
2154 current sampler's compare_func (in pipe_sampler_state) to compare the reference
2155 value against the red component value of the source resource at each texel
2156 that the currently configured texture filter covers based on the provided
2159 Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
2161 Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2163 .. opcode:: SAMPLE_C_LZ
2165 Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
2168 Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
2170 Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2173 .. opcode:: SAMPLE_D
2175 SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
2176 the source address in the x direction and the y direction are provided by
2179 Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
2181 Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
2183 .. opcode:: SAMPLE_L
2185 SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
2186 directly as a scalar value, representing no anisotropy.
2188 Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
2190 Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2194 Gathers the four texels to be used in a bi-linear filtering operation and
2195 packs them into a single register. Only works with 2D, 2D array, cubemap,
2196 and cubemap array textures. For 2D textures, only the addressing modes of the
2197 sampler and the top level of any mip pyramid are used. Set W to zero. It
2198 behaves like the SAMPLE instruction, but a filtered sample is not
2199 generated. The four samples that contribute to filtering are placed into
2200 xyzw in counter-clockwise order, starting with the (u,v) texture coordinate
2201 delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
2202 magnitude of the deltas is half a texel.
2205 .. opcode:: SVIEWINFO
2207 Query the dimensions of a given sampler view. dst receives width, height,
2208 depth or array size, and number of mipmap levels as int4. The dst can have a
2209 writemask which specifies which info the caller is interested in.
2211 Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
2213 Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
2215 src_mip_level is an unsigned integer scalar. If it is out of range, the
2216 opcode returns 0 for width, height and depth/array size, but the total number of
2217 mipmap levels is still returned correctly for the given sampler view. The returned
2218 width, height and depth values are for the mipmap level selected by
2219 src_mip_level and are in number of texels. For a 1D texture array, width
2220 is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmap levels is
2221 still in dst.w. In contrast to d3d10 resinfo, there's no way in the TGSI
2222 instruction encoding to specify the return type (float/rcpfloat/uint), hence
2223 always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1
2224 resinfo allowing swizzling dst values is ignored (due to the interaction
2225 with rcpfloat modifier which requires some swizzle handling in the state
2228 .. opcode:: SAMPLE_POS
2230 Query the position of a given sample. dst receives float4 (x, y, 0, 0)
2231 indicating where the sample is located. If the resource is not a multi-sample
2232 resource and not a render target, the result is 0.
2234 .. opcode:: SAMPLE_INFO
2236 dst receives number of samples in x. If the resource is not a multi-sample
2237 resource and not a render target, the result is 0.
2240 .. _resourceopcodes:
2242 Resource Access Opcodes
2243 ^^^^^^^^^^^^^^^^^^^^^^^
2245 .. opcode:: LOAD - Fetch data from a shader resource
2247 Syntax: ``LOAD dst, resource, address``
2249 Example: ``LOAD TEMP[0], RES[0], TEMP[1]``
2251 Using the provided integer address, LOAD fetches data
2252 from the specified buffer or texture without any
2255 The 'address' is specified as a vector of unsigned
2256 integers. If the 'address' is out of range the result
2259 Only the first mipmap level of a resource can be read
2260 from using this instruction.
2262 For 1D or 2D texture arrays, the array index is
2263 provided as an unsigned integer in address.y or
2264 address.z, respectively. address.yz are ignored for
2265 buffers and 1D textures. address.z is ignored for 1D
2266 texture arrays and 2D textures. address.w is always
2269 .. opcode:: STORE - Write data to a shader resource
2271 Syntax: ``STORE resource, address, src``
2273 Example: ``STORE RES[0], TEMP[0], TEMP[1]``
2275 Using the provided integer address, STORE writes data
2276 to the specified buffer or texture.
2278 The 'address' is specified as a vector of unsigned
2279 integers. If the 'address' is out of range the result
2282 Only the first mipmap level of a resource can be
2283 written to using this instruction.
2285 For 1D or 2D texture arrays, the array index is
2286 provided as an unsigned integer in address.y or
2287 address.z, respectively. address.yz are ignored for
2288 buffers and 1D textures. address.z is ignored for 1D
2289 texture arrays and 2D textures. address.w is always
2293 .. _threadsyncopcodes:
2295 Inter-thread synchronization opcodes
2296 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2298 These opcodes are intended for communication between threads running
2299 within the same compute grid. For now they're only valid in compute
2302 .. opcode:: MFENCE - Memory fence
2304 Syntax: ``MFENCE resource``
2306 Example: ``MFENCE RES[0]``
2308 This opcode forces strong ordering between any memory access
2309 operations that affect the specified resource. This means that
2310 previous loads and stores (and only those) will be performed and
2311 visible to other threads before the program execution continues.
2314 .. opcode:: LFENCE - Load memory fence
2316 Syntax: ``LFENCE resource``
2318 Example: ``LFENCE RES[0]``
2320 Similar to MFENCE, but it only affects the ordering of memory loads.
2323 .. opcode:: SFENCE - Store memory fence
2325 Syntax: ``SFENCE resource``
2327 Example: ``SFENCE RES[0]``
2329 Similar to MFENCE, but it only affects the ordering of memory stores.
2332 .. opcode:: BARRIER - Thread group barrier
2336 This opcode suspends the execution of the current thread until all
2337 the remaining threads in the working group reach the same point of
2338 the program. Results are unspecified if any of the remaining
2339 threads terminates or never reaches an executed BARRIER instruction.
2347 These opcodes provide atomic variants of some common arithmetic and
2348 logical operations. In this context atomicity means that another
2349 concurrent memory access operation that affects the same memory
2350 location is guaranteed to be performed strictly before or after the
2351 entire execution of the atomic operation.
2353 For the moment they're only valid in compute programs.
2355 .. opcode:: ATOMUADD - Atomic integer addition
2357 Syntax: ``ATOMUADD dst, resource, offset, src``
2359 Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]``
2361 The following operation is performed atomically on each component:
2365 dst_i = resource[offset]_i
2367 resource[offset]_i = dst_i + src_i
2370 .. opcode:: ATOMXCHG - Atomic exchange
2372 Syntax: ``ATOMXCHG dst, resource, offset, src``
2374 Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]``
2376 The following operation is performed atomically on each component:
2380 dst_i = resource[offset]_i
2382 resource[offset]_i = src_i
2385 .. opcode:: ATOMCAS - Atomic compare-and-exchange
2387 Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2389 Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]``
2391 The following operation is performed atomically on each component:
2395 dst_i = resource[offset]_i
2397 resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i)
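To illustrate why the read, compare and write must form one indivisible step, the following Python sketch emulates ATOMCAS on a single word with a lock and builds a retry-loop increment on top of it. Everything here is a hypothetical model: the ``Resource`` class, the lock-based emulation and the helper names are assumptions, and real hardware operates per component rather than on one Python integer.

```python
import threading


class Resource:
    """Hypothetical stand-in for a shader resource: words behind one lock."""

    def __init__(self, words):
        self._words = list(words)
        self._lock = threading.Lock()

    def load(self, offset):
        with self._lock:
            return self._words[offset]

    def atomcas(self, offset, cmp, src):
        # Read, compare and conditional write form one indivisible step.
        with self._lock:
            dst = self._words[offset]
            self._words[offset] = src if dst == cmp else dst
            return dst


def cas_increment(res, offset):
    """Add 1 using only ATOMCAS, retrying when another thread won the race."""
    while True:
        old = res.load(offset)
        if res.atomcas(offset, old, old + 1) == old:
            return old


def run_counter(num_threads, iterations):
    res = Resource([0])

    def worker():
        for _ in range(iterations):
            cas_increment(res, 0)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return res.load(0)


print(run_counter(4, 200))
```

Because a failed compare leaves the word unchanged and returns the value actually found, no increment is lost even when several threads race on the same offset.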
2400 .. opcode:: ATOMAND - Atomic bitwise And
2402 Syntax: ``ATOMAND dst, resource, offset, src``
2404 Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]``
2406 The following operation is performed atomically on each component:
2410 dst_i = resource[offset]_i
2412 resource[offset]_i = dst_i \& src_i
2415 .. opcode:: ATOMOR - Atomic bitwise Or
2417 Syntax: ``ATOMOR dst, resource, offset, src``
2419 Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
2421 The following operation is performed atomically on each component:
2425 dst_i = resource[offset]_i
2427 resource[offset]_i = dst_i | src_i
2430 .. opcode:: ATOMXOR - Atomic bitwise Xor
2432 Syntax: ``ATOMXOR dst, resource, offset, src``
2434 Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
2436 The following operation is performed atomically on each component:
2440 dst_i = resource[offset]_i
2442 resource[offset]_i = dst_i \oplus src_i
2445 .. opcode:: ATOMUMIN - Atomic unsigned minimum
2447 Syntax: ``ATOMUMIN dst, resource, offset, src``
2449 Example: ``ATOMUMIN TEMP[0], RES[0], TEMP[1], TEMP[2]``
2451 The following operation is performed atomically on each component:
2455 dst_i = resource[offset]_i
2457 resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
2460 .. opcode:: ATOMUMAX - Atomic unsigned maximum
2462 Syntax: ``ATOMUMAX dst, resource, offset, src``
2464 Example: ``ATOMUMAX TEMP[0], RES[0], TEMP[1], TEMP[2]``
2466 The following operation is performed atomically on each component:
2470 dst_i = resource[offset]_i
2472 resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
2475 .. opcode:: ATOMIMIN - Atomic signed minimum
2477 Syntax: ``ATOMIMIN dst, resource, offset, src``
2479 Example: ``ATOMIMIN TEMP[0], RES[0], TEMP[1], TEMP[2]``
2481 The following operation is performed atomically on each component:
2485 dst_i = resource[offset]_i
2487 resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
2490 .. opcode:: ATOMIMAX - Atomic signed maximum
2492 Syntax: ``ATOMIMAX dst, resource, offset, src``
2494 Example: ``ATOMIMAX TEMP[0], RES[0], TEMP[1], TEMP[2]``
2496 The following operation is performed atomically on each component:
2500 dst_i = resource[offset]_i
2502 resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
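The unsigned and signed minimum/maximum opcodes share the same formula and differ only in how the 32-bit patterns are compared. A minimal Python sketch of the stored value, assuming two's-complement interpretation for the signed variants (the function names are hypothetical and atomicity is not modelled here):

```python
def as_signed(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - 0x100000000 if word & 0x80000000 else word


def umin(dst, src):
    return dst if dst < src else src


def umax(dst, src):
    return dst if dst > src else src


def imin(dst, src):
    return dst if as_signed(dst) < as_signed(src) else src


def imax(dst, src):
    return dst if as_signed(dst) > as_signed(src) else src


old, operand = 0xFFFFFFFF, 1
# The same bit patterns give different results: 0xFFFFFFFF is the largest
# unsigned value but represents -1 when treated as signed.
print(hex(umin(old, operand)), hex(imin(old, operand)))
print(hex(umax(old, operand)), hex(imax(old, operand)))
```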
2506 Explanation of symbols used
2507 ------------------------------
2514 :math:`|x|` Absolute value of `x`.
2516 :math:`\lceil x \rceil` Ceiling of `x`.
2518 clamp(x,y,z) Clamp x between y and z.
2519 (x < y) ? y : (x > z) ? z : x
2521 :math:`\lfloor x\rfloor` Floor of `x`.
2523 :math:`\log_2{x}` Logarithm of `x`, base 2.
2525 max(x,y) Maximum of x and y.
2528 min(x,y) Minimum of x and y.
2531 partialx(x) Derivative of x relative to fragment's X.
2533 partialy(x) Derivative of x relative to fragment's Y.
2535 pop() Pop from stack.
2537 :math:`x^y` `x` to the power `y`.
2539 push(x) Push x on stack.
2543 trunc(x) Truncate x, i.e. drop the fraction bits.
2550 discard Discard fragment.
2554 target Label of target instruction.
2565 Declares a register that will be referenced as an operand in Instruction
2568 File field contains register file that is being declared and is one
2571 UsageMask field specifies which of the register components can be accessed
2572 and is one of TGSI_WRITEMASK.
2574 The Local flag specifies that a given value isn't intended for
2575 subroutine parameter passing and, as a result, the implementation
2576 isn't required to give any guarantees of it being preserved across
2577 subroutine boundaries. As it's merely a compiler hint, the
2578 implementation is free to ignore it.
2580 If Dimension flag is set to 1, a Declaration Dimension token follows.
2582 If Semantic flag is set to 1, a Declaration Semantic token follows.
2584 If Interpolate flag is set to 1, a Declaration Interpolate token follows.
2586 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
2588 If Array flag is set to 1, a Declaration Array token follows.
2591 ^^^^^^^^^^^^^^^^^^^^^^^^
2593 Declarations can optionally have an ArrayID attribute which can be referred to by
2594 indirect addressing operands. An ArrayID of zero is reserved and treated as
2595 if no ArrayID is specified.
2597 If an indirect addressing operand refers to a specific declaration by using
2598 an ArrayID, only the registers in this declaration are guaranteed to be
2599 accessed; accessing any register outside this declaration results in undefined
2600 behavior. Note that for compatibility the effective index is zero-based and
2601 not relative to the specified declaration.
2603 If no ArrayID is specified with an indirect addressing operand, the whole
2604 register file might be accessed by this operand. This is strongly discouraged
2605 and will prevent packing of scalar/vec2 arrays and effective alias analysis.
2606 This is only legal for TEMP and CONST register files.
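The zero-based indexing rule is easy to get wrong, so here is a small Python sketch of it. The ``ARRAYS`` table, the ``resolve_indirect`` helper and the register ranges are hypothetical; raising ``IndexError`` is just this sketch's way of flagging an access that the rules above leave undefined.

```python
# Hypothetical declarations: ArrayID -> (first, last) register index.
ARRAYS = {1: (0, 3), 2: (4, 11)}


def resolve_indirect(base, addr, array_id=0, file_size=12):
    """Return the effective register index for an operand like TEMP[ADDR + base].

    The index is absolute within the register file, not relative to the
    first register of the declaration named by array_id.
    """
    index = base + addr
    if array_id == 0:
        # No ArrayID: any register in the file may be touched.
        first, last = 0, file_size - 1
    else:
        first, last = ARRAYS[array_id]
    if not first <= index <= last:
        raise IndexError("access outside the declaration is undefined")
    return index


# Element 2 of the array declared at registers 4..11 is register 6,
# so the operand must carry the base offset 4 itself.
print(resolve_indirect(4, 2, array_id=2))
```

Passing ``base=0`` with ``array_id=2`` would address register 2, which lies outside that declaration.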
2608 Declaration Semantic
2609 ^^^^^^^^^^^^^^^^^^^^^^^^
2611 Vertex and fragment shader input and output registers may be labeled
2612 with semantic information consisting of a name and index.
2614 Follows Declaration token if Semantic bit is set.
2616 Since its purpose is to link a shader with other stages of the pipeline,
2617 it is valid to follow only those Declaration tokens that declare a register
2618 either in INPUT or OUTPUT file.
2620 SemanticName field contains the semantic name of the register being declared.
2621 There is no default value.
2623 SemanticIndex is an optional subscript that can be used to distinguish
2624 different register declarations with the same semantic name. The default value
2627 The meanings of the individual semantic names are explained in the following
2630 TGSI_SEMANTIC_POSITION
2631 """"""""""""""""""""""
2633 For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
2634 output register which contains the homogeneous vertex position in the clip
2635 space coordinate system. After clipping, the X, Y and Z components of the
2636 vertex will be divided by the W value to get normalized device coordinates.
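The perspective divide mentioned above amounts to the following, shown as a minimal Python sketch. The function name is hypothetical, and a non-zero W after clipping is assumed.

```python
def clip_to_ndc(x, y, z, w):
    """Divide clip-space X, Y and Z by W to get normalized device coordinates."""
    return (x / w, y / w, z / w)


print(clip_to_ndc(2.0, -4.0, 1.0, 2.0))
```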
2638 For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
2639 fragment shader input contains the fragment's window position. The X
2640 component starts at zero and always increases from left to right.
2641 The Y component starts at zero and always increases but Y=0 may either
2642 indicate the top of the window or the bottom depending on the fragment
2643 coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
2644 The Z coordinate ranges from 0 to 1 to represent depth from the front
2645 to the back of the Z buffer. The W component contains the interpolated
2646 reciprocal of the vertex position W component (corresponding to gl_FragCoord,
2647 but unlike d3d10 which interpolates the same 1/w but then gives back
2648 the reciprocal of the interpolated value).
2650 Fragment shaders may also declare an output register with
2651 TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows
2652 the fragment shader to change the fragment's Z position.
2659 For vertex shader outputs or fragment shader inputs/outputs, this
2660 label indicates that the register contains an R,G,B,A color.
2662 Several shader inputs/outputs may contain colors so the semantic index
2663 is used to distinguish them. For example, color[0] may be the diffuse
2664 color while color[1] may be the specular color.
2666 This label is needed so that the flat/smooth shading can be applied
2667 to the right interpolants during rasterization.
2671 TGSI_SEMANTIC_BCOLOR
2672 """"""""""""""""""""
2674 Back-facing colors are only used for back-facing polygons, and are only valid
2675 in vertex shader outputs. After rasterization, all polygons are front-facing
2676 and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
2677 so all BCOLORs effectively become regular COLORs in the fragment shader.
2683 Vertex shader inputs and outputs and fragment shader inputs may be
2684 labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
2685 a fog coordinate. Typically, the fragment shader will use the fog coordinate
2686 to compute a fog blend factor which is used to blend the normal fragment color
2687 with a constant fog color. But fog coord really is just an ordinary vec4
2688 register like regular semantics.
2694 Vertex shader input and output registers may be labeled with
2695 TGSI_SEMANTIC_PSIZE to indicate that the register contains a point size
2696 in the form (S, 0, 0, 1). The point size controls the width or diameter
2697 of points for rasterization. This label cannot be used in fragment
2700 When using this semantic, be sure to set the appropriate state in the
2701 :ref:`rasterizer` first.
2704 TGSI_SEMANTIC_TEXCOORD
2705 """"""""""""""""""""""
2707 Only available if PIPE_CAP_TGSI_TEXCOORD is exposed.
2709 Vertex shader outputs and fragment shader inputs may be labeled with
2710 this semantic to make them replaceable by sprite coordinates via the
2711 sprite_coord_enable state in the :ref:`rasterizer`.
2712 The semantic index permitted with this semantic is limited to <= 7.
2714 If the driver does not support TEXCOORD, sprite coordinate replacement
2715 applies to inputs with the GENERIC semantic instead.
2717 The intended use case for this semantic is gl_TexCoord.
2720 TGSI_SEMANTIC_PCOORD
2721 """"""""""""""""""""
2723 Only available if PIPE_CAP_TGSI_TEXCOORD is exposed.
2725 Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
2726 that the register contains sprite coordinates in the form (x, y, 0, 1), if
2727 the current primitive is a point and point sprites are enabled. Otherwise,
2728 the contents of the register are undefined.
2730 The intended use case for this semantic is gl_PointCoord.
2733 TGSI_SEMANTIC_GENERIC
2734 """""""""""""""""""""
2736 All vertex/fragment shader inputs/outputs not labeled with any other
2737 semantic label can be considered to be generic attributes. Typical
2738 uses of generic inputs/outputs are texcoords and user-defined values.
2741 TGSI_SEMANTIC_NORMAL
2742 """"""""""""""""""""
2744 Indicates that a vertex shader input is a normal vector. This is
2745 typically only used for legacy graphics APIs.
2751 This label applies to fragment shader inputs only and indicates that
2752 the register contains front/back-face information of the form (F, 0,
2753 0, 1). The first component will be positive when the fragment belongs
2754 to a front-facing polygon, and negative when the fragment belongs to a
2755 back-facing polygon.
2758 TGSI_SEMANTIC_EDGEFLAG
2759 """"""""""""""""""""""
2761 For vertex shaders, this semantic label indicates that an input or
2762 output is a boolean edge flag. The register layout is [F, x, x, x]
2763 where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader
2764 simply copies the edge flag input to the edgeflag output.
2766 Edge flags are used to control which lines or points are actually
2767 drawn when the polygon mode converts triangles/quads/polygons into
2771 TGSI_SEMANTIC_STENCIL
2772 """""""""""""""""""""
2774 For fragment shaders, this semantic label indicates that an output
2775 is a writable stencil reference value. Only the Y component is writable.
2776 This allows the fragment shader to change the fragment's stencilref value.
2779 TGSI_SEMANTIC_VIEWPORT_INDEX
2780 """"""""""""""""""""""""""""
2782 For geometry shaders, this semantic label indicates that an output
2783 contains the index of the viewport (and scissor) to use.
2784 This is an integer value, and only the X component is used.
2790 For geometry shaders, this semantic label indicates that an output
2791 contains the layer value to use for the color and depth/stencil surfaces.
2792 This is an integer value, and only the X component is used.
2793 (Also known as rendertarget array index.)
2796 TGSI_SEMANTIC_CULLDIST
2797 """"""""""""""""""""""
2799 Used as distance to plane for performing application-defined culling
2800 of individual primitives against a plane. When components of vertex
2801 elements are given this label, these values are assumed to be a
2802 float32 signed distance to a plane. Primitives will be completely
2803 discarded if the plane distances for all of the vertices in the
2804 primitive are < 0. If a vertex has a cull distance of NaN, that
2805 vertex counts as "out" (as if it were < 0).
2806 The limits on both clip and cull distances are bound
2807 by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
2808 the maximum number of components that can be used to hold the
2809 distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
2810 which specifies the maximum number of registers which can be
2811 annotated with those semantics.
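The culling rule for a single plane can be written as a short Python sketch. The ``is_culled`` name is hypothetical, and the function only models the decision for one cull plane, taking one distance per vertex of the primitive.

```python
import math


def is_culled(distances):
    """Return True if every vertex of the primitive is "out" for this plane.

    A vertex is "out" when its cull distance is negative or NaN.
    """
    return all(math.isnan(d) or d < 0 for d in distances)


print(is_culled([-1.0, -0.5, float("nan")]))  # all vertices are out
print(is_culled([-1.0, 0.0, -2.0]))           # one vertex is on the plane
```

Note that a plain ``d < 0`` test would treat NaN as "in", since comparisons with NaN are false; the explicit ``isnan`` check is what makes NaN count as "out".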
2814 TGSI_SEMANTIC_CLIPDIST
2815 """"""""""""""""""""""
2817 When components of vertex elements are identified this way, these
2818 values are each assumed to be a float32 signed distance to a plane.
2819 Primitive setup only invokes rasterization on pixels for which
2820 the interpolated plane distances are >= 0. Multiple clip planes
2821 can be implemented simultaneously, by annotating multiple
2822 components of one or more vertex elements with the above specified
2823 semantic. The limits on both clip and cull distances are bound
2824 by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
2825 the maximum number of components that can be used to hold the
2826 distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
2827 which specifies the maximum number of registers which can be
2828 annotated with those semantics.
2830 TGSI_SEMANTIC_SAMPLEID
2831 """"""""""""""""""""""
2833 For fragment shaders, this semantic label indicates that a system value
2834 contains the current sample id (i.e. gl_SampleID).
2835 This is an integer value, and only the X component is used.
2837 TGSI_SEMANTIC_SAMPLEPOS
2838 """""""""""""""""""""""
2840 For fragment shaders, this semantic label indicates that a system value
2841 contains the current sample's position (i.e. gl_SamplePosition). Only the X
2842 and Y values are used.
2844 TGSI_SEMANTIC_SAMPLEMASK
2845 """"""""""""""""""""""""
2847 For fragment shaders, this semantic label indicates that an output contains
2848 the sample mask used to disable further sample processing
2849 (i.e. gl_SampleMask). Only the X value is used, up to 32x MS.
2851 TGSI_SEMANTIC_INVOCATIONID
2852 """"""""""""""""""""""""""
2854 For geometry shaders, this semantic label indicates that a system value
2855 contains the current invocation id (i.e. gl_InvocationID).
2856 This is an integer value, and only the X component is used.
2858 TGSI_SEMANTIC_INSTANCEID
2859 """"""""""""""""""""""""
2861 For vertex shaders, this semantic label indicates that a system value contains
2862 the current instance id (i.e. gl_InstanceID). It does not include the base
2863 instance. This is an integer value, and only the X component is used.
2865 TGSI_SEMANTIC_VERTEXID
2866 """"""""""""""""""""""
2868 For vertex shaders, this semantic label indicates that a system value contains
2869 the current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
2870 base vertex. This is an integer value, and only the X component is used.
2872 TGSI_SEMANTIC_VERTEXID_NOBASE
2873 """""""""""""""""""""""""""""""
2875 For vertex shaders, this semantic label indicates that a system value contains
2876 the current vertex id without including the base vertex (this corresponds to
2877 d3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
2878 == TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
2881 TGSI_SEMANTIC_BASEVERTEX
2882 """"""""""""""""""""""""
2884 For vertex shaders, this semantic label indicates that a system value contains
2885 the base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
2886 this contains the first (or start) value instead.
2887 This is an integer value, and only the X component is used.
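The relationship between the three vertex-id system values can be sketched as follows. This is a hypothetical helper that simply restates the identity VERTEXID_NOBASE + BASEVERTEX == VERTEXID; how a driver actually obtains each value is not modelled.

```python
def vertex_system_values(vertexid_nobase, basevertex):
    """Return the X components of the three related system values.

    For non-indexed draws, basevertex holds the first (start) value.
    """
    return {
        "TGSI_SEMANTIC_VERTEXID_NOBASE": vertexid_nobase,
        "TGSI_SEMANTIC_BASEVERTEX": basevertex,
        "TGSI_SEMANTIC_VERTEXID": vertexid_nobase + basevertex,
    }


values = vertex_system_values(vertexid_nobase=5, basevertex=100)
print(values["TGSI_SEMANTIC_VERTEXID"])
```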
2889 TGSI_SEMANTIC_PRIMID
2890 """"""""""""""""""""
2892 For geometry and fragment shaders, this semantic label indicates the value
2893 contains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
2894 and only the X component is used.
2895 FIXME: This right now can be either an ordinary input or a system value...
2901 For tessellation evaluation/control shaders, this semantic label indicates a
2902 generic per-patch attribute. Such semantics will not implicitly be per-vertex
2905 TGSI_SEMANTIC_TESSCOORD
2906 """""""""""""""""""""""
2908 For tessellation evaluation shaders, this semantic label indicates the
2909 coordinates of the vertex being processed. This is available in XYZ; W is
2912 TGSI_SEMANTIC_TESSOUTER
2913 """""""""""""""""""""""
2915 For tessellation evaluation/control shaders, this semantic label indicates the
2916 outer tessellation levels of the patch. Isoline tessellation will only have XY
2917 defined, triangle will have XYZ and quads will have XYZW defined. This
2918 corresponds to gl_TessLevelOuter.
2920 TGSI_SEMANTIC_TESSINNER
2921 """""""""""""""""""""""
2923 For tessellation evaluation/control shaders, this semantic label indicates the
2924 inner tessellation levels of the patch. Only the X value is defined for
2925 triangle tessellation, while quads will have XY defined. This is entirely
2926 undefined for isoline tessellation.
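
As a quick reference, the defined components of the outer and inner tessellation levels can be tabulated per primitive mode. This is a sketch only: the mode names and the helper below are illustrative labels, not TGSI or Gallium identifiers.

```python
# Components defined per tessellation primitive mode, as described above.
# The mode names are illustrative labels, not TGSI enum values.
TESS_LEVEL_COMPONENTS = {
    "isolines":  {"outer": "XY",   "inner": ""},
    "triangles": {"outer": "XYZ",  "inner": "X"},
    "quads":     {"outer": "XYZW", "inner": "XY"},
}


def defined_components(mode, which):
    """Return the defined components of TESSOUTER ("outer") or TESSINNER ("inner")."""
    return TESS_LEVEL_COMPONENTS[mode][which]


print(defined_components("triangles", "outer"))
```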
2928 TGSI_SEMANTIC_VERTICESIN
2929 """"""""""""""""""""""""
2931 For tessellation evaluation/control shaders, this semantic label indicates the
2932 number of vertices provided in the input patch. Only the X value is defined.
2935 Declaration Interpolate
2936 ^^^^^^^^^^^^^^^^^^^^^^^
2938 This token is only valid for fragment shader INPUT declarations.
2940 The Interpolate field specifies how the input is interpolated by
2941 the rasteriser and is one of ``TGSI_INTERPOLATE_*``.
2943 The Location field specifies the location inside the pixel that the
2944 interpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
2945 when per-sample shading is enabled, the implementation may choose to
2946 interpolate at the sample irrespective of the Location field.
2948 The CylindricalWrap bitfield specifies which register components
2949 should be subject to cylindrical wrapping when interpolating by the
2950 rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component
2951 should be interpolated according to cylindrical wrapping rules.
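
The text above does not spell out the cylindrical wrapping rules. The sketch below assumes the Direct3D 9 style rule, where 0 and 1 are treated as the same point and interpolation takes the shortest path between the two values. The bit value of ``TGSI_CYLINDRICAL_WRAP_X`` and both helper functions are assumptions made for illustration.

```python
# Assumed bit layout: one bit per component, X in bit 0.
TGSI_CYLINDRICAL_WRAP_X = 1 << 0


def wrap_delta(a, b):
    """Shortest signed step from a to b when 0 and 1 coincide."""
    d = b - a
    if d > 0.5:
        d -= 1.0
    elif d < -0.5:
        d += 1.0
    return d


def interpolate(v0, v1, t, wrap_mask):
    """Linearly interpolate two 4-component values, wrapping flagged components."""
    out = []
    for i, (a, b) in enumerate(zip(v0, v1)):
        d = wrap_delta(a, b) if wrap_mask & (1 << i) else b - a
        out.append(a + t * d)
    return out


# X crosses the 0/1 seam (0.9 -> 1.0 -> 0.1); Y takes the long way through 0.5.
print(interpolate((0.9, 0.9, 0.0, 0.0), (0.1, 0.1, 0.0, 0.0), 0.5,
                  TGSI_CYLINDRICAL_WRAP_X))
```

Note that the wrapped result is not folded back into the [0, 1) range here; whether that happens is left to the consumer of the interpolated value.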
2954 Declaration Sampler View
2955 ^^^^^^^^^^^^^^^^^^^^^^^^
2957 Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
2959 DCL SVIEW[#], resource, type(s)
2961 Declares a shader input sampler view and assigns it to a SVIEW[#]
2964 resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
2966 type must be 1 or 4 entries (if specifying on a per-component
2967 level) out of UNORM, SNORM, SINT, UINT and FLOAT.
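
The declaration syntax and its constraints can be sketched as a small formatter. The helper name is hypothetical, and the allowed resource and type lists simply mirror the text above rather than any driver's actual parser.

```python
SVIEW_RESOURCES = ("BUFFER", "1D", "2D", "3D", "1DArray", "2DArray")
SVIEW_TYPES = ("UNORM", "SNORM", "SINT", "UINT", "FLOAT")


def format_sview_decl(index, resource, types):
    """Build a 'DCL SVIEW[#], resource, type(s)' line after basic validation."""
    if resource not in SVIEW_RESOURCES:
        raise ValueError("unknown resource: %r" % (resource,))
    # Either one type for all components, or one type per component.
    if len(types) not in (1, 4):
        raise ValueError("type must have 1 or 4 entries")
    for t in types:
        if t not in SVIEW_TYPES:
            raise ValueError("unknown type: %r" % (t,))
    return "DCL SVIEW[%d], %s, %s" % (index, resource, ", ".join(types))


print(format_sview_decl(0, "2D", ["FLOAT"]))
print(format_sview_decl(1, "2DArray", ["UINT", "UINT", "UINT", "UINT"]))
```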
2969 For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
2970 which take an explicit SVIEW[#] source register), SVIEW[#] declarations
2971 are optional. If they are present, the SVIEW index is implied by the
2972 SAMP index, and there must be a corresponding SVIEW[#] declaration for
2973 each SAMP[#] declaration. Drivers are free to ignore this if they wish.
2974 But note in particular that some drivers need to know the sampler type
2975 (float/int/unsigned) in order to generate the correct code, so in cases
2976 where integer textures are sampled, SVIEW[#] declarations should be
2979 NOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
2982 Declaration Resource
2983 ^^^^^^^^^^^^^^^^^^^^
2985 Follows Declaration token if file is TGSI_FILE_RESOURCE.
2987 DCL RES[#], resource [, WR] [, RAW]
2989 Declares a shader input resource and assigns it to a RES[#]
2992 resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
2995 If the RAW keyword is not specified, the texture data will be
2996 subject to conversion, swizzling and scaling as required to yield
2997 the specified data type from the physical data format of the bound
3000 If the RAW keyword is specified, no channel conversion will be
3001 performed: the values read for each of the channels (X,Y,Z,W) will
3002 correspond to consecutive words in the same order and format
3003 they're found in memory. No element-to-address conversion will be
3004 performed either: the value of the provided X coordinate will be
3005 interpreted in byte units instead of texel units. The result of
3006 accessing a misaligned address is undefined.
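
The RAW addressing rule can be illustrated with a sketch that reads four consecutive words starting at a byte offset. The 32-bit little-endian word size, the 4-byte alignment requirement, and the function name are assumptions of this example; since the result of a misaligned access is undefined, the sketch chooses to raise an error.

```python
import struct

WORD_SIZE = 4  # assumption: channels are 32-bit words


def load_raw(memory, x_bytes):
    """Read channels (X, Y, Z, W) as consecutive words at byte offset x_bytes."""
    if x_bytes % WORD_SIZE != 0:
        # Undefined in TGSI; this sketch rejects it explicitly.
        raise ValueError("misaligned address")
    # No format conversion: the words are returned exactly as stored.
    return struct.unpack_from("<4I", memory, x_bytes)


# Eight words holding 10..17; byte offset 16 selects words 4..7.
memory = struct.pack("<8I", *range(10, 18))
print(load_raw(memory, 16))
```

The X coordinate here is a byte offset, not a texel index, which is the key difference from a non-RAW access.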
3008 Usage of the STORE opcode is only allowed if the WR (writable) flag
3013 ^^^^^^^^^^^^^^^^^^^^^^^^
3015 Properties are general directives that apply to the whole TGSI program.
3020 Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
3021 The default value is UPPER_LEFT.
3023 If UPPER_LEFT, the position will be (0,0) at the upper left corner and
3024 increase downward and rightward.
3025 If LOWER_LEFT, the position will be (0,0) at the lower left corner and
3026 increase upward and rightward.
3028 OpenGL defaults to LOWER_LEFT, and is configurable with the
3029 GL_ARB_fragment_coord_conventions extension.
3031 DirectX 9/10 use UPPER_LEFT.
3033 FS_COORD_PIXEL_CENTER
3034 """""""""""""""""""""
3036 Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
3037 The default value is HALF_INTEGER.
3039 If HALF_INTEGER, the fractional part of the position will be 0.5.
3040 If INTEGER, the fractional part of the position will be 0.0.
3042 Note that this does not affect the set of fragments generated by
3043 rasterization, which is instead controlled by half_pixel_center in the
3046 OpenGL defaults to HALF_INTEGER, and is configurable with the
3047 GL_ARB_fragment_coord_conventions extension.
3049 DirectX 9 uses INTEGER.
3050 DirectX 10 uses HALF_INTEGER.
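
The combined effect of FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER on the position seen by the fragment shader can be sketched as follows. The function and its parameters are hypothetical; pixels are identified by column and by row counted from the top of the framebuffer.

```python
def fragment_position(col, row_from_top, fb_height,
                      origin="UPPER_LEFT", pixel_center="HALF_INTEGER"):
    """Return the (x, y) of TGSI_SEMANTIC_POSITION for one pixel."""
    if origin == "UPPER_LEFT":
        y = row_from_top                  # y increases downward
    elif origin == "LOWER_LEFT":
        y = fb_height - 1 - row_from_top  # y increases upward
    else:
        raise ValueError("unknown origin: %r" % (origin,))

    if pixel_center == "HALF_INTEGER":
        offset = 0.5
    elif pixel_center == "INTEGER":
        offset = 0.0
    else:
        raise ValueError("unknown pixel center: %r" % (pixel_center,))

    return (col + offset, y + offset)


# Top-left pixel of a 480-row framebuffer under both origin conventions.
print(fragment_position(0, 0, 480))
print(fragment_position(0, 0, 480, origin="LOWER_LEFT", pixel_center="INTEGER"))
```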
3052 FS_COLOR0_WRITES_ALL_CBUFS
3053 """"""""""""""""""""""""""
3054 Specifies that writes to the fragment shader color 0 are replicated to all
3055 bound cbufs. This models OpenGL's gl_FragColor output versus gl_FragData[0]:
3056 gl_FragData[0] is directed to a single color buffer, but gl_FragColor is broadcast.
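
A minimal sketch of the routing this property implies, assuming fragment outputs are given as a mapping from color output index to an RGBA tuple. The function name and data layout are illustrative only.

```python
def route_fragment_outputs(colors, num_cbufs, color0_writes_all_cbufs):
    """Return the color written to each bound cbuf (None if not written)."""
    if color0_writes_all_cbufs:
        # Color 0 is replicated to every bound color buffer.
        return [colors[0]] * num_cbufs
    # Otherwise output i goes to cbuf i only.
    return [colors.get(i) for i in range(num_cbufs)]


red = (1.0, 0.0, 0.0, 1.0)
print(route_fragment_outputs({0: red}, 3, True))
print(route_fragment_outputs({0: red}, 3, False))
```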
3059 """"""""""""""""""""""""""
3060 If this property is set on the program bound to the shader stage before the
3061 fragment shader, user clip planes should have no effect (be disabled) even if
3062 that shader does not write to any clip distance outputs and the rasterizer's
3063 clip_plane_enable is non-zero.
3064 This property is only supported by drivers that also support shader clip
3066 This is useful for APIs that don't have UCPs and where clip distances written
3067 by a shader cannot be disabled.
3072 Specifies the number of times a geometry shader should be executed for each
3073 input primitive. Each invocation will have a different
3074 TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
3077 VS_WINDOW_SPACE_POSITION
3078 """"""""""""""""""""""""""
3079 If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
3080 is assumed to contain window space coordinates.
3081 Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
3082 directly taken from the fourth component of the shader output.
3083 Naturally, clipping is not performed on window coordinates either.
3084 The effect of this property is undefined if a geometry or tessellation shader
3090 The number of vertices written by the tessellation control shader. This
3091 effectively defines the patch input size of the tessellation evaluation shader
3097 This sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``,
3098 ``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no
3099 separate isolines setting; regular lines are assumed to mean isolines.)
3104 This sets the spacing mode of the tessellation generator, one of
3105 ``PIPE_TESS_SPACING_*``.
3110 This sets the vertex order to be clockwise if the value is 1, or
3111 counter-clockwise if set to 0.
3116 If set to a non-zero value, this turns on point mode for the tessellator,
3117 which means that points will be generated instead of primitives.
3120 Texture Sampling and Texture Formats
3121 ------------------------------------
3123 This table shows how texture image components are returned as (x,y,z,w) tuples
3124 by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
3125 :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
3128 +--------------------+--------------+--------------------+--------------+
3129 | Texture Components | Gallium | OpenGL | Direct3D 9 |
3130 +====================+==============+====================+==============+
3131 | R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) |
3132 +--------------------+--------------+--------------------+--------------+
3133 | RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) |
3134 +--------------------+--------------+--------------------+--------------+
3135 | RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) |
3136 +--------------------+--------------+--------------------+--------------+
3137 | RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) |
3138 +--------------------+--------------+--------------------+--------------+
3139 | A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) |
3140 +--------------------+--------------+--------------------+--------------+
3141 | L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) |
3142 +--------------------+--------------+--------------------+--------------+
3143 | LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) |
3144 +--------------------+--------------+--------------------+--------------+
3145 | I | (i, i, i, i) | (i, i, i, i) | N/A |
3146 +--------------------+--------------+--------------------+--------------+
3147 | UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) |
3148 | | | [#envmap-bumpmap]_ | |
3149 +--------------------+--------------+--------------------+--------------+
3150 | Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) |
3151 | | | [#depth-tex-mode]_ | |
3152 +--------------------+--------------+--------------------+--------------+
3153 | S | (s, s, s, s) | unknown | unknown |
3154 +--------------------+--------------+--------------------+--------------+
3156 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
3157 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
3158 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
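
The Gallium column of the table can be expressed as a swizzle lookup. This is a sketch only: the dictionary, the function, and the texel representation are hypothetical, and the UV and Z rows are omitted because the table marks them as TBD.

```python
# One swizzle string per texture format: a letter selects a stored channel,
# "0" and "1" are constants. Taken from the Gallium column of the table.
GALLIUM_SWIZZLE = {
    "R": "r001",
    "RG": "rg01",
    "RGB": "rgb1",
    "RGBA": "rgba",
    "A": "000a",
    "L": "lll1",
    "LA": "llla",
    "I": "iiii",
    "S": "ssss",
}


def sample_result(fmt, texel):
    """Expand a texel (dict of stored channels) into an (x, y, z, w) tuple."""
    out = []
    for ch in GALLIUM_SWIZZLE[fmt]:
        if ch == "0":
            out.append(0.0)
        elif ch == "1":
            out.append(1.0)
        else:
            out.append(texel[ch])
    return tuple(out)


print(sample_result("RG", {"r": 0.25, "g": 0.5}))
print(sample_result("LA", {"l": 0.75, "a": 0.5}))
```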