4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5 for describing shaders. Since Gallium is inherently shaderful, shaders are
6 an important part of the API. TGSI is the only intermediate representation
12 From GL_NV_vertex_program
13 ^^^^^^^^^^^^^^^^^^^^^^^^^
16 ARL - Address Register Load
20 dst.x = \lfloor src.x\rfloor
22 dst.y = \lfloor src.y\rfloor
24 dst.z = \lfloor src.z\rfloor
26 dst.w = \lfloor src.w\rfloor
42 LIT - Light Coefficients
50 dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0
59 dst.x = \frac{1}{src.x}
61 dst.y = \frac{1}{src.x}
63 dst.z = \frac{1}{src.x}
65 dst.w = \frac{1}{src.x}
68 RSQ - Reciprocal Square Root
72 dst.x = \frac{1}{\sqrt{|src.x|}}
74 dst.y = \frac{1}{\sqrt{|src.x|}}
76 dst.z = \frac{1}{\sqrt{|src.x|}}
78 dst.w = \frac{1}{\sqrt{|src.x|}}
81 EXP - Approximate Exponential Base 2
85 dst.x = 2^{\lfloor src.x\rfloor}
87 dst.y = src.x - \lfloor src.x\rfloor
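The two components listed above split ``src.x`` into an integer part and a fractional part, so ``dst.x`` multiplied by two to the power ``dst.y`` recovers two to the power ``src.x``. A minimal Python sketch of just these two components follows; the function name ``exp_parts`` is hypothetical and not part of TGSI.

```python
import math

def exp_parts(x):
    """Return (dst.x, dst.y) of EXP for a scalar src.x, as listed above."""
    whole = math.floor(x)
    # dst.x = 2^floor(src.x), dst.y = src.x - floor(src.x)
    return 2.0 ** whole, x - whole

dst_x, dst_y = exp_parts(2.5)
# Recombining the two parts gives 2^src.x.
recombined = dst_x * 2.0 ** dst_y
```

Note that the fractional part is always in ``[0, 1)``, including for negative inputs, because the floor rounds toward negative infinity.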
94 LOG - Approximate Logarithm Base 2
98 dst.x = \lfloor\log_2{|src.x|}\rfloor
100 dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
102 dst.z = \log_2{|src.x|}
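The three components listed above amount to an exponent and mantissa decomposition of ``|src.x|``. The following Python sketch covers only those three components, assumes a non-zero input, and uses the hypothetical name ``log_parts``.

```python
import math

def log_parts(x):
    """Return (dst.x, dst.y, dst.z) of LOG for a non-zero scalar src.x."""
    a = abs(x)
    exponent = math.floor(math.log2(a))
    # dst.x = floor(log2|x|), dst.y = |x| / 2^floor(log2|x|), dst.z = log2|x|
    return float(exponent), a / 2.0 ** exponent, math.log2(a)

dst_x, dst_y, dst_z = log_parts(-12.0)
```

For any non-zero input, ``dst.y`` lies in ``[1, 2)`` and ``dst.x + log2(dst.y)`` equals ``dst.z`` up to rounding.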
111 dst.x = src0.x \times src1.x
113 dst.y = src0.y \times src1.y
115 dst.z = src0.z \times src1.z
117 dst.w = src0.w \times src1.w
124 dst.x = src0.x + src1.x
126 dst.y = src0.y + src1.y
128 dst.z = src0.z + src1.z
130 dst.w = src0.w + src1.w
133 DP3 - 3-component Dot Product
137 dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
139 dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
141 dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
143 dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
146 DP4 - 4-component Dot Product
150 dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
152 dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
154 dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
156 dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
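Both dot products compute one scalar and write it to every destination channel. A minimal Python sketch, assuming registers are modelled as 4-tuples of numbers (the helper names ``dp3`` and ``dp4`` are illustrative only):

```python
def dp3(src0, src1):
    """DP3: dot product of the x, y, z channels, replicated to all channels."""
    s = src0[0] * src1[0] + src0[1] * src1[1] + src0[2] * src1[2]
    return (s, s, s, s)

def dp4(src0, src1):
    """DP4: dot product of all four channels, replicated to all channels."""
    s = dp3(src0, src1)[0] + src0[3] * src1[3]
    return (s, s, s, s)

a = (1.0, 2.0, 3.0, 4.0)
b = (5.0, 6.0, 7.0, 8.0)
three = dp3(a, b)
four = dp4(a, b)
```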
159 DST - Distance Vector
165 dst.y = src0.y \times src1.y
176 dst.x = min(src0.x, src1.x)
178 dst.y = min(src0.y, src1.y)
180 dst.z = min(src0.z, src1.z)
182 dst.w = min(src0.w, src1.w)
189 dst.x = max(src0.x, src1.x)
191 dst.y = max(src0.y, src1.y)
193 dst.z = max(src0.z, src1.z)
195 dst.w = max(src0.w, src1.w)
198 SLT - Set On Less Than
202 dst.x = (src0.x < src1.x) ? 1 : 0
204 dst.y = (src0.y < src1.y) ? 1 : 0
206 dst.z = (src0.z < src1.z) ? 1 : 0
208 dst.w = (src0.w < src1.w) ? 1 : 0
211 SGE - Set On Greater Equal Than
215 dst.x = (src0.x >= src1.x) ? 1 : 0
217 dst.y = (src0.y >= src1.y) ? 1 : 0
219 dst.z = (src0.z >= src1.z) ? 1 : 0
221 dst.w = (src0.w >= src1.w) ? 1 : 0
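SLT and SGE are per-channel comparisons whose results can be used as masks. The Python sketch below models registers as 4-tuples and stores the results as floats, which is an assumption of this example; the names ``slt`` and ``sge`` are illustrative.

```python
def slt(src0, src1):
    """SLT: 1 where src0 < src1, else 0, per channel."""
    return tuple(1.0 if a < b else 0.0 for a, b in zip(src0, src1))

def sge(src0, src1):
    """SGE: 1 where src0 >= src1, else 0, per channel."""
    return tuple(1.0 if a >= b else 0.0 for a, b in zip(src0, src1))

a = (1.0, 2.0, 3.0, 4.0)
b = (2.0, 2.0, 2.0, 2.0)
less = slt(a, b)
not_less = sge(a, b)
```

As long as neither operand is NaN, the two results are complementary: their per-channel sum is 1.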
224 MAD - Multiply And Add
228 dst.x = src0.x \times src1.x + src2.x
230 dst.y = src0.y \times src1.y + src2.y
232 dst.z = src0.z \times src1.z + src2.z
234 dst.w = src0.w \times src1.w + src2.w
241 dst.x = src0.x - src1.x
243 dst.y = src0.y - src1.y
245 dst.z = src0.z - src1.z
247 dst.w = src0.w - src1.w
250 LRP - Linear Interpolate
254 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
256 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
258 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
260 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
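In LRP the first source is the blend weight: a weight of 1 selects ``src1`` and a weight of 0 selects ``src2``. A short Python sketch, modelling registers as 4-tuples (the name ``lrp`` is illustrative):

```python
def lrp(src0, src1, src2):
    """LRP: per-channel src0 * src1 + (1 - src0) * src2."""
    return tuple(w * a + (1.0 - w) * b for w, a, b in zip(src0, src1, src2))

weights = (0.0, 1.0, 0.5, 0.25)
blended = lrp(weights, (10.0,) * 4, (20.0,) * 4)
```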
267 dst.x = (src2.x > 0.5) ? src0.x : src1.x
269 dst.y = (src2.y > 0.5) ? src0.y : src1.y
271 dst.z = (src2.z > 0.5) ? src0.z : src1.z
273 dst.w = (src2.w > 0.5) ? src0.w : src1.w
276 DP2A - 2-component Dot Product And Add
280 dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
282 dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
284 dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
286 dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
293 dst.x = src.x - \lfloor src.x\rfloor
295 dst.y = src.y - \lfloor src.y\rfloor
297 dst.z = src.z - \lfloor src.z\rfloor
299 dst.w = src.w - \lfloor src.w\rfloor
306 dst.x = clamp(src0.x, src1.x, src2.x)
308 dst.y = clamp(src0.y, src1.y, src2.y)
310 dst.z = clamp(src0.z, src1.z, src2.z)
312 dst.w = clamp(src0.w, src1.w, src2.w)
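The per-channel clamp can be sketched in Python using the definition of ``clamp(x,y,z)`` given in the symbols section, ``(x < y) ? y : (x > z) ? z : x``. Registers are modelled as 4-tuples, and the name ``clamp_op`` is hypothetical.

```python
def clamp(x, lo, hi):
    """clamp(x, y, z) as defined in the symbols section."""
    return lo if x < lo else hi if x > hi else x

def clamp_op(src0, src1, src2):
    """CLAMP: clamp each channel of src0 between src1 and src2."""
    return tuple(clamp(x, lo, hi) for x, lo, hi in zip(src0, src1, src2))

result = clamp_op((-1.0, 0.5, 2.0, 7.0), (0.0,) * 4, (1.0, 1.0, 1.0, 5.0))
```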
317 This is identical to ARL.
321 dst.x = \lfloor src.x\rfloor
323 dst.y = \lfloor src.y\rfloor
325 dst.z = \lfloor src.z\rfloor
327 dst.w = \lfloor src.w\rfloor
343 EX2 - Exponential Base 2
356 LG2 - Logarithm Base 2
360 dst.x = \log_2{src.x}
362 dst.y = \log_2{src.x}
364 dst.z = \log_2{src.x}
366 dst.w = \log_2{src.x}
373 dst.x = src0.x^{src1.x}
375 dst.y = src0.x^{src1.x}
377 dst.z = src0.x^{src1.x}
379 dst.w = src0.x^{src1.x}
385 dst.x = src0.y \times src1.z - src1.y \times src0.z
387 dst.y = src0.z \times src1.x - src1.z \times src0.x
389 dst.z = src0.x \times src1.y - src1.x \times src0.y
407 RCC - Reciprocal Clamped
409 XXX cleanup on aisle three
413 dst.x = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
415 dst.y = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
417 dst.z = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
419 dst.w = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
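A scalar Python sketch of the clamped reciprocal follows. The two limits are copied verbatim from the formula above, the input is assumed to be non-zero (Python raises ``ZeroDivisionError`` otherwise), and the names ``rcc``, ``RCC_MIN`` and ``RCC_MAX`` are illustrative. The real opcode writes the same scalar to all four channels.

```python
RCC_MIN = 5.42101e-20   # smallest magnitude, as written in the text
RCC_MAX = 1.884467e+19  # largest magnitude, as written in the text

def clamp(x, lo, hi):
    """clamp(x, y, z) as defined in the symbols section."""
    return lo if x < lo else hi if x > hi else x

def rcc(x):
    """RCC for a non-zero scalar: reciprocal with its magnitude clamped."""
    r = 1.0 / x
    if r > 0:
        return clamp(r, RCC_MIN, RCC_MAX)
    return clamp(r, -RCC_MAX, -RCC_MIN)
```

The clamp keeps the sign of the reciprocal, so very small inputs saturate at ``RCC_MAX`` in magnitude and very large inputs saturate at ``RCC_MIN``.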
422 DPH - Homogeneous Dot Product
426 dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
428 dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
430 dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
432 dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
448 DDX - Derivative Relative To X
452 dst.x = partialx(src.x)
454 dst.y = partialx(src.y)
456 dst.z = partialx(src.z)
458 dst.w = partialx(src.w)
461 DDY - Derivative Relative To Y
465 dst.x = partialy(src.x)
467 dst.y = partialy(src.y)
469 dst.z = partialy(src.z)
471 dst.w = partialy(src.w)
474 KILP - Predicated Discard
479 PK2H - Pack Two 16-bit Floats
484 PK2US - Pack Two Unsigned 16-bit Scalars
489 PK4B - Pack Four Signed 8-bit Scalars
494 PK4UB - Pack Four Unsigned 8-bit Scalars
499 RFL - Reflection Vector
503 dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
505 dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
507 dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
511 Considered for removal.
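The RFL formula reflects ``src1`` about the axis given by ``src0``; dividing by the squared length of ``src0`` means the axis does not need to be normalised. The Python sketch below computes only the x, y and z components listed above. The parameter names ``axis`` and ``direction`` are an interpretation, not terms used by this document.

```python
def dot3(a, b):
    """3-component dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def rfl(axis, direction):
    """RFL: x, y, z of 2 * (axis . dir) / (axis . axis) * axis - dir."""
    k = 2.0 * dot3(axis, direction) / dot3(axis, axis)
    return tuple(k * a - d for a, d in zip(axis[:3], direction[:3]))

reflected = rfl((0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
# Scaling the axis leaves the result unchanged.
same = rfl((0.0, 0.0, 2.0), (1.0, 0.0, 1.0))
```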
518 dst.x = (src0.x == src1.x) ? 1 : 0
520 dst.y = (src0.y == src1.y) ? 1 : 0
522 dst.z = (src0.z == src1.z) ? 1 : 0
524 dst.w = (src0.w == src1.w) ? 1 : 0
539 Considered for removal.
541 SGT - Set On Greater Than
545 dst.x = (src0.x > src1.x) ? 1 : 0
547 dst.y = (src0.y > src1.y) ? 1 : 0
549 dst.z = (src0.z > src1.z) ? 1 : 0
551 dst.w = (src0.w > src1.w) ? 1 : 0
567 SLE - Set On Less Equal Than
571 dst.x = (src0.x <= src1.x) ? 1 : 0
573 dst.y = (src0.y <= src1.y) ? 1 : 0
575 dst.z = (src0.z <= src1.z) ? 1 : 0
577 dst.w = (src0.w <= src1.w) ? 1 : 0
580 SNE - Set On Not Equal
584 dst.x = (src0.x != src1.x) ? 1 : 0
586 dst.y = (src0.y != src1.y) ? 1 : 0
588 dst.z = (src0.z != src1.z) ? 1 : 0
590 dst.w = (src0.w != src1.w) ? 1 : 0
611 TXD - Texture Lookup with Derivatives
616 TXP - Projective Texture Lookup
621 UP2H - Unpack Two 16-Bit Floats
625 Considered for removal.
627 UP2US - Unpack Two Unsigned 16-Bit Scalars
631 Considered for removal.
633 UP4B - Unpack Four Signed 8-Bit Values
637 Considered for removal.
639 UP4UB - Unpack Four Unsigned 8-Bit Scalars
643 Considered for removal.
645 X2D - 2D Coordinate Transformation
649 dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y
651 dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w
653 dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y
655 dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
657 Considered for removal.
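One way to read X2D is as a 2D affine transform: ``src2`` holds a 2x2 matrix in row-major order, ``src1.xy`` is the point, and ``src0.xy`` is the translation. That reading is an interpretation of the formulas above, not something this document states. A Python sketch with registers modelled as 4-tuples:

```python
def x2d(src0, src1, src2):
    """X2D: dst.xy = src0.xy + M * src1.xy, with M = src2 read row-major."""
    x = src0[0] + src1[0] * src2[0] + src1[1] * src2[1]
    y = src0[1] + src1[0] * src2[2] + src1[1] * src2[3]
    # The z and w channels repeat x and y respectively.
    return (x, y, x, y)

identity = (1.0, 0.0, 0.0, 1.0)
rotate90 = (0.0, -1.0, 1.0, 0.0)
moved = x2d((10.0, 20.0, 0.0, 0.0), (1.0, 2.0, 0.0, 0.0), identity)
turned = x2d((10.0, 20.0, 0.0, 0.0), (1.0, 2.0, 0.0, 0.0), rotate90)
```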
660 From GL_NV_vertex_program2
661 ^^^^^^^^^^^^^^^^^^^^^^^^^^
664 ARA - Address Register Add
668 Considered for removal.
670 ARR - Address Register Load With Round
687 Considered for removal.
689 CAL - Subroutine Call
695 RET - Subroutine Call Return
699 Potential restrictions:
700 * Only occurs at end of function.
706 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
708 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
710 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
712 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
719 dst.x = (src0.x < 0) ? src1.x : src2.x
721 dst.y = (src0.y < 0) ? src1.y : src2.y
723 dst.z = (src0.z < 0) ? src1.z : src2.z
725 dst.w = (src0.w < 0) ? src1.w : src2.w
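This per-channel select picks ``src1`` where ``src0`` is negative and ``src2`` otherwise, so a zero in ``src0`` selects ``src2``. A Python sketch with registers modelled as 4-tuples (the name ``cmp_op`` is hypothetical):

```python
def cmp_op(src0, src1, src2):
    """Per channel: src1 where src0 < 0, otherwise src2."""
    return tuple(b if a < 0 else c for a, b, c in zip(src0, src1, src2))

picked = cmp_op((-1.0, 0.0, 1.0, -0.5), (1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0))

# Example use: an absolute value built from the select.
x = (-2.0, 3.0, 0.0, -0.5)
negated = tuple(-v for v in x)
absolute = cmp_op(x, negated, x)
```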
728 KIL - Conditional Discard
732 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
750 TXB - Texture Lookup With Bias
755 NRM - 3-component Vector Normalise
759 dst.x = src.x / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
761 dst.y = src.y / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
763 dst.z = src.z / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
772 dst.x = \frac{src0.x}{src1.x}
774 dst.y = \frac{src0.y}{src1.y}
776 dst.z = \frac{src0.z}{src1.z}
778 dst.w = \frac{src0.w}{src1.w}
781 DP2 - 2-component Dot Product
785 dst.x = src0.x \times src1.x + src0.y \times src1.y
787 dst.y = src0.x \times src1.x + src0.y \times src1.y
789 dst.z = src0.x \times src1.x + src0.y \times src1.y
791 dst.w = src0.x \times src1.x + src0.y \times src1.y
794 TXL - Texture Lookup With LOD
809 BGNFOR - Begin a For-Loop
816 pc = [matching ENDFOR] + 1
819 Note: The destination must be a loop register.
820 The source must be a constant register.
822 Considered for cleanup / removal.
840 ENDFOR - End a For-Loop
842 dst.x = dst.x + dst.z
846 pc = [matching BGNFOR instruction] + 1
849 Note: The destination must be a loop register.
851 Considered for cleanup / removal.
858 PUSHA - Push Address Register On Stack
865 Considered for cleanup / removal.
867 POPA - Pop Address Register From Stack
874 Considered for cleanup / removal.
877 From GL_NV_gpu_program4
878 ^^^^^^^^^^^^^^^^^^^^^^^^
880 Support for these opcodes is indicated by a special pipe capability bit (TBD).
886 dst.x = \lceil src.x\rceil
888 dst.y = \lceil src.y\rceil
890 dst.z = \lceil src.z\rceil
892 dst.w = \lceil src.w\rceil
895 I2F - Integer To Float
899 dst.x = (float) src.x
901 dst.y = (float) src.y
903 dst.z = (float) src.z
905 dst.w = (float) src.w
938 dst.x = src0.x << src1.x
940 dst.y = src0.y << src1.x
942 dst.z = src0.z << src1.x
944 dst.w = src0.w << src1.x
951 dst.x = src0.x >> src1.x
953 dst.y = src0.y >> src1.x
955 dst.z = src0.z >> src1.x
957 dst.w = src0.w >> src1.x
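Unlike the other binary integer operations in this section, both shifts take the shift count from ``src1.x`` for every channel. The Python sketch below illustrates that with non-negative values only. Python integers are unbounded, so register width and the treatment of negative operands are outside what this sketch models. The names ``shl`` and ``shr`` are illustrative.

```python
def shl(src0, src1):
    """Shift every channel of src0 left by src1.x."""
    return tuple(v << src1[0] for v in src0)

def shr(src0, src1):
    """Shift every channel of src0 right by src1.x."""
    return tuple(v >> src1[0] for v in src0)

# Only the first channel of the second operand matters.
left = shl((1, 2, 3, 4), (2, 9, 9, 9))
right = shr((16, 8, 4, 2), (1, 0, 0, 0))
```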
964 dst.x = src0.x & src1.x
966 dst.y = src0.y & src1.y
968 dst.z = src0.z & src1.z
970 dst.w = src0.w & src1.w
977 dst.x = src0.x | src1.x
979 dst.y = src0.y | src1.y
981 dst.z = src0.z | src1.z
983 dst.w = src0.w | src1.w
990 dst.x = src0.x \bmod src1.x
992 dst.y = src0.y \bmod src1.y
994 dst.z = src0.z \bmod src1.z
996 dst.w = src0.w \bmod src1.w
1003 dst.x = src0.x ^ src1.x
1005 dst.y = src0.y ^ src1.y
1007 dst.z = src0.z ^ src1.z
1009 dst.w = src0.w ^ src1.w
1012 SAD - Sum Of Absolute Differences
1016 dst.x = |src0.x - src1.x| + src2.x
1018 dst.y = |src0.y - src1.y| + src2.y
1020 dst.z = |src0.z - src1.z| + src2.z
1022 dst.w = |src0.w - src1.w| + src2.w
1030 TXQ - Texture Size Query
1040 From GL_NV_geometry_program4
1041 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1049 ENDPRIM - End Primitive
1058 BGNLOOP - Begin a Loop
1063 BGNSUB - Begin Subroutine
1068 ENDLOOP - End a Loop
1073 ENDSUB - End Subroutine
1083 NRM4 - 4-component Vector Normalise
1087 dst.x = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
1089 dst.y = \frac{src.y}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
1091 dst.z = \frac{src.z}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
1093 dst.w = \frac{src.w}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
1100 CALLNZ - Subroutine Call If Not Zero
1110 BREAKC - Break Conditional
1115 Explanation of symbols used
1116 ------------------------------
1123 :math:`|x|` Absolute value of `x`.
1125 :math:`\lceil x \rceil` Ceiling of `x`.
1127 clamp(x,y,z) Clamp x between y and z.
1128 (x < y) ? y : (x > z) ? z : x
1130 :math:`\lfloor x\rfloor` Floor of `x`.
1132 :math:`\log_2{x}` Logarithm of `x`, base 2.
1134 max(x,y) Maximum of x and y.
1137 min(x,y) Minimum of x and y.
1140 partialx(x) Derivative of x relative to fragment's X.
1142 partialy(x) Derivative of x relative to fragment's Y.
1144 pop() Pop from stack.
1146 :math:`x^y` `x` to the power `y`.
1148 push(x) Push x on stack.
1152 trunc(x) Truncate x, i.e. drop the fraction bits.
1159 discard Discard fragment.
1161 dst First destination register.
1163 dst0 First destination register.
1167 src First source register.
1169 src0 First source register.
1171 src1 Second source register.
1173 src2 Third source register.
1175 target Label of target instruction.
1182 Declaration Semantic
1183 ^^^^^^^^^^^^^^^^^^^^^^^^
1186 Follows Declaration token if Semantic bit is set.
1188 Since its purpose is to link a shader with other stages of the pipeline,
1189 it may follow only those Declaration tokens that declare a register
1190 in either the INPUT or the OUTPUT file.
1192 SemanticName field contains the semantic name of the register being declared.
1193 There is no default value.
1195 SemanticIndex is an optional subscript that can be used to distinguish
1196 different register declarations with the same semantic name. The default value
1199 The meanings of the individual semantic names are explained in the following
1202 TGSI_SEMANTIC_POSITION
1203 """"""""""""""""""""""
1205 Position, sometimes known as HPOS or WPOS for historical reasons, is the
1206 location of the vertex in space, in ``(x, y, z, w)`` format. ``x``, ``y``, and ``z``
1207 are the Cartesian coordinates, and ``w`` is the homogeneous coordinate, which
1208 is used for the perspective divide, if enabled.
1210 As a vertex shader output, position should be scaled to the viewport. When
1211 used in fragment shaders, position will ---
1213 XXX --- wait a minute. Should position be in [0,1] for x and y?
1215 XXX additionally, is there a way to configure the perspective divide? it's
1216 accelerated on most chipsets AFAIK...
1218 Position, if not specified, usually defaults to ``(0, 0, 0, 1)``, and can
1219 be partially specified as ``(x, y, 0, 1)`` or ``(x, y, z, 1)``.
1221 XXX usually? can we solidify that?
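The default and the two partial forms of position described above can be sketched as a small Python helper. The function name ``expand_position`` is hypothetical, and the sketch simply takes the "usual" defaults from the text at face value.

```python
POSITION_DEFAULT = (0.0, 0.0, 0.0, 1.0)

def expand_position(partial=()):
    """Fill missing position components from the default (0, 0, 0, 1)."""
    partial = tuple(partial)
    # Only the forms named in the text are accepted: nothing, (x, y),
    # (x, y, z), or a full (x, y, z, w).
    if len(partial) not in (0, 2, 3, 4):
        raise ValueError("expected 0, 2, 3 or 4 components")
    return partial + POSITION_DEFAULT[len(partial):]

unspecified = expand_position()
xy_only = expand_position((2.0, 3.0))
xyz_only = expand_position((2.0, 3.0, 4.0))
```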
1226 Colors are used to, well, color the primitives. Colors are always in
1227 ``(r, g, b, a)`` format.
1229 If alpha is not specified, it defaults to 1.
1231 TGSI_SEMANTIC_BCOLOR
1232 """"""""""""""""""""
1234 Back-facing colors are only used for back-facing polygons, and are only valid
1235 in vertex shader outputs. After rasterization, all polygons are front-facing
1236 and COLOR and BCOLOR end up occupying the same slots in the fragment, so
1237 all BCOLORs effectively become regular COLORs in the fragment shader.
1242 The fog coordinate historically has been used to replace the depth coordinate
1243 for generation of fog in dedicated fog blocks. Gallium, however, does not use
1244 dedicated fog acceleration, placing it entirely in the fragment shader
1247 The fog coordinate should be written in ``(f, 0, 0, 1)`` format. Only the first
1248 component matters when writing from the vertex shader; the driver will ensure
1249 that the coordinate is in this format when used as a fragment shader input.
1254 PSIZE, or point size, is used to specify point sizes per-vertex. It should
1255 be in ``(p, n, x, f)`` format, where ``p`` is the point size, ``n`` is the minimum
1256 size, ``x`` is the maximum size, and ``f`` is the fade threshold.
1258 XXX this is arb_vp. is this what we actually do? should double-check...
1260 When using this semantic, be sure to set the appropriate state in the
1261 :ref:`rasterizer` first.
1263 TGSI_SEMANTIC_GENERIC
1264 """""""""""""""""""""
1266 Generic semantics are nearly always used for texture coordinate attributes,
1267 in ``(s, t, r, q)`` format. ``t`` and ``r`` may be unused for certain kinds
1268 of lookups, and ``q`` is the level-of-detail bias for biased sampling.
1270 These attributes are called "generic" because they may be used for anything
1271 else, including parameters, texture generation information, or anything that
1272 can be stored inside a four-component vector.
1274 TGSI_SEMANTIC_NORMAL
1275 """"""""""""""""""""
1277 Vertex normal; could be used to implement per-pixel lighting for legacy APIs
1278 that allow mixing fixed-function and programmable stages.
1283 FACE is the facing bit, to store the facing information for the fragment
1284 shader. ``(f, 0, 0, 1)`` is the format. The first component will be positive
1285 when the fragment is front-facing, and negative when the fragment is
1288 TGSI_SEMANTIC_EDGEFLAG
1289 """"""""""""""""""""""