TGSI Instruction Specification
==============================
==============================

1 Instruction Set Operations
=============================

1.1 GL_NV_vertex_program
-------------------------

1.1.1 ARL - Address Register Load

1.1.3 LIT - Light Coefficients

dst.y = max(src.x, 0.0)
dst.z = (src.x > 0.0) ? pow(max(src.y, 0.0), clamp(src.w, -128.0, 128.0)) : 0.0

1.1.4 RCP - Reciprocal

1.1.5 RSQ - Reciprocal Square Root

dst.x = 1.0 / sqrt(abs(src.x))
dst.y = 1.0 / sqrt(abs(src.x))
dst.z = 1.0 / sqrt(abs(src.x))
dst.w = 1.0 / sqrt(abs(src.x))

1.1.6 EXP - Approximate Exponential Base 2

dst.x = pow(2.0, floor(src.x))
dst.y = src.x - floor(src.x)
dst.z = pow(2.0, src.x)

1.1.7 LOG - Approximate Logarithm Base 2

dst.x = floor(lg2(abs(src.x)))
dst.y = abs(src.x) / pow(2.0, floor(lg2(abs(src.x))))
dst.z = lg2(abs(src.x))
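
As a worked illustration of the EXP and LOG formulas above, the following Python sketch evaluates the three listed components for one sample input each. The function names tgsi_exp and tgsi_log are hypothetical and exist only for this example; the fourth component is not listed above, so the sketch leaves it out.

```python
import math


def tgsi_exp(x):
    """Return (dst.x, dst.y, dst.z) of EXP for a scalar src.x (hypothetical helper)."""
    whole = math.floor(x)
    return (2.0 ** whole, x - whole, 2.0 ** x)


def tgsi_log(x):
    """Return (dst.x, dst.y, dst.z) of LOG for a non-zero scalar src.x (hypothetical helper)."""
    exponent = math.floor(math.log2(abs(x)))
    return (float(exponent), abs(x) / 2.0 ** exponent, math.log2(abs(x)))


ex = tgsi_exp(2.5)     # integer part 2 gives 4.0, fraction is 0.5
lg = tgsi_log(-10.0)   # abs(-10) = 8 * 1.25, so exponent 3 and mantissa 1.25
print(ex, lg)
```

In this sketch dst.x * pow(2.0, dst.y) of EXP reproduces dst.z, and dst.x + lg2(dst.y) of LOG reproduces dst.z, which is what makes the split into two parts useful.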

dst.x = src0.x * src1.x
dst.y = src0.y * src1.y
dst.z = src0.z * src1.z
dst.w = src0.w * src1.w

dst.x = src0.x + src1.x
dst.y = src0.y + src1.y
dst.z = src0.z + src1.z
dst.w = src0.w + src1.w

1.1.10 DP3 - 3-component Dot Product

dst.x = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z
dst.y = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z
dst.z = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z
dst.w = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z

1.1.11 DP4 - 4-component Dot Product

dst.x = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src0.w * src1.w
dst.y = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src0.w * src1.w
dst.z = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src0.w * src1.w
dst.w = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src0.w * src1.w
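
The DP3 and DP4 listings above write the same scalar to all four destination components. A minimal Python sketch of that replication, assuming registers are modelled as 4-tuples in (x, y, z, w) order; the helper names dp3 and dp4 are hypothetical:

```python
def dp3(src0, src1):
    """3-component dot product, replicated to x, y, z and w."""
    d = src0[0] * src1[0] + src0[1] * src1[1] + src0[2] * src1[2]
    return (d, d, d, d)


def dp4(src0, src1):
    """4-component dot product, replicated to x, y, z and w."""
    d = sum(a * b for a, b in zip(src0, src1))
    return (d, d, d, d)


a = (1.0, 2.0, 3.0, 4.0)
b = (5.0, 6.0, 7.0, 8.0)
print(dp3(a, b))  # the w components of a and b are ignored
print(dp4(a, b))
```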

1.1.12 DST - Distance Vector

dst.y = src0.y * src1.y

dst.x = min(src0.x, src1.x)
dst.y = min(src0.y, src1.y)
dst.z = min(src0.z, src1.z)
dst.w = min(src0.w, src1.w)

dst.x = max(src0.x, src1.x)
dst.y = max(src0.y, src1.y)
dst.z = max(src0.z, src1.z)
dst.w = max(src0.w, src1.w)

1.1.15 SLT - Set On Less Than

dst.x = (src0.x < src1.x) ? 1.0 : 0.0
dst.y = (src0.y < src1.y) ? 1.0 : 0.0
dst.z = (src0.z < src1.z) ? 1.0 : 0.0
dst.w = (src0.w < src1.w) ? 1.0 : 0.0

1.1.16 SGE - Set On Greater Than Or Equal

dst.x = (src0.x >= src1.x) ? 1.0 : 0.0
dst.y = (src0.y >= src1.y) ? 1.0 : 0.0
dst.z = (src0.z >= src1.z) ? 1.0 : 0.0
dst.w = (src0.w >= src1.w) ? 1.0 : 0.0

1.1.17 MAD - Multiply And Add

dst.x = src0.x * src1.x + src2.x
dst.y = src0.y * src1.y + src2.y
dst.z = src0.z * src1.z + src2.z
dst.w = src0.w * src1.w + src2.w

1.2 GL_ATI_fragment_shader
---------------------------

dst.x = src0.x - src1.x
dst.y = src0.y - src1.y
dst.z = src0.z - src1.z
dst.w = src0.w - src1.w

1.2.2 DOT3 - 3-component Dot Product

1.2.3 DOT4 - 4-component Dot Product

1.2.4 LERP - Linear Interpolate

dst.x = src0.x * (src1.x - src2.x) + src2.x
dst.y = src0.y * (src1.y - src2.y) + src2.y
dst.z = src0.z * (src1.z - src2.z) + src2.z
dst.w = src0.w * (src1.w - src2.w) + src2.w
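
The LERP formula above uses src0 as a per-component weight: a weight of 1.0 selects src1, a weight of 0.0 selects src2, and values in between blend the two. A small Python sketch under the assumption that registers are 4-tuples; the name lerp is hypothetical:

```python
def lerp(src0, src1, src2):
    """Per-component src0 * (src1 - src2) + src2, as in the LERP listing."""
    return tuple(w * (a - b) + b for w, a, b in zip(src0, src1, src2))


weights = (0.0, 1.0, 0.5, 0.25)
src1 = (10.0, 10.0, 10.0, 10.0)
src2 = (20.0, 20.0, 20.0, 20.0)
print(lerp(weights, src1, src2))  # (20.0, 10.0, 15.0, 17.5)
```

The expression is algebraically the same as src0 * src1 + (1.0 - src0) * src2; the form in the listing needs one multiplication fewer.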

1.2.5 CND - Condition

dst.x = (src2.x > 0.5) ? src0.x : src1.x
dst.y = (src2.y > 0.5) ? src0.y : src1.y
dst.z = (src2.z > 0.5) ? src0.z : src1.z
dst.w = (src2.w > 0.5) ? src0.w : src1.w

1.2.6 CND0 - Condition Zero

Removed. Use (CMP src2, src1, src0) instead.

1.2.7 DOT2ADD - 2-component Dot Product And Add

dst.x = src0.x * src1.x + src0.y * src1.y + src2.x
dst.y = src0.x * src1.x + src0.y * src1.y + src2.x
dst.z = src0.x * src1.x + src0.y * src1.y + src2.x
dst.w = src0.x * src1.x + src0.y * src1.y + src2.x

1.3 GL_EXT_vertex_shader
-------------------------

1.3.1 INDEX - Array Lookup

Considered for removal from language.

1.3.2 NEGATE - Negate

Considered for removal from language.

1.3.3 MADD - Multiply And Add

1.3.4 FRAC - Fraction

dst.x = src.x - floor(src.x)
dst.y = src.y - floor(src.y)
dst.z = src.z - floor(src.z)
dst.w = src.w - floor(src.w)

1.3.5 SETGE - Set On Greater Than Or Equal

1.3.6 SETLT - Set On Less Than

dst.x = clamp(src0.x, src1.x, src2.x)
dst.y = clamp(src0.y, src1.y, src2.y)
dst.z = clamp(src0.z, src1.z, src2.z)
dst.w = clamp(src0.w, src1.w, src2.w)

1.3.10 EXPBASE2 - Exponential Base 2

dst.x = pow(2.0, src.x)
dst.y = pow(2.0, src.x)
dst.z = pow(2.0, src.x)
dst.w = pow(2.0, src.x)

1.3.11 LOGBASE2 - Logarithm Base 2

dst.x = pow(src0.x, src1.x)
dst.y = pow(src0.x, src1.x)
dst.z = pow(src0.x, src1.x)
dst.w = pow(src0.x, src1.x)

1.3.13 RECIP - Reciprocal

1.3.14 RECIPSQRT - Reciprocal Square Root

1.3.15 CROSSPRODUCT - Cross Product

dst.x = src0.y * src1.z - src1.y * src0.z
dst.y = src0.z * src1.x - src1.z * src0.x
dst.z = src0.x * src1.y - src1.x * src0.y
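
A quick way to check the CROSSPRODUCT formulas above is to feed them two basis vectors. The Python sketch below does this for the three listed components only; the helper name crossproduct and the tuple representation are assumptions of this example.

```python
def crossproduct(src0, src1):
    """Return (dst.x, dst.y, dst.z) exactly as written in the listing."""
    return (
        src0[1] * src1[2] - src1[1] * src0[2],
        src0[2] * src1[0] - src1[2] * src0[0],
        src0[0] * src1[1] - src1[0] * src0[1],
    )


x_axis = (1.0, 0.0, 0.0)
y_axis = (0.0, 1.0, 0.0)
print(crossproduct(x_axis, y_axis))  # (0.0, 0.0, 1.0), the z axis
print(crossproduct(y_axis, x_axis))  # swapping the operands flips the sign
```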

1.3.16 MULTIPLYMATRIX - Multiply Matrix

Considered for removal from language.

1.4 GL_NV_vertex_program1_1
----------------------------

1.4.2 RCC - Reciprocal Clamped

dst.x = (1.0 / src.x) > 0.0 ? clamp(1.0 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1.0 / src.x, -1.884467e+019, -5.42101e-020)
dst.y = (1.0 / src.x) > 0.0 ? clamp(1.0 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1.0 / src.x, -1.884467e+019, -5.42101e-020)
dst.z = (1.0 / src.x) > 0.0 ? clamp(1.0 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1.0 / src.x, -1.884467e+019, -5.42101e-020)
dst.w = (1.0 / src.x) > 0.0 ? clamp(1.0 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1.0 / src.x, -1.884467e+019, -5.42101e-020)
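
The RCC expression above is long because it clamps positive and negative reciprocals to separate ranges. The Python sketch below spells out the same logic for one scalar; the names rcc, clamp, RCC_MIN and RCC_MAX are hypothetical, and the sketch assumes a non-zero input because Python raises ZeroDivisionError for 1.0 / 0.0 where hardware would typically produce an infinity.

```python
RCC_MIN = 5.42101e-20   # smallest magnitude allowed by the listing
RCC_MAX = 1.884467e+19  # largest magnitude allowed by the listing


def clamp(x, lo, hi):
    """(x < lo) ? lo : (x > hi) ? hi : x"""
    return lo if x < lo else (hi if x > hi else x)


def rcc(x):
    """Clamped reciprocal of a non-zero scalar, following the RCC listing."""
    r = 1.0 / x
    if r > 0.0:
        return clamp(r, RCC_MIN, RCC_MAX)
    return clamp(r, -RCC_MAX, -RCC_MIN)


print(rcc(2.0))     # in range, returned unchanged
print(rcc(1e30))    # reciprocal too small, raised to RCC_MIN
print(rcc(-1e-30))  # reciprocal too large in magnitude, limited to -RCC_MAX
```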

1.4.3 DPH - Homogeneous Dot Product

dst.x = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w
dst.y = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w
dst.z = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w
dst.w = src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w

1.5 GL_NV_fragment_program
---------------------------

1.5.2 DDX - Derivative Relative To X

dst.x = partialx(src.x)
dst.y = partialx(src.y)
dst.z = partialx(src.z)
dst.w = partialx(src.w)

1.5.3 DDY - Derivative Relative To Y

dst.x = partialy(src.x)
dst.y = partialy(src.y)
dst.z = partialy(src.z)
dst.w = partialy(src.w)

1.5.4 EX2 - Exponential Base 2

1.5.7 KILP - Predicated Discard

1.5.8 LG2 - Logarithm Base 2

1.5.9 LRP - Linear Interpolate

1.5.10 PK2H - Pack Two 16-bit Floats

1.5.11 PK2US - Pack Two Unsigned 16-bit Scalars

1.5.12 PK4B - Pack Four Signed 8-bit Scalars

1.5.13 PK4UB - Pack Four Unsigned 8-bit Scalars

1.5.15 RFL - Reflection Vector

dst.x = 2.0 * (src0.x * src1.x + src0.y * src1.y + src0.z * src1.z) / (src0.x * src0.x + src0.y * src0.y + src0.z * src0.z) * src0.x - src1.x
dst.y = 2.0 * (src0.x * src1.x + src0.y * src1.y + src0.z * src1.z) / (src0.x * src0.x + src0.y * src0.y + src0.z * src0.z) * src0.y - src1.y
dst.z = 2.0 * (src0.x * src1.x + src0.y * src1.y + src0.z * src1.z) / (src0.x * src0.x + src0.y * src0.y + src0.z * src0.z) * src0.z - src1.z
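
Each RFL component above shares the factor 2.0 * dot(src0, src1) / dot(src0, src0), so the formula reflects src1 about the axis src0 without requiring src0 to be unit length. A Python sketch of the three listed components, with the hypothetical helper name rfl and registers modelled as tuples:

```python
def rfl(src0, src1):
    """Return (dst.x, dst.y, dst.z) of RFL; src0 must not be the zero vector."""
    axis, vec = src0[:3], src1[:3]
    dot01 = sum(a * b for a, b in zip(axis, vec))
    dot00 = sum(a * a for a in axis)
    scale = 2.0 * dot01 / dot00  # common factor of all three components
    return tuple(scale * a - b for a, b in zip(axis, vec))


# Reflecting about the z axis (given here with length 2) negates x and y.
print(rfl((0.0, 0.0, 2.0), (1.0, 2.0, 3.0)))  # (-1.0, -2.0, 3.0)
```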

1.5.16 SEQ - Set On Equal

dst.x = (src0.x == src1.x) ? 1.0 : 0.0
dst.y = (src0.y == src1.y) ? 1.0 : 0.0
dst.z = (src0.z == src1.z) ? 1.0 : 0.0
dst.w = (src0.w == src1.w) ? 1.0 : 0.0

1.5.17 SFL - Set On False

1.5.18 SGT - Set On Greater Than

dst.x = (src0.x > src1.x) ? 1.0 : 0.0
dst.y = (src0.y > src1.y) ? 1.0 : 0.0
dst.z = (src0.z > src1.z) ? 1.0 : 0.0
dst.w = (src0.w > src1.w) ? 1.0 : 0.0

1.5.20 SLE - Set On Less Than Or Equal

dst.x = (src0.x <= src1.x) ? 1.0 : 0.0
dst.y = (src0.y <= src1.y) ? 1.0 : 0.0
dst.z = (src0.z <= src1.z) ? 1.0 : 0.0
dst.w = (src0.w <= src1.w) ? 1.0 : 0.0

1.5.21 SNE - Set On Not Equal

dst.x = (src0.x != src1.x) ? 1.0 : 0.0
dst.y = (src0.y != src1.y) ? 1.0 : 0.0
dst.z = (src0.z != src1.z) ? 1.0 : 0.0
dst.w = (src0.w != src1.w) ? 1.0 : 0.0

1.5.22 STR - Set On True

1.5.23 TEX - Texture Lookup

1.5.24 TXD - Texture Lookup With Derivatives

1.5.25 TXP - Projective Texture Lookup

1.5.26 UP2H - Unpack Two 16-bit Floats

1.5.27 UP2US - Unpack Two Unsigned 16-bit Scalars

1.5.28 UP4B - Unpack Four Signed 8-bit Values

1.5.29 UP4UB - Unpack Four Unsigned 8-bit Scalars

1.5.30 X2D - 2D Coordinate Transformation

dst.x = src0.x + src1.x * src2.x + src1.y * src2.y
dst.y = src0.y + src1.x * src2.z + src1.y * src2.w
dst.z = src0.x + src1.x * src2.x + src1.y * src2.y
dst.w = src0.y + src1.x * src2.z + src1.y * src2.w
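
Read as a whole, the X2D listing above treats src2 as a 2x2 matrix stored in (x, y, z, w) order, applies it to the point (src1.x, src1.y), adds the offset (src0.x, src0.y), and repeats the resulting pair in z and w. A Python sketch of that reading; the helper name x2d is hypothetical:

```python
def x2d(src0, src1, src2):
    """Offset plus 2x2 transform; the (x, y) result is repeated in (z, w)."""
    x = src0[0] + src1[0] * src2[0] + src1[1] * src2[1]
    y = src0[1] + src1[0] * src2[2] + src1[1] * src2[3]
    return (x, y, x, y)


offset = (10.0, 20.0, 0.0, 0.0)
point = (1.0, 2.0, 0.0, 0.0)
matrix = (3.0, 4.0, 5.0, 6.0)  # rows (3, 4) and (5, 6)
print(x2d(offset, point, matrix))  # (21.0, 37.0, 21.0, 37.0)
```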

1.6 GL_NV_vertex_program2
--------------------------

1.6.1 ARA - Address Register Add

1.6.2 ARR - Address Register Load With Round

1.6.4 CAL - Subroutine Call

1.6.5 RET - Subroutine Call Return

dst.x = (src.x > 0.0) ? 1.0 : (src.x < 0.0) ? -1.0 : 0.0
dst.y = (src.y > 0.0) ? 1.0 : (src.y < 0.0) ? -1.0 : 0.0
dst.z = (src.z > 0.0) ? 1.0 : (src.z < 0.0) ? -1.0 : 0.0
dst.w = (src.w > 0.0) ? 1.0 : (src.w < 0.0) ? -1.0 : 0.0

1.7 GL_ARB_vertex_program
--------------------------

1.7.1 SWZ - Extended Swizzle

1.7.2 XPD - Cross Product

Alias for CROSSPRODUCT.

1.8 GL_ARB_fragment_program
----------------------------

dst.x = (src0.x < 0.0) ? src1.x : src2.x
dst.y = (src0.y < 0.0) ? src1.y : src2.y
dst.z = (src0.z < 0.0) ? src1.z : src2.z
dst.w = (src0.w < 0.0) ? src1.w : src2.w

1.8.2 KIL - Conditional Discard

if (src.x < 0.0 || src.y < 0.0 || src.z < 0.0 || src.w < 0.0)

1.8.3 SCS - Sine Cosine

1.8.4 TXB - Texture Lookup With Bias

1.9 GL_NV_fragment_program2
----------------------------

1.9.1 NRM - 3-component Vector Normalise

dst.x = src.x / sqrt(src.x * src.x + src.y * src.y + src.z * src.z)
dst.y = src.y / sqrt(src.x * src.x + src.y * src.y + src.z * src.z)
dst.z = src.z / sqrt(src.x * src.x + src.y * src.y + src.z * src.z)

dst.x = src0.x / src1.x
dst.y = src0.y / src1.y
dst.z = src0.z / src1.z
dst.w = src0.w / src1.w

1.9.3 DP2 - 2-component Dot Product

dst.x = src0.x * src1.x + src0.y * src1.y
dst.y = src0.x * src1.x + src0.y * src1.y
dst.z = src0.x * src1.x + src0.y * src1.y
dst.w = src0.x * src1.x + src0.y * src1.y

1.9.4 DP2A - 2-component Dot Product And Add

1.9.5 TXL - Texture Lookup With LOD

1.9.11 ENDIF - End If

1.10 GL_NV_vertex_program3
---------------------------

1.10.1 PUSHA - Push Address Register On Stack

1.10.2 POPA - Pop Address Register From Stack

1.11 GL_NV_gpu_program4
------------------------

1.11.1 CEIL - Ceiling

1.11.2 I2F - Integer To Float

dst.x = (float) src.x
dst.y = (float) src.y
dst.z = (float) src.z
dst.w = (float) src.w

1.11.3 NOT - Bitwise Not

1.11.4 TRUNC - Truncate

1.11.5 SHL - Shift Left

dst.x = src0.x << src1.x
dst.y = src0.y << src1.x
dst.z = src0.z << src1.x
dst.w = src0.w << src1.x

1.11.6 SHR - Shift Right

dst.x = src0.x >> src1.x
dst.y = src0.y >> src1.x
dst.z = src0.z >> src1.x
dst.w = src0.w >> src1.x
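
Unlike the component-wise operations, SHL and SHR above take the shift count from src1.x for every component. The Python sketch below makes that visible by putting unrelated values in the other src1 components. The helper names shl and shr are hypothetical, and Python integers are unbounded, so the sketch does not model register width or overflow.

```python
def shl(src0, src1):
    """Shift every component of src0 left by src1.x."""
    return tuple(v << src1[0] for v in src0)


def shr(src0, src1):
    """Shift every component of src0 right by src1.x."""
    return tuple(v >> src1[0] for v in src0)


values = (1, 2, 3, 4)
count = (2, 99, 99, 99)    # only the x component is read
print(shl(values, count))  # (4, 8, 12, 16)
print(shr((16, 8, 4, 2), (1, 0, 0, 0)))  # (8, 4, 2, 1)
```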

1.11.7 AND - Bitwise And

dst.x = src0.x & src1.x
dst.y = src0.y & src1.y
dst.z = src0.z & src1.z
dst.w = src0.w & src1.w

1.11.8 OR - Bitwise Or

dst.x = src0.x | src1.x
dst.y = src0.y | src1.y
dst.z = src0.z | src1.z
dst.w = src0.w | src1.w

dst.x = src0.x % src1.x
dst.y = src0.y % src1.y
dst.z = src0.z % src1.z
dst.w = src0.w % src1.w

1.11.10 XOR - Bitwise Xor

dst.x = src0.x ^ src1.x
dst.y = src0.y ^ src1.y
dst.z = src0.z ^ src1.z
dst.w = src0.w ^ src1.w

1.11.11 SAD - Sum Of Absolute Differences

dst.x = abs(src0.x - src1.x) + src2.x
dst.y = abs(src0.y - src1.y) + src2.y
dst.z = abs(src0.z - src1.z) + src2.z
dst.w = abs(src0.w - src1.w) + src2.w
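
In the SAD formulas above, src2 acts as a running total to which the absolute difference of src0 and src1 is added, component by component. A short Python sketch with the hypothetical helper name sad:

```python
def sad(src0, src1, src2):
    """Per component: abs(src0 - src1) + src2."""
    return tuple(abs(a - b) + c for a, b, c in zip(src0, src1, src2))


total = (10, 10, 10, 10)
print(sad((1, 5, 3, 0), (4, 2, 3, 7), total))  # (13, 13, 10, 17)
```

Feeding the result back in as src2 for the next pair of inputs accumulates a sum of absolute differences over several pairs.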

1.11.12 TXF - Texel Fetch

1.11.13 TXQ - Texture Size Query

1.11.14 CONT - Continue

1.12 GL_NV_geometry_program4
-----------------------------

1.12.2 ENDPRIM - End Primitive

1.13.1 BGNLOOP - Begin a Loop

1.13.2 BGNSUB - Begin Subroutine

1.13.3 ENDLOOP - End a Loop

1.13.4 ENDSUB - End Subroutine

1.13.5 INT - Truncate

1.13.6 NOISE1 - 1D Noise

1.13.7 NOISE2 - 2D Noise

1.13.8 NOISE3 - 3D Noise

1.13.9 NOISE4 - 4D Noise

1.13.10 NOP - No Operation

1.14.1 TEXKILL - Conditional Discard

1.15.1 TEXLD - Texture Lookup

1.16.1 M4X4 - Multiply Matrix

Alias for MULTIPLYMATRIX.

1.16.2 M4X3 - Multiply Matrix

Considered for removal from language.

1.16.3 M3X4 - Multiply Matrix

Considered for removal from language.

1.16.4 M3X3 - Multiply Matrix

Considered for removal from language.

1.16.5 M3X2 - Multiply Matrix

Considered for removal from language.

1.16.6 CRS - Cross Product

1.16.7 NRM4 - 4-component Vector Normalise

dst.x = src.x / sqrt(src.x * src.x + src.y * src.y + src.z * src.z + src.w * src.w)
dst.y = src.y / sqrt(src.x * src.x + src.y * src.y + src.z * src.z + src.w * src.w)
dst.z = src.z / sqrt(src.x * src.x + src.y * src.y + src.z * src.z + src.w * src.w)
dst.w = src.w / sqrt(src.x * src.x + src.y * src.y + src.z * src.z + src.w * src.w)

1.16.8 SINCOS - Sine Cosine

1.16.9 TEXLDB - Texture Lookup With Bias

1.16.10 DP2ADD - 2-component Dot Product And Add

1.17.1 CALL - Subroutine Call

1.17.2 CALLNZ - Subroutine Call If Not Zero

1.17.5 BREAKC - Break Conditional

1.17.6 DSX - Derivative Relative To X

1.17.7 DSY - Derivative Relative To Y

1.17.8 TEXLDD - Texture Lookup With Derivatives

1.18.1 EXPP - Approximate Exponential Base 2

Use EXP. See also 1.19.3.

1.18.2 LOGP - Logarithm Base 2

Use LOG. See also 1.19.4.

1.19.1 SGN - Set Sign

1.19.2 MOVA - Move Address Register

1.19.3 EXPP - Approximate Exponential Base 2

1.19.4 LOGP - Logarithm Base 2

2 Explanation of symbols used
==============================

abs(x)          Absolute value of x.

ceil(x)         Ceiling of x.

clamp(x,y,z)    Clamp x between y and z.
                (x < y) ? y : (x > z) ? z : x
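
The clamp definition above translates directly into a nested conditional. The Python sketch below mirrors it, with y as the lower bound and z as the upper bound; it is an illustration of the definition, not code taken from an implementation.

```python
def clamp(x, y, z):
    """(x < y) ? y : (x > z) ? z : x"""
    return y if x < y else (z if x > z else x)


print(clamp(-200.0, -128.0, 128.0))  # below the range, gives -128.0
print(clamp(3.5, -128.0, 128.0))     # inside the range, gives 3.5
print(clamp(500.0, -128.0, 128.0))   # above the range, gives 128.0
```

The bounds used here are the ones LIT applies to src.w before passing it to pow.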

floor(x)        Floor of x.

lg2(x)          Logarithm base 2 of x.

max(x,y)        Maximum of x and y.

min(x,y)        Minimum of x and y.

partialx(x)     Derivative of x relative to fragment's X.

partialy(x)     Derivative of x relative to fragment's Y.

pop()           Pop from stack.

pow(x,y)        Raise x to power of y.

push(x)         Push x on stack.

sqrt(x)         Square root of x.

trunc(x)        Truncate x.

discard         Discard fragment.

dst             First destination register.

dst0            First destination register.

src             First source register.

src0            First source register.

src1            Second source register.

src2            Third source register.

target          Label of target instruction.

3.1 Declaration Semantic
-------------------------

Follows a Declaration token if its Semantic bit is set.

Since its purpose is to link a shader with other stages of the pipeline,
it may follow only those Declaration tokens that declare a register
in either the INPUT or the OUTPUT file.

The SemanticName field contains the semantic name of the register being declared.
There is no default value.

The SemanticIndex field is an optional subscript that can be used to distinguish
different register declarations with the same semantic name. The default value

The meanings of the individual semantic names are explained in the following

Valid only in a fragment shader INPUT declaration.

FACE.x is negative when the primitive is back-facing. FACE.x is positive
when the primitive is front-facing.