Merge branch 'gallium-newclear'
[mesa.git] / src / gallium / docs / source / tgsi.rst
1 TGSI
2 ====
3
4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5 for describing shaders. Since Gallium is inherently shaderful, shaders are
6 an important part of the API. TGSI is the only intermediate representation
7 used by all drivers.
8
9 Basics
10 ------
11
12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13 floating-point four-component vectors. An opcode may have up to one
14 destination register, known as *dst*, and between zero and three source
15 registers, called *src0* through *src2*, or simply *src* if there is only
16 one.
17
18 Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19 components as integers. Other instructions permit using registers as
20 two-component vectors with double precision; see :ref:`Double Opcodes`.
21
22 When an instruction has a scalar result, the result is usually copied into
23 each of the components of *dst*. When this happens, the result is said to be
24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
25
26 Instruction Set
27 ---------------
28
29 From GL_NV_vertex_program
30 ^^^^^^^^^^^^^^^^^^^^^^^^^
31
32
33 .. opcode:: ARL - Address Register Load
34
35 .. math::
36
37 dst.x = \lfloor src.x\rfloor
38
39 dst.y = \lfloor src.y\rfloor
40
41 dst.z = \lfloor src.z\rfloor
42
43 dst.w = \lfloor src.w\rfloor
44
45
46 .. opcode:: MOV - Move
47
48 .. math::
49
50 dst.x = src.x
51
52 dst.y = src.y
53
54 dst.z = src.z
55
56 dst.w = src.w
57
58
59 .. opcode:: LIT - Light Coefficients
60
61 .. math::
62
63 dst.x = 1
64
65 dst.y = max(src.x, 0)
66
67 dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0
68
69 dst.w = 1
70
71
72 .. opcode:: RCP - Reciprocal
73
74 This instruction replicates its result.
75
76 .. math::
77
78 dst = \frac{1}{src.x}
79
80
81 .. opcode:: RSQ - Reciprocal Square Root
82
83 This instruction replicates its result.
84
85 .. math::
86
87 dst = \frac{1}{\sqrt{|src.x|}}
88
89
90 .. opcode:: EXP - Approximate Exponential Base 2
91
92 .. math::
93
94 dst.x = 2^{\lfloor src.x\rfloor}
95
96 dst.y = src.x - \lfloor src.x\rfloor
97
98 dst.z = 2^{src.x}
99
100 dst.w = 1
101
102
103 .. opcode:: LOG - Approximate Logarithm Base 2
104
105 .. math::
106
107 dst.x = \lfloor\log_2{|src.x|}\rfloor
108
109 dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
110
111 dst.z = \log_2{|src.x|}
112
113 dst.w = 1
114
115
116 .. opcode:: MUL - Multiply
117
118 .. math::
119
120 dst.x = src0.x \times src1.x
121
122 dst.y = src0.y \times src1.y
123
124 dst.z = src0.z \times src1.z
125
126 dst.w = src0.w \times src1.w
127
128
129 .. opcode:: ADD - Add
130
131 .. math::
132
133 dst.x = src0.x + src1.x
134
135 dst.y = src0.y + src1.y
136
137 dst.z = src0.z + src1.z
138
139 dst.w = src0.w + src1.w
140
141
142 .. opcode:: DP3 - 3-component Dot Product
143
144 This instruction replicates its result.
145
146 .. math::
147
148 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
149
150
151 .. opcode:: DP4 - 4-component Dot Product
152
153 This instruction replicates its result.
154
155 .. math::
156
157 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
158
159
160 .. opcode:: DST - Distance Vector
161
162 .. math::
163
164 dst.x = 1
165
166 dst.y = src0.y \times src1.y
167
168 dst.z = src0.z
169
170 dst.w = src1.w
171
172
173 .. opcode:: MIN - Minimum
174
175 .. math::
176
177 dst.x = min(src0.x, src1.x)
178
179 dst.y = min(src0.y, src1.y)
180
181 dst.z = min(src0.z, src1.z)
182
183 dst.w = min(src0.w, src1.w)
184
185
186 .. opcode:: MAX - Maximum
187
188 .. math::
189
190 dst.x = max(src0.x, src1.x)
191
192 dst.y = max(src0.y, src1.y)
193
194 dst.z = max(src0.z, src1.z)
195
196 dst.w = max(src0.w, src1.w)
197
198
199 .. opcode:: SLT - Set On Less Than
200
201 .. math::
202
203 dst.x = (src0.x < src1.x) ? 1 : 0
204
205 dst.y = (src0.y < src1.y) ? 1 : 0
206
207 dst.z = (src0.z < src1.z) ? 1 : 0
208
209 dst.w = (src0.w < src1.w) ? 1 : 0
210
211
212 .. opcode:: SGE - Set On Greater Equal Than
213
214 .. math::
215
216 dst.x = (src0.x >= src1.x) ? 1 : 0
217
218 dst.y = (src0.y >= src1.y) ? 1 : 0
219
220 dst.z = (src0.z >= src1.z) ? 1 : 0
221
222 dst.w = (src0.w >= src1.w) ? 1 : 0
223
224
225 .. opcode:: MAD - Multiply And Add
226
227 .. math::
228
229 dst.x = src0.x \times src1.x + src2.x
230
231 dst.y = src0.y \times src1.y + src2.y
232
233 dst.z = src0.z \times src1.z + src2.z
234
235 dst.w = src0.w \times src1.w + src2.w
236
237
238 .. opcode:: SUB - Subtract
239
240 .. math::
241
242 dst.x = src0.x - src1.x
243
244 dst.y = src0.y - src1.y
245
246 dst.z = src0.z - src1.z
247
248 dst.w = src0.w - src1.w
249
250
251 .. opcode:: LRP - Linear Interpolate
252
253 .. math::
254
255 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
256
257 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
258
259 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
260
261 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
262
263
264 .. opcode:: CND - Condition
265
266 .. math::
267
268 dst.x = (src2.x > 0.5) ? src0.x : src1.x
269
270 dst.y = (src2.y > 0.5) ? src0.y : src1.y
271
272 dst.z = (src2.z > 0.5) ? src0.z : src1.z
273
274 dst.w = (src2.w > 0.5) ? src0.w : src1.w
275
276
277 .. opcode:: DP2A - 2-component Dot Product And Add
278
279 .. math::
280
281 dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
282
283 dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
284
285 dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
286
287 dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
288
289
290 .. opcode:: FRC - Fraction
291
292 .. math::
293
294 dst.x = src.x - \lfloor src.x\rfloor
295
296 dst.y = src.y - \lfloor src.y\rfloor
297
298 dst.z = src.z - \lfloor src.z\rfloor
299
300 dst.w = src.w - \lfloor src.w\rfloor
301
302
303 .. opcode:: CLAMP - Clamp
304
305 .. math::
306
307 dst.x = clamp(src0.x, src1.x, src2.x)
308
309 dst.y = clamp(src0.y, src1.y, src2.y)
310
311 dst.z = clamp(src0.z, src1.z, src2.z)
312
313 dst.w = clamp(src0.w, src1.w, src2.w)
314
315
316 .. opcode:: FLR - Floor
317
318 This is identical to :opcode:`ARL`.
319
320 .. math::
321
322 dst.x = \lfloor src.x\rfloor
323
324 dst.y = \lfloor src.y\rfloor
325
326 dst.z = \lfloor src.z\rfloor
327
328 dst.w = \lfloor src.w\rfloor
329
330
331 .. opcode:: ROUND - Round
332
333 .. math::
334
335 dst.x = round(src.x)
336
337 dst.y = round(src.y)
338
339 dst.z = round(src.z)
340
341 dst.w = round(src.w)
342
343
344 .. opcode:: EX2 - Exponential Base 2
345
346 This instruction replicates its result.
347
348 .. math::
349
350 dst = 2^{src.x}
351
352
353 .. opcode:: LG2 - Logarithm Base 2
354
355 This instruction replicates its result.
356
357 .. math::
358
359 dst = \log_2{src.x}
360
361
362 .. opcode:: POW - Power
363
364 This instruction replicates its result.
365
366 .. math::
367
368 dst = src0.x^{src1.x}
369
370 .. opcode:: XPD - Cross Product
371
372 .. math::
373
374 dst.x = src0.y \times src1.z - src1.y \times src0.z
375
376 dst.y = src0.z \times src1.x - src1.z \times src0.x
377
378 dst.z = src0.x \times src1.y - src1.x \times src0.y
379
380 dst.w = 1
381
382
383 .. opcode:: ABS - Absolute
384
385 .. math::
386
387 dst.x = |src.x|
388
389 dst.y = |src.y|
390
391 dst.z = |src.z|
392
393 dst.w = |src.w|
394
395
396 .. opcode:: RCC - Reciprocal Clamped
397
398 This instruction replicates its result.
399
400 XXX cleanup on aisle three
401
402 .. math::
403
404 dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
405
406
407 .. opcode:: DPH - Homogeneous Dot Product
408
409 This instruction replicates its result.
410
411 .. math::
412
413 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
414
415
416 .. opcode:: COS - Cosine
417
418 This instruction replicates its result.
419
420 .. math::
421
422 dst = \cos{src.x}
423
424
425 .. opcode:: DDX - Derivative Relative To X
426
427 .. math::
428
429 dst.x = partialx(src.x)
430
431 dst.y = partialx(src.y)
432
433 dst.z = partialx(src.z)
434
435 dst.w = partialx(src.w)
436
437
438 .. opcode:: DDY - Derivative Relative To Y
439
440 .. math::
441
442 dst.x = partialy(src.x)
443
444 dst.y = partialy(src.y)
445
446 dst.z = partialy(src.z)
447
448 dst.w = partialy(src.w)
449
450
451 .. opcode:: KILP - Predicated Discard
452
453 discard
454
455
456 .. opcode:: PK2H - Pack Two 16-bit Floats
457
458 TBD
459
460
461 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
462
463 TBD
464
465
466 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
467
468 TBD
469
470
471 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
472
473 TBD
474
475
476 .. opcode:: RFL - Reflection Vector
477
478 .. math::
479
480 dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
481
482 dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
483
484 dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
485
486 dst.w = 1
487
488 .. note::
489
490 Considered for removal.
491
492
493 .. opcode:: SEQ - Set On Equal
494
495 .. math::
496
497 dst.x = (src0.x == src1.x) ? 1 : 0
498
499 dst.y = (src0.y == src1.y) ? 1 : 0
500
501 dst.z = (src0.z == src1.z) ? 1 : 0
502
503 dst.w = (src0.w == src1.w) ? 1 : 0
504
505
506 .. opcode:: SFL - Set On False
507
508 This instruction replicates its result.
509
510 .. math::
511
512 dst = 0
513
514 .. note::
515
516 Considered for removal.
517
518
519 .. opcode:: SGT - Set On Greater Than
520
521 .. math::
522
523 dst.x = (src0.x > src1.x) ? 1 : 0
524
525 dst.y = (src0.y > src1.y) ? 1 : 0
526
527 dst.z = (src0.z > src1.z) ? 1 : 0
528
529 dst.w = (src0.w > src1.w) ? 1 : 0
530
531
532 .. opcode:: SIN - Sine
533
534 This instruction replicates its result.
535
536 .. math::
537
538 dst = \sin{src.x}
539
540
541 .. opcode:: SLE - Set On Less Equal Than
542
543 .. math::
544
545 dst.x = (src0.x <= src1.x) ? 1 : 0
546
547 dst.y = (src0.y <= src1.y) ? 1 : 0
548
549 dst.z = (src0.z <= src1.z) ? 1 : 0
550
551 dst.w = (src0.w <= src1.w) ? 1 : 0
552
553
554 .. opcode:: SNE - Set On Not Equal
555
556 .. math::
557
558 dst.x = (src0.x != src1.x) ? 1 : 0
559
560 dst.y = (src0.y != src1.y) ? 1 : 0
561
562 dst.z = (src0.z != src1.z) ? 1 : 0
563
564 dst.w = (src0.w != src1.w) ? 1 : 0
565
566
567 .. opcode:: STR - Set On True
568
569 This instruction replicates its result.
570
571 .. math::
572
573 dst = 1
574
575
576 .. opcode:: TEX - Texture Lookup
577
578 TBD
579
580
581 .. opcode:: TXD - Texture Lookup with Derivatives
582
583 TBD
584
585
586 .. opcode:: TXP - Projective Texture Lookup
587
588 TBD
589
590
591 .. opcode:: UP2H - Unpack Two 16-Bit Floats
592
593 TBD
594
595 .. note::
596
597 Considered for removal.
598
599 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
600
601 TBD
602
603 .. note::
604
605 Considered for removal.
606
607 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
608
609 TBD
610
611 .. note::
612
613 Considered for removal.
614
615 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
616
617 TBD
618
619 .. note::
620
621 Considered for removal.
622
623 .. opcode:: X2D - 2D Coordinate Transformation
624
625 .. math::
626
627 dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y
628
629 dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w
630
631 dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y
632
633 dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
634
635 .. note::
636
637 Considered for removal.
638
639
640 From GL_NV_vertex_program2
641 ^^^^^^^^^^^^^^^^^^^^^^^^^^
642
643
644 .. opcode:: ARA - Address Register Add
645
646 TBD
647
648 .. note::
649
650 Considered for removal.
651
652 .. opcode:: ARR - Address Register Load With Round
653
654 .. math::
655
656 dst.x = round(src.x)
657
658 dst.y = round(src.y)
659
660 dst.z = round(src.z)
661
662 dst.w = round(src.w)
663
664
665 .. opcode:: BRA - Branch
666
667 pc = target
668
669 .. note::
670
671 Considered for removal.
672
673 .. opcode:: CAL - Subroutine Call
674
675 push(pc)
676 pc = target
677
678
679 .. opcode:: RET - Subroutine Call Return
680
681 pc = pop()
682
683 Potential restrictions:
684 * Only occurs at end of function.
685
686 .. opcode:: SSG - Set Sign
687
688 .. math::
689
690 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
691
692 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
693
694 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
695
696 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
697
698
699 .. opcode:: CMP - Compare
700
701 .. math::
702
703 dst.x = (src0.x < 0) ? src1.x : src2.x
704
705 dst.y = (src0.y < 0) ? src1.y : src2.y
706
707 dst.z = (src0.z < 0) ? src1.z : src2.z
708
709 dst.w = (src0.w < 0) ? src1.w : src2.w
710
711
712 .. opcode:: KIL - Conditional Discard
713
714 .. math::
715
716 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
717 discard
718 endif
719
720
721 .. opcode:: SCS - Sine Cosine
722
723 .. math::
724
725 dst.x = \cos{src.x}
726
727 dst.y = \sin{src.x}
728
729 dst.z = 0
730
731 dst.y = 1
732
733
734 .. opcode:: TXB - Texture Lookup With Bias
735
736 TBD
737
738
739 .. opcode:: NRM - 3-component Vector Normalise
740
741 .. math::
742
743 dst.x = src.x / (src.x \times src.x + src.y \times src.y + src.z \times src.z)
744
745 dst.y = src.y / (src.x \times src.x + src.y \times src.y + src.z \times src.z)
746
747 dst.z = src.z / (src.x \times src.x + src.y \times src.y + src.z \times src.z)
748
749 dst.w = 1
750
751
752 .. opcode:: DIV - Divide
753
754 .. math::
755
756 dst.x = \frac{src0.x}{src1.x}
757
758 dst.y = \frac{src0.y}{src1.y}
759
760 dst.z = \frac{src0.z}{src1.z}
761
762 dst.w = \frac{src0.w}{src1.w}
763
764
765 .. opcode:: DP2 - 2-component Dot Product
766
767 This instruction replicates its result.
768
769 .. math::
770
771 dst = src0.x \times src1.x + src0.y \times src1.y
772
773
774 .. opcode:: TXL - Texture Lookup With LOD
775
776 TBD
777
778
779 .. opcode:: BRK - Break
780
781 TBD
782
783
784 .. opcode:: IF - If
785
786 TBD
787
788
789 .. opcode:: ELSE - Else
790
791 TBD
792
793
794 .. opcode:: ENDIF - End If
795
796 TBD
797
798
799 .. opcode:: PUSHA - Push Address Register On Stack
800
801 push(src.x)
802 push(src.y)
803 push(src.z)
804 push(src.w)
805
806 .. note::
807
808 Considered for cleanup.
809
810 .. note::
811
812 Considered for removal.
813
814 .. opcode:: POPA - Pop Address Register From Stack
815
816 dst.w = pop()
817 dst.z = pop()
818 dst.y = pop()
819 dst.x = pop()
820
821 .. note::
822
823 Considered for cleanup.
824
825 .. note::
826
827 Considered for removal.
828
829
830 From GL_NV_gpu_program4
831 ^^^^^^^^^^^^^^^^^^^^^^^^
832
833 Support for these opcodes indicated by a special pipe capability bit (TBD).
834
835 .. opcode:: CEIL - Ceiling
836
837 .. math::
838
839 dst.x = \lceil src.x\rceil
840
841 dst.y = \lceil src.y\rceil
842
843 dst.z = \lceil src.z\rceil
844
845 dst.w = \lceil src.w\rceil
846
847
848 .. opcode:: I2F - Integer To Float
849
850 .. math::
851
852 dst.x = (float) src.x
853
854 dst.y = (float) src.y
855
856 dst.z = (float) src.z
857
858 dst.w = (float) src.w
859
860
861 .. opcode:: NOT - Bitwise Not
862
863 .. math::
864
865 dst.x = ~src.x
866
867 dst.y = ~src.y
868
869 dst.z = ~src.z
870
871 dst.w = ~src.w
872
873
874 .. opcode:: TRUNC - Truncate
875
876 .. math::
877
878 dst.x = trunc(src.x)
879
880 dst.y = trunc(src.y)
881
882 dst.z = trunc(src.z)
883
884 dst.w = trunc(src.w)
885
886
887 .. opcode:: SHL - Shift Left
888
889 .. math::
890
891 dst.x = src0.x << src1.x
892
893 dst.y = src0.y << src1.x
894
895 dst.z = src0.z << src1.x
896
897 dst.w = src0.w << src1.x
898
899
900 .. opcode:: SHR - Shift Right
901
902 .. math::
903
904 dst.x = src0.x >> src1.x
905
906 dst.y = src0.y >> src1.x
907
908 dst.z = src0.z >> src1.x
909
910 dst.w = src0.w >> src1.x
911
912
913 .. opcode:: AND - Bitwise And
914
915 .. math::
916
917 dst.x = src0.x & src1.x
918
919 dst.y = src0.y & src1.y
920
921 dst.z = src0.z & src1.z
922
923 dst.w = src0.w & src1.w
924
925
926 .. opcode:: OR - Bitwise Or
927
928 .. math::
929
930 dst.x = src0.x | src1.x
931
932 dst.y = src0.y | src1.y
933
934 dst.z = src0.z | src1.z
935
936 dst.w = src0.w | src1.w
937
938
939 .. opcode:: MOD - Modulus
940
941 .. math::
942
943 dst.x = src0.x \bmod src1.x
944
945 dst.y = src0.y \bmod src1.y
946
947 dst.z = src0.z \bmod src1.z
948
949 dst.w = src0.w \bmod src1.w
950
951
952 .. opcode:: XOR - Bitwise Xor
953
954 .. math::
955
956 dst.x = src0.x \oplus src1.x
957
958 dst.y = src0.y \oplus src1.y
959
960 dst.z = src0.z \oplus src1.z
961
962 dst.w = src0.w \oplus src1.w
963
964
965 .. opcode:: SAD - Sum Of Absolute Differences
966
967 .. math::
968
969 dst.x = |src0.x - src1.x| + src2.x
970
971 dst.y = |src0.y - src1.y| + src2.y
972
973 dst.z = |src0.z - src1.z| + src2.z
974
975 dst.w = |src0.w - src1.w| + src2.w
976
977
978 .. opcode:: TXF - Texel Fetch
979
980 TBD
981
982
983 .. opcode:: TXQ - Texture Size Query
984
985 TBD
986
987
988 .. opcode:: CONT - Continue
989
990 TBD
991
992
993 From GL_NV_geometry_program4
994 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
995
996
997 .. opcode:: EMIT - Emit
998
999 TBD
1000
1001
1002 .. opcode:: ENDPRIM - End Primitive
1003
1004 TBD
1005
1006
1007 From GLSL
1008 ^^^^^^^^^^
1009
1010
1011 .. opcode:: BGNLOOP - Begin a Loop
1012
1013 TBD
1014
1015
1016 .. opcode:: BGNSUB - Begin Subroutine
1017
1018 TBD
1019
1020
1021 .. opcode:: ENDLOOP - End a Loop
1022
1023 TBD
1024
1025
1026 .. opcode:: ENDSUB - End Subroutine
1027
1028 TBD
1029
1030
1031 .. opcode:: NOP - No Operation
1032
1033 Do nothing.
1034
1035
1036 .. opcode:: NRM4 - 4-component Vector Normalise
1037
1038 This instruction replicates its result.
1039
1040 .. math::
1041
1042 dst = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}
1043
1044
1045 ps_2_x
1046 ^^^^^^^^^^^^
1047
1048
1049 .. opcode:: CALLNZ - Subroutine Call If Not Zero
1050
1051 TBD
1052
1053
1054 .. opcode:: IFC - If
1055
1056 TBD
1057
1058
1059 .. opcode:: BREAKC - Break Conditional
1060
1061 TBD
1062
1063 .. _doubleopcodes:
1064
1065 Double Opcodes
1066 ^^^^^^^^^^^^^^^
1067
1068 .. opcode:: DADD - Add Double
1069
1070 .. math::
1071
1072 dst.xy = src0.xy + src1.xy
1073
1074 dst.zw = src0.zw + src1.zw
1075
1076
1077 .. opcode:: DDIV - Divide Double
1078
1079 .. math::
1080
1081 dst.xy = src0.xy / src1.xy
1082
1083 dst.zw = src0.zw / src1.zw
1084
1085 .. opcode:: DSEQ - Set Double on Equal
1086
1087 .. math::
1088
1089 dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F
1090
1091 dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F
1092
1093 .. opcode:: DSLT - Set Double on Less than
1094
1095 .. math::
1096
1097 dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F
1098
1099 dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F
1100
1101 .. opcode:: DFRAC - Double Fraction
1102
1103 .. math::
1104
1105 dst.xy = src.xy - \lfloor src.xy\rfloor
1106
1107 dst.zw = src.zw - \lfloor src.zw\rfloor
1108
1109
1110 .. opcode:: DFRACEXP - Convert Double Number to Fractional and Integral Components
1111
1112 .. math::
1113
1114 dst0.xy = frexp(src.xy, dst1.xy)
1115
1116 dst0.zw = frexp(src.zw, dst1.zw)
1117
1118 .. opcode:: DLDEXP - Multiple Double Number by Integral Power of 2
1119
1120 .. math::
1121
1122 dst.xy = ldexp(src0.xy, src1.xy)
1123
1124 dst.zw = ldexp(src0.zw, src1.zw)
1125
1126 .. opcode:: DMIN - Minimum Double
1127
1128 .. math::
1129
1130 dst.xy = min(src0.xy, src1.xy)
1131
1132 dst.zw = min(src0.zw, src1.zw)
1133
1134 .. opcode:: DMAX - Maximum Double
1135
1136 .. math::
1137
1138 dst.xy = max(src0.xy, src1.xy)
1139
1140 dst.zw = max(src0.zw, src1.zw)
1141
1142 .. opcode:: DMUL - Multiply Double
1143
1144 .. math::
1145
1146 dst.xy = src0.xy \times src1.xy
1147
1148 dst.zw = src0.zw \times src1.zw
1149
1150
1151 .. opcode:: DMAD - Multiply And Add Doubles
1152
1153 .. math::
1154
1155 dst.xy = src0.xy \times src1.xy + src2.xy
1156
1157 dst.zw = src0.zw \times src1.zw + src2.zw
1158
1159
1160 .. opcode:: DRCP - Reciprocal Double
1161
1162 .. math::
1163
1164 dst.xy = \frac{1}{src.xy}
1165
1166 dst.zw = \frac{1}{src.zw}
1167
1168 .. opcode:: DSQRT - Square root double
1169
1170 .. math::
1171
1172 dst.xy = \sqrt{src.xy}
1173
1174 dst.zw = \sqrt{src.zw}
1175
1176
1177 Explanation of symbols used
1178 ------------------------------
1179
1180
1181 Functions
1182 ^^^^^^^^^^^^^^
1183
1184
1185 :math:`|x|` Absolute value of `x`.
1186
1187 :math:`\lceil x \rceil` Ceiling of `x`.
1188
1189 clamp(x,y,z) Clamp x between y and z.
1190 (x < y) ? y : (x > z) ? z : x
1191
1192 :math:`\lfloor x\rfloor` Floor of `x`.
1193
1194 :math:`\log_2{x}` Logarithm of `x`, base 2.
1195
1196 max(x,y) Maximum of x and y.
1197 (x > y) ? x : y
1198
1199 min(x,y) Minimum of x and y.
1200 (x < y) ? x : y
1201
1202 partialx(x) Derivative of x relative to fragment's X.
1203
1204 partialy(x) Derivative of x relative to fragment's Y.
1205
1206 pop() Pop from stack.
1207
1208 :math:`x^y` `x` to the power `y`.
1209
1210 push(x) Push x on stack.
1211
1212 round(x) Round x.
1213
1214 trunc(x) Truncate x, i.e. drop the fraction bits.
1215
1216
1217 Keywords
1218 ^^^^^^^^^^^^^
1219
1220
1221 discard Discard fragment.
1222
1223 pc Program counter.
1224
1225 target Label of target instruction.
1226
1227
1228 Other tokens
1229 ---------------
1230
1231
1232 Declaration
1233 ^^^^^^^^^^^
1234
1235
1236 Declares a register that is will be referenced as an operand in Instruction
1237 tokens.
1238
1239 File field contains register file that is being declared and is one
1240 of TGSI_FILE.
1241
1242 UsageMask field specifies which of the register components can be accessed
1243 and is one of TGSI_WRITEMASK.
1244
1245 Interpolate field is only valid for fragment shader INPUT register files.
1246 It specifes the way input is being interpolated by the rasteriser and is one
1247 of TGSI_INTERPOLATE.
1248
1249 If Dimension flag is set to 1, a Declaration Dimension token follows.
1250
1251 If Semantic flag is set to 1, a Declaration Semantic token follows.
1252
1253 CylindricalWrap bitfield is only valid for fragment shader INPUT register
1254 files. It specifies which register components should be subject to cylindrical
1255 wrapping when interpolating by the rasteriser. If TGSI_CYLINDRICAL_WRAP_X
1256 is set to 1, the X component should be interpolated according to cylindrical
1257 wrapping rules.
1258
1259
1260 Declaration Semantic
1261 ^^^^^^^^^^^^^^^^^^^^^^^^
1262
1263
1264 Follows Declaration token if Semantic bit is set.
1265
1266 Since its purpose is to link a shader with other stages of the pipeline,
1267 it is valid to follow only those Declaration tokens that declare a register
1268 either in INPUT or OUTPUT file.
1269
1270 SemanticName field contains the semantic name of the register being declared.
1271 There is no default value.
1272
1273 SemanticIndex is an optional subscript that can be used to distinguish
1274 different register declarations with the same semantic name. The default value
1275 is 0.
1276
1277 The meanings of the individual semantic names are explained in the following
1278 sections.
1279
1280 TGSI_SEMANTIC_POSITION
1281 """"""""""""""""""""""
1282
1283 Position, sometimes known as HPOS or WPOS for historical reasons, is the
1284 location of the vertex in space, in ``(x, y, z, w)`` format. ``x``, ``y``, and ``z``
1285 are the Cartesian coordinates, and ``w`` is the homogenous coordinate and used
1286 for the perspective divide, if enabled.
1287
1288 As a vertex shader output, position should be scaled to the viewport. When
1289 used in fragment shaders, position will be in window coordinates. The convention
1290 used depends on the FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER properties.
1291
1292 XXX additionally, is there a way to configure the perspective divide? it's
1293 accelerated on most chipsets AFAIK...
1294
1295 Position, if not specified, usually defaults to ``(0, 0, 0, 1)``, and can
1296 be partially specified as ``(x, y, 0, 1)`` or ``(x, y, z, 1)``.
1297
1298 XXX usually? can we solidify that?
1299
1300 TGSI_SEMANTIC_COLOR
1301 """""""""""""""""""
1302
1303 Colors are used to, well, color the primitives. Colors are always in
1304 ``(r, g, b, a)`` format.
1305
1306 If alpha is not specified, it defaults to 1.
1307
1308 TGSI_SEMANTIC_BCOLOR
1309 """"""""""""""""""""
1310
1311 Back-facing colors are only used for back-facing polygons, and are only valid
1312 in vertex shader outputs. After rasterization, all polygons are front-facing
1313 and COLOR and BCOLOR end up occupying the same slots in the fragment, so
1314 all BCOLORs effectively become regular COLORs in the fragment shader.
1315
1316 TGSI_SEMANTIC_FOG
1317 """""""""""""""""
1318
1319 The fog coordinate historically has been used to replace the depth coordinate
1320 for generation of fog in dedicated fog blocks. Gallium, however, does not use
1321 dedicated fog acceleration, placing it entirely in the fragment shader
1322 instead.
1323
1324 The fog coordinate should be written in ``(f, 0, 0, 1)`` format. Only the first
1325 component matters when writing from the vertex shader; the driver will ensure
1326 that the coordinate is in this format when used as a fragment shader input.
1327
1328 TGSI_SEMANTIC_PSIZE
1329 """""""""""""""""""
1330
1331 PSIZE, or point size, is used to specify point sizes per-vertex. It should
1332 be in ``(s, 0, 0, 1)`` format, where ``s`` is the (possibly clamped) point size.
1333 Only the first component matters when writing from the vertex shader.
1334
1335 When using this semantic, be sure to set the appropriate state in the
1336 :ref:`rasterizer` first.
1337
1338 TGSI_SEMANTIC_GENERIC
1339 """""""""""""""""""""
1340
1341 Generic semantics are nearly always used for texture coordinate attributes,
1342 in ``(s, t, r, q)`` format. ``t`` and ``r`` may be unused for certain kinds
1343 of lookups, and ``q`` is the level-of-detail bias for biased sampling.
1344
1345 These attributes are called "generic" because they may be used for anything
1346 else, including parameters, texture generation information, or anything that
1347 can be stored inside a four-component vector.
1348
1349 TGSI_SEMANTIC_NORMAL
1350 """"""""""""""""""""
1351
1352 Vertex normal; could be used to implement per-pixel lighting for legacy APIs
1353 that allow mixing fixed-function and programmable stages.
1354
1355 TGSI_SEMANTIC_FACE
1356 """"""""""""""""""
1357
1358 FACE is the facing bit, to store the facing information for the fragment
1359 shader. ``(f, 0, 0, 1)`` is the format. The first component will be positive
1360 when the fragment is front-facing, and negative when the component is
1361 back-facing.
1362
1363 TGSI_SEMANTIC_EDGEFLAG
1364 """"""""""""""""""""""
1365
1366 XXX no clue
1367
1368
1369 Properties
1370 ^^^^^^^^^^^^^^^^^^^^^^^^
1371
1372
1373 Properties are general directives that apply to the whole TGSI program.
1374
1375 FS_COORD_ORIGIN
1376 """""""""""""""
1377
1378 Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
1379 The default value is UPPER_LEFT.
1380
1381 If UPPER_LEFT, the position will be (0,0) at the upper left corner and
1382 increase downward and rightward.
1383 If LOWER_LEFT, the position will be (0,0) at the lower left corner and
1384 increase upward and rightward.
1385
1386 OpenGL defaults to LOWER_LEFT, and is configurable with the
1387 GL_ARB_fragment_coord_conventions extension.
1388
1389 DirectX 9/10 use UPPER_LEFT.
1390
1391 FS_COORD_PIXEL_CENTER
1392 """""""""""""""""""""
1393
1394 Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
1395 The default value is HALF_INTEGER.
1396
1397 If HALF_INTEGER, the fractionary part of the position will be 0.5
1398 If INTEGER, the fractionary part of the position will be 0.0
1399
1400 Note that this does not affect the set of fragments generated by
1401 rasterization, which is instead controlled by gl_rasterization_rules in the
1402 rasterizer.
1403
1404 OpenGL defaults to HALF_INTEGER, and is configurable with the
1405 GL_ARB_fragment_coord_conventions extension.
1406
1407 DirectX 9 uses INTEGER.
1408 DirectX 10 uses HALF_INTEGER.
1409
1410
1411
1412 Texture Sampling and Texture Formats
1413 ------------------------------------
1414
1415 This table shows how texture image components are returned as (x,y,z,w) tuples
1416 by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
1417 :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
1418 well.
1419
1420 +--------------------+--------------+--------------------+--------------+
1421 | Texture Components | Gallium | OpenGL | Direct3D 9 |
1422 +====================+==============+====================+==============+
1423 | R | XXX TBD | (r, 0, 0, 1) | (r, 1, 1, 1) |
1424 +--------------------+--------------+--------------------+--------------+
1425 | RG | XXX TBD | (r, g, 0, 1) | (r, g, 1, 1) |
1426 +--------------------+--------------+--------------------+--------------+
1427 | RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) |
1428 +--------------------+--------------+--------------------+--------------+
1429 | RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) |
1430 +--------------------+--------------+--------------------+--------------+
1431 | A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) |
1432 +--------------------+--------------+--------------------+--------------+
1433 | L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) |
1434 +--------------------+--------------+--------------------+--------------+
1435 | LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) |
1436 +--------------------+--------------+--------------------+--------------+
1437 | I | (i, i, i, i) | (i, i, i, i) | N/A |
1438 +--------------------+--------------+--------------------+--------------+
1439 | UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) |
1440 | | | [#envmap-bumpmap]_ | |
1441 +--------------------+--------------+--------------------+--------------+
1442 | Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) |
1443 | | | [#depth-tex-mode]_ | |
1444 +--------------------+--------------+--------------------+--------------+
1445
1446 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
1447 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
1448 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.