1 TGSI
2 ====
3
4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5 for describing shaders. Since Gallium is inherently shaderful, shaders are
6 an important part of the API. TGSI is the only intermediate representation
7 used by all drivers.
8
9 Basics
10 ------
11
12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13 floating-point four-component vectors. An opcode may have up to one
14 destination register, known as *dst*, and between zero and three source
15 registers, called *src0* through *src2*, or simply *src* if there is only
16 one.
17
18 Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19 components as integers. Other instructions permit using registers as
20 two-component vectors with double precision; see :ref:`doubleopcodes`.
21
22 When an instruction has a scalar result, the result is usually copied into
23 each of the components of *dst*. When this happens, the result is said to be
24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
25
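As a rough illustration of replication, the following Python sketch models a register as a four-element tuple. The helper names ``replicate`` and ``rcp`` are made up for this example and are not part of any Gallium API.

```python
def replicate(scalar):
    """Copy a scalar result into all four components (x, y, z, w)."""
    return (scalar, scalar, scalar, scalar)


def rcp(src):
    """RCP: reciprocal of src.x, replicated to every component of dst."""
    return replicate(1.0 / src[0])


# Only src.x is read; the other source components are ignored.
print(rcp((4.0, 9.0, 16.0, 25.0)))
```

Whether a write mask then restricts which components of *dst* are actually updated is a separate matter that this sketch does not model.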
26 Instruction Set
27 ---------------
28
29 Core ISA
30 ^^^^^^^^^^^^^^^^^^^^^^^^^
31
32 These opcodes are guaranteed to be available regardless of the driver being
33 used.
34
35 .. opcode:: ARL - Address Register Load
36
37 .. math::
38
39 dst.x = \lfloor src.x\rfloor
40
41 dst.y = \lfloor src.y\rfloor
42
43 dst.z = \lfloor src.z\rfloor
44
45 dst.w = \lfloor src.w\rfloor
46
47
48 .. opcode:: MOV - Move
49
50 .. math::
51
52 dst.x = src.x
53
54 dst.y = src.y
55
56 dst.z = src.z
57
58 dst.w = src.w
59
60
61 .. opcode:: LIT - Light Coefficients
62
63 .. math::
64
65 dst.x = 1
66
67 dst.y = max(src.x, 0)
68
69 dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0
70
71 dst.w = 1
72
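The LIT formulas can be read as the following Python sketch, which treats a register as a four-element tuple. The function names are hypothetical, and edge cases such as zero raised to a negative power are left to Python's own behaviour rather than to anything this document specifies.

```python
def clamp(x, lo, hi):
    """Clamp x to the closed range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def lit(src):
    """LIT: light coefficients computed from src.x, src.y and src.w."""
    x, y, _, w = src
    if x > 0.0:
        # Specular term: max(src.y, 0) raised to the clamped exponent.
        # Note: 0.0 ** negative raises ZeroDivisionError in Python.
        z = max(y, 0.0) ** clamp(w, -128.0, 128.0)
    else:
        z = 0.0
    return (1.0, max(x, 0.0), z, 1.0)


print(lit((0.5, 2.0, 0.0, 3.0)))
```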
73
74 .. opcode:: RCP - Reciprocal
75
76 This instruction replicates its result.
77
78 .. math::
79
80 dst = \frac{1}{src.x}
81
82
83 .. opcode:: RSQ - Reciprocal Square Root
84
85 This instruction replicates its result.
86
87 .. math::
88
89 dst = \frac{1}{\sqrt{|src.x|}}
90
91
92 .. opcode:: EXP - Approximate Exponential Base 2
93
94 .. math::
95
96 dst.x = 2^{\lfloor src.x\rfloor}
97
98 dst.y = src.x - \lfloor src.x\rfloor
99
100 dst.z = 2^{src.x}
101
102 dst.w = 1
103
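A small Python sketch of the EXP components, assuming exact arithmetic rather than whatever approximation a driver might use. The name ``exp_approx`` is chosen for this example only.

```python
import math


def exp_approx(src):
    """EXP: split 2**src.x into an integer-power part and a fractional part."""
    x = src[0]
    floor_x = math.floor(x)
    return (
        2.0 ** floor_x,  # dst.x: 2 raised to the integer part
        x - floor_x,     # dst.y: fractional part, in [0, 1)
        2.0 ** x,        # dst.z: the full result
        1.0,             # dst.w
    )


dst = exp_approx((2.5, 0.0, 0.0, 0.0))
print(dst)
```

The components are related by ``dst.z == dst.x * 2 ** dst.y``, up to rounding.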
104
105 .. opcode:: LOG - Approximate Logarithm Base 2
106
107 .. math::
108
109 dst.x = \lfloor\log_2{|src.x|}\rfloor
110
111 dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
112
113 dst.z = \log_2{|src.x|}
114
115 dst.w = 1
116
117
118 .. opcode:: MUL - Multiply
119
120 .. math::
121
122 dst.x = src0.x \times src1.x
123
124 dst.y = src0.y \times src1.y
125
126 dst.z = src0.z \times src1.z
127
128 dst.w = src0.w \times src1.w
129
130
131 .. opcode:: ADD - Add
132
133 .. math::
134
135 dst.x = src0.x + src1.x
136
137 dst.y = src0.y + src1.y
138
139 dst.z = src0.z + src1.z
140
141 dst.w = src0.w + src1.w
142
143
144 .. opcode:: DP3 - 3-component Dot Product
145
146 This instruction replicates its result.
147
148 .. math::
149
150 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
151
152
153 .. opcode:: DP4 - 4-component Dot Product
154
155 This instruction replicates its result.
156
157 .. math::
158
159 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
160
161
162 .. opcode:: DST - Distance Vector
163
164 .. math::
165
166 dst.x = 1
167
168 dst.y = src0.y \times src1.y
169
170 dst.z = src0.z
171
172 dst.w = src1.w
173
174
175 .. opcode:: MIN - Minimum
176
177 .. math::
178
179 dst.x = min(src0.x, src1.x)
180
181 dst.y = min(src0.y, src1.y)
182
183 dst.z = min(src0.z, src1.z)
184
185 dst.w = min(src0.w, src1.w)
186
187
188 .. opcode:: MAX - Maximum
189
190 .. math::
191
192 dst.x = max(src0.x, src1.x)
193
194 dst.y = max(src0.y, src1.y)
195
196 dst.z = max(src0.z, src1.z)
197
198 dst.w = max(src0.w, src1.w)
199
200
201 .. opcode:: SLT - Set On Less Than
202
203 .. math::
204
205 dst.x = (src0.x < src1.x) ? 1 : 0
206
207 dst.y = (src0.y < src1.y) ? 1 : 0
208
209 dst.z = (src0.z < src1.z) ? 1 : 0
210
211 dst.w = (src0.w < src1.w) ? 1 : 0
212
213
214 .. opcode:: SGE - Set On Greater Than Or Equal
215
216 .. math::
217
218 dst.x = (src0.x >= src1.x) ? 1 : 0
219
220 dst.y = (src0.y >= src1.y) ? 1 : 0
221
222 dst.z = (src0.z >= src1.z) ? 1 : 0
223
224 dst.w = (src0.w >= src1.w) ? 1 : 0
225
226
227 .. opcode:: MAD - Multiply And Add
228
229 .. math::
230
231 dst.x = src0.x \times src1.x + src2.x
232
233 dst.y = src0.y \times src1.y + src2.y
234
235 dst.z = src0.z \times src1.z + src2.z
236
237 dst.w = src0.w \times src1.w + src2.w
238
239
240 .. opcode:: SUB - Subtract
241
242 .. math::
243
244 dst.x = src0.x - src1.x
245
246 dst.y = src0.y - src1.y
247
248 dst.z = src0.z - src1.z
249
250 dst.w = src0.w - src1.w
251
252
253 .. opcode:: LRP - Linear Interpolate
254
255 .. math::
256
257 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
258
259 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
260
261 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
262
263 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
264
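For illustration, LRP written as a Python sketch over four-element tuples (``lrp`` is a hypothetical helper name). Note that *src0* is the per-component weight: a weight of 1 selects *src1* and a weight of 0 selects *src2*.

```python
def lrp(src0, src1, src2):
    """LRP: per-component blend of src1 and src2, weighted by src0."""
    return tuple(
        w * a + (1.0 - w) * b
        for w, a, b in zip(src0, src1, src2)
    )


weights = (0.0, 1.0, 0.5, 0.25)
print(lrp(weights, (10.0,) * 4, (20.0,) * 4))
```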
265
266 .. opcode:: CND - Condition
267
268 .. math::
269
270 dst.x = (src2.x > 0.5) ? src0.x : src1.x
271
272 dst.y = (src2.y > 0.5) ? src0.y : src1.y
273
274 dst.z = (src2.z > 0.5) ? src0.z : src1.z
275
276 dst.w = (src2.w > 0.5) ? src0.w : src1.w
277
278
279 .. opcode:: DP2A - 2-component Dot Product And Add
280
281 .. math::
282
283 dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
284
285 dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
286
287 dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
288
289 dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
290
291
292 .. opcode:: FRC - Fraction
293
294 .. math::
295
296 dst.x = src.x - \lfloor src.x\rfloor
297
298 dst.y = src.y - \lfloor src.y\rfloor
299
300 dst.z = src.z - \lfloor src.z\rfloor
301
302 dst.w = src.w - \lfloor src.w\rfloor
303
304
305 .. opcode:: CLAMP - Clamp
306
307 .. math::
308
309 dst.x = clamp(src0.x, src1.x, src2.x)
310
311 dst.y = clamp(src0.y, src1.y, src2.y)
312
313 dst.z = clamp(src0.z, src1.z, src2.z)
314
315 dst.w = clamp(src0.w, src1.w, src2.w)
316
317
318 .. opcode:: FLR - Floor
319
320 This is identical to :opcode:`ARL`.
321
322 .. math::
323
324 dst.x = \lfloor src.x\rfloor
325
326 dst.y = \lfloor src.y\rfloor
327
328 dst.z = \lfloor src.z\rfloor
329
330 dst.w = \lfloor src.w\rfloor
331
332
333 .. opcode:: ROUND - Round
334
335 .. math::
336
337 dst.x = round(src.x)
338
339 dst.y = round(src.y)
340
341 dst.z = round(src.z)
342
343 dst.w = round(src.w)
344
345
346 .. opcode:: EX2 - Exponential Base 2
347
348 This instruction replicates its result.
349
350 .. math::
351
352 dst = 2^{src.x}
353
354
355 .. opcode:: LG2 - Logarithm Base 2
356
357 This instruction replicates its result.
358
359 .. math::
360
361 dst = \log_2{src.x}
362
363
364 .. opcode:: POW - Power
365
366 This instruction replicates its result.
367
368 .. math::
369
370 dst = src0.x^{src1.x}
371
372 .. opcode:: XPD - Cross Product
373
374 .. math::
375
376 dst.x = src0.y \times src1.z - src1.y \times src0.z
377
378 dst.y = src0.z \times src1.x - src1.z \times src0.x
379
380 dst.z = src0.x \times src1.y - src1.x \times src0.y
381
382 dst.w = 1
383
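The XPD formulas correspond to the usual right-handed cross product of the first three components, with *dst.w* fixed at 1. A minimal Python sketch, using a made-up ``xpd`` helper:

```python
def xpd(src0, src1):
    """XPD: cross product of src0.xyz and src1.xyz; dst.w is set to 1."""
    return (
        src0[1] * src1[2] - src1[1] * src0[2],
        src0[2] * src1[0] - src1[2] * src0[0],
        src0[0] * src1[1] - src1[0] * src0[1],
        1.0,
    )


# The X axis crossed with the Y axis gives the Z axis.
print(xpd((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)))
```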
384
385 .. opcode:: ABS - Absolute
386
387 .. math::
388
389 dst.x = |src.x|
390
391 dst.y = |src.y|
392
393 dst.z = |src.z|
394
395 dst.w = |src.w|
396
397
398 .. opcode:: RCC - Reciprocal Clamped
399
400 This instruction replicates its result.
401
402 XXX cleanup on aisle three
403
404 .. math::
405
406 dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
407
408
409 .. opcode:: DPH - Homogeneous Dot Product
410
411 This instruction replicates its result.
412
413 .. math::
414
415 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
416
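DPH behaves like a four-component dot product in which *src0.w* is treated as 1, so *src0.w* itself is never read. A Python sketch under that reading (``dph`` is a hypothetical name):

```python
def dph(src0, src1):
    """DPH: dot(src0.xyz, src1.xyz) + src1.w, replicated to all components."""
    result = (
        src0[0] * src1[0]
        + src0[1] * src1[1]
        + src0[2] * src1[2]
        + src1[3]
    )
    return (result, result, result, result)


# src0.w (99.0 here) does not influence the result.
print(dph((1.0, 2.0, 3.0, 99.0), (4.0, 5.0, 6.0, 7.0)))
```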
417
418 .. opcode:: COS - Cosine
419
420 This instruction replicates its result.
421
422 .. math::
423
424 dst = \cos{src.x}
425
426
427 .. opcode:: DDX - Derivative Relative To X
428
429 .. math::
430
431 dst.x = partialx(src.x)
432
433 dst.y = partialx(src.y)
434
435 dst.z = partialx(src.z)
436
437 dst.w = partialx(src.w)
438
439
440 .. opcode:: DDY - Derivative Relative To Y
441
442 .. math::
443
444 dst.x = partialy(src.x)
445
446 dst.y = partialy(src.y)
447
448 dst.z = partialy(src.z)
449
450 dst.w = partialy(src.w)
451
452
453 .. opcode:: KILP - Predicated Discard
454
455 discard
456
457
458 .. opcode:: PK2H - Pack Two 16-bit Floats
459
460 TBD
461
462
463 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
464
465 TBD
466
467
468 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
469
470 TBD
471
472
473 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
474
475 TBD
476
477
478 .. opcode:: RFL - Reflection Vector
479
480 .. math::
481
482 dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
483
484 dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
485
486 dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
487
488 dst.w = 1
489
490 .. note::
491
492 Considered for removal.
493
494
495 .. opcode:: SEQ - Set On Equal
496
497 .. math::
498
499 dst.x = (src0.x == src1.x) ? 1 : 0
500
501 dst.y = (src0.y == src1.y) ? 1 : 0
502
503 dst.z = (src0.z == src1.z) ? 1 : 0
504
505 dst.w = (src0.w == src1.w) ? 1 : 0
506
507
508 .. opcode:: SFL - Set On False
509
510 This instruction replicates its result.
511
512 .. math::
513
514 dst = 0
515
516 .. note::
517
518 Considered for removal.
519
520
521 .. opcode:: SGT - Set On Greater Than
522
523 .. math::
524
525 dst.x = (src0.x > src1.x) ? 1 : 0
526
527 dst.y = (src0.y > src1.y) ? 1 : 0
528
529 dst.z = (src0.z > src1.z) ? 1 : 0
530
531 dst.w = (src0.w > src1.w) ? 1 : 0
532
533
534 .. opcode:: SIN - Sine
535
536 This instruction replicates its result.
537
538 .. math::
539
540 dst = \sin{src.x}
541
542
543 .. opcode:: SLE - Set On Less Than Or Equal
544
545 .. math::
546
547 dst.x = (src0.x <= src1.x) ? 1 : 0
548
549 dst.y = (src0.y <= src1.y) ? 1 : 0
550
551 dst.z = (src0.z <= src1.z) ? 1 : 0
552
553 dst.w = (src0.w <= src1.w) ? 1 : 0
554
555
556 .. opcode:: SNE - Set On Not Equal
557
558 .. math::
559
560 dst.x = (src0.x != src1.x) ? 1 : 0
561
562 dst.y = (src0.y != src1.y) ? 1 : 0
563
564 dst.z = (src0.z != src1.z) ? 1 : 0
565
566 dst.w = (src0.w != src1.w) ? 1 : 0
567
568
569 .. opcode:: STR - Set On True
570
571 This instruction replicates its result.
572
573 .. math::
574
575 dst = 1
576
577
578 .. opcode:: TEX - Texture Lookup
579
580 TBD
581
582
583 .. opcode:: TXD - Texture Lookup with Derivatives
584
585 TBD
586
587
588 .. opcode:: TXP - Projective Texture Lookup
589
590 TBD
591
592
593 .. opcode:: UP2H - Unpack Two 16-Bit Floats
594
595 TBD
596
597 .. note::
598
599 Considered for removal.
600
601 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
602
603 TBD
604
605 .. note::
606
607 Considered for removal.
608
609 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
610
611 TBD
612
613 .. note::
614
615 Considered for removal.
616
617 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
618
619 TBD
620
621 .. note::
622
623 Considered for removal.
624
625 .. opcode:: X2D - 2D Coordinate Transformation
626
627 .. math::
628
629 dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y
630
631 dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w
632
633 dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y
634
635 dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
636
637 .. note::
638
639 Considered for removal.
640
641
642 .. opcode:: ARA - Address Register Add
643
644 TBD
645
646 .. note::
647
648 Considered for removal.
649
650 .. opcode:: ARR - Address Register Load With Round
651
652 .. math::
653
654 dst.x = round(src.x)
655
656 dst.y = round(src.y)
657
658 dst.z = round(src.z)
659
660 dst.w = round(src.w)
661
662
663 .. opcode:: BRA - Branch
664
665 pc = target
666
667 .. note::
668
669 Considered for removal.
670
671 .. opcode:: CAL - Subroutine Call
672
673 push(pc)
674 pc = target
675
676
677 .. opcode:: RET - Subroutine Call Return
678
679 pc = pop()
680
681
682 .. opcode:: SSG - Set Sign
683
684 .. math::
685
686 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
687
688 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
689
690 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
691
692 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
693
694
695 .. opcode:: CMP - Compare
696
697 .. math::
698
699 dst.x = (src0.x < 0) ? src1.x : src2.x
700
701 dst.y = (src0.y < 0) ? src1.y : src2.y
702
703 dst.z = (src0.z < 0) ? src1.z : src2.z
704
705 dst.w = (src0.w < 0) ? src1.w : src2.w
706
707
708 .. opcode:: KIL - Conditional Discard
709
710 .. math::
711
712 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
713 discard
714 endif
715
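The KIL condition can be expressed as a one-line predicate. In this Python sketch the hypothetical function ``kil`` returns ``True`` when the fragment would be discarded.

```python
def kil(src):
    """KIL: discard the fragment if any source component is negative."""
    return any(component < 0.0 for component in src)


print(kil((1.0, 1.0, -0.5, 1.0)))  # one negative component
print(kil((0.0, 0.0, 0.0, 0.0)))   # zero is not negative
```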
716
717 .. opcode:: SCS - Sine Cosine
718
719 .. math::
720
721 dst.x = \cos{src.x}
722
723 dst.y = \sin{src.x}
724
725 dst.z = 0
726
727 dst.w = 1
728
729
730 .. opcode:: TXB - Texture Lookup With Bias
731
732 TBD
733
734
735 .. opcode:: NRM - 3-component Vector Normalise
736
737 .. math::
738
739 dst.x = src.x / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
740
741 dst.y = src.y / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
742
743 dst.z = src.z / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
744
745 dst.w = 1
746
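A minimal Python sketch of NRM, assuming that normalisation means dividing each of x, y and z by the Euclidean length of the three-component vector. The name ``nrm`` is illustrative, and the zero-length case is not handled.

```python
import math


def nrm(src):
    """NRM: scale src.xyz to unit length; dst.w is set to 1."""
    x, y, z = src[:3]
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length, 1.0)


# A 3-4-5 triangle: the length of (3, 0, 4) is 5.
print(nrm((3.0, 0.0, 4.0, 7.0)))
```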
747
748 .. opcode:: DIV - Divide
749
750 .. math::
751
752 dst.x = \frac{src0.x}{src1.x}
753
754 dst.y = \frac{src0.y}{src1.y}
755
756 dst.z = \frac{src0.z}{src1.z}
757
758 dst.w = \frac{src0.w}{src1.w}
759
760
761 .. opcode:: DP2 - 2-component Dot Product
762
763 This instruction replicates its result.
764
765 .. math::
766
767 dst = src0.x \times src1.x + src0.y \times src1.y
768
769
770 .. opcode:: TXL - Texture Lookup With LOD
771
772 TBD
773
774
775 .. opcode:: BRK - Break
776
777 TBD
778
779
780 .. opcode:: IF - If
781
782 TBD
783
784
785 .. opcode:: ELSE - Else
786
787 TBD
788
789
790 .. opcode:: ENDIF - End If
791
792 TBD
793
794
795 .. opcode:: PUSHA - Push Address Register On Stack
796
797 push(src.x)
798 push(src.y)
799 push(src.z)
800 push(src.w)
801
802 .. note::
803
804 Considered for cleanup.
805
806 .. note::
807
808 Considered for removal.
809
810 .. opcode:: POPA - Pop Address Register From Stack
811
812 dst.w = pop()
813 dst.z = pop()
814 dst.y = pop()
815 dst.x = pop()
816
817 .. note::
818
819 Considered for cleanup.
820
821 .. note::
822
823 Considered for removal.
824
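PUSHA pushes the components in x, y, z, w order and POPA pops them in the reverse order, so a push followed by a pop restores the original register. The Python sketch below models the stack as a plain list; the function names and the explicit ``stack`` argument are assumptions made for this example.

```python
def pusha(stack, src):
    """PUSHA: push src.x, src.y, src.z, src.w, in that order."""
    for component in src:
        stack.append(component)


def popa(stack):
    """POPA: pop w, z, y, x, in that order, and return (x, y, z, w)."""
    w = stack.pop()
    z = stack.pop()
    y = stack.pop()
    x = stack.pop()
    return (x, y, z, w)


stack = []
pusha(stack, (1, 2, 3, 4))
print(stack)        # w is on top of the stack
print(popa(stack))  # the original register is restored
```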
825
826 Compute ISA
827 ^^^^^^^^^^^^^^^^^^^^^^^^
828
829 These opcodes are primarily provided for special-use computational shaders.
830 Support for these opcodes is indicated by a special pipe capability bit (TBD).
831
832 XXX so let's discuss it, yeah?
833
834 .. opcode:: CEIL - Ceiling
835
836 .. math::
837
838 dst.x = \lceil src.x\rceil
839
840 dst.y = \lceil src.y\rceil
841
842 dst.z = \lceil src.z\rceil
843
844 dst.w = \lceil src.w\rceil
845
846
847 .. opcode:: I2F - Integer To Float
848
849 .. math::
850
851 dst.x = (float) src.x
852
853 dst.y = (float) src.y
854
855 dst.z = (float) src.z
856
857 dst.w = (float) src.w
858
859
860 .. opcode:: NOT - Bitwise Not
861
862 .. math::
863
864 dst.x = ~src.x
865
866 dst.y = ~src.y
867
868 dst.z = ~src.z
869
870 dst.w = ~src.w
871
872
873 .. opcode:: TRUNC - Truncate
874
875 .. math::
876
877 dst.x = trunc(src.x)
878
879 dst.y = trunc(src.y)
880
881 dst.z = trunc(src.z)
882
883 dst.w = trunc(src.w)
884
885
886 .. opcode:: SHL - Shift Left
887
888 .. math::
889
890 dst.x = src0.x << src1.x
891
892 dst.y = src0.y << src1.x
893
894 dst.z = src0.z << src1.x
895
896 dst.w = src0.w << src1.x
897
898
899 .. opcode:: SHR - Shift Right
900
901 .. math::
902
903 dst.x = src0.x >> src1.x
904
905 dst.y = src0.y >> src1.x
906
907 dst.z = src0.z >> src1.x
908
909 dst.w = src0.w >> src1.x
910
911
912 .. opcode:: AND - Bitwise And
913
914 .. math::
915
916 dst.x = src0.x & src1.x
917
918 dst.y = src0.y & src1.y
919
920 dst.z = src0.z & src1.z
921
922 dst.w = src0.w & src1.w
923
924
925 .. opcode:: OR - Bitwise Or
926
927 .. math::
928
929 dst.x = src0.x | src1.x
930
931 dst.y = src0.y | src1.y
932
933 dst.z = src0.z | src1.z
934
935 dst.w = src0.w | src1.w
936
937
938 .. opcode:: MOD - Modulus
939
940 .. math::
941
942 dst.x = src0.x \bmod src1.x
943
944 dst.y = src0.y \bmod src1.y
945
946 dst.z = src0.z \bmod src1.z
947
948 dst.w = src0.w \bmod src1.w
949
950
951 .. opcode:: XOR - Bitwise Xor
952
953 .. math::
954
955 dst.x = src0.x \oplus src1.x
956
957 dst.y = src0.y \oplus src1.y
958
959 dst.z = src0.z \oplus src1.z
960
961 dst.w = src0.w \oplus src1.w
962
963
964 .. opcode:: SAD - Sum Of Absolute Differences
965
966 .. math::
967
968 dst.x = |src0.x - src1.x| + src2.x
969
970 dst.y = |src0.y - src1.y| + src2.y
971
972 dst.z = |src0.z - src1.z| + src2.z
973
974 dst.w = |src0.w - src1.w| + src2.w
975
976
977 .. opcode:: TXF - Texel Fetch
978
979 TBD
980
981
982 .. opcode:: TXQ - Texture Size Query
983
984 TBD
985
986
987 .. opcode:: CONT - Continue
988
989 TBD
990
991 .. note::
992
993 Support for CONT is determined by a special capability bit,
994 ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
995
996
997 Geometry ISA
998 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
999
1000 These opcodes are only supported in geometry shaders; they have no meaning
1001 in any other type of shader.
1002
1003 .. opcode:: EMIT - Emit
1004
1005 TBD
1006
1007
1008 .. opcode:: ENDPRIM - End Primitive
1009
1010 TBD
1011
1012
1013 GLSL ISA
1014 ^^^^^^^^^^
1015
1016 These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1017 opcodes is determined by a special capability bit, ``GLSL``.
1018
1019 .. opcode:: BGNLOOP - Begin a Loop
1020
1021 TBD
1022
1023
1024 .. opcode:: BGNSUB - Begin Subroutine
1025
1026 TBD
1027
1028
1029 .. opcode:: ENDLOOP - End a Loop
1030
1031 TBD
1032
1033
1034 .. opcode:: ENDSUB - End Subroutine
1035
1036 TBD
1037
1038
1039 .. opcode:: NOP - No Operation
1040
1041 Do nothing.
1042
1043
1044 .. opcode:: NRM4 - 4-component Vector Normalise
1045
1046 This instruction replicates its result.
1047
1048 .. math::
1049
1050 dst = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
1051
1052
1053 ps_2_x
1054 ^^^^^^^^^^^^
1055
1056 XXX wait what
1057
1058 .. opcode:: CALLNZ - Subroutine Call If Not Zero
1059
1060 TBD
1061
1062
1063 .. opcode:: IFC - If
1064
1065 TBD
1066
1067
1068 .. opcode:: BREAKC - Break Conditional
1069
1070 TBD
1071
1072 .. _doubleopcodes:
1073
1074 Double ISA
1075 ^^^^^^^^^^^^^^^
1076
1077 The double-precision opcodes reinterpret four-component vectors into
1078 two-component vectors with doubled precision in each component.
1079
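To make the reinterpretation concrete, the Python sketch below splits one 64-bit double into two 32-bit words and joins them again, which is how one ``xy`` or ``zw`` pair could carry a double. Which component holds the low word is an assumption made for this example (low word first); this document does not specify it.

```python
import struct


def split_double(value):
    """Return the (low, high) 32-bit words of a 64-bit double."""
    low, high = struct.unpack("<II", struct.pack("<d", value))
    return low, high


def join_double(low, high):
    """Rebuild a 64-bit double from its (low, high) 32-bit words."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]


low, high = split_double(1.0)
print(hex(low), hex(high))
print(join_double(low, high))
```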
1080 Support for these opcodes is XXX undecided. :T
1081
1082 .. opcode:: DADD - Add
1083
1084 .. math::
1085
1086 dst.xy = src0.xy + src1.xy
1087
1088 dst.zw = src0.zw + src1.zw
1089
1090
1091 .. opcode:: DDIV - Divide
1092
1093 .. math::
1094
1095 dst.xy = src0.xy / src1.xy
1096
1097 dst.zw = src0.zw / src1.zw
1098
1099 .. opcode:: DSEQ - Set on Equal
1100
1101 .. math::
1102
1103 dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F
1104
1105 dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F
1106
1107 .. opcode:: DSLT - Set on Less than
1108
1109 .. math::
1110
1111 dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F
1112
1113 dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F
1114
1115 .. opcode:: DFRAC - Fraction
1116
1117 .. math::
1118
1119 dst.xy = src.xy - \lfloor src.xy\rfloor
1120
1121 dst.zw = src.zw - \lfloor src.zw\rfloor
1122
1123
1124 .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
1125
1126 Like the ``frexp()`` routine in many math libraries, this opcode stores the
1127 exponent of its source to ``dst0``, and the significand to ``dst1``, such that
1128 :math:`dst1 \times 2^{dst0} = src` .
1129
1130 .. math::
1131
1132 dst0.xy = exp(src.xy)
1133
1134 dst1.xy = frac(src.xy)
1135
1136 dst0.zw = exp(src.zw)
1137
1138 dst1.zw = frac(src.zw)
1139
1140 .. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1141
1142 This opcode is the inverse of :opcode:`DFRACEXP`.
1143
1144 .. math::
1145
1146 dst.xy = src0.xy \times 2^{src1.xy}
1147
1148 dst.zw = src0.zw \times 2^{src1.zw}
1149
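The relationship between :opcode:`DFRACEXP` and :opcode:`DLDEXP` mirrors ``frexp()`` and ``ldexp()``. The Python sketch below works on a single double rather than on an ``xy``/``zw`` pair, and the function names are illustrative only.

```python
import math


def dfracexp(src):
    """Return (dst0, dst1) = (exponent, significand) of src."""
    significand, exponent = math.frexp(src)
    return exponent, significand


def dldexp(src0, src1):
    """Return src0 * 2**src1."""
    return math.ldexp(src0, src1)


exponent, significand = dfracexp(8.0)
print(exponent, significand)
print(dldexp(significand, exponent))  # recovers the original value
```

With ``math.frexp`` the significand lies in [0.5, 1) for non-zero inputs; whether TGSI uses the same normalisation is not stated here.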
1150 .. opcode:: DMIN - Minimum
1151
1152 .. math::
1153
1154 dst.xy = min(src0.xy, src1.xy)
1155
1156 dst.zw = min(src0.zw, src1.zw)
1157
1158 .. opcode:: DMAX - Maximum
1159
1160 .. math::
1161
1162 dst.xy = max(src0.xy, src1.xy)
1163
1164 dst.zw = max(src0.zw, src1.zw)
1165
1166 .. opcode:: DMUL - Multiply
1167
1168 .. math::
1169
1170 dst.xy = src0.xy \times src1.xy
1171
1172 dst.zw = src0.zw \times src1.zw
1173
1174
1175 .. opcode:: DMAD - Multiply And Add
1176
1177 .. math::
1178
1179 dst.xy = src0.xy \times src1.xy + src2.xy
1180
1181 dst.zw = src0.zw \times src1.zw + src2.zw
1182
1183
1184 .. opcode:: DRCP - Reciprocal
1185
1186 .. math::
1187
1188 dst.xy = \frac{1}{src.xy}
1189
1190 dst.zw = \frac{1}{src.zw}
1191
1192 .. opcode:: DSQRT - Square Root
1193
1194 .. math::
1195
1196 dst.xy = \sqrt{src.xy}
1197
1198 dst.zw = \sqrt{src.zw}
1199
1200
1201 Explanation of symbols used
1202 ------------------------------
1203
1204
1205 Functions
1206 ^^^^^^^^^^^^^^
1207
1208
1209 :math:`|x|` Absolute value of `x`.
1210
1211 :math:`\lceil x \rceil` Ceiling of `x`.
1212
1213 clamp(x,y,z) Clamp x between y and z.
1214 (x < y) ? y : (x > z) ? z : x
1215
1216 :math:`\lfloor x\rfloor` Floor of `x`.
1217
1218 :math:`\log_2{x}` Logarithm of `x`, base 2.
1219
1220 max(x,y) Maximum of x and y.
1221 (x > y) ? x : y
1222
1223 min(x,y) Minimum of x and y.
1224 (x < y) ? x : y
1225
1226 partialx(x) Derivative of x relative to fragment's X.
1227
1228 partialy(x) Derivative of x relative to fragment's Y.
1229
1230 pop() Pop from stack.
1231
1232 :math:`x^y` `x` to the power `y`.
1233
1234 push(x) Push x on stack.
1235
1236 round(x) Round x.
1237
1238 trunc(x) Truncate x, i.e. drop the fraction bits.
1239
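Two of these helpers are easy to misread, so here is a short Python sketch. ``clamp`` follows the conditional expression given above, and the comparison of ``math.trunc`` with ``math.floor`` shows where truncation and floor differ for negative inputs.

```python
import math


def clamp(x, y, z):
    """Clamp x between y and z: (x < y) ? y : (x > z) ? z : x."""
    return y if x < y else z if x > z else x


print(clamp(5.0, 0.0, 1.0), clamp(-2.0, 0.0, 1.0), clamp(0.5, 0.0, 1.0))

# trunc drops the fraction (rounds toward zero); floor rounds down.
print(math.trunc(-1.5), math.floor(-1.5))
```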
1240
1241 Keywords
1242 ^^^^^^^^^^^^^
1243
1244
1245 discard Discard fragment.
1246
1247 pc Program counter.
1248
1249 target Label of target instruction.
1250
1251
1252 Other tokens
1253 ---------------
1254
1255
1256 Declaration
1257 ^^^^^^^^^^^
1258
1259
1260 Declares a register that will be referenced as an operand in Instruction
1261 tokens.
1262
1263 File field contains register file that is being declared and is one
1264 of TGSI_FILE.
1265
1266 UsageMask field specifies which of the register components can be accessed
1267 and is one of TGSI_WRITEMASK.
1268
1269 Interpolate field is only valid for fragment shader INPUT register files.
1270 It specifies the way input is being interpolated by the rasteriser and is one
1271 of TGSI_INTERPOLATE.
1272
1273 If Dimension flag is set to 1, a Declaration Dimension token follows.
1274
1275 If Semantic flag is set to 1, a Declaration Semantic token follows.
1276
1277 CylindricalWrap bitfield is only valid for fragment shader INPUT register
1278 files. It specifies which register components should be subject to cylindrical
1279 wrapping when interpolating by the rasteriser. If TGSI_CYLINDRICAL_WRAP_X
1280 is set to 1, the X component should be interpolated according to cylindrical
1281 wrapping rules.
1282
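This document does not spell out the cylindrical wrapping rules. The Python sketch below assumes the common convention that coordinates 0.0 and 1.0 coincide and that interpolation takes the shorter way around; treat both the rule and the name ``interpolate_wrapped`` as assumptions.

```python
def interpolate_wrapped(a, b, t):
    """Interpolate from a to b, taking the shorter way around a unit cylinder."""
    delta = b - a
    if delta > 0.5:
        delta -= 1.0   # shorter to go backwards across the seam
    elif delta < -0.5:
        delta += 1.0   # shorter to go forwards across the seam
    return a + t * delta


# Plain linear interpolation between 0.9 and 0.1 would pass through 0.5;
# the wrapped version crosses the seam at 1.0 instead.
print(interpolate_wrapped(0.9, 0.1, 0.5))
print(interpolate_wrapped(0.2, 0.4, 0.5))
```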
1283
1284 Declaration Semantic
1285 ^^^^^^^^^^^^^^^^^^^^^^^^
1286
1287 Vertex and fragment shader input and output registers may be labeled
1288 with semantic information consisting of a name and index.
1289
1290 Follows Declaration token if Semantic bit is set.
1291
1292 Since its purpose is to link a shader with other stages of the pipeline,
1293 it is valid to follow only those Declaration tokens that declare a register
1294 either in INPUT or OUTPUT file.
1295
1296 SemanticName field contains the semantic name of the register being declared.
1297 There is no default value.
1298
1299 SemanticIndex is an optional subscript that can be used to distinguish
1300 different register declarations with the same semantic name. The default value
1301 is 0.
1302
1303 The meanings of the individual semantic names are explained in the following
1304 sections.
1305
1306 TGSI_SEMANTIC_POSITION
1307 """"""""""""""""""""""
1308
1309 For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
1310 output register which contains the homogeneous vertex position in the clip
1311 space coordinate system. After clipping, the X, Y and Z components of the
1312 vertex will be divided by the W value to get normalized device coordinates.
1313
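The division by W can be sketched in Python as follows; ``clip_to_ndc`` is a hypothetical helper, and clipping itself is not modelled.

```python
def clip_to_ndc(position):
    """Divide clip-space X, Y and Z by W to get normalized device coordinates."""
    x, y, z, w = position
    return (x / w, y / w, z / w)


print(clip_to_ndc((2.0, -4.0, 1.0, 2.0)))
```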
1314 For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
1315 fragment shader input contains the fragment's window position. The X
1316 component starts at zero and always increases from left to right.
1317 The Y component starts at zero and always increases but Y=0 may either
1318 indicate the top of the window or the bottom depending on the fragment
1319 coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
1320 The Z coordinate ranges from 0 to 1 to represent depth from the front
1321 to the back of the Z buffer. The W component contains the reciprocal
1322 of the interpolated vertex position W component.
1323
1324 Fragment shaders may also declare an output register with
1325 TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows
1326 the fragment shader to change the fragment's Z position.
1327
1328
1329
1330 TGSI_SEMANTIC_COLOR
1331 """""""""""""""""""
1332
1333 For vertex shader outputs or fragment shader inputs/outputs, this
1334 label indicates that the register contains an R,G,B,A color.
1335
1336 Several shader inputs/outputs may contain colors so the semantic index
1337 is used to distinguish them. For example, color[0] may be the diffuse
1338 color while color[1] may be the specular color.
1339
1340 This label is needed so that the flat/smooth shading can be applied
1341 to the right interpolants during rasterization.
1342
1343
1344
1345 TGSI_SEMANTIC_BCOLOR
1346 """"""""""""""""""""
1347
1348 Back-facing colors are only used for back-facing polygons, and are only valid
1349 in vertex shader outputs. After rasterization, all polygons are front-facing
1350 and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
1351 so all BCOLORs effectively become regular COLORs in the fragment shader.
1352
1353
1354 TGSI_SEMANTIC_FOG
1355 """""""""""""""""
1356
1357 Vertex shader inputs and outputs and fragment shader inputs may be
1358 labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
1359 a fog coordinate in the form (F, 0, 0, 1). Typically, the fragment
1360 shader will use the fog coordinate to compute a fog blend factor which
1361 is used to blend the normal fragment color with a constant fog color.
1362
1363 Only the first component matters when writing from the vertex shader;
1364 the driver will ensure that the coordinate is in this format when used
1365 as a fragment shader input.
1366
1367
1368 TGSI_SEMANTIC_PSIZE
1369 """""""""""""""""""
1370
1371 Vertex shader input and output registers may be labeled with
1372 TGSI_SEMANTIC_PSIZE to indicate that the register contains a point size
1373 in the form (S, 0, 0, 1). The point size controls the width or diameter
1374 of points for rasterization. This label cannot be used in fragment
1375 shaders.
1376
1377 When using this semantic, be sure to set the appropriate state in the
1378 :ref:`rasterizer` first.
1379
1380
1381 TGSI_SEMANTIC_GENERIC
1382 """""""""""""""""""""
1383
1384 All vertex/fragment shader inputs/outputs not labeled with any other
1385 semantic label can be considered to be generic attributes. Typical
1386 uses of generic inputs/outputs are texcoords and user-defined values.
1387
1388
1389 TGSI_SEMANTIC_NORMAL
1390 """"""""""""""""""""
1391
1392 Indicates that a vertex shader input is a normal vector. This is
1393 typically only used for legacy graphics APIs.
1394
1395
1396 TGSI_SEMANTIC_FACE
1397 """"""""""""""""""
1398
1399 This label applies to fragment shader inputs only and indicates that
1400 the register contains front/back-face information of the form (F, 0,
1401 0, 1). The first component will be positive when the fragment belongs
1402 to a front-facing polygon, and negative when the fragment belongs to a
1403 back-facing polygon.
1404
1405
1406 TGSI_SEMANTIC_EDGEFLAG
1407 """"""""""""""""""""""
1408
1409 For vertex shaders, this semantic label indicates that an input or
1410 output is a boolean edge flag. The register layout is [F, x, x, x]
1411 where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader
1412 simply copies the edge flag input to the edgeflag output.
1413
1414 Edge flags are used to control which lines or points are actually
1415 drawn when the polygon mode converts triangles/quads/polygons into
1416 points or lines.
1417
1418 TGSI_SEMANTIC_STENCIL
1419 """"""""""""""""""""""
1420
1421 For fragment shaders, this semantic label indicates that an output
1422 is a writable stencil reference value. Only the Y component is writable.
1423 This allows the fragment shader to change the fragment's stencil reference value.
1424
1425
1426 Properties
1427 ^^^^^^^^^^^^^^^^^^^^^^^^
1428
1429
1430 Properties are general directives that apply to the whole TGSI program.
1431
1432 FS_COORD_ORIGIN
1433 """""""""""""""
1434
1435 Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
1436 The default value is UPPER_LEFT.
1437
1438 If UPPER_LEFT, the position will be (0,0) at the upper left corner and
1439 increase downward and rightward.
1440 If LOWER_LEFT, the position will be (0,0) at the lower left corner and
1441 increase upward and rightward.
1442
1443 OpenGL defaults to LOWER_LEFT, and is configurable with the
1444 GL_ARB_fragment_coord_conventions extension.
1445
1446 DirectX 9/10 use UPPER_LEFT.
1447
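Converting a window Y coordinate between the two origin conventions is a reflection about the window height. A minimal Python sketch, in which the function name and the ``height`` parameter (window height in pixels) are assumptions made for this example:

```python
def flip_origin(y, height):
    """Convert Y between UPPER_LEFT and LOWER_LEFT conventions."""
    return height - y


# With half-integer pixel centers, the top row of a 480-pixel-high window
# has Y = 0.5 under UPPER_LEFT and Y = 479.5 under LOWER_LEFT.
print(flip_origin(0.5, 480.0))
```

The same function converts in both directions, since applying it twice returns the original value.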
1448 FS_COORD_PIXEL_CENTER
1449 """""""""""""""""""""
1450
1451 Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
1452 The default value is HALF_INTEGER.
1453
1454 If HALF_INTEGER, the fractional part of the position will be 0.5.
1455 If INTEGER, the fractional part of the position will be 0.0.
1456
1457 Note that this does not affect the set of fragments generated by
1458 rasterization, which is instead controlled by gl_rasterization_rules in the
1459 rasterizer.
1460
1461 OpenGL defaults to HALF_INTEGER, and is configurable with the
1462 GL_ARB_fragment_coord_conventions extension.
1463
1464 DirectX 9 uses INTEGER.
1465 DirectX 10 uses HALF_INTEGER.
1466
1467
1468
1469 Texture Sampling and Texture Formats
1470 ------------------------------------
1471
1472 This table shows how texture image components are returned as (x,y,z,w) tuples
1473 by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
1474 :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
1475 well.
1476
1477 +--------------------+--------------+--------------------+--------------+
1478 | Texture Components | Gallium      | OpenGL             | Direct3D 9   |
1479 +====================+==============+====================+==============+
1480 | R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
1481 +--------------------+--------------+--------------------+--------------+
1482 | RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
1483 +--------------------+--------------+--------------------+--------------+
1484 | RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
1485 +--------------------+--------------+--------------------+--------------+
1486 | RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
1487 +--------------------+--------------+--------------------+--------------+
1488 | A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
1489 +--------------------+--------------+--------------------+--------------+
1490 | L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
1491 +--------------------+--------------+--------------------+--------------+
1492 | LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
1493 +--------------------+--------------+--------------------+--------------+
1494 | I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
1495 +--------------------+--------------+--------------------+--------------+
1496 | UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
1497 |                    |              | [#envmap-bumpmap]_ |              |
1498 +--------------------+--------------+--------------------+--------------+
1499 | Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
1500 |                    |              | [#depth-tex-mode]_ |              |
1501 +--------------------+--------------+--------------------+--------------+
1502 | S                  | (s, s, s, s) | unknown            | unknown      |
1503 +--------------------+--------------+--------------------+--------------+
1504
1505 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
1506 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
1507 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
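The Gallium column of the table can be read as a lookup from texture format to an (x, y, z, w) tuple. The Python sketch below encodes only the rows with a defined Gallium value; the function name and the dictionary-based texel representation are assumptions made for this example.

```python
def gallium_components(fmt, texel):
    """Return the (x, y, z, w) tuple from the Gallium column for fmt.

    texel maps component names such as "r" or "a" to values; missing
    components default to 0.0 and are not used by the selected row.
    """
    r, g, b, a = (texel.get(name, 0.0) for name in "rgba")
    l, i, s = (texel.get(name, 0.0) for name in "lis")
    table = {
        "R": (r, 0.0, 0.0, 1.0),
        "RG": (r, g, 0.0, 1.0),
        "RGB": (r, g, b, 1.0),
        "RGBA": (r, g, b, a),
        "A": (0.0, 0.0, 0.0, a),
        "L": (l, l, l, 1.0),
        "LA": (l, l, l, a),
        "I": (i, i, i, i),
        "S": (s, s, s, s),
    }
    # UV and Z are omitted because the table marks them as TBD.
    return table[fmt]


print(gallium_components("RG", {"r": 0.2, "g": 0.4}))
print(gallium_components("LA", {"l": 0.5, "a": 0.25}))
```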