gallium/docs: Label opcodes by capability bits.
1 TGSI
2 ====
3
4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5 for describing shaders. Since Gallium is inherently shaderful, shaders are
6 an important part of the API. TGSI is the only intermediate representation
7 used by all drivers.
8
9 Basics
10 ------
11
12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13 floating-point four-component vectors. An opcode may have up to one
14 destination register, known as *dst*, and between zero and three source
15 registers, called *src0* through *src2*, or simply *src* if there is only
16 one.
17
18 Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19 components as integers. Other instructions permit using registers as
20 two-component vectors with double precision; see :ref:`Double Opcodes <doubleopcodes>`.
21
22 When an instruction has a scalar result, the result is usually copied into
23 each of the components of *dst*. When this happens, the result is said to be
24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
25
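As an illustration of replication, the following Python sketch models a register as a 4-tuple and shows :opcode:`RCP` copying its scalar result into every component. The helper names ``replicate`` and ``rcp`` are hypothetical and are not part of TGSI or Gallium.

```python
def replicate(scalar):
    """Copy one scalar result into all four components of dst."""
    return (scalar, scalar, scalar, scalar)


def rcp(src):
    """RCP: reciprocal of src.x, replicated to dst."""
    return replicate(1.0 / src[0])


print(rcp((4.0, 2.0, 8.0, 1.0)))  # (0.25, 0.25, 0.25, 0.25)
```

Only ``src.x`` contributes to the result; the other source components are ignored.
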
26 Instruction Set
27 ---------------
28
29 Core ISA
30 ^^^^^^^^^^^^^^^^^^^^^^^^^
31
32 These opcodes are guaranteed to be available regardless of the driver being
33 used.
34
35 .. opcode:: ARL - Address Register Load
36
37 .. math::
38
39 dst.x = \lfloor src.x\rfloor
40
41 dst.y = \lfloor src.y\rfloor
42
43 dst.z = \lfloor src.z\rfloor
44
45 dst.w = \lfloor src.w\rfloor
46
47
48 .. opcode:: MOV - Move
49
50 .. math::
51
52 dst.x = src.x
53
54 dst.y = src.y
55
56 dst.z = src.z
57
58 dst.w = src.w
59
60
61 .. opcode:: LIT - Light Coefficients
62
63 .. math::
64
65 dst.x = 1
66
67 dst.y = max(src.x, 0)
68
69 dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0
70
71 dst.w = 1
72
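The LIT equations can be read as the following Python sketch, with registers modelled as 4-tuples. The function names are hypothetical, and the behaviour for a zero base with a negative exponent is not specified above; plain Python raises ``ZeroDivisionError`` in that case.

```python
def clamp(x, lo, hi):
    """clamp(x, y, z) as defined in the symbol table of this document."""
    return lo if x < lo else hi if x > hi else x


def lit(src):
    """LIT: light coefficients computed from src.x, src.y and src.w."""
    x, y, _, w = src
    dst_y = max(x, 0.0)
    # The specular term is only evaluated when src.x is positive.
    dst_z = max(y, 0.0) ** clamp(w, -128.0, 128.0) if x > 0 else 0.0
    return (1.0, dst_y, dst_z, 1.0)


print(lit((0.5, 0.25, 0.0, 2.0)))  # (1.0, 0.5, 0.0625, 1.0)
```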
73
74 .. opcode:: RCP - Reciprocal
75
76 This instruction replicates its result.
77
78 .. math::
79
80 dst = \frac{1}{src.x}
81
82
83 .. opcode:: RSQ - Reciprocal Square Root
84
85 This instruction replicates its result.
86
87 .. math::
88
89 dst = \frac{1}{\sqrt{|src.x|}}
90
91
92 .. opcode:: EXP - Approximate Exponential Base 2
93
94 .. math::
95
96 dst.x = 2^{\lfloor src.x\rfloor}
97
98 dst.y = src.x - \lfloor src.x\rfloor
99
100 dst.z = 2^{src.x}
101
102 dst.w = 1
103
104
105 .. opcode:: LOG - Approximate Logarithm Base 2
106
107 .. math::
108
109 dst.x = \lfloor\log_2{|src.x|}\rfloor
110
111 dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
112
113 dst.z = \log_2{|src.x|}
114
115 dst.w = 1
116
117
118 .. opcode:: MUL - Multiply
119
120 .. math::
121
122 dst.x = src0.x \times src1.x
123
124 dst.y = src0.y \times src1.y
125
126 dst.z = src0.z \times src1.z
127
128 dst.w = src0.w \times src1.w
129
130
131 .. opcode:: ADD - Add
132
133 .. math::
134
135 dst.x = src0.x + src1.x
136
137 dst.y = src0.y + src1.y
138
139 dst.z = src0.z + src1.z
140
141 dst.w = src0.w + src1.w
142
143
144 .. opcode:: DP3 - 3-component Dot Product
145
146 This instruction replicates its result.
147
148 .. math::
149
150 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
151
152
153 .. opcode:: DP4 - 4-component Dot Product
154
155 This instruction replicates its result.
156
157 .. math::
158
159 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
160
161
162 .. opcode:: DST - Distance Vector
163
164 .. math::
165
166 dst.x = 1
167
168 dst.y = src0.y \times src1.y
169
170 dst.z = src0.z
171
172 dst.w = src1.w
173
174
175 .. opcode:: MIN - Minimum
176
177 .. math::
178
179 dst.x = min(src0.x, src1.x)
180
181 dst.y = min(src0.y, src1.y)
182
183 dst.z = min(src0.z, src1.z)
184
185 dst.w = min(src0.w, src1.w)
186
187
188 .. opcode:: MAX - Maximum
189
190 .. math::
191
192 dst.x = max(src0.x, src1.x)
193
194 dst.y = max(src0.y, src1.y)
195
196 dst.z = max(src0.z, src1.z)
197
198 dst.w = max(src0.w, src1.w)
199
200
201 .. opcode:: SLT - Set On Less Than
202
203 .. math::
204
205 dst.x = (src0.x < src1.x) ? 1 : 0
206
207 dst.y = (src0.y < src1.y) ? 1 : 0
208
209 dst.z = (src0.z < src1.z) ? 1 : 0
210
211 dst.w = (src0.w < src1.w) ? 1 : 0
212
213
214 .. opcode:: SGE - Set On Greater Equal Than
215
216 .. math::
217
218 dst.x = (src0.x >= src1.x) ? 1 : 0
219
220 dst.y = (src0.y >= src1.y) ? 1 : 0
221
222 dst.z = (src0.z >= src1.z) ? 1 : 0
223
224 dst.w = (src0.w >= src1.w) ? 1 : 0
225
226
227 .. opcode:: MAD - Multiply And Add
228
229 .. math::
230
231 dst.x = src0.x \times src1.x + src2.x
232
233 dst.y = src0.y \times src1.y + src2.y
234
235 dst.z = src0.z \times src1.z + src2.z
236
237 dst.w = src0.w \times src1.w + src2.w
238
239
240 .. opcode:: SUB - Subtract
241
242 .. math::
243
244 dst.x = src0.x - src1.x
245
246 dst.y = src0.y - src1.y
247
248 dst.z = src0.z - src1.z
249
250 dst.w = src0.w - src1.w
251
252
253 .. opcode:: LRP - Linear Interpolate
254
255 .. math::
256
257 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
258
259 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
260
261 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
262
263 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
264
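A minimal Python sketch of LRP, assuming registers are modelled as 4-tuples (the function name is hypothetical). Note that ``src0`` is the per-component weight, so a weight of 1 selects ``src1`` and a weight of 0 selects ``src2``.

```python
def lrp(src0, src1, src2):
    """LRP: per-component blend of src1 and src2, weighted by src0."""
    return tuple(a * b + (1.0 - a) * c for a, b, c in zip(src0, src1, src2))


weights = (0.0, 0.25, 0.5, 1.0)
print(lrp(weights, (8.0,) * 4, (4.0,) * 4))  # (4.0, 5.0, 6.0, 8.0)
```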
265
266 .. opcode:: CND - Condition
267
268 .. math::
269
270 dst.x = (src2.x > 0.5) ? src0.x : src1.x
271
272 dst.y = (src2.y > 0.5) ? src0.y : src1.y
273
274 dst.z = (src2.z > 0.5) ? src0.z : src1.z
275
276 dst.w = (src2.w > 0.5) ? src0.w : src1.w
277
278
279 .. opcode:: DP2A - 2-component Dot Product And Add
280
281 .. math::
282
283 dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
284
285 dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
286
287 dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
288
289 dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
290
291
292 .. opcode:: FRC - Fraction
293
294 .. math::
295
296 dst.x = src.x - \lfloor src.x\rfloor
297
298 dst.y = src.y - \lfloor src.y\rfloor
299
300 dst.z = src.z - \lfloor src.z\rfloor
301
302 dst.w = src.w - \lfloor src.w\rfloor
303
304
305 .. opcode:: CLAMP - Clamp
306
307 .. math::
308
309 dst.x = clamp(src0.x, src1.x, src2.x)
310
311 dst.y = clamp(src0.y, src1.y, src2.y)
312
313 dst.z = clamp(src0.z, src1.z, src2.z)
314
315 dst.w = clamp(src0.w, src1.w, src2.w)
316
317
318 .. opcode:: FLR - Floor
319
320 This is identical to :opcode:`ARL`.
321
322 .. math::
323
324 dst.x = \lfloor src.x\rfloor
325
326 dst.y = \lfloor src.y\rfloor
327
328 dst.z = \lfloor src.z\rfloor
329
330 dst.w = \lfloor src.w\rfloor
331
332
333 .. opcode:: ROUND - Round
334
335 .. math::
336
337 dst.x = round(src.x)
338
339 dst.y = round(src.y)
340
341 dst.z = round(src.z)
342
343 dst.w = round(src.w)
344
345
346 .. opcode:: EX2 - Exponential Base 2
347
348 This instruction replicates its result.
349
350 .. math::
351
352 dst = 2^{src.x}
353
354
355 .. opcode:: LG2 - Logarithm Base 2
356
357 This instruction replicates its result.
358
359 .. math::
360
361 dst = \log_2{src.x}
362
363
364 .. opcode:: POW - Power
365
366 This instruction replicates its result.
367
368 .. math::
369
370 dst = src0.x^{src1.x}
371
372 .. opcode:: XPD - Cross Product
373
374 .. math::
375
376 dst.x = src0.y \times src1.z - src1.y \times src0.z
377
378 dst.y = src0.z \times src1.x - src1.z \times src0.x
379
380 dst.z = src0.x \times src1.y - src1.x \times src0.y
381
382 dst.w = 1
383
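The XPD equations correspond to the usual three-component cross product with ``dst.w`` forced to 1. A short Python sketch, with a hypothetical function name and registers modelled as 4-tuples:

```python
def xpd(src0, src1):
    """XPD: cross product of the xyz parts of src0 and src1; dst.w is 1."""
    x0, y0, z0, _ = src0
    x1, y1, z1, _ = src1
    return (y0 * z1 - y1 * z0,
            z0 * x1 - z1 * x0,
            x0 * y1 - x1 * y0,
            1.0)


# The X axis crossed with the Y axis gives the Z axis.
print(xpd((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)))  # (0.0, 0.0, 1.0, 1.0)
```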
384
385 .. opcode:: ABS - Absolute
386
387 .. math::
388
389 dst.x = |src.x|
390
391 dst.y = |src.y|
392
393 dst.z = |src.z|
394
395 dst.w = |src.w|
396
397
398 .. opcode:: RCC - Reciprocal Clamped
399
400 This instruction replicates its result.
401
402 XXX cleanup on aisle three
403
404 .. math::
405
406 dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
407
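The RCC expression can be unpacked as the Python sketch below. The constant and function names are hypothetical. The equation does not say what happens for ``src.x == 0``; this sketch simply lets Python raise ``ZeroDivisionError`` there.

```python
RCC_MIN = 5.42101e-20   # smallest magnitude allowed by the equation above
RCC_MAX = 1.884467e+19  # largest magnitude allowed by the equation above


def clamp(x, lo, hi):
    return lo if x < lo else hi if x > hi else x


def rcc(src):
    """RCC: reciprocal of src.x with its magnitude clamped, replicated."""
    r = 1.0 / src[0]
    if r > 0:
        r = clamp(r, RCC_MIN, RCC_MAX)
    else:
        r = clamp(r, -RCC_MAX, -RCC_MIN)
    return (r, r, r, r)


print(rcc((2.0, 0.0, 0.0, 0.0))[0])    # 0.5
print(rcc((1e-30, 0.0, 0.0, 0.0))[0])  # clamped to RCC_MAX
```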
408
409 .. opcode:: DPH - Homogeneous Dot Product
410
411 This instruction replicates its result.
412
413 .. math::
414
415 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
416
417
418 .. opcode:: COS - Cosine
419
420 This instruction replicates its result.
421
422 .. math::
423
424 dst = \cos{src.x}
425
426
427 .. opcode:: DDX - Derivative Relative To X
428
429 .. math::
430
431 dst.x = partialx(src.x)
432
433 dst.y = partialx(src.y)
434
435 dst.z = partialx(src.z)
436
437 dst.w = partialx(src.w)
438
439
440 .. opcode:: DDY - Derivative Relative To Y
441
442 .. math::
443
444 dst.x = partialy(src.x)
445
446 dst.y = partialy(src.y)
447
448 dst.z = partialy(src.z)
449
450 dst.w = partialy(src.w)
451
452
453 .. opcode:: KILP - Predicated Discard
454
455 discard
456
457
458 .. opcode:: PK2H - Pack Two 16-bit Floats
459
460 TBD
461
462
463 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
464
465 TBD
466
467
468 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
469
470 TBD
471
472
473 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
474
475 TBD
476
477
478 .. opcode:: RFL - Reflection Vector
479
480 .. math::
481
482 dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
483
484 dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
485
486 dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
487
488 dst.w = 1
489
490 .. note::
491
492 Considered for removal.
493
494
495 .. opcode:: SEQ - Set On Equal
496
497 .. math::
498
499 dst.x = (src0.x == src1.x) ? 1 : 0
500
501 dst.y = (src0.y == src1.y) ? 1 : 0
502
503 dst.z = (src0.z == src1.z) ? 1 : 0
504
505 dst.w = (src0.w == src1.w) ? 1 : 0
506
507
508 .. opcode:: SFL - Set On False
509
510 This instruction replicates its result.
511
512 .. math::
513
514 dst = 0
515
516 .. note::
517
518 Considered for removal.
519
520
521 .. opcode:: SGT - Set On Greater Than
522
523 .. math::
524
525 dst.x = (src0.x > src1.x) ? 1 : 0
526
527 dst.y = (src0.y > src1.y) ? 1 : 0
528
529 dst.z = (src0.z > src1.z) ? 1 : 0
530
531 dst.w = (src0.w > src1.w) ? 1 : 0
532
533
534 .. opcode:: SIN - Sine
535
536 This instruction replicates its result.
537
538 .. math::
539
540 dst = \sin{src.x}
541
542
543 .. opcode:: SLE - Set On Less Equal Than
544
545 .. math::
546
547 dst.x = (src0.x <= src1.x) ? 1 : 0
548
549 dst.y = (src0.y <= src1.y) ? 1 : 0
550
551 dst.z = (src0.z <= src1.z) ? 1 : 0
552
553 dst.w = (src0.w <= src1.w) ? 1 : 0
554
555
556 .. opcode:: SNE - Set On Not Equal
557
558 .. math::
559
560 dst.x = (src0.x != src1.x) ? 1 : 0
561
562 dst.y = (src0.y != src1.y) ? 1 : 0
563
564 dst.z = (src0.z != src1.z) ? 1 : 0
565
566 dst.w = (src0.w != src1.w) ? 1 : 0
567
568
569 .. opcode:: STR - Set On True
570
571 This instruction replicates its result.
572
573 .. math::
574
575 dst = 1
576
577
578 .. opcode:: TEX - Texture Lookup
579
580 TBD
581
582
583 .. opcode:: TXD - Texture Lookup with Derivatives
584
585 TBD
586
587
588 .. opcode:: TXP - Projective Texture Lookup
589
590 TBD
591
592
593 .. opcode:: UP2H - Unpack Two 16-Bit Floats
594
595 TBD
596
597 .. note::
598
599 Considered for removal.
600
601 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
602
603 TBD
604
605 .. note::
606
607 Considered for removal.
608
609 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
610
611 TBD
612
613 .. note::
614
615 Considered for removal.
616
617 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
618
619 TBD
620
621 .. note::
622
623 Considered for removal.
624
625 .. opcode:: X2D - 2D Coordinate Transformation
626
627 .. math::
628
629 dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y
630
631 dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w
632
633 dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y
634
635 dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
636
637 .. note::
638
639 Considered for removal.
640
641
642 .. opcode:: ARA - Address Register Add
643
644 TBD
645
646 .. note::
647
648 Considered for removal.
649
650 .. opcode:: ARR - Address Register Load With Round
651
652 .. math::
653
654 dst.x = round(src.x)
655
656 dst.y = round(src.y)
657
658 dst.z = round(src.z)
659
660 dst.w = round(src.w)
661
662
663 .. opcode:: BRA - Branch
664
665 pc = target
666
667 .. note::
668
669 Considered for removal.
670
671 .. opcode:: CAL - Subroutine Call
672
673 push(pc)
674 pc = target
675
676
677 .. opcode:: RET - Subroutine Call Return
678
679 pc = pop()
680
681 Potential restrictions:
682 * Only occurs at end of function.
683
684 .. opcode:: SSG - Set Sign
685
686 .. math::
687
688 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
689
690 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
691
692 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
693
694 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
695
696
697 .. opcode:: CMP - Compare
698
699 .. math::
700
701 dst.x = (src0.x < 0) ? src1.x : src2.x
702
703 dst.y = (src0.y < 0) ? src1.y : src2.y
704
705 dst.z = (src0.z < 0) ? src1.z : src2.z
706
707 dst.w = (src0.w < 0) ? src1.w : src2.w
708
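CMP is a per-component select on the sign of ``src0``. As a hedged illustration, the Python sketch below uses it to build an absolute value; both function names are hypothetical and registers are modelled as 4-tuples.

```python
def cmp(src0, src1, src2):
    """CMP: pick src1 where src0 is negative, otherwise src2."""
    return tuple(b if a < 0 else c for a, b, c in zip(src0, src1, src2))


def abs_via_cmp(src):
    """Absolute value expressed as CMP src, -src, src."""
    negated = tuple(-c for c in src)
    return cmp(src, negated, src)


print(abs_via_cmp((-1.5, 2.0, 0.0, -3.0)))  # (1.5, 2.0, 0.0, 3.0)
```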
709
710 .. opcode:: KIL - Conditional Discard
711
712 .. math::
713
714 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
715 discard
716 endif
717
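The KIL condition amounts to "discard if any component is negative". A minimal Python sketch, where the function name is hypothetical and the return value stands in for the ``discard`` keyword:

```python
def kil(src):
    """KIL: return True when the fragment should be discarded."""
    return any(component < 0 for component in src)


print(kil((0.0, 1.0, -0.5, 2.0)))  # True
print(kil((0.0, 0.0, 0.0, 0.0)))   # False
```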
718
719 .. opcode:: SCS - Sine Cosine
720
721 .. math::
722
723 dst.x = \cos{src.x}
724
725 dst.y = \sin{src.x}
726
727 dst.z = 0
728
729 dst.w = 1
730
731
732 .. opcode:: TXB - Texture Lookup With Bias
733
734 TBD
735
736
737 .. opcode:: NRM - 3-component Vector Normalise
738
739 .. math::
740
741 dst.x = src.x / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
742
743 dst.y = src.y / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
744
745 dst.z = src.z / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
746
747 dst.w = 1
748
749
750 .. opcode:: DIV - Divide
751
752 .. math::
753
754 dst.x = \frac{src0.x}{src1.x}
755
756 dst.y = \frac{src0.y}{src1.y}
757
758 dst.z = \frac{src0.z}{src1.z}
759
760 dst.w = \frac{src0.w}{src1.w}
761
762
763 .. opcode:: DP2 - 2-component Dot Product
764
765 This instruction replicates its result.
766
767 .. math::
768
769 dst = src0.x \times src1.x + src0.y \times src1.y
770
771
772 .. opcode:: TXL - Texture Lookup With LOD
773
774 TBD
775
776
777 .. opcode:: BRK - Break
778
779 TBD
780
781
782 .. opcode:: IF - If
783
784 TBD
785
786
787 .. opcode:: ELSE - Else
788
789 TBD
790
791
792 .. opcode:: ENDIF - End If
793
794 TBD
795
796
797 .. opcode:: PUSHA - Push Address Register On Stack
798
799 push(src.x)
800 push(src.y)
801 push(src.z)
802 push(src.w)
803
804 .. note::
805
806 Considered for cleanup.
807
808 .. note::
809
810 Considered for removal.
811
812 .. opcode:: POPA - Pop Address Register From Stack
813
814 dst.w = pop()
815 dst.z = pop()
816 dst.y = pop()
817 dst.x = pop()
818
819 .. note::
820
821 Considered for cleanup.
822
823 .. note::
824
825 Considered for removal.
826
827
828 Compute ISA
829 ^^^^^^^^^^^^^^^^^^^^^^^^
830
831 These opcodes are primarily provided for special-use computational shaders.
832 Support for these opcodes is indicated by a special pipe capability bit (TBD).
833
834 XXX so let's discuss it, yeah?
835
836 .. opcode:: CEIL - Ceiling
837
838 .. math::
839
840 dst.x = \lceil src.x\rceil
841
842 dst.y = \lceil src.y\rceil
843
844 dst.z = \lceil src.z\rceil
845
846 dst.w = \lceil src.w\rceil
847
848
849 .. opcode:: I2F - Integer To Float
850
851 .. math::
852
853 dst.x = (float) src.x
854
855 dst.y = (float) src.y
856
857 dst.z = (float) src.z
858
859 dst.w = (float) src.w
860
861
862 .. opcode:: NOT - Bitwise Not
863
864 .. math::
865
866 dst.x = ~src.x
867
868 dst.y = ~src.y
869
870 dst.z = ~src.z
871
872 dst.w = ~src.w
873
874
875 .. opcode:: TRUNC - Truncate
876
877 .. math::
878
879 dst.x = trunc(src.x)
880
881 dst.y = trunc(src.y)
882
883 dst.z = trunc(src.z)
884
885 dst.w = trunc(src.w)
886
887
888 .. opcode:: SHL - Shift Left
889
890 .. math::
891
892 dst.x = src0.x << src1.x
893
894 dst.y = src0.y << src1.x
895
896 dst.z = src0.z << src1.x
897
898 dst.w = src0.w << src1.x
899
900
901 .. opcode:: SHR - Shift Right
902
903 .. math::
904
905 dst.x = src0.x >> src1.x
906
907 dst.y = src0.y >> src1.x
908
909 dst.z = src0.z >> src1.x
910
911 dst.w = src0.w >> src1.x
912
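In the SHL and SHR equations every component is shifted by the same count, ``src1.x``. The Python sketch below models integer registers as 4-tuples of non-negative ints; the function names are hypothetical, and whether SHR is arithmetic or logical for negative values is not stated above, so negative inputs are left out.

```python
def shl(src0, src1):
    """SHL: shift every component of src0 left by src1.x."""
    count = src1[0]
    return tuple(c << count for c in src0)


def shr(src0, src1):
    """SHR: shift every component of src0 right by src1.x."""
    count = src1[0]
    return tuple(c >> count for c in src0)


print(shl((1, 2, 3, 4), (2, 9, 9, 9)))   # (4, 8, 12, 16)
print(shr((16, 8, 4, 2), (1, 9, 9, 9)))  # (8, 4, 2, 1)
```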
913
914 .. opcode:: AND - Bitwise And
915
916 .. math::
917
918 dst.x = src0.x & src1.x
919
920 dst.y = src0.y & src1.y
921
922 dst.z = src0.z & src1.z
923
924 dst.w = src0.w & src1.w
925
926
927 .. opcode:: OR - Bitwise Or
928
929 .. math::
930
931 dst.x = src0.x | src1.x
932
933 dst.y = src0.y | src1.y
934
935 dst.z = src0.z | src1.z
936
937 dst.w = src0.w | src1.w
938
939
940 .. opcode:: MOD - Modulus
941
942 .. math::
943
944 dst.x = src0.x \bmod src1.x
945
946 dst.y = src0.y \bmod src1.y
947
948 dst.z = src0.z \bmod src1.z
949
950 dst.w = src0.w \bmod src1.w
951
952
953 .. opcode:: XOR - Bitwise Xor
954
955 .. math::
956
957 dst.x = src0.x \oplus src1.x
958
959 dst.y = src0.y \oplus src1.y
960
961 dst.z = src0.z \oplus src1.z
962
963 dst.w = src0.w \oplus src1.w
964
965
966 .. opcode:: SAD - Sum Of Absolute Differences
967
968 .. math::
969
970 dst.x = |src0.x - src1.x| + src2.x
971
972 dst.y = |src0.y - src1.y| + src2.y
973
974 dst.z = |src0.z - src1.z| + src2.z
975
976 dst.w = |src0.w - src1.w| + src2.w
977
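A short Python sketch of SAD, with a hypothetical function name and registers modelled as 4-tuples. ``src2`` acts as an accumulator, so chaining several SAD instructions sums the absolute differences.

```python
def sad(src0, src1, src2):
    """SAD: |src0 - src1| + src2, per component."""
    return tuple(abs(a - b) + c for a, b, c in zip(src0, src1, src2))


print(sad((5, 1, 0, 7), (2, 4, 0, 7), (10, 10, 10, 10)))  # (13, 13, 10, 10)
```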
978
979 .. opcode:: TXF - Texel Fetch
980
981 TBD
982
983
984 .. opcode:: TXQ - Texture Size Query
985
986 TBD
987
988
989 .. opcode:: CONT - Continue
990
991 TBD
992
993 .. note::
994
995 Support for CONT is determined by a special capability bit,
996 ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
997
998
999 Geometry ISA
1000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1001
1002 These opcodes are only supported in geometry shaders; they have no meaning
1003 in any other type of shader.
1004
1005 .. opcode:: EMIT - Emit
1006
1007 TBD
1008
1009
1010 .. opcode:: ENDPRIM - End Primitive
1011
1012 TBD
1013
1014
1015 GLSL ISA
1016 ^^^^^^^^^^
1017
1018 These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1019 opcodes is determined by a special capability bit, ``GLSL``.
1020
1021 .. opcode:: BGNLOOP - Begin a Loop
1022
1023 TBD
1024
1025
1026 .. opcode:: BGNSUB - Begin Subroutine
1027
1028 TBD
1029
1030
1031 .. opcode:: ENDLOOP - End a Loop
1032
1033 TBD
1034
1035
1036 .. opcode:: ENDSUB - End Subroutine
1037
1038 TBD
1039
1040
1041 .. opcode:: NOP - No Operation
1042
1043 Do nothing.
1044
1045
1046 .. opcode:: NRM4 - 4-component Vector Normalise
1047
1048 This instruction replicates its result.
1049
1050 .. math::
1051
1052 dst = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
1053
1054
1055 ps_2_x
1056 ^^^^^^^^^^^^
1057
1058 XXX wait what
1059
1060 .. opcode:: CALLNZ - Subroutine Call If Not Zero
1061
1062 TBD
1063
1064
1065 .. opcode:: IFC - If
1066
1067 TBD
1068
1069
1070 .. opcode:: BREAKC - Break Conditional
1071
1072 TBD
1073
1074 .. _doubleopcodes:
1075
1076 Double ISA
1077 ^^^^^^^^^^^^^^^
1078
1079 .. opcode:: DADD - Add Double
1080
1081 .. math::
1082
1083 dst.xy = src0.xy + src1.xy
1084
1085 dst.zw = src0.zw + src1.zw
1086
1087
1088 .. opcode:: DDIV - Divide Double
1089
1090 .. math::
1091
1092 dst.xy = src0.xy / src1.xy
1093
1094 dst.zw = src0.zw / src1.zw
1095
1096 .. opcode:: DSEQ - Set Double on Equal
1097
1098 .. math::
1099
1100 dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F
1101
1102 dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F
1103
1104 .. opcode:: DSLT - Set Double on Less than
1105
1106 .. math::
1107
1108 dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F
1109
1110 dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F
1111
1112 .. opcode:: DFRAC - Double Fraction
1113
1114 .. math::
1115
1116 dst.xy = src.xy - \lfloor src.xy\rfloor
1117
1118 dst.zw = src.zw - \lfloor src.zw\rfloor
1119
1120
1121 .. opcode:: DFRACEXP - Convert Double Number to Fractional and Integral Components
1122
1123 .. math::
1124
1125 dst0.xy = frexp(src.xy, dst1.xy)
1126
1127 dst0.zw = frexp(src.zw, dst1.zw)
1128
1129 .. opcode:: DLDEXP - Multiply Double Number by Integral Power of 2
1130
1131 .. math::
1132
1133 dst.xy = ldexp(src0.xy, src1.xy)
1134
1135 dst.zw = ldexp(src0.zw, src1.zw)
1136
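DFRACEXP and DLDEXP are written in terms of ``frexp`` and ``ldexp``. The Python sketch below shows how the two relate for a single double (one ``.xy`` or ``.zw`` pair), assuming the C library convention in which the fraction lies in [0.5, 1); the document does not state which convention TGSI uses. The wrapper names are hypothetical.

```python
import math


def dfracexp(value):
    """Split a double into (fraction, exponent), as C frexp() does."""
    return math.frexp(value)


def dldexp(value, exponent):
    """Multiply a double by 2**exponent, as C ldexp() does."""
    return math.ldexp(value, exponent)


fraction, exponent = dfracexp(8.0)
print(fraction, exponent)          # 0.5 4
print(dldexp(fraction, exponent))  # 8.0
```

Under that convention, ``dldexp`` applied to the two results of ``dfracexp`` reproduces the original value.
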
1137 .. opcode:: DMIN - Minimum Double
1138
1139 .. math::
1140
1141 dst.xy = min(src0.xy, src1.xy)
1142
1143 dst.zw = min(src0.zw, src1.zw)
1144
1145 .. opcode:: DMAX - Maximum Double
1146
1147 .. math::
1148
1149 dst.xy = max(src0.xy, src1.xy)
1150
1151 dst.zw = max(src0.zw, src1.zw)
1152
1153 .. opcode:: DMUL - Multiply Double
1154
1155 .. math::
1156
1157 dst.xy = src0.xy \times src1.xy
1158
1159 dst.zw = src0.zw \times src1.zw
1160
1161
1162 .. opcode:: DMAD - Multiply And Add Doubles
1163
1164 .. math::
1165
1166 dst.xy = src0.xy \times src1.xy + src2.xy
1167
1168 dst.zw = src0.zw \times src1.zw + src2.zw
1169
1170
1171 .. opcode:: DRCP - Reciprocal Double
1172
1173 .. math::
1174
1175 dst.xy = \frac{1}{src.xy}
1176
1177 dst.zw = \frac{1}{src.zw}
1178
1179 .. opcode:: DSQRT - Square Root Double
1180
1181 .. math::
1182
1183 dst.xy = \sqrt{src.xy}
1184
1185 dst.zw = \sqrt{src.zw}
1186
1187
1188 Explanation of symbols used
1189 ------------------------------
1190
1191
1192 Functions
1193 ^^^^^^^^^^^^^^
1194
1195
1196 :math:`|x|` Absolute value of `x`.
1197
1198 :math:`\lceil x \rceil` Ceiling of `x`.
1199
1200 clamp(x,y,z) Clamp x between y and z.
1201 (x < y) ? y : (x > z) ? z : x
1202
1203 :math:`\lfloor x\rfloor` Floor of `x`.
1204
1205 :math:`\log_2{x}` Logarithm of `x`, base 2.
1206
1207 max(x,y) Maximum of x and y.
1208 (x > y) ? x : y
1209
1210 min(x,y) Minimum of x and y.
1211 (x < y) ? x : y
1212
1213 partialx(x) Derivative of x relative to fragment's X.
1214
1215 partialy(x) Derivative of x relative to fragment's Y.
1216
1217 pop() Pop from stack.
1218
1219 :math:`x^y` `x` to the power `y`.
1220
1221 push(x) Push x on stack.
1222
1223 round(x) Round x.
1224
1225 trunc(x) Truncate x, i.e. drop the fraction bits.
1226
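Several of the functions above map directly onto the Python standard library. The sketch below is only an illustration: ``clamp`` follows the expression given in the list, and the comparison of floor, ceiling and truncation on a negative input shows where they differ. ``round(x)`` is left out because the list does not say which rounding mode is meant.

```python
import math


def clamp(x, y, z):
    """Clamp x between y and z: (x < y) ? y : (x > z) ? z : x."""
    return y if x < y else z if x > z else x


print(clamp(1.5, 0.0, 1.0))  # 1.0
print(math.floor(-1.5))      # -2
print(math.ceil(-1.5))       # -1
print(math.trunc(-1.5))      # -1
```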
1227
1228 Keywords
1229 ^^^^^^^^^^^^^
1230
1231
1232 discard Discard fragment.
1233
1234 pc Program counter.
1235
1236 target Label of target instruction.
1237
1238
1239 Other tokens
1240 ---------------
1241
1242
1243 Declaration
1244 ^^^^^^^^^^^
1245
1246
1247 Declares a register that will be referenced as an operand in Instruction
1248 tokens.
1249
1250 File field contains register file that is being declared and is one
1251 of TGSI_FILE.
1252
1253 UsageMask field specifies which of the register components can be accessed
1254 and is one of TGSI_WRITEMASK.
1255
1256 Interpolate field is only valid for fragment shader INPUT register files.
1257 It specifies the way input is being interpolated by the rasteriser and is one
1258 of TGSI_INTERPOLATE.
1259
1260 If Dimension flag is set to 1, a Declaration Dimension token follows.
1261
1262 If Semantic flag is set to 1, a Declaration Semantic token follows.
1263
1264 CylindricalWrap bitfield is only valid for fragment shader INPUT register
1265 files. It specifies which register components should be subject to cylindrical
1266 wrapping when interpolating by the rasteriser. If TGSI_CYLINDRICAL_WRAP_X
1267 is set to 1, the X component should be interpolated according to cylindrical
1268 wrapping rules.
1269
1270
1271 Declaration Semantic
1272 ^^^^^^^^^^^^^^^^^^^^^^^^
1273
1274
1275 Follows Declaration token if Semantic bit is set.
1276
1277 Since its purpose is to link a shader with other stages of the pipeline,
1278 it is valid to follow only those Declaration tokens that declare a register
1279 either in INPUT or OUTPUT file.
1280
1281 SemanticName field contains the semantic name of the register being declared.
1282 There is no default value.
1283
1284 SemanticIndex is an optional subscript that can be used to distinguish
1285 different register declarations with the same semantic name. The default value
1286 is 0.
1287
1288 The meanings of the individual semantic names are explained in the following
1289 sections.
1290
1291 TGSI_SEMANTIC_POSITION
1292 """"""""""""""""""""""
1293
1294 Position, sometimes known as HPOS or WPOS for historical reasons, is the
1295 location of the vertex in space, in ``(x, y, z, w)`` format. ``x``, ``y``, and ``z``
1296 are the Cartesian coordinates, and ``w`` is the homogeneous coordinate and is used
1297 for the perspective divide, if enabled.
1298
1299 As a vertex shader output, position should be scaled to the viewport. When
1300 used in fragment shaders, position will be in window coordinates. The convention
1301 used depends on the FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER properties.
1302
1303 XXX additionally, is there a way to configure the perspective divide? it's
1304 accelerated on most chipsets AFAIK...
1305
1306 Position, if not specified, usually defaults to ``(0, 0, 0, 1)``, and can
1307 be partially specified as ``(x, y, 0, 1)`` or ``(x, y, z, 1)``.
1308
1309 XXX usually? can we solidify that?
1310
1311 TGSI_SEMANTIC_COLOR
1312 """""""""""""""""""
1313
1314 Colors are used to, well, color the primitives. Colors are always in
1315 ``(r, g, b, a)`` format.
1316
1317 If alpha is not specified, it defaults to 1.
1318
1319 TGSI_SEMANTIC_BCOLOR
1320 """"""""""""""""""""
1321
1322 Back-facing colors are only used for back-facing polygons, and are only valid
1323 in vertex shader outputs. After rasterization, all polygons are front-facing
1324 and COLOR and BCOLOR end up occupying the same slots in the fragment, so
1325 all BCOLORs effectively become regular COLORs in the fragment shader.
1326
1327 TGSI_SEMANTIC_FOG
1328 """""""""""""""""
1329
1330 The fog coordinate historically has been used to replace the depth coordinate
1331 for generation of fog in dedicated fog blocks. Gallium, however, does not use
1332 dedicated fog acceleration, placing it entirely in the fragment shader
1333 instead.
1334
1335 The fog coordinate should be written in ``(f, 0, 0, 1)`` format. Only the first
1336 component matters when writing from the vertex shader; the driver will ensure
1337 that the coordinate is in this format when used as a fragment shader input.
1338
1339 TGSI_SEMANTIC_PSIZE
1340 """""""""""""""""""
1341
1342 PSIZE, or point size, is used to specify point sizes per-vertex. It should
1343 be in ``(s, 0, 0, 1)`` format, where ``s`` is the (possibly clamped) point size.
1344 Only the first component matters when writing from the vertex shader.
1345
1346 When using this semantic, be sure to set the appropriate state in the
1347 :ref:`rasterizer` first.
1348
1349 TGSI_SEMANTIC_GENERIC
1350 """""""""""""""""""""
1351
1352 Generic semantics are nearly always used for texture coordinate attributes,
1353 in ``(s, t, r, q)`` format. ``t`` and ``r`` may be unused for certain kinds
1354 of lookups, and ``q`` is the level-of-detail bias for biased sampling.
1355
1356 These attributes are called "generic" because they may be used for anything
1357 else, including parameters, texture generation information, or anything that
1358 can be stored inside a four-component vector.
1359
1360 TGSI_SEMANTIC_NORMAL
1361 """"""""""""""""""""
1362
1363 Vertex normal; could be used to implement per-pixel lighting for legacy APIs
1364 that allow mixing fixed-function and programmable stages.
1365
1366 TGSI_SEMANTIC_FACE
1367 """"""""""""""""""
1368
1369 FACE is the facing bit, which stores the facing information for the fragment
1370 shader. ``(f, 0, 0, 1)`` is the format. The first component will be positive
1371 when the fragment is front-facing, and negative when the fragment is
1372 back-facing.
1373
1374 TGSI_SEMANTIC_EDGEFLAG
1375 """"""""""""""""""""""
1376
1377 XXX no clue
1378
1379
1380 Properties
1381 ^^^^^^^^^^^^^^^^^^^^^^^^
1382
1383
1384 Properties are general directives that apply to the whole TGSI program.
1385
1386 FS_COORD_ORIGIN
1387 """""""""""""""
1388
1389 Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
1390 The default value is UPPER_LEFT.
1391
1392 If UPPER_LEFT, the position will be (0,0) at the upper left corner and
1393 increase downward and rightward.
1394 If LOWER_LEFT, the position will be (0,0) at the lower left corner and
1395 increase upward and rightward.
1396
1397 OpenGL defaults to LOWER_LEFT, and is configurable with the
1398 GL_ARB_fragment_coord_conventions extension.
1399
1400 DirectX 9/10 use UPPER_LEFT.
1401
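The two origin conventions differ only in the direction of the Y axis, so converting a fragment's Y position between them is a reflection about the framebuffer height. A minimal Python sketch, where ``height`` (the framebuffer height in pixels) and the function name are assumptions made for illustration:

```python
def flip_origin(y, height):
    """Convert a Y position between UPPER_LEFT and LOWER_LEFT origins."""
    return height - y


# Centre of the top pixel row in a 480-pixel-high framebuffer.
print(flip_origin(0.5, 480))  # 479.5
```

The conversion is its own inverse, so the same function maps in both directions.
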
1402 FS_COORD_PIXEL_CENTER
1403 """""""""""""""""""""
1404
1405 Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
1406 The default value is HALF_INTEGER.
1407
1408 If HALF_INTEGER, the fractional part of the position will be 0.5.
1409 If INTEGER, the fractional part of the position will be 0.0.
1410
1411 Note that this does not affect the set of fragments generated by
1412 rasterization, which is instead controlled by gl_rasterization_rules in the
1413 rasterizer.
1414
1415 OpenGL defaults to HALF_INTEGER, and is configurable with the
1416 GL_ARB_fragment_coord_conventions extension.
1417
1418 DirectX 9 uses INTEGER.
1419 DirectX 10 uses HALF_INTEGER.
1420
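As a small illustration of the two pixel center conventions, the Python sketch below computes the position reported for the pixel at an integer column and row. The function and parameter names are hypothetical.

```python
def fragment_position(column, row, pixel_center="HALF_INTEGER"):
    """Return the (x, y) position reported for an integer pixel location."""
    offset = 0.5 if pixel_center == "HALF_INTEGER" else 0.0
    return (column + offset, row + offset)


print(fragment_position(3, 7))             # (3.5, 7.5)
print(fragment_position(3, 7, "INTEGER"))  # (3.0, 7.0)
```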
1421
1422
1423 Texture Sampling and Texture Formats
1424 ------------------------------------
1425
1426 This table shows how texture image components are returned as (x,y,z,w) tuples
1427 by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
1428 :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
1429 well.
1430
1431 +--------------------+--------------+--------------------+--------------+
1432 | Texture Components | Gallium | OpenGL | Direct3D 9 |
1433 +====================+==============+====================+==============+
1434 | R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) |
1435 +--------------------+--------------+--------------------+--------------+
1436 | RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) |
1437 +--------------------+--------------+--------------------+--------------+
1438 | RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) |
1439 +--------------------+--------------+--------------------+--------------+
1440 | RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) |
1441 +--------------------+--------------+--------------------+--------------+
1442 | A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) |
1443 +--------------------+--------------+--------------------+--------------+
1444 | L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) |
1445 +--------------------+--------------+--------------------+--------------+
1446 | LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) |
1447 +--------------------+--------------+--------------------+--------------+
1448 | I | (i, i, i, i) | (i, i, i, i) | N/A |
1449 +--------------------+--------------+--------------------+--------------+
1450 | UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) |
1451 | | | [#envmap-bumpmap]_ | |
1452 +--------------------+--------------+--------------------+--------------+
1453 | Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) |
1454 | | | [#depth-tex-mode]_ | |
1455 +--------------------+--------------+--------------------+--------------+
1456
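The Gallium column of the table can be expressed as a lookup from the set of texture components to the returned ``(x, y, z, w)`` tuple. The Python sketch below is illustrative only; the names ``GALLIUM_RESULT`` and ``sample_result`` are hypothetical, and the UV and Z rows are omitted because the table marks them as TBD.

```python
# One entry per row of the Gallium column in the table above.
GALLIUM_RESULT = {
    "R":    lambda r: (r, 0.0, 0.0, 1.0),
    "RG":   lambda r, g: (r, g, 0.0, 1.0),
    "RGB":  lambda r, g, b: (r, g, b, 1.0),
    "RGBA": lambda r, g, b, a: (r, g, b, a),
    "A":    lambda a: (0.0, 0.0, 0.0, a),
    "L":    lambda l: (l, l, l, 1.0),
    "LA":   lambda l, a: (l, l, l, a),
    "I":    lambda i: (i, i, i, i),
}


def sample_result(components, *values):
    """Expand stored texture components into an (x, y, z, w) tuple."""
    return GALLIUM_RESULT[components](*values)


print(sample_result("RG", 0.25, 0.5))  # (0.25, 0.5, 0.0, 1.0)
print(sample_result("LA", 0.75, 0.5))  # (0.75, 0.75, 0.75, 0.5)
```
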
1457 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
1458 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
1459 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.