gallium/docs: document TGSI_SEMANTIC_EDGEFLAG
1 TGSI
2 ====
3
4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5 for describing shaders. Since Gallium is inherently shaderful, shaders are
6 an important part of the API. TGSI is the only intermediate representation
7 used by all drivers.
8
9 Basics
10 ------
11
12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13 floating-point four-component vectors. An opcode may have up to one
14 destination register, known as *dst*, and between zero and three source
15 registers, called *src0* through *src2*, or simply *src* if there is only
16 one.
17
18 Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
19 components as integers. Other instructions permit using registers as
20 two-component vectors with double precision; see :ref:`doubleopcodes`.
21
22 When an instruction has a scalar result, the result is usually copied into
23 each of the components of *dst*. When this happens, the result is said to be
24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
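
As a rough illustration of replication, the following Python sketch models a
register as a four-element list. The helper names ``replicate`` and ``rcp``
are hypothetical and exist only for this example; they are not part of any
TGSI or Gallium API.

```python
def replicate(scalar):
    """Copy one scalar result into all four components (x, y, z, w)."""
    return [scalar] * 4


def rcp(src):
    """Model of RCP: the reciprocal of src.x, replicated to every component."""
    return replicate(1.0 / src[0])


print(rcp([4.0, 7.0, 9.0, 1.0]))  # [0.25, 0.25, 0.25, 0.25]
```

Only ``src.x`` contributes to the result; the other source components are
ignored.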
25
26 Instruction Set
27 ---------------
28
29 Core ISA
30 ^^^^^^^^^^^^^^^^^^^^^^^^^
31
32 These opcodes are guaranteed to be available regardless of the driver being
33 used.
34
35 .. opcode:: ARL - Address Register Load
36
37 .. math::
38
39 dst.x = \lfloor src.x\rfloor
40
41 dst.y = \lfloor src.y\rfloor
42
43 dst.z = \lfloor src.z\rfloor
44
45 dst.w = \lfloor src.w\rfloor
46
47
48 .. opcode:: MOV - Move
49
50 .. math::
51
52 dst.x = src.x
53
54 dst.y = src.y
55
56 dst.z = src.z
57
58 dst.w = src.w
59
60
61 .. opcode:: LIT - Light Coefficients
62
63 .. math::
64
65 dst.x = 1
66
67 dst.y = max(src.x, 0)
68
69 dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0
70
71 dst.w = 1
72
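The LIT formulas can be sketched in Python as follows. This is a minimal
model under the assumption that a register is a list ``[x, y, z, w]``; the
function names are hypothetical, and ``clamp`` follows the definition given
in the explanation of symbols.

```python
def clamp(x, lo, hi):
    """Limit x to the range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def lit(src):
    """Model of LIT on a four-component source [x, y, z, w]."""
    x, y, _, w = src
    if x > 0:
        # Python raises ZeroDivisionError for 0.0 raised to a negative power;
        # what a driver returns in that case is outside this sketch.
        dst_z = max(y, 0.0) ** clamp(w, -128.0, 128.0)
    else:
        dst_z = 0.0
    return [1.0, max(x, 0.0), dst_z, 1.0]


print(lit([0.5, 2.0, 0.0, 3.0]))  # [1.0, 0.5, 8.0, 1.0]
```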
73
74 .. opcode:: RCP - Reciprocal
75
76 This instruction replicates its result.
77
78 .. math::
79
80 dst = \frac{1}{src.x}
81
82
83 .. opcode:: RSQ - Reciprocal Square Root
84
85 This instruction replicates its result.
86
87 .. math::
88
89 dst = \frac{1}{\sqrt{|src.x|}}
90
91
92 .. opcode:: EXP - Approximate Exponential Base 2
93
94 .. math::
95
96 dst.x = 2^{\lfloor src.x\rfloor}
97
98 dst.y = src.x - \lfloor src.x\rfloor
99
100 dst.z = 2^{src.x}
101
102 dst.w = 1
103
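A small Python sketch of the EXP formulas, assuming a list-based register
model (the name ``exp_op`` is made up for this example). It shows that
``dst.x`` and ``dst.y`` together carry the same information as ``dst.z``,
since :math:`2^{\lfloor x\rfloor} \times 2^{x - \lfloor x\rfloor} = 2^x`.

```python
import math


def exp_op(src):
    """Model of EXP: integer-part power, fraction, full power, and 1."""
    x = src[0]
    floor_x = math.floor(x)
    return [2.0 ** floor_x, x - floor_x, 2.0 ** x, 1.0]


result = exp_op([2.5, 0.0, 0.0, 0.0])
print(result[0], result[1])  # 4.0 0.5
```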
104
105 .. opcode:: LOG - Approximate Logarithm Base 2
106
107 .. math::
108
109 dst.x = \lfloor\log_2{|src.x|}\rfloor
110
111 dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
112
113 dst.z = \log_2{|src.x|}
114
115 dst.w = 1
116
117
118 .. opcode:: MUL - Multiply
119
120 .. math::
121
122 dst.x = src0.x \times src1.x
123
124 dst.y = src0.y \times src1.y
125
126 dst.z = src0.z \times src1.z
127
128 dst.w = src0.w \times src1.w
129
130
131 .. opcode:: ADD - Add
132
133 .. math::
134
135 dst.x = src0.x + src1.x
136
137 dst.y = src0.y + src1.y
138
139 dst.z = src0.z + src1.z
140
141 dst.w = src0.w + src1.w
142
143
144 .. opcode:: DP3 - 3-component Dot Product
145
146 This instruction replicates its result.
147
148 .. math::
149
150 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
151
152
153 .. opcode:: DP4 - 4-component Dot Product
154
155 This instruction replicates its result.
156
157 .. math::
158
159 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
160
161
162 .. opcode:: DST - Distance Vector
163
164 .. math::
165
166 dst.x = 1
167
168 dst.y = src0.y \times src1.y
169
170 dst.z = src0.z
171
172 dst.w = src1.w
173
174
175 .. opcode:: MIN - Minimum
176
177 .. math::
178
179 dst.x = min(src0.x, src1.x)
180
181 dst.y = min(src0.y, src1.y)
182
183 dst.z = min(src0.z, src1.z)
184
185 dst.w = min(src0.w, src1.w)
186
187
188 .. opcode:: MAX - Maximum
189
190 .. math::
191
192 dst.x = max(src0.x, src1.x)
193
194 dst.y = max(src0.y, src1.y)
195
196 dst.z = max(src0.z, src1.z)
197
198 dst.w = max(src0.w, src1.w)
199
200
201 .. opcode:: SLT - Set On Less Than
202
203 .. math::
204
205 dst.x = (src0.x < src1.x) ? 1 : 0
206
207 dst.y = (src0.y < src1.y) ? 1 : 0
208
209 dst.z = (src0.z < src1.z) ? 1 : 0
210
211 dst.w = (src0.w < src1.w) ? 1 : 0
212
213
214 .. opcode:: SGE - Set On Greater Equal Than
215
216 .. math::
217
218 dst.x = (src0.x >= src1.x) ? 1 : 0
219
220 dst.y = (src0.y >= src1.y) ? 1 : 0
221
222 dst.z = (src0.z >= src1.z) ? 1 : 0
223
224 dst.w = (src0.w >= src1.w) ? 1 : 0
225
226
227 .. opcode:: MAD - Multiply And Add
228
229 .. math::
230
231 dst.x = src0.x \times src1.x + src2.x
232
233 dst.y = src0.y \times src1.y + src2.y
234
235 dst.z = src0.z \times src1.z + src2.z
236
237 dst.w = src0.w \times src1.w + src2.w
238
239
240 .. opcode:: SUB - Subtract
241
242 .. math::
243
244 dst.x = src0.x - src1.x
245
246 dst.y = src0.y - src1.y
247
248 dst.z = src0.z - src1.z
249
250 dst.w = src0.w - src1.w
251
252
253 .. opcode:: LRP - Linear Interpolate
254
255 .. math::
256
257 dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
258
259 dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
260
261 dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
262
263 dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
264
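The following Python sketch applies the LRP formula per component. Note that
``src0`` is the blend weight: a weight of 1 selects ``src1`` and a weight of
0 selects ``src2``. The list-based register model and the name ``lrp`` are
assumptions made for illustration.

```python
def lrp(src0, src1, src2):
    """Model of LRP: per-component src0 * src1 + (1 - src0) * src2."""
    return [a * b + (1.0 - a) * c for a, b, c in zip(src0, src1, src2)]


weights = [0.0, 1.0, 0.5, 0.25]
print(lrp(weights, [10.0] * 4, [20.0] * 4))  # [20.0, 10.0, 15.0, 17.5]
```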
265
266 .. opcode:: CND - Condition
267
268 .. math::
269
270 dst.x = (src2.x > 0.5) ? src0.x : src1.x
271
272 dst.y = (src2.y > 0.5) ? src0.y : src1.y
273
274 dst.z = (src2.z > 0.5) ? src0.z : src1.z
275
276 dst.w = (src2.w > 0.5) ? src0.w : src1.w
277
278
279 .. opcode:: DP2A - 2-component Dot Product And Add
280
281 .. math::
282
283 dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
284
285 dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
286
287 dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
288
289 dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
290
291
292 .. opcode:: FRC - Fraction
293
294 .. math::
295
296 dst.x = src.x - \lfloor src.x\rfloor
297
298 dst.y = src.y - \lfloor src.y\rfloor
299
300 dst.z = src.z - \lfloor src.z\rfloor
301
302 dst.w = src.w - \lfloor src.w\rfloor
303
304
305 .. opcode:: CLAMP - Clamp
306
307 .. math::
308
309 dst.x = clamp(src0.x, src1.x, src2.x)
310
311 dst.y = clamp(src0.y, src1.y, src2.y)
312
313 dst.z = clamp(src0.z, src1.z, src2.z)
314
315 dst.w = clamp(src0.w, src1.w, src2.w)
316
317
318 .. opcode:: FLR - Floor
319
320 This is identical to :opcode:`ARL`.
321
322 .. math::
323
324 dst.x = \lfloor src.x\rfloor
325
326 dst.y = \lfloor src.y\rfloor
327
328 dst.z = \lfloor src.z\rfloor
329
330 dst.w = \lfloor src.w\rfloor
331
332
333 .. opcode:: ROUND - Round
334
335 .. math::
336
337 dst.x = round(src.x)
338
339 dst.y = round(src.y)
340
341 dst.z = round(src.z)
342
343 dst.w = round(src.w)
344
345
346 .. opcode:: EX2 - Exponential Base 2
347
348 This instruction replicates its result.
349
350 .. math::
351
352 dst = 2^{src.x}
353
354
355 .. opcode:: LG2 - Logarithm Base 2
356
357 This instruction replicates its result.
358
359 .. math::
360
361 dst = \log_2{src.x}
362
363
364 .. opcode:: POW - Power
365
366 This instruction replicates its result.
367
368 .. math::
369
370 dst = src0.x^{src1.x}
371
372 .. opcode:: XPD - Cross Product
373
374 .. math::
375
376 dst.x = src0.y \times src1.z - src1.y \times src0.z
377
378 dst.y = src0.z \times src1.x - src1.z \times src0.x
379
380 dst.z = src0.x \times src1.y - src1.x \times src0.y
381
382 dst.w = 1
383
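As a quick check of the XPD formulas, this Python sketch (with a hypothetical
``xpd`` helper and list-based registers) computes the cross product of the X
and Y unit vectors, which should give the Z unit vector with ``w`` set to 1.

```python
def xpd(src0, src1):
    """Model of XPD: 3-component cross product, with dst.w fixed to 1."""
    return [
        src0[1] * src1[2] - src1[1] * src0[2],
        src0[2] * src1[0] - src1[2] * src0[0],
        src0[0] * src1[1] - src1[0] * src0[1],
        1.0,
    ]


print(xpd([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]))  # [0.0, 0.0, 1.0, 1.0]
```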
384
385 .. opcode:: ABS - Absolute
386
387 .. math::
388
389 dst.x = |src.x|
390
391 dst.y = |src.y|
392
393 dst.z = |src.z|
394
395 dst.w = |src.w|
396
397
398 .. opcode:: RCC - Reciprocal Clamped
399
400 This instruction replicates its result.
401
402 XXX cleanup on aisle three
403
404 .. math::
405
406 dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
407
408
409 .. opcode:: DPH - Homogeneous Dot Product
410
411 This instruction replicates its result.
412
413 .. math::
414
415 dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
416
417
418 .. opcode:: COS - Cosine
419
420 This instruction replicates its result.
421
422 .. math::
423
424 dst = \cos{src.x}
425
426
427 .. opcode:: DDX - Derivative Relative To X
428
429 .. math::
430
431 dst.x = partialx(src.x)
432
433 dst.y = partialx(src.y)
434
435 dst.z = partialx(src.z)
436
437 dst.w = partialx(src.w)
438
439
440 .. opcode:: DDY - Derivative Relative To Y
441
442 .. math::
443
444 dst.x = partialy(src.x)
445
446 dst.y = partialy(src.y)
447
448 dst.z = partialy(src.z)
449
450 dst.w = partialy(src.w)
451
452
453 .. opcode:: KILP - Predicated Discard
454
455 discard
456
457
458 .. opcode:: PK2H - Pack Two 16-bit Floats
459
460 TBD
461
462
463 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
464
465 TBD
466
467
468 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
469
470 TBD
471
472
473 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
474
475 TBD
476
477
478 .. opcode:: RFL - Reflection Vector
479
480 .. math::
481
482 dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
483
484 dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
485
486 dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
487
488 dst.w = 1
489
490 .. note::
491
492 Considered for removal.
493
494
495 .. opcode:: SEQ - Set On Equal
496
497 .. math::
498
499 dst.x = (src0.x == src1.x) ? 1 : 0
500
501 dst.y = (src0.y == src1.y) ? 1 : 0
502
503 dst.z = (src0.z == src1.z) ? 1 : 0
504
505 dst.w = (src0.w == src1.w) ? 1 : 0
506
507
508 .. opcode:: SFL - Set On False
509
510 This instruction replicates its result.
511
512 .. math::
513
514 dst = 0
515
516 .. note::
517
518 Considered for removal.
519
520
521 .. opcode:: SGT - Set On Greater Than
522
523 .. math::
524
525 dst.x = (src0.x > src1.x) ? 1 : 0
526
527 dst.y = (src0.y > src1.y) ? 1 : 0
528
529 dst.z = (src0.z > src1.z) ? 1 : 0
530
531 dst.w = (src0.w > src1.w) ? 1 : 0
532
533
534 .. opcode:: SIN - Sine
535
536 This instruction replicates its result.
537
538 .. math::
539
540 dst = \sin{src.x}
541
542
543 .. opcode:: SLE - Set On Less Equal Than
544
545 .. math::
546
547 dst.x = (src0.x <= src1.x) ? 1 : 0
548
549 dst.y = (src0.y <= src1.y) ? 1 : 0
550
551 dst.z = (src0.z <= src1.z) ? 1 : 0
552
553 dst.w = (src0.w <= src1.w) ? 1 : 0
554
555
556 .. opcode:: SNE - Set On Not Equal
557
558 .. math::
559
560 dst.x = (src0.x != src1.x) ? 1 : 0
561
562 dst.y = (src0.y != src1.y) ? 1 : 0
563
564 dst.z = (src0.z != src1.z) ? 1 : 0
565
566 dst.w = (src0.w != src1.w) ? 1 : 0
567
568
569 .. opcode:: STR - Set On True
570
571 This instruction replicates its result.
572
573 .. math::
574
575 dst = 1
576
577
578 .. opcode:: TEX - Texture Lookup
579
580 TBD
581
582
583 .. opcode:: TXD - Texture Lookup with Derivatives
584
585 TBD
586
587
588 .. opcode:: TXP - Projective Texture Lookup
589
590 TBD
591
592
593 .. opcode:: UP2H - Unpack Two 16-Bit Floats
594
595 TBD
596
597 .. note::
598
599 Considered for removal.
600
601 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
602
603 TBD
604
605 .. note::
606
607 Considered for removal.
608
609 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
610
611 TBD
612
613 .. note::
614
615 Considered for removal.
616
617 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
618
619 TBD
620
621 .. note::
622
623 Considered for removal.
624
625 .. opcode:: X2D - 2D Coordinate Transformation
626
627 .. math::
628
629 dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y
630
631 dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w
632
633 dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y
634
635 dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
636
637 .. note::
638
639 Considered for removal.
640
641
642 .. opcode:: ARA - Address Register Add
643
644 TBD
645
646 .. note::
647
648 Considered for removal.
649
650 .. opcode:: ARR - Address Register Load With Round
651
652 .. math::
653
654 dst.x = round(src.x)
655
656 dst.y = round(src.y)
657
658 dst.z = round(src.z)
659
660 dst.w = round(src.w)
661
662
663 .. opcode:: BRA - Branch
664
665 pc = target
666
667 .. note::
668
669 Considered for removal.
670
671 .. opcode:: CAL - Subroutine Call
672
673 push(pc)
674 pc = target
675
676
677 .. opcode:: RET - Subroutine Call Return
678
679 pc = pop()
680
681 Potential restrictions:
682 * Only occurs at end of function.
683
684 .. opcode:: SSG - Set Sign
685
686 .. math::
687
688 dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
689
690 dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
691
692 dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
693
694 dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
695
696
697 .. opcode:: CMP - Compare
698
699 .. math::
700
701 dst.x = (src0.x < 0) ? src1.x : src2.x
702
703 dst.y = (src0.y < 0) ? src1.y : src2.y
704
705 dst.z = (src0.z < 0) ? src1.z : src2.z
706
707 dst.w = (src0.w < 0) ? src1.w : src2.w
708
709
710 .. opcode:: KIL - Conditional Discard
711
712 .. math::
713
714 if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
715 discard
716 endif
717
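The KIL condition can be modelled in Python as below. The ``kil`` function
returns whether the fragment would be discarded; ``alpha_test_discards`` is a
hypothetical usage example (not something this document prescribes) showing
how a threshold test can be expressed by subtracting before the discard.

```python
def kil(src):
    """Model of KIL: discard if any of the four components is negative."""
    return any(component < 0 for component in src)


def alpha_test_discards(alpha, threshold):
    """Hypothetical use: discard fragments whose alpha is below threshold."""
    return kil([alpha - threshold] * 4)


print(kil([1.0, -0.1, 2.0, 3.0]))     # True
print(alpha_test_discards(0.7, 0.5))  # False
```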
718
719 .. opcode:: SCS - Sine Cosine
720
721 .. math::
722
723 dst.x = \cos{src.x}
724
725 dst.y = \sin{src.x}
726
727 dst.z = 0
728
729 dst.w = 1
730
731
732 .. opcode:: TXB - Texture Lookup With Bias
733
734 TBD
735
736
737 .. opcode:: NRM - 3-component Vector Normalise
738
739 .. math::
740
741 dst.x = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}
742
743 dst.y = \frac{src.y}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}
744
745 dst.z = \frac{src.z}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}
746
747 dst.w = 1
748
749
750 .. opcode:: DIV - Divide
751
752 .. math::
753
754 dst.x = \frac{src0.x}{src1.x}
755
756 dst.y = \frac{src0.y}{src1.y}
757
758 dst.z = \frac{src0.z}{src1.z}
759
760 dst.w = \frac{src0.w}{src1.w}
761
762
763 .. opcode:: DP2 - 2-component Dot Product
764
765 This instruction replicates its result.
766
767 .. math::
768
769 dst = src0.x \times src1.x + src0.y \times src1.y
770
771
772 .. opcode:: TXL - Texture Lookup With LOD
773
774 TBD
775
776
777 .. opcode:: BRK - Break
778
779 TBD
780
781
782 .. opcode:: IF - If
783
784 TBD
785
786
787 .. opcode:: ELSE - Else
788
789 TBD
790
791
792 .. opcode:: ENDIF - End If
793
794 TBD
795
796
797 .. opcode:: PUSHA - Push Address Register On Stack
798
799 push(src.x)
800 push(src.y)
801 push(src.z)
802 push(src.w)
803
804 .. note::
805
806 Considered for cleanup.
807
808 .. note::
809
810 Considered for removal.
811
812 .. opcode:: POPA - Pop Address Register From Stack
813
814 dst.w = pop()
815 dst.z = pop()
816 dst.y = pop()
817 dst.x = pop()
818
819 .. note::
820
821 Considered for cleanup.
822
823 .. note::
824
825 Considered for removal.
826
827
828 Compute ISA
829 ^^^^^^^^^^^^^^^^^^^^^^^^
830
831 These opcodes are primarily provided for special-use computational shaders.
832 Support for these opcodes is indicated by a special pipe capability bit (TBD).
833
834 XXX so let's discuss it, yeah?
835
836 .. opcode:: CEIL - Ceiling
837
838 .. math::
839
840 dst.x = \lceil src.x\rceil
841
842 dst.y = \lceil src.y\rceil
843
844 dst.z = \lceil src.z\rceil
845
846 dst.w = \lceil src.w\rceil
847
848
849 .. opcode:: I2F - Integer To Float
850
851 .. math::
852
853 dst.x = (float) src.x
854
855 dst.y = (float) src.y
856
857 dst.z = (float) src.z
858
859 dst.w = (float) src.w
860
861
862 .. opcode:: NOT - Bitwise Not
863
864 .. math::
865
866 dst.x = ~src.x
867
868 dst.y = ~src.y
869
870 dst.z = ~src.z
871
872 dst.w = ~src.w
873
874
875 .. opcode:: TRUNC - Truncate
876
877 .. math::
878
879 dst.x = trunc(src.x)
880
881 dst.y = trunc(src.y)
882
883 dst.z = trunc(src.z)
884
885 dst.w = trunc(src.w)
886
887
888 .. opcode:: SHL - Shift Left
889
890 .. math::
891
892 dst.x = src0.x << src1.x
893
894 dst.y = src0.y << src1.x
895
896 dst.z = src0.z << src1.x
897
898 dst.w = src0.w << src1.x
899
900
901 .. opcode:: SHR - Shift Right
902
903 .. math::
904
905 dst.x = src0.x >> src1.x
906
907 dst.y = src0.y >> src1.x
908
909 dst.z = src0.z >> src1.x
910
911 dst.w = src0.w >> src1.x
912
913
914 .. opcode:: AND - Bitwise And
915
916 .. math::
917
918 dst.x = src0.x & src1.x
919
920 dst.y = src0.y & src1.y
921
922 dst.z = src0.z & src1.z
923
924 dst.w = src0.w & src1.w
925
926
927 .. opcode:: OR - Bitwise Or
928
929 .. math::
930
931 dst.x = src0.x | src1.x
932
933 dst.y = src0.y | src1.y
934
935 dst.z = src0.z | src1.z
936
937 dst.w = src0.w | src1.w
938
939
940 .. opcode:: MOD - Modulus
941
942 .. math::
943
944 dst.x = src0.x \bmod src1.x
945
946 dst.y = src0.y \bmod src1.y
947
948 dst.z = src0.z \bmod src1.z
949
950 dst.w = src0.w \bmod src1.w
951
952
953 .. opcode:: XOR - Bitwise Xor
954
955 .. math::
956
957 dst.x = src0.x \oplus src1.x
958
959 dst.y = src0.y \oplus src1.y
960
961 dst.z = src0.z \oplus src1.z
962
963 dst.w = src0.w \oplus src1.w
964
965
966 .. opcode:: SAD - Sum Of Absolute Differences
967
968 .. math::
969
970 dst.x = |src0.x - src1.x| + src2.x
971
972 dst.y = |src0.y - src1.y| + src2.y
973
974 dst.z = |src0.z - src1.z| + src2.z
975
976 dst.w = |src0.w - src1.w| + src2.w
977
978
979 .. opcode:: TXF - Texel Fetch
980
981 TBD
982
983
984 .. opcode:: TXQ - Texture Size Query
985
986 TBD
987
988
989 .. opcode:: CONT - Continue
990
991 TBD
992
993 .. note::
994
995 Support for CONT is determined by a special capability bit,
996 ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
997
998
999 Geometry ISA
1000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1001
1002 These opcodes are only supported in geometry shaders; they have no meaning
1003 in any other type of shader.
1004
1005 .. opcode:: EMIT - Emit
1006
1007 TBD
1008
1009
1010 .. opcode:: ENDPRIM - End Primitive
1011
1012 TBD
1013
1014
1015 GLSL ISA
1016 ^^^^^^^^^^
1017
1018 These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1019 opcodes is determined by a special capability bit, ``GLSL``.
1020
1021 .. opcode:: BGNLOOP - Begin a Loop
1022
1023 TBD
1024
1025
1026 .. opcode:: BGNSUB - Begin Subroutine
1027
1028 TBD
1029
1030
1031 .. opcode:: ENDLOOP - End a Loop
1032
1033 TBD
1034
1035
1036 .. opcode:: ENDSUB - End Subroutine
1037
1038 TBD
1039
1040
1041 .. opcode:: NOP - No Operation
1042
1043 Do nothing.
1044
1045
1046 .. opcode:: NRM4 - 4-component Vector Normalise
1047
1048 This instruction replicates its result.
1049
1050 .. math::
1051
1052 dst = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}
1053
1054
1055 ps_2_x
1056 ^^^^^^^^^^^^
1057
1058 XXX wait what
1059
1060 .. opcode:: CALLNZ - Subroutine Call If Not Zero
1061
1062 TBD
1063
1064
1065 .. opcode:: IFC - If
1066
1067 TBD
1068
1069
1070 .. opcode:: BREAKC - Break Conditional
1071
1072 TBD
1073
1074 .. _doubleopcodes:
1075
1076 Double ISA
1077 ^^^^^^^^^^^^^^^
1078
1079 The double-precision opcodes reinterpret four-component vectors into
1080 two-component vectors with doubled precision in each component.
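
One way to picture this reinterpretation is the Python sketch below, which
stores two 64-bit doubles in a register of four 32-bit words. The
little-endian layout, with the low word of each double in the first
component of its pair, is an assumption made purely for illustration; this
document does not specify the bit layout. All function names are
hypothetical.

```python
import struct


def pack_doubles(d0, d1):
    """Store two doubles as four 32-bit words: (x, y) hold d0, (z, w) hold d1."""
    return list(struct.unpack("<4I", struct.pack("<2d", d0, d1)))


def unpack_doubles(reg):
    """Reinterpret four 32-bit words as the two doubles (xy, zw)."""
    return struct.unpack("<2d", struct.pack("<4I", *reg))


def dadd(src0, src1):
    """Model of DADD: add the xy pairs and the zw pairs as doubles."""
    a = unpack_doubles(src0)
    b = unpack_doubles(src1)
    return pack_doubles(a[0] + b[0], a[1] + b[1])


total = dadd(pack_doubles(1.5, 2.0), pack_doubles(0.25, -2.0))
print(unpack_doubles(total))  # (1.75, 0.0)
```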
1081
1082 Support for these opcodes is XXX undecided. :T
1083
1084 .. opcode:: DADD - Add
1085
1086 .. math::
1087
1088 dst.xy = src0.xy + src1.xy
1089
1090 dst.zw = src0.zw + src1.zw
1091
1092
1093 .. opcode:: DDIV - Divide
1094
1095 .. math::
1096
1097 dst.xy = src0.xy / src1.xy
1098
1099 dst.zw = src0.zw / src1.zw
1100
1101 .. opcode:: DSEQ - Set on Equal
1102
1103 .. math::
1104
1105 dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F
1106
1107 dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F
1108
1109 .. opcode:: DSLT - Set on Less than
1110
1111 .. math::
1112
1113 dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F
1114
1115 dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F
1116
1117 .. opcode:: DFRAC - Fraction
1118
1119 .. math::
1120
1121 dst.xy = src.xy - \lfloor src.xy\rfloor
1122
1123 dst.zw = src.zw - \lfloor src.zw\rfloor
1124
1125
1126 .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
1127
1128 Like the ``frexp()`` routine in many math libraries, this opcode stores the
1129 exponent of its source to ``dst0``, and the significand to ``dst1``, such that
1130 :math:`dst1 \times 2^{dst0} = src` .
1131
1132 .. math::
1133
1134 dst0.xy = exp(src.xy)
1135
1136 dst1.xy = frac(src.xy)
1137
1138 dst0.zw = exp(src.zw)
1139
1140 dst1.zw = frac(src.zw)
1141
1142 .. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1143
1144 This opcode is the inverse of :opcode:`DFRACEXP`.
1145
1146 .. math::
1147
1148 dst.xy = src0.xy \times 2^{src1.xy}
1149
1150 dst.zw = src0.zw \times 2^{src1.zw}
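
The relationship between :opcode:`DFRACEXP` and :opcode:`DLDEXP` can be
sketched with Python's ``math.frexp`` and ``math.ldexp`` on a single double.
The wrapper names are hypothetical, and the significand range of
:math:`[0.5, 1)` is the convention of the C and Python library routines; it
is assumed here rather than stated by this document.

```python
import math


def dfracexp(src):
    """Return (dst0, dst1) = (exponent, significand) for one double."""
    significand, exponent = math.frexp(src)
    return exponent, significand


def dldexp(src0, src1):
    """Return src0 * 2**src1, the inverse of dfracexp."""
    return math.ldexp(src0, src1)


exponent, significand = dfracexp(10.0)
print(exponent, significand)          # 4 0.625
print(dldexp(significand, exponent))  # 10.0
```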
1151
1152 .. opcode:: DMIN - Minimum
1153
1154 .. math::
1155
1156 dst.xy = min(src0.xy, src1.xy)
1157
1158 dst.zw = min(src0.zw, src1.zw)
1159
1160 .. opcode:: DMAX - Maximum
1161
1162 .. math::
1163
1164 dst.xy = max(src0.xy, src1.xy)
1165
1166 dst.zw = max(src0.zw, src1.zw)
1167
1168 .. opcode:: DMUL - Multiply
1169
1170 .. math::
1171
1172 dst.xy = src0.xy \times src1.xy
1173
1174 dst.zw = src0.zw \times src1.zw
1175
1176
1177 .. opcode:: DMAD - Multiply And Add
1178
1179 .. math::
1180
1181 dst.xy = src0.xy \times src1.xy + src2.xy
1182
1183 dst.zw = src0.zw \times src1.zw + src2.zw
1184
1185
1186 .. opcode:: DRCP - Reciprocal
1187
1188 .. math::
1189
1190 dst.xy = \frac{1}{src.xy}
1191
1192 dst.zw = \frac{1}{src.zw}
1193
1194 .. opcode:: DSQRT - Square Root
1195
1196 .. math::
1197
1198 dst.xy = \sqrt{src.xy}
1199
1200 dst.zw = \sqrt{src.zw}
1201
1202
1203 Explanation of symbols used
1204 ------------------------------
1205
1206
1207 Functions
1208 ^^^^^^^^^^^^^^
1209
1210
1211 :math:`|x|` Absolute value of `x`.
1212
1213 :math:`\lceil x \rceil` Ceiling of `x`.
1214
1215 clamp(x,y,z) Clamp x between y and z.
1216 (x < y) ? y : (x > z) ? z : x
1217
1218 :math:`\lfloor x\rfloor` Floor of `x`.
1219
1220 :math:`\log_2{x}` Logarithm of `x`, base 2.
1221
1222 max(x,y) Maximum of x and y.
1223 (x > y) ? x : y
1224
1225 min(x,y) Minimum of x and y.
1226 (x < y) ? x : y
1227
1228 partialx(x) Derivative of x relative to fragment's X.
1229
1230 partialy(x) Derivative of x relative to fragment's Y.
1231
1232 pop() Pop from stack.
1233
1234 :math:`x^y` `x` to the power `y`.
1235
1236 push(x) Push x on stack.
1237
1238 round(x) Round x.
1239
1240 trunc(x) Truncate x, i.e. drop the fraction bits.
1241
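The ternary definitions above translate directly into Python. This is a
sketch with hypothetical names (``tgsi_min`` and ``tgsi_max`` avoid
shadowing the built-ins); it also contrasts ``trunc`` with the floor
function, which differ for negative inputs.

```python
import math


def clamp(x, y, z):
    """(x < y) ? y : (x > z) ? z : x"""
    return y if x < y else z if x > z else x


def tgsi_max(x, y):
    """(x > y) ? x : y"""
    return x if x > y else y


def tgsi_min(x, y):
    """(x < y) ? x : y"""
    return x if x < y else y


def trunc(x):
    """Drop the fraction bits, rounding toward zero."""
    return float(math.trunc(x))


print(clamp(1.5, 0.0, 1.0))           # 1.0
print(trunc(-1.7), math.floor(-1.7))  # -1.0 -2
```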
1242
1243 Keywords
1244 ^^^^^^^^^^^^^
1245
1246
1247 discard Discard fragment.
1248
1249 pc Program counter.
1250
1251 target Label of target instruction.
1252
1253
1254 Other tokens
1255 ---------------
1256
1257
1258 Declaration
1259 ^^^^^^^^^^^
1260
1261
1262 Declares a register that will be referenced as an operand in Instruction
1263 tokens.
1264
1265 File field contains register file that is being declared and is one
1266 of TGSI_FILE.
1267
1268 UsageMask field specifies which of the register components can be accessed
1269 and is one of TGSI_WRITEMASK.
1270
1271 Interpolate field is only valid for fragment shader INPUT register files.
1272 It specifies the way input is being interpolated by the rasteriser and is one
1273 of TGSI_INTERPOLATE.
1274
1275 If Dimension flag is set to 1, a Declaration Dimension token follows.
1276
1277 If Semantic flag is set to 1, a Declaration Semantic token follows.
1278
1279 CylindricalWrap bitfield is only valid for fragment shader INPUT register
1280 files. It specifies which register components should be subject to cylindrical
1281 wrapping when interpolating by the rasteriser. If TGSI_CYLINDRICAL_WRAP_X
1282 is set to 1, the X component should be interpolated according to cylindrical
1283 wrapping rules.
1284
1285
1286 Declaration Semantic
1287 ^^^^^^^^^^^^^^^^^^^^^^^^
1288
1289
1290 Follows Declaration token if Semantic bit is set.
1291
1292 Since its purpose is to link a shader with other stages of the pipeline,
1293 it is valid to follow only those Declaration tokens that declare a register
1294 either in INPUT or OUTPUT file.
1295
1296 SemanticName field contains the semantic name of the register being declared.
1297 There is no default value.
1298
1299 SemanticIndex is an optional subscript that can be used to distinguish
1300 different register declarations with the same semantic name. The default value
1301 is 0.
1302
1303 The meanings of the individual semantic names are explained in the following
1304 sections.
1305
1306 TGSI_SEMANTIC_POSITION
1307 """"""""""""""""""""""
1308
1309 For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
1310 output register which contains the homogeneous vertex position in the clip
1311 space coordinate system. After clipping, the X, Y and Z components of the
1312 vertex will be divided by the W value to get normalized device coordinates.
1313
1314 For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
1315 fragment shader input contains the fragment's window position. The X
1316 component starts at zero and always increases from left to right.
1317 The Y component starts at zero and always increases, but Y=0 may indicate
1318 either the top or the bottom of the window, depending on the fragment
1319 coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
1320 The Z coordinate ranges from 0 to 1 to represent depth from the front
1321 to the back of the Z buffer. The W component contains the reciprocal
1322 of the interpolated vertex position W component.
1323
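The arithmetic described for both shader stages can be sketched in Python.
The function names are hypothetical, and the viewport transform that maps
normalized device coordinates to window coordinates is deliberately left
out.

```python
def clip_to_ndc(clip):
    """Divide X, Y and Z by W to get normalized device coordinates."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def fragment_position_w(interpolated_w):
    """W seen by the fragment shader: reciprocal of the interpolated W."""
    return 1.0 / interpolated_w


print(clip_to_ndc((2.0, -4.0, 1.0, 2.0)))  # (1.0, -2.0, 0.5)
print(fragment_position_w(4.0))            # 0.25
```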
1324
1325
1326 TGSI_SEMANTIC_COLOR
1327 """""""""""""""""""
1328
1329 For vertex shader outputs or fragment shader inputs/outputs, this
1330 label indicates that the register contains an R,G,B,A color.
1331
1332 Several shader inputs/outputs may contain colors so the semantic index
1333 is used to distinguish them. For example, color[0] may be the diffuse
1334 color while color[1] may be the specular color.
1335
1336 This label is needed so that the flat/smooth shading can be applied
1337 to the right interpolants during rasterization.
1338
1339
1340
1341 TGSI_SEMANTIC_BCOLOR
1342 """"""""""""""""""""
1343
1344 Back-facing colors are only used for back-facing polygons, and are only valid
1345 in vertex shader outputs. After rasterization, all polygons are front-facing
1346 and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
1347 so all BCOLORs effectively become regular COLORs in the fragment shader.
1348
1349
1350 TGSI_SEMANTIC_FOG
1351 """""""""""""""""
1352
1353 The fog coordinate historically has been used to replace the depth coordinate
1354 for generation of fog in dedicated fog blocks. Gallium, however, does not use
1355 dedicated fog acceleration, placing it entirely in the fragment shader
1356 instead.
1357
1358 The fog coordinate should be written in ``(f, 0, 0, 1)`` format. Only the first
1359 component matters when writing from the vertex shader; the driver will ensure
1360 that the coordinate is in this format when used as a fragment shader input.
1361
1362 TGSI_SEMANTIC_PSIZE
1363 """""""""""""""""""
1364
1365 PSIZE, or point size, is used to specify point sizes per-vertex. It should
1366 be in ``(s, 0, 0, 1)`` format, where ``s`` is the (possibly clamped) point size.
1367 Only the first component matters when writing from the vertex shader.
1368
1369 When using this semantic, be sure to set the appropriate state in the
1370 :ref:`rasterizer` first.
1371
1372 TGSI_SEMANTIC_GENERIC
1373 """""""""""""""""""""
1374
1375 Generic semantics are nearly always used for texture coordinate attributes,
1376 in ``(s, t, r, q)`` format. ``t`` and ``r`` may be unused for certain kinds
1377 of lookups, and ``q`` is the level-of-detail bias for biased sampling.
1378
1379 These attributes are called "generic" because they may be used for anything
1380 else, including parameters, texture generation information, or anything that
1381 can be stored inside a four-component vector.
1382
1383 TGSI_SEMANTIC_NORMAL
1384 """"""""""""""""""""
1385
1386 Vertex normal; could be used to implement per-pixel lighting for legacy APIs
1387 that allow mixing fixed-function and programmable stages.
1388
1389 TGSI_SEMANTIC_FACE
1390 """"""""""""""""""
1391
1392 FACE is the facing bit, which stores facing information for the fragment
1393 shader. ``(f, 0, 0, 1)`` is the format. The first component will be positive
1394 when the fragment is front-facing, and negative when the fragment is
1395 back-facing.
1396
1397 TGSI_SEMANTIC_EDGEFLAG
1398 """"""""""""""""""""""
1399
1400 For vertex shaders, this semantic label indicates that an input or
1401 output is a boolean edge flag. The register layout is [F, x, x, x]
1402 where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader
1403 simply copies the edge flag input to the edgeflag output.
1404
1405 Edge flags are used to control which lines or points are actually
1406 drawn when the polygon mode converts triangles/quads/polygons into
1407 points or lines.
1408
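To make the effect of edge flags concrete, here is a Python sketch of
selecting the edges to draw when a polygon is rendered as lines. It assumes
the OpenGL convention that a vertex's flag governs the edge starting at that
vertex; this document does not spell out that rule, and the function name is
hypothetical.

```python
def edges_to_draw(vertices, edge_flags):
    """Return the polygon edges whose starting vertex has a nonzero flag."""
    count = len(vertices)
    return [
        (vertices[i], vertices[(i + 1) % count])
        for i in range(count)
        if edge_flags[i] != 0.0
    ]


# The edge from "b" to "c" is interior and is therefore skipped.
print(edges_to_draw(["a", "b", "c"], [1.0, 0.0, 1.0]))
# [('a', 'b'), ('c', 'a')]
```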
1409
1410
1411 Properties
1412 ^^^^^^^^^^^^^^^^^^^^^^^^
1413
1414
1415 Properties are general directives that apply to the whole TGSI program.
1416
1417 FS_COORD_ORIGIN
1418 """""""""""""""
1419
1420 Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
1421 The default value is UPPER_LEFT.
1422
1423 If UPPER_LEFT, the position will be (0,0) at the upper left corner and
1424 increase downward and rightward.
1425 If LOWER_LEFT, the position will be (0,0) at the lower left corner and
1426 increase upward and rightward.
1427
1428 OpenGL defaults to LOWER_LEFT, and is configurable with the
1429 GL_ARB_fragment_coord_conventions extension.
1430
1431 DirectX 9/10 use UPPER_LEFT.
1432
1433 FS_COORD_PIXEL_CENTER
1434 """""""""""""""""""""
1435
1436 Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
1437 The default value is HALF_INTEGER.
1438
1439 If HALF_INTEGER, the fractional part of the position will be 0.5.
1440 If INTEGER, the fractional part of the position will be 0.0.
1441
1442 Note that this does not affect the set of fragments generated by
1443 rasterization, which is instead controlled by gl_rasterization_rules in the
1444 rasterizer.
1445
1446 OpenGL defaults to HALF_INTEGER, and is configurable with the
1447 GL_ARB_fragment_coord_conventions extension.
1448
1449 DirectX 9 uses INTEGER.
1450 DirectX 10 uses HALF_INTEGER.
1451
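The combined effect of the two properties on the position seen by a fragment
shader can be sketched as follows. The function and its parameters (pixel
column and row counted from the upper-left corner, and the window
``height``) are hypothetical names chosen for this example.

```python
def fragment_position(column, row, height,
                      origin="UPPER_LEFT", pixel_center="HALF_INTEGER"):
    """Return the (x, y) position for a pixel under the given conventions."""
    offset = 0.5 if pixel_center == "HALF_INTEGER" else 0.0
    # With LOWER_LEFT, row 0 is at the bottom, so flip the row index.
    y_index = row if origin == "UPPER_LEFT" else height - 1 - row
    return (column + offset, y_index + offset)


print(fragment_position(0, 0, 480))                        # (0.5, 0.5)
print(fragment_position(0, 0, 480, origin="LOWER_LEFT"))   # (0.5, 479.5)
print(fragment_position(3, 2, 480, pixel_center="INTEGER"))  # (3.0, 2.0)
```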
1452
1453
1454 Texture Sampling and Texture Formats
1455 ------------------------------------
1456
1457 This table shows how texture image components are returned as (x,y,z,w) tuples
1458 by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
1459 :opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
1460 well.
1461
1462 +--------------------+--------------+--------------------+--------------+
1463 | Texture Components | Gallium | OpenGL | Direct3D 9 |
1464 +====================+==============+====================+==============+
1465 | R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) |
1466 +--------------------+--------------+--------------------+--------------+
1467 | RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) |
1468 +--------------------+--------------+--------------------+--------------+
1469 | RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) |
1470 +--------------------+--------------+--------------------+--------------+
1471 | RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) |
1472 +--------------------+--------------+--------------------+--------------+
1473 | A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) |
1474 +--------------------+--------------+--------------------+--------------+
1475 | L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) |
1476 +--------------------+--------------+--------------------+--------------+
1477 | LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) |
1478 +--------------------+--------------+--------------------+--------------+
1479 | I | (i, i, i, i) | (i, i, i, i) | N/A |
1480 +--------------------+--------------+--------------------+--------------+
1481 | UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) |
1482 | | | [#envmap-bumpmap]_ | |
1483 +--------------------+--------------+--------------------+--------------+
1484 | Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) |
1485 | | | [#depth-tex-mode]_ | |
1486 +--------------------+--------------+--------------------+--------------+
1487
1488 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
1489 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
1490 or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
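
The Gallium column of the table can be expressed as a small lookup, sketched
here in Python. The dictionary and function names are hypothetical; the UV
and Z rows are omitted because the table marks them as TBD.

```python
# Each entry lists, per output component, either a source channel name
# or a constant, following the Gallium column of the table.
GALLIUM_LAYOUT = {
    "R": ("r", 0, 0, 1),
    "RG": ("r", "g", 0, 1),
    "RGB": ("r", "g", "b", 1),
    "RGBA": ("r", "g", "b", "a"),
    "A": (0, 0, 0, "a"),
    "L": ("l", "l", "l", 1),
    "LA": ("l", "l", "l", "a"),
    "I": ("i", "i", "i", "i"),
}


def expand(components, texel):
    """Build the (x, y, z, w) tuple returned for a texel of the given kind."""
    return tuple(
        texel[entry] if isinstance(entry, str) else float(entry)
        for entry in GALLIUM_LAYOUT[components]
    )


print(expand("R", {"r": 0.3}))              # (0.3, 0.0, 0.0, 1.0)
print(expand("LA", {"l": 0.2, "a": 0.5}))   # (0.2, 0.2, 0.2, 0.5)
```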