src/gallium/docs/source/tgsi.rst

   1 TGSI
   2 ====
   3
   4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
   5 for describing shaders. Since Gallium is inherently shaderful, shaders are
   6 an important part of the API. TGSI is the only intermediate representation
   7 used by all drivers.
   8
   9 Basics
  10 ------
  11
  12 All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
  13 floating-point four-component vectors. An opcode may have up to one
  14 destination register, known as *dst*, and between zero and three source
  15 registers, called *src0* through *src2*, or simply *src* if there is only
  16 one.
  17
  18 Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
  19 components as integers. Other instructions permit using registers as
  20 two-component vectors with double precision; see :ref:`Double Opcodes`.
  21
  22 When an instruction has a scalar result, the result is usually copied into
  23 each of the components of *dst*. When this happens, the result is said to be
  24 *replicated* to *dst*. :opcode:`RCP` is one such instruction.
  25
  26 Modifiers
  27 ^^^^^^^^^^^^^^^
  28
  29 TGSI supports modifiers on inputs, as well as a saturate modifier on instructions.
  30
  31 For inputs which have a floating-point type, both absolute value and negation
  32 modifiers are supported, with absolute value being applied first.
  33 :opcode:`MOV` is considered to have a float input type for applying modifiers.
  34
  35 For inputs which have signed or unsigned type only the negate modifier is
  36 supported.
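
The ordering of the float modifiers can be illustrated with a small Python sketch. The function name ``apply_float_modifiers`` and its boolean parameters are assumptions made for this example only; they are not TGSI or Gallium identifiers.

```python
def apply_float_modifiers(value, absolute=False, negate=False):
    """Apply source modifiers to one float component.

    Absolute value is applied first, then negation, as described above.
    """
    if absolute:
        value = abs(value)
    if negate:
        value = -value
    return value


# With both modifiers set, the result is -|value| whatever the input sign.
print(apply_float_modifiers(3.0, absolute=True, negate=True))
print(apply_float_modifiers(-3.0, absolute=True, negate=True))
```

Applying the modifiers in the opposite order would instead always yield a non-negative value, which is why the order matters.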
  37
  38 Instruction Set
  39 ---------------
  40
  41 Core ISA
  42 ^^^^^^^^^^^^^^^^^^^^^^^^^
  43
  44 These opcodes are guaranteed to be available regardless of the driver being
  45 used.
  46
  47 .. opcode:: ARL - Address Register Load
  48
  49 .. math::
  50
  51   dst.x = \lfloor src.x\rfloor
  52
  53   dst.y = \lfloor src.y\rfloor
  54
  55   dst.z = \lfloor src.z\rfloor
  56
  57   dst.w = \lfloor src.w\rfloor
  58
  59
  60 .. opcode:: MOV - Move
  61
  62 .. math::
  63
  64   dst.x = src.x
  65
  66   dst.y = src.y
  67
  68   dst.z = src.z
  69
  70   dst.w = src.w
  71
  72
  73 .. opcode:: LIT - Light Coefficients
  74
  75 .. math::
  76
  77   dst.x = 1
  78
  79   dst.y = max(src.x, 0)
  80
  81   dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0
  82
  83   dst.w = 1
  84
  85
  86 .. opcode:: RCP - Reciprocal
  87
  88 This instruction replicates its result.
  89
  90 .. math::
  91
  92   dst = \frac{1}{src.x}
  93
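As a rough illustration of replication, the following Python sketch models RCP on a four-component tuple. The function name ``rcp`` and the tuple representation of registers are assumptions made for this example.

```python
def rcp(src):
    """Model of RCP: compute 1 / src.x and replicate it to all components."""
    x = src[0]                  # only the x component of the source is read
    result = 1.0 / x            # raises ZeroDivisionError if x is zero
    return (result, result, result, result)   # replicated to dst.xyzw


print(rcp((4.0, 9.0, 16.0, 25.0)))
```

The y, z and w components of the source have no influence on the result.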
  94
  95 .. opcode:: RSQ - Reciprocal Square Root
  96
  97 This instruction replicates its result.
  98
  99 .. math::
 100
 101   dst = \frac{1}{\sqrt{|src.x|}}
 102
 103
 104 .. opcode:: SQRT - Square Root
 105
 106 This instruction replicates its result.
 107
 108 .. math::
 109
 110   dst = {\sqrt{src.x}}
 111
 112
 113 .. opcode:: EXP - Approximate Exponential Base 2
 114
 115 .. math::
 116
 117   dst.x = 2^{\lfloor src.x\rfloor}
 118
 119   dst.y = src.x - \lfloor src.x\rfloor
 120
 121   dst.z = 2^{src.x}
 122
 123   dst.w = 1
 124
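The four outputs of EXP can be sketched in Python as follows. This is only a model of the formulas above, using double-precision floats; the function name ``exp_approx`` is an assumption for this example.

```python
import math


def exp_approx(src):
    """Model of EXP; only src.x is read."""
    x = src[0]
    floor_x = math.floor(x)
    return (
        2.0 ** floor_x,   # dst.x: 2 raised to the integer part of src.x
        x - floor_x,      # dst.y: fractional part of src.x, in [0, 1)
        2.0 ** x,         # dst.z: the full result 2^src.x
        1.0,              # dst.w: always 1
    )


print(exp_approx((2.5, 0.0, 0.0, 0.0)))
```

In this model, dst.x multiplied by 2 raised to dst.y gives dst.z up to rounding.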
 125
 126 .. opcode:: LOG - Approximate Logarithm Base 2
 127
 128 .. math::
 129
 130   dst.x = \lfloor\log_2{|src.x|}\rfloor
 131
 132   dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
 133
 134   dst.z = \log_2{|src.x|}
 135
 136   dst.w = 1
 137
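A matching Python sketch for LOG is given below. It follows the formulas above literally and assumes a non-zero src.x; the function name ``log_approx`` is an assumption for this example.

```python
import math


def log_approx(src):
    """Model of LOG; only the absolute value of src.x is used."""
    a = abs(src[0])                       # must be non-zero
    exponent = math.floor(math.log2(a))
    return (
        float(exponent),      # dst.x: integer part of log2|src.x|
        a / 2.0 ** exponent,  # dst.y: |src.x| scaled into [1, 2)
        math.log2(a),         # dst.z: the full result log2|src.x|
        1.0,                  # dst.w: always 1
    )


print(log_approx((-10.0, 0.0, 0.0, 0.0)))
```

Because of the absolute value, a negative src.x gives the same result as its positive counterpart.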
 138
 139 .. opcode:: MUL - Multiply
 140
 141 .. math::
 142
 143   dst.x = src0.x \times src1.x
 144
 145   dst.y = src0.y \times src1.y
 146
 147   dst.z = src0.z \times src1.z
 148
 149   dst.w = src0.w \times src1.w
 150
 151
 152 .. opcode:: ADD - Add
 153
 154 .. math::
 155
 156   dst.x = src0.x + src1.x
 157
 158   dst.y = src0.y + src1.y
 159
 160   dst.z = src0.z + src1.z
 161
 162   dst.w = src0.w + src1.w
 163
 164
 165 .. opcode:: DP3 - 3-component Dot Product
 166
 167 This instruction replicates its result.
 168
 169 .. math::
 170
 171   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
 172
 173
 174 .. opcode:: DP4 - 4-component Dot Product
 175
 176 This instruction replicates its result.
 177
 178 .. math::
 179
 180   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
 181
 182
 183 .. opcode:: DST - Distance Vector
 184
 185 .. math::
 186
 187   dst.x = 1
 188
 189   dst.y = src0.y \times src1.y
 190
 191   dst.z = src0.z
 192
 193   dst.w = src1.w
 194
 195
 196 .. opcode:: MIN - Minimum
 197
 198 .. math::
 199
 200   dst.x = min(src0.x, src1.x)
 201
 202   dst.y = min(src0.y, src1.y)
 203
 204   dst.z = min(src0.z, src1.z)
 205
 206   dst.w = min(src0.w, src1.w)
 207
 208
 209 .. opcode:: MAX - Maximum
 210
 211 .. math::
 212
 213   dst.x = max(src0.x, src1.x)
 214
 215   dst.y = max(src0.y, src1.y)
 216
 217   dst.z = max(src0.z, src1.z)
 218
 219   dst.w = max(src0.w, src1.w)
 220
 221
 222 .. opcode:: SLT - Set On Less Than
 223
 224 .. math::
 225
 226   dst.x = (src0.x < src1.x) ? 1 : 0
 227
 228   dst.y = (src0.y < src1.y) ? 1 : 0
 229
 230   dst.z = (src0.z < src1.z) ? 1 : 0
 231
 232   dst.w = (src0.w < src1.w) ? 1 : 0
 233
 234
 235 .. opcode:: SGE - Set On Greater Equal Than
 236
 237 .. math::
 238
 239   dst.x = (src0.x >= src1.x) ? 1 : 0
 240
 241   dst.y = (src0.y >= src1.y) ? 1 : 0
 242
 243   dst.z = (src0.z >= src1.z) ? 1 : 0
 244
 245   dst.w = (src0.w >= src1.w) ? 1 : 0
 246
 247
 248 .. opcode:: MAD - Multiply And Add
 249
 250 .. math::
 251
 252   dst.x = src0.x \times src1.x + src2.x
 253
 254   dst.y = src0.y \times src1.y + src2.y
 255
 256   dst.z = src0.z \times src1.z + src2.z
 257
 258   dst.w = src0.w \times src1.w + src2.w
 259
 260
 261 .. opcode:: SUB - Subtract
 262
 263 .. math::
 264
 265   dst.x = src0.x - src1.x
 266
 267   dst.y = src0.y - src1.y
 268
 269   dst.z = src0.z - src1.z
 270
 271   dst.w = src0.w - src1.w
 272
 273
 274 .. opcode:: LRP - Linear Interpolate
 275
 276 .. math::
 277
 278   dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
 279
 280   dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
 281
 282   dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
 283
 284   dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
 285
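The role of each LRP operand is easy to mix up, so here is a minimal Python sketch of the formula above. The function name ``lrp`` and the use of tuples for registers are assumptions for this example.

```python
def lrp(src0, src1, src2):
    """Model of LRP: src0 is the per-component weight given to src1."""
    return tuple(
        w * a + (1.0 - w) * b
        for w, a, b in zip(src0, src1, src2)
    )


weights = (0.0, 1.0, 0.5, 0.25)
print(lrp(weights, (10.0, 10.0, 10.0, 10.0), (20.0, 20.0, 20.0, 20.0)))
```

A weight of 1 selects src1 and a weight of 0 selects src2.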
 286
 287 .. opcode:: CND - Condition
 288
 289 .. math::
 290
 291   dst.x = (src2.x > 0.5) ? src0.x : src1.x
 292
 293   dst.y = (src2.y > 0.5) ? src0.y : src1.y
 294
 295   dst.z = (src2.z > 0.5) ? src0.z : src1.z
 296
 297   dst.w = (src2.w > 0.5) ? src0.w : src1.w
 298
 299
 300 .. opcode:: DP2A - 2-component Dot Product And Add
 301
 302 .. math::
 303
 304   dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
 305
 306   dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
 307
 308   dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
 309
 310   dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
 311
 312
 313 .. opcode:: FRC - Fraction
 314
 315 .. math::
 316
 317   dst.x = src.x - \lfloor src.x\rfloor
 318
 319   dst.y = src.y - \lfloor src.y\rfloor
 320
 321   dst.z = src.z - \lfloor src.z\rfloor
 322
 323   dst.w = src.w - \lfloor src.w\rfloor
 324
 325
 326 .. opcode:: CLAMP - Clamp
 327
 328 .. math::
 329
 330   dst.x = clamp(src0.x, src1.x, src2.x)
 331
 332   dst.y = clamp(src0.y, src1.y, src2.y)
 333
 334   dst.z = clamp(src0.z, src1.z, src2.z)
 335
 336   dst.w = clamp(src0.w, src1.w, src2.w)
 337
 338
 339 .. opcode:: FLR - Floor
 340
 341 This is identical to :opcode:`ARL`.
 342
 343 .. math::
 344
 345   dst.x = \lfloor src.x\rfloor
 346
 347   dst.y = \lfloor src.y\rfloor
 348
 349   dst.z = \lfloor src.z\rfloor
 350
 351   dst.w = \lfloor src.w\rfloor
 352
 353
 354 .. opcode:: ROUND - Round
 355
 356 .. math::
 357
 358   dst.x = round(src.x)
 359
 360   dst.y = round(src.y)
 361
 362   dst.z = round(src.z)
 363
 364   dst.w = round(src.w)
 365
 366
 367 .. opcode:: EX2 - Exponential Base 2
 368
 369 This instruction replicates its result.
 370
 371 .. math::
 372
 373   dst = 2^{src.x}
 374
 375
 376 .. opcode:: LG2 - Logarithm Base 2
 377
 378 This instruction replicates its result.
 379
 380 .. math::
 381
 382   dst = \log_2{src.x}
 383
 384
 385 .. opcode:: POW - Power
 386
 387 This instruction replicates its result.
 388
 389 .. math::
 390
 391   dst = src0.x^{src1.x}
 392
 393 .. opcode:: XPD - Cross Product
 394
 395 .. math::
 396
 397   dst.x = src0.y \times src1.z - src1.y \times src0.z
 398
 399   dst.y = src0.z \times src1.x - src1.z \times src0.x
 400
 401   dst.z = src0.x \times src1.y - src1.x \times src0.y
 402
 403   dst.w = 1
 404
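The XPD formulas can be checked with a short Python sketch. The function name ``xpd`` is an assumption for this example; the w components of both sources are ignored, as in the formulas above.

```python
def xpd(src0, src1):
    """Model of XPD: cross product of the xyz parts, with dst.w set to 1."""
    x0, y0, z0 = src0[0], src0[1], src0[2]
    x1, y1, z1 = src1[0], src1[1], src1[2]
    return (
        y0 * z1 - y1 * z0,   # dst.x
        z0 * x1 - z1 * x0,   # dst.y
        x0 * y1 - x1 * y0,   # dst.z
        1.0,                 # dst.w
    )


# The x axis crossed with the y axis gives the z axis.
print(xpd((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)))
```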
 405
 406 .. opcode:: ABS - Absolute
 407
 408 .. math::
 409
 410   dst.x = |src.x|
 411
 412   dst.y = |src.y|
 413
 414   dst.z = |src.z|
 415
 416   dst.w = |src.w|
 417
 418
 419 .. opcode:: RCC - Reciprocal Clamped
 420
 421 This instruction replicates its result.
 422
 423 XXX cleanup on aisle three
 424
 425 .. math::
 426
 427   dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
 428
 429
 430 .. opcode:: DPH - Homogeneous Dot Product
 431
 432 This instruction replicates its result.
 433
 434 .. math::
 435
 436   dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
 437
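DPH differs from DP4 only in that src0.w is never read. A minimal Python sketch, with the hypothetical function name ``dph``:

```python
def dph(src0, src1):
    """Model of DPH: 3-component dot product plus src1.w, replicated."""
    result = (
        src0[0] * src1[0]
        + src0[1] * src1[1]
        + src0[2] * src1[2]
        + src1[3]            # src0.w is not read; it acts as if it were 1
    )
    return (result, result, result, result)


print(dph((1.0, 2.0, 3.0, 99.0), (4.0, 5.0, 6.0, 7.0)))
```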
 438
 439 .. opcode:: COS - Cosine
 440
 441 This instruction replicates its result.
 442
 443 .. math::
 444
 445   dst = \cos{src.x}
 446
 447
 448 .. opcode:: DDX - Derivative Relative To X
 449
 450 .. math::
 451
 452   dst.x = partialx(src.x)
 453
 454   dst.y = partialx(src.y)
 455
 456   dst.z = partialx(src.z)
 457
 458   dst.w = partialx(src.w)
 459
 460
 461 .. opcode:: DDY - Derivative Relative To Y
 462
 463 .. math::
 464
 465   dst.x = partialy(src.x)
 466
 467   dst.y = partialy(src.y)
 468
 469   dst.z = partialy(src.z)
 470
 471   dst.w = partialy(src.w)
 472
 473
 474 .. opcode:: KILP - Predicated Discard
 475
 476   Not really predicated, just unconditional discard
 477
 478
 479 .. opcode:: PK2H - Pack Two 16-bit Floats
 480
 481   TBD
 482
 483
 484 .. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
 485
 486   TBD
 487
 488
 489 .. opcode:: PK4B - Pack Four Signed 8-bit Scalars
 490
 491   TBD
 492
 493
 494 .. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
 495
 496   TBD
 497
 498
 499 .. opcode:: RFL - Reflection Vector
 500
 501 .. math::
 502
 503   dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
 504
 505   dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
 506
 507   dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
 508
 509   dst.w = 1
 510
 511 .. note::
 512
 513    Considered for removal.
 514
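The long RFL expressions share a common scale factor, which the following Python sketch makes explicit. It assumes src0 has a non-zero xyz part; the function name ``rfl`` is an assumption for this example.

```python
def rfl(src0, src1):
    """Model of RFL: reflect src1 about the axis given by src0."""
    dot01 = sum(a * b for a, b in zip(src0[:3], src1[:3]))
    dot00 = sum(a * a for a in src0[:3])    # must be non-zero
    scale = 2.0 * dot01 / dot00             # shared by x, y and z
    return (
        scale * src0[0] - src1[0],
        scale * src0[1] - src1[1],
        scale * src0[2] - src1[2],
        1.0,
    )


# Reflecting (1, 1, 0) about the y axis flips the x component.
print(rfl((0.0, 1.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)))
```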
 515
 516 .. opcode:: SEQ - Set On Equal
 517
 518 .. math::
 519
 520   dst.x = (src0.x == src1.x) ? 1 : 0
 521
 522   dst.y = (src0.y == src1.y) ? 1 : 0
 523
 524   dst.z = (src0.z == src1.z) ? 1 : 0
 525
 526   dst.w = (src0.w == src1.w) ? 1 : 0
 527
 528
 529 .. opcode:: SFL - Set On False
 530
 531 This instruction replicates its result.
 532
 533 .. math::
 534
 535   dst = 0
 536
 537 .. note::
 538
 539    Considered for removal.
 540
 541
 542 .. opcode:: SGT - Set On Greater Than
 543
 544 .. math::
 545
 546   dst.x = (src0.x > src1.x) ? 1 : 0
 547
 548   dst.y = (src0.y > src1.y) ? 1 : 0
 549
 550   dst.z = (src0.z > src1.z) ? 1 : 0
 551
 552   dst.w = (src0.w > src1.w) ? 1 : 0
 553
 554
 555 .. opcode:: SIN - Sine
 556
 557 This instruction replicates its result.
 558
 559 .. math::
 560
 561   dst = \sin{src.x}
 562
 563
 564 .. opcode:: SLE - Set On Less Equal Than
 565
 566 .. math::
 567
 568   dst.x = (src0.x <= src1.x) ? 1 : 0
 569
 570   dst.y = (src0.y <= src1.y) ? 1 : 0
 571
 572   dst.z = (src0.z <= src1.z) ? 1 : 0
 573
 574   dst.w = (src0.w <= src1.w) ? 1 : 0
 575
 576
 577 .. opcode:: SNE - Set On Not Equal
 578
 579 .. math::
 580
 581   dst.x = (src0.x != src1.x) ? 1 : 0
 582
 583   dst.y = (src0.y != src1.y) ? 1 : 0
 584
 585   dst.z = (src0.z != src1.z) ? 1 : 0
 586
 587   dst.w = (src0.w != src1.w) ? 1 : 0
 588
 589
 590 .. opcode:: STR - Set On True
 591
 592 This instruction replicates its result.
 593
 594 .. math::
 595
 596   dst = 1
 597
 598
 599 .. opcode:: TEX - Texture Lookup
 600
 601 .. math::
 602
 603   coord = src0
 604
 605   bias = 0.0
 606
 607   dst = texture_sample(unit, coord, bias)
 608
 609 For array textures, src0.y contains the slice for 1D arrays
 610 and src0.z contains the slice for 2D arrays.
 611 For shadow textures without arrays, src0.z contains
 612 the reference value.
 613 For shadow textures with arrays, src0.z contains
 614 the reference value for 1D arrays, and src0.w contains
 615 the reference value for 2D arrays.
 616 There is no way to pass a bias in the .w component for
 617 shadow arrays, and GLSL does not allow this.
 618 GLSL does allow cube shadow maps to take a bias value;
 619 how this will look in TGSI is still to be determined.
 620
 621 .. opcode:: TXD - Texture Lookup with Derivatives
 622
 623 .. math::
 624
 625   coord = src0
 626
 627   ddx = src1
 628
 629   ddy = src2
 630
 631   bias = 0.0
 632
 633   dst = texture_sample_deriv(unit, coord, bias, ddx, ddy)
 634
 635
 636 .. opcode:: TXP - Projective Texture Lookup
 637
 638 .. math::
 639
 640   coord.x = src0.x / src0.w
 641
 642   coord.y = src0.y / src0.w
 643
 644   coord.z = src0.z / src0.w
 645
 646   coord.w = src0.w
 647
 648   bias = 0.0
 649
 650   dst = texture_sample(unit, coord, bias)
 651
 652
 653 .. opcode:: UP2H - Unpack Two 16-Bit Floats
 654
 655   TBD
 656
 657 .. note::
 658
 659    Considered for removal.
 660
 661 .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
 662
 663   TBD
 664
 665 .. note::
 666
 667    Considered for removal.
 668
 669 .. opcode:: UP4B - Unpack Four Signed 8-Bit Values
 670
 671   TBD
 672
 673 .. note::
 674
 675    Considered for removal.
 676
 677 .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
 678
 679   TBD
 680
 681 .. note::
 682
 683    Considered for removal.
 684
 685 .. opcode:: X2D - 2D Coordinate Transformation
 686
 687 .. math::
 688
 689   dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y
 690
 691   dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w
 692
 693   dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y
 694
 695   dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
 696
 697 .. note::
 698
 699    Considered for removal.
 700
 701
 702 .. opcode:: ARA - Address Register Add
 703
 704   TBD
 705
 706 .. note::
 707
 708    Considered for removal.
 709
 710 .. opcode:: ARR - Address Register Load With Round
 711
 712 .. math::
 713
 714   dst.x = round(src.x)
 715
 716   dst.y = round(src.y)
 717
 718   dst.z = round(src.z)
 719
 720   dst.w = round(src.w)
 721
 722
 723 .. opcode:: SSG - Set Sign
 724
 725 .. math::
 726
 727   dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
 728
 729   dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
 730
 731   dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
 732
 733   dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
 734
 735
 736 .. opcode:: CMP - Compare
 737
 738 .. math::
 739
 740   dst.x = (src0.x < 0) ? src1.x : src2.x
 741
 742   dst.y = (src0.y < 0) ? src1.y : src2.y
 743
 744   dst.z = (src0.z < 0) ? src1.z : src2.z
 745
 746   dst.w = (src0.w < 0) ? src1.w : src2.w
 747
 748
 749 .. opcode:: KIL - Conditional Discard
 750
 751 .. math::
 752
 753   if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
 754     discard
 755   endif
 756
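The KIL condition tests all four components, not just one. A one-line Python sketch, with the hypothetical function name ``kil_discards``:

```python
def kil_discards(src):
    """Return True if KIL would discard for this source register."""
    return any(component < 0 for component in src)


print(kil_discards((1.0, 2.0, 3.0, -0.5)))
print(kil_discards((0.0, 0.0, 0.0, 0.0)))
```

A value of exactly zero does not trigger the discard, since the comparison is strict.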
 757
 758 .. opcode:: SCS - Sine Cosine
 759
 760 .. math::
 761
 762   dst.x = \cos{src.x}
 763
 764   dst.y = \sin{src.x}
 765
 766   dst.z = 0
 767
 768   dst.w = 1
 769
 770
 771 .. opcode:: TXB - Texture Lookup With Bias
 772
 773 .. math::
 774
 775   coord.x = src.x
 776
 777   coord.y = src.y
 778
 779   coord.z = src.z
 780
 781   coord.w = 1.0
 782
 783   bias = src.w
 784
 785   dst = texture_sample(unit, coord, bias)
 786
 787
 788 .. opcode:: NRM - 3-component Vector Normalise
 789
 790 .. math::
 791
 792   dst.x = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}
 793
 794   dst.y = \frac{src.y}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}
 795
 796   dst.z = \frac{src.z}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}}
 797
 798   dst.w = 1
 799
 800
 801 .. opcode:: DIV - Divide
 802
 803 .. math::
 804
 805   dst.x = \frac{src0.x}{src1.x}
 806
 807   dst.y = \frac{src0.y}{src1.y}
 808
 809   dst.z = \frac{src0.z}{src1.z}
 810
 811   dst.w = \frac{src0.w}{src1.w}
 812
 813
 814 .. opcode:: DP2 - 2-component Dot Product
 815
 816 This instruction replicates its result.
 817
 818 .. math::
 819
 820   dst = src0.x \times src1.x + src0.y \times src1.y
 821
 822
 823 .. opcode:: TXL - Texture Lookup With explicit LOD
 824
 825 .. math::
 826
 827   coord.x = src0.x
 828
 829   coord.y = src0.y
 830
 831   coord.z = src0.z
 832
 833   coord.w = 1.0
 834
 835   lod = src0.w
 836
 837   dst = texture_sample(unit, coord, lod)
 838
 839
 840 .. opcode:: PUSHA - Push Address Register On Stack
 841
 842   push(src.x)
 843   push(src.y)
 844   push(src.z)
 845   push(src.w)
 846
 847 .. note::
 848
 849    Considered for cleanup.
 850
 851 .. note::
 852
 853    Considered for removal.
 854
 855 .. opcode:: POPA - Pop Address Register From Stack
 856
 857   dst.w = pop()
 858   dst.z = pop()
 859   dst.y = pop()
 860   dst.x = pop()
 861
 862 .. note::
 863
 864    Considered for cleanup.
 865
 866 .. note::
 867
 868    Considered for removal.
 869
 870
 871 .. opcode:: BRA - Branch
 872
 873   pc = target
 874
 875 .. note::
 876
 877    Considered for removal.
 878
 879
 880 .. opcode:: CALLNZ - Subroutine Call If Not Zero
 881
 882    TBD
 883
 884 .. note::
 885
 886    Considered for cleanup.
 887
 888 .. note::
 889
 890    Considered for removal.
 891
 892
 893 Compute ISA
 894 ^^^^^^^^^^^^^^^^^^^^^^^^
 895
 896 These opcodes are primarily provided for special-use computational shaders.
 897 Support for these opcodes is indicated by a special pipe capability bit (TBD).
 898
 899 XXX doesn't look like most of the opcodes really belong here.
 900
 901 .. opcode:: CEIL - Ceiling
 902
 903 .. math::
 904
 905   dst.x = \lceil src.x\rceil
 906
 907   dst.y = \lceil src.y\rceil
 908
 909   dst.z = \lceil src.z\rceil
 910
 911   dst.w = \lceil src.w\rceil
 912
 913
 914 .. opcode:: TRUNC - Truncate
 915
 916 .. math::
 917
 918   dst.x = trunc(src.x)
 919
 920   dst.y = trunc(src.y)
 921
 922   dst.z = trunc(src.z)
 923
 924   dst.w = trunc(src.w)
 925
 926
 927 .. opcode:: MOD - Modulus
 928
 929 .. math::
 930
 931   dst.x = src0.x \bmod src1.x
 932
 933   dst.y = src0.y \bmod src1.y
 934
 935   dst.z = src0.z \bmod src1.z
 936
 937   dst.w = src0.w \bmod src1.w
 938
 939
 940 .. opcode:: UARL - Integer Address Register Load
 941
 942   Moves the contents of the source register, assumed to be an integer, into the
 943   destination register, which is assumed to be an address (ADDR) register.
 944
 945
 946 .. opcode:: SAD - Sum Of Absolute Differences
 947
 948 .. math::
 949
 950   dst.x = |src0.x - src1.x| + src2.x
 951
 952   dst.y = |src0.y - src1.y| + src2.y
 953
 954   dst.z = |src0.z - src1.z| + src2.z
 955
 956   dst.w = |src0.w - src1.w| + src2.w
 957
 958
 959 .. opcode:: TXF - Texel Fetch (as per NV_gpu_shader4), extract a single texel
 960                   from a specified texture image. The source sampler may
 961                   not be a CUBE or SHADOW.
 962                   src 0 is a four-component signed integer vector used to
 963                   identify the single texel accessed (3 components + level).
 964                   src 1 is a three-component constant signed integer vector,
 965                   with each component limited to the range
 966                   -8..+8 (hardware only seems to handle this range, although
 967                   the interface allows for up to unsigned int).
 968                   TXF(uint_vec coord, int_vec offset).
 969
 970
 971 .. opcode:: TXQ - Texture Size Query (as per NV_gpu_program4)
 972                   retrieve the dimensions of the texture
 973                   depending on the target. For 1D (width), 2D/RECT/CUBE
 974                   (width, height), 3D (width, height, depth),
 975                   1D array (width, layers), 2D array (width, height, layers)
 976
 977 .. math::
 978
 979   lod = src0.x
 980
 981   dst.x = texture_width(unit, lod)
 982
 983   dst.y = texture_height(unit, lod)
 984
 985   dst.z = texture_depth(unit, lod)
 986
 987
 988 Integer ISA
 989 ^^^^^^^^^^^^^^^^^^^^^^^^
 990 These opcodes are used for integer operations.
 991 Support for these opcodes is indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
 992
 993
 994 .. opcode:: I2F - Signed Integer To Float
 995
 996    Rounding is unspecified (round to nearest even suggested).
 997
 998 .. math::
 999
1000   dst.x = (float) src.x
1001
1002   dst.y = (float) src.y
1003
1004   dst.z = (float) src.z
1005
1006   dst.w = (float) src.w
1007
1008
1009 .. opcode:: U2F - Unsigned Integer To Float
1010
1011    Rounding is unspecified (round to nearest even suggested).
1012
1013 .. math::
1014
1015   dst.x = (float) src.x
1016
1017   dst.y = (float) src.y
1018
1019   dst.z = (float) src.z
1020
1021   dst.w = (float) src.w
1022
1023
1024 .. opcode:: F2I - Float to Signed Integer
1025
1026    Rounding is towards zero (truncate).
1027    Values outside signed range (including NaNs) produce undefined results.
1028
1029 .. math::
1030
1031   dst.x = (int) src.x
1032
1033   dst.y = (int) src.y
1034
1035   dst.z = (int) src.z
1036
1037   dst.w = (int) src.w
1038
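Rounding towards zero differs from :opcode:`FLR` for negative inputs. The Python sketch below shows the difference; the function name ``f2i`` is an assumption for this example, and the sketch does not model the 32-bit range limit or the undefined results for NaNs.

```python
import math


def f2i(src):
    """Model of F2I: convert each component by truncating towards zero."""
    return tuple(math.trunc(component) for component in src)


print(f2i((1.7, -1.7, 0.5, -0.5)))
print(math.floor(-1.7))   # floor rounds down instead, giving -2
```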
1039
1040 .. opcode:: F2U - Float to Unsigned Integer
1041
1042    Rounding is towards zero (truncate).
1043    Values outside unsigned range (including NaNs) produce undefined results.
1044
1045 .. math::
1046
1047   dst.x = (unsigned) src.x
1048
1049   dst.y = (unsigned) src.y
1050
1051   dst.z = (unsigned) src.z
1052
1053   dst.w = (unsigned) src.w
1054
1055
1056 .. opcode:: UADD - Integer Add
1057
1058    This instruction works the same for signed and unsigned integers.
1059    The low 32 bits of the result are returned.
1060
1061 .. math::
1062
1063   dst.x = src0.x + src1.x
1064
1065   dst.y = src0.y + src1.y
1066
1067   dst.z = src0.z + src1.z
1068
1069   dst.w = src0.w + src1.w
1070
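Why one opcode serves both signed and unsigned operands can be seen in a Python sketch that works on raw 32-bit patterns. The names ``uadd``, ``as_signed`` and ``MASK32`` are assumptions for this example.

```python
MASK32 = 0xFFFFFFFF


def uadd(src0, src1):
    """Model of UADD: add 32-bit patterns and keep the low 32 bits."""
    return tuple((a + b) & MASK32 for a, b in zip(src0, src1))


def as_signed(value):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    return value - 0x100000000 if value & 0x80000000 else value


# 0xFFFFFFFE and 0xFFFFFFFF read as signed are -2 and -1; their sum is -3.
total = uadd((0xFFFFFFFE, 1, 2, 3), (0xFFFFFFFF, 1, 1, 1))
print(hex(total[0]), as_signed(total[0]))
```

The same bit pattern is the correct wrapped result under either interpretation.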
1071
1072 .. opcode:: UMAD - Integer Multiply And Add
1073
1074    This instruction works the same for signed and unsigned integers.
1075    The multiplication returns the low 32 bits (as does the result itself).
1076
1077 .. math::
1078
1079   dst.x = src0.x \times src1.x + src2.x
1080
1081   dst.y = src0.y \times src1.y + src2.y
1082
1083   dst.z = src0.z \times src1.z + src2.z
1084
1085   dst.w = src0.w \times src1.w + src2.w
1086
1087
1088 .. opcode:: UMUL - Integer Multiply
1089
1090    This instruction works the same for signed and unsigned integers.
1091    The low 32 bits of the result are returned.
1092
1093 .. math::
1094
1095   dst.x = src0.x \times src1.x
1096
1097   dst.y = src0.y \times src1.y
1098
1099   dst.z = src0.z \times src1.z
1100
1101   dst.w = src0.w \times src1.w
1102
1103
1104 .. opcode:: IDIV - Signed Integer Division
1105
1106    TBD: behavior for division by zero.
1107
1108 .. math::
1109
1110   dst.x = \frac{src0.x}{src1.x}
1111
1112   dst.y = \frac{src0.y}{src1.y}
1113
1114   dst.z = \frac{src0.z}{src1.z}
1115
1116   dst.w = \frac{src0.w}{src1.w}
1117
1118
1119 .. opcode:: UDIV - Unsigned Integer Division
1120
1121    For division by zero, 0xffffffff is returned.
1122
1123 .. math::
1124
1125   dst.x = \frac{src0.x}{src1.x}
1126
1127   dst.y = \frac{src0.y}{src1.y}
1128
1129   dst.z = \frac{src0.z}{src1.z}
1130
1131   dst.w = \frac{src0.w}{src1.w}
1132
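The division-by-zero rule for UDIV can be sketched in Python as below. The function name ``udiv`` is an assumption for this example, and the inputs are assumed to be non-negative integers that fit in 32 bits.

```python
def udiv(src0, src1):
    """Model of UDIV: unsigned division, with 0xffffffff for a zero divisor."""
    return tuple(
        a // b if b != 0 else 0xFFFFFFFF
        for a, b in zip(src0, src1)
    )


print(udiv((10, 7, 1, 0), (3, 0, 1, 5)))
```

Only the component with a zero divisor is affected; the other components are computed normally.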
1133
1134 .. opcode:: UMOD - Unsigned Integer Remainder
1135
1136    If the second argument is zero, 0xffffffff is returned.
1137
1138 .. math::
1139
1140   dst.x = src0.x \bmod src1.x
1141
1142   dst.y = src0.y \bmod src1.y
1143
1144   dst.z = src0.z \bmod src1.z
1145
1146   dst.w = src0.w \bmod src1.w
1147
1148
1149 .. opcode:: NOT - Bitwise Not
1150
1151 .. math::
1152
1153   dst.x = ~src.x
1154
1155   dst.y = ~src.y
1156
1157   dst.z = ~src.z
1158
1159   dst.w = ~src.w
1160
1161
1162 .. opcode:: AND - Bitwise And
1163
1164 .. math::
1165
1166   dst.x = src0.x & src1.x
1167
1168   dst.y = src0.y & src1.y
1169
1170   dst.z = src0.z & src1.z
1171
1172   dst.w = src0.w & src1.w
1173
1174
1175 .. opcode:: OR - Bitwise Or
1176
1177 .. math::
1178
1179   dst.x = src0.x | src1.x
1180
1181   dst.y = src0.y | src1.y
1182
1183   dst.z = src0.z | src1.z
1184
1185   dst.w = src0.w | src1.w
1186
1187
1188 .. opcode:: XOR - Bitwise Xor
1189
1190 .. math::
1191
1192   dst.x = src0.x \oplus src1.x
1193
1194   dst.y = src0.y \oplus src1.y
1195
1196   dst.z = src0.z \oplus src1.z
1197
1198   dst.w = src0.w \oplus src1.w
1199
1200
1201 .. opcode:: IMAX - Maximum of Signed Integers
1202
1203 .. math::
1204
1205   dst.x = max(src0.x, src1.x)
1206
1207   dst.y = max(src0.y, src1.y)
1208
1209   dst.z = max(src0.z, src1.z)
1210
1211   dst.w = max(src0.w, src1.w)
1212
1213
1214 .. opcode:: UMAX - Maximum of Unsigned Integers
1215
1216 .. math::
1217
1218   dst.x = max(src0.x, src1.x)
1219
1220   dst.y = max(src0.y, src1.y)
1221
1222   dst.z = max(src0.z, src1.z)
1223
1224   dst.w = max(src0.w, src1.w)
1225
1226
1227 .. opcode:: IMIN - Minimum of Signed Integers
1228
1229 .. math::
1230
1231   dst.x = min(src0.x, src1.x)
1232
1233   dst.y = min(src0.y, src1.y)
1234
1235   dst.z = min(src0.z, src1.z)
1236
1237   dst.w = min(src0.w, src1.w)
1238
1239
1240 .. opcode:: UMIN - Minimum of Unsigned Integers
1241
1242 .. math::
1243
1244   dst.x = min(src0.x, src1.x)
1245
1246   dst.y = min(src0.y, src1.y)
1247
1248   dst.z = min(src0.z, src1.z)
1249
1250   dst.w = min(src0.w, src1.w)
1251
1252
1253 .. opcode:: SHL - Shift Left
1254
1255 .. math::
1256
1257   dst.x = src0.x << src1.x
1258
1259   dst.y = src0.y << src1.x
1260
1261   dst.z = src0.z << src1.x
1262
1263   dst.w = src0.w << src1.x
1264
1265
1266 .. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1267
1268 .. math::
1269
1270   dst.x = src0.x >> src1.x
1271
1272   dst.y = src0.y >> src1.x
1273
1274   dst.z = src0.z >> src1.x
1275
1276   dst.w = src0.w >> src1.x
1277
1278
1279 .. opcode:: USHR - Logical Shift Right
1280
1281 .. math::
1282
1283   dst.x = src0.x >> (unsigned) src1.x
1284
1285   dst.y = src0.y >> (unsigned) src1.x
1286
1287   dst.z = src0.z >> (unsigned) src1.x
1288
1289   dst.w = src0.w >> (unsigned) src1.x
1290
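The difference between :opcode:`ISHR` and USHR only shows up when the top bit of the shifted value is set. The Python sketch below models both on a single 32-bit pattern; the function names and ``MASK32`` are assumptions for this example, and the shift count is assumed to be in the range 0 to 31.

```python
MASK32 = 0xFFFFFFFF


def ushr(value, count):
    """Logical shift right: zeros are shifted in from the left."""
    return (value & MASK32) >> count


def ishr(value, count):
    """Arithmetic shift right: the sign bit is copied in from the left."""
    value &= MASK32
    signed = value - 0x100000000 if value & 0x80000000 else value
    return (signed >> count) & MASK32


print(hex(ushr(0x80000000, 4)))
print(hex(ishr(0x80000000, 4)))
```

For values whose top bit is clear, both functions return the same pattern.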
1291
1292 .. opcode:: UCMP - Integer Conditional Move
1293
1294 .. math::
1295
1296   dst.x = src0.x ? src1.x : src2.x
1297
1298   dst.y = src0.y ? src1.y : src2.y
1299
1300   dst.z = src0.z ? src1.z : src2.z
1301
1302   dst.w = src0.w ? src1.w : src2.w
1303
1304
1305
1306 .. opcode:: ISSG - Integer Set Sign
1307
1308 .. math::
1309
1310   dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1311
1312   dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1313
1314   dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1315
1316   dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1317
1318
1319
1320 .. opcode:: ISLT - Signed Integer Set On Less Than
1321
1322 .. math::
1323
1324   dst.x = (src0.x < src1.x) ? ~0 : 0
1325
1326   dst.y = (src0.y < src1.y) ? ~0 : 0
1327
1328   dst.z = (src0.z < src1.z) ? ~0 : 0
1329
1330   dst.w = (src0.w < src1.w) ? ~0 : 0
1331
1332
1333 .. opcode:: USLT - Unsigned Integer Set On Less Than
1334
1335 .. math::
1336
1337   dst.x = (src0.x < src1.x) ? ~0 : 0
1338
1339   dst.y = (src0.y < src1.y) ? ~0 : 0
1340
1341   dst.z = (src0.z < src1.z) ? ~0 : 0
1342
1343   dst.w = (src0.w < src1.w) ? ~0 : 0
1344
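Unlike the float comparisons, which produce 1 or 0, the integer comparisons produce an all-ones mask (``~0``) for true. The Python sketch below contrasts :opcode:`ISLT` and USLT on the same bit patterns; the helper names and ``MASK32`` are assumptions for this example.

```python
MASK32 = 0xFFFFFFFF   # the 32-bit pattern written as ~0 in the formulas


def to_signed(value):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - 0x100000000 if value & 0x80000000 else value


def islt(src0, src1):
    """Model of ISLT: signed comparison, all-ones mask for true."""
    return tuple(
        MASK32 if to_signed(a) < to_signed(b) else 0
        for a, b in zip(src0, src1)
    )


def uslt(src0, src1):
    """Model of USLT: unsigned comparison, all-ones mask for true."""
    return tuple(
        MASK32 if (a & MASK32) < (b & MASK32) else 0
        for a, b in zip(src0, src1)
    )


a = (0xFFFFFFFF, 1, 5, 0)
b = (1, 0xFFFFFFFF, 5, 1)
print([hex(v) for v in islt(a, b)])
print([hex(v) for v in uslt(a, b)])
```

Because true is all ones, the result can be combined directly with :opcode:`AND` to select or clear a value.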
1345
1346 .. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1347
1348 .. math::
1349
1350   dst.x = (src0.x >= src1.x) ? ~0 : 0
1351
1352   dst.y = (src0.y >= src1.y) ? ~0 : 0
1353
1354   dst.z = (src0.z >= src1.z) ? ~0 : 0
1355
1356   dst.w = (src0.w >= src1.w) ? ~0 : 0
1357
1358
1359 .. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1360
1361 .. math::
1362
1363   dst.x = (src0.x >= src1.x) ? ~0 : 0
1364
1365   dst.y = (src0.y >= src1.y) ? ~0 : 0
1366
1367   dst.z = (src0.z >= src1.z) ? ~0 : 0
1368
1369   dst.w = (src0.w >= src1.w) ? ~0 : 0
1370
1371
1372 .. opcode:: USEQ - Integer Set On Equal
1373
1374 .. math::
1375
1376   dst.x = (src0.x == src1.x) ? ~0 : 0
1377
1378   dst.y = (src0.y == src1.y) ? ~0 : 0
1379
1380   dst.z = (src0.z == src1.z) ? ~0 : 0
1381
1382   dst.w = (src0.w == src1.w) ? ~0 : 0
1383
1384
1385 .. opcode:: USNE - Integer Set On Not Equal
1386
1387 .. math::
1388
1389   dst.x = (src0.x != src1.x) ? ~0 : 0
1390
1391   dst.y = (src0.y != src1.y) ? ~0 : 0
1392
1393   dst.z = (src0.z != src1.z) ? ~0 : 0
1394
1395   dst.w = (src0.w != src1.w) ? ~0 : 0
1396
1397
1398 .. opcode:: INEG - Integer Negate
1399
1400   Two's complement.
1401
1402 .. math::
1403
1404   dst.x = -src.x
1405
1406   dst.y = -src.y
1407
1408   dst.z = -src.z
1409
1410   dst.w = -src.w
1411
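Two's complement negation on 32-bit patterns can be sketched in Python as follows. The names ``ineg`` and ``MASK32`` are assumptions for this example.

```python
MASK32 = 0xFFFFFFFF


def ineg(src):
    """Model of INEG: two's complement negation of each 32-bit pattern."""
    return tuple((-value) & MASK32 for value in src)


print([hex(v) for v in ineg((1, 0, 0x80000000, 0xFFFFFFFF))])
```

The pattern 0x80000000, the most negative signed value, negates to itself because its positive counterpart is not representable in 32 bits.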
1412
1413 .. opcode:: IABS - Integer Absolute Value
1414
1415 .. math::
1416
1417   dst.x = |src.x|
1418
1419   dst.y = |src.y|
1420
1421   dst.z = |src.z|
1422
1423   dst.w = |src.w|
1424
1425
1426 Geometry ISA
1427 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1428
1429 These opcodes are only supported in geometry shaders; they have no meaning
1430 in any other type of shader.
1431
1432 .. opcode:: EMIT - Emit
1433
1434   Generate a new vertex for the current primitive using the values in the
1435   output registers.
1436
1437
1438 .. opcode:: ENDPRIM - End Primitive
1439
1440   Complete the current primitive (consisting of the emitted vertices),
1441   and start a new one.
1442
1443
1444 GLSL ISA
1445 ^^^^^^^^^^
1446
1447 These opcodes are part of :term:`GLSL`'s opcode set. Support for these
1448 opcodes is determined by a special capability bit, ``GLSL``.
1449 Some require GLSL version 1.30 (UIF/BREAKC/SWITCH/CASE/DEFAULT/ENDSWITCH).
1450
1451 .. opcode:: CAL - Subroutine Call
1452
1453   push(pc)
1454   pc = target
1455
1456
1457 .. opcode:: RET - Subroutine Call Return
1458
1459   pc = pop()
1460
1461
1462 .. opcode:: CONT - Continue
1463
1464   Unconditionally moves the point of execution to the instruction after the
1465   last bgnloop. The instruction must appear within a bgnloop/endloop.
1466
1467 .. note::
1468
1469    Support for CONT is determined by a special capability bit,
1470    ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1471
1472
1473 .. opcode:: BGNLOOP - Begin a Loop
1474
1475   Start a loop. Must have a matching endloop.
1476
1477
1478 .. opcode:: BGNSUB - Begin Subroutine
1479
1480   Starts definition of a subroutine. Must have a matching endsub.
1481
1482
1483 .. opcode:: ENDLOOP - End a Loop
1484
1485   End a loop started with bgnloop.
1486
1487
1488 .. opcode:: ENDSUB - End Subroutine
1489
1490   Ends definition of a subroutine.
1491
1492
1493 .. opcode:: NOP - No Operation
1494
1495   Do nothing.
1496
1497
1498 .. opcode:: BRK - Break
1499
1500   Unconditionally moves the point of execution to the instruction after the
1501   next endloop or endswitch. The instruction must appear within a loop/endloop
1502   or switch/endswitch.
1503
1504
1505 .. opcode:: BREAKC - Break Conditional
1506
1507   Conditionally moves the point of execution to the instruction after the
1508   next endloop or endswitch. The instruction must appear within a loop/endloop
1509   or switch/endswitch.
1510   Condition evaluates to true if src0.x != 0 where src0.x is interpreted
1511   as an integer register.
1512
1513 .. note::
1514
1515    Considered for removal as it's quite inconsistent wrt other opcodes
1516    (could emulate with UIF/BRK/ENDIF).
1517
1518
1519 .. opcode:: IF - Float If
1520
1521   Start an IF ... ELSE ... ENDIF block. The condition evaluates to true if
1522
1523     src0.x != 0.0
1524
1525   where src0.x is interpreted as a floating point register.
1526
1527
1528 .. opcode:: UIF - Bitwise If
1529
1530   Start a UIF ... ELSE ... ENDIF block. The condition evaluates to true if
1531
1532     src0.x != 0
1533
1534   where src0.x is interpreted as an integer register.
1535
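The two conditions differ for bit patterns such as negative zero. The
following Python sketch assumes 32-bit IEEE 754 register components; the
helper names are hypothetical and only model the comparison itself.

```python
import struct


def float_bits(value):
    """Return the 32-bit pattern of a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def if_taken(bits):
    """IF: interpret src0.x as a float and compare it with 0.0."""
    return struct.unpack("<f", struct.pack("<I", bits))[0] != 0.0


def uif_taken(bits):
    """UIF: interpret src0.x as an integer and compare it with 0."""
    return bits != 0


NEG_ZERO = float_bits(-0.0)  # only the sign bit is set
```

Here ``if_taken(NEG_ZERO)`` is false while ``uif_taken(NEG_ZERO)`` is true,
because negative zero compares equal to 0.0 as a float but has a non-zero bit
pattern.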
1536
1537 .. opcode:: ELSE - Else
1538
1539   Starts an else block, after an IF or UIF statement.
1540
1541
1542 .. opcode:: ENDIF - End If
1543
1544   Ends an IF or UIF block.
1545
1546
1547 .. opcode:: SWITCH - Switch
1548
1549    Starts a C-style switch expression. The switch consists of one or more
1550    CASE statements and at most one DEFAULT statement. Execution of a statement
1551    ends when a BRK is hit but, just as in C, falling through to other cases
1552    without a break is allowed. Similarly, the DEFAULT label is allowed anywhere,
1553    not just as the last statement, and fallthrough is allowed into and from it.
1554    CASE src arguments are compared at the bit level against the SWITCH src argument.
1555
1556    Example::

1557      SWITCH src[0].x
1558      CASE src[0].x
1559      (some instructions here)
1560      (optional BRK here)
1561      DEFAULT
1562      (some instructions here)
1563      (optional BRK here)
1564      CASE src[0].x
1565      (some instructions here)
1566      (optional BRK here)
1567      ENDSWITCH
1568
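The fallthrough rules can be modelled with a flat list of labels and
statements. This is a minimal Python sketch, not how any driver implements
SWITCH: the tuple encoding and the ``EXEC`` placeholder for ordinary
instructions are assumptions made for illustration.

```python
def run_switch(selector, body):
    """Execute a flat SWITCH body and return the payloads that ran.

    body holds ("CASE", value), ("DEFAULT",), ("EXEC", payload) or ("BRK",).
    """
    # Entry point: the first matching CASE, otherwise DEFAULT if present.
    start = None
    for i, item in enumerate(body):
        if item[0] == "CASE" and item[1] == selector:
            start = i
            break
    if start is None:
        for i, item in enumerate(body):
            if item[0] == "DEFAULT":
                start = i
                break
    executed = []
    if start is None:
        return executed
    for item in body[start:]:
        if item[0] == "BRK":
            break
        if item[0] == "EXEC":
            executed.append(item[1])
        # Labels reached by fallthrough are skipped, as in C.
    return executed


body = [
    ("CASE", 1), ("EXEC", "a"), ("BRK",),
    ("DEFAULT",), ("EXEC", "d"),            # no BRK: falls into CASE 2
    ("CASE", 2), ("EXEC", "b"), ("BRK",),
]
```

A selector that matches no CASE enters at DEFAULT and, since that block has
no BRK, continues into the statements of the following CASE.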
1569
1570 .. opcode:: CASE - Switch case
1571
1572    This represents a switch case label. The src arg must be an integer immediate.
1573
1574
1575 .. opcode:: DEFAULT - Switch default
1576
1577    This represents the default case in the switch, which is taken if no other
1578    case matches.
1579
1580
1581 .. opcode:: ENDSWITCH - End of switch
1582
1583    Ends a switch expression.
1584
1585
1586 .. opcode:: NRM4 - 4-component Vector Normalise
1587
1588 This instruction divides each component of its source by the vector's length.
1589
1590 .. math::
1591
1592   dst = \frac{src}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
1593
1594
1595 .. _doubleopcodes:
1596
1597 Double ISA
1598 ^^^^^^^^^^^^^^^
1599
1600 The double-precision opcodes reinterpret four-component vectors into
1601 two-component vectors with doubled precision in each component.
1602
1603 Support for these opcodes is currently undecided.
1604
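The reinterpretation of a four-component register as two doubles can be
illustrated in Python. The sketch assumes 32-bit components and a
little-endian word order (``x`` and ``z`` hold the low halves), which this
document does not specify; the helper names are hypothetical.

```python
import struct


def pack_doubles(xy, zw):
    """Pack two doubles into four 32-bit words (x, y, z, w)."""
    return struct.unpack("<4I", struct.pack("<2d", xy, zw))


def unpack_doubles(x, y, z, w):
    """Reinterpret four 32-bit words as two doubles (xy, zw)."""
    return struct.unpack("<2d", struct.pack("<4I", x, y, z, w))


def dadd(src0, src1):
    """Model of DADD on two four-word registers."""
    a_xy, a_zw = unpack_doubles(*src0)
    b_xy, b_zw = unpack_doubles(*src1)
    return pack_doubles(a_xy + b_xy, a_zw + b_zw)
```

For example, ``dadd(pack_doubles(1.5, 2.0), pack_doubles(0.25, -2.0))``
unpacks to the pair ``(1.75, 0.0)``.
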
1605 .. opcode:: DADD - Add
1606
1607 .. math::
1608
1609   dst.xy = src0.xy + src1.xy
1610
1611   dst.zw = src0.zw + src1.zw
1612
1613
1614 .. opcode:: DDIV - Divide
1615
1616 .. math::
1617
1618   dst.xy = src0.xy / src1.xy
1619
1620   dst.zw = src0.zw / src1.zw
1621
1622 .. opcode:: DSEQ - Set on Equal
1623
1624 .. math::
1625
1626   dst.xy = src0.xy == src1.xy ? 1.0F : 0.0F
1627
1628   dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F
1629
1630 .. opcode:: DSLT - Set on Less than
1631
1632 .. math::
1633
1634   dst.xy = src0.xy < src1.xy ? 1.0F : 0.0F
1635
1636   dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F
1637
1638 .. opcode:: DFRAC - Fraction
1639
1640 .. math::
1641
1642   dst.xy = src.xy - \lfloor src.xy\rfloor
1643
1644   dst.zw = src.zw - \lfloor src.zw\rfloor
1645
1646
1647 .. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
1648
1649 Like the ``frexp()`` routine in many math libraries, this opcode stores the
1650 exponent of its source to ``dst0``, and the significand to ``dst1``, such that
1651 :math:`dst1 \times 2^{dst0} = src` .
1652
1653 .. math::
1654
1655   dst0.xy = exp(src.xy)
1656
1657   dst1.xy = frac(src.xy)
1658
1659   dst0.zw = exp(src.zw)
1660
1661   dst1.zw = frac(src.zw)
1662
1663 .. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1664
1665 This opcode is the inverse of :opcode:`DFRACEXP`.
1666
1667 .. math::
1668
1669   dst.xy = src0.xy \times 2^{src1.xy}
1670
1671   dst.zw = src0.zw \times 2^{src1.zw}
1672
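The relationship between DFRACEXP and DLDEXP mirrors ``frexp()`` and
``ldexp()``. The Python sketch below handles a single double rather than a
register pair, and the function names are hypothetical; note that Python's
``math.frexp`` returns the significand first, so the result is swapped to
match the ``dst0``/``dst1`` order described above.

```python
import math


def dfracexp(value):
    """Return (dst0, dst1): the exponent and the significand of value."""
    significand, exponent = math.frexp(value)
    return exponent, significand


def dldexp(significand, exponent):
    """Return significand * 2**exponent."""
    return math.ldexp(significand, exponent)
```

Feeding the two results of ``dfracexp`` back into ``dldexp`` reproduces the
original value.
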
1673 .. opcode:: DMIN - Minimum
1674
1675 .. math::
1676
1677   dst.xy = min(src0.xy, src1.xy)
1678
1679   dst.zw = min(src0.zw, src1.zw)
1680
1681 .. opcode:: DMAX - Maximum
1682
1683 .. math::
1684
1685   dst.xy = max(src0.xy, src1.xy)
1686
1687   dst.zw = max(src0.zw, src1.zw)
1688
1689 .. opcode:: DMUL - Multiply
1690
1691 .. math::
1692
1693   dst.xy = src0.xy \times src1.xy
1694
1695   dst.zw = src0.zw \times src1.zw
1696
1697
1698 .. opcode:: DMAD - Multiply And Add
1699
1700 .. math::
1701
1702   dst.xy = src0.xy \times src1.xy + src2.xy
1703
1704   dst.zw = src0.zw \times src1.zw + src2.zw
1705
1706
1707 .. opcode:: DRCP - Reciprocal
1708
1709 .. math::
1710
1711    dst.xy = \frac{1}{src.xy}
1712
1713    dst.zw = \frac{1}{src.zw}
1714
1715 .. opcode:: DSQRT - Square Root
1716
1717 .. math::
1718
1719    dst.xy = \sqrt{src.xy}
1720
1721    dst.zw = \sqrt{src.zw}
1722
1723
1724 .. _samplingopcodes:
1725
1726 Resource Sampling Opcodes
1727 ^^^^^^^^^^^^^^^^^^^^^^^^^
1728
1729 These opcodes closely follow the semantics of the respective Direct3D
1730 instructions. If in doubt, double-check the Direct3D documentation.
1731
1732 .. opcode:: SAMPLE - Using provided address, sample data from the
1733                specified texture using the filtering mode identified
1734                by the given sampler. The source data may come from
1735                any resource type other than buffers.
1736                SAMPLE dst, address, sampler_view, sampler
1737                e.g.
1738                SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]
1739
1740 .. opcode:: SAMPLE_I - Simplified alternative to the SAMPLE instruction.
1741                Using the provided integer address, SAMPLE_I fetches data
1742                from the specified sampler view without any filtering.
1743                The source data may come from any resource type other
1744                than CUBE.
1745                SAMPLE_I dst, address, sampler_view
1746                e.g.
1747                SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]
1748                The 'address' is specified as unsigned integers. If the
1749                'address' is out of range [0...(# texels - 1)] the
1750                result of the fetch is always 0 in all components.
1751                As such, the instruction doesn't honor address wrap
1752                modes; where that behavior is desired, the
1753                'SAMPLE' instruction should be used.
1754                address.w always provides an unsigned integer mipmap
1755                level. If the value is out of the range then the
1756                instruction always returns 0 in all components.
1757                address.yz are ignored for buffers and 1d textures.
1758                address.z is ignored for 1d texture arrays and 2d
1759                textures.
1760                For 1D texture arrays address.y provides the array
1761                index (also as unsigned integer). If the value is
1762                out of the range of available array indices
1763                [0... (array size - 1)] then the opcode always returns
1764                0 in all components.
1765                For 2D texture arrays address.z provides the array
1766                index, otherwise it exhibits the same behavior as in
1767                the case for 1D texture arrays.
1768                The exact semantics of the source address are presented
1769                in the table below:
1770                resource type         X     Y     Z       W
1771                -------------         ------------------------
1772                PIPE_BUFFER           x                ignored
1773                PIPE_TEXTURE_1D       x                  mpl
1774                PIPE_TEXTURE_2D       x     y            mpl
1775                PIPE_TEXTURE_3D       x     y     z      mpl
1776                PIPE_TEXTURE_RECT     x     y            mpl
1777                PIPE_TEXTURE_CUBE     not allowed as source
1778                PIPE_TEXTURE_1D_ARRAY x    idx           mpl
1779                PIPE_TEXTURE_2D_ARRAY x     y    idx     mpl
1780
1781                Where 'mpl' is a mipmap level and 'idx' is the
1782                array index.
1783
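The out-of-range rules of SAMPLE_I for a 2D texture can be sketched as
follows. This is an illustrative Python model only: the nested-list texture
layout and the function name are assumptions, and format conversion is left
out entirely.

```python
def sample_i_2d(levels, x, y, mip):
    """Fetch texel (x, y) from mip level `mip` of a 2D texture.

    `levels` is a list of mip levels, each a list of rows of 4-tuples.
    An out-of-range coordinate or level yields 0 in all components.
    """
    zero = (0, 0, 0, 0)
    if not 0 <= mip < len(levels):
        return zero
    level = levels[mip]
    if not 0 <= y < len(level) or not 0 <= x < len(level[y]):
        return zero
    return level[y][x]


levels = [
    [[(1, 0, 0, 1), (2, 0, 0, 1)],
     [(3, 0, 0, 1), (4, 0, 0, 1)]],  # level 0: 2x2 texels
    [[(9, 0, 0, 1)]],                # level 1: 1x1 texel
]
```

No wrap mode is applied: ``x = 2`` on the 2x2 level returns zeros instead of
wrapping around to column 0.
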
1784 .. opcode:: SAMPLE_I_MS - Just like SAMPLE_I but allows fetching data from
1785                multi-sampled surfaces.
1786                SAMPLE_I_MS dst, address, sampler_view, sample
1787
1788 .. opcode:: SAMPLE_B - Just like the SAMPLE instruction with the
1789                exception that an additional bias is applied to the
1790                level of detail computed as part of the instruction
1791                execution.
1792                SAMPLE_B dst, address, sampler_view, sampler, lod_bias
1793                e.g.
1794                SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x
1795
1796 .. opcode:: SAMPLE_C - Similar to the SAMPLE instruction but it
1797                performs a comparison filter. The operands to SAMPLE_C
1798                are identical to SAMPLE, except that there is an additional
1799                float32 operand, the reference value, which must be a register
1800                with a single component, or a scalar literal.
1801                SAMPLE_C makes the hardware use the current sampler's
1802                compare_func (in pipe_sampler_state) to compare the
1803                reference value against the red component value for the
1804                source resource at each texel that the currently configured
1805                texture filter covers based on the provided coordinates.
1806                SAMPLE_C dst, address, sampler_view.r, sampler, ref_value
1807                e.g.
1808                SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x
1809
1810 .. opcode:: SAMPLE_C_LZ - Same as SAMPLE_C, but LOD is 0 and derivatives
1811                are ignored. The LZ stands for level-zero.
1812                SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value
1813                e.g.
1814                SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x
1815
1816
1817 .. opcode:: SAMPLE_D - SAMPLE_D is identical to the SAMPLE opcode except
1818                that the derivatives for the source address in the x
1819                direction and the y direction are provided by extra
1820                parameters.
1821                SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y
1822                e.g.
1823                SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]
1824
1825 .. opcode:: SAMPLE_L - SAMPLE_L is identical to the SAMPLE opcode except
1826                that the LOD is provided directly as a scalar value,
1827                representing no anisotropy.
1828                SAMPLE_L dst, address, sampler_view, sampler, explicit_lod
1829                e.g.
1830                SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x
1831
1832 .. opcode:: GATHER4 - Gathers the four texels to be used in a bi-linear
1833                filtering operation and packs them into a single register.
1834                Only works with 2D, 2D array, cubemaps, and cubemap arrays.
1835                For 2D textures, only the addressing modes of the sampler and
1836                the top level of any mip pyramid are used. Set W to zero.
1837                It behaves like the SAMPLE instruction, but a filtered
1838                sample is not generated. The four samples that contribute
1839                to filtering are placed into xyzw in counter-clockwise order,
1840                starting with the (u,v) texture coordinate delta at the
1841                following locations (-, +), (+, +), (+, -), (-, -), where
1842                the magnitude of each delta is half a texel.
1843
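The order of the four GATHER4 samples can be written out explicitly. The
Python sketch below assumes the (u, v) coordinate is already expressed in
texel units rather than normalized coordinates; the function name is
hypothetical.

```python
def gather4_positions(u, v):
    """Return the four sample positions in the order they fill x, y, z, w."""
    half = 0.5  # each delta has a magnitude of half a texel
    deltas = [(-half, +half), (+half, +half), (+half, -half), (-half, -half)]
    return [(u + du, v + dv) for du, dv in deltas]
```

For ``(u, v) = (1.0, 1.0)`` the positions are ``(0.5, 1.5)``, ``(1.5, 1.5)``,
``(1.5, 0.5)`` and ``(0.5, 0.5)``.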
1844
1845 .. opcode:: SVIEWINFO - query the dimensions of a given sampler view.
1846                dst receives width, height, depth or array size and
1847                number of mipmap levels as int4. The dst can have a writemask
1848                which specifies what information the caller is interested
1849                in.
1850                SVIEWINFO dst, src_mip_level, sampler_view
1851                e.g.
1852                SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]
1853                depth/array size but the total number of mipmap levels is
1854                out of range then returns 0 for width, height and
1855                depth/array size but the total number of mipmap is
1856                still returned correctly for the given sampler view.
1857                The returned width, height and depth values are for
1858                the mipmap level selected by the src_mip_level and
1859                are in the number of texels.
1860                For 1d texture array width is in dst.x, array size
1861                is in dst.y and dst.zw are always 0.
1862
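The behaviour of SVIEWINFO for an out-of-range mip level can be sketched for
a 3D texture. This Python model assumes the conventional mip chain in which
every level halves each dimension down to a minimum of one texel; that rule
and the function name are assumptions, not statements from this document.

```python
def sviewinfo_3d(width, height, depth, num_levels, mip):
    """Return (width, height, depth, num_levels) for the given mip level."""
    if not 0 <= mip < num_levels:
        # Sizes read as 0, but the level count is still reported.
        return (0, 0, 0, num_levels)
    return (
        max(1, width >> mip),
        max(1, height >> mip),
        max(1, depth >> mip),
        num_levels,
    )
```

For a 16x8x4 texture with five levels, level 3 reports ``(2, 1, 1, 5)`` and
level 9 reports ``(0, 0, 0, 5)``.
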
1863 .. opcode:: SAMPLE_POS - query the position of a given sample.
1864                dst receives float4 (x, y, 0, 0) indicating where the
1865                sample is located. If the resource is not a multi-sample
1866                resource and not a render target, the result is 0.
1867
1868 .. opcode:: SAMPLE_INFO - dst receives number of samples in x.
1869                If the resource is not a multi-sample resource and
1870                not a render target, the result is 0.
1871
1872
1873 .. _resourceopcodes:
1874
1875 Resource Access Opcodes
1876 ^^^^^^^^^^^^^^^^^^^^^^^
1877
1878 .. opcode:: LOAD - Fetch data from a shader resource
1879
1880                Syntax: ``LOAD dst, resource, address``
1881
1882                Example: ``LOAD TEMP[0], RES[0], TEMP[1]``
1883
1884                Using the provided integer address, LOAD fetches data
1885                from the specified buffer or texture without any
1886                filtering.
1887
1888                The 'address' is specified as a vector of unsigned
1889                integers.  If the 'address' is out of range the result
1890                is unspecified.
1891
1892                Only the first mipmap level of a resource can be read
1893                from using this instruction.
1894
1895                For 1D or 2D texture arrays, the array index is
1896                provided as an unsigned integer in address.y or
1897                address.z, respectively.  address.yz are ignored for
1898                buffers and 1D textures.  address.z is ignored for 1D
1899                texture arrays and 2D textures.  address.w is always
1900                ignored.
1901
1902 .. opcode:: STORE - Write data to a shader resource
1903
1904                Syntax: ``STORE resource, address, src``
1905
1906                Example: ``STORE RES[0], TEMP[0], TEMP[1]``
1907
1908                Using the provided integer address, STORE writes data
1909                to the specified buffer or texture.
1910
1911                The 'address' is specified as a vector of unsigned
1912                integers.  If the 'address' is out of range the result
1913                is unspecified.
1914
1915                Only the first mipmap level of a resource can be
1916                written to using this instruction.
1917
1918                For 1D or 2D texture arrays, the array index is
1919                provided as an unsigned integer in address.y or
1920                address.z, respectively.  address.yz are ignored for
1921                buffers and 1D textures.  address.z is ignored for 1D
1922                texture arrays and 2D textures.  address.w is always
1923                ignored.
1924
1925
1926 .. _threadsyncopcodes:
1927
1928 Inter-thread synchronization opcodes
1929 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1930
1931 These opcodes are intended for communication between threads running
1932 within the same compute grid.  For now they're only valid in compute
1933 programs.
1934
1935 .. opcode:: MFENCE - Memory fence
1936
1937   Syntax: ``MFENCE resource``
1938
1939   Example: ``MFENCE RES[0]``
1940
1941   This opcode forces strong ordering between any memory access
1942   operations that affect the specified resource.  This means that
1943   previous loads and stores (and only those) will be performed and
1944   visible to other threads before the program execution continues.
1945
1946
1947 .. opcode:: LFENCE - Load memory fence
1948
1949   Syntax: ``LFENCE resource``
1950
1951   Example: ``LFENCE RES[0]``
1952
1953   Similar to MFENCE, but it only affects the ordering of memory loads.
1954
1955
1956 .. opcode:: SFENCE - Store memory fence
1957
1958   Syntax: ``SFENCE resource``
1959
1960   Example: ``SFENCE RES[0]``
1961
1962   Similar to MFENCE, but it only affects the ordering of memory stores.
1963
1964
1965 .. opcode:: BARRIER - Thread group barrier
1966
1967   ``BARRIER``
1968
1969   This opcode suspends the execution of the current thread until all
1970   the remaining threads in the working group reach the same point of
1971   the program.  Results are unspecified if any of the remaining
1972   threads terminates or never reaches an executed BARRIER instruction.
1973
1974
1975 .. _atomopcodes:
1976
1977 Atomic opcodes
1978 ^^^^^^^^^^^^^^
1979
1980 These opcodes provide atomic variants of some common arithmetic and
1981 logical operations.  In this context atomicity means that another
1982 concurrent memory access operation that affects the same memory
1983 location is guaranteed to be performed strictly before or after the
1984 entire execution of the atomic operation.
1985
1986 For the moment they're only valid in compute programs.
1987
1988 .. opcode:: ATOMUADD - Atomic integer addition
1989
1990   Syntax: ``ATOMUADD dst, resource, offset, src``
1991
1992   Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]``
1993
1994   The following operation is performed atomically on each component:
1995
1996 .. math::
1997
1998   dst_i = resource[offset]_i
1999
2000   resource[offset]_i = dst_i + src_i
2001
2002
2003 .. opcode:: ATOMXCHG - Atomic exchange
2004
2005   Syntax: ``ATOMXCHG dst, resource, offset, src``
2006
2007   Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]``
2008
2009   The following operation is performed atomically on each component:
2010
2011 .. math::
2012
2013   dst_i = resource[offset]_i
2014
2015   resource[offset]_i = src_i
2016
2017
2018 .. opcode:: ATOMCAS - Atomic compare-and-exchange
2019
2020   Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2021
2022   Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]``
2023
2024   The following operation is performed atomically on each component:
2025
2026 .. math::
2027
2028   dst_i = resource[offset]_i
2029
2030   resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i)
2031
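A common use of compare-and-exchange is a retry loop that builds other atomic
updates out of it. The Python sketch below emulates ATOMCAS on a single
component with a lock; the ``Resource`` class and the ``increment`` helper
are hypothetical and only illustrate the returned-old-value contract.

```python
import threading


class Resource:
    """A toy resource: a list of 32-bit words guarded by one lock."""

    def __init__(self, words):
        self.words = list(words)
        self._lock = threading.Lock()

    def atomcas(self, offset, cmp, src):
        """Store src if the current value equals cmp; return the old value."""
        with self._lock:
            old = self.words[offset]
            if old == cmp:
                self.words[offset] = src
            return old


def increment(res, offset, times):
    """Add 1 to a word `times` times using only atomcas."""
    for _ in range(times):
        while True:
            seen = res.words[offset]
            if res.atomcas(offset, seen, (seen + 1) & 0xFFFFFFFF) == seen:
                break  # nobody changed the word in between


counter = Resource([0])
workers = [threading.Thread(target=increment, args=(counter, 0, 1000))
           for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Because a failed exchange is retried with the freshly observed value, no
increment is lost even when the four workers interleave.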
2032
2033 .. opcode:: ATOMAND - Atomic bitwise And
2034
2035   Syntax: ``ATOMAND dst, resource, offset, src``
2036
2037   Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]``
2038
2039   The following operation is performed atomically on each component:
2040
2041 .. math::
2042
2043   dst_i = resource[offset]_i
2044
2045   resource[offset]_i = dst_i \& src_i
2046
2047
2048 .. opcode:: ATOMOR - Atomic bitwise Or
2049
2050   Syntax: ``ATOMOR dst, resource, offset, src``
2051
2052   Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
2053
2054   The following operation is performed atomically on each component:
2055
2056 .. math::
2057
2058   dst_i = resource[offset]_i
2059
2060   resource[offset]_i = dst_i | src_i
2061
2062
2063 .. opcode:: ATOMXOR - Atomic bitwise Xor
2064
2065   Syntax: ``ATOMXOR dst, resource, offset, src``
2066
2067   Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]``
2068
2069   The following operation is performed atomically on each component:
2070
2071 .. math::
2072
2073   dst_i = resource[offset]_i
2074
2075   resource[offset]_i = dst_i \oplus src_i
2076
2077
2078 .. opcode:: ATOMUMIN - Atomic unsigned minimum
2079
2080   Syntax: ``ATOMUMIN dst, resource, offset, src``
2081
2082   Example: ``ATOMUMIN TEMP[0], RES[0], TEMP[1], TEMP[2]``
2083
2084   The following operation is performed atomically on each component:
2085
2086 .. math::
2087
2088   dst_i = resource[offset]_i
2089
2090   resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
2091
2092
2093 .. opcode:: ATOMUMAX - Atomic unsigned maximum
2094
2095   Syntax: ``ATOMUMAX dst, resource, offset, src``
2096
2097   Example: ``ATOMUMAX TEMP[0], RES[0], TEMP[1], TEMP[2]``
2098
2099   The following operation is performed atomically on each component:
2100
2101 .. math::
2102
2103   dst_i = resource[offset]_i
2104
2105   resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
2106
2107
2108 .. opcode:: ATOMIMIN - Atomic signed minimum
2109
2110   Syntax: ``ATOMIMIN dst, resource, offset, src``
2111
2112   Example: ``ATOMIMIN TEMP[0], RES[0], TEMP[1], TEMP[2]``
2113
2114   The following operation is performed atomically on each component:
2115
2116 .. math::
2117
2118   dst_i = resource[offset]_i
2119
2120   resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
2121
2122
2123 .. opcode:: ATOMIMAX - Atomic signed maximum
2124
2125   Syntax: ``ATOMIMAX dst, resource, offset, src``
2126
2127   Example: ``ATOMIMAX TEMP[0], RES[0], TEMP[1], TEMP[2]``
2128
2129   The following operation is performed atomically on each component:
2130
2131 .. math::
2132
2133   dst_i = resource[offset]_i
2134
2135   resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
2136
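The signed and unsigned minimum and maximum opcodes share the same formulas
and differ only in how the stored bits are interpreted. The following Python
sketch assumes 32-bit two's-complement words and leaves atomicity aside; the
helper names are hypothetical.

```python
def as_signed(word):
    """Reinterpret an unsigned 32-bit word as a two's-complement integer."""
    return word - 0x100000000 if word & 0x80000000 else word


def umin(dst, src):
    """Value stored by ATOMUMIN: unsigned comparison."""
    return dst if dst < src else src


def imin(dst, src):
    """Value stored by ATOMIMIN: signed comparison of the same bits."""
    return dst if as_signed(dst) < as_signed(src) else src
```

With ``dst = 0xFFFFFFFF`` and ``src = 1``, ``umin`` stores ``1`` whereas
``imin`` keeps ``0xFFFFFFFF``, since that pattern is -1 when read as signed.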
2137
2138
2139 Explanation of symbols used
2140 ------------------------------
2141
2142
2143 Functions
2144 ^^^^^^^^^^^^^^
2145
2146
2147   :math:`|x|`       Absolute value of `x`.
2148
2149   :math:`\lceil x \rceil` Ceiling of `x`.
2150
2151   clamp(x,y,z)      Clamp x between y and z.
2152                     (x < y) ? y : (x > z) ? z : x
2153
2154   :math:`\lfloor x\rfloor` Floor of `x`.
2155
2156   :math:`\log_2{x}` Logarithm of `x`, base 2.
2157
2158   max(x,y)          Maximum of x and y.
2159                     (x > y) ? x : y
2160
2161   min(x,y)          Minimum of x and y.
2162                     (x < y) ? x : y
2163
2164   partialx(x)       Derivative of x relative to fragment's X.
2165
2166   partialy(x)       Derivative of x relative to fragment's Y.
2167
2168   pop()             Pop from stack.
2169
2170   :math:`x^y`       `x` to the power `y`.
2171
2172   push(x)           Push x on stack.
2173
2174   round(x)          Round x.
2175
2176   trunc(x)          Truncate x, i.e. drop the fraction bits.
2177
2178
2179 Keywords
2180 ^^^^^^^^^^^^^
2181
2182
2183   discard           Discard fragment.
2184
2185   pc                Program counter.
2186
2187   target            Label of target instruction.
2188
2189
2190 Other tokens
2191 ---------------
2192
2193
2194 Declaration
2195 ^^^^^^^^^^^
2196
2197
2198 Declares a register that will be referenced as an operand in Instruction
2199 tokens.
2200
2201 The File field contains the register file being declared and is one
2202 of TGSI_FILE_*.
2203
2204 The UsageMask field specifies which register components can be accessed
2205 and is one of TGSI_WRITEMASK_*.
2206
2207 The Local flag specifies that a given value isn't intended for
2208 subroutine parameter passing and, as a result, the implementation
2209 isn't required to give any guarantees of it being preserved across
2210 subroutine boundaries.  As it's merely a compiler hint, the
2211 implementation is free to ignore it.
2212
2213 If Dimension flag is set to 1, a Declaration Dimension token follows.
2214
2215 If Semantic flag is set to 1, a Declaration Semantic token follows.
2216
2217 If Interpolate flag is set to 1, a Declaration Interpolate token follows.
2218
2219 If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
2220
2221 If Array flag is set to 1, a Declaration Array token follows.
2222
2223 Array Declaration
2224 ^^^^^^^^^^^^^^^^^^^^^^^^
2225
2226 Declarations can optionally have an ArrayID attribute which can be referred to by
2227 indirect addressing operands. An ArrayID of zero is reserved and treated as
2228 if no ArrayID is specified.
2229
2230 If an indirect addressing operand refers to a specific declaration by using
2231 an ArrayID, only the registers in this declaration are guaranteed to be
2232 accessed; accessing any register outside this declaration results in undefined
2233 behavior. Note that for compatibility the effective index is zero-based and
2234 not relative to the specified declaration.
2235
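The zero-based indexing rule can be illustrated with a small Python sketch.
The function and parameter names are hypothetical, and raising an exception
is only a way of making the undefined case visible; TGSI itself gives no such
guarantee.

```python
def effective_index(first, last, base, offset):
    """Resolve an indirect operand of the form FILE[ADDR + base].

    first and last bound the declaration selected by the ArrayID;
    offset is the run-time value of the address register.
    """
    index = base + offset  # counted from register 0, not from `first`
    if not first <= index <= last:
        raise IndexError("access outside the declared array is undefined")
    return index
```

For a declaration covering registers 4 to 7, an operand with base 4 and an
address value of 2 reaches register 6, while base 0 with the same address
value falls outside the declaration.
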
2236 If no ArrayID is specified with an indirect addressing operand the whole
2237 register file might be accessed by this operand. This is strongly discouraged
2238 and will prevent packing of scalar/vec2 arrays and effective alias analysis.
2239
2240 Declaration Semantic
2241 ^^^^^^^^^^^^^^^^^^^^^^^^
2242
2243   Vertex and fragment shader input and output registers may be labeled
2244   with semantic information consisting of a name and index.
2245
2246   Follows Declaration token if Semantic bit is set.
2247
2248   Since its purpose is to link a shader with other stages of the pipeline,
2249   it is valid to follow only those Declaration tokens that declare a register
2250   either in INPUT or OUTPUT file.
2251
2252   SemanticName field contains the semantic name of the register being declared.
2253   There is no default value.
2254
2255   SemanticIndex is an optional subscript that can be used to distinguish
2256   different register declarations with the same semantic name. The default value
2257   is 0.
2258
2259   The meanings of the individual semantic names are explained in the following
2260   sections.
2261
2262 TGSI_SEMANTIC_POSITION
2263 """"""""""""""""""""""
2264
2265 For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
2266 output register which contains the homogeneous vertex position in the clip
2267 space coordinate system.  After clipping, the X, Y and Z components of the
2268 vertex will be divided by the W value to get normalized device coordinates.
2269
2270 For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
2271 fragment shader input contains the fragment's window position.  The X
2272 component starts at zero and always increases from left to right.
2273 The Y component starts at zero and always increases but Y=0 may either
2274 indicate the top of the window or the bottom depending on the fragment
2275 coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
2276 The Z coordinate ranges from 0 to 1 to represent depth from the front
2277 to the back of the Z buffer.  The W component contains the reciprocal
2278 of the interpolated vertex position W component.
2279
2280 Fragment shaders may also declare an output register with
2281 TGSI_SEMANTIC_POSITION.  Only the Z component is writable.  This allows
2282 the fragment shader to change the fragment's Z position.
2283
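The divide applied to the vertex shader position, and the W value later seen
by the fragment shader, can be sketched in Python. The function names are
hypothetical, and the sketch assumes a vertex that survived clipping with a
non-zero W; viewport mapping and interpolation are not modelled.

```python
def clip_to_ndc(position):
    """Divide X, Y and Z of a clip-space position by its W component."""
    x, y, z, w = position
    return (x / w, y / w, z / w)


def fragment_w(clip_w):
    """W as read by the fragment shader: the reciprocal of the position W."""
    return 1.0 / clip_w
```

A clip-space position of ``(2.0, -4.0, 1.0, 4.0)`` yields normalized device
coordinates ``(0.5, -1.0, 0.25)`` and a fragment W of ``0.25``.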
2284
2285
2286 TGSI_SEMANTIC_COLOR
2287 """""""""""""""""""
2288
2289 For vertex shader outputs or fragment shader inputs/outputs, this
2290 label indicates that the register contains an R,G,B,A color.
2291
2292 Several shader inputs/outputs may contain colors so the semantic index
2293 is used to distinguish them.  For example, color[0] may be the diffuse
2294 color while color[1] may be the specular color.
2295
2296 This label is needed so that the flat/smooth shading can be applied
2297 to the right interpolants during rasterization.
2298
2299
2300
2301 TGSI_SEMANTIC_BCOLOR
2302 """"""""""""""""""""
2303
2304 Back-facing colors are only used for back-facing polygons, and are only valid
2305 in vertex shader outputs. After rasterization, all polygons are front-facing
2306 and COLOR and BCOLOR end up occupying the same slots in the fragment shader,
2307 so all BCOLORs effectively become regular COLORs in the fragment shader.
2308
2309
2310 TGSI_SEMANTIC_FOG
2311 """""""""""""""""
2312
2313 Vertex shader inputs and outputs and fragment shader inputs may be
2314 labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
2315 a fog coordinate in the form (F, 0, 0, 1).  Typically, the fragment
2316 shader will use the fog coordinate to compute a fog blend factor which
2317 is used to blend the normal fragment color with a constant fog color.
2318
2319 Only the first component matters when writing from the vertex shader;
2320 the driver will ensure that the coordinate is in this format when used
2321 as a fragment shader input.
2322
2323
2324 TGSI_SEMANTIC_PSIZE
2325 """""""""""""""""""
2326
2327 Vertex shader input and output registers may be labeled with
2328 TGSI_SEMANTIC_PSIZE to indicate that the register contains a point size
2329 in the form (S, 0, 0, 1).  The point size controls the width or diameter
2330 of points for rasterization.  This label cannot be used in fragment
2331 shaders.
2332
2333 When using this semantic, be sure to set the appropriate state in the
2334 :ref:`rasterizer` first.
2335
2336
2337 TGSI_SEMANTIC_TEXCOORD
2338 """"""""""""""""""""""
2339
2340 Only available if PIPE_CAP_TGSI_TEXCOORD is exposed.
2341
2342 Vertex shader outputs and fragment shader inputs may be labeled with
2343 this semantic to make them replaceable by sprite coordinates via the
2344 sprite_coord_enable state in the :ref:`rasterizer`.
2345 The semantic index permitted with this semantic is limited to <= 7.
2346
2347 If the driver does not support TEXCOORD, sprite coordinate replacement
2348 applies to inputs with the GENERIC semantic instead.
2349
2350 The intended use case for this semantic is gl_TexCoord.
2351
2352
2353 TGSI_SEMANTIC_PCOORD
2354 """"""""""""""""""""
2355
2356 Only available if PIPE_CAP_TGSI_TEXCOORD is exposed.
2357
2358 Fragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
2359 that the register contains sprite coordinates in the form (x, y, 0, 1), if
2360 the current primitive is a point and point sprites are enabled. Otherwise,
2361 the contents of the register are undefined.
2362
2363 The intended use case for this semantic is gl_PointCoord.
2364
2365
2366 TGSI_SEMANTIC_GENERIC
2367 """""""""""""""""""""
2368
2369 All vertex/fragment shader inputs/outputs not labeled with any other
2370 semantic label can be considered to be generic attributes.  Typical
2371 uses of generic inputs/outputs are texcoords and user-defined values.
2372
2373
2374 TGSI_SEMANTIC_NORMAL
2375 """"""""""""""""""""
2376
2377 Indicates that a vertex shader input is a normal vector.  This is
2378 typically only used for legacy graphics APIs.
2379
2380
2381 TGSI_SEMANTIC_FACE
2382 """"""""""""""""""
2383
2384 This label applies to fragment shader inputs only and indicates that
2385 the register contains front/back-face information of the form (F, 0,
2386 0, 1).  The first component will be positive when the fragment belongs
2387 to a front-facing polygon, and negative when the fragment belongs to a
2388 back-facing polygon.
2389
2390
2391 TGSI_SEMANTIC_EDGEFLAG
2392 """"""""""""""""""""""
2393
2394 For vertex shaders, this semantic label indicates that an input or
2395 output is a boolean edge flag.  The register layout is [F, x, x, x]
2396 where F is 0.0 or 1.0 and x = don't care.  Normally, the vertex shader
2397 simply copies the edge flag input to the edgeflag output.
2398
2399 Edge flags are used to control which lines or points are actually
2400 drawn when the polygon mode converts triangles/quads/polygons into
2401 points or lines.
2402
2403 TGSI_SEMANTIC_STENCIL
2404 """"""""""""""""""""""
2405
2406 For fragment shaders, this semantic label indicates that an output
2407 is a writable stencil reference value. Only the Y component is writable.
2408 This allows the fragment shader to change the fragment's stencilref value.
2409
2410
2411 Declaration Interpolate
2412 ^^^^^^^^^^^^^^^^^^^^^^^
2413
2414 This token is only valid for fragment shader INPUT declarations.
2415
2416 The Interpolate field specifies how the input is interpolated by
2417 the rasteriser and is one of TGSI_INTERPOLATE_*.
2418
2419 The CylindricalWrap bitfield specifies which register components
2420 should be subject to cylindrical wrapping when interpolating by the
2421 rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component
2422 should be interpolated according to cylindrical wrapping rules.
2423
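The effect of the CylindricalWrap bitfield can be sketched as below. This
example assumes Direct3D 9 style wrapping, where 0 and 1 are treated as the
same point and interpolation takes the shorter way around. The bit values,
the plain linear interpolation, and the helper names are assumptions made
for illustration only.

```python
# Assumed bit values: one bit per register component.
TGSI_CYLINDRICAL_WRAP_X = 1 << 0
TGSI_CYLINDRICAL_WRAP_Y = 1 << 1
TGSI_CYLINDRICAL_WRAP_Z = 1 << 2
TGSI_CYLINDRICAL_WRAP_W = 1 << 3


def wrapped_endpoints(a, b):
    """Shift one endpoint by 1.0 if the two are more than 0.5 apart."""
    if b - a > 0.5:
        a += 1.0
    elif a - b > 0.5:
        b += 1.0
    return a, b


def interpolate(v0, v1, t, wrap_mask):
    """Linearly interpolate two vectors, wrapping the components in wrap_mask.

    The result of a wrapped component may exceed 1.0; it names the same
    point on the cylinder as its fractional part.
    """
    out = []
    for i, (a, b) in enumerate(zip(v0, v1)):
        if wrap_mask & (1 << i):
            a, b = wrapped_endpoints(a, b)
        out.append(a + (b - a) * t)
    return tuple(out)
```

Interpolating halfway between 0.9 and 0.1 gives about 1.0 (equivalent to
0.0) with wrapping enabled, and 0.5 without it.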

Declaration Sampler View
^^^^^^^^^^^^^^^^^^^^^^^^

   Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.

   DCL SVIEW[#], resource, type(s)

   Declares a shader input sampler view and assigns it to a SVIEW[#]
   register.

   resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.

   type must be given as either 1 entry or 4 entries (one per
   component), each one of UNORM, SNORM, SINT, UINT or FLOAT.

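The constraints on a sampler view declaration can be sketched as a small
validator that emits text following the ``DCL SVIEW[#], resource, type(s)``
template. The helper ``format_sview_decl`` is hypothetical, and the
comma-separated form used for four types is an assumption.

```python
RESOURCES = ("BUFFER", "1D", "2D", "3D", "1DArray", "2DArray")
TYPES = ("UNORM", "SNORM", "SINT", "UINT", "FLOAT")


def format_sview_decl(index, resource, types):
    """Build a sampler view declaration after checking the documented rules."""
    if resource not in RESOURCES:
        raise ValueError("unknown resource: %r" % (resource,))
    if len(types) not in (1, 4):
        raise ValueError("type must have 1 or 4 entries")
    for t in types:
        if t not in TYPES:
            raise ValueError("unknown type: %r" % (t,))
    return "DCL SVIEW[%d], %s, %s" % (index, resource, ", ".join(types))
```

For example, ``format_sview_decl(0, "2D", ["FLOAT"])`` produces
``DCL SVIEW[0], 2D, FLOAT``.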

Declaration Resource
^^^^^^^^^^^^^^^^^^^^

   Follows Declaration token if file is TGSI_FILE_RESOURCE.

   DCL RES[#], resource [, WR] [, RAW]

   Declares a shader input resource and assigns it to a RES[#]
   register.

   resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
   2DArray.

   If the RAW keyword is not specified, the texture data will be
   subject to conversion, swizzling and scaling as required to yield
   the specified data type from the physical data format of the bound
   resource.

   If the RAW keyword is specified, no channel conversion will be
   performed: the values read for each of the channels (X,Y,Z,W) will
   correspond to consecutive words in the same order and format
   they're found in memory.  No element-to-address conversion will be
   performed either: the value of the provided X coordinate will be
   interpreted in byte units instead of texel units.  The result of
   accessing a misaligned address is undefined.

   Usage of the STORE opcode is only allowed if the WR (writable) flag
   is set.

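The RAW addressing rule can be illustrated with a short sketch. It assumes
that a word is 32 bits, little-endian, and that addresses must be 4-byte
aligned. None of these details are stated above, and ``raw_load`` is a
hypothetical helper.

```python
import struct


def raw_load(buffer_bytes, x):
    """Read four consecutive words (X, Y, Z, W) starting at byte offset x.

    Assumptions for this sketch: 32-bit little-endian words and 4-byte
    alignment. A misaligned access is undefined in TGSI, so the sketch
    raises instead of returning a value.
    """
    if x % 4 != 0:
        raise ValueError("misaligned address: result is undefined")
    return struct.unpack_from("<4I", buffer_bytes, x)
```

Note that ``x`` counts bytes here. Without RAW the same coordinate would
count texels, and the values would go through format conversion.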

Properties
^^^^^^^^^^

Properties are general directives that apply to the whole TGSI program.

FS_COORD_ORIGIN
"""""""""""""""

Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
The default value is UPPER_LEFT.

If UPPER_LEFT, the position will be (0,0) at the upper left corner and
increase downward and rightward.
If LOWER_LEFT, the position will be (0,0) at the lower left corner and
increase upward and rightward.

OpenGL defaults to LOWER_LEFT, and is configurable with the
GL_ARB_fragment_coord_conventions extension.

DirectX 9/10 use UPPER_LEFT.

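Converting a position between the two origin conventions amounts to
mirroring the Y coordinate. The sketch below assumes the HALF_INTEGER pixel
center convention and a known framebuffer height in pixels. The helper name
``flip_origin`` is hypothetical.

```python
def flip_origin(y, height):
    """Convert y between UPPER_LEFT and LOWER_LEFT (HALF_INTEGER centers).

    The mapping is its own inverse, so the same function converts in
    both directions. The x coordinate is unaffected.
    """
    return height - y
```

With a height of 480, the top row (y = 0.5 for UPPER_LEFT) maps to
y = 479.5 for LOWER_LEFT.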

FS_COORD_PIXEL_CENTER
"""""""""""""""""""""

Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
The default value is HALF_INTEGER.

If HALF_INTEGER, the fractional part of the position will be 0.5.
If INTEGER, the fractional part of the position will be 0.0.

Note that this does not affect the set of fragments generated by
rasterization, which is instead controlled by half_pixel_center in the
rasterizer.

OpenGL defaults to HALF_INTEGER, and is configurable with the
GL_ARB_fragment_coord_conventions extension.

DirectX 9 uses INTEGER.
DirectX 10 uses HALF_INTEGER.

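The two pixel center conventions differ only in a constant offset, as this
small sketch shows. The helper name ``fragment_position`` and the use of
integer column and row indices are assumptions for illustration.

```python
def fragment_position(col, row, pixel_center="HALF_INTEGER"):
    """Return the (x, y) position reported for the pixel at (col, row)."""
    offset = {"HALF_INTEGER": 0.5, "INTEGER": 0.0}[pixel_center]
    return (col + offset, row + offset)
```

The pixel in column 3, row 7 is reported at (3.5, 7.5) under HALF_INTEGER
and at (3.0, 7.0) under INTEGER.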

FS_COLOR0_WRITES_ALL_CBUFS
""""""""""""""""""""""""""

Specifies that writes to the fragment shader color 0 are replicated to all
bound cbufs. This facilitates OpenGL's gl_FragColor output versus
gl_FragData[0]: gl_FragData[0] is directed to a single color buffer, but
gl_FragColor is broadcast.

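The broadcast behavior can be sketched as a routing function from shader
color outputs to bound color buffers. The helper ``route_color_outputs``
and its data layout are hypothetical and only illustrate the rule.

```python
def route_color_outputs(outputs, num_cbufs, color0_writes_all_cbufs):
    """Map shader color outputs to color buffers.

    outputs maps a color output index to an RGBA tuple. The result has
    one entry per bound cbuf, with None for buffers left unwritten.
    """
    if color0_writes_all_cbufs:
        # Color 0 is replicated to every bound color buffer.
        return [outputs[0]] * num_cbufs
    return [outputs.get(i) for i in range(num_cbufs)]
```

With three bound cbufs and only color 0 written, the property decides
whether all three buffers or just the first one receive the value.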

VS_PROHIBIT_UCPS
""""""""""""""""

If this property is set on the program bound to the shader stage before the
fragment shader, user clip planes (UCPs) should have no effect (be disabled)
even if that shader does not write to any clip distance outputs and the
rasterizer's clip_plane_enable is non-zero.
This property is only supported by drivers that also support shader clip
distance outputs.
This is useful for APIs that don't have UCPs and where clip distances written
by a shader cannot be disabled.


Texture Sampling and Texture Formats
------------------------------------

This table shows how texture image components are returned as (x,y,z,w) tuples
by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
well.

+--------------------+--------------+--------------------+--------------+
| Texture Components | Gallium      | OpenGL             | Direct3D 9   |
+====================+==============+====================+==============+
| R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
+--------------------+--------------+--------------------+--------------+
| RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
+--------------------+--------------+--------------------+--------------+
| RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
+--------------------+--------------+--------------------+--------------+
| RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
+--------------------+--------------+--------------------+--------------+
| A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
+--------------------+--------------+--------------------+--------------+
| L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
+--------------------+--------------+--------------------+--------------+
| LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
+--------------------+--------------+--------------------+--------------+
| I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
+--------------------+--------------+--------------------+--------------+
| UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
|                    |              | [#envmap-bumpmap]_ |              |
+--------------------+--------------+--------------------+--------------+
| Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
|                    |              | [#depth-tex-mode]_ |              |
+--------------------+--------------+--------------------+--------------+
| S                  | (s, s, s, s) | unknown            | unknown      |
+--------------------+--------------+--------------------+--------------+

.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
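
The Gallium column of the table can be expressed as a lookup of expansion
patterns, sketched below. The names ``GALLIUM_EXPANSION`` and
``expand_texel`` are hypothetical, and the UV and Z rows are left out
because the table marks them as TBD.

```python
# One pattern per texture component layout, taken from the Gallium column.
# Letters name stored components; "0" and "1" are constants.
GALLIUM_EXPANSION = {
    "R": "r001",
    "RG": "rg01",
    "RGB": "rgb1",
    "RGBA": "rgba",
    "A": "000a",
    "L": "lll1",
    "LA": "llla",
    "I": "iiii",
    "S": "ssss",
}


def expand_texel(components, **values):
    """Expand stored texel components into the (x, y, z, w) tuple."""
    pattern = GALLIUM_EXPANSION[components]
    return tuple(float(c) if c in "01" else values[c] for c in pattern)
```

For instance, ``expand_texel("RG", r=0.25, g=0.5)`` returns
``(0.25, 0.5, 0.0, 1.0)``.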