src/gallium/docs/source/tgsi.rst

   1 TGSI
   2 ====
   3
   4 TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
   5 for describing shaders. Since Gallium is inherently shaderful, shaders are
   6 an important part of the API. TGSI is the only intermediate representation
   7 used by all drivers.
   8
   9 Instruction Set
  10 ---------------
  11
  12 From GL_NV_vertex_program
  13 ^^^^^^^^^^^^^^^^^^^^^^^^^
  14
  15
  16 ARL - Address Register Load
  17
  18 .. math::
  19
  20   dst.x = \lfloor src.x\rfloor
  21
  22   dst.y = \lfloor src.y\rfloor
  23
  24   dst.z = \lfloor src.z\rfloor
  25
  26   dst.w = \lfloor src.w\rfloor
  27
  28
  29 MOV - Move
  30
  31 .. math::
  32
  33   dst.x = src.x
  34
  35   dst.y = src.y
  36
  37   dst.z = src.z
  38
  39   dst.w = src.w
  40
  41
  42 LIT - Light Coefficients
  43
  44 .. math::
  45
  46   dst.x = 1
  47
  48   dst.y = max(src.x, 0)
  49
  dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0
  51
  52   dst.w = 1
  53
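As a rough illustration, the LIT formulas above can be written as a small
Python function operating on a 4-tuple ``(x, y, z, w)``. The names ``lit`` and
``clamp`` are illustrative helpers for this sketch only and are not part of any
Gallium interface; the case of a zero base with a negative exponent is left to
Python and is not specified here.

.. code-block:: python

  def clamp(x, lo, hi):
      """Clamp x to the range [lo, hi]."""
      return lo if x < lo else hi if x > hi else x


  def lit(src):
      """Evaluate LIT on a 4-tuple (x, y, z, w); src.z is unused."""
      x, y, _, w = src
      # The power term is only evaluated when src.x is positive.
      # Note: Python raises ZeroDivisionError for 0.0 ** (negative exponent).
      power = max(y, 0.0) ** clamp(w, -128.0, 128.0) if x > 0 else 0.0
      return (1.0, max(x, 0.0), power, 1.0)


  lit_result = lit((0.5, 0.5, 0.0, 2.0))

Here ``src.x`` and ``src.y`` would typically carry two dot products from a
lighting computation and ``src.w`` the exponent, but that usage is an
assumption of this example rather than something the formulas require.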
  54
  55 RCP - Reciprocal
  56
  57 .. math::
  58
  59   dst.x = \frac{1}{src.x}
  60
  61   dst.y = \frac{1}{src.x}
  62
  63   dst.z = \frac{1}{src.x}
  64
  65   dst.w = \frac{1}{src.x}
  66
  67
  68 RSQ - Reciprocal Square Root
  69
  70 .. math::
  71
  72   dst.x = \frac{1}{\sqrt{|src.x|}}
  73
  74   dst.y = \frac{1}{\sqrt{|src.x|}}
  75
  76   dst.z = \frac{1}{\sqrt{|src.x|}}
  77
  78   dst.w = \frac{1}{\sqrt{|src.x|}}
  79
  80
  81 EXP - Approximate Exponential Base 2
  82
  83 .. math::
  84
  85   dst.x = 2^{\lfloor src.x\rfloor}
  86
  87   dst.y = src.x - \lfloor src.x\rfloor
  88
  89   dst.z = 2^{src.x}
  90
  91   dst.w = 1
  92
  93
  94 LOG - Approximate Logarithm Base 2
  95
  96 .. math::
  97
  98   dst.x = \lfloor\log_2{|src.x|}\rfloor
  99
 100   dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
 101
 102   dst.z = \log_2{|src.x|}
 103
 104   dst.w = 1
 105
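The EXP and LOG formulas above both split a value into an integer part and a
fractional part. The following Python sketch (function names are illustrative
only) makes the relationship between the components explicit: for EXP,
``dst.x * 2**dst.y`` equals ``dst.z``, and for LOG, ``dst.x + log2(dst.y)``
equals ``dst.z``. Full floating-point precision is used here, so the
"approximate" nature of the real opcodes is not modelled.

.. code-block:: python

  import math


  def exp_approx(src):
      """EXP: split 2**src.x into integer and fractional exponent parts."""
      x = src[0]
      whole = math.floor(x)
      return (2.0 ** whole, x - whole, 2.0 ** x, 1.0)


  def log_approx(src):
      """LOG: split log2(|src.x|) into exponent and mantissa (src.x != 0)."""
      a = abs(src[0])
      exponent = math.floor(math.log2(a))
      return (float(exponent), a / 2.0 ** exponent, math.log2(a), 1.0)


  e = exp_approx((2.5, 0.0, 0.0, 0.0))
  l = log_approx((-10.0, 0.0, 0.0, 0.0))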
 106
 107 MUL - Multiply
 108
 109 .. math::
 110
 111   dst.x = src0.x \times src1.x
 112
 113   dst.y = src0.y \times src1.y
 114
 115   dst.z = src0.z \times src1.z
 116
 117   dst.w = src0.w \times src1.w
 118
 119
 120 ADD - Add
 121
 122 .. math::
 123
 124   dst.x = src0.x + src1.x
 125
 126   dst.y = src0.y + src1.y
 127
 128   dst.z = src0.z + src1.z
 129
 130   dst.w = src0.w + src1.w
 131
 132
 133 DP3 - 3-component Dot Product
 134
 135 .. math::
 136
 137   dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
 138
 139   dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
 140
 141   dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
 142
 143   dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
 144
 145
 146 DP4 - 4-component Dot Product
 147
 148 .. math::
 149
 150   dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
 151
 152   dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
 153
 154   dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
 155
 156   dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
 157
 158
 159 DST - Distance Vector
 160
 161 .. math::
 162
 163   dst.x = 1
 164
 165   dst.y = src0.y \times src1.y
 166
 167   dst.z = src0.z
 168
 169   dst.w = src1.w
 170
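The component selection in DST makes more sense with its traditional inputs in
mind. In the GL vertex program extensions, the sources are usually
``(_, d*d, d*d, _)`` and ``(_, 1/d, _, 1/d)`` for some distance ``d``, giving
``(1, d, d*d, 1/d)``. That convention is an assumption of the sketch below and
is not stated in this document; the function name is illustrative only.

.. code-block:: python

  def dst(src0, src1):
      """Evaluate DST on two 4-tuples (x, y, z, w)."""
      return (1.0, src0[1] * src1[1], src0[2], src1[3])


  d = 2.0
  # Unused components are filled with 0.0 purely as placeholders.
  dst_result = dst((0.0, d * d, d * d, 0.0), (0.0, 1.0 / d, 0.0, 1.0 / d))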
 171
 172 MIN - Minimum
 173
 174 .. math::
 175
 176   dst.x = min(src0.x, src1.x)
 177
 178   dst.y = min(src0.y, src1.y)
 179
 180   dst.z = min(src0.z, src1.z)
 181
 182   dst.w = min(src0.w, src1.w)
 183
 184
 185 MAX - Maximum
 186
 187 .. math::
 188
 189   dst.x = max(src0.x, src1.x)
 190
 191   dst.y = max(src0.y, src1.y)
 192
 193   dst.z = max(src0.z, src1.z)
 194
 195   dst.w = max(src0.w, src1.w)
 196
 197
 198 SLT - Set On Less Than
 199
 200 .. math::
 201
 202   dst.x = (src0.x < src1.x) ? 1 : 0
 203
 204   dst.y = (src0.y < src1.y) ? 1 : 0
 205
 206   dst.z = (src0.z < src1.z) ? 1 : 0
 207
 208   dst.w = (src0.w < src1.w) ? 1 : 0
 209
 210
SGE - Set On Greater Than Or Equal
 212
 213 .. math::
 214
 215   dst.x = (src0.x >= src1.x) ? 1 : 0
 216
 217   dst.y = (src0.y >= src1.y) ? 1 : 0
 218
 219   dst.z = (src0.z >= src1.z) ? 1 : 0
 220
 221   dst.w = (src0.w >= src1.w) ? 1 : 0
 222
 223
 224 MAD - Multiply And Add
 225
 226 .. math::
 227
 228   dst.x = src0.x \times src1.x + src2.x
 229
 230   dst.y = src0.y \times src1.y + src2.y
 231
 232   dst.z = src0.z \times src1.z + src2.z
 233
 234   dst.w = src0.w \times src1.w + src2.w
 235
 236
 237 SUB - Subtract
 238
 239 .. math::
 240
 241   dst.x = src0.x - src1.x
 242
 243   dst.y = src0.y - src1.y
 244
 245   dst.z = src0.z - src1.z
 246
 247   dst.w = src0.w - src1.w
 248
 249
 250 LRP - Linear Interpolate
 251
 252 .. math::
 253
 254   dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
 255
 256   dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
 257
 258   dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
 259
 260   dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
 261
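Note that in LRP the first source is the blend weight, applied to the second
source; the third source receives the complementary weight. A minimal Python
sketch of the formulas above (the function name is illustrative only):

.. code-block:: python

  def lrp(src0, src1, src2):
      """Per-component blend: src0 is the weight applied to src1."""
      return tuple(w * a + (1.0 - w) * b
                   for w, a, b in zip(src0, src1, src2))


  blended = lrp((0.0, 0.25, 0.5, 1.0), (8.0,) * 4, (4.0,) * 4)

A weight of 1 selects ``src1`` and a weight of 0 selects ``src2``.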
 262
 263 CND - Condition
 264
 265 .. math::
 266
 267   dst.x = (src2.x > 0.5) ? src0.x : src1.x
 268
 269   dst.y = (src2.y > 0.5) ? src0.y : src1.y
 270
 271   dst.z = (src2.z > 0.5) ? src0.z : src1.z
 272
 273   dst.w = (src2.w > 0.5) ? src0.w : src1.w
 274
 275
 276 DP2A - 2-component Dot Product And Add
 277
 278 .. math::
 279
 280   dst.x = src0.x \times src1.x + src0.y \times src1.y + src2.x
 281
 282   dst.y = src0.x \times src1.x + src0.y \times src1.y + src2.x
 283
 284   dst.z = src0.x \times src1.x + src0.y \times src1.y + src2.x
 285
 286   dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x
 287
 288
 289 FRAC - Fraction
 290
 291 .. math::
 292
 293   dst.x = src.x - \lfloor src.x\rfloor
 294
 295   dst.y = src.y - \lfloor src.y\rfloor
 296
 297   dst.z = src.z - \lfloor src.z\rfloor
 298
 299   dst.w = src.w - \lfloor src.w\rfloor
 300
 301
 302 CLAMP - Clamp
 303
 304 .. math::
 305
 306   dst.x = clamp(src0.x, src1.x, src2.x)
 307
 308   dst.y = clamp(src0.y, src1.y, src2.y)
 309
 310   dst.z = clamp(src0.z, src1.z, src2.z)
 311
 312   dst.w = clamp(src0.w, src1.w, src2.w)
 313
 314
 315 FLR - Floor
 316
 317 This is identical to ARL.
 318
 319 .. math::
 320
 321   dst.x = \lfloor src.x\rfloor
 322
 323   dst.y = \lfloor src.y\rfloor
 324
 325   dst.z = \lfloor src.z\rfloor
 326
 327   dst.w = \lfloor src.w\rfloor
 328
 329
 330 ROUND - Round
 331
 332 .. math::
 333
 334   dst.x = round(src.x)
 335
 336   dst.y = round(src.y)
 337
 338   dst.z = round(src.z)
 339
 340   dst.w = round(src.w)
 341
 342
 343 EX2 - Exponential Base 2
 344
 345 .. math::
 346
 347   dst.x = 2^{src.x}
 348
 349   dst.y = 2^{src.x}
 350
 351   dst.z = 2^{src.x}
 352
 353   dst.w = 2^{src.x}
 354
 355
 356 LG2 - Logarithm Base 2
 357
 358 .. math::
 359
 360   dst.x = \log_2{src.x}
 361
 362   dst.y = \log_2{src.x}
 363
 364   dst.z = \log_2{src.x}
 365
 366   dst.w = \log_2{src.x}
 367
 368
 369 POW - Power
 370
 371 .. math::
 372
 373   dst.x = src0.x^{src1.x}
 374
 375   dst.y = src0.x^{src1.x}
 376
 377   dst.z = src0.x^{src1.x}
 378
 379   dst.w = src0.x^{src1.x}
 380
 381 XPD - Cross Product
 382
 383 .. math::
 384
 385   dst.x = src0.y \times src1.z - src1.y \times src0.z
 386
 387   dst.y = src0.z \times src1.x - src1.z \times src0.x
 388
 389   dst.z = src0.x \times src1.y - src1.x \times src0.y
 390
 391   dst.w = 1
 392
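The XPD formulas above are the usual right-handed cross product of the ``xyz``
parts, with ``w`` forced to 1. A small Python sketch, with an illustrative
function name:

.. code-block:: python

  def xpd(src0, src1):
      """Cross product of the xyz parts; w is set to 1."""
      ax, ay, az = src0[:3]
      bx, by, bz = src1[:3]
      return (ay * bz - by * az,
              az * bx - bz * ax,
              ax * by - bx * ay,
              1.0)


  x_axis = (1.0, 0.0, 0.0, 0.0)
  y_axis = (0.0, 1.0, 0.0, 0.0)
  z_from_xy = xpd(x_axis, y_axis)

Swapping the operands negates the ``xyz`` result but leaves ``w`` at 1.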
 393
 394 ABS - Absolute
 395
 396 .. math::
 397
 398   dst.x = |src.x|
 399
 400   dst.y = |src.y|
 401
 402   dst.z = |src.z|
 403
 404   dst.w = |src.w|
 405
 406
 407 RCC - Reciprocal Clamped
 408
 409 XXX cleanup on aisle three
 410
 411 .. math::
 412
 413   dst.x = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
 414
 415   dst.y = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
 416
 417   dst.z = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
 418
 419   dst.w = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
 420
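The RCC formulas above clamp the magnitude of the reciprocal while keeping its
sign. The Python sketch below takes the two bounds verbatim from those
formulas; the names ``RCC_MIN``, ``RCC_MAX``, ``clamp`` and ``rcc`` are
illustrative only, and a zero input is not handled because its result is not
described here.

.. code-block:: python

  RCC_MIN = 5.42101e-20   # smallest magnitude, as written in the formulas
  RCC_MAX = 1.884467e+19  # largest magnitude, as written in the formulas


  def clamp(x, lo, hi):
      """Clamp x to the range [lo, hi]."""
      return lo if x < lo else hi if x > hi else x


  def rcc(src):
      """Clamped reciprocal of src.x, replicated to all four components."""
      r = 1.0 / src[0]  # raises ZeroDivisionError for a zero input
      if r > 0:
          r = clamp(r, RCC_MIN, RCC_MAX)
      else:
          r = clamp(r, -RCC_MAX, -RCC_MIN)
      return (r, r, r, r)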
 421
 422 DPH - Homogeneous Dot Product
 423
 424 .. math::
 425
 426   dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
 427
 428   dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
 429
 430   dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
 431
 432   dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w
 433
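DP3, DP4 and DPH differ only in how the ``w`` components take part. The Python
sketch below (helper names are illustrative only) returns the scalar that each
opcode replicates into all four destination components, and shows that DPH
behaves like DP4 with ``src0.w`` treated as 1.

.. code-block:: python

  def dp3(src0, src1):
      """Dot product of the xyz parts."""
      return sum(a * b for a, b in zip(src0[:3], src1[:3]))


  def dp4(src0, src1):
      """Dot product of all four components."""
      return sum(a * b for a, b in zip(src0, src1))


  def dph(src0, src1):
      """Homogeneous dot product: src0.w is ignored and src1.w is added."""
      return dp3(src0, src1) + src1[3]


  a = (1.0, 2.0, 3.0, 99.0)
  b = (4.0, 5.0, 6.0, 7.0)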
 434
 435 COS - Cosine
 436
 437 .. math::
 438
 439   dst.x = \cos{src.x}
 440
 441   dst.y = \cos{src.x}
 442
 443   dst.z = \cos{src.x}
 444
 445   dst.w = \cos{src.x}
 446
 447
 448 DDX - Derivative Relative To X
 449
 450 .. math::
 451
 452   dst.x = partialx(src.x)
 453
 454   dst.y = partialx(src.y)
 455
 456   dst.z = partialx(src.z)
 457
 458   dst.w = partialx(src.w)
 459
 460
 461 DDY - Derivative Relative To Y
 462
 463 .. math::
 464
 465   dst.x = partialy(src.x)
 466
 467   dst.y = partialy(src.y)
 468
 469   dst.z = partialy(src.z)
 470
 471   dst.w = partialy(src.w)
 472
 473
 474 KILP - Predicated Discard
 475
 476   discard
 477
 478
 479 PK2H - Pack Two 16-bit Floats
 480
 481   TBD
 482
 483
 484 PK2US - Pack Two Unsigned 16-bit Scalars
 485
 486   TBD
 487
 488
 489 PK4B - Pack Four Signed 8-bit Scalars
 490
 491   TBD
 492
 493
 494 PK4UB - Pack Four Unsigned 8-bit Scalars
 495
 496   TBD
 497
 498
 499 RFL - Reflection Vector
 500
 501 .. math::
 502
 503   dst.x = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.x - src1.x
 504
 505   dst.y = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.y - src1.y
 506
 507   dst.z = 2 \times (src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z) / (src0.x \times src0.x + src0.y \times src0.y + src0.z \times src0.z) \times src0.z - src1.z
 508
 509   dst.w = 1
 510
 511 Considered for removal.
 512
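The RFL formulas above reflect ``src1`` about the axis given by ``src0``;
dividing by the squared length of ``src0`` means the axis does not need to be
normalised. A Python sketch of this reading, with an illustrative function
name and no handling of a zero-length axis:

.. code-block:: python

  def rfl(src0, src1):
      """Reflect src1 about the axis src0 (xyz only); w is set to 1."""
      n, v = src0[:3], src1[:3]
      n_dot_v = sum(a * b for a, b in zip(n, v))
      n_dot_n = sum(a * a for a in n)  # must be non-zero
      scale = 2.0 * n_dot_v / n_dot_n
      return tuple(scale * a - b for a, b in zip(n, v)) + (1.0,)


  reflected = rfl((0.0, 1.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0))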
 513
 514 SEQ - Set On Equal
 515
 516 .. math::
 517
  dst.x = (src0.x == src1.x) ? 1 : 0

  dst.y = (src0.y == src1.y) ? 1 : 0

  dst.z = (src0.z == src1.z) ? 1 : 0

  dst.w = (src0.w == src1.w) ? 1 : 0
 522
 523
 524 SFL - Set On False
 525
 526 .. math::
 527
  dst.x = 0

  dst.y = 0

  dst.z = 0

  dst.w = 0
 532
 533 Considered for removal.
 534
 535 SGT - Set On Greater Than
 536
 537 .. math::
 538
  dst.x = (src0.x > src1.x) ? 1 : 0

  dst.y = (src0.y > src1.y) ? 1 : 0

  dst.z = (src0.z > src1.z) ? 1 : 0

  dst.w = (src0.w > src1.w) ? 1 : 0
 543
 544
 545 SIN - Sine
 546
 547 .. math::
 548
 549   dst.x = \sin{src.x}
 550
 551   dst.y = \sin{src.x}
 552
 553   dst.z = \sin{src.x}
 554
 555   dst.w = \sin{src.x}
 556
 557
SLE - Set On Less Than Or Equal
 559
 560 .. math::
 561
  dst.x = (src0.x <= src1.x) ? 1 : 0

  dst.y = (src0.y <= src1.y) ? 1 : 0

  dst.z = (src0.z <= src1.z) ? 1 : 0

  dst.w = (src0.w <= src1.w) ? 1 : 0
 566
 567
 568 SNE - Set On Not Equal
 569
 570 .. math::
 571
  dst.x = (src0.x != src1.x) ? 1 : 0

  dst.y = (src0.y != src1.y) ? 1 : 0

  dst.z = (src0.z != src1.z) ? 1 : 0

  dst.w = (src0.w != src1.w) ? 1 : 0
 576
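The "set on" opcodes (SLT, SGE, SEQ, SGT, SLE, SNE) share one pattern: a
per-component comparison that writes 1 or 0. A Python sketch of that pattern,
using the standard ``operator`` module; ``set_on`` is an illustrative name,
not an existing function.

.. code-block:: python

  import operator


  def set_on(compare, src0, src1):
      """Write 1.0 where compare(a, b) holds per component, else 0.0."""
      return tuple(1.0 if compare(a, b) else 0.0
                   for a, b in zip(src0, src1))


  a = (1.0, 2.0, 3.0, 4.0)
  b = (2.0, 2.0, 2.0, 2.0)
  slt = set_on(operator.lt, a, b)
  sge = set_on(operator.ge, a, b)
  seq = set_on(operator.eq, a, b)
  sne = set_on(operator.ne, a, b)

Because the results are ordinary numbers rather than booleans, they can be fed
straight into MUL or LRP as masks.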
 577
 578 STR - Set On True
 579
 580 .. math::
 581
  dst.x = 1

  dst.y = 1

  dst.z = 1

  dst.w = 1
 586
 587
 588 TEX - Texture Lookup
 589
 590   TBD
 591
 592
 593 TXD - Texture Lookup with Derivatives
 594
 595   TBD
 596
 597
 598 TXP - Projective Texture Lookup
 599
 600   TBD
 601
 602
 603 UP2H - Unpack Two 16-Bit Floats
 604
 605   TBD
 606
 607   Considered for removal.
 608
 609 UP2US - Unpack Two Unsigned 16-Bit Scalars
 610
 611   TBD
 612
 613   Considered for removal.
 614
 615 UP4B - Unpack Four Signed 8-Bit Values
 616
 617   TBD
 618
 619   Considered for removal.
 620
 621 UP4UB - Unpack Four Unsigned 8-Bit Scalars
 622
 623   TBD
 624
 625   Considered for removal.
 626
 627 X2D - 2D Coordinate Transformation
 628
 629 .. math::
 630
  dst.x = src0.x + src1.x \times src2.x + src1.y \times src2.y

  dst.y = src0.y + src1.x \times src2.z + src1.y \times src2.w

  dst.z = src0.x + src1.x \times src2.x + src1.y \times src2.y

  dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w
 635
 636 Considered for removal.
 637
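One way to read the X2D formulas above is as a 2D affine transform: ``src2``
holds a 2x2 matrix in row order ``(m00, m01, m10, m11)``, ``src1.xy`` is the
point and ``src0.xy`` is the offset. That interpretation, and the function
name, are assumptions of this Python sketch.

.. code-block:: python

  def x2d(src0, src1, src2):
      """Apply offset + 2x2 matrix to src1.xy; the result is repeated in zw."""
      x = src0[0] + src1[0] * src2[0] + src1[1] * src2[1]
      y = src0[1] + src1[0] * src2[2] + src1[1] * src2[3]
      return (x, y, x, y)


  transformed = x2d((10.0, 20.0, 0.0, 0.0),
                    (1.0, 2.0, 0.0, 0.0),
                    (3.0, 4.0, 5.0, 6.0))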
 638
 639 From GL_NV_vertex_program2
 640 ^^^^^^^^^^^^^^^^^^^^^^^^^^
 641
 642
 643 ARA - Address Register Add
 644
 645   TBD
 646
 647   Considered for removal.
 648
 649 ARR - Address Register Load With Round
 650
 651 .. math::
 652
 653   dst.x = round(src.x)
 654
 655   dst.y = round(src.y)
 656
 657   dst.z = round(src.z)
 658
 659   dst.w = round(src.w)
 660
 661
 662 BRA - Branch
 663
 664   pc = target
 665
 666   Considered for removal.
 667
 668 CAL - Subroutine Call
 669
 670   push(pc)
 671   pc = target
 672
 673
 674 RET - Subroutine Call Return
 675
 676   pc = pop()
 677
  Potential restrictions:

  * Only occurs at end of function.
 680
 681 SSG - Set Sign
 682
 683 .. math::
 684
 685   dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
 686
 687   dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
 688
 689   dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
 690
 691   dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
 692
 693
 694 CMP - Compare
 695
 696 .. math::
 697
 698   dst.x = (src0.x < 0) ? src1.x : src2.x
 699
 700   dst.y = (src0.y < 0) ? src1.y : src2.y
 701
 702   dst.z = (src0.z < 0) ? src1.z : src2.z
 703
 704   dst.w = (src0.w < 0) ? src1.w : src2.w
 705
 706
 707 KIL - Conditional Discard
 708
  if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
    discard
  endif
 714
 715
 716 SCS - Sine Cosine
 717
 718 .. math::
 719
 720   dst.x = \cos{src.x}
 721
 722   dst.y = \sin{src.x}
 723
 724   dst.z = 0
 725
  dst.w = 1
 727
 728
 729 TXB - Texture Lookup With Bias
 730
 731   TBD
 732
 733
 734 NRM - 3-component Vector Normalise
 735
 736 .. math::
 737
  dst.x = src.x / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}

  dst.y = src.y / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}

  dst.z = src.z / \sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z}
 743
 744   dst.w = 1
 745
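The Python sketch below assumes the conventional meaning of normalisation for
NRM: each of ``x``, ``y`` and ``z`` is divided by the vector length, that is,
by the square root of the sum of squares. The function name is illustrative
only, and a zero-length input is not handled.

.. code-block:: python

  import math


  def nrm(src):
      """Scale the xyz part of src to unit length; w is set to 1."""
      x, y, z = src[:3]
      length = math.sqrt(x * x + y * y + z * z)  # must be non-zero
      return (x / length, y / length, z / length, 1.0)


  unit = nrm((3.0, 0.0, 4.0, 9.0))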
 746
 747 DIV - Divide
 748
 749 .. math::
 750
 751   dst.x = \frac{src0.x}{src1.x}
 752
 753   dst.y = \frac{src0.y}{src1.y}
 754
 755   dst.z = \frac{src0.z}{src1.z}
 756
 757   dst.w = \frac{src0.w}{src1.w}
 758
 759
 760 DP2 - 2-component Dot Product
 761
 762 .. math::
 763
 764   dst.x = src0.x \times src1.x + src0.y \times src1.y
 765
 766   dst.y = src0.x \times src1.x + src0.y \times src1.y
 767
 768   dst.z = src0.x \times src1.x + src0.y \times src1.y
 769
 770   dst.w = src0.x \times src1.x + src0.y \times src1.y
 771
 772
 773 TXL - Texture Lookup With LOD
 774
 775   TBD
 776
 777
 778 BRK - Break
 779
 780   TBD
 781
 782
 783 IF - If
 784
 785   TBD
 786
 787
 788 BGNFOR - Begin a For-Loop
 789
 790   dst.x = floor(src.x)
 791   dst.y = floor(src.y)
 792   dst.z = floor(src.z)
 793
 794   if (dst.y <= 0)
 795     pc = [matching ENDFOR] + 1
 796   endif
 797
 798   Note: The destination must be a loop register.
 799         The source must be a constant register.
 800
 801   Considered for cleanup / removal.
 802
 803
 804 REP - Repeat
 805
 806   TBD
 807
 808
 809 ELSE - Else
 810
 811   TBD
 812
 813
 814 ENDIF - End If
 815
 816   TBD
 817
 818
 819 ENDFOR - End a For-Loop
 820
 821   dst.x = dst.x + dst.z
 822   dst.y = dst.y - 1.0
 823
 824   if (dst.y > 0)
 825     pc = [matching BGNFOR instruction] + 1
 826   endif
 827
 828   Note: The destination must be a loop register.
 829
 830   Considered for cleanup / removal.
 831
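Taken together, the BGNFOR and ENDFOR pseudocode describes a counted loop in
which the loop register holds a running value in ``x``, the remaining trip
count in ``y`` and a step in ``z``. The Python sketch below simulates that
behaviour; the function name and the callback are illustrative, and the
reading of the three components is an interpretation of the pseudocode above.

.. code-block:: python

  import math


  def run_for_loop(src, body):
      """Simulate BGNFOR ... ENDFOR with src = (start, count, step, unused)."""
      # BGNFOR: load the loop register from the floored source.
      x, y, z = (math.floor(c) for c in src[:3])
      if y <= 0:
          return  # jump past the matching ENDFOR
      while True:
          body(x)
          # ENDFOR: advance the running value, decrement the trip count.
          x += z
          y -= 1
          if y <= 0:
              break  # fall through instead of jumping back


  seen = []
  run_for_loop((0.0, 3.0, 2.0, 0.0), seen.append)

With a start of 0, a count of 3 and a step of 2, the body observes the values
0, 2 and 4; a count of 0 or less skips the body entirely.
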
 832 ENDREP - End Repeat
 833
 834   TBD
 835
 836
 837 PUSHA - Push Address Register On Stack
 838
 839   push(src.x)
 840   push(src.y)
 841   push(src.z)
 842   push(src.w)
 843
 844   Considered for cleanup / removal.
 845
 846 POPA - Pop Address Register From Stack
 847
 848   dst.w = pop()
 849   dst.z = pop()
 850   dst.y = pop()
 851   dst.x = pop()
 852
 853   Considered for cleanup / removal.
 854
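PUSHA pushes the components in ``x, y, z, w`` order and POPA pops them in the
reverse order, so a PUSHA followed by a POPA restores the register unchanged.
A Python sketch using a list as the stack (all names are illustrative):

.. code-block:: python

  stack = []


  def pusha(src):
      """Push x, y, z, w in that order."""
      for component in src:
          stack.append(component)


  def popa():
      """Pop w, z, y, x, restoring the original component order."""
      w = stack.pop()
      z = stack.pop()
      y = stack.pop()
      x = stack.pop()
      return (x, y, z, w)


  pusha((1, 2, 3, 4))
  restored = popa()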
 855
 856 From GL_NV_gpu_program4
 857 ^^^^^^^^^^^^^^^^^^^^^^^^
 858
Support for these opcodes is indicated by a special pipe capability bit (TBD).
 860
 861 CEIL - Ceiling
 862
 863 .. math::
 864
 865   dst.x = \lceil src.x\rceil
 866
 867   dst.y = \lceil src.y\rceil
 868
 869   dst.z = \lceil src.z\rceil
 870
 871   dst.w = \lceil src.w\rceil
 872
 873
 874 I2F - Integer To Float
 875
 876 .. math::
 877
 878   dst.x = (float) src.x
 879
 880   dst.y = (float) src.y
 881
 882   dst.z = (float) src.z
 883
 884   dst.w = (float) src.w
 885
 886
 887 NOT - Bitwise Not
 888
 889 .. math::
 890
  dst.x = \sim src.x

  dst.y = \sim src.y

  dst.z = \sim src.z

  dst.w = \sim src.w
 898
 899
 900 TRUNC - Truncate
 901
 902 .. math::
 903
 904   dst.x = trunc(src.x)
 905
 906   dst.y = trunc(src.y)
 907
 908   dst.z = trunc(src.z)
 909
 910   dst.w = trunc(src.w)
 911
 912
 913 SHL - Shift Left
 914
 915 .. math::
 916
 917   dst.x = src0.x << src1.x
 918
 919   dst.y = src0.y << src1.x
 920
 921   dst.z = src0.z << src1.x
 922
 923   dst.w = src0.w << src1.x
 924
 925
 926 SHR - Shift Right
 927
 928 .. math::
 929
 930   dst.x = src0.x >> src1.x
 931
 932   dst.y = src0.y >> src1.x
 933
 934   dst.z = src0.z >> src1.x
 935
 936   dst.w = src0.w >> src1.x
 937
 938
 939 AND - Bitwise And
 940
 941 .. math::
 942
  dst.x = src0.x \& src1.x

  dst.y = src0.y \& src1.y

  dst.z = src0.z \& src1.z

  dst.w = src0.w \& src1.w
 950
 951
 952 OR - Bitwise Or
 953
 954 .. math::
 955
 956   dst.x = src0.x | src1.x
 957
 958   dst.y = src0.y | src1.y
 959
 960   dst.z = src0.z | src1.z
 961
 962   dst.w = src0.w | src1.w
 963
 964
 965 MOD - Modulus
 966
 967 .. math::
 968
 969   dst.x = src0.x \bmod src1.x
 970
 971   dst.y = src0.y \bmod src1.y
 972
 973   dst.z = src0.z \bmod src1.z
 974
 975   dst.w = src0.w \bmod src1.w
 976
 977
 978 XOR - Bitwise Xor
 979
 980 .. math::
 981
  dst.x = src0.x \oplus src1.x

  dst.y = src0.y \oplus src1.y

  dst.z = src0.z \oplus src1.z

  dst.w = src0.w \oplus src1.w
 989
 990
 991 SAD - Sum Of Absolute Differences
 992
 993 .. math::
 994
 995   dst.x = |src0.x - src1.x| + src2.x
 996
 997   dst.y = |src0.y - src1.y| + src2.y
 998
 999   dst.z = |src0.z - src1.z| + src2.z
1000
1001   dst.w = |src0.w - src1.w| + src2.w
1002
1003
1004 TXF - Texel Fetch
1005
1006   TBD
1007
1008
1009 TXQ - Texture Size Query
1010
1011   TBD
1012
1013
1014 CONT - Continue
1015
1016   TBD
1017
1018
1019 From GL_NV_geometry_program4
1020 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1021
1022
1023 EMIT - Emit
1024
1025   TBD
1026
1027
1028 ENDPRIM - End Primitive
1029
1030   TBD
1031
1032
1033 From GLSL
1034 ^^^^^^^^^^
1035
1036
1037 BGNLOOP - Begin a Loop
1038
1039   TBD
1040
1041
1042 BGNSUB - Begin Subroutine
1043
1044   TBD
1045
1046
1047 ENDLOOP - End a Loop
1048
1049   TBD
1050
1051
1052 ENDSUB - End Subroutine
1053
1054   TBD
1055
1056
1057 NOP - No Operation
1058
1059   Do nothing.
1060
1061
1062 NRM4 - 4-component Vector Normalise
1063
1064 .. math::
1065
  dst.x = \frac{src.x}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}

  dst.y = \frac{src.y}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}

  dst.z = \frac{src.z}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}

  dst.w = \frac{src.w}{\sqrt{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}}
1073
1074
1075 ps_2_x
1076 ^^^^^^^^^^^^
1077
1078
1079 CALLNZ - Subroutine Call If Not Zero
1080
1081   TBD
1082
1083
1084 IFC - If
1085
1086   TBD
1087
1088
1089 BREAKC - Break Conditional
1090
1091   TBD
1092
1093
1094 Explanation of symbols used
1095 ------------------------------
1096
1097
1098 Functions
1099 ^^^^^^^^^^^^^^
1100
1101
1102   :math:`|x|`       Absolute value of `x`.
1103
1104   :math:`\lceil x \rceil` Ceiling of `x`.
1105
1106   clamp(x,y,z)      Clamp x between y and z.
1107                     (x < y) ? y : (x > z) ? z : x
1108
1109   :math:`\lfloor x\rfloor` Floor of `x`.
1110
1111   :math:`\log_2{x}` Logarithm of `x`, base 2.
1112
1113   max(x,y)          Maximum of x and y.
1114                     (x > y) ? x : y
1115
1116   min(x,y)          Minimum of x and y.
1117                     (x < y) ? x : y
1118
1119   partialx(x)       Derivative of x relative to fragment's X.
1120
1121   partialy(x)       Derivative of x relative to fragment's Y.
1122
1123   pop()             Pop from stack.
1124
1125   :math:`x^y`       `x` to the power `y`.
1126
1127   push(x)           Push x on stack.
1128
1129   round(x)          Round x.
1130
1131   trunc(x)          Truncate x, i.e. drop the fraction bits.
1132
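The helper functions listed above translate directly into Python. In the
sketch below, ``clamp`` follows the expression given in the list, and
``trunc`` is contrasted with ``floor`` because the two differ for negative
inputs. The Python definitions are illustrative only.

.. code-block:: python

  import math


  def clamp(x, y, z):
      """(x < y) ? y : (x > z) ? z : x"""
      return y if x < y else z if x > z else x


  def trunc(x):
      """Drop the fraction bits, which rounds toward zero."""
      return float(math.trunc(x))


  def floor(x):
      """Largest integer not greater than x."""
      return float(math.floor(x))

For example, ``trunc(-1.5)`` gives ``-1.0`` while ``floor(-1.5)`` gives
``-2.0``. Note also that the lower bound is tested first in ``clamp``, so it
wins if the bounds are passed in the wrong order.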
1133
1134 Keywords
1135 ^^^^^^^^^^^^^
1136
1137
1138   discard           Discard fragment.
1139
1140   dst               First destination register.
1141
1142   dst0              First destination register.
1143
1144   pc                Program counter.
1145
1146   src               First source register.
1147
1148   src0              First source register.
1149
1150   src1              Second source register.
1151
1152   src2              Third source register.
1153
1154   target            Label of target instruction.
1155
1156
1157 Other tokens
1158 ---------------
1159
1160
1161 Declaration Semantic
1162 ^^^^^^^^^^^^^^^^^^^^^^^^
1163
1164
  A Declaration Semantic token follows a Declaration token if the Semantic bit
  is set.

  Since its purpose is to link a shader with other stages of the pipeline,
  it is valid only after Declaration tokens that declare a register in either
  the INPUT or the OUTPUT file.

  The SemanticName field contains the semantic name of the register being
  declared. There is no default value.
1173
1174   SemanticIndex is an optional subscript that can be used to distinguish
1175   different register declarations with the same semantic name. The default value
1176   is 0.
1177
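The rules above can be summarised as a small record type. The Python sketch
below is purely illustrative: the class, the field names and the string file
names are hypothetical and do not correspond to the actual token layout. It
only captures that the name has no default, the index defaults to 0, and the
token is valid only for INPUT and OUTPUT registers.

.. code-block:: python

  from dataclasses import dataclass

  # Hypothetical file names used only for this sketch.
  FILES_WITH_SEMANTICS = {"INPUT", "OUTPUT"}


  @dataclass(frozen=True)
  class DeclarationSemantic:
      file: str                # register file of the preceding Declaration
      semantic_name: str       # for example "POSITION" or "COLOR"; no default
      semantic_index: int = 0  # distinguishes declarations sharing a name

      def __post_init__(self):
          if self.file not in FILES_WITH_SEMANTICS:
              raise ValueError("semantics apply only to INPUT or OUTPUT")


  first = DeclarationSemantic("OUTPUT", "COLOR")
  second = DeclarationSemantic("OUTPUT", "COLOR", semantic_index=1)

Two declarations with the same name are told apart only by their index.
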
1178   The meanings of the individual semantic names are explained in the following
1179   sections.
1180
1181 TGSI_SEMANTIC_POSITION
1182 """"""""""""""""""""""
1183
Position, sometimes known as HPOS or WPOS for historical reasons, is the
location of the vertex in space, in ``(x, y, z, w)`` format. ``x``, ``y``, and ``z``
are the Cartesian coordinates, and ``w`` is the homogeneous coordinate, which is
used for the perspective divide, if enabled.
1188
1189 As a vertex shader output, position should be scaled to the viewport. When
1190 used in fragment shaders, position will ---
1191
1192 XXX --- wait a minute. Should position be in [0,1] for x and y?
1193
1194 XXX additionally, is there a way to configure the perspective divide? it's
1195 accelerated on most chipsets AFAIK...
1196
1197 Position, if not specified, usually defaults to ``(0, 0, 0, 1)``, and can
1198 be partially specified as ``(x, y, 0, 1)`` or ``(x, y, z, 1)``.
1199
1200 XXX usually? can we solidify that?
1201
1202 TGSI_SEMANTIC_COLOR
1203 """""""""""""""""""
1204
1205 Colors are used to, well, color the primitives. Colors are always in
1206 ``(r, g, b, a)`` format.
1207
1208 If alpha is not specified, it defaults to 1.
1209
1210 TGSI_SEMANTIC_BCOLOR
1211 """"""""""""""""""""
1212
1213 Back-facing colors are only used for back-facing polygons, and are only valid
1214 in vertex shader outputs. After rasterization, all polygons are front-facing
1215 and COLOR and BCOLOR end up occupying the same slots in the fragment, so
1216 all BCOLORs effectively become regular COLORs in the fragment shader.
1217
1218 TGSI_SEMANTIC_FOG
1219 """""""""""""""""
1220
1221 The fog coordinate historically has been used to replace the depth coordinate
1222 for generation of fog in dedicated fog blocks. Gallium, however, does not use
1223 dedicated fog acceleration, placing it entirely in the fragment shader
1224 instead.
1225
1226 The fog coordinate should be written in ``(f, 0, 0, 1)`` format. Only the first
1227 component matters when writing from the vertex shader; the driver will ensure
1228 that the coordinate is in this format when used as a fragment shader input.
1229
1230 TGSI_SEMANTIC_PSIZE
1231 """""""""""""""""""
1232
1233 PSIZE, or point size, is used to specify point sizes per-vertex. It should
1234 be in ``(p, n, x, f)`` format, where ``p`` is the point size, ``n`` is the minimum
1235 size, ``x`` is the maximum size, and ``f`` is the fade threshold.
1236
1237 XXX this is arb_vp. is this what we actually do? should double-check...
1238
1239 When using this semantic, be sure to set the appropriate state in the
1240 :ref:`rasterizer` first.
1241
1242 TGSI_SEMANTIC_GENERIC
1243 """""""""""""""""""""
1244
1245 Generic semantics are nearly always used for texture coordinate attributes,
1246 in ``(s, t, r, q)`` format. ``t`` and ``r`` may be unused for certain kinds
1247 of lookups, and ``q`` is the level-of-detail bias for biased sampling.
1248
1249 These attributes are called "generic" because they may be used for anything
1250 else, including parameters, texture generation information, or anything that
1251 can be stored inside a four-component vector.
1252
1253 TGSI_SEMANTIC_NORMAL
1254 """"""""""""""""""""
1255
1256 Vertex normal; could be used to implement per-pixel lighting for legacy APIs
1257 that allow mixing fixed-function and programmable stages.
1258
1259 TGSI_SEMANTIC_FACE
1260 """"""""""""""""""
1261
FACE is the facing bit, which stores the facing information for the fragment
shader. ``(f, 0, 0, 1)`` is the format. The first component will be positive
when the fragment is front-facing, and negative when the fragment is
back-facing.
1266
1267 TGSI_SEMANTIC_EDGEFLAG
1268 """"""""""""""""""""""
1269
1270 XXX no clue