X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=src%2Fgallium%2Fdocs%2Fsource%2Ftgsi.rst;h=548a9a398556d79368c910a67be8f0b90b7736f8;hb=bb4c5d72d7c7cb1d9e7016e2c07c36875f30011a;hp=2c34a6bae5e647ec9431bd6d222508789c67bfb0;hpb=db89bf40025805165bbc34ac7a6610f9468d8749;p=mesa.git diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index 2c34a6bae5e..548a9a39855 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -6,14 +6,33 @@ for describing shaders. Since Gallium is inherently shaderful, shaders are an important part of the API. TGSI is the only intermediate representation used by all drivers. +Basics +------ + +All TGSI instructions, known as *opcodes*, operate on arbitrary-precision +floating-point four-component vectors. An opcode may have up to one +destination register, known as *dst*, and between zero and three source +registers, called *src0* through *src2*, or simply *src* if there is only +one. + +Some instructions, like :opcode:`I2F`, permit re-interpretation of vector +components as integers. Other instructions permit using registers as +two-component vectors with double precision; see :ref:`Double Opcodes`. + +When an instruction has a scalar result, the result is usually copied into +each of the components of *dst*. When this happens, the result is said to be +*replicated* to *dst*. :opcode:`RCP` is one such instruction. + Instruction Set --------------- -From GL_NV_vertex_program +Core ISA ^^^^^^^^^^^^^^^^^^^^^^^^^ +These opcodes are guaranteed to be available regardless of the driver being +used. -ARL - Address Register Load +.. opcode:: ARL - Address Register Load .. math:: @@ -26,7 +45,7 @@ ARL - Address Register Load dst.w = \lfloor src.w\rfloor -MOV - Move +.. opcode:: MOV - Move .. math:: @@ -39,7 +58,7 @@ MOV - Move dst.w = src.w -LIT - Light Coefficients +.. opcode:: LIT - Light Coefficients .. math:: @@ -52,33 +71,25 @@ LIT - Light Coefficients dst.w = 1 -RCP - Reciprocal - -.. math:: +.. opcode:: RCP - Reciprocal - dst.x = \frac{1}{src.x} +This instruction replicates its result. - dst.y = \frac{1}{src.x} +.. math:: - dst.z = \frac{1}{src.x} + dst = \frac{1}{src.x} - dst.w = \frac{1}{src.x} +.. opcode:: RSQ - Reciprocal Square Root -RSQ - Reciprocal Square Root +This instruction replicates its result. .. math:: - dst.x = \frac{1}{\sqrt{|src.x|}} - - dst.y = \frac{1}{\sqrt{|src.x|}} - - dst.z = \frac{1}{\sqrt{|src.x|}} + dst = \frac{1}{\sqrt{|src.x|}} - dst.w = \frac{1}{\sqrt{|src.x|}} - -EXP - Approximate Exponential Base 2 +.. opcode:: EXP - Approximate Exponential Base 2 .. math:: @@ -91,7 +102,7 @@ EXP - Approximate Exponential Base 2 dst.w = 1 -LOG - Approximate Logarithm Base 2 +.. opcode:: LOG - Approximate Logarithm Base 2 .. math:: @@ -104,7 +115,7 @@ LOG - Approximate Logarithm Base 2 dst.w = 1 -MUL - Multiply +.. opcode:: MUL - Multiply .. math:: @@ -117,7 +128,7 @@ MUL - Multiply dst.w = src0.w \times src1.w -ADD - Add +.. opcode:: ADD - Add .. math:: @@ -130,33 +141,25 @@ ADD - Add dst.w = src0.w + src1.w -DP3 - 3-component Dot Product - -.. math:: +.. opcode:: DP3 - 3-component Dot Product - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z +This instruction replicates its result. - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z +.. math:: - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z +.. opcode:: DP4 - 4-component Dot Product -DP4 - 4-component Dot Product +This instruction replicates its result. .. math:: - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - - -DST - Distance Vector +.. opcode:: DST - Distance Vector .. math:: @@ -169,7 +172,7 @@ DST - Distance Vector dst.w = src1.w -MIN - Minimum +.. opcode:: MIN - Minimum .. math:: @@ -182,7 +185,7 @@ MIN - Minimum dst.w = min(src0.w, src1.w) -MAX - Maximum +.. opcode:: MAX - Maximum .. math:: @@ -195,7 +198,7 @@ MAX - Maximum dst.w = max(src0.w, src1.w) -SLT - Set On Less Than +.. opcode:: SLT - Set On Less Than .. math:: @@ -208,7 +211,7 @@ SLT - Set On Less Than dst.w = (src0.w < src1.w) ? 1 : 0 -SGE - Set On Greater Equal Than +.. opcode:: SGE - Set On Greater Equal Than .. math:: @@ -221,7 +224,7 @@ SGE - Set On Greater Equal Than dst.w = (src0.w >= src1.w) ? 1 : 0 -MAD - Multiply And Add +.. opcode:: MAD - Multiply And Add .. math:: @@ -234,7 +237,7 @@ MAD - Multiply And Add dst.w = src0.w \times src1.w + src2.w -SUB - Subtract +.. opcode:: SUB - Subtract .. math:: @@ -247,7 +250,7 @@ SUB - Subtract dst.w = src0.w - src1.w -LRP - Linear Interpolate +.. opcode:: LRP - Linear Interpolate .. math:: @@ -260,7 +263,7 @@ LRP - Linear Interpolate dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w -CND - Condition +.. opcode:: CND - Condition .. math:: @@ -273,7 +276,7 @@ CND - Condition dst.w = (src2.w > 0.5) ? src0.w : src1.w -DP2A - 2-component Dot Product And Add +.. opcode:: DP2A - 2-component Dot Product And Add .. math:: @@ -286,7 +289,7 @@ DP2A - 2-component Dot Product And Add dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x -FRAC - Fraction +.. opcode:: FRC - Fraction .. math:: @@ -299,7 +302,7 @@ FRAC - Fraction dst.w = src.w - \lfloor src.w\rfloor -CLAMP - Clamp +.. opcode:: CLAMP - Clamp .. math:: @@ -312,9 +315,9 @@ CLAMP - Clamp dst.w = clamp(src0.w, src1.w, src2.w) -FLR - Floor +.. opcode:: FLR - Floor -This is identical to ARL. +This is identical to :opcode:`ARL`. .. math:: @@ -327,7 +330,7 @@ This is identical to ARL. dst.w = \lfloor src.w\rfloor -ROUND - Round +.. opcode:: ROUND - Round .. math:: @@ -340,45 +343,33 @@ ROUND - Round dst.w = round(src.w) -EX2 - Exponential Base 2 +.. opcode:: EX2 - Exponential Base 2 -.. math:: +This instruction replicates its result. - dst.x = 2^{src.x} - - dst.y = 2^{src.x} +.. math:: - dst.z = 2^{src.x} + dst = 2^{src.x} - dst.w = 2^{src.x} +.. opcode:: LG2 - Logarithm Base 2 -LG2 - Logarithm Base 2 +This instruction replicates its result. .. math:: - dst.x = \log_2{src.x} - - dst.y = \log_2{src.x} - - dst.z = \log_2{src.x} + dst = \log_2{src.x} - dst.w = \log_2{src.x} +.. opcode:: POW - Power -POW - Power +This instruction replicates its result. .. math:: - dst.x = src0.x^{src1.x} + dst = src0.x^{src1.x} - dst.y = src0.x^{src1.x} - - dst.z = src0.x^{src1.x} - - dst.w = src0.x^{src1.x} - -XPD - Cross Product +.. opcode:: XPD - Cross Product .. math:: @@ -391,7 +382,7 @@ XPD - Cross Product dst.w = 1 -ABS - Absolute +.. opcode:: ABS - Absolute .. math:: @@ -404,48 +395,36 @@ ABS - Absolute dst.w = |src.w| -RCC - Reciprocal Clamped +.. opcode:: RCC - Reciprocal Clamped + +This instruction replicates its result. XXX cleanup on aisle three .. math:: - dst.x = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - - dst.y = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - - dst.z = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) + dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - dst.w = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) +.. opcode:: DPH - Homogeneous Dot Product -DPH - Homogeneous Dot Product +This instruction replicates its result. .. math:: - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w +.. opcode:: COS - Cosine - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w - - -COS - Cosine +This instruction replicates its result. .. math:: - dst.x = \cos{src.x} - - dst.y = \cos{src.x} - - dst.z = \cos{src.x} + dst = \cos{src.x} - dst.w = \cos{src.x} - -DDX - Derivative Relative To X +.. opcode:: DDX - Derivative Relative To X .. math:: @@ -458,7 +437,7 @@ DDX - Derivative Relative To X dst.w = partialx(src.w) -DDY - Derivative Relative To Y +.. opcode:: DDY - Derivative Relative To Y .. math:: @@ -471,32 +450,32 @@ DDY - Derivative Relative To Y dst.w = partialy(src.w) -KILP - Predicated Discard +.. opcode:: KILP - Predicated Discard discard -PK2H - Pack Two 16-bit Floats +.. opcode:: PK2H - Pack Two 16-bit Floats TBD -PK2US - Pack Two Unsigned 16-bit Scalars +.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars TBD -PK4B - Pack Four Signed 8-bit Scalars +.. opcode:: PK4B - Pack Four Signed 8-bit Scalars TBD -PK4UB - Pack Four Unsigned 8-bit Scalars +.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars TBD -RFL - Reflection Vector +.. opcode:: RFL - Reflection Vector .. math:: @@ -508,10 +487,12 @@ RFL - Reflection Vector dst.w = 1 -Considered for removal. +.. note:: + + Considered for removal. -SEQ - Set On Equal +.. opcode:: SEQ - Set On Equal .. math:: @@ -524,21 +505,20 @@ SEQ - Set On Equal dst.w = (src0.w == src1.w) ? 1 : 0 -SFL - Set On False +.. opcode:: SFL - Set On False -.. math:: +This instruction replicates its result. - dst.x = 0 +.. math:: - dst.y = 0 + dst = 0 - dst.z = 0 +.. note:: - dst.w = 0 + Considered for removal. -Considered for removal. -SGT - Set On Greater Than +.. opcode:: SGT - Set On Greater Than .. math:: @@ -551,20 +531,16 @@ SGT - Set On Greater Than dst.w = (src0.w > src1.w) ? 1 : 0 -SIN - Sine +.. opcode:: SIN - Sine -.. math:: +This instruction replicates its result. - dst.x = \sin{src.x} - - dst.y = \sin{src.x} - - dst.z = \sin{src.x} +.. math:: - dst.w = \sin{src.x} + dst = \sin{src.x} -SLE - Set On Less Equal Than +.. opcode:: SLE - Set On Less Equal Than .. math:: @@ -577,7 +553,7 @@ SLE - Set On Less Equal Than dst.w = (src0.w <= src1.w) ? 1 : 0 -SNE - Set On Not Equal +.. opcode:: SNE - Set On Not Equal .. math:: @@ -590,59 +566,102 @@ SNE - Set On Not Equal dst.w = (src0.w != src1.w) ? 1 : 0 -STR - Set On True +.. opcode:: STR - Set On True + +This instruction replicates its result. .. math:: - dst.x = 1 + dst = 1 - dst.y = 1 - dst.z = 1 +.. opcode:: TEX - Texture Lookup - dst.w = 1 +.. math:: + coord = src0 -TEX - Texture Lookup + bias = 0.0 - TBD + dst = texture_sample(unit, coord, bias) + for array textures src0.y contains the slice for 1D, + and src0.z contain the slice for 2D. + for shadow textures with no arrays, src0.z contains + the reference value. + for shadow textures with arrays, src0.z contains + the reference value for 1D arrays, and src0.w contains + the reference value for 2D arrays. + There is no way to pass a bias in the .w value for + shadow arrays, and GLSL doesn't allow this. + GLSL does allow cube shadows maps to take a bias value, + and we have to determine how this will look in TGSI. -TXD - Texture Lookup with Derivatives +.. opcode:: TXD - Texture Lookup with Derivatives - TBD +.. math:: + coord = src0 -TXP - Projective Texture Lookup + ddx = src1 - TBD + ddy = src2 + + bias = 0.0 + + dst = texture_sample_deriv(unit, coord, bias, ddx, ddy) + + +.. opcode:: TXP - Projective Texture Lookup +.. math:: + + coord.x = src0.x / src.w + + coord.y = src0.y / src.w + + coord.z = src0.z / src.w + + coord.w = src0.w + + bias = 0.0 -UP2H - Unpack Two 16-Bit Floats + dst = texture_sample(unit, coord, bias) + + +.. opcode:: UP2H - Unpack Two 16-Bit Floats TBD - Considered for removal. +.. note:: + + Considered for removal. -UP2US - Unpack Two Unsigned 16-Bit Scalars +.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars TBD - Considered for removal. +.. note:: -UP4B - Unpack Four Signed 8-Bit Values + Considered for removal. + +.. opcode:: UP4B - Unpack Four Signed 8-Bit Values TBD - Considered for removal. +.. note:: + + Considered for removal. -UP4UB - Unpack Four Unsigned 8-Bit Scalars +.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars TBD - Considered for removal. +.. note:: -X2D - 2D Coordinate Transformation + Considered for removal. + +.. opcode:: X2D - 2D Coordinate Transformation .. math:: @@ -654,20 +673,20 @@ X2D - 2D Coordinate Transformation dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w -Considered for removal. - +.. note:: -From GL_NV_vertex_program2 -^^^^^^^^^^^^^^^^^^^^^^^^^^ + Considered for removal. -ARA - Address Register Add +.. opcode:: ARA - Address Register Add TBD - Considered for removal. +.. note:: + + Considered for removal. -ARR - Address Register Load With Round +.. opcode:: ARR - Address Register Load With Round .. math:: @@ -680,26 +699,26 @@ ARR - Address Register Load With Round dst.w = round(src.w) -BRA - Branch +.. opcode:: BRA - Branch pc = target - Considered for removal. +.. note:: -CAL - Subroutine Call + Considered for removal. + +.. opcode:: CAL - Subroutine Call push(pc) pc = target -RET - Subroutine Call Return +.. opcode:: RET - Subroutine Call Return pc = pop() - Potential restrictions: - * Only occurs at end of function. -SSG - Set Sign +.. opcode:: SSG - Set Sign .. math:: @@ -712,7 +731,7 @@ SSG - Set Sign dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0 -CMP - Compare +.. opcode:: CMP - Compare .. math:: @@ -725,7 +744,7 @@ CMP - Compare dst.w = (src0.w < 0) ? src1.w : src2.w -KIL - Conditional Discard +.. opcode:: KIL - Conditional Discard .. math:: @@ -734,7 +753,7 @@ KIL - Conditional Discard endif -SCS - Sine Cosine +.. opcode:: SCS - Sine Cosine .. math:: @@ -744,15 +763,27 @@ SCS - Sine Cosine dst.z = 0 - dst.y = 1 + dst.w = 1 -TXB - Texture Lookup With Bias +.. opcode:: TXB - Texture Lookup With Bias - TBD +.. math:: + + coord.x = src.x + + coord.y = src.y + + coord.z = src.z + + coord.w = 1.0 + bias = src.z -NRM - 3-component Vector Normalise + dst = texture_sample(unit, coord, bias) + + +.. opcode:: NRM - 3-component Vector Normalise .. math:: @@ -765,7 +796,7 @@ NRM - 3-component Vector Normalise dst.w = 1 -DIV - Divide +.. opcode:: DIV - Divide .. math:: @@ -778,108 +809,92 @@ DIV - Divide dst.w = \frac{src0.w}{src1.w} -DP2 - 2-component Dot Product - -.. math:: - - dst.x = src0.x \times src1.x + src0.y \times src1.y - - dst.y = src0.x \times src1.x + src0.y \times src1.y - - dst.z = src0.x \times src1.x + src0.y \times src1.y - - dst.w = src0.x \times src1.x + src0.y \times src1.y - - -TXL - Texture Lookup With LOD - - TBD +.. opcode:: DP2 - 2-component Dot Product +This instruction replicates its result. -BRK - Break +.. math:: - TBD + dst = src0.x \times src1.x + src0.y \times src1.y -IF - If +.. opcode:: TXL - Texture Lookup With explicit LOD - TBD +.. math:: + coord.x = src0.x -BGNFOR - Begin a For-Loop + coord.y = src0.y - dst.x = floor(src.x) - dst.y = floor(src.y) - dst.z = floor(src.z) + coord.z = src0.z - if (dst.y <= 0) - pc = [matching ENDFOR] + 1 - endif + coord.w = 1.0 - Note: The destination must be a loop register. - The source must be a constant register. + lod = src0.w - Considered for cleanup / removal. + dst = texture_sample(unit, coord, lod) -REP - Repeat +.. opcode:: BRK - Break TBD -ELSE - Else +.. opcode:: IF - If TBD -ENDIF - End If +.. opcode:: ELSE - Else TBD -ENDFOR - End a For-Loop - - dst.x = dst.x + dst.z - dst.y = dst.y - 1.0 - - if (dst.y > 0) - pc = [matching BGNFOR instruction] + 1 - endif - - Note: The destination must be a loop register. - - Considered for cleanup / removal. - -ENDREP - End Repeat +.. opcode:: ENDIF - End If TBD -PUSHA - Push Address Register On Stack +.. opcode:: PUSHA - Push Address Register On Stack push(src.x) push(src.y) push(src.z) push(src.w) - Considered for cleanup / removal. +.. note:: + + Considered for cleanup. + +.. note:: -POPA - Pop Address Register From Stack + Considered for removal. + +.. opcode:: POPA - Pop Address Register From Stack dst.w = pop() dst.z = pop() dst.y = pop() dst.x = pop() - Considered for cleanup / removal. +.. note:: + + Considered for cleanup. + +.. note:: + Considered for removal. -From GL_NV_gpu_program4 + +Compute ISA ^^^^^^^^^^^^^^^^^^^^^^^^ +These opcodes are primarily provided for special-use computational shaders. Support for these opcodes indicated by a special pipe capability bit (TBD). -CEIL - Ceiling +XXX so let's discuss it, yeah? + +.. opcode:: CEIL - Ceiling .. math:: @@ -892,7 +907,7 @@ CEIL - Ceiling dst.w = \lceil src.w\rceil -I2F - Integer To Float +.. opcode:: I2F - Integer To Float .. math:: @@ -905,7 +920,7 @@ I2F - Integer To Float dst.w = (float) src.w -NOT - Bitwise Not +.. opcode:: NOT - Bitwise Not .. math:: @@ -918,7 +933,7 @@ NOT - Bitwise Not dst.w = ~src.w -TRUNC - Truncate +.. opcode:: TRUNC - Truncate .. math:: @@ -931,7 +946,7 @@ TRUNC - Truncate dst.w = trunc(src.w) -SHL - Shift Left +.. opcode:: SHL - Shift Left .. math:: @@ -944,7 +959,7 @@ SHL - Shift Left dst.w = src0.w << src1.x -SHR - Shift Right +.. opcode:: SHR - Shift Right .. math:: @@ -957,7 +972,7 @@ SHR - Shift Right dst.w = src0.w >> src1.x -AND - Bitwise And +.. opcode:: AND - Bitwise And .. math:: @@ -970,7 +985,7 @@ AND - Bitwise And dst.w = src0.w & src1.w -OR - Bitwise Or +.. opcode:: OR - Bitwise Or .. math:: @@ -983,7 +998,7 @@ OR - Bitwise Or dst.w = src0.w | src1.w -MOD - Modulus +.. opcode:: MOD - Modulus .. math:: @@ -996,7 +1011,7 @@ MOD - Modulus dst.w = src0.w \bmod src1.w -XOR - Bitwise Xor +.. opcode:: XOR - Bitwise Xor .. math:: @@ -1009,7 +1024,39 @@ XOR - Bitwise Xor dst.w = src0.w \oplus src1.w -SAD - Sum Of Absolute Differences +.. opcode:: UCMP - Integer Conditional Move + +.. math:: + + dst.x = src0.x ? src1.x : src2.x + + dst.y = src0.y ? src1.y : src2.y + + dst.z = src0.z ? src1.z : src2.z + + dst.w = src0.w ? src1.w : src2.w + + +.. opcode:: UARL - Integer Address Register Load + + Moves the contents of the source register, assumed to be an integer, into the + destination register, which is assumed to be an address (ADDR) register. + + +.. opcode:: IABS - Integer Absolute Value + +.. math:: + + dst.x = |src.x| + + dst.y = |src.y| + + dst.z = |src.z| + + dst.w = |src.w| + + +.. opcode:: SAD - Sum Of Absolute Differences .. math:: @@ -1022,99 +1069,131 @@ SAD - Sum Of Absolute Differences dst.w = |src0.w - src1.w| + src2.w -TXF - Texel Fetch +.. opcode:: TXF - Texel Fetch (as per NV_gpu_shader4), extract a single texel + from a specified texture image. The source sampler may + not be a CUBE or SHADOW. + src 0 is a four-component signed integer vector used to + identify the single texel accessed. 3 components + level. + src 1 is a 3 component constant signed integer vector, + with each component only have a range of + -8..+8 (hw only seems to deal with this range, interface + allows for up to unsigned int). + TXF(uint_vec coord, int_vec offset). - TBD +.. opcode:: TXQ - Texture Size Query (as per NV_gpu_program4) + retrieve the dimensions of the texture + depending on the target. For 1D (width), 2D/RECT/CUBE + (width, height), 3D (width, height, depth), + 1D array (width, layers), 2D array (width, height, layers) + +.. math:: -TXQ - Texture Size Query + lod = src0 - TBD + dst.x = texture_width(unit, lod) + + dst.y = texture_height(unit, lod) + dst.z = texture_depth(unit, lod) -CONT - Continue + +.. opcode:: CONT - Continue TBD +.. note:: + + Support for CONT is determined by a special capability bit, + ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information. + -From GL_NV_geometry_program4 +Geometry ISA ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +These opcodes are only supported in geometry shaders; they have no meaning +in any other type of shader. -EMIT - Emit +.. opcode:: EMIT - Emit TBD -ENDPRIM - End Primitive +.. opcode:: ENDPRIM - End Primitive TBD -From GLSL +GLSL ISA ^^^^^^^^^^ +These opcodes are part of :term:`GLSL`'s opcode set. Support for these +opcodes is determined by a special capability bit, ``GLSL``. -BGNLOOP - Begin a Loop +.. opcode:: BGNLOOP - Begin a Loop TBD -BGNSUB - Begin Subroutine +.. opcode:: BGNSUB - Begin Subroutine TBD -ENDLOOP - End a Loop +.. opcode:: ENDLOOP - End a Loop TBD -ENDSUB - End Subroutine +.. opcode:: ENDSUB - End Subroutine TBD -NOP - No Operation +.. opcode:: NOP - No Operation Do nothing. -NRM4 - 4-component Vector Normalise +.. opcode:: NRM4 - 4-component Vector Normalise -.. math:: - - dst.x = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} - - dst.y = \frac{src.y}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} +This instruction replicates its result. - dst.z = \frac{src.z}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} +.. math:: - dst.w = \frac{src.w}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} + dst = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} ps_2_x ^^^^^^^^^^^^ +XXX wait what -CALLNZ - Subroutine Call If Not Zero +.. opcode:: CALLNZ - Subroutine Call If Not Zero TBD -IFC - If +.. opcode:: IFC - If TBD -BREAKC - Break Conditional +.. opcode:: BREAKC - Break Conditional TBD -Double Opcodes +.. _doubleopcodes: + +Double ISA ^^^^^^^^^^^^^^^ -DADD - Add Double +The double-precision opcodes reinterpret four-component vectors into +two-component vectors with doubled precision in each component. + +Support for these opcodes is XXX undecided. :T + +.. opcode:: DADD - Add .. math:: @@ -1123,7 +1202,7 @@ DADD - Add Double dst.zw = src0.zw + src1.zw -DDIV - Divide Double +.. opcode:: DDIV - Divide .. math:: @@ -1131,7 +1210,7 @@ DDIV - Divide Double dst.zw = src0.zw / src1.zw -DSEQ - Set Double on Equal +.. opcode:: DSEQ - Set on Equal .. math:: @@ -1139,7 +1218,7 @@ DSEQ - Set Double on Equal dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F -DSLT - Set Double on Less than +.. opcode:: DSLT - Set on Less than .. math:: @@ -1147,7 +1226,7 @@ DSLT - Set Double on Less than dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F -DFRAC - Double Fraction +.. opcode:: DFRAC - Fraction .. math:: @@ -1156,23 +1235,33 @@ DFRAC - Double Fraction dst.zw = src.zw - \lfloor src.zw\rfloor -DFRACEXP - Convert Double Number to Fractional and Integral Components +.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components + +Like the ``frexp()`` routine in many math libraries, this opcode stores the +exponent of its source to ``dst0``, and the significand to ``dst1``, such that +:math:`dst1 \times 2^{dst0} = src` . .. math:: - dst0.xy = frexp(src.xy, dst1.xy) + dst0.xy = exp(src.xy) + + dst1.xy = frac(src.xy) + + dst0.zw = exp(src.zw) + + dst1.zw = frac(src.zw) - dst0.zw = frexp(src.zw, dst1.zw) +.. opcode:: DLDEXP - Multiply Number by Integral Power of 2 -DLDEXP - Multiple Double Number by Integral Power of 2 +This opcode is the inverse of :opcode:`DFRACEXP`. .. math:: - dst.xy = ldexp(src0.xy, src1.xy) + dst.xy = src0.xy \times 2^{src1.xy} - dst.zw = ldexp(src0.zw, src1.zw) + dst.zw = src0.zw \times 2^{src1.zw} -DMIN - Minimum Double +.. opcode:: DMIN - Minimum .. math:: @@ -1180,7 +1269,7 @@ DMIN - Minimum Double dst.zw = min(src0.zw, src1.zw) -DMAX - Maximum Double +.. opcode:: DMAX - Maximum .. math:: @@ -1188,7 +1277,7 @@ DMAX - Maximum Double dst.zw = max(src0.zw, src1.zw) -DMUL - Multiply Double +.. opcode:: DMUL - Multiply .. math:: @@ -1197,7 +1286,7 @@ DMUL - Multiply Double dst.zw = src0.zw \times src1.zw -DMAD - Multiply And Add Doubles +.. opcode:: DMAD - Multiply And Add .. math:: @@ -1206,7 +1295,7 @@ DMAD - Multiply And Add Doubles dst.zw = src0.zw \times src1.zw + src2.zw -DRCP - Reciprocal Double +.. opcode:: DRCP - Reciprocal .. math:: @@ -1214,7 +1303,7 @@ DRCP - Reciprocal Double dst.zw = \frac{1}{src.zw} -DSQRT - Square root double +.. opcode:: DSQRT - Square Root .. math:: @@ -1223,6 +1312,421 @@ DSQRT - Square root double dst.zw = \sqrt{src.zw} +.. _samplingopcodes: + +Resource Sampling Opcodes +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Those opcodes follow very closely semantics of the respective Direct3D +instructions. If in doubt double check Direct3D documentation. + +.. opcode:: SAMPLE - Using provided address, sample data from the + specified texture using the filtering mode identified + by the gven sampler. The source data may come from + any resource type other than buffers. + SAMPLE dst, address, sampler_view, sampler + e.g. + SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0] + +.. opcode:: SAMPLE_I - Simplified alternative to the SAMPLE instruction. + Using the provided integer address, SAMPLE_I fetches data + from the specified sampler view without any filtering. + The source data may come from any resource type other + than CUBE. + SAMPLE_I dst, address, sampler_view + e.g. + SAMPLE_I TEMP[0], TEMP[1], SVIEW[0] + The 'address' is specified as unsigned integers. If the + 'address' is out of range [0...(# texels - 1)] the + result of the fetch is always 0 in all components. + As such the instruction doesn't honor address wrap + modes, in cases where that behavior is desirable + 'SAMPLE' instruction should be used. + address.w always provides an unsigned integer mipmap + level. If the value is out of the range then the + instruction always returns 0 in all components. + address.yz are ignored for buffers and 1d textures. + address.z is ignored for 1d texture arrays and 2d + textures. + For 1D texture arrays address.y provides the array + index (also as unsigned integer). If the value is + out of the range of available array indices + [0... (array size - 1)] then the opcode always returns + 0 in all components. + For 2D texture arrays address.z provides the array + index, otherwise it exhibits the same behavior as in + the case for 1D texture arrays. + The exact semantics of the source address are presented + in the table below: + resource type X Y Z W + ------------- ------------------------ + PIPE_BUFFER x ignored + PIPE_TEXTURE_1D x mpl + PIPE_TEXTURE_2D x y mpl + PIPE_TEXTURE_3D x y z mpl + PIPE_TEXTURE_RECT x y mpl + PIPE_TEXTURE_CUBE not allowed as source + PIPE_TEXTURE_1D_ARRAY x idx mpl + PIPE_TEXTURE_2D_ARRAY x y idx mpl + + Where 'mpl' is a mipmap level and 'idx' is the + array index. + +.. opcode:: SAMPLE_I_MS - Just like SAMPLE_I but allows fetch data from + multi-sampled surfaces. + +.. opcode:: SAMPLE_B - Just like the SAMPLE instruction with the + exception that an additiona bias is applied to the + level of detail computed as part of the instruction + execution. + SAMPLE_B dst, address, sampler_view, sampler, lod_bias + e.g. + SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x + +.. opcode:: SAMPLE_C - Similar to the SAMPLE instruction but it + performs a comparison filter. The operands to SAMPLE_C + are identical to SAMPLE, except that tere is an additional + float32 operand, reference value, which must be a register + with single-component, or a scalar literal. + SAMPLE_C makes the hardware use the current samplers + compare_func (in pipe_sampler_state) to compare + reference value against the red component value for the + surce resource at each texel that the currently configured + texture filter covers based on the provided coordinates. + SAMPLE_C dst, address, sampler_view.r, sampler, ref_value + e.g. + SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x + +.. opcode:: SAMPLE_C_LZ - Same as SAMPLE_C, but LOD is 0 and derivatives + are ignored. The LZ stands for level-zero. + SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value + e.g. + SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x + + +.. opcode:: SAMPLE_D - SAMPLE_D is identical to the SAMPLE opcode except + that the derivatives for the source address in the x + direction and the y direction are provided by extra + parameters. + SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y + e.g. + SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3] + +.. opcode:: SAMPLE_L - SAMPLE_L is identical to the SAMPLE opcode except + that the LOD is provided directly as a scalar value, + representing no anisotropy. Source addresses A channel + is used as the LOD. + SAMPLE_L dst, address, sampler_view, sampler + e.g. + SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0] + +.. opcode:: GATHER4 - Gathers the four texels to be used in a bi-linear + filtering operation and packs them into a single register. + Only works with 2D, 2D array, cubemaps, and cubemaps arrays. + For 2D textures, only the addressing modes of the sampler and + the top level of any mip pyramid are used. Set W to zero. + It behaves like the SAMPLE instruction, but a filtered + sample is not generated. The four samples that contribute + to filtering are placed into xyzw in counter-clockwise order, + starting with the (u,v) texture coordinate delta at the + following locations (-, +), (+, +), (+, -), (-, -), where + the magnitude of the deltas are half a texel. + + +.. opcode:: SVIEWINFO - query the dimensions of a given sampler view. + dst receives width, height, depth or array size and + number of mipmap levels. The dst can have a writemask + which will specify what info is the caller interested + in. + SVIEWINFO dst, src_mip_level, sampler_view + e.g. + SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0] + src_mip_level is an unsigned integer scalar. If it's + out of range then returns 0 for width, height and + depth/array size but the total number of mipmap is + still returned correctly for the given sampler view. + The returned width, height and depth values are for + the mipmap level selected by the src_mip_level and + are in the number of texels. + For 1d texture array width is in dst.x, array size + is in dst.y and dst.zw are always 0. + +.. opcode:: SAMPLE_POS - query the position of a given sample. + dst receives float4 (x, y, 0, 0) indicated where the + sample is located. If the resource is not a multi-sample + resource and not a render target, the result is 0. + +.. opcode:: SAMPLE_INFO - dst receives number of samples in x. + If the resource is not a multi-sample resource and + not a render target, the result is 0. + + +.. _resourceopcodes: + +Resource Access Opcodes +^^^^^^^^^^^^^^^^^^^^^^^ + +.. opcode:: LOAD - Fetch data from a shader resource + + Syntax: ``LOAD dst, resource, address`` + + Example: ``LOAD TEMP[0], RES[0], TEMP[1]`` + + Using the provided integer address, LOAD fetches data + from the specified buffer or texture without any + filtering. + + The 'address' is specified as a vector of unsigned + integers. If the 'address' is out of range the result + is unspecified. + + Only the first mipmap level of a resource can be read + from using this instruction. + + For 1D or 2D texture arrays, the array index is + provided as an unsigned integer in address.y or + address.z, respectively. address.yz are ignored for + buffers and 1D textures. address.z is ignored for 1D + texture arrays and 2D textures. address.w is always + ignored. + +.. opcode:: STORE - Write data to a shader resource + + Syntax: ``STORE resource, address, src`` + + Example: ``STORE RES[0], TEMP[0], TEMP[1]`` + + Using the provided integer address, STORE writes data + to the specified buffer or texture. + + The 'address' is specified as a vector of unsigned + integers. If the 'address' is out of range the result + is unspecified. + + Only the first mipmap level of a resource can be + written to using this instruction. + + For 1D or 2D texture arrays, the array index is + provided as an unsigned integer in address.y or + address.z, respectively. address.yz are ignored for + buffers and 1D textures. address.z is ignored for 1D + texture arrays and 2D textures. address.w is always + ignored. + + +.. _threadsyncopcodes: + +Inter-thread synchronization opcodes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +These opcodes are intended for communication between threads running +within the same compute grid. For now they're only valid in compute +programs. + +.. opcode:: MFENCE - Memory fence + + Syntax: ``MFENCE resource`` + + Example: ``MFENCE RES[0]`` + + This opcode forces strong ordering between any memory access + operations that affect the specified resource. This means that + previous loads and stores (and only those) will be performed and + visible to other threads before the program execution continues. + + +.. opcode:: LFENCE - Load memory fence + + Syntax: ``LFENCE resource`` + + Example: ``LFENCE RES[0]`` + + Similar to MFENCE, but it only affects the ordering of memory loads. + + +.. opcode:: SFENCE - Store memory fence + + Syntax: ``SFENCE resource`` + + Example: ``SFENCE RES[0]`` + + Similar to MFENCE, but it only affects the ordering of memory stores. + + +.. opcode:: BARRIER - Thread group barrier + + ``BARRIER`` + + This opcode suspends the execution of the current thread until all + the remaining threads in the working group reach the same point of + the program. Results are unspecified if any of the remaining + threads terminates or never reaches an executed BARRIER instruction. + + +.. _atomopcodes: + +Atomic opcodes +^^^^^^^^^^^^^^ + +These opcodes provide atomic variants of some common arithmetic and +logical operations. In this context atomicity means that another +concurrent memory access operation that affects the same memory +location is guaranteed to be performed strictly before or after the +entire execution of the atomic operation. + +For the moment they're only valid in compute programs. + +.. opcode:: ATOMUADD - Atomic integer addition + + Syntax: ``ATOMUADD dst, resource, offset, src`` + + Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = dst_i + src_i + + +.. opcode:: ATOMXCHG - Atomic exchange + + Syntax: ``ATOMXCHG dst, resource, offset, src`` + + Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = src_i + + +.. opcode:: ATOMCAS - Atomic compare-and-exchange + + Syntax: ``ATOMCAS dst, resource, offset, cmp, src`` + + Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i) + + +.. opcode:: ATOMAND - Atomic bitwise And + + Syntax: ``ATOMAND dst, resource, offset, src`` + + Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = dst_i \& src_i + + +.. opcode:: ATOMOR - Atomic bitwise Or + + Syntax: ``ATOMOR dst, resource, offset, src`` + + Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = dst_i | src_i + + +.. opcode:: ATOMXOR - Atomic bitwise Xor + + Syntax: ``ATOMXOR dst, resource, offset, src`` + + Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = dst_i \oplus src_i + + +.. opcode:: ATOMUMIN - Atomic unsigned minimum + + Syntax: ``ATOMUMIN dst, resource, offset, src`` + + Example: ``ATOMUMIN TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i < src_i ? dst_i : src_i) + + +.. opcode:: ATOMUMAX - Atomic unsigned maximum + + Syntax: ``ATOMUMAX dst, resource, offset, src`` + + Example: ``ATOMUMAX TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i > src_i ? dst_i : src_i) + + +.. opcode:: ATOMIMIN - Atomic signed minimum + + Syntax: ``ATOMIMIN dst, resource, offset, src`` + + Example: ``ATOMIMIN TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i < src_i ? dst_i : src_i) + + +.. opcode:: ATOMIMAX - Atomic signed maximum + + Syntax: ``ATOMIMAX dst, resource, offset, src`` + + Example: ``ATOMIMAX TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i > src_i ? dst_i : src_i) + + + Explanation of symbols used ------------------------------ @@ -1269,30 +1773,48 @@ Keywords discard Discard fragment. - dst First destination register. + pc Program counter. - dst0 First destination register. + target Label of target instruction. - pc Program counter. - src First source register. +Other tokens +--------------- - src0 First source register. - src1 Second source register. +Declaration +^^^^^^^^^^^ - src2 Third source register. - target Label of target instruction. +Declares a register that is will be referenced as an operand in Instruction +tokens. +File field contains register file that is being declared and is one +of TGSI_FILE. -Other tokens ---------------- +UsageMask field specifies which of the register components can be accessed +and is one of TGSI_WRITEMASK. + +The Local flag specifies that a given value isn't intended for +subroutine parameter passing and, as a result, the implementation +isn't required to give any guarantees of it being preserved across +subroutine boundaries. As it's merely a compiler hint, the +implementation is free to ignore it. + +If Dimension flag is set to 1, a Declaration Dimension token follows. + +If Semantic flag is set to 1, a Declaration Semantic token follows. + +If Interpolate flag is set to 1, a Declaration Interpolate token follows. + +If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows. Declaration Semantic ^^^^^^^^^^^^^^^^^^^^^^^^ + Vertex and fragment shader input and output registers may be labeled + with semantic information consisting of a name and index. Follows Declaration token if Semantic bit is set. @@ -1313,90 +1835,278 @@ Declaration Semantic TGSI_SEMANTIC_POSITION """""""""""""""""""""" -Position, sometimes known as HPOS or WPOS for historical reasons, is the -location of the vertex in space, in ``(x, y, z, w)`` format. ``x``, ``y``, and ``z`` -are the Cartesian coordinates, and ``w`` is the homogenous coordinate and used -for the perspective divide, if enabled. +For vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader +output register which contains the homogeneous vertex position in the clip +space coordinate system. After clipping, the X, Y and Z components of the +vertex will be divided by the W value to get normalized device coordinates. -As a vertex shader output, position should be scaled to the viewport. When -used in fragment shaders, position will --- +For fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that +fragment shader input contains the fragment's window position. The X +component starts at zero and always increases from left to right. +The Y component starts at zero and always increases but Y=0 may either +indicate the top of the window or the bottom depending on the fragment +coordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN). +The Z coordinate ranges from 0 to 1 to represent depth from the front +to the back of the Z buffer. The W component contains the reciprocol +of the interpolated vertex position W component. -XXX --- wait a minute. Should position be in [0,1] for x and y? +Fragment shaders may also declare an output register with +TGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows +the fragment shader to change the fragment's Z position. -XXX additionally, is there a way to configure the perspective divide? it's -accelerated on most chipsets AFAIK... -Position, if not specified, usually defaults to ``(0, 0, 0, 1)``, and can -be partially specified as ``(x, y, 0, 1)`` or ``(x, y, z, 1)``. - -XXX usually? can we solidify that? TGSI_SEMANTIC_COLOR """"""""""""""""""" -Colors are used to, well, color the primitives. Colors are always in -``(r, g, b, a)`` format. +For vertex shader outputs or fragment shader inputs/outputs, this +label indicates that the resister contains an R,G,B,A color. + +Several shader inputs/outputs may contain colors so the semantic index +is used to distinguish them. For example, color[0] may be the diffuse +color while color[1] may be the specular color. + +This label is needed so that the flat/smooth shading can be applied +to the right interpolants during rasterization. + -If alpha is not specified, it defaults to 1. TGSI_SEMANTIC_BCOLOR """""""""""""""""""" Back-facing colors are only used for back-facing polygons, and are only valid in vertex shader outputs. After rasterization, all polygons are front-facing -and COLOR and BCOLOR end up occupying the same slots in the fragment, so -all BCOLORs effectively become regular COLORs in the fragment shader. +and COLOR and BCOLOR end up occupying the same slots in the fragment shader, +so all BCOLORs effectively become regular COLORs in the fragment shader. + TGSI_SEMANTIC_FOG """"""""""""""""" -The fog coordinate historically has been used to replace the depth coordinate -for generation of fog in dedicated fog blocks. Gallium, however, does not use -dedicated fog acceleration, placing it entirely in the fragment shader -instead. +Vertex shader inputs and outputs and fragment shader inputs may be +labeled with TGSI_SEMANTIC_FOG to indicate that the register contains +a fog coordinate in the form (F, 0, 0, 1). Typically, the fragment +shader will use the fog coordinate to compute a fog blend factor which +is used to blend the normal fragment color with a constant fog color. + +Only the first component matters when writing from the vertex shader; +the driver will ensure that the coordinate is in this format when used +as a fragment shader input. -The fog coordinate should be written in ``(f, 0, 0, 1)`` format. Only the first -component matters when writing from the vertex shader; the driver will ensure -that the coordinate is in this format when used as a fragment shader input. TGSI_SEMANTIC_PSIZE """"""""""""""""""" -PSIZE, or point size, is used to specify point sizes per-vertex. It should -be in ``(p, n, x, f)`` format, where ``p`` is the point size, ``n`` is the minimum -size, ``x`` is the maximum size, and ``f`` is the fade threshold. - -XXX this is arb_vp. is this what we actually do? should double-check... +Vertex shader input and output registers may be labeled with +TGIS_SEMANTIC_PSIZE to indicate that the register contains a point size +in the form (S, 0, 0, 1). The point size controls the width or diameter +of points for rasterization. This label cannot be used in fragment +shaders. When using this semantic, be sure to set the appropriate state in the :ref:`rasterizer` first. + TGSI_SEMANTIC_GENERIC """"""""""""""""""""" -Generic semantics are nearly always used for texture coordinate attributes, -in ``(s, t, r, q)`` format. ``t`` and ``r`` may be unused for certain kinds -of lookups, and ``q`` is the level-of-detail bias for biased sampling. +All vertex/fragment shader inputs/outputs not labeled with any other +semantic label can be considered to be generic attributes. Typical +uses of generic inputs/outputs are texcoords and user-defined values. -These attributes are called "generic" because they may be used for anything -else, including parameters, texture generation information, or anything that -can be stored inside a four-component vector. TGSI_SEMANTIC_NORMAL """""""""""""""""""" -Vertex normal; could be used to implement per-pixel lighting for legacy APIs -that allow mixing fixed-function and programmable stages. +Indicates that a vertex shader input is a normal vector. This is +typically only used for legacy graphics APIs. + TGSI_SEMANTIC_FACE """""""""""""""""" -FACE is the facing bit, to store the facing information for the fragment -shader. ``(f, 0, 0, 1)`` is the format. The first component will be positive -when the fragment is front-facing, and negative when the component is -back-facing. +This label applies to fragment shader inputs only and indicates that +the register contains front/back-face information of the form (F, 0, +0, 1). The first component will be positive when the fragment belongs +to a front-facing polygon, and negative when the fragment belongs to a +back-facing polygon. + TGSI_SEMANTIC_EDGEFLAG """""""""""""""""""""" -XXX no clue +For vertex shaders, this sematic label indicates that an input or +output is a boolean edge flag. The register layout is [F, x, x, x] +where F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader +simply copies the edge flag input to the edgeflag output. + +Edge flags are used to control which lines or points are actually +drawn when the polygon mode converts triangles/quads/polygons into +points or lines. + +TGSI_SEMANTIC_STENCIL +"""""""""""""""""""""" + +For fragment shaders, this semantic label indicates than an output +is a writable stencil reference value. Only the Y component is writable. +This allows the fragment shader to change the fragments stencilref value. + + +Declaration Interpolate +^^^^^^^^^^^^^^^^^^^^^^^ + +This token is only valid for fragment shader INPUT declarations. + +The Interpolate field specifes the way input is being interpolated by +the rasteriser and is one of TGSI_INTERPOLATE_*. + +The CylindricalWrap bitfield specifies which register components +should be subject to cylindrical wrapping when interpolating by the +rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component +should be interpolated according to cylindrical wrapping rules. + + +Declaration Sampler View +^^^^^^^^^^^^^^^^^^^^^^^^ + + Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW. + + DCL SVIEW[#], resource, type(s) + + Declares a shader input sampler view and assigns it to a SVIEW[#] + register. + + resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray. + + type must be 1 or 4 entries (if specifying on a per-component + level) out of UNORM, SNORM, SINT, UINT and FLOAT. + + +Declaration Resource +^^^^^^^^^^^^^^^^^^^^ + + Follows Declaration token if file is TGSI_FILE_RESOURCE. + + DCL RES[#], resource [, WR] [, RAW] + + Declares a shader input resource and assigns it to a RES[#] + register. + + resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and + 2DArray. + + If the RAW keyword is not specified, the texture data will be + subject to conversion, swizzling and scaling as required to yield + the specified data type from the physical data format of the bound + resource. + + If the RAW keyword is specified, no channel conversion will be + performed: the values read for each of the channels (X,Y,Z,W) will + correspond to consecutive words in the same order and format + they're found in memory. No element-to-address conversion will be + performed either: the value of the provided X coordinate will be + interpreted in byte units instead of texel units. The result of + accessing a misaligned address is undefined. + + Usage of the STORE opcode is only allowed if the WR (writable) flag + is set. + + +Properties +^^^^^^^^^^^^^^^^^^^^^^^^ + + + Properties are general directives that apply to the whole TGSI program. + +FS_COORD_ORIGIN +""""""""""""""" + +Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin. +The default value is UPPER_LEFT. + +If UPPER_LEFT, the position will be (0,0) at the upper left corner and +increase downward and rightward. +If LOWER_LEFT, the position will be (0,0) at the lower left corner and +increase upward and rightward. + +OpenGL defaults to LOWER_LEFT, and is configurable with the +GL_ARB_fragment_coord_conventions extension. + +DirectX 9/10 use UPPER_LEFT. + +FS_COORD_PIXEL_CENTER +""""""""""""""""""""" + +Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention. +The default value is HALF_INTEGER. + +If HALF_INTEGER, the fractionary part of the position will be 0.5 +If INTEGER, the fractionary part of the position will be 0.0 + +Note that this does not affect the set of fragments generated by +rasterization, which is instead controlled by gl_rasterization_rules in the +rasterizer. + +OpenGL defaults to HALF_INTEGER, and is configurable with the +GL_ARB_fragment_coord_conventions extension. + +DirectX 9 uses INTEGER. +DirectX 10 uses HALF_INTEGER. + +FS_COLOR0_WRITES_ALL_CBUFS +"""""""""""""""""""""""""" +Specifies that writes to the fragment shader color 0 are replicated to all +bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where +fragData is directed to a single color buffer, but fragColor is broadcast. + +VS_PROHIBIT_UCPS +"""""""""""""""""""""""""" +If this property is set on the program bound to the shader stage before the +fragment shader, user clip planes should have no effect (be disabled) even if +that shader does not write to any clip distance outputs and the rasterizer's +clip_plane_enable is non-zero. +This property is only supported by drivers that also support shader clip +distance outputs. +This is useful for APIs that don't have UCPs and where clip distances written +by a shader cannot be disabled. + + +Texture Sampling and Texture Formats +------------------------------------ + +This table shows how texture image components are returned as (x,y,z,w) tuples +by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and +:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as +well. + ++--------------------+--------------+--------------------+--------------+ +| Texture Components | Gallium | OpenGL | Direct3D 9 | ++====================+==============+====================+==============+ +| R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) | ++--------------------+--------------+--------------------+--------------+ +| RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) | ++--------------------+--------------+--------------------+--------------+ +| RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) | ++--------------------+--------------+--------------------+--------------+ +| RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) | ++--------------------+--------------+--------------------+--------------+ +| A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) | ++--------------------+--------------+--------------------+--------------+ +| L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) | ++--------------------+--------------+--------------------+--------------+ +| LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) | ++--------------------+--------------+--------------------+--------------+ +| I | (i, i, i, i) | (i, i, i, i) | N/A | ++--------------------+--------------+--------------------+--------------+ +| UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) | +| | | [#envmap-bumpmap]_ | | ++--------------------+--------------+--------------------+--------------+ +| Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) | +| | | [#depth-tex-mode]_ | | ++--------------------+--------------+--------------------+--------------+ +| S | (s, s, s, s) | unknown | unknown | ++--------------------+--------------+--------------------+--------------+ + +.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt +.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z) + or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.