X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=src%2Fgallium%2Fdocs%2Fsource%2Ftgsi.rst;h=ecab7cb8097bba02831174a295918f804157915f;hb=1e6d51e805baa11eff17ea784c92ffc7933c56c5;hp=7e6dce995389a17b3624cbac9a3af49f8f0e6306;hpb=62ca7b85ae1f7d914156a9b376d0520db85ba495;p=mesa.git diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index 7e6dce99538..ecab7cb8097 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -19,12 +19,18 @@ Some instructions, like :opcode:`I2F`, permit re-interpretation of vector components as integers. Other instructions permit using registers as two-component vectors with double precision; see :ref:`Double Opcodes`. +When an instruction has a scalar result, the result is usually copied into +each of the components of *dst*. When this happens, the result is said to be +*replicated* to *dst*. :opcode:`RCP` is one such instruction. + Instruction Set --------------- -From GL_NV_vertex_program +Core ISA ^^^^^^^^^^^^^^^^^^^^^^^^^ +These opcodes are guaranteed to be available regardless of the driver being +used. .. opcode:: ARL - Address Register Load @@ -67,28 +73,20 @@ From GL_NV_vertex_program .. opcode:: RCP - Reciprocal -.. math:: - - dst.x = \frac{1}{src.x} - - dst.y = \frac{1}{src.x} +This instruction replicates its result. - dst.z = \frac{1}{src.x} +.. math:: - dst.w = \frac{1}{src.x} + dst = \frac{1}{src.x} .. opcode:: RSQ - Reciprocal Square Root -.. math:: - - dst.x = \frac{1}{\sqrt{|src.x|}} - - dst.y = \frac{1}{\sqrt{|src.x|}} +This instruction replicates its result. - dst.z = \frac{1}{\sqrt{|src.x|}} +.. math:: - dst.w = \frac{1}{\sqrt{|src.x|}} + dst = \frac{1}{\sqrt{|src.x|}} .. opcode:: EXP - Approximate Exponential Base 2 @@ -145,28 +143,20 @@ From GL_NV_vertex_program .. opcode:: DP3 - 3-component Dot Product -.. 
math:: - - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z - - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z +This instruction replicates its result. - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z +.. math:: - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z .. opcode:: DP4 - 4-component Dot Product -.. math:: - - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w - - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w +This instruction replicates its result. - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w +.. math:: - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w .. opcode:: DST - Distance Vector @@ -299,7 +289,7 @@ From GL_NV_vertex_program dst.w = src0.x \times src1.x + src0.y \times src1.y + src2.x -.. opcode:: FRAC - Fraction +.. opcode:: FRC - Fraction .. math:: @@ -327,7 +317,7 @@ From GL_NV_vertex_program .. opcode:: FLR - Floor -This is identical to ARL. +This is identical to :opcode:`ARL`. .. math:: @@ -355,41 +345,29 @@ This is identical to ARL. .. opcode:: EX2 - Exponential Base 2 -.. math:: - - dst.x = 2^{src.x} - - dst.y = 2^{src.x} +This instruction replicates its result. - dst.z = 2^{src.x} +.. math:: - dst.w = 2^{src.x} + dst = 2^{src.x} .. opcode:: LG2 - Logarithm Base 2 -.. math:: - - dst.x = \log_2{src.x} +This instruction replicates its result. - dst.y = \log_2{src.x} - - dst.z = \log_2{src.x} +.. math:: - dst.w = \log_2{src.x} + dst = \log_2{src.x} .. opcode:: POW - Power -.. math:: - - dst.x = src0.x^{src1.x} +This instruction replicates its result. 
- dst.y = src0.x^{src1.x} - - dst.z = src0.x^{src1.x} +.. math:: - dst.w = src0.x^{src1.x} + dst = src0.x^{src1.x} .. opcode:: XPD - Cross Product @@ -419,43 +397,31 @@ This is identical to ARL. .. opcode:: RCC - Reciprocal Clamped +This instruction replicates its result. + XXX cleanup on aisle three .. math:: - dst.x = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - - dst.y = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - - dst.z = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) - - dst.w = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) + dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020) .. opcode:: DPH - Homogeneous Dot Product -.. math:: - - dst.x = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w - - dst.y = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w +This instruction replicates its result. - dst.z = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w +.. math:: - dst.w = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w + dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src1.w .. opcode:: COS - Cosine -.. math:: +This instruction replicates its result. - dst.x = \cos{src.x} - - dst.y = \cos{src.x} - - dst.z = \cos{src.x} +.. math:: - dst.w = \cos{src.x} + dst = \cos{src.x} .. opcode:: DDX - Derivative Relative To X @@ -521,7 +487,9 @@ XXX cleanup on aisle three dst.w = 1 -Considered for removal. +.. note:: + + Considered for removal. .. opcode:: SEQ - Set On Equal @@ -539,17 +507,16 @@ Considered for removal. .. opcode:: SFL - Set On False -.. math:: +This instruction replicates its result. 
- dst.x = 0 +.. math:: - dst.y = 0 + dst = 0 - dst.z = 0 +.. note:: - dst.w = 0 + Considered for removal. -Considered for removal. .. opcode:: SGT - Set On Greater Than @@ -566,15 +533,11 @@ Considered for removal. .. opcode:: SIN - Sine -.. math:: - - dst.x = \sin{src.x} - - dst.y = \sin{src.x} +This instruction replicates its result. - dst.z = \sin{src.x} +.. math:: - dst.w = \sin{src.x} + dst = \sin{src.x} .. opcode:: SLE - Set On Less Equal Than @@ -605,15 +568,11 @@ Considered for removal. .. opcode:: STR - Set On True -.. math:: +This instruction replicates its result. - dst.x = 1 - - dst.y = 1 +.. math:: - dst.z = 1 - - dst.w = 1 + dst = 1 .. opcode:: TEX - Texture Lookup @@ -635,25 +594,33 @@ Considered for removal. TBD - Considered for removal. +.. note:: + + Considered for removal. .. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars TBD - Considered for removal. +.. note:: + + Considered for removal. .. opcode:: UP4B - Unpack Four Signed 8-Bit Values TBD - Considered for removal. +.. note:: + + Considered for removal. .. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars TBD - Considered for removal. +.. note:: + + Considered for removal. .. opcode:: X2D - 2D Coordinate Transformation @@ -667,18 +634,18 @@ Considered for removal. dst.w = src0.y + src1.x \times src2.z + src1.y \times src2.w -Considered for removal. +.. note:: - -From GL_NV_vertex_program2 -^^^^^^^^^^^^^^^^^^^^^^^^^^ + Considered for removal. .. opcode:: ARA - Address Register Add TBD - Considered for removal. +.. note:: + + Considered for removal. .. opcode:: ARR - Address Register Load With Round @@ -697,7 +664,9 @@ From GL_NV_vertex_program2 pc = target - Considered for removal. +.. note:: + + Considered for removal. .. opcode:: CAL - Subroutine Call @@ -793,15 +762,11 @@ From GL_NV_vertex_program2 .. opcode:: DP2 - 2-component Dot Product -.. 
math:: - - dst.x = src0.x \times src1.x + src0.y \times src1.y - - dst.y = src0.x \times src1.x + src0.y \times src1.y +This instruction replicates its result. - dst.z = src0.x \times src1.x + src0.y \times src1.y +.. math:: - dst.w = src0.x \times src1.x + src0.y \times src1.y + dst = src0.x \times src1.x + src0.y \times src1.y .. opcode:: TXL - Texture Lookup With LOD @@ -819,27 +784,6 @@ From GL_NV_vertex_program2 TBD -.. opcode:: BGNFOR - Begin a For-Loop - - dst.x = floor(src.x) - dst.y = floor(src.y) - dst.z = floor(src.z) - - if (dst.y <= 0) - pc = [matching ENDFOR] + 1 - endif - - Note: The destination must be a loop register. - The source must be a constant register. - - Considered for cleanup / removal. - - -.. opcode:: REP - Repeat - - TBD - - .. opcode:: ELSE - Else TBD @@ -850,24 +794,6 @@ From GL_NV_vertex_program2 TBD -.. opcode:: ENDFOR - End a For-Loop - - dst.x = dst.x + dst.z - dst.y = dst.y - 1.0 - - if (dst.y > 0) - pc = [matching BGNFOR instruction] + 1 - endif - - Note: The destination must be a loop register. - - Considered for cleanup / removal. - -.. opcode:: ENDREP - End Repeat - - TBD - - .. opcode:: PUSHA - Push Address Register On Stack push(src.x) @@ -875,7 +801,13 @@ From GL_NV_vertex_program2 push(src.z) push(src.w) - Considered for cleanup / removal. +.. note:: + + Considered for cleanup. + +.. note:: + + Considered for removal. .. opcode:: POPA - Pop Address Register From Stack @@ -884,14 +816,23 @@ From GL_NV_vertex_program2 dst.y = pop() dst.x = pop() - Considered for cleanup / removal. +.. note:: + + Considered for cleanup. +.. note:: -From GL_NV_gpu_program4 + Considered for removal. + + +Compute ISA ^^^^^^^^^^^^^^^^^^^^^^^^ +These opcodes are primarily provided for special-use computational shaders. Support for these opcodes indicated by a special pipe capability bit (TBD). +XXX so let's discuss it, yeah? + .. opcode:: CEIL - Ceiling .. 
math:: @@ -1049,10 +990,17 @@ Support for these opcodes indicated by a special pipe capability bit (TBD). TBD +.. note:: + + Support for CONT is determined by a special capability bit, + ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information. -From GL_NV_geometry_program4 + +Geometry ISA ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +These opcodes are only supported in geometry shaders; they have no meaning +in any other type of shader. .. opcode:: EMIT - Emit @@ -1064,9 +1012,11 @@ From GL_NV_geometry_program4 TBD -From GLSL +GLSL ISA ^^^^^^^^^^ +These opcodes are part of :term:`GLSL`'s opcode set. Support for these +opcodes is determined by a special capability bit, ``GLSL``. .. opcode:: BGNLOOP - Begin a Loop @@ -1095,20 +1045,17 @@ From GLSL .. opcode:: NRM4 - 4-component Vector Normalise -.. math:: - - dst.x = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} - - dst.y = \frac{src.y}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} +This instruction replicates its result. - dst.z = \frac{src.z}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} +.. math:: - dst.w = \frac{src.w}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} + dst = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w} ps_2_x ^^^^^^^^^^^^ +XXX wait what .. opcode:: CALLNZ - Subroutine Call If Not Zero @@ -1126,10 +1073,15 @@ ps_2_x .. _doubleopcodes: -Double Opcodes +Double ISA ^^^^^^^^^^^^^^^ -.. opcode:: DADD - Add Double +The double-precision opcodes reinterpret four-component vectors into +two-component vectors with doubled precision in each component. + +Support for these opcodes is XXX undecided. :T + +.. opcode:: DADD - Add .. math:: @@ -1138,7 +1090,7 @@ Double Opcodes dst.zw = src0.zw + src1.zw -.. opcode:: DDIV - Divide Double +.. opcode:: DDIV - Divide .. 
math:: @@ -1146,7 +1098,7 @@ Double Opcodes dst.zw = src0.zw / src1.zw -.. opcode:: DSEQ - Set Double on Equal +.. opcode:: DSEQ - Set on Equal .. math:: @@ -1154,7 +1106,7 @@ Double Opcodes dst.zw = src0.zw == src1.zw ? 1.0F : 0.0F -.. opcode:: DSLT - Set Double on Less than +.. opcode:: DSLT - Set on Less than .. math:: @@ -1162,7 +1114,7 @@ Double Opcodes dst.zw = src0.zw < src1.zw ? 1.0F : 0.0F -.. opcode:: DFRAC - Double Fraction +.. opcode:: DFRAC - Fraction .. math:: @@ -1171,23 +1123,33 @@ Double Opcodes dst.zw = src.zw - \lfloor src.zw\rfloor -.. opcode:: DFRACEXP - Convert Double Number to Fractional and Integral Components +.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components + +Like the ``frexp()`` routine in many math libraries, this opcode stores the +exponent of its source to ``dst0``, and the significand to ``dst1``, such that +:math:`dst1 \times 2^{dst0} = src` . .. math:: - dst0.xy = frexp(src.xy, dst1.xy) + dst0.xy = exp(src.xy) + + dst1.xy = frac(src.xy) + + dst0.zw = exp(src.zw) + + dst1.zw = frac(src.zw) - dst0.zw = frexp(src.zw, dst1.zw) +.. opcode:: DLDEXP - Multiply Number by Integral Power of 2 -.. opcode:: DLDEXP - Multiple Double Number by Integral Power of 2 +This opcode is the inverse of :opcode:`DFRACEXP`. .. math:: - dst.xy = ldexp(src0.xy, src1.xy) + dst.xy = src0.xy \times 2^{src1.xy} - dst.zw = ldexp(src0.zw, src1.zw) + dst.zw = src0.zw \times 2^{src1.zw} -.. opcode:: DMIN - Minimum Double +.. opcode:: DMIN - Minimum .. math:: @@ -1195,7 +1157,7 @@ Double Opcodes dst.zw = min(src0.zw, src1.zw) -.. opcode:: DMAX - Maximum Double +.. opcode:: DMAX - Maximum .. math:: @@ -1203,7 +1165,7 @@ Double Opcodes dst.zw = max(src0.zw, src1.zw) -.. opcode:: DMUL - Multiply Double +.. opcode:: DMUL - Multiply .. math:: @@ -1212,7 +1174,7 @@ Double Opcodes dst.zw = src0.zw \times src1.zw -.. opcode:: DMAD - Multiply And Add Doubles +.. opcode:: DMAD - Multiply And Add .. 
math:: @@ -1221,7 +1183,7 @@ Double Opcodes dst.zw = src0.zw \times src1.zw + src2.zw -.. opcode:: DRCP - Reciprocal Double +.. opcode:: DRCP - Reciprocal .. math:: @@ -1229,7 +1191,7 @@ Double Opcodes dst.zw = \frac{1}{src.zw} -.. opcode:: DSQRT - Square root double +.. opcode:: DSQRT - Square Root .. math:: @@ -1293,6 +1255,34 @@ Other tokens --------------- +Declaration +^^^^^^^^^^^ + + +Declares a register that is will be referenced as an operand in Instruction +tokens. + +File field contains register file that is being declared and is one +of TGSI_FILE. + +UsageMask field specifies which of the register components can be accessed +and is one of TGSI_WRITEMASK. + +Interpolate field is only valid for fragment shader INPUT register files. +It specifes the way input is being interpolated by the rasteriser and is one +of TGSI_INTERPOLATE. + +If Dimension flag is set to 1, a Declaration Dimension token follows. + +If Semantic flag is set to 1, a Declaration Semantic token follows. + +CylindricalWrap bitfield is only valid for fragment shader INPUT register +files. It specifies which register components should be subject to cylindrical +wrapping when interpolating by the rasteriser. If TGSI_CYLINDRICAL_WRAP_X +is set to 1, the X component should be interpolated according to cylindrical +wrapping rules. + + Declaration Semantic ^^^^^^^^^^^^^^^^^^^^^^^^ @@ -1365,10 +1355,8 @@ TGSI_SEMANTIC_PSIZE """"""""""""""""""" PSIZE, or point size, is used to specify point sizes per-vertex. It should -be in ``(p, n, x, f)`` format, where ``p`` is the point size, ``n`` is the minimum -size, ``x`` is the maximum size, and ``f`` is the fade threshold. - -XXX this is arb_vp. is this what we actually do? should double-check... +be in ``(s, 0, 0, 1)`` format, where ``s`` is the (possibly clamped) point size. +Only the first component matters when writing from the vertex shader. When using this semantic, be sure to set the appropriate state in the :ref:`rasterizer` first. 
@@ -1450,16 +1438,17 @@ DirectX 10 uses HALF_INTEGER.
 Texture Sampling and Texture Formats
 ------------------------------------
 
-This table shows how texture image components are returned as (x,y,z,w)
-tuples by TGSI texture instructions, such as TEX, TXD, and TXP.
-For reference, OpenGL and Direct3D conventions are shown as well.
+This table shows how texture image components are returned as (x,y,z,w) tuples
+by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
+:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
+well.
 
 +--------------------+--------------+--------------------+--------------+
 | Texture Components | Gallium      | OpenGL             | Direct3D 9   |
 +====================+==============+====================+==============+
-| R                  | XXX TBD      | (r, 0, 0, 1)       | (r, 1, 1, 1) |
+| R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
 +--------------------+--------------+--------------------+--------------+
-| RG                 | XXX TBD      | (r, g, 0, 1)       | (r, g, 1, 1) |
+| RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
 +--------------------+--------------+--------------------+--------------+
 | RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
 +--------------------+--------------+--------------------+--------------+
@@ -1482,4 +1471,4 @@ For reference, OpenGL and Direct3D conventions are shown as well.
 .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
 
 .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
-   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
+   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
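
The replication rule introduced by this patch ("the result is usually copied into each of the components of *dst*") and the :opcode:`DFRACEXP` / :opcode:`DLDEXP` pair can be sketched in C. This is only an illustration, not code from Mesa. ``struct vec4``, ``replicate``, ``rcp``, ``dfracexp_lane`` and ``dldexp_lane`` are hypothetical names, and the writemask bit layout (bit 0 = x through bit 3 = w) is an assumption. Only ``frexp()`` and ``ldexp()`` from the C standard library are real.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical four-component register; not a type defined by TGSI. */
struct vec4 {
    float c[4]; /* x, y, z, w */
};

/* Copy one scalar result into every component enabled in writemask.
 * Assumed layout: bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w.
 * Components that are not enabled keep their previous value. */
static void replicate(struct vec4 *dst, float scalar, unsigned writemask)
{
    assert(writemask <= 0xFu);
    for (int i = 0; i < 4; i++) {
        if (writemask & (1u << i))
            dst->c[i] = scalar;
    }
}

/* RCP as written in the patch: dst = 1 / src.x, replicated. */
static void rcp(struct vec4 *dst, const struct vec4 *src, unsigned writemask)
{
    replicate(dst, 1.0f / src->c[0], writemask);
}

/* One double-precision lane of DFRACEXP: the exponent goes to *dst0 and
 * the significand to *dst1, so that *dst1 * 2^(*dst0) equals src. */
static void dfracexp_lane(double src, int *dst0, double *dst1)
{
    *dst1 = frexp(src, dst0);
}

/* One double-precision lane of DLDEXP: src0 * 2^src1. */
static double dldexp_lane(double src0, int src1)
{
    return ldexp(src0, src1);
}
```

For example, calling ``rcp`` with a source whose x component is 4.0 and a full writemask leaves 0.25 in all four components of ``dst``. Likewise, ``dfracexp_lane(48.0, ...)`` yields exponent 6 and significand 0.75, which ``dldexp_lane`` maps back to 48.0.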