X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=src%2Fgallium%2Fdocs%2Fsource%2Ftgsi.rst;h=548a9a398556d79368c910a67be8f0b90b7736f8;hb=bb4c5d72d7c7cb1d9e7016e2c07c36875f30011a;hp=4c1f47ac67051c328a8c720dc979b116f96ae658;hpb=d323118c3ef1ed197e61e7a80e0ddafbe9e70ecb;p=mesa.git diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index 4c1f47ac670..548a9a39855 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -577,17 +577,56 @@ This instruction replicates its result. .. opcode:: TEX - Texture Lookup - TBD +.. math:: + + coord = src0 + + bias = 0.0 + dst = texture_sample(unit, coord, bias) + + for array textures src0.y contains the slice for 1D, + and src0.z contains the slice for 2D. + for shadow textures with no arrays, src0.z contains + the reference value. + for shadow textures with arrays, src0.z contains + the reference value for 1D arrays, and src0.w contains + the reference value for 2D arrays. + There is no way to pass a bias in the .w value for + shadow arrays, and GLSL doesn't allow this. + GLSL does allow cube shadow maps to take a bias value, + and we have to determine how this will look in TGSI. .. opcode:: TXD - Texture Lookup with Derivatives - TBD +.. math:: + + coord = src0 + + ddx = src1 + + ddy = src2 + + bias = 0.0 + + dst = texture_sample_deriv(unit, coord, bias, ddx, ddy) .. opcode:: TXP - Projective Texture Lookup - TBD +.. math:: + + coord.x = src0.x / src0.w + + coord.y = src0.y / src0.w + + coord.z = src0.z / src0.w + + coord.w = src0.w + + bias = 0.0 + + dst = texture_sample(unit, coord, bias) .. opcode:: UP2H - Unpack Two 16-Bit Floats @@ -678,8 +717,6 @@ This instruction replicates its result. pc = pop() - Potential restrictions: - * Only occurs at end of function. .. opcode:: SSG - Set Sign @@ -731,7 +768,19 @@ This instruction replicates its result. .. opcode:: TXB - Texture Lookup With Bias - TBD +.. 
math:: + + coord.x = src.x + + coord.y = src.y + + coord.z = src.z + + coord.w = 1.0 + + bias = src.w + + dst = texture_sample(unit, coord, bias) .. opcode:: NRM - 3-component Vector Normalise @@ -769,9 +818,21 @@ This instruction replicates its result. dst = src0.x \times src1.x + src0.y \times src1.y -.. opcode:: TXL - Texture Lookup With LOD +.. opcode:: TXL - Texture Lookup With explicit LOD - TBD +.. math:: + + coord.x = src0.x + + coord.y = src0.y + + coord.z = src0.z + + coord.w = 1.0 + + lod = src0.w + + dst = texture_sample(unit, coord, lod) .. opcode:: BRK - Break @@ -963,6 +1024,38 @@ XXX so let's discuss it, yeah? dst.w = src0.w \oplus src1.w +.. opcode:: UCMP - Integer Conditional Move + +.. math:: + + dst.x = src0.x ? src1.x : src2.x + + dst.y = src0.y ? src1.y : src2.y + + dst.z = src0.z ? src1.z : src2.z + + dst.w = src0.w ? src1.w : src2.w + + +.. opcode:: UARL - Integer Address Register Load + + Moves the contents of the source register, assumed to be an integer, into the + destination register, which is assumed to be an address (ADDR) register. + + +.. opcode:: IABS - Integer Absolute Value + +.. math:: + + dst.x = |src.x| + + dst.y = |src.y| + + dst.z = |src.z| + + dst.w = |src.w| + + .. opcode:: SAD - Sum Of Absolute Differences .. math:: @@ -976,14 +1069,33 @@ XXX so let's discuss it, yeah? dst.w = |src0.w - src1.w| + src2.w -.. opcode:: TXF - Texel Fetch +.. opcode:: TXF - Texel Fetch (as per NV_gpu_shader4), extract a single texel + from a specified texture image. The source sampler may + not be a CUBE or SHADOW. + src 0 is a four-component signed integer vector used to + identify the single texel accessed. 3 components + level. + src 1 is a 3 component constant signed integer vector, + with each component only having a range of + -8..+8 (hw only seems to deal with this range, interface + allows for up to unsigned int). + TXF(uint_vec coord, int_vec offset). - TBD +.. 
opcode:: TXQ - Texture Size Query (as per NV_gpu_program4) + retrieve the dimensions of the texture + depending on the target. For 1D (width), 2D/RECT/CUBE + (width, height), 3D (width, height, depth), + 1D array (width, layers), 2D array (width, height, layers) + +.. math:: -.. opcode:: TXQ - Texture Size Query + lod = src0 - TBD + dst.x = texture_width(unit, lod) + + dst.y = texture_height(unit, lod) + + dst.z = texture_depth(unit, lod) .. opcode:: CONT - Continue @@ -1200,6 +1312,421 @@ This opcode is the inverse of :opcode:`DFRACEXP`. dst.zw = \sqrt{src.zw} +.. _samplingopcodes: + +Resource Sampling Opcodes +^^^^^^^^^^^^^^^^^^^^^^^^^ + +These opcodes closely follow the semantics of the respective Direct3D +instructions. If in doubt, double-check the Direct3D documentation. + +.. opcode:: SAMPLE - Using the provided address, sample data from the + specified texture using the filtering mode identified + by the given sampler. The source data may come from + any resource type other than buffers. + SAMPLE dst, address, sampler_view, sampler + e.g. + SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0] + +.. opcode:: SAMPLE_I - Simplified alternative to the SAMPLE instruction. + Using the provided integer address, SAMPLE_I fetches data + from the specified sampler view without any filtering. + The source data may come from any resource type other + than CUBE. + SAMPLE_I dst, address, sampler_view + e.g. + SAMPLE_I TEMP[0], TEMP[1], SVIEW[0] + The 'address' is specified as unsigned integers. If the + 'address' is out of range [0...(# texels - 1)] the + result of the fetch is always 0 in all components. + As such the instruction doesn't honor address wrap + modes, in cases where that behavior is desirable, the + 'SAMPLE' instruction should be used. + address.w always provides an unsigned integer mipmap + level. If the value is out of the range then the + instruction always returns 0 in all components. + address.yz are ignored for buffers and 1d textures. 
+ address.z is ignored for 1d texture arrays and 2d + textures. + For 1D texture arrays address.y provides the array + index (also as unsigned integer). If the value is + out of the range of available array indices + [0... (array size - 1)] then the opcode always returns + 0 in all components. + For 2D texture arrays address.z provides the array + index, otherwise it exhibits the same behavior as in + the case for 1D texture arrays. + The exact semantics of the source address are presented + in the table below: + resource type X Y Z W + ------------- ------------------------ + PIPE_BUFFER x ignored + PIPE_TEXTURE_1D x mpl + PIPE_TEXTURE_2D x y mpl + PIPE_TEXTURE_3D x y z mpl + PIPE_TEXTURE_RECT x y mpl + PIPE_TEXTURE_CUBE not allowed as source + PIPE_TEXTURE_1D_ARRAY x idx mpl + PIPE_TEXTURE_2D_ARRAY x y idx mpl + + Where 'mpl' is a mipmap level and 'idx' is the + array index. + +.. opcode:: SAMPLE_I_MS - Just like SAMPLE_I but allows fetching data from + multi-sampled surfaces. + +.. opcode:: SAMPLE_B - Just like the SAMPLE instruction with the + exception that an additional bias is applied to the + level of detail computed as part of the instruction + execution. + SAMPLE_B dst, address, sampler_view, sampler, lod_bias + e.g. + SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x + +.. opcode:: SAMPLE_C - Similar to the SAMPLE instruction but it + performs a comparison filter. The operands to SAMPLE_C + are identical to SAMPLE, except that there is an additional + float32 operand, reference value, which must be a register + with single-component, or a scalar literal. + SAMPLE_C makes the hardware use the current sampler's + compare_func (in pipe_sampler_state) to compare + reference value against the red component value for the + source resource at each texel that the currently configured + texture filter covers based on the provided coordinates. + SAMPLE_C dst, address, sampler_view.r, sampler, ref_value + e.g. 
+ SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x + +.. opcode:: SAMPLE_C_LZ - Same as SAMPLE_C, but LOD is 0 and derivatives + are ignored. The LZ stands for level-zero. + SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value + e.g. + SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x + + +.. opcode:: SAMPLE_D - SAMPLE_D is identical to the SAMPLE opcode except + that the derivatives for the source address in the x + direction and the y direction are provided by extra + parameters. + SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y + e.g. + SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3] + +.. opcode:: SAMPLE_L - SAMPLE_L is identical to the SAMPLE opcode except + that the LOD is provided directly as a scalar value, + representing no anisotropy. The source address's A channel + is used as the LOD. + SAMPLE_L dst, address, sampler_view, sampler + e.g. + SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0] + +.. opcode:: GATHER4 - Gathers the four texels to be used in a bi-linear + filtering operation and packs them into a single register. + Only works with 2D, 2D array, cubemaps, and cubemap arrays. + For 2D textures, only the addressing modes of the sampler and + the top level of any mip pyramid are used. Set W to zero. + It behaves like the SAMPLE instruction, but a filtered + sample is not generated. The four samples that contribute + to filtering are placed into xyzw in counter-clockwise order, + starting with the (u,v) texture coordinate delta at the + following locations (-, +), (+, +), (+, -), (-, -), where + the magnitude of the deltas is half a texel. + + +.. opcode:: SVIEWINFO - query the dimensions of a given sampler view. + dst receives width, height, depth or array size and + number of mipmap levels. The dst can have a writemask + which specifies what info the caller is interested + in. + SVIEWINFO dst, src_mip_level, sampler_view + e.g. 
+ SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0] + src_mip_level is an unsigned integer scalar. If it's + out of range then returns 0 for width, height and + depth/array size but the total number of mipmaps is + still returned correctly for the given sampler view. + The returned width, height and depth values are for + the mipmap level selected by the src_mip_level and + are in the number of texels. + For 1d texture array width is in dst.x, array size + is in dst.y and dst.zw are always 0. + +.. opcode:: SAMPLE_POS - query the position of a given sample. + dst receives float4 (x, y, 0, 0) indicating where the + sample is located. If the resource is not a multi-sample + resource and not a render target, the result is 0. + +.. opcode:: SAMPLE_INFO - dst receives number of samples in x. + If the resource is not a multi-sample resource and + not a render target, the result is 0. + + +.. _resourceopcodes: + +Resource Access Opcodes +^^^^^^^^^^^^^^^^^^^^^^^ + +.. opcode:: LOAD - Fetch data from a shader resource + + Syntax: ``LOAD dst, resource, address`` + + Example: ``LOAD TEMP[0], RES[0], TEMP[1]`` + + Using the provided integer address, LOAD fetches data + from the specified buffer or texture without any + filtering. + + The 'address' is specified as a vector of unsigned + integers. If the 'address' is out of range the result + is unspecified. + + Only the first mipmap level of a resource can be read + from using this instruction. + + For 1D or 2D texture arrays, the array index is + provided as an unsigned integer in address.y or + address.z, respectively. address.yz are ignored for + buffers and 1D textures. address.z is ignored for 1D + texture arrays and 2D textures. address.w is always + ignored. + +.. opcode:: STORE - Write data to a shader resource + + Syntax: ``STORE resource, address, src`` + + Example: ``STORE RES[0], TEMP[0], TEMP[1]`` + + Using the provided integer address, STORE writes data + to the specified buffer or texture. 
+ + The 'address' is specified as a vector of unsigned + integers. If the 'address' is out of range the result + is unspecified. + + Only the first mipmap level of a resource can be + written to using this instruction. + + For 1D or 2D texture arrays, the array index is + provided as an unsigned integer in address.y or + address.z, respectively. address.yz are ignored for + buffers and 1D textures. address.z is ignored for 1D + texture arrays and 2D textures. address.w is always + ignored. + + +.. _threadsyncopcodes: + +Inter-thread synchronization opcodes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +These opcodes are intended for communication between threads running +within the same compute grid. For now they're only valid in compute +programs. + +.. opcode:: MFENCE - Memory fence + + Syntax: ``MFENCE resource`` + + Example: ``MFENCE RES[0]`` + + This opcode forces strong ordering between any memory access + operations that affect the specified resource. This means that + previous loads and stores (and only those) will be performed and + visible to other threads before the program execution continues. + + +.. opcode:: LFENCE - Load memory fence + + Syntax: ``LFENCE resource`` + + Example: ``LFENCE RES[0]`` + + Similar to MFENCE, but it only affects the ordering of memory loads. + + +.. opcode:: SFENCE - Store memory fence + + Syntax: ``SFENCE resource`` + + Example: ``SFENCE RES[0]`` + + Similar to MFENCE, but it only affects the ordering of memory stores. + + +.. opcode:: BARRIER - Thread group barrier + + ``BARRIER`` + + This opcode suspends the execution of the current thread until all + the remaining threads in the working group reach the same point of + the program. Results are unspecified if any of the remaining + threads terminates or never reaches an executed BARRIER instruction. + + +.. _atomopcodes: + +Atomic opcodes +^^^^^^^^^^^^^^ + +These opcodes provide atomic variants of some common arithmetic and +logical operations. 
In this context atomicity means that another +concurrent memory access operation that affects the same memory +location is guaranteed to be performed strictly before or after the +entire execution of the atomic operation. + +For the moment they're only valid in compute programs. + +.. opcode:: ATOMUADD - Atomic integer addition + + Syntax: ``ATOMUADD dst, resource, offset, src`` + + Example: ``ATOMUADD TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = dst_i + src_i + + +.. opcode:: ATOMXCHG - Atomic exchange + + Syntax: ``ATOMXCHG dst, resource, offset, src`` + + Example: ``ATOMXCHG TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = src_i + + +.. opcode:: ATOMCAS - Atomic compare-and-exchange + + Syntax: ``ATOMCAS dst, resource, offset, cmp, src`` + + Example: ``ATOMCAS TEMP[0], RES[0], TEMP[1], TEMP[2], TEMP[3]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i) + + +.. opcode:: ATOMAND - Atomic bitwise And + + Syntax: ``ATOMAND dst, resource, offset, src`` + + Example: ``ATOMAND TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = dst_i \& src_i + + +.. opcode:: ATOMOR - Atomic bitwise Or + + Syntax: ``ATOMOR dst, resource, offset, src`` + + Example: ``ATOMOR TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = dst_i | src_i + + +.. 
opcode:: ATOMXOR - Atomic bitwise Xor + + Syntax: ``ATOMXOR dst, resource, offset, src`` + + Example: ``ATOMXOR TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = dst_i \oplus src_i + + +.. opcode:: ATOMUMIN - Atomic unsigned minimum + + Syntax: ``ATOMUMIN dst, resource, offset, src`` + + Example: ``ATOMUMIN TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i < src_i ? dst_i : src_i) + + +.. opcode:: ATOMUMAX - Atomic unsigned maximum + + Syntax: ``ATOMUMAX dst, resource, offset, src`` + + Example: ``ATOMUMAX TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i > src_i ? dst_i : src_i) + + +.. opcode:: ATOMIMIN - Atomic signed minimum + + Syntax: ``ATOMIMIN dst, resource, offset, src`` + + Example: ``ATOMIMIN TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i < src_i ? dst_i : src_i) + + +.. opcode:: ATOMIMAX - Atomic signed maximum + + Syntax: ``ATOMIMAX dst, resource, offset, src`` + + Example: ``ATOMIMAX TEMP[0], RES[0], TEMP[1], TEMP[2]`` + + The following operation is performed atomically on each component: + +.. math:: + + dst_i = resource[offset]_i + + resource[offset]_i = (dst_i > src_i ? dst_i : src_i) + + + Explanation of symbols used ------------------------------ @@ -1268,19 +1795,19 @@ of TGSI_FILE. UsageMask field specifies which of the register components can be accessed and is one of TGSI_WRITEMASK. -Interpolate field is only valid for fragment shader INPUT register files. 
-It specifes the way input is being interpolated by the rasteriser and is one -of TGSI_INTERPOLATE. +The Local flag specifies that a given value isn't intended for +subroutine parameter passing and, as a result, the implementation +isn't required to give any guarantees of it being preserved across +subroutine boundaries. As it's merely a compiler hint, the +implementation is free to ignore it. If Dimension flag is set to 1, a Declaration Dimension token follows. If Semantic flag is set to 1, a Declaration Semantic token follows. -CylindricalWrap bitfield is only valid for fragment shader INPUT register -files. It specifies which register components should be subject to cylindrical -wrapping when interpolating by the rasteriser. If TGSI_CYLINDRICAL_WRAP_X -is set to 1, the X component should be interpolated according to cylindrical -wrapping rules. +If Interpolate flag is set to 1, a Declaration Interpolate token follows. + +If file is TGSI_FILE_RESOURCE, a Declaration Resource token follows. Declaration Semantic @@ -1417,6 +1944,72 @@ Edge flags are used to control which lines or points are actually drawn when the polygon mode converts triangles/quads/polygons into points or lines. +TGSI_SEMANTIC_STENCIL +"""""""""""""""""""""" + +For fragment shaders, this semantic label indicates that an output +is a writable stencil reference value. Only the Y component is writable. +This allows the fragment shader to change the fragment's stencilref value. + + +Declaration Interpolate +^^^^^^^^^^^^^^^^^^^^^^^ + +This token is only valid for fragment shader INPUT declarations. + +The Interpolate field specifies the way input is being interpolated by +the rasteriser and is one of TGSI_INTERPOLATE_*. + +The CylindricalWrap bitfield specifies which register components +should be subject to cylindrical wrapping when interpolating by the +rasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component +should be interpolated according to cylindrical wrapping rules. 
+ + +Declaration Sampler View +^^^^^^^^^^^^^^^^^^^^^^^^ + + Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW. + + DCL SVIEW[#], resource, type(s) + + Declares a shader input sampler view and assigns it to a SVIEW[#] + register. + + resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray. + + type must be 1 or 4 entries (if specifying on a per-component + level) out of UNORM, SNORM, SINT, UINT and FLOAT. + + +Declaration Resource +^^^^^^^^^^^^^^^^^^^^ + + Follows Declaration token if file is TGSI_FILE_RESOURCE. + + DCL RES[#], resource [, WR] [, RAW] + + Declares a shader input resource and assigns it to a RES[#] + register. + + resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and + 2DArray. + + If the RAW keyword is not specified, the texture data will be + subject to conversion, swizzling and scaling as required to yield + the specified data type from the physical data format of the bound + resource. + + If the RAW keyword is specified, no channel conversion will be + performed: the values read for each of the channels (X,Y,Z,W) will + correspond to consecutive words in the same order and format + they're found in memory. No element-to-address conversion will be + performed either: the value of the provided X coordinate will be + interpreted in byte units instead of texel units. The result of + accessing a misaligned address is undefined. + + Usage of the STORE opcode is only allowed if the WR (writable) flag + is set. Properties @@ -1460,6 +2053,22 @@ GL_ARB_fragment_coord_conventions extension. DirectX 9 uses INTEGER. DirectX 10 uses HALF_INTEGER. +FS_COLOR0_WRITES_ALL_CBUFS +"""""""""""""""""""""""""" +Specifies that writes to the fragment shader color 0 are replicated to all +bound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where +fragData is directed to a single color buffer, but fragColor is broadcast. 
+ +VS_PROHIBIT_UCPS +"""""""""""""""""""""""""" +If this property is set on the program bound to the shader stage before the +fragment shader, user clip planes should have no effect (be disabled) even if +that shader does not write to any clip distance outputs and the rasterizer's +clip_plane_enable is non-zero. +This property is only supported by drivers that also support shader clip +distance outputs. +This is useful for APIs that don't have UCPs and where clip distances written +by a shader cannot be disabled. Texture Sampling and Texture Formats @@ -1495,6 +2104,8 @@ well. | Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) | | | | [#depth-tex-mode]_ | | +--------------------+--------------+--------------------+--------------+ +| S | (s, s, s, s) | unknown | unknown | ++--------------------+--------------+--------------------+--------------+ .. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt .. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
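
The atomic opcode formulas added in this patch all share one shape: ``dst`` receives the value previously stored at ``resource[offset]``, and the resource then receives a new value computed from the old one and ``src``. The Python sketch below models that shape for a single component. It is only an illustration under stated assumptions: the function names and the plain list standing in for a resource are hypothetical, the code is single-threaded, and it mirrors the per-component math rather than any real atomicity or driver behaviour.

```python
# Illustrative single-component model of the atomic opcode formulas.
# Assumptions (not from the patch): a Python list stands in for the
# resource, and all helper names below are hypothetical.


def atomic_rmw(resource, offset, update):
    """Read-modify-write: return the old value and store update(old)."""
    old = resource[offset]            # dst_i = resource[offset]_i
    resource[offset] = update(old)    # resource[offset]_i = f(dst_i, ...)
    return old


def atomuadd(resource, offset, src):
    # resource[offset]_i = dst_i + src_i
    return atomic_rmw(resource, offset, lambda old: old + src)


def atomxchg(resource, offset, src):
    # resource[offset]_i = src_i
    return atomic_rmw(resource, offset, lambda old: src)


def atomcas(resource, offset, cmp, src):
    # resource[offset]_i = (dst_i == cmp_i ? src_i : dst_i)
    return atomic_rmw(resource, offset,
                      lambda old: src if old == cmp else old)


def atomumin(resource, offset, src):
    # resource[offset]_i = (dst_i < src_i ? dst_i : src_i)
    return atomic_rmw(resource, offset,
                      lambda old: old if old < src else src)


def atomumax(resource, offset, src):
    # resource[offset]_i = (dst_i > src_i ? dst_i : src_i)
    return atomic_rmw(resource, offset,
                      lambda old: old if old > src else src)
```

In every case the caller gets the old value back, which is what makes ``ATOMCAS`` usable as a building block: comparing the returned value with ``cmp`` tells the caller whether the exchange took place.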