Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
components as integers. Other instructions permit using registers as
-two-component vectors with double precision; see :ref:`Double Opcodes`.
+two-component vectors with double precision; see :ref:`doubleopcodes`.
When an instruction has a scalar result, the result is usually copied into
each of the components of *dst*. When this happens, the result is said to be
modifiers are supported (with absolute value being applied first).
TGSI_OPCODE_MOV is considered to have float input type for applying modifiers.
-For inputs which have signed type only the negate modifier is supported. This
-includes instructions which are otherwise ignorant if the type is signed or
-unsigned, such as TGSI_OPCODE_UADD.
-
-For inputs with unsigned type no modifiers are allowed.
+For inputs which have signed or unsigned type only the negate modifier is
+supported.
Instruction Set
---------------
.. math::
- dst.x = 1
-
- dst.y = max(src.x, 0)
-
- dst.z = (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0
-
- dst.w = 1
+ dst.x &= 1 \\
+ dst.y &= max(src.x, 0) \\
+ dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0 \\
+ dst.w &= 1
.. opcode:: RCP - Reciprocal
.. opcode:: RSQ - Reciprocal Square Root
-This instruction replicates its result.
+This instruction replicates its result. The results are undefined for src <= 0.
.. math::
- dst = \frac{1}{\sqrt{|src.x|}}
+ dst = \frac{1}{\sqrt{src.x}}
.. opcode:: SQRT - Square Root
-This instruction replicates its result.
+This instruction replicates its result. The results are undefined for src < 0.
.. math::
.. math::
- dst.x = 2^{\lfloor src.x\rfloor}
-
- dst.y = src.x - \lfloor src.x\rfloor
-
- dst.z = 2^{src.x}
-
- dst.w = 1
+ dst.x &= 2^{\lfloor src.x\rfloor} \\
+ dst.y &= src.x - \lfloor src.x\rfloor \\
+ dst.z &= 2^{src.x} \\
+ dst.w &= 1
.. opcode:: LOG - Approximate Logarithm Base 2
.. math::
- dst.x = \lfloor\log_2{|src.x|}\rfloor
-
- dst.y = \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}}
-
- dst.z = \log_2{|src.x|}
-
- dst.w = 1
+ dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
+ dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
+ dst.z &= \log_2{|src.x|} \\
+ dst.w &= 1
.. opcode:: MUL - Multiply
.. math::
- dst.x = 1
-
- dst.y = src0.y \times src1.y
-
- dst.z = src0.z
-
- dst.w = src1.w
+ dst.x &= 1\\
+ dst.y &= src0.y \times src1.y\\
+ dst.z &= src0.z\\
+ dst.w &= src1.w
.. opcode:: MIN - Minimum
.. math::
- dst.x = (src0.x < src1.x) ? 1 : 0
+ dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
- dst.y = (src0.y < src1.y) ? 1 : 0
+ dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
- dst.z = (src0.z < src1.z) ? 1 : 0
+ dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
- dst.w = (src0.w < src1.w) ? 1 : 0
+ dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
.. opcode:: SGE - Set On Greater Equal Than
.. math::
- dst.x = (src0.x >= src1.x) ? 1 : 0
+ dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
- dst.y = (src0.y >= src1.y) ? 1 : 0
+ dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
- dst.z = (src0.z >= src1.z) ? 1 : 0
+ dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
- dst.w = (src0.w >= src1.w) ? 1 : 0
+ dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
.. opcode:: MAD - Multiply And Add
.. math::
- dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.884467e+019) : clamp(1 / src.x, -1.884467e+019, -5.42101e-020)
+ dst = (1 / src.x) > 0 ? clamp(1 / src.x, 5.42101e-020, 1.84467e+019) : clamp(1 / src.x, -1.84467e+019, -5.42101e-020)
.. opcode:: DPH - Homogeneous Dot Product
dst.w = partialy(src.w)
-.. opcode:: KILP - Predicated Discard
-
- discard
-
-
.. opcode:: PK2H - Pack Two 16-bit Floats
TBD
.. math::
- dst.x = (src0.x == src1.x) ? 1 : 0
+ dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
- dst.y = (src0.y == src1.y) ? 1 : 0
+ dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
- dst.z = (src0.z == src1.z) ? 1 : 0
+ dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
- dst.w = (src0.w == src1.w) ? 1 : 0
+ dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
.. opcode:: SFL - Set On False
.. math::
- dst = 0
+ dst = 0.0F
.. note::
.. math::
- dst.x = (src0.x > src1.x) ? 1 : 0
+ dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
- dst.y = (src0.y > src1.y) ? 1 : 0
+ dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
- dst.z = (src0.z > src1.z) ? 1 : 0
+ dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
- dst.w = (src0.w > src1.w) ? 1 : 0
+ dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
.. opcode:: SIN - Sine
.. math::
- dst.x = (src0.x <= src1.x) ? 1 : 0
+ dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
- dst.y = (src0.y <= src1.y) ? 1 : 0
+ dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
- dst.z = (src0.z <= src1.z) ? 1 : 0
+ dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
- dst.w = (src0.w <= src1.w) ? 1 : 0
+ dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
.. opcode:: SNE - Set On Not Equal
.. math::
- dst.x = (src0.x != src1.x) ? 1 : 0
+ dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
- dst.y = (src0.y != src1.y) ? 1 : 0
+ dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
- dst.z = (src0.z != src1.z) ? 1 : 0
+ dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
- dst.w = (src0.w != src1.w) ? 1 : 0
+ dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
.. opcode:: STR - Set On True
.. math::
- dst = 1
+ dst = 1.0F
.. opcode:: TEX - Texture Lookup
-.. math::
-
- coord = src0
-
- bias = 0.0
-
- dst = texture_sample(unit, coord, bias)
-
for array textures src0.y contains the slice for 1D,
and src0.z contain the slice for 2D.
+
for shadow textures with no arrays, src0.z contains
the reference value.
+
for shadow textures with arrays, src0.z contains
the reference value for 1D arrays, and src0.w contains
the reference value for 2D arrays.
+
There is no way to pass a bias in the .w value for
shadow arrays, and GLSL doesn't allow this.
GLSL does allow cube shadows maps to take a bias value,
and we have to determine how this will look in TGSI.
+.. math::
+
+ coord = src0
+
+ bias = 0.0
+
+ dst = texture\_sample(unit, coord, bias)
+
.. opcode:: TXD - Texture Lookup with Derivatives
.. math::
bias = 0.0
- dst = texture_sample_deriv(unit, coord, bias, ddx, ddy)
+ dst = texture\_sample\_deriv(unit, coord, bias, ddx, ddy)
.. opcode:: TXP - Projective Texture Lookup
bias = 0.0
- dst = texture_sample(unit, coord, bias)
+ dst = texture\_sample(unit, coord, bias)
.. opcode:: UP2H - Unpack Two 16-Bit Floats
dst.w = round(src.w)
-.. opcode:: BRA - Branch
-
- pc = target
-
-.. note::
-
- Considered for removal.
-
-.. opcode:: CAL - Subroutine Call
-
- push(pc)
- pc = target
-
-
-.. opcode:: RET - Subroutine Call Return
-
- pc = pop()
-
-
.. opcode:: SSG - Set Sign
.. math::
dst.w = (src0.w < 0) ? src1.w : src2.w
-.. opcode:: KIL - Conditional Discard
+.. opcode:: KILL_IF - Conditional Discard
+
+ Conditional discard. Allowed in fragment shaders only.
.. math::
endif
+.. opcode:: KILL - Discard
+
+ Unconditional discard. Allowed in fragment shaders only.
+
+
.. opcode:: SCS - Sine Cosine
.. math::
bias = src.z
- dst = texture_sample(unit, coord, bias)
+ dst = texture\_sample(unit, coord, bias)
.. opcode:: NRM - 3-component Vector Normalise
lod = src0.w
- dst = texture_sample(unit, coord, lod)
+ dst = texture\_sample(unit, coord, lod)
-.. opcode:: BRK - Break
+.. opcode:: PUSHA - Push Address Register On Stack
- TBD
+ push(src.x)
+ push(src.y)
+ push(src.z)
+ push(src.w)
+.. note::
-.. opcode:: IF - Float If
+ Considered for cleanup.
- Start an IF ... ELSE .. ENDIF block. Condition evaluates to true if
+.. note::
- src0.x != 0.0
+ Considered for removal.
- where src0.x is interpreted as a floating point register.
+.. opcode:: POPA - Pop Address Register From Stack
+ dst.w = pop()
+ dst.z = pop()
+ dst.y = pop()
+ dst.x = pop()
-.. opcode:: UIF - Bitwise If
+.. note::
- Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
+ Considered for cleanup.
- src0.x != 0
+.. note::
- where src0.x is interpreted as an integer register.
+ Considered for removal.
+
+
+.. opcode:: BRA - Branch
+
+ pc = target
+
+.. note::
+
+ Considered for removal.
+
+
+.. opcode:: CALLNZ - Subroutine Call If Not Zero
+
+ TBD
+
+.. note::
+
+ Considered for cleanup.
+
+.. note::
+
+ Considered for removal.
+
+
+Compute ISA
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+These opcodes are primarily provided for special-use computational shaders.
+Support for these opcodes is indicated by a special pipe capability bit (TBD).
+
+XXX doesn't look like most of the opcodes really belong here.
+
+.. opcode:: CEIL - Ceiling
+
+.. math::
+
+ dst.x = \lceil src.x\rceil
+
+ dst.y = \lceil src.y\rceil
+
+ dst.z = \lceil src.z\rceil
+
+ dst.w = \lceil src.w\rceil
+
+
+.. opcode:: TRUNC - Truncate
+
+.. math::
+
+ dst.x = trunc(src.x)
+
+ dst.y = trunc(src.y)
+
+ dst.z = trunc(src.z)
+
+ dst.w = trunc(src.w)
+
+
+.. opcode:: MOD - Modulus
+
+.. math::
+
+ dst.x = src0.x \bmod src1.x
+
+ dst.y = src0.y \bmod src1.y
+
+ dst.z = src0.z \bmod src1.z
+
+ dst.w = src0.w \bmod src1.w
+
+
+.. opcode:: UARL - Integer Address Register Load
+
+ Moves the contents of the source register, assumed to be an integer, into the
+ destination register, which is assumed to be an address (ADDR) register.
+
+
+.. opcode:: SAD - Sum Of Absolute Differences
+
+.. math::
+
+ dst.x = |src0.x - src1.x| + src2.x
+
+ dst.y = |src0.y - src1.y| + src2.y
+
+ dst.z = |src0.z - src1.z| + src2.z
+
+ dst.w = |src0.w - src1.w| + src2.w
+
+
+.. opcode:: TXF - Texel Fetch
+
+ As per NV_gpu_shader4, extract a single texel from a specified texture
+ image. The source sampler may not be a CUBE or SHADOW. src 0 is a
+ four-component signed integer vector used to identify the single texel
+ accessed (3 components + level). src 1 is a 3-component constant signed
+ integer vector, with each component having a range of only -8..+8 (hw only
+ seems to deal with this range; the interface allows for up to unsigned int).
+ TXF(uint_vec coord, int_vec offset).
+
+
+.. opcode:: TXQ - Texture Size Query
+
+ As per NV_gpu_program4, retrieve the dimensions of the texture depending on
+ the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
+ depth), 1D array (width, layers), 2D array (width, height, layers).
+ Also return the number of accessible levels (last_level - first_level + 1)
+ in W.
+
+.. math::
+
+ lod = src0.x
+
+ dst.x = texture\_width(unit, lod)
+
+ dst.y = texture\_height(unit, lod)
+
+ dst.z = texture\_depth(unit, lod)
+
+ dst.w = texture\_levels(unit)
+
+.. opcode:: TG4 - Texture Gather
+
+ As per ARB_texture_gather, gathers the four texels to be used in a bi-linear
+ filtering operation and packs them into a single register. Only works with
+ 2D, 2D array, cubemaps, and cubemap arrays. For 2D textures, only the
+ addressing modes of the sampler and the top level of any mip pyramid are
+ used. Set W to zero. It behaves like the TEX instruction, but a filtered
+ sample is not generated. The four samples that contribute to filtering are
+ placed into xyzw in clockwise order, starting with the (u,v) texture
+ coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -),
+ where the magnitude of the deltas is half a texel.
+
+ PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
+ depth compares, single component selection, and a non-constant offset. It
+ doesn't allow support for the GL independent offset to get i0,j0. This would
+ require another CAP if hw can do it natively. For now we lower that before
+ TGSI.
+
+.. math::
+
+ coord = src0
+
+ component = src1
+
+ dst = texture\_gather4 (unit, coord, component)
+
+(with SM5 - cube array shadow)
+
+.. math::
+
+ coord = src0
+
+ compare = src1
+
+ dst = texture\_gather (unit, coord, compare)
+
+.. opcode:: LODQ - Level Of Detail Query
+
+ Compute the LOD information that the texture pipe would use to access the
+ texture. The Y component contains the computed LOD lambda_prime. The X
+ component contains the LOD that will be accessed, based on min/max lod's
+ and mipmap filters.
+
+.. math::
+
+ coord = src0
+
+ dst.xy = lodq(unit, coord)
+
+Integer ISA
+^^^^^^^^^^^^^^^^^^^^^^^^
+These opcodes are used for integer operations.
+Support for these opcodes is indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
+
+
+.. opcode:: I2F - Signed Integer To Float
+
+ Rounding is unspecified (round to nearest even suggested).
+
+.. math::
+
+ dst.x = (float) src.x
+
+ dst.y = (float) src.y
+
+ dst.z = (float) src.z
+
+ dst.w = (float) src.w
+
+
+.. opcode:: U2F - Unsigned Integer To Float
+
+ Rounding is unspecified (round to nearest even suggested).
+
+.. math::
+
+ dst.x = (float) src.x
+
+ dst.y = (float) src.y
+
+ dst.z = (float) src.z
+
+ dst.w = (float) src.w
+
+
+.. opcode:: F2I - Float to Signed Integer
+
+ Rounding is towards zero (truncate).
+ Values outside signed range (including NaNs) produce undefined results.
+
+.. math::
+
+ dst.x = (int) src.x
+
+ dst.y = (int) src.y
+
+ dst.z = (int) src.z
+
+ dst.w = (int) src.w
+
+
+.. opcode:: F2U - Float to Unsigned Integer
+
+ Rounding is towards zero (truncate).
+ Values outside unsigned range (including NaNs) produce undefined results.
+
+.. math::
+
+ dst.x = (unsigned) src.x
+
+ dst.y = (unsigned) src.y
+
+ dst.z = (unsigned) src.z
+
+ dst.w = (unsigned) src.w
+
+
+.. opcode:: UADD - Integer Add
+
+ This instruction works the same for signed and unsigned integers.
+ The low 32 bits of the result are returned.
+
+.. math::
+
+ dst.x = src0.x + src1.x
+
+ dst.y = src0.y + src1.y
+
+ dst.z = src0.z + src1.z
+
+ dst.w = src0.w + src1.w
+
+
+.. opcode:: UMAD - Integer Multiply And Add
+
+ This instruction works the same for signed and unsigned integers.
+ The multiplication returns the low 32 bits (as does the result itself).
+
+.. math::
+
+ dst.x = src0.x \times src1.x + src2.x
+
+ dst.y = src0.y \times src1.y + src2.y
+
+ dst.z = src0.z \times src1.z + src2.z
+
+ dst.w = src0.w \times src1.w + src2.w
+
+
+.. opcode:: UMUL - Integer Multiply
+
+ This instruction works the same for signed and unsigned integers.
+ The low 32 bits of the result are returned.
+
+.. math::
+
+ dst.x = src0.x \times src1.x
+
+ dst.y = src0.y \times src1.y
+
+ dst.z = src0.z \times src1.z
+
+ dst.w = src0.w \times src1.w
+
+
+.. opcode:: IMUL_HI - Signed Integer Multiply High Bits
+
+ The high 32 bits of the multiplication of 2 signed integers are returned.
+
+.. math::
+
+ dst.x = (src0.x \times src1.x) >> 32
+
+ dst.y = (src0.y \times src1.y) >> 32
+
+ dst.z = (src0.z \times src1.z) >> 32
+
+ dst.w = (src0.w \times src1.w) >> 32
+
+
+.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
+
+ The high 32 bits of the multiplication of 2 unsigned integers are returned.
+
+.. math::
+
+ dst.x = (src0.x \times src1.x) >> 32
+
+ dst.y = (src0.y \times src1.y) >> 32
+
+ dst.z = (src0.z \times src1.z) >> 32
+
+ dst.w = (src0.w \times src1.w) >> 32
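
The difference between the low-bits and high-bits multiplies can be sketched in Python. This is an illustrative model only, not driver code: the helper names (``to_signed32``, ``umul``, ``umul_hi``, ``imul_hi``) are hypothetical, and registers are modelled as plain integers holding 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF  # model of a 32-bit register component


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def umul(a, b):
    """Low 32 bits of the product; identical for signed and unsigned inputs."""
    return (a * b) & MASK32


def umul_hi(a, b):
    """High 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def imul_hi(a, b):
    """High 32 bits of the 64-bit product of two signed 32-bit values.

    The result is returned as a 32-bit pattern.
    """
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32
```

The same bit pattern gives different high words depending on signedness: ``0xFFFFFFFF`` is 4294967295 for UMUL_HI but -1 for IMUL_HI, while the low word from UMUL is the same either way.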
+
+
+.. opcode:: IDIV - Signed Integer Division
+
+ TBD: behavior for division by zero.
+
+.. math::
+
+ dst.x = \frac{src0.x}{src1.x}
+
+ dst.y = \frac{src0.y}{src1.y}
+
+ dst.z = \frac{src0.z}{src1.z}
+
+ dst.w = \frac{src0.w}{src1.w}
+
+
+.. opcode:: UDIV - Unsigned Integer Division
+
+ For division by zero, 0xffffffff is returned.
+
+.. math::
+
+ dst.x = \frac{src0.x}{src1.x}
+
+ dst.y = \frac{src0.y}{src1.y}
+
+ dst.z = \frac{src0.z}{src1.z}
+
+ dst.w = \frac{src0.w}{src1.w}
+
+
+.. opcode:: UMOD - Unsigned Integer Remainder
+
+ If second arg is zero, 0xffffffff is returned.
+
+.. math::
+
+ dst.x = src0.x \bmod src1.x
+
+ dst.y = src0.y \bmod src1.y
+
+ dst.z = src0.z \bmod src1.z
+
+ dst.w = src0.w \bmod src1.w
+
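
The division-by-zero rule stated for UDIV and UMOD can be modelled with a short Python sketch. The function names are hypothetical and the code only illustrates the documented results for one component; IDIV is left out because its behavior for division by zero is still TBD.

```python
MASK32 = 0xFFFFFFFF  # model of a 32-bit register component


def udiv(a, b):
    """Unsigned 32-bit division; a zero divisor yields 0xffffffff."""
    a &= MASK32
    b &= MASK32
    return MASK32 if b == 0 else a // b


def umod(a, b):
    """Unsigned 32-bit remainder; a zero divisor yields 0xffffffff."""
    a &= MASK32
    b &= MASK32
    return MASK32 if b == 0 else a % b
```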
+
+.. opcode:: NOT - Bitwise Not
+
+.. math::
+
+ dst.x = \sim src.x
+
+ dst.y = \sim src.y
+
+ dst.z = \sim src.z
+
+ dst.w = \sim src.w
+
+
+.. opcode:: AND - Bitwise And
+
+.. math::
+
+ dst.x = src0.x \& src1.x
+
+ dst.y = src0.y \& src1.y
+
+ dst.z = src0.z \& src1.z
+
+ dst.w = src0.w \& src1.w
+
+
+.. opcode:: OR - Bitwise Or
+
+.. math::
+
+ dst.x = src0.x | src1.x
+
+ dst.y = src0.y | src1.y
+
+ dst.z = src0.z | src1.z
+
+ dst.w = src0.w | src1.w
+
+
+.. opcode:: XOR - Bitwise Xor
+
+.. math::
+
+ dst.x = src0.x \oplus src1.x
+
+ dst.y = src0.y \oplus src1.y
+
+ dst.z = src0.z \oplus src1.z
+
+ dst.w = src0.w \oplus src1.w
+
+
+.. opcode:: IMAX - Maximum of Signed Integers
+
+.. math::
+
+ dst.x = max(src0.x, src1.x)
+
+ dst.y = max(src0.y, src1.y)
+
+ dst.z = max(src0.z, src1.z)
+
+ dst.w = max(src0.w, src1.w)
+
+
+.. opcode:: UMAX - Maximum of Unsigned Integers
+
+.. math::
+
+ dst.x = max(src0.x, src1.x)
+
+ dst.y = max(src0.y, src1.y)
+
+ dst.z = max(src0.z, src1.z)
+
+ dst.w = max(src0.w, src1.w)
+
+
+.. opcode:: IMIN - Minimum of Signed Integers
+
+.. math::
+
+ dst.x = min(src0.x, src1.x)
+
+ dst.y = min(src0.y, src1.y)
+
+ dst.z = min(src0.z, src1.z)
+
+ dst.w = min(src0.w, src1.w)
+
+
+.. opcode:: UMIN - Minimum of Unsigned Integers
+
+.. math::
+
+ dst.x = min(src0.x, src1.x)
+
+ dst.y = min(src0.y, src1.y)
+
+ dst.z = min(src0.z, src1.z)
+
+ dst.w = min(src0.w, src1.w)
+
+
+.. opcode:: SHL - Shift Left
+
+ The shift count is masked with 0x1f before the shift is applied.
+
+.. math::
+
+ dst.x = src0.x << (0x1f \& src1.x)
+
+ dst.y = src0.y << (0x1f \& src1.y)
+
+ dst.z = src0.z << (0x1f \& src1.z)
+
+ dst.w = src0.w << (0x1f \& src1.w)
+
+
+.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
+
+ The shift count is masked with 0x1f before the shift is applied.
+
+.. math::
+
+ dst.x = src0.x >> (0x1f \& src1.x)
+
+ dst.y = src0.y >> (0x1f \& src1.y)
+
+ dst.z = src0.z >> (0x1f \& src1.z)
+
+ dst.w = src0.w >> (0x1f \& src1.w)
+
+
+.. opcode:: USHR - Logical Shift Right
+
+ The shift count is masked with 0x1f before the shift is applied.
+.. math::
-.. opcode:: ELSE - Else
+ dst.x = src0.x >> (unsigned) (0x1f \& src1.x)
- Starts an else block, after an IF or UIF statement.
+ dst.y = src0.y >> (unsigned) (0x1f \& src1.y)
+ dst.z = src0.z >> (unsigned) (0x1f \& src1.z)
-.. opcode:: ENDIF - End If
+ dst.w = src0.w >> (unsigned) (0x1f \& src1.w)
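
A minimal Python sketch of the shift-count masking described for SHL, ISHR and USHR follows. The helper names are hypothetical; values are modelled as 32-bit patterns, and the only difference between ISHR and USHR in this model is whether the sign bit is replicated.

```python
MASK32 = 0xFFFFFFFF  # model of a 32-bit register component


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def shl(value, count):
    """Shift left; only the low 5 bits of the count are used."""
    return (value << (count & 0x1f)) & MASK32


def ishr(value, count):
    """Arithmetic shift right: the sign bit is replicated."""
    return (to_signed32(value) >> (count & 0x1f)) & MASK32


def ushr(value, count):
    """Logical shift right: zeros are shifted in from the top."""
    return (value & MASK32) >> (count & 0x1f)
```

Because of the mask, a count of 32 behaves like a count of 0 and a count of 33 like a count of 1.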
- Ends an IF or UIF block.
+.. opcode:: UCMP - Integer Conditional Move
-.. opcode:: PUSHA - Push Address Register On Stack
+.. math::
- push(src.x)
- push(src.y)
- push(src.z)
- push(src.w)
+ dst.x = src0.x ? src1.x : src2.x
-.. note::
+ dst.y = src0.y ? src1.y : src2.y
- Considered for cleanup.
+ dst.z = src0.z ? src1.z : src2.z
-.. note::
+ dst.w = src0.w ? src1.w : src2.w
- Considered for removal.
-.. opcode:: POPA - Pop Address Register From Stack
- dst.w = pop()
- dst.z = pop()
- dst.y = pop()
- dst.x = pop()
+.. opcode:: ISSG - Integer Set Sign
-.. note::
+.. math::
- Considered for cleanup.
+ dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
-.. note::
+ dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
- Considered for removal.
+ dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
+ dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
-Compute ISA
-^^^^^^^^^^^^^^^^^^^^^^^^
-These opcodes are primarily provided for special-use computational shaders.
-Support for these opcodes indicated by a special pipe capability bit (TBD).
-XXX so let's discuss it, yeah?
+.. opcode:: FSLT - Float Set On Less Than (ordered)
-.. opcode:: CEIL - Ceiling
+ Same comparison as SLT but returns integer instead of 1.0/0.0 float
.. math::
- dst.x = \lceil src.x\rceil
+ dst.x = (src0.x < src1.x) ? \sim 0 : 0
- dst.y = \lceil src.y\rceil
+ dst.y = (src0.y < src1.y) ? \sim 0 : 0
- dst.z = \lceil src.z\rceil
+ dst.z = (src0.z < src1.z) ? \sim 0 : 0
- dst.w = \lceil src.w\rceil
+ dst.w = (src0.w < src1.w) ? \sim 0 : 0
-.. opcode:: I2F - Integer To Float
+.. opcode:: ISLT - Signed Integer Set On Less Than
.. math::
- dst.x = (float) src.x
+ dst.x = (src0.x < src1.x) ? \sim 0 : 0
- dst.y = (float) src.y
+ dst.y = (src0.y < src1.y) ? \sim 0 : 0
- dst.z = (float) src.z
+ dst.z = (src0.z < src1.z) ? \sim 0 : 0
- dst.w = (float) src.w
+ dst.w = (src0.w < src1.w) ? \sim 0 : 0
-.. opcode:: NOT - Bitwise Not
+.. opcode:: USLT - Unsigned Integer Set On Less Than
.. math::
- dst.x = ~src.x
+ dst.x = (src0.x < src1.x) ? \sim 0 : 0
- dst.y = ~src.y
+ dst.y = (src0.y < src1.y) ? \sim 0 : 0
- dst.z = ~src.z
+ dst.z = (src0.z < src1.z) ? \sim 0 : 0
- dst.w = ~src.w
+ dst.w = (src0.w < src1.w) ? \sim 0 : 0
-.. opcode:: TRUNC - Truncate
+.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
+
+ Same comparison as SGE but returns integer instead of 1.0/0.0 float
.. math::
- dst.x = trunc(src.x)
+ dst.x = (src0.x >= src1.x) ? \sim 0 : 0
- dst.y = trunc(src.y)
+ dst.y = (src0.y >= src1.y) ? \sim 0 : 0
- dst.z = trunc(src.z)
+ dst.z = (src0.z >= src1.z) ? \sim 0 : 0
- dst.w = trunc(src.w)
+ dst.w = (src0.w >= src1.w) ? \sim 0 : 0
-.. opcode:: SHL - Shift Left
+.. opcode:: ISGE - Signed Integer Set On Greater Equal Than
.. math::
- dst.x = src0.x << src1.x
+ dst.x = (src0.x >= src1.x) ? \sim 0 : 0
- dst.y = src0.y << src1.x
+ dst.y = (src0.y >= src1.y) ? \sim 0 : 0
- dst.z = src0.z << src1.x
+ dst.z = (src0.z >= src1.z) ? \sim 0 : 0
- dst.w = src0.w << src1.x
+ dst.w = (src0.w >= src1.w) ? \sim 0 : 0
-.. opcode:: SHR - Shift Right
+.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
.. math::
- dst.x = src0.x >> src1.x
+ dst.x = (src0.x >= src1.x) ? \sim 0 : 0
- dst.y = src0.y >> src1.x
+ dst.y = (src0.y >= src1.y) ? \sim 0 : 0
- dst.z = src0.z >> src1.x
+ dst.z = (src0.z >= src1.z) ? \sim 0 : 0
- dst.w = src0.w >> src1.x
+ dst.w = (src0.w >= src1.w) ? \sim 0 : 0
-.. opcode:: AND - Bitwise And
+.. opcode:: FSEQ - Float Set On Equal (ordered)
+
+ Same comparison as SEQ but returns integer instead of 1.0/0.0 float
.. math::
- dst.x = src0.x & src1.x
+ dst.x = (src0.x == src1.x) ? \sim 0 : 0
- dst.y = src0.y & src1.y
+ dst.y = (src0.y == src1.y) ? \sim 0 : 0
- dst.z = src0.z & src1.z
+ dst.z = (src0.z == src1.z) ? \sim 0 : 0
- dst.w = src0.w & src1.w
+ dst.w = (src0.w == src1.w) ? \sim 0 : 0
-.. opcode:: OR - Bitwise Or
+.. opcode:: USEQ - Integer Set On Equal
.. math::
- dst.x = src0.x | src1.x
+ dst.x = (src0.x == src1.x) ? \sim 0 : 0
- dst.y = src0.y | src1.y
+ dst.y = (src0.y == src1.y) ? \sim 0 : 0
- dst.z = src0.z | src1.z
+ dst.z = (src0.z == src1.z) ? \sim 0 : 0
- dst.w = src0.w | src1.w
+ dst.w = (src0.w == src1.w) ? \sim 0 : 0
-.. opcode:: MOD - Modulus
+.. opcode:: FSNE - Float Set On Not Equal (unordered)
+
+ Same comparison as SNE but returns integer instead of 1.0/0.0 float
.. math::
- dst.x = src0.x \bmod src1.x
+ dst.x = (src0.x != src1.x) ? \sim 0 : 0
- dst.y = src0.y \bmod src1.y
+ dst.y = (src0.y != src1.y) ? \sim 0 : 0
- dst.z = src0.z \bmod src1.z
+ dst.z = (src0.z != src1.z) ? \sim 0 : 0
- dst.w = src0.w \bmod src1.w
+ dst.w = (src0.w != src1.w) ? \sim 0 : 0
-.. opcode:: XOR - Bitwise Xor
+.. opcode:: USNE - Integer Set On Not Equal
.. math::
- dst.x = src0.x \oplus src1.x
-
- dst.y = src0.y \oplus src1.y
-
- dst.z = src0.z \oplus src1.z
+ dst.x = (src0.x != src1.x) ? \sim 0 : 0
- dst.w = src0.w \oplus src1.w
+ dst.y = (src0.y != src1.y) ? \sim 0 : 0
+ dst.z = (src0.z != src1.z) ? \sim 0 : 0
-.. opcode:: UCMP - Integer Conditional Move
+ dst.w = (src0.w != src1.w) ? \sim 0 : 0
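
The two result conventions used by the comparison opcodes (``1.0F``/``0.0F`` for SLT and friends, ``~0``/``0`` for the FS*/IS*/US* family) can be compared at bit level with a small Python sketch. The function names are hypothetical, and NaN handling simply follows Python's IEEE comparison semantics, which matches the "ordered"/"unordered" annotations above.

```python
import struct

TRUE_INT = 0xFFFFFFFF  # ~0 as a 32-bit pattern


def float_bits(f):
    """Return the IEEE-754 single-precision bit pattern of f."""
    return struct.unpack("<I", struct.pack("<f", f))[0]


def slt(a, b):
    """SLT-style result: bit pattern of 1.0F or 0.0F."""
    return float_bits(1.0 if a < b else 0.0)


def fslt(a, b):
    """FSLT-style result: all bits set or all bits clear."""
    return TRUE_INT if a < b else 0


def fsne(a, b):
    """FSNE-style result; true when either operand is NaN (unordered)."""
    return TRUE_INT if a != b else 0
```

The integer-style result can be fed directly into bitwise opcodes such as AND or into UCMP, whereas the float-style result is a numeric 1.0 whose bit pattern is ``0x3f800000``.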
-.. math::
- dst.x = src0.x ? src1.x : src2.x
+.. opcode:: INEG - Integer Negate
- dst.y = src0.y ? src1.y : src2.y
+ Two's complement.
- dst.z = src0.z ? src1.z : src2.z
+.. math::
- dst.w = src0.w ? src1.w : src2.w
+ dst.x = -src.x
+ dst.y = -src.y
-.. opcode:: UARL - Integer Address Register Load
+ dst.z = -src.z
- Moves the contents of the source register, assumed to be an integer, into the
- destination register, which is assumed to be an address (ADDR) register.
+ dst.w = -src.w
.. opcode:: IABS - Integer Absolute Value
dst.w = |src.w|
+Bitwise ISA
+^^^^^^^^^^^
+These opcodes are used for bit-level manipulation of integers.
-.. opcode:: SAD - Sum Of Absolute Differences
+.. opcode:: IBFE - Signed Bitfield Extract
-.. math::
+ See SM5 instruction of the same name. Extracts a set of bits from the input,
+ and sign-extends them if the high bit of the extracted window is set.
- dst.x = |src0.x - src1.x| + src2.x
+ Pseudocode::
- dst.y = |src0.y - src1.y| + src2.y
+ def ibfe(value, offset, bits):
+   offset = offset & 0x1f
+   bits = bits & 0x1f
+   if bits == 0: return 0
+   # Note: >> sign-extends
+   if bits + offset < 32:
+     return (value << (32 - offset - bits)) >> (32 - bits)
+   else:
+     return value >> offset
- dst.z = |src0.z - src1.z| + src2.z
+.. opcode:: UBFE - Unsigned Bitfield Extract
- dst.w = |src0.w - src1.w| + src2.w
+ See SM5 instruction of the same name. Extracts a set of bits from the input,
+ without any sign-extension.
+ Pseudocode::
-.. opcode:: TXF - Texel Fetch (as per NV_gpu_shader4), extract a single texel
- from a specified texture image. The source sampler may
- not be a CUBE or SHADOW.
- src 0 is a four-component signed integer vector used to
- identify the single texel accessed. 3 components + level.
- src 1 is a 3 component constant signed integer vector,
- with each component only have a range of
- -8..+8 (hw only seems to deal with this range, interface
- allows for up to unsigned int).
- TXF(uint_vec coord, int_vec offset).
+ def ubfe(value, offset, bits):
+   offset = offset & 0x1f
+   bits = bits & 0x1f
+   if bits == 0: return 0
+   # Note: >> does not sign-extend
+   if bits + offset < 32:
+     return (value << (32 - offset - bits)) >> (32 - bits)
+   else:
+     return value >> offset
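
The IBFE/UBFE pseudocode relies on 32-bit wraparound and on ``>>`` either sign-extending or not, which Python integers do not do by themselves. A runnable sketch that makes both explicit could look like this; the ``MASK32`` and ``to_signed32`` helpers are assumptions of the sketch, and ``bits`` is used throughout for the field width.

```python
MASK32 = 0xFFFFFFFF  # model of a 32-bit register component


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def ibfe(value, offset, bits):
    """Signed bitfield extract; result returned as a 32-bit pattern."""
    offset &= 0x1f
    bits &= 0x1f
    if bits == 0:
        return 0
    if bits + offset < 32:
        # Move the field to the top, then arithmetic-shift it back down.
        shifted = (value << (32 - offset - bits)) & MASK32
        return (to_signed32(shifted) >> (32 - bits)) & MASK32
    return (to_signed32(value) >> offset) & MASK32


def ubfe(value, offset, bits):
    """Unsigned bitfield extract; no sign extension."""
    offset &= 0x1f
    bits &= 0x1f
    if bits == 0:
        return 0
    if bits + offset < 32:
        return ((value << (32 - offset - bits)) & MASK32) >> (32 - bits)
    return (value & MASK32) >> offset
```

For the 4-bit field at offset 12 of ``0x0000F000``, UBFE yields ``0xF`` while IBFE yields ``0xFFFFFFFF``, because the top bit of the extracted window is set.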
+.. opcode:: BFI - Bitfield Insert
-.. opcode:: TXQ - Texture Size Query (as per NV_gpu_program4)
- retrieve the dimensions of the texture
- depending on the target. For 1D (width), 2D/RECT/CUBE
- (width, height), 3D (width, height, depth),
- 1D array (width, layers), 2D array (width, height, layers)
+ See SM5 instruction of the same name. Replaces a bit region of 'base' with
+ the low bits of 'insert'.
-.. math::
+ Pseudocode::
- lod = src0
+ def bfi(base, insert, offset, bits):
+ offset = offset & 0x1f
+ bits = bits & 0x1f
+ mask = ((1 << bits) - 1) << offset
+ return ((insert << offset) & mask) | (base & ~mask)
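
The BFI pseudocode translates almost directly to Python once the result is truncated to 32 bits. The sketch below is illustrative only; ``MASK32`` is a helper introduced here to model register width.

```python
MASK32 = 0xFFFFFFFF  # model of a 32-bit register component


def bfi(base, insert, offset, bits):
    """Replace 'bits' bits of 'base' at 'offset' with the low bits of 'insert'."""
    offset &= 0x1f
    bits &= 0x1f
    mask = (((1 << bits) - 1) << offset) & MASK32
    return (((insert << offset) & mask) | (base & ~mask)) & MASK32
```

With ``bits == 0`` the mask is empty and ``base`` is returned unchanged.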
- dst.x = texture_width(unit, lod)
+.. opcode:: BREV - Bitfield Reverse
- dst.y = texture_height(unit, lod)
+ See SM5 instruction BFREV. Reverses the bits of the argument.
- dst.z = texture_depth(unit, lod)
+.. opcode:: POPC - Population Count
+ See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
-.. opcode:: CONT - Continue
+.. opcode:: LSB - Index of lowest set bit
- TBD
+ See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
+ bit of the argument. Returns -1 if none are set.
-.. note::
+.. opcode:: IMSB - Index of highest non-sign bit
- Support for CONT is determined by a special capability bit,
- ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
+ See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
+ non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
+ highest 1 bit for positive numbers). Returns -1 if all bits are the same
+ (i.e. for inputs 0 and -1).
+.. opcode:: UMSB - Index of highest set bit
+
+ See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
+ set bit of the argument. Returns -1 if none are set.
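
The bit-scan opcodes above can be modelled with the following Python sketch. It assumes that bit indices are counted from the least significant bit (bit 0) and that -1 is returned as the pattern ``0xffffffff``; both are assumptions of this sketch, so check the SM5 documentation for the authoritative definitions. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # model of a 32-bit register component; also encodes -1


def brev(v):
    """Reverse the 32 bits of v."""
    return int(format(v & MASK32, "032b")[::-1], 2)


def popc(v):
    """Count the set bits of v."""
    return bin(v & MASK32).count("1")


def lsb(v):
    """Index of the lowest set bit, or -1 (0xffffffff) if none are set."""
    v &= MASK32
    return MASK32 if v == 0 else (v & -v).bit_length() - 1


def umsb(v):
    """Index of the highest set bit, or -1 (0xffffffff) if none are set."""
    v &= MASK32
    return MASK32 if v == 0 else v.bit_length() - 1


def imsb(v):
    """Index of the highest non-sign bit, or -1 for inputs 0 and -1."""
    v &= MASK32
    if v & 0x80000000:
        v = ~v & MASK32  # for negative inputs, look for the highest 0 bit
    return umsb(v)
```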
Geometry ISA
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. opcode:: EMIT - Emit
- TBD
+ Generate a new vertex for the current primitive into the specified vertex
+ stream using the values in the output registers.
.. opcode:: ENDPRIM - End Primitive
- TBD
+ Complete the current primitive in the specified vertex stream (consisting of
+ the emitted vertices), and start a new one.
GLSL ISA
These opcodes are part of :term:`GLSL`'s opcode set. Support for these
opcodes is determined by a special capability bit, ``GLSL``.
+Some require GLSL version 1.30 (UIF/BREAKC/SWITCH/CASE/DEFAULT/ENDSWITCH).
+
+.. opcode:: CAL - Subroutine Call
+
+ push(pc)
+ pc = target
+
+
+.. opcode:: RET - Subroutine Call Return
+
+ pc = pop()
+
+
+.. opcode:: CONT - Continue
+
+ Unconditionally moves the point of execution to the instruction after the
+ last bgnloop. The instruction must appear within a bgnloop/endloop.
+
+.. note::
+
+ Support for CONT is determined by a special capability bit,
+ ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
+
.. opcode:: BGNLOOP - Begin a Loop
- TBD
+ Start a loop. Must have a matching endloop.
.. opcode:: BGNSUB - Begin Subroutine
- TBD
+ Starts definition of a subroutine. Must have a matching endsub.
.. opcode:: ENDLOOP - End a Loop
- TBD
+ End a loop started with bgnloop.
.. opcode:: ENDSUB - End Subroutine
- TBD
+ Ends definition of a subroutine.
.. opcode:: NOP - No Operation
Do nothing.
-.. opcode:: NRM4 - 4-component Vector Normalise
+.. opcode:: BRK - Break
-This instruction replicates its result.
+ Unconditionally moves the point of execution to the instruction after the
+ next endloop or endswitch. The instruction must appear within a loop/endloop
+ or switch/endswitch.
-.. math::
- dst = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}
+.. opcode:: BREAKC - Break Conditional
+
+ Conditionally moves the point of execution to the instruction after the
+ next endloop or endswitch. The instruction must appear within a loop/endloop
+ or switch/endswitch.
+ Condition evaluates to true if src0.x != 0 where src0.x is interpreted
+ as an integer register.
+.. note::
-ps_2_x
-^^^^^^^^^^^^
+ Considered for removal as it's quite inconsistent wrt other opcodes
+ (could emulate with UIF/BRK/ENDIF).
-XXX wait what
-.. opcode:: CALLNZ - Subroutine Call If Not Zero
+.. opcode:: IF - Float If
- TBD
+ Start an IF ... ELSE .. ENDIF block. Condition evaluates to true if
+ src0.x != 0.0
-.. opcode:: BREAKC - Break Conditional
+ where src0.x is interpreted as a floating point register.
+
+
+.. opcode:: UIF - Bitwise If
+
+ Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
+
+ src0.x != 0
+
+ where src0.x is interpreted as an integer register.
+
+
+.. opcode:: ELSE - Else
+
+ Starts an else block, after an IF or UIF statement.
+
+
+.. opcode:: ENDIF - End If
+
+ Ends an IF or UIF block.
+
+
+.. opcode:: SWITCH - Switch
+
+ Starts a C-style switch expression. The switch consists of one or multiple
+ CASE statements, and at most one DEFAULT statement. Execution of a statement
+ ends when a BRK is hit, but just like in C falling through to other cases
+ without a break is allowed. Similarly, the DEFAULT label is allowed anywhere,
+ not just as the last statement, and fallthrough is allowed into/from it.
+ CASE src arguments are evaluated at bit level against the SWITCH src argument.
+
+ Example::
+
+ SWITCH src[0].x
+ CASE src[0].x
+ (some instructions here)
+ (optional BRK here)
+ DEFAULT
+ (some instructions here)
+ (optional BRK here)
+ CASE src[0].x
+ (some instructions here)
+ (optional BRK here)
+ ENDSWITCH
+
+
+.. opcode:: CASE - Switch case
+
+ This represents a switch case label. The src arg must be an integer immediate.
+
+
+.. opcode:: DEFAULT - Switch default
+
+ This represents the default case in the switch, which is taken if no other
+ case matches.
+
+
+.. opcode:: ENDSWITCH - End of switch
+
+ Ends a switch expression.
+
+
+.. opcode:: NRM4 - 4-component Vector Normalise
+
+This instruction replicates its result.
+
+.. math::
+
+ dst = \frac{src.x}{src.x \times src.x + src.y \times src.y + src.z \times src.z + src.w \times src.w}
- TBD
.. _doubleopcodes:
Those opcodes follow very closely semantics of the respective Direct3D
instructions. If in doubt double check Direct3D documentation.
+Note that the swizzle on SVIEW (src1) determines texel swizzling
+after lookup.
+
+.. opcode:: SAMPLE
+
+ Using provided address, sample data from the specified texture using the
+ filtering mode identified by the given sampler. The source data may come from
+ any resource type other than buffers.
+
+ Syntax: ``SAMPLE dst, address, sampler_view, sampler``
+
+ Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
+
+.. opcode:: SAMPLE_I
+
+ Simplified alternative to the SAMPLE instruction. Using the provided
+ integer address, SAMPLE_I fetches data from the specified sampler view
+ without any filtering. The source data may come from any resource type
+ other than CUBE.
+
+ Syntax: ``SAMPLE_I dst, address, sampler_view``
+
+ Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
+
+ The 'address' is specified as unsigned integers. If the 'address' is out of
+ range [0...(# texels - 1)] the result of the fetch is always 0 in all
+ components. As such the instruction doesn't honor address wrap modes; in
+ cases where that behavior is desirable the 'SAMPLE' instruction should be used.
+ address.w always provides an unsigned integer mipmap level. If the value is
+ out of the range then the instruction always returns 0 in all components.
+ address.yz are ignored for buffers and 1d textures. address.z is ignored
+ for 1d texture arrays and 2d textures.
+
+ For 1D texture arrays address.y provides the array index (also as unsigned
+ integer). If the value is out of the range of available array indices
+ [0... (array size - 1)] then the opcode always returns 0 in all components.
+ For 2D texture arrays address.z provides the array index, otherwise it
+ exhibits the same behavior as in the case for 1D texture arrays. The exact
+ semantics of the source address are presented in the table below:
+
+ +---------------------------+----+-----+-----+---------+
+ | resource type | X | Y | Z | W |
+ +===========================+====+=====+=====+=========+
+ | ``PIPE_BUFFER`` | x | | | ignored |
+ +---------------------------+----+-----+-----+---------+
+ | ``PIPE_TEXTURE_1D`` | x | | | mpl |
+ +---------------------------+----+-----+-----+---------+
+ | ``PIPE_TEXTURE_2D`` | x | y | | mpl |
+ +---------------------------+----+-----+-----+---------+
+ | ``PIPE_TEXTURE_3D`` | x | y | z | mpl |
+ +---------------------------+----+-----+-----+---------+
+ | ``PIPE_TEXTURE_RECT`` | x | y | | mpl |
+ +---------------------------+----+-----+-----+---------+
+ | ``PIPE_TEXTURE_CUBE`` | not allowed as source |
+ +---------------------------+----+-----+-----+---------+
+ | ``PIPE_TEXTURE_1D_ARRAY`` | x | idx | | mpl |
+ +---------------------------+----+-----+-----+---------+
+ | ``PIPE_TEXTURE_2D_ARRAY`` | x | y | idx | mpl |
+ +---------------------------+----+-----+-----+---------+
+
+ Where 'mpl' is a mipmap level and 'idx' is the array index.
+
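As a rough illustration of the out-of-range rules for SAMPLE_I, the following Python sketch models a 2D texture array. The nested-list texture layout and the name ``sample_i_2d_array`` are assumptions made for this example only.

```python
def sample_i_2d_array(layers, address):
    """Unfiltered fetch from a hypothetical 2D array texture.

    layers[idx][mpl][y][x] holds one texel as a 4-tuple.
    address is (x, y, idx, mpl), all unsigned integers.
    """
    zero = (0, 0, 0, 0)
    x, y, idx, mpl = address
    # An array index outside [0, array size - 1] yields zeros.
    if idx >= len(layers):
        return zero
    levels = layers[idx]
    # A mipmap level that does not exist yields zeros.
    if mpl >= len(levels):
        return zero
    image = levels[mpl]
    # No wrap modes are applied: out-of-range texel coordinates yield zeros.
    if y >= len(image) or x >= len(image[y]):
        return zero
    return image[y][x]


# One layer with a 2x2 base level and a 1x1 second level.
layers = [[
    [[(1, 0, 0, 1), (2, 0, 0, 1)],
     [(3, 0, 0, 1), (4, 0, 0, 1)]],
    [[(9, 0, 0, 1)]],
]]
```

For instance, ``sample_i_2d_array(layers, (2, 0, 0, 0))`` addresses a texel past the right edge of the base level and therefore returns all zeros rather than wrapping.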
+.. opcode:: SAMPLE_I_MS
-.. opcode:: SAMPLE - Using provided address, sample data from the
- specified texture using the filtering mode identified
- by the gven sampler. The source data may come from
- any resource type other than buffers.
- SAMPLE dst, address, sampler_view, sampler
- e.g.
- SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]
-
-.. opcode:: SAMPLE_I - Simplified alternative to the SAMPLE instruction.
- Using the provided integer address, SAMPLE_I fetches data
- from the specified sampler view without any filtering.
- The source data may come from any resource type other
- than CUBE.
- SAMPLE_I dst, address, sampler_view
- e.g.
- SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]
- The 'address' is specified as unsigned integers. If the
- 'address' is out of range [0...(# texels - 1)] the
- result of the fetch is always 0 in all components.
- As such the instruction doesn't honor address wrap
- modes, in cases where that behavior is desirable
- 'SAMPLE' instruction should be used.
- address.w always provides an unsigned integer mipmap
- level. If the value is out of the range then the
- instruction always returns 0 in all components.
- address.yz are ignored for buffers and 1d textures.
- address.z is ignored for 1d texture arrays and 2d
- textures.
- For 1D texture arrays address.y provides the array
- index (also as unsigned integer). If the value is
- out of the range of available array indices
- [0... (array size - 1)] then the opcode always returns
- 0 in all components.
- For 2D texture arrays address.z provides the array
- index, otherwise it exhibits the same behavior as in
- the case for 1D texture arrays.
- The exact semantics of the source address are presented
- in the table below:
- resource type X Y Z W
- ------------- ------------------------
- PIPE_BUFFER x ignored
- PIPE_TEXTURE_1D x mpl
- PIPE_TEXTURE_2D x y mpl
- PIPE_TEXTURE_3D x y z mpl
- PIPE_TEXTURE_RECT x y mpl
- PIPE_TEXTURE_CUBE not allowed as source
- PIPE_TEXTURE_1D_ARRAY x idx mpl
- PIPE_TEXTURE_2D_ARRAY x y idx mpl
-
- Where 'mpl' is a mipmap level and 'idx' is the
- array index.
-
-.. opcode:: SAMPLE_I_MS - Just like SAMPLE_I but allows fetch data from
- multi-sampled surfaces.
- SAMPLE_I_MS dst, address, sampler_view, sample
-
-.. opcode:: SAMPLE_B - Just like the SAMPLE instruction with the
- exception that an additional bias is applied to the
- level of detail computed as part of the instruction
- execution.
- SAMPLE_B dst, address, sampler_view, sampler, lod_bias
- e.g.
- SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x
-
-.. opcode:: SAMPLE_C - Similar to the SAMPLE instruction but it
- performs a comparison filter. The operands to SAMPLE_C
- are identical to SAMPLE, except that there is an additional
- float32 operand, reference value, which must be a register
- with single-component, or a scalar literal.
- SAMPLE_C makes the hardware use the current samplers
- compare_func (in pipe_sampler_state) to compare
- reference value against the red component value for the
- surce resource at each texel that the currently configured
- texture filter covers based on the provided coordinates.
- SAMPLE_C dst, address, sampler_view.r, sampler, ref_value
- e.g.
- SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x
-
-.. opcode:: SAMPLE_C_LZ - Same as SAMPLE_C, but LOD is 0 and derivatives
- are ignored. The LZ stands for level-zero.
- SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value
- e.g.
- SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x
-
-
-.. opcode:: SAMPLE_D - SAMPLE_D is identical to the SAMPLE opcode except
- that the derivatives for the source address in the x
- direction and the y direction are provided by extra
- parameters.
- SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y
- e.g.
- SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]
-
-.. opcode:: SAMPLE_L - SAMPLE_L is identical to the SAMPLE opcode except
- that the LOD is provided directly as a scalar value,
- representing no anisotropy.
- SAMPLE_L dst, address, sampler_view, sampler, explicit_lod
- e.g.
- SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x
-
-.. opcode:: GATHER4 - Gathers the four texels to be used in a bi-linear
- filtering operation and packs them into a single register.
- Only works with 2D, 2D array, cubemaps, and cubemaps arrays.
- For 2D textures, only the addressing modes of the sampler and
- the top level of any mip pyramid are used. Set W to zero.
- It behaves like the SAMPLE instruction, but a filtered
- sample is not generated. The four samples that contribute
- to filtering are placed into xyzw in counter-clockwise order,
- starting with the (u,v) texture coordinate delta at the
- following locations (-, +), (+, +), (+, -), (-, -), where
- the magnitude of the deltas are half a texel.
-
-
-.. opcode:: SVIEWINFO - query the dimensions of a given sampler view.
- dst receives width, height, depth or array size and
- number of mipmap levels as int4. The dst can have a writemask
- which will specify what info is the caller interested
- in.
- SVIEWINFO dst, src_mip_level, sampler_view
- e.g.
- SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]
- src_mip_level is an unsigned integer scalar. If it's
- out of range then returns 0 for width, height and
- depth/array size but the total number of mipmap is
- still returned correctly for the given sampler view.
- The returned width, height and depth values are for
- the mipmap level selected by the src_mip_level and
- are in the number of texels.
- For 1d texture array width is in dst.x, array size
- is in dst.y and dst.zw are always 0.
-
-.. opcode:: SAMPLE_POS - query the position of a given sample.
- dst receives float4 (x, y, 0, 0) indicated where the
- sample is located. If the resource is not a multi-sample
- resource and not a render target, the result is 0.
-
-.. opcode:: SAMPLE_INFO - dst receives number of samples in x.
- If the resource is not a multi-sample resource and
- not a render target, the result is 0.
+ Just like SAMPLE_I but allows fetching data from multi-sampled surfaces.
+
+ Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
+
+.. opcode:: SAMPLE_B
+
+ Just like the SAMPLE instruction with the exception that an additional bias
+ is applied to the level of detail computed as part of the instruction
+ execution.
+
+ Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
+
+ Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
+
+.. opcode:: SAMPLE_C
+
+ Similar to the SAMPLE instruction but it performs a comparison filter. The
+ operands to SAMPLE_C are identical to SAMPLE, except that there is an
+ additional float32 operand, reference value, which must be a register with
+ single-component, or a scalar literal. SAMPLE_C makes the hardware use the
+ current sampler's compare_func (in pipe_sampler_state) to compare the reference
+ value against the red component value for the source resource at each texel
+ that the currently configured texture filter covers based on the provided
+ coordinates.
+
+ Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
+
+ Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
+
+.. opcode:: SAMPLE_C_LZ
+
+ Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
+ for level-zero.
+
+ Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
+
+ Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
+
+
+.. opcode:: SAMPLE_D
+
+ SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
+ the source address in the x direction and the y direction are provided by
+ extra parameters.
+
+ Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
+
+ Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
+
+.. opcode:: SAMPLE_L
+
+ SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
+ directly as a scalar value, representing no anisotropy.
+
+ Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
+
+ Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
+
+.. opcode:: GATHER4
+
+ Gathers the four texels to be used in a bi-linear filtering operation and
+ packs them into a single register. Only works with 2D, 2D array, cubemaps,
+ and cubemap arrays. For 2D textures, only the addressing modes of the
+ sampler and the top level of any mip pyramid are used. Set W to zero. It
+ behaves like the SAMPLE instruction, but a filtered sample is not
+ generated. The four samples that contribute to filtering are placed into
+ xyzw in counter-clockwise order, starting with the (u,v) texture coordinate
+ delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
+ magnitude of the deltas is half a texel.
+
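The texel ordering described for GATHER4 can be sketched as follows. This is a simplified model under stated assumptions: a single-channel 2D image stored as a list of rows, normalized (u, v) coordinates, and clamp-to-edge addressing (in practice the addressing mode comes from the sampler). The names ``gather4`` and ``GATHER4_DELTAS`` are hypothetical.

```python
import math

# (u, v) delta signs, in the order the results land in x, y, z, w.
GATHER4_DELTAS = [(-1, +1), (+1, +1), (+1, -1), (-1, -1)]


def gather4(image, u, v):
    """Gather four texels of a single-channel 2D image (list of rows)."""
    height = len(image)
    width = len(image[0])
    result = []
    for su, sv in GATHER4_DELTAS:
        # Offset by half a texel, then pick the texel containing that point.
        x = math.floor(u * width + 0.5 * su)
        y = math.floor(v * height + 0.5 * sv)
        # Clamp-to-edge is an assumption made for this sketch.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        result.append(image[y][x])
    return tuple(result)


image = [[0, 1],
         [2, 3]]
```

Sampling the centre of this 2x2 image, ``gather4(image, 0.5, 0.5)``, touches all four texels, and no filtered value is produced.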
+
+.. opcode:: SVIEWINFO
+
+ Query the dimensions of a given sampler view. dst receives width, height,
+ depth or array size and number of mipmap levels as int4. The dst can have a
+ writemask which specifies which info the caller is interested in.
+
+ Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
+
+ Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
+
+ src_mip_level is an unsigned integer scalar. If it's out of range then the
+ instruction returns 0 for width, height and depth/array size, but the total
+ number of mipmap levels is still returned correctly for the given sampler
+ view. The returned width, height and depth values are for the mipmap level
+ selected by src_mip_level and are in number of texels. For a 1d texture array
+ the width is in dst.x, the array size is in dst.y and dst.z is 0. The number
+ of mipmap levels is still in dst.w. In contrast to d3d10 resinfo, there's no
+ way in the tgsi instruction encoding to specify the return type
+ (float/rcpfloat/uint), hence uint is always used. Also, unlike the SAMPLE
+ instructions, the swizzle on src1 (which in resinfo allows swizzling the dst
+ values) is ignored (due to the interaction with the rcpfloat modifier, which
+ requires some swizzle handling in the state tracker anyway).
+
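A minimal Python sketch of the SVIEWINFO result layout for a 1D texture array is given below. The function name and parameters are hypothetical, and the per-level size rule (halve per level, never below one texel) is an assumption rather than something stated above.

```python
def sviewinfo_1d_array(base_width, array_size, num_levels, src_mip_level):
    """Return (x, y, z, w) as unsigned integers for a 1D texture array."""
    if src_mip_level >= num_levels:
        # Out of range: the sizes are 0, the mipmap count is still reported.
        return (0, 0, 0, num_levels)
    # Assumed minification rule: halve per level, never below one texel.
    width = max(1, base_width >> src_mip_level)
    # Width in x, array size in y, z is always 0, mipmap count in w.
    return (width, array_size, 0, num_levels)
```

For a 16-texel-wide array with 4 layers and 5 levels, level 2 would give ``(4, 4, 0, 5)`` under these assumptions, while an out-of-range level 5 gives ``(0, 0, 0, 5)``.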
+.. opcode:: SAMPLE_POS
+
+ Query the position of a given sample. dst receives float4 (x, y, 0, 0)
+ indicating where the sample is located. If the resource is not a multi-sample
+ resource and not a render target, the result is 0.
+
+.. opcode:: SAMPLE_INFO
+
+ dst receives number of samples in x. If the resource is not a multi-sample
+ resource and not a render target, the result is 0.
.. _resourceopcodes:
Declaration Semantic
^^^^^^^^^^^^^^^^^^^^^^^^
- Vertex and fragment shader input and output registers may be labeled
- with semantic information consisting of a name and index.
+Vertex and fragment shader input and output registers may be labeled
+with semantic information consisting of a name and index.
- Follows Declaration token if Semantic bit is set.
+Follows Declaration token if Semantic bit is set.
- Since its purpose is to link a shader with other stages of the pipeline,
- it is valid to follow only those Declaration tokens that declare a register
- either in INPUT or OUTPUT file.
+Since its purpose is to link a shader with other stages of the pipeline,
+it is valid to follow only those Declaration tokens that declare a register
+either in INPUT or OUTPUT file.
- SemanticName field contains the semantic name of the register being declared.
- There is no default value.
+SemanticName field contains the semantic name of the register being declared.
+There is no default value.
- SemanticIndex is an optional subscript that can be used to distinguish
- different register declarations with the same semantic name. The default value
- is 0.
+SemanticIndex is an optional subscript that can be used to distinguish
+different register declarations with the same semantic name. The default value
+is 0.
- The meanings of the individual semantic names are explained in the following
- sections.
+The meanings of the individual semantic names are explained in the following
+sections.
TGSI_SEMANTIC_POSITION
""""""""""""""""""""""
Vertex shader inputs and outputs and fragment shader inputs may be
labeled with TGSI_SEMANTIC_FOG to indicate that the register contains
-a fog coordinate in the form (F, 0, 0, 1). Typically, the fragment
-shader will use the fog coordinate to compute a fog blend factor which
-is used to blend the normal fragment color with a constant fog color.
-
-Only the first component matters when writing from the vertex shader;
-the driver will ensure that the coordinate is in this format when used
-as a fragment shader input.
+a fog coordinate. Typically, the fragment shader will use the fog coordinate
+to compute a fog blend factor which is used to blend the normal fragment color
+with a constant fog color. However, the fog coordinate is really just an
+ordinary vec4 register, like regular semantics.
TGSI_SEMANTIC_PSIZE
drawn when the polygon mode converts triangles/quads/polygons into
points or lines.
+
TGSI_SEMANTIC_STENCIL
-""""""""""""""""""""""
+"""""""""""""""""""""
-For fragment shaders, this semantic label indicates than an output
+For fragment shaders, this semantic label indicates that an output
is a writable stencil reference value. Only the Y component is writable.
This allows the fragment shader to change the fragments stencilref value.
+TGSI_SEMANTIC_VIEWPORT_INDEX
+""""""""""""""""""""""""""""
+
+For geometry shaders, this semantic label indicates that an output
+contains the index of the viewport (and scissor) to use.
+Only the X value is used.
+
+
+TGSI_SEMANTIC_LAYER
+"""""""""""""""""""
+
+For geometry shaders, this semantic label indicates that an output
+contains the layer value to use for the color and depth/stencil surfaces.
+Only the X value is used. (Also known as rendertarget array index.)
+
+
+TGSI_SEMANTIC_CULLDIST
+""""""""""""""""""""""
+
+Used as distance to plane for performing application-defined culling
+of individual primitives against a plane. When components of vertex
+elements are given this label, these values are assumed to be a
+float32 signed distance to a plane. Primitives will be completely
+discarded if the plane distances for all of the vertices in the
+primitive are < 0. If a vertex has a cull distance of NaN, that
+vertex counts as "out" (as if it's < 0).
+The limits on both clip and cull distances are bound
+by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
+the maximum number of components that can be used to hold the
+distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
+which specifies the maximum number of registers which can be
+annotated with those semantics.
+
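The culling rule for TGSI_SEMANTIC_CULLDIST can be written as a short predicate. This is a sketch for a single plane; ``vertex_is_out`` and ``primitive_is_culled`` are hypothetical names.

```python
import math


def vertex_is_out(distance):
    """A vertex is "out" if its distance is negative or NaN."""
    return math.isnan(distance) or distance < 0.0


def primitive_is_culled(vertex_distances):
    """vertex_distances: one cull distance per vertex, for a single plane.

    The primitive is discarded only if every vertex is "out".
    """
    return all(vertex_is_out(d) for d in vertex_distances)
```

A triangle with distances ``(-1.0, 0.0, -2.0)`` is kept because one vertex lies on the plane, whereas ``(nan, -1.0, -1.0)`` is discarded.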
+
+TGSI_SEMANTIC_CLIPDIST
+""""""""""""""""""""""
+
+When components of vertex elements are identified this way, these
+values are each assumed to be a float32 signed distance to a plane.
+Primitive setup only invokes rasterization on pixels for which
+the interpolated plane distances are >= 0. Multiple clip planes
+can be implemented simultaneously, by annotating multiple
+components of one or more vertex elements with the above specified
+semantic. The limits on both clip and cull distances are bound
+by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
+the maximum number of components that can be used to hold the
+distances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
+which specifies the maximum number of registers which can be
+annotated with those semantics.
+
+TGSI_SEMANTIC_SAMPLEID
+""""""""""""""""""""""
+
+For fragment shaders, this semantic label indicates that a system value
+contains the current sample id (i.e. gl_SampleID). Only the X value is used.
+
+TGSI_SEMANTIC_SAMPLEPOS
+"""""""""""""""""""""""
+
+For fragment shaders, this semantic label indicates that a system value
+contains the current sample's position (i.e. gl_SamplePosition). Only the X
+and Y values are used.
+
+TGSI_SEMANTIC_SAMPLEMASK
+""""""""""""""""""""""""
+
+For fragment shaders, this semantic label indicates that an output contains
+the sample mask used to disable further sample processing
+(i.e. gl_SampleMask). Only the X value is used, up to 32x MS.
+
+TGSI_SEMANTIC_INVOCATIONID
+""""""""""""""""""""""""""
+
+For geometry shaders, this semantic label indicates that a system value
+contains the current invocation id (i.e. gl_InvocationID). Only the X value is
+used.
+
Declaration Interpolate
^^^^^^^^^^^^^^^^^^^^^^^
Declaration Sampler View
^^^^^^^^^^^^^^^^^^^^^^^^
- Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
+Follows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
- DCL SVIEW[#], resource, type(s)
+DCL SVIEW[#], resource, type(s)
- Declares a shader input sampler view and assigns it to a SVIEW[#]
- register.
+Declares a shader input sampler view and assigns it to a SVIEW[#]
+register.
- resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
+resource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
- type must be 1 or 4 entries (if specifying on a per-component
- level) out of UNORM, SNORM, SINT, UINT and FLOAT.
+type must be 1 or 4 entries (if specifying on a per-component
+level) out of UNORM, SNORM, SINT, UINT and FLOAT.
Declaration Resource
^^^^^^^^^^^^^^^^^^^^
- Follows Declaration token if file is TGSI_FILE_RESOURCE.
+Follows Declaration token if file is TGSI_FILE_RESOURCE.
- DCL RES[#], resource [, WR] [, RAW]
+DCL RES[#], resource [, WR] [, RAW]
- Declares a shader input resource and assigns it to a RES[#]
- register.
+Declares a shader input resource and assigns it to a RES[#]
+register.
- resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
- 2DArray.
+resource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
+2DArray.
- If the RAW keyword is not specified, the texture data will be
- subject to conversion, swizzling and scaling as required to yield
- the specified data type from the physical data format of the bound
- resource.
+If the RAW keyword is not specified, the texture data will be
+subject to conversion, swizzling and scaling as required to yield
+the specified data type from the physical data format of the bound
+resource.
- If the RAW keyword is specified, no channel conversion will be
- performed: the values read for each of the channels (X,Y,Z,W) will
- correspond to consecutive words in the same order and format
- they're found in memory. No element-to-address conversion will be
- performed either: the value of the provided X coordinate will be
- interpreted in byte units instead of texel units. The result of
- accessing a misaligned address is undefined.
+If the RAW keyword is specified, no channel conversion will be
+performed: the values read for each of the channels (X,Y,Z,W) will
+correspond to consecutive words in the same order and format
+they're found in memory. No element-to-address conversion will be
+performed either: the value of the provided X coordinate will be
+interpreted in byte units instead of texel units. The result of
+accessing a misaligned address is undefined.
- Usage of the STORE opcode is only allowed if the WR (writable) flag
- is set.
+Usage of the STORE opcode is only allowed if the WR (writable) flag
+is set.
Properties
^^^^^^^^^^^^^^^^^^^^^^^^
-
- Properties are general directives that apply to the whole TGSI program.
+Properties are general directives that apply to the whole TGSI program.
FS_COORD_ORIGIN
"""""""""""""""
If INTEGER, the fractionary part of the position will be 0.0
Note that this does not affect the set of fragments generated by
-rasterization, which is instead controlled by gl_rasterization_rules in the
+rasterization, which is instead controlled by half_pixel_center in the
rasterizer.
OpenGL defaults to HALF_INTEGER, and is configurable with the
This is useful for APIs that don't have UCPs and where clip distances written
by a shader cannot be disabled.
+GS_INVOCATIONS
+""""""""""""""
+
+Specifies the number of times a geometry shader should be executed for each
+input primitive. Each invocation will have a different
+TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
+be 1.
+
+VS_WINDOW_SPACE_POSITION
+""""""""""""""""""""""""
+If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
+is assumed to contain window space coordinates.
+Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
+directly taken from the fourth component of the shader output.
+Naturally, clipping is not performed on window coordinates either.
+The effect of this property is undefined if a geometry or tessellation shader
+is in use.
Texture Sampling and Texture Formats
------------------------------------