From: Ian Romanick
Date: Mon, 20 Jun 2016 23:28:34 +0000 (-0700)
Subject: MESA_shader_integer_functions: Add extension specification
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=91482ef226de7686350202cfbdfda4358d9cea86;p=mesa.git

MESA_shader_integer_functions: Add extension specification

v2: Fix typo in #extension line noticed by Ken.

v3: Update spec status.

Signed-off-by: Ian Romanick
Reviewed-by: Matt Turner
---

diff --git a/docs/specs/MESA_shader_integer_functions.txt b/docs/specs/MESA_shader_integer_functions.txt
new file mode 100644
index 00000000000..58a956f44d3
--- /dev/null
+++ b/docs/specs/MESA_shader_integer_functions.txt
@@ -0,0 +1,520 @@
+Name
+
+    MESA_shader_integer_functions
+
+Name Strings
+
+    GL_MESA_shader_integer_functions
+
+Contact
+
+    Ian Romanick
+
+Contributors
+
+    All the contributors of GL_ARB_gpu_shader5
+
+Status
+
+    Supported by all GLSL 1.30 capable drivers in Mesa 12.1 and later
+
+Version
+
+    Version 2, July 7, 2016
+
+Number
+
+    TBD
+
+Dependencies
+
+    This extension is written against the OpenGL 3.2 (Compatibility Profile)
+    Specification.
+
+    This extension is written against Version 1.50 (Revision 09) of the OpenGL
+    Shading Language Specification.
+
+    GLSL 1.30 is required.
+
+    This extension interacts with ARB_gpu_shader5.
+
+    This extension interacts with ARB_gpu_shader_fp64.
+
+    This extension interacts with NV_gpu_shader5.
+
+Overview
+
+    GL_ARB_gpu_shader5 extends GLSL in a number of useful ways.  Much of this
+    added functionality requires significant hardware support.  There are many
+    aspects, however, that can be easily implemented on any GPU with "real"
+    integer support (as opposed to simulating integers using floating point
+    calculations).
+
+    This extension provides a set of new features to the OpenGL Shading
+    Language to support capabilities of these GPUs, extending the capabilities
+    of version 1.30 of the OpenGL Shading Language.
+    Shaders using the new functionality provided by this extension should
+    enable this functionality via the construct
+
+      #extension GL_MESA_shader_integer_functions : require     (or enable)
+
+    This extension provides a variety of new features for all shader types,
+    including:
+
+      * support for implicitly converting signed integer types to unsigned
+        types, as well as more general implicit conversion and function
+        overloading infrastructure to support new data types introduced by
+        other extensions;
+
+      * new built-in functions supporting:
+
+        * splitting a floating-point number into a significand and exponent
+          (frexp), or building a floating-point number from a significand and
+          exponent (ldexp);
+
+        * integer bitfield manipulation, including functions to find the
+          position of the most or least significant set bit, count the number
+          of one bits, and bitfield insertion, extraction, and reversal;
+
+        * extended integer precision math, including add with carry, subtract
+          with borrow, and extended multiplication;
+
+    The resulting extension is a strict subset of GL_ARB_gpu_shader5.
+
+IP Status
+
+    No known IP claims.
+
+New Procedures and Functions
+
+    None
+
+New Tokens
+
+    None
+
+Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
+(OpenGL Operation)
+
+    None.
+
+Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
+(Rasterization)
+
+    None.
+
+Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
+(Per-Fragment Operations and the Frame Buffer)
+
+    None.
+
+Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
+(Special Functions)
+
+    None.
+
+Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
+(State and State Requests)
+
+    None.
+
+Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
+Specification (Invariance)
+
+    None.
+
+Additions to the AGL/GLX/WGL Specifications
+
+    None.
+
+Modifications to The OpenGL Shading Language Specification, Version 1.50
+(Revision 09)
+
+    Including the following line in a shader can be used to control the
+    language features described in this extension:
+
+      #extension GL_MESA_shader_integer_functions : <behavior>
+
+    where <behavior> is as specified in section 3.3.
+
+    New preprocessor #defines are added to the OpenGL Shading Language:
+
+      #define GL_MESA_shader_integer_functions        1
+
+
+    Modify Section 4.1.10, Implicit Conversions, p. 27
+
+    (modify table of implicit conversions)
+
+                                Can be implicitly
+        Type of expression        converted to
+        ---------------------   -----------------
+        int                     uint, float
+        ivec2                   uvec2, vec2
+        ivec3                   uvec3, vec3
+        ivec4                   uvec4, vec4
+
+        uint                    float
+        uvec2                   vec2
+        uvec3                   vec3
+        uvec4                   vec4
+
+    (modify second paragraph of the section) No implicit conversions are
+    provided to convert from unsigned to signed integer types or from
+    floating-point to integer types.  There are no implicit array or structure
+    conversions.
+
+    (insert before the final paragraph of the section) When performing
+    implicit conversion for binary operators, there may be multiple data types
+    to which the two operands can be converted.  For example, when adding an
+    int value to a uint value, both values can be implicitly converted to uint
+    and float.  In such cases, a floating-point type is chosen if either
+    operand has a floating-point type.  Otherwise, an unsigned integer type is
+    chosen if either operand has an unsigned integer type.  Otherwise, a
+    signed integer type is chosen.
+
+
+    Modify Section 5.9, Expressions, p. 57
+
+    (modify bulleted list as follows, adding support for implicit conversion
+    between signed and unsigned types)
+
+    Expressions in the shading language are built from the following:
+
+    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
+      types, and all matrix types.
+
+    ...
+
+    * The operator modulus (%) operates on signed or unsigned integer scalars
+      or vectors.
+      If the fundamental types of the operands do not match, the conversions
+      from Section 4.1.10 "Implicit Conversions" are applied to produce
+      matching types.  ...
+
+
+    Modify Section 6.1, Function Definitions, p. 63
+
+    (modify description of overloading, beginning at the top of p. 64)
+
+    Function names can be overloaded.  The same function name can be used for
+    multiple functions, as long as the parameter types differ.  If a function
+    name is declared twice with the same parameter types, then the return
+    types and all qualifiers must also match, and it is the same function
+    being declared.  For example,
+
+      vec4 f(in vec4 x, out vec4  y);   // (A)
+      vec4 f(in vec4 x, out uvec4 y);   // (B) okay, different argument type
+      vec4 f(in ivec4 x, out uvec4 y);  // (C) okay, different argument type
+
+      int  f(in vec4 x, out uvec4 y);   // error, only return type differs
+      vec4 f(in vec4 x, in  vec4  y);   // error, only qualifier differs
+      vec4 f(const in vec4 x, out vec4 y);  // error, only qualifier differs
+
+    When function calls are resolved, an exact type match for all the
+    arguments is sought.  If an exact match is found, all other functions are
+    ignored, and the exact match is used.  If no exact match is found, then
+    the implicit conversions in Section 4.1.10 (Implicit Conversions) will be
+    applied to find a match.  Mismatched types on input parameters (in or
+    inout or default) must have a conversion from the calling argument type
+    to the formal parameter type.  Mismatched types on output parameters (out
+    or inout) must have a conversion from the formal parameter type to the
+    calling argument type.
+
+    If implicit conversions can be used to find more than one matching
+    function, a single best-matching function is sought.  To determine a best
+    match, the conversions between calling argument and formal parameter
+    types are compared for each function argument and pair of matching
+    functions.
+    After these comparisons are performed, each pair of matching functions
+    is compared.  A function definition A is considered a better match than
+    function definition B if:
+
+      * for at least one function argument, the conversion for that argument
+        in A is better than the corresponding conversion in B; and
+
+      * there is no function argument for which the conversion in B is better
+        than the corresponding conversion in A.
+
+    If a single function definition is considered a better match than every
+    other matching function definition, it will be used.  Otherwise, a
+    semantic error occurs and the shader will fail to compile.
+
+    To determine whether the conversion for a single argument in one match is
+    better than that for another match, the following rules are applied, in
+    order:
+
+      1. An exact match is better than a match involving any implicit
+         conversion.
+
+      2. A match involving an implicit conversion from float to double is
+         better than a match involving any other implicit conversion.
+
+      3. A match involving an implicit conversion from either int or uint to
+         float is better than a match involving an implicit conversion from
+         either int or uint to double.
+
+    If none of the rules above apply to a particular pair of conversions,
+    neither conversion is considered better than the other.
+
+    For the function prototypes (A), (B), and (C) above, the following
+    examples show how the rules apply to different sets of calling argument
+    types:
+
+      f(vec4, vec4);        // exact match of vec4 f(in vec4 x, out vec4 y)
+      f(vec4, uvec4);       // exact match of vec4 f(in vec4 x, out uvec4 y)
+      f(vec4, ivec4);       // matched to vec4 f(in vec4 x, out vec4 y)
+                            //   (C) not relevant, can't convert vec4 to
+                            //   ivec4.  (A) better than (B) for 2nd
+                            //   argument (rule 2), same on first argument.
+      f(ivec4, vec4);       // NOT matched.  All three match by implicit
+                            //   conversion.  (C) is better than (A) and (B)
+                            //   on the first argument.  (A) is better than
+                            //   (B) and (C).
+
+
+    Modify Section 8.3, Common Functions, p. 84
+
+    (add support for single-precision frexp and ldexp functions)
+
+    Syntax:
+
+      genType frexp(genType x, out genIType exp);
+      genType ldexp(genType x, in genIType exp);
+
+    The function frexp() splits each single-precision floating-point number in
+    <x> into a binary significand, a floating-point number in the range [0.5,
+    1.0), and an integral exponent of two, such that:
+
+      x = significand * 2 ^ exponent
+
+    The significand is returned by the function; the exponent is returned in
+    the parameter <exp>.  For a floating-point value of zero, the significand
+    and exponent are both zero.  For a floating-point value that is an
+    infinity or is not a number, the results of frexp() are undefined.
+
+    If the input <x> is a vector, this operation is performed in a
+    component-wise manner; the value returned by the function and the value
+    written to <exp> are vectors with the same number of components as <x>.
+
+    The function ldexp() builds a single-precision floating-point number from
+    each significand component in <x> and the corresponding integral exponent
+    of two in <exp>, returning:
+
+      significand * 2 ^ exponent
+
+    If this product is too large to be represented as a single-precision
+    floating-point value, the result is considered undefined.
+
+    If the input <x> is a vector, this operation is performed in a
+    component-wise manner; the value passed in <exp> and returned by the
+    function are vectors with the same number of components as <x>.
+
+
+    (add support for new integer built-in functions)
+
+    Syntax:
+
+      genIType bitfieldExtract(genIType value, int offset, int bits);
+      genUType bitfieldExtract(genUType value, int offset, int bits);
+
+      genIType bitfieldInsert(genIType base, genIType insert, int offset,
+                              int bits);
+      genUType bitfieldInsert(genUType base, genUType insert, int offset,
+                              int bits);
+
+      genIType bitfieldReverse(genIType value);
+      genUType bitfieldReverse(genUType value);
+
+      genIType bitCount(genIType value);
+      genIType bitCount(genUType value);
+
+      genIType findLSB(genIType value);
+      genIType findLSB(genUType value);
+
+      genIType findMSB(genIType value);
+      genIType findMSB(genUType value);
+
+    The function bitfieldExtract() extracts bits <offset> through
+    <offset>+<bits>-1 from each component in <value>, returning them in the
+    least significant bits of corresponding component of the result.  For
+    unsigned data types, the most significant bits of the result will be set
+    to zero.  For signed data types, the most significant bits will be set to
+    the value of bit <offset>+<bits>-1.  If <bits> is zero, the result will be
+    zero.  The result will be undefined if <offset> or <bits> is negative, or
+    if the sum of <offset> and <bits> is greater than the number of bits used
+    to store the operand.  Note that for vector versions of bitfieldExtract(),
+    a single pair of <offset> and <bits> values is shared for all components.
+
+    The function bitfieldInsert() inserts the <bits> least significant bits of
+    each component of <insert> into the corresponding component of <base>.
+    The result will have bits numbered <offset> through <offset>+<bits>-1
+    taken from bits 0 through <bits>-1 of <insert>, and all other bits taken
+    directly from the corresponding bits of <base>.  If <bits> is zero, the
+    result will simply be <base>.  The result will be undefined if <offset> or
+    <bits> is negative, or if the sum of <offset> and <bits> is greater than
+    the number of bits used to store the operand.  Note that for vector
+    versions of bitfieldInsert(), a single pair of <offset> and <bits> values
+    is shared for all components.
+
+    The function bitfieldReverse() reverses the bits of <value>.
+    The bit numbered <n> of the result will be taken from bit (<bits>-1)-<n>
+    of <value>, where <bits> is the total number of bits used to represent
+    <value>.
+
+    The function bitCount() returns the number of one bits in the binary
+    representation of <value>.
+
+    The function findLSB() returns the bit number of the least significant one
+    bit in the binary representation of <value>.  If <value> is zero, -1 will
+    be returned.
+
+    The function findMSB() returns the bit number of the most significant bit
+    in the binary representation of <value>.  For positive integers, the
+    result will be the bit number of the most significant one bit.  For
+    negative integers, the result will be the bit number of the most
+    significant zero bit.  For a <value> of zero or negative one, -1 will be
+    returned.
+
+
+    (support for unsigned integer add/subtract with carry-out)
+
+    Syntax:
+
+      genUType uaddCarry(genUType x, genUType y, out genUType carry);
+      genUType usubBorrow(genUType x, genUType y, out genUType borrow);
+
+    The function uaddCarry() adds 32-bit unsigned integers or vectors <x> and
+    <y>, returning the sum modulo 2^32.  The value <carry> is set to zero if
+    the sum was less than 2^32, or one otherwise.
+
+    The function usubBorrow() subtracts the 32-bit unsigned integer or vector
+    <y> from <x>, returning the difference if non-negative or 2^32 plus the
+    difference, otherwise.  The value <borrow> is set to zero if x >= y, or
+    one otherwise.
+
+
+    (support for signed and unsigned multiplies, with 32-bit inputs and a
+    64-bit result spanning two 32-bit outputs)
+
+    Syntax:
+
+      void umulExtended(genUType x, genUType y, out genUType msb,
+                        out genUType lsb);
+      void imulExtended(genIType x, genIType y, out genIType msb,
+                        out genIType lsb);
+
+    The functions umulExtended() and imulExtended() multiply 32-bit unsigned
+    or signed integers or vectors <x> and <y>, producing a 64-bit result.  The
+    32 least significant bits are returned in <lsb>; the 32 most significant
+    bits are returned in <msb>.
+
+
+GLX Protocol
+
+    None.
+
+Dependencies on ARB_gpu_shader_fp64
+
+    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
+    of implicit conversions supported in the OpenGL Shading Language.  If more
+    than one of these extensions is supported, an expression of one type may
+    be converted to another type if that conversion is allowed by any of these
+    specifications.
+
+    If ARB_gpu_shader_fp64 or a similar extension introducing new data types
+    is not supported, the function overloading rule in the GLSL specification
+    preferring promotion of an input parameter from a smaller type to a larger
+    type is never applicable, as all data types are of the same size.  That
+    rule and the example referring to "double" should be removed.
+
+
+Dependencies on NV_gpu_shader5
+
+    This extension, ARB_gpu_shader_fp64, and NV_gpu_shader5 all modify the set
+    of implicit conversions supported in the OpenGL Shading Language.  If more
+    than one of these extensions is supported, an expression of one type may
+    be converted to another type if that conversion is allowed by any of these
+    specifications.
+
+    If NV_gpu_shader5 is supported, integer data types are supported with four
+    different precisions (8-, 16-, 32-, and 64-bit) and floating-point data
+    types are supported with three different precisions (16-, 32-, and
+    64-bit).  The extension adds the following rule for output parameters,
+    which is similar to the one present in this extension for input
+    parameters:
+
+      5. If the formal parameters in both matches are output parameters, a
+         conversion from a type with a larger number of bits per component is
+         better than a conversion from a type with a smaller number of bits
+         per component.  For example, a conversion from an "int16_t" formal
+         parameter type to "int" is better than one from an "int8_t" formal
+         parameter type to "int".
+
+    Such a rule is not provided in this extension because there is no
+    combination of types in this extension and ARB_gpu_shader_fp64 where this
+    rule has any effect.
+
+
+Errors
+
+    None
+
+
+New State
+
+    None
+
+New Implementation Dependent State
+
+    None
+
+Issues
+
+    (1) What should this extension be called?
+
+      UNRESOLVED.  This extension borrows from GL_ARB_gpu_shader5, so creating
+      some sort of a play on that name would be viable.  However, nothing in
+      this extension should require SM5 hardware, so such a name would be a
+      little misleading and weird.
+
+      Since the primary purpose is to add integer related functions from
+      GL_ARB_gpu_shader5, call this extension GL_MESA_shader_integer_functions
+      for now.
+
+    (2) Why is some of the formatting in this extension weird?
+
+      RESOLVED: This extension is formatted to minimize the differences (as
+      reported by 'diff --side-by-side -W180') with the GL_ARB_gpu_shader5
+      specification.
+
+    (3) Should ldexp and frexp be included?
+
+      RESOLVED: Yes.  Few GPUs have native instructions to implement these
+      functions.  These are generally implemented using existing GLSL built-in
+      functions and the other functions provided by this extension.
+
+    (4) Should umulExtended and imulExtended be included?
+
+      RESOLVED: Yes.  These functions should be implementable on any GPU that
+      can support the rest of this extension, but the implementation may be
+      complex.  The implementation on a GPU that only supports 32bit x 32bit =
+      32bit multiplication would be quite expensive.  However, many GPUs
+      (including OpenGL 4.0 GPUs that already support this function) have a
+      32bit x 16bit = 48bit multiplier.  The implementation there is only
+      trivially more expensive than regular 32bit multiplication.
+
+    (5) Should the pack and unpack functions be included?
+
+      RESOLVED: No.  These functions are already available via
+      GL_ARB_shading_language_packing.
+
+    (6) Should the "BitsTo" functions be included?
+
+      RESOLVED: No.
+      These functions are already available via GL_ARB_shader_bit_encoding.
+
+Revision History
+
+    Rev.    Date      Author    Changes
+    ----  ----------- --------  -----------------------------------------
+     2     7-Jul-2016  idr      Fix typo in #extension line
+     1    20-Jun-2016  idr      Initial version based on GL_ARB_gpu_shader5.
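
For readers who want to see the new built-ins in use, the following is a minimal sketch of a GLSL 1.30 fragment shader that enables the extension and exercises uaddCarry(), umulExtended(), bitfieldExtract(), bitfieldInsert(), findMSB(), bitCount(), frexp() and ldexp() as the specification above describes them. It is not part of the commit. The uniforms `a` and `b`, the output `color`, and the helpers `add64` and `mul32x32` are hypothetical names chosen for illustration, and the layout of a 64-bit value as a uvec2 (low word in .x, high word in .y) is an assumption of this sketch, not something the extension defines.

```glsl
#version 130
#extension GL_MESA_shader_integer_functions : require

// Hypothetical inputs: each uvec2 holds a 64-bit unsigned value,
// .x = low 32 bits, .y = high 32 bits (a convention of this sketch).
uniform uvec2 a;
uniform uvec2 b;

out vec4 color;

// 64-bit unsigned addition built from two 32-bit additions.
uvec2 add64(uvec2 p, uvec2 q)
{
    uint carry;
    uint lo = uaddCarry(p.x, q.x, carry);  // carry is 1 if the sum reached 2^32
    uint hi = p.y + q.y + carry;           // plain uint add wraps modulo 2^32
    return uvec2(lo, hi);
}

// Full 32 x 32 -> 64-bit unsigned product.
uvec2 mul32x32(uint p, uint q)
{
    uint msb;
    uint lsb;
    umulExtended(p, q, msb, lsb);          // note the order: msb first, then lsb
    return uvec2(lsb, msb);
}

void main()
{
    uvec2 sum  = add64(a, b);
    uvec2 prod = mul32x32(a.x, b.x);

    // Take bits 8..15 of the low word of the sum ...
    uint field = bitfieldExtract(sum.x, 8, 8);

    // ... and write them into bits 0..7 of the low word of the product.
    uint merged = bitfieldInsert(prod.x, field, 0, 8);

    // Index of the highest set bit, or -1 when merged is zero.
    int top = findMSB(merged);

    // Split a float into significand and exponent, then rebuild it.
    int e;
    float m = frexp(float(top + 1), e);
    float v = ldexp(m, e);

    color = vec4(v / 32.0, float(bitCount(merged)) / 32.0, 0.0, 1.0);
}
```

Passing a single `offset`/`bits` pair to the bitfield functions matches the note above that vector versions share one pair across all components; the scalar calls here sidestep that detail.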