I am not able to find _any_ rounding behavior specified by OpenGL for
float to half-float conversions. It is specified for fp11/fp10, however:
rounding to the nearest finite value is suggested, round-to-zero is also
allowed, and in either case finite values must not be flushed to
infinity.
Hence I believe it makes sense to do the same for half-floats.
We could probably also use round-to-zero consistently, which is in fact
required by d3d10 (but it doesn't seem to matter much).
This does not match the mesa core function doing the same conversion,
though. That function says it was built to match intel gpus, which I
don't believe for a second, as it would cause failures in d3d10.
Moreover, the PRM (for ivy bridge; this is not listed in older manuals),
while not specifying rounding behavior, clearly states that finite
numbers are never flushed to infinity.
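
To illustrate the intended behavior, here is a minimal standalone sketch of
a float to half-float conversion that rounds to nearest and clamps finite
overflow to the largest finite half. It is not the code from this patch:
the names sketch_float_to_half and union sketch_fi are made up for the
example, and the overflow check uses >= where the diff uses >, so that
inputs which round up to exactly the infinity bit pattern (65520.0f, for
instance) are clamped as well. That is a choice of the sketch only.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch: view a float's bits. */
union sketch_fi {
   float f;
   uint32_t ui;
};

/*
 * Sketch of a float to half-float conversion that rounds to nearest and
 * clamps finite overflow to the largest finite half (0x7bff == 65504.0)
 * instead of producing infinity (0x7c00).
 */
static uint16_t
sketch_float_to_half(float f)
{
   const uint32_t sign_mask = 0x80000000u;
   const uint32_t round_mask = ~0xfffu;  /* clears the low 12 mantissa bits */
   const uint32_t f32inf = 0xffu << 23;
   const uint32_t f16inf = 0x1fu << 23;  /* half infinity before the >> 13 */
   union sketch_fi magic, f32;
   uint32_t sign;
   uint16_t f16;

   magic.ui = 0xfu << 23;                /* 2^-112, rebiases the exponent */
   f32.f = f;

   /* Strip the sign, it is put back at the end. */
   sign = f32.ui & sign_mask;
   f32.ui ^= sign;

   if (f32.ui == f32inf) {
      f16 = 0x7c00;                      /* Inf stays Inf */
   } else if (f32.ui > f32inf) {
      f16 = 0x7e00;                      /* NaN */
   } else {
      f32.ui &= round_mask;
      f32.f *= magic.f;
      f32.ui -= round_mask;              /* equals adding 0x1000 (half ulp) */

      /*
       * Assumption of this sketch: compare with >= so that values which
       * round up to exactly the infinity pattern are clamped too.
       */
      if (f32.ui >= f16inf)
         f32.ui = f16inf - 1;

      f16 = (uint16_t)(f32.ui >> 13);
   }

   f16 |= (uint16_t)(sign >> 16);
   return f16;
}
```

With this sketch, sketch_float_to_half(1.0e6f) yields 0x7bff (65504.0)
rather than 0x7c00 (infinity), while an actual infinity input still maps
to 0x7c00.
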
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
*/
uf11 = UF11_MAX_EXPONENT;
if (mantissa) {
- uf11 |= 1; /* NaN */
+ uf11 |= 1; /* NaN */
} else {
- if (sign)
- uf11 = 0; /* 0.0 */
+ if (sign)
+ uf11 = 0; /* 0.0 */
}
} else if (sign) {
return 0;
*/
uf10 = UF10_MAX_EXPONENT;
if (mantissa) {
- uf10 |= 1; /* NaN */
+ uf10 |= 1; /* NaN */
} else {
- if (sign)
- uf10 = 0; /* 0.0 */
+ if (sign)
+ uf10 = 0; /* 0.0 */
}
} else if (sign) {
return 0;
- } else if (val > 64512.0f) { /* Overflow - flush to Infinity */
+ } else if (val > 64512.0f) {
/* From the GL_EXT_packed_float spec:
*
* "Likewise, finite positive values greater than 64512 (the maximum
f32.f *= magic.f;
f32.ui -= round_mask;
- /* Clamp to infinity if overflowed */
+ /*
+ * Clamp to max finite value if overflowed.
+ * OpenGL has completely undefined rounding behavior for float to
+ * half-float conversions, and this matches what is mandated for float
+ * to fp11/fp10, which recommend round-to-nearest-finite too.
+ * (d3d10 is deeply unhappy about flushing such values to infinity, and
+ * while it also mandates round-to-zero it doesn't care nearly as much
+ * about that.)
+ */
if (f32.ui > f16inf)
- f32.ui = f16inf;
+ f32.ui = f16inf - 1;
f16 = f32.ui >> 13;
}