git.libre-soc.org Git - mesa.git/log

clover: Add CL_MEM_HOST_* flag checks.

Those flags have been introduced in OpenCL 1.2.

[ Francisco Jerez: Rebase.  Throw CL_INVALID_VALUE from
  clCreateSubBuffer if the subbuffer drops access flags from its
  parent.  Use single function taking the set of allowed host access
  flags to validate memory transfer operands. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>

clover: Factor out memory object flags validation to a helper function.

And define constants for commonly used subsets of flags to save some
typing.

Reviewed-and-tested-by: EdB <edb+mesa@sigluy.net>

vc4: Update to current kernel sources.

New BO create and mmap ioctls are added. The submit ABI gains a flags
argument, and the pointers are fixed at 64-bit. Shaders are now fixed at
the start of their BOs.

r600: Fix build after 984f3069370cd4a347cb38269d430b428385affd

Same as for the CLAMP macro, undef it before including a header file that
tries to make fields with that name.

st/nine: Mark end of non-void function unreachable

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Eric Anholt <eric@anholt.net>

gallium: include util/macros.h

The most common macros are defined there, so there is no use in
duplicating them. Clean up the already redefined macros.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Eric Anholt <eric@anholt.net>

driconf: Update Catalan translation

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>

driconf: Update Spanish translation

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>

mesa: Add missing error checks to GetProgramInfoLog, GetShaderInfoLog and GetProgramiv

Fixes 3 dEQP tests:
* dEQP-GLES3.functional.negative_api.state.get_program_info_log
* dEQP-GLES3.functional.negative_api.state.get_shader_info_log
* dEQP-GLES3.functional.negative_api.state.get_programiv

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: Fix non-AA wide line rendering with fractional line widths

"(...)Let w be the width rounded to the nearest integer (...). If the
line segment has endpoints given by (x0,y0) and (x1,y1) in window
coordinates, the segment with endpoints (x0,y0-(w-1)/2) and
(x1,y1-(w-1)/2) is rasterized, (...)"

The hardware is not rounding the line width, so we should do it.

Also, we should be careful not to go beyond the hardware limits
for the line width after it gets rounded. Gen6-7 define a maximum line
width slightly below 8.0, so we should advertise a maximum line
width lower than 7.5 to make sure that 7.0 is the maximum integer
line width that we can select. Since the line width granularity in these
platforms is 0.125, we choose 7.375. Other platforms advertise rounded
maximum line widths, so those are fine.
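
The rounding and limit reasoning above can be sketched as follows. This is
only an illustration, not the i965 code: effective_line_width(),
line_endpoint_offset() and GEN6_MAX_ADVERTISED_WIDTH are hypothetical names,
and the add-0.5-and-truncate rounding assumes positive widths.

```c
/* Hypothetical constant: the maximum line width advertised on Gen6-7,
 * as chosen in the commit message above. */
#define GEN6_MAX_ADVERTISED_WIDTH 7.375f

/* Clamp the requested width to the advertised maximum, then round to the
 * nearest integer.  Adding 0.5 and truncating is only correct for
 * positive values, which is all this sketch handles. */
float
effective_line_width(float requested)
{
   float w = requested;

   if (w > GEN6_MAX_ADVERTISED_WIDTH)
      w = GEN6_MAX_ADVERTISED_WIDTH;

   return (float)(int)(w + 0.5f);
}

/* Vertical offset applied to both endpoints: (w - 1) / 2. */
float
line_endpoint_offset(float rounded_width)
{
   return (rounded_width - 1.0f) / 2.0f;
}
```

With a limit of 7.375 the largest value the rounding step can produce is 7.0,
whereas a limit of 7.5 or more could round up to 8.0, which is beyond what the
hardware accepts.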

Fixes the following 3 dEQP tests:
dEQP-GLES3.functional.rasterization.primitives.lines_wide
dEQP-GLES3.functional.rasterization.fbo.texture_2d.primitives.lines_wide
dEQP-GLES3.functional.rasterization.fbo.rbo_singlesample.primitives.lines_wide

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: Fix ctx->Texture.CubeMapSeamless

The intel driver code, and apparently all other Mesa drivers, call
_mesa_initialize_context early in the CreateContext hook. That
function will end up calling _mesa_init_texture which will do:

ctx->Texture.CubeMapSeamless = _mesa_is_gles3(ctx);

But this won't work at this point, since _mesa_is_gles3 requires
ctx->Version to be set and that will not happen until late
in the CreateContext hook, when _mesa_compute_version is called.

We can't just move the call to _mesa_compute_version before
_mesa_initialize_context since it needs the available extensions
to have been computed, which again requires other things to be
initialized, etc. Instead, we enable seamless cube maps from
GLES2 onwards, which should work for most implementations, and expect
drivers that don't support this to disable it manually as part
of their context initialization setup.
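
The ordering problem can be sketched with a heavily simplified model. The
types and functions below (struct sketch_context, sketch_is_gles3(),
sketch_init_texture()) are hypothetical stand-ins, not the real Mesa
definitions; they only show why a version-based check is always false during
early initialization while an API-based check is not.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real Mesa types. */
enum sketch_api { SKETCH_API_OPENGL, SKETCH_API_OPENGLES2 };

struct sketch_context {
   enum sketch_api api;
   unsigned version;        /* 0 until the version has been computed */
   bool cube_map_seamless;
};

/* Depends on the version, so it is always false during early init. */
bool
sketch_is_gles3(const struct sketch_context *ctx)
{
   return ctx->api == SKETCH_API_OPENGLES2 && ctx->version >= 30;
}

/* Early initialization: the API is known here, the version is not, so
 * the decision is based on the API alone. */
void
sketch_init_texture(struct sketch_context *ctx)
{
   ctx->cube_map_seamless = (ctx->api == SKETCH_API_OPENGLES2);
}
```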

Fixes the following 192 dEQP tests:
dEQP-GLES3.functional.texture.filtering.cube.formats.*
dEQP-GLES3.functional.texture.filtering.cube.sizes.*
dEQP-GLES3.functional.texture.filtering.cube.combinations.*
dEQP-GLES3.functional.texture.mipmap.cube.*
dEQP-GLES3.functional.texture.vertex.cube.filtering.*
dEQP-GLES3.functional.texture.vertex.cube.wrap.*
dEQP-GLES3.functional.shaders.texture_functions.texturelod.samplercube_fixed_*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Return error if BeginQuery is called with an existing object of different type

Section 2.14 Asynchronous Queries, page 84 of the OpenGL ES 3.0.4
spec states:

  "BeginQuery generates an INVALID_OPERATION error if any of the
   following conditions hold: [...] id is the name of an
   existing query object whose type does not match target; [...]"

Similar wording exists in the OpenGL 4.5 spec, section 4.2. QUERY
OBJECTS AND ASYNCHRONOUS QUERIES, page 43.

Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.fragment.begin_query

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Return INVALID_OPERATION when querying a never bound Query obj

Section 2.14 Asynchronous Queries, page 84 of the OpenGL ES 3.0.4 states:

"The command void GenQueries( sizei n, uint *ids ); returns n previously unused
query object names in ids. These names are marked as used, for the purposes of
GenQueries only, but no object is associated with them until the first time they
are used by BeginQuery."

This means that any attempt to use or query a Query object id before it has ever
been bound by calling glBeginQuery should be assumed to refer to an invalid object.
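
A minimal sketch of that rule, assuming a per-name flag that records whether
an object has been created yet. All names here (struct sketch_query_name,
sketch_gen_query(), and so on) are hypothetical and do not mirror the Mesa
implementation.

```c
#include <stdbool.h>

enum sketch_query_error { SKETCH_NO_ERROR, SKETCH_INVALID_OPERATION };

struct sketch_query_name {
   bool generated;    /* name was returned by GenQueries */
   bool has_object;   /* becomes true on the first BeginQuery */
};

/* GenQueries only reserves the name; no object exists yet. */
void
sketch_gen_query(struct sketch_query_name *q)
{
   q->generated = true;
   q->has_object = false;
}

/* The first BeginQuery is what associates an object with the name. */
void
sketch_begin_query(struct sketch_query_name *q)
{
   q->has_object = true;
}

/* Querying a name that never went through BeginQuery is an error. */
enum sketch_query_error
sketch_get_query_object(const struct sketch_query_name *q)
{
   return q->has_object ? SKETCH_NO_ERROR : SKETCH_INVALID_OPERATION;
}
```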

Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.state.get_query_objectuiv

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Add _mesa_is_array_texture helper

Reviewed-by: Brian Paul <brianp@vmware.com>

mesa: Fix error validating args for TexSubImage3D

The zoffset and depth values were not being considered when calling
error_check_subtexture_dimensions().
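
For illustration, a bounds check that treats the third dimension like the
other two might look like the sketch below. The helper names are hypothetical
and the check is simplified (no borders, no target-specific rules); it is not
the body of error_check_subtexture_dimensions().

```c
#include <stdbool.h>

/* True when offset >= 0, size >= 0 and offset + size <= limit. */
bool
sketch_range_in_bounds(int offset, int size, int limit)
{
   return offset >= 0 && size >= 0 &&
          offset <= limit && size <= limit - offset;
}

/* All three axes are validated, including zoffset and depth. */
bool
sketch_subimage3d_in_bounds(int xoffset, int yoffset, int zoffset,
                            int width, int height, int depth,
                            int tex_width, int tex_height, int tex_depth)
{
   return sketch_range_in_bounds(xoffset, width, tex_width) &&
          sketch_range_in_bounds(yoffset, height, tex_height) &&
          sketch_range_in_bounds(zoffset, depth, tex_depth);
}
```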

Fixes 2 dEQP tests:
* dEQP-GLES3.functional.negative_api.texture.texsubimage3d_neg_offset
* dEQP-GLES3.functional.negative_api.texture.texsubimage3d_invalid_offset

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>

i965/blorp: round to nearest when converting float into integer

Fixes:

dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_linear
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_y_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_y_linear
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_y_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_dst_y_linear
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_dst_x_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_dst_x_linear
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_dst_y_nearest
dEQP-GLES3.functional.fbo.blit.rect.out_of_bounds_reverse_src_dst_y_linear

No piglit regressions.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965: Perform program state upload outside of atom handling

Across the board of the various generations, the initial few atoms in
all of the atom lists are basically the same, (performing uploads for
the various programs). The only difference is that prior to gen6
there's an ff_gs upload in place of the later gs upload.

In this commit, instead of using the atom lists for this program state
upload, we add a new function brw_upload_programs that calls into the
per-stage upload functions which in turn check dirty bits and return
immediately if nothing needs to be done.
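
The shape of such a function can be sketched as below. The struct, the dirty
bits and the function names are hypothetical simplifications (the real
per-stage functions and dirty flags are more involved); the point is only the
"check dirty bits, return early" pattern behind a single entry point.

```c
/* Hypothetical dirty bits, one per program stage. */
enum {
   SKETCH_DIRTY_VS = 1 << 0,
   SKETCH_DIRTY_GS = 1 << 1,
   SKETCH_DIRTY_FS = 1 << 2
};

struct sketch_upload_ctx {
   unsigned dirty;     /* which stages need a new upload */
   unsigned uploads;   /* how many uploads were actually performed */
};

/* Per-stage upload: returns immediately if nothing needs to be done. */
void
sketch_upload_stage(struct sketch_upload_ctx *ctx, unsigned dirty_bit)
{
   if (!(ctx->dirty & dirty_bit))
      return;

   ctx->uploads++;
}

/* Single entry point; code that must run before or after all program
 * uploads (with locals holding state across them) would live here. */
void
sketch_upload_programs(struct sketch_upload_ctx *ctx)
{
   sketch_upload_stage(ctx, SKETCH_DIRTY_VS);
   sketch_upload_stage(ctx, SKETCH_DIRTY_GS);
   sketch_upload_stage(ctx, SKETCH_DIRTY_FS);
}
```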

This commit is intended to have no functional change. The motivation
is that future code, (such as the shader cache), wants to have a
single function within which to perform various operations before and
after program upload, (with some local variables holding state across
the upload).

It may be worth looking at whether some of the other functionality
currently handled via atoms might also be more cleanly handled in a
similar fashion.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

egl, wayland: RGB565 format support on Back-buffer

In current code, color format is always hardcoded to
__DRI_IMAGE_FORMAT_ARGB8888 when buffer or DRI image is
allocated in function calls, get_back_bo and dri2_get_buffers,
regardless of current target's color format. This problem
may lead to an incorrect render pitch calculation, which
eventually ends up with a wrong offset of pixels in
the frame buffer when the image is in a different color format
from the dri surf's, especially with a different bpp (e.g. RGB565-16bpp).

Attached code patch simply adds RGB565 and XRGB8888 cases to two
functions noted above to resolve the issue.

v2: added a case of XRGB8888, format and bpp selection is done
via switch-case (not "if-else" anymore)
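
To see why the format matters for the pitch, consider the sketch below. The
enum and helpers are hypothetical (they are not the __DRI_IMAGE_FORMAT_*
values or the functions touched by the patch), and the pitch ignores any
alignment a real allocator would add.

```c
/* Hypothetical format enum, for the sketch only. */
enum sketch_format {
   SKETCH_FORMAT_RGB565,
   SKETCH_FORMAT_XRGB8888,
   SKETCH_FORMAT_ARGB8888
};

/* Bits per pixel selected with a switch, as described in v2. */
int
sketch_format_bpp(enum sketch_format format)
{
   switch (format) {
   case SKETCH_FORMAT_RGB565:
      return 16;
   case SKETCH_FORMAT_XRGB8888:
   case SKETCH_FORMAT_ARGB8888:
      return 32;
   }

   return 0; /* unknown format */
}

/* Unpadded row pitch in bytes. */
int
sketch_row_pitch(int width, enum sketch_format format)
{
   return width * sketch_format_bpp(format) / 8;
}
```

For a 640 pixel wide RGB565 surface this gives 1280 bytes per row, half of
what a hardcoded ARGB8888 assumption would produce.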

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "10.5" <mesa-stable@lists.freedesktop.org>

mesa: move math-related function into new c99_math.h file

The alternative would be to include math.h in c99_compat.h but that
seems heavy-handed.

This patch also replaces INLINE with inline in the c99 math function
wrappers.

Fixes MSVC build.

Acked-by: Matt Turner <mattst88@gmail.com>

nir/gcm: Add some missing break statements

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: Copy-propagate vecN operations that are actually moves

We were already doing this for ALU operations but not for non-ALU
operations.  This changes that.

total NIR instructions in shared programs: 2039883 -> 2022338 (-0.86%)
NIR instructions in affected programs:     1768850 -> 1751305 (-0.99%)
helped:                                    14244
HURT:                                      124

total FS instructions in shared programs: 4083960 -> 4084036 (0.00%)
FS instructions in affected programs:     7302 -> 7378 (1.04%)
helped:                                   12
HURT:                                     51

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

ra: Disable round-robin strategy for optimistically colorable nodes.

The round-robin allocation strategy is expected to decrease the amount
of false dependencies created by the register allocator and give the
post-RA scheduling pass more freedom to move instructions around.  On
the other hand it has the disadvantage of increasing fragmentation and
decreasing the number of equally-colored nearby nodes, which increases
the likelihood of failure in presence of optimistically colorable
nodes.

This patch disables the round-robin strategy for optimistically
colorable nodes.  These typically arise in situations of high register
pressure or for registers with large live intervals; in both cases the
task of the instruction scheduler shouldn't be constrained excessively
by the dense packing of those nodes, and a spill (or on Intel hardware
a fall-back to SIMD8 mode) is invariably worse than a slightly less
optimal scheduling.

Shader-db results on the i965 driver:

total instructions in shared programs: 5488539 -> 5488489 (-0.00%)
instructions in affected programs:     1121 -> 1071 (-4.46%)
helped:                                1
HURT:                                  0
GAINED:                                49
LOST:                                  5

v2: Re-enable round-robin already for the lowest one of the nodes
    pushed optimistically onto the stack (Connor).
v3: Use UINT_MAX instead of ~0, open-code MIN2 (Jason, Connor).

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

i965/fs: Fix lower_load_payload() not to use an incorrect half for immediates and uniforms.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965/fs: Fix lower_load_payload() to take into account non-zero reg_offset.

Fixes metadata guess when instructions in the program specify a
destination register with non-zero reg_offset and when the payload of
a LOAD_PAYLOAD spans several registers.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().

MRFs cannot be read from anyway so they cannot possibly be a valid
source of LOAD_PAYLOAD.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965/fs: Less broken handling of force_writemask_all in lower_load_payload().

It's perfectly fine to read the second half of a register written with
force_writemask_all from a first half MOV instruction or vice versa, and
lower_load_payload shouldn't mark the whole MOV as belonging to the second
half in that case. Replicate the same metadata to both halves of the
destination when writemasking is disabled.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

mesa/vbo: Use unreachable to silence uninitialized var warning.

Reviewed-by: Eric Anholt <eric@anholt.net>

mesa: Move START/END_FAST_MATH macros to their only use.

Reviewed-by: Eric Anholt <eric@anholt.net>

mesa: Remove definition of NULL.

If your stdlib.h doesn't define this you should fix your stdlib.h.

Reviewed-by: Eric Anholt <eric@anholt.net>

mesa: Use assert() instead of ASSERT wrapper.

Acked-by: Eric Anholt <eric@anholt.net>

mesa: Remove CHECK macro.

There's some commentary about how it's defined by other "modules", and
maybe that was true in 2000 when the code was added.

Reviewed-by: Eric Anholt <eric@anholt.net>

mesa: Remove dead CAPI define.

Reviewed-by: Eric Anholt <eric@anholt.net>

gallium: Use util_cpu_to_le{16,32} in many more places.

... and util_le{16,32}_to_cpu. I think I've used the right ones for
describing the actual operation performed (even though they're both just
"byte-swap this if I'm on big-endian").

The Linux Kernel has typedefs __le32/__be32 and friends that static
analysis tools can use to check that byte-orderings are correct. It
might be interesting to apply that here as well.
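
As a reminder of what such helpers do, here is a self-contained sketch. The
sketch_* functions are hypothetical and detect endianness at run time purely
to keep the example standalone; this is an assumption of the sketch, not a
description of how the gallium helpers are implemented.

```c
#include <stdint.h>
#include <string.h>

/* Unconditional 32-bit byte swap. */
uint32_t
sketch_bswap32(uint32_t v)
{
   return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
          ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Run-time endianness probe: look at the first byte of 0x0102. */
int
sketch_host_is_big_endian(void)
{
   const uint16_t probe = 0x0102;
   uint8_t first;

   memcpy(&first, &probe, 1);
   return first == 0x01;
}

/* CPU order to little-endian: swap only on big-endian hosts. */
uint32_t
sketch_cpu_to_le32(uint32_t v)
{
   return sketch_host_is_big_endian() ? sketch_bswap32(v) : v;
}

/* The inverse is the same operation; the separate name documents intent. */
uint32_t
sketch_le32_to_cpu(uint32_t v)
{
   return sketch_cpu_to_le32(v);
}
```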

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

gallium/util: Use HAVE___BUILTIN_* macros.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

mesa: Move C99 MSVC compatibility code from u_math.h to c99_compat.h.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

i965: Link test programs with gtest before pthreads.

Cc: "10.5" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=540962

osmesa: add gallium include dirs to Makefile.am

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89260
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

util: move pipe_prim_names array into u_prim_name()

Also, wrapping the array in #ifdef DEBUG / #endif doesn't seem necessary.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

util: rewrite debug_print_transfer_flags() using debug_dump_flags()

Add missing PIPE_TRANSFER_PERSISTENT, PIPE_TRANSFER_COHERENT flags.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

mesa: Adds missing error condition in _mesa_check_sample_count()

This corrects a trivial error introduced in commit
19252fee46b835cb4f6b1cce18d7737d62b64a2e. That patch was merged recently
and omits one condition (that 'samples' is greater than zero) in one of
the error checks. That error will definitely cause regressions.

Also corrects the reference to the specification above the error check,
which was wrongly quoting OpenGL instead of OpenGL-ES.

Reviewed-by: Martin Peres <martin.peres@linux.intel.com>

radeonsi: fix a warning caused by previous commit

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>

radeonsi: fix point sprites

Broken by a27b74819ad375e8c0bc88e13f42c951d2b5cd6a.

This fix is critical and should be ported to stable ASAP.

Cc: 10.5 10.4 <mesa-stable@lists.freedesktop.org>

i965/skl: Use 1 register for uniform pull constant payload

When under dispatch_width=16 the previous code would allocate 2 registers for
the payload when only one is needed. This manifested itself through bugs on SKL
which needs to mess with this instruction.

Ken thought this might impact shader-db, but apparently it doesn't.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89118
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88999
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Timo Aaltonen <timo.aaltonen@canonical.com>

nir: Generalize the optimization of subs of subs from 0.

I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above,
but we can generalize it and make it more potentially useful. In the
specific original case of a 0 for our new 'a' argument, it'll get further
algebraic optimization once the 0 is an argument to the new add.
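
Assuming the generalized rule rewrites a - (0 - b) as a + b (which is what
the description above implies; the exact pattern is not quoted here), the
identity can be checked with unsigned integers, where wrap-around makes it
exact. The function names are hypothetical.

```c
/* Before the rewrite: a - (0 - b). */
unsigned
sketch_sub_of_sub_from_zero(unsigned a, unsigned b)
{
   return a - (0u - b);
}

/* After the rewrite: a + b. */
unsigned
sketch_rewritten_as_add(unsigned a, unsigned b)
{
   return a + b;
}
```

With a equal to 0 the rewritten form is 0 + b, which a separate add-with-zero
simplification can then reduce to b.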

No shader-db effects.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: Collapse repeated bcsels on the same argument.

vc4 results:
total instructions in shared programs: 39881 -> 39794 (-0.22%)
instructions in affected programs: 6302 -> 6215 (-1.38%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: When faced with a csel on !condition, just flip the arguments.

total NIR instructions in shared programs: 39426 -> 39411 (-0.04%)
NIR instructions in affected programs: 3748 -> 3733 (-0.40%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: Allow nir_opt_algebraic to see booleanness through &&, ||, ^, !.

We have some useful optimizations to drop things like 'ine a, 0' on a
boolean argument, but if 'a' came from logical operations on bools,
nir_opt_algebraic couldn't tell. These kinds of constructs appear as a result of TGSI->NIR
quite frequently (at least with if flattening), so being a little more
aggressive in detecting booleans can pay off.
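
The idea can be sketched as a recursive walk over a tiny expression tree. The
node type and opcode names below are hypothetical and unrelated to the actual
NIR data structures; they only show booleanness being propagated through
and/or/xor/not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes for the sketch. */
enum sketch_op {
   SKETCH_OP_COMPARE,  /* always yields a boolean */
   SKETCH_OP_AND,
   SKETCH_OP_OR,
   SKETCH_OP_XOR,
   SKETCH_OP_NOT,
   SKETCH_OP_ADD       /* arbitrary arithmetic, not boolean */
};

struct sketch_expr {
   enum sketch_op op;
   const struct sketch_expr *src[2];
};

/* A logical op is boolean if all of its sources are boolean. */
bool
sketch_produces_bool(const struct sketch_expr *e)
{
   switch (e->op) {
   case SKETCH_OP_COMPARE:
      return true;
   case SKETCH_OP_NOT:
      return sketch_produces_bool(e->src[0]);
   case SKETCH_OP_AND:
   case SKETCH_OP_OR:
   case SKETCH_OP_XOR:
      return sketch_produces_bool(e->src[0]) &&
             sketch_produces_bool(e->src[1]);
   default:
      return false;
   }
}
```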

v2: Add ixor as a booleanness-preserving op (Suggestion by Connor).

vc4 results:
total instructions in shared programs: 40207 -> 39881 (-0.81%)
instructions in affected programs: 6677 -> 6351 (-4.88%)

Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: Add a couple of simplifications of csel operations.

vc4 was already cleaning these up, but it does shave 4 NIR instructions in
shader-db.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

glsl: ensure that enter/leave record get a record type

May make life easier for tools like Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>

tgsi: avoid returning pointer to local var, make it static

Spotted by Coverity.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

freedreno/a4xx: set PC_PRIM_VTX_CNTL.VAROUT properly

Fixes xonotic, some webgl stuff, and really pretty much anything with
more than 4 varyings.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno: update generated headers

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/a4xx: bit of cleanup

Signed-off-by: Rob Clark <robclark@freedesktop.org>

loader: not having a pci-id should not be a warn

If there is no pci-id, which is valid for vc4 and freedreno, just emit
an info msg. Keep malformed but existing pci-id's as a warning.

Mostly just to clean up a warning that confuses users for the non-pci
devices.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno: implement fence

I never actually implemented the stubbed out fence stuff back in the
early days. Fix that.

We'll need a few libdrm_freedreno changes to handle timeout properly,
so ignore that for now to avoid a libdrm_freedreno dependency bump.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

freedreno/a2xx: fix increment in assert

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88883
Signed-off-by: Rob Clark <robclark@freedesktop.org>

i965/fs: Use fs_reg for CS/VS atomics pixel mask immediate data

The brw_imm_ud will yield a HW_REG which then will introduce a barrier
for certain optimization opportunities.

No piglit regressions seen with gen8 (simd8vs).

Suggested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/fs: Set pixel/sample mask for compute shaders atomic ops

For fragment programs, we pull this mask from the payload header. The same
mask doesn't exist for compute shaders, so we set all bits to enabled.

Previously we were setting 0xff to support SIMD8 VS, but with CS we
support SIMD16, and therefore we change this to 0xffff.

Related commits for SIMD8 VS:

commit d9cd982d556be560af3bcbcdaf62b6b93eb934a5
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Sun Feb 15 20:06:59 2015 -0800
    i965/simd8vs: Fix SIMD8 atomics

commit 4a95be9772a255776309f23180519a4a8560f2dd
Author: Jordan Justen <jordan.l.justen@intel.com>
Date:   Tue Feb 17 09:57:35 2015 -0800
    i965/simd8vs: Fix SIMD8 atomics (read-only)

Note: this mask is ANDed with the execution mask, so some channels may not end
up issuing the atomic operation.
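
The mask arithmetic is small enough to sketch directly. Both helpers are
hypothetical names introduced for illustration; only the 0xff and 0xffff
values and the AND with the execution mask come from the description above.

```c
#include <stdint.h>

/* All-channels-enabled mask for a SIMD width: 8 -> 0xff, 16 -> 0xffff. */
uint32_t
sketch_all_channels_mask(unsigned dispatch_width)
{
   if (dispatch_width >= 32)
      return 0xffffffffu;

   return (1u << dispatch_width) - 1u;
}

/* Channels that actually issue the atomic: mask ANDed with execution mask. */
uint32_t
sketch_channels_issuing_atomic(uint32_t mask, uint32_t execution_mask)
{
   return mask & execution_mask;
}
```

Using the SIMD8 value 0xff with a fully enabled SIMD16 execution mask would
leave only the lower eight channels active, which is why the wider mask is
needed.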

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

ilo: R32G32B32_FLOAT need no special care on Gen8+

Gen8+ must use VALIGN_4. Unlike prior Gens, R32G32B32_FLOAT should supposedly
support VALIGN_4.

ilo: 128 BPP formats can use TiledY on Gen7.5+

The restriction is lifted.

nvc0: enable double support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0/ir: remove merge/split pairs to allow normal propagation to occur

Because the TGSI interface creates merges for each instruction source
and then splits them back out, there are a lot of unnecessary
merge/split pairs which do essentially nothing. The various modifier/etc
propagation doesn't know how to walk through those, so just remove them
when they're unnecessary.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0/ir: add support for new TGSI double opcodes

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0/ir: handle zero and negative sqrt arguments

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0/ir: no instruction can load a double immediate

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0/ir: fix lowering of RSQ/RCP/SQRT/MOD to work with F64

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

gm107/ir: fix F2F flipped stype/dtype flags

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

gm107/ir: fix DSET boolean float flag

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

gm107/ir: fix DMUL opcode encoding

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

gk110/ir: add emission of dadd/dmul/dmad opcodes

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0/ir: add emission of dadd/dmul/dmad opcodes, fix minmax

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

mesa: don't enable NV_fragment_program_option with swrast

Since dropping some NV_fragment_program opcodes (commits
868f95f1da74cf6dd7468cba1b56664aad585ccb, a3688d686f147f4252d19b298ae26d4ac72c2e08)
we can no longer parse all opcodes necessary for this extension, leading
to bugs (https://bugs.freedesktop.org/show_bug.cgi?id=86980).
Hence don't announce support for it in swrast (no other driver enabled it).
(Note that remnants of some NV_fp/vp extensions remain, they could be
dropped but are required as hacks for getting viewperf11 catia to run.)

drivers/x11: add gallium include dirs to Makefile.am

Fixes xlib driver build after e8c5cbfd921680c.

Acked-by: Matt Turner <mattst88@gmail.com>

vbo: fix an uninitialized-variable warning

It looks like a bug to me.

Cc: 10.5 10.4 10.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>

gallium/sw/kms: fix a type-mismatch warning

Reviewed-by: Brian Paul <brianp@vmware.com>

gallium/sw/kms: don't redefine DEBUG

Reviewed-by: Brian Paul <brianp@vmware.com>

targets/d3dadapter9: remove an unused variable

Reviewed-by: Brian Paul <brianp@vmware.com>

tgsi: fix type-mismatch warning

Reviewed-by: Brian Paul <brianp@vmware.com>

gallivm: fix uninitialized-variable warnings

Reviewed-by: Brian Paul <brianp@vmware.com>

mesa: Have configure define NDEBUG, not mtypes.h.

mtypes.h had been defining NDEBUG (used by assert) if DEBUG was not
defined. Confusing and bizarre that you don't get NDEBUG if you don't
include mtypes.h.

... which is just what happened in commit bef38f62e.

Let's let configure define this for us if not using --enable-debug.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir: Fix the Mesa build without -DDEBUG.

With -UDEBUG -UNDEBUG, this assert uses reg_state::stack_size, which
doesn't exist, breaking the build:

assert(state->states[index].index < state->states[index].stack_size);

Switch it to ifndef NDEBUG, so the field will exist if the assertion
actually generates code.
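
The pattern looks roughly like the sketch below, where the struct and function
names are hypothetical. Because assert() expands to nothing when NDEBUG is
defined, guarding the field with the same macro keeps the field and its only
user in sync.

```c
#include <assert.h>

struct sketch_reg_state {
   unsigned index;
#ifndef NDEBUG
   unsigned stack_size;   /* only read by the assertion below */
#endif
};

void
sketch_push(struct sketch_reg_state *state)
{
   /* Compiles either way: with NDEBUG the expression is never expanded,
    * without it the field is guaranteed to exist. */
   assert(state->index < state->stack_size);
   state->index++;
}
```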

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

nir: Drop dependency on mtypes.h for core NIR.

One less new directory necessary for gallium code that wants to interact
with NIR.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

glsl: Only include mtypes from glsl_types.h for the C++ code that needs it.

It's used in one of the methods, not in the structure definitions.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

util: Move Mesa's bitset.h to util/.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

mesa: Make bitset.h not rely on Mesa-specific types and functions.

Note that we can't use u_math.h's align() because it's a function instead
of a macro, while BITSET_DECLARE needs a constant expression for nouveau's
usage in global declarations.
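
The constraint can be illustrated with the sketch below. The SKETCH_* macros
are hypothetical and are not the definitions in bitset.h; they only show that
a macro yields a constant expression usable in a file-scope array
declaration, while a function computing the same value does not.

```c
#define SKETCH_WORDBITS 32u

/* Number of 32-bit words needed for `bits` bits.  The argument is
 * parenthesized so that expressions can be passed safely. */
#define SKETCH_BITSET_WORDS(bits) \
   (((bits) + SKETCH_WORDBITS - 1) / SKETCH_WORDBITS)

#define SKETCH_BITSET_DECLARE(name, bits) \
   unsigned name[SKETCH_BITSET_WORDS(bits)]

/* Same computation as a function: fine at run time, but its result is
 * not a constant expression, so it cannot size a global array. */
unsigned
sketch_bitset_words(unsigned bits)
{
   return (bits + SKETCH_WORDBITS - 1) / SKETCH_WORDBITS;
}

/* File-scope declaration: the size must be a constant expression. */
SKETCH_BITSET_DECLARE(sketch_global_bits, 70);
```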

v2: Stick some parens around the bits macro argument usage (review by Jose).

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

mesa: Use u_math.h from macros.h

This avoids duplication of some macros and other definitions across the
tree.

Note that COPY_4FV switches from a memcpy-based implementation to an
assignment of 4 floats.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

gallium/util: Don't include unused debug functions from u_math.h

It introduces references to gallium util/ symbols which means we don't get
to include it from outside-of-gallium code.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

mesa: Add gallium include dirs to more parts of the tree.

v2: Try to patch up the scons bits.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

gallium/radeon: fix an uninitialized-variable warning

gallium: add new double-related shader caps to all the getters

Missed a few drivers in the earlier changes, this should fix up all the
ones that print unknown caps or don't have a default statement.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>

svga: add missing _DROUND,DFRACEXP_DLDEXP_SUPPORTED switch cases

To silence unhandled switch case warnings.

radeonsi: don't use SQC_CACHES to flush ICACHE and KCACHE on SI

This reverts 73c2b0d18c51459697d8ec194ecfc4438c98c139.

It doesn't seem to be reliable. It's probably missing a wait packet or
something, because it's just a register write and doesn't wait for anything.
SURFACE_SYNC at least seems to wait until the flush is done. Just guessing.

Let's not complicate things and revert this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88561

Cc: 10.5 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

i965/gen6: Fix GL_GEOMETRY_SHADER_PRIMITIVES_EMITTED_ARB

In gen6 we need to compute the primitive count in the generated GS program.
The current implementation only counts full primitives, that is, if the
output primitive type is a triangle strip, it won't count individual
triangles in the strip, only complete strips.

If we want to count basic primitives instead we have two options: rework
the assembly code we generate for strip primitives or simply use
CL_INVOCATION_COUNT to resolve the query and let the hardware do that work
for us. This patch implements the latter approach.

Fixes the following piglit test:
bin/arb_pipeline_statistics_query-geom -auto

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89210
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

mesa: Check that draw buffers are valid for glDrawBuffers on GLES3

Section 4.2 (Whole Framebuffer Operations) of the OpenGL ES 3.0 specification
says:

"Each buffer listed in bufs must be BACK, NONE, or one of the values from
table 4.3 (NONE, COLOR_ATTACHMENTi)".

Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.buffer.draw_buffers

Reviewed-by: Matt Turner <mattst88@gmail.com>

glsl: don't allow invariant qualifiers for interface blocks

The GLSL 1.50 and GLSL 4.40 specs both say the same in the
"Interface Blocks" section:

"If optional qualifiers are used, they can include interpolation qualifiers,
auxiliary storage qualifiers, and storage qualifiers and they must declare
an input, output, or uniform member consistent with the interface qualifier
of the block"

From GLSL ES 3.0, chapter 4.3.7 "Interface Blocks", page 38:

"GLSL ES 3.0 does not support interface blocks for shader inputs or outputs."

and from GLSL ES 3.0, chapter 4.6.1 "The invariant qualifier", page 52.

"Only variables output from a shader can be candidates for invariance."

This patch fixes the following dEQP tests:

dEQP-GLES3.functional.shaders.declarations.invalid_declarations.invariant_uniform_block_2_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.invariant_uniform_block_2_fragment

No piglit regressions.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
v2:

- Enable this check for GLSL.

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

vc4: Keep an array of pointers to instructions defining the temps around.

The optimization passes are always regenerating it and throwing it away,
but it's not hard to keep track of.

vc4: Move qir_uniform() and the constant-value versions to vc4_qir.c/h.

I may want them in optimization passes, and they're not really particular
to the program translation stage.

vc4: Enforce one-uniform-per-instruction after optimization.

This lets us more intelligently decide which uniform values should be put
into temporaries, by choosing the most reused values to push to temps
first.

total uniforms in shared programs: 13457 -> 13433 (-0.18%)
uniforms in affected programs: 1524 -> 1500 (-1.57%)
total instructions in shared programs: 40198 -> 40019 (-0.45%)
instructions in affected programs: 6027 -> 5848 (-2.97%)

I noticed this opportunity because with the NIR work, some programs were
happening to make different uniform copy propagation choices that
significantly increased instruction counts.

vc4: Rename add_uniform() to qir_uniform().

vc4: Shut up runtime warnings about new pipe caps.