mesa.git

Jason Ekstrand [Fri, 29 Jun 2018 02:16:58 +0000 (19:16 -0700)]
nir: Add a large constants optimization pass

This pass searches for reasonably large local variables which can be
statically proven to be constant and moves them into shader constant
data.  This is especially useful when large tables are baked into the
shader source code because they can be moved into a UBO by the driver to
reduce register pressure and make indirect access cheaper.
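A minimal sketch of the kind of check such a pass needs, assuming a simplified model in which a variable's accesses are listed in program order. The types, the var_is_large_constant() name and the min_size_bytes threshold are hypothetical, not NIR's API; the real pass works on NIR derefs and its exact criteria may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of one local variable's accesses
 * in program order; these are not NIR's data structures. */
enum access_kind { ACCESS_STORE, ACCESS_LOAD };

struct access {
   enum access_kind kind;
   bool value_is_constant; /* stores only: the stored value is a constant */
   bool index_is_direct;   /* stores only: no indirect array index */
};

/* Returns true if the variable is large enough and every store writes a
 * constant through a direct index before any load reads it, so its
 * contents are statically known and could move into constant data. */
static bool
var_is_large_constant(const struct access *accesses, size_t count,
                      unsigned var_size_bytes, unsigned min_size_bytes)
{
   if (var_size_bytes < min_size_bytes)
      return false;

   bool seen_load = false;
   for (size_t i = 0; i < count; i++) {
      if (accesses[i].kind == ACCESS_LOAD) {
         seen_load = true;
         continue;
      }
      /* A store after a load, a non-constant value, or an indirect write
       * means the contents cannot be proven constant. */
      if (seen_load || !accesses[i].value_is_constant ||
          !accesses[i].index_is_direct)
         return false;
   }
   return true;
}
```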

v2 (Jason Ekstrand):
 - Use a size/align function to ensure we get the right alignments
 - Use the newly added deref offset helpers

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Jason Ekstrand [Fri, 29 Jun 2018 02:16:19 +0000 (19:16 -0700)]
nir: Add a concept of constant data associated with a shader

This commit adds a concept to NIR of having a blob of constant data
associated with a shader.  Instead of being a UBO or uniform that can be
manipulated by the client, this constant data is considered part of the
shader and remains constant across all invocations of the given shader
until the end of time.  To access this constant data from the shader, we
add a new load_constant intrinsic.  The intention is that drivers will
eventually lower load_constant intrinsics to load_ubo, load_uniform, or
something similar.  Constant data will be used by the optimization pass
in the next commit but this concept may also be useful for OpenCL.
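A minimal sketch of what reading the constant blob amounts to once a driver lowers load_constant to a plain buffer read. The struct and the load_constant() function below are hypothetical C stand-ins, not the NIR intrinsic or Mesa's shader struct; only the idea of a byte offset into an immutable per-shader blob comes from the text.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a shader carries an immutable blob of constant data
 * (compare the commit's constant_data_size). */
struct shader_with_constants {
   const uint8_t *constant_data;
   size_t constant_data_size;
};

/* Sketch of a lowered load_constant-style read: copy `size` bytes starting
 * at `offset`. Returns 0 on success, -1 if the read would leave the blob. */
static int
load_constant(const struct shader_with_constants *sh, size_t offset,
              void *dst, size_t size)
{
   if (offset > sh->constant_data_size ||
       size > sh->constant_data_size - offset)
      return -1;
   memcpy(dst, sh->constant_data + offset, size);
   return 0;
}
```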

v2 (Jason Ekstrand):
 - Rename num_constants to constant_data_size (anholt)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Jason Ekstrand [Fri, 29 Jun 2018 21:44:19 +0000 (14:44 -0700)]
nir/deref: Add helpers for getting offsets

These are very similar to the related function in nir_lower_io except
that they don't handle per-vertex or packed things (that could be added,
in theory) and they take a more detailed size/align function pointer.
One day, we should consider switching nir_lower_io over to using the
more detailed size/align functions and then we could make it use these
helpers instead of having its own.
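To illustrate what taking a size/align function pointer buys, here is a sketch under simplified assumptions: a hypothetical type model with only float vectors, and a hypothetical array_deref_offset() that asks the callback for element size and alignment to compute an element's byte offset. The callback signature is an assumption, not the helpers' real prototype.

```c
/* Hypothetical type model: only scalars and vectors of 32-bit floats. */
struct simple_type {
   unsigned components; /* 1..4 */
};

/* Assumed callback shape: report a type's size and alignment in bytes. */
typedef void (*size_align_fn)(const struct simple_type *t,
                              unsigned *size, unsigned *align);

static unsigned
align_up(unsigned x, unsigned a)
{
   return (x + a - 1) / a * a;
}

/* Byte offset of element `index` in an array of `elem`; the callback
 * decides the element size and therefore the array stride. */
static unsigned
array_deref_offset(const struct simple_type *elem, unsigned index,
                   size_align_fn size_align)
{
   unsigned size, align;
   size_align(elem, &size, &align);
   return index * align_up(size, align);
}

/* Scalar-aligned layout. */
static void
natural_size_align(const struct simple_type *t, unsigned *size, unsigned *align)
{
   *size = 4 * t->components;
   *align = 4;
}

/* Layout that pads every vector to 16-byte alignment. */
static void
vec4_padded_size_align(const struct simple_type *t, unsigned *size,
                       unsigned *align)
{
   *size = 4 * t->components;
   *align = 16;
}
```

The same deref yields different offsets depending only on the callback, which is why one helper can serve several layouts.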

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Jason Ekstrand [Fri, 29 Jun 2018 21:14:52 +0000 (14:14 -0700)]
nir/types: Add a natural size and alignment helper

The size and alignment are "natural" in the sense that everything is
aligned to a scalar.  This is a bit tighter than std430 where vec3s are
required to be aligned to a vec4.
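The difference can be shown with a few numbers for 32-bit float vectors. The two functions below are illustrative, not Mesa's helper; the std430 values follow the GLSL base alignment rules (vec2 aligned to 8 bytes, vec3 and vec4 to 16).

```c
/* Natural layout: n tightly packed 4-byte scalars, aligned to one scalar. */
static void
natural_vec_size_align(unsigned n, unsigned *size, unsigned *align)
{
   *size = 4 * n;
   *align = 4;
}

/* std430 base alignment for float vectors: float -> 4, vec2 -> 8,
 * vec3 and vec4 -> 16. */
static void
std430_vec_size_align(unsigned n, unsigned *size, unsigned *align)
{
   *size = 4 * n;
   *align = (n == 1) ? 4 : (n == 2) ? 8 : 16;
}
```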

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Jason Ekstrand [Fri, 29 Jun 2018 02:46:01 +0000 (19:46 -0700)]
nir: Add a deref_instr_has_indirect helper

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Jason Ekstrand [Fri, 29 Jun 2018 21:59:56 +0000 (14:59 -0700)]
util/macros: Import ALIGN_POT from ralloc.c

v2 (Jason Ekstrand):
 - Rename y to pot_align (Brian)
 - Also use ALIGN_POT in build_id.c and slab.c (Brian)
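For reference, a power-of-two round-up macro of this kind is usually written as below; treat it as a sketch of the idea rather than a verbatim copy of util/macros.h. The align_pot_checked() wrapper is a hypothetical addition that asserts the power-of-two precondition.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of pot_align, which must be a power of
 * two: adding pot_align - 1 carries into the next multiple, and masking
 * off the low bits truncates back down to it. */
#define ALIGN_POT(x, pot_align) (((x) + (pot_align) - 1) & ~((pot_align) - 1))

static int
is_pot(uintptr_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical wrapper that checks the precondition the macro relies on. */
static uintptr_t
align_pot_checked(uintptr_t x, uintptr_t pot_align)
{
   assert(is_pot(pot_align));
   return ALIGN_POT(x, pot_align);
}
```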

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Eric Anholt [Mon, 2 Jul 2018 17:19:47 +0000 (10:19 -0700)]
v3d: Claim PIPE_CAP_TGSI_CAN_READ_OUTPUTS.

Fixes warning at screen creation.  We store our outputs in normal temps
and just emit them to shader I/O at the end, due to our I/O ordering
requirements, so reading "outputs" in NIR is fine.

Marek Olšák [Sat, 30 Jun 2018 04:54:30 +0000 (00:54 -0400)]
ac: move all LLVM module initialization into ac_create_module

This removes some ugly code around module initialization.

Reviewed-by: Dave Airlie <airlied@redhat.com>

Eric Anholt [Mon, 25 Jun 2018 17:12:03 +0000 (10:12 -0700)]
v3d: Emit a TF flush after each draw using TF.

This fixes GPU hangs on 7278 in transform feedback tests such as
GTF-GLES3.gtf.GL3Tests.transform_feedback2.transform_feedback2_basic

Karol Herbst [Sat, 30 Jun 2018 02:58:30 +0000 (04:58 +0200)]
nv50/ir: handle clipvertex for geom and tess shaders as well

This will be needed for compatibility profiles.

v2: handle tess shaders

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Karol Herbst <kherbst@redhat.com>

Erik Faye-Lund [Mon, 25 Jun 2018 20:10:31 +0000 (21:10 +0100)]
gallium/u_vbuf: drop min/max-scanning for empty indirect draws

When building with asserts enabled, we'll end up triggering an assert
in pipe_buffer_map_range down this code-path, due to trying to map
an empty range. Even if we avoid that, we'll trigger another assert
a bit later, because u_vbuf_get_minmax_index returns a min-index of
-1 here, which gets promoted to an unsigned value, and gives us an
out-of-bounds buffer-mapping offset.

Since we can't really have a well-defined min/max range here when
the range is empty anyway, we should just drop this dance in the
first place. After all, no rendering is going to be produced.
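The failure mode and the fix can be modelled in a few lines. The helpers below are hypothetical and only mimic the behaviour described above (a min-index of -1 for an empty range); they are not the u_vbuf code. Converting -1 to unsigned yields UINT_MAX, so any offset computed from it is far out of bounds, while the early return skips the scan entirely.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports -1 as the minimum of an empty range. */
static int
min_index_or_minus_one(const unsigned *indices, unsigned count)
{
   if (count == 0)
      return -1;
   unsigned min = UINT_MAX;
   for (unsigned i = 0; i < count; i++)
      if (indices[i] < min)
         min = indices[i];
   return (int)min;
}

/* Sketch of the fix's idea: for an empty draw, skip the min/max scan and
 * any mapping based on it. Returns false if nothing should be mapped. */
static bool
compute_map_offset(const unsigned *indices, unsigned count,
                   unsigned stride, unsigned *offset)
{
   if (count == 0)
      return false; /* no rendering will be produced */
   *offset = (unsigned)min_index_or_minus_one(indices, count) * stride;
   return true;
}
```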

This fixes a crash in dEQP-GLES31.functional.draw_indirect.random.0
on VirGL for me.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Samuel Pitoiset [Wed, 18 Apr 2018 12:34:55 +0000 (14:34 +0200)]
radv: reset the image's predicate after a color decompression pass

After performing a fast-clear eliminate, an FMASK decompress,
or a DCC decompress, we can reset the predicate to FALSE.

With that, the GPU should be able to skip unnecessary color
decompression passes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Samuel Pitoiset [Wed, 18 Apr 2018 12:34:54 +0000 (14:34 +0200)]
radv: enable/disable predication for the DCC decompression pass

Performing a DCC decompression pass is currently pretty rare,
but using predication allows the GPU to skip unnecessary passes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

Samuel Pitoiset [Wed, 27 Jun 2018 08:39:51 +0000 (10:39 +0200)]
radv: add padding for the UMR disassembler

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Gert Wollny [Fri, 29 Jun 2018 10:39:06 +0000 (12:39 +0200)]
virgl: Add support for glGetMultisample

Use caps to obtain the multisample sample positions for up to 16
positions and implement the corresponding Gallium interface.

This implementation (plus its counterpart in virglrenderer) assumes that
the fixed sample positions are always the same for a given number of samples
over the whole lifetime of a qemu session. It also assumes that sample
series are only given for 2, 4, 8, and 16 samples, and for intermediate
numbers N of samples the next higher supported set from the above list is
picked and the sample positions for the first N samples are returned accordingly.
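The sample-count rounding described above can be sketched as follows. position_set_for() is a hypothetical name, and mapping a request for 1 sample onto the 2-sample set is an assumption of this sketch (the text only covers intermediate counts).

```c
/* Sample counts for which position series are provided, per the text. */
static const unsigned supported_counts[] = { 2, 4, 8, 16 };

/* Return the smallest supported set that holds `samples` positions, or 0
 * if there is none (more than 16 samples). The caller then returns the
 * first `samples` positions of that set. */
static unsigned
position_set_for(unsigned samples)
{
   const unsigned n = sizeof(supported_counts) / sizeof(supported_counts[0]);
   for (unsigned i = 0; i < n; i++)
      if (samples <= supported_counts[i])
         return supported_counts[i];
   return 0;
}
```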

Fixes (when run on GL host):
    dEQP-GLES31.functional.texture.multisample.samples_1.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_2.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_3.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_4.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_8.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_10.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_12.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_13.sample_position
    dEQP-GLES31.functional.texture.multisample.samples_16.sample_position

v2: remove unrelated chunk (thanks Ilia Mirkin)
v3: - also return positions for intermediate sample counts
    - fix unused variable warning
    - update description
v4: explain better what this patch assumes and how it handles sample numbers
    that are not directly advertised (thanks go to Erik Faye-Lund for making
    me aware that this should be documented)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>

Tomeu Vizoso [Fri, 22 Jun 2018 13:59:10 +0000 (15:59 +0200)]
st/mesa: Also check for PIPE_FORMAT_A8R8G8B8_SRGB for texture_sRGB

and PIPE_FORMAT_R8G8B8A8_SRGB, as well.

The reason for this is that when Virgl runs with GLES on the host, it
cannot directly upload textures in BGRA.

So to avoid a conversion step, consider the RGB sRGB formats as well for
this extension.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Tomeu Vizoso [Fri, 22 Jun 2018 13:59:09 +0000 (15:59 +0200)]
st/mesa: Fall back to R8G8B8A8_SRGB for ETC2

If the driver doesn't support PIPE_FORMAT_B8G8R8A8_SRGB, fall back to
PIPE_FORMAT_R8G8B8A8_SRGB.

Drivers such as Virgl will have a hard time supporting
PIPE_FORMAT_B8G8R8A8_SRGB when the host runs GLES, as GL_BGRA isn't as
well supported there.

So go with PIPE_FORMAT_R8G8B8A8_SRGB so these drivers can avoid a
conversion copy.
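The fallback order can be sketched with hypothetical stand-ins for the two formats and for the driver's format query; this is not the st/mesa code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two Gallium formats named above. */
enum fmt { FMT_NONE, FMT_B8G8R8A8_SRGB, FMT_R8G8B8A8_SRGB };

typedef bool (*is_supported_fn)(enum fmt f);

/* Prefer BGRA sRGB; fall back to RGBA sRGB if the driver lacks it. */
static enum fmt
choose_etc2_srgb_fallback(is_supported_fn supported)
{
   if (supported(FMT_B8G8R8A8_SRGB))
      return FMT_B8G8R8A8_SRGB;
   if (supported(FMT_R8G8B8A8_SRGB))
      return FMT_R8G8B8A8_SRGB;
   return FMT_NONE;
}

/* Example queries: a GLES-host-like driver and one that supports both. */
static bool gles_host_like(enum fmt f) { return f == FMT_R8G8B8A8_SRGB; }
static bool supports_all(enum fmt f) { (void)f; return true; }
```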

v2: Fix typo in commit message

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Tomeu Vizoso [Fri, 22 Jun 2018 13:59:08 +0000 (15:59 +0200)]
st/mesa/i965: Allow decompressing ETC2 to GL_RGBA

When Mesa itself implements ETC2 decompression, it currently
decompresses to formats in the GL_BGRA component order.

That can be problematic for drivers which cannot upload the texture data
as GL_BGRA, such as Virgl when it's backed by GLES on the host.

So this commit adds a flag to _mesa_unpack_etc2_format so callers can
specify the optimal component order.

In Gallium's case, RGBA order will be requested if the format isn't
PIPE_FORMAT_B8G8R8A8_SRGB.

For i965, it will remain GL_BGRA, as before.
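The effect of such a flag on a single decoded texel can be sketched as follows; store_texel() and its bgra parameter are hypothetical and do not reflect _mesa_unpack_etc2_format's actual signature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Write one decoded RGBA texel into dst in BGRA or RGBA byte order,
 * chosen by the caller. Only red and blue swap; green and alpha stay. */
static void
store_texel(uint8_t *dst, const uint8_t rgba[4], bool bgra)
{
   dst[0] = bgra ? rgba[2] : rgba[0];
   dst[1] = rgba[1];
   dst[2] = bgra ? rgba[0] : rgba[2];
   dst[3] = rgba[3];
}
```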

v2: * Remove unnecessary include (Emil Velikov)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Iago Toral Quiroga [Thu, 28 Jun 2018 11:12:53 +0000 (13:12 +0200)]
anv/cmd_buffer: make descriptors dirty when emitting base state address

Every time we emit a new state base address we will need to re-emit our
binding tables, since they might have been emitted with a different base
state address.
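A minimal dirty-bit sketch of this rule, assuming a hypothetical command-state struct and flag names rather than anv's real ones.

```c
#include <stdint.h>

/* Hypothetical dirty bits for a command buffer's state. */
enum {
   DIRTY_DESCRIPTORS = 1u << 0,
   DIRTY_PUSH_CONSTANTS = 1u << 1,
};

struct cmd_state {
   uint64_t surface_state_base; /* binding table offsets are relative to this */
   uint32_t dirty;
};

/* A new state base address invalidates binding tables emitted against the
 * old base, so mark descriptors dirty to force them to be re-emitted. */
static void
emit_state_base_address(struct cmd_state *s, uint64_t new_base)
{
   s->surface_state_base = new_base;
   s->dirty |= DIRTY_DESCRIPTORS;
}
```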

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>

Iago Toral Quiroga [Thu, 28 Jun 2018 11:16:53 +0000 (13:16 +0200)]
anv/cmd_buffer: clean dirty push constants flag after emitting push constants

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>

Iago Toral Quiroga [Thu, 28 Jun 2018 06:10:16 +0000 (08:10 +0200)]
anv/cmd_buffer: never shrink the push constant buffer size

If we have to re-emit push constant data, we need to re-emit all
of it.
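One way to express this, assuming a hypothetical per-command-buffer record of how many bytes were last emitted: the emitted size only ever grows. This is a sketch, not anv's implementation.

```c
/* Hypothetical record of the push constant range last emitted. */
struct push_state {
   unsigned emitted_size; /* bytes covered by the last emission */
};

/* Size to emit for an update of `requested` bytes: never smaller than what
 * was emitted before, so a re-emit still covers all the data. */
static unsigned
push_constant_emit_size(struct push_state *p, unsigned requested)
{
   if (requested > p->emitted_size)
      p->emitted_size = requested;
   return p->emitted_size;
}
```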

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CC: <mesa-stable@lists.freedesktop.org>

Denis Pauk [Tue, 26 Jun 2018 20:30:52 +0000 (23:30 +0300)]
gallium/llvmpipe: Enable support bptc format.

v2: none
v3: none

Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Rhys Perry <pendingchaos02@gmail.com>
CC: Matt Turner <mattst88@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

Denis Pauk [Tue, 26 Jun 2018 20:30:51 +0000 (23:30 +0300)]
gallium/softpipe: Enable support bptc format.

v2: none
v3: none

Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

Denis Pauk [Tue, 26 Jun 2018 20:30:50 +0000 (23:30 +0300)]
gallium/auxiliary: Add helper support for bptc format compress/decompress

Reuse code shared with mesa/main/texcompress_bptc.

v2: Use block decompress function
v3: Include static bptc code from texcompress_bptc_tmp.h
Suggested-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

Denis Pauk [Tue, 26 Jun 2018 20:30:49 +0000 (23:30 +0300)]
mesa: add header for share bptc decompress functions

Move shared bptc functions to texcompress_bptc_tmp.h:
* fetch_rgba_unorm_from_block
* fetch_rgb_float_from_block
* compress_rgba_unorm
* compress_rgb_float

Create decompress functions:
* decompress_rgba_unorm
* decompress_rgb_float

Functions will be reused in gallium/auxiliary code.

v2: Add block decompress function
v3: Move all shared code to header
Suggested-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
CC: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

Marek Olšák [Sat, 30 Jun 2018 04:57:08 +0000 (00:57 -0400)]
glsl/cache: save and restore ExternalSamplersUsed

Shaders that need special code for external samplers were broken if
they were loaded from the cache.

Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Timothy Arceri [Mon, 4 Jun 2018 06:26:46 +0000 (16:26 +1000)]
nir: fix selection of loop terminator when two or more have the same limit

We need to add loop terminators to the list in the order we come
across them; otherwise, if two or more have the same exit condition,
we will select the last one rather than the first one, even though
it's unreachable.

This fix is for simple unrolls where we only have a single exit
point. When unrolling these types of loops, the unreachable
terminators and their unreachable branches are removed prior to
unrolling. Because of the logic change, we also switch some
list accesses in the complex unrolling logic to avoid breakage.
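The selection rule can be sketched with a hypothetical array of terminators stored in the order they are found. With a strict less-than comparison, the first of several terminators sharing the smallest limit wins, which is the behaviour the fix aims for; the struct and function names are not NIR's.

```c
#include <stddef.h>

/* Hypothetical model: each terminator records the trip count after which
 * it exits the loop; the array is in the order terminators were found. */
struct terminator {
   int id;
   unsigned trip_limit;
};

/* Pick the terminator with the smallest limit. The strict '<' keeps the
 * first one encountered on a tie; later ones with the same limit are
 * unreachable because the earlier one exits the loop first. */
static const struct terminator *
select_limiting_terminator(const struct terminator *terms, size_t count)
{
   const struct terminator *best = NULL;
   for (size_t i = 0; i < count; i++)
      if (best == NULL || terms[i].trip_limit < best->trip_limit)
         best = &terms[i];
   return best;
}
```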

Fixes: 6772a17acc8e ("nir: Add a loop analysis pass")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Timothy Arceri [Mon, 25 Jun 2018 10:31:02 +0000 (20:31 +1000)]
radeonsi: enable OpenGL 4.4 compat profile

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Wed, 20 Jun 2018 03:05:05 +0000 (13:05 +1000)]
mesa: enable ARB_vertex_attrib_64bit in compat profile

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Thu, 28 Jun 2018 05:31:09 +0000 (15:31 +1000)]
mesa: add outstanding ARB_vertex_attrib_64bit dlist support

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Dave Airlie [Thu, 28 Jun 2018 02:40:20 +0000 (12:40 +1000)]
vbo_save: add support for doubles to display list code

Required for ARB_vertex_attrib_64bit compat profile support.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Mon, 25 Jun 2018 00:32:58 +0000 (10:32 +1000)]
mesa: add compat profile support for ARB_multi_draw_indirect

v2: add missing ARB_base_instance support

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Mon, 25 Jun 2018 00:31:34 +0000 (10:31 +1000)]
mesa: make valid_draw_indirect_multi() accessible externally

We will use this to add compat support to ARB_multi_draw_indirect
in the following patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Sat, 23 Jun 2018 07:09:13 +0000 (17:09 +1000)]
mesa: add ARB_draw_indirect support to compat profile

v2: add missing ARB_base_instance support

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Sat, 23 Jun 2018 02:29:50 +0000 (12:29 +1000)]
mesa: generate GL_INVALID_OPERATION using draw indirect in dlist

The spec doesn't explicitly say to generate an error but since
DrawArraysInstanced* and DrawElementsInstanced* do, it makes
sense to do it for these functions also.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Thu, 28 Jun 2018 00:25:17 +0000 (10:25 +1000)]
mesa: add missing display list support for ARB_compute_shader

The extension is enabled for compat profile but there is currently
no display list support.

I filed a spec bug and it has been agreed that
glDispatchComputeIndirect should generate an INVALID_OPERATION
error when called during display list compilation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Thu, 21 Jun 2018 00:35:15 +0000 (10:35 +1000)]
mesa: expose some ARB_viewport_array dependent extensions in compat

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Wed, 20 Jun 2018 03:03:40 +0000 (13:03 +1000)]
mesa: enable ARB_viewport_array in compat profile

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Thu, 21 Jun 2018 00:14:36 +0000 (10:14 +1000)]
mesa: add ARB_viewport_array display list support

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Wed, 20 Jun 2018 00:55:34 +0000 (10:55 +1000)]
mesa: enable ARB_shader_subroutine in compat profile

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Wed, 20 Jun 2018 01:08:35 +0000 (11:08 +1000)]
mesa: add glUniformSubroutinesuiv() display list support

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Wed, 20 Jun 2018 00:16:20 +0000 (10:16 +1000)]
mesa: stop hiding remaining query parameters from OpenGL compat

I managed to miss these two in my last pass at this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Tue, 19 Jun 2018 09:35:17 +0000 (19:35 +1000)]
mesa: enable ARB_gpu_shader_fp64 in compat profile

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Tue, 19 Jun 2018 09:33:26 +0000 (19:33 +1000)]
mesa: add ProgramUniform*d display list support

This is required for fp64 to be enabled in compat profile.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Timothy Arceri [Tue, 19 Jun 2018 09:05:25 +0000 (19:05 +1000)]
mesa: add Uniform*d support to display lists

This is required so we can enable fp64 support in compat profile.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Karol Herbst [Tue, 20 Feb 2018 16:56:47 +0000 (17:56 +0100)]
st/glsl_to_nir: run lower_output_reads on !PIPE_CAP_TGSI_CAN_READ_OUTPUTS

This is required for drivers that don't allow reading from outputs.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>

Eric Anholt [Thu, 28 Jun 2018 19:33:43 +0000 (12:33 -0700)]
v3d: Move GL shader state dumping out of per-version compilation.

It doesn't depend on V3D_VER, since it's just calling v3d_print_group.

Eric Anholt [Thu, 28 Jun 2018 20:08:59 +0000 (13:08 -0700)]
v3d: Add missing Stream field to transform feedback specs on V3D 4.1.

Noticed when trying to CLIF parse a transform feedback job that hangs on
HW.

Eric Anholt [Wed, 27 Jun 2018 23:40:36 +0000 (16:40 -0700)]
v3d: Add missing "tri trip or fan" flag in Primitive List Format.

Eric Anholt [Wed, 27 Jun 2018 23:31:19 +0000 (16:31 -0700)]
v3d: Fix the shader code address field widths on V3D 4.1+

We were overlapping it with the threadable/nan flags, resulting in
incorrect relocations (threadable/nan included in the offset) and wrong
ordering in the CLIF files.
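The kind of overlap described here can be illustrated with a packed word whose low bits carry flags and whose upper bits carry an aligned address. The bit positions and names below are hypothetical, chosen for illustration, and are not the real V3D field layout.

```c
#include <stdint.h>

/* Hypothetical layout: flags in bits 0-1, an 8-byte-aligned address in
 * bits 3-31. Not the actual V3D packet definition. */
#define THREADABLE_BIT (1u << 0)
#define NAN_BIT        (1u << 1)
#define ADDR_SHIFT     3u
#define ADDR_MASK      (~((1u << ADDR_SHIFT) - 1))

static uint32_t
pack_shader_word(uint32_t code_addr, int threadable, int nan)
{
   return (code_addr & ADDR_MASK) |
          (threadable ? THREADABLE_BIT : 0) | (nan ? NAN_BIT : 0);
}

/* With the correct field width the flags are masked off. Treating the
 * whole word as the address would fold the flags into the offset, the
 * kind of bogus relocation the text describes. */
static uint32_t
unpack_shader_addr(uint32_t word)
{
   return word & ADDR_MASK;
}
```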

Eric Anholt [Wed, 27 Jun 2018 23:28:25 +0000 (16:28 -0700)]
v3d: Add missing "no prim pack" field to the V3D4.1+ GL shader state.

It looks like we don't need this flag for anything (not that I'm clear on
what it does), but it makes our struct dumping line up with CLIF parsing.

Eric Anholt [Wed, 27 Jun 2018 23:00:16 +0000 (16:00 -0700)]
v3d: Express dithering mode in the same way that the CLIF parser does.

Eric Anholt [Wed, 27 Jun 2018 22:55:32 +0000 (15:55 -0700)]
v3d: Add missing "number of bin tile lists" field.

Noticed when trying to feed our dumps through the CLIF parser.  Since this
is a "minus one" field, we were already filling in the value we wanted (0).
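A "minus one" field stores value - 1, so an n-bit field encodes 1 through 2^n rather than 0 through 2^n - 1, and a zero in the packet means a count of one. A trivial sketch, with hypothetical helper names:

```c
#include <stdint.h>

/* Encode a count (at least 1) into a "minus one" field, and decode it. */
static uint32_t encode_minus_one(uint32_t value) { return value - 1; }
static uint32_t decode_minus_one(uint32_t field) { return field + 1; }
```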

Eric Anholt [Wed, 27 Jun 2018 22:25:03 +0000 (15:25 -0700)]
v3d: Rewrite the color write masks to match CLIF format.

The render_target_* fields gave us pretty(ish) printing, but meant we were
incompatible with CLIF, and had much more verbose code generating them.

Eric Anholt [Wed, 27 Jun 2018 18:21:34 +0000 (11:21 -0700)]
v3d: Merge the V3D 4.1 and 4.2 XML into V3D 3.3'x XML.

The XML ends up noisier if you're only looking at one version, but from
the diffstat there are obvious wins in terms of deduplication.  This will
get even more significant if we ever support 3.2 or 4.0.

Eric Anholt [Wed, 27 Jun 2018 21:10:52 +0000 (14:10 -0700)]
v3d: Switch v3d_decoder.c to the XML's top min_ver/max_ver fields.

The XML zipper wants one XML per version for filling out its tables, but
we want to do more than one GPU version per XML now.  Assume that the
"gen" field will be the same as min_ver and look up our XML text assuming
that they're listed in increasing min_ver.
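A sketch of such a lookup, assuming a hypothetical table of XML blobs sorted by increasing min_ver and versions encoded as major * 10 + minor; this is not v3d_decoder.c's actual code.

```c
#include <stddef.h>

/* Hypothetical table entry: versions as major * 10 + minor (33 = 3.3). */
struct xml_entry {
   int min_ver;
   const char *xml;
};

/* Return the XML of the newest entry whose min_ver does not exceed ver,
 * relying on entries being sorted by increasing min_ver. */
static const char *
find_xml_for_ver(const struct xml_entry *entries, size_t count, int ver)
{
   const char *found = NULL;
   for (size_t i = 0; i < count && entries[i].min_ver <= ver; i++)
      found = entries[i].xml;
   return found;
}
```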

Eric Anholt [Wed, 27 Jun 2018 18:10:07 +0000 (11:10 -0700)]
v3d: Create XML fields for min_ver and max_ver of a packet/struct/enum.

This will be used to merge together the V3D 3.3-4.1 XML with the variants
disabled based on the version.

Eric Anholt [Wed, 27 Jun 2018 17:46:04 +0000 (10:46 -0700)]
v3d: Pass the version being generated to the pack generator script.

It turns out that most V3D versions change very few packets, so keeping
separate copies of the XML per version makes changing the XML a pain as
you have to replicate your changes to each one.  This is the start of
changing it so that one XML can generate headers for multiple versions.

Jose Maria Casanova Crespo [Thu, 28 Jun 2018 13:36:12 +0000 (15:36 +0200)]
anv: finish the binding_table_pool on destroyDevice when use_softpin

Running VK-CTS in batch execution mode was raising the
VK_ERROR_INITIALIZATION_FAILED error in multiple tests. But when the
same failing tests were run isolated they always passed.

createDevice and destroyDevice were called before and after every
test. Because the binding_table_pool was never closed, we reached the
maximum number of open file descriptors (ulimit -n) and when that
happened every call to createDevice implied a
VK_ERROR_INITIALIZATION_FAILED error.
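The leak pattern is easy to reproduce with any resource that owns a descriptor. The fd_pool type and its init/finish functions below are hypothetical stand-ins for the binding table pool; each init opens a descriptor, so skipping finish on device destruction leaks one per create/destroy cycle until open() fails.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical pool that owns one file descriptor. */
struct fd_pool {
   int fd;
};

static int
pool_init(struct fd_pool *p)
{
   p->fd = open("/dev/null", O_RDONLY);
   return p->fd < 0 ? -1 : 0;
}

/* Must run on every destroy; otherwise each cycle leaks a descriptor and
 * init eventually fails once the per-process limit is reached. */
static void
pool_finish(struct fd_pool *p)
{
   if (p->fd >= 0)
      close(p->fd);
   p->fd = -1;
}
```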

Fixes: c7db0ed4e94dce563d722e1b098684fbd7315d51
      ("anv: Use a separate pool for binding tables when soft pinning")

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Marek Olšák [Mon, 25 Jun 2018 16:34:39 +0000 (12:34 -0400)]
gallium/util: remove dummy function util_format_is_supported

Reviewed-by: Eric Engestrom <eric@engestrom.ch>

Dylan Baker [Fri, 29 Jun 2018 18:04:22 +0000 (11:04 -0700)]
docs: update calendar, add news and link release notes to 18.1.3

Dylan Baker [Fri, 29 Jun 2018 18:00:48 +0000 (11:00 -0700)]
docs: Add SHA256 sums to notes for 18.1.3

Dylan Baker [Fri, 29 Jun 2018 17:35:37 +0000 (10:35 -0700)]
docs: Add release notes for 18.1.3

Rhys Perry [Fri, 29 Jun 2018 13:51:11 +0000 (14:51 +0100)]
nv50/ir: improve maintainability of Target*::initOpInfo()

This is mainly useful for when one needs to add new opcodes in a painless
and reliable way.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>

Rhys Perry [Tue, 5 Jun 2018 20:09:32 +0000 (21:09 +0100)]
nv50/ir: fix image stores with indirect handles

Having this if statement here prevented the next if statement from being
reached in the case of image stores, which is needed for instructions with
indirect bindless handles like "STORE TEMP[ADDR[2].x+1](1) ...".

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>

Ross Burton [Thu, 28 Jun 2018 22:01:59 +0000 (23:01 +0100)]
egl: fix build race in automake

There is a parallel make build issue in src/egl/drivers/dri2/
for wayland builds. Can be reproduced with:

$ rm src/egl/drivers/dri2/*.h src/egl/drivers/dri2/platform_wayland.lo
$ make -C src/egl/ drivers/dri2/platform_wayland.lo
../../../mesa-18.1.2/src/egl/drivers/dri2/platform_wayland.c:50:10: fatal error: linux-dmabuf-unstable-v1-client-protocol.h: No such file or directory

This patch adds the missing dependency.

Fixes: 02cc359372773800de817 "egl/wayland: Use linux-dmabuf interface for buffers"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
[Eric: fixed up the commit title]
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>

Marek Olšák [Sat, 23 Jun 2018 05:44:14 +0000 (01:44 -0400)]
radeonsi: implement vertex color clamping for tess and GS

Marek Olšák [Sat, 23 Jun 2018 05:43:12 +0000 (01:43 -0400)]
radeonsi: move VS_STATE_SGPR before draw SGPRs

for vertex color clamping.

Marek Olšák [Sat, 23 Jun 2018 05:39:02 +0000 (01:39 -0400)]
radeonsi: don't use malloc in si_generate_gs_copy_shader

Marek Olšák [Mon, 18 Jun 2018 20:03:39 +0000 (16:03 -0400)]
radeonsi: disable DCC statistics gathering on everything but Stoney

I think we don't need it on other chips.

Marek Olšák [Mon, 18 Jun 2018 20:02:14 +0000 (16:02 -0400)]
radeonsi: don't enable DCC statistics gathering for small surfaces

Marek Olšák [Mon, 18 Jun 2018 19:40:07 +0000 (15:40 -0400)]
radeonsi: simplify logic around vi_separate_dcc_try_enable

Marek Olšák [Mon, 18 Jun 2018 19:53:47 +0000 (15:53 -0400)]
radeonsi: fix memory exhaustion issue with DCC statistics gathering with DRI2

Cc: 18.1 <mesa-stable@lists.freedesktop.org>

Marek Olšák [Thu, 14 Jun 2018 02:31:21 +0000 (22:31 -0400)]
radeonsi: remove references to Evergreen

Marek Olšák [Thu, 14 Jun 2018 06:43:19 +0000 (02:43 -0400)]
radeonsi: enable shader caching for compute shaders

Compute shaders were not using the shader cache.

Marek Olšák [Thu, 14 Jun 2018 06:25:00 +0000 (02:25 -0400)]
radeonsi: store compute local_size into tgsi_shader_info

This is kinda a hack, but it's enough for the shader cache.

Marek Olšák [Thu, 14 Jun 2018 06:09:05 +0000 (02:09 -0400)]
radeonsi: unify duplicated code for initial shader compilation

Marek Olšák [Thu, 14 Jun 2018 05:27:10 +0000 (01:27 -0400)]
ac: set +auto-waitcnt-before-barrier when needed

This removes useless s_waitcnt before barriers.
Only radeonsi uses this function.

Marek Olšák [Thu, 14 Jun 2018 05:10:54 +0000 (01:10 -0400)]
radeonsi/gfx9: insert the barrier between merged shaders inside the if block

Joe M. Kniss [Thu, 21 Jun 2018 00:55:10 +0000 (17:55 -0700)]
gallium: plumb invariant output attrib thru TGSI

Add support for glsl 'invariant' modifier for output data declarations.
Gallium drivers that use TGSI serialization currently lose invariant
modifiers in glsl shaders.

v2: use boolean for invariant instead of unsigned.

Tested: chromiumos on qemu with virglrenderer.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Francisco Jerez [Wed, 27 Apr 2016 02:45:41 +0000 (19:45 -0700)]
intel/fs: Build 32-wide FS shaders.

Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Fri, 18 May 2018 23:39:21 +0000 (16:39 -0700)]
intel/anv,blorp,i965: Implement the SKL 16x MSAA SIMD32 workaround

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Fri, 18 May 2018 06:26:02 +0000 (23:26 -0700)]
intel/fs: Add fields to wm_prog_data for SIMD32 dispatch

Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Thu, 12 Jan 2017 03:55:33 +0000 (19:55 -0800)]
intel/fs: Fix nir_intrinsic_load_helper_invocation for SIMD32.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Mon, 9 Jan 2017 22:14:02 +0000 (14:14 -0800)]
intel/fs: Fix fs_builder::sample_mask_reg() for 32-wide FS dispatch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:33:11 +0000 (15:33 -0800)]
intel/fs: Fix Gen6+ interpolation setup for SIMD32

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 24 May 2018 01:09:48 +0000 (18:09 -0700)]
intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS

We can just emit the MOV in the two places where we use this.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 24 May 2018 00:54:54 +0000 (17:54 -0700)]
intel/fs: Emit MOV_DISPATCH_TO_FLAGS once for the centroid workaround

There's no reason for us to emit it a pile of times and then have a
whole pass to clean it up.  Just emit it once like we really want.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:33:45 +0000 (15:33 -0800)]
intel/fs: Generalize the unlit centroid workaround

This generalizes the unlit centroid workaround so it's less code and now
supports SIMD32.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:32:05 +0000 (15:32 -0800)]
intel/fs: Fix sample id setup for SIMD32.

v2 (Jason Ekstrand):
 - Disallow gl_SampleId in SIMD32 on gen7

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Sat, 14 Jan 2017 01:04:23 +0000 (17:04 -0800)]
intel/fs: Fix Gen7 compressed source region alignment restriction for SIMD32

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:40:38 +0000 (15:40 -0800)]
intel/fs: Implement 32-wide FS payload setup on Gen6+

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:36:51 +0000 (15:36 -0800)]
intel/fs: Extend thread payload layout to SIMD32

And handle 32-wide payload register reads in fetch_payload_reg().

v2 (Jason Ekstrand):
 - Fix some whitespace and brace placement

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:23:48 +0000 (15:23 -0800)]
intel/fs: Wrap FS payload register look-up in a helper function.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 23:18:07 +0000 (15:18 -0800)]
intel/fs: Use fs_regs instead of brw_regs in the unlit centroid workaround

While we're here, we change to using horiz_offset() instead of abusing
half().

v2 (Jason Ekstrand):
 - Use horiz_offset() instead of half()

Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Fri, 13 Jan 2017 22:53:00 +0000 (14:53 -0800)]
intel/fs: Simplify fs_visitor::emit_samplepos_setup

The original code manually handled splitting the MOVs to 8-wide to
handle various regioning restrictions.  Now that we have a SIMD width
splitting pass that handles these things, we can just emit everything at
the full width and let the SIMD splitting pass handle it.  We also now
have a useful "subscript" helper which is designed exactly for the case
where you want to take a W type and read it as a vector of Bs so we may
as well use that too.
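As a rough illustration of what a "subscript" view does, the C sketch below models a register region as raw bytes plus an element size and a byte stride. The `region` type, `subscript_region()` and `read_elem()` are hypothetical names for this example, not Mesa's fs_reg API. Reinterpreting a W (16-bit) region as B (8-bit) elements keeps the original stride and offsets the base by the selected byte, so each wide element contributes exactly one narrow element.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a register region, for illustration only:
 * raw bytes, the size of each element, and the byte stride between
 * consecutive elements. This is not Mesa's fs_reg. */
typedef struct {
    const uint8_t *base;
    unsigned type_size; /* bytes per element, e.g. 2 for W, 1 for B */
    unsigned stride;    /* bytes between consecutive elements */
} region;

/* Reinterpret a region of wide elements as narrow elements, keeping the
 * original stride and picking component i of every wide element. */
static region subscript_region(region r, unsigned narrow_size, unsigned i)
{
    /* The selected component must lie inside the wide element. */
    assert(narrow_size * (i + 1) <= r.type_size);
    region out = { r.base + i * narrow_size, narrow_size, r.stride };
    return out;
}

/* Read element n of a region, assembling its bytes in little-endian
 * order (independent of the host's endianness). */
static unsigned read_elem(region r, unsigned n)
{
    const uint8_t *p = r.base + n * r.stride;
    unsigned v = 0;
    for (unsigned b = 0; b < r.type_size; b++)
        v |= (unsigned)p[b] << (8 * b);
    return v;
}
```

For example, the bytes {0x02, 0x01, 0x04, 0x03} viewed as a W region with a 2-byte stride read back as 0x0102 and 0x0304; subscript 0 of that region yields 0x02, 0x04 and subscript 1 yields 0x01, 0x03.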

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:02:05 +0000 (17:02 -0700)]
i965: Add plumbing for shader time in 32-wide FS dispatch mode.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Tue, 26 Apr 2016 00:08:42 +0000 (17:08 -0700)]
intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Sat, 26 May 2018 05:23:30 +0000 (22:23 -0700)]
intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates

On g4x through Sandy Bridge, src1 (the coordinates) of the PLN
instruction is required to be an even register number.  When it's odd
(which can happen with SIMD32), we have to emit a LINE+MAC combination
instead.  Unfortunately, we can't just fall through to the gen4 case
because the input registers are still set up for PLN, which lays out the
four src1 registers differently in SIMD16 than LINE does.
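A minimal sketch of the lowering decision and of why a LINE+MAC pair can stand in for PLN, assuming LINTERP evaluates a plane equation a*x + b*y + c per channel. The enum and function names are hypothetical, register numbers are plain integers, `has_pln == false` stands in for gen4 only, and the gen11+ MAD path is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not Mesa's opcode enum. */
typedef enum { LOWER_TO_PLN, LOWER_TO_LINE_MAC } linterp_lowering;

/* The rule described above for g4x through Sandy Bridge: PLN is usable
 * only when src1 (the barycentric coordinates) starts at an even
 * register number; an odd register (possible with SIMD32) forces the
 * LINE+MAC fallback. */
static linterp_lowering choose_linterp_lowering(bool has_pln,
                                                unsigned src1_reg_nr)
{
    if (has_pln && src1_reg_nr % 2 == 0)
        return LOWER_TO_PLN;
    return LOWER_TO_LINE_MAC;
}

/* Plane-equation interpolation a*x + b*y + c in a single step, the way
 * a PLN-style instruction evaluates it for each channel. */
static float interp_one_step(float a, float b, float c, float x, float y)
{
    return a * x + b * y + c;
}

/* The same plane equation split into two dependent steps through an
 * accumulator, mirroring a LINE+MAC pair. Mathematically equal to
 * interp_one_step(); float rounding may differ in general. */
static float interp_two_step(float a, float b, float c, float x, float y)
{
    float acc = a * x + c; /* LINE-like step: writes the accumulator */
    return acc + b * y;    /* MAC-like step: reads it and adds b*y */
}
```

The split form is also why the accumulator matters: the intermediate value lives in a register that other instructions can overwrite between the two steps.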

v2 (Jason Ekstrand):
 - Take advantage of both accumulators and emit LINE LINE MAC MAC
   (Based on a patch from Francisco Jerez)
 - Unify the gen4 and gen4x-6 cases using a loop

v3 (Jason Ekstrand):
 - Don't unify gen4 with gen4x-6 as this turns out to be more fragile
   than first thought without reworking the gen4 barycentric coordinate
   layout.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 28 May 2018 16:42:49 +0000 (09:42 -0700)]
intel/fs: Mark LINTERP opcode as writing accumulator on platforms without PLN

When we don't have PLN (gen4 and gen11+), we implement LINTERP as either
LINE+MAC or a pair of MADs.  In both cases, the accumulator is written
by the first of the two instructions and read by the second.  Even
though the accumulator value isn't actually ever used from a logical
instruction perspective, it is trashed, so we need to make the scheduler
aware.  Otherwise, the scheduler could end up re-ordering instructions
and putting a LINTERP between an instruction which writes the
accumulator and another which tries to use that result.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>