git.libre-soc.org Git - mesa.git/log

i965/fs: split out calculation of payload live ranges

We'll need this for the scheduler too, since it wants to know when the
live ranges of payload registers end in order to model them in our
register pressure calculations.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965: dump scheduling cycle estimates

The heuristic we're using is rather lame, since it assumes everything is
non-uniform and loops execute 10 times, but it should be enough for
measuring improvements in the scheduler that don't result in a change in
the number of instructions.

v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965: always run the post-RA scheduler

Before, we would only do scheduling after register allocation if we
spilled, despite the fact that the pre-RA scheduler was only supposed to
be for register pressure and set the latencies of every instruction to
1. This meant that unless we spilled, which we rarely do, then we never
considered instruction latencies at all, and we usually never bothered
to try and hide texture fetch latency. Although a later commit removes
the setting the latency to 1 part, we still want to always run the
post-RA scheduler since it's able to take the false dependencies that
the register allocator creates into account, and it can be more
aggressive than the pre-RA scheduler since it doesn't have to worry
about register pressure at all.

Test                   master      post-ra-sched     diff       %diff
bench_OglPSBump2       396.730     402.386           5.656      +1.400%
bench_OglPSBump8       244.370     247.591           3.221      +1.300%
bench_OglPSPhong       241.117     242.002           0.885      +0.300%
bench_OglPSPom         59.555      59.725            0.170      +0.200%
bench_OglShMapPcf      86.149      102.346           16.197     +18.800%
bench_OglVSTangent     388.849     395.489           6.640      +1.700%
bench_trex             65.471      65.862            0.390      +0.500%
bench_trexoff          69.562      70.150            0.588      +0.800%
bench_heaven           25.179      25.254            0.074      +0.200%

Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com>

i965/sched: write-after-read dependencies are free

Although write-after-write dependencies have the same latency as
read-after-write dependencies due to how the register scoreboard works,
write-after-read dependencies aren't checked by the EU at all, so
they're purely a constraint on how the scheduler can order the
instructions.

v2: fix accumulator dependencies too.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965: fix cycle estimates when there's a pipeline stall

The issue time for an instruction is how many cycles it takes to
actually put it into the pipeline. If there's a pipeline stall that
causes the instruction to be delayed, we should first take that into
account to figure out when the instruction would start executing and
*then* add the issue time. The old code had it backwards, and so we
would underestimate the total time whenever we thought there would be a
pipeline stall by up to the issue time of the instruction.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

vc4: Allow user index buffers, to avoid slow readback for shadow IBs.

Improves low-settings openarena performance by 31.9975% +/- 0.659931%
(n=7).

nv50: mark contexts shareable, compile at creation time

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nv50: allow per-sample interpolation to be forced via rast

Uses the same technique as for nvc0 of fixups before upload, and
evicting in case of state change. Removes one source of variants kept by
st/mesa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

i965: Add INTEL_DEBUG=nocompact to disable instruction compaction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add INTEL_DEBUG=hex to print the hex with the disassembly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Print the type and writemask on null destinations.

These are often useful in debugging, and the writemask (actually
"Channel Enables") determines more than just what goes into the
destination.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Test fixed_hw_reg.file against BRW_IMMEDIATE_VALUE, not IMM.

No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/vec4: Test against BRW_IMMEDIATE_VALUE, not IMM.

No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Use group(4, 0) to emit an exec-size 4 MOV.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/cfg: Handle no-idom case in cfg_t::dump_domtree().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/disasm: Remove unused _addr_mode argument from src_ia1().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Set correct field for indirect align16 addrimm.

This has been wrong since the initial import of the i965 driver.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/vec4: Drop brw_set_default_* before popping insn state.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/vec4: Remove unnecessary #includes from the generator.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

r600: enable SB for geom shaders on pre-evergreen

I've checked with piglit and one tests fails, but it fails
on evergreen as well, so will get fixed later.

Otherwise SB seems to be working fine for geom shaders on my
rv635.

Signed-off-by: Dave Airlie <airlied@redhat.com>

i965/vec4: Eliminate the vec4_generator class altogether.

We really weren't taking advantage of vec4_generator being a class.
By adding a "p" parameter to the helper methods, and "prog_data" to
ones which need binding table information, we can convert everything
to static functions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/vec4: Move vec4_generator class definition into the .cpp file.

The public API for the generator is brw_vec4_generate_code(); nobody
actually needs to use the class. This means we can extend it without
triggering the recompiles associated with altering brw_vec4.h.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/vec4: Wrap vec4_generator in a C function.

vec4_generator is a class for convenience, but only exports a single
method as its public API. It makes much more sense to just export a
single function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.

This patch makes the visitor convert registers to the HW_REG file at the
very end, after register allocation, post-RA scheduling, and dependency
control flagging. After that, everything is in fixed brw_regs.

This simplifies the code generator, as it can just use the hardware
registers rather than having to interpret our abstract files. In
particular, interpreting the UNIFORM file meant reading prog_data
to figure out where push constants are supposed to start.

Having the part of the code that performs register allocation also
translate everything to hardware registers seems sensible.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

r600g: Fix special negative immediate constants when using ABS modifier.

Some constants (like 1.0 and 0.5) could be inlined as immediate inputs
without using their literal value. The r600_bytecode_special_constants()
function emulates the negative of these constants by using NEG modifier.

However some shaders define -1.0 constant and want to use it as 1.0.
They do so by using ABS modifier. But r600_bytecode_special_constants()
set NEG in addition to ABS. Since NEG modifier have priority over ABS one,
we get -|1.0| as result, instead of |1.0|.

The patch simply prevents the additional switching of NEG when ABS is set.

[According to Ivan Kalvachev, this bug was fond via
https://github.com/iXit/Mesa-3D/issues/126 and
https://github.com/iXit/Mesa-3D/issues/127]

Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: <mesa-stable@lists.freedesktop.org>

st/mesa: fix mipmap generation for immutable textures with incomplete pyramids

Without the clamping by NumLevels, the state tracker would reallocate the
texture storage (incorrect) and even fail to copy the base level image
after reallocation, leading to the graphical glitch of
https://bugs.freedesktop.org/show_bug.cgi?id=91993 .

A piglit test has been submitted for review as well (subtest of
arb_texture_storage-texture-storage).

v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák)

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: Enable ASTC in GLES' [NUM_]COMPRESSED_TEXTURE_FORMATS queries

In OpenGL ES, the COMPRESSED_TEXTURE_FORMATS query returns the set of
supported specific compressed formats. Since ASTC formats fit within
that category, include them in the set and update the
NUM_COMPRESSED_TEXTURE_FORMATS query as well.

This enables GLES2-based ASTC dEQP tests to run. See the Bugzilla for
more info.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa/texcompress: Restrict FXT1 format to desktop GL subset

In agreement with the extension spec and commit
dd0eb004874645135b9aaac3ebbd0aaf274079ea, filter FXT1 formats to the
desktop GL profiles. Now we no longer advertise such formats as supported
in an ES context and then throw an INVALID_ENUM error when the client
tries to use such formats with CompressedTexImage2D.

Fixes the following 26 dEQP tests:
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_size
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_cube_pos
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_cube
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_tex2d

v2. Use _mesa_is_desktop_gl() (Ilia, Ian)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>

nvc0: expose a group of performance metrics on Fermi

This allows to monitor those performance metrics through
GL_AMD_performance_monitor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

st/mesa: create temporary textures with the same nr_samples as source

Not sure if this is actually reachable in practice (to have a complex
copy with MS textures).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

glsl: add fragdata arrays to program resource list

This makes sure that user is still able to query properties about
variables that have gotten removed by opt_dead_builtin_varyings pass.

Fixes following OpenGL ES 3.1 test:
ES31-CTS.program_interface_query.output-layout

No Piglit regressions.

v2: cleanup, drop extra parenthesis (Topi)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>

mesa: add fragdata_arrays list to gl_shader

This is required to store information about fragdata arrays, currently
these variables get lost and cannot be retrieved later in sensible way
for program interface queries. List will be utilized by next patch.

Patch also modifies opt_dead_builtin_varyings pass to build list when
lowering fragdata arrays. This is identical approach as taken with
packed varyings pass.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>

glsl: fix GL_BUFFER_DATA_SIZE value for shader storage blocks with unsize arrays

From ARB_program_interface_query:

"For the property of BUFFER_DATA_SIZE, then the implementation-dependent
minimum total buffer object size, in basic machine units, required to hold
all active variables associated with an active uniform block, shader
storage block, or atomic counter buffer is written to <params>.  If the
final member of an active shader storage block is array with no declared
size, the minimum buffer size is computed assuming the array was declared
as an array with one element."

Fixes the following dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.block_array

v2:
- Fix comment's indentation and explain that the parser already
  checked that unsized array is in last element of a shader
  storage block (Iago).
- Add assert (Iago).

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

docs: Mark GL_ARB_fragment_layer_viewport as done on i965.

i965: Implement ARB_fragment_layer_viewport.

Normally, we could read gl_Layer from bits 26:16 of R0.0. However, the
specification requires that bogus out-of-range 32-bit values written by
previous stages need to appear in the fragment shader as-written.

Instead, we pass in the full 32-bit value from the VUE header as an
extra flat-shaded varying. We have the SF override the value to 0
when the previous stage didn't actually write a value (it's actually
defined to return 0).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

i965: Make calculate_attr_overrides return the URB read offset.

Traditionally, we've hardcoded "URB Entry Read Offset" to 1 (which
represents 2 vec4 varying slots) to skip over the 8 DWord VUE header.

In order to support ARB_fragment_layer_viewport, we'll need to read
from that header. This patch adds the basic plumbing necessary to
calculate a value dynamically and hook it up in the SBE packets.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

glsl: Mark gl_ViewportIndex and gl_Layer varyings as flat.

Integer varyings need to be flat qualified - all others were already.
I think we just missed this. Presumably some hardware passes this via
sideband and ignores attribute interpolation, so no one has noticed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

i965/fs: Properly check for PAD in fragment shaders with > 16 varyings.

Commit 268008f98c3810b9f276df985dc93efc0c49f33e changed unused VUE map
slots to be initialized with BRW_VARYING_SLOT_PAD, not COUNT.  I missed
updating this.  It also means that commit message was wrong, as some
code *did* rely slots being initialized to COUNT.

This may fix a bug with SSO programs with > 16 FS input varyings.
I think we probably just emitted extra pointless code, but probably
didn't break anything.  We might also just have no tests for that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

i965: Update stale comment about unused VUE map slots.

I changed this from COUNT to PAD in commit 268008f98c3810b9f276df985dc93ef.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

nv50/ir: adapt to new method for passing in cull/clip distance masks

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0: share shaders between contexts and build immediately

Avoid deferring building shaders until draw time, should hopefully
reduce any stuttering, as well as enable shader-db style analysis.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0: do upload-time fixups for interpolation parameters

Unfortunately flatshading is an all-or-nothing proposition on nvc0,
while GL 3.0 calls for the ability to selectively specify explicit
interpolation parameters on gl_Color/gl_SecondaryColor which would
override the flatshading setting. This allows us to fix up the
interpolation settings after shader generation based on rasterizer
settings.

While we're at it, we can add support for dynamically forcing all
(non-flat) shader inputs to be interpolated per-sample, which allows
st/mesa to not generate variants for these.

Fixes the remaining failing glsl-1.30/execution/interpolation piglits.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nir: Copy "patch" flag from ir_variable to nir_variable.

This was introduced in GLSL IR after NIR development had branched.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

nir: Add intrinsics for tessellation shader system values.

nir_intrinsic_load_patch_vertices_in corresponds to gl_PatchVerticesIn,
a special input in both the TCS and TES stages.

nir_intrinsic_load_tess_coord corresponds to gl_TessCoord, a special
tessellation evaluation shader input.

nir_intrinsic_load_tess_level_outer/inner correspond to the
gl_TessLevelOuter[] and gl_TessLevelInner[] evaluation shader inputs,
which we treat as system values because they're stored specially.
(These intrinsics are only for the TES - the TCS uses output variables.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.

Consider the case of two nearly identical GLSL fragment shaders:

   out vec4 color;
   void main() { color = vec4(1); }

and

   layout(early_fragment_tests) in;
   out vec4 color;
   void main() { color = vec4(1); }

These shaders compile to the exact same assembly, but have distinct
values for brw_wm_prog_data::early_fragment_tests.

Since these are two independent GLSL shaders, they have different
program keys - notably, brw_wm_prog_key::program_string_id differs.

When uploading the second, brw_upload_cache will find an existing copy
of the assembly in the cache BO, which means matching_data will be
non-NULL.  Although we create a second cache item (with the new key
and prog_data), we set item->offset to the existing copy and avoid
re-uploading duplicate assembly.

However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
item->offset differed from the supplied offset.  With reuse, both
programs have the same offset, but prog_data changed.  We have to
flag it, but failed to.

To fix this, we simply need to check if the aux (prog_data) pointer
changed.  If either the assembly or the prog_data differs, flag it.

This fixes a regression since 1bba29ed403e735ba0bf04ed8aa2e571884f,
where Topi fixed brw_upload_cache() to actually reuse identical
assembly.  Prior to that, reuse basically never happened due to bugs.
Unfortunately, this code apparently wasn't prepared to handle reuse!

Fixes GPU hangs in Dolphin on Broadwell.

Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this
and helping track down the real issue.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Pierre Bourdon <delroth@gmail.com>

clover: fix building fix clang-3.8

https://bugs.freedesktop.org/show_bug.cgi?id=92705

v2.1: use Linker::Flags::None instead of 0 and emplace_back()

Signed-off-by: Laurent Carlier <lordheavym@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

nv50: add ARB_copy_image support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0: add ARB_copy_image support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0: fix crash when nv50_miptree_from_handle fails

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

vbo: replace assertion with conditional in vbo_compute_max_verts()

With just the right sequence of per-vertex commands and state changes,
it's possible for this assertion to fail (such as with viewperf11's
lightwave-06-1 test). Instead of asserting, return 0 so that the
caller knows the VBO is full and needs to be flushed.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

mesa: minor formatting fix in get_tex_rgba_compressed()

st/mesa: implement ARB_copy_image

I wonder if the craziness was worth it.

Reviewed-by: Brian Paul <brianp@vmware.com>

gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS

For ARB_copy_image.

Reviewed-by: Brian Paul <brianp@vmware.com>

radeonsi: allow copying between compatible compressed and uncompressed formats

which is where a block in src maps to a pixel in dst and vice versa.
e.g. DXT1 <-> R32G32_UINT
DXT5 <-> R32G32B32A32_UINT

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

mesa: set TargetIndex in VDPAURegister*SurfaceNV (v2)

We initialized Target, but not TargetIndex.
This is required since 7d7dd1871174905dfdd3ca874a09d9.

v2: do it in the right place. Noticed by Brian Paul.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92645

Reviewed-by: Brian Paul <brianp@vmware.com>

i965: remove unneeded src_reg copy in emit_shader_time_write

The variable is already of type src_reg. creating a new instance only to
destroy it seems unnecessary.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: remove cache_aux_free_func array

There is only one function that can be called, which is well known at
compilation time.

The abstraction used here seems unnecessary, so let's use a direct call
to brw_stage_prog_data_free() when appropriate, cut down the size of
struct brw_cache.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

main: fix GL_MAX_NUM_ACTIVE_VARIABLES value for shader storage blocks

The maximum number of active variables for shader storage blocks should
take into account the specific rules for shader storage blocks, i.e. for
an active shader storage block member declared as an array, an entry
will be generated only for the first array element, regardless of its type.

Fixes 3 dEQP-GLES31.functional.* tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.block_array

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

st/vdpau: disable RefPicList for Vdpau HEVC

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

st/va: add VAAPI HEVC decode support

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

radeon/uvd: implement and add flag for VAAPI HEVC decode

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

vl: add RefPicList defines for VAAPI HEVC decode

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>

mesa: Draw indirect is not allowed if the default VAO is bound.

From OpenGL ES 3.1 specification, section 10.5:
"DrawArraysIndirect requires that all data sourced for the
command, including the DrawArraysIndirectCommand
structure, be in buffer objects, and may not be called when
the default vertex array object is bound."

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

winsys/amdgpu: remove the dcc_enable surface flag

dcc_size is sufficient and doesn't need a further comment in my opinion.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

radeonsi: add debug flags that disable DCC and DCC fast clear

For debugging, bug reports, etc.
This is not in the radeonsi directory, but it is about radeonsi.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

radeonsi: properly check if DCC is enabled and allocated

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

radeonsi: simplify DCC handling in si_initialize_color_surface

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

mesa: Draw indirect is not allowed when xfb is active and unpaused

OpenGL ES 3.1 specification, section 10.5:
"An INVALID_OPERATION error is generated if
transform feedback is active and not paused."

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

mesa: Draw Indirect return wrong error code on unalinged

From OpenGL 4.4 specification, section 10.4 and
Open GL Es 3.1 section 10.5:
"An INVALID_VALUE error is generated if indirect is not a multiple
of the size, in basic machine units, of uint."

However, the current code follow the ARB_draw_indirect:
https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
"INVALID_OPERATION is generated by DrawArraysIndirect and
DrawElementsIndirect if commands source data beyond the end
of a buffer object or if <indirect> is not word aligned."

V2: After discussions on the list, it was suggested to
only keep the INVALID_VALUE error.

Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

main: Remove interface block array index for doing the name comparison

From ARB_program_query_interface spec:

"uint GetProgramResourceIndex(uint program, enum programInterface,
                                   const char *name);
[...]
If <name> exactly matches the name string of one of the active resources
for <programInterface>, the index of the matched resource is returned.
Additionally, if <name> would exactly match the name string of an active
resource if "[0]" were appended to <name>, the index of the matched
resource is returned. [...]"

"A string provided to GetProgramResourceLocation or
GetProgramResourceLocationIndex is considered to match an active variable
if:
[...]
   * if the string identifies the base name of an active array, where the
     string would exactly match the name of the variable if the suffix
     "[0]" were appended to the string;
[...]
"

Fixes the following two dEQP-GLES31 tests:

dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element

v2:
- Add AoA support (Timothy)
- Apply it too for GetUniformLocation(), GetUniformName() and others
  because ARB_program_interface_query says that they are equivalent
  to GetProgramResourceLocation() and GetProgramResourceName() (Tapani)

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

vc4: Add support for copy propagation with unpack flags present.

total instructions in shared programs: 89251 -> 87862 (-1.56%)
instructions in affected programs: 52971 -> 51582 (-2.62%)

vc4: Rewrite the pack instructions as a MOV with a dst pack flag

Another step in reducing the special-casing of instructions.

vc4: Move dst pack setup out to a helper function with more asserts.

vc4: Switch the unpack ops to being unpack flags on a mov.

This paves the way for copy propagating our unpacks. We end up with a
small change on shader-db:

total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs: 19041 -> 18902 (-0.73%)

which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.

vc4: Drop some confused code about pack/unpack handling.

At one point I thought packs and unpacks were in the same field of the
instruction. They aren't. These instructions therefore never cause a
pack.

total instructions in shared programs: 89472 -> 89390 (-0.09%)
instructions in affected programs: 15261 -> 15179 (-0.54%)

vc4: Reduce MOV special-casing in QIR-to-QPU.

I'm going to introduce some more types of MOV, which also want the elision
of raw MOVs.

vc4: Fix up the test for whether the unpack can be from r4.

We can do 16a/16b from float as well. No difference on shader-db.

vc4: Don't try to follow MOVs across a pack.

vc4: Only copy propagate raw MOVs.

No problems being fixed, but needed for the new unpack changes.

vc4: If a QIR source has an unpack set, print it.

Not used yet, but will be.

glsl: Convert TES gl_PatchVerticesIn into a constant when using a TCS.

When a TCS is present, the TES input gl_PatchVerticesIn is actually a
constant - it's simply the # of output vertices specified by the TCS
layout qualifiers. So, we can replace the system value with a constant,
which may allow further optimization, and will likely be more efficient.

If the TCS is absent, we can't do this optimization.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

i965: Add missing close-parenthesis in error messages

Trivial.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

i965: Fix is-renderable check in intel_image_target_renderbuffer_storage

Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons.  Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.

There are more checks in brw_render_target_supported, but I don't think
they are necessary here.  A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.

Fixes:

    ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>

glsl: keep track of intra-stage indices for atomics

This is more optimal as it means we no longer have to upload the same set
of ABO surfaces to all stages in the program.

This also fixes a bug where since commit c0cd5b var->data.binding was
being used as a replacement for atomic buffer index, but they don't have
to be the same value they just happened to end up the same when binding is 0.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175

gallivm: disable f16c when not using AVX

f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b915decf8fccec92e466ac1a6219934 which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

st/va: pass picture desc to begin and decode

At least vl_mpeg12_decoder uses the picture
desc in begin_frame and decode_bitstream.

https://bugs.freedesktop.org/show_bug.cgi?id=92634

Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

mesa: add additional checks for uniform location query

Patch adds additional check to make sure we don't return locations for
structures or arrays of structures.

From page 79 of the OpenGL 4.2 spec:
"A valid name cannot be a structure, an array of structures, or any
portion of a single vector or a matrix."

v2: use without-array() to simplify code (Timothy)

No Piglit or CTS regressions observed.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

docs: add news item and link release notes for 11.0.4

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

docs: add sha256 checksums for 11.0.4

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit ec14e6f8fd05999b482e0785d8cd286042c9c254)

docs: add release notes for 11.0.4

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 31bf24703193cc23961923e01548b1acb2760a93)

i965: Make brw_varying_to_offset take a const pointer to the VUE map.

It doesn't modify it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

vc4: Fix names of the 16-bit unpacks

They're only f16-to-f32 on a float operation, otherwise they're
i16-to-i32.

vc4: Don't try to register coalesce into the VPM across non-raw MOVs.

No known bugs, just something I noticed while updating optimization code
for other changes.

vc4: Take advantage of the 8888 pack function in pack_unorm_4x8.

One instruction instead of four, and it turns out you do this a lot for
the Over operator.

total uniforms in shared programs: 32168 -> 32087 (-0.25%)
uniforms in affected programs: 318 -> 237 (-25.47%)
total instructions in shared programs: 89830 -> 89472 (-0.40%)
instructions in affected programs: 6434 -> 6076 (-5.56%)

vc4: Fix the test for skipping raw MOVs.

I don't know what previous test was trying to do, but it dates back to the
first add of vc4_qpu_emit.c. No change to shader-db.

i965: Remove unused devinfo revision

I left the function to obtain the revision because it is, and will continue to
be useful in the future. I'd rather not have to dig it up every time we need it.
Comments left at the implementation to say as much.

This was accidentally left here when I moved the early platform support:
commit 28ed1e08e8ba98ebd4ff0b56326372f0df9c73ad
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Fri Aug 7 13:58:37 2015 -0700

i965/skl: Remove early platform support

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

docs/index.html: fix typo

Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

freedreno: remove unnecessary null checks

According to piglit/xonotic/neverball/stc, blend/rasterize/zsa state
will always be bound (never null). And the null checks were in-
consistent anyways, so remove them.

Signed-off-by: Rob Clark <robclark@freedesktop.org>

radeonsi: Implement DCC fast clear.

Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
still works. Furthermore, with DCC compression we can directly clear
to a limited set of colors such that we do not need a postprocessing step.

v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

gallivm: fix tex offsets with mirror repeat linear

Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>