mesa.git
6 years ago intel/common/icl: Add L3 config
Anuj Phogat [Thu, 20 Jul 2017 23:23:24 +0000 (16:23 -0700)]
intel/common/icl: Add L3 config

ICL uses the same L3 configs as CNL, just leaving the SLM configs out.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years ago intel/tools/aubinator: Drop platform list from print_help()
Matt Turner [Wed, 21 Mar 2018 21:05:09 +0000 (14:05 -0700)]
intel/tools/aubinator: Drop platform list from print_help()

We all know the platform names, and I don't want to update this list
continually.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years ago egl/wayland: Make swrast display_sync the correct queue
Derek Foreman [Thu, 22 Mar 2018 15:20:43 +0000 (10:20 -0500)]
egl/wayland: Make swrast display_sync the correct queue

commit 03dd9a88b0be17ff0ce91e92f6902a9a85ba584a introduced per surface
queues, but the display_sync for swrast_commit_backbuffer remained on
the old queue.  This is likely to break when dispatching the correct
queue at the top of the function (which can't dispatch the sync callback
we're waiting for).

The easiest known reproduction case is running weston-subsurfaces under
weston --use-pixman

Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
6 years ago radv: remove unused radv_pipeline::needs_data_cache variable
Samuel Pitoiset [Thu, 22 Mar 2018 13:30:37 +0000 (14:30 +0100)]
radv: remove unused radv_pipeline::needs_data_cache variable

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years ago meson: merge C and C++ compiler arguments check
Eric Engestrom [Mon, 12 Mar 2018 14:54:50 +0000 (14:54 +0000)]
meson: merge C and C++ compiler arguments check

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago omx: always define ENABLE_ST_OMX_{BELLAGIO,TIZONIA}
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
omx: always define ENABLE_ST_OMX_{BELLAGIO,TIZONIA}

We're trying to be -Wundef clean so that we can turn it on (and
eventually make it an error).

Note that the OMX code already used `#if ENABLE_ST_OMX_BELLAGIO` instead
of #ifdef; I could've changed these, but the point of -Wundef is to
catch typos, so we might as well make the change the right way.
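
A minimal sketch of the pattern described above, assuming the build system always passes both macros as 0 or 1 (the `#ifndef` defaults and the `omx_backend_name()` helper are hypothetical stand-ins, not code from this patch):

```c
#include <assert.h>

/* Stand-in for the build system: both macros are ALWAYS defined, to 0 or 1,
 * so `#if` never evaluates an undefined identifier and -Wundef stays quiet.
 * A misspelled macro name in an `#if` would then trigger the warning. */
#ifndef ENABLE_ST_OMX_BELLAGIO
#define ENABLE_ST_OMX_BELLAGIO 1
#endif
#ifndef ENABLE_ST_OMX_TIZONIA
#define ENABLE_ST_OMX_TIZONIA 0
#endif

/* Hypothetical helper: returns the name of the selected OMX backend. */
static const char *omx_backend_name(void)
{
   /* At most one backend may be enabled at a time. */
   assert(!(ENABLE_ST_OMX_BELLAGIO && ENABLE_ST_OMX_TIZONIA));

#if ENABLE_ST_OMX_BELLAGIO
   return "bellagio";
#elif ENABLE_ST_OMX_TIZONIA
   return "tizonia";
#else
   return "disabled";
#endif
}
```

With `#ifdef`, a macro defined to 0 would still count as enabled, which is why the `#if` form only works when every macro is defined unconditionally.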

Fixes: 83d4a5d5aea5a8a05be2 "st/omx/tizonia: Add H.264 decoder"
Fixes: b2f2236dc565dd1460f0 "st/omx/tizonia: Add H.264 encoder"
Fixes: c62cf1f165919bc74296 "st/omx/tizonia/h264d: Add EGLImage support"
Cc: Gurkirpal Singh <gurkirpal204@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago meson: simplify omx logic
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
meson: simplify omx logic

and let's make sure `with_gallium_omx` is never 'auto' and can only be
one of [bellagio, tizonia, disabled].

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago vbo: Remove now duplicate _DrawVAO notification.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
vbo: Remove now duplicate _DrawVAO notification.

The DriverFlags.NewArray bit is already set in NewDriverState by
_mesa_set_draw_vao, since we have just changed the VAO's content
above. So this can be removed.
_vbo_update_inputs is called because vbo...recalculate_inputs is
set through the same mechanism as described above.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago vbo: Remove now duplicate _vbo_update_inputs from dlist draw.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
vbo: Remove now duplicate _vbo_update_inputs from dlist draw.

In the current state, _vbo_update_inputs is called from
the draw callback if vbo...recalculate_inputs is set.
That flag is now set when the _DrawVAO, its content, or the
vertex program mode changes.
So remove _vbo_update_inputs from the direct dlist draw path.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago vbo: Remove redundant set of DriverFlags.NewArray in vbo_bind_arrays.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
vbo: Remove redundant set of DriverFlags.NewArray in vbo_bind_arrays.

Now that setting vbo...recalculate_inputs also sets the
DriverFlags.NewArray bits in NewDriverState, setting them from
vbo_bind_arrays is redundant.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago vbo: Remove vbo...recalculate_inputs from vbo_exec_invalidate_state.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
vbo: Remove vbo...recalculate_inputs from vbo_exec_invalidate_state.

This flag is now set when the actual Array._DrawVAO changes.
So setting this flag is redundant here.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago mesa: A change of gl_vertex_processing_mode needs an array update.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
mesa: A change of gl_vertex_processing_mode needs an array update.

Since arrays also handle the mapping of current values into the
disabled array slots, we need to tell the array update code that
this mapping has changed. Also mark only dirty if it has changed.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago mesa: Set DriverFlags.NewArray together with vbo...recalculate_inputs.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
mesa: Set DriverFlags.NewArray together with vbo...recalculate_inputs.

Both mean something very similar and are set at the same time now.
To let core mesa set the flag in the vbo module, implement a public vbo
module method that sets it. In the longer term the flag should
vanish in favor of a driver flag of the appropriate driver.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago mesa: Update VAO internal state when setting the _DrawVAO.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
mesa: Update VAO internal state when setting the _DrawVAO.

Update the VAO internal state on Array._DrawVAO instead of
Array.VAO. Also the VAO internal state update gets triggered now
by a change of Array._DrawVAO instead of the _NEW_ARRAY state flag.
Also no driver looks at any VAO's NewArrays value from within
the Driver.UpdateState callback. So it should be safe to move
this update into the _mesa_set_draw_vao method.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago vbo: Move vbo_bind_arrays into a dd_driver_functions draw callback.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
vbo: Move vbo_bind_arrays into a dd_driver_functions draw callback.

Factor out that common call into almost a single place.
Remove the _mesa_set_drawing_arrays call from vbo_{exec,save}_draw code
paths as the function is now called through vbo_bind_arrays.
Prepare updating the list of struct gl_vertex_array entries via
calling _vbo_update_inputs for being pushed into those drivers that
finally work on that long list of gl_vertex_array pointers.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago mesa: Move vbo draw functions into dd_function_table.
Mathias Fröhlich [Fri, 16 Mar 2018 05:34:35 +0000 (06:34 +0100)]
mesa: Move vbo draw functions into dd_function_table.

Move vbo draw functions into struct dd_function_table.
For now just wrap the underlying vbo functions.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago clover/llvm: Fix build against LLVM/Clang 4.0
Aaron Watry [Thu, 22 Mar 2018 01:21:51 +0000 (20:21 -0500)]
clover/llvm: Fix build against LLVM/Clang 4.0

The opencl 1.0 langstandard was renamed in 5.0+

v2: Move preprocessor check into compat.hpp

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years ago ac/nir_to_llvm: add frexp support
Timothy Arceri [Tue, 20 Mar 2018 02:07:22 +0000 (13:07 +1100)]
ac/nir_to_llvm: add frexp support

Fixes CTS tests:
KHR-GL40.gpu_shader_fp64.builtin.frexp_double
KHR-GL40.gpu_shader_fp64.builtin.frexp_dvec2
KHR-GL40.gpu_shader_fp64.builtin.frexp_dvec3
KHR-GL40.gpu_shader_fp64.builtin.frexp_dvec4

And piglit test:
tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4.shader_test

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago nir: add frexp_exp and frexp_sig opcodes
Timothy Arceri [Tue, 20 Mar 2018 02:06:23 +0000 (13:06 +1100)]
nir: add frexp_exp and frexp_sig opcodes

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago anv/pipeline: don't pass constant view index in multiview
Caio Marcelo de Oliveira Filho [Tue, 27 Feb 2018 19:46:51 +0000 (11:46 -0800)]
anv/pipeline: don't pass constant view index in multiview

If view mask has only one bit set, view index is effectively a
constant, so doesn't need to be passed to the next stages, just always
set it.

Part of this was in the original patch that added
anv_nir_lower_multiview.c but disabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago anv/pipeline: use less instructions for multiview
Caio Marcelo de Oliveira Filho [Wed, 14 Mar 2018 23:12:44 +0000 (16:12 -0700)]
anv/pipeline: use less instructions for multiview

The view_index is encoded in the remainder of dividing instance id by
the number of views in the view mask (n). In the general case (handled
by the else clause), there is a need to map from 0..n-1 into the
number of the view being masked. For that a map is encoded.

In the case only the first n bits in the mask are set, the mapping is
trivial, 0..n-1 already represent what view is being referred to.

That case was in the original patch that added
anv_nir_lower_multiview.c but disabled.
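
The mapping described above can be sketched in plain C. This is only an illustration: the function names are hypothetical, and the general case walks the mask bit by bit instead of using the encoded map that the actual lowering pass emits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in the view mask, i.e. the number of views (n). */
static unsigned count_views(uint32_t view_mask)
{
   unsigned n = 0;
   for (; view_mask; view_mask &= view_mask - 1)
      n++;
   return n;
}

/* Illustrative only: recover the view index from an instance id, assuming
 * the remainder of instance_id / n selects one of the n enabled views. */
static unsigned view_index_from_instance(uint32_t instance_id, uint32_t view_mask)
{
   unsigned n = count_views(view_mask);
   assert(n > 0);

   unsigned compacted = instance_id % n;   /* value in 0..n-1 */

   /* Trivial case: the mask is exactly the low n bits, so the compacted
    * value already is the view index. */
   if (view_mask == (n == 32 ? UINT32_MAX : (1u << n) - 1))
      return compacted;

   /* General case: find the position of the compacted-th set bit. */
   for (unsigned bit = 0; bit < 32; bit++) {
      if (view_mask & (1u << bit)) {
         if (compacted == 0)
            return bit;
         compacted--;
      }
   }
   return 0; /* unreachable for a non-zero mask */
}
```

For a mask of 0b1010, instance ids 0, 1, 2, 3 map to views 1, 3, 1, 3, while a mask of 0b0111 takes the trivial path and needs no lookup at all.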

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago broadcom/vc5: Fix up the NIR types of FS outputs generated by NIR-to-TGSI.
Eric Anholt [Wed, 21 Mar 2018 19:05:54 +0000 (12:05 -0700)]
broadcom/vc5: Fix up the NIR types of FS outputs generated by NIR-to-TGSI.

Unfortunately TGSI doesn't record the type of the FS output like GLSL
does, but VC5's TLB writes depend on the output's base type.  Just record
the type in the key at variant compile time when we've got a TGSI input
and then fix it up.

Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a
GPU hang that breaks most tests that come after it.

6 years ago spirv: Add a 64-bit implementation of Frexp
Neil Roberts [Thu, 8 Mar 2018 16:07:46 +0000 (17:07 +0100)]
spirv: Add a 64-bit implementation of Frexp

The implementation is inspired by
lower_instructions_visitor::dfrexp_sig_to_arith.

This has been tested against the arb_gpu_shader_fp64/fs-frexp-dvec4
test using the ARB_gl_spirv branch.
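
For readers unfamiliar with the operation, here is a rough C sketch of what a 64-bit frexp computes using only integer operations on the IEEE 754 bit pattern. It is not the NIR code from this patch; `frexp_bits` is a hypothetical name, and the sketch assumes zero or normal finite inputs only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Splits x into a significand in [0.5, 1) and a power-of-two exponent so
 * that x == significand * 2^exponent. Denormals, infinities and NaNs are
 * deliberately left out to keep the example short. */
static double frexp_bits(double x, int *exp_out)
{
   uint64_t bits;
   memcpy(&bits, &x, sizeof bits);

   unsigned biased = (unsigned)((bits >> 52) & 0x7ff);
   if (biased == 0) {            /* zero (denormals are not handled) */
      *exp_out = 0;
      return x;
   }
   assert(biased != 0x7ff);      /* no infinities or NaNs in this sketch */

   /* frexp's exponent is one more than the unbiased IEEE exponent,
    * because the significand lives in [0.5, 1) rather than [1, 2). */
   *exp_out = (int)biased - 1022;

   /* Keep sign and mantissa, force the biased exponent to 1022 (2^-1). */
   bits = (bits & ~(UINT64_C(0x7ff) << 52)) | (UINT64_C(1022) << 52);

   double sig;
   memcpy(&sig, &bits, sizeof sig);
   return sig;
}
```

For example, 8.0 is stored as 1.0 * 2^3, so the sketch returns 0.5 with an exponent of 4.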

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago aubinator_error_decode: Compare only the class_name of the ring.
Rafael Antognolli [Tue, 20 Mar 2018 16:13:08 +0000 (09:13 -0700)]
aubinator_error_decode: Compare only the class_name of the ring.

ring_name is "<class_name> + <instance_id>" (e.g. rcs0). So we need to
first compare the class name only, then get the instance id.

Without this, INSTDONE is not being decoded.
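
A small sketch of the two-step parse described above, assuming ring names have the form class name followed by a decimal instance id (`match_ring` is a hypothetical helper, not the function in aubinator_error_decode):

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true when ring_name begins with class_name and the remainder is
 * a decimal instance id, which is then stored in *instance. */
static bool match_ring(const char *ring_name, const char *class_name, int *instance)
{
   assert(ring_name && class_name && instance);
   size_t len = strlen(class_name);

   /* Step 1: compare only the class-name prefix, not the whole ring name. */
   if (strncmp(ring_name, class_name, len) != 0)
      return false;

   /* Step 2: what follows must be the instance id, e.g. the "0" in "rcs0". */
   if (!isdigit((unsigned char)ring_name[len]))
      return false;

   char *end;
   long id = strtol(ring_name + len, &end, 10);
   if (*end != '\0')
      return false;

   *instance = (int)id;
   return true;
}
```

Comparing the full string "rcs0" against the class name "rcs" with strcmp never matches, which is the kind of failure that would leave INSTDONE undecoded.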

Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years ago nir: Migrate nir_dce to instr worklist
Thomas Helland [Tue, 30 Jan 2018 20:35:50 +0000 (21:35 +0100)]
nir: Migrate nir_dce to instr worklist

Shader-db runtime change, average of five runs:
   Before 125,77 seconds (+/- 0,09%)
   After  124,48 seconds (+/- 0,07%)

Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric at anholt.net>
6 years ago nir: Initial implementation of a nir_instr_worklist
Thomas Helland [Tue, 30 Jan 2018 20:24:44 +0000 (21:24 +0100)]
nir: Initial implementation of a nir_instr_worklist

Make a simple worklist by basically just wrapping u_vector.
This is intended to be used in nir_opt_dce to reduce the number of calls
to ralloc, as we are currently spamming ralloc quite badly. It should
also give better cache locality and much lower memory usage.
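
To illustrate why a vector-backed worklist cuts down on allocations, here is a self-contained sketch. It does not use u_vector or the nir_instr_worklist API; `struct worklist` and its functions are hypothetical, and the sketch pops in LIFO order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical worklist: one contiguous, growable array of pointers, so
 * pushing an entry usually costs no allocation at all. */
struct worklist {
   void **items;
   size_t count;
   size_t capacity;
};

static void worklist_init(struct worklist *wl)
{
   wl->items = NULL;
   wl->count = 0;
   wl->capacity = 0;
}

/* Push an entry, doubling the backing array when it is full. */
static bool worklist_push(struct worklist *wl, void *item)
{
   assert(item != NULL);   /* NULL is reserved to signal "empty" on pop */

   if (wl->count == wl->capacity) {
      size_t new_cap = wl->capacity ? wl->capacity * 2 : 16;
      void **grown = realloc(wl->items, new_cap * sizeof *grown);
      if (!grown)
         return false;
      wl->items = grown;
      wl->capacity = new_cap;
   }
   wl->items[wl->count++] = item;
   return true;
}

/* Pop the most recently pushed entry, or NULL when the list is empty. */
static void *worklist_pop(struct worklist *wl)
{
   return wl->count ? wl->items[--wl->count] : NULL;
}

static void worklist_fini(struct worklist *wl)
{
   free(wl->items);
   worklist_init(wl);
}
```

Because the array doubles, pushing 40 entries triggers only three allocations (16, 32, 64 slots) instead of one per entry, and the entries sit next to each other in memory.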

Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric at anholt.net>
6 years ago intel/tools: aubinator: Catch gen11 "enhanced execlist" submission
Scott D Phillips [Sat, 10 Mar 2018 00:29:41 +0000 (16:29 -0800)]
intel/tools: aubinator: Catch gen11 "enhanced execlist" submission

Different registers are used for execlist submission in gen11, so
also watch those. This code only watches element zero of the
submit queue, which is all aubdump currently writes.

Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years ago radeonsi: fix a snprintf warning on gcc 7.3.0
Marek Olšák [Tue, 20 Mar 2018 21:02:43 +0000 (17:02 -0400)]
radeonsi: fix a snprintf warning on gcc 7.3.0

6 years ago radeonsi/gfx9: print the swizzle mode for testdma
Marek Olšák [Sun, 11 Mar 2018 17:11:01 +0000 (13:11 -0400)]
radeonsi/gfx9: print the swizzle mode for testdma

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years ago ac/surface: compute tile swizzle for GFX9
Marek Olšák [Fri, 28 Jul 2017 23:40:48 +0000 (01:40 +0200)]
ac/surface: compute tile swizzle for GFX9

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years ago broadcom/vc5: Don't skip job submit just because everything is scissored.
Eric Anholt [Tue, 20 Mar 2018 19:52:19 +0000 (12:52 -0700)]
broadcom/vc5: Don't skip job submit just because everything is scissored.

The coordinate shaders may now have side effects in the form of transform
feedback.

Part of fixing
GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_misc

6 years ago broadcom/vc5: Handle sparsely populated SO target array.
Eric Anholt [Tue, 20 Mar 2018 18:09:02 +0000 (11:09 -0700)]
broadcom/vc5: Handle sparsely populated SO target array.

Fixes
GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_state_variables

6 years ago broadcom/vc5: Fix 3D miplevel limit to match other texture targets.
Eric Anholt [Tue, 20 Mar 2018 17:48:11 +0000 (10:48 -0700)]
broadcom/vc5: Fix 3D miplevel limit to match other texture targets.

Fixes segfault in
GTF-GLES3.gtf.GL3Tests.texture_storage.texture_storage_texture_levels on
level 13.

6 years ago broadcom/vc5: Clamp the instance divisor to 16 bits.
Eric Anholt [Tue, 20 Mar 2018 17:00:21 +0000 (10:00 -0700)]
broadcom/vc5: Clamp the instance divisor to 16 bits.

Fixes debug assert on
GTF-GLES3.gtf.GL3Tests.instanced_arrays.instanced_arrays_divisor

Signed-off-by: Eric Anholt <eric@anholt.net>
6 years ago i965: fix android build
Lionel Landwerlin [Tue, 20 Mar 2018 21:11:58 +0000 (21:11 +0000)]
i965: fix android build

This is the equivalent of commit 5770e1d89e0eb49eb3c9547e8657d636b6e7e5d7 for
android.

v2: fix xml files path and file given to --header

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: 2d2b15fbcab ("i965: fix autotools/android build")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105634
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago docs: fix typo in 17.3.6 release notes
Juan A. Suarez Romero [Wed, 21 Mar 2018 16:31:13 +0000 (16:31 +0000)]
docs: fix typo in 17.3.6 release notes

Title is about 17.3.5, when it must be about 17.3.6.

CC: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years ago nir/dead_cf: also remove useless ifs
Caio Marcelo de Oliveira Filho [Mon, 19 Mar 2018 23:34:17 +0000 (16:34 -0700)]
nir/dead_cf: also remove useless ifs

Generalize the code for removing dead loops to also remove dead if
nodes. The conditions are the same in both cases: the node (and
its children) must not have side-effects AND the nodes after it must not
use the values produced by the node.

The only difference is when evaluating side effects: loops consider
only return jumps as a side-effect -- they can stop execution of nodes
after it; 'if' nodes outside loops should consider all kinds of
jumps (return, break, continue) since all of them can cause execution
of nodes after it to be skipped.

After this patch, empty ifs (those in which both the then and else blocks
are empty) will be removed by nir_opt_dead_cf.

It caused no change to shader-db, in part because the removal of empty
ifs is currently covered by nir_opt_peephole_select.

v2: Improve the identification of cases where break/continue can cause
    side-effects. (Jason)

v3: Move code comment changes to a different patch. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago nir/dead_cf: rephrase definition of a dead loop node
Caio Marcelo de Oliveira Filho [Mon, 19 Mar 2018 23:34:16 +0000 (16:34 -0700)]
nir/dead_cf: rephrase definition of a dead loop node

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago docs: update calendar, add news and link release notes to 17.3.7
Juan A. Suarez Romero [Wed, 21 Mar 2018 16:02:37 +0000 (16:02 +0000)]
docs: update calendar, add news and link release notes to 17.3.7

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
6 years ago docs: add sha256 checksums for 17.3.7
Juan A. Suarez Romero [Wed, 21 Mar 2018 15:57:23 +0000 (15:57 +0000)]
docs: add sha256 checksums for 17.3.7

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 13dd6016d749c07bfe2f20206a0bb8929ee585e8)

6 years ago docs: add release notes for 17.3.7
Juan A. Suarez Romero [Wed, 21 Mar 2018 13:10:00 +0000 (13:10 +0000)]
docs: add release notes for 17.3.7

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 8a51f3857c22cfa5feab8e72abcdab8802e711df)

6 years ago radeon/vce: move feedback command inside of destroy function
Leo Liu [Mon, 19 Mar 2018 15:16:46 +0000 (11:16 -0400)]
radeon/vce: move feedback command inside of destroy function

On the CI family, the firmware requires the destroy command to be the
last command in the IB. Moving the feedback command after destroy causes
issues on CI cards, so we have to keep the previous logic that moves
destroy back to the last command.

But, as established by the earlier fix for the original issue, on newer
families like Vega10 the feedback command has to be included inside of the
task info command along with the destroy command.

Fixes: 6d74cb25("radeon/vce: move destroy command before feedback command")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: mesa-stable@lists.freedesktop.org
6 years ago egl: pull update from Khronos and drop local define
Eric Engestrom [Fri, 16 Mar 2018 15:33:04 +0000 (15:33 +0000)]
egl: pull update from Khronos and drop local define

Added in Khronos in 2b6bb4ee45cc46c89d4a "EGL_MESA_drm_image: add
EGL_DRM_BUFFER_USE_CURSOR_MESA to egl.xml" [1] as part of PR #36 [2].

[1] https://github.com/KhronosGroup/EGL-Registry/commit/2b6bb4ee45cc46c89d4a4349f2ca94e80d77cd97
[2] https://github.com/KhronosGroup/EGL-Registry/pull/36

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago egl: align the formatting of Haiku section of eglplatform.h with Khronos'
Eric Engestrom [Fri, 16 Mar 2018 14:17:54 +0000 (14:17 +0000)]
egl: align the formatting of Haiku section of eglplatform.h with Khronos'

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago egl: add Ozone section to eglplatform.h
Eric Engestrom [Fri, 16 Mar 2018 14:14:48 +0000 (14:14 +0000)]
egl: add Ozone section to eglplatform.h

This pulls in commit a93f559e9c11fa53fb5f1cc255b8f75433f85d2a "Add Ozone
section to eglplatform.h" from Khronos [1] added by Brian Anderson [2]
a few months ago.

[1] https://github.com/KhronosGroup/EGL-Registry/commit/a93f559e9c11fa53fb5f1cc255b8f75433f85d2a
[2] https://github.com/KhronosGroup/EGL-Registry/pull/26

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago clover: Dynamically calculate __OPENCL_VERSION__ and CLC language version
Aaron Watry [Wed, 28 Feb 2018 02:49:03 +0000 (20:49 -0600)]
clover: Dynamically calculate __OPENCL_VERSION__ and CLC language version

Use get_language_version to calculate default cl standard based on
device capabilities and -cl-std specified in build options.

v5: move dev_clc_version declaration from an earlier patch
v4: Squash the __OPENCL_VERSION__ and CLC language version patches
v3: (Jan) Allow device_version up to 2.2 while device_clc_version
    only goes to 2.0
    Use get_cl_version to calculate version instead
v2: Split out from the previous patch (Pierre)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
CC: Jan Vesely <jan.vesely@rutgers.edu>
6 years ago clover/llvm: Add get_[cl|language]_version, validation and some helpers
Aaron Watry [Sat, 22 Jul 2017 02:43:38 +0000 (21:43 -0500)]
clover/llvm: Add get_[cl|language]_version, validation and some helpers

Used to calculate the default CLC language version based on the -cl-std in build args
and the device capabilities.

According to section 5.8.4.5 of the 2.0 spec, the CL C version is chosen by:
 1) If you have -cl-std=CL1.1+ use the version specified
 2) If not, use the highest 1.x version that the device supports

Curiously, there is no valid value for -cl-std=CL1.0

Validates requested cl-std against device_clc_version
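
The selection rule above can be sketched as follows. The integer version encoding and the `pick_clc_version` name are assumptions made for the example; they are not clover's actual types or helpers.

```c
#include <assert.h>

/* Versions are encoded as major*100 + minor*10 (e.g. 120 for 1.2);
 * this encoding is an assumption of the sketch. */
enum { CLC_1_0 = 100, CLC_1_1 = 110, CLC_1_2 = 120, CLC_2_0 = 200 };

/* requested == 0 means no -cl-std option was given.
 * Returns the chosen CL C version, or -1 if the request is invalid. */
static int pick_clc_version(int requested, int device_clc_version)
{
   assert(device_clc_version >= CLC_1_0);

   if (requested != 0) {
      /* Rule 1: an explicit -cl-std wins, but CL1.0 is not a valid value
       * and the device must support what was asked for. */
      if (requested < CLC_1_1 || requested > device_clc_version)
         return -1;
      return requested;
   }

   /* Rule 2: default to the highest 1.x version the device supports. */
   return device_clc_version > CLC_1_2 ? CLC_1_2 : device_clc_version;
}
```

Note that a 2.0-capable device still defaults to CL C 1.2 unless -cl-std=CL2.0 is passed explicitly.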

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
v7: (Pierre) Split cl/clc versions into separate lists and
    make more references const.

v6: (Pierre) Add more const and fix some whitespace

v5: (Aaron) Use a collection of cl versions instead of switch cases
    Consolidates the string, numeric version, and clc langstandard::kind

v4: (Pierre) Split get_language_version addition and use into separate patches
    Squash patches that add the helpers and validate the language standard

v3: Change device_version to device_clc_version

v2: (Pierre) Move create_compiler_instance changes to correct patch
    to prevent temporary build breakage.
    Convert version_str into unsigned and use it to find language version
    Add build_error for unknown language version string
    Whitespace fixes

6 years ago docs: add 17.3.{8,9} in the release calendar
Juan A. Suarez Romero [Mon, 19 Mar 2018 12:54:23 +0000 (13:54 +0100)]
docs: add 17.3.{8,9} in the release calendar

Mesa 18.0 series has not been released yet, so let's extend 17.3 lifetime.

v2: add 17.3.9 in the calendar (Andres Gomez)

CC: Andres Gomez <agomez@igalia.com>
CC: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago intel/blorp: Fix compiler warning about num_layers.
Eric Anholt [Sat, 10 Feb 2018 10:29:56 +0000 (10:29 +0000)]
intel/blorp: Fix compiler warning about num_layers.

The compiler doesn't notice that the condition for num_layers to be
undefined already defined it above (as our assert checked in a debug
build).

v2: Move the pair of assignments to one outside of the block.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago radv: add support for VK_EXT_depth_range_unrestricted
Samuel Pitoiset [Fri, 16 Mar 2018 15:39:27 +0000 (16:39 +0100)]
radv: add support for VK_EXT_depth_range_unrestricted

This extension removes the restrictions on minDepth/maxDepth,
minDepthBounds/maxDepthBounds and VkClearDepthStencilValue::depth.

The following CTS tests now pass:

dEQP-VK.glsl.builtin_var.fragdepth.line_list_d32_sfloat_large_depth
dEQP-VK.glsl.builtin_var.fragdepth.point_list_d32_sfloat_large_depth
dEQP-VK.glsl.builtin_var.fragdepth.triangle_list_d32_sfloat_large_depth
dEQP-VK.draw.inverted_depth_ranges.nodepthclamp_depth_range_unrestricted
dEQP-VK.draw.inverted_depth_ranges.depthclamp_depth_range_unrestricted

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago radv: only enable one channel when exporting prim id
Samuel Pitoiset [Tue, 20 Mar 2018 09:07:30 +0000 (10:07 +0100)]
radv: only enable one channel when exporting prim id

It's a 32-bit integer like the layer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago i965: fix out of tree autotools build
Lionel Landwerlin [Tue, 20 Mar 2018 18:31:53 +0000 (18:31 +0000)]
i965: fix out of tree autotools build

Fixes: 2d2b15fbcab ("i965: fix autotools/android build")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
6 years ago virgl: Implement seamless cube maps
Stéphane Marchesin [Sat, 17 Mar 2018 02:15:02 +0000 (19:15 -0700)]
virgl: Implement seamless cube maps

This was previously ignored.

Along with the virglrenderer patch, this fixes ~100 dEQP tests:
dEQP-GLES3.functional.texture.filtering.cube.*

Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years ago i965: annotate brw_oa.py's --header and --code as required
Emil Velikov [Tue, 20 Mar 2018 16:23:05 +0000 (16:23 +0000)]
i965: annotate brw_oa.py's --header and --code as required

As of an earlier commit, --header was made a hard requirement when
using --code.

Hence - annotate both as required and drop a few no longer needed
checks.

Fixes: 035cc7a12dc0 ("i965: perf: reduce i965 binary size")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago i965: pipecontrol: add LRI write immediate flag
Lionel Landwerlin [Thu, 15 Mar 2018 12:11:15 +0000 (12:11 +0000)]
i965: pipecontrol: add LRI write immediate flag

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years ago intel: genxml: add INSTPM/CS_DEBUG_MODE2 registers
Lionel Landwerlin [Fri, 2 Mar 2018 16:44:14 +0000 (16:44 +0000)]
intel: genxml: add INSTPM/CS_DEBUG_MODE2 registers

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years ago i965: fix autotools/android build
Lionel Landwerlin [Tue, 20 Mar 2018 14:59:57 +0000 (14:59 +0000)]
i965: fix autotools/android build

Autotools/android builds generate the header & code files in 2 steps,
but the code generation requires the name of the header file to
include it.

This change generates both files in one command.

Fixes: 035cc7a12dc ("i965: perf: reduce i965 binary size")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago dri3: Fix typo in version check
Daniel Stone [Tue, 20 Mar 2018 16:05:13 +0000 (16:05 +0000)]
dri3: Fix typo in version check

The have-new-DRI3 codepaths would never actually properly trigger, since
there was a typo in configure.ac which broke the version check. This
went unnoticed but for an error in config.log if you looked closely
enough.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Lukas F. Hartmann <lukas@mntmn.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Fixes: 7aeef2d4efdc ("dri3: allow building against older xcb (v3)")
Cc: Dave Airlie <airlied@redhat.com>
6 years ago meson: Don't build svga by default on ARM/AArch64
Daniel Stone [Tue, 27 Feb 2018 18:00:23 +0000 (18:00 +0000)]
meson: Don't build svga by default on ARM/AArch64

VMware has no (published) support for Arm-architecture guests.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reported-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago meson: Add default DRI drivers for ARM/AArch64
Daniel Stone [Tue, 27 Feb 2018 10:00:24 +0000 (10:00 +0000)]
meson: Add default DRI drivers for ARM/AArch64

On all Arm architectures (ARMv7 and below as 'arm', ARMv8 and above as
'aarch64'), only build swrast for DRI drivers. The only classic drivers
which could be used are r200 and NV20 cards, which seems unlikely enough
that it shouldn't be the default.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Javier Jardón <jjardon@gnome.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years ago st/mesa: add compiler/nir/ prefix for nir includes
Emil Velikov [Tue, 20 Mar 2018 11:39:57 +0000 (11:39 +0000)]
st/mesa: add compiler/nir/ prefix for nir includes

Stay consistent with the rest of the codebase, effectively fixing the
autotools build.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105621
Fixes: ffa4bbe4665 ("st/nir/radeonsi: move nir_lower_uniforms_to_ubo()
to the state tracker")
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
6 years ago anv: off-by-one in GetDescriptorSetLayoutSupport
Scott D Phillips [Mon, 19 Mar 2018 22:39:25 +0000 (15:39 -0700)]
anv: off-by-one in GetDescriptorSetLayoutSupport

Loop was accessing one more than bindingCount elements from
pBindings, accessing uninitialized memory.
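
A generic sketch of this class of bug, using a trimmed-down hypothetical `struct binding` rather than the Vulkan structures touched by the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a descriptor binding. */
struct binding {
   uint32_t descriptor_count;
};

/* Sums descriptor counts over exactly binding_count entries. */
static uint32_t total_descriptors(const struct binding *bindings,
                                  uint32_t binding_count)
{
   assert(bindings != NULL || binding_count == 0);

   uint32_t total = 0;
   /* `b < binding_count` is the correct bound; `b <= binding_count`
    * would read one element past the end of the array. */
   for (uint32_t b = 0; b < binding_count; b++)
      total += bindings[b].descriptor_count;
   return total;
}
```

With the `<=` bound the extra read often goes unnoticed, because the element past the end may happen to hold a harmless value.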

Fixes: ddc4069122 ("anv: Implement VK_KHR_maintenance3")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago i965: perf: reduce i965 binary size
Lionel Landwerlin [Tue, 13 Mar 2018 11:21:17 +0000 (11:21 +0000)]
i965: perf: reduce i965 binary size

Performance metric numbers are calculated the following way:

   - out of the 256 bytes long OA reports, we accumulate the deltas
     into an array of uint64_t

   - the equations' generated code reads the accumulated uint64_t
     deltas and normalizes them for a particular platform

Our hardware is such that a number of counters in the OA reports
always return the same values (i.e. they're not programmable), and
they return the same values even across generations, and as a result a
number of equations are identical in different metric sets across
different generations.

Up to now we've kept the generated code of the equations separated in
different files (per generation/GT), and didn't apply any
factorization of the common equations. We could have made some
improvement by reusing equations within a given metrics file, but we
can go even further and reuse across generations (i.e. all files).

This change makes the code generation emit a single file in which
we reuse the equations' emitted code based on the hash of the equations'
strings.
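
The deduplication idea can be illustrated with a small interning table keyed on a string hash. This is a C sketch of the concept only; the real generator is a script, and `struct equation_table`, `intern_equation` and the FNV-1a hash are choices made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_UNIQUE 64

/* FNV-1a, used here only as a convenient string hash. */
static uint32_t hash_string(const char *s)
{
   uint32_t h = 2166136261u;
   for (; *s; s++) {
      h ^= (unsigned char)*s;
      h *= 16777619u;
   }
   return h;
}

/* Hypothetical table of unique equation strings. */
struct equation_table {
   const char *text[MAX_UNIQUE];
   uint32_t hash[MAX_UNIQUE];
   unsigned count;
};

/* Returns the index of the shared equation, adding it only if unseen.
 * Identical equations from different metric sets get the same index, so
 * their code only has to be emitted once. */
static unsigned intern_equation(struct equation_table *t, const char *eq)
{
   uint32_t h = hash_string(eq);
   for (unsigned i = 0; i < t->count; i++) {
      /* The hash is a fast filter; strcmp guards against collisions. */
      if (t->hash[i] == h && strcmp(t->text[i], eq) == 0)
         return i;
   }
   assert(t->count < MAX_UNIQUE);
   t->text[t->count] = eq;
   t->hash[t->count] = h;
   return t->count++;
}
```

Each counter then refers to a shared function by index instead of carrying its own copy of the same code.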

Here are the savings in a meson build:

Before(.old)/after:
   $ du -h ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
   43M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so
   47M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old

   $ size build/src/mesa/drivers/dri/libmesa_dri_drivers.so build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
       text    data     bss      dec     hex filename
   13054002  409424  671856 14135282  d7aff2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so
   14550386  409552  671856 15631794  ee85b2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old

As a side comment here is the size of the drivers if we remove all of
the metrics from the build :

   $ du -sh build/src/mesa/drivers/dri/libmesa_dri_drivers.so
   40M build/src/mesa/drivers/dri/libmesa_dri_drivers.so

v2: Fix an issue with hashing of counter equations (Lionel)
    Build system rework (Emil)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (build system part)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965: perf: fix a counter return type on hsw
Lionel Landwerlin [Tue, 13 Mar 2018 11:45:12 +0000 (11:45 +0000)]
i965: perf: fix a counter return type on hsw

The equation code computes a float (percentage) yet the return type
was a uint64_t.
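
A small Python model of why the return type matters: returning a
percentage through an integer type drops the fractional part. The
function names and inputs are invented for illustration, and int() is
used here as a stand-in for the C float to uint64_t conversion of a
non-negative value.

```python
def percentage_as_float(busy, total):
    """Percentage computed and returned as a float."""
    return 100.0 * busy / total


def percentage_as_uint64(busy, total):
    """Same computation, but returned through an integer type.

    int() truncates toward zero, which models the C conversion of a
    non-negative float to uint64_t.
    """
    return int(percentage_as_float(busy, total))


print(percentage_as_float(1, 400), percentage_as_uint64(1, 400))
```

With these inputs the float version keeps 0.25 while the integer
version collapses to 0.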

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agomesa: fix leaking ParameterValueOffset
Tapani Pälli [Tue, 20 Mar 2018 06:55:28 +0000 (08:55 +0200)]
mesa: fix leaking ParameterValueOffset

==15115== 48 bytes in 1 blocks are definitely lost in loss record 16 of 66
==15115==    at 0x4C2EC15: realloc (vg_replace_malloc.c:785)
==15115==    by 0x8602C3E: _mesa_reserve_parameter_storage (prog_parameter.c:212)
==15115==    by 0x8602D1E: _mesa_add_parameter (prog_parameter.c:252)
==15115==    by 0x86032C4: _mesa_add_sized_state_reference (prog_parameter.c:384)
==15115==    by 0x8603324: _mesa_add_state_reference (prog_parameter.c:409)

Fixes: edded12376 "mesa: rework ParameterList to allow packing"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agodri3: Don't fail on version mismatch
Daniel Stone [Mon, 19 Mar 2018 15:03:22 +0000 (15:03 +0000)]
dri3: Don't fail on version mismatch

The previous commit, which made DRI3 modifier support optional, breaks
with an updated server and an old client.

Make sure we never set multibuffers_available unless we also support it
locally. Make sure we don't call stubs of new-DRI3 functions (or empty
branches) which will never succeed.
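
The rule can be sketched as "use the feature only if both sides have
it". The Python below is a simplified model, not the loader code: the
function name, the idea of comparing minor versions, and the
required_minor default of 2 are assumptions made for this example.

```python
def multibuffers_available(server_minor, client_minor, required_minor=2):
    """True only when both the server and the local build support the feature.

    server_minor: minor version reported by the server (hypothetical input)
    client_minor: minor version the client was built against (hypothetical)
    """
    return min(server_minor, client_minor) >= required_minor


# Updated server, old client: the feature must stay off.
print(multibuffers_available(server_minor=2, client_minor=0))
```

Taking the minimum of the two versions means a newer server can never
switch on a code path the client was not built with.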

Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 7aeef2d4efdc ("dri3: allow building against older xcb (v3)")
6 years agoradv: don't lower indirects until after opts have run
Timothy Arceri [Thu, 8 Mar 2018 05:20:48 +0000 (16:20 +1100)]
radv: don't lower indirects until after opts have run

Noticed while passing by. Not sure if it impacts anything, but
likely to impact GFX9 more than anything else since we lower
inputs, outputs and locals there.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agost/nir: fix atomic lowering for gallium drivers
Timothy Arceri [Mon, 19 Mar 2018 11:23:55 +0000 (22:23 +1100)]
st/nir: fix atomic lowering for gallium drivers

i965 and gallium handle the atomic buffer index differently. It was
just by luck that the single piglit test for this was passing.

For gallium we use the atomic binding so that we match the handling
in st_bind_atomics().

On radeonsi this fixes the CTS test:
KHR-GL43.shader_storage_buffer_object.advanced-write-fragment

It also fixes tressfx hair rendering in Tomb Raider.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agost/radeonsi: enable uniform packing in NIR backend
Timothy Arceri [Tue, 13 Mar 2018 22:51:23 +0000 (09:51 +1100)]
st/radeonsi: enable uniform packing in NIR backend

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agost: add uniform packing support to lower_uniforms_to_ubo()
Timothy Arceri [Fri, 9 Mar 2018 01:30:01 +0000 (12:30 +1100)]
st: add uniform packing support to lower_uniforms_to_ubo()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agogallium: add packed uniform CAP
Timothy Arceri [Fri, 18 Aug 2017 05:51:48 +0000 (15:51 +1000)]
gallium: add packed uniform CAP

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agost/nir/radeonsi: move nir_lower_uniforms_to_ubo() to the state tracker
Timothy Arceri [Fri, 9 Mar 2018 00:57:52 +0000 (11:57 +1100)]
st/nir/radeonsi: move nir_lower_uniforms_to_ubo() to the state tracker

This will only ever be used by gallium drivers so it probably doesn't
belong in the nir toolkit. Also we want to pass it some non NIR
things in the following patch.

To avoid regressions we wrap the lowering calls that have been moved
to st_glsl_to_nir with a quick hack so that they are only called for
radeonsi. We will replace the hack with a check for uniform packing
in a following patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agost: add st_glsl_type_dword_size() helper
Timothy Arceri [Tue, 13 Mar 2018 01:34:50 +0000 (12:34 +1100)]
st: add st_glsl_type_dword_size() helper

This will be used to support uniform packing.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agost/glsl_to_nir: add support for packed builtin uniforms
Timothy Arceri [Tue, 13 Mar 2018 09:50:27 +0000 (20:50 +1100)]
st/glsl_to_nir: add support for packed builtin uniforms

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: add _mesa_add_sized_state_reference() helper
Timothy Arceri [Tue, 13 Mar 2018 09:47:48 +0000 (20:47 +1100)]
mesa: add _mesa_add_sized_state_reference() helper

This will be used for adding packed builtin uniforms.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: add support propagate uniform support for packed uniforms
Timothy Arceri [Tue, 13 Mar 2018 05:44:06 +0000 (16:44 +1100)]
mesa: add support propagate uniform support for packed uniforms

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: allow for uniform packing when adding uniforms to param list
Timothy Arceri [Tue, 20 Jun 2017 00:44:08 +0000 (10:44 +1000)]
mesa: allow for uniform packing when adding uniforms to param list

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agomesa: add packing support for setting uniform handles
Timothy Arceri [Tue, 20 Jun 2017 00:31:32 +0000 (10:31 +1000)]
mesa: add packing support for setting uniform handles

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agomesa: add packing support for setting uniforms
Timothy Arceri [Tue, 20 Jun 2017 00:38:05 +0000 (10:38 +1000)]
mesa: add packing support for setting uniforms

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agomesa: create copy uniform to storage helpers
Timothy Arceri [Fri, 16 Jun 2017 05:45:00 +0000 (15:45 +1000)]
mesa: create copy uniform to storage helpers

These will be used in the following patch to allow copying directly
to the param list when packing is enabled.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: rework ParameterList to allow packing
Timothy Arceri [Fri, 16 Jun 2017 00:17:56 +0000 (10:17 +1000)]
mesa: rework ParameterList to allow packing

Currently everything is padded to 4 components. Making the list
more flexible will allow us to do uniform packing.
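
To make the difference concrete, here is a toy Python model of value
offsets with and without padding to 4 components. It is a sketch of
the idea only, not the ParameterList code; value_offsets and its
arguments are names invented for this example.

```python
def value_offsets(component_counts, pad_and_align):
    """Return (offsets, total size) in components for a list of parameters."""
    offsets = []
    next_free = 0
    for count in component_counts:
        if pad_and_align:
            # Start on a 4-component boundary and occupy whole vec4 slots.
            next_free = (next_free + 3) // 4 * 4
            offsets.append(next_free)
            next_free += (count + 3) // 4 * 4
        else:
            # Packed: each parameter starts right after the previous one.
            offsets.append(next_free)
            next_free += count
    return offsets, next_free


counts = [1, 3, 2, 4]  # hypothetical float, vec3, vec2, vec4
print(value_offsets(counts, pad_and_align=True))
print(value_offsets(counts, pad_and_align=False))
```

For this made-up list the padded layout needs 16 components and the
packed layout needs 10.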

V2 (suggestions from Nicolai):
- always pass true for pad_and_align in existing calls to _mesa_add_parameter()
- fix bindless param value offsets
- remove leftover wip logic from pad and align code
- zero out param value padding
- whitespace fix

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa: add PackedDriverUniformStorage const
Timothy Arceri [Wed, 14 Jun 2017 05:48:45 +0000 (15:48 +1000)]
mesa: add PackedDriverUniformStorage const

Will be used to determine whether to take packing code paths or not.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agobroadcom/vc5: Don't annotate dumps with stale live intervals.
Eric Anholt [Wed, 14 Mar 2018 18:03:23 +0000 (11:03 -0700)]
broadcom/vc5: Don't annotate dumps with stale live intervals.

As you're debugging register allocation, you may have changed the
intervals and not recomputed yet.  Just skip the dump in that case.

6 years agobroadcom/vc5: Add support for register spilling.
Eric Anholt [Tue, 13 Mar 2018 22:13:00 +0000 (15:13 -0700)]
broadcom/vc5: Add support for register spilling.

Our register spilling support is nice to have since vc4 couldn't spill
at all, but we're still very restricted: we can't spill during a TMU
operation or during the last segment of the program (which would be a
nice place to spill when there's a long-lived value being passed
through with little modification from the start to the end).

We could do better by emitting unspills for the last-segment values just
before the last thrsw, since the last segment is probably not the maximum
interference area.

Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3
others.

6 years agobroadcom/vc5: Remove redundant last_inst lookup.
Eric Anholt [Wed, 14 Mar 2018 21:43:15 +0000 (14:43 -0700)]
broadcom/vc5: Remove redundant last_inst lookup.

The point was to get the MOV, which the MOV_dest already returned.

6 years agobroadcom/vc5: On QPU pack error, dump the instruction and return cleanly.
Eric Anholt [Wed, 14 Mar 2018 21:39:51 +0000 (14:39 -0700)]
broadcom/vc5: On QPU pack error, dump the instruction and return cleanly.

This is nice for debugging when you've made a bad instruction.

6 years agobroadcom/vc5: Add cursors to the compiler infrastructure, like NIR's.
Eric Anholt [Tue, 13 Mar 2018 22:41:16 +0000 (15:41 -0700)]
broadcom/vc5: Add cursors to the compiler infrastructure, like NIR's.

This will let me do lowering late in compilation using the same
instruction builder as we use in nir_to_vir.

6 years agobroadcom/vc5: Move the umul macro to a header.
Eric Anholt [Tue, 13 Mar 2018 23:23:33 +0000 (16:23 -0700)]
broadcom/vc5: Move the umul macro to a header.

Anywhere we want to multiply, we probably want this.

6 years agobroadcom/vc5: Correct the arg count of TIDX/EIDX.
Eric Anholt [Tue, 13 Mar 2018 23:08:25 +0000 (16:08 -0700)]
broadcom/vc5: Correct the arg count of TIDX/EIDX.

6 years agobroadcom/vc5: Re-do live variables after removing thrsws.
Eric Anholt [Sat, 24 Feb 2018 01:46:35 +0000 (17:46 -0800)]
broadcom/vc5: Re-do live variables after removing thrsws.

Otherwise our start/end ips won't line up with the actual instructions.

6 years agobroadcom/vc5: Add a QPU helper for instructions using the TLB.
Eric Anholt [Mon, 19 Mar 2018 18:30:27 +0000 (11:30 -0700)]
broadcom/vc5: Add a QPU helper for instructions using the TLB.

This will be used for detecting last thread segment in register spilling.

6 years agobroadcom/vc5: Introduce v3d_qpu_reads_vpm()/v3d_qpu_writes_vpm().
Eric Anholt [Mon, 19 Mar 2018 18:03:47 +0000 (11:03 -0700)]
broadcom/vc5: Introduce v3d_qpu_reads_vpm()/v3d_qpu_writes_vpm().

These helpers will be used in register spilling to determine where to add
a last thrsw if needed, and might help refactor QPU scheduling.

6 years agobroadcom/vc5: The ldvpm signal also a case of using the VPM.
Eric Anholt [Mon, 19 Mar 2018 18:05:03 +0000 (11:05 -0700)]
broadcom/vc5: The ldvpm signal also a case of using the VPM.

The QPU scheduling code calling this function already separately checked
this signal.

6 years agobroadcom/vc5: Extract v3d_qpu_writes_tmu() helper.
Eric Anholt [Wed, 14 Mar 2018 22:04:32 +0000 (15:04 -0700)]
broadcom/vc5: Extract v3d_qpu_writes_tmu() helper.

This will be reused in register spilling.

6 years agoradv: don't export NULL layer.
Dave Airlie [Mon, 19 Mar 2018 20:02:58 +0000 (20:02 +0000)]
radv: don't export NULL layer.

In some cases a subpass wants the layer, but it is fine for it to be 0
and loaded in the fragment shader without the vertex shader exporting
it.

So don't export the layer if we don't have a value to put in it.

Fixes: d4c74aed7a8 (radv/multiview: mark layer_input if we have input attachments.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agomesa: adjust incorrect comment in texture_buffer_range
Marek Olšák [Tue, 6 Mar 2018 22:32:09 +0000 (17:32 -0500)]
mesa: adjust incorrect comment in texture_buffer_range

6 years agonir: Don't compare b2f or b2i with zero
Ian Romanick [Wed, 2 Mar 2016 03:05:14 +0000 (19:05 -0800)]
nir: Don't compare b2f or b2i with zero

All of the shaders that had loops changed were in Tomb Raider.  The one
shader that lost SIMD16 is one of those.
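
The rewrite in the title rests on b2f and b2i mapping false to zero
and true to one, so comparing their result with zero only recovers the
original boolean or its negation. A small Python model of that
identity follows; b2f and b2i here are stand-ins written for this
note, not NIR's implementation, and the exact set of patterns the
commit rewrites is not reproduced.

```python
def b2f(b):
    """Model of bool-to-float: False -> 0.0, True -> 1.0."""
    return 1.0 if b else 0.0


def b2i(b):
    """Model of bool-to-int: False -> 0, True -> 1."""
    return 1 if b else 0


for b in (False, True):
    # "!= 0" gives back the boolean, "== 0" gives back its negation,
    # so neither the conversion nor the comparison is needed.
    assert (b2f(b) != 0.0) == b
    assert (b2f(b) == 0.0) == (not b)
    assert (b2i(b) != 0) == b
    assert (b2i(b) == 0) == (not b)
```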

Skylake
total instructions in shared programs: 14391653 -> 14390468 (<.01%)
instructions in affected programs: 111891 -> 110706 (-1.06%)
helped: 501
HURT: 0
helped stats (abs) min: 1 max: 155 x̄: 2.37 x̃: 1
helped stats (rel) min: 0.05% max: 21.54% x̄: 1.61% x̃: 1.01%
95% mean confidence interval for instructions value: -3.23 -1.50
95% mean confidence interval for instructions %-change: -1.77% -1.45%
Instructions are helped.

total cycles in shared programs: 532793024 -> 532776598 (<.01%)
cycles in affected programs: 987682 -> 971256 (-1.66%)
helped: 348
HURT: 41
helped stats (abs) min: 1 max: 3074 x̄: 54.91 x̃: 18
helped stats (rel) min: 0.05% max: 32.24% x̄: 3.36% x̃: 1.68%
HURT stats (abs)   min: 1 max: 422 x̄: 65.39 x̃: 24
HURT stats (rel)   min: 0.09% max: 39.29% x̄: 9.50% x̃: 2.02%
95% mean confidence interval for cycles value: -64.08 -20.38
95% mean confidence interval for cycles %-change: -2.78% -1.23%
Cycles are helped.

total loops in shared programs: 4854 -> 4829 (-0.52%)
loops in affected programs: 27 -> 2 (-92.59%)
helped: 18
HURT: 0

LOST:   1
GAINED: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoradv: lower constant initializers on output variables earlier
Dave Airlie [Mon, 19 Mar 2018 04:27:49 +0000 (04:27 +0000)]
radv: lower constant initializers on output variables earlier

If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.
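
The ordering problem can be shown with a toy model in Python. This is
not NIR: the two functions below are simplified stand-ins for the
passes named above, and the variable representation is invented for
the example.

```python
def lower_constant_initializers(variables, stores):
    """Turn each initializer into an explicit store and clear it."""
    for var in variables:
        if var["init"] is not None:
            stores.append((var["name"], var["init"]))
            var["init"] = None


def remove_dead_variables(variables, stores):
    """Keep only variables that some store refers to."""
    written = {name for name, _ in stores}
    return [v for v in variables if v["name"] in written]


def compile_outputs(lower_first):
    """Run the two toy passes in the chosen order on one output variable."""
    variables = [{"name": "out_color", "init": 1.0}]
    stores = []
    if lower_first:
        lower_constant_initializers(variables, stores)
        variables = remove_dead_variables(variables, stores)
    else:
        variables = remove_dead_variables(variables, stores)
        lower_constant_initializers(variables, stores)
    return [v["name"] for v in variables], stores


print(compile_outputs(lower_first=True))
print(compile_outputs(lower_first=False))
```

Lowering first keeps out_color together with its store; running the
dead-variable removal first drops the output before any store exists.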

Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output.float

This is ported from anv:
99b57daf4a anv/pipeline: lower constant initializers on output variables earlier
from Iago Toral Quiroga <itoral@igalia.com>

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv/query: handle multiview timestamp queries.
Dave Airlie [Mon, 19 Mar 2018 01:27:37 +0000 (01:27 +0000)]
radv/query: handle multiview timestamp queries.

For each view bit we need to emit a timestamp query.

Fixes: dEQP-VK.multiview.queries*
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv/query: handle multiview queries properly. (v3)
Dave Airlie [Thu, 15 Mar 2018 20:23:30 +0000 (20:23 +0000)]
radv/query: handle multiview queries properly. (v3)

For multiview we need to emit a number of sequential queries
depending on the view mask.

This avoids dEQP-VK.multiview.queries.15 waiting forever
on the CPU for query results that are never coming.

We only really want to emit one query,
and the rest should be blank (amdvlk does the same),
so we emit begin/end pairs for all the others except
the first query.
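
A minimal Python sketch of the slot layout described above, assuming
one query slot per bit set in the view mask and treating a zero mask
(no multiview) as a single query. The function name plan_queries and
the "real"/"blank" labels are invented for this example; it models
the bookkeeping only, not the command stream.

```python
def plan_queries(first_query, view_mask):
    """Return (slot, kind) pairs for one query begin/end.

    The first slot carries the real query; every further slot only gets
    an empty begin/end pair so that its result becomes available.
    """
    count = max(bin(view_mask).count("1"), 1)
    plan = [(first_query, "real")]
    for i in range(1, count):
        plan.append((first_query + i, "blank"))
    return plan


# Three views enabled (bits 0, 1 and 3), starting at query slot 4.
print(plan_queries(4, 0b1011))
```

Filling the extra slots with blank begin/end pairs is what keeps a CPU
wait on those slots from blocking forever.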

v2: fix tests
v3: split out patch.

Fixes: dEQP-VK.multiview.queries*
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>