mesa.git
7 years agoac/nir: Add ES output to LDS for GFX9.
Bas Nieuwenhuizen [Thu, 19 Oct 2017 23:27:12 +0000 (01:27 +0200)]
ac/nir: Add ES output to LDS for GFX9.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Add merged GS function.
Bas Nieuwenhuizen [Thu, 19 Oct 2017 23:06:50 +0000 (01:06 +0200)]
ac/nir: Add merged GS function.

[airlied: merged fixup and fixed up a couple more bits].

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Only emit TES when it exists.
Bas Nieuwenhuizen [Thu, 19 Oct 2017 23:08:30 +0000 (01:08 +0200)]
radv: Only emit TES when it exists.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Use control shader presence for detecting tess.
Bas Nieuwenhuizen [Thu, 19 Oct 2017 23:42:34 +0000 (01:42 +0200)]
radv: Use control shader presence for detecting tess.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: fixup tess eval shader when combined.
Dave Airlie [Fri, 20 Oct 2017 02:45:51 +0000 (03:45 +0100)]
radv: fixup tess eval shader when combined.

This fixes some access to the tess eval shader when it's combined
with geometry on gfx9.

This is a review of Bas's commit:
radv: Prevent crashing by accessing TES for VGT reuse depth.

Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Set VGT_GS_MODE properly for gfx9
Bas Nieuwenhuizen [Fri, 20 Oct 2017 01:17:14 +0000 (03:17 +0200)]
radv: Set VGT_GS_MODE properly for gfx9

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: ensure correct outinfo is picked.
Dave Airlie [Fri, 20 Oct 2017 03:02:15 +0000 (04:02 +0100)]
radv: ensure correct outinfo is picked.

This struct used to rely on being in a union; it isn't in one anymore,
so we have to pick the correct outinfo struct now.

This should fix a regression since the union became a struct.

dEQP-VK.tessellation.geometry_interaction.point_size.vertex_set_geometry_set

Fixes: 6078a3bd51 (ac/nir: Allow ac_shader_variant_info to contain info about multiple stages.)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoswr: Rework scratch space allocation
George Kyriazis [Wed, 18 Oct 2017 19:10:26 +0000 (14:10 -0500)]
swr: Rework scratch space allocation

Remove allocation of > 2kbyte buffers into context memory in
swr_copy_to_scratch_space() (which is used to copy small vertex/index buffers
and shader constants to a scratch space to be used by the upcoming draw.)

Large shader constant allocations need to be done in the circular scratch
buffer instead of context memory, because their values persist across
render calls.

Also lower SCRATCH_SINGLE_ALLOCATION_LIMIT to 8k, since allocations of larger
buffers will get too large for the circular scratch space.

Fixes render issues with CEI Ensight.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoradv: Enable tessellation shaders for GFX9.
Bas Nieuwenhuizen [Thu, 19 Oct 2017 21:28:25 +0000 (23:28 +0200)]
radv: Enable tessellation shaders for GFX9.

It mostly works now.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: init full exec mask for merged shaders.
Dave Airlie [Thu, 19 Oct 2017 04:29:02 +0000 (05:29 +0100)]
ac/nir: init full exec mask for merged shaders.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: drop unused r600_htile_info.
Dave Airlie [Tue, 17 Oct 2017 06:12:28 +0000 (07:12 +0100)]
radv: drop unused r600_htile_info.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: fix CLEAR_STATE packet length.
Dave Airlie [Thu, 19 Oct 2017 03:52:29 +0000 (04:52 +0100)]
radv: fix CLEAR_STATE packet length.

Looking at shader traces I noticed some registers were missing,
one of them was being eaten by the wrong clear state length.

Fixes: 4f42ea4dc (radv: use CLEAR_STATE for initializing some registers)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agomeson: don't build gallium dri target if gallium is disabled
Dylan Baker [Thu, 19 Oct 2017 17:28:37 +0000 (10:28 -0700)]
meson: don't build gallium dri target if gallium is disabled

Otherwise -Dgallium-drivers= will cause libmesa_gallium to be built and
the megadriver install script to attempt to install drivers without any
actual drivers being built.

Fixes: 66f97f6640f5316b36177fd1053f0027eb6ec6cc ("meson: build radeonsi")
Reported-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
7 years agoradv: copy indirect lowering settings from radeonsi
Timothy Arceri [Wed, 18 Oct 2017 22:27:04 +0000 (09:27 +1100)]
radv: copy indirect lowering settings from radeonsi

It looks like the original indirect mask was probably copied from
ANV.

Sascha Willems demo results:

tessellation ~4000 -> ~4200 fps

V2: continue lowering local indirects due to llvm deficiencies.

Tested-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: stop redundant setting of active_stages
Timothy Arceri [Wed, 18 Oct 2017 22:27:03 +0000 (09:27 +1100)]
radv: stop redundant setting of active_stages

We already set it above, in the NIR compilation loop.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agoac: move some code out of loop in store_tcs_output()
Timothy Arceri [Thu, 19 Oct 2017 06:01:35 +0000 (17:01 +1100)]
ac: move some code out of loop in store_tcs_output()

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agoradv: Modify rsrc1/rsrc2 generation for merged tess.
Bas Nieuwenhuizen [Tue, 17 Oct 2017 22:59:16 +0000 (00:59 +0200)]
radv: Modify rsrc1/rsrc2 generation for merged tess.

No OC_LDS_EN for HS, and the included LS vgpr_comp_cnt is at
a different offset.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Set correct registers for merged shader rings.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 21:57:46 +0000 (23:57 +0200)]
radv: Set correct registers for merged shader rings.

We need different regs to end up in s0/s1.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Add GFX9 HS emitting code.
Bas Nieuwenhuizen [Tue, 17 Oct 2017 20:51:00 +0000 (22:51 +0200)]
radv: Add GFX9 HS emitting code.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Remove remaining hard coded references to VS.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 16:27:47 +0000 (18:27 +0200)]
radv: Remove remaining hard coded references to VS.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Update GFX9 user data regs for GS/tess.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 16:09:25 +0000 (18:09 +0200)]
radv: Update GFX9 user data regs for GS/tess.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Add code to compile merged shaders.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 11:18:02 +0000 (13:18 +0200)]
radv: Add code to compile merged shaders.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Add LS-HS input VGPR workaround.
Bas Nieuwenhuizen [Thu, 19 Oct 2017 00:58:34 +0000 (02:58 +0200)]
ac/nir: Add LS-HS input VGPR workaround.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Compile the bodies of multiple shaders.
Bas Nieuwenhuizen [Wed, 18 Oct 2017 23:36:26 +0000 (01:36 +0200)]
ac/nir: Compile the bodies of multiple shaders.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Expand user SGPR descriptions a bit.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 22:01:33 +0000 (00:01 +0200)]
ac/nir: Expand user SGPR descriptions a bit.

To prevent VS/TCS collisions in merged shaders.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Don't write to the dynamic HS word on GFX9.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 15:45:06 +0000 (17:45 +0200)]
ac/nir: Don't write to the dynamic HS word on GFX9.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Add function creation for merged LS+HS.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 14:32:41 +0000 (16:32 +0200)]
ac/nir: Add function creation for merged LS+HS.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Make scan_shader_output_decl less dependent on the context.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 14:04:20 +0000 (16:04 +0200)]
ac/nir: Make scan_shader_output_decl less dependent on the context.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Allow ac_shader_variant_info to contain info about multiple stages.
Bas Nieuwenhuizen [Mon, 25 Sep 2017 03:54:55 +0000 (05:54 +0200)]
ac/nir: Allow ac_shader_variant_info to contain info about multiple stages.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Change interface to allow multiple source shaders.
Bas Nieuwenhuizen [Sun, 24 Sep 2017 23:05:49 +0000 (01:05 +0200)]
ac/nir: Change interface to allow multiple source shaders.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac/nir: Add HS calling convention.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 20:15:47 +0000 (22:15 +0200)]
ac/nir: Add HS calling convention.

Needed for GFX9 merged shaders.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoac: Parse the new HS RSRC1 register.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 21:58:48 +0000 (23:58 +0200)]
ac: Parse the new HS RSRC1 register.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoswr: knob overrides for Intel Xeon Phi
Tim Rowley [Tue, 17 Oct 2017 20:11:19 +0000 (15:11 -0500)]
swr: knob overrides for Intel Xeon Phi

Architecture benefits from having more threads/work outstanding.

Patch by Jan Zielinski.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Add api to override draws in flight
Tim Rowley [Tue, 17 Oct 2017 20:02:53 +0000 (15:02 -0500)]
swr/rast: Add api to override draws in flight

Allow draws in flight to be overridden via SWR_CREATECONTEXT_INFO.

Patch by Jan Zielinski.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Widen fetch shader to SIMD16 (disabled for now)
Tim Rowley [Mon, 16 Oct 2017 23:39:41 +0000 (18:39 -0500)]
swr/rast: Widen fetch shader to SIMD16 (disabled for now)

Refactored the gather operation to process 16 elements at a time via
paired SIMD8 operations.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Change DS memory allocation
Tim Rowley [Wed, 11 Oct 2017 21:21:21 +0000 (16:21 -0500)]
swr/rast: Change DS memory allocation

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Fix indentation
Tim Rowley [Fri, 6 Oct 2017 18:50:14 +0000 (13:50 -0500)]
swr/rast: Fix indentation

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Miscellaneous viewport array code changes
Tim Rowley [Fri, 29 Sep 2017 19:45:16 +0000 (14:45 -0500)]
swr/rast: Miscellaneous viewport array code changes

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Minor changes for os-x
Tim Rowley [Wed, 27 Sep 2017 17:22:35 +0000 (12:22 -0500)]
swr/rast: Minor changes for os-x

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoi965: Don't disable aux buffers for non-overlapping miplevels.
Kenneth Graunke [Fri, 13 Oct 2017 03:47:41 +0000 (20:47 -0700)]
i965: Don't disable aux buffers for non-overlapping miplevels.

Meta's GenerateMipmap implementation binds the same image for both
sampling and rendering - but it samples from one miplevel while
rendering the next.  This is a false self-dependency, and there's
no need to disable auxiliary buffers in this case.  In fact, we really
want to leave it enabled so the new miplevels gain color compression.

Thankfully, the texture object's _MaxLevel is always one shy of the
miplevel being rendered.  So we can simply check whether irb->mt_level
overlaps with the texture's defined levels.  If not, there's no
self-dependency and we can leave the auxiliary buffers enabled.

Fixes a performance regression in GFXBench4 Car Chase, which apparently
calls glGenerateMipmap() on every frame.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103247
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
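
The overlap check described in this message can be sketched roughly as follows. This is an illustration only, not the i965 code: the helper name and its parameters are hypothetical, and it assumes the texture's defined levels form an inclusive range from a base level to a maximum level.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not the i965 implementation: returns true when the
 * miplevel being rendered lies inside the inclusive range of levels that
 * sampling may read, i.e. when the self-dependency is real. */
static bool
render_level_overlaps_texture(unsigned render_level,
                              unsigned base_level, unsigned max_level)
{
   assert(base_level <= max_level);
   return render_level >= base_level && render_level <= max_level;
}
```

In the GenerateMipmap case from the message, the rendered level is one past the maximum sampled level, so a check like this would report no overlap and the auxiliary buffers could stay enabled.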

7 years agoi965: Remove the intel_miptree_prepare_fb_fetch wrapper.
Kenneth Graunke [Fri, 13 Oct 2017 05:24:18 +0000 (22:24 -0700)]
i965: Remove the intel_miptree_prepare_fb_fetch wrapper.

Now that intel_miptree_prepare_texture takes levels and layers, there's
not much use in this anymore.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

7 years agoi965: Only resolve texture levels/layers that are accessed.
Kenneth Graunke [Mon, 16 Oct 2017 19:28:17 +0000 (12:28 -0700)]
i965: Only resolve texture levels/layers that are accessed.

This should avoid unnecessary resolves when working with texture views.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

7 years agoi965: Make intel_miptree_prepare_texture() take level/layer arguments.
Kenneth Graunke [Fri, 13 Oct 2017 03:59:22 +0000 (20:59 -0700)]
i965: Make intel_miptree_prepare_texture() take level/layer arguments.

This effectively exports intel_miptree_prepare_texture_slices() as
intel_miptree_prepare_texture().  The hope is to avoid resolves for
when using texture views that access a subset of the levels/layers.

For now, we pass the same arguments to separate the mechanical change
from the one that actually modifies our behavior.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

7 years agogallium: add more exceptions to tgsi_util_get_inst_usage_mask
Tim Rowley [Thu, 19 Oct 2017 14:13:46 +0000 (09:13 -0500)]
gallium: add more exceptions to tgsi_util_get_inst_usage_mask

A number of double/int64 operations don't have matching
read and write usage masks, which the fallthrough case of
tgsi_util_get_inst_usage_mask assumes for componentwise
tagged instructions.

No regressions in llvmpipe piglit; fixes a large number of
swr regressions.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoisl: Fix width check in isl_gen7_choose_msaa_layout.
Kenneth Graunke [Sun, 24 Sep 2017 21:50:17 +0000 (14:50 -0700)]
isl: Fix width check in isl_gen7_choose_msaa_layout.

The restriction is supposed to apply if the width *field* is >= 8192,
meaning the actual width *value* is >= 8193.

The code also incorrectly used == for some reason.

Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
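
The field/value distinction above can be made concrete with a small sketch. It is not the isl code; the function names are hypothetical, and it assumes (as the message implies) that the width field stores the width minus one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the width *field* encodes the width *value* minus one. */
static uint32_t
width_field_from_value(uint32_t width_px)
{
   assert(width_px >= 1);
   return width_px - 1;
}

/* The restriction keys off the field, so it starts at a value of 8193.
 * Note the >= comparison; an == test would miss every wider surface. */
static bool
width_restriction_applies(uint32_t width_px)
{
   return width_field_from_value(width_px) >= 8192;
}
```

Under these assumptions a surface 8192 pixels wide is not restricted, while widths of 8193 and above are.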
7 years agoi965: Use is_scheduling_barrier instead of schedule_node::is_barrier.
Kenneth Graunke [Wed, 18 Oct 2017 06:19:20 +0000 (23:19 -0700)]
i965: Use is_scheduling_barrier instead of schedule_node::is_barrier.

Commit a73116ecc60414ade89802150b tried to make add_barrier_deps()
walk to the next barrier, and stop.  To accomplish that, it added an
is_barrier flag.  Unfortunately, this only works half of the time.

The issue is that add_barrier_deps() walks both backward (to the
previous barrier), and forward (to the next barrier).  It also sets
is_barrier.  Assuming that we're processing instructions in forward
order, this means that is_barrier will be set for previous instructions,
but not future ones.  So we'll never see it, and walk further than we
need to.

dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
now compiles its shaders in 3.6 seconds instead of 3.3 minutes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Pallavi G <pallavi.g@intel.com>
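
A rough sketch of the idea, assuming a simplified instruction list: deciding "is this a barrier?" from the instruction itself works in both walk directions, whereas a flag set only while processing is never set on instructions that have not been visited yet. The struct and both functions below are hypothetical stand-ins, not the i965 scheduler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record. */
struct inst {
   bool has_side_effects;
};

/* Stand-in predicate: computed from the instruction, not from a flag
 * that is only set once the instruction has been processed. */
static bool
is_scheduling_barrier(const struct inst *inst)
{
   return inst->has_side_effects;
}

/* Counts the instructions that would get a dependency on insts[at]:
 * walk backward to the previous barrier and forward to the next one,
 * including each barrier reached, then stop. */
static size_t
barrier_dep_count(const struct inst *insts, size_t n, size_t at)
{
   assert(at < n);
   size_t deps = 0;

   for (size_t i = at; i-- > 0; ) {
      deps++;
      if (is_scheduling_barrier(&insts[i]))
         break;
   }
   for (size_t i = at + 1; i < n; i++) {
      deps++;
      if (is_scheduling_barrier(&insts[i]))
         break;
   }
   return deps;
}
```

With barriers at both ends of a short window, the walk touches only the instructions inside that window instead of the rest of the block.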
7 years agoi965: Move fs_inst::has_side_effects()'s eot check to the parent class.
Kenneth Graunke [Wed, 18 Oct 2017 18:22:43 +0000 (11:22 -0700)]
i965: Move fs_inst::has_side_effects()'s eot check to the parent class.

This eliminates a layer of wrapping, and makes a backend_instruction
sufficient.  The downside is that it exposes 'eot' to the vec4 backend,
which it doesn't need, but can basically happily ignore.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Pallavi G <pallavi.g@intel.com>
7 years agotgsi: fix tgsi_util_get_inst_usage_mask
Roland Scheidegger [Wed, 18 Oct 2017 21:13:58 +0000 (23:13 +0200)]
tgsi: fix tgsi_util_get_inst_usage_mask

The logic for handling shadow coords was completely broken.
Fixes be3ab867bd444594f9d9e0f8e59d305d15769afd.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103265

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agodocs: update calendar, add news item and link release notes for 17.2.3
Emil Velikov [Thu, 19 Oct 2017 12:31:39 +0000 (13:31 +0100)]
docs: update calendar, add news item and link release notes for 17.2.3

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
7 years agodocs: add sha256 checksums for 17.2.3
Emil Velikov [Thu, 19 Oct 2017 12:28:13 +0000 (13:28 +0100)]
docs: add sha256 checksums for 17.2.3

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit facc85181883cb514b2b1a8106255be88fd54c6e)

7 years agodocs: add release notes for 17.2.3
Emil Velikov [Thu, 19 Oct 2017 12:10:20 +0000 (13:10 +0100)]
docs: add release notes for 17.2.3

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 28dc4b64f2f75dc0a0a98e2b97f1dd3350f50e2d)

7 years agoglsl/linker: produce error when invalid explicit locations are used
Iago Toral Quiroga [Mon, 16 Oct 2017 10:43:52 +0000 (12:43 +0200)]
glsl/linker: produce error when invalid explicit locations are used

We only need to add a check to validate output locations here. For
inputs with invalid locations we will fail to link when we can't
find a matching output in the same (invalid) location.

v2: compute location slots properly depending on shader stage and
    variable type / direction

Fixes:
KHR-GL45.enhanced_layouts.varying_location_limit

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoi965/sbe: fix active components for SSO programs with over 16 inputs
Iago Toral Quiroga [Fri, 13 Oct 2017 07:22:54 +0000 (09:22 +0200)]
i965/sbe: fix active components for SSO programs with over 16 inputs

When we have up to 16 FS inputs, the SF unit will reorder our inputs
to be consecutive. However, when we have more than 16 we need
to read our inputs from the URB exactly as they have been
output from the previous stage. This means that for SSO we have to
consider if we have URB padding due to unused input locations.

Specifically, this affects gen9 active components programming, since
for things to work in scenarios with over 16 inputs that have padded
regions we need to ensure that we program active components for the
padded regions too. If we don't do this the hardware won't read
the URB properly for inputs located after padded regions.

Found empirically.

Fixes (these also require a patch in CTS):
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Do not log a perf warning when mapping an idle bo
Chris Wilson [Wed, 18 Oct 2017 08:49:31 +0000 (09:49 +0100)]
i965: Do not log a perf warning when mapping an idle bo

We only want to scare the user away from causing a GPU stall for mapping
a busy bo. The time taken to instantiate the set of pages for a buffer
and their mmapping is unavoidable and flagging idle bo as being busy is
"crying wolf".

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Use a union to bitcast a float
Matt Turner [Thu, 19 Oct 2017 05:16:05 +0000 (22:16 -0700)]
i965: Use a union to bitcast a float

... which does not break C's aliasing rules.
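
For illustration, a union-based bitcast in C can look like the sketch below. The function name is hypothetical and this is not the i965 code; it assumes a 32-bit float.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret the bits of a float as a 32-bit integer through a union.
 * Reading a union member other than the one last written is permitted
 * in C, unlike dereferencing a float through a uint32_t pointer. */
static uint32_t
float_to_bits(float f)
{
   union {
      float f;
      uint32_t u;
   } bits;

   assert(sizeof(float) == sizeof(uint32_t));
   bits.f = f;
   return bits.u;
}
```

On an IEEE 754 platform, float_to_bits(1.0f) yields 0x3f800000.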

7 years agodrirc: Group a few games in the glthread whitelist together.
Darren Salt [Sun, 15 Oct 2017 22:22:22 +0000 (23:22 +0100)]
drirc: Group a few games in the glthread whitelist together.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
7 years agodrirc: Enable glthread for more games (Saints Row 4 & Gat out of Hell).
Darren Salt [Sun, 15 Oct 2017 22:22:21 +0000 (23:22 +0100)]
drirc: Enable glthread for more games (Saints Row 4 & Gat out of Hell).

“Saints Row: Gat out of Hell” benefits from this on slower CPUs in that
usage spikes on individual cores are avoided, which in turn makes it harder
to hit a bug which causes broken audio and the game to hang on exit.

“Saints Row IV” appears to be fine either way, but also exhibits the audio
breakage bug: glthread is therefore being enabled on the grounds that it should
make it a little harder to hit that bug.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradv: reset dirty flags after flushing all states
Samuel Pitoiset [Wed, 18 Oct 2017 12:09:27 +0000 (14:09 +0200)]
radv: reset dirty flags after flushing all states

Move it to radv_cmd_buffer_flush_state() because if
rasterizerDiscardEnable is true, the flags are not cleared.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: do not re-emit the index buffer for every draw call
Samuel Pitoiset [Wed, 18 Oct 2017 12:17:23 +0000 (14:17 +0200)]
radv: do not re-emit the index buffer for every draw call

The index buffer can only change when CmdBindIndexBuffer() is called
or when a secondary command buffer is used. The latter does not always
change it, but let's re-emit the packets in that situation for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: remove useless mask operation in radv_cs_emit_draw_indexed_packet()
Samuel Pitoiset [Wed, 18 Oct 2017 12:17:22 +0000 (14:17 +0200)]
radv: remove useless mask operation in radv_cs_emit_draw_indexed_packet()

This saves a few CPU cycles when CmdDrawIndexed() is used a lot.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: Do not read from the disk cache with RADV_DEBUG=nocache.
Bas Nieuwenhuizen [Mon, 16 Oct 2017 11:54:02 +0000 (13:54 +0200)]
radv: Do not read from the disk cache with RADV_DEBUG=nocache.

Otherwise the flag is borderline useless.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Set active_stages after getting cached shaders
Alex Smith [Wed, 18 Oct 2017 13:47:51 +0000 (14:47 +0100)]
radv: Set active_stages after getting cached shaders

Fixes: 7d45d22fdd2e ("radv: switch to using radv_create_shaders()")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: Don't free NIR shaders if tracing
Alex Smith [Wed, 18 Oct 2017 14:08:20 +0000 (15:08 +0100)]
radv: Don't free NIR shaders if tracing

Fixes a crash while generating a hang report.

Fixes: 7d45d22fdd2e ("radv: switch to using radv_create_shaders()")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoRevert "egl: move alloc & init out of _eglBuiltInDriver{DRI2,Haiku}"
Marek Olšák [Wed, 18 Oct 2017 18:23:00 +0000 (20:23 +0200)]
Revert "egl: move alloc & init out of _eglBuiltInDriver{DRI2,Haiku}"

This reverts commit 8cb84c8477a57ed05d703669fee1770f31b76ae6.

This fixes crashing shader-db/run.

7 years agoRevert "egl: drop EGL driver `name`"
Marek Olšák [Wed, 18 Oct 2017 18:22:58 +0000 (20:22 +0200)]
Revert "egl: drop EGL driver `name`"

This reverts commit 6414d6bd8d2897f4ba643357fe3037f3acd60879.

This is needed to apply the next revert.

7 years agost/mesa: set dimension for constants in ATI_fragment_shader
Miklós Máté [Sun, 15 Oct 2017 17:46:03 +0000 (19:46 +0200)]
st/mesa: set dimension for constants in ATI_fragment_shader

This fixes an assertion failure introduced by 30a2f0dfd46de.

Fixes: 30a2f0dfd46 ("radeonsi: add an assertion that only
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
7 years agost/osmesa: include u_inlines.h for pipe_resource_reference
Michel Dänzer [Wed, 18 Oct 2017 16:44:58 +0000 (18:44 +0200)]
st/osmesa: include u_inlines.h for pipe_resource_reference

Fixes build failure due to unresolved symbol.

Fixes: 7561da367bae "st/mesa: Initialize textures array in
                     st_framebuffer_validate"

Trivial.

7 years agost/mesa: Initialize textures array in st_framebuffer_validate
Michel Dänzer [Mon, 16 Oct 2017 14:35:18 +0000 (16:35 +0200)]
st/mesa: Initialize textures array in st_framebuffer_validate

And just reference pipe_resources to it in the validate callbacks.

Avoids pipe_resource leaks when st_framebuffer_validate ends up calling
the validate callback multiple times, e.g. when a window is resized.

v2:
* Use generic stable tag instead of Fixes: tag, since the problem could
  already happen before the commit referenced in v1 (Thomas Hellstrom)
* Use memset to initialize the array on the stack instead of allocating
  the array with os_calloc.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
7 years agoegl: set UseFallback if LIBGL_ALWAYS_SOFTWARE is set
Eric Engestrom [Wed, 18 Oct 2017 16:04:27 +0000 (17:04 +0100)]
egl: set UseFallback if LIBGL_ALWAYS_SOFTWARE is set

Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoegl: drop EGL driver `name`
Eric Engestrom [Wed, 18 Oct 2017 15:32:33 +0000 (16:32 +0100)]
egl: drop EGL driver `name`

The "DRI2" name was reported as confusing when printing EGL info (one
user reported thinking DRI3 was not working on his X server), and the
only alternative is Haiku, which can only be used on a Haiku machine.

The name therefore doesn't add any information that the user wouldn't
know already, so let's just drop it.

Cc: Kai Wasserbäch <kai@dev.carbon-project.org>
Suggested-by: Emil Velikov <emil.l.velikov@gmail.com>
Related-to: b174a1ae720cb404738c ("egl: Simplify the "driver" interface")
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoegl: drop always-false TestOnly option
Eric Engestrom [Wed, 18 Oct 2017 15:31:23 +0000 (16:31 +0100)]
egl: drop always-false TestOnly option

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoFix the xf86vm meson dependency
Nicholas Miell [Wed, 18 Oct 2017 01:04:16 +0000 (18:04 -0700)]
Fix the xf86vm meson dependency

The pkg-config file is called xxf86vm.

Signed-off-by: Nicholas Miell <nmiell@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoegl: move alloc & init out of _eglBuiltInDriver{DRI2,Haiku}
Eric Engestrom [Mon, 25 Sep 2017 21:35:24 +0000 (22:35 +0100)]
egl: move alloc & init out of _eglBuiltInDriver{DRI2,Haiku}

Note: dropping the EGL_BAD_ALLOC in egl_haiku because it's
overwritten by the EGL_NOT_INITIALIZED in eglInitialize().

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoegl_dri2: drop dri2_egl_driver struct
Eric Engestrom [Tue, 26 Sep 2017 12:49:05 +0000 (13:49 +0100)]
egl_dri2: drop dri2_egl_driver struct

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoegl_dri2: move glFlush out of struct dri2_egl_driver
Eric Engestrom [Tue, 26 Sep 2017 11:16:33 +0000 (12:16 +0100)]
egl_dri2: move glFlush out of struct dri2_egl_driver

There's no reason to store this there, it doesn't depend on the driver.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agollvmpipe: handle shader sample mask output
Roland Scheidegger [Tue, 17 Oct 2017 19:55:03 +0000 (21:55 +0200)]
llvmpipe: handle shader sample mask output

This probably isn't all that useful for GL, but there are apis where
sample_mask is a valid output even without msaa.
Just discard the pixel if the sample_mask doesn't include the bit for
sample 0.

Reviewed-by: Brian Paul <brianp@vmware.com>
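
The rule described here can be sketched as below, assuming a group of pixels tracked by a per-pixel "live" bitmask. Both helpers are hypothetical scalar illustrations, not the llvmpipe code, which generates vectorized code for this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Without MSAA only sample 0 exists, so only bit 0 of the shader's
 * sample mask output matters. */
static bool
sample0_covered(uint32_t sample_mask)
{
   return (sample_mask & 1u) != 0;
}

/* Clears the live bit of every pixel whose sample mask output does not
 * include sample 0. Bit i of 'live' corresponds to sample_mask[i]. */
static uint32_t
discard_by_sample_mask(uint32_t live, const uint32_t *sample_mask,
                       unsigned count)
{
   assert(count <= 32);
   for (unsigned i = 0; i < count; i++) {
      if (!sample0_covered(sample_mask[i]))
         live &= ~(UINT32_C(1) << i);
   }
   return live;
}
```

For four live pixels with sample masks 1, 0, 2 and 3, only the first and last pixel remain live.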
7 years agoanv: Fix instance typos.
Vinson Lee [Wed, 18 Oct 2017 08:12:27 +0000 (08:12 +0000)]
anv: Fix instance typos.

Fix build error.

  CC       vulkan/vulkan_libvulkan_common_la-anv_device.lo
In file included from vulkan/anv_device.c:33:0:
vulkan/anv_device.c: In function ‘anv_AllocateMemory’:
vulkan/anv_device.c:1562:37: error: ‘struct anv_device’ has no member named ‘instace’; did you mean ‘instance’?
          result = vk_errorf(device->instace, device,
                                     ^
vulkan/anv_private.h:317:17: note: in definition of macro ‘vk_errorf’
     __vk_errorf(instance, obj, REPORT_OBJECT_TYPE(obj), error,\
                 ^~~~~~~~

Fixes: 9775894f1025 ("anv: Move size check from anv_bo_cache_import() to caller (v2)")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomesa: fix trivial typo in _mesa_PixelMapusv() error string
Brian Paul [Wed, 18 Oct 2017 15:44:13 +0000 (09:44 -0600)]
mesa: fix trivial typo in _mesa_PixelMapusv() error string

Signed-off-by: Brian Paul <brianp@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103323

7 years agomeson: move expat dependency where it's needed
Eric Engestrom [Wed, 18 Oct 2017 11:06:16 +0000 (12:06 +0100)]
meson: move expat dependency where it's needed

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoautomake: intel: move expat handling where it's used
Hongxu Jia [Wed, 18 Oct 2017 01:47:05 +0000 (09:47 +0800)]
automake: intel: move expat handling where it's used

Linking libvulkan_intel.so can fail, due to unresolved references to
libexpat.so.

EXPAT_CFLAGS should be moved as well.

Signed-off-by: Hongxu Jia <hongxu.jia@windriver.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoradv: don't create dummy fs when compiling compute stage
Timothy Arceri [Wed, 18 Oct 2017 02:58:36 +0000 (13:58 +1100)]
radv: don't create dummy fs when compiling compute stage

Fixes: d1c9f30d7ff7 "radv: add radv_create_shaders() helper"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: use the dispatch initiator for indirect dispatches
Samuel Pitoiset [Tue, 17 Oct 2017 10:02:00 +0000 (12:02 +0200)]
radv: use the dispatch initiator for indirect dispatches

Missed that when I allowed waves to be launched out-of-order.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: remove XtoY_temps structs
Samuel Pitoiset [Tue, 17 Oct 2017 09:04:36 +0000 (11:04 +0200)]
radv: remove XtoY_temps structs

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoanv: Install as Vulkan HAL module in Android.mk build
Tapani Pälli [Thu, 14 Sep 2017 06:57:40 +0000 (09:57 +0300)]
anv: Install as Vulkan HAL module in Android.mk build

Now that anvil fully implements the Vulkan HAL interface, we can install
it as the vendor HAL module at /vendor/lib/hw/vulkan.${board}.so. To do
so:

  - Rename LOCAL_MODULE to vulkan.$(TARGET_BOARD_PLATFORM).
  - Use LOCAL_PROPRIETARY_MODULE to install under vendor path.

Tested by running different Sascha Willems demos on Android-IA.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
[chadv: Extract this hunk from Tapani's patch, and embed it as
 stand-alone patch in my arc-vulkan series].
Signed-off-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Implement VK_ANDROID_native_buffer (v9)
Chad Versace [Tue, 15 Nov 2016 00:13:51 +0000 (16:13 -0800)]
anv: Implement VK_ANDROID_native_buffer (v9)

This implementation is correct (afaict), but takes two shortcuts
regarding the import/export of Android sync fds.

  Shortcut 1. When Android calls vkAcquireImageANDROID to import a sync
  fd into a VkSemaphore or VkFence, the driver instead simply blocks on
  the sync fd, then puts the VkSemaphore or VkFence into the signalled
  state. Thanks to implicit sync, this produces correct behavior (with
  extra latency overhead, perhaps) despite its ugliness.

  Shortcut 2. When Android calls vkQueueSignalReleaseImageANDROID to export
  a collection of wait semaphores as a sync fd, the driver instead
  submits the semaphores to the queue, then returns sync fd -1, which
  informs the caller that no additional synchronization is needed.
  Again, thanks to implicit sync, this produces correct behavior (with
  extra batch submission overhead) despite its ugliness.

I chose to take the shortcuts instead of properly importing/exporting
the sync fds for two reasons:

  Reason 1. I've already tested this patch with dEQP and with demo
  apps. It works. I wanted to get the tested patches into the tree now,
  and polish the implementation afterwards.

  Reason 2. I want to run this on a 3.18 kernel (gasp!). In 3.18, i915
  supports neither Android's sync_fence, nor upstream's sync_file, nor
  drm_syncobj. Again, I tested these patches on Android with a 3.18
  kernel and they work.

I plan to quickly follow up with patches that remove the shortcuts and
properly import/export the sync fds.
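
Shortcut 1 above can be sketched roughly as follows. This is only an
illustration, not the code in this patch: wait_on_fd() and
acquire_image_blocking() are hypothetical names, the bool stands in for
the VkSemaphore/VkFence state, and it assumes the sync fd reports
completion as POLLIN and that the callee owns (and closes) the fd.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Block until fd is readable or timeout_ms elapses (-1 waits forever).
 * Returns 0 on success, -1 with errno set on timeout or error. */
static int wait_on_fd(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    if (ret == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Hypothetical stand-in for the shortcut: instead of importing the
 * sync fd, block on it and then mark the object as signalled.
 * An fd of -1 is treated as "nothing to wait for". */
static int acquire_image_blocking(int native_fence_fd, bool *signalled)
{
    assert(signalled != NULL);

    if (native_fence_fd >= 0) {
        if (wait_on_fd(native_fence_fd, -1) != 0)
            return -1;
        close(native_fence_fd); /* assumption: callee owns the fd */
    }

    *signalled = true;
    return 0;
}
```

The cost of this approach is the one named above: the caller stalls on
the CPU until the fd signals, instead of letting the GPU queue wait.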

Non-Testing
===========
I did not test at all using the Android.mk buildsystem. I may have broken
it. Please test and review that.

Testing
=======
I tested with 64-bit ARC++ on a Skylake Chromebook and a 3.18 kernel.
The following pass (as of patchset v9):

  - a little spinning cube demo APK
  - several Sascha demos
  - dEQP-VK.info.*
  - dEQP-VK.api.wsi.android.*
      (except dEQP-VK.api.wsi.android.swapchain.*.image_usage, because
      dEQP wants to create swapchains with VK_IMAGE_USAGE_STORAGE_BIT)
  - dEQP-VK.api.smoke.*
  - dEQP-VK.api.info.instance.*
  - dEQP-VK.api.info.device.*

v2:
  - Reject VkNativeBufferANDROID if the dma-buf's size is too small for
    the VkImage.
  - Stop abusing VkNativeBufferANDROID by passing it to vkAllocateMemory
    during vkCreateImage. Instead, directly import its dma-buf during
    vkCreateImage with anv_bo_cache_import(). [for jekstrand]
  - Rebase onto Tapani's VK_EXT_debug_report changes.
  - Drop `CPPFLAGS += $(top_srcdir)/include/android`. The dir does not
    exist.

v3:
  - Delete duplicate #include "anv_private.h". [per Tapani]
  - Try to fix the Android-IA build in Android.vulkan.mk by following
    Tapani's example.

v4:
  - Unset EXEC_OBJECT_ASYNC and set EXEC_OBJECT_WRITE on the imported
    gralloc buffer, just as we do for all other winsys buffers in
    anv_wsi.c. [found by Tapani]

v5:
  - Really fix the Android-IA build by ensuring that Android.vulkan.mk
    uses Mesa's vulkan.h and not Android's.  Insert -I$(MESA_TOP)/include
    before -Iframeworks/native/vulkan/include. [for Tapani]
  - In vkAcquireImageANDROID, submit signal operations to the
    VkSemaphore and VkFence. [for zhou]

v6:
  - Drop copy-paste duplication in vkGetSwapchainGrallocUsageANDROID().
    [found by zhou]
  - Improve comments in vkGetSwapchainGrallocUsageANDROID().

v7:
  - Fix vkGetSwapchainGrallocUsageANDROID() to inspect its
    VkImageUsageFlags parameter. [for tfiga]
  - This fix regresses dEQP-VK.api.wsi.android.swapchain.*.image_usage
    because dEQP wants to create swapchains with
    VK_IMAGE_USAGE_STORAGE_BIT.

v8:
  - Drop unneeded goto in vkAcquireImageANDROID. [for tfiga]

v8.1: (minor changes)
  - Drop errant hunks added by rerere in anv_device.c.
  - Drop explicit mention of VK_ANDROID_native_buffer in
    anv_entrypoints_gen.py. [for jekstrand]

v9:
  - Isolate as much Android code as possible, moving it from anv_image.c
    to anv_android.c. Connect the files with anv_image_from_gralloc().
    Remove VkNativeBufferANDROID params from all anv_image.c
    funcs. [for krh]
  - Replace some intel_loge() with vk_errorf() in anv_android.c.
  - Use © in copyright line. [for krh]

Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v5)
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> (v9)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v9)
Cc: zhoucm1 <david1.zhou@amd.com>
Cc: Tomasz Figa <tfiga@chromium.org>
7 years agoanv: Move size check from anv_bo_cache_import() to caller (v2)
Chad Versace [Tue, 12 Sep 2017 21:05:08 +0000 (14:05 -0700)]
anv: Move size check from anv_bo_cache_import() to caller (v2)

This change prepares for VK_ANDROID_native_buffer. When the user imports
a gralloc handle into a VkImage using VK_ANDROID_native_buffer, the user
provides no size. The driver must infer the size from the internals of
the gralloc buffer.

The patch is essentially a refactor patch, but it does change behavior
in some edge cases, described below. In what follows, the "nominal size"
of the bo refers to anv_bo::size, which may not match the bo's "actual
size" according to the kernel.

Post-patch, the nominal size of the bo returned from
anv_bo_cache_import() is always the size of the imported dma-buf according
to lseek(). Pre-patch, the bo's nominal size was difficult to predict.
If the imported dma-buf's gem handle was not resident in the cache, then
the bo's nominal size was align(VkMemoryAllocateInfo::allocationSize,
4096).  If it *was* resident, then the bo's nominal size was whatever
the cache returned. As a consequence, the first cache insert decided the
bo's nominal size, which could be significantly smaller compared to the
dma-buf's actual size, as the nominal size was determined by
VkMemoryAllocateInfo::allocationSize and not lseek().

I believe this patch cleans up that messy behavior. For an imported or
exported VkDeviceMemory, anv_bo::size should now be the true size of the
bo, if I correctly understand the problem (which I possibly don't).

v2:
  - Preserve behavior of aligning size to 4096 before checking. [for
    jekstrand]
  - Check size with < instead of <=, to match behavior of commit c0a4f56
    "anv: bo_cache: allow importing a BO larger than needed". [for
    chadv]
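
The caller-side check described above can be sketched like this. It is
an illustration under assumptions, not the code in this patch: the
helper names align_u64(), fd_size() and dma_buf_large_enough() are
hypothetical, and the 4096 alignment and the `<` comparison are taken
from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Round v up to the next multiple of a; a must be a power of two. */
static uint64_t align_u64(uint64_t v, uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Size of the object behind fd according to lseek(), or -1 on error. */
static int64_t fd_size(int fd)
{
    off_t size = lseek(fd, 0, SEEK_END);
    return size == (off_t)-1 ? -1 : (int64_t)size;
}

/* Reject the import only when the buffer is smaller than the aligned
 * request. Using `<` (not `<=`) accepts an exact fit, and a buffer
 * larger than needed is accepted as well. */
static bool dma_buf_large_enough(uint64_t actual_size,
                                 uint64_t allocation_size)
{
    return !(actual_size < align_u64(allocation_size, 4096));
}
```

For example, a request of 4097 bytes is aligned to 8192, so an 8192-byte
buffer passes and a 4096-byte buffer is rejected.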

7 years agomeson: turn on pl111 not vc4 when pl111 driver specified
Dylan Baker [Tue, 17 Oct 2017 21:44:15 +0000 (14:44 -0700)]
meson: turn on pl111 not vc4 when pl111 driver specified

Reviewed-by: Eric Anholt <eric@anholt.net>
fixes: 1918c9b1627d5403 ("meson: Add support for the pl111 driver.")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
7 years agoradv: Link shaders.
Bas Nieuwenhuizen [Wed, 8 Feb 2017 23:12:10 +0000 (00:12 +0100)]
radv: Link shaders.

Here we make use of the NIR linking helpers to remove unused
varyings.

Sascha Willems demo results:

computecullandlod 39 -> 41 fps
pipelines ~6100 -> ~6200 fps

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: reuse the multiple shader store & load functions for gs copy variant
Timothy Arceri [Fri, 13 Oct 2017 01:22:24 +0000 (12:22 +1100)]
radv: reuse the multiple shader store & load functions for gs copy variant

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: remove some now unused shader compile code
Timothy Arceri [Fri, 13 Oct 2017 01:02:18 +0000 (12:02 +1100)]
radv: remove some now unused shader compile code

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: switch to using radv_create_shaders()
Timothy Arceri [Sat, 14 Oct 2017 22:56:01 +0000 (09:56 +1100)]
radv: switch to using radv_create_shaders()

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: add radv_create_shaders() helper
Bas Nieuwenhuizen [Mon, 16 Oct 2017 22:45:06 +0000 (09:45 +1100)]
radv: add radv_create_shaders() helper

This is a combined shader creation helper that will help us to
create the shaders for each stage at once. This will allow us to
do some link time optimisations.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add radv_hash_shaders() helper
Bas Nieuwenhuizen [Sat, 14 Oct 2017 01:42:40 +0000 (12:42 +1100)]
radv: add radv_hash_shaders() helper

This will be used to create a hash of the combined shaders in the
pipeline.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Add multiple shader cache store & load functions.
Bas Nieuwenhuizen [Thu, 1 Dec 2016 22:07:57 +0000 (23:07 +0100)]
radv: Add multiple shader cache store & load functions.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Change cache datastructures for combined pipelines.
Bas Nieuwenhuizen [Thu, 1 Dec 2016 08:03:50 +0000 (09:03 +0100)]
radv: Change cache datastructures for combined pipelines.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: reorder init function calls
Timothy Arceri [Sat, 14 Oct 2017 02:14:32 +0000 (13:14 +1100)]
radv: reorder init function calls

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agomeson: Add support for the vc5 driver.
Eric Anholt [Fri, 13 Oct 2017 01:40:16 +0000 (18:40 -0700)]
meson: Add support for the vc5 driver.

v2: Default vc5 to off, since it requires the simulator currently.  Add
    missing dep on the XML generation from libbroadcom_vc5.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v1)
7 years agomeson: Add support for the pl111 driver.
Eric Anholt [Fri, 13 Oct 2017 01:39:08 +0000 (18:39 -0700)]
meson: Add support for the pl111 driver.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agomeson: Add support for the vc4 driver.
Eric Anholt [Thu, 12 Oct 2017 20:53:12 +0000 (13:53 -0700)]
meson: Add support for the vc4 driver.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agoradeonsi: if there's just const buffer 0, set it in place of CONST/SSBO pointer
Marek Olšák [Sun, 8 Oct 2017 01:44:07 +0000 (03:44 +0200)]
radeonsi: if there's just const buffer 0, set it in place of CONST/SSBO pointer

SI_SGPR_CONST_AND_SHADER_BUFFERS now contains the pointer to const buffer 0
if there is no other buffer there.

Benefits:
- there is no constbuf descriptor upload and shader load

It's assumed that all constant addresses are within bounds. Non-constant
addresses are clamped against the last declared CONST variable.
This only works if the state tracker ensures the bound constant buffer
matches what the shader needs.

Once we get 32-bit pointers, we can only do this for user constant buffers
where the driver is in charge of the upload so that it can guarantee a 32-bit
address.

The real performance benefit might not be measurable.

These apps get 100% theoretical benefit in all shaders (except where noted):
- antichamber
- batman arkham origins
- borderlands 2
- borderlands pre-sequel
- brutal legend
- civilization BE
- CS:GO
- deadcore
- dota 2 -- most shaders
- europa universalis
- grid autosport -- most shaders
- left 4 dead 2
- legend of grimrock
- life is strange
- payday 2
- portal
- rocket league
- serious sam 3 bfe
- talos principle
- team fortress 2
- thea
- unigine heaven
- unigine valley -- also sanctuary and tropics
- wasteland 2
- xcom: enemy unknown & enemy within
- tesseract
- unity (engine)

Changed stats only:
    SGPRS: 2059998 -> 2086238 (1.27 %)
    VGPRS: 1626888 -> 1626904 (0.00 %)
    Spilled SGPRs: 7902 -> 7865 (-0.47 %)
    Code Size: 60924520 -> 60982660 (0.10 %) bytes
    Max Waves: 374539 -> 374526 (-0.00 %)
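
The percentages above appear to be plain relative changes,
(after - before) / before * 100. A minimal sketch for re-checking them,
where percent_change() is a hypothetical helper and not part of any
stats tooling:

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

With the SGPRS row, percent_change(2059998, 2086238) is about 1.27, and
the Spilled SGPRs row gives about -0.47, matching the figures listed.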

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>