mesa.git
5 years agoradeonsi: Bump number of allowed global buffers to 32
Jan Vesely [Thu, 18 Oct 2018 19:15:06 +0000 (15:15 -0400)]
radeonsi: Bump number of allowed global buffers to 32

Fixes assertion failure/crash when running luxmark/luxball on clover.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108272
CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradv: fix check for perftest options size
Andres Rodriguez [Thu, 18 Oct 2018 19:32:31 +0000 (15:32 -0400)]
radv: fix check for perftest options size

It was using the debug options array size.

CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradeonsi: fix incorrect hw screen offset and guardband computation
Marek Olšák [Thu, 18 Oct 2018 18:42:42 +0000 (14:42 -0400)]
radeonsi: fix incorrect hw screen offset and guardband computation

It resulted in assertion failures or incorrect rendering.

Broken by: 9e182b8313c5ab952498a76495f57e8420f9e5ad

5 years agovulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching
Jason Ekstrand [Thu, 18 Oct 2018 15:08:32 +0000 (10:08 -0500)]
vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching

This lets us avoid passing the DRM fd around all over the place and gets
us closer to layer utopia.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoloader/dri3: Also wait for front buffer fence if we triggered it
Michel Dänzer [Mon, 1 Oct 2018 16:43:46 +0000 (18:43 +0200)]
loader/dri3: Also wait for front buffer fence if we triggered it

In that case, we have to wait for the fence to synchronize with the
corresponding drawing we triggered in the X server.

Fixes incorrect display with the i965 driver and some applications, e.g.
solvespace.

Bugzilla: https://bugs.freedesktop.org/108097
Fixes: aefac10fecc9 "loader/dri3: Only wait for back buffer fences in
                     dri3_get_buffer"
Tested-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
5 years agoanv: Stop generating weak references for instance entrypoints
Jason Ekstrand [Mon, 15 Oct 2018 02:56:47 +0000 (21:56 -0500)]
anv: Stop generating weak references for instance entrypoints

We don't need weak references to instance entrypoints because we never
have more than one of each so we don't need the NULL fall-back.  This
also helps us avoid forgetting things because we now get link errors for
missing instance entrypoints.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agovulkan/wsi: Implement GetPhysicalDevicePresentRectanglesKHR
Jason Ekstrand [Mon, 15 Oct 2018 02:56:34 +0000 (21:56 -0500)]
vulkan/wsi: Implement GetPhysicalDevicePresentRectanglesKHR

This got missed during 1.1 enabling because it was defined as an
interaction between device groups and WSI and it wasn't obvious it was
in the delta.

The idea behind it is that it's supposed to provide a hint to the
application in a multi-GPU setup to indicate which regions of the screen
are being scanned out by which GPU so a multi-device split-screen
rendering application can render each part of the screen on the GPU that
will be presenting it and avoid extra bus traffic between GPUs.  On a
single-GPU setup or one which doesn't support this present mode, we need
to do something.  We choose to return the window size (or a max-size
rect) if the compositor, X server, or crtc is associated with the given
physical device and zero rectangles otherwise.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agovulkan/wsi: Store the instance allocator in wsi_device
Jason Ekstrand [Wed, 17 Oct 2018 19:35:16 +0000 (14:35 -0500)]
vulkan/wsi: Store the instance allocator in wsi_device

We already have wsi_device and we know the instance allocator at
wsi_device_init time so there's no need to pass it into the physical
device queries.  This also fixes a memory allocation domain bug that can
occur if CreateSwapchain gets called prior to any queries (not likely)
in which case the cached connection gets allocated off the device
instead of the instance.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agost/xlib: Use more appropriate include guard
Michał Janiszewski [Tue, 16 Oct 2018 21:44:22 +0000 (23:44 +0200)]
st/xlib: Use more appropriate include guard

Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com
5 years agogallium: Fix mismatched ifdef-guards
Michał Janiszewski [Tue, 16 Oct 2018 21:44:21 +0000 (23:44 +0200)]
gallium: Fix mismatched ifdef-guards

Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agosoftpipe: dynamically allocate space for immediate constants
Gert Wollny [Tue, 16 Oct 2018 08:07:49 +0000 (10:07 +0200)]
softpipe: dynamically allocate space for immediate constants

The number of immediate constants was fixed and the size check was
only done by means of an assertion. Given this a shader that emits
more immediate constants would result in a memory corruption when
mesa is build in release mode.

Instead of using this fixed limit allocate the space dynamically, let it
grow as needed, and also remove the unused ImmArray.

Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agoradv: use nir_shrink_vec_array_vars()
Timothy Arceri [Wed, 17 Oct 2018 23:23:42 +0000 (10:23 +1100)]
radv: use nir_shrink_vec_array_vars()

Totals from affected shaders:
SGPRS: 1096 -> 1096 (0.00 %)
VGPRS: 1192 -> 1056 (-11.41 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 100940 -> 94384 (-6.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 100 -> 112 (12.00 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from Batman Arkham City.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: use nir_split_array_vars()
Timothy Arceri [Wed, 17 Oct 2018 23:19:16 +0000 (10:19 +1100)]
radv: use nir_split_array_vars()

We call in the opt loop in case another pass results in an
array with indirect access being turned into direct access.

Totals from affected shaders:
SGPRS: 512 -> 496 (-3.12 %)
VGPRS: 456 -> 452 (-0.88 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 40040 -> 39664 (-0.94 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 41 -> 43 (4.88 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from Batman Arkham City.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: use nir_opt_find_array_copies()
Timothy Arceri [Wed, 17 Oct 2018 22:42:17 +0000 (09:42 +1100)]
radv: use nir_opt_find_array_copies()

Totals from affected shaders:
SGPRS: 1112 -> 1112 (0.00 %)
VGPRS: 1492 -> 1196 (-19.84 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 112172 -> 101316 (-9.68 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 93 -> 98 (5.38 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from "Batman: Arkham City" over DXVK.

The pass detects that the temporary array created by DXVK for
storing TCS inputs is a copy of the input arrays and allows
us to avoid copying all of the input data and then indirecting
on it with if-ladders, instead we just do indirect indexing.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars
Timothy Arceri [Wed, 17 Oct 2018 21:55:46 +0000 (08:55 +1100)]
radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars

Totals from affected shaders:
SGPRS: 2856 -> 2856 (0.00 %)
VGPRS: 3236 -> 3248 (0.37 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 236560 -> 233548 (-1.27 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 277 -> 283 (2.17 %)
Wait states: 0 -> 0 (0.00 %)

Even in the cases were we have increased VGPR use it appears
the NIR is improved significantly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agovulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]
Keith Packard [Thu, 11 Oct 2018 23:05:18 +0000 (16:05 -0700)]
vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]

Offers three clocks, device, clock monotonic and clock monotonic
raw. Could use some kernel support to reduce the deviation between
clock values.

v2:
Ensure deviation is at least as big as the GPU time interval.

v3:
Set device->lost when returning DEVICE_LOST.
Use MAX2 and DIV_ROUND_UP instead of open coding these.
Delete spurious TIMESTAMP in radv version.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v4:
Add anv_gem_reg_read to anv_gem_stubs.c

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v5:
Adjust maxDeviation computation to max(sampled_clock_period) +
sample_interval.

Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agointel/compiler/icl: Use invocation id bits 22:16 instead of 23:17
Topi Pohjolainen [Tue, 16 Oct 2018 11:56:51 +0000 (07:56 -0400)]
intel/compiler/icl: Use invocation id bits 22:16 instead of 23:17

Identifier bits in the dispatch header have changed. See Bspec:

SINGLE_PATCH Payload:

3D Pipeline Stages - 3D Pipeline Geometry -
Hull Shader (HS) Stage IVB+ - Payloads IVB+

Fixes: KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
5 years agoFix setting indent-tabs-mode in the Emacs .dir-locals.el files
Neil Roberts [Wed, 17 Oct 2018 15:27:07 +0000 (17:27 +0200)]
Fix setting indent-tabs-mode in the Emacs .dir-locals.el files

Some of the .dir-locals.el had the wrong name for the truthy value so
it wasn’t setting indent-tabs-mode.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agofreedreno/a6xx: don't allocate binning rb
Rob Clark [Mon, 15 Oct 2018 13:56:35 +0000 (09:56 -0400)]
freedreno/a6xx: don't allocate binning rb

Now that a single cmdstream is used for both binning and draw passes, we
can skip allocation of cmdstream buffer for binning.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: single cmdstream for draw+binning
Rob Clark [Sat, 13 Oct 2018 17:56:05 +0000 (13:56 -0400)]
freedreno/a6xx: single cmdstream for draw+binning

Now that state which is different for draw vs binning pass is split out
into different state-groups with appropriate enable_mask (so the
appropriate one is chosen for draw vs binning), switch over to using a
single cmdstream for both passes.

This should significantly lower draw overhead for CPU bound benchmarks.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: split binning vs draw program stateobj's
Rob Clark [Sat, 13 Oct 2018 17:54:32 +0000 (13:54 -0400)]
freedreno/a6xx: split binning vs draw program stateobj's

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: split VBO state into binning/draw variants
Rob Clark [Sat, 13 Oct 2018 17:51:33 +0000 (13:51 -0400)]
freedreno/a6xx: split VBO state into binning/draw variants

Blob seems to manage to use same input registers for BS (binning pass)
vs VS (draw pass) shaders, so it can use the same VBO state for both.
We can't quite do that yet, so split them.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: move VBO state to stateobj
Rob Clark [Sat, 13 Oct 2018 17:24:24 +0000 (13:24 -0400)]
freedreno/a6xx: move VBO state to stateobj

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: move ZSA state to stateobj
Rob Clark [Sat, 13 Oct 2018 17:04:16 +0000 (13:04 -0400)]
freedreno/a6xx: move ZSA state to stateobj

Step towards single cmdstream, where we need different state-group-id's
for binning vs draw ZSA state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: remove vismode param
Rob Clark [Fri, 12 Oct 2018 20:29:22 +0000 (16:29 -0400)]
freedreno/a6xx: remove vismode param

We don't need to keep this IGNORE_VISIBILITY in binning pass.  Prep work
for using single cmdstream for both draw and binning passes.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/ir3: move binning-pass fixup for a6xx+
Rob Clark [Fri, 12 Oct 2018 20:01:22 +0000 (16:01 -0400)]
freedreno/ir3: move binning-pass fixup for a6xx+

Move this to after ir3_cp (which can add lowered immediates to the const
state) for a6xx+, to ensure the uniform state matches between binning
and vertex shaders.  This way we can emit just a single VS_CONST state-
group when we re-use single cmdstream for both binning and draw passes.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: a bit more state emit cleanup
Rob Clark [Fri, 12 Oct 2018 19:10:46 +0000 (15:10 -0400)]
freedreno/a6xx: a bit more state emit cleanup

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: move framebuffer state emit to emit_mrt()
Rob Clark [Fri, 12 Oct 2018 18:47:13 +0000 (14:47 -0400)]
freedreno/a6xx: move framebuffer state emit to emit_mrt()

No point in checking this per-draw, since framebuffer change means new
batch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: small emit_mrt() cleanup
Rob Clark [Fri, 12 Oct 2018 18:41:14 +0000 (14:41 -0400)]
freedreno/a6xx: small emit_mrt() cleanup

On a6xx, this is only used for pfb->cbufs so we can just directly pass
the pfb state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: use program cache
Rob Clark [Wed, 10 Oct 2018 19:58:57 +0000 (15:58 -0400)]
freedreno/a6xx: use program cache

Use the in-memory cache to construct shader program state and re-use it
on subsequent draws, to lower driver overhead.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/ir3: shader variant cache
Rob Clark [Tue, 24 Jul 2018 18:12:24 +0000 (14:12 -0400)]
freedreno/ir3: shader variant cache

Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus
shader variant key to a generation specific state object.

This could eventually replace the linked list of shader variants, but
for now it lets us re-use the work currently done in fdN_program_emit()

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/ir3: move binning_pass out of shader variant key
Rob Clark [Tue, 2 Oct 2018 16:38:09 +0000 (12:38 -0400)]
freedreno/ir3: move binning_pass out of shader variant key

Prep work for a following patch, that introduces a cache to map from
program state (all shader stages) plus variant key to pre-baked hw
state (which could be emit'd via CP_SET_DRAW_STATE, for example).
To do that, we really want the variant key to be immutable, and to
treat the binning pass shader as an extra shader stage, rather than
as a VS variant.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/ir3: track # of samplers used by shader
Rob Clark [Tue, 2 Oct 2018 20:04:39 +0000 (16:04 -0400)]
freedreno/ir3: track # of samplers used by shader

This is useful for a6xx to avoid program state from depending on bound
tex/samp state.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: texture state obj
Rob Clark [Wed, 10 Oct 2018 19:59:29 +0000 (15:59 -0400)]
freedreno/a6xx: texture state obj

Unfortunately gallium doesn't match what the hw wants perfectly here, in
using a separate CSO for each texture/sampler.  So we have to use a hash
table to map the collection of texture/samplers to hw state object.

We probably could use separate hw state objects for texture and sampler
state, but mesa/st tends to update the tex and samp state together.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: add resource seqno
Rob Clark [Fri, 12 Oct 2018 13:24:21 +0000 (09:24 -0400)]
freedreno: add resource seqno

Intended to be something more compact than a 64b pointer, which could be
used as a key into hashtables.  Prep work for texture state objects.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: move const emit to state group
Rob Clark [Sun, 7 Oct 2018 17:59:27 +0000 (13:59 -0400)]
freedreno/a6xx: move const emit to state group

Eventually we want to move nearly everything, but no other state depends
on const state, so this is the easiest one to move first.

For webgl aquarium, this reduces GPU load by about 10%, since for each
fish it does a uniform upload plus draw.. fish frequently are visible in
only a single tile, so this skips the uniform uploads for other tiles.

The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems
to be work an additional 10% gain for aquarium.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: add infrastructure for CP_DRAW_STATE
Rob Clark [Sun, 7 Oct 2018 17:58:30 +0000 (13:58 -0400)]
freedreno/a6xx: add infrastructure for CP_DRAW_STATE

Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE
packet if we have any state-groups.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: reduce resource dependency tracking overhead
Rob Clark [Mon, 1 Oct 2018 18:13:06 +0000 (14:13 -0400)]
freedreno: reduce resource dependency tracking overhead

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: Remove the Emacs mode lines
Neil Roberts [Wed, 17 Oct 2018 15:38:03 +0000 (17:38 +0200)]
freedreno: Remove the Emacs mode lines

These are not necessary because the corresponding settings are set via
the .dir-locals.el file anyway. Most of them were missing a ‘:’ after
“tab-width” which was making Emacs display an annoying warning
whenever you open the file.

This patch was made with:

sed -ri '/-\*- mode:/,/^$/d' \
    $(find src/gallium/{drivers,winsys} -name \*.\[ch\] \
               -exec grep -l -- '-\*- mode:' {} \+)

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: Fix the Emacs indentation configuration file
Neil Roberts [Wed, 17 Oct 2018 15:38:02 +0000 (17:38 +0200)]
freedreno: Fix the Emacs indentation configuration file

The .dir-locals.el had the wrong name for the truthy value so it
wasn’t setting indent-tabs-mode.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: allocate batches from the cache in launch_grid
Hyunjun Ko [Wed, 17 Oct 2018 12:57:28 +0000 (21:57 +0900)]
freedreno: allocate batches from the cache in launch_grid

Needs to allocate batches from the cache so that it could
get a valid index and make resource dependancy tracking right.

In addition this fixes assertion on debug build since the commit
1a40faa8 landed.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: adds nondraw param to fd_bc_alloc_batch
Hyunjun Ko [Wed, 17 Oct 2018 12:57:27 +0000 (21:57 +0900)]
freedreno: adds nondraw param to fd_bc_alloc_batch

Needs to specify nondraw when creating a batch through
fd_bc_alloc_batch since it'd better create a batch through
it rather than fd_batch_create.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: remove fd6_emit_render_cntl()
Rob Clark [Fri, 12 Oct 2018 20:27:59 +0000 (16:27 -0400)]
freedreno/a6xx: remove fd6_emit_render_cntl()

It was dead code carried over from a5xx

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/ir3: fix broken texcoord inputs
Rob Clark [Sat, 15 Sep 2018 18:41:07 +0000 (14:41 -0400)]
freedreno/ir3: fix broken texcoord inputs

TODO not sure if this is best solution, but current logic is broken for
texcoord inputs.  It is definitely the simplest solution.

Fixes: 1a24f519663 freedreno/ir3: ignore unused inputs
Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: fix off-by-one error in BEGIN_RING()
Rob Clark [Sat, 13 Oct 2018 16:34:09 +0000 (12:34 -0400)]
freedreno: fix off-by-one error in BEGIN_RING()

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agoutil: document a limitation of util_fast_udiv32
Marek Olšák [Sun, 23 Sep 2018 00:03:27 +0000 (20:03 -0400)]
util: document a limitation of util_fast_udiv32

trivial

5 years agoi965/fs: Add 64-bit int immediate support to dump_instructions()
Matt Turner [Thu, 6 Sep 2018 18:15:55 +0000 (11:15 -0700)]
i965/fs: Add 64-bit int immediate support to dump_instructions()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
5 years agoradeonsi: track context rolls better for the Vega scissor bug workaround
Marek Olšák [Sun, 7 Oct 2018 02:44:36 +0000 (22:44 -0400)]
radeonsi: track context rolls better for the Vega scissor bug workaround

We should get fewer context rolls with the SET_CONTEXT_REG optimization,
but it would have been for nothing if the scissor state rolled the context
anyway. Don't emit the scissor state if there is no context roll.

5 years agoradeonsi: emit sample locations for 1xAA only when the hw bug is present
Marek Olšák [Sun, 7 Oct 2018 02:53:33 +0000 (22:53 -0400)]
radeonsi: emit sample locations for 1xAA only when the hw bug is present

5 years agoradeonsi: use compute shaders for clear_buffer & copy_buffer
Marek Olšák [Fri, 24 Aug 2018 04:29:04 +0000 (00:29 -0400)]
radeonsi: use compute shaders for clear_buffer & copy_buffer

Fast color clears should be much faster. Also, fast color clears on
evicted buffers should be 200x faster on GFX8 and older.

5 years agoradeonsi: use copy_buffer in buffer_do_flush_region directly
Marek Olšák [Sun, 7 Oct 2018 00:56:32 +0000 (20:56 -0400)]
radeonsi: use copy_buffer in buffer_do_flush_region directly

5 years agoradeonsi: use faster integer division for instance divisors
Marek Olšák [Sun, 23 Sep 2018 02:02:32 +0000 (22:02 -0400)]
radeonsi: use faster integer division for instance divisors

We know the divisors when we upload them, so instead we can precompute
and upload division factors derived from each divisor.

This fast division consists of add, mul_hi, and two shifts,
and we have to load 4 dwords intead of 1.

This probably won't affect any apps.

5 years agoac: add helpers for fast integer division by a constant
Marek Olšák [Sun, 23 Sep 2018 01:17:52 +0000 (21:17 -0400)]
ac: add helpers for fast integer division by a constant

5 years agoradeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports
Marek Olšák [Sat, 29 Sep 2018 01:43:49 +0000 (21:43 -0400)]
radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports

5 years agoradeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband
Marek Olšák [Sat, 29 Sep 2018 00:57:07 +0000 (20:57 -0400)]
radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband

We'll modify the quant mode there, which also affects the guarband
computation.

5 years agoradeonsi: don't re-upload the sample position constant buffer repeatedly
Marek Olšák [Sat, 29 Sep 2018 00:38:26 +0000 (20:38 -0400)]
radeonsi: don't re-upload the sample position constant buffer repeatedly

5 years agoradeonsi: set PA_SU_PRIM_FILTER_CNTL optimally
Marek Olšák [Sat, 29 Sep 2018 00:16:13 +0000 (20:16 -0400)]
radeonsi: set PA_SU_PRIM_FILTER_CNTL optimally

5 years agoradeonsi: center viewport to improve guardband clipping for high resolutions
Marek Olšák [Fri, 28 Sep 2018 22:49:29 +0000 (18:49 -0400)]
radeonsi: center viewport to improve guardband clipping for high resolutions

This will be more useful when we change the quant mode to increase subpixel
precision and decrease the viewport range (which might not be possible
if the viewport is not centered in the viewport range).

5 years agoradeonsi: save raster config in screen, add se_tile_repeat
Marek Olšák [Sat, 29 Sep 2018 23:28:20 +0000 (19:28 -0400)]
radeonsi: save raster config in screen, add se_tile_repeat

5 years agoradeonsi: switch back to standard DX sample positions
Marek Olšák [Fri, 28 Sep 2018 04:38:10 +0000 (00:38 -0400)]
radeonsi: switch back to standard DX sample positions

Apps may rely on them.

5 years agoradeonsi: add GDS support to CP DMA
Marek Olšák [Sun, 11 Sep 2016 20:15:04 +0000 (22:15 +0200)]
radeonsi: add GDS support to CP DMA

5 years agoradeonsi: rename si_gfx_* functions to si_cp_*
Marek Olšák [Fri, 21 Sep 2018 07:41:18 +0000 (03:41 -0400)]
radeonsi: rename si_gfx_* functions to si_cp_*

and write_event_eop -> release_mem

5 years agoradeonsi: make si_gfx_write_event_eop more configurable
Marek Olšák [Fri, 21 Sep 2018 07:36:32 +0000 (03:36 -0400)]
radeonsi: make si_gfx_write_event_eop more configurable

5 years agoanv/skylake: disable ForceThreadDispatchEnable
Sergii Romantsov [Wed, 19 Sep 2018 16:21:11 +0000 (19:21 +0300)]
anv/skylake: disable ForceThreadDispatchEnable

On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang.

-v2: enabling of  ForceThreadDispatchEnable is only for gen8, for
     gen9 and higher reverted enabling of PixelShaderHasUAV.

-v3 (Jason Ekstrand): Rework the comments a bit.

CC: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760
Fixes: 79270d2140ec (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoanv: Implement VK_EXT_pci_bus_info
Lionel Landwerlin [Sun, 14 Oct 2018 12:12:50 +0000 (13:12 +0100)]
anv: Implement VK_EXT_pci_bus_info

Even though the Intel GPU are always at the same PCI location, all the
info we need is already provided by libdrm. Let's be future proof.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoappveyor: Cache pip's cache files.
Jose Fonseca [Fri, 12 Oct 2018 09:21:38 +0000 (10:21 +0100)]
appveyor: Cache pip's cache files.

It should speed up the Python packages installation.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agoappveyor: Update to newer Mako/winflexbison versions.
Jose Fonseca [Fri, 12 Oct 2018 09:09:07 +0000 (10:09 +0100)]
appveyor: Update to newer Mako/winflexbison versions.

As that's what most people are bound to use.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agoappveyor: Update to MSVC 2017.
Jose Fonseca [Fri, 12 Oct 2018 08:52:52 +0000 (09:52 +0100)]
appveyor: Update to MSVC 2017.

That's what we (and I suppose most people out there) are using now.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agoradv: disable VK_SUBGROUP_FEATURE_VOTE_BIT
Samuel Pitoiset [Tue, 16 Oct 2018 07:42:42 +0000 (09:42 +0200)]
radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT

This feature isn't used for now, so disable it until
wwm is fixed in LLVM.

Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal*

https://bugs.freedesktop.org/show_bug.cgi?id=108115
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: implement buffer to image operations for R32G32B32
Samuel Pitoiset [Fri, 12 Oct 2018 09:30:13 +0000 (11:30 +0200)]
radv: implement buffer to image operations for R32G32B32

This should fix rendering issues with Batman Arkham City.
We will probably need to implement itob and itoi at some
point, but currently nothing hits these paths.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoac/nir: Use context-specific LLVM types
Alex Smith [Mon, 15 Oct 2018 14:50:20 +0000 (15:50 +0100)]
ac/nir: Use context-specific LLVM types

LLVMInt*Type() return types from the global context and therefore are
not safe for use in other contexts. Use types from our own context
instead.

Fixes frequent crashes seen when doing multithreaded pipeline creation.

Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant"
Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads"
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoglsl: Check the subroutine associated functions names
Vadym Shovkoplias [Tue, 9 Oct 2018 16:09:10 +0000 (19:09 +0300)]
glsl: Check the subroutine associated functions names

Adding compile time check for subroutine functions with
the same names. Similar check for intrastage linking was already
landed in commit 5f0567a4f60.

From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification

    "A program will fail to compile or link if any shader
     or stage contains two or more functions with the same
     name if the name is associated with a subroutine type."

Fixes:
    * no-overloads.vert

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agoglsl/linker: Change the format of spec quotation
Vadym Shovkoplias [Wed, 10 Oct 2018 10:51:28 +0000 (13:51 +0300)]
glsl/linker: Change the format of spec quotation

Also there is no "OpenGL ES Shading Language 4.00" spec,
so change it to GLSL 4.00 spec.

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agonir: fix clip cull lowering to not assert if GLSL already lowered.
Dave Airlie [Wed, 9 May 2018 03:21:43 +0000 (13:21 +1000)]
nir: fix clip cull lowering to not assert if GLSL already lowered.

If GLSL has already done the lowering, we'd rather not crash in this pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoi965: Add PCI IDs for new Amberlake parts that are Coffeelake based
Kenneth Graunke [Mon, 15 Oct 2018 23:02:50 +0000 (16:02 -0700)]
i965: Add PCI IDs for new Amberlake parts that are Coffeelake based

See commit c0c46ca461f136a0ae1ed69da6c874e850aeeb53 in the Linux kernel,
where José Roberto de Souza added this new PCI ID there.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
5 years agointel: disable FS IR validation in release mode.
Kenneth Graunke [Sun, 19 Aug 2018 17:15:12 +0000 (10:15 -0700)]
intel: disable FS IR validation in release mode.

We probably don't need to iterate, fprintf, and abort in release mode.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Copy propagation between blocks
Caio Marcelo de Oliveira Filho [Fri, 14 Sep 2018 18:41:39 +0000 (11:41 -0700)]
nir: Copy propagation between blocks

Extend the pass to propagate the copies information along the control
flow graph.  It performs two walks, first it collects the vars
that were written inside each node. Then it walks applying the copy
propagation using a list of copies previously available.  At each node
the list is invalidated according to results from the first walk.

This approach is simpler than a full data-flow analysis, but covers
various cases.  If derefs are used for operating on more memory
resources (e.g. SSBOs), the difference from a regular pass is expected
to be more visible -- as the SSA copy propagation pass won't apply to
those.

A full data-flow analysis would handle more scenarios: conditional
breaks in the control flow and merge equivalent effects from multiple
branches (e.g. using a phi node to merge the source for writes to the
same deref).  However, as previous commentary in the code stated, its
complexity 'rapidly get out of hand'.  The current patch is a good
intermediate step towards more complex analysis.

The 'copies' linked list was modified to use util_dynarray to make it
more convenient to clone it (to handle ifs/loops).

Annotated shader-db results for Skylake:

    total instructions in shared programs: 15105796 -> 15105451 (<.01%)
    instructions in affected programs: 152293 -> 151948 (-0.23%)
    helped: 96
    HURT: 17

        All the HURTs and many HELPs are one instruction.  Looking
        at pass by pass outputs, the copy prop kicks in removing a
        bunch of loads correctly, which ends up altering what other
        other optimizations kick.  In those cases the copies would be
        propagated after lowering to SSA.

        In few HELPs we are actually helping doing more than was
        possible previously, e.g. consolidating load_uniforms from
        different blocks.  Most of those are from
        shaders/dolphin/ubershaders/.

    total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
    cycles in affected programs: 151461830 -> 151367845 (-0.06%)
    helped: 2933
    HURT: 2950

        A lot of noise on both sides.

    total loops in shared programs: 4603 -> 4603 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 11085 -> 11073 (-0.11%)
    spills in affected programs: 23 -> 11 (-52.17%)
    helped: 1
    HURT: 0

        The shaders/dolphin/ubershaders/12.shader_test was able to
        pull a couple of loads from inside if statements and reuse
        them.

    total fills in shared programs: 23143 -> 23089 (-0.23%)
    fills in affected programs: 2718 -> 2664 (-1.99%)
    helped: 27
    HURT: 0

        All from shaders/dolphin/ubershaders/.

    LOST:   0
    GAINED: 0

The other generations follow the same overall shape.  The spills and
fills HURTs are all from the same game.

shader-db results for Broadwell.

    total instructions in shared programs: 15402037 -> 15401841 (<.01%)
    instructions in affected programs: 144386 -> 144190 (-0.14%)
    helped: 86
    HURT: 9

    total cycles in shared programs: 600912755 -> 600902486 (<.01%)
    cycles in affected programs: 185662820 -> 185652551 (<.01%)
    helped: 2598
    HURT: 3053

    total loops in shared programs: 4579 -> 4579 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 80929 -> 80924 (<.01%)
    spills in affected programs: 720 -> 715 (-0.69%)
    helped: 1
    HURT: 5

    total fills in shared programs: 93057 -> 93013 (-0.05%)
    fills in affected programs: 3398 -> 3354 (-1.29%)
    helped: 27
    HURT: 5

    LOST:   0
    GAINED: 2

shader-db results for Haswell:

    total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
    instructions in affected programs: 44992 -> 43374 (-3.60%)
    helped: 27
    HURT: 69

    total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
    cycles in affected programs: 7720673 -> 7687588 (-0.43%)
    helped: 1609
    HURT: 1416

    total loops in shared programs: 1830 -> 1830 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total spills in shared programs: 1988 -> 1692 (-14.89%)
    spills in affected programs: 296 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 2103 -> 1668 (-20.68%)
    fills in affected programs: 438 -> 3 (-99.32%)
    helped: 4
    HURT: 0

    LOST:   0
    GAINED: 1

v2: Remove the DISABLE prefix from tests we now pass.

v3: Add comments about missing write_mask handling. (Caio)
    Add unreachable when switching on cf_node type. (Jason)
    Properly merge the component information in written map
    instead of replacing. (Jason)
    Explain how removal from written arrays works. (Jason)
    Use mode directly from deref instead of getting the var. (Jason)

v4: Register the local written mode for calls. (Jason)
    Prefer cf_node instead of node. (Jason)
    Clarify that remove inside iteration only works in backward
    iterations. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Take call instruction into account in copy_prop_vars
Caio Marcelo de Oliveira Filho [Sat, 15 Sep 2018 01:17:51 +0000 (18:17 -0700)]
nir: Take call instruction into account in copy_prop_vars

Calls are not used yet (functions are inlined), but since new code is
already taking them into account, do it here too.  The convention here
and in other places is that no writable memory is assumed to remain
unchanged, as well as global variables.

Also, explicitly state the modes affected (instead of using the
reverse logic) in one of the apply_for_barrier_modes calls.

Suggested by Jason.

v2: Consider local vars used by a call to be conservative, SPIR-V has
    such cases. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Add tests for copy propagation of derefs
Caio Marcelo de Oliveira Filho [Wed, 12 Sep 2018 21:33:59 +0000 (14:33 -0700)]
nir: Add tests for copy propagation of derefs

Also tests for removal of redundant loads, that we currently handle as
part of the copy propagation.

Note some tests involve multiple blocks and are currently DISABLED
because they (expectedly) fail.

v2: Add missing DISABLED prefix to "multi block" tests. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Remove handling of dead writes from copy_prop_vars
Caio Marcelo de Oliveira Filho [Wed, 15 Aug 2018 16:52:53 +0000 (09:52 -0700)]
nir: Remove handling of dead writes from copy_prop_vars

These are covered by another pass now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/nir, freedreno/ir3: Use the separated dead write vars pass
Caio Marcelo de Oliveira Filho [Thu, 30 Aug 2018 00:26:03 +0000 (17:26 -0700)]
intel/nir, freedreno/ir3: Use the separated dead write vars pass

No changes to shader-db for intel.
No changes to shader-db expected for freedreno.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Separate dead write removal into its own pass
Caio Marcelo de Oliveira Filho [Fri, 27 Jul 2018 20:56:35 +0000 (13:56 -0700)]
nir: Separate dead write removal into its own pass

Instead of doing this as part of the existing copy_prop_vars pass.

Separation makes easier to expand the scope of both passes to be more
than per-block.  For copy propagation, the information about valid
copies comes from previous instructions; while the dead write removal
depends on information from later instructions ("have any instruction
used this deref before overwrite it?").

Also change the tests to use this pass (instead of copy prop vars).
Note that the disabled tests continue to fail, since the standalone
pass is still per-block.

v2: Remove entries from dynarray instead of marking items as
    deleted.  Use foreach_reverse. (Caio)

    (all from Jason)
    Do not cache nir_deref_path.  Not worthy for this patch.
    Clear unused writes when hitting a call instruction.
    Clean up enumeration of modes for barriers.
    Move metadata calls to the inner function.

v3: For copies, use the vector length to calculate the mask.

    (all from Jason)
    Use nir_component_mask_t when applicable.
    Rename functions for clarity.
    Consider local vars used by a call to be conservative (SPIR-V has
    such cases).
    Comment and assert the assumption that stores and copies are
    always to a deref that ends with a vector or scalar.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Add tests for dead write elimination
Caio Marcelo de Oliveira Filho [Wed, 29 Aug 2018 23:30:09 +0000 (16:30 -0700)]
nir: Add tests for dead write elimination

Note at the moment the pass called is nir_opt_copy_prop_vars, because
dead write elimination is implemented there.

Also added tests that involve identifying dead writes in multiple
blocks (e.g. the overwrite happens in another block).  Those currently
fail as expected, so are marked to be skipped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Add test file for vars related passes
Caio Marcelo de Oliveira Filho [Wed, 29 Aug 2018 23:30:09 +0000 (16:30 -0700)]
nir: Add test file for vars related passes

Add basic helpers for doing tests on the vars related optimization
passes.  The main goal is to lower the barrier to create tests during
development and debugging of the passes.  Full coverage is not a
requirement.

v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason)
    Move nir_imm_ivec2() to nir_builder.h. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Add nir_imm_ivec2 helper
Caio Marcelo de Oliveira Filho [Thu, 11 Oct 2018 00:14:34 +0000 (17:14 -0700)]
nir: Add nir_imm_ivec2 helper

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil: Add foreach_reverse for dynarray
Caio Marcelo de Oliveira Filho [Wed, 12 Sep 2018 21:57:35 +0000 (14:57 -0700)]
util: Add foreach_reverse for dynarray

Useful to walk the array removing elements by swapping them with the
last element.

v2: Change iteration to make sure we never underflow. (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agov3d: Add support for hardware pack/unpack of half floats.
Eric Anholt [Tue, 18 Sep 2018 21:09:25 +0000 (14:09 -0700)]
v3d: Add support for hardware pack/unpack of half floats.

Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test
in half.

5 years agonir: Expose nir_remove_unused_io_vars().
Eric Anholt [Wed, 26 Sep 2018 16:13:13 +0000 (09:13 -0700)]
nir: Expose nir_remove_unused_io_vars().

For gallium drivers where you want to do some linking at variant compile
time, you don't have the other producer/consumer shader on hand to modify.
By exposing the inner function, the driver can have the used varyings in
the compiled shader cache key and still do linking.

This is also useful for V3D, where the binning shader wants to only output
position and TF varyings.  We've been removing those after nir_lower_io,
but this will be less driver-specific code and let more of the shader get
DCEed early in NIR.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir: Be sure to fix deref modes after demoting shader i/o vars to global.
Eric Anholt [Wed, 26 Sep 2018 18:33:09 +0000 (11:33 -0700)]
nir: Be sure to fix deref modes after demoting shader i/o vars to global.

Fixes assertion failures when calling nir_remove_unused_varyings() or
nir_remove_unused_io_vars().

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agogallium/ttn: Convert inputs and outputs to derefs of variables.
Eric Anholt [Tue, 18 Sep 2018 22:53:54 +0000 (15:53 -0700)]
gallium/ttn: Convert inputs and outputs to derefs of variables.

This means that TTN shaders more closely resemble GTN shaders: they have
inputs and outputs as variable derefs, with the variables having their
.driver_location already set up for you.

This will be useful for v3d to do input variable DCE in NIR, which we
can't do when the TTN shaders never have a pre-nir_lower_io stage.

Acked-by: Rob Clark <robdclark@gmail.com>
5 years agogallium/ttn: Fix the type of gl_FragDepth.
Eric Anholt [Wed, 19 Sep 2018 19:35:51 +0000 (12:35 -0700)]
gallium/ttn: Fix the type of gl_FragDepth.

In TGSI we have a vec4 of which only .z is used, but for NIR we should be
using a float the same as other NIR IR.  We were already moving TGSI's .z
to the .x channel.

Acked-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: Enable blitter
Kristian H. Kristensen [Mon, 17 Sep 2018 23:59:00 +0000 (16:59 -0700)]
freedreno/a6xx: Enable blitter

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: Update headers
Kristian H. Kristensen [Fri, 12 Oct 2018 22:42:05 +0000 (15:42 -0700)]
freedreno/a6xx: Update headers

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: Remove unnecessary GRAS_2D_BLIT_INFO write
Kristian H. Kristensen [Mon, 8 Oct 2018 19:10:26 +0000 (12:10 -0700)]
freedreno/a6xx: Remove unnecessary GRAS_2D_BLIT_INFO write

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agoanv: Don't advertise ASTC support on BSW
Jason Ekstrand [Mon, 15 Oct 2018 18:07:06 +0000 (13:07 -0500)]
anv: Don't advertise ASTC support on BSW

Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
5 years agoradv: do not force the flat qualifier for clip/cull distances
Samuel Pitoiset [Mon, 15 Oct 2018 12:48:16 +0000 (14:48 +0200)]
radv: do not force the flat qualifier for clip/cull distances

This fixes some new CTS that reads clip/cull distances
from the fragment shader stage:

dEQP-VK.clipping.user_defined.clip_*

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: bump discreteQueuePriorities to 2
Samuel Pitoiset [Mon, 15 Oct 2018 12:53:00 +0000 (14:53 +0200)]
radv: bump discreteQueuePriorities to 2

It's the minimum value required by the spec.

This fixes dEQP-VK.api.info.device.properties.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoanv: Split dispatch tables into device and instance
Jason Ekstrand [Sat, 13 Oct 2018 17:16:14 +0000 (12:16 -0500)]
anv: Split dispatch tables into device and instance

There's no reason why we need generate trampoline functions for instance
functions or carry N copies of the instance dispatch table around for
every hardware generation.  Splitting the tables and being more
conservative shaves about 34K off .text and about 4K off .data when
built with clang.

Before splitting dispatch tables:

   text    data     bss     dec     hex filename
3224305  286216    8960 3519481  35b3f9 _install/lib64/libvulkan_intel.so

After splitting dispatch tables:

   text    data     bss     dec     hex filename
3190325  282232    8960 3481517  351fad _install/lib64/libvulkan_intel.so

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoi965: Drop assert about number of uniforms in ARB handling.
Kenneth Graunke [Mon, 15 Oct 2018 16:29:06 +0000 (09:29 -0700)]
i965: Drop assert about number of uniforms in ARB handling.

My recent prog_to_nir patch started making new sampler uniforms, which
apparently increased the number of parameters.  We used to poke at the
one parameter directly, making it important that there was only one,
but we haven't done that in a while.  It should be safe to just delete
the assertion.

Fixes: 1c0f92d8a8c "nir: Create sampler variables in prog_to_nir."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agovulkan: Add the fuchsia headers
Jason Ekstrand [Sat, 13 Oct 2018 15:00:05 +0000 (10:00 -0500)]
vulkan: Add the fuchsia headers

These were missing in the last couple of spec updates.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>