mesa.git
5 years agofreedreno/ir3: Fix up the half reg source even when src instr==NULL
Neil Roberts [Fri, 15 Mar 2019 13:32:27 +0000 (14:32 +0100)]
freedreno/ir3: Fix up the half reg source even when src instr==NULL

Previously the loop for assigning registers was bailing out early if
the register had a null source. I think the intention is that in this
case it isn’t necessary to assign a register. However it was also
missing out the part to fix up the types. This can happen if the
instruction is copy propagated to be a move from a constant half-float
input register. In that case it still needs to fix up the types.

Fixes assert in
dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump

when lowering the precision of the variables.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: Add a 16-bit implementation of nir_op_imul
Neil Roberts [Thu, 28 Feb 2019 15:13:56 +0000 (16:13 +0100)]
freedreno/ir3: Add a 16-bit implementation of nir_op_imul

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: set dst type of alu instructions correctly.
Hyunjun Ko [Tue, 19 Mar 2019 07:17:40 +0000 (07:17 +0000)]
freedreno/ir3: set dst type of alu instructions correctly.

Though it should be fixed in RA pass, it needs to be set correctly from
the beginning according to the bitsize of NIR dest.

v2: Would be better for mad,fddx,fddy to fixup later in RA pass.

[small cleanup of fallout from imov/fmov removal fallout]
Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: adjust the bitsize of regs when an array loading.
Hyunjun Ko [Mon, 22 Apr 2019 06:16:48 +0000 (06:16 +0000)]
freedreno/ir3: adjust the bitsize of regs when an array loading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: convert back to 32-bit values for half constant registers.
Hyunjun Ko [Thu, 21 Mar 2019 08:30:11 +0000 (17:30 +0900)]
freedreno/ir3: convert back to 32-bit values for half constant registers.

It seems to handle only 32-bit values for half constant registers
within floating point opcodes according to the blob driver.
So we need to convert back to 32-bit values from 16-bit values, when a
lower precision pass is in effect.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov.
Hyunjun Ko [Mon, 15 Apr 2019 03:42:23 +0000 (03:42 +0000)]
freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov.

If the type of dest reg and src reg of absneg opcode are different,
it shouldn't be considered as same type mov.

This patch becomes meaningful when we start to use mediump information for
doing precision lowering to 16bit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: set proper dst type for uniform according to the type of nir dest.
Hyunjun Ko [Tue, 26 Feb 2019 08:33:34 +0000 (08:33 +0000)]
freedreno/ir3: set proper dst type for uniform according to the type of nir dest.

eg. uniform mediump vec4 f;

This patch means nothing since there's no mediump lowering pass for now,
but will be meaningful when the pass land in the near future.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION
Neil Roberts [Tue, 4 Dec 2018 17:32:15 +0000 (18:32 +0100)]
freedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION

Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.

The a6xx version is copied from the a5xx code but it has not been
tested.

v2. [Hyunjun Ko (zzoon@igalia.com)] There's the half flag recently
added, which represents precision based on IR3_REG_HALF. Now use this
flag to avoid duplication.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: Fix loading half-float immediate vectors
Neil Roberts [Sun, 2 Dec 2018 17:15:52 +0000 (18:15 +0100)]
freedreno/ir3: Fix loading half-float immediate vectors

Previously the code to load from a constant instruction was always
using the u32 pointer. If the constant is actually a 16-bit source
this would end up with the wrong values because the pointer would be
offset by the wrong size. This fixes it to use the u16 pointer.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: immediately schedule meta instructions
Rob Clark [Tue, 28 May 2019 16:42:26 +0000 (09:42 -0700)]
freedreno/ir3: immediately schedule meta instructions

The aren't real instructions, and don't change # of live values, so no
point in them competing with real instructions.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: scheduler improvements
Rob Clark [Thu, 30 May 2019 17:44:16 +0000 (10:44 -0700)]
freedreno/ir3: scheduler improvements

For instructions that increase the # of live values, apply a threshold
to avoid scheduling them too early.  And factor the net change of # of
live values that would result from scheduling an instruction, to
prioritize instructions that reduce number of live values as the number
of live values increases.

For manhattan:

  total instructions in shared programs: 27869 -> 28413 (1.95%)
  instructions in affected programs: 26756 -> 27300 (2.03%)
  helped: 102
  HURT: 87

  total full in shared programs: 1903 -> 1719 (-9.67%)
  full in affected programs: 1390 -> 1206 (-13.24%)
  helped: 124
  HURT: 9

The reduction in register usage nets ~20% gain in manhattan.  (So
getting mediump support should be a huge win for gles gfxbench.)

Also significantly helps some of the more complex shadertoy shaders,
like IQ's Piano (32 to 18 regs, doubles fps).

The effect is less pronounced on smaller shaders.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: sched should mark outputs used
Rob Clark [Fri, 24 May 2019 16:19:55 +0000 (09:19 -0700)]
freedreno/ir3: sched should mark outputs used

Account for shader outputs and values live in any direct/indirect
successor block.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agomesa: EXT_dsa add selectorless matrix stack functions
Pierre-Eric Pelloux-Prayer [Tue, 7 May 2019 09:20:51 +0000 (11:20 +0200)]
mesa: EXT_dsa add selectorless matrix stack functions

Allows the legacy matrix stacks to be manipulated without disturbing the
matrix mode selector.

Adapted from a patch from Chris Forbes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: factor out enum -> matrix stack lookup
Pierre-Eric Pelloux-Prayer [Tue, 7 May 2019 09:20:20 +0000 (11:20 +0200)]
mesa: factor out enum -> matrix stack lookup

Split this out from glMatrixMode since we're about to need it
independently for EXT_DSA.

Adapted from Chris Forbes commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: add new EXT_direct_state_access tokens
Timothy Arceri [Thu, 16 Aug 2018 01:20:37 +0000 (11:20 +1000)]
mesa: add new EXT_direct_state_access tokens

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoglapi: add EXT_direct_state_access
Chris Forbes [Thu, 17 Jan 2013 09:02:39 +0000 (22:02 +1300)]
glapi: add EXT_direct_state_access

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: add a list of EXT_direct_state_access to dispatch sanity
Timothy Arceri [Sat, 8 Sep 2018 04:01:16 +0000 (14:01 +1000)]
mesa: add a list of EXT_direct_state_access to dispatch sanity

This extension is huge and this gives us a TODO list of functions
to implement.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradeonsi: init sctx->dma_copy before using it
Pierre-Eric Pelloux-Prayer [Fri, 31 May 2019 12:39:46 +0000 (14:39 +0200)]
radeonsi: init sctx->dma_copy before using it

Commit a1378639ab19 reordered context functions initializations but broke
sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.

In this case sctx->dma_copy was assigned a value after being used in:
   sctx->b.resource_copy_region = sctx->dma_copy;

This commit moves the FORCE_DMA special case after sctx->dma_copy initialization.

See https://bugs.freedesktop.org/show_bug.cgi?id=110422

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years agod3dadapter9: Revert to old throttling limit value
Axel Davy [Sun, 26 May 2019 20:59:30 +0000 (22:59 +0200)]
d3dadapter9: Revert to old throttling limit value

Recently PIPE_CAP_MAX_FRAMES_IN_FLIGHT was changed from 2
to 1:
20909284f204091757c050aa40cfffaf3f981b9c

No driver seems to overwrite the default value.

One user reports severe regressions for some games.
For now, revert to the value 2 for nine.

Cc: "19.1" mesa-stable@lists.freedesktop.org
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
5 years agoac: use amdgpu-flat-work-group-size
Marek Olšák [Fri, 31 May 2019 19:38:39 +0000 (15:38 -0400)]
ac: use amdgpu-flat-work-group-size

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agou_blitter: don't fail mipmap generation for depth formats containing stencil
Marek Olšák [Mon, 27 May 2019 22:47:31 +0000 (18:47 -0400)]
u_blitter: don't fail mipmap generation for depth formats containing stencil

Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109754

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agoetnaviv: drop a bunch of duplicated gallium PIPE_CAP default code
Christian Gmeiner [Mon, 27 May 2019 18:49:58 +0000 (20:49 +0200)]
etnaviv: drop a bunch of duplicated gallium PIPE_CAP default code

Now that we have the util function for the default values, we can get
rid of the boilerplate.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agoradv: flush pending query reset caches before copying results
Samuel Pitoiset [Fri, 24 May 2019 08:10:08 +0000 (10:10 +0200)]
radv: flush pending query reset caches before copying results

From the Vulkan spec 1.1.108:
   "vkCmdCopyQueryPoolResults is guaranteed to see the effect of
    previous uses of vkCmdResetQueryPool in the same queue, without any
    additional synchronization."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: copy intrinsic type when lowering load input/uniform and store output
Jonathan Marek [Sun, 2 Jun 2019 19:16:06 +0000 (15:16 -0400)]
nir: copy intrinsic type when lowering load input/uniform and store output

Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
5 years agoac,radv: remove the vec3 restriction with LLVM 9+
Samuel Pitoiset [Thu, 2 May 2019 14:15:03 +0000 (16:15 +0200)]
ac,radv: remove the vec3 restriction with LLVM 9+

This changes requires LLVM r356755.

32706 shaders in 16744 tests
Totals:
SGPRS: 1448848 -> 1455984 (0.49 %)
VGPRS: 1016684 -> 1016220 (-0.05 %)
Spilled SGPRs: 25871 -> 25815 (-0.22 %)
Spilled VGPRs: 122 -> 122 (0.00 %)
Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
Code Size: 55324500 -> 55301152 (-0.04 %) bytes
Max Waves: 235660 -> 235586 (-0.03 %)

Totals from affected shaders:
SGPRS: 293704 -> 300840 (2.43 %)
VGPRS: 246716 -> 246252 (-0.19 %)
Spilled SGPRs: 159 -> 103 (-35.22 %)
Scratch size: 188 -> 180 (-4.26 %) dwords per thread
Code Size: 8653664 -> 8630316 (-0.27 %) bytes
Max Waves: 60811 -> 60737 (-0.12 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agonir: Return nir_type_invalid for non-numeric base types
Caio Marcelo de Oliveira Filho [Fri, 31 May 2019 23:15:02 +0000 (16:15 -0700)]
nir: Return nir_type_invalid for non-numeric base types

Now that the type gathering function look at instructions that might
have other types, return invalid type instead of crashing.  That
invalid will be properly ignored later.

Fixes: c12750527b7 "nir: add type information to load uniform/input and store output intrinsics"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoiris: Drop unused locals from iris_clear.c to avoid warning
Caio Marcelo de Oliveira Filho [Fri, 31 May 2019 21:30:18 +0000 (14:30 -0700)]
iris: Drop unused locals from iris_clear.c to avoid warning

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
5 years agonir: remove bool lowering from lower_int_to_float
Jonathan Marek [Fri, 31 May 2019 20:17:06 +0000 (16:17 -0400)]
nir: remove bool lowering from lower_int_to_float

Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.

Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: fix lower_{int,bool}_to_float for new mov opcode
Jonathan Marek [Fri, 31 May 2019 20:04:10 +0000 (16:04 -0400)]
nir: fix lower_{int,bool}_to_float for new mov opcode

It is treated like the vecN instructions which also have no type.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: add lower_bitshift option
Jonathan Marek [Fri, 31 May 2019 17:54:12 +0000 (13:54 -0400)]
nir: add lower_bitshift option

Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: fix gather_ssa_types
Jonathan Marek [Fri, 31 May 2019 19:08:54 +0000 (15:08 -0400)]
nir: fix gather_ssa_types

Consts and undefs can be used as different types (common with "0" constant)
so don't copy types from consts/undefs, only to them. It doesn't entirely
solve the problem that the type given to the const could be wrong , but
now the only realistic case is with "0" which is the same when casted to
float, so it doesn't matter for lower_int_to_float.

The other change is to get type information for load input/uniform and
store output, and use that to get correct results.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: add type information to load uniform/input and store output intrinsics
Jonathan Marek [Fri, 31 May 2019 17:44:40 +0000 (13:44 -0400)]
nir: add type information to load uniform/input and store output intrinsics

This type information will be used by gather_ssa_types to get usable results

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: improvements to native_integers removal
Jonathan Marek [Wed, 8 May 2019 20:22:45 +0000 (16:22 -0400)]
nir: improvements to native_integers removal

Improvements related to the patch that removed native_integers:
 * In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
 * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agofreedreno/a6xx: add 'type' to shader state key
Rob Clark [Fri, 31 May 2019 15:44:55 +0000 (08:44 -0700)]
freedreno/a6xx: add 'type' to shader state key

We could have identical texture state for both VS and FS.. which would
result in VS state getting created first, and FS state mapping to the
identical cmdstream.  Resulting in VS state getting emitted twice and no
FS state emitted.

Fixes:
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: fix constlen versus indirect UBO
Rob Clark [Fri, 31 May 2019 14:40:16 +0000 (07:40 -0700)]
freedreno/ir3: fix constlen versus indirect UBO

If we access the address of the UBO indirectly, and there is no higher
const emitted w/ direct access (like an immediate lowered to uniform)
the assembler won't figure out the correct constlen.

Fixes:
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: fix GPU crash on small render targets
Rob Clark [Fri, 31 May 2019 14:07:57 +0000 (07:07 -0700)]
freedreno/a6xx: fix GPU crash on small render targets

Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: set more barrier bits
Rob Clark [Thu, 30 May 2019 16:04:57 +0000 (09:04 -0700)]
freedreno/ir3: set more barrier bits

Blob is also setting the .l bit, and it seems to solve some intermittent
failures with a couple of deqp's:

dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: set (ss) on last_input if ldlv
Rob Clark [Wed, 29 May 2019 20:25:11 +0000 (13:25 -0700)]
freedreno/ir3: set (ss) on last_input if ldlv

It seems like (ei) handling doesn't sync on (ss), so we could end up in
a situation where we release varying storage before an ldlv for flat
shaded varyings completes.  Keep track if we've done an (ss) since the
last ldlv, and if not add (ss) flag to last_input which gets (ei).

Noticed with dEQP-GLES3.functional.fragment_out.random.24 and
dEQP-GLES3.functional.fragment_out.random.27, which previously passed by
luck because ir3_sched ordered instructions in a way that resulted in a
lucky (ss).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: add assert
Rob Clark [Wed, 29 May 2019 19:26:08 +0000 (12:26 -0700)]
freedreno/ir3: add assert

The special handling for last_input assumes that all the varying loads
are in the first block.  Add an assert to catch if anyone breaks that
assumption.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil/hash_table: Use fast modulo computation
Connor Abbott [Tue, 21 May 2019 10:56:31 +0000 (12:56 +0200)]
util/hash_table: Use fast modulo computation

While we're here, copy the size table from set.c to get rid of hard tabs
in the hash_table.c version.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/set: Use fast modulo computation
Connor Abbott [Mon, 20 May 2019 13:14:04 +0000 (15:14 +0200)]
util/set: Use fast modulo computation

Compilation times with my shader-db database:

Difference at 95.0% confidence
-1.22312 +/- 0.726033
-0.283979% +/- 0.168254%
(Student's t, pooled s = 1.02177)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil: Add a helper for faster remainders
Connor Abbott [Fri, 29 Mar 2019 14:08:17 +0000 (15:08 +0100)]
util: Add a helper for faster remainders

This should be at least as fast as using fast_idiv_by_const, and has the
advantage that the precomputation is simple enough to be evaluated at
Mesa-compile time for hash tables and sets which have a fixed table of
possible divisors.

Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/hash_table: Add specialized resizing add function
Connor Abbott [Tue, 21 May 2019 10:36:56 +0000 (12:36 +0200)]
util/hash_table: Add specialized resizing add function

To keep it in sync with the set implementation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/set: Add specialized resizing add function
Connor Abbott [Mon, 20 May 2019 12:59:40 +0000 (14:59 +0200)]
util/set: Add specialized resizing add function

A significant portion of the time spent in nir_opt_cse for the Dolphin
ubershaders was in resizing the set. When resizing a hash table, we know
in advance that each new element to be inserted will be different from
every other element, so we don't have to compare them, and there will be
no tombstone elements, so we don't have to worry about caching the
first-seen tombstone. We add a specialized add function which skips
these steps entirely, speeding up resizing.

Compile-time results from my shader-db database:

Difference at 95.0% confidence
-2.29143 +/- 0.845534
-0.529475% +/- 0.194767%
(Student's t, pooled s = 1.08807)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/hash_table: Pull out loop-invariant computations
Connor Abbott [Tue, 21 May 2019 10:21:53 +0000 (12:21 +0200)]
util/hash_table: Pull out loop-invariant computations

To keep the set and hash table in sync. Note that some of this had
already been done for hash tables, in particular pulling out the
hash % ht->size computation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/set: Pull out loop-invariant computations
Connor Abbott [Mon, 20 May 2019 12:58:06 +0000 (14:58 +0200)]
util/set: Pull out loop-invariant computations

Unfortunately GCC can't do this for us, probably because we call the key
comparison function which GCC can't prove won't modify arbitrary memory.
This is a pretty hot function, so do the optimization manually to be
sure the compiler will get it right.

While we're here, make the computation of the new probe address use a
single conditional subtract instead of a modulo, since we know that it
won't ever get as big as 2 * ht->size before the modulo. Modulos tend to
be pretty expensive operations.

shader-db compile time results for my database:

Difference at 95.0% confidence
-2.24934 +/- 0.69897
-0.516296% +/- 0.159993%
(Student's t, pooled s = 0.983684)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir/instr_set: Use _mesa_set_search_or_add()
Connor Abbott [Wed, 27 Mar 2019 11:11:36 +0000 (12:11 +0100)]
nir/instr_set: Use _mesa_set_search_or_add()

Before this change, we were searching for each instruction twice, once
when checking if it exists and once when figuring out where to insert
it. By using the new function, we can do everything we need to do in one
operation.

Compilation time numbers for my shader-db database:

Difference at 95.0% confidence
-4.04706 +/- 0.669508
-0.922142% +/- 0.151948%
(Student's t, pooled s = 0.95824)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/set: Add a _mesa_set_search_or_add() function
Connor Abbott [Wed, 27 Mar 2019 11:00:54 +0000 (12:00 +0100)]
util/set: Add a _mesa_set_search_or_add() function

Unlike _mesa_set_search_and_add(), it doesn't replace an entry if it's
found, returning it instead. This is useful for nir_instr_set, where
we have to know both the original original instruction and its
equivalent.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agofreedreno/ir3: fix input ncomp for vertex shaders
Jonathan Marek [Fri, 31 May 2019 16:17:06 +0000 (12:17 -0400)]
freedreno/ir3: fix input ncomp for vertex shaders

ncomp is never set for vertex shaders, but a3xx and a4xx still use it.

Fixes: 831f1a05c0d freedreno/ir3: rework varying packing
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
5 years agointel/compiler: Use compare rematerialization pass
Ian Romanick [Mon, 20 May 2019 18:24:57 +0000 (11:24 -0700)]
intel/compiler: Use compare rematerialization pass

Almost all of the spill / fill benefit is in Deus Ex.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17196395 (-0.16%)
instructions in affected programs: 1518658 -> 1490615 (-1.85%)
helped: 1550
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2
helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45%
HURT stats (abs)   min: 5 max: 10 x̄: 6.67 x̃: 5
HURT stats (rel)   min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32%
95% mean confidence interval for instructions value: -19.86 -16.26
95% mean confidence interval for instructions %-change: -1.19% -1.04%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361288721 (-0.05%)
cycles in affected programs: 197367688 -> 197187954 (-0.09%)
helped: 990
HURT: 683
helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16
helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26%
HURT stats (abs)   min: 1 max: 12190 x̄: 905.14 x̃: 22
HURT stats (rel)   min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47%
95% mean confidence interval for cycles value: -315.45 100.58
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12147 -> 8948 (-26.34%)
spills in affected programs: 5433 -> 2234 (-58.88%)
helped: 343
HURT: 0

total fills in shared programs: 25262 -> 21814 (-13.65%)
fills in affected programs: 7771 -> 4323 (-44.37%)
helped: 343
HURT: 3

LOST:   0
GAINED: 17

Ivy Bridge
total instructions in shared programs: 12083517 -> 12081427 (-0.02%)
instructions in affected programs: 540744 -> 538654 (-0.39%)
helped: 786
HURT: 29
helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31%
95% mean confidence interval for instructions value: -2.83 -2.30
95% mean confidence interval for instructions %-change: -0.57% -0.47%
Instructions are helped.

total cycles in shared programs: 180153463 -> 180124798 (-0.02%)
cycles in affected programs: 72597920 -> 72569255 (-0.04%)
helped: 572
HURT: 249
helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13
helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26%
HURT stats (abs)   min: 1 max: 11060 x̄: 136.37 x̃: 10
HURT stats (rel)   min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32%
95% mean confidence interval for cycles value: -96.22 26.39
95% mean confidence interval for cycles %-change: -0.43% -0.23%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3625 -> 3623 (-0.06%)
spills in affected programs: 46 -> 44 (-4.35%)
helped: 1
HURT: 0

total fills in shared programs: 4065 -> 4061 (-0.10%)
fills in affected programs: 104 -> 100 (-3.85%)
helped: 1
HURT: 0

LOST:   0
GAINED: 8

Sandy Bridge
total instructions in shared programs: 10879656 -> 10878699 (<.01%)
instructions in affected programs: 275167 -> 274210 (-0.35%)
helped: 544
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25%
95% mean confidence interval for instructions value: -1.97 -1.55
95% mean confidence interval for instructions %-change: -0.43% -0.36%
Instructions are helped.

total cycles in shared programs: 154089096 -> 154081132 (<.01%)
cycles in affected programs: 4422722 -> 4414758 (-0.18%)
helped: 459
HURT: 214
helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8
helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14%
HURT stats (abs)   min: 1 max: 226 x̄: 19.99 x̃: 4
HURT stats (rel)   min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09%
95% mean confidence interval for cycles value: -15.51 -8.15
95% mean confidence interval for cycles %-change: -0.31% -0.17%
Cycles are helped.

total spills in shared programs: 2880 -> 2876 (-0.14%)
spills in affected programs: 636 -> 632 (-0.63%)
helped: 2
HURT: 0

total fills in shared programs: 3161 -> 3157 (-0.13%)
fills in affected programs: 1519 -> 1515 (-0.26%)
helped: 2
HURT: 0

LOST:   0
GAINED: 2

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8157361 -> 8155067 (-0.03%)
instructions in affected programs: 382491 -> 380197 (-0.60%)
helped: 677
HURT: 0
helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2
helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42%
95% mean confidence interval for instructions value: -3.76 -3.01
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 188588292 -> 188583040 (<.01%)
cycles in affected programs: 3155064 -> 3149812 (-0.17%)
helped: 377
HURT: 13
helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6
helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12%
HURT stats (abs)   min: 2 max: 8 x̄: 5.85 x̃: 6
HURT stats (rel)   min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04%
95% mean confidence interval for cycles value: -15.67 -11.27
95% mean confidence interval for cycles %-change: -0.45% -0.30%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Rematerialize compare instructions
Ian Romanick [Mon, 20 May 2019 18:22:12 +0000 (11:22 -0700)]
nir: Rematerialize compare instructions

On some architectures, Boolean values used to control conditional
branches or condtional selection must be propagated into a flag.  This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction.  This can save register pressure by
not needing to store the Boolean value.

There are several possible ares for future improvement to this pass:

1. Be more conservative.  If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare.  The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.

2. Be less conservative.  Currently the pass requires that all users of
the comparison match the pattern.  The idea is that after the pass is
complete, no instruction will use the resulting Boolean value.  The only
uses will be of the flag value.  It may be beneficial to relax this
requirement in some cases.

3. Be less conservative.  Also try to rematerialize comparisons used for
discard_if intrinsics.  After changing the way the Intel compiler
generates cod e for discard_if (see MR!935), I tried implementing this
already.  The changes were pretty small.  Instructions were helped in 19
shaders, but, overall, cycles were hurt.  A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

4. Copy the preceeding ALU instruction.  If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvment to also
copy the preceeding ALU instruction.  On Intel GPUs, this may enable
cmod propagation to make additional progress.

v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Add a shallow clone function for nir_alu_instr
Ian Romanick [Wed, 29 May 2019 23:48:17 +0000 (16:48 -0700)]
nir: Add a shallow clone function for nir_alu_instr

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Matt Turner <mattst88@gmail.com>
5 years agopanfrost: Remove link stage for jobs
Tomeu Vizoso [Wed, 29 May 2019 09:25:20 +0000 (11:25 +0200)]
panfrost: Remove link stage for jobs

And instead, link them as they are added.

Makes things a bit clearer and prepares future work such as FB reload
jobs.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: ci: Switch to kernel 5.2-rc2
Tomeu Vizoso [Mon, 20 May 2019 09:33:25 +0000 (11:33 +0200)]
panfrost: ci: Switch to kernel 5.2-rc2

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: ci: Update expectations
Tomeu Vizoso [Fri, 31 May 2019 10:34:16 +0000 (12:34 +0200)]
panfrost: ci: Update expectations

A bunch of tests have been fixed, but some regressions have appeared on
T760.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agoradeonsi/nir: Remove hack for builtins
Connor Abbott [Wed, 29 May 2019 14:03:25 +0000 (16:03 +0200)]
radeonsi/nir: Remove hack for builtins

We now bounds check properly in the uniform loading fast path, so
there's no need to disable it by pretending there are other UBO bindings
in use. The way this looks at the variable name was causing problems
when two piglit shaders, one with a name that triggered the hack and one
that didn't, got hashed to the same thing after stripping out the names.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoradeonsi/nir: Use correct location for uniform access bound
Connor Abbott [Wed, 29 May 2019 15:09:45 +0000 (17:09 +0200)]
radeonsi/nir: Use correct location for uniform access bound

location is the API-level location, but driver_location is the actual
location the uniform gets passed to the driver. This apparently only
caused failures with builtins, where the location is 0 because it's
represented via the state tokens instead.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoradeonsi/nir: Correctly handle double TCS/TES varyings
Connor Abbott [Wed, 29 May 2019 13:48:06 +0000 (15:48 +0200)]
radeonsi/nir: Correctly handle double TCS/TES varyings

ac expands the store to 32-bit components for us, but we still have to
deal with storing up to 8 components, and when a varying is split across
two vec4 slots we have to calculate the address again for the second
slot, since they aren't adjacent in memory. I didn't do this on the ac
level because we should generate better indexing arithmetic for the lds
store, where slots are contiguous.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoetnaviv: blt: s/TRUE/true && s/FALSE/false
Christian Gmeiner [Fri, 24 May 2019 10:45:20 +0000 (12:45 +0200)]
etnaviv: blt: s/TRUE/true && s/FALSE/false

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agoetnaviv: rs: s/TRUE/true && s/FALSE/false
Christian Gmeiner [Fri, 24 May 2019 10:45:19 +0000 (12:45 +0200)]
etnaviv: rs: s/TRUE/true && s/FALSE/false

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agonir: Actually propagate progress in nir_opt_move_load_ubo.
Bas Nieuwenhuizen [Thu, 30 May 2019 20:48:46 +0000 (22:48 +0200)]
nir: Actually propagate progress in nir_opt_move_load_ubo.

Found with Jasons new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).

Fixes: af355aaa071 "nir: add nir_opt_move_load_ubo() optimization pass"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoradv: use RADV_CMD_DIRTY_DYNAMIC_* when restoring viewport/scissor
Samuel Pitoiset [Thu, 30 May 2019 10:29:40 +0000 (12:29 +0200)]
radv: use RADV_CMD_DIRTY_DYNAMIC_* when restoring viewport/scissor

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: use CmdPushConstants when restoring constants after meta operations
Samuel Pitoiset [Thu, 30 May 2019 10:29:39 +0000 (12:29 +0200)]
radv: use CmdPushConstants when restoring constants after meta operations

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/split_vars: Properly bail in the presence of complex derefs
Jason Ekstrand [Wed, 22 May 2019 20:54:39 +0000 (15:54 -0500)]
nir/split_vars: Properly bail in the presence of complex derefs

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir/vars_to_ssa: Properly ignore variables with complex derefs
Jason Ekstrand [Mon, 4 Mar 2019 18:12:48 +0000 (12:12 -0600)]
nir/vars_to_ssa: Properly ignore variables with complex derefs

Because the core principle of the vars_to_ssa pass is that it globally
(within a function) looks at all of the uses of a never-indirected path
and does a full into-SSA on that path, it can't handle a path which has
any chance of having aliasing.  If a function_temp variable has a cast
or anything else which may cause aliasing, we have to assume that all
paths to that variable may alias and ignore the entire variable.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir/vars_to_ssa: Use a non-null UNDEF_NODE pointer
Jason Ekstrand [Wed, 22 May 2019 23:01:14 +0000 (18:01 -0500)]
nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer

We're about to change the meaning of get_deref_node returning NULL so we
need a non-NULL value to mean properly undefined.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir/deref: Add a has_complex_use helper
Jason Ekstrand [Wed, 22 May 2019 20:00:20 +0000 (15:00 -0500)]
nir/deref: Add a has_complex_use helper

This lets passes easily detect derefs which have uses that fall outside
the standard load/store/copy pattern so they can bail appropriately.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir/dead_cf: Call instructions aren't dead
Jason Ekstrand [Thu, 23 May 2019 03:13:15 +0000 (22:13 -0500)]
nir/dead_cf: Call instructions aren't dead

When we inlined cf_node_has_side_effects into node_is_dead, all the
conditions flipped and we forgot to flip one.  Fortunately, it doesn't
matter right now because no one uses this pass on shaders with more than
one function.

Fixes: b50465d197 "nir/dead_cf: Inline cf_node_has_side_effects"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agovtn: create cast with type stride.
Dave Airlie [Mon, 27 May 2019 01:07:52 +0000 (11:07 +1000)]
vtn: create cast with type stride.

When creating function parameters, we create pointers from ssa
values, this creates nir casts with stride 0, however we have
no where else to get this value from. Later passes to lower
explicit io need this stride value to do the right thing.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
5 years agolist: add some iterator debug
Rob Clark [Sat, 25 May 2019 17:50:41 +0000 (10:50 -0700)]
list: add some iterator debug

Debugging use of unsafe iterators when you should have used the _safe
version sucks.  Add some DEBUG build support to catch and assert if
someone does that.

I didn't update the UPPERCASE verions of the iterators.  They should
probably be deprecated/removed.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
5 years agonir: Accept nir_var_mem_global in derefs used by phis
Caio Marcelo de Oliveira Filho [Thu, 30 May 2019 20:49:19 +0000 (13:49 -0700)]
nir: Accept nir_var_mem_global in derefs used by phis

This mode is used by PhysicalStorageBufferEXT storage class.

Fixes: 8bdf5a008b3 "nir: Allow derefs to be used as phi sources"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agointel/blorp: Use the hardware op for CCS ambiguate on gen10+
Jason Ekstrand [Tue, 15 May 2018 22:28:05 +0000 (15:28 -0700)]
intel/blorp: Use the hardware op for CCS ambiguate on gen10+

Cannonlake hardware adds a new resolve type in 3DSTATE_PS called
FAST_CLEAR_0 which does an ambiguate.  Now that the hardware can do it
directly, we should use that instead of binding the CCS as a render
target and doing it manually.  This was tested with a full Vulkan CTS
run on Cannonlake.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
5 years agoswr/rast: Enable ARB_GL_texture_buffer_range
Jan Zielinski [Mon, 27 May 2019 12:21:49 +0000 (14:21 +0200)]
swr/rast: Enable ARB_GL_texture_buffer_range

No significant changes in the code needed to enable
the extension. Just updating SWR capabilities
and the documentation

Reviewed-by: Alok Hota <alok.hota@intel.com>
5 years agoswr/rast: fix 32-bit compilation on Linux
Jan Zielinski [Mon, 27 May 2019 12:15:53 +0000 (14:15 +0200)]
swr/rast: fix 32-bit compilation on Linux

Removing unused but problematic code from simdlib header to fix
compilation problem on 32-bit Linux.

Reviewed-by: Alok Hota <alok.hota@intel.com>
5 years agointel/fs: Do a stalling MFENCE in endInvocationInterlock()
Jason Ekstrand [Wed, 22 May 2019 17:36:17 +0000 (12:36 -0500)]
intel/fs: Do a stalling MFENCE in endInvocationInterlock()

Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agointel/fs,vec4: Use g0 as the header for MFENCE
Jason Ekstrand [Wed, 22 May 2019 17:20:01 +0000 (12:20 -0500)]
intel/fs,vec4: Use g0 as the header for MFENCE

We set header_present but then pass it some random garbage.  Give it g0
instead.  I'm not actually sure this does anything but g0 is the usual
header data and this is what the windows driver does so it seems like a
good idea.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoradv: enable transformFeedbackStreamsLinesTriangles
Samuel Pitoiset [Thu, 30 May 2019 08:08:48 +0000 (10:08 +0200)]
radv: enable transformFeedbackStreamsLinesTriangles

The driver should already support this without any changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: implement VK_EXT_sample_locations and disable it
Samuel Pitoiset [Thu, 16 May 2019 09:55:02 +0000 (11:55 +0200)]
radv: implement VK_EXT_sample_locations and disable it

Basically, this extension allows applications to use custom
sample locations. It doesn't support variable sample locations
during subpass. Note that we don't have to upload the user
sample locations because the spec doesn't allow this.

The extension is currently disabled because the driver needs to
support variable sample locations during layout transitions. The
depth decompress needs to know them and that's a bit invasive.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoiris: Avoid holding the lock while allocating pages.
Kenneth Graunke [Thu, 30 May 2019 07:04:38 +0000 (00:04 -0700)]
iris: Avoid holding the lock while allocating pages.

We only need the lock for:
1. Rummaging through the cache
2. Allocating VMA

We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also
SET_DOMAIN to allocate the underlying pages.  The idea behind calling
SET_DOMAIN was to avoid a lock in the kernel while allocating pages,
now we avoid our own global lock as well.

We do have to re-lock around VMA.  Hopefully this shouldn't happen too
much in practice because we'll find a cached BO in the right memzone
and not have to reallocate it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
5 years agoiris: Move SET_DOMAIN to alloc_fresh_bo()
Kenneth Graunke [Thu, 30 May 2019 06:40:20 +0000 (23:40 -0700)]
iris: Move SET_DOMAIN to alloc_fresh_bo()

Chris pointed out that the order between SET_DOMAIN and SET_TILING
doesn't matter, so we can just do the page allocation when creating
a new BO.  Simplifies the flow a bit.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
5 years agoiris: Be lazy about cleaning up purged BOs in the cache.
Kenneth Graunke [Thu, 30 May 2019 06:20:31 +0000 (23:20 -0700)]
iris: Be lazy about cleaning up purged BOs in the cache.

Mathias Fröhlich reported that commit 6244da8e23e5470d067680 crashes.
list_for_each_entry_safe is safe against removing the current entry,
but iris_bo_cache_purge_bucket was potentially removing next entries
too, which broke our saved next pointer.

To fix this, don't bother with the iris_bo_cache_purge_bucket step.
We just detected a single entry where the kernel has purged the BO's
memory, and so it isn't a usable entry for our cache.  We're about to
continue the search with the next BO.  If that one's purged, we'll
clean it up too.  And so on.

We may miss cleaning up purged BOs that are further down the list
after non-purged BOs...but that's probably fine.  We still have the
time-based cleaner (cleanup_bo_cache) which will take care of them
eventually, and the kernel's already freed their memory, so it's not
that harmful to have a few kicking around a little longer.

Fixes: 6244da8e23e iris: Dig through the cache to find a BO in the right memzone
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
5 years agoiris: Dig through the cache to find a BO in the right memzone
Kenneth Graunke [Mon, 27 May 2019 00:11:59 +0000 (17:11 -0700)]
iris: Dig through the cache to find a BO in the right memzone

This saves some util_vma thrash when the first entry in the cache
happens to be in a different memory zone, but one just a tiny bit
ahead is already there and instantly reusable.  Hopefully the cost
of a little extra searching won't break the bank - if it does, we
can consider having separate list heads or keeping a separate VMA
cache.

Improves OglDrvRes performance by 22%, restoring a regression from
deleting the bucket allocators in 694d1a08d3e5883d97d5352895f8431f.

Thanks to Clayton Craft for alerting me to the regression.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Tidy BO sizing code and comments
Kenneth Graunke [Sun, 26 May 2019 23:21:55 +0000 (16:21 -0700)]
iris: Tidy BO sizing code and comments

Buckets haven't been power of two sized in over a decade.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Move some field setting after we drop the lock.
Kenneth Graunke [Sun, 26 May 2019 23:11:46 +0000 (16:11 -0700)]
iris: Move some field setting after we drop the lock.

It's not much, but we may as well hold the lock for a bit less time.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Move cached BO allocation into a helper function.
Kenneth Graunke [Sun, 26 May 2019 22:52:56 +0000 (15:52 -0700)]
iris: Move cached BO allocation into a helper function.

There's enough going on here to warrant a helper.  This also simplifies
the control flow and eliminates the last non-error-case goto.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Fall back to fresh allocations of mapping for zero-memset fails.
Kenneth Graunke [Sun, 26 May 2019 20:48:42 +0000 (13:48 -0700)]
iris: Fall back to fresh allocations of mapping for zero-memset fails.

It is unlikely that we would fail to map a cached BO in order to zero
its contents.  When we did, we would free the first BO in the cache and
try again with the second.  It's possible that this next BO already had
a map setup, in which case we'd succeed.  But if it didn't, we'd likely
fail again in the same manner.

There's not much point in optimizing this case (and frankly, if we're
out of CPU-side VMA we should probably dump the cache entirely)...so
instead, just fall back to allocating a fresh BO from the kernel which
will already be zeroed so we don't have to try and map it.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Move fresh BO allocation into a helper function.
Kenneth Graunke [Sun, 26 May 2019 20:43:32 +0000 (13:43 -0700)]
iris: Move fresh BO allocation into a helper function.

There's enough going on here to warrant a helper.  More cleaning coming.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Do SET_TILING at a single point rather than in two places.
Kenneth Graunke [Sun, 26 May 2019 20:34:28 +0000 (13:34 -0700)]
iris: Do SET_TILING at a single point rather than in two places.

Both the from-cache and fresh-from-GEM cases were calling SET_TILING.
In the cached case, we would retry the allocation on failure, pitching
one BO from the cache each time.  This is silly, because the only time
it should fail is if the tiling or stride parameters are unacceptable,
which has nothing to do with the particular BO in question.  So there's
no point in retrying - we should simply fail the allocation.

This patch moves both calls to bo_set_tiling_internal() below the
cache/fresh split, so we have it at a single point in time instead
of two.

To preserve the ordering between SET_TILING and SET_DOMAIN, we move
that below as well.  (I am unsure if the order matters.)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Use the BO cache even for coherent buffers on non-LLC.
Kenneth Graunke [Sun, 26 May 2019 20:03:20 +0000 (13:03 -0700)]
iris: Use the BO cache even for coherent buffers on non-LLC.

We mark snooped BOs as non-reusable, so we never return them to the
cache.  This means that we'd need to call I915_GEM_SET_CACHING to make
any BO we find in the cache snooped.  But then again, any BO we freshly
allocate from the kernel will also be non-snooped, so it has the same
issue.  There's really no reason to skip the cache - we may as well use
it to avoid the I915_GEM_CREATE overhead.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Fix locking around vma_alloc in iris_bo_create_userptr
Kenneth Graunke [Sun, 26 May 2019 22:57:55 +0000 (15:57 -0700)]
iris: Fix locking around vma_alloc in iris_bo_create_userptr

util_vma needs to be protected by a lock.  All other callers of
vma_alloc and vma_free appear to be holding a lock already.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Fix lock/unlock mismatch for non-LLC coherent BO allocation.
Kenneth Graunke [Sun, 26 May 2019 19:58:42 +0000 (12:58 -0700)]
iris: Fix lock/unlock mismatch for non-LLC coherent BO allocation.

The goto jumped over the mtx_lock, but proceeded to hit the mtx_unlock.
We can simply set the bucket to NULL and it will skip the cache without
goto, and without messing up locking.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoradeonsi: fix timestamp queries for compute-only contexts
Marek Olšák [Mon, 27 May 2019 20:09:33 +0000 (16:09 -0400)]
radeonsi: fix timestamp queries for compute-only contexts

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
5 years agoChange a few frequented uses of DEBUG to !NDEBUG
Marek Olšák [Fri, 10 May 2019 01:04:23 +0000 (21:04 -0400)]
Change a few frequented uses of DEBUG to !NDEBUG

debugoptimized builds don't define NDEBUG, but they also don't define
DEBUG. We want to enable cheap debug code for these builds.
I only chose those occurences that I care about.

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
5 years agoiris: Re-emit Surface State Base Address when context is lost.
Kenneth Graunke [Wed, 29 May 2019 21:48:41 +0000 (14:48 -0700)]
iris: Re-emit Surface State Base Address when context is lost.

When we hit a GPU hang, we failed to reset Surface State Base Address
right away, and would keep hanging until we filled up the binder.  Then
we'd finally get it right after a lot of repeated stumbles.  Update it
right away so we hopefully hang fewer times before succeeding.

5 years agoiris: Enable nir_opt_large_constants
Jason Ekstrand [Tue, 28 May 2019 22:33:58 +0000 (17:33 -0500)]
iris: Enable nir_opt_large_constants

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15306230 -> 15304726 (<.01%)
    instructions in affected programs: 4570 -> 3066 (-32.91%)
    helped: 16
    HURT: 0

    total cycles in shared programs: 361703436 -> 361680041 (<.01%)
    cycles in affected programs: 129388 -> 105993 (-18.08%)
    helped: 16
    HURT: 0

    LOST:   0
    GAINED: 2

The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal
Space Program

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Don't assume UBO indices are constant
Jason Ekstrand [Wed, 29 May 2019 02:56:04 +0000 (21:56 -0500)]
iris: Don't assume UBO indices are constant

It will be true for the constant/system value buffer because they use a
constant zero but it's not true in general.  If we ever got here when
the source wasn't constant, nir_src_as_uint would assert.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
5 years agoiris: Move upload_ubo_ssbo_surf_state to iris_program.c
Jason Ekstrand [Tue, 28 May 2019 22:52:58 +0000 (17:52 -0500)]
iris: Move upload_ubo_ssbo_surf_state to iris_program.c

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agonir: silence three compiler warnings seen with MinGW
Brian Paul [Mon, 20 May 2019 13:33:14 +0000 (07:33 -0600)]
nir: silence three compiler warnings seen with MinGW

Silence two unused var warnings.  And init elem_size, elem_align to
zero to silence "maybe uninitialized" warnings.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agosvga: clamp max_const_buffers to SVGA_MAX_CONST_BUFS
Brian Paul [Mon, 20 May 2019 12:24:06 +0000 (06:24 -0600)]
svga: clamp max_const_buffers to SVGA_MAX_CONST_BUFS

In case the device reports 15 (or more) buffers.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
5 years agoiris: Clone before calling nir_strip and serializing
Kenneth Graunke [Tue, 28 May 2019 22:39:24 +0000 (15:39 -0700)]
iris: Clone before calling nir_strip and serializing

This is non-destructive and leaves the debugging information in place.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>