mesa.git
5 years agoradv: Use bo metadata for imported image tiling on Android.
Bas Nieuwenhuizen [Mon, 13 May 2019 12:09:55 +0000 (14:09 +0200)]
radv: Use bo metadata for imported image tiling on Android.

This way we handle linear images etc. correctly.

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agovl: Enable DRM by default.
Bas Nieuwenhuizen [Thu, 30 May 2019 18:34:06 +0000 (20:34 +0200)]
vl: Enable DRM by default.

If libdrm is found the pipe loader enables drm anyway, and that is
pretty much the only extra dependency this code has.

This enables creating libva display using a drm fd without having
to enable the DRM (GBM really) backend of EGL, which is completely
unrelated.

Leaving the X11 platforms alone as they would still result in the
additional inclusion of extra deps.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoanv: Advertise support for VK_EXT_fragment_shader_interlock
Jason Ekstrand [Fri, 17 May 2019 16:33:23 +0000 (11:33 -0500)]
anv: Advertise support for VK_EXT_fragment_shader_interlock

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agospirv: Implement SPV_EXT_fragment_shader_interlock
Jason Ekstrand [Fri, 17 May 2019 16:32:10 +0000 (11:32 -0500)]
spirv: Implement SPV_EXT_fragment_shader_interlock

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agospirv: Update the headers from latest Khronos master
Jason Ekstrand [Fri, 17 May 2019 16:20:13 +0000 (11:20 -0500)]
spirv: Update the headers from latest Khronos master

This corresponds to 8b911bd2ba37677037b38c9bd286c7c05701bcda in
https://github.com/KhronosGroup/SPIRV-Headers.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agovulkan: Update the XML and headers to 1.1.110
Jason Ekstrand [Fri, 17 May 2019 16:01:20 +0000 (11:01 -0500)]
vulkan: Update the XML and headers to 1.1.110

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoac/nir: mark some texture intrinsics as convergent
Rhys Perry [Wed, 29 May 2019 15:07:44 +0000 (16:07 +0100)]
ac/nir: mark some texture intrinsics as convergent

Otherwise LLVM can sink them and their texture coordinate calculations
into divergent branches.

v2: simplify the conditions on which the intrinsic is marked as convergent
v3: only mark as convergent in FS and CS with derivative groups

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradv: fix some compiler warnings
Rhys Perry [Thu, 30 May 2019 14:55:11 +0000 (15:55 +0100)]
radv: fix some compiler warnings

Fixes -Woverflow warnings with GCC 9.1.1

v2: use a cast instead of a bitwise and

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agointel/fs: Skip registers faster when setting spill costs
Jason Ekstrand [Mon, 3 Jun 2019 22:09:12 +0000 (17:09 -0500)]
intel/fs: Skip registers faster when setting spill costs

This might be slightly faster since we're doing one read rather than
two before we decide to skip.  The more important reason, however, is
because no_spill prevents us from re-spilling spill registers.  In the
new world in which we don't re-calculate liveness every spill, we may
not have valid liveness for spill registers so we shouldn't even look
their live ranges up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110825
Fixes: e99081e76d4 "intel/fs/ra: Spill without destroying the..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
5 years agoradeonsi/nir: Fix type in bindless address computation
Connor Abbott [Fri, 24 May 2019 13:08:06 +0000 (15:08 +0200)]
radeonsi/nir: Fix type in bindless address computation

Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoetnaviv: implement set_active_query_state(..) for hw queries
Christian Gmeiner [Tue, 28 May 2019 19:43:51 +0000 (21:43 +0200)]
etnaviv: implement set_active_query_state(..) for hw queries

Clear w/ quad uses a normal draw which adds up to OQ. st/meta
uses set_active_query_state(..) to tell the driver to pause
queries in such cases.

Fixes spec@arb_occlusion_query@occlusion_query_meta_save piglit.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agoradv: do not use gfx fast depth clears for layered depth/stencil images
Samuel Pitoiset [Mon, 3 Jun 2019 15:52:56 +0000 (17:52 +0200)]
radv: do not use gfx fast depth clears for layered depth/stencil images

The driver should only fast depth clears with the graphics path
when the view covers all image layers, otherwise this might
corrupt layers when HTILE is enabled.

Cc: 19.0 19.1 mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoac,radv: do not emit vec3 for raw load/store on SI
Samuel Pitoiset [Mon, 3 Jun 2019 13:09:38 +0000 (15:09 +0200)]
ac,radv: do not emit vec3 for raw load/store on SI

It's unsupported, only load/store format with vec3 are supported.

Fixes: 6970a9a6ca9 ("ac,radv: remove the vec3 restriction with LLVM 9+")"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agointel/compiler: Fix assertions in brw_alu3
Sagar Ghuge [Tue, 16 Apr 2019 06:26:47 +0000 (23:26 -0700)]
intel/compiler: Fix assertions in brw_alu3

v2: Fix assertion for src1 (Ian Romanick)

Fixes: 3b967e17 (intel/compiler: Avoid false positive assertions)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agoiris: Fix SO stride units for DrawTransformFeedback
Kenneth Graunke [Mon, 3 Jun 2019 23:52:59 +0000 (16:52 -0700)]
iris: Fix SO stride units for DrawTransformFeedback

Mesa measures in DWords.  The hardware also claims to measure in DWords.
Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ.
Which means that it really measures in bytes.  So, convert to bytes.

Without this, our offset / stride denominator was 1/4th the size it
should be, leading to 4x the vertex count that we should have had.

Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers

5 years agost/glsl: make sure to propagate initialisers to driver storage
Timothy Arceri [Wed, 29 May 2019 03:13:44 +0000 (13:13 +1000)]
st/glsl: make sure to propagate initialisers to driver storage

This essentially reverts 20234cfe3a20.

Fixes piglit test:
tests/spec/arb_get_program_binary/execution/uniform-after-restore.shader_test

Fixes: 20234cfe3a20 "st/mesa: don't propagate uniforms when restoring from cache"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110784

5 years agospirv: Like Uniform, do nothing for UniformId
Caio Marcelo de Oliveira Filho [Fri, 26 Apr 2019 20:21:56 +0000 (13:21 -0700)]
spirv: Like Uniform, do nothing for UniformId

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agospirv: Implement SpvOpCopyLogical
Caio Marcelo de Oliveira Filho [Thu, 25 Apr 2019 08:30:24 +0000 (01:30 -0700)]
spirv: Implement SpvOpCopyLogical

This is the same as SpvOpCopyObject but without the type checking,
which is how vtn_composite_copy works, so we just need to hook the
operation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agospirv: Generalize OpSelect
Caio Marcelo de Oliveira Filho [Mon, 22 Apr 2019 16:45:53 +0000 (09:45 -0700)]
spirv: Generalize OpSelect

SPIR-V 1.4 supports OpSelect over any composite type, and also allows
scalar boolean condition for vector types -- a case which we already
handled to support old GLSLang.

Added a helper function to recursively perform nir_bcsel, that makes
easier to support structs.

v2: Replace asserts() with vtn_fail_if().  (Jason)

v3: Simplify Condition and Result types verifications.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agospirv: Move OpSelect handling to a function
Caio Marcelo de Oliveira Filho [Mon, 3 Jun 2019 22:30:33 +0000 (15:30 -0700)]
spirv: Move OpSelect handling to a function

This will make a later change easier to review.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir/vars_to_ssa: Handle UNDEF_NODE in more places
Caio Marcelo de Oliveira Filho [Mon, 3 Jun 2019 21:13:16 +0000 (14:13 -0700)]
nir/vars_to_ssa: Handle UNDEF_NODE in more places

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110832
Fixes: 911ea2c66fc "nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoac/registers: don't use the si, cik, vi names, use gfxN
Marek Olšák [Mon, 3 Jun 2019 21:07:16 +0000 (17:07 -0400)]
ac/registers: don't use the si, cik, vi names, use gfxN

trivial

5 years agoamd/common: use generated register header
Nicolai Hähnle [Mon, 6 May 2019 23:08:43 +0000 (01:08 +0200)]
amd/common: use generated register header

5 years agoamd/common: use SH{0,1}_CU_EN definitions only of COMPUTE_STATIC_THREAD_MGMT_SE0
Nicolai Hähnle [Mon, 6 May 2019 23:46:28 +0000 (01:46 +0200)]
amd/common: use SH{0,1}_CU_EN definitions only of COMPUTE_STATIC_THREAD_MGMT_SE0

The automatic header generation unifies identical registers in a series
and only emits definitions for the first one. This is mostly to avoid
emitting excessive definitions for CB registers, but special-casing
an exception for this family of registers doesn't seem worth it.

5 years agoamd/common: unify PITCH_GFX6 and PITCH_GFX9
Nicolai Hähnle [Mon, 6 May 2019 23:44:52 +0000 (01:44 +0200)]
amd/common: unify PITCH_GFX6 and PITCH_GFX9

The definition of the fields differs, but PITCH_GFX9 is a mere extension
of PITCH_GFX6 that does not conflict with any other fields.

This aligns the definitions with what will be generated from the
register JSON.

The information about how large the fields really are is preserved in
the register database.

5 years agoamd/common: rename R_3F2_CONTROL to IB_CONTROL for disambiguation
Nicolai Hähnle [Mon, 6 May 2019 12:47:40 +0000 (14:47 +0200)]
amd/common: rename R_3F2_CONTROL to IB_CONTROL for disambiguation

This "register" name collides with R_370_CONTROL.

This aligns the definitions with what will be generated from the
register JSON.

5 years agoamd/common: cleanup DATA_FORMAT/NUM_FORMAT field names
Nicolai Hähnle [Mon, 13 Nov 2017 15:35:59 +0000 (16:35 +0100)]
amd/common: cleanup DATA_FORMAT/NUM_FORMAT field names

The field layout wasn't actually changed in gfx9, so having the suffix
isn't very useful. The field *contents* were changed, but this is
reflected in the V_xxx_xxx definitions and is taken into account by
the ac_debug logic based on the register JSON.

This aligns the definitions with what will be generated from the
register JSON.

5 years agoamd/common: derive ac_debug tables from register JSON
Nicolai Hähnle [Mon, 6 May 2019 22:20:23 +0000 (00:20 +0200)]
amd/common: derive ac_debug tables from register JSON

5 years agoamd/registers: add JSON description of packet3 fields
Nicolai Hähnle [Mon, 6 May 2019 12:48:58 +0000 (14:48 +0200)]
amd/registers: add JSON description of packet3 fields

5 years agoamd/registers: add JSON descriptions of registers
Nicolai Hähnle [Mon, 6 May 2019 08:33:05 +0000 (10:33 +0200)]
amd/registers: add JSON descriptions of registers

The descriptions are mostly derived from parsing the existing
register headers.

5 years agoamd/registers: scripts for processing register descriptions in JSON
Nicolai Hähnle [Mon, 6 May 2019 08:31:19 +0000 (10:31 +0200)]
amd/registers: scripts for processing register descriptions in JSON

We will derive both the debugging tables and (the majority of) the
register headers from descriptions in JSON, instead of deriving the
debugging tables from an awkward parsing of the register headers.

Some of the scripts are useful for maintaining the register database
itself. The scripts are designed to output reasonably readable JSON
by default.

5 years agofreedreno: Fix GCC build error.
Vinson Lee [Thu, 30 May 2019 21:47:37 +0000 (14:47 -0700)]
freedreno: Fix GCC build error.

../src/freedreno/vulkan/tu_device.c:900:4: error: initializer element is not constant
    .minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 },
    ^

Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110698
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agomesa: Use string literals for format strings
Mark Janes [Mon, 3 Jun 2019 22:42:22 +0000 (15:42 -0700)]
mesa: Use string literals for format strings

Android build settings require format strings to be string literals.

Fixes: d2906293c43 "mesa: EXT_dsa add selectorless matrix stack functions"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110833
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Always reserve binding table space for NIR constants
Caio Marcelo de Oliveira Filho [Wed, 29 May 2019 22:30:51 +0000 (15:30 -0700)]
iris: Always reserve binding table space for NIR constants

Don't have a separate mechanism for NIR constants to be removed from
the table.  If unused, we will compact it away.  The use_null_surface
is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Print binding tables when INTEL_DEBUG=bt
Caio Marcelo de Oliveira Filho [Thu, 23 May 2019 21:17:41 +0000 (14:17 -0700)]
iris: Print binding tables when INTEL_DEBUG=bt

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Compact binding tables
Caio Marcelo de Oliveira Filho [Thu, 23 May 2019 21:17:59 +0000 (14:17 -0700)]
iris: Compact binding tables

Change the iris_binding_table to keep track of what surfaces are
actually going to be used, then assign binding table indices just for
those.  Reducing unused bytes on those are valuable because we use a
reduced space for those tables in Iris.

The rest of the driver can go from "group indices" (i.e. UBO #2) to
BTI and vice-versa using helper functions.  The value
IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is
not used or a certain BTI is not valid.

The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be
set to skip compacting binding table.

v2: (all from Ken)
    Use BITFIELD64_MASK helper. Improve comments.
    Assert all group is marked as used when we have indirects.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Create an enum for the surface groups
Caio Marcelo de Oliveira Filho [Thu, 23 May 2019 15:44:29 +0000 (08:44 -0700)]
iris: Create an enum for the surface groups

This will make convenient to handle compacting and printing the
binding table.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Handle binding table in the driver
Caio Marcelo de Oliveira Filho [Thu, 23 May 2019 05:17:27 +0000 (22:17 -0700)]
iris: Handle binding table in the driver

Stop using brw_compiler to lower the final binding table indices for
surface access.  This is done by simply not setting the
'prog_data->binding_table.*_start' fields.  Then make the driver
perform this lowering.

This is a better place to perfom the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.

This also prepares us for two changes: use ibc without having to
implement binding table logic there; and remove unused entries from
the binding table.

Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using to index
shs->constbuf.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Pull brw_nir_analyze_ubo_ranges() call out setup_uniforms
Caio Marcelo de Oliveira Filho [Thu, 23 May 2019 08:08:15 +0000 (01:08 -0700)]
iris: Pull brw_nir_analyze_ubo_ranges() call out setup_uniforms

We'll change iris to perform lowering of the binding table indices
earlier (before the backend kick in), but the backend compiler uses
the result of the analysis to identify load_ubo intrinsics, so we do
the analysis after the lowering to have the right indices.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agospirv: Implement OpPtrEqual, OpPtrNotEqual and OpPtrDiff
Caio Marcelo de Oliveira Filho [Thu, 16 May 2019 22:17:31 +0000 (15:17 -0700)]
spirv: Implement OpPtrEqual, OpPtrNotEqual and OpPtrDiff

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Add functions to subtract and compare addresses
Caio Marcelo de Oliveira Filho [Thu, 16 May 2019 22:11:07 +0000 (15:11 -0700)]
nir: Add functions to subtract and compare addresses

v2: Fix comparing addresses from formats that have more than one
    component by using nir_ball_iequal().  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Add nir_ball_iequal() helper
Caio Marcelo de Oliveira Filho [Fri, 31 May 2019 20:48:34 +0000 (13:48 -0700)]
nir: Add nir_ball_iequal() helper

Similar to nir_bany_inequal().  Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agomesa: ARB program parser should clean parameters
Sergii Romantsov [Tue, 28 May 2019 09:24:36 +0000 (12:24 +0300)]
mesa: ARB program parser should clean parameters

Program parser allocates parameter list.
In case of parsing error some variables will not be freed.
Patch adds freeing of it.

Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years agofreedreno/ir3: fix counting and printing for half registers.
Hyunjun Ko [Sat, 4 May 2019 13:23:03 +0000 (13:23 +0000)]
freedreno/ir3: fix counting and printing for half registers.

v2: defining 0x100 and use this for setting the FS_OUTPUT_REG.HALF_PRECISION
Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: Fix up the half reg source even when src instr==NULL
Neil Roberts [Fri, 15 Mar 2019 13:32:27 +0000 (14:32 +0100)]
freedreno/ir3: Fix up the half reg source even when src instr==NULL

Previously the loop for assigning registers was bailing out early if
the register had a null source. I think the intention is that in this
case it isn’t necessary to assign a register. However it was also
missing out the part to fix up the types. This can happen if the
instruction is copy propagated to be a move from a constant half-float
input register. In that case it still needs to fix up the types.

Fixes assert in
dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump

when lowering the precision of the variables.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: Add a 16-bit implementation of nir_op_imul
Neil Roberts [Thu, 28 Feb 2019 15:13:56 +0000 (16:13 +0100)]
freedreno/ir3: Add a 16-bit implementation of nir_op_imul

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: set dst type of alu instructions correctly.
Hyunjun Ko [Tue, 19 Mar 2019 07:17:40 +0000 (07:17 +0000)]
freedreno/ir3: set dst type of alu instructions correctly.

Though it should be fixed in RA pass, it needs to be set correctly from
the beginning according to the bitsize of NIR dest.

v2: Would be better for mad,fddx,fddy to fixup later in RA pass.

[small cleanup of fallout from imov/fmov removal fallout]
Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: adjust the bitsize of regs when an array loading.
Hyunjun Ko [Mon, 22 Apr 2019 06:16:48 +0000 (06:16 +0000)]
freedreno/ir3: adjust the bitsize of regs when an array loading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: convert back to 32-bit values for half constant registers.
Hyunjun Ko [Thu, 21 Mar 2019 08:30:11 +0000 (17:30 +0900)]
freedreno/ir3: convert back to 32-bit values for half constant registers.

It seems to handle only 32-bit values for half constant registers
within floating point opcodes according to the blob driver.
So we need to convert back to 32-bit values from 16-bit values, when a
lower precision pass is in effect.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov.
Hyunjun Ko [Mon, 15 Apr 2019 03:42:23 +0000 (03:42 +0000)]
freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov.

If the type of dest reg and src reg of absneg opcode are different,
it shouldn't be considered as same type mov.

This patch becomes meaningful when we start to use mediump information for
doing precision lowering to 16bit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: set proper dst type for uniform according to the type of nir dest.
Hyunjun Ko [Tue, 26 Feb 2019 08:33:34 +0000 (08:33 +0000)]
freedreno/ir3: set proper dst type for uniform according to the type of nir dest.

eg. uniform mediump vec4 f;

This patch means nothing since there's no mediump lowering pass for now,
but will be meaningful when the pass land in the near future.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION
Neil Roberts [Tue, 4 Dec 2018 17:32:15 +0000 (18:32 +0100)]
freedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION

Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending
on whether half_precision was set in the shader key. With support for
mediump precision, it is possible to have different outputs use
different precisions. That means we can’t have a global shader state
to specify it. Instead it now tries to copy the half-float-ness
from the nir_variable for the output into the ir3_shader_variant. This
is then used to decide whether to set half-precision for each output.

The a6xx version is copied from the a5xx code but it has not been
tested.

v2. [Hyunjun Ko (zzoon@igalia.com)] There's the half flag recently
added, which represents precision based on IR3_REG_HALF. Now use this
flag to avoid duplication.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: Fix loading half-float immediate vectors
Neil Roberts [Sun, 2 Dec 2018 17:15:52 +0000 (18:15 +0100)]
freedreno/ir3: Fix loading half-float immediate vectors

Previously the code to load from a constant instruction was always
using the u32 pointer. If the constant is actually a 16-bit source
this would end up with the wrong values because the pointer would be
offset by the wrong size. This fixes it to use the u16 pointer.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: immediately schedule meta instructions
Rob Clark [Tue, 28 May 2019 16:42:26 +0000 (09:42 -0700)]
freedreno/ir3: immediately schedule meta instructions

The aren't real instructions, and don't change # of live values, so no
point in them competing with real instructions.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: scheduler improvements
Rob Clark [Thu, 30 May 2019 17:44:16 +0000 (10:44 -0700)]
freedreno/ir3: scheduler improvements

For instructions that increase the # of live values, apply a threshold
to avoid scheduling them too early.  And factor the net change of # of
live values that would result from scheduling an instruction, to
prioritize instructions that reduce number of live values as the number
of live values increases.

For manhattan:

  total instructions in shared programs: 27869 -> 28413 (1.95%)
  instructions in affected programs: 26756 -> 27300 (2.03%)
  helped: 102
  HURT: 87

  total full in shared programs: 1903 -> 1719 (-9.67%)
  full in affected programs: 1390 -> 1206 (-13.24%)
  helped: 124
  HURT: 9

The reduction in register usage nets ~20% gain in manhattan.  (So
getting mediump support should be a huge win for gles gfxbench.)

Also significantly helps some of the more complex shadertoy shaders,
like IQ's Piano (32 to 18 regs, doubles fps).

The effect is less pronounced on smaller shaders.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: sched should mark outputs used
Rob Clark [Fri, 24 May 2019 16:19:55 +0000 (09:19 -0700)]
freedreno/ir3: sched should mark outputs used

Account for shader outputs and values live in any direct/indirect
successor block.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agomesa: EXT_dsa add selectorless matrix stack functions
Pierre-Eric Pelloux-Prayer [Tue, 7 May 2019 09:20:51 +0000 (11:20 +0200)]
mesa: EXT_dsa add selectorless matrix stack functions

Allows the legacy matrix stacks to be manipulated without disturbing the
matrix mode selector.

Adapted from a patch from Chris Forbes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: factor out enum -> matrix stack lookup
Pierre-Eric Pelloux-Prayer [Tue, 7 May 2019 09:20:20 +0000 (11:20 +0200)]
mesa: factor out enum -> matrix stack lookup

Split this out from glMatrixMode since we're about to need it
independently for EXT_DSA.

Adapted from Chris Forbes commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: add new EXT_direct_state_access tokens
Timothy Arceri [Thu, 16 Aug 2018 01:20:37 +0000 (11:20 +1000)]
mesa: add new EXT_direct_state_access tokens

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoglapi: add EXT_direct_state_access
Chris Forbes [Thu, 17 Jan 2013 09:02:39 +0000 (22:02 +1300)]
glapi: add EXT_direct_state_access

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: add a list of EXT_direct_state_access to dispatch sanity
Timothy Arceri [Sat, 8 Sep 2018 04:01:16 +0000 (14:01 +1000)]
mesa: add a list of EXT_direct_state_access to dispatch sanity

This extension is huge and this gives us a TODO list of functions
to implement.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradeonsi: init sctx->dma_copy before using it
Pierre-Eric Pelloux-Prayer [Fri, 31 May 2019 12:39:46 +0000 (14:39 +0200)]
radeonsi: init sctx->dma_copy before using it

Commit a1378639ab19 reordered context functions initializations but broke
sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma.

In this case sctx->dma_copy was assigned a value after being used in:
   sctx->b.resource_copy_region = sctx->dma_copy;

This commit moves the FORCE_DMA special case after sctx->dma_copy initialization.

See https://bugs.freedesktop.org/show_bug.cgi?id=110422

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years agod3dadapter9: Revert to old throttling limit value
Axel Davy [Sun, 26 May 2019 20:59:30 +0000 (22:59 +0200)]
d3dadapter9: Revert to old throttling limit value

Recently PIPE_CAP_MAX_FRAMES_IN_FLIGHT was changed from 2
to 1:
20909284f204091757c050aa40cfffaf3f981b9c

No driver seems to overwrite the default value.

One user reports severe regressions for some games.
For now, revert to the value 2 for nine.

Cc: "19.1" mesa-stable@lists.freedesktop.org
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
5 years agoac: use amdgpu-flat-work-group-size
Marek Olšák [Fri, 31 May 2019 19:38:39 +0000 (15:38 -0400)]
ac: use amdgpu-flat-work-group-size

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agou_blitter: don't fail mipmap generation for depth formats containing stencil
Marek Olšák [Mon, 27 May 2019 22:47:31 +0000 (18:47 -0400)]
u_blitter: don't fail mipmap generation for depth formats containing stencil

Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109754

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agoetnaviv: drop a bunch of duplicated gallium PIPE_CAP default code
Christian Gmeiner [Mon, 27 May 2019 18:49:58 +0000 (20:49 +0200)]
etnaviv: drop a bunch of duplicated gallium PIPE_CAP default code

Now that we have the util function for the default values, we can get
rid of the boilerplate.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agoradv: flush pending query reset caches before copying results
Samuel Pitoiset [Fri, 24 May 2019 08:10:08 +0000 (10:10 +0200)]
radv: flush pending query reset caches before copying results

From the Vulkan spec 1.1.108:
   "vkCmdCopyQueryPoolResults is guaranteed to see the effect of
    previous uses of vkCmdResetQueryPool in the same queue, without any
    additional synchronization."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir: copy intrinsic type when lowering load input/uniform and store output
Jonathan Marek [Sun, 2 Jun 2019 19:16:06 +0000 (15:16 -0400)]
nir: copy intrinsic type when lowering load input/uniform and store output

Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
5 years agoac,radv: remove the vec3 restriction with LLVM 9+
Samuel Pitoiset [Thu, 2 May 2019 14:15:03 +0000 (16:15 +0200)]
ac,radv: remove the vec3 restriction with LLVM 9+

This changes requires LLVM r356755.

32706 shaders in 16744 tests
Totals:
SGPRS: 1448848 -> 1455984 (0.49 %)
VGPRS: 1016684 -> 1016220 (-0.05 %)
Spilled SGPRs: 25871 -> 25815 (-0.22 %)
Spilled VGPRs: 122 -> 122 (0.00 %)
Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
Code Size: 55324500 -> 55301152 (-0.04 %) bytes
Max Waves: 235660 -> 235586 (-0.03 %)

Totals from affected shaders:
SGPRS: 293704 -> 300840 (2.43 %)
VGPRS: 246716 -> 246252 (-0.19 %)
Spilled SGPRs: 159 -> 103 (-35.22 %)
Scratch size: 188 -> 180 (-4.26 %) dwords per thread
Code Size: 8653664 -> 8630316 (-0.27 %) bytes
Max Waves: 60811 -> 60737 (-0.12 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agonir: Return nir_type_invalid for non-numeric base types
Caio Marcelo de Oliveira Filho [Fri, 31 May 2019 23:15:02 +0000 (16:15 -0700)]
nir: Return nir_type_invalid for non-numeric base types

Now that the type gathering function look at instructions that might
have other types, return invalid type instead of crashing.  That
invalid will be properly ignored later.

Fixes: c12750527b7 "nir: add type information to load uniform/input and store output intrinsics"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoiris: Drop unused locals from iris_clear.c to avoid warning
Caio Marcelo de Oliveira Filho [Fri, 31 May 2019 21:30:18 +0000 (14:30 -0700)]
iris: Drop unused locals from iris_clear.c to avoid warning

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
5 years agonir: remove bool lowering from lower_int_to_float
Jonathan Marek [Fri, 31 May 2019 20:17:06 +0000 (16:17 -0400)]
nir: remove bool lowering from lower_int_to_float

Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.

Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: fix lower_{int,bool}_to_float for new mov opcode
Jonathan Marek [Fri, 31 May 2019 20:04:10 +0000 (16:04 -0400)]
nir: fix lower_{int,bool}_to_float for new mov opcode

It is treated like the vecN instructions which also have no type.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: add lower_bitshift option
Jonathan Marek [Fri, 31 May 2019 17:54:12 +0000 (13:54 -0400)]
nir: add lower_bitshift option

Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: fix gather_ssa_types
Jonathan Marek [Fri, 31 May 2019 19:08:54 +0000 (15:08 -0400)]
nir: fix gather_ssa_types

Consts and undefs can be used as different types (common with "0" constant)
so don't copy types from consts/undefs, only to them. It doesn't entirely
solve the problem that the type given to the const could be wrong , but
now the only realistic case is with "0" which is the same when casted to
float, so it doesn't matter for lower_int_to_float.

The other change is to get type information for load input/uniform and
store output, and use that to get correct results.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: add type information to load uniform/input and store output intrinsics
Jonathan Marek [Fri, 31 May 2019 17:44:40 +0000 (13:44 -0400)]
nir: add type information to load uniform/input and store output intrinsics

This type information will be used by gather_ssa_types to get usable results

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: improvements to native_integers removal
Jonathan Marek [Wed, 8 May 2019 20:22:45 +0000 (16:22 -0400)]
nir: improvements to native_integers removal

Improvements related to the patch that removed native_integers:
 * In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
 * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agofreedreno/a6xx: add 'type' to shader state key
Rob Clark [Fri, 31 May 2019 15:44:55 +0000 (08:44 -0700)]
freedreno/a6xx: add 'type' to shader state key

We could have identical texture state for both VS and FS.. which would
result in VS state getting created first, and FS state mapping to the
identical cmdstream.  Resulting in VS state getting emitted twice and no
FS state emitted.

Fixes:
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both
  dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: fix constlen versus indirect UBO
Rob Clark [Fri, 31 May 2019 14:40:16 +0000 (07:40 -0700)]
freedreno/ir3: fix constlen versus indirect UBO

If we access the address of the UBO indirectly, and there is no higher
const emitted w/ direct access (like an immediate lowered to uniform)
the assembler won't figure out the correct constlen.

Fixes:
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex
  dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/a6xx: fix GPU crash on small render targets
Rob Clark [Fri, 31 May 2019 14:07:57 +0000 (07:07 -0700)]
freedreno/a6xx: fix GPU crash on small render targets

Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: set more barrier bits
Rob Clark [Thu, 30 May 2019 16:04:57 +0000 (09:04 -0700)]
freedreno/ir3: set more barrier bits

Blob is also setting the .l bit, and it seems to solve some intermittent
failures with a couple of deqp's:

dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i
dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: set (ss) on last_input if ldlv
Rob Clark [Wed, 29 May 2019 20:25:11 +0000 (13:25 -0700)]
freedreno/ir3: set (ss) on last_input if ldlv

It seems like (ei) handling doesn't sync on (ss), so we could end up in
a situation where we release varying storage before an ldlv for flat
shaded varyings completes.  Keep track if we've done an (ss) since the
last ldlv, and if not add (ss) flag to last_input which gets (ei).

Noticed with dEQP-GLES3.functional.fragment_out.random.24 and
dEQP-GLES3.functional.fragment_out.random.27, which previously passed by
luck because ir3_sched ordered instructions in a way that resulted in a
lucky (ss).

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/ir3: add assert
Rob Clark [Wed, 29 May 2019 19:26:08 +0000 (12:26 -0700)]
freedreno/ir3: add assert

The special handling for last_input assumes that all the varying loads
are in the first block.  Add an assert to catch if anyone breaks that
assumption.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil/hash_table: Use fast modulo computation
Connor Abbott [Tue, 21 May 2019 10:56:31 +0000 (12:56 +0200)]
util/hash_table: Use fast modulo computation

While we're here, copy the size table from set.c to get rid of hard tabs
in the hash_table.c version.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/set: Use fast modulo computation
Connor Abbott [Mon, 20 May 2019 13:14:04 +0000 (15:14 +0200)]
util/set: Use fast modulo computation

Compilation times with my shader-db database:

Difference at 95.0% confidence
-1.22312 +/- 0.726033
-0.283979% +/- 0.168254%
(Student's t, pooled s = 1.02177)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil: Add a helper for faster remainders
Connor Abbott [Fri, 29 Mar 2019 14:08:17 +0000 (15:08 +0100)]
util: Add a helper for faster remainders

This should be at least as fast as using fast_idiv_by_const, and has the
advantage that the precomputation is simple enough to be evaluated at
Mesa-compile time for hash tables and sets which have a fixed table of
possible divisors.

Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/hash_table: Add specialized resizing add function
Connor Abbott [Tue, 21 May 2019 10:36:56 +0000 (12:36 +0200)]
util/hash_table: Add specialized resizing add function

To keep it in sync with the set implementation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/set: Add specialized resizing add function
Connor Abbott [Mon, 20 May 2019 12:59:40 +0000 (14:59 +0200)]
util/set: Add specialized resizing add function

A significant portion of the time spent in nir_opt_cse for the Dolphin
ubershaders was in resizing the set. When resizing a hash table, we know
in advance that each new element to be inserted will be different from
every other element, so we don't have to compare them, and there will be
no tombstone elements, so we don't have to worry about caching the
first-seen tombstone. We add a specialized add function which skips
these steps entirely, speeding up resizing.

Compile-time results from my shader-db database:

Difference at 95.0% confidence
-2.29143 +/- 0.845534
-0.529475% +/- 0.194767%
(Student's t, pooled s = 1.08807)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/hash_table: Pull out loop-invariant computations
Connor Abbott [Tue, 21 May 2019 10:21:53 +0000 (12:21 +0200)]
util/hash_table: Pull out loop-invariant computations

To keep the set and hash table in sync. Note that some of this had
already been done for hash tables, in particular pulling out the
hash % ht->size computation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/set: Pull out loop-invariant computations
Connor Abbott [Mon, 20 May 2019 12:58:06 +0000 (14:58 +0200)]
util/set: Pull out loop-invariant computations

Unfortunately GCC can't do this for us, probably because we call the key
comparison function which GCC can't prove won't modify arbitrary memory.
This is a pretty hot function, so do the optimization manually to be
sure the compiler will get it right.

While we're here, make the computation of the new probe address use a
single conditional subtract instead of a modulo, since we know that it
won't ever get as big as 2 * ht->size before the modulo. Modulos tend to
be pretty expensive operations.

shader-db compile time results for my database:

Difference at 95.0% confidence
-2.24934 +/- 0.69897
-0.516296% +/- 0.159993%
(Student's t, pooled s = 0.983684)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir/instr_set: Use _mesa_set_search_or_add()
Connor Abbott [Wed, 27 Mar 2019 11:11:36 +0000 (12:11 +0100)]
nir/instr_set: Use _mesa_set_search_or_add()

Before this change, we were searching for each instruction twice, once
when checking if it exists and once when figuring out where to insert
it. By using the new function, we can do everything we need to do in one
operation.

Compilation time numbers for my shader-db database:

Difference at 95.0% confidence
-4.04706 +/- 0.669508
-0.922142% +/- 0.151948%
(Student's t, pooled s = 0.95824)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil/set: Add a _mesa_set_search_or_add() function
Connor Abbott [Wed, 27 Mar 2019 11:00:54 +0000 (12:00 +0100)]
util/set: Add a _mesa_set_search_or_add() function

Unlike _mesa_set_search_and_add(), it doesn't replace an entry if it's
found, returning it instead. This is useful for nir_instr_set, where
we have to know both the original original instruction and its
equivalent.

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agofreedreno/ir3: fix input ncomp for vertex shaders
Jonathan Marek [Fri, 31 May 2019 16:17:06 +0000 (12:17 -0400)]
freedreno/ir3: fix input ncomp for vertex shaders

ncomp is never set for vertex shaders, but a3xx and a4xx still use it.

Fixes: 831f1a05c0d freedreno/ir3: rework varying packing
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
5 years agointel/compiler: Use compare rematerialization pass
Ian Romanick [Mon, 20 May 2019 18:24:57 +0000 (11:24 -0700)]
intel/compiler: Use compare rematerialization pass

Almost all of the spill / fill benefit is in Deus Ex.

Haswell and all Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17196395 (-0.16%)
instructions in affected programs: 1518658 -> 1490615 (-1.85%)
helped: 1550
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2
helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45%
HURT stats (abs)   min: 5 max: 10 x̄: 6.67 x̃: 5
HURT stats (rel)   min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32%
95% mean confidence interval for instructions value: -19.86 -16.26
95% mean confidence interval for instructions %-change: -1.19% -1.04%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361288721 (-0.05%)
cycles in affected programs: 197367688 -> 197187954 (-0.09%)
helped: 990
HURT: 683
helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16
helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26%
HURT stats (abs)   min: 1 max: 12190 x̄: 905.14 x̃: 22
HURT stats (rel)   min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47%
95% mean confidence interval for cycles value: -315.45 100.58
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 12147 -> 8948 (-26.34%)
spills in affected programs: 5433 -> 2234 (-58.88%)
helped: 343
HURT: 0

total fills in shared programs: 25262 -> 21814 (-13.65%)
fills in affected programs: 7771 -> 4323 (-44.37%)
helped: 343
HURT: 3

LOST:   0
GAINED: 17

Ivy Bridge
total instructions in shared programs: 12083517 -> 12081427 (-0.02%)
instructions in affected programs: 540744 -> 538654 (-0.39%)
helped: 786
HURT: 29
helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31%
95% mean confidence interval for instructions value: -2.83 -2.30
95% mean confidence interval for instructions %-change: -0.57% -0.47%
Instructions are helped.

total cycles in shared programs: 180153463 -> 180124798 (-0.02%)
cycles in affected programs: 72597920 -> 72569255 (-0.04%)
helped: 572
HURT: 249
helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13
helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26%
HURT stats (abs)   min: 1 max: 11060 x̄: 136.37 x̃: 10
HURT stats (rel)   min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32%
95% mean confidence interval for cycles value: -96.22 26.39
95% mean confidence interval for cycles %-change: -0.43% -0.23%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3625 -> 3623 (-0.06%)
spills in affected programs: 46 -> 44 (-4.35%)
helped: 1
HURT: 0

total fills in shared programs: 4065 -> 4061 (-0.10%)
fills in affected programs: 104 -> 100 (-3.85%)
helped: 1
HURT: 0

LOST:   0
GAINED: 8

Sandy Bridge
total instructions in shared programs: 10879656 -> 10878699 (<.01%)
instructions in affected programs: 275167 -> 274210 (-0.35%)
helped: 544
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25%
95% mean confidence interval for instructions value: -1.97 -1.55
95% mean confidence interval for instructions %-change: -0.43% -0.36%
Instructions are helped.

total cycles in shared programs: 154089096 -> 154081132 (<.01%)
cycles in affected programs: 4422722 -> 4414758 (-0.18%)
helped: 459
HURT: 214
helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8
helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14%
HURT stats (abs)   min: 1 max: 226 x̄: 19.99 x̃: 4
HURT stats (rel)   min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09%
95% mean confidence interval for cycles value: -15.51 -8.15
95% mean confidence interval for cycles %-change: -0.31% -0.17%
Cycles are helped.

total spills in shared programs: 2880 -> 2876 (-0.14%)
spills in affected programs: 636 -> 632 (-0.63%)
helped: 2
HURT: 0

total fills in shared programs: 3161 -> 3157 (-0.13%)
fills in affected programs: 1519 -> 1515 (-0.26%)
helped: 2
HURT: 0

LOST:   0
GAINED: 2

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8157361 -> 8155067 (-0.03%)
instructions in affected programs: 382491 -> 380197 (-0.60%)
helped: 677
HURT: 0
helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2
helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42%
95% mean confidence interval for instructions value: -3.76 -3.01
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 188588292 -> 188583040 (<.01%)
cycles in affected programs: 3155064 -> 3149812 (-0.17%)
helped: 377
HURT: 13
helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6
helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12%
HURT stats (abs)   min: 2 max: 8 x̄: 5.85 x̃: 6
HURT stats (rel)   min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04%
95% mean confidence interval for cycles value: -15.67 -11.27
95% mean confidence interval for cycles %-change: -0.45% -0.30%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Rematerialize compare instructions
Ian Romanick [Mon, 20 May 2019 18:22:12 +0000 (11:22 -0700)]
nir: Rematerialize compare instructions

On some architectures, Boolean values used to control conditional
branches or condtional selection must be propagated into a flag.  This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction.  This can save register pressure by
not needing to store the Boolean value.

There are several possible ares for future improvement to this pass:

1. Be more conservative.  If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare.  The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.

2. Be less conservative.  Currently the pass requires that all users of
the comparison match the pattern.  The idea is that after the pass is
complete, no instruction will use the resulting Boolean value.  The only
uses will be of the flag value.  It may be beneficial to relax this
requirement in some cases.

3. Be less conservative.  Also try to rematerialize comparisons used for
discard_if intrinsics.  After changing the way the Intel compiler
generates cod e for discard_if (see MR!935), I tried implementing this
already.  The changes were pretty small.  Instructions were helped in 19
shaders, but, overall, cycles were hurt.  A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

4. Copy the preceeding ALU instruction.  If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvment to also
copy the preceeding ALU instruction.  On Intel GPUs, this may enable
cmod propagation to make additional progress.

v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Add a shallow clone function for nir_alu_instr
Ian Romanick [Wed, 29 May 2019 23:48:17 +0000 (16:48 -0700)]
nir: Add a shallow clone function for nir_alu_instr

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Matt Turner <mattst88@gmail.com>
5 years agopanfrost: Remove link stage for jobs
Tomeu Vizoso [Wed, 29 May 2019 09:25:20 +0000 (11:25 +0200)]
panfrost: Remove link stage for jobs

And instead, link them as they are added.

Makes things a bit clearer and prepares future work such as FB reload
jobs.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: ci: Switch to kernel 5.2-rc2
Tomeu Vizoso [Mon, 20 May 2019 09:33:25 +0000 (11:33 +0200)]
panfrost: ci: Switch to kernel 5.2-rc2

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: ci: Update expectations
Tomeu Vizoso [Fri, 31 May 2019 10:34:16 +0000 (12:34 +0200)]
panfrost: ci: Update expectations

A bunch of tests have been fixed, but some regressions have appeared on
T760.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agoradeonsi/nir: Remove hack for builtins
Connor Abbott [Wed, 29 May 2019 14:03:25 +0000 (16:03 +0200)]
radeonsi/nir: Remove hack for builtins

We now bounds check properly in the uniform loading fast path, so
there's no need to disable it by pretending there are other UBO bindings
in use. The way this looks at the variable name was causing problems
when two piglit shaders, one with a name that triggered the hack and one
that didn't, got hashed to the same thing after stripping out the names.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>