git.libre-soc.org Git - mesa.git/log

Andreas Baierl [Thu, 11 Jul 2019 13:26:24 +0000 (15:26 +0200)]

lima: Fix compiler warnings for unused functions.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>

Caio Marcelo de Oliveira Filho [Fri, 12 Jul 2019 21:37:38 +0000 (14:37 -0700)]

anv: Fix pool allocator when first alloc needs to grow

When using softpin, the first allocation was not calculating the
padding and offset correctly when that first allocation needed to
grow. We were missing the initialization of state.end right after
expanding the pool for the first time.

This is not a problem for non-softpin since there we don't use
leftover padding so the ends would re-arrange incrementally.

This fixes running dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 on
SKL -- the test uses a shader larger than the initial size of the
instruction pool.

Fixes: dfc9ab2ccd9 "anv/allocator: Add padding information."
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Kenneth Graunke [Sun, 31 May 2015 23:02:36 +0000 (16:02 -0700)]

mesa: Port errors.c to util/list.h instead of simple_list.

There is widespread consensus that simple_list should go away.
This patch converts one more use to the modern kernel-style list.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
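
For readers unfamiliar with the kernel-style list mentioned above, here is a
minimal sketch of the idea: the link lives inside the element and the
container is recovered with offsetof. All names (node_link, link_entry,
debug_msg) are hypothetical and are not the util/list.h API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive link; the head is a sentinel that points to itself. */
struct node_link {
    struct node_link *prev, *next;
};

static void link_init(struct node_link *head)
{
    head->prev = head;
    head->next = head;
}

static void link_add_tail(struct node_link *item, struct node_link *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

static void link_del(struct node_link *item)
{
    assert(item->prev != NULL && item->next != NULL);
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->prev = item->next = NULL;
}

/* Recover the containing struct from a pointer to its embedded link. */
#define link_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical element type that embeds the link. */
struct debug_msg {
    int id;
    struct node_link link;
};

static int sum_ids(struct node_link *head)
{
    int sum = 0;
    for (struct node_link *it = head->next; it != head; it = it->next)
        sum += link_entry(it, struct debug_msg, link)->id;
    return sum;
}
```

Because the link is embedded, adding or removing an element needs no
separate node allocation.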

Jason Ekstrand [Fri, 12 Jul 2019 23:47:15 +0000 (18:47 -0500)]

intel: Run the optimization loop before and after lowering int64

For bindless SSBO access, we have to do 64-bit address calculations.  On
ICL and above, we don't have 64-bit integer support so we have to lower
the address calculations to 32-bit arithmetic.  If we don't run the
optimization loop before lowering, we won't fold any of the address
chain calculations before lowering 64-bit arithmetic and they aren't
really foldable afterwards.  This cuts the size of the generated code in
the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by
around 30%.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
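
As an illustration of why folding is harder after lowering, this sketch
expresses a 64-bit add with 32-bit operations only. The helper names are
hypothetical and this is not the NIR lowering pass itself. Once an address
chain such as (base + 16) + 32 is split into low/high halves with a carry,
the two constants are no longer adjacent operands of a single add.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value carried as two 32-bit halves. */
struct u32x2 {
    uint32_t lo, hi;
};

static struct u32x2 split64(uint64_t v)
{
    return (struct u32x2){ (uint32_t)v, (uint32_t)(v >> 32) };
}

static uint64_t join64(struct u32x2 v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}

/* 64-bit add using only 32-bit adds and a compare for the carry. */
static struct u32x2 lowered_iadd64(struct u32x2 a, struct u32x2 b)
{
    struct u32x2 r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo; /* unsigned wraparound signals a carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}

static uint64_t iadd64_via_32(uint64_t a, uint64_t b)
{
    uint64_t result = join64(lowered_iadd64(split64(a), split64(b)));
    assert(result == a + b); /* must match native 64-bit wraparound */
    return result;
}
```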

Alyssa Rosenzweig [Fri, 12 Jul 2019 15:57:10 +0000 (08:57 -0700)]

panfrost/decode: Drop _replay prefix

We don't even support replay anymore; this is just wasting characters
and adding clutter.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Fri, 12 Jul 2019 15:54:49 +0000 (08:54 -0700)]

panfrost/decode: Drop _name suffixes

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Fri, 12 Jul 2019 15:47:35 +0000 (08:47 -0700)]

panfrost/decode: Add MEMORY_PROP_DIR variant

This allows dumping memory properties directly without dereferencing an
address, allowing us to fix more -Waddress-of-packed-member warnings.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Fri, 12 Jul 2019 15:45:51 +0000 (08:45 -0700)]

panfrost/decode: Copy embedded structs before using

Fixes some, but not all, warnings from -Waddress-of-packed-member

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
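
A minimal sketch of the copy-before-use pattern, assuming GCC or Clang
(the packed attribute is a compiler extension). The struct names are
hypothetical and unrelated to the real pandecode structures. Copying the
embedded struct by value avoids forming a pointer to a misaligned member,
which is what -Waddress-of-packed-member complains about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct inner {
    uint32_t width, height;
};

/* Packed, so dims sits at offset 1 and is misaligned for uint32_t. */
struct __attribute__((packed)) outer {
    uint8_t tag;
    struct inner dims;
};

static uint32_t area(const struct inner *d)
{
    return d->width * d->height;
}

static uint32_t outer_area(const struct outer *o)
{
    assert(offsetof(struct outer, dims) == 1);

    /* area(&o->dims) would take the address of a packed member.
     * A by-value copy lets the compiler emit an unaligned-safe load. */
    struct inner copy = o->dims;
    return area(&copy);
}
```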

Alyssa Rosenzweig [Fri, 12 Jul 2019 15:45:36 +0000 (08:45 -0700)]

panfrost/decode: Remove pandecode_decode_fbd_type

It is unused.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Fri, 12 Jul 2019 15:41:13 +0000 (08:41 -0700)]

panfrost/midgard: Use generic outmod type

It could be midgard_outmod_float or midgard_outmod_int; don't assume
it's one or the other. Fixes -Wenum-conversion warnings.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Fri, 12 Jul 2019 21:48:34 +0000 (14:48 -0700)]

panfrost: Precompute scoreboard dependents

Mali job dependency graphs, at least for GLES3.0, have the special
property that a given node will only have at most a single dependent.
This allows us to efficiently precompute the dependent array and
replace an inner loop's O(N) search with an O(1) lookup, bringing the
algorithmic complexity of scoreboarding from O(N^2) to O(N).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
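
The precomputation described above can be sketched as follows. The job
layout (two dependency slots per job, NO_JOB as a sentinel) is an
assumption made for illustration, not the actual Panfrost data structure.
A single pass fills the dependent array, so later lookups are O(1).

```c
#include <assert.h>
#include <stddef.h>

#define NO_JOB ((size_t)-1)

/* Hypothetical job: indices of up to two prerequisite jobs, or NO_JOB. */
struct job {
    size_t deps[2];
};

/* After this call, dependent[i] is the one job waiting on job i, or NO_JOB. */
static void precompute_dependents(const struct job *jobs, size_t count,
                                  size_t *dependent)
{
    for (size_t i = 0; i < count; i++)
        dependent[i] = NO_JOB;

    for (size_t i = 0; i < count; i++) {
        for (size_t d = 0; d < 2; d++) {
            size_t dep = jobs[i].deps[d];
            if (dep == NO_JOB)
                continue;
            assert(dep < count);
            /* Relies on each node having at most a single dependent. */
            assert(dependent[dep] == NO_JOB);
            dependent[dep] = i;
        }
    }
}
```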

Alyssa Rosenzweig [Fri, 12 Jul 2019 21:09:57 +0000 (14:09 -0700)]

panfrost: Remove transient pool abstraction

Now that it has been totally replaced by the borrow mechanism, it is
unused code.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Fri, 12 Jul 2019 20:59:35 +0000 (13:59 -0700)]

panfrost: Subdivide fixed-size transient slabs

The whole purpose of the transient memory model is to make subdivision
stupidly easy, so let's handle that.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Fri, 12 Jul 2019 20:05:14 +0000 (13:05 -0700)]

panfrost: Recycle fixed-size transient BOs

The usual case. We use the bitset to mark freedom and seize it.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
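
A rough sketch of marking freedom with a bitset, assuming a pool of 32
fixed-size slabs tracked in one 32-bit word. The names and the pool size
are hypothetical; the real code may use a different bitset helper.

```c
#include <assert.h>
#include <stdint.h>

#define SLAB_COUNT 32

/* Bit i set means slab i is free. */
struct slab_pool {
    uint32_t free_mask;
};

static void pool_init(struct slab_pool *p)
{
    p->free_mask = UINT32_MAX;
}

/* Seize the lowest free slab; returns its index, or -1 if none is free. */
static int pool_acquire(struct slab_pool *p)
{
    for (int i = 0; i < SLAB_COUNT; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (p->free_mask & bit) {
            p->free_mask &= ~bit;
            return i;
        }
    }
    return -1;
}

/* Mark a previously seized slab as free again so it can be recycled. */
static void pool_release(struct slab_pool *p, int index)
{
    assert(index >= 0 && index < SLAB_COUNT);
    uint32_t bit = UINT32_C(1) << index;
    assert(!(p->free_mask & bit)); /* releasing a free slab is a bug */
    p->free_mask |= bit;
}
```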

Alyssa Rosenzweig [Fri, 12 Jul 2019 19:53:36 +0000 (12:53 -0700)]

panfrost: Bookkeep transient indices

The batch now temporarily possesses the transient buffer, so it'll need
to remember that to free it later.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Fri, 12 Jul 2019 19:49:23 +0000 (12:49 -0700)]

panfrost: Rewrite allocate_transient with new abstraction

We use a fixed size slab if we can, otherwise we create a dedicated
("oversized") BO and add that to the job. In the latter case we'll get
reference counting for free so we can forget about this corner case for
the rest of the series.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
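
The slab-or-oversized decision can be sketched like this. SLAB_SIZE, the
struct names and the function name are assumptions for illustration;
alignment and the actual BO creation are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SLAB_SIZE 4096 /* hypothetical fixed slab size in bytes */

struct transient_state {
    size_t slab_index; /* slab currently being subdivided */
    size_t offset;     /* next free byte within that slab */
};

struct transient_alloc {
    bool oversized;    /* true: caller must create a dedicated BO */
    size_t slab_index;
    size_t offset;
};

static struct transient_alloc
allocate_transient_sketch(struct transient_state *s, size_t size)
{
    assert(size > 0);
    struct transient_alloc out = { false, 0, 0 };

    /* Too big for any slab: take the dedicated ("oversized") path. */
    if (size > SLAB_SIZE) {
        out.oversized = true;
        return out;
    }

    /* Does not fit in the rest of this slab: move on to a fresh one. */
    if (s->offset + size > SLAB_SIZE) {
        s->slab_index++;
        s->offset = 0;
    }

    /* Subdivide the slab with a simple bump of the offset. */
    out.slab_index = s->slab_index;
    out.offset = s->offset;
    s->offset += size;
    return out;
}
```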

Alyssa Rosenzweig [Fri, 12 Jul 2019 20:55:45 +0000 (13:55 -0700)]

panfrost: Add pan_bo_for_screen helper

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Thu, 11 Jul 2019 17:34:40 +0000 (10:34 -0700)]

panfrost: Add panfrost_transient_bo array

We would like transient allocations to occur on the screen (borrowed by
the batch) rather than on the context. Add fields to track this.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Thu, 11 Jul 2019 18:39:33 +0000 (11:39 -0700)]

panfrost: Don't upload vertex/tiler twice

The latter upload is correct, but the former upload is unassociated with
any particular FBO and therefore becomes orphaned. We do have to upload
at draw-time at the latest, if we haven't by then.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Thu, 11 Jul 2019 17:53:37 +0000 (10:53 -0700)]

panfrost/drm: Check allocation size is positive

Zero-sized allocations will fail with an unhelpful errno from the
kernel; check size explicitly in userspace before it gets that far.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Neil Roberts [Tue, 12 Jun 2018 20:24:00 +0000 (22:24 +0200)]

mesa/glspirv: Validate that compute shaders are not linked with other stages

The test is based on link_shaders().

For example, it allows the following test (when run in SPIR-V mode) to
pass:
spec/arb_compute_shader/linker/mix_compute_and_non_compute.shader_test

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Neil Roberts [Fri, 25 May 2018 13:34:26 +0000 (15:34 +0200)]

mesa/glspirv: Validate that there is a VS when there is a TCS, TES or GS

The shader combination tests are copied from link_shaders().

For example, it allows the following tests (when run in SPIR-V mode) to
pass:
   spec/arb_tessellation_shader/linker/no-vs
   spec/arb_tessellation_shader/linker/tcs-no-vs
   spec/arb_tessellation_shader/linker/tes-no-vs
   spec/glsl-1.50/linker/gs-without-vs

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
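
The two stage-combination rules from this commit and the previous one can
be sketched with a stage bitmask. The enum and function names are
hypothetical, and link_shaders() performs further checks that are not
reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stage bits, one per shader stage present in the program. */
enum stage_bit {
    STAGE_VS  = 1 << 0,
    STAGE_TCS = 1 << 1,
    STAGE_TES = 1 << 2,
    STAGE_GS  = 1 << 3,
    STAGE_FS  = 1 << 4,
    STAGE_CS  = 1 << 5,
};

#define STAGE_ALL 0x3fu

static bool stages_are_linkable(unsigned stages)
{
    assert((stages & ~STAGE_ALL) == 0);

    /* Compute shaders cannot be linked with any other stage. */
    if ((stages & STAGE_CS) && (stages & ~(unsigned)STAGE_CS))
        return false;

    /* TCS, TES and GS all require a vertex shader. */
    if ((stages & (STAGE_TCS | STAGE_TES | STAGE_GS)) && !(stages & STAGE_VS))
        return false;

    return true;
}
```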

Alejandro Piñeiro [Wed, 27 Feb 2019 14:28:47 +0000 (15:28 +0100)]

i965: don't use disk cache with SPIR-V shaders

Right now we don't support disk cache for SPIR-V shaders (from
ARB_gl_spirv), so let's avoid writing the program data to or reading
it from the disk if any in-use shaders use SPIR-V.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Alejandro Piñeiro [Wed, 27 Feb 2019 14:29:15 +0000 (15:29 +0100)]

glsl/shader_cache: handle SPIR-V shaders

Right now we don't have cache support for SPIR-V shaders (from
ARB_gl_spirv). They are already skipped properly because they fall
on the fixed-function (ff) shader code path (no key, no name), but it
would be better to update the current comments and add some guards.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Arcady Goldmints-Orlov [Mon, 28 Jan 2019 16:19:28 +0000 (10:19 -0600)]

nir/linker: Initialize UniformDataDefaults when using SPIR-V

Allocate UniformDataDefaults and fill in the data defaults when
linking a SPIR-V program. Among other things, this allows program
serialization to work.

It allows the following piglit test (when run in SPIR-V mode) to pass:
spec/arb_get_program_binary/execution/uniform-after-restore.shader_test

v2: use memcpy to initialize UniformDataDefaults

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Arcady Goldmints-Orlov [Thu, 20 Dec 2018 01:12:25 +0000 (02:12 +0100)]

glsl/serialize: Update write_program_resource_data() to handle NULL input and output variable names

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Arcady Goldmints-Orlov [Thu, 29 Nov 2018 14:16:34 +0000 (15:16 +0100)]

glsl/serialize: Handle NULL uniform name in write_uniforms()

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Tue, 18 Dec 2018 10:55:04 +0000 (11:55 +0100)]

mesa/main: Fix UBO/SSBO ACTIVE_VARIABLES query (ARB_gl_spirv)

When querying MAX_NUM_ACTIVE_VARIABLES, NUM_ACTIVE_VARIABLES and
ACTIVE_VARIABLES over SSBO and UBO interfaces, we filter the variables
which are active using the variable's name and looking for it in the
program resource list. If it is in the program resource list, the
variable will be considered active.

However, with ARB_gl_spirv name reflection information is not
mandatory, so we can instead use the UBO/SSBO binding and variable
offset to filter which variables are active.

v2: use RESOURCE_UBO/UNI macros instead of direct casts, update
    comment (Alejandro)

v3: Change signature of _mesa_program_resource_find_active_variable
    to simplify calling it. Also, squash the fix for find_binding_offset
    for arrays of blocks (Arcady)

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Fri, 14 Sep 2018 06:55:24 +0000 (08:55 +0200)]

mesa/shader_query: Fix LOCATION_INDEX query (ARB_gl_spirv)

When querying GL_LOCATION_INDEX using glGetProgramResourceiv,
we already know the index of the resource, so we do not need to find
it using the name. This is convenient for shaders coming from
SPIR-V binaries, where names are optional.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Mon, 13 Aug 2018 12:13:38 +0000 (14:13 +0200)]

mesa/shaderapi: Fix TRANSFORM_FEEDBACK_VARYING program query

Fixes the program queries API (glGetProgramiv):
TRANSFORM_FEEDBACK_VARYINGS and TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH
in two cases:

1. ARB_enhanced_layouts:

The queries were not working for GLSL shaders which specify the
varyings using enhanced layouts. We were returning the info as if the
varyings could only be specified using the API.

2. ARB_gl_spirv:

TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH should return 1 if there is no
name reflection information available.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Wed, 8 Aug 2018 15:52:04 +0000 (17:52 +0200)]

mesa/uniforms: Fix GetUniformLocation (ARB_gl_spirv)

From the ARB_gl_spirv specification, glGetUniformLocation should
return -1 when no name reflection is available.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Mon, 13 Aug 2018 16:48:37 +0000 (18:48 +0200)]

mesa/shader_query: Fix NAME_LENGTH queries (ARB_gl_spirv)

For shaders constructed from SPIR-V binaries, it is possible that
no name reflection information is available. In that case,

- glGetProgramInterfaceiv(.., pname=MAX_NAME_LENGTH, ..)
- glGetProgramResourceiv(.., props=NAME_LENGTH, ..)

should return 1.

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Alejandro Piñeiro [Sat, 18 Nov 2017 09:04:42 +0000 (10:04 +0100)]

mesa: Fix ACTIVE_*_MAX_LENGTH program queries (ARB_gl_spirv)

With ARB_gl_spirv it is possible to miss a lot of name reflection
information, so we need to add NULL name checks for several
queries and return a specific value in those cases. This commit adds
them for ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH,
ACTIVE_ATTRIBUTE_MAX_LENGTH and ACTIVE_UNIFORM_MAX_LENGTH.

From ARB_gl_spirv spec:

   "If pname is ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH, the length of
    the longest active uniform block name, including the null
    terminator, is returned. If no active uniform blocks exist, zero
    is returned. If no name reflection information is available, one
    is returned.

    If pname is ACTIVE_ATTRIBUTE_MAX_LENGTH, the length of the longest
    active attribute name, including a null terminator, is returned.
    If no active attributes exist, zero is returned. If no name
    reflection information is available, one is returned.

    If pname is ACTIVE_UNIFORM_MAX_LENGTH, the length of the longest
    active uniform name, including a null terminator, is returned. If
    no active uniforms exist, zero is returned. If no name reflection
    information is available, one is returned."

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
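
The rule quoted above (zero with no active resources, one when a name is
missing, otherwise the longest name plus its null terminator) can be
sketched as below. The function name and the array-of-strings input are
assumptions for illustration, not the Mesa implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* names[i] == NULL models a resource without name reflection information. */
static size_t active_max_name_length(const char *const *names, size_t count)
{
    assert(names != NULL || count == 0);

    size_t max_len = 0; /* stays zero when there are no active resources */
    for (size_t i = 0; i < count; i++) {
        /* A missing name counts as length one (just the terminator). */
        size_t len = names[i] ? strlen(names[i]) + 1 : 1;
        if (len > max_len)
            max_len = len;
    }
    return max_len;
}
```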

Antia Puentes [Thu, 15 Nov 2018 08:13:08 +0000 (09:13 +0100)]

nir/types: Add glsl_type_is_unsized_array helper

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Sun, 30 Jun 2019 23:27:59 +0000 (18:27 -0500)]

nir/linker: Fill TOP_LEVEL_ARRAY_SIZE and STRIDE

From the ARB_program_interface_query specification:

    "For the property TOP_LEVEL_ARRAY_SIZE, a single integer
    identifying the number of active array elements of the top-level
    shader storage block member containing to the active variable is
    written to <params>.  If the top-level block member is not
    declared as an array, the value one is written to <params>.  If
    the top-level block member is an array with no declared size, the
    value zero is written to <params>."

    "For the property TOP_LEVEL_ARRAY_STRIDE, a single integer
    identifying the stride between array elements of the top-level
    shader storage block member containing the active variable is
    written to <params>.  For top-level block members declared as
    arrays, the value written is the difference, in basic machine
    units, between the offsets of the active variable for consecutive
    elements in the top-level array.  For top-level block members not
    declared as an array, zero is written to <params>."

v2: move top_level_array_size and stride into nir_link_uniforms_state
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
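
A small sketch of the two properties quoted above, under a simplified
model where a top-level block member is described by its base offset and
per-element size. The struct and function names are hypothetical, and the
declared length stands in for the number of active elements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a top-level shader storage block member. */
struct top_level_member {
    bool is_array;
    unsigned length;       /* declared length; 0 for an unsized array */
    unsigned base_offset;  /* offset of element 0 within the block */
    unsigned element_size; /* bytes from one element to the next */
};

/* Offset of a variable located inner_offset bytes into a given element. */
static unsigned variable_offset(const struct top_level_member *m,
                                unsigned element, unsigned inner_offset)
{
    assert(inner_offset < m->element_size);
    return m->base_offset + element * m->element_size + inner_offset;
}

/* One for non-arrays, zero for unsized arrays, else the array length. */
static unsigned top_level_array_size(const struct top_level_member *m)
{
    return m->is_array ? m->length : 1;
}

/* Difference between the variable's offsets in consecutive elements. */
static unsigned top_level_array_stride(const struct top_level_member *m,
                                       unsigned inner_offset)
{
    if (!m->is_array)
        return 0;
    return variable_offset(m, 1, inner_offset) -
           variable_offset(m, 0, inner_offset);
}
```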

Antia Puentes [Wed, 12 Sep 2018 11:51:57 +0000 (13:51 +0200)]

nir/linker: Compute the offset for non-trivial uniform types.

ARB_gl_spirv states that the offset must be explicit; however, this is
only true for 'root' types. For complex types, like struct members or
arrays of arrays, it needs to be computed.

We are not using the offset stored in the gl_buffer_variables during
the uniform blocks linking because currently we do not have a way to
relate a gl_buffer_variable with its corresponding gl_uniform_storage.
The GLSL path uses the name for that, but we can not rely on that
because names are optional in SPIR-V.

Notice that uniforms not backed by a buffer object will have an offset
equal to -1, like in the GLSL path.

v2: add offset and var_is_in_block as per-variable state in
nir_link_uniforms_state (Arcady)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Sat, 15 Dec 2018 17:34:11 +0000 (18:34 +0100)]

nir/linker: Add atomic counters to the program resource list

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Antia Puentes [Sat, 15 Dec 2018 17:33:18 +0000 (18:33 +0100)]

nir/linker: Add XFB resources to the program resource list

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Sat, 15 Dec 2018 17:25:41 +0000 (18:25 +0100)]

nir/linker: Add BUFFER_VARIABLEs to the prog resource list

v2: use link_util_should_add_buffer_variable() (Arcady)
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Antia Puentes [Wed, 8 Aug 2018 12:29:38 +0000 (14:29 +0200)]

nir/linker: Add inputs/outputs to the program resource list

v2: added TODO comment hinting possible future refactoring of
    nir_build_program_resource_list and build_program_resource_list,
    to avoid code duplication (Alejandro, to explicitly reflect a
    valid concern from Timothy during the review).

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

Alejandro Piñeiro [Fri, 23 Mar 2018 11:35:48 +0000 (12:35 +0100)]

nir/linker: add ubo/ssbo to the program resource list

v2: "nir/linker: Use the stageref when adding UBO/SSBO resources"
squashed on this one (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Antia Puentes [Sat, 25 Aug 2018 13:15:30 +0000 (15:15 +0200)]

nir/linker: Fill the uniform's BLOCK_INDEX

Binding comparison is used to determine the block the uniform is part
of. Note that to do the binding comparison we need the information in
UniformBlocks[] and ShaderStorageBlocks[] to be available, so we have
to call gl_nir_link_uniform_blocks() before linking the uniforms.

v2: add missing break (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
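
A minimal sketch of the binding comparison, assuming each block carries a
single binding value. The struct and function names are hypothetical, and
arrays of blocks are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry of UniformBlocks[] or
 * ShaderStorageBlocks[]. */
struct block_sketch {
    unsigned binding;
};

/* Returns the index of the block with the given binding, or -1 if the
 * uniform does not belong to any block. */
static int find_block_index(const struct block_sketch *blocks, size_t count,
                            unsigned binding)
{
    assert(blocks != NULL || count == 0);

    int index = -1;
    for (size_t i = 0; i < count; i++) {
        if (blocks[i].binding == binding) {
            index = (int)i;
            break; /* stop at the first match */
        }
    }
    return index;
}
```

The block arrays have to be filled in before this lookup can run, which is
why the block linking is called first.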

Samuel Pitoiset [Fri, 12 Jul 2019 06:17:09 +0000 (08:17 +0200)]

radv/gfx10: enable 1D textures

Mirror RadeonSI. This also fixes crashes in addrlib.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Andres Gomez [Fri, 12 Jul 2019 15:17:01 +0000 (18:17 +0300)]

intel/compiler: remove abandoned comments

c8665005: ("intel/compiler: Don't always require precise lowering of flrp")
forgot to remove some comments that didn't apply any more after the
change.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>

Andres Gomez [Mon, 8 Jul 2019 13:26:52 +0000 (16:26 +0300)]

nir/compiler: keep same bit size when lowering with flrp

This was probably not caught before because no supported test was
exercising the flrp lowering with a bit size other than 32.

With the arrival of VK_KHR_shader_float_controls we will have some of
those and, unless we keep the bit size, we will end up with something
like:

../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed.

Fixes: 158370ed2a0 ("nir/flrp: Add new lowering pass for flrp instructions")
Fixes: ae02622d8fd ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
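
To illustrate the failure mode, here is a toy model of a builder that
asserts matching operand bit sizes, together with an flrp lowering of the
form a * (1 - c) + b * c. The types and helpers are hypothetical and much
simpler than nir_builder; the point is only that the constant 1.0 has to
be created with the operands' bit size rather than a hard-coded 32.

```c
#include <assert.h>

/* Toy SSA value: a bit size plus a value held in a double for simplicity. */
struct ssa_val {
    unsigned bit_size;
    double value;
};

static struct ssa_val imm_float(unsigned bit_size, double v)
{
    return (struct ssa_val){ bit_size, v };
}

/* Binary ALU op ('+', '-' or '*') that requires matching bit sizes. */
static struct ssa_val build_alu2(char op, struct ssa_val a, struct ssa_val b)
{
    assert(a.bit_size == b.bit_size);
    double r = op == '+' ? a.value + b.value
             : op == '-' ? a.value - b.value
                         : a.value * b.value;
    return (struct ssa_val){ a.bit_size, r };
}

static struct ssa_val lower_flrp(struct ssa_val a, struct ssa_val b,
                                 struct ssa_val c)
{
    /* Keep the operands' bit size; imm_float(32, 1.0) would trip the
     * assertion in build_alu2 for 16-bit or 64-bit operands. */
    struct ssa_val one = imm_float(c.bit_size, 1.0);
    struct ssa_val inv = build_alu2('-', one, c);
    return build_alu2('+', build_alu2('*', a, inv), build_alu2('*', b, c));
}
```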

Jason Ekstrand [Fri, 12 Jul 2019 13:20:25 +0000 (08:20 -0500)]

anv: Properly compute image usage in CreateImageView

With separate stencil usage, we can't just grab the usage from the image
directly and have to consider the per-aspect usage instead.

Fixes: 1be38f9178 "anv:Use VK_EXT_separate_stencil_usage to avoid..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Samuel Pitoiset [Fri, 12 Jul 2019 10:17:18 +0000 (12:17 +0200)]

radv/gfx10: emit DISABLE_CONSERVATIVE_ZPASS_COUNTS

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 10:17:17 +0000 (12:17 +0200)]

radv/gfx10: init more registers in the graphics preamble

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 10:17:16 +0000 (12:17 +0200)]

radv/gfx10: set HS/GS/CS.WGP_MODE

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 10:17:15 +0000 (12:17 +0200)]

radv/gfx10: emit GE_PC_ALLOC

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 10:17:14 +0000 (12:17 +0200)]

radv/gfx10: enable vertex shaders without export parameters

GFX10 allows this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 10:17:13 +0000 (12:17 +0200)]

radv/gfx10: launch 2 compute waves per CU before going onto the next CU

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 10:17:12 +0000 (12:17 +0200)]

radv: use ac_get_compute_resource_limits()

No behaviour change.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 10:17:11 +0000 (12:17 +0200)]

ac: import ac_get_compute_resource_limits() from RadeonSI

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Alyssa Rosenzweig [Thu, 11 Jul 2019 22:46:22 +0000 (15:46 -0700)]

panfrost: Initialize shift/extra_flags

Don't rely on them being preinitialized to zero; this can cause junk to
appear on the wire.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Alyssa Rosenzweig [Thu, 11 Jul 2019 22:34:56 +0000 (15:34 -0700)]

panfrost: Fix build warnings

A bunch of these are from asserts not being compiled in 32-bit mode
(once Erik's ASSERTABLE stuff is merged, we'll want to switch).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

Samuel Pitoiset [Fri, 12 Jul 2019 11:59:08 +0000 (13:59 +0200)]

radv/gfx10: invalidate everything in L2 when shaders read data

This includes metadata as well. On GFX10, we have to invalidate
the L2 metadata cache when shaders read DCC.

Note that we still have to implement GFX10 coherency by
introducing INV_L2_METADATA but for now just flush L2.

This fixes a corruption with DCC and Talos.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 09:12:58 +0000 (11:12 +0200)]

radv/gfx10: fix wrong emission of GE_CNTL

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Samuel Pitoiset [Fri, 12 Jul 2019 09:12:57 +0000 (11:12 +0200)]

radv: add more assertions to make sure packets are correctly emitted

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Alejandro Piñeiro [Tue, 25 Jun 2019 13:02:56 +0000 (15:02 +0200)]

v3d: use inc/dec tmu operation with image atomic sub/add of 1

This allows removing a mov of 1/-1, as it is implicit in the
operation.

As with atomic inc/dec/add, the usual shader-db set doesn't include any
GLES shader using it. Using the vk-gl-cts shaders as a workaround, we
get this:

total instructions in shared programs: 1217013 -> 1217006 (<.01%)
instructions in affected programs: 53 -> 46 (-13.21%)
helped: 2
HURT: 0

One of the helped shaders went from 40 to 34 instructions.

Reviewed-by: Eric Anholt <eric@anholt.net>

Alejandro Piñeiro [Thu, 27 Jun 2019 12:16:15 +0000 (14:16 +0200)]

v3d: refactor some code from v3d40_vir_emit_image_load_store

Move it to a new auxiliary method, v3d40_image_load_store_tmu_op,
equivalent to the nir_to_vir v3d_general_tmu_op, to clean up a little.

Reviewed-by: Eric Anholt <eric@anholt.net>

Alejandro Piñeiro [Wed, 19 Jun 2019 11:17:41 +0000 (13:17 +0200)]

v3d: use inc/dec tmu operation with atomic sub/add of 1

Among other things, this avoids the need to load 1/-1 constants (so
one less operation).

The removed comment suggests the option of adding support in NIR for
inc/dec. Intel just uses an auxiliary method to get which hw operation
is needed, so no lowering is needed. At the same time, being so
small, it seems unreasonable to try to add a general one in NIR
itself. It is easier to just adapt the method here (that is what
the patch does right now).

It is worth noting that we are not getting any change in shader-db
stats because all those methods are used in the usual shader-db set
with shaders needing GLSL > 4.2. In general there aren't too many GLSL
ES 3.1 tests.

As an alternative, we captured the GLES3/GLSL31/GLS32 shaders used in
vk-gl-cts, even if that is not a real life usage of shaders. With
those we get the following:

total instructions in shared programs: 1217022 -> 1217013 (<.01%)
instructions in affected programs: 117 -> 108 (-7.69%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1
helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09%
95% mean confidence interval for instructions value: -2.07 -0.93
95% mean confidence interval for instructions %-change: -10.54% -5.64%
Instructions are helped.

Note that the number of shaders helped is really low because most of the
vk-gl-cts tests using AtomicInc/Dec/Add use them in compute
shaders. Although right now there is a branch around with CS support,
the usual practice is to do the stats against master.

Reviewed-by: Eric Anholt <eric@anholt.net>
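
The operation selection can be sketched as below. The enum and function
names are hypothetical and do not correspond to the generated V3D packet
definitions; the sketch only shows that an add of a constant 1 or -1 maps
to inc or dec, which then needs no data operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation codes for illustration only. */
enum atomic_op_sketch {
    ATOMIC_OP_ADD,
    ATOMIC_OP_INC,
    ATOMIC_OP_DEC,
};

/* Pick inc/dec when the addend is a compile-time constant 1 or -1. */
static enum atomic_op_sketch op_for_atomic_add(bool is_const, int32_t value)
{
    if (is_const && value == 1)
        return ATOMIC_OP_INC;
    if (is_const && value == -1)
        return ATOMIC_OP_DEC;
    return ATOMIC_OP_ADD;
}

/* inc/dec carry their operand implicitly, so no constant has to be loaded. */
static bool op_needs_data_operand(enum atomic_op_sketch op)
{
    switch (op) {
    case ATOMIC_OP_ADD:
        return true;
    case ATOMIC_OP_INC:
    case ATOMIC_OP_DEC:
        return false;
    }
    assert(!"unknown atomic op");
    return false;
}
```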

Alejandro Piñeiro [Tue, 2 Jul 2019 10:45:49 +0000 (12:45 +0200)]

v3d: remove redefinition of tmu operations on nir_to_vir

They are already defined, although in a slightly different format, in
the generated packet headers, so we needed to change how they are
used in nir_to_vir.

In addition to allowing us to remove some duplicated headers, it will allow
us to define just one get_op_for_atomic_add aux method later to support
using inc/dec instead of add of 1/-1.

Reviewed-by: Eric Anholt <eric@anholt.net>

Alejandro Piñeiro [Tue, 2 Jul 2019 10:02:04 +0000 (12:02 +0200)]

v3d: tweak initial comment on pack generator script

The files it mentions as reference have slightly different
names.

Reviewed-by: Eric Anholt <eric@anholt.net>

Yevhenii Kolesnikov [Wed, 10 Jul 2019 10:44:44 +0000 (13:44 +0300)]

glsl/link_varyings: Fix hash table leak

Hash tables were not destroyed at return.

v2: Use ralloc_context (Eric Anholt)

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Kenneth Graunke [Mon, 1 Apr 2019 22:27:01 +0000 (15:27 -0700)]

iris: Simplify devinfo access in calculate_result_on_gpu()

We have devinfo, no need for screen->devinfo.

Iago Toral Quiroga [Tue, 9 Jul 2019 10:24:43 +0000 (12:24 +0200)]

v3d: remove unused definitions

Reviewed-by: Eric Anholt <eric@anholt.net>

Iago Toral Quiroga [Wed, 3 Jul 2019 07:56:49 +0000 (09:56 +0200)]

v3d: move implementation of some intrinsics to separate helpers

Reviewed-by: Eric Anholt <eric@anholt.net>

Iago Toral Quiroga [Thu, 11 Jul 2019 08:56:17 +0000 (10:56 +0200)]

v3d: emit correct lowering for logic ops with RGB10A2 render targets

Reviewed-by: Eric Anholt <eric@anholt.net>

Iago Toral Quiroga [Mon, 8 Jul 2019 10:31:38 +0000 (12:31 +0200)]

v3d: emit correct lowering for logic ops with integer render targets

Reviewed-by: Eric Anholt <eric@anholt.net>

Iago Toral Quiroga [Wed, 3 Jul 2019 07:38:39 +0000 (09:38 +0200)]

v3d: add lowering for OpenGL logic operations

This implements support for OpenGL logic operations by emitting code to read
from the TLB if needed and blending the fragment output accordingly. It is
similar to VC4's blend lowering pass, but exclusive to logic operations, since
blending is otherwise supported in hardware.

The pass doesn't handle MSAA targets yet.

Fixes the following piglit tests:
spec/!opengl 1.0/gl-1.0-logicop/*
spec/!opengl 1.1/gl-1.1-xor
spec/!opengl 1.1/gl-1.1-xor-copypixels

It also fixes text cursor rendering in LibreOffice with the GTK+2 theme, which
is rendered via glamor using the XOR logic operation.

v2: fix checks for allowed variable location and maximum render target (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
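
For reference, the per-bit semantics of the sixteen OpenGL logic
operations, applied to a source value s (the fragment output) and a
destination value d (the color read back from the render target), can be
sketched on 32-bit words as follows. The enum names are hypothetical and
are not the GL enum values, and this is plain C rather than the NIR the
pass emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names mirroring the OpenGL logic operations. */
enum logic_op_sketch {
    LOGIC_CLEAR, LOGIC_AND, LOGIC_AND_REVERSE, LOGIC_COPY,
    LOGIC_AND_INVERTED, LOGIC_NOOP, LOGIC_XOR, LOGIC_OR,
    LOGIC_NOR, LOGIC_EQUIV, LOGIC_INVERT, LOGIC_OR_REVERSE,
    LOGIC_COPY_INVERTED, LOGIC_OR_INVERTED, LOGIC_NAND, LOGIC_SET,
};

/* s: incoming fragment bits, d: bits already in the render target. */
static uint32_t apply_logic_op(enum logic_op_sketch op, uint32_t s, uint32_t d)
{
    switch (op) {
    case LOGIC_CLEAR:         return 0;
    case LOGIC_AND:           return s & d;
    case LOGIC_AND_REVERSE:   return s & ~d;
    case LOGIC_COPY:          return s;
    case LOGIC_AND_INVERTED:  return ~s & d;
    case LOGIC_NOOP:          return d;
    case LOGIC_XOR:           return s ^ d;
    case LOGIC_OR:            return s | d;
    case LOGIC_NOR:           return ~(s | d);
    case LOGIC_EQUIV:         return ~(s ^ d);
    case LOGIC_INVERT:        return ~d;
    case LOGIC_OR_REVERSE:    return s | ~d;
    case LOGIC_COPY_INVERTED: return ~s;
    case LOGIC_OR_INVERTED:   return ~s | d;
    case LOGIC_NAND:          return ~(s & d);
    case LOGIC_SET:           return UINT32_MAX;
    }
    assert(!"unknown logic op");
    return d;
}
```

Applying XOR twice with the same source restores the destination, which is
why it is a common choice for drawing and erasing a text cursor.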

Iago Toral Quiroga [Thu, 4 Jul 2019 10:22:40 +0000 (12:22 +0200)]

v3d: acquire scoreboard lock before first tlb read

Until now we have always been emitting our scoreboard locks on the last thread
switch to improve parallelism. We did this by emitting our last thread switch
right before our tlb writes at the very end of the program, where we know that
we are outside control flow.

Unfortunately, this strategy is not valid when we have tlb color reads too, as
these will happen before this point in the program and can happen inside
control flow.

To fix this we always emit a thread switch before the first tlb load and if we
see additional thread switches after that point, we change the strategy to lock
on the first thread switch.

v2: change the solution so it is expected to work in more scenarios (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>

Iago Toral Quiroga [Wed, 3 Jul 2019 07:30:43 +0000 (09:30 +0200)]

v3d: implement tile buffer color read intrinsic

We will be emitting this intrinsic to signal TLB color loads when we implement
OpenGL logic operations, where we need to blend the fragment shader color
output with the existing color in the render target.

Per-sample TLB reads are not supported yet.

v2: fix the offset into the color_reads array (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>


Iago Toral Quiroga [Tue, 9 Jul 2019 07:15:02 +0000 (09:15 +0200)]

nir: add a new v3d-specific intrinsic for tile buffer color reads

This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.

v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>


Iago Toral Quiroga [Wed, 3 Jul 2019 07:19:52 +0000 (09:19 +0200)]

v3d: fix size of color_reads and sample_colors arrays

We need to scale the size of these arrays to consider up to
V3D_MAX_DRAW_BUFFERS render targets and 4 components per color.

v2: we want to store each color component separately, so scale by 4 too.

Reviewed-by: Eric Anholt <eric@anholt.net>
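
A minimal sketch of the sizing described above, assuming one array slot per
render target and per color component. The value given to
V3D_MAX_DRAW_BUFFERS here and the helper name are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration; the real definition lives in the driver. */
#define V3D_MAX_DRAW_BUFFERS 4
#define COMPONENTS_PER_COLOR 4

/* One slot per render target and per color component. */
#define COLOR_READS_SIZE (V3D_MAX_DRAW_BUFFERS * COMPONENTS_PER_COLOR)

/* Hypothetical helper: flat index of component `comp` of render target `rt`. */
static size_t color_read_index(unsigned rt, unsigned comp)
{
    assert(rt < V3D_MAX_DRAW_BUFFERS);
    assert(comp < COMPONENTS_PER_COLOR);
    return (size_t)rt * COMPONENTS_PER_COLOR + comp;
}
```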


Iago Toral Quiroga [Wed, 3 Jul 2019 07:14:25 +0000 (09:14 +0200)]

v3d: add color formats and swizzles to the fragment shader key

We are going to need these very soon to emit correct reads from the tlb
to implement logic operations.

Reviewed-by: Eric Anholt <eric@anholt.net>


Iago Toral Quiroga [Wed, 3 Jul 2019 07:09:11 +0000 (09:09 +0200)]

v3d: add helpers to emit ldtlb and ldtlbu signals

The ldtlbu version will read an implicit uniform with the TLB read
specifier and should be used for the first read in a sequence
of TLB reads (unless the default configuration is valid, in which
case we can use ldtlb). The ldtlb version is used for any subsequent
TLB read in the sequence.

Reviewed-by: Eric Anholt <eric@anholt.net>
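
The selection rule described above could be sketched as follows. The enum and
function are hypothetical stand-ins, not the helpers added by this commit; they
only model which signal each read in a sequence would carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum tlb_signal { SIG_LDTLB, SIG_LDTLBU };

/* Fill `out` with the signal to use for each read of a TLB read sequence.
 * Only the first read may need ldtlbu, and only when the default
 * configuration cannot be used. */
static void pick_tlb_read_signals(enum tlb_signal *out, size_t count,
                                  bool default_config_valid)
{
    assert(out != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        bool needs_specifier = (i == 0) && !default_config_valid;
        out[i] = needs_specifier ? SIG_LDTLBU : SIG_LDTLB;
    }
}
```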


Iago Toral Quiroga [Fri, 5 Jul 2019 08:04:32 +0000 (10:04 +0200)]

v3d: handle tlb read dependency tracking as if they were writes

Tile buffer reads are emitted as ordered sequences and cannot be reordered.

Reviewed-by: Eric Anholt <eric@anholt.net>


Iago Toral Quiroga [Wed, 3 Jul 2019 10:02:11 +0000 (12:02 +0200)]

v3d: instructions with the ldtlb and ldtlbu signals are tlb instructions

Reviewed-by: Eric Anholt <eric@anholt.net>


Iago Toral Quiroga [Fri, 5 Jul 2019 07:47:05 +0000 (09:47 +0200)]

v3d: tlb loads cannot be removed

Loads from the tile buffer are emitted in ordered sequences so
we cannot eliminate or reorder any of them.

Reviewed-by: Eric Anholt <eric@anholt.net>


Iago Toral Quiroga [Wed, 3 Jul 2019 07:07:22 +0000 (09:07 +0200)]

v3d: the ldtlbu signal reads an implicit uniform

Reviewed-by: Eric Anholt <eric@anholt.net>


Iago Toral Quiroga [Wed, 3 Jul 2019 07:01:22 +0000 (09:01 +0200)]

v3d: handle ldtlb and ldtlbu signals during disassembly

We already have code to print these signals, but the early return in the code
that checks if any signals are present was missing the checks for them,
so it would skip printing them unless they were paired with other signals.

Reviewed-by: Eric Anholt <eric@anholt.net>


Samuel Pitoiset [Thu, 11 Jul 2019 16:03:56 +0000 (18:03 +0200)]

radv: report shader stage name when dumping LLVM IR

For debugging purposes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>


Samuel Pitoiset [Thu, 11 Jul 2019 16:03:55 +0000 (18:03 +0200)]

radv: tidy up radv_get_shader_name() and add NGG stages

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>


Samuel Pitoiset [Thu, 11 Jul 2019 13:51:40 +0000 (15:51 +0200)]

radv/gfx10: update OVERWRITE_COMBINER_{MRT_SHARING,WATERMARK}

DCC related, mirror RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>


Samuel Pitoiset [Thu, 11 Jul 2019 16:32:56 +0000 (18:32 +0200)]

radv/gfx10: do not set alignment on the ngg_emit pointer

Setting the alignment there is invalid; dropping it fixes a crash in LLVM.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>


Samuel Pitoiset [Thu, 11 Jul 2019 15:02:13 +0000 (17:02 +0200)]

radv/gfx10: fix exporting clip/cull distances for GS

This fixes dEQP-VK.clipping.user_defined.clip_distance.*geom*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>


Samuel Pitoiset [Thu, 11 Jul 2019 15:02:12 +0000 (17:02 +0200)]

radv/gfx10: fix exporting the subpass view index for GS

This fixes dEQP-VK.multiview.*geometry*.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>


Timothy Arceri [Mon, 1 Jul 2019 02:25:19 +0000 (12:25 +1000)]

mesa: save/restore SSO flag when using ARB_get_program_binary

Without this the restored program will fail the pipeline validation
checks when we attempt to use an SSO program.

Fixes: c20fd744fef1 ("mesa: Add Mesa ARB_get_program_binary helper functions")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111010


Alyssa Rosenzweig [Thu, 11 Jul 2019 20:15:17 +0000 (13:15 -0700)]

pan/midgard: Correct component count clamping PSIZ

Kind of a funky corner case that does not (as far as I know) apply to
organic shaders from GLES but does pop up in generated shaders from the
fixed-function desktop pipeline.

Fixes: bb483a91663f ("panfrost: Clamp point size")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>


Alyssa Rosenzweig [Thu, 11 Jul 2019 17:12:12 +0000 (10:12 -0700)]

panfrost: Remove unused display target field

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>


Alyssa Rosenzweig [Thu, 11 Jul 2019 19:23:48 +0000 (12:23 -0700)]

panfrost/ci: Update expectations

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>


Samuel Pitoiset [Thu, 11 Jul 2019 19:35:46 +0000 (21:35 +0200)]

radv: only enable the GS copy shader stage if GS is enabled

Ooops.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>


Eric Anholt [Thu, 11 Jul 2019 18:35:12 +0000 (11:35 -0700)]

freedreno: Add dependency on the xml build to the winsys.

The screen header includes the common xml, and otherwise we might race
to build before it's done.

Fixes: e03259974e2f ("freedreno: Generate headers from xml files")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>


Kenneth Graunke [Mon, 8 Jul 2019 00:14:15 +0000 (17:14 -0700)]

iris: Disable SIMD32 when using a 16x MSAA framebuffer.

We weren't doing this documented workaround because it's sorta painful.


Ian Romanick [Mon, 6 Aug 2018 20:05:08 +0000 (13:05 -0700)]

nir/algebraic: Recognize open-coded flrp(a, b, a)

No shader-db changes on Ice Lake, Iron Lake, or GM45 as these platforms
lack an LRP instruction.

v2: Remove flrp@64 cases.  Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64.  Should that occur, the cases
can be added back.
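
For reference, flrp(x, y, t) computes x*(1 - t) + y*t, so flrp(a, b, a) is
a*(1 - a) + b*a, which can also be written as a + a*(b - a). The sketch below
checks that identity numerically. The open-coded form shown is one possible
spelling and an assumption; the exact patterns matched by this commit are not
listed in this log.

```c
#include <assert.h>

/* Linear interpolation as flrp defines it: x*(1 - t) + y*t. */
static float flrp(float x, float y, float t)
{
    return x * (1.0f - t) + y * t;
}

/* One possible open-coded form of flrp(a, b, a): a + a*(b - a). */
static float open_coded_flrp_aba(float a, float b)
{
    return a + a * (b - a);
}

/* Compare with a small absolute tolerance; the two forms round differently. */
static int nearly_equal(float x, float y)
{
    float diff = x > y ? x - y : y - x;
    return diff <= 1e-5f;
}

/* Assert that both spellings agree for the given inputs. */
static void check_flrp_aba(float a, float b)
{
    assert(nearly_equal(open_coded_flrp_aba(a, b), flrp(a, b, a)));
}
```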

All Gen6-Gen9 platforms had similar results. (Skylake shown)
total instructions in shared programs: 15041996 -> 15041184 (<.01%)
instructions in affected programs: 71776 -> 70964 (-1.13%)
helped: 312
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3
helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28%
95% mean confidence interval for instructions value: -2.66 -2.55
95% mean confidence interval for instructions %-change: -1.89% -1.61%
Instructions are helped.

total cycles in shared programs: 354303333 -> 354301807 (<.01%)
cycles in affected programs: 433742 -> 432216 (-0.35%)
helped: 206
HURT: 78
helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8
helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82%
HURT stats (abs)   min: 1 max: 220 x̄: 35.95 x̃: 10
HURT stats (rel)   min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56%
95% mean confidence interval for cycles value: -10.68 -0.06
95% mean confidence interval for cycles %-change: -0.99% -0.12%
Cycles are helped.
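
The percentages in the report above are relative changes with respect to the
"before" value. A small sketch of that arithmetic, using a hypothetical helper
name:

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent.
 * Negative values mean the count went down. */
static double relative_change_percent(long before, long after)
{
    assert(before != 0);
    return 100.0 * (double)(after - before) / (double)before;
}
```

For the affected-program instruction counts above, 71776 -> 70964 is a
difference of -812, and -812 / 71776 rounds to the reported -1.13%.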

Reviewed-by: Matt Turner <mattst88@gmail.com>


Ian Romanick [Thu, 25 Apr 2019 06:49:30 +0000 (23:49 -0700)]

nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form

No shader-db changes on Ice Lake, Iron Lake, or GM45 as these platforms
lack an LRP instruction.

v2: Convert the pattern directly to flrp.  There were negligible
improvements on Gen4 and Gen5, and Gen11 was actually hurt.  I believe
the problem is this optimization conflicts with the (1-x)*y =>
ffma(-x, y, y) optimization on Gen11.
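
Expanding 1 - (1 - a)*(1 - b) gives a + b - a*b, which equals
b*(1 - a) + 1*a, that is, flrp(b, 1, a). Likewise (1 - x)*y equals
-x*y + y, the ffma form mentioned above. The sketch below checks both
identities numerically. The function names are hypothetical, and the choice of
flrp(b, 1, a) as the target is derived from the algebra rather than read from
the patch.

```c
#include <assert.h>

/* Linear interpolation as flrp defines it: x*(1 - t) + y*t. */
static float flrp(float x, float y, float t)
{
    return x * (1.0f - t) + y * t;
}

/* The expression named in the commit subject. */
static float original_form(float a, float b)
{
    return 1.0f - ((1.0f - a) * (1.0f - b));
}

/* (1 - x)*y and its multiply-add spelling, -x*y + y. */
static float one_minus_x_times_y(float x, float y) { return (1.0f - x) * y; }
static float ffma_spelling(float x, float y)       { return -x * y + y; }

/* Compare with a small absolute tolerance; the forms round differently. */
static int nearly_equal(float x, float y)
{
    float diff = x > y ? x - y : y - x;
    return diff <= 1e-5f;
}

/* Assert both identities for the given inputs. */
static void check_identities(float a, float b)
{
    assert(nearly_equal(original_form(a, b), flrp(b, 1.0f, a)));
    assert(nearly_equal(one_minus_x_times_y(a, b), ffma_spelling(a, b)));
}
```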

Skylake
total instructions in shared programs: 15046487 -> 15041996 (-0.03%)
instructions in affected programs: 194681 -> 190190 (-2.31%)
helped: 880
HURT: 20
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17%
95% mean confidence interval for instructions value: -5.25 -4.73
95% mean confidence interval for instructions %-change: -5.11% -4.36%
Instructions are helped.

total cycles in shared programs: 354340839 -> 354303333 (-0.01%)
cycles in affected programs: 1753622 -> 1716116 (-2.14%)
helped: 786
HURT: 182
helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22
helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84%
HURT stats (abs)   min: 1 max: 440 x̄: 37.99 x̃: 9
HURT stats (rel)   min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32%
95% mean confidence interval for cycles value: -45.90 -31.59
95% mean confidence interval for cycles %-change: -3.09% -2.50%
Cycles are helped.

All Gen6-Gen8 platforms had similar results. (Broadwell shown)
total instructions in shared programs: 15055907 -> 15051466 (-0.03%)
instructions in affected programs: 196370 -> 191929 (-2.26%)
helped: 871
HURT: 26
helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4
helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12%
95% mean confidence interval for instructions value: -5.21 -4.69
95% mean confidence interval for instructions %-change: -4.99% -4.24%
Instructions are helped.

total cycles in shared programs: 387729170 -> 387699745 (<.01%)
cycles in affected programs: 1816409 -> 1786984 (-1.62%)
helped: 788
HURT: 172
helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22
helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76%
HURT stats (abs)   min: 1 max: 404 x̄: 45.59 x̃: 14
HURT stats (rel)   min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43%
95% mean confidence interval for cycles value: -35.69 -25.61
95% mean confidence interval for cycles %-change: -2.88% -2.40%
Cycles are helped.

total fills in shared programs: 34712 -> 34710 (<.01%)
fills in affected programs: 7 -> 5 (-28.57%)
helped: 1
HURT: 0

LOST:   0
GAINED: 2

Reviewed-by: Matt Turner <mattst88@gmail.com>


Ian Romanick [Thu, 6 Jun 2019 00:23:11 +0000 (17:23 -0700)]

nir/algebraic: Reassociate fadd into fmul in DPH-like pattern

Moving the add to the other end of the sequence allows it to be fused
into an FMA.
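
A hedged sketch of the idea, using a homogeneous dot product
(v.x*m.x + v.y*m.y + v.z*m.z + w) as the DPH-like example. The helper names
are hypothetical and the exact NIR pattern is not shown in this log; ffma()
here is a plain stand-in that only models the instruction count, not the
single rounding of a real FMA.

```c
#include <assert.h>

/* Stand-in for a fused multiply-add instruction: a*b + c. */
static float ffma(float a, float b, float c)
{
    return a * b + c;
}

/* Before: one fmul, two ffma, then a separate fadd of w (4 instructions). */
static float dph_before(const float v[3], const float m[3], float w)
{
    float t = v[2] * m[2];       /* fmul */
    t = ffma(v[1], m[1], t);     /* ffma */
    t = ffma(v[0], m[0], t);     /* ffma */
    return t + w;                /* fadd */
}

/* After: the add moves to the innermost product, so every step is an ffma
 * (3 instructions). */
static float dph_after(const float v[3], const float m[3], float w)
{
    float t = ffma(v[2], m[2], w);
    t = ffma(v[1], m[1], t);
    return ffma(v[0], m[0], t);
}

/* Assert that both orderings agree within a small tolerance. */
static void check_dph(const float v[3], const float m[3], float w)
{
    float x = dph_before(v, m, w);
    float y = dph_after(v, m, w);
    float diff = x > y ? x - y : y - x;
    assert(diff <= 1e-4f);
}
```

Because floating-point addition is not associative, the two orderings can
differ in the last bits in general.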

Ice Lake
total instructions in shared programs: 17173074 -> 16933147 (-1.40%)
instructions in affected programs: 7938745 -> 7698818 (-3.02%)
helped: 35583
HURT: 90
helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6
helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45%
HURT stats (abs)   min: 1 max: 41 x̄: 2.46 x̃: 1
HURT stats (rel)   min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77%
95% mean confidence interval for instructions value: -6.80 -6.65
95% mean confidence interval for instructions %-change: -5.32% -5.22%
Instructions are helped.

total cycles in shared programs: 360881386 -> 359533568 (-0.37%)
cycles in affected programs: 189489144 -> 188141326 (-0.71%)
helped: 27250
HURT: 6707
helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16
helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35%
HURT stats (abs)   min: 1 max: 3507 x̄: 51.56 x̃: 14
HURT stats (rel)   min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27%
95% mean confidence interval for cycles value: -44.70 -34.68
95% mean confidence interval for cycles %-change: -2.75% -2.65%
Cycles are helped.

total spills in shared programs: 8943 -> 8829 (-1.27%)
spills in affected programs: 625 -> 511 (-18.24%)
helped: 6
HURT: 3

total fills in shared programs: 21815 -> 21719 (-0.44%)
fills in affected programs: 1653 -> 1557 (-5.81%)
helped: 7
HURT: 10

LOST:   11
GAINED: 3

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15271996 -> 15040882 (-1.51%)
instructions in affected programs: 7193699 -> 6962585 (-3.21%)
helped: 33985
HURT: 30
helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6
helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85%
HURT stats (abs)   min: 1 max: 41 x̄: 4.00 x̃: 3
HURT stats (rel)   min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72%
95% mean confidence interval for instructions value: -6.87 -6.72
95% mean confidence interval for instructions %-change: -5.59% -5.48%
Instructions are helped.

total cycles in shared programs: 355520785 -> 354253799 (-0.36%)
cycles in affected programs: 185869148 -> 184602162 (-0.68%)
helped: 25824
HURT: 6287
helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16
helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41%
HURT stats (abs)   min: 1 max: 3327 x̄: 51.76 x̃: 14
HURT stats (rel)   min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28%
95% mean confidence interval for cycles value: -44.70 -34.21
95% mean confidence interval for cycles %-change: -2.87% -2.76%
Cycles are helped.

total spills in shared programs: 8835 -> 8818 (-0.19%)
spills in affected programs: 613 -> 596 (-2.77%)
helped: 5
HURT: 2

total fills in shared programs: 21738 -> 21744 (0.03%)
fills in affected programs: 1348 -> 1354 (0.45%)
helped: 5
HURT: 11

LOST:   0
GAINED: 12

Haswell
total instructions in shared programs: 13447102 -> 13381508 (-0.49%)
instructions in affected programs: 3770735 -> 3705141 (-1.74%)
helped: 11999
HURT: 29
helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3
helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87%
HURT stats (abs)   min: 3 max: 750 x̄: 54.90 x̃: 3
HURT stats (rel)   min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82%
95% mean confidence interval for instructions value: -5.71 -5.19
95% mean confidence interval for instructions %-change: -2.39% -2.30%
Instructions are helped.

total cycles in shared programs: 376342236 -> 375690458 (-0.17%)
cycles in affected programs: 155699021 -> 155047243 (-0.42%)
helped: 8397
HURT: 2876
helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18
helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49%
HURT stats (abs)   min: 1 max: 15414 x̄: 94.15 x̃: 22
HURT stats (rel)   min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41%
95% mean confidence interval for cycles value: -67.64 -48.00
95% mean confidence interval for cycles %-change: -0.99% -0.74%
Cycles are helped.

total spills in shared programs: 23134 -> 23184 (0.22%)
spills in affected programs: 1675 -> 1725 (2.99%)
helped: 13
HURT: 11

total fills in shared programs: 34550 -> 34686 (0.39%)
fills in affected programs: 1421 -> 1557 (9.57%)
helped: 13
HURT: 11

LOST:   0
GAINED: 11

Ivy Bridge
total instructions in shared programs: 12019642 -> 11987285 (-0.27%)
instructions in affected programs: 1532236 -> 1499879 (-2.11%)
helped: 5522
HURT: 110
helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3
helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88%
HURT stats (abs)   min: 1 max: 750 x̄: 18.07 x̃: 3
HURT stats (rel)   min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15%
95% mean confidence interval for instructions value: -6.25 -5.24
95% mean confidence interval for instructions %-change: -2.43% -2.26%
Instructions are helped.

total cycles in shared programs: 180214667 -> 179761900 (-0.25%)
cycles in affected programs: 31448723 -> 30995956 (-1.44%)
helped: 7191
HURT: 2838
helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17
helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40%
HURT stats (abs)   min: 1 max: 15540 x̄: 64.63 x̃: 24
HURT stats (rel)   min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51%
95% mean confidence interval for cycles value: -53.34 -36.95
95% mean confidence interval for cycles %-change: -0.81% -0.53%
Cycles are helped.

total spills in shared programs: 3599 -> 3642 (1.19%)
spills in affected programs: 1180 -> 1223 (3.64%)
helped: 12
HURT: 2

total fills in shared programs: 4031 -> 4162 (3.25%)
fills in affected programs: 876 -> 1007 (14.95%)
helped: 12
HURT: 2

LOST:   6
GAINED: 5

Sandy Bridge
total instructions in shared programs: 10850686 -> 10822890 (-0.26%)
instructions in affected programs: 1247986 -> 1220190 (-2.23%)
helped: 4699
HURT: 102
helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3
helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88%
HURT stats (abs)   min: 1 max: 16 x̄: 4.70 x̃: 3
HURT stats (rel)   min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10%
95% mean confidence interval for instructions value: -6.10 -5.47
95% mean confidence interval for instructions %-change: -2.42% -2.30%
Instructions are helped.

total cycles in shared programs: 154044149 -> 153920095 (-0.08%)
cycles in affected programs: 26037392 -> 25913338 (-0.48%)
helped: 5974
HURT: 2521
helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16
helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84%
HURT stats (abs)   min: 1 max: 862 x̄: 34.73 x̃: 20
HURT stats (rel)   min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85%
95% mean confidence interval for cycles value: -16.31 -12.90
95% mean confidence interval for cycles %-change: -0.56% -0.45%
Cycles are helped.

total spills in shared programs: 2876 -> 2957 (2.82%)
spills in affected programs: 592 -> 673 (13.68%)
helped: 6
HURT: 35

total fills in shared programs: 3157 -> 3134 (-0.73%)
fills in affected programs: 402 -> 379 (-5.72%)
helped: 6
HURT: 0

LOST:   5
GAINED: 11

Reviewed-by: Matt Turner <mattst88@gmail.com>


Ian Romanick [Mon, 6 Aug 2018 20:07:59 +0000 (13:07 -0700)]

nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)

v2: Remove flrp@64 cases.  Since Gen11 removes flrp@32, it seems
unlikely that we'll ever have a flrp@64.  Should that occur, the cases
can be added back.

v3: Add a couple more patterns that just move the negation around.
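
With flrp(x, y, t) = x*(1 - t) + y*t, the two cases in the subject reduce to
flrp(-1, 1, a) = 2*a - 1 and flrp(1, -1, a) = 1 - 2*a. The sketch below checks
those identities numerically. The open-coded spellings are assumptions chosen
for illustration; the exact patterns added by the commit are not listed in this
log.

```c
#include <assert.h>

/* Linear interpolation as flrp defines it: x*(1 - t) + y*t. */
static float flrp(float x, float y, float t)
{
    return x * (1.0f - t) + y * t;
}

/* Possible open-coded spellings of the two special cases. */
static float two_a_minus_one(float a) { return 2.0f * a - 1.0f; }
static float one_minus_two_a(float a) { return 1.0f - 2.0f * a; }

/* Compare with a small absolute tolerance; the forms round differently. */
static int nearly_equal(float x, float y)
{
    float diff = x > y ? x - y : y - x;
    return diff <= 1e-5f;
}

/* Assert both identities for the given input. */
static void check_flrp_signs(float a)
{
    assert(nearly_equal(flrp(-1.0f, 1.0f, a), two_a_minus_one(a)));
    assert(nearly_equal(flrp(1.0f, -1.0f, a), one_minus_two_a(a)));
}
```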

No shader-db changes on Ice Lake, Iron Lake, or GM45 as these platforms
lack an LRP instruction.

Skylake
total instructions in shared programs: 15279687 -> 15256058 (-0.15%)
instructions in affected programs: 4344440 -> 4320811 (-0.54%)
helped: 23455
HURT: 18
helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs)   min: 1 max: 2 x̄: 1.06 x̃: 1
HURT stats (rel)   min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34%
95% mean confidence interval for instructions value: -1.01 -1.00
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.

total cycles in shared programs: 355593755 -> 355339981 (-0.07%)
cycles in affected programs: 162089552 -> 161835778 (-0.16%)
helped: 20467
HURT: 7158
helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6
helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58%
HURT stats (abs)   min: 1 max: 4814 x̄: 47.46 x̃: 11
HURT stats (rel)   min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98%
95% mean confidence interval for cycles value: -10.39 -7.98
95% mean confidence interval for cycles %-change: -0.57% -0.47%
Cycles are helped.

total spills in shared programs: 8843 -> 8835 (-0.09%)
spills in affected programs: 190 -> 182 (-4.21%)
helped: 2
HURT: 0

total fills in shared programs: 21738 -> 21738 (0.00%)
fills in affected programs: 372 -> 372 (0.00%)
helped: 1
HURT: 1

LOST:   12
GAINED: 22

Broadwell
total instructions in shared programs: 15290523 -> 15266818 (-0.16%)
instructions in affected programs: 4314738 -> 4291033 (-0.55%)
helped: 23391
HURT: 11
helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65%
HURT stats (abs)   min: 1 max: 189 x̄: 18.09 x̃: 1
HURT stats (rel)   min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50%
95% mean confidence interval for instructions value: -1.04 -0.99
95% mean confidence interval for instructions %-change: -0.87% -0.85%
Instructions are helped.

total cycles in shared programs: 388911660 -> 388830827 (-0.02%)
cycles in affected programs: 172903324 -> 172822491 (-0.05%)
helped: 15601
HURT: 13269
helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6
helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55%
HURT stats (abs)   min: 1 max: 14904 x̄: 28.21 x̃: 6
HURT stats (rel)   min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60%
95% mean confidence interval for cycles value: -4.20 -1.40
95% mean confidence interval for cycles %-change: -0.17% -0.08%
Cycles are helped.

total spills in shared programs: 23110 -> 23069 (-0.18%)
spills in affected programs: 656 -> 615 (-6.25%)
helped: 3
HURT: 1

total fills in shared programs: 34399 -> 34398 (<.01%)
fills in affected programs: 905 -> 904 (-0.11%)
helped: 3
HURT: 1

LOST:   6
GAINED: 23

Haswell
total instructions in shared programs: 13465303 -> 13441142 (-0.18%)
instructions in affected programs: 3726999 -> 3702838 (-0.65%)
helped: 22139
HURT: 347
helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12%
95% mean confidence interval for instructions value: -1.08 -1.07
95% mean confidence interval for instructions %-change: -0.99% -0.96%
Instructions are helped.

total cycles in shared programs: 376271308 -> 376273090 (<.01%)
cycles in affected programs: 167496811 -> 167498593 (<.01%)
helped: 13206
HURT: 13281
helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8
helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80%
HURT stats (abs)   min: 1 max: 3828 x̄: 35.32 x̃: 8
HURT stats (rel)   min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61%
95% mean confidence interval for cycles value: -1.33 1.47
95% mean confidence interval for cycles %-change: 0.22% 0.36%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 23158 -> 23134 (-0.10%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0

total fills in shared programs: 34580 -> 34550 (-0.09%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0

LOST:   23
GAINED: 13

Ivy Bridge
total instructions in shared programs: 12034154 -> 12014301 (-0.16%)
instructions in affected programs: 3636209 -> 3616356 (-0.55%)
helped: 18771
HURT: 459
helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1
helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11%
95% mean confidence interval for instructions value: -1.04 -1.02
95% mean confidence interval for instructions %-change: -0.86% -0.84%
Instructions are helped.

total cycles in shared programs: 180186960 -> 180175147 (<.01%)
cycles in affected programs: 44652745 -> 44640932 (-0.03%)
helped: 12979
HURT: 11033
helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6
helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74%
HURT stats (abs)   min: 1 max: 4811 x̄: 37.61 x̃: 9
HURT stats (rel)   min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69%
95% mean confidence interval for cycles value: -2.29 1.31
95% mean confidence interval for cycles %-change: 0.11% 0.26%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 3623 -> 3599 (-0.66%)
spills in affected programs: 24 -> 0
helped: 3
HURT: 0

total fills in shared programs: 4061 -> 4031 (-0.74%)
fills in affected programs: 30 -> 0
helped: 3
HURT: 0

LOST:   17
GAINED: 18

Sandy Bridge
total instructions in shared programs: 10853968 -> 10834932 (-0.18%)
instructions in affected programs: 3769957 -> 3750921 (-0.50%)
helped: 17944
HURT: 204
helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60%
HURT stats (abs)   min: 1 max: 2 x̄: 1.01 x̃: 1
HURT stats (rel)   min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93%
95% mean confidence interval for instructions value: -1.05 -1.04
95% mean confidence interval for instructions %-change: -0.81% -0.78%
Instructions are helped.

total cycles in shared programs: 153894864 -> 153885988 (<.01%)
cycles in affected programs: 50643925 -> 50635049 (-0.02%)
helped: 9361
HURT: 10534
helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4
helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22%
HURT stats (abs)   min: 1 max: 1371 x̄: 16.42 x̃: 5
HURT stats (rel)   min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27%
95% mean confidence interval for cycles value: -1.27 0.38
95% mean confidence interval for cycles %-change: -0.03% 0.04%
Inconclusive result (value mean confidence interval includes 0).

LOST:   6
GAINED: 24

Reviewed-by: Matt Turner <mattst88@gmail.com>


Ian Romanick [Mon, 3 Jun 2019 22:22:15 +0000 (15:22 -0700)]

nir: intel/vec4: Add flag to disable some algebraic optimizations

A couple of patches later in this series use the flag to avoid a few
thousand shader-db regressions on all vec4 platforms.

I'm not particularly enamored with the name of this flag. However, I
suspect the Intel vec4 backend is the only backend that will benefit
from it. Specifically, the cases where this helps are all cases where
we want to prevent nir_opt_algebraic from rearranging instructions to
create 3-source instructions, such as ffma and flrp, with additional
immediate value or uniform sources.

The earlier commit "intel/vec4: Try to emit a single load for multiple
3-src instruction operands" solves most of the problems caused by
additional immediate values, but the restrictions on register strides
that cause problems for uniforms and shader inputs persist.

Reviewed-by: Matt Turner <mattst88@gmail.com>
