git.libre-soc.org Git - mesa.git/log

projects / mesa.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Timur Kristóf [Fri, 8 Feb 2019 21:19:14 +0000 (22:19 +0100)]

tgsi_to_nir: Support FACE and POSITION properly.

Previously, FACE was hard-coded as a sysval, but TTN emulated
it incorrectly. Also, POSITION was not supported when it was
a sysval. This patch fixes these by allowing both of them to
be sysvals or inputs, based on driver capabilities. It also
fixes the TGSI FACE emulation based on the TGSI spec.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Timur Kristóf [Fri, 8 Feb 2019 21:11:08 +0000 (22:11 +0100)]

tgsi_to_nir: Extract ttn_emulate_tgsi_front_face into its own function.

We'll need to use the same logic in other places, so it makes sense to
have a separate function for this.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Timur Kristóf [Fri, 8 Feb 2019 21:15:56 +0000 (22:15 +0100)]

tgsi_to_nir: Restructure system value loads.

Minor cleanup to the way system value loads work in tgsi_to_nir.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Timur Kristóf [Tue, 5 Mar 2019 17:59:47 +0000 (18:59 +0100)]

tgsi_to_nir: Produce optimized NIR for a given pipe_screen.

With this patch, tgsi_to_nir will output NIR that is tailored to
the given pipe, by reading its capabilities and adjusting the NIR code
to those capabilities similarly to how glsl_to_nir works.

It also adds an optimization loop that brings the output NIR in line
with what glsl_to_nir outputs. This is necessary for the same reason
why glsl_to_nir has its own optimization loop: currently not every
driver does these optimizations yet.

For uses which cannot pass a pipe_screen we also keep a variant
called tgsi_to_nir_noscreen which keeps the old behavior.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Acked-By: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Timur Kristóf [Mon, 4 Mar 2019 12:54:10 +0000 (13:54 +0100)]

freedreno: Plumb pipe_screen through to irX_tgsi_to_nir.

This patch makes it possible for freedreno to pass a pipe_screen
to tgsi_to_nir. This will be needed when tgsi_to_nir supports reading
pipe capabilities.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>

commit | commitdiff | tree

Timur Kristóf [Thu, 28 Feb 2019 09:53:11 +0000 (10:53 +0100)]

nir: Add multiplier argument to nir_lower_uniforms_to_ubo.

Note that locations can be set in different units, and the multiplier
argument caters to supporting these different units. For example,
st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4,
while tgsi_to_nir uses bytes, so the multiplier should be 16.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Timur Kristóf [Fri, 8 Feb 2019 21:36:37 +0000 (22:36 +0100)]

nir: Move nir_lower_uniforms_to_ubo to compiler/nir.

The nir_lower_uniforms_to_ubo function is useful outside of
mesa/state_tracker, and in fact is needed to produce NIR for
drivers that have the PIPE_CAP_PACKED_UNIFORMS capability.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Timur Kristóf [Fri, 8 Feb 2019 08:59:58 +0000 (09:59 +0100)]

tgsi_to_nir: Split to smaller functions.

Previously, tgsi_to_nir was a single big function, and this patch
intends to make the code easier to understand by splitting it up
to multiple smaller pieces.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-By: Tested-by: Rob Clark <robdclark@gmail.com>

commit | commitdiff | tree

Timur Kristóf [Wed, 13 Feb 2019 23:01:04 +0000 (01:01 +0200)]

tgsi_to_nir: Make the TGSI IF translation code more readable.

This patch is a minor cleanup that only intends to make the
TGSI IF translation a bit easier to read.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Timur Kristóf [Wed, 13 Feb 2019 22:45:47 +0000 (00:45 +0200)]

tgsi_to_nir: Fix TGSI LIT translation by using flt.

TGSI spec says LIT needs a "greater than" comparison. NIR doesn't have that,
so let's use "less than" and swap the arguments. Previously "greater than or equal"
was used by tgsi_to_nir which is incorrect.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Timur Kristóf [Thu, 7 Feb 2019 17:01:24 +0000 (18:01 +0100)]

tgsi_to_nir: Fix the TGSI ARR translation by converting the result to int.

According to the TGSI spec, ARR needs to do a rounding and then
a float-to-integer conversion which was missing. This patch also
makes the rounding a bit more efficient by using nir_fround_even
instead of the previous nir_ffloor+nir_fadd trick.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

commit | commitdiff | tree

Timur Kristóf [Tue, 5 Feb 2019 17:08:24 +0000 (18:08 +0100)]

nir: Add ability for shaders to use window space coordinates.

This patch adds a shader_info field that tells the driver to use window
space coordinates for a given vertex shader. It also enables this feature
in radeonsi (the only NIR-capable driver that supported it in TGSI),
and makes tgsi_to_nir aware of it.

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Eric Anholt [Fri, 22 Feb 2019 22:26:26 +0000 (14:26 -0800)]

v3d: Move the stores for fixed function VS output reads into NIR.

This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them. Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.

total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)

Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)

commit | commitdiff | tree

Eric Anholt [Sat, 23 Feb 2019 19:21:26 +0000 (11:21 -0800)]

v3d: Translate f2i(fround_even) as FTOIN.

This appears to be just what the opcode does. Needed for equivalence when
moving FF VPM stores into NIR.

commit | commitdiff | tree

Eric Anholt [Sun, 24 Feb 2019 00:17:02 +0000 (16:17 -0800)]

nir: Improve printing of load_input/store_output variable names.

We were printing only when the channel was exactly the start channel, so
scalarized loads/stores would be missing the name on the rest.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Tue, 12 Feb 2019 22:56:24 +0000 (16:56 -0600)]

anv: Implement VK_EXT_inline_uniform_block

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Sat, 12 Jan 2019 16:58:33 +0000 (10:58 -0600)]

spirv: Use the same types for resource indices as pointers

We need more space than just a 32-bit scalar and we have to burn all
that space anyway so we may as well expose it to the driver. This also
fixes a subtle bug when UBOs and SSBOs have different pointer types.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Sat, 12 Jan 2019 16:57:28 +0000 (10:57 -0600)]

spirv: Use the generic dereference function for OpArrayLength

With the new deref changes, the old pointer_offset version may not be
the right one to call. Just call the generic one and let it sort it
out.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Sat, 12 Jan 2019 16:32:13 +0000 (10:32 -0600)]

spirv: Pull offset/stride from the pointer for OpArrayLength

We can't pull it from the variable type because it might be an array of
blocks and not just the one block. While we're here, throw in some
error checking.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org

commit | commitdiff | tree

Jason Ekstrand [Mon, 19 Nov 2018 20:28:39 +0000 (14:28 -0600)]

anv: Add a concept of a descriptor buffer

This buffer goes along side the CPU data structure and may contain
pointers, bindless handles, or any other descriptor information.
Currently, all descriptors are size zero and nothing goes in the buffer
but this commit sets up the framework we will need later.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Tue, 12 Feb 2019 21:19:57 +0000 (15:19 -0600)]

anv: Take references to push descriptor set layouts

Technically, descriptor set layouts aren't required to survive past the
function they're passed into so we need to reference them.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Tue, 12 Feb 2019 21:29:19 +0000 (15:29 -0600)]

anv: Refactor descriptor pushing a bit

Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us. Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Tue, 12 Feb 2019 20:02:09 +0000 (14:02 -0600)]

anv: drop add_var_binding from anv_nir_apply_pipeline_layout.c

It has exactly one caller. Just inline it.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Sat, 24 Nov 2018 18:42:39 +0000 (12:42 -0600)]

anv: Clean up descriptor set layouts

The descriptor set layout code in our driver has undergone many changes
over the years.  Some of the fields which were once essential are now
useless or nearly so.  The has_dynamic_offsets field was completely
unused accept for the code to set and hash it.  The per-stage indices
were only being used to determine if a particular binding had images,
samplers, etc.  The fact that it's per-stage also doesn't matter because
that binding should never be accessed by a shader of the wrong stage.

This commit deletes a pile of cruft and replaces it all with a
descriptive bitfield which states what a particular descriptor contains.
This merely describes the data available and doesn't necessarily dictate
how it will be lowered in anv_nir_apply_pipeline_layout.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Thu, 7 Feb 2019 00:02:30 +0000 (18:02 -0600)]

anv: Count image param entries rather than images

This is what we're actually storing in the descriptor set and consuming
when we bind surface states. This commit renames image_count to
image_param_count a few places and moves the decision to not count image
params on gen9+ into anv_descriptor_set.c when we build the layout.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Wed, 6 Feb 2019 23:16:34 +0000 (17:16 -0600)]

anv: Stop allocating buffer views for dynamic buffers

We emit the surface states for those on-the-fly so we don't need the
buffer view.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Thu, 22 Nov 2018 00:00:56 +0000 (18:00 -0600)]

anv: Rework arguments to anv_descriptor_set_write_*

Make them all take a device followed by a set. This is consistent
with how the actual Vulkan entrypoint parameters are laid out.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Mon, 19 Nov 2018 21:15:56 +0000 (15:15 -0600)]

anv/descriptor_set: Refactor alloc/free of descriptor sets

This commit just puts the free list code together as part of the pool
instead of having it inlined into the descriptor set create code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Eric Anholt [Tue, 5 Mar 2019 06:11:15 +0000 (22:11 -0800)]

v3d: Stop treating exec masking specially.

In our backend, the successor edges from the blocks only point to where
QPU control flow goes, not where the notional control flow goes from a
"break" or "continue" modifying the execution mask to resume writing to
some channels later. As a result, this attempt at restricting live ranges
ended up missing the live range of a value where a conditional
break/continue was present in a loop before the later def of a variable.
The previous commit ended up fixing the problem that the flag tried to
solve.

Fixes glsl-vs-loop-continue.shader_test and/or
glsl-vs-loop-redundant-condition.shader_test based on register allocation
results.

commit | commitdiff | tree

Eric Anholt [Tue, 5 Mar 2019 06:10:33 +0000 (22:10 -0800)]

v3d: Restrict live intervals to the blocks reachable from any def.

In the backend, we often have condition codes on writes to variables, such
that there's no screening def anywhere and the previous live ranges
algorithm would conclude that the start of the range extends to the start
of the program. However, we do know that the live range can only extend
as early as you can reach from all blocks writing to the variable.

The motivation was that, while we have a couple of hacks to try to promote
conditional writes up to being a def within the block, the exec_mask one
was broken and needed a replacement.

Based on c3c1aa5aeb92 ("intel/fs: Restrict live intervals to the subset
possibly reachable from any definition.").

commit | commitdiff | tree

Andres Gomez [Tue, 5 Mar 2019 11:55:17 +0000 (13:55 +0200)]

gitlab-ci: install distro's ninja

Ubuntu Bionic is shipping ninja 1.8.2. Therefore, we do not need to
download v1.6.0 manually any more.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

commit | commitdiff | tree

Samuel Pitoiset [Mon, 4 Mar 2019 13:25:08 +0000 (14:25 +0100)]

radv: properly align the fence and EOP bug VA on GFX9

If alignement is 0, offets returned by
radv_cmd_buffer_upload_alloc() are always 0. These two
virtual addresses were pointing at the same location.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Samuel Pitoiset [Tue, 5 Mar 2019 09:45:00 +0000 (10:45 +0100)]

radv: allocate enough space in cmdbuf when starting a subpass

This fixes some CTS crashes with:
dEQP-VK.renderpass2.suballocation.attachment_write_mask.attachment_count_8.start_index_*

Ideally, we should check cmd_buffer->cs->max_dw because there is
likely enough space (the internal clear draws allocate space), but
keep that way for consistency.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Eric Engestrom [Tue, 5 Mar 2019 13:18:28 +0000 (13:18 +0000)]

vulkan: import vk_layer.h from Khronos

Instead of relying on the system having it (and the right version).

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Eric Engestrom [Wed, 27 Feb 2019 15:26:08 +0000 (15:26 +0000)]

egl: fix libdrm-less builds

This function was never used, and isn't properly guarded by HAVE_LIBDRM,
breaking the build on systems that don't have libdrm.

Let's just remove it.

Fixes: 7552fcb7b9b98392e6a8 "egl: add base EGL_EXT_device_base implementation"
Reported-by: Timo Aaltonen <tjaalton@debian.org>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>

commit | commitdiff | tree

Eric Engestrom [Tue, 5 Mar 2019 12:20:53 +0000 (12:20 +0000)]

vulkan: import missing file from Khronos

Fixes: 114c4aa0c84fc6d00407 "vulkan: update headers/registry to 1.1.102"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Eric Engestrom [Wed, 27 Feb 2019 15:08:56 +0000 (15:08 +0000)]

util: #define PATH_MAX when undefined (eg. Hurd)

Cc: Timo Aaltonen <tjaalton@debian.org>
Cc: James Clarke <jrtc27@debian.org>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

commit | commitdiff | tree

Eric Engestrom [Wed, 27 Feb 2019 12:20:56 +0000 (12:20 +0000)]

radv: use the platform defines in vk.xml instead of hard-coding them

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Eric Engestrom [Wed, 27 Feb 2019 12:20:31 +0000 (12:20 +0000)]

anv: use the platform defines in vk.xml instead of hard-coding them

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Lionel Landwerlin [Mon, 4 Mar 2019 17:36:10 +0000 (17:36 +0000)]

anv: update supported patch version

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

commit | commitdiff | tree

Tapani Pälli [Fri, 22 Feb 2019 06:54:13 +0000 (08:54 +0200)]

anv: toggle on support for VK_EXT_ycbcr_image_arrays

We already propagate coord_components correctly and did not have
layer restrictions for ycbcr formats.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Lionel Landwerlin [Mon, 4 Mar 2019 17:40:08 +0000 (17:40 +0000)]

vulkan: update headers/registry to 1.1.102

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

commit | commitdiff | tree

Tapani Pälli [Wed, 20 Feb 2019 07:18:39 +0000 (09:18 +0200)]

anv: retain the is_array state in create_plane_tex_instr_implicit

This does not seem to fix anything ATM but is the right thing todo.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Fixes: f3e91e78a33775 ("anv: add nir lowering pass for ycbcr textures")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Eric Engestrom [Thu, 14 Feb 2019 17:22:00 +0000 (17:22 +0000)]

meson: avoid going back up the tree with include_directories()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

commit | commitdiff | tree

Kenneth Graunke [Mon, 10 Jul 2017 06:03:44 +0000 (23:03 -0700)]

i965: Implement threaded GL support.

Now i965 supports mesa_glthread=true like Gallium drivers do.

According to Markus (degasus), the Citra emulator now runs ~30% faster.
Emmanuel (linkmauve) also reported that the Dolphin emulator improved
by 2.8x on one game. (Both of those still need to be added to drirc.)

An Intel Mesa CI run with mesa_glthread=true appears to be happy.

Bioshock Infinite's benchmark mode seems to be around 15-20% faster
on my Skylake GT4 at 1920x1080.

Tested-by: Markus Wick <markus@selfnet.de>
Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Sat, 2 Mar 2019 07:33:39 +0000 (01:33 -0600)]

anv/pipeline: Drop anv_fill_binding_table

We zero out the prog data anyway and, now that bias is always zero, this
function is accomplishing nothing.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Sat, 23 Feb 2019 19:34:11 +0000 (13:34 -0600)]

anv: Use an actual binding for gl_NumWorkgroups

This commit moves our handling of gl_NumWorkgroups over to work like our
handling of other special bindings in the Vulkan driver. We give it a
magic descriptor set number and teach emit_binding_tables to handle it.
This is better than the bias mechanism we were using because it allows
us to do proper accounting through the bind map mechanism.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Fri, 8 Feb 2019 23:51:24 +0000 (17:51 -0600)]

intel,nir: Lower TXD with min_lod when the sampler index is not < 16

When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header. Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f8d "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Jason Ekstrand [Wed, 27 Feb 2019 06:12:01 +0000 (00:12 -0600)]

spirv: OpImageQueryLod requires a sampler

No idea how this fell through the cracks besides the fact that the
sampler bound at 0 almost always works and the CTS isn't amazing. In
any case, this appears to have been broken for almost forever.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org

commit | commitdiff | tree

Jason Ekstrand [Fri, 1 Mar 2019 20:01:08 +0000 (14:01 -0600)]

anv: Count surfaces for non-YCbCr images in GetDescriptorSetLayoutSupport

We were accidentally not counting those surfaces

Fixes: ddc4069122 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Sagar Ghuge [Mon, 25 Feb 2019 22:56:29 +0000 (14:56 -0800)]

spirv: Allow [i/u]mulExtended to use new nir opcode

Use new nir opcode nir_[i/u]mul_2x32_64 and extract lower and higher 32
bits as needed instead of emitting mul and mul_high.

v2: Surround the switch case with curly braces (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Sagar Ghuge [Mon, 25 Feb 2019 19:43:53 +0000 (11:43 -0800)]

nir/algebraic: Optimize low 32 bit extraction

Optimize a situation where we only need lower 32 bits from 64 bit
result.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Sagar Ghuge [Wed, 27 Feb 2019 22:02:54 +0000 (14:02 -0800)]

glsl: [u/i]mulExtended optimization for GLSL

Optimize mulExtended to use 32x32->64 multiplication.

Drivers which are not based on NIR, they can set the
MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have same old
behavior.

v2: Add missing condition check (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <Matt Turner <mattst88@gmail.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Sagar Ghuge [Fri, 15 Feb 2019 07:08:39 +0000 (23:08 -0800)]

nir/glsl: Add another way of doing lower_imul64 for gen8+

On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We
can reduce our 64x64 int multiplication from 4 instructions to 3.

Also instead of emitting two mul instructions, we can emit single mul
instuction and extract low/high 32 bits from 64 bit result for
[i/u]mulExtended

v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand)
    2) Add lower_mul_2x32_64 flag (Matt Turner)
    3) Remove associative property as bit size is different (Connor
       Abbott)

v3: Fix indentation and variable naming convention (Jason Ekstrand)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Axel Davy [Fri, 22 Feb 2019 19:45:51 +0000 (20:45 +0100)]

st/nine: Ignore multisample quality level if no ms

Apparently instead of returning error when passing
a quality level different than 0 for
D3DMULTISAMPLE_NONE, we should pass.

Fixes: https://github.com/iXit/Mesa-3D/issues/340
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Axel Davy <davyaxel0@gmail.com>

commit | commitdiff | tree

Axel Davy [Wed, 2 Jan 2019 21:13:12 +0000 (22:13 +0100)]

st/nine: Ignore window size if error

Check GetWindowInfo and ignore the computed sizes
if there is an error.

Fixes a regression caused by earlier commit when
using old wine gallium nine patches.

Should also address a crash at window destruction.

Related issues:
https://github.com/iXit/Mesa-3D/issues/331
https://github.com/iXit/Mesa-3D/issues/332

Cc: mesa-stable@lists.freedesktop.org
Fixes: 2318ca68bbe ("st/nine: Handle window resize when a presentation
buffer is used")

Signed-off-by: Axel Davy <davyaxel0@gmail.com>

commit | commitdiff | tree

Mauro Rossi [Sat, 2 Mar 2019 22:38:27 +0000 (23:38 +0100)]

android: anv: fix libexpat shared dependency

Fixes undefined reference building errors for XML_* functions

Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>

commit | commitdiff | tree

Mauro Rossi [Mon, 4 Mar 2019 09:34:08 +0000 (10:34 +0100)]

android: anv: fix generated files depedencies (v2)

Fix anv_extrypoints.{c,h} and anv_extensions.{c,h} missing dependencies
Rename the variable labels according to targets and python scripts
Align the building rules as per Automake for simplification

Fixes building errors during rebuils due to missing dependencies

(v2) Fixed a missing $(VULKAN_API_XML) reference

Fixes: 9a508b7 ("android: anv/extensions: fix generated sources build")
Fixes: dd088d4bec7 ("anv/extensions: Generate a header file with extension tables")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>

commit | commitdiff | tree

Brian Paul [Sat, 2 Mar 2019 18:26:44 +0000 (11:26 -0700)]

st/wgl: init a variable to silence MinGW warning

MinGW release build says 'value' may be used before being initialized.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

commit | commitdiff | tree

Brian Paul [Fri, 1 Mar 2019 20:56:18 +0000 (13:56 -0700)]

svga: silence array out of bounds warning

MinGW release build complains about a possible out-of-bounds
array access. Test i < 4 to silence it.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

commit | commitdiff | tree

Brian Paul [Fri, 1 Mar 2019 20:55:30 +0000 (13:55 -0700)]

svga: init fill variable to avoid compiler warning

MinGW release builds warns about use of a possbily uninitialized
variable here.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

commit | commitdiff | tree

Brian Paul [Thu, 28 Feb 2019 19:01:02 +0000 (12:01 -0700)]

st/mesa: whitespace fixes in st_texture.h

Trivial.

commit | commitdiff | tree

Brian Paul [Thu, 28 Feb 2019 18:59:16 +0000 (11:59 -0700)]

st/mesa: line wrapping, whitespace fixes in st_cb_texture.c

Trivial.

commit | commitdiff | tree

Brian Paul [Thu, 28 Feb 2019 18:56:31 +0000 (11:56 -0700)]

st/mesa: whitespace fixes in st_sampler_view.c

Replace tabs w/ spaces. 80-column wrapping.
Trivial.

commit | commitdiff | tree

Gurchetan Singh [Sat, 2 Mar 2019 02:58:16 +0000 (18:58 -0800)]

egl/sl: also allow virtgpu to fallback to kms_swrast

virtio-gpu fallbacks to software rendering when 3D features
are unavailable since 6c5ab, and kms_swrast is more
feature complete than swrast.

v2: Add comment (Emil)

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

commit | commitdiff | tree

Mathias Fröhlich [Tue, 26 Feb 2019 05:39:05 +0000 (06:39 +0100)]

st/mesa: Invalidate the gallium array atom only if needed.

Now that the buffer object usage history tracks if it is
being used as vertex buffer object, we can restrict setting
the ST_NEW_VERTEX_ARRAYS bit to dirty on glBufferData calls to
buffers that are potentially used as vertex buffer object.
Also put a note that the same could be done for index arrays
used in indexed draws.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

commit | commitdiff | tree

Mathias Fröhlich [Fri, 21 Dec 2018 17:41:27 +0000 (18:41 +0100)]

mesa: Track buffer object use also for VAO usage.

We already track the usage history for buffer objects
in a lot of aspects. Add GL_ARRAY_BUFFER and
GL_ELEMENT_ARRAY_BUFFER to gl_buffer_object::UsageHistory.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

commit | commitdiff | tree

Samuel Pitoiset [Fri, 1 Mar 2019 17:28:02 +0000 (18:28 +0100)]

rav: use 32_AR instead of 32_ABGR when alpha coverage is required

This export format is faster. Seems to improve performance in
Wreckfest.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Alyssa Rosenzweig [Wed, 27 Feb 2019 04:33:13 +0000 (04:33 +0000)]

panfrost: List primitive restart enable bit

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

commit | commitdiff | tree

Alyssa Rosenzweig [Wed, 27 Feb 2019 05:40:55 +0000 (05:40 +0000)]

panfrost/midgard: Preview for data hazards

If a selected unit causes a data hazard, the whole block gets cut short.
So, we preview for data hazards _while_ selecting units.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com

commit | commitdiff | tree

Alyssa Rosenzweig [Wed, 27 Feb 2019 05:32:16 +0000 (05:32 +0000)]

panfrost/midgard: Promote smul to vmul

smul comes first in the pipeline, before vmul. Until we have a full
instruction scheduler, it's better to have vmul prioritized to maximize
bundle size.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com

commit | commitdiff | tree

Alyssa Rosenzweig [Mon, 4 Mar 2019 05:01:45 +0000 (05:01 +0000)]

panfrost: Flush with offscreen rendering

This special-case was needlessly added and breaks purely offscreen
rendering (when there is no scanout involved)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com

commit | commitdiff | tree

Alyssa Rosenzweig [Wed, 27 Feb 2019 02:06:29 +0000 (02:06 +0000)]

panfrost/midgard: Don't force constant on VLUT

Previously, we forced a #0 inline constant tacked on for the lut
instructions to mirror the blob's behaviour, which caused some
suboptimal codegen due to our constant inlining implementation. Instead,
just don't force a constant at all.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com

commit | commitdiff | tree

Alyssa Rosenzweig [Wed, 27 Feb 2019 00:30:59 +0000 (00:30 +0000)]

panfrost: Cleanup cruft related to clears

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Alyssa Rosenzweig [Tue, 26 Feb 2019 23:51:34 +0000 (23:51 +0000)]

panfrost: Decouple Gallium clear from FBD clear

The operations of gallium->clear() and the hardware callbacks are
fundamentally independent. This routine decouples them by routing shared
information via panfrost_job, allowing the hardware half to be deferred
to the fragment job generation.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Alyssa Rosenzweig [Mon, 25 Feb 2019 05:32:16 +0000 (05:32 +0000)]

panfrost: Import job data structures from v3d

At the moment, Panfrost state is ad hoc, which creates issues for FBOs.
This commit imports the skeleton of the v3d_job structure as
panfrost_job, in preparation for refactors to organize this state.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Ilia Mirkin [Fri, 22 Feb 2019 06:13:39 +0000 (01:13 -0500)]

glsl: fix recording of variables for XFB in TCS shaders

This is purely for conformance, since it's not actually possible to do
XFB on TCS output varyings. However we do have to make sure we record
the names correctly, and this removes an extra level of array-ness from
the names in question.

Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage

v2: Add comment to the new program_resource_visitor::process function.
(Ilia Mirkin)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Wed, 21 Nov 2018 17:23:03 +0000 (18:23 +0100)]

glsl: TCS outputs can not be transform feedback candidates on GLES

Avoids regression on:

KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage

that is uncovered by the following patch.

"glsl: fix recording of variables for XFB in TCS shaders"

v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders
v3: Move this patch before "glsl: fix recording of variables for XFB in TCS
shaders" to avoid temporal regressions. (Illia Mirkin)

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

commit | commitdiff | tree

Jose Maria Casanova Crespo [Wed, 21 Nov 2018 18:22:05 +0000 (19:22 +0100)]

glsl: fix typos in comments "transfor" -> "transform"

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

commit | commitdiff | tree

Gert Wollny [Mon, 25 Feb 2019 18:12:07 +0000 (19:12 +0100)]

mesa: Expose EXT_texture_query_lod and add support for its use shaders

EXT_texture_query_lod provides the same functionality for GLES like
the ARB extension with the same name for GL.

v2: Set ES 3.0 as minimum GLES version as required by the extension

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Greg V [Sun, 24 Dec 2017 16:55:46 +0000 (19:55 +0300)]

util: emulate futex on FreeBSD using umtx

Obtained from: FreeBSD ports
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>

commit | commitdiff | tree

Rob Clark [Wed, 27 Feb 2019 14:56:18 +0000 (09:56 -0500)]

freedreno/ir3: track register pressure in sched

Not a perfect solution, and the "pressure" target is hard-coded. But it
doesn't really seem to much in the common case, and avoids exploding
register usage in dEQP ssbo tests.

So this should serve as a stop-gap solution until I have time to re-
write the scheduler.

Hurts slightly in instruction count, but gains (reduces) slightly the
register usage in shader-db. Fixes ~150 dEQP-GLES31.functional.ssbo.*
that were failing due to RA fail.

Signed-off-by: Rob Clark <robdclark@gmail.com>

commit | commitdiff | tree

Rob Clark [Sat, 24 Nov 2018 17:18:08 +0000 (12:18 -0500)]

freedreno/ir3: add Sethi–Ullman numbering pass

Signed-off-by: Rob Clark <robdclark@gmail.com>

commit | commitdiff | tree

Rob Clark [Wed, 27 Feb 2019 20:57:23 +0000 (15:57 -0500)]

freedreno/ir3: include nopN in expanded instruction count

Signed-off-by: Rob Clark <robdclark@gmail.com>

commit | commitdiff | tree

Dave Airlie [Thu, 10 Jan 2019 06:24:57 +0000 (16:24 +1000)]

st/mesa: add support for lowering fp64/int64 for nir drivers

This might enough for iris and possible r600 (when it gets NIR)

This appears to work for iris.

v2:
* change cap return so DOUBLES == 2 means sw emu

v3:
* Refactor using int64/doubles lowering options which were added
into nir options
* Remove DOUBLES == 2 added in v2

[jordan: Remove "2" value on PIPE_CAP_DOUBLES]
[jordan: Use lowering options added to nir options]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jordan Justen [Tue, 26 Feb 2019 07:26:16 +0000 (23:26 -0800)]

scons: Generate float64_glsl.h for glsl_to_nir fp64 lowering

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jordan Justen [Tue, 26 Feb 2019 01:17:29 +0000 (17:17 -0800)]

intel/compiler: Move int64/doubles lowering options

Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Jordan Justen [Tue, 26 Feb 2019 01:13:48 +0000 (17:13 -0800)]

nir: Add int64/doubles options into nir_shader_compiler_options

This will allow the options to be visible under nir_shader->options,
which will allow the gallium state_tracker to use the driver preferred
settings during glsl_to_nir.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Ian Romanick [Fri, 1 Mar 2019 22:41:59 +0000 (14:41 -0800)]

nir/algebraic: Optimize away an fsat of a b2f

The b2f can only produce 0.0 or 1.0, so the fsat does nothing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Ian Romanick [Fri, 1 Mar 2019 22:39:14 +0000 (14:39 -0800)]

intel/fs: Don't assert on b2f with a saturate modifier

This ran afoul of Iris's use of nir_lower_clamp_color_outputs which
applies fsat() before writes to vertex shader color outpus.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7725d609387 ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")

commit | commitdiff | tree

Lionel Landwerlin [Sat, 23 Feb 2019 23:27:17 +0000 (23:27 +0000)]

anv: add support for INTEL_DEBUG=bat

As requested by Ken ;)

v2: Also decode simple batches (Caio)
Fix u_vector usage issues (Lionel)

v3: Make binding/instruction/state/surface available (Lionel)

v4: Going through device pools for simple batches (Lionel)
Centralize search BO callbacks into anv_device.c (Lionel)

v5: Clear decoded batch buffer var after use (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

commit | commitdiff | tree

Eric Anholt [Fri, 1 Mar 2019 20:48:51 +0000 (12:48 -0800)]

v3d: Fix build of NEON code with Mesa's cflags not targeting NEON.

v3d may be built as part of a set of drivers in a system not requiring
NEON, but we know V3D devices will be paired with CPUs with NEON so we
should be able to use this asm.

Fixes: 0c05198d6b5b ("v3d: Always enable the NEON utile load/store code.")

commit | commitdiff | tree

Matt Turner [Thu, 15 Feb 2018 22:43:30 +0000 (14:43 -0800)]

intel/compiler: Add commas on final values of compaction table arrays

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

commit | commitdiff | tree

Ian Romanick [Sat, 23 Feb 2019 00:47:06 +0000 (16:47 -0800)]

nir/algebraic: Replace a-fract(a) with floor(a)

I noticed this while looking at a shader that was affected by Tim's
"more loop unrolling" series.

In review, Tim Arceri asked:
> Why the hurt on Gen6+ is this something that should be in the late
> optimisations pass?

As far as I can tell, it's just because our scheduler is terrible.  In
all the fragment shaders that I looked at (some hurt shaders were from
other stages), only one of the SIMD8 or SIMD16 version would be hurt.
In many of those case, the other SIMD width is improved (e.g.,
shaders/closed/steam/brutal-legend/3990.shader_test).

Often it looks like the scheduler decides to differently schedule a SEND
the occurs somewhere early in the shader.  Once that happens, everything
is different.

I looked at one vertex shader that was hurt (from Goat Simulator).  In
that case, both the floor and fract are used.  The optimization
eliminates the add, and it should allow better scheduling.  In the area
of the FRC and RNDD instructions, the scheduler does the right thing.
However, later in the shader a MAD and and ADD get scheduled
differently, and that makes it slightly worse.

In light of this, I tried adding some "is_used_once" mark-up, and that
did not fix all the cycles regressions.  It also did a lot more harm
than good on SKL (helped 82 vs. hurt 241).

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437001 -> 15435259 (-0.01%)
instructions in affected programs: 213651 -> 211909 (-0.82%)
helped: 988
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1
helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59%
95% mean confidence interval for instructions value: -1.89 -1.63
95% mean confidence interval for instructions %-change: -1.23% -1.05%
Instructions are helped.

total cycles in shared programs: 383007378 -> 382997063 (<.01%)
cycles in affected programs: 1650825 -> 1640510 (-0.62%)
helped: 679
HURT: 302
helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14
helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98%
HURT stats (abs)   min: 1 max: 250 x̄: 18.43 x̃: 7
HURT stats (rel)   min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53%
95% mean confidence interval for cycles value: -13.05 -7.98
95% mean confidence interval for cycles %-change: -0.86% -0.50%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5043616 -> 5043010 (-0.01%)
instructions in affected programs: 119691 -> 119085 (-0.51%)
helped: 432
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39%
95% mean confidence interval for instructions value: -1.58 -1.23
95% mean confidence interval for instructions %-change: -0.72% -0.59%
Instructions are helped.

total cycles in shared programs: 128139812 -> 128135762 (<.01%)
cycles in affected programs: 3829724 -> 3825674 (-0.11%)
helped: 602
HURT: 0
helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6
helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10%
95% mean confidence interval for cycles value: -8.40 -5.05
95% mean confidence interval for cycles %-change: -0.22% -0.16%
Cycles are helped.

Reviewed-by: Elie Tournier <tournier.elie@gmail.com>

commit | commitdiff | tree

Ian Romanick [Mon, 3 Dec 2018 20:06:50 +0000 (12:06 -0800)]

intel/fs: Generate if instructions with inverted conditions

Per-platform results were all over the place, so I have included all the
results here.  There is an important note at the bottom of the commit
message.

Skylake
total instructions in shared programs: 15184683 -> 15184679 (<.01%)
instructions in affected programs: 2786 -> 2782 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 370961367 -> 370961173 (<.01%)
cycles in affected programs: 205867 -> 205673 (-0.09%)
helped: 5
HURT: 1
helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16
helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -93.01 28.34
95% mean confidence interval for cycles %-change: -0.82% 0.08%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 15465366 -> 15465362 (<.01%)
instructions in affected programs: 2799 -> 2795 (-0.14%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.96% 0.07%
Inconclusive result (%-change mean confidence interval includes 0).

total cycles in shared programs: 410938419 -> 410938531 (<.01%)
cycles in affected programs: 566028 -> 566140 (0.02%)
helped: 18
HURT: 17
helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1
helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01%
HURT stats (abs)   min: 1 max: 12 x̄: 10.29 x̃: 12
HURT stats (rel)   min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09%
95% mean confidence interval for cycles value: 0.31 6.09
95% mean confidence interval for cycles %-change: -0.10% 0.05%
Inconclusive result (%-change mean confidence interval includes 0).

Haswell
total instructions in shared programs: 13749760 -> 13749759 (<.01%)
instructions in affected programs: 2241 -> 2240 (-0.04%)
helped: 1
HURT: 0

total cycles in shared programs: 385398913 -> 385398363 (<.01%)
cycles in affected programs: 554914 -> 554364 (-0.10%)
helped: 31
HURT: 1
helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6
helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05%
HURT stats (abs)   min: 8 max: 8 x̄: 8.00 x̃: 8
HURT stats (rel)   min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for cycles value: -45.88 11.51
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 180663626 -> 180663881 (<.01%)
cycles in affected programs: 472350 -> 472605 (0.05%)
helped: 15
HURT: 30
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 8 max: 10 x̄: 9.00 x̃: 9
HURT stats (rel)   min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10%
95% mean confidence interval for cycles value: 4.21 7.12
95% mean confidence interval for cycles %-change: 0.05% 0.08%
Cycles are HURT.

Sandy Bridge
total cycles in shared programs: 154568664 -> 154569225 (<.01%)
cycles in affected programs: 356486 -> 357047 (0.16%)
helped: 1
HURT: 31
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
HURT stats (abs)   min: 4 max: 33 x̄: 18.16 x̃: 8
HURT stats (rel)   min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10%
95% mean confidence interval for cycles value: 12.19 22.87
95% mean confidence interval for cycles %-change: 0.10% 0.16%
Cycles are HURT.

Iron Lake
total instructions in shared programs: 8206589 -> 8206565 (<.01%)
instructions in affected programs: 3024 -> 3000 (-0.79%)
helped: 12
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.82% -0.77%
Instructions are helped.

total cycles in shared programs: 187657428 -> 187656228 (<.01%)
cycles in affected programs: 95748 -> 94548 (-1.25%)
helped: 12
HURT: 0
helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100
helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21%
95% mean confidence interval for cycles value: -113.27 -86.73
95% mean confidence interval for cycles %-change: -1.43% -1.11%
Cycles are helped.

GM45
total instructions in shared programs: 5037569 -> 5037557 (<.01%)
instructions in affected programs: 1521 -> 1509 (-0.79%)
helped: 6
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.83% -0.75%
Instructions are helped.

total cycles in shared programs: 128101478 -> 128100758 (<.01%)
cycles in affected programs: 52746 -> 52026 (-1.37%)
helped: 6
HURT: 0
helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120
helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41%
95% mean confidence interval for cycles value: -120.00 -120.00
95% mean confidence interval for cycles %-change: -1.70% -1.12%
Cycles are helped.

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f
with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071022 -> 15089710 (0.12%)
instructions in affected programs: 1022219 -> 1040907 (1.83%)
helped: 1
HURT: 3937
helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41
helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01%
HURT stats (abs)   min: 1 max: 256 x̄: 4.76 x̃: 4
HURT stats (rel)   min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60%
95% mean confidence interval for instructions value: 4.56 4.93
95% mean confidence interval for instructions %-change: 2.54% 2.64%
Instructions are HURT.

total cycles in shared programs: 369777134 -> 370092923 (0.09%)
cycles in affected programs: 17516573 -> 17832362 (1.80%)
helped: 115
HURT: 3624
helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28
helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65%
HURT stats (abs)   min: 1 max: 12640 x̄: 89.71 x̃: 54
HURT stats (rel)   min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52%
95% mean confidence interval for cycles value: 75.21 93.71
95% mean confidence interval for cycles %-change: 4.43% 4.64%
Cycles are HURT.

total spills in shared programs: 9450 -> 9442 (-0.08%)
spills in affected programs: 166 -> 158 (-4.82%)
helped: 2
HURT: 0

total fills in shared programs: 21115 -> 21094 (-0.10%)
fills in affected programs: 438 -> 417 (-4.79%)
helped: 2
HURT: 0

LOST:   1
GAINED: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Ian Romanick [Mon, 3 Dec 2018 22:41:07 +0000 (14:41 -0800)]

nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b))

I have not investigated the result of doing this during code
generation.  That should be possible, but it would be a bit more
effort.

All Gen6+ platforms had nearly identical results. (Skylake shown)
total cycles in shared programs: 370961508 -> 370961367 (<.01%)
cycles in affected programs: 5174 -> 5033 (-2.73%)
helped: 2
HURT: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8206587 -> 8206589 (<.01%)
instructions in affected programs: 1325 -> 1327 (0.15%)
helped: 0
HURT: 2

total cycles in shared programs: 187657422 -> 187657428 (<.01%)
cycles in affected programs: 11566 -> 11572 (0.05%)
helped: 0
HURT: 2

This change has almost no effect right now.  However, removing this
patch (but leaving the patch "intel/fs: Generate if instructions with
inverted conditions") after adding a patch that removes !(a < b) -> (a
>= b) optimizations (like
https://patchwork.freedesktop.org/patch/264787/) has the following
results on Skylake:

Skylake
total instructions in shared programs: 15071804 -> 15071806 (<.01%)
instructions in affected programs: 640 -> 642 (0.31%)
helped: 0
HURT: 2

total cycles in shared programs: 369914348 -> 369916569 (<.01%)
cycles in affected programs: 27900 -> 30121 (7.96%)
helped: 4
HURT: 15
helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3
helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40%
HURT stats (abs)   min: 2 max: 758 x̄: 156.07 x̃: 81
HURT stats (rel)   min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91%
95% mean confidence interval for cycles value: 12.68 221.11
95% mean confidence interval for cycles %-change: 3.09% 21.23%
Cycles are HURT.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Ian Romanick [Mon, 3 Dec 2018 23:53:36 +0000 (15:53 -0800)]

intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))

Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a))
maps -1 => 0.0 and 0 => 1.0.  This is equivalent to 1.0 +
float(boolBitsToInt(a)).  On Intel GPUs, ADD is one of the few
instructions that can type-convert during write to destination, so we
can achieve this in a single instruction:

    add    g47F, g26D, 1D

v2: Fix swizzles.

v3: Fix typos in comments.  Noticed by Ken.

All Gen6+ platforms had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185583 -> 15184683 (<.01%)
instructions in affected programs: 239389 -> 238489 (-0.38%)
helped: 899
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09%
95% mean confidence interval for instructions value: -1.01 -0.99
95% mean confidence interval for instructions %-change: -0.51% -0.48%
Instructions are helped.

total cycles in shared programs: 370964249 -> 370961508 (<.01%)
cycles in affected programs: 1487586 -> 1484845 (-0.18%)
helped: 420
HURT: 268
helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6
helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41%
HURT stats (abs)   min: 1 max: 230 x̄: 24.90 x̃: 10
HURT stats (rel)   min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52%
95% mean confidence interval for cycles value: -7.61 -0.36
95% mean confidence interval for cycles %-change: -0.44% -0.02%
Cycles are helped.

No changes on Iron Lake or GM45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Ian Romanick [Thu, 9 Feb 2017 15:21:47 +0000 (15:21 +0000)]

intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+

Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of
operands is free on Gen8+.

v2: Fix swizzles.  Fix types for cmod propagation.

v3: Simplify logic for inverting source of inot(ixor(a, b)).  Suggested
by Ken.

Skylake and Broadwell had similar results. (Skylake shown)
Skylake
total instructions in shared programs: 15185593 -> 15185583 (<.01%)
instructions in affected programs: 5673 -> 5663 (-0.18%)
helped: 12
HURT: 1
helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1
helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
95% mean confidence interval for instructions value: -1.66 0.13
95% mean confidence interval for instructions %-change: -2.60% -0.15%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 370977726 -> 370964249 (<.01%)
cycles in affected programs: 869987 -> 856510 (-1.55%)
helped: 15
HURT: 2
helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16
helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53%
HURT stats (abs)   min: 14 max: 42 x̄: 28.00 x̃: 28
HURT stats (rel)   min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13%
95% mean confidence interval for cycles value: -1654.87 69.34
95% mean confidence interval for cycles %-change: -2.29% -0.23%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Ian Romanick [Thu, 9 Feb 2017 15:20:04 +0000 (15:20 +0000)]

intel/fs: Emit logical-not of operands on Gen8+

On Gen8+ specifying negation of a logical operation such as AND actually
performs a logical-not.  Take advantage of this to generate fewer
instructions.

v2: Major rebase.  Use nir_src_as_alu_instr.  Fix swizzle handling.

No changes on any pre-Gen8 platform.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15466902 -> 15466274 (<.01%)
instructions in affected programs: 1262953 -> 1262325 (-0.05%)
helped: 682
HURT: 4
helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04%
HURT stats (abs)   min: 1 max: 62 x̄: 17.50 x̃: 3
HURT stats (rel)   min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10%
95% mean confidence interval for instructions value: -1.10 -0.73
95% mean confidence interval for instructions %-change: -0.19% -0.15%
Instructions are helped.

total cycles in shared programs: 410996093 -> 410950440 (-0.01%)
cycles in affected programs: 144389048 -> 144343395 (-0.03%)
helped: 519
HURT: 51
helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140
helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03%
HURT stats (abs)   min: 1 max: 4060 x̄: 167.90 x̃: 22
HURT stats (rel)   min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25%
95% mean confidence interval for cycles value: -97.16 -63.02
95% mean confidence interval for cycles %-change: -0.32% -0.13%
Cycles are helped.

total spills in shared programs: 95311 -> 95329 (0.02%)
spills in affected programs: 881 -> 899 (2.04%)
helped: 0
HURT: 4

total fills in shared programs: 93629 -> 93634 (<.01%)
fills in affected programs: 794 -> 799 (0.63%)
helped: 1
HURT: 2

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Ian Romanick [Wed, 5 Dec 2018 19:35:37 +0000 (11:35 -0800)]

intel/fs: Refactor ALU source and destination handling to a separate function

Other places will need to do this soon to properly handle source
swizzles.  The patch looks a little odd, but the change is pretty
straight forward.  All of the swizzle and mask handling is moved out,
but the code for handling move instructions and vecN instructions
remains in nir_emit_alu.

I'm not terribly pleased with the "need_dest" parameter, but
get_nir_dest is (somewhat surprisingly) destructive.  I am open to
suggestions of alternatives.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>