mesa.git
5 years agoradv: Don't optimize after lowering FS inputs
Connor Abbott [Tue, 9 Jul 2019 11:43:13 +0000 (13:43 +0200)]
radv: Don't optimize after lowering FS inputs

Currently this is done rather late in radv, after lowering booleans, so
it isn't safe to run additional optimizations that may add e.g. 1-bit
booleans. We could move the lowering parts earlier, but since right now
we only lower FS inputs and by this point all indirects have been
lowered away, there's no reason we should need to optimize anything.

One shader from Devil May Cry 5 was getting optimized, but only because
the optimization loop was working on 32-bit booleans which revealed an
opportunity that was hidden with 1-bit booleans, and we generated a
1-bit boolean which is invalid.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111092
Fixes: 118a66df9907772bb9e5503b736c95d7bb62d52c
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoandroid: amd/addrlib: add gfx10 support
Mauro Rossi [Sat, 6 Jul 2019 19:16:04 +0000 (21:16 +0200)]
android: amd/addrlib: add gfx10 support

Fix the following build error:

external/mesa/src/amd/addrlib/src/gfx10/gfx10addrlib.cpp:35:10:
fatal error: 'gfx10_gb_reg.h' file not found
         ^~~~~~~~~~~~~~~~
1 error generated.

Fixes: 78cdf9a ("amd/addrlib: add gfx10 support")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
5 years agoandroid: amd/common/gfx10: add register JSON
Mauro Rossi [Sat, 6 Jul 2019 18:21:31 +0000 (20:21 +0200)]
android: amd/common/gfx10: add register JSON

The necessary Android makefile build rules are added
and the generation rules are simplified for readability.

Fixes the following build errors:

external/mesa/src/amd/common/ac_llvm_build.c:1496:45:
error: use of undeclared identifier 'V_008F0C_IMG_FORMAT_8_UINT'
   case V_008F0C_BUF_DATA_FORMAT_8: format = V_008F0C_IMG_FORMAT_8_UINT; break;
                                             ^
Fixes: 74a26af ("amd/common/gfx10: add register JSON")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
5 years agoandroid: radeonsi/gfx10: generate gfx10_format_table.h (v2)
Mauro Rossi [Sat, 6 Jul 2019 19:27:21 +0000 (21:27 +0200)]
android: radeonsi/gfx10: generate gfx10_format_table.h (v2)

Fix Android building rules for gfx10_format_table.h generated header

(v2) Add LOCAL_C_INCLUDES += $(intermediates)/radeonsi to fix error:

external/mesa/src/gallium/drivers/radeonsi/si_state.c:46:10:
fatal error: 'gfx10_format_table.h' file not found
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 0ffa229 ("radeonsi/gfx10: generate gfx10_format_table.h")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
5 years agoandroid: virgl: remove unnecessary LOCAL_C_INCLUDES
Chih-Wei Huang [Mon, 8 Jul 2019 02:16:18 +0000 (10:16 +0800)]
android: virgl: remove unnecessary LOCAL_C_INCLUDES

The path could be imported automatically.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
5 years agoandroid: vulkan/util: fix generating vk_enum_to_str.*
Chih-Wei Huang [Fri, 5 Jul 2019 08:35:19 +0000 (16:35 +0800)]
android: vulkan/util: fix generating vk_enum_to_str.*

gen_enum_to_str.py generates vk_enum_to_str.c and its header in a single run.
However, the makefiles incorrectly list both files as parallel targets of the
same recipe. That means the two files may be generated simultaneously by two
processes, and a file being generated may be truncated by the other process,
as shown below:

$ cd $OUT/obj/STATIC_LIBRARIES/libmesa_vulkan_util_intermediates/util
$ ls -l

-rw-rw-r-- 1 lh lh 193713 Jul  5 13:31 vk_enum_to_str.c
-rw-rw-r-- 1 lh lh   4609 Jul  5 13:31 vk_enum_to_str.d
-rw-rw-r-- 1 lh lh      0 Jul  5 16:21 vk_enum_to_str.h

Make one file depend on the other with an empty recipe to avoid the issue.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoandroid: radv: import include paths from used libraries
Chih-Wei Huang [Tue, 25 Jun 2019 09:11:12 +0000 (17:11 +0800)]
android: radv: import include paths from used libraries

It's unnecessary to manually add these include paths since they could
be imported automatically.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoandroid: anv: import include path of libmesa_nir
Chih-Wei Huang [Thu, 20 Jun 2019 10:14:35 +0000 (18:14 +0800)]
android: anv: import include path of libmesa_nir

Add libmesa_nir to a common LOCAL_STATIC_LIBRARIES defined by
ANV_STATIC_LIBRARIES so that its include path can be imported
automatically. Then ANV_INCLUDES is unnecessary and could be
eliminated.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoandroid: anv: eliminate libmesa_anv_entrypoints
Chih-Wei Huang [Thu, 20 Jun 2019 10:13:36 +0000 (18:13 +0800)]
android: anv: eliminate libmesa_anv_entrypoints

The dummy library libmesa_anv_entrypoints is totally unnecessary.
The four VULKAN_GENERATED_FILES could be generated and built in
libmesa_vulkan_common directly. The libraries using the generated
headers should get them via the exported include path.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoandroid: vulkan/util: fix export path
Chih-Wei Huang [Thu, 20 Jun 2019 07:51:44 +0000 (15:51 +0800)]
android: vulkan/util: fix export path

Export the correct include path so that the libraries that use it can
get it automatically.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoandroid: radv: fix improper use of LOCAL_WHOLE_STATIC_LIBRARIES
Chih-Wei Huang [Thu, 20 Jun 2019 07:46:22 +0000 (15:46 +0800)]
android: radv: fix improper use of LOCAL_WHOLE_STATIC_LIBRARIES

The libmesa_git_sha1 is a dummy library. There is no reason to put
it into LOCAL_WHOLE_STATIC_LIBRARIES.

Move libmesa_vulkan_util to the vulkan.radv which really needs it.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoandroid: anv: fix improper use of LOCAL_WHOLE_STATIC_LIBRARIES
Chih-Wei Huang [Thu, 20 Jun 2019 07:45:03 +0000 (15:45 +0800)]
android: anv: fix improper use of LOCAL_WHOLE_STATIC_LIBRARIES

The libmesa_anv_entrypoints and libmesa_genxml are dummy libraries.
There is no reason to put them into LOCAL_WHOLE_STATIC_LIBRARIES.

Move libmesa_vulkan_util to the vulkan HAL which really needs it.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoandroid: radv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS
Chih-Wei Huang [Wed, 19 Jun 2019 08:48:52 +0000 (16:48 +0800)]
android: radv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS

The vulkan module is the final HAL. No need to export its headers
since none will import it.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoandroid: anv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS
Chih-Wei Huang [Wed, 19 Jun 2019 08:16:27 +0000 (16:16 +0800)]
android: anv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS

The vulkan module is the final HAL. No need to export its headers
since none will import it.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
5 years agonir/loop_analyze: Pass nir_const_values directly to helpers
Jason Ekstrand [Tue, 25 Jun 2019 01:27:26 +0000 (20:27 -0500)]
nir/loop_analyze: Pass nir_const_values directly to helpers

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/loop_analyze: Properly handle swizzles in loop conditions
Jason Ekstrand [Fri, 21 Jun 2019 14:18:16 +0000 (09:18 -0500)]
nir/loop_analyze: Properly handle swizzles in loop conditions

This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles.  Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition.  The old code would just bail the moment it
saw its first non-zero swizzle but we can now properly chase the scalar
from the if condition all the way to a, b, and c.

Shader-db results on Kaby Lake:

    total loops in shared programs: 4388 -> 4364 (-0.55%)
    loops in affected programs: 29 -> 5 (-82.76%)
    helped: 29
    HURT: 5

Shader-db results on Haswell:

    total loops in shared programs: 4370 -> 4373 (0.07%)
    loops in affected programs: 2 -> 5 (150.00%)
    helped: 2
    HURT: 5

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
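
The swizzle chase described in this commit can be illustrated with a minimal sketch. It is not the NIR implementation: `struct node`, `struct src`, `struct scalar`, and `chase_src` are hypothetical names, and every op is assumed to be per-component, so result component i reads component `swizzle[i]` of each source.

```c
#include <assert.h>

struct node;

/* A source reads another node through a per-component swizzle. */
struct src {
    const struct node *def;
    unsigned char swizzle[4];
};

/* Hypothetical IR node: a leaf value (num_srcs == 0) or a per-component op. */
struct node {
    const char *name;
    unsigned num_srcs;
    struct src srcs[2];
};

/* A single component of a single node. */
struct scalar {
    const struct node *def;
    unsigned comp;
};

/* Step from one result component to the source component that feeds it. */
struct scalar chase_src(struct scalar s, unsigned idx)
{
    assert(idx < s.def->num_srcs && s.comp < 4);
    const struct src *src = &s.def->srcs[idx];
    struct scalar out = { src->def, src->swizzle[s.comp] };
    return out;
}
```

Under these assumptions, chasing the `.y` component of `((a.yzw < b.zzx).xz && c.xx)` leads to `a.w`, `b.x`, and `c.x`.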
5 years agonir/loop_analyze: Refactor detection of limit vars
Jason Ekstrand [Mon, 24 Jun 2019 22:33:02 +0000 (17:33 -0500)]
nir/loop_analyze: Refactor detection of limit vars

This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure.  This makes their callers
significantly simpler.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
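
The calling convention this commit describes (return true on success, leave the output parameters alone on failure) might look like the sketch below. `struct operand` and `find_induction_and_limit` are hypothetical stand-ins, not the functions named in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comparison operand: either the loop counter or something else. */
struct operand {
    bool is_induction;
    int value;
};

/*
 * Returns true and fills both outputs when exactly one operand is an
 * induction variable. On failure it returns false and leaves *induction
 * and *limit untouched, so callers need no cleanup or re-initialization.
 */
bool find_induction_and_limit(const struct operand *lhs,
                              const struct operand *rhs,
                              const struct operand **induction,
                              const struct operand **limit)
{
    assert(lhs && rhs && induction && limit);
    if (lhs->is_induction == rhs->is_induction)
        return false;
    *induction = lhs->is_induction ? lhs : rhs;
    *limit = lhs->is_induction ? rhs : lhs;
    return true;
}
```

A caller can then write `if (!find_induction_and_limit(...)) return;` without first checking or resetting the outputs.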
5 years agonir: Add some helpers for chasing SSA values properly
Jason Ekstrand [Thu, 20 Jun 2019 16:12:54 +0000 (11:12 -0500)]
nir: Add some helpers for chasing SSA values properly

There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code.  In all these cases, it can be very tricky to do properly because
of swizzles.  This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/loop_analyze: Bail if we encounter swizzles
Jason Ekstrand [Mon, 24 Jun 2019 21:21:07 +0000 (16:21 -0500)]
nir/loop_analyze: Bail if we encounter swizzles

None of the current code knows what to do with swizzles.  Take the safe
option for now and bail if we see one.  This does have a small shader-db
impact but it is at least safe.

Shader-db results on Kaby Lake:

    total loops in shared programs: 4364 -> 4388 (0.55%)
    loops in affected programs: 5 -> 29 (480.00%)
    helped: 5
    HURT: 29

Shader-db results on Haswell:

    total loops in shared programs: 4373 -> 4370 (-0.07%)
    loops in affected programs: 5 -> 2 (-60.00%)
    helped: 5
    HURT: 2

Fixes: 6772a17acc8ee "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/loop_analyze: Use new eval_const_* helpers in test_iterations
Jason Ekstrand [Thu, 20 Jun 2019 21:29:30 +0000 (16:29 -0500)]
nir/loop_analyze: Use new eval_const_* helpers in test_iterations

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/loop_analyze: Handle bit sizes correctly in calculate_iterations
Jason Ekstrand [Thu, 20 Jun 2019 21:26:19 +0000 (16:26 -0500)]
nir/loop_analyze: Handle bit sizes correctly in calculate_iterations

The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means.  Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way.  We also use the new
constant constructors to build the correct size constants.

Fixes: 6772a17acc8ee "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
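
To illustrate why the bit size matters, here is a small sketch of integer arithmetic that wraps at an explicit bit size. It does not use nir_eval_const_opcode; `wrap_to_bit_size`, `sign_extend`, and `iadd_sized` are hypothetical helpers, and two's complement integers are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low bit_size bits of value. */
uint64_t wrap_to_bit_size(uint64_t value, unsigned bit_size)
{
    assert(bit_size == 8 || bit_size == 16 || bit_size == 32 || bit_size == 64);
    return bit_size == 64 ? value : value & ((UINT64_C(1) << bit_size) - 1);
}

/* Interpret the low bit_size bits of value as a signed integer. */
int64_t sign_extend(uint64_t value, unsigned bit_size)
{
    uint64_t wrapped = wrap_to_bit_size(value, bit_size);
    uint64_t sign = UINT64_C(1) << (bit_size - 1);
    return (int64_t)((wrapped ^ sign) - sign);
}

/* Integer add that wraps at the given bit size instead of assuming 32 bits. */
uint64_t iadd_sized(uint64_t a, uint64_t b, unsigned bit_size)
{
    return wrap_to_bit_size(a + b, bit_size);
}
```

With these helpers, adding 10 to 250 gives 4 at 8 bits but 260 at 32 bits, which is the kind of difference a 32-bit-only iteration count calculation would miss.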
5 years agonir/loop_analyze: Fix phi-of-identical-alu detection
Jason Ekstrand [Thu, 20 Jun 2019 21:13:39 +0000 (16:13 -0500)]
nir/loop_analyze: Fix phi-of-identical-alu detection

One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions.  Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true.  If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.

Fixes: 9e6b39e1d521 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/instr_set: Expose nir_instrs_equal()
Jason Ekstrand [Thu, 20 Jun 2019 18:47:30 +0000 (13:47 -0500)]
nir/instr_set: Expose nir_instrs_equal()

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir/builder: Use nir_const_value_for_* for constructing immediates
Jason Ekstrand [Mon, 24 Jun 2019 23:23:29 +0000 (18:23 -0500)]
nir/builder: Use nir_const_value_for_* for constructing immediates

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agonir: Refactor nir_src_as_* constant functions
Jason Ekstrand [Wed, 26 Jun 2019 01:33:46 +0000 (20:33 -0500)]
nir: Refactor nir_src_as_* constant functions

Now that we have the nir_const_value_as_* helpers, every one of these
functions is effectively the same except for the suffix they use so we
can easily define them with a repeated macro.  This also means that
they're inline and the fact that the nir_src is being passed by-value
should no longer really hurt anything.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
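
The "repeated macro" idea can be sketched as follows. The types and names here (`union const_value`, `struct src`, `DEFINE_SRC_AS`, `src_as_*`) are hypothetical and much simpler than the NIR ones; the point is only that each accessor differs in its suffix, return type, and field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: one storage slot viewed at several types. */
union const_value {
    int64_t i64;
    uint64_t u64;
    double f64;
    bool b;
};

/* Hypothetical source that may or may not hold a constant. */
struct src {
    bool is_const;
    union const_value value;
};

/* Stamp out one inline accessor per (suffix, return type, union field). */
#define DEFINE_SRC_AS(suffix, type, field)               \
    static inline type src_as_##suffix(struct src s)     \
    {                                                    \
        assert(s.is_const);                              \
        return s.value.field;                            \
    }

DEFINE_SRC_AS(int, int64_t, i64)
DEFINE_SRC_AS(uint, uint64_t, u64)
DEFINE_SRC_AS(float, double, f64)
DEFINE_SRC_AS(bool, bool, b)
```

Because the generated functions are `static inline`, passing the small struct by value costs little once the compiler inlines the call.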
5 years agonir: Add more helpers for working with const values
Jason Ekstrand [Thu, 20 Jun 2019 15:36:10 +0000 (10:36 -0500)]
nir: Add more helpers for working with const values

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agovirgl: remove virgl_transfer_queue_lists
Chia-I Wu [Mon, 8 Jul 2019 23:45:36 +0000 (16:45 -0700)]
virgl: remove virgl_transfer_queue_lists

COMPLETED_LIST is always empty.  We only need one list.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
5 years agovirgl: simplify virgl_transfer_queue_extend
Chia-I Wu [Mon, 8 Jul 2019 23:31:46 +0000 (16:31 -0700)]
virgl: simplify virgl_transfer_queue_extend

We can reuse virgl_transfer_queue_find_pending.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
5 years agovirgl: remove transfer after transfer_write
Chia-I Wu [Mon, 8 Jul 2019 21:35:27 +0000 (14:35 -0700)]
virgl: remove transfer after transfer_write

Now that virgl_transfer_queue_is_queued does not search
COMPLETED_LIST, we don't need to move transfers to that list.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
5 years agovirgl: improve virgl_transfer_queue_is_queued
Chia-I Wu [Mon, 8 Jul 2019 23:34:32 +0000 (16:34 -0700)]
virgl: improve virgl_transfer_queue_is_queued

Search only the pending list and return immediately on the first
hit.

When the transfer queue was introduced, the function was used to
deal with

  write transfer -> draw -> write transfer

sequence.  It was used to tell if the second transfer intersects
with the first transfer. If yes, the transfer queue avoided
reordering the second transfer to before the draw (by flushing) in
case the draw uses the transferred data.

With the recent changes to the transfer code, the function is used
to deal with

  write transfer -> readback transfer

We want to avoid reordering the readback transfer to before the
first transfer (also by flushing).

In the old code, we needed to track the completed transfers as well
to avoid reordering.  But in the new code, a readback transfer is
guaranteed to see the data from the completed transfers (in other
words, it cannot be reordered to before the already completed
transfers).  We don't need to search the COMPLETED_LIST.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
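
A minimal sketch of "search only the pending list and return on the first hit", assuming buffer transfers described by a byte range. `struct transfer` and `is_queued` are hypothetical and do not mirror the virgl data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer transfer: a byte range within one resource. */
struct transfer {
    unsigned resource;
    unsigned offset;
    unsigned size;
};

/*
 * Returns true as soon as one pending transfer touches the same resource
 * and overlaps the byte range of t. Completed transfers are not consulted.
 */
bool is_queued(const struct transfer *pending, size_t count,
               const struct transfer *t)
{
    assert(t != NULL && (pending != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        if (pending[i].resource == t->resource &&
            pending[i].offset < t->offset + t->size &&
            t->offset < pending[i].offset + pending[i].size)
            return true;
    }
    return false;
}
```

When this returns true for a readback, the caller would flush before reading so the readback is not reordered ahead of the queued write.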
5 years agovirgl: fix transfers_intersect for mipmaps
Chia-I Wu [Mon, 8 Jul 2019 23:20:01 +0000 (16:20 -0700)]
virgl: fix transfers_intersect for mipmaps

We never use transfers_intersect with textures, but fix it anyway to
avoid confusion.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
5 years agovirgl: fix some false positives in transfers_overlap
Chia-I Wu [Mon, 8 Jul 2019 23:12:29 +0000 (16:12 -0700)]
virgl: fix some false positives in transfers_overlap

Rewrite the function and check z/depth more carefully.  We
intentionally avoid u_box_test_intersection_2d because it returns
true when two boxes touch but do not intersect and can be confusing.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
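
The distinction between boxes that touch and boxes that intersect can be shown with half-open ranges. This is a sketch under assumptions, not the rewritten virgl function: `struct box3d`, `ranges_overlap`, and `boxes_overlap` are hypothetical, and all extents are assumed to be positive.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 3D box: origin plus a positive extent on each axis. */
struct box3d {
    int x, y, z;
    int width, height, depth;
};

/* True when [a, a + a_len) and [b, b + b_len) share at least one element. */
bool ranges_overlap(int a, int a_len, int b, int b_len)
{
    assert(a_len > 0 && b_len > 0);
    return a < b + b_len && b < a + a_len;
}

/* Boxes overlap only if their ranges overlap on every axis, including z. */
bool boxes_overlap(const struct box3d *a, const struct box3d *b)
{
    return ranges_overlap(a->x, a->width, b->x, b->width) &&
           ranges_overlap(a->y, a->height, b->y, b->height) &&
           ranges_overlap(a->z, a->depth, b->z, b->depth);
}
```

With strict comparisons, two boxes that merely share an edge or face are reported as not overlapping, which avoids the false positive mentioned above.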
5 years agoradeonsi/gfx10: enable primitive binning by default
Marek Olšák [Thu, 4 Jul 2019 02:24:36 +0000 (22:24 -0400)]
radeonsi/gfx10: enable primitive binning by default

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: implement primitive binning
Marek Olšák [Wed, 3 Jul 2019 02:34:42 +0000 (22:34 -0400)]
radeonsi/gfx10: implement primitive binning

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: simplify primitive binning enablement
Marek Olšák [Thu, 4 Jul 2019 02:27:12 +0000 (22:27 -0400)]
radeonsi: simplify primitive binning enablement

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: set primitive binning tunables for dGPUs
Marek Olšák [Thu, 4 Jul 2019 02:23:18 +0000 (22:23 -0400)]
radeonsi: set primitive binning tunables for dGPUs

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: set FLUSH_ON_BINNING_TRANSITION when needed
Marek Olšák [Thu, 4 Jul 2019 02:04:30 +0000 (22:04 -0400)]
radeonsi: set FLUSH_ON_BINNING_TRANSITION when needed

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: use the new scan converter when binning is disabled
Marek Olšák [Thu, 4 Jul 2019 01:57:43 +0000 (21:57 -0400)]
radeonsi/gfx10: use the new scan converter when binning is disabled

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx9: fix an oversight in primitive binning code
Marek Olšák [Wed, 3 Jul 2019 02:31:14 +0000 (22:31 -0400)]
radeonsi/gfx9: fix an oversight in primitive binning code

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: use BREAK_BATCH instead of FLUSH_DFSM when CB_TARGET_MASK changes
Marek Olšák [Thu, 4 Jul 2019 01:12:46 +0000 (21:12 -0400)]
radeonsi: use BREAK_BATCH instead of FLUSH_DFSM when CB_TARGET_MASK changes

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: don't expose unimplemented PIPE_CAP_QUERY_SO_OVERFLOW
Marek Olšák [Wed, 3 Jul 2019 04:22:29 +0000 (00:22 -0400)]
radeonsi/gfx10: don't expose unimplemented PIPE_CAP_QUERY_SO_OVERFLOW

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: launch 2 compute waves per CU before going onto the next CU
Marek Olšák [Thu, 4 Jul 2019 02:56:58 +0000 (22:56 -0400)]
radeonsi/gfx10: launch 2 compute waves per CU before going onto the next CU

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: set more registers and fields
Marek Olšák [Thu, 4 Jul 2019 03:01:25 +0000 (23:01 -0400)]
radeonsi/gfx10: set more registers and fields

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: enable LATE_ALLOC_GS
Marek Olšák [Wed, 3 Jul 2019 04:09:21 +0000 (00:09 -0400)]
radeonsi/gfx10: enable LATE_ALLOC_GS

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: set HS/GS/CS.WGP_MODE
Marek Olšák [Wed, 3 Jul 2019 03:35:05 +0000 (23:35 -0400)]
radeonsi/gfx10: set HS/GS/CS.WGP_MODE

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: set GE_PC_ALLOC
Marek Olšák [Wed, 3 Jul 2019 02:48:49 +0000 (22:48 -0400)]
radeonsi/gfx10: set GE_PC_ALLOC

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: enable 1D textures
Marek Olšák [Wed, 3 Jul 2019 01:40:49 +0000 (21:40 -0400)]
radeonsi/gfx10: enable 1D textures

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: enable image stores with DCC
Marek Olšák [Sat, 29 Jun 2019 03:48:14 +0000 (23:48 -0400)]
radeonsi/gfx10: enable image stores with DCC

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: no need to invalidate L2 for framebuffer -> texture coherency
Marek Olšák [Sat, 29 Jun 2019 00:31:41 +0000 (20:31 -0400)]
radeonsi/gfx10: no need to invalidate L2 for framebuffer -> texture coherency

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: support pixel shaders without exports
Marek Olšák [Thu, 27 Jun 2019 02:57:10 +0000 (22:57 -0400)]
radeonsi/gfx10: support pixel shaders without exports

It only works if there are no color and no Z exports.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi/gfx10: enable vertex shaders without param space allocation
Marek Olšák [Thu, 27 Jun 2019 03:13:00 +0000 (23:13 -0400)]
radeonsi/gfx10: enable vertex shaders without param space allocation

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: update DCC settings from PAL
Marek Olšák [Thu, 4 Jul 2019 01:55:07 +0000 (21:55 -0400)]
radeonsi: update DCC settings from PAL

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: reorder shader IO indices for better IO space usage for tess and GS
Marek Olšák [Thu, 4 Jul 2019 00:43:28 +0000 (20:43 -0400)]
radeonsi: reorder shader IO indices for better IO space usage for tess and GS

The highest used index determines the stride for shader outputs in shaders
that use LDS or memory for outputs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
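
As a rough illustration of why lower indices help, the sketch below derives a per-vertex stride from the highest used output index. The 16-byte slot size and the names `BYTES_PER_SLOT` and `output_stride_bytes` are assumptions made for this example, not values taken from radeonsi.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: each output slot is one vec4 of 32-bit values. */
#define BYTES_PER_SLOT 16u

/*
 * Stride needed when the highest used index decides the layout:
 * (highest set bit + 1) slots, regardless of how many bits are set.
 */
unsigned output_stride_bytes(uint64_t used_mask)
{
    unsigned slots = 0;
    while (used_mask) {
        slots++;
        used_mask >>= 1;
    }
    return slots * BYTES_PER_SLOT;
}
```

Under this model, two outputs at indices 0 and 42 need 688 bytes per vertex, while the same two outputs at indices 0 and 1 need 32.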
5 years agoradeonsi: decrease maximum supported GENERIC varying index from 42 to 31
Marek Olšák [Wed, 3 Jul 2019 23:05:19 +0000 (19:05 -0400)]
radeonsi: decrease maximum supported GENERIC varying index from 42 to 31

This can decrease LDS and/or memory usage for shader outputs when geometry
shaders or tessellation is used.

Only PS inputs support higher indices and those aren't eliminated by
kill_outputs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: cosmetic cleanup in si_shader_io_get_unique_index
Marek Olšák [Wed, 3 Jul 2019 23:04:37 +0000 (19:04 -0400)]
radeonsi: cosmetic cleanup in si_shader_io_get_unique_index

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: fix and clean up shader_type passing
Marek Olšák [Tue, 2 Jul 2019 22:43:40 +0000 (18:43 -0400)]
radeonsi: fix and clean up shader_type passing

- don't pass it via a parameter if it can be derived from other parameters
- set shader_type for ac_rtld_open
- use enum pipe_shader_type instead of unsigned

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: enable RB+ for pixel shaders with no/non-contiguous color outputs
Marek Olšák [Thu, 27 Jun 2019 02:44:06 +0000 (22:44 -0400)]
radeonsi: enable RB+ for pixel shaders with no/non-contiguous color outputs

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: don't set READ_ONLY for const_uploader to fix bindless texture hangs
Marek Olšák [Tue, 25 Jun 2019 22:59:50 +0000 (18:59 -0400)]
radeonsi: don't set READ_ONLY for const_uploader to fix bindless texture hangs

Bindless textures can update descriptors with WRITE_DATA.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
5 years agogallium: Add util_format_is_unorm8 check
Alyssa Rosenzweig [Fri, 5 Jul 2019 15:40:22 +0000 (08:40 -0700)]
gallium: Add util_format_is_unorm8 check

Useful for formats that would work with the same driver code path as
RGBA8 UNORM but that don't meet the util_format_is_rgba8_variant
criteria due to a smaller channel count.

v2: Use simpler logic (suggested by Iago).

v3: Fix spelling error. boolean->bool (thank you airlied).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
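
One way such a check could work is sketched below. This is not the gallium implementation, and the commit does not show its logic; `struct format_desc`, `struct channel`, and `format_is_unorm8` are hypothetical. The sketch simply requires every present channel to be an 8-bit unsigned normalized value, whatever the channel count.

```c
#include <assert.h>
#include <stdbool.h>

enum channel_type {
    CHANNEL_VOID,
    CHANNEL_UNSIGNED,
    CHANNEL_SIGNED,
    CHANNEL_FLOAT
};

/* Hypothetical per-channel description. */
struct channel {
    enum channel_type type;
    bool normalized;
    unsigned size; /* in bits */
};

/* Hypothetical format description with one to four channels. */
struct format_desc {
    unsigned nr_channels;
    struct channel channel[4];
};

/* True when every channel is unsigned, normalized, and 8 bits wide. */
bool format_is_unorm8(const struct format_desc *desc)
{
    assert(desc->nr_channels >= 1 && desc->nr_channels <= 4);
    for (unsigned i = 0; i < desc->nr_channels; i++) {
        const struct channel *c = &desc->channel[i];
        if (c->type != CHANNEL_UNSIGNED || !c->normalized || c->size != 8)
            return false;
    }
    return true;
}
```

A two-channel 8-bit UNORM format passes this test even though it would not count as an RGBA8 variant.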
5 years agonir: Add Panfrost-specific blending intrinsic
Alyssa Rosenzweig [Mon, 1 Jul 2019 22:01:19 +0000 (15:01 -0700)]
nir: Add Panfrost-specific blending intrinsic

This gives more flexibility than the normal store_deref/store_output
versions (particularly, it allows us to abuse the type system in awful
ways, which is necessary for efficient format conversion in blend
shaders.)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
5 years agoradeonsi: Expose support for 10-bit VP9 decode
Pratik Vishwakarma [Tue, 9 Jul 2019 06:23:26 +0000 (11:53 +0530)]
radeonsi: Expose support for 10-bit VP9 decode

Fix si_vid_is_format_supported to expose support
for 10-bit VP9 decode using P016 format. Without
this change, 10-bit decode will be exposed only
for HEVC even though newer hardware supports
10-bit decode for VP9.

Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
5 years agonir: Add nir_imm_vec4_16
Alyssa Rosenzweig [Wed, 3 Jul 2019 20:00:14 +0000 (13:00 -0700)]
nir: Add nir_imm_vec4_16

We already have nir_imm_float16 and nir_imm_vec4; let's add the ability
to easily make immediate fp16 vectors as well, now that fp16 support is
maturing in NIR/GLSL.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonvc0: remove nvc0_program.tp.input_patch_size
Karol Herbst [Sun, 7 Jul 2019 19:27:47 +0000 (21:27 +0200)]
nvc0: remove nvc0_program.tp.input_patch_size

right now that's dead code

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agoradv: Add a common member in the union to make things more clear.
Bas Nieuwenhuizen [Tue, 9 Jul 2019 09:03:56 +0000 (11:03 +0200)]
radv: Add a common member in the union to make things more clear.

This clarifies that the struct can be used when the shader can be
one of VS/TES.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoRevert "radv: keep track of whether NGG is used for GS on GFX10"
Bas Nieuwenhuizen [Tue, 9 Jul 2019 09:00:33 +0000 (11:00 +0200)]
Revert "radv: keep track of whether NGG is used for GS on GFX10"

This reverts commit 63e0675d986744a9ed2d9a15b7cba84ff4a24fc2.

The GS is merged with the preceding shader and since the preceding
shader will have as_ngg set the final binary will have is_ngg set.

So we do not need the gs key here.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agodocs: update calendar, add news item and link release notes for 19.1.2
Juan A. Suarez Romero [Tue, 9 Jul 2019 09:22:13 +0000 (11:22 +0200)]
docs: update calendar, add news item and link release notes for 19.1.2

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
5 years agodocs: add sha256 checksums for 19.1.2
Juan A. Suarez Romero [Tue, 9 Jul 2019 09:18:55 +0000 (09:18 +0000)]
docs: add sha256 checksums for 19.1.2

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit e42399f4de80acb681b90ae4e35d8983b89d0329)

5 years agodocs: add release notes for 19.1.2
Juan A. Suarez Romero [Tue, 9 Jul 2019 09:09:53 +0000 (09:09 +0000)]
docs: add release notes for 19.1.2

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit fe1f7b538b7e8e4bd221c5d52ae72a3721c6aa08)

5 years agonir/lower_io_to_temporaries: Fix hash table leak
Connor Abbott [Mon, 8 Jul 2019 16:17:30 +0000 (18:17 +0200)]
nir/lower_io_to_temporaries: Fix hash table leak

Fixes: c45f5db527252384395e55fb1149b673ec7b5fa8 ("nir/lower_io_to_temporaries: Handle interpolation intrinsics")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoradv/gfx10: Use correct gs_out for tess point_mode.
Bas Nieuwenhuizen [Tue, 9 Jul 2019 07:41:14 +0000 (09:41 +0200)]
radv/gfx10: Use correct gs_out for tess point_mode.

Fixes: 204e4da9b47 "radv: Use correct gs_out with tessellation."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: set correct number of VGPRs for GS on GFX10
Samuel Pitoiset [Tue, 9 Jul 2019 06:44:01 +0000 (08:44 +0200)]
radv: set correct number of VGPRs for GS on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: fix VGT_ESGS_RING_ITEMSIZE for GS as NGG on GFX10
Samuel Pitoiset [Tue, 9 Jul 2019 06:44:00 +0000 (08:44 +0200)]
radv: fix VGT_ESGS_RING_ITEMSIZE for GS as NGG on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: emit VGT_GS_MAX_VERT_OUT for legacy and NGG paths for GS
Samuel Pitoiset [Tue, 9 Jul 2019 06:43:59 +0000 (08:43 +0200)]
radv: emit VGT_GS_MAX_VERT_OUT for legacy and NGG paths for GS

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: emit the geometry shader as NGG if enabled on GFX10
Samuel Pitoiset [Tue, 9 Jul 2019 06:43:58 +0000 (08:43 +0200)]
radv: emit the geometry shader as NGG if enabled on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: keep track of whether NGG is used for GS on GFX10
Samuel Pitoiset [Tue, 9 Jul 2019 06:43:57 +0000 (08:43 +0200)]
radv: keep track of whether NGG is used for GS on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: add radv_pipeline_generate_hw_gs() helper
Samuel Pitoiset [Tue, 9 Jul 2019 06:43:56 +0000 (08:43 +0200)]
radv: add radv_pipeline_generate_hw_gs() helper

For legacy GS path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: fix setting VGT_REUSE_OFF for TES on GFX10
Samuel Pitoiset [Tue, 9 Jul 2019 06:27:31 +0000 (08:27 +0200)]
radv: fix setting VGT_REUSE_OFF for TES on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: fix computing the number of ES VGPRS for TES on GFX10
Samuel Pitoiset [Tue, 9 Jul 2019 06:27:30 +0000 (08:27 +0200)]
radv: fix computing the number of ES VGPRS for TES on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: set max workgroup size to 128 for TES as NGG on GFX10
Samuel Pitoiset [Tue, 9 Jul 2019 06:27:29 +0000 (08:27 +0200)]
radv: set max workgroup size to 128 for TES as NGG on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: fix allocating USER SGPRs on GFX10
Samuel Pitoiset [Tue, 9 Jul 2019 06:27:28 +0000 (08:27 +0200)]
radv: fix allocating USER SGPRs on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agov3d: Early return with handle 0 when getting a bo on the simulator
Alejandro Piñeiro [Thu, 4 Jul 2019 12:11:27 +0000 (14:11 +0200)]
v3d: Early return with handle 0 when getting a bo on the simulator

Until now we simply looked up entries in the bo hash table without
worrying about whether the handle was NULL, as we expected to get NULL
in return. It seems that the hash table now asserts on some reserved
pointers, including NULL. This commit just returns early for handle 0.

This change fixes several crashes on vk-gl-cts GLES tests when using
the v3d simulator, like:
KHR-GLES3.core.internalformat.copy_tex_image.*
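
A minimal sketch of the early-return idea, not the actual simulator
code: every name below (sketch_bo, sketch_bo_table, sketch_get_bo) is
hypothetical, and a linear array stands in for the real bo hash table.
Handle 0 is treated as "no bo" and never reaches the lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the simulator's bo bookkeeping. */
#define SKETCH_MAX_BOS 16

struct sketch_bo {
    uint32_t handle; /* 0 is treated as "no bo" in this sketch */
    size_t size;
};

struct sketch_bo_table {
    struct sketch_bo entries[SKETCH_MAX_BOS];
    size_t count;
};

/* Look up a bo by handle. Handle 0 returns NULL before the table is
 * consulted, so a table that rejects reserved keys is never asked
 * about it. */
static struct sketch_bo *
sketch_get_bo(struct sketch_bo_table *table, uint32_t handle)
{
    if (handle == 0)
        return NULL;

    for (size_t i = 0; i < table->count; i++) {
        if (table->entries[i].handle == handle)
            return &table->entries[i];
    }
    return NULL;
}
```

Callers see the same NULL result they relied on before; only the path
to it changes.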

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agovulkan/overlay: use a single macro to lookup objects
Lionel Landwerlin [Mon, 8 Jul 2019 13:04:06 +0000 (16:04 +0300)]
vulkan/overlay: use a single macro to lookup objects

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agovulkan/overlay: add queue present timing measurement
Lionel Landwerlin [Mon, 8 Jul 2019 13:03:14 +0000 (16:03 +0300)]
vulkan/overlay: add queue present timing measurement

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoradv/gfx10: Enable tess.
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:50:09 +0000 (23:50 +0200)]
radv/gfx10: Enable tess.

Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv/gfx10: Add pipeline state support for tess.
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:44:32 +0000 (23:44 +0200)]
radv/gfx10: Add pipeline state support for tess.

Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv/gfx10: Only set HW edge flags with gs & tess disabled.
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:43:34 +0000 (23:43 +0200)]
radv/gfx10: Only set HW edge flags with gs & tess disabled.

Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv/gfx10: Add tess eval ngg shader support.
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:42:45 +0000 (23:42 +0200)]
radv/gfx10: Add tess eval ngg shader support.

Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv: Use correct gs_out with tessellation.
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:18:55 +0000 (23:18 +0200)]
radv: Use correct gs_out with tessellation.

We should use the primitives output by the TES in that case.

There is always a separate TES if there is no GS.

Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv/gfx10: Use correct count of max_offchip_buffers.
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:18:28 +0000 (23:18 +0200)]
radv/gfx10: Use correct count of max_offchip_buffers.

Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradv/gfx10: Load global pointers in correct userdata registers for hs/gs.
Bas Nieuwenhuizen [Tue, 9 Jul 2019 00:56:10 +0000 (02:56 +0200)]
radv/gfx10: Load global pointers in correct userdata registers for hs/gs.

Fixes: cfaad5e3cad "radv/gfx10: implement radv_emit_global_shader_pointers()"
Reviewed-by: Dave Airlie <airlied@redhat.com>
5 years agoradeonsi: update function name in comment
Timothy Arceri [Mon, 8 Jul 2019 00:59:46 +0000 (10:59 +1000)]
radeonsi: update function name in comment

This was missed in 2361558eb71d

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agor600: remove query/apply_opaque_metadata callbacks
Timothy Arceri [Mon, 8 Jul 2019 00:52:45 +0000 (10:52 +1000)]
r600: remove query/apply_opaque_metadata callbacks

These seem to have been radeonsi-specific callbacks that are no
longer needed now that these drivers no longer share this code
path.

These callbacks were removed from radeonsi in c0d44fe0e91c.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agovulkan/overlay: fix crash on freeing NULL command buffer
Lionel Landwerlin [Mon, 8 Jul 2019 13:00:59 +0000 (16:00 +0300)]
vulkan/overlay: fix crash on freeing NULL command buffer

It is legal to call vkFreeCommandBuffers() on NULL command buffers.

This fix requires eb41ce1b012f24 ("util/hash_table: Properly handle
the NULL key in hash_table_u64").
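
A rough sketch of one way a layer can tolerate NULL entries, assuming
it simply skips them before doing any per-command-buffer work. This is
not necessarily how the overlay handles it (the fix above relies on
the hash table accepting the NULL key). sketch_cmd_buffer and
sketch_free_command_buffers are hypothetical names, and a plain
pointer stands in for VkCommandBuffer.

```c
#include <stddef.h>
#include <stdint.h>

typedef void *sketch_cmd_buffer; /* hypothetical stand-in for VkCommandBuffer */

/* Release the per-command-buffer data for each non-NULL entry and
 * return how many entries were released. NULL entries are legal and
 * are skipped. The release callback is optional. */
static uint32_t
sketch_free_command_buffers(uint32_t count,
                            const sketch_cmd_buffer *buffers,
                            void (*release)(sketch_cmd_buffer))
{
    uint32_t released = 0;

    for (uint32_t i = 0; i < count; i++) {
        if (buffers[i] == NULL)
            continue;
        if (release)
            release(buffers[i]);
        released++;
    }
    return released;
}
```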

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4438188f492e1f ("vulkan/overlay: record stats in command buffers and accumulate on exec/submit")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agovulkan: bump headers & registry to 1.1.114
Lionel Landwerlin [Mon, 8 Jul 2019 07:30:50 +0000 (10:30 +0300)]
vulkan: bump headers & registry to 1.1.114

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoradv: only use specialised 3D meta paths on GFX9.
Dave Airlie [Mon, 8 Jul 2019 19:08:09 +0000 (05:08 +1000)]
radv: only use specialised 3D meta paths on GFX9.

GFX10 appears to act like GFX8 here.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agomesa: Set minimum possible GLSL version
Ian Romanick [Sat, 9 Mar 2019 07:50:29 +0000 (23:50 -0800)]
mesa: Set minimum possible GLSL version

Set the absolute minimum possible GLSL version.  API_OPENGL_CORE can
mean an OpenGL 3.0 forward-compatible context, so that implies a minimum
possible version of 1.30.  Otherwise, the minimum possible version is 1.20.
Since Mesa unconditionally advertises GL_ARB_shading_language_100 and
GL_ARB_shader_objects, every driver has GLSL 1.20... even if they don't
advertise any extensions to enable any shader stages (e.g.,
GL_ARB_vertex_shader).
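
The rule above can be sketched as a small helper. This is only an
illustration of the 1.30 / 1.20 split described here, with versions
written as integers (130, 120); the enum and function names are
hypothetical and do not mirror Mesa's actual types.

```c
/* Hypothetical API enum; only the core / non-core split matters. */
enum sketch_api {
    SKETCH_API_OPENGL_COMPAT,
    SKETCH_API_OPENGL_CORE
};

/* Minimum possible GLSL version for a context: 1.30 for core
 * profile, 1.20 otherwise. */
static unsigned
sketch_min_glsl_version(enum sketch_api api)
{
    return api == SKETCH_API_OPENGL_CORE ? 130 : 120;
}
```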

Converts about 2,500 piglit tests from crash to skip on NV18.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109524
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110955
Cc: mesa-stable@lists.freedesktop.org
5 years agoanv: Set maxComputeSharedMemorySize to 64k
Caio Marcelo de Oliveira Filho [Mon, 8 Jul 2019 17:36:59 +0000 (10:36 -0700)]
anv: Set maxComputeSharedMemorySize to 64k

This value is supported since gen7.  See also 8514c75a26e "i965: Set
compute shader shared memory max to 64k".

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/vec4: Delete vec4_visitor::emit_lrp
Ian Romanick [Thu, 6 Jun 2019 18:00:40 +0000 (11:00 -0700)]
intel/vec4: Delete vec4_visitor::emit_lrp

Effectively unused since dd7135d55d5 ("intel/compiler: Use the flrp
lowering pass for all stages on Gen4 and Gen5").  I had intended to
remove this code as part of that series, but I forgot.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Allow nir_ssa_alu_instr_src_components to operate on non-SSA destinations
Ian Romanick [Fri, 7 Jun 2019 15:35:51 +0000 (08:35 -0700)]
nir: Allow nir_ssa_alu_instr_src_components to operate on non-SSA destinations

Existing users only operate on instructions with SSA destinations.  Some
later patches add new direct calls and indirect calls (via existing NIR
functions) on instructions after going out of SSA.  At the very least,
these calls are added by:

intel/vec4: Try to emit a VF source in try_immediate_source
intel/vec4: Try to emit a single load for multiple 3-src instruction operands

The first commit adds direct calls, and the second adds calls via
nir_alu_srcs_equal and nir_alu_srcs_negative_equal.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir: Handle swizzle in nir_alu_srcs_negative_equal
Ian Romanick [Mon, 10 Jun 2019 22:05:14 +0000 (15:05 -0700)]
nir: Handle swizzle in nir_alu_srcs_negative_equal

When I added this function, I was not sure if swizzles of immediate
values were a thing that occurred in NIR.  The only existing user of
these functions is the partial redundancy elimination for compares.
Since comparison instructions are inherently scalar, this does not
occur.

However, a couple of later patches, "nir/algebraic: Recognize
open-coded flrp(-1, 1, a) and flrp(1, -1, a)" combined with "intel/vec4:
Try to emit a single load for multiple 3-src instruction operands",
collaborate to create a few thousand instances.

No shader-db changes on any Intel platform.

v2: Handle the swizzle in nir_alu_srcs_negative_equal and leave
nir_const_value_negative_equal unchanged.  Suggested by Jason.

v3: Correctly handle write masks.  Add note (and assertion) that the
caller is responsible for various compatibility checks.  The single
existing caller only calls this for combinations of scalar fadd and
float comparison instructions, so all of the requirements are met.  A
later patch (intel/vec4: Try to emit a single load for multiple 3-src
instruction operands) will call this for sources of the same
instruction, so all of the requirements are met.

v4: Add unit test for nir_opt_comparison_pre that is fixed by this
commit.
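
A simplified sketch of a swizzle-aware "negative equal" check on float
constants, assuming up to four components and a write mask that
selects which lanes matter. The types and names (sketch_src,
sketch_srcs_negative_equal) are hypothetical and much simpler than the
real NIR structures; the assertion only illustrates the idea that the
caller is responsible for passing compatible inputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_COMPONENTS 4

/* Hypothetical, simplified source: a constant vector plus a swizzle.
 * swizzle[i] selects which channel of values[] lane i reads. */
struct sketch_src {
    const float *values;
    uint8_t swizzle[SKETCH_MAX_COMPONENTS];
};

/* Return true if, for every lane enabled in write_mask, the swizzled
 * value of a equals the negation of the swizzled value of b. */
static bool
sketch_srcs_negative_equal(const struct sketch_src *a,
                           const struct sketch_src *b,
                           unsigned write_mask)
{
    /* The caller must pass a mask that fits the component count. */
    assert(write_mask < (1u << SKETCH_MAX_COMPONENTS));

    for (unsigned i = 0; i < SKETCH_MAX_COMPONENTS; i++) {
        if (!(write_mask & (1u << i)))
            continue; /* lane not written, so it cannot matter */

        if (a->values[a->swizzle[i]] != -b->values[b->swizzle[i]])
            return false;
    }
    return true;
}
```

Comparing through the swizzle is what lets sources that read the same
constant vector in a different channel order still be recognized as
negations of each other.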

Reviewed-by: Matt Turner <mattst88@gmail.com>