aco: Treat all booleans as per-lane.
author Timur Kristóf <timur.kristof@gmail.com>
Mon, 4 Nov 2019 18:28:08 +0000 (19:28 +0100)
committer Timur Kristóf <timur.kristof@gmail.com>
Thu, 14 Nov 2019 16:27:11 +0000 (17:27 +0100)
commit 8995c0b30a696c709fac9e5f761c101913dc92ec
tree 91ba26c6d1104198fb021ba436da8578249bbbe3
parent a1622c1a11bfb7112a856c2ff9b308d0aa3e98b6

Previously, instruction selection had two kinds of booleans:
1. divergent, which was per-lane and stored in s2 (VCC size)
2. uniform, which was stored in s1
Additionally, uniform booleans were made per-lane when they resulted
from operations that were supported only by the VALU.

To decide which type was used, we relied on the destination size.
This was not reliable because of the per-lane uniform bools, but it
mostly worked on wave64.
However, in wave32 mode (where VCC is also s1) this approach
makes it impossible to keep track of which boolean is uniform and
which is divergent.

This commit makes all booleans per-lane.
The resulting excess code size will be taken care of by the optimizer.
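
As a rough illustration of what "per-lane" means here, the sketch below
models a wave64 boolean as a 64-bit mask with one bit per lane. It is not
code from ACO: LaneMask, broadcast_bool, lane_value and lane_and are
hypothetical names, and filling every bit for a uniform "true" is a
modelling assumption rather than a statement about what the compiler emits.

```cpp
#include <cassert>
#include <cstdint>

// Model of a per-lane boolean in wave64: bit i holds the value for lane i.
// (Hypothetical type, for illustration only.)
using LaneMask = std::uint64_t;

// Hypothetical helper: turn a uniform boolean into a per-lane one by giving
// every lane the same value (assumption: all 64 bits are written).
constexpr LaneMask broadcast_bool(bool value) {
    return value ? ~LaneMask{0} : LaneMask{0};
}

// Hypothetical helper: read the boolean that a single lane sees.
constexpr bool lane_value(LaneMask mask, unsigned lane) {
    return ((mask >> lane) & 1u) != 0;
}

// With a single representation, a logical AND of two booleans is a plain
// bitwise AND of the masks, whether or not the inputs were uniform.
constexpr LaneMask lane_and(LaneMask a, LaneMask b) {
    return a & b;
}

// Small self-check showing how the helpers fit together.
inline void self_check() {
    const LaneMask divergent = 0xCu;              // lanes 2 and 3 are true
    const LaneMask uniform = broadcast_bool(true);
    assert(lane_and(divergent, uniform) == divergent);
    assert(lane_value(divergent, 2) && !lane_value(divergent, 0));
}
```

Because both kinds of boolean share one representation in this model, no
code has to guess from the register size which kind it is looking at.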

v2 (by Daniel Schürmann):
- Better names for some functions
- Use s_andn2_b64 with exec for nir_op_inot
- Simplify code due to using s_and_b64 in bool_to_scalar_condition
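
The two scalar instructions named in the v2 notes can be sketched in the
same mask model. The functions s_and_b64 and s_andn2_b64 below only imitate
the documented semantics of those instructions (a bitwise result plus SCC
set when the result is non-zero). lane_bool_not and lane_bool_to_scalar are
hypothetical wrappers, and the operand order is an assumption, not a copy
of the code in aco_instruction_selection.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Result of a modelled scalar ALU op: the 64-bit value plus the SCC flag.
struct ScalarResult {
    std::uint64_t value;
    bool scc;
};

// Model of s_and_b64: D = S0 & S1, SCC = (D != 0).
constexpr ScalarResult s_and_b64(std::uint64_t s0, std::uint64_t s1) {
    return ScalarResult{s0 & s1, (s0 & s1) != 0};
}

// Model of s_andn2_b64: D = S0 & ~S1, SCC = (D != 0).
constexpr ScalarResult s_andn2_b64(std::uint64_t s0, std::uint64_t s1) {
    return ScalarResult{s0 & ~s1, (s0 & ~s1) != 0};
}

// Hypothetical wrapper: per-lane NOT. Masking with exec keeps the bits of
// inactive lanes at zero instead of flipping them to one.
constexpr std::uint64_t lane_bool_not(std::uint64_t src, std::uint64_t exec) {
    return s_andn2_b64(exec, src).value;
}

// Hypothetical wrapper: collapse a per-lane boolean into one scalar
// condition that is true if any active lane holds true.
constexpr bool lane_bool_to_scalar(std::uint64_t src, std::uint64_t exec) {
    return s_and_b64(src, exec).scc;
}

// Small self-check with four active lanes (exec = 0xF).
inline void self_check_exec() {
    const std::uint64_t exec = 0xFu;
    assert(lane_bool_not(0x5u, exec) == 0xAu);
    assert(lane_bool_to_scalar(0x10u, exec) == false); // only an inactive lane is set
}
```

In this model the AND with exec in lane_bool_to_scalar is what keeps stale
bits from inactive lanes out of the scalar condition.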

v3 (by Timur Kristóf):
- Fix several subgroups regressions

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
src/amd/compiler/aco_instruction_selection.cpp
src/amd/compiler/aco_instruction_selection_setup.cpp
src/amd/compiler/aco_lower_bool_phis.cpp