Previously we would blindly emit a sequence like:
mov(1) f0.1<1>UW g1.14<0,1,0>UW
...
cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */
(+f0.1) cmp.z.f0.1(16) null<1>D g7<8,8,1>D 0D
The first move sets the flags based on the initial execution mask.
Later discard sequences contain a predicated compare that can only
remove more SIMD channels. Often the only user of the result from
the first compare is the second compare. Instead, generate a sequence
like:
mov(1) f0.1<1>UW g1.14<0,1,0>UW
...
cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */
(+f0.1) cmp.ge.f0.1(8) null<1>F g5<8,8,1>F 0x41700000F /* 15F */
If the results stored in g7 and f0.0 are not used, the first comparison
will be eliminated. This removes an instruction and potentially reduces
register pressure.
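The effect of the two sequences on the f0.1 channel mask can be modelled on the CPU. The following is only a sketch under stated assumptions, not code from the driver: `discard_old`, `discard_new`, `ramp`, `Lanes` and `kChannels` are hypothetical names, Booleans are modelled as 0 / -1, and a predicated-off channel is assumed to leave its flag bit untouched. For ordered (non-NaN) inputs both functions return the same mask; NaN lanes are outside the scope of this model.

```cpp
#include <array>
#include <cstdint>

constexpr int kChannels = 16;                 // SIMD16, as in the listings above
using Lanes = std::array<float, kChannels>;

// Hypothetical helper: lane i holds the value i.
inline Lanes ramp()
{
    Lanes x{};
    for (int ch = 0; ch < kChannels; ++ch)
        x[ch] = static_cast<float>(ch);
    return x;
}

// Model of the old sequence: cmp.l materializes a Boolean per channel, then a
// predicated cmp.z of that Boolean against 0 rewrites the live-channel mask.
inline uint16_t discard_old(uint16_t live, const Lanes &x, float limit)
{
    uint32_t flags = live;
    for (int ch = 0; ch < kChannels; ++ch) {
        const uint32_t bit = 1u << ch;
        if ((live & bit) == 0)
            continue;                          // channel is predicated off
        const int32_t boolean = (x[ch] < limit) ? -1 : 0;   // cmp.l -> g7
        if (boolean == 0)                      // cmp.z g7, 0
            flags |= bit;
        else
            flags &= ~bit;
    }
    return static_cast<uint16_t>(flags);
}

// Model of the new sequence: the predicated compare tests the negated
// condition (>=) on the original operands, so no Boolean is needed.
inline uint16_t discard_new(uint16_t live, const Lanes &x, float limit)
{
    uint32_t flags = live;
    for (int ch = 0; ch < kChannels; ++ch) {
        const uint32_t bit = 1u << ch;
        if ((live & bit) == 0)
            continue;                          // channel is predicated off
        if (x[ch] >= limit)                    // cmp.ge g5, limit
            flags |= bit;
        else
            flags &= ~bit;
    }
    return static_cast<uint16_t>(flags);
}
```

In this model a channel can only go from live to discarded, never back, which matches the statement that later discards can only remove more SIMD channels.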
v2: Major re-write of the commit message (including fixing the assembly
code). Suggested by Matt.
All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224434 -> 17198659 (-0.15%)
instructions in affected programs: 2908125 -> 2882350 (-0.89%)
helped: 18891
HURT: 5
helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02%
HURT stats (abs) min: 9 max: 105 x̄: 51.40 x̃: 35
HURT stats (rel) min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56%
95% mean confidence interval for instructions value: -1.39 -1.34
95% mean confidence interval for instructions %-change: -1.79% -1.73%
Instructions are helped.
total cycles in shared programs: 361468458 -> 361170679 (-0.08%)
cycles in affected programs: 38470116 -> 38172337 (-0.77%)
helped: 16202
HURT: 1456
helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18
helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18%
HURT stats (abs) min: 1 max: 5982 x̄: 87.51 x̃: 28
HURT stats (rel) min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64%
95% mean confidence interval for cycles value: -18.24 -15.49
95% mean confidence interval for cycles %-change: -2.26% -2.14%
Cycles are helped.
total spills in shared programs: 12147 -> 12176 (0.24%)
spills in affected programs: 175 -> 204 (16.57%)
helped: 8
HURT: 5
total fills in shared programs: 25262 -> 25292 (0.12%)
fills in affected programs: 269 -> 299 (11.15%)
helped: 8
HURT: 5
Haswell
total instructions in shared programs: 13530316 -> 13502647 (-0.20%)
instructions in affected programs: 2507824 -> 2480155 (-1.10%)
helped: 18859
HURT: 10
helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1
helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41%
HURT stats (abs) min: 5 max: 39 x̄: 25.70 x̃: 31
HURT stats (rel) min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31%
95% mean confidence interval for instructions value: -1.49 -1.44
95% mean confidence interval for instructions %-change: -2.42% -2.34%
Instructions are helped.
total cycles in shared programs: 377865412 -> 377639034 (-0.06%)
cycles in affected programs: 40169572 -> 39943194 (-0.56%)
helped: 15550
HURT: 1938
helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18
helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25%
HURT stats (abs) min: 1 max: 4862 x̄: 89.17 x̃: 35
HURT stats (rel) min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75%
95% mean confidence interval for cycles value: -14.42 -11.47
95% mean confidence interval for cycles %-change: -2.05% -1.91%
Cycles are helped.
total spills in shared programs: 26769 -> 26814 (0.17%)
spills in affected programs: 826 -> 871 (5.45%)
helped: 9
HURT: 10
total fills in shared programs: 38383 -> 38425 (0.11%)
fills in affected programs: 834 -> 876 (5.04%)
helped: 9
HURT: 10
LOST: 5
GAINED: 10
Ivy Bridge
total instructions in shared programs: 12079250 -> 12044139 (-0.29%)
instructions in affected programs: 2409680 -> 2374569 (-1.46%)
helped: 16135
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2
helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68%
95% mean confidence interval for instructions value: -2.21 -2.14
95% mean confidence interval for instructions %-change: -2.76% -2.67%
Instructions are helped.
total cycles in shared programs: 180116747 -> 179900405 (-0.12%)
cycles in affected programs: 25439823 -> 25223481 (-0.85%)
helped: 13817
HURT: 1499
helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18
helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97%
HURT stats (abs) min: 1 max: 3684 x̄: 98.99 x̃: 52
HURT stats (rel) min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42%
95% mean confidence interval for cycles value: -15.68 -12.57
95% mean confidence interval for cycles %-change: -1.77% -1.63%
Cycles are helped.
LOST: 8
GAINED: 10
Sandy Bridge
total instructions in shared programs: 10878990 -> 10863659 (-0.14%)
instructions in affected programs: 1806702 -> 1791371 (-0.85%)
helped: 13023
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1
helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10%
95% mean confidence interval for instructions value: -1.18 -1.17
95% mean confidence interval for instructions %-change: -1.68% -1.62%
Instructions are helped.
total cycles in shared programs: 154082878 -> 153862810 (-0.14%)
cycles in affected programs: 20199374 -> 19979306 (-1.09%)
helped: 12048
HURT: 510
helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18
helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52%
HURT stats (abs) min: 1 max: 448 x̄: 54.39 x̃: 16
HURT stats (rel) min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17%
95% mean confidence interval for cycles value: -17.97 -17.08
95% mean confidence interval for cycles %-change: -1.84% -1.75%
Cycles are helped.
LOST: 1
GAINED: 0
Iron Lake
total instructions in shared programs: 8155075 -> 8142729 (-0.15%)
instructions in affected programs: 949495 -> 937149 (-1.30%)
helped: 5810
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.59% -2.48%
Instructions are helped.
total cycles in shared programs: 188584610 -> 188549632 (-0.02%)
cycles in affected programs: 17274446 -> 17239468 (-0.20%)
helped: 3881
HURT: 90
helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6
helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30%
HURT stats (abs) min: 2 max: 10 x̄: 2.80 x̃: 2
HURT stats (rel) min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07%
95% mean confidence interval for cycles value: -9.35 -8.27
95% mean confidence interval for cycles %-change: -0.85% -0.77%
Cycles are helped.
GM45
total instructions in shared programs: 5019308 -> 5013119 (-0.12%)
instructions in affected programs: 489028 -> 482839 (-1.27%)
helped: 2912
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.54% -2.39%
Instructions are helped.
total cycles in shared programs: 129002592 -> 128977804 (-0.02%)
cycles in affected programs: 12669152 -> 12644364 (-0.20%)
helped: 2759
HURT: 37
helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4
helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31%
HURT stats (abs) min: 2 max: 10 x̄: 3.62 x̃: 4
HURT stats (rel) min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04%
95% mean confidence interval for cycles value: -9.53 -8.20
95% mean confidence interval for cycles %-change: -0.79% -0.70%
Cycles are helped.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
#include "compiler/glsl/ir.h"
#include "brw_fs.h"
#include "brw_nir.h"
+#include "brw_eu.h"
#include "nir_search_helpers.h"
#include "util/u_math.h"
#include "util/bitscan.h"
* condition, we emit a CMP of g0 != g0, so all currently executing
* channels will get turned off.
*/
- fs_inst *cmp;
+ fs_inst *cmp = NULL;
if (instr->intrinsic == nir_intrinsic_discard_if) {
- cmp = bld.CMP(bld.null_reg_f(), get_nir_src(instr->src[0]),
- brw_imm_d(0), BRW_CONDITIONAL_Z);
+ nir_alu_instr *alu = nir_src_as_alu_instr(instr->src[0]);
+
+ if (alu != NULL &&
+ alu->op != nir_op_bcsel &&
+ alu->op != nir_op_inot) {
+ /* Re-emit the instruction that generated the Boolean value, but
+ * do not store it. Since this instruction will be conditional,
+ * other instructions that want to use the real Boolean value may
+ * get garbage. This was a problem for piglit's fs-discard-exit-2
+ * test.
+ *
+ * Ideally we'd detect that the instruction cannot have a
+ * conditional modifier before emitting the instructions. Alas,
+ * that is nigh impossible. Instead, we're going to assume the
+ * instruction (or last instruction) generated can have a
+ * conditional modifier. If it cannot, fallback to the old-style
+ * compare, and hope dead code elimination will clean up the
+ * extra instructions generated.
+ */
+ nir_emit_alu(bld, alu, false);
+
+ cmp = (fs_inst *) instructions.get_tail();
+ if (cmp->conditional_mod == BRW_CONDITIONAL_NONE) {
+ if (cmp->can_do_cmod())
+ cmp->conditional_mod = BRW_CONDITIONAL_Z;
+ else
+ cmp = NULL;
+ } else {
+ /* The old sequence that would have been generated is,
+ * basically, bool_result == false. This is equivalent to
+ * !bool_result, so negate the old modifier.
+ */
+ cmp->conditional_mod = brw_negate_cmod(cmp->conditional_mod);
+ }
+ }
+
+ if (cmp == NULL) {
+ cmp = bld.CMP(bld.null_reg_f(), get_nir_src(instr->src[0]),
+ brw_imm_d(0), BRW_CONDITIONAL_Z);
+ }
} else {
fs_reg some_reg = fs_reg(retype(brw_vec8_grf(0, 0),
BRW_REGISTER_TYPE_UW));
cmp = bld.CMP(bld.null_reg_f(), some_reg, some_reg, BRW_CONDITIONAL_NZ);
}
+
cmp->predicate = BRW_PREDICATE_NORMAL;
cmp->flag_subreg = 1;
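
The modifier selection in the discard_if path can be modelled outside of Mesa as a small pure function. This is a sketch rather than the driver's code: `Cmod`, `Choice`, `negate` and `choose_discard_cmod` are hypothetical names, and `negate` only stands in for the `brw_negate_cmod` call used by the patch, assuming it performs plain logical negation of the comparison.

```cpp
// Hypothetical stand-in for the hardware conditional modifiers.
enum class Cmod { None, Z, NZ, G, GE, L, LE };

// Logical negation of a comparison (assumed behaviour of the real helper).
inline Cmod negate(Cmod c)
{
    switch (c) {
    case Cmod::Z:    return Cmod::NZ;
    case Cmod::NZ:   return Cmod::Z;
    case Cmod::G:    return Cmod::LE;
    case Cmod::LE:   return Cmod::G;
    case Cmod::GE:   return Cmod::L;
    case Cmod::L:    return Cmod::GE;
    case Cmod::None: return Cmod::None;   // nothing to negate in this model
    }
    return Cmod::None;
}

struct Choice {
    bool reuse_tail;   // true: reuse the re-emitted ALU instruction
    Cmod cmod;         // modifier that ends up on the discard compare
};

// tail_cmod:        modifier already present on the re-emitted instruction
// tail_can_do_cmod: whether that instruction accepts a conditional modifier
inline Choice choose_discard_cmod(Cmod tail_cmod, bool tail_can_do_cmod)
{
    if (tail_cmod == Cmod::None) {
        if (tail_can_do_cmod)
            return {true, Cmod::Z};       // result == 0, set directly on it
        return {false, Cmod::Z};          // old-style separate CMP against 0
    }
    // The old sequence tested bool_result == false, i.e. !bool_result,
    // so the existing modifier is negated.
    return {true, negate(tail_cmod)};
}
```

For example, a tail instruction that already carries `L` (as in the `cmp.l` listing) yields `GE`, while an instruction without a modifier that cannot take one falls back to the separate compare.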