We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc, so this
waitcnt was necessary for barriers and correctly waiting for SMEM before
s_dcache_wb on GFX10.
Totals from affected shaders:
SGPRS: 33200 -> 33200 (0.00 %)
VGPRS: 31376 -> 31376 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size:
2431804 ->
2433956 (0.09 %) bytes
LDS: 316 -> 316 (0.00 %) blocks
Max Waves: 1609 -> 1609 (0.00 %)
This reverts commit
2c050b49b3d776f054f1265d5523cabb61f22fc3.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
/* check if this block is at the end of a loop */
for (unsigned succ_idx : block.linear_succs) {
/* eliminate any remaining counters */
- if (succ_idx <= block.index && (ctx.vm_cnt || ctx.exp_cnt || ctx.lgkm_cnt || ctx.vs_cnt) && !ctx.gpr_map.empty()) {
+ if (succ_idx <= block.index && (ctx.vm_cnt || ctx.exp_cnt || ctx.lgkm_cnt || ctx.vs_cnt)) {
// TODO: we could do better if we only wait if the regs between the block and other predecessors differ
aco_ptr<Instruction> branch = std::move(new_instructions.back());