From: Samuel Pitoiset Date: Tue, 21 Jan 2020 15:49:22 +0000 (+0100) Subject: aco: fix a hazard with v_interp_* and v_{read,readfirst}lane_* on GFX6 X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=1ac49ba908acf70a8ae4aad71dc715bf625aea1e;p=mesa.git aco: fix a hazard with v_interp_* and v_{read,readfirst}lane_* on GFX6 It's required to insert 1 wait state if the dst VGPR of any v_interp_* is followed by a read with v_readfirstlane or v_readlane to fix GPU hangs on GFX6. Note that v_writelane_* is apparently not affected. This hazard isn't documented anywhere but AMD confirmed it. This fixes a GPU hang with the texturemipmapgen Sascha demo on GFX6. Signed-off-by: Samuel Pitoiset Reviewed-by: Daniel Schürmann Part-of: --- diff --git a/src/amd/compiler/README.md b/src/amd/compiler/README.md index 7c7e68f458e..9c47c07bc80 100644 --- a/src/amd/compiler/README.md +++ b/src/amd/compiler/README.md @@ -131,6 +131,15 @@ finish and then write to vcc (for example, `s_mov_b64 vcc, vcc`) to correct vccz Currently, we don't do this. +## GCN / GFX6 hazards + +### VINTRP followed by a read with v_readfirstlane or v_readlane + +It's required to insert 1 wait state if the dst VGPR of any v_interp_* is +followed by a read with v_readfirstlane or v_readlane to fix GPU hangs on GFX6. +Note that v_writelane_* is apparently not affected. This hazard isn't +documented anywhere but AMD confirmed it. + ## RDNA / GFX10 hazards ### SMEM store followed by a load with the same address diff --git a/src/amd/compiler/aco_insert_NOPs.cpp b/src/amd/compiler/aco_insert_NOPs.cpp index 689d5e25acc..8c032bb699c 100644 --- a/src/amd/compiler/aco_insert_NOPs.cpp +++ b/src/amd/compiler/aco_insert_NOPs.cpp @@ -353,6 +353,24 @@ int handle_instruction_gfx8_9(NOP_ctx_gfx8_9& ctx, aco_ptr& instr, ctx.VALU_wrsgpr = NOPs ? new_idx : new_idx + 1; } } + + /* It's required to insert 1 wait state if the dst VGPR of any v_interp_* + * is followed by a read with v_readfirstlane or v_readlane to fix GPU + * hangs on GFX6. Note that v_writelane_* is apparently not affected. + * This hazard isn't documented anywhere but AMD confirmed that hazard. + */ + if (ctx.chip_class == GFX6 && + !new_instructions.empty() && + (instr->opcode == aco_opcode::v_readfirstlane_b32 || + instr->opcode == aco_opcode::v_readlane_b32)) { + aco_ptr& pred = new_instructions.back(); + if (pred->format == Format::VINTRP) { + Definition pred_def = pred->definitions[0]; + Operand& op = instr->operands[0]; + if (regs_intersect(pred_def.physReg(), pred_def.size(), op.physReg(), op.size())) + NOPs = std::max(NOPs, 1); + } + } return NOPs; } else if (instr->isVMEM() && ctx.VALU_wrsgpr + 5 >= new_idx) { /* If the VALU writes the SGPR that is used by a VMEM, the user must add five wait states. */