aco: fix a hazard with v_interp_* and v_{read,readfirst}lane_* on GFX6

author Samuel Pitoiset <samuel.pitoiset@gmail.com>

Tue, 21 Jan 2020 15:49:22 +0000 (16:49 +0100)

committer Marge Bot <eric+marge@anholt.net>

Fri, 24 Jan 2020 18:34:27 +0000 (18:34 +0000)
author Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tue, 21 Jan 2020 15:49:22 +0000 (16:49 +0100)
committer Marge Bot <eric+marge@anholt.net>
Fri, 24 Jan 2020 18:34:27 +0000 (18:34 +0000)
diff --git a/src/amd/compiler/README.md b/src/amd/compiler/README.md

index 7c7e68f458eee53eb56238b59e9acdbb98de7720..9c47c07bc80de209748dd59c101dce1f4729a352 100644 (file)
--- a/src/amd/compiler/README.md
+++ b/src/amd/compiler/README.md
@@ -131,6 +131,15 @@ finish and then write to vcc (for example, `s_mov_b64 vcc, vcc`) to correct vccz
  
  Currently, we don't do this.
  
+## GCN / GFX6 hazards
+
+### VINTRP followed by a read with v_readfirstlane or v_readlane
+
+It's required to insert 1 wait state if the dst VGPR of any v_interp_* is
+followed by a read with v_readfirstlane or v_readlane to fix GPU hangs on GFX6.
+Note that v_writelane_* is apparently not affected. This hazard isn't
+documented anywhere but AMD confirmed it.
+
  ## RDNA / GFX10 hazards
  
  ### SMEM store followed by a load with the same address
diff --git a/src/amd/compiler/aco_insert_NOPs.cpp b/src/amd/compiler/aco_insert_NOPs.cpp

index 689d5e25acc2a2cc3a14267edf9c4ea7d0deedea..8c032bb699c54b551aef13913c33438357722235 100644 (file)
--- a/src/amd/compiler/aco_insert_NOPs.cpp
+++ b/src/amd/compiler/aco_insert_NOPs.cpp
@@ -353,6 +353,24 @@ int handle_instruction_gfx8_9(NOP_ctx_gfx8_9& ctx, aco_ptr<Instruction>& instr,
                 ctx.VALU_wrsgpr = NOPs ? new_idx : new_idx + 1;
           }
        }
+
+      /* It's required to insert 1 wait state if the dst VGPR of any v_interp_*
+       * is followed by a read with v_readfirstlane or v_readlane to fix GPU
+       * hangs on GFX6. Note that v_writelane_* is apparently not affected.
+       * This hazard isn't documented anywhere but AMD confirmed that hazard.
+       */
+      if (ctx.chip_class == GFX6 &&
+          !new_instructions.empty() &&
+          (instr->opcode == aco_opcode::v_readfirstlane_b32 ||
+           instr->opcode == aco_opcode::v_readlane_b32)) {
+         aco_ptr<Instruction>& pred = new_instructions.back();
+         if (pred->format == Format::VINTRP) {
+            Definition pred_def = pred->definitions[0];
+            Operand& op = instr->operands[0];
+            if (regs_intersect(pred_def.physReg(), pred_def.size(), op.physReg(), op.size()))
+               NOPs = std::max(NOPs, 1);
+         }
+      }
        return NOPs;
     } else if (instr->isVMEM() && ctx.VALU_wrsgpr + 5 >= new_idx) {
        /* If the VALU writes the SGPR that is used by a VMEM, the user must add five wait states. */
author	Samuel Pitoiset <samuel.pitoiset@gmail.com>
	Tue, 21 Jan 2020 15:49:22 +0000 (16:49 +0100)
committer	Marge Bot <eric+marge@anholt.net>
	Fri, 24 Jan 2020 18:34:27 +0000 (18:34 +0000)
src/amd/compiler/README.md		patch \| blob \| history
src/amd/compiler/aco_insert_NOPs.cpp		patch \| blob \| history