When generating the VPM write instruction for geometry shader outputs,
emit_store_output_gs adds the base and offset arguments together with an
ADD instruction. The addition is done at the VIR level after scheduling,
so it always ends up right next to the corresponding stvpm instruction.
Most of the time the offset is constant, but nothing does any constant
folding at the VIR level.

Instead, fold the base into the offset at the NIR level in
v3d_nir_lower_io so that the NIR-level constant folding can get rid of
the addition most of the time.
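
As a rough illustration of why moving the add earlier helps, here is a
minimal toy model in plain C. It is only a sketch: struct toy_offset and
both toy_* functions are hypothetical names made up for this example and
are not Mesa, NIR, or VIR APIs. It counts how many ADD instructions would
survive when the add is done where constants cannot be folded versus where
they can.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an SSA offset value; not a Mesa type. */
struct toy_offset {
        bool is_const;   /* true if the value is known at compile time */
        uint32_t imm;    /* the constant, valid only when is_const is set */
};

/* Late add: the backend always emits one ADD of base and offset,
 * because nothing folds constants at that level.
 */
static unsigned
toy_adds_without_folding(uint32_t base, struct toy_offset offset)
{
        (void)base;
        (void)offset;
        return 1;
}

/* Early add: combine base with the offset where constants can still be
 * folded. The caller is then expected to pass a base of 0 to the backend,
 * so the backend has nothing left to add.
 */
static unsigned
toy_adds_with_folding(uint32_t base, struct toy_offset *offset)
{
        if (offset->is_const) {
                offset->imm += base;   /* folded at compile time, no ADD */
                return 0;
        }

        /* A non-constant offset still needs a real ADD unless base is 0. */
        return base != 0 ? 1 : 0;
}
```

In this model a constant offset costs one ADD with the late add and none
with the early add, while a non-constant offset with a non-zero base costs
one ADD either way.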
v2: Use nir_iadd_imm to simplify the code. (Eric Anholt)

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5825>
{
assert(instr->num_components == 1);
+ struct qreg offset = ntq_get_src(c, instr->src[1], 0);
+
uint32_t base_offset = nir_intrinsic_base(instr);
- struct qreg src_offset = ntq_get_src(c, instr->src[1], 0);
- struct qreg offset =
- vir_ADD(c, vir_uniform_ui(c, base_offset), src_offset);
+
+ if (base_offset)
+ offset = vir_ADD(c, vir_uniform_ui(c, base_offset), offset);
/* Usually, for VS or FS, we only emit outputs once at program end so
* our VPM writes are never in non-uniform control flow, but this
intr->num_components = 1;
intr->src[0] = nir_src_for_ssa(chan);
- if (offset)
- intr->src[1] = nir_src_for_ssa(offset);
- else
+ if (offset) {
+ /* When generating the VIR instruction, the base and the offset
+ * are just going to get added together with an ADD instruction
+ * so we might as well do the add here at the NIR level instead
+ * and let the constant folding do its magic.
+ */
+ intr->src[1] = nir_src_for_ssa(nir_iadd_imm(b, offset, base));
+ base = 0;
+ } else {
intr->src[1] = nir_src_for_ssa(nir_imm_int(b, 0));
+ }
nir_intrinsic_set_base(intr, base);
nir_intrinsic_set_write_mask(intr, 0x1);