i965/fs: fix pull constant load component selection for doubles
author Iago Toral Quiroga <itoral@igalia.com>
Mon, 18 Jan 2016 12:09:31 +0000 (13:09 +0100)
committer Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Mon, 16 May 2016 07:55:33 +0000 (09:55 +0200)
UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a
constant offset that is 16-byte aligned. If we need to access an unaligned
offset, we emit a load with an aligned offset and use the remaining constant
offset to select the component of the vec4 result that we are interested
in. This component must be computed in units of the type size, since that
is what fs_reg::set_smear expects.

This patch makes this change in the two places where we use this message:
in demote_pull_constants, when we lower uniform access with constant offset
into the pull constant buffer, and in UBO loads with constant offset.
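
As a rough illustration of the arithmetic described above, here is a
standalone sketch. It is not Mesa code: PullLoad, split_offset and
smear_component are hypothetical names, type_size stands in for type_sz(),
and pull_index is assumed to count 32-bit slots, as the "* 4" in the
lower_constant_loads() hunk suggests.

```cpp
#include <cassert>

// Hypothetical helper type, not part of Mesa.
struct PullLoad {
   unsigned aligned_offset; // byte offset given to the load, multiple of 16
   unsigned component;      // element index in the loaded vec4, in units of
                            // the element type size
};

// Split a constant byte offset into the 16-byte-aligned load offset and the
// component to select, measured in elements of type_size bytes.
constexpr PullLoad split_offset(unsigned byte_offset, unsigned type_size)
{
   return PullLoad{ byte_offset & ~15u, (byte_offset % 16) / type_size };
}

// Same expression as the lower_constant_loads() hunk: pull_index is assumed
// to be in 32-bit slots, so convert to bytes and then to elements.
constexpr unsigned smear_component(unsigned pull_index, unsigned type_size)
{
   return (pull_index & 3) * 4 / type_size;
}

// Byte offset 24 lands in the vec4 loaded from offset 16. A 4-byte float
// there is component 2; an 8-byte double is component 1.
static_assert(split_offset(24, 4).aligned_offset == 16, "aligned load offset");
static_assert(split_offset(24, 4).component == 2, "float component");
static_assert(split_offset(24, 8).component == 1, "double component");

// The old "pull_index & 3" yields 2 for slot 2 regardless of type; for a
// double the element index has to be 1.
static_assert(smear_component(2, 4) == 2, "float smear");
static_assert(smear_component(2, 8) == 1, "double smear");
```

For 32-bit types both expressions reduce to the previous behaviour, which
is why only doubles were affected.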

v2 (Sam):
- Fix set_smear() in fs_visitor::lower_constant_loads(), take into account
source type instead and remove MAX2 (Curro).
- Improve changes to nir_intrinsic_load_ubo case in nir_emit_intrinsic()
(Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
src/mesa/drivers/dri/i965/brw_fs.cpp
src/mesa/drivers/dri/i965/brw_fs_nir.cpp

index 6ef1e236e2f9823e233d6140ed2167abb6a105b6..06a5de1785f47badab2f33d6dcb03fc057e6a8ef 100644
@@ -2249,7 +2249,8 @@ fs_visitor::lower_constant_loads()
          inst->src[i].file = VGRF;
          inst->src[i].nr = dst.nr;
          inst->src[i].reg_offset = 0;
-         inst->src[i].set_smear(pull_index & 3);
+         inst->src[i].set_smear((pull_index & 3) * 4 /
+                                type_sz(inst->src[i].type));
 
          brw_mark_surface_used(prog_data, index);
       }
index 584a0d6bd527e71f72d3f7200574983e460e5466..17eb82ed56f53eb9e6c2d49127801024b96fd27b 100644
@@ -3382,17 +3382,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
          bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, packed_consts,
                   surf_index, const_offset_reg);
 
-         for (unsigned i = 0; i < instr->num_components; i++) {
-            packed_consts.set_smear(const_offset->u32[0] % 16 / 4 + i);
+         const fs_reg consts = byte_offset(packed_consts, const_offset->u32[0] % 16);
 
-            /* The std140 packing rules don't allow vectors to cross 16-byte
-             * boundaries, and a reg is 32 bytes.
-             */
-            assert(packed_consts.subreg_offset < 32);
-
-            bld.MOV(dest, packed_consts);
-            dest = offset(dest, bld, 1);
-         }
+         for (unsigned i = 0; i < instr->num_components; i++)
+            bld.MOV(offset(dest, bld, i), component(consts, i));
       }
       break;
    }