radeonsi: fix the VGT performance tweak for small instances
author Marek Olšák <marek.olsak@amd.com>
Wed, 7 Sep 2016 23:42:06 +0000 (01:42 +0200)
committer Marek Olšák <marek.olsak@amd.com>
Fri, 9 Sep 2016 20:45:06 +0000 (22:45 +0200)
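Based on the VGT spec: compare the primitive count, not the vertex count,
against the primgroup size, and assume indirect draws always use small
instances.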

The Vulkan driver doesn't do it optimally and they plan to fix it.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
src/gallium/drivers/radeonsi/si_state_draw.c

index d3e6e1ac93762ee596fbb96b3e5e13bb46199173..e44147f43b7e569d17c14f8b115461fda44a0b9b 100644
@@ -318,14 +318,15 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx,
                        wd_switch_on_eop = true;
 
                /* Performance recommendation for 4 SE Gfx7-8 parts if
-                * instances are smaller than a primgroup. Ignore the fact
-                * primgroup_size is a primitive count, not vertex count.
-                * Don't do anything for indirect draws.
+                * instances are smaller than a primgroup.
+                * Assume indirect draws always use small instances.
+                * This is needed for good VS wave utilization.
                 */
                if (sctx->b.chip_class <= VI &&
                    sctx->b.screen->info.max_se >= 4 &&
-                   !info->indirect &&
-                   info->instance_count > 1 && info->count < primgroup_size)
+                   (info->indirect ||
+                    (info->instance_count > 1 &&
+                     si_num_prims_for_vertices(info) < primgroup_size)))
                        wd_switch_on_eop = true;
 
                /* Required on CIK and later. */
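
The changed condition in the hunk above can be restated as a standalone sketch. This is not the Mesa code: every name below (sketch_prim, sketch_num_prims, sketch_small_instance_tweak) is hypothetical, the primitive table covers only four types, and the vertex-to-primitive conversion is an assumption about what si_num_prims_for_vertices does for those types. It only illustrates why a vertex count and a primitive count give different answers when compared against primgroup_size.

```c
#include <stdbool.h>

/* Hypothetical subset of primitive types, for illustration only. */
enum sketch_prim {
	SKETCH_PRIM_POINTS,
	SKETCH_PRIM_LINES,
	SKETCH_PRIM_TRIANGLES,
	SKETCH_PRIM_TRIANGLE_STRIP,
};

/* Assumed stand-in for si_num_prims_for_vertices(): convert a vertex
 * count into the number of primitives it forms. */
static unsigned sketch_num_prims(enum sketch_prim mode, unsigned count)
{
	switch (mode) {
	case SKETCH_PRIM_POINTS:
		return count;
	case SKETCH_PRIM_LINES:
		return count / 2;
	case SKETCH_PRIM_TRIANGLES:
		return count / 3;
	case SKETCH_PRIM_TRIANGLE_STRIP:
		return count >= 3 ? count - 2 : 0;
	}
	return 0;
}

/* Mirrors the shape of the new condition: on 4 SE parts up to VI,
 * treat every indirect draw as "small instances", and otherwise
 * compare the per-instance primitive count with the primgroup size. */
static bool sketch_small_instance_tweak(bool is_vi_or_older, unsigned max_se,
					bool indirect, unsigned instance_count,
					enum sketch_prim mode,
					unsigned vertex_count,
					unsigned primgroup_size)
{
	if (!is_vi_or_older || max_se < 4)
		return false;
	if (indirect)
		return true;
	return instance_count > 1 &&
	       sketch_num_prims(mode, vertex_count) < primgroup_size;
}
```

For example, with an assumed primgroup_size of 128, an instanced draw of 300 triangle-list vertices forms 100 primitives per instance. The old comparison (300 < 128) would skip the tweak, while the primitive-count comparison (100 < 128) enables it.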