Initially we had NUM_TEX_TILE_ENTRIES of 50, however this was using too much
memory (mostly because the tile cache is operating on fixed max current
sampler views which could be fixed but that's another topic). So it was
decreased to 4. However this is a ridiculously low number which can't
actually really work (the number of tiles needed for as little as
a single quad with linear_mipmap_linear is 2 to 8 for a 2d texture, and
4 to 16 for a 3d texture), as it just about guarantees there will be
cache thrashing sometimes (just about always for 3d textures in fact, since
while there are 4 entries the cache is direct mapped).
So increase that number to 16 (which is still on the low side for direct
mapped cache though I guess using something like 4-way associativity would
be more effective than increasing this further) which has at least some good
chance to avoid thrashing. Since we don't want to increase memory requirements
however in turn decrease the tile size accordingly from 64 to 32 (as a bonus
point this also decreases the cost of texture thrashing which might still
happen sometimes).
I've seen performance improvement in the order of factor ~200 (specifically,
drawing the first frame from the replay from bug 41787 needs "only" ~10s
instead of ~30min, meaning I can actually compare the output with other
drivers...) with this.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
/**
* Cache tile size (width and height). This needs to be a power of two.
*/
-#define TEX_TILE_SIZE_LOG2 6
+#define TEX_TILE_SIZE_LOG2 5
#define TEX_TILE_SIZE (1 << TEX_TILE_SIZE_LOG2)
} data;
};
-#define NUM_TEX_TILE_ENTRIES 4
+/*
+ * The number of cache entries.
+ * Should not be decreased to lower than 16, and even that
+ * seems too low to avoid cache thrashing in some cases (because
+ * the cache is direct mapped, see tex_cache_pos() function).
+ */
+#define NUM_TEX_TILE_ENTRIES 16
struct softpipe_tex_tile_cache
{