From 90dfcc6b32c5a635d30cd04fc887a7ff78d3476d Mon Sep 17 00:00:00 2001
From: Thomas Helland <thomashelland90@gmail.com>
Date: Fri, 10 Feb 2017 19:14:32 +0100
Subject: [PATCH] util: Change the pointer hashing function
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

Use our knowledge that pointers are at least 4 byte aligned to remove
the useless digits. Then shift by 6, 10, and 14 bits and add this to
the original pointer, effectively folding in the entropy of the higher
bits of the pointer into a 4-bit section. Stopping at 14 means we can
add the entropy from 18 bits, or at least a 600Kbyte section of memory.
Assuming that ralloc allocates from a linearly allocated heap less than
this we can make a very efficient pointer hashing function for our usecase.
Even if we are not on an architecture that is 4 byte aligned, there is
still a high big chance that the thing we are allocating is at least
8 bytes in size, so even then we will have entropy into the third bit.

The 4 bit increment on the shifts is chosen rather arbitrarily; if we
had chosen a 3 bit increment we would need to add another xor to
cover a decently sized memorypool. Increasing it to 5 bits would
spread our entropy more, possibly hurting us with more collisions on
hash tables of size less than 32. With a hash table of size 16 there
are a max of 11 entries, and we can assume that with such a small table
collisions are not that painfull.

This allows us to hash the whole 32 or 64 bit pointer at once,
instead of running FNV1a, looping through each byte and doing
increments, decrements, muls, and xors on every byte. This cuts
_mesa_hash_data from 1.5 % on profiles, to making _mesa_hash_pointer
show up with a 0.09% share. Collisions on insertion actually seems to be
ever so slightly lower with this hash function, as found by printing
a loop counter and sorting the data.

perf stat shows a 1.5% reduction in instruction count,
and a 5% reduction in stalled cycles. Shader-db runtime goes
from 225 to 220 seconds.

No instruction-count changes in shader-db, but there are some minor
changes in cycle-count that is likely caused by nir walking a set
in some of its passes, and this causing a different ordering.
That might eventually lead to a difference in register allocation.
However, the effect is a net positive;

total cycles in shared programs: 24739550 -> 24738482 (-0.00%)
cycles in affected programs: 374468 -> 373400 (-0.29%)
helped: 178
HURT: 49

Reviewed-by: Marek OlÅ¡Ã¡k <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
---
 src/util/hash_table.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/util/hash_table.h b/src/util/hash_table.h
index b35ee871bb3..c7f577665dc 100644
--- a/src/util/hash_table.h
+++ b/src/util/hash_table.h
@@ -105,7 +105,8 @@ static inline uint32_t _mesa_key_hash_string(const void *key)
 
 static inline uint32_t _mesa_hash_pointer(const void *pointer)
 {
-   return _mesa_hash_data(&pointer, sizeof(pointer));
+   uintptr_t num = (uintptr_t) pointer;
+   return (uint32_t) ((num >> 2) ^ (num >> 6) ^ (num >> 10) ^ (num >> 14));
 }
 
 enum {
-- 
2.30.2