radeonsi: implement 32-bit pointers in user data SGPRs (v2)
User SGPRs changes:
VS: 14 -> 9
TCS: 14 -> 10
TES: 10 -> 6
GS: 8 -> 4
GSCOPY: 2 -> 1
PS: 9 -> 5
Merged VS-TCS: 24 -> 16
Merged VS-GS: 18 -> 11
Merged TES-GS: 18 -> 11
SGPRS:
2170102 ->
2158430 (-0.54 %)
VGPRS:
1645656 ->
1641516 (-0.25 %)
Spilled SGPRs: 9078 -> 8810 (-2.95 %)
Spilled VGPRs: 130 -> 114 (-12.31 %)
Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
Code Size:
52094872 ->
52692540 (1.15 %) bytes
Max Waves: 371848 -> 372723 (0.24 %)
v2: - the shader cache needs to take address32_hi into account
- set amdgpu-32bit-address-high-bits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)