radeonsi: load the right number of components for VS inputs and TBOs
The supported component counts are 1, 2, and 4; a 3-component load is
rounded up to 4.
The following shader disassembly loads a float, a vec2, a vec3, and a vec4:
Before:
    buffer_load_format_x v9, v4, s[0:3], 0 idxen ; E0002000 80000904
    buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen ; E00C2000 80020005
    s_waitcnt vmcnt(0) ; BF8C0F70
    buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ; E00C2000 80030206
    s_waitcnt vmcnt(0) ; BF8C0F70
    buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen ; E00C2000 80010507
After:
    buffer_load_format_x v10, v4, s[0:3], 0 idxen ; E0002000 80000A04
    buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen ; E0042000 80020805
    buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ; E00C2000 80030006
    s_waitcnt vmcnt(0) ; BF8C0F70
    buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen ; E00C2000 80010307
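The rounding rule stated at the top (1, 2, or 4 components, with 3 loaded
as 4) can be sketched in C as follows. This is only an illustration of the
rule, not the code in this change; the helper name
round_up_load_components is hypothetical and does not exist in radeonsi.

```c
#include <assert.h>

/* Hypothetical helper (assumption, not part of radeonsi): map the number
 * of components a shader actually uses to the number of components that
 * get loaded. Only 1, 2, and 4 are supported, so 3 becomes 4. */
static unsigned round_up_load_components(unsigned num_used)
{
    assert(num_used >= 1 && num_used <= 4);
    return num_used == 3 ? 4 : num_used;
}
```

Under this rule a vec2 needs only a 2-component load and a vec3 still needs
a 4-component load, which matches the format_xy and format_xyzw
instructions in the "After" listing.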
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>