radeonsi: use v_mad_f32 for fma
v_fma_f32 runs at FP64 rate (= slow). Alien Isolation and F1 2015 seem
to use fma for all d3d multiply-add instructions, which is silly.
This tries to restore performance for those games.
The main difference between v_mad_f32 and v_fma_f32 is that v_mad doesn't
support denormals, which we don't enable anyway, because they are slow too.
Also, there is code size reduction:
Totals from affected shaders:
VGPRS: 109796 -> 109808 (0.01 %)
Spilled SGPRs: 29995 -> 30022 (0.09 %)
Spilled VGPRs: 12 -> 13 (8.33 %) <-- it's just one shader going from 12 to 13
Code Size:
6667596 ->
6476356 (-2.87 %) bytes
Max Waves: 26931 -> 26899 (-0.12 %)
I've not actually tested real performance.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>