nir: Recognize open-coded extract_u8.
Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
three bytes from an integer and convert each into a float:
float((val >> 16u) & 0xffu)
float((val >> 8u) & 0xffu)
float((val >> 0u) & 0xffu)
Instead of shifting, masking, and type converting like this:
shr(8) g15<1>UD g25<8,8,1>UD 0x00000010UD
and(8) g16<1>UD g15<8,8,1>UD 0x000000ffUD
mov(8) g17<1>F g16<8,8,1>UD
shr(8) g18<1>UD g25<8,8,1>UD 0x00000008UD
and(8) g19<1>UD g18<8,8,1>UD 0x000000ffUD
mov(8) g20<1>F g19<8,8,1>UD
and(8) g21<1>UD g25<8,8,1>UD 0x000000ffUD
mov(8) g22<1>F g21<8,8,1>UD
i965 can simply extract a byte and convert to float in a single
instruction:
mov(8) g17<1>F g25.2<32,8,4>UB
mov(8) g20<1>F g25.1<32,8,4>UB
mov(8) g22<1>F g25.0<32,8,4>UB
This patch implements the first step: recognizing byte extraction. A
later patch will optimize out the conversion to float.
instructions in affected programs: 28568 -> 27450 (-3.91%)
helped: 7
cycles in affected programs: 210076 -> 203144 (-3.30%)
helped: 7
This patch decreases the number of instructions in the two Unigine
programs by:
#1721: 4520 -> 4374 instructions (-3.23%)
#1706: 3752 -> 3582 instructions (-4.53%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>