tgsi: SSE2 optimized exp2, log2 and pow implementations.
Special care must be taken when calling compiler generated SSE2 functions
from the runtime generated SSE2: saving the xmm registers, and notify gcc
the stack is not 16byte aligned.
It would be more efficient to keep the stack pointer 16byte aligned, but
too hairy, and not consistent in all x86 architectures.
This has been tested in linux x86 and windows x86 userspace. Not tested on
x86-64 because it is broken for other reasons (even without this change).