x86: reduce AVX512 FP set of insns decoded through vex_w_table[]
Like for AVX512-FP16, there's not that many FP insns where going through
this table is easier / cheaper than using suitable macros. Utilize %XS
and %XD more to eliminate a fair number of table entries.
While doing this I noticed a few anomalies. Where lines get touched /
moved anyway, these are being addressed right here:
- vmovshdup used EXx for its 2nd operand, thus displaying seemingly
valid broadcast when EVEX.b is set with a memory operand; use
EXEvexXNoBcst instead just like vmovsldup already does
- vmovlhps used EXx for its 3rd operand, when all sibling entries use
EXq; switch to EXq there for consistency (the two differ only for
memory operands)