x86: various operations on mask registers can avoid going through mod_table[]
Now that we have OP_R(), use it here as well, while wiring memory-only
operands to OP_M() at the same time. To keep the number of consumed
opcode bytes similar to before, make BadOp() also account for VEX/XOP/
EVEX prefix bytes. To keep that change simple, convert need_vex to an
actual count of prefix bytes (keeping intact all prior boolean uses of
the field).
Note how this improves disassembly of such bad encodings, by at least
leaving a hint towards what a "nearby" instruction is. (For KSHIFT*
change the immediates test testcases use, such that disassembly remains
sufficiently in sync.)
While there also use Ux for VPMOV{B,W,D,Q}2M, where decoding through
mod_table[] was missing in the earlier scheme.