x86: fold (supposed to be) identical code
The Q and L suffix exclusion checks in match_template() ought to be
(kept) in sync as far as their FPU and SIMD aspects go. This was
already violated by only the Q one checking for active broadcast.
Convert the code such that there'll be only one instance of the logic,
the more that subsequently the logic is liable to need further
refinement / extension. (The alternative would be to drop all SIMD-ness
from the L part, but it is in principle possible to enable all sorts of
SIMD support with just a pre-386 CPU, via suitable .arch directives.)