slp: Support optimizing load distribution

This introduces a post-processing step for the pattern matcher to flatten
permutes introduced by the complex multiplication patterns.

This performs a blend early so that SLP is not cancelled by the LOAD_LANES
permute.  This is a temporary workaround for the fact that loads are not CSEd
during SLP build, and it is required to produce efficient code.
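
For illustration only, here is a sketch of the kind of kernel the complex
multiplication patterns are aimed at.  The function name cmul_interleaved and
the interleaved real/imaginary layout are assumptions made for this example;
the kernel is not taken from the patch or its testsuite.  Each input element
is used twice, once per output lane, which is the sort of duplicated,
differently permuted load that the new step is meant to flatten.

```c
#include <stddef.h>

/* Hypothetical example kernel: multiply n complex numbers stored as
   interleaved (real, imag) pairs.  Not part of the patch.  */
void
cmul_interleaved (float *restrict c, const float *restrict a,
		  const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Even lanes hold the real parts, odd lanes the imaginary parts.  */
      float ar = a[2 * i], ai = a[2 * i + 1];
      float br = b[2 * i], bi = b[2 * i + 1];

      /* (ar + ai*i) * (br + bi*i): every loaded value feeds both lanes.  */
      c[2 * i] = ar * br - ai * bi;
      c[2 * i + 1] = ar * bi + ai * br;
    }
}
```

Whether a given target vectorizes this loop through the complex
multiplication patterns depends on the available instructions and options,
so treat it as a starting point for experiments rather than a guaranteed
reproducer.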

gcc/ChangeLog:

	* tree-vect-slp.c (optimize_load_redistribution_1): New.
	(optimize_load_redistribution, vect_is_slp_load_node): New.
	(vect_match_slp_patterns): Use it.