From: Jakub Jelinek Date: Wed, 6 May 2020 18:05:02 +0000 (+0200) Subject: x86: Fix vextract* masked patterns [PR93069] X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=319eafce3e54c8cb10e3fddce6823a6a558fca8b;p=gcc.git x86: Fix vextract* masked patterns [PR93069] The AVX512F documentation clearly states that in instructions where the destination is a memory only merging-masking is possible, not zero-masking, and the assembler enforces that. The testcase in this patch fails to assemble because of Error: unsupported masking for `vextracti32x8' on vextracti32x8 $0x0, %zmm1, -64(%rsp){%k1}{z} For the vector extraction patterns, we apparently have 7 *_maskm patterns that only accept memory destinations and rtx_equal_p merge-masking source for it, 7 * corresponding patterns that allow memory destination only for the non-masked cases (through ), then 2 * patterns (lo ssehalf V16FI and lo ssehalf VI8F_256 ones) which do allow memory destination even for masked cases and are the cause of the testsuite failure, because we must not allow C constraint if the destination is m, and finally one pair of patterns (separate * and *_mask, hi ssehalf VI4F_256), which has another issue (for which I don't have a testcase though), where if it would match zero-masking with register destination, it wouldn't emit the needed {z} into assembly. The attached patch fixes those 3 issues only, perhaps more suitable for backporting. But, even with that fixed, we are missing 3 further *_maskm patterns and more importantly, I find the split into 3 separate patterns after subst, *_maskm for masking with memory destination, *_mask for masking with register destination and * for non-masking unnecessarily complex and harder for reload, so the included patch below (non-attached) instead kills all *_maskm patterns and splits the * patterns into * and *_mask by hand instead of subst, where the *_mask ones make sure that with v destination they use 0C, while with m destination they use 0 and as condition enforce that either destination is not MEM, or rtx_equal_p between the destination and corresponding merging-masking operand source. If we had those 3 missing *_maskm patterns, this patch would actually result in both shorter sse.md and shorter machine description after subst (e.g. length of tmp-mddump.md), as we don't have them, the patch is actually 16 lines longer sse.md, but still shorter tmp-mddump.md. 2020-05-06 Jakub Jelinek PR target/93069 * config/i386/subst.md (store_mask_constraint, store_mask_predicate): Remove. (avx512dq_vextract64x2_1_maskm, avx512f_vextract32x4_1_maskm, vec_extract_lo__maskm, vec_extract_hi__maskm): Remove. (avx512dq_vextract64x2_1): Split into ... (*avx512dq_vextract64x2_1, avx512dq_vextract64x2_1_mask): ... these new define_insns. Even in the masked variant allow memory output but in that case use 0 rather than 0C constraint on the source of masked-out elts. (avx512f_vextract32x4_1): Split into ... (*avx512f_vextract32x4_1, avx512f_vextract32x4_1_mask): ... these new define_insns. Even in the masked variant allow memory output but in that case use 0 rather than 0C constraint on the source of masked-out elts. (vec_extract_lo_): Split into ... (vec_extract_lo_, vec_extract_lo__mask): ... these new define_insns. Even in the masked variant allow memory output but in that case use 0 rather than 0C constraint on the source of masked-out elts. (vec_extract_hi_): Split into ... (vec_extract_hi_, vec_extract_hi__mask): ... these new define_insns. Even in the masked variant allow memory output but in that case use 0 rather than 0C constraint on the source of masked-out elts. --- diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 18800ec605a..1afb7824fa5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,35 @@ +2020-05-06 Jakub Jelinek + + PR target/93069 + * config/i386/subst.md (store_mask_constraint, store_mask_predicate): + Remove. + (avx512dq_vextract64x2_1_maskm, + avx512f_vextract32x4_1_maskm, + vec_extract_lo__maskm, vec_extract_hi__maskm): Remove. + (avx512dq_vextract64x2_1): Split + into ... + (*avx512dq_vextract64x2_1, + avx512dq_vextract64x2_1_mask): ... these new + define_insns. Even in the masked variant allow memory output but in + that case use 0 rather than 0C constraint on the source of masked-out + elts. + (avx512f_vextract32x4_1): Split + into ... + (*avx512f_vextract32x4_1, + avx512f_vextract32x4_1_mask): ... these new define_insns. + Even in the masked variant allow memory output but in that case use + 0 rather than 0C constraint on the source of masked-out elts. + (vec_extract_lo_): Split into ... + (vec_extract_lo_, vec_extract_lo__mask): ... these new + define_insns. Even in the masked variant allow memory output but in + that case use 0 rather than 0C constraint on the source of masked-out + elts. + (vec_extract_hi_): Split into ... + (vec_extract_hi_, vec_extract_hi__mask): ... these new + define_insns. Even in the masked variant allow memory output but in + that case use 0 rather than 0C constraint on the source of masked-out + elts. + 2020-05-06 qing zhao PR c/94230 @@ -111,27 +143,27 @@ 2020-05-06 Hongtao Liu Wei Xiao - * gcc/common/config/i386/i386-common.c (OPTION_MASK_ISA2_SERIALIZE_SET, + * common/config/i386/i386-common.c (OPTION_MASK_ISA2_SERIALIZE_SET, OPTION_MASK_ISA2_SERIALIZE_UNSET): New macros. (ix86_handle_option): Handle -mserialize. - * gcc/config.gcc (serializeintrin.h): New header file. - * gcc/config/i386/cpuid.h (bit_SERIALIZE): New bit. - * gcc/config/i386/driver-i386.c (host_detect_local_cpu): Detect + * config.gcc (serializeintrin.h): New header file. + * config/i386/cpuid.h (bit_SERIALIZE): New bit. + * config/i386/driver-i386.c (host_detect_local_cpu): Detect -mserialize. - * gcc/config/i386/i386-builtin.def: Add new builtin. - * gcc/config/i386/i386-c.c (__SERIALIZE__): New macro. - * gcc/config/i386/i386-options.c (ix86_target_opts_isa2_opts): + * config/i386/i386-builtin.def: Add new builtin. + * config/i386/i386-c.c (__SERIALIZE__): New macro. + * config/i386/i386-options.c (ix86_target_opts_isa2_opts): Add -mserialize. * (ix86_valid_target_attribute_inner_p): Add target attribute * for serialize. - * gcc/config/i386/i386.h (TARGET_SERIALIZE, TARGET_SERIALIZE_P): + * config/i386/i386.h (TARGET_SERIALIZE, TARGET_SERIALIZE_P): New macros. - * gcc/config/i386/i386.md (UNSPECV_SERIALIZE): New unspec. + * config/i386/i386.md (UNSPECV_SERIALIZE): New unspec. (serialize): New define_insn. - * gcc/config/i386/i386.opt (mserialize): New option - * gcc/config/i386/immintrin.h: Include serailizeintrin.h. - * gcc/config/i386/serializeintrin.h: New header file. - * gcc/doc/invoke.texi: Add documents for -mserialize. + * config/i386/i386.opt (mserialize): New option + * config/i386/immintrin.h: Include serailizeintrin.h. + * config/i386/serializeintrin.h: New header file. + * doc/invoke.texi: Add documents for -mserialize. 2020-05-06 Richard Biener @@ -144,7 +176,7 @@ private branch. * config/rs6000/rs6000-c.c: Likewise. * config/rs6000/rs6000-call.c: Likewise. - * gcc/config/rs6000/rs6000.c: Likewise. + * config/rs6000/rs6000.c: Likewise. 2020-05-05 Sebastian Huber @@ -865,7 +897,7 @@ 2020-04-28 Alexandre Oliva PR target/94812 - * gcc/config/rs6000/rs6000.md (rs6000_mffsl): Copy result to + * config/rs6000/rs6000.md (rs6000_mffsl): Copy result to output operand in emulation. Don't overwrite pseudos. 2020-04-28 Jeff Law @@ -1120,7 +1152,7 @@ 2020-04-23 Bill Schmidt - * gcc/doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions): + * doc/extend.texi (PowerPC AltiVec/VSX Built-in Functions): Replace outdated link to ELFv2 ABI. 2020-04-23 Jakub Jelinek @@ -2700,7 +2732,7 @@ 2020-03-29 John David Anglin - * gcc/config/pa/pa.c (pa_asm_output_aligned_bss): Delete duplicate + * config/pa/pa.c (pa_asm_output_aligned_bss): Delete duplicate .align output. 2020-03-28 Jakub Jelinek @@ -3192,7 +3224,7 @@ 2020-03-21 Iain Sandoe PR target/93694 - * gcc/config/darwin.opt: Amend options descriptions. + * config/darwin.opt: Amend options descriptions. 2020-03-21 Richard Sandiford @@ -3214,7 +3246,7 @@ 2020-03-20 Carl Love PR/target 87583 - * gcc/config/rs6000/rs6000.c (rs6000_option_override_internal): + * config/rs6000/rs6000.c (rs6000_option_override_internal): Add check for TARGET_FPRND for Power 7 or newer. 2020-03-20 Jan Hubicka @@ -10798,7 +10830,7 @@ 2020-03-10 Jiufu Guo PR target/93709 - * gcc/config/rs6000/rs6000.c (rs6000_emit_p9_fp_minmax): Check + * config/rs6000/rs6000.c (rs6000_emit_p9_fp_minmax): Check NAN and SIGNED_ZEROR for smax/smin. 2020-03-10 Will Schmidt @@ -11856,9 +11888,9 @@ 2020-02-21 John David Anglin - * gcc/config/pa/pa.c (pa_function_value): Fix check for word and + * config/pa/pa.c (pa_function_value): Fix check for word and double-word size when handling aggregate return values. - * gcc/config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Fix to indicate + * config/pa/som.h (ASM_DECLARE_FUNCTION_NAME): Fix to indicate that homogeneous SFmode and DFmode aggregates are passed and returned in general registers. @@ -13983,7 +14015,7 @@ 2020-01-21 Mihail-Calin Ionescu - * gcc/config/arm/arm.c (clear_operation_p): + * config/arm/arm.c (clear_operation_p): Initialise last_regno, skip first iteration based on the first_set value and use ints instead of the unnecessary HOST_WIDE_INTs. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 0d69c9eb903..7a7ecd4be87 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -8323,60 +8323,31 @@ DONE; }) -(define_insn "avx512dq_vextract64x2_1_maskm" - [(set (match_operand: 0 "memory_operand" "=m") +(define_insn "avx512dq_vextract64x2_1_mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") (vec_merge: (vec_select: - (match_operand:V8FI 1 "register_operand" "v") - (parallel [(match_operand 2 "const_0_to_7_operand") - (match_operand 3 "const_0_to_7_operand")])) - (match_operand: 4 "memory_operand" "0") - (match_operand:QI 5 "register_operand" "Yk")))] + (match_operand:V8FI 1 "register_operand" "v,v") + (parallel [(match_operand 2 "const_0_to_7_operand") + (match_operand 3 "const_0_to_7_operand")])) + (match_operand: 4 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 5 "register_operand" "Yk,Yk")))] "TARGET_AVX512DQ && INTVAL (operands[2]) % 2 == 0 && INTVAL (operands[2]) == INTVAL (operands[3]) - 1 - && rtx_equal_p (operands[4], operands[0])" -{ - operands[2] = GEN_INT ((INTVAL (operands[2])) >> 1); - return "vextract64x2\t{%2, %1, %0%{%5%}|%0%{%5%}, %1, %2}"; -} - [(set_attr "type" "sselog") - (set_attr "prefix_extra" "1") - (set_attr "length_immediate" "1") - (set_attr "memory" "store") - (set_attr "prefix" "evex") - (set_attr "mode" "")]) - -(define_insn "avx512f_vextract32x4_1_maskm" - [(set (match_operand: 0 "memory_operand" "=m") - (vec_merge: - (vec_select: - (match_operand:V16FI 1 "register_operand" "v") - (parallel [(match_operand 2 "const_0_to_15_operand") - (match_operand 3 "const_0_to_15_operand") - (match_operand 4 "const_0_to_15_operand") - (match_operand 5 "const_0_to_15_operand")])) - (match_operand: 6 "memory_operand" "0") - (match_operand:QI 7 "register_operand" "Yk")))] - "TARGET_AVX512F - && INTVAL (operands[2]) % 4 == 0 - && INTVAL (operands[2]) == INTVAL (operands[3]) - 1 - && INTVAL (operands[3]) == INTVAL (operands[4]) - 1 - && INTVAL (operands[4]) == INTVAL (operands[5]) - 1 - && rtx_equal_p (operands[6], operands[0])" + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[4]))" { - operands[2] = GEN_INT (INTVAL (operands[2]) >> 2); - return "vextract32x4\t{%2, %1, %0%{%7%}|%0%{%7%}, %1, %2}"; + operands[2] = GEN_INT (INTVAL (operands[2]) >> 1); + return "vextract64x2\t{%2, %1, %0%{%5%}%N4|%0%{%5%}%N4, %1, %2}"; } - [(set_attr "type" "sselog") + [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") - (set_attr "memory" "store") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "avx512dq_vextract64x2_1" - [(set (match_operand: 0 "" "=") +(define_insn "*avx512dq_vextract64x2_1" + [(set (match_operand: 0 "nonimmediate_operand" "=vm") (vec_select: (match_operand:V8FI 1 "register_operand" "v") (parallel [(match_operand 2 "const_0_to_7_operand") @@ -8386,7 +8357,7 @@ && INTVAL (operands[2]) == INTVAL (operands[3]) - 1" { operands[2] = GEN_INT (INTVAL (operands[2]) >> 1); - return "vextract64x2\t{%2, %1, %0|%0, %1, %2}"; + return "vextract64x2\t{%2, %1, %0|%0, %1, %2}"; } [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") @@ -8415,14 +8386,41 @@ operands[1] = gen_lowpart (mode, operands[1]); }) -(define_insn "avx512f_vextract32x4_1" - [(set (match_operand: 0 "" "=") +(define_insn "avx512f_vextract32x4_1_mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") + (vec_merge: + (vec_select: + (match_operand:V16FI 1 "register_operand" "v,v") + (parallel [(match_operand 2 "const_0_to_15_operand") + (match_operand 3 "const_0_to_15_operand") + (match_operand 4 "const_0_to_15_operand") + (match_operand 5 "const_0_to_15_operand")])) + (match_operand: 6 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 7 "register_operand" "Yk,Yk")))] + "TARGET_AVX512F + && INTVAL (operands[2]) % 4 == 0 + && INTVAL (operands[2]) == INTVAL (operands[3]) - 1 + && INTVAL (operands[3]) == INTVAL (operands[4]) - 1 + && INTVAL (operands[4]) == INTVAL (operands[5]) - 1 + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[6]))" +{ + operands[2] = GEN_INT (INTVAL (operands[2]) >> 2); + return "vextract32x4\t{%2, %1, %0%{%7%}%N6|%0%{%7%}%N6, %1, %2}"; +} + [(set_attr "type" "sselog1") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "*avx512f_vextract32x4_1" + [(set (match_operand: 0 "nonimmediate_operand" "=vm") (vec_select: (match_operand:V16FI 1 "register_operand" "v") - (parallel [(match_operand 2 "const_0_to_15_operand") - (match_operand 3 "const_0_to_15_operand") - (match_operand 4 "const_0_to_15_operand") - (match_operand 5 "const_0_to_15_operand")])))] + (parallel [(match_operand 2 "const_0_to_15_operand") + (match_operand 3 "const_0_to_15_operand") + (match_operand 4 "const_0_to_15_operand") + (match_operand 5 "const_0_to_15_operand")])))] "TARGET_AVX512F && INTVAL (operands[2]) % 4 == 0 && INTVAL (operands[2]) == INTVAL (operands[3]) - 1 @@ -8430,7 +8428,7 @@ && INTVAL (operands[4]) == INTVAL (operands[5]) - 1" { operands[2] = GEN_INT (INTVAL (operands[2]) >> 2); - return "vextract32x4\t{%2, %1, %0|%0, %1, %2}"; + return "vextract32x4\t{%2, %1, %0|%0, %1, %2}"; } [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") @@ -8514,35 +8512,35 @@ [(set (match_dup 0) (match_dup 1))] "operands[1] = gen_lowpart (mode, operands[1]);") -(define_insn "vec_extract_lo__maskm" - [(set (match_operand: 0 "memory_operand" "=m") +(define_insn "vec_extract_lo__mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") (vec_merge: (vec_select: - (match_operand:V8FI 1 "register_operand" "v") + (match_operand:V8FI 1 "register_operand" "v,v") (parallel [(const_int 0) (const_int 1) - (const_int 2) (const_int 3)])) - (match_operand: 2 "memory_operand" "0") - (match_operand:QI 3 "register_operand" "Yk")))] + (const_int 2) (const_int 3)])) + (match_operand: 2 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 3 "register_operand" "Yk,Yk")))] "TARGET_AVX512F - && rtx_equal_p (operands[2], operands[0])" - "vextract64x4\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}" + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))" + "vextract64x4\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}" [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") + (set_attr "memory" "none,store") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "vec_extract_lo_" - [(set (match_operand: 0 "" "=v,,v") +(define_insn "vec_extract_lo_" + [(set (match_operand: 0 "nonimmediate_operand" "=v,vm,v") (vec_select: - (match_operand:V8FI 1 "" "v,v,") + (match_operand:V8FI 1 "nonimmediate_operand" "v,v,vm") (parallel [(const_int 0) (const_int 1) - (const_int 2) (const_int 3)])))] - "TARGET_AVX512F - && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" + (const_int 2) (const_int 3)])))] + "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))" { - if ( || (!TARGET_AVX512VL && !MEM_P (operands[1]))) - return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; + if (!TARGET_AVX512VL && !MEM_P (operands[1])) + return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; else return "#"; } @@ -8553,70 +8551,69 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "vec_extract_hi__maskm" - [(set (match_operand: 0 "memory_operand" "=m") +(define_insn "vec_extract_hi__mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") (vec_merge: (vec_select: - (match_operand:V8FI 1 "register_operand" "v") + (match_operand:V8FI 1 "register_operand" "v,v") (parallel [(const_int 4) (const_int 5) - (const_int 6) (const_int 7)])) - (match_operand: 2 "memory_operand" "0") - (match_operand:QI 3 "register_operand" "Yk")))] + (const_int 6) (const_int 7)])) + (match_operand: 2 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 3 "register_operand" "Yk,Yk")))] "TARGET_AVX512F - && rtx_equal_p (operands[2], operands[0])" - "vextract64x4\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}" - [(set_attr "type" "sselog") + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))" + "vextract64x4\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}" + [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") - (set_attr "memory" "store") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "vec_extract_hi_" - [(set (match_operand: 0 "" "=") +(define_insn "vec_extract_hi_" + [(set (match_operand: 0 "nonimmediate_operand" "=vm") (vec_select: (match_operand:V8FI 1 "register_operand" "v") (parallel [(const_int 4) (const_int 5) - (const_int 6) (const_int 7)])))] + (const_int 6) (const_int 7)])))] "TARGET_AVX512F" - "vextract64x4\t{$0x1, %1, %0|%0, %1, 0x1}" + "vextract64x4\t{$0x1, %1, %0|%0, %1, 0x1}" [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "vec_extract_hi__maskm" - [(set (match_operand: 0 "memory_operand" "=m") +(define_insn "vec_extract_hi__mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") (vec_merge: (vec_select: - (match_operand:V16FI 1 "register_operand" "v") + (match_operand:V16FI 1 "register_operand" "v,v") (parallel [(const_int 8) (const_int 9) - (const_int 10) (const_int 11) - (const_int 12) (const_int 13) - (const_int 14) (const_int 15)])) - (match_operand: 2 "memory_operand" "0") - (match_operand:QI 3 "register_operand" "Yk")))] + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)])) + (match_operand: 2 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 3 "register_operand" "Yk,Yk")))] "TARGET_AVX512DQ - && rtx_equal_p (operands[2], operands[0])" - "vextract32x8\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}" + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))" + "vextract32x8\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}" [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "vec_extract_hi_" - [(set (match_operand: 0 "" "=,vm") +(define_insn "vec_extract_hi_" + [(set (match_operand: 0 "nonimmediate_operand" "=vm,vm") (vec_select: (match_operand:V16FI 1 "register_operand" "v,v") (parallel [(const_int 8) (const_int 9) - (const_int 10) (const_int 11) - (const_int 12) (const_int 13) - (const_int 14) (const_int 15)])))] - "TARGET_AVX512F && " + (const_int 10) (const_int 11) + (const_int 12) (const_int 13) + (const_int 14) (const_int 15)])))] + "TARGET_AVX512F" "@ - vextract32x8\t{$0x1, %1, %0|%0, %1, 0x1} + vextract32x8\t{$0x1, %1, %0|%0, %1, 0x1} vextracti64x4\t{$0x1, %1, %0|%0, %1, 0x1}" [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") @@ -8692,27 +8689,44 @@ DONE; }) -(define_insn "vec_extract_lo_" - [(set (match_operand: 0 "" - "=v,v,") +(define_insn "vec_extract_lo__mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") + (vec_merge: + (vec_select: + (match_operand:V16FI 1 "register_operand" "v,v") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])) + (match_operand: 2 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 3 "register_operand" "Yk,Yk")))] + "TARGET_AVX512DQ + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))" + "vextract32x8\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}" + [(set_attr "type" "sselog1") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "memory" "none,store") + (set_attr "prefix" "evex") + (set_attr "mode" "")]) + +(define_insn "vec_extract_lo_" + [(set (match_operand: 0 "nonimmediate_operand" "=v,v,m") (vec_select: - (match_operand:V16FI 1 "" - "v,,v") + (match_operand:V16FI 1 "nonimmediate_operand" "v,m,v") (parallel [(const_int 0) (const_int 1) - (const_int 2) (const_int 3) - (const_int 4) (const_int 5) - (const_int 6) (const_int 7)])))] + (const_int 2) (const_int 3) + (const_int 4) (const_int 5) + (const_int 6) (const_int 7)])))] "TARGET_AVX512F - && - && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" + && !(MEM_P (operands[0]) && MEM_P (operands[1]))" { - if ( - || (!TARGET_AVX512VL - && !REG_P (operands[0]) - && EXT_REX_SSE_REG_P (operands[1]))) + if (!TARGET_AVX512VL + && !REG_P (operands[0]) + && EXT_REX_SSE_REG_P (operands[1])) { if (TARGET_AVX512DQ) - return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; + return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; else return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; } @@ -8750,29 +8764,34 @@ operands[1] = gen_lowpart (mode, operands[1]); }) -(define_insn "vec_extract_lo_" - [(set (match_operand: 0 "" - "=v,v,") - (vec_select: - (match_operand:VI8F_256 1 "" - "v,,v") - (parallel [(const_int 0) (const_int 1)])))] - "TARGET_AVX - && && - && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" -{ - if () - return "vextract64x2\t{$0x0, %1, %0|%0, %1, 0x0}"; - else - return "#"; -} +(define_insn "vec_extract_lo__mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") + (vec_merge: + (vec_select: + (match_operand:VI8F_256 1 "register_operand" "v,v") + (parallel [(const_int 0) (const_int 1)])) + (match_operand: 2 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 3 "register_operand" "Yk,Yk")))] + "TARGET_AVX512DQ + && TARGET_AVX512VL + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))" + "vextract64x2\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}" [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") - (set_attr "memory" "none,load,store") + (set_attr "memory" "none,store") (set_attr "prefix" "evex") (set_attr "mode" "XI")]) +(define_insn "vec_extract_lo_" + [(set (match_operand: 0 "nonimmediate_operand" "=vm,v") + (vec_select: + (match_operand:VI8F_256 1 "nonimmediate_operand" "v,vm") + (parallel [(const_int 0) (const_int 1)])))] + "TARGET_AVX + && !(MEM_P (operands[0]) && MEM_P (operands[1]))" + "#") + (define_split [(set (match_operand: 0 "nonimmediate_operand") (vec_select: @@ -8783,20 +8802,38 @@ [(set (match_dup 0) (match_dup 1))] "operands[1] = gen_lowpart (mode, operands[1]);") -(define_insn "vec_extract_hi_" - [(set (match_operand: 0 "" "=v,") +(define_insn "vec_extract_hi__mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") + (vec_merge: + (vec_select: + (match_operand:VI8F_256 1 "register_operand" "v,v") + (parallel [(const_int 2) (const_int 3)])) + (match_operand: 2 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 3 "register_operand" "Yk,Yk")))] + "TARGET_AVX512DQ + && TARGET_AVX512VL + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))" + "vextract64x2\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}" + [(set_attr "type" "sselog1") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "vex") + (set_attr "mode" "")]) + +(define_insn "vec_extract_hi_" + [(set (match_operand: 0 "nonimmediate_operand" "=vm") (vec_select: - (match_operand:VI8F_256 1 "register_operand" "v,v") + (match_operand:VI8F_256 1 "register_operand" "v") (parallel [(const_int 2) (const_int 3)])))] - "TARGET_AVX && && " + "TARGET_AVX" { if (TARGET_AVX512VL) - { - if (TARGET_AVX512DQ) - return "vextract64x2\t{$0x1, %1, %0|%0, %1, 0x1}"; - else - return "vextract32x4\t{$0x1, %1, %0|%0, %1, 0x1}"; - } + { + if (TARGET_AVX512DQ) + return "vextract64x2\t{$0x1, %1, %0|%0, %1, 0x1}"; + else + return "vextract32x4\t{$0x1, %1, %0|%0, %1, 0x1}"; + } else return "vextract\t{$0x1, %1, %0|%0, %1, 0x1}"; } @@ -8817,74 +8854,50 @@ [(set (match_dup 0) (match_dup 1))] "operands[1] = gen_lowpart (mode, operands[1]);") -(define_insn "vec_extract_lo_" - [(set (match_operand: 0 "" - "=,v") - (vec_select: - (match_operand:VI4F_256 1 "" - "v,") - (parallel [(const_int 0) (const_int 1) - (const_int 2) (const_int 3)])))] - "TARGET_AVX - && - && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" -{ - if () - return "vextract32x4\t{$0x0, %1, %0|%0, %1, 0x0}"; - else - return "#"; -} - [(set_attr "type" "sselog1") - (set_attr "prefix_extra" "1") - (set_attr "length_immediate" "1") - (set_attr "prefix" "evex") - (set_attr "mode" "")]) - -(define_insn "vec_extract_lo__maskm" - [(set (match_operand: 0 "memory_operand" "=m") +(define_insn "vec_extract_lo__mask" + [(set (match_operand: 0 "nonimmediate_operand" "=v,m") (vec_merge: (vec_select: - (match_operand:VI4F_256 1 "register_operand" "v") + (match_operand:VI4F_256 1 "register_operand" "v,v") (parallel [(const_int 0) (const_int 1) - (const_int 2) (const_int 3)])) - (match_operand: 2 "memory_operand" "0") - (match_operand:QI 3 "register_operand" "Yk")))] - "TARGET_AVX512VL && TARGET_AVX512F - && rtx_equal_p (operands[2], operands[0])" - "vextract32x4\t{$0x0, %1, %0%{%3%}|%0%{%3%}, %1, 0x0}" + (const_int 2) (const_int 3)])) + (match_operand: 2 "nonimm_or_0_operand" "0C,0") + (match_operand:QI 3 "register_operand" "Yk,Yk")))] + "TARGET_AVX512VL + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))" + "vextract32x4\t{$0x0, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x0}" [(set_attr "type" "sselog1") (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) -(define_insn "vec_extract_hi__maskm" - [(set (match_operand: 0 "memory_operand" "=m") - (vec_merge: - (vec_select: - (match_operand:VI4F_256 1 "register_operand" "v") - (parallel [(const_int 4) (const_int 5) - (const_int 6) (const_int 7)])) - (match_operand: 2 "memory_operand" "0") - (match_operand: 3 "register_operand" "Yk")))] - "TARGET_AVX512F && TARGET_AVX512VL - && rtx_equal_p (operands[2], operands[0])" - "vextract32x4\t{$0x1, %1, %0%{%3%}|%0%{%3%}, %1, 0x1}" +(define_insn "vec_extract_lo_" + [(set (match_operand: 0 "nonimmediate_operand" "=vm,v") + (vec_select: + (match_operand:VI4F_256 1 "nonimmediate_operand" "v,vm") + (parallel [(const_int 0) (const_int 1) + (const_int 2) (const_int 3)])))] + "TARGET_AVX + && !(MEM_P (operands[0]) && MEM_P (operands[1]))" + "#" [(set_attr "type" "sselog1") + (set_attr "prefix_extra" "1") (set_attr "length_immediate" "1") (set_attr "prefix" "evex") (set_attr "mode" "")]) (define_insn "vec_extract_hi__mask" - [(set (match_operand: 0 "register_operand" "=v") + [(set (match_operand: 0 "register_operand" "=v,m") (vec_merge: (vec_select: - (match_operand:VI4F_256 1 "register_operand" "v") + (match_operand:VI4F_256 1 "register_operand" "v,v") (parallel [(const_int 4) (const_int 5) (const_int 6) (const_int 7)])) - (match_operand: 2 "nonimm_or_0_operand" "0C") - (match_operand: 3 "register_operand" "Yk")))] - "TARGET_AVX512VL" + (match_operand: 2 "nonimm_or_0_operand" "0C,0") + (match_operand: 3 "register_operand" "Yk,Yk")))] + "TARGET_AVX512VL + && (!MEM_P (operands[0]) || rtx_equal_p (operands[0], operands[2]))" "vextract32x4\t{$0x1, %1, %0%{%3%}%N2|%0%{%3%}%N2, %1, 0x1}" [(set_attr "type" "sselog1") (set_attr "length_immediate" "1") diff --git a/gcc/config/i386/subst.md b/gcc/config/i386/subst.md index 4a1c9b0801e..a5ca144c7f7 100644 --- a/gcc/config/i386/subst.md +++ b/gcc/config/i386/subst.md @@ -57,8 +57,6 @@ (define_subst_attr "mask_avx512vl_condition" "mask" "1" "TARGET_AVX512VL") (define_subst_attr "mask_avx512bw_condition" "mask" "1" "TARGET_AVX512BW") (define_subst_attr "mask_avx512dq_condition" "mask" "1" "TARGET_AVX512DQ") -(define_subst_attr "store_mask_constraint" "mask" "vm" "v") -(define_subst_attr "store_mask_predicate" "mask" "nonimmediate_operand" "register_operand") (define_subst_attr "mask_prefix" "mask" "vex" "evex") (define_subst_attr "mask_prefix2" "mask" "maybe_vex" "evex") (define_subst_attr "mask_prefix3" "mask" "orig,vex" "evex,evex")