This patch adds support for ARM's Scalable Vector Extension.
The patch just contains the core features that work with the
current vectoriser framework; later patches will add extra
capabilities to both the target-independent code and AArch64 code.
The patch doesn't include:
- support for unwinding frames whose size depends on the vector length
- modelling the effect of __tls_get_addr on the SVE registers
These are handled by later patches instead.
Some notes:
- The copyright years for aarch64-sve.md start at 2009 because some of
the code is based on aarch64.md, which also starts from then.
- The patch inserts spaces between items in the AArch64 section
of sourcebuild.texi. This matches at least the surrounding
architectures and looks a little nicer in the info output.
- aarch64-sve.md includes a pattern:
while_ult<GPI:mode><PRED_ALL:mode>
A later patch adds a matching "while_ult" optab, but the pattern
is also needed by the predicate vec_duplicate expander.
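- As a point of reference, the user-visible interface added here is
the "sve" extension and the -msve-vector-bits= option:
    -march=armv8-a+sve          (also enables "fp16", "fp" and "simd")
    -msve-vector-bits=scalable|128|256|512|1024|2048
The invocation lines are only illustrative; the spellings themselves
come from the patch. "scalable" is the default and requests
vector-length-agnostic output, while the fixed sizes allow code to be
specialised for a particular vector length.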
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/invoke.texi (-msve-vector-bits=): Document new option.
(sve): Document new AArch64 extension.
* doc/md.texi (w): Extend the description of the AArch64
constraint to include SVE vectors.
(Upl, Upa): Document new AArch64 predicate constraints.
* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
enum.
* config/aarch64/aarch64.opt (sve_vector_bits): New enum.
(msve-vector-bits=): New option.
* config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
SVE when these are disabled.
(sve): New extension.
* config/aarch64/aarch64-modes.def: Define SVE vector and predicate
modes. Adjust their number of units based on aarch64_sve_vg.
(MAX_BITSIZE_MODE_ANY_MODE): Define.
* config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
aarch64_addr_query_type.
(aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
(aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
(aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
(aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
(aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_check_zero_based_sve_index_immediate): Declare.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): Likewise.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
(aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
(aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
(aarch64_regmode_natural_size): Likewise.
* config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
(AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
left one place.
(AARCH64_ISA_SVE, TARGET_SVE): New macros.
(FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
for VG and the SVE predicate registers.
(V_ALIASES): Add a "z"-prefixed alias.
(FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
(AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
(PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
(PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
(REG_CLASS_NAMES): Add entries for them.
(REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG
and the predicate registers.
(aarch64_sve_vg): Declare.
(BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
(SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
(REGMODE_NATURAL_SIZE): Define.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
SVE macros.
* config/aarch64/aarch64.c: Include cfgrtl.h.
(simd_immediate_info): Add a constructor for series vectors,
and an associated step field.
(aarch64_sve_vg): New variable.
(aarch64_dbx_register_number): Handle VG and the predicate registers.
(aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
(VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
(VEC_ANY_DATA, VEC_STRUCT): New constants.
(aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
(aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
(aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
(aarch64_get_mask_mode): New functions.
(aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
(aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE
predicate modes and predicate registers. Explicitly restrict
GPRs to modes of 16 bytes or smaller. Only allow FP registers
to store a vector mode if it is recognized by
aarch64_classify_vector_mode.
(aarch64_regmode_natural_size): New function.
(aarch64_hard_regno_caller_save_mode): Return the original mode
for predicates.
(aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
(aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
(aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
functions.
(aarch64_add_offset): Add a temp2 parameter. Assert that temp1
does not overlap dest if the function is frame-related. Handle
SVE constants.
(aarch64_split_add_offset): New function.
(aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
them to aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
and update call to aarch64_sub_sp.
(aarch64_add_cfa_expression): New function.
(aarch64_expand_prologue): Pass extra temporary registers to the
functions above. Handle the case in which we need to emit new
DW_CFA_expressions for registers that were originally saved
relative to the stack pointer, but now have to be expressed
relative to the frame pointer.
(aarch64_output_mi_thunk): Pass extra temporary registers to the
functions above.
(aarch64_expand_epilogue): Likewise. Prevent inheritance of
IP0 and IP1 values for SVE frames.
(aarch64_expand_vec_series): New function.
(aarch64_expand_sve_widened_duplicate): Likewise.
(aarch64_expand_sve_const_vector): Likewise.
(aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
Handle SVE constants. Use emit_move_insn to move a force_const_mem
into the register, rather than emitting a SET directly.
(aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
(aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
(offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
(offset_9bit_signed_scaled_p): New functions.
(aarch64_replicate_bitmask_imm): New function.
(aarch64_bitmask_imm): Use it.
(aarch64_cannot_force_const_mem): Reject expressions involving
a CONST_POLY_INT. Update call to aarch64_classify_symbol.
(aarch64_classify_index): Handle SVE indices, by requiring
a plain register index with a scale that matches the element size.
(aarch64_classify_address): Handle SVE addresses. Assert that
the mode of the address is VOIDmode or an integer mode.
Update call to aarch64_classify_symbol.
(aarch64_classify_symbolic_expression): Update call to
aarch64_classify_symbol.
(aarch64_const_vec_all_in_range_p): New function.
(aarch64_print_vector_float_operand): Likewise.
(aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than
"vN" for FP registers with SVE modes. Handle (const ...) vectors
and the FP immediates 1.0 and 0.5.
(aarch64_print_address_internal): Handle SVE addresses.
(aarch64_print_operand_address): Use ADDR_QUERY_ANY.
(aarch64_regno_regclass): Handle predicate registers.
(aarch64_secondary_reload): Handle big-endian reloads of SVE
data modes.
(aarch64_class_max_nregs): Handle SVE modes and predicate registers.
(aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
(aarch64_convert_sve_vector_bits): New function.
(aarch64_override_options): Use it to handle -msve-vector-bits=.
(aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
rather than an rtx.
(aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
Handle SVE vector and predicate modes. Accept VL-based constants
that need only one temporary register, and VL offsets that require
no temporary registers.
(aarch64_conditional_register_usage): Mark the predicate registers
as fixed if SVE isn't available.
(aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
Return true for SVE vector and predicate modes.
(aarch64_simd_container_mode): Take the number of bits as a poly_int64
rather than an unsigned int. Handle SVE modes.
(aarch64_preferred_simd_mode): Update call accordingly. Handle
SVE modes.
(aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
if SVE is enabled.
(aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
(aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
(aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
(aarch64_sve_float_mul_immediate_p): New functions.
(aarch64_sve_valid_immediate): New function.
(aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
Explicitly reject structure modes. Check for INDEX constants.
Handle PTRUE and PFALSE constants.
(aarch64_check_zero_based_sve_index_immediate): New function.
(aarch64_simd_imm_zero_p): Delete.
(aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
vector modes. Accept constants in the range of CNT[BHWD].
(aarch64_simd_scalar_immediate_valid_for_move): Explicitly
ask for an Advanced SIMD mode.
(aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
(aarch64_simd_vector_alignment): Handle SVE predicates.
(aarch64_vectorize_preferred_vector_alignment): New function.
(aarch64_simd_vector_alignment_reachable): Use it instead of
the vector size.
(aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
(aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
functions.
(MAX_VECT_LEN): Delete.
(expand_vec_perm_d): Add a vec_flags field.
(emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
(aarch64_evpc_ext): Don't apply a big-endian lane correction
for SVE modes.
(aarch64_evpc_rev): Rename to...
(aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE.
(aarch64_evpc_rev_global): New function.
(aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
(aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
MAX_VECT_LEN.
(aarch64_evpc_sve_tbl): New function.
(aarch64_expand_vec_perm_const_1): Update after rename of
aarch64_evpc_rev. Handle SVE permutes too, trying
aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
than aarch64_evpc_tbl.
(aarch64_vectorize_vec_perm_const): Initialize vec_flags.
(aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
(aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
(aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
(aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
(aarch64_expand_sve_vcond): New functions.
(aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
of aarch64_vector_mode_p.
(aarch64_dwarf_poly_indeterminate_value): New function.
(aarch64_compute_pressure_classes): Likewise.
(aarch64_can_change_mode_class): Likewise.
(TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
(TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
(TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
(TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
(TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
(TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
* config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
(Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
constraints.
(Dn, Dl, Dr): Accept const as well as const_vector.
(Dz): Likewise. Compare against CONST0_RTX.
* config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
of "vector" where appropriate.
(SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
(SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
(UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
(UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
(UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
(UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
(Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
(v_int_equiv): Extend to SVE modes.
(Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
mode attributes.
(LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
(optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
(logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
(LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
(SVE_COND_FP_CMP): New int iterators.
(perm_hilo): Handle the new unpack unspecs.
(optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
attributes.
* config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
(aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
(aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
(aarch64_equality_operator, aarch64_constant_vector_operand)
(aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
(aarch64_sve_nonimmediate_operand): Likewise.
(aarch64_sve_general_operand): Likewise.
(aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
(aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
(aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
(aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
(aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
(aarch64_sve_float_arith_immediate): Likewise.
(aarch64_sve_float_arith_with_sub_immediate): Likewise.
(aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
(aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
(aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
(aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
(aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
(aarch64_sve_float_arith_operand): Likewise.
(aarch64_sve_float_arith_with_sub_operand): Likewise.
(aarch64_sve_float_mul_operand): Likewise.
(aarch64_sve_vec_perm_operand): Likewise.
(aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
(aarch64_mov_operand): Accept const_poly_int and const_vector.
(aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
as well as const_vector.
(aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
in file. Use CONST0_RTX and CONSTM1_RTX.
(aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes.
(aarch64_simd_reg_or_zero): Accept const as well as const_vector.
Use aarch64_simd_imm_zero.
* config/aarch64/aarch64-sve.md: New file.
* config/aarch64/aarch64.md: Include it.
(VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
(UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
(UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
(UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
(UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
(sve): New attribute.
(enabled): Disable instructions with the sve attribute unless
TARGET_SVE.
(movqi, movhi): Pass CONST_POLY_INT operands through
aarch64_expand_mov_immediate.
(*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
CNT[BHWD] immediates.
(movti): Split CONST_POLY_INT moves into two halves.
(add<mode>3): Accept aarch64_pluslong_or_poly_operand.
Split additions that need a temporary here if the destination
is the stack pointer.
(*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
(*add<mode>3_poly_1): New instruction.
(set_clobber_cc): New expander.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256612
+2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
+ Alan Hayward <alan.hayward@arm.com>
+ David Sherwood <david.sherwood@arm.com>
+
+ * doc/invoke.texi (-msve-vector-bits=): Document new option.
+ (sve): Document new AArch64 extension.
+ * doc/md.texi (w): Extend the description of the AArch64
+ constraint to include SVE vectors.
+ (Upl, Upa): Document new AArch64 predicate constraints.
+ * config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
+ enum.
+ * config/aarch64/aarch64.opt (sve_vector_bits): New enum.
+ (msve-vector-bits=): New option.
+ * config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
+ SVE when these are disabled.
+ (sve): New extension.
+ * config/aarch64/aarch64-modes.def: Define SVE vector and predicate
+ modes. Adjust their number of units based on aarch64_sve_vg.
+ (MAX_BITSIZE_MODE_ANY_MODE): Define.
+ * config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
+ aarch64_addr_query_type.
+ (aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
+ (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
+ (aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
+ (aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
+ (aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
+ (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
+ (aarch64_simd_imm_zero_p): Delete.
+ (aarch64_check_zero_based_sve_index_immediate): Declare.
+ (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
+ (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
+ (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
+ (aarch64_sve_float_mul_immediate_p): Likewise.
+ (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
+ rather than an rtx.
+ (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
+ (aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
+ (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
+ (aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
+ (aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
+ (aarch64_regmode_natural_size): Likewise.
+ * config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
+ (AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
+ left one place.
+ (AARCH64_ISA_SVE, TARGET_SVE): New macros.
+ (FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
+ for VG and the SVE predicate registers.
+ (V_ALIASES): Add a "z"-prefixed alias.
+ (FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
+ (AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
+ (PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
+ (PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
+ (REG_CLASS_NAMES): Add entries for them.
+ (REG_CLASS_CONTENTS): Likewise. Update ALL_REGS to include VG
+ and the predicate registers.
+ (aarch64_sve_vg): Declare.
+ (BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
+ (SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
+ (REGMODE_NATURAL_SIZE): Define.
+ * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
+ SVE macros.
+ * config/aarch64/aarch64.c: Include cfgrtl.h.
+ (simd_immediate_info): Add a constructor for series vectors,
+ and an associated step field.
+ (aarch64_sve_vg): New variable.
+ (aarch64_dbx_register_number): Handle VG and the predicate registers.
+ (aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
+ (VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
+ (VEC_ANY_DATA, VEC_STRUCT): New constants.
+ (aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
+ (aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
+ (aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
+ (aarch64_get_mask_mode): New functions.
+ (aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
+ and FP_LO_REGS. Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
+ (aarch64_hard_regno_mode_ok): Handle VG. Also handle the SVE
+ predicate modes and predicate registers. Explicitly restrict
+ GPRs to modes of 16 bytes or smaller. Only allow FP registers
+ to store a vector mode if it is recognized by
+ aarch64_classify_vector_mode.
+ (aarch64_regmode_natural_size): New function.
+ (aarch64_hard_regno_caller_save_mode): Return the original mode
+ for predicates.
+ (aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
+ (aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
+ (aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
+ (aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
+ functions.
+ (aarch64_add_offset): Add a temp2 parameter. Assert that temp1
+ does not overlap dest if the function is frame-related. Handle
+ SVE constants.
+ (aarch64_split_add_offset): New function.
+ (aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
+ them to aarch64_add_offset.
+ (aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
+ and update call to aarch64_sub_sp.
+ (aarch64_add_cfa_expression): New function.
+ (aarch64_expand_prologue): Pass extra temporary registers to the
+ functions above. Handle the case in which we need to emit new
+ DW_CFA_expressions for registers that were originally saved
+ relative to the stack pointer, but now have to be expressed
+ relative to the frame pointer.
+ (aarch64_output_mi_thunk): Pass extra temporary registers to the
+ functions above.
+ (aarch64_expand_epilogue): Likewise. Prevent inheritance of
+ IP0 and IP1 values for SVE frames.
+ (aarch64_expand_vec_series): New function.
+ (aarch64_expand_sve_widened_duplicate): Likewise.
+ (aarch64_expand_sve_const_vector): Likewise.
+ (aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
+ Handle SVE constants. Use emit_move_insn to move a force_const_mem
+ into the register, rather than emitting a SET directly.
+ (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
+ (aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
+ (offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
+ (offset_9bit_signed_scaled_p): New functions.
+ (aarch64_replicate_bitmask_imm): New function.
+ (aarch64_bitmask_imm): Use it.
+ (aarch64_cannot_force_const_mem): Reject expressions involving
+ a CONST_POLY_INT. Update call to aarch64_classify_symbol.
+ (aarch64_classify_index): Handle SVE indices, by requiring
+ a plain register index with a scale that matches the element size.
+ (aarch64_classify_address): Handle SVE addresses. Assert that
+ the mode of the address is VOIDmode or an integer mode.
+ Update call to aarch64_classify_symbol.
+ (aarch64_classify_symbolic_expression): Update call to
+ aarch64_classify_symbol.
+ (aarch64_const_vec_all_in_range_p): New function.
+ (aarch64_print_vector_float_operand): Likewise.
+ (aarch64_print_operand): Handle 'N' and 'C'. Use "zN" rather than
+ "vN" for FP registers with SVE modes. Handle (const ...) vectors
+ and the FP immediates 1.0 and 0.5.
+ (aarch64_print_address_internal): Handle SVE addresses.
+ (aarch64_print_operand_address): Use ADDR_QUERY_ANY.
+ (aarch64_regno_regclass): Handle predicate registers.
+ (aarch64_secondary_reload): Handle big-endian reloads of SVE
+ data modes.
+ (aarch64_class_max_nregs): Handle SVE modes and predicate registers.
+ (aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
+ (aarch64_convert_sve_vector_bits): New function.
+ (aarch64_override_options): Use it to handle -msve-vector-bits=.
+ (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
+ rather than an rtx.
+ (aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
+ Handle SVE vector and predicate modes. Accept VL-based constants
+ that need only one temporary register, and VL offsets that require
+ no temporary registers.
+ (aarch64_conditional_register_usage): Mark the predicate registers
+ as fixed if SVE isn't available.
+ (aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
+ Return true for SVE vector and predicate modes.
+ (aarch64_simd_container_mode): Take the number of bits as a poly_int64
+ rather than an unsigned int. Handle SVE modes.
+ (aarch64_preferred_simd_mode): Update call accordingly. Handle
+ SVE modes.
+ (aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
+ if SVE is enabled.
+ (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
+ (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
+ (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
+ (aarch64_sve_float_mul_immediate_p): New functions.
+ (aarch64_sve_valid_immediate): New function.
+ (aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
+ Explicitly reject structure modes. Check for INDEX constants.
+ Handle PTRUE and PFALSE constants.
+ (aarch64_check_zero_based_sve_index_immediate): New function.
+ (aarch64_simd_imm_zero_p): Delete.
+ (aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
+ vector modes. Accept constants in the range of CNT[BHWD].
+ (aarch64_simd_scalar_immediate_valid_for_move): Explicitly
+ ask for an Advanced SIMD mode.
+ (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
+ (aarch64_simd_vector_alignment): Handle SVE predicates.
+ (aarch64_vectorize_preferred_vector_alignment): New function.
+ (aarch64_simd_vector_alignment_reachable): Use it instead of
+ the vector size.
+ (aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
+ (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
+ functions.
+ (MAX_VECT_LEN): Delete.
+ (expand_vec_perm_d): Add a vec_flags field.
+ (emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
+ (aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
+ (aarch64_evpc_ext): Don't apply a big-endian lane correction
+ for SVE modes.
+ (aarch64_evpc_rev): Rename to...
+ (aarch64_evpc_rev_local): ...this. Use a predicated operation for SVE.
+ (aarch64_evpc_rev_global): New function.
+ (aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
+ (aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
+ MAX_VECT_LEN.
+ (aarch64_evpc_sve_tbl): New function.
+ (aarch64_expand_vec_perm_const_1): Update after rename of
+ aarch64_evpc_rev. Handle SVE permutes too, trying
+ aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
+ than aarch64_evpc_tbl.
+ (aarch64_vectorize_vec_perm_const): Initialize vec_flags.
+ (aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
+ (aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
+ (aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
+ (aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
+ (aarch64_expand_sve_vcond): New functions.
+ (aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
+ of aarch64_vector_mode_p.
+ (aarch64_dwarf_poly_indeterminate_value): New function.
+ (aarch64_compute_pressure_classes): Likewise.
+ (aarch64_can_change_mode_class): Likewise.
+ (TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
+ (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
+ (TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
+ (TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
+ (TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
+ (TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
+ * config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
+ (Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
+ constraints.
+ (Dn, Dl, Dr): Accept const as well as const_vector.
+ (Dz): Likewise. Compare against CONST0_RTX.
+ * config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
+ of "vector" where appropriate.
+ (SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
+ (SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
+ (UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
+ (UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
+ (UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
+ (UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
+ (Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
+ (v_int_equiv): Extend to SVE modes.
+ (Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
+ mode attributes.
+ (LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
+ (optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
+ (logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
+ (LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
+ (SVE_COND_FP_CMP): New int iterators.
+ (perm_hilo): Handle the new unpack unspecs.
+ (optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
+ attributes.
+ * config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
+ (aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
+ (aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
+ (aarch64_equality_operator, aarch64_constant_vector_operand)
+ (aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
+ (aarch64_sve_nonimmediate_operand): Likewise.
+ (aarch64_sve_general_operand): Likewise.
+ (aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
+ (aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
+ (aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
+ (aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
+ (aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
+ (aarch64_sve_float_arith_immediate): Likewise.
+ (aarch64_sve_float_arith_with_sub_immediate): Likewise.
+ (aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
+ (aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
+ (aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
+ (aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
+ (aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
+ (aarch64_sve_float_arith_operand): Likewise.
+ (aarch64_sve_float_arith_with_sub_operand): Likewise.
+ (aarch64_sve_float_mul_operand): Likewise.
+ (aarch64_sve_vec_perm_operand): Likewise.
+ (aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
+ (aarch64_mov_operand): Accept const_poly_int and const_vector.
+ (aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
+ as well as const_vector.
+ (aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
+ in file. Use CONST0_RTX and CONSTM1_RTX.
+ (aarch64_simd_or_scalar_imm_zero): Likewise. Add match_codes.
+ (aarch64_simd_reg_or_zero): Accept const as well as const_vector.
+ Use aarch64_simd_imm_zero.
+ * config/aarch64/aarch64-sve.md: New file.
+ * config/aarch64/aarch64.md: Include it.
+ (VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
+ (UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
+ (UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
+ (UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
+ (UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
+ (sve): New attribute.
+ (enabled): Disable instructions with the sve attribute unless
+ TARGET_SVE.
+ (movqi, movhi): Pass CONST_POLY_INT operands through
+ aarch64_expand_mov_immediate.
+ (*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
+ CNT[BHWD] immediates.
+ (movti): Split CONST_POLY_INT moves into two halves.
+ (add<mode>3): Accept aarch64_pluslong_or_poly_operand.
+ Split additions that need a temporary here if the destination
+ is the stack pointer.
+ (*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
+ (*add<mode>3_poly_1): New instruction.
+ (set_clobber_cc): New expander.
+
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
* simplify-rtx.c (simplify_immed_subreg): Add an inner_bytes
aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
+ aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
+ cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
+ if (TARGET_SVE)
+ {
+ int bits;
+ if (!BITS_PER_SVE_VECTOR.is_constant (&bits))
+ bits = 0;
+ builtin_define_with_int_value ("__ARM_FEATURE_SVE_BITS", bits);
+ }
aarch64_def_or_undef (TARGET_AES, "__ARM_FEATURE_AES", pfile);
aarch64_def_or_undef (TARGET_SHA2, "__ARM_FEATURE_SHA2", pfile);
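The hunk above lets user code detect SVE, and a fixed vector length, at
preprocessing time. A minimal sketch of how the macros might be tested
(the macro names come from the hunk; the flags and branches are purely
illustrative):

    /* Compiled with -march=armv8-a+sve; add e.g. -msve-vector-bits=256
       to take the fixed-length branch.  */
    #ifdef __ARM_FEATURE_SVE
    # if __ARM_FEATURE_SVE_BITS != 0
      /* Vector length fixed on the command line; the macro gives the
         length in bits.  */
    # else
      /* Vector-length-agnostic compilation; the macro is defined to 0.  */
    # endif
    #endif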
ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
/* Vector modes. */
+
+VECTOR_BOOL_MODE (VNx16BI, 16, 2);
+VECTOR_BOOL_MODE (VNx8BI, 8, 2);
+VECTOR_BOOL_MODE (VNx4BI, 4, 2);
+VECTOR_BOOL_MODE (VNx2BI, 2, 2);
+
+ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
+ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
+ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
+ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
+
+ADJUST_ALIGNMENT (VNx16BI, 2);
+ADJUST_ALIGNMENT (VNx8BI, 2);
+ADJUST_ALIGNMENT (VNx4BI, 2);
+ADJUST_ALIGNMENT (VNx2BI, 2);
+
VECTOR_MODES (INT, 8); /* V8QI V4HI V2SI. */
VECTOR_MODES (INT, 16); /* V16QI V8HI V4SI V2DI. */
VECTOR_MODES (FLOAT, 8); /* V2SF. */
INT_MODE (CI, 48);
INT_MODE (XI, 64);
+/* Define SVE modes for NVECS vectors. VB, VH, VS and VD are the prefixes
+ for 8-bit, 16-bit, 32-bit and 64-bit elements respectively. It isn't
+ strictly necessary to set the alignment here, since the default would
+ be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer. */
+#define SVE_MODES(NVECS, VB, VH, VS, VD) \
+ VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS); \
+ VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS); \
+ \
+ ADJUST_NUNITS (VB##QI, aarch64_sve_vg * NVECS * 8); \
+ ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \
+ ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \
+ ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \
+ ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \
+ ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \
+ ADJUST_NUNITS (VD##DF, aarch64_sve_vg * NVECS); \
+ \
+ ADJUST_ALIGNMENT (VB##QI, 16); \
+ ADJUST_ALIGNMENT (VH##HI, 16); \
+ ADJUST_ALIGNMENT (VS##SI, 16); \
+ ADJUST_ALIGNMENT (VD##DI, 16); \
+ ADJUST_ALIGNMENT (VH##HF, 16); \
+ ADJUST_ALIGNMENT (VS##SF, 16); \
+ ADJUST_ALIGNMENT (VD##DF, 16);
+
+/* Give SVE vectors the names normally used for 256-bit vectors.
+ The actual number of bits depends on command-line flags. */
+SVE_MODES (1, VNx16, VNx8, VNx4, VNx2)
+
/* Quad float: 128-bit floating mode for long doubles. */
FLOAT_MODE (TF, 16, ieee_quad_format);
+/* A 4-tuple of SVE vectors with the maximum -msve-vector-bits= setting.
+ Note that this is a limit only on the compile-time sizes of modes;
+ it is not a limit on the runtime sizes, since VL-agnostic code
+ must work with arbitrary vector lengths. */
+#define MAX_BITSIZE_MODE_ANY_MODE (2048 * 4)
+
/* Coefficient 1 is multiplied by the number of 128-bit chunks in an
SVE vector (referred to as "VQ") minus one. */
#define NUM_POLY_INT_COEFFS 2
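To make the mode definitions above concrete, here is a worked example
(an illustration, not something taken from the patch): in
vector-length-agnostic code aarch64_sve_vg is the poly_int 2 + 2*x,
where x is the "VQ minus one" indeterminate described in the comment
above. With -msve-vector-bits=256 there are two 128-bit chunks, so
aarch64_sve_vg resolves to 4 and, for instance:

    VNx16QI: 4 * 8 = 32 byte elements        (256 bits)
    VNx2DI:  4 * 1 =  4 doubleword elements  (256 bits)
    VNx16BI: 4 * 8 = 32 one-bit predicate elements (2-byte alignment)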
that are required. Their order is not important. */
/* Enabling "fp" just enables "fp".
- Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2", "sha3", and sm3/sm4. */
+ Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2",
+ "sha3", sm3/sm4 and "sve". */
AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO |\
AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2 |\
- AARCH64_FL_SHA3 | AARCH64_FL_SM4, "fp")
+ AARCH64_FL_SHA3 | AARCH64_FL_SM4 | AARCH64_FL_SVE, "fp")
/* Enabling "simd" also enables "fp".
- Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3" and "sm3/sm4". */
+ Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3",
+ "sm3/sm4" and "sve". */
AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, AARCH64_FL_CRYPTO |\
AARCH64_FL_DOTPROD | AARCH64_FL_AES | AARCH64_FL_SHA2 |\
- AARCH64_FL_SHA3 | AARCH64_FL_SM4, "asimd")
+ AARCH64_FL_SHA3 | AARCH64_FL_SM4 | AARCH64_FL_SVE,
+ "asimd")
/* Enabling "crypto" also enables "fp" and "simd".
Disabling "crypto" disables "crypto", "aes", "sha2", "sha3" and "sm3/sm4". */
AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
/* Enabling "fp16" also enables "fp".
- Disabling "fp16" disables "fp16" and "fp16fml". */
-AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, AARCH64_FL_F16FML, "fphp asimdhp")
+ Disabling "fp16" disables "fp16", "fp16fml" and "sve". */
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP,
+ AARCH64_FL_F16FML | AARCH64_FL_SVE, "fphp asimdhp")
/* Enabling or disabling "rcpc" only changes "rcpc". */
AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc")
Disabling "fp16fml" just disables "fp16fml". */
AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, AARCH64_FL_FP | AARCH64_FL_F16, 0, "asimdfml")
+/* Enabling "sve" also enables "fp16", "fp" and "simd".
+ Disabling "sve" just disables "sve". */
+AARCH64_OPT_EXTENSION("sve", AARCH64_FL_SVE, AARCH64_FL_FP | AARCH64_FL_SIMD | AARCH64_FL_F16, 0, "sve")
+
#undef AARCH64_OPT_EXTENSION
AARCH64_FUNCTION_ALL
};
+/* SVE vector register sizes. */
+enum aarch64_sve_vector_bits_enum {
+ SVE_SCALABLE,
+ SVE_128 = 128,
+ SVE_256 = 256,
+ SVE_512 = 512,
+ SVE_1024 = 1024,
+ SVE_2048 = 2048
+};
+
#endif
(the rules are the same for both).
ADDR_QUERY_LDP_STP
- Query what is valid for a load/store pair. */
+ Query what is valid for a load/store pair.
+
+ ADDR_QUERY_ANY
+ Query what is valid for at least one memory constraint, which may
+ allow things that "m" doesn't. For example, the SVE LDR and STR
+ addressing modes allow a wider range of immediate offsets than "m"
+ does. */
enum aarch64_addr_query_type {
ADDR_QUERY_M,
- ADDR_QUERY_LDP_STP
+ ADDR_QUERY_LDP_STP,
+ ADDR_QUERY_ANY
};
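For example (the offset is illustrative), an SVE LDR/STR address such as
[x0, #16, mul vl] is only reachable through ADDR_QUERY_ANY: LDR and STR
accept a 9-bit signed multiple of the vector length, whereas the LD1/ST1
addressing behind "m" is limited to a 4-bit signed multiple; compare the
offset_9bit_signed_scaled_p and offset_4bit_signed_scaled_p helpers
added elsewhere in the patch.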
/* A set of tuning parameters contains references to size and time
enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
+bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
+ HOST_WIDE_INT);
bool aarch64_constant_address_p (rtx);
bool aarch64_emit_approx_div (rtx, rtx, rtx);
bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
+opt_machine_mode aarch64_sve_pred_mode (unsigned int);
+bool aarch64_sve_cnt_immediate_p (rtx);
+bool aarch64_sve_addvl_addpl_immediate_p (rtx);
+bool aarch64_sve_inc_dec_immediate_p (rtx);
+int aarch64_add_offset_temporaries (rtx);
+void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
bool aarch64_mov_operand_p (rtx, machine_mode);
rtx aarch64_reverse_mask (machine_mode, unsigned int);
bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
+char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
+char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx);
+char *aarch64_output_sve_inc_dec_immediate (const char *, rtx);
char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
char *aarch64_output_simd_mov_immediate (rtx, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+char *aarch64_output_sve_mov_immediate (rtx);
+char *aarch64_output_ptrue (machine_mode, char);
bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
bool aarch64_regno_ok_for_base_p (int, bool);
bool aarch64_regno_ok_for_index_p (int, bool);
bool aarch64_reinterpret_float_as_int (rtx value, unsigned HOST_WIDE_INT *fail);
bool aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
bool high);
-bool aarch64_simd_imm_zero_p (rtx, machine_mode);
bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode);
bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
+rtx aarch64_check_zero_based_sve_index_immediate (rtx);
+bool aarch64_sve_index_immediate_p (rtx);
+bool aarch64_sve_arith_immediate_p (rtx, bool);
+bool aarch64_sve_bitmask_immediate_p (rtx);
+bool aarch64_sve_dup_immediate_p (rtx);
+bool aarch64_sve_cmp_immediate_p (rtx, bool);
+bool aarch64_sve_float_arith_immediate_p (rtx, bool);
+bool aarch64_sve_float_mul_immediate_p (rtx);
bool aarch64_split_dimode_const_store (rtx, rtx);
bool aarch64_symbolic_address_p (rtx);
bool aarch64_uimm12_shift (HOST_WIDE_INT);
const char *aarch64_mangle_builtin_type (const_tree);
const char *aarch64_output_casesi (rtx *);
-enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
+enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT);
enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
enum reg_class aarch64_regno_regclass (unsigned);
int aarch64_asm_preferred_eh_data_format (int, int);
rtx aarch64_return_addr (int, rtx);
rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
bool aarch64_simd_mem_operand_p (rtx);
+bool aarch64_sve_ld1r_operand_p (rtx);
+bool aarch64_sve_ldr_operand_p (rtx);
rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
rtx aarch64_tls_get_addr (void);
tree aarch64_fold_builtin (tree, int, tree *, bool);
const char * aarch64_output_probe_stack_range (rtx, rtx);
void aarch64_err_no_fpadvsimd (machine_mode, const char *);
void aarch64_expand_epilogue (bool);
-void aarch64_expand_mov_immediate (rtx, rtx);
+void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
+void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
+void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
void aarch64_expand_prologue (void);
void aarch64_expand_vector_init (rtx, rtx);
void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE);
+
+void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx);
+bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
+void aarch64_expand_sve_vcond (machine_mode, machine_mode, rtx *);
#endif /* RTX_CODE */
void aarch64_init_builtins (void);
extern void aarch64_split_combinev16qi (rtx operands[3]);
extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
+extern void aarch64_expand_sve_vec_perm (rtx, rtx, rtx, rtx);
extern bool aarch64_madd_needs_nop (rtx_insn *);
extern void aarch64_final_prescan_insn (rtx_insn *);
void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
+poly_uint64 aarch64_regmode_natural_size (machine_mode);
+
#endif /* GCC_AARCH64_PROTOS_H */
--- /dev/null
+;; Machine description for AArch64 SVE.
+;; Copyright (C) 2009-2018 Free Software Foundation, Inc.
+;; Contributed by ARM Ltd.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Note on the handling of big-endian SVE
+;; --------------------------------------
+;;
+;; On big-endian systems, Advanced SIMD mov<mode> patterns act in the
+;; same way as movdi or movti would: the first byte of memory goes
+;; into the most significant byte of the register and the last byte
+;; of memory goes into the least significant byte of the register.
+;; This is the most natural ordering for Advanced SIMD and matches
+;; the ABI layout for 64-bit and 128-bit vector types.
+;;
+;; As a result, the order of bytes within the register is what GCC
+;; expects for a big-endian target, and subreg offsets therefore work
+;; as expected, with the first element in memory having subreg offset 0
+;; and the last element in memory having the subreg offset associated
+;; with a big-endian lowpart. However, this ordering also means that
+;; GCC's lane numbering does not match the architecture's numbering:
+;; GCC always treats the element at the lowest address in memory
+;; (subreg offset 0) as element 0, while the architecture treats
+;; the least significant end of the register as element 0.
+;;
+;; The situation for SVE is different. We want the layout of the
+;; SVE register to be the same for mov<mode> as it is for maskload<mode>:
+;; logically, a mov<mode> load must be indistinguishable from a
+;; maskload<mode> whose mask is all true. We therefore need the
+;; register layout to match LD1 rather than LDR. The ABI layout of
+;; SVE types also matches LD1 byte ordering rather than LDR byte ordering.
+;;
+;; As a result, the architecture lane numbering matches GCC's lane
+;; numbering, with element 0 always being the first in memory.
+;; However:
+;;
+;; - Applying a subreg offset to a register does not give the element
+;; that GCC expects: the first element in memory has the subreg offset
+;; associated with a big-endian lowpart while the last element in memory
+;; has subreg offset 0. We handle this via TARGET_CAN_CHANGE_MODE_CLASS.
+;;
+;; - We cannot use LDR and STR for spill slots that might be accessed
+;; via subregs, since although the elements have the order GCC expects,
+;; the order of the bytes within the elements is different. We instead
+;; access spill slots via LD1 and ST1, using secondary reloads to
+;; reserve a predicate register.
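+;;
+;; As a concrete illustration (example values only): on a big-endian
+;; target, if the first 32-bit element in memory is 0x11223344, then
+;; after LD1W lane 0 of the register holds 0x11223344, whereas LDR
+;; copies the bytes in address order starting from the least
+;; significant byte, so lane 0 would hold 0x44332211. The element
+;; order matches what GCC expects in both cases, but the bytes within
+;; each element do not, hence the restriction on spill slots above.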
+
+
+;; SVE data moves.
+(define_expand "mov<mode>"
+ [(set (match_operand:SVE_ALL 0 "nonimmediate_operand")
+ (match_operand:SVE_ALL 1 "general_operand"))]
+ "TARGET_SVE"
+ {
+ /* Use the predicated load and store patterns where possible.
+ This is required for big-endian targets (see the comment at the
+ head of the file) and increases the addressing choices for
+ little-endian. */
+ if ((MEM_P (operands[0]) || MEM_P (operands[1]))
+ && can_create_pseudo_p ())
+ {
+ aarch64_expand_sve_mem_move (operands[0], operands[1], <VPRED>mode);
+ DONE;
+ }
+
+ if (CONSTANT_P (operands[1]))
+ {
+ aarch64_expand_mov_immediate (operands[0], operands[1],
+ gen_vec_duplicate<mode>);
+ DONE;
+ }
+ }
+)
+
+;; Unpredicated moves (little-endian). Only allow memory operations
+;; during and after RA; before RA we want the predicated load and
+;; store patterns to be used instead.
+(define_insn "*aarch64_sve_mov<mode>_le"
+ [(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w")
+ (match_operand:SVE_ALL 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))]
+ "TARGET_SVE
+ && !BYTES_BIG_ENDIAN
+ && ((lra_in_progress || reload_completed)
+ || (register_operand (operands[0], <MODE>mode)
+ && nonmemory_operand (operands[1], <MODE>mode)))"
+ "@
+ ldr\t%0, %1
+ str\t%1, %0
+ mov\t%0.d, %1.d
+ * return aarch64_output_sve_mov_immediate (operands[1]);"
+)
+
+;; Unpredicated moves (big-endian). Memory accesses require secondary
+;; reloads.
+(define_insn "*aarch64_sve_mov<mode>_be"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
+ (match_operand:SVE_ALL 1 "aarch64_nonmemory_operand" "w, Dn"))]
+ "TARGET_SVE && BYTES_BIG_ENDIAN"
+ "@
+ mov\t%0.d, %1.d
+ * return aarch64_output_sve_mov_immediate (operands[1]);"
+)
+
+;; Handle big-endian memory reloads. We use byte PTRUE for all modes
+;; to try to encourage reuse.
+(define_expand "aarch64_sve_reload_be"
+ [(parallel
+ [(set (match_operand 0)
+ (match_operand 1))
+ (clobber (match_operand:VNx16BI 2 "register_operand" "=Upl"))])]
+ "TARGET_SVE && BYTES_BIG_ENDIAN"
+ {
+ /* Create a PTRUE. */
+ emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
+
+ /* Refer to the PTRUE in the appropriate mode for this move. */
+ machine_mode mode = GET_MODE (operands[0]);
+ machine_mode pred_mode
+ = aarch64_sve_pred_mode (GET_MODE_UNIT_SIZE (mode)).require ();
+ rtx pred = gen_lowpart (pred_mode, operands[2]);
+
+ /* Emit a predicated load or store. */
+ aarch64_emit_sve_pred_move (operands[0], pred, operands[1]);
+ DONE;
+ }
+)
+
+;; A predicated load or store for which the predicate is known to be
+;; all-true. Note that this pattern is generated directly by
+;; aarch64_emit_sve_pred_move, so changes to this pattern will
+;; need changes there as well.
+(define_insn "*pred_mov<mode>"
+ [(set (match_operand:SVE_ALL 0 "nonimmediate_operand" "=w, m")
+ (unspec:SVE_ALL
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (match_operand:SVE_ALL 2 "nonimmediate_operand" "m, w")]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE
+ && (register_operand (operands[0], <MODE>mode)
+ || register_operand (operands[2], <MODE>mode))"
+ "@
+ ld1<Vesize>\t%0.<Vetype>, %1/z, %2
+ st1<Vesize>\t%2.<Vetype>, %1, %0"
+)
+
+(define_expand "movmisalign<mode>"
+ [(set (match_operand:SVE_ALL 0 "nonimmediate_operand")
+ (match_operand:SVE_ALL 1 "general_operand"))]
+ "TARGET_SVE"
+ {
+ /* Equivalent to a normal move for our purposes. */
+ emit_move_insn (operands[0], operands[1]);
+ DONE;
+ }
+)
+
+(define_insn "maskload<mode><vpred>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL
+ [(match_operand:<VPRED> 2 "register_operand" "Upl")
+ (match_operand:SVE_ALL 1 "memory_operand" "m")]
+ UNSPEC_LD1_SVE))]
+ "TARGET_SVE"
+ "ld1<Vesize>\t%0.<Vetype>, %2/z, %1"
+)
+
+(define_insn "maskstore<mode><vpred>"
+ [(set (match_operand:SVE_ALL 0 "memory_operand" "+m")
+ (unspec:SVE_ALL [(match_operand:<VPRED> 2 "register_operand" "Upl")
+ (match_operand:SVE_ALL 1 "register_operand" "w")
+ (match_dup 0)]
+ UNSPEC_ST1_SVE))]
+ "TARGET_SVE"
+ "st1<Vesize>\t%1.<Vetype>, %2, %0"
+)
+
+(define_expand "mov<mode>"
+ [(set (match_operand:PRED_ALL 0 "nonimmediate_operand")
+ (match_operand:PRED_ALL 1 "general_operand"))]
+ "TARGET_SVE"
+ {
+ if (GET_CODE (operands[0]) == MEM)
+ operands[1] = force_reg (<MODE>mode, operands[1]);
+ }
+)
+
+(define_insn "*aarch64_sve_mov<mode>"
+ [(set (match_operand:PRED_ALL 0 "nonimmediate_operand" "=Upa, m, Upa, Upa, Upa")
+ (match_operand:PRED_ALL 1 "general_operand" "Upa, Upa, m, Dz, Dm"))]
+ "TARGET_SVE
+ && (register_operand (operands[0], <MODE>mode)
+ || register_operand (operands[1], <MODE>mode))"
+ "@
+ mov\t%0.b, %1.b
+ str\t%1, %0
+ ldr\t%0, %1
+ pfalse\t%0.b
+ * return aarch64_output_ptrue (<MODE>mode, '<Vetype>');"
+)
+
+;; Handle extractions from a predicate by converting to an integer vector
+;; and extracting from there.
+(define_expand "vec_extract<vpred><Vel>"
+ [(match_operand:<VEL> 0 "register_operand")
+ (match_operand:<VPRED> 1 "register_operand")
+ (match_operand:SI 2 "nonmemory_operand")
+ ;; Dummy operand to which we can attach the iterator.
+ (reg:SVE_I V0_REGNUM)]
+ "TARGET_SVE"
+ {
+ rtx tmp = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_aarch64_sve_dup<mode>_const (tmp, operands[1],
+ CONST1_RTX (<MODE>mode),
+ CONST0_RTX (<MODE>mode)));
+ emit_insn (gen_vec_extract<mode><Vel> (operands[0], tmp, operands[2]));
+ DONE;
+ }
+)
+
+(define_expand "vec_extract<mode><Vel>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand")
+ (parallel [(match_operand:SI 2 "nonmemory_operand")])))]
+ "TARGET_SVE"
+ {
+ poly_int64 val;
+ if (poly_int_rtx_p (operands[2], &val)
+ && known_eq (val, GET_MODE_NUNITS (<MODE>mode) - 1))
+ {
+ /* The last element can be extracted with a LASTB and a false
+ predicate. */
+ rtx sel = force_reg (<VPRED>mode, CONST0_RTX (<VPRED>mode));
+ emit_insn (gen_aarch64_sve_lastb<mode> (operands[0], sel,
+ operands[1]));
+ DONE;
+ }
+ if (!CONST_INT_P (operands[2]))
+ {
+ /* Create an index with operands[2] as the base and -1 as the step.
+ It will then be zero for the element we care about. */
+ rtx index = gen_lowpart (<VEL_INT>mode, operands[2]);
+ index = force_reg (<VEL_INT>mode, index);
+ rtx series = gen_reg_rtx (<V_INT_EQUIV>mode);
+ emit_insn (gen_vec_series<v_int_equiv> (series, index, constm1_rtx));
+
+ /* Get a predicate that is true for only that element. */
+ rtx zero = CONST0_RTX (<V_INT_EQUIV>mode);
+ rtx cmp = gen_rtx_EQ (<V_INT_EQUIV>mode, series, zero);
+ rtx sel = gen_reg_rtx (<VPRED>mode);
+ emit_insn (gen_vec_cmp<v_int_equiv><vpred> (sel, cmp, series, zero));
+
+ /* Select the element using LASTB. */
+ emit_insn (gen_aarch64_sve_lastb<mode> (operands[0], sel,
+ operands[1]));
+ DONE;
+ }
+ }
+)
+
+;; Extract an element from the Advanced SIMD portion of the register.
+;; We don't just reuse the aarch64-simd.md pattern because we don't
+;; want any change in lane number on big-endian targets.
+(define_insn "*vec_extract<mode><Vel>_v128"
+ [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
+ (parallel [(match_operand:SI 2 "const_int_operand")])))]
+ "TARGET_SVE
+ && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
+ {
+ operands[1] = gen_lowpart (<V128>mode, operands[1]);
+ switch (which_alternative)
+ {
+ case 0:
+ return "umov\\t%<vwcore>0, %1.<Vetype>[%2]";
+ case 1:
+ return "dup\\t%<Vetype>0, %1.<Vetype>[%2]";
+ case 2:
+ return "st1\\t{%1.<Vetype>}[%2], %0";
+ default:
+ gcc_unreachable ();
+ }
+ }
+ [(set_attr "type" "neon_to_gp_q, neon_dup_q, neon_store1_one_lane_q")]
+)
+
+;; Extract an element in the range of DUP. This pattern allows the
+;; source and destination to be different.
+(define_insn "*vec_extract<mode><Vel>_dup"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand" "w")
+ (parallel [(match_operand:SI 2 "const_int_operand")])))]
+ "TARGET_SVE
+ && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 16, 63)"
+ {
+ operands[0] = gen_rtx_REG (<MODE>mode, REGNO (operands[0]));
+ return "dup\t%0.<Vetype>, %1.<Vetype>[%2]";
+ }
+)
+
+;; Extract an element outside the range of DUP. This pattern requires the
+;; source and destination to be the same.
+(define_insn "*vec_extract<mode><Vel>_ext"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand" "0")
+ (parallel [(match_operand:SI 2 "const_int_operand")])))]
+ "TARGET_SVE && INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode) >= 64"
+ {
+ operands[0] = gen_rtx_REG (<MODE>mode, REGNO (operands[0]));
+ operands[2] = GEN_INT (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode));
+ return "ext\t%0.b, %0.b, %0.b, #%2";
+ }
+)
+
+;; Extract the last active element of operand 1 into operand 0.
+;; If no elements are active, extract the last inactive element instead.
+(define_insn "aarch64_sve_lastb<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=r, w")
+ (unspec:<VEL>
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (match_operand:SVE_ALL 2 "register_operand" "w, w")]
+ UNSPEC_LASTB))]
+ "TARGET_SVE"
+ "@
+ lastb\t%<vwcore>0, %1, %2.<Vetype>
+ lastb\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+(define_expand "vec_duplicate<mode>"
+ [(parallel
+ [(set (match_operand:SVE_ALL 0 "register_operand")
+ (vec_duplicate:SVE_ALL
+ (match_operand:<VEL> 1 "aarch64_sve_dup_operand")))
+ (clobber (scratch:<VPRED>))])]
+ "TARGET_SVE"
+ {
+ if (MEM_P (operands[1]))
+ {
+ rtx ptrue = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ emit_insn (gen_sve_ld1r<mode> (operands[0], ptrue, operands[1],
+ CONST0_RTX (<MODE>mode)));
+ DONE;
+ }
+ }
+)
+
+;; Accept memory operands for the benefit of combine, and also in case
+;; the scalar input gets spilled to memory during RA. We want to split
+;; the load at the first opportunity in order to allow the PTRUE to be
+;; optimized with surrounding code.
+(define_insn_and_split "*vec_duplicate<mode>_reg"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w, w")
+ (vec_duplicate:SVE_ALL
+ (match_operand:<VEL> 1 "aarch64_sve_dup_operand" "r, w, Uty")))
+ (clobber (match_scratch:<VPRED> 2 "=X, X, Upl"))]
+ "TARGET_SVE"
+ "@
+ mov\t%0.<Vetype>, %<vwcore>1
+ mov\t%0.<Vetype>, %<Vetype>1
+ #"
+ "&& MEM_P (operands[1])"
+ [(const_int 0)]
+ {
+ if (GET_CODE (operands[2]) == SCRATCH)
+ operands[2] = gen_reg_rtx (<VPRED>mode);
+ emit_move_insn (operands[2], CONSTM1_RTX (<VPRED>mode));
+ emit_insn (gen_sve_ld1r<mode> (operands[0], operands[2], operands[1],
+ CONST0_RTX (<MODE>mode)));
+ DONE;
+ }
+ [(set_attr "length" "4,4,8")]
+)
+
+;; This is used for vec_duplicate<mode>s from memory, but can also
+;; be used by combine to optimize selects of a vec_duplicate<mode>
+;; with zero.
+(define_insn "sve_ld1r<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (vec_duplicate:SVE_ALL
+ (match_operand:<VEL> 2 "aarch64_sve_ld1r_operand" "Uty"))
+ (match_operand:SVE_ALL 3 "aarch64_simd_imm_zero")]
+ UNSPEC_SEL))]
+ "TARGET_SVE"
+ "ld1r<Vesize>\t%0.<Vetype>, %1/z, %2"
+)
+
+;; Load 128 bits from memory and duplicate to fill a vector. Since there
+;; are so few operations on 128-bit "elements", we don't define a VNx1TI
+;; and simply use vectors of bytes instead.
+(define_insn "sve_ld1rq"
+ [(set (match_operand:VNx16QI 0 "register_operand" "=w")
+ (unspec:VNx16QI
+ [(match_operand:VNx16BI 1 "register_operand" "Upl")
+ (match_operand:TI 2 "aarch64_sve_ld1r_operand" "Uty")]
+ UNSPEC_LD1RQ))]
+ "TARGET_SVE"
+ "ld1rqb\t%0.b, %1/z, %2"
+)
+
+;; Implement a predicate broadcast by shifting the low bit of the scalar
+;; input into the top bit and using a WHILELO. An alternative would be to
+;; duplicate the input and do a compare with zero.
+(define_expand "vec_duplicate<mode>"
+ [(set (match_operand:PRED_ALL 0 "register_operand")
+ (vec_duplicate:PRED_ALL (match_operand 1 "register_operand")))]
+ "TARGET_SVE"
+ {
+ rtx tmp = gen_reg_rtx (DImode);
+ rtx op1 = gen_lowpart (DImode, operands[1]);
+ emit_insn (gen_ashldi3 (tmp, op1, gen_int_mode (63, DImode)));
+ emit_insn (gen_while_ultdi<mode> (operands[0], const0_rtx, tmp));
+ DONE;
+ }
+)
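+
+;; For example (hypothetical register choices), broadcasting a boolean held
+;; in w0 to a predicate of .s elements might expand to something like:
+;;   lsl x1, x0, #63
+;;   whilelo p0.s, xzr, x1
+;; giving an all-true predicate when bit 0 of w0 is set and an all-false
+;; predicate otherwise.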
+
+(define_insn "vec_series<mode>"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w")
+ (vec_series:SVE_I
+ (match_operand:<VEL> 1 "aarch64_sve_index_operand" "Usi, r, r")
+ (match_operand:<VEL> 2 "aarch64_sve_index_operand" "r, Usi, r")))]
+ "TARGET_SVE"
+ "@
+ index\t%0.<Vetype>, #%1, %<vw>2
+ index\t%0.<Vetype>, %<vw>1, #%2
+ index\t%0.<Vetype>, %<vw>1, %<vw>2"
+)
+
+;; Optimize {x, x, x, x, ...} + {0, n, 2*n, 3*n, ...} if n is in range
+;; of an INDEX instruction.
+(define_insn "*vec_series<mode>_plus"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (plus:SVE_I
+ (vec_duplicate:SVE_I
+ (match_operand:<VEL> 1 "register_operand" "r"))
+ (match_operand:SVE_I 2 "immediate_operand")))]
+ "TARGET_SVE && aarch64_check_zero_based_sve_index_immediate (operands[2])"
+ {
+ operands[2] = aarch64_check_zero_based_sve_index_immediate (operands[2]);
+ return "index\t%0.<Vetype>, %<vw>1, #%2";
+ }
+)
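+
+;; For example, {x, x, x, ...} + {0, 4, 8, ...} on .s elements might be
+;; emitted as "index z0.s, w0, #4" (register choices are illustrative only).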
+
+(define_expand "vec_perm<mode>"
+ [(match_operand:SVE_ALL 0 "register_operand")
+ (match_operand:SVE_ALL 1 "register_operand")
+ (match_operand:SVE_ALL 2 "register_operand")
+ (match_operand:<V_INT_EQUIV> 3 "aarch64_sve_vec_perm_operand")]
+ "TARGET_SVE && GET_MODE_NUNITS (<MODE>mode).is_constant ()"
+ {
+ aarch64_expand_sve_vec_perm (operands[0], operands[1],
+ operands[2], operands[3]);
+ DONE;
+ }
+)
+
+(define_insn "*aarch64_sve_tbl<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL
+ [(match_operand:SVE_ALL 1 "register_operand" "w")
+ (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
+ UNSPEC_TBL))]
+ "TARGET_SVE"
+ "tbl\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_<perm_insn><perm_hilo><mode>"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (unspec:PRED_ALL [(match_operand:PRED_ALL 1 "register_operand" "Upa")
+ (match_operand:PRED_ALL 2 "register_operand" "Upa")]
+ PERMUTE))]
+ "TARGET_SVE"
+ "<perm_insn><perm_hilo>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_<perm_insn><perm_hilo><mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")
+ (match_operand:SVE_ALL 2 "register_operand" "w")]
+ PERMUTE))]
+ "TARGET_SVE"
+ "<perm_insn><perm_hilo>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_rev64<mode>"
+ [(set (match_operand:SVE_BHS 0 "register_operand" "=w")
+ (unspec:SVE_BHS
+ [(match_operand:VNx2BI 1 "register_operand" "Upl")
+ (unspec:SVE_BHS [(match_operand:SVE_BHS 2 "register_operand" "w")]
+ UNSPEC_REV64)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "rev<Vesize>\t%0.d, %1/m, %2.d"
+)
+
+(define_insn "*aarch64_sve_rev32<mode>"
+ [(set (match_operand:SVE_BH 0 "register_operand" "=w")
+ (unspec:SVE_BH
+ [(match_operand:VNx4BI 1 "register_operand" "Upl")
+ (unspec:SVE_BH [(match_operand:SVE_BH 2 "register_operand" "w")]
+ UNSPEC_REV32)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "rev<Vesize>\t%0.s, %1/m, %2.s"
+)
+
+(define_insn "*aarch64_sve_rev16vnx16qi"
+ [(set (match_operand:VNx16QI 0 "register_operand" "=w")
+ (unspec:VNx16QI
+ [(match_operand:VNx8BI 1 "register_operand" "Upl")
+ (unspec:VNx16QI [(match_operand:VNx16QI 2 "register_operand" "w")]
+ UNSPEC_REV16)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "revb\t%0.h, %1/m, %2.h"
+)
+
+(define_insn "*aarch64_sve_rev<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")]
+ UNSPEC_REV))]
+ "TARGET_SVE"
+ "rev\t%0.<Vetype>, %1.<Vetype>")
+
+(define_insn "*aarch64_sve_dup_lane<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (vec_duplicate:SVE_ALL
+ (vec_select:<VEL>
+ (match_operand:SVE_ALL 1 "register_operand" "w")
+ (parallel [(match_operand:SI 2 "const_int_operand")]))))]
+ "TARGET_SVE
+ && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 63)"
+ "dup\t%0.<Vetype>, %1.<Vetype>[%2]"
+)
+
+;; Note that the immediate (third) operand is the lane index, not
+;; the byte index.
+(define_insn "*aarch64_sve_ext<mode>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "0")
+ (match_operand:SVE_ALL 2 "register_operand" "w")
+ (match_operand:SI 3 "const_int_operand")]
+ UNSPEC_EXT))]
+ "TARGET_SVE
+ && IN_RANGE (INTVAL (operands[3]) * GET_MODE_SIZE (<VEL>mode), 0, 255)"
+ {
+ operands[3] = GEN_INT (INTVAL (operands[3]) * GET_MODE_SIZE (<VEL>mode));
+ return "ext\\t%0.b, %0.b, %2.b, #%3";
+ }
+)
+
+(define_insn "add<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, w")
+ (plus:SVE_I
+ (match_operand:SVE_I 1 "register_operand" "%0, 0, 0, w")
+ (match_operand:SVE_I 2 "aarch64_sve_add_operand" "vsa, vsn, vsi, w")))]
+ "TARGET_SVE"
+ "@
+ add\t%0.<Vetype>, %0.<Vetype>, #%D2
+ sub\t%0.<Vetype>, %0.<Vetype>, #%N2
+ * return aarch64_output_sve_inc_dec_immediate (\"%0.<Vetype>\", operands[2]);
+ add\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
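+
+;; For example, adding 1 to every .s element can use the immediate form
+;; "add z0.s, z0.s, #1", while vector-length-dependent constants are
+;; handled by the INC/DEC alternative via
+;; aarch64_output_sve_inc_dec_immediate (illustrative sketch only).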
+
+(define_insn "sub<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (minus:SVE_I
+ (match_operand:SVE_I 1 "aarch64_sve_arith_operand" "w, vsa")
+ (match_operand:SVE_I 2 "register_operand" "w, 0")))]
+ "TARGET_SVE"
+ "@
+ sub\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>
+ subr\t%0.<Vetype>, %0.<Vetype>, #%D1"
+)
+
+;; Unpredicated multiplication.
+(define_expand "mul<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (unspec:SVE_I
+ [(match_dup 3)
+ (mult:SVE_I
+ (match_operand:SVE_I 1 "register_operand")
+ (match_operand:SVE_I 2 "aarch64_sve_mul_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Multiplication predicated with a PTRUE. We don't actually need the
+;; predicate for the first alternative, but using Upa or X isn't likely
+;; to gain much and would make the instruction seem less uniform to the
+;; register allocator.
+(define_insn "*mul<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (mult:SVE_I
+ (match_operand:SVE_I 2 "register_operand" "%0, 0")
+ (match_operand:SVE_I 3 "aarch64_sve_mul_operand" "vsm, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ mul\t%0.<Vetype>, %0.<Vetype>, #%3
+ mul\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+(define_insn "*madd<mode>"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (plus:SVE_I
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
+ (match_operand:SVE_I 3 "register_operand" "w, w"))]
+ UNSPEC_MERGE_PTRUE)
+ (match_operand:SVE_I 4 "register_operand" "w, 0")))]
+ "TARGET_SVE"
+ "@
+ mad\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+ mla\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+)
+
+(define_insn "*msub<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (minus:SVE_I
+ (match_operand:SVE_I 4 "register_operand" "w, 0")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
+ (match_operand:SVE_I 3 "register_operand" "w, w"))]
+ UNSPEC_MERGE_PTRUE)))]
+ "TARGET_SVE"
+ "@
+ msb\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+ mls\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated NEG, NOT and POPCOUNT.
+(define_expand "<optab><mode>2"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (unspec:SVE_I
+ [(match_dup 2)
+ (SVE_INT_UNARY:SVE_I (match_operand:SVE_I 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; NEG, NOT and POPCOUNT predicated with a PTRUE.
+(define_insn "*<optab><mode>2"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (SVE_INT_UNARY:SVE_I
+ (match_operand:SVE_I 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Vector AND, ORR and XOR.
+(define_insn "<optab><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (LOGICAL:SVE_I
+ (match_operand:SVE_I 1 "register_operand" "%0, w")
+ (match_operand:SVE_I 2 "aarch64_sve_logical_operand" "vsl, w")))]
+ "TARGET_SVE"
+ "@
+ <logical>\t%0.<Vetype>, %0.<Vetype>, #%C2
+ <logical>\t%0.d, %1.d, %2.d"
+)
+
+;; Vector AND, ORR and XOR on floating-point modes. We avoid subregs
+;; by providing this, but we need to use UNSPECs since rtx logical ops
+;; aren't defined for floating-point modes.
+(define_insn "*<optab><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand" "w")
+ (match_operand:SVE_F 2 "register_operand" "w")]
+ LOGICALF))]
+ "TARGET_SVE"
+ "<logicalf_op>\t%0.d, %1.d, %2.d"
+)
+
+;; REG_EQUAL notes on "not<mode>3" should ensure that we can generate
+;; this pattern even though the NOT instruction itself is predicated.
+(define_insn "bic<mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (and:SVE_I
+ (not:SVE_I (match_operand:SVE_I 1 "register_operand" "w"))
+ (match_operand:SVE_I 2 "register_operand" "w")))]
+ "TARGET_SVE"
+ "bic\t%0.d, %2.d, %1.d"
+)
+
+;; Predicate AND. We can reuse one of the inputs as the GP.
+(define_insn "and<mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa")
+ (match_operand:PRED_ALL 2 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "and\t%0.b, %1/z, %1.b, %2.b"
+)
+
+;; Unpredicated predicate ORR and XOR.
+(define_expand "<optab><mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand")
+ (and:PRED_ALL
+ (LOGICAL_OR:PRED_ALL
+ (match_operand:PRED_ALL 1 "register_operand")
+ (match_operand:PRED_ALL 2 "register_operand"))
+ (match_dup 3)))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+ }
+)
+
+;; Predicated predicate ORR and XOR.
+(define_insn "pred_<optab><mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL
+ (LOGICAL:PRED_ALL
+ (match_operand:PRED_ALL 2 "register_operand" "Upa")
+ (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "<logical>\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Perform a logical operation on operands 2 and 3, using operand 1 as
+;; the GP (which is known to be a PTRUE). Store the result in operand 0
+;; and set the flags in the same way as for PTEST. The (and ...) in the
+;; UNSPEC_PTEST_PTRUE is logically redundant, but means that the tested
+;; value is structurally equivalent to the rhs of the second set.
+(define_insn "*<optab><mode>3_cc"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI [(match_operand:PRED_ALL 1 "register_operand" "Upa")
+ (and:PRED_ALL
+ (LOGICAL:PRED_ALL
+ (match_operand:PRED_ALL 2 "register_operand" "Upa")
+ (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+ (match_dup 1))]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))
+ (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
+ (match_dup 1)))]
+ "TARGET_SVE"
+ "<logical>s\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Unpredicated predicate inverse.
+(define_expand "one_cmpl<mode>2"
+ [(set (match_operand:PRED_ALL 0 "register_operand")
+ (and:PRED_ALL
+ (not:PRED_ALL (match_operand:PRED_ALL 1 "register_operand"))
+ (match_dup 2)))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+ }
+)
+
+;; Predicated predicate inverse.
+(define_insn "*one_cmpl<mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL
+ (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "not\t%0.b, %1/z, %2.b"
+)
+
+;; Predicated predicate BIC and ORN.
+(define_insn "*<nlogical><mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL
+ (NLOGICAL:PRED_ALL
+ (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+ (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "<nlogical>\t%0.b, %1/z, %3.b, %2.b"
+)
+
+;; Predicated predicate NAND and NOR.
+(define_insn "*<logical_nn><mode>3"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (and:PRED_ALL
+ (NLOGICAL:PRED_ALL
+ (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+ (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa")))
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+ "TARGET_SVE"
+ "<logical_nn>\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Unpredicated LSL, LSR and ASR by a vector.
+(define_expand "v<optab><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (unspec:SVE_I
+ [(match_dup 3)
+ (ASHIFT:SVE_I
+ (match_operand:SVE_I 1 "register_operand")
+ (match_operand:SVE_I 2 "aarch64_sve_<lr>shift_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; LSL, LSR and ASR by a vector, predicated with a PTRUE. We don't
+;; actually need the predicate for the first alternative, but using Upa
+;; or X isn't likely to gain much and would make the instruction seem
+;; less uniform to the register allocator.
+(define_insn "*v<optab><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (ASHIFT:SVE_I
+ (match_operand:SVE_I 2 "register_operand" "w, 0")
+ (match_operand:SVE_I 3 "aarch64_sve_<lr>shift_operand" "D<lr>, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ <shift>\t%0.<Vetype>, %2.<Vetype>, #%3
+ <shift>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; LSL, LSR and ASR by a scalar, which expands into one of the vector
+;; shifts above.
+(define_expand "<ASHIFT:optab><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (ASHIFT:SVE_I (match_operand:SVE_I 1 "register_operand")
+ (match_operand:<VEL> 2 "general_operand")))]
+ "TARGET_SVE"
+ {
+ rtx amount;
+ if (CONST_INT_P (operands[2]))
+ {
+ amount = gen_const_vec_duplicate (<MODE>mode, operands[2]);
+ if (!aarch64_sve_<lr>shift_operand (operands[2], <MODE>mode))
+ amount = force_reg (<MODE>mode, amount);
+ }
+ else
+ {
+ amount = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_vec_duplicate<mode> (amount,
+ convert_to_mode (<VEL>mode,
+ operands[2], 0)));
+ }
+ emit_insn (gen_v<optab><mode>3 (operands[0], operands[1], amount));
+ DONE;
+ }
+)
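+
+;; As a sketch (hypothetical registers): shifting each .s element left by a
+;; constant 3 can use the immediate form "lsl z0.s, z1.s, #3", whereas
+;; shifting by a value in w1 first broadcasts the amount, e.g.:
+;;   mov z1.s, w1
+;;   lsl z0.s, p0/m, z0.s, z1.s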
+
+;; Test all bits of operand 1. Operand 0 is a GP that is known to be a PTRUE.
+;;
+;; Using UNSPEC_PTEST_PTRUE allows combine patterns to assume that the GP
+;; is a PTRUE even if the optimizers haven't yet been able to propagate
+;; the constant. We would use a separate unspec code for PTESTs involving
+;; GPs that might not be PTRUEs.
+(define_insn "ptest_ptrue<mode>"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI [(match_operand:PRED_ALL 0 "register_operand" "Upa")
+ (match_operand:PRED_ALL 1 "register_operand" "Upa")]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))]
+ "TARGET_SVE"
+ "ptest\t%0, %1.b"
+)
+
+;; Set element I of the result if operand1 + J < operand2 for all J in [0, I],
+;; with the comparison being unsigned.
+(define_insn "while_ult<GPI:mode><PRED_ALL:mode>"
+ [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (unspec:PRED_ALL [(match_operand:GPI 1 "aarch64_reg_or_zero" "rZ")
+ (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")]
+ UNSPEC_WHILE_LO))
+ (clobber (reg:CC CC_REGNUM))]
+ "TARGET_SVE"
+ "whilelo\t%0.<PRED_ALL:Vetype>, %<w>1, %<w>2"
+)
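+
+;; For example, "whilelo p0.s, wzr, w1" with w1 equal to 3 sets the first
+;; three .s elements of p0 and clears the rest, which is one way a loop can
+;; build the predicate for the elements that remain to be processed
+;; (illustrative example only).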
+
+;; WHILELO sets the flags in the same way as a PTEST with a PTRUE GP.
+;; Handle the case in which both results are useful. The GP operand
+;; to the PTEST isn't needed, so we allow it to be anything.
+(define_insn_and_split "while_ult<GPI:mode><PRED_ALL:mode>_cc"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI [(match_operand:PRED_ALL 1)
+ (unspec:PRED_ALL
+ [(match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")
+ (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")]
+ UNSPEC_WHILE_LO)]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))
+ (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+ (unspec:PRED_ALL [(match_dup 2)
+ (match_dup 3)]
+ UNSPEC_WHILE_LO))]
+ "TARGET_SVE"
+ "whilelo\t%0.<PRED_ALL:Vetype>, %<w>2, %<w>3"
+ ;; Force the compiler to drop the unused predicate operand, so that we
+ ;; don't have an unnecessary PTRUE.
+ "&& !CONSTANT_P (operands[1])"
+ [(const_int 0)]
+ {
+ emit_insn (gen_while_ult<GPI:mode><PRED_ALL:mode>_cc
+ (operands[0], CONSTM1_RTX (<MODE>mode),
+ operands[2], operands[3]));
+ DONE;
+ }
+)
+
+;; Predicated integer comparison.
+(define_insn "*vec_cmp<cmp_op>_<mode>"
+ [(set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+ (unspec:<VPRED>
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (match_operand:SVE_I 2 "register_operand" "w, w")
+ (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+ SVE_COND_INT_CMP))
+ (clobber (reg:CC CC_REGNUM))]
+ "TARGET_SVE"
+ "@
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated integer comparison in which only the flags result is interesting.
+(define_insn "*vec_cmp<cmp_op>_<mode>_ptest"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (unspec:<VPRED>
+ [(match_dup 1)
+ (match_operand:SVE_I 2 "register_operand" "w, w")
+ (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+ SVE_COND_INT_CMP)]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))
+ (clobber (match_scratch:<VPRED> 0 "=Upa, Upa"))]
+ "TARGET_SVE"
+ "@
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated comparison in which both the flag and predicate results
+;; are interesting.
+(define_insn "*vec_cmp<cmp_op>_<mode>_cc"
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC
+ (unspec:SI
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (unspec:<VPRED>
+ [(match_dup 1)
+ (match_operand:SVE_I 2 "register_operand" "w, w")
+ (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+ SVE_COND_INT_CMP)]
+ UNSPEC_PTEST_PTRUE)
+ (const_int 0)))
+ (set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+ (unspec:<VPRED>
+ [(match_dup 1)
+ (match_dup 2)
+ (match_dup 3)]
+ SVE_COND_INT_CMP))]
+ "TARGET_SVE"
+ "@
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+ cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated floating-point comparison (excluding FCMUO, which doesn't
+;; allow #0.0 as an operand).
+(define_insn "*vec_fcm<cmp_op><mode>"
+ [(set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+ (unspec:<VPRED>
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (match_operand:SVE_F 2 "register_operand" "w, w")
+ (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w")]
+ SVE_COND_FP_CMP))]
+ "TARGET_SVE"
+ "@
+ fcm<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #0.0
+ fcm<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated FCMUO.
+(define_insn "*vec_fcmuo<mode>"
+ [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+ (unspec:<VPRED>
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_F 2 "register_operand" "w")
+ (match_operand:SVE_F 3 "register_operand" "w")]
+ UNSPEC_COND_UO))]
+ "TARGET_SVE"
+ "fcmuo\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; vcond_mask operand order: true, false, mask
+;; UNSPEC_SEL operand order: mask, true, false (as for VEC_COND_EXPR)
+;; SEL operand order: mask, true, false
+(define_insn "vcond_mask_<mode><vpred>"
+ [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+ (unspec:SVE_ALL
+ [(match_operand:<VPRED> 3 "register_operand" "Upa")
+ (match_operand:SVE_ALL 1 "register_operand" "w")
+ (match_operand:SVE_ALL 2 "register_operand" "w")]
+ UNSPEC_SEL))]
+ "TARGET_SVE"
+ "sel\t%0.<Vetype>, %3, %1.<Vetype>, %2.<Vetype>"
+)
+
+;; Selects between a duplicated immediate and zero.
+(define_insn "aarch64_sve_dup<mode>_const"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_I 2 "aarch64_sve_dup_immediate")
+ (match_operand:SVE_I 3 "aarch64_simd_imm_zero")]
+ UNSPEC_SEL))]
+ "TARGET_SVE"
+ "mov\t%0.<Vetype>, %1/z, #%2"
+)
+
+;; Integer (signed) vcond. Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to aarch64_expand_sve_vcond instead.
+(define_expand "vcond<mode><v_int_equiv>"
+ [(set (match_operand:SVE_ALL 0 "register_operand")
+ (if_then_else:SVE_ALL
+ (match_operator 3 "comparison_operator"
+ [(match_operand:<V_INT_EQUIV> 4 "register_operand")
+ (match_operand:<V_INT_EQUIV> 5 "nonmemory_operand")])
+ (match_operand:SVE_ALL 1 "register_operand")
+ (match_operand:SVE_ALL 2 "register_operand")))]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vcond (<MODE>mode, <V_INT_EQUIV>mode, operands);
+ DONE;
+ }
+)
+
+;; Integer vcondu. Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to aarch64_expand_sve_vcond instead.
+(define_expand "vcondu<mode><v_int_equiv>"
+ [(set (match_operand:SVE_ALL 0 "register_operand")
+ (if_then_else:SVE_ALL
+ (match_operator 3 "comparison_operator"
+ [(match_operand:<V_INT_EQUIV> 4 "register_operand")
+ (match_operand:<V_INT_EQUIV> 5 "nonmemory_operand")])
+ (match_operand:SVE_ALL 1 "register_operand")
+ (match_operand:SVE_ALL 2 "register_operand")))]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vcond (<MODE>mode, <V_INT_EQUIV>mode, operands);
+ DONE;
+ }
+)
+
+;; Floating-point vcond. All comparisons except FCMUO allow a zero
+;; operand; aarch64_expand_sve_vcond handles the case of an FCMUO
+;; with zero.
+(define_expand "vcond<mode><v_fp_equiv>"
+ [(set (match_operand:SVE_SD 0 "register_operand")
+ (if_then_else:SVE_SD
+ (match_operator 3 "comparison_operator"
+ [(match_operand:<V_FP_EQUIV> 4 "register_operand")
+ (match_operand:<V_FP_EQUIV> 5 "aarch64_simd_reg_or_zero")])
+ (match_operand:SVE_SD 1 "register_operand")
+ (match_operand:SVE_SD 2 "register_operand")))]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vcond (<MODE>mode, <V_FP_EQUIV>mode, operands);
+ DONE;
+ }
+)
+
+;; Signed integer comparisons. Don't enforce an immediate range here, since
+;; it depends on the comparison; leave it to aarch64_expand_sve_vec_cmp_int
+;; instead.
+(define_expand "vec_cmp<mode><vpred>"
+ [(parallel
+ [(set (match_operand:<VPRED> 0 "register_operand")
+ (match_operator:<VPRED> 1 "comparison_operator"
+ [(match_operand:SVE_I 2 "register_operand")
+ (match_operand:SVE_I 3 "nonmemory_operand")]))
+ (clobber (reg:CC CC_REGNUM))])]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vec_cmp_int (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+ DONE;
+ }
+)
+
+;; Unsigned integer comparisons. Don't enforce an immediate range here, since
+;; it depends on the comparison; leave it to aarch64_expand_sve_vec_cmp_int
+;; instead.
+(define_expand "vec_cmpu<mode><vpred>"
+ [(parallel
+ [(set (match_operand:<VPRED> 0 "register_operand")
+ (match_operator:<VPRED> 1 "comparison_operator"
+ [(match_operand:SVE_I 2 "register_operand")
+ (match_operand:SVE_I 3 "nonmemory_operand")]))
+ (clobber (reg:CC CC_REGNUM))])]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vec_cmp_int (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+ DONE;
+ }
+)
+
+;; Floating-point comparisons. All comparisons except FCMUO allow a zero
+;; operand; aarch64_expand_sve_vec_cmp_float handles the case of an FCMUO
+;; with zero.
+(define_expand "vec_cmp<mode><vpred>"
+ [(set (match_operand:<VPRED> 0 "register_operand")
+ (match_operator:<VPRED> 1 "comparison_operator"
+ [(match_operand:SVE_F 2 "register_operand")
+ (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero")]))]
+ "TARGET_SVE"
+ {
+ aarch64_expand_sve_vec_cmp_float (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3], false);
+ DONE;
+ }
+)
+
+;; Branch based on predicate equality or inequality.
+(define_expand "cbranch<mode>4"
+ [(set (pc)
+ (if_then_else
+ (match_operator 0 "aarch64_equality_operator"
+ [(match_operand:PRED_ALL 1 "register_operand")
+ (match_operand:PRED_ALL 2 "aarch64_simd_reg_or_zero")])
+ (label_ref (match_operand 3 ""))
+ (pc)))]
+ ""
+ {
+ rtx ptrue = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+ rtx pred;
+ if (operands[2] == CONST0_RTX (<MODE>mode))
+ pred = operands[1];
+ else
+ {
+ pred = gen_reg_rtx (<MODE>mode);
+ emit_insn (gen_pred_xor<mode>3 (pred, ptrue, operands[1],
+ operands[2]));
+ }
+ emit_insn (gen_ptest_ptrue<mode> (ptrue, pred));
+ operands[1] = gen_rtx_REG (CCmode, CC_REGNUM);
+ operands[2] = const0_rtx;
+ }
+)
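+
+;; A sketch of the expected expansion (hypothetical registers): branching
+;; on p1 != p2 might become something like:
+;;   ptrue p0.b, all
+;;   eor p3.b, p0/z, p1.b, p2.b
+;;   ptest p0, p3.b
+;;   b.any .Ltarget
+;; with the EOR omitted when the comparison is against an all-false constant.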
+
+;; Unpredicated integer MIN/MAX.
+(define_expand "<su><maxmin><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand")
+ (unspec:SVE_I
+ [(match_dup 3)
+ (MAXMIN:SVE_I (match_operand:SVE_I 1 "register_operand")
+ (match_operand:SVE_I 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Integer MIN/MAX predicated with a PTRUE.
+(define_insn "*<su><maxmin><mode>3"
+ [(set (match_operand:SVE_I 0 "register_operand" "=w")
+ (unspec:SVE_I
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (MAXMIN:SVE_I (match_operand:SVE_I 2 "register_operand" "%0")
+ (match_operand:SVE_I 3 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<su><maxmin>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point MIN/MAX.
+(define_expand "<su><maxmin><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (FMAXMIN:SVE_F (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point MIN/MAX predicated with a PTRUE.
+(define_insn "*<su><maxmin><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FMAXMIN:SVE_F (match_operand:SVE_F 2 "register_operand" "%0")
+ (match_operand:SVE_F 3 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "f<maxmin>nm\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated fmin/fmax.
+(define_expand "<maxmin_uns><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand")]
+ FMAXMIN_UNS)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fmin/fmax predicated with a PTRUE.
+(define_insn "*<maxmin_uns><mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (unspec:SVE_F [(match_operand:SVE_F 2 "register_operand" "%0")
+ (match_operand:SVE_F 3 "register_operand" "w")]
+ FMAXMIN_UNS)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<maxmin_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated integer add reduction.
+(define_expand "reduc_plus_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (unspec:<VEL> [(match_dup 2)
+ (match_operand:SVE_I 1 "register_operand")]
+ UNSPEC_ADDV))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Predicated integer add reduction. The result is always 64 bits.
+(define_insn "*reduc_plus_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_I 2 "register_operand" "w")]
+ UNSPEC_ADDV))]
+ "TARGET_SVE"
+ "uaddv\t%d0, %1, %2.<Vetype>"
+)
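+
+;; For example, a .b reduction might be emitted as "uaddv d0, p0, z1.b";
+;; the scalar result occupies the low 64 bits of the destination register
+;; whatever the element size (register choices are illustrative only).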
+
+;; Unpredicated floating-point add reduction.
+(define_expand "reduc_plus_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (unspec:<VEL> [(match_dup 2)
+ (match_operand:SVE_F 1 "register_operand")]
+ UNSPEC_FADDV))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Predicated floating-point add reduction.
+(define_insn "*reduc_plus_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_F 2 "register_operand" "w")]
+ UNSPEC_FADDV))]
+ "TARGET_SVE"
+ "faddv\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated integer MIN/MAX reduction.
+(define_expand "reduc_<maxmin_uns>_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (unspec:<VEL> [(match_dup 2)
+ (match_operand:SVE_I 1 "register_operand")]
+ MAXMINV))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Predicated integer MIN/MAX reduction.
+(define_insn "*reduc_<maxmin_uns>_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_I 2 "register_operand" "w")]
+ MAXMINV))]
+ "TARGET_SVE"
+ "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated floating-point MIN/MAX reduction.
+(define_expand "reduc_<maxmin_uns>_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand")
+ (unspec:<VEL> [(match_dup 2)
+ (match_operand:SVE_F 1 "register_operand")]
+ FMAXMINV))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Predicated floating-point MIN/MAX reduction.
+(define_insn "*reduc_<maxmin_uns>_scal_<mode>"
+ [(set (match_operand:<VEL> 0 "register_operand" "=w")
+ (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (match_operand:SVE_F 2 "register_operand" "w")]
+ FMAXMINV))]
+ "TARGET_SVE"
+ "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated floating-point addition.
+(define_expand "add<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (plus:SVE_F
+ (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "aarch64_sve_float_arith_with_sub_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point addition predicated with a PTRUE.
+(define_insn "*add<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl")
+ (plus:SVE_F
+ (match_operand:SVE_F 2 "register_operand" "%0, 0, w")
+ (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+ fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
+ fadd\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point subtraction.
+(define_expand "sub<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (minus:SVE_F
+ (match_operand:SVE_F 1 "aarch64_sve_float_arith_operand")
+ (match_operand:SVE_F 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point subtraction predicated with a PTRUE.
+(define_insn "*sub<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl, Upl")
+ (minus:SVE_F
+ (match_operand:SVE_F 2 "aarch64_sve_float_arith_operand" "0, 0, vsA, w")
+ (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, 0, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE
+ && (register_operand (operands[2], <MODE>mode)
+ || register_operand (operands[3], <MODE>mode))"
+ "@
+ fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+ fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
+ fsubr\t%0.<Vetype>, %1/m, %0.<Vetype>, #%2
+ fsub\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point multiplication.
+(define_expand "mul<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (mult:SVE_F
+ (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "aarch64_sve_float_mul_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point multiplication predicated with a PTRUE.
+(define_insn "*mul<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (mult:SVE_F
+ (match_operand:SVE_F 2 "register_operand" "%0, w")
+ (match_operand:SVE_F 3 "aarch64_sve_float_mul_operand" "vsM, w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fmul\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+ fmul\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated fma (%0 = (%1 * %2) + %3).
+(define_expand "fma<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 4)
+ (fma:SVE_F (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand")
+ (match_operand:SVE_F 3 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fma predicated with a PTRUE.
+(define_insn "*fma<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (fma:SVE_F (match_operand:SVE_F 3 "register_operand" "%0, w")
+ (match_operand:SVE_F 4 "register_operand" "w, w")
+ (match_operand:SVE_F 2 "register_operand" "w, 0"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fmad\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+ fmla\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fnma (%0 = (-%1 * %2) + %3).
+(define_expand "fnma<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 4)
+ (fma:SVE_F (neg:SVE_F
+ (match_operand:SVE_F 1 "register_operand"))
+ (match_operand:SVE_F 2 "register_operand")
+ (match_operand:SVE_F 3 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fnma predicated with a PTRUE.
+(define_insn "*fnma<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (fma:SVE_F (neg:SVE_F
+ (match_operand:SVE_F 3 "register_operand" "%0, w"))
+ (match_operand:SVE_F 4 "register_operand" "w, w")
+ (match_operand:SVE_F 2 "register_operand" "w, 0"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fmsb\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+ fmls\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fms (%0 = (%1 * %2) - %3).
+(define_expand "fms<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 4)
+ (fma:SVE_F (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand")
+ (neg:SVE_F
+ (match_operand:SVE_F 3 "register_operand")))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fms predicated with a PTRUE.
+(define_insn "*fms<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (fma:SVE_F (match_operand:SVE_F 3 "register_operand" "%0, w")
+ (match_operand:SVE_F 4 "register_operand" "w, w")
+ (neg:SVE_F
+ (match_operand:SVE_F 2 "register_operand" "w, 0")))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fnmsb\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+ fnmls\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fnms (%0 = (-%1 * %2) - %3).
+(define_expand "fnms<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 4)
+ (fma:SVE_F (neg:SVE_F
+ (match_operand:SVE_F 1 "register_operand"))
+ (match_operand:SVE_F 2 "register_operand")
+ (neg:SVE_F
+ (match_operand:SVE_F 3 "register_operand")))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; fnms predicated with a PTRUE.
+(define_insn "*fnms<mode>4"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (fma:SVE_F (neg:SVE_F
+ (match_operand:SVE_F 3 "register_operand" "%0, w"))
+ (match_operand:SVE_F 4 "register_operand" "w, w")
+ (neg:SVE_F
+ (match_operand:SVE_F 2 "register_operand" "w, 0")))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fnmad\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+ fnmla\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated floating-point division.
+(define_expand "div<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 3)
+ (div:SVE_F (match_operand:SVE_F 1 "register_operand")
+ (match_operand:SVE_F 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Floating-point division predicated with a PTRUE.
+(define_insn "*div<mode>3"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+ (div:SVE_F (match_operand:SVE_F 2 "register_operand" "0, w")
+ (match_operand:SVE_F 3 "register_operand" "w, 0"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "@
+ fdiv\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+ fdivr\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"
+)
+
+;; Unpredicated FNEG, FABS and FSQRT.
+(define_expand "<optab><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 2)
+ (SVE_FP_UNARY:SVE_F (match_operand:SVE_F 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; FNEG, FABS and FSQRT predicated with a PTRUE.
+(define_insn "*<optab><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (SVE_FP_UNARY:SVE_F (match_operand:SVE_F 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Unpredicated FRINTy.
+(define_expand "<frint_pattern><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 2)
+ (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand")]
+ FRINT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; FRINTy predicated with a PTRUE.
+(define_insn "*<frint_pattern><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand" "=w")
+ (unspec:SVE_F
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (unspec:SVE_F [(match_operand:SVE_F 2 "register_operand" "w")]
+ FRINT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "frint<frint_suffix>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Unpredicated conversion of floats to integers of the same size (HF to HI,
+;; SF to SI or DF to DI).
+(define_expand "<fix_trunc_optab><mode><v_int_equiv>2"
+ [(set (match_operand:<V_INT_EQUIV> 0 "register_operand")
+ (unspec:<V_INT_EQUIV>
+ [(match_dup 2)
+ (FIXUORS:<V_INT_EQUIV>
+ (match_operand:SVE_F 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Conversion of HF to DI, SI or HI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>vnx8hf<mode>2"
+ [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
+ (unspec:SVE_HSDI
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FIXUORS:SVE_HSDI
+ (match_operand:VNx8HF 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvtz<su>\t%0.<Vetype>, %1/m, %2.h"
+)
+
+;; Conversion of SF to DI or SI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>vnx4sf<mode>2"
+ [(set (match_operand:SVE_SDI 0 "register_operand" "=w")
+ (unspec:SVE_SDI
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FIXUORS:SVE_SDI
+ (match_operand:VNx4SF 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvtz<su>\t%0.<Vetype>, %1/m, %2.s"
+)
+
+;; Conversion of DF to DI or SI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>vnx2df<mode>2"
+ [(set (match_operand:SVE_SDI 0 "register_operand" "=w")
+ (unspec:SVE_SDI
+ [(match_operand:VNx2BI 1 "register_operand" "Upl")
+ (FIXUORS:SVE_SDI
+ (match_operand:VNx2DF 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvtz<su>\t%0.<Vetype>, %1/m, %2.d"
+)
+
+;; Unpredicated conversion of integers to floats of the same size
+;; (HI to HF, SI to SF or DI to DF).
+(define_expand "<optab><v_int_equiv><mode>2"
+ [(set (match_operand:SVE_F 0 "register_operand")
+ (unspec:SVE_F
+ [(match_dup 2)
+ (FLOATUORS:SVE_F
+ (match_operand:<V_INT_EQUIV> 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+ }
+)
+
+;; Conversion of DI, SI or HI to the same number of HFs, predicated
+;; with a PTRUE.
+(define_insn "*<optab><mode>vnx8hf2"
+ [(set (match_operand:VNx8HF 0 "register_operand" "=w")
+ (unspec:VNx8HF
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FLOATUORS:VNx8HF
+ (match_operand:SVE_HSDI 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<su_optab>cvtf\t%0.h, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DI or SI to the same number of SFs, predicated with a PTRUE.
+(define_insn "*<optab><mode>vnx4sf2"
+ [(set (match_operand:VNx4SF 0 "register_operand" "=w")
+ (unspec:VNx4SF
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+ (FLOATUORS:VNx4SF
+ (match_operand:SVE_SDI 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<su_optab>cvtf\t%0.s, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DI or SI to DF, predicated with a PTRUE.
+(define_insn "*<optab><mode>vnx2df2"
+ [(set (match_operand:VNx2DF 0 "register_operand" "=w")
+ (unspec:VNx2DF
+ [(match_operand:VNx2BI 1 "register_operand" "Upl")
+ (FLOATUORS:VNx2DF
+ (match_operand:SVE_SDI 2 "register_operand" "w"))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "<su_optab>cvtf\t%0.d, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DFs to the same number of SFs, or SFs to the same number
+;; of HFs.
+(define_insn "*trunc<Vwide><mode>2"
+ [(set (match_operand:SVE_HSF 0 "register_operand" "=w")
+ (unspec:SVE_HSF
+ [(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")
+ (unspec:SVE_HSF
+ [(match_operand:<VWIDE> 2 "register_operand" "w")]
+ UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvt\t%0.<Vetype>, %1/m, %2.<Vewtype>"
+)
+
+;; Conversion of SFs to the same number of DFs, or HFs to the same number
+;; of SFs.
+(define_insn "*extend<mode><Vwide>2"
+ [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+ (unspec:<VWIDE>
+ [(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")
+ (unspec:<VWIDE>
+ [(match_operand:SVE_HSF 2 "register_operand" "w")]
+ UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ "fcvt\t%0.<Vewtype>, %1/m, %2.<Vetype>"
+)
+
+;; PUNPKHI and PUNPKLO.
+(define_insn "vec_unpack<su>_<perm_hilo>_<mode>"
+ [(set (match_operand:<VWIDE> 0 "register_operand" "=Upa")
+ (unspec:<VWIDE> [(match_operand:PRED_BHS 1 "register_operand" "Upa")]
+ UNPACK))]
+ "TARGET_SVE"
+ "punpk<perm_hilo>\t%0.h, %1.b"
+)
+
+;; SUNPKHI, UUNPKHI, SUNPKLO and UUNPKLO.
+(define_insn "vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>"
+ [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+ (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")]
+ UNPACK))]
+ "TARGET_SVE"
+ "<su>unpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"
+)
+
+;; Used by the vec_unpacks_<perm_hilo>_<mode> expander to unpack the bit
+;; representation of a VNx4SF or VNx8HF without conversion. The choice
+;; between signed and unsigned isn't significant.
+(define_insn "*vec_unpacku_<perm_hilo>_<mode>_no_convert"
+ [(set (match_operand:SVE_HSF 0 "register_operand" "=w")
+ (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand" "w")]
+ UNPACK_UNSIGNED))]
+ "TARGET_SVE"
+ "uunpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"
+)
+
+;; Unpack one half of a VNx4SF to VNx2DF, or one half of a VNx8HF to VNx4SF.
+;; First unpack the source without conversion, then float-convert the
+;; unpacked source.
+(define_expand "vec_unpacks_<perm_hilo>_<mode>"
+ [(set (match_dup 2)
+ (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand")]
+ UNPACK_UNSIGNED))
+ (set (match_operand:<VWIDE> 0 "register_operand")
+ (unspec:<VWIDE> [(match_dup 3)
+ (unspec:<VWIDE> [(match_dup 2)] UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = gen_reg_rtx (<MODE>mode);
+ operands[3] = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));
+ }
+)
+
+;; Unpack one half of a VNx4SI to VNx2DF. First unpack from VNx4SI
+;; to VNx2DI, reinterpret the VNx2DI as a VNx4SI, then convert the
+;; unpacked VNx4SI to VNx2DF.
+(define_expand "vec_unpack<su_optab>_float_<perm_hilo>_vnx4si"
+ [(set (match_dup 2)
+ (unspec:VNx2DI [(match_operand:VNx4SI 1 "register_operand")]
+ UNPACK_UNSIGNED))
+ (set (match_operand:VNx2DF 0 "register_operand")
+ (unspec:VNx2DF [(match_dup 3)
+ (FLOATUORS:VNx2DF (match_dup 4))]
+ UNSPEC_MERGE_PTRUE))]
+ "TARGET_SVE"
+ {
+ operands[2] = gen_reg_rtx (VNx2DImode);
+ operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));
+ operands[4] = gen_rtx_SUBREG (VNx4SImode, operands[2], 0);
+ }
+)
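+
+;; As an illustrative sketch (hypothetical registers), converting the low
+;; half of a vector of signed SIs might produce something like:
+;;   uunpklo z1.d, z0.s
+;;   scvtf z1.d, p0/m, z1.s
+;; where the SCVTF reads only the low 32 bits of each 64-bit container,
+;; so the choice between signed and unsigned unpack does not matter.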
+
+;; Predicate pack. Use UZP1 on the narrower type, which discards
+;; the high part of each wide element.
+(define_insn "vec_pack_trunc_<Vwide>"
+ [(set (match_operand:PRED_BHS 0 "register_operand" "=Upa")
+ (unspec:PRED_BHS
+ [(match_operand:<VWIDE> 1 "register_operand" "Upa")
+ (match_operand:<VWIDE> 2 "register_operand" "Upa")]
+ UNSPEC_PACK))]
+ "TARGET_SVE"
+ "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+;; Integer pack. Use UZP1 on the narrower type, which discards
+;; the high part of each wide element.
+(define_insn "vec_pack_trunc_<Vwide>"
+ [(set (match_operand:SVE_BHSI 0 "register_operand" "=w")
+ (unspec:SVE_BHSI
+ [(match_operand:<VWIDE> 1 "register_operand" "w")
+ (match_operand:<VWIDE> 2 "register_operand" "w")]
+ UNSPEC_PACK))]
+ "TARGET_SVE"
+ "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+;; Convert two vectors of DF to SF, or two vectors of SF to HF, and pack
+;; the results into a single vector.
+(define_expand "vec_pack_trunc_<Vwide>"
+ [(set (match_dup 4)
+ (unspec:SVE_HSF
+ [(match_dup 3)
+ (unspec:SVE_HSF [(match_operand:<VWIDE> 1 "register_operand")]
+ UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))
+ (set (match_dup 5)
+ (unspec:SVE_HSF
+ [(match_dup 3)
+ (unspec:SVE_HSF [(match_operand:<VWIDE> 2 "register_operand")]
+ UNSPEC_FLOAT_CONVERT)]
+ UNSPEC_MERGE_PTRUE))
+ (set (match_operand:SVE_HSF 0 "register_operand")
+ (unspec:SVE_HSF [(match_dup 4) (match_dup 5)] UNSPEC_UZP1))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));
+ operands[4] = gen_reg_rtx (<MODE>mode);
+ operands[5] = gen_reg_rtx (<MODE>mode);
+ }
+)
+
+;; Convert two vectors of DF to SI and pack the results into a single vector.
+(define_expand "vec_pack_<su>fix_trunc_vnx2df"
+ [(set (match_dup 4)
+ (unspec:VNx4SI
+ [(match_dup 3)
+ (FIXUORS:VNx4SI (match_operand:VNx2DF 1 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))
+ (set (match_dup 5)
+ (unspec:VNx4SI
+ [(match_dup 3)
+ (FIXUORS:VNx4SI (match_operand:VNx2DF 2 "register_operand"))]
+ UNSPEC_MERGE_PTRUE))
+ (set (match_operand:VNx4SI 0 "register_operand")
+ (unspec:VNx4SI [(match_dup 4) (match_dup 5)] UNSPEC_UZP1))]
+ "TARGET_SVE"
+ {
+ operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));
+ operands[4] = gen_reg_rtx (VNx4SImode);
+ operands[5] = gen_reg_rtx (VNx4SImode);
+ }
+)
#include "sched-int.h"
#include "target-globals.h"
#include "common/common-target.h"
+#include "cfgrtl.h"
#include "selftest.h"
#include "selftest-rtl.h"
+#include "rtx-vector-builder.h"
/* This file should be included last. */
#include "target-def.h"
simd_immediate_info (scalar_int_mode, unsigned HOST_WIDE_INT,
insn_type = MOV, modifier_type = LSL,
unsigned int = 0);
+ simd_immediate_info (scalar_mode, rtx, rtx);
/* The mode of the elements. */
scalar_mode elt_mode;
- /* The value of each element. */
+ /* The value of each element if all elements are the same, or the
+ first value if the constant is a series. */
rtx value;
+ /* The value of the step if the constant is a series, null otherwise. */
+ rtx step;
+
/* The instruction to use to move the immediate into a vector. */
insn_type insn;
ELT_MODE_IN and value VALUE_IN. */
inline simd_immediate_info
::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in)
- : elt_mode (elt_mode_in), value (value_in), insn (MOV),
+ : elt_mode (elt_mode_in), value (value_in), step (NULL_RTX), insn (MOV),
modifier (LSL), shift (0)
{}
insn_type insn_in, modifier_type modifier_in,
unsigned int shift_in)
: elt_mode (elt_mode_in), value (gen_int_mode (value_in, elt_mode_in)),
- insn (insn_in), modifier (modifier_in), shift (shift_in)
+ step (NULL_RTX), insn (insn_in), modifier (modifier_in), shift (shift_in)
+{}
+
+/* Construct an integer immediate in which each element has mode ELT_MODE_IN
+ and where element I is equal to VALUE_IN + I * STEP_IN. */
+inline simd_immediate_info
+::simd_immediate_info (scalar_mode elt_mode_in, rtx value_in, rtx step_in)
+ : elt_mode (elt_mode_in), value (value_in), step (step_in), insn (MOV),
+ modifier (LSL), shift (0)
{}
/* The current code model. */
enum aarch64_code_model aarch64_cmodel;
+/* The number of 64-bit elements in an SVE vector. */
+poly_uint16 aarch64_sve_vg;
+
#ifdef HAVE_AS_TLS
#undef TARGET_HAVE_TLS
#define TARGET_HAVE_TLS 1
const_tree type,
int misalignment,
bool is_packed);
-static machine_mode
-aarch64_simd_container_mode (scalar_mode mode, unsigned width);
+static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
static bool aarch64_print_ldpstp_address (FILE *, machine_mode, rtx);
/* Major revision number of the ARM Architecture implemented by the target. */
return AARCH64_DWARF_SP;
else if (FP_REGNUM_P (regno))
return AARCH64_DWARF_V0 + regno - V0_REGNUM;
+ else if (PR_REGNUM_P (regno))
+ return AARCH64_DWARF_P0 + regno - P0_REGNUM;
+ else if (regno == VG_REGNUM)
+ return AARCH64_DWARF_VG;
/* Return values >= DWARF_FRAME_REGISTERS indicate that there is no
equivalent DWARF register. */
return DWARF_FRAME_REGISTERS;
}
-/* Return TRUE if MODE is any of the large INT modes. */
+/* Return true if MODE is any of the Advanced SIMD structure modes. */
+static bool
+aarch64_advsimd_struct_mode_p (machine_mode mode)
+{
+ return (TARGET_SIMD
+ && (mode == OImode || mode == CImode || mode == XImode));
+}
+
+/* Return true if MODE is an SVE predicate mode. */
+static bool
+aarch64_sve_pred_mode_p (machine_mode mode)
+{
+ return (TARGET_SVE
+ && (mode == VNx16BImode
+ || mode == VNx8BImode
+ || mode == VNx4BImode
+ || mode == VNx2BImode));
+}
+
+/* Three mutually-exclusive flags describing a vector or predicate type. */
+const unsigned int VEC_ADVSIMD = 1;
+const unsigned int VEC_SVE_DATA = 2;
+const unsigned int VEC_SVE_PRED = 4;
+/* Can be used in combination with VEC_ADVSIMD or VEC_SVE_DATA to indicate
+ a structure of 2, 3 or 4 vectors. */
+const unsigned int VEC_STRUCT = 8;
+/* Useful combinations of the above. */
+const unsigned int VEC_ANY_SVE = VEC_SVE_DATA | VEC_SVE_PRED;
+const unsigned int VEC_ANY_DATA = VEC_ADVSIMD | VEC_SVE_DATA;
+
+/* Return a set of flags describing the vector properties of mode MODE.
+ Ignore modes that are not supported by the current target. */
+static unsigned int
+aarch64_classify_vector_mode (machine_mode mode)
+{
+ if (aarch64_advsimd_struct_mode_p (mode))
+ return VEC_ADVSIMD | VEC_STRUCT;
+
+ if (aarch64_sve_pred_mode_p (mode))
+ return VEC_SVE_PRED;
+
+ scalar_mode inner = GET_MODE_INNER (mode);
+ if (VECTOR_MODE_P (mode)
+ && (inner == QImode
+ || inner == HImode
+ || inner == HFmode
+ || inner == SImode
+ || inner == SFmode
+ || inner == DImode
+ || inner == DFmode))
+ {
+ if (TARGET_SVE
+ && known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR))
+ return VEC_SVE_DATA;
+
+ /* This includes V1DF but not V1DI (which doesn't exist). */
+ if (TARGET_SIMD
+ && (known_eq (GET_MODE_BITSIZE (mode), 64)
+ || known_eq (GET_MODE_BITSIZE (mode), 128)))
+ return VEC_ADVSIMD;
+ }
+
+ return 0;
+}
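+
+/* For example, with the default (variable-length) SVE configuration:
+   V4SImode classifies as VEC_ADVSIMD, OImode as VEC_ADVSIMD | VEC_STRUCT,
+   VNx4SImode as VEC_SVE_DATA and VNx4BImode as VEC_SVE_PRED; modes that
+   the current target does not support classify as 0.  */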
+
+/* Return true if MODE is any of the data vector modes, including
+ structure modes. */
static bool
-aarch64_vect_struct_mode_p (machine_mode mode)
+aarch64_vector_data_mode_p (machine_mode mode)
{
- return mode == OImode || mode == CImode || mode == XImode;
+ return aarch64_classify_vector_mode (mode) & VEC_ANY_DATA;
}
-/* Return TRUE if MODE is any of the vector modes. */
+/* Return true if MODE is an SVE data vector mode; either a single vector
+ or a structure of vectors. */
static bool
-aarch64_vector_mode_p (machine_mode mode)
+aarch64_sve_data_mode_p (machine_mode mode)
{
- return aarch64_vector_mode_supported_p (mode)
- || aarch64_vect_struct_mode_p (mode);
+ return aarch64_classify_vector_mode (mode) & VEC_SVE_DATA;
}
/* Implement target hook TARGET_ARRAY_MODE_SUPPORTED_P. */
return false;
}
+/* Return the SVE predicate mode to use for elements that have
+ ELEM_NBYTES bytes, if such a mode exists. */
+
+opt_machine_mode
+aarch64_sve_pred_mode (unsigned int elem_nbytes)
+{
+ if (TARGET_SVE)
+ {
+ if (elem_nbytes == 1)
+ return VNx16BImode;
+ if (elem_nbytes == 2)
+ return VNx8BImode;
+ if (elem_nbytes == 4)
+ return VNx4BImode;
+ if (elem_nbytes == 8)
+ return VNx2BImode;
+ }
+ return opt_machine_mode ();
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE. */
+
+static opt_machine_mode
+aarch64_get_mask_mode (poly_uint64 nunits, poly_uint64 nbytes)
+{
+ if (TARGET_SVE && known_eq (nbytes, BYTES_PER_SVE_VECTOR))
+ {
+ unsigned int elem_nbytes = vector_element_size (nbytes, nunits);
+ machine_mode pred_mode;
+ if (aarch64_sve_pred_mode (elem_nbytes).exists (&pred_mode))
+ return pred_mode;
+ }
+
+ return default_get_mask_mode (nunits, nbytes);
+}
+
/* Implement TARGET_HARD_REGNO_NREGS. */
static unsigned int
{
case FP_REGS:
case FP_LO_REGS:
+ if (aarch64_sve_data_mode_p (mode))
+ return exact_div (GET_MODE_SIZE (mode),
+ BYTES_PER_SVE_VECTOR).to_constant ();
return CEIL (lowest_size, UNITS_PER_VREG);
+ case PR_REGS:
+ case PR_LO_REGS:
+ case PR_HI_REGS:
+ return 1;
default:
return CEIL (lowest_size, UNITS_PER_WORD);
}
if (GET_MODE_CLASS (mode) == MODE_CC)
return regno == CC_REGNUM;
+ if (regno == VG_REGNUM)
+ /* This must have the same size as _Unwind_Word. */
+ return mode == DImode;
+
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ if (vec_flags & VEC_SVE_PRED)
+ return PR_REGNUM_P (regno);
+
+ if (PR_REGNUM_P (regno))
+ return 0;
+
if (regno == SP_REGNUM)
/* The purpose of comparing with ptr_mode is to support the
global register variable associated with the stack pointer
if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)
return mode == Pmode;
- if (GP_REGNUM_P (regno) && ! aarch64_vect_struct_mode_p (mode))
+ if (GP_REGNUM_P (regno) && known_le (GET_MODE_SIZE (mode), 16))
return true;
if (FP_REGNUM_P (regno))
{
- if (aarch64_vect_struct_mode_p (mode))
+ if (vec_flags & VEC_STRUCT)
return end_hard_regno (mode, regno) - 1 <= V31_REGNUM;
else
- return true;
+ return !VECTOR_MODE_P (mode) || vec_flags != 0;
}
return false;
return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
}
+/* Implement REGMODE_NATURAL_SIZE. */
+poly_uint64
+aarch64_regmode_natural_size (machine_mode mode)
+{
+ /* The natural size for SVE data modes is one SVE data vector,
+ and similarly for predicates. We can't independently modify
+ anything smaller than that. */
+ /* ??? For now, only do this for variable-width SVE registers.
+ Doing it for constant-sized registers breaks lower-subreg.c. */
+ /* ??? And once that's fixed, we should probably have similar
+ code for Advanced SIMD. */
+ if (!aarch64_sve_vg.is_constant ())
+ {
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ if (vec_flags & VEC_SVE_PRED)
+ return BYTES_PER_SVE_PRED;
+ if (vec_flags & VEC_SVE_DATA)
+ return BYTES_PER_SVE_VECTOR;
+ }
+ return UNITS_PER_WORD;
+}
+
/* Implement HARD_REGNO_CALLER_SAVE_MODE. */
machine_mode
-aarch64_hard_regno_caller_save_mode (unsigned, unsigned, machine_mode mode)
-{
+aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
+ machine_mode mode)
+{
+ /* The predicate mode determines which bits are significant and
+ which are "don't care". Decreasing the number of lanes would
+ lose data while increasing the number of lanes would make bits
+ unnecessarily significant. */
+ if (PR_REGNUM_P (regno))
+ return mode;
if (known_ge (GET_MODE_SIZE (mode), 4))
return mode;
else
}
}
+/* Return true if we can move VALUE into a register using a single
+ CNT[BHWD] instruction. */
+
+static bool
+aarch64_sve_cnt_immediate_p (poly_int64 value)
+{
+ HOST_WIDE_INT factor = value.coeffs[0];
+ /* The coefficient must be [1, 16] * {2, 4, 8, 16}. */
+ return (value.coeffs[1] == factor
+ && IN_RANGE (factor, 2, 16 * 16)
+ && (factor & 1) == 0
+ && factor <= 16 * (factor & -factor));
+}
+
+/* Likewise for rtx X. */
+
+bool
+aarch64_sve_cnt_immediate_p (rtx x)
+{
+ poly_int64 value;
+ return poly_int_rtx_p (x, &value) && aarch64_sve_cnt_immediate_p (value);
+}
+
+/* Return the asm string for an instruction with a CNT-like vector size
+ operand (a vector pattern followed by a multiplier in the range [1, 16]).
+ PREFIX is the mnemonic without the size suffix and OPERANDS is the
+ first part of the operands template (the part that comes before the
+ vector size itself). FACTOR is the number of quadwords.
+ NELTS_PER_VQ, if nonzero, is the number of elements in each quadword.
+ If it is zero, we can use any element size. */
+
+static char *
+aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands,
+ unsigned int factor,
+ unsigned int nelts_per_vq)
+{
+ static char buffer[sizeof ("sqincd\t%x0, %w0, all, mul #16")];
+
+ if (nelts_per_vq == 0)
+ /* There is some overlap in the ranges of the four CNT instructions.
+ Here we always use the smallest possible element size, so that the
+       multiplier is 1 wherever possible.  */
+ nelts_per_vq = factor & -factor;
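+  /* Map the number of elements per quadword to the size suffix:
+     2 -> "d", 4 -> "w", 8 -> "h", 16 -> "b".  */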
+ int shift = std::min (exact_log2 (nelts_per_vq), 4);
+ gcc_assert (IN_RANGE (shift, 1, 4));
+ char suffix = "dwhb"[shift - 1];
+
+ factor >>= shift;
+ unsigned int written;
+ if (factor == 1)
+ written = snprintf (buffer, sizeof (buffer), "%s%c\t%s",
+ prefix, suffix, operands);
+ else
+ written = snprintf (buffer, sizeof (buffer), "%s%c\t%s, all, mul #%d",
+ prefix, suffix, operands, factor);
+ gcc_assert (written < sizeof (buffer));
+ return buffer;
+}
+
+/* Return the asm string for an instruction with a CNT-like vector size
+ operand (a vector pattern followed by a multiplier in the range [1, 16]).
+ PREFIX is the mnemonic without the size suffix and OPERANDS is the
+ first part of the operands template (the part that comes before the
+ vector size itself). X is the value of the vector size operand,
+ as a polynomial integer rtx. */
+
+char *
+aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands,
+ rtx x)
+{
+ poly_int64 value = rtx_to_poly_int64 (x);
+ gcc_assert (aarch64_sve_cnt_immediate_p (value));
+ return aarch64_output_sve_cnt_immediate (prefix, operands,
+ value.coeffs[1], 0);
+}
+
+/* Return true if we can add VALUE to a register using a single ADDVL
+ or ADDPL instruction. */
+
+static bool
+aarch64_sve_addvl_addpl_immediate_p (poly_int64 value)
+{
+ HOST_WIDE_INT factor = value.coeffs[0];
+ if (factor == 0 || value.coeffs[1] != factor)
+ return false;
+ /* FACTOR counts VG / 2, so a value of 2 is one predicate width
+ and a value of 16 is one vector width. */
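+  /* ADDVL accepts multiples of the vector width in [-32, 31] and ADDPL
+     accepts multiples of the predicate width in [-32, 31].  */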
+ return (((factor & 15) == 0 && IN_RANGE (factor, -32 * 16, 31 * 16))
+ || ((factor & 1) == 0 && IN_RANGE (factor, -32 * 2, 31 * 2)));
+}
+
+/* Likewise for rtx X. */
+
+bool
+aarch64_sve_addvl_addpl_immediate_p (rtx x)
+{
+ poly_int64 value;
+ return (poly_int_rtx_p (x, &value)
+ && aarch64_sve_addvl_addpl_immediate_p (value));
+}
+
+/* Return the asm string for adding ADDVL or ADDPL immediate OFFSET
+   to operand 1 and storing the result in operand 0.  */
+
+char *
+aarch64_output_sve_addvl_addpl (rtx dest, rtx base, rtx offset)
+{
+ static char buffer[sizeof ("addpl\t%x0, %x1, #-") + 3 * sizeof (int)];
+ poly_int64 offset_value = rtx_to_poly_int64 (offset);
+ gcc_assert (aarch64_sve_addvl_addpl_immediate_p (offset_value));
+
+ /* Use INC or DEC if possible. */
+ if (rtx_equal_p (dest, base) && GP_REGNUM_P (REGNO (dest)))
+ {
+ if (aarch64_sve_cnt_immediate_p (offset_value))
+ return aarch64_output_sve_cnt_immediate ("inc", "%x0",
+ offset_value.coeffs[1], 0);
+ if (aarch64_sve_cnt_immediate_p (-offset_value))
+ return aarch64_output_sve_cnt_immediate ("dec", "%x0",
+ -offset_value.coeffs[1], 0);
+ }
+
+ int factor = offset_value.coeffs[1];
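+  /* A factor that is a multiple of 16 is a whole number of vector
+     widths, so use ADDVL; otherwise it is a whole number of predicate
+     widths and ADDPL is used instead.  */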
+ if ((factor & 15) == 0)
+ snprintf (buffer, sizeof (buffer), "addvl\t%%x0, %%x1, #%d", factor / 16);
+ else
+ snprintf (buffer, sizeof (buffer), "addpl\t%%x0, %%x1, #%d", factor / 2);
+ return buffer;
+}
+
+/* Return true if X is a valid immediate for an SVE vector INC or DEC
+ instruction. If it is, store the number of elements in each vector
+ quadword in *NELTS_PER_VQ_OUT (if nonnull) and store the multiplication
+ factor in *FACTOR_OUT (if nonnull). */
+
+bool
+aarch64_sve_inc_dec_immediate_p (rtx x, int *factor_out,
+ unsigned int *nelts_per_vq_out)
+{
+ rtx elt;
+ poly_int64 value;
+
+ if (!const_vec_duplicate_p (x, &elt)
+ || !poly_int_rtx_p (elt, &value))
+ return false;
+
+ unsigned int nelts_per_vq = 128 / GET_MODE_UNIT_BITSIZE (GET_MODE (x));
+ if (nelts_per_vq != 8 && nelts_per_vq != 4 && nelts_per_vq != 2)
+ /* There's no vector INCB. */
+ return false;
+
+ HOST_WIDE_INT factor = value.coeffs[0];
+ if (value.coeffs[1] != factor)
+ return false;
+
+ /* The coefficient must be [1, 16] * NELTS_PER_VQ. */
+ if ((factor % nelts_per_vq) != 0
+ || !IN_RANGE (abs (factor), nelts_per_vq, 16 * nelts_per_vq))
+ return false;
+
+ if (factor_out)
+ *factor_out = factor;
+ if (nelts_per_vq_out)
+ *nelts_per_vq_out = nelts_per_vq;
+ return true;
+}
+
+/* Return true if X is a valid immediate for an SVE vector INC or DEC
+ instruction. */
+
+bool
+aarch64_sve_inc_dec_immediate_p (rtx x)
+{
+ return aarch64_sve_inc_dec_immediate_p (x, NULL, NULL);
+}
+
+/* Return the asm template for an SVE vector INC or DEC instruction.
+ OPERANDS gives the operands before the vector count and X is the
+ value of the vector count operand itself. */
+
+char *
+aarch64_output_sve_inc_dec_immediate (const char *operands, rtx x)
+{
+ int factor;
+ unsigned int nelts_per_vq;
+ if (!aarch64_sve_inc_dec_immediate_p (x, &factor, &nelts_per_vq))
+ gcc_unreachable ();
+ if (factor < 0)
+ return aarch64_output_sve_cnt_immediate ("dec", operands, -factor,
+ nelts_per_vq);
+ else
+ return aarch64_output_sve_cnt_immediate ("inc", operands, factor,
+ nelts_per_vq);
+}
static int
aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
return num_insns;
}
+/* Return the number of temporary registers that aarch64_add_offset_1
+ would need to add OFFSET to a register. */
+
+static unsigned int
+aarch64_add_offset_1_temporaries (HOST_WIDE_INT offset)
+{
+ return abs_hwi (offset) < 0x1000000 ? 0 : 1;
+}
+
/* A subroutine of aarch64_add_offset. Set DEST to SRC + OFFSET for
a non-polynomial OFFSET. MODE is the mode of the addition.
FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should
}
}
+/* Return the number of temporary registers that aarch64_add_offset
+ would need to move OFFSET into a register or add OFFSET to a register;
+ ADD_P is true if we want the latter rather than the former. */
+
+static unsigned int
+aarch64_offset_temporaries (bool add_p, poly_int64 offset)
+{
+ /* This follows the same structure as aarch64_add_offset. */
+ if (add_p && aarch64_sve_addvl_addpl_immediate_p (offset))
+ return 0;
+
+ unsigned int count = 0;
+ HOST_WIDE_INT factor = offset.coeffs[1];
+ HOST_WIDE_INT constant = offset.coeffs[0] - factor;
+ poly_int64 poly_offset (factor, factor);
+ if (add_p && aarch64_sve_addvl_addpl_immediate_p (poly_offset))
+ /* Need one register for the ADDVL/ADDPL result. */
+ count += 1;
+ else if (factor != 0)
+ {
+ factor = abs (factor);
+ if (factor > 16 * (factor & -factor))
+ /* Need one register for the CNT result and one for the multiplication
+ factor. If necessary, the second temporary can be reused for the
+ constant part of the offset. */
+ return 2;
+ /* Need one register for the CNT result (which might then
+ be shifted). */
+ count += 1;
+ }
+ return count + aarch64_add_offset_1_temporaries (constant);
+}
+
+/* If X can be represented as a poly_int64, return the number
+ of temporaries that are required to add it to a register.
+ Return -1 otherwise. */
+
+int
+aarch64_add_offset_temporaries (rtx x)
+{
+ poly_int64 offset;
+ if (!poly_int_rtx_p (x, &offset))
+ return -1;
+ return aarch64_offset_temporaries (true, offset);
+}
+
/* Set DEST to SRC + OFFSET. MODE is the mode of the addition.
FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should
be set and CFA adjustments added to the generated instructions.
TEMP1, if nonnull, is a register of mode MODE that can be used as a
temporary if register allocation is already complete. This temporary
- register may overlap DEST but must not overlap SRC. If TEMP1 is known
- to hold abs (OFFSET), EMIT_MOVE_IMM can be set to false to avoid emitting
- the immediate again.
+ register may overlap DEST if !FRAME_RELATED_P but must not overlap SRC.
+ If TEMP1 is known to hold abs (OFFSET), EMIT_MOVE_IMM can be set to
+ false to avoid emitting the immediate again.
+
+ TEMP2, if nonnull, is a second temporary register that doesn't
+   overlap either DEST or SRC.
Since this function may be used to adjust the stack pointer, we must
ensure that it cannot cause transient stack deallocation (for example
static void
aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
- poly_int64 offset, rtx temp1, bool frame_related_p,
- bool emit_move_imm = true)
+ poly_int64 offset, rtx temp1, rtx temp2,
+ bool frame_related_p, bool emit_move_imm = true)
{
gcc_assert (emit_move_imm || temp1 != NULL_RTX);
gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src));
+ gcc_assert (temp1 == NULL_RTX
+ || !frame_related_p
+ || !reg_overlap_mentioned_p (temp1, dest));
+ gcc_assert (temp2 == NULL_RTX || !reg_overlap_mentioned_p (dest, temp2));
+
+ /* Try using ADDVL or ADDPL to add the whole value. */
+ if (src != const0_rtx && aarch64_sve_addvl_addpl_immediate_p (offset))
+ {
+ rtx offset_rtx = gen_int_mode (offset, mode);
+ rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx));
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ return;
+ }
+
+ /* Coefficient 1 is multiplied by the number of 128-bit blocks in an
+ SVE vector register, over and above the minimum size of 128 bits.
+ This is equivalent to half the value returned by CNTD with a
+ vector shape of ALL. */
+ HOST_WIDE_INT factor = offset.coeffs[1];
+ HOST_WIDE_INT constant = offset.coeffs[0] - factor;
+
+ /* Try using ADDVL or ADDPL to add the VG-based part. */
+ poly_int64 poly_offset (factor, factor);
+ if (src != const0_rtx
+ && aarch64_sve_addvl_addpl_immediate_p (poly_offset))
+ {
+ rtx offset_rtx = gen_int_mode (poly_offset, mode);
+ if (frame_related_p)
+ {
+ rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx));
+ RTX_FRAME_RELATED_P (insn) = true;
+ src = dest;
+ }
+ else
+ {
+ rtx addr = gen_rtx_PLUS (mode, src, offset_rtx);
+ src = aarch64_force_temporary (mode, temp1, addr);
+ temp1 = temp2;
+ temp2 = NULL_RTX;
+ }
+ }
+ /* Otherwise use a CNT-based sequence. */
+ else if (factor != 0)
+ {
+ /* Use a subtraction if we have a negative factor. */
+ rtx_code code = PLUS;
+ if (factor < 0)
+ {
+ factor = -factor;
+ code = MINUS;
+ }
+
+ /* Calculate CNTD * FACTOR / 2. First try to fold the division
+ into the multiplication. */
+ rtx val;
+ int shift = 0;
+ if (factor & 1)
+ /* Use a right shift by 1. */
+ shift = -1;
+ else
+ factor /= 2;
+ HOST_WIDE_INT low_bit = factor & -factor;
+ if (factor <= 16 * low_bit)
+ {
+ if (factor > 16 * 8)
+ {
+ /* "CNTB Xn, ALL, MUL #FACTOR" is out of range, so calculate
+ the value with the minimum multiplier and shift it into
+ position. */
+ int extra_shift = exact_log2 (low_bit);
+ shift += extra_shift;
+ factor >>= extra_shift;
+ }
+ val = gen_int_mode (poly_int64 (factor * 2, factor * 2), mode);
+ }
+ else
+ {
+ /* Use CNTD, then multiply it by FACTOR. */
+ val = gen_int_mode (poly_int64 (2, 2), mode);
+ val = aarch64_force_temporary (mode, temp1, val);
+
+ /* Go back to using a negative multiplication factor if we have
+ no register from which to subtract. */
+ if (code == MINUS && src == const0_rtx)
+ {
+ factor = -factor;
+ code = PLUS;
+ }
+ rtx coeff1 = gen_int_mode (factor, mode);
+ coeff1 = aarch64_force_temporary (mode, temp2, coeff1);
+ val = gen_rtx_MULT (mode, val, coeff1);
+ }
+
+ if (shift > 0)
+ {
+ /* Multiply by 1 << SHIFT. */
+ val = aarch64_force_temporary (mode, temp1, val);
+ val = gen_rtx_ASHIFT (mode, val, GEN_INT (shift));
+ }
+ else if (shift == -1)
+ {
+ /* Divide by 2. */
+ val = aarch64_force_temporary (mode, temp1, val);
+ val = gen_rtx_ASHIFTRT (mode, val, const1_rtx);
+ }
+
+ /* Calculate SRC +/- CNTD * FACTOR / 2. */
+ if (src != const0_rtx)
+ {
+ val = aarch64_force_temporary (mode, temp1, val);
+ val = gen_rtx_fmt_ee (code, mode, src, val);
+ }
+ else if (code == MINUS)
+ {
+ val = aarch64_force_temporary (mode, temp1, val);
+ val = gen_rtx_NEG (mode, val);
+ }
+
+ if (constant == 0 || frame_related_p)
+ {
+ rtx_insn *insn = emit_insn (gen_rtx_SET (dest, val));
+ if (frame_related_p)
+ {
+ RTX_FRAME_RELATED_P (insn) = true;
+ add_reg_note (insn, REG_CFA_ADJUST_CFA,
+ gen_rtx_SET (dest, plus_constant (Pmode, src,
+ poly_offset)));
+ }
+ src = dest;
+ if (constant == 0)
+ return;
+ }
+ else
+ {
+ src = aarch64_force_temporary (mode, temp1, val);
+ temp1 = temp2;
+ temp2 = NULL_RTX;
+ }
+
+ emit_move_imm = true;
+ }
- /* SVE support will go here. */
- HOST_WIDE_INT constant = offset.to_constant ();
aarch64_add_offset_1 (mode, dest, src, constant, temp1,
frame_related_p, emit_move_imm);
}
+/* Like aarch64_add_offset, but the offset is given as an rtx rather
+ than a poly_int64. */
+
+void
+aarch64_split_add_offset (scalar_int_mode mode, rtx dest, rtx src,
+ rtx offset_rtx, rtx temp1, rtx temp2)
+{
+ aarch64_add_offset (mode, dest, src, rtx_to_poly_int64 (offset_rtx),
+ temp1, temp2, false);
+}
+
/* Add DELTA to the stack pointer, marking the instructions frame-related.
TEMP1 is available as a temporary if nonnull. EMIT_MOVE_IMM is false
if TEMP1 already contains abs (DELTA). */
static inline void
-aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm)
+aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta, bool emit_move_imm)
{
aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, delta,
- temp1, true, emit_move_imm);
+ temp1, temp2, true, emit_move_imm);
}
/* Subtract DELTA from the stack pointer, marking the instructions
if nonnull. */
static inline void
-aarch64_sub_sp (rtx temp1, poly_int64 delta, bool frame_related_p)
+aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p)
{
aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
- temp1, frame_related_p);
+ temp1, temp2, frame_related_p);
}
-void
-aarch64_expand_mov_immediate (rtx dest, rtx imm)
+/* Set DEST to (vec_series BASE STEP). */
+
+static void
+aarch64_expand_vec_series (rtx dest, rtx base, rtx step)
{
machine_mode mode = GET_MODE (dest);
+ scalar_mode inner = GET_MODE_INNER (mode);
+
+ /* Each operand can be a register or an immediate in the range [-16, 15]. */
+ if (!aarch64_sve_index_immediate_p (base))
+ base = force_reg (inner, base);
+ if (!aarch64_sve_index_immediate_p (step))
+ step = force_reg (inner, step);
+
+ emit_set_insn (dest, gen_rtx_VEC_SERIES (mode, base, step));
+}
- gcc_assert (mode == SImode || mode == DImode);
+/* Try to duplicate SRC into SVE register DEST, given that SRC is an
+   integer of mode SRC_MODE.  Return true on success.  */
+
+static bool
+aarch64_expand_sve_widened_duplicate (rtx dest, scalar_int_mode src_mode,
+ rtx src)
+{
+ /* If the constant is smaller than 128 bits, we can do the move
+ using a vector of SRC_MODEs. */
+ if (src_mode != TImode)
+ {
+ poly_uint64 count = exact_div (GET_MODE_SIZE (GET_MODE (dest)),
+ GET_MODE_SIZE (src_mode));
+ machine_mode dup_mode = mode_for_vector (src_mode, count).require ();
+ emit_move_insn (gen_lowpart (dup_mode, dest),
+ gen_const_vec_duplicate (dup_mode, src));
+ return true;
+ }
+
+ /* The bytes are loaded in little-endian order, so do a byteswap on
+ big-endian targets. */
+ if (BYTES_BIG_ENDIAN)
+ {
+ src = simplify_unary_operation (BSWAP, src_mode, src, src_mode);
+ if (!src)
+	return false;
+ }
+
+ /* Use LD1RQ to load the 128 bits from memory. */
+ src = force_const_mem (src_mode, src);
+ if (!src)
+ return false;
+
+ /* Make sure that the address is legitimate. */
+ if (!aarch64_sve_ld1r_operand_p (src))
+ {
+ rtx addr = force_reg (Pmode, XEXP (src, 0));
+ src = replace_equiv_address (src, addr);
+ }
+
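+  /* View the destination as a vector of bytes, so that a single LD1RQ
+     pattern handles all element sizes.  */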
+ rtx ptrue = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode));
+ emit_insn (gen_sve_ld1rq (gen_lowpart (VNx16QImode, dest), ptrue, src));
+ return true;
+}
+
+/* Expand a move of general CONST_VECTOR SRC into DEST, given that it
+ isn't a simple duplicate or series. */
+
+static void
+aarch64_expand_sve_const_vector (rtx dest, rtx src)
+{
+ machine_mode mode = GET_MODE (src);
+ unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
+ unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
+ gcc_assert (npatterns > 1);
+
+ if (nelts_per_pattern == 1)
+ {
+      /* The constant is a repeating sequence of at least two elements,
+ where the repeating elements occupy no more than 128 bits.
+ Get an integer representation of the replicated value. */
+ unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
+ gcc_assert (int_bits <= 128);
+
+ scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+ rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0);
+ if (int_value
+ && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value))
+ return;
+ }
+
+ /* Expand each pattern individually. */
+ rtx_vector_builder builder;
+ auto_vec<rtx, 16> vectors (npatterns);
+ for (unsigned int i = 0; i < npatterns; ++i)
+ {
+ builder.new_vector (mode, 1, nelts_per_pattern);
+ for (unsigned int j = 0; j < nelts_per_pattern; ++j)
+ builder.quick_push (CONST_VECTOR_ELT (src, i + j * npatterns));
+ vectors.quick_push (force_reg (mode, builder.build ()));
+ }
+
+ /* Use permutes to interleave the separate vectors. */
+ while (npatterns > 1)
+ {
+ npatterns /= 2;
+ for (unsigned int i = 0; i < npatterns; ++i)
+ {
+ rtx tmp = (npatterns == 1 ? dest : gen_reg_rtx (mode));
+ rtvec v = gen_rtvec (2, vectors[i], vectors[i + npatterns]);
+ emit_set_insn (tmp, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));
+ vectors[i] = tmp;
+ }
+ }
+ gcc_assert (vectors[0] == dest);
+}
+
+/* Set DEST to immediate IMM. For SVE vector modes, GEN_VEC_DUPLICATE
+ is a pattern that can be used to set DEST to a replicated scalar
+ element. */
+
+void
+aarch64_expand_mov_immediate (rtx dest, rtx imm,
+ rtx (*gen_vec_duplicate) (rtx, rtx))
+{
+ machine_mode mode = GET_MODE (dest);
/* Check on what type of symbol it is. */
scalar_int_mode int_mode;
if ((GET_CODE (imm) == SYMBOL_REF
|| GET_CODE (imm) == LABEL_REF
- || GET_CODE (imm) == CONST)
+ || GET_CODE (imm) == CONST
+ || GET_CODE (imm) == CONST_POLY_INT)
&& is_a <scalar_int_mode> (mode, &int_mode))
{
- rtx mem, base, offset;
+ rtx mem;
+ poly_int64 offset;
+ HOST_WIDE_INT const_offset;
enum aarch64_symbol_type sty;
/* If we have (const (plus symbol offset)), separate out the offset
before we start classifying the symbol. */
- split_const (imm, &base, &offset);
+ rtx base = strip_offset (imm, &offset);
- sty = aarch64_classify_symbol (base, offset);
+ /* We must always add an offset involving VL separately, rather than
+ folding it into the relocation. */
+ if (!offset.is_constant (&const_offset))
+ {
+ if (base == const0_rtx && aarch64_sve_cnt_immediate_p (offset))
+ emit_insn (gen_rtx_SET (dest, imm));
+ else
+ {
+ /* Do arithmetic on 32-bit values if the result is smaller
+ than that. */
+ if (partial_subreg_p (int_mode, SImode))
+ {
+ /* It is invalid to do symbol calculations in modes
+ narrower than SImode. */
+ gcc_assert (base == const0_rtx);
+ dest = gen_lowpart (SImode, dest);
+ int_mode = SImode;
+ }
+ if (base != const0_rtx)
+ {
+ base = aarch64_force_temporary (int_mode, dest, base);
+ aarch64_add_offset (int_mode, dest, base, offset,
+ NULL_RTX, NULL_RTX, false);
+ }
+ else
+ aarch64_add_offset (int_mode, dest, base, offset,
+ dest, NULL_RTX, false);
+ }
+ return;
+ }
+
+ sty = aarch64_classify_symbol (base, const_offset);
switch (sty)
{
case SYMBOL_FORCE_TO_MEM:
- if (offset != const0_rtx
+ if (const_offset != 0
&& targetm.cannot_force_const_mem (int_mode, imm))
{
gcc_assert (can_create_pseudo_p ());
base = aarch64_force_temporary (int_mode, dest, base);
- aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
- NULL_RTX, false);
+ aarch64_add_offset (int_mode, dest, base, const_offset,
+ NULL_RTX, NULL_RTX, false);
return;
}
case SYMBOL_SMALL_GOT_4G:
case SYMBOL_TINY_GOT:
case SYMBOL_TINY_TLSIE:
- if (offset != const0_rtx)
+ if (const_offset != 0)
{
gcc_assert(can_create_pseudo_p ());
base = aarch64_force_temporary (int_mode, dest, base);
- aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
- NULL_RTX, false);
+ aarch64_add_offset (int_mode, dest, base, const_offset,
+ NULL_RTX, NULL_RTX, false);
return;
}
/* FALLTHRU */
if (!CONST_INT_P (imm))
{
- if (GET_CODE (imm) == HIGH)
+ rtx base, step, value;
+ if (GET_CODE (imm) == HIGH
+ || aarch64_simd_valid_immediate (imm, NULL))
emit_insn (gen_rtx_SET (dest, imm));
+ else if (const_vec_series_p (imm, &base, &step))
+ aarch64_expand_vec_series (dest, base, step);
+ else if (const_vec_duplicate_p (imm, &value))
+ {
+ /* If the constant is out of range of an SVE vector move,
+ load it from memory if we can, otherwise move it into
+ a register and use a DUP. */
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ rtx op = force_const_mem (inner_mode, value);
+ if (!op)
+ op = force_reg (inner_mode, value);
+ else if (!aarch64_sve_ld1r_operand_p (op))
+ {
+ rtx addr = force_reg (Pmode, XEXP (op, 0));
+ op = replace_equiv_address (op, addr);
+ }
+ emit_insn (gen_vec_duplicate (dest, op));
+ }
+ else if (GET_CODE (imm) == CONST_VECTOR
+ && !GET_MODE_NUNITS (GET_MODE (imm)).is_constant ())
+ aarch64_expand_sve_const_vector (dest, imm);
else
- {
+ {
rtx mem = force_const_mem (mode, imm);
gcc_assert (mem);
- emit_insn (gen_rtx_SET (dest, mem));
+ emit_move_insn (dest, mem);
}
return;
as_a <scalar_int_mode> (mode));
}
+/* Emit an SVE predicated move from SRC to DEST. PRED is a predicate
+ that is known to contain PTRUE. */
+
+void
+aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
+{
+ emit_insn (gen_rtx_SET (dest, gen_rtx_UNSPEC (GET_MODE (dest),
+ gen_rtvec (2, pred, src),
+ UNSPEC_MERGE_PTRUE)));
+}
+
+/* Expand a pre-RA SVE data move from SRC to DEST in which at least one
+ operand is in memory. In this case we need to use the predicated LD1
+ and ST1 instead of LDR and STR, both for correctness on big-endian
+ targets and because LD1 and ST1 support a wider range of addressing modes.
+ PRED_MODE is the mode of the predicate.
+
+ See the comment at the head of aarch64-sve.md for details about the
+ big-endian handling. */
+
+void
+aarch64_expand_sve_mem_move (rtx dest, rtx src, machine_mode pred_mode)
+{
+ machine_mode mode = GET_MODE (dest);
+ rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+ if (!register_operand (src, mode)
+ && !register_operand (dest, mode))
+ {
+ rtx tmp = gen_reg_rtx (mode);
+ if (MEM_P (src))
+ aarch64_emit_sve_pred_move (tmp, ptrue, src);
+ else
+ emit_move_insn (tmp, src);
+ src = tmp;
+ }
+ aarch64_emit_sve_pred_move (dest, ptrue, src);
+}
+
static bool
aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
tree exp ATTRIBUTE_UNUSED)
return MIN (MAX (alignment, PARM_BOUNDARY), STACK_BOUNDARY);
}
+/* Implement TARGET_GET_RAW_RESULT_MODE and TARGET_GET_RAW_ARG_MODE. */
+
+static fixed_size_mode
+aarch64_get_reg_raw_mode (int regno)
+{
+ if (TARGET_SVE && FP_REGNUM_P (regno))
+ /* Don't use the SVE part of the register for __builtin_apply and
+ __builtin_return. The SVE registers aren't used by the normal PCS,
+ so using them there would be a waste of time. The PCS extensions
+ for SVE types are fundamentally incompatible with the
+ __builtin_return/__builtin_apply interface. */
+ return as_a <fixed_size_mode> (V16QImode);
+ return default_get_reg_raw_mode (regno);
+}
+
/* Implement TARGET_FUNCTION_ARG_PADDING.
Small aggregate types are placed in the lowest memory address.
}
}
+/* Return true if OFFSET is a signed 4-bit value multiplied by the size
+ of MODE. */
+
+static inline bool
+offset_4bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+{
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, -8, 7));
+}
+
+/* Return true if OFFSET is an unsigned 6-bit value multiplied by the size
+ of MODE. */
+
+static inline bool
+offset_6bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
+{
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, 0, 63));
+}
+
+/* Return true if OFFSET is a signed 7-bit value multiplied by the size
+ of MODE. */
+
+bool
+aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+{
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, -64, 63));
+}
+
+/* Return true if OFFSET is a signed 9-bit value. */
+
static inline bool
offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
poly_int64 offset)
&& IN_RANGE (const_offset, -256, 255));
}
+/* Return true if OFFSET is a signed 9-bit value multiplied by the size
+ of MODE. */
+
static inline bool
-offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
+offset_9bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
{
HOST_WIDE_INT multiple;
return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
- && IN_RANGE (multiple, 0, 4095));
+ && IN_RANGE (multiple, -256, 255));
}
-bool
-aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+/* Return true if OFFSET is an unsigned 12-bit value multiplied by the size
+ of MODE. */
+
+static inline bool
+offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
{
HOST_WIDE_INT multiple;
return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
- && IN_RANGE (multiple, -64, 63));
+ && IN_RANGE (multiple, 0, 4095));
}
/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */
cfun->machine->reg_is_wrapped_separately[regno] = true;
}
+/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
+ is saved at BASE + OFFSET. */
+
+static void
+aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
+ rtx base, poly_int64 offset)
+{
+ rtx mem = gen_frame_mem (DImode, plus_constant (Pmode, base, offset));
+ add_reg_note (insn, REG_CFA_EXPRESSION,
+ gen_rtx_SET (mem, regno_reg_rtx[reg]));
+}
+
/* AArch64 stack frames generated by this compiler look like:
+-------------------------------+
rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
- aarch64_sub_sp (ip0_rtx, initial_adjust, true);
+ aarch64_sub_sp (ip0_rtx, ip1_rtx, initial_adjust, true);
if (callee_adjust != 0)
aarch64_push_regs (reg1, reg2, callee_adjust);
if (emit_frame_chain)
{
+ poly_int64 reg_offset = callee_adjust;
if (callee_adjust == 0)
- aarch64_save_callee_saves (DImode, callee_offset, R29_REGNUM,
- R30_REGNUM, false);
+ {
+ reg1 = R29_REGNUM;
+ reg2 = R30_REGNUM;
+ reg_offset = callee_offset;
+ aarch64_save_callee_saves (DImode, reg_offset, reg1, reg2, false);
+ }
aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
- stack_pointer_rtx, callee_offset, ip1_rtx,
- frame_pointer_needed);
+ stack_pointer_rtx, callee_offset,
+ ip1_rtx, ip0_rtx, frame_pointer_needed);
+ if (frame_pointer_needed && !frame_size.is_constant ())
+ {
+ /* Variable-sized frames need to describe the save slot
+ address using DW_CFA_expression rather than DW_CFA_offset.
+ This means that, without taking further action, the
+ locations of the registers that we've already saved would
+ remain based on the stack pointer even after we redefine
+ the CFA based on the frame pointer. We therefore need new
+ DW_CFA_expressions to re-express the save slots with addresses
+ based on the frame pointer. */
+ rtx_insn *insn = get_last_insn ();
+ gcc_assert (RTX_FRAME_RELATED_P (insn));
+
+ /* Add an explicit CFA definition if this was previously
+ implicit. */
+ if (!find_reg_note (insn, REG_CFA_ADJUST_CFA, NULL_RTX))
+ {
+ rtx src = plus_constant (Pmode, stack_pointer_rtx,
+ callee_offset);
+ add_reg_note (insn, REG_CFA_ADJUST_CFA,
+ gen_rtx_SET (hard_frame_pointer_rtx, src));
+ }
+
+ /* Change the save slot expressions for the registers that
+ we've already saved. */
+ reg_offset -= callee_offset;
+ aarch64_add_cfa_expression (insn, reg2, hard_frame_pointer_rtx,
+ reg_offset + UNITS_PER_WORD);
+ aarch64_add_cfa_expression (insn, reg1, hard_frame_pointer_rtx,
+ reg_offset);
+ }
emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
}
callee_adjust != 0 || emit_frame_chain);
aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
callee_adjust != 0 || emit_frame_chain);
- aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
+ aarch64_sub_sp (ip1_rtx, ip0_rtx, final_adjust, !frame_pointer_needed);
}
/* Return TRUE if we can use a simple_return insn.
unsigned reg2 = cfun->machine->frame.wb_candidate2;
rtx cfi_ops = NULL;
rtx_insn *insn;
+ /* A stack clash protection prologue may not have left IP0_REGNUM or
+ IP1_REGNUM in a usable state. The same is true for allocations
+ with an SVE component, since we then need both temporary registers
+ for each allocation. */
+ bool can_inherit_p = (initial_adjust.is_constant ()
+ && final_adjust.is_constant ()
+ && !flag_stack_clash_protection);
/* We need to add memory barrier to prevent read from deallocated stack. */
bool need_barrier_p
is restored on the instruction doing the writeback. */
aarch64_add_offset (Pmode, stack_pointer_rtx,
hard_frame_pointer_rtx, -callee_offset,
- ip1_rtx, callee_adjust == 0);
+ ip1_rtx, ip0_rtx, callee_adjust == 0);
else
- aarch64_add_sp (ip1_rtx, final_adjust, df_regs_ever_live_p (IP1_REGNUM));
+ aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust,
+ !can_inherit_p || df_regs_ever_live_p (IP1_REGNUM));
aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
callee_adjust != 0, &cfi_ops);
cfi_ops = NULL;
}
- aarch64_add_sp (ip0_rtx, initial_adjust, df_regs_ever_live_p (IP0_REGNUM));
+ aarch64_add_sp (ip0_rtx, ip1_rtx, initial_adjust,
+ !can_inherit_p || df_regs_ever_live_p (IP0_REGNUM));
if (cfi_ops)
{
temp1 = gen_rtx_REG (Pmode, IP1_REGNUM);
if (vcall_offset == 0)
- aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, false);
+ aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, temp0, false);
else
{
gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0);
addr = gen_rtx_PRE_MODIFY (Pmode, this_rtx,
plus_constant (Pmode, this_rtx, delta));
else
- aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1,
- false);
+ aarch64_add_offset (Pmode, this_rtx, this_rtx, delta,
+ temp1, temp0, false);
}
if (Pmode == ptr_mode)
}
else
{
- /* Ignore sign extension. */
- val &= (HOST_WIDE_INT) 0xffffffff;
+ /* Ignore sign extension. */
+ val &= (HOST_WIDE_INT) 0xffffffff;
+ }
+ return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) == val
+ || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
+}
+
+/* VAL is a value with the inner mode of MODE. Replicate it to fill a
+ 64-bit (DImode) integer. */
+
+static unsigned HOST_WIDE_INT
+aarch64_replicate_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
+{
+ unsigned int size = GET_MODE_UNIT_PRECISION (mode);
+ while (size < 64)
+ {
+ val &= (HOST_WIDE_INT_1U << size) - 1;
+ val |= val << size;
+ size *= 2;
}
- return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) == val
- || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
+ return val;
}
/* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2. */
/* Check for a single sequence of one bits and return quickly if so.
The special cases of all ones and all zeroes returns false. */
- val = (unsigned HOST_WIDE_INT) val_in;
+ val = aarch64_replicate_bitmask_imm (val_in, mode);
tmp = val + (val & -val);
if (tmp == (tmp & -tmp))
if (GET_CODE (x) == HIGH)
return true;
+ /* There's no way to calculate VL-based values using relocations. */
+ subrtx_iterator::array_type array;
+ FOR_EACH_SUBRTX (iter, array, x, ALL)
+ if (GET_CODE (*iter) == CONST_POLY_INT)
+ return true;
+
split_const (x, &base, &offset);
if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF)
{
- if (aarch64_classify_symbol (base, offset)
+ if (aarch64_classify_symbol (base, INTVAL (offset))
!= SYMBOL_FORCE_TO_MEM)
return true;
else
&& contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))])
index = SUBREG_REG (index);
- if ((shift == 0
- || (shift > 0 && shift <= 3
- && known_eq (1 << shift, GET_MODE_SIZE (mode))))
- && REG_P (index)
+ if (aarch64_sve_data_mode_p (mode))
+ {
+ if (type != ADDRESS_REG_REG
+ || (1 << shift) != GET_MODE_UNIT_SIZE (mode))
+ return false;
+ }
+ else
+ {
+ if (shift != 0
+ && !(IN_RANGE (shift, 1, 3)
+ && known_eq (1 << shift, GET_MODE_SIZE (mode))))
+ return false;
+ }
+
+ if (REG_P (index)
&& aarch64_regno_ok_for_index_p (REGNO (index), strict_p))
{
info->type = type;
/* On BE, we use load/store pair for all large int mode load/stores.
TI/TFmode may also use a load/store pair. */
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ bool advsimd_struct_p = (vec_flags == (VEC_ADVSIMD | VEC_STRUCT));
bool load_store_pair_p = (type == ADDR_QUERY_LDP_STP
|| mode == TImode
|| mode == TFmode
- || (BYTES_BIG_ENDIAN
- && aarch64_vect_struct_mode_p (mode)));
+ || (BYTES_BIG_ENDIAN && advsimd_struct_p));
bool allow_reg_index_p = (!load_store_pair_p
- && (maybe_ne (GET_MODE_SIZE (mode), 16)
- || aarch64_vector_mode_supported_p (mode))
- && !aarch64_vect_struct_mode_p (mode));
+ && (known_lt (GET_MODE_SIZE (mode), 16)
+ || vec_flags == VEC_ADVSIMD
+ || vec_flags == VEC_SVE_DATA));
+
+ /* For SVE, only accept [Rn], [Rn, Rm, LSL #shift] and
+ [Rn, #offset, MUL VL]. */
+ if ((vec_flags & (VEC_SVE_DATA | VEC_SVE_PRED)) != 0
+ && (code != REG && code != PLUS))
+ return false;
/* On LE, for AdvSIMD, don't support anything other than POST_INC or
REG addressing. */
- if (aarch64_vect_struct_mode_p (mode) && !BYTES_BIG_ENDIAN
+ if (advsimd_struct_p
+ && !BYTES_BIG_ENDIAN
&& (code != POST_INC && code != REG))
return false;
+ gcc_checking_assert (GET_MODE (x) == VOIDmode
+ || SCALAR_INT_MODE_P (GET_MODE (x)));
+
switch (code)
{
case REG:
&& aarch64_offset_7bit_signed_scaled_p (TImode,
offset + 32));
+ /* Make "m" use the LD1 offset range for SVE data modes, so
+ that pre-RTL optimizers like ivopts will work to that
+ instead of the wider LDR/STR range. */
+ if (vec_flags == VEC_SVE_DATA)
+ return (type == ADDR_QUERY_M
+ ? offset_4bit_signed_scaled_p (mode, offset)
+ : offset_9bit_signed_scaled_p (mode, offset));
+
+ if (vec_flags == VEC_SVE_PRED)
+ return offset_9bit_signed_scaled_p (mode, offset);
+
if (load_store_pair_p)
return ((known_eq (GET_MODE_SIZE (mode), 4)
|| known_eq (GET_MODE_SIZE (mode), 8))
rtx sym, offs;
split_const (info->offset, &sym, &offs);
if (GET_CODE (sym) == SYMBOL_REF
- && (aarch64_classify_symbol (sym, offs) == SYMBOL_SMALL_ABSOLUTE))
+ && (aarch64_classify_symbol (sym, INTVAL (offs))
+ == SYMBOL_SMALL_ABSOLUTE))
{
/* The symbol and offset must be aligned to the access size. */
unsigned int align;
rtx offset;
split_const (x, &x, &offset);
- return aarch64_classify_symbol (x, offset);
+ return aarch64_classify_symbol (x, INTVAL (offset));
}
return aarch64_const_vec_all_same_in_range_p (x, val, val);
}
+/* Return true if VEC is a constant in which every element is in the range
+ [MINVAL, MAXVAL]. The elements do not need to have the same value. */
+
+static bool
+aarch64_const_vec_all_in_range_p (rtx vec,
+ HOST_WIDE_INT minval,
+ HOST_WIDE_INT maxval)
+{
+ if (GET_CODE (vec) != CONST_VECTOR
+ || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
+ return false;
+
+ int nunits;
+ if (!CONST_VECTOR_STEPPED_P (vec))
+ nunits = const_vector_encoded_nelts (vec);
+ else if (!CONST_VECTOR_NUNITS (vec).is_constant (&nunits))
+ return false;
+
+ for (int i = 0; i < nunits; i++)
+ {
+ rtx vec_elem = CONST_VECTOR_ELT (vec, i);
+ if (!CONST_INT_P (vec_elem)
+ || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
+ return false;
+ }
+ return true;
+}
/* N Z C V. */
#define AARCH64_CC_V 1
0 /* NV, Any. */
};
+/* Print floating-point vector immediate operand X to F, negating it
+ first if NEGATE is true. Return true on success, false if it isn't
+ a constant we can handle. */
+
+static bool
+aarch64_print_vector_float_operand (FILE *f, rtx x, bool negate)
+{
+ rtx elt;
+
+ if (!const_vec_duplicate_p (x, &elt))
+ return false;
+
+ REAL_VALUE_TYPE r = *CONST_DOUBLE_REAL_VALUE (elt);
+ if (negate)
+ r = real_value_negate (&r);
+
+ /* We only handle the SVE single-bit immediates here. */
+ if (real_equal (&r, &dconst0))
+ asm_fprintf (f, "0.0");
+ else if (real_equal (&r, &dconst1))
+ asm_fprintf (f, "1.0");
+ else if (real_equal (&r, &dconsthalf))
+ asm_fprintf (f, "0.5");
+ else
+ return false;
+
+ return true;
+}
+
/* Print operand X to file F in a target specific manner according to CODE.
The acceptable formatting commands given by CODE are:
'c': An integer or symbol address without a preceding #
sign.
+ 'C': Take the duplicated element in a vector constant
+ and print it in hex.
+ 'D': Take the duplicated element in a vector constant
+ and print it as an unsigned integer, in decimal.
'e': Print the sign/zero-extend size as a character 8->b,
16->h, 32->w.
'p': Prints N such that 2^N == X (X must be power of 2 and
of regs.
'm': Print a condition (eq, ne, etc).
'M': Same as 'm', but invert condition.
+ 'N': Take the duplicated element in a vector constant
+ and print the negative of it in decimal.
'b/h/s/d/q': Print a scalar FP/SIMD register name.
'S/T/U/V': Print a FP/SIMD register name for a register list.
The register printed is the FP/SIMD register name
static void
aarch64_print_operand (FILE *f, rtx x, int code)
{
+ rtx elt;
switch (code)
{
case 'c':
}
break;
+ case 'N':
+ if (!const_vec_duplicate_p (x, &elt))
+ {
+ output_operand_lossage ("invalid vector constant");
+ return;
+ }
+
+ if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+ asm_fprintf (f, "%wd", -INTVAL (elt));
+ else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT
+ && aarch64_print_vector_float_operand (f, x, true))
+ ;
+ else
+ {
+ output_operand_lossage ("invalid vector constant");
+ return;
+ }
+ break;
+
case 'b':
case 'h':
case 's':
output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code);
return;
}
- asm_fprintf (f, "v%d", REGNO (x) - V0_REGNUM + (code - 'S'));
+ asm_fprintf (f, "%c%d",
+ aarch64_sve_data_mode_p (GET_MODE (x)) ? 'z' : 'v',
+ REGNO (x) - V0_REGNUM + (code - 'S'));
break;
case 'R':
asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff);
break;
+ case 'C':
+ {
+ /* Print a replicated constant in hex. */
+ if (!const_vec_duplicate_p (x, &elt) || !CONST_INT_P (elt))
+ {
+ output_operand_lossage ("invalid operand for '%%%c'", code);
+ return;
+ }
+ scalar_mode inner_mode = GET_MODE_INNER (GET_MODE (x));
+ asm_fprintf (f, "0x%wx", UINTVAL (elt) & GET_MODE_MASK (inner_mode));
+ }
+ break;
+
+ case 'D':
+ {
+ /* Print a replicated constant in decimal, treating it as
+ unsigned. */
+ if (!const_vec_duplicate_p (x, &elt) || !CONST_INT_P (elt))
+ {
+ output_operand_lossage ("invalid operand for '%%%c'", code);
+ return;
+ }
+ scalar_mode inner_mode = GET_MODE_INNER (GET_MODE (x));
+ asm_fprintf (f, "%wd", UINTVAL (elt) & GET_MODE_MASK (inner_mode));
+ }
+ break;
+
case 'w':
case 'x':
if (x == const0_rtx
switch (GET_CODE (x))
{
case REG:
- asm_fprintf (f, "%s", reg_names [REGNO (x)]);
+ if (aarch64_sve_data_mode_p (GET_MODE (x)))
+ asm_fprintf (f, "z%d", REGNO (x) - V0_REGNUM);
+ else
+ asm_fprintf (f, "%s", reg_names [REGNO (x)]);
break;
case MEM:
output_address (GET_MODE (x), XEXP (x, 0));
break;
- case CONST:
case LABEL_REF:
case SYMBOL_REF:
output_addr_const (asm_out_file, x);
asm_fprintf (f, "%wd", INTVAL (x));
break;
- case CONST_VECTOR:
- if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+ case CONST:
+ if (!VECTOR_MODE_P (GET_MODE (x)))
{
- gcc_assert (
- aarch64_const_vec_all_same_in_range_p (x,
- HOST_WIDE_INT_MIN,
- HOST_WIDE_INT_MAX));
- asm_fprintf (f, "%wd", INTVAL (CONST_VECTOR_ELT (x, 0)));
+ output_addr_const (asm_out_file, x);
+ break;
}
- else if (aarch64_simd_imm_zero_p (x, GET_MODE (x)))
+ /* fall through */
+
+ case CONST_VECTOR:
+ if (!const_vec_duplicate_p (x, &elt))
{
- fputc ('0', f);
+ output_operand_lossage ("invalid vector constant");
+ return;
}
+
+ if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+ asm_fprintf (f, "%wd", INTVAL (elt));
+ else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT
+ && aarch64_print_vector_float_operand (f, x, false))
+ ;
else
- gcc_unreachable ();
+ {
+ output_operand_lossage ("invalid vector constant");
+ return;
+ }
break;
case CONST_DOUBLE:
case ADDRESS_REG_IMM:
if (known_eq (addr.const_offset, 0))
asm_fprintf (f, "[%s]", reg_names [REGNO (addr.base)]);
+ else if (aarch64_sve_data_mode_p (mode))
+ {
+ HOST_WIDE_INT vnum
+ = exact_div (addr.const_offset,
+ BYTES_PER_SVE_VECTOR).to_constant ();
+ asm_fprintf (f, "[%s, #%wd, mul vl]",
+ reg_names[REGNO (addr.base)], vnum);
+ }
+ else if (aarch64_sve_pred_mode_p (mode))
+ {
+ HOST_WIDE_INT vnum
+ = exact_div (addr.const_offset,
+ BYTES_PER_SVE_PRED).to_constant ();
+ asm_fprintf (f, "[%s, #%wd, mul vl]",
+ reg_names[REGNO (addr.base)], vnum);
+ }
else
asm_fprintf (f, "[%s, %wd]", reg_names [REGNO (addr.base)],
INTVAL (addr.offset));
static void
aarch64_print_operand_address (FILE *f, machine_mode mode, rtx x)
{
- if (!aarch64_print_address_internal (f, mode, x, ADDR_QUERY_M))
+ if (!aarch64_print_address_internal (f, mode, x, ADDR_QUERY_ANY))
output_addr_const (f, x);
}
if (FP_REGNUM_P (regno))
return FP_LO_REGNUM_P (regno) ? FP_LO_REGS : FP_REGS;
+ if (PR_REGNUM_P (regno))
+ return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
+
return NO_REGS;
}
machine_mode mode,
secondary_reload_info *sri)
{
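+  /* On big-endian targets, spills and fills of SVE data modes must use
+     the predicated LD1/ST1 forms rather than LDR/STR, so they need to
+     go through a special reload pattern.  */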
+ if (BYTES_BIG_ENDIAN
+ && reg_class_subset_p (rclass, FP_REGS)
+ && (MEM_P (x) || (REG_P (x) && !HARD_REGISTER_P (x)))
+ && aarch64_sve_data_mode_p (mode))
+ {
+ sri->icode = CODE_FOR_aarch64_sve_reload_be;
+ return NO_REGS;
+ }
/* If we have to disable direct literal pool loads and stores because the
function is too big, then we need a scratch register. */
can hold MODE, but at the moment we need to handle all modes.
Just ignore any runtime parts for registers that can't store them. */
HOST_WIDE_INT lowest_size = constant_lower_bound (GET_MODE_SIZE (mode));
+ unsigned int nregs;
switch (regclass)
{
case CALLER_SAVE_REGS:
case POINTER_AND_FP_REGS:
case FP_REGS:
case FP_LO_REGS:
- return (aarch64_vector_mode_p (mode)
+ if (aarch64_sve_data_mode_p (mode)
+ && constant_multiple_p (GET_MODE_SIZE (mode),
+ BYTES_PER_SVE_VECTOR, &nregs))
+ return nregs;
+ return (aarch64_vector_data_mode_p (mode)
? CEIL (lowest_size, UNITS_PER_VREG)
: CEIL (lowest_size, UNITS_PER_WORD));
case STACK_REG:
+ case PR_REGS:
+ case PR_LO_REGS:
+ case PR_HI_REGS:
return 1;
case NO_REGS:
}
if (GET_MODE_CLASS (mode) == MODE_INT
- && CONST_INT_P (op1)
- && aarch64_uimm12_shift (INTVAL (op1)))
+ && ((CONST_INT_P (op1) && aarch64_uimm12_shift (INTVAL (op1)))
+ || aarch64_sve_addvl_addpl_immediate (op1, mode)))
{
*cost += rtx_cost (op0, mode, PLUS, 0, speed);
return &all_architectures[cpu->arch];
}
+/* Return the VG value associated with -msve-vector-bits= value VALUE. */
+
+static poly_uint16
+aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits_enum value)
+{
+ /* For now generate vector-length agnostic code for -msve-vector-bits=128.
+ This ensures we can clearly distinguish SVE and Advanced SIMD modes when
+ deciding which .md file patterns to use and when deciding whether
+ something is a legitimate address or constant. */
+ if (value == SVE_SCALABLE || value == SVE_128)
+ return poly_uint16 (2, 2);
+ else
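+    /* The remaining enum values give the vector length in bits and VG
+       counts 64-bit granules.  */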
+ return (int) value / 64;
+}
+
/* Implement TARGET_OPTION_OVERRIDE. This is called once in the beginning
and is used to parse the -m{cpu,tune,arch} strings and setup the initial
tuning structs. In particular it must set selected_tune and
error ("assembler does not support -mabi=ilp32");
#endif
+ /* Convert -msve-vector-bits to a VG count. */
+ aarch64_sve_vg = aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits);
+
if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE && TARGET_ILP32)
sorry ("return address signing is only supported for -mabi=lp64");
}
}
-/* Return the method that should be used to access SYMBOL_REF or
- LABEL_REF X. */
+/* Return the correct method for accessing X + OFFSET, where X is either
+ a SYMBOL_REF or LABEL_REF. */
enum aarch64_symbol_type
-aarch64_classify_symbol (rtx x, rtx offset)
+aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
{
if (GET_CODE (x) == LABEL_REF)
{
resolve to a symbol in this module, then force to memory. */
if ((SYMBOL_REF_WEAK (x)
&& !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (offset, -1048575, 1048575))
return SYMBOL_FORCE_TO_MEM;
return SYMBOL_TINY_ABSOLUTE;
4G. */
if ((SYMBOL_REF_WEAK (x)
&& !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
+ || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
HOST_WIDE_INT_C (4294967264)))
return SYMBOL_FORCE_TO_MEM;
return SYMBOL_SMALL_ABSOLUTE;
if (CONST_INT_P (x) || CONST_DOUBLE_P (x) || GET_CODE (x) == CONST_VECTOR)
return true;
- /* Do not allow vector struct mode constants. We could support
- 0 and -1 easily, but they need support in aarch64-simd.md. */
- if (aarch64_vect_struct_mode_p (mode))
+ /* Do not allow vector struct mode constants for Advanced SIMD.
+ We could support 0 and -1 easily, but they need support in
+ aarch64-simd.md. */
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
return false;
/* Do not allow wide int constants - this requires support in movti. */
if (CONST_WIDE_INT_P (x))
return false;
+ /* Only accept variable-length vector constants if they can be
+ handled directly.
+
+ ??? It would be possible to handle rematerialization of other
+ constants via secondary reloads. */
+ if (vec_flags & VEC_ANY_SVE)
+ return aarch64_simd_valid_immediate (x, NULL);
+
if (GET_CODE (x) == HIGH)
x = XEXP (x, 0);
- /* Do not allow const (plus (anchor_symbol, const_int)). */
- if (GET_CODE (x) == CONST)
- {
- rtx offset;
-
- split_const (x, &x, &offset);
+ /* Accept polynomial constants that can be calculated by using the
+ destination of a move as the sole temporary. Constants that
+ require a second temporary cannot be rematerialized (they can't be
+ forced to memory and also aren't legitimate constants). */
+ poly_int64 offset;
+ if (poly_int_rtx_p (x, &offset))
+ return aarch64_offset_temporaries (false, offset) <= 1;
+
+ /* If an offset is being added to something else, we need to allow the
+ base to be moved into the destination register, meaning that there
+ are no free temporaries for the offset. */
+ x = strip_offset (x, &offset);
+ if (!offset.is_constant () && aarch64_offset_temporaries (true, offset) > 0)
+ return false;
- if (SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x))
- return false;
- }
+ /* Do not allow const (plus (anchor_symbol, const_int)). */
+ if (maybe_ne (offset, 0) && SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x))
+ return false;
/* Treat symbols as constants. Avoid TLS symbols as they are complex,
so spilling them is better than rematerialization. */
call_used_regs[i] = 1;
}
}
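+  /* When SVE is disabled, make the predicate registers unavailable.  */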
+ if (!TARGET_SVE)
+ for (i = P0_REGNUM; i <= P15_REGNUM; i++)
+ {
+ fixed_regs[i] = 1;
+ call_used_regs[i] = 1;
+ }
}
/* Walk down the type tree of TYPE counting consecutive base elements.
static bool
aarch64_vector_mode_supported_p (machine_mode mode)
{
- if (TARGET_SIMD
- && (mode == V4SImode || mode == V8HImode
- || mode == V16QImode || mode == V2DImode
- || mode == V2SImode || mode == V4HImode
- || mode == V8QImode || mode == V2SFmode
- || mode == V4SFmode || mode == V2DFmode
- || mode == V4HFmode || mode == V8HFmode
- || mode == V1DFmode))
- return true;
-
- return false;
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ return vec_flags != 0 && (vec_flags & VEC_STRUCT) == 0;
}
/* Return appropriate SIMD container
for MODE within a vector of WIDTH bits. */
static machine_mode
-aarch64_simd_container_mode (scalar_mode mode, unsigned width)
+aarch64_simd_container_mode (scalar_mode mode, poly_int64 width)
{
- gcc_assert (width == 64 || width == 128);
+ if (TARGET_SVE && known_eq (width, BITS_PER_SVE_VECTOR))
+ switch (mode)
+ {
+ case E_DFmode:
+ return VNx2DFmode;
+ case E_SFmode:
+ return VNx4SFmode;
+ case E_HFmode:
+ return VNx8HFmode;
+ case E_DImode:
+ return VNx2DImode;
+ case E_SImode:
+ return VNx4SImode;
+ case E_HImode:
+ return VNx8HImode;
+ case E_QImode:
+ return VNx16QImode;
+ default:
+ return word_mode;
+ }
+
+ gcc_assert (known_eq (width, 64) || known_eq (width, 128));
if (TARGET_SIMD)
{
- if (width == 128)
+ if (known_eq (width, 128))
switch (mode)
{
case E_DFmode:
static machine_mode
aarch64_preferred_simd_mode (scalar_mode mode)
{
- return aarch64_simd_container_mode (mode, 128);
+ poly_int64 bits = TARGET_SVE ? BITS_PER_SVE_VECTOR : 128;
+ return aarch64_simd_container_mode (mode, bits);
}
/* Return a list of possible vector sizes for the vectorizer
static void
aarch64_autovectorize_vector_sizes (vector_sizes *sizes)
{
+ if (TARGET_SVE)
+ sizes->safe_push (BYTES_PER_SVE_VECTOR);
sizes->safe_push (16);
sizes->safe_push (8);
}
}
}
+/* Return true if BASE_OR_STEP is a valid immediate operand for an SVE INDEX
+ instruction. */
+
+bool
+aarch64_sve_index_immediate_p (rtx base_or_step)
+{
+ return (CONST_INT_P (base_or_step)
+ && IN_RANGE (INTVAL (base_or_step), -16, 15));
+}
+
+/* Return true if X is a valid immediate for the SVE ADD and SUB
+ instructions. Negate X first if NEGATE_P is true. */
+
+bool
+aarch64_sve_arith_immediate_p (rtx x, bool negate_p)
+{
+ rtx elt;
+
+ if (!const_vec_duplicate_p (x, &elt)
+ || !CONST_INT_P (elt))
+ return false;
+
+ HOST_WIDE_INT val = INTVAL (elt);
+ if (negate_p)
+ val = -val;
+ val &= GET_MODE_MASK (GET_MODE_INNER (GET_MODE (x)));
+
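+  /* The immediate must be an unsigned 8-bit value, optionally shifted
+     left by 8 bits.  */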
+ if (val & 0xff)
+ return IN_RANGE (val, 0, 0xff);
+ return IN_RANGE (val, 0, 0xff00);
+}
+
+/* Return true if X is a valid immediate operand for an SVE logical
+ instruction such as AND. */
+
+bool
+aarch64_sve_bitmask_immediate_p (rtx x)
+{
+ rtx elt;
+
+ return (const_vec_duplicate_p (x, &elt)
+ && CONST_INT_P (elt)
+ && aarch64_bitmask_imm (INTVAL (elt),
+ GET_MODE_INNER (GET_MODE (x))));
+}
+
+/* Return true if X is a valid immediate for the SVE DUP and CPY
+ instructions. */
+
+bool
+aarch64_sve_dup_immediate_p (rtx x)
+{
+ rtx elt;
+
+ if (!const_vec_duplicate_p (x, &elt)
+ || !CONST_INT_P (elt))
+ return false;
+
+ HOST_WIDE_INT val = INTVAL (elt);
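+  /* The immediate must be a signed 8-bit value, optionally shifted
+     left by 8 bits.  */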
+ if (val & 0xff)
+ return IN_RANGE (val, -0x80, 0x7f);
+ return IN_RANGE (val, -0x8000, 0x7f00);
+}
+
+/* Return true if X is a valid immediate operand for an SVE CMP instruction.
+ SIGNED_P says whether the operand is signed rather than unsigned. */
+
+bool
+aarch64_sve_cmp_immediate_p (rtx x, bool signed_p)
+{
+ rtx elt;
+
+ return (const_vec_duplicate_p (x, &elt)
+ && CONST_INT_P (elt)
+ && (signed_p
+ ? IN_RANGE (INTVAL (elt), -16, 15)
+ : IN_RANGE (INTVAL (elt), 0, 127)));
+}
+
+/* Return true if X is a valid immediate operand for an SVE FADD or FSUB
+ instruction. Negate X first if NEGATE_P is true. */
+
+bool
+aarch64_sve_float_arith_immediate_p (rtx x, bool negate_p)
+{
+ rtx elt;
+ REAL_VALUE_TYPE r;
+
+ if (!const_vec_duplicate_p (x, &elt)
+ || GET_CODE (elt) != CONST_DOUBLE)
+ return false;
+
+ r = *CONST_DOUBLE_REAL_VALUE (elt);
+
+ if (negate_p)
+ r = real_value_negate (&r);
+
+ if (real_equal (&r, &dconst1))
+ return true;
+ if (real_equal (&r, &dconsthalf))
+ return true;
+ return false;
+}
+
+/* Return true if X is a valid immediate operand for an SVE FMUL
+ instruction. */
+
+bool
+aarch64_sve_float_mul_immediate_p (rtx x)
+{
+ rtx elt;
+
+ /* GCC will never generate a multiply with an immediate of 2, so there is no
+ point testing for it (even though it is a valid constant). */
+ return (const_vec_duplicate_p (x, &elt)
+ && GET_CODE (elt) == CONST_DOUBLE
+ && real_equal (CONST_DOUBLE_REAL_VALUE (elt), &dconsthalf));
+}
+
/* Return true if replicating VAL32 is a valid 2-byte or 4-byte immediate
for the Advanced SIMD operation described by WHICH and INSN. If INFO
is nonnull, use it to describe valid immediates. */
return false;
}
+/* Return true if replicating VAL64 gives a valid immediate for an SVE MOV
+ instruction. If INFO is nonnull, use it to describe valid immediates. */
+
+static bool
+aarch64_sve_valid_immediate (unsigned HOST_WIDE_INT val64,
+ simd_immediate_info *info)
+{
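+  /* Find the smallest element size whose replication gives VAL64;
+     the immediate checks below are based on that element.  */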
+ scalar_int_mode mode = DImode;
+ unsigned int val32 = val64 & 0xffffffff;
+ if (val32 == (val64 >> 32))
+ {
+ mode = SImode;
+ unsigned int val16 = val32 & 0xffff;
+ if (val16 == (val32 >> 16))
+ {
+ mode = HImode;
+ unsigned int val8 = val16 & 0xff;
+ if (val8 == (val16 >> 8))
+ mode = QImode;
+ }
+ }
+ HOST_WIDE_INT val = trunc_int_for_mode (val64, mode);
+ if (IN_RANGE (val, -0x80, 0x7f))
+ {
+ /* DUP with no shift. */
+ if (info)
+ *info = simd_immediate_info (mode, val);
+ return true;
+ }
+ if ((val & 0xff) == 0 && IN_RANGE (val, -0x8000, 0x7f00))
+ {
+ /* DUP with LSL #8. */
+ if (info)
+ *info = simd_immediate_info (mode, val);
+ return true;
+ }
+ if (aarch64_bitmask_imm (val64, mode))
+ {
+ /* DUPM. */
+ if (info)
+ *info = simd_immediate_info (mode, val);
+ return true;
+ }
+ return false;
+}
+
/* Return true if OP is a valid SIMD immediate for the operation
described by WHICH. If INFO is nonnull, use it to describe valid
immediates. */
aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
enum simd_immediate_check which)
{
- rtx elt = NULL;
+ machine_mode mode = GET_MODE (op);
+ unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+ if (vec_flags == 0 || vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
+ return false;
+
+ scalar_mode elt_mode = GET_MODE_INNER (mode);
+ rtx elt = NULL, base, step;
unsigned int n_elts;
if (const_vec_duplicate_p (op, &elt))
n_elts = 1;
+ else if ((vec_flags & VEC_SVE_DATA)
+ && const_vec_series_p (op, &base, &step))
+ {
+ gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
+ if (!aarch64_sve_index_immediate_p (base)
+ || !aarch64_sve_index_immediate_p (step))
+ return false;
+
+ if (info)
+ *info = simd_immediate_info (elt_mode, base, step);
+ return true;
+ }
else if (GET_CODE (op) == CONST_VECTOR
&& CONST_VECTOR_NUNITS (op).is_constant (&n_elts))
/* N_ELTS set above. */;
else
return false;
- machine_mode mode = GET_MODE (op);
- scalar_mode elt_mode = GET_MODE_INNER (mode);
+ /* Handle PFALSE and PTRUE. */
+ if (vec_flags & VEC_SVE_PRED)
+ return (op == CONST0_RTX (mode)
+ || op == CONSTM1_RTX (mode));
+
scalar_float_mode elt_float_mode;
if (elt
&& is_a <scalar_float_mode> (elt_mode, &elt_float_mode)
val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes]
<< (i * BITS_PER_UNIT));
- return aarch64_advsimd_valid_immediate (val64, info, which);
+ if (vec_flags & VEC_SVE_DATA)
+ return aarch64_sve_valid_immediate (val64, info);
+ else
+ return aarch64_advsimd_valid_immediate (val64, info, which);
+}
+
+/* Check whether X is a VEC_SERIES-like constant that starts at 0 and
+   has a step in the range of an SVE INDEX instruction.  Return the
+   index expression if so, otherwise return null.  */
+rtx
+aarch64_check_zero_based_sve_index_immediate (rtx x)
+{
+ rtx base, step;
+ if (const_vec_series_p (x, &base, &step)
+ && base == const0_rtx
+ && aarch64_sve_index_immediate_p (step))
+ return step;
+ return NULL_RTX;
}
/* Check of immediate shift constants are within range. */
return aarch64_const_vec_all_same_in_range_p (x, 1, bit_width);
}
-/* Return true if X is a uniform vector where all elements
- are either the floating-point constant 0.0 or the
- integer constant 0. */
-bool
-aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
-{
- return x == CONST0_RTX (mode);
-}
-
-
/* Return the bitmask CONST_INT to select the bits required by a zero extract
operation of width WIDTH at bit position POS. */
if (CONST_INT_P (x))
return true;
+ if (VECTOR_MODE_P (GET_MODE (x)))
+ return aarch64_simd_valid_immediate (x, NULL);
+
if (GET_CODE (x) == SYMBOL_REF && mode == DImode && CONSTANT_ADDRESS_P (x))
return true;
+ if (aarch64_sve_cnt_immediate_p (x))
+ return true;
+
return aarch64_classify_symbolic_expression (x)
== SYMBOL_TINY_ABSOLUTE;
}
{
machine_mode vmode;
- vmode = aarch64_preferred_simd_mode (mode);
+ vmode = aarch64_simd_container_mode (mode, 64);
rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op));
return aarch64_simd_valid_immediate (op_v, NULL);
}
}
/* Return TRUE if OP is a valid vector addressing mode. */
+
bool
aarch64_simd_mem_operand_p (rtx op)
{
|| REG_P (XEXP (op, 0)));
}
+/* Return true if OP is a valid MEM operand for an SVE LD1R instruction. */
+
+bool
+aarch64_sve_ld1r_operand_p (rtx op)
+{
+ struct aarch64_address_info addr;
+ scalar_mode mode;
+
+ return (MEM_P (op)
+ && is_a <scalar_mode> (GET_MODE (op), &mode)
+ && aarch64_classify_address (&addr, XEXP (op, 0), mode, false)
+ && addr.type == ADDRESS_REG_IMM
+ && offset_6bit_unsigned_scaled_p (mode, addr.const_offset));
+}
+
+/* Return true if OP is a valid MEM operand for an SVE LDR instruction.
+ The conditions for STR are the same. */
+bool
+aarch64_sve_ldr_operand_p (rtx op)
+{
+ struct aarch64_address_info addr;
+
+ return (MEM_P (op)
+ && aarch64_classify_address (&addr, XEXP (op, 0), GET_MODE (op),
+ false, ADDR_QUERY_ANY)
+ && addr.type == ADDRESS_REG_IMM);
+}
+
/* Emit a register copy from operand to operand, taking care not to
early-clobber source registers in the process.
}
/* Implement target hook TARGET_VECTOR_ALIGNMENT. The AAPCS64 sets the maximum
- alignment of a vector to 128 bits. */
+ alignment of a vector to 128 bits. SVE predicates have an alignment of
+ 16 bits. */
static HOST_WIDE_INT
aarch64_simd_vector_alignment (const_tree type)
{
+ if (TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
+ /* ??? Checking the mode isn't ideal, but VECTOR_BOOLEAN_TYPE_P can
+ be set for non-predicate vectors of booleans. Modes are the most
+ direct way we have of identifying real SVE predicate types. */
+ return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
return MIN (align, 128);
}
+/* Implement target hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT. */
+static HOST_WIDE_INT
+aarch64_vectorize_preferred_vector_alignment (const_tree type)
+{
+ if (aarch64_sve_data_mode_p (TYPE_MODE (type)))
+ {
+ /* If the length of the vector is fixed, try to align to that length,
+ otherwise don't try to align at all. */
+ HOST_WIDE_INT result;
+ if (!BITS_PER_SVE_VECTOR.is_constant (&result))
+ result = TYPE_ALIGN (TREE_TYPE (type));
+ return result;
+ }
+ return TYPE_ALIGN (type);
+}
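
For example, with a fixed vector length such as -msve-vector-bits=256, BITS_PER_SVE_VECTOR.is_constant succeeds and the preferred alignment is the full 256 bits; with the default scalable setting the hook falls back to the alignment of the element type, since no larger compile-time alignment can be guaranteed.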
+
/* Implement target hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE. */
static bool
aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed)
if (is_packed)
return false;
- /* We guarantee alignment for vectors up to 128-bits. */
- if (tree_int_cst_compare (TYPE_SIZE (type),
- bitsize_int (BIGGEST_ALIGNMENT)) > 0)
+ /* For fixed-length vectors, check that the vectorizer will aim for
+ full-vector alignment. This isn't true for generic GCC vectors
+ that are wider than the ABI maximum of 128 bits. */
+ if (TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
+ && (wi::to_widest (TYPE_SIZE (type))
+ != aarch64_vectorize_preferred_vector_alignment (type)))
return false;
/* Vectors whose size is <= BIGGEST_ALIGNMENT are naturally aligned. */
static unsigned HOST_WIDE_INT
aarch64_shift_truncation_mask (machine_mode mode)
{
- return
- (!SHIFT_COUNT_TRUNCATED
- || aarch64_vector_mode_supported_p (mode)
- || aarch64_vect_struct_mode_p (mode))
- ? 0
- : (GET_MODE_UNIT_BITSIZE (mode) - 1);
+ if (!SHIFT_COUNT_TRUNCATED || aarch64_vector_data_mode_p (mode))
+ return 0;
+ return GET_MODE_UNIT_BITSIZE (mode) - 1;
}
/* Select a format to encode pointers in exception handling data. */
return aarch64_output_simd_mov_immediate (v_op, width);
}
+/* Return the output string to use for moving immediate CONST_VECTOR
+ into an SVE register. */
+
+char *
+aarch64_output_sve_mov_immediate (rtx const_vector)
+{
+ static char templ[40];
+ struct simd_immediate_info info;
+ char element_char;
+
+ bool is_valid = aarch64_simd_valid_immediate (const_vector, &info);
+ gcc_assert (is_valid);
+
+ element_char = sizetochar (GET_MODE_BITSIZE (info.elt_mode));
+
+ if (info.step)
+ {
+ snprintf (templ, sizeof (templ), "index\t%%0.%c, #"
+ HOST_WIDE_INT_PRINT_DEC ", #" HOST_WIDE_INT_PRINT_DEC,
+ element_char, INTVAL (info.value), INTVAL (info.step));
+ return templ;
+ }
+
+ if (GET_MODE_CLASS (info.elt_mode) == MODE_FLOAT)
+ {
+ if (aarch64_float_const_zero_rtx_p (info.value))
+ info.value = GEN_INT (0);
+ else
+ {
+ const int buf_size = 20;
+ char float_buf[buf_size] = {};
+ real_to_decimal_for_mode (float_buf,
+ CONST_DOUBLE_REAL_VALUE (info.value),
+ buf_size, buf_size, 1, info.elt_mode);
+
+ snprintf (templ, sizeof (templ), "fmov\t%%0.%c, #%s",
+ element_char, float_buf);
+ return templ;
+ }
+ }
+
+ snprintf (templ, sizeof (templ), "mov\t%%0.%c, #" HOST_WIDE_INT_PRINT_DEC,
+ element_char, INTVAL (info.value));
+ return templ;
+}
+
+/* Return the asm format for a PTRUE instruction whose destination has
+ mode MODE. SUFFIX is the element size suffix. */
+
+char *
+aarch64_output_ptrue (machine_mode mode, char suffix)
+{
+ unsigned int nunits;
+ static char buf[sizeof ("ptrue\t%0.N, vlNNNNN")];
+ if (GET_MODE_NUNITS (mode).is_constant (&nunits))
+ snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, vl%d", suffix, nunits);
+ else
+ snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, all", suffix);
+ return buf;
+}
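
So for variable-length code this emits e.g. "ptrue %0.b, all", while with a fixed vector length the predicate is limited to the exact element count; with 256-bit vectors a .s predicate would come out as "ptrue %0.s, vl8".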
+
/* Split operands into moves from op[1] + op[2] into op[0]. */
void
/* vec_perm support. */
-#define MAX_VECT_LEN 16
-
struct expand_vec_perm_d
{
rtx target, op0, op1;
vec_perm_indices perm;
machine_mode vmode;
+ unsigned int vec_flags;
bool one_vector_p;
bool testing_p;
};
aarch64_expand_vec_perm_1 (target, op0, op1, sel);
}
+/* Generate (set TARGET (unspec [OP0 OP1] CODE)). */
+
+static void
+emit_unspec2 (rtx target, int code, rtx op0, rtx op1)
+{
+ emit_insn (gen_rtx_SET (target,
+ gen_rtx_UNSPEC (GET_MODE (target),
+ gen_rtvec (2, op0, op1), code)));
+}
+
+/* Expand an SVE vec_perm with the given operands. */
+
+void
+aarch64_expand_sve_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
+{
+ machine_mode data_mode = GET_MODE (target);
+ machine_mode sel_mode = GET_MODE (sel);
+ /* Enforced by the pattern condition. */
+ int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();
+
+ /* Note: vec_perm indices are supposed to wrap when they go beyond the
+ size of the two value vectors, i.e. the upper bits of the indices
+ are effectively ignored. SVE TBL instead produces 0 for any
+ out-of-range indices, so we need to modulo all the vec_perm indices
+ to ensure they are all in range. */
+ rtx sel_reg = force_reg (sel_mode, sel);
+
+ /* Check if the sel only references the first values vector. */
+ if (GET_CODE (sel) == CONST_VECTOR
+ && aarch64_const_vec_all_in_range_p (sel, 0, nunits - 1))
+ {
+ emit_unspec2 (target, UNSPEC_TBL, op0, sel_reg);
+ return;
+ }
+
+ /* Check if the two values vectors are the same. */
+ if (rtx_equal_p (op0, op1))
+ {
+ rtx max_sel = aarch64_simd_gen_const_vector_dup (sel_mode, nunits - 1);
+ rtx sel_mod = expand_simple_binop (sel_mode, AND, sel_reg, max_sel,
+ NULL, 0, OPTAB_DIRECT);
+ emit_unspec2 (target, UNSPEC_TBL, op0, sel_mod);
+ return;
+ }
+
+ /* Run TBL on each value vector and combine the results. */
+
+ rtx res0 = gen_reg_rtx (data_mode);
+ rtx res1 = gen_reg_rtx (data_mode);
+ rtx neg_num_elems = aarch64_simd_gen_const_vector_dup (sel_mode, -nunits);
+ if (GET_CODE (sel) != CONST_VECTOR
+ || !aarch64_const_vec_all_in_range_p (sel, 0, 2 * nunits - 1))
+ {
+ rtx max_sel = aarch64_simd_gen_const_vector_dup (sel_mode,
+ 2 * nunits - 1);
+ sel_reg = expand_simple_binop (sel_mode, AND, sel_reg, max_sel,
+ NULL, 0, OPTAB_DIRECT);
+ }
+ emit_unspec2 (res0, UNSPEC_TBL, op0, sel_reg);
+ rtx sel_sub = expand_simple_binop (sel_mode, PLUS, sel_reg, neg_num_elems,
+ NULL, 0, OPTAB_DIRECT);
+ emit_unspec2 (res1, UNSPEC_TBL, op1, sel_sub);
+ if (GET_MODE_CLASS (data_mode) == MODE_VECTOR_INT)
+ emit_insn (gen_rtx_SET (target, gen_rtx_IOR (data_mode, res0, res1)));
+ else
+ emit_unspec2 (target, UNSPEC_IORF, res0, res1);
+}
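
A standalone model of the two-TBL fallback above (illustrative C, not GCC code): each selector is first forced into the range [0, 2 * nunits - 1], the first TBL looks it up in op0, the second uses the selector minus nunits so that at most one of the two lookups can hit, out-of-range selectors read as zero, and the two results are combined with an OR:

    #include <stdio.h>

    #define NUNITS 4

    /* Model of SVE TBL: out-of-range selectors yield zero.  */
    static int
    tbl (const int *table, int idx)
    {
      return (idx >= 0 && idx < NUNITS) ? table[idx] : 0;
    }

    int
    main (void)
    {
      int op0[NUNITS] = { 10, 11, 12, 13 };
      int op1[NUNITS] = { 20, 21, 22, 23 };
      int sel[NUNITS] = { 6, 1, 9, 4 };  /* 9 wraps to 1 modulo 2 * NUNITS.  */

      for (int i = 0; i < NUNITS; i++)
        {
          int idx = sel[i] & (2 * NUNITS - 1);
          printf ("%d ", tbl (op0, idx) | tbl (op1, idx - NUNITS));
        }
      printf ("\n");  /* 22 11 11 20 */
      return 0;
    }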
+
/* Recognize patterns suitable for the TRN instructions. */
static bool
aarch64_evpc_trn (struct expand_vec_perm_d *d)
in0 = d->op0;
in1 = d->op1;
- if (BYTES_BIG_ENDIAN)
+ /* We don't need a big-endian lane correction for SVE; see the comment
+ at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
{
x = in0, in0 = in1, in1 = x;
odd = !odd;
in0 = d->op0;
in1 = d->op1;
- if (BYTES_BIG_ENDIAN)
+ /* We don't need a big-endian lane correction for SVE; see the comment
+ at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
{
x = in0, in0 = in1, in1 = x;
odd = !odd;
in0 = d->op0;
in1 = d->op1;
- if (BYTES_BIG_ENDIAN)
+ /* We don't need a big-endian lane correction for SVE; see the comment
+ at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
{
x = in0, in0 = in1, in1 = x;
high = !high;
/* The first element always refers to the first vector.
Check if the extracted indices are increasing by one. */
- if (!d->perm[0].is_constant (&location)
+ if (d->vec_flags == VEC_SVE_PRED
+ || !d->perm[0].is_constant (&location)
|| !d->perm.series_p (0, 1, location, 1))
return false;
return true;
/* The case where (location == 0) is a no-op for both big- and little-endian,
- and is removed by the mid-end at optimization levels -O1 and higher. */
+ and is removed by the mid-end at optimization levels -O1 and higher.
- if (BYTES_BIG_ENDIAN && (location != 0))
+ We don't need a big-endian lane correction for SVE; see the comment
+ at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN && location != 0 && d->vec_flags == VEC_ADVSIMD)
{
/* After setup, we want the high elements of the first vector (stored
at the LSB end of the register), and the low elements of the second
return true;
}
-/* Recognize patterns for the REV insns. */
+/* Recognize patterns for the REV{64,32,16} insns, which reverse elements
+ within each 64-bit, 32-bit or 16-bit granule. */
static bool
-aarch64_evpc_rev (struct expand_vec_perm_d *d)
+aarch64_evpc_rev_local (struct expand_vec_perm_d *d)
{
HOST_WIDE_INT diff;
unsigned int i, size, unspec;
+ machine_mode pred_mode;
- if (!d->one_vector_p
+ if (d->vec_flags == VEC_SVE_PRED
+ || !d->one_vector_p
|| !d->perm[0].is_constant (&diff))
return false;
size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode);
if (size == 8)
- unspec = UNSPEC_REV64;
+ {
+ unspec = UNSPEC_REV64;
+ pred_mode = VNx2BImode;
+ }
else if (size == 4)
- unspec = UNSPEC_REV32;
+ {
+ unspec = UNSPEC_REV32;
+ pred_mode = VNx4BImode;
+ }
else if (size == 2)
- unspec = UNSPEC_REV16;
+ {
+ unspec = UNSPEC_REV16;
+ pred_mode = VNx8BImode;
+ }
else
return false;
if (d->testing_p)
return true;
- emit_set_insn (d->target, gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0),
- unspec));
+ rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), unspec);
+ if (d->vec_flags == VEC_SVE_DATA)
+ {
+ rtx pred = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+ src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (2, pred, src),
+ UNSPEC_MERGE_PTRUE);
+ }
+ emit_set_insn (d->target, src);
+ return true;
+}
+
+/* Recognize patterns for the REV insn, which reverses elements within
+ a full vector. */
+
+static bool
+aarch64_evpc_rev_global (struct expand_vec_perm_d *d)
+{
+ poly_uint64 nelt = d->perm.length ();
+
+ if (!d->one_vector_p || d->vec_flags != VEC_SVE_DATA)
+ return false;
+
+ if (!d->perm.series_p (0, 1, nelt - 1, -1))
+ return false;
+
+ /* Success! */
+ if (d->testing_p)
+ return true;
+
+ rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), UNSPEC_REV);
+ emit_set_insn (d->target, src);
return true;
}
machine_mode vmode = d->vmode;
rtx lane;
- if (d->perm.encoding ().encoded_nelts () != 1
+ if (d->vec_flags == VEC_SVE_PRED
+ || d->perm.encoding ().encoded_nelts () != 1
|| !d->perm[0].is_constant (&elt))
return false;
+ if (d->vec_flags == VEC_SVE_DATA && elt >= 64 * GET_MODE_UNIT_SIZE (vmode))
+ return false;
+
/* Success! */
if (d->testing_p)
return true;
static bool
aarch64_evpc_tbl (struct expand_vec_perm_d *d)
{
- rtx rperm[MAX_VECT_LEN], sel;
+ rtx rperm[MAX_COMPILE_TIME_VEC_BYTES], sel;
machine_mode vmode = d->vmode;
/* Make sure that the indices are constant. */
return true;
}
+/* Try to implement D using an SVE TBL instruction. */
+
+static bool
+aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
+{
+ unsigned HOST_WIDE_INT nelt;
+
+ /* Permuting two variable-length vectors could overflow the
+ index range. */
+ if (!d->one_vector_p && !d->perm.length ().is_constant (&nelt))
+ return false;
+
+ if (d->testing_p)
+ return true;
+
+ machine_mode sel_mode = mode_for_int_vector (d->vmode).require ();
+ rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+ aarch64_expand_sve_vec_perm (d->target, d->op0, d->op1, sel);
+ return true;
+}
+
static bool
aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
{
std::swap (d->op0, d->op1);
}
- if (TARGET_SIMD && known_gt (nelt, 1))
+ if ((d->vec_flags == VEC_ADVSIMD
+ || d->vec_flags == VEC_SVE_DATA
+ || d->vec_flags == VEC_SVE_PRED)
+ && known_gt (nelt, 1))
{
- if (aarch64_evpc_rev (d))
+ if (aarch64_evpc_rev_local (d))
+ return true;
+ else if (aarch64_evpc_rev_global (d))
return true;
else if (aarch64_evpc_ext (d))
return true;
return true;
else if (aarch64_evpc_trn (d))
return true;
- return aarch64_evpc_tbl (d);
+ if (d->vec_flags == VEC_SVE_DATA)
+ return aarch64_evpc_sve_tbl (d);
+ else if (d->vec_flags == VEC_ADVSIMD)
+ return aarch64_evpc_tbl (d);
}
return false;
}
d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
sel.nelts_per_input ());
d.vmode = vmode;
+ d.vec_flags = aarch64_classify_vector_mode (d.vmode);
d.target = target;
d.op0 = op0;
d.op1 = op1;
return force_reg (V16QImode, mask);
}
+/* Return true if X is a valid second operand for the SVE instruction
+ that implements integer comparison OP_CODE. */
+
+static bool
+aarch64_sve_cmp_operand_p (rtx_code op_code, rtx x)
+{
+ if (register_operand (x, VOIDmode))
+ return true;
+
+ switch (op_code)
+ {
+ case LTU:
+ case LEU:
+ case GEU:
+ case GTU:
+ return aarch64_sve_cmp_immediate_p (x, false);
+ case LT:
+ case LE:
+ case GE:
+ case GT:
+ case NE:
+ case EQ:
+ return aarch64_sve_cmp_immediate_p (x, true);
+ default:
+ gcc_unreachable ();
+ }
+}
+
+/* Return the UNSPEC_COND_* code for comparison CODE. */
+
+static unsigned int
+aarch64_unspec_cond_code (rtx_code code)
+{
+ switch (code)
+ {
+ case NE:
+ return UNSPEC_COND_NE;
+ case EQ:
+ return UNSPEC_COND_EQ;
+ case LT:
+ return UNSPEC_COND_LT;
+ case GT:
+ return UNSPEC_COND_GT;
+ case LE:
+ return UNSPEC_COND_LE;
+ case GE:
+ return UNSPEC_COND_GE;
+ case LTU:
+ return UNSPEC_COND_LO;
+ case GTU:
+ return UNSPEC_COND_HI;
+ case LEU:
+ return UNSPEC_COND_LS;
+ case GEU:
+ return UNSPEC_COND_HS;
+ case UNORDERED:
+ return UNSPEC_COND_UO;
+ default:
+ gcc_unreachable ();
+ }
+}
+
+/* Return an (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>) expression,
+ where <X> is the operation associated with comparison CODE. */
+
+static rtx
+aarch64_gen_unspec_cond (rtx_code code, machine_mode pred_mode,
+ rtx pred, rtx op0, rtx op1)
+{
+ rtvec vec = gen_rtvec (3, pred, op0, op1);
+ return gen_rtx_UNSPEC (pred_mode, vec, aarch64_unspec_cond_code (code));
+}
+
+/* Expand an SVE integer comparison:
+
+ TARGET = CODE (OP0, OP1). */
+
+void
+aarch64_expand_sve_vec_cmp_int (rtx target, rtx_code code, rtx op0, rtx op1)
+{
+ machine_mode pred_mode = GET_MODE (target);
+ machine_mode data_mode = GET_MODE (op0);
+
+ if (!aarch64_sve_cmp_operand_p (code, op1))
+ op1 = force_reg (data_mode, op1);
+
+ rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+ rtx unspec = aarch64_gen_unspec_cond (code, pred_mode, ptrue, op0, op1);
+ emit_insn (gen_set_clobber_cc (target, unspec));
+}
+
+/* Emit an instruction:
+
+ (set TARGET (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+
+ where <X> is the operation associated with comparison CODE. */
+
+static void
+aarch64_emit_unspec_cond (rtx target, rtx_code code, machine_mode pred_mode,
+ rtx pred, rtx op0, rtx op1)
+{
+ rtx unspec = aarch64_gen_unspec_cond (code, pred_mode, pred, op0, op1);
+ emit_set_insn (target, unspec);
+}
+
+/* Emit:
+
+ (set TMP1 (unspec:PRED_MODE [PTRUE OP0 OP1] UNSPEC_COND_<X1>))
+ (set TMP2 (unspec:PRED_MODE [PTRUE OP0 OP1] UNSPEC_COND_<X2>))
+ (set TARGET (and:PRED_MODE (ior:PRED_MODE TMP1 TMP2) PTRUE))
+
+ where <Xi> is the operation associated with comparison CODEi. */
+
+static void
+aarch64_emit_unspec_cond_or (rtx target, rtx_code code1, rtx_code code2,
+ machine_mode pred_mode, rtx ptrue,
+ rtx op0, rtx op1)
+{
+ rtx tmp1 = gen_reg_rtx (pred_mode);
+ aarch64_emit_unspec_cond (tmp1, code1, pred_mode, ptrue, op0, op1);
+ rtx tmp2 = gen_reg_rtx (pred_mode);
+ aarch64_emit_unspec_cond (tmp2, code2, pred_mode, ptrue, op0, op1);
+ emit_set_insn (target, gen_rtx_AND (pred_mode,
+ gen_rtx_IOR (pred_mode, tmp1, tmp2),
+ ptrue));
+}
+
+/* If CAN_INVERT_P, emit an instruction:
+
+ (set TARGET (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+
+ where <X> is the operation associated with comparison CODE. Otherwise
+ emit:
+
+ (set TMP (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+ (set TARGET (and:PRED_MODE (not:PRED_MODE TMP) PTRUE))
+
+ where the second instruction sets TARGET to the inverse of TMP. */
+
+static void
+aarch64_emit_inverted_unspec_cond (rtx target, rtx_code code,
+ machine_mode pred_mode, rtx ptrue, rtx pred,
+ rtx op0, rtx op1, bool can_invert_p)
+{
+ if (can_invert_p)
+ aarch64_emit_unspec_cond (target, code, pred_mode, pred, op0, op1);
+ else
+ {
+ rtx tmp = gen_reg_rtx (pred_mode);
+ aarch64_emit_unspec_cond (tmp, code, pred_mode, pred, op0, op1);
+ emit_set_insn (target, gen_rtx_AND (pred_mode,
+ gen_rtx_NOT (pred_mode, tmp),
+ ptrue));
+ }
+}
+
+/* Expand an SVE floating-point comparison:
+
+ TARGET = CODE (OP0, OP1)
+
+ If CAN_INVERT_P is true, the caller can also handle inverted results;
+ return true if the result is in fact inverted. */
+
+bool
+aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code,
+ rtx op0, rtx op1, bool can_invert_p)
+{
+ machine_mode pred_mode = GET_MODE (target);
+ machine_mode data_mode = GET_MODE (op0);
+
+ rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+ switch (code)
+ {
+ case UNORDERED:
+ /* UNORDERED has no immediate form. */
+ op1 = force_reg (data_mode, op1);
+ aarch64_emit_unspec_cond (target, code, pred_mode, ptrue, op0, op1);
+ return false;
+
+ case LT:
+ case LE:
+ case GT:
+ case GE:
+ case EQ:
+ case NE:
+ /* There is native support for the comparison. */
+ aarch64_emit_unspec_cond (target, code, pred_mode, ptrue, op0, op1);
+ return false;
+
+ case ORDERED:
+ /* There is native support for the inverse comparison. */
+ op1 = force_reg (data_mode, op1);
+ aarch64_emit_inverted_unspec_cond (target, UNORDERED,
+ pred_mode, ptrue, ptrue, op0, op1,
+ can_invert_p);
+ return can_invert_p;
+
+ case LTGT:
+ /* This is a trapping operation (LT or GT). */
+ aarch64_emit_unspec_cond_or (target, LT, GT, pred_mode, ptrue, op0, op1);
+ return false;
+
+ case UNEQ:
+ if (!flag_trapping_math)
+ {
+ /* This would trap for signaling NaNs. */
+ op1 = force_reg (data_mode, op1);
+ aarch64_emit_unspec_cond_or (target, UNORDERED, EQ,
+ pred_mode, ptrue, op0, op1);
+ return false;
+ }
+ /* fall through */
+
+ case UNLT:
+ case UNLE:
+ case UNGT:
+ case UNGE:
+ {
+ rtx ordered = ptrue;
+ if (flag_trapping_math)
+ {
+ /* Only compare the elements that are known to be ordered. */
+ ordered = gen_reg_rtx (pred_mode);
+ op1 = force_reg (data_mode, op1);
+ aarch64_emit_inverted_unspec_cond (ordered, UNORDERED, pred_mode,
+ ptrue, ptrue, op0, op1, false);
+ }
+ if (code == UNEQ)
+ code = NE;
+ else
+ code = reverse_condition_maybe_unordered (code);
+ aarch64_emit_inverted_unspec_cond (target, code, pred_mode, ptrue,
+ ordered, op0, op1, can_invert_p);
+ return can_invert_p;
+ }
+
+ default:
+ gcc_unreachable ();
+ }
+}
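
The unordered-or-signed cases rely on the identity UNLT (x, y) == !(ORDERED (x, y) && x >= y): the reversed comparison is evaluated only on the lanes known to be ordered, and the result is then inverted.  The other UN* codes work the same way with their own reversed comparison.  A quick standalone check of the identity (scalar doubles rather than SVE vectors, illustration only):

    #include <math.h>
    #include <stdio.h>

    static int
    unlt (double x, double y)
    {
      return isunordered (x, y) || x < y;
    }

    static int
    via_inverted_ge (double x, double y)
    {
      int ordered = !isunordered (x, y);
      return !(ordered && x >= y);
    }

    int
    main (void)
    {
      double vals[] = { 1.0, 2.0, NAN };
      for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
          if (unlt (vals[i], vals[j]) != via_inverted_ge (vals[i], vals[j]))
            printf ("mismatch at %d,%d\n", i, j);
      printf ("done\n");
      return 0;
    }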
+
+/* Expand an SVE vcond pattern with operands OPS. DATA_MODE is the mode
+ of the data being selected and CMP_MODE is the mode of the values being
+ compared. */
+
+void
+aarch64_expand_sve_vcond (machine_mode data_mode, machine_mode cmp_mode,
+ rtx *ops)
+{
+ machine_mode pred_mode
+ = aarch64_get_mask_mode (GET_MODE_NUNITS (cmp_mode),
+ GET_MODE_SIZE (cmp_mode)).require ();
+ rtx pred = gen_reg_rtx (pred_mode);
+ if (FLOAT_MODE_P (cmp_mode))
+ {
+ if (aarch64_expand_sve_vec_cmp_float (pred, GET_CODE (ops[3]),
+ ops[4], ops[5], true))
+ std::swap (ops[1], ops[2]);
+ }
+ else
+ aarch64_expand_sve_vec_cmp_int (pred, GET_CODE (ops[3]), ops[4], ops[5]);
+
+ rtvec vec = gen_rtvec (3, pred, ops[1], ops[2]);
+ emit_set_insn (ops[0], gen_rtx_UNSPEC (data_mode, vec, UNSPEC_SEL));
+}
+
/* Implement TARGET_MODES_TIEABLE_P. In principle we should always return
true. However due to issues with register allocation it is preferable
to avoid tieing integer scalar and FP scalar modes. Executing integer
/* We specifically want to allow elements of "structure" modes to
be tieable to the structure. This more general condition allows
- other rarer situations too. */
- if (aarch64_vector_mode_p (mode1) && aarch64_vector_mode_p (mode2))
+ other rarer situations too. The reason we don't extend this to
+ predicate modes is that there are no predicate structure modes
+ nor any specific instructions for extracting part of a predicate
+ register. */
+ if (aarch64_vector_data_mode_p (mode1)
+ && aarch64_vector_data_mode_p (mode2))
return true;
/* Also allow any scalar modes with vectors. */
}
}
+/* Implement the TARGET_DWARF_POLY_INDETERMINATE_VALUE hook. */
+
+static unsigned int
+aarch64_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
+ int *offset)
+{
+ /* Polynomial invariant 1 == (VG / 2) - 1. */
+ gcc_assert (i == 1);
+ *factor = 2;
+ *offset = 1;
+ return AARCH64_DWARF_VG;
+}
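
Inverting that invariant gives VG = 2 * X1 + 2, so for 512-bit vectors (VG == 8) the indeterminate evaluates to 3 at run time, and a quantity such as BYTES_PER_SVE_VECTOR, represented internally as 16 + 16 * X1, comes out as 64 bytes, as expected.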
+
/* Implement TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P - return TRUE
if MODE is HFmode, and punt to the generic implementation otherwise. */
}
}
+/* Implement TARGET_COMPUTE_PRESSURE_CLASSES. */
+
+static int
+aarch64_compute_pressure_classes (reg_class *classes)
+{
+ int i = 0;
+ classes[i++] = GENERAL_REGS;
+ classes[i++] = FP_REGS;
+ /* PR_REGS isn't a useful pressure class because many predicate pseudo
+ registers need to go in PR_LO_REGS at some point during their
+ lifetime. Splitting it into two halves has the effect of making
+ all predicates count against PR_LO_REGS, so that we try whenever
+ possible to restrict the number of live predicates to 8. This
+ greatly reduces the amount of spilling in certain loops. */
+ classes[i++] = PR_LO_REGS;
+ classes[i++] = PR_HI_REGS;
+ return i;
+}
+
+/* Implement TARGET_CAN_CHANGE_MODE_CLASS. */
+
+static bool
+aarch64_can_change_mode_class (machine_mode from,
+ machine_mode to, reg_class_t)
+{
+ /* See the comment at the head of aarch64-sve.md for details. */
+ if (BYTES_BIG_ENDIAN
+ && (aarch64_sve_data_mode_p (from) != aarch64_sve_data_mode_p (to)))
+ return false;
+ return true;
+}
+
/* Target-specific selftests. */
#if CHECKING_P
#undef TARGET_FUNCTION_ARG_PADDING
#define TARGET_FUNCTION_ARG_PADDING aarch64_function_arg_padding
+#undef TARGET_GET_RAW_RESULT_MODE
+#define TARGET_GET_RAW_RESULT_MODE aarch64_get_reg_raw_mode
+#undef TARGET_GET_RAW_ARG_MODE
+#define TARGET_GET_RAW_ARG_MODE aarch64_get_reg_raw_mode
+
#undef TARGET_FUNCTION_OK_FOR_SIBCALL
#define TARGET_FUNCTION_OK_FOR_SIBCALL aarch64_function_ok_for_sibcall
#undef TARGET_VECTOR_ALIGNMENT
#define TARGET_VECTOR_ALIGNMENT aarch64_simd_vector_alignment
+#undef TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT
+#define TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT \
+ aarch64_vectorize_preferred_vector_alignment
#undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
#define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE \
aarch64_simd_vector_alignment_reachable
#define TARGET_VECTORIZE_VEC_PERM_CONST \
aarch64_vectorize_vec_perm_const
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE aarch64_get_mask_mode
+
#undef TARGET_INIT_LIBFUNCS
#define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs
#undef TARGET_OMIT_STRUCT_RETURN_REG
#define TARGET_OMIT_STRUCT_RETURN_REG true
+#undef TARGET_DWARF_POLY_INDETERMINATE_VALUE
+#define TARGET_DWARF_POLY_INDETERMINATE_VALUE \
+ aarch64_dwarf_poly_indeterminate_value
+
/* The architecture reserves bits 0 and 1 so use bit 2 for descriptors. */
#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 4
#undef TARGET_CONSTANT_ALIGNMENT
#define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
+#undef TARGET_COMPUTE_PRESSURE_CLASSES
+#define TARGET_COMPUTE_PRESSURE_CLASSES aarch64_compute_pressure_classes
+
+#undef TARGET_CAN_CHANGE_MODE_CLASS
+#define TARGET_CAN_CHANGE_MODE_CLASS aarch64_can_change_mode_class
+
#if CHECKING_P
#undef TARGET_RUN_TARGET_SELFTESTS
#define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
/* ARMv8.2-A architecture extensions. */
#define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2-A features. */
#define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2-A FP16 extensions. */
+#define AARCH64_FL_SVE (1 << 10) /* Has Scalable Vector Extensions. */
/* ARMv8.3-A architecture extensions. */
-#define AARCH64_FL_V8_3 (1 << 10) /* Has ARMv8.3-A features. */
-#define AARCH64_FL_RCPC (1 << 11) /* Has support for RCpc model. */
-#define AARCH64_FL_DOTPROD (1 << 12) /* Has ARMv8.2-A Dot Product ins. */
+#define AARCH64_FL_V8_3 (1 << 11) /* Has ARMv8.3-A features. */
+#define AARCH64_FL_RCPC (1 << 12) /* Has support for RCpc model. */
+#define AARCH64_FL_DOTPROD (1 << 13) /* Has ARMv8.2-A Dot Product ins. */
/* New flags to split crypto into aes and sha2. */
-#define AARCH64_FL_AES (1 << 13) /* Has Crypto AES. */
-#define AARCH64_FL_SHA2 (1 << 14) /* Has Crypto SHA2. */
+#define AARCH64_FL_AES (1 << 14) /* Has Crypto AES. */
+#define AARCH64_FL_SHA2 (1 << 15) /* Has Crypto SHA2. */
/* ARMv8.4-A architecture extensions. */
-#define AARCH64_FL_V8_4 (1 << 15) /* Has ARMv8.4-A features. */
-#define AARCH64_FL_SM4 (1 << 16) /* Has ARMv8.4-A SM3 and SM4. */
-#define AARCH64_FL_SHA3 (1 << 17) /* Has ARMv8.4-a SHA3 and SHA512. */
-#define AARCH64_FL_F16FML (1 << 18) /* Has ARMv8.4-a FP16 extensions. */
+#define AARCH64_FL_V8_4 (1 << 16) /* Has ARMv8.4-A features. */
+#define AARCH64_FL_SM4 (1 << 17) /* Has ARMv8.4-A SM3 and SM4. */
+#define AARCH64_FL_SHA3 (1 << 18) /* Has ARMv8.4-a SHA3 and SHA512. */
+#define AARCH64_FL_F16FML (1 << 19) /* Has ARMv8.4-a FP16 extensions. */
/* Has FP and SIMD. */
#define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
#define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_RDMA)
#define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2)
#define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16)
+#define AARCH64_ISA_SVE (aarch64_isa_flags & AARCH64_FL_SVE)
#define AARCH64_ISA_V8_3 (aarch64_isa_flags & AARCH64_FL_V8_3)
#define AARCH64_ISA_DOTPROD (aarch64_isa_flags & AARCH64_FL_DOTPROD)
#define AARCH64_ISA_AES (aarch64_isa_flags & AARCH64_FL_AES)
/* Dot Product is an optional extension to AdvSIMD enabled through +dotprod. */
#define TARGET_DOTPROD (TARGET_SIMD && AARCH64_ISA_DOTPROD)
+/* SVE instructions, enabled through +sve. */
+#define TARGET_SVE (AARCH64_ISA_SVE)
+
/* ARMv8.3-A features. */
#define TARGET_ARMV8_3 (AARCH64_ISA_V8_3)
V0-V7 Parameter/result registers
The vector register V0 holds scalar B0, H0, S0 and D0 in its least
- significant bits. Unlike AArch32 S1 is not packed into D0,
- etc. */
+ significant bits. Unlike AArch32 S1 is not packed into D0, etc.
+
+ P0-P7 Predicate low registers: valid in all predicate contexts
+ P8-P15 Predicate high registers: used as scratch space
+
+ VG Pseudo "vector granules" register
+
+ VG is the number of 64-bit elements in an SVE vector. We define
+ it as a hard register so that we can easily map it to the DWARF VG
+ register. GCC internally uses the poly_int variable aarch64_sve_vg
+ instead. */
/* Note that we don't mark X30 as a call-clobbered register. The idea is
that it's really the call instructions themselves which clobber X30.
0, 0, 0, 0, 0, 0, 0, 0, /* V8 - V15 */ \
0, 0, 0, 0, 0, 0, 0, 0, /* V16 - V23 */ \
0, 0, 0, 0, 0, 0, 0, 0, /* V24 - V31 */ \
- 1, 1, 1, /* SFP, AP, CC */ \
+ 1, 1, 1, 1, /* SFP, AP, CC, VG */ \
+ 0, 0, 0, 0, 0, 0, 0, 0, /* P0 - P7 */ \
+ 0, 0, 0, 0, 0, 0, 0, 0, /* P8 - P15 */ \
}
#define CALL_USED_REGISTERS \
0, 0, 0, 0, 0, 0, 0, 0, /* V8 - V15 */ \
1, 1, 1, 1, 1, 1, 1, 1, /* V16 - V23 */ \
1, 1, 1, 1, 1, 1, 1, 1, /* V24 - V31 */ \
- 1, 1, 1, /* SFP, AP, CC */ \
+ 1, 1, 1, 1, /* SFP, AP, CC, VG */ \
+ 1, 1, 1, 1, 1, 1, 1, 1, /* P0 - P7 */ \
+ 1, 1, 1, 1, 1, 1, 1, 1, /* P8 - P15 */ \
}
#define REGISTER_NAMES \
"v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15", \
"v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", \
"v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31", \
- "sfp", "ap", "cc", \
+ "sfp", "ap", "cc", "vg", \
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7", \
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15", \
}
/* Generate the register aliases for core register N */
{"d" # N, V0_REGNUM + (N)}, \
{"s" # N, V0_REGNUM + (N)}, \
{"h" # N, V0_REGNUM + (N)}, \
- {"b" # N, V0_REGNUM + (N)}
+ {"b" # N, V0_REGNUM + (N)}, \
+ {"z" # N, V0_REGNUM + (N)}
/* Provide aliases for all of the ISA defined register name forms.
These aliases are convenient for use in the clobber lists of inline
#define FRAME_POINTER_REGNUM SFP_REGNUM
#define STACK_POINTER_REGNUM SP_REGNUM
#define ARG_POINTER_REGNUM AP_REGNUM
-#define FIRST_PSEUDO_REGISTER 67
+#define FIRST_PSEUDO_REGISTER (P15_REGNUM + 1)
/* The number of (integer) argument register available. */
#define NUM_ARG_REGS 8
#define AARCH64_DWARF_NUMBER_R 31
#define AARCH64_DWARF_SP 31
+#define AARCH64_DWARF_VG 46
+#define AARCH64_DWARF_P0 48
#define AARCH64_DWARF_V0 64
/* The number of V registers. */
#define FP_LO_REGNUM_P(REGNO) \
(((unsigned) (REGNO - V0_REGNUM)) <= (V15_REGNUM - V0_REGNUM))
+#define PR_REGNUM_P(REGNO)\
+ (((unsigned) (REGNO - P0_REGNUM)) <= (P15_REGNUM - P0_REGNUM))
+
+#define PR_LO_REGNUM_P(REGNO)\
+ (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM))
+
\f
/* Register and constant classes. */
FP_LO_REGS,
FP_REGS,
POINTER_AND_FP_REGS,
+ PR_LO_REGS,
+ PR_HI_REGS,
+ PR_REGS,
ALL_REGS,
LIM_REG_CLASSES /* Last */
};
"FP_LO_REGS", \
"FP_REGS", \
"POINTER_AND_FP_REGS", \
+ "PR_LO_REGS", \
+ "PR_HI_REGS", \
+ "PR_REGS", \
"ALL_REGS" \
}
{ 0x00000000, 0x0000ffff, 0x00000000 }, /* FP_LO_REGS */ \
{ 0x00000000, 0xffffffff, 0x00000000 }, /* FP_REGS */ \
{ 0xffffffff, 0xffffffff, 0x00000003 }, /* POINTER_AND_FP_REGS */\
- { 0xffffffff, 0xffffffff, 0x00000007 } /* ALL_REGS */ \
+ { 0x00000000, 0x00000000, 0x00000ff0 }, /* PR_LO_REGS */ \
+ { 0x00000000, 0x00000000, 0x000ff000 }, /* PR_HI_REGS */ \
+ { 0x00000000, 0x00000000, 0x000ffff0 }, /* PR_REGS */ \
+ { 0xffffffff, 0xffffffff, 0x000fffff } /* ALL_REGS */ \
}
#define REGNO_REG_CLASS(REGNO) aarch64_regno_regclass (REGNO)
#define LIBGCC2_UNWIND_ATTRIBUTE \
__attribute__((optimize ("no-omit-frame-pointer")))
+#ifndef USED_FOR_TARGET
+extern poly_uint16 aarch64_sve_vg;
+
+/* The number of bits and bytes in an SVE vector. */
+#define BITS_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 64))
+#define BYTES_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 8))
+
+/* The number of bytes in an SVE predicate. */
+#define BYTES_PER_SVE_PRED aarch64_sve_vg
+
+/* The SVE mode for a vector of bytes. */
+#define SVE_BYTE_MODE VNx16QImode
+
+/* The maximum number of bytes in a fixed-size vector. This is 256 bytes
+ (for -msve-vector-bits=2048) multiplied by the maximum number of
+ vectors in a structure mode (4).
+
+ This limit must not be used for variable-size vectors, since
+ VL-agnostic code must work with arbitrary vector lengths. */
+#define MAX_COMPILE_TIME_VEC_BYTES (256 * 4)
+#endif
+
+#define REGMODE_NATURAL_SIZE(MODE) aarch64_regmode_natural_size (MODE)
+
#endif /* GCC_AARCH64_H */
(SFP_REGNUM 64)
(AP_REGNUM 65)
(CC_REGNUM 66)
+ ;; Defined only to make the DWARF description simpler.
+ (VG_REGNUM 67)
+ (P0_REGNUM 68)
+ (P7_REGNUM 75)
+ (P15_REGNUM 83)
]
)
UNSPEC_PACI1716
UNSPEC_PACISP
UNSPEC_PRLG_STK
+ UNSPEC_REV
UNSPEC_RBIT
UNSPEC_SCVTF
UNSPEC_SISD_NEG
UNSPEC_RSQRTS
UNSPEC_NZCV
UNSPEC_XPACLRI
+ UNSPEC_LD1_SVE
+ UNSPEC_ST1_SVE
+ UNSPEC_LD1RQ
+ UNSPEC_MERGE_PTRUE
+ UNSPEC_PTEST_PTRUE
+ UNSPEC_UNPACKSHI
+ UNSPEC_UNPACKUHI
+ UNSPEC_UNPACKSLO
+ UNSPEC_UNPACKULO
+ UNSPEC_PACK
+ UNSPEC_FLOAT_CONVERT
+ UNSPEC_WHILE_LO
])
(define_c_enum "unspecv" [
;; will be disabled when !TARGET_SIMD.
(define_attr "simd" "no,yes" (const_string "no"))
+;; Attribute that specifies whether or not the instruction uses SVE.
+;; When this is set to yes for an alternative, that alternative
+;; will be disabled when !TARGET_SVE.
+(define_attr "sve" "no,yes" (const_string "no"))
+
(define_attr "length" ""
(const_int 4))
;; registers when -mgeneral-regs-only is specified.
(define_attr "enabled" "no,yes"
(cond [(ior
- (ior
- (and (eq_attr "fp" "yes")
- (eq (symbol_ref "TARGET_FLOAT") (const_int 0)))
- (and (eq_attr "simd" "yes")
- (eq (symbol_ref "TARGET_SIMD") (const_int 0))))
+ (and (eq_attr "fp" "yes")
+ (eq (symbol_ref "TARGET_FLOAT") (const_int 0)))
+ (and (eq_attr "simd" "yes")
+ (eq (symbol_ref "TARGET_SIMD") (const_int 0)))
(and (eq_attr "fp16" "yes")
- (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0))))
+ (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0)))
+ (and (eq_attr "sve" "yes")
+ (eq (symbol_ref "TARGET_SVE") (const_int 0))))
(const_string "no")
] (const_string "yes")))
"
if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
operands[1] = force_reg (<MODE>mode, operands[1]);
+
+ if (GET_CODE (operands[1]) == CONST_POLY_INT)
+ {
+ aarch64_expand_mov_immediate (operands[0], operands[1]);
+ DONE;
+ }
"
)
(define_insn "*mov<mode>_aarch64"
- [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r,*w, m, m, r,*w,*w")
- (match_operand:SHORT 1 "general_operand" " r,M,D<hq>,m, m,rZ,*w,*w, r,*w"))]
+ [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r ,r,*w, m, m, r,*w,*w")
+ (match_operand:SHORT 1 "aarch64_mov_operand" " r,M,D<hq>,Usv,m, m,rZ,*w,*w, r,*w"))]
"(register_operand (operands[0], <MODE>mode)
|| aarch64_reg_or_zero (operands[1], <MODE>mode))"
{
return aarch64_output_scalar_simd_mov_immediate (operands[1],
<MODE>mode);
case 3:
- return "ldr<size>\t%w0, %1";
+ return aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]);
case 4:
- return "ldr\t%<size>0, %1";
+ return "ldr<size>\t%w0, %1";
case 5:
- return "str<size>\t%w1, %0";
+ return "ldr\t%<size>0, %1";
case 6:
- return "str\t%<size>1, %0";
+ return "str<size>\t%w1, %0";
case 7:
- return "umov\t%w0, %1.<v>[0]";
+ return "str\t%<size>1, %0";
case 8:
- return "dup\t%0.<Vallxd>, %w1";
+ return "umov\t%w0, %1.<v>[0]";
case 9:
+ return "dup\t%0.<Vallxd>, %w1";
+ case 10:
return "dup\t%<Vetype>0, %1.<v>[0]";
default:
gcc_unreachable ();
}
}
- [(set_attr "type" "mov_reg,mov_imm,neon_move,load_4,load_4,store_4,store_4,\
- neon_to_gp<q>,neon_from_gp<q>,neon_dup")
- (set_attr "simd" "*,*,yes,*,*,*,*,yes,yes,yes")]
+ ;; The "mov_imm" type for CNT is just a placeholder.
+ [(set_attr "type" "mov_reg,mov_imm,neon_move,mov_imm,load_4,load_4,store_4,
+ store_4,neon_to_gp<q>,neon_from_gp<q>,neon_dup")
+ (set_attr "simd" "*,*,yes,*,*,*,*,*,yes,yes,yes")
+ (set_attr "sve" "*,*,*,yes,*,*,*,*,*,*,*")]
)
(define_expand "mov<mode>"
)
(define_insn_and_split "*movsi_aarch64"
- [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,w, m, m, r, r, w,r,w, w")
- (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))]
+ [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m, r, r, w,r,w, w")
+ (match_operand:SI 1 "aarch64_mov_operand" " r,r,k,M,n,Usv,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))]
"(register_operand (operands[0], SImode)
|| aarch64_reg_or_zero (operands[1], SImode))"
"@
mov\\t%w0, %w1
mov\\t%w0, %1
#
+ * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
ldr\\t%w0, %1
ldr\\t%s0, %1
str\\t%w1, %0
aarch64_expand_mov_immediate (operands[0], operands[1]);
DONE;
}"
- [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,load_4,load_4,store_4,store_4,\
- adr,adr,f_mcr,f_mrc,fmov,neon_move")
- (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
- (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
+ ;; The "mov_imm" type for CNT is just a placeholder.
+ [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,
+ load_4,store_4,store_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
+ (set_attr "fp" "*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
+ (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")
+ (set_attr "sve" "*,*,*,*,*,yes,*,*,*,*,*,*,*,*,*,*")]
)
(define_insn_and_split "*movdi_aarch64"
- [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r,w, m,m, r, r, w,r,w, w")
- (match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,M,n,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
+ [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r, r,w, m,m, r, r, w,r,w, w")
+ (match_operand:DI 1 "aarch64_mov_operand" " r,r,k,N,M,n,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
"(register_operand (operands[0], DImode)
|| aarch64_reg_or_zero (operands[1], DImode))"
"@
mov\\t%x0, %1
mov\\t%w0, %1
#
+ * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
ldr\\t%x0, %1
ldr\\t%d0, %1
str\\t%x1, %0
aarch64_expand_mov_immediate (operands[0], operands[1]);
DONE;
}"
- [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_8,\
- load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov,neon_move")
- (set_attr "fp" "*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
- (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
+ ;; The "mov_imm" type for CNTD is just a placeholder.
+ [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov_imm,
+ load_8,load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov,
+ neon_move")
+ (set_attr "fp" "*,*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
+ (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")
+ (set_attr "sve" "*,*,*,*,*,*,yes,*,*,*,*,*,*,*,*,*,*")]
)
(define_insn "insv_imm<mode>"
"
if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
operands[1] = force_reg (TImode, operands[1]);
+
+ if (GET_CODE (operands[1]) == CONST_POLY_INT)
+ {
+ emit_move_insn (gen_lowpart (DImode, operands[0]),
+ gen_lowpart (DImode, operands[1]));
+ emit_move_insn (gen_highpart (DImode, operands[0]), const0_rtx);
+ DONE;
+ }
"
)
[(set
(match_operand:GPI 0 "register_operand" "")
(plus:GPI (match_operand:GPI 1 "register_operand" "")
- (match_operand:GPI 2 "aarch64_pluslong_operand" "")))]
+ (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "")))]
""
{
/* If operands[1] is a subreg extract the inner RTX. */
&& (!REG_P (op1)
|| !REGNO_PTR_FRAME_P (REGNO (op1))))
operands[2] = force_reg (<MODE>mode, operands[2]);
+ /* Expand polynomial additions now if the destination is the stack
+ pointer, since we don't want to use that as a temporary. */
+ else if (operands[0] == stack_pointer_rtx
+ && aarch64_split_add_offset_immediate (operands[2], <MODE>mode))
+ {
+ aarch64_split_add_offset (<MODE>mode, operands[0], operands[1],
+ operands[2], NULL_RTX, NULL_RTX);
+ DONE;
+ }
})
(define_insn "*add<mode>3_aarch64"
[(set
- (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r")
+ (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r,rk")
(plus:GPI
- (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk")
- (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa")))]
+ (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk,rk")
+ (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa,Uav")))]
""
"@
add\\t%<w>0, %<w>1, %2
add\\t%<w>0, %<w>1, %<w>2
add\\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas>
sub\\t%<w>0, %<w>1, #%n2
- #"
- [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple")
- (set_attr "simd" "*,*,yes,*,*")]
+ #
+ * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);"
+ ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder.
+ [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple,alu_imm")
+ (set_attr "simd" "*,*,yes,*,*,*")]
)
;; zero_extend version of above
}
)
+;; Match addition of polynomial offsets that require one temporary, for which
+;; we can use the early-clobbered destination register. This is a separate
+;; pattern so that the early clobber doesn't affect register allocation
+;; for other forms of addition. However, we still need to provide an
+;; all-register alternative, in case the offset goes out of range after
+;; elimination. For completeness we might as well provide all GPR-based
+;; alternatives from the main pattern.
+;;
+;; We don't have a pattern for additions requiring two temporaries since at
+;; present LRA doesn't allow new scratches to be added during elimination.
+;; Such offsets should be rare anyway.
+;;
+;; ??? But if we added LRA support for new scratches, much of the ugliness
+;; here would go away. We could just handle all polynomial constants in
+;; this pattern.
+(define_insn_and_split "*add<mode>3_poly_1"
+ [(set
+ (match_operand:GPI 0 "register_operand" "=r,r,r,r,r,&r")
+ (plus:GPI
+ (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,rk,rk")
+ (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "I,r,J,Uaa,Uav,Uat")))]
+ "TARGET_SVE && operands[0] != stack_pointer_rtx"
+ "@
+ add\\t%<w>0, %<w>1, %2
+ add\\t%<w>0, %<w>1, %<w>2
+ sub\\t%<w>0, %<w>1, #%n2
+ #
+ * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);
+ #"
+ "&& epilogue_completed
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && aarch64_split_add_offset_immediate (operands[2], <MODE>mode)"
+ [(const_int 0)]
+ {
+ aarch64_split_add_offset (<MODE>mode, operands[0], operands[1],
+ operands[2], operands[0], NULL_RTX);
+ DONE;
+ }
+ ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder.
+ [(set_attr "type" "alu_imm,alu_sreg,alu_imm,multiple,alu_imm,multiple")]
+)
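
For reference, this matches the architectural forms of the instructions: ADDVL adds a signed multiple of the vector length in bytes and ADDPL a signed multiple of the predicate length (VL / 8), both with immediates in the range [-32, 31], so e.g. an offset of 2 * BYTES_PER_SVE_VECTOR needs only a single ADDVL, while larger or mixed offsets take the split form above.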
+
(define_split
[(set (match_operand:DI 0 "register_operand")
(zero_extend:DI
DONE;
})
+;; Helper for aarch64.c code.
+(define_expand "set_clobber_cc"
+ [(parallel [(set (match_operand 0)
+ (match_operand 1))
+ (clobber (reg:CC CC_REGNUM))])])
+
;; AdvSIMD Stuff
(include "aarch64-simd.md")
;; ldp/stp peephole patterns
(include "aarch64-ldpstp.md")
+
+;; SVE.
+(include "aarch64-sve.md")
precision of division results to about 16 bits for
single precision and to 32 bits for double precision.
+Enum
+Name(sve_vector_bits) Type(enum aarch64_sve_vector_bits_enum)
+The possible SVE vector lengths:
+
+EnumValue
+Enum(sve_vector_bits) String(scalable) Value(SVE_SCALABLE)
+
+EnumValue
+Enum(sve_vector_bits) String(128) Value(SVE_128)
+
+EnumValue
+Enum(sve_vector_bits) String(256) Value(SVE_256)
+
+EnumValue
+Enum(sve_vector_bits) String(512) Value(SVE_512)
+
+EnumValue
+Enum(sve_vector_bits) String(1024) Value(SVE_1024)
+
+EnumValue
+Enum(sve_vector_bits) String(2048) Value(SVE_2048)
+
+msve-vector-bits=
+Target RejectNegative Joined Enum(sve_vector_bits) Var(aarch64_sve_vector_bits) Init(SVE_SCALABLE)
+-msve-vector-bits=N Set the number of bits in an SVE vector register to N.
+
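As a usage note: SVE itself is enabled with +sve (e.g. something like -march=armv8.2-a+sve), and -msve-vector-bits=256 additionally lets the compiler assume 256-bit vectors, whereas the default -msve-vector-bits=scalable keeps the generated code vector-length agnostic.
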
mverbose-cost-dump
Common Undocumented Var(flag_aarch64_verbose_cost)
Enables verbose cost model dumping in the debug dump files.
(define_register_constraint "w" "FP_REGS"
"Floating point and SIMD vector registers.")
+(define_register_constraint "Upa" "PR_REGS"
+ "SVE predicate registers p0 - p15.")
+
+(define_register_constraint "Upl" "PR_LO_REGS"
+ "SVE predicate registers p0 - p7.")
+
(define_register_constraint "x" "FP_LO_REGS"
"Floating point and SIMD vector registers V0 - V15.")
(and (match_code "const_int")
(match_test "aarch64_pluslong_strict_immedate (op, VOIDmode)")))
+(define_constraint "Uav"
+ "@internal
+ A constraint that matches a VG-based constant that can be added by
+ a single ADDVL or ADDPL."
+ (match_operand 0 "aarch64_sve_addvl_addpl_immediate"))
+
+(define_constraint "Uat"
+ "@internal
+ A constraint that matches a VG-based constant that can be added by
+ using multiple instructions, with one temporary register."
+ (match_operand 0 "aarch64_split_add_offset_immediate"))
+
(define_constraint "J"
"A constant that can be used with a SUB operation (once negated)."
(and (match_code "const_int")
A constraint that matches the immediate constant -1."
(match_test "op == constm1_rtx"))
+(define_constraint "Usv"
+ "@internal
+ A constraint that matches a VG-based constant that can be loaded by
+ a single CNT[BHWD]."
+ (match_operand 0 "aarch64_sve_cnt_immediate"))
+
+(define_constraint "Usi"
+ "@internal
+ A constraint that matches an immediate operand valid for
+ the SVE INDEX instruction."
+ (match_operand 0 "aarch64_sve_index_immediate"))
+
(define_constraint "Ui1"
"@internal
A constraint that matches the immediate constant +1."
(match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0), 1,
ADDR_QUERY_LDP_STP)")))
+(define_memory_constraint "Utr"
+ "@internal
+ An address valid for SVE LDR and STR instructions (as distinct from
+ LD[1234] and ST[1234] patterns)."
+ (and (match_code "mem")
+ (match_test "aarch64_sve_ldr_operand_p (op)")))
+
(define_memory_constraint "Utv"
"@internal
An address valid for loading/storing opaque structure
(match_test "aarch64_legitimate_address_p (V2DImode,
XEXP (op, 0), 1)")))
+(define_memory_constraint "Uty"
+ "@internal
+ An address valid for SVE LD1Rs."
+ (and (match_code "mem")
+ (match_test "aarch64_sve_ld1r_operand_p (op)")))
+
(define_constraint "Ufc"
"A floating point constant which can be used with an\
FMOV immediate operation."
(define_constraint "Dn"
"@internal
A constraint that matches vector of immediates."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
(match_test "aarch64_simd_valid_immediate (op, NULL)")))
(define_constraint "Dh"
(define_constraint "Dl"
"@internal
A constraint that matches vector of immediates for left shifts."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
(match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
true)")))
(define_constraint "Dr"
"@internal
A constraint that matches vector of immediates for right shifts."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
(match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
false)")))
(define_constraint "Dz"
"@internal
- A constraint that matches vector of immediate zero."
- (and (match_code "const_vector")
- (match_test "aarch64_simd_imm_zero_p (op, GET_MODE (op))")))
+ A constraint that matches a vector of immediate zero."
+ (and (match_code "const,const_vector")
+ (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_constraint "Dm"
+ "@internal
+ A constraint that matches a vector of immediate minus one."
+ (and (match_code "const,const_vector")
+ (match_test "op == CONST1_RTX (GET_MODE (op))")))
(define_constraint "Dd"
"@internal
"@internal
An address valid for a prefetch instruction."
(match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
+
+(define_constraint "vsa"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE
+ arithmetic instructions."
+ (match_operand 0 "aarch64_sve_arith_immediate"))
+
+(define_constraint "vsc"
+ "@internal
+ A constraint that matches a signed immediate operand valid for SVE
+ CMP instructions."
+ (match_operand 0 "aarch64_sve_cmp_vsc_immediate"))
+
+(define_constraint "vsd"
+ "@internal
+ A constraint that matches an unsigned immediate operand valid for SVE
+ CMP instructions."
+ (match_operand 0 "aarch64_sve_cmp_vsd_immediate"))
+
+(define_constraint "vsi"
+ "@internal
+ A constraint that matches a vector count operand valid for SVE INC and
+ DEC instructions."
+ (match_operand 0 "aarch64_sve_inc_dec_immediate"))
+
+(define_constraint "vsn"
+ "@internal
+ A constraint that matches an immediate operand whose negative
+ is valid for SVE SUB instructions."
+ (match_operand 0 "aarch64_sve_sub_arith_immediate"))
+
+(define_constraint "vsl"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE logical
+ operations."
+ (match_operand 0 "aarch64_sve_logical_immediate"))
+
+(define_constraint "vsm"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE MUL
+ operations."
+ (match_operand 0 "aarch64_sve_mul_immediate"))
+
+(define_constraint "vsA"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE FADD
+ and FSUB operations."
+ (match_operand 0 "aarch64_sve_float_arith_immediate"))
+
+(define_constraint "vsM"
+ "@internal
+ A constraint that matches an immediate operand valid for SVE FMUL
+ operations."
+ (match_operand 0 "aarch64_sve_float_mul_immediate"))
+
+(define_constraint "vsN"
+ "@internal
+ A constraint that matches the negative of vsA."
+ (match_operand 0 "aarch64_sve_float_arith_with_sub_immediate"))
;; Iterator for all scalar floating point modes (SF, DF and TF)
(define_mode_iterator GPF_TF [SF DF TF])
-;; Integer vector modes.
+;; Integer Advanced SIMD modes.
(define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])
-;; vector and scalar, 64 & 128-bit container, all integer modes
+;; Advanced SIMD and scalar, 64 & 128-bit container, all integer modes.
(define_mode_iterator VSDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI QI HI SI DI])
-;; vector and scalar, 64 & 128-bit container: all vector integer modes;
-;; 64-bit scalar integer mode
+;; Advanced SIMD and scalar, 64 & 128-bit container: all Advanced SIMD
+;; integer modes; 64-bit scalar integer mode.
(define_mode_iterator VSDQ_I_DI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI DI])
;; Double vector modes.
(define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF])
-;; vector, 64-bit container, all integer modes
+;; Advanced SIMD, 64-bit container, all integer modes.
(define_mode_iterator VD_BHSI [V8QI V4HI V2SI])
;; 128 and 64-bit container; 8, 16, 32-bit vector integer modes
;; pointer-sized quantities. Exactly one of the two alternatives will match.
(define_mode_iterator PTR [(SI "ptr_mode == SImode") (DI "ptr_mode == DImode")])
-;; Vector Float modes suitable for moving, loading and storing.
+;; Advanced SIMD Float modes suitable for moving, loading and storing.
(define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
-;; Vector Float modes.
+;; Advanced SIMD Float modes.
(define_mode_iterator VDQF [V2SF V4SF V2DF])
(define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
(V8HF "TARGET_SIMD_F16INST")
V2SF V4SF V2DF])
-;; Vector Float modes, and DF.
+;; Advanced SIMD Float modes, and DF.
(define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
(V8HF "TARGET_SIMD_F16INST")
V2SF V4SF V2DF DF])
(HF "TARGET_SIMD_F16INST")
SF DF])
-;; Vector single Float modes.
+;; Advanced SIMD single Float modes.
(define_mode_iterator VDQSF [V2SF V4SF])
;; Quad vector Float modes with half/single elements.
;; Modes suitable to use as the return type of a vcond expression.
(define_mode_iterator VDQF_COND [V2SF V2SI V4SF V4SI V2DF V2DI])
-;; All Float modes.
+;; All scalar and Advanced SIMD Float modes.
(define_mode_iterator VALLF [V2SF V4SF V2DF SF DF])
-;; Vector Float modes with 2 elements.
+;; Advanced SIMD Float modes with 2 elements.
(define_mode_iterator V2F [V2SF V2DF])
-;; All vector modes on which we support any arithmetic operations.
+;; All Advanced SIMD modes on which we support any arithmetic operations.
(define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
-;; All vector modes suitable for moving, loading, and storing.
+;; All Advanced SIMD modes suitable for moving, loading, and storing.
(define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
V4HF V8HF V2SF V4SF V2DF])
(define_mode_iterator VALL_F16_NO_V2Q [V8QI V16QI V4HI V8HI V2SI V4SI
V4HF V8HF V2SF V4SF])
-;; All vector modes barring HF modes, plus DI.
+;; All Advanced SIMD modes barring HF modes, plus DI.
(define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF DI])
-;; All vector modes and DI.
+;; All Advanced SIMD modes and DI.
(define_mode_iterator VALLDI_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
V4HF V8HF V2SF V4SF V2DF DI])
-;; All vector modes, plus DI and DF.
+;; All Advanced SIMD modes, plus DI and DF.
(define_mode_iterator VALLDIF [V8QI V16QI V4HI V8HI V2SI V4SI
V2DI V4HF V8HF V2SF V4SF V2DF DI DF])
-;; Vector modes for Integer reduction across lanes.
+;; Advanced SIMD modes for Integer reduction across lanes.
(define_mode_iterator VDQV [V8QI V16QI V4HI V8HI V4SI V2DI])
-;; Vector modes(except V2DI) for Integer reduction across lanes.
+;; Advanced SIMD modes (except V2DI) for Integer reduction across lanes.
(define_mode_iterator VDQV_S [V8QI V16QI V4HI V8HI V4SI])
;; All double integer narrow-able modes.
;; All quad integer narrow-able modes.
(define_mode_iterator VQN [V8HI V4SI V2DI])
-;; Vector and scalar 128-bit container: narrowable 16, 32, 64-bit integer modes
+;; Advanced SIMD and scalar 128-bit container: narrowable 16, 32, 64-bit
+;; integer modes
(define_mode_iterator VSQN_HSDI [V8HI V4SI V2DI HI SI DI])
;; All quad integer widen-able modes.
;; Double vector modes for combines.
(define_mode_iterator VDC [V8QI V4HI V4HF V2SI V2SF DI DF])
-;; Vector modes except double int.
+;; Advanced SIMD modes except double int.
(define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
(define_mode_iterator VDQIF_F16 [V8QI V16QI V4HI V8HI V2SI V4SI
V4HF V8HF V2SF V4SF V2DF])
-;; Vector modes for S type.
+;; Advanced SIMD modes for S type.
(define_mode_iterator VDQ_SI [V2SI V4SI])
-;; Vector modes for S and D
+;; Advanced SIMD modes for S and D.
(define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
-;; Vector modes for H, S and D
+;; Advanced SIMD modes for H, S and D.
(define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
(V8HI "TARGET_SIMD_F16INST")
V2SI V4SI V2DI])
-;; Scalar and Vector modes for S and D
+;; Scalar and Advanced SIMD modes for S and D.
(define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
-;; Scalar and Vector modes for S and D, Vector modes for H.
+;; Scalar and Advanced SIMD modes for S and D, Advanced SIMD modes for H.
(define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
(V8HI "TARGET_SIMD_F16INST")
V2SI V4SI V2DI
(HI "TARGET_SIMD_F16INST")
SI DI])
-;; Vector modes for Q and H types.
+;; Advanced SIMD modes for Q and H types.
(define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
-;; Vector modes for H and S types.
+;; Advanced SIMD modes for H and S types.
(define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
-;; Vector modes for H, S and D types.
+;; Advanced SIMD modes for H, S and D types.
(define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
-;; Vector and scalar integer modes for H and S
+;; Advanced SIMD and scalar integer modes for H and S.
(define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
-;; Vector and scalar 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD and scalar 64-bit container: 16, 32-bit integer modes.
(define_mode_iterator VSD_HSI [V4HI V2SI HI SI])
-;; Vector 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD 64-bit container: 16, 32-bit integer modes.
(define_mode_iterator VD_HSI [V4HI V2SI])
;; Scalar 64-bit container: 16, 32-bit integer modes
(define_mode_iterator SD_HSI [HI SI])
-;; Vector 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD 64-bit container: 16, 32-bit integer modes.
(define_mode_iterator VQ_HSI [V8HI V4SI])
;; All byte modes.
(define_mode_iterator TX [TI TF])
-;; Opaque structure modes.
+;; Advanced SIMD opaque structure modes.
(define_mode_iterator VSTRUCT [OI CI XI])
;; Double scalar modes
(define_mode_iterator DX [DI DF])
-;; Modes available for <f>mul lane operations.
+;; Modes available for Advanced SIMD <f>mul lane operations.
(define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
(V4HF "TARGET_SIMD_F16INST")
(V8HF "TARGET_SIMD_F16INST")
V2SF V4SF V2DF])
-;; Modes available for <f>mul lane operations changing lane count.
+;; Modes available for Advanced SIMD <f>mul lane operations changing lane
+;; count.
(define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF])
+;; All SVE vector modes.
+(define_mode_iterator SVE_ALL [VNx16QI VNx8HI VNx4SI VNx2DI
+ VNx8HF VNx4SF VNx2DF])
+
+;; All SVE vector modes that have 8-bit or 16-bit elements.
+(define_mode_iterator SVE_BH [VNx16QI VNx8HI VNx8HF])
+
+;; All SVE vector modes that have 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator SVE_BHS [VNx16QI VNx8HI VNx4SI VNx8HF VNx4SF])
+
+;; All SVE integer vector modes that have 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator SVE_BHSI [VNx16QI VNx8HI VNx4SI])
+
+;; All SVE integer vector modes that have 16-bit, 32-bit or 64-bit elements.
+(define_mode_iterator SVE_HSDI [VNx8HI VNx4SI VNx2DI])
+
+;; All SVE floating-point vector modes that have 16-bit or 32-bit elements.
+(define_mode_iterator SVE_HSF [VNx8HF VNx4SF])
+
+;; All SVE vector modes that have 32-bit or 64-bit elements.
+(define_mode_iterator SVE_SD [VNx4SI VNx2DI VNx4SF VNx2DF])
+
+;; All SVE integer vector modes that have 32-bit or 64-bit elements.
+(define_mode_iterator SVE_SDI [VNx4SI VNx2DI])
+
+;; All SVE integer vector modes.
+(define_mode_iterator SVE_I [VNx16QI VNx8HI VNx4SI VNx2DI])
+
+;; All SVE floating-point vector modes.
+(define_mode_iterator SVE_F [VNx8HF VNx4SF VNx2DF])
+
+;; All SVE predicate modes.
+(define_mode_iterator PRED_ALL [VNx16BI VNx8BI VNx4BI VNx2BI])
+
+;; SVE predicate modes that control 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator PRED_BHS [VNx16BI VNx8BI VNx4BI])
+
;; ------------------------------------------------------------------
;; Unspec enumerations for Advance SIMD. These could well go into
;; aarch64.md but for their use in int_iterators here.
UNSPEC_FMLSL ; Used in aarch64-simd.md.
UNSPEC_FMLAL2 ; Used in aarch64-simd.md.
UNSPEC_FMLSL2 ; Used in aarch64-simd.md.
+ UNSPEC_SEL ; Used in aarch64-sve.md.
+ UNSPEC_ANDF ; Used in aarch64-sve.md.
+ UNSPEC_IORF ; Used in aarch64-sve.md.
+ UNSPEC_XORF ; Used in aarch64-sve.md.
+ UNSPEC_COND_LT ; Used in aarch64-sve.md.
+ UNSPEC_COND_LE ; Used in aarch64-sve.md.
+ UNSPEC_COND_EQ ; Used in aarch64-sve.md.
+ UNSPEC_COND_NE ; Used in aarch64-sve.md.
+ UNSPEC_COND_GE ; Used in aarch64-sve.md.
+ UNSPEC_COND_GT ; Used in aarch64-sve.md.
+ UNSPEC_COND_LO ; Used in aarch64-sve.md.
+ UNSPEC_COND_LS ; Used in aarch64-sve.md.
+ UNSPEC_COND_HS ; Used in aarch64-sve.md.
+ UNSPEC_COND_HI ; Used in aarch64-sve.md.
+ UNSPEC_COND_UO ; Used in aarch64-sve.md.
+ UNSPEC_LASTB ; Used in aarch64-sve.md.
])
;; ------------------------------------------------------------------
(HI "")])
;; Mode-to-individual element type mapping.
-(define_mode_attr Vetype [(V8QI "b") (V16QI "b")
- (V4HI "h") (V8HI "h")
- (V2SI "s") (V4SI "s")
- (V2DI "d") (V4HF "h")
- (V8HF "h") (V2SF "s")
- (V4SF "s") (V2DF "d")
+(define_mode_attr Vetype [(V8QI "b") (V16QI "b") (VNx16QI "b") (VNx16BI "b")
+ (V4HI "h") (V8HI "h") (VNx8HI "h") (VNx8BI "h")
+ (V2SI "s") (V4SI "s") (VNx4SI "s") (VNx4BI "s")
+ (V2DI "d") (VNx2DI "d") (VNx2BI "d")
+ (V4HF "h") (V8HF "h") (VNx8HF "h")
+ (V2SF "s") (V4SF "s") (VNx4SF "s")
+ (V2DF "d") (VNx2DF "d")
(HF "h")
(SF "s") (DF "d")
(QI "b") (HI "h")
(SI "s") (DI "d")])
+;; Equivalent of "size" for a vector element.
+(define_mode_attr Vesize [(VNx16QI "b")
+ (VNx8HI "h") (VNx8HF "h")
+ (VNx4SI "w") (VNx4SF "w")
+ (VNx2DI "d") (VNx2DF "d")])
+
;; Vetype is used everywhere in scheduling type and assembly output,
;; sometimes they are not the same, for example HF modes on some
;; instructions. stype is defined to represent scheduling type
(SI "8b")])
;; Define element mode for each vector mode.
-(define_mode_attr VEL [(V8QI "QI") (V16QI "QI")
- (V4HI "HI") (V8HI "HI")
- (V2SI "SI") (V4SI "SI")
- (DI "DI") (V2DI "DI")
- (V4HF "HF") (V8HF "HF")
- (V2SF "SF") (V4SF "SF")
- (V2DF "DF") (DF "DF")
- (SI "SI") (HI "HI")
+(define_mode_attr VEL [(V8QI "QI") (V16QI "QI") (VNx16QI "QI")
+ (V4HI "HI") (V8HI "HI") (VNx8HI "HI")
+ (V2SI "SI") (V4SI "SI") (VNx4SI "SI")
+ (DI "DI") (V2DI "DI") (VNx2DI "DI")
+ (V4HF "HF") (V8HF "HF") (VNx8HF "HF")
+ (V2SF "SF") (V4SF "SF") (VNx4SF "SF")
+ (DF "DF") (V2DF "DF") (VNx2DF "DF")
+ (SI "SI") (HI "HI")
(QI "QI")])
;; Define element mode for each vector mode (lower case).
-(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
- (V4HI "hi") (V8HI "hi")
- (V2SI "si") (V4SI "si")
- (DI "di") (V2DI "di")
- (V4HF "hf") (V8HF "hf")
- (V2SF "sf") (V4SF "sf")
- (V2DF "df") (DF "df")
+(define_mode_attr Vel [(V8QI "qi") (V16QI "qi") (VNx16QI "qi")
+ (V4HI "hi") (V8HI "hi") (VNx8HI "hi")
+ (V2SI "si") (V4SI "si") (VNx4SI "si")
+ (DI "di") (V2DI "di") (VNx2DI "di")
+ (V4HF "hf") (V8HF "hf") (VNx8HF "hf")
+ (V2SF "sf") (V4SF "sf") (VNx4SF "sf")
+ (V2DF "df") (DF "df") (VNx2DF "df")
(SI "si") (HI "hi")
(QI "qi")])
+;; Element mode with floating-point values replaced by like-sized integers.
+(define_mode_attr VEL_INT [(VNx16QI "QI")
+ (VNx8HI "HI") (VNx8HF "HI")
+ (VNx4SI "SI") (VNx4SF "SI")
+ (VNx2DI "DI") (VNx2DF "DI")])
+
+;; Gives the mode of the 128-bit lowpart of an SVE vector.
+(define_mode_attr V128 [(VNx16QI "V16QI")
+ (VNx8HI "V8HI") (VNx8HF "V8HF")
+ (VNx4SI "V4SI") (VNx4SF "V4SF")
+ (VNx2DI "V2DI") (VNx2DF "V2DF")])
+
+;; ...and again in lower case.
+(define_mode_attr v128 [(VNx16QI "v16qi")
+ (VNx8HI "v8hi") (VNx8HF "v8hf")
+ (VNx4SI "v4si") (VNx4SF "v4sf")
+ (VNx2DI "v2di") (VNx2DF "v2df")])
+
;; 64-bit container modes the inner or scalar source mode.
(define_mode_attr VCOND [(HI "V4HI") (SI "V2SI")
(V4HI "V4HI") (V8HI "V4HI")
(V2DI "4s")])
;; Widened modes of vector modes.
-(define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI")
- (V2SI "V2DI") (V16QI "V8HI")
- (V8HI "V4SI") (V4SI "V2DI")
- (HI "SI") (SI "DI")
- (V8HF "V4SF") (V4SF "V2DF")
- (V4HF "V4SF") (V2SF "V2DF")]
-)
+(define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI")
+ (V2SI "V2DI") (V16QI "V8HI")
+ (V8HI "V4SI") (V4SI "V2DI")
+ (HI "SI") (SI "DI")
+ (V8HF "V4SF") (V4SF "V2DF")
+ (V4HF "V4SF") (V2SF "V2DF")
+ (VNx8HF "VNx4SF") (VNx4SF "VNx2DF")
+ (VNx16QI "VNx8HI") (VNx8HI "VNx4SI")
+ (VNx4SI "VNx2DI")
+ (VNx16BI "VNx8BI") (VNx8BI "VNx4BI")
+ (VNx4BI "VNx2BI")])
+
+;; Predicate mode associated with VWIDE.
+(define_mode_attr VWIDE_PRED [(VNx8HF "VNx4BI") (VNx4SF "VNx2BI")])
;; Widened modes of vector modes, lowercase
-(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")])
+(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")
+ (VNx16QI "vnx8hi") (VNx8HI "vnx4si")
+ (VNx4SI "vnx2di")
+ (VNx8HF "vnx4sf") (VNx4SF "vnx2df")
+ (VNx16BI "vnx8bi") (VNx8BI "vnx4bi")
+ (VNx4BI "vnx2bi")])
;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF.
(define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
(V8HI "4s") (V4SI "2d")
(V8HF "4s") (V4SF "2d")])
+;; Widened element suffixes for SVE modes.
+(define_mode_attr Vewtype [(VNx16QI "h")
+ (VNx8HI "s") (VNx8HF "s")
+ (VNx4SI "d") (VNx4SF "d")])
+
;; Widened mode register suffixes for VDW/VQW.
(define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
(V2SI ".2d") (V16QI ".8h")
(V4SF "2s")])
;; Define corresponding core/FP element mode for each vector mode.
-(define_mode_attr vw [(V8QI "w") (V16QI "w")
- (V4HI "w") (V8HI "w")
- (V2SI "w") (V4SI "w")
- (DI "x") (V2DI "x")
- (V2SF "s") (V4SF "s")
- (V2DF "d")])
+(define_mode_attr vw [(V8QI "w") (V16QI "w") (VNx16QI "w")
+ (V4HI "w") (V8HI "w") (VNx8HI "w")
+ (V2SI "w") (V4SI "w") (VNx4SI "w")
+ (DI "x") (V2DI "x") (VNx2DI "x")
+ (VNx8HF "h")
+ (V2SF "s") (V4SF "s") (VNx4SF "s")
+ (V2DF "d") (VNx2DF "d")])
;; Corresponding core element mode for each vector mode. This is a
;; variation on <vw> mapping FP modes to GP regs.
-(define_mode_attr vwcore [(V8QI "w") (V16QI "w")
- (V4HI "w") (V8HI "w")
- (V2SI "w") (V4SI "w")
- (DI "x") (V2DI "x")
- (V4HF "w") (V8HF "w")
- (V2SF "w") (V4SF "w")
- (V2DF "x")])
+(define_mode_attr vwcore [(V8QI "w") (V16QI "w") (VNx16QI "w")
+ (V4HI "w") (V8HI "w") (VNx8HI "w")
+ (V2SI "w") (V4SI "w") (VNx4SI "w")
+ (DI "x") (V2DI "x") (VNx2DI "x")
+ (V4HF "w") (V8HF "w") (VNx8HF "w")
+ (V2SF "w") (V4SF "w") (VNx4SF "w")
+ (V2DF "x") (VNx2DF "x")])
;; Double vector types for ALLX.
(define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
(DI "DI") (V2DI "V2DI")
(V4HF "V4HI") (V8HF "V8HI")
(V2SF "V2SI") (V4SF "V4SI")
- (V2DF "V2DI") (DF "DI")
- (SF "SI") (HF "HI")])
+ (DF "DI") (V2DF "V2DI")
+ (SF "SI") (HF "HI")
+ (VNx16QI "VNx16QI")
+ (VNx8HI "VNx8HI") (VNx8HF "VNx8HI")
+ (VNx4SI "VNx4SI") (VNx4SF "VNx4SI")
+ (VNx2DI "VNx2DI") (VNx2DF "VNx2DI")
+])
;; Lower case mode with floating-point values replaced by like-sized integers.
(define_mode_attr v_int_equiv [(V8QI "v8qi") (V16QI "v16qi")
(DI "di") (V2DI "v2di")
(V4HF "v4hi") (V8HF "v8hi")
(V2SF "v2si") (V4SF "v4si")
- (V2DF "v2di") (DF "di")
- (SF "si")])
+ (DF "di") (V2DF "v2di")
+ (SF "si")
+ (VNx16QI "vnx16qi")
+ (VNx8HI "vnx8hi") (VNx8HF "vnx8hi")
+ (VNx4SI "vnx4si") (VNx4SF "vnx4si")
+ (VNx2DI "vnx2di") (VNx2DF "vnx2di")
+])
+
+;; Floating-point equivalent of selected modes.
+(define_mode_attr V_FP_EQUIV [(VNx4SI "VNx4SF") (VNx4SF "VNx4SF")
+ (VNx2DI "VNx2DF") (VNx2DF "VNx2DF")])
+(define_mode_attr v_fp_equiv [(VNx4SI "vnx4sf") (VNx4SF "vnx4sf")
+ (VNx2DI "vnx2df") (VNx2DF "vnx2df")])
;; Mode for vector conditional operations where the comparison has
;; different type from the lhs.
(define_code_attr f16mac [(plus "a") (minus "s")])
+;; The predicate mode associated with an SVE data mode.
+(define_mode_attr VPRED [(VNx16QI "VNx16BI")
+ (VNx8HI "VNx8BI") (VNx8HF "VNx8BI")
+ (VNx4SI "VNx4BI") (VNx4SF "VNx4BI")
+ (VNx2DI "VNx2BI") (VNx2DF "VNx2BI")])
+
+;; ...and again in lower case.
+(define_mode_attr vpred [(VNx16QI "vnx16bi")
+ (VNx8HI "vnx8bi") (VNx8HF "vnx8bi")
+ (VNx4SI "vnx4bi") (VNx4SF "vnx4bi")
+ (VNx2DI "vnx2bi") (VNx2DF "vnx2bi")])
+
;; -------------------------------------------------------------------
;; Code Iterators
;; -------------------------------------------------------------------
;; Code iterator for logical operations
(define_code_iterator LOGICAL [and ior xor])
+;; LOGICAL without AND.
+(define_code_iterator LOGICAL_OR [ior xor])
+
;; Code iterator for logical operations whose :nlogical works on SIMD registers.
(define_code_iterator NLOGICAL [and ior])
;; Unsigned comparison operators.
(define_code_iterator FAC_COMPARISONS [lt le ge gt])
+;; SVE integer unary operations.
+(define_code_iterator SVE_INT_UNARY [neg not popcount])
+
+;; SVE floating-point unary operations.
+(define_code_iterator SVE_FP_UNARY [neg abs sqrt])
+
;; -------------------------------------------------------------------
;; Code Attributes
;; -------------------------------------------------------------------
(unsigned_fix "fixuns")
(float "float")
(unsigned_float "floatuns")
+ (popcount "popcount")
(and "and")
(ior "ior")
(xor "xor")
(us_minus "qsub")
(ss_neg "qneg")
(ss_abs "qabs")
+ (smin "smin")
+ (smax "smax")
+ (umin "umin")
+ (umax "umax")
(eq "eq")
(ne "ne")
(lt "lt")
(ltu "ltu")
(leu "leu")
(geu "geu")
- (gtu "gtu")])
+ (gtu "gtu")
+ (abs "abs")
+ (sqrt "sqrt")])
;; For comparison operators we use the FCM* and CM* instructions.
;; As there are no CMLE or CMLT instructions which act on 3 vector
;; Operation names for negate and bitwise complement.
(define_code_attr neg_not_op [(neg "neg") (not "not")])
-;; Similar, but when not(op)
+;; Similar, but when the second operand is inverted.
(define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
+;; Similar, but when both operands are inverted.
+(define_code_attr logical_nn [(and "nor") (ior "nand")])
+
;; Sign- or zero-extending data-op
(define_code_attr su [(sign_extend "s") (zero_extend "u")
(sign_extract "s") (zero_extract "u")
(smax "s") (umax "u")
(smin "s") (umin "u")])
+;; Whether a shift is left or right.
+(define_code_attr lr [(ashift "l") (ashiftrt "r") (lshiftrt "r")])
+
;; Emit conditional branch instructions.
(define_code_attr bcond [(eq "beq") (ne "bne") (lt "bne") (ge "beq")])
;; Attribute to describe constants acceptable in atomic logical operations
(define_mode_attr lconst_atomic [(QI "K") (HI "K") (SI "K") (DI "L")])
+;; The integer SVE instruction that implements an rtx code.
+(define_code_attr sve_int_op [(plus "add")
+ (neg "neg")
+ (smin "smin")
+ (smax "smax")
+ (umin "umin")
+ (umax "umax")
+ (and "and")
+ (ior "orr")
+ (xor "eor")
+ (not "not")
+ (popcount "cnt")])
+
+;; The floating-point SVE instruction that implements an rtx code.
+(define_code_attr sve_fp_op [(plus "fadd")
+ (neg "fneg")
+ (abs "fabs")
+ (sqrt "fsqrt")])
+
;; -------------------------------------------------------------------
;; Int Iterators.
;; -------------------------------------------------------------------
(define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV
UNSPEC_FMAXNMV UNSPEC_FMINNMV])
+(define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF])
+
(define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD
UNSPEC_SRHADD UNSPEC_URHADD
UNSPEC_SHSUB UNSPEC_UHSUB
UNSPEC_TRN1 UNSPEC_TRN2
UNSPEC_UZP1 UNSPEC_UZP2])
+(define_int_iterator OPTAB_PERMUTE [UNSPEC_ZIP1 UNSPEC_ZIP2
+ UNSPEC_UZP1 UNSPEC_UZP2])
+
(define_int_iterator REVERSE [UNSPEC_REV64 UNSPEC_REV32 UNSPEC_REV16])
(define_int_iterator FRINT [UNSPEC_FRINTZ UNSPEC_FRINTP UNSPEC_FRINTM
(define_int_iterator VFMLA16_HIGH [UNSPEC_FMLAL2 UNSPEC_FMLSL2])
+(define_int_iterator UNPACK [UNSPEC_UNPACKSHI UNSPEC_UNPACKUHI
+ UNSPEC_UNPACKSLO UNSPEC_UNPACKULO])
+
+(define_int_iterator UNPACK_UNSIGNED [UNSPEC_UNPACKULO UNSPEC_UNPACKUHI])
+
+(define_int_iterator SVE_COND_INT_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
+ UNSPEC_COND_EQ UNSPEC_COND_NE
+ UNSPEC_COND_GE UNSPEC_COND_GT
+ UNSPEC_COND_LO UNSPEC_COND_LS
+ UNSPEC_COND_HS UNSPEC_COND_HI])
+
+(define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
+ UNSPEC_COND_EQ UNSPEC_COND_NE
+ UNSPEC_COND_GE UNSPEC_COND_GT])
+
;; Iterators for atomic operations.
(define_int_iterator ATOMIC_LDOP
;; -------------------------------------------------------------------
;; Int Iterators Attributes.
;; -------------------------------------------------------------------
+
+;; The optab associated with an operation. Note that for ANDF, IORF
+;; and XORF, the optab pattern is not actually defined; we just use this
+;; name for consistency with the integer patterns.
+(define_int_attr optab [(UNSPEC_ANDF "and")
+ (UNSPEC_IORF "ior")
+ (UNSPEC_XORF "xor")])
+
(define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax")
(UNSPEC_UMINV "umin")
(UNSPEC_SMAXV "smax")
(UNSPEC_FMAXNM "fmaxnm")
(UNSPEC_FMINNM "fminnm")])
+;; The SVE logical instruction that implements an unspec.
+(define_int_attr logicalf_op [(UNSPEC_ANDF "and")
+ (UNSPEC_IORF "orr")
+ (UNSPEC_XORF "eor")])
+
+;; "s" for signed operations and "u" for unsigned ones.
+(define_int_attr su [(UNSPEC_UNPACKSHI "s")
+ (UNSPEC_UNPACKUHI "u")
+ (UNSPEC_UNPACKSLO "s")
+ (UNSPEC_UNPACKULO "u")])
+
(define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
(UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
(UNSPEC_SHSUB "s") (UNSPEC_UHSUB "u")
(define_int_attr perm_hilo [(UNSPEC_ZIP1 "1") (UNSPEC_ZIP2 "2")
(UNSPEC_TRN1 "1") (UNSPEC_TRN2 "2")
- (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")])
+ (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")
+ (UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")
+ (UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")])
(define_int_attr frecp_suffix [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")])
(define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
(UNSPEC_FMLAL2 "a") (UNSPEC_FMLSL2 "s")])
+
+;; The condition associated with an UNSPEC_COND_<xx>.
+(define_int_attr cmp_op [(UNSPEC_COND_LT "lt")
+ (UNSPEC_COND_LE "le")
+ (UNSPEC_COND_EQ "eq")
+ (UNSPEC_COND_NE "ne")
+ (UNSPEC_COND_GE "ge")
+ (UNSPEC_COND_GT "gt")
+ (UNSPEC_COND_LO "lo")
+ (UNSPEC_COND_LS "ls")
+ (UNSPEC_COND_HS "hs")
+ (UNSPEC_COND_HI "hi")])
+
+;; The constraint to use for an UNSPEC_COND_<xx>.
+(define_int_attr imm_con [(UNSPEC_COND_EQ "vsc")
+ (UNSPEC_COND_NE "vsc")
+ (UNSPEC_COND_LT "vsc")
+ (UNSPEC_COND_GE "vsc")
+ (UNSPEC_COND_LE "vsc")
+ (UNSPEC_COND_GT "vsc")
+ (UNSPEC_COND_LO "vsd")
+ (UNSPEC_COND_LS "vsd")
+ (UNSPEC_COND_HS "vsd")
+ (UNSPEC_COND_HI "vsd")])
(define_predicate "aarch64_fp_vec_pow2"
(match_test "aarch64_vec_fpconst_pow_of_2 (op) > 0"))
+(define_predicate "aarch64_sve_cnt_immediate"
+ (and (match_code "const_poly_int")
+ (match_test "aarch64_sve_cnt_immediate_p (op)")))
+
(define_predicate "aarch64_sub_immediate"
(and (match_code "const_int")
(match_test "aarch64_uimm12_shift (-INTVAL (op))")))
(and (match_operand 0 "aarch64_pluslong_immediate")
(not (match_operand 0 "aarch64_plus_immediate"))))
+(define_predicate "aarch64_sve_addvl_addpl_immediate"
+ (and (match_code "const_poly_int")
+ (match_test "aarch64_sve_addvl_addpl_immediate_p (op)")))
+
+(define_predicate "aarch64_split_add_offset_immediate"
+ (and (match_code "const_poly_int")
+ (match_test "aarch64_add_offset_temporaries (op) == 1")))
+
(define_predicate "aarch64_pluslong_operand"
(ior (match_operand 0 "register_operand")
- (match_operand 0 "aarch64_pluslong_immediate")))
+ (match_operand 0 "aarch64_pluslong_immediate")
+ (match_operand 0 "aarch64_sve_addvl_addpl_immediate")))
+
+(define_predicate "aarch64_pluslong_or_poly_operand"
+ (ior (match_operand 0 "aarch64_pluslong_operand")
+ (match_operand 0 "aarch64_split_add_offset_immediate")))
(define_predicate "aarch64_logical_immediate"
(and (match_code "const_int")
})
(define_predicate "aarch64_mov_operand"
- (and (match_code "reg,subreg,mem,const,const_int,symbol_ref,label_ref,high")
+ (and (match_code "reg,subreg,mem,const,const_int,symbol_ref,label_ref,high,
+ const_poly_int,const_vector")
(ior (match_operand 0 "register_operand")
(ior (match_operand 0 "memory_operand")
(match_test "aarch64_mov_operand_p (op, mode)")))))
+(define_predicate "aarch64_nonmemory_operand"
+ (and (match_code "reg,subreg,const,const_int,symbol_ref,label_ref,high,
+ const_poly_int,const_vector")
+ (ior (match_operand 0 "register_operand")
+ (match_test "aarch64_mov_operand_p (op, mode)"))))
+
(define_predicate "aarch64_movti_operand"
(and (match_code "reg,subreg,mem,const_int")
(ior (match_operand 0 "register_operand")
return aarch64_get_condition_code (op) >= 0;
})
+(define_special_predicate "aarch64_equality_operator"
+ (match_code "eq,ne"))
+
(define_special_predicate "aarch64_carry_operation"
(match_code "ne,geu")
{
})
(define_special_predicate "aarch64_simd_lshift_imm"
- (match_code "const_vector")
+ (match_code "const,const_vector")
{
return aarch64_simd_shift_imm_p (op, mode, true);
})
(define_special_predicate "aarch64_simd_rshift_imm"
- (match_code "const_vector")
+ (match_code "const,const_vector")
{
return aarch64_simd_shift_imm_p (op, mode, false);
})
+(define_predicate "aarch64_simd_imm_zero"
+ (and (match_code "const,const_vector")
+ (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "aarch64_simd_or_scalar_imm_zero"
+ (and (match_code "const_int,const_double,const,const_vector")
+ (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "aarch64_simd_imm_minus_one"
+ (and (match_code "const,const_vector")
+ (match_test "op == CONSTM1_RTX (GET_MODE (op))")))
+
(define_predicate "aarch64_simd_reg_or_zero"
- (and (match_code "reg,subreg,const_int,const_double,const_vector")
+ (and (match_code "reg,subreg,const_int,const_double,const,const_vector")
(ior (match_operand 0 "register_operand")
- (ior (match_test "op == const0_rtx")
- (match_test "aarch64_simd_imm_zero_p (op, mode)")))))
+ (match_test "op == const0_rtx")
+ (match_operand 0 "aarch64_simd_imm_zero"))))
(define_predicate "aarch64_simd_struct_operand"
(and (match_code "mem")
|| GET_CODE (XEXP (op, 0)) == POST_INC
|| GET_CODE (XEXP (op, 0)) == REG")))
-(define_special_predicate "aarch64_simd_imm_zero"
- (match_code "const_vector")
-{
- return aarch64_simd_imm_zero_p (op, mode);
-})
-
-(define_special_predicate "aarch64_simd_or_scalar_imm_zero"
- (match_test "aarch64_simd_imm_zero_p (op, mode)"))
-
-(define_special_predicate "aarch64_simd_imm_minus_one"
- (match_code "const_vector")
-{
- return aarch64_const_vec_all_same_int_p (op, -1);
-})
-
;; Predicates used by the various SIMD shift operations. These
;; fall in to 3 categories.
;; Shifts with a range 0-(bit_size - 1) (aarch64_simd_shift_imm)
(define_predicate "aarch64_constant_pool_symref"
(and (match_code "symbol_ref")
(match_test "CONSTANT_POOL_ADDRESS_P (op)")))
+
+(define_predicate "aarch64_constant_vector_operand"
+ (match_code "const,const_vector"))
+
+(define_predicate "aarch64_sve_ld1r_operand"
+ (and (match_operand 0 "memory_operand")
+ (match_test "aarch64_sve_ld1r_operand_p (op)")))
+
+;; Like memory_operand, but restricted to addresses that are valid for
+;; SVE LDR and STR instructions.
+(define_predicate "aarch64_sve_ldr_operand"
+ (and (match_code "mem")
+ (match_test "aarch64_sve_ldr_operand_p (op)")))
+
+(define_predicate "aarch64_sve_nonimmediate_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_ldr_operand")))
+
+(define_predicate "aarch64_sve_general_operand"
+ (and (match_code "reg,subreg,mem,const,const_vector")
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_ldr_operand")
+ (match_test "aarch64_mov_operand_p (op, mode)"))))
+
+;; Doesn't include immediates, since those are handled by the move
+;; patterns instead.
+(define_predicate "aarch64_sve_dup_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_ld1r_operand")))
+
+(define_predicate "aarch64_sve_arith_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_arith_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_sub_arith_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_arith_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_inc_dec_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_inc_dec_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_logical_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_bitmask_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_mul_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_const_vec_all_same_in_range_p (op, -128, 127)")))
+
+(define_predicate "aarch64_sve_dup_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_dup_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_cmp_vsc_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_cmp_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_cmp_vsd_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_cmp_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_index_immediate"
+ (and (match_code "const_int")
+ (match_test "aarch64_sve_index_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_float_arith_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_float_arith_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_float_arith_with_sub_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_float_arith_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_float_mul_immediate"
+ (and (match_code "const,const_vector")
+ (match_test "aarch64_sve_float_mul_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_arith_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_arith_immediate")))
+
+(define_predicate "aarch64_sve_add_operand"
+ (ior (match_operand 0 "aarch64_sve_arith_operand")
+ (match_operand 0 "aarch64_sve_sub_arith_immediate")
+ (match_operand 0 "aarch64_sve_inc_dec_immediate")))
+
+(define_predicate "aarch64_sve_logical_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_logical_immediate")))
+
+(define_predicate "aarch64_sve_lshift_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_simd_lshift_imm")))
+
+(define_predicate "aarch64_sve_rshift_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_simd_rshift_imm")))
+
+(define_predicate "aarch64_sve_mul_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_mul_immediate")))
+
+(define_predicate "aarch64_sve_cmp_vsc_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_cmp_vsc_immediate")))
+
+(define_predicate "aarch64_sve_cmp_vsd_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_cmp_vsd_immediate")))
+
+(define_predicate "aarch64_sve_index_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_index_immediate")))
+
+(define_predicate "aarch64_sve_float_arith_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_float_arith_immediate")))
+
+(define_predicate "aarch64_sve_float_arith_with_sub_operand"
+ (ior (match_operand 0 "aarch64_sve_float_arith_operand")
+ (match_operand 0 "aarch64_sve_float_arith_with_sub_immediate")))
+
+(define_predicate "aarch64_sve_float_mul_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_sve_float_mul_immediate")))
+
+(define_predicate "aarch64_sve_vec_perm_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_constant_vector_operand")))
functions, and @samp{all}, which enables pointer signing for all functions. The
default value is @samp{none}.
+@item -msve-vector-bits=@var{bits}
+@opindex msve-vector-bits
+Specify the number of bits in an SVE vector register. This option only has
+an effect when SVE is enabled.
+
+GCC supports two forms of SVE code generation: ``vector-length
+agnostic'' output that works with any size of vector register and
+``vector-length specific'' output that only works when the vector
+registers are a particular size. Replacing @var{bits} with
+@samp{scalable} selects vector-length agnostic output while
+replacing it with a number selects vector-length specific output.
+The possible lengths in the latter case are 128, 256, 512, 1024
+and 2048. @samp{scalable} is the default.
+
+At present, @samp{-msve-vector-bits=128} produces the same output
+as @samp{-msve-vector-bits=scalable}.
+
@end table
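
To make the two forms of output concrete, here is a minimal sketch
(the file name and exact flags are illustrative and assume a toolchain
with this series applied):

    /* saxpy.c: a simple loop that the vectoriser can turn into SVE code.  */
    void
    saxpy (float *restrict dst, const float *restrict src, float a, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] += a * src[i];
    }

    /* Vector-length agnostic output (the default):
         gcc -O3 -march=armv8-a+sve -msve-vector-bits=scalable -S saxpy.c
       Vector-length specific output, assuming 256-bit vector registers:
         gcc -O3 -march=armv8-a+sve -msve-vector-bits=256 -S saxpy.c  */

With a numeric value the vectoriser can treat the vector length as a
compile-time constant; with scalable output it has to keep the length
symbolic.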
@subsubsection @option{-march} and @option{-mcpu} Feature Modifiers
Enable Advanced SIMD instructions. This also enables floating-point
instructions. This is on by default for all possible values for options
@option{-march} and @option{-mcpu}.
+@item sve
+Enable Scalable Vector Extension instructions. This also enables Advanced
+SIMD and floating-point instructions.
@item lse
Enable Large System Extension instructions. This is on by default for
@option{-march=armv8.1-a}.
The stack pointer register (@code{SP})
@item w
-Floating point or SIMD vector register
+Floating point register, Advanced SIMD vector register or SVE vector register
+
+@item Upl
+One of the low eight SVE predicate registers (@code{P0} to @code{P7})
+
+@item Upa
+Any of the SVE predicate registers (@code{P0} to @code{P15})
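
As a hypothetical illustration of the "w" constraint in inline asm
(it only exercises the pre-existing scalar floating-point case, since
this patch adds no C-level SVE vector or predicate types; "Upl" and
"Upa" are intended for operands of the new aarch64-sve.md patterns):

    #include <stdio.h>

    /* "w" asks for any FP/SIMD register; with SVE enabled the same
       constraint also covers the SVE vector registers z0-z31.  */
    static float
    fneg_via_w (float x)
    {
      float result;
      __asm__ ("fneg %s0, %s1" : "=w" (result) : "w" (x));
      return result;
    }

    int
    main (void)
    {
      printf ("%f\n", (double) fneg_via_w (1.5f));
      return 0;
    }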
@item I
Integer constant that is valid as an immediate operand in an @code{ADD}