[AArch64] Add SVE support

author Richard Sandiford <richard.sandiford@linaro.org>

Sat, 13 Jan 2018 17:50:35 +0000 (17:50 +0000)

committer Richard Sandiford <rsandifo@gcc.gnu.org>

Sat, 13 Jan 2018 17:50:35 +0000 (17:50 +0000)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog

index 3f1919e774f78e8c82ee32af6176bbca2dd5f70d..40da1eb477aecbe47c5fe94ab388ae0114e565da 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,295 @@
+2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
+           Alan Hayward  <alan.hayward@arm.com>
+           David Sherwood  <david.sherwood@arm.com>
+
+       * doc/invoke.texi (-msve-vector-bits=): Document new option.
+       (sve): Document new AArch64 extension.
+       * doc/md.texi (w): Extend the description of the AArch64
+       constraint to include SVE vectors.
+       (Upl, Upa): Document new AArch64 predicate constraints.
+       * config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum): New
+       enum.
+       * config/aarch64/aarch64.opt (sve_vector_bits): New enum.
+       (msve-vector-bits=): New option.
+       * config/aarch64/aarch64-option-extensions.def (fp, simd): Disable
+       SVE when these are disabled.
+       (sve): New extension.
+       * config/aarch64/aarch64-modes.def: Define SVE vector and predicate
+       modes.  Adjust their number of units based on aarch64_sve_vg.
+       (MAX_BITSIZE_MODE_ANY_MODE): Define.
+       * config/aarch64/aarch64-protos.h (ADDR_QUERY_ANY): New
+       aarch64_addr_query_type.
+       (aarch64_const_vec_all_same_in_range_p, aarch64_sve_pred_mode)
+       (aarch64_sve_cnt_immediate_p, aarch64_sve_addvl_addpl_immediate_p)
+       (aarch64_sve_inc_dec_immediate_p, aarch64_add_offset_temporaries)
+       (aarch64_split_add_offset, aarch64_output_sve_cnt_immediate)
+       (aarch64_output_sve_addvl_addpl, aarch64_output_sve_inc_dec_immediate)
+       (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): Declare.
+       (aarch64_simd_imm_zero_p): Delete.
+       (aarch64_check_zero_based_sve_index_immediate): Declare.
+       (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
+       (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
+       (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
+       (aarch64_sve_float_mul_immediate_p): Likewise.
+       (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
+       rather than an rtx.
+       (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): Declare.
+       (aarch64_expand_mov_immediate): Take a gen_vec_duplicate callback.
+       (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move): Declare.
+       (aarch64_expand_sve_vec_cmp_int, aarch64_expand_sve_vec_cmp_float)
+       (aarch64_expand_sve_vcond, aarch64_expand_sve_vec_perm): Declare.
+       (aarch64_regmode_natural_size): Likewise.
+       * config/aarch64/aarch64.h (AARCH64_FL_SVE): New macro.
+       (AARCH64_FL_V8_3, AARCH64_FL_RCPC, AARCH64_FL_DOTPROD): Shift
+       left one place.
+       (AARCH64_ISA_SVE, TARGET_SVE): New macros.
+       (FIXED_REGISTERS, CALL_USED_REGISTERS, REGISTER_NAMES): Add entries
+       for VG and the SVE predicate registers.
+       (V_ALIASES): Add a "z"-prefixed alias.
+       (FIRST_PSEUDO_REGISTER): Change to P15_REGNUM + 1.
+       (AARCH64_DWARF_VG, AARCH64_DWARF_P0): New macros.
+       (PR_REGNUM_P, PR_LO_REGNUM_P): Likewise.
+       (PR_LO_REGS, PR_HI_REGS, PR_REGS): New reg_classes.
+       (REG_CLASS_NAMES): Add entries for them.
+       (REG_CLASS_CONTENTS): Likewise.  Update ALL_REGS to include VG
+       and the predicate registers.
+       (aarch64_sve_vg): Declare.
+       (BITS_PER_SVE_VECTOR, BYTES_PER_SVE_VECTOR, BYTES_PER_SVE_PRED)
+       (SVE_BYTE_MODE, MAX_COMPILE_TIME_VEC_BYTES): New macros.
+       (REGMODE_NATURAL_SIZE): Define.
+       * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Handle
+       SVE macros.
+       * config/aarch64/aarch64.c: Include cfgrtl.h.
+       (simd_immediate_info): Add a constructor for series vectors,
+       and an associated step field.
+       (aarch64_sve_vg): New variable.
+       (aarch64_dbx_register_number): Handle VG and the predicate registers.
+       (aarch64_vect_struct_mode_p, aarch64_vector_mode_p): Delete.
+       (VEC_ADVSIMD, VEC_SVE_DATA, VEC_SVE_PRED, VEC_STRUCT, VEC_ANY_SVE)
+       (VEC_ANY_DATA): New constants.
+       (aarch64_advsimd_struct_mode_p, aarch64_sve_pred_mode_p)
+       (aarch64_classify_vector_mode, aarch64_vector_data_mode_p)
+       (aarch64_sve_data_mode_p, aarch64_sve_pred_mode)
+       (aarch64_get_mask_mode): New functions.
+       (aarch64_hard_regno_nregs): Handle SVE data modes for FP_REGS
+       and FP_LO_REGS.  Handle PR_REGS, PR_LO_REGS and PR_HI_REGS.
+       (aarch64_hard_regno_mode_ok): Handle VG.  Also handle the SVE
+       predicate modes and predicate registers.  Explicitly restrict
+       GPRs to modes of 16 bytes or smaller.  Only allow FP registers
+       to store a vector mode if it is recognized by
+       aarch64_classify_vector_mode.
+       (aarch64_regmode_natural_size): New function.
+       (aarch64_hard_regno_caller_save_mode): Return the original mode
+       for predicates.
+       (aarch64_sve_cnt_immediate_p, aarch64_output_sve_cnt_immediate)
+       (aarch64_sve_addvl_addpl_immediate_p, aarch64_output_sve_addvl_addpl)
+       (aarch64_sve_inc_dec_immediate_p, aarch64_output_sve_inc_dec_immediate)
+       (aarch64_add_offset_1_temporaries, aarch64_offset_temporaries): New
+       functions.
+       (aarch64_add_offset): Add a temp2 parameter.  Assert that temp1
+       does not overlap dest if the function is frame-related.  Handle
+       SVE constants.
+       (aarch64_split_add_offset): New function.
+       (aarch64_add_sp, aarch64_sub_sp): Add temp2 parameters and pass
+       them to aarch64_add_offset.
+       (aarch64_allocate_and_probe_stack_space): Add a temp2 parameter
+       and update call to aarch64_sub_sp.
+       (aarch64_add_cfa_expression): New function.
+       (aarch64_expand_prologue): Pass extra temporary registers to the
+       functions above.  Handle the case in which we need to emit new
+       DW_CFA_expressions for registers that were originally saved
+       relative to the stack pointer, but now have to be expressed
+       relative to the frame pointer.
+       (aarch64_output_mi_thunk): Pass extra temporary registers to the
+       functions above.
+       (aarch64_expand_epilogue): Likewise.  Prevent inheritance of
+       IP0 and IP1 values for SVE frames.
+       (aarch64_expand_vec_series): New function.
+       (aarch64_expand_sve_widened_duplicate): Likewise.
+       (aarch64_expand_sve_const_vector): Likewise.
+       (aarch64_expand_mov_immediate): Add a gen_vec_duplicate parameter.
+       Handle SVE constants.  Use emit_move_insn to move a force_const_mem
+       into the register, rather than emitting a SET directly.
+       (aarch64_emit_sve_pred_move, aarch64_expand_sve_mem_move)
+       (aarch64_get_reg_raw_mode, offset_4bit_signed_scaled_p)
+       (offset_6bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p)
+       (offset_9bit_signed_scaled_p): New functions.
+       (aarch64_replicate_bitmask_imm): New function.
+       (aarch64_bitmask_imm): Use it.
+       (aarch64_cannot_force_const_mem): Reject expressions involving
+       a CONST_POLY_INT.  Update call to aarch64_classify_symbol.
+       (aarch64_classify_index): Handle SVE indices, by requiring
+       a plain register index with a scale that matches the element size.
+       (aarch64_classify_address): Handle SVE addresses.  Assert that
+       the mode of the address is VOIDmode or an integer mode.
+       Update call to aarch64_classify_symbol.
+       (aarch64_classify_symbolic_expression): Update call to
+       aarch64_classify_symbol.
+       (aarch64_const_vec_all_in_range_p): New function.
+       (aarch64_print_vector_float_operand): Likewise.
+       (aarch64_print_operand): Handle 'N' and 'C'.  Use "zN" rather than
+       "vN" for FP registers with SVE modes.  Handle (const ...) vectors
+       and the FP immediates 1.0 and 0.5.
+       (aarch64_print_address_internal): Handle SVE addresses.
+       (aarch64_print_operand_address): Use ADDR_QUERY_ANY.
+       (aarch64_regno_regclass): Handle predicate registers.
+       (aarch64_secondary_reload): Handle big-endian reloads of SVE
+       data modes.
+       (aarch64_class_max_nregs): Handle SVE modes and predicate registers.
+       (aarch64_rtx_costs): Check for ADDVL and ADDPL instructions.
+       (aarch64_convert_sve_vector_bits): New function.
+       (aarch64_override_options): Use it to handle -msve-vector-bits=.
+       (aarch64_classify_symbol): Take the offset as a HOST_WIDE_INT
+       rather than an rtx.
+       (aarch64_legitimate_constant_p): Use aarch64_classify_vector_mode.
+       Handle SVE vector and predicate modes.  Accept VL-based constants
+       that need only one temporary register, and VL offsets that require
+       no temporary registers.
+       (aarch64_conditional_register_usage): Mark the predicate registers
+       as fixed if SVE isn't available.
+       (aarch64_vector_mode_supported_p): Use aarch64_classify_vector_mode.
+       Return true for SVE vector and predicate modes.
+       (aarch64_simd_container_mode): Take the number of bits as a poly_int64
+       rather than an unsigned int.  Handle SVE modes.
+       (aarch64_preferred_simd_mode): Update call accordingly.  Handle
+       SVE modes.
+       (aarch64_autovectorize_vector_sizes): Add BYTES_PER_SVE_VECTOR
+       if SVE is enabled.
+       (aarch64_sve_index_immediate_p, aarch64_sve_arith_immediate_p)
+       (aarch64_sve_bitmask_immediate_p, aarch64_sve_dup_immediate_p)
+       (aarch64_sve_cmp_immediate_p, aarch64_sve_float_arith_immediate_p)
+       (aarch64_sve_float_mul_immediate_p): New functions.
+       (aarch64_sve_valid_immediate): New function.
+       (aarch64_simd_valid_immediate): Use it as the fallback for SVE vectors.
+       Explicitly reject structure modes.  Check for INDEX constants.
+       Handle PTRUE and PFALSE constants.
+       (aarch64_check_zero_based_sve_index_immediate): New function.
+       (aarch64_simd_imm_zero_p): Delete.
+       (aarch64_mov_operand_p): Use aarch64_simd_valid_immediate for
+       vector modes.  Accept constants in the range of CNT[BHWD].
+       (aarch64_simd_scalar_immediate_valid_for_move): Explicitly
+       ask for an Advanced SIMD mode.
+       (aarch64_sve_ld1r_operand_p, aarch64_sve_ldr_operand_p): New functions.
+       (aarch64_simd_vector_alignment): Handle SVE predicates.
+       (aarch64_vectorize_preferred_vector_alignment): New function.
+       (aarch64_simd_vector_alignment_reachable): Use it instead of
+       the vector size.
+       (aarch64_shift_truncation_mask): Use aarch64_vector_data_mode_p.
+       (aarch64_output_sve_mov_immediate, aarch64_output_ptrue): New
+       functions.
+       (MAX_VECT_LEN): Delete.
+       (expand_vec_perm_d): Add a vec_flags field.
+       (emit_unspec2, aarch64_expand_sve_vec_perm): New functions.
+       (aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
+       (aarch64_evpc_ext): Don't apply a big-endian lane correction
+       for SVE modes.
+       (aarch64_evpc_rev): Rename to...
+       (aarch64_evpc_rev_local): ...this.  Use a predicated operation for SVE.
+       (aarch64_evpc_rev_global): New function.
+       (aarch64_evpc_dup): Enforce a 64-byte range for SVE DUP.
+       (aarch64_evpc_tbl): Use MAX_COMPILE_TIME_VEC_BYTES instead of
+       MAX_VECT_LEN.
+       (aarch64_evpc_sve_tbl): New function.
+       (aarch64_expand_vec_perm_const_1): Update after rename of
+       aarch64_evpc_rev.  Handle SVE permutes too, trying
+       aarch64_evpc_rev_global and using aarch64_evpc_sve_tbl rather
+       than aarch64_evpc_tbl.
+       (aarch64_vectorize_vec_perm_const): Initialize vec_flags.
+       (aarch64_sve_cmp_operand_p, aarch64_unspec_cond_code)
+       (aarch64_gen_unspec_cond, aarch64_expand_sve_vec_cmp_int)
+       (aarch64_emit_unspec_cond, aarch64_emit_unspec_cond_or)
+       (aarch64_emit_inverted_unspec_cond, aarch64_expand_sve_vec_cmp_float)
+       (aarch64_expand_sve_vcond): New functions.
+       (aarch64_modes_tieable_p): Use aarch64_vector_data_mode_p instead
+       of aarch64_vector_mode_p.
+       (aarch64_dwarf_poly_indeterminate_value): New function.
+       (aarch64_compute_pressure_classes): Likewise.
+       (aarch64_can_change_mode_class): Likewise.
+       (TARGET_GET_RAW_RESULT_MODE, TARGET_GET_RAW_ARG_MODE): Redefine.
+       (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): Likewise.
+       (TARGET_VECTORIZE_GET_MASK_MODE): Likewise.
+       (TARGET_DWARF_POLY_INDETERMINATE_VALUE): Likewise.
+       (TARGET_COMPUTE_PRESSURE_CLASSES): Likewise.
+       (TARGET_CAN_CHANGE_MODE_CLASS): Likewise.
+       * config/aarch64/constraints.md (Upa, Upl, Uav, Uat, Usv, Usi, Utr)
+       (Uty, Dm, vsa, vsc, vsd, vsi, vsn, vsl, vsm, vsA, vsM, vsN): New
+       constraints.
+       (Dn, Dl, Dr): Accept const as well as const_vector.
+       (Dz): Likewise.  Compare against CONST0_RTX.
+       * config/aarch64/iterators.md: Refer to "Advanced SIMD" instead
+       of "vector" where appropriate.
+       (SVE_ALL, SVE_BH, SVE_BHS, SVE_BHSI, SVE_HSDI, SVE_HSF, SVE_SD)
+       (SVE_SDI, SVE_I, SVE_F, PRED_ALL, PRED_BHS): New mode iterators.
+       (UNSPEC_SEL, UNSPEC_ANDF, UNSPEC_IORF, UNSPEC_XORF, UNSPEC_COND_LT)
+       (UNSPEC_COND_LE, UNSPEC_COND_EQ, UNSPEC_COND_NE, UNSPEC_COND_GE)
+       (UNSPEC_COND_GT, UNSPEC_COND_LO, UNSPEC_COND_LS, UNSPEC_COND_HS)
+       (UNSPEC_COND_HI, UNSPEC_COND_UO): New unspecs.
+       (Vetype, VEL, Vel, VWIDE, Vwide, vw, vwcore, V_INT_EQUIV)
+       (v_int_equiv): Extend to SVE modes.
+       (Vesize, V128, v128, Vewtype, V_FP_EQUIV, v_fp_equiv, VPRED): New
+       mode attributes.
+       (LOGICAL_OR, SVE_INT_UNARY, SVE_FP_UNARY): New code iterators.
+       (optab): Handle popcount, smin, smax, umin, umax, abs and sqrt.
+       (logical_nn, lr, sve_int_op, sve_fp_op): New code attributes.
+       (LOGICALF, OPTAB_PERMUTE, UNPACK, UNPACK_UNSIGNED, SVE_COND_INT_CMP)
+       (SVE_COND_FP_CMP): New int iterators.
+       (perm_hilo): Handle the new unpack unspecs.
+       (optab, logicalf_op, su, perm_optab, cmp_op, imm_con): New int
+       attributes.
+       * config/aarch64/predicates.md (aarch64_sve_cnt_immediate)
+       (aarch64_sve_addvl_addpl_immediate, aarch64_split_add_offset_immediate)
+       (aarch64_pluslong_or_poly_operand, aarch64_nonmemory_operand)
+       (aarch64_equality_operator, aarch64_constant_vector_operand)
+       (aarch64_sve_ld1r_operand, aarch64_sve_ldr_operand): New predicates.
+       (aarch64_sve_nonimmediate_operand): Likewise.
+       (aarch64_sve_general_operand): Likewise.
+       (aarch64_sve_dup_operand, aarch64_sve_arith_immediate): Likewise.
+       (aarch64_sve_sub_arith_immediate, aarch64_sve_inc_dec_immediate)
+       (aarch64_sve_logical_immediate, aarch64_sve_mul_immediate): Likewise.
+       (aarch64_sve_dup_immediate, aarch64_sve_cmp_vsc_immediate): Likewise.
+       (aarch64_sve_cmp_vsd_immediate, aarch64_sve_index_immediate): Likewise.
+       (aarch64_sve_float_arith_immediate): Likewise.
+       (aarch64_sve_float_arith_with_sub_immediate): Likewise.
+       (aarch64_sve_float_mul_immediate, aarch64_sve_arith_operand): Likewise.
+       (aarch64_sve_add_operand, aarch64_sve_logical_operand): Likewise.
+       (aarch64_sve_lshift_operand, aarch64_sve_rshift_operand): Likewise.
+       (aarch64_sve_mul_operand, aarch64_sve_cmp_vsc_operand): Likewise.
+       (aarch64_sve_cmp_vsd_operand, aarch64_sve_index_operand): Likewise.
+       (aarch64_sve_float_arith_operand): Likewise.
+       (aarch64_sve_float_arith_with_sub_operand): Likewise.
+       (aarch64_sve_float_mul_operand): Likewise.
+       (aarch64_sve_vec_perm_operand): Likewise.
+       (aarch64_pluslong_operand): Include aarch64_sve_addvl_addpl_immediate.
+       (aarch64_mov_operand): Accept const_poly_int and const_vector.
+       (aarch64_simd_lshift_imm, aarch64_simd_rshift_imm): Accept const
+       as well as const_vector.
+       (aarch64_simd_imm_zero, aarch64_simd_imm_minus_one): Move earlier
+       in file.  Use CONST0_RTX and CONSTM1_RTX.
+       (aarch64_simd_or_scalar_imm_zero): Likewise.  Add match_codes.
+       (aarch64_simd_reg_or_zero): Accept const as well as const_vector.
+       Use aarch64_simd_imm_zero.
+       * config/aarch64/aarch64-sve.md: New file.
+       * config/aarch64/aarch64.md: Include it.
+       (VG_REGNUM, P0_REGNUM, P7_REGNUM, P15_REGNUM): New register numbers.
+       (UNSPEC_REV, UNSPEC_LD1_SVE, UNSPEC_ST1_SVE, UNSPEC_MERGE_PTRUE)
+       (UNSPEC_PTEST_PTRUE, UNSPEC_UNPACKSHI, UNSPEC_UNPACKUHI)
+       (UNSPEC_UNPACKSLO, UNSPEC_UNPACKULO, UNSPEC_PACK)
+       (UNSPEC_FLOAT_CONVERT, UNSPEC_WHILE_LO): New unspec constants.
+       (sve): New attribute.
+       (enabled): Disable instructions with the sve attribute unless
+       TARGET_SVE.
+       (movqi, movhi): Pass CONST_POLY_INT operands through
+       aarch64_expand_mov_immediate.
+       (*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Handle
+       CNT[BHSD] immediates.
+       (movti): Split CONST_POLY_INT moves into two halves.
+       (add<mode>3): Accept aarch64_pluslong_or_poly_operand.
+       Split additions that need a temporary here if the destination
+       is the stack pointer.
+       (*add<mode>3_aarch64): Handle ADDVL and ADDPL immediates.
+       (*add<mode>3_poly_1): New instruction.
+       (set_clobber_cc): New expander.
+
  2018-01-13  Richard Sandiford  <richard.sandiford@linaro.org>
  
         * simplify-rtx.c (simplify_immed_subreg): Add an inner_bytes
diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c

index 172c30fb520c5cab282372a5102f6c86015a4259..40c738c7c3b0fc09378dd8058f09e4e4fff33a6a 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -136,6 +136,15 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
  
    aarch64_def_or_undef (TARGET_CRYPTO, "__ARM_FEATURE_CRYPTO", pfile);
    aarch64_def_or_undef (TARGET_SIMD_RDMA, "__ARM_FEATURE_QRDMX", pfile);
+  aarch64_def_or_undef (TARGET_SVE, "__ARM_FEATURE_SVE", pfile);
+  cpp_undef (pfile, "__ARM_FEATURE_SVE_BITS");
+  if (TARGET_SVE)
+    {
+      int bits;
+      if (!BITS_PER_SVE_VECTOR.is_constant (&bits))
+       bits = 0;
+      builtin_define_with_int_value ("__ARM_FEATURE_SVE_BITS", bits);
+    }
  
    aarch64_def_or_undef (TARGET_AES, "__ARM_FEATURE_AES", pfile);
    aarch64_def_or_undef (TARGET_SHA2, "__ARM_FEATURE_SHA2", pfile);
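Note on the aarch64-c.c hunk above: once SVE is enabled, user code can test `__ARM_FEATURE_SVE` and read `__ARM_FEATURE_SVE_BITS`, which the hunk sets to 0 when the vector length is not a compile-time constant. The following is a minimal sketch of how a program might consume the two macros. The helper names and the -1 convention are illustrative assumptions and are not part of the patch.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): report what the compiler told us
   about SVE through its predefined macros.
     -1  SVE is not enabled for this translation unit
      0  SVE is enabled but the vector length is not a compile-time constant
     >0  SVE is enabled with a fixed vector length of that many bits.  */
static int
sve_bits_at_compile_time (void)
{
#if defined (__ARM_FEATURE_SVE)
# if defined (__ARM_FEATURE_SVE_BITS)
  return __ARM_FEATURE_SVE_BITS;
# else
  /* Assumption: a compiler that defines only __ARM_FEATURE_SVE is
     treated here as length-agnostic.  */
  return 0;
# endif
#else
  return -1;
#endif
}

/* Hypothetical helper: fixed vector size in bytes, or 0 when the size is
   unknown at compile time or SVE is unavailable.  */
static int
sve_bytes_or_zero (void)
{
  int bits = sve_bits_at_compile_time ();
  /* Every permitted fixed length is a multiple of 128 bits.  */
  assert (bits == -1 || bits % 128 == 0);
  return bits > 0 ? bits / 8 : 0;
}
```

Code that must run on any SVE implementation would normally take the zero path and query the length at run time instead.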
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def

index de40f72d666c0d9233bea0c342992d1f9ae8ecfc..4e9da29d321567cd83ee0012bd96d900e16bad2c 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -30,6 +30,22 @@ FLOAT_MODE (HF, 2, 0);
  ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
  
  /* Vector modes.  */
+
+VECTOR_BOOL_MODE (VNx16BI, 16, 2);
+VECTOR_BOOL_MODE (VNx8BI, 8, 2);
+VECTOR_BOOL_MODE (VNx4BI, 4, 2);
+VECTOR_BOOL_MODE (VNx2BI, 2, 2);
+
+ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
+ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
+ADJUST_NUNITS (VNx4BI, aarch64_sve_vg * 2);
+ADJUST_NUNITS (VNx2BI, aarch64_sve_vg);
+
+ADJUST_ALIGNMENT (VNx16BI, 2);
+ADJUST_ALIGNMENT (VNx8BI, 2);
+ADJUST_ALIGNMENT (VNx4BI, 2);
+ADJUST_ALIGNMENT (VNx2BI, 2);
+
  VECTOR_MODES (INT, 8);        /*       V8QI V4HI V2SI.  */
  VECTOR_MODES (INT, 16);       /* V16QI V8HI V4SI V2DI.  */
  VECTOR_MODES (FLOAT, 8);      /*                 V2SF.  */
@@ -45,9 +61,43 @@ INT_MODE (OI, 32);
  INT_MODE (CI, 48);
  INT_MODE (XI, 64);
  
+/* Define SVE modes for NVECS vectors.  VB, VH, VS and VD are the prefixes
+   for 8-bit, 16-bit, 32-bit and 64-bit elements respectively.  It isn't
+   strictly necessary to set the alignment here, since the default would
+   be clamped to BIGGEST_ALIGNMENT anyhow, but it seems clearer.  */
+#define SVE_MODES(NVECS, VB, VH, VS, VD) \
+  VECTOR_MODES_WITH_PREFIX (VNx, INT, 16 * NVECS); \
+  VECTOR_MODES_WITH_PREFIX (VNx, FLOAT, 16 * NVECS); \
+  \
+  ADJUST_NUNITS (VB##QI, aarch64_sve_vg * NVECS * 8); \
+  ADJUST_NUNITS (VH##HI, aarch64_sve_vg * NVECS * 4); \
+  ADJUST_NUNITS (VS##SI, aarch64_sve_vg * NVECS * 2); \
+  ADJUST_NUNITS (VD##DI, aarch64_sve_vg * NVECS); \
+  ADJUST_NUNITS (VH##HF, aarch64_sve_vg * NVECS * 4); \
+  ADJUST_NUNITS (VS##SF, aarch64_sve_vg * NVECS * 2); \
+  ADJUST_NUNITS (VD##DF, aarch64_sve_vg * NVECS); \
+  \
+  ADJUST_ALIGNMENT (VB##QI, 16); \
+  ADJUST_ALIGNMENT (VH##HI, 16); \
+  ADJUST_ALIGNMENT (VS##SI, 16); \
+  ADJUST_ALIGNMENT (VD##DI, 16); \
+  ADJUST_ALIGNMENT (VH##HF, 16); \
+  ADJUST_ALIGNMENT (VS##SF, 16); \
+  ADJUST_ALIGNMENT (VD##DF, 16);
+
+/* Give SVE vectors the names normally used for 256-bit vectors.
+   The actual number depends on command-line flags.  */
+SVE_MODES (1, VNx16, VNx8, VNx4, VNx2)
+
  /* Quad float: 128-bit floating mode for long doubles.  */
  FLOAT_MODE (TF, 16, ieee_quad_format);
  
+/* A 4-tuple of SVE vectors with the maximum -msve-vector-bits= setting.
+   Note that this is a limit only on the compile-time sizes of modes;
+   it is not a limit on the runtime sizes, since VL-agnostic code
+   must work with arbitrary vector lengths.  */
+#define MAX_BITSIZE_MODE_ANY_MODE (2048 * 4)
+
  /* Coefficient 1 is multiplied by the number of 128-bit chunks in an
     SVE vector (referred to as "VQ") minus one.  */
  #define NUM_POLY_INT_COEFFS 2
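Note on the aarch64-modes.def hunks above: the ADJUST_NUNITS lines express each lane count as a multiple of aarch64_sve_vg, and the NUM_POLY_INT_COEFFS comment says that coefficient 1 is multiplied by VQ - 1. The arithmetic can be sketched as below. `struct poly2` is a simplified stand-in for GCC's poly_int, and the value { 2, 2 } for the length-agnostic aarch64_sve_vg is an assumption of this sketch (two 64-bit granules per 128-bit chunk). Neither is taken from the hunks shown here.

```c
#include <assert.h>

/* Illustrative stand-in for a two-coefficient poly_int.  The runtime value
   is c0 + c1 * (VQ - 1), where VQ is the number of 128-bit chunks in an
   SVE vector.  */
struct poly2 { unsigned c0, c1; };

/* Evaluate P for a concrete vector length given in bits.  */
static unsigned
poly2_eval (struct poly2 p, unsigned vector_bits)
{
  assert (vector_bits >= 128 && vector_bits % 128 == 0);
  unsigned vq = vector_bits / 128;
  return p.c0 + p.c1 * (vq - 1);
}

/* Multiply both coefficients by a compile-time constant.  */
static struct poly2
poly2_scale (struct poly2 p, unsigned factor)
{
  struct poly2 r = { p.c0 * factor, p.c1 * factor };
  return r;
}

/* Assumed length-agnostic value of aarch64_sve_vg: 2 + 2 * (VQ - 1).  */
static const struct poly2 sve_vg = { 2, 2 };

/* Lane counts for single-vector modes, following ADJUST_NUNITS above.  */
static unsigned
nunits_vnx16qi (unsigned bits)
{
  return poly2_eval (poly2_scale (sve_vg, 8), bits);
}

static unsigned
nunits_vnx4si (unsigned bits)
{
  return poly2_eval (poly2_scale (sve_vg, 2), bits);
}

static unsigned
nunits_vnx2di (unsigned bits)
{
  return poly2_eval (sve_vg, bits);
}
```

For a 256-bit implementation VQ is 2, so VNx4SI evaluates to 4 + 4 * 1 = 8 lanes of 32 bits each, which fills the vector exactly.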
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def

index 593dad9381c18400bb28d4fc321239f17720c5eb..5fe5e3f7dddf622a48a5b9458ef30449a886f395 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -39,16 +39,19 @@
     that are required.  Their order is not important.  */
  
  /* Enabling "fp" just enables "fp".
-   Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2", "sha3", and sm3/sm4.  */
+   Disabling "fp" also disables "simd", "crypto", "fp16", "aes", "sha2",
+   "sha3", sm3/sm4 and "sve".  */
  AARCH64_OPT_EXTENSION("fp", AARCH64_FL_FP, 0, AARCH64_FL_SIMD | AARCH64_FL_CRYPTO |\
                       AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2 |\
-                     AARCH64_FL_SHA3 | AARCH64_FL_SM4, "fp")
+                     AARCH64_FL_SHA3 | AARCH64_FL_SM4 | AARCH64_FL_SVE, "fp")
  
  /* Enabling "simd" also enables "fp".
-   Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3" and "sm3/sm4".  */
+   Disabling "simd" also disables "crypto", "dotprod", "aes", "sha2", "sha3",
+   "sm3/sm4" and "sve".  */
  AARCH64_OPT_EXTENSION("simd", AARCH64_FL_SIMD, AARCH64_FL_FP, AARCH64_FL_CRYPTO |\
                       AARCH64_FL_DOTPROD | AARCH64_FL_AES | AARCH64_FL_SHA2 |\
-                     AARCH64_FL_SHA3 | AARCH64_FL_SM4, "asimd")
+                     AARCH64_FL_SHA3 | AARCH64_FL_SM4 | AARCH64_FL_SVE,
+                     "asimd")
  
  /* Enabling "crypto" also enables "fp" and "simd".
     Disabling "crypto" disables "crypto", "aes", "sha2", "sha3" and "sm3/sm4".  */
@@ -63,8 +66,9 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32")
  AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics")
  
  /* Enabling "fp16" also enables "fp".
-   Disabling "fp16" disables "fp16" and "fp16fml".  */
-AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP, AARCH64_FL_F16FML, "fphp asimdhp")
+   Disabling "fp16" disables "fp16", "fp16fml" and "sve".  */
+AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, AARCH64_FL_FP,
+                     AARCH64_FL_F16FML | AARCH64_FL_SVE, "fphp asimdhp")
  
  /* Enabling or disabling "rcpc" only changes "rcpc".  */
  AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc")
@@ -97,4 +101,8 @@ AARCH64_OPT_EXTENSION("sm4", AARCH64_FL_SM4, AARCH64_FL_SIMD, 0, "sm3 sm4")
     Disabling "fp16fml" just disables "fp16fml".  */
  AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, AARCH64_FL_FP | AARCH64_FL_F16, 0, "asimdfml")
  
+/* Enabling "sve" also enables "fp16", "fp" and "simd".
+   Disabling "sve" just disables "sve".  */
+AARCH64_OPT_EXTENSION("sve", AARCH64_FL_SVE, AARCH64_FL_FP | AARCH64_FL_SIMD | AARCH64_FL_F16, 0, "sve")
+
  #undef AARCH64_OPT_EXTENSION
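Note on the aarch64-option-extensions.def hunks above: each entry names a canonical flag, the flags that are also switched on when the extension is enabled, and the flags that are also switched off when it is disabled. The effect of the new "sve" entry can be sketched as below. The bit values are placeholders and the table keeps only the SVE-related parts of each entry; both are assumptions of this sketch, as is the `apply_extension` helper, which is not a GCC function.

```c
#include <assert.h>
#include <string.h>

/* Placeholder bit values chosen for this sketch; the real AARCH64_FL_*
   values are defined in aarch64.h.  */
enum
{
  FL_FP = 1 << 0,
  FL_SIMD = 1 << 1,
  FL_F16 = 1 << 2,
  FL_SVE = 1 << 3
};

struct extension
{
  const char *name;
  unsigned canonical;  /* The flag for the extension itself.  */
  unsigned flags_on;   /* Extra flags set when enabling.  */
  unsigned flags_off;  /* Extra flags cleared when disabling.  */
};

/* Reduced table: crypto, dotprod, fp16fml and so on are left out.  */
static const struct extension extensions[] = {
  { "fp", FL_FP, 0, FL_SIMD | FL_F16 | FL_SVE },
  { "simd", FL_SIMD, FL_FP, FL_SVE },
  { "fp16", FL_F16, FL_FP, FL_SVE },
  { "sve", FL_SVE, FL_FP | FL_SIMD | FL_F16, 0 },
};

/* Return FLAGS after applying "+NAME" (ENABLE nonzero) or "+noNAME".  */
static unsigned
apply_extension (unsigned flags, const char *name, int enable)
{
  for (size_t i = 0; i < sizeof extensions / sizeof extensions[0]; i++)
    if (strcmp (extensions[i].name, name) == 0)
      {
        const struct extension *e = &extensions[i];
        return enable ? (flags | e->canonical | e->flags_on)
                      : (flags & ~(e->canonical | e->flags_off));
      }
  assert (!"unknown extension name");
  return flags;
}
```

Under these assumptions, enabling "sve" from an empty flag set turns on fp, simd and fp16 as well, while disabling any one of those three afterwards also clears sve.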
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h

index 19929728a31d84fbe68aed451e74bfd4a95fe52c..7a5c6d7664f47b220840d7fdd4e68c5fedbb3d6e 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -81,4 +81,14 @@ enum aarch64_function_type {
    AARCH64_FUNCTION_ALL
  };
  
+/* SVE vector register sizes.  */
+enum aarch64_sve_vector_bits_enum {
+  SVE_SCALABLE,
+  SVE_128 = 128,
+  SVE_256 = 256,
+  SVE_512 = 512,
+  SVE_1024 = 1024,
+  SVE_2048 = 2048
+};
+
  #endif
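Note on the aarch64-opts.h hunk above: the enumerators other than SVE_SCALABLE carry the vector length in bits as their value, so a numeric option argument can be converted with a cast once it has been checked. The following sketch shows one way an argument to -msve-vector-bits= could be validated. The enum is a local copy for illustration, `parse_sve_vector_bits` is a hypothetical helper, and the spelling "scalable" for the length-agnostic setting is an assumption here; GCC itself handles the option through aarch64.opt rather than through code like this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local copy of the enum from the hunk above, renamed for this sketch.  */
enum sve_vector_bits
{
  SVE_SCALABLE,
  SVE_128 = 128,
  SVE_256 = 256,
  SVE_512 = 512,
  SVE_1024 = 1024,
  SVE_2048 = 2048
};

/* Hypothetical helper: convert ARG to an enum value.  Returns 1 and
   stores the result in *OUT on success; returns 0 and leaves *OUT
   untouched if ARG is not a recognised setting.  */
static int
parse_sve_vector_bits (const char *arg, enum sve_vector_bits *out)
{
  assert (arg != NULL && out != NULL);

  if (strcmp (arg, "scalable") == 0)
    {
      *out = SVE_SCALABLE;
      return 1;
    }

  /* Require the whole string to be a decimal number.  */
  char *end;
  long bits = strtol (arg, &end, 10);
  if (end == arg || *end != '\0')
    return 0;

  switch (bits)
    {
    case 128:
    case 256:
    case 512:
    case 1024:
    case 2048:
      /* The enumerator values equal the bit counts, so a cast suffices.  */
      *out = (enum sve_vector_bits) bits;
      return 1;
    default:
      return 0;
    }
}
```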
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h

index 8c3471bdbb87f00ab365092d58e5f9f0f6a605d1..4f1fc15d39dbff8741b9b2c698ea63396d62dea0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -118,10 +118,17 @@ enum aarch64_symbol_type
        (the rules are the same for both).
  
     ADDR_QUERY_LDP_STP
-      Query what is valid for a load/store pair.  */
+      Query what is valid for a load/store pair.
+
+   ADDR_QUERY_ANY
+      Query what is valid for at least one memory constraint, which may
+      allow things that "m" doesn't.  For example, the SVE LDR and STR
+      addressing modes allow a wider range of immediate offsets than "m"
+      does.  */
  enum aarch64_addr_query_type {
    ADDR_QUERY_M,
-  ADDR_QUERY_LDP_STP
+  ADDR_QUERY_LDP_STP,
+  ADDR_QUERY_ANY
  };
  
  /* A set of tuning parameters contains references to size and time
@@ -344,6 +351,8 @@ int aarch64_branch_cost (bool, bool);
  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
  bool aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode);
  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
+bool aarch64_const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT,
+                                           HOST_WIDE_INT);
  bool aarch64_constant_address_p (rtx);
  bool aarch64_emit_approx_div (rtx, rtx, rtx);
  bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
@@ -364,23 +373,41 @@ bool aarch64_legitimate_pic_operand_p (rtx);
  bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
  bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
  bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
+opt_machine_mode aarch64_sve_pred_mode (unsigned int);
+bool aarch64_sve_cnt_immediate_p (rtx);
+bool aarch64_sve_addvl_addpl_immediate_p (rtx);
+bool aarch64_sve_inc_dec_immediate_p (rtx);
+int aarch64_add_offset_temporaries (rtx);
+void aarch64_split_add_offset (scalar_int_mode, rtx, rtx, rtx, rtx, rtx);
  bool aarch64_mov_operand_p (rtx, machine_mode);
  rtx aarch64_reverse_mask (machine_mode, unsigned int);
  bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
+char *aarch64_output_sve_cnt_immediate (const char *, const char *, rtx);
+char *aarch64_output_sve_addvl_addpl (rtx, rtx, rtx);
+char *aarch64_output_sve_inc_dec_immediate (const char *, rtx);
  char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
  char *aarch64_output_simd_mov_immediate (rtx, unsigned,
                         enum simd_immediate_check w = AARCH64_CHECK_MOV);
+char *aarch64_output_sve_mov_immediate (rtx);
+char *aarch64_output_ptrue (machine_mode, char);
  bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
  bool aarch64_regno_ok_for_base_p (int, bool);
  bool aarch64_regno_ok_for_index_p (int, bool);
  bool aarch64_reinterpret_float_as_int (rtx value, unsigned HOST_WIDE_INT *fail);
  bool aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
                                             bool high);
-bool aarch64_simd_imm_zero_p (rtx, machine_mode);
  bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode);
  bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
  bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
                         enum simd_immediate_check w = AARCH64_CHECK_MOV);
+rtx aarch64_check_zero_based_sve_index_immediate (rtx);
+bool aarch64_sve_index_immediate_p (rtx);
+bool aarch64_sve_arith_immediate_p (rtx, bool);
+bool aarch64_sve_bitmask_immediate_p (rtx);
+bool aarch64_sve_dup_immediate_p (rtx);
+bool aarch64_sve_cmp_immediate_p (rtx, bool);
+bool aarch64_sve_float_arith_immediate_p (rtx, bool);
+bool aarch64_sve_float_mul_immediate_p (rtx);
  bool aarch64_split_dimode_const_store (rtx, rtx);
  bool aarch64_symbolic_address_p (rtx);
  bool aarch64_uimm12_shift (HOST_WIDE_INT);
@@ -388,7 +415,7 @@ bool aarch64_use_return_insn_p (void);
  const char *aarch64_mangle_builtin_type (const_tree);
  const char *aarch64_output_casesi (rtx *);
  
-enum aarch64_symbol_type aarch64_classify_symbol (rtx, rtx);
+enum aarch64_symbol_type aarch64_classify_symbol (rtx, HOST_WIDE_INT);
  enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
  enum reg_class aarch64_regno_regclass (unsigned);
  int aarch64_asm_preferred_eh_data_format (int, int);
@@ -403,6 +430,8 @@ const char *aarch64_output_move_struct (rtx *operands);
  rtx aarch64_return_addr (int, rtx);
  rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
  bool aarch64_simd_mem_operand_p (rtx);
+bool aarch64_sve_ld1r_operand_p (rtx);
+bool aarch64_sve_ldr_operand_p (rtx);
  rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
  rtx aarch64_tls_get_addr (void);
  tree aarch64_fold_builtin (tree, int, tree *, bool);
@@ -414,7 +443,9 @@ const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
  const char * aarch64_output_probe_stack_range (rtx, rtx);
  void aarch64_err_no_fpadvsimd (machine_mode, const char *);
  void aarch64_expand_epilogue (bool);
-void aarch64_expand_mov_immediate (rtx, rtx);
+void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
+void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
+void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
  void aarch64_expand_prologue (void);
  void aarch64_expand_vector_init (rtx, rtx);
  void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
@@ -467,6 +498,10 @@ void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
  void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
  
  bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE);
+
+void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx);
+bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
+void aarch64_expand_sve_vcond (machine_mode, machine_mode, rtx *);
  #endif /* RTX_CODE */
  
  void aarch64_init_builtins (void);
@@ -485,6 +520,7 @@ tree aarch64_builtin_vectorized_function (unsigned int, tree, tree);
  
  extern void aarch64_split_combinev16qi (rtx operands[3]);
  extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
+extern void aarch64_expand_sve_vec_perm (rtx, rtx, rtx, rtx);
  extern bool aarch64_madd_needs_nop (rtx_insn *);
  extern void aarch64_final_prescan_insn (rtx_insn *);
  void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
@@ -508,4 +544,6 @@ std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
  
  rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
  
+poly_uint64 aarch64_regmode_natural_size (machine_mode);
+
  #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
new file mode 100644
index 0000000..352c306
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -0,0 +1,1922 @@
+;; Machine description for AArch64 SVE.
+;; Copyright (C) 2009-2016 Free Software Foundation, Inc.
+;; Contributed by ARM Ltd.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Note on the handling of big-endian SVE
+;; --------------------------------------
+;;
+;; On big-endian systems, Advanced SIMD mov<mode> patterns act in the
+;; same way as movdi or movti would: the first byte of memory goes
+;; into the most significant byte of the register and the last byte
+;; of memory goes into the least significant byte of the register.
+;; This is the most natural ordering for Advanced SIMD and matches
+;; the ABI layout for 64-bit and 128-bit vector types.
+;;
+;; As a result, the order of bytes within the register is what GCC
+;; expects for a big-endian target, and subreg offsets therefore work
+;; as expected, with the first element in memory having subreg offset 0
+;; and the last element in memory having the subreg offset associated
+;; with a big-endian lowpart.  However, this ordering also means that
+;; GCC's lane numbering does not match the architecture's numbering:
+;; GCC always treats the element at the lowest address in memory
+;; (subreg offset 0) as element 0, while the architecture treats
+;; the least significant end of the register as element 0.
+;;
+;; The situation for SVE is different.  We want the layout of the
+;; SVE register to be the same for mov<mode> as it is for maskload<mode>:
+;; logically, a mov<mode> load must be indistinguishable from a
+;; maskload<mode> whose mask is all true.  We therefore need the
+;; register layout to match LD1 rather than LDR.  The ABI layout of
+;; SVE types also matches LD1 byte ordering rather than LDR byte ordering.
+;;
+;; As a result, the architecture lane numbering matches GCC's lane
+;; numbering, with element 0 always being the first in memory.
+;; However:
+;;
+;; - Applying a subreg offset to a register does not give the element
+;;   that GCC expects: the first element in memory has the subreg offset
+;;   associated with a big-endian lowpart while the last element in memory
+;;   has subreg offset 0.  We handle this via TARGET_CAN_CHANGE_MODE_CLASS.
+;;
+;; - We cannot use LDR and STR for spill slots that might be accessed
+;;   via subregs, since although the elements have the order GCC expects,
+;;   the order of the bytes within the elements is different.  We instead
+;;   access spill slots via LD1 and ST1, using secondary reloads to
+;;   reserve a predicate register.
+
+
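Note on the big-endian comment above: its last bullet says that LDR and LD1 agree on element order but differ in the order of bytes within each element. The sketch below is a simplified byte-level model of that statement. It assumes that LDR copies bytes in address order and that a big-endian LD1 reverses the bytes inside each element. Bytes, load_ldr, load_ld1_be and lane are names invented for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A register is modelled as bytes, index 0 being the least significant byte.
using Bytes = std::vector<std::uint8_t>;

// LDR-style load: bytes are copied in address order with no per-element swap.
Bytes load_ldr(const Bytes &mem) { return mem; }

// LD1-style load on a big-endian target: element order is kept, but the
// bytes inside each elt_size-byte element are reversed.
Bytes load_ld1_be(const Bytes &mem, std::size_t elt_size) {
  Bytes reg(mem.size());
  for (std::size_t e = 0; e + elt_size <= mem.size(); e += elt_size)
    for (std::size_t b = 0; b < elt_size; ++b)
      reg[e + (elt_size - 1 - b)] = mem[e + b];
  return reg;
}

// Read element `index` from the register, lane 0 at the least significant end.
std::uint64_t lane(const Bytes &reg, std::size_t elt_size, std::size_t index) {
  std::uint64_t value = 0;
  for (std::size_t b = 0; b < elt_size; ++b)
    value |= std::uint64_t(reg[index * elt_size + b]) << (8 * b);
  return value;
}
```

In this model both loads put the first element in memory into lane 0, but only the LD1-style load yields the value that a big-endian reader of memory would expect for that lane.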
+;; SVE data moves.
+(define_expand "mov<mode>"
+  [(set (match_operand:SVE_ALL 0 "nonimmediate_operand")
+       (match_operand:SVE_ALL 1 "general_operand"))]
+  "TARGET_SVE"
+  {
+    /* Use the predicated load and store patterns where possible.
+       This is required for big-endian targets (see the comment at the
+       head of the file) and increases the addressing choices for
+       little-endian.  */
+    if ((MEM_P (operands[0]) || MEM_P (operands[1]))
+        && can_create_pseudo_p ())
+      {
+       aarch64_expand_sve_mem_move (operands[0], operands[1], <VPRED>mode);
+       DONE;
+      }
+
+    if (CONSTANT_P (operands[1]))
+      {
+       aarch64_expand_mov_immediate (operands[0], operands[1],
+                                     gen_vec_duplicate<mode>);
+       DONE;
+      }
+  }
+)
+
+;; Unpredicated moves (little-endian).  Only allow memory operations
+;; during and after RA; before RA we want the predicated load and
+;; store patterns to be used instead.
+(define_insn "*aarch64_sve_mov<mode>_le"
+  [(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w")
+       (match_operand:SVE_ALL 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))]
+  "TARGET_SVE
+   && !BYTES_BIG_ENDIAN
+   && ((lra_in_progress || reload_completed)
+       || (register_operand (operands[0], <MODE>mode)
+          && nonmemory_operand (operands[1], <MODE>mode)))"
+  "@
+   ldr\t%0, %1
+   str\t%1, %0
+   mov\t%0.d, %1.d
+   * return aarch64_output_sve_mov_immediate (operands[1]);"
+)
+
+;; Unpredicated moves (big-endian).  Memory accesses require secondary
+;; reloads.
+(define_insn "*aarch64_sve_mov<mode>_be"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
+       (match_operand:SVE_ALL 1 "aarch64_nonmemory_operand" "w, Dn"))]
+  "TARGET_SVE && BYTES_BIG_ENDIAN"
+  "@
+   mov\t%0.d, %1.d
+   * return aarch64_output_sve_mov_immediate (operands[1]);"
+)
+
+;; Handle big-endian memory reloads.  We use byte PTRUE for all modes
+;; to try to encourage reuse.
+(define_expand "aarch64_sve_reload_be"
+  [(parallel
+     [(set (match_operand 0)
+           (match_operand 1))
+      (clobber (match_operand:VNx16BI 2 "register_operand" "=Upl"))])]
+  "TARGET_SVE && BYTES_BIG_ENDIAN"
+  {
+    /* Create a PTRUE.  */
+    emit_move_insn (operands[2], CONSTM1_RTX (VNx16BImode));
+
+    /* Refer to the PTRUE in the appropriate mode for this move.  */
+    machine_mode mode = GET_MODE (operands[0]);
+    machine_mode pred_mode
+      = aarch64_sve_pred_mode (GET_MODE_UNIT_SIZE (mode)).require ();
+    rtx pred = gen_lowpart (pred_mode, operands[2]);
+
+    /* Emit a predicated load or store.  */
+    aarch64_emit_sve_pred_move (operands[0], pred, operands[1]);
+    DONE;
+  }
+)
+
+;; A predicated load or store for which the predicate is known to be
+;; all-true.  Note that this pattern is generated directly by
+;; aarch64_emit_sve_pred_move, so changes to this pattern will
+;; need changes there as well.
+(define_insn "*pred_mov<mode>"
+  [(set (match_operand:SVE_ALL 0 "nonimmediate_operand" "=w, m")
+       (unspec:SVE_ALL
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (match_operand:SVE_ALL 2 "nonimmediate_operand" "m, w")]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE
+   && (register_operand (operands[0], <MODE>mode)
+       || register_operand (operands[2], <MODE>mode))"
+  "@
+   ld1<Vesize>\t%0.<Vetype>, %1/z, %2
+   st1<Vesize>\t%2.<Vetype>, %1, %0"
+)
+
+(define_expand "movmisalign<mode>"
+  [(set (match_operand:SVE_ALL 0 "nonimmediate_operand")
+       (match_operand:SVE_ALL 1 "general_operand"))]
+  "TARGET_SVE"
+  {
+    /* Equivalent to a normal move for our purposes.  */
+    emit_move_insn (operands[0], operands[1]);
+    DONE;
+  }
+)
+
+(define_insn "maskload<mode><vpred>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+       (unspec:SVE_ALL
+         [(match_operand:<VPRED> 2 "register_operand" "Upl")
+          (match_operand:SVE_ALL 1 "memory_operand" "m")]
+         UNSPEC_LD1_SVE))]
+  "TARGET_SVE"
+  "ld1<Vesize>\t%0.<Vetype>, %2/z, %1"
+)
+
+(define_insn "maskstore<mode><vpred>"
+  [(set (match_operand:SVE_ALL 0 "memory_operand" "+m")
+       (unspec:SVE_ALL [(match_operand:<VPRED> 2 "register_operand" "Upl")
+                        (match_operand:SVE_ALL 1 "register_operand" "w")
+                        (match_dup 0)]
+                       UNSPEC_ST1_SVE))]
+  "TARGET_SVE"
+  "st1<Vesize>\t%1.<Vetype>, %2, %0"
+)
+
+(define_expand "mov<mode>"
+  [(set (match_operand:PRED_ALL 0 "nonimmediate_operand")
+       (match_operand:PRED_ALL 1 "general_operand"))]
+  "TARGET_SVE"
+  {
+    if (GET_CODE (operands[0]) == MEM)
+      operands[1] = force_reg (<MODE>mode, operands[1]);
+  }
+)
+
+(define_insn "*aarch64_sve_mov<mode>"
+  [(set (match_operand:PRED_ALL 0 "nonimmediate_operand" "=Upa, m, Upa, Upa, Upa")
+       (match_operand:PRED_ALL 1 "general_operand" "Upa, Upa, m, Dz, Dm"))]
+  "TARGET_SVE
+   && (register_operand (operands[0], <MODE>mode)
+       || register_operand (operands[1], <MODE>mode))"
+  "@
+   mov\t%0.b, %1.b
+   str\t%1, %0
+   ldr\t%0, %1
+   pfalse\t%0.b
+   * return aarch64_output_ptrue (<MODE>mode, '<Vetype>');"
+)
+
+;; Handle extractions from a predicate by converting to an integer vector
+;; and extracting from there.
+(define_expand "vec_extract<vpred><Vel>"
+  [(match_operand:<VEL> 0 "register_operand")
+   (match_operand:<VPRED> 1 "register_operand")
+   (match_operand:SI 2 "nonmemory_operand")
+   ;; Dummy operand to which we can attach the iterator.
+   (reg:SVE_I V0_REGNUM)]
+  "TARGET_SVE"
+  {
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_aarch64_sve_dup<mode>_const (tmp, operands[1],
+                                               CONST1_RTX (<MODE>mode),
+                                               CONST0_RTX (<MODE>mode)));
+    emit_insn (gen_vec_extract<mode><Vel> (operands[0], tmp, operands[2]));
+    DONE;
+  }
+)
+
+(define_expand "vec_extract<mode><Vel>"
+  [(set (match_operand:<VEL> 0 "register_operand")
+       (vec_select:<VEL>
+         (match_operand:SVE_ALL 1 "register_operand")
+         (parallel [(match_operand:SI 2 "nonmemory_operand")])))]
+  "TARGET_SVE"
+  {
+    poly_int64 val;
+    if (poly_int_rtx_p (operands[2], &val)
+       && known_eq (val, GET_MODE_NUNITS (<MODE>mode) - 1))
+      {
+       /* The last element can be extracted with a LASTB and a false
+          predicate.  */
+       rtx sel = force_reg (<VPRED>mode, CONST0_RTX (<VPRED>mode));
+       emit_insn (gen_aarch64_sve_lastb<mode> (operands[0], sel,
+                                               operands[1]));
+       DONE;
+      }
+    if (!CONST_INT_P (operands[2]))
+      {
+       /* Create an index with operand[2] as the base and -1 as the step.
+          It will then be zero for the element we care about.  */
+       rtx index = gen_lowpart (<VEL_INT>mode, operands[2]);
+       index = force_reg (<VEL_INT>mode, index);
+       rtx series = gen_reg_rtx (<V_INT_EQUIV>mode);
+       emit_insn (gen_vec_series<v_int_equiv> (series, index, constm1_rtx));
+
+       /* Get a predicate that is true for only that element.  */
+       rtx zero = CONST0_RTX (<V_INT_EQUIV>mode);
+       rtx cmp = gen_rtx_EQ (<V_INT_EQUIV>mode, series, zero);
+       rtx sel = gen_reg_rtx (<VPRED>mode);
+       emit_insn (gen_vec_cmp<v_int_equiv><vpred> (sel, cmp, series, zero));
+
+       /* Select the element using LASTB.  */
+       emit_insn (gen_aarch64_sve_lastb<mode> (operands[0], sel,
+                                               operands[1]));
+       DONE;
+      }
+  }
+)
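Note on the vec_extract<mode><Vel> expander above: for a non-constant index it builds a series with the index as base and -1 as step, compares the series with zero to get a predicate that is true for one lane, and then selects that lane with LASTB. The sketch below models those steps on ordinary vectors. It is a scalar simulation under the LASTB semantics described in the patch comments (last active element, else the last element); lastb and extract_variable are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of LASTB: return the last element whose predicate bit is set, or
// the final element when no bit is set.  Assumes a non-empty vector.
std::int64_t lastb(const std::vector<bool> &pred,
                   const std::vector<std::int64_t> &vec) {
  for (std::size_t k = vec.size(); k-- > 0;)
    if (pred[k])
      return vec[k];
  return vec.back();
}

// Extract vec[index] for a run-time index, following the expander's steps.
std::int64_t extract_variable(const std::vector<std::int64_t> &vec,
                              std::int64_t index) {
  std::vector<bool> sel(vec.size());
  for (std::size_t k = 0; k < vec.size(); ++k) {
    // Series with base `index` and step -1: index, index - 1, index - 2, ...
    std::int64_t series = index - static_cast<std::int64_t>(k);
    // The comparison with zero is true only for lane k == index.
    sel[k] = (series == 0);
  }
  return lastb(sel, vec);
}
```

The all-false case of lastb corresponds to the expander's first branch, where the last element is extracted with a false predicate.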
+
+;; Extract an element from the Advanced SIMD portion of the register.
+;; We don't just reuse the aarch64-simd.md pattern because we don't
+;; want any change in lane number on big-endian targets.
+(define_insn "*vec_extract<mode><Vel>_v128"
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+       (vec_select:<VEL>
+         (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
+         (parallel [(match_operand:SI 2 "const_int_operand")])))]
+  "TARGET_SVE
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
+  {
+    operands[1] = gen_lowpart (<V128>mode, operands[1]);
+    switch (which_alternative)
+      {
+       case 0:
+         return "umov\\t%<vwcore>0, %1.<Vetype>[%2]";
+       case 1:
+         return "dup\\t%<Vetype>0, %1.<Vetype>[%2]";
+       case 2:
+         return "st1\\t{%1.<Vetype>}[%2], %0";
+       default:
+         gcc_unreachable ();
+      }
+  }
+  [(set_attr "type" "neon_to_gp_q, neon_dup_q, neon_store1_one_lane_q")]
+)
+
+;; Extract an element in the range of DUP.  This pattern allows the
+;; source and destination to be different.
+(define_insn "*vec_extract<mode><Vel>_dup"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+       (vec_select:<VEL>
+         (match_operand:SVE_ALL 1 "register_operand" "w")
+         (parallel [(match_operand:SI 2 "const_int_operand")])))]
+  "TARGET_SVE
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 16, 63)"
+  {
+    operands[0] = gen_rtx_REG (<MODE>mode, REGNO (operands[0]));
+    return "dup\t%0.<Vetype>, %1.<Vetype>[%2]";
+  }
+)
+
+;; Extract an element outside the range of DUP.  This pattern requires the
+;; source and destination to be the same.
+(define_insn "*vec_extract<mode><Vel>_ext"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+       (vec_select:<VEL>
+         (match_operand:SVE_ALL 1 "register_operand" "0")
+         (parallel [(match_operand:SI 2 "const_int_operand")])))]
+  "TARGET_SVE && INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode) >= 64"
+  {
+    operands[0] = gen_rtx_REG (<MODE>mode, REGNO (operands[0]));
+    operands[2] = GEN_INT (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode));
+    return "ext\t%0.b, %0.b, %0.b, #%2";
+  }
+)
+
+;; Extract the last active element of operand 1 into operand 0.
+;; If no elements are active, extract the last inactive element instead.
+(define_insn "aarch64_sve_lastb<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=r, w")
+       (unspec:<VEL>
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (match_operand:SVE_ALL 2 "register_operand" "w, w")]
+         UNSPEC_LASTB))]
+  "TARGET_SVE"
+  "@
+   lastb\t%<vwcore>0, %1, %2.<Vetype>
+   lastb\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+(define_expand "vec_duplicate<mode>"
+  [(parallel
+    [(set (match_operand:SVE_ALL 0 "register_operand")
+         (vec_duplicate:SVE_ALL
+           (match_operand:<VEL> 1 "aarch64_sve_dup_operand")))
+     (clobber (scratch:<VPRED>))])]
+  "TARGET_SVE"
+  {
+    if (MEM_P (operands[1]))
+      {
+       rtx ptrue = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+       emit_insn (gen_sve_ld1r<mode> (operands[0], ptrue, operands[1],
+                                      CONST0_RTX (<MODE>mode)));
+       DONE;
+      }
+  }
+)
+
+;; Accept memory operands for the benefit of combine, and also in case
+;; the scalar input gets spilled to memory during RA.  We want to split
+;; the load at the first opportunity in order to allow the PTRUE to be
+;; optimized with surrounding code.
+(define_insn_and_split "*vec_duplicate<mode>_reg"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w, w")
+       (vec_duplicate:SVE_ALL
+         (match_operand:<VEL> 1 "aarch64_sve_dup_operand" "r, w, Uty")))
+   (clobber (match_scratch:<VPRED> 2 "=X, X, Upl"))]
+  "TARGET_SVE"
+  "@
+   mov\t%0.<Vetype>, %<vwcore>1
+   mov\t%0.<Vetype>, %<Vetype>1
+   #"
+  "&& MEM_P (operands[1])"
+  [(const_int 0)]
+  {
+    if (GET_CODE (operands[2]) == SCRATCH)
+      operands[2] = gen_reg_rtx (<VPRED>mode);
+    emit_move_insn (operands[2], CONSTM1_RTX (<VPRED>mode));
+    emit_insn (gen_sve_ld1r<mode> (operands[0], operands[2], operands[1],
+                                  CONST0_RTX (<MODE>mode)));
+    DONE;
+  }
+  [(set_attr "length" "4,4,8")]
+)
+
+;; This is used for vec_duplicate<mode>s from memory, but can also
+;; be used by combine to optimize selects of a vec_duplicate<mode>
+;; with zero.
+(define_insn "sve_ld1r<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+       (unspec:SVE_ALL
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (vec_duplicate:SVE_ALL
+            (match_operand:<VEL> 2 "aarch64_sve_ld1r_operand" "Uty"))
+          (match_operand:SVE_ALL 3 "aarch64_simd_imm_zero")]
+         UNSPEC_SEL))]
+  "TARGET_SVE"
+  "ld1r<Vesize>\t%0.<Vetype>, %1/z, %2"
+)
+
+;; Load 128 bits from memory and duplicate to fill a vector.  Since there
+;; are so few operations on 128-bit "elements", we don't define a VNx1TI
+;; and simply use vectors of bytes instead.
+(define_insn "sve_ld1rq"
+  [(set (match_operand:VNx16QI 0 "register_operand" "=w")
+       (unspec:VNx16QI
+         [(match_operand:VNx16BI 1 "register_operand" "Upl")
+          (match_operand:TI 2 "aarch64_sve_ld1r_operand" "Uty")]
+         UNSPEC_LD1RQ))]
+  "TARGET_SVE"
+  "ld1rqb\t%0.b, %1/z, %2"
+)
+
+;; Implement a predicate broadcast by shifting the low bit of the scalar
+;; input into the top bit and using a WHILELO.  An alternative would be to
+;; duplicate the input and do a compare with zero.
+(define_expand "vec_duplicate<mode>"
+  [(set (match_operand:PRED_ALL 0 "register_operand")
+       (vec_duplicate:PRED_ALL (match_operand 1 "register_operand")))]
+  "TARGET_SVE"
+  {
+    rtx tmp = gen_reg_rtx (DImode);
+    rtx op1 = gen_lowpart (DImode, operands[1]);
+    emit_insn (gen_ashldi3 (tmp, op1, gen_int_mode (63, DImode)));
+    emit_insn (gen_while_ultdi<mode> (operands[0], const0_rtx, tmp));
+    DONE;
+  }
+)
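Note on the predicate vec_duplicate<mode> expander above: it shifts the scalar left by 63 so that only its low bit survives, in the top bit, and then uses WHILELO with a base of zero. The sketch below models why that yields an all-true or all-false predicate. It assumes the usual reading of WHILELO (lane k is active while base + k is below the limit, unsigned); whilelo and broadcast_pred are names made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of WHILELO: lane k is active while (base + k) < limit, unsigned.
// Wrap-around of base + k is ignored because the base is always 0 here.
std::vector<bool> whilelo(std::uint64_t base, std::uint64_t limit,
                          std::size_t lanes) {
  std::vector<bool> pred(lanes);
  for (std::size_t k = 0; k < lanes; ++k)
    pred[k] = (base + k) < limit;
  return pred;
}

// Broadcast the low bit of `scalar` to every lane of a predicate.
std::vector<bool> broadcast_pred(std::uint64_t scalar, std::size_t lanes) {
  // After the shift the limit is either 0 or 2^63, so every realistic lane
  // index compares the same way against it.
  std::uint64_t limit = scalar << 63;
  return whilelo(0, limit, lanes);
}
```

Only the low bit of the input matters in this model: an even input produces an all-false predicate and an odd input an all-true one.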
+
+(define_insn "vec_series<mode>"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w")
+       (vec_series:SVE_I
+         (match_operand:<VEL> 1 "aarch64_sve_index_operand" "Usi, r, r")
+         (match_operand:<VEL> 2 "aarch64_sve_index_operand" "r, Usi, r")))]
+  "TARGET_SVE"
+  "@
+   index\t%0.<Vetype>, #%1, %<vw>2
+   index\t%0.<Vetype>, %<vw>1, #%2
+   index\t%0.<Vetype>, %<vw>1, %<vw>2"
+)
+
+;; Optimize {x, x, x, x, ...} + {0, n, 2*n, 3*n, ...} if n is in range
+;; of an INDEX instruction.
+(define_insn "*vec_series<mode>_plus"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+       (plus:SVE_I
+         (vec_duplicate:SVE_I
+           (match_operand:<VEL> 1 "register_operand" "r"))
+         (match_operand:SVE_I 2 "immediate_operand")))]
+  "TARGET_SVE && aarch64_check_zero_based_sve_index_immediate (operands[2])"
+  {
+    operands[2] = aarch64_check_zero_based_sve_index_immediate (operands[2]);
+    return "index\t%0.<Vetype>, %<vw>1, #%2";
+  }
+)
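Note on the *vec_series<mode>_plus pattern above: it relies on {x, x, x, ...} + {0, n, 2*n, ...} being the same vector as a series with base x and step n, which a single INDEX instruction can produce when n is a suitable immediate. The sketch below checks that identity on plain vectors. index_series and dup_plus_series are hypothetical helper names, and the immediate-range check from the patch is not modelled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of INDEX: lane k holds base + k * step.
std::vector<std::int64_t> index_series(std::int64_t base, std::int64_t step,
                                       std::size_t lanes) {
  std::vector<std::int64_t> out(lanes);
  for (std::size_t k = 0; k < lanes; ++k)
    out[k] = base + static_cast<std::int64_t>(k) * step;
  return out;
}

// The unoptimized form: a duplicate of x added to a zero-based series.
std::vector<std::int64_t> dup_plus_series(std::int64_t x, std::int64_t n,
                                          std::size_t lanes) {
  std::vector<std::int64_t> dup(lanes, x);
  std::vector<std::int64_t> series = index_series(0, n, lanes);
  for (std::size_t k = 0; k < lanes; ++k)
    dup[k] += series[k];
  return dup;
}
```

For small values the two forms agree lane by lane, which is the equivalence the pattern exploits.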
+
+(define_expand "vec_perm<mode>"
+  [(match_operand:SVE_ALL 0 "register_operand")
+   (match_operand:SVE_ALL 1 "register_operand")
+   (match_operand:SVE_ALL 2 "register_operand")
+   (match_operand:<V_INT_EQUIV> 3 "aarch64_sve_vec_perm_operand")]
+  "TARGET_SVE && GET_MODE_NUNITS (<MODE>mode).is_constant ()"
+  {
+    aarch64_expand_sve_vec_perm (operands[0], operands[1],
+                                operands[2], operands[3]);
+    DONE;
+  }
+)
+
+(define_insn "*aarch64_sve_tbl<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+       (unspec:SVE_ALL
+         [(match_operand:SVE_ALL 1 "register_operand" "w")
+          (match_operand:<V_INT_EQUIV> 2 "register_operand" "w")]
+         UNSPEC_TBL))]
+  "TARGET_SVE"
+  "tbl\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_<perm_insn><perm_hilo><mode>"
+  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (unspec:PRED_ALL [(match_operand:PRED_ALL 1 "register_operand" "Upa")
+                         (match_operand:PRED_ALL 2 "register_operand" "Upa")]
+                        PERMUTE))]
+  "TARGET_SVE"
+  "<perm_insn><perm_hilo>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_<perm_insn><perm_hilo><mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+       (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")
+                        (match_operand:SVE_ALL 2 "register_operand" "w")]
+                       PERMUTE))]
+  "TARGET_SVE"
+  "<perm_insn><perm_hilo>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "*aarch64_sve_rev64<mode>"
+  [(set (match_operand:SVE_BHS 0 "register_operand" "=w")
+       (unspec:SVE_BHS
+         [(match_operand:VNx2BI 1 "register_operand" "Upl")
+          (unspec:SVE_BHS [(match_operand:SVE_BHS 2 "register_operand" "w")]
+                          UNSPEC_REV64)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "rev<Vesize>\t%0.d, %1/m, %2.d"
+)
+
+(define_insn "*aarch64_sve_rev32<mode>"
+  [(set (match_operand:SVE_BH 0 "register_operand" "=w")
+       (unspec:SVE_BH
+         [(match_operand:VNx4BI 1 "register_operand" "Upl")
+          (unspec:SVE_BH [(match_operand:SVE_BH 2 "register_operand" "w")]
+                         UNSPEC_REV32)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "rev<Vesize>\t%0.s, %1/m, %2.s"
+)
+
+(define_insn "*aarch64_sve_rev16vnx16qi"
+  [(set (match_operand:VNx16QI 0 "register_operand" "=w")
+       (unspec:VNx16QI
+         [(match_operand:VNx8BI 1 "register_operand" "Upl")
+          (unspec:VNx16QI [(match_operand:VNx16QI 2 "register_operand" "w")]
+                          UNSPEC_REV16)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "revb\t%0.h, %1/m, %2.h"
+)
+
+(define_insn "*aarch64_sve_rev<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+       (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")]
+                       UNSPEC_REV))]
+  "TARGET_SVE"
+  "rev\t%0.<Vetype>, %1.<Vetype>")
+
+(define_insn "*aarch64_sve_dup_lane<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+       (vec_duplicate:SVE_ALL
+         (vec_select:<VEL>
+           (match_operand:SVE_ALL 1 "register_operand" "w")
+           (parallel [(match_operand:SI 2 "const_int_operand")]))))]
+  "TARGET_SVE
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 63)"
+  "dup\t%0.<Vetype>, %1.<Vetype>[%2]"
+)
+
+;; Note that the immediate (third) operand is the lane index not
+;; the byte index.
+(define_insn "*aarch64_sve_ext<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+       (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "0")
+                        (match_operand:SVE_ALL 2 "register_operand" "w")
+                        (match_operand:SI 3 "const_int_operand")]
+                       UNSPEC_EXT))]
+  "TARGET_SVE
+   && IN_RANGE (INTVAL (operands[3]) * GET_MODE_SIZE (<VEL>mode), 0, 255)"
+  {
+    operands[3] = GEN_INT (INTVAL (operands[3]) * GET_MODE_SIZE (<VEL>mode));
+    return "ext\\t%0.b, %0.b, %2.b, #%3";
+  }
+)
+
+(define_insn "add<mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, w, w")
+       (plus:SVE_I
+         (match_operand:SVE_I 1 "register_operand" "%0, 0, 0, w")
+         (match_operand:SVE_I 2 "aarch64_sve_add_operand" "vsa, vsn, vsi, w")))]
+  "TARGET_SVE"
+  "@
+   add\t%0.<Vetype>, %0.<Vetype>, #%D2
+   sub\t%0.<Vetype>, %0.<Vetype>, #%N2
+   * return aarch64_output_sve_inc_dec_immediate (\"%0.<Vetype>\", operands[2]);
+   add\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+       (minus:SVE_I
+         (match_operand:SVE_I 1 "aarch64_sve_arith_operand" "w, vsa")
+         (match_operand:SVE_I 2 "register_operand" "w, 0")))]
+  "TARGET_SVE"
+  "@
+   sub\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>
+   subr\t%0.<Vetype>, %0.<Vetype>, #%D1"
+)
+
+;; Unpredicated multiplication.
+(define_expand "mul<mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand")
+       (unspec:SVE_I
+         [(match_dup 3)
+          (mult:SVE_I
+            (match_operand:SVE_I 1 "register_operand")
+            (match_operand:SVE_I 2 "aarch64_sve_mul_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Multiplication predicated with a PTRUE.  We don't actually need the
+;; predicate for the first alternative, but using Upa or X isn't likely
+;; to gain much and would make the instruction seem less uniform to the
+;; register allocator.
+(define_insn "*mul<mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+       (unspec:SVE_I
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (mult:SVE_I
+            (match_operand:SVE_I 2 "register_operand" "%0, 0")
+            (match_operand:SVE_I 3 "aarch64_sve_mul_operand" "vsm, w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   mul\t%0.<Vetype>, %0.<Vetype>, #%3
+   mul\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+(define_insn "*madd<mode>"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+       (plus:SVE_I
+         (unspec:SVE_I
+           [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+            (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
+                        (match_operand:SVE_I 3 "register_operand" "w, w"))]
+           UNSPEC_MERGE_PTRUE)
+         (match_operand:SVE_I 4 "register_operand" "w, 0")))]
+  "TARGET_SVE"
+  "@
+   mad\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+   mla\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+)
+
+(define_insn "*msub<mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+       (minus:SVE_I
+         (match_operand:SVE_I 4 "register_operand" "w, 0")
+         (unspec:SVE_I
+           [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+            (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
+                        (match_operand:SVE_I 3 "register_operand" "w, w"))]
+           UNSPEC_MERGE_PTRUE)))]
+  "TARGET_SVE"
+  "@
+   msb\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+   mls\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated NEG, NOT and POPCOUNT.
+(define_expand "<optab><mode>2"
+  [(set (match_operand:SVE_I 0 "register_operand")
+       (unspec:SVE_I
+         [(match_dup 2)
+          (SVE_INT_UNARY:SVE_I (match_operand:SVE_I 1 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; NEG, NOT and POPCOUNT predicated with a PTRUE.
+(define_insn "*<optab><mode>2"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+       (unspec:SVE_I
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (SVE_INT_UNARY:SVE_I
+            (match_operand:SVE_I 2 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "<sve_int_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Vector AND, ORR and XOR.
+(define_insn "<optab><mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+       (LOGICAL:SVE_I
+         (match_operand:SVE_I 1 "register_operand" "%0, w")
+         (match_operand:SVE_I 2 "aarch64_sve_logical_operand" "vsl, w")))]
+  "TARGET_SVE"
+  "@
+   <logical>\t%0.<Vetype>, %0.<Vetype>, #%C2
+   <logical>\t%0.d, %1.d, %2.d"
+)
+
+;; Vector AND, ORR and XOR on floating-point modes.  We avoid subregs
+;; by providing this, but we need to use UNSPECs since rtx logical ops
+;; aren't defined for floating-point modes.
+(define_insn "*<optab><mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+       (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand" "w")
+                      (match_operand:SVE_F 2 "register_operand" "w")]
+                     LOGICALF))]
+  "TARGET_SVE"
+  "<logicalf_op>\t%0.d, %1.d, %2.d"
+)
+
+;; REG_EQUAL notes on "not<mode>3" should ensure that we can generate
+;; this pattern even though the NOT instruction itself is predicated.
+(define_insn "bic<mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+       (and:SVE_I
+         (not:SVE_I (match_operand:SVE_I 1 "register_operand" "w"))
+         (match_operand:SVE_I 2 "register_operand" "w")))]
+  "TARGET_SVE"
+  "bic\t%0.d, %2.d, %1.d"
+)
+
+;; Predicate AND.  We can reuse one of the inputs as the GP.
+(define_insn "and<mode>3"
+  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand" "Upa")
+                     (match_operand:PRED_ALL 2 "register_operand" "Upa")))]
+  "TARGET_SVE"
+  "and\t%0.b, %1/z, %1.b, %2.b"
+)
+
+;; Unpredicated predicate ORR and XOR.
+(define_expand "<optab><mode>3"
+  [(set (match_operand:PRED_ALL 0 "register_operand")
+       (and:PRED_ALL
+         (LOGICAL_OR:PRED_ALL
+           (match_operand:PRED_ALL 1 "register_operand")
+           (match_operand:PRED_ALL 2 "register_operand"))
+         (match_dup 3)))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+  }
+)
+
+;; Predicated predicate ORR and XOR.
+(define_insn "pred_<optab><mode>3"
+  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (and:PRED_ALL
+         (LOGICAL:PRED_ALL
+           (match_operand:PRED_ALL 2 "register_operand" "Upa")
+           (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+         (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+  "TARGET_SVE"
+  "<logical>\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Perform a logical operation on operands 2 and 3, using operand 1 as
+;; the GP (which is known to be a PTRUE).  Store the result in operand 0
+;; and set the flags in the same way as for PTEST.  The (and ...) in the
+;; UNSPEC_PTEST_PTRUE is logically redundant, but means that the tested
+;; value is structurally equivalent to the rhs of the second set.
+(define_insn "*<optab><mode>3_cc"
+  [(set (reg:CC CC_REGNUM)
+       (compare:CC
+         (unspec:SI [(match_operand:PRED_ALL 1 "register_operand" "Upa")
+                     (and:PRED_ALL
+                       (LOGICAL:PRED_ALL
+                         (match_operand:PRED_ALL 2 "register_operand" "Upa")
+                         (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+                       (match_dup 1))]
+                    UNSPEC_PTEST_PTRUE)
+         (const_int 0)))
+   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (and:PRED_ALL (LOGICAL:PRED_ALL (match_dup 2) (match_dup 3))
+                     (match_dup 1)))]
+  "TARGET_SVE"
+  "<logical>s\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Unpredicated predicate inverse.
+(define_expand "one_cmpl<mode>2"
+  [(set (match_operand:PRED_ALL 0 "register_operand")
+       (and:PRED_ALL
+         (not:PRED_ALL (match_operand:PRED_ALL 1 "register_operand"))
+         (match_dup 2)))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+  }
+)
+
+;; Predicated predicate inverse.
+(define_insn "*one_cmpl<mode>3"
+  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (and:PRED_ALL
+         (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+         (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+  "TARGET_SVE"
+  "not\t%0.b, %1/z, %2.b"
+)
+
+;; Predicated predicate BIC and ORN.
+(define_insn "*<nlogical><mode>3"
+  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (and:PRED_ALL
+         (NLOGICAL:PRED_ALL
+           (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+           (match_operand:PRED_ALL 3 "register_operand" "Upa"))
+         (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+  "TARGET_SVE"
+  "<nlogical>\t%0.b, %1/z, %3.b, %2.b"
+)
+
+;; Predicated predicate NAND and NOR.
+(define_insn "*<logical_nn><mode>3"
+  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (and:PRED_ALL
+         (NLOGICAL:PRED_ALL
+           (not:PRED_ALL (match_operand:PRED_ALL 2 "register_operand" "Upa"))
+           (not:PRED_ALL (match_operand:PRED_ALL 3 "register_operand" "Upa")))
+         (match_operand:PRED_ALL 1 "register_operand" "Upa")))]
+  "TARGET_SVE"
+  "<logical_nn>\t%0.b, %1/z, %2.b, %3.b"
+)
+
+;; Unpredicated LSL, LSR and ASR by a vector.
+(define_expand "v<optab><mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand")
+       (unspec:SVE_I
+         [(match_dup 3)
+          (ASHIFT:SVE_I
+            (match_operand:SVE_I 1 "register_operand")
+            (match_operand:SVE_I 2 "aarch64_sve_<lr>shift_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; LSL, LSR and ASR by a vector, predicated with a PTRUE.  We don't
+;; actually need the predicate for the first alternative, but using Upa
+;; or X isn't likely to gain much and would make the instruction seem
+;; less uniform to the register allocator.
+(define_insn "*v<optab><mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
+       (unspec:SVE_I
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (ASHIFT:SVE_I
+            (match_operand:SVE_I 2 "register_operand" "w, 0")
+            (match_operand:SVE_I 3 "aarch64_sve_<lr>shift_operand" "D<lr>, w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   <shift>\t%0.<Vetype>, %2.<Vetype>, #%3
+   <shift>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; LSL, LSR and ASR by a scalar, which expands into one of the vector
+;; shifts above.
+(define_expand "<ASHIFT:optab><mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand")
+       (ASHIFT:SVE_I (match_operand:SVE_I 1 "register_operand")
+                     (match_operand:<VEL> 2 "general_operand")))]
+  "TARGET_SVE"
+  {
+    rtx amount;
+    if (CONST_INT_P (operands[2]))
+      {
+       amount = gen_const_vec_duplicate (<MODE>mode, operands[2]);
+       if (!aarch64_sve_<lr>shift_operand (operands[2], <MODE>mode))
+         amount = force_reg (<MODE>mode, amount);
+      }
+    else
+      {
+       amount = gen_reg_rtx (<MODE>mode);
+       emit_insn (gen_vec_duplicate<mode> (amount,
+                                           convert_to_mode (<VEL>mode,
+                                                            operands[2], 0)));
+      }
+    emit_insn (gen_v<optab><mode>3 (operands[0], operands[1], amount));
+    DONE;
+  }
+)
+
+;; Test all bits of operand 1.  Operand 0 is a governing predicate (GP)
+;; that is known to hold PTRUE.
+;;
+;; Using UNSPEC_PTEST_PTRUE allows combine patterns to assume that the GP
+;; is a PTRUE even if the optimizers haven't yet been able to propagate
+;; the constant.  We would use a separate unspec code for PTESTs involving
+;; GPs that might not be PTRUEs.
+(define_insn "ptest_ptrue<mode>"
+  [(set (reg:CC CC_REGNUM)
+       (compare:CC
+         (unspec:SI [(match_operand:PRED_ALL 0 "register_operand" "Upa")
+                     (match_operand:PRED_ALL 1 "register_operand" "Upa")]
+                    UNSPEC_PTEST_PTRUE)
+         (const_int 0)))]
+  "TARGET_SVE"
+  "ptest\t%0, %1.b"
+)
+
+;; Set element I of the result if operand1 + J < operand2 for all J in [0, I],
+;; with the comparison being unsigned.
+(define_insn "while_ult<GPI:mode><PRED_ALL:mode>"
+  [(set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (unspec:PRED_ALL [(match_operand:GPI 1 "aarch64_reg_or_zero" "rZ")
+                         (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")]
+                        UNSPEC_WHILE_LO))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SVE"
+  "whilelo\t%0.<PRED_ALL:Vetype>, %<w>1, %<w>2"
+)
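
The rule in the comment above can be written as a short reference model. This is a sketch under stated assumptions: `whilelo`, `num_elements` and `bits` are illustrative names, and the scalar operands are assumed to wrap modulo `2**bits` as unsigned values. Requiring the comparison to hold for every J up to I means the result stays false after the first failure, even if the counter later wraps round to a small value.

```python
def whilelo(op1, op2, num_elements, bits=64):
    """Model of the while_ult pattern: element I is set if
    op1 + J < op2 (unsigned) for all J in [0, I]."""
    mask = (1 << bits) - 1
    result = []
    active = True
    for i in range(num_elements):
        # Once one comparison fails, every later element stays false.
        active = active and ((op1 + i) & mask) < (op2 & mask)
        result.append(active)
    return result
```

For a loop counter starting at 0 with limit 3 and four elements per vector, the model yields three active elements followed by one inactive element, which is the usual shape of a loop-control predicate on the final iteration.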
+
+;; WHILELO sets the flags in the same way as a PTEST with a PTRUE GP.
+;; Handle the case in which both results are useful.  The GP operand
+;; to the PTEST isn't needed, so we allow it to be anything.
+(define_insn_and_split "while_ult<GPI:mode><PRED_ALL:mode>_cc"
+  [(set (reg:CC CC_REGNUM)
+       (compare:CC
+         (unspec:SI [(match_operand:PRED_ALL 1)
+                     (unspec:PRED_ALL
+                       [(match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")
+                        (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")]
+                       UNSPEC_WHILE_LO)]
+                    UNSPEC_PTEST_PTRUE)
+         (const_int 0)))
+   (set (match_operand:PRED_ALL 0 "register_operand" "=Upa")
+       (unspec:PRED_ALL [(match_dup 2)
+                         (match_dup 3)]
+                        UNSPEC_WHILE_LO))]
+  "TARGET_SVE"
+  "whilelo\t%0.<PRED_ALL:Vetype>, %<w>2, %<w>3"
+  ;; Force the compiler to drop the unused predicate operand, so that we
+  ;; don't have an unnecessary PTRUE.
+  "&& !CONSTANT_P (operands[1])"
+  [(const_int 0)]
+  {
+    emit_insn (gen_while_ult<GPI:mode><PRED_ALL:mode>_cc
+              (operands[0], CONSTM1_RTX (<MODE>mode),
+               operands[2], operands[3]));
+    DONE;
+  }
+)
+
+;; Predicated integer comparison.
+(define_insn "*vec_cmp<cmp_op>_<mode>"
+  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+       (unspec:<VPRED>
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (match_operand:SVE_I 2 "register_operand" "w, w")
+          (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+         SVE_COND_INT_CMP))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SVE"
+  "@
+   cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+   cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated integer comparison in which only the flags result is interesting.
+(define_insn "*vec_cmp<cmp_op>_<mode>_ptest"
+  [(set (reg:CC CC_REGNUM)
+       (compare:CC
+         (unspec:SI
+           [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+            (unspec:<VPRED>
+              [(match_dup 1)
+               (match_operand:SVE_I 2 "register_operand" "w, w")
+               (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+              SVE_COND_INT_CMP)]
+           UNSPEC_PTEST_PTRUE)
+         (const_int 0)))
+   (clobber (match_scratch:<VPRED> 0 "=Upa, Upa"))]
+  "TARGET_SVE"
+  "@
+   cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+   cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated comparison in which both the flag and predicate results
+;; are interesting.
+(define_insn "*vec_cmp<cmp_op>_<mode>_cc"
+  [(set (reg:CC CC_REGNUM)
+       (compare:CC
+         (unspec:SI
+           [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+            (unspec:<VPRED>
+              [(match_dup 1)
+               (match_operand:SVE_I 2 "register_operand" "w, w")
+               (match_operand:SVE_I 3 "aarch64_sve_cmp_<imm_con>_operand" "<imm_con>, w")]
+              SVE_COND_INT_CMP)]
+           UNSPEC_PTEST_PTRUE)
+         (const_int 0)))
+   (set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+       (unspec:<VPRED>
+         [(match_dup 1)
+          (match_dup 2)
+          (match_dup 3)]
+         SVE_COND_INT_CMP))]
+  "TARGET_SVE"
+  "@
+   cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #%3
+   cmp<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated floating-point comparison (excluding FCMUO, which doesn't
+;; allow #0.0 as an operand).
+(define_insn "*vec_fcm<cmp_op><mode>"
+  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa, Upa")
+       (unspec:<VPRED>
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (match_operand:SVE_F 2 "register_operand" "w, w")
+          (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w")]
+         SVE_COND_FP_CMP))]
+  "TARGET_SVE"
+  "@
+   fcm<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, #0.0
+   fcm<cmp_op>\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Predicated FCMUO.
+(define_insn "*vec_fcmuo<mode>"
+  [(set (match_operand:<VPRED> 0 "register_operand" "=Upa")
+       (unspec:<VPRED>
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (match_operand:SVE_F 2 "register_operand" "w")
+          (match_operand:SVE_F 3 "register_operand" "w")]
+         UNSPEC_COND_UO))]
+  "TARGET_SVE"
+  "fcmuo\t%0.<Vetype>, %1/z, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; vcond_mask operand order: true, false, mask
+;; UNSPEC_SEL operand order: mask, true, false (as for VEC_COND_EXPR)
+;; SEL operand order:        mask, true, false
+(define_insn "vcond_mask_<mode><vpred>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+       (unspec:SVE_ALL
+         [(match_operand:<VPRED> 3 "register_operand" "Upa")
+          (match_operand:SVE_ALL 1 "register_operand" "w")
+          (match_operand:SVE_ALL 2 "register_operand" "w")]
+         UNSPEC_SEL))]
+  "TARGET_SVE"
+  "sel\t%0.<Vetype>, %3, %1.<Vetype>, %2.<Vetype>"
+)
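
The three operand orders listed in the comment above are easy to mix up, so here is a minimal model of how they relate. It assumes vectors and masks are plain Python lists; `sel` and `vcond_mask` are illustrative function names that mirror the instruction and the optab, not GCC APIs.

```python
def sel(mask, if_true, if_false):
    """SEL / UNSPEC_SEL order: mask, true value, false value."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def vcond_mask(if_true, if_false, mask):
    """vcond_mask optab order: true value, false value, mask.

    The pattern forwards operands 1, 2 and 3 to SEL as %3, %1, %2.
    """
    return sel(mask, if_true, if_false)
```

Calling `vcond_mask([1, 2, 3], [10, 20, 30], [True, False, True])` picks the first and third elements from the "true" vector and the second from the "false" vector.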
+
+;; Selects between a duplicated immediate and zero.
+(define_insn "aarch64_sve_dup<mode>_const"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+       (unspec:SVE_I
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (match_operand:SVE_I 2 "aarch64_sve_dup_immediate")
+          (match_operand:SVE_I 3 "aarch64_simd_imm_zero")]
+         UNSPEC_SEL))]
+  "TARGET_SVE"
+  "mov\t%0.<Vetype>, %1/z, #%2"
+)
+
+;; Integer (signed) vcond.  Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to aarch64_expand_sve_vcond instead.
+(define_expand "vcond<mode><v_int_equiv>"
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+       (if_then_else:SVE_ALL
+         (match_operator 3 "comparison_operator"
+           [(match_operand:<V_INT_EQUIV> 4 "register_operand")
+            (match_operand:<V_INT_EQUIV> 5 "nonmemory_operand")])
+         (match_operand:SVE_ALL 1 "register_operand")
+         (match_operand:SVE_ALL 2 "register_operand")))]
+  "TARGET_SVE"
+  {
+    aarch64_expand_sve_vcond (<MODE>mode, <V_INT_EQUIV>mode, operands);
+    DONE;
+  }
+)
+
+;; Integer vcondu.  Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to aarch64_expand_sve_vcond instead.
+(define_expand "vcondu<mode><v_int_equiv>"
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+       (if_then_else:SVE_ALL
+         (match_operator 3 "comparison_operator"
+           [(match_operand:<V_INT_EQUIV> 4 "register_operand")
+            (match_operand:<V_INT_EQUIV> 5 "nonmemory_operand")])
+         (match_operand:SVE_ALL 1 "register_operand")
+         (match_operand:SVE_ALL 2 "register_operand")))]
+  "TARGET_SVE"
+  {
+    aarch64_expand_sve_vcond (<MODE>mode, <V_INT_EQUIV>mode, operands);
+    DONE;
+  }
+)
+
+;; Floating-point vcond.  All comparisons except FCMUO allow a zero
+;; operand; aarch64_expand_sve_vcond handles the case of an FCMUO
+;; with zero.
+(define_expand "vcond<mode><v_fp_equiv>"
+  [(set (match_operand:SVE_SD 0 "register_operand")
+       (if_then_else:SVE_SD
+         (match_operator 3 "comparison_operator"
+           [(match_operand:<V_FP_EQUIV> 4 "register_operand")
+            (match_operand:<V_FP_EQUIV> 5 "aarch64_simd_reg_or_zero")])
+         (match_operand:SVE_SD 1 "register_operand")
+         (match_operand:SVE_SD 2 "register_operand")))]
+  "TARGET_SVE"
+  {
+    aarch64_expand_sve_vcond (<MODE>mode, <V_FP_EQUIV>mode, operands);
+    DONE;
+  }
+)
+
+;; Signed integer comparisons.  Don't enforce an immediate range here, since
+;; it depends on the comparison; leave it to aarch64_expand_sve_vec_cmp_int
+;; instead.
+(define_expand "vec_cmp<mode><vpred>"
+  [(parallel
+    [(set (match_operand:<VPRED> 0 "register_operand")
+         (match_operator:<VPRED> 1 "comparison_operator"
+           [(match_operand:SVE_I 2 "register_operand")
+            (match_operand:SVE_I 3 "nonmemory_operand")]))
+     (clobber (reg:CC CC_REGNUM))])]
+  "TARGET_SVE"
+  {
+    aarch64_expand_sve_vec_cmp_int (operands[0], GET_CODE (operands[1]),
+                                   operands[2], operands[3]);
+    DONE;
+  }
+)
+
+;; Unsigned integer comparisons.  Don't enforce an immediate range here, since
+;; it depends on the comparison; leave it to aarch64_expand_sve_vec_cmp_int
+;; instead.
+(define_expand "vec_cmpu<mode><vpred>"
+  [(parallel
+    [(set (match_operand:<VPRED> 0 "register_operand")
+         (match_operator:<VPRED> 1 "comparison_operator"
+           [(match_operand:SVE_I 2 "register_operand")
+            (match_operand:SVE_I 3 "nonmemory_operand")]))
+     (clobber (reg:CC CC_REGNUM))])]
+  "TARGET_SVE"
+  {
+    aarch64_expand_sve_vec_cmp_int (operands[0], GET_CODE (operands[1]),
+                                   operands[2], operands[3]);
+    DONE;
+  }
+)
+
+;; Floating-point comparisons.  All comparisons except FCMUO allow a zero
+;; operand; aarch64_expand_sve_vec_cmp_float handles the case of an FCMUO
+;; with zero.
+(define_expand "vec_cmp<mode><vpred>"
+  [(set (match_operand:<VPRED> 0 "register_operand")
+       (match_operator:<VPRED> 1 "comparison_operator"
+         [(match_operand:SVE_F 2 "register_operand")
+          (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero")]))]
+  "TARGET_SVE"
+  {
+    aarch64_expand_sve_vec_cmp_float (operands[0], GET_CODE (operands[1]),
+                                     operands[2], operands[3], false);
+    DONE;
+  }
+)
+
+;; Branch based on predicate equality or inequality.
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+       (if_then_else
+         (match_operator 0 "aarch64_equality_operator"
+           [(match_operand:PRED_ALL 1 "register_operand")
+            (match_operand:PRED_ALL 2 "aarch64_simd_reg_or_zero")])
+         (label_ref (match_operand 3 ""))
+         (pc)))]
+  ""
+  {
+    rtx ptrue = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
+    rtx pred;
+    if (operands[2] == CONST0_RTX (<MODE>mode))
+      pred = operands[1];
+    else
+      {
+       pred = gen_reg_rtx (<MODE>mode);
+       emit_insn (gen_pred_xor<mode>3 (pred, ptrue, operands[1],
+                                       operands[2]));
+      }
+    emit_insn (gen_ptest_ptrue<mode> (ptrue, pred));
+    operands[1] = gen_rtx_REG (CCmode, CC_REGNUM);
+    operands[2] = const0_rtx;
+  }
+)
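
The cbranch expansion above reduces predicate (in)equality to a PTEST: two predicates differ exactly when their XOR, governed by a PTRUE, has at least one element set. The sketch below models that reasoning with lists of booleans. The function names are hypothetical, and the flag result of PTEST is simplified to a single "any active element set" boolean.

```python
def pred_xor(gp, a, b):
    """Zeroing predicated XOR of two predicates."""
    return [g and (x != y) for g, x, y in zip(gp, a, b)]


def ptest_any(gp, pred):
    """True if any element of pred is set where gp is active."""
    return any(g and p for g, p in zip(gp, pred))


def predicates_differ(a, b):
    """Model of the cbranch expansion for an NE comparison."""
    ptrue = [True] * len(a)
    return ptest_any(ptrue, pred_xor(ptrue, a, b))
```

When the second operand is the all-false constant, the XOR is the first operand itself, which is why the expander skips the XOR in that case and tests operand 1 directly.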
+
+;; Unpredicated integer MIN/MAX.
+(define_expand "<su><maxmin><mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand")
+       (unspec:SVE_I
+         [(match_dup 3)
+          (MAXMIN:SVE_I (match_operand:SVE_I 1 "register_operand")
+                        (match_operand:SVE_I 2 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Integer MIN/MAX predicated with a PTRUE.
+(define_insn "*<su><maxmin><mode>3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+       (unspec:SVE_I
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (MAXMIN:SVE_I (match_operand:SVE_I 2 "register_operand" "%0")
+                        (match_operand:SVE_I 3 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "<su><maxmin>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point MIN/MAX.
+(define_expand "<su><maxmin><mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 3)
+          (FMAXMIN:SVE_F (match_operand:SVE_F 1 "register_operand")
+                         (match_operand:SVE_F 2 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Floating-point MIN/MAX predicated with a PTRUE.
+(define_insn "*<su><maxmin><mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (FMAXMIN:SVE_F (match_operand:SVE_F 2 "register_operand" "%0")
+                         (match_operand:SVE_F 3 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "f<maxmin>nm\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated fmin/fmax.
+(define_expand "<maxmin_uns><mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 3)
+          (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand")
+                         (match_operand:SVE_F 2 "register_operand")]
+                        FMAXMIN_UNS)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; fmin/fmax predicated with a PTRUE.
+(define_insn "*<maxmin_uns><mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (unspec:SVE_F [(match_operand:SVE_F 2 "register_operand" "%0")
+                         (match_operand:SVE_F 3 "register_operand" "w")]
+                        FMAXMIN_UNS)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "<maxmin_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated integer add reduction.
+(define_expand "reduc_plus_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand")
+       (unspec:<VEL> [(match_dup 2)
+                      (match_operand:SVE_I 1 "register_operand")]
+                     UNSPEC_ADDV))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Predicated integer add reduction.  The result is always 64 bits.
+(define_insn "*reduc_plus_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+       (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+                      (match_operand:SVE_I 2 "register_operand" "w")]
+                     UNSPEC_ADDV))]
+  "TARGET_SVE"
+  "uaddv\t%d0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated floating-point add reduction.
+(define_expand "reduc_plus_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand")
+       (unspec:<VEL> [(match_dup 2)
+                      (match_operand:SVE_F 1 "register_operand")]
+                     UNSPEC_FADDV))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Predicated floating-point add reduction.
+(define_insn "*reduc_plus_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+       (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+                      (match_operand:SVE_F 2 "register_operand" "w")]
+                     UNSPEC_FADDV))]
+  "TARGET_SVE"
+  "faddv\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated integer MIN/MAX reduction.
+(define_expand "reduc_<maxmin_uns>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand")
+       (unspec:<VEL> [(match_dup 2)
+                      (match_operand:SVE_I 1 "register_operand")]
+                     MAXMINV))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Predicated integer MIN/MAX reduction.
+(define_insn "*reduc_<maxmin_uns>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+       (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+                      (match_operand:SVE_I 2 "register_operand" "w")]
+                     MAXMINV))]
+  "TARGET_SVE"
+  "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated floating-point MIN/MAX reduction.
+(define_expand "reduc_<maxmin_uns>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand")
+       (unspec:<VEL> [(match_dup 2)
+                      (match_operand:SVE_F 1 "register_operand")]
+                     FMAXMINV))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Predicated floating-point MIN/MAX reduction.
+(define_insn "*reduc_<maxmin_uns>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+       (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+                      (match_operand:SVE_F 2 "register_operand" "w")]
+                     FMAXMINV))]
+  "TARGET_SVE"
+  "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
+;; Unpredicated floating-point addition.
+(define_expand "add<mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 3)
+          (plus:SVE_F
+            (match_operand:SVE_F 1 "register_operand")
+            (match_operand:SVE_F 2 "aarch64_sve_float_arith_with_sub_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Floating-point addition predicated with a PTRUE.
+(define_insn "*add<mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl")
+          (plus:SVE_F
+             (match_operand:SVE_F 2 "register_operand" "%0, 0, w")
+             (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+   fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
+   fadd\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point subtraction.
+(define_expand "sub<mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 3)
+          (minus:SVE_F
+            (match_operand:SVE_F 1 "aarch64_sve_float_arith_operand")
+            (match_operand:SVE_F 2 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Floating-point subtraction predicated with a PTRUE.
+(define_insn "*sub<mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl, Upl")
+          (minus:SVE_F
+            (match_operand:SVE_F 2 "aarch64_sve_float_arith_operand" "0, 0, vsA, w")
+            (match_operand:SVE_F 3 "aarch64_sve_float_arith_with_sub_operand" "vsA, vsN, 0, w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE
+   && (register_operand (operands[2], <MODE>mode)
+       || register_operand (operands[3], <MODE>mode))"
+  "@
+   fsub\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+   fadd\t%0.<Vetype>, %1/m, %0.<Vetype>, #%N3
+   fsubr\t%0.<Vetype>, %1/m, %0.<Vetype>, #%2
+   fsub\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated floating-point multiplication.
+(define_expand "mul<mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 3)
+          (mult:SVE_F
+            (match_operand:SVE_F 1 "register_operand")
+            (match_operand:SVE_F 2 "aarch64_sve_float_mul_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Floating-point multiplication predicated with a PTRUE.
+(define_insn "*mul<mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (mult:SVE_F
+            (match_operand:SVE_F 2 "register_operand" "%0, w")
+            (match_operand:SVE_F 3 "aarch64_sve_float_mul_operand" "vsM, w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   fmul\t%0.<Vetype>, %1/m, %0.<Vetype>, #%3
+   fmul\t%0.<Vetype>, %2.<Vetype>, %3.<Vetype>"
+)
+
+;; Unpredicated fma (%0 = (%1 * %2) + %3).
+(define_expand "fma<mode>4"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 4)
+          (fma:SVE_F (match_operand:SVE_F 1 "register_operand")
+                     (match_operand:SVE_F 2 "register_operand")
+                     (match_operand:SVE_F 3 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; fma predicated with a PTRUE.
+(define_insn "*fma<mode>4"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (fma:SVE_F (match_operand:SVE_F 3 "register_operand" "%0, w")
+                     (match_operand:SVE_F 4 "register_operand" "w, w")
+                     (match_operand:SVE_F 2 "register_operand" "w, 0"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   fmad\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+   fmla\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fnma (%0 = (-%1 * %2) + %3).
+(define_expand "fnma<mode>4"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 4)
+          (fma:SVE_F (neg:SVE_F
+                       (match_operand:SVE_F 1 "register_operand"))
+                     (match_operand:SVE_F 2 "register_operand")
+                     (match_operand:SVE_F 3 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; fnma predicated with a PTRUE.
+(define_insn "*fnma<mode>4"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (fma:SVE_F (neg:SVE_F
+                       (match_operand:SVE_F 3 "register_operand" "%0, w"))
+                     (match_operand:SVE_F 4 "register_operand" "w, w")
+                     (match_operand:SVE_F 2 "register_operand" "w, 0"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   fmsb\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+   fmls\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fms (%0 = (%1 * %2) - %3).
+(define_expand "fms<mode>4"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 4)
+          (fma:SVE_F (match_operand:SVE_F 1 "register_operand")
+                     (match_operand:SVE_F 2 "register_operand")
+                     (neg:SVE_F
+                       (match_operand:SVE_F 3 "register_operand")))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; fms predicated with a PTRUE.
+(define_insn "*fms<mode>4"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (fma:SVE_F (match_operand:SVE_F 3 "register_operand" "%0, w")
+                     (match_operand:SVE_F 4 "register_operand" "w, w")
+                     (neg:SVE_F
+                       (match_operand:SVE_F 2 "register_operand" "w, 0")))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   fnmsb\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+   fnmls\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Unpredicated fnms (%0 = (-%1 * %2) - %3).
+(define_expand "fnms<mode>4"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 4)
+          (fma:SVE_F (neg:SVE_F
+                       (match_operand:SVE_F 1 "register_operand"))
+                     (match_operand:SVE_F 2 "register_operand")
+                     (neg:SVE_F
+                       (match_operand:SVE_F 3 "register_operand")))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[4] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; fnms predicated with a PTRUE.
+(define_insn "*fnms<mode>4"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (fma:SVE_F (neg:SVE_F
+                       (match_operand:SVE_F 3 "register_operand" "%0, w"))
+                     (match_operand:SVE_F 4 "register_operand" "w, w")
+                     (neg:SVE_F
+                       (match_operand:SVE_F 2 "register_operand" "w, 0")))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   fnmad\t%0.<Vetype>, %1/m, %4.<Vetype>, %2.<Vetype>
+   fnmla\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
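
The four fused multiply-add optabs above differ only in which terms are negated, and each maps to a pair of SVE mnemonics depending on which input is tied to the destination. The table below collects the formulas from the comments and the mnemonics from the output templates. It is an illustrative sketch: `FMA_FAMILY` and `apply_optab` are hypothetical names, and exact `Fraction` arithmetic stands in for the single rounding step of a real fused operation.

```python
from fractions import Fraction

# optab name -> (formula from the pattern comment,
#                mnemonics for the two define_insn alternatives)
FMA_FAMILY = {
    "fma":  (lambda a, b, c:  a * b + c, ("fmad", "fmla")),
    "fnma": (lambda a, b, c: -a * b + c, ("fmsb", "fmls")),
    "fms":  (lambda a, b, c:  a * b - c, ("fnmsb", "fnmls")),
    "fnms": (lambda a, b, c: -a * b - c, ("fnmad", "fnmla")),
}


def apply_optab(name, a, b, c):
    """Apply the named formula element-wise, without intermediate rounding."""
    formula, _mnemonics = FMA_FAMILY[name]
    return [formula(Fraction(x), Fraction(y), Fraction(z))
            for x, y, z in zip(a, b, c)]
```

In each pair, the first mnemonic is used when the destination is tied to a multiplicand and the second when it is tied to the addend.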
+
+;; Unpredicated floating-point division.
+(define_expand "div<mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 3)
+          (div:SVE_F (match_operand:SVE_F 1 "register_operand")
+                     (match_operand:SVE_F 2 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Floating-point division predicated with a PTRUE.
+(define_insn "*div<mode>3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+          (div:SVE_F (match_operand:SVE_F 2 "register_operand" "0, w")
+                     (match_operand:SVE_F 3 "register_operand" "w, 0"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "@
+   fdiv\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+   fdivr\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"
+)
+
+;; Unpredicated FNEG, FABS and FSQRT.
+(define_expand "<optab><mode>2"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 2)
+          (SVE_FP_UNARY:SVE_F (match_operand:SVE_F 1 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; FNEG, FABS and FSQRT predicated with a PTRUE.
+(define_insn "*<optab><mode>2"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (SVE_FP_UNARY:SVE_F (match_operand:SVE_F 2 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "<sve_fp_op>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Unpredicated FRINTy.
+(define_expand "<frint_pattern><mode>2"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 2)
+          (unspec:SVE_F [(match_operand:SVE_F 1 "register_operand")]
+                        FRINT)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; FRINTy predicated with a PTRUE.
+(define_insn "*<frint_pattern><mode>2"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+       (unspec:SVE_F
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (unspec:SVE_F [(match_operand:SVE_F 2 "register_operand" "w")]
+                        FRINT)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "frint<frint_suffix>\t%0.<Vetype>, %1/m, %2.<Vetype>"
+)
+
+;; Unpredicated conversion of floats to integers of the same size (HF to HI,
+;; SF to SI or DF to DI).
+(define_expand "<fix_trunc_optab><mode><v_int_equiv>2"
+  [(set (match_operand:<V_INT_EQUIV> 0 "register_operand")
+       (unspec:<V_INT_EQUIV>
+         [(match_dup 2)
+          (FIXUORS:<V_INT_EQUIV>
+            (match_operand:SVE_F 1 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Conversion of HF to DI, SI or HI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>v16hsf<mode>2"
+  [(set (match_operand:SVE_HSDI 0 "register_operand" "=w")
+       (unspec:SVE_HSDI
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (FIXUORS:SVE_HSDI
+            (match_operand:VNx8HF 2 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "fcvtz<su>\t%0.<Vetype>, %1/m, %2.h"
+)
+
+;; Conversion of SF to DI or SI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>vnx4sf<mode>2"
+  [(set (match_operand:SVE_SDI 0 "register_operand" "=w")
+       (unspec:SVE_SDI
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (FIXUORS:SVE_SDI
+            (match_operand:VNx4SF 2 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "fcvtz<su>\t%0.<Vetype>, %1/m, %2.s"
+)
+
+;; Conversion of DF to DI or SI, predicated with a PTRUE.
+(define_insn "*<fix_trunc_optab>vnx2df<mode>2"
+  [(set (match_operand:SVE_SDI 0 "register_operand" "=w")
+       (unspec:SVE_SDI
+         [(match_operand:VNx2BI 1 "register_operand" "Upl")
+          (FIXUORS:SVE_SDI
+            (match_operand:VNx2DF 2 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "fcvtz<su>\t%0.<Vetype>, %1/m, %2.d"
+)
+
+;; Unpredicated conversion of integers to floats of the same size
+;; (HI to HF, SI to SF or DI to DF).
+(define_expand "<optab><v_int_equiv><mode>2"
+  [(set (match_operand:SVE_F 0 "register_operand")
+       (unspec:SVE_F
+         [(match_dup 2)
+          (FLOATUORS:SVE_F
+            (match_operand:<V_INT_EQUIV> 1 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+;; Conversion of DI, SI or HI to the same number of HFs, predicated
+;; with a PTRUE.
+(define_insn "*<optab><mode>vnx8hf2"
+  [(set (match_operand:VNx8HF 0 "register_operand" "=w")
+       (unspec:VNx8HF
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (FLOATUORS:VNx8HF
+            (match_operand:SVE_HSDI 2 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "<su_optab>cvtf\t%0.h, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DI or SI to the same number of SFs, predicated with a PTRUE.
+(define_insn "*<optab><mode>vnx4sf2"
+  [(set (match_operand:VNx4SF 0 "register_operand" "=w")
+       (unspec:VNx4SF
+         [(match_operand:<VPRED> 1 "register_operand" "Upl")
+          (FLOATUORS:VNx4SF
+            (match_operand:SVE_SDI 2 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "<su_optab>cvtf\t%0.s, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DI or SI to DF, predicated with a PTRUE.
+(define_insn "*<optab><mode>vnx2df2"
+  [(set (match_operand:VNx2DF 0 "register_operand" "=w")
+       (unspec:VNx2DF
+         [(match_operand:VNx2BI 1 "register_operand" "Upl")
+          (FLOATUORS:VNx2DF
+            (match_operand:SVE_SDI 2 "register_operand" "w"))]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "<su_optab>cvtf\t%0.d, %1/m, %2.<Vetype>"
+)
+
+;; Conversion of DFs to the same number of SFs, or SFs to the same number
+;; of HFs.
+(define_insn "*trunc<Vwide><mode>2"
+  [(set (match_operand:SVE_HSF 0 "register_operand" "=w")
+       (unspec:SVE_HSF
+         [(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")
+          (unspec:SVE_HSF
+            [(match_operand:<VWIDE> 2 "register_operand" "w")]
+            UNSPEC_FLOAT_CONVERT)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "fcvt\t%0.<Vetype>, %1/m, %2.<Vewtype>"
+)
+
+;; Conversion of SFs to the same number of DFs, or HFs to the same number
+;; of SFs.
+(define_insn "*extend<mode><Vwide>2"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+       (unspec:<VWIDE>
+         [(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")
+          (unspec:<VWIDE>
+            [(match_operand:SVE_HSF 2 "register_operand" "w")]
+            UNSPEC_FLOAT_CONVERT)]
+         UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  "fcvt\t%0.<Vewtype>, %1/m, %2.<Vetype>"
+)
+
+;; PUNPKHI and PUNPKLO.
+(define_insn "vec_unpack<su>_<perm_hilo>_<mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=Upa")
+       (unspec:<VWIDE> [(match_operand:PRED_BHS 1 "register_operand" "Upa")]
+                       UNPACK))]
+  "TARGET_SVE"
+  "punpk<perm_hilo>\t%0.h, %1.b"
+)
+
+;; SUNPKHI, UUNPKHI, SUNPKLO and UUNPKLO.
+(define_insn "vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>"
+  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
+       (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")]
+                       UNPACK))]
+  "TARGET_SVE"
+  "<su>unpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"
+)
+
+;; Used by the vec_unpacks_<perm_hilo>_<mode> expander to unpack the bit
+;; representation of a VNx4SF or VNx8HF without conversion.  The choice
+;; between signed and unsigned isn't significant.
+(define_insn "*vec_unpacku_<perm_hilo>_<mode>_no_convert"
+  [(set (match_operand:SVE_HSF 0 "register_operand" "=w")
+       (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand" "w")]
+                       UNPACK_UNSIGNED))]
+  "TARGET_SVE"
+  "uunpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"
+)
+
+;; Unpack one half of a VNx4SF to VNx2DF, or one half of a VNx8HF to VNx4SF.
+;; First unpack the source without conversion, then float-convert the
+;; unpacked source.
+(define_expand "vec_unpacks_<perm_hilo>_<mode>"
+  [(set (match_dup 2)
+       (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand")]
+                       UNPACK_UNSIGNED))
+   (set (match_operand:<VWIDE> 0 "register_operand")
+       (unspec:<VWIDE> [(match_dup 3)
+                        (unspec:<VWIDE> [(match_dup 2)] UNSPEC_FLOAT_CONVERT)]
+                       UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[2] = gen_reg_rtx (<MODE>mode);
+    operands[3] = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));
+  }
+)
+
+;; Unpack one half of a VNx4SI to VNx2DF.  First unpack from VNx4SI
+;; to VNx2DI, reinterpret the VNx2DI as a VNx4SI, then convert the
+;; unpacked VNx4SI to VNx2DF.
+(define_expand "vec_unpack<su_optab>_float_<perm_hilo>_vnx4si"
+  [(set (match_dup 2)
+       (unspec:VNx2DI [(match_operand:VNx4SI 1 "register_operand")]
+                      UNPACK_UNSIGNED))
+   (set (match_operand:VNx2DF 0 "register_operand")
+       (unspec:VNx2DF [(match_dup 3)
+                       (FLOATUORS:VNx2DF (match_dup 4))]
+                      UNSPEC_MERGE_PTRUE))]
+  "TARGET_SVE"
+  {
+    operands[2] = gen_reg_rtx (VNx2DImode);
+    operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));
+    operands[4] = gen_rtx_SUBREG (VNx4SImode, operands[2], 0);
+  }
+)
+
+;; Predicate pack.  Use UZP1 on the narrower type, which discards
+;; the high part of each wide element.
+(define_insn "vec_pack_trunc_<Vwide>"
+  [(set (match_operand:PRED_BHS 0 "register_operand" "=Upa")
+       (unspec:PRED_BHS
+         [(match_operand:<VWIDE> 1 "register_operand" "Upa")
+          (match_operand:<VWIDE> 2 "register_operand" "Upa")]
+         UNSPEC_PACK))]
+  "TARGET_SVE"
+  "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+;; Integer pack.  Use UZP1 on the narrower type, which discards
+;; the high part of each wide element.
+(define_insn "vec_pack_trunc_<Vwide>"
+  [(set (match_operand:SVE_BHSI 0 "register_operand" "=w")
+       (unspec:SVE_BHSI
+         [(match_operand:<VWIDE> 1 "register_operand" "w")
+          (match_operand:<VWIDE> 2 "register_operand" "w")]
+         UNSPEC_PACK))]
+  "TARGET_SVE"
+  "uzp1\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
+)
+
+;; Convert two vectors of DF to SF, or two vectors of SF to HF, and pack
+;; the results into a single vector.
+(define_expand "vec_pack_trunc_<Vwide>"
+  [(set (match_dup 4)
+       (unspec:SVE_HSF
+         [(match_dup 3)
+          (unspec:SVE_HSF [(match_operand:<VWIDE> 1 "register_operand")]
+                          UNSPEC_FLOAT_CONVERT)]
+         UNSPEC_MERGE_PTRUE))
+   (set (match_dup 5)
+       (unspec:SVE_HSF
+         [(match_dup 3)
+          (unspec:SVE_HSF [(match_operand:<VWIDE> 2 "register_operand")]
+                          UNSPEC_FLOAT_CONVERT)]
+         UNSPEC_MERGE_PTRUE))
+   (set (match_operand:SVE_HSF 0 "register_operand")
+       (unspec:SVE_HSF [(match_dup 4) (match_dup 5)] UNSPEC_UZP1))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));
+    operands[4] = gen_reg_rtx (<MODE>mode);
+    operands[5] = gen_reg_rtx (<MODE>mode);
+  }
+)
+
+;; Convert two vectors of DF to SI and pack the results into a single vector.
+(define_expand "vec_pack_<su>fix_trunc_vnx2df"
+  [(set (match_dup 4)
+       (unspec:VNx4SI
+         [(match_dup 3)
+          (FIXUORS:VNx4SI (match_operand:VNx2DF 1 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))
+   (set (match_dup 5)
+       (unspec:VNx4SI
+         [(match_dup 3)
+          (FIXUORS:VNx4SI (match_operand:VNx2DF 2 "register_operand"))]
+         UNSPEC_MERGE_PTRUE))
+   (set (match_operand:VNx4SI 0 "register_operand")
+       (unspec:VNx4SI [(match_dup 4) (match_dup 5)] UNSPEC_UZP1))]
+  "TARGET_SVE"
+  {
+    operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));
+    operands[4] = gen_reg_rtx (VNx4SImode);
+    operands[5] = gen_reg_rtx (VNx4SImode);
+  }
+)
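The pack patterns above rely on UZP1 keeping only the low half of each wide element. The following standalone C sketch models that behaviour on a small fixed-size vector. It is not part of the patch: the helper names (`split_lanes`, `uzp1_u16`, `pack_trunc_u32`) and the `MAX_WIDE` cap are hypothetical. It also assumes a lane layout in which the low half of wide element i occupies narrow lane 2i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WIDE 8  /* Hypothetical cap on 32-bit elements per input.  */

/* View N 32-bit elements as 2*N 16-bit lanes: lane 2i holds the low half
   of element i and lane 2i+1 holds the high half (assumed layout).  */
static void
split_lanes (uint16_t *lanes, const uint32_t *wide, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      lanes[2 * i] = (uint16_t) (wide[i] & 0xffff);
      lanes[2 * i + 1] = (uint16_t) (wide[i] >> 16);
    }
}

/* Model UZP1 on 16-bit lanes: the even-numbered lanes of A followed by
   the even-numbered lanes of B.  NLANES is the lane count of each input.  */
static void
uzp1_u16 (uint16_t *dst, const uint16_t *a, const uint16_t *b, size_t nlanes)
{
  for (size_t i = 0; i < nlanes / 2; i++)
    {
      dst[i] = a[2 * i];
      dst[nlanes / 2 + i] = b[2 * i];
    }
}

/* Pack two vectors of N 32-bit elements into 2*N 16-bit elements by
   applying the UZP1 model to the narrow view of each input.  */
static void
pack_trunc_u32 (uint16_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
  uint16_t la[2 * MAX_WIDE], lb[2 * MAX_WIDE];
  assert (n <= MAX_WIDE);
  split_lanes (la, a, n);
  split_lanes (lb, b, n);
  uzp1_u16 (dst, la, lb, 2 * n);
}
```

Under these assumptions, packing `{0x11112222, 0x33334444}` with `{0x55556666, 0x77778888}` keeps only the low 16 bits of each input element, in order.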
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ae44c2abe110b7732e1c36de51122595d0acf29e..c5ed870ef57a458ae5f8a393cfa20c58c446271e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -67,8 +67,10 @@
  #include "sched-int.h"
  #include "target-globals.h"
  #include "common/common-target.h"
+#include "cfgrtl.h"
  #include "selftest.h"
  #include "selftest-rtl.h"
+#include "rtx-vector-builder.h"
  
  /* This file should be included last.  */
  #include "target-def.h"
@@ -129,13 +131,18 @@ struct simd_immediate_info
    simd_immediate_info (scalar_int_mode, unsigned HOST_WIDE_INT,
                        insn_type = MOV, modifier_type = LSL,
                        unsigned int = 0);
+  simd_immediate_info (scalar_mode, rtx, rtx);
  
    /* The mode of the elements.  */
    scalar_mode elt_mode;
  
-  /* The value of each element.  */
+  /* The value of each element if all elements are the same, or the
+     first value if the constant is a series.  */
    rtx value;
  
+  /* The value of the step if the constant is a series, null otherwise.  */
+  rtx step;
+
    /* The instruction to use to move the immediate into a vector.  */
    insn_type insn;
  
@@ -149,7 +156,7 @@ struct simd_immediate_info
     ELT_MODE_IN and value VALUE_IN.  */
  inline simd_immediate_info
  ::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in)
-  : elt_mode (elt_mode_in), value (value_in), insn (MOV),
+  : elt_mode (elt_mode_in), value (value_in), step (NULL_RTX), insn (MOV),
      modifier (LSL), shift (0)
  {}
  
@@ -162,12 +169,23 @@ inline simd_immediate_info
                        insn_type insn_in, modifier_type modifier_in,
                        unsigned int shift_in)
    : elt_mode (elt_mode_in), value (gen_int_mode (value_in, elt_mode_in)),
-    insn (insn_in), modifier (modifier_in), shift (shift_in)
+    step (NULL_RTX), insn (insn_in), modifier (modifier_in), shift (shift_in)
+{}
+
+/* Construct an integer immediate in which each element has mode ELT_MODE_IN
+   and where element I is equal to VALUE_IN + I * STEP_IN.  */
+inline simd_immediate_info
+::simd_immediate_info (scalar_mode elt_mode_in, rtx value_in, rtx step_in)
+  : elt_mode (elt_mode_in), value (value_in), step (step_in), insn (MOV),
+    modifier (LSL), shift (0)
  {}
  
  /* The current code model.  */
  enum aarch64_code_model aarch64_cmodel;
  
+/* The number of 64-bit elements in an SVE vector.  */
+poly_uint16 aarch64_sve_vg;
+
  #ifdef HAVE_AS_TLS
  #undef TARGET_HAVE_TLS
  #define TARGET_HAVE_TLS 1
@@ -187,8 +205,7 @@ static bool aarch64_builtin_support_vector_misalignment (machine_mode mode,
                                                          const_tree type,
                                                          int misalignment,
                                                          bool is_packed);
-static machine_mode
-aarch64_simd_container_mode (scalar_mode mode, unsigned width);
+static machine_mode aarch64_simd_container_mode (scalar_mode, poly_int64);
  static bool aarch64_print_ldpstp_address (FILE *, machine_mode, rtx);
  
  /* Major revision number of the ARM Architecture implemented by the target.  */
@@ -1100,25 +1117,95 @@ aarch64_dbx_register_number (unsigned regno)
       return AARCH64_DWARF_SP;
     else if (FP_REGNUM_P (regno))
       return AARCH64_DWARF_V0 + regno - V0_REGNUM;
+   else if (PR_REGNUM_P (regno))
+     return AARCH64_DWARF_P0 + regno - P0_REGNUM;
+   else if (regno == VG_REGNUM)
+     return AARCH64_DWARF_VG;
  
     /* Return values >= DWARF_FRAME_REGISTERS indicate that there is no
        equivalent DWARF register.  */
     return DWARF_FRAME_REGISTERS;
  }
  
-/* Return TRUE if MODE is any of the large INT modes.  */
+/* Return true if MODE is any of the Advanced SIMD structure modes.  */
+static bool
+aarch64_advsimd_struct_mode_p (machine_mode mode)
+{
+  return (TARGET_SIMD
+         && (mode == OImode || mode == CImode || mode == XImode));
+}
+
+/* Return true if MODE is an SVE predicate mode.  */
+static bool
+aarch64_sve_pred_mode_p (machine_mode mode)
+{
+  return (TARGET_SVE
+         && (mode == VNx16BImode
+             || mode == VNx8BImode
+             || mode == VNx4BImode
+             || mode == VNx2BImode));
+}
+
+/* Three mutually-exclusive flags describing a vector or predicate type.  */
+const unsigned int VEC_ADVSIMD  = 1;
+const unsigned int VEC_SVE_DATA = 2;
+const unsigned int VEC_SVE_PRED = 4;
+/* Can be used in combination with VEC_ADVSIMD or VEC_SVE_DATA to indicate
+   a structure of 2, 3 or 4 vectors.  */
+const unsigned int VEC_STRUCT   = 8;
+/* Useful combinations of the above.  */
+const unsigned int VEC_ANY_SVE  = VEC_SVE_DATA | VEC_SVE_PRED;
+const unsigned int VEC_ANY_DATA = VEC_ADVSIMD | VEC_SVE_DATA;
+
+/* Return a set of flags describing the vector properties of mode MODE.
+   Ignore modes that are not supported by the current target.  */
+static unsigned int
+aarch64_classify_vector_mode (machine_mode mode)
+{
+  if (aarch64_advsimd_struct_mode_p (mode))
+    return VEC_ADVSIMD | VEC_STRUCT;
+
+  if (aarch64_sve_pred_mode_p (mode))
+    return VEC_SVE_PRED;
+
+  scalar_mode inner = GET_MODE_INNER (mode);
+  if (VECTOR_MODE_P (mode)
+      && (inner == QImode
+         || inner == HImode
+         || inner == HFmode
+         || inner == SImode
+         || inner == SFmode
+         || inner == DImode
+         || inner == DFmode))
+    {
+      if (TARGET_SVE
+         && known_eq (GET_MODE_BITSIZE (mode), BITS_PER_SVE_VECTOR))
+       return VEC_SVE_DATA;
+
+      /* This includes V1DF but not V1DI (which doesn't exist).  */
+      if (TARGET_SIMD
+         && (known_eq (GET_MODE_BITSIZE (mode), 64)
+             || known_eq (GET_MODE_BITSIZE (mode), 128)))
+       return VEC_ADVSIMD;
+    }
+
+  return 0;
+}
+
+/* Return true if MODE is any of the data vector modes, including
+   structure modes.  */
  static bool
-aarch64_vect_struct_mode_p (machine_mode mode)
+aarch64_vector_data_mode_p (machine_mode mode)
  {
-  return mode == OImode || mode == CImode || mode == XImode;
+  return aarch64_classify_vector_mode (mode) & VEC_ANY_DATA;
  }
  
-/* Return TRUE if MODE is any of the vector modes.  */
+/* Return true if MODE is an SVE data vector mode; either a single vector
+   or a structure of vectors.  */
  static bool
-aarch64_vector_mode_p (machine_mode mode)
+aarch64_sve_data_mode_p (machine_mode mode)
  {
-  return aarch64_vector_mode_supported_p (mode)
-        || aarch64_vect_struct_mode_p (mode);
+  return aarch64_classify_vector_mode (mode) & VEC_SVE_DATA;
  }
  
  /* Implement target hook TARGET_ARRAY_MODE_SUPPORTED_P.  */
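The flag scheme introduced in the hunk above can be exercised in isolation. This is a minimal sketch, assuming the flag values shown in the patch. The helper names `vector_data_p` and `sve_data_p` are hypothetical stand-ins for `aarch64_vector_data_mode_p` and `aarch64_sve_data_mode_p`, and they take the flag set directly instead of classifying a `machine_mode`.

```c
#include <stdbool.h>

/* Flag values copied from the patch; an enum is used here so that the
   combined masks are constant expressions in plain C.  */
enum
{
  VEC_ADVSIMD = 1,
  VEC_SVE_DATA = 2,
  VEC_SVE_PRED = 4,
  VEC_STRUCT = 8,
  VEC_ANY_SVE = VEC_SVE_DATA | VEC_SVE_PRED,
  VEC_ANY_DATA = VEC_ADVSIMD | VEC_SVE_DATA
};

/* True if FLAGS describes any data vector, including structure modes.  */
static bool
vector_data_p (unsigned int flags)
{
  return (flags & VEC_ANY_DATA) != 0;
}

/* True if FLAGS describes an SVE data vector or structure of vectors.  */
static bool
sve_data_p (unsigned int flags)
{
  return (flags & VEC_SVE_DATA) != 0;
}
```

Because `VEC_STRUCT` is a modifier and not a class of its own, an Advanced SIMD structure mode (`VEC_ADVSIMD | VEC_STRUCT`) still counts as a data vector, while a predicate mode does not.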
@@ -1135,6 +1222,42 @@ aarch64_array_mode_supported_p (machine_mode mode,
    return false;
  }
  
+/* Return the SVE predicate mode to use for elements that have
+   ELEM_NBYTES bytes, if such a mode exists.  */
+
+opt_machine_mode
+aarch64_sve_pred_mode (unsigned int elem_nbytes)
+{
+  if (TARGET_SVE)
+    {
+      if (elem_nbytes == 1)
+       return VNx16BImode;
+      if (elem_nbytes == 2)
+       return VNx8BImode;
+      if (elem_nbytes == 4)
+       return VNx4BImode;
+      if (elem_nbytes == 8)
+       return VNx2BImode;
+    }
+  return opt_machine_mode ();
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
+
+static opt_machine_mode
+aarch64_get_mask_mode (poly_uint64 nunits, poly_uint64 nbytes)
+{
+  if (TARGET_SVE && known_eq (nbytes, BYTES_PER_SVE_VECTOR))
+    {
+      unsigned int elem_nbytes = vector_element_size (nbytes, nunits);
+      machine_mode pred_mode;
+      if (aarch64_sve_pred_mode (elem_nbytes).exists (&pred_mode))
+       return pred_mode;
+    }
+
+  return default_get_mask_mode (nunits, nbytes);
+}
+
  /* Implement TARGET_HARD_REGNO_NREGS.  */
  
  static unsigned int
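The element-size to predicate-mode mapping in `aarch64_sve_pred_mode` follows a simple rule: the `N` in `VNx<N>BI` is the number of elements per 128-bit quadword. A minimal sketch of that rule follows. The function name `pred_units_per_vq` is hypothetical, and the sketch ignores the `TARGET_SVE` check that the real function performs.

```c
/* Return the N in VNx<N>BI for elements of ELEM_NBYTES bytes, i.e. the
   number of such elements in a 128-bit quadword, or 0 if there is no
   predicate mode for that element size.  */
static unsigned int
pred_units_per_vq (unsigned int elem_nbytes)
{
  switch (elem_nbytes)
    {
    case 1:
    case 2:
    case 4:
    case 8:
      return 16 / elem_nbytes;
    default:
      return 0;
    }
}
```

For example, 4-byte elements give 4, matching `VNx4BImode`, and 8-byte elements give 2, matching `VNx2BImode`.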
@@ -1149,7 +1272,14 @@ aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
      {
      case FP_REGS:
      case FP_LO_REGS:
+      if (aarch64_sve_data_mode_p (mode))
+       return exact_div (GET_MODE_SIZE (mode),
+                         BYTES_PER_SVE_VECTOR).to_constant ();
        return CEIL (lowest_size, UNITS_PER_VREG);
+    case PR_REGS:
+    case PR_LO_REGS:
+    case PR_HI_REGS:
+      return 1;
      default:
        return CEIL (lowest_size, UNITS_PER_WORD);
      }
@@ -1164,6 +1294,17 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
    if (GET_MODE_CLASS (mode) == MODE_CC)
      return regno == CC_REGNUM;
  
+  if (regno == VG_REGNUM)
+    /* This must have the same size as _Unwind_Word.  */
+    return mode == DImode;
+
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  if (vec_flags & VEC_SVE_PRED)
+    return PR_REGNUM_P (regno);
+
+  if (PR_REGNUM_P (regno))
+    return 0;
+
    if (regno == SP_REGNUM)
      /* The purpose of comparing with ptr_mode is to support the
         global register variable associated with the stack pointer
@@ -1173,15 +1314,15 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
    if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)
      return mode == Pmode;
  
-  if (GP_REGNUM_P (regno) && ! aarch64_vect_struct_mode_p (mode))
+  if (GP_REGNUM_P (regno) && known_le (GET_MODE_SIZE (mode), 16))
      return true;
  
    if (FP_REGNUM_P (regno))
      {
-      if (aarch64_vect_struct_mode_p (mode))
+      if (vec_flags & VEC_STRUCT)
         return end_hard_regno (mode, regno) - 1 <= V31_REGNUM;
        else
-       return true;
+       return !VECTOR_MODE_P (mode) || vec_flags != 0;
      }
  
    return false;
@@ -1197,10 +1338,39 @@ aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
    return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
  }
  
+/* Implement REGMODE_NATURAL_SIZE.  */
+poly_uint64
+aarch64_regmode_natural_size (machine_mode mode)
+{
+  /* The natural size for SVE data modes is one SVE data vector,
+     and similarly for predicates.  We can't independently modify
+     anything smaller than that.  */
+  /* ??? For now, only do this for variable-width SVE registers.
+     Doing it for constant-sized registers breaks lower-subreg.c.  */
+  /* ??? And once that's fixed, we should probably have similar
+     code for Advanced SIMD.  */
+  if (!aarch64_sve_vg.is_constant ())
+    {
+      unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+      if (vec_flags & VEC_SVE_PRED)
+       return BYTES_PER_SVE_PRED;
+      if (vec_flags & VEC_SVE_DATA)
+       return BYTES_PER_SVE_VECTOR;
+    }
+  return UNITS_PER_WORD;
+}
+
  /* Implement HARD_REGNO_CALLER_SAVE_MODE.  */
  machine_mode
-aarch64_hard_regno_caller_save_mode (unsigned, unsigned, machine_mode mode)
-{
+aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
+                                    machine_mode mode)
+{
+  /* The predicate mode determines which bits are significant and
+     which are "don't care".  Decreasing the number of lanes would
+     lose data while increasing the number of lanes would make bits
+     unnecessarily significant.  */
+  if (PR_REGNUM_P (regno))
+    return mode;
    if (known_ge (GET_MODE_SIZE (mode), 4))
      return mode;
    else
@@ -1886,6 +2056,200 @@ aarch64_force_temporary (machine_mode mode, rtx x, rtx value)
      }
  }
  
+/* Return true if we can move VALUE into a register using a single
+   CNT[BHWD] instruction.  */
+
+static bool
+aarch64_sve_cnt_immediate_p (poly_int64 value)
+{
+  HOST_WIDE_INT factor = value.coeffs[0];
+  /* The coefficient must be [1, 16] * {2, 4, 8, 16}.  */
+  return (value.coeffs[1] == factor
+         && IN_RANGE (factor, 2, 16 * 16)
+         && (factor & 1) == 0
+         && factor <= 16 * (factor & -factor));
+}
+
+/* Likewise for rtx X.  */
+
+bool
+aarch64_sve_cnt_immediate_p (rtx x)
+{
+  poly_int64 value;
+  return poly_int_rtx_p (x, &value) && aarch64_sve_cnt_immediate_p (value);
+}
+
+/* Return the asm string for an instruction with a CNT-like vector size
+   operand (a vector pattern followed by a multiplier in the range [1, 16]).
+   PREFIX is the mnemonic without the size suffix and OPERANDS is the
+   first part of the operands template (the part that comes before the
+   vector size itself).  FACTOR is the number of quadwords.
+   NELTS_PER_VQ, if nonzero, is the number of elements in each quadword.
+   If it is zero, we can use any element size.  */
+
+static char *
+aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands,
+                                 unsigned int factor,
+                                 unsigned int nelts_per_vq)
+{
+  static char buffer[sizeof ("sqincd\t%x0, %w0, all, mul #16")];
+
+  if (nelts_per_vq == 0)
+    /* There is some overlap in the ranges of the four CNT instructions.
+       Here we always use the smallest possible element size, so that the
+       multiplier is 1 wherever possible.  */
+    nelts_per_vq = factor & -factor;
+  int shift = std::min (exact_log2 (nelts_per_vq), 4);
+  gcc_assert (IN_RANGE (shift, 1, 4));
+  char suffix = "dwhb"[shift - 1];
+
+  factor >>= shift;
+  unsigned int written;
+  if (factor == 1)
+    written = snprintf (buffer, sizeof (buffer), "%s%c\t%s",
+                       prefix, suffix, operands);
+  else
+    written = snprintf (buffer, sizeof (buffer), "%s%c\t%s, all, mul #%d",
+                       prefix, suffix, operands, factor);
+  gcc_assert (written < sizeof (buffer));
+  return buffer;
+}
+
+/* Return the asm string for an instruction with a CNT-like vector size
+   operand (a vector pattern followed by a multiplier in the range [1, 16]).
+   PREFIX is the mnemonic without the size suffix and OPERANDS is the
+   first part of the operands template (the part that comes before the
+   vector size itself).  X is the value of the vector size operand,
+   as a polynomial integer rtx.  */
+
+char *
+aarch64_output_sve_cnt_immediate (const char *prefix, const char *operands,
+                                 rtx x)
+{
+  poly_int64 value = rtx_to_poly_int64 (x);
+  gcc_assert (aarch64_sve_cnt_immediate_p (value));
+  return aarch64_output_sve_cnt_immediate (prefix, operands,
+                                          value.coeffs[1], 0);
+}
+
+/* Return true if we can add VALUE to a register using a single ADDVL
+   or ADDPL instruction.  */
+
+static bool
+aarch64_sve_addvl_addpl_immediate_p (poly_int64 value)
+{
+  HOST_WIDE_INT factor = value.coeffs[0];
+  if (factor == 0 || value.coeffs[1] != factor)
+    return false;
+  /* FACTOR counts VG / 2, so a value of 2 is one predicate width
+     and a value of 16 is one vector width.  */
+  return (((factor & 15) == 0 && IN_RANGE (factor, -32 * 16, 31 * 16))
+         || ((factor & 1) == 0 && IN_RANGE (factor, -32 * 2, 31 * 2)));
+}
+
+/* Likewise for rtx X.  */
+
+bool
+aarch64_sve_addvl_addpl_immediate_p (rtx x)
+{
+  poly_int64 value;
+  return (poly_int_rtx_p (x, &value)
+         && aarch64_sve_addvl_addpl_immediate_p (value));
+}
+
+/* Return the asm string for adding ADDVL or ADDPL immediate OFFSET to
+   operand 1 and storing the result in operand 0.  */
+
+char *
+aarch64_output_sve_addvl_addpl (rtx dest, rtx base, rtx offset)
+{
+  static char buffer[sizeof ("addpl\t%x0, %x1, #-") + 3 * sizeof (int)];
+  poly_int64 offset_value = rtx_to_poly_int64 (offset);
+  gcc_assert (aarch64_sve_addvl_addpl_immediate_p (offset_value));
+
+  /* Use INC or DEC if possible.  */
+  if (rtx_equal_p (dest, base) && GP_REGNUM_P (REGNO (dest)))
+    {
+      if (aarch64_sve_cnt_immediate_p (offset_value))
+       return aarch64_output_sve_cnt_immediate ("inc", "%x0",
+                                                offset_value.coeffs[1], 0);
+      if (aarch64_sve_cnt_immediate_p (-offset_value))
+       return aarch64_output_sve_cnt_immediate ("dec", "%x0",
+                                                -offset_value.coeffs[1], 0);
+    }
+
+  int factor = offset_value.coeffs[1];
+  if ((factor & 15) == 0)
+    snprintf (buffer, sizeof (buffer), "addvl\t%%x0, %%x1, #%d", factor / 16);
+  else
+    snprintf (buffer, sizeof (buffer), "addpl\t%%x0, %%x1, #%d", factor / 2);
+  return buffer;
+}
+
+/* Return true if X is a valid immediate for an SVE vector INC or DEC
+   instruction.  If it is, store the number of elements in each vector
+   quadword in *NELTS_PER_VQ_OUT (if nonnull) and store the multiplication
+   factor in *FACTOR_OUT (if nonnull).  */
+
+bool
+aarch64_sve_inc_dec_immediate_p (rtx x, int *factor_out,
+                                unsigned int *nelts_per_vq_out)
+{
+  rtx elt;
+  poly_int64 value;
+
+  if (!const_vec_duplicate_p (x, &elt)
+      || !poly_int_rtx_p (elt, &value))
+    return false;
+
+  unsigned int nelts_per_vq = 128 / GET_MODE_UNIT_BITSIZE (GET_MODE (x));
+  if (nelts_per_vq != 8 && nelts_per_vq != 4 && nelts_per_vq != 2)
+    /* There's no vector INCB.  */
+    return false;
+
+  HOST_WIDE_INT factor = value.coeffs[0];
+  if (value.coeffs[1] != factor)
+    return false;
+
+  /* The coefficient must be [1, 16] * NELTS_PER_VQ.  */
+  if ((factor % nelts_per_vq) != 0
+      || !IN_RANGE (abs (factor), nelts_per_vq, 16 * nelts_per_vq))
+    return false;
+
+  if (factor_out)
+    *factor_out = factor;
+  if (nelts_per_vq_out)
+    *nelts_per_vq_out = nelts_per_vq;
+  return true;
+}
+
+/* Return true if X is a valid immediate for an SVE vector INC or DEC
+   instruction.  */
+
+bool
+aarch64_sve_inc_dec_immediate_p (rtx x)
+{
+  return aarch64_sve_inc_dec_immediate_p (x, NULL, NULL);
+}
+
+/* Return the asm template for an SVE vector INC or DEC instruction.
+   OPERANDS gives the operands before the vector count and X is the
+   value of the vector count operand itself.  */
+
+char *
+aarch64_output_sve_inc_dec_immediate (const char *operands, rtx x)
+{
+  int factor;
+  unsigned int nelts_per_vq;
+  if (!aarch64_sve_inc_dec_immediate_p (x, &factor, &nelts_per_vq))
+    gcc_unreachable ();
+  if (factor < 0)
+    return aarch64_output_sve_cnt_immediate ("dec", operands, -factor,
+                                            nelts_per_vq);
+  else
+    return aarch64_output_sve_cnt_immediate ("inc", operands, factor,
+                                            nelts_per_vq);
+}
  
  static int
  aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
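The range test in `aarch64_sve_cnt_immediate_p` and the suffix selection in `aarch64_output_sve_cnt_immediate` (for the `nelts_per_vq == 0` case) can be modelled on a plain integer factor. The sketch below is an illustration under that assumption, not the patch's code: `cnt_factor_ok`, `struct cnt_form` and `cnt_form_for` are hypothetical names, and `factor` stands in for `value.coeffs[1]`.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirror of the range test: FACTOR must be [1, 16] * {2, 4, 8, 16}.  */
static bool
cnt_factor_ok (long factor)
{
  return (factor >= 2 && factor <= 16 * 16
          && (factor & 1) == 0
          && factor <= 16 * (factor & -factor));
}

/* The chosen instruction form: CNT<suffix> ..., ALL, MUL #<multiplier>.  */
struct cnt_form
{
  char suffix;      /* One of 'd', 'w', 'h' or 'b'.  */
  long multiplier;  /* In the range [1, 16].  */
};

/* Pick the smallest element size that divides FACTOR exactly, so that
   the multiplier is as small as possible.  */
static struct cnt_form
cnt_form_for (long factor)
{
  assert (cnt_factor_ok (factor));
  long low_bit = factor & -factor;
  int shift = 0;
  while ((1L << shift) < low_bit)  /* shift = log2 (low_bit).  */
    shift++;
  if (shift > 4)
    shift = 4;
  struct cnt_form form = { "dwhb"[shift - 1], factor >> shift };
  return form;
}
```

With this model, a factor of 6 maps to `cntd ..., all, mul #3`, a factor of 16 maps to a plain `cntb`, and a factor of 34 is rejected because it would need a multiplier of 17.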
@@ -2011,6 +2375,15 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
    return num_insns;
  }
  
+/* Return the number of temporary registers that aarch64_add_offset_1
+   would need to add OFFSET to a register.  */
+
+static unsigned int
+aarch64_add_offset_1_temporaries (HOST_WIDE_INT offset)
+{
+  return abs_hwi (offset) < 0x1000000 ? 0 : 1;
+}
+
  /* A subroutine of aarch64_add_offset.  Set DEST to SRC + OFFSET for
     a non-polynomial OFFSET.  MODE is the mode of the addition.
     FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should
@@ -2092,15 +2465,64 @@ aarch64_add_offset_1 (scalar_int_mode mode, rtx dest,
      }
  }
  
+/* Return the number of temporary registers that aarch64_add_offset
+   would need to move OFFSET into a register or add OFFSET to a register;
+   ADD_P is true if we want the latter rather than the former.  */
+
+static unsigned int
+aarch64_offset_temporaries (bool add_p, poly_int64 offset)
+{
+  /* This follows the same structure as aarch64_add_offset.  */
+  if (add_p && aarch64_sve_addvl_addpl_immediate_p (offset))
+    return 0;
+
+  unsigned int count = 0;
+  HOST_WIDE_INT factor = offset.coeffs[1];
+  HOST_WIDE_INT constant = offset.coeffs[0] - factor;
+  poly_int64 poly_offset (factor, factor);
+  if (add_p && aarch64_sve_addvl_addpl_immediate_p (poly_offset))
+    /* Need one register for the ADDVL/ADDPL result.  */
+    count += 1;
+  else if (factor != 0)
+    {
+      factor = abs (factor);
+      if (factor > 16 * (factor & -factor))
+       /* Need one register for the CNT result and one for the multiplication
+          factor.  If necessary, the second temporary can be reused for the
+          constant part of the offset.  */
+       return 2;
+      /* Need one register for the CNT result (which might then
+        be shifted).  */
+      count += 1;
+    }
+  return count + aarch64_add_offset_1_temporaries (constant);
+}
+
+/* If X can be represented as a poly_int64, return the number
+   of temporaries that are required to add it to a register.
+   Return -1 otherwise.  */
+
+int
+aarch64_add_offset_temporaries (rtx x)
+{
+  poly_int64 offset;
+  if (!poly_int_rtx_p (x, &offset))
+    return -1;
+  return aarch64_offset_temporaries (true, offset);
+}
+
  /* Set DEST to SRC + OFFSET.  MODE is the mode of the addition.
     FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should
     be set and CFA adjustments added to the generated instructions.
  
     TEMP1, if nonnull, is a register of mode MODE that can be used as a
     temporary if register allocation is already complete.  This temporary
-   register may overlap DEST but must not overlap SRC.  If TEMP1 is known
-   to hold abs (OFFSET), EMIT_MOVE_IMM can be set to false to avoid emitting
-   the immediate again.
+   register may overlap DEST if !FRAME_RELATED_P but must not overlap SRC.
+   If TEMP1 is known to hold abs (OFFSET), EMIT_MOVE_IMM can be set to
+   false to avoid emitting the immediate again.
+
+   TEMP2, if nonnull, is a second temporary register that doesn't
+   overlap either DEST or SRC.
  
     Since this function may be used to adjust the stack pointer, we must
     ensure that it cannot cause transient stack deallocation (for example
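The temporary-counting logic of `aarch64_offset_temporaries` in the hunk above can be traced with a small standalone model. This is a sketch under stated assumptions: `struct poly2` is a hypothetical stand-in for `poly_int64` (the value `c0 + c1 * X`), and the function names are shortened, hypothetical versions of the ones in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for poly_int64: the value c0 + c1 * X.  */
struct poly2
{
  int64_t c0, c1;
};

/* Mirror of aarch64_sve_addvl_addpl_immediate_p.  */
static bool
addvl_addpl_p (struct poly2 v)
{
  int64_t factor = v.c0;
  if (factor == 0 || v.c1 != factor)
    return false;
  return (((factor & 15) == 0 && factor >= -32 * 16 && factor <= 31 * 16)
          || ((factor & 1) == 0 && factor >= -32 * 2 && factor <= 31 * 2));
}

/* Mirror of aarch64_add_offset_1_temporaries.  */
static unsigned int
add_offset_1_temporaries (int64_t offset)
{
  int64_t magnitude = offset < 0 ? -offset : offset;
  return magnitude < 0x1000000 ? 0 : 1;
}

/* Mirror of aarch64_offset_temporaries.  */
static unsigned int
offset_temporaries (bool add_p, struct poly2 offset)
{
  if (add_p && addvl_addpl_p (offset))
    return 0;

  unsigned int count = 0;
  int64_t factor = offset.c1;
  int64_t constant = offset.c0 - factor;
  struct poly2 vg_part = { factor, factor };
  if (add_p && addvl_addpl_p (vg_part))
    count += 1;  /* One register for the ADDVL/ADDPL result.  */
  else if (factor != 0)
    {
      if (factor < 0)
        factor = -factor;
      if (factor > 16 * (factor & -factor))
        return 2;  /* CNT result plus a multiplication factor.  */
      count += 1;  /* CNT result only.  */
    }
  return count + add_offset_1_temporaries (constant);
}
```

In this model, an offset of exactly one vector (`{16, 16}`) needs no temporary when adding, `{17, 16}` needs one for the ADDVL result before the constant 1 is added, and `{66, 66}` needs two because 66 is outside the ADDPL range and is not a valid CNT multiple.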
@@ -2109,27 +2531,177 @@ aarch64_add_offset_1 (scalar_int_mode mode, rtx dest,
  
  static void
  aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
-                   poly_int64 offset, rtx temp1, bool frame_related_p,
-                   bool emit_move_imm = true)
+                   poly_int64 offset, rtx temp1, rtx temp2,
+                   bool frame_related_p, bool emit_move_imm = true)
  {
    gcc_assert (emit_move_imm || temp1 != NULL_RTX);
    gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src));
+  gcc_assert (temp1 == NULL_RTX
+             || !frame_related_p
+             || !reg_overlap_mentioned_p (temp1, dest));
+  gcc_assert (temp2 == NULL_RTX || !reg_overlap_mentioned_p (dest, temp2));
+
+  /* Try using ADDVL or ADDPL to add the whole value.  */
+  if (src != const0_rtx && aarch64_sve_addvl_addpl_immediate_p (offset))
+    {
+      rtx offset_rtx = gen_int_mode (offset, mode);
+      rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx));
+      RTX_FRAME_RELATED_P (insn) = frame_related_p;
+      return;
+    }
+
+  /* Coefficient 1 is multiplied by the number of 128-bit blocks in an
+     SVE vector register, over and above the minimum size of 128 bits.
+     This is equivalent to half the value returned by CNTD with a
+     vector shape of ALL.  */
+  HOST_WIDE_INT factor = offset.coeffs[1];
+  HOST_WIDE_INT constant = offset.coeffs[0] - factor;
+
+  /* Try using ADDVL or ADDPL to add the VG-based part.  */
+  poly_int64 poly_offset (factor, factor);
+  if (src != const0_rtx
+      && aarch64_sve_addvl_addpl_immediate_p (poly_offset))
+    {
+      rtx offset_rtx = gen_int_mode (poly_offset, mode);
+      if (frame_related_p)
+       {
+         rtx_insn *insn = emit_insn (gen_add3_insn (dest, src, offset_rtx));
+         RTX_FRAME_RELATED_P (insn) = true;
+         src = dest;
+       }
+      else
+       {
+         rtx addr = gen_rtx_PLUS (mode, src, offset_rtx);
+         src = aarch64_force_temporary (mode, temp1, addr);
+         temp1 = temp2;
+         temp2 = NULL_RTX;
+       }
+    }
+  /* Otherwise use a CNT-based sequence.  */
+  else if (factor != 0)
+    {
+      /* Use a subtraction if we have a negative factor.  */
+      rtx_code code = PLUS;
+      if (factor < 0)
+       {
+         factor = -factor;
+         code = MINUS;
+       }
+
+      /* Calculate CNTD * FACTOR / 2.  First try to fold the division
+        into the multiplication.  */
+      rtx val;
+      int shift = 0;
+      if (factor & 1)
+       /* Use a right shift by 1.  */
+       shift = -1;
+      else
+       factor /= 2;
+      HOST_WIDE_INT low_bit = factor & -factor;
+      if (factor <= 16 * low_bit)
+       {
+         if (factor > 16 * 8)
+           {
+             /* "CNTB Xn, ALL, MUL #FACTOR" is out of range, so calculate
+                the value with the minimum multiplier and shift it into
+                position.  */
+             int extra_shift = exact_log2 (low_bit);
+             shift += extra_shift;
+             factor >>= extra_shift;
+           }
+         val = gen_int_mode (poly_int64 (factor * 2, factor * 2), mode);
+       }
+      else
+       {
+         /* Use CNTD, then multiply it by FACTOR.  */
+         val = gen_int_mode (poly_int64 (2, 2), mode);
+         val = aarch64_force_temporary (mode, temp1, val);
+
+         /* Go back to using a negative multiplication factor if we have
+            no register from which to subtract.  */
+         if (code == MINUS && src == const0_rtx)
+           {
+             factor = -factor;
+             code = PLUS;
+           }
+         rtx coeff1 = gen_int_mode (factor, mode);
+         coeff1 = aarch64_force_temporary (mode, temp2, coeff1);
+         val = gen_rtx_MULT (mode, val, coeff1);
+       }
+
+      if (shift > 0)
+       {
+         /* Multiply by 1 << SHIFT.  */
+         val = aarch64_force_temporary (mode, temp1, val);
+         val = gen_rtx_ASHIFT (mode, val, GEN_INT (shift));
+       }
+      else if (shift == -1)
+       {
+         /* Divide by 2.  */
+         val = aarch64_force_temporary (mode, temp1, val);
+         val = gen_rtx_ASHIFTRT (mode, val, const1_rtx);
+       }
+
+      /* Calculate SRC +/- CNTD * FACTOR / 2.  */
+      if (src != const0_rtx)
+       {
+         val = aarch64_force_temporary (mode, temp1, val);
+         val = gen_rtx_fmt_ee (code, mode, src, val);
+       }
+      else if (code == MINUS)
+       {
+         val = aarch64_force_temporary (mode, temp1, val);
+         val = gen_rtx_NEG (mode, val);
+       }
+
+      if (constant == 0 || frame_related_p)
+       {
+         rtx_insn *insn = emit_insn (gen_rtx_SET (dest, val));
+         if (frame_related_p)
+           {
+             RTX_FRAME_RELATED_P (insn) = true;
+             add_reg_note (insn, REG_CFA_ADJUST_CFA,
+                           gen_rtx_SET (dest, plus_constant (Pmode, src,
+                                                             poly_offset)));
+           }
+         src = dest;
+         if (constant == 0)
+           return;
+       }
+      else
+       {
+         src = aarch64_force_temporary (mode, temp1, val);
+         temp1 = temp2;
+         temp2 = NULL_RTX;
+       }
+
+      emit_move_imm = true;
+    }
  
-  /* SVE support will go here.  */
-  HOST_WIDE_INT constant = offset.to_constant ();
    aarch64_add_offset_1 (mode, dest, src, constant, temp1,
                         frame_related_p, emit_move_imm);
  }
  
+/* Like aarch64_add_offset, but the offset is given as an rtx rather
+   than a poly_int64.  */
+
+void
+aarch64_split_add_offset (scalar_int_mode mode, rtx dest, rtx src,
+                         rtx offset_rtx, rtx temp1, rtx temp2)
+{
+  aarch64_add_offset (mode, dest, src, rtx_to_poly_int64 (offset_rtx),
+                     temp1, temp2, false);
+}
+
  /* Add DELTA to the stack pointer, marking the instructions frame-related.
     TEMP1 is available as a temporary if nonnull.  EMIT_MOVE_IMM is false
     if TEMP1 already contains abs (DELTA).  */
  
  static inline void
-aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm)
+aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta, bool emit_move_imm)
  {
    aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, delta,
-                     temp1, true, emit_move_imm);
+                     temp1, temp2, true, emit_move_imm);
  }
  
  /* Subtract DELTA from the stack pointer, marking the instructions
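The CNT-based branch of aarch64_add_offset in the hunk above has to produce FACTOR * VQ, where VQ is the number of 128-bit blocks in a vector and CNTD == 2 * VQ. The following is a minimal standalone sketch of how that branch appears to choose between a single CNT immediate (optionally shifted) and a CNTD followed by a multiply. The names CntPlan, plan_cnt_sequence and evaluate are hypothetical and are not part of the patch; the model works on plain integers rather than RTL and assumes FACTOR is already positive.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the instruction plan for computing FACTOR * VQ.
struct CntPlan {
  bool use_multiply;   // true: CNTD into a temporary, then a separate multiply
  int64_t multiplier;  // value that multiplies CNTD
  int shift;           // > 0: left shift afterwards, -1: right shift by 1
};

// Mirrors the folding logic of the patch for a positive FACTOR.
CntPlan plan_cnt_sequence(int64_t factor) {
  int shift = 0;
  if (factor & 1)
    shift = -1;        // odd: compute CNTD * FACTOR, then halve
  else
    factor /= 2;       // even: fold the division into the multiplier
  int64_t low_bit = factor & -factor;
  if (factor <= 16 * low_bit) {
    if (factor > 16 * 8) {
      // The immediate multiplier would be out of range, so use the
      // smallest multiplier and shift the result into position.
      int extra_shift = 0;
      while ((int64_t{1} << extra_shift) < low_bit)
        ++extra_shift;
      shift += extra_shift;
      factor >>= extra_shift;
    }
    return {false, factor, shift};
  }
  return {true, factor, shift};
}

// Evaluate a plan for a concrete VQ (number of 128-bit blocks).
int64_t evaluate(const CntPlan &plan, int64_t vq) {
  int64_t val = plan.multiplier * (2 * vq);  // multiplier * CNTD
  if (plan.shift > 0)
    val *= int64_t{1} << plan.shift;
  else if (plan.shift == -1)
    val /= 2;  // val is even and non-negative here
  return val;
}
```

Under these assumptions, evaluate (plan_cnt_sequence (f), vq) equals f * vq for every positive f, which is the invariant the real sequence must preserve before SRC is added or subtracted.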
@@ -2137,44 +2709,195 @@ aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm)
     if nonnull.  */
  
  static inline void
-aarch64_sub_sp (rtx temp1, poly_int64 delta, bool frame_related_p)
+aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p)
  {
    aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
-                     temp1, frame_related_p);
+                     temp1, temp2, frame_related_p);
  }
  
-void
-aarch64_expand_mov_immediate (rtx dest, rtx imm)
+/* Set DEST to (vec_series BASE STEP).  */
+
+static void
+aarch64_expand_vec_series (rtx dest, rtx base, rtx step)
  {
    machine_mode mode = GET_MODE (dest);
+  scalar_mode inner = GET_MODE_INNER (mode);
+
+  /* Each operand can be a register or an immediate in the range [-16, 15].  */
+  if (!aarch64_sve_index_immediate_p (base))
+    base = force_reg (inner, base);
+  if (!aarch64_sve_index_immediate_p (step))
+    step = force_reg (inner, step);
+
+  emit_set_insn (dest, gen_rtx_VEC_SERIES (mode, base, step));
+}
  
-  gcc_assert (mode == SImode || mode == DImode);
+/* Try to duplicate SRC into SVE register DEST, given that SRC is an
+   integer of mode SRC_MODE.  Return true on success.  */
+
+static bool
+aarch64_expand_sve_widened_duplicate (rtx dest, scalar_int_mode src_mode,
+                                     rtx src)
+{
+  /* If the constant is smaller than 128 bits, we can do the move
+     using a vector of SRC_MODEs.  */
+  if (src_mode != TImode)
+    {
+      poly_uint64 count = exact_div (GET_MODE_SIZE (GET_MODE (dest)),
+                                    GET_MODE_SIZE (src_mode));
+      machine_mode dup_mode = mode_for_vector (src_mode, count).require ();
+      emit_move_insn (gen_lowpart (dup_mode, dest),
+                     gen_const_vec_duplicate (dup_mode, src));
+      return true;
+    }
+
+  /* The bytes are loaded in little-endian order, so do a byteswap on
+     big-endian targets.  */
+  if (BYTES_BIG_ENDIAN)
+    {
+      src = simplify_unary_operation (BSWAP, src_mode, src, src_mode);
+      if (!src)
+       return false;
+    }
+
+  /* Use LD1RQ to load the 128 bits from memory.  */
+  src = force_const_mem (src_mode, src);
+  if (!src)
+    return false;
+
+  /* Make sure that the address is legitimate.  */
+  if (!aarch64_sve_ld1r_operand_p (src))
+    {
+      rtx addr = force_reg (Pmode, XEXP (src, 0));
+      src = replace_equiv_address (src, addr);
+    }
+
+  rtx ptrue = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode));
+  emit_insn (gen_sve_ld1rq (gen_lowpart (VNx16QImode, dest), ptrue, src));
+  return true;
+}
+
+/* Expand a move of general CONST_VECTOR SRC into DEST, given that it
+   isn't a simple duplicate or series.  */
+
+static void
+aarch64_expand_sve_const_vector (rtx dest, rtx src)
+{
+  machine_mode mode = GET_MODE (src);
+  unsigned int npatterns = CONST_VECTOR_NPATTERNS (src);
+  unsigned int nelts_per_pattern = CONST_VECTOR_NELTS_PER_PATTERN (src);
+  gcc_assert (npatterns > 1);
+
+  if (nelts_per_pattern == 1)
+    {
+      /* The constant is a repeating sequence of at least two elements,
+        where the repeating elements occupy no more than 128 bits.
+        Get an integer representation of the replicated value.  */
+      unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
+      gcc_assert (int_bits <= 128);
+
+      scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+      rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0);
+      if (int_value
+         && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value))
+       return;
+    }
+
+  /* Expand each pattern individually.  */
+  rtx_vector_builder builder;
+  auto_vec<rtx, 16> vectors (npatterns);
+  for (unsigned int i = 0; i < npatterns; ++i)
+    {
+      builder.new_vector (mode, 1, nelts_per_pattern);
+      for (unsigned int j = 0; j < nelts_per_pattern; ++j)
+       builder.quick_push (CONST_VECTOR_ELT (src, i + j * npatterns));
+      vectors.quick_push (force_reg (mode, builder.build ()));
+    }
+
+  /* Use permutes to interleave the separate vectors.  */
+  while (npatterns > 1)
+    {
+      npatterns /= 2;
+      for (unsigned int i = 0; i < npatterns; ++i)
+       {
+         rtx tmp = (npatterns == 1 ? dest : gen_reg_rtx (mode));
+         rtvec v = gen_rtvec (2, vectors[i], vectors[i + npatterns]);
+         emit_set_insn (tmp, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));
+         vectors[i] = tmp;
+       }
+    }
+  gcc_assert (vectors[0] == dest);
+}
+
+/* Set DEST to immediate IMM.  For SVE vector modes, GEN_VEC_DUPLICATE
+   is a pattern that can be used to set DEST to a replicated scalar
+   element.  */
+
+void
+aarch64_expand_mov_immediate (rtx dest, rtx imm,
+                             rtx (*gen_vec_duplicate) (rtx, rtx))
+{
+  machine_mode mode = GET_MODE (dest);
  
    /* Check on what type of symbol it is.  */
    scalar_int_mode int_mode;
    if ((GET_CODE (imm) == SYMBOL_REF
         || GET_CODE (imm) == LABEL_REF
-       || GET_CODE (imm) == CONST)
+       || GET_CODE (imm) == CONST
+       || GET_CODE (imm) == CONST_POLY_INT)
        && is_a <scalar_int_mode> (mode, &int_mode))
      {
-      rtx mem, base, offset;
+      rtx mem;
+      poly_int64 offset;
+      HOST_WIDE_INT const_offset;
        enum aarch64_symbol_type sty;
  
        /* If we have (const (plus symbol offset)), separate out the offset
          before we start classifying the symbol.  */
-      split_const (imm, &base, &offset);
+      rtx base = strip_offset (imm, &offset);
  
-      sty = aarch64_classify_symbol (base, offset);
+      /* We must always add an offset involving VL separately, rather than
+        folding it into the relocation.  */
+      if (!offset.is_constant (&const_offset))
+       {
+         if (base == const0_rtx && aarch64_sve_cnt_immediate_p (offset))
+           emit_insn (gen_rtx_SET (dest, imm));
+         else
+           {
+             /* Do arithmetic on 32-bit values if the result is smaller
+                than that.  */
+             if (partial_subreg_p (int_mode, SImode))
+               {
+                 /* It is invalid to do symbol calculations in modes
+                    narrower than SImode.  */
+                 gcc_assert (base == const0_rtx);
+                 dest = gen_lowpart (SImode, dest);
+                 int_mode = SImode;
+               }
+             if (base != const0_rtx)
+               {
+                 base = aarch64_force_temporary (int_mode, dest, base);
+                 aarch64_add_offset (int_mode, dest, base, offset,
+                                     NULL_RTX, NULL_RTX, false);
+               }
+             else
+               aarch64_add_offset (int_mode, dest, base, offset,
+                                   dest, NULL_RTX, false);
+           }
+         return;
+       }
+
+      sty = aarch64_classify_symbol (base, const_offset);
        switch (sty)
         {
         case SYMBOL_FORCE_TO_MEM:
-         if (offset != const0_rtx
+         if (const_offset != 0
               && targetm.cannot_force_const_mem (int_mode, imm))
             {
               gcc_assert (can_create_pseudo_p ());
               base = aarch64_force_temporary (int_mode, dest, base);
-             aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
-                                 NULL_RTX, false);
+             aarch64_add_offset (int_mode, dest, base, const_offset,
+                                 NULL_RTX, NULL_RTX, false);
               return;
             }
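The permute loop in aarch64_expand_sve_const_vector, earlier in this hunk, rebuilds a constant from its NPATTERNS interleaved patterns by repeatedly applying ZIP1. The sketch below models that on fixed-length vectors of int. The names Vec, zip1 and expand_by_interleaving are hypothetical, the vector length is assumed to be a multiple of NPATTERNS, and NPATTERNS is assumed to be a power of two; the real code operates on RTL and variable-length modes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Model of ZIP1: interleave the low halves of A and B.
Vec zip1(const Vec &a, const Vec &b) {
  Vec out(a.size());
  for (std::size_t k = 0; k < a.size() / 2; ++k) {
    out[2 * k] = a[k];
    out[2 * k + 1] = b[k];
  }
  return out;
}

// Split SRC into NPATTERNS separate vectors, then interleave them back.
Vec expand_by_interleaving(const Vec &src, std::size_t npatterns) {
  const std::size_t n = src.size();
  std::vector<Vec> vectors;
  for (std::size_t i = 0; i < npatterns; ++i) {
    // Pattern I holds elements I, I + NPATTERNS, I + 2 * NPATTERNS, ...
    // Only the low n / npatterns lanes are ever read by the zips below.
    Vec v(n, 0);
    for (std::size_t j = 0; j < n / npatterns; ++j)
      v[j] = src[i + j * npatterns];
    vectors.push_back(v);
  }
  // Each round halves the number of live vectors.
  while (npatterns > 1) {
    npatterns /= 2;
    for (std::size_t i = 0; i < npatterns; ++i)
      vectors[i] = zip1(vectors[i], vectors[i + npatterns]);
  }
  return vectors[0];
}
```

Pairing vector I with vector I + NPATTERNS / 2 in each round, rather than with its neighbour, is what puts the elements back in their original order after log2 (NPATTERNS) rounds.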
  
@@ -2209,12 +2932,12 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
         case SYMBOL_SMALL_GOT_4G:
         case SYMBOL_TINY_GOT:
         case SYMBOL_TINY_TLSIE:
-         if (offset != const0_rtx)
+         if (const_offset != 0)
             {
               gcc_assert(can_create_pseudo_p ());
               base = aarch64_force_temporary (int_mode, dest, base);
-             aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
-                                 NULL_RTX, false);
+             aarch64_add_offset (int_mode, dest, base, const_offset,
+                                 NULL_RTX, NULL_RTX, false);
               return;
             }
           /* FALLTHRU */
@@ -2235,13 +2958,36 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
  
    if (!CONST_INT_P (imm))
      {
-      if (GET_CODE (imm) == HIGH)
+      rtx base, step, value;
+      if (GET_CODE (imm) == HIGH
+         || aarch64_simd_valid_immediate (imm, NULL))
         emit_insn (gen_rtx_SET (dest, imm));
+      else if (const_vec_series_p (imm, &base, &step))
+       aarch64_expand_vec_series (dest, base, step);
+      else if (const_vec_duplicate_p (imm, &value))
+       {
+         /* If the constant is out of range of an SVE vector move,
+            load it from memory if we can, otherwise move it into
+            a register and use a DUP.  */
+         scalar_mode inner_mode = GET_MODE_INNER (mode);
+         rtx op = force_const_mem (inner_mode, value);
+         if (!op)
+           op = force_reg (inner_mode, value);
+         else if (!aarch64_sve_ld1r_operand_p (op))
+           {
+             rtx addr = force_reg (Pmode, XEXP (op, 0));
+             op = replace_equiv_address (op, addr);
+           }
+         emit_insn (gen_vec_duplicate (dest, op));
+       }
+      else if (GET_CODE (imm) == CONST_VECTOR
+              && !GET_MODE_NUNITS (GET_MODE (imm)).is_constant ())
+       aarch64_expand_sve_const_vector (dest, imm);
        else
-        {
+       {
           rtx mem = force_const_mem (mode, imm);
           gcc_assert (mem);
-         emit_insn (gen_rtx_SET (dest, mem));
+         emit_move_insn (dest, mem);
         }
  
        return;
@@ -2251,6 +2997,44 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
                                   as_a <scalar_int_mode> (mode));
  }
  
+/* Emit an SVE predicated move from SRC to DEST.  PRED is a predicate
+   that is known to contain PTRUE.  */
+
+void
+aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
+{
+  emit_insn (gen_rtx_SET (dest, gen_rtx_UNSPEC (GET_MODE (dest),
+                                               gen_rtvec (2, pred, src),
+                                               UNSPEC_MERGE_PTRUE)));
+}
+
+/* Expand a pre-RA SVE data move from SRC to DEST in which at least one
+   operand is in memory.  In this case we need to use the predicated LD1
+   and ST1 instead of LDR and STR, both for correctness on big-endian
+   targets and because LD1 and ST1 support a wider range of addressing modes.
+   PRED_MODE is the mode of the predicate.
+
+   See the comment at the head of aarch64-sve.md for details about the
+   big-endian handling.  */
+
+void
+aarch64_expand_sve_mem_move (rtx dest, rtx src, machine_mode pred_mode)
+{
+  machine_mode mode = GET_MODE (dest);
+  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  if (!register_operand (src, mode)
+      && !register_operand (dest, mode))
+    {
+      rtx tmp = gen_reg_rtx (mode);
+      if (MEM_P (src))
+       aarch64_emit_sve_pred_move (tmp, ptrue, src);
+      else
+       emit_move_insn (tmp, src);
+      src = tmp;
+    }
+  aarch64_emit_sve_pred_move (dest, ptrue, src);
+}
+
  static bool
  aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
                                  tree exp ATTRIBUTE_UNUSED)
@@ -2715,6 +3499,21 @@ aarch64_function_arg_boundary (machine_mode mode, const_tree type)
    return MIN (MAX (alignment, PARM_BOUNDARY), STACK_BOUNDARY);
  }
  
+/* Implement TARGET_GET_RAW_RESULT_MODE and TARGET_GET_RAW_ARG_MODE.  */
+
+static fixed_size_mode
+aarch64_get_reg_raw_mode (int regno)
+{
+  if (TARGET_SVE && FP_REGNUM_P (regno))
+    /* Don't use the SVE part of the register for __builtin_apply and
+       __builtin_return.  The SVE registers aren't used by the normal PCS,
+       so using them there would be a waste of time.  The PCS extensions
+       for SVE types are fundamentally incompatible with the
+       __builtin_return/__builtin_apply interface.  */
+    return as_a <fixed_size_mode> (V16QImode);
+  return default_get_reg_raw_mode (regno);
+}
+
  /* Implement TARGET_FUNCTION_ARG_PADDING.
  
     Small aggregate types are placed in the lowest memory address.
@@ -3472,6 +4271,41 @@ aarch64_restore_callee_saves (machine_mode mode,
      }
  }
  
+/* Return true if OFFSET is a signed 4-bit value multiplied by the size
+   of MODE.  */
+
+static inline bool
+offset_4bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+{
+  HOST_WIDE_INT multiple;
+  return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+         && IN_RANGE (multiple, -8, 7));
+}
+
+/* Return true if OFFSET is an unsigned 6-bit value multiplied by the size
+   of MODE.  */
+
+static inline bool
+offset_6bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
+{
+  HOST_WIDE_INT multiple;
+  return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+         && IN_RANGE (multiple, 0, 63));
+}
+
+/* Return true if OFFSET is a signed 7-bit value multiplied by the size
+   of MODE.  */
+
+bool
+aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+{
+  HOST_WIDE_INT multiple;
+  return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+         && IN_RANGE (multiple, -64, 63));
+}
+
+/* Return true if OFFSET is a signed 9-bit value.  */
+
  static inline bool
  offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
                                poly_int64 offset)
@@ -3481,20 +4315,26 @@ offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
           && IN_RANGE (const_offset, -256, 255));
  }
  
+/* Return true if OFFSET is a signed 9-bit value multiplied by the size
+   of MODE.  */
+
  static inline bool
-offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
+offset_9bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
  {
    HOST_WIDE_INT multiple;
    return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
-         && IN_RANGE (multiple, 0, 4095));
+         && IN_RANGE (multiple, -256, 255));
  }
  
-bool
-aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
+/* Return true if OFFSET is an unsigned 12-bit value multiplied by the size
+   of MODE.  */
+
+static inline bool
+offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
  {
    HOST_WIDE_INT multiple;
    return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
-         && IN_RANGE (multiple, -64, 63));
+         && IN_RANGE (multiple, 0, 4095));
  }
  
  /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
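The offset predicates in the hunk above all follow one shape: the offset must be an exact multiple of the mode size, and that multiple must fit a signed or unsigned immediate field. A minimal sketch with plain integers follows. The helper scaled_offset_in_range and the wrappers are hypothetical names, and SIZE is assumed to be a compile-time constant in bytes, whereas the patch uses poly_int64 and constant_multiple_p so that the same test also works for sizes that scale with the vector length.

```cpp
#include <cassert>
#include <cstdint>

// Return true if OFFSET == MULTIPLE * SIZE with LO <= MULTIPLE <= HI.
bool scaled_offset_in_range(int64_t offset, int64_t size,
                            int64_t lo, int64_t hi) {
  if (size <= 0 || offset % size != 0)
    return false;
  int64_t multiple = offset / size;
  return multiple >= lo && multiple <= hi;
}

// Signed 4-bit field: multiples in [-8, 7].
bool offset_4bit_signed_scaled(int64_t offset, int64_t size) {
  return scaled_offset_in_range(offset, size, -8, 7);
}

// Unsigned 6-bit field: multiples in [0, 63].
bool offset_6bit_unsigned_scaled(int64_t offset, int64_t size) {
  return scaled_offset_in_range(offset, size, 0, 63);
}

// Signed 9-bit field: multiples in [-256, 255].
bool offset_9bit_signed_scaled(int64_t offset, int64_t size) {
  return scaled_offset_in_range(offset, size, -256, 255);
}
```

With a 16-byte size, for example, the 4-bit signed form accepts byte offsets from -128 to 112 in steps of 16 and rejects anything unaligned or outside that window.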
@@ -3713,6 +4553,18 @@ aarch64_set_handled_components (sbitmap components)
        cfun->machine->reg_is_wrapped_separately[regno] = true;
  }
  
+/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
+   is saved at BASE + OFFSET.  */
+
+static void
+aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
+                           rtx base, poly_int64 offset)
+{
+  rtx mem = gen_frame_mem (DImode, plus_constant (Pmode, base, offset));
+  add_reg_note (insn, REG_CFA_EXPRESSION,
+               gen_rtx_SET (mem, regno_reg_rtx[reg]));
+}
+
  /* AArch64 stack frames generated by this compiler look like:
  
         +-------------------------------+
@@ -3798,19 +4650,55 @@ aarch64_expand_prologue (void)
    rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
    rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
  
-  aarch64_sub_sp (ip0_rtx, initial_adjust, true);
+  aarch64_sub_sp (ip0_rtx, ip1_rtx, initial_adjust, true);
  
    if (callee_adjust != 0)
      aarch64_push_regs (reg1, reg2, callee_adjust);
  
    if (emit_frame_chain)
      {
+      poly_int64 reg_offset = callee_adjust;
        if (callee_adjust == 0)
-       aarch64_save_callee_saves (DImode, callee_offset, R29_REGNUM,
-                                  R30_REGNUM, false);
+       {
+         reg1 = R29_REGNUM;
+         reg2 = R30_REGNUM;
+         reg_offset = callee_offset;
+         aarch64_save_callee_saves (DImode, reg_offset, reg1, reg2, false);
+       }
        aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
-                         stack_pointer_rtx, callee_offset, ip1_rtx,
-                         frame_pointer_needed);
+                         stack_pointer_rtx, callee_offset,
+                         ip1_rtx, ip0_rtx, frame_pointer_needed);
+      if (frame_pointer_needed && !frame_size.is_constant ())
+       {
+         /* Variable-sized frames need to describe the save slot
+            address using DW_CFA_expression rather than DW_CFA_offset.
+            This means that, without taking further action, the
+            locations of the registers that we've already saved would
+            remain based on the stack pointer even after we redefine
+            the CFA based on the frame pointer.  We therefore need new
+            DW_CFA_expressions to re-express the save slots with addresses
+            based on the frame pointer.  */
+         rtx_insn *insn = get_last_insn ();
+         gcc_assert (RTX_FRAME_RELATED_P (insn));
+
+         /* Add an explicit CFA definition if this was previously
+            implicit.  */
+         if (!find_reg_note (insn, REG_CFA_ADJUST_CFA, NULL_RTX))
+           {
+             rtx src = plus_constant (Pmode, stack_pointer_rtx,
+                                      callee_offset);
+             add_reg_note (insn, REG_CFA_ADJUST_CFA,
+                           gen_rtx_SET (hard_frame_pointer_rtx, src));
+           }
+
+         /* Change the save slot expressions for the registers that
+            we've already saved.  */
+         reg_offset -= callee_offset;
+         aarch64_add_cfa_expression (insn, reg2, hard_frame_pointer_rtx,
+                                     reg_offset + UNITS_PER_WORD);
+         aarch64_add_cfa_expression (insn, reg1, hard_frame_pointer_rtx,
+                                     reg_offset);
+       }
        emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
      }
  
@@ -3818,7 +4706,7 @@ aarch64_expand_prologue (void)
                              callee_adjust != 0 || emit_frame_chain);
    aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
                              callee_adjust != 0 || emit_frame_chain);
-  aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
+  aarch64_sub_sp (ip1_rtx, ip0_rtx, final_adjust, !frame_pointer_needed);
  }
  
  /* Return TRUE if we can use a simple_return insn.
@@ -3859,6 +4747,13 @@ aarch64_expand_epilogue (bool for_sibcall)
    unsigned reg2 = cfun->machine->frame.wb_candidate2;
    rtx cfi_ops = NULL;
    rtx_insn *insn;
+  /* A stack clash protection prologue may not have left IP0_REGNUM or
+     IP1_REGNUM in a usable state.  The same is true for allocations
+     with an SVE component, since we then need both temporary registers
+     for each allocation.  */
+  bool can_inherit_p = (initial_adjust.is_constant ()
+                       && final_adjust.is_constant ()
+                       && !flag_stack_clash_protection);
  
    /* We need to add memory barrier to prevent read from deallocated stack.  */
    bool need_barrier_p
@@ -3884,9 +4779,10 @@ aarch64_expand_epilogue (bool for_sibcall)
         is restored on the instruction doing the writeback.  */
      aarch64_add_offset (Pmode, stack_pointer_rtx,
                         hard_frame_pointer_rtx, -callee_offset,
-                       ip1_rtx, callee_adjust == 0);
+                       ip1_rtx, ip0_rtx, callee_adjust == 0);
    else
-    aarch64_add_sp (ip1_rtx, final_adjust, df_regs_ever_live_p (IP1_REGNUM));
+    aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust,
+                   !can_inherit_p || df_regs_ever_live_p (IP1_REGNUM));
  
    aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
                                 callee_adjust != 0, &cfi_ops);
@@ -3909,7 +4805,8 @@ aarch64_expand_epilogue (bool for_sibcall)
        cfi_ops = NULL;
      }
  
-  aarch64_add_sp (ip0_rtx, initial_adjust, df_regs_ever_live_p (IP0_REGNUM));
+  aarch64_add_sp (ip0_rtx, ip1_rtx, initial_adjust,
+                 !can_inherit_p || df_regs_ever_live_p (IP0_REGNUM));
  
    if (cfi_ops)
      {
@@ -4019,7 +4916,7 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
    temp1 = gen_rtx_REG (Pmode, IP1_REGNUM);
  
    if (vcall_offset == 0)
-    aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, false);
+    aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, temp0, false);
    else
      {
        gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0);
@@ -4031,8 +4928,8 @@ aarch64_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
             addr = gen_rtx_PRE_MODIFY (Pmode, this_rtx,
                                        plus_constant (Pmode, this_rtx, delta));
           else
-           aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1,
-                               false);
+           aarch64_add_offset (Pmode, this_rtx, this_rtx, delta,
+                               temp1, temp0, false);
         }
  
        if (Pmode == ptr_mode)
@@ -4126,11 +5023,27 @@ aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode)
      }
    else
      {
-      /* Ignore sign extension.  */
-      val &= (HOST_WIDE_INT) 0xffffffff;
+      /* Ignore sign extension.  */
+      val &= (HOST_WIDE_INT) 0xffffffff;
+    }
+  return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) == val
+         || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
+}
+
+/* VAL is a value with the inner mode of MODE.  Replicate it to fill a
+   64-bit (DImode) integer.  */
+
+static unsigned HOST_WIDE_INT
+aarch64_replicate_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
+{
+  unsigned int size = GET_MODE_UNIT_PRECISION (mode);
+  while (size < 64)
+    {
+      val &= (HOST_WIDE_INT_1U << size) - 1;
+      val |= val << size;
+      size *= 2;
      }
-  return ((val & (((HOST_WIDE_INT) 0xffff) << 0)) == val
-         || (val & (((HOST_WIDE_INT) 0xffff) << 16)) == val);
+  return val;
  }
  
  /* Multipliers for repeating bitmasks of width 32, 16, 8, 4, and 2.  */
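aarch64_replicate_bitmask_imm in the hunk above widens an element-sized value to 64 bits by repeated doubling. The standalone sketch below reproduces that loop with uint64_t in place of unsigned HOST_WIDE_INT and takes the element width in bits directly instead of deriving it from a machine mode. The name replicate_to_64 is hypothetical, and SIZE is assumed to be a power of two no greater than 64.

```cpp
#include <cassert>
#include <cstdint>

// Replicate the low SIZE bits of VAL until they fill 64 bits.
uint64_t replicate_to_64(uint64_t val, unsigned size) {
  while (size < 64) {
    val &= (uint64_t{1} << size) - 1;  // drop any sign-extension bits
    val |= val << size;                // duplicate into the next block
    size *= 2;
  }
  return val;
}
```

Masking before each doubling means a sign-extended input such as 0xffffffffffff1234 with 16-bit elements still yields the pure repetition 0x1234123412341234, which is the form the bitmask-immediate check then inspects.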
@@ -4155,7 +5068,7 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
  
    /* Check for a single sequence of one bits and return quickly if so.
       The special cases of all ones and all zeroes returns false.  */
-  val = (unsigned HOST_WIDE_INT) val_in;
+  val = aarch64_replicate_bitmask_imm (val_in, mode);
    tmp = val + (val & -val);
  
    if (tmp == (tmp & -tmp))
@@ -4257,10 +5170,16 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
    if (GET_CODE (x) == HIGH)
      return true;
  
+  /* There's no way to calculate VL-based values using relocations.  */
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, x, ALL)
+    if (GET_CODE (*iter) == CONST_POLY_INT)
+      return true;
+
    split_const (x, &base, &offset);
    if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF)
      {
-      if (aarch64_classify_symbol (base, offset)
+      if (aarch64_classify_symbol (base, INTVAL (offset))
           != SYMBOL_FORCE_TO_MEM)
         return true;
        else
@@ -4496,10 +5415,21 @@ aarch64_classify_index (struct aarch64_address_info *info, rtx x,
        && contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))])
      index = SUBREG_REG (index);
  
-  if ((shift == 0
-       || (shift > 0 && shift <= 3
-          && known_eq (1 << shift, GET_MODE_SIZE (mode))))
-      && REG_P (index)
+  if (aarch64_sve_data_mode_p (mode))
+    {
+      if (type != ADDRESS_REG_REG
+         || (1 << shift) != GET_MODE_UNIT_SIZE (mode))
+       return false;
+    }
+  else
+    {
+      if (shift != 0
+         && !(IN_RANGE (shift, 1, 3)
+              && known_eq (1 << shift, GET_MODE_SIZE (mode))))
+       return false;
+    }
+
+  if (REG_P (index)
        && aarch64_regno_ok_for_index_p (REGNO (index), strict_p))
      {
        info->type = type;
@@ -4552,23 +5482,34 @@ aarch64_classify_address (struct aarch64_address_info *info,
  
    /* On BE, we use load/store pair for all large int mode load/stores.
       TI/TFmode may also use a load/store pair.  */
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  bool advsimd_struct_p = (vec_flags == (VEC_ADVSIMD | VEC_STRUCT));
    bool load_store_pair_p = (type == ADDR_QUERY_LDP_STP
                             || mode == TImode
                             || mode == TFmode
-                           || (BYTES_BIG_ENDIAN
-                               && aarch64_vect_struct_mode_p (mode)));
+                           || (BYTES_BIG_ENDIAN && advsimd_struct_p));
  
    bool allow_reg_index_p = (!load_store_pair_p
-                           && (maybe_ne (GET_MODE_SIZE (mode), 16)
-                               || aarch64_vector_mode_supported_p (mode))
-                           && !aarch64_vect_struct_mode_p (mode));
+                           && (known_lt (GET_MODE_SIZE (mode), 16)
+                               || vec_flags == VEC_ADVSIMD
+                               || vec_flags == VEC_SVE_DATA));
+
+  /* For SVE, only accept [Rn], [Rn, Rm, LSL #shift] and
+     [Rn, #offset, MUL VL].  */
+  if ((vec_flags & (VEC_SVE_DATA | VEC_SVE_PRED)) != 0
+      && (code != REG && code != PLUS))
+    return false;
  
    /* On LE, for AdvSIMD, don't support anything other than POST_INC or
       REG addressing.  */
-  if (aarch64_vect_struct_mode_p (mode) && !BYTES_BIG_ENDIAN
+  if (advsimd_struct_p
+      && !BYTES_BIG_ENDIAN
        && (code != POST_INC && code != REG))
      return false;
  
+  gcc_checking_assert (GET_MODE (x) == VOIDmode
+                      || SCALAR_INT_MODE_P (GET_MODE (x)));
+
    switch (code)
      {
      case REG:
@@ -4641,6 +5582,17 @@ aarch64_classify_address (struct aarch64_address_info *info,
                     && aarch64_offset_7bit_signed_scaled_p (TImode,
                                                             offset + 32));
  
+         /* Make "m" use the LD1 offset range for SVE data modes, so
+            that pre-RTL optimizers like ivopts will work to that
+            instead of the wider LDR/STR range.  */
+         if (vec_flags == VEC_SVE_DATA)
+           return (type == ADDR_QUERY_M
+                   ? offset_4bit_signed_scaled_p (mode, offset)
+                   : offset_9bit_signed_scaled_p (mode, offset));
+
+         if (vec_flags == VEC_SVE_PRED)
+           return offset_9bit_signed_scaled_p (mode, offset);
+
           if (load_store_pair_p)
             return ((known_eq (GET_MODE_SIZE (mode), 4)
                      || known_eq (GET_MODE_SIZE (mode), 8))
@@ -4741,7 +5693,8 @@ aarch64_classify_address (struct aarch64_address_info *info,
           rtx sym, offs;
           split_const (info->offset, &sym, &offs);
           if (GET_CODE (sym) == SYMBOL_REF
-             && (aarch64_classify_symbol (sym, offs) == SYMBOL_SMALL_ABSOLUTE))
+             && (aarch64_classify_symbol (sym, INTVAL (offs))
+                 == SYMBOL_SMALL_ABSOLUTE))
             {
               /* The symbol and offset must be aligned to the access size.  */
               unsigned int align;
@@ -4812,7 +5765,7 @@ aarch64_classify_symbolic_expression (rtx x)
    rtx offset;
  
    split_const (x, &x, &offset);
-  return aarch64_classify_symbol (x, offset);
+  return aarch64_classify_symbol (x, INTVAL (offset));
  }
  
  
@@ -5265,6 +6218,33 @@ aarch64_const_vec_all_same_int_p (rtx x, HOST_WIDE_INT val)
    return aarch64_const_vec_all_same_in_range_p (x, val, val);
  }
  
+/* Return true if VEC is a constant in which every element is in the range
+   [MINVAL, MAXVAL].  The elements do not need to have the same value.  */
+
+static bool
+aarch64_const_vec_all_in_range_p (rtx vec,
+                                 HOST_WIDE_INT minval,
+                                 HOST_WIDE_INT maxval)
+{
+  if (GET_CODE (vec) != CONST_VECTOR
+      || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
+    return false;
+
+  int nunits;
+  if (!CONST_VECTOR_STEPPED_P (vec))
+    nunits = const_vector_encoded_nelts (vec);
+  else if (!CONST_VECTOR_NUNITS (vec).is_constant (&nunits))
+    return false;
+
+  for (int i = 0; i < nunits; i++)
+    {
+      rtx vec_elem = CONST_VECTOR_ELT (vec, i);
+      if (!CONST_INT_P (vec_elem)
+         || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
+       return false;
+    }
+  return true;
+}
  
  /* N Z C V.  */
  #define AARCH64_CC_V 1
@@ -5293,10 +6273,43 @@ static const int aarch64_nzcv_codes[] =
    0            /* NV, Any.  */
  };
  
+/* Print floating-point vector immediate operand X to F, negating it
+   first if NEGATE is true.  Return true on success, false if it isn't
+   a constant we can handle.  */
+
+static bool
+aarch64_print_vector_float_operand (FILE *f, rtx x, bool negate)
+{
+  rtx elt;
+
+  if (!const_vec_duplicate_p (x, &elt))
+    return false;
+
+  REAL_VALUE_TYPE r = *CONST_DOUBLE_REAL_VALUE (elt);
+  if (negate)
+    r = real_value_negate (&r);
+
+  /* We only handle the SVE single-bit immediates here.  */
+  if (real_equal (&r, &dconst0))
+    asm_fprintf (f, "0.0");
+  else if (real_equal (&r, &dconst1))
+    asm_fprintf (f, "1.0");
+  else if (real_equal (&r, &dconsthalf))
+    asm_fprintf (f, "0.5");
+  else
+    return false;
+
+  return true;
+}
+
  /* Print operand X to file F in a target specific manner according to CODE.
     The acceptable formatting commands given by CODE are:
       'c':              An integer or symbol address without a preceding #
                         sign.
+     'C':              Take the duplicated element in a vector constant
+                       and print it in hex.
+     'D':              Take the duplicated element in a vector constant
+                       and print it as an unsigned integer, in decimal.
       'e':              Print the sign/zero-extend size as a character 8->b,
                         16->h, 32->w.
       'p':              Prints N such that 2^N == X (X must be power of 2 and
@@ -5306,6 +6319,8 @@ static const int aarch64_nzcv_codes[] =
                         of regs.
       'm':              Print a condition (eq, ne, etc).
       'M':              Same as 'm', but invert condition.
+     'N':              Take the duplicated element in a vector constant
+                       and print the negative of it in decimal.
       'b/h/s/d/q':      Print a scalar FP/SIMD register name.
       'S/T/U/V':                Print a FP/SIMD register name for a register list.
                         The register printed is the FP/SIMD register name
@@ -5332,6 +6347,7 @@ static const int aarch64_nzcv_codes[] =
  static void
  aarch64_print_operand (FILE *f, rtx x, int code)
  {
+  rtx elt;
    switch (code)
      {
      case 'c':
@@ -5448,6 +6464,25 @@ aarch64_print_operand (FILE *f, rtx x, int code)
        }
        break;
  
+    case 'N':
+      if (!const_vec_duplicate_p (x, &elt))
+       {
+         output_operand_lossage ("invalid vector constant");
+         return;
+       }
+
+      if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+       asm_fprintf (f, "%wd", -INTVAL (elt));
+      else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT
+              && aarch64_print_vector_float_operand (f, x, true))
+       ;
+      else
+       {
+         output_operand_lossage ("invalid vector constant");
+         return;
+       }
+      break;
+
      case 'b':
      case 'h':
      case 's':
@@ -5470,7 +6505,9 @@ aarch64_print_operand (FILE *f, rtx x, int code)
           output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code);
           return;
         }
-      asm_fprintf (f, "v%d", REGNO (x) - V0_REGNUM + (code - 'S'));
+      asm_fprintf (f, "%c%d",
+                  aarch64_sve_data_mode_p (GET_MODE (x)) ? 'z' : 'v',
+                  REGNO (x) - V0_REGNUM + (code - 'S'));
        break;
  
      case 'R':
@@ -5491,6 +6528,33 @@ aarch64_print_operand (FILE *f, rtx x, int code)
        asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff);
        break;
  
+    case 'C':
+      {
+       /* Print a replicated constant in hex.  */
+       if (!const_vec_duplicate_p (x, &elt) || !CONST_INT_P (elt))
+         {
+           output_operand_lossage ("invalid operand for '%%%c'", code);
+           return;
+         }
+       scalar_mode inner_mode = GET_MODE_INNER (GET_MODE (x));
+       asm_fprintf (f, "0x%wx", UINTVAL (elt) & GET_MODE_MASK (inner_mode));
+      }
+      break;
+
+    case 'D':
+      {
+       /* Print a replicated constant in decimal, treating it as
+          unsigned.  */
+       if (!const_vec_duplicate_p (x, &elt) || !CONST_INT_P (elt))
+         {
+           output_operand_lossage ("invalid operand for '%%%c'", code);
+           return;
+         }
+       scalar_mode inner_mode = GET_MODE_INNER (GET_MODE (x));
+       asm_fprintf (f, "%wd", UINTVAL (elt) & GET_MODE_MASK (inner_mode));
+      }
+      break;
+
      case 'w':
      case 'x':
        if (x == const0_rtx
@@ -5524,14 +6588,16 @@ aarch64_print_operand (FILE *f, rtx x, int code)
        switch (GET_CODE (x))
         {
         case REG:
-         asm_fprintf (f, "%s", reg_names [REGNO (x)]);
+         if (aarch64_sve_data_mode_p (GET_MODE (x)))
+           asm_fprintf (f, "z%d", REGNO (x) - V0_REGNUM);
+         else
+           asm_fprintf (f, "%s", reg_names [REGNO (x)]);
           break;
  
         case MEM:
           output_address (GET_MODE (x), XEXP (x, 0));
           break;
  
-       case CONST:
         case LABEL_REF:
         case SYMBOL_REF:
           output_addr_const (asm_out_file, x);
@@ -5541,21 +6607,31 @@ aarch64_print_operand (FILE *f, rtx x, int code)
           asm_fprintf (f, "%wd", INTVAL (x));
           break;
  
-       case CONST_VECTOR:
-         if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+       case CONST:
+         if (!VECTOR_MODE_P (GET_MODE (x)))
             {
-             gcc_assert (
-                 aarch64_const_vec_all_same_in_range_p (x,
-                                                        HOST_WIDE_INT_MIN,
-                                                        HOST_WIDE_INT_MAX));
-             asm_fprintf (f, "%wd", INTVAL (CONST_VECTOR_ELT (x, 0)));
+             output_addr_const (asm_out_file, x);
+             break;
             }
-         else if (aarch64_simd_imm_zero_p (x, GET_MODE (x)))
+         /* fall through */
+
+       case CONST_VECTOR:
+         if (!const_vec_duplicate_p (x, &elt))
             {
-             fputc ('0', f);
+             output_operand_lossage ("invalid vector constant");
+             return;
             }
+
+         if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+           asm_fprintf (f, "%wd", INTVAL (elt));
+         else if (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT
+                  && aarch64_print_vector_float_operand (f, x, false))
+           ;
           else
-           gcc_unreachable ();
+           {
+             output_operand_lossage ("invalid vector constant");
+             return;
+           }
           break;
  
         case CONST_DOUBLE:
@@ -5740,6 +6816,22 @@ aarch64_print_address_internal (FILE *f, machine_mode mode, rtx x,
        case ADDRESS_REG_IMM:
         if (known_eq (addr.const_offset, 0))
           asm_fprintf (f, "[%s]", reg_names [REGNO (addr.base)]);
+       else if (aarch64_sve_data_mode_p (mode))
+         {
+           HOST_WIDE_INT vnum
+             = exact_div (addr.const_offset,
+                          BYTES_PER_SVE_VECTOR).to_constant ();
+           asm_fprintf (f, "[%s, #%wd, mul vl]",
+                        reg_names[REGNO (addr.base)], vnum);
+         }
+       else if (aarch64_sve_pred_mode_p (mode))
+         {
+           HOST_WIDE_INT vnum
+             = exact_div (addr.const_offset,
+                          BYTES_PER_SVE_PRED).to_constant ();
+           asm_fprintf (f, "[%s, #%wd, mul vl]",
+                        reg_names[REGNO (addr.base)], vnum);
+         }
         else
           asm_fprintf (f, "[%s, %wd]", reg_names [REGNO (addr.base)],
                        INTVAL (addr.offset));
@@ -5827,7 +6919,7 @@ aarch64_print_ldpstp_address (FILE *f, machine_mode mode, rtx x)
  static void
  aarch64_print_operand_address (FILE *f, machine_mode mode, rtx x)
  {
-  if (!aarch64_print_address_internal (f, mode, x, ADDR_QUERY_M))
+  if (!aarch64_print_address_internal (f, mode, x, ADDR_QUERY_ANY))
      output_addr_const (f, x);
  }
  
@@ -5882,6 +6974,9 @@ aarch64_regno_regclass (unsigned regno)
    if (FP_REGNUM_P (regno))
      return FP_LO_REGNUM_P (regno) ?  FP_LO_REGS : FP_REGS;
  
+  if (PR_REGNUM_P (regno))
+    return PR_LO_REGNUM_P (regno) ? PR_LO_REGS : PR_HI_REGS;
+
    return NO_REGS;
  }
  
@@ -6035,6 +7130,14 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x,
                           machine_mode mode,
                           secondary_reload_info *sri)
  {
+  if (BYTES_BIG_ENDIAN
+      && reg_class_subset_p (rclass, FP_REGS)
+      && (MEM_P (x) || (REG_P (x) && !HARD_REGISTER_P (x)))
+      && aarch64_sve_data_mode_p (mode))
+    {
+      sri->icode = CODE_FOR_aarch64_sve_reload_be;
+      return NO_REGS;
+    }
  
    /* If we have to disable direct literal pool loads and stores because the
       function is too big, then we need a scratch register.  */
@@ -6176,6 +7279,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
       can hold MODE, but at the moment we need to handle all modes.
       Just ignore any runtime parts for registers that can't store them.  */
    HOST_WIDE_INT lowest_size = constant_lower_bound (GET_MODE_SIZE (mode));
+  unsigned int nregs;
    switch (regclass)
      {
      case CALLER_SAVE_REGS:
@@ -6185,10 +7289,17 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
      case POINTER_AND_FP_REGS:
      case FP_REGS:
      case FP_LO_REGS:
-      return (aarch64_vector_mode_p (mode)
+      if (aarch64_sve_data_mode_p (mode)
+         && constant_multiple_p (GET_MODE_SIZE (mode),
+                                 BYTES_PER_SVE_VECTOR, &nregs))
+       return nregs;
+      return (aarch64_vector_data_mode_p (mode)
               ? CEIL (lowest_size, UNITS_PER_VREG)
               : CEIL (lowest_size, UNITS_PER_WORD));
      case STACK_REG:
+    case PR_REGS:
+    case PR_LO_REGS:
+    case PR_HI_REGS:
        return 1;
  
      case NO_REGS:
@@ -7497,8 +8608,8 @@ cost_plus:
           }
  
         if (GET_MODE_CLASS (mode) == MODE_INT
-           && CONST_INT_P (op1)
-           && aarch64_uimm12_shift (INTVAL (op1)))
+           && ((CONST_INT_P (op1) && aarch64_uimm12_shift (INTVAL (op1)))
+               || aarch64_sve_addvl_addpl_immediate (op1, mode)))
           {
             *cost += rtx_cost (op0, mode, PLUS, 0, speed);
  
@@ -9415,6 +10526,21 @@ aarch64_get_arch (enum aarch64_arch arch)
    return &all_architectures[cpu->arch];
  }
  
+/* Return the VG value associated with -msve-vector-bits= value VALUE.  */
+
+static poly_uint16
+aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits_enum value)
+{
+  /* For now generate vector-length agnostic code for -msve-vector-bits=128.
+     This ensures we can clearly distinguish SVE and Advanced SIMD modes when
+     deciding which .md file patterns to use and when deciding whether
+     something is a legitimate address or constant.  */
+  if (value == SVE_SCALABLE || value == SVE_128)
+    return poly_uint16 (2, 2);
+  else
+    return (int) value / 64;
+}
+
  /* Implement TARGET_OPTION_OVERRIDE.  This is called once in the beginning
     and is used to parse the -m{cpu,tune,arch} strings and setup the initial
     tuning structs.  In particular it must set selected_tune and
@@ -9516,6 +10642,9 @@ aarch64_override_options (void)
      error ("assembler does not support -mabi=ilp32");
  #endif
  
+  /* Convert -msve-vector-bits to a VG count.  */
+  aarch64_sve_vg = aarch64_convert_sve_vector_bits (aarch64_sve_vector_bits);
+
    if (aarch64_ra_sign_scope != AARCH64_FUNCTION_NONE && TARGET_ILP32)
      sorry ("return address signing is only supported for -mabi=lp64");
  
@@ -10392,11 +11521,11 @@ aarch64_classify_tls_symbol (rtx x)
      }
  }
  
-/* Return the method that should be used to access SYMBOL_REF or
-   LABEL_REF X.  */
+/* Return the correct method for accessing X + OFFSET, where X is either
+   a SYMBOL_REF or LABEL_REF.  */
  
  enum aarch64_symbol_type
-aarch64_classify_symbol (rtx x, rtx offset)
+aarch64_classify_symbol (rtx x, HOST_WIDE_INT offset)
  {
    if (GET_CODE (x) == LABEL_REF)
      {
@@ -10439,7 +11568,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
              resolve to a symbol in this module, then force to memory.  */
           if ((SYMBOL_REF_WEAK (x)
                && !aarch64_symbol_binds_local_p (x))
-             || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+             || !IN_RANGE (offset, -1048575, 1048575))
             return SYMBOL_FORCE_TO_MEM;
           return SYMBOL_TINY_ABSOLUTE;
  
@@ -10448,7 +11577,7 @@ aarch64_classify_symbol (rtx x, rtx offset)
              4G.  */
           if ((SYMBOL_REF_WEAK (x)
                && !aarch64_symbol_binds_local_p (x))
-             || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
+             || !IN_RANGE (offset, HOST_WIDE_INT_C (-4294967263),
                             HOST_WIDE_INT_C (4294967264)))
             return SYMBOL_FORCE_TO_MEM;
           return SYMBOL_SMALL_ABSOLUTE;
@@ -10511,28 +11640,46 @@ aarch64_legitimate_constant_p (machine_mode mode, rtx x)
    if (CONST_INT_P (x) || CONST_DOUBLE_P (x) || GET_CODE (x) == CONST_VECTOR)
      return true;
  
-  /* Do not allow vector struct mode constants.  We could support
-     0 and -1 easily, but they need support in aarch64-simd.md.  */
-  if (aarch64_vect_struct_mode_p (mode))
+  /* Do not allow vector struct mode constants for Advanced SIMD.
+     We could support 0 and -1 easily, but they need support in
+     aarch64-simd.md.  */
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
      return false;
  
    /* Do not allow wide int constants - this requires support in movti.  */
    if (CONST_WIDE_INT_P (x))
      return false;
  
+  /* Only accept variable-length vector constants if they can be
+     handled directly.
+
+     ??? It would be possible to handle rematerialization of other
+     constants via secondary reloads.  */
+  if (vec_flags & VEC_ANY_SVE)
+    return aarch64_simd_valid_immediate (x, NULL);
+
    if (GET_CODE (x) == HIGH)
      x = XEXP (x, 0);
  
-  /* Do not allow const (plus (anchor_symbol, const_int)).  */
-  if (GET_CODE (x) == CONST)
-    {
-      rtx offset;
-
-      split_const (x, &x, &offset);
+  /* Accept polynomial constants that can be calculated by using the
+     destination of a move as the sole temporary.  Constants that
+     require a second temporary cannot be rematerialized (they can't be
+     forced to memory and also aren't legitimate constants).  */
+  poly_int64 offset;
+  if (poly_int_rtx_p (x, &offset))
+    return aarch64_offset_temporaries (false, offset) <= 1;
+
+  /* If an offset is being added to something else, we need to allow the
+     base to be moved into the destination register, meaning that there
+     are no free temporaries for the offset.  */
+  x = strip_offset (x, &offset);
+  if (!offset.is_constant () && aarch64_offset_temporaries (true, offset) > 0)
+    return false;
  
-      if (SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x))
-       return false;
-    }
+  /* Do not allow const (plus (anchor_symbol, const_int)).  */
+  if (maybe_ne (offset, 0) && SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x))
+    return false;
  
    /* Treat symbols as constants.  Avoid TLS symbols as they are complex,
       so spilling them is better than rematerialization.  */
@@ -11079,6 +12226,12 @@ aarch64_conditional_register_usage (void)
           call_used_regs[i] = 1;
         }
      }
+  if (!TARGET_SVE)
+    for (i = P0_REGNUM; i <= P15_REGNUM; i++)
+      {
+       fixed_regs[i] = 1;
+       call_used_regs[i] = 1;
+      }
  }
  
  /* Walk down the type tree of TYPE counting consecutive base elements.
@@ -11372,28 +12525,40 @@ aarch64_struct_value_rtx (tree fndecl ATTRIBUTE_UNUSED,
  static bool
  aarch64_vector_mode_supported_p (machine_mode mode)
  {
-  if (TARGET_SIMD
-      && (mode == V4SImode  || mode == V8HImode
-         || mode == V16QImode || mode == V2DImode
-         || mode == V2SImode  || mode == V4HImode
-         || mode == V8QImode || mode == V2SFmode
-         || mode == V4SFmode || mode == V2DFmode
-         || mode == V4HFmode || mode == V8HFmode
-         || mode == V1DFmode))
-    return true;
-
-  return false;
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  return vec_flags != 0 && (vec_flags & VEC_STRUCT) == 0;
  }
  
  /* Return appropriate SIMD container
     for MODE within a vector of WIDTH bits.  */
  static machine_mode
-aarch64_simd_container_mode (scalar_mode mode, unsigned width)
+aarch64_simd_container_mode (scalar_mode mode, poly_int64 width)
  {
-  gcc_assert (width == 64 || width == 128);
+  if (TARGET_SVE && known_eq (width, BITS_PER_SVE_VECTOR))
+    switch (mode)
+      {
+      case E_DFmode:
+       return VNx2DFmode;
+      case E_SFmode:
+       return VNx4SFmode;
+      case E_HFmode:
+       return VNx8HFmode;
+      case E_DImode:
+       return VNx2DImode;
+      case E_SImode:
+       return VNx4SImode;
+      case E_HImode:
+       return VNx8HImode;
+      case E_QImode:
+       return VNx16QImode;
+      default:
+       return word_mode;
+      }
+
+  gcc_assert (known_eq (width, 64) || known_eq (width, 128));
    if (TARGET_SIMD)
      {
-      if (width == 128)
+      if (known_eq (width, 128))
         switch (mode)
           {
           case E_DFmode:
@@ -11437,7 +12602,8 @@ aarch64_simd_container_mode (scalar_mode mode, unsigned width)
  static machine_mode
  aarch64_preferred_simd_mode (scalar_mode mode)
  {
-  return aarch64_simd_container_mode (mode, 128);
+  poly_int64 bits = TARGET_SVE ? BITS_PER_SVE_VECTOR : 128;
+  return aarch64_simd_container_mode (mode, bits);
  }
  
  /* Return a list of possible vector sizes for the vectorizer
@@ -11445,6 +12611,8 @@ aarch64_preferred_simd_mode (scalar_mode mode)
  static void
  aarch64_autovectorize_vector_sizes (vector_sizes *sizes)
  {
+  if (TARGET_SVE)
+    sizes->safe_push (BYTES_PER_SVE_VECTOR);
    sizes->safe_push (16);
    sizes->safe_push (8);
  }
@@ -11606,6 +12774,125 @@ sizetochar (int size)
      }
  }
  
+/* Return true if BASE_OR_STEP is a valid immediate operand for an SVE INDEX
+   instruction.  */
+
+bool
+aarch64_sve_index_immediate_p (rtx base_or_step)
+{
+  return (CONST_INT_P (base_or_step)
+         && IN_RANGE (INTVAL (base_or_step), -16, 15));
+}
+
+/* Return true if X is a valid immediate for the SVE ADD and SUB
+   instructions.  Negate X first if NEGATE_P is true.  */
+
+bool
+aarch64_sve_arith_immediate_p (rtx x, bool negate_p)
+{
+  rtx elt;
+
+  if (!const_vec_duplicate_p (x, &elt)
+      || !CONST_INT_P (elt))
+    return false;
+
+  HOST_WIDE_INT val = INTVAL (elt);
+  if (negate_p)
+    val = -val;
+  val &= GET_MODE_MASK (GET_MODE_INNER (GET_MODE (x)));
+
+  if (val & 0xff)
+    return IN_RANGE (val, 0, 0xff);
+  return IN_RANGE (val, 0, 0xff00);
+}
+
+/* Return true if X is a valid immediate operand for an SVE logical
+   instruction such as AND.  */
+
+bool
+aarch64_sve_bitmask_immediate_p (rtx x)
+{
+  rtx elt;
+
+  return (const_vec_duplicate_p (x, &elt)
+         && CONST_INT_P (elt)
+         && aarch64_bitmask_imm (INTVAL (elt),
+                                 GET_MODE_INNER (GET_MODE (x))));
+}
+
+/* Return true if X is a valid immediate for the SVE DUP and CPY
+   instructions.  */
+
+bool
+aarch64_sve_dup_immediate_p (rtx x)
+{
+  rtx elt;
+
+  if (!const_vec_duplicate_p (x, &elt)
+      || !CONST_INT_P (elt))
+    return false;
+
+  HOST_WIDE_INT val = INTVAL (elt);
+  if (val & 0xff)
+    return IN_RANGE (val, -0x80, 0x7f);
+  return IN_RANGE (val, -0x8000, 0x7f00);
+}
+
+/* Return true if X is a valid immediate operand for an SVE CMP instruction.
+   SIGNED_P says whether the operand is signed rather than unsigned.  */
+
+bool
+aarch64_sve_cmp_immediate_p (rtx x, bool signed_p)
+{
+  rtx elt;
+
+  return (const_vec_duplicate_p (x, &elt)
+         && CONST_INT_P (elt)
+         && (signed_p
+             ? IN_RANGE (INTVAL (elt), -16, 15)
+             : IN_RANGE (INTVAL (elt), 0, 127)));
+}
+
+/* Return true if X is a valid immediate operand for an SVE FADD or FSUB
+   instruction.  Negate X first if NEGATE_P is true.  */
+
+bool
+aarch64_sve_float_arith_immediate_p (rtx x, bool negate_p)
+{
+  rtx elt;
+  REAL_VALUE_TYPE r;
+
+  if (!const_vec_duplicate_p (x, &elt)
+      || GET_CODE (elt) != CONST_DOUBLE)
+    return false;
+
+  r = *CONST_DOUBLE_REAL_VALUE (elt);
+
+  if (negate_p)
+    r = real_value_negate (&r);
+
+  if (real_equal (&r, &dconst1))
+    return true;
+  if (real_equal (&r, &dconsthalf))
+    return true;
+  return false;
+}
+
+/* Return true if X is a valid immediate operand for an SVE FMUL
+   instruction.  */
+
+bool
+aarch64_sve_float_mul_immediate_p (rtx x)
+{
+  rtx elt;
+
+  /* GCC will never generate a multiply with an immediate of 2, so there is no
+     point testing for it (even though it is a valid constant).  */
+  return (const_vec_duplicate_p (x, &elt)
+         && GET_CODE (elt) == CONST_DOUBLE
+         && real_equal (CONST_DOUBLE_REAL_VALUE (elt), &dconsthalf));
+}
+
  /* Return true if replicating VAL32 is a valid 2-byte or 4-byte immediate
     for the Advanced SIMD operation described by WHICH and INSN.  If INFO
     is nonnull, use it to describe valid immediates.  */
@@ -11710,6 +12997,52 @@ aarch64_advsimd_valid_immediate (unsigned HOST_WIDE_INT val64,
    return false;
  }
  
+/* Return true if replicating VAL64 gives a valid immediate for an SVE MOV
+   instruction.  If INFO is nonnull, use it to describe valid immediates.  */
+
+static bool
+aarch64_sve_valid_immediate (unsigned HOST_WIDE_INT val64,
+                            simd_immediate_info *info)
+{
+  scalar_int_mode mode = DImode;
+  unsigned int val32 = val64 & 0xffffffff;
+  if (val32 == (val64 >> 32))
+    {
+      mode = SImode;
+      unsigned int val16 = val32 & 0xffff;
+      if (val16 == (val32 >> 16))
+       {
+         mode = HImode;
+         unsigned int val8 = val16 & 0xff;
+         if (val8 == (val16 >> 8))
+           mode = QImode;
+       }
+    }
+  HOST_WIDE_INT val = trunc_int_for_mode (val64, mode);
+  if (IN_RANGE (val, -0x80, 0x7f))
+    {
+      /* DUP with no shift.  */
+      if (info)
+       *info = simd_immediate_info (mode, val);
+      return true;
+    }
+  if ((val & 0xff) == 0 && IN_RANGE (val, -0x8000, 0x7f00))
+    {
+      /* DUP with LSL #8.  */
+      if (info)
+       *info = simd_immediate_info (mode, val);
+      return true;
+    }
+  if (aarch64_bitmask_imm (val64, mode))
+    {
+      /* DUPM.  */
+      if (info)
+       *info = simd_immediate_info (mode, val);
+      return true;
+    }
+  return false;
+}
+
  /* Return true if OP is a valid SIMD immediate for the operation
     described by WHICH.  If INFO is nonnull, use it to describe valid
     immediates.  */
@@ -11717,18 +13050,39 @@ bool
  aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
                               enum simd_immediate_check which)
  {
-  rtx elt = NULL;
+  machine_mode mode = GET_MODE (op);
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  if (vec_flags == 0 || vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
+    return false;
+
+  scalar_mode elt_mode = GET_MODE_INNER (mode);
+  rtx elt = NULL, base, step;
    unsigned int n_elts;
    if (const_vec_duplicate_p (op, &elt))
      n_elts = 1;
+  else if ((vec_flags & VEC_SVE_DATA)
+          && const_vec_series_p (op, &base, &step))
+    {
+      gcc_assert (GET_MODE_CLASS (mode) == MODE_VECTOR_INT);
+      if (!aarch64_sve_index_immediate_p (base)
+         || !aarch64_sve_index_immediate_p (step))
+       return false;
+
+      if (info)
+       *info = simd_immediate_info (elt_mode, base, step);
+      return true;
+    }
    else if (GET_CODE (op) == CONST_VECTOR
            && CONST_VECTOR_NUNITS (op).is_constant (&n_elts))
      /* N_ELTS set above.  */;
    else
      return false;
  
-  machine_mode mode = GET_MODE (op);
-  scalar_mode elt_mode = GET_MODE_INNER (mode);
+  /* Handle PFALSE and PTRUE.  */
+  if (vec_flags & VEC_SVE_PRED)
+    return (op == CONST0_RTX (mode)
+           || op == CONSTM1_RTX (mode));
+
    scalar_float_mode elt_float_mode;
    if (elt
        && is_a <scalar_float_mode> (elt_mode, &elt_float_mode)
@@ -11785,7 +13139,24 @@ aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
      val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes]
               << (i * BITS_PER_UNIT));
  
-  return aarch64_advsimd_valid_immediate (val64, info, which);
+  if (vec_flags & VEC_SVE_DATA)
+    return aarch64_sve_valid_immediate (val64, info);
+  else
+    return aarch64_advsimd_valid_immediate (val64, info, which);
+}
+
+/* Check whether X is a VEC_SERIES-like constant that starts at 0 and
+   has a step in the range of INDEX.  Return the index expression if so,
+   otherwise return null.  */
+rtx
+aarch64_check_zero_based_sve_index_immediate (rtx x)
+{
+  rtx base, step;
+  if (const_vec_series_p (x, &base, &step)
+      && base == const0_rtx
+      && aarch64_sve_index_immediate_p (step))
+    return step;
+  return NULL_RTX;
  }
  
  /* Check of immediate shift constants are within range.  */
@@ -11799,16 +13170,6 @@ aarch64_simd_shift_imm_p (rtx x, machine_mode mode, bool left)
      return aarch64_const_vec_all_same_in_range_p (x, 1, bit_width);
  }
  
-/* Return true if X is a uniform vector where all elements
-   are either the floating-point constant 0.0 or the
-   integer constant 0.  */
-bool
-aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
-{
-  return x == CONST0_RTX (mode);
-}
-
-
  /* Return the bitmask CONST_INT to select the bits required by a zero extract
     operation of width WIDTH at bit position POS.  */
  
@@ -11833,9 +13194,15 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
    if (CONST_INT_P (x))
      return true;
  
+  if (VECTOR_MODE_P (GET_MODE (x)))
+    return aarch64_simd_valid_immediate (x, NULL);
+
    if (GET_CODE (x) == SYMBOL_REF && mode == DImode && CONSTANT_ADDRESS_P (x))
      return true;
  
+  if (aarch64_sve_cnt_immediate_p (x))
+    return true;
+
    return aarch64_classify_symbolic_expression (x)
      == SYMBOL_TINY_ABSOLUTE;
  }
@@ -11855,7 +13222,7 @@ aarch64_simd_scalar_immediate_valid_for_move (rtx op, scalar_int_mode mode)
  {
    machine_mode vmode;
  
-  vmode = aarch64_preferred_simd_mode (mode);
+  vmode = aarch64_simd_container_mode (mode, 64);
    rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op));
    return aarch64_simd_valid_immediate (op_v, NULL);
  }
@@ -11965,6 +13332,7 @@ aarch64_endian_lane_rtx (machine_mode mode, unsigned int n)
  }
  
  /* Return TRUE if OP is a valid vector addressing mode.  */
+
  bool
  aarch64_simd_mem_operand_p (rtx op)
  {
@@ -11972,6 +13340,34 @@ aarch64_simd_mem_operand_p (rtx op)
                         || REG_P (XEXP (op, 0)));
  }
  
+/* Return true if OP is a valid MEM operand for an SVE LD1R instruction.  */
+
+bool
+aarch64_sve_ld1r_operand_p (rtx op)
+{
+  struct aarch64_address_info addr;
+  scalar_mode mode;
+
+  return (MEM_P (op)
+         && is_a <scalar_mode> (GET_MODE (op), &mode)
+         && aarch64_classify_address (&addr, XEXP (op, 0), mode, false)
+         && addr.type == ADDRESS_REG_IMM
+         && offset_6bit_unsigned_scaled_p (mode, addr.const_offset));
+}
+
+/* Return true if OP is a valid MEM operand for an SVE LDR instruction.
+   The conditions for STR are the same.  */
+bool
+aarch64_sve_ldr_operand_p (rtx op)
+{
+  struct aarch64_address_info addr;
+
+  return (MEM_P (op)
+         && aarch64_classify_address (&addr, XEXP (op, 0), GET_MODE (op),
+                                      false, ADDR_QUERY_ANY)
+         && addr.type == ADDRESS_REG_IMM);
+}
+
  /* Emit a register copy from operand to operand, taking care not to
     early-clobber source registers in the process.
  
@@ -12006,14 +13402,36 @@ aarch64_simd_attr_length_rglist (machine_mode mode)
  }
  
  /* Implement target hook TARGET_VECTOR_ALIGNMENT.  The AAPCS64 sets the maximum
-   alignment of a vector to 128 bits.  */
+   alignment of a vector to 128 bits.  SVE predicates have an alignment of
+   16 bits.  */
  static HOST_WIDE_INT
  aarch64_simd_vector_alignment (const_tree type)
  {
+  if (TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
+    /* ??? Checking the mode isn't ideal, but VECTOR_BOOLEAN_TYPE_P can
+       be set for non-predicate vectors of booleans.  Modes are the most
+       direct way we have of identifying real SVE predicate types.  */
+    return GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL ? 16 : 128;
    HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
    return MIN (align, 128);
  }
  
+/* Implement target hook TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT.  */
+static HOST_WIDE_INT
+aarch64_vectorize_preferred_vector_alignment (const_tree type)
+{
+  if (aarch64_sve_data_mode_p (TYPE_MODE (type)))
+    {
+      /* If the length of the vector is fixed, try to align to that length,
+        otherwise don't try to align at all.  */
+      HOST_WIDE_INT result;
+      if (!BITS_PER_SVE_VECTOR.is_constant (&result))
+       result = TYPE_ALIGN (TREE_TYPE (type));
+      return result;
+    }
+  return TYPE_ALIGN (type);
+}
+
  /* Implement target hook TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE.  */
  static bool
  aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed)
@@ -12021,9 +13439,12 @@ aarch64_simd_vector_alignment_reachable (const_tree type, bool is_packed)
    if (is_packed)
      return false;
  
-  /* We guarantee alignment for vectors up to 128-bits.  */
-  if (tree_int_cst_compare (TYPE_SIZE (type),
-                           bitsize_int (BIGGEST_ALIGNMENT)) > 0)
+  /* For fixed-length vectors, check that the vectorizer will aim for
+     full-vector alignment.  This isn't true for generic GCC vectors
+     that are wider than the ABI maximum of 128 bits.  */
+  if (TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
+      && (wi::to_widest (TYPE_SIZE (type))
+         != aarch64_vectorize_preferred_vector_alignment (type)))
      return false;
  
    /* Vectors whose size is <= BIGGEST_ALIGNMENT are naturally aligned.  */
@@ -12268,12 +13689,9 @@ aarch64_expand_vector_init (rtx target, rtx vals)
  static unsigned HOST_WIDE_INT
  aarch64_shift_truncation_mask (machine_mode mode)
  {
-  return
-    (!SHIFT_COUNT_TRUNCATED
-     || aarch64_vector_mode_supported_p (mode)
-     || aarch64_vect_struct_mode_p (mode))
-    ? 0
-    : (GET_MODE_UNIT_BITSIZE (mode) - 1);
+  if (!SHIFT_COUNT_TRUNCATED || aarch64_vector_data_mode_p (mode))
+    return 0;
+  return GET_MODE_UNIT_BITSIZE (mode) - 1;
  }
  
  /* Select a format to encode pointers in exception handling data.  */
@@ -13250,6 +14668,67 @@ aarch64_output_scalar_simd_mov_immediate (rtx immediate, scalar_int_mode mode)
    return aarch64_output_simd_mov_immediate (v_op, width);
  }
  
+/* Return the output string to use for moving immediate CONST_VECTOR
+   into an SVE register.  */
+
+char *
+aarch64_output_sve_mov_immediate (rtx const_vector)
+{
+  static char templ[40];
+  struct simd_immediate_info info;
+  char element_char;
+
+  bool is_valid = aarch64_simd_valid_immediate (const_vector, &info);
+  gcc_assert (is_valid);
+
+  element_char = sizetochar (GET_MODE_BITSIZE (info.elt_mode));
+
+  if (info.step)
+    {
+      snprintf (templ, sizeof (templ), "index\t%%0.%c, #"
+               HOST_WIDE_INT_PRINT_DEC ", #" HOST_WIDE_INT_PRINT_DEC,
+               element_char, INTVAL (info.value), INTVAL (info.step));
+      return templ;
+    }
+
+  if (GET_MODE_CLASS (info.elt_mode) == MODE_FLOAT)
+    {
+      if (aarch64_float_const_zero_rtx_p (info.value))
+       info.value = GEN_INT (0);
+      else
+       {
+         const int buf_size = 20;
+         char float_buf[buf_size] = {};
+         real_to_decimal_for_mode (float_buf,
+                                   CONST_DOUBLE_REAL_VALUE (info.value),
+                                   buf_size, buf_size, 1, info.elt_mode);
+
+         snprintf (templ, sizeof (templ), "fmov\t%%0.%c, #%s",
+                   element_char, float_buf);
+         return templ;
+       }
+    }
+
+  snprintf (templ, sizeof (templ), "mov\t%%0.%c, #" HOST_WIDE_INT_PRINT_DEC,
+           element_char, INTVAL (info.value));
+  return templ;
+}
+
+/* Return the asm format for a PTRUE instruction whose destination has
+   mode MODE.  SUFFIX is the element size suffix.  */
+
+char *
+aarch64_output_ptrue (machine_mode mode, char suffix)
+{
+  unsigned int nunits;
+  static char buf[sizeof ("ptrue\t%0.N, vlNNNNN")];
+  if (GET_MODE_NUNITS (mode).is_constant (&nunits))
+    snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, vl%d", suffix, nunits);
+  else
+    snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, all", suffix);
+  return buf;
+}
+
  /* Split operands into moves from op[1] + op[2] into op[0].  */
  
  void
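Note on aarch64_output_ptrue above: the static buffer is sized from the longest template the function can produce, and `%%0` in the format string becomes the `%0` operand placeholder in the returned asm template. A minimal standalone sketch of the same formatting follows. The name `format_ptrue` and the convention that a negative `nunits` models a variable-length mode are assumptions made for illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for aarch64_output_ptrue.  NUNITS < 0 models a
// mode whose number of units is not a compile-time constant, for which
// the "all" pattern is used.
std::string
format_ptrue (char suffix, int nunits)
{
  // Size the buffer from the longest possible result, as the patch does.
  char buf[sizeof ("ptrue\t%0.N, vlNNNNN")];
  assert (nunits < 100000);
  if (nunits >= 0)
    std::snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, vl%d", suffix, nunits);
  else
    std::snprintf (buf, sizeof (buf), "ptrue\t%%0.%c, all", suffix);
  return buf;
}
```

Unlike the patch, the sketch returns a std::string rather than a pointer to a static buffer, so successive calls do not overwrite each other's results.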
@@ -13304,13 +14783,12 @@ aarch64_split_combinev16qi (rtx operands[3])
  
  /* vec_perm support.  */
  
-#define MAX_VECT_LEN 16
-
  struct expand_vec_perm_d
  {
    rtx target, op0, op1;
    vec_perm_indices perm;
    machine_mode vmode;
+  unsigned int vec_flags;
    bool one_vector_p;
    bool testing_p;
  };
@@ -13392,6 +14870,74 @@ aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel,
    aarch64_expand_vec_perm_1 (target, op0, op1, sel);
  }
  
+/* Generate (set TARGET (unspec [OP0 OP1] CODE)).  */
+
+static void
+emit_unspec2 (rtx target, int code, rtx op0, rtx op1)
+{
+  emit_insn (gen_rtx_SET (target,
+                         gen_rtx_UNSPEC (GET_MODE (target),
+                                         gen_rtvec (2, op0, op1), code)));
+}
+
+/* Expand an SVE vec_perm with the given operands.  */
+
+void
+aarch64_expand_sve_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
+{
+  machine_mode data_mode = GET_MODE (target);
+  machine_mode sel_mode = GET_MODE (sel);
+  /* Enforced by the pattern condition.  */
+  int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();
+
+  /* Note: vec_perm indices are supposed to wrap when they go beyond the
+     size of the two value vectors, i.e. the upper bits of the indices
+     are effectively ignored.  SVE TBL instead produces 0 for any
+     out-of-range indices, so we need to modulo all the vec_perm indices
+     to ensure they are all in range.  */
+  rtx sel_reg = force_reg (sel_mode, sel);
+
+  /* Check if the sel only references the first values vector.  */
+  if (GET_CODE (sel) == CONST_VECTOR
+      && aarch64_const_vec_all_in_range_p (sel, 0, nunits - 1))
+    {
+      emit_unspec2 (target, UNSPEC_TBL, op0, sel_reg);
+      return;
+    }
+
+  /* Check if the two values vectors are the same.  */
+  if (rtx_equal_p (op0, op1))
+    {
+      rtx max_sel = aarch64_simd_gen_const_vector_dup (sel_mode, nunits - 1);
+      rtx sel_mod = expand_simple_binop (sel_mode, AND, sel_reg, max_sel,
+                                        NULL, 0, OPTAB_DIRECT);
+      emit_unspec2 (target, UNSPEC_TBL, op0, sel_mod);
+      return;
+    }
+
+  /* Run TBL for each value vector and combine the results.  */
+
+  rtx res0 = gen_reg_rtx (data_mode);
+  rtx res1 = gen_reg_rtx (data_mode);
+  rtx neg_num_elems = aarch64_simd_gen_const_vector_dup (sel_mode, -nunits);
+  if (GET_CODE (sel) != CONST_VECTOR
+      || !aarch64_const_vec_all_in_range_p (sel, 0, 2 * nunits - 1))
+    {
+      rtx max_sel = aarch64_simd_gen_const_vector_dup (sel_mode,
+                                                      2 * nunits - 1);
+      sel_reg = expand_simple_binop (sel_mode, AND, sel_reg, max_sel,
+                                    NULL, 0, OPTAB_DIRECT);
+    }
+  emit_unspec2 (res0, UNSPEC_TBL, op0, sel_reg);
+  rtx sel_sub = expand_simple_binop (sel_mode, PLUS, sel_reg, neg_num_elems,
+                                    NULL, 0, OPTAB_DIRECT);
+  emit_unspec2 (res1, UNSPEC_TBL, op1, sel_sub);
+  if (GET_MODE_CLASS (data_mode) == MODE_VECTOR_INT)
+    emit_insn (gen_rtx_SET (target, gen_rtx_IOR (data_mode, res0, res1)));
+  else
+    emit_unspec2 (target, UNSPEC_IORF, res0, res1);
+}
+
  /* Recognize patterns suitable for the TRN instructions.  */
  static bool
  aarch64_evpc_trn (struct expand_vec_perm_d *d)
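Note on the general case of aarch64_expand_sve_vec_perm above: because SVE TBL yields 0 for out-of-range indices, a two-input permute can be built from two TBLs whose results are ORed together. The scalar model below is only a sketch of that idea. It assumes byte elements and a power-of-two element count, and the names `vec`, `tbl` and `vec_perm_via_tbl` are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using vec = std::vector<uint8_t>;

// Model of SVE TBL: out-of-range indices select 0 rather than wrapping.
vec
tbl (const vec &data, const vec &sel)
{
  vec res (sel.size ());
  for (std::size_t i = 0; i < sel.size (); ++i)
    res[i] = sel[i] < data.size () ? data[sel[i]] : 0;
  return res;
}

// Two-input permute with vec_perm wrap-around semantics, built from
// two TBLs.  NUNITS must be a power of 2 so that AND acts as a modulo.
vec
vec_perm_via_tbl (const vec &op0, const vec &op1, vec sel)
{
  const std::size_t nunits = op0.size ();
  assert (op1.size () == nunits && sel.size () == nunits);
  assert (nunits != 0 && nunits <= 128 && (nunits & (nunits - 1)) == 0);

  // Reduce every index modulo 2 * NUNITS.
  for (auto &s : sel)
    s = static_cast<uint8_t> (s & (2 * nunits - 1));
  // Indices >= NUNITS are out of range for OP0 and give 0.
  vec res0 = tbl (op0, sel);

  // Subtracting NUNITS wraps indices < NUNITS to large values, which
  // are out of range for OP1 and give 0.
  for (auto &s : sel)
    s = static_cast<uint8_t> (s - nunits);
  vec res1 = tbl (op1, sel);

  // Each lane is nonzero in at most one of the two results.
  for (std::size_t i = 0; i < nunits; ++i)
    res0[i] |= res1[i];
  return res0;
}
```

For example, permuting {10,11,12,13} and {20,21,22,23} with selector {0,5,2,7} picks lanes alternately from the two inputs, and selector {8,13,2,7} gives the same result after the modulo step.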
@@ -13418,7 +14964,9 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
  
    in0 = d->op0;
    in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
+  /* We don't need a big-endian lane correction for SVE; see the comment
+     at the head of aarch64-sve.md for details.  */
+  if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
      {
        x = in0, in0 = in1, in1 = x;
        odd = !odd;
@@ -13454,7 +15002,9 @@ aarch64_evpc_uzp (struct expand_vec_perm_d *d)
  
    in0 = d->op0;
    in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
+  /* We don't need a big-endian lane correction for SVE; see the comment
+     at the head of aarch64-sve.md for details.  */
+  if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
      {
        x = in0, in0 = in1, in1 = x;
        odd = !odd;
@@ -13493,7 +15043,9 @@ aarch64_evpc_zip (struct expand_vec_perm_d *d)
  
    in0 = d->op0;
    in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
+  /* We don't need a big-endian lane correction for SVE; see the comment
+     at the head of aarch64-sve.md for details.  */
+  if (BYTES_BIG_ENDIAN && d->vec_flags == VEC_ADVSIMD)
      {
        x = in0, in0 = in1, in1 = x;
        high = !high;
@@ -13515,7 +15067,8 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
  
    /* The first element always refers to the first vector.
       Check if the extracted indices are increasing by one.  */
-  if (!d->perm[0].is_constant (&location)
+  if (d->vec_flags == VEC_SVE_PRED
+      || !d->perm[0].is_constant (&location)
        || !d->perm.series_p (0, 1, location, 1))
      return false;
  
@@ -13524,9 +15077,11 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
      return true;
  
    /* The case where (location == 0) is a no-op for both big- and little-endian,
-     and is removed by the mid-end at optimization levels -O1 and higher.  */
+     and is removed by the mid-end at optimization levels -O1 and higher.
  
-  if (BYTES_BIG_ENDIAN && (location != 0))
+     We don't need a big-endian lane correction for SVE; see the comment
+     at the head of aarch64-sve.md for details.  */
+  if (BYTES_BIG_ENDIAN && location != 0 && d->vec_flags == VEC_ADVSIMD)
      {
        /* After setup, we want the high elements of the first vector (stored
           at the LSB end of the register), and the low elements of the second
@@ -13546,25 +15101,37 @@ aarch64_evpc_ext (struct expand_vec_perm_d *d)
    return true;
  }
  
-/* Recognize patterns for the REV insns.  */
+/* Recognize patterns for the REV{64,32,16} insns, which reverse elements
+   within each 64-bit, 32-bit or 16-bit granule.  */
  
  static bool
-aarch64_evpc_rev (struct expand_vec_perm_d *d)
+aarch64_evpc_rev_local (struct expand_vec_perm_d *d)
  {
    HOST_WIDE_INT diff;
    unsigned int i, size, unspec;
+  machine_mode pred_mode;
  
-  if (!d->one_vector_p
+  if (d->vec_flags == VEC_SVE_PRED
+      || !d->one_vector_p
        || !d->perm[0].is_constant (&diff))
      return false;
  
    size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode);
    if (size == 8)
-    unspec = UNSPEC_REV64;
+    {
+      unspec = UNSPEC_REV64;
+      pred_mode = VNx2BImode;
+    }
    else if (size == 4)
-    unspec = UNSPEC_REV32;
+    {
+      unspec = UNSPEC_REV32;
+      pred_mode = VNx4BImode;
+    }
    else if (size == 2)
-    unspec = UNSPEC_REV16;
+    {
+      unspec = UNSPEC_REV16;
+      pred_mode = VNx8BImode;
+    }
    else
      return false;
  
@@ -13577,8 +15144,37 @@ aarch64_evpc_rev (struct expand_vec_perm_d *d)
    if (d->testing_p)
      return true;
  
-  emit_set_insn (d->target, gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0),
-                                           unspec));
+  rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), unspec);
+  if (d->vec_flags == VEC_SVE_DATA)
+    {
+      rtx pred = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+      src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (2, pred, src),
+                           UNSPEC_MERGE_PTRUE);
+    }
+  emit_set_insn (d->target, src);
+  return true;
+}
+
+/* Recognize patterns for the REV insn, which reverses elements within
+   a full vector.  */
+
+static bool
+aarch64_evpc_rev_global (struct expand_vec_perm_d *d)
+{
+  poly_uint64 nelt = d->perm.length ();
+
+  if (!d->one_vector_p || d->vec_flags != VEC_SVE_DATA)
+    return false;
+
+  if (!d->perm.series_p (0, 1, nelt - 1, -1))
+    return false;
+
+  /* Success! */
+  if (d->testing_p)
+    return true;
+
+  rtx src = gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0), UNSPEC_REV);
+  emit_set_insn (d->target, src);
    return true;
  }
  
@@ -13591,10 +15187,14 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d)
    machine_mode vmode = d->vmode;
    rtx lane;
  
-  if (d->perm.encoding ().encoded_nelts () != 1
+  if (d->vec_flags == VEC_SVE_PRED
+      || d->perm.encoding ().encoded_nelts () != 1
        || !d->perm[0].is_constant (&elt))
      return false;
  
+  if (d->vec_flags == VEC_SVE_DATA && elt >= 64 * GET_MODE_UNIT_SIZE (vmode))
+    return false;
+
    /* Success! */
    if (d->testing_p)
      return true;
@@ -13616,7 +15216,7 @@ aarch64_evpc_dup (struct expand_vec_perm_d *d)
  static bool
  aarch64_evpc_tbl (struct expand_vec_perm_d *d)
  {
-  rtx rperm[MAX_VECT_LEN], sel;
+  rtx rperm[MAX_COMPILE_TIME_VEC_BYTES], sel;
    machine_mode vmode = d->vmode;
  
    /* Make sure that the indices are constant.  */
@@ -13652,6 +15252,27 @@ aarch64_evpc_tbl (struct expand_vec_perm_d *d)
    return true;
  }
  
+/* Try to implement D using an SVE TBL instruction.  */
+
+static bool
+aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
+{
+  unsigned HOST_WIDE_INT nelt;
+
+  /* Permuting two variable-length vectors could overflow the
+     index range.  */
+  if (!d->one_vector_p && !d->perm.length ().is_constant (&nelt))
+    return false;
+
+  if (d->testing_p)
+    return true;
+
+  machine_mode sel_mode = mode_for_int_vector (d->vmode).require ();
+  rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
+  aarch64_expand_sve_vec_perm (d->target, d->op0, d->op1, sel);
+  return true;
+}
+
  static bool
  aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
  {
@@ -13665,9 +15286,14 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
        std::swap (d->op0, d->op1);
      }
  
-  if (TARGET_SIMD && known_gt (nelt, 1))
+  if ((d->vec_flags == VEC_ADVSIMD
+       || d->vec_flags == VEC_SVE_DATA
+       || d->vec_flags == VEC_SVE_PRED)
+      && known_gt (nelt, 1))
      {
-      if (aarch64_evpc_rev (d))
+      if (aarch64_evpc_rev_local (d))
+       return true;
+      else if (aarch64_evpc_rev_global (d))
         return true;
        else if (aarch64_evpc_ext (d))
         return true;
@@ -13679,7 +15305,10 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
         return true;
        else if (aarch64_evpc_trn (d))
         return true;
-      return aarch64_evpc_tbl (d);
+      if (d->vec_flags == VEC_SVE_DATA)
+       return aarch64_evpc_sve_tbl (d);
+      else if (d->vec_flags == VEC_ADVSIMD)
+       return aarch64_evpc_tbl (d);
      }
    return false;
  }
@@ -13711,6 +15340,7 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, rtx target, rtx op0,
    d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2,
                      sel.nelts_per_input ());
    d.vmode = vmode;
+  d.vec_flags = aarch64_classify_vector_mode (d.vmode);
    d.target = target;
    d.op0 = op0;
    d.op1 = op1;
@@ -13749,6 +15379,272 @@ aarch64_reverse_mask (machine_mode mode, unsigned int nunits)
    return force_reg (V16QImode, mask);
  }
  
+/* Return true if X is a valid second operand for the SVE instruction
+   that implements integer comparison OP_CODE.  */
+
+static bool
+aarch64_sve_cmp_operand_p (rtx_code op_code, rtx x)
+{
+  if (register_operand (x, VOIDmode))
+    return true;
+
+  switch (op_code)
+    {
+    case LTU:
+    case LEU:
+    case GEU:
+    case GTU:
+      return aarch64_sve_cmp_immediate_p (x, false);
+    case LT:
+    case LE:
+    case GE:
+    case GT:
+    case NE:
+    case EQ:
+      return aarch64_sve_cmp_immediate_p (x, true);
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the UNSPEC_COND_* code for comparison CODE.  */
+
+static unsigned int
+aarch64_unspec_cond_code (rtx_code code)
+{
+  switch (code)
+    {
+    case NE:
+      return UNSPEC_COND_NE;
+    case EQ:
+      return UNSPEC_COND_EQ;
+    case LT:
+      return UNSPEC_COND_LT;
+    case GT:
+      return UNSPEC_COND_GT;
+    case LE:
+      return UNSPEC_COND_LE;
+    case GE:
+      return UNSPEC_COND_GE;
+    case LTU:
+      return UNSPEC_COND_LO;
+    case GTU:
+      return UNSPEC_COND_HI;
+    case LEU:
+      return UNSPEC_COND_LS;
+    case GEU:
+      return UNSPEC_COND_HS;
+    case UNORDERED:
+      return UNSPEC_COND_UO;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return an (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>) expression,
+   where <X> is the operation associated with comparison CODE.  */
+
+static rtx
+aarch64_gen_unspec_cond (rtx_code code, machine_mode pred_mode,
+                        rtx pred, rtx op0, rtx op1)
+{
+  rtvec vec = gen_rtvec (3, pred, op0, op1);
+  return gen_rtx_UNSPEC (pred_mode, vec, aarch64_unspec_cond_code (code));
+}
+
+/* Expand an SVE integer comparison:
+
+     TARGET = CODE (OP0, OP1).  */
+
+void
+aarch64_expand_sve_vec_cmp_int (rtx target, rtx_code code, rtx op0, rtx op1)
+{
+  machine_mode pred_mode = GET_MODE (target);
+  machine_mode data_mode = GET_MODE (op0);
+
+  if (!aarch64_sve_cmp_operand_p (code, op1))
+    op1 = force_reg (data_mode, op1);
+
+  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  rtx unspec = aarch64_gen_unspec_cond (code, pred_mode, ptrue, op0, op1);
+  emit_insn (gen_set_clobber_cc (target, unspec));
+}
+
+/* Emit an instruction:
+
+      (set TARGET (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+
+   where <X> is the operation associated with comparison CODE.  */
+
+static void
+aarch64_emit_unspec_cond (rtx target, rtx_code code, machine_mode pred_mode,
+                         rtx pred, rtx op0, rtx op1)
+{
+  rtx unspec = aarch64_gen_unspec_cond (code, pred_mode, pred, op0, op1);
+  emit_set_insn (target, unspec);
+}
+
+/* Emit:
+
+      (set TMP1 (unspec:PRED_MODE [PTRUE OP0 OP1] UNSPEC_COND_<X1>))
+      (set TMP2 (unspec:PRED_MODE [PTRUE OP0 OP1] UNSPEC_COND_<X2>))
+      (set TARGET (and:PRED_MODE (ior:PRED_MODE TMP1 TMP2) PTRUE))
+
+   where <Xi> is the operation associated with comparison CODEi.  */
+
+static void
+aarch64_emit_unspec_cond_or (rtx target, rtx_code code1, rtx_code code2,
+                            machine_mode pred_mode, rtx ptrue,
+                            rtx op0, rtx op1)
+{
+  rtx tmp1 = gen_reg_rtx (pred_mode);
+  aarch64_emit_unspec_cond (tmp1, code1, pred_mode, ptrue, op0, op1);
+  rtx tmp2 = gen_reg_rtx (pred_mode);
+  aarch64_emit_unspec_cond (tmp2, code2, pred_mode, ptrue, op0, op1);
+  emit_set_insn (target, gen_rtx_AND (pred_mode,
+                                     gen_rtx_IOR (pred_mode, tmp1, tmp2),
+                                     ptrue));
+}
+
+/* If CAN_INVERT_P, emit an instruction:
+
+      (set TARGET (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+
+   where <X> is the operation associated with comparison CODE.  Otherwise
+   emit:
+
+      (set TMP (unspec:PRED_MODE [PRED OP0 OP1] UNSPEC_COND_<X>))
+      (set TARGET (and:PRED_MODE (not:PRED_MODE TMP) PTRUE))
+
+   where the second instruction sets TARGET to the inverse of TMP.  */
+
+static void
+aarch64_emit_inverted_unspec_cond (rtx target, rtx_code code,
+                                  machine_mode pred_mode, rtx ptrue, rtx pred,
+                                  rtx op0, rtx op1, bool can_invert_p)
+{
+  if (can_invert_p)
+    aarch64_emit_unspec_cond (target, code, pred_mode, pred, op0, op1);
+  else
+    {
+      rtx tmp = gen_reg_rtx (pred_mode);
+      aarch64_emit_unspec_cond (tmp, code, pred_mode, pred, op0, op1);
+      emit_set_insn (target, gen_rtx_AND (pred_mode,
+                                         gen_rtx_NOT (pred_mode, tmp),
+                                         ptrue));
+    }
+}
+
+/* Expand an SVE floating-point comparison:
+
+     TARGET = CODE (OP0, OP1)
+
+   If CAN_INVERT_P is true, the caller can also handle inverted results;
+   return true if the result is in fact inverted.  */
+
+bool
+aarch64_expand_sve_vec_cmp_float (rtx target, rtx_code code,
+                                 rtx op0, rtx op1, bool can_invert_p)
+{
+  machine_mode pred_mode = GET_MODE (target);
+  machine_mode data_mode = GET_MODE (op0);
+
+  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  switch (code)
+    {
+    case UNORDERED:
+      /* UNORDERED has no immediate form.  */
+      op1 = force_reg (data_mode, op1);
+      aarch64_emit_unspec_cond (target, code, pred_mode, ptrue, op0, op1);
+      return false;
+
+    case LT:
+    case LE:
+    case GT:
+    case GE:
+    case EQ:
+    case NE:
+      /* There is native support for the comparison.  */
+      aarch64_emit_unspec_cond (target, code, pred_mode, ptrue, op0, op1);
+      return false;
+
+    case ORDERED:
+      /* There is native support for the inverse comparison.  */
+      op1 = force_reg (data_mode, op1);
+      aarch64_emit_inverted_unspec_cond (target, UNORDERED,
+                                        pred_mode, ptrue, ptrue, op0, op1,
+                                        can_invert_p);
+      return can_invert_p;
+
+    case LTGT:
+      /* This is a trapping operation (LT or GT).  */
+      aarch64_emit_unspec_cond_or (target, LT, GT, pred_mode, ptrue, op0, op1);
+      return false;
+
+    case UNEQ:
+      if (!flag_trapping_math)
+       {
+         /* This would trap for signaling NaNs.  */
+         op1 = force_reg (data_mode, op1);
+         aarch64_emit_unspec_cond_or (target, UNORDERED, EQ,
+                                      pred_mode, ptrue, op0, op1);
+         return false;
+       }
+      /* fall through */
+
+    case UNLT:
+    case UNLE:
+    case UNGT:
+    case UNGE:
+      {
+       rtx ordered = ptrue;
+       if (flag_trapping_math)
+         {
+           /* Only compare the elements that are known to be ordered.  */
+           ordered = gen_reg_rtx (pred_mode);
+           op1 = force_reg (data_mode, op1);
+           aarch64_emit_inverted_unspec_cond (ordered, UNORDERED, pred_mode,
+                                              ptrue, ptrue, op0, op1, false);
+         }
+       if (code == UNEQ)
+         code = NE;
+       else
+         code = reverse_condition_maybe_unordered (code);
+       aarch64_emit_inverted_unspec_cond (target, code, pred_mode, ptrue,
+                                          ordered, op0, op1, can_invert_p);
+       return can_invert_p;
+      }
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Expand an SVE vcond pattern with operands OPS.  DATA_MODE is the mode
+   of the data being selected and CMP_MODE is the mode of the values being
+   compared.  */
+
+void
+aarch64_expand_sve_vcond (machine_mode data_mode, machine_mode cmp_mode,
+                         rtx *ops)
+{
+  machine_mode pred_mode
+    = aarch64_get_mask_mode (GET_MODE_NUNITS (cmp_mode),
+                            GET_MODE_SIZE (cmp_mode)).require ();
+  rtx pred = gen_reg_rtx (pred_mode);
+  if (FLOAT_MODE_P (cmp_mode))
+    {
+      if (aarch64_expand_sve_vec_cmp_float (pred, GET_CODE (ops[3]),
+                                           ops[4], ops[5], true))
+       std::swap (ops[1], ops[2]);
+    }
+  else
+    aarch64_expand_sve_vec_cmp_int (pred, GET_CODE (ops[3]), ops[4], ops[5]);
+
+  rtvec vec = gen_rtvec (3, pred, ops[1], ops[2]);
+  emit_set_insn (ops[0], gen_rtx_UNSPEC (data_mode, vec, UNSPEC_SEL));
+}
+
  /* Implement TARGET_MODES_TIEABLE_P.  In principle we should always return
     true.  However due to issues with register allocation it is preferable
     to avoid tieing integer scalar and FP scalar modes.  Executing integer
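Note on the UNLT/UNLE/UNGT/UNGE handling in aarch64_expand_sve_vec_cmp_float above: the expansion compares with the reversed condition and then inverts the result, optionally restricting the comparison to ordered lanes when trapping math is enabled. The single-lane model below is a sketch of the path taken when the caller cannot accept an inverted result (can_invert_p false), specialised to UNLT. The function name `unlt_lane` is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>

// One lane of the UNLT expansion.  UNLT (a, b) is true if a < b or if
// either operand is a NaN.
bool
unlt_lane (double a, double b, bool trapping_math)
{
  const bool ptrue = true;

  // With trapping math, only compare lanes known to be ordered:
  // ORDERED = NOT (UNORDERED (a, b)) AND PTRUE.
  bool ordered = ptrue;
  if (trapping_math)
    ordered = !(std::isnan (a) || std::isnan (b)) && ptrue;

  // reverse_condition_maybe_unordered (UNLT) is GE.  The comparison is
  // predicated on ORDERED, so inactive lanes produce false.
  bool tmp = ordered && a >= b;

  // TARGET = NOT (TMP) AND PTRUE.
  return !tmp && ptrue;
}
```

Both settings of `trapping_math` give the same lane result; the difference in the real expansion is only whether the GE comparison is ever executed on a NaN lane.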
@@ -13765,8 +15661,12 @@ aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
  
    /* We specifically want to allow elements of "structure" modes to
       be tieable to the structure.  This more general condition allows
-     other rarer situations too.  */
-  if (aarch64_vector_mode_p (mode1) && aarch64_vector_mode_p (mode2))
+     other rarer situations too.  The reason we don't extend this to
+     predicate modes is that there are no predicate structure modes
+     nor any specific instructions for extracting part of a predicate
+     register.  */
+  if (aarch64_vector_data_mode_p (mode1)
+      && aarch64_vector_data_mode_p (mode2))
      return true;
  
    /* Also allow any scalar modes with vectors.  */
@@ -15020,6 +16920,19 @@ aarch64_optab_supported_p (int op, machine_mode mode1, machine_mode,
      }
  }
  
+/* Implement the TARGET_DWARF_POLY_INDETERMINATE_VALUE hook.  */
+
+static unsigned int
+aarch64_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
+                                       int *offset)
+{
+  /* Polynomial invariant 1 == (VG / 2) - 1.  */
+  gcc_assert (i == 1);
+  *factor = 2;
+  *offset = 1;
+  return AARCH64_DWARF_VG;
+}
+
  /* Implement TARGET_LIBGCC_FLOATING_POINT_MODE_SUPPORTED_P - return TRUE
     if MODE is HFmode, and punt to the generic implementation otherwise.  */
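Note on aarch64_dwarf_poly_indeterminate_value above: the hook reports that indeterminate 1 equals the value of the DWARF VG register divided by *factor, minus *offset, which is (VG / 2) - 1. The sketch below evaluates that relation for a given vector length; the helper name `sve_indeterminate` and the choice to take the length in bits are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical helper: the value of poly_int indeterminate 1 for an SVE
// implementation with VECTOR_BITS-bit vectors.
unsigned int
sve_indeterminate (unsigned int vector_bits)
{
  // SVE vector lengths are multiples of 128 bits.
  assert (vector_bits >= 128 && vector_bits % 128 == 0);

  // VG is the number of 64-bit granules in a vector.
  unsigned int vg = vector_bits / 64;

  // The same factor and offset that the hook stores.
  const unsigned int factor = 2, offset = 1;
  return vg / factor - offset;
}
```

With 128-bit vectors the indeterminate is 0, so every poly_int collapses to its constant term; each additional 128 bits increases it by 1.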
  
@@ -15112,6 +17025,38 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
      }
  }
  
+/* Implement TARGET_COMPUTE_PRESSURE_CLASSES.  */
+
+static int
+aarch64_compute_pressure_classes (reg_class *classes)
+{
+  int i = 0;
+  classes[i++] = GENERAL_REGS;
+  classes[i++] = FP_REGS;
+  /* PR_REGS isn't a useful pressure class because many predicate pseudo
+     registers need to go in PR_LO_REGS at some point during their
+     lifetime.  Splitting it into two halves has the effect of making
+     all predicates count against PR_LO_REGS, so that we try whenever
+     possible to restrict the number of live predicates to 8.  This
+     greatly reduces the amount of spilling in certain loops.  */
+  classes[i++] = PR_LO_REGS;
+  classes[i++] = PR_HI_REGS;
+  return i;
+}
+
+/* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
+
+static bool
+aarch64_can_change_mode_class (machine_mode from,
+                              machine_mode to, reg_class_t)
+{
+  /* See the comment at the head of aarch64-sve.md for details.  */
+  if (BYTES_BIG_ENDIAN
+      && (aarch64_sve_data_mode_p (from) != aarch64_sve_data_mode_p (to)))
+    return false;
+  return true;
+}
+
  /* Target-specific selftests.  */
  
  #if CHECKING_P
@@ -15260,6 +17205,11 @@ aarch64_run_selftests (void)
  #undef TARGET_FUNCTION_ARG_PADDING
  #define TARGET_FUNCTION_ARG_PADDING aarch64_function_arg_padding
  
+#undef TARGET_GET_RAW_RESULT_MODE
+#define TARGET_GET_RAW_RESULT_MODE aarch64_get_reg_raw_mode
+#undef TARGET_GET_RAW_ARG_MODE
+#define TARGET_GET_RAW_ARG_MODE aarch64_get_reg_raw_mode
+
  #undef TARGET_FUNCTION_OK_FOR_SIBCALL
  #define TARGET_FUNCTION_OK_FOR_SIBCALL aarch64_function_ok_for_sibcall
  
@@ -15468,6 +17418,9 @@ aarch64_libgcc_floating_mode_supported_p
  #undef TARGET_VECTOR_ALIGNMENT
  #define TARGET_VECTOR_ALIGNMENT aarch64_simd_vector_alignment
  
+#undef TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT
+#define TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT \
+  aarch64_vectorize_preferred_vector_alignment
  #undef TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE
  #define TARGET_VECTORIZE_VECTOR_ALIGNMENT_REACHABLE \
    aarch64_simd_vector_alignment_reachable
@@ -15478,6 +17431,9 @@ aarch64_libgcc_floating_mode_supported_p
  #define TARGET_VECTORIZE_VEC_PERM_CONST \
    aarch64_vectorize_vec_perm_const
  
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE aarch64_get_mask_mode
+
  #undef TARGET_INIT_LIBFUNCS
  #define TARGET_INIT_LIBFUNCS aarch64_init_libfuncs
  
@@ -15532,6 +17488,10 @@ aarch64_libgcc_floating_mode_supported_p
  #undef TARGET_OMIT_STRUCT_RETURN_REG
  #define TARGET_OMIT_STRUCT_RETURN_REG true
  
+#undef TARGET_DWARF_POLY_INDETERMINATE_VALUE
+#define TARGET_DWARF_POLY_INDETERMINATE_VALUE \
+  aarch64_dwarf_poly_indeterminate_value
+
  /* The architecture reserves bits 0 and 1 so use bit 2 for descriptors.  */
  #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
  #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 4
@@ -15551,6 +17511,12 @@ aarch64_libgcc_floating_mode_supported_p
  #undef TARGET_CONSTANT_ALIGNMENT
  #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
  
+#undef TARGET_COMPUTE_PRESSURE_CLASSES
+#define TARGET_COMPUTE_PRESSURE_CLASSES aarch64_compute_pressure_classes
+
+#undef TARGET_CAN_CHANGE_MODE_CLASS
+#define TARGET_CAN_CHANGE_MODE_CLASS aarch64_can_change_mode_class
+
  #if CHECKING_P
  #undef TARGET_RUN_TARGET_SELFTESTS
  #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h

index 98e45171043e8f5c5b5bfe4576185837760bdd23..fc99fc4627ec8bc8f8e9dae01489ed0f2ee00459 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -144,18 +144,19 @@ extern unsigned aarch64_architecture_version;
  /* ARMv8.2-A architecture extensions.  */
  #define AARCH64_FL_V8_2       (1 << 8)  /* Has ARMv8.2-A features.  */
  #define AARCH64_FL_F16       (1 << 9)  /* Has ARMv8.2-A FP16 extensions.  */
+#define AARCH64_FL_SVE        (1 << 10) /* Has Scalable Vector Extensions.  */
  /* ARMv8.3-A architecture extensions.  */
-#define AARCH64_FL_V8_3       (1 << 10)  /* Has ARMv8.3-A features.  */
-#define AARCH64_FL_RCPC       (1 << 11)  /* Has support for RCpc model.  */
-#define AARCH64_FL_DOTPROD    (1 << 12)  /* Has ARMv8.2-A Dot Product ins.  */
+#define AARCH64_FL_V8_3       (1 << 11)  /* Has ARMv8.3-A features.  */
+#define AARCH64_FL_RCPC       (1 << 12)  /* Has support for RCpc model.  */
+#define AARCH64_FL_DOTPROD    (1 << 13)  /* Has ARMv8.2-A Dot Product ins.  */
  /* New flags to split crypto into aes and sha2.  */
-#define AARCH64_FL_AES       (1 << 13)  /* Has Crypto AES.  */
-#define AARCH64_FL_SHA2              (1 << 14)  /* Has Crypto SHA2.  */
+#define AARCH64_FL_AES       (1 << 14)  /* Has Crypto AES.  */
+#define AARCH64_FL_SHA2              (1 << 15)  /* Has Crypto SHA2.  */
  /* ARMv8.4-A architecture extensions.  */
-#define AARCH64_FL_V8_4              (1 << 15)  /* Has ARMv8.4-A features.  */
-#define AARCH64_FL_SM4       (1 << 16)  /* Has ARMv8.4-A SM3 and SM4.  */
-#define AARCH64_FL_SHA3              (1 << 17)  /* Has ARMv8.4-a SHA3 and SHA512.  */
-#define AARCH64_FL_F16FML     (1 << 18)  /* Has ARMv8.4-a FP16 extensions.  */
+#define AARCH64_FL_V8_4              (1 << 16)  /* Has ARMv8.4-A features.  */
+#define AARCH64_FL_SM4       (1 << 17)  /* Has ARMv8.4-A SM3 and SM4.  */
+#define AARCH64_FL_SHA3              (1 << 18)  /* Has ARMv8.4-a SHA3 and SHA512.  */
+#define AARCH64_FL_F16FML     (1 << 19)  /* Has ARMv8.4-a FP16 extensions.  */
  
  /* Has FP and SIMD.  */
  #define AARCH64_FL_FPSIMD     (AARCH64_FL_FP | AARCH64_FL_SIMD)
@@ -186,6 +187,7 @@ extern unsigned aarch64_architecture_version;
  #define AARCH64_ISA_RDMA          (aarch64_isa_flags & AARCH64_FL_RDMA)
  #define AARCH64_ISA_V8_2          (aarch64_isa_flags & AARCH64_FL_V8_2)
  #define AARCH64_ISA_F16                   (aarch64_isa_flags & AARCH64_FL_F16)
+#define AARCH64_ISA_SVE            (aarch64_isa_flags & AARCH64_FL_SVE)
  #define AARCH64_ISA_V8_3          (aarch64_isa_flags & AARCH64_FL_V8_3)
  #define AARCH64_ISA_DOTPROD       (aarch64_isa_flags & AARCH64_FL_DOTPROD)
  #define AARCH64_ISA_AES                   (aarch64_isa_flags & AARCH64_FL_AES)
@@ -226,6 +228,9 @@ extern unsigned aarch64_architecture_version;
  /* Dot Product is an optional extension to AdvSIMD enabled through +dotprod.  */
  #define TARGET_DOTPROD (TARGET_SIMD && AARCH64_ISA_DOTPROD)
  
+/* SVE instructions, enabled through +sve.  */
+#define TARGET_SVE (AARCH64_ISA_SVE)
+
  /* ARMv8.3-A features.  */
  #define TARGET_ARMV8_3 (AARCH64_ISA_V8_3)
  
@@ -286,8 +291,17 @@ extern unsigned aarch64_architecture_version;
     V0-V7       Parameter/result registers
  
     The vector register V0 holds scalar B0, H0, S0 and D0 in its least
-   significant bits.  Unlike AArch32 S1 is not packed into D0,
-   etc.  */
+   significant bits.  Unlike AArch32 S1 is not packed into D0, etc.
+
+   P0-P7        Predicate low registers: valid in all predicate contexts
+   P8-P15       Predicate high registers: used as scratch space
+
+   VG           Pseudo "vector granules" register
+
+   VG is the number of 64-bit elements in an SVE vector.  We define
+   it as a hard register so that we can easily map it to the DWARF VG
+   register.  GCC internally uses the poly_int variable aarch64_sve_vg
+   instead.  */
  
  /* Note that we don't mark X30 as a call-clobbered register.  The idea is
     that it's really the call instructions themselves which clobber X30.
@@ -308,7 +322,9 @@ extern unsigned aarch64_architecture_version;
      0, 0, 0, 0,   0, 0, 0, 0,   /* V8 - V15 */         \
      0, 0, 0, 0,   0, 0, 0, 0,   /* V16 - V23 */         \
      0, 0, 0, 0,   0, 0, 0, 0,   /* V24 - V31 */         \
-    1, 1, 1,                   /* SFP, AP, CC */       \
+    1, 1, 1, 1,                        /* SFP, AP, CC, VG */   \
+    0, 0, 0, 0,   0, 0, 0, 0,   /* P0 - P7 */           \
+    0, 0, 0, 0,   0, 0, 0, 0,   /* P8 - P15 */          \
    }
  
  #define CALL_USED_REGISTERS                            \
@@ -321,7 +337,9 @@ extern unsigned aarch64_architecture_version;
      0, 0, 0, 0,   0, 0, 0, 0,  /* V8 - V15 */          \
      1, 1, 1, 1,   1, 1, 1, 1,   /* V16 - V23 */         \
      1, 1, 1, 1,   1, 1, 1, 1,   /* V24 - V31 */         \
-    1, 1, 1,                   /* SFP, AP, CC */       \
+    1, 1, 1, 1,                        /* SFP, AP, CC, VG */   \
+    1, 1, 1, 1,   1, 1, 1, 1,  /* P0 - P7 */           \
+    1, 1, 1, 1,   1, 1, 1, 1,  /* P8 - P15 */          \
    }
  
  #define REGISTER_NAMES                                         \
@@ -334,7 +352,9 @@ extern unsigned aarch64_architecture_version;
      "v8",  "v9",  "v10", "v11", "v12", "v13", "v14", "v15",    \
      "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",    \
      "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",    \
-    "sfp", "ap",  "cc",                                                \
+    "sfp", "ap",  "cc",  "vg",                                 \
+    "p0",  "p1",  "p2",  "p3",  "p4",  "p5",  "p6",  "p7",     \
+    "p8",  "p9",  "p10", "p11", "p12", "p13", "p14", "p15",    \
    }
  
  /* Generate the register aliases for core register N */
@@ -345,7 +365,8 @@ extern unsigned aarch64_architecture_version;
                       {"d" # N, V0_REGNUM + (N)}, \
                       {"s" # N, V0_REGNUM + (N)}, \
                       {"h" # N, V0_REGNUM + (N)}, \
-                     {"b" # N, V0_REGNUM + (N)}
+                     {"b" # N, V0_REGNUM + (N)}, \
+                     {"z" # N, V0_REGNUM + (N)}
  
  /* Provide aliases for all of the ISA defined register name forms.
     These aliases are convenient for use in the clobber lists of inline
@@ -387,7 +408,7 @@ extern unsigned aarch64_architecture_version;
  #define FRAME_POINTER_REGNUM           SFP_REGNUM
  #define STACK_POINTER_REGNUM           SP_REGNUM
  #define ARG_POINTER_REGNUM             AP_REGNUM
-#define FIRST_PSEUDO_REGISTER          67
+#define FIRST_PSEUDO_REGISTER          (P15_REGNUM + 1)
  
  /* The number of (integer) argument register available.  */
  #define NUM_ARG_REGS                   8
@@ -408,6 +429,8 @@ extern unsigned aarch64_architecture_version;
  #define AARCH64_DWARF_NUMBER_R 31
  
  #define AARCH64_DWARF_SP       31
+#define AARCH64_DWARF_VG       46
+#define AARCH64_DWARF_P0       48
  #define AARCH64_DWARF_V0       64
  
  /* The number of V registers.  */
@@ -472,6 +495,12 @@ extern unsigned aarch64_architecture_version;
  #define FP_LO_REGNUM_P(REGNO)            \
    (((unsigned) (REGNO - V0_REGNUM)) <= (V15_REGNUM - V0_REGNUM))
  
+#define PR_REGNUM_P(REGNO)\
+  (((unsigned) (REGNO - P0_REGNUM)) <= (P15_REGNUM - P0_REGNUM))
+
+#define PR_LO_REGNUM_P(REGNO)\
+  (((unsigned) (REGNO - P0_REGNUM)) <= (P7_REGNUM - P0_REGNUM))
+
  \f
  /* Register and constant classes.  */
  
@@ -485,6 +514,9 @@ enum reg_class
    FP_LO_REGS,
    FP_REGS,
    POINTER_AND_FP_REGS,
+  PR_LO_REGS,
+  PR_HI_REGS,
+  PR_REGS,
    ALL_REGS,
    LIM_REG_CLASSES              /* Last */
  };
@@ -501,6 +533,9 @@ enum reg_class
    "FP_LO_REGS",                                        \
    "FP_REGS",                                   \
    "POINTER_AND_FP_REGS",                       \
+  "PR_LO_REGS",                                        \
+  "PR_HI_REGS",                                        \
+  "PR_REGS",                                   \
    "ALL_REGS"                                   \
  }
  
@@ -514,7 +549,10 @@ enum reg_class
    { 0x00000000, 0x0000ffff, 0x00000000 },       /* FP_LO_REGS  */      \
    { 0x00000000, 0xffffffff, 0x00000000 },       /* FP_REGS  */         \
    { 0xffffffff, 0xffffffff, 0x00000003 },      /* POINTER_AND_FP_REGS */\
-  { 0xffffffff, 0xffffffff, 0x00000007 }       /* ALL_REGS */          \
+  { 0x00000000, 0x00000000, 0x00000ff0 },      /* PR_LO_REGS */        \
+  { 0x00000000, 0x00000000, 0x000ff000 },      /* PR_HI_REGS */        \
+  { 0x00000000, 0x00000000, 0x000ffff0 },      /* PR_REGS */           \
+  { 0xffffffff, 0xffffffff, 0x000fffff }       /* ALL_REGS */          \
  }
  
  #define REGNO_REG_CLASS(REGNO) aarch64_regno_regclass (REGNO)
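
The new REG_CLASS_CONTENTS entries follow from the hard register numbers this patch assigns in aarch64.md (VG = 67, P0 = 68, P7 = 75, P15 = 83). Each 32-bit word describes 32 hard registers, so registers 64 and up land in the third word. The following is a minimal C sketch, separate from the patch, that recomputes those words and mirrors the single-comparison range test used by PR_REGNUM_P. The helper names `class_word2` and `pr_regnum_p` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hard register numbers, copied from the aarch64.md constants in this patch.  */
enum { VG_REGNUM = 67, P0_REGNUM = 68, P7_REGNUM = 75, P15_REGNUM = 83 };

/* Same idea as PR_REGNUM_P: one unsigned comparison covers both bounds.
   If REGNO is below P0_REGNUM, the subtraction wraps to a very large
   unsigned value and the test fails.  */
int
pr_regnum_p (int regno)
{
  return (unsigned) (regno - P0_REGNUM)
	 <= (unsigned) (P15_REGNUM - P0_REGNUM);
}

/* Build the third 32-bit word of a REG_CLASS_CONTENTS entry for the
   inclusive hard register range FIRST..LAST.  Illustrative helper only:
   both ends must lie in 64..95, the range that word describes.  */
uint32_t
class_word2 (int first, int last)
{
  assert (first >= 64 && last <= 95 && first <= last);
  uint32_t mask = 0;
  for (int regno = first; regno <= last; regno++)
    mask |= (uint32_t) 1 << (regno - 64);
  return mask;
}
```

Under these assumptions, `class_word2 (P0_REGNUM, P7_REGNUM)` gives the PR_LO_REGS word and `class_word2 (64, P15_REGNUM)` gives the ALL_REGS word, which also covers SFP, AP, CC and VG.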
@@ -998,4 +1036,28 @@ extern tree aarch64_fp16_ptr_type_node;
  #define LIBGCC2_UNWIND_ATTRIBUTE \
    __attribute__((optimize ("no-omit-frame-pointer")))
  
+#ifndef USED_FOR_TARGET
+extern poly_uint16 aarch64_sve_vg;
+
+/* The number of bits and bytes in an SVE vector.  */
+#define BITS_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 64))
+#define BYTES_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 8))
+
+/* The number of bytes in an SVE predicate.  */
+#define BYTES_PER_SVE_PRED aarch64_sve_vg
+
+/* The SVE mode for a vector of bytes.  */
+#define SVE_BYTE_MODE VNx16QImode
+
+/* The maximum number of bytes in a fixed-size vector.  This is 256 bytes
+   (for -msve-vector-bits=2048) multiplied by the maximum number of
+   vectors in a structure mode (4).
+
+   This limit must not be used for variable-size vectors, since
+   VL-agnostic code must work with arbitrary vector lengths.  */
+#define MAX_COMPILE_TIME_VEC_BYTES (256 * 4)
+#endif
+
+#define REGMODE_NATURAL_SIZE(MODE) aarch64_regmode_natural_size (MODE)
+
  #endif /* GCC_AARCH64_H */
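
The size macros added above are all expressed in terms of aarch64_sve_vg, the number of 64-bit granules in a vector. In the patch that value is a poly_uint16, so it can stay symbolic for VL-agnostic code. The sketch below is not part of the patch and covers only the fixed-length case, using plain integers to show the arithmetic for the widths accepted by -msve-vector-bits. The function names are hypothetical.

```c
#include <assert.h>

/* Mirrors MAX_COMPILE_TIME_VEC_BYTES from the patch: 256 bytes per
   2048-bit vector times at most 4 vectors in a structure mode.  */
enum { MAX_COMPILE_TIME_VEC_BYTES = 256 * 4 };

/* Number of 64-bit granules for a fixed vector length in bits.
   Assumption: BITS is one of 128, 256, 512, 1024 or 2048.  */
unsigned
vg_for_bits (unsigned bits)
{
  assert (bits >= 128 && bits <= 2048 && (bits & (bits - 1)) == 0);
  return bits / 64;
}

/* Plain-integer counterparts of BITS_PER_SVE_VECTOR,
   BYTES_PER_SVE_VECTOR and BYTES_PER_SVE_PRED.  */
unsigned
bits_per_sve_vector (unsigned vg)
{
  return vg * 64;
}

unsigned
bytes_per_sve_vector (unsigned vg)
{
  return vg * 8;
}

unsigned
bytes_per_sve_pred (unsigned vg)
{
  /* One predicate bit per vector byte, so a predicate is 1/8 of a vector.  */
  return vg;
}
```

For a 256-bit vector this gives 4 granules, 32 vector bytes and 4 predicate bytes. The largest fixed width, 2048 bits, gives the 256 bytes quoted in the MAX_COMPILE_TIME_VEC_BYTES comment.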
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md

index 854c44830e694a960acd3d5d507cfe2623661ad3..728136a7fbaabc7e87a1f77be84e3face4257b3f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -63,6 +63,11 @@
      (SFP_REGNUM                64)
      (AP_REGNUM         65)
      (CC_REGNUM         66)
+    ;; Defined only to make the DWARF description simpler.
+    (VG_REGNUM         67)
+    (P0_REGNUM         68)
+    (P7_REGNUM         75)
+    (P15_REGNUM                83)
    ]
  )
  
@@ -114,6 +119,7 @@
      UNSPEC_PACI1716
      UNSPEC_PACISP
      UNSPEC_PRLG_STK
+    UNSPEC_REV
      UNSPEC_RBIT
      UNSPEC_SCVTF
      UNSPEC_SISD_NEG
@@ -143,6 +149,18 @@
      UNSPEC_RSQRTS
      UNSPEC_NZCV
      UNSPEC_XPACLRI
+    UNSPEC_LD1_SVE
+    UNSPEC_ST1_SVE
+    UNSPEC_LD1RQ
+    UNSPEC_MERGE_PTRUE
+    UNSPEC_PTEST_PTRUE
+    UNSPEC_UNPACKSHI
+    UNSPEC_UNPACKUHI
+    UNSPEC_UNPACKSLO
+    UNSPEC_UNPACKULO
+    UNSPEC_PACK
+    UNSPEC_FLOAT_CONVERT
+    UNSPEC_WHILE_LO
  ])
  
  (define_c_enum "unspecv" [
@@ -194,6 +212,11 @@
  ;; will be disabled when !TARGET_SIMD.
  (define_attr "simd" "no,yes" (const_string "no"))
  
+;; Attribute that specifies whether or not the instruction uses SVE.
+;; When this is set to yes for an alternative, that alternative
+;; will be disabled when !TARGET_SVE.
+(define_attr "sve" "no,yes" (const_string "no"))
+
  (define_attr "length" ""
    (const_int 4))
  
@@ -202,13 +225,14 @@
  ;; registers when -mgeneral-regs-only is specified.
  (define_attr "enabled" "no,yes"
    (cond [(ior
-           (ior
-               (and (eq_attr "fp" "yes")
-                    (eq (symbol_ref "TARGET_FLOAT") (const_int 0)))
-               (and (eq_attr "simd" "yes")
-                    (eq (symbol_ref "TARGET_SIMD") (const_int 0))))
+           (and (eq_attr "fp" "yes")
+                (eq (symbol_ref "TARGET_FLOAT") (const_int 0)))
+           (and (eq_attr "simd" "yes")
+                (eq (symbol_ref "TARGET_SIMD") (const_int 0)))
             (and (eq_attr "fp16" "yes")
-                (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0))))
+                (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0)))
+           (and (eq_attr "sve" "yes")
+                (eq (symbol_ref "TARGET_SVE") (const_int 0))))
             (const_string "no")
         ] (const_string "yes")))
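
The flattened "enabled" condition above reads as a four-way OR: an alternative is disabled as soon as it requires any one feature that the current target lacks. The following C model of that rule is an illustration only. The struct and function names are hypothetical, and the real check is evaluated by the generated attribute code rather than by anything shaped like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Which features the current target provides (TARGET_FLOAT, TARGET_SIMD,
   TARGET_FP_F16INST and TARGET_SVE in the patch).  */
struct target_flags
{
  bool has_float, has_simd, has_fp16_inst, has_sve;
};

/* The per-alternative "fp", "simd", "fp16" and "sve" attributes.  */
struct alt_attrs
{
  bool fp, simd, fp16, sve;
};

/* Return true if the alternative stays enabled.  Each term matches one
   (and (eq_attr ...) (eq (symbol_ref ...) (const_int 0))) arm.  */
bool
alternative_enabled (struct alt_attrs a, struct target_flags t)
{
  if ((a.fp && !t.has_float)
      || (a.simd && !t.has_simd)
      || (a.fp16 && !t.has_fp16_inst)
      || (a.sve && !t.has_sve))
    return false;
  return true;
}

/* Self-check: an alternative marked only sve=yes, like the CNT
   alternatives added to the move patterns, depends on SVE alone.  */
void
check_cnt_alternative (void)
{
  struct alt_attrs cnt = { .sve = true };
  struct target_flags no_sve = { .has_float = true, .has_simd = true };
  struct target_flags with_sve
    = { .has_float = true, .has_simd = true, .has_sve = true };
  assert (!alternative_enabled (cnt, no_sve));
  assert (alternative_enabled (cnt, with_sve));
}
```

An alternative with every attribute left at "no" is enabled on all targets, which is why the existing alternatives are unaffected by the new "sve" term.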
  
@@ -866,12 +890,18 @@
    "
      if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
        operands[1] = force_reg (<MODE>mode, operands[1]);
+
+    if (GET_CODE (operands[1]) == CONST_POLY_INT)
+      {
+       aarch64_expand_mov_immediate (operands[0], operands[1]);
+       DONE;
+      }
    "
  )
  
  (define_insn "*mov<mode>_aarch64"
-  [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r,   *w,r,*w, m, m, r,*w,*w")
-        (match_operand:SHORT 1 "general_operand"      " r,M,D<hq>,m, m,rZ,*w,*w, r,*w"))]
+  [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r,   *w,r ,r,*w, m, m, r,*w,*w")
+       (match_operand:SHORT 1 "aarch64_mov_operand"  " r,M,D<hq>,Usv,m, m,rZ,*w,*w, r,*w"))]
    "(register_operand (operands[0], <MODE>mode)
      || aarch64_reg_or_zero (operands[1], <MODE>mode))"
  {
@@ -885,26 +915,30 @@
         return aarch64_output_scalar_simd_mov_immediate (operands[1],
                                                         <MODE>mode);
       case 3:
-       return "ldr<size>\t%w0, %1";
+       return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
       case 4:
-       return "ldr\t%<size>0, %1";
+       return "ldr<size>\t%w0, %1";
       case 5:
-       return "str<size>\t%w1, %0";
+       return "ldr\t%<size>0, %1";
       case 6:
-       return "str\t%<size>1, %0";
+       return "str<size>\t%w1, %0";
       case 7:
-       return "umov\t%w0, %1.<v>[0]";
+       return "str\t%<size>1, %0";
       case 8:
-       return "dup\t%0.<Vallxd>, %w1";
+       return "umov\t%w0, %1.<v>[0]";
       case 9:
+       return "dup\t%0.<Vallxd>, %w1";
+     case 10:
         return "dup\t%<Vetype>0, %1.<v>[0]";
       default:
         gcc_unreachable ();
       }
  }
-  [(set_attr "type" "mov_reg,mov_imm,neon_move,load_4,load_4,store_4,store_4,\
-                     neon_to_gp<q>,neon_from_gp<q>,neon_dup")
-   (set_attr "simd" "*,*,yes,*,*,*,*,yes,yes,yes")]
+  ;; The "mov_imm" type for CNT is just a placeholder.
+  [(set_attr "type" "mov_reg,mov_imm,neon_move,mov_imm,load_4,load_4,store_4,
+                    store_4,neon_to_gp<q>,neon_from_gp<q>,neon_dup")
+   (set_attr "simd" "*,*,yes,*,*,*,*,*,yes,yes,yes")
+   (set_attr "sve" "*,*,*,yes,*,*,*,*,*,*,*")]
  )
  
  (define_expand "mov<mode>"
@@ -932,8 +966,8 @@
  )
  
  (define_insn_and_split "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,w, m, m,  r,  r, w,r,w, w")
-       (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r, r,w, m, m,  r,  r, w,r,w, w")
+       (match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,Usv,m,m,rZ,*w,Usa,Ush,rZ,w,w,Ds"))]
    "(register_operand (operands[0], SImode)
      || aarch64_reg_or_zero (operands[1], SImode))"
    "@
@@ -942,6 +976,7 @@
     mov\\t%w0, %w1
     mov\\t%w0, %1
     #
+   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
     ldr\\t%w0, %1
     ldr\\t%s0, %1
     str\\t%w1, %0
@@ -959,15 +994,17 @@
         aarch64_expand_mov_immediate (operands[0], operands[1]);
         DONE;
      }"
-  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,load_4,load_4,store_4,store_4,\
-                   adr,adr,f_mcr,f_mrc,fmov,neon_move")
-   (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
-   (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
+  ;; The "mov_imm" type for CNT is just a placeholder.
+  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_4,
+                   load_4,store_4,store_4,adr,adr,f_mcr,f_mrc,fmov,neon_move")
+   (set_attr "fp" "*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
+   (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")
+   (set_attr "sve" "*,*,*,*,*,yes,*,*,*,*,*,*,*,*,*,*")]
  )
  
  (define_insn_and_split "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r,w, m,m,  r,  r, w,r,w, w")
-       (match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,M,n,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,r, r,w, m,m,  r,  r, w,r,w, w")
+       (match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,M,n,Usv,m,m,rZ,w,Usa,Ush,rZ,w,w,Dd"))]
    "(register_operand (operands[0], DImode)
      || aarch64_reg_or_zero (operands[1], DImode))"
    "@
@@ -977,6 +1014,7 @@
     mov\\t%x0, %1
     mov\\t%w0, %1
     #
+   * return aarch64_output_sve_cnt_immediate (\"cnt\", \"%x0\", operands[1]);
     ldr\\t%x0, %1
     ldr\\t%d0, %1
     str\\t%x1, %0
@@ -994,10 +1032,13 @@
         aarch64_expand_mov_immediate (operands[0], operands[1]);
         DONE;
      }"
-  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,load_8,\
-                     load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov,neon_move")
-   (set_attr "fp" "*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
-   (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
+  ;; The "mov_imm" type for CNTD is just a placeholder.
+  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,mov_imm,mov_imm,
+                    load_8,load_8,store_8,store_8,adr,adr,f_mcr,f_mrc,fmov,
+                    neon_move")
+   (set_attr "fp" "*,*,*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
+   (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")
+   (set_attr "sve" "*,*,*,*,*,*,yes,*,*,*,*,*,*,*,*,*,*")]
  )
  
  (define_insn "insv_imm<mode>"
@@ -1018,6 +1059,14 @@
    "
      if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
        operands[1] = force_reg (TImode, operands[1]);
+
+    if (GET_CODE (operands[1]) == CONST_POLY_INT)
+      {
+       emit_move_insn (gen_lowpart (DImode, operands[0]),
+                       gen_lowpart (DImode, operands[1]));
+       emit_move_insn (gen_highpart (DImode, operands[0]), const0_rtx);
+       DONE;
+      }
    "
  )
  
@@ -1542,7 +1591,7 @@
    [(set
      (match_operand:GPI 0 "register_operand" "")
      (plus:GPI (match_operand:GPI 1 "register_operand" "")
-             (match_operand:GPI 2 "aarch64_pluslong_operand" "")))]
+             (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "")))]
    ""
  {
    /* If operands[1] is a subreg extract the inner RTX.  */
@@ -1555,23 +1604,34 @@
        && (!REG_P (op1)
          || !REGNO_PTR_FRAME_P (REGNO (op1))))
      operands[2] = force_reg (<MODE>mode, operands[2]);
+  /* Expand polynomial additions now if the destination is the stack
+     pointer, since we don't want to use that as a temporary.  */
+  else if (operands[0] == stack_pointer_rtx
+          && aarch64_split_add_offset_immediate (operands[2], <MODE>mode))
+    {
+      aarch64_split_add_offset (<MODE>mode, operands[0], operands[1],
+                               operands[2], NULL_RTX, NULL_RTX);
+      DONE;
+    }
  })
  
  (define_insn "*add<mode>3_aarch64"
    [(set
-    (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r")
+    (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r,rk")
      (plus:GPI
-     (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk")
-     (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa")))]
+     (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk,rk")
+     (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa,Uav")))]
    ""
    "@
    add\\t%<w>0, %<w>1, %2
    add\\t%<w>0, %<w>1, %<w>2
    add\\t%<rtn>0<vas>, %<rtn>1<vas>, %<rtn>2<vas>
    sub\\t%<w>0, %<w>1, #%n2
-  #"
-  [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple")
-   (set_attr "simd" "*,*,yes,*,*")]
+  #
+  * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);"
+  ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder.
+  [(set_attr "type" "alu_imm,alu_sreg,neon_add,alu_imm,multiple,alu_imm")
+   (set_attr "simd" "*,*,yes,*,*,*")]
  )
  
  ;; zero_extend version of above
@@ -1633,6 +1693,48 @@
    }
  )
  
+;; Match addition of polynomial offsets that require one temporary, for which
+;; we can use the early-clobbered destination register.  This is a separate
+;; pattern so that the early clobber doesn't affect register allocation
+;; for other forms of addition.  However, we still need to provide an
+;; all-register alternative, in case the offset goes out of range after
+;; elimination.  For completeness we might as well provide all GPR-based
+;; alternatives from the main pattern.
+;;
+;; We don't have a pattern for additions requiring two temporaries since at
+;; present LRA doesn't allow new scratches to be added during elimination.
+;; Such offsets should be rare anyway.
+;;
+;; ??? But if we added LRA support for new scratches, much of the ugliness
+;; here would go away.  We could just handle all polynomial constants in
+;; this pattern.
+(define_insn_and_split "*add<mode>3_poly_1"
+  [(set
+    (match_operand:GPI 0 "register_operand" "=r,r,r,r,r,&r")
+    (plus:GPI
+     (match_operand:GPI 1 "register_operand" "%rk,rk,rk,rk,rk,rk")
+     (match_operand:GPI 2 "aarch64_pluslong_or_poly_operand" "I,r,J,Uaa,Uav,Uat")))]
+  "TARGET_SVE && operands[0] != stack_pointer_rtx"
+  "@
+  add\\t%<w>0, %<w>1, %2
+  add\\t%<w>0, %<w>1, %<w>2
+  sub\\t%<w>0, %<w>1, #%n2
+  #
+  * return aarch64_output_sve_addvl_addpl (operands[0], operands[1], operands[2]);
+  #"
+  "&& epilogue_completed
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && aarch64_split_add_offset_immediate (operands[2], <MODE>mode)"
+  [(const_int 0)]
+  {
+    aarch64_split_add_offset (<MODE>mode, operands[0], operands[1],
+                             operands[2], operands[0], NULL_RTX);
+    DONE;
+  }
+  ;; The "alu_imm" type for ADDVL/ADDPL is just a placeholder.
+  [(set_attr "type" "alu_imm,alu_sreg,alu_imm,multiple,alu_imm,multiple")]
+)
+
  (define_split
    [(set (match_operand:DI 0 "register_operand")
         (zero_extend:DI
@@ -5797,6 +5899,12 @@
    DONE;
  })
  
+;; Helper for aarch64.c code.
+(define_expand "set_clobber_cc"
+  [(parallel [(set (match_operand 0)
+                  (match_operand 1))
+             (clobber (reg:CC CC_REGNUM))])])
+
  ;; AdvSIMD Stuff
  (include "aarch64-simd.md")
  
@@ -5805,3 +5913,6 @@
  
  ;; ldp/stp peephole patterns
  (include "aarch64-ldpstp.md")
+
+;; SVE.
+(include "aarch64-sve.md")
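
The Uav alternative in the add patterns above covers VG-based offsets that a single ADDVL or ADDPL can add. Architecturally, ADDVL adds a multiple of the vector length and ADDPL a multiple of the predicate length (one eighth of the vector length), each with a signed immediate in the range -32 to 31. The sketch below classifies an offset on that basis. It is an assumption-laden illustration: the offset is expressed in predicate lengths, the type and function names are hypothetical, and the patch's own test (aarch64_sve_addvl_addpl_immediate_p, not shown in this section) works on poly_int coefficients instead.

```c
#include <assert.h>
#include <stdbool.h>

enum add_kind { ADD_NONE, ADD_ADDVL, ADD_ADDPL };

struct sve_add
{
  enum add_kind kind;	/* Which instruction, if any, can add the offset.  */
  int imm;		/* Its immediate operand.  */
};

/* Signed 6-bit immediate range shared by ADDVL and ADDPL.  */
bool
imm_in_range (long imm)
{
  return imm >= -32 && imm <= 31;
}

/* OFFSET_IN_PL is the offset measured in predicate lengths, so 8 means
   one full vector.  Prefer ADDVL when the offset is a whole number of
   vectors, otherwise try ADDPL.  A zero offset needs no instruction.  */
struct sve_add
classify_sve_add (long offset_in_pl)
{
  struct sve_add result = { ADD_NONE, 0 };
  if (offset_in_pl == 0)
    return result;
  if (offset_in_pl % 8 == 0 && imm_in_range (offset_in_pl / 8))
    {
      result.kind = ADD_ADDVL;
      result.imm = (int) (offset_in_pl / 8);
    }
  else if (imm_in_range (offset_in_pl))
    {
      result.kind = ADD_ADDPL;
      result.imm = (int) offset_in_pl;
    }
  return result;
}
```

Offsets that fall outside both ranges are the ones the patch splits into several instructions, using the Uat alternative and a temporary register.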
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt

index 18bf0e30fd1f677da953e291dccf5893d14c215e..52eaf8c6f408fb640dbc858d4cf4a70054fe8082 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -185,6 +185,32 @@ Enable the division approximation.  Enabling this reduces
  precision of division results to about 16 bits for
  single precision and to 32 bits for double precision.
  
+Enum
+Name(sve_vector_bits) Type(enum aarch64_sve_vector_bits_enum)
+The possible SVE vector lengths:
+
+EnumValue
+Enum(sve_vector_bits) String(scalable) Value(SVE_SCALABLE)
+
+EnumValue
+Enum(sve_vector_bits) String(128) Value(SVE_128)
+
+EnumValue
+Enum(sve_vector_bits) String(256) Value(SVE_256)
+
+EnumValue
+Enum(sve_vector_bits) String(512) Value(SVE_512)
+
+EnumValue
+Enum(sve_vector_bits) String(1024) Value(SVE_1024)
+
+EnumValue
+Enum(sve_vector_bits) String(2048) Value(SVE_2048)
+
+msve-vector-bits=
+Target RejectNegative Joined Enum(sve_vector_bits) Var(aarch64_sve_vector_bits) Init(SVE_SCALABLE)
+-msve-vector-bits=N    Set the number of bits in an SVE vector register to N.
+
  mverbose-cost-dump
  Common Undocumented Var(flag_aarch64_verbose_cost)
  Enables verbose cost model dumping in the debug dump files.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md

index 18adbc691ececf7fbb2c8f6d8280462f427119ca..b004f7888e188c09cee8d74de7850504ac096497 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -27,6 +27,12 @@
  (define_register_constraint "w" "FP_REGS"
    "Floating point and SIMD vector registers.")
  
+(define_register_constraint "Upa" "PR_REGS"
+  "SVE predicate registers p0 - p15.")
+
+(define_register_constraint "Upl" "PR_LO_REGS"
+  "SVE predicate registers p0 - p7.")
+
  (define_register_constraint "x" "FP_LO_REGS"
    "Floating point and SIMD vector registers V0 - V15.")
  
@@ -40,6 +46,18 @@
    (and (match_code "const_int")
         (match_test "aarch64_pluslong_strict_immedate (op, VOIDmode)")))
  
+(define_constraint "Uav"
+  "@internal
+   A constraint that matches a VG-based constant that can be added by
+   a single ADDVL or ADDPL."
+ (match_operand 0 "aarch64_sve_addvl_addpl_immediate"))
+
+(define_constraint "Uat"
+  "@internal
+   A constraint that matches a VG-based constant that can be added by
+   using multiple instructions, with one temporary register."
+ (match_operand 0 "aarch64_split_add_offset_immediate"))
+
  (define_constraint "J"
   "A constant that can be used with a SUB operation (once negated)."
   (and (match_code "const_int")
@@ -134,6 +152,18 @@
    A constraint that matches the immediate constant -1."
    (match_test "op == constm1_rtx"))
  
+(define_constraint "Usv"
+  "@internal
+   A constraint that matches a VG-based constant that can be loaded by
+   a single CNT[BHWD]."
+ (match_operand 0 "aarch64_sve_cnt_immediate"))
+
+(define_constraint "Usi"
+  "@internal
+ A constraint that matches an immediate operand valid for
+ the SVE INDEX instruction."
+ (match_operand 0 "aarch64_sve_index_immediate"))
+
  (define_constraint "Ui1"
    "@internal
    A constraint that matches the immediate constant +1."
@@ -192,6 +222,13 @@
         (match_test "aarch64_legitimate_address_p (DFmode, XEXP (op, 0), 1,
                                                   ADDR_QUERY_LDP_STP)")))
  
+(define_memory_constraint "Utr"
+  "@internal
+   An address valid for SVE LDR and STR instructions (as distinct from
+   LD[1234] and ST[1234] patterns)."
+  (and (match_code "mem")
+       (match_test "aarch64_sve_ldr_operand_p (op)")))
+
  (define_memory_constraint "Utv"
    "@internal
     An address valid for loading/storing opaque structure
@@ -206,6 +243,12 @@
         (match_test "aarch64_legitimate_address_p (V2DImode,
                                                   XEXP (op, 0), 1)")))
  
+(define_memory_constraint "Uty"
+  "@internal
+   An address valid for SVE LD1Rs."
+  (and (match_code "mem")
+       (match_test "aarch64_sve_ld1r_operand_p (op)")))
+
  (define_constraint "Ufc"
    "A floating point constant which can be used with an\
     FMOV immediate operation."
@@ -235,7 +278,7 @@
  (define_constraint "Dn"
    "@internal
   A constraint that matches vector of immediates."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
        (match_test "aarch64_simd_valid_immediate (op, NULL)")))
  
  (define_constraint "Dh"
@@ -257,21 +300,27 @@
  (define_constraint "Dl"
    "@internal
   A constraint that matches vector of immediates for left shifts."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
        (match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
                                                  true)")))
  
  (define_constraint "Dr"
    "@internal
   A constraint that matches vector of immediates for right shifts."
- (and (match_code "const_vector")
+ (and (match_code "const,const_vector")
        (match_test "aarch64_simd_shift_imm_p (op, GET_MODE (op),
                                                  false)")))
  (define_constraint "Dz"
    "@internal
- A constraint that matches vector of immediate zero."
- (and (match_code "const_vector")
-      (match_test "aarch64_simd_imm_zero_p (op, GET_MODE (op))")))
+ A constraint that matches a vector of immediate zero."
+ (and (match_code "const,const_vector")
+      (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_constraint "Dm"
+  "@internal
+ A constraint that matches a vector of immediate minus one."
+ (and (match_code "const,const_vector")
+      (match_test "op == CONST1_RTX (GET_MODE (op))")))
  
  (define_constraint "Dd"
    "@internal
@@ -291,3 +340,62 @@
    "@internal
   An address valid for a prefetch instruction."
   (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
+
+(define_constraint "vsa"
+  "@internal
+   A constraint that matches an immediate operand valid for SVE
+   arithmetic instructions."
+ (match_operand 0 "aarch64_sve_arith_immediate"))
+
+(define_constraint "vsc"
+  "@internal
+   A constraint that matches a signed immediate operand valid for SVE
+   CMP instructions."
+ (match_operand 0 "aarch64_sve_cmp_vsc_immediate"))
+
+(define_constraint "vsd"
+  "@internal
+   A constraint that matches an unsigned immediate operand valid for SVE
+   CMP instructions."
+ (match_operand 0 "aarch64_sve_cmp_vsd_immediate"))
+
+(define_constraint "vsi"
+  "@internal
+   A constraint that matches a vector count operand valid for SVE INC and
+   DEC instructions."
+ (match_operand 0 "aarch64_sve_inc_dec_immediate"))
+
+(define_constraint "vsn"
+  "@internal
+   A constraint that matches an immediate operand whose negative
+   is valid for SVE SUB instructions."
+ (match_operand 0 "aarch64_sve_sub_arith_immediate"))
+
+(define_constraint "vsl"
+  "@internal
+   A constraint that matches an immediate operand valid for SVE logical
+   operations."
+ (match_operand 0 "aarch64_sve_logical_immediate"))
+
+(define_constraint "vsm"
+  "@internal
+   A constraint that matches an immediate operand valid for SVE MUL
+   operations."
+ (match_operand 0 "aarch64_sve_mul_immediate"))
+
+(define_constraint "vsA"
+  "@internal
+   A constraint that matches an immediate operand valid for SVE FADD
+   and FSUB operations."
+ (match_operand 0 "aarch64_sve_float_arith_immediate"))
+
+(define_constraint "vsM"
+  "@internal
+   A constraint that matches an immediate operand valid for SVE FMUL
+   operations."
+ (match_operand 0 "aarch64_sve_float_mul_immediate"))
+
+(define_constraint "vsN"
+  "@internal
+   A constraint that matches the negative of vsA."
+ (match_operand 0 "aarch64_sve_float_arith_with_sub_immediate"))
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md

index e199dfdb4ea9dfb192fcd24596b77e65c7bdd444..0fe42edbc6103d83d5db5801cf8c80a2d792b9f5 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -56,20 +56,20 @@
  ;; Iterator for all scalar floating point modes (SF, DF and TF)
  (define_mode_iterator GPF_TF [SF DF TF])
  
-;; Integer vector modes.
+;; Integer Advanced SIMD modes.
  (define_mode_iterator VDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI])
  
-;; vector and scalar, 64 & 128-bit container, all integer modes
+;; Advanced SIMD and scalar, 64 & 128-bit container, all integer modes.
  (define_mode_iterator VSDQ_I [V8QI V16QI V4HI V8HI V2SI V4SI V2DI QI HI SI DI])
  
-;; vector and scalar, 64 & 128-bit container: all vector integer modes;
-;; 64-bit scalar integer mode
+;; Advanced SIMD and scalar, 64 & 128-bit container: all Advanced SIMD
+;; integer modes; 64-bit scalar integer mode.
  (define_mode_iterator VSDQ_I_DI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI DI])
  
  ;; Double vector modes.
  (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF])
  
-;; vector, 64-bit container, all integer modes
+;; Advanced SIMD, 64-bit container, all integer modes.
  (define_mode_iterator VD_BHSI [V8QI V4HI V2SI])
  
  ;; 128 and 64-bit container; 8, 16, 32-bit vector integer modes
@@ -94,16 +94,16 @@
  ;; pointer-sized quantities.  Exactly one of the two alternatives will match.
  (define_mode_iterator PTR [(SI "ptr_mode == SImode") (DI "ptr_mode == DImode")])
  
-;; Vector Float modes suitable for moving, loading and storing.
+;; Advanced SIMD Float modes suitable for moving, loading and storing.
  (define_mode_iterator VDQF_F16 [V4HF V8HF V2SF V4SF V2DF])
  
-;; Vector Float modes.
+;; Advanced SIMD Float modes.
  (define_mode_iterator VDQF [V2SF V4SF V2DF])
  (define_mode_iterator VHSDF [(V4HF "TARGET_SIMD_F16INST")
                              (V8HF "TARGET_SIMD_F16INST")
                              V2SF V4SF V2DF])
  
-;; Vector Float modes, and DF.
+;; Advanced SIMD Float modes, and DF.
  (define_mode_iterator VHSDF_DF [(V4HF "TARGET_SIMD_F16INST")
                                 (V8HF "TARGET_SIMD_F16INST")
                                 V2SF V4SF V2DF DF])
@@ -113,7 +113,7 @@
                                   (HF "TARGET_SIMD_F16INST")
                                   SF DF])
  
-;; Vector single Float modes.
+;; Advanced SIMD single Float modes.
  (define_mode_iterator VDQSF [V2SF V4SF])
  
  ;; Quad vector Float modes with half/single elements.
@@ -122,16 +122,16 @@
  ;; Modes suitable to use as the return type of a vcond expression.
  (define_mode_iterator VDQF_COND [V2SF V2SI V4SF V4SI V2DF V2DI])
  
-;; All Float modes.
+;; All scalar and Advanced SIMD Float modes.
  (define_mode_iterator VALLF [V2SF V4SF V2DF SF DF])
  
-;; Vector Float modes with 2 elements.
+;; Advanced SIMD Float modes with 2 elements.
  (define_mode_iterator V2F [V2SF V2DF])
  
-;; All vector modes on which we support any arithmetic operations.
+;; All Advanced SIMD modes on which we support any arithmetic operations.
  (define_mode_iterator VALL [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF])
  
-;; All vector modes suitable for moving, loading, and storing.
+;; All Advanced SIMD modes suitable for moving, loading, and storing.
  (define_mode_iterator VALL_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
                                 V4HF V8HF V2SF V4SF V2DF])
  
@@ -139,21 +139,21 @@
  (define_mode_iterator VALL_F16_NO_V2Q [V8QI V16QI V4HI V8HI V2SI V4SI
                                 V4HF V8HF V2SF V4SF])
  
-;; All vector modes barring HF modes, plus DI.
+;; All Advanced SIMD modes barring HF modes, plus DI.
  (define_mode_iterator VALLDI [V8QI V16QI V4HI V8HI V2SI V4SI V2DI V2SF V4SF V2DF DI])
  
-;; All vector modes and DI.
+;; All Advanced SIMD modes and DI.
  (define_mode_iterator VALLDI_F16 [V8QI V16QI V4HI V8HI V2SI V4SI V2DI
                                   V4HF V8HF V2SF V4SF V2DF DI])
  
-;; All vector modes, plus DI and DF.
+;; All Advanced SIMD modes, plus DI and DF.
  (define_mode_iterator VALLDIF [V8QI V16QI V4HI V8HI V2SI V4SI
                                V2DI V4HF V8HF V2SF V4SF V2DF DI DF])
  
-;; Vector modes for Integer reduction across lanes.
+;; Advanced SIMD modes for Integer reduction across lanes.
  (define_mode_iterator VDQV [V8QI V16QI V4HI V8HI V4SI V2DI])
  
-;; Vector modes(except V2DI) for Integer reduction across lanes.
+;; Advanced SIMD modes (except V2DI) for Integer reduction across lanes.
  (define_mode_iterator VDQV_S [V8QI V16QI V4HI V8HI V4SI])
  
  ;; All double integer narrow-able modes.
@@ -162,7 +162,8 @@
  ;; All quad integer narrow-able modes.
  (define_mode_iterator VQN [V8HI V4SI V2DI])
  
-;; Vector and scalar 128-bit container: narrowable 16, 32, 64-bit integer modes
+;; Advanced SIMD and scalar 128-bit container: narrowable 16, 32, 64-bit
+;; integer modes
  (define_mode_iterator VSQN_HSDI [V8HI V4SI V2DI HI SI DI])
  
  ;; All quad integer widen-able modes.
@@ -171,54 +172,54 @@
  ;; Double vector modes for combines.
  (define_mode_iterator VDC [V8QI V4HI V4HF V2SI V2SF DI DF])
  
-;; Vector modes except double int.
+;; Advanced SIMD modes except double int.
  (define_mode_iterator VDQIF [V8QI V16QI V4HI V8HI V2SI V4SI V2SF V4SF V2DF])
  (define_mode_iterator VDQIF_F16 [V8QI V16QI V4HI V8HI V2SI V4SI
                                   V4HF V8HF V2SF V4SF V2DF])
  
-;; Vector modes for S type.
+;; Advanced SIMD modes for S type.
  (define_mode_iterator VDQ_SI [V2SI V4SI])
  
-;; Vector modes for S and D
+;; Advanced SIMD modes for S and D.
  (define_mode_iterator VDQ_SDI [V2SI V4SI V2DI])
  
-;; Vector modes for H, S and D
+;; Advanced SIMD modes for H, S and D.
  (define_mode_iterator VDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
                                 (V8HI "TARGET_SIMD_F16INST")
                                 V2SI V4SI V2DI])
  
-;; Scalar and Vector modes for S and D
+;; Scalar and Advanced SIMD modes for S and D.
  (define_mode_iterator VSDQ_SDI [V2SI V4SI V2DI SI DI])
  
-;; Scalar and Vector modes for S and D, Vector modes for H.
+;; Scalar and Advanced SIMD modes for S and D, Advanced SIMD modes for H.
  (define_mode_iterator VSDQ_HSDI [(V4HI "TARGET_SIMD_F16INST")
                                  (V8HI "TARGET_SIMD_F16INST")
                                  V2SI V4SI V2DI
                                  (HI "TARGET_SIMD_F16INST")
                                  SI DI])
  
-;; Vector modes for Q and H types.
+;; Advanced SIMD modes for Q and H types.
  (define_mode_iterator VDQQH [V8QI V16QI V4HI V8HI])
  
-;; Vector modes for H and S types.
+;; Advanced SIMD modes for H and S types.
  (define_mode_iterator VDQHS [V4HI V8HI V2SI V4SI])
  
-;; Vector modes for H, S and D types.
+;; Advanced SIMD modes for H, S and D types.
  (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
  
-;; Vector and scalar integer modes for H and S
+;; Advanced SIMD and scalar integer modes for H and S.
  (define_mode_iterator VSDQ_HSI [V4HI V8HI V2SI V4SI HI SI])
  
-;; Vector and scalar 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD and scalar 64-bit container: 16, 32-bit integer modes.
  (define_mode_iterator VSD_HSI [V4HI V2SI HI SI])
  
-;; Vector 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD 64-bit container: 16, 32-bit integer modes.
  (define_mode_iterator VD_HSI [V4HI V2SI])
  
  ;; Scalar 64-bit container: 16, 32-bit integer modes
  (define_mode_iterator SD_HSI [HI SI])
  
-;; Vector 64-bit container: 16, 32-bit integer modes
+;; Advanced SIMD 128-bit container: 16, 32-bit integer modes.
  (define_mode_iterator VQ_HSI [V8HI V4SI])
  
  ;; All byte modes.
@@ -229,21 +230,59 @@
  
  (define_mode_iterator TX [TI TF])
  
-;; Opaque structure modes.
+;; Advanced SIMD opaque structure modes.
  (define_mode_iterator VSTRUCT [OI CI XI])
  
  ;; Double scalar modes
  (define_mode_iterator DX [DI DF])
  
-;; Modes available for <f>mul lane operations.
+;; Modes available for Advanced SIMD <f>mul lane operations.
  (define_mode_iterator VMUL [V4HI V8HI V2SI V4SI
                             (V4HF "TARGET_SIMD_F16INST")
                             (V8HF "TARGET_SIMD_F16INST")
                             V2SF V4SF V2DF])
  
-;; Modes available for <f>mul lane operations changing lane count.
+;; Modes available for Advanced SIMD <f>mul lane operations changing lane
+;; count.
  (define_mode_iterator VMUL_CHANGE_NLANES [V4HI V8HI V2SI V4SI V2SF V4SF])
  
+;; All SVE vector modes.
+(define_mode_iterator SVE_ALL [VNx16QI VNx8HI VNx4SI VNx2DI
+                              VNx8HF VNx4SF VNx2DF])
+
+;; All SVE vector modes that have 8-bit or 16-bit elements.
+(define_mode_iterator SVE_BH [VNx16QI VNx8HI VNx8HF])
+
+;; All SVE vector modes that have 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator SVE_BHS [VNx16QI VNx8HI VNx4SI VNx8HF VNx4SF])
+
+;; All SVE integer vector modes that have 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator SVE_BHSI [VNx16QI VNx8HI VNx4SI])
+
+;; All SVE integer vector modes that have 16-bit, 32-bit or 64-bit elements.
+(define_mode_iterator SVE_HSDI [VNx16QI VNx8HI VNx4SI])
+
+;; All SVE floating-point vector modes that have 16-bit or 32-bit elements.
+(define_mode_iterator SVE_HSF [VNx8HF VNx4SF])
+
+;; All SVE vector modes that have 32-bit or 64-bit elements.
+(define_mode_iterator SVE_SD [VNx4SI VNx2DI VNx4SF VNx2DF])
+
+;; All SVE integer vector modes that have 32-bit or 64-bit elements.
+(define_mode_iterator SVE_SDI [VNx4SI VNx2DI])
+
+;; All SVE integer vector modes.
+(define_mode_iterator SVE_I [VNx16QI VNx8HI VNx4SI VNx2DI])
+
+;; All SVE floating-point vector modes.
+(define_mode_iterator SVE_F [VNx8HF VNx4SF VNx2DF])
+
+;; All SVE predicate modes.
+(define_mode_iterator PRED_ALL [VNx16BI VNx8BI VNx4BI VNx2BI])
+
+;; SVE predicate modes that control 8-bit, 16-bit or 32-bit elements.
+(define_mode_iterator PRED_BHS [VNx16BI VNx8BI VNx4BI])
+
  ;; ------------------------------------------------------------------
  ;; Unspec enumerations for Advance SIMD. These could well go into
  ;; aarch64.md but for their use in int_iterators here.
@@ -378,6 +417,22 @@
      UNSPEC_FMLSL       ; Used in aarch64-simd.md.
      UNSPEC_FMLAL2      ; Used in aarch64-simd.md.
      UNSPEC_FMLSL2      ; Used in aarch64-simd.md.
+    UNSPEC_SEL         ; Used in aarch64-sve.md.
+    UNSPEC_ANDF                ; Used in aarch64-sve.md.
+    UNSPEC_IORF                ; Used in aarch64-sve.md.
+    UNSPEC_XORF                ; Used in aarch64-sve.md.
+    UNSPEC_COND_LT     ; Used in aarch64-sve.md.
+    UNSPEC_COND_LE     ; Used in aarch64-sve.md.
+    UNSPEC_COND_EQ     ; Used in aarch64-sve.md.
+    UNSPEC_COND_NE     ; Used in aarch64-sve.md.
+    UNSPEC_COND_GE     ; Used in aarch64-sve.md.
+    UNSPEC_COND_GT     ; Used in aarch64-sve.md.
+    UNSPEC_COND_LO     ; Used in aarch64-sve.md.
+    UNSPEC_COND_LS     ; Used in aarch64-sve.md.
+    UNSPEC_COND_HS     ; Used in aarch64-sve.md.
+    UNSPEC_COND_HI     ; Used in aarch64-sve.md.
+    UNSPEC_COND_UO     ; Used in aarch64-sve.md.
+    UNSPEC_LASTB       ; Used in aarch64-sve.md.
  ])
  
  ;; ------------------------------------------------------------------
@@ -535,17 +590,24 @@
                            (HI   "")])
  
  ;; Mode-to-individual element type mapping.
-(define_mode_attr Vetype [(V8QI "b") (V16QI "b")
-                         (V4HI "h") (V8HI  "h")
-                          (V2SI "s") (V4SI  "s")
-                         (V2DI "d") (V4HF "h")
-                         (V8HF "h") (V2SF  "s")
-                         (V4SF "s") (V2DF  "d")
+(define_mode_attr Vetype [(V8QI "b") (V16QI "b") (VNx16QI "b") (VNx16BI "b")
+                         (V4HI "h") (V8HI  "h") (VNx8HI  "h") (VNx8BI  "h")
+                         (V2SI "s") (V4SI  "s") (VNx4SI  "s") (VNx4BI  "s")
+                         (V2DI "d")             (VNx2DI  "d") (VNx2BI  "d")
+                         (V4HF "h") (V8HF  "h") (VNx8HF  "h")
+                         (V2SF "s") (V4SF  "s") (VNx4SF  "s")
+                         (V2DF "d")             (VNx2DF  "d")
                           (HF   "h")
                           (SF   "s") (DF  "d")
                           (QI "b")   (HI "h")
                           (SI "s")   (DI "d")])
  
+;; Equivalent of "size" for a vector element.
+(define_mode_attr Vesize [(VNx16QI "b")
+                         (VNx8HI  "h") (VNx8HF "h")
+                         (VNx4SI  "w") (VNx4SF "w")
+                         (VNx2DI  "d") (VNx2DF "d")])
+
  ;; Vetype is used everywhere in scheduling type and assembly output,
  ;; sometimes they are not the same, for example HF modes on some
  ;; instructions.  stype is defined to represent scheduling type
@@ -567,27 +629,45 @@
                           (SI   "8b")])
  
  ;; Define element mode for each vector mode.
-(define_mode_attr VEL [(V8QI "QI") (V16QI "QI")
-                       (V4HI "HI") (V8HI "HI")
-                        (V2SI "SI") (V4SI "SI")
-                        (DI "DI")   (V2DI "DI")
-                        (V4HF "HF") (V8HF "HF")
-                        (V2SF "SF") (V4SF "SF")
-                        (V2DF "DF") (DF "DF")
-                       (SI   "SI") (HI   "HI")
+(define_mode_attr VEL [(V8QI  "QI") (V16QI "QI") (VNx16QI "QI")
+                       (V4HI "HI") (V8HI  "HI") (VNx8HI  "HI")
+                       (V2SI "SI") (V4SI  "SI") (VNx4SI  "SI")
+                       (DI   "DI") (V2DI  "DI") (VNx2DI  "DI")
+                       (V4HF "HF") (V8HF  "HF") (VNx8HF  "HF")
+                       (V2SF "SF") (V4SF  "SF") (VNx4SF  "SF")
+                       (DF   "DF") (V2DF  "DF") (VNx2DF  "DF")
+                       (SI   "SI") (HI    "HI")
                         (QI   "QI")])
  
  ;; Define element mode for each vector mode (lower case).
-(define_mode_attr Vel [(V8QI "qi") (V16QI "qi")
-                       (V4HI "hi") (V8HI "hi")
-                       (V2SI "si") (V4SI "si")
-                       (DI "di")   (V2DI "di")
-                       (V4HF "hf") (V8HF "hf")
-                       (V2SF "sf") (V4SF "sf")
-                       (V2DF "df") (DF "df")
+(define_mode_attr Vel [(V8QI "qi") (V16QI "qi") (VNx16QI "qi")
+                       (V4HI "hi") (V8HI "hi") (VNx8HI  "hi")
+                       (V2SI "si") (V4SI "si") (VNx4SI  "si")
+                       (DI "di")   (V2DI "di") (VNx2DI  "di")
+                       (V4HF "hf") (V8HF "hf") (VNx8HF  "hf")
+                       (V2SF "sf") (V4SF "sf") (VNx4SF  "sf")
+                       (V2DF "df") (DF "df")   (VNx2DF  "df")
                         (SI   "si") (HI   "hi")
                         (QI   "qi")])
  
+;; Element mode with floating-point values replaced by like-sized integers.
+(define_mode_attr VEL_INT [(VNx16QI "QI")
+                          (VNx8HI  "HI") (VNx8HF "HI")
+                          (VNx4SI  "SI") (VNx4SF "SI")
+                          (VNx2DI  "DI") (VNx2DF "DI")])
+
+;; Gives the mode of the 128-bit lowpart of an SVE vector.
+(define_mode_attr V128 [(VNx16QI "V16QI")
+                       (VNx8HI  "V8HI") (VNx8HF "V8HF")
+                       (VNx4SI  "V4SI") (VNx4SF "V4SF")
+                       (VNx2DI  "V2DI") (VNx2DF "V2DF")])
+
+;; ...and again in lower case.
+(define_mode_attr v128 [(VNx16QI "v16qi")
+                       (VNx8HI  "v8hi") (VNx8HF "v8hf")
+                       (VNx4SI  "v4si") (VNx4SF "v4sf")
+                       (VNx2DI  "v2di") (VNx2DF "v2df")])
+
  ;; 64-bit container modes the inner or scalar source mode.
  (define_mode_attr VCOND [(HI "V4HI") (SI "V2SI")
                          (V4HI "V4HI") (V8HI "V4HI")
@@ -666,16 +746,28 @@
                            (V2DI "4s")])
  
  ;; Widened modes of vector modes.
-(define_mode_attr VWIDE [(V8QI "V8HI") (V4HI "V4SI")
-                        (V2SI "V2DI") (V16QI "V8HI") 
-                        (V8HI "V4SI") (V4SI "V2DI")
-                        (HI "SI")     (SI "DI")
-                        (V8HF "V4SF") (V4SF "V2DF")
-                        (V4HF "V4SF") (V2SF "V2DF")]
-)
+(define_mode_attr VWIDE [(V8QI  "V8HI")  (V4HI  "V4SI")
+                        (V2SI  "V2DI")  (V16QI "V8HI")
+                        (V8HI  "V4SI")  (V4SI  "V2DI")
+                        (HI    "SI")    (SI    "DI")
+                        (V8HF  "V4SF")  (V4SF  "V2DF")
+                        (V4HF  "V4SF")  (V2SF  "V2DF")
+                        (VNx8HF  "VNx4SF") (VNx4SF "VNx2DF")
+                        (VNx16QI "VNx8HI") (VNx8HI "VNx4SI")
+                        (VNx4SI  "VNx2DI")
+                        (VNx16BI "VNx8BI") (VNx8BI "VNx4BI")
+                        (VNx4BI  "VNx2BI")])
+
+;; Predicate mode associated with VWIDE.
+(define_mode_attr VWIDE_PRED [(VNx8HF "VNx4BI") (VNx4SF "VNx2BI")])
  
  ;; Widened modes of vector modes, lowercase
-(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")])
+(define_mode_attr Vwide [(V2SF "v2df") (V4HF "v4sf")
+                        (VNx16QI "vnx8hi") (VNx8HI "vnx4si")
+                        (VNx4SI  "vnx2di")
+                        (VNx8HF  "vnx4sf") (VNx4SF "vnx2df")
+                        (VNx16BI "vnx8bi") (VNx8BI "vnx4bi")
+                        (VNx4BI  "vnx2bi")])
  
  ;; Widened mode register suffixes for VD_BHSI/VQW/VQ_HSF.
  (define_mode_attr Vwtype [(V8QI "8h") (V4HI "4s")
@@ -683,6 +775,11 @@
                           (V8HI "4s") (V4SI "2d")
                           (V8HF "4s") (V4SF "2d")])
  
+;; SVE vector after widening
+(define_mode_attr Vewtype [(VNx16QI "h")
+                          (VNx8HI  "s") (VNx8HF "s")
+                          (VNx4SI  "d") (VNx4SF "d")])
+
  ;; Widened mode register suffixes for VDW/VQW.
  (define_mode_attr Vmwtype [(V8QI ".8h") (V4HI ".4s")
                            (V2SI ".2d") (V16QI ".8h") 
@@ -696,22 +793,23 @@
                              (V4SF "2s")])
  
  ;; Define corresponding core/FP element mode for each vector mode.
-(define_mode_attr vw   [(V8QI "w") (V16QI "w")
-                        (V4HI "w") (V8HI "w")
-                        (V2SI "w") (V4SI "w")
-                        (DI   "x") (V2DI "x")
-                        (V2SF "s") (V4SF "s")
-                        (V2DF "d")])
+(define_mode_attr vw [(V8QI "w") (V16QI "w") (VNx16QI "w")
+                     (V4HI "w") (V8HI "w") (VNx8HI "w")
+                     (V2SI "w") (V4SI "w") (VNx4SI "w")
+                     (DI   "x") (V2DI "x") (VNx2DI "x")
+                     (VNx8HF "h")
+                     (V2SF "s") (V4SF "s") (VNx4SF "s")
+                     (V2DF "d") (VNx2DF "d")])
  
  ;; Corresponding core element mode for each vector mode.  This is a
  ;; variation on <vw> mapping FP modes to GP regs.
-(define_mode_attr vwcore  [(V8QI "w") (V16QI "w")
-                          (V4HI "w") (V8HI "w")
-                          (V2SI "w") (V4SI "w")
-                          (DI   "x") (V2DI "x")
-                          (V4HF "w") (V8HF "w")
-                          (V2SF "w") (V4SF "w")
-                          (V2DF "x")])
+(define_mode_attr vwcore [(V8QI "w") (V16QI "w") (VNx16QI "w")
+                         (V4HI "w") (V8HI "w") (VNx8HI "w")
+                         (V2SI "w") (V4SI "w") (VNx4SI "w")
+                         (DI   "x") (V2DI "x") (VNx2DI "x")
+                         (V4HF "w") (V8HF "w") (VNx8HF "w")
+                         (V2SF "w") (V4SF "w") (VNx4SF "w")
+                         (V2DF "x") (VNx2DF "x")])
  
  ;; Double vector types for ALLX.
  (define_mode_attr Vallxd [(QI "8b") (HI "4h") (SI "2s")])
@@ -723,8 +821,13 @@
                                (DI   "DI")   (V2DI  "V2DI")
                                (V4HF "V4HI") (V8HF  "V8HI")
                                (V2SF "V2SI") (V4SF  "V4SI")
-                              (V2DF "V2DI") (DF    "DI")
-                              (SF   "SI")   (HF    "HI")])
+                              (DF   "DI")   (V2DF  "V2DI")
+                              (SF   "SI")   (HF    "HI")
+                              (VNx16QI "VNx16QI")
+                              (VNx8HI  "VNx8HI") (VNx8HF "VNx8HI")
+                              (VNx4SI  "VNx4SI") (VNx4SF "VNx4SI")
+                              (VNx2DI  "VNx2DI") (VNx2DF "VNx2DI")
+])
  
  ;; Lower case mode with floating-point values replaced by like-sized integers.
  (define_mode_attr v_int_equiv [(V8QI "v8qi") (V16QI "v16qi")
@@ -733,8 +836,19 @@
                                (DI   "di")   (V2DI  "v2di")
                                (V4HF "v4hi") (V8HF  "v8hi")
                                (V2SF "v2si") (V4SF  "v4si")
-                              (V2DF "v2di") (DF    "di")
-                              (SF   "si")])
+                              (DF   "di")   (V2DF  "v2di")
+                              (SF   "si")
+                              (VNx16QI "vnx16qi")
+                              (VNx8HI  "vnx8hi") (VNx8HF "vnx8hi")
+                              (VNx4SI  "vnx4si") (VNx4SF "vnx4si")
+                              (VNx2DI  "vnx2di") (VNx2DF "vnx2di")
+])
+
+;; Floating-point equivalent of selected modes.
+(define_mode_attr V_FP_EQUIV [(VNx4SI "VNx4SF") (VNx4SF "VNx4SF")
+                             (VNx2DI "VNx2DF") (VNx2DF "VNx2DF")])
+(define_mode_attr v_fp_equiv [(VNx4SI "vnx4sf") (VNx4SF "vnx4sf")
+                             (VNx2DI "vnx2df") (VNx2DF "vnx2df")])
  
  ;; Mode for vector conditional operations where the comparison has
  ;; different type from the lhs.
@@ -869,6 +983,18 @@
  
  (define_code_attr f16mac [(plus "a") (minus "s")])
  
+;; The predicate mode associated with an SVE data mode.
+(define_mode_attr VPRED [(VNx16QI "VNx16BI")
+                        (VNx8HI "VNx8BI") (VNx8HF "VNx8BI")
+                        (VNx4SI "VNx4BI") (VNx4SF "VNx4BI")
+                        (VNx2DI "VNx2BI") (VNx2DF "VNx2BI")])
+
+;; ...and again in lower case.
+(define_mode_attr vpred [(VNx16QI "vnx16bi")
+                        (VNx8HI "vnx8bi") (VNx8HF "vnx8bi")
+                        (VNx4SI "vnx4bi") (VNx4SF "vnx4bi")
+                        (VNx2DI "vnx2bi") (VNx2DF "vnx2bi")])
+
  ;; -------------------------------------------------------------------
  ;; Code Iterators
  ;; -------------------------------------------------------------------
@@ -882,6 +1008,9 @@
  ;; Code iterator for logical operations
  (define_code_iterator LOGICAL [and ior xor])
  
+;; LOGICAL without AND.
+(define_code_iterator LOGICAL_OR [ior xor])
+
  ;; Code iterator for logical operations whose :nlogical works on SIMD registers.
  (define_code_iterator NLOGICAL [and ior])
  
@@ -940,6 +1069,12 @@
  ;; Unsigned comparison operators.
  (define_code_iterator FAC_COMPARISONS [lt le ge gt])
  
+;; SVE integer unary operations.
+(define_code_iterator SVE_INT_UNARY [neg not popcount])
+
+;; SVE floating-point unary operations.
+(define_code_iterator SVE_FP_UNARY [neg abs sqrt])
+
  ;; -------------------------------------------------------------------
  ;; Code Attributes
  ;; -------------------------------------------------------------------
@@ -956,6 +1091,7 @@
                          (unsigned_fix "fixuns")
                          (float "float")
                          (unsigned_float "floatuns")
+                        (popcount "popcount")
                          (and "and")
                          (ior "ior")
                          (xor "xor")
@@ -969,6 +1105,10 @@
                          (us_minus "qsub")
                          (ss_neg "qneg")
                          (ss_abs "qabs")
+                        (smin "smin")
+                        (smax "smax")
+                        (umin "umin")
+                        (umax "umax")
                          (eq "eq")
                          (ne "ne")
                          (lt "lt")
@@ -978,7 +1118,9 @@
                          (ltu "ltu")
                          (leu "leu")
                          (geu "geu")
-                        (gtu "gtu")])
+                        (gtu "gtu")
+                        (abs "abs")
+                        (sqrt "sqrt")])
  
  ;; For comparison operators we use the FCM* and CM* instructions.
  ;; As there are no CMLE or CMLT instructions which act on 3 vector
@@ -1021,9 +1163,12 @@
  ;; Operation names for negate and bitwise complement.
  (define_code_attr neg_not_op [(neg "neg") (not "not")])
  
-;; Similar, but when not(op)
+;; Similar, but when the second operand is inverted.
  (define_code_attr nlogical [(and "bic") (ior "orn") (xor "eon")])
  
+;; Similar, but when both operands are inverted.
+(define_code_attr logical_nn [(and "nor") (ior "nand")])
+
  ;; Sign- or zero-extending data-op
  (define_code_attr su [(sign_extend "s") (zero_extend "u")
                       (sign_extract "s") (zero_extract "u")
@@ -1032,6 +1177,9 @@
                       (smax "s") (umax "u")
                       (smin "s") (umin "u")])
  
+;; Whether a shift is left or right.
+(define_code_attr lr [(ashift "l") (ashiftrt "r") (lshiftrt "r")])
+
  ;; Emit conditional branch instructions.
  (define_code_attr bcond [(eq "beq") (ne "bne") (lt "bne") (ge "beq")])
  
@@ -1077,6 +1225,25 @@
  ;; Attribute to describe constants acceptable in atomic logical operations
  (define_mode_attr lconst_atomic [(QI "K") (HI "K") (SI "K") (DI "L")])
  
+;; The integer SVE instruction that implements an rtx code.
+(define_code_attr sve_int_op [(plus "add")
+                             (neg "neg")
+                             (smin "smin")
+                             (smax "smax")
+                             (umin "umin")
+                             (umax "umax")
+                             (and "and")
+                             (ior "orr")
+                             (xor "eor")
+                             (not "not")
+                             (popcount "cnt")])
+
+;; The floating-point SVE instruction that implements an rtx code.
+(define_code_attr sve_fp_op [(plus "fadd")
+                            (neg "fneg")
+                            (abs "fabs")
+                            (sqrt "fsqrt")])
+
  ;; -------------------------------------------------------------------
  ;; Int Iterators.
  ;; -------------------------------------------------------------------
@@ -1086,6 +1253,8 @@
  (define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV
                                UNSPEC_FMAXNMV UNSPEC_FMINNMV])
  
+(define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF])
+
  (define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD
                               UNSPEC_SRHADD UNSPEC_URHADD
                               UNSPEC_SHSUB UNSPEC_UHSUB
@@ -1141,6 +1310,9 @@
                               UNSPEC_TRN1 UNSPEC_TRN2
                               UNSPEC_UZP1 UNSPEC_UZP2])
  
+(define_int_iterator OPTAB_PERMUTE [UNSPEC_ZIP1 UNSPEC_ZIP2
+                                   UNSPEC_UZP1 UNSPEC_UZP2])
+
  (define_int_iterator REVERSE [UNSPEC_REV64 UNSPEC_REV32 UNSPEC_REV16])
  
  (define_int_iterator FRINT [UNSPEC_FRINTZ UNSPEC_FRINTP UNSPEC_FRINTM
@@ -1179,6 +1351,21 @@
  
  (define_int_iterator VFMLA16_HIGH [UNSPEC_FMLAL2 UNSPEC_FMLSL2])
  
+(define_int_iterator UNPACK [UNSPEC_UNPACKSHI UNSPEC_UNPACKUHI
+                            UNSPEC_UNPACKSLO UNSPEC_UNPACKULO])
+
+(define_int_iterator UNPACK_UNSIGNED [UNSPEC_UNPACKULO UNSPEC_UNPACKUHI])
+
+(define_int_iterator SVE_COND_INT_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
+                                      UNSPEC_COND_EQ UNSPEC_COND_NE
+                                      UNSPEC_COND_GE UNSPEC_COND_GT
+                                      UNSPEC_COND_LO UNSPEC_COND_LS
+                                      UNSPEC_COND_HS UNSPEC_COND_HI])
+
+(define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
+                                     UNSPEC_COND_EQ UNSPEC_COND_NE
+                                     UNSPEC_COND_GE UNSPEC_COND_GT])
+
  ;; Iterators for atomic operations.
  
  (define_int_iterator ATOMIC_LDOP
@@ -1192,6 +1379,14 @@
  ;; -------------------------------------------------------------------
  ;; Int Iterators Attributes.
  ;; -------------------------------------------------------------------
+
+;; The optab associated with an operation.  Note that for ANDF, IORF
+;; and XORF, the optab pattern is not actually defined; we just use this
+;; name for consistency with the integer patterns.
+(define_int_attr optab [(UNSPEC_ANDF "and")
+                       (UNSPEC_IORF "ior")
+                       (UNSPEC_XORF "xor")])
+
  (define_int_attr  maxmin_uns [(UNSPEC_UMAXV "umax")
                               (UNSPEC_UMINV "umin")
                               (UNSPEC_SMAXV "smax")
@@ -1218,6 +1413,17 @@
                                  (UNSPEC_FMAXNM "fmaxnm")
                                  (UNSPEC_FMINNM "fminnm")])
  
+;; The SVE logical instruction that implements an unspec.
+(define_int_attr logicalf_op [(UNSPEC_ANDF "and")
+                             (UNSPEC_IORF "orr")
+                             (UNSPEC_XORF "eor")])
+
+;; "s" for signed operations and "u" for unsigned ones.
+(define_int_attr su [(UNSPEC_UNPACKSHI "s")
+                    (UNSPEC_UNPACKUHI "u")
+                    (UNSPEC_UNPACKSLO "s")
+                    (UNSPEC_UNPACKULO "u")])
+
  (define_int_attr sur [(UNSPEC_SHADD "s") (UNSPEC_UHADD "u")
                       (UNSPEC_SRHADD "sr") (UNSPEC_URHADD "ur")
                       (UNSPEC_SHSUB "s") (UNSPEC_UHSUB "u")
@@ -1328,7 +1534,9 @@
  
  (define_int_attr perm_hilo [(UNSPEC_ZIP1 "1") (UNSPEC_ZIP2 "2")
                             (UNSPEC_TRN1 "1") (UNSPEC_TRN2 "2")
-                           (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")])
+                           (UNSPEC_UZP1 "1") (UNSPEC_UZP2 "2")
+                           (UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")
+                           (UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")])
  
  (define_int_attr frecp_suffix  [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")])
  
@@ -1361,3 +1569,27 @@
  
  (define_int_attr f16mac1 [(UNSPEC_FMLAL "a") (UNSPEC_FMLSL "s")
                           (UNSPEC_FMLAL2 "a") (UNSPEC_FMLSL2 "s")])
+
+;; The condition associated with an UNSPEC_COND_<xx>.
+(define_int_attr cmp_op [(UNSPEC_COND_LT "lt")
+                        (UNSPEC_COND_LE "le")
+                        (UNSPEC_COND_EQ "eq")
+                        (UNSPEC_COND_NE "ne")
+                        (UNSPEC_COND_GE "ge")
+                        (UNSPEC_COND_GT "gt")
+                        (UNSPEC_COND_LO "lo")
+                        (UNSPEC_COND_LS "ls")
+                        (UNSPEC_COND_HS "hs")
+                        (UNSPEC_COND_HI "hi")])
+
+;; The constraint to use for an UNSPEC_COND_<xx>.
+(define_int_attr imm_con [(UNSPEC_COND_EQ "vsc")
+                         (UNSPEC_COND_NE "vsc")
+                         (UNSPEC_COND_LT "vsc")
+                         (UNSPEC_COND_GE "vsc")
+                         (UNSPEC_COND_LE "vsc")
+                         (UNSPEC_COND_GT "vsc")
+                         (UNSPEC_COND_LO "vsd")
+                         (UNSPEC_COND_LS "vsd")
+                         (UNSPEC_COND_HS "vsd")
+                         (UNSPEC_COND_HI "vsd")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 65b2df6ed1ab34097b27345d972e7e4ce93da889..7424f506a5c6a289d3afc4472670e57d4029c8f3 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -93,6 +93,10 @@
  (define_predicate "aarch64_fp_vec_pow2"
    (match_test "aarch64_vec_fpconst_pow_of_2 (op) > 0"))
  
+(define_predicate "aarch64_sve_cnt_immediate"
+  (and (match_code "const_poly_int")
+       (match_test "aarch64_sve_cnt_immediate_p (op)")))
+
  (define_predicate "aarch64_sub_immediate"
    (and (match_code "const_int")
         (match_test "aarch64_uimm12_shift (-INTVAL (op))")))
@@ -114,9 +118,22 @@
    (and (match_operand 0 "aarch64_pluslong_immediate")
         (not (match_operand 0 "aarch64_plus_immediate"))))
  
+(define_predicate "aarch64_sve_addvl_addpl_immediate"
+  (and (match_code "const_poly_int")
+       (match_test "aarch64_sve_addvl_addpl_immediate_p (op)")))
+
+(define_predicate "aarch64_split_add_offset_immediate"
+  (and (match_code "const_poly_int")
+       (match_test "aarch64_add_offset_temporaries (op) == 1")))
+
  (define_predicate "aarch64_pluslong_operand"
    (ior (match_operand 0 "register_operand")
-       (match_operand 0 "aarch64_pluslong_immediate")))
+       (match_operand 0 "aarch64_pluslong_immediate")
+       (match_operand 0 "aarch64_sve_addvl_addpl_immediate")))
+
+(define_predicate "aarch64_pluslong_or_poly_operand"
+  (ior (match_operand 0 "aarch64_pluslong_operand")
+       (match_operand 0 "aarch64_split_add_offset_immediate")))
  
  (define_predicate "aarch64_logical_immediate"
    (and (match_code "const_int")
@@ -263,11 +280,18 @@
  })
  
  (define_predicate "aarch64_mov_operand"
-  (and (match_code "reg,subreg,mem,const,const_int,symbol_ref,label_ref,high")
+  (and (match_code "reg,subreg,mem,const,const_int,symbol_ref,label_ref,high,
+                   const_poly_int,const_vector")
         (ior (match_operand 0 "register_operand")
             (ior (match_operand 0 "memory_operand")
                  (match_test "aarch64_mov_operand_p (op, mode)")))))
  
+(define_predicate "aarch64_nonmemory_operand"
+  (and (match_code "reg,subreg,const,const_int,symbol_ref,label_ref,high,
+                   const_poly_int,const_vector")
+       (ior (match_operand 0 "register_operand")
+           (match_test "aarch64_mov_operand_p (op, mode)"))))
+
  (define_predicate "aarch64_movti_operand"
    (and (match_code "reg,subreg,mem,const_int")
         (ior (match_operand 0 "register_operand")
@@ -303,6 +327,9 @@
    return aarch64_get_condition_code (op) >= 0;
  })
  
+(define_special_predicate "aarch64_equality_operator"
+  (match_code "eq,ne"))
+
  (define_special_predicate "aarch64_carry_operation"
    (match_code "ne,geu")
  {
@@ -342,22 +369,34 @@
  })
  
  (define_special_predicate "aarch64_simd_lshift_imm"
-  (match_code "const_vector")
+  (match_code "const,const_vector")
  {
    return aarch64_simd_shift_imm_p (op, mode, true);
  })
  
  (define_special_predicate "aarch64_simd_rshift_imm"
-  (match_code "const_vector")
+  (match_code "const,const_vector")
  {
    return aarch64_simd_shift_imm_p (op, mode, false);
  })
  
+(define_predicate "aarch64_simd_imm_zero"
+  (and (match_code "const,const_vector")
+       (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "aarch64_simd_or_scalar_imm_zero"
+  (and (match_code "const_int,const_double,const,const_vector")
+       (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "aarch64_simd_imm_minus_one"
+  (and (match_code "const,const_vector")
+       (match_test "op == CONSTM1_RTX (GET_MODE (op))")))
+
  (define_predicate "aarch64_simd_reg_or_zero"
-  (and (match_code "reg,subreg,const_int,const_double,const_vector")
+  (and (match_code "reg,subreg,const_int,const_double,const,const_vector")
         (ior (match_operand 0 "register_operand")
-           (ior (match_test "op == const0_rtx")
-                (match_test "aarch64_simd_imm_zero_p (op, mode)")))))
+           (match_test "op == const0_rtx")
+           (match_operand 0 "aarch64_simd_imm_zero"))))
  
  (define_predicate "aarch64_simd_struct_operand"
    (and (match_code "mem")
@@ -377,21 +416,6 @@
                     || GET_CODE (XEXP (op, 0)) == POST_INC
                     || GET_CODE (XEXP (op, 0)) == REG")))
  
-(define_special_predicate "aarch64_simd_imm_zero"
-  (match_code "const_vector")
-{
-  return aarch64_simd_imm_zero_p (op, mode);
-})
-
-(define_special_predicate "aarch64_simd_or_scalar_imm_zero"
-  (match_test "aarch64_simd_imm_zero_p (op, mode)"))
-
-(define_special_predicate "aarch64_simd_imm_minus_one"
-  (match_code "const_vector")
-{
-  return aarch64_const_vec_all_same_int_p (op, -1);
-})
-
  ;; Predicates used by the various SIMD shift operations.  These
  ;; fall in to 3 categories.
  ;;   Shifts with a range 0-(bit_size - 1) (aarch64_simd_shift_imm)
@@ -448,3 +472,133 @@
  (define_predicate "aarch64_constant_pool_symref"
     (and (match_code "symbol_ref")
         (match_test "CONSTANT_POOL_ADDRESS_P (op)")))
+
+(define_predicate "aarch64_constant_vector_operand"
+  (match_code "const,const_vector"))
+
+(define_predicate "aarch64_sve_ld1r_operand"
+  (and (match_operand 0 "memory_operand")
+       (match_test "aarch64_sve_ld1r_operand_p (op)")))
+
+;; Like memory_operand, but restricted to addresses that are valid for
+;; SVE LDR and STR instructions.
+(define_predicate "aarch64_sve_ldr_operand"
+  (and (match_code "mem")
+       (match_test "aarch64_sve_ldr_operand_p (op)")))
+
+(define_predicate "aarch64_sve_nonimmediate_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_ldr_operand")))
+
+(define_predicate "aarch64_sve_general_operand"
+  (and (match_code "reg,subreg,mem,const,const_vector")
+       (ior (match_operand 0 "register_operand")
+           (match_operand 0 "aarch64_sve_ldr_operand")
+           (match_test "aarch64_mov_operand_p (op, mode)"))))
+
+;; Doesn't include immediates, since those are handled by the move
+;; patterns instead.
+(define_predicate "aarch64_sve_dup_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_ld1r_operand")))
+
+(define_predicate "aarch64_sve_arith_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_arith_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_sub_arith_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_arith_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_inc_dec_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_inc_dec_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_logical_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_bitmask_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_mul_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_const_vec_all_same_in_range_p (op, -128, 127)")))
+
+(define_predicate "aarch64_sve_dup_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_dup_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_cmp_vsc_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_cmp_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_cmp_vsd_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_cmp_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_index_immediate"
+  (and (match_code "const_int")
+       (match_test "aarch64_sve_index_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_float_arith_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_float_arith_immediate_p (op, false)")))
+
+(define_predicate "aarch64_sve_float_arith_with_sub_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_float_arith_immediate_p (op, true)")))
+
+(define_predicate "aarch64_sve_float_mul_immediate"
+  (and (match_code "const,const_vector")
+       (match_test "aarch64_sve_float_mul_immediate_p (op)")))
+
+(define_predicate "aarch64_sve_arith_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_arith_immediate")))
+
+(define_predicate "aarch64_sve_add_operand"
+  (ior (match_operand 0 "aarch64_sve_arith_operand")
+       (match_operand 0 "aarch64_sve_sub_arith_immediate")
+       (match_operand 0 "aarch64_sve_inc_dec_immediate")))
+
+(define_predicate "aarch64_sve_logical_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_logical_immediate")))
+
+(define_predicate "aarch64_sve_lshift_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_simd_lshift_imm")))
+
+(define_predicate "aarch64_sve_rshift_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_simd_rshift_imm")))
+
+(define_predicate "aarch64_sve_mul_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_mul_immediate")))
+
+(define_predicate "aarch64_sve_cmp_vsc_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_cmp_vsc_immediate")))
+
+(define_predicate "aarch64_sve_cmp_vsd_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_cmp_vsd_immediate")))
+
+(define_predicate "aarch64_sve_index_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_index_immediate")))
+
+(define_predicate "aarch64_sve_float_arith_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_float_arith_immediate")))
+
+(define_predicate "aarch64_sve_float_arith_with_sub_operand"
+  (ior (match_operand 0 "aarch64_sve_float_arith_operand")
+       (match_operand 0 "aarch64_sve_float_arith_with_sub_immediate")))
+
+(define_predicate "aarch64_sve_float_mul_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_sve_float_mul_immediate")))
+
+(define_predicate "aarch64_sve_vec_perm_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "aarch64_constant_vector_operand")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 89a4727ecdff559f17578ea485e89f08a096a8bd..28c61a078d2a9b96af1e7f194dcaeda916da25ef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14594,6 +14594,23 @@ Permissible values are @samp{none}, which disables return address signing,
  functions, and @samp{all}, which enables pointer signing for all functions.  The
  default value is @samp{none}.
  
+@item -msve-vector-bits=@var{bits}
+@opindex msve-vector-bits
+Specify the number of bits in an SVE vector register.  This option only has
+an effect when SVE is enabled.
+
+GCC supports two forms of SVE code generation: ``vector-length
+agnostic'' output that works with any size of vector register and
+``vector-length specific'' output that only works when the vector
+registers are a particular size.  Replacing @var{bits} with
+@samp{scalable} selects vector-length agnostic output while
+replacing it with a number selects vector-length specific output.
+The possible lengths in the latter case are: 128, 256, 512, 1024
+and 2048.  @samp{scalable} is the default.
+
+At present, @samp{-msve-vector-bits=128} produces the same output
+as @samp{-msve-vector-bits=scalable}.
+
  @end table
  
  @subsubsection @option{-march} and @option{-mcpu} Feature Modifiers
@@ -14617,6 +14634,9 @@ values for options @option{-march} and @option{-mcpu}.
  Enable Advanced SIMD instructions.  This also enables floating-point
  instructions.  This is on by default for all possible values for options
  @option{-march} and @option{-mcpu}.
+@item sve
+Enable Scalable Vector Extension instructions.  This also enables Advanced
+SIMD and floating-point instructions.
  @item lse
  Enable Large System Extension instructions.  This is on by default for
  @option{-march=armv8.1-a}.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 497df1bb50154611df3daddaca3316d4a630837b..e956c751b573a8732ba7199bf4ff371e68657aa3 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1735,7 +1735,13 @@ the meanings of that architecture's constraints.
  The stack pointer register (@code{SP})
  
  @item w
-Floating point or SIMD vector register
+Floating point register, Advanced SIMD vector register or SVE vector register
+
+@item Upl
+One of the low eight SVE predicate registers (@code{P0} to @code{P7})
+
+@item Upa
+Any of the SVE predicate registers (@code{P0} to @code{P15})
  
  @item I
  Integer constant that is valid as an immediate operand in an @code{ADD}
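
Note on the invoke.texi hunk above: the new -msve-vector-bits= text
distinguishes vector-length agnostic output (scalable) from vector-length
specific output (128, 256, 512, 1024 or 2048). A minimal sketch of a loop
one might use to compare the two modes follows. The function name and the
command lines in the comment are illustrative assumptions, not part of the
patch, and whether the loop is actually vectorized depends on the compiler
build and optimization level.

```c
#include <stddef.h>

/* Hypothetical kernel, not taken from the patch: dst[i] += scale * src[i].
 *
 * Assumed command lines for comparing the generated assembly, given an
 * AArch64 GCC that includes this patch:
 *
 *   gcc -O3 -march=armv8.2-a+sve -msve-vector-bits=scalable -S kernel.c
 *   gcc -O3 -march=armv8.2-a+sve -msve-vector-bits=256 -S kernel.c
 *
 * The first requests code that works with any SVE register size; the
 * second requests code that is only valid on 256-bit implementations.
 */
void scaled_add(float *restrict dst, const float *restrict src,
                float scale, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] += scale * src[i];
}
```

The source is plain C with no SVE-specific constructs, so it also builds
and runs unchanged on non-SVE targets; only the compiler options differ.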
gcc/ChangeLog
gcc/config/aarch64/aarch64-c.c
gcc/config/aarch64/aarch64-modes.def
gcc/config/aarch64/aarch64-option-extensions.def
gcc/config/aarch64/aarch64-opts.h
gcc/config/aarch64/aarch64-protos.h
gcc/config/aarch64/aarch64-sve.md [new file with mode: 0644]
gcc/config/aarch64/aarch64.c
gcc/config/aarch64/aarch64.h
gcc/config/aarch64/aarch64.md
gcc/config/aarch64/aarch64.opt
gcc/config/aarch64/constraints.md
gcc/config/aarch64/iterators.md
gcc/config/aarch64/predicates.md
gcc/doc/invoke.texi
gcc/doc/md.texi