+2019-10-29 Richard Sandiford <richard.sandiford@arm.com>
+
+ * calls.c (pass_by_reference): Leave the target to decide whether
+ POLY_INT_CST-sized arguments should be passed by value or reference,
+ rather than forcing them to be passed by reference.
+ (must_pass_in_stack_var_size): Likewise.
+ * config/aarch64/aarch64.md (LAST_SAVED_REGNUM): Redefine from
+ V31_REGNUM to P15_REGNUM.
+ * config/aarch64/aarch64-protos.h (aarch64_init_cumulative_args):
+ Take an extra "silent_p" parameter, defaulting to false.
+ (aarch64_sve::svbool_type_p): Declare.
+ (aarch64_sve::nvectors_if_data_type): Likewise.
+ * config/aarch64/aarch64.h (NUM_PR_ARG_REGS): New macro.
+ (aarch64_frame::reg_offset): Turn into poly_int64s.
+ (aarch64_frame::saved_regs_size): Likewise.
+ (aarch64_frame::below_hard_fp_saved_regs_size): New field.
+ (aarch64_frame::sve_callee_adjust): Likewise.
+ (aarch64_frame::spare_pred_reg): Likewise.
+ (ARM_PCS_SVE): New arm_pcs value.
+ (CUMULATIVE_ARGS::aapcs_nprn): New field.
+ (CUMULATIVE_ARGS::aapcs_nextnprn): Likewise.
+ (CUMULATIVE_ARGS::silent_p): Likewise.
+ (BITS_PER_SVE_PRED): New macro.
+ * config/aarch64/aarch64.c (handle_aarch64_vector_pcs_attribute): New
+ function. Reject aarch64_vector_pcs attributes on SVE functions.
+ (aarch64_attribute_table): Use the above handler.
+ (aarch64_sve_abi): New function.
+ (aarch64_sve_argument_p): Likewise.
+ (aarch64_returns_value_in_sve_regs_p): Likewise.
+ (aarch64_takes_arguments_in_sve_regs_p): Likewise.
+ (aarch64_fntype_abi): Check for SVE functions and return the SVE PCS
+ descriptor for them.
+ (aarch64_simd_decl_p): Delete.
+ (aarch64_emit_cfi_for_reg_p): New function.
+ (aarch64_reg_save_mode): Remove the fndecl argument and instead use
+ crtl->abi to choose the mode for FP registers. Handle the SVE PCS.
+ (aarch64_hard_regno_call_part_clobbered): Do not treat FP registers
+ as partly clobbered for the SVE PCS.
+ (aarch64_function_ok_for_sibcall): Check whether the two functions
+ use the same ABI, rather than checking specifically for whether
+ they're aarch64_vector_pcs functions.
+ (aarch64_pass_by_reference): Raise an error for attempts to pass
+ SVE arguments when SVE is disabled. Pass SVE arguments by reference
+ if there are not enough free registers left, or if the argument is
+ variadic.
+ (aarch64_function_value): Handle SVE predicates, vectors and tuples.
+ (aarch64_return_in_memory): Do not return SVE predicates, vectors and
+ tuples in memory.
+ (aarch64_layout_arg): Take a function_arg_info rather than
+ individual properties.  Handle SVE predicates, vectors and tuples.
+ Raise an error if they are passed to unprototyped functions.
+ If the silent_p flag is set, suppress the usual error about
+ using float registers without TARGET_FLOAT.
+ (aarch64_function_arg): Accept the ARM_PCS_SVE variant and update
+ the call to aarch64_layout_arg.
+ (aarch64_init_cumulative_args): Take a silent_p parameter and store
+ it in the cumulative_args structure. Initialize aapcs_nprn and
+ aapcs_nextnprn. If the silent_p flag is set, suppress the usual
+ error about using float registers without TARGET_FLOAT.
+ If the silent_p flag is not set, also raise an error about
+ using SVE functions when SVE is disabled.
+ (aarch64_function_arg_advance): Update the call to aarch64_layout_arg,
+ and call it for SVE functions too. Update aapcs_nprn similarly
+ to the other register counts.
+ (aarch64_layout_frame): If a big-endian function needs to save
+ and restore Z8-Z15, search for a spare predicate that it can use.
+ Store SVE predicates at the bottom of the register save area,
+ followed by SVE vectors, then followed by the normal slots.
+ Keep pointing the hard frame pointer at the base of the normal slots,
+ above the SVE vectors. Update the various frame creation and
+ tear-down strategies for the new layout, initializing the new
+ sve_callee_adjust field. Add an additional layout for frames
+ whose saved registers are all SVE registers.
+ (aarch64_register_saved_on_entry): Cope with poly_int64 reg_offsets.
+ (aarch64_return_address_signing_enabled): Likewise.
+ (aarch64_push_regs, aarch64_pop_regs): Update calls to
+ aarch64_reg_save_mode.
+ (aarch64_adjust_sve_callee_save_base): New function.
+ (aarch64_add_cfa_expression): Move earlier in file. Take the
+ saved register as an rtx rather than a register number and use
+ its mode for the MEM slot.
+ (aarch64_save_callee_saves): Remove the mode argument and instead
+ use aarch64_reg_save_mode to get the mode of each save slot.
+ Add a hard_fp_valid_p parameter. Cope with poly_int64 register
+ offsets. Allow GP registers to be saved at a VL-based offset from
+ the stack, handling this case using the frame pointer if available
+ or a temporary register otherwise. Use ST1D to save Z8-Z15 for
+ big-endian SVE functions; use normal moves for other SVE saves.
+ Only mark the save as frame-related if aarch64_emit_cfi_for_reg_p
+ returns true. Add explicit CFA notes when not storing via the
+ stack pointer. Do not try to pair SVE saves.
+ (aarch64_restore_callee_saves): Cope with poly_int64 register
+ offsets. Use LD1D to restore Z8-Z15 for big-endian SVE functions;
+ use normal moves for other SVE restores. Only add CFA restore notes
+ if aarch64_emit_cfi_for_reg_p returns true. Do not try to pair
+ SVE restores.
+ (aarch64_get_separate_components): Always keep the first SVE save
+ in the prologue if we need to use it as a stack probe. Don't allow
+ Z8-Z15 saves and loads to be shrink-wrapped for big-endian targets.
+ Likewise the spare predicate register that they need. Update the
+ offset calculation to account for the SVE save area. Use the
+ appropriate range check for SVE LDR and STR instructions.
+ (aarch64_components_for_bb): Cope with poly_int64 reg_offsets.
+ (aarch64_process_components): Likewise. Update the offset
+ calculation to account for the SVE save area. Only mark the
+ save as frame-related if aarch64_emit_cfi_for_reg_p returns true.
+ Do not try to pair SVE saves.
+ (aarch64_allocate_and_probe_stack_space): Cope with poly_int64
+ reg_offsets. When handling the final allocation, expect the
+ first SVE register save to be part of the initial allocation
+ and for it to act as a probe at SP. Account for the SVE callee
+ save area in the dump information.
+ (aarch64_expand_prologue): Update the frame diagram. Fold the
+ SVE callee allocation into the initial allocation if stack clash
+ protection is enabled. Use new variables to track the offset
+ of the frame chain (and hard frame pointer) from the current
+ stack pointer, and likewise the offset of the bottom of the
+ register save area. Update calls to aarch64_save_callee_saves
+ and aarch64_add_cfa_expression. Apply sve_callee_adjust before
+ saving the FP&SIMD registers. Save the predicate registers.
+ (aarch64_expand_epilogue): Take below_hard_fp_saved_regs_size
+ into account when setting the stack pointer from the frame pointer,
+ and when deciding whether we can inherit the initial adjustment
+ amount from the prologue. Restore the predicate registers after
+ the vector registers, then apply sve_callee_adjust, then restore
+ the general registers.
+ (aarch64_secondary_reload): Don't use secondary SVE reloads
+ for VNx16BImode.
+ (aapcs_vfp_sub_candidate): Assert that the type is not an SVE type.
+ (aarch64_short_vector_p): Return false for SVE types.
+ (aarch64_vfp_is_call_or_return_candidate): Initialize *is_ha
+ at the start of the function. Return false for SVE types.
+ (aarch64_asm_output_variant_pcs): Output .variant_pcs for SVE
+ functions too.
+ (TARGET_STRICT_ARGUMENT_NAMING): Redefine to request strict naming.
+ * config/aarch64/aarch64-sve.md (*aarch64_sve_mov<mode>_le): Extend
+ to big-endian targets for bytewise moves.
+ (*aarch64_sve_mov<mode>_be): Exclude the bytewise case.
+
2019-10-29 Richard Sandiford <richard.sandiford@arm.com>
Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
return true;
/* GCC post 3.4 passes *all* variable sized types by reference. */
- if (!TYPE_SIZE (type) || TREE_CODE (TYPE_SIZE (type)) != INTEGER_CST)
+ if (!TYPE_SIZE (type) || !poly_int_tree_p (TYPE_SIZE (type)))
return true;
/* If a record type should be passed the same as its first (and only)
return false;
/* If the type has variable size... */
- if (TREE_CODE (TYPE_SIZE (arg.type)) != INTEGER_CST)
+ if (!poly_int_tree_p (TYPE_SIZE (arg.type)))
return true;
/* If the type is marked as addressable (it is required
void aarch64_expand_vector_init (rtx, rtx);
void aarch64_sve_expand_vector_init (rtx, rtx);
void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
- const_tree, unsigned);
+ const_tree, unsigned, bool = false);
void aarch64_init_expanders (void);
void aarch64_init_simd_builtins (void);
void aarch64_emit_call_insn (rtx);
void handle_arm_sve_h ();
tree builtin_decl (unsigned, bool);
bool builtin_type_p (const_tree);
+ bool svbool_type_p (const_tree);
+ unsigned int nvectors_if_data_type (const_tree);
const char *mangle_builtin_type (const_tree);
tree resolve_overloaded_builtin (location_t, unsigned int,
vec<tree, va_gc> *);
}
)
-;; Unpredicated moves (little-endian). Only allow memory operations
-;; during and after RA; before RA we want the predicated load and
-;; store patterns to be used instead.
+;; Unpredicated moves (bytes or little-endian). Only allow memory operations
+;; during and after RA; before RA we want the predicated load and store
+;; patterns to be used instead.
(define_insn "*aarch64_sve_mov<mode>_le"
[(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w, Utr, w, w")
(match_operand:SVE_ALL 1 "aarch64_sve_general_operand" "Utr, w, w, Dn"))]
"TARGET_SVE
- && !BYTES_BIG_ENDIAN
+ && (<MODE>mode == VNx16QImode || !BYTES_BIG_ENDIAN)
&& ((lra_in_progress || reload_completed)
|| (register_operand (operands[0], <MODE>mode)
&& nonmemory_operand (operands[1], <MODE>mode)))"
* return aarch64_output_sve_mov_immediate (operands[1]);"
)
-;; Unpredicated moves (big-endian). Memory accesses require secondary
+;; Unpredicated moves (non-byte big-endian). Memory accesses require secondary
;; reloads.
(define_insn "*aarch64_sve_mov<mode>_be"
[(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
(match_operand:SVE_ALL 1 "aarch64_nonmemory_operand" "w, Dn"))]
- "TARGET_SVE && BYTES_BIG_ENDIAN"
+ "TARGET_SVE && BYTES_BIG_ENDIAN && <MODE>mode != VNx16QImode"
"@
mov\t%0.d, %1.d
* return aarch64_output_sve_mov_immediate (operands[1]);"
/* The current tuning set. */
struct tune_params aarch64_tune_params = generic_tunings;
+/* Check whether an 'aarch64_vector_pcs' attribute is valid. */
+
+static tree
+handle_aarch64_vector_pcs_attribute (tree *node, tree name, tree,
+ int, bool *no_add_attrs)
+{
+ /* Since we set fn_type_req to true, the caller should have checked
+ this for us. */
+ gcc_assert (FUNC_OR_METHOD_TYPE_P (*node));
+ switch ((arm_pcs) fntype_abi (*node).id ())
+ {
+ case ARM_PCS_AAPCS64:
+ case ARM_PCS_SIMD:
+ return NULL_TREE;
+
+ case ARM_PCS_SVE:
+ error ("the %qE attribute cannot be applied to an SVE function type",
+ name);
+ *no_add_attrs = true;
+ return NULL_TREE;
+
+ case ARM_PCS_TLSDESC:
+ case ARM_PCS_UNKNOWN:
+ break;
+ }
+ gcc_unreachable ();
+}
+
/* Table of machine attributes. */
static const struct attribute_spec aarch64_attribute_table[] =
{
/* { name, min_len, max_len, decl_req, type_req, fn_type_req,
affects_type_identity, handler, exclude } */
- { "aarch64_vector_pcs", 0, 0, false, true, true, true, NULL, NULL },
+ { "aarch64_vector_pcs", 0, 0, false, true, true, true,
+ handle_aarch64_vector_pcs_attribute, NULL },
{ NULL, 0, 0, false, false, false, false, NULL, NULL }
};
return simd_abi;
}
+/* Return the descriptor of the SVE PCS. */
+
+static const predefined_function_abi &
+aarch64_sve_abi (void)
+{
+ predefined_function_abi &sve_abi = function_abis[ARM_PCS_SVE];
+ if (!sve_abi.initialized_p ())
+ {
+ HARD_REG_SET full_reg_clobbers
+ = default_function_abi.full_reg_clobbers ();
+ for (int regno = V8_REGNUM; regno <= V23_REGNUM; ++regno)
+ CLEAR_HARD_REG_BIT (full_reg_clobbers, regno);
+ for (int regno = P4_REGNUM; regno <= P11_REGNUM; ++regno)
+ CLEAR_HARD_REG_BIT (full_reg_clobbers, regno);
+ sve_abi.initialize (ARM_PCS_SVE, full_reg_clobbers);
+ }
+ return sve_abi;
+}
+
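
For illustration, a sketch of what this clobber set buys callers (function names are illustrative; the types and intrinsics are the standard ACLE ones from <arm_sve.h>): relative to the base ABI, a call to an SVE-PCS function is treated as preserving V8-V23 and P4-P11 in full, so SVE values can stay live in those registers across the call.

#include <arm_sve.h>

extern svfloat64_t sve_fn (svfloat64_t x);	/* uses the SVE PCS */

svfloat64_t
caller (svfloat64_t acc, svfloat64_t x)
{
  /* Z8-Z23 are call-preserved under the SVE PCS, so ACC can be kept
     in one of them across the call instead of being spilled.  */
  svfloat64_t tmp = sve_fn (x);
  return svadd_f64_x (svptrue_b64 (), acc, tmp);
}
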
/* Generate code to enable conditional branches in functions over 1 MiB. */
const char *
aarch64_gen_far_branch (rtx * operands, int pos_label, const char * dest,
return false;
}
+/* Return true if TYPE is a type that should be passed or returned in
+ SVE registers, assuming enough registers are available. When returning
+ true, set *NUM_ZR and *NUM_PR to the number of required Z and P registers
+ respectively. */
+
+static bool
+aarch64_sve_argument_p (const_tree type, unsigned int *num_zr,
+ unsigned int *num_pr)
+{
+ if (aarch64_sve::svbool_type_p (type))
+ {
+ *num_pr = 1;
+ *num_zr = 0;
+ return true;
+ }
+
+ if (unsigned int nvectors = aarch64_sve::nvectors_if_data_type (type))
+ {
+ *num_pr = 0;
+ *num_zr = nvectors;
+ return true;
+ }
+
+ return false;
+}
+
+/* Return true if a function with type FNTYPE returns its value in
+ SVE vector or predicate registers. */
+
+static bool
+aarch64_returns_value_in_sve_regs_p (const_tree fntype)
+{
+ unsigned int num_zr, num_pr;
+ tree return_type = TREE_TYPE (fntype);
+ return (return_type != error_mark_node
+ && aarch64_sve_argument_p (return_type, &num_zr, &num_pr));
+}
+
+/* Return true if a function with type FNTYPE takes arguments in
+ SVE vector or predicate registers. */
+
+static bool
+aarch64_takes_arguments_in_sve_regs_p (const_tree fntype)
+{
+ CUMULATIVE_ARGS args_so_far_v;
+ aarch64_init_cumulative_args (&args_so_far_v, NULL_TREE, NULL_RTX,
+ NULL_TREE, 0, true);
+ cumulative_args_t args_so_far = pack_cumulative_args (&args_so_far_v);
+
+ for (tree chain = TYPE_ARG_TYPES (fntype);
+ chain && chain != void_list_node;
+ chain = TREE_CHAIN (chain))
+ {
+ tree arg_type = TREE_VALUE (chain);
+ if (arg_type == error_mark_node)
+ return false;
+
+ function_arg_info arg (arg_type, /*named=*/true);
+ apply_pass_by_reference_rules (&args_so_far_v, arg);
+ unsigned int num_zr, num_pr;
+ if (aarch64_sve_argument_p (arg.type, &num_zr, &num_pr))
+ return true;
+
+ targetm.calls.function_arg_advance (args_so_far, arg);
+ }
+ return false;
+}
+
/* Implement TARGET_FNTYPE_ABI. */
static const predefined_function_abi &
{
if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
return aarch64_simd_abi ();
+
+ if (aarch64_returns_value_in_sve_regs_p (fntype)
+ || aarch64_takes_arguments_in_sve_regs_p (fntype))
+ return aarch64_sve_abi ();
+
return default_function_abi;
}
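
A sketch of the resulting classification (illustrative declarations, ACLE types from <arm_sve.h>):

#include <arm_sve.h>

/* Takes and returns SVE data by value, so fntype_abi selects the
   ARM_PCS_SVE descriptor.  */
svint32_t sve_pcs_fn (svbool_t pg, svint32_t x);

/* Only sees SVE data through a pointer, so it keeps the base
   ARM_PCS_AAPCS64 ABI (or ARM_PCS_SIMD if it carries the
   aarch64_vector_pcs attribute).  */
void base_pcs_fn (svint32_t *ptr);
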
-/* Return true if this is a definition of a vectorized simd function. */
+/* Return true if we should emit CFI for register REGNO. */
static bool
-aarch64_simd_decl_p (tree fndecl)
+aarch64_emit_cfi_for_reg_p (unsigned int regno)
{
- tree fntype;
-
- if (fndecl == NULL)
- return false;
- fntype = TREE_TYPE (fndecl);
- if (fntype == NULL)
- return false;
-
- /* Functions with the aarch64_vector_pcs attribute use the simd ABI. */
- if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)) != NULL)
- return true;
-
- return false;
+ return (GP_REGNUM_P (regno)
+ || !default_function_abi.clobbers_full_reg_p (regno));
}
-/* Return the mode a register save/restore should use. DImode for integer
- registers, DFmode for FP registers in non-SIMD functions (they only save
- the bottom half of a 128 bit register), or TFmode for FP registers in
- SIMD functions. */
+/* Return the mode we should use to save and restore register REGNO. */
static machine_mode
-aarch64_reg_save_mode (tree fndecl, unsigned regno)
+aarch64_reg_save_mode (unsigned int regno)
{
- return GP_REGNUM_P (regno)
- ? E_DImode
- : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
+ if (GP_REGNUM_P (regno))
+ return DImode;
+
+ if (FP_REGNUM_P (regno))
+ switch (crtl->abi->id ())
+ {
+ case ARM_PCS_AAPCS64:
+ /* Only the low 64 bits are saved by the base PCS. */
+ return DFmode;
+
+ case ARM_PCS_SIMD:
+ /* The vector PCS saves the low 128 bits (which is the full
+ register on non-SVE targets). */
+ return TFmode;
+
+ case ARM_PCS_SVE:
+ /* Use vectors of DImode for registers that need frame
+ information, so that the first 64 bits of the save slot
+ are always the equivalent of what storing D<n> would give. */
+ if (aarch64_emit_cfi_for_reg_p (regno))
+ return VNx2DImode;
+
+ /* Use vectors of bytes otherwise, so that the layout is
+ endian-agnostic, and so that we can use LDR and STR for
+ big-endian targets. */
+ return VNx16QImode;
+
+ case ARM_PCS_TLSDESC:
+ case ARM_PCS_UNKNOWN:
+ break;
+ }
+
+ if (PR_REGNUM_P (regno))
+ /* Save the full predicate register. */
+ return VNx16BImode;
+
+ gcc_unreachable ();
}
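
For example, in an ARM_PCS_SVE function the saved FP registers split into two groups: V8-V15 need frame information (the base ABI preserves their low 64 bits, so aarch64_emit_cfi_for_reg_p returns true) and are saved in VNx2DImode, while V16-V23 carry no CFI and are saved in VNx16QImode; saved predicate registers always use the full VNx16BImode.
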
/* Implement TARGET_INSN_CALLEE_ABI. */
unsigned int regno,
machine_mode mode)
{
- if (FP_REGNUM_P (regno))
+ if (FP_REGNUM_P (regno) && abi_id != ARM_PCS_SVE)
{
poly_int64 per_register_size = GET_MODE_SIZE (mode);
unsigned int nregs = hard_regno_nregs (regno, mode);
}
static bool
-aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
- tree exp ATTRIBUTE_UNUSED)
+aarch64_function_ok_for_sibcall (tree, tree exp)
{
- if (aarch64_simd_decl_p (cfun->decl) != aarch64_simd_decl_p (decl))
+ if (crtl->abi->id () != expr_callee_abi (exp).id ())
return false;
return true;
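
A sketch of the effect (illustrative names, ACLE types from <arm_sve.h>): tail calls remain possible when caller and callee share an ABI and are rejected when they differ, which now covers the SVE PCS as well as aarch64_vector_pcs.

#include <arm_sve.h>

extern svint32_t sve_callee (svint32_t x);	/* ARM_PCS_SVE */
extern void base_callee (unsigned long n);	/* ARM_PCS_AAPCS64 */

svint32_t
sve_caller (svint32_t x)
{
  return sve_callee (x);	/* same ABI: can still be a sibcall */
}

void
mixed_caller (svint32_t x)	/* ARM_PCS_SVE caller */
{
  base_callee (svcntw ());	/* different ABI: not a sibcall */
}
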
/* Implement TARGET_PASS_BY_REFERENCE. */
static bool
-aarch64_pass_by_reference (cumulative_args_t, const function_arg_info &arg)
+aarch64_pass_by_reference (cumulative_args_t pcum_v,
+ const function_arg_info &arg)
{
+ CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
HOST_WIDE_INT size;
machine_mode dummymode;
int nregs;
+ unsigned int num_zr, num_pr;
+ if (arg.type && aarch64_sve_argument_p (arg.type, &num_zr, &num_pr))
+ {
+ if (pcum && !pcum->silent_p && !TARGET_SVE)
+ /* We can't gracefully recover at this point, so make this a
+ fatal error. */
+ fatal_error (input_location, "arguments of type %qT require"
+ " the SVE ISA extension", arg.type);
+
+ /* Variadic SVE types are passed by reference. Normal non-variadic
+ arguments are too if we've run out of registers. */
+ return (!arg.named
+ || pcum->aapcs_nvrn + num_zr > NUM_FP_ARG_REGS
+ || pcum->aapcs_nprn + num_pr > NUM_PR_ARG_REGS);
+ }
+
/* GET_MODE_SIZE (BLKmode) is useless since it is 0. */
if (arg.mode == BLKmode && arg.type)
size = int_size_in_bytes (arg.type);
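
Concretely (illustrative prototypes using the ACLE types from <arm_sve.h>): named SVE arguments are passed by value in Z0-Z7 and P0-P3 while registers remain and by reference once they run out, and SVE arguments that match the ellipsis of a variadic function are unnamed and therefore always go by reference.

#include <arm_sve.h>

/* A0-A7 occupy Z0-Z7; A8 no longer fits and is passed by reference.  */
void many_vectors (svint32_t a0, svint32_t a1, svint32_t a2, svint32_t a3,
		   svint32_t a4, svint32_t a5, svint32_t a6, svint32_t a7,
		   svint32_t a8);

/* An svbool_t passed through "..." is unnamed, so it too is passed by
   reference.  */
void variadic_fn (int n, ...);
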
if (INTEGRAL_TYPE_P (type))
mode = promote_function_mode (type, mode, &unsignedp, func, 1);
+ unsigned int num_zr, num_pr;
+ if (type && aarch64_sve_argument_p (type, &num_zr, &num_pr))
+ {
+ /* Don't raise an error here if we're called when SVE is disabled,
+ since this is really just a query function. Other code must
+ do that where appropriate. */
+ mode = TYPE_MODE_RAW (type);
+ gcc_assert (VECTOR_MODE_P (mode)
+ && (!TARGET_SVE || aarch64_sve_mode_p (mode)));
+
+ if (num_zr > 0 && num_pr == 0)
+ return gen_rtx_REG (mode, V0_REGNUM);
+
+ if (num_zr == 0 && num_pr == 1)
+ return gen_rtx_REG (mode, P0_REGNUM);
+
+ gcc_unreachable ();
+ }
+
+ /* Generic vectors that map to SVE modes with -msve-vector-bits=N are
+ returned in memory, not by value. */
+ gcc_assert (!aarch64_sve_mode_p (mode));
+
if (aarch64_return_in_msb (type))
{
HOST_WIDE_INT size = int_size_in_bytes (type);
/* Simple scalar types always returned in registers. */
return false;
+ unsigned int num_zr, num_pr;
+ if (type && aarch64_sve_argument_p (type, &num_zr, &num_pr))
+ {
+ /* All SVE types we support fit in registers. For example, it isn't
+ yet possible to define an aggregate of 9+ SVE vectors or 5+ SVE
+ predicates. */
+ gcc_assert (num_zr <= NUM_FP_ARG_REGS && num_pr <= NUM_PR_ARG_REGS);
+ return false;
+ }
+
if (aarch64_vfp_is_call_or_return_candidate (TYPE_MODE (type),
type,
&ag_mode,
numbers refer to the rule numbers in the AAPCS64. */
static void
-aarch64_layout_arg (cumulative_args_t pcum_v, machine_mode mode,
- const_tree type,
- bool named ATTRIBUTE_UNUSED)
+aarch64_layout_arg (cumulative_args_t pcum_v, const function_arg_info &arg)
{
CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
+ tree type = arg.type;
+ machine_mode mode = arg.mode;
int ncrn, nvrn, nregs;
bool allocate_ncrn, allocate_nvrn;
HOST_WIDE_INT size;
pcum->aapcs_arg_processed = true;
+ unsigned int num_zr, num_pr;
+ if (type && aarch64_sve_argument_p (type, &num_zr, &num_pr))
+ {
+ /* The PCS says that it is invalid to pass an SVE value to an
+ unprototyped function. There is no ABI-defined location we
+ can return in this case, so we have no real choice but to raise
+ an error immediately, even though this is only a query function. */
+ if (arg.named && pcum->pcs_variant != ARM_PCS_SVE)
+ {
+ gcc_assert (!pcum->silent_p);
+ error ("SVE type %qT cannot be passed to an unprototyped function",
+ arg.type);
+ /* Avoid repeating the message, and avoid tripping the assert
+ below. */
+ pcum->pcs_variant = ARM_PCS_SVE;
+ }
+
+ /* We would have converted the argument into pass-by-reference
+ form if it didn't fit in registers. */
+ pcum->aapcs_nextnvrn = pcum->aapcs_nvrn + num_zr;
+ pcum->aapcs_nextnprn = pcum->aapcs_nprn + num_pr;
+ gcc_assert (arg.named
+ && pcum->pcs_variant == ARM_PCS_SVE
+ && aarch64_sve_mode_p (mode)
+ && pcum->aapcs_nextnvrn <= NUM_FP_ARG_REGS
+ && pcum->aapcs_nextnprn <= NUM_PR_ARG_REGS);
+
+ if (num_zr > 0 && num_pr == 0)
+ pcum->aapcs_reg = gen_rtx_REG (mode, V0_REGNUM + pcum->aapcs_nvrn);
+ else if (num_zr == 0 && num_pr == 1)
+ pcum->aapcs_reg = gen_rtx_REG (mode, P0_REGNUM + pcum->aapcs_nprn);
+ else
+ gcc_unreachable ();
+ return;
+ }
+
+ /* Generic vectors that map to SVE modes with -msve-vector-bits=N are
+ passed by reference, not by value. */
+ gcc_assert (!aarch64_sve_mode_p (mode));
+
/* Size in bytes, rounded to the nearest multiple of 8 bytes. */
if (type)
size = int_size_in_bytes (type);
and homogeneous short-vector aggregates (HVA).
if (allocate_nvrn)
{
- if (!TARGET_FLOAT)
+ if (!pcum->silent_p && !TARGET_FLOAT)
aarch64_err_no_fpadvsimd (mode);
if (nvrn + nregs <= NUM_FP_ARG_REGS)
{
CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
gcc_assert (pcum->pcs_variant == ARM_PCS_AAPCS64
- || pcum->pcs_variant == ARM_PCS_SIMD);
+ || pcum->pcs_variant == ARM_PCS_SIMD
+ || pcum->pcs_variant == ARM_PCS_SVE);
if (arg.end_marker_p ())
return gen_int_mode (pcum->pcs_variant, DImode);
- aarch64_layout_arg (pcum_v, arg.mode, arg.type, arg.named);
+ aarch64_layout_arg (pcum_v, arg);
return pcum->aapcs_reg;
}
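
The new diagnostic for unprototyped callees fires on code like the following sketch (pre-C23 C, where an empty parameter list leaves the callee unprototyped; names are illustrative):

#include <arm_sve.h>

void unprototyped_fn ();	/* no prototype, so no SVE PCS information */

void
call_it (svbool_t pg)
{
  unprototyped_fn (pg);		/* error: SVE type 'svbool_t' cannot be
				   passed to an unprototyped function */
}
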
const_tree fntype,
rtx libname ATTRIBUTE_UNUSED,
const_tree fndecl ATTRIBUTE_UNUSED,
- unsigned n_named ATTRIBUTE_UNUSED)
+ unsigned n_named ATTRIBUTE_UNUSED,
+ bool silent_p)
{
pcum->aapcs_ncrn = 0;
pcum->aapcs_nvrn = 0;
+ pcum->aapcs_nprn = 0;
pcum->aapcs_nextncrn = 0;
pcum->aapcs_nextnvrn = 0;
+ pcum->aapcs_nextnprn = 0;
if (fntype)
pcum->pcs_variant = (arm_pcs) fntype_abi (fntype).id ();
else
pcum->aapcs_arg_processed = false;
pcum->aapcs_stack_words = 0;
pcum->aapcs_stack_size = 0;
+ pcum->silent_p = silent_p;
- if (!TARGET_FLOAT
+ if (!silent_p
+ && !TARGET_FLOAT
&& fndecl && TREE_PUBLIC (fndecl)
&& fntype && fntype != error_mark_node)
{
&mode, &nregs, NULL))
aarch64_err_no_fpadvsimd (TYPE_MODE (type));
}
- return;
+
+ if (!silent_p
+ && !TARGET_SVE
+ && pcum->pcs_variant == ARM_PCS_SVE)
+ {
+ /* We can't gracefully recover at this point, so make this a
+ fatal error. */
+ if (fndecl)
+ fatal_error (input_location, "%qE requires the SVE ISA extension",
+ fndecl);
+ else
+ fatal_error (input_location, "calls to functions of type %qT require"
+ " the SVE ISA extension", fntype);
+ }
}
static void
{
CUMULATIVE_ARGS *pcum = get_cumulative_args (pcum_v);
if (pcum->pcs_variant == ARM_PCS_AAPCS64
- || pcum->pcs_variant == ARM_PCS_SIMD)
+ || pcum->pcs_variant == ARM_PCS_SIMD
+ || pcum->pcs_variant == ARM_PCS_SVE)
{
- aarch64_layout_arg (pcum_v, arg.mode, arg.type, arg.named);
+ aarch64_layout_arg (pcum_v, arg);
gcc_assert ((pcum->aapcs_reg != NULL_RTX)
!= (pcum->aapcs_stack_words != 0));
pcum->aapcs_arg_processed = false;
pcum->aapcs_ncrn = pcum->aapcs_nextncrn;
pcum->aapcs_nvrn = pcum->aapcs_nextnvrn;
+ pcum->aapcs_nprn = pcum->aapcs_nextnprn;
pcum->aapcs_stack_size += pcum->aapcs_stack_words;
pcum->aapcs_stack_words = 0;
pcum->aapcs_reg = NULL_RTX;
static void
aarch64_layout_frame (void)
{
- HOST_WIDE_INT offset = 0;
+ poly_int64 offset = 0;
int regno, last_fp_reg = INVALID_REGNUM;
- bool simd_function = (crtl->abi->id () == ARM_PCS_SIMD);
+ machine_mode vector_save_mode = aarch64_reg_save_mode (V8_REGNUM);
+ poly_int64 vector_save_size = GET_MODE_SIZE (vector_save_mode);
+ bool frame_related_fp_reg_p = false;
aarch64_frame &frame = cfun->machine->frame;
frame.emit_frame_chain = aarch64_needs_frame_chain ();
frame.wb_candidate1 = INVALID_REGNUM;
frame.wb_candidate2 = INVALID_REGNUM;
+ frame.spare_pred_reg = INVALID_REGNUM;
/* First mark all the registers that really need to be saved... */
- for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
- frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
-
- for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+ for (regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
frame.reg_offset[regno] = SLOT_NOT_REQUIRED;
/* ... that includes the eh data registers (if needed)... */
{
frame.reg_offset[regno] = SLOT_REQUIRED;
last_fp_reg = regno;
+ if (aarch64_emit_cfi_for_reg_p (regno))
+ frame_related_fp_reg_p = true;
}
+ /* Big-endian SVE frames need a spare predicate register in order
+ to save Z8-Z15. Decide which register they should use. Prefer
+ an unused argument register if possible, so that we don't force P4
+ to be saved unnecessarily. */
+ if (frame_related_fp_reg_p
+ && crtl->abi->id () == ARM_PCS_SVE
+ && BYTES_BIG_ENDIAN)
+ {
+ bitmap live1 = df_get_live_out (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+ bitmap live2 = df_get_live_in (EXIT_BLOCK_PTR_FOR_FN (cfun));
+ for (regno = P0_REGNUM; regno <= P7_REGNUM; regno++)
+ if (!bitmap_bit_p (live1, regno) && !bitmap_bit_p (live2, regno))
+ break;
+ gcc_assert (regno <= P7_REGNUM);
+ frame.spare_pred_reg = regno;
+ df_set_regs_ever_live (regno, true);
+ }
+
+ for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
+ if (df_regs_ever_live_p (regno)
+ && !fixed_regs[regno]
+ && !crtl->abi->clobbers_full_reg_p (regno))
+ frame.reg_offset[regno] = SLOT_REQUIRED;
+
+ /* With stack-clash, LR must be saved in non-leaf functions. */
+ gcc_assert (crtl->is_leaf
+ || maybe_ne (frame.reg_offset[R30_REGNUM], SLOT_NOT_REQUIRED));
+
+ /* Now assign stack slots for the registers. Start with the predicate
+ registers, since predicate LDR and STR have a relatively small
+ offset range. These saves happen below the hard frame pointer. */
+ for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ {
+ frame.reg_offset[regno] = offset;
+ offset += BYTES_PER_SVE_PRED;
+ }
+
+ /* We save a maximum of 8 predicate registers, and since vector
+ registers are 8 times the size of a predicate register, all the
+ saved predicates fit within a single vector. Doing this also
+ rounds the offset to a 128-bit boundary. */
+ if (maybe_ne (offset, 0))
+ {
+ gcc_assert (known_le (offset, vector_save_size));
+ offset = vector_save_size;
+ }
+
+ /* If we need to save any SVE vector registers, add them next. */
+ if (last_fp_reg != (int) INVALID_REGNUM && crtl->abi->id () == ARM_PCS_SVE)
+ for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
+ {
+ frame.reg_offset[regno] = offset;
+ offset += vector_save_size;
+ }
+
+ /* OFFSET is now the offset of the hard frame pointer from the bottom
+ of the callee save area. */
+ bool saves_below_hard_fp_p = maybe_ne (offset, 0);
+ frame.below_hard_fp_saved_regs_size = offset;
if (frame.emit_frame_chain)
{
/* FP and LR are placed in the linkage record. */
- frame.reg_offset[R29_REGNUM] = 0;
+ frame.reg_offset[R29_REGNUM] = offset;
frame.wb_candidate1 = R29_REGNUM;
- frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
+ frame.reg_offset[R30_REGNUM] = offset + UNITS_PER_WORD;
frame.wb_candidate2 = R30_REGNUM;
- offset = 2 * UNITS_PER_WORD;
+ offset += 2 * UNITS_PER_WORD;
}
- /* With stack-clash, LR must be saved in non-leaf functions. */
- gcc_assert (crtl->is_leaf
- || frame.reg_offset[R30_REGNUM] != SLOT_NOT_REQUIRED);
-
- /* Now assign stack slots for them. */
for (regno = R0_REGNUM; regno <= R30_REGNUM; regno++)
- if (frame.reg_offset[regno] == SLOT_REQUIRED)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
{
frame.reg_offset[regno] = offset;
if (frame.wb_candidate1 == INVALID_REGNUM)
offset += UNITS_PER_WORD;
}
- HOST_WIDE_INT max_int_offset = offset;
- offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
- bool has_align_gap = offset != max_int_offset;
+ poly_int64 max_int_offset = offset;
+ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+ bool has_align_gap = maybe_ne (offset, max_int_offset);
for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
- if (frame.reg_offset[regno] == SLOT_REQUIRED)
+ if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
{
/* If there is an alignment gap between integer and fp callee-saves,
allocate the last fp register to it if possible. */
if (regno == last_fp_reg
&& has_align_gap
- && !simd_function
- && (offset & 8) == 0)
+ && known_eq (vector_save_size, 8)
+ && multiple_p (offset, 16))
{
frame.reg_offset[regno] = max_int_offset;
break;
else if (frame.wb_candidate2 == INVALID_REGNUM
&& frame.wb_candidate1 >= V0_REGNUM)
frame.wb_candidate2 = regno;
- offset += simd_function ? UNITS_PER_VREG : UNITS_PER_WORD;
+ offset += vector_save_size;
}
- offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+ offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
frame.saved_regs_size = offset;
- HOST_WIDE_INT varargs_and_saved_regs_size
- = offset + frame.saved_varargs_size;
+ poly_int64 varargs_and_saved_regs_size = offset + frame.saved_varargs_size;
- frame.hard_fp_offset
+ poly_int64 above_outgoing_args
= aligned_upper_bound (varargs_and_saved_regs_size
+ get_frame_size (),
STACK_BOUNDARY / BITS_PER_UNIT);
+ frame.hard_fp_offset
+ = above_outgoing_args - frame.below_hard_fp_saved_regs_size;
+
/* Both these values are already aligned. */
gcc_assert (multiple_p (crtl->outgoing_args_size,
STACK_BOUNDARY / BITS_PER_UNIT));
- frame.frame_size = frame.hard_fp_offset + crtl->outgoing_args_size;
+ frame.frame_size = above_outgoing_args + crtl->outgoing_args_size;
frame.locals_offset = frame.saved_varargs_size;
frame.initial_adjust = 0;
frame.final_adjust = 0;
frame.callee_adjust = 0;
+ frame.sve_callee_adjust = 0;
frame.callee_offset = 0;
HOST_WIDE_INT max_push_offset = 0;
max_push_offset = 256;
HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;
+ HOST_WIDE_INT const_saved_regs_size;
if (frame.frame_size.is_constant (&const_size)
&& const_size < max_push_offset
- && known_eq (crtl->outgoing_args_size, 0))
+ && known_eq (frame.hard_fp_offset, const_size))
{
/* Simple, small frame with no outgoing arguments:
+
stp reg1, reg2, [sp, -frame_size]!
stp reg3, reg4, [sp, 16] */
frame.callee_adjust = const_size;
}
else if (crtl->outgoing_args_size.is_constant (&const_outgoing_args_size)
- && const_outgoing_args_size + frame.saved_regs_size < 512
+ && frame.saved_regs_size.is_constant (&const_saved_regs_size)
+ && const_outgoing_args_size + const_saved_regs_size < 512
+ /* We could handle this case even with outgoing args, provided
+ that the number of args left us with valid offsets for all
+ predicate and vector save slots. It's such a rare case that
+ it hardly seems worth the effort though. */
+ && (!saves_below_hard_fp_p || const_outgoing_args_size == 0)
&& !(cfun->calls_alloca
&& frame.hard_fp_offset.is_constant (&const_fp_offset)
&& const_fp_offset < max_push_offset))
{
/* Frame with small outgoing arguments:
+
sub sp, sp, frame_size
stp reg1, reg2, [sp, outgoing_args_size]
stp reg3, reg4, [sp, outgoing_args_size + 16] */
frame.initial_adjust = frame.frame_size;
frame.callee_offset = const_outgoing_args_size;
}
+ else if (saves_below_hard_fp_p
+ && known_eq (frame.saved_regs_size,
+ frame.below_hard_fp_saved_regs_size))
+ {
+ /* Frame in which all saves are SVE saves:
+
+ sub sp, sp, hard_fp_offset + below_hard_fp_saved_regs_size
+ save SVE registers relative to SP
+ sub sp, sp, outgoing_args_size */
+ frame.initial_adjust = (frame.hard_fp_offset
+ + frame.below_hard_fp_saved_regs_size);
+ frame.final_adjust = crtl->outgoing_args_size;
+ }
else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
&& const_fp_offset < max_push_offset)
{
- /* Frame with large outgoing arguments but a small local area:
+ /* Frame with large outgoing arguments or SVE saves, but with
+ a small local area:
+
stp reg1, reg2, [sp, -hard_fp_offset]!
stp reg3, reg4, [sp, 16]
+ [sub sp, sp, below_hard_fp_saved_regs_size]
+ [save SVE registers relative to SP]
sub sp, sp, outgoing_args_size */
frame.callee_adjust = const_fp_offset;
+ frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
frame.final_adjust = crtl->outgoing_args_size;
}
else
{
- /* Frame with large local area and outgoing arguments using frame pointer:
+ /* Frame with large local area and outgoing arguments or SVE saves,
+ using frame pointer:
+
sub sp, sp, hard_fp_offset
stp x29, x30, [sp, 0]
add x29, sp, 0
stp reg3, reg4, [sp, 16]
+ [sub sp, sp, below_hard_fp_saved_regs_size]
+ [save SVE registers relative to SP]
sub sp, sp, outgoing_args_size */
frame.initial_adjust = frame.hard_fp_offset;
+ frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
frame.final_adjust = crtl->outgoing_args_size;
}
/* Make sure the individual adjustments add up to the full frame size. */
gcc_assert (known_eq (frame.initial_adjust
+ frame.callee_adjust
+ + frame.sve_callee_adjust
+ frame.final_adjust, frame.frame_size));
frame.laid_out = true;
static bool
aarch64_register_saved_on_entry (int regno)
{
- return cfun->machine->frame.reg_offset[regno] >= 0;
+ return known_ge (cfun->machine->frame.reg_offset[regno], 0);
}
/* Return the next register up from REGNO up to LIMIT for the callee
aarch64_push_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment)
{
rtx_insn *insn;
- machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno1);
+ machine_mode mode = aarch64_reg_save_mode (regno1);
if (regno2 == INVALID_REGNUM)
return aarch64_pushwb_single_reg (mode, regno1, adjustment);
aarch64_pop_regs (unsigned regno1, unsigned regno2, HOST_WIDE_INT adjustment,
rtx *cfi_ops)
{
- machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno1);
+ machine_mode mode = aarch64_reg_save_mode (regno1);
rtx reg1 = gen_rtx_REG (mode, regno1);
*cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg1, *cfi_ops);
if its LR is pushed onto stack. */
return (aarch64_ra_sign_scope == AARCH64_FUNCTION_ALL
|| (aarch64_ra_sign_scope == AARCH64_FUNCTION_NON_LEAF
- && cfun->machine->frame.reg_offset[LR_REGNUM] >= 0));
+ && known_ge (cfun->machine->frame.reg_offset[LR_REGNUM], 0)));
}
/* Return TRUE if Branch Target Identification Mechanism is enabled. */
return (aarch64_enable_bti == 1);
}
+/* The caller is going to use ST1D or LD1D to save or restore an SVE
+ register in mode MODE at BASE_RTX + OFFSET, where OFFSET is in
+ the range [1, 16] * GET_MODE_SIZE (MODE). Prepare for this by:
+
+ (1) updating BASE_RTX + OFFSET so that it is a legitimate ST1D
+ or LD1D address
+
+ (2) setting PTRUE to a valid predicate register for the ST1D or LD1D,
+ if the variable isn't already nonnull
+
+ (1) is needed when OFFSET is in the range [8, 16] * GET_MODE_SIZE (MODE).
+ Handle this case using a temporary base register that is suitable for
+ all offsets in that range. Use ANCHOR_REG as this base register if it
+ is nonnull, otherwise create a new register and store it in ANCHOR_REG. */
+
+static inline void
+aarch64_adjust_sve_callee_save_base (machine_mode mode, rtx &base_rtx,
+ rtx &anchor_reg, poly_int64 &offset,
+ rtx &ptrue)
+{
+ if (maybe_ge (offset, 8 * GET_MODE_SIZE (mode)))
+ {
+ /* This is the maximum valid offset of the anchor from the base.
+ Lower values would be valid too. */
+ poly_int64 anchor_offset = 16 * GET_MODE_SIZE (mode);
+ if (!anchor_reg)
+ {
+ anchor_reg = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
+ emit_insn (gen_add3_insn (anchor_reg, base_rtx,
+ gen_int_mode (anchor_offset, Pmode)));
+ }
+ base_rtx = anchor_reg;
+ offset -= anchor_offset;
+ }
+ if (!ptrue)
+ {
+ int pred_reg = cfun->machine->frame.spare_pred_reg;
+ emit_move_insn (gen_rtx_REG (VNx16BImode, pred_reg),
+ CONSTM1_RTX (VNx16BImode));
+ ptrue = gen_rtx_REG (VNx2BImode, pred_reg);
+ }
+}
+
+/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
+ is saved at BASE + OFFSET. */
+
+static void
+aarch64_add_cfa_expression (rtx_insn *insn, rtx reg,
+ rtx base, poly_int64 offset)
+{
+ rtx mem = gen_frame_mem (GET_MODE (reg),
+ plus_constant (Pmode, base, offset));
+ add_reg_note (insn, REG_CFA_EXPRESSION, gen_rtx_SET (mem, reg));
+}
+
/* Emit code to save the callee-saved registers from register number START
to LIMIT to the stack at the location starting at offset START_OFFSET,
- skipping any write-back candidates if SKIP_WB is true. */
+ skipping any write-back candidates if SKIP_WB is true. HARD_FP_VALID_P
+ is true if the hard frame pointer has been set up. */
static void
-aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
- unsigned start, unsigned limit, bool skip_wb)
+aarch64_save_callee_saves (poly_int64 start_offset,
+ unsigned start, unsigned limit, bool skip_wb,
+ bool hard_fp_valid_p)
{
rtx_insn *insn;
unsigned regno;
unsigned regno2;
+ rtx anchor_reg = NULL_RTX, ptrue = NULL_RTX;
for (regno = aarch64_next_callee_save (start, limit);
regno <= limit;
{
rtx reg, mem;
poly_int64 offset;
- int offset_diff;
+ bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
if (skip_wb
&& (regno == cfun->machine->frame.wb_candidate1
continue;
if (cfun->machine->reg_is_wrapped_separately[regno])
- continue;
+ continue;
+ machine_mode mode = aarch64_reg_save_mode (regno);
reg = gen_rtx_REG (mode, regno);
offset = start_offset + cfun->machine->frame.reg_offset[regno];
- mem = gen_frame_mem (mode, plus_constant (Pmode, stack_pointer_rtx,
- offset));
+ rtx base_rtx = stack_pointer_rtx;
+ poly_int64 sp_offset = offset;
- regno2 = aarch64_next_callee_save (regno + 1, limit);
- offset_diff = cfun->machine->frame.reg_offset[regno2]
- - cfun->machine->frame.reg_offset[regno];
+ HOST_WIDE_INT const_offset;
+ if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
+ offset, ptrue);
+ else if (GP_REGNUM_P (regno)
+ && (!offset.is_constant (&const_offset) || const_offset >= 512))
+ {
+ gcc_assert (known_eq (start_offset, 0));
+ poly_int64 fp_offset
+ = cfun->machine->frame.below_hard_fp_saved_regs_size;
+ if (hard_fp_valid_p)
+ base_rtx = hard_frame_pointer_rtx;
+ else
+ {
+ if (!anchor_reg)
+ {
+ anchor_reg = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
+ emit_insn (gen_add3_insn (anchor_reg, base_rtx,
+ gen_int_mode (fp_offset, Pmode)));
+ }
+ base_rtx = anchor_reg;
+ }
+ offset -= fp_offset;
+ }
+ mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
+ bool need_cfa_note_p = (base_rtx != stack_pointer_rtx);
- if (regno2 <= limit
+ if (!aarch64_sve_mode_p (mode)
+ && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
&& !cfun->machine->reg_is_wrapped_separately[regno2]
- && known_eq (GET_MODE_SIZE (mode), offset_diff))
+ && known_eq (GET_MODE_SIZE (mode),
+ cfun->machine->frame.reg_offset[regno2]
+ - cfun->machine->frame.reg_offset[regno]))
{
rtx reg2 = gen_rtx_REG (mode, regno2);
rtx mem2;
- offset = start_offset + cfun->machine->frame.reg_offset[regno2];
- mem2 = gen_frame_mem (mode, plus_constant (Pmode, stack_pointer_rtx,
- offset));
+ offset += GET_MODE_SIZE (mode);
+ mem2 = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
insn = emit_insn (aarch64_gen_store_pair (mode, mem, reg, mem2,
reg2));
always assumed to be relevant to the frame
calculations; subsequent parts are only
frame-related if explicitly marked. */
- RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 1)) = 1;
+ if (aarch64_emit_cfi_for_reg_p (regno2))
+ {
+ if (need_cfa_note_p)
+ aarch64_add_cfa_expression (insn, reg2, stack_pointer_rtx,
+ sp_offset + GET_MODE_SIZE (mode));
+ else
+ RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 1)) = 1;
+ }
+
regno = regno2;
}
+ else if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ {
+ insn = emit_insn (gen_aarch64_pred_mov (mode, mem, ptrue, reg));
+ need_cfa_note_p = true;
+ }
+ else if (aarch64_sve_mode_p (mode))
+ insn = emit_insn (gen_rtx_SET (mem, reg));
else
insn = emit_move_insn (mem, reg);
- RTX_FRAME_RELATED_P (insn) = 1;
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ if (frame_related_p && need_cfa_note_p)
+ aarch64_add_cfa_expression (insn, reg, stack_pointer_rtx, sp_offset);
}
}
-/* Emit code to restore the callee registers of mode MODE from register
- number START up to and including LIMIT. Restore from the stack offset
- START_OFFSET, skipping any write-back candidates if SKIP_WB is true.
- Write the appropriate REG_CFA_RESTORE notes into CFI_OPS. */
+/* Emit code to restore the callee registers from register number START
+ up to and including LIMIT. Restore from the stack offset START_OFFSET,
+ skipping any write-back candidates if SKIP_WB is true. Write the
+ appropriate REG_CFA_RESTORE notes into CFI_OPS. */
static void
-aarch64_restore_callee_saves (machine_mode mode,
- poly_int64 start_offset, unsigned start,
+aarch64_restore_callee_saves (poly_int64 start_offset, unsigned start,
unsigned limit, bool skip_wb, rtx *cfi_ops)
{
- rtx base_rtx = stack_pointer_rtx;
unsigned regno;
unsigned regno2;
poly_int64 offset;
+ rtx anchor_reg = NULL_RTX, ptrue = NULL_RTX;
for (regno = aarch64_next_callee_save (start, limit);
regno <= limit;
regno = aarch64_next_callee_save (regno + 1, limit))
{
+ bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
if (cfun->machine->reg_is_wrapped_separately[regno])
- continue;
+ continue;
rtx reg, mem;
- int offset_diff;
if (skip_wb
&& (regno == cfun->machine->frame.wb_candidate1
|| regno == cfun->machine->frame.wb_candidate2))
continue;
+ machine_mode mode = aarch64_reg_save_mode (regno);
reg = gen_rtx_REG (mode, regno);
offset = start_offset + cfun->machine->frame.reg_offset[regno];
+ rtx base_rtx = stack_pointer_rtx;
+ if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
+ offset, ptrue);
mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
- regno2 = aarch64_next_callee_save (regno + 1, limit);
- offset_diff = cfun->machine->frame.reg_offset[regno2]
- - cfun->machine->frame.reg_offset[regno];
-
- if (regno2 <= limit
+ if (!aarch64_sve_mode_p (mode)
+ && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
&& !cfun->machine->reg_is_wrapped_separately[regno2]
- && known_eq (GET_MODE_SIZE (mode), offset_diff))
+ && known_eq (GET_MODE_SIZE (mode),
+ cfun->machine->frame.reg_offset[regno2]
+ - cfun->machine->frame.reg_offset[regno]))
{
rtx reg2 = gen_rtx_REG (mode, regno2);
rtx mem2;
- offset = start_offset + cfun->machine->frame.reg_offset[regno2];
+ offset += GET_MODE_SIZE (mode);
mem2 = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2));
*cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg2, *cfi_ops);
regno = regno2;
}
+ else if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ emit_insn (gen_aarch64_pred_mov (mode, reg, ptrue, mem));
+ else if (aarch64_sve_mode_p (mode))
+ emit_insn (gen_rtx_SET (reg, mem));
else
emit_move_insn (reg, mem);
- *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg, *cfi_ops);
+ if (frame_related_p)
+ *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg, *cfi_ops);
}
}
for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
if (aarch64_register_saved_on_entry (regno))
{
+ /* Punt on saves and restores that use ST1D and LD1D. We could
+ try to be smarter, but it would involve making sure that the
+ spare predicate register itself is safe to use at the save
+ and restore points. Also, when a frame pointer is being used,
+ the slots are often out of reach of ST1D and LD1D anyway. */
+ machine_mode mode = aarch64_reg_save_mode (regno);
+ if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
+ continue;
+
poly_int64 offset = cfun->machine->frame.reg_offset[regno];
- if (!frame_pointer_needed)
- offset += cfun->machine->frame.frame_size
- - cfun->machine->frame.hard_fp_offset;
+
+ /* If the register is saved in the first SVE save slot, we use
+ it as a stack probe for -fstack-clash-protection. */
+ if (flag_stack_clash_protection
+ && maybe_ne (cfun->machine->frame.below_hard_fp_saved_regs_size, 0)
+ && known_eq (offset, 0))
+ continue;
+
+ /* Get the offset relative to the register we'll use. */
+ if (frame_pointer_needed)
+ offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+ else
+ offset += crtl->outgoing_args_size;
+
/* Check that we can access the stack slot of the register with one
direct load with no adjustments needed. */
- if (offset_12bit_unsigned_scaled_p (DImode, offset))
+ if (aarch64_sve_mode_p (mode)
+ ? offset_9bit_signed_scaled_p (mode, offset)
+ : offset_12bit_unsigned_scaled_p (mode, offset))
bitmap_set_bit (components, regno);
}
if (frame_pointer_needed)
bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
+ /* If the spare predicate register used by big-endian SVE code
+ is call-preserved, it must be saved in the main prologue
+ before any saves that use it. */
+ if (cfun->machine->frame.spare_pred_reg != INVALID_REGNUM)
+ bitmap_clear_bit (components, cfun->machine->frame.spare_pred_reg);
+
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
/* If registers have been chosen to be stored/restored with
|| bitmap_bit_p (gen, regno)
|| bitmap_bit_p (kill, regno)))
{
- unsigned regno2, offset, offset2;
bitmap_set_bit (components, regno);
/* If there is a callee-save at an adjacent offset, add it too
to increase the use of LDP/STP. */
- offset = cfun->machine->frame.reg_offset[regno];
- regno2 = ((offset & 8) == 0) ? regno + 1 : regno - 1;
+ poly_int64 offset = cfun->machine->frame.reg_offset[regno];
+ unsigned regno2 = multiple_p (offset, 16) ? regno + 1 : regno - 1;
if (regno2 <= LAST_SAVED_REGNUM)
{
- offset2 = cfun->machine->frame.reg_offset[regno2];
- if ((offset & ~8) == (offset2 & ~8))
+ poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
+ if (regno < regno2
+ ? known_eq (offset + 8, offset2)
+ : multiple_p (offset2, 16) && known_eq (offset2 + 8, offset))
bitmap_set_bit (components, regno2);
}
}
while (regno != last_regno)
{
- /* AAPCS64 section 5.1.2 requires only the low 64 bits to be saved
- so DFmode for the vector registers is enough. For simd functions
- we want to save the low 128 bits. */
- machine_mode mode = aarch64_reg_save_mode (cfun->decl, regno);
+ bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
+ machine_mode mode = aarch64_reg_save_mode (regno);
rtx reg = gen_rtx_REG (mode, regno);
poly_int64 offset = cfun->machine->frame.reg_offset[regno];
- if (!frame_pointer_needed)
- offset += cfun->machine->frame.frame_size
- - cfun->machine->frame.hard_fp_offset;
+ if (frame_pointer_needed)
+ offset -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+ else
+ offset += crtl->outgoing_args_size;
+
rtx addr = plus_constant (Pmode, ptr_reg, offset);
rtx mem = gen_frame_mem (mode, addr);
if (regno2 == last_regno)
{
insn = emit_insn (set);
- RTX_FRAME_RELATED_P (insn) = 1;
- if (prologue_p)
- add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
- else
- add_reg_note (insn, REG_CFA_RESTORE, reg);
+ if (frame_related_p)
+ {
+ RTX_FRAME_RELATED_P (insn) = 1;
+ if (prologue_p)
+ add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
+ else
+ add_reg_note (insn, REG_CFA_RESTORE, reg);
+ }
break;
}
poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
/* The next register is not of the same class or its offset is not
mergeable with the current one into a pair. */
- if (!satisfies_constraint_Ump (mem)
+ if (aarch64_sve_mode_p (mode)
+ || !satisfies_constraint_Ump (mem)
|| GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
|| (crtl->abi->id () == ARM_PCS_SIMD && FP_REGNUM_P (regno))
|| maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
GET_MODE_SIZE (mode)))
{
insn = emit_insn (set);
- RTX_FRAME_RELATED_P (insn) = 1;
- if (prologue_p)
- add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
- else
- add_reg_note (insn, REG_CFA_RESTORE, reg);
+ if (frame_related_p)
+ {
+ RTX_FRAME_RELATED_P (insn) = 1;
+ if (prologue_p)
+ add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
+ else
+ add_reg_note (insn, REG_CFA_RESTORE, reg);
+ }
regno = regno2;
continue;
}
+ bool frame_related2_p = aarch64_emit_cfi_for_reg_p (regno2);
+
/* REGNO2 can be saved/restored in a pair with REGNO. */
rtx reg2 = gen_rtx_REG (mode, regno2);
- if (!frame_pointer_needed)
- offset2 += cfun->machine->frame.frame_size
- - cfun->machine->frame.hard_fp_offset;
+ if (frame_pointer_needed)
+ offset2 -= cfun->machine->frame.below_hard_fp_saved_regs_size;
+ else
+ offset2 += crtl->outgoing_args_size;
rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
rtx mem2 = gen_frame_mem (mode, addr2);
rtx set2 = prologue_p ? gen_rtx_SET (mem2, reg2)
else
insn = emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2));
- RTX_FRAME_RELATED_P (insn) = 1;
- if (prologue_p)
+ if (frame_related_p || frame_related2_p)
{
- add_reg_note (insn, REG_CFA_OFFSET, set);
- add_reg_note (insn, REG_CFA_OFFSET, set2);
- }
- else
- {
- add_reg_note (insn, REG_CFA_RESTORE, reg);
- add_reg_note (insn, REG_CFA_RESTORE, reg2);
+ RTX_FRAME_RELATED_P (insn) = 1;
+ if (prologue_p)
+ {
+ if (frame_related_p)
+ add_reg_note (insn, REG_CFA_OFFSET, set);
+ if (frame_related2_p)
+ add_reg_note (insn, REG_CFA_OFFSET, set2);
+ }
+ else
+ {
+ if (frame_related_p)
+ add_reg_note (insn, REG_CFA_RESTORE, reg);
+ if (frame_related2_p)
+ add_reg_note (insn, REG_CFA_RESTORE, reg2);
+ }
}
regno = aarch64_get_next_set_bit (components, regno2 + 1);
HOST_WIDE_INT guard_size
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
- /* When doing the final adjustment for the outgoing argument size we can't
- assume that LR was saved at position 0. So subtract it's offset from the
- ABI safe buffer so that we don't accidentally allow an adjustment that
- would result in an allocation larger than the ABI buffer without
- probing. */
HOST_WIDE_INT min_probe_threshold
- = final_adjustment_p
- ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
- : guard_size - guard_used_by_caller;
+ = (final_adjustment_p
+ ? guard_used_by_caller
+ : guard_size - guard_used_by_caller);
+ /* When doing the final adjustment for the outgoing arguments, take into
+ account any unprobed space there is above the current SP. There are
+ two cases:
+
+ - When saving SVE registers below the hard frame pointer, we force
+ the lowest save to take place in the prologue before doing the final
+ adjustment (i.e. we don't allow the save to be shrink-wrapped).
+ This acts as a probe at SP, so there is no unprobed space.
+
+ - When there are no SVE register saves, we use the store of the link
+ register as a probe. We can't assume that LR was saved at position 0
+ though, so treat any space below it as unprobed. */
+ if (final_adjustment_p
+ && known_eq (cfun->machine->frame.below_hard_fp_saved_regs_size, 0))
+ {
+ poly_int64 lr_offset = cfun->machine->frame.reg_offset[LR_REGNUM];
+ if (known_ge (lr_offset, 0))
+ min_probe_threshold -= lr_offset.to_constant ();
+ else
+ gcc_assert (!flag_stack_clash_protection || known_eq (poly_size, 0));
+ }
poly_int64 frame_size = cfun->machine->frame.frame_size;
if (flag_stack_clash_protection && !final_adjustment_p)
{
poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+ poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
poly_int64 final_adjust = cfun->machine->frame.final_adjust;
if (known_eq (frame_size, 0))
{
dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
}
- else if (known_lt (initial_adjust, guard_size - guard_used_by_caller)
+ else if (known_lt (initial_adjust + sve_callee_adjust,
+ guard_size - guard_used_by_caller)
&& known_lt (final_adjust, guard_used_by_caller))
{
dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
return 0;
}
-/* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
- is saved at BASE + OFFSET. */
-
-static void
-aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
- rtx base, poly_int64 offset)
-{
- rtx mem = gen_frame_mem (DImode, plus_constant (Pmode, base, offset));
- add_reg_note (insn, REG_CFA_EXPRESSION,
- gen_rtx_SET (mem, regno_reg_rtx[reg]));
-}
-
/* AArch64 stack frames generated by this compiler look like:
+-------------------------------+
+-------------------------------+ |
| LR' | |
+-------------------------------+ |
- | FP' | / <- hard_frame_pointer_rtx (aligned)
- +-------------------------------+
+ | FP' | |
+ +-------------------------------+ |<- hard_frame_pointer_rtx (aligned)
+ | SVE vector registers | | \
+ +-------------------------------+ | | below_hard_fp_saved_regs_size
+ | SVE predicate registers | / /
+ +-------------------------------+
| dynamic allocation |
+-------------------------------+
| padding |
The following registers are reserved during frame layout and should not be
used for any other purpose:
- - r11: Used by stack clash protection when SVE is enabled.
+ - r11: Used by stack clash protection when SVE is enabled, and also
+ as an anchor register when saving and restoring registers
- r12(EP0) and r13(EP1): Used as temporaries for stack adjustment.
- r14 and r15: Used for speculation tracking.
- r16(IP0), r17(IP1): Used by indirect tailcalls.
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
poly_int64 final_adjust = cfun->machine->frame.final_adjust;
poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+ poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+ poly_int64 below_hard_fp_saved_regs_size
+ = cfun->machine->frame.below_hard_fp_saved_regs_size;
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
bool emit_frame_chain = cfun->machine->frame.emit_frame_chain;
rtx_insn *insn;
+ if (flag_stack_clash_protection && known_eq (callee_adjust, 0))
+ {
+ /* Fold the SVE allocation into the initial allocation.
+ We don't do this in aarch64_layout_frame to avoid pessimizing
+ the epilogue code. */
+ initial_adjust += sve_callee_adjust;
+ sve_callee_adjust = 0;
+ }
+
/* Sign return address for functions. */
if (aarch64_return_address_signing_enabled ())
{
if (callee_adjust != 0)
aarch64_push_regs (reg1, reg2, callee_adjust);
+ /* The offset of the frame chain record (if any) from the current SP. */
+ poly_int64 chain_offset = (initial_adjust + callee_adjust
+ - cfun->machine->frame.hard_fp_offset);
+ gcc_assert (known_ge (chain_offset, 0));
+
+ /* The offset of the bottom of the save area from the current SP. */
+ poly_int64 saved_regs_offset = chain_offset - below_hard_fp_saved_regs_size;
+
if (emit_frame_chain)
{
- poly_int64 reg_offset = callee_adjust;
if (callee_adjust == 0)
{
reg1 = R29_REGNUM;
reg2 = R30_REGNUM;
- reg_offset = callee_offset;
- aarch64_save_callee_saves (DImode, reg_offset, reg1, reg2, false);
+ aarch64_save_callee_saves (saved_regs_offset, reg1, reg2,
+ false, false);
}
+ else
+ gcc_assert (known_eq (chain_offset, 0));
aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
- stack_pointer_rtx, callee_offset,
+ stack_pointer_rtx, chain_offset,
tmp1_rtx, tmp0_rtx, frame_pointer_needed);
if (frame_pointer_needed && !frame_size.is_constant ())
{
/* Change the save slot expressions for the registers that
we've already saved. */
- reg_offset -= callee_offset;
- aarch64_add_cfa_expression (insn, reg2, hard_frame_pointer_rtx,
- reg_offset + UNITS_PER_WORD);
- aarch64_add_cfa_expression (insn, reg1, hard_frame_pointer_rtx,
- reg_offset);
+ aarch64_add_cfa_expression (insn, regno_reg_rtx[reg2],
+ hard_frame_pointer_rtx, UNITS_PER_WORD);
+ aarch64_add_cfa_expression (insn, regno_reg_rtx[reg1],
+ hard_frame_pointer_rtx, 0);
}
emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
}
- aarch64_save_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
- callee_adjust != 0 || emit_frame_chain);
- if (crtl->abi->id () == ARM_PCS_SIMD)
- aarch64_save_callee_saves (TFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0 || emit_frame_chain);
- else
- aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0 || emit_frame_chain);
+ aarch64_save_callee_saves (saved_regs_offset, R0_REGNUM, R30_REGNUM,
+ callee_adjust != 0 || emit_frame_chain,
+ emit_frame_chain);
+ if (maybe_ne (sve_callee_adjust, 0))
+ {
+ gcc_assert (!flag_stack_clash_protection
+ || known_eq (initial_adjust, 0));
+ aarch64_allocate_and_probe_stack_space (tmp1_rtx, tmp0_rtx,
+ sve_callee_adjust,
+ !frame_pointer_needed, false);
+ saved_regs_offset += sve_callee_adjust;
+ }
+ aarch64_save_callee_saves (saved_regs_offset, P0_REGNUM, P15_REGNUM,
+ false, emit_frame_chain);
+ aarch64_save_callee_saves (saved_regs_offset, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0 || emit_frame_chain,
+ emit_frame_chain);
/* We may need to probe the final adjustment if it is larger than the guard
 that is assumed by the caller. */
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
poly_int64 final_adjust = cfun->machine->frame.final_adjust;
poly_int64 callee_offset = cfun->machine->frame.callee_offset;
+ poly_int64 sve_callee_adjust = cfun->machine->frame.sve_callee_adjust;
+ poly_int64 below_hard_fp_saved_regs_size
+ = cfun->machine->frame.below_hard_fp_saved_regs_size;
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
rtx cfi_ops = NULL;
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
- /* We can re-use the registers when the allocation amount is smaller than
- guard_size - guard_used_by_caller because we won't be doing any probes
- then. In such situations the register should remain live with the correct
+ /* We can re-use the registers when:
+
+ (a) the deallocation amount is the same as the corresponding
+ allocation amount (which is false if we combine the initial
+ and SVE callee save allocations in the prologue); and
+
+ (b) the allocation amount doesn't need a probe (which is false
+ if the amount is guard_size - guard_used_by_caller or greater).
+
+ In such situations the register should remain live with the correct
value. */
bool can_inherit_p = (initial_adjust.is_constant ()
- && final_adjust.is_constant ())
+ && final_adjust.is_constant ()
&& (!flag_stack_clash_protection
- || known_lt (initial_adjust,
- guard_size - guard_used_by_caller));
+ || (known_lt (initial_adjust,
+ guard_size - guard_used_by_caller)
+ && known_eq (sve_callee_adjust, 0))));
 /* We need to add a memory barrier to prevent reads from the deallocated stack. */
bool need_barrier_p
/* If writeback is used when restoring callee-saves, the CFA
is restored on the instruction doing the writeback. */
aarch64_add_offset (Pmode, stack_pointer_rtx,
- hard_frame_pointer_rtx, -callee_offset,
+ hard_frame_pointer_rtx,
+ -callee_offset - below_hard_fp_saved_regs_size,
tmp1_rtx, tmp0_rtx, callee_adjust == 0);
else
/* The case where we need to re-use the register here is very rare, so
immediate doesn't fit. */
aarch64_add_sp (tmp1_rtx, tmp0_rtx, final_adjust, true);
- aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
+ /* Restore the vector registers before the predicate registers,
+ so that we can use P4 as a temporary for big-endian SVE frames. */
+ aarch64_restore_callee_saves (callee_offset, V0_REGNUM, V31_REGNUM,
+ callee_adjust != 0, &cfi_ops);
+ aarch64_restore_callee_saves (callee_offset, P0_REGNUM, P15_REGNUM,
+ false, &cfi_ops);
+ if (maybe_ne (sve_callee_adjust, 0))
+ aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true);
+ aarch64_restore_callee_saves (callee_offset - sve_callee_adjust,
+ R0_REGNUM, R30_REGNUM,
callee_adjust != 0, &cfi_ops);
- if (crtl->abi->id () == ARM_PCS_SIMD)
- aarch64_restore_callee_saves (TFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0, &cfi_ops);
- else
- aarch64_restore_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
- callee_adjust != 0, &cfi_ops);
if (need_barrier_p)
emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
secondary_reload_info *sri)
{
/* Use aarch64_sve_reload_be for SVE reloads that cannot be handled
- directly by the *aarch64_sve_mov<mode>_be move pattern. See the
+ directly by the *aarch64_sve_mov<mode>_[lb]e move patterns. See the
comment at the head of aarch64-sve.md for more details about the
big-endian handling. */
if (BYTES_BIG_ENDIAN
&& reg_class_subset_p (rclass, FP_REGS)
&& !((REG_P (x) && HARD_REGISTER_P (x))
|| aarch64_simd_valid_immediate (x, NULL))
+ && mode != VNx16QImode
&& aarch64_sve_data_mode_p (mode))
{
sri->icode = CODE_FOR_aarch64_sve_reload_be;
machine_mode mode;
HOST_WIDE_INT size;
+ /* SVE types (and types containing SVE types) must be handled
+ before calling this function. */
+ gcc_assert (!aarch64_sve::builtin_type_p (type));
+
switch (TREE_CODE (type))
{
case REAL_TYPE:
{
poly_int64 size = -1;
+ if (type && aarch64_sve::builtin_type_p (type))
+ return false;
+
if (type && TREE_CODE (type) == VECTOR_TYPE)
size = int_size_in_bytes (type);
else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
int *count,
bool *is_ha)
{
+ if (is_ha != NULL) *is_ha = false;
+
+ if (type && aarch64_sve::builtin_type_p (type))
+ return false;
+
machine_mode new_mode = VOIDmode;
bool composite_p = aarch64_composite_type_p (type, mode);
- if (is_ha != NULL) *is_ha = false;
-
if ((!composite_p && GET_MODE_CLASS (mode) == MODE_FLOAT)
|| aarch64_short_vector_p (type, mode))
{
static void
aarch64_asm_output_variant_pcs (FILE *stream, const tree decl, const char* name)
{
- if (aarch64_simd_decl_p (decl))
+ if (TREE_CODE (decl) == FUNCTION_DECL)
{
- fprintf (stream, "\t.variant_pcs\t");
- assemble_name (stream, name);
- fprintf (stream, "\n");
+ arm_pcs pcs = (arm_pcs) fndecl_abi (decl).id ();
+ if (pcs == ARM_PCS_SIMD || pcs == ARM_PCS_SVE)
+ {
+ fprintf (stream, "\t.variant_pcs\t");
+ assemble_name (stream, name);
+ fprintf (stream, "\n");
+ }
}
}
#undef TARGET_ASM_POST_CFI_STARTPROC
#define TARGET_ASM_POST_CFI_STARTPROC aarch64_post_cfi_startproc
+#undef TARGET_STRICT_ARGUMENT_NAMING
+#define TARGET_STRICT_ARGUMENT_NAMING hook_bool_CUMULATIVE_ARGS_true
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-aarch64.h"
#define ARG_POINTER_REGNUM AP_REGNUM
#define FIRST_PSEUDO_REGISTER (FFRT_REGNUM + 1)
-/* The number of (integer) argument register available. */
+/* The number of argument registers available for each class. */
#define NUM_ARG_REGS 8
#define NUM_FP_ARG_REGS 8
+#define NUM_PR_ARG_REGS 4
/* A Homogeneous Floating-Point or Short-Vector Aggregate may have at most
four members. */
#ifdef HAVE_POLY_INT_H
struct GTY (()) aarch64_frame
{
- HOST_WIDE_INT reg_offset[FIRST_PSEUDO_REGISTER];
+ poly_int64 reg_offset[LAST_SAVED_REGNUM + 1];
/* The number of extra stack bytes taken up by register varargs.
This area is allocated by the callee at the very top of the
STACK_BOUNDARY. */
HOST_WIDE_INT saved_varargs_size;
- /* The size of the saved callee-save int/FP registers. */
- HOST_WIDE_INT saved_regs_size;
+ /* The size of the callee-save registers with a slot in REG_OFFSET. */
+ poly_int64 saved_regs_size;
+ /* The size of the callee-save registers with a slot in REG_OFFSET that
+    are saved below the hard frame pointer. */
+ poly_int64 below_hard_fp_saved_regs_size;
 /* Offset from the base of the frame (incoming SP) to the
 top of the locals area. This value is always a multiple of
 It may be non-zero if no push is used (i.e. callee_adjust == 0). */
poly_int64 callee_offset;
+ /* The size of the stack adjustment before saving or after restoring
+ SVE registers. */
+ poly_int64 sve_callee_adjust;
+
/* The size of the stack adjustment after saving callee-saves. */
poly_int64 final_adjust;
unsigned wb_candidate1;
unsigned wb_candidate2;
+ /* Big-endian SVE frames need a spare predicate register in order
+ to save vector registers in the correct layout for unwinding.
+ This is the register they should use. */
+ unsigned spare_pred_reg;
+
bool laid_out;
};
{
ARM_PCS_AAPCS64, /* Base standard AAPCS for 64 bit. */
ARM_PCS_SIMD, /* For aarch64_vector_pcs functions. */
+ ARM_PCS_SVE, /* For functions that pass or return
+ values in SVE registers. */
ARM_PCS_TLSDESC, /* For targets of tlsdesc calls. */
ARM_PCS_UNKNOWN
};
int aapcs_nextncrn; /* Next next core register number. */
int aapcs_nvrn; /* Next Vector register number. */
int aapcs_nextnvrn; /* Next Next Vector register number. */
+ int aapcs_nprn; /* Next Predicate register number. */
+ int aapcs_nextnprn; /* Next Next Predicate register number. */
rtx aapcs_reg; /* Register assigned to this argument. This
is NULL_RTX if this parameter goes on
the stack. */
aapcs_reg == NULL_RTX. */
int aapcs_stack_size; /* The total size (in words, per 8 byte) of the
stack arg area so far. */
+ bool silent_p; /* True if we should act silently, rather than
+ raise an error for invalid calls. */
} CUMULATIVE_ARGS;
#endif
#define BITS_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 64))
#define BYTES_PER_SVE_VECTOR (poly_uint16 (aarch64_sve_vg * 8))
-/* The number of bytes in an SVE predicate. */
+/* The number of bits and bytes in an SVE predicate. */
+#define BITS_PER_SVE_PRED BYTES_PER_SVE_VECTOR
#define BYTES_PER_SVE_PRED aarch64_sve_vg
/* The SVE mode for a vector of bytes. */
(V29_REGNUM 61)
(V30_REGNUM 62)
(V31_REGNUM 63)
- (LAST_SAVED_REGNUM 63)
(SFP_REGNUM 64)
(AP_REGNUM 65)
(CC_REGNUM 66)
(P13_REGNUM 81)
(P14_REGNUM 82)
(P15_REGNUM 83)
+ (LAST_SAVED_REGNUM 83)
(FFR_REGNUM 84)
;; "FFR token": a fake register used for representing the scheduling
;; restrictions on FFR-related operations.
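For readers unfamiliar with the new ARM_PCS_SVE classification, here is a small stand-alone example (not part of the patch, and assuming the ACLE types and intrinsics from <arm_sve.h>) of the kind of function the descriptor applies to.  The first eight data-vector arguments are passed in z0-z7 and the first four predicate arguments in p0-p3 (matching NUM_FP_ARG_REGS and NUM_PR_ARG_REGS above), and an SVE return value comes back in z0, so the function is emitted with a .variant_pcs directive:

#include <arm_sve.h>

/* Illustrative only: pg arrives in p0, x in z0 and y in z1; the result is
   returned in z0.  Because values are passed and returned in SVE registers,
   the function uses the new ARM_PCS_SVE ABI.  */
svint32_t
add_if (svbool_t pg, svint32_t x, svint32_t y)
{
  return svadd_s32_m (pg, x, y);
}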
+2019-10-29 Richard Sandiford <richard.sandiford@arm.com>
+
+ * gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp: New file.
+ * gcc.target/aarch64/sve/pcs/annotate_1.c: New test.
+ * gcc.target/aarch64/sve/pcs/annotate_2.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/annotate_3.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/annotate_4.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/annotate_5.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/annotate_6.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/annotate_7.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_1.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_10.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_11_nosc.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_11_sc.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_2.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_3.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_4.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_f16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_f32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_f64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_s16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_s32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_s64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_s8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_u16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_u32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_u64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_5_le_u8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_f16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_f32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_f64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_s16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_s32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_s64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_s8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_u16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_u32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_u64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_be_u8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_f16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_f32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_f64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_s16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_s32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_s64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_s8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_u16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_u32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_u64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_6_le_u8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_7.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/args_9.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/nosve_1.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/nosve_2.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/nosve_3.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/nosve_4.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/nosve_5.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/nosve_6.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/nosve_7.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/nosve_8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_1.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_1_1024.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_1_2048.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_1_256.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_1_512.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_2.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_3.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_4.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_4_1024.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_4_2048.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_4_256.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_4_512.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_5.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_5_1024.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_5_2048.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_5_256.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_5_512.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_6.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_7.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/return_9.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_1_be_nowrap.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_1_be_wrap.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_1_le_nowrap.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_1_le_wrap.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_2_be_nowrap.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_2_be_wrap.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_2_le_nowrap.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_2_le_wrap.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_3.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_4_be.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_4_le.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_5_be.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/saves_5_le.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_1.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_1_256.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_1_512.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_1_1024.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_1_2048.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_2.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/stack_clash_3.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/unprototyped_1.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_f16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_f32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_f64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_s16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_s32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_s64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_s8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_u16.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_u32.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_u64.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_2_u8.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_3_nosc.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/varargs_3_sc.c: Likewise.
+ * gcc.target/aarch64/sve/pcs/vpcs_1.c: Likewise.
+ * g++.target/aarch64/sve/catch_7.C: Likewise.
+
2019-10-29 Richard Sandiford <richard.sandiford@arm.com>
Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
--- /dev/null
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O" } */
+
+#include <arm_sve.h>
+
+void __attribute__ ((noipa))
+f1 (void)
+{
+ throw 1;
+}
+
+void __attribute__ ((noipa))
+f2 (svbool_t)
+{
+ register svint8_t z8 asm ("z8") = svindex_s8 (11, 1);
+ asm volatile ("" :: "w" (z8));
+ f1 ();
+}
+
+void __attribute__ ((noipa))
+f3 (int n)
+{
+ register double d8 asm ("v8") = 42.0;
+ for (int i = 0; i < n; ++i)
+ {
+ asm volatile ("" : "=w" (d8) : "w" (d8));
+ try { f2 (svptrue_b8 ()); } catch (int) { break; }
+ }
+ if (d8 != 42.0)
+ __builtin_abort ();
+}
+
+int
+main (void)
+{
+ f3 (100);
+ return 0;
+}
--- /dev/null
+# Specific regression driver for AArch64 SVE.
+# Copyright (C) 2009-2019 Free Software Foundation, Inc.
+# Contributed by ARM Ltd.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+ return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+ set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# Initialize `dg'.
+dg-init
+
+# Force SVE if we're not testing it already.
+if { [check_effective_target_aarch64_sve] } {
+ set sve_flags ""
+} else {
+ set sve_flags "-march=armv8.2-a+sve"
+}
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] \
+ $sve_flags $DEFAULT_CFLAGS
+
+# All done.
+dg-finish
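Not part of the patch: once the new driver is in the source tree, the tests can be run in isolation with the standard GCC/DejaGnu selector, for example make check-gcc RUNTESTFLAGS="aarch64-sve-pcs.exp" from the gcc build directory, or narrowed further with RUNTESTFLAGS="aarch64-sve-pcs.exp=annotate_*.c".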
--- /dev/null
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+svbool_t ret_b (void) { return svptrue_b8 (); }
+
+svint8_t ret_s8 (void) { return svdup_s8 (0); }
+svint16_t ret_s16 (void) { return svdup_s16 (0); }
+svint32_t ret_s32 (void) { return svdup_s32 (0); }
+svint64_t ret_s64 (void) { return svdup_s64 (0); }
+svuint8_t ret_u8 (void) { return svdup_u8 (0); }
+svuint16_t ret_u16 (void) { return svdup_u16 (0); }
+svuint32_t ret_u32 (void) { return svdup_u32 (0); }
+svuint64_t ret_u64 (void) { return svdup_u64 (0); }
+svfloat16_t ret_f16 (void) { return svdup_f16 (0); }
+svfloat32_t ret_f32 (void) { return svdup_f32 (0); }
+svfloat64_t ret_f64 (void) { return svdup_f64 (0); }
+
+svint8x2_t ret_s8x2 (void) { return svundef2_s8 (); }
+svint16x2_t ret_s16x2 (void) { return svundef2_s16 (); }
+svint32x2_t ret_s32x2 (void) { return svundef2_s32 (); }
+svint64x2_t ret_s64x2 (void) { return svundef2_s64 (); }
+svuint8x2_t ret_u8x2 (void) { return svundef2_u8 (); }
+svuint16x2_t ret_u16x2 (void) { return svundef2_u16 (); }
+svuint32x2_t ret_u32x2 (void) { return svundef2_u32 (); }
+svuint64x2_t ret_u64x2 (void) { return svundef2_u64 (); }
+svfloat16x2_t ret_f16x2 (void) { return svundef2_f16 (); }
+svfloat32x2_t ret_f32x2 (void) { return svundef2_f32 (); }
+svfloat64x2_t ret_f64x2 (void) { return svundef2_f64 (); }
+
+svint8x3_t ret_s8x3 (void) { return svundef3_s8 (); }
+svint16x3_t ret_s16x3 (void) { return svundef3_s16 (); }
+svint32x3_t ret_s32x3 (void) { return svundef3_s32 (); }
+svint64x3_t ret_s64x3 (void) { return svundef3_s64 (); }
+svuint8x3_t ret_u8x3 (void) { return svundef3_u8 (); }
+svuint16x3_t ret_u16x3 (void) { return svundef3_u16 (); }
+svuint32x3_t ret_u32x3 (void) { return svundef3_u32 (); }
+svuint64x3_t ret_u64x3 (void) { return svundef3_u64 (); }
+svfloat16x3_t ret_f16x3 (void) { return svundef3_f16 (); }
+svfloat32x3_t ret_f32x3 (void) { return svundef3_f32 (); }
+svfloat64x3_t ret_f64x3 (void) { return svundef3_f64 (); }
+
+svint8x4_t ret_s8x4 (void) { return svundef4_s8 (); }
+svint16x4_t ret_s16x4 (void) { return svundef4_s16 (); }
+svint32x4_t ret_s32x4 (void) { return svundef4_s32 (); }
+svint64x4_t ret_s64x4 (void) { return svundef4_s64 (); }
+svuint8x4_t ret_u8x4 (void) { return svundef4_u8 (); }
+svuint16x4_t ret_u16x4 (void) { return svundef4_u16 (); }
+svuint32x4_t ret_u32x4 (void) { return svundef4_u32 (); }
+svuint64x4_t ret_u64x4 (void) { return svundef4_u64 (); }
+svfloat16x4_t ret_f16x4 (void) { return svundef4_f16 (); }
+svfloat32x4_t ret_f32x4 (void) { return svundef4_f32 (); }
+svfloat64x4_t ret_f64x4 (void) { return svundef4_f64 (); }
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_b\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x2\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s8x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u8x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x3\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s8x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_s64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u8x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_u64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tret_f64x4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+void fn_b (svbool_t x) {}
+
+void fn_s8 (svint8_t x) {}
+void fn_s16 (svint16_t x) {}
+void fn_s32 (svint32_t x) {}
+void fn_s64 (svint64_t x) {}
+void fn_u8 (svuint8_t x) {}
+void fn_u16 (svuint16_t x) {}
+void fn_u32 (svuint32_t x) {}
+void fn_u64 (svuint64_t x) {}
+void fn_f16 (svfloat16_t x) {}
+void fn_f32 (svfloat32_t x) {}
+void fn_f64 (svfloat64_t x) {}
+
+void fn_s8x2 (svint8x2_t x) {}
+void fn_s16x2 (svint16x2_t x) {}
+void fn_s32x2 (svint32x2_t x) {}
+void fn_s64x2 (svint64x2_t x) {}
+void fn_u8x2 (svuint8x2_t x) {}
+void fn_u16x2 (svuint16x2_t x) {}
+void fn_u32x2 (svuint32x2_t x) {}
+void fn_u64x2 (svuint64x2_t x) {}
+void fn_f16x2 (svfloat16x2_t x) {}
+void fn_f32x2 (svfloat32x2_t x) {}
+void fn_f64x2 (svfloat64x2_t x) {}
+
+void fn_s8x3 (svint8x3_t x) {}
+void fn_s16x3 (svint16x3_t x) {}
+void fn_s32x3 (svint32x3_t x) {}
+void fn_s64x3 (svint64x3_t x) {}
+void fn_u8x3 (svuint8x3_t x) {}
+void fn_u16x3 (svuint16x3_t x) {}
+void fn_u32x3 (svuint32x3_t x) {}
+void fn_u64x3 (svuint64x3_t x) {}
+void fn_f16x3 (svfloat16x3_t x) {}
+void fn_f32x3 (svfloat32x3_t x) {}
+void fn_f64x3 (svfloat64x3_t x) {}
+
+void fn_s8x4 (svint8x4_t x) {}
+void fn_s16x4 (svint16x4_t x) {}
+void fn_s32x4 (svint32x4_t x) {}
+void fn_s64x4 (svint64x4_t x) {}
+void fn_u8x4 (svuint8x4_t x) {}
+void fn_u16x4 (svuint16x4_t x) {}
+void fn_u32x4 (svuint32x4_t x) {}
+void fn_u64x4 (svuint64x4_t x) {}
+void fn_f16x4 (svfloat16x4_t x) {}
+void fn_f32x4 (svfloat32x4_t x) {}
+void fn_f64x4 (svfloat64x4_t x) {}
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_b\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+void fn_s8 (float d0, float d1, float d2, float d3, svint8_t x) {}
+void fn_s16 (float d0, float d1, float d2, float d3, svint16_t x) {}
+void fn_s32 (float d0, float d1, float d2, float d3, svint32_t x) {}
+void fn_s64 (float d0, float d1, float d2, float d3, svint64_t x) {}
+void fn_u8 (float d0, float d1, float d2, float d3, svuint8_t x) {}
+void fn_u16 (float d0, float d1, float d2, float d3, svuint16_t x) {}
+void fn_u32 (float d0, float d1, float d2, float d3, svuint32_t x) {}
+void fn_u64 (float d0, float d1, float d2, float d3, svuint64_t x) {}
+void fn_f16 (float d0, float d1, float d2, float d3, svfloat16_t x) {}
+void fn_f32 (float d0, float d1, float d2, float d3, svfloat32_t x) {}
+void fn_f64 (float d0, float d1, float d2, float d3, svfloat64_t x) {}
+
+void fn_s8x2 (float d0, float d1, float d2, float d3, svint8x2_t x) {}
+void fn_s16x2 (float d0, float d1, float d2, float d3, svint16x2_t x) {}
+void fn_s32x2 (float d0, float d1, float d2, float d3, svint32x2_t x) {}
+void fn_s64x2 (float d0, float d1, float d2, float d3, svint64x2_t x) {}
+void fn_u8x2 (float d0, float d1, float d2, float d3, svuint8x2_t x) {}
+void fn_u16x2 (float d0, float d1, float d2, float d3, svuint16x2_t x) {}
+void fn_u32x2 (float d0, float d1, float d2, float d3, svuint32x2_t x) {}
+void fn_u64x2 (float d0, float d1, float d2, float d3, svuint64x2_t x) {}
+void fn_f16x2 (float d0, float d1, float d2, float d3, svfloat16x2_t x) {}
+void fn_f32x2 (float d0, float d1, float d2, float d3, svfloat32x2_t x) {}
+void fn_f64x2 (float d0, float d1, float d2, float d3, svfloat64x2_t x) {}
+
+void fn_s8x3 (float d0, float d1, float d2, float d3, svint8x3_t x) {}
+void fn_s16x3 (float d0, float d1, float d2, float d3, svint16x3_t x) {}
+void fn_s32x3 (float d0, float d1, float d2, float d3, svint32x3_t x) {}
+void fn_s64x3 (float d0, float d1, float d2, float d3, svint64x3_t x) {}
+void fn_u8x3 (float d0, float d1, float d2, float d3, svuint8x3_t x) {}
+void fn_u16x3 (float d0, float d1, float d2, float d3, svuint16x3_t x) {}
+void fn_u32x3 (float d0, float d1, float d2, float d3, svuint32x3_t x) {}
+void fn_u64x3 (float d0, float d1, float d2, float d3, svuint64x3_t x) {}
+void fn_f16x3 (float d0, float d1, float d2, float d3, svfloat16x3_t x) {}
+void fn_f32x3 (float d0, float d1, float d2, float d3, svfloat32x3_t x) {}
+void fn_f64x3 (float d0, float d1, float d2, float d3, svfloat64x3_t x) {}
+
+void fn_s8x4 (float d0, float d1, float d2, float d3, svint8x4_t x) {}
+void fn_s16x4 (float d0, float d1, float d2, float d3, svint16x4_t x) {}
+void fn_s32x4 (float d0, float d1, float d2, float d3, svint32x4_t x) {}
+void fn_s64x4 (float d0, float d1, float d2, float d3, svint64x4_t x) {}
+void fn_u8x4 (float d0, float d1, float d2, float d3, svuint8x4_t x) {}
+void fn_u16x4 (float d0, float d1, float d2, float d3, svuint16x4_t x) {}
+void fn_u32x4 (float d0, float d1, float d2, float d3, svuint32x4_t x) {}
+void fn_u64x4 (float d0, float d1, float d2, float d3, svuint64x4_t x) {}
+void fn_f16x4 (float d0, float d1, float d2, float d3, svfloat16x4_t x) {}
+void fn_f32x4 (float d0, float d1, float d2, float d3, svfloat32x4_t x) {}
+void fn_f64x4 (float d0, float d1, float d2, float d3, svfloat64x4_t x) {}
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x4\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+void fn_s8 (float d0, float d1, float d2, float d3,
+ float d4, svint8_t x) {}
+void fn_s16 (float d0, float d1, float d2, float d3,
+ float d4, svint16_t x) {}
+void fn_s32 (float d0, float d1, float d2, float d3,
+ float d4, svint32_t x) {}
+void fn_s64 (float d0, float d1, float d2, float d3,
+ float d4, svint64_t x) {}
+void fn_u8 (float d0, float d1, float d2, float d3,
+ float d4, svuint8_t x) {}
+void fn_u16 (float d0, float d1, float d2, float d3,
+ float d4, svuint16_t x) {}
+void fn_u32 (float d0, float d1, float d2, float d3,
+ float d4, svuint32_t x) {}
+void fn_u64 (float d0, float d1, float d2, float d3,
+ float d4, svuint64_t x) {}
+void fn_f16 (float d0, float d1, float d2, float d3,
+ float d4, svfloat16_t x) {}
+void fn_f32 (float d0, float d1, float d2, float d3,
+ float d4, svfloat32_t x) {}
+void fn_f64 (float d0, float d1, float d2, float d3,
+ float d4, svfloat64_t x) {}
+
+void fn_s8x2 (float d0, float d1, float d2, float d3,
+ float d4, svint8x2_t x) {}
+void fn_s16x2 (float d0, float d1, float d2, float d3,
+ float d4, svint16x2_t x) {}
+void fn_s32x2 (float d0, float d1, float d2, float d3,
+ float d4, svint32x2_t x) {}
+void fn_s64x2 (float d0, float d1, float d2, float d3,
+ float d4, svint64x2_t x) {}
+void fn_u8x2 (float d0, float d1, float d2, float d3,
+ float d4, svuint8x2_t x) {}
+void fn_u16x2 (float d0, float d1, float d2, float d3,
+ float d4, svuint16x2_t x) {}
+void fn_u32x2 (float d0, float d1, float d2, float d3,
+ float d4, svuint32x2_t x) {}
+void fn_u64x2 (float d0, float d1, float d2, float d3,
+ float d4, svuint64x2_t x) {}
+void fn_f16x2 (float d0, float d1, float d2, float d3,
+ float d4, svfloat16x2_t x) {}
+void fn_f32x2 (float d0, float d1, float d2, float d3,
+ float d4, svfloat32x2_t x) {}
+void fn_f64x2 (float d0, float d1, float d2, float d3,
+ float d4, svfloat64x2_t x) {}
+
+void fn_s8x3 (float d0, float d1, float d2, float d3,
+ float d4, svint8x3_t x) {}
+void fn_s16x3 (float d0, float d1, float d2, float d3,
+ float d4, svint16x3_t x) {}
+void fn_s32x3 (float d0, float d1, float d2, float d3,
+ float d4, svint32x3_t x) {}
+void fn_s64x3 (float d0, float d1, float d2, float d3,
+ float d4, svint64x3_t x) {}
+void fn_u8x3 (float d0, float d1, float d2, float d3,
+ float d4, svuint8x3_t x) {}
+void fn_u16x3 (float d0, float d1, float d2, float d3,
+ float d4, svuint16x3_t x) {}
+void fn_u32x3 (float d0, float d1, float d2, float d3,
+ float d4, svuint32x3_t x) {}
+void fn_u64x3 (float d0, float d1, float d2, float d3,
+ float d4, svuint64x3_t x) {}
+void fn_f16x3 (float d0, float d1, float d2, float d3,
+ float d4, svfloat16x3_t x) {}
+void fn_f32x3 (float d0, float d1, float d2, float d3,
+ float d4, svfloat32x3_t x) {}
+void fn_f64x3 (float d0, float d1, float d2, float d3,
+ float d4, svfloat64x3_t x) {}
+
+void fn_s8x4 (float d0, float d1, float d2, float d3,
+ float d4, svint8x4_t x) {}
+void fn_s16x4 (float d0, float d1, float d2, float d3,
+ float d4, svint16x4_t x) {}
+void fn_s32x4 (float d0, float d1, float d2, float d3,
+ float d4, svint32x4_t x) {}
+void fn_s64x4 (float d0, float d1, float d2, float d3,
+ float d4, svint64x4_t x) {}
+void fn_u8x4 (float d0, float d1, float d2, float d3,
+ float d4, svuint8x4_t x) {}
+void fn_u16x4 (float d0, float d1, float d2, float d3,
+ float d4, svuint16x4_t x) {}
+void fn_u32x4 (float d0, float d1, float d2, float d3,
+ float d4, svuint32x4_t x) {}
+void fn_u64x4 (float d0, float d1, float d2, float d3,
+ float d4, svuint64x4_t x) {}
+void fn_f16x4 (float d0, float d1, float d2, float d3,
+ float d4, svfloat16x4_t x) {}
+void fn_f32x4 (float d0, float d1, float d2, float d3,
+ float d4, svfloat32x4_t x) {}
+void fn_f64x4 (float d0, float d1, float d2, float d3,
+ float d4, svfloat64x4_t x) {}
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x3\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x3\n} } } */
+
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+void fn_s8 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint8_t x) {}
+void fn_s16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint16_t x) {}
+void fn_s32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint32_t x) {}
+void fn_s64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint64_t x) {}
+void fn_u8 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint8_t x) {}
+void fn_u16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint16_t x) {}
+void fn_u32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint32_t x) {}
+void fn_u64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint64_t x) {}
+void fn_f16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat16_t x) {}
+void fn_f32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat32_t x) {}
+void fn_f64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat64_t x) {}
+
+void fn_s8x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint8x2_t x) {}
+void fn_s16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint16x2_t x) {}
+void fn_s32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint32x2_t x) {}
+void fn_s64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint64x2_t x) {}
+void fn_u8x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint8x2_t x) {}
+void fn_u16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint16x2_t x) {}
+void fn_u32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint32x2_t x) {}
+void fn_u64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint64x2_t x) {}
+void fn_f16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat16x2_t x) {}
+void fn_f32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat32x2_t x) {}
+void fn_f64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat64x2_t x) {}
+
+void fn_s8x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint8x3_t x) {}
+void fn_s16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint16x3_t x) {}
+void fn_s32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint32x3_t x) {}
+void fn_s64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint64x3_t x) {}
+void fn_u8x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint8x3_t x) {}
+void fn_u16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint16x3_t x) {}
+void fn_u32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint32x3_t x) {}
+void fn_u64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint64x3_t x) {}
+void fn_f16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat16x3_t x) {}
+void fn_f32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat32x3_t x) {}
+void fn_f64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat64x3_t x) {}
+
+void fn_s8x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint8x4_t x) {}
+void fn_s16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint16x4_t x) {}
+void fn_s32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint32x4_t x) {}
+void fn_s64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svint64x4_t x) {}
+void fn_u8x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint8x4_t x) {}
+void fn_u16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint16x4_t x) {}
+void fn_u32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint32x4_t x) {}
+void fn_u64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svuint64x4_t x) {}
+void fn_f16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat16x4_t x) {}
+void fn_f32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat32x4_t x) {}
+void fn_f64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, svfloat64x4_t x) {}
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32x2\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64x2\n} } } */
+
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x3\n} } } */
+
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
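Editorial gloss, not part of the test sources: with six of the eight FP/SIMD argument registers already consumed by the leading float parameters, only z6 and z7 remain, so single vectors and two-vector tuples are still passed in SVE registers and keep the .variant_pcs marking, while three- and four-vector tuples no longer fit, are passed by reference, and therefore lose it.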
--- /dev/null
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+void fn_s8 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint8_t x) {}
+void fn_s16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint16_t x) {}
+void fn_s32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint32_t x) {}
+void fn_s64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint64_t x) {}
+void fn_u8 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint8_t x) {}
+void fn_u16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint16_t x) {}
+void fn_u32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint32_t x) {}
+void fn_u64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint64_t x) {}
+void fn_f16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat16_t x) {}
+void fn_f32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat32_t x) {}
+void fn_f64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat64_t x) {}
+
+void fn_s8x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint8x2_t x) {}
+void fn_s16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint16x2_t x) {}
+void fn_s32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint32x2_t x) {}
+void fn_s64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint64x2_t x) {}
+void fn_u8x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint8x2_t x) {}
+void fn_u16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint16x2_t x) {}
+void fn_u32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint32x2_t x) {}
+void fn_u64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint64x2_t x) {}
+void fn_f16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat16x2_t x) {}
+void fn_f32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat32x2_t x) {}
+void fn_f64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat64x2_t x) {}
+
+void fn_s8x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint8x3_t x) {}
+void fn_s16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint16x3_t x) {}
+void fn_s32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint32x3_t x) {}
+void fn_s64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint64x3_t x) {}
+void fn_u8x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint8x3_t x) {}
+void fn_u16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint16x3_t x) {}
+void fn_u32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint32x3_t x) {}
+void fn_u64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint64x3_t x) {}
+void fn_f16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat16x3_t x) {}
+void fn_f32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat32x3_t x) {}
+void fn_f64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat64x3_t x) {}
+
+void fn_s8x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint8x4_t x) {}
+void fn_s16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint16x4_t x) {}
+void fn_s32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint32x4_t x) {}
+void fn_s64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svint64x4_t x) {}
+void fn_u8x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint8x4_t x) {}
+void fn_u16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint16x4_t x) {}
+void fn_u32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint32x4_t x) {}
+void fn_u64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svuint64x4_t x) {}
+void fn_f16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat16x4_t x) {}
+void fn_f32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat32x4_t x) {}
+void fn_f64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, svfloat64x4_t x) {}
+
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_s64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u8\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_u64\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f16\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f32\n} } } */
+/* { dg-final { scan-assembler {\t\.variant_pcs\tfn_f64\n} } } */
+
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x2\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x2\n} } } */
+
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x3\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x3\n} } } */
+
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s8x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_s64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u8x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_u64x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f16x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f32x4\n} } } */
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\tfn_f64x4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+void fn_s8 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint8_t x) {}
+void fn_s16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint16_t x) {}
+void fn_s32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint32_t x) {}
+void fn_s64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint64_t x) {}
+void fn_u8 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint8_t x) {}
+void fn_u16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint16_t x) {}
+void fn_u32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint32_t x) {}
+void fn_u64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint64_t x) {}
+void fn_f16 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat16_t x) {}
+void fn_f32 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat32_t x) {}
+void fn_f64 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat64_t x) {}
+
+void fn_s8x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint8x2_t x) {}
+void fn_s16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint16x2_t x) {}
+void fn_s32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint32x2_t x) {}
+void fn_s64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint64x2_t x) {}
+void fn_u8x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint8x2_t x) {}
+void fn_u16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint16x2_t x) {}
+void fn_u32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint32x2_t x) {}
+void fn_u64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint64x2_t x) {}
+void fn_f16x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat16x2_t x) {}
+void fn_f32x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat32x2_t x) {}
+void fn_f64x2 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat64x2_t x) {}
+
+void fn_s8x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint8x3_t x) {}
+void fn_s16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint16x3_t x) {}
+void fn_s32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint32x3_t x) {}
+void fn_s64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint64x3_t x) {}
+void fn_u8x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint8x3_t x) {}
+void fn_u16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint16x3_t x) {}
+void fn_u32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint32x3_t x) {}
+void fn_u64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint64x3_t x) {}
+void fn_f16x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat16x3_t x) {}
+void fn_f32x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat32x3_t x) {}
+void fn_f64x3 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat64x3_t x) {}
+
+void fn_s8x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint8x4_t x) {}
+void fn_s16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint16x4_t x) {}
+void fn_s32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint32x4_t x) {}
+void fn_s64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svint64x4_t x) {}
+void fn_u8x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint8x4_t x) {}
+void fn_u16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint16x4_t x) {}
+void fn_u32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint32x4_t x) {}
+void fn_u64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svuint64x4_t x) {}
+void fn_f16x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat16x4_t x) {}
+void fn_f32x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat32x4_t x) {}
+void fn_f64x4 (float d0, float d1, float d2, float d3,
+ float d4, float d5, float d6, float d7, svfloat64x4_t x) {}
+
+/* { dg-final { scan-assembler-not {\t\.variant_pcs\t} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee_pred:
+** ldr (p[0-9]+), \[x0\]
+** ldr (p[0-9]+), \[x1\]
+** brkpa (p[0-7])\.b, p0/z, p1\.b, p2\.b
+** brkpb (p[0-7])\.b, \3/z, p3\.b, \1\.b
+** brka p0\.b, \4/z, \2\.b
+** ret
+*/
+__SVBool_t __attribute__((noipa))
+callee_pred (__SVBool_t p0, __SVBool_t p1, __SVBool_t p2, __SVBool_t p3,
+ __SVBool_t mem0, __SVBool_t mem1)
+{
+ p0 = svbrkpa_z (p0, p1, p2);
+ p0 = svbrkpb_z (p0, p3, mem0);
+ return svbrka_z (p0, mem1);
+}
+
+/*
+** caller_pred:
+** ...
+** ptrue (p[0-9]+)\.b, vl5
+** str \1, \[x0\]
+** ...
+** ptrue (p[0-9]+)\.h, vl6
+** str \2, \[x1\]
+** ptrue p3\.d, vl4
+** ptrue p2\.s, vl3
+** ptrue p1\.h, vl2
+** ptrue p0\.b, vl1
+** bl callee_pred
+** ...
+*/
+__SVBool_t __attribute__((noipa))
+caller_pred (void)
+{
+ return callee_pred (svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4),
+ svptrue_pat_b8 (SV_VL5),
+ svptrue_pat_b16 (SV_VL6));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee:
+** fadd s0, (s0, s6|s6, s0)
+** ret
+*/
+float __attribute__((noipa))
+callee (float s0, double d1, svfloat32x4_t z2, svfloat64x4_t stack1,
+ float s6, double d7)
+{
+ return s0 + s6;
+}
+
+float __attribute__((noipa))
+caller (float32_t *x0, float64_t *x1)
+{
+ return callee (0.0f, 1.0,
+ svld4 (svptrue_b8 (), x0),
+ svld4 (svptrue_b8 (), x1),
+ 6.0f, 7.0);
+}
+
+/* { dg-final { scan-assembler {\tld4w\t{z2\.s - z5\.s}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - z[0-9]+\.d}, p[0-7]/z, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tmovi\tv0\.[24]s, #0\n} } } */
+/* { dg-final { scan-assembler {\tfmov\td1, #?1\.0} } } */
+/* { dg-final { scan-assembler {\tfmov\ts6, #?6\.0} } } */
+/* { dg-final { scan-assembler {\tfmov\td7, #?7\.0} } } */
--- /dev/null
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O0 -g" } */
+
+#include <arm_sve.h>
+
+void __attribute__((noipa))
+callee (svbool_t p, svint8_t s8, svuint16x4_t u16, svfloat32x3_t f32,
+ svint64x2_t s64)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+
+ if (svptest_any (pg, sveor_z (pg, p, svptrue_pat_b8 (SV_VL7))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, s8, svindex_s8 (1, 2))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 0), svindex_u16 (2, 3))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 1), svindex_u16 (3, 4))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 2), svindex_u16 (4, 5))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 3), svindex_u16 (5, 6))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 0), svdup_f32 (1.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 1), svdup_f32 (2.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 2), svdup_f32 (3.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget2 (s64, 0), svindex_s64 (6, 7))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget2 (s64, 1), svindex_s64 (7, 8))))
+ __builtin_abort ();
+}
+
+int __attribute__((noipa))
+main (void)
+{
+ callee (svptrue_pat_b8 (SV_VL7),
+ svindex_s8 (1, 2),
+ svcreate4 (svindex_u16 (2, 3),
+ svindex_u16 (3, 4),
+ svindex_u16 (4, 5),
+ svindex_u16 (5, 6)),
+ svcreate3 (svdup_f32 (1.0),
+ svdup_f32 (2.0),
+ svdup_f32 (3.0)),
+ svcreate2 (svindex_s64 (6, 7),
+ svindex_s64 (7, 8)));
+}
--- /dev/null
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O0 -fstack-clash-protection -g" } */
+
+#include <arm_sve.h>
+
+void __attribute__((noipa))
+callee (svbool_t p, svint8_t s8, svuint16x4_t u16, svfloat32x3_t f32,
+ svint64x2_t s64)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+
+ if (svptest_any (pg, sveor_z (pg, p, svptrue_pat_b8 (SV_VL7))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, s8, svindex_s8 (1, 2))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 0), svindex_u16 (2, 3))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 1), svindex_u16 (3, 4))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 2), svindex_u16 (4, 5))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 3), svindex_u16 (5, 6))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 0), svdup_f32 (1.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 1), svdup_f32 (2.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 2), svdup_f32 (3.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget2 (s64, 0), svindex_s64 (6, 7))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget2 (s64, 1), svindex_s64 (7, 8))))
+ __builtin_abort ();
+}
+
+int __attribute__((noipa))
+main (void)
+{
+ callee (svptrue_pat_b8 (SV_VL7),
+ svindex_s8 (1, 2),
+ svcreate4 (svindex_u16 (2, 3),
+ svindex_u16 (3, 4),
+ svindex_u16 (4, 5),
+ svindex_u16 (5, 6)),
+ svcreate3 (svdup_f32 (1.0),
+ svdup_f32 (2.0),
+ svdup_f32 (3.0)),
+ svcreate2 (svindex_s64 (6, 7),
+ svindex_s64 (7, 8)));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee_int:
+** ptrue p3\.b, all
+** ld1b (z(?:2[4-9]|3[0-1])\.b), p3/z, \[x4\]
+** st1b \1, p2, \[x0\]
+** st1b z4\.b, p1, \[x0\]
+** st1h z5\.h, p1, \[x1\]
+** st1w z6\.s, p1, \[x2\]
+** st1d z7\.d, p1, \[x3\]
+** st1b z0\.b, p0, \[x0\]
+** st1h z1\.h, p0, \[x1\]
+** st1w z2\.s, p0, \[x2\]
+** st1d z3\.d, p0, \[x3\]
+** ret
+*/
+void __attribute__((noipa))
+callee_int (int8_t *x0, int16_t *x1, int32_t *x2, int64_t *x3,
+ svint8_t z0, svint16_t z1, svint32_t z2, svint64_t z3,
+ svint8_t z4, svint16_t z5, svint32_t z6, svint64_t z7,
+ svint8_t z8,
+ svbool_t p0, svbool_t p1, svbool_t p2)
+{
+ svst1 (p2, x0, z8);
+ svst1 (p1, x0, z4);
+ svst1 (p1, x1, z5);
+ svst1 (p1, x2, z6);
+ svst1 (p1, x3, z7);
+ svst1 (p0, x0, z0);
+ svst1 (p0, x1, z1);
+ svst1 (p0, x2, z2);
+ svst1 (p0, x3, z3);
+}
+
+void __attribute__((noipa))
+caller_int (int8_t *x0, int16_t *x1, int32_t *x2, int64_t *x3)
+{
+ callee_int (x0, x1, x2, x3,
+ svdup_s8 (0),
+ svdup_s16 (1),
+ svdup_s32 (2),
+ svdup_s64 (3),
+ svdup_s8 (4),
+ svdup_s16 (5),
+ svdup_s32 (6),
+ svdup_s64 (7),
+ svdup_s8 (8),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tmov\tz0\.b, #0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz1\.h, #1\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2\.s, #2\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3\.d, #3\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4\.b, #4\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5\.h, #5\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz6\.s, #6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7\.d, #7\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx4, sp\n} } } */
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.b), #8\n.*\tst1b\t\1, p[0-7], \[x4\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee_uint:
+** ptrue p3\.b, all
+** ld1b (z(?:2[4-9]|3[0-1])\.b), p3/z, \[x4\]
+** st1b \1, p2, \[x0\]
+** st1b z4\.b, p1, \[x0\]
+** st1h z5\.h, p1, \[x1\]
+** st1w z6\.s, p1, \[x2\]
+** st1d z7\.d, p1, \[x3\]
+** st1b z0\.b, p0, \[x0\]
+** st1h z1\.h, p0, \[x1\]
+** st1w z2\.s, p0, \[x2\]
+** st1d z3\.d, p0, \[x3\]
+** ret
+*/
+void __attribute__((noipa))
+callee_uint (uint8_t *x0, uint16_t *x1, uint32_t *x2, uint64_t *x3,
+ svuint8_t z0, svuint16_t z1, svuint32_t z2, svuint64_t z3,
+ svuint8_t z4, svuint16_t z5, svuint32_t z6, svuint64_t z7,
+ svuint8_t z8,
+ svbool_t p0, svbool_t p1, svbool_t p2)
+{
+ svst1 (p2, x0, z8);
+ svst1 (p1, x0, z4);
+ svst1 (p1, x1, z5);
+ svst1 (p1, x2, z6);
+ svst1 (p1, x3, z7);
+ svst1 (p0, x0, z0);
+ svst1 (p0, x1, z1);
+ svst1 (p0, x2, z2);
+ svst1 (p0, x3, z3);
+}
+
+void __attribute__((noipa))
+caller_uint (uint8_t *x0, uint16_t *x1, uint32_t *x2, uint64_t *x3)
+{
+ callee_uint (x0, x1, x2, x3,
+ svdup_u8 (0),
+ svdup_u16 (1),
+ svdup_u32 (2),
+ svdup_u64 (3),
+ svdup_u8 (4),
+ svdup_u16 (5),
+ svdup_u32 (6),
+ svdup_u64 (7),
+ svdup_u8 (8),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tmov\tz0\.b, #0\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz1\.h, #1\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz2\.s, #2\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz3\.d, #3\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz4\.b, #4\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz5\.h, #5\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz6\.s, #6\n} } } */
+/* { dg-final { scan-assembler {\tmov\tz7\.d, #7\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx4, sp\n} } } */
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.b), #8\n.*\tst1b\t\1, p[0-7], \[x4\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee_float:
+** ptrue p3\.b, all
+** ld1h (z(?:2[4-9]|3[0-1])\.h), p3/z, \[x4\]
+** st1h \1, p2, \[x0\]
+** st1h z4\.h, p1, \[x0\]
+** st1h z5\.h, p1, \[x1\]
+** st1w z6\.s, p1, \[x2\]
+** st1d z7\.d, p1, \[x3\]
+** st1h z0\.h, p0, \[x0\]
+** st1h z1\.h, p0, \[x1\]
+** st1w z2\.s, p0, \[x2\]
+** st1d z3\.d, p0, \[x3\]
+** ret
+*/
+void __attribute__((noipa))
+callee_float (float16_t *x0, float16_t *x1, float32_t *x2, float64_t *x3,
+ svfloat16_t z0, svfloat16_t z1, svfloat32_t z2, svfloat64_t z3,
+ svfloat16_t z4, svfloat16_t z5, svfloat32_t z6, svfloat64_t z7,
+ svfloat16_t z8,
+ svbool_t p0, svbool_t p1, svbool_t p2)
+{
+ svst1 (p2, x0, z8);
+ svst1 (p1, x0, z4);
+ svst1 (p1, x1, z5);
+ svst1 (p1, x2, z6);
+ svst1 (p1, x3, z7);
+ svst1 (p0, x0, z0);
+ svst1 (p0, x1, z1);
+ svst1 (p0, x2, z2);
+ svst1 (p0, x3, z3);
+}
+
+void __attribute__((noipa))
+caller_float (float16_t *x0, float16_t *x1, float32_t *x2, float64_t *x3)
+{
+ callee_float (x0, x1, x2, x3,
+ svdup_f16 (0),
+ svdup_f16 (1),
+ svdup_f32 (2),
+ svdup_f64 (3),
+ svdup_f16 (4),
+ svdup_f16 (5),
+ svdup_f32 (6),
+ svdup_f64 (7),
+ svdup_f16 (8),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tmov\tz0\.[bhsd], #0\n} } } */
+/* { dg-final { scan-assembler {\tfmov\tz1\.h, #1\.0} } } */
+/* { dg-final { scan-assembler {\tfmov\tz2\.s, #2\.0} } } */
+/* { dg-final { scan-assembler {\tfmov\tz3\.d, #3\.0} } } */
+/* { dg-final { scan-assembler {\tfmov\tz4\.h, #4\.0} } } */
+/* { dg-final { scan-assembler {\tfmov\tz5\.h, #5\.0} } } */
+/* { dg-final { scan-assembler {\tfmov\tz6\.s, #6\.0} } } */
+/* { dg-final { scan-assembler {\tfmov\tz7\.d, #7\.0} } } */
+/* { dg-final { scan-assembler {\tmov\tx4, sp\n} } } */
+/* { dg-final { scan-assembler {\tfmov\t(z[0-9]+\.h), #8\.0.*\tst1h\t\1, p[0-7], \[x4\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
+** ld1h (z[0-9]+\.h), p4/z, \[x1\]
+** st2h {\2 - \1}, p0, \[x0\]
+** |
+** ld1h (z[0-9]+\.h), p4/z, \[x1\]
+** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
+** st2h {\3 - \4}, p0, \[x0\]
+** )
+** st4h {z0\.h - z3\.h}, p1, \[x0\]
+** st3h {z4\.h - z6\.h}, p2, \[x0\]
+** st1h z7\.h, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svfloat16x4_t z0, svfloat16x3_t z4, svfloat16x2_t stack,
+ svfloat16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_f16 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_f16 (pg, x0, -8),
+ svld3_vnum_f16 (pg, x0, -3),
+ svld2_vnum_f16 (pg, x0, 0),
+ svld1_vnum_f16 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
+** ld1w (z[0-9]+\.s), p4/z, \[x1\]
+** st2w {\2 - \1}, p0, \[x0\]
+** |
+** ld1w (z[0-9]+\.s), p4/z, \[x1\]
+** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
+** st2w {\3 - \4}, p0, \[x0\]
+** )
+** st4w {z0\.s - z3\.s}, p1, \[x0\]
+** st3w {z4\.s - z6\.s}, p2, \[x0\]
+** st1w z7\.s, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svfloat32x4_t z0, svfloat32x3_t z4, svfloat32x2_t stack,
+ svfloat32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_f32 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_f32 (pg, x0, -8),
+ svld3_vnum_f32 (pg, x0, -3),
+ svld2_vnum_f32 (pg, x0, 0),
+ svld1_vnum_f32 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
+** ld1d (z[0-9]+\.d), p4/z, \[x1\]
+** st2d {\2 - \1}, p0, \[x0\]
+** |
+** ld1d (z[0-9]+\.d), p4/z, \[x1\]
+** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
+** st2d {\3 - \4}, p0, \[x0\]
+** )
+** st4d {z0\.d - z3\.d}, p1, \[x0\]
+** st3d {z4\.d - z6\.d}, p2, \[x0\]
+** st1d z7\.d, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svfloat64x4_t z0, svfloat64x3_t z4, svfloat64x2_t stack,
+ svfloat64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_f64 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_f64 (pg, x0, -8),
+ svld3_vnum_f64 (pg, x0, -3),
+ svld2_vnum_f64 (pg, x0, 0),
+ svld1_vnum_f64 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
+** ld1h (z[0-9]+\.h), p4/z, \[x1\]
+** st2h {\2 - \1}, p0, \[x0\]
+** |
+** ld1h (z[0-9]+\.h), p4/z, \[x1\]
+** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
+** st2h {\3 - \4}, p0, \[x0\]
+** )
+** st4h {z0\.h - z3\.h}, p1, \[x0\]
+** st3h {z4\.h - z6\.h}, p2, \[x0\]
+** st1h z7\.h, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svint16x4_t z0, svint16x3_t z4, svint16x2_t stack,
+ svint16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_s16 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_s16 (pg, x0, -8),
+ svld3_vnum_s16 (pg, x0, -3),
+ svld2_vnum_s16 (pg, x0, 0),
+ svld1_vnum_s16 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
+** ld1w (z[0-9]+\.s), p4/z, \[x1\]
+** st2w {\2 - \1}, p0, \[x0\]
+** |
+** ld1w (z[0-9]+\.s), p4/z, \[x1\]
+** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
+** st2w {\3 - \4}, p0, \[x0\]
+** )
+** st4w {z0\.s - z3\.s}, p1, \[x0\]
+** st3w {z4\.s - z6\.s}, p2, \[x0\]
+** st1w z7\.s, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svint32x4_t z0, svint32x3_t z4, svint32x2_t stack,
+ svint32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_s32 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_s32 (pg, x0, -8),
+ svld3_vnum_s32 (pg, x0, -3),
+ svld2_vnum_s32 (pg, x0, 0),
+ svld1_vnum_s32 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
+** ld1d (z[0-9]+\.d), p4/z, \[x1\]
+** st2d {\2 - \1}, p0, \[x0\]
+** |
+** ld1d (z[0-9]+\.d), p4/z, \[x1\]
+** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
+** st2d {\3 - \4}, p0, \[x0\]
+** )
+** st4d {z0\.d - z3\.d}, p1, \[x0\]
+** st3d {z4\.d - z6\.d}, p2, \[x0\]
+** st1d z7\.d, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svint64x4_t z0, svint64x3_t z4, svint64x2_t stack,
+ svint64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_s64 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_s64 (pg, x0, -8),
+ svld3_vnum_s64 (pg, x0, -3),
+ svld2_vnum_s64 (pg, x0, 0),
+ svld1_vnum_s64 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1b (z[0-9]+\.b), p4/z, \[x1, #1, mul vl\]
+** ld1b (z[0-9]+\.b), p4/z, \[x1\]
+** st2b {\2 - \1}, p0, \[x0\]
+** |
+** ld1b (z[0-9]+\.b), p4/z, \[x1\]
+** ld1b (z[0-9]+\.b), p4/z, \[x1, #1, mul vl\]
+** st2b {\3 - \4}, p0, \[x0\]
+** )
+** st4b {z0\.b - z3\.b}, p1, \[x0\]
+** st3b {z4\.b - z6\.b}, p2, \[x0\]
+** st1b z7\.b, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svint8x4_t z0, svint8x3_t z4, svint8x2_t stack,
+ svint8_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_s8 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_s8 (pg, x0, -8),
+ svld3_vnum_s8 (pg, x0, -3),
+ svld2_vnum_s8 (pg, x0, 0),
+ svld1_vnum_s8 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4b\t{z0\.b - z3\.b}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z4\.b - z6\.b}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz7\.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{(z[0-9]+\.b) - z[0-9]+\.b}.*\tst1b\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+\.b - (z[0-9]+\.b)}.*\tst1b\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
+** ld1h (z[0-9]+\.h), p4/z, \[x1\]
+** st2h {\2 - \1}, p0, \[x0\]
+** |
+** ld1h (z[0-9]+\.h), p4/z, \[x1\]
+** ld1h (z[0-9]+\.h), p4/z, \[x1, #1, mul vl\]
+** st2h {\3 - \4}, p0, \[x0\]
+** )
+** st4h {z0\.h - z3\.h}, p1, \[x0\]
+** st3h {z4\.h - z6\.h}, p2, \[x0\]
+** st1h z7\.h, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svuint16x4_t z0, svuint16x3_t z4, svuint16x2_t stack,
+ svuint16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_u16 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_u16 (pg, x0, -8),
+ svld3_vnum_u16 (pg, x0, -3),
+ svld2_vnum_u16 (pg, x0, 0),
+ svld1_vnum_u16 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
+** ld1w (z[0-9]+\.s), p4/z, \[x1\]
+** st2w {\2 - \1}, p0, \[x0\]
+** |
+** ld1w (z[0-9]+\.s), p4/z, \[x1\]
+** ld1w (z[0-9]+\.s), p4/z, \[x1, #1, mul vl\]
+** st2w {\3 - \4}, p0, \[x0\]
+** )
+** st4w {z0\.s - z3\.s}, p1, \[x0\]
+** st3w {z4\.s - z6\.s}, p2, \[x0\]
+** st1w z7\.s, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svuint32x4_t z0, svuint32x3_t z4, svuint32x2_t stack,
+ svuint32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_u32 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_u32 (pg, x0, -8),
+ svld3_vnum_u32 (pg, x0, -3),
+ svld2_vnum_u32 (pg, x0, 0),
+ svld1_vnum_u32 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
+** ld1d (z[0-9]+\.d), p4/z, \[x1\]
+** st2d {\2 - \1}, p0, \[x0\]
+** |
+** ld1d (z[0-9]+\.d), p4/z, \[x1\]
+** ld1d (z[0-9]+\.d), p4/z, \[x1, #1, mul vl\]
+** st2d {\3 - \4}, p0, \[x0\]
+** )
+** st4d {z0\.d - z3\.d}, p1, \[x0\]
+** st3d {z4\.d - z6\.d}, p2, \[x0\]
+** st1d z7\.d, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svuint64x4_t z0, svuint64x3_t z4, svuint64x2_t stack,
+ svuint64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_u64 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_u64 (pg, x0, -8),
+ svld3_vnum_u64 (pg, x0, -3),
+ svld2_vnum_u64 (pg, x0, 0),
+ svld1_vnum_u64 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** (
+** ld1b (z[0-9]+\.b), p4/z, \[x1, #1, mul vl\]
+** ld1b (z[0-9]+\.b), p4/z, \[x1\]
+** st2b {\2 - \1}, p0, \[x0\]
+** |
+** ld1b (z[0-9]+\.b), p4/z, \[x1\]
+** ld1b (z[0-9]+\.b), p4/z, \[x1, #1, mul vl\]
+** st2b {\3 - \4}, p0, \[x0\]
+** )
+** st4b {z0\.b - z3\.b}, p1, \[x0\]
+** st3b {z4\.b - z6\.b}, p2, \[x0\]
+** st1b z7\.b, p3, \[x0\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svuint8x4_t z0, svuint8x3_t z4, svuint8x2_t stack,
+ svuint8_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_u8 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_u8 (pg, x0, -8),
+ svld3_vnum_u8 (pg, x0, -3),
+ svld2_vnum_u8 (pg, x0, 0),
+ svld1_vnum_u8 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4b\t{z0\.b - z3\.b}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z4\.b - z6\.b}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz7\.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{(z[0-9]+\.b) - z[0-9]+\.b}.*\tst1b\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+\.b - (z[0-9]+\.b)}.*\tst1b\t\1, p[0-7], \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2h {\2\.h - \1\.h}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2h {\3\.h - \4\.h}, p0, \[x0\]
+** )
+** st4h {z0\.h - z3\.h}, p1, \[x0\]
+** st3h {z4\.h - z6\.h}, p2, \[x0\]
+** st1h z7\.h, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svfloat16x4_t z0, svfloat16x3_t z4, svfloat16x2_t stack,
+ svfloat16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_f16 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_f16 (pg, x0, -8),
+ svld3_vnum_f16 (pg, x0, -3),
+ svld2_vnum_f16 (pg, x0, 0),
+ svld1_vnum_f16 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2w {\2\.s - \1\.s}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2w {\3\.s - \4\.s}, p0, \[x0\]
+** )
+** st4w {z0\.s - z3\.s}, p1, \[x0\]
+** st3w {z4\.s - z6\.s}, p2, \[x0\]
+** st1w z7\.s, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svfloat32x4_t z0, svfloat32x3_t z4, svfloat32x2_t stack,
+ svfloat32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_f32 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_f32 (pg, x0, -8),
+ svld3_vnum_f32 (pg, x0, -3),
+ svld2_vnum_f32 (pg, x0, 0),
+ svld1_vnum_f32 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2d {\2\.d - \1\.d}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2d {\3\.d - \4\.d}, p0, \[x0\]
+** )
+** st4d {z0\.d - z3\.d}, p1, \[x0\]
+** st3d {z4\.d - z6\.d}, p2, \[x0\]
+** st1d z7\.d, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svfloat64x4_t z0, svfloat64x3_t z4, svfloat64x2_t stack,
+ svfloat64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_f64 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_f64 (pg, x0, -8),
+ svld3_vnum_f64 (pg, x0, -3),
+ svld2_vnum_f64 (pg, x0, 0),
+ svld1_vnum_f64 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2h {\2\.h - \1\.h}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2h {\3\.h - \4\.h}, p0, \[x0\]
+** )
+** st4h {z0\.h - z3\.h}, p1, \[x0\]
+** st3h {z4\.h - z6\.h}, p2, \[x0\]
+** st1h z7\.h, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svint16x4_t z0, svint16x3_t z4, svint16x2_t stack,
+ svint16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_s16 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_s16 (pg, x0, -8),
+ svld3_vnum_s16 (pg, x0, -3),
+ svld2_vnum_s16 (pg, x0, 0),
+ svld1_vnum_s16 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2w {\2\.s - \1\.s}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2w {\3\.s - \4\.s}, p0, \[x0\]
+** )
+** st4w {z0\.s - z3\.s}, p1, \[x0\]
+** st3w {z4\.s - z6\.s}, p2, \[x0\]
+** st1w z7\.s, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svint32x4_t z0, svint32x3_t z4, svint32x2_t stack,
+ svint32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_s32 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_s32 (pg, x0, -8),
+ svld3_vnum_s32 (pg, x0, -3),
+ svld2_vnum_s32 (pg, x0, 0),
+ svld1_vnum_s32 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2d {\2\.d - \1\.d}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2d {\3\.d - \4\.d}, p0, \[x0\]
+** )
+** st4d {z0\.d - z3\.d}, p1, \[x0\]
+** st3d {z4\.d - z6\.d}, p2, \[x0\]
+** st1d z7\.d, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svint64x4_t z0, svint64x3_t z4, svint64x2_t stack,
+ svint64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_s64 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_s64 (pg, x0, -8),
+ svld3_vnum_s64 (pg, x0, -3),
+ svld2_vnum_s64 (pg, x0, 0),
+ svld1_vnum_s64 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2b {\2\.b - \1\.b}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2b {\3\.b - \4\.b}, p0, \[x0\]
+** )
+** st4b {z0\.b - z3\.b}, p1, \[x0\]
+** st3b {z4\.b - z6\.b}, p2, \[x0\]
+** st1b z7\.b, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svint8x4_t z0, svint8x3_t z4, svint8x2_t stack,
+ svint8_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_s8 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_s8 (pg, x0, -8),
+ svld3_vnum_s8 (pg, x0, -3),
+ svld2_vnum_s8 (pg, x0, 0),
+ svld1_vnum_s8 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4b\t{z0\.b - z3\.b}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z4\.b - z6\.b}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz7\.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{(z[0-9]+)\.b - z[0-9]+\.b}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+\.b - (z[0-9]+)\.b}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2h {\2\.h - \1\.h}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2h {\3\.h - \4\.h}, p0, \[x0\]
+** )
+** st4h {z0\.h - z3\.h}, p1, \[x0\]
+** st3h {z4\.h - z6\.h}, p2, \[x0\]
+** st1h z7\.h, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svuint16x4_t z0, svuint16x3_t z4, svuint16x2_t stack,
+ svuint16_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_u16 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_u16 (pg, x0, -8),
+ svld3_vnum_u16 (pg, x0, -3),
+ svld2_vnum_u16 (pg, x0, 0),
+ svld1_vnum_u16 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4h\t{z0\.h - z3\.h}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z4\.h - z6\.h}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\tz7\.h, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2w {\2\.s - \1\.s}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2w {\3\.s - \4\.s}, p0, \[x0\]
+** )
+** st4w {z0\.s - z3\.s}, p1, \[x0\]
+** st3w {z4\.s - z6\.s}, p2, \[x0\]
+** st1w z7\.s, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svuint32x4_t z0, svuint32x3_t z4, svuint32x2_t stack,
+ svuint32_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_u32 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_u32 (pg, x0, -8),
+ svld3_vnum_u32 (pg, x0, -3),
+ svld2_vnum_u32 (pg, x0, 0),
+ svld1_vnum_u32 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4w\t{z0\.s - z3\.s}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z4\.s - z6\.s}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz7\.s, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2d {\2\.d - \1\.d}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2d {\3\.d - \4\.d}, p0, \[x0\]
+** )
+** st4d {z0\.d - z3\.d}, p1, \[x0\]
+** st3d {z4\.d - z6\.d}, p2, \[x0\]
+** st1d z7\.d, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svuint64x4_t z0, svuint64x3_t z4, svuint64x2_t stack,
+ svuint64_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_u64 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_u64 (pg, x0, -8),
+ svld3_vnum_u64 (pg, x0, -3),
+ svld2_vnum_u64 (pg, x0, 0),
+ svld1_vnum_u64 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4d\t{z0\.d - z3\.d}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z4\.d - z6\.d}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\tz7\.d, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee:
+** (
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** ldr (z[0-9]+), \[x1\]
+** st2b {\2\.b - \1\.b}, p0, \[x0\]
+** |
+** ldr (z[0-9]+), \[x1\]
+** ldr (z[0-9]+), \[x1, #1, mul vl\]
+** st2b {\3\.b - \4\.b}, p0, \[x0\]
+** )
+** st4b {z0\.b - z3\.b}, p1, \[x0\]
+** st3b {z4\.b - z6\.b}, p2, \[x0\]
+** st1b z7\.b, p3, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee (void *x0, svuint8x4_t z0, svuint8x3_t z4, svuint8x2_t stack,
+ svuint8_t z7, svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ svst2 (p0, x0, stack);
+ svst4 (p1, x0, z0);
+ svst3 (p2, x0, z4);
+ svst1_u8 (p3, x0, z7);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee (x0,
+ svld4_vnum_u8 (pg, x0, -8),
+ svld3_vnum_u8 (pg, x0, -3),
+ svld2_vnum_u8 (pg, x0, 0),
+ svld1_vnum_u8 (pg, x0, 2),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3),
+ svptrue_pat_b64 (SV_VL4));
+}
+
+/* { dg-final { scan-assembler {\tld4b\t{z0\.b - z3\.b}, p[0-7]/z, \[x0, #-8, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z4\.b - z6\.b}, p[0-7]/z, \[x0, #-3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\tz7\.b, p[0-7]/z, \[x0, #2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tmov\tx1, sp\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{(z[0-9]+)\.b - z[0-9]+\.b}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{z[0-9]+\.b - (z[0-9]+)\.b}.*\tstr\t\1, \[x1, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp3\.d, vl4\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1h (z[0-9]+\.h), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4h {z[0-9]+\.h - \1}, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z5\.h - z7\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svfloat16x3_t z0, svfloat16x2_t z3, svfloat16x3_t z5,
+ svfloat16x4_t stack1, svfloat16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_f16 (p0, x0, stack1);
+ svst2_f16 (p1, x0, z3);
+ svst3_f16 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1h (z[0-9]+\.h), p3/z, \[x2\]
+** st1h \1, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z0\.h - z2\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svfloat16x3_t z0, svfloat16x2_t z3, svfloat16x3_t z5,
+ svfloat16x4_t stack1, svfloat16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_f16 (p0, x0, stack2);
+ svst2_f16 (p1, x0, z3);
+ svst3_f16 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_f16 (pg, x0, -9),
+ svld2_vnum_f16 (pg, x0, -2),
+ svld3_vnum_f16 (pg, x0, 0),
+ svld4_vnum_f16 (pg, x0, 8),
+ svld1_vnum_f16 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1w (z[0-9]+\.s), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4w {z[0-9]+\.s - \1}, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z5\.s - z7\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svfloat32x3_t z0, svfloat32x2_t z3, svfloat32x3_t z5,
+ svfloat32x4_t stack1, svfloat32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_f32 (p0, x0, stack1);
+ svst2_f32 (p1, x0, z3);
+ svst3_f32 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1w (z[0-9]+\.s), p3/z, \[x2\]
+** st1w \1, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z0\.s - z2\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svfloat32x3_t z0, svfloat32x2_t z3, svfloat32x3_t z5,
+ svfloat32x4_t stack1, svfloat32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_f32 (p0, x0, stack2);
+ svst2_f32 (p1, x0, z3);
+ svst3_f32 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_f32 (pg, x0, -9),
+ svld2_vnum_f32 (pg, x0, -2),
+ svld3_vnum_f32 (pg, x0, 0),
+ svld4_vnum_f32 (pg, x0, 8),
+ svld1_vnum_f32 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1d (z[0-9]+\.d), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4d {z[0-9]+\.d - \1}, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z5\.d - z7\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svfloat64x3_t z0, svfloat64x2_t z3, svfloat64x3_t z5,
+ svfloat64x4_t stack1, svfloat64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_f64 (p0, x0, stack1);
+ svst2_f64 (p1, x0, z3);
+ svst3_f64 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1d (z[0-9]+\.d), p3/z, \[x2\]
+** st1d \1, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z0\.d - z2\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svfloat64x3_t z0, svfloat64x2_t z3, svfloat64x3_t z5,
+ svfloat64x4_t stack1, svfloat64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_f64 (p0, x0, stack2);
+ svst2_f64 (p1, x0, z3);
+ svst3_f64 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_f64 (pg, x0, -9),
+ svld2_vnum_f64 (pg, x0, -2),
+ svld3_vnum_f64 (pg, x0, 0),
+ svld4_vnum_f64 (pg, x0, 8),
+ svld1_vnum_f64 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1h (z[0-9]+\.h), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4h {z[0-9]+\.h - \1}, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z5\.h - z7\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svint16x3_t z0, svint16x2_t z3, svint16x3_t z5,
+ svint16x4_t stack1, svint16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_s16 (p0, x0, stack1);
+ svst2_s16 (p1, x0, z3);
+ svst3_s16 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1h (z[0-9]+\.h), p3/z, \[x2\]
+** st1h \1, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z0\.h - z2\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svint16x3_t z0, svint16x2_t z3, svint16x3_t z5,
+ svint16x4_t stack1, svint16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_s16 (p0, x0, stack2);
+ svst2_s16 (p1, x0, z3);
+ svst3_s16 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_s16 (pg, x0, -9),
+ svld2_vnum_s16 (pg, x0, -2),
+ svld3_vnum_s16 (pg, x0, 0),
+ svld4_vnum_s16 (pg, x0, 8),
+ svld1_vnum_s16 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1w (z[0-9]+\.s), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4w {z[0-9]+\.s - \1}, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z5\.s - z7\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svint32x3_t z0, svint32x2_t z3, svint32x3_t z5,
+ svint32x4_t stack1, svint32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_s32 (p0, x0, stack1);
+ svst2_s32 (p1, x0, z3);
+ svst3_s32 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1w (z[0-9]+\.s), p3/z, \[x2\]
+** st1w \1, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z0\.s - z2\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svint32x3_t z0, svint32x2_t z3, svint32x3_t z5,
+ svint32x4_t stack1, svint32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_s32 (p0, x0, stack2);
+ svst2_s32 (p1, x0, z3);
+ svst3_s32 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_s32 (pg, x0, -9),
+ svld2_vnum_s32 (pg, x0, -2),
+ svld3_vnum_s32 (pg, x0, 0),
+ svld4_vnum_s32 (pg, x0, 8),
+ svld1_vnum_s32 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1d (z[0-9]+\.d), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4d {z[0-9]+\.d - \1}, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z5\.d - z7\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svint64x3_t z0, svint64x2_t z3, svint64x3_t z5,
+ svint64x4_t stack1, svint64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_s64 (p0, x0, stack1);
+ svst2_s64 (p1, x0, z3);
+ svst3_s64 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1d (z[0-9]+\.d), p3/z, \[x2\]
+** st1d \1, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z0\.d - z2\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svint64x3_t z0, svint64x2_t z3, svint64x3_t z5,
+ svint64x4_t stack1, svint64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_s64 (p0, x0, stack2);
+ svst2_s64 (p1, x0, z3);
+ svst3_s64 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_s64 (pg, x0, -9),
+ svld2_vnum_s64 (pg, x0, -2),
+ svld3_vnum_s64 (pg, x0, 0),
+ svld4_vnum_s64 (pg, x0, 8),
+ svld1_vnum_s64 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1b (z[0-9]+\.b), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4b {z[0-9]+\.b - \1}, p0, \[x0\]
+** st2b {z3\.b - z4\.b}, p1, \[x0\]
+** st3b {z5\.b - z7\.b}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svint8x3_t z0, svint8x2_t z3, svint8x3_t z5,
+ svint8x4_t stack1, svint8_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_s8 (p0, x0, stack1);
+ svst2_s8 (p1, x0, z3);
+ svst3_s8 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1b (z[0-9]+\.b), p3/z, \[x2\]
+** st1b \1, p0, \[x0\]
+** st2b {z3\.b - z4\.b}, p1, \[x0\]
+** st3b {z0\.b - z2\.b}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svint8x3_t z0, svint8x2_t z3, svint8x3_t z5,
+ svint8x4_t stack1, svint8_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_s8 (p0, x0, stack2);
+ svst2_s8 (p1, x0, z3);
+ svst3_s8 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_s8 (pg, x0, -9),
+ svld2_vnum_s8 (pg, x0, -2),
+ svld3_vnum_s8 (pg, x0, 0),
+ svld4_vnum_s8 (pg, x0, 8),
+ svld1_vnum_s8 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3b\t{z0\.b - z2\.b}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{z3\.b - z4\.b}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z5\.b - z7\.b}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{(z[0-9]+\.b) - z[0-9]+\.b}.*\tst1b\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+\.b - (z[0-9]+\.b)}.*\tst1b\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\t(z[0-9]+\.b), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1b\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1h (z[0-9]+\.h), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4h {z[0-9]+\.h - \1}, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z5\.h - z7\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svuint16x3_t z0, svuint16x2_t z3, svuint16x3_t z5,
+ svuint16x4_t stack1, svuint16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_u16 (p0, x0, stack1);
+ svst2_u16 (p1, x0, z3);
+ svst3_u16 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1h (z[0-9]+\.h), p3/z, \[x2\]
+** st1h \1, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z0\.h - z2\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svuint16x3_t z0, svuint16x2_t z3, svuint16x3_t z5,
+ svuint16x4_t stack1, svuint16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_u16 (p0, x0, stack2);
+ svst2_u16 (p1, x0, z3);
+ svst3_u16 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_u16 (pg, x0, -9),
+ svld2_vnum_u16 (pg, x0, -2),
+ svld3_vnum_u16 (pg, x0, 0),
+ svld4_vnum_u16 (pg, x0, 8),
+ svld1_vnum_u16 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+\.h) - z[0-9]+\.h}.*\tst1h\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+\.h)}.*\tst1h\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1w (z[0-9]+\.s), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4w {z[0-9]+\.s - \1}, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z5\.s - z7\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svuint32x3_t z0, svuint32x2_t z3, svuint32x3_t z5,
+ svuint32x4_t stack1, svuint32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_u32 (p0, x0, stack1);
+ svst2_u32 (p1, x0, z3);
+ svst3_u32 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1w (z[0-9]+\.s), p3/z, \[x2\]
+** st1w \1, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z0\.s - z2\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svuint32x3_t z0, svuint32x2_t z3, svuint32x3_t z5,
+ svuint32x4_t stack1, svuint32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_u32 (p0, x0, stack2);
+ svst2_u32 (p1, x0, z3);
+ svst3_u32 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_u32 (pg, x0, -9),
+ svld2_vnum_u32 (pg, x0, -2),
+ svld3_vnum_u32 (pg, x0, 0),
+ svld4_vnum_u32 (pg, x0, 8),
+ svld1_vnum_u32 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+\.s) - z[0-9]+\.s}.*\tst1w\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+\.s)}.*\tst1w\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1d (z[0-9]+\.d), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4d {z[0-9]+\.d - \1}, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z5\.d - z7\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svuint64x3_t z0, svuint64x2_t z3, svuint64x3_t z5,
+ svuint64x4_t stack1, svuint64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_u64 (p0, x0, stack1);
+ svst2_u64 (p1, x0, z3);
+ svst3_u64 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1d (z[0-9]+\.d), p3/z, \[x2\]
+** st1d \1, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z0\.d - z2\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svuint64x3_t z0, svuint64x2_t z3, svuint64x3_t z5,
+ svuint64x4_t stack1, svuint64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_u64 (p0, x0, stack2);
+ svst2_u64 (p1, x0, z3);
+ svst3_u64 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_u64 (pg, x0, -9),
+ svld2_vnum_u64 (pg, x0, -2),
+ svld3_vnum_u64 (pg, x0, 0),
+ svld4_vnum_u64 (pg, x0, 8),
+ svld1_vnum_u64 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+\.d) - z[0-9]+\.d}.*\tst1d\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+\.d)}.*\tst1d\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ptrue p3\.b, all
+** ...
+** ld1b (z[0-9]+\.b), p3/z, \[x1, #3, mul vl\]
+** ...
+** st4b {z[0-9]+\.b - \1}, p0, \[x0\]
+** st2b {z3\.b - z4\.b}, p1, \[x0\]
+** st3b {z5\.b - z7\.b}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svuint8x3_t z0, svuint8x2_t z3, svuint8x3_t z5,
+ svuint8x4_t stack1, svuint8_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_u8 (p0, x0, stack1);
+ svst2_u8 (p1, x0, z3);
+ svst3_u8 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1b (z[0-9]+\.b), p3/z, \[x2\]
+** st1b \1, p0, \[x0\]
+** st2b {z3\.b - z4\.b}, p1, \[x0\]
+** st3b {z0\.b - z2\.b}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svuint8x3_t z0, svuint8x2_t z3, svuint8x3_t z5,
+ svuint8x4_t stack1, svuint8_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_u8 (p0, x0, stack2);
+ svst2_u8 (p1, x0, z3);
+ svst3_u8 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_u8 (pg, x0, -9),
+ svld2_vnum_u8 (pg, x0, -2),
+ svld3_vnum_u8 (pg, x0, 0),
+ svld4_vnum_u8 (pg, x0, 8),
+ svld1_vnum_u8 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3b\t{z0\.b - z2\.b}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{z3\.b - z4\.b}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z5\.b - z7\.b}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{(z[0-9]+\.b) - z[0-9]+\.b}.*\tst1b\t\1, p[0-7], \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+\.b - (z[0-9]+\.b)}.*\tst1b\t\1, p[0-7], \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\t(z[0-9]+\.b), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1b\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4h {z[0-9]+\.h - \1\.h}, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z5\.h - z7\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svfloat16x3_t z0, svfloat16x2_t z3, svfloat16x3_t z5,
+ svfloat16x4_t stack1, svfloat16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_f16 (p0, x0, stack1);
+ svst2_f16 (p1, x0, z3);
+ svst3_f16 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1h (z[0-9]+\.h), p3/z, \[x2\]
+** st1h \1, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z0\.h - z2\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svfloat16x3_t z0, svfloat16x2_t z3, svfloat16x3_t z5,
+ svfloat16x4_t stack1, svfloat16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_f16 (p0, x0, stack2);
+ svst2_f16 (p1, x0, z3);
+ svst3_f16 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_f16 (pg, x0, -9),
+ svld2_vnum_f16 (pg, x0, -2),
+ svld3_vnum_f16 (pg, x0, 0),
+ svld4_vnum_f16 (pg, x0, 8),
+ svld1_vnum_f16 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4w {z[0-9]+\.s - \1\.s}, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z5\.s - z7\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svfloat32x3_t z0, svfloat32x2_t z3, svfloat32x3_t z5,
+ svfloat32x4_t stack1, svfloat32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_f32 (p0, x0, stack1);
+ svst2_f32 (p1, x0, z3);
+ svst3_f32 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1w (z[0-9]+\.s), p3/z, \[x2\]
+** st1w \1, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z0\.s - z2\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svfloat32x3_t z0, svfloat32x2_t z3, svfloat32x3_t z5,
+ svfloat32x4_t stack1, svfloat32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_f32 (p0, x0, stack2);
+ svst2_f32 (p1, x0, z3);
+ svst3_f32 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_f32 (pg, x0, -9),
+ svld2_vnum_f32 (pg, x0, -2),
+ svld3_vnum_f32 (pg, x0, 0),
+ svld4_vnum_f32 (pg, x0, 8),
+ svld1_vnum_f32 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4d {z[0-9]+\.d - \1\.d}, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z5\.d - z7\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svfloat64x3_t z0, svfloat64x2_t z3, svfloat64x3_t z5,
+ svfloat64x4_t stack1, svfloat64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_f64 (p0, x0, stack1);
+ svst2_f64 (p1, x0, z3);
+ svst3_f64 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1d (z[0-9]+\.d), p3/z, \[x2\]
+** st1d \1, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z0\.d - z2\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svfloat64x3_t z0, svfloat64x2_t z3, svfloat64x3_t z5,
+ svfloat64x4_t stack1, svfloat64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_f64 (p0, x0, stack2);
+ svst2_f64 (p1, x0, z3);
+ svst3_f64 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_f64 (pg, x0, -9),
+ svld2_vnum_f64 (pg, x0, -2),
+ svld3_vnum_f64 (pg, x0, 0),
+ svld4_vnum_f64 (pg, x0, 8),
+ svld1_vnum_f64 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4h {z[0-9]+\.h - \1\.h}, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z5\.h - z7\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svint16x3_t z0, svint16x2_t z3, svint16x3_t z5,
+ svint16x4_t stack1, svint16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_s16 (p0, x0, stack1);
+ svst2_s16 (p1, x0, z3);
+ svst3_s16 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1h (z[0-9]+\.h), p3/z, \[x2\]
+** st1h \1, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z0\.h - z2\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svint16x3_t z0, svint16x2_t z3, svint16x3_t z5,
+ svint16x4_t stack1, svint16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_s16 (p0, x0, stack2);
+ svst2_s16 (p1, x0, z3);
+ svst3_s16 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_s16 (pg, x0, -9),
+ svld2_vnum_s16 (pg, x0, -2),
+ svld3_vnum_s16 (pg, x0, 0),
+ svld4_vnum_s16 (pg, x0, 8),
+ svld1_vnum_s16 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4w {z[0-9]+\.s - \1\.s}, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z5\.s - z7\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svint32x3_t z0, svint32x2_t z3, svint32x3_t z5,
+ svint32x4_t stack1, svint32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_s32 (p0, x0, stack1);
+ svst2_s32 (p1, x0, z3);
+ svst3_s32 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1w (z[0-9]+\.s), p3/z, \[x2\]
+** st1w \1, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z0\.s - z2\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svint32x3_t z0, svint32x2_t z3, svint32x3_t z5,
+ svint32x4_t stack1, svint32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_s32 (p0, x0, stack2);
+ svst2_s32 (p1, x0, z3);
+ svst3_s32 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_s32 (pg, x0, -9),
+ svld2_vnum_s32 (pg, x0, -2),
+ svld3_vnum_s32 (pg, x0, 0),
+ svld4_vnum_s32 (pg, x0, 8),
+ svld1_vnum_s32 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4d {z[0-9]+\.d - \1\.d}, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z5\.d - z7\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svint64x3_t z0, svint64x2_t z3, svint64x3_t z5,
+ svint64x4_t stack1, svint64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_s64 (p0, x0, stack1);
+ svst2_s64 (p1, x0, z3);
+ svst3_s64 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1d (z[0-9]+\.d), p3/z, \[x2\]
+** st1d \1, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z0\.d - z2\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svint64x3_t z0, svint64x2_t z3, svint64x3_t z5,
+ svint64x4_t stack1, svint64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_s64 (p0, x0, stack2);
+ svst2_s64 (p1, x0, z3);
+ svst3_s64 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_s64 (pg, x0, -9),
+ svld2_vnum_s64 (pg, x0, -2),
+ svld3_vnum_s64 (pg, x0, 0),
+ svld4_vnum_s64 (pg, x0, 8),
+ svld1_vnum_s64 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4b {z[0-9]+\.b - \1\.b}, p0, \[x0\]
+** st2b {z3\.b - z4\.b}, p1, \[x0\]
+** st3b {z5\.b - z7\.b}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svint8x3_t z0, svint8x2_t z3, svint8x3_t z5,
+ svint8x4_t stack1, svint8_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_s8 (p0, x0, stack1);
+ svst2_s8 (p1, x0, z3);
+ svst3_s8 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1b (z[0-9]+\.b), p3/z, \[x2\]
+** st1b \1, p0, \[x0\]
+** st2b {z3\.b - z4\.b}, p1, \[x0\]
+** st3b {z0\.b - z2\.b}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svint8x3_t z0, svint8x2_t z3, svint8x3_t z5,
+ svint8x4_t stack1, svint8_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_s8 (p0, x0, stack2);
+ svst2_s8 (p1, x0, z3);
+ svst3_s8 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_s8 (pg, x0, -9),
+ svld2_vnum_s8 (pg, x0, -2),
+ svld3_vnum_s8 (pg, x0, 0),
+ svld4_vnum_s8 (pg, x0, 8),
+ svld1_vnum_s8 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3b\t{z0\.b - z2\.b}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{z3\.b - z4\.b}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z5\.b - z7\.b}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{(z[0-9]+)\.b - z[0-9]+\.b}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+\.b - (z[0-9]+)\.b}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\t(z[0-9]+\.b), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1b\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4h {z[0-9]+\.h - \1\.h}, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z5\.h - z7\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svuint16x3_t z0, svuint16x2_t z3, svuint16x3_t z5,
+ svuint16x4_t stack1, svuint16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_u16 (p0, x0, stack1);
+ svst2_u16 (p1, x0, z3);
+ svst3_u16 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1h (z[0-9]+\.h), p3/z, \[x2\]
+** st1h \1, p0, \[x0\]
+** st2h {z3\.h - z4\.h}, p1, \[x0\]
+** st3h {z0\.h - z2\.h}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svuint16x3_t z0, svuint16x2_t z3, svuint16x3_t z5,
+ svuint16x4_t stack1, svuint16_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_u16 (p0, x0, stack2);
+ svst2_u16 (p1, x0, z3);
+ svst3_u16 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_u16 (pg, x0, -9),
+ svld2_vnum_u16 (pg, x0, -2),
+ svld3_vnum_u16 (pg, x0, 0),
+ svld4_vnum_u16 (pg, x0, 8),
+ svld1_vnum_u16 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3h\t{z0\.h - z2\.h}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2h\t{z3\.h - z4\.h}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3h\t{z5\.h - z7\.h}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{(z[0-9]+)\.h - z[0-9]+\.h}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4h\t{z[0-9]+\.h - (z[0-9]+)\.h}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1h\t(z[0-9]+\.h), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1h\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4w {z[0-9]+\.s - \1\.s}, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z5\.s - z7\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svuint32x3_t z0, svuint32x2_t z3, svuint32x3_t z5,
+ svuint32x4_t stack1, svuint32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_u32 (p0, x0, stack1);
+ svst2_u32 (p1, x0, z3);
+ svst3_u32 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1w (z[0-9]+\.s), p3/z, \[x2\]
+** st1w \1, p0, \[x0\]
+** st2w {z3\.s - z4\.s}, p1, \[x0\]
+** st3w {z0\.s - z2\.s}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svuint32x3_t z0, svuint32x2_t z3, svuint32x3_t z5,
+ svuint32x4_t stack1, svuint32_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_u32 (p0, x0, stack2);
+ svst2_u32 (p1, x0, z3);
+ svst3_u32 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_u32 (pg, x0, -9),
+ svld2_vnum_u32 (pg, x0, -2),
+ svld3_vnum_u32 (pg, x0, 0),
+ svld4_vnum_u32 (pg, x0, 8),
+ svld1_vnum_u32 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3w\t{z0\.s - z2\.s}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2w\t{z3\.s - z4\.s}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3w\t{z5\.s - z7\.s}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{(z[0-9]+)\.s - z[0-9]+\.s}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4w\t{z[0-9]+\.s - (z[0-9]+)\.s}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1w\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4d {z[0-9]+\.d - \1\.d}, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z5\.d - z7\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svuint64x3_t z0, svuint64x2_t z3, svuint64x3_t z5,
+ svuint64x4_t stack1, svuint64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_u64 (p0, x0, stack1);
+ svst2_u64 (p1, x0, z3);
+ svst3_u64 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1d (z[0-9]+\.d), p3/z, \[x2\]
+** st1d \1, p0, \[x0\]
+** st2d {z3\.d - z4\.d}, p1, \[x0\]
+** st3d {z0\.d - z2\.d}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svuint64x3_t z0, svuint64x2_t z3, svuint64x3_t z5,
+ svuint64x4_t stack1, svuint64_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_u64 (p0, x0, stack2);
+ svst2_u64 (p1, x0, z3);
+ svst3_u64 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_u64 (pg, x0, -9),
+ svld2_vnum_u64 (pg, x0, -2),
+ svld3_vnum_u64 (pg, x0, 0),
+ svld4_vnum_u64 (pg, x0, 8),
+ svld1_vnum_u64 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3d\t{z0\.d - z2\.d}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2d\t{z3\.d - z4\.d}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3d\t{z5\.d - z7\.d}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{(z[0-9]+)\.d - z[0-9]+\.d}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4d\t{z[0-9]+\.d - (z[0-9]+)\.d}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1d\t(z[0-9]+\.d), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1d\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** callee1:
+** ...
+** ldr (z[0-9]+), \[x1, #3, mul vl\]
+** ...
+** st4b {z[0-9]+\.b - \1\.b}, p0, \[x0\]
+** st2b {z3\.b - z4\.b}, p1, \[x0\]
+** st3b {z5\.b - z7\.b}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee1 (void *x0, svuint8x3_t z0, svuint8x2_t z3, svuint8x3_t z5,
+ svuint8x4_t stack1, svuint8_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst4_u8 (p0, x0, stack1);
+ svst2_u8 (p1, x0, z3);
+ svst3_u8 (p2, x0, z5);
+}
+
+/*
+** callee2:
+** ptrue p3\.b, all
+** ld1b (z[0-9]+\.b), p3/z, \[x2\]
+** st1b \1, p0, \[x0\]
+** st2b {z3\.b - z4\.b}, p1, \[x0\]
+** st3b {z0\.b - z2\.b}, p2, \[x0\]
+** ret
+*/
+void __attribute__((noipa))
+callee2 (void *x0, svuint8x3_t z0, svuint8x2_t z3, svuint8x3_t z5,
+ svuint8x4_t stack1, svuint8_t stack2, svbool_t p0,
+ svbool_t p1, svbool_t p2)
+{
+ svst1_u8 (p0, x0, stack2);
+ svst2_u8 (p1, x0, z3);
+ svst3_u8 (p2, x0, z0);
+}
+
+void __attribute__((noipa))
+caller (void *x0)
+{
+ svbool_t pg;
+ pg = svptrue_b8 ();
+ callee1 (x0,
+ svld3_vnum_u8 (pg, x0, -9),
+ svld2_vnum_u8 (pg, x0, -2),
+ svld3_vnum_u8 (pg, x0, 0),
+ svld4_vnum_u8 (pg, x0, 8),
+ svld1_vnum_u8 (pg, x0, 5),
+ svptrue_pat_b8 (SV_VL1),
+ svptrue_pat_b16 (SV_VL2),
+ svptrue_pat_b32 (SV_VL3));
+}
+
+/* { dg-final { scan-assembler {\tld3b\t{z0\.b - z2\.b}, p[0-7]/z, \[x0, #-9, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld2b\t{z3\.b - z4\.b}, p[0-7]/z, \[x0, #-2, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld3b\t{z5\.b - z7\.b}, p[0-7]/z, \[x0\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{(z[0-9]+)\.b - z[0-9]+\.b}.*\tstr\t\1, \[x1\]\n} } } */
+/* { dg-final { scan-assembler {\tld4b\t{z[0-9]+\.b - (z[0-9]+)\.b}.*\tstr\t\1, \[x1, #3, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tld1b\t(z[0-9]+\.b), p[0-7]/z, \[x0, #5, mul vl\]\n.*\tst1b\t\1, p[0-7], \[x2\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp0\.b, vl1\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp1\.h, vl2\n} } } */
+/* { dg-final { scan-assembler {\tptrue\tp2\.s, vl3\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee:
+** ...
+** ldr (x[0-9]+), \[sp\]
+** ...
+** ld1b (z[0-9]+\.b), p[1-3]/z, \[\1\]
+** st1b \2, p0, \[x0, x7\]
+** ret
+*/
+void __attribute__((noipa))
+callee (int8_t *x0, int x1, int x2, int x3,
+ int x4, int x5, svbool_t p0, int x6, int64_t x7,
+ svint32x4_t z0, svint32x4_t z4, svint8_t stack)
+{
+ svst1 (p0, x0 + x7, stack);
+}
+
+void __attribute__((noipa))
+caller (int8_t *x0, svbool_t p0, svint32x4_t z0, svint32x4_t z4)
+{
+ callee (x0, 1, 2, 3, 4, 5, p0, 6, 7, z0, z4, svdup_s8 (42));
+}
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.b), #42\n.*\tst1b\t\1, p[0-7], \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp\]\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee:
+** ptrue (p[1-3])\.b, all
+** ld1b (z[0-9]+\.b), \1/z, \[x4\]
+** st1b \2, p0, \[x0, x7\]
+** ret
+*/
+void __attribute__((noipa))
+callee (int8_t *x0, int x1, int x2, int x3,
+ svint32x4_t z0, svint32x4_t z4, svint8_t stack,
+ int x5, svbool_t p0, int x6, int64_t x7)
+{
+ svst1 (p0, x0 + x7, stack);
+}
+
+void __attribute__((noipa))
+caller (int8_t *x0, svbool_t p0, svint32x4_t z0, svint32x4_t z4)
+{
+ callee (x0, 1, 2, 3, z0, z4, svdup_s8 (42), 5, p0, 6, 7);
+}
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.b), #42\n.*\tst1b\t\1, p[0-7], \[x4\]\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee:
+** ldr (x[0-9]+), \[sp, 8\]
+** ldr p0, \[\1\]
+** ret
+*/
+svbool_t __attribute__((noipa))
+callee (svint64x4_t z0, svint16x4_t z4,
+ svint64_t stack1, svint32_t stack2,
+ svint16_t stack3, svint8_t stack4,
+ svuint64_t stack5, svuint32_t stack6,
+ svuint16_t stack7, svuint8_t stack8,
+ svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3,
+ svbool_t stack9, svbool_t stack10)
+{
+ return stack10;
+}
+
+uint64_t __attribute__((noipa))
+caller (int64_t *x0, int16_t *x1, svbool_t p0)
+{
+ svbool_t res;
+ res = callee (svld4 (p0, x0),
+ svld4 (p0, x1),
+ svdup_s64 (1),
+ svdup_s32 (2),
+ svdup_s16 (3),
+ svdup_s8 (4),
+ svdup_u64 (5),
+ svdup_u32 (6),
+ svdup_u16 (7),
+ svdup_u8 (8),
+ svptrue_pat_b8 (SV_VL5),
+ svptrue_pat_b16 (SV_VL6),
+ svptrue_pat_b32 (SV_VL7),
+ svptrue_pat_b64 (SV_VL8),
+ svptrue_pat_b8 (SV_MUL3),
+ svptrue_pat_b16 (SV_MUL3));
+ return svcntp_b8 (res, res);
+}
+
+/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.b, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.h, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp, 8\]\n} } } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-prune-output "compilation terminated" } */
+
+#include <arm_sve.h>
+
+#pragma GCC target "+nosve"
+
+svbool_t return_bool ();
+
+void
+f (void)
+{
+ return_bool (); /* { dg-error {'return_bool' requires the SVE ISA extension} } */
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-prune-output "compilation terminated" } */
+
+#include <arm_sve.h>
+
+#pragma GCC target "+nosve"
+
+svbool_t return_bool ();
+
+void
+f (svbool_t *ptr)
+{
+ *ptr = return_bool (); /* { dg-error {'return_bool' requires the SVE ISA extension} } */
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-prune-output "compilation terminated" } */
+
+#include <arm_sve.h>
+
+#pragma GCC target "+nosve"
+
+svbool_t (*return_bool) ();
+
+void
+f (svbool_t *ptr)
+{
+ *ptr = return_bool (); /* { dg-error {calls to functions of type 'svbool_t\(\)' require the SVE ISA extension} } */
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-prune-output "compilation terminated" } */
+
+#include <arm_sve.h>
+
+#pragma GCC target "+nosve"
+
+void take_svuint8 (svuint8_t);
+
+void
+f (svuint8_t *ptr)
+{
+ take_svuint8 (*ptr); /* { dg-error {'take_svuint8' requires the SVE ISA extension} } */
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-prune-output "compilation terminated" } */
+
+#include <arm_sve.h>
+
+#pragma GCC target "+nosve"
+
+void take_svuint8_eventually (float, float, float, float,
+ float, float, float, float, svuint8_t);
+
+void
+f (svuint8_t *ptr)
+{
+ take_svuint8_eventually (0, 0, 0, 0, 0, 0, 0, 0, *ptr); /* { dg-error {arguments of type '(svuint8_t|__SVUint8_t)' require the SVE ISA extension} } */
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-prune-output "compilation terminated" } */
+
+#include <arm_sve.h>
+
+#pragma GCC target "+nosve"
+
+void unprototyped ();
+
+void
+f (svuint8_t *ptr)
+{
+ unprototyped (*ptr); /* { dg-error {arguments of type '(svuint8_t|__SVUint8_t)' require the SVE ISA extension} } */
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-prune-output "compilation terminated" } */
+
+#include <arm_sve.h>
+
+#pragma GCC target "+nosve"
+
+void f (svuint8_t x) {} /* { dg-error {'f' requires the SVE ISA extension} } */
--- /dev/null
+/* { dg-do compile } */
+/* { dg-prune-output "compilation terminated" } */
+
+#include <arm_sve.h>
+
+#pragma GCC target "+nosve"
+
+void
+f (float a, float b, float c, float d, float e, float f, float g, float h, svuint8_t x) /* { dg-error {arguments of type '(svuint8_t|__SVUint8_t)' require the SVE ISA extension} } */
+{
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** callee_pred:
+** ldr p0, \[x0\]
+** ret
+*/
+__SVBool_t __attribute__((noipa))
+callee_pred (__SVBool_t *ptr)
+{
+ return *ptr;
+}
+
+#include <arm_sve.h>
+
+/*
+** caller_pred:
+** ...
+** bl callee_pred
+** cntp x0, p0, p0\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_pred (__SVBool_t *ptr1)
+{
+ __SVBool_t p;
+ p = callee_pred (ptr1);
+ return svcntp_b8 (p, p);
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=1024 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** callee_pred:
+** ldr p0, \[x0\]
+** ret
+*/
+__SVBool_t __attribute__((noipa))
+callee_pred (__SVBool_t *ptr)
+{
+ return *ptr;
+}
+
+#include <arm_sve.h>
+
+/*
+** caller_pred:
+** ...
+** bl callee_pred
+** cntp x0, p0, p0\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_pred (__SVBool_t *ptr1)
+{
+ __SVBool_t p = callee_pred (ptr1);
+ return svcntp_b8 (p, p);
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=2048 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** callee_pred:
+** ldr p0, \[x0\]
+** ret
+*/
+__SVBool_t __attribute__((noipa))
+callee_pred (__SVBool_t *ptr)
+{
+ return *ptr;
+}
+
+#include <arm_sve.h>
+
+/*
+** caller_pred:
+** ...
+** bl callee_pred
+** cntp x0, p0, p0\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_pred (__SVBool_t *ptr1)
+{
+ __SVBool_t p = callee_pred (ptr1);
+ return svcntp_b8 (p, p);
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=256 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** callee_pred:
+** ldr p0, \[x0\]
+** ret
+*/
+__SVBool_t __attribute__((noipa))
+callee_pred (__SVBool_t *ptr)
+{
+ return *ptr;
+}
+
+#include <arm_sve.h>
+
+/*
+** caller_pred:
+** ...
+** bl callee_pred
+** cntp x0, p0, p0\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_pred (__SVBool_t *ptr1)
+{
+ __SVBool_t p = callee_pred (ptr1);
+ return svcntp_b8 (p, p);
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=512 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+** callee_pred:
+** ldr p0, \[x0\]
+** ret
+*/
+__SVBool_t __attribute__((noipa))
+callee_pred (__SVBool_t *ptr)
+{
+ return *ptr;
+}
+
+#include <arm_sve.h>
+
+/*
+** caller_pred:
+** ...
+** bl callee_pred
+** cntp x0, p0, p0\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_pred (__SVBool_t *ptr1)
+{
+ __SVBool_t p = callee_pred (ptr1);
+ return svcntp_b8 (p, p);
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee_pred:
+** ldr p0, \[x0\]
+** ret
+*/
+svbool_t __attribute__((noipa))
+callee_pred (svbool_t *ptr)
+{
+ return *ptr;
+}
+
+/*
+** caller_pred:
+** ...
+** bl callee_pred
+** cntp x0, p0, p0\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_pred (svbool_t *ptr1)
+{
+ svbool_t p;
+ p = callee_pred (ptr1);
+ return svcntp_b8 (p, p);
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+typedef svbool_t my_pred;
+
+/*
+** callee_pred:
+** ldr p0, \[x0\]
+** ret
+*/
+my_pred __attribute__((noipa))
+callee_pred (my_pred *ptr)
+{
+ return *ptr;
+}
+
+/*
+** caller_pred:
+** ...
+** bl callee_pred
+** cntp x0, p0, p0\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_pred (my_pred *ptr1)
+{
+ my_pred p;
+ p = callee_pred (ptr1);
+ return svcntp_b8 (p, p);
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, all
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, __SVInt8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, all
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, __SVUint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, all
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, __SVInt16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, all
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, __SVUint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, all
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, __SVFloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, all
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, __SVInt32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, all
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, __SVUint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, all
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, __SVFloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, all
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, __SVInt64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, all
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, __SVUint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, all
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, __SVFloat64_t)
+
+#include <arm_sve.h>
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, all
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, __SVInt8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, __SVUint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, all
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, __SVInt16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, __SVUint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, all
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, __SVFloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, all
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, __SVInt32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, __SVUint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, all
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, __SVFloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, __SVInt64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, __SVUint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, all
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, __SVFloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=1024 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl128
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, __SVInt8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl128
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, __SVUint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, __SVInt16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, __SVUint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, __SVFloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, __SVInt32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, __SVUint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, __SVFloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, __SVInt64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, __SVUint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, __SVFloat64_t)
+
+#include <arm_sve.h>
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, vl128
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, __SVInt8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, __SVUint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, vl128
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, __SVInt16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, __SVUint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, vl128
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, __SVFloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, vl128
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, __SVInt32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, __SVUint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, vl128
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, __SVFloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, __SVInt64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, __SVUint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, vl128
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, __SVFloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=2048 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl256
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, __SVInt8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl256
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, __SVUint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, __SVInt16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, __SVUint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, __SVFloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, __SVInt32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, __SVUint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, __SVFloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, __SVInt64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, __SVUint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, __SVFloat64_t)
+
+#include <arm_sve.h>
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, vl256
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, __SVInt8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, __SVUint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, vl256
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, __SVInt16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, __SVUint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, vl256
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, __SVFloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, vl256
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, __SVInt32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, __SVUint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, vl256
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, __SVFloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, __SVInt64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, __SVUint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, vl256
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, __SVFloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=256 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl32
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, __SVInt8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl32
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, __SVUint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, __SVInt16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, __SVUint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, __SVFloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, __SVInt32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, __SVUint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, __SVFloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, __SVInt64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, __SVUint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, __SVFloat64_t)
+
+#include <arm_sve.h>
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, vl32
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, __SVInt8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, __SVUint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, vl32
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, __SVInt16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, __SVUint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, vl32
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, __SVFloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, vl32
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, __SVInt32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, __SVUint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, vl32
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, __SVFloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, __SVInt64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, __SVUint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, vl32
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, __SVFloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=512 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl64
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, __SVInt8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl64
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, __SVUint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, __SVInt16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, __SVUint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, __SVFloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, __SVInt32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, __SVUint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, __SVFloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, __SVInt64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, __SVUint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, __SVFloat64_t)
+
+#include <arm_sve.h>
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, vl64
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, __SVInt8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, __SVUint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, vl64
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, __SVInt16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, __SVUint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, vl64
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, __SVFloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, vl64
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, __SVInt32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, __SVUint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, vl64
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, __SVFloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, __SVInt64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, __SVUint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, vl64
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, __SVFloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, all
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, all
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, all
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, all
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, all
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, all
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, all
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, all
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, all
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, all
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, all
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, all
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, all
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, all
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, all
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, all
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, all
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, all
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=1024 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl128
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl128
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, vl128
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, vl128
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, vl128
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, vl128
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, vl128
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, vl128
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, vl128
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=2048 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl256
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl256
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, vl256
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, vl256
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, vl256
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, vl256
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, vl256
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, vl256
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, vl256
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=256 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl32
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl32
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, vl32
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, vl32
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, vl32
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, vl32
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, vl32
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, vl32
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, vl32
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=512 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl64
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl64
+** ld1b z0\.b, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof (svaddv (svptrue_b8 (), *(TYPE *) 0)) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return svaddv (svptrue_b8 (), callee_##SUFFIX (ptr1)); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, vl64
+** saddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.b
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, vl64
+** saddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.h
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, vl64
+** faddv h0, \1, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, vl64
+** saddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.s
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, vl64
+** faddv s0, \1, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, vl64
+** uaddv (d[0-9]+), \1, z0\.d
+** fmov x0, \2
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, vl64
+** faddv d0, \1, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <stdint.h>
+
+typedef int8_t svint8_t __attribute__ ((vector_size (32)));
+typedef uint8_t svuint8_t __attribute__ ((vector_size (32)));
+
+typedef int16_t svint16_t __attribute__ ((vector_size (32)));
+typedef uint16_t svuint16_t __attribute__ ((vector_size (32)));
+typedef __fp16 svfloat16_t __attribute__ ((vector_size (32)));
+
+typedef int32_t svint32_t __attribute__ ((vector_size (32)));
+typedef uint32_t svuint32_t __attribute__ ((vector_size (32)));
+typedef float svfloat32_t __attribute__ ((vector_size (32)));
+
+typedef int64_t svint64_t __attribute__ ((vector_size (32)));
+typedef uint64_t svuint64_t __attribute__ ((vector_size (32)));
+typedef double svfloat64_t __attribute__ ((vector_size (32)));
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** (
+** ld1 ({v.*}), \[x0\]
+** st1 \1, \[x8\]
+** |
+** ldp (q[0-9]+, q[0-9]+), \[x0\]
+** stp \2, \[x8\]
+** )
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** (
+** ld1 ({v.*}), \[x0\]
+** st1 \1, \[x8\]
+** |
+** ldp (q[0-9]+, q[0-9]+), \[x0\]
+** stp \2, \[x8\]
+** )
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** (
+** ld1 ({v.*}), \[x0\]
+** st1 \1, \[x8\]
+** |
+** ldp (q[0-9]+, q[0-9]+), \[x0\]
+** stp \2, \[x8\]
+** )
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** (
+** ld1 ({v.*}), \[x0\]
+** st1 \1, \[x8\]
+** |
+** ldp (q[0-9]+, q[0-9]+), \[x0\]
+** stp \2, \[x8\]
+** )
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/* Currently we scalarize this. */
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** (
+** ld1 ({v.*}), \[x0\]
+** st1 \1, \[x8\]
+** |
+** ldp (q[0-9]+, q[0-9]+), \[x0\]
+** stp \2, \[x8\]
+** )
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** (
+** ld1 ({v.*}), \[x0\]
+** st1 \1, \[x8\]
+** |
+** ldp (q[0-9]+, q[0-9]+), \[x0\]
+** stp \2, \[x8\]
+** )
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/* Currently we scalarize this. */
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** (
+** ld1 ({v.*}), \[x0\]
+** st1 \1, \[x8\]
+** |
+** ldp (q[0-9]+, q[0-9]+), \[x0\]
+** stp \2, \[x8\]
+** )
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** (
+** ld1 ({v.*}), \[x0\]
+** st1 \1, \[x8\]
+** |
+** ldp (q[0-9]+, q[0-9]+), \[x0\]
+** stp \2, \[x8\]
+** )
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/* Currently we scalarize this. */
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ typeof ((*(TYPE *) 0)[0]) \
+ __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1) \
+ { \
+ return callee_##SUFFIX (ptr1)[0]; \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ldrb w0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ldrb w0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ldrh w0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ldrh w0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ldr h0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ldr w0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ldr w0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ldr s0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ldr x0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ldr x0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ldr d0, \[sp, 16\]
+** ldp x29, x30, \[sp\], 48
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=1024 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <stdint.h>
+
+typedef int8_t svint8_t __attribute__ ((vector_size (128)));
+typedef uint8_t svuint8_t __attribute__ ((vector_size (128)));
+
+typedef int16_t svint16_t __attribute__ ((vector_size (128)));
+typedef uint16_t svuint16_t __attribute__ ((vector_size (128)));
+typedef __fp16 svfloat16_t __attribute__ ((vector_size (128)));
+
+typedef int32_t svint32_t __attribute__ ((vector_size (128)));
+typedef uint32_t svuint32_t __attribute__ ((vector_size (128)));
+typedef float svfloat32_t __attribute__ ((vector_size (128)));
+
+typedef int64_t svint64_t __attribute__ ((vector_size (128)));
+typedef uint64_t svuint64_t __attribute__ ((vector_size (128)));
+typedef double svfloat64_t __attribute__ ((vector_size (128)));
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl128
+** ld1b z0\.b, \1/z, \[x0\]
+** st1b z0\.b, \1, \[x8\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl128
+** ld1b z0\.b, \1/z, \[x0\]
+** st1b z0\.b, \1, \[x8\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl128
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl128
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl128
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ void __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1, TYPE *ptr2) \
+ { \
+ *ptr2 = callee_##SUFFIX (ptr1); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[[^]]*\]
+** st1b \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[[^]]*\]
+** st1b \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=2048 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <stdint.h>
+
+typedef int8_t svint8_t __attribute__ ((vector_size (256)));
+typedef uint8_t svuint8_t __attribute__ ((vector_size (256)));
+
+typedef int16_t svint16_t __attribute__ ((vector_size (256)));
+typedef uint16_t svuint16_t __attribute__ ((vector_size (256)));
+typedef __fp16 svfloat16_t __attribute__ ((vector_size (256)));
+
+typedef int32_t svint32_t __attribute__ ((vector_size (256)));
+typedef uint32_t svuint32_t __attribute__ ((vector_size (256)));
+typedef float svfloat32_t __attribute__ ((vector_size (256)));
+
+typedef int64_t svint64_t __attribute__ ((vector_size (256)));
+typedef uint64_t svuint64_t __attribute__ ((vector_size (256)));
+typedef double svfloat64_t __attribute__ ((vector_size (256)));
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl256
+** ld1b z0\.b, \1/z, \[x0\]
+** st1b z0\.b, \1, \[x8\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl256
+** ld1b z0\.b, \1/z, \[x0\]
+** st1b z0\.b, \1, \[x8\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl256
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl256
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl256
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ void __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1, TYPE *ptr2) \
+ { \
+ *ptr2 = callee_##SUFFIX (ptr1); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[[^]]*\]
+** st1b \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[[^]]*\]
+** st1b \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=256 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <stdint.h>
+
+typedef int8_t svint8_t __attribute__ ((vector_size (32)));
+typedef uint8_t svuint8_t __attribute__ ((vector_size (32)));
+
+typedef int16_t svint16_t __attribute__ ((vector_size (32)));
+typedef uint16_t svuint16_t __attribute__ ((vector_size (32)));
+typedef __fp16 svfloat16_t __attribute__ ((vector_size (32)));
+
+typedef int32_t svint32_t __attribute__ ((vector_size (32)));
+typedef uint32_t svuint32_t __attribute__ ((vector_size (32)));
+typedef float svfloat32_t __attribute__ ((vector_size (32)));
+
+typedef int64_t svint64_t __attribute__ ((vector_size (32)));
+typedef uint64_t svuint64_t __attribute__ ((vector_size (32)));
+typedef double svfloat64_t __attribute__ ((vector_size (32)));
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl32
+** ld1b z0\.b, \1/z, \[x0\]
+** st1b z0\.b, \1, \[x8\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl32
+** ld1b z0\.b, \1/z, \[x0\]
+** st1b z0\.b, \1, \[x8\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl32
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl32
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl32
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ void __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1, TYPE *ptr2) \
+ { \
+ *ptr2 = callee_##SUFFIX (ptr1); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[[^]]*\]
+** st1b \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[[^]]*\]
+** st1b \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -msve-vector-bits=512 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <stdint.h>
+
+typedef int8_t svint8_t __attribute__ ((vector_size (64)));
+typedef uint8_t svuint8_t __attribute__ ((vector_size (64)));
+
+typedef int16_t svint16_t __attribute__ ((vector_size (64)));
+typedef uint16_t svuint16_t __attribute__ ((vector_size (64)));
+typedef __fp16 svfloat16_t __attribute__ ((vector_size (64)));
+
+typedef int32_t svint32_t __attribute__ ((vector_size (64)));
+typedef uint32_t svuint32_t __attribute__ ((vector_size (64)));
+typedef float svfloat32_t __attribute__ ((vector_size (64)));
+
+typedef int64_t svint64_t __attribute__ ((vector_size (64)));
+typedef uint64_t svuint64_t __attribute__ ((vector_size (64)));
+typedef double svfloat64_t __attribute__ ((vector_size (64)));
+
+#define CALLEE(SUFFIX, TYPE) \
+ TYPE __attribute__((noipa)) \
+ callee_##SUFFIX (TYPE *ptr) \
+ { \
+ return *ptr; \
+ }
+
+/*
+** callee_s8:
+** ptrue (p[0-7])\.b, vl64
+** ld1b z0\.b, \1/z, \[x0\]
+** st1b z0\.b, \1, \[x8\]
+** ret
+*/
+CALLEE (s8, svint8_t)
+
+/*
+** callee_u8:
+** ptrue (p[0-7])\.b, vl64
+** ld1b z0\.b, \1/z, \[x0\]
+** st1b z0\.b, \1, \[x8\]
+** ret
+*/
+CALLEE (u8, svuint8_t)
+
+/*
+** callee_s16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (s16, svint16_t)
+
+/*
+** callee_u16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (u16, svuint16_t)
+
+/*
+** callee_f16:
+** ptrue (p[0-7])\.b, vl64
+** ld1h z0\.h, \1/z, \[x0\]
+** st1h z0\.h, \1, \[x8\]
+** ret
+*/
+CALLEE (f16, svfloat16_t)
+
+/*
+** callee_s32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (s32, svint32_t)
+
+/*
+** callee_u32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (u32, svuint32_t)
+
+/*
+** callee_f32:
+** ptrue (p[0-7])\.b, vl64
+** ld1w z0\.s, \1/z, \[x0\]
+** st1w z0\.s, \1, \[x8\]
+** ret
+*/
+CALLEE (f32, svfloat32_t)
+
+/*
+** callee_s64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (s64, svint64_t)
+
+/*
+** callee_u64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (u64, svuint64_t)
+
+/*
+** callee_f64:
+** ptrue (p[0-7])\.b, vl64
+** ld1d z0\.d, \1/z, \[x0\]
+** st1d z0\.d, \1, \[x8\]
+** ret
+*/
+CALLEE (f64, svfloat64_t)
+
+#define CALLER(SUFFIX, TYPE) \
+ void __attribute__((noipa)) \
+ caller_##SUFFIX (TYPE *ptr1, TYPE *ptr2) \
+ { \
+ *ptr2 = callee_##SUFFIX (ptr1); \
+ }
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[[^]]*\]
+** st1b \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s8, svint8_t)
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[[^]]*\]
+** st1b \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u8, svuint8_t)
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s16, svint16_t)
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u16, svuint16_t)
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[[^]]*\]
+** st1h \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f16, svfloat16_t)
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s32, svint32_t)
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u32, svuint32_t)
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[[^]]*\]
+** st1w \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f32, svfloat32_t)
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (s64, svint64_t)
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (u64, svuint64_t)
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[[^]]*\]
+** st1d \1, \2, \[[^]]*\]
+** ...
+** ret
+*/
+CALLER (f64, svfloat64_t)
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee_s8:
+** mov z0\.b, #1
+** mov z1\.b, #2
+** ret
+*/
+svint8x2_t __attribute__((noipa))
+callee_s8 (void)
+{
+ return svcreate2 (svdup_s8 (1), svdup_s8 (2));
+}
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** trn1 z0\.b, z0\.b, z1\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint8_t __attribute__((noipa))
+caller_s8 (void)
+{
+ svint8x2_t res;
+ res = callee_s8 ();
+ return svtrn1 (svget2 (res, 0), svget2 (res, 1));
+}
+
+/*
+** callee_u8:
+** mov z0\.b, #3
+** mov z1\.b, #4
+** ret
+*/
+svuint8x2_t __attribute__((noipa))
+callee_u8 (void)
+{
+ return svcreate2 (svdup_u8 (3), svdup_u8 (4));
+}
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** trn2 z0\.b, z1\.b, z0\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint8_t __attribute__((noipa))
+caller_u8 (void)
+{
+ svuint8x2_t res;
+ res = callee_u8 ();
+ return svtrn2 (svget2 (res, 1), svget2 (res, 0));
+}
+
+/*
+** callee_s16:
+** mov z0\.h, #1
+** mov z1\.h, #2
+** ret
+*/
+svint16x2_t __attribute__((noipa))
+callee_s16 (void)
+{
+ return svcreate2 (svdup_s16 (1), svdup_s16 (2));
+}
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** trn1 z0\.h, z0\.h, z1\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint16_t __attribute__((noipa))
+caller_s16 (void)
+{
+ svint16x2_t res;
+ res = callee_s16 ();
+ return svtrn1 (svget2 (res, 0), svget2 (res, 1));
+}
+
+/*
+** callee_u16:
+** mov z0\.h, #3
+** mov z1\.h, #4
+** ret
+*/
+svuint16x2_t __attribute__((noipa))
+callee_u16 (void)
+{
+ return svcreate2 (svdup_u16 (3), svdup_u16 (4));
+}
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** trn2 z0\.h, z1\.h, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint16_t __attribute__((noipa))
+caller_u16 (void)
+{
+ svuint16x2_t res;
+ res = callee_u16 ();
+ return svtrn2 (svget2 (res, 1), svget2 (res, 0));
+}
+
+/*
+** callee_f16:
+** fmov z0\.h, #5\.0(?:e\+0)?
+** fmov z1\.h, #6\.0(?:e\+0)?
+** ret
+*/
+svfloat16x2_t __attribute__((noipa))
+callee_f16 (void)
+{
+ return svcreate2 (svdup_f16 (5), svdup_f16 (6));
+}
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** zip1 z0\.h, z1\.h, z0\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat16_t __attribute__((noipa))
+caller_f16 (void)
+{
+ svfloat16x2_t res;
+ res = callee_f16 ();
+ return svzip1 (svget2 (res, 1), svget2 (res, 0));
+}
+
+/*
+** callee_s32:
+** mov z0\.s, #1
+** mov z1\.s, #2
+** ret
+*/
+svint32x2_t __attribute__((noipa))
+callee_s32 (void)
+{
+ return svcreate2 (svdup_s32 (1), svdup_s32 (2));
+}
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** trn1 z0\.s, z0\.s, z1\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint32_t __attribute__((noipa))
+caller_s32 (void)
+{
+ svint32x2_t res;
+ res = callee_s32 ();
+ return svtrn1 (svget2 (res, 0), svget2 (res, 1));
+}
+
+/*
+** callee_u32:
+** mov z0\.s, #3
+** mov z1\.s, #4
+** ret
+*/
+svuint32x2_t __attribute__((noipa))
+callee_u32 (void)
+{
+ return svcreate2 (svdup_u32 (3), svdup_u32 (4));
+}
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** trn2 z0\.s, z1\.s, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint32_t __attribute__((noipa))
+caller_u32 (void)
+{
+ svuint32x2_t res;
+ res = callee_u32 ();
+ return svtrn2 (svget2 (res, 1), svget2 (res, 0));
+}
+
+/*
+** callee_f32:
+** fmov z0\.s, #5\.0(?:e\+0)?
+** fmov z1\.s, #6\.0(?:e\+0)?
+** ret
+*/
+svfloat32x2_t __attribute__((noipa))
+callee_f32 (void)
+{
+ return svcreate2 (svdup_f32 (5), svdup_f32 (6));
+}
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** zip1 z0\.s, z1\.s, z0\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat32_t __attribute__((noipa))
+caller_f32 (void)
+{
+ svfloat32x2_t res;
+ res = callee_f32 ();
+ return svzip1 (svget2 (res, 1), svget2 (res, 0));
+}
+
+/*
+** callee_s64:
+** mov z0\.d, #1
+** mov z1\.d, #2
+** ret
+*/
+svint64x2_t __attribute__((noipa))
+callee_s64 (void)
+{
+ return svcreate2 (svdup_s64 (1), svdup_s64 (2));
+}
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** trn1 z0\.d, z0\.d, z1\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint64_t __attribute__((noipa))
+caller_s64 (void)
+{
+ svint64x2_t res;
+ res = callee_s64 ();
+ return svtrn1 (svget2 (res, 0), svget2 (res, 1));
+}
+
+/*
+** callee_u64:
+** mov z0\.d, #3
+** mov z1\.d, #4
+** ret
+*/
+svuint64x2_t __attribute__((noipa))
+callee_u64 (void)
+{
+ return svcreate2 (svdup_u64 (3), svdup_u64 (4));
+}
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** trn2 z0\.d, z1\.d, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint64_t __attribute__((noipa))
+caller_u64 (void)
+{
+ svuint64x2_t res;
+ res = callee_u64 ();
+ return svtrn2 (svget2 (res, 1), svget2 (res, 0));
+}
+
+/*
+** callee_f64:
+** fmov z0\.d, #5\.0(?:e\+0)?
+** fmov z1\.d, #6\.0(?:e\+0)?
+** ret
+*/
+svfloat64x2_t __attribute__((noipa))
+callee_f64 (void)
+{
+ return svcreate2 (svdup_f64 (5), svdup_f64 (6));
+}
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** zip1 z0\.d, z1\.d, z0\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat64_t __attribute__((noipa))
+caller_f64 (void)
+{
+ svfloat64x2_t res;
+ res = callee_f64 ();
+ return svzip1 (svget2 (res, 1), svget2 (res, 0));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -frename-registers -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee_s8:
+** mov z0\.b, #1
+** mov z1\.b, #2
+** mov z2\.b, #3
+** ret
+*/
+svint8x3_t __attribute__((noipa))
+callee_s8 (void)
+{
+ return svcreate3 (svdup_s8 (1), svdup_s8 (2), svdup_s8 (3));
+}
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** ptrue (p[0-7])\.b, all
+** mad z0\.b, \1/m, z1\.b, z2\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint8_t __attribute__((noipa))
+caller_s8 (void)
+{
+ svint8x3_t res;
+ res = callee_s8 ();
+ return svmad_x (svptrue_b8 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_u8:
+** mov z0\.b, #4
+** mov z1\.b, #5
+** mov z2\.b, #6
+** ret
+*/
+svuint8x3_t __attribute__((noipa))
+callee_u8 (void)
+{
+ return svcreate3 (svdup_u8 (4), svdup_u8 (5), svdup_u8 (6));
+}
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** ptrue (p[0-7])\.b, all
+** msb z0\.b, \1/m, z1\.b, z2\.b
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint8_t __attribute__((noipa))
+caller_u8 (void)
+{
+ svuint8x3_t res;
+ res = callee_u8 ();
+ return svmsb_x (svptrue_b8 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_s16:
+** mov z0\.h, #1
+** mov z1\.h, #2
+** mov z2\.h, #3
+** ret
+*/
+svint16x3_t __attribute__((noipa))
+callee_s16 (void)
+{
+ return svcreate3 (svdup_s16 (1), svdup_s16 (2), svdup_s16 (3));
+}
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** ptrue (p[0-7])\.b, all
+** mls z0\.h, \1/m, z1\.h, z2\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint16_t __attribute__((noipa))
+caller_s16 (void)
+{
+ svint16x3_t res;
+ res = callee_s16 ();
+ return svmls_x (svptrue_b16 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_u16:
+** mov z0\.h, #4
+** mov z1\.h, #5
+** mov z2\.h, #6
+** ret
+*/
+svuint16x3_t __attribute__((noipa))
+callee_u16 (void)
+{
+ return svcreate3 (svdup_u16 (4), svdup_u16 (5), svdup_u16 (6));
+}
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** ptrue (p[0-7])\.b, all
+** mla z0\.h, \1/m, z1\.h, z2\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint16_t __attribute__((noipa))
+caller_u16 (void)
+{
+ svuint16x3_t res;
+ res = callee_u16 ();
+ return svmla_x (svptrue_b16 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_f16:
+** fmov z0\.h, #1\.0(?:e\+0)?
+** fmov z1\.h, #2\.0(?:e\+0)?
+** fmov z2\.h, #3\.0(?:e\+0)?
+** ret
+*/
+svfloat16x3_t __attribute__((noipa))
+callee_f16 (void)
+{
+ return svcreate3 (svdup_f16 (1), svdup_f16 (2), svdup_f16 (3));
+}
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** ptrue (p[0-7])\.b, all
+** fmla z0\.h, \1/m, z1\.h, z2\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat16_t __attribute__((noipa))
+caller_f16 (void)
+{
+ svfloat16x3_t res;
+ res = callee_f16 ();
+ return svmla_x (svptrue_b16 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_s32:
+** mov z0\.s, #1
+** mov z1\.s, #2
+** mov z2\.s, #3
+** ret
+*/
+svint32x3_t __attribute__((noipa))
+callee_s32 (void)
+{
+ return svcreate3 (svdup_s32 (1), svdup_s32 (2), svdup_s32 (3));
+}
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** ptrue (p[0-7])\.b, all
+** mad z0\.s, \1/m, z1\.s, z2\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint32_t __attribute__((noipa))
+caller_s32 (void)
+{
+ svint32x3_t res;
+ res = callee_s32 ();
+ return svmad_x (svptrue_b32 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_u32:
+** mov z0\.s, #4
+** mov z1\.s, #5
+** mov z2\.s, #6
+** ret
+*/
+svuint32x3_t __attribute__((noipa))
+callee_u32 (void)
+{
+ return svcreate3 (svdup_u32 (4), svdup_u32 (5), svdup_u32 (6));
+}
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** ptrue (p[0-7])\.b, all
+** msb z0\.s, \1/m, z1\.s, z2\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint32_t __attribute__((noipa))
+caller_u32 (void)
+{
+ svuint32x3_t res;
+ res = callee_u32 ();
+ return svmsb_x (svptrue_b32 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_f32:
+** fmov z0\.s, #1\.0(?:e\+0)?
+** fmov z1\.s, #2\.0(?:e\+0)?
+** fmov z2\.s, #3\.0(?:e\+0)?
+** ret
+*/
+svfloat32x3_t __attribute__((noipa))
+callee_f32 (void)
+{
+ return svcreate3 (svdup_f32 (1), svdup_f32 (2), svdup_f32 (3));
+}
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** ptrue (p[0-7])\.b, all
+** fmla z0\.s, \1/m, z1\.s, z2\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat32_t __attribute__((noipa))
+caller_f32 (void)
+{
+ svfloat32x3_t res;
+ res = callee_f32 ();
+ return svmla_x (svptrue_b32 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_s64:
+** mov z0\.d, #1
+** mov z1\.d, #2
+** mov z2\.d, #3
+** ret
+*/
+svint64x3_t __attribute__((noipa))
+callee_s64 (void)
+{
+ return svcreate3 (svdup_s64 (1), svdup_s64 (2), svdup_s64 (3));
+}
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** ptrue (p[0-7])\.b, all
+** mls z0\.d, \1/m, z1\.d, z2\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint64_t __attribute__((noipa))
+caller_s64 (void)
+{
+ svint64x3_t res;
+ res = callee_s64 ();
+ return svmls_x (svptrue_b64 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_u64:
+** mov z0\.d, #4
+** mov z1\.d, #5
+** mov z2\.d, #6
+** ret
+*/
+svuint64x3_t __attribute__((noipa))
+callee_u64 (void)
+{
+ return svcreate3 (svdup_u64 (4), svdup_u64 (5), svdup_u64 (6));
+}
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** ptrue (p[0-7])\.b, all
+** mla z0\.d, \1/m, z1\.d, z2\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint64_t __attribute__((noipa))
+caller_u64 (void)
+{
+ svuint64x3_t res;
+ res = callee_u64 ();
+ return svmla_x (svptrue_b64 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
+
+/*
+** callee_f64:
+** fmov z0\.d, #1\.0(?:e\+0)?
+** fmov z1\.d, #2\.0(?:e\+0)?
+** fmov z2\.d, #3\.0(?:e\+0)?
+** ret
+*/
+svfloat64x3_t __attribute__((noipa))
+callee_f64 (void)
+{
+ return svcreate3 (svdup_f64 (1), svdup_f64 (2), svdup_f64 (3));
+}
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** ptrue (p[0-7])\.b, all
+** fmla z0\.d, \1/m, z1\.d, z2\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat64_t __attribute__((noipa))
+caller_f64 (void)
+{
+ svfloat64x3_t res;
+ res = callee_f64 ();
+ return svmla_x (svptrue_b64 (),
+ svget3 (res, 0), svget3 (res, 1), svget3 (res, 2));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -frename-registers -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+/*
+** callee_s8:
+** mov z0\.b, #1
+** mov z1\.b, #2
+** mov z2\.b, #3
+** mov z3\.b, #4
+** ret
+*/
+svint8x4_t __attribute__((noipa))
+callee_s8 (void)
+{
+ return svcreate4 (svdup_s8 (1), svdup_s8 (2), svdup_s8 (3), svdup_s8 (4));
+}
+
+/*
+** caller_s8:
+** ...
+** bl callee_s8
+** add (z[2-7]\.b), z2\.b, z3\.b
+** ptrue (p[0-7])\.b, all
+** mla z0\.b, \2/m, (z1\.b, \1|\1, z1\.b)
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint8_t __attribute__((noipa))
+caller_s8 (void)
+{
+ svint8x4_t res;
+ res = callee_s8 ();
+ return svmla_x (svptrue_b8 (), svget4 (res, 0), svget4 (res, 1),
+ svadd_x (svptrue_b8 (),
+ svget4 (res, 2),
+ svget4 (res, 3)));
+}
+
+/*
+** callee_u8:
+** mov z0\.b, #4
+** mov z1\.b, #5
+** mov z2\.b, #6
+** mov z3\.b, #7
+** ret
+*/
+svuint8x4_t __attribute__((noipa))
+callee_u8 (void)
+{
+ return svcreate4 (svdup_u8 (4), svdup_u8 (5), svdup_u8 (6), svdup_u8 (7));
+}
+
+/*
+** caller_u8:
+** ...
+** bl callee_u8
+** sub (z[2-7]\.b), z2\.b, z3\.b
+** ptrue (p[0-7])\.b, all
+** mla z0\.b, \2/m, (z1\.b, \1|\1, z1\.b)
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint8_t __attribute__((noipa))
+caller_u8 (void)
+{
+ svuint8x4_t res;
+ res = callee_u8 ();
+ return svmla_x (svptrue_b8 (), svget4 (res, 0), svget4 (res, 1),
+ svsub_x (svptrue_b8 (),
+ svget4 (res, 2),
+ svget4 (res, 3)));
+}
+
+/*
+** callee_s16:
+** mov z0\.h, #1
+** mov z1\.h, #2
+** mov z2\.h, #3
+** mov z3\.h, #4
+** ret
+*/
+svint16x4_t __attribute__((noipa))
+callee_s16 (void)
+{
+ return svcreate4 (svdup_s16 (1), svdup_s16 (2),
+ svdup_s16 (3), svdup_s16 (4));
+}
+
+/*
+** caller_s16:
+** ...
+** bl callee_s16
+** add (z[2-7]\.h), z2\.h, z3\.h
+** ptrue (p[0-7])\.b, all
+** mad z0\.h, \2/m, (z1\.h, \1|\1, z1\.h)
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint16_t __attribute__((noipa))
+caller_s16 (void)
+{
+ svint16x4_t res;
+ res = callee_s16 ();
+ return svmad_x (svptrue_b16 (), svget4 (res, 0), svget4 (res, 1),
+ svadd_x (svptrue_b16 (),
+ svget4 (res, 2),
+ svget4 (res, 3)));
+}
+
+/*
+** callee_u16:
+** mov z0\.h, #4
+** mov z1\.h, #5
+** mov z2\.h, #6
+** mov z3\.h, #7
+** ret
+*/
+svuint16x4_t __attribute__((noipa))
+callee_u16 (void)
+{
+ return svcreate4 (svdup_u16 (4), svdup_u16 (5),
+ svdup_u16 (6), svdup_u16 (7));
+}
+
+/*
+** caller_u16:
+** ...
+** bl callee_u16
+** sub (z[2-7]\.h), z2\.h, z3\.h
+** ptrue (p[0-7])\.b, all
+** mad z0\.h, \2/m, (z1\.h, \1|\1, z1\.h)
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint16_t __attribute__((noipa))
+caller_u16 (void)
+{
+ svuint16x4_t res;
+ res = callee_u16 ();
+ return svmad_x (svptrue_b16 (), svget4 (res, 0), svget4 (res, 1),
+ svsub_x (svptrue_b16 (),
+ svget4 (res, 2),
+ svget4 (res, 3)));
+}
+
+/*
+** callee_f16:
+** fmov z0\.h, #1\.0(?:e\+0)?
+** fmov z1\.h, #2\.0(?:e\+0)?
+** fmov z2\.h, #3\.0(?:e\+0)?
+** fmov z3\.h, #4\.0(?:e\+0)?
+** ret
+*/
+svfloat16x4_t __attribute__((noipa))
+callee_f16 (void)
+{
+ return svcreate4 (svdup_f16 (1), svdup_f16 (2),
+ svdup_f16 (3), svdup_f16 (4));
+}
+
+/*
+** caller_f16:
+** ...
+** bl callee_f16
+** fadd (z[0-9]+\.h), z0\.h, z1\.h
+** fmul (z[0-9]+\.h), \1, z2\.h
+** fadd z0\.h, \2, z3\.h
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat16_t __attribute__((noipa))
+caller_f16 (void)
+{
+ svfloat16x4_t res;
+ res = callee_f16 ();
+ return svadd_x (svptrue_b16 (),
+ svmul_x (svptrue_b16 (),
+ svadd_x (svptrue_b16 (), svget4 (res, 0),
+ svget4 (res, 1)),
+ svget4 (res, 2)),
+ svget4 (res, 3));
+}
+
+/*
+** callee_s32:
+** mov z0\.s, #1
+** mov z1\.s, #2
+** mov z2\.s, #3
+** mov z3\.s, #4
+** ret
+*/
+svint32x4_t __attribute__((noipa))
+callee_s32 (void)
+{
+ return svcreate4 (svdup_s32 (1), svdup_s32 (2),
+ svdup_s32 (3), svdup_s32 (4));
+}
+
+/*
+** caller_s32:
+** ...
+** bl callee_s32
+** add (z[2-7]\.s), z2\.s, z3\.s
+** ptrue (p[0-7])\.b, all
+** msb z0\.s, \2/m, (z1\.s, \1|\1, z1\.s)
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint32_t __attribute__((noipa))
+caller_s32 (void)
+{
+ svint32x4_t res;
+ res = callee_s32 ();
+ return svmsb_x (svptrue_b32 (), svget4 (res, 0), svget4 (res, 1),
+ svadd_x (svptrue_b32 (),
+ svget4 (res, 2),
+ svget4 (res, 3)));
+}
+
+/*
+** callee_u32:
+** mov z0\.s, #4
+** mov z1\.s, #5
+** mov z2\.s, #6
+** mov z3\.s, #7
+** ret
+*/
+svuint32x4_t __attribute__((noipa))
+callee_u32 (void)
+{
+ return svcreate4 (svdup_u32 (4), svdup_u32 (5),
+ svdup_u32 (6), svdup_u32 (7));
+}
+
+/*
+** caller_u32:
+** ...
+** bl callee_u32
+** sub (z[2-7]\.s), z2\.s, z3\.s
+** ptrue (p[0-7])\.b, all
+** msb z0\.s, \2/m, (z1\.s, \1|\1, z1\.s)
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint32_t __attribute__((noipa))
+caller_u32 (void)
+{
+ svuint32x4_t res;
+ res = callee_u32 ();
+ return svmsb_x (svptrue_b32 (), svget4 (res, 0), svget4 (res, 1),
+ svsub_x (svptrue_b32 (),
+ svget4 (res, 2),
+ svget4 (res, 3)));
+}
+
+/*
+** callee_f32:
+** fmov z0\.s, #1\.0(?:e\+0)?
+** fmov z1\.s, #2\.0(?:e\+0)?
+** fmov z2\.s, #3\.0(?:e\+0)?
+** fmov z3\.s, #4\.0(?:e\+0)?
+** ret
+*/
+svfloat32x4_t __attribute__((noipa))
+callee_f32 (void)
+{
+ return svcreate4 (svdup_f32 (1), svdup_f32 (2),
+ svdup_f32 (3), svdup_f32 (4));
+}
+
+/*
+** caller_f32:
+** ...
+** bl callee_f32
+** fadd (z[0-9]+\.s), z0\.s, z1\.s
+** fmul (z[0-9]+\.s), \1, z2\.s
+** fadd z0\.s, \2, z3\.s
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat32_t __attribute__((noipa))
+caller_f32 (void)
+{
+ svfloat32x4_t res;
+ res = callee_f32 ();
+ return svadd_x (svptrue_b32 (),
+ svmul_x (svptrue_b32 (),
+ svadd_x (svptrue_b32 (), svget4 (res, 0),
+ svget4 (res, 1)),
+ svget4 (res, 2)),
+ svget4 (res, 3));
+}
+
+/*
+** callee_s64:
+** mov z0\.d, #1
+** mov z1\.d, #2
+** mov z2\.d, #3
+** mov z3\.d, #4
+** ret
+*/
+svint64x4_t __attribute__((noipa))
+callee_s64 (void)
+{
+ return svcreate4 (svdup_s64 (1), svdup_s64 (2),
+ svdup_s64 (3), svdup_s64 (4));
+}
+
+/*
+** caller_s64:
+** ...
+** bl callee_s64
+** add (z[2-7]\.d), z2\.d, z3\.d
+** ptrue (p[0-7])\.b, all
+** mls z0\.d, \2/m, (z1\.d, \1|\1, z1\.d)
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svint64_t __attribute__((noipa))
+caller_s64 (void)
+{
+ svint64x4_t res;
+ res = callee_s64 ();
+ return svmls_x (svptrue_b64 (), svget4 (res, 0), svget4 (res, 1),
+ svadd_x (svptrue_b64 (),
+ svget4 (res, 2),
+ svget4 (res, 3)));
+}
+
+/*
+** callee_u64:
+** mov z0\.d, #4
+** mov z1\.d, #5
+** mov z2\.d, #6
+** mov z3\.d, #7
+** ret
+*/
+svuint64x4_t __attribute__((noipa))
+callee_u64 (void)
+{
+ return svcreate4 (svdup_u64 (4), svdup_u64 (5),
+ svdup_u64 (6), svdup_u64 (7));
+}
+
+/*
+** caller_u64:
+** ...
+** bl callee_u64
+** sub (z[2-7]\.d), z2\.d, z3\.d
+** ptrue (p[0-7])\.b, all
+** mls z0\.d, \2/m, (z1\.d, \1|\1, z1\.d)
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svuint64_t __attribute__((noipa))
+caller_u64 (void)
+{
+ svuint64x4_t res;
+ res = callee_u64 ();
+ return svmls_x (svptrue_b64 (), svget4 (res, 0), svget4 (res, 1),
+ svsub_x (svptrue_b64 (),
+ svget4 (res, 2),
+ svget4 (res, 3)));
+}
+
+/*
+** callee_f64:
+** fmov z0\.d, #1\.0(?:e\+0)?
+** fmov z1\.d, #2\.0(?:e\+0)?
+** fmov z2\.d, #3\.0(?:e\+0)?
+** fmov z3\.d, #4\.0(?:e\+0)?
+** ret
+*/
+svfloat64x4_t __attribute__((noipa))
+callee_f64 (void)
+{
+ return svcreate4 (svdup_f64 (1), svdup_f64 (2),
+ svdup_f64 (3), svdup_f64 (4));
+}
+
+/*
+** caller_f64:
+** ...
+** bl callee_f64
+** fadd (z[0-9]+\.d), z0\.d, z1\.d
+** fmul (z[0-9]+\.d), \1, z2\.d
+** fadd z0\.d, \2, z3\.d
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svfloat64_t __attribute__((noipa))
+caller_f64 (void)
+{
+ svfloat64x4_t res;
+ res = callee_f64 ();
+ return svadd_x (svptrue_b64 (),
+ svmul_x (svptrue_b64 (),
+ svadd_x (svptrue_b64 (), svget4 (res, 0),
+ svget4 (res, 1)),
+ svget4 (res, 2)),
+ svget4 (res, 3));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-shrink-wrap -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p1\.b, all
+** st1d z8\.d, p1, \[sp, #1, mul vl\]
+** st1d z9\.d, p1, \[sp, #2, mul vl\]
+** st1d z10\.d, p1, \[sp, #3, mul vl\]
+** st1d z11\.d, p1, \[sp, #4, mul vl\]
+** st1d z12\.d, p1, \[sp, #5, mul vl\]
+** st1d z13\.d, p1, \[sp, #6, mul vl\]
+** st1d z14\.d, p1, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p1, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, all
+** ptrue p1\.b, all
+** ld1d z8\.d, p1/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p1/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p1/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p1/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p1/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p1/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p1/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p1/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, all
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** addvl sp, sp, #-6
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** ptrue p1\.b, all
+** st1d z8\.d, p1, \[sp, #1, mul vl\]
+** st1d z13\.d, p1, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, all
+** ptrue p1\.b, all
+** ld1d z8\.d, p1/z, \[sp, #1, mul vl\]
+** ld1d z13\.d, p1/z, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** addvl sp, sp, #6
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** addvl sp, sp, #-1
+** ptrue p1\.b, all
+** st1d z15\.d, p1, \[sp\]
+** ptrue p0\.b, all
+** ptrue p1\.b, all
+** ld1d z15\.d, p1/z, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** addvl sp, sp, #-2
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** st1d z15\.d, p4, \[sp, #1, mul vl\]
+** mov z0\.b, #1
+** ptrue p4\.b, all
+** ld1d z15\.d, p4/z, \[sp, #1, mul vl\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #2
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** addvl sp, sp, #-1
+** str z16, \[sp\]
+** ptrue p0\.b, all
+** ldr z16, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fshrink-wrap -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p1\.b, all
+** st1d z8\.d, p1, \[sp, #1, mul vl\]
+** st1d z9\.d, p1, \[sp, #2, mul vl\]
+** st1d z10\.d, p1, \[sp, #3, mul vl\]
+** st1d z11\.d, p1, \[sp, #4, mul vl\]
+** st1d z12\.d, p1, \[sp, #5, mul vl\]
+** st1d z13\.d, p1, \[sp, #6, mul vl\]
+** st1d z14\.d, p1, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p1, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, all
+** ptrue p1\.b, all
+** ld1d z8\.d, p1/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p1/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p1/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p1/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p1/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p1/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p1/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p1/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, all
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** addvl sp, sp, #-6
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** ptrue p1\.b, all
+** st1d z8\.d, p1, \[sp, #1, mul vl\]
+** st1d z13\.d, p1, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, all
+** ptrue p1\.b, all
+** ld1d z8\.d, p1/z, \[sp, #1, mul vl\]
+** ld1d z13\.d, p1/z, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** addvl sp, sp, #6
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** addvl sp, sp, #-1
+** ptrue p1\.b, all
+** st1d z15\.d, p1, \[sp\]
+** ptrue p0\.b, all
+** ptrue p1\.b, all
+** ld1d z15\.d, p1/z, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** addvl sp, sp, #-2
+** str p4, \[sp\]
+** ptrue p4\.b, all
+** st1d z15\.d, p4, \[sp, #1, mul vl\]
+** mov z0\.b, #1
+** ptrue p4\.b, all
+** ld1d z15\.d, p4/z, \[sp, #1, mul vl\]
+** ldr p4, \[sp\]
+** addvl sp, sp, #2
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** addvl sp, sp, #-1
+** str z16, \[sp\]
+** ptrue p0\.b, all
+** ldr z16, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-shrink-wrap -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, all
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, all
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** addvl sp, sp, #-6
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z13, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, all
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z13, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** addvl sp, sp, #6
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** addvl sp, sp, #-1
+** str z15, \[sp\]
+** ptrue p0\.b, all
+** ldr z15, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** addvl sp, sp, #-1
+** str z15, \[sp\]
+** mov z0\.b, #1
+** ldr z15, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** addvl sp, sp, #-1
+** str z16, \[sp\]
+** ptrue p0\.b, all
+** ldr z16, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fshrink-wrap -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, all
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, all
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** addvl sp, sp, #-6
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z13, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, all
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z13, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** addvl sp, sp, #6
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** addvl sp, sp, #-1
+** str p4, \[sp\]
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** addvl sp, sp, #-1
+** str z15, \[sp\]
+** ptrue p0\.b, all
+** ldr z15, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** addvl sp, sp, #-1
+** str z15, \[sp\]
+** mov z0\.b, #1
+** ldr z15, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** addvl sp, sp, #-1
+** str z16, \[sp\]
+** ptrue p0\.b, all
+** ldr z16, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-shrink-wrap -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void standard_callee (void);
+__attribute__((aarch64_vector_pcs)) void vpcs_callee (void);
+
+/*
+** calls_standard:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** bl standard_callee
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void calls_standard (__SVInt8_t x) { standard_callee (); }
+
+/*
+** calls_vpcs:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** bl vpcs_callee
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void calls_vpcs (__SVInt8_t x) { vpcs_callee (); }
+
+/*
+** calls_standard_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** blr x0
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void
+calls_standard_ptr (__SVInt8_t x, void (*fn) (void))
+{
+ fn ();
+}
+
+/*
+** calls_vpcs_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** blr x0
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void
+calls_vpcs_ptr (__SVInt8_t x,
+ void (*__attribute__((aarch64_vector_pcs)) fn) (void))
+{
+ fn ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fshrink-wrap -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void standard_callee (void);
+__attribute__((aarch64_vector_pcs)) void vpcs_callee (void);
+
+/*
+** calls_standard:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** bl standard_callee
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void calls_standard (__SVInt8_t x) { standard_callee (); }
+
+/*
+** calls_vpcs:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** bl vpcs_callee
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void calls_vpcs (__SVInt8_t x) { vpcs_callee (); }
+
+/*
+** calls_standard_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** blr x0
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void
+calls_standard_ptr (__SVInt8_t x, void (*fn) (void))
+{
+ fn ();
+}
+
+/*
+** calls_vpcs_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** blr x0
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void
+calls_vpcs_ptr (__SVInt8_t x,
+ void (*__attribute__((aarch64_vector_pcs)) fn) (void))
+{
+ fn ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-shrink-wrap -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void standard_callee (void);
+__attribute__((aarch64_vector_pcs)) void vpcs_callee (void);
+
+/*
+** calls_standard:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** bl standard_callee
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void calls_standard (__SVInt8_t x) { standard_callee (); }
+
+/*
+** calls_vpcs:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** bl vpcs_callee
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void calls_vpcs (__SVInt8_t x) { vpcs_callee (); }
+
+/*
+** calls_standard_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** blr x0
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void
+calls_standard_ptr (__SVInt8_t x, void (*fn) (void))
+{
+ fn ();
+}
+
+/*
+** calls_vpcs_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** blr x0
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void
+calls_vpcs_ptr (__SVInt8_t x,
+ void (*__attribute__((aarch64_vector_pcs)) fn) (void))
+{
+ fn ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fshrink-wrap -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void standard_callee (void);
+__attribute__((aarch64_vector_pcs)) void vpcs_callee (void);
+
+/*
+** calls_standard:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** bl standard_callee
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void calls_standard (__SVInt8_t x) { standard_callee (); }
+
+/*
+** calls_vpcs:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** bl vpcs_callee
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void calls_vpcs (__SVInt8_t x) { vpcs_callee (); }
+
+/*
+** calls_standard_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** blr x0
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void
+calls_standard_ptr (__SVInt8_t x, void (*fn) (void))
+{
+ fn ();
+}
+
+/*
+** calls_vpcs_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** blr x0
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+void
+calls_vpcs_ptr (__SVInt8_t x,
+ void (*__attribute__((aarch64_vector_pcs)) fn) (void))
+{
+ fn ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+
+int sve_callee (svint8_t);
+
+/*
+** standard_caller:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** mov z0\.b, #1
+** bl sve_callee
+** add w0, w0, #?1
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+int standard_caller (void) { return sve_callee (svdup_s8 (1)) + 1; }
+
+/*
+** vpcs_caller:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** mov z0\.b, #1
+** bl sve_callee
+** add w0, w0, #?1
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+__attribute__((aarch64_vector_pcs))
+int vpcs_caller (void) { return sve_callee (svdup_s8 (1)) + 1; }
+
+/*
+** sve_caller:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** mov z0\.b, #1
+** bl sve_callee
+** add w0, w0, #?1
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+int sve_caller (svbool_t p0) { return sve_callee (svdup_s8 (1)) + 1; }
+
+/*
+** standard_caller_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** mov z0\.h, #1
+** blr x0
+** add w0, w0, #?1
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+int
+standard_caller_ptr (int (*fn) (__SVInt16_t))
+{
+ return fn (svdup_s16 (1)) + 1;
+}
+
+/*
+** vpcs_caller_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** mov z0\.h, #1
+** blr x0
+** add w0, w0, #?1
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+int __attribute__((aarch64_vector_pcs))
+vpcs_caller_ptr (int (*fn) (__SVInt16_t))
+{
+ return fn (svdup_s16 (1)) + 1;
+}
+
+/*
+** sve_caller_ptr:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** mov z0\.h, #1
+** blr x0
+** add w0, w0, #?1
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+int
+sve_caller_ptr (svbool_t pg, int (*fn) (svint16_t))
+{
+ return fn (svdup_s16 (1)) + 1;
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void standard_callee (__SVInt8_t *);
+
+/*
+** calls_standard:
+** addvl sp, sp, #-1
+** (
+** stp x29, x30, \[sp, -16\]!
+** |
+** sub sp, sp, #?16
+** stp x29, x30, \[sp\]
+** )
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** addvl x0, sp, #17
+** add x0, x0, #?16
+** bl standard_callee
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** (
+** ldp x29, x30, \[sp\], 16
+** addvl sp, sp, #1
+** |
+** ldp x29, x30, \[sp\]
+** addvl sp, sp, #1
+** add sp, sp, #?16
+** )
+** ret
+*/
+void calls_standard (__SVInt8_t x) { __SVInt8_t tmp; standard_callee (&tmp); }
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void standard_callee (__SVInt8_t *);
+
+/*
+** calls_standard:
+** addvl sp, sp, #-1
+** (
+** stp x29, x30, \[sp, -16\]!
+** |
+** sub sp, sp, #?16
+** stp x29, x30, \[sp\]
+** )
+** mov x29, sp
+** addvl sp, sp, #-17
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** addvl x0, sp, #17
+** add x0, x0, #?16
+** bl standard_callee
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** (
+** ldp x29, x30, \[sp\], 16
+** addvl sp, sp, #1
+** |
+** ldp x29, x30, \[sp\]
+** addvl sp, sp, #1
+** add sp, sp, #?16
+** )
+** ret
+*/
+void calls_standard (__SVInt8_t x) { __SVInt8_t tmp; standard_callee (&tmp); }
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mbig-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void standard_callee (void);
+
+/*
+** calls_standard:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** ptrue p0\.b, all
+** st1d z8\.d, p0, \[sp, #1, mul vl\]
+** st1d z9\.d, p0, \[sp, #2, mul vl\]
+** st1d z10\.d, p0, \[sp, #3, mul vl\]
+** st1d z11\.d, p0, \[sp, #4, mul vl\]
+** st1d z12\.d, p0, \[sp, #5, mul vl\]
+** st1d z13\.d, p0, \[sp, #6, mul vl\]
+** st1d z14\.d, p0, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** st1d z15\.d, p0, \[x11, #-8, mul vl\]
+** cbnz w0, \.L[0-9]+
+** ptrue p0\.b, all
+** ld1d z8\.d, p0/z, \[sp, #1, mul vl\]
+** ld1d z9\.d, p0/z, \[sp, #2, mul vl\]
+** ld1d z10\.d, p0/z, \[sp, #3, mul vl\]
+** ld1d z11\.d, p0/z, \[sp, #4, mul vl\]
+** ld1d z12\.d, p0/z, \[sp, #5, mul vl\]
+** ld1d z13\.d, p0/z, \[sp, #6, mul vl\]
+** ld1d z14\.d, p0/z, \[sp, #7, mul vl\]
+** addvl x11, sp, #16
+** ld1d z15\.d, p0/z, \[x11, #-8, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+** ...
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** bl standard_callee
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** b \.L[0-9]+
+*/
+void
+calls_standard (__SVInt8_t x, int y)
+{
+ asm volatile ("" ::: "z8");
+ if (__builtin_expect (y, 0))
+ standard_callee ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+void standard_callee (void);
+
+/*
+** calls_standard:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** addvl sp, sp, #-17
+** str z8, \[sp, #1, mul vl\]
+** cbnz w0, \.L[0-9]+
+** ldr z8, \[sp, #1, mul vl\]
+** addvl sp, sp, #17
+** ldp x29, x30, \[sp\], 16
+** ret
+** ...
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** bl standard_callee
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** b \.L[0-9]+
+*/
+void
+calls_standard (__SVInt8_t x, int y)
+{
+ asm volatile ("" ::: "z8");
+ if (__builtin_expect (y, 0))
+ standard_callee ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fshrink-wrap -fstack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** cntb x12
+** mov x13, #?17
+** mul x12, x12, x13
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, all
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** addvl sp, sp, #17
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, all
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** cntb x12, all, mul #6
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z13, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, all
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z13, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** addvl sp, sp, #6
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** cntb x12
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** str p4, \[sp\]
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** cntb x12
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** str z15, \[sp\]
+** ptrue p0\.b, all
+** ldr z15, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** cntb x12
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** str z15, \[sp\]
+** mov z0\.b, #1
+** ldr z15, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** cntb x12
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** str z16, \[sp\]
+** ptrue p0\.b, all
+** ldr z16, \[sp\]
+** addvl sp, sp, #1
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fshrink-wrap -fstack-clash-protection -msve-vector-bits=1024 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** sub sp, sp, #2176
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, vl128
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** add sp, sp, #?2176
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, vl128
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** sub sp, sp, #768
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z13, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, vl128
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z13, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** add sp, sp, #?768
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** sub sp, sp, #128
+** str p4, \[sp\]
+** ptrue p0\.b, vl128
+** ldr p4, \[sp\]
+** add sp, sp, #?128
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** sub sp, sp, #128
+** str z15, \[sp\]
+** ptrue p0\.b, vl128
+** ldr z15, \[sp\]
+** add sp, sp, #?128
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** sub sp, sp, #128
+** str z15, \[sp\]
+** mov z0\.b, #1
+** ldr z15, \[sp\]
+** add sp, sp, #?128
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** sub sp, sp, #128
+** str z16, \[sp\]
+** ptrue p0\.b, vl128
+** ldr z16, \[sp\]
+** add sp, sp, #?128
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fshrink-wrap -fstack-clash-protection -msve-vector-bits=2048 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** mov x12, #?4352
+** sub sp, sp, x12
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, vl256
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, vl256
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** sub sp, sp, #1536
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z13, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, vl256
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z13, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** add sp, sp, #?1536
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** sub sp, sp, #256
+** str p4, \[sp\]
+** ptrue p0\.b, vl256
+** ldr p4, \[sp\]
+** add sp, sp, #?256
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** sub sp, sp, #256
+** str z15, \[sp\]
+** ptrue p0\.b, vl256
+** ldr z15, \[sp\]
+** add sp, sp, #?256
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** sub sp, sp, #256
+** str z15, \[sp\]
+** mov z0\.b, #1
+** ldr z15, \[sp\]
+** add sp, sp, #?256
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** sub sp, sp, #256
+** str z16, \[sp\]
+** ptrue p0\.b, vl256
+** ldr z16, \[sp\]
+** add sp, sp, #?256
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fshrink-wrap -fstack-clash-protection -msve-vector-bits=256 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** sub sp, sp, #544
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, vl32
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** add sp, sp, #?544
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, vl32
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** sub sp, sp, #192
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z13, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, vl32
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z13, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** add sp, sp, #?192
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** sub sp, sp, #32
+** str p4, \[sp\]
+** ptrue p0\.b, vl32
+** ldr p4, \[sp\]
+** add sp, sp, #?32
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** sub sp, sp, #32
+** str z15, \[sp\]
+** ptrue p0\.b, vl32
+** ldr z15, \[sp\]
+** add sp, sp, #?32
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** sub sp, sp, #32
+** str z15, \[sp\]
+** mov z0\.b, #1
+** ldr z15, \[sp\]
+** add sp, sp, #?32
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** sub sp, sp, #32
+** str z16, \[sp\]
+** ptrue p0\.b, vl32
+** ldr z16, \[sp\]
+** add sp, sp, #?32
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -mlittle-endian -fshrink-wrap -fstack-clash-protection -msve-vector-bits=512 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** sub sp, sp, #1088
+** str p4, \[sp\]
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** str p7, \[sp, #3, mul vl\]
+** str p8, \[sp, #4, mul vl\]
+** str p9, \[sp, #5, mul vl\]
+** str p10, \[sp, #6, mul vl\]
+** str p11, \[sp, #7, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z9, \[sp, #2, mul vl\]
+** str z10, \[sp, #3, mul vl\]
+** str z11, \[sp, #4, mul vl\]
+** str z12, \[sp, #5, mul vl\]
+** str z13, \[sp, #6, mul vl\]
+** str z14, \[sp, #7, mul vl\]
+** str z15, \[sp, #8, mul vl\]
+** str z16, \[sp, #9, mul vl\]
+** str z17, \[sp, #10, mul vl\]
+** str z18, \[sp, #11, mul vl\]
+** str z19, \[sp, #12, mul vl\]
+** str z20, \[sp, #13, mul vl\]
+** str z21, \[sp, #14, mul vl\]
+** str z22, \[sp, #15, mul vl\]
+** str z23, \[sp, #16, mul vl\]
+** ptrue p0\.b, vl64
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z9, \[sp, #2, mul vl\]
+** ldr z10, \[sp, #3, mul vl\]
+** ldr z11, \[sp, #4, mul vl\]
+** ldr z12, \[sp, #5, mul vl\]
+** ldr z13, \[sp, #6, mul vl\]
+** ldr z14, \[sp, #7, mul vl\]
+** ldr z15, \[sp, #8, mul vl\]
+** ldr z16, \[sp, #9, mul vl\]
+** ldr z17, \[sp, #10, mul vl\]
+** ldr z18, \[sp, #11, mul vl\]
+** ldr z19, \[sp, #12, mul vl\]
+** ldr z20, \[sp, #13, mul vl\]
+** ldr z21, \[sp, #14, mul vl\]
+** ldr z22, \[sp, #15, mul vl\]
+** ldr z23, \[sp, #16, mul vl\]
+** ldr p4, \[sp\]
+** ldr p5, \[sp, #1, mul vl\]
+** ldr p6, \[sp, #2, mul vl\]
+** ldr p7, \[sp, #3, mul vl\]
+** ldr p8, \[sp, #4, mul vl\]
+** ldr p9, \[sp, #5, mul vl\]
+** ldr p10, \[sp, #6, mul vl\]
+** ldr p11, \[sp, #7, mul vl\]
+** add sp, sp, #?1088
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z8", "z9", "z10", "z11", "z12", "z13", "z14", "z15",
+ "z16", "z17", "z18", "z19", "z20", "z21", "z22", "z23",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7",
+ "p8", "p9", "p10", "p11", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** ptrue p0\.b, vl64
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ asm volatile ("" :::
+ "z0", "z1", "z2", "z3", "z4", "z5", "z6", "z7",
+ "z24", "z25", "z26", "z27", "z28", "z29", "z30", "z31",
+ "p0", "p1", "p2", "p3", "p12", "p13", "p14", "p15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** sub sp, sp, #384
+** str p5, \[sp\]
+** str p6, \[sp, #1, mul vl\]
+** str p11, \[sp, #2, mul vl\]
+** str z8, \[sp, #1, mul vl\]
+** str z13, \[sp, #2, mul vl\]
+** str z19, \[sp, #3, mul vl\]
+** str z20, \[sp, #4, mul vl\]
+** str z22, \[sp, #5, mul vl\]
+** ptrue p0\.b, vl64
+** ldr z8, \[sp, #1, mul vl\]
+** ldr z13, \[sp, #2, mul vl\]
+** ldr z19, \[sp, #3, mul vl\]
+** ldr z20, \[sp, #4, mul vl\]
+** ldr z22, \[sp, #5, mul vl\]
+** ldr p5, \[sp\]
+** ldr p6, \[sp, #1, mul vl\]
+** ldr p11, \[sp, #2, mul vl\]
+** add sp, sp, #?384
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ asm volatile ("" :::
+ "z8", "z13", "z19", "z20", "z22",
+ "p5", "p6", "p11");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** sub sp, sp, #64
+** str p4, \[sp\]
+** ptrue p0\.b, vl64
+** ldr p4, \[sp\]
+** add sp, sp, #?64
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_5:
+** sub sp, sp, #64
+** str z15, \[sp\]
+** ptrue p0\.b, vl64
+** ldr z15, \[sp\]
+** add sp, sp, #?64
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ asm volatile ("" ::: "z15");
+ return svptrue_b8 ();
+}
+
+/*
+** test_6:
+** sub sp, sp, #64
+** str z15, \[sp\]
+** mov z0\.b, #1
+** ldr z15, \[sp\]
+** add sp, sp, #?64
+** ret
+*/
+svint8_t
+test_6 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t p3)
+{
+ asm volatile ("" :: "Upa" (p0), "Upa" (p1), "Upa" (p2), "Upa" (p3) : "z15");
+ return svdup_s8 (1);
+}
+
+/*
+** test_7:
+** sub sp, sp, #64
+** str z16, \[sp\]
+** ptrue p0\.b, vl64
+** ldr z16, \[sp\]
+** add sp, sp, #?64
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ asm volatile ("" ::: "z16");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fshrink-wrap -fstack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+svbool_t take_stack_args (volatile void *, void *, int, int, int,
+ int, int, int, int);
+
+/*
+** test_1:
+** cntb x12
+** add x12, x12, #?16
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** add sp, sp, #?16
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** stp x24, x25, \[sp, -48\]!
+** str x26, \[sp, 16\]
+** cntb x13
+** mov x11, sp
+** ...
+** sub sp, sp, x13
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldr x26, \[sp, 16\]
+** ldp x24, x25, \[sp\], 48
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** cntb x12
+** mov x13, #?4128
+** add x12, x12, x13
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** addvl x11, sp, #1
+** stp x24, x25, \[x11\]
+** str x26, \[x11, 16\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldp x24, x25, \[sp\]
+** ldr x26, \[sp, 16\]
+** mov x12, #?4128
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ volatile int x[1024];
+ asm volatile ("" :: "r" (x) : "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** cntb x12, all, mul #2
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #2
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4");
+ return svptrue_b16 ();
+}
+
+/*
+** test_5:
+** cntb x12, all, mul #2
+** add x12, x12, #?32
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** addvl x11, sp, #1
+** stp x24, x25, \[x11\]
+** str x26, \[x11, 16\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldp x24, x25, \[sp\]
+** ldr x26, \[sp, 16\]
+** addvl sp, sp, #1
+** add sp, sp, #?32
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b16 ();
+}
+
+/*
+** test_6:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** cntb x13
+** mov x11, sp
+** ...
+** sub sp, sp, x13
+** str p4, \[sp\]
+** sub sp, sp, #?16
+** ...
+** ptrue p0\.b, all
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svbool_t
+test_6 (void)
+{
+ take_stack_args (0, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_7:
+** cntb x12
+** mov x13, #?4112
+** add x12, x12, x13
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** addvl x11, sp, #1
+** stp x29, x30, \[x11\]
+** addvl x29, sp, #1
+** str p4, \[sp\]
+** sub sp, sp, #?16
+** ...
+** ptrue p0\.b, all
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_8:
+** cntb x12
+** mov x13, #?4144
+** add x12, x12, x13
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** addvl x11, sp, #1
+** stp x29, x30, \[x11\]
+** addvl x29, sp, #1
+** stp x24, x25, \[x29, 16\]
+** str x26, \[x29, 32\]
+** str p4, \[sp\]
+** sub sp, sp, #?16
+** ...
+** ptrue p0\.b, all
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_8 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_9:
+** cntb x12
+** mov x13, #?4112
+** add x12, x12, x13
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** addvl x11, sp, #1
+** stp x29, x30, \[x11\]
+** addvl x29, sp, #1
+** str p4, \[sp\]
+** sub sp, sp, #?16
+** ...
+** ptrue p0\.b, all
+** addvl sp, x29, #-1
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_9 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_10:
+** cntb x12
+** mov x13, #?4144
+** add x12, x12, x13
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** addvl x11, sp, #1
+** stp x29, x30, \[x11\]
+** addvl x29, sp, #1
+** stp x24, x25, \[x29, 16\]
+** str x26, \[x29, 32\]
+** str p4, \[sp\]
+** sub sp, sp, #?16
+** ...
+** ptrue p0\.b, all
+** addvl sp, x29, #-1
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_10 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_11:
+** cntb x12
+** add x12, x12, #?3008
+** add x12, x12, #?126976
+** mov x11, sp
+** ...
+** sub sp, sp, x12
+** addvl x11, sp, #1
+** stp x29, x30, \[x11\]
+** addvl x29, sp, #1
+** stp x24, x25, \[x29, 16\]
+** str x26, \[x29, 32\]
+** str p4, \[sp\]
+** sub sp, sp, #?16
+** ...
+** ptrue p0\.b, all
+** addvl sp, x29, #-1
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** add sp, sp, #?3008
+** add sp, sp, #?126976
+** ret
+*/
+svbool_t
+test_11 (int n)
+{
+ volatile int x[0x7ee4];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fshrink-wrap -fstack-clash-protection -msve-vector-bits=1024 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+svbool_t take_stack_args (volatile void *, void *, int, int, int,
+ int, int, int, int);
+
+/*
+** test_1:
+** sub sp, sp, #144
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl128
+** ldr p4, \[sp\]
+** add sp, sp, #?144
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** sub sp, sp, #176
+** stp x24, x25, \[sp, 128\]
+** str x26, \[sp, 144\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl128
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 128\]
+** ldr x26, \[sp, 144\]
+** add sp, sp, #?176
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** mov x12, #?4256
+** sub sp, sp, x12
+** stp x24, x25, \[sp, 128\]
+** str x26, \[sp, 144\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl128
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 128\]
+** ldr x26, \[sp, 144\]
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ volatile int x[1024];
+ asm volatile ("" :: "r" (x) : "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** sub sp, sp, #256
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, vl64
+** ldr p4, \[sp\]
+** add sp, sp, #?256
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4");
+ return svptrue_b16 ();
+}
+
+/*
+** test_5:
+** sub sp, sp, #288
+** stp x24, x25, \[sp, 128\]
+** str x26, \[sp, 144\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, vl64
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 128\]
+** ldr x26, \[sp, 144\]
+** add sp, sp, #?288
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b16 ();
+}
+
+/*
+** test_6:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** sub sp, sp, #128
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl128
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?128
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svbool_t
+test_6 (void)
+{
+ take_stack_args (0, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_7:
+** mov x12, #?4240
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 128\]
+** add x29, sp, #?128
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl128
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?128
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_8:
+** mov x12, #?4272
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 128\]
+** add x29, sp, #?128
+** stp x24, x25, \[sp, 144\]
+** str x26, \[sp, 160\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl128
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?128
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_8 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_9:
+** mov x12, #?4240
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 128\]
+** add x29, sp, #?128
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl128
+** sub sp, x29, #128
+** ldr p4, \[sp\]
+** add sp, sp, #?128
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_9 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_10:
+** mov x12, #?4272
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 128\]
+** add x29, sp, #?128
+** stp x24, x25, \[sp, 144\]
+** str x26, \[sp, 160\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl128
+** sub sp, x29, #128
+** ldr p4, \[sp\]
+** add sp, sp, #?128
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_10 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_11:
+** sub sp, sp, #65536
+** str xzr, \[sp, 1024\]
+** mov x12, #?64576
+** sub sp, sp, x12
+** str xzr, \[sp, 1024\]
+** stp x29, x30, \[sp, 128\]
+** add x29, sp, #?128
+** stp x24, x25, \[sp, 144\]
+** str x26, \[sp, 160\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl128
+** sub sp, x29, #128
+** ldr p4, \[sp\]
+** add sp, sp, #?128
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** add sp, sp, #?3008
+** add sp, sp, #?126976
+** ret
+*/
+svbool_t
+test_11 (int n)
+{
+ volatile int x[0x7ee4];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fshrink-wrap -fstack-clash-protection -msve-vector-bits=2048 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+svbool_t take_stack_args (volatile void *, void *, int, int, int,
+ int, int, int, int);
+
+/*
+** test_1:
+** sub sp, sp, #272
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl256
+** ldr p4, \[sp\]
+** add sp, sp, #?272
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** sub sp, sp, #304
+** stp x24, x25, \[sp, 256\]
+** str x26, \[sp, 272\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl256
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 256\]
+** ldr x26, \[sp, 272\]
+** add sp, sp, #?304
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** mov x12, #?4384
+** sub sp, sp, x12
+** stp x24, x25, \[sp, 256\]
+** str x26, \[sp, 272\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl256
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 256\]
+** ldr x26, \[sp, 272\]
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ volatile int x[1024];
+ asm volatile ("" :: "r" (x) : "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** sub sp, sp, #512
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, vl128
+** ldr p4, \[sp\]
+** add sp, sp, #?512
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4");
+ return svptrue_b16 ();
+}
+
+/*
+** test_5:
+** sub sp, sp, #544
+** stp x24, x25, \[sp, 256\]
+** str x26, \[sp, 272\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, vl128
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 256\]
+** ldr x26, \[sp, 272\]
+** add sp, sp, #?544
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b16 ();
+}
+
+/*
+** test_6:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** sub sp, sp, #256
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl256
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?256
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svbool_t
+test_6 (void)
+{
+ take_stack_args (0, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_7:
+** mov x12, #?4368
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 256\]
+** add x29, sp, #?256
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl256
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?256
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_8:
+** mov x12, #?4400
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 256\]
+** add x29, sp, #?256
+** stp x24, x25, \[sp, 272\]
+** str x26, \[sp, 288\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl256
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?256
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_8 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_9:
+** mov x12, #?4368
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 256\]
+** add x29, sp, #?256
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl256
+** sub sp, x29, #256
+** ldr p4, \[sp\]
+** add sp, sp, #?256
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_9 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_10:
+** mov x12, #?4400
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 256\]
+** add x29, sp, #?256
+** stp x24, x25, \[sp, 272\]
+** str x26, \[sp, 288\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl256
+** sub sp, x29, #256
+** ldr p4, \[sp\]
+** add sp, sp, #?256
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_10 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_11:
+** sub sp, sp, #65536
+** str xzr, \[sp, 1024\]
+** mov x12, #?64704
+** sub sp, sp, x12
+** str xzr, \[sp, 1024\]
+** stp x29, x30, \[sp, 256\]
+** add x29, sp, #?256
+** stp x24, x25, \[sp, 272\]
+** str x26, \[sp, 288\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl256
+** sub sp, x29, #256
+** ldr p4, \[sp\]
+** add sp, sp, #?256
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** add sp, sp, #?3008
+** add sp, sp, #?126976
+** ret
+*/
+svbool_t
+test_11 (int n)
+{
+ volatile int x[0x7ee4];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fshrink-wrap -fstack-clash-protection -msve-vector-bits=256 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+svbool_t take_stack_args (volatile void *, void *, int, int, int,
+ int, int, int, int);
+
+/*
+** test_1:
+** sub sp, sp, #48
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl32
+** ldr p4, \[sp\]
+** add sp, sp, #?48
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** sub sp, sp, #80
+** stp x24, x25, \[sp, 32\]
+** str x26, \[sp, 48\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl32
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 32\]
+** ldr x26, \[sp, 48\]
+** add sp, sp, #?80
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** mov x12, #?4160
+** sub sp, sp, x12
+** stp x24, x25, \[sp, 32\]
+** str x26, \[sp, 48\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl32
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 32\]
+** ldr x26, \[sp, 48\]
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ volatile int x[1024];
+ asm volatile ("" :: "r" (x) : "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** sub sp, sp, #64
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, vl16
+** ldr p4, \[sp\]
+** add sp, sp, #?64
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4");
+ return svptrue_b16 ();
+}
+
+/*
+** test_5:
+** sub sp, sp, #96
+** stp x24, x25, \[sp, 32\]
+** str x26, \[sp, 48\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, vl16
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 32\]
+** ldr x26, \[sp, 48\]
+** add sp, sp, #?96
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b16 ();
+}
+
+/*
+** test_6:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** sub sp, sp, #32
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl32
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?32
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svbool_t
+test_6 (void)
+{
+ take_stack_args (0, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_7:
+** mov x12, #?4144
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 32\]
+** add x29, sp, #?32
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl32
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?32
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_8:
+** mov x12, #?4176
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 32\]
+** add x29, sp, #?32
+** stp x24, x25, \[sp, 48\]
+** str x26, \[sp, 64\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl32
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?32
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_8 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_9:
+** mov x12, #?4144
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 32\]
+** add x29, sp, #?32
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl32
+** sub sp, x29, #32
+** ldr p4, \[sp\]
+** add sp, sp, #?32
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_9 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_10:
+** mov x12, #?4176
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 32\]
+** add x29, sp, #?32
+** stp x24, x25, \[sp, 48\]
+** str x26, \[sp, 64\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl32
+** sub sp, x29, #32
+** ldr p4, \[sp\]
+** add sp, sp, #?32
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_10 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_11:
+** sub sp, sp, #65536
+** str xzr, \[sp, 1024\]
+** mov x12, #?64480
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 32\]
+** add x29, sp, #?32
+** stp x24, x25, \[sp, 48\]
+** str x26, \[sp, 64\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl32
+** sub sp, x29, #32
+** ldr p4, \[sp\]
+** add sp, sp, #?32
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** add sp, sp, #?3008
+** add sp, sp, #?126976
+** ret
+*/
+svbool_t
+test_11 (int n)
+{
+ volatile int x[0x7ee4];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fshrink-wrap -fstack-clash-protection -msve-vector-bits=512 -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+svbool_t take_stack_args (volatile void *, void *, int, int, int,
+ int, int, int, int);
+
+/*
+** test_1:
+** sub sp, sp, #80
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl64
+** ldr p4, \[sp\]
+** add sp, sp, #?80
+** ret
+*/
+svbool_t
+test_1 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** sub sp, sp, #112
+** stp x24, x25, \[sp, 64\]
+** str x26, \[sp, 80\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl64
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 64\]
+** ldr x26, \[sp, 80\]
+** add sp, sp, #?112
+** ret
+*/
+svbool_t
+test_2 (void)
+{
+ volatile int x = 1;
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_3:
+** mov x12, #?4192
+** sub sp, sp, x12
+** stp x24, x25, \[sp, 64\]
+** str x26, \[sp, 80\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl64
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 64\]
+** ldr x26, \[sp, 80\]
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_3 (void)
+{
+ volatile int x[1024];
+ asm volatile ("" :: "r" (x) : "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_4:
+** sub sp, sp, #128
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, vl32
+** ldr p4, \[sp\]
+** add sp, sp, #?128
+** ret
+*/
+svbool_t
+test_4 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4");
+ return svptrue_b16 ();
+}
+
+/*
+** test_5:
+** sub sp, sp, #160
+** stp x24, x25, \[sp, 64\]
+** str x26, \[sp, 80\]
+** str p4, \[sp\]
+** ...
+** ptrue p0\.h, vl32
+** ldr p4, \[sp\]
+** ldp x24, x25, \[sp, 64\]
+** ldr x26, \[sp, 80\]
+** add sp, sp, #?160
+** ret
+*/
+svbool_t
+test_5 (void)
+{
+ volatile svint32_t b;
+ b = svdup_s32 (1);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b16 ();
+}
+
+/*
+** test_6:
+** stp x29, x30, \[sp, -16\]!
+** mov x29, sp
+** sub sp, sp, #64
+** str p4, \[sp\]
+** ...
+** ptrue p0\.b, vl64
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?64
+** ldp x29, x30, \[sp\], 16
+** ret
+*/
+svbool_t
+test_6 (void)
+{
+ take_stack_args (0, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_7:
+** mov x12, #?4176
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 64\]
+** add x29, sp, #?64
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl64
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?64
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_7 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_8:
+** mov x12, #?4208
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 64\]
+** add x29, sp, #?64
+** stp x24, x25, \[sp, 80\]
+** str x26, \[sp, 96\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl64
+** add sp, sp, #?16
+** ldr p4, \[sp\]
+** add sp, sp, #?64
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_8 (void)
+{
+ volatile int x[1024];
+ take_stack_args (x, 0, 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_9:
+** mov x12, #?4176
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 64\]
+** add x29, sp, #?64
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl64
+** sub sp, x29, #64
+** ldr p4, \[sp\]
+** add sp, sp, #?64
+** ldp x29, x30, \[sp\]
+** mov x12, #?4112
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_9 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4");
+ return svptrue_b8 ();
+}
+
+/*
+** test_10:
+** mov x12, #?4208
+** sub sp, sp, x12
+** stp x29, x30, \[sp, 64\]
+** add x29, sp, #?64
+** stp x24, x25, \[sp, 80\]
+** str x26, \[sp, 96\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl64
+** sub sp, x29, #64
+** ldr p4, \[sp\]
+** add sp, sp, #?64
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** mov x12, #?4144
+** add sp, sp, x12
+** ret
+*/
+svbool_t
+test_10 (int n)
+{
+ volatile int x[1024];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
+
+/*
+** test_11:
+** sub sp, sp, #65536
+** str xzr, \[sp, 1024\]
+** mov x12, #?64512
+** sub sp, sp, x12
+** str xzr, \[sp, 1024\]
+** stp x29, x30, \[sp, 64\]
+** add x29, sp, #?64
+** stp x24, x25, \[sp, 80\]
+** str x26, \[sp, 96\]
+** str p4, \[sp\]
+** sub sp, sp, #16
+** ...
+** ptrue p0\.b, vl64
+** sub sp, x29, #64
+** ldr p4, \[sp\]
+** add sp, sp, #?64
+** ldp x24, x25, \[sp, 16\]
+** ldr x26, \[sp, 32\]
+** ldp x29, x30, \[sp\]
+** add sp, sp, #?3008
+** add sp, sp, #?126976
+** ret
+*/
+svbool_t
+test_11 (int n)
+{
+ volatile int x[0x7ee4];
+ take_stack_args (x, __builtin_alloca (n), 1, 2, 3, 4, 5, 6, 7);
+ asm volatile ("" ::: "p4", "x24", "x25", "x26");
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O -fshrink-wrap -fstack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#pragma GCC aarch64 "arm_sve.h"
+
+/*
+** test_1:
+** str x24, \[sp, -32\]!
+** cntb x13
+** mov x11, sp
+** ...
+** sub sp, sp, x13
+** str p4, \[sp\]
+** cbz w0, [^\n]*
+** ...
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldr x24, \[sp\], 32
+** ret
+*/
+svbool_t
+test_1 (int n)
+{
+ asm volatile ("" ::: "x24");
+ if (n)
+ {
+ volatile int x = 1;
+ asm volatile ("" ::: "p4");
+ }
+ return svptrue_b8 ();
+}
+
+/*
+** test_2:
+** str x24, \[sp, -32\]!
+** cntb x13
+** mov x11, sp
+** ...
+** sub sp, sp, x13
+** str p4, \[sp\]
+** cbz w0, [^\n]*
+** str p5, \[sp, #1, mul vl\]
+** str p6, \[sp, #2, mul vl\]
+** ...
+** ptrue p0\.b, all
+** ldr p4, \[sp\]
+** addvl sp, sp, #1
+** ldr x24, \[sp\], 32
+** ret
+*/
+svbool_t
+test_2 (int n)
+{
+ asm volatile ("" ::: "x24");
+ if (n)
+ {
+ volatile int x = 1;
+ asm volatile ("" ::: "p4", "p5", "p6");
+ }
+ return svptrue_b8 ();
+}
--- /dev/null
+/* { dg-do compile } */
+
+#include <arm_sve.h>
+
+void unprototyped ();
+
+void
+f (svuint8_t *ptr)
+{
+ unprototyped (*ptr); /* { dg-error {SVE type '(svuint8_t|__SVUint8_t)' cannot be passed to an unprototyped function} } */
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ldr (p[0-7]), \[x1\]
+** ...
+** cntp x0, \1, \1\.b
+** ...
+** ret
+*/
+uint64_t __attribute__((noipa))
+callee_0 (int64_t *ptr, ...)
+{
+ va_list va;
+ svbool_t pg;
+
+ va_start (va, ptr);
+ pg = va_arg (va, svbool_t);
+ va_end (va);
+ return svcntp_b8 (pg, pg);
+}
+
+/*
+** caller_0:
+** ...
+** ptrue (p[0-7])\.d, vl7
+** ...
+** str \1, \[x1\]
+** ...
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_0 (int64_t *ptr)
+{
+ return callee_0 (ptr, svptrue_pat_b64 (SV_VL7));
+}
+
+/*
+** callee_1:
+** ...
+** ldr (p[0-7]), \[x2\]
+** ...
+** cntp x0, \1, \1\.b
+** ...
+** ret
+*/
+uint64_t __attribute__((noipa))
+callee_1 (int64_t *ptr, ...)
+{
+ va_list va;
+ svbool_t pg;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ pg = va_arg (va, svbool_t);
+ va_end (va);
+ return svcntp_b8 (pg, pg);
+}
+
+/*
+** caller_1:
+** ...
+** ptrue (p[0-7])\.d, vl7
+** ...
+** str \1, \[x2\]
+** ...
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_1 (int64_t *ptr)
+{
+ return callee_1 (ptr, 1, svptrue_pat_b64 (SV_VL7));
+}
+
+/*
+** callee_7:
+** ...
+** ldr (p[0-7]), \[x7\]
+** ...
+** cntp x0, \1, \1\.b
+** ...
+** ret
+*/
+uint64_t __attribute__((noipa))
+callee_7 (int64_t *ptr, ...)
+{
+ va_list va;
+ svbool_t pg;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ pg = va_arg (va, svbool_t);
+ va_end (va);
+ return svcntp_b8 (pg, pg);
+}
+
+/*
+** caller_7:
+** ...
+** ptrue (p[0-7])\.d, vl7
+** ...
+** str \1, \[x7\]
+** ...
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_7 (int64_t *ptr)
+{
+ return callee_7 (ptr, 1, 2, 3, 4, 5, 6, svptrue_pat_b64 (SV_VL7));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ldr (p[0-7]), \[\2\]
+** ...
+** cntp x0, \3, \3\.b
+** ...
+** ret
+*/
+uint64_t __attribute__((noipa))
+callee_8 (int64_t *ptr, ...)
+{
+ va_list va;
+ svbool_t pg;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ pg = va_arg (va, svbool_t);
+ va_end (va);
+ return svcntp_b8 (pg, pg);
+}
+
+/*
+** caller_8:
+** ...
+** ptrue (p[0-7])\.d, vl7
+** ...
+** str \1, \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+uint64_t __attribute__((noipa))
+caller_8 (int64_t *ptr)
+{
+ return callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svptrue_pat_b64 (SV_VL7));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x1\]
+** ...
+** st1h \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (float16_t *ptr, ...)
+{
+ va_list va;
+ svfloat16_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svfloat16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** fmov (z[0-9]+\.h), #9\.0[^\n]*
+** ...
+** st1h \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (float16_t *ptr)
+{
+ callee_0 (ptr, svdup_f16 (9));
+}
+
+/*
+** callee_1:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x2\]
+** ...
+** st1h \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (float16_t *ptr, ...)
+{
+ va_list va;
+ svfloat16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** fmov (z[0-9]+\.h), #9\.0[^\n]*
+** ...
+** st1h \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (float16_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_f16 (9));
+}
+
+/*
+** callee_7:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x7\]
+** ...
+** st1h \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (float16_t *ptr, ...)
+{
+ va_list va;
+ svfloat16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** fmov (z[0-9]+\.h), #9\.0[^\n]*
+** ...
+** st1h \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (float16_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_f16 (9));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[\2\]
+** ...
+** st1h \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (float16_t *ptr, ...)
+{
+ va_list va;
+ svfloat16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** fmov (z[0-9]+\.h), #9\.0[^\n]*
+** ...
+** st1h \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (float16_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_f16 (9));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x1\]
+** ...
+** st1w \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (float32_t *ptr, ...)
+{
+ va_list va;
+ svfloat32_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svfloat32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** fmov (z[0-9]+\.s), #9\.0[^\n]*
+** ...
+** st1w \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (float32_t *ptr)
+{
+ callee_0 (ptr, svdup_f32 (9));
+}
+
+/*
+** callee_1:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x2\]
+** ...
+** st1w \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (float32_t *ptr, ...)
+{
+ va_list va;
+ svfloat32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** fmov (z[0-9]+\.s), #9\.0[^\n]*
+** ...
+** st1w \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (float32_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_f32 (9));
+}
+
+/*
+** callee_7:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x7\]
+** ...
+** st1w \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (float32_t *ptr, ...)
+{
+ va_list va;
+ svfloat32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** fmov (z[0-9]+\.s), #9\.0[^\n]*
+** ...
+** st1w \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (float32_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_f32 (9));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[\2\]
+** ...
+** st1w \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (float32_t *ptr, ...)
+{
+ va_list va;
+ svfloat32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** fmov (z[0-9]+\.s), #9\.0[^\n]*
+** ...
+** st1w \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (float32_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_f32 (9));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x1\]
+** ...
+** st1d \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (float64_t *ptr, ...)
+{
+ va_list va;
+ svfloat64_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svfloat64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** fmov (z[0-9]+\.d), #9\.0[^\n]*
+** ...
+** st1d \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (float64_t *ptr)
+{
+ callee_0 (ptr, svdup_f64 (9));
+}
+
+/*
+** callee_1:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x2\]
+** ...
+** st1d \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (float64_t *ptr, ...)
+{
+ va_list va;
+ svfloat64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** fmov (z[0-9]+\.d), #9\.0[^\n]*
+** ...
+** st1d \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (float64_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_f64 (9));
+}
+
+/*
+** callee_7:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x7\]
+** ...
+** st1d \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (float64_t *ptr, ...)
+{
+ va_list va;
+ svfloat64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** fmov (z[0-9]+\.d), #9\.0[^\n]*
+** ...
+** st1d \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (float64_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_f64 (9));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[\2\]
+** ...
+** st1d \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (float64_t *ptr, ...)
+{
+ va_list va;
+ svfloat64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svfloat64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** fmov (z[0-9]+\.d), #9\.0[^\n]*
+** ...
+** st1d \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (float64_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_f64 (9));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x1\]
+** ...
+** st1h \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (int16_t *ptr, ...)
+{
+ va_list va;
+ svint16_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svint16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** mov (z[0-9]+\.h), #42
+** ...
+** st1h \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (int16_t *ptr)
+{
+ callee_0 (ptr, svdup_s16 (42));
+}
+
+/*
+** callee_1:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x2\]
+** ...
+** st1h \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (int16_t *ptr, ...)
+{
+ va_list va;
+ svint16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svint16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** mov (z[0-9]+\.h), #42
+** ...
+** st1h \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (int16_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_s16 (42));
+}
+
+/*
+** callee_7:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x7\]
+** ...
+** st1h \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (int16_t *ptr, ...)
+{
+ va_list va;
+ svint16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** mov (z[0-9]+\.h), #42
+** ...
+** st1h \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (int16_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_s16 (42));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[\2\]
+** ...
+** st1h \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (int16_t *ptr, ...)
+{
+ va_list va;
+ svint16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** mov (z[0-9]+\.h), #42
+** ...
+** st1h \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (int16_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_s16 (42));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x1\]
+** ...
+** st1w \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (int32_t *ptr, ...)
+{
+ va_list va;
+ svint32_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svint32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** mov (z[0-9]+\.s), #42
+** ...
+** st1w \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (int32_t *ptr)
+{
+ callee_0 (ptr, svdup_s32 (42));
+}
+
+/*
+** callee_1:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x2\]
+** ...
+** st1w \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (int32_t *ptr, ...)
+{
+ va_list va;
+ svint32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svint32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** mov (z[0-9]+\.s), #42
+** ...
+** st1w \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (int32_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_s32 (42));
+}
+
+/*
+** callee_7:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x7\]
+** ...
+** st1w \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (int32_t *ptr, ...)
+{
+ va_list va;
+ svint32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** mov (z[0-9]+\.s), #42
+** ...
+** st1w \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (int32_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_s32 (42));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[\2\]
+** ...
+** st1w \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (int32_t *ptr, ...)
+{
+ va_list va;
+ svint32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** mov (z[0-9]+\.s), #42
+** ...
+** st1w \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (int32_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_s32 (42));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x1\]
+** ...
+** st1d \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (int64_t *ptr, ...)
+{
+ va_list va;
+ svint64_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svint64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** mov (z[0-9]+\.d), #42
+** ...
+** st1d \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (int64_t *ptr)
+{
+ callee_0 (ptr, svdup_s64 (42));
+}
+
+/*
+** callee_1:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x2\]
+** ...
+** st1d \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (int64_t *ptr, ...)
+{
+ va_list va;
+ svint64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svint64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** mov (z[0-9]+\.d), #42
+** ...
+** st1d \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (int64_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_s64 (42));
+}
+
+/*
+** callee_7:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x7\]
+** ...
+** st1d \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (int64_t *ptr, ...)
+{
+ va_list va;
+ svint64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** mov (z[0-9]+\.d), #42
+** ...
+** st1d \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (int64_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_s64 (42));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[\2\]
+** ...
+** st1d \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (int64_t *ptr, ...)
+{
+ va_list va;
+ svint64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** mov (z[0-9]+\.d), #42
+** ...
+** st1d \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (int64_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_s64 (42));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[x1\]
+** ...
+** st1b \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (int8_t *ptr, ...)
+{
+ va_list va;
+ svint8_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svint8_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** mov (z[0-9]+\.b), #42
+** ...
+** st1b \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (int8_t *ptr)
+{
+ callee_0 (ptr, svdup_s8 (42));
+}
+
+/*
+** callee_1:
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[x2\]
+** ...
+** st1b \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (int8_t *ptr, ...)
+{
+ va_list va;
+ svint8_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svint8_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** mov (z[0-9]+\.b), #42
+** ...
+** st1b \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (int8_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_s8 (42));
+}
+
+/*
+** callee_7:
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[x7\]
+** ...
+** st1b \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (int8_t *ptr, ...)
+{
+ va_list va;
+ svint8_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint8_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** mov (z[0-9]+\.b), #42
+** ...
+** st1b \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (int8_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_s8 (42));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[\2\]
+** ...
+** st1b \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (int8_t *ptr, ...)
+{
+ va_list va;
+ svint8_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint8_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** mov (z[0-9]+\.b), #42
+** ...
+** st1b \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (int8_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_s8 (42));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x1\]
+** ...
+** st1h \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (uint16_t *ptr, ...)
+{
+ va_list va;
+ svuint16_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svuint16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** mov (z[0-9]+\.h), #42
+** ...
+** st1h \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (uint16_t *ptr)
+{
+ callee_0 (ptr, svdup_u16 (42));
+}
+
+/*
+** callee_1:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x2\]
+** ...
+** st1h \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (uint16_t *ptr, ...)
+{
+ va_list va;
+ svuint16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svuint16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** mov (z[0-9]+\.h), #42
+** ...
+** st1h \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (uint16_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_u16 (42));
+}
+
+/*
+** callee_7:
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[x7\]
+** ...
+** st1h \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (uint16_t *ptr, ...)
+{
+ va_list va;
+ svuint16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svuint16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** mov (z[0-9]+\.h), #42
+** ...
+** st1h \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (uint16_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_u16 (42));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1h (z[0-9]+\.h), (p[0-7])/z, \[\2\]
+** ...
+** st1h \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (uint16_t *ptr, ...)
+{
+ va_list va;
+ svuint16_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svuint16_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** mov (z[0-9]+\.h), #42
+** ...
+** st1h \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (uint16_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_u16 (42));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
+/*
+** callee_0:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x1\]
+** ...
+** st1w \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (uint32_t *ptr, ...)
+{
+ va_list va;
+ svuint32_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svuint32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** mov (z[0-9]+\.s), #42
+** ...
+** st1w \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (uint32_t *ptr)
+{
+ callee_0 (ptr, svdup_u32 (42));
+}
+
+/*
+** callee_1:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x2\]
+** ...
+** st1w \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (uint32_t *ptr, ...)
+{
+ va_list va;
+ svuint32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svuint32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** mov (z[0-9]+\.s), #42
+** ...
+** st1w \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (uint32_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_u32 (42));
+}
+
+/*
+** callee_7:
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[x7\]
+** ...
+** st1w \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (uint32_t *ptr, ...)
+{
+ va_list va;
+ svuint32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svuint32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** mov (z[0-9]+\.s), #42
+** ...
+** st1w \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (uint32_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_u32 (42));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1w (z[0-9]+\.s), (p[0-7])/z, \[\2\]
+** ...
+** st1w \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (uint32_t *ptr, ...)
+{
+ va_list va;
+ svuint32_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svuint32_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** mov (z[0-9]+\.s), #42
+** ...
+** st1w \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (uint32_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_u32 (42));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
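+/* The variadic svint64_t argument is passed by reference: its address goes
+   in x1-x7 or, for callee_8, in a stack argument slot.  */
+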
+/*
+** callee_0:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x1\]
+** ...
+** st1d \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (int64_t *ptr, ...)
+{
+ va_list va;
+ svint64_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svint64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** mov (z[0-9]+\.d), #42
+** ...
+** st1d \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (int64_t *ptr)
+{
+ callee_0 (ptr, svdup_s64 (42));
+}
+
+/*
+** callee_1:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x2\]
+** ...
+** st1d \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (int64_t *ptr, ...)
+{
+ va_list va;
+ svint64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svint64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** mov (z[0-9]+\.d), #42
+** ...
+** st1d \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (int64_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_s64 (42));
+}
+
+/*
+** callee_7:
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[x7\]
+** ...
+** st1d \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (int64_t *ptr, ...)
+{
+ va_list va;
+ svint64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** mov (z[0-9]+\.d), #42
+** ...
+** st1d \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (int64_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_s64 (42));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1d (z[0-9]+\.d), (p[0-7])/z, \[\2\]
+** ...
+** st1d \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (int64_t *ptr, ...)
+{
+ va_list va;
+ svint64_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint64_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** mov (z[0-9]+\.d), #42
+** ...
+** st1d \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (int64_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_s64 (42));
+}
--- /dev/null
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-clash-protection -g" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
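+/* The variadic svint8_t argument is passed by reference: its address goes
+   in x1-x7 or, for callee_8, in a stack argument slot.  */
+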
+/*
+** callee_0:
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[x1\]
+** ...
+** st1b \1, \2, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_0 (int8_t *ptr, ...)
+{
+ va_list va;
+ svint8_t vec;
+
+ va_start (va, ptr);
+ vec = va_arg (va, svint8_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_0:
+** ...
+** mov (z[0-9]+\.b), #42
+** ...
+** st1b \1, p[0-7], \[x1\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_0 (int8_t *ptr)
+{
+ callee_0 (ptr, svdup_s8 (42));
+}
+
+/*
+** callee_1:
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[x2\]
+** ...
+** st1b \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_1 (int8_t *ptr, ...)
+{
+ va_list va;
+ svint8_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ vec = va_arg (va, svint8_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_1:
+** ...
+** mov (z[0-9]+\.b), #42
+** ...
+** st1b \1, p[0-7], \[x2\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_1 (int8_t *ptr)
+{
+ callee_1 (ptr, 1, svdup_s8 (42));
+}
+
+/*
+** callee_7:
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[x7\]
+** ...
+** st1b \1, p[0-7], \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_7 (int8_t *ptr, ...)
+{
+ va_list va;
+ svint8_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint8_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_7:
+** ...
+** mov (z[0-9]+\.b), #42
+** ...
+** st1b \1, p[0-7], \[x7\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_7 (int8_t *ptr)
+{
+ callee_7 (ptr, 1, 2, 3, 4, 5, 6, svdup_s8 (42));
+}
+
+/* FIXME: We should be able to get rid of the va_list object. */
+/*
+** callee_8:
+** sub sp, sp, #([0-9]+)
+** ...
+** ldr (x[0-9]+), \[sp, \1\]
+** ...
+** ld1b (z[0-9]+\.b), (p[0-7])/z, \[\2\]
+** ...
+** st1b \3, \4, \[x0\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+callee_8 (int8_t *ptr, ...)
+{
+ va_list va;
+ svint8_t vec;
+
+ va_start (va, ptr);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ va_arg (va, int);
+ vec = va_arg (va, svint8_t);
+ va_end (va);
+ svst1 (svptrue_b8 (), ptr, vec);
+}
+
+/*
+** caller_8:
+** ...
+** mov (z[0-9]+\.b), #42
+** ...
+** st1b \1, p[0-7], \[(x[0-9]+)\]
+** ...
+** str \2, \[sp\]
+** ...
+** ret
+*/
+void __attribute__((noipa))
+caller_8 (int8_t *ptr)
+{
+ callee_8 (ptr, 1, 2, 3, 4, 5, 6, 7, svdup_s8 (42));
+}
--- /dev/null
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O0 -g" } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
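+/* Check at -O0 that SVE predicate, single-vector and tuple arguments all
+   round-trip correctly through a va_list.  */
+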
+void __attribute__((noipa))
+callee (int foo, ...)
+{
+ va_list va;
+ svbool_t pg, p;
+ svint8_t s8;
+ svuint16x4_t u16;
+ svfloat32x3_t f32;
+ svint64x2_t s64;
+
+ va_start (va, foo);
+ p = va_arg (va, svbool_t);
+ s8 = va_arg (va, svint8_t);
+ u16 = va_arg (va, svuint16x4_t);
+ f32 = va_arg (va, svfloat32x3_t);
+ s64 = va_arg (va, svint64x2_t);
+
+ pg = svptrue_b8 ();
+
+ if (svptest_any (pg, sveor_z (pg, p, svptrue_pat_b8 (SV_VL7))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, s8, svindex_s8 (1, 2))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 0), svindex_u16 (2, 3))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 1), svindex_u16 (3, 4))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 2), svindex_u16 (4, 5))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 3), svindex_u16 (5, 6))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 0), svdup_f32 (1.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 1), svdup_f32 (2.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 2), svdup_f32 (3.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget2 (s64, 0), svindex_s64 (6, 7))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget2 (s64, 1), svindex_s64 (7, 8))))
+ __builtin_abort ();
+}
+
+int __attribute__((noipa))
+main (void)
+{
+ callee (100,
+ svptrue_pat_b8 (SV_VL7),
+ svindex_s8 (1, 2),
+ svcreate4 (svindex_u16 (2, 3),
+ svindex_u16 (3, 4),
+ svindex_u16 (4, 5),
+ svindex_u16 (5, 6)),
+ svcreate3 (svdup_f32 (1.0),
+ svdup_f32 (2.0),
+ svdup_f32 (3.0)),
+ svcreate2 (svindex_s64 (6, 7),
+ svindex_s64 (7, 8)));
+}
--- /dev/null
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O0 -fstack-clash-protection -g" } */
+
+#include <arm_sve.h>
+#include <stdarg.h>
+
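+/* Check, with -fstack-clash-protection enabled, that SVE predicate,
+   single-vector and tuple arguments round-trip correctly through a
+   va_list.  */
+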
+void __attribute__((noipa))
+callee (int foo, ...)
+{
+ va_list va;
+ svbool_t pg, p;
+ svint8_t s8;
+ svuint16x4_t u16;
+ svfloat32x3_t f32;
+ svint64x2_t s64;
+
+ va_start (va, foo);
+ p = va_arg (va, svbool_t);
+ s8 = va_arg (va, svint8_t);
+ u16 = va_arg (va, svuint16x4_t);
+ f32 = va_arg (va, svfloat32x3_t);
+ s64 = va_arg (va, svint64x2_t);
+
+ pg = svptrue_b8 ();
+
+ if (svptest_any (pg, sveor_z (pg, p, svptrue_pat_b8 (SV_VL7))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, s8, svindex_s8 (1, 2))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 0), svindex_u16 (2, 3))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 1), svindex_u16 (3, 4))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 2), svindex_u16 (4, 5))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget4 (u16, 3), svindex_u16 (5, 6))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 0), svdup_f32 (1.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 1), svdup_f32 (2.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget3 (f32, 2), svdup_f32 (3.0))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget2 (s64, 0), svindex_s64 (6, 7))))
+ __builtin_abort ();
+
+ if (svptest_any (pg, svcmpne (pg, svget2 (s64, 1), svindex_s64 (7, 8))))
+ __builtin_abort ();
+}
+
+int __attribute__((noipa))
+main (void)
+{
+ callee (100,
+ svptrue_pat_b8 (SV_VL7),
+ svindex_s8 (1, 2),
+ svcreate4 (svindex_u16 (2, 3),
+ svindex_u16 (3, 4),
+ svindex_u16 (4, 5),
+ svindex_u16 (5, 6)),
+ svcreate3 (svdup_f32 (1.0),
+ svdup_f32 (2.0),
+ svdup_f32 (3.0)),
+ svcreate2 (svindex_s64 (6, 7),
+ svindex_s64 (7, 8)));
+}
--- /dev/null
+/* { dg-do compile } */
+
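+/* The 'aarch64_vector_pcs' attribute cannot be applied to SVE function
+   types; each declaration below should be diagnosed.  */
+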
+__attribute__ ((aarch64_vector_pcs)) void f1 (__SVBool_t); /* { dg-error {the 'aarch64_vector_pcs' attribute cannot be applied to an SVE function type} } */
+__attribute__ ((aarch64_vector_pcs)) void f2 (__SVInt8_t s8) {} /* { dg-error {the 'aarch64_vector_pcs' attribute cannot be applied to an SVE function type} } */
+__attribute__ ((aarch64_vector_pcs)) void (*f3) (__SVInt16_t); /* { dg-error {the 'aarch64_vector_pcs' attribute cannot be applied to an SVE function type} } */
+typedef __attribute__ ((aarch64_vector_pcs)) void (*f4) (__SVInt32_t); /* { dg-error {the 'aarch64_vector_pcs' attribute cannot be applied to an SVE function type} } */