From: Sandra Loosemore
Date: Sun, 1 Feb 2015 02:11:30 +0000 (-0500)
Subject: md.texi (Machine Constraints): Alphabetize table by target.
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=b4fbcb1bf2f569af3e57e91132f3573f37ad3800;p=gcc.git

md.texi (Machine Constraints): Alphabetize table by target.

2015-01-31 Sandra Loosemore

	gcc/
	* doc/md.texi (Machine Constraints): Alphabetize table by target.
	* doc/extend.texi (x86 Variable Attributes): Move section to
	correct alphabetization after renaming.
	(x86 Type Attributes): Likewise.
	(Target Builtins): Re-alphabetize menu.
	(x86 Built-in Functions): Move section to correct alphabetization
	after renaming.
	(x86 transactional memory intrinsics): Likewise.
	* doc/invoke.texi (Option Summary): Re-alphabetize x86 Options
	and x86 Windows Options in table and menu.
	(x86 Options): Move section to correct alphabetization after
	renaming.
	(x86 Windows Options): Likewise.

From-SVN: r220315
---

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7c06f05777d..0618d835f9a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,19 @@
+2015-01-31 Sandra Loosemore
+
+	* doc/md.texi (Machine Constraints): Alphabetize table by target.
+	* doc/extend.texi (x86 Variable Attributes): Move section to
+	correct alphabetization after renaming.
+	(x86 Type Attributes): Likewise.
+	(Target Builtins): Re-alphabetize menu.
+	(x86 Built-in Functions): Move section to correct alphabetization
+	after renaming.
+	(x86 transactional memory intrinsics): Likewise.
+	* doc/invoke.texi (Option Summary): Re-alphabetize x86 Options
+	and x86 Windows Options in table and menu.
+	(x86 Options): Move section to correct alphabetization after
+	renaming.
+	(x86 Windows Options): Likewise.
+
 2015-01-31 Sandra Loosemore
 
 	* doc/extend.texi: Use "x86", "x86-32", and "x86-64" as the
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 681812e2c9b..18068508d03 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5521,6 +5521,23 @@ int cpu_clock __attribute__((cb(0x123)));
 
 @end table
 
+@subsection PowerPC Variable Attributes
+
+Three attributes currently are defined for PowerPC configurations:
+@code{altivec}, @code{ms_struct} and @code{gcc_struct}.
+
+For full documentation of the struct attributes please see the
+documentation in @ref{x86 Variable Attributes}.
+
+For documentation of @code{altivec} attribute please see the
+documentation in @ref{PowerPC Type Attributes}.
+
+@subsection SPU Variable Attributes
+
+The SPU supports the @code{spu_vector} attribute for variables. For
+documentation of this attribute please see the documentation in
+@ref{SPU Type Attributes}.
+
 @anchor{x86 Variable Attributes}
 @subsection x86 Variable Attributes
 
@@ -5659,23 +5676,6 @@ Here, @code{t5} takes up 2 bytes.
 @end enumerate
 @end table
 
-@subsection PowerPC Variable Attributes
-
-Three attributes currently are defined for PowerPC configurations:
-@code{altivec}, @code{ms_struct} and @code{gcc_struct}.
-
-For full documentation of the struct attributes please see the
-documentation in @ref{x86 Variable Attributes}.
-
-For documentation of @code{altivec} attribute please see the
-documentation in @ref{PowerPC Type Attributes}.
-
-@subsection SPU Variable Attributes
-
-The SPU supports the @code{spu_vector} attribute for variables. For
-documentation of this attribute please see the documentation in
-@ref{SPU Type Attributes}.
-
 @subsection Xstormy16 Variable Attributes
 
 One attribute is currently defined for xstormy16 configurations:
@@ -6078,30 +6078,6 @@ Specifically, the @code{based}, @code{tiny}, @code{near}, and @code{far}
 attributes may be applied to either. The @code{io} and @code{cb} attributes may
 not be applied to types.
 
-@anchor{x86 Type Attributes}
-@subsection x86 Type Attributes
-
-Two attributes are currently defined for x86 configurations:
-@code{ms_struct} and @code{gcc_struct}.
-
-@table @code
-
-@item ms_struct
-@itemx gcc_struct
-@cindex @code{ms_struct}
-@cindex @code{gcc_struct}
-
-If @code{packed} is used on a structure, or if bit-fields are used
-it may be that the Microsoft ABI packs them differently
-than GCC normally packs them. Particularly when moving packed
-data between functions compiled with GCC and the native Microsoft compiler
-(either via function call or as data in a file), it may be necessary to access
-either format.
-
-Currently @option{-m[no-]ms-bitfields} is provided for the Microsoft Windows x86
-compilers to match the native Microsoft compiler.
-@end table
-
 @anchor{PowerPC Type Attributes}
 @subsection PowerPC Type Attributes
 
@@ -6134,6 +6110,30 @@ allows one to declare vector data types supported by the Sony/Toshiba/IBM SPU
 Language Extensions Specification. It is intended to support the
 @code{__vector} keyword.
 
+@anchor{x86 Type Attributes}
+@subsection x86 Type Attributes
+
+Two attributes are currently defined for x86 configurations:
+@code{ms_struct} and @code{gcc_struct}.
+
+@table @code
+
+@item ms_struct
+@itemx gcc_struct
+@cindex @code{ms_struct}
+@cindex @code{gcc_struct}
+
+If @code{packed} is used on a structure, or if bit-fields are used
+it may be that the Microsoft ABI packs them differently
+than GCC normally packs them. Particularly when moving packed
+data between functions compiled with GCC and the native Microsoft compiler
+(either via function call or as data in a file), it may be necessary to access
+either format.
+
+Currently @option{-m[no-]ms-bitfields} is provided for the Microsoft Windows x86
+compilers to match the native Microsoft compiler.
+@end table + @node Alignment @section Inquiring on Alignment of Types or Variables @cindex alignment @@ -10113,8 +10113,6 @@ instructions, but allow the compiler to schedule those calls. * AVR Built-in Functions:: * Blackfin Built-in Functions:: * FR-V Built-in Functions:: -* x86 Built-in Functions:: -* x86 transactional memory intrinsics:: * MIPS DSP Built-in Functions:: * MIPS Paired-Single Support:: * MIPS Loongson Built-in Functions:: @@ -10133,6 +10131,8 @@ instructions, but allow the compiler to schedule those calls. * TI C6X Built-in Functions:: * TILE-Gx Built-in Functions:: * TILEPro Built-in Functions:: +* x86 Built-in Functions:: +* x86 transactional memory intrinsics:: @end menu @node AArch64 Built-in Functions @@ -11484,5787 +11484,5787 @@ Use the @code{nldub} instruction to load the contents of address @var{x} into the data cache. The instruction is issued in slot I1@. @end table -@node x86 Built-in Functions -@subsection x86 Built-in Functions +@node MIPS DSP Built-in Functions +@subsection MIPS DSP Built-in Functions -These built-in functions are available for the x86-32 and x86-64 family -of computers, depending on the command-line switches used. +The MIPS DSP Application-Specific Extension (ASE) includes new +instructions that are designed to improve the performance of DSP and +media applications. It provides instructions that operate on packed +8-bit/16-bit integer data, Q7, Q15 and Q31 fractional data. -If you specify command-line switches such as @option{-msse}, -the compiler could use the extended instruction sets even if the built-ins -are not used explicitly in the program. For this reason, applications -that perform run-time CPU detection must compile separate files for each -supported architecture, using the appropriate flags. In particular, -the file containing the CPU detection code should be compiled without -these options. 
+GCC supports MIPS DSP operations using both the generic +vector extensions (@pxref{Vector Extensions}) and a collection of +MIPS-specific built-in functions. Both kinds of support are +enabled by the @option{-mdsp} command-line option. -The following machine modes are available for use with MMX built-in functions -(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers, -@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a -vector of eight 8-bit integers. Some of the built-in functions operate on -MMX registers as a whole 64-bit entity, these use @code{V1DI} as their mode. +Revision 2 of the ASE was introduced in the second half of 2006. +This revision adds extra instructions to the original ASE, but is +otherwise backwards-compatible with it. You can select revision 2 +using the command-line option @option{-mdspr2}; this option implies +@option{-mdsp}. -If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector -of two 32-bit floating-point values. +The SCOUNT and POS bits of the DSP control register are global. The +WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and +POS bits. During optimization, the compiler does not delete these +instructions and it does not delete calls to functions containing +these instructions. -If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit -floating-point values. Some instructions use a vector of four 32-bit -integers, these use @code{V4SI}. Finally, some instructions operate on an -entire vector register, interpreting it as a 128-bit integer, these use mode -@code{TI}. +At present, GCC only provides support for operations on 32-bit +vectors. 
The vector type associated with 8-bit integer data is +usually called @code{v4i8}, the vector type associated with Q7 +is usually called @code{v4q7}, the vector type associated with 16-bit +integer data is usually called @code{v2i16}, and the vector type +associated with Q15 is usually called @code{v2q15}. They can be +defined in C as follows: -In 64-bit mode, the x86-64 family of processors uses additional built-in -functions for efficient use of @code{TF} (@code{__float128}) 128-bit -floating point and @code{TC} 128-bit complex floating-point values. +@smallexample +typedef signed char v4i8 __attribute__ ((vector_size(4))); +typedef signed char v4q7 __attribute__ ((vector_size(4))); +typedef short v2i16 __attribute__ ((vector_size(4))); +typedef short v2q15 __attribute__ ((vector_size(4))); +@end smallexample -The following floating-point built-in functions are available in 64-bit -mode. All of them implement the function that is part of the name. +@code{v4i8}, @code{v4q7}, @code{v2i16} and @code{v2q15} values are +initialized in the same way as aggregates. For example: @smallexample -__float128 __builtin_fabsq (__float128) -__float128 __builtin_copysignq (__float128, __float128) +v4i8 a = @{1, 2, 3, 4@}; +v4i8 b; +b = (v4i8) @{5, 6, 7, 8@}; + +v2q15 c = @{0x0fcb, 0x3a75@}; +v2q15 d; +d = (v2q15) @{0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15@}; @end smallexample -The following built-in function is always available. +@emph{Note:} The CPU's endianness determines the order in which values +are packed. On little-endian targets, the first value is the least +significant and the last value is the most significant. The opposite +order applies to big-endian targets. For example, the code above +sets the lowest byte of @code{a} to @code{1} on little-endian targets +and @code{4} on big-endian targets. -@table @code -@item void __builtin_ia32_pause (void) -Generates the @code{pause} machine instruction with a compiler memory -barrier. 
-@end table +@emph{Note:} Q7, Q15 and Q31 values must be initialized with their integer +representation. As shown in this example, the integer representation +of a Q7 value can be obtained by multiplying the fractional value by +@code{0x1.0p7}. The equivalent for Q15 values is to multiply by +@code{0x1.0p15}. The equivalent for Q31 values is to multiply by +@code{0x1.0p31}. -The following floating-point built-in functions are made available in the -64-bit mode. +The table below lists the @code{v4i8} and @code{v2q15} operations for which +hardware support exists. @code{a} and @code{b} are @code{v4i8} values, +and @code{c} and @code{d} are @code{v2q15} values. -@table @code -@item __float128 __builtin_infq (void) -Similar to @code{__builtin_inf}, except the return type is @code{__float128}. -@findex __builtin_infq +@multitable @columnfractions .50 .50 +@item C code @tab MIPS instruction +@item @code{a + b} @tab @code{addu.qb} +@item @code{c + d} @tab @code{addq.ph} +@item @code{a - b} @tab @code{subu.qb} +@item @code{c - d} @tab @code{subq.ph} +@end multitable -@item __float128 __builtin_huge_valq (void) -Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}. -@findex __builtin_huge_valq -@end table +The table below lists the @code{v2i16} operation for which +hardware support exists for the DSP ASE REV 2. @code{e} and @code{f} are +@code{v2i16} values. -The following built-in functions are always available and can be used to -check the target platform type. +@multitable @columnfractions .50 .50 +@item C code @tab MIPS instruction +@item @code{e * f} @tab @code{mul.ph} +@end multitable -@deftypefn {Built-in Function} void __builtin_cpu_init (void) -This function runs the CPU detection code to check the type of CPU and the -features supported. 
This built-in function needs to be invoked along with the built-in functions -to check CPU type and features, @code{__builtin_cpu_is} and -@code{__builtin_cpu_supports}, only when used in a function that is -executed before any constructors are called. The CPU detection code is -automatically executed in a very high priority constructor. +It is easier to describe the DSP built-in functions if we first define +the following types: -For example, this function has to be used in @code{ifunc} resolvers that -check for CPU type using the built-in functions @code{__builtin_cpu_is} -and @code{__builtin_cpu_supports}, or in constructors on targets that -don't support constructor priority. @smallexample - -static void (*resolve_memcpy (void)) (void) -@{ - // ifunc resolvers fire before constructors, explicitly call the init - // function. - __builtin_cpu_init (); - if (__builtin_cpu_supports ("ssse3")) - return ssse3_memcpy; // super fast memcpy with ssse3 instructions. - else - return default_memcpy; -@} - -void *memcpy (void *, const void *, size_t) - __attribute__ ((ifunc ("resolve_memcpy"))); +typedef int q31; +typedef int i32; +typedef unsigned int ui32; +typedef long long a64; @end smallexample -@end deftypefn - -@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname}) -This function returns a positive integer if the run-time CPU -is of type @var{cpuname} -and returns @code{0} otherwise. The following CPU names can be detected: +@code{q31} and @code{i32} are actually the same as @code{int}, but we +use @code{q31} to indicate a Q31 fractional value and @code{i32} to +indicate a 32-bit integer value. Similarly, @code{a64} is the same as +@code{long long}, but we use @code{a64} to indicate values that are +placed in one of the four DSP accumulators (@code{$ac0}, +@code{$ac1}, @code{$ac2} or @code{$ac3}). -@table @samp -@item intel -Intel CPU. 
+Also, some built-in functions prefer or require immediate numbers as +parameters, because the corresponding DSP instructions accept both immediate +numbers and register operands, or accept immediate numbers only. The +immediate parameters are listed as follows. -@item atom -Intel Atom CPU. +@smallexample +imm0_3: 0 to 3. +imm0_7: 0 to 7. +imm0_15: 0 to 15. +imm0_31: 0 to 31. +imm0_63: 0 to 63. +imm0_255: 0 to 255. +imm_n32_31: -32 to 31. +imm_n512_511: -512 to 511. +@end smallexample -@item core2 -Intel Core 2 CPU. +The following built-in functions map directly to a particular MIPS DSP +instruction. Please refer to the architecture specification +for details on what each instruction does. -@item corei7 -Intel Core i7 CPU. +@smallexample +v2q15 __builtin_mips_addq_ph (v2q15, v2q15) +v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15) +q31 __builtin_mips_addq_s_w (q31, q31) +v4i8 __builtin_mips_addu_qb (v4i8, v4i8) +v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8) +v2q15 __builtin_mips_subq_ph (v2q15, v2q15) +v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15) +q31 __builtin_mips_subq_s_w (q31, q31) +v4i8 __builtin_mips_subu_qb (v4i8, v4i8) +v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8) +i32 __builtin_mips_addsc (i32, i32) +i32 __builtin_mips_addwc (i32, i32) +i32 __builtin_mips_modsub (i32, i32) +i32 __builtin_mips_raddu_w_qb (v4i8) +v2q15 __builtin_mips_absq_s_ph (v2q15) +q31 __builtin_mips_absq_s_w (q31) +v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15) +v2q15 __builtin_mips_precrq_ph_w (q31, q31) +v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31) +v4i8 __builtin_mips_precrqu_s_qb_ph (v2q15, v2q15) +q31 __builtin_mips_preceq_w_phl (v2q15) +q31 __builtin_mips_preceq_w_phr (v2q15) +v2q15 __builtin_mips_precequ_ph_qbl (v4i8) +v2q15 __builtin_mips_precequ_ph_qbr (v4i8) +v2q15 __builtin_mips_precequ_ph_qbla (v4i8) +v2q15 __builtin_mips_precequ_ph_qbra (v4i8) +v2q15 __builtin_mips_preceu_ph_qbl (v4i8) +v2q15 __builtin_mips_preceu_ph_qbr (v4i8) +v2q15 __builtin_mips_preceu_ph_qbla (v4i8) +v2q15 
__builtin_mips_preceu_ph_qbra (v4i8) +v4i8 __builtin_mips_shll_qb (v4i8, imm0_7) +v4i8 __builtin_mips_shll_qb (v4i8, i32) +v2q15 __builtin_mips_shll_ph (v2q15, imm0_15) +v2q15 __builtin_mips_shll_ph (v2q15, i32) +v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15) +v2q15 __builtin_mips_shll_s_ph (v2q15, i32) +q31 __builtin_mips_shll_s_w (q31, imm0_31) +q31 __builtin_mips_shll_s_w (q31, i32) +v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7) +v4i8 __builtin_mips_shrl_qb (v4i8, i32) +v2q15 __builtin_mips_shra_ph (v2q15, imm0_15) +v2q15 __builtin_mips_shra_ph (v2q15, i32) +v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15) +v2q15 __builtin_mips_shra_r_ph (v2q15, i32) +q31 __builtin_mips_shra_r_w (q31, imm0_31) +q31 __builtin_mips_shra_r_w (q31, i32) +v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15) +v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15) +v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15) +q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15) +q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15) +a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8) +a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8) +a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8) +a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8) +a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15) +a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31) +a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15) +a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31) +a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15) +a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15) +a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15) +a64 __builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15) +a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15) +i32 __builtin_mips_bitrev (i32) +i32 __builtin_mips_insv (i32, i32) +v4i8 __builtin_mips_repl_qb (imm0_255) +v4i8 __builtin_mips_repl_qb (i32) +v2q15 __builtin_mips_repl_ph (imm_n512_511) +v2q15 __builtin_mips_repl_ph (i32) +void __builtin_mips_cmpu_eq_qb (v4i8, v4i8) +void __builtin_mips_cmpu_lt_qb (v4i8, v4i8) +void __builtin_mips_cmpu_le_qb 
(v4i8, v4i8) +i32 __builtin_mips_cmpgu_eq_qb (v4i8, v4i8) +i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8) +i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8) +void __builtin_mips_cmp_eq_ph (v2q15, v2q15) +void __builtin_mips_cmp_lt_ph (v2q15, v2q15) +void __builtin_mips_cmp_le_ph (v2q15, v2q15) +v4i8 __builtin_mips_pick_qb (v4i8, v4i8) +v2q15 __builtin_mips_pick_ph (v2q15, v2q15) +v2q15 __builtin_mips_packrl_ph (v2q15, v2q15) +i32 __builtin_mips_extr_w (a64, imm0_31) +i32 __builtin_mips_extr_w (a64, i32) +i32 __builtin_mips_extr_r_w (a64, imm0_31) +i32 __builtin_mips_extr_s_h (a64, i32) +i32 __builtin_mips_extr_rs_w (a64, imm0_31) +i32 __builtin_mips_extr_rs_w (a64, i32) +i32 __builtin_mips_extr_s_h (a64, imm0_31) +i32 __builtin_mips_extr_r_w (a64, i32) +i32 __builtin_mips_extp (a64, imm0_31) +i32 __builtin_mips_extp (a64, i32) +i32 __builtin_mips_extpdp (a64, imm0_31) +i32 __builtin_mips_extpdp (a64, i32) +a64 __builtin_mips_shilo (a64, imm_n32_31) +a64 __builtin_mips_shilo (a64, i32) +a64 __builtin_mips_mthlip (a64, i32) +void __builtin_mips_wrdsp (i32, imm0_63) +i32 __builtin_mips_rddsp (imm0_63) +i32 __builtin_mips_lbux (void *, i32) +i32 __builtin_mips_lhx (void *, i32) +i32 __builtin_mips_lwx (void *, i32) +a64 __builtin_mips_ldx (void *, i32) [MIPS64 only] +i32 __builtin_mips_bposge32 (void) +a64 __builtin_mips_madd (a64, i32, i32); +a64 __builtin_mips_maddu (a64, ui32, ui32); +a64 __builtin_mips_msub (a64, i32, i32); +a64 __builtin_mips_msubu (a64, ui32, ui32); +a64 __builtin_mips_mult (i32, i32); +a64 __builtin_mips_multu (ui32, ui32); +@end smallexample -@item nehalem -Intel Core i7 Nehalem CPU. +The following built-in functions map directly to a particular MIPS DSP REV 2 +instruction. Please refer to the architecture specification +for details on what each instruction does. -@item westmere -Intel Core i7 Westmere CPU. 
+@smallexample +v4q7 __builtin_mips_absq_s_qb (v4q7); +v2i16 __builtin_mips_addu_ph (v2i16, v2i16); +v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16); +v4i8 __builtin_mips_adduh_qb (v4i8, v4i8); +v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8); +i32 __builtin_mips_append (i32, i32, imm0_31); +i32 __builtin_mips_balign (i32, i32, imm0_3); +i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8); +i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8); +a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16); +a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16); +v2i16 __builtin_mips_mul_ph (v2i16, v2i16); +v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16); +q31 __builtin_mips_mulq_rs_w (q31, q31); +v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15); +q31 __builtin_mips_mulq_s_w (q31, q31); +a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16); +v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16); +v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31); +v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31); +i32 __builtin_mips_prepend (i32, i32, imm0_31); +v4i8 __builtin_mips_shra_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7); +v4i8 __builtin_mips_shra_qb (v4i8, i32); +v4i8 __builtin_mips_shra_r_qb (v4i8, i32); +v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15); +v2i16 __builtin_mips_shrl_ph (v2i16, i32); +v2i16 __builtin_mips_subu_ph (v2i16, v2i16); +v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16); +v4i8 __builtin_mips_subuh_qb (v4i8, v4i8); +v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8); +v2q15 __builtin_mips_addqh_ph (v2q15, v2q15); +v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15); +q31 __builtin_mips_addqh_w (q31, q31); +q31 __builtin_mips_addqh_r_w (q31, q31); +v2q15 __builtin_mips_subqh_ph (v2q15, v2q15); +v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15); +q31 __builtin_mips_subqh_w (q31, q31); +q31 __builtin_mips_subqh_r_w (q31, q31); +a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16); +a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16); +a64 
__builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15); +a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15); +@end smallexample -@item sandybridge -Intel Core i7 Sandy Bridge CPU. -@item amd -AMD CPU. +@node MIPS Paired-Single Support +@subsection MIPS Paired-Single Support -@item amdfam10h -AMD Family 10h CPU. +The MIPS64 architecture includes a number of instructions that +operate on pairs of single-precision floating-point values. +Each pair is packed into a 64-bit floating-point register, +with one element being designated the ``upper half'' and +the other being designated the ``lower half''. -@item barcelona -AMD Family 10h Barcelona CPU. +GCC supports paired-single operations using both the generic +vector extensions (@pxref{Vector Extensions}) and a collection of +MIPS-specific built-in functions. Both kinds of support are +enabled by the @option{-mpaired-single} command-line option. -@item shanghai -AMD Family 10h Shanghai CPU. +The vector type associated with paired-single values is usually +called @code{v2sf}. It can be defined in C as follows: -@item istanbul -AMD Family 10h Istanbul CPU. +@smallexample +typedef float v2sf __attribute__ ((vector_size (8))); +@end smallexample -@item btver1 -AMD Family 14h CPU. +@code{v2sf} values are initialized in the same way as aggregates. +For example: -@item amdfam15h -AMD Family 15h CPU. +@smallexample +v2sf a = @{1.5, 9.1@}; +v2sf b; +float e, f; +b = (v2sf) @{e, f@}; +@end smallexample -@item bdver1 -AMD Family 15h Bulldozer version 1. +@emph{Note:} The CPU's endianness determines which value is stored in +the upper half of a register and which value is stored in the lower half. +On little-endian targets, the first value is the lower one and the second +value is the upper one. The opposite order applies to big-endian targets. 
+For example, the code above sets the lower half of @code{a} to +@code{1.5} on little-endian targets and @code{9.1} on big-endian targets. -@item bdver2 -AMD Family 15h Bulldozer version 2. +@node MIPS Loongson Built-in Functions +@subsection MIPS Loongson Built-in Functions -@item bdver3 -AMD Family 15h Bulldozer version 3. +GCC provides intrinsics to access the SIMD instructions provided by the +ST Microelectronics Loongson-2E and -2F processors. These intrinsics, +available after inclusion of the @code{loongson.h} header file, +operate on the following 64-bit vector types: -@item bdver4 -AMD Family 15h Bulldozer version 4. +@itemize +@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers; +@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers; +@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers; +@item @code{int8x8_t}, a vector of eight signed 8-bit integers; +@item @code{int16x4_t}, a vector of four signed 16-bit integers; +@item @code{int32x2_t}, a vector of two signed 32-bit integers. +@end itemize -@item btver2 -AMD Family 16h CPU. -@end table +The intrinsics provided are listed below; each is named after the +machine instruction to which it corresponds, with suffixes added as +appropriate to distinguish intrinsics that expand to the same machine +instruction yet have different argument types. Refer to the architecture +documentation for a description of the functionality of each +instruction. -Here is an example: @smallexample -if (__builtin_cpu_is ("corei7")) - @{ - do_corei7 (); // Core i7 specific implementation. - @} -else - @{ - do_generic (); // Generic implementation. 
- @} +int16x4_t packsswh (int32x2_t s, int32x2_t t); +int8x8_t packsshb (int16x4_t s, int16x4_t t); +uint8x8_t packushb (uint16x4_t s, uint16x4_t t); +uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t); +int32x2_t paddw_s (int32x2_t s, int32x2_t t); +int16x4_t paddh_s (int16x4_t s, int16x4_t t); +int8x8_t paddb_s (int8x8_t s, int8x8_t t); +uint64_t paddd_u (uint64_t s, uint64_t t); +int64_t paddd_s (int64_t s, int64_t t); +int16x4_t paddsh (int16x4_t s, int16x4_t t); +int8x8_t paddsb (int8x8_t s, int8x8_t t); +uint16x4_t paddush (uint16x4_t s, uint16x4_t t); +uint8x8_t paddusb (uint8x8_t s, uint8x8_t t); +uint64_t pandn_ud (uint64_t s, uint64_t t); +uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t); +uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t); +uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t); +int64_t pandn_sd (int64_t s, int64_t t); +int32x2_t pandn_sw (int32x2_t s, int32x2_t t); +int16x4_t pandn_sh (int16x4_t s, int16x4_t t); +int8x8_t pandn_sb (int8x8_t s, int8x8_t t); +uint16x4_t pavgh (uint16x4_t s, uint16x4_t t); +uint8x8_t pavgb (uint8x8_t s, uint8x8_t t); +uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t); +int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t); +int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t); +int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t); +uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t); +uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t); +int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t); +int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t); +int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t); +uint16x4_t pextrh_u (uint16x4_t s, int field); +int16x4_t pextrh_s (int16x4_t s, int field); +uint16x4_t pinsrh_0_u (uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_2_u 
(uint16x4_t s, uint16x4_t t); +uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t); +int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t); +int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t); +int32x2_t pmaddhw (int16x4_t s, int16x4_t t); +int16x4_t pmaxsh (int16x4_t s, int16x4_t t); +uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t); +int16x4_t pminsh (int16x4_t s, int16x4_t t); +uint8x8_t pminub (uint8x8_t s, uint8x8_t t); +uint8x8_t pmovmskb_u (uint8x8_t s); +int8x8_t pmovmskb_s (int8x8_t s); +uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t); +int16x4_t pmulhh (int16x4_t s, int16x4_t t); +int16x4_t pmullh (int16x4_t s, int16x4_t t); +int64_t pmuluw (uint32x2_t s, uint32x2_t t); +uint8x8_t pasubub (uint8x8_t s, uint8x8_t t); +uint16x4_t biadd (uint8x8_t s); +uint16x4_t psadbh (uint8x8_t s, uint8x8_t t); +uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order); +int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order); +uint16x4_t psllh_u (uint16x4_t s, uint8_t amount); +int16x4_t psllh_s (int16x4_t s, uint8_t amount); +uint32x2_t psllw_u (uint32x2_t s, uint8_t amount); +int32x2_t psllw_s (int32x2_t s, uint8_t amount); +uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount); +int16x4_t psrlh_s (int16x4_t s, uint8_t amount); +uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount); +int32x2_t psrlw_s (int32x2_t s, uint8_t amount); +uint16x4_t psrah_u (uint16x4_t s, uint8_t amount); +int16x4_t psrah_s (int16x4_t s, uint8_t amount); +uint32x2_t psraw_u (uint32x2_t s, uint8_t amount); +int32x2_t psraw_s (int32x2_t s, uint8_t amount); +uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t); +uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t); +uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t); +int32x2_t psubw_s (int32x2_t s, int32x2_t t); +int16x4_t psubh_s (int16x4_t s, int16x4_t t); +int8x8_t psubb_s (int8x8_t s, int8x8_t t); +uint64_t psubd_u (uint64_t s, uint64_t t); +int64_t psubd_s 
(int64_t s, int64_t t); +int16x4_t psubsh (int16x4_t s, int16x4_t t); +int8x8_t psubsb (int8x8_t s, int8x8_t t); +uint16x4_t psubush (uint16x4_t s, uint16x4_t t); +uint8x8_t psubusb (uint8x8_t s, uint8x8_t t); +uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t); +uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t); +uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t); +int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t); +int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t); +int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t); +uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t); +uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t); +uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t); +int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t); +int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t); +int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t); @end smallexample -@end deftypefn -@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature}) -This function returns a positive integer if the run-time CPU -supports @var{feature} -and returns @code{0} otherwise. The following features can be detected: +@menu +* Paired-Single Arithmetic:: +* Paired-Single Built-in Functions:: +* MIPS-3D Built-in Functions:: +@end menu -@table @samp -@item cmov -CMOV instruction. -@item mmx -MMX instructions. -@item popcnt -POPCNT instruction. -@item sse -SSE instructions. -@item sse2 -SSE2 instructions. -@item sse3 -SSE3 instructions. -@item ssse3 -SSSE3 instructions. -@item sse4.1 -SSE4.1 instructions. -@item sse4.2 -SSE4.2 instructions. -@item avx -AVX instructions. -@item avx2 -AVX2 instructions. -@item avx512f -AVX512F instructions. -@end table +@node Paired-Single Arithmetic +@subsubsection Paired-Single Arithmetic -Here is an example: -@smallexample -if (__builtin_cpu_supports ("popcnt")) - @{ - asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc"); - @} -else - @{ - count = generic_countbits (n); //generic implementation. 
- @} -@end smallexample -@end deftypefn +The table below lists the @code{v2sf} operations for which hardware +support exists. @code{a}, @code{b} and @code{c} are @code{v2sf} +values and @code{x} is an integral value. +@multitable @columnfractions .50 .50 +@item C code @tab MIPS instruction +@item @code{a + b} @tab @code{add.ps} +@item @code{a - b} @tab @code{sub.ps} +@item @code{-a} @tab @code{neg.ps} +@item @code{a * b} @tab @code{mul.ps} +@item @code{a * b + c} @tab @code{madd.ps} +@item @code{a * b - c} @tab @code{msub.ps} +@item @code{-(a * b + c)} @tab @code{nmadd.ps} +@item @code{-(a * b - c)} @tab @code{nmsub.ps} +@item @code{x ? a : b} @tab @code{movn.ps}/@code{movz.ps} +@end multitable -The following built-in functions are made available by @option{-mmmx}. -All of them generate the machine instruction that is part of the name. +Note that the multiply-accumulate instructions can be disabled +using the command-line option @code{-mno-fused-madd}. -@smallexample -v8qi __builtin_ia32_paddb (v8qi, v8qi) -v4hi __builtin_ia32_paddw (v4hi, v4hi) -v2si __builtin_ia32_paddd (v2si, v2si) -v8qi __builtin_ia32_psubb (v8qi, v8qi) -v4hi __builtin_ia32_psubw (v4hi, v4hi) -v2si __builtin_ia32_psubd (v2si, v2si) -v8qi __builtin_ia32_paddsb (v8qi, v8qi) -v4hi __builtin_ia32_paddsw (v4hi, v4hi) -v8qi __builtin_ia32_psubsb (v8qi, v8qi) -v4hi __builtin_ia32_psubsw (v4hi, v4hi) -v8qi __builtin_ia32_paddusb (v8qi, v8qi) -v4hi __builtin_ia32_paddusw (v4hi, v4hi) -v8qi __builtin_ia32_psubusb (v8qi, v8qi) -v4hi __builtin_ia32_psubusw (v4hi, v4hi) -v4hi __builtin_ia32_pmullw (v4hi, v4hi) -v4hi __builtin_ia32_pmulhw (v4hi, v4hi) -di __builtin_ia32_pand (di, di) -di __builtin_ia32_pandn (di,di) -di __builtin_ia32_por (di, di) -di __builtin_ia32_pxor (di, di) -v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi) -v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi) -v2si __builtin_ia32_pcmpeqd (v2si, v2si) -v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi) -v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi) -v2si 
__builtin_ia32_pcmpgtd (v2si, v2si) -v8qi __builtin_ia32_punpckhbw (v8qi, v8qi) -v4hi __builtin_ia32_punpckhwd (v4hi, v4hi) -v2si __builtin_ia32_punpckhdq (v2si, v2si) -v8qi __builtin_ia32_punpcklbw (v8qi, v8qi) -v4hi __builtin_ia32_punpcklwd (v4hi, v4hi) -v2si __builtin_ia32_punpckldq (v2si, v2si) -v8qi __builtin_ia32_packsswb (v4hi, v4hi) -v4hi __builtin_ia32_packssdw (v2si, v2si) -v8qi __builtin_ia32_packuswb (v4hi, v4hi) +@node Paired-Single Built-in Functions +@subsubsection Paired-Single Built-in Functions -v4hi __builtin_ia32_psllw (v4hi, v4hi) -v2si __builtin_ia32_pslld (v2si, v2si) -v1di __builtin_ia32_psllq (v1di, v1di) -v4hi __builtin_ia32_psrlw (v4hi, v4hi) -v2si __builtin_ia32_psrld (v2si, v2si) -v1di __builtin_ia32_psrlq (v1di, v1di) -v4hi __builtin_ia32_psraw (v4hi, v4hi) -v2si __builtin_ia32_psrad (v2si, v2si) -v4hi __builtin_ia32_psllwi (v4hi, int) -v2si __builtin_ia32_pslldi (v2si, int) -v1di __builtin_ia32_psllqi (v1di, int) -v4hi __builtin_ia32_psrlwi (v4hi, int) -v2si __builtin_ia32_psrldi (v2si, int) -v1di __builtin_ia32_psrlqi (v1di, int) -v4hi __builtin_ia32_psrawi (v4hi, int) -v2si __builtin_ia32_psradi (v2si, int) +The following paired-single functions map directly to a particular +MIPS instruction. Please refer to the architecture specification +for details on what each instruction does. -@end smallexample +@table @code +@item v2sf __builtin_mips_pll_ps (v2sf, v2sf) +Pair lower lower (@code{pll.ps}). -The following built-in functions are made available either with -@option{-msse}, or with a combination of @option{-m3dnow} and -@option{-march=athlon}. All of them generate the machine -instruction that is part of the name. +@item v2sf __builtin_mips_pul_ps (v2sf, v2sf) +Pair upper lower (@code{pul.ps}). 
-@smallexample -v4hi __builtin_ia32_pmulhuw (v4hi, v4hi) -v8qi __builtin_ia32_pavgb (v8qi, v8qi) -v4hi __builtin_ia32_pavgw (v4hi, v4hi) -v1di __builtin_ia32_psadbw (v8qi, v8qi) -v8qi __builtin_ia32_pmaxub (v8qi, v8qi) -v4hi __builtin_ia32_pmaxsw (v4hi, v4hi) -v8qi __builtin_ia32_pminub (v8qi, v8qi) -v4hi __builtin_ia32_pminsw (v4hi, v4hi) -int __builtin_ia32_pmovmskb (v8qi) -void __builtin_ia32_maskmovq (v8qi, v8qi, char *) -void __builtin_ia32_movntq (di *, di) -void __builtin_ia32_sfence (void) +@item v2sf __builtin_mips_plu_ps (v2sf, v2sf) +Pair lower upper (@code{plu.ps}). + +@item v2sf __builtin_mips_puu_ps (v2sf, v2sf) +Pair upper upper (@code{puu.ps}). + +@item v2sf __builtin_mips_cvt_ps_s (float, float) +Convert pair to paired single (@code{cvt.ps.s}). + +@item float __builtin_mips_cvt_s_pl (v2sf) +Convert pair lower to single (@code{cvt.s.pl}). + +@item float __builtin_mips_cvt_s_pu (v2sf) +Convert pair upper to single (@code{cvt.s.pu}). + +@item v2sf __builtin_mips_abs_ps (v2sf) +Absolute value (@code{abs.ps}). + +@item v2sf __builtin_mips_alnv_ps (v2sf, v2sf, int) +Align variable (@code{alnv.ps}). + +@emph{Note:} The value of the third parameter must be 0 or 4 +modulo 8, otherwise the result is unpredictable. Please read the +instruction description for details. +@end table + +The following multi-instruction functions are also available. +In each case, @var{cond} can be any of the 16 floating-point conditions: +@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, +@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, @code{ngl}, +@code{lt}, @code{nge}, @code{le} or @code{ngt}. + +@table @code +@item v2sf __builtin_mips_movt_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx v2sf __builtin_mips_movf_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +Conditional move based on floating-point comparison (@code{c.@var{cond}.ps}, +@code{movt.ps}/@code{movf.ps}). 
+ +The @code{movt} functions return the value @var{x} computed by: + +@smallexample +c.@var{cond}.ps @var{cc},@var{a},@var{b} +mov.ps @var{x},@var{c} +movt.ps @var{x},@var{d},@var{cc} @end smallexample -The following built-in functions are available when @option{-msse} is used. -All of them generate the machine instruction that is part of the name. +The @code{movf} functions are similar but use @code{movf.ps} instead +of @code{movt.ps}. + +@item int __builtin_mips_upper_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_lower_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Comparison of two paired-single values (@code{c.@var{cond}.ps}, +@code{bc1t}/@code{bc1f}). + +These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps} +and return either the upper or lower half of the result. For example: @smallexample -int __builtin_ia32_comieq (v4sf, v4sf) -int __builtin_ia32_comineq (v4sf, v4sf) -int __builtin_ia32_comilt (v4sf, v4sf) -int __builtin_ia32_comile (v4sf, v4sf) -int __builtin_ia32_comigt (v4sf, v4sf) -int __builtin_ia32_comige (v4sf, v4sf) -int __builtin_ia32_ucomieq (v4sf, v4sf) -int __builtin_ia32_ucomineq (v4sf, v4sf) -int __builtin_ia32_ucomilt (v4sf, v4sf) -int __builtin_ia32_ucomile (v4sf, v4sf) -int __builtin_ia32_ucomigt (v4sf, v4sf) -int __builtin_ia32_ucomige (v4sf, v4sf) -v4sf __builtin_ia32_addps (v4sf, v4sf) -v4sf __builtin_ia32_subps (v4sf, v4sf) -v4sf __builtin_ia32_mulps (v4sf, v4sf) -v4sf __builtin_ia32_divps (v4sf, v4sf) -v4sf __builtin_ia32_addss (v4sf, v4sf) -v4sf __builtin_ia32_subss (v4sf, v4sf) -v4sf __builtin_ia32_mulss (v4sf, v4sf) -v4sf __builtin_ia32_divss (v4sf, v4sf) -v4sf __builtin_ia32_cmpeqps (v4sf, v4sf) -v4sf __builtin_ia32_cmpltps (v4sf, v4sf) -v4sf __builtin_ia32_cmpleps (v4sf, v4sf) -v4sf __builtin_ia32_cmpgtps (v4sf, v4sf) -v4sf __builtin_ia32_cmpgeps (v4sf, v4sf) -v4sf __builtin_ia32_cmpunordps (v4sf, v4sf) -v4sf __builtin_ia32_cmpneqps (v4sf, v4sf) -v4sf __builtin_ia32_cmpnltps (v4sf, v4sf) 
-v4sf __builtin_ia32_cmpnleps (v4sf, v4sf) -v4sf __builtin_ia32_cmpngtps (v4sf, v4sf) -v4sf __builtin_ia32_cmpngeps (v4sf, v4sf) -v4sf __builtin_ia32_cmpordps (v4sf, v4sf) -v4sf __builtin_ia32_cmpeqss (v4sf, v4sf) -v4sf __builtin_ia32_cmpltss (v4sf, v4sf) -v4sf __builtin_ia32_cmpless (v4sf, v4sf) -v4sf __builtin_ia32_cmpunordss (v4sf, v4sf) -v4sf __builtin_ia32_cmpneqss (v4sf, v4sf) -v4sf __builtin_ia32_cmpnltss (v4sf, v4sf) -v4sf __builtin_ia32_cmpnless (v4sf, v4sf) -v4sf __builtin_ia32_cmpordss (v4sf, v4sf) -v4sf __builtin_ia32_maxps (v4sf, v4sf) -v4sf __builtin_ia32_maxss (v4sf, v4sf) -v4sf __builtin_ia32_minps (v4sf, v4sf) -v4sf __builtin_ia32_minss (v4sf, v4sf) -v4sf __builtin_ia32_andps (v4sf, v4sf) -v4sf __builtin_ia32_andnps (v4sf, v4sf) -v4sf __builtin_ia32_orps (v4sf, v4sf) -v4sf __builtin_ia32_xorps (v4sf, v4sf) -v4sf __builtin_ia32_movss (v4sf, v4sf) -v4sf __builtin_ia32_movhlps (v4sf, v4sf) -v4sf __builtin_ia32_movlhps (v4sf, v4sf) -v4sf __builtin_ia32_unpckhps (v4sf, v4sf) -v4sf __builtin_ia32_unpcklps (v4sf, v4sf) -v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si) -v4sf __builtin_ia32_cvtsi2ss (v4sf, int) -v2si __builtin_ia32_cvtps2pi (v4sf) -int __builtin_ia32_cvtss2si (v4sf) -v2si __builtin_ia32_cvttps2pi (v4sf) -int __builtin_ia32_cvttss2si (v4sf) -v4sf __builtin_ia32_rcpps (v4sf) -v4sf __builtin_ia32_rsqrtps (v4sf) -v4sf __builtin_ia32_sqrtps (v4sf) -v4sf __builtin_ia32_rcpss (v4sf) -v4sf __builtin_ia32_rsqrtss (v4sf) -v4sf __builtin_ia32_sqrtss (v4sf) -v4sf __builtin_ia32_shufps (v4sf, v4sf, int) -void __builtin_ia32_movntps (float *, v4sf) -int __builtin_ia32_movmskps (v4sf) +v2sf a, b; +if (__builtin_mips_upper_c_eq_ps (a, b)) + upper_halves_are_equal (); +else + upper_halves_are_unequal (); + +if (__builtin_mips_lower_c_eq_ps (a, b)) + lower_halves_are_equal (); +else + lower_halves_are_unequal (); @end smallexample +@end table -The following built-in functions are available when @option{-msse} is used. 
+@node MIPS-3D Built-in Functions +@subsubsection MIPS-3D Built-in Functions + +The MIPS-3D Application-Specific Extension (ASE) includes additional +paired-single instructions that are designed to improve the performance +of 3D graphics operations. Support for these instructions is controlled +by the @option{-mips3d} command-line option. + +The functions listed below map directly to a particular MIPS-3D +instruction. Please refer to the architecture specification for +more details on what each instruction does. @table @code -@item v4sf __builtin_ia32_loadups (float *) -Generates the @code{movups} machine instruction as a load from memory. -@item void __builtin_ia32_storeups (float *, v4sf) -Generates the @code{movups} machine instruction as a store to memory. -@item v4sf __builtin_ia32_loadss (float *) -Generates the @code{movss} machine instruction as a load from memory. -@item v4sf __builtin_ia32_loadhps (v4sf, const v2sf *) -Generates the @code{movhps} machine instruction as a load from memory. -@item v4sf __builtin_ia32_loadlps (v4sf, const v2sf *) -Generates the @code{movlps} machine instruction as a load from memory -@item void __builtin_ia32_storehps (v2sf *, v4sf) -Generates the @code{movhps} machine instruction as a store to memory. -@item void __builtin_ia32_storelps (v2sf *, v4sf) -Generates the @code{movlps} machine instruction as a store to memory. +@item v2sf __builtin_mips_addr_ps (v2sf, v2sf) +Reduction add (@code{addr.ps}). + +@item v2sf __builtin_mips_mulr_ps (v2sf, v2sf) +Reduction multiply (@code{mulr.ps}). + +@item v2sf __builtin_mips_cvt_pw_ps (v2sf) +Convert paired single to paired word (@code{cvt.pw.ps}). + +@item v2sf __builtin_mips_cvt_ps_pw (v2sf) +Convert paired word to paired single (@code{cvt.ps.pw}). + +@item float __builtin_mips_recip1_s (float) +@itemx double __builtin_mips_recip1_d (double) +@itemx v2sf __builtin_mips_recip1_ps (v2sf) +Reduced-precision reciprocal (sequence step 1) (@code{recip1.@var{fmt}}). 
+ +@item float __builtin_mips_recip2_s (float, float) +@itemx double __builtin_mips_recip2_d (double, double) +@itemx v2sf __builtin_mips_recip2_ps (v2sf, v2sf) +Reduced-precision reciprocal (sequence step 2) (@code{recip2.@var{fmt}}). + +@item float __builtin_mips_rsqrt1_s (float) +@itemx double __builtin_mips_rsqrt1_d (double) +@itemx v2sf __builtin_mips_rsqrt1_ps (v2sf) +Reduced-precision reciprocal square root (sequence step 1) +(@code{rsqrt1.@var{fmt}}). + +@item float __builtin_mips_rsqrt2_s (float, float) +@itemx double __builtin_mips_rsqrt2_d (double, double) +@itemx v2sf __builtin_mips_rsqrt2_ps (v2sf, v2sf) +Reduced-precision reciprocal square root (sequence step 2) +(@code{rsqrt2.@var{fmt}}). @end table -The following built-in functions are available when @option{-msse2} is used. -All of them generate the machine instruction that is part of the name. +The following multi-instruction functions are also available. +In each case, @var{cond} can be any of the 16 floating-point conditions: +@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, +@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, +@code{ngl}, @code{lt}, @code{nge}, @code{le} or @code{ngt}. + +@table @code +@item int __builtin_mips_cabs_@var{cond}_s (float @var{a}, float @var{b}) +@itemx int __builtin_mips_cabs_@var{cond}_d (double @var{a}, double @var{b}) +Absolute comparison of two scalar values (@code{cabs.@var{cond}.@var{fmt}}, +@code{bc1t}/@code{bc1f}). + +These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.s} +or @code{cabs.@var{cond}.d} and return the result as a boolean value. 
+For example: @smallexample -int __builtin_ia32_comisdeq (v2df, v2df) -int __builtin_ia32_comisdlt (v2df, v2df) -int __builtin_ia32_comisdle (v2df, v2df) -int __builtin_ia32_comisdgt (v2df, v2df) -int __builtin_ia32_comisdge (v2df, v2df) -int __builtin_ia32_comisdneq (v2df, v2df) -int __builtin_ia32_ucomisdeq (v2df, v2df) -int __builtin_ia32_ucomisdlt (v2df, v2df) -int __builtin_ia32_ucomisdle (v2df, v2df) -int __builtin_ia32_ucomisdgt (v2df, v2df) -int __builtin_ia32_ucomisdge (v2df, v2df) -int __builtin_ia32_ucomisdneq (v2df, v2df) -v2df __builtin_ia32_cmpeqpd (v2df, v2df) -v2df __builtin_ia32_cmpltpd (v2df, v2df) -v2df __builtin_ia32_cmplepd (v2df, v2df) -v2df __builtin_ia32_cmpgtpd (v2df, v2df) -v2df __builtin_ia32_cmpgepd (v2df, v2df) -v2df __builtin_ia32_cmpunordpd (v2df, v2df) -v2df __builtin_ia32_cmpneqpd (v2df, v2df) -v2df __builtin_ia32_cmpnltpd (v2df, v2df) -v2df __builtin_ia32_cmpnlepd (v2df, v2df) -v2df __builtin_ia32_cmpngtpd (v2df, v2df) -v2df __builtin_ia32_cmpngepd (v2df, v2df) -v2df __builtin_ia32_cmpordpd (v2df, v2df) -v2df __builtin_ia32_cmpeqsd (v2df, v2df) -v2df __builtin_ia32_cmpltsd (v2df, v2df) -v2df __builtin_ia32_cmplesd (v2df, v2df) -v2df __builtin_ia32_cmpunordsd (v2df, v2df) -v2df __builtin_ia32_cmpneqsd (v2df, v2df) -v2df __builtin_ia32_cmpnltsd (v2df, v2df) -v2df __builtin_ia32_cmpnlesd (v2df, v2df) -v2df __builtin_ia32_cmpordsd (v2df, v2df) -v2di __builtin_ia32_paddq (v2di, v2di) -v2di __builtin_ia32_psubq (v2di, v2di) -v2df __builtin_ia32_addpd (v2df, v2df) -v2df __builtin_ia32_subpd (v2df, v2df) -v2df __builtin_ia32_mulpd (v2df, v2df) -v2df __builtin_ia32_divpd (v2df, v2df) -v2df __builtin_ia32_addsd (v2df, v2df) -v2df __builtin_ia32_subsd (v2df, v2df) -v2df __builtin_ia32_mulsd (v2df, v2df) -v2df __builtin_ia32_divsd (v2df, v2df) -v2df __builtin_ia32_minpd (v2df, v2df) -v2df __builtin_ia32_maxpd (v2df, v2df) -v2df __builtin_ia32_minsd (v2df, v2df) -v2df __builtin_ia32_maxsd (v2df, v2df) -v2df __builtin_ia32_andpd (v2df, v2df) 
-v2df __builtin_ia32_andnpd (v2df, v2df) -v2df __builtin_ia32_orpd (v2df, v2df) -v2df __builtin_ia32_xorpd (v2df, v2df) -v2df __builtin_ia32_movsd (v2df, v2df) -v2df __builtin_ia32_unpckhpd (v2df, v2df) -v2df __builtin_ia32_unpcklpd (v2df, v2df) -v16qi __builtin_ia32_paddb128 (v16qi, v16qi) -v8hi __builtin_ia32_paddw128 (v8hi, v8hi) -v4si __builtin_ia32_paddd128 (v4si, v4si) -v2di __builtin_ia32_paddq128 (v2di, v2di) -v16qi __builtin_ia32_psubb128 (v16qi, v16qi) -v8hi __builtin_ia32_psubw128 (v8hi, v8hi) -v4si __builtin_ia32_psubd128 (v4si, v4si) -v2di __builtin_ia32_psubq128 (v2di, v2di) -v8hi __builtin_ia32_pmullw128 (v8hi, v8hi) -v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi) -v2di __builtin_ia32_pand128 (v2di, v2di) -v2di __builtin_ia32_pandn128 (v2di, v2di) -v2di __builtin_ia32_por128 (v2di, v2di) -v2di __builtin_ia32_pxor128 (v2di, v2di) -v16qi __builtin_ia32_pavgb128 (v16qi, v16qi) -v8hi __builtin_ia32_pavgw128 (v8hi, v8hi) -v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi) -v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi) -v4si __builtin_ia32_pcmpeqd128 (v4si, v4si) -v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi) -v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi) -v4si __builtin_ia32_pcmpgtd128 (v4si, v4si) -v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi) -v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi) -v16qi __builtin_ia32_pminub128 (v16qi, v16qi) -v8hi __builtin_ia32_pminsw128 (v8hi, v8hi) -v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi) -v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi) -v4si __builtin_ia32_punpckhdq128 (v4si, v4si) -v2di __builtin_ia32_punpckhqdq128 (v2di, v2di) -v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi) -v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi) -v4si __builtin_ia32_punpckldq128 (v4si, v4si) -v2di __builtin_ia32_punpcklqdq128 (v2di, v2di) -v16qi __builtin_ia32_packsswb128 (v8hi, v8hi) -v8hi __builtin_ia32_packssdw128 (v4si, v4si) -v16qi __builtin_ia32_packuswb128 (v8hi, v8hi) -v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi) -void __builtin_ia32_maskmovdqu 
(v16qi, v16qi) -v2df __builtin_ia32_loadupd (double *) -void __builtin_ia32_storeupd (double *, v2df) -v2df __builtin_ia32_loadhpd (v2df, double const *) -v2df __builtin_ia32_loadlpd (v2df, double const *) -int __builtin_ia32_movmskpd (v2df) -int __builtin_ia32_pmovmskb128 (v16qi) -void __builtin_ia32_movnti (int *, int) -void __builtin_ia32_movnti64 (long long int *, long long int) -void __builtin_ia32_movntpd (double *, v2df) -void __builtin_ia32_movntdq (v2df *, v2df) -v4si __builtin_ia32_pshufd (v4si, int) -v8hi __builtin_ia32_pshuflw (v8hi, int) -v8hi __builtin_ia32_pshufhw (v8hi, int) -v2di __builtin_ia32_psadbw128 (v16qi, v16qi) -v2df __builtin_ia32_sqrtpd (v2df) -v2df __builtin_ia32_sqrtsd (v2df) -v2df __builtin_ia32_shufpd (v2df, v2df, int) -v2df __builtin_ia32_cvtdq2pd (v4si) -v4sf __builtin_ia32_cvtdq2ps (v4si) -v4si __builtin_ia32_cvtpd2dq (v2df) -v2si __builtin_ia32_cvtpd2pi (v2df) -v4sf __builtin_ia32_cvtpd2ps (v2df) -v4si __builtin_ia32_cvttpd2dq (v2df) -v2si __builtin_ia32_cvttpd2pi (v2df) -v2df __builtin_ia32_cvtpi2pd (v2si) -int __builtin_ia32_cvtsd2si (v2df) -int __builtin_ia32_cvttsd2si (v2df) -long long __builtin_ia32_cvtsd2si64 (v2df) -long long __builtin_ia32_cvttsd2si64 (v2df) -v4si __builtin_ia32_cvtps2dq (v4sf) -v2df __builtin_ia32_cvtps2pd (v4sf) -v4si __builtin_ia32_cvttps2dq (v4sf) -v2df __builtin_ia32_cvtsi2sd (v2df, int) -v2df __builtin_ia32_cvtsi642sd (v2df, long long) -v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df) -v2df __builtin_ia32_cvtss2sd (v2df, v4sf) -void __builtin_ia32_clflush (const void *) -void __builtin_ia32_lfence (void) -void __builtin_ia32_mfence (void) -v16qi __builtin_ia32_loaddqu (const char *) -void __builtin_ia32_storedqu (char *, v16qi) -v1di __builtin_ia32_pmuludq (v2si, v2si) -v2di __builtin_ia32_pmuludq128 (v4si, v4si) -v8hi __builtin_ia32_psllw128 (v8hi, v8hi) -v4si __builtin_ia32_pslld128 (v4si, v4si) -v2di __builtin_ia32_psllq128 (v2di, v2di) -v8hi __builtin_ia32_psrlw128 (v8hi, v8hi) -v4si 
__builtin_ia32_psrld128 (v4si, v4si) -v2di __builtin_ia32_psrlq128 (v2di, v2di) -v8hi __builtin_ia32_psraw128 (v8hi, v8hi) -v4si __builtin_ia32_psrad128 (v4si, v4si) -v2di __builtin_ia32_pslldqi128 (v2di, int) -v8hi __builtin_ia32_psllwi128 (v8hi, int) -v4si __builtin_ia32_pslldi128 (v4si, int) -v2di __builtin_ia32_psllqi128 (v2di, int) -v2di __builtin_ia32_psrldqi128 (v2di, int) -v8hi __builtin_ia32_psrlwi128 (v8hi, int) -v4si __builtin_ia32_psrldi128 (v4si, int) -v2di __builtin_ia32_psrlqi128 (v2di, int) -v8hi __builtin_ia32_psrawi128 (v8hi, int) -v4si __builtin_ia32_psradi128 (v4si, int) -v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi) -v2di __builtin_ia32_movq128 (v2di) +float a, b; +if (__builtin_mips_cabs_eq_s (a, b)) + true (); +else + false (); @end smallexample -The following built-in functions are available when @option{-msse3} is used. -All of them generate the machine instruction that is part of the name. +@item int __builtin_mips_upper_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_lower_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Absolute comparison of two paired-single values (@code{cabs.@var{cond}.ps}, +@code{bc1t}/@code{bc1f}). + +These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.ps} +and return either the upper or lower half of the result. 
For example: @smallexample -v2df __builtin_ia32_addsubpd (v2df, v2df) -v4sf __builtin_ia32_addsubps (v4sf, v4sf) -v2df __builtin_ia32_haddpd (v2df, v2df) -v4sf __builtin_ia32_haddps (v4sf, v4sf) -v2df __builtin_ia32_hsubpd (v2df, v2df) -v4sf __builtin_ia32_hsubps (v4sf, v4sf) -v16qi __builtin_ia32_lddqu (char const *) -void __builtin_ia32_monitor (void *, unsigned int, unsigned int) -v4sf __builtin_ia32_movshdup (v4sf) -v4sf __builtin_ia32_movsldup (v4sf) -void __builtin_ia32_mwait (unsigned int, unsigned int) +v2sf a, b; +if (__builtin_mips_upper_cabs_eq_ps (a, b)) + upper_halves_are_equal (); +else + upper_halves_are_unequal (); + +if (__builtin_mips_lower_cabs_eq_ps (a, b)) + lower_halves_are_equal (); +else + lower_halves_are_unequal (); @end smallexample -The following built-in functions are available when @option{-mssse3} is used. -All of them generate the machine instruction that is part of the name. +@item v2sf __builtin_mips_movt_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx v2sf __builtin_mips_movf_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +Conditional move based on absolute comparison (@code{cabs.@var{cond}.ps}, +@code{movt.ps}/@code{movf.ps}). 
+ +The @code{movt} functions return the value @var{x} computed by: @smallexample -v2si __builtin_ia32_phaddd (v2si, v2si) -v4hi __builtin_ia32_phaddw (v4hi, v4hi) -v4hi __builtin_ia32_phaddsw (v4hi, v4hi) -v2si __builtin_ia32_phsubd (v2si, v2si) -v4hi __builtin_ia32_phsubw (v4hi, v4hi) -v4hi __builtin_ia32_phsubsw (v4hi, v4hi) -v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi) -v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi) -v8qi __builtin_ia32_pshufb (v8qi, v8qi) -v8qi __builtin_ia32_psignb (v8qi, v8qi) -v2si __builtin_ia32_psignd (v2si, v2si) -v4hi __builtin_ia32_psignw (v4hi, v4hi) -v1di __builtin_ia32_palignr (v1di, v1di, int) -v8qi __builtin_ia32_pabsb (v8qi) -v2si __builtin_ia32_pabsd (v2si) -v4hi __builtin_ia32_pabsw (v4hi) +cabs.@var{cond}.ps @var{cc},@var{a},@var{b} +mov.ps @var{x},@var{c} +movt.ps @var{x},@var{d},@var{cc} @end smallexample -The following built-in functions are available when @option{-mssse3} is used. -All of them generate the machine instruction that is part of the name. +The @code{movf} functions are similar but use @code{movf.ps} instead +of @code{movt.ps}. + +@item int __builtin_mips_any_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_all_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_any_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +@itemx int __builtin_mips_all_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) +Comparison of two paired-single values +(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps}, +@code{bc1any2t}/@code{bc1any2f}). + +These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps} +or @code{cabs.@var{cond}.ps}. The @code{any} forms return true if either +result is true and the @code{all} forms return true if both results are true. 
+For example: @smallexample -v4si __builtin_ia32_phaddd128 (v4si, v4si) -v8hi __builtin_ia32_phaddw128 (v8hi, v8hi) -v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi) -v4si __builtin_ia32_phsubd128 (v4si, v4si) -v8hi __builtin_ia32_phsubw128 (v8hi, v8hi) -v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi) -v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi) -v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi) -v16qi __builtin_ia32_pshufb128 (v16qi, v16qi) -v16qi __builtin_ia32_psignb128 (v16qi, v16qi) -v4si __builtin_ia32_psignd128 (v4si, v4si) -v8hi __builtin_ia32_psignw128 (v8hi, v8hi) -v2di __builtin_ia32_palignr128 (v2di, v2di, int) -v16qi __builtin_ia32_pabsb128 (v16qi) -v4si __builtin_ia32_pabsd128 (v4si) -v8hi __builtin_ia32_pabsw128 (v8hi) +v2sf a, b; +if (__builtin_mips_any_c_eq_ps (a, b)) + one_is_true (); +else + both_are_false (); + +if (__builtin_mips_all_c_eq_ps (a, b)) + both_are_true (); +else + one_is_false (); @end smallexample -The following built-in functions are available when @option{-msse4.1} is -used. All of them generate the machine instruction that is part of the -name. +@item int __builtin_mips_any_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx int __builtin_mips_all_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx int __builtin_mips_any_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +@itemx int __builtin_mips_all_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) +Comparison of four paired-single values +(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps}, +@code{bc1any4t}/@code{bc1any4f}). + +These functions use @code{c.@var{cond}.ps} or @code{cabs.@var{cond}.ps} +to compare @var{a} with @var{b} and to compare @var{c} with @var{d}. +The @code{any} forms return true if any of the four results are true +and the @code{all} forms return true if all four results are true. 
+For example: @smallexample -v2df __builtin_ia32_blendpd (v2df, v2df, const int) -v4sf __builtin_ia32_blendps (v4sf, v4sf, const int) -v2df __builtin_ia32_blendvpd (v2df, v2df, v2df) -v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_dppd (v2df, v2df, const int) -v4sf __builtin_ia32_dpps (v4sf, v4sf, const int) -v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int) -v2di __builtin_ia32_movntdqa (v2di *); -v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int) -v8hi __builtin_ia32_packusdw128 (v4si, v4si) -v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi) -v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int) -v2di __builtin_ia32_pcmpeqq (v2di, v2di) -v8hi __builtin_ia32_phminposuw128 (v8hi) -v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi) -v4si __builtin_ia32_pmaxsd128 (v4si, v4si) -v4si __builtin_ia32_pmaxud128 (v4si, v4si) -v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi) -v16qi __builtin_ia32_pminsb128 (v16qi, v16qi) -v4si __builtin_ia32_pminsd128 (v4si, v4si) -v4si __builtin_ia32_pminud128 (v4si, v4si) -v8hi __builtin_ia32_pminuw128 (v8hi, v8hi) -v4si __builtin_ia32_pmovsxbd128 (v16qi) -v2di __builtin_ia32_pmovsxbq128 (v16qi) -v8hi __builtin_ia32_pmovsxbw128 (v16qi) -v2di __builtin_ia32_pmovsxdq128 (v4si) -v4si __builtin_ia32_pmovsxwd128 (v8hi) -v2di __builtin_ia32_pmovsxwq128 (v8hi) -v4si __builtin_ia32_pmovzxbd128 (v16qi) -v2di __builtin_ia32_pmovzxbq128 (v16qi) -v8hi __builtin_ia32_pmovzxbw128 (v16qi) -v2di __builtin_ia32_pmovzxdq128 (v4si) -v4si __builtin_ia32_pmovzxwd128 (v8hi) -v2di __builtin_ia32_pmovzxwq128 (v8hi) -v2di __builtin_ia32_pmuldq128 (v4si, v4si) -v4si __builtin_ia32_pmulld128 (v4si, v4si) -int __builtin_ia32_ptestc128 (v2di, v2di) -int __builtin_ia32_ptestnzc128 (v2di, v2di) -int __builtin_ia32_ptestz128 (v2di, v2di) -v2df __builtin_ia32_roundpd (v2df, const int) -v4sf __builtin_ia32_roundps (v4sf, const int) -v2df __builtin_ia32_roundsd (v2df, v2df, const int) -v4sf __builtin_ia32_roundss (v4sf, v4sf, const int) +v2sf 
a, b, c, d; +if (__builtin_mips_any_c_eq_4s (a, b, c, d)) + some_are_true (); +else + all_are_false (); + +if (__builtin_mips_all_c_eq_4s (a, b, c, d)) + all_are_true (); +else + some_are_false (); @end smallexample +@end table -The following built-in functions are available when @option{-msse4.1} is -used. +@node Other MIPS Built-in Functions +@subsection Other MIPS Built-in Functions + +GCC provides other MIPS-specific built-in functions: @table @code -@item v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int) -Generates the @code{insertps} machine instruction. -@item int __builtin_ia32_vec_ext_v16qi (v16qi, const int) -Generates the @code{pextrb} machine instruction. -@item v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int) -Generates the @code{pinsrb} machine instruction. -@item v4si __builtin_ia32_vec_set_v4si (v4si, int, const int) -Generates the @code{pinsrd} machine instruction. -@item v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int) -Generates the @code{pinsrq} machine instruction in 64bit mode. +@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr}) +Insert a @samp{cache} instruction with operands @var{op} and @var{addr}. +GCC defines the preprocessor macro @code{__GCC_HAVE_BUILTIN_MIPS_CACHE} +when this function is available. + +@item unsigned int __builtin_mips_get_fcsr (void) +@itemx void __builtin_mips_set_fcsr (unsigned int @var{value}) +Get and set the contents of the floating-point control and status register +(FPU control register 31). These functions are only available in hard-float +code but can be called in both MIPS16 and non-MIPS16 contexts. + +@code{__builtin_mips_set_fcsr} can be used to change any bit of the +register except the condition codes, which GCC assumes are preserved. @end table -The following built-in functions are changed to generate new SSE4.1 -instructions when @option{-msse4.1} is used. 
+@node MSP430 Built-in Functions +@subsection MSP430 Built-in Functions + +GCC provides a couple of special builtin functions to aid in the +writing of interrupt handlers in C. @table @code -@item float __builtin_ia32_vec_ext_v4sf (v4sf, const int) -Generates the @code{extractps} machine instruction. -@item int __builtin_ia32_vec_ext_v4si (v4si, const int) -Generates the @code{pextrd} machine instruction. -@item long long __builtin_ia32_vec_ext_v2di (v2di, const int) -Generates the @code{pextrq} machine instruction in 64bit mode. +@item __bic_SR_register_on_exit (int @var{mask}) +This clears the indicated bits in the saved copy of the status register +currently residing on the stack. This only works inside interrupt +handlers and the changes to the status register will only take effect +once the handler returns. + +@item __bis_SR_register_on_exit (int @var{mask}) +This sets the indicated bits in the saved copy of the status register +currently residing on the stack. This only works inside interrupt +handlers and the changes to the status register will only take effect +once the handler returns. + +@item __delay_cycles (long long @var{cycles}) +This inserts an instruction sequence that takes exactly @var{cycles} +cycles (between 0 and about 17E9) to complete. The inserted sequence +may use jumps, loops, or no-ops, and does not interfere with any other +instructions. Note that @var{cycles} must be a compile-time constant +integer - that is, you must pass a number, not a variable that may be +optimized to a constant later. The number of cycles delayed by this +builtin is exact. @end table -The following built-in functions are available when @option{-msse4.2} is -used. All of them generate the machine instruction that is part of the -name. 
+@node NDS32 Built-in Functions +@subsection NDS32 Built-in Functions -@smallexample -v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int) -int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int) -v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int) -int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int) -v2di __builtin_ia32_pcmpgtq (v2di, v2di) -@end smallexample +These built-in functions are available for the NDS32 target: -The following built-in functions are available when @option{-msse4.2} is -used. +@deftypefn {Built-in Function} void __builtin_nds32_isync (int *@var{addr}) +Insert an ISYNC instruction into the instruction stream where +@var{addr} is an instruction address for serialization. +@end deftypefn -@table @code -@item unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char) -Generates the @code{crc32b} machine instruction. -@item unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short) -Generates the @code{crc32w} machine instruction. -@item unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int) -Generates the @code{crc32l} machine instruction. -@item unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long) -Generates the @code{crc32q} machine instruction. 
-@end table +@deftypefn {Built-in Function} void __builtin_nds32_isb (void) +Insert an ISB instruction into the instruction stream. +@end deftypefn -The following built-in functions are changed to generate new SSE4.2 -instructions when @option{-msse4.2} is used. +@deftypefn {Built-in Function} int __builtin_nds32_mfsr (int @var{sr}) +Return the content of a system register which is mapped by @var{sr}. +@end deftypefn + +@deftypefn {Built-in Function} int __builtin_nds32_mfusr (int @var{usr}) +Return the content of a user space register which is mapped by @var{usr}. +@end deftypefn + +@deftypefn {Built-in Function} void __builtin_nds32_mtsr (int @var{value}, int @var{sr}) +Move the @var{value} to a system register which is mapped by @var{sr}. +@end deftypefn + +@deftypefn {Built-in Function} void __builtin_nds32_mtusr (int @var{value}, int @var{usr}) +Move the @var{value} to a user space register which is mapped by @var{usr}. +@end deftypefn + +@deftypefn {Built-in Function} void __builtin_nds32_setgie_en (void) +Enable global interrupt. +@end deftypefn + +@deftypefn {Built-in Function} void __builtin_nds32_setgie_dis (void) +Disable global interrupt. +@end deftypefn + +@node picoChip Built-in Functions +@subsection picoChip Built-in Functions + +GCC provides an interface to selected machine instructions from the +picoChip instruction set. @table @code -@item int __builtin_popcount (unsigned int) -Generates the @code{popcntl} machine instruction. -@item int __builtin_popcountl (unsigned long) -Generates the @code{popcntl} or @code{popcntq} machine instruction, -depending on the size of @code{unsigned long}. -@item int __builtin_popcountll (unsigned long long) -Generates the @code{popcntq} machine instruction. +@item int __builtin_sbc (int @var{value}) +Sign bit count. Return the number of consecutive bits in @var{value} +that have the same value as the sign bit. 
The result is the number of +leading sign bits minus one, giving the number of redundant sign bits in +@var{value}. + +@item int __builtin_byteswap (int @var{value}) +Byte swap. Return the result of swapping the upper and lower bytes of +@var{value}. + +@item int __builtin_brev (int @var{value}) +Bit reversal. Return the result of reversing the bits in +@var{value}. Bit 15 is swapped with bit 0, bit 14 is swapped with bit 1, +and so on. + +@item int __builtin_adds (int @var{x}, int @var{y}) +Saturating addition. Return the result of adding @var{x} and @var{y}, +storing the value 32767 if the result overflows. + +@item int __builtin_subs (int @var{x}, int @var{y}) +Saturating subtraction. Return the result of subtracting @var{y} from +@var{x}, storing the value @minus{}32768 if the result overflows. + +@item void __builtin_halt (void) +Halt. The processor stops execution. This built-in is useful for +implementing assertions. + @end table -The following built-in functions are available when @option{-mavx} is -used. All of them generate the machine instruction that is part of the -name. 
+@node PowerPC Built-in Functions +@subsection PowerPC Built-in Functions +These built-in functions are available for the PowerPC family of +processors: @smallexample -v4df __builtin_ia32_addpd256 (v4df,v4df) -v8sf __builtin_ia32_addps256 (v8sf,v8sf) -v4df __builtin_ia32_addsubpd256 (v4df,v4df) -v8sf __builtin_ia32_addsubps256 (v8sf,v8sf) -v4df __builtin_ia32_andnpd256 (v4df,v4df) -v8sf __builtin_ia32_andnps256 (v8sf,v8sf) -v4df __builtin_ia32_andpd256 (v4df,v4df) -v8sf __builtin_ia32_andps256 (v8sf,v8sf) -v4df __builtin_ia32_blendpd256 (v4df,v4df,int) -v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int) -v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df) -v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf) -v2df __builtin_ia32_cmppd (v2df,v2df,int) -v4df __builtin_ia32_cmppd256 (v4df,v4df,int) -v4sf __builtin_ia32_cmpps (v4sf,v4sf,int) -v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int) -v2df __builtin_ia32_cmpsd (v2df,v2df,int) -v4sf __builtin_ia32_cmpss (v4sf,v4sf,int) -v4df __builtin_ia32_cvtdq2pd256 (v4si) -v8sf __builtin_ia32_cvtdq2ps256 (v8si) -v4si __builtin_ia32_cvtpd2dq256 (v4df) -v4sf __builtin_ia32_cvtpd2ps256 (v4df) -v8si __builtin_ia32_cvtps2dq256 (v8sf) -v4df __builtin_ia32_cvtps2pd256 (v4sf) -v4si __builtin_ia32_cvttpd2dq256 (v4df) -v8si __builtin_ia32_cvttps2dq256 (v8sf) -v4df __builtin_ia32_divpd256 (v4df,v4df) -v8sf __builtin_ia32_divps256 (v8sf,v8sf) -v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int) -v4df __builtin_ia32_haddpd256 (v4df,v4df) -v8sf __builtin_ia32_haddps256 (v8sf,v8sf) -v4df __builtin_ia32_hsubpd256 (v4df,v4df) -v8sf __builtin_ia32_hsubps256 (v8sf,v8sf) -v32qi __builtin_ia32_lddqu256 (pcchar) -v32qi __builtin_ia32_loaddqu256 (pcchar) -v4df __builtin_ia32_loadupd256 (pcdouble) -v8sf __builtin_ia32_loadups256 (pcfloat) -v2df __builtin_ia32_maskloadpd (pcv2df,v2df) -v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df) -v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf) -v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf) -void __builtin_ia32_maskstorepd 
(pv2df,v2df,v2df) -void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df) -void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf) -void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf) -v4df __builtin_ia32_maxpd256 (v4df,v4df) -v8sf __builtin_ia32_maxps256 (v8sf,v8sf) -v4df __builtin_ia32_minpd256 (v4df,v4df) -v8sf __builtin_ia32_minps256 (v8sf,v8sf) -v4df __builtin_ia32_movddup256 (v4df) -int __builtin_ia32_movmskpd256 (v4df) -int __builtin_ia32_movmskps256 (v8sf) -v8sf __builtin_ia32_movshdup256 (v8sf) -v8sf __builtin_ia32_movsldup256 (v8sf) -v4df __builtin_ia32_mulpd256 (v4df,v4df) -v8sf __builtin_ia32_mulps256 (v8sf,v8sf) -v4df __builtin_ia32_orpd256 (v4df,v4df) -v8sf __builtin_ia32_orps256 (v8sf,v8sf) -v2df __builtin_ia32_pd_pd256 (v4df) -v4df __builtin_ia32_pd256_pd (v2df) -v4sf __builtin_ia32_ps_ps256 (v8sf) -v8sf __builtin_ia32_ps256_ps (v4sf) -int __builtin_ia32_ptestc256 (v4di,v4di,ptest) -int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest) -int __builtin_ia32_ptestz256 (v4di,v4di,ptest) -v8sf __builtin_ia32_rcpps256 (v8sf) -v4df __builtin_ia32_roundpd256 (v4df,int) -v8sf __builtin_ia32_roundps256 (v8sf,int) -v8sf __builtin_ia32_rsqrtps_nr256 (v8sf) -v8sf __builtin_ia32_rsqrtps256 (v8sf) -v4df __builtin_ia32_shufpd256 (v4df,v4df,int) -v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int) -v4si __builtin_ia32_si_si256 (v8si) -v8si __builtin_ia32_si256_si (v4si) -v4df __builtin_ia32_sqrtpd256 (v4df) -v8sf __builtin_ia32_sqrtps_nr256 (v8sf) -v8sf __builtin_ia32_sqrtps256 (v8sf) -void __builtin_ia32_storedqu256 (pchar,v32qi) -void __builtin_ia32_storeupd256 (pdouble,v4df) -void __builtin_ia32_storeups256 (pfloat,v8sf) -v4df __builtin_ia32_subpd256 (v4df,v4df) -v8sf __builtin_ia32_subps256 (v8sf,v8sf) -v4df __builtin_ia32_unpckhpd256 (v4df,v4df) -v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf) -v4df __builtin_ia32_unpcklpd256 (v4df,v4df) -v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf) -v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df) -v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf) 
-v4df __builtin_ia32_vbroadcastsd256 (pcdouble) -v4sf __builtin_ia32_vbroadcastss (pcfloat) -v8sf __builtin_ia32_vbroadcastss256 (pcfloat) -v2df __builtin_ia32_vextractf128_pd256 (v4df,int) -v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int) -v4si __builtin_ia32_vextractf128_si256 (v8si,int) -v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int) -v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int) -v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int) -v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int) -v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int) -v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int) -v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int) -v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int) -v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int) -v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int) -v2df __builtin_ia32_vpermilpd (v2df,int) -v4df __builtin_ia32_vpermilpd256 (v4df,int) -v4sf __builtin_ia32_vpermilps (v4sf,int) -v8sf __builtin_ia32_vpermilps256 (v8sf,int) -v2df __builtin_ia32_vpermilvarpd (v2df,v2di) -v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di) -v4sf __builtin_ia32_vpermilvarps (v4sf,v4si) -v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si) -int __builtin_ia32_vtestcpd (v2df,v2df,ptest) -int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest) -int __builtin_ia32_vtestcps (v4sf,v4sf,ptest) -int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest) -int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest) -int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest) -int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest) -int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest) -int __builtin_ia32_vtestzpd (v2df,v2df,ptest) -int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest) -int __builtin_ia32_vtestzps (v4sf,v4sf,ptest) -int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest) -void __builtin_ia32_vzeroall (void) -void __builtin_ia32_vzeroupper (void) -v4df __builtin_ia32_xorpd256 (v4df,v4df) -v8sf __builtin_ia32_xorps256 (v8sf,v8sf) -@end smallexample - -The following built-in 
functions are available when @option{-mavx2} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int) -v32qi __builtin_ia32_pabsb256 (v32qi) -v16hi __builtin_ia32_pabsw256 (v16hi) -v8si __builtin_ia32_pabsd256 (v8si) -v16hi __builtin_ia32_packssdw256 (v8si,v8si) -v32qi __builtin_ia32_packsswb256 (v16hi,v16hi) -v16hi __builtin_ia32_packusdw256 (v8si,v8si) -v32qi __builtin_ia32_packuswb256 (v16hi,v16hi) -v32qi __builtin_ia32_paddb256 (v32qi,v32qi) -v16hi __builtin_ia32_paddw256 (v16hi,v16hi) -v8si __builtin_ia32_paddd256 (v8si,v8si) -v4di __builtin_ia32_paddq256 (v4di,v4di) -v32qi __builtin_ia32_paddsb256 (v32qi,v32qi) -v16hi __builtin_ia32_paddsw256 (v16hi,v16hi) -v32qi __builtin_ia32_paddusb256 (v32qi,v32qi) -v16hi __builtin_ia32_paddusw256 (v16hi,v16hi) -v4di __builtin_ia32_palignr256 (v4di,v4di,int) -v4di __builtin_ia32_andsi256 (v4di,v4di) -v4di __builtin_ia32_andnotsi256 (v4di,v4di) -v32qi __builtin_ia32_pavgb256 (v32qi,v32qi) -v16hi __builtin_ia32_pavgw256 (v16hi,v16hi) -v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi) -v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int) -v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi) -v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi) -v8si __builtin_ia32_pcmpeqd256 (c8si,v8si) -v4di __builtin_ia32_pcmpeqq256 (v4di,v4di) -v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi) -v16hi __builtin_ia32_pcmpgtw256 (16hi,v16hi) -v8si __builtin_ia32_pcmpgtd256 (v8si,v8si) -v4di __builtin_ia32_pcmpgtq256 (v4di,v4di) -v16hi __builtin_ia32_phaddw256 (v16hi,v16hi) -v8si __builtin_ia32_phaddd256 (v8si,v8si) -v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi) -v16hi __builtin_ia32_phsubw256 (v16hi,v16hi) -v8si __builtin_ia32_phsubd256 (v8si,v8si) -v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi) -v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi) -v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi) -v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi) -v16hi __builtin_ia32_pmaxsw256 
(v16hi,v16hi) -v8si __builtin_ia32_pmaxsd256 (v8si,v8si) -v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi) -v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi) -v8si __builtin_ia32_pmaxud256 (v8si,v8si) -v32qi __builtin_ia32_pminsb256 (v32qi,v32qi) -v16hi __builtin_ia32_pminsw256 (v16hi,v16hi) -v8si __builtin_ia32_pminsd256 (v8si,v8si) -v32qi __builtin_ia32_pminub256 (v32qi,v32qi) -v16hi __builtin_ia32_pminuw256 (v16hi,v16hi) -v8si __builtin_ia32_pminud256 (v8si,v8si) -int __builtin_ia32_pmovmskb256 (v32qi) -v16hi __builtin_ia32_pmovsxbw256 (v16qi) -v8si __builtin_ia32_pmovsxbd256 (v16qi) -v4di __builtin_ia32_pmovsxbq256 (v16qi) -v8si __builtin_ia32_pmovsxwd256 (v8hi) -v4di __builtin_ia32_pmovsxwq256 (v8hi) -v4di __builtin_ia32_pmovsxdq256 (v4si) -v16hi __builtin_ia32_pmovzxbw256 (v16qi) -v8si __builtin_ia32_pmovzxbd256 (v16qi) -v4di __builtin_ia32_pmovzxbq256 (v16qi) -v8si __builtin_ia32_pmovzxwd256 (v8hi) -v4di __builtin_ia32_pmovzxwq256 (v8hi) -v4di __builtin_ia32_pmovzxdq256 (v4si) -v4di __builtin_ia32_pmuldq256 (v8si,v8si) -v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi) -v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi) -v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi) -v16hi __builtin_ia32_pmullw256 (v16hi,v16hi) -v8si __builtin_ia32_pmulld256 (v8si,v8si) -v4di __builtin_ia32_pmuludq256 (v8si,v8si) -v4di __builtin_ia32_por256 (v4di,v4di) -v16hi __builtin_ia32_psadbw256 (v32qi,v32qi) -v32qi __builtin_ia32_pshufb256 (v32qi,v32qi) -v8si __builtin_ia32_pshufd256 (v8si,int) -v16hi __builtin_ia32_pshufhw256 (v16hi,int) -v16hi __builtin_ia32_pshuflw256 (v16hi,int) -v32qi __builtin_ia32_psignb256 (v32qi,v32qi) -v16hi __builtin_ia32_psignw256 (v16hi,v16hi) -v8si __builtin_ia32_psignd256 (v8si,v8si) -v4di __builtin_ia32_pslldqi256 (v4di,int) -v16hi __builtin_ia32_psllwi256 (16hi,int) -v16hi __builtin_ia32_psllw256(v16hi,v8hi) -v8si __builtin_ia32_pslldi256 (v8si,int) -v8si __builtin_ia32_pslld256(v8si,v4si) -v4di __builtin_ia32_psllqi256 (v4di,int) -v4di 
__builtin_ia32_psllq256(v4di,v2di) -v16hi __builtin_ia32_psrawi256 (v16hi,int) -v16hi __builtin_ia32_psraw256 (v16hi,v8hi) -v8si __builtin_ia32_psradi256 (v8si,int) -v8si __builtin_ia32_psrad256 (v8si,v4si) -v4di __builtin_ia32_psrldqi256 (v4di, int) -v16hi __builtin_ia32_psrlwi256 (v16hi,int) -v16hi __builtin_ia32_psrlw256 (v16hi,v8hi) -v8si __builtin_ia32_psrldi256 (v8si,int) -v8si __builtin_ia32_psrld256 (v8si,v4si) -v4di __builtin_ia32_psrlqi256 (v4di,int) -v4di __builtin_ia32_psrlq256(v4di,v2di) -v32qi __builtin_ia32_psubb256 (v32qi,v32qi) -v32hi __builtin_ia32_psubw256 (v16hi,v16hi) -v8si __builtin_ia32_psubd256 (v8si,v8si) -v4di __builtin_ia32_psubq256 (v4di,v4di) -v32qi __builtin_ia32_psubsb256 (v32qi,v32qi) -v16hi __builtin_ia32_psubsw256 (v16hi,v16hi) -v32qi __builtin_ia32_psubusb256 (v32qi,v32qi) -v16hi __builtin_ia32_psubusw256 (v16hi,v16hi) -v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi) -v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi) -v8si __builtin_ia32_punpckhdq256 (v8si,v8si) -v4di __builtin_ia32_punpckhqdq256 (v4di,v4di) -v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi) -v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi) -v8si __builtin_ia32_punpckldq256 (v8si,v8si) -v4di __builtin_ia32_punpcklqdq256 (v4di,v4di) -v4di __builtin_ia32_pxor256 (v4di,v4di) -v4di __builtin_ia32_movntdqa256 (pv4di) -v4sf __builtin_ia32_vbroadcastss_ps (v4sf) -v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf) -v4df __builtin_ia32_vbroadcastsd_pd256 (v2df) -v4di __builtin_ia32_vbroadcastsi256 (v2di) -v4si __builtin_ia32_pblendd128 (v4si,v4si) -v8si __builtin_ia32_pblendd256 (v8si,v8si) -v32qi __builtin_ia32_pbroadcastb256 (v16qi) -v16hi __builtin_ia32_pbroadcastw256 (v8hi) -v8si __builtin_ia32_pbroadcastd256 (v4si) -v4di __builtin_ia32_pbroadcastq256 (v2di) -v16qi __builtin_ia32_pbroadcastb128 (v16qi) -v8hi __builtin_ia32_pbroadcastw128 (v8hi) -v4si __builtin_ia32_pbroadcastd128 (v4si) -v2di __builtin_ia32_pbroadcastq128 (v2di) -v8si __builtin_ia32_permvarsi256 (v8si,v8si) 
-v4df __builtin_ia32_permdf256 (v4df,int) -v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf) -v4di __builtin_ia32_permdi256 (v4di,int) -v4di __builtin_ia32_permti256 (v4di,v4di,int) -v4di __builtin_ia32_extract128i256 (v4di,int) -v4di __builtin_ia32_insert128i256 (v4di,v2di,int) -v8si __builtin_ia32_maskloadd256 (pcv8si,v8si) -v4di __builtin_ia32_maskloadq256 (pcv4di,v4di) -v4si __builtin_ia32_maskloadd (pcv4si,v4si) -v2di __builtin_ia32_maskloadq (pcv2di,v2di) -void __builtin_ia32_maskstored256 (pv8si,v8si,v8si) -void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di) -void __builtin_ia32_maskstored (pv4si,v4si,v4si) -void __builtin_ia32_maskstoreq (pv2di,v2di,v2di) -v8si __builtin_ia32_psllv8si (v8si,v8si) -v4si __builtin_ia32_psllv4si (v4si,v4si) -v4di __builtin_ia32_psllv4di (v4di,v4di) -v2di __builtin_ia32_psllv2di (v2di,v2di) -v8si __builtin_ia32_psrav8si (v8si,v8si) -v4si __builtin_ia32_psrav4si (v4si,v4si) -v8si __builtin_ia32_psrlv8si (v8si,v8si) -v4si __builtin_ia32_psrlv4si (v4si,v4si) -v4di __builtin_ia32_psrlv4di (v4di,v4di) -v2di __builtin_ia32_psrlv2di (v2di,v2di) -v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int) -v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int) -v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int) -v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int) -v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int) -v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int) -v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int) -v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int) -v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int) -v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int) -v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int) -v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int) -v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int) -v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int) -v4si 
__builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int) -v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int) -@end smallexample - -The following built-in functions are available when @option{-maes} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -v2di __builtin_ia32_aesenc128 (v2di, v2di) -v2di __builtin_ia32_aesenclast128 (v2di, v2di) -v2di __builtin_ia32_aesdec128 (v2di, v2di) -v2di __builtin_ia32_aesdeclast128 (v2di, v2di) -v2di __builtin_ia32_aeskeygenassist128 (v2di, const int) -v2di __builtin_ia32_aesimc128 (v2di) -@end smallexample - -The following built-in function is available when @option{-mpclmul} is -used. - -@table @code -@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int) -Generates the @code{pclmulqdq} machine instruction. -@end table - -The following built-in function is available when @option{-mfsgsbase} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -unsigned int __builtin_ia32_rdfsbase32 (void) -unsigned long long __builtin_ia32_rdfsbase64 (void) -unsigned int __builtin_ia32_rdgsbase32 (void) -unsigned long long __builtin_ia32_rdgsbase64 (void) -void _writefsbase_u32 (unsigned int) -void _writefsbase_u64 (unsigned long long) -void _writegsbase_u32 (unsigned int) -void _writegsbase_u64 (unsigned long long) -@end smallexample - -The following built-in function is available when @option{-mrdrnd} is -used. All of them generate the machine instruction that is part of the -name. - -@smallexample -unsigned int __builtin_ia32_rdrand16_step (unsigned short *) -unsigned int __builtin_ia32_rdrand32_step (unsigned int *) -unsigned int __builtin_ia32_rdrand64_step (unsigned long long *) -@end smallexample - -The following built-in functions are available when @option{-msse4a} is used. -All of them generate the machine instruction that is part of the name. 
- -@smallexample -void __builtin_ia32_movntsd (double *, v2df) -void __builtin_ia32_movntss (float *, v4sf) -v2di __builtin_ia32_extrq (v2di, v16qi) -v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int) -v2di __builtin_ia32_insertq (v2di, v2di) -v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int) -@end smallexample - -The following built-in functions are available when @option{-mxop} is used. -@smallexample -v2df __builtin_ia32_vfrczpd (v2df) -v4sf __builtin_ia32_vfrczps (v4sf) -v2df __builtin_ia32_vfrczsd (v2df) -v4sf __builtin_ia32_vfrczss (v4sf) -v4df __builtin_ia32_vfrczpd256 (v4df) -v8sf __builtin_ia32_vfrczps256 (v8sf) -v2di __builtin_ia32_vpcmov (v2di, v2di, v2di) -v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di) -v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si) -v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi) -v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi) -v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df) -v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf) -v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di) -v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si) -v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi) -v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi) -v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf) -v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi) -v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi) -v4si __builtin_ia32_vpcomeqd (v4si, v4si) -v2di __builtin_ia32_vpcomeqq (v2di, v2di) -v16qi __builtin_ia32_vpcomequb (v16qi, v16qi) -v4si __builtin_ia32_vpcomequd (v4si, v4si) -v2di __builtin_ia32_vpcomequq (v2di, v2di) -v8hi __builtin_ia32_vpcomequw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi) -v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi) -v4si __builtin_ia32_vpcomfalsed (v4si, v4si) -v2di __builtin_ia32_vpcomfalseq (v2di, v2di) -v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi) -v4si 
__builtin_ia32_vpcomfalseud (v4si, v4si) -v2di __builtin_ia32_vpcomfalseuq (v2di, v2di) -v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi) -v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi) -v4si __builtin_ia32_vpcomged (v4si, v4si) -v2di __builtin_ia32_vpcomgeq (v2di, v2di) -v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi) -v4si __builtin_ia32_vpcomgeud (v4si, v4si) -v2di __builtin_ia32_vpcomgeuq (v2di, v2di) -v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomgew (v8hi, v8hi) -v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi) -v4si __builtin_ia32_vpcomgtd (v4si, v4si) -v2di __builtin_ia32_vpcomgtq (v2di, v2di) -v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi) -v4si __builtin_ia32_vpcomgtud (v4si, v4si) -v2di __builtin_ia32_vpcomgtuq (v2di, v2di) -v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi) -v16qi __builtin_ia32_vpcomleb (v16qi, v16qi) -v4si __builtin_ia32_vpcomled (v4si, v4si) -v2di __builtin_ia32_vpcomleq (v2di, v2di) -v16qi __builtin_ia32_vpcomleub (v16qi, v16qi) -v4si __builtin_ia32_vpcomleud (v4si, v4si) -v2di __builtin_ia32_vpcomleuq (v2di, v2di) -v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomlew (v8hi, v8hi) -v16qi __builtin_ia32_vpcomltb (v16qi, v16qi) -v4si __builtin_ia32_vpcomltd (v4si, v4si) -v2di __builtin_ia32_vpcomltq (v2di, v2di) -v16qi __builtin_ia32_vpcomltub (v16qi, v16qi) -v4si __builtin_ia32_vpcomltud (v4si, v4si) -v2di __builtin_ia32_vpcomltuq (v2di, v2di) -v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomltw (v8hi, v8hi) -v16qi __builtin_ia32_vpcomneb (v16qi, v16qi) -v4si __builtin_ia32_vpcomned (v4si, v4si) -v2di __builtin_ia32_vpcomneq (v2di, v2di) -v16qi __builtin_ia32_vpcomneub (v16qi, v16qi) -v4si __builtin_ia32_vpcomneud (v4si, v4si) -v2di __builtin_ia32_vpcomneuq (v2di, v2di) -v8hi __builtin_ia32_vpcomneuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomnew (v8hi, v8hi) -v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi) 
-v4si __builtin_ia32_vpcomtrued (v4si, v4si) -v2di __builtin_ia32_vpcomtrueq (v2di, v2di) -v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi) -v4si __builtin_ia32_vpcomtrueud (v4si, v4si) -v2di __builtin_ia32_vpcomtrueuq (v2di, v2di) -v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi) -v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi) -v4si __builtin_ia32_vphaddbd (v16qi) -v2di __builtin_ia32_vphaddbq (v16qi) -v8hi __builtin_ia32_vphaddbw (v16qi) -v2di __builtin_ia32_vphadddq (v4si) -v4si __builtin_ia32_vphaddubd (v16qi) -v2di __builtin_ia32_vphaddubq (v16qi) -v8hi __builtin_ia32_vphaddubw (v16qi) -v2di __builtin_ia32_vphaddudq (v4si) -v4si __builtin_ia32_vphadduwd (v8hi) -v2di __builtin_ia32_vphadduwq (v8hi) -v4si __builtin_ia32_vphaddwd (v8hi) -v2di __builtin_ia32_vphaddwq (v8hi) -v8hi __builtin_ia32_vphsubbw (v16qi) -v2di __builtin_ia32_vphsubdq (v4si) -v4si __builtin_ia32_vphsubwd (v8hi) -v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si) -v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di) -v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di) -v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si) -v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di) -v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di) -v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si) -v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi) -v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si) -v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi) -v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si) -v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si) -v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi) -v16qi __builtin_ia32_vprotb (v16qi, v16qi) -v4si __builtin_ia32_vprotd (v4si, v4si) -v2di __builtin_ia32_vprotq (v2di, v2di) -v8hi __builtin_ia32_vprotw (v8hi, v8hi) -v16qi __builtin_ia32_vpshab (v16qi, v16qi) -v4si __builtin_ia32_vpshad (v4si, v4si) -v2di __builtin_ia32_vpshaq (v2di, v2di) -v8hi __builtin_ia32_vpshaw (v8hi, v8hi) -v16qi __builtin_ia32_vpshlb (v16qi, v16qi) -v4si __builtin_ia32_vpshld (v4si, v4si) -v2di __builtin_ia32_vpshlq (v2di, 
v2di) -v8hi __builtin_ia32_vpshlw (v8hi, v8hi) -@end smallexample - -The following built-in functions are available when @option{-mfma4} is used. -All of them generate the machine instruction that is part of the name. - -@smallexample -v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf) -v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df) -v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf) -v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf) -v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df) -v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf) - -@end smallexample - -The following built-in functions are available when @option{-mlwp} is used. 
- -@smallexample -void __builtin_ia32_llwpcb16 (void *); -void __builtin_ia32_llwpcb32 (void *); -void __builtin_ia32_llwpcb64 (void *); -void * __builtin_ia32_llwpcb16 (void); -void * __builtin_ia32_llwpcb32 (void); -void * __builtin_ia32_llwpcb64 (void); -void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short) -void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int) -void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int) -unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short) -unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int) -unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int) -@end smallexample - -The following built-in functions are available when @option{-mbmi} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int); -unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long); -@end smallexample - -The following built-in functions are available when @option{-mbmi2} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -unsigned int _bzhi_u32 (unsigned int, unsigned int) -unsigned int _pdep_u32 (unsigned int, unsigned int) -unsigned int _pext_u32 (unsigned int, unsigned int) -unsigned long long _bzhi_u64 (unsigned long long, unsigned long long) -unsigned long long _pdep_u64 (unsigned long long, unsigned long long) -unsigned long long _pext_u64 (unsigned long long, unsigned long long) -@end smallexample - -The following built-in functions are available when @option{-mlzcnt} is used. -All of them generate the machine instruction that is part of the name. 
-@smallexample -unsigned short __builtin_ia32_lzcnt_16(unsigned short); -unsigned int __builtin_ia32_lzcnt_u32(unsigned int); -unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long); -@end smallexample - -The following built-in functions are available when @option{-mfxsr} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -void __builtin_ia32_fxsave (void *) -void __builtin_ia32_fxrstor (void *) -void __builtin_ia32_fxsave64 (void *) -void __builtin_ia32_fxrstor64 (void *) -@end smallexample - -The following built-in functions are available when @option{-mxsave} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -void __builtin_ia32_xsave (void *, long long) -void __builtin_ia32_xrstor (void *, long long) -void __builtin_ia32_xsave64 (void *, long long) -void __builtin_ia32_xrstor64 (void *, long long) -@end smallexample - -The following built-in functions are available when @option{-mxsaveopt} is used. -All of them generate the machine instruction that is part of the name. -@smallexample -void __builtin_ia32_xsaveopt (void *, long long) -void __builtin_ia32_xsaveopt64 (void *, long long) -@end smallexample - -The following built-in functions are available when @option{-mtbm} is used. -Both of them generate the immediate form of the bextr machine instruction. -@smallexample -unsigned int __builtin_ia32_bextri_u32 (unsigned int, const unsigned int); -unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, const unsigned long long); -@end smallexample - - -The following built-in functions are available when @option{-m3dnow} is used. -All of them generate the machine instruction that is part of the name. 
- -@smallexample -void __builtin_ia32_femms (void) -v8qi __builtin_ia32_pavgusb (v8qi, v8qi) -v2si __builtin_ia32_pf2id (v2sf) -v2sf __builtin_ia32_pfacc (v2sf, v2sf) -v2sf __builtin_ia32_pfadd (v2sf, v2sf) -v2si __builtin_ia32_pfcmpeq (v2sf, v2sf) -v2si __builtin_ia32_pfcmpge (v2sf, v2sf) -v2si __builtin_ia32_pfcmpgt (v2sf, v2sf) -v2sf __builtin_ia32_pfmax (v2sf, v2sf) -v2sf __builtin_ia32_pfmin (v2sf, v2sf) -v2sf __builtin_ia32_pfmul (v2sf, v2sf) -v2sf __builtin_ia32_pfrcp (v2sf) -v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf) -v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf) -v2sf __builtin_ia32_pfrsqrt (v2sf) -v2sf __builtin_ia32_pfsub (v2sf, v2sf) -v2sf __builtin_ia32_pfsubr (v2sf, v2sf) -v2sf __builtin_ia32_pi2fd (v2si) -v4hi __builtin_ia32_pmulhrw (v4hi, v4hi) -@end smallexample - -The following built-in functions are available when both @option{-m3dnow} -and @option{-march=athlon} are used. All of them generate the machine -instruction that is part of the name. - -@smallexample -v2si __builtin_ia32_pf2iw (v2sf) -v2sf __builtin_ia32_pfnacc (v2sf, v2sf) -v2sf __builtin_ia32_pfpnacc (v2sf, v2sf) -v2sf __builtin_ia32_pi2fw (v2si) -v2sf __builtin_ia32_pswapdsf (v2sf) -v2si __builtin_ia32_pswapdsi (v2si) -@end smallexample - -The following built-in functions are available when @option{-mrtm} is used -They are used for restricted transactional memory. These are the internal -low level functions. Normally the functions in -@ref{x86 transactional memory intrinsics} should be used instead. - -@smallexample -int __builtin_ia32_xbegin () -void __builtin_ia32_xend () -void __builtin_ia32_xabort (status) -int __builtin_ia32_xtest () -@end smallexample - -@node x86 transactional memory intrinsics -@subsection x86 transaction memory intrinsics - -Hardware transactional memory intrinsics for x86. These allow to use -memory transactions with RTM (Restricted Transactional Memory). 
-For using HLE (Hardware Lock Elision) see @ref{x86 specific memory model extensions for transactional memory} instead. -This support is enabled with the @option{-mrtm} option. - -A memory transaction commits all changes to memory in an atomic way, -as visible to other threads. If the transaction fails it is rolled back -and all side effects discarded. - -Generally there is no guarantee that a memory transaction ever succeeds -and suitable fallback code always needs to be supplied. - -@deftypefn {RTM Function} {unsigned} _xbegin () -Start a RTM (Restricted Transactional Memory) transaction. -Returns _XBEGIN_STARTED when the transaction -started successfully (note this is not 0, so the constant has to be -explicitely tested). When the transaction aborts all side effects -are undone and an abort code is returned. There is no guarantee -any transaction ever succeeds, so there always needs to be a valid -tested fallback path. -@end deftypefn - -@smallexample -#include <immintrin.h> - -if ((status = _xbegin ()) == _XBEGIN_STARTED) @{ - ... transaction code... - _xend (); -@} else @{ - ... non transactional fallback path... -@} -@end smallexample - -Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are: - -@table @code -@item _XABORT_EXPLICIT -Transaction explicitely aborted with @code{_xabort}. The parameter passed -to @code{_xabort} is available with @code{_XABORT_CODE(status)} -@item _XABORT_RETRY -Transaction retry is possible. -@item _XABORT_CONFLICT -Transaction abort due to a memory conflict with another thread -@item _XABORT_CAPACITY -Transaction abort due to the transaction using too much memory -@item _XABORT_DEBUG -Transaction abort due to a debug trap -@item _XABORT_NESTED -Transaction abort in a inner nested transaction -@end table - -@deftypefn {RTM Function} {void} _xend () -Commit the current transaction. When no transaction is active this will -fault.
All memory side effects of the transactions will become visible -to other threads in an atomic matter. -@end deftypefn - -@deftypefn {RTM Function} {int} _xtest () -Return a value not zero when a transaction is currently active, otherwise 0. -@end deftypefn - -@deftypefn {RTM Function} {void} _xabort (status) -Abort the current transaction. When no transaction is active this is a no-op. -status must be a 8bit constant, that is included in the status code returned -by @code{_xbegin} -@end deftypefn - -@node MIPS DSP Built-in Functions -@subsection MIPS DSP Built-in Functions - -The MIPS DSP Application-Specific Extension (ASE) includes new -instructions that are designed to improve the performance of DSP and -media applications. It provides instructions that operate on packed -8-bit/16-bit integer data, Q7, Q15 and Q31 fractional data. - -GCC supports MIPS DSP operations using both the generic -vector extensions (@pxref{Vector Extensions}) and a collection of -MIPS-specific built-in functions. Both kinds of support are -enabled by the @option{-mdsp} command-line option. - -Revision 2 of the ASE was introduced in the second half of 2006. -This revision adds extra instructions to the original ASE, but is -otherwise backwards-compatible with it. You can select revision 2 -using the command-line option @option{-mdspr2}; this option implies -@option{-mdsp}. - -The SCOUNT and POS bits of the DSP control register are global. The -WRDSP, EXTPDP, EXTPDPV and MTHLIP instructions modify the SCOUNT and -POS bits. During optimization, the compiler does not delete these -instructions and it does not delete calls to functions containing -these instructions. - -At present, GCC only provides support for operations on 32-bit -vectors. 
The vector type associated with 8-bit integer data is -usually called @code{v4i8}, the vector type associated with Q7 -is usually called @code{v4q7}, the vector type associated with 16-bit -integer data is usually called @code{v2i16}, and the vector type -associated with Q15 is usually called @code{v2q15}. They can be -defined in C as follows: - -@smallexample -typedef signed char v4i8 __attribute__ ((vector_size(4))); -typedef signed char v4q7 __attribute__ ((vector_size(4))); -typedef short v2i16 __attribute__ ((vector_size(4))); -typedef short v2q15 __attribute__ ((vector_size(4))); -@end smallexample - -@code{v4i8}, @code{v4q7}, @code{v2i16} and @code{v2q15} values are -initialized in the same way as aggregates. For example: - -@smallexample -v4i8 a = @{1, 2, 3, 4@}; -v4i8 b; -b = (v4i8) @{5, 6, 7, 8@}; - -v2q15 c = @{0x0fcb, 0x3a75@}; -v2q15 d; -d = (v2q15) @{0.1234 * 0x1.0p15, 0.4567 * 0x1.0p15@}; -@end smallexample - -@emph{Note:} The CPU's endianness determines the order in which values -are packed. On little-endian targets, the first value is the least -significant and the last value is the most significant. The opposite -order applies to big-endian targets. For example, the code above -sets the lowest byte of @code{a} to @code{1} on little-endian targets -and @code{4} on big-endian targets. - -@emph{Note:} Q7, Q15 and Q31 values must be initialized with their integer -representation. As shown in this example, the integer representation -of a Q7 value can be obtained by multiplying the fractional value by -@code{0x1.0p7}. The equivalent for Q15 values is to multiply by -@code{0x1.0p15}. The equivalent for Q31 values is to multiply by -@code{0x1.0p31}. - -The table below lists the @code{v4i8} and @code{v2q15} operations for which -hardware support exists. @code{a} and @code{b} are @code{v4i8} values, -and @code{c} and @code{d} are @code{v2q15} values. 
- -@multitable @columnfractions .50 .50 -@item C code @tab MIPS instruction -@item @code{a + b} @tab @code{addu.qb} -@item @code{c + d} @tab @code{addq.ph} -@item @code{a - b} @tab @code{subu.qb} -@item @code{c - d} @tab @code{subq.ph} -@end multitable - -The table below lists the @code{v2i16} operation for which -hardware support exists for the DSP ASE REV 2. @code{e} and @code{f} are -@code{v2i16} values. - -@multitable @columnfractions .50 .50 -@item C code @tab MIPS instruction -@item @code{e * f} @tab @code{mul.ph} -@end multitable - -It is easier to describe the DSP built-in functions if we first define -the following types: - -@smallexample -typedef int q31; -typedef int i32; -typedef unsigned int ui32; -typedef long long a64; +float __builtin_recipdivf (float, float); +float __builtin_rsqrtf (float); +double __builtin_recipdiv (double, double); +double __builtin_rsqrt (double); +uint64_t __builtin_ppc_get_timebase (); +unsigned long __builtin_ppc_mftb (); +double __builtin_unpack_longdouble (long double, int); +long double __builtin_pack_longdouble (double, double); @end smallexample -@code{q31} and @code{i32} are actually the same as @code{int}, but we -use @code{q31} to indicate a Q31 fractional value and @code{i32} to -indicate a 32-bit integer value. Similarly, @code{a64} is the same as -@code{long long}, but we use @code{a64} to indicate values that are -placed in one of the four DSP accumulators (@code{$ac0}, -@code{$ac1}, @code{$ac2} or @code{$ac3}). - -Also, some built-in functions prefer or require immediate numbers as -parameters, because the corresponding DSP instructions accept both immediate -numbers and register operands, or accept immediate numbers only. The -immediate parameters are listed as follows. +The @code{vec_rsqrt}, @code{__builtin_rsqrt}, and +@code{__builtin_rsqrtf} functions generate multiple instructions to +implement the reciprocal sqrt functionality using reciprocal sqrt +estimate instructions. -@smallexample -imm0_3: 0 to 3. 
-imm0_7: 0 to 7. -imm0_15: 0 to 15. -imm0_31: 0 to 31. -imm0_63: 0 to 63. -imm0_255: 0 to 255. -imm_n32_31: -32 to 31. -imm_n512_511: -512 to 511. -@end smallexample +The @code{__builtin_recipdiv}, and @code{__builtin_recipdivf} +functions generate multiple instructions to implement division using +the reciprocal estimate instructions. -The following built-in functions map directly to a particular MIPS DSP -instruction. Please refer to the architecture specification -for details on what each instruction does. +The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb} +functions generate instructions to read the Time Base Register. The +@code{__builtin_ppc_get_timebase} function may generate multiple +instructions and always returns the 64 bits of the Time Base Register. +The @code{__builtin_ppc_mftb} function always generates one instruction and +returns the Time Base Register value as an unsigned long, throwing away +the most significant word on 32-bit environments. +The following built-in functions are available for the PowerPC family +of processors, starting with ISA 2.06 or later (@option{-mcpu=power7} +or @option{-mpopcntd}): @smallexample -v2q15 __builtin_mips_addq_ph (v2q15, v2q15) -v2q15 __builtin_mips_addq_s_ph (v2q15, v2q15) -q31 __builtin_mips_addq_s_w (q31, q31) -v4i8 __builtin_mips_addu_qb (v4i8, v4i8) -v4i8 __builtin_mips_addu_s_qb (v4i8, v4i8) -v2q15 __builtin_mips_subq_ph (v2q15, v2q15) -v2q15 __builtin_mips_subq_s_ph (v2q15, v2q15) -q31 __builtin_mips_subq_s_w (q31, q31) -v4i8 __builtin_mips_subu_qb (v4i8, v4i8) -v4i8 __builtin_mips_subu_s_qb (v4i8, v4i8) -i32 __builtin_mips_addsc (i32, i32) -i32 __builtin_mips_addwc (i32, i32) -i32 __builtin_mips_modsub (i32, i32) -i32 __builtin_mips_raddu_w_qb (v4i8) -v2q15 __builtin_mips_absq_s_ph (v2q15) -q31 __builtin_mips_absq_s_w (q31) -v4i8 __builtin_mips_precrq_qb_ph (v2q15, v2q15) -v2q15 __builtin_mips_precrq_ph_w (q31, q31) -v2q15 __builtin_mips_precrq_rs_ph_w (q31, q31) -v4i8 
__builtin_mips_precrqu_s_qb_ph (v2q15, v2q15) -q31 __builtin_mips_preceq_w_phl (v2q15) -q31 __builtin_mips_preceq_w_phr (v2q15) -v2q15 __builtin_mips_precequ_ph_qbl (v4i8) -v2q15 __builtin_mips_precequ_ph_qbr (v4i8) -v2q15 __builtin_mips_precequ_ph_qbla (v4i8) -v2q15 __builtin_mips_precequ_ph_qbra (v4i8) -v2q15 __builtin_mips_preceu_ph_qbl (v4i8) -v2q15 __builtin_mips_preceu_ph_qbr (v4i8) -v2q15 __builtin_mips_preceu_ph_qbla (v4i8) -v2q15 __builtin_mips_preceu_ph_qbra (v4i8) -v4i8 __builtin_mips_shll_qb (v4i8, imm0_7) -v4i8 __builtin_mips_shll_qb (v4i8, i32) -v2q15 __builtin_mips_shll_ph (v2q15, imm0_15) -v2q15 __builtin_mips_shll_ph (v2q15, i32) -v2q15 __builtin_mips_shll_s_ph (v2q15, imm0_15) -v2q15 __builtin_mips_shll_s_ph (v2q15, i32) -q31 __builtin_mips_shll_s_w (q31, imm0_31) -q31 __builtin_mips_shll_s_w (q31, i32) -v4i8 __builtin_mips_shrl_qb (v4i8, imm0_7) -v4i8 __builtin_mips_shrl_qb (v4i8, i32) -v2q15 __builtin_mips_shra_ph (v2q15, imm0_15) -v2q15 __builtin_mips_shra_ph (v2q15, i32) -v2q15 __builtin_mips_shra_r_ph (v2q15, imm0_15) -v2q15 __builtin_mips_shra_r_ph (v2q15, i32) -q31 __builtin_mips_shra_r_w (q31, imm0_31) -q31 __builtin_mips_shra_r_w (q31, i32) -v2q15 __builtin_mips_muleu_s_ph_qbl (v4i8, v2q15) -v2q15 __builtin_mips_muleu_s_ph_qbr (v4i8, v2q15) -v2q15 __builtin_mips_mulq_rs_ph (v2q15, v2q15) -q31 __builtin_mips_muleq_s_w_phl (v2q15, v2q15) -q31 __builtin_mips_muleq_s_w_phr (v2q15, v2q15) -a64 __builtin_mips_dpau_h_qbl (a64, v4i8, v4i8) -a64 __builtin_mips_dpau_h_qbr (a64, v4i8, v4i8) -a64 __builtin_mips_dpsu_h_qbl (a64, v4i8, v4i8) -a64 __builtin_mips_dpsu_h_qbr (a64, v4i8, v4i8) -a64 __builtin_mips_dpaq_s_w_ph (a64, v2q15, v2q15) -a64 __builtin_mips_dpaq_sa_l_w (a64, q31, q31) -a64 __builtin_mips_dpsq_s_w_ph (a64, v2q15, v2q15) -a64 __builtin_mips_dpsq_sa_l_w (a64, q31, q31) -a64 __builtin_mips_mulsaq_s_w_ph (a64, v2q15, v2q15) -a64 __builtin_mips_maq_s_w_phl (a64, v2q15, v2q15) -a64 __builtin_mips_maq_s_w_phr (a64, v2q15, v2q15) -a64 
__builtin_mips_maq_sa_w_phl (a64, v2q15, v2q15) -a64 __builtin_mips_maq_sa_w_phr (a64, v2q15, v2q15) -i32 __builtin_mips_bitrev (i32) -i32 __builtin_mips_insv (i32, i32) -v4i8 __builtin_mips_repl_qb (imm0_255) -v4i8 __builtin_mips_repl_qb (i32) -v2q15 __builtin_mips_repl_ph (imm_n512_511) -v2q15 __builtin_mips_repl_ph (i32) -void __builtin_mips_cmpu_eq_qb (v4i8, v4i8) -void __builtin_mips_cmpu_lt_qb (v4i8, v4i8) -void __builtin_mips_cmpu_le_qb (v4i8, v4i8) -i32 __builtin_mips_cmpgu_eq_qb (v4i8, v4i8) -i32 __builtin_mips_cmpgu_lt_qb (v4i8, v4i8) -i32 __builtin_mips_cmpgu_le_qb (v4i8, v4i8) -void __builtin_mips_cmp_eq_ph (v2q15, v2q15) -void __builtin_mips_cmp_lt_ph (v2q15, v2q15) -void __builtin_mips_cmp_le_ph (v2q15, v2q15) -v4i8 __builtin_mips_pick_qb (v4i8, v4i8) -v2q15 __builtin_mips_pick_ph (v2q15, v2q15) -v2q15 __builtin_mips_packrl_ph (v2q15, v2q15) -i32 __builtin_mips_extr_w (a64, imm0_31) -i32 __builtin_mips_extr_w (a64, i32) -i32 __builtin_mips_extr_r_w (a64, imm0_31) -i32 __builtin_mips_extr_s_h (a64, i32) -i32 __builtin_mips_extr_rs_w (a64, imm0_31) -i32 __builtin_mips_extr_rs_w (a64, i32) -i32 __builtin_mips_extr_s_h (a64, imm0_31) -i32 __builtin_mips_extr_r_w (a64, i32) -i32 __builtin_mips_extp (a64, imm0_31) -i32 __builtin_mips_extp (a64, i32) -i32 __builtin_mips_extpdp (a64, imm0_31) -i32 __builtin_mips_extpdp (a64, i32) -a64 __builtin_mips_shilo (a64, imm_n32_31) -a64 __builtin_mips_shilo (a64, i32) -a64 __builtin_mips_mthlip (a64, i32) -void __builtin_mips_wrdsp (i32, imm0_63) -i32 __builtin_mips_rddsp (imm0_63) -i32 __builtin_mips_lbux (void *, i32) -i32 __builtin_mips_lhx (void *, i32) -i32 __builtin_mips_lwx (void *, i32) -a64 __builtin_mips_ldx (void *, i32) [MIPS64 only] -i32 __builtin_mips_bposge32 (void) -a64 __builtin_mips_madd (a64, i32, i32); -a64 __builtin_mips_maddu (a64, ui32, ui32); -a64 __builtin_mips_msub (a64, i32, i32); -a64 __builtin_mips_msubu (a64, ui32, ui32); -a64 __builtin_mips_mult (i32, i32); -a64 __builtin_mips_multu 
(ui32, ui32); +long __builtin_bpermd (long, long); +int __builtin_divwe (int, int); +int __builtin_divweo (int, int); +unsigned int __builtin_divweu (unsigned int, unsigned int); +unsigned int __builtin_divweuo (unsigned int, unsigned int); +long __builtin_divde (long, long); +long __builtin_divdeo (long, long); +unsigned long __builtin_divdeu (unsigned long, unsigned long); +unsigned long __builtin_divdeuo (unsigned long, unsigned long); +unsigned int cdtbcd (unsigned int); +unsigned int cbcdtd (unsigned int); +unsigned int addg6s (unsigned int, unsigned int); @end smallexample -The following built-in functions map directly to a particular MIPS DSP REV 2 -instruction. Please refer to the architecture specification -for details on what each instruction does. +The @code{__builtin_divde}, @code{__builtin_divdeo}, +@code{__builitin_divdeu}, @code{__builtin_divdeou} functions require a +64-bit environment support ISA 2.06 or later. +The following built-in functions are available for the PowerPC family +of processors when hardware decimal floating point +(@option{-mhard-dfp}) is available: @smallexample -v4q7 __builtin_mips_absq_s_qb (v4q7); -v2i16 __builtin_mips_addu_ph (v2i16, v2i16); -v2i16 __builtin_mips_addu_s_ph (v2i16, v2i16); -v4i8 __builtin_mips_adduh_qb (v4i8, v4i8); -v4i8 __builtin_mips_adduh_r_qb (v4i8, v4i8); -i32 __builtin_mips_append (i32, i32, imm0_31); -i32 __builtin_mips_balign (i32, i32, imm0_3); -i32 __builtin_mips_cmpgdu_eq_qb (v4i8, v4i8); -i32 __builtin_mips_cmpgdu_lt_qb (v4i8, v4i8); -i32 __builtin_mips_cmpgdu_le_qb (v4i8, v4i8); -a64 __builtin_mips_dpa_w_ph (a64, v2i16, v2i16); -a64 __builtin_mips_dps_w_ph (a64, v2i16, v2i16); -v2i16 __builtin_mips_mul_ph (v2i16, v2i16); -v2i16 __builtin_mips_mul_s_ph (v2i16, v2i16); -q31 __builtin_mips_mulq_rs_w (q31, q31); -v2q15 __builtin_mips_mulq_s_ph (v2q15, v2q15); -q31 __builtin_mips_mulq_s_w (q31, q31); -a64 __builtin_mips_mulsa_w_ph (a64, v2i16, v2i16); -v4i8 __builtin_mips_precr_qb_ph (v2i16, v2i16); 
-v2i16 __builtin_mips_precr_sra_ph_w (i32, i32, imm0_31); -v2i16 __builtin_mips_precr_sra_r_ph_w (i32, i32, imm0_31); -i32 __builtin_mips_prepend (i32, i32, imm0_31); -v4i8 __builtin_mips_shra_qb (v4i8, imm0_7); -v4i8 __builtin_mips_shra_r_qb (v4i8, imm0_7); -v4i8 __builtin_mips_shra_qb (v4i8, i32); -v4i8 __builtin_mips_shra_r_qb (v4i8, i32); -v2i16 __builtin_mips_shrl_ph (v2i16, imm0_15); -v2i16 __builtin_mips_shrl_ph (v2i16, i32); -v2i16 __builtin_mips_subu_ph (v2i16, v2i16); -v2i16 __builtin_mips_subu_s_ph (v2i16, v2i16); -v4i8 __builtin_mips_subuh_qb (v4i8, v4i8); -v4i8 __builtin_mips_subuh_r_qb (v4i8, v4i8); -v2q15 __builtin_mips_addqh_ph (v2q15, v2q15); -v2q15 __builtin_mips_addqh_r_ph (v2q15, v2q15); -q31 __builtin_mips_addqh_w (q31, q31); -q31 __builtin_mips_addqh_r_w (q31, q31); -v2q15 __builtin_mips_subqh_ph (v2q15, v2q15); -v2q15 __builtin_mips_subqh_r_ph (v2q15, v2q15); -q31 __builtin_mips_subqh_w (q31, q31); -q31 __builtin_mips_subqh_r_w (q31, q31); -a64 __builtin_mips_dpax_w_ph (a64, v2i16, v2i16); -a64 __builtin_mips_dpsx_w_ph (a64, v2i16, v2i16); -a64 __builtin_mips_dpaqx_s_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_dpaqx_sa_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_dpsqx_s_w_ph (a64, v2q15, v2q15); -a64 __builtin_mips_dpsqx_sa_w_ph (a64, v2q15, v2q15); +_Decimal64 __builtin_dxex (_Decimal64); +_Decimal128 __builtin_dxexq (_Decimal128); +_Decimal64 __builtin_ddedpd (int, _Decimal64); +_Decimal128 __builtin_ddedpdq (int, _Decimal128); +_Decimal64 __builtin_denbcd (int, _Decimal64); +_Decimal128 __builtin_denbcdq (int, _Decimal128); +_Decimal64 __builtin_diex (_Decimal64, _Decimal64); +_Decimal128 _builtin_diexq (_Decimal128, _Decimal128); +_Decimal64 __builtin_dscli (_Decimal64, int); +_Decimal128 __builitn_dscliq (_Decimal128, int); +_Decimal64 __builtin_dscri (_Decimal64, int); +_Decimal128 __builitn_dscriq (_Decimal128, int); +unsigned long long __builtin_unpack_dec128 (_Decimal128, int); +_Decimal128 __builtin_pack_dec128 (unsigned long 
long, unsigned long long); @end smallexample +The following built-in functions are available for the PowerPC family +of processors when the Vector Scalar (vsx) instruction set is +available: +@smallexample +unsigned long long __builtin_unpack_vector_int128 (vector __int128_t, int); +vector __int128_t __builtin_pack_vector_int128 (unsigned long long, + unsigned long long); +@end smallexample -@node MIPS Paired-Single Support -@subsection MIPS Paired-Single Support +@node PowerPC AltiVec/VSX Built-in Functions +@subsection PowerPC AltiVec Built-in Functions -The MIPS64 architecture includes a number of instructions that -operate on pairs of single-precision floating-point values. -Each pair is packed into a 64-bit floating-point register, -with one element being designated the ``upper half'' and -the other being designated the ``lower half''. +GCC provides an interface for the PowerPC family of processors to access +the AltiVec operations described in Motorola's AltiVec Programming +Interface Manual. The interface is made available by including +@code{<altivec.h>} and using @option{-maltivec} and +@option{-mabi=altivec}. The interface supports the following vector +types. -GCC supports paired-single operations using both the generic -vector extensions (@pxref{Vector Extensions}) and a collection of -MIPS-specific built-in functions. Both kinds of support are -enabled by the @option{-mpaired-single} command-line option. +@smallexample +vector unsigned char +vector signed char +vector bool char -The vector type associated with paired-single values is usually -called @code{v2sf}. It can be defined in C as follows: +vector unsigned short +vector signed short +vector bool short +vector pixel -@smallexample -typedef float v2sf __attribute__ ((vector_size (8))); +vector unsigned int +vector signed int +vector bool int +vector float @end smallexample -@code{v2sf} values are initialized in the same way as aggregates.
-For example: +If @option{-mvsx} is used the following additional vector types are +implemented. @smallexample -v2sf a = @{1.5, 9.1@}; -v2sf b; -float e, f; -b = (v2sf) @{e, f@}; +vector unsigned long +vector signed long +vector double @end smallexample -@emph{Note:} The CPU's endianness determines which value is stored in -the upper half of a register and which value is stored in the lower half. -On little-endian targets, the first value is the lower one and the second -value is the upper one. The opposite order applies to big-endian targets. -For example, the code above sets the lower half of @code{a} to -@code{1.5} on little-endian targets and @code{9.1} on big-endian targets. +The long types are only implemented for 64-bit code generation, and +the long type is only used in the floating point/integer conversion +instructions. -@node MIPS Loongson Built-in Functions -@subsection MIPS Loongson Built-in Functions +GCC's implementation of the high-level language interface available from +C and C++ code differs from Motorola's documentation in several ways. -GCC provides intrinsics to access the SIMD instructions provided by the -ST Microelectronics Loongson-2E and -2F processors. These intrinsics, -available after inclusion of the @code{loongson.h} header file, -operate on the following 64-bit vector types: +@itemize @bullet -@itemize -@item @code{uint8x8_t}, a vector of eight unsigned 8-bit integers; -@item @code{uint16x4_t}, a vector of four unsigned 16-bit integers; -@item @code{uint32x2_t}, a vector of two unsigned 32-bit integers; -@item @code{int8x8_t}, a vector of eight signed 8-bit integers; -@item @code{int16x4_t}, a vector of four signed 16-bit integers; -@item @code{int32x2_t}, a vector of two signed 32-bit integers. -@end itemize +@item +A vector constant is a list of constant expressions within curly braces. 
-The intrinsics provided are listed below; each is named after the -machine instruction to which it corresponds, with suffixes added as -appropriate to distinguish intrinsics that expand to the same machine -instruction yet have different argument types. Refer to the architecture -documentation for a description of the functionality of each -instruction. +@item +A vector initializer requires no cast if the vector constant is of the +same type as the variable it is initializing. + +@item +If @code{signed} or @code{unsigned} is omitted, the signedness of the +vector type is the default signedness of the base type. The default +varies depending on the operating system, so a portable program should +always specify the signedness. + +@item +Compiling with @option{-maltivec} adds keywords @code{__vector}, +@code{vector}, @code{__pixel}, @code{pixel}, @code{__bool} and +@code{bool}. When compiling ISO C, the context-sensitive substitution +of the keywords @code{vector}, @code{pixel} and @code{bool} is +disabled. To use them, you must include @code{<altivec.h>} instead. + +@item +GCC allows using a @code{typedef} name as the type specifier for a +vector type.
+ +@item +For C, overloaded functions are implemented with macros so the following +does not work: @smallexample -int16x4_t packsswh (int32x2_t s, int32x2_t t); -int8x8_t packsshb (int16x4_t s, int16x4_t t); -uint8x8_t packushb (uint16x4_t s, uint16x4_t t); -uint32x2_t paddw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t paddh_u (uint16x4_t s, uint16x4_t t); -uint8x8_t paddb_u (uint8x8_t s, uint8x8_t t); -int32x2_t paddw_s (int32x2_t s, int32x2_t t); -int16x4_t paddh_s (int16x4_t s, int16x4_t t); -int8x8_t paddb_s (int8x8_t s, int8x8_t t); -uint64_t paddd_u (uint64_t s, uint64_t t); -int64_t paddd_s (int64_t s, int64_t t); -int16x4_t paddsh (int16x4_t s, int16x4_t t); -int8x8_t paddsb (int8x8_t s, int8x8_t t); -uint16x4_t paddush (uint16x4_t s, uint16x4_t t); -uint8x8_t paddusb (uint8x8_t s, uint8x8_t t); -uint64_t pandn_ud (uint64_t s, uint64_t t); -uint32x2_t pandn_uw (uint32x2_t s, uint32x2_t t); -uint16x4_t pandn_uh (uint16x4_t s, uint16x4_t t); -uint8x8_t pandn_ub (uint8x8_t s, uint8x8_t t); -int64_t pandn_sd (int64_t s, int64_t t); -int32x2_t pandn_sw (int32x2_t s, int32x2_t t); -int16x4_t pandn_sh (int16x4_t s, int16x4_t t); -int8x8_t pandn_sb (int8x8_t s, int8x8_t t); -uint16x4_t pavgh (uint16x4_t s, uint16x4_t t); -uint8x8_t pavgb (uint8x8_t s, uint8x8_t t); -uint32x2_t pcmpeqw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t pcmpeqh_u (uint16x4_t s, uint16x4_t t); -uint8x8_t pcmpeqb_u (uint8x8_t s, uint8x8_t t); -int32x2_t pcmpeqw_s (int32x2_t s, int32x2_t t); -int16x4_t pcmpeqh_s (int16x4_t s, int16x4_t t); -int8x8_t pcmpeqb_s (int8x8_t s, int8x8_t t); -uint32x2_t pcmpgtw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t pcmpgth_u (uint16x4_t s, uint16x4_t t); -uint8x8_t pcmpgtb_u (uint8x8_t s, uint8x8_t t); -int32x2_t pcmpgtw_s (int32x2_t s, int32x2_t t); -int16x4_t pcmpgth_s (int16x4_t s, int16x4_t t); -int8x8_t pcmpgtb_s (int8x8_t s, int8x8_t t); -uint16x4_t pextrh_u (uint16x4_t s, int field); -int16x4_t pextrh_s (int16x4_t s, int field); -uint16x4_t pinsrh_0_u 
(uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_1_u (uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_2_u (uint16x4_t s, uint16x4_t t); -uint16x4_t pinsrh_3_u (uint16x4_t s, uint16x4_t t); -int16x4_t pinsrh_0_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_1_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_2_s (int16x4_t s, int16x4_t t); -int16x4_t pinsrh_3_s (int16x4_t s, int16x4_t t); -int32x2_t pmaddhw (int16x4_t s, int16x4_t t); -int16x4_t pmaxsh (int16x4_t s, int16x4_t t); -uint8x8_t pmaxub (uint8x8_t s, uint8x8_t t); -int16x4_t pminsh (int16x4_t s, int16x4_t t); -uint8x8_t pminub (uint8x8_t s, uint8x8_t t); -uint8x8_t pmovmskb_u (uint8x8_t s); -int8x8_t pmovmskb_s (int8x8_t s); -uint16x4_t pmulhuh (uint16x4_t s, uint16x4_t t); -int16x4_t pmulhh (int16x4_t s, int16x4_t t); -int16x4_t pmullh (int16x4_t s, int16x4_t t); -int64_t pmuluw (uint32x2_t s, uint32x2_t t); -uint8x8_t pasubub (uint8x8_t s, uint8x8_t t); -uint16x4_t biadd (uint8x8_t s); -uint16x4_t psadbh (uint8x8_t s, uint8x8_t t); -uint16x4_t pshufh_u (uint16x4_t dest, uint16x4_t s, uint8_t order); -int16x4_t pshufh_s (int16x4_t dest, int16x4_t s, uint8_t order); -uint16x4_t psllh_u (uint16x4_t s, uint8_t amount); -int16x4_t psllh_s (int16x4_t s, uint8_t amount); -uint32x2_t psllw_u (uint32x2_t s, uint8_t amount); -int32x2_t psllw_s (int32x2_t s, uint8_t amount); -uint16x4_t psrlh_u (uint16x4_t s, uint8_t amount); -int16x4_t psrlh_s (int16x4_t s, uint8_t amount); -uint32x2_t psrlw_u (uint32x2_t s, uint8_t amount); -int32x2_t psrlw_s (int32x2_t s, uint8_t amount); -uint16x4_t psrah_u (uint16x4_t s, uint8_t amount); -int16x4_t psrah_s (int16x4_t s, uint8_t amount); -uint32x2_t psraw_u (uint32x2_t s, uint8_t amount); -int32x2_t psraw_s (int32x2_t s, uint8_t amount); -uint32x2_t psubw_u (uint32x2_t s, uint32x2_t t); -uint16x4_t psubh_u (uint16x4_t s, uint16x4_t t); -uint8x8_t psubb_u (uint8x8_t s, uint8x8_t t); -int32x2_t psubw_s (int32x2_t s, int32x2_t t); -int16x4_t psubh_s (int16x4_t s, int16x4_t t); 
-int8x8_t psubb_s (int8x8_t s, int8x8_t t); -uint64_t psubd_u (uint64_t s, uint64_t t); -int64_t psubd_s (int64_t s, int64_t t); -int16x4_t psubsh (int16x4_t s, int16x4_t t); -int8x8_t psubsb (int8x8_t s, int8x8_t t); -uint16x4_t psubush (uint16x4_t s, uint16x4_t t); -uint8x8_t psubusb (uint8x8_t s, uint8x8_t t); -uint32x2_t punpckhwd_u (uint32x2_t s, uint32x2_t t); -uint16x4_t punpckhhw_u (uint16x4_t s, uint16x4_t t); -uint8x8_t punpckhbh_u (uint8x8_t s, uint8x8_t t); -int32x2_t punpckhwd_s (int32x2_t s, int32x2_t t); -int16x4_t punpckhhw_s (int16x4_t s, int16x4_t t); -int8x8_t punpckhbh_s (int8x8_t s, int8x8_t t); -uint32x2_t punpcklwd_u (uint32x2_t s, uint32x2_t t); -uint16x4_t punpcklhw_u (uint16x4_t s, uint16x4_t t); -uint8x8_t punpcklbh_u (uint8x8_t s, uint8x8_t t); -int32x2_t punpcklwd_s (int32x2_t s, int32x2_t t); -int16x4_t punpcklhw_s (int16x4_t s, int16x4_t t); -int8x8_t punpcklbh_s (int8x8_t s, int8x8_t t); + vec_add ((vector signed int)@{1, 2, 3, 4@}, foo); @end smallexample -@menu -* Paired-Single Arithmetic:: -* Paired-Single Built-in Functions:: -* MIPS-3D Built-in Functions:: -@end menu +@noindent +Since @code{vec_add} is a macro, the vector constant in the example +is treated as four separate arguments. Wrap the entire argument in +parentheses for this to work. +@end itemize + +@emph{Note:} Only the @code{<altivec.h>} interface is supported. +Internally, GCC uses built-in functions to achieve the functionality in +the aforementioned header file, but they are not supported and are +subject to change without notice. + +The following interfaces are supported for the generic and specific +AltiVec operations and the AltiVec predicates. In cases where there +is a direct mapping between generic and specific operations, only the +generic names are shown here, although the specific operations can also +be used. + +Arguments that are documented as @code{const int} require literal +integral values within the range required for that operation.
+ +@smallexample +vector signed char vec_abs (vector signed char); +vector signed short vec_abs (vector signed short); +vector signed int vec_abs (vector signed int); +vector float vec_abs (vector float); + +vector signed char vec_abss (vector signed char); +vector signed short vec_abss (vector signed short); +vector signed int vec_abss (vector signed int); + +vector signed char vec_add (vector bool char, vector signed char); +vector signed char vec_add (vector signed char, vector bool char); +vector signed char vec_add (vector signed char, vector signed char); +vector unsigned char vec_add (vector bool char, vector unsigned char); +vector unsigned char vec_add (vector unsigned char, vector bool char); +vector unsigned char vec_add (vector unsigned char, + vector unsigned char); +vector signed short vec_add (vector bool short, vector signed short); +vector signed short vec_add (vector signed short, vector bool short); +vector signed short vec_add (vector signed short, vector signed short); +vector unsigned short vec_add (vector bool short, + vector unsigned short); +vector unsigned short vec_add (vector unsigned short, + vector bool short); +vector unsigned short vec_add (vector unsigned short, + vector unsigned short); +vector signed int vec_add (vector bool int, vector signed int); +vector signed int vec_add (vector signed int, vector bool int); +vector signed int vec_add (vector signed int, vector signed int); +vector unsigned int vec_add (vector bool int, vector unsigned int); +vector unsigned int vec_add (vector unsigned int, vector bool int); +vector unsigned int vec_add (vector unsigned int, vector unsigned int); +vector float vec_add (vector float, vector float); + +vector float vec_vaddfp (vector float, vector float); -@node Paired-Single Arithmetic -@subsubsection Paired-Single Arithmetic +vector signed int vec_vadduwm (vector bool int, vector signed int); +vector signed int vec_vadduwm (vector signed int, vector bool int); +vector signed int vec_vadduwm 
(vector signed int, vector signed int); +vector unsigned int vec_vadduwm (vector bool int, vector unsigned int); +vector unsigned int vec_vadduwm (vector unsigned int, vector bool int); +vector unsigned int vec_vadduwm (vector unsigned int, + vector unsigned int); -The table below lists the @code{v2sf} operations for which hardware -support exists. @code{a}, @code{b} and @code{c} are @code{v2sf} -values and @code{x} is an integral value. +vector signed short vec_vadduhm (vector bool short, + vector signed short); +vector signed short vec_vadduhm (vector signed short, + vector bool short); +vector signed short vec_vadduhm (vector signed short, + vector signed short); +vector unsigned short vec_vadduhm (vector bool short, + vector unsigned short); +vector unsigned short vec_vadduhm (vector unsigned short, + vector bool short); +vector unsigned short vec_vadduhm (vector unsigned short, + vector unsigned short); -@multitable @columnfractions .50 .50 -@item C code @tab MIPS instruction -@item @code{a + b} @tab @code{add.ps} -@item @code{a - b} @tab @code{sub.ps} -@item @code{-a} @tab @code{neg.ps} -@item @code{a * b} @tab @code{mul.ps} -@item @code{a * b + c} @tab @code{madd.ps} -@item @code{a * b - c} @tab @code{msub.ps} -@item @code{-(a * b + c)} @tab @code{nmadd.ps} -@item @code{-(a * b - c)} @tab @code{nmsub.ps} -@item @code{x ? a : b} @tab @code{movn.ps}/@code{movz.ps} -@end multitable +vector signed char vec_vaddubm (vector bool char, vector signed char); +vector signed char vec_vaddubm (vector signed char, vector bool char); +vector signed char vec_vaddubm (vector signed char, vector signed char); +vector unsigned char vec_vaddubm (vector bool char, + vector unsigned char); +vector unsigned char vec_vaddubm (vector unsigned char, + vector bool char); +vector unsigned char vec_vaddubm (vector unsigned char, + vector unsigned char); -Note that the multiply-accumulate instructions can be disabled -using the command-line option @code{-mno-fused-madd}. 
+vector unsigned int vec_addc (vector unsigned int, vector unsigned int); -@node Paired-Single Built-in Functions -@subsubsection Paired-Single Built-in Functions +vector unsigned char vec_adds (vector bool char, vector unsigned char); +vector unsigned char vec_adds (vector unsigned char, vector bool char); +vector unsigned char vec_adds (vector unsigned char, + vector unsigned char); +vector signed char vec_adds (vector bool char, vector signed char); +vector signed char vec_adds (vector signed char, vector bool char); +vector signed char vec_adds (vector signed char, vector signed char); +vector unsigned short vec_adds (vector bool short, + vector unsigned short); +vector unsigned short vec_adds (vector unsigned short, + vector bool short); +vector unsigned short vec_adds (vector unsigned short, + vector unsigned short); +vector signed short vec_adds (vector bool short, vector signed short); +vector signed short vec_adds (vector signed short, vector bool short); +vector signed short vec_adds (vector signed short, vector signed short); +vector unsigned int vec_adds (vector bool int, vector unsigned int); +vector unsigned int vec_adds (vector unsigned int, vector bool int); +vector unsigned int vec_adds (vector unsigned int, vector unsigned int); +vector signed int vec_adds (vector bool int, vector signed int); +vector signed int vec_adds (vector signed int, vector bool int); +vector signed int vec_adds (vector signed int, vector signed int); -The following paired-single functions map directly to a particular -MIPS instruction. Please refer to the architecture specification -for details on what each instruction does. +vector signed int vec_vaddsws (vector bool int, vector signed int); +vector signed int vec_vaddsws (vector signed int, vector bool int); +vector signed int vec_vaddsws (vector signed int, vector signed int); -@table @code -@item v2sf __builtin_mips_pll_ps (v2sf, v2sf) -Pair lower lower (@code{pll.ps}). 
+vector unsigned int vec_vadduws (vector bool int, vector unsigned int); +vector unsigned int vec_vadduws (vector unsigned int, vector bool int); +vector unsigned int vec_vadduws (vector unsigned int, + vector unsigned int); -@item v2sf __builtin_mips_pul_ps (v2sf, v2sf) -Pair upper lower (@code{pul.ps}). +vector signed short vec_vaddshs (vector bool short, + vector signed short); +vector signed short vec_vaddshs (vector signed short, + vector bool short); +vector signed short vec_vaddshs (vector signed short, + vector signed short); -@item v2sf __builtin_mips_plu_ps (v2sf, v2sf) -Pair lower upper (@code{plu.ps}). +vector unsigned short vec_vadduhs (vector bool short, + vector unsigned short); +vector unsigned short vec_vadduhs (vector unsigned short, + vector bool short); +vector unsigned short vec_vadduhs (vector unsigned short, + vector unsigned short); -@item v2sf __builtin_mips_puu_ps (v2sf, v2sf) -Pair upper upper (@code{puu.ps}). +vector signed char vec_vaddsbs (vector bool char, vector signed char); +vector signed char vec_vaddsbs (vector signed char, vector bool char); +vector signed char vec_vaddsbs (vector signed char, vector signed char); -@item v2sf __builtin_mips_cvt_ps_s (float, float) -Convert pair to paired single (@code{cvt.ps.s}). +vector unsigned char vec_vaddubs (vector bool char, + vector unsigned char); +vector unsigned char vec_vaddubs (vector unsigned char, + vector bool char); +vector unsigned char vec_vaddubs (vector unsigned char, + vector unsigned char); -@item float __builtin_mips_cvt_s_pl (v2sf) -Convert pair lower to single (@code{cvt.s.pl}). 
+vector float vec_and (vector float, vector float); +vector float vec_and (vector float, vector bool int); +vector float vec_and (vector bool int, vector float); +vector bool int vec_and (vector bool int, vector bool int); +vector signed int vec_and (vector bool int, vector signed int); +vector signed int vec_and (vector signed int, vector bool int); +vector signed int vec_and (vector signed int, vector signed int); +vector unsigned int vec_and (vector bool int, vector unsigned int); +vector unsigned int vec_and (vector unsigned int, vector bool int); +vector unsigned int vec_and (vector unsigned int, vector unsigned int); +vector bool short vec_and (vector bool short, vector bool short); +vector signed short vec_and (vector bool short, vector signed short); +vector signed short vec_and (vector signed short, vector bool short); +vector signed short vec_and (vector signed short, vector signed short); +vector unsigned short vec_and (vector bool short, + vector unsigned short); +vector unsigned short vec_and (vector unsigned short, + vector bool short); +vector unsigned short vec_and (vector unsigned short, + vector unsigned short); +vector signed char vec_and (vector bool char, vector signed char); +vector bool char vec_and (vector bool char, vector bool char); +vector signed char vec_and (vector signed char, vector bool char); +vector signed char vec_and (vector signed char, vector signed char); +vector unsigned char vec_and (vector bool char, vector unsigned char); +vector unsigned char vec_and (vector unsigned char, vector bool char); +vector unsigned char vec_and (vector unsigned char, + vector unsigned char); -@item float __builtin_mips_cvt_s_pu (v2sf) -Convert pair upper to single (@code{cvt.s.pu}). 
+vector float vec_andc (vector float, vector float); +vector float vec_andc (vector float, vector bool int); +vector float vec_andc (vector bool int, vector float); +vector bool int vec_andc (vector bool int, vector bool int); +vector signed int vec_andc (vector bool int, vector signed int); +vector signed int vec_andc (vector signed int, vector bool int); +vector signed int vec_andc (vector signed int, vector signed int); +vector unsigned int vec_andc (vector bool int, vector unsigned int); +vector unsigned int vec_andc (vector unsigned int, vector bool int); +vector unsigned int vec_andc (vector unsigned int, vector unsigned int); +vector bool short vec_andc (vector bool short, vector bool short); +vector signed short vec_andc (vector bool short, vector signed short); +vector signed short vec_andc (vector signed short, vector bool short); +vector signed short vec_andc (vector signed short, vector signed short); +vector unsigned short vec_andc (vector bool short, + vector unsigned short); +vector unsigned short vec_andc (vector unsigned short, + vector bool short); +vector unsigned short vec_andc (vector unsigned short, + vector unsigned short); +vector signed char vec_andc (vector bool char, vector signed char); +vector bool char vec_andc (vector bool char, vector bool char); +vector signed char vec_andc (vector signed char, vector bool char); +vector signed char vec_andc (vector signed char, vector signed char); +vector unsigned char vec_andc (vector bool char, vector unsigned char); +vector unsigned char vec_andc (vector unsigned char, vector bool char); +vector unsigned char vec_andc (vector unsigned char, + vector unsigned char); -@item v2sf __builtin_mips_abs_ps (v2sf) -Absolute value (@code{abs.ps}). 
+vector unsigned char vec_avg (vector unsigned char, + vector unsigned char); +vector signed char vec_avg (vector signed char, vector signed char); +vector unsigned short vec_avg (vector unsigned short, + vector unsigned short); +vector signed short vec_avg (vector signed short, vector signed short); +vector unsigned int vec_avg (vector unsigned int, vector unsigned int); +vector signed int vec_avg (vector signed int, vector signed int); -@item v2sf __builtin_mips_alnv_ps (v2sf, v2sf, int) -Align variable (@code{alnv.ps}). +vector signed int vec_vavgsw (vector signed int, vector signed int); -@emph{Note:} The value of the third parameter must be 0 or 4 -modulo 8, otherwise the result is unpredictable. Please read the -instruction description for details. -@end table +vector unsigned int vec_vavguw (vector unsigned int, + vector unsigned int); -The following multi-instruction functions are also available. -In each case, @var{cond} can be any of the 16 floating-point conditions: -@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, -@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, @code{ngl}, -@code{lt}, @code{nge}, @code{le} or @code{ngt}. +vector signed short vec_vavgsh (vector signed short, + vector signed short); -@table @code -@item v2sf __builtin_mips_movt_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) -@itemx v2sf __builtin_mips_movf_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) -Conditional move based on floating-point comparison (@code{c.@var{cond}.ps}, -@code{movt.ps}/@code{movf.ps}). 
+vector unsigned short vec_vavguh (vector unsigned short, + vector unsigned short); -The @code{movt} functions return the value @var{x} computed by: +vector signed char vec_vavgsb (vector signed char, vector signed char); -@smallexample -c.@var{cond}.ps @var{cc},@var{a},@var{b} -mov.ps @var{x},@var{c} -movt.ps @var{x},@var{d},@var{cc} -@end smallexample +vector unsigned char vec_vavgub (vector unsigned char, + vector unsigned char); + +vector float vec_copysign (vector float); + +vector float vec_ceil (vector float); + +vector signed int vec_cmpb (vector float, vector float); + +vector bool char vec_cmpeq (vector signed char, vector signed char); +vector bool char vec_cmpeq (vector unsigned char, vector unsigned char); +vector bool short vec_cmpeq (vector signed short, vector signed short); +vector bool short vec_cmpeq (vector unsigned short, + vector unsigned short); +vector bool int vec_cmpeq (vector signed int, vector signed int); +vector bool int vec_cmpeq (vector unsigned int, vector unsigned int); +vector bool int vec_cmpeq (vector float, vector float); + +vector bool int vec_vcmpeqfp (vector float, vector float); -The @code{movf} functions are similar but use @code{movf.ps} instead -of @code{movt.ps}. +vector bool int vec_vcmpequw (vector signed int, vector signed int); +vector bool int vec_vcmpequw (vector unsigned int, vector unsigned int); -@item int __builtin_mips_upper_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) -@itemx int __builtin_mips_lower_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) -Comparison of two paired-single values (@code{c.@var{cond}.ps}, -@code{bc1t}/@code{bc1f}). +vector bool short vec_vcmpequh (vector signed short, + vector signed short); +vector bool short vec_vcmpequh (vector unsigned short, + vector unsigned short); -These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps} -and return either the upper or lower half of the result. 
For example: +vector bool char vec_vcmpequb (vector signed char, vector signed char); +vector bool char vec_vcmpequb (vector unsigned char, + vector unsigned char); -@smallexample -v2sf a, b; -if (__builtin_mips_upper_c_eq_ps (a, b)) - upper_halves_are_equal (); -else - upper_halves_are_unequal (); +vector bool int vec_cmpge (vector float, vector float); -if (__builtin_mips_lower_c_eq_ps (a, b)) - lower_halves_are_equal (); -else - lower_halves_are_unequal (); -@end smallexample -@end table +vector bool char vec_cmpgt (vector unsigned char, vector unsigned char); +vector bool char vec_cmpgt (vector signed char, vector signed char); +vector bool short vec_cmpgt (vector unsigned short, + vector unsigned short); +vector bool short vec_cmpgt (vector signed short, vector signed short); +vector bool int vec_cmpgt (vector unsigned int, vector unsigned int); +vector bool int vec_cmpgt (vector signed int, vector signed int); +vector bool int vec_cmpgt (vector float, vector float); -@node MIPS-3D Built-in Functions -@subsubsection MIPS-3D Built-in Functions +vector bool int vec_vcmpgtfp (vector float, vector float); -The MIPS-3D Application-Specific Extension (ASE) includes additional -paired-single instructions that are designed to improve the performance -of 3D graphics operations. Support for these instructions is controlled -by the @option{-mips3d} command-line option. +vector bool int vec_vcmpgtsw (vector signed int, vector signed int); -The functions listed below map directly to a particular MIPS-3D -instruction. Please refer to the architecture specification for -more details on what each instruction does. +vector bool int vec_vcmpgtuw (vector unsigned int, vector unsigned int); -@table @code -@item v2sf __builtin_mips_addr_ps (v2sf, v2sf) -Reduction add (@code{addr.ps}). +vector bool short vec_vcmpgtsh (vector signed short, + vector signed short); -@item v2sf __builtin_mips_mulr_ps (v2sf, v2sf) -Reduction multiply (@code{mulr.ps}). 
+vector bool short vec_vcmpgtuh (vector unsigned short, + vector unsigned short); -@item v2sf __builtin_mips_cvt_pw_ps (v2sf) -Convert paired single to paired word (@code{cvt.pw.ps}). +vector bool char vec_vcmpgtsb (vector signed char, vector signed char); -@item v2sf __builtin_mips_cvt_ps_pw (v2sf) -Convert paired word to paired single (@code{cvt.ps.pw}). +vector bool char vec_vcmpgtub (vector unsigned char, + vector unsigned char); -@item float __builtin_mips_recip1_s (float) -@itemx double __builtin_mips_recip1_d (double) -@itemx v2sf __builtin_mips_recip1_ps (v2sf) -Reduced-precision reciprocal (sequence step 1) (@code{recip1.@var{fmt}}). +vector bool int vec_cmple (vector float, vector float); -@item float __builtin_mips_recip2_s (float, float) -@itemx double __builtin_mips_recip2_d (double, double) -@itemx v2sf __builtin_mips_recip2_ps (v2sf, v2sf) -Reduced-precision reciprocal (sequence step 2) (@code{recip2.@var{fmt}}). +vector bool char vec_cmplt (vector unsigned char, vector unsigned char); +vector bool char vec_cmplt (vector signed char, vector signed char); +vector bool short vec_cmplt (vector unsigned short, + vector unsigned short); +vector bool short vec_cmplt (vector signed short, vector signed short); +vector bool int vec_cmplt (vector unsigned int, vector unsigned int); +vector bool int vec_cmplt (vector signed int, vector signed int); +vector bool int vec_cmplt (vector float, vector float); -@item float __builtin_mips_rsqrt1_s (float) -@itemx double __builtin_mips_rsqrt1_d (double) -@itemx v2sf __builtin_mips_rsqrt1_ps (v2sf) -Reduced-precision reciprocal square root (sequence step 1) -(@code{rsqrt1.@var{fmt}}). +vector float vec_cpsgn (vector float, vector float); -@item float __builtin_mips_rsqrt2_s (float, float) -@itemx double __builtin_mips_rsqrt2_d (double, double) -@itemx v2sf __builtin_mips_rsqrt2_ps (v2sf, v2sf) -Reduced-precision reciprocal square root (sequence step 2) -(@code{rsqrt2.@var{fmt}}). 
-@end table +vector float vec_ctf (vector unsigned int, const int); +vector float vec_ctf (vector signed int, const int); +vector double vec_ctf (vector unsigned long, const int); +vector double vec_ctf (vector signed long, const int); -The following multi-instruction functions are also available. -In each case, @var{cond} can be any of the 16 floating-point conditions: -@code{f}, @code{un}, @code{eq}, @code{ueq}, @code{olt}, @code{ult}, -@code{ole}, @code{ule}, @code{sf}, @code{ngle}, @code{seq}, -@code{ngl}, @code{lt}, @code{nge}, @code{le} or @code{ngt}. +vector float vec_vcfsx (vector signed int, const int); -@table @code -@item int __builtin_mips_cabs_@var{cond}_s (float @var{a}, float @var{b}) -@itemx int __builtin_mips_cabs_@var{cond}_d (double @var{a}, double @var{b}) -Absolute comparison of two scalar values (@code{cabs.@var{cond}.@var{fmt}}, -@code{bc1t}/@code{bc1f}). +vector float vec_vcfux (vector unsigned int, const int); -These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.s} -or @code{cabs.@var{cond}.d} and return the result as a boolean value. -For example: +vector signed int vec_cts (vector float, const int); +vector signed long vec_cts (vector double, const int); -@smallexample -float a, b; -if (__builtin_mips_cabs_eq_s (a, b)) - true (); -else - false (); -@end smallexample +vector unsigned int vec_ctu (vector float, const int); +vector unsigned long vec_ctu (vector double, const int); -@item int __builtin_mips_upper_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) -@itemx int __builtin_mips_lower_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) -Absolute comparison of two paired-single values (@code{cabs.@var{cond}.ps}, -@code{bc1t}/@code{bc1f}). +void vec_dss (const int); -These functions compare @var{a} and @var{b} using @code{cabs.@var{cond}.ps} -and return either the upper or lower half of the result. 
For example: +void vec_dssall (void); -@smallexample -v2sf a, b; -if (__builtin_mips_upper_cabs_eq_ps (a, b)) - upper_halves_are_equal (); -else - upper_halves_are_unequal (); +void vec_dst (const vector unsigned char *, int, const int); +void vec_dst (const vector signed char *, int, const int); +void vec_dst (const vector bool char *, int, const int); +void vec_dst (const vector unsigned short *, int, const int); +void vec_dst (const vector signed short *, int, const int); +void vec_dst (const vector bool short *, int, const int); +void vec_dst (const vector pixel *, int, const int); +void vec_dst (const vector unsigned int *, int, const int); +void vec_dst (const vector signed int *, int, const int); +void vec_dst (const vector bool int *, int, const int); +void vec_dst (const vector float *, int, const int); +void vec_dst (const unsigned char *, int, const int); +void vec_dst (const signed char *, int, const int); +void vec_dst (const unsigned short *, int, const int); +void vec_dst (const short *, int, const int); +void vec_dst (const unsigned int *, int, const int); +void vec_dst (const int *, int, const int); +void vec_dst (const unsigned long *, int, const int); +void vec_dst (const long *, int, const int); +void vec_dst (const float *, int, const int); -if (__builtin_mips_lower_cabs_eq_ps (a, b)) - lower_halves_are_equal (); -else - lower_halves_are_unequal (); -@end smallexample +void vec_dstst (const vector unsigned char *, int, const int); +void vec_dstst (const vector signed char *, int, const int); +void vec_dstst (const vector bool char *, int, const int); +void vec_dstst (const vector unsigned short *, int, const int); +void vec_dstst (const vector signed short *, int, const int); +void vec_dstst (const vector bool short *, int, const int); +void vec_dstst (const vector pixel *, int, const int); +void vec_dstst (const vector unsigned int *, int, const int); +void vec_dstst (const vector signed int *, int, const int); +void vec_dstst (const vector 
bool int *, int, const int); +void vec_dstst (const vector float *, int, const int); +void vec_dstst (const unsigned char *, int, const int); +void vec_dstst (const signed char *, int, const int); +void vec_dstst (const unsigned short *, int, const int); +void vec_dstst (const short *, int, const int); +void vec_dstst (const unsigned int *, int, const int); +void vec_dstst (const int *, int, const int); +void vec_dstst (const unsigned long *, int, const int); +void vec_dstst (const long *, int, const int); +void vec_dstst (const float *, int, const int); -@item v2sf __builtin_mips_movt_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) -@itemx v2sf __builtin_mips_movf_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) -Conditional move based on absolute comparison (@code{cabs.@var{cond}.ps}, -@code{movt.ps}/@code{movf.ps}). +void vec_dststt (const vector unsigned char *, int, const int); +void vec_dststt (const vector signed char *, int, const int); +void vec_dststt (const vector bool char *, int, const int); +void vec_dststt (const vector unsigned short *, int, const int); +void vec_dststt (const vector signed short *, int, const int); +void vec_dststt (const vector bool short *, int, const int); +void vec_dststt (const vector pixel *, int, const int); +void vec_dststt (const vector unsigned int *, int, const int); +void vec_dststt (const vector signed int *, int, const int); +void vec_dststt (const vector bool int *, int, const int); +void vec_dststt (const vector float *, int, const int); +void vec_dststt (const unsigned char *, int, const int); +void vec_dststt (const signed char *, int, const int); +void vec_dststt (const unsigned short *, int, const int); +void vec_dststt (const short *, int, const int); +void vec_dststt (const unsigned int *, int, const int); +void vec_dststt (const int *, int, const int); +void vec_dststt (const unsigned long *, int, const int); +void vec_dststt (const long *, int, const 
int); +void vec_dststt (const float *, int, const int); -The @code{movt} functions return the value @var{x} computed by: +void vec_dstt (const vector unsigned char *, int, const int); +void vec_dstt (const vector signed char *, int, const int); +void vec_dstt (const vector bool char *, int, const int); +void vec_dstt (const vector unsigned short *, int, const int); +void vec_dstt (const vector signed short *, int, const int); +void vec_dstt (const vector bool short *, int, const int); +void vec_dstt (const vector pixel *, int, const int); +void vec_dstt (const vector unsigned int *, int, const int); +void vec_dstt (const vector signed int *, int, const int); +void vec_dstt (const vector bool int *, int, const int); +void vec_dstt (const vector float *, int, const int); +void vec_dstt (const unsigned char *, int, const int); +void vec_dstt (const signed char *, int, const int); +void vec_dstt (const unsigned short *, int, const int); +void vec_dstt (const short *, int, const int); +void vec_dstt (const unsigned int *, int, const int); +void vec_dstt (const int *, int, const int); +void vec_dstt (const unsigned long *, int, const int); +void vec_dstt (const long *, int, const int); +void vec_dstt (const float *, int, const int); -@smallexample -cabs.@var{cond}.ps @var{cc},@var{a},@var{b} -mov.ps @var{x},@var{c} -movt.ps @var{x},@var{d},@var{cc} -@end smallexample +vector float vec_expte (vector float); + +vector float vec_floor (vector float); + +vector float vec_ld (int, const vector float *); +vector float vec_ld (int, const float *); +vector bool int vec_ld (int, const vector bool int *); +vector signed int vec_ld (int, const vector signed int *); +vector signed int vec_ld (int, const int *); +vector signed int vec_ld (int, const long *); +vector unsigned int vec_ld (int, const vector unsigned int *); +vector unsigned int vec_ld (int, const unsigned int *); +vector unsigned int vec_ld (int, const unsigned long *); +vector bool short vec_ld (int, const vector bool 
short *); +vector pixel vec_ld (int, const vector pixel *); +vector signed short vec_ld (int, const vector signed short *); +vector signed short vec_ld (int, const short *); +vector unsigned short vec_ld (int, const vector unsigned short *); +vector unsigned short vec_ld (int, const unsigned short *); +vector bool char vec_ld (int, const vector bool char *); +vector signed char vec_ld (int, const vector signed char *); +vector signed char vec_ld (int, const signed char *); +vector unsigned char vec_ld (int, const vector unsigned char *); +vector unsigned char vec_ld (int, const unsigned char *); -The @code{movf} functions are similar but use @code{movf.ps} instead -of @code{movt.ps}. +vector signed char vec_lde (int, const signed char *); +vector unsigned char vec_lde (int, const unsigned char *); +vector signed short vec_lde (int, const short *); +vector unsigned short vec_lde (int, const unsigned short *); +vector float vec_lde (int, const float *); +vector signed int vec_lde (int, const int *); +vector unsigned int vec_lde (int, const unsigned int *); +vector signed int vec_lde (int, const long *); +vector unsigned int vec_lde (int, const unsigned long *); -@item int __builtin_mips_any_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) -@itemx int __builtin_mips_all_c_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) -@itemx int __builtin_mips_any_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) -@itemx int __builtin_mips_all_cabs_@var{cond}_ps (v2sf @var{a}, v2sf @var{b}) -Comparison of two paired-single values -(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps}, -@code{bc1any2t}/@code{bc1any2f}). +vector float vec_lvewx (int, float *); +vector signed int vec_lvewx (int, int *); +vector unsigned int vec_lvewx (int, unsigned int *); +vector signed int vec_lvewx (int, long *); +vector unsigned int vec_lvewx (int, unsigned long *); -These functions compare @var{a} and @var{b} using @code{c.@var{cond}.ps} -or @code{cabs.@var{cond}.ps}. 
The @code{any} forms return true if either -result is true and the @code{all} forms return true if both results are true. -For example: +vector signed short vec_lvehx (int, short *); +vector unsigned short vec_lvehx (int, unsigned short *); -@smallexample -v2sf a, b; -if (__builtin_mips_any_c_eq_ps (a, b)) - one_is_true (); -else - both_are_false (); +vector signed char vec_lvebx (int, char *); +vector unsigned char vec_lvebx (int, unsigned char *); -if (__builtin_mips_all_c_eq_ps (a, b)) - both_are_true (); -else - one_is_false (); -@end smallexample +vector float vec_ldl (int, const vector float *); +vector float vec_ldl (int, const float *); +vector bool int vec_ldl (int, const vector bool int *); +vector signed int vec_ldl (int, const vector signed int *); +vector signed int vec_ldl (int, const int *); +vector signed int vec_ldl (int, const long *); +vector unsigned int vec_ldl (int, const vector unsigned int *); +vector unsigned int vec_ldl (int, const unsigned int *); +vector unsigned int vec_ldl (int, const unsigned long *); +vector bool short vec_ldl (int, const vector bool short *); +vector pixel vec_ldl (int, const vector pixel *); +vector signed short vec_ldl (int, const vector signed short *); +vector signed short vec_ldl (int, const short *); +vector unsigned short vec_ldl (int, const vector unsigned short *); +vector unsigned short vec_ldl (int, const unsigned short *); +vector bool char vec_ldl (int, const vector bool char *); +vector signed char vec_ldl (int, const vector signed char *); +vector signed char vec_ldl (int, const signed char *); +vector unsigned char vec_ldl (int, const vector unsigned char *); +vector unsigned char vec_ldl (int, const unsigned char *); -@item int __builtin_mips_any_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) -@itemx int __builtin_mips_all_c_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) -@itemx int __builtin_mips_any_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, 
v2sf @var{c}, v2sf @var{d}) -@itemx int __builtin_mips_all_cabs_@var{cond}_4s (v2sf @var{a}, v2sf @var{b}, v2sf @var{c}, v2sf @var{d}) -Comparison of four paired-single values -(@code{c.@var{cond}.ps}/@code{cabs.@var{cond}.ps}, -@code{bc1any4t}/@code{bc1any4f}). +vector float vec_loge (vector float); -These functions use @code{c.@var{cond}.ps} or @code{cabs.@var{cond}.ps} -to compare @var{a} with @var{b} and to compare @var{c} with @var{d}. -The @code{any} forms return true if any of the four results are true -and the @code{all} forms return true if all four results are true. -For example: +vector unsigned char vec_lvsl (int, const volatile unsigned char *); +vector unsigned char vec_lvsl (int, const volatile signed char *); +vector unsigned char vec_lvsl (int, const volatile unsigned short *); +vector unsigned char vec_lvsl (int, const volatile short *); +vector unsigned char vec_lvsl (int, const volatile unsigned int *); +vector unsigned char vec_lvsl (int, const volatile int *); +vector unsigned char vec_lvsl (int, const volatile unsigned long *); +vector unsigned char vec_lvsl (int, const volatile long *); +vector unsigned char vec_lvsl (int, const volatile float *); -@smallexample -v2sf a, b, c, d; -if (__builtin_mips_any_c_eq_4s (a, b, c, d)) - some_are_true (); -else - all_are_false (); +vector unsigned char vec_lvsr (int, const volatile unsigned char *); +vector unsigned char vec_lvsr (int, const volatile signed char *); +vector unsigned char vec_lvsr (int, const volatile unsigned short *); +vector unsigned char vec_lvsr (int, const volatile short *); +vector unsigned char vec_lvsr (int, const volatile unsigned int *); +vector unsigned char vec_lvsr (int, const volatile int *); +vector unsigned char vec_lvsr (int, const volatile unsigned long *); +vector unsigned char vec_lvsr (int, const volatile long *); +vector unsigned char vec_lvsr (int, const volatile float *); -if (__builtin_mips_all_c_eq_4s (a, b, c, d)) - all_are_true (); -else - some_are_false (); 
-@end smallexample -@end table +vector float vec_madd (vector float, vector float, vector float); -@node Other MIPS Built-in Functions -@subsection Other MIPS Built-in Functions +vector signed short vec_madds (vector signed short, + vector signed short, + vector signed short); -GCC provides other MIPS-specific built-in functions: +vector unsigned char vec_max (vector bool char, vector unsigned char); +vector unsigned char vec_max (vector unsigned char, vector bool char); +vector unsigned char vec_max (vector unsigned char, + vector unsigned char); +vector signed char vec_max (vector bool char, vector signed char); +vector signed char vec_max (vector signed char, vector bool char); +vector signed char vec_max (vector signed char, vector signed char); +vector unsigned short vec_max (vector bool short, + vector unsigned short); +vector unsigned short vec_max (vector unsigned short, + vector bool short); +vector unsigned short vec_max (vector unsigned short, + vector unsigned short); +vector signed short vec_max (vector bool short, vector signed short); +vector signed short vec_max (vector signed short, vector bool short); +vector signed short vec_max (vector signed short, vector signed short); +vector unsigned int vec_max (vector bool int, vector unsigned int); +vector unsigned int vec_max (vector unsigned int, vector bool int); +vector unsigned int vec_max (vector unsigned int, vector unsigned int); +vector signed int vec_max (vector bool int, vector signed int); +vector signed int vec_max (vector signed int, vector bool int); +vector signed int vec_max (vector signed int, vector signed int); +vector float vec_max (vector float, vector float); -@table @code -@item void __builtin_mips_cache (int @var{op}, const volatile void *@var{addr}) -Insert a @samp{cache} instruction with operands @var{op} and @var{addr}. -GCC defines the preprocessor macro @code{___GCC_HAVE_BUILTIN_MIPS_CACHE} -when this function is available. 
+vector float vec_vmaxfp (vector float, vector float); -@item unsigned int __builtin_mips_get_fcsr (void) -@itemx void __builtin_mips_set_fcsr (unsigned int @var{value}) -Get and set the contents of the floating-point control and status register -(FPU control register 31). These functions are only available in hard-float -code but can be called in both MIPS16 and non-MIPS16 contexts. +vector signed int vec_vmaxsw (vector bool int, vector signed int); +vector signed int vec_vmaxsw (vector signed int, vector bool int); +vector signed int vec_vmaxsw (vector signed int, vector signed int); -@code{__builtin_mips_set_fcsr} can be used to change any bit of the -register except the condition codes, which GCC assumes are preserved. -@end table +vector unsigned int vec_vmaxuw (vector bool int, vector unsigned int); +vector unsigned int vec_vmaxuw (vector unsigned int, vector bool int); +vector unsigned int vec_vmaxuw (vector unsigned int, + vector unsigned int); -@node MSP430 Built-in Functions -@subsection MSP430 Built-in Functions +vector signed short vec_vmaxsh (vector bool short, vector signed short); +vector signed short vec_vmaxsh (vector signed short, vector bool short); +vector signed short vec_vmaxsh (vector signed short, + vector signed short); -GCC provides a couple of special builtin functions to aid in the -writing of interrupt handlers in C. +vector unsigned short vec_vmaxuh (vector bool short, + vector unsigned short); +vector unsigned short vec_vmaxuh (vector unsigned short, + vector bool short); +vector unsigned short vec_vmaxuh (vector unsigned short, + vector unsigned short); -@table @code -@item __bic_SR_register_on_exit (int @var{mask}) -This clears the indicated bits in the saved copy of the status register -currently residing on the stack. This only works inside interrupt -handlers and the changes to the status register will only take affect -once the handler returns. 
+vector signed char vec_vmaxsb (vector bool char, vector signed char); +vector signed char vec_vmaxsb (vector signed char, vector bool char); +vector signed char vec_vmaxsb (vector signed char, vector signed char); -@item __bis_SR_register_on_exit (int @var{mask}) -This sets the indicated bits in the saved copy of the status register -currently residing on the stack. This only works inside interrupt -handlers and the changes to the status register will only take affect -once the handler returns. +vector unsigned char vec_vmaxub (vector bool char, + vector unsigned char); +vector unsigned char vec_vmaxub (vector unsigned char, + vector bool char); +vector unsigned char vec_vmaxub (vector unsigned char, + vector unsigned char); -@item __delay_cycles (long long @var{cycles}) -This inserts an instruction sequence that takes exactly @var{cycles} -cycles (between 0 and about 17E9) to complete. The inserted sequence -may use jumps, loops, or no-ops, and does not interfere with any other -instructions. Note that @var{cycles} must be a compile-time constant -integer - that is, you must pass a number, not a variable that may be -optimized to a constant later. The number of cycles delayed by this -builtin is exact. 
-@end table +vector bool char vec_mergeh (vector bool char, vector bool char); +vector signed char vec_mergeh (vector signed char, vector signed char); +vector unsigned char vec_mergeh (vector unsigned char, + vector unsigned char); +vector bool short vec_mergeh (vector bool short, vector bool short); +vector pixel vec_mergeh (vector pixel, vector pixel); +vector signed short vec_mergeh (vector signed short, + vector signed short); +vector unsigned short vec_mergeh (vector unsigned short, + vector unsigned short); +vector float vec_mergeh (vector float, vector float); +vector bool int vec_mergeh (vector bool int, vector bool int); +vector signed int vec_mergeh (vector signed int, vector signed int); +vector unsigned int vec_mergeh (vector unsigned int, + vector unsigned int); -@node NDS32 Built-in Functions -@subsection NDS32 Built-in Functions +vector float vec_vmrghw (vector float, vector float); +vector bool int vec_vmrghw (vector bool int, vector bool int); +vector signed int vec_vmrghw (vector signed int, vector signed int); +vector unsigned int vec_vmrghw (vector unsigned int, + vector unsigned int); -These built-in functions are available for the NDS32 target: +vector bool short vec_vmrghh (vector bool short, vector bool short); +vector signed short vec_vmrghh (vector signed short, + vector signed short); +vector unsigned short vec_vmrghh (vector unsigned short, + vector unsigned short); +vector pixel vec_vmrghh (vector pixel, vector pixel); -@deftypefn {Built-in Function} void __builtin_nds32_isync (int *@var{addr}) -Insert an ISYNC instruction into the instruction stream where -@var{addr} is an instruction address for serialization. 
-@end deftypefn +vector bool char vec_vmrghb (vector bool char, vector bool char); +vector signed char vec_vmrghb (vector signed char, vector signed char); +vector unsigned char vec_vmrghb (vector unsigned char, + vector unsigned char); -@deftypefn {Built-in Function} void __builtin_nds32_isb (void) -Insert an ISB instruction into the instruction stream. -@end deftypefn +vector bool char vec_mergel (vector bool char, vector bool char); +vector signed char vec_mergel (vector signed char, vector signed char); +vector unsigned char vec_mergel (vector unsigned char, + vector unsigned char); +vector bool short vec_mergel (vector bool short, vector bool short); +vector pixel vec_mergel (vector pixel, vector pixel); +vector signed short vec_mergel (vector signed short, + vector signed short); +vector unsigned short vec_mergel (vector unsigned short, + vector unsigned short); +vector float vec_mergel (vector float, vector float); +vector bool int vec_mergel (vector bool int, vector bool int); +vector signed int vec_mergel (vector signed int, vector signed int); +vector unsigned int vec_mergel (vector unsigned int, + vector unsigned int); -@deftypefn {Built-in Function} int __builtin_nds32_mfsr (int @var{sr}) -Return the content of a system register which is mapped by @var{sr}. -@end deftypefn +vector float vec_vmrglw (vector float, vector float); +vector signed int vec_vmrglw (vector signed int, vector signed int); +vector unsigned int vec_vmrglw (vector unsigned int, + vector unsigned int); +vector bool int vec_vmrglw (vector bool int, vector bool int); -@deftypefn {Built-in Function} int __builtin_nds32_mfusr (int @var{usr}) -Return the content of a user space register which is mapped by @var{usr}. 
-@end deftypefn +vector bool short vec_vmrglh (vector bool short, vector bool short); +vector signed short vec_vmrglh (vector signed short, + vector signed short); +vector unsigned short vec_vmrglh (vector unsigned short, + vector unsigned short); +vector pixel vec_vmrglh (vector pixel, vector pixel); -@deftypefn {Built-in Function} void __builtin_nds32_mtsr (int @var{value}, int @var{sr}) -Move the @var{value} to a system register which is mapped by @var{sr}. -@end deftypefn +vector bool char vec_vmrglb (vector bool char, vector bool char); +vector signed char vec_vmrglb (vector signed char, vector signed char); +vector unsigned char vec_vmrglb (vector unsigned char, + vector unsigned char); -@deftypefn {Built-in Function} void __builtin_nds32_mtusr (int @var{value}, int @var{usr}) -Move the @var{value} to a user space register which is mapped by @var{usr}. -@end deftypefn +vector unsigned short vec_mfvscr (void); -@deftypefn {Built-in Function} void __builtin_nds32_setgie_en (void) -Enable global interrupt. 
-@end deftypefn +vector unsigned char vec_min (vector bool char, vector unsigned char); +vector unsigned char vec_min (vector unsigned char, vector bool char); +vector unsigned char vec_min (vector unsigned char, + vector unsigned char); +vector signed char vec_min (vector bool char, vector signed char); +vector signed char vec_min (vector signed char, vector bool char); +vector signed char vec_min (vector signed char, vector signed char); +vector unsigned short vec_min (vector bool short, + vector unsigned short); +vector unsigned short vec_min (vector unsigned short, + vector bool short); +vector unsigned short vec_min (vector unsigned short, + vector unsigned short); +vector signed short vec_min (vector bool short, vector signed short); +vector signed short vec_min (vector signed short, vector bool short); +vector signed short vec_min (vector signed short, vector signed short); +vector unsigned int vec_min (vector bool int, vector unsigned int); +vector unsigned int vec_min (vector unsigned int, vector bool int); +vector unsigned int vec_min (vector unsigned int, vector unsigned int); +vector signed int vec_min (vector bool int, vector signed int); +vector signed int vec_min (vector signed int, vector bool int); +vector signed int vec_min (vector signed int, vector signed int); +vector float vec_min (vector float, vector float); -@deftypefn {Built-in Function} void __builtin_nds32_setgie_dis (void) -Disable global interrupt. -@end deftypefn +vector float vec_vminfp (vector float, vector float); -@node picoChip Built-in Functions -@subsection picoChip Built-in Functions +vector signed int vec_vminsw (vector bool int, vector signed int); +vector signed int vec_vminsw (vector signed int, vector bool int); +vector signed int vec_vminsw (vector signed int, vector signed int); -GCC provides an interface to selected machine instructions from the -picoChip instruction set. 
+vector unsigned int vec_vminuw (vector bool int, vector unsigned int); +vector unsigned int vec_vminuw (vector unsigned int, vector bool int); +vector unsigned int vec_vminuw (vector unsigned int, + vector unsigned int); -@table @code -@item int __builtin_sbc (int @var{value}) -Sign bit count. Return the number of consecutive bits in @var{value} -that have the same value as the sign bit. The result is the number of -leading sign bits minus one, giving the number of redundant sign bits in -@var{value}. +vector signed short vec_vminsh (vector bool short, vector signed short); +vector signed short vec_vminsh (vector signed short, vector bool short); +vector signed short vec_vminsh (vector signed short, + vector signed short); -@item int __builtin_byteswap (int @var{value}) -Byte swap. Return the result of swapping the upper and lower bytes of -@var{value}. +vector unsigned short vec_vminuh (vector bool short, + vector unsigned short); +vector unsigned short vec_vminuh (vector unsigned short, + vector bool short); +vector unsigned short vec_vminuh (vector unsigned short, + vector unsigned short); -@item int __builtin_brev (int @var{value}) -Bit reversal. Return the result of reversing the bits in -@var{value}. Bit 15 is swapped with bit 0, bit 14 is swapped with bit 1, -and so on. +vector signed char vec_vminsb (vector bool char, vector signed char); +vector signed char vec_vminsb (vector signed char, vector bool char); +vector signed char vec_vminsb (vector signed char, vector signed char); -@item int __builtin_adds (int @var{x}, int @var{y}) -Saturating addition. Return the result of adding @var{x} and @var{y}, -storing the value 32767 if the result overflows. 
+vector unsigned char vec_vminub (vector bool char, + vector unsigned char); +vector unsigned char vec_vminub (vector unsigned char, + vector bool char); +vector unsigned char vec_vminub (vector unsigned char, + vector unsigned char); -@item int __builtin_subs (int @var{x}, int @var{y}) -Saturating subtraction. Return the result of subtracting @var{y} from -@var{x}, storing the value @minus{}32768 if the result overflows. +vector signed short vec_mladd (vector signed short, + vector signed short, + vector signed short); +vector signed short vec_mladd (vector signed short, + vector unsigned short, + vector unsigned short); +vector signed short vec_mladd (vector unsigned short, + vector signed short, + vector signed short); +vector unsigned short vec_mladd (vector unsigned short, + vector unsigned short, + vector unsigned short); -@item void __builtin_halt (void) -Halt. The processor stops execution. This built-in is useful for -implementing assertions. +vector signed short vec_mradds (vector signed short, + vector signed short, + vector signed short); -@end table +vector unsigned int vec_msum (vector unsigned char, + vector unsigned char, + vector unsigned int); +vector signed int vec_msum (vector signed char, + vector unsigned char, + vector signed int); +vector unsigned int vec_msum (vector unsigned short, + vector unsigned short, + vector unsigned int); +vector signed int vec_msum (vector signed short, + vector signed short, + vector signed int); -@node PowerPC Built-in Functions -@subsection PowerPC Built-in Functions +vector signed int vec_vmsumshm (vector signed short, + vector signed short, + vector signed int); -These built-in functions are available for the PowerPC family of -processors: -@smallexample -float __builtin_recipdivf (float, float); -float __builtin_rsqrtf (float); -double __builtin_recipdiv (double, double); -double __builtin_rsqrt (double); -uint64_t __builtin_ppc_get_timebase (); -unsigned long __builtin_ppc_mftb (); -double 
__builtin_unpack_longdouble (long double, int); -long double __builtin_pack_longdouble (double, double); -@end smallexample +vector unsigned int vec_vmsumuhm (vector unsigned short, + vector unsigned short, + vector unsigned int); -The @code{vec_rsqrt}, @code{__builtin_rsqrt}, and -@code{__builtin_rsqrtf} functions generate multiple instructions to -implement the reciprocal sqrt functionality using reciprocal sqrt -estimate instructions. +vector signed int vec_vmsummbm (vector signed char, + vector unsigned char, + vector signed int); -The @code{__builtin_recipdiv}, and @code{__builtin_recipdivf} -functions generate multiple instructions to implement division using -the reciprocal estimate instructions. +vector unsigned int vec_vmsumubm (vector unsigned char, + vector unsigned char, + vector unsigned int); -The @code{__builtin_ppc_get_timebase} and @code{__builtin_ppc_mftb} -functions generate instructions to read the Time Base Register. The -@code{__builtin_ppc_get_timebase} function may generate multiple -instructions and always returns the 64 bits of the Time Base Register. -The @code{__builtin_ppc_mftb} function always generates one instruction and -returns the Time Base Register value as an unsigned long, throwing away -the most significant word on 32-bit environments. 
+vector unsigned int vec_msums (vector unsigned short, + vector unsigned short, + vector unsigned int); +vector signed int vec_msums (vector signed short, + vector signed short, + vector signed int); -The following built-in functions are available for the PowerPC family -of processors, starting with ISA 2.06 or later (@option{-mcpu=power7} -or @option{-mpopcntd}): -@smallexample -long __builtin_bpermd (long, long); -int __builtin_divwe (int, int); -int __builtin_divweo (int, int); -unsigned int __builtin_divweu (unsigned int, unsigned int); -unsigned int __builtin_divweuo (unsigned int, unsigned int); -long __builtin_divde (long, long); -long __builtin_divdeo (long, long); -unsigned long __builtin_divdeu (unsigned long, unsigned long); -unsigned long __builtin_divdeuo (unsigned long, unsigned long); -unsigned int cdtbcd (unsigned int); -unsigned int cbcdtd (unsigned int); -unsigned int addg6s (unsigned int, unsigned int); -@end smallexample +vector signed int vec_vmsumshs (vector signed short, + vector signed short, + vector signed int); -The @code{__builtin_divde}, @code{__builtin_divdeo}, -@code{__builitin_divdeu}, @code{__builtin_divdeou} functions require a -64-bit environment support ISA 2.06 or later. 
+vector unsigned int vec_vmsumuhs (vector unsigned short, + vector unsigned short, + vector unsigned int); -The following built-in functions are available for the PowerPC family -of processors when hardware decimal floating point -(@option{-mhard-dfp}) is available: -@smallexample -_Decimal64 __builtin_dxex (_Decimal64); -_Decimal128 __builtin_dxexq (_Decimal128); -_Decimal64 __builtin_ddedpd (int, _Decimal64); -_Decimal128 __builtin_ddedpdq (int, _Decimal128); -_Decimal64 __builtin_denbcd (int, _Decimal64); -_Decimal128 __builtin_denbcdq (int, _Decimal128); -_Decimal64 __builtin_diex (_Decimal64, _Decimal64); -_Decimal128 _builtin_diexq (_Decimal128, _Decimal128); -_Decimal64 __builtin_dscli (_Decimal64, int); -_Decimal128 __builitn_dscliq (_Decimal128, int); -_Decimal64 __builtin_dscri (_Decimal64, int); -_Decimal128 __builitn_dscriq (_Decimal128, int); -unsigned long long __builtin_unpack_dec128 (_Decimal128, int); -_Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long); -@end smallexample +void vec_mtvscr (vector signed int); +void vec_mtvscr (vector unsigned int); +void vec_mtvscr (vector bool int); +void vec_mtvscr (vector signed short); +void vec_mtvscr (vector unsigned short); +void vec_mtvscr (vector bool short); +void vec_mtvscr (vector pixel); +void vec_mtvscr (vector signed char); +void vec_mtvscr (vector unsigned char); +void vec_mtvscr (vector bool char); -The following built-in functions are available for the PowerPC family -of processors when the Vector Scalar (vsx) instruction set is -available: -@smallexample -unsigned long long __builtin_unpack_vector_int128 (vector __int128_t, int); -vector __int128_t __builtin_pack_vector_int128 (unsigned long long, - unsigned long long); -@end smallexample +vector unsigned short vec_mule (vector unsigned char, + vector unsigned char); +vector signed short vec_mule (vector signed char, + vector signed char); +vector unsigned int vec_mule (vector unsigned short, + vector unsigned short); 
+vector signed int vec_mule (vector signed short, vector signed short); + +vector signed int vec_vmulesh (vector signed short, + vector signed short); -@node PowerPC AltiVec/VSX Built-in Functions -@subsection PowerPC AltiVec Built-in Functions +vector unsigned int vec_vmuleuh (vector unsigned short, + vector unsigned short); -GCC provides an interface for the PowerPC family of processors to access -the AltiVec operations described in Motorola's AltiVec Programming -Interface Manual. The interface is made available by including -@code{<altivec.h>} and using @option{-maltivec} and -@option{-mabi=altivec}. The interface supports the following vector -types. +vector signed short vec_vmulesb (vector signed char, + vector signed char); -@smallexample -vector unsigned char -vector signed char -vector bool char +vector unsigned short vec_vmuleub (vector unsigned char, + vector unsigned char); -vector unsigned short -vector signed short -vector bool short -vector pixel +vector unsigned short vec_mulo (vector unsigned char, + vector unsigned char); +vector signed short vec_mulo (vector signed char, vector signed char); +vector unsigned int vec_mulo (vector unsigned short, + vector unsigned short); +vector signed int vec_mulo (vector signed short, vector signed short); -vector unsigned int -vector signed int -vector bool int -vector float -@end smallexample +vector signed int vec_vmulosh (vector signed short, + vector signed short); -If @option{-mvsx} is used the following additional vector types are -implemented. +vector unsigned int vec_vmulouh (vector unsigned short, + vector unsigned short); -@smallexample -vector unsigned long -vector signed long -vector double -@end smallexample +vector signed short vec_vmulosb (vector signed char, + vector signed char); -The long types are only implemented for 64-bit code generation, and -the long type is only used in the floating point/integer conversion -instructions.
+vector unsigned short vec_vmuloub (vector unsigned char, + vector unsigned char); -GCC's implementation of the high-level language interface available from -C and C++ code differs from Motorola's documentation in several ways. +vector float vec_nmsub (vector float, vector float, vector float); -@itemize @bullet +vector float vec_nor (vector float, vector float); +vector signed int vec_nor (vector signed int, vector signed int); +vector unsigned int vec_nor (vector unsigned int, vector unsigned int); +vector bool int vec_nor (vector bool int, vector bool int); +vector signed short vec_nor (vector signed short, vector signed short); +vector unsigned short vec_nor (vector unsigned short, + vector unsigned short); +vector bool short vec_nor (vector bool short, vector bool short); +vector signed char vec_nor (vector signed char, vector signed char); +vector unsigned char vec_nor (vector unsigned char, + vector unsigned char); +vector bool char vec_nor (vector bool char, vector bool char); -@item -A vector constant is a list of constant expressions within curly braces. 
+vector float vec_or (vector float, vector float); +vector float vec_or (vector float, vector bool int); +vector float vec_or (vector bool int, vector float); +vector bool int vec_or (vector bool int, vector bool int); +vector signed int vec_or (vector bool int, vector signed int); +vector signed int vec_or (vector signed int, vector bool int); +vector signed int vec_or (vector signed int, vector signed int); +vector unsigned int vec_or (vector bool int, vector unsigned int); +vector unsigned int vec_or (vector unsigned int, vector bool int); +vector unsigned int vec_or (vector unsigned int, vector unsigned int); +vector bool short vec_or (vector bool short, vector bool short); +vector signed short vec_or (vector bool short, vector signed short); +vector signed short vec_or (vector signed short, vector bool short); +vector signed short vec_or (vector signed short, vector signed short); +vector unsigned short vec_or (vector bool short, vector unsigned short); +vector unsigned short vec_or (vector unsigned short, vector bool short); +vector unsigned short vec_or (vector unsigned short, + vector unsigned short); +vector signed char vec_or (vector bool char, vector signed char); +vector bool char vec_or (vector bool char, vector bool char); +vector signed char vec_or (vector signed char, vector bool char); +vector signed char vec_or (vector signed char, vector signed char); +vector unsigned char vec_or (vector bool char, vector unsigned char); +vector unsigned char vec_or (vector unsigned char, vector bool char); +vector unsigned char vec_or (vector unsigned char, + vector unsigned char); -@item -A vector initializer requires no cast if the vector constant is of the -same type as the variable it is initializing. 
+vector signed char vec_pack (vector signed short, vector signed short); +vector unsigned char vec_pack (vector unsigned short, + vector unsigned short); +vector bool char vec_pack (vector bool short, vector bool short); +vector signed short vec_pack (vector signed int, vector signed int); +vector unsigned short vec_pack (vector unsigned int, + vector unsigned int); +vector bool short vec_pack (vector bool int, vector bool int); -@item -If @code{signed} or @code{unsigned} is omitted, the signedness of the -vector type is the default signedness of the base type. The default -varies depending on the operating system, so a portable program should -always specify the signedness. +vector bool short vec_vpkuwum (vector bool int, vector bool int); +vector signed short vec_vpkuwum (vector signed int, vector signed int); +vector unsigned short vec_vpkuwum (vector unsigned int, + vector unsigned int); -@item -Compiling with @option{-maltivec} adds keywords @code{__vector}, -@code{vector}, @code{__pixel}, @code{pixel}, @code{__bool} and -@code{bool}. When compiling ISO C, the context-sensitive substitution -of the keywords @code{vector}, @code{pixel} and @code{bool} is -disabled. To use them, you must include @code{<altivec.h>} instead. +vector bool char vec_vpkuhum (vector bool short, vector bool short); +vector signed char vec_vpkuhum (vector signed short, + vector signed short); +vector unsigned char vec_vpkuhum (vector unsigned short, + vector unsigned short); -@item -GCC allows using a @code{typedef} name as the type specifier for a -vector type.
+vector pixel vec_packpx (vector unsigned int, vector unsigned int); -@item -For C, overloaded functions are implemented with macros so the following -does not work: +vector unsigned char vec_packs (vector unsigned short, + vector unsigned short); +vector signed char vec_packs (vector signed short, vector signed short); +vector unsigned short vec_packs (vector unsigned int, + vector unsigned int); +vector signed short vec_packs (vector signed int, vector signed int); -@smallexample - vec_add ((vector signed int)@{1, 2, 3, 4@}, foo); -@end smallexample +vector signed short vec_vpkswss (vector signed int, vector signed int); -@noindent -Since @code{vec_add} is a macro, the vector constant in the example -is treated as four separate arguments. Wrap the entire argument in -parentheses for this to work. -@end itemize +vector unsigned short vec_vpkuwus (vector unsigned int, + vector unsigned int); -@emph{Note:} Only the @code{<altivec.h>} interface is supported. -Internally, GCC uses built-in functions to achieve the functionality in -the aforementioned header file, but they are not supported and are -subject to change without notice. +vector signed char vec_vpkshss (vector signed short, + vector signed short); -The following interfaces are supported for the generic and specific -AltiVec operations and the AltiVec predicates. In cases where there -is a direct mapping between generic and specific operations, only the -generic names are shown here, although the specific operations can also -be used. +vector unsigned char vec_vpkuhus (vector unsigned short, + vector unsigned short); -Arguments that are documented as @code{const int} require literal -integral values within the range required for that operation.
+vector unsigned char vec_packsu (vector unsigned short, + vector unsigned short); +vector unsigned char vec_packsu (vector signed short, + vector signed short); +vector unsigned short vec_packsu (vector unsigned int, + vector unsigned int); +vector unsigned short vec_packsu (vector signed int, vector signed int); -@smallexample -vector signed char vec_abs (vector signed char); -vector signed short vec_abs (vector signed short); -vector signed int vec_abs (vector signed int); -vector float vec_abs (vector float); +vector unsigned short vec_vpkswus (vector signed int, + vector signed int); -vector signed char vec_abss (vector signed char); -vector signed short vec_abss (vector signed short); -vector signed int vec_abss (vector signed int); +vector unsigned char vec_vpkshus (vector signed short, + vector signed short); -vector signed char vec_add (vector bool char, vector signed char); -vector signed char vec_add (vector signed char, vector bool char); -vector signed char vec_add (vector signed char, vector signed char); -vector unsigned char vec_add (vector bool char, vector unsigned char); -vector unsigned char vec_add (vector unsigned char, vector bool char); -vector unsigned char vec_add (vector unsigned char, +vector float vec_perm (vector float, + vector float, + vector unsigned char); +vector signed int vec_perm (vector signed int, + vector signed int, + vector unsigned char); +vector unsigned int vec_perm (vector unsigned int, + vector unsigned int, vector unsigned char); -vector signed short vec_add (vector bool short, vector signed short); -vector signed short vec_add (vector signed short, vector bool short); -vector signed short vec_add (vector signed short, vector signed short); -vector unsigned short vec_add (vector bool short, - vector unsigned short); -vector unsigned short vec_add (vector unsigned short, - vector bool short); -vector unsigned short vec_add (vector unsigned short, - vector unsigned short); -vector signed int vec_add (vector bool int, 
vector signed int); -vector signed int vec_add (vector signed int, vector bool int); -vector signed int vec_add (vector signed int, vector signed int); -vector unsigned int vec_add (vector bool int, vector unsigned int); -vector unsigned int vec_add (vector unsigned int, vector bool int); -vector unsigned int vec_add (vector unsigned int, vector unsigned int); -vector float vec_add (vector float, vector float); +vector bool int vec_perm (vector bool int, + vector bool int, + vector unsigned char); +vector signed short vec_perm (vector signed short, + vector signed short, + vector unsigned char); +vector unsigned short vec_perm (vector unsigned short, + vector unsigned short, + vector unsigned char); +vector bool short vec_perm (vector bool short, + vector bool short, + vector unsigned char); +vector pixel vec_perm (vector pixel, + vector pixel, + vector unsigned char); +vector signed char vec_perm (vector signed char, + vector signed char, + vector unsigned char); +vector unsigned char vec_perm (vector unsigned char, + vector unsigned char, + vector unsigned char); +vector bool char vec_perm (vector bool char, + vector bool char, + vector unsigned char); -vector float vec_vaddfp (vector float, vector float); +vector float vec_re (vector float); + +vector signed char vec_rl (vector signed char, + vector unsigned char); +vector unsigned char vec_rl (vector unsigned char, + vector unsigned char); +vector signed short vec_rl (vector signed short, vector unsigned short); +vector unsigned short vec_rl (vector unsigned short, + vector unsigned short); +vector signed int vec_rl (vector signed int, vector unsigned int); +vector unsigned int vec_rl (vector unsigned int, vector unsigned int); + +vector signed int vec_vrlw (vector signed int, vector unsigned int); +vector unsigned int vec_vrlw (vector unsigned int, vector unsigned int); + +vector signed short vec_vrlh (vector signed short, + vector unsigned short); +vector unsigned short vec_vrlh (vector unsigned short, + 
vector unsigned short); -vector signed int vec_vadduwm (vector bool int, vector signed int); -vector signed int vec_vadduwm (vector signed int, vector bool int); -vector signed int vec_vadduwm (vector signed int, vector signed int); -vector unsigned int vec_vadduwm (vector bool int, vector unsigned int); -vector unsigned int vec_vadduwm (vector unsigned int, vector bool int); -vector unsigned int vec_vadduwm (vector unsigned int, - vector unsigned int); +vector signed char vec_vrlb (vector signed char, vector unsigned char); +vector unsigned char vec_vrlb (vector unsigned char, + vector unsigned char); -vector signed short vec_vadduhm (vector bool short, - vector signed short); -vector signed short vec_vadduhm (vector signed short, - vector bool short); -vector signed short vec_vadduhm (vector signed short, - vector signed short); -vector unsigned short vec_vadduhm (vector bool short, - vector unsigned short); -vector unsigned short vec_vadduhm (vector unsigned short, - vector bool short); -vector unsigned short vec_vadduhm (vector unsigned short, - vector unsigned short); +vector float vec_round (vector float); -vector signed char vec_vaddubm (vector bool char, vector signed char); -vector signed char vec_vaddubm (vector signed char, vector bool char); -vector signed char vec_vaddubm (vector signed char, vector signed char); -vector unsigned char vec_vaddubm (vector bool char, - vector unsigned char); -vector unsigned char vec_vaddubm (vector unsigned char, - vector bool char); -vector unsigned char vec_vaddubm (vector unsigned char, - vector unsigned char); +vector float vec_recip (vector float, vector float); -vector unsigned int vec_addc (vector unsigned int, vector unsigned int); +vector float vec_rsqrt (vector float); -vector unsigned char vec_adds (vector bool char, vector unsigned char); -vector unsigned char vec_adds (vector unsigned char, vector bool char); -vector unsigned char vec_adds (vector unsigned char, - vector unsigned char); -vector signed char 
vec_adds (vector bool char, vector signed char); -vector signed char vec_adds (vector signed char, vector bool char); -vector signed char vec_adds (vector signed char, vector signed char); -vector unsigned short vec_adds (vector bool short, - vector unsigned short); -vector unsigned short vec_adds (vector unsigned short, - vector bool short); -vector unsigned short vec_adds (vector unsigned short, - vector unsigned short); -vector signed short vec_adds (vector bool short, vector signed short); -vector signed short vec_adds (vector signed short, vector bool short); -vector signed short vec_adds (vector signed short, vector signed short); -vector unsigned int vec_adds (vector bool int, vector unsigned int); -vector unsigned int vec_adds (vector unsigned int, vector bool int); -vector unsigned int vec_adds (vector unsigned int, vector unsigned int); -vector signed int vec_adds (vector bool int, vector signed int); -vector signed int vec_adds (vector signed int, vector bool int); -vector signed int vec_adds (vector signed int, vector signed int); +vector float vec_rsqrte (vector float); -vector signed int vec_vaddsws (vector bool int, vector signed int); -vector signed int vec_vaddsws (vector signed int, vector bool int); -vector signed int vec_vaddsws (vector signed int, vector signed int); +vector float vec_sel (vector float, vector float, vector bool int); +vector float vec_sel (vector float, vector float, vector unsigned int); +vector signed int vec_sel (vector signed int, + vector signed int, + vector bool int); +vector signed int vec_sel (vector signed int, + vector signed int, + vector unsigned int); +vector unsigned int vec_sel (vector unsigned int, + vector unsigned int, + vector bool int); +vector unsigned int vec_sel (vector unsigned int, + vector unsigned int, + vector unsigned int); +vector bool int vec_sel (vector bool int, + vector bool int, + vector bool int); +vector bool int vec_sel (vector bool int, + vector bool int, + vector unsigned int); +vector 
signed short vec_sel (vector signed short, + vector signed short, + vector bool short); +vector signed short vec_sel (vector signed short, + vector signed short, + vector unsigned short); +vector unsigned short vec_sel (vector unsigned short, + vector unsigned short, + vector bool short); +vector unsigned short vec_sel (vector unsigned short, + vector unsigned short, + vector unsigned short); +vector bool short vec_sel (vector bool short, + vector bool short, + vector bool short); +vector bool short vec_sel (vector bool short, + vector bool short, + vector unsigned short); +vector signed char vec_sel (vector signed char, + vector signed char, + vector bool char); +vector signed char vec_sel (vector signed char, + vector signed char, + vector unsigned char); +vector unsigned char vec_sel (vector unsigned char, + vector unsigned char, + vector bool char); +vector unsigned char vec_sel (vector unsigned char, + vector unsigned char, + vector unsigned char); +vector bool char vec_sel (vector bool char, + vector bool char, + vector bool char); +vector bool char vec_sel (vector bool char, + vector bool char, + vector unsigned char); -vector unsigned int vec_vadduws (vector bool int, vector unsigned int); -vector unsigned int vec_vadduws (vector unsigned int, vector bool int); -vector unsigned int vec_vadduws (vector unsigned int, - vector unsigned int); +vector signed char vec_sl (vector signed char, + vector unsigned char); +vector unsigned char vec_sl (vector unsigned char, + vector unsigned char); +vector signed short vec_sl (vector signed short, vector unsigned short); +vector unsigned short vec_sl (vector unsigned short, + vector unsigned short); +vector signed int vec_sl (vector signed int, vector unsigned int); +vector unsigned int vec_sl (vector unsigned int, vector unsigned int); -vector signed short vec_vaddshs (vector bool short, - vector signed short); -vector signed short vec_vaddshs (vector signed short, - vector bool short); -vector signed short vec_vaddshs 
(vector signed short,
-    vector signed short);
+vector signed int vec_vslw (vector signed int, vector unsigned int);
+vector unsigned int vec_vslw (vector unsigned int, vector unsigned int);
-vector unsigned short vec_vadduhs (vector bool short,
-    vector unsigned short);
-vector unsigned short vec_vadduhs (vector unsigned short,
-    vector bool short);
-vector unsigned short vec_vadduhs (vector unsigned short,
-    vector unsigned short);
+vector signed short vec_vslh (vector signed short,
+    vector unsigned short);
+vector unsigned short vec_vslh (vector unsigned short,
+    vector unsigned short);
-vector signed char vec_vaddsbs (vector bool char, vector signed char);
-vector signed char vec_vaddsbs (vector signed char, vector bool char);
-vector signed char vec_vaddsbs (vector signed char, vector signed char);
+vector signed char vec_vslb (vector signed char, vector unsigned char);
+vector unsigned char vec_vslb (vector unsigned char,
+    vector unsigned char);
-vector unsigned char vec_vaddubs (vector bool char,
-    vector unsigned char);
-vector unsigned char vec_vaddubs (vector unsigned char,
-    vector bool char);
-vector unsigned char vec_vaddubs (vector unsigned char,
-    vector unsigned char);
+vector float vec_sld (vector float, vector float, const int);
+vector signed int vec_sld (vector signed int,
+    vector signed int,
+    const int);
+vector unsigned int vec_sld (vector unsigned int,
+    vector unsigned int,
+    const int);
+vector bool int vec_sld (vector bool int,
+    vector bool int,
+    const int);
+vector signed short vec_sld (vector signed short,
+    vector signed short,
+    const int);
+vector unsigned short vec_sld (vector unsigned short,
+    vector unsigned short,
+    const int);
+vector bool short vec_sld (vector bool short,
+    vector bool short,
+    const int);
+vector pixel vec_sld (vector pixel,
+    vector pixel,
+    const int);
+vector signed char vec_sld (vector signed char,
+    vector signed char,
+    const int);
+vector unsigned char vec_sld (vector unsigned char,
+    vector
unsigned char,
+    const int);
+vector bool char vec_sld (vector bool char,
+    vector bool char,
+    const int);
-vector float vec_and (vector float, vector float);
-vector float vec_and (vector float, vector bool int);
-vector float vec_and (vector bool int, vector float);
-vector bool int vec_and (vector bool int, vector bool int);
-vector signed int vec_and (vector bool int, vector signed int);
-vector signed int vec_and (vector signed int, vector bool int);
-vector signed int vec_and (vector signed int, vector signed int);
-vector unsigned int vec_and (vector bool int, vector unsigned int);
-vector unsigned int vec_and (vector unsigned int, vector bool int);
-vector unsigned int vec_and (vector unsigned int, vector unsigned int);
-vector bool short vec_and (vector bool short, vector bool short);
-vector signed short vec_and (vector bool short, vector signed short);
-vector signed short vec_and (vector signed short, vector bool short);
-vector signed short vec_and (vector signed short, vector signed short);
-vector unsigned short vec_and (vector bool short,
-    vector unsigned short);
-vector unsigned short vec_and (vector unsigned short,
-    vector bool short);
-vector unsigned short vec_and (vector unsigned short,
+vector signed int vec_sll (vector signed int,
+    vector unsigned int);
+vector signed int vec_sll (vector signed int,
+    vector unsigned short);
+vector signed int vec_sll (vector signed int,
+    vector unsigned char);
+vector unsigned int vec_sll (vector unsigned int,
+    vector unsigned int);
+vector unsigned int vec_sll (vector unsigned int,
+    vector unsigned short);
+vector unsigned int vec_sll (vector unsigned int,
+    vector unsigned char);
+vector bool int vec_sll (vector bool int,
+    vector unsigned int);
+vector bool int vec_sll (vector bool int,
+    vector unsigned short);
+vector bool int vec_sll (vector bool int,
+    vector unsigned char);
+vector signed short vec_sll (vector signed short,
+    vector unsigned int);
+vector signed short vec_sll (vector signed
short,
+    vector unsigned short);
+vector signed short vec_sll (vector signed short,
+    vector unsigned char);
+vector unsigned short vec_sll (vector unsigned short,
+    vector unsigned int);
+vector unsigned short vec_sll (vector unsigned short, vector unsigned short);
-vector signed char vec_and (vector bool char, vector signed char);
-vector bool char vec_and (vector bool char, vector bool char);
-vector signed char vec_and (vector signed char, vector bool char);
-vector signed char vec_and (vector signed char, vector signed char);
-vector unsigned char vec_and (vector bool char, vector unsigned char);
-vector unsigned char vec_and (vector unsigned char, vector bool char);
-vector unsigned char vec_and (vector unsigned char,
+vector unsigned short vec_sll (vector unsigned short,
+    vector unsigned char);
+vector bool short vec_sll (vector bool short, vector unsigned int);
+vector bool short vec_sll (vector bool short, vector unsigned short);
+vector bool short vec_sll (vector bool short, vector unsigned char);
+vector pixel vec_sll (vector pixel, vector unsigned int);
+vector pixel vec_sll (vector pixel, vector unsigned short);
+vector pixel vec_sll (vector pixel, vector unsigned char);
+vector signed char vec_sll (vector signed char, vector unsigned int);
+vector signed char vec_sll (vector signed char, vector unsigned short);
+vector signed char vec_sll (vector signed char, vector unsigned char);
+vector unsigned char vec_sll (vector unsigned char,
+    vector unsigned int);
+vector unsigned char vec_sll (vector unsigned char,
+    vector unsigned short);
+vector unsigned char vec_sll (vector unsigned char, vector unsigned char);
+vector bool char vec_sll (vector bool char, vector unsigned int);
+vector bool char vec_sll (vector bool char, vector unsigned short);
+vector bool char vec_sll (vector bool char, vector unsigned char);
-vector float vec_andc (vector float, vector float);
-vector float vec_andc (vector float, vector bool int);
-vector float vec_andc (vector
bool int, vector float);
-vector bool int vec_andc (vector bool int, vector bool int);
-vector signed int vec_andc (vector bool int, vector signed int);
-vector signed int vec_andc (vector signed int, vector bool int);
-vector signed int vec_andc (vector signed int, vector signed int);
-vector unsigned int vec_andc (vector bool int, vector unsigned int);
-vector unsigned int vec_andc (vector unsigned int, vector bool int);
-vector unsigned int vec_andc (vector unsigned int, vector unsigned int);
-vector bool short vec_andc (vector bool short, vector bool short);
-vector signed short vec_andc (vector bool short, vector signed short);
-vector signed short vec_andc (vector signed short, vector bool short);
-vector signed short vec_andc (vector signed short, vector signed short);
-vector unsigned short vec_andc (vector bool short,
-    vector unsigned short);
-vector unsigned short vec_andc (vector unsigned short,
-    vector bool short);
-vector unsigned short vec_andc (vector unsigned short,
-    vector unsigned short);
-vector signed char vec_andc (vector bool char, vector signed char);
-vector bool char vec_andc (vector bool char, vector bool char);
-vector signed char vec_andc (vector signed char, vector bool char);
-vector signed char vec_andc (vector signed char, vector signed char);
-vector unsigned char vec_andc (vector bool char, vector unsigned char);
-vector unsigned char vec_andc (vector unsigned char, vector bool char);
-vector unsigned char vec_andc (vector unsigned char,
+vector float vec_slo (vector float, vector signed char);
+vector float vec_slo (vector float, vector unsigned char);
+vector signed int vec_slo (vector signed int, vector signed char);
+vector signed int vec_slo (vector signed int, vector unsigned char);
+vector unsigned int vec_slo (vector unsigned int, vector signed char);
+vector unsigned int vec_slo (vector unsigned int, vector unsigned char);
+vector signed short vec_slo (vector signed short, vector signed char);
+vector signed short
vec_slo (vector signed short, vector unsigned char);
+vector unsigned short vec_slo (vector unsigned short,
+    vector signed char);
+vector unsigned short vec_slo (vector unsigned short, vector unsigned char);
-
-vector unsigned char vec_avg (vector unsigned char,
+vector pixel vec_slo (vector pixel, vector signed char);
+vector pixel vec_slo (vector pixel, vector unsigned char);
+vector signed char vec_slo (vector signed char, vector signed char);
+vector signed char vec_slo (vector signed char, vector unsigned char);
+vector unsigned char vec_slo (vector unsigned char, vector signed char);
+vector unsigned char vec_slo (vector unsigned char, vector unsigned char);
-vector signed char vec_avg (vector signed char, vector signed char);
-vector unsigned short vec_avg (vector unsigned short,
-    vector unsigned short);
-vector signed short vec_avg (vector signed short, vector signed short);
-vector unsigned int vec_avg (vector unsigned int, vector unsigned int);
-vector signed int vec_avg (vector signed int, vector signed int);
-vector signed int vec_vavgsw (vector signed int, vector signed int);
+vector signed char vec_splat (vector signed char, const int);
+vector unsigned char vec_splat (vector unsigned char, const int);
+vector bool char vec_splat (vector bool char, const int);
+vector signed short vec_splat (vector signed short, const int);
+vector unsigned short vec_splat (vector unsigned short, const int);
+vector bool short vec_splat (vector bool short, const int);
+vector pixel vec_splat (vector pixel, const int);
+vector float vec_splat (vector float, const int);
+vector signed int vec_splat (vector signed int, const int);
+vector unsigned int vec_splat (vector unsigned int, const int);
+vector bool int vec_splat (vector bool int, const int);
+vector signed long vec_splat (vector signed long, const int);
+vector unsigned long vec_splat (vector unsigned long, const int);
-vector unsigned int vec_vavguw (vector unsigned int,
-    vector unsigned int);
+vector signed
char vec_splats (signed char);
+vector unsigned char vec_splats (unsigned char);
+vector signed short vec_splats (signed short);
+vector unsigned short vec_splats (unsigned short);
+vector signed int vec_splats (signed int);
+vector unsigned int vec_splats (unsigned int);
+vector float vec_splats (float);
-vector signed short vec_vavgsh (vector signed short,
-    vector signed short);
+vector float vec_vspltw (vector float, const int);
+vector signed int vec_vspltw (vector signed int, const int);
+vector unsigned int vec_vspltw (vector unsigned int, const int);
+vector bool int vec_vspltw (vector bool int, const int);
-vector unsigned short vec_vavguh (vector unsigned short,
-    vector unsigned short);
+vector bool short vec_vsplth (vector bool short, const int);
+vector signed short vec_vsplth (vector signed short, const int);
+vector unsigned short vec_vsplth (vector unsigned short, const int);
+vector pixel vec_vsplth (vector pixel, const int);
-vector signed char vec_vavgsb (vector signed char, vector signed char);
+vector signed char vec_vspltb (vector signed char, const int);
+vector unsigned char vec_vspltb (vector unsigned char, const int);
+vector bool char vec_vspltb (vector bool char, const int);
-vector unsigned char vec_vavgub (vector unsigned char,
-    vector unsigned char);
+vector signed char vec_splat_s8 (const int);
-vector float vec_copysign (vector float);
+vector signed short vec_splat_s16 (const int);
-vector float vec_ceil (vector float);
+vector signed int vec_splat_s32 (const int);
-vector signed int vec_cmpb (vector float, vector float);
+vector unsigned char vec_splat_u8 (const int);
-vector bool char vec_cmpeq (vector signed char, vector signed char);
-vector bool char vec_cmpeq (vector unsigned char, vector unsigned char);
-vector bool short vec_cmpeq (vector signed short, vector signed short);
-vector bool short vec_cmpeq (vector unsigned short,
-    vector unsigned short);
-vector bool int vec_cmpeq (vector signed int, vector signed int);
-vector bool int vec_cmpeq (vector unsigned int, vector unsigned int);
-vector bool int vec_cmpeq (vector float, vector float);
+vector unsigned short vec_splat_u16 (const int);
-vector bool int vec_vcmpeqfp (vector float, vector float);
+vector unsigned int vec_splat_u32 (const int);
-vector bool int vec_vcmpequw (vector signed int, vector signed int);
-vector bool int vec_vcmpequw (vector unsigned int, vector unsigned int);
+vector signed char vec_sr (vector signed char, vector unsigned char);
+vector unsigned char vec_sr (vector unsigned char,
+    vector unsigned char);
+vector signed short vec_sr (vector signed short,
+    vector unsigned short);
+vector unsigned short vec_sr (vector unsigned short,
+    vector unsigned short);
+vector signed int vec_sr (vector signed int, vector unsigned int);
+vector unsigned int vec_sr (vector unsigned int, vector unsigned int);
-vector bool short vec_vcmpequh (vector signed short,
-    vector signed short);
-vector bool short vec_vcmpequh (vector unsigned short,
+vector signed int vec_vsrw (vector signed int, vector unsigned int);
+vector unsigned int vec_vsrw (vector unsigned int, vector unsigned int);
+
+vector signed short vec_vsrh (vector signed short,
+    vector unsigned short);
+vector unsigned short vec_vsrh (vector unsigned short, vector unsigned short);
-vector bool char vec_vcmpequb (vector signed char, vector signed char);
-vector bool char vec_vcmpequb (vector unsigned char,
+vector signed char vec_vsrb (vector signed char, vector unsigned char);
+vector unsigned char vec_vsrb (vector unsigned char, vector unsigned char);
-vector bool int vec_cmpge (vector float, vector float);
-
-vector bool char vec_cmpgt (vector unsigned char, vector unsigned char);
-vector bool char vec_cmpgt (vector signed char, vector signed char);
-vector bool short vec_cmpgt (vector unsigned short,
+vector signed char vec_sra (vector signed char, vector unsigned char);
+vector unsigned char vec_sra (vector unsigned char,
+    vector unsigned char);
+vector signed short vec_sra (vector signed short, vector unsigned short);
-vector bool short vec_cmpgt (vector signed short, vector signed short);
-vector bool int vec_cmpgt (vector unsigned int, vector unsigned int);
-vector bool int vec_cmpgt (vector signed int, vector signed int);
-vector bool int vec_cmpgt (vector float, vector float);
-
-vector bool int vec_vcmpgtfp (vector float, vector float);
-
-vector bool int vec_vcmpgtsw (vector signed int, vector signed int);
-
-vector bool int vec_vcmpgtuw (vector unsigned int, vector unsigned int);
-
-vector bool short vec_vcmpgtsh (vector signed short,
-    vector signed short);
-
-vector bool short vec_vcmpgtuh (vector unsigned short,
-    vector unsigned short);
+vector unsigned short vec_sra (vector unsigned short,
+    vector unsigned short);
+vector signed int vec_sra (vector signed int, vector unsigned int);
+vector unsigned int vec_sra (vector unsigned int, vector unsigned int);
-vector bool char vec_vcmpgtsb (vector signed char, vector signed char);
+vector signed int vec_vsraw (vector signed int, vector unsigned int);
+vector unsigned int vec_vsraw (vector unsigned int,
+    vector unsigned int);
-vector bool char vec_vcmpgtub (vector unsigned char,
-    vector unsigned char);
+vector signed short vec_vsrah (vector signed short,
+    vector unsigned short);
+vector unsigned short vec_vsrah (vector unsigned short,
+    vector unsigned short);
-vector bool int vec_cmple (vector float, vector float);
+vector signed char vec_vsrab (vector signed char, vector unsigned char);
+vector unsigned char vec_vsrab (vector unsigned char,
+    vector unsigned char);
-vector bool char vec_cmplt (vector unsigned char, vector unsigned char);
-vector bool char vec_cmplt (vector signed char, vector signed char);
-vector bool short vec_cmplt (vector unsigned short,
+vector signed int vec_srl (vector signed int, vector unsigned int);
+vector signed int vec_srl (vector signed int, vector unsigned short);
+vector signed int vec_srl (vector signed int,
vector unsigned char);
+vector unsigned int vec_srl (vector unsigned int, vector unsigned int);
+vector unsigned int vec_srl (vector unsigned int, vector unsigned short);
-vector bool short vec_cmplt (vector signed short, vector signed short);
-vector bool int vec_cmplt (vector unsigned int, vector unsigned int);
-vector bool int vec_cmplt (vector signed int, vector signed int);
-vector bool int vec_cmplt (vector float, vector float);
-
-vector float vec_cpsgn (vector float, vector float);
+vector unsigned int vec_srl (vector unsigned int, vector unsigned char);
+vector bool int vec_srl (vector bool int, vector unsigned int);
+vector bool int vec_srl (vector bool int, vector unsigned short);
+vector bool int vec_srl (vector bool int, vector unsigned char);
+vector signed short vec_srl (vector signed short, vector unsigned int);
+vector signed short vec_srl (vector signed short,
+    vector unsigned short);
+vector signed short vec_srl (vector signed short, vector unsigned char);
+vector unsigned short vec_srl (vector unsigned short,
+    vector unsigned int);
+vector unsigned short vec_srl (vector unsigned short,
+    vector unsigned short);
+vector unsigned short vec_srl (vector unsigned short,
+    vector unsigned char);
+vector bool short vec_srl (vector bool short, vector unsigned int);
+vector bool short vec_srl (vector bool short, vector unsigned short);
+vector bool short vec_srl (vector bool short, vector unsigned char);
+vector pixel vec_srl (vector pixel, vector unsigned int);
+vector pixel vec_srl (vector pixel, vector unsigned short);
+vector pixel vec_srl (vector pixel, vector unsigned char);
+vector signed char vec_srl (vector signed char, vector unsigned int);
+vector signed char vec_srl (vector signed char, vector unsigned short);
+vector signed char vec_srl (vector signed char, vector unsigned char);
+vector unsigned char vec_srl (vector unsigned char,
+    vector unsigned int);
+vector unsigned char vec_srl (vector unsigned char,
+    vector unsigned short);
+vector unsigned char vec_srl (vector unsigned char,
+    vector unsigned char);
+vector bool char vec_srl (vector bool char, vector unsigned int);
+vector bool char vec_srl (vector bool char, vector unsigned short);
+vector bool char vec_srl (vector bool char, vector unsigned char);
-vector float vec_ctf (vector unsigned int, const int);
-vector float vec_ctf (vector signed int, const int);
-vector double vec_ctf (vector unsigned long, const int);
-vector double vec_ctf (vector signed long, const int);
+vector float vec_sro (vector float, vector signed char);
+vector float vec_sro (vector float, vector unsigned char);
+vector signed int vec_sro (vector signed int, vector signed char);
+vector signed int vec_sro (vector signed int, vector unsigned char);
+vector unsigned int vec_sro (vector unsigned int, vector signed char);
+vector unsigned int vec_sro (vector unsigned int, vector unsigned char);
+vector signed short vec_sro (vector signed short, vector signed char);
+vector signed short vec_sro (vector signed short, vector unsigned char);
+vector unsigned short vec_sro (vector unsigned short,
+    vector signed char);
+vector unsigned short vec_sro (vector unsigned short,
+    vector unsigned char);
+vector pixel vec_sro (vector pixel, vector signed char);
+vector pixel vec_sro (vector pixel, vector unsigned char);
+vector signed char vec_sro (vector signed char, vector signed char);
+vector signed char vec_sro (vector signed char, vector unsigned char);
+vector unsigned char vec_sro (vector unsigned char, vector signed char);
+vector unsigned char vec_sro (vector unsigned char,
+    vector unsigned char);
-vector float vec_vcfsx (vector signed int, const int);
+void vec_st (vector float, int, vector float *);
+void vec_st (vector float, int, float *);
+void vec_st (vector signed int, int, vector signed int *);
+void vec_st (vector signed int, int, int *);
+void vec_st (vector unsigned int, int, vector unsigned int *);
+void vec_st (vector unsigned int, int, unsigned int *);
+void vec_st (vector bool int, int, vector bool int *);
+void vec_st (vector bool int, int, unsigned int *);
+void vec_st (vector bool int, int, int *);
+void vec_st (vector signed short, int, vector signed short *);
+void vec_st (vector signed short, int, short *);
+void vec_st (vector unsigned short, int, vector unsigned short *);
+void vec_st (vector unsigned short, int, unsigned short *);
+void vec_st (vector bool short, int, vector bool short *);
+void vec_st (vector bool short, int, unsigned short *);
+void vec_st (vector pixel, int, vector pixel *);
+void vec_st (vector pixel, int, unsigned short *);
+void vec_st (vector pixel, int, short *);
+void vec_st (vector bool short, int, short *);
+void vec_st (vector signed char, int, vector signed char *);
+void vec_st (vector signed char, int, signed char *);
+void vec_st (vector unsigned char, int, vector unsigned char *);
+void vec_st (vector unsigned char, int, unsigned char *);
+void vec_st (vector bool char, int, vector bool char *);
+void vec_st (vector bool char, int, unsigned char *);
+void vec_st (vector bool char, int, signed char *);
-vector float vec_vcfux (vector unsigned int, const int);
+void vec_ste (vector signed char, int, signed char *);
+void vec_ste (vector unsigned char, int, unsigned char *);
+void vec_ste (vector bool char, int, signed char *);
+void vec_ste (vector bool char, int, unsigned char *);
+void vec_ste (vector signed short, int, short *);
+void vec_ste (vector unsigned short, int, unsigned short *);
+void vec_ste (vector bool short, int, short *);
+void vec_ste (vector bool short, int, unsigned short *);
+void vec_ste (vector pixel, int, short *);
+void vec_ste (vector pixel, int, unsigned short *);
+void vec_ste (vector float, int, float *);
+void vec_ste (vector signed int, int, int *);
+void vec_ste (vector unsigned int, int, unsigned int *);
+void vec_ste (vector bool int, int, int *);
+void vec_ste (vector bool int, int, unsigned int *);
-vector signed int vec_cts (vector
float, const int);
-vector signed long vec_cts (vector double, const int);
+void vec_stvewx (vector float, int, float *);
+void vec_stvewx (vector signed int, int, int *);
+void vec_stvewx (vector unsigned int, int, unsigned int *);
+void vec_stvewx (vector bool int, int, int *);
+void vec_stvewx (vector bool int, int, unsigned int *);
-vector unsigned int vec_ctu (vector float, const int);
-vector unsigned long vec_ctu (vector double, const int);
+void vec_stvehx (vector signed short, int, short *);
+void vec_stvehx (vector unsigned short, int, unsigned short *);
+void vec_stvehx (vector bool short, int, short *);
+void vec_stvehx (vector bool short, int, unsigned short *);
+void vec_stvehx (vector pixel, int, short *);
+void vec_stvehx (vector pixel, int, unsigned short *);
-void vec_dss (const int);
+void vec_stvebx (vector signed char, int, signed char *);
+void vec_stvebx (vector unsigned char, int, unsigned char *);
+void vec_stvebx (vector bool char, int, signed char *);
+void vec_stvebx (vector bool char, int, unsigned char *);
-void vec_dssall (void);
+void vec_stl (vector float, int, vector float *);
+void vec_stl (vector float, int, float *);
+void vec_stl (vector signed int, int, vector signed int *);
+void vec_stl (vector signed int, int, int *);
+void vec_stl (vector unsigned int, int, vector unsigned int *);
+void vec_stl (vector unsigned int, int, unsigned int *);
+void vec_stl (vector bool int, int, vector bool int *);
+void vec_stl (vector bool int, int, unsigned int *);
+void vec_stl (vector bool int, int, int *);
+void vec_stl (vector signed short, int, vector signed short *);
+void vec_stl (vector signed short, int, short *);
+void vec_stl (vector unsigned short, int, vector unsigned short *);
+void vec_stl (vector unsigned short, int, unsigned short *);
+void vec_stl (vector bool short, int, vector bool short *);
+void vec_stl (vector bool short, int, unsigned short *);
+void vec_stl (vector bool short, int, short *);
+void vec_stl (vector
pixel, int, vector pixel *);
+void vec_stl (vector pixel, int, unsigned short *);
+void vec_stl (vector pixel, int, short *);
+void vec_stl (vector signed char, int, vector signed char *);
+void vec_stl (vector signed char, int, signed char *);
+void vec_stl (vector unsigned char, int, vector unsigned char *);
+void vec_stl (vector unsigned char, int, unsigned char *);
+void vec_stl (vector bool char, int, vector bool char *);
+void vec_stl (vector bool char, int, unsigned char *);
+void vec_stl (vector bool char, int, signed char *);
-void vec_dst (const vector unsigned char *, int, const int);
-void vec_dst (const vector signed char *, int, const int);
-void vec_dst (const vector bool char *, int, const int);
-void vec_dst (const vector unsigned short *, int, const int);
-void vec_dst (const vector signed short *, int, const int);
-void vec_dst (const vector bool short *, int, const int);
-void vec_dst (const vector pixel *, int, const int);
-void vec_dst (const vector unsigned int *, int, const int);
-void vec_dst (const vector signed int *, int, const int);
-void vec_dst (const vector bool int *, int, const int);
-void vec_dst (const vector float *, int, const int);
-void vec_dst (const unsigned char *, int, const int);
-void vec_dst (const signed char *, int, const int);
-void vec_dst (const unsigned short *, int, const int);
-void vec_dst (const short *, int, const int);
-void vec_dst (const unsigned int *, int, const int);
-void vec_dst (const int *, int, const int);
-void vec_dst (const unsigned long *, int, const int);
-void vec_dst (const long *, int, const int);
-void vec_dst (const float *, int, const int);
+vector signed char vec_sub (vector bool char, vector signed char);
+vector signed char vec_sub (vector signed char, vector bool char);
+vector signed char vec_sub (vector signed char, vector signed char);
+vector unsigned char vec_sub (vector bool char, vector unsigned char);
+vector unsigned char vec_sub (vector unsigned char, vector bool char);
+vector unsigned char vec_sub (vector unsigned char,
+    vector unsigned char);
+vector signed short vec_sub (vector bool short, vector signed short);
+vector signed short vec_sub (vector signed short, vector bool short);
+vector signed short vec_sub (vector signed short, vector signed short);
+vector unsigned short vec_sub (vector bool short,
+    vector unsigned short);
+vector unsigned short vec_sub (vector unsigned short,
+    vector bool short);
+vector unsigned short vec_sub (vector unsigned short,
+    vector unsigned short);
+vector signed int vec_sub (vector bool int, vector signed int);
+vector signed int vec_sub (vector signed int, vector bool int);
+vector signed int vec_sub (vector signed int, vector signed int);
+vector unsigned int vec_sub (vector bool int, vector unsigned int);
+vector unsigned int vec_sub (vector unsigned int, vector bool int);
+vector unsigned int vec_sub (vector unsigned int, vector unsigned int);
+vector float vec_sub (vector float, vector float);
-void vec_dstst (const vector unsigned char *, int, const int);
-void vec_dstst (const vector signed char *, int, const int);
-void vec_dstst (const vector bool char *, int, const int);
-void vec_dstst (const vector unsigned short *, int, const int);
-void vec_dstst (const vector signed short *, int, const int);
-void vec_dstst (const vector bool short *, int, const int);
-void vec_dstst (const vector pixel *, int, const int);
-void vec_dstst (const vector unsigned int *, int, const int);
-void vec_dstst (const vector signed int *, int, const int);
-void vec_dstst (const vector bool int *, int, const int);
-void vec_dstst (const vector float *, int, const int);
-void vec_dstst (const unsigned char *, int, const int);
-void vec_dstst (const signed char *, int, const int);
-void vec_dstst (const unsigned short *, int, const int);
-void vec_dstst (const short *, int, const int);
-void vec_dstst (const unsigned int *, int, const int);
-void vec_dstst (const int *, int, const int);
-void vec_dstst
(const unsigned long *, int, const int);
-void vec_dstst (const long *, int, const int);
-void vec_dstst (const float *, int, const int);
+vector float vec_vsubfp (vector float, vector float);
-void vec_dststt (const vector unsigned char *, int, const int);
-void vec_dststt (const vector signed char *, int, const int);
-void vec_dststt (const vector bool char *, int, const int);
-void vec_dststt (const vector unsigned short *, int, const int);
-void vec_dststt (const vector signed short *, int, const int);
-void vec_dststt (const vector bool short *, int, const int);
-void vec_dststt (const vector pixel *, int, const int);
-void vec_dststt (const vector unsigned int *, int, const int);
-void vec_dststt (const vector signed int *, int, const int);
-void vec_dststt (const vector bool int *, int, const int);
-void vec_dststt (const vector float *, int, const int);
-void vec_dststt (const unsigned char *, int, const int);
-void vec_dststt (const signed char *, int, const int);
-void vec_dststt (const unsigned short *, int, const int);
-void vec_dststt (const short *, int, const int);
-void vec_dststt (const unsigned int *, int, const int);
-void vec_dststt (const int *, int, const int);
-void vec_dststt (const unsigned long *, int, const int);
-void vec_dststt (const long *, int, const int);
-void vec_dststt (const float *, int, const int);
+vector signed int vec_vsubuwm (vector bool int, vector signed int);
+vector signed int vec_vsubuwm (vector signed int, vector bool int);
+vector signed int vec_vsubuwm (vector signed int, vector signed int);
+vector unsigned int vec_vsubuwm (vector bool int, vector unsigned int);
+vector unsigned int vec_vsubuwm (vector unsigned int, vector bool int);
+vector unsigned int vec_vsubuwm (vector unsigned int,
+    vector unsigned int);
-void vec_dstt (const vector unsigned char *, int, const int);
-void vec_dstt (const vector signed char *, int, const int);
-void vec_dstt (const vector bool char *, int, const int);
-void vec_dstt (const
vector unsigned short *, int, const int);
-void vec_dstt (const vector signed short *, int, const int);
-void vec_dstt (const vector bool short *, int, const int);
-void vec_dstt (const vector pixel *, int, const int);
-void vec_dstt (const vector unsigned int *, int, const int);
-void vec_dstt (const vector signed int *, int, const int);
-void vec_dstt (const vector bool int *, int, const int);
-void vec_dstt (const vector float *, int, const int);
-void vec_dstt (const unsigned char *, int, const int);
-void vec_dstt (const signed char *, int, const int);
-void vec_dstt (const unsigned short *, int, const int);
-void vec_dstt (const short *, int, const int);
-void vec_dstt (const unsigned int *, int, const int);
-void vec_dstt (const int *, int, const int);
-void vec_dstt (const unsigned long *, int, const int);
-void vec_dstt (const long *, int, const int);
-void vec_dstt (const float *, int, const int);
+vector signed short vec_vsubuhm (vector bool short,
+    vector signed short);
+vector signed short vec_vsubuhm (vector signed short,
+    vector bool short);
+vector signed short vec_vsubuhm (vector signed short,
+    vector signed short);
+vector unsigned short vec_vsubuhm (vector bool short,
+    vector unsigned short);
+vector unsigned short vec_vsubuhm (vector unsigned short,
+    vector bool short);
+vector unsigned short vec_vsubuhm (vector unsigned short,
+    vector unsigned short);
-vector float vec_expte (vector float);
+vector signed char vec_vsububm (vector bool char, vector signed char);
+vector signed char vec_vsububm (vector signed char, vector bool char);
+vector signed char vec_vsububm (vector signed char, vector signed char);
+vector unsigned char vec_vsububm (vector bool char,
+    vector unsigned char);
+vector unsigned char vec_vsububm (vector unsigned char,
+    vector bool char);
+vector unsigned char vec_vsububm (vector unsigned char,
+    vector unsigned char);
-vector float vec_floor (vector float);
+vector unsigned int vec_subc (vector unsigned int, vector
unsigned int);
-vector float vec_ld (int, const vector float *);
-vector float vec_ld (int, const float *);
-vector bool int vec_ld (int, const vector bool int *);
-vector signed int vec_ld (int, const vector signed int *);
-vector signed int vec_ld (int, const int *);
-vector signed int vec_ld (int, const long *);
-vector unsigned int vec_ld (int, const vector unsigned int *);
-vector unsigned int vec_ld (int, const unsigned int *);
-vector unsigned int vec_ld (int, const unsigned long *);
-vector bool short vec_ld (int, const vector bool short *);
-vector pixel vec_ld (int, const vector pixel *);
-vector signed short vec_ld (int, const vector signed short *);
-vector signed short vec_ld (int, const short *);
-vector unsigned short vec_ld (int, const vector unsigned short *);
-vector unsigned short vec_ld (int, const unsigned short *);
-vector bool char vec_ld (int, const vector bool char *);
-vector signed char vec_ld (int, const vector signed char *);
-vector signed char vec_ld (int, const signed char *);
-vector unsigned char vec_ld (int, const vector unsigned char *);
-vector unsigned char vec_ld (int, const unsigned char *);
+vector unsigned char vec_subs (vector bool char, vector unsigned char);
+vector unsigned char vec_subs (vector unsigned char, vector bool char);
+vector unsigned char vec_subs (vector unsigned char,
+    vector unsigned char);
+vector signed char vec_subs (vector bool char, vector signed char);
+vector signed char vec_subs (vector signed char, vector bool char);
+vector signed char vec_subs (vector signed char, vector signed char);
+vector unsigned short vec_subs (vector bool short,
+    vector unsigned short);
+vector unsigned short vec_subs (vector unsigned short,
+    vector bool short);
+vector unsigned short vec_subs (vector unsigned short,
+    vector unsigned short);
+vector signed short vec_subs (vector bool short, vector signed short);
+vector signed short vec_subs (vector signed short, vector bool short);
+vector signed short vec_subs
(vector signed short, vector signed short); +vector unsigned int vec_subs (vector bool int, vector unsigned int); +vector unsigned int vec_subs (vector unsigned int, vector bool int); +vector unsigned int vec_subs (vector unsigned int, vector unsigned int); +vector signed int vec_subs (vector bool int, vector signed int); +vector signed int vec_subs (vector signed int, vector bool int); +vector signed int vec_subs (vector signed int, vector signed int); -vector signed char vec_lde (int, const signed char *); -vector unsigned char vec_lde (int, const unsigned char *); -vector signed short vec_lde (int, const short *); -vector unsigned short vec_lde (int, const unsigned short *); -vector float vec_lde (int, const float *); -vector signed int vec_lde (int, const int *); -vector unsigned int vec_lde (int, const unsigned int *); -vector signed int vec_lde (int, const long *); -vector unsigned int vec_lde (int, const unsigned long *); +vector signed int vec_vsubsws (vector bool int, vector signed int); +vector signed int vec_vsubsws (vector signed int, vector bool int); +vector signed int vec_vsubsws (vector signed int, vector signed int); -vector float vec_lvewx (int, float *); -vector signed int vec_lvewx (int, int *); -vector unsigned int vec_lvewx (int, unsigned int *); -vector signed int vec_lvewx (int, long *); -vector unsigned int vec_lvewx (int, unsigned long *); +vector unsigned int vec_vsubuws (vector bool int, vector unsigned int); +vector unsigned int vec_vsubuws (vector unsigned int, vector bool int); +vector unsigned int vec_vsubuws (vector unsigned int, + vector unsigned int); -vector signed short vec_lvehx (int, short *); -vector unsigned short vec_lvehx (int, unsigned short *); +vector signed short vec_vsubshs (vector bool short, + vector signed short); +vector signed short vec_vsubshs (vector signed short, + vector bool short); +vector signed short vec_vsubshs (vector signed short, + vector signed short); -vector signed char vec_lvebx (int, char *); 
-vector unsigned char vec_lvebx (int, unsigned char *); +vector unsigned short vec_vsubuhs (vector bool short, + vector unsigned short); +vector unsigned short vec_vsubuhs (vector unsigned short, + vector bool short); +vector unsigned short vec_vsubuhs (vector unsigned short, + vector unsigned short); -vector float vec_ldl (int, const vector float *); -vector float vec_ldl (int, const float *); -vector bool int vec_ldl (int, const vector bool int *); -vector signed int vec_ldl (int, const vector signed int *); -vector signed int vec_ldl (int, const int *); -vector signed int vec_ldl (int, const long *); -vector unsigned int vec_ldl (int, const vector unsigned int *); -vector unsigned int vec_ldl (int, const unsigned int *); -vector unsigned int vec_ldl (int, const unsigned long *); -vector bool short vec_ldl (int, const vector bool short *); -vector pixel vec_ldl (int, const vector pixel *); -vector signed short vec_ldl (int, const vector signed short *); -vector signed short vec_ldl (int, const short *); -vector unsigned short vec_ldl (int, const vector unsigned short *); -vector unsigned short vec_ldl (int, const unsigned short *); -vector bool char vec_ldl (int, const vector bool char *); -vector signed char vec_ldl (int, const vector signed char *); -vector signed char vec_ldl (int, const signed char *); -vector unsigned char vec_ldl (int, const vector unsigned char *); -vector unsigned char vec_ldl (int, const unsigned char *); +vector signed char vec_vsubsbs (vector bool char, vector signed char); +vector signed char vec_vsubsbs (vector signed char, vector bool char); +vector signed char vec_vsubsbs (vector signed char, vector signed char); -vector float vec_loge (vector float); +vector unsigned char vec_vsububs (vector bool char, + vector unsigned char); +vector unsigned char vec_vsububs (vector unsigned char, + vector bool char); +vector unsigned char vec_vsububs (vector unsigned char, + vector unsigned char); -vector unsigned char vec_lvsl (int, const 
volatile unsigned char *); -vector unsigned char vec_lvsl (int, const volatile signed char *); -vector unsigned char vec_lvsl (int, const volatile unsigned short *); -vector unsigned char vec_lvsl (int, const volatile short *); -vector unsigned char vec_lvsl (int, const volatile unsigned int *); -vector unsigned char vec_lvsl (int, const volatile int *); -vector unsigned char vec_lvsl (int, const volatile unsigned long *); -vector unsigned char vec_lvsl (int, const volatile long *); -vector unsigned char vec_lvsl (int, const volatile float *); +vector unsigned int vec_sum4s (vector unsigned char, + vector unsigned int); +vector signed int vec_sum4s (vector signed char, vector signed int); +vector signed int vec_sum4s (vector signed short, vector signed int); -vector unsigned char vec_lvsr (int, const volatile unsigned char *); -vector unsigned char vec_lvsr (int, const volatile signed char *); -vector unsigned char vec_lvsr (int, const volatile unsigned short *); -vector unsigned char vec_lvsr (int, const volatile short *); -vector unsigned char vec_lvsr (int, const volatile unsigned int *); -vector unsigned char vec_lvsr (int, const volatile int *); -vector unsigned char vec_lvsr (int, const volatile unsigned long *); -vector unsigned char vec_lvsr (int, const volatile long *); -vector unsigned char vec_lvsr (int, const volatile float *); +vector signed int vec_vsum4shs (vector signed short, vector signed int); -vector float vec_madd (vector float, vector float, vector float); +vector signed int vec_vsum4sbs (vector signed char, vector signed int); -vector signed short vec_madds (vector signed short, - vector signed short, - vector signed short); +vector unsigned int vec_vsum4ubs (vector unsigned char, + vector unsigned int); -vector unsigned char vec_max (vector bool char, vector unsigned char); -vector unsigned char vec_max (vector unsigned char, vector bool char); -vector unsigned char vec_max (vector unsigned char, - vector unsigned char); -vector signed char 
vec_max (vector bool char, vector signed char); -vector signed char vec_max (vector signed char, vector bool char); -vector signed char vec_max (vector signed char, vector signed char); -vector unsigned short vec_max (vector bool short, - vector unsigned short); -vector unsigned short vec_max (vector unsigned short, - vector bool short); -vector unsigned short vec_max (vector unsigned short, - vector unsigned short); -vector signed short vec_max (vector bool short, vector signed short); -vector signed short vec_max (vector signed short, vector bool short); -vector signed short vec_max (vector signed short, vector signed short); -vector unsigned int vec_max (vector bool int, vector unsigned int); -vector unsigned int vec_max (vector unsigned int, vector bool int); -vector unsigned int vec_max (vector unsigned int, vector unsigned int); -vector signed int vec_max (vector bool int, vector signed int); -vector signed int vec_max (vector signed int, vector bool int); -vector signed int vec_max (vector signed int, vector signed int); -vector float vec_max (vector float, vector float); +vector signed int vec_sum2s (vector signed int, vector signed int); -vector float vec_vmaxfp (vector float, vector float); +vector signed int vec_sums (vector signed int, vector signed int); -vector signed int vec_vmaxsw (vector bool int, vector signed int); -vector signed int vec_vmaxsw (vector signed int, vector bool int); -vector signed int vec_vmaxsw (vector signed int, vector signed int); +vector float vec_trunc (vector float); -vector unsigned int vec_vmaxuw (vector bool int, vector unsigned int); -vector unsigned int vec_vmaxuw (vector unsigned int, vector bool int); -vector unsigned int vec_vmaxuw (vector unsigned int, - vector unsigned int); +vector signed short vec_unpackh (vector signed char); +vector bool short vec_unpackh (vector bool char); +vector signed int vec_unpackh (vector signed short); +vector bool int vec_unpackh (vector bool short); +vector unsigned int vec_unpackh 
(vector pixel); -vector signed short vec_vmaxsh (vector bool short, vector signed short); -vector signed short vec_vmaxsh (vector signed short, vector bool short); -vector signed short vec_vmaxsh (vector signed short, - vector signed short); +vector bool int vec_vupkhsh (vector bool short); +vector signed int vec_vupkhsh (vector signed short); -vector unsigned short vec_vmaxuh (vector bool short, - vector unsigned short); -vector unsigned short vec_vmaxuh (vector unsigned short, - vector bool short); -vector unsigned short vec_vmaxuh (vector unsigned short, - vector unsigned short); +vector unsigned int vec_vupkhpx (vector pixel); -vector signed char vec_vmaxsb (vector bool char, vector signed char); -vector signed char vec_vmaxsb (vector signed char, vector bool char); -vector signed char vec_vmaxsb (vector signed char, vector signed char); +vector bool short vec_vupkhsb (vector bool char); +vector signed short vec_vupkhsb (vector signed char); -vector unsigned char vec_vmaxub (vector bool char, - vector unsigned char); -vector unsigned char vec_vmaxub (vector unsigned char, - vector bool char); -vector unsigned char vec_vmaxub (vector unsigned char, - vector unsigned char); +vector signed short vec_unpackl (vector signed char); +vector bool short vec_unpackl (vector bool char); +vector unsigned int vec_unpackl (vector pixel); +vector signed int vec_unpackl (vector signed short); +vector bool int vec_unpackl (vector bool short); -vector bool char vec_mergeh (vector bool char, vector bool char); -vector signed char vec_mergeh (vector signed char, vector signed char); -vector unsigned char vec_mergeh (vector unsigned char, - vector unsigned char); -vector bool short vec_mergeh (vector bool short, vector bool short); -vector pixel vec_mergeh (vector pixel, vector pixel); -vector signed short vec_mergeh (vector signed short, - vector signed short); -vector unsigned short vec_mergeh (vector unsigned short, - vector unsigned short); -vector float vec_mergeh (vector 
float, vector float); -vector bool int vec_mergeh (vector bool int, vector bool int); -vector signed int vec_mergeh (vector signed int, vector signed int); -vector unsigned int vec_mergeh (vector unsigned int, - vector unsigned int); +vector unsigned int vec_vupklpx (vector pixel); -vector float vec_vmrghw (vector float, vector float); -vector bool int vec_vmrghw (vector bool int, vector bool int); -vector signed int vec_vmrghw (vector signed int, vector signed int); -vector unsigned int vec_vmrghw (vector unsigned int, - vector unsigned int); +vector bool int vec_vupklsh (vector bool short); +vector signed int vec_vupklsh (vector signed short); -vector bool short vec_vmrghh (vector bool short, vector bool short); -vector signed short vec_vmrghh (vector signed short, - vector signed short); -vector unsigned short vec_vmrghh (vector unsigned short, - vector unsigned short); -vector pixel vec_vmrghh (vector pixel, vector pixel); +vector bool short vec_vupklsb (vector bool char); +vector signed short vec_vupklsb (vector signed char); -vector bool char vec_vmrghb (vector bool char, vector bool char); -vector signed char vec_vmrghb (vector signed char, vector signed char); -vector unsigned char vec_vmrghb (vector unsigned char, - vector unsigned char); +vector float vec_xor (vector float, vector float); +vector float vec_xor (vector float, vector bool int); +vector float vec_xor (vector bool int, vector float); +vector bool int vec_xor (vector bool int, vector bool int); +vector signed int vec_xor (vector bool int, vector signed int); +vector signed int vec_xor (vector signed int, vector bool int); +vector signed int vec_xor (vector signed int, vector signed int); +vector unsigned int vec_xor (vector bool int, vector unsigned int); +vector unsigned int vec_xor (vector unsigned int, vector bool int); +vector unsigned int vec_xor (vector unsigned int, vector unsigned int); +vector bool short vec_xor (vector bool short, vector bool short); +vector signed short vec_xor 
(vector bool short, vector signed short); +vector signed short vec_xor (vector signed short, vector bool short); +vector signed short vec_xor (vector signed short, vector signed short); +vector unsigned short vec_xor (vector bool short, + vector unsigned short); +vector unsigned short vec_xor (vector unsigned short, + vector bool short); +vector unsigned short vec_xor (vector unsigned short, + vector unsigned short); +vector signed char vec_xor (vector bool char, vector signed char); +vector bool char vec_xor (vector bool char, vector bool char); +vector signed char vec_xor (vector signed char, vector bool char); +vector signed char vec_xor (vector signed char, vector signed char); +vector unsigned char vec_xor (vector bool char, vector unsigned char); +vector unsigned char vec_xor (vector unsigned char, vector bool char); +vector unsigned char vec_xor (vector unsigned char, + vector unsigned char); + +int vec_all_eq (vector signed char, vector bool char); +int vec_all_eq (vector signed char, vector signed char); +int vec_all_eq (vector unsigned char, vector bool char); +int vec_all_eq (vector unsigned char, vector unsigned char); +int vec_all_eq (vector bool char, vector bool char); +int vec_all_eq (vector bool char, vector unsigned char); +int vec_all_eq (vector bool char, vector signed char); +int vec_all_eq (vector signed short, vector bool short); +int vec_all_eq (vector signed short, vector signed short); +int vec_all_eq (vector unsigned short, vector bool short); +int vec_all_eq (vector unsigned short, vector unsigned short); +int vec_all_eq (vector bool short, vector bool short); +int vec_all_eq (vector bool short, vector unsigned short); +int vec_all_eq (vector bool short, vector signed short); +int vec_all_eq (vector pixel, vector pixel); +int vec_all_eq (vector signed int, vector bool int); +int vec_all_eq (vector signed int, vector signed int); +int vec_all_eq (vector unsigned int, vector bool int); +int vec_all_eq (vector unsigned int, vector unsigned 
int); +int vec_all_eq (vector bool int, vector bool int); +int vec_all_eq (vector bool int, vector unsigned int); +int vec_all_eq (vector bool int, vector signed int); +int vec_all_eq (vector float, vector float); -vector bool char vec_mergel (vector bool char, vector bool char); -vector signed char vec_mergel (vector signed char, vector signed char); -vector unsigned char vec_mergel (vector unsigned char, - vector unsigned char); -vector bool short vec_mergel (vector bool short, vector bool short); -vector pixel vec_mergel (vector pixel, vector pixel); -vector signed short vec_mergel (vector signed short, - vector signed short); -vector unsigned short vec_mergel (vector unsigned short, - vector unsigned short); -vector float vec_mergel (vector float, vector float); -vector bool int vec_mergel (vector bool int, vector bool int); -vector signed int vec_mergel (vector signed int, vector signed int); -vector unsigned int vec_mergel (vector unsigned int, - vector unsigned int); +int vec_all_ge (vector bool char, vector unsigned char); +int vec_all_ge (vector unsigned char, vector bool char); +int vec_all_ge (vector unsigned char, vector unsigned char); +int vec_all_ge (vector bool char, vector signed char); +int vec_all_ge (vector signed char, vector bool char); +int vec_all_ge (vector signed char, vector signed char); +int vec_all_ge (vector bool short, vector unsigned short); +int vec_all_ge (vector unsigned short, vector bool short); +int vec_all_ge (vector unsigned short, vector unsigned short); +int vec_all_ge (vector signed short, vector signed short); +int vec_all_ge (vector bool short, vector signed short); +int vec_all_ge (vector signed short, vector bool short); +int vec_all_ge (vector bool int, vector unsigned int); +int vec_all_ge (vector unsigned int, vector bool int); +int vec_all_ge (vector unsigned int, vector unsigned int); +int vec_all_ge (vector bool int, vector signed int); +int vec_all_ge (vector signed int, vector bool int); +int vec_all_ge 
(vector signed int, vector signed int); +int vec_all_ge (vector float, vector float); -vector float vec_vmrglw (vector float, vector float); -vector signed int vec_vmrglw (vector signed int, vector signed int); -vector unsigned int vec_vmrglw (vector unsigned int, - vector unsigned int); -vector bool int vec_vmrglw (vector bool int, vector bool int); +int vec_all_gt (vector bool char, vector unsigned char); +int vec_all_gt (vector unsigned char, vector bool char); +int vec_all_gt (vector unsigned char, vector unsigned char); +int vec_all_gt (vector bool char, vector signed char); +int vec_all_gt (vector signed char, vector bool char); +int vec_all_gt (vector signed char, vector signed char); +int vec_all_gt (vector bool short, vector unsigned short); +int vec_all_gt (vector unsigned short, vector bool short); +int vec_all_gt (vector unsigned short, vector unsigned short); +int vec_all_gt (vector bool short, vector signed short); +int vec_all_gt (vector signed short, vector bool short); +int vec_all_gt (vector signed short, vector signed short); +int vec_all_gt (vector bool int, vector unsigned int); +int vec_all_gt (vector unsigned int, vector bool int); +int vec_all_gt (vector unsigned int, vector unsigned int); +int vec_all_gt (vector bool int, vector signed int); +int vec_all_gt (vector signed int, vector bool int); +int vec_all_gt (vector signed int, vector signed int); +int vec_all_gt (vector float, vector float); -vector bool short vec_vmrglh (vector bool short, vector bool short); -vector signed short vec_vmrglh (vector signed short, - vector signed short); -vector unsigned short vec_vmrglh (vector unsigned short, - vector unsigned short); -vector pixel vec_vmrglh (vector pixel, vector pixel); +int vec_all_in (vector float, vector float); -vector bool char vec_vmrglb (vector bool char, vector bool char); -vector signed char vec_vmrglb (vector signed char, vector signed char); -vector unsigned char vec_vmrglb (vector unsigned char, - vector unsigned char); 
+int vec_all_le (vector bool char, vector unsigned char); +int vec_all_le (vector unsigned char, vector bool char); +int vec_all_le (vector unsigned char, vector unsigned char); +int vec_all_le (vector bool char, vector signed char); +int vec_all_le (vector signed char, vector bool char); +int vec_all_le (vector signed char, vector signed char); +int vec_all_le (vector bool short, vector unsigned short); +int vec_all_le (vector unsigned short, vector bool short); +int vec_all_le (vector unsigned short, vector unsigned short); +int vec_all_le (vector bool short, vector signed short); +int vec_all_le (vector signed short, vector bool short); +int vec_all_le (vector signed short, vector signed short); +int vec_all_le (vector bool int, vector unsigned int); +int vec_all_le (vector unsigned int, vector bool int); +int vec_all_le (vector unsigned int, vector unsigned int); +int vec_all_le (vector bool int, vector signed int); +int vec_all_le (vector signed int, vector bool int); +int vec_all_le (vector signed int, vector signed int); +int vec_all_le (vector float, vector float); -vector unsigned short vec_mfvscr (void); +int vec_all_lt (vector bool char, vector unsigned char); +int vec_all_lt (vector unsigned char, vector bool char); +int vec_all_lt (vector unsigned char, vector unsigned char); +int vec_all_lt (vector bool char, vector signed char); +int vec_all_lt (vector signed char, vector bool char); +int vec_all_lt (vector signed char, vector signed char); +int vec_all_lt (vector bool short, vector unsigned short); +int vec_all_lt (vector unsigned short, vector bool short); +int vec_all_lt (vector unsigned short, vector unsigned short); +int vec_all_lt (vector bool short, vector signed short); +int vec_all_lt (vector signed short, vector bool short); +int vec_all_lt (vector signed short, vector signed short); +int vec_all_lt (vector bool int, vector unsigned int); +int vec_all_lt (vector unsigned int, vector bool int); +int vec_all_lt (vector unsigned int, vector 
unsigned int); +int vec_all_lt (vector bool int, vector signed int); +int vec_all_lt (vector signed int, vector bool int); +int vec_all_lt (vector signed int, vector signed int); +int vec_all_lt (vector float, vector float); -vector unsigned char vec_min (vector bool char, vector unsigned char); -vector unsigned char vec_min (vector unsigned char, vector bool char); -vector unsigned char vec_min (vector unsigned char, - vector unsigned char); -vector signed char vec_min (vector bool char, vector signed char); -vector signed char vec_min (vector signed char, vector bool char); -vector signed char vec_min (vector signed char, vector signed char); -vector unsigned short vec_min (vector bool short, - vector unsigned short); -vector unsigned short vec_min (vector unsigned short, - vector bool short); -vector unsigned short vec_min (vector unsigned short, - vector unsigned short); -vector signed short vec_min (vector bool short, vector signed short); -vector signed short vec_min (vector signed short, vector bool short); -vector signed short vec_min (vector signed short, vector signed short); -vector unsigned int vec_min (vector bool int, vector unsigned int); -vector unsigned int vec_min (vector unsigned int, vector bool int); -vector unsigned int vec_min (vector unsigned int, vector unsigned int); -vector signed int vec_min (vector bool int, vector signed int); -vector signed int vec_min (vector signed int, vector bool int); -vector signed int vec_min (vector signed int, vector signed int); -vector float vec_min (vector float, vector float); +int vec_all_nan (vector float); -vector float vec_vminfp (vector float, vector float); +int vec_all_ne (vector signed char, vector bool char); +int vec_all_ne (vector signed char, vector signed char); +int vec_all_ne (vector unsigned char, vector bool char); +int vec_all_ne (vector unsigned char, vector unsigned char); +int vec_all_ne (vector bool char, vector bool char); +int vec_all_ne (vector bool char, vector unsigned char); 
+int vec_all_ne (vector bool char, vector signed char); +int vec_all_ne (vector signed short, vector bool short); +int vec_all_ne (vector signed short, vector signed short); +int vec_all_ne (vector unsigned short, vector bool short); +int vec_all_ne (vector unsigned short, vector unsigned short); +int vec_all_ne (vector bool short, vector bool short); +int vec_all_ne (vector bool short, vector unsigned short); +int vec_all_ne (vector bool short, vector signed short); +int vec_all_ne (vector pixel, vector pixel); +int vec_all_ne (vector signed int, vector bool int); +int vec_all_ne (vector signed int, vector signed int); +int vec_all_ne (vector unsigned int, vector bool int); +int vec_all_ne (vector unsigned int, vector unsigned int); +int vec_all_ne (vector bool int, vector bool int); +int vec_all_ne (vector bool int, vector unsigned int); +int vec_all_ne (vector bool int, vector signed int); +int vec_all_ne (vector float, vector float); -vector signed int vec_vminsw (vector bool int, vector signed int); -vector signed int vec_vminsw (vector signed int, vector bool int); -vector signed int vec_vminsw (vector signed int, vector signed int); +int vec_all_nge (vector float, vector float); -vector unsigned int vec_vminuw (vector bool int, vector unsigned int); -vector unsigned int vec_vminuw (vector unsigned int, vector bool int); -vector unsigned int vec_vminuw (vector unsigned int, - vector unsigned int); +int vec_all_ngt (vector float, vector float); -vector signed short vec_vminsh (vector bool short, vector signed short); -vector signed short vec_vminsh (vector signed short, vector bool short); -vector signed short vec_vminsh (vector signed short, - vector signed short); +int vec_all_nle (vector float, vector float); -vector unsigned short vec_vminuh (vector bool short, - vector unsigned short); -vector unsigned short vec_vminuh (vector unsigned short, - vector bool short); -vector unsigned short vec_vminuh (vector unsigned short, - vector unsigned short); +int 
vec_all_nlt (vector float, vector float); -vector signed char vec_vminsb (vector bool char, vector signed char); -vector signed char vec_vminsb (vector signed char, vector bool char); -vector signed char vec_vminsb (vector signed char, vector signed char); +int vec_all_numeric (vector float); -vector unsigned char vec_vminub (vector bool char, - vector unsigned char); -vector unsigned char vec_vminub (vector unsigned char, - vector bool char); -vector unsigned char vec_vminub (vector unsigned char, - vector unsigned char); +int vec_any_eq (vector signed char, vector bool char); +int vec_any_eq (vector signed char, vector signed char); +int vec_any_eq (vector unsigned char, vector bool char); +int vec_any_eq (vector unsigned char, vector unsigned char); +int vec_any_eq (vector bool char, vector bool char); +int vec_any_eq (vector bool char, vector unsigned char); +int vec_any_eq (vector bool char, vector signed char); +int vec_any_eq (vector signed short, vector bool short); +int vec_any_eq (vector signed short, vector signed short); +int vec_any_eq (vector unsigned short, vector bool short); +int vec_any_eq (vector unsigned short, vector unsigned short); +int vec_any_eq (vector bool short, vector bool short); +int vec_any_eq (vector bool short, vector unsigned short); +int vec_any_eq (vector bool short, vector signed short); +int vec_any_eq (vector pixel, vector pixel); +int vec_any_eq (vector signed int, vector bool int); +int vec_any_eq (vector signed int, vector signed int); +int vec_any_eq (vector unsigned int, vector bool int); +int vec_any_eq (vector unsigned int, vector unsigned int); +int vec_any_eq (vector bool int, vector bool int); +int vec_any_eq (vector bool int, vector unsigned int); +int vec_any_eq (vector bool int, vector signed int); +int vec_any_eq (vector float, vector float); -vector signed short vec_mladd (vector signed short, - vector signed short, - vector signed short); -vector signed short vec_mladd (vector signed short, - vector unsigned 
short, - vector unsigned short); -vector signed short vec_mladd (vector unsigned short, - vector signed short, - vector signed short); -vector unsigned short vec_mladd (vector unsigned short, - vector unsigned short, - vector unsigned short); +int vec_any_ge (vector signed char, vector bool char); +int vec_any_ge (vector unsigned char, vector bool char); +int vec_any_ge (vector unsigned char, vector unsigned char); +int vec_any_ge (vector signed char, vector signed char); +int vec_any_ge (vector bool char, vector unsigned char); +int vec_any_ge (vector bool char, vector signed char); +int vec_any_ge (vector unsigned short, vector bool short); +int vec_any_ge (vector unsigned short, vector unsigned short); +int vec_any_ge (vector signed short, vector signed short); +int vec_any_ge (vector signed short, vector bool short); +int vec_any_ge (vector bool short, vector unsigned short); +int vec_any_ge (vector bool short, vector signed short); +int vec_any_ge (vector signed int, vector bool int); +int vec_any_ge (vector unsigned int, vector bool int); +int vec_any_ge (vector unsigned int, vector unsigned int); +int vec_any_ge (vector signed int, vector signed int); +int vec_any_ge (vector bool int, vector unsigned int); +int vec_any_ge (vector bool int, vector signed int); +int vec_any_ge (vector float, vector float); -vector signed short vec_mradds (vector signed short, - vector signed short, - vector signed short); +int vec_any_gt (vector bool char, vector unsigned char); +int vec_any_gt (vector unsigned char, vector bool char); +int vec_any_gt (vector unsigned char, vector unsigned char); +int vec_any_gt (vector bool char, vector signed char); +int vec_any_gt (vector signed char, vector bool char); +int vec_any_gt (vector signed char, vector signed char); +int vec_any_gt (vector bool short, vector unsigned short); +int vec_any_gt (vector unsigned short, vector bool short); +int vec_any_gt (vector unsigned short, vector unsigned short); +int vec_any_gt (vector bool 
short, vector signed short); +int vec_any_gt (vector signed short, vector bool short); +int vec_any_gt (vector signed short, vector signed short); +int vec_any_gt (vector bool int, vector unsigned int); +int vec_any_gt (vector unsigned int, vector bool int); +int vec_any_gt (vector unsigned int, vector unsigned int); +int vec_any_gt (vector bool int, vector signed int); +int vec_any_gt (vector signed int, vector bool int); +int vec_any_gt (vector signed int, vector signed int); +int vec_any_gt (vector float, vector float); -vector unsigned int vec_msum (vector unsigned char, - vector unsigned char, - vector unsigned int); -vector signed int vec_msum (vector signed char, - vector unsigned char, - vector signed int); -vector unsigned int vec_msum (vector unsigned short, - vector unsigned short, - vector unsigned int); -vector signed int vec_msum (vector signed short, - vector signed short, - vector signed int); +int vec_any_le (vector bool char, vector unsigned char); +int vec_any_le (vector unsigned char, vector bool char); +int vec_any_le (vector unsigned char, vector unsigned char); +int vec_any_le (vector bool char, vector signed char); +int vec_any_le (vector signed char, vector bool char); +int vec_any_le (vector signed char, vector signed char); +int vec_any_le (vector bool short, vector unsigned short); +int vec_any_le (vector unsigned short, vector bool short); +int vec_any_le (vector unsigned short, vector unsigned short); +int vec_any_le (vector bool short, vector signed short); +int vec_any_le (vector signed short, vector bool short); +int vec_any_le (vector signed short, vector signed short); +int vec_any_le (vector bool int, vector unsigned int); +int vec_any_le (vector unsigned int, vector bool int); +int vec_any_le (vector unsigned int, vector unsigned int); +int vec_any_le (vector bool int, vector signed int); +int vec_any_le (vector signed int, vector bool int); +int vec_any_le (vector signed int, vector signed int); +int vec_any_le (vector float, 
vector float); -vector signed int vec_vmsumshm (vector signed short, - vector signed short, - vector signed int); +int vec_any_lt (vector bool char, vector unsigned char); +int vec_any_lt (vector unsigned char, vector bool char); +int vec_any_lt (vector unsigned char, vector unsigned char); +int vec_any_lt (vector bool char, vector signed char); +int vec_any_lt (vector signed char, vector bool char); +int vec_any_lt (vector signed char, vector signed char); +int vec_any_lt (vector bool short, vector unsigned short); +int vec_any_lt (vector unsigned short, vector bool short); +int vec_any_lt (vector unsigned short, vector unsigned short); +int vec_any_lt (vector bool short, vector signed short); +int vec_any_lt (vector signed short, vector bool short); +int vec_any_lt (vector signed short, vector signed short); +int vec_any_lt (vector bool int, vector unsigned int); +int vec_any_lt (vector unsigned int, vector bool int); +int vec_any_lt (vector unsigned int, vector unsigned int); +int vec_any_lt (vector bool int, vector signed int); +int vec_any_lt (vector signed int, vector bool int); +int vec_any_lt (vector signed int, vector signed int); +int vec_any_lt (vector float, vector float); -vector unsigned int vec_vmsumuhm (vector unsigned short, - vector unsigned short, - vector unsigned int); +int vec_any_nan (vector float); -vector signed int vec_vmsummbm (vector signed char, - vector unsigned char, - vector signed int); +int vec_any_ne (vector signed char, vector bool char); +int vec_any_ne (vector signed char, vector signed char); +int vec_any_ne (vector unsigned char, vector bool char); +int vec_any_ne (vector unsigned char, vector unsigned char); +int vec_any_ne (vector bool char, vector bool char); +int vec_any_ne (vector bool char, vector unsigned char); +int vec_any_ne (vector bool char, vector signed char); +int vec_any_ne (vector signed short, vector bool short); +int vec_any_ne (vector signed short, vector signed short); +int vec_any_ne (vector unsigned 
short, vector bool short); +int vec_any_ne (vector unsigned short, vector unsigned short); +int vec_any_ne (vector bool short, vector bool short); +int vec_any_ne (vector bool short, vector unsigned short); +int vec_any_ne (vector bool short, vector signed short); +int vec_any_ne (vector pixel, vector pixel); +int vec_any_ne (vector signed int, vector bool int); +int vec_any_ne (vector signed int, vector signed int); +int vec_any_ne (vector unsigned int, vector bool int); +int vec_any_ne (vector unsigned int, vector unsigned int); +int vec_any_ne (vector bool int, vector bool int); +int vec_any_ne (vector bool int, vector unsigned int); +int vec_any_ne (vector bool int, vector signed int); +int vec_any_ne (vector float, vector float); -vector unsigned int vec_vmsumubm (vector unsigned char, - vector unsigned char, - vector unsigned int); +int vec_any_nge (vector float, vector float); -vector unsigned int vec_msums (vector unsigned short, - vector unsigned short, - vector unsigned int); -vector signed int vec_msums (vector signed short, - vector signed short, - vector signed int); +int vec_any_ngt (vector float, vector float); -vector signed int vec_vmsumshs (vector signed short, - vector signed short, - vector signed int); +int vec_any_nle (vector float, vector float); -vector unsigned int vec_vmsumuhs (vector unsigned short, - vector unsigned short, - vector unsigned int); +int vec_any_nlt (vector float, vector float); -void vec_mtvscr (vector signed int); -void vec_mtvscr (vector unsigned int); -void vec_mtvscr (vector bool int); -void vec_mtvscr (vector signed short); -void vec_mtvscr (vector unsigned short); -void vec_mtvscr (vector bool short); -void vec_mtvscr (vector pixel); -void vec_mtvscr (vector signed char); -void vec_mtvscr (vector unsigned char); -void vec_mtvscr (vector bool char); +int vec_any_numeric (vector float); -vector unsigned short vec_mule (vector unsigned char, - vector unsigned char); -vector signed short vec_mule (vector signed char, - 
vector signed char); -vector unsigned int vec_mule (vector unsigned short, - vector unsigned short); -vector signed int vec_mule (vector signed short, vector signed short); +int vec_any_out (vector float, vector float); +@end smallexample -vector signed int vec_vmulesh (vector signed short, - vector signed short); +If the vector/scalar (VSX) instruction set is available, the following +additional functions are available: -vector unsigned int vec_vmuleuh (vector unsigned short, - vector unsigned short); +@smallexample +vector double vec_abs (vector double); +vector double vec_add (vector double, vector double); +vector double vec_and (vector double, vector double); +vector double vec_and (vector double, vector bool long); +vector double vec_and (vector bool long, vector double); +vector long vec_and (vector long, vector long); +vector long vec_and (vector long, vector bool long); +vector long vec_and (vector bool long, vector long); +vector unsigned long vec_and (vector unsigned long, vector unsigned long); +vector unsigned long vec_and (vector unsigned long, vector bool long); +vector unsigned long vec_and (vector bool long, vector unsigned long); +vector double vec_andc (vector double, vector double); +vector double vec_andc (vector double, vector bool long); +vector double vec_andc (vector bool long, vector double); +vector long vec_andc (vector long, vector long); +vector long vec_andc (vector long, vector bool long); +vector long vec_andc (vector bool long, vector long); +vector unsigned long vec_andc (vector unsigned long, vector unsigned long); +vector unsigned long vec_andc (vector unsigned long, vector bool long); +vector unsigned long vec_andc (vector bool long, vector unsigned long); +vector double vec_ceil (vector double); +vector bool long vec_cmpeq (vector double, vector double); +vector bool long vec_cmpge (vector double, vector double); +vector bool long vec_cmpgt (vector double, vector double); +vector bool long vec_cmple (vector double, vector 
double); +vector bool long vec_cmplt (vector double, vector double); +vector double vec_cpsgn (vector double, vector double); +vector float vec_div (vector float, vector float); +vector double vec_div (vector double, vector double); +vector long vec_div (vector long, vector long); +vector unsigned long vec_div (vector unsigned long, vector unsigned long); +vector double vec_floor (vector double); +vector double vec_ld (int, const vector double *); +vector double vec_ld (int, const double *); +vector double vec_ldl (int, const vector double *); +vector double vec_ldl (int, const double *); +vector unsigned char vec_lvsl (int, const volatile double *); +vector unsigned char vec_lvsr (int, const volatile double *); +vector double vec_madd (vector double, vector double, vector double); +vector double vec_max (vector double, vector double); +vector signed long vec_mergeh (vector signed long, vector signed long); +vector signed long vec_mergeh (vector signed long, vector bool long); +vector signed long vec_mergeh (vector bool long, vector signed long); +vector unsigned long vec_mergeh (vector unsigned long, vector unsigned long); +vector unsigned long vec_mergeh (vector unsigned long, vector bool long); +vector unsigned long vec_mergeh (vector bool long, vector unsigned long); +vector signed long vec_mergel (vector signed long, vector signed long); +vector signed long vec_mergel (vector signed long, vector bool long); +vector signed long vec_mergel (vector bool long, vector signed long); +vector unsigned long vec_mergel (vector unsigned long, vector unsigned long); +vector unsigned long vec_mergel (vector unsigned long, vector bool long); +vector unsigned long vec_mergel (vector bool long, vector unsigned long); +vector double vec_min (vector double, vector double); +vector float vec_msub (vector float, vector float, vector float); +vector double vec_msub (vector double, vector double, vector double); +vector float vec_mul (vector float, vector float); +vector double 
vec_mul (vector double, vector double); +vector long vec_mul (vector long, vector long); +vector unsigned long vec_mul (vector unsigned long, vector unsigned long); +vector float vec_nearbyint (vector float); +vector double vec_nearbyint (vector double); +vector float vec_nmadd (vector float, vector float, vector float); +vector double vec_nmadd (vector double, vector double, vector double); +vector double vec_nmsub (vector double, vector double, vector double); +vector double vec_nor (vector double, vector double); +vector long vec_nor (vector long, vector long); +vector long vec_nor (vector long, vector bool long); +vector long vec_nor (vector bool long, vector long); +vector unsigned long vec_nor (vector unsigned long, vector unsigned long); +vector unsigned long vec_nor (vector unsigned long, vector bool long); +vector unsigned long vec_nor (vector bool long, vector unsigned long); +vector double vec_or (vector double, vector double); +vector double vec_or (vector double, vector bool long); +vector double vec_or (vector bool long, vector double); +vector long vec_or (vector long, vector long); +vector long vec_or (vector long, vector bool long); +vector long vec_or (vector bool long, vector long); +vector unsigned long vec_or (vector unsigned long, vector unsigned long); +vector unsigned long vec_or (vector unsigned long, vector bool long); +vector unsigned long vec_or (vector bool long, vector unsigned long); +vector double vec_perm (vector double, vector double, vector unsigned char); +vector long vec_perm (vector long, vector long, vector unsigned char); +vector unsigned long vec_perm (vector unsigned long, vector unsigned long, + vector unsigned char); +vector double vec_rint (vector double); +vector double vec_recip (vector double, vector double); +vector double vec_rsqrt (vector double); +vector double vec_rsqrte (vector double); +vector double vec_sel (vector double, vector double, vector bool long); +vector double vec_sel (vector double, vector double, 
vector unsigned long); +vector long vec_sel (vector long, vector long, vector long); +vector long vec_sel (vector long, vector long, vector unsigned long); +vector long vec_sel (vector long, vector long, vector bool long); +vector unsigned long vec_sel (vector unsigned long, vector unsigned long, + vector long); +vector unsigned long vec_sel (vector unsigned long, vector unsigned long, + vector unsigned long); +vector unsigned long vec_sel (vector unsigned long, vector unsigned long, + vector bool long); +vector double vec_splats (double); +vector signed long vec_splats (signed long); +vector unsigned long vec_splats (unsigned long); +vector float vec_sqrt (vector float); +vector double vec_sqrt (vector double); +void vec_st (vector double, int, vector double *); +void vec_st (vector double, int, double *); +vector double vec_sub (vector double, vector double); +vector double vec_trunc (vector double); +vector double vec_xor (vector double, vector double); +vector double vec_xor (vector double, vector bool long); +vector double vec_xor (vector bool long, vector double); +vector long vec_xor (vector long, vector long); +vector long vec_xor (vector long, vector bool long); +vector long vec_xor (vector bool long, vector long); +vector unsigned long vec_xor (vector unsigned long, vector unsigned long); +vector unsigned long vec_xor (vector unsigned long, vector bool long); +vector unsigned long vec_xor (vector bool long, vector unsigned long); +int vec_all_eq (vector double, vector double); +int vec_all_ge (vector double, vector double); +int vec_all_gt (vector double, vector double); +int vec_all_le (vector double, vector double); +int vec_all_lt (vector double, vector double); +int vec_all_nan (vector double); +int vec_all_ne (vector double, vector double); +int vec_all_nge (vector double, vector double); +int vec_all_ngt (vector double, vector double); +int vec_all_nle (vector double, vector double); +int vec_all_nlt (vector double, vector double); +int 
vec_all_numeric (vector double); +int vec_any_eq (vector double, vector double); +int vec_any_ge (vector double, vector double); +int vec_any_gt (vector double, vector double); +int vec_any_le (vector double, vector double); +int vec_any_lt (vector double, vector double); +int vec_any_nan (vector double); +int vec_any_ne (vector double, vector double); +int vec_any_nge (vector double, vector double); +int vec_any_ngt (vector double, vector double); +int vec_any_nle (vector double, vector double); +int vec_any_nlt (vector double, vector double); +int vec_any_numeric (vector double); -vector signed short vec_vmulesb (vector signed char, - vector signed char); +vector double vec_vsx_ld (int, const vector double *); +vector double vec_vsx_ld (int, const double *); +vector float vec_vsx_ld (int, const vector float *); +vector float vec_vsx_ld (int, const float *); +vector bool int vec_vsx_ld (int, const vector bool int *); +vector signed int vec_vsx_ld (int, const vector signed int *); +vector signed int vec_vsx_ld (int, const int *); +vector signed int vec_vsx_ld (int, const long *); +vector unsigned int vec_vsx_ld (int, const vector unsigned int *); +vector unsigned int vec_vsx_ld (int, const unsigned int *); +vector unsigned int vec_vsx_ld (int, const unsigned long *); +vector bool short vec_vsx_ld (int, const vector bool short *); +vector pixel vec_vsx_ld (int, const vector pixel *); +vector signed short vec_vsx_ld (int, const vector signed short *); +vector signed short vec_vsx_ld (int, const short *); +vector unsigned short vec_vsx_ld (int, const vector unsigned short *); +vector unsigned short vec_vsx_ld (int, const unsigned short *); +vector bool char vec_vsx_ld (int, const vector bool char *); +vector signed char vec_vsx_ld (int, const vector signed char *); +vector signed char vec_vsx_ld (int, const signed char *); +vector unsigned char vec_vsx_ld (int, const vector unsigned char *); +vector unsigned char vec_vsx_ld (int, const unsigned char *); -vector 
unsigned short vec_vmuleub (vector unsigned char, - vector unsigned char); +void vec_vsx_st (vector double, int, vector double *); +void vec_vsx_st (vector double, int, double *); +void vec_vsx_st (vector float, int, vector float *); +void vec_vsx_st (vector float, int, float *); +void vec_vsx_st (vector signed int, int, vector signed int *); +void vec_vsx_st (vector signed int, int, int *); +void vec_vsx_st (vector unsigned int, int, vector unsigned int *); +void vec_vsx_st (vector unsigned int, int, unsigned int *); +void vec_vsx_st (vector bool int, int, vector bool int *); +void vec_vsx_st (vector bool int, int, unsigned int *); +void vec_vsx_st (vector bool int, int, int *); +void vec_vsx_st (vector signed short, int, vector signed short *); +void vec_vsx_st (vector signed short, int, short *); +void vec_vsx_st (vector unsigned short, int, vector unsigned short *); +void vec_vsx_st (vector unsigned short, int, unsigned short *); +void vec_vsx_st (vector bool short, int, vector bool short *); +void vec_vsx_st (vector bool short, int, unsigned short *); +void vec_vsx_st (vector pixel, int, vector pixel *); +void vec_vsx_st (vector pixel, int, unsigned short *); +void vec_vsx_st (vector pixel, int, short *); +void vec_vsx_st (vector bool short, int, short *); +void vec_vsx_st (vector signed char, int, vector signed char *); +void vec_vsx_st (vector signed char, int, signed char *); +void vec_vsx_st (vector unsigned char, int, vector unsigned char *); +void vec_vsx_st (vector unsigned char, int, unsigned char *); +void vec_vsx_st (vector bool char, int, vector bool char *); +void vec_vsx_st (vector bool char, int, unsigned char *); +void vec_vsx_st (vector bool char, int, signed char *); -vector unsigned short vec_mulo (vector unsigned char, - vector unsigned char); -vector signed short vec_mulo (vector signed char, vector signed char); -vector unsigned int vec_mulo (vector unsigned short, - vector unsigned short); -vector signed int vec_mulo (vector signed short, 
vector signed short); +vector double vec_xxpermdi (vector double, vector double, int); +vector float vec_xxpermdi (vector float, vector float, int); +vector long long vec_xxpermdi (vector long long, vector long long, int); +vector unsigned long long vec_xxpermdi (vector unsigned long long, + vector unsigned long long, int); +vector int vec_xxpermdi (vector int, vector int, int); +vector unsigned int vec_xxpermdi (vector unsigned int, + vector unsigned int, int); +vector short vec_xxpermdi (vector short, vector short, int); +vector unsigned short vec_xxpermdi (vector unsigned short, + vector unsigned short, int); +vector signed char vec_xxpermdi (vector signed char, vector signed char, int); +vector unsigned char vec_xxpermdi (vector unsigned char, + vector unsigned char, int); -vector signed int vec_vmulosh (vector signed short, - vector signed short); +vector double vec_xxsldi (vector double, vector double, int); +vector float vec_xxsldi (vector float, vector float, int); +vector long long vec_xxsldi (vector long long, vector long long, int); +vector unsigned long long vec_xxsldi (vector unsigned long long, + vector unsigned long long, int); +vector int vec_xxsldi (vector int, vector int, int); +vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int); +vector short vec_xxsldi (vector short, vector short, int); +vector unsigned short vec_xxsldi (vector unsigned short, + vector unsigned short, int); +vector signed char vec_xxsldi (vector signed char, vector signed char, int); +vector unsigned char vec_xxsldi (vector unsigned char, + vector unsigned char, int); +@end smallexample -vector unsigned int vec_vmulouh (vector unsigned short, - vector unsigned short); +Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always +generate the AltiVec @samp{LVX} and @samp{STVX} instructions even +if the VSX instruction set is available. 
The @samp{vec_vsx_ld} and +@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X}, +@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. -vector signed short vec_vmulosb (vector signed char, - vector signed char); +If the ISA 2.07 additions to the vector/scalar (power8-vector) +instruction set is available, the following additional functions are +available for both 32-bit and 64-bit targets. For 64-bit targets, you +can use @var{vector long} instead of @var{vector long long}, +@var{vector bool long} instead of @var{vector bool long long}, and +@var{vector unsigned long} instead of @var{vector unsigned long long}. -vector unsigned short vec_vmuloub (vector unsigned char, - vector unsigned char); +@smallexample +vector long long vec_abs (vector long long); -vector float vec_nmsub (vector float, vector float, vector float); +vector long long vec_add (vector long long, vector long long); +vector unsigned long long vec_add (vector unsigned long long, + vector unsigned long long); -vector float vec_nor (vector float, vector float); -vector signed int vec_nor (vector signed int, vector signed int); -vector unsigned int vec_nor (vector unsigned int, vector unsigned int); -vector bool int vec_nor (vector bool int, vector bool int); -vector signed short vec_nor (vector signed short, vector signed short); -vector unsigned short vec_nor (vector unsigned short, - vector unsigned short); -vector bool short vec_nor (vector bool short, vector bool short); -vector signed char vec_nor (vector signed char, vector signed char); -vector unsigned char vec_nor (vector unsigned char, - vector unsigned char); -vector bool char vec_nor (vector bool char, vector bool char); +int vec_all_eq (vector long long, vector long long); +int vec_all_eq (vector unsigned long long, vector unsigned long long); +int vec_all_ge (vector long long, vector long long); +int vec_all_ge (vector unsigned long long, vector unsigned long long); +int vec_all_gt (vector long long, 
vector long long); +int vec_all_gt (vector unsigned long long, vector unsigned long long); +int vec_all_le (vector long long, vector long long); +int vec_all_le (vector unsigned long long, vector unsigned long long); +int vec_all_lt (vector long long, vector long long); +int vec_all_lt (vector unsigned long long, vector unsigned long long); +int vec_all_ne (vector long long, vector long long); +int vec_all_ne (vector unsigned long long, vector unsigned long long); -vector float vec_or (vector float, vector float); -vector float vec_or (vector float, vector bool int); -vector float vec_or (vector bool int, vector float); -vector bool int vec_or (vector bool int, vector bool int); -vector signed int vec_or (vector bool int, vector signed int); -vector signed int vec_or (vector signed int, vector bool int); -vector signed int vec_or (vector signed int, vector signed int); -vector unsigned int vec_or (vector bool int, vector unsigned int); -vector unsigned int vec_or (vector unsigned int, vector bool int); -vector unsigned int vec_or (vector unsigned int, vector unsigned int); -vector bool short vec_or (vector bool short, vector bool short); -vector signed short vec_or (vector bool short, vector signed short); -vector signed short vec_or (vector signed short, vector bool short); -vector signed short vec_or (vector signed short, vector signed short); -vector unsigned short vec_or (vector bool short, vector unsigned short); -vector unsigned short vec_or (vector unsigned short, vector bool short); -vector unsigned short vec_or (vector unsigned short, - vector unsigned short); -vector signed char vec_or (vector bool char, vector signed char); -vector bool char vec_or (vector bool char, vector bool char); -vector signed char vec_or (vector signed char, vector bool char); -vector signed char vec_or (vector signed char, vector signed char); -vector unsigned char vec_or (vector bool char, vector unsigned char); -vector unsigned char vec_or (vector unsigned char, vector bool 
char); -vector unsigned char vec_or (vector unsigned char, - vector unsigned char); +int vec_any_eq (vector long long, vector long long); +int vec_any_eq (vector unsigned long long, vector unsigned long long); +int vec_any_ge (vector long long, vector long long); +int vec_any_ge (vector unsigned long long, vector unsigned long long); +int vec_any_gt (vector long long, vector long long); +int vec_any_gt (vector unsigned long long, vector unsigned long long); +int vec_any_le (vector long long, vector long long); +int vec_any_le (vector unsigned long long, vector unsigned long long); +int vec_any_lt (vector long long, vector long long); +int vec_any_lt (vector unsigned long long, vector unsigned long long); +int vec_any_ne (vector long long, vector long long); +int vec_any_ne (vector unsigned long long, vector unsigned long long); -vector signed char vec_pack (vector signed short, vector signed short); -vector unsigned char vec_pack (vector unsigned short, +vector long long vec_eqv (vector long long, vector long long); +vector long long vec_eqv (vector bool long long, vector long long); +vector long long vec_eqv (vector long long, vector bool long long); +vector unsigned long long vec_eqv (vector unsigned long long, + vector unsigned long long); +vector unsigned long long vec_eqv (vector bool long long, + vector unsigned long long); +vector unsigned long long vec_eqv (vector unsigned long long, + vector bool long long); +vector int vec_eqv (vector int, vector int); +vector int vec_eqv (vector bool int, vector int); +vector int vec_eqv (vector int, vector bool int); +vector unsigned int vec_eqv (vector unsigned int, vector unsigned int); +vector unsigned int vec_eqv (vector bool unsigned int, + vector unsigned int); +vector unsigned int vec_eqv (vector unsigned int, + vector bool unsigned int); +vector short vec_eqv (vector short, vector short); +vector short vec_eqv (vector bool short, vector short); +vector short vec_eqv (vector short, vector bool short); +vector 
unsigned short vec_eqv (vector unsigned short, vector unsigned short); +vector unsigned short vec_eqv (vector bool unsigned short, vector unsigned short); -vector bool char vec_pack (vector bool short, vector bool short); -vector signed short vec_pack (vector signed int, vector signed int); -vector unsigned short vec_pack (vector unsigned int, - vector unsigned int); -vector bool short vec_pack (vector bool int, vector bool int); - -vector bool short vec_vpkuwum (vector bool int, vector bool int); -vector signed short vec_vpkuwum (vector signed int, vector signed int); -vector unsigned short vec_vpkuwum (vector unsigned int, - vector unsigned int); - -vector bool char vec_vpkuhum (vector bool short, vector bool short); -vector signed char vec_vpkuhum (vector signed short, - vector signed short); -vector unsigned char vec_vpkuhum (vector unsigned short, - vector unsigned short); +vector unsigned short vec_eqv (vector unsigned short, + vector bool unsigned short); +vector signed char vec_eqv (vector signed char, vector signed char); +vector signed char vec_eqv (vector bool signed char, vector signed char); +vector signed char vec_eqv (vector signed char, vector bool signed char); +vector unsigned char vec_eqv (vector unsigned char, vector unsigned char); +vector unsigned char vec_eqv (vector bool unsigned char, vector unsigned char); +vector unsigned char vec_eqv (vector unsigned char, vector bool unsigned char); -vector pixel vec_packpx (vector unsigned int, vector unsigned int); +vector long long vec_max (vector long long, vector long long); +vector unsigned long long vec_max (vector unsigned long long, + vector unsigned long long); -vector unsigned char vec_packs (vector unsigned short, - vector unsigned short); -vector signed char vec_packs (vector signed short, vector signed short); -vector unsigned short vec_packs (vector unsigned int, - vector unsigned int); -vector signed short vec_packs (vector signed int, vector signed int); +vector signed int vec_mergee 
(vector signed int, vector signed int); +vector unsigned int vec_mergee (vector unsigned int, vector unsigned int); +vector bool int vec_mergee (vector bool int, vector bool int); -vector signed short vec_vpkswss (vector signed int, vector signed int); +vector signed int vec_mergeo (vector signed int, vector signed int); +vector unsigned int vec_mergeo (vector unsigned int, vector unsigned int); +vector bool int vec_mergeo (vector bool int, vector bool int); -vector unsigned short vec_vpkuwus (vector unsigned int, - vector unsigned int); +vector long long vec_min (vector long long, vector long long); +vector unsigned long long vec_min (vector unsigned long long, + vector unsigned long long); -vector signed char vec_vpkshss (vector signed short, - vector signed short); +vector long long vec_nand (vector long long, vector long long); +vector long long vec_nand (vector bool long long, vector long long); +vector long long vec_nand (vector long long, vector bool long long); +vector unsigned long long vec_nand (vector unsigned long long, + vector unsigned long long); +vector unsigned long long vec_nand (vector bool long long, + vector unsigned long long); +vector unsigned long long vec_nand (vector unsigned long long, + vector bool long long); +vector int vec_nand (vector int, vector int); +vector int vec_nand (vector bool int, vector int); +vector int vec_nand (vector int, vector bool int); +vector unsigned int vec_nand (vector unsigned int, vector unsigned int); +vector unsigned int vec_nand (vector bool unsigned int, + vector unsigned int); +vector unsigned int vec_nand (vector unsigned int, + vector bool unsigned int); +vector short vec_nand (vector short, vector short); +vector short vec_nand (vector bool short, vector short); +vector short vec_nand (vector short, vector bool short); +vector unsigned short vec_nand (vector unsigned short, vector unsigned short); +vector unsigned short vec_nand (vector bool unsigned short, + vector unsigned short); +vector unsigned 
short vec_nand (vector unsigned short, + vector bool unsigned short); +vector signed char vec_nand (vector signed char, vector signed char); +vector signed char vec_nand (vector bool signed char, vector signed char); +vector signed char vec_nand (vector signed char, vector bool signed char); +vector unsigned char vec_nand (vector unsigned char, vector unsigned char); +vector unsigned char vec_nand (vector bool unsigned char, vector unsigned char); +vector unsigned char vec_nand (vector unsigned char, vector bool unsigned char); -vector unsigned char vec_vpkuhus (vector unsigned short, - vector unsigned short); +vector long long vec_orc (vector long long, vector long long); +vector long long vec_orc (vector bool long long, vector long long); +vector long long vec_orc (vector long long, vector bool long long); +vector unsigned long long vec_orc (vector unsigned long long, + vector unsigned long long); +vector unsigned long long vec_orc (vector bool long long, + vector unsigned long long); +vector unsigned long long vec_orc (vector unsigned long long, + vector bool long long); +vector int vec_orc (vector int, vector int); +vector int vec_orc (vector bool int, vector int); +vector int vec_orc (vector int, vector bool int); +vector unsigned int vec_orc (vector unsigned int, vector unsigned int); +vector unsigned int vec_orc (vector bool unsigned int, + vector unsigned int); +vector unsigned int vec_orc (vector unsigned int, + vector bool unsigned int); +vector short vec_orc (vector short, vector short); +vector short vec_orc (vector bool short, vector short); +vector short vec_orc (vector short, vector bool short); +vector unsigned short vec_orc (vector unsigned short, vector unsigned short); +vector unsigned short vec_orc (vector bool unsigned short, + vector unsigned short); +vector unsigned short vec_orc (vector unsigned short, + vector bool unsigned short); +vector signed char vec_orc (vector signed char, vector signed char); +vector signed char vec_orc (vector bool 
signed char, vector signed char); +vector signed char vec_orc (vector signed char, vector bool signed char); +vector unsigned char vec_orc (vector unsigned char, vector unsigned char); +vector unsigned char vec_orc (vector bool unsigned char, vector unsigned char); +vector unsigned char vec_orc (vector unsigned char, vector bool unsigned char); -vector unsigned char vec_packsu (vector unsigned short, - vector unsigned short); -vector unsigned char vec_packsu (vector signed short, - vector signed short); -vector unsigned short vec_packsu (vector unsigned int, - vector unsigned int); -vector unsigned short vec_packsu (vector signed int, vector signed int); +vector int vec_pack (vector long long, vector long long); +vector unsigned int vec_pack (vector unsigned long long, + vector unsigned long long); +vector bool int vec_pack (vector bool long long, vector bool long long); -vector unsigned short vec_vpkswus (vector signed int, - vector signed int); +vector int vec_packs (vector long long, vector long long); +vector unsigned int vec_packs (vector unsigned long long, + vector unsigned long long); -vector unsigned char vec_vpkshus (vector signed short, - vector signed short); +vector unsigned int vec_packsu (vector long long, vector long long); +vector unsigned int vec_packsu (vector unsigned long long, + vector unsigned long long); -vector float vec_perm (vector float, - vector float, - vector unsigned char); -vector signed int vec_perm (vector signed int, - vector signed int, - vector unsigned char); -vector unsigned int vec_perm (vector unsigned int, - vector unsigned int, - vector unsigned char); -vector bool int vec_perm (vector bool int, - vector bool int, - vector unsigned char); -vector signed short vec_perm (vector signed short, - vector signed short, - vector unsigned char); -vector unsigned short vec_perm (vector unsigned short, - vector unsigned short, - vector unsigned char); -vector bool short vec_perm (vector bool short, - vector bool short, - vector 
unsigned char); -vector pixel vec_perm (vector pixel, - vector pixel, - vector unsigned char); -vector signed char vec_perm (vector signed char, - vector signed char, - vector unsigned char); -vector unsigned char vec_perm (vector unsigned char, - vector unsigned char, - vector unsigned char); -vector bool char vec_perm (vector bool char, - vector bool char, - vector unsigned char); +vector long long vec_rl (vector long long, + vector unsigned long long); +vector long long vec_rl (vector unsigned long long, + vector unsigned long long); -vector float vec_re (vector float); +vector long long vec_sl (vector long long, vector unsigned long long); +vector long long vec_sl (vector unsigned long long, + vector unsigned long long); -vector signed char vec_rl (vector signed char, - vector unsigned char); -vector unsigned char vec_rl (vector unsigned char, - vector unsigned char); -vector signed short vec_rl (vector signed short, vector unsigned short); -vector unsigned short vec_rl (vector unsigned short, - vector unsigned short); -vector signed int vec_rl (vector signed int, vector unsigned int); -vector unsigned int vec_rl (vector unsigned int, vector unsigned int); +vector long long vec_sr (vector long long, vector unsigned long long); +vector unsigned long long char vec_sr (vector unsigned long long, + vector unsigned long long); -vector signed int vec_vrlw (vector signed int, vector unsigned int); -vector unsigned int vec_vrlw (vector unsigned int, vector unsigned int); +vector long long vec_sra (vector long long, vector unsigned long long); +vector unsigned long long vec_sra (vector unsigned long long, + vector unsigned long long); -vector signed short vec_vrlh (vector signed short, - vector unsigned short); -vector unsigned short vec_vrlh (vector unsigned short, - vector unsigned short); +vector long long vec_sub (vector long long, vector long long); +vector unsigned long long vec_sub (vector unsigned long long, + vector unsigned long long); -vector signed char 
vec_vrlb (vector signed char, vector unsigned char); -vector unsigned char vec_vrlb (vector unsigned char, - vector unsigned char); +vector long long vec_unpackh (vector int); +vector unsigned long long vec_unpackh (vector unsigned int); -vector float vec_round (vector float); +vector long long vec_unpackl (vector int); +vector unsigned long long vec_unpackl (vector unsigned int); -vector float vec_recip (vector float, vector float); +vector long long vec_vaddudm (vector long long, vector long long); +vector long long vec_vaddudm (vector bool long long, vector long long); +vector long long vec_vaddudm (vector long long, vector bool long long); +vector unsigned long long vec_vaddudm (vector unsigned long long, + vector unsigned long long); +vector unsigned long long vec_vaddudm (vector bool unsigned long long, + vector unsigned long long); +vector unsigned long long vec_vaddudm (vector unsigned long long, + vector bool unsigned long long); -vector float vec_rsqrt (vector float); +vector long long vec_vbpermq (vector signed char, vector signed char); +vector long long vec_vbpermq (vector unsigned char, vector unsigned char); -vector float vec_rsqrte (vector float); +vector long long vec_cntlz (vector long long); +vector unsigned long long vec_cntlz (vector unsigned long long); +vector int vec_cntlz (vector int); +vector unsigned int vec_cntlz (vector int); +vector short vec_cntlz (vector short); +vector unsigned short vec_cntlz (vector unsigned short); +vector signed char vec_cntlz (vector signed char); +vector unsigned char vec_cntlz (vector unsigned char); -vector float vec_sel (vector float, vector float, vector bool int); -vector float vec_sel (vector float, vector float, vector unsigned int); -vector signed int vec_sel (vector signed int, - vector signed int, - vector bool int); -vector signed int vec_sel (vector signed int, - vector signed int, - vector unsigned int); -vector unsigned int vec_sel (vector unsigned int, - vector unsigned int, - vector bool int); 
-vector unsigned int vec_sel (vector unsigned int, - vector unsigned int, - vector unsigned int); -vector bool int vec_sel (vector bool int, - vector bool int, - vector bool int); -vector bool int vec_sel (vector bool int, - vector bool int, - vector unsigned int); -vector signed short vec_sel (vector signed short, - vector signed short, - vector bool short); -vector signed short vec_sel (vector signed short, - vector signed short, - vector unsigned short); -vector unsigned short vec_sel (vector unsigned short, - vector unsigned short, - vector bool short); -vector unsigned short vec_sel (vector unsigned short, - vector unsigned short, - vector unsigned short); -vector bool short vec_sel (vector bool short, - vector bool short, - vector bool short); -vector bool short vec_sel (vector bool short, - vector bool short, - vector unsigned short); -vector signed char vec_sel (vector signed char, - vector signed char, - vector bool char); -vector signed char vec_sel (vector signed char, - vector signed char, - vector unsigned char); -vector unsigned char vec_sel (vector unsigned char, - vector unsigned char, - vector bool char); -vector unsigned char vec_sel (vector unsigned char, - vector unsigned char, - vector unsigned char); -vector bool char vec_sel (vector bool char, - vector bool char, - vector bool char); -vector bool char vec_sel (vector bool char, - vector bool char, - vector unsigned char); +vector long long vec_vclz (vector long long); +vector unsigned long long vec_vclz (vector unsigned long long); +vector int vec_vclz (vector int); +vector unsigned int vec_vclz (vector int); +vector short vec_vclz (vector short); +vector unsigned short vec_vclz (vector unsigned short); +vector signed char vec_vclz (vector signed char); +vector unsigned char vec_vclz (vector unsigned char); -vector signed char vec_sl (vector signed char, - vector unsigned char); -vector unsigned char vec_sl (vector unsigned char, - vector unsigned char); -vector signed short vec_sl (vector 
signed short, vector unsigned short); -vector unsigned short vec_sl (vector unsigned short, - vector unsigned short); -vector signed int vec_sl (vector signed int, vector unsigned int); -vector unsigned int vec_sl (vector unsigned int, vector unsigned int); +vector signed char vec_vclzb (vector signed char); +vector unsigned char vec_vclzb (vector unsigned char); -vector signed int vec_vslw (vector signed int, vector unsigned int); -vector unsigned int vec_vslw (vector unsigned int, vector unsigned int); +vector long long vec_vclzd (vector long long); +vector unsigned long long vec_vclzd (vector unsigned long long); -vector signed short vec_vslh (vector signed short, - vector unsigned short); -vector unsigned short vec_vslh (vector unsigned short, - vector unsigned short); +vector short vec_vclzh (vector short); +vector unsigned short vec_vclzh (vector unsigned short); -vector signed char vec_vslb (vector signed char, vector unsigned char); -vector unsigned char vec_vslb (vector unsigned char, - vector unsigned char); +vector int vec_vclzw (vector int); +vector unsigned int vec_vclzw (vector int); -vector float vec_sld (vector float, vector float, const int); -vector signed int vec_sld (vector signed int, - vector signed int, - const int); -vector unsigned int vec_sld (vector unsigned int, - vector unsigned int, - const int); -vector bool int vec_sld (vector bool int, - vector bool int, - const int); -vector signed short vec_sld (vector signed short, - vector signed short, - const int); -vector unsigned short vec_sld (vector unsigned short, - vector unsigned short, - const int); -vector bool short vec_sld (vector bool short, - vector bool short, - const int); -vector pixel vec_sld (vector pixel, - vector pixel, - const int); -vector signed char vec_sld (vector signed char, - vector signed char, - const int); -vector unsigned char vec_sld (vector unsigned char, - vector unsigned char, - const int); -vector bool char vec_sld (vector bool char, - vector bool char, - 
const int); +vector signed char vec_vgbbd (vector signed char); +vector unsigned char vec_vgbbd (vector unsigned char); -vector signed int vec_sll (vector signed int, - vector unsigned int); -vector signed int vec_sll (vector signed int, - vector unsigned short); -vector signed int vec_sll (vector signed int, - vector unsigned char); -vector unsigned int vec_sll (vector unsigned int, - vector unsigned int); -vector unsigned int vec_sll (vector unsigned int, - vector unsigned short); -vector unsigned int vec_sll (vector unsigned int, - vector unsigned char); -vector bool int vec_sll (vector bool int, - vector unsigned int); -vector bool int vec_sll (vector bool int, - vector unsigned short); -vector bool int vec_sll (vector bool int, - vector unsigned char); -vector signed short vec_sll (vector signed short, - vector unsigned int); -vector signed short vec_sll (vector signed short, - vector unsigned short); -vector signed short vec_sll (vector signed short, - vector unsigned char); -vector unsigned short vec_sll (vector unsigned short, - vector unsigned int); -vector unsigned short vec_sll (vector unsigned short, - vector unsigned short); -vector unsigned short vec_sll (vector unsigned short, - vector unsigned char); -vector bool short vec_sll (vector bool short, vector unsigned int); -vector bool short vec_sll (vector bool short, vector unsigned short); -vector bool short vec_sll (vector bool short, vector unsigned char); -vector pixel vec_sll (vector pixel, vector unsigned int); -vector pixel vec_sll (vector pixel, vector unsigned short); -vector pixel vec_sll (vector pixel, vector unsigned char); -vector signed char vec_sll (vector signed char, vector unsigned int); -vector signed char vec_sll (vector signed char, vector unsigned short); -vector signed char vec_sll (vector signed char, vector unsigned char); -vector unsigned char vec_sll (vector unsigned char, - vector unsigned int); -vector unsigned char vec_sll (vector unsigned char, - vector unsigned short); 
-vector unsigned char vec_sll (vector unsigned char, - vector unsigned char); -vector bool char vec_sll (vector bool char, vector unsigned int); -vector bool char vec_sll (vector bool char, vector unsigned short); -vector bool char vec_sll (vector bool char, vector unsigned char); +vector long long vec_vmaxsd (vector long long, vector long long); -vector float vec_slo (vector float, vector signed char); -vector float vec_slo (vector float, vector unsigned char); -vector signed int vec_slo (vector signed int, vector signed char); -vector signed int vec_slo (vector signed int, vector unsigned char); -vector unsigned int vec_slo (vector unsigned int, vector signed char); -vector unsigned int vec_slo (vector unsigned int, vector unsigned char); -vector signed short vec_slo (vector signed short, vector signed char); -vector signed short vec_slo (vector signed short, vector unsigned char); -vector unsigned short vec_slo (vector unsigned short, - vector signed char); -vector unsigned short vec_slo (vector unsigned short, - vector unsigned char); -vector pixel vec_slo (vector pixel, vector signed char); -vector pixel vec_slo (vector pixel, vector unsigned char); -vector signed char vec_slo (vector signed char, vector signed char); -vector signed char vec_slo (vector signed char, vector unsigned char); -vector unsigned char vec_slo (vector unsigned char, vector signed char); -vector unsigned char vec_slo (vector unsigned char, - vector unsigned char); +vector unsigned long long vec_vmaxud (vector unsigned long long, + unsigned vector long long); -vector signed char vec_splat (vector signed char, const int); -vector unsigned char vec_splat (vector unsigned char, const int); -vector bool char vec_splat (vector bool char, const int); -vector signed short vec_splat (vector signed short, const int); -vector unsigned short vec_splat (vector unsigned short, const int); -vector bool short vec_splat (vector bool short, const int); -vector pixel vec_splat (vector pixel, const int); 
-vector float vec_splat (vector float, const int); -vector signed int vec_splat (vector signed int, const int); -vector unsigned int vec_splat (vector unsigned int, const int); -vector bool int vec_splat (vector bool int, const int); -vector signed long vec_splat (vector signed long, const int); -vector unsigned long vec_splat (vector unsigned long, const int); +vector long long vec_vminsd (vector long long, vector long long); -vector signed char vec_splats (signed char); -vector unsigned char vec_splats (unsigned char); -vector signed short vec_splats (signed short); -vector unsigned short vec_splats (unsigned short); -vector signed int vec_splats (signed int); -vector unsigned int vec_splats (unsigned int); -vector float vec_splats (float); +vector unsigned long long vec_vminud (vector long long, + vector long long); -vector float vec_vspltw (vector float, const int); -vector signed int vec_vspltw (vector signed int, const int); -vector unsigned int vec_vspltw (vector unsigned int, const int); -vector bool int vec_vspltw (vector bool int, const int); +vector int vec_vpksdss (vector long long, vector long long); +vector unsigned int vec_vpksdss (vector long long, vector long long); -vector bool short vec_vsplth (vector bool short, const int); -vector signed short vec_vsplth (vector signed short, const int); -vector unsigned short vec_vsplth (vector unsigned short, const int); -vector pixel vec_vsplth (vector pixel, const int); +vector unsigned int vec_vpkudus (vector unsigned long long, + vector unsigned long long); -vector signed char vec_vspltb (vector signed char, const int); -vector unsigned char vec_vspltb (vector unsigned char, const int); -vector bool char vec_vspltb (vector bool char, const int); +vector int vec_vpkudum (vector long long, vector long long); +vector unsigned int vec_vpkudum (vector unsigned long long, + vector unsigned long long); +vector bool int vec_vpkudum (vector bool long long, vector bool long long); -vector signed char vec_splat_s8 
(const int); +vector long long vec_vpopcnt (vector long long); +vector unsigned long long vec_vpopcnt (vector unsigned long long); +vector int vec_vpopcnt (vector int); +vector unsigned int vec_vpopcnt (vector int); +vector short vec_vpopcnt (vector short); +vector unsigned short vec_vpopcnt (vector unsigned short); +vector signed char vec_vpopcnt (vector signed char); +vector unsigned char vec_vpopcnt (vector unsigned char); -vector signed short vec_splat_s16 (const int); +vector signed char vec_vpopcntb (vector signed char); +vector unsigned char vec_vpopcntb (vector unsigned char); -vector signed int vec_splat_s32 (const int); +vector long long vec_vpopcntd (vector long long); +vector unsigned long long vec_vpopcntd (vector unsigned long long); -vector unsigned char vec_splat_u8 (const int); +vector short vec_vpopcnth (vector short); +vector unsigned short vec_vpopcnth (vector unsigned short); -vector unsigned short vec_splat_u16 (const int); +vector int vec_vpopcntw (vector int); +vector unsigned int vec_vpopcntw (vector int); -vector unsigned int vec_splat_u32 (const int); +vector long long vec_vrld (vector long long, vector unsigned long long); +vector unsigned long long vec_vrld (vector unsigned long long, + vector unsigned long long); -vector signed char vec_sr (vector signed char, vector unsigned char); -vector unsigned char vec_sr (vector unsigned char, - vector unsigned char); -vector signed short vec_sr (vector signed short, - vector unsigned short); -vector unsigned short vec_sr (vector unsigned short, - vector unsigned short); -vector signed int vec_sr (vector signed int, vector unsigned int); -vector unsigned int vec_sr (vector unsigned int, vector unsigned int); +vector long long vec_vsld (vector long long, vector unsigned long long); +vector long long vec_vsld (vector unsigned long long, + vector unsigned long long); -vector signed int vec_vsrw (vector signed int, vector unsigned int); -vector unsigned int vec_vsrw (vector unsigned int, vector 
unsigned int); +vector long long vec_vsrad (vector long long, vector unsigned long long); +vector unsigned long long vec_vsrad (vector unsigned long long, + vector unsigned long long); -vector signed short vec_vsrh (vector signed short, - vector unsigned short); -vector unsigned short vec_vsrh (vector unsigned short, - vector unsigned short); +vector long long vec_vsrd (vector long long, vector unsigned long long); +vector unsigned long long vec_vsrd (vector unsigned long long, + vector unsigned long long); -vector signed char vec_vsrb (vector signed char, vector unsigned char); -vector unsigned char vec_vsrb (vector unsigned char, - vector unsigned char); +vector long long vec_vsubudm (vector long long, vector long long); +vector long long vec_vsubudm (vector bool long long, vector long long); +vector long long vec_vsubudm (vector long long, vector bool long long); +vector unsigned long long vec_vsubudm (vector unsigned long long, + vector unsigned long long); +vector unsigned long long vec_vsubudm (vector bool long long, + vector unsigned long long); +vector unsigned long long vec_vsubudm (vector unsigned long long, + vector bool long long); -vector signed char vec_sra (vector signed char, vector unsigned char); -vector unsigned char vec_sra (vector unsigned char, - vector unsigned char); -vector signed short vec_sra (vector signed short, - vector unsigned short); -vector unsigned short vec_sra (vector unsigned short, - vector unsigned short); -vector signed int vec_sra (vector signed int, vector unsigned int); -vector unsigned int vec_sra (vector unsigned int, vector unsigned int); +vector long long vec_vupkhsw (vector int); +vector unsigned long long vec_vupkhsw (vector unsigned int); -vector signed int vec_vsraw (vector signed int, vector unsigned int); -vector unsigned int vec_vsraw (vector unsigned int, - vector unsigned int); +vector long long vec_vupklsw (vector int); +vector unsigned long long vec_vupklsw (vector int); +@end smallexample -vector
signed short vec_vsrah (vector signed short, - vector unsigned short); -vector unsigned short vec_vsrah (vector unsigned short, - vector unsigned short); +If the ISA 2.07 additions to the vector/scalar (power8-vector) +instruction set are available, the following additional functions are +available for 64-bit targets. New vector types +(@var{vector __int128_t} and @var{vector __uint128_t}) are available +to hold the @var{__int128_t} and @var{__uint128_t} types to use these +builtins. -vector signed char vec_vsrab (vector signed char, vector unsigned char); -vector unsigned char vec_vsrab (vector unsigned char, - vector unsigned char); +The normal vector extract and set operations work on +@var{vector __int128_t} and @var{vector __uint128_t} types, +but the index value must be 0. -vector signed int vec_srl (vector signed int, vector unsigned int); -vector signed int vec_srl (vector signed int, vector unsigned short); -vector signed int vec_srl (vector signed int, vector unsigned char); -vector unsigned int vec_srl (vector unsigned int, vector unsigned int); -vector unsigned int vec_srl (vector unsigned int, - vector unsigned short); -vector unsigned int vec_srl (vector unsigned int, vector unsigned char); -vector bool int vec_srl (vector bool int, vector unsigned int); -vector bool int vec_srl (vector bool int, vector unsigned short); -vector bool int vec_srl (vector bool int, vector unsigned char); -vector signed short vec_srl (vector signed short, vector unsigned int); -vector signed short vec_srl (vector signed short, - vector unsigned short); -vector signed short vec_srl (vector signed short, vector unsigned char); -vector unsigned short vec_srl (vector unsigned short, - vector unsigned int); -vector unsigned short vec_srl (vector unsigned short, - vector unsigned short); -vector unsigned short vec_srl (vector unsigned short, - vector unsigned char); -vector bool short vec_srl (vector bool short, vector unsigned int); -vector bool short vec_srl (vector bool
short, vector unsigned short); -vector bool short vec_srl (vector bool short, vector unsigned char); -vector pixel vec_srl (vector pixel, vector unsigned int); -vector pixel vec_srl (vector pixel, vector unsigned short); -vector pixel vec_srl (vector pixel, vector unsigned char); -vector signed char vec_srl (vector signed char, vector unsigned int); -vector signed char vec_srl (vector signed char, vector unsigned short); -vector signed char vec_srl (vector signed char, vector unsigned char); -vector unsigned char vec_srl (vector unsigned char, - vector unsigned int); -vector unsigned char vec_srl (vector unsigned char, - vector unsigned short); -vector unsigned char vec_srl (vector unsigned char, - vector unsigned char); -vector bool char vec_srl (vector bool char, vector unsigned int); -vector bool char vec_srl (vector bool char, vector unsigned short); -vector bool char vec_srl (vector bool char, vector unsigned char); +@smallexample +vector __int128_t vec_vaddcuq (vector __int128_t, vector __int128_t); +vector __uint128_t vec_vaddcuq (vector __uint128_t, vector __uint128_t); -vector float vec_sro (vector float, vector signed char); -vector float vec_sro (vector float, vector unsigned char); -vector signed int vec_sro (vector signed int, vector signed char); -vector signed int vec_sro (vector signed int, vector unsigned char); -vector unsigned int vec_sro (vector unsigned int, vector signed char); -vector unsigned int vec_sro (vector unsigned int, vector unsigned char); -vector signed short vec_sro (vector signed short, vector signed char); -vector signed short vec_sro (vector signed short, vector unsigned char); -vector unsigned short vec_sro (vector unsigned short, - vector signed char); -vector unsigned short vec_sro (vector unsigned short, - vector unsigned char); -vector pixel vec_sro (vector pixel, vector signed char); -vector pixel vec_sro (vector pixel, vector unsigned char); -vector signed char vec_sro (vector signed char, vector signed char); -vector 
signed char vec_sro (vector signed char, vector unsigned char); -vector unsigned char vec_sro (vector unsigned char, vector signed char); -vector unsigned char vec_sro (vector unsigned char, - vector unsigned char); +vector __int128_t vec_vadduqm (vector __int128_t, vector __int128_t); +vector __uint128_t vec_vadduqm (vector __uint128_t, vector __uint128_t); -void vec_st (vector float, int, vector float *); -void vec_st (vector float, int, float *); -void vec_st (vector signed int, int, vector signed int *); -void vec_st (vector signed int, int, int *); -void vec_st (vector unsigned int, int, vector unsigned int *); -void vec_st (vector unsigned int, int, unsigned int *); -void vec_st (vector bool int, int, vector bool int *); -void vec_st (vector bool int, int, unsigned int *); -void vec_st (vector bool int, int, int *); -void vec_st (vector signed short, int, vector signed short *); -void vec_st (vector signed short, int, short *); -void vec_st (vector unsigned short, int, vector unsigned short *); -void vec_st (vector unsigned short, int, unsigned short *); -void vec_st (vector bool short, int, vector bool short *); -void vec_st (vector bool short, int, unsigned short *); -void vec_st (vector pixel, int, vector pixel *); -void vec_st (vector pixel, int, unsigned short *); -void vec_st (vector pixel, int, short *); -void vec_st (vector bool short, int, short *); -void vec_st (vector signed char, int, vector signed char *); -void vec_st (vector signed char, int, signed char *); -void vec_st (vector unsigned char, int, vector unsigned char *); -void vec_st (vector unsigned char, int, unsigned char *); -void vec_st (vector bool char, int, vector bool char *); -void vec_st (vector bool char, int, unsigned char *); -void vec_st (vector bool char, int, signed char *); +vector __int128_t vec_vaddecuq (vector __int128_t, vector __int128_t, + vector __int128_t); +vector __uint128_t vec_vaddecuq (vector __uint128_t, vector __uint128_t, + vector __uint128_t); -void vec_ste 
(vector signed char, int, signed char *); -void vec_ste (vector unsigned char, int, unsigned char *); -void vec_ste (vector bool char, int, signed char *); -void vec_ste (vector bool char, int, unsigned char *); -void vec_ste (vector signed short, int, short *); -void vec_ste (vector unsigned short, int, unsigned short *); -void vec_ste (vector bool short, int, short *); -void vec_ste (vector bool short, int, unsigned short *); -void vec_ste (vector pixel, int, short *); -void vec_ste (vector pixel, int, unsigned short *); -void vec_ste (vector float, int, float *); -void vec_ste (vector signed int, int, int *); -void vec_ste (vector unsigned int, int, unsigned int *); -void vec_ste (vector bool int, int, int *); -void vec_ste (vector bool int, int, unsigned int *); +vector __int128_t vec_vaddeuqm (vector __int128_t, vector __int128_t, + vector __int128_t); +vector __uint128_t vec_vaddeuqm (vector __uint128_t, vector __uint128_t, + vector __uint128_t); -void vec_stvewx (vector float, int, float *); -void vec_stvewx (vector signed int, int, int *); -void vec_stvewx (vector unsigned int, int, unsigned int *); -void vec_stvewx (vector bool int, int, int *); -void vec_stvewx (vector bool int, int, unsigned int *); +vector __int128_t vec_vsubecuq (vector __int128_t, vector __int128_t, + vector __int128_t); +vector __uint128_t vec_vsubecuq (vector __uint128_t, vector __uint128_t, + vector __uint128_t); -void vec_stvehx (vector signed short, int, short *); -void vec_stvehx (vector unsigned short, int, unsigned short *); -void vec_stvehx (vector bool short, int, short *); -void vec_stvehx (vector bool short, int, unsigned short *); -void vec_stvehx (vector pixel, int, short *); -void vec_stvehx (vector pixel, int, unsigned short *); +vector __int128_t vec_vsubeuqm (vector __int128_t, vector __int128_t, + vector __int128_t); +vector __uint128_t vec_vsubeuqm (vector __uint128_t, vector __uint128_t, + vector __uint128_t); -void vec_stvebx (vector signed char, int, signed char 
*); -void vec_stvebx (vector unsigned char, int, unsigned char *); -void vec_stvebx (vector bool char, int, signed char *); -void vec_stvebx (vector bool char, int, unsigned char *); +vector __int128_t vec_vsubcuq (vector __int128_t, vector __int128_t); +vector __uint128_t vec_vsubcuq (vector __uint128_t, vector __uint128_t); -void vec_stl (vector float, int, vector float *); -void vec_stl (vector float, int, float *); -void vec_stl (vector signed int, int, vector signed int *); -void vec_stl (vector signed int, int, int *); -void vec_stl (vector unsigned int, int, vector unsigned int *); -void vec_stl (vector unsigned int, int, unsigned int *); -void vec_stl (vector bool int, int, vector bool int *); -void vec_stl (vector bool int, int, unsigned int *); -void vec_stl (vector bool int, int, int *); -void vec_stl (vector signed short, int, vector signed short *); -void vec_stl (vector signed short, int, short *); -void vec_stl (vector unsigned short, int, vector unsigned short *); -void vec_stl (vector unsigned short, int, unsigned short *); -void vec_stl (vector bool short, int, vector bool short *); -void vec_stl (vector bool short, int, unsigned short *); -void vec_stl (vector bool short, int, short *); -void vec_stl (vector pixel, int, vector pixel *); -void vec_stl (vector pixel, int, unsigned short *); -void vec_stl (vector pixel, int, short *); -void vec_stl (vector signed char, int, vector signed char *); -void vec_stl (vector signed char, int, signed char *); -void vec_stl (vector unsigned char, int, vector unsigned char *); -void vec_stl (vector unsigned char, int, unsigned char *); -void vec_stl (vector bool char, int, vector bool char *); -void vec_stl (vector bool char, int, unsigned char *); -void vec_stl (vector bool char, int, signed char *); +__int128_t vec_vsubuqm (__int128_t, __int128_t); +__uint128_t vec_vsubuqm (__uint128_t, __uint128_t); -vector signed char vec_sub (vector bool char, vector signed char); -vector signed char vec_sub (vector 
signed char, vector bool char); -vector signed char vec_sub (vector signed char, vector signed char); -vector unsigned char vec_sub (vector bool char, vector unsigned char); -vector unsigned char vec_sub (vector unsigned char, vector bool char); -vector unsigned char vec_sub (vector unsigned char, - vector unsigned char); -vector signed short vec_sub (vector bool short, vector signed short); -vector signed short vec_sub (vector signed short, vector bool short); -vector signed short vec_sub (vector signed short, vector signed short); -vector unsigned short vec_sub (vector bool short, - vector unsigned short); -vector unsigned short vec_sub (vector unsigned short, - vector bool short); -vector unsigned short vec_sub (vector unsigned short, - vector unsigned short); -vector signed int vec_sub (vector bool int, vector signed int); -vector signed int vec_sub (vector signed int, vector bool int); -vector signed int vec_sub (vector signed int, vector signed int); -vector unsigned int vec_sub (vector bool int, vector unsigned int); -vector unsigned int vec_sub (vector unsigned int, vector bool int); -vector unsigned int vec_sub (vector unsigned int, vector unsigned int); -vector float vec_sub (vector float, vector float); +vector __int128_t __builtin_bcdadd (vector __int128_t, vector __int128_t); +int __builtin_bcdadd_lt (vector __int128_t, vector __int128_t); +int __builtin_bcdadd_eq (vector __int128_t, vector __int128_t); +int __builtin_bcdadd_gt (vector __int128_t, vector __int128_t); +int __builtin_bcdadd_ov (vector __int128_t, vector __int128_t); +vector __int128_t __builtin_bcdsub (vector __int128_t, vector __int128_t); +int __builtin_bcdsub_lt (vector __int128_t, vector __int128_t); +int __builtin_bcdsub_eq (vector __int128_t, vector __int128_t); +int __builtin_bcdsub_gt (vector __int128_t, vector __int128_t); +int __builtin_bcdsub_ov (vector __int128_t, vector __int128_t); +@end smallexample -vector float vec_vsubfp (vector float, vector float); +If the cryptographic instructions are
enabled (@option{-mcrypto} or +@option{-mcpu=power8}), the following builtins are enabled. -vector signed int vec_vsubuwm (vector bool int, vector signed int); -vector signed int vec_vsubuwm (vector signed int, vector bool int); -vector signed int vec_vsubuwm (vector signed int, vector signed int); -vector unsigned int vec_vsubuwm (vector bool int, vector unsigned int); -vector unsigned int vec_vsubuwm (vector unsigned int, vector bool int); -vector unsigned int vec_vsubuwm (vector unsigned int, - vector unsigned int); +@smallexample +vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long); -vector signed short vec_vsubuhm (vector bool short, - vector signed short); -vector signed short vec_vsubuhm (vector signed short, - vector bool short); -vector signed short vec_vsubuhm (vector signed short, - vector signed short); -vector unsigned short vec_vsubuhm (vector bool short, - vector unsigned short); -vector unsigned short vec_vsubuhm (vector unsigned short, - vector bool short); -vector unsigned short vec_vsubuhm (vector unsigned short, - vector unsigned short); +vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long, + vector unsigned long long); -vector signed char vec_vsububm (vector bool char, vector signed char); -vector signed char vec_vsububm (vector signed char, vector bool char); -vector signed char vec_vsububm (vector signed char, vector signed char); -vector unsigned char vec_vsububm (vector bool char, - vector unsigned char); -vector unsigned char vec_vsububm (vector unsigned char, - vector bool char); -vector unsigned char vec_vsububm (vector unsigned char, - vector unsigned char); +vector unsigned long long __builtin_crypto_vcipherlast + (vector unsigned long long, + vector unsigned long long); + +vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long, + vector unsigned long long); -vector unsigned int vec_subc (vector unsigned int, vector unsigned int); +vector unsigned long long 
__builtin_crypto_vncipherlast + (vector unsigned long long, + vector unsigned long long); -vector unsigned char vec_subs (vector bool char, vector unsigned char); -vector unsigned char vec_subs (vector unsigned char, vector bool char); -vector unsigned char vec_subs (vector unsigned char, - vector unsigned char); -vector signed char vec_subs (vector bool char, vector signed char); -vector signed char vec_subs (vector signed char, vector bool char); -vector signed char vec_subs (vector signed char, vector signed char); -vector unsigned short vec_subs (vector bool short, - vector unsigned short); -vector unsigned short vec_subs (vector unsigned short, - vector bool short); -vector unsigned short vec_subs (vector unsigned short, - vector unsigned short); -vector signed short vec_subs (vector bool short, vector signed short); -vector signed short vec_subs (vector signed short, vector bool short); -vector signed short vec_subs (vector signed short, vector signed short); -vector unsigned int vec_subs (vector bool int, vector unsigned int); -vector unsigned int vec_subs (vector unsigned int, vector bool int); -vector unsigned int vec_subs (vector unsigned int, vector unsigned int); -vector signed int vec_subs (vector bool int, vector signed int); -vector signed int vec_subs (vector signed int, vector bool int); -vector signed int vec_subs (vector signed int, vector signed int); +vector unsigned char __builtin_crypto_vpermxor (vector unsigned char, + vector unsigned char, + vector unsigned char); -vector signed int vec_vsubsws (vector bool int, vector signed int); -vector signed int vec_vsubsws (vector signed int, vector bool int); -vector signed int vec_vsubsws (vector signed int, vector signed int); +vector unsigned short __builtin_crypto_vpermxor (vector unsigned short, + vector unsigned short, + vector unsigned short); -vector unsigned int vec_vsubuws (vector bool int, vector unsigned int); -vector unsigned int vec_vsubuws (vector unsigned int, vector bool int); 
-vector unsigned int vec_vsubuws (vector unsigned int, - vector unsigned int); +vector unsigned int __builtin_crypto_vpermxor (vector unsigned int, + vector unsigned int, + vector unsigned int); -vector signed short vec_vsubshs (vector bool short, - vector signed short); -vector signed short vec_vsubshs (vector signed short, - vector bool short); -vector signed short vec_vsubshs (vector signed short, - vector signed short); +vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long, + vector unsigned long long, + vector unsigned long long); -vector unsigned short vec_vsubuhs (vector bool short, - vector unsigned short); -vector unsigned short vec_vsubuhs (vector unsigned short, - vector bool short); -vector unsigned short vec_vsubuhs (vector unsigned short, - vector unsigned short); +vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char, + vector unsigned char); -vector signed char vec_vsubsbs (vector bool char, vector signed char); -vector signed char vec_vsubsbs (vector signed char, vector bool char); -vector signed char vec_vsubsbs (vector signed char, vector signed char); +vector unsigned short __builtin_crypto_vpmsumb (vector unsigned short, + vector unsigned short); -vector unsigned char vec_vsububs (vector bool char, - vector unsigned char); -vector unsigned char vec_vsububs (vector unsigned char, - vector bool char); -vector unsigned char vec_vsububs (vector unsigned char, - vector unsigned char); +vector unsigned int __builtin_crypto_vpmsumb (vector unsigned int, + vector unsigned int); -vector unsigned int vec_sum4s (vector unsigned char, - vector unsigned int); -vector signed int vec_sum4s (vector signed char, vector signed int); -vector signed int vec_sum4s (vector signed short, vector signed int); +vector unsigned long long __builtin_crypto_vpmsumb (vector unsigned long long, + vector unsigned long long); -vector signed int vec_vsum4shs (vector signed short, vector signed int); +vector unsigned long long 
__builtin_crypto_vshasigmad + (vector unsigned long long, int, int); -vector signed int vec_vsum4sbs (vector signed char, vector signed int); +vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int, + int, int); +@end smallexample -vector unsigned int vec_vsum4ubs (vector unsigned char, - vector unsigned int); +The second argument to the @var{__builtin_crypto_vshasigmad} and +@var{__builtin_crypto_vshasigmaw} builtin functions must be a constant +integer that is 0 or 1. The third argument to these builtin functions +must be a constant integer in the range of 0 to 15. -vector signed int vec_sum2s (vector signed int, vector signed int); +@node PowerPC Hardware Transactional Memory Built-in Functions +@subsection PowerPC Hardware Transactional Memory Built-in Functions +GCC provides two interfaces for accessing the Hardware Transactional +Memory (HTM) instructions available on some of the PowerPC family +of processors (e.g., POWER8). The two interfaces are a low level +interface, consisting of built-in functions specific to PowerPC, and a +higher level interface consisting of inline functions that are common +between PowerPC and S/390. -vector signed int vec_sums (vector signed int, vector signed int); +@subsubsection PowerPC HTM Low Level Built-in Functions -vector float vec_trunc (vector float); +The following low level built-in functions are available with +@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later. +They all generate the machine instruction that is part of the name. -vector signed short vec_unpackh (vector signed char); -vector bool short vec_unpackh (vector bool char); -vector signed int vec_unpackh (vector signed short); -vector bool int vec_unpackh (vector bool short); -vector unsigned int vec_unpackh (vector pixel); +The HTM built-ins return true or false depending on their success and +their arguments match exactly the type and order of the associated +hardware instruction's operands.
Refer to the ISA manual for a +description of each instruction's operands. -vector bool int vec_vupkhsh (vector bool short); -vector signed int vec_vupkhsh (vector signed short); +@smallexample +unsigned int __builtin_tbegin (unsigned int) +unsigned int __builtin_tend (unsigned int) -vector unsigned int vec_vupkhpx (vector pixel); +unsigned int __builtin_tabort (unsigned int) +unsigned int __builtin_tabortdc (unsigned int, unsigned int, unsigned int) +unsigned int __builtin_tabortdci (unsigned int, unsigned int, int) +unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int) +unsigned int __builtin_tabortwci (unsigned int, unsigned int, int) -vector bool short vec_vupkhsb (vector bool char); -vector signed short vec_vupkhsb (vector signed char); +unsigned int __builtin_tcheck (unsigned int) +unsigned int __builtin_treclaim (unsigned int) +unsigned int __builtin_trechkpt (void) +unsigned int __builtin_tsr (unsigned int) +@end smallexample -vector signed short vec_unpackl (vector signed char); -vector bool short vec_unpackl (vector bool char); -vector unsigned int vec_unpackl (vector pixel); -vector signed int vec_unpackl (vector signed short); -vector bool int vec_unpackl (vector bool short); +In addition to the above HTM built-ins, we have added built-ins for +some common extended mnemonics of the HTM instructions: -vector unsigned int vec_vupklpx (vector pixel); +@smallexample +unsigned int __builtin_tendall (void) +unsigned int __builtin_tresume (void) +unsigned int __builtin_tsuspend (void) +@end smallexample -vector bool int vec_vupklsh (vector bool short); -vector signed int vec_vupklsh (vector signed short); +The following set of built-in functions are available to gain access +to the HTM specific special purpose registers. 
-vector bool short vec_vupklsb (vector bool char); -vector signed short vec_vupklsb (vector signed char); +@smallexample +unsigned long __builtin_get_texasr (void) +unsigned long __builtin_get_texasru (void) +unsigned long __builtin_get_tfhar (void) +unsigned long __builtin_get_tfiar (void) -vector float vec_xor (vector float, vector float); -vector float vec_xor (vector float, vector bool int); -vector float vec_xor (vector bool int, vector float); -vector bool int vec_xor (vector bool int, vector bool int); -vector signed int vec_xor (vector bool int, vector signed int); -vector signed int vec_xor (vector signed int, vector bool int); -vector signed int vec_xor (vector signed int, vector signed int); -vector unsigned int vec_xor (vector bool int, vector unsigned int); -vector unsigned int vec_xor (vector unsigned int, vector bool int); -vector unsigned int vec_xor (vector unsigned int, vector unsigned int); -vector bool short vec_xor (vector bool short, vector bool short); -vector signed short vec_xor (vector bool short, vector signed short); -vector signed short vec_xor (vector signed short, vector bool short); -vector signed short vec_xor (vector signed short, vector signed short); -vector unsigned short vec_xor (vector bool short, - vector unsigned short); -vector unsigned short vec_xor (vector unsigned short, - vector bool short); -vector unsigned short vec_xor (vector unsigned short, - vector unsigned short); -vector signed char vec_xor (vector bool char, vector signed char); -vector bool char vec_xor (vector bool char, vector bool char); -vector signed char vec_xor (vector signed char, vector bool char); -vector signed char vec_xor (vector signed char, vector signed char); -vector unsigned char vec_xor (vector bool char, vector unsigned char); -vector unsigned char vec_xor (vector unsigned char, vector bool char); -vector unsigned char vec_xor (vector unsigned char, - vector unsigned char); +void __builtin_set_texasr (unsigned long); +void 
__builtin_set_texasru (unsigned long); +void __builtin_set_tfhar (unsigned long); +void __builtin_set_tfiar (unsigned long); +@end smallexample -int vec_all_eq (vector signed char, vector bool char); -int vec_all_eq (vector signed char, vector signed char); -int vec_all_eq (vector unsigned char, vector bool char); -int vec_all_eq (vector unsigned char, vector unsigned char); -int vec_all_eq (vector bool char, vector bool char); -int vec_all_eq (vector bool char, vector unsigned char); -int vec_all_eq (vector bool char, vector signed char); -int vec_all_eq (vector signed short, vector bool short); -int vec_all_eq (vector signed short, vector signed short); -int vec_all_eq (vector unsigned short, vector bool short); -int vec_all_eq (vector unsigned short, vector unsigned short); -int vec_all_eq (vector bool short, vector bool short); -int vec_all_eq (vector bool short, vector unsigned short); -int vec_all_eq (vector bool short, vector signed short); -int vec_all_eq (vector pixel, vector pixel); -int vec_all_eq (vector signed int, vector bool int); -int vec_all_eq (vector signed int, vector signed int); -int vec_all_eq (vector unsigned int, vector bool int); -int vec_all_eq (vector unsigned int, vector unsigned int); -int vec_all_eq (vector bool int, vector bool int); -int vec_all_eq (vector bool int, vector unsigned int); -int vec_all_eq (vector bool int, vector signed int); -int vec_all_eq (vector float, vector float); +Example usage of these low level built-in functions may look like: -int vec_all_ge (vector bool char, vector unsigned char); -int vec_all_ge (vector unsigned char, vector bool char); -int vec_all_ge (vector unsigned char, vector unsigned char); -int vec_all_ge (vector bool char, vector signed char); -int vec_all_ge (vector signed char, vector bool char); -int vec_all_ge (vector signed char, vector signed char); -int vec_all_ge (vector bool short, vector unsigned short); -int vec_all_ge (vector unsigned short, vector bool short); -int vec_all_ge 
(vector unsigned short, vector unsigned short); -int vec_all_ge (vector signed short, vector signed short); -int vec_all_ge (vector bool short, vector signed short); -int vec_all_ge (vector signed short, vector bool short); -int vec_all_ge (vector bool int, vector unsigned int); -int vec_all_ge (vector unsigned int, vector bool int); -int vec_all_ge (vector unsigned int, vector unsigned int); -int vec_all_ge (vector bool int, vector signed int); -int vec_all_ge (vector signed int, vector bool int); -int vec_all_ge (vector signed int, vector signed int); -int vec_all_ge (vector float, vector float); +@smallexample +#include <htmintrin.h> -int vec_all_gt (vector bool char, vector unsigned char); -int vec_all_gt (vector unsigned char, vector bool char); -int vec_all_gt (vector unsigned char, vector unsigned char); -int vec_all_gt (vector bool char, vector signed char); -int vec_all_gt (vector signed char, vector bool char); -int vec_all_gt (vector signed char, vector signed char); -int vec_all_gt (vector bool short, vector unsigned short); -int vec_all_gt (vector unsigned short, vector bool short); -int vec_all_gt (vector unsigned short, vector unsigned short); -int vec_all_gt (vector bool short, vector signed short); -int vec_all_gt (vector signed short, vector bool short); -int vec_all_gt (vector signed short, vector signed short); -int vec_all_gt (vector bool int, vector unsigned int); -int vec_all_gt (vector unsigned int, vector bool int); -int vec_all_gt (vector unsigned int, vector unsigned int); -int vec_all_gt (vector bool int, vector signed int); -int vec_all_gt (vector signed int, vector bool int); -int vec_all_gt (vector signed int, vector signed int); -int vec_all_gt (vector float, vector float); +int num_retries = 10; + +while (1) + @{ + if (__builtin_tbegin (0)) + @{ + /* Transaction State Initiated. */ + if (is_locked (lock)) + __builtin_tabort (0); + ... transaction code... + __builtin_tend (0); + break; + @} + else + @{ + /* Transaction State Failed. 
Use locks if the transaction + failure is "persistent" or we've tried too many times. */ + if (num_retries-- <= 0 + || _TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ())) + @{ + acquire_lock (lock); + ... non transactional fallback path... + release_lock (lock); + break; + @} + @} + @} +@end smallexample -int vec_all_in (vector float, vector float); +One final built-in function has been added that returns the value of +the 2-bit Transaction State field of the Machine Status Register (MSR) +as stored in @code{CR0}. -int vec_all_le (vector bool char, vector unsigned char); -int vec_all_le (vector unsigned char, vector bool char); -int vec_all_le (vector unsigned char, vector unsigned char); -int vec_all_le (vector bool char, vector signed char); -int vec_all_le (vector signed char, vector bool char); -int vec_all_le (vector signed char, vector signed char); -int vec_all_le (vector bool short, vector unsigned short); -int vec_all_le (vector unsigned short, vector bool short); -int vec_all_le (vector unsigned short, vector unsigned short); -int vec_all_le (vector bool short, vector signed short); -int vec_all_le (vector signed short, vector bool short); -int vec_all_le (vector signed short, vector signed short); -int vec_all_le (vector bool int, vector unsigned int); -int vec_all_le (vector unsigned int, vector bool int); -int vec_all_le (vector unsigned int, vector unsigned int); -int vec_all_le (vector bool int, vector signed int); -int vec_all_le (vector signed int, vector bool int); -int vec_all_le (vector signed int, vector signed int); -int vec_all_le (vector float, vector float); +@smallexample +unsigned long __builtin_ttest (void) +@end smallexample -int vec_all_lt (vector bool char, vector unsigned char); -int vec_all_lt (vector unsigned char, vector bool char); -int vec_all_lt (vector unsigned char, vector unsigned char); -int vec_all_lt (vector bool char, vector signed char); -int vec_all_lt (vector signed char, vector bool char); -int vec_all_lt (vector 
signed char, vector signed char); -int vec_all_lt (vector bool short, vector unsigned short); -int vec_all_lt (vector unsigned short, vector bool short); -int vec_all_lt (vector unsigned short, vector unsigned short); -int vec_all_lt (vector bool short, vector signed short); -int vec_all_lt (vector signed short, vector bool short); -int vec_all_lt (vector signed short, vector signed short); -int vec_all_lt (vector bool int, vector unsigned int); -int vec_all_lt (vector unsigned int, vector bool int); -int vec_all_lt (vector unsigned int, vector unsigned int); -int vec_all_lt (vector bool int, vector signed int); -int vec_all_lt (vector signed int, vector bool int); -int vec_all_lt (vector signed int, vector signed int); -int vec_all_lt (vector float, vector float); +This built-in can be used to determine the current transaction state +using the following code example: -int vec_all_nan (vector float); +@smallexample +#include <htmintrin.h> -int vec_all_ne (vector signed char, vector bool char); -int vec_all_ne (vector signed char, vector signed char); -int vec_all_ne (vector unsigned char, vector bool char); -int vec_all_ne (vector unsigned char, vector unsigned char); -int vec_all_ne (vector bool char, vector bool char); -int vec_all_ne (vector bool char, vector unsigned char); -int vec_all_ne (vector bool char, vector signed char); -int vec_all_ne (vector signed short, vector bool short); -int vec_all_ne (vector signed short, vector signed short); -int vec_all_ne (vector unsigned short, vector bool short); -int vec_all_ne (vector unsigned short, vector unsigned short); -int vec_all_ne (vector bool short, vector bool short); -int vec_all_ne (vector bool short, vector unsigned short); -int vec_all_ne (vector bool short, vector signed short); -int vec_all_ne (vector pixel, vector pixel); -int vec_all_ne (vector signed int, vector bool int); -int vec_all_ne (vector signed int, vector signed int); -int vec_all_ne (vector unsigned int, vector bool int); -int vec_all_ne (vector 
unsigned int, vector unsigned int); -int vec_all_ne (vector bool int, vector bool int); -int vec_all_ne (vector bool int, vector unsigned int); -int vec_all_ne (vector bool int, vector signed int); -int vec_all_ne (vector float, vector float); +unsigned char tx_state = _HTM_STATE (__builtin_ttest ()); -int vec_all_nge (vector float, vector float); +if (tx_state == _HTM_TRANSACTIONAL) + @{ + /* Code to use in transactional state. */ + @} +else if (tx_state == _HTM_NONTRANSACTIONAL) + @{ + /* Code to use in non-transactional state. */ + @} +else if (tx_state == _HTM_SUSPENDED) + @{ + /* Code to use in transaction suspended state. */ + @} +@end smallexample -int vec_all_ngt (vector float, vector float); +@subsubsection PowerPC HTM High Level Inline Functions -int vec_all_nle (vector float, vector float); +The following high level HTM interface is made available by including +@code{<htmxlintrin.h>} and using @option{-mhtm} or @option{-mcpu=CPU} +where CPU is `power8' or later. This interface is common between PowerPC +and S/390, allowing users to write one HTM source implementation that +can be compiled and executed on either system. 
-int vec_all_nlt (vector float, vector float); +@smallexample +long __TM_simple_begin (void) +long __TM_begin (void* const TM_buff) +long __TM_end (void) +void __TM_abort (void) +void __TM_named_abort (unsigned char const code) +void __TM_resume (void) +void __TM_suspend (void) -int vec_all_numeric (vector float); +long __TM_is_user_abort (void* const TM_buff) +long __TM_is_named_user_abort (void* const TM_buff, unsigned char *code) +long __TM_is_illegal (void* const TM_buff) +long __TM_is_footprint_exceeded (void* const TM_buff) +long __TM_nesting_depth (void* const TM_buff) +long __TM_is_nested_too_deep(void* const TM_buff) +long __TM_is_conflict(void* const TM_buff) +long __TM_is_failure_persistent(void* const TM_buff) +long __TM_failure_address(void* const TM_buff) +long long __TM_failure_code(void* const TM_buff) +@end smallexample -int vec_any_eq (vector signed char, vector bool char); -int vec_any_eq (vector signed char, vector signed char); -int vec_any_eq (vector unsigned char, vector bool char); -int vec_any_eq (vector unsigned char, vector unsigned char); -int vec_any_eq (vector bool char, vector bool char); -int vec_any_eq (vector bool char, vector unsigned char); -int vec_any_eq (vector bool char, vector signed char); -int vec_any_eq (vector signed short, vector bool short); -int vec_any_eq (vector signed short, vector signed short); -int vec_any_eq (vector unsigned short, vector bool short); -int vec_any_eq (vector unsigned short, vector unsigned short); -int vec_any_eq (vector bool short, vector bool short); -int vec_any_eq (vector bool short, vector unsigned short); -int vec_any_eq (vector bool short, vector signed short); -int vec_any_eq (vector pixel, vector pixel); -int vec_any_eq (vector signed int, vector bool int); -int vec_any_eq (vector signed int, vector signed int); -int vec_any_eq (vector unsigned int, vector bool int); -int vec_any_eq (vector unsigned int, vector unsigned int); -int vec_any_eq (vector bool int, vector bool int); -int 
vec_any_eq (vector bool int, vector unsigned int); -int vec_any_eq (vector bool int, vector signed int); -int vec_any_eq (vector float, vector float); +Using these common set of HTM inline functions, we can create +a more portable version of the HTM example in the previous +section that will work on either PowerPC or S/390: -int vec_any_ge (vector signed char, vector bool char); -int vec_any_ge (vector unsigned char, vector bool char); -int vec_any_ge (vector unsigned char, vector unsigned char); -int vec_any_ge (vector signed char, vector signed char); -int vec_any_ge (vector bool char, vector unsigned char); -int vec_any_ge (vector bool char, vector signed char); -int vec_any_ge (vector unsigned short, vector bool short); -int vec_any_ge (vector unsigned short, vector unsigned short); -int vec_any_ge (vector signed short, vector signed short); -int vec_any_ge (vector signed short, vector bool short); -int vec_any_ge (vector bool short, vector unsigned short); -int vec_any_ge (vector bool short, vector signed short); -int vec_any_ge (vector signed int, vector bool int); -int vec_any_ge (vector unsigned int, vector bool int); -int vec_any_ge (vector unsigned int, vector unsigned int); -int vec_any_ge (vector signed int, vector signed int); -int vec_any_ge (vector bool int, vector unsigned int); -int vec_any_ge (vector bool int, vector signed int); -int vec_any_ge (vector float, vector float); +@smallexample +#include <htmxlintrin.h> -int vec_any_gt (vector bool char, vector unsigned char); -int vec_any_gt (vector unsigned char, vector bool char); -int vec_any_gt (vector unsigned char, vector unsigned char); -int vec_any_gt (vector bool char, vector signed char); -int vec_any_gt (vector signed char, vector bool char); -int vec_any_gt (vector signed char, vector signed char); -int vec_any_gt (vector bool short, vector unsigned short); -int vec_any_gt (vector unsigned short, vector bool short); -int vec_any_gt (vector unsigned short, vector unsigned short); -int vec_any_gt (vector 
bool short, vector signed short); -int vec_any_gt (vector signed short, vector bool short); -int vec_any_gt (vector signed short, vector signed short); -int vec_any_gt (vector bool int, vector unsigned int); -int vec_any_gt (vector unsigned int, vector bool int); -int vec_any_gt (vector unsigned int, vector unsigned int); -int vec_any_gt (vector bool int, vector signed int); -int vec_any_gt (vector signed int, vector bool int); -int vec_any_gt (vector signed int, vector signed int); -int vec_any_gt (vector float, vector float); +int num_retries = 10; +TM_buff_type TM_buff; -int vec_any_le (vector bool char, vector unsigned char); -int vec_any_le (vector unsigned char, vector bool char); -int vec_any_le (vector unsigned char, vector unsigned char); -int vec_any_le (vector bool char, vector signed char); -int vec_any_le (vector signed char, vector bool char); -int vec_any_le (vector signed char, vector signed char); -int vec_any_le (vector bool short, vector unsigned short); -int vec_any_le (vector unsigned short, vector bool short); -int vec_any_le (vector unsigned short, vector unsigned short); -int vec_any_le (vector bool short, vector signed short); -int vec_any_le (vector signed short, vector bool short); -int vec_any_le (vector signed short, vector signed short); -int vec_any_le (vector bool int, vector unsigned int); -int vec_any_le (vector unsigned int, vector bool int); -int vec_any_le (vector unsigned int, vector unsigned int); -int vec_any_le (vector bool int, vector signed int); -int vec_any_le (vector signed int, vector bool int); -int vec_any_le (vector signed int, vector signed int); -int vec_any_le (vector float, vector float); +while (1) + @{ + if (__TM_begin (TM_buff)) + @{ + /* Transaction State Initiated. */ + if (is_locked (lock)) + __TM_abort (); + ... transaction code... + __TM_end (); + break; + @} + else + @{ + /* Transaction State Failed. Use locks if the transaction + failure is "persistent" or we've tried too many times. 
*/ + if (num_retries-- <= 0 + || __TM_is_failure_persistent (TM_buff)) + @{ + acquire_lock (lock); + ... non transactional fallback path... + release_lock (lock); + break; + @} + @} + @} +@end smallexample -int vec_any_lt (vector bool char, vector unsigned char); -int vec_any_lt (vector unsigned char, vector bool char); -int vec_any_lt (vector unsigned char, vector unsigned char); -int vec_any_lt (vector bool char, vector signed char); -int vec_any_lt (vector signed char, vector bool char); -int vec_any_lt (vector signed char, vector signed char); -int vec_any_lt (vector bool short, vector unsigned short); -int vec_any_lt (vector unsigned short, vector bool short); -int vec_any_lt (vector unsigned short, vector unsigned short); -int vec_any_lt (vector bool short, vector signed short); -int vec_any_lt (vector signed short, vector bool short); -int vec_any_lt (vector signed short, vector signed short); -int vec_any_lt (vector bool int, vector unsigned int); -int vec_any_lt (vector unsigned int, vector bool int); -int vec_any_lt (vector unsigned int, vector unsigned int); -int vec_any_lt (vector bool int, vector signed int); -int vec_any_lt (vector signed int, vector bool int); -int vec_any_lt (vector signed int, vector signed int); -int vec_any_lt (vector float, vector float); +@node RX Built-in Functions +@subsection RX Built-in Functions +GCC supports some of the RX instructions which cannot be expressed in +the C programming language via the use of built-in functions. The +following functions are supported: -int vec_any_nan (vector float); +@deftypefn {Built-in Function} void __builtin_rx_brk (void) +Generates the @code{brk} machine instruction. 
+@end deftypefn -int vec_any_ne (vector signed char, vector bool char); -int vec_any_ne (vector signed char, vector signed char); -int vec_any_ne (vector unsigned char, vector bool char); -int vec_any_ne (vector unsigned char, vector unsigned char); -int vec_any_ne (vector bool char, vector bool char); -int vec_any_ne (vector bool char, vector unsigned char); -int vec_any_ne (vector bool char, vector signed char); -int vec_any_ne (vector signed short, vector bool short); -int vec_any_ne (vector signed short, vector signed short); -int vec_any_ne (vector unsigned short, vector bool short); -int vec_any_ne (vector unsigned short, vector unsigned short); -int vec_any_ne (vector bool short, vector bool short); -int vec_any_ne (vector bool short, vector unsigned short); -int vec_any_ne (vector bool short, vector signed short); -int vec_any_ne (vector pixel, vector pixel); -int vec_any_ne (vector signed int, vector bool int); -int vec_any_ne (vector signed int, vector signed int); -int vec_any_ne (vector unsigned int, vector bool int); -int vec_any_ne (vector unsigned int, vector unsigned int); -int vec_any_ne (vector bool int, vector bool int); -int vec_any_ne (vector bool int, vector unsigned int); -int vec_any_ne (vector bool int, vector signed int); -int vec_any_ne (vector float, vector float); +@deftypefn {Built-in Function} void __builtin_rx_clrpsw (int) +Generates the @code{clrpsw} machine instruction to clear the specified +bit in the processor status word. +@end deftypefn -int vec_any_nge (vector float, vector float); +@deftypefn {Built-in Function} void __builtin_rx_int (int) +Generates the @code{int} machine instruction to generate an interrupt +with the specified value. +@end deftypefn -int vec_any_ngt (vector float, vector float); +@deftypefn {Built-in Function} void __builtin_rx_machi (int, int) +Generates the @code{machi} machine instruction to add the result of +multiplying the top 16 bits of the two arguments into the +accumulator. 
+@end deftypefn -int vec_any_nle (vector float, vector float); +@deftypefn {Built-in Function} void __builtin_rx_maclo (int, int) +Generates the @code{maclo} machine instruction to add the result of +multiplying the bottom 16 bits of the two arguments into the +accumulator. +@end deftypefn -int vec_any_nlt (vector float, vector float); +@deftypefn {Built-in Function} void __builtin_rx_mulhi (int, int) +Generates the @code{mulhi} machine instruction to place the result of +multiplying the top 16 bits of the two arguments into the +accumulator. +@end deftypefn -int vec_any_numeric (vector float); +@deftypefn {Built-in Function} void __builtin_rx_mullo (int, int) +Generates the @code{mullo} machine instruction to place the result of +multiplying the bottom 16 bits of the two arguments into the +accumulator. +@end deftypefn -int vec_any_out (vector float, vector float); -@end smallexample +@deftypefn {Built-in Function} int __builtin_rx_mvfachi (void) +Generates the @code{mvfachi} machine instruction to read the top +32 bits of the accumulator. +@end deftypefn -If the vector/scalar (VSX) instruction set is available, the following -additional functions are available: +@deftypefn {Built-in Function} int __builtin_rx_mvfacmi (void) +Generates the @code{mvfacmi} machine instruction to read the middle +32 bits of the accumulator. 
+@end deftypefn -@smallexample -vector double vec_abs (vector double); -vector double vec_add (vector double, vector double); -vector double vec_and (vector double, vector double); -vector double vec_and (vector double, vector bool long); -vector double vec_and (vector bool long, vector double); -vector long vec_and (vector long, vector long); -vector long vec_and (vector long, vector bool long); -vector long vec_and (vector bool long, vector long); -vector unsigned long vec_and (vector unsigned long, vector unsigned long); -vector unsigned long vec_and (vector unsigned long, vector bool long); -vector unsigned long vec_and (vector bool long, vector unsigned long); -vector double vec_andc (vector double, vector double); -vector double vec_andc (vector double, vector bool long); -vector double vec_andc (vector bool long, vector double); -vector long vec_andc (vector long, vector long); -vector long vec_andc (vector long, vector bool long); -vector long vec_andc (vector bool long, vector long); -vector unsigned long vec_andc (vector unsigned long, vector unsigned long); -vector unsigned long vec_andc (vector unsigned long, vector bool long); -vector unsigned long vec_andc (vector bool long, vector unsigned long); -vector double vec_ceil (vector double); -vector bool long vec_cmpeq (vector double, vector double); -vector bool long vec_cmpge (vector double, vector double); -vector bool long vec_cmpgt (vector double, vector double); -vector bool long vec_cmple (vector double, vector double); -vector bool long vec_cmplt (vector double, vector double); -vector double vec_cpsgn (vector double, vector double); -vector float vec_div (vector float, vector float); -vector double vec_div (vector double, vector double); -vector long vec_div (vector long, vector long); -vector unsigned long vec_div (vector unsigned long, vector unsigned long); -vector double vec_floor (vector double); -vector double vec_ld (int, const vector double *); -vector double vec_ld (int, const double *); 
-vector double vec_ldl (int, const vector double *); -vector double vec_ldl (int, const double *); -vector unsigned char vec_lvsl (int, const volatile double *); -vector unsigned char vec_lvsr (int, const volatile double *); -vector double vec_madd (vector double, vector double, vector double); -vector double vec_max (vector double, vector double); -vector signed long vec_mergeh (vector signed long, vector signed long); -vector signed long vec_mergeh (vector signed long, vector bool long); -vector signed long vec_mergeh (vector bool long, vector signed long); -vector unsigned long vec_mergeh (vector unsigned long, vector unsigned long); -vector unsigned long vec_mergeh (vector unsigned long, vector bool long); -vector unsigned long vec_mergeh (vector bool long, vector unsigned long); -vector signed long vec_mergel (vector signed long, vector signed long); -vector signed long vec_mergel (vector signed long, vector bool long); -vector signed long vec_mergel (vector bool long, vector signed long); -vector unsigned long vec_mergel (vector unsigned long, vector unsigned long); -vector unsigned long vec_mergel (vector unsigned long, vector bool long); -vector unsigned long vec_mergel (vector bool long, vector unsigned long); -vector double vec_min (vector double, vector double); -vector float vec_msub (vector float, vector float, vector float); -vector double vec_msub (vector double, vector double, vector double); -vector float vec_mul (vector float, vector float); -vector double vec_mul (vector double, vector double); -vector long vec_mul (vector long, vector long); -vector unsigned long vec_mul (vector unsigned long, vector unsigned long); -vector float vec_nearbyint (vector float); -vector double vec_nearbyint (vector double); -vector float vec_nmadd (vector float, vector float, vector float); -vector double vec_nmadd (vector double, vector double, vector double); -vector double vec_nmsub (vector double, vector double, vector double); -vector double vec_nor (vector 
double, vector double); -vector long vec_nor (vector long, vector long); -vector long vec_nor (vector long, vector bool long); -vector long vec_nor (vector bool long, vector long); -vector unsigned long vec_nor (vector unsigned long, vector unsigned long); -vector unsigned long vec_nor (vector unsigned long, vector bool long); -vector unsigned long vec_nor (vector bool long, vector unsigned long); -vector double vec_or (vector double, vector double); -vector double vec_or (vector double, vector bool long); -vector double vec_or (vector bool long, vector double); -vector long vec_or (vector long, vector long); -vector long vec_or (vector long, vector bool long); -vector long vec_or (vector bool long, vector long); -vector unsigned long vec_or (vector unsigned long, vector unsigned long); -vector unsigned long vec_or (vector unsigned long, vector bool long); -vector unsigned long vec_or (vector bool long, vector unsigned long); -vector double vec_perm (vector double, vector double, vector unsigned char); -vector long vec_perm (vector long, vector long, vector unsigned char); -vector unsigned long vec_perm (vector unsigned long, vector unsigned long, - vector unsigned char); -vector double vec_rint (vector double); -vector double vec_recip (vector double, vector double); -vector double vec_rsqrt (vector double); -vector double vec_rsqrte (vector double); -vector double vec_sel (vector double, vector double, vector bool long); -vector double vec_sel (vector double, vector double, vector unsigned long); -vector long vec_sel (vector long, vector long, vector long); -vector long vec_sel (vector long, vector long, vector unsigned long); -vector long vec_sel (vector long, vector long, vector bool long); -vector unsigned long vec_sel (vector unsigned long, vector unsigned long, - vector long); -vector unsigned long vec_sel (vector unsigned long, vector unsigned long, - vector unsigned long); -vector unsigned long vec_sel (vector unsigned long, vector unsigned long, - vector 
bool long); -vector double vec_splats (double); -vector signed long vec_splats (signed long); -vector unsigned long vec_splats (unsigned long); -vector float vec_sqrt (vector float); -vector double vec_sqrt (vector double); -void vec_st (vector double, int, vector double *); -void vec_st (vector double, int, double *); -vector double vec_sub (vector double, vector double); -vector double vec_trunc (vector double); -vector double vec_xor (vector double, vector double); -vector double vec_xor (vector double, vector bool long); -vector double vec_xor (vector bool long, vector double); -vector long vec_xor (vector long, vector long); -vector long vec_xor (vector long, vector bool long); -vector long vec_xor (vector bool long, vector long); -vector unsigned long vec_xor (vector unsigned long, vector unsigned long); -vector unsigned long vec_xor (vector unsigned long, vector bool long); -vector unsigned long vec_xor (vector bool long, vector unsigned long); -int vec_all_eq (vector double, vector double); -int vec_all_ge (vector double, vector double); -int vec_all_gt (vector double, vector double); -int vec_all_le (vector double, vector double); -int vec_all_lt (vector double, vector double); -int vec_all_nan (vector double); -int vec_all_ne (vector double, vector double); -int vec_all_nge (vector double, vector double); -int vec_all_ngt (vector double, vector double); -int vec_all_nle (vector double, vector double); -int vec_all_nlt (vector double, vector double); -int vec_all_numeric (vector double); -int vec_any_eq (vector double, vector double); -int vec_any_ge (vector double, vector double); -int vec_any_gt (vector double, vector double); -int vec_any_le (vector double, vector double); -int vec_any_lt (vector double, vector double); -int vec_any_nan (vector double); -int vec_any_ne (vector double, vector double); -int vec_any_nge (vector double, vector double); -int vec_any_ngt (vector double, vector double); -int vec_any_nle (vector double, vector double); -int 
vec_any_nlt (vector double, vector double); -int vec_any_numeric (vector double); +@deftypefn {Built-in Function} int __builtin_rx_mvfc (int) +Generates the @code{mvfc} machine instruction which reads the control +register specified in its argument and returns its value. +@end deftypefn + +@deftypefn {Built-in Function} void __builtin_rx_mvtachi (int) +Generates the @code{mvtachi} machine instruction to set the top +32 bits of the accumulator. +@end deftypefn + +@deftypefn {Built-in Function} void __builtin_rx_mvtaclo (int) +Generates the @code{mvtaclo} machine instruction to set the bottom +32 bits of the accumulator. +@end deftypefn + +@deftypefn {Built-in Function} void __builtin_rx_mvtc (int reg, int val) +Generates the @code{mvtc} machine instruction which sets control +register number @code{reg} to @code{val}. +@end deftypefn -vector double vec_vsx_ld (int, const vector double *); -vector double vec_vsx_ld (int, const double *); -vector float vec_vsx_ld (int, const vector float *); -vector float vec_vsx_ld (int, const float *); -vector bool int vec_vsx_ld (int, const vector bool int *); -vector signed int vec_vsx_ld (int, const vector signed int *); -vector signed int vec_vsx_ld (int, const int *); -vector signed int vec_vsx_ld (int, const long *); -vector unsigned int vec_vsx_ld (int, const vector unsigned int *); -vector unsigned int vec_vsx_ld (int, const unsigned int *); -vector unsigned int vec_vsx_ld (int, const unsigned long *); -vector bool short vec_vsx_ld (int, const vector bool short *); -vector pixel vec_vsx_ld (int, const vector pixel *); -vector signed short vec_vsx_ld (int, const vector signed short *); -vector signed short vec_vsx_ld (int, const short *); -vector unsigned short vec_vsx_ld (int, const vector unsigned short *); -vector unsigned short vec_vsx_ld (int, const unsigned short *); -vector bool char vec_vsx_ld (int, const vector bool char *); -vector signed char vec_vsx_ld (int, const vector signed char *); -vector signed char 
vec_vsx_ld (int, const signed char *); -vector unsigned char vec_vsx_ld (int, const vector unsigned char *); -vector unsigned char vec_vsx_ld (int, const unsigned char *); +@deftypefn {Built-in Function} void __builtin_rx_mvtipl (int) +Generates the @code{mvtipl} machine instruction set the interrupt +priority level. +@end deftypefn -void vec_vsx_st (vector double, int, vector double *); -void vec_vsx_st (vector double, int, double *); -void vec_vsx_st (vector float, int, vector float *); -void vec_vsx_st (vector float, int, float *); -void vec_vsx_st (vector signed int, int, vector signed int *); -void vec_vsx_st (vector signed int, int, int *); -void vec_vsx_st (vector unsigned int, int, vector unsigned int *); -void vec_vsx_st (vector unsigned int, int, unsigned int *); -void vec_vsx_st (vector bool int, int, vector bool int *); -void vec_vsx_st (vector bool int, int, unsigned int *); -void vec_vsx_st (vector bool int, int, int *); -void vec_vsx_st (vector signed short, int, vector signed short *); -void vec_vsx_st (vector signed short, int, short *); -void vec_vsx_st (vector unsigned short, int, vector unsigned short *); -void vec_vsx_st (vector unsigned short, int, unsigned short *); -void vec_vsx_st (vector bool short, int, vector bool short *); -void vec_vsx_st (vector bool short, int, unsigned short *); -void vec_vsx_st (vector pixel, int, vector pixel *); -void vec_vsx_st (vector pixel, int, unsigned short *); -void vec_vsx_st (vector pixel, int, short *); -void vec_vsx_st (vector bool short, int, short *); -void vec_vsx_st (vector signed char, int, vector signed char *); -void vec_vsx_st (vector signed char, int, signed char *); -void vec_vsx_st (vector unsigned char, int, vector unsigned char *); -void vec_vsx_st (vector unsigned char, int, unsigned char *); -void vec_vsx_st (vector bool char, int, vector bool char *); -void vec_vsx_st (vector bool char, int, unsigned char *); -void vec_vsx_st (vector bool char, int, signed char *); +@deftypefn {Built-in 
Function} void __builtin_rx_racw (int) +Generates the @code{racw} machine instruction to round the accumulator +according to the specified mode. +@end deftypefn -vector double vec_xxpermdi (vector double, vector double, int); -vector float vec_xxpermdi (vector float, vector float, int); -vector long long vec_xxpermdi (vector long long, vector long long, int); -vector unsigned long long vec_xxpermdi (vector unsigned long long, - vector unsigned long long, int); -vector int vec_xxpermdi (vector int, vector int, int); -vector unsigned int vec_xxpermdi (vector unsigned int, - vector unsigned int, int); -vector short vec_xxpermdi (vector short, vector short, int); -vector unsigned short vec_xxpermdi (vector unsigned short, - vector unsigned short, int); -vector signed char vec_xxpermdi (vector signed char, vector signed char, int); -vector unsigned char vec_xxpermdi (vector unsigned char, - vector unsigned char, int); +@deftypefn {Built-in Function} int __builtin_rx_revw (int) +Generates the @code{revw} machine instruction which swaps the bytes in +the argument so that bits 0--7 now occupy bits 8--15 and vice versa, +and also bits 16--23 occupy bits 24--31 and vice versa. 
+@end deftypefn -vector double vec_xxsldi (vector double, vector double, int); -vector float vec_xxsldi (vector float, vector float, int); -vector long long vec_xxsldi (vector long long, vector long long, int); -vector unsigned long long vec_xxsldi (vector unsigned long long, - vector unsigned long long, int); -vector int vec_xxsldi (vector int, vector int, int); -vector unsigned int vec_xxsldi (vector unsigned int, vector unsigned int, int); -vector short vec_xxsldi (vector short, vector short, int); -vector unsigned short vec_xxsldi (vector unsigned short, - vector unsigned short, int); -vector signed char vec_xxsldi (vector signed char, vector signed char, int); -vector unsigned char vec_xxsldi (vector unsigned char, - vector unsigned char, int); -@end smallexample +@deftypefn {Built-in Function} void __builtin_rx_rmpa (void) +Generates the @code{rmpa} machine instruction which initiates a +repeated multiply and accumulate sequence. +@end deftypefn -Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even -if the VSX instruction set is available. The @samp{vec_vsx_ld} and -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X}, -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions. +@deftypefn {Built-in Function} void __builtin_rx_round (float) +Generates the @code{round} machine instruction which returns the +floating-point argument rounded according to the current rounding mode +set in the floating-point status word register. +@end deftypefn -If the ISA 2.07 additions to the vector/scalar (power8-vector) -instruction set is available, the following additional functions are -available for both 32-bit and 64-bit targets. For 64-bit targets, you -can use @var{vector long} instead of @var{vector long long}, -@var{vector bool long} instead of @var{vector bool long long}, and -@var{vector unsigned long} instead of @var{vector unsigned long long}. 
+@deftypefn {Built-in Function} int __builtin_rx_sat (int) +Generates the @code{sat} machine instruction which returns the +saturated value of the argument. +@end deftypefn -@smallexample -vector long long vec_abs (vector long long); +@deftypefn {Built-in Function} void __builtin_rx_setpsw (int) +Generates the @code{setpsw} machine instruction to set the specified +bit in the processor status word. +@end deftypefn -vector long long vec_add (vector long long, vector long long); -vector unsigned long long vec_add (vector unsigned long long, - vector unsigned long long); +@deftypefn {Built-in Function} void __builtin_rx_wait (void) +Generates the @code{wait} machine instruction. +@end deftypefn -int vec_all_eq (vector long long, vector long long); -int vec_all_eq (vector unsigned long long, vector unsigned long long); -int vec_all_ge (vector long long, vector long long); -int vec_all_ge (vector unsigned long long, vector unsigned long long); -int vec_all_gt (vector long long, vector long long); -int vec_all_gt (vector unsigned long long, vector unsigned long long); -int vec_all_le (vector long long, vector long long); -int vec_all_le (vector unsigned long long, vector unsigned long long); -int vec_all_lt (vector long long, vector long long); -int vec_all_lt (vector unsigned long long, vector unsigned long long); -int vec_all_ne (vector long long, vector long long); -int vec_all_ne (vector unsigned long long, vector unsigned long long); +@node S/390 System z Built-in Functions +@subsection S/390 System z Built-in Functions +@deftypefn {Built-in Function} int __builtin_tbegin (void*) +Generates the @code{tbegin} machine instruction starting a +non-constraint hardware transaction. If the parameter is non-NULL the +memory area is used to store the transaction diagnostic buffer and +will be passed as first operand to @code{tbegin}. 
This buffer can be +defined using the @code{struct __htm_tdb} C struct defined in +@code{htmintrin.h} and must reside on a double-word boundary. The +second tbegin operand is set to @code{0xff0c}. This enables +save/restore of all GPRs and disables aborts for FPR and AR +manipulations inside the transaction body. The condition code set by +the tbegin instruction is returned as integer value. The tbegin +instruction by definition overwrites the content of all FPRs. The +compiler will generate code which saves and restores the FPRs. For +soft-float code it is recommended to use the @code{*_nofloat} +variant. In order to prevent a TDB from being written it is required +to pass a constant zero value as parameter. Passing the zero value +through a variable is not sufficient. Although modifications of +access registers inside the transaction will not trigger a +transaction abort it is not supported to actually modify them. Access +registers do not get saved when entering a transaction. They will have +undefined state when reaching the abort code.
+@end deftypefn -int vec_any_eq (vector long long, vector long long); -int vec_any_eq (vector unsigned long long, vector unsigned long long); -int vec_any_ge (vector long long, vector long long); -int vec_any_ge (vector unsigned long long, vector unsigned long long); -int vec_any_gt (vector long long, vector long long); -int vec_any_gt (vector unsigned long long, vector unsigned long long); -int vec_any_le (vector long long, vector long long); -int vec_any_le (vector unsigned long long, vector unsigned long long); -int vec_any_lt (vector long long, vector long long); -int vec_any_lt (vector unsigned long long, vector unsigned long long); -int vec_any_ne (vector long long, vector long long); -int vec_any_ne (vector unsigned long long, vector unsigned long long); +Macros for the possible return codes of tbegin are defined in the +@code{htmintrin.h} header file: -vector long long vec_eqv (vector long long, vector long long); -vector long long vec_eqv (vector bool long long, vector long long); -vector long long vec_eqv (vector long long, vector bool long long); -vector unsigned long long vec_eqv (vector unsigned long long, - vector unsigned long long); -vector unsigned long long vec_eqv (vector bool long long, - vector unsigned long long); -vector unsigned long long vec_eqv (vector unsigned long long, - vector bool long long); -vector int vec_eqv (vector int, vector int); -vector int vec_eqv (vector bool int, vector int); -vector int vec_eqv (vector int, vector bool int); -vector unsigned int vec_eqv (vector unsigned int, vector unsigned int); -vector unsigned int vec_eqv (vector bool unsigned int, - vector unsigned int); -vector unsigned int vec_eqv (vector unsigned int, - vector bool unsigned int); -vector short vec_eqv (vector short, vector short); -vector short vec_eqv (vector bool short, vector short); -vector short vec_eqv (vector short, vector bool short); -vector unsigned short vec_eqv (vector unsigned short, vector unsigned short); -vector unsigned short 
vec_eqv (vector bool unsigned short, - vector unsigned short); -vector unsigned short vec_eqv (vector unsigned short, - vector bool unsigned short); -vector signed char vec_eqv (vector signed char, vector signed char); -vector signed char vec_eqv (vector bool signed char, vector signed char); -vector signed char vec_eqv (vector signed char, vector bool signed char); -vector unsigned char vec_eqv (vector unsigned char, vector unsigned char); -vector unsigned char vec_eqv (vector bool unsigned char, vector unsigned char); -vector unsigned char vec_eqv (vector unsigned char, vector bool unsigned char); +@table @code +@item _HTM_TBEGIN_STARTED +@code{tbegin} has been executed as part of normal processing. The +transaction body is supposed to be executed. +@item _HTM_TBEGIN_INDETERMINATE +The transaction was aborted due to an indeterminate condition which +might be persistent. +@item _HTM_TBEGIN_TRANSIENT +The transaction aborted due to a transient failure. The transaction +should be re-executed in that case. +@item _HTM_TBEGIN_PERSISTENT +The transaction aborted due to a persistent failure. Re-execution +under same circumstances will not be productive. +@end table + +@defmac _HTM_FIRST_USER_ABORT_CODE +The @code{_HTM_FIRST_USER_ABORT_CODE} defined in @code{htmintrin.h} +specifies the first abort code which can be used for +@code{__builtin_tabort}. Values below this threshold are reserved for +machine use. +@end defmac + +@deftp {Data type} {struct __htm_tdb} +The @code{struct __htm_tdb} defined in @code{htmintrin.h} describes +the structure of the transaction diagnostic block as specified in the +Principles of Operation manual chapter 5-91. +@end deftp + +@deftypefn {Built-in Function} int __builtin_tbegin_nofloat (void*) +Same as @code{__builtin_tbegin} but without FPR saves and restores. +Using this variant in code making use of FPRs will leave the FPRs in +undefined state when entering the transaction abort handler code. 
+@end deftypefn + +@deftypefn {Built-in Function} int __builtin_tbegin_retry (void*, int) +In addition to @code{__builtin_tbegin} a loop for transient failures +is generated. If tbegin returns a condition code of 2 the transaction +will be retried as often as specified in the second argument. The +perform processor assist instruction is used to tell the CPU about the +number of fails so far. +@end deftypefn + +@deftypefn {Built-in Function} int __builtin_tbegin_retry_nofloat (void*, int) +Same as @code{__builtin_tbegin_retry} but without FPR saves and +restores. Using this variant in code making use of FPRs will leave +the FPRs in undefined state when entering the transaction abort +handler code. +@end deftypefn -vector long long vec_max (vector long long, vector long long); -vector unsigned long long vec_max (vector unsigned long long, - vector unsigned long long); +@deftypefn {Built-in Function} void __builtin_tbeginc (void) +Generates the @code{tbeginc} machine instruction starting a constraint +hardware transaction. The second operand is set to @code{0xff08}. +@end deftypefn -vector signed int vec_mergee (vector signed int, vector signed int); -vector unsigned int vec_mergee (vector unsigned int, vector unsigned int); -vector bool int vec_mergee (vector bool int, vector bool int); +@deftypefn {Built-in Function} int __builtin_tend (void) +Generates the @code{tend} machine instruction finishing a transaction +and making the changes visible to other threads. The condition code +generated by tend is returned as integer value. +@end deftypefn -vector signed int vec_mergeo (vector signed int, vector signed int); -vector unsigned int vec_mergeo (vector unsigned int, vector unsigned int); -vector bool int vec_mergeo (vector bool int, vector bool int); +@deftypefn {Built-in Function} void __builtin_tabort (int) +Generates the @code{tabort} machine instruction with the specified +abort code. 
Abort codes from 0 through 255 are reserved and will +result in an error message. +@end deftypefn -vector long long vec_min (vector long long, vector long long); -vector unsigned long long vec_min (vector unsigned long long, - vector unsigned long long); +@deftypefn {Built-in Function} void __builtin_tx_assist (int) +Generates the @code{ppa rX,rY,1} machine instruction. Where the +integer parameter is loaded into rX and a value of zero is loaded into +rY. The integer parameter specifies the number of times the +transaction repeatedly aborted. +@end deftypefn -vector long long vec_nand (vector long long, vector long long); -vector long long vec_nand (vector bool long long, vector long long); -vector long long vec_nand (vector long long, vector bool long long); -vector unsigned long long vec_nand (vector unsigned long long, - vector unsigned long long); -vector unsigned long long vec_nand (vector bool long long, - vector unsigned long long); -vector unsigned long long vec_nand (vector unsigned long long, - vector bool long long); -vector int vec_nand (vector int, vector int); -vector int vec_nand (vector bool int, vector int); -vector int vec_nand (vector int, vector bool int); -vector unsigned int vec_nand (vector unsigned int, vector unsigned int); -vector unsigned int vec_nand (vector bool unsigned int, - vector unsigned int); -vector unsigned int vec_nand (vector unsigned int, - vector bool unsigned int); -vector short vec_nand (vector short, vector short); -vector short vec_nand (vector bool short, vector short); -vector short vec_nand (vector short, vector bool short); -vector unsigned short vec_nand (vector unsigned short, vector unsigned short); -vector unsigned short vec_nand (vector bool unsigned short, - vector unsigned short); -vector unsigned short vec_nand (vector unsigned short, - vector bool unsigned short); -vector signed char vec_nand (vector signed char, vector signed char); -vector signed char vec_nand (vector bool signed char, vector signed 
char); -vector signed char vec_nand (vector signed char, vector bool signed char); -vector unsigned char vec_nand (vector unsigned char, vector unsigned char); -vector unsigned char vec_nand (vector bool unsigned char, vector unsigned char); -vector unsigned char vec_nand (vector unsigned char, vector bool unsigned char); +@deftypefn {Built-in Function} int __builtin_tx_nesting_depth (void) +Generates the @code{etnd} machine instruction. The current nesting +depth is returned as integer value. For a nesting depth of 0 the code +is not executed as part of an transaction. +@end deftypefn -vector long long vec_orc (vector long long, vector long long); -vector long long vec_orc (vector bool long long, vector long long); -vector long long vec_orc (vector long long, vector bool long long); -vector unsigned long long vec_orc (vector unsigned long long, - vector unsigned long long); -vector unsigned long long vec_orc (vector bool long long, - vector unsigned long long); -vector unsigned long long vec_orc (vector unsigned long long, - vector bool long long); -vector int vec_orc (vector int, vector int); -vector int vec_orc (vector bool int, vector int); -vector int vec_orc (vector int, vector bool int); -vector unsigned int vec_orc (vector unsigned int, vector unsigned int); -vector unsigned int vec_orc (vector bool unsigned int, - vector unsigned int); -vector unsigned int vec_orc (vector unsigned int, - vector bool unsigned int); -vector short vec_orc (vector short, vector short); -vector short vec_orc (vector bool short, vector short); -vector short vec_orc (vector short, vector bool short); -vector unsigned short vec_orc (vector unsigned short, vector unsigned short); -vector unsigned short vec_orc (vector bool unsigned short, - vector unsigned short); -vector unsigned short vec_orc (vector unsigned short, - vector bool unsigned short); -vector signed char vec_orc (vector signed char, vector signed char); -vector signed char vec_orc (vector bool signed char, vector 
signed char); -vector signed char vec_orc (vector signed char, vector bool signed char); -vector unsigned char vec_orc (vector unsigned char, vector unsigned char); -vector unsigned char vec_orc (vector bool unsigned char, vector unsigned char); -vector unsigned char vec_orc (vector unsigned char, vector bool unsigned char); +@deftypefn {Built-in Function} void __builtin_non_tx_store (uint64_t *, uint64_t) -vector int vec_pack (vector long long, vector long long); -vector unsigned int vec_pack (vector unsigned long long, - vector unsigned long long); -vector bool int vec_pack (vector bool long long, vector bool long long); +Generates the @code{ntstg} machine instruction. The second argument +is written to the first arguments location. The store operation will +not be rolled-back in case of an transaction abort. +@end deftypefn -vector int vec_packs (vector long long, vector long long); -vector unsigned int vec_packs (vector unsigned long long, - vector unsigned long long); +@node SH Built-in Functions +@subsection SH Built-in Functions +The following built-in functions are supported on the SH1, SH2, SH3 and SH4 +families of processors: -vector unsigned int vec_packsu (vector long long, vector long long); -vector unsigned int vec_packsu (vector unsigned long long, - vector unsigned long long); +@deftypefn {Built-in Function} {void} __builtin_set_thread_pointer (void *@var{ptr}) +Sets the @samp{GBR} register to the specified value @var{ptr}. This is usually +used by system code that manages threads and execution contexts. The compiler +normally does not generate code that modifies the contents of @samp{GBR} and +thus the value is preserved across function calls. Changing the @samp{GBR} +value in user code must be done with caution, since the compiler might use +@samp{GBR} in order to access thread local variables. 
-vector long long vec_rl (vector long long, - vector unsigned long long); -vector long long vec_rl (vector unsigned long long, - vector unsigned long long); +@end deftypefn -vector long long vec_sl (vector long long, vector unsigned long long); -vector long long vec_sl (vector unsigned long long, - vector unsigned long long); +@deftypefn {Built-in Function} {void *} __builtin_thread_pointer (void) +Returns the value that is currently set in the @samp{GBR} register. +Memory loads and stores that use the thread pointer as a base address are +turned into @samp{GBR} based displacement loads and stores, if possible. +For example: +@smallexample +struct my_tcb +@{ + int a, b, c, d, e; +@}; -vector long long vec_sr (vector long long, vector unsigned long long); -vector unsigned long long char vec_sr (vector unsigned long long, - vector unsigned long long); +int get_tcb_value (void) +@{ + // Generate @samp{mov.l @@(8,gbr),r0} instruction + return ((my_tcb*)__builtin_thread_pointer ())->c; +@} -vector long long vec_sra (vector long long, vector unsigned long long); -vector unsigned long long vec_sra (vector unsigned long long, - vector unsigned long long); +@end smallexample +@end deftypefn -vector long long vec_sub (vector long long, vector long long); -vector unsigned long long vec_sub (vector unsigned long long, - vector unsigned long long); +@deftypefn {Built-in Function} {unsigned int} __builtin_sh_get_fpscr (void) +Returns the value that is currently set in the @samp{FPSCR} register. +@end deftypefn -vector long long vec_unpackh (vector int); -vector unsigned long long vec_unpackh (vector unsigned int); +@deftypefn {Built-in Function} {void} __builtin_sh_set_fpscr (unsigned int @var{val}) +Sets the @samp{FPSCR} register to the specified value @var{val}, while +preserving the current values of the FR, SZ and PR bits. 
+@end deftypefn -vector long long vec_unpackl (vector int); -vector unsigned long long vec_unpackl (vector unsigned int); +@node SPARC VIS Built-in Functions +@subsection SPARC VIS Built-in Functions -vector long long vec_vaddudm (vector long long, vector long long); -vector long long vec_vaddudm (vector bool long long, vector long long); -vector long long vec_vaddudm (vector long long, vector bool long long); -vector unsigned long long vec_vaddudm (vector unsigned long long, - vector unsigned long long); -vector unsigned long long vec_vaddudm (vector bool unsigned long long, - vector unsigned long long); -vector unsigned long long vec_vaddudm (vector unsigned long long, - vector bool unsigned long long); +GCC supports SIMD operations on the SPARC using both the generic vector +extensions (@pxref{Vector Extensions}) as well as built-in functions for +the SPARC Visual Instruction Set (VIS). When you use the @option{-mvis} +switch, the VIS extension is exposed as the following built-in functions: -vector long long vec_vbpermq (vector signed char, vector signed char); -vector long long vec_vbpermq (vector unsigned char, vector unsigned char); +@smallexample +typedef int v1si __attribute__ ((vector_size (4))); +typedef int v2si __attribute__ ((vector_size (8))); +typedef short v4hi __attribute__ ((vector_size (8))); +typedef short v2hi __attribute__ ((vector_size (4))); +typedef unsigned char v8qi __attribute__ ((vector_size (8))); +typedef unsigned char v4qi __attribute__ ((vector_size (4))); -vector long long vec_cntlz (vector long long); -vector unsigned long long vec_cntlz (vector unsigned long long); -vector int vec_cntlz (vector int); -vector unsigned int vec_cntlz (vector int); -vector short vec_cntlz (vector short); -vector unsigned short vec_cntlz (vector unsigned short); -vector signed char vec_cntlz (vector signed char); -vector unsigned char vec_cntlz (vector unsigned char); +void __builtin_vis_write_gsr (int64_t); +int64_t __builtin_vis_read_gsr (void); 
-vector long long vec_vclz (vector long long); -vector unsigned long long vec_vclz (vector unsigned long long); -vector int vec_vclz (vector int); -vector unsigned int vec_vclz (vector int); -vector short vec_vclz (vector short); -vector unsigned short vec_vclz (vector unsigned short); -vector signed char vec_vclz (vector signed char); -vector unsigned char vec_vclz (vector unsigned char); +void * __builtin_vis_alignaddr (void *, long); +void * __builtin_vis_alignaddrl (void *, long); +int64_t __builtin_vis_faligndatadi (int64_t, int64_t); +v2si __builtin_vis_faligndatav2si (v2si, v2si); +v4hi __builtin_vis_faligndatav4hi (v4si, v4si); +v8qi __builtin_vis_faligndatav8qi (v8qi, v8qi); -vector signed char vec_vclzb (vector signed char); -vector unsigned char vec_vclzb (vector unsigned char); +v4hi __builtin_vis_fexpand (v4qi); -vector long long vec_vclzd (vector long long); -vector unsigned long long vec_vclzd (vector unsigned long long); +v4hi __builtin_vis_fmul8x16 (v4qi, v4hi); +v4hi __builtin_vis_fmul8x16au (v4qi, v2hi); +v4hi __builtin_vis_fmul8x16al (v4qi, v2hi); +v4hi __builtin_vis_fmul8sux16 (v8qi, v4hi); +v4hi __builtin_vis_fmul8ulx16 (v8qi, v4hi); +v2si __builtin_vis_fmuld8sux16 (v4qi, v2hi); +v2si __builtin_vis_fmuld8ulx16 (v4qi, v2hi); + +v4qi __builtin_vis_fpack16 (v4hi); +v8qi __builtin_vis_fpack32 (v2si, v8qi); +v2hi __builtin_vis_fpackfix (v2si); +v8qi __builtin_vis_fpmerge (v4qi, v4qi); + +int64_t __builtin_vis_pdist (v8qi, v8qi, int64_t); + +long __builtin_vis_edge8 (void *, void *); +long __builtin_vis_edge8l (void *, void *); +long __builtin_vis_edge16 (void *, void *); +long __builtin_vis_edge16l (void *, void *); +long __builtin_vis_edge32 (void *, void *); +long __builtin_vis_edge32l (void *, void *); + +long __builtin_vis_fcmple16 (v4hi, v4hi); +long __builtin_vis_fcmple32 (v2si, v2si); +long __builtin_vis_fcmpne16 (v4hi, v4hi); +long __builtin_vis_fcmpne32 (v2si, v2si); +long __builtin_vis_fcmpgt16 (v4hi, v4hi); +long __builtin_vis_fcmpgt32 
(v2si, v2si); +long __builtin_vis_fcmpeq16 (v4hi, v4hi); +long __builtin_vis_fcmpeq32 (v2si, v2si); + +v4hi __builtin_vis_fpadd16 (v4hi, v4hi); +v2hi __builtin_vis_fpadd16s (v2hi, v2hi); +v2si __builtin_vis_fpadd32 (v2si, v2si); +v1si __builtin_vis_fpadd32s (v1si, v1si); +v4hi __builtin_vis_fpsub16 (v4hi, v4hi); +v2hi __builtin_vis_fpsub16s (v2hi, v2hi); +v2si __builtin_vis_fpsub32 (v2si, v2si); +v1si __builtin_vis_fpsub32s (v1si, v1si); -vector short vec_vclzh (vector short); -vector unsigned short vec_vclzh (vector unsigned short); +long __builtin_vis_array8 (long, long); +long __builtin_vis_array16 (long, long); +long __builtin_vis_array32 (long, long); +@end smallexample -vector int vec_vclzw (vector int); -vector unsigned int vec_vclzw (vector int); +When you use the @option{-mvis2} switch, the VIS version 2.0 built-in +functions also become available: -vector signed char vec_vgbbd (vector signed char); -vector unsigned char vec_vgbbd (vector unsigned char); +@smallexample +long __builtin_vis_bmask (long, long); +int64_t __builtin_vis_bshuffledi (int64_t, int64_t); +v2si __builtin_vis_bshufflev2si (v2si, v2si); +v4hi __builtin_vis_bshufflev2si (v4hi, v4hi); +v8qi __builtin_vis_bshufflev2si (v8qi, v8qi); -vector long long vec_vmaxsd (vector long long, vector long long); +long __builtin_vis_edge8n (void *, void *); +long __builtin_vis_edge8ln (void *, void *); +long __builtin_vis_edge16n (void *, void *); +long __builtin_vis_edge16ln (void *, void *); +long __builtin_vis_edge32n (void *, void *); +long __builtin_vis_edge32ln (void *, void *); +@end smallexample -vector unsigned long long vec_vmaxud (vector unsigned long long, - unsigned vector long long); +When you use the @option{-mvis3} switch, the VIS version 3.0 built-in +functions also become available: -vector long long vec_vminsd (vector long long, vector long long); +@smallexample +void __builtin_vis_cmask8 (long); +void __builtin_vis_cmask16 (long); +void __builtin_vis_cmask32 (long); -vector unsigned 
long long vec_vminud (vector long long, - vector long long); +v4hi __builtin_vis_fchksm16 (v4hi, v4hi); -vector int vec_vpksdss (vector long long, vector long long); -vector unsigned int vec_vpksdss (vector long long, vector long long); +v4hi __builtin_vis_fsll16 (v4hi, v4hi); +v4hi __builtin_vis_fslas16 (v4hi, v4hi); +v4hi __builtin_vis_fsrl16 (v4hi, v4hi); +v4hi __builtin_vis_fsra16 (v4hi, v4hi); +v2si __builtin_vis_fsll16 (v2si, v2si); +v2si __builtin_vis_fslas16 (v2si, v2si); +v2si __builtin_vis_fsrl16 (v2si, v2si); +v2si __builtin_vis_fsra16 (v2si, v2si); -vector unsigned int vec_vpkudus (vector unsigned long long, - vector unsigned long long); +long __builtin_vis_pdistn (v8qi, v8qi); -vector int vec_vpkudum (vector long long, vector long long); -vector unsigned int vec_vpkudum (vector unsigned long long, - vector unsigned long long); -vector bool int vec_vpkudum (vector bool long long, vector bool long long); +v4hi __builtin_vis_fmean16 (v4hi, v4hi); -vector long long vec_vpopcnt (vector long long); -vector unsigned long long vec_vpopcnt (vector unsigned long long); -vector int vec_vpopcnt (vector int); -vector unsigned int vec_vpopcnt (vector int); -vector short vec_vpopcnt (vector short); -vector unsigned short vec_vpopcnt (vector unsigned short); -vector signed char vec_vpopcnt (vector signed char); -vector unsigned char vec_vpopcnt (vector unsigned char); +int64_t __builtin_vis_fpadd64 (int64_t, int64_t); +int64_t __builtin_vis_fpsub64 (int64_t, int64_t); -vector signed char vec_vpopcntb (vector signed char); -vector unsigned char vec_vpopcntb (vector unsigned char); +v4hi __builtin_vis_fpadds16 (v4hi, v4hi); +v2hi __builtin_vis_fpadds16s (v2hi, v2hi); +v4hi __builtin_vis_fpsubs16 (v4hi, v4hi); +v2hi __builtin_vis_fpsubs16s (v2hi, v2hi); +v2si __builtin_vis_fpadds32 (v2si, v2si); +v1si __builtin_vis_fpadds32s (v1si, v1si); +v2si __builtin_vis_fpsubs32 (v2si, v2si); +v1si __builtin_vis_fpsubs32s (v1si, v1si); -vector long long vec_vpopcntd (vector long 
long); -vector unsigned long long vec_vpopcntd (vector unsigned long long); +long __builtin_vis_fucmple8 (v8qi, v8qi); +long __builtin_vis_fucmpne8 (v8qi, v8qi); +long __builtin_vis_fucmpgt8 (v8qi, v8qi); +long __builtin_vis_fucmpeq8 (v8qi, v8qi); -vector short vec_vpopcnth (vector short); -vector unsigned short vec_vpopcnth (vector unsigned short); +float __builtin_vis_fhadds (float, float); +double __builtin_vis_fhaddd (double, double); +float __builtin_vis_fhsubs (float, float); +double __builtin_vis_fhsubd (double, double); +float __builtin_vis_fnhadds (float, float); +double __builtin_vis_fnhaddd (double, double); -vector int vec_vpopcntw (vector int); -vector unsigned int vec_vpopcntw (vector int); +int64_t __builtin_vis_umulxhi (int64_t, int64_t); +int64_t __builtin_vis_xmulx (int64_t, int64_t); +int64_t __builtin_vis_xmulxhi (int64_t, int64_t); +@end smallexample -vector long long vec_vrld (vector long long, vector unsigned long long); -vector unsigned long long vec_vrld (vector unsigned long long, - vector unsigned long long); +@node SPU Built-in Functions +@subsection SPU Built-in Functions -vector long long vec_vsld (vector long long, vector unsigned long long); -vector long long vec_vsld (vector unsigned long long, - vector unsigned long long); +GCC provides extensions for the SPU processor as described in the +Sony/Toshiba/IBM SPU Language Extensions Specification, which can be +found at @uref{http://cell.scei.co.jp/} or +@uref{http://www.ibm.com/developerworks/power/cell/}. GCC's +implementation differs in several ways. 
-vector long long vec_vsrad (vector long long, vector unsigned long long); -vector unsigned long long vec_vsrad (vector unsigned long long, - vector unsigned long long); +@itemize @bullet -vector long long vec_vsrd (vector long long, vector unsigned long long); -vector unsigned long long char vec_vsrd (vector unsigned long long, - vector unsigned long long); +@item +The optional extension of specifying vector constants in parentheses is +not supported. -vector long long vec_vsubudm (vector long long, vector long long); -vector long long vec_vsubudm (vector bool long long, vector long long); -vector long long vec_vsubudm (vector long long, vector bool long long); -vector unsigned long long vec_vsubudm (vector unsigned long long, - vector unsigned long long); -vector unsigned long long vec_vsubudm (vector bool long long, - vector unsigned long long); -vector unsigned long long vec_vsubudm (vector unsigned long long, - vector bool long long); +@item +A vector initializer requires no cast if the vector constant is of the +same type as the variable it is initializing. -vector long long vec_vupkhsw (vector int); -vector unsigned long long vec_vupkhsw (vector unsigned int); +@item +If @code{signed} or @code{unsigned} is omitted, the signedness of the +vector type is the default signedness of the base type. The default +varies depending on the operating system, so a portable program should +always specify the signedness. -vector long long vec_vupklsw (vector int); -vector unsigned long long vec_vupklsw (vector int); -@end smallexample +@item +By default, the keyword @code{__vector} is added. The macro +@code{vector} is defined in @code{} and can be +undefined. -If the ISA 2.07 additions to the vector/scalar (power8-vector) -instruction set is available, the following additional functions are -available for 64-bit targets. 
New vector types -(@var{vector __int128_t} and @var{vector __uint128_t}) are available -to hold the @var{__int128_t} and @var{__uint128_t} types to use these -builtins. +@item +GCC allows using a @code{typedef} name as the type specifier for a +vector type. -The normal vector extract, and set operations work on -@var{vector __int128_t} and @var{vector __uint128_t} types, -but the index value must be 0. +@item +For C, overloaded functions are implemented with macros so the following +does not work: @smallexample -vector __int128_t vec_vaddcuq (vector __int128_t, vector __int128_t); -vector __uint128_t vec_vaddcuq (vector __uint128_t, vector __uint128_t); - -vector __int128_t vec_vadduqm (vector __int128_t, vector __int128_t); -vector __uint128_t vec_vadduqm (vector __uint128_t, vector __uint128_t); + spu_add ((vector signed int)@{1, 2, 3, 4@}, foo); +@end smallexample -vector __int128_t vec_vaddecuq (vector __int128_t, vector __int128_t, - vector __int128_t); -vector __uint128_t vec_vaddecuq (vector __uint128_t, vector __uint128_t, - vector __uint128_t); +@noindent +Since @code{spu_add} is a macro, the vector constant in the example +is treated as four separate arguments. Wrap the entire argument in +parentheses for this to work. -vector __int128_t vec_vaddeuqm (vector __int128_t, vector __int128_t, - vector __int128_t); -vector __uint128_t vec_vaddeuqm (vector __uint128_t, vector __uint128_t, - vector __uint128_t); +@item +The extended version of @code{__builtin_expect} is not supported. 
-vector __int128_t vec_vsubecuq (vector __int128_t, vector __int128_t, - vector __int128_t); -vector __uint128_t vec_vsubecuq (vector __uint128_t, vector __uint128_t, - vector __uint128_t); +@end itemize -vector __int128_t vec_vsubeuqm (vector __int128_t, vector __int128_t, - vector __int128_t); -vector __uint128_t vec_vsubeuqm (vector __uint128_t, vector __uint128_t, - vector __uint128_t); +@emph{Note:} Only the interface described in the aforementioned +specification is supported. Internally, GCC uses built-in functions to +implement the required functionality, but these are not supported and +are subject to change without notice. -vector __int128_t vec_vsubcuq (vector __int128_t, vector __int128_t); -vector __uint128_t vec_vsubcuq (vector __uint128_t, vector __uint128_t); +@node TI C6X Built-in Functions +@subsection TI C6X Built-in Functions -__int128_t vec_vsubuqm (__int128_t, __int128_t); -__uint128_t vec_vsubuqm (__uint128_t, __uint128_t); +GCC provides intrinsics to access certain instructions of the TI C6X +processors. These intrinsics, listed below, are available after +inclusion of the @code{c6x_intrinsics.h} header file. They map directly +to C6X instructions. 
-vector __int128_t __builtin_bcdadd (vector __int128_t, vector__int128_t); -int __builtin_bcdadd_lt (vector __int128_t, vector__int128_t); -int __builtin_bcdadd_eq (vector __int128_t, vector__int128_t); -int __builtin_bcdadd_gt (vector __int128_t, vector__int128_t); -int __builtin_bcdadd_ov (vector __int128_t, vector__int128_t); -vector __int128_t bcdsub (vector __int128_t, vector__int128_t); -int __builtin_bcdsub_lt (vector __int128_t, vector__int128_t); -int __builtin_bcdsub_eq (vector __int128_t, vector__int128_t); -int __builtin_bcdsub_gt (vector __int128_t, vector__int128_t); -int __builtin_bcdsub_ov (vector __int128_t, vector__int128_t); -@end smallexample +@smallexample -If the cryptographic instructions are enabled (@option{-mcrypto} or -@option{-mcpu=power8}), the following builtins are enabled. +int _sadd (int, int) +int _ssub (int, int) +int _sadd2 (int, int) +int _ssub2 (int, int) +long long _mpy2 (int, int) +long long _smpy2 (int, int) +int _add4 (int, int) +int _sub4 (int, int) +int _saddu4 (int, int) -@smallexample -vector unsigned long long __builtin_crypto_vsbox (vector unsigned long long); +int _smpy (int, int) +int _smpyh (int, int) +int _smpyhl (int, int) +int _smpylh (int, int) -vector unsigned long long __builtin_crypto_vcipher (vector unsigned long long, - vector unsigned long long); +int _sshl (int, int) +int _subc (int, int) -vector unsigned long long __builtin_crypto_vcipherlast - (vector unsigned long long, - vector unsigned long long); +int _avg2 (int, int) +int _avgu4 (int, int) -vector unsigned long long __builtin_crypto_vncipher (vector unsigned long long, - vector unsigned long long); +int _clrr (int, int) +int _extr (int, int) +int _extru (int, int) +int _abs (int) +int _abs2 (int) -vector unsigned long long __builtin_crypto_vncipherlast - (vector unsigned long long, - vector unsigned long long); +@end smallexample -vector unsigned char __builtin_crypto_vpermxor (vector unsigned char, - vector unsigned char, - vector unsigned char); 
+@node TILE-Gx Built-in Functions +@subsection TILE-Gx Built-in Functions -vector unsigned short __builtin_crypto_vpermxor (vector unsigned short, - vector unsigned short, - vector unsigned short); +GCC provides intrinsics to access every instruction of the TILE-Gx +processor. The intrinsics are of the form: -vector unsigned int __builtin_crypto_vpermxor (vector unsigned int, - vector unsigned int, - vector unsigned int); +@smallexample -vector unsigned long long __builtin_crypto_vpermxor (vector unsigned long long, - vector unsigned long long, - vector unsigned long long); +unsigned long long __insn_@var{op} (...) -vector unsigned char __builtin_crypto_vpmsumb (vector unsigned char, - vector unsigned char); +@end smallexample -vector unsigned short __builtin_crypto_vpmsumb (vector unsigned short, - vector unsigned short); +Where @var{op} is the name of the instruction. Refer to the ISA manual +for the complete list of instructions. -vector unsigned int __builtin_crypto_vpmsumb (vector unsigned int, - vector unsigned int); +GCC also provides intrinsics to directly access the network registers. +The intrinsics are: -vector unsigned long long __builtin_crypto_vpmsumb (vector unsigned long long, - vector unsigned long long); +@smallexample -vector unsigned long long __builtin_crypto_vshasigmad - (vector unsigned long long, int, int); +unsigned long long __tile_idn0_receive (void) +unsigned long long __tile_idn1_receive (void) +unsigned long long __tile_udn0_receive (void) +unsigned long long __tile_udn1_receive (void) +unsigned long long __tile_udn2_receive (void) +unsigned long long __tile_udn3_receive (void) +void __tile_idn_send (unsigned long long) +void __tile_udn_send (unsigned long long) -vector unsigned int __builtin_crypto_vshasigmaw (vector unsigned int, - int, int); @end smallexample -The second argument to the @var{__builtin_crypto_vshasigmad} and -@var{__builtin_crypto_vshasigmaw} builtin functions must be a constant -integer that is 0 or 1. 
The third argument to these builtin functions -must be a constant integer in the range of 0 to 15. - -@node PowerPC Hardware Transactional Memory Built-in Functions -@subsection PowerPC Hardware Transactional Memory Built-in Functions -GCC provides two interfaces for accessing the Hardware Transactional -Memory (HTM) instructions available on some of the PowerPC family -of prcoessors (eg, POWER8). The two interfaces come in a low level -interface, consisting of built-in functions specific to PowerPC and a -higher level interface consisting of inline functions that are common -between PowerPC and S/390. - -@subsubsection PowerPC HTM Low Level Built-in Functions +The intrinsic @code{void __tile_network_barrier (void)} is used to +guarantee that no network operations before it are reordered with +those after it. -The following low level built-in functions are available with -@option{-mhtm} or @option{-mcpu=CPU} where CPU is `power8' or later. -They all generate the machine instruction that is part of the name. +@node TILEPro Built-in Functions +@subsection TILEPro Built-in Functions -The HTM built-ins return true or false depending on their success and -their arguments match exactly the type and order of the associated -hardware instruction's operands. Refer to the ISA manual for a -description of each instruction's operands. +GCC provides intrinsics to access every instruction of the TILEPro +processor. The intrinsics are of the form: @smallexample -unsigned int __builtin_tbegin (unsigned int) -unsigned int __builtin_tend (unsigned int) -unsigned int __builtin_tabort (unsigned int) -unsigned int __builtin_tabortdc (unsigned int, unsigned int, unsigned int) -unsigned int __builtin_tabortdci (unsigned int, unsigned int, int) -unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int) -unsigned int __builtin_tabortwci (unsigned int, unsigned int, int) +unsigned __insn_@var{op} (...) 
-unsigned int __builtin_tcheck (unsigned int) -unsigned int __builtin_treclaim (unsigned int) -unsigned int __builtin_trechkpt (void) -unsigned int __builtin_tsr (unsigned int) @end smallexample -In addition to the above HTM built-ins, we have added built-ins for -some common extended mnemonics of the HTM instructions: - -@smallexample -unsigned int __builtin_tendall (void) -unsigned int __builtin_tresume (void) -unsigned int __builtin_tsuspend (void) -@end smallexample +@noindent +where @var{op} is the name of the instruction. Refer to the ISA manual +for the complete list of instructions. -The following set of built-in functions are available to gain access -to the HTM specific special purpose registers. +GCC also provides intrinsics to directly access the network registers. +The intrinsics are: @smallexample -unsigned long __builtin_get_texasr (void) -unsigned long __builtin_get_texasru (void) -unsigned long __builtin_get_tfhar (void) -unsigned long __builtin_get_tfiar (void) -void __builtin_set_texasr (unsigned long); -void __builtin_set_texasru (unsigned long); -void __builtin_set_tfhar (unsigned long); -void __builtin_set_tfiar (unsigned long); +unsigned __tile_idn0_receive (void) +unsigned __tile_idn1_receive (void) +unsigned __tile_sn_receive (void) +unsigned __tile_udn0_receive (void) +unsigned __tile_udn1_receive (void) +unsigned __tile_udn2_receive (void) +unsigned __tile_udn3_receive (void) +void __tile_idn_send (unsigned) +void __tile_sn_send (unsigned) +void __tile_udn_send (unsigned) + @end smallexample -Example usage of these low level built-in functions may look like: +The intrinsic @code{void __tile_network_barrier (void)} is used to +guarantee that no network operations before it are reordered with +those after it. 
-@smallexample -#include +@node x86 Built-in Functions +@subsection x86 Built-in Functions -int num_retries = 10; +These built-in functions are available for the x86-32 and x86-64 family +of computers, depending on the command-line switches used. -while (1) - @{ - if (__builtin_tbegin (0)) - @{ - /* Transaction State Initiated. */ - if (is_locked (lock)) - __builtin_tabort (0); - ... transaction code... - __builtin_tend (0); - break; - @} - else - @{ - /* Transaction State Failed. Use locks if the transaction - failure is "persistent" or we've tried too many times. */ - if (num_retries-- <= 0 - || _TEXASRU_FAILURE_PERSISTENT (__builtin_get_texasru ())) - @{ - acquire_lock (lock); - ... non transactional fallback path... - release_lock (lock); - break; - @} - @} - @} -@end smallexample +If you specify command-line switches such as @option{-msse}, +the compiler could use the extended instruction sets even if the built-ins +are not used explicitly in the program. For this reason, applications +that perform run-time CPU detection must compile separate files for each +supported architecture, using the appropriate flags. In particular, +the file containing the CPU detection code should be compiled without +these options. -One final built-in function has been added that returns the value of -the 2-bit Transaction State field of the Machine Status Register (MSR) -as stored in @code{CR0}. +The following machine modes are available for use with MMX built-in functions +(@pxref{Vector Extensions}): @code{V2SI} for a vector of two 32-bit integers, +@code{V4HI} for a vector of four 16-bit integers, and @code{V8QI} for a +vector of eight 8-bit integers. Some of the built-in functions operate on +MMX registers as a whole 64-bit entity, these use @code{V1DI} as their mode. -@smallexample -unsigned long __builtin_ttest (void) -@end smallexample +If 3DNow!@: extensions are enabled, @code{V2SF} is used as a mode for a vector +of two 32-bit floating-point values. 
-This built-in can be used to determine the current transaction state -using the following code example: +If SSE extensions are enabled, @code{V4SF} is used for a vector of four 32-bit +floating-point values. Some instructions use a vector of four 32-bit +integers, these use @code{V4SI}. Finally, some instructions operate on an +entire vector register, interpreting it as a 128-bit integer, these use mode +@code{TI}. -@smallexample -#include +In 64-bit mode, the x86-64 family of processors uses additional built-in +functions for efficient use of @code{TF} (@code{__float128}) 128-bit +floating point and @code{TC} 128-bit complex floating-point values. -unsigned char tx_state = _HTM_STATE (__builtin_ttest ()); +The following floating-point built-in functions are available in 64-bit +mode. All of them implement the function that is part of the name. -if (tx_state == _HTM_TRANSACTIONAL) - @{ - /* Code to use in transactional state. */ - @} -else if (tx_state == _HTM_NONTRANSACTIONAL) - @{ - /* Code to use in non-transactional state. */ - @} -else if (tx_state == _HTM_SUSPENDED) - @{ - /* Code to use in transaction suspended state. */ - @} +@smallexample +__float128 __builtin_fabsq (__float128) +__float128 __builtin_copysignq (__float128, __float128) @end smallexample -@subsubsection PowerPC HTM High Level Inline Functions +The following built-in function is always available. -The following high level HTM interface is made available by including -@code{} and using @option{-mhtm} or @option{-mcpu=CPU} -where CPU is `power8' or later. This interface is common between PowerPC -and S/390, allowing users to write one HTM source implementation that -can be compiled and executed on either system. +@table @code +@item void __builtin_ia32_pause (void) +Generates the @code{pause} machine instruction with a compiler memory +barrier. 
+@end table -@smallexample -long __TM_simple_begin (void) -long __TM_begin (void* const TM_buff) -long __TM_end (void) -void __TM_abort (void) -void __TM_named_abort (unsigned char const code) -void __TM_resume (void) -void __TM_suspend (void) +The following floating-point built-in functions are made available in the +64-bit mode. -long __TM_is_user_abort (void* const TM_buff) -long __TM_is_named_user_abort (void* const TM_buff, unsigned char *code) -long __TM_is_illegal (void* const TM_buff) -long __TM_is_footprint_exceeded (void* const TM_buff) -long __TM_nesting_depth (void* const TM_buff) -long __TM_is_nested_too_deep(void* const TM_buff) -long __TM_is_conflict(void* const TM_buff) -long __TM_is_failure_persistent(void* const TM_buff) -long __TM_failure_address(void* const TM_buff) -long long __TM_failure_code(void* const TM_buff) -@end smallexample +@table @code +@item __float128 __builtin_infq (void) +Similar to @code{__builtin_inf}, except the return type is @code{__float128}. +@findex __builtin_infq -Using these common set of HTM inline functions, we can create -a more portable version of the HTM example in the previous -section that will work on either PowerPC or S/390: +@item __float128 __builtin_huge_valq (void) +Similar to @code{__builtin_huge_val}, except the return type is @code{__float128}. +@findex __builtin_huge_valq +@end table + +The following built-in functions are always available and can be used to +check the target platform type. + +@deftypefn {Built-in Function} void __builtin_cpu_init (void) +This function runs the CPU detection code to check the type of CPU and the +features supported. This built-in function needs to be invoked along with the built-in functions +to check CPU type and features, @code{__builtin_cpu_is} and +@code{__builtin_cpu_supports}, only when used in a function that is +executed before any constructors are called. The CPU detection code is +automatically executed in a very high priority constructor. 
+For example, this function has to be used in @code{ifunc} resolvers that +check for CPU type using the built-in functions @code{__builtin_cpu_is} +and @code{__builtin_cpu_supports}, or in constructors on targets that +don't support constructor priority. @smallexample -#include -int num_retries = 10; -TM_buff_type TM_buff; +static void (*resolve_memcpy (void)) (void) +@{ + // ifunc resolvers fire before constructors, explicitly call the init + // function. + __builtin_cpu_init (); + if (__builtin_cpu_supports ("ssse3")) + return ssse3_memcpy; // super fast memcpy with ssse3 instructions. + else + return default_memcpy; +@} -while (1) - @{ - if (__TM_begin (TM_buff)) - @{ - /* Transaction State Initiated. */ - if (is_locked (lock)) - __TM_abort (); - ... transaction code... - __TM_end (); - break; - @} - else - @{ - /* Transaction State Failed. Use locks if the transaction - failure is "persistent" or we've tried too many times. */ - if (num_retries-- <= 0 - || __TM_is_failure_persistent (TM_buff)) - @{ - acquire_lock (lock); - ... non transactional fallback path... - release_lock (lock); - break; - @} - @} - @} +void *memcpy (void *, const void *, size_t) + __attribute__ ((ifunc ("resolve_memcpy"))); @end smallexample -@node RX Built-in Functions -@subsection RX Built-in Functions -GCC supports some of the RX instructions which cannot be expressed in -the C programming language via the use of built-in functions. The -following functions are supported: - -@deftypefn {Built-in Function} void __builtin_rx_brk (void) -Generates the @code{brk} machine instruction. @end deftypefn -@deftypefn {Built-in Function} void __builtin_rx_clrpsw (int) -Generates the @code{clrpsw} machine instruction to clear the specified -bit in the processor status word. -@end deftypefn +@deftypefn {Built-in Function} int __builtin_cpu_is (const char *@var{cpuname}) +This function returns a positive integer if the run-time CPU +is of type @var{cpuname} +and returns @code{0} otherwise. 
The following CPU names can be detected: -@deftypefn {Built-in Function} void __builtin_rx_int (int) -Generates the @code{int} machine instruction to generate an interrupt -with the specified value. -@end deftypefn +@table @samp +@item intel +Intel CPU. -@deftypefn {Built-in Function} void __builtin_rx_machi (int, int) -Generates the @code{machi} machine instruction to add the result of -multiplying the top 16 bits of the two arguments into the -accumulator. -@end deftypefn +@item atom +Intel Atom CPU. -@deftypefn {Built-in Function} void __builtin_rx_maclo (int, int) -Generates the @code{maclo} machine instruction to add the result of -multiplying the bottom 16 bits of the two arguments into the -accumulator. -@end deftypefn +@item core2 +Intel Core 2 CPU. -@deftypefn {Built-in Function} void __builtin_rx_mulhi (int, int) -Generates the @code{mulhi} machine instruction to place the result of -multiplying the top 16 bits of the two arguments into the -accumulator. -@end deftypefn +@item corei7 +Intel Core i7 CPU. -@deftypefn {Built-in Function} void __builtin_rx_mullo (int, int) -Generates the @code{mullo} machine instruction to place the result of -multiplying the bottom 16 bits of the two arguments into the -accumulator. -@end deftypefn +@item nehalem +Intel Core i7 Nehalem CPU. -@deftypefn {Built-in Function} int __builtin_rx_mvfachi (void) -Generates the @code{mvfachi} machine instruction to read the top -32 bits of the accumulator. -@end deftypefn +@item westmere +Intel Core i7 Westmere CPU. -@deftypefn {Built-in Function} int __builtin_rx_mvfacmi (void) -Generates the @code{mvfacmi} machine instruction to read the middle -32 bits of the accumulator. -@end deftypefn +@item sandybridge +Intel Core i7 Sandy Bridge CPU. -@deftypefn {Built-in Function} int __builtin_rx_mvfc (int) -Generates the @code{mvfc} machine instruction which reads the control -register specified in its argument and returns its value. -@end deftypefn +@item amd +AMD CPU. 
-@deftypefn {Built-in Function} void __builtin_rx_mvtachi (int) -Generates the @code{mvtachi} machine instruction to set the top -32 bits of the accumulator. -@end deftypefn +@item amdfam10h +AMD Family 10h CPU. -@deftypefn {Built-in Function} void __builtin_rx_mvtaclo (int) -Generates the @code{mvtaclo} machine instruction to set the bottom -32 bits of the accumulator. -@end deftypefn +@item barcelona +AMD Family 10h Barcelona CPU. -@deftypefn {Built-in Function} void __builtin_rx_mvtc (int reg, int val) -Generates the @code{mvtc} machine instruction which sets control -register number @code{reg} to @code{val}. -@end deftypefn +@item shanghai +AMD Family 10h Shanghai CPU. -@deftypefn {Built-in Function} void __builtin_rx_mvtipl (int) -Generates the @code{mvtipl} machine instruction set the interrupt -priority level. -@end deftypefn +@item istanbul +AMD Family 10h Istanbul CPU. -@deftypefn {Built-in Function} void __builtin_rx_racw (int) -Generates the @code{racw} machine instruction to round the accumulator -according to the specified mode. -@end deftypefn +@item btver1 +AMD Family 14h CPU. -@deftypefn {Built-in Function} int __builtin_rx_revw (int) -Generates the @code{revw} machine instruction which swaps the bytes in -the argument so that bits 0--7 now occupy bits 8--15 and vice versa, -and also bits 16--23 occupy bits 24--31 and vice versa. -@end deftypefn +@item amdfam15h +AMD Family 15h CPU. -@deftypefn {Built-in Function} void __builtin_rx_rmpa (void) -Generates the @code{rmpa} machine instruction which initiates a -repeated multiply and accumulate sequence. -@end deftypefn +@item bdver1 +AMD Family 15h Bulldozer version 1. -@deftypefn {Built-in Function} void __builtin_rx_round (float) -Generates the @code{round} machine instruction which returns the -floating-point argument rounded according to the current rounding mode -set in the floating-point status word register. -@end deftypefn +@item bdver2 +AMD Family 15h Bulldozer version 2. 
-@deftypefn {Built-in Function} int __builtin_rx_sat (int) -Generates the @code{sat} machine instruction which returns the -saturated value of the argument. -@end deftypefn +@item bdver3 +AMD Family 15h Bulldozer version 3. -@deftypefn {Built-in Function} void __builtin_rx_setpsw (int) -Generates the @code{setpsw} machine instruction to set the specified -bit in the processor status word. -@end deftypefn +@item bdver4 +AMD Family 15h Bulldozer version 4. -@deftypefn {Built-in Function} void __builtin_rx_wait (void) -Generates the @code{wait} machine instruction. +@item btver2 +AMD Family 16h CPU. +@end table + +Here is an example: +@smallexample +if (__builtin_cpu_is ("corei7")) + @{ + do_corei7 (); // Core i7 specific implementation. + @} +else + @{ + do_generic (); // Generic implementation. + @} +@end smallexample @end deftypefn -@node S/390 System z Built-in Functions -@subsection S/390 System z Built-in Functions -@deftypefn {Built-in Function} int __builtin_tbegin (void*) -Generates the @code{tbegin} machine instruction starting a -non-constraint hardware transaction. If the parameter is non-NULL the -memory area is used to store the transaction diagnostic buffer and -will be passed as first operand to @code{tbegin}. This buffer can be -defined using the @code{struct __htm_tdb} C struct defined in -@code{htmintrin.h} and must reside on a double-word boundary. The -second tbegin operand is set to @code{0xff0c}. This enables -save/restore of all GPRs and disables aborts for FPR and AR -manipulations inside the transaction body. The condition code set by -the tbegin instruction is returned as integer value. The tbegin -instruction by definition overwrites the content of all FPRs. The -compiler will generate code which saves and restores the FPRs. For -soft-float code it is recommended to used the @code{*_nofloat} -variant. In order to prevent a TDB from being written it is required -to pass an constant zero value as parameter. 
Passing the zero value -through a variable is not sufficient. Although modifications of -access registers inside the transaction will not trigger an -transaction abort it is not supported to actually modify them. Access -registers do not get saved when entering a transaction. They will have -undefined state when reaching the abort code. +@deftypefn {Built-in Function} int __builtin_cpu_supports (const char *@var{feature}) +This function returns a positive integer if the run-time CPU +supports @var{feature} +and returns @code{0} otherwise. The following features can be detected: + +@table @samp +@item cmov +CMOV instruction. +@item mmx +MMX instructions. +@item popcnt +POPCNT instruction. +@item sse +SSE instructions. +@item sse2 +SSE2 instructions. +@item sse3 +SSE3 instructions. +@item ssse3 +SSSE3 instructions. +@item sse4.1 +SSE4.1 instructions. +@item sse4.2 +SSE4.2 instructions. +@item avx +AVX instructions. +@item avx2 +AVX2 instructions. +@item avx512f +AVX512F instructions. +@end table + +Here is an example: +@smallexample +if (__builtin_cpu_supports ("popcnt")) + @{ + asm("popcnt %1,%0" : "=r"(count) : "rm"(n) : "cc"); + @} +else + @{ + count = generic_countbits (n); //generic implementation. + @} +@end smallexample @end deftypefn -Macros for the possible return codes of tbegin are defined in the -@code{htmintrin.h} header file: -@table @code -@item _HTM_TBEGIN_STARTED -@code{tbegin} has been executed as part of normal processing. The -transaction body is supposed to be executed. -@item _HTM_TBEGIN_INDETERMINATE -The transaction was aborted due to an indeterminate condition which -might be persistent. -@item _HTM_TBEGIN_TRANSIENT -The transaction aborted due to a transient failure. The transaction -should be re-executed in that case. -@item _HTM_TBEGIN_PERSISTENT -The transaction aborted due to a persistent failure. Re-execution -under same circumstances will not be productive. 
-@end table +The following built-in functions are made available by @option{-mmmx}. +All of them generate the machine instruction that is part of the name. -@defmac _HTM_FIRST_USER_ABORT_CODE -The @code{_HTM_FIRST_USER_ABORT_CODE} defined in @code{htmintrin.h} -specifies the first abort code which can be used for -@code{__builtin_tabort}. Values below this threshold are reserved for -machine use. -@end defmac +@smallexample +v8qi __builtin_ia32_paddb (v8qi, v8qi) +v4hi __builtin_ia32_paddw (v4hi, v4hi) +v2si __builtin_ia32_paddd (v2si, v2si) +v8qi __builtin_ia32_psubb (v8qi, v8qi) +v4hi __builtin_ia32_psubw (v4hi, v4hi) +v2si __builtin_ia32_psubd (v2si, v2si) +v8qi __builtin_ia32_paddsb (v8qi, v8qi) +v4hi __builtin_ia32_paddsw (v4hi, v4hi) +v8qi __builtin_ia32_psubsb (v8qi, v8qi) +v4hi __builtin_ia32_psubsw (v4hi, v4hi) +v8qi __builtin_ia32_paddusb (v8qi, v8qi) +v4hi __builtin_ia32_paddusw (v4hi, v4hi) +v8qi __builtin_ia32_psubusb (v8qi, v8qi) +v4hi __builtin_ia32_psubusw (v4hi, v4hi) +v4hi __builtin_ia32_pmullw (v4hi, v4hi) +v4hi __builtin_ia32_pmulhw (v4hi, v4hi) +di __builtin_ia32_pand (di, di) +di __builtin_ia32_pandn (di,di) +di __builtin_ia32_por (di, di) +di __builtin_ia32_pxor (di, di) +v8qi __builtin_ia32_pcmpeqb (v8qi, v8qi) +v4hi __builtin_ia32_pcmpeqw (v4hi, v4hi) +v2si __builtin_ia32_pcmpeqd (v2si, v2si) +v8qi __builtin_ia32_pcmpgtb (v8qi, v8qi) +v4hi __builtin_ia32_pcmpgtw (v4hi, v4hi) +v2si __builtin_ia32_pcmpgtd (v2si, v2si) +v8qi __builtin_ia32_punpckhbw (v8qi, v8qi) +v4hi __builtin_ia32_punpckhwd (v4hi, v4hi) +v2si __builtin_ia32_punpckhdq (v2si, v2si) +v8qi __builtin_ia32_punpcklbw (v8qi, v8qi) +v4hi __builtin_ia32_punpcklwd (v4hi, v4hi) +v2si __builtin_ia32_punpckldq (v2si, v2si) +v8qi __builtin_ia32_packsswb (v4hi, v4hi) +v4hi __builtin_ia32_packssdw (v2si, v2si) +v8qi __builtin_ia32_packuswb (v4hi, v4hi) -@deftp {Data type} {struct __htm_tdb} -The @code{struct __htm_tdb} defined in @code{htmintrin.h} describes -the structure of the transaction 
diagnostic block as specified in the -Principles of Operation manual chapter 5-91. -@end deftp +v4hi __builtin_ia32_psllw (v4hi, v4hi) +v2si __builtin_ia32_pslld (v2si, v2si) +v1di __builtin_ia32_psllq (v1di, v1di) +v4hi __builtin_ia32_psrlw (v4hi, v4hi) +v2si __builtin_ia32_psrld (v2si, v2si) +v1di __builtin_ia32_psrlq (v1di, v1di) +v4hi __builtin_ia32_psraw (v4hi, v4hi) +v2si __builtin_ia32_psrad (v2si, v2si) +v4hi __builtin_ia32_psllwi (v4hi, int) +v2si __builtin_ia32_pslldi (v2si, int) +v1di __builtin_ia32_psllqi (v1di, int) +v4hi __builtin_ia32_psrlwi (v4hi, int) +v2si __builtin_ia32_psrldi (v2si, int) +v1di __builtin_ia32_psrlqi (v1di, int) +v4hi __builtin_ia32_psrawi (v4hi, int) +v2si __builtin_ia32_psradi (v2si, int) -@deftypefn {Built-in Function} int __builtin_tbegin_nofloat (void*) -Same as @code{__builtin_tbegin} but without FPR saves and restores. -Using this variant in code making use of FPRs will leave the FPRs in -undefined state when entering the transaction abort handler code. -@end deftypefn +@end smallexample -@deftypefn {Built-in Function} int __builtin_tbegin_retry (void*, int) -In addition to @code{__builtin_tbegin} a loop for transient failures -is generated. If tbegin returns a condition code of 2 the transaction -will be retried as often as specified in the second argument. The -perform processor assist instruction is used to tell the CPU about the -number of fails so far. -@end deftypefn +The following built-in functions are made available either with +@option{-msse}, or with a combination of @option{-m3dnow} and +@option{-march=athlon}. All of them generate the machine +instruction that is part of the name. -@deftypefn {Built-in Function} int __builtin_tbegin_retry_nofloat (void*, int) -Same as @code{__builtin_tbegin_retry} but without FPR saves and -restores. Using this variant in code making use of FPRs will leave -the FPRs in undefined state when entering the transaction abort -handler code. 
-@end deftypefn +@smallexample +v4hi __builtin_ia32_pmulhuw (v4hi, v4hi) +v8qi __builtin_ia32_pavgb (v8qi, v8qi) +v4hi __builtin_ia32_pavgw (v4hi, v4hi) +v1di __builtin_ia32_psadbw (v8qi, v8qi) +v8qi __builtin_ia32_pmaxub (v8qi, v8qi) +v4hi __builtin_ia32_pmaxsw (v4hi, v4hi) +v8qi __builtin_ia32_pminub (v8qi, v8qi) +v4hi __builtin_ia32_pminsw (v4hi, v4hi) +int __builtin_ia32_pmovmskb (v8qi) +void __builtin_ia32_maskmovq (v8qi, v8qi, char *) +void __builtin_ia32_movntq (di *, di) +void __builtin_ia32_sfence (void) +@end smallexample -@deftypefn {Built-in Function} void __builtin_tbeginc (void) -Generates the @code{tbeginc} machine instruction starting a constraint -hardware transaction. The second operand is set to @code{0xff08}. -@end deftypefn +The following built-in functions are available when @option{-msse} is used. +All of them generate the machine instruction that is part of the name. -@deftypefn {Built-in Function} int __builtin_tend (void) -Generates the @code{tend} machine instruction finishing a transaction -and making the changes visible to other threads. The condition code -generated by tend is returned as integer value. 
-@end deftypefn +@smallexample +int __builtin_ia32_comieq (v4sf, v4sf) +int __builtin_ia32_comineq (v4sf, v4sf) +int __builtin_ia32_comilt (v4sf, v4sf) +int __builtin_ia32_comile (v4sf, v4sf) +int __builtin_ia32_comigt (v4sf, v4sf) +int __builtin_ia32_comige (v4sf, v4sf) +int __builtin_ia32_ucomieq (v4sf, v4sf) +int __builtin_ia32_ucomineq (v4sf, v4sf) +int __builtin_ia32_ucomilt (v4sf, v4sf) +int __builtin_ia32_ucomile (v4sf, v4sf) +int __builtin_ia32_ucomigt (v4sf, v4sf) +int __builtin_ia32_ucomige (v4sf, v4sf) +v4sf __builtin_ia32_addps (v4sf, v4sf) +v4sf __builtin_ia32_subps (v4sf, v4sf) +v4sf __builtin_ia32_mulps (v4sf, v4sf) +v4sf __builtin_ia32_divps (v4sf, v4sf) +v4sf __builtin_ia32_addss (v4sf, v4sf) +v4sf __builtin_ia32_subss (v4sf, v4sf) +v4sf __builtin_ia32_mulss (v4sf, v4sf) +v4sf __builtin_ia32_divss (v4sf, v4sf) +v4sf __builtin_ia32_cmpeqps (v4sf, v4sf) +v4sf __builtin_ia32_cmpltps (v4sf, v4sf) +v4sf __builtin_ia32_cmpleps (v4sf, v4sf) +v4sf __builtin_ia32_cmpgtps (v4sf, v4sf) +v4sf __builtin_ia32_cmpgeps (v4sf, v4sf) +v4sf __builtin_ia32_cmpunordps (v4sf, v4sf) +v4sf __builtin_ia32_cmpneqps (v4sf, v4sf) +v4sf __builtin_ia32_cmpnltps (v4sf, v4sf) +v4sf __builtin_ia32_cmpnleps (v4sf, v4sf) +v4sf __builtin_ia32_cmpngtps (v4sf, v4sf) +v4sf __builtin_ia32_cmpngeps (v4sf, v4sf) +v4sf __builtin_ia32_cmpordps (v4sf, v4sf) +v4sf __builtin_ia32_cmpeqss (v4sf, v4sf) +v4sf __builtin_ia32_cmpltss (v4sf, v4sf) +v4sf __builtin_ia32_cmpless (v4sf, v4sf) +v4sf __builtin_ia32_cmpunordss (v4sf, v4sf) +v4sf __builtin_ia32_cmpneqss (v4sf, v4sf) +v4sf __builtin_ia32_cmpnltss (v4sf, v4sf) +v4sf __builtin_ia32_cmpnless (v4sf, v4sf) +v4sf __builtin_ia32_cmpordss (v4sf, v4sf) +v4sf __builtin_ia32_maxps (v4sf, v4sf) +v4sf __builtin_ia32_maxss (v4sf, v4sf) +v4sf __builtin_ia32_minps (v4sf, v4sf) +v4sf __builtin_ia32_minss (v4sf, v4sf) +v4sf __builtin_ia32_andps (v4sf, v4sf) +v4sf __builtin_ia32_andnps (v4sf, v4sf) +v4sf __builtin_ia32_orps (v4sf, v4sf) +v4sf 
__builtin_ia32_xorps (v4sf, v4sf) +v4sf __builtin_ia32_movss (v4sf, v4sf) +v4sf __builtin_ia32_movhlps (v4sf, v4sf) +v4sf __builtin_ia32_movlhps (v4sf, v4sf) +v4sf __builtin_ia32_unpckhps (v4sf, v4sf) +v4sf __builtin_ia32_unpcklps (v4sf, v4sf) +v4sf __builtin_ia32_cvtpi2ps (v4sf, v2si) +v4sf __builtin_ia32_cvtsi2ss (v4sf, int) +v2si __builtin_ia32_cvtps2pi (v4sf) +int __builtin_ia32_cvtss2si (v4sf) +v2si __builtin_ia32_cvttps2pi (v4sf) +int __builtin_ia32_cvttss2si (v4sf) +v4sf __builtin_ia32_rcpps (v4sf) +v4sf __builtin_ia32_rsqrtps (v4sf) +v4sf __builtin_ia32_sqrtps (v4sf) +v4sf __builtin_ia32_rcpss (v4sf) +v4sf __builtin_ia32_rsqrtss (v4sf) +v4sf __builtin_ia32_sqrtss (v4sf) +v4sf __builtin_ia32_shufps (v4sf, v4sf, int) +void __builtin_ia32_movntps (float *, v4sf) +int __builtin_ia32_movmskps (v4sf) +@end smallexample -@deftypefn {Built-in Function} void __builtin_tabort (int) -Generates the @code{tabort} machine instruction with the specified -abort code. Abort codes from 0 through 255 are reserved and will -result in an error message. -@end deftypefn +The following built-in functions are available when @option{-msse} is used. -@deftypefn {Built-in Function} void __builtin_tx_assist (int) -Generates the @code{ppa rX,rY,1} machine instruction. Where the -integer parameter is loaded into rX and a value of zero is loaded into -rY. The integer parameter specifies the number of times the -transaction repeatedly aborted. -@end deftypefn +@table @code +@item v4sf __builtin_ia32_loadups (float *) +Generates the @code{movups} machine instruction as a load from memory. +@item void __builtin_ia32_storeups (float *, v4sf) +Generates the @code{movups} machine instruction as a store to memory. +@item v4sf __builtin_ia32_loadss (float *) +Generates the @code{movss} machine instruction as a load from memory. +@item v4sf __builtin_ia32_loadhps (v4sf, const v2sf *) +Generates the @code{movhps} machine instruction as a load from memory. 
+@item v4sf __builtin_ia32_loadlps (v4sf, const v2sf *) +Generates the @code{movlps} machine instruction as a load from memory +@item void __builtin_ia32_storehps (v2sf *, v4sf) +Generates the @code{movhps} machine instruction as a store to memory. +@item void __builtin_ia32_storelps (v2sf *, v4sf) +Generates the @code{movlps} machine instruction as a store to memory. +@end table -@deftypefn {Built-in Function} int __builtin_tx_nesting_depth (void) -Generates the @code{etnd} machine instruction. The current nesting -depth is returned as integer value. For a nesting depth of 0 the code -is not executed as part of an transaction. -@end deftypefn +The following built-in functions are available when @option{-msse2} is used. +All of them generate the machine instruction that is part of the name. + +@smallexample +int __builtin_ia32_comisdeq (v2df, v2df) +int __builtin_ia32_comisdlt (v2df, v2df) +int __builtin_ia32_comisdle (v2df, v2df) +int __builtin_ia32_comisdgt (v2df, v2df) +int __builtin_ia32_comisdge (v2df, v2df) +int __builtin_ia32_comisdneq (v2df, v2df) +int __builtin_ia32_ucomisdeq (v2df, v2df) +int __builtin_ia32_ucomisdlt (v2df, v2df) +int __builtin_ia32_ucomisdle (v2df, v2df) +int __builtin_ia32_ucomisdgt (v2df, v2df) +int __builtin_ia32_ucomisdge (v2df, v2df) +int __builtin_ia32_ucomisdneq (v2df, v2df) +v2df __builtin_ia32_cmpeqpd (v2df, v2df) +v2df __builtin_ia32_cmpltpd (v2df, v2df) +v2df __builtin_ia32_cmplepd (v2df, v2df) +v2df __builtin_ia32_cmpgtpd (v2df, v2df) +v2df __builtin_ia32_cmpgepd (v2df, v2df) +v2df __builtin_ia32_cmpunordpd (v2df, v2df) +v2df __builtin_ia32_cmpneqpd (v2df, v2df) +v2df __builtin_ia32_cmpnltpd (v2df, v2df) +v2df __builtin_ia32_cmpnlepd (v2df, v2df) +v2df __builtin_ia32_cmpngtpd (v2df, v2df) +v2df __builtin_ia32_cmpngepd (v2df, v2df) +v2df __builtin_ia32_cmpordpd (v2df, v2df) +v2df __builtin_ia32_cmpeqsd (v2df, v2df) +v2df __builtin_ia32_cmpltsd (v2df, v2df) +v2df __builtin_ia32_cmplesd (v2df, v2df) +v2df 
__builtin_ia32_cmpunordsd (v2df, v2df) +v2df __builtin_ia32_cmpneqsd (v2df, v2df) +v2df __builtin_ia32_cmpnltsd (v2df, v2df) +v2df __builtin_ia32_cmpnlesd (v2df, v2df) +v2df __builtin_ia32_cmpordsd (v2df, v2df) +v2di __builtin_ia32_paddq (v2di, v2di) +v2di __builtin_ia32_psubq (v2di, v2di) +v2df __builtin_ia32_addpd (v2df, v2df) +v2df __builtin_ia32_subpd (v2df, v2df) +v2df __builtin_ia32_mulpd (v2df, v2df) +v2df __builtin_ia32_divpd (v2df, v2df) +v2df __builtin_ia32_addsd (v2df, v2df) +v2df __builtin_ia32_subsd (v2df, v2df) +v2df __builtin_ia32_mulsd (v2df, v2df) +v2df __builtin_ia32_divsd (v2df, v2df) +v2df __builtin_ia32_minpd (v2df, v2df) +v2df __builtin_ia32_maxpd (v2df, v2df) +v2df __builtin_ia32_minsd (v2df, v2df) +v2df __builtin_ia32_maxsd (v2df, v2df) +v2df __builtin_ia32_andpd (v2df, v2df) +v2df __builtin_ia32_andnpd (v2df, v2df) +v2df __builtin_ia32_orpd (v2df, v2df) +v2df __builtin_ia32_xorpd (v2df, v2df) +v2df __builtin_ia32_movsd (v2df, v2df) +v2df __builtin_ia32_unpckhpd (v2df, v2df) +v2df __builtin_ia32_unpcklpd (v2df, v2df) +v16qi __builtin_ia32_paddb128 (v16qi, v16qi) +v8hi __builtin_ia32_paddw128 (v8hi, v8hi) +v4si __builtin_ia32_paddd128 (v4si, v4si) +v2di __builtin_ia32_paddq128 (v2di, v2di) +v16qi __builtin_ia32_psubb128 (v16qi, v16qi) +v8hi __builtin_ia32_psubw128 (v8hi, v8hi) +v4si __builtin_ia32_psubd128 (v4si, v4si) +v2di __builtin_ia32_psubq128 (v2di, v2di) +v8hi __builtin_ia32_pmullw128 (v8hi, v8hi) +v8hi __builtin_ia32_pmulhw128 (v8hi, v8hi) +v2di __builtin_ia32_pand128 (v2di, v2di) +v2di __builtin_ia32_pandn128 (v2di, v2di) +v2di __builtin_ia32_por128 (v2di, v2di) +v2di __builtin_ia32_pxor128 (v2di, v2di) +v16qi __builtin_ia32_pavgb128 (v16qi, v16qi) +v8hi __builtin_ia32_pavgw128 (v8hi, v8hi) +v16qi __builtin_ia32_pcmpeqb128 (v16qi, v16qi) +v8hi __builtin_ia32_pcmpeqw128 (v8hi, v8hi) +v4si __builtin_ia32_pcmpeqd128 (v4si, v4si) +v16qi __builtin_ia32_pcmpgtb128 (v16qi, v16qi) +v8hi __builtin_ia32_pcmpgtw128 (v8hi, v8hi) +v4si 
__builtin_ia32_pcmpgtd128 (v4si, v4si) +v16qi __builtin_ia32_pmaxub128 (v16qi, v16qi) +v8hi __builtin_ia32_pmaxsw128 (v8hi, v8hi) +v16qi __builtin_ia32_pminub128 (v16qi, v16qi) +v8hi __builtin_ia32_pminsw128 (v8hi, v8hi) +v16qi __builtin_ia32_punpckhbw128 (v16qi, v16qi) +v8hi __builtin_ia32_punpckhwd128 (v8hi, v8hi) +v4si __builtin_ia32_punpckhdq128 (v4si, v4si) +v2di __builtin_ia32_punpckhqdq128 (v2di, v2di) +v16qi __builtin_ia32_punpcklbw128 (v16qi, v16qi) +v8hi __builtin_ia32_punpcklwd128 (v8hi, v8hi) +v4si __builtin_ia32_punpckldq128 (v4si, v4si) +v2di __builtin_ia32_punpcklqdq128 (v2di, v2di) +v16qi __builtin_ia32_packsswb128 (v8hi, v8hi) +v8hi __builtin_ia32_packssdw128 (v4si, v4si) +v16qi __builtin_ia32_packuswb128 (v8hi, v8hi) +v8hi __builtin_ia32_pmulhuw128 (v8hi, v8hi) +void __builtin_ia32_maskmovdqu (v16qi, v16qi) +v2df __builtin_ia32_loadupd (double *) +void __builtin_ia32_storeupd (double *, v2df) +v2df __builtin_ia32_loadhpd (v2df, double const *) +v2df __builtin_ia32_loadlpd (v2df, double const *) +int __builtin_ia32_movmskpd (v2df) +int __builtin_ia32_pmovmskb128 (v16qi) +void __builtin_ia32_movnti (int *, int) +void __builtin_ia32_movnti64 (long long int *, long long int) +void __builtin_ia32_movntpd (double *, v2df) +void __builtin_ia32_movntdq (v2df *, v2df) +v4si __builtin_ia32_pshufd (v4si, int) +v8hi __builtin_ia32_pshuflw (v8hi, int) +v8hi __builtin_ia32_pshufhw (v8hi, int) +v2di __builtin_ia32_psadbw128 (v16qi, v16qi) +v2df __builtin_ia32_sqrtpd (v2df) +v2df __builtin_ia32_sqrtsd (v2df) +v2df __builtin_ia32_shufpd (v2df, v2df, int) +v2df __builtin_ia32_cvtdq2pd (v4si) +v4sf __builtin_ia32_cvtdq2ps (v4si) +v4si __builtin_ia32_cvtpd2dq (v2df) +v2si __builtin_ia32_cvtpd2pi (v2df) +v4sf __builtin_ia32_cvtpd2ps (v2df) +v4si __builtin_ia32_cvttpd2dq (v2df) +v2si __builtin_ia32_cvttpd2pi (v2df) +v2df __builtin_ia32_cvtpi2pd (v2si) +int __builtin_ia32_cvtsd2si (v2df) +int __builtin_ia32_cvttsd2si (v2df) +long long __builtin_ia32_cvtsd2si64 (v2df) 
+long long __builtin_ia32_cvttsd2si64 (v2df) +v4si __builtin_ia32_cvtps2dq (v4sf) +v2df __builtin_ia32_cvtps2pd (v4sf) +v4si __builtin_ia32_cvttps2dq (v4sf) +v2df __builtin_ia32_cvtsi2sd (v2df, int) +v2df __builtin_ia32_cvtsi642sd (v2df, long long) +v4sf __builtin_ia32_cvtsd2ss (v4sf, v2df) +v2df __builtin_ia32_cvtss2sd (v2df, v4sf) +void __builtin_ia32_clflush (const void *) +void __builtin_ia32_lfence (void) +void __builtin_ia32_mfence (void) +v16qi __builtin_ia32_loaddqu (const char *) +void __builtin_ia32_storedqu (char *, v16qi) +v1di __builtin_ia32_pmuludq (v2si, v2si) +v2di __builtin_ia32_pmuludq128 (v4si, v4si) +v8hi __builtin_ia32_psllw128 (v8hi, v8hi) +v4si __builtin_ia32_pslld128 (v4si, v4si) +v2di __builtin_ia32_psllq128 (v2di, v2di) +v8hi __builtin_ia32_psrlw128 (v8hi, v8hi) +v4si __builtin_ia32_psrld128 (v4si, v4si) +v2di __builtin_ia32_psrlq128 (v2di, v2di) +v8hi __builtin_ia32_psraw128 (v8hi, v8hi) +v4si __builtin_ia32_psrad128 (v4si, v4si) +v2di __builtin_ia32_pslldqi128 (v2di, int) +v8hi __builtin_ia32_psllwi128 (v8hi, int) +v4si __builtin_ia32_pslldi128 (v4si, int) +v2di __builtin_ia32_psllqi128 (v2di, int) +v2di __builtin_ia32_psrldqi128 (v2di, int) +v8hi __builtin_ia32_psrlwi128 (v8hi, int) +v4si __builtin_ia32_psrldi128 (v4si, int) +v2di __builtin_ia32_psrlqi128 (v2di, int) +v8hi __builtin_ia32_psrawi128 (v8hi, int) +v4si __builtin_ia32_psradi128 (v4si, int) +v4si __builtin_ia32_pmaddwd128 (v8hi, v8hi) +v2di __builtin_ia32_movq128 (v2di) +@end smallexample -@deftypefn {Built-in Function} void __builtin_non_tx_store (uint64_t *, uint64_t) +The following built-in functions are available when @option{-msse3} is used. +All of them generate the machine instruction that is part of the name. -Generates the @code{ntstg} machine instruction. The second argument -is written to the first arguments location. The store operation will -not be rolled-back in case of an transaction abort. 
-@end deftypefn +@smallexample +v2df __builtin_ia32_addsubpd (v2df, v2df) +v4sf __builtin_ia32_addsubps (v4sf, v4sf) +v2df __builtin_ia32_haddpd (v2df, v2df) +v4sf __builtin_ia32_haddps (v4sf, v4sf) +v2df __builtin_ia32_hsubpd (v2df, v2df) +v4sf __builtin_ia32_hsubps (v4sf, v4sf) +v16qi __builtin_ia32_lddqu (char const *) +void __builtin_ia32_monitor (void *, unsigned int, unsigned int) +v4sf __builtin_ia32_movshdup (v4sf) +v4sf __builtin_ia32_movsldup (v4sf) +void __builtin_ia32_mwait (unsigned int, unsigned int) +@end smallexample -@node SH Built-in Functions -@subsection SH Built-in Functions -The following built-in functions are supported on the SH1, SH2, SH3 and SH4 -families of processors: +The following built-in functions are available when @option{-mssse3} is used. +All of them generate the machine instruction that is part of the name. -@deftypefn {Built-in Function} {void} __builtin_set_thread_pointer (void *@var{ptr}) -Sets the @samp{GBR} register to the specified value @var{ptr}. This is usually -used by system code that manages threads and execution contexts. The compiler -normally does not generate code that modifies the contents of @samp{GBR} and -thus the value is preserved across function calls. Changing the @samp{GBR} -value in user code must be done with caution, since the compiler might use -@samp{GBR} in order to access thread local variables. 
+@smallexample +v2si __builtin_ia32_phaddd (v2si, v2si) +v4hi __builtin_ia32_phaddw (v4hi, v4hi) +v4hi __builtin_ia32_phaddsw (v4hi, v4hi) +v2si __builtin_ia32_phsubd (v2si, v2si) +v4hi __builtin_ia32_phsubw (v4hi, v4hi) +v4hi __builtin_ia32_phsubsw (v4hi, v4hi) +v4hi __builtin_ia32_pmaddubsw (v8qi, v8qi) +v4hi __builtin_ia32_pmulhrsw (v4hi, v4hi) +v8qi __builtin_ia32_pshufb (v8qi, v8qi) +v8qi __builtin_ia32_psignb (v8qi, v8qi) +v2si __builtin_ia32_psignd (v2si, v2si) +v4hi __builtin_ia32_psignw (v4hi, v4hi) +v1di __builtin_ia32_palignr (v1di, v1di, int) +v8qi __builtin_ia32_pabsb (v8qi) +v2si __builtin_ia32_pabsd (v2si) +v4hi __builtin_ia32_pabsw (v4hi) +@end smallexample -@end deftypefn +The following built-in functions are available when @option{-mssse3} is used. +All of them generate the machine instruction that is part of the name. -@deftypefn {Built-in Function} {void *} __builtin_thread_pointer (void) -Returns the value that is currently set in the @samp{GBR} register. -Memory loads and stores that use the thread pointer as a base address are -turned into @samp{GBR} based displacement loads and stores, if possible. 
-For example: @smallexample -struct my_tcb -@{ - int a, b, c, d, e; -@}; +v4si __builtin_ia32_phaddd128 (v4si, v4si) +v8hi __builtin_ia32_phaddw128 (v8hi, v8hi) +v8hi __builtin_ia32_phaddsw128 (v8hi, v8hi) +v4si __builtin_ia32_phsubd128 (v4si, v4si) +v8hi __builtin_ia32_phsubw128 (v8hi, v8hi) +v8hi __builtin_ia32_phsubsw128 (v8hi, v8hi) +v8hi __builtin_ia32_pmaddubsw128 (v16qi, v16qi) +v8hi __builtin_ia32_pmulhrsw128 (v8hi, v8hi) +v16qi __builtin_ia32_pshufb128 (v16qi, v16qi) +v16qi __builtin_ia32_psignb128 (v16qi, v16qi) +v4si __builtin_ia32_psignd128 (v4si, v4si) +v8hi __builtin_ia32_psignw128 (v8hi, v8hi) +v2di __builtin_ia32_palignr128 (v2di, v2di, int) +v16qi __builtin_ia32_pabsb128 (v16qi) +v4si __builtin_ia32_pabsd128 (v4si) +v8hi __builtin_ia32_pabsw128 (v8hi) +@end smallexample -int get_tcb_value (void) -@{ - // Generate @samp{mov.l @@(8,gbr),r0} instruction - return ((my_tcb*)__builtin_thread_pointer ())->c; -@} +The following built-in functions are available when @option{-msse4.1} is +used. All of them generate the machine instruction that is part of the +name. 
+@smallexample +v2df __builtin_ia32_blendpd (v2df, v2df, const int) +v4sf __builtin_ia32_blendps (v4sf, v4sf, const int) +v2df __builtin_ia32_blendvpd (v2df, v2df, v2df) +v4sf __builtin_ia32_blendvps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_dppd (v2df, v2df, const int) +v4sf __builtin_ia32_dpps (v4sf, v4sf, const int) +v4sf __builtin_ia32_insertps128 (v4sf, v4sf, const int) +v2di __builtin_ia32_movntdqa (v2di *); +v16qi __builtin_ia32_mpsadbw128 (v16qi, v16qi, const int) +v8hi __builtin_ia32_packusdw128 (v4si, v4si) +v16qi __builtin_ia32_pblendvb128 (v16qi, v16qi, v16qi) +v8hi __builtin_ia32_pblendw128 (v8hi, v8hi, const int) +v2di __builtin_ia32_pcmpeqq (v2di, v2di) +v8hi __builtin_ia32_phminposuw128 (v8hi) +v16qi __builtin_ia32_pmaxsb128 (v16qi, v16qi) +v4si __builtin_ia32_pmaxsd128 (v4si, v4si) +v4si __builtin_ia32_pmaxud128 (v4si, v4si) +v8hi __builtin_ia32_pmaxuw128 (v8hi, v8hi) +v16qi __builtin_ia32_pminsb128 (v16qi, v16qi) +v4si __builtin_ia32_pminsd128 (v4si, v4si) +v4si __builtin_ia32_pminud128 (v4si, v4si) +v8hi __builtin_ia32_pminuw128 (v8hi, v8hi) +v4si __builtin_ia32_pmovsxbd128 (v16qi) +v2di __builtin_ia32_pmovsxbq128 (v16qi) +v8hi __builtin_ia32_pmovsxbw128 (v16qi) +v2di __builtin_ia32_pmovsxdq128 (v4si) +v4si __builtin_ia32_pmovsxwd128 (v8hi) +v2di __builtin_ia32_pmovsxwq128 (v8hi) +v4si __builtin_ia32_pmovzxbd128 (v16qi) +v2di __builtin_ia32_pmovzxbq128 (v16qi) +v8hi __builtin_ia32_pmovzxbw128 (v16qi) +v2di __builtin_ia32_pmovzxdq128 (v4si) +v4si __builtin_ia32_pmovzxwd128 (v8hi) +v2di __builtin_ia32_pmovzxwq128 (v8hi) +v2di __builtin_ia32_pmuldq128 (v4si, v4si) +v4si __builtin_ia32_pmulld128 (v4si, v4si) +int __builtin_ia32_ptestc128 (v2di, v2di) +int __builtin_ia32_ptestnzc128 (v2di, v2di) +int __builtin_ia32_ptestz128 (v2di, v2di) +v2df __builtin_ia32_roundpd (v2df, const int) +v4sf __builtin_ia32_roundps (v4sf, const int) +v2df __builtin_ia32_roundsd (v2df, v2df, const int) +v4sf __builtin_ia32_roundss (v4sf, v4sf, const int) @end smallexample 
-@end deftypefn - -@deftypefn {Built-in Function} {unsigned int} __builtin_sh_get_fpscr (void) -Returns the value that is currently set in the @samp{FPSCR} register. -@end deftypefn - -@deftypefn {Built-in Function} {void} __builtin_sh_set_fpscr (unsigned int @var{val}) -Sets the @samp{FPSCR} register to the specified value @var{val}, while -preserving the current values of the FR, SZ and PR bits. -@end deftypefn - -@node SPARC VIS Built-in Functions -@subsection SPARC VIS Built-in Functions - -GCC supports SIMD operations on the SPARC using both the generic vector -extensions (@pxref{Vector Extensions}) as well as built-in functions for -the SPARC Visual Instruction Set (VIS). When you use the @option{-mvis} -switch, the VIS extension is exposed as the following built-in functions: - -@smallexample -typedef int v1si __attribute__ ((vector_size (4))); -typedef int v2si __attribute__ ((vector_size (8))); -typedef short v4hi __attribute__ ((vector_size (8))); -typedef short v2hi __attribute__ ((vector_size (4))); -typedef unsigned char v8qi __attribute__ ((vector_size (8))); -typedef unsigned char v4qi __attribute__ ((vector_size (4))); -void __builtin_vis_write_gsr (int64_t); -int64_t __builtin_vis_read_gsr (void); +The following built-in functions are available when @option{-msse4.1} is +used. -void * __builtin_vis_alignaddr (void *, long); -void * __builtin_vis_alignaddrl (void *, long); -int64_t __builtin_vis_faligndatadi (int64_t, int64_t); -v2si __builtin_vis_faligndatav2si (v2si, v2si); -v4hi __builtin_vis_faligndatav4hi (v4si, v4si); -v8qi __builtin_vis_faligndatav8qi (v8qi, v8qi); +@table @code +@item v4sf __builtin_ia32_vec_set_v4sf (v4sf, float, const int) +Generates the @code{insertps} machine instruction. +@item int __builtin_ia32_vec_ext_v16qi (v16qi, const int) +Generates the @code{pextrb} machine instruction. +@item v16qi __builtin_ia32_vec_set_v16qi (v16qi, int, const int) +Generates the @code{pinsrb} machine instruction. 
+@item v4si __builtin_ia32_vec_set_v4si (v4si, int, const int) +Generates the @code{pinsrd} machine instruction. +@item v2di __builtin_ia32_vec_set_v2di (v2di, long long, const int) +Generates the @code{pinsrq} machine instruction in 64bit mode. +@end table -v4hi __builtin_vis_fexpand (v4qi); +The following built-in functions are changed to generate new SSE4.1 +instructions when @option{-msse4.1} is used. -v4hi __builtin_vis_fmul8x16 (v4qi, v4hi); -v4hi __builtin_vis_fmul8x16au (v4qi, v2hi); -v4hi __builtin_vis_fmul8x16al (v4qi, v2hi); -v4hi __builtin_vis_fmul8sux16 (v8qi, v4hi); -v4hi __builtin_vis_fmul8ulx16 (v8qi, v4hi); -v2si __builtin_vis_fmuld8sux16 (v4qi, v2hi); -v2si __builtin_vis_fmuld8ulx16 (v4qi, v2hi); +@table @code +@item float __builtin_ia32_vec_ext_v4sf (v4sf, const int) +Generates the @code{extractps} machine instruction. +@item int __builtin_ia32_vec_ext_v4si (v4si, const int) +Generates the @code{pextrd} machine instruction. +@item long long __builtin_ia32_vec_ext_v2di (v2di, const int) +Generates the @code{pextrq} machine instruction in 64bit mode. +@end table -v4qi __builtin_vis_fpack16 (v4hi); -v8qi __builtin_vis_fpack32 (v2si, v8qi); -v2hi __builtin_vis_fpackfix (v2si); -v8qi __builtin_vis_fpmerge (v4qi, v4qi); +The following built-in functions are available when @option{-msse4.2} is +used. All of them generate the machine instruction that is part of the +name. 
-int64_t __builtin_vis_pdist (v8qi, v8qi, int64_t); +@smallexample +v16qi __builtin_ia32_pcmpestrm128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestri128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestria128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestric128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestrio128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestris128 (v16qi, int, v16qi, int, const int) +int __builtin_ia32_pcmpestriz128 (v16qi, int, v16qi, int, const int) +v16qi __builtin_ia32_pcmpistrm128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistri128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistria128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistric128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistrio128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistris128 (v16qi, v16qi, const int) +int __builtin_ia32_pcmpistriz128 (v16qi, v16qi, const int) +v2di __builtin_ia32_pcmpgtq (v2di, v2di) +@end smallexample -long __builtin_vis_edge8 (void *, void *); -long __builtin_vis_edge8l (void *, void *); -long __builtin_vis_edge16 (void *, void *); -long __builtin_vis_edge16l (void *, void *); -long __builtin_vis_edge32 (void *, void *); -long __builtin_vis_edge32l (void *, void *); +The following built-in functions are available when @option{-msse4.2} is +used. -long __builtin_vis_fcmple16 (v4hi, v4hi); -long __builtin_vis_fcmple32 (v2si, v2si); -long __builtin_vis_fcmpne16 (v4hi, v4hi); -long __builtin_vis_fcmpne32 (v2si, v2si); -long __builtin_vis_fcmpgt16 (v4hi, v4hi); -long __builtin_vis_fcmpgt32 (v2si, v2si); -long __builtin_vis_fcmpeq16 (v4hi, v4hi); -long __builtin_vis_fcmpeq32 (v2si, v2si); +@table @code +@item unsigned int __builtin_ia32_crc32qi (unsigned int, unsigned char) +Generates the @code{crc32b} machine instruction. +@item unsigned int __builtin_ia32_crc32hi (unsigned int, unsigned short) +Generates the @code{crc32w} machine instruction. 
+@item unsigned int __builtin_ia32_crc32si (unsigned int, unsigned int) +Generates the @code{crc32l} machine instruction. +@item unsigned long long __builtin_ia32_crc32di (unsigned long long, unsigned long long) +Generates the @code{crc32q} machine instruction. +@end table -v4hi __builtin_vis_fpadd16 (v4hi, v4hi); -v2hi __builtin_vis_fpadd16s (v2hi, v2hi); -v2si __builtin_vis_fpadd32 (v2si, v2si); -v1si __builtin_vis_fpadd32s (v1si, v1si); -v4hi __builtin_vis_fpsub16 (v4hi, v4hi); -v2hi __builtin_vis_fpsub16s (v2hi, v2hi); -v2si __builtin_vis_fpsub32 (v2si, v2si); -v1si __builtin_vis_fpsub32s (v1si, v1si); +The following built-in functions are changed to generate new SSE4.2 +instructions when @option{-msse4.2} is used. -long __builtin_vis_array8 (long, long); -long __builtin_vis_array16 (long, long); -long __builtin_vis_array32 (long, long); -@end smallexample +@table @code +@item int __builtin_popcount (unsigned int) +Generates the @code{popcntl} machine instruction. +@item int __builtin_popcountl (unsigned long) +Generates the @code{popcntl} or @code{popcntq} machine instruction, +depending on the size of @code{unsigned long}. +@item int __builtin_popcountll (unsigned long long) +Generates the @code{popcntq} machine instruction. +@end table -When you use the @option{-mvis2} switch, the VIS version 2.0 built-in -functions also become available: +The following built-in functions are available when @option{-mavx} is +used. All of them generate the machine instruction that is part of the +name. 
@smallexample -long __builtin_vis_bmask (long, long); -int64_t __builtin_vis_bshuffledi (int64_t, int64_t); -v2si __builtin_vis_bshufflev2si (v2si, v2si); -v4hi __builtin_vis_bshufflev2si (v4hi, v4hi); -v8qi __builtin_vis_bshufflev2si (v8qi, v8qi); - -long __builtin_vis_edge8n (void *, void *); -long __builtin_vis_edge8ln (void *, void *); -long __builtin_vis_edge16n (void *, void *); -long __builtin_vis_edge16ln (void *, void *); -long __builtin_vis_edge32n (void *, void *); -long __builtin_vis_edge32ln (void *, void *); +v4df __builtin_ia32_addpd256 (v4df,v4df) +v8sf __builtin_ia32_addps256 (v8sf,v8sf) +v4df __builtin_ia32_addsubpd256 (v4df,v4df) +v8sf __builtin_ia32_addsubps256 (v8sf,v8sf) +v4df __builtin_ia32_andnpd256 (v4df,v4df) +v8sf __builtin_ia32_andnps256 (v8sf,v8sf) +v4df __builtin_ia32_andpd256 (v4df,v4df) +v8sf __builtin_ia32_andps256 (v8sf,v8sf) +v4df __builtin_ia32_blendpd256 (v4df,v4df,int) +v8sf __builtin_ia32_blendps256 (v8sf,v8sf,int) +v4df __builtin_ia32_blendvpd256 (v4df,v4df,v4df) +v8sf __builtin_ia32_blendvps256 (v8sf,v8sf,v8sf) +v2df __builtin_ia32_cmppd (v2df,v2df,int) +v4df __builtin_ia32_cmppd256 (v4df,v4df,int) +v4sf __builtin_ia32_cmpps (v4sf,v4sf,int) +v8sf __builtin_ia32_cmpps256 (v8sf,v8sf,int) +v2df __builtin_ia32_cmpsd (v2df,v2df,int) +v4sf __builtin_ia32_cmpss (v4sf,v4sf,int) +v4df __builtin_ia32_cvtdq2pd256 (v4si) +v8sf __builtin_ia32_cvtdq2ps256 (v8si) +v4si __builtin_ia32_cvtpd2dq256 (v4df) +v4sf __builtin_ia32_cvtpd2ps256 (v4df) +v8si __builtin_ia32_cvtps2dq256 (v8sf) +v4df __builtin_ia32_cvtps2pd256 (v4sf) +v4si __builtin_ia32_cvttpd2dq256 (v4df) +v8si __builtin_ia32_cvttps2dq256 (v8sf) +v4df __builtin_ia32_divpd256 (v4df,v4df) +v8sf __builtin_ia32_divps256 (v8sf,v8sf) +v8sf __builtin_ia32_dpps256 (v8sf,v8sf,int) +v4df __builtin_ia32_haddpd256 (v4df,v4df) +v8sf __builtin_ia32_haddps256 (v8sf,v8sf) +v4df __builtin_ia32_hsubpd256 (v4df,v4df) +v8sf __builtin_ia32_hsubps256 (v8sf,v8sf) +v32qi __builtin_ia32_lddqu256 (pcchar) 
+v32qi __builtin_ia32_loaddqu256 (pcchar) +v4df __builtin_ia32_loadupd256 (pcdouble) +v8sf __builtin_ia32_loadups256 (pcfloat) +v2df __builtin_ia32_maskloadpd (pcv2df,v2df) +v4df __builtin_ia32_maskloadpd256 (pcv4df,v4df) +v4sf __builtin_ia32_maskloadps (pcv4sf,v4sf) +v8sf __builtin_ia32_maskloadps256 (pcv8sf,v8sf) +void __builtin_ia32_maskstorepd (pv2df,v2df,v2df) +void __builtin_ia32_maskstorepd256 (pv4df,v4df,v4df) +void __builtin_ia32_maskstoreps (pv4sf,v4sf,v4sf) +void __builtin_ia32_maskstoreps256 (pv8sf,v8sf,v8sf) +v4df __builtin_ia32_maxpd256 (v4df,v4df) +v8sf __builtin_ia32_maxps256 (v8sf,v8sf) +v4df __builtin_ia32_minpd256 (v4df,v4df) +v8sf __builtin_ia32_minps256 (v8sf,v8sf) +v4df __builtin_ia32_movddup256 (v4df) +int __builtin_ia32_movmskpd256 (v4df) +int __builtin_ia32_movmskps256 (v8sf) +v8sf __builtin_ia32_movshdup256 (v8sf) +v8sf __builtin_ia32_movsldup256 (v8sf) +v4df __builtin_ia32_mulpd256 (v4df,v4df) +v8sf __builtin_ia32_mulps256 (v8sf,v8sf) +v4df __builtin_ia32_orpd256 (v4df,v4df) +v8sf __builtin_ia32_orps256 (v8sf,v8sf) +v2df __builtin_ia32_pd_pd256 (v4df) +v4df __builtin_ia32_pd256_pd (v2df) +v4sf __builtin_ia32_ps_ps256 (v8sf) +v8sf __builtin_ia32_ps256_ps (v4sf) +int __builtin_ia32_ptestc256 (v4di,v4di,ptest) +int __builtin_ia32_ptestnzc256 (v4di,v4di,ptest) +int __builtin_ia32_ptestz256 (v4di,v4di,ptest) +v8sf __builtin_ia32_rcpps256 (v8sf) +v4df __builtin_ia32_roundpd256 (v4df,int) +v8sf __builtin_ia32_roundps256 (v8sf,int) +v8sf __builtin_ia32_rsqrtps_nr256 (v8sf) +v8sf __builtin_ia32_rsqrtps256 (v8sf) +v4df __builtin_ia32_shufpd256 (v4df,v4df,int) +v8sf __builtin_ia32_shufps256 (v8sf,v8sf,int) +v4si __builtin_ia32_si_si256 (v8si) +v8si __builtin_ia32_si256_si (v4si) +v4df __builtin_ia32_sqrtpd256 (v4df) +v8sf __builtin_ia32_sqrtps_nr256 (v8sf) +v8sf __builtin_ia32_sqrtps256 (v8sf) +void __builtin_ia32_storedqu256 (pchar,v32qi) +void __builtin_ia32_storeupd256 (pdouble,v4df) +void __builtin_ia32_storeups256 (pfloat,v8sf) +v4df 
__builtin_ia32_subpd256 (v4df,v4df) +v8sf __builtin_ia32_subps256 (v8sf,v8sf) +v4df __builtin_ia32_unpckhpd256 (v4df,v4df) +v8sf __builtin_ia32_unpckhps256 (v8sf,v8sf) +v4df __builtin_ia32_unpcklpd256 (v4df,v4df) +v8sf __builtin_ia32_unpcklps256 (v8sf,v8sf) +v4df __builtin_ia32_vbroadcastf128_pd256 (pcv2df) +v8sf __builtin_ia32_vbroadcastf128_ps256 (pcv4sf) +v4df __builtin_ia32_vbroadcastsd256 (pcdouble) +v4sf __builtin_ia32_vbroadcastss (pcfloat) +v8sf __builtin_ia32_vbroadcastss256 (pcfloat) +v2df __builtin_ia32_vextractf128_pd256 (v4df,int) +v4sf __builtin_ia32_vextractf128_ps256 (v8sf,int) +v4si __builtin_ia32_vextractf128_si256 (v8si,int) +v4df __builtin_ia32_vinsertf128_pd256 (v4df,v2df,int) +v8sf __builtin_ia32_vinsertf128_ps256 (v8sf,v4sf,int) +v8si __builtin_ia32_vinsertf128_si256 (v8si,v4si,int) +v4df __builtin_ia32_vperm2f128_pd256 (v4df,v4df,int) +v8sf __builtin_ia32_vperm2f128_ps256 (v8sf,v8sf,int) +v8si __builtin_ia32_vperm2f128_si256 (v8si,v8si,int) +v2df __builtin_ia32_vpermil2pd (v2df,v2df,v2di,int) +v4df __builtin_ia32_vpermil2pd256 (v4df,v4df,v4di,int) +v4sf __builtin_ia32_vpermil2ps (v4sf,v4sf,v4si,int) +v8sf __builtin_ia32_vpermil2ps256 (v8sf,v8sf,v8si,int) +v2df __builtin_ia32_vpermilpd (v2df,int) +v4df __builtin_ia32_vpermilpd256 (v4df,int) +v4sf __builtin_ia32_vpermilps (v4sf,int) +v8sf __builtin_ia32_vpermilps256 (v8sf,int) +v2df __builtin_ia32_vpermilvarpd (v2df,v2di) +v4df __builtin_ia32_vpermilvarpd256 (v4df,v4di) +v4sf __builtin_ia32_vpermilvarps (v4sf,v4si) +v8sf __builtin_ia32_vpermilvarps256 (v8sf,v8si) +int __builtin_ia32_vtestcpd (v2df,v2df,ptest) +int __builtin_ia32_vtestcpd256 (v4df,v4df,ptest) +int __builtin_ia32_vtestcps (v4sf,v4sf,ptest) +int __builtin_ia32_vtestcps256 (v8sf,v8sf,ptest) +int __builtin_ia32_vtestnzcpd (v2df,v2df,ptest) +int __builtin_ia32_vtestnzcpd256 (v4df,v4df,ptest) +int __builtin_ia32_vtestnzcps (v4sf,v4sf,ptest) +int __builtin_ia32_vtestnzcps256 (v8sf,v8sf,ptest) +int __builtin_ia32_vtestzpd 
(v2df,v2df,ptest) +int __builtin_ia32_vtestzpd256 (v4df,v4df,ptest) +int __builtin_ia32_vtestzps (v4sf,v4sf,ptest) +int __builtin_ia32_vtestzps256 (v8sf,v8sf,ptest) +void __builtin_ia32_vzeroall (void) +void __builtin_ia32_vzeroupper (void) +v4df __builtin_ia32_xorpd256 (v4df,v4df) +v8sf __builtin_ia32_xorps256 (v8sf,v8sf) @end smallexample -When you use the @option{-mvis3} switch, the VIS version 3.0 built-in -functions also become available: +The following built-in functions are available when @option{-mavx2} is +used. All of them generate the machine instruction that is part of the +name. @smallexample -void __builtin_vis_cmask8 (long); -void __builtin_vis_cmask16 (long); -void __builtin_vis_cmask32 (long); - -v4hi __builtin_vis_fchksm16 (v4hi, v4hi); - -v4hi __builtin_vis_fsll16 (v4hi, v4hi); -v4hi __builtin_vis_fslas16 (v4hi, v4hi); -v4hi __builtin_vis_fsrl16 (v4hi, v4hi); -v4hi __builtin_vis_fsra16 (v4hi, v4hi); -v2si __builtin_vis_fsll16 (v2si, v2si); -v2si __builtin_vis_fslas16 (v2si, v2si); -v2si __builtin_vis_fsrl16 (v2si, v2si); -v2si __builtin_vis_fsra16 (v2si, v2si); - -long __builtin_vis_pdistn (v8qi, v8qi); - -v4hi __builtin_vis_fmean16 (v4hi, v4hi); - -int64_t __builtin_vis_fpadd64 (int64_t, int64_t); -int64_t __builtin_vis_fpsub64 (int64_t, int64_t); - -v4hi __builtin_vis_fpadds16 (v4hi, v4hi); -v2hi __builtin_vis_fpadds16s (v2hi, v2hi); -v4hi __builtin_vis_fpsubs16 (v4hi, v4hi); -v2hi __builtin_vis_fpsubs16s (v2hi, v2hi); -v2si __builtin_vis_fpadds32 (v2si, v2si); -v1si __builtin_vis_fpadds32s (v1si, v1si); -v2si __builtin_vis_fpsubs32 (v2si, v2si); -v1si __builtin_vis_fpsubs32s (v1si, v1si); - -long __builtin_vis_fucmple8 (v8qi, v8qi); -long __builtin_vis_fucmpne8 (v8qi, v8qi); -long __builtin_vis_fucmpgt8 (v8qi, v8qi); -long __builtin_vis_fucmpeq8 (v8qi, v8qi); +v32qi __builtin_ia32_mpsadbw256 (v32qi,v32qi,int) +v32qi __builtin_ia32_pabsb256 (v32qi) +v16hi __builtin_ia32_pabsw256 (v16hi) +v8si __builtin_ia32_pabsd256 (v8si) +v16hi 
__builtin_ia32_packssdw256 (v8si,v8si) +v32qi __builtin_ia32_packsswb256 (v16hi,v16hi) +v16hi __builtin_ia32_packusdw256 (v8si,v8si) +v32qi __builtin_ia32_packuswb256 (v16hi,v16hi) +v32qi __builtin_ia32_paddb256 (v32qi,v32qi) +v16hi __builtin_ia32_paddw256 (v16hi,v16hi) +v8si __builtin_ia32_paddd256 (v8si,v8si) +v4di __builtin_ia32_paddq256 (v4di,v4di) +v32qi __builtin_ia32_paddsb256 (v32qi,v32qi) +v16hi __builtin_ia32_paddsw256 (v16hi,v16hi) +v32qi __builtin_ia32_paddusb256 (v32qi,v32qi) +v16hi __builtin_ia32_paddusw256 (v16hi,v16hi) +v4di __builtin_ia32_palignr256 (v4di,v4di,int) +v4di __builtin_ia32_andsi256 (v4di,v4di) +v4di __builtin_ia32_andnotsi256 (v4di,v4di) +v32qi __builtin_ia32_pavgb256 (v32qi,v32qi) +v16hi __builtin_ia32_pavgw256 (v16hi,v16hi) +v32qi __builtin_ia32_pblendvb256 (v32qi,v32qi,v32qi) +v16hi __builtin_ia32_pblendw256 (v16hi,v16hi,int) +v32qi __builtin_ia32_pcmpeqb256 (v32qi,v32qi) +v16hi __builtin_ia32_pcmpeqw256 (v16hi,v16hi) +v8si __builtin_ia32_pcmpeqd256 (c8si,v8si) +v4di __builtin_ia32_pcmpeqq256 (v4di,v4di) +v32qi __builtin_ia32_pcmpgtb256 (v32qi,v32qi) +v16hi __builtin_ia32_pcmpgtw256 (16hi,v16hi) +v8si __builtin_ia32_pcmpgtd256 (v8si,v8si) +v4di __builtin_ia32_pcmpgtq256 (v4di,v4di) +v16hi __builtin_ia32_phaddw256 (v16hi,v16hi) +v8si __builtin_ia32_phaddd256 (v8si,v8si) +v16hi __builtin_ia32_phaddsw256 (v16hi,v16hi) +v16hi __builtin_ia32_phsubw256 (v16hi,v16hi) +v8si __builtin_ia32_phsubd256 (v8si,v8si) +v16hi __builtin_ia32_phsubsw256 (v16hi,v16hi) +v32qi __builtin_ia32_pmaddubsw256 (v32qi,v32qi) +v16hi __builtin_ia32_pmaddwd256 (v16hi,v16hi) +v32qi __builtin_ia32_pmaxsb256 (v32qi,v32qi) +v16hi __builtin_ia32_pmaxsw256 (v16hi,v16hi) +v8si __builtin_ia32_pmaxsd256 (v8si,v8si) +v32qi __builtin_ia32_pmaxub256 (v32qi,v32qi) +v16hi __builtin_ia32_pmaxuw256 (v16hi,v16hi) +v8si __builtin_ia32_pmaxud256 (v8si,v8si) +v32qi __builtin_ia32_pminsb256 (v32qi,v32qi) +v16hi __builtin_ia32_pminsw256 (v16hi,v16hi) +v8si __builtin_ia32_pminsd256 
(v8si,v8si) +v32qi __builtin_ia32_pminub256 (v32qi,v32qi) +v16hi __builtin_ia32_pminuw256 (v16hi,v16hi) +v8si __builtin_ia32_pminud256 (v8si,v8si) +int __builtin_ia32_pmovmskb256 (v32qi) +v16hi __builtin_ia32_pmovsxbw256 (v16qi) +v8si __builtin_ia32_pmovsxbd256 (v16qi) +v4di __builtin_ia32_pmovsxbq256 (v16qi) +v8si __builtin_ia32_pmovsxwd256 (v8hi) +v4di __builtin_ia32_pmovsxwq256 (v8hi) +v4di __builtin_ia32_pmovsxdq256 (v4si) +v16hi __builtin_ia32_pmovzxbw256 (v16qi) +v8si __builtin_ia32_pmovzxbd256 (v16qi) +v4di __builtin_ia32_pmovzxbq256 (v16qi) +v8si __builtin_ia32_pmovzxwd256 (v8hi) +v4di __builtin_ia32_pmovzxwq256 (v8hi) +v4di __builtin_ia32_pmovzxdq256 (v4si) +v4di __builtin_ia32_pmuldq256 (v8si,v8si) +v16hi __builtin_ia32_pmulhrsw256 (v16hi, v16hi) +v16hi __builtin_ia32_pmulhuw256 (v16hi,v16hi) +v16hi __builtin_ia32_pmulhw256 (v16hi,v16hi) +v16hi __builtin_ia32_pmullw256 (v16hi,v16hi) +v8si __builtin_ia32_pmulld256 (v8si,v8si) +v4di __builtin_ia32_pmuludq256 (v8si,v8si) +v4di __builtin_ia32_por256 (v4di,v4di) +v16hi __builtin_ia32_psadbw256 (v32qi,v32qi) +v32qi __builtin_ia32_pshufb256 (v32qi,v32qi) +v8si __builtin_ia32_pshufd256 (v8si,int) +v16hi __builtin_ia32_pshufhw256 (v16hi,int) +v16hi __builtin_ia32_pshuflw256 (v16hi,int) +v32qi __builtin_ia32_psignb256 (v32qi,v32qi) +v16hi __builtin_ia32_psignw256 (v16hi,v16hi) +v8si __builtin_ia32_psignd256 (v8si,v8si) +v4di __builtin_ia32_pslldqi256 (v4di,int) +v16hi __builtin_ia32_psllwi256 (16hi,int) +v16hi __builtin_ia32_psllw256(v16hi,v8hi) +v8si __builtin_ia32_pslldi256 (v8si,int) +v8si __builtin_ia32_pslld256(v8si,v4si) +v4di __builtin_ia32_psllqi256 (v4di,int) +v4di __builtin_ia32_psllq256(v4di,v2di) +v16hi __builtin_ia32_psrawi256 (v16hi,int) +v16hi __builtin_ia32_psraw256 (v16hi,v8hi) +v8si __builtin_ia32_psradi256 (v8si,int) +v8si __builtin_ia32_psrad256 (v8si,v4si) +v4di __builtin_ia32_psrldqi256 (v4di, int) +v16hi __builtin_ia32_psrlwi256 (v16hi,int) +v16hi __builtin_ia32_psrlw256 (v16hi,v8hi) +v8si 
__builtin_ia32_psrldi256 (v8si,int) +v8si __builtin_ia32_psrld256 (v8si,v4si) +v4di __builtin_ia32_psrlqi256 (v4di,int) +v4di __builtin_ia32_psrlq256(v4di,v2di) +v32qi __builtin_ia32_psubb256 (v32qi,v32qi) +v32hi __builtin_ia32_psubw256 (v16hi,v16hi) +v8si __builtin_ia32_psubd256 (v8si,v8si) +v4di __builtin_ia32_psubq256 (v4di,v4di) +v32qi __builtin_ia32_psubsb256 (v32qi,v32qi) +v16hi __builtin_ia32_psubsw256 (v16hi,v16hi) +v32qi __builtin_ia32_psubusb256 (v32qi,v32qi) +v16hi __builtin_ia32_psubusw256 (v16hi,v16hi) +v32qi __builtin_ia32_punpckhbw256 (v32qi,v32qi) +v16hi __builtin_ia32_punpckhwd256 (v16hi,v16hi) +v8si __builtin_ia32_punpckhdq256 (v8si,v8si) +v4di __builtin_ia32_punpckhqdq256 (v4di,v4di) +v32qi __builtin_ia32_punpcklbw256 (v32qi,v32qi) +v16hi __builtin_ia32_punpcklwd256 (v16hi,v16hi) +v8si __builtin_ia32_punpckldq256 (v8si,v8si) +v4di __builtin_ia32_punpcklqdq256 (v4di,v4di) +v4di __builtin_ia32_pxor256 (v4di,v4di) +v4di __builtin_ia32_movntdqa256 (pv4di) +v4sf __builtin_ia32_vbroadcastss_ps (v4sf) +v8sf __builtin_ia32_vbroadcastss_ps256 (v4sf) +v4df __builtin_ia32_vbroadcastsd_pd256 (v2df) +v4di __builtin_ia32_vbroadcastsi256 (v2di) +v4si __builtin_ia32_pblendd128 (v4si,v4si) +v8si __builtin_ia32_pblendd256 (v8si,v8si) +v32qi __builtin_ia32_pbroadcastb256 (v16qi) +v16hi __builtin_ia32_pbroadcastw256 (v8hi) +v8si __builtin_ia32_pbroadcastd256 (v4si) +v4di __builtin_ia32_pbroadcastq256 (v2di) +v16qi __builtin_ia32_pbroadcastb128 (v16qi) +v8hi __builtin_ia32_pbroadcastw128 (v8hi) +v4si __builtin_ia32_pbroadcastd128 (v4si) +v2di __builtin_ia32_pbroadcastq128 (v2di) +v8si __builtin_ia32_permvarsi256 (v8si,v8si) +v4df __builtin_ia32_permdf256 (v4df,int) +v8sf __builtin_ia32_permvarsf256 (v8sf,v8sf) +v4di __builtin_ia32_permdi256 (v4di,int) +v4di __builtin_ia32_permti256 (v4di,v4di,int) +v4di __builtin_ia32_extract128i256 (v4di,int) +v4di __builtin_ia32_insert128i256 (v4di,v2di,int) +v8si __builtin_ia32_maskloadd256 (pcv8si,v8si) +v4di 
__builtin_ia32_maskloadq256 (pcv4di,v4di) +v4si __builtin_ia32_maskloadd (pcv4si,v4si) +v2di __builtin_ia32_maskloadq (pcv2di,v2di) +void __builtin_ia32_maskstored256 (pv8si,v8si,v8si) +void __builtin_ia32_maskstoreq256 (pv4di,v4di,v4di) +void __builtin_ia32_maskstored (pv4si,v4si,v4si) +void __builtin_ia32_maskstoreq (pv2di,v2di,v2di) +v8si __builtin_ia32_psllv8si (v8si,v8si) +v4si __builtin_ia32_psllv4si (v4si,v4si) +v4di __builtin_ia32_psllv4di (v4di,v4di) +v2di __builtin_ia32_psllv2di (v2di,v2di) +v8si __builtin_ia32_psrav8si (v8si,v8si) +v4si __builtin_ia32_psrav4si (v4si,v4si) +v8si __builtin_ia32_psrlv8si (v8si,v8si) +v4si __builtin_ia32_psrlv4si (v4si,v4si) +v4di __builtin_ia32_psrlv4di (v4di,v4di) +v2di __builtin_ia32_psrlv2di (v2di,v2di) +v2df __builtin_ia32_gathersiv2df (v2df, pcdouble,v4si,v2df,int) +v4df __builtin_ia32_gathersiv4df (v4df, pcdouble,v4si,v4df,int) +v2df __builtin_ia32_gatherdiv2df (v2df, pcdouble,v2di,v2df,int) +v4df __builtin_ia32_gatherdiv4df (v4df, pcdouble,v4di,v4df,int) +v4sf __builtin_ia32_gathersiv4sf (v4sf, pcfloat,v4si,v4sf,int) +v8sf __builtin_ia32_gathersiv8sf (v8sf, pcfloat,v8si,v8sf,int) +v4sf __builtin_ia32_gatherdiv4sf (v4sf, pcfloat,v2di,v4sf,int) +v4sf __builtin_ia32_gatherdiv4sf256 (v4sf, pcfloat,v4di,v4sf,int) +v2di __builtin_ia32_gathersiv2di (v2di, pcint64,v4si,v2di,int) +v4di __builtin_ia32_gathersiv4di (v4di, pcint64,v4si,v4di,int) +v2di __builtin_ia32_gatherdiv2di (v2di, pcint64,v2di,v2di,int) +v4di __builtin_ia32_gatherdiv4di (v4di, pcint64,v4di,v4di,int) +v4si __builtin_ia32_gathersiv4si (v4si, pcint,v4si,v4si,int) +v8si __builtin_ia32_gathersiv8si (v8si, pcint,v8si,v8si,int) +v4si __builtin_ia32_gatherdiv4si (v4si, pcint,v2di,v4si,int) +v4si __builtin_ia32_gatherdiv4si256 (v4si, pcint,v4di,v4si,int) +@end smallexample -float __builtin_vis_fhadds (float, float); -double __builtin_vis_fhaddd (double, double); -float __builtin_vis_fhsubs (float, float); -double __builtin_vis_fhsubd (double, double); -float 
__builtin_vis_fnhadds (float, float); -double __builtin_vis_fnhaddd (double, double); +The following built-in functions are available when @option{-maes} is +used. All of them generate the machine instruction that is part of the +name. -int64_t __builtin_vis_umulxhi (int64_t, int64_t); -int64_t __builtin_vis_xmulx (int64_t, int64_t); -int64_t __builtin_vis_xmulxhi (int64_t, int64_t); +@smallexample +v2di __builtin_ia32_aesenc128 (v2di, v2di) +v2di __builtin_ia32_aesenclast128 (v2di, v2di) +v2di __builtin_ia32_aesdec128 (v2di, v2di) +v2di __builtin_ia32_aesdeclast128 (v2di, v2di) +v2di __builtin_ia32_aeskeygenassist128 (v2di, const int) +v2di __builtin_ia32_aesimc128 (v2di) @end smallexample -@node SPU Built-in Functions -@subsection SPU Built-in Functions +The following built-in function is available when @option{-mpclmul} is +used. -GCC provides extensions for the SPU processor as described in the -Sony/Toshiba/IBM SPU Language Extensions Specification, which can be -found at @uref{http://cell.scei.co.jp/} or -@uref{http://www.ibm.com/developerworks/power/cell/}. GCC's -implementation differs in several ways. +@table @code +@item v2di __builtin_ia32_pclmulqdq128 (v2di, v2di, const int) +Generates the @code{pclmulqdq} machine instruction. +@end table -@itemize @bullet +The following built-in function is available when @option{-mfsgsbase} is +used. All of them generate the machine instruction that is part of the +name. -@item -The optional extension of specifying vector constants in parentheses is -not supported. 
+@smallexample +unsigned int __builtin_ia32_rdfsbase32 (void) +unsigned long long __builtin_ia32_rdfsbase64 (void) +unsigned int __builtin_ia32_rdgsbase32 (void) +unsigned long long __builtin_ia32_rdgsbase64 (void) +void _writefsbase_u32 (unsigned int) +void _writefsbase_u64 (unsigned long long) +void _writegsbase_u32 (unsigned int) +void _writegsbase_u64 (unsigned long long) +@end smallexample -@item -A vector initializer requires no cast if the vector constant is of the -same type as the variable it is initializing. +The following built-in function is available when @option{-mrdrnd} is +used. All of them generate the machine instruction that is part of the +name. -@item -If @code{signed} or @code{unsigned} is omitted, the signedness of the -vector type is the default signedness of the base type. The default -varies depending on the operating system, so a portable program should -always specify the signedness. +@smallexample +unsigned int __builtin_ia32_rdrand16_step (unsigned short *) +unsigned int __builtin_ia32_rdrand32_step (unsigned int *) +unsigned int __builtin_ia32_rdrand64_step (unsigned long long *) +@end smallexample -@item -By default, the keyword @code{__vector} is added. The macro -@code{vector} is defined in @code{<spu_intrinsics.h>} and can be -undefined. +The following built-in functions are available when @option{-msse4a} is used. +All of them generate the machine instruction that is part of the name. -@item -GCC allows using a @code{typedef} name as the type specifier for a -vector type.
+@smallexample +void __builtin_ia32_movntsd (double *, v2df) +void __builtin_ia32_movntss (float *, v4sf) +v2di __builtin_ia32_extrq (v2di, v16qi) +v2di __builtin_ia32_extrqi (v2di, const unsigned int, const unsigned int) +v2di __builtin_ia32_insertq (v2di, v2di) +v2di __builtin_ia32_insertqi (v2di, v2di, const unsigned int, const unsigned int) +@end smallexample -@item -For C, overloaded functions are implemented with macros so the following -does not work: +The following built-in functions are available when @option{-mxop} is used. +@smallexample +v2df __builtin_ia32_vfrczpd (v2df) +v4sf __builtin_ia32_vfrczps (v4sf) +v2df __builtin_ia32_vfrczsd (v2df) +v4sf __builtin_ia32_vfrczss (v4sf) +v4df __builtin_ia32_vfrczpd256 (v4df) +v8sf __builtin_ia32_vfrczps256 (v8sf) +v2di __builtin_ia32_vpcmov (v2di, v2di, v2di) +v2di __builtin_ia32_vpcmov_v2di (v2di, v2di, v2di) +v4si __builtin_ia32_vpcmov_v4si (v4si, v4si, v4si) +v8hi __builtin_ia32_vpcmov_v8hi (v8hi, v8hi, v8hi) +v16qi __builtin_ia32_vpcmov_v16qi (v16qi, v16qi, v16qi) +v2df __builtin_ia32_vpcmov_v2df (v2df, v2df, v2df) +v4sf __builtin_ia32_vpcmov_v4sf (v4sf, v4sf, v4sf) +v4di __builtin_ia32_vpcmov_v4di256 (v4di, v4di, v4di) +v8si __builtin_ia32_vpcmov_v8si256 (v8si, v8si, v8si) +v16hi __builtin_ia32_vpcmov_v16hi256 (v16hi, v16hi, v16hi) +v32qi __builtin_ia32_vpcmov_v32qi256 (v32qi, v32qi, v32qi) +v4df __builtin_ia32_vpcmov_v4df256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vpcmov_v8sf256 (v8sf, v8sf, v8sf) +v16qi __builtin_ia32_vpcomeqb (v16qi, v16qi) +v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi) +v4si __builtin_ia32_vpcomeqd (v4si, v4si) +v2di __builtin_ia32_vpcomeqq (v2di, v2di) +v16qi __builtin_ia32_vpcomequb (v16qi, v16qi) +v4si __builtin_ia32_vpcomequd (v4si, v4si) +v2di __builtin_ia32_vpcomequq (v2di, v2di) +v8hi __builtin_ia32_vpcomequw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomeqw (v8hi, v8hi) +v16qi __builtin_ia32_vpcomfalseb (v16qi, v16qi) +v4si __builtin_ia32_vpcomfalsed (v4si, v4si) +v2di 
__builtin_ia32_vpcomfalseq (v2di, v2di) +v16qi __builtin_ia32_vpcomfalseub (v16qi, v16qi) +v4si __builtin_ia32_vpcomfalseud (v4si, v4si) +v2di __builtin_ia32_vpcomfalseuq (v2di, v2di) +v8hi __builtin_ia32_vpcomfalseuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomfalsew (v8hi, v8hi) +v16qi __builtin_ia32_vpcomgeb (v16qi, v16qi) +v4si __builtin_ia32_vpcomged (v4si, v4si) +v2di __builtin_ia32_vpcomgeq (v2di, v2di) +v16qi __builtin_ia32_vpcomgeub (v16qi, v16qi) +v4si __builtin_ia32_vpcomgeud (v4si, v4si) +v2di __builtin_ia32_vpcomgeuq (v2di, v2di) +v8hi __builtin_ia32_vpcomgeuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomgew (v8hi, v8hi) +v16qi __builtin_ia32_vpcomgtb (v16qi, v16qi) +v4si __builtin_ia32_vpcomgtd (v4si, v4si) +v2di __builtin_ia32_vpcomgtq (v2di, v2di) +v16qi __builtin_ia32_vpcomgtub (v16qi, v16qi) +v4si __builtin_ia32_vpcomgtud (v4si, v4si) +v2di __builtin_ia32_vpcomgtuq (v2di, v2di) +v8hi __builtin_ia32_vpcomgtuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomgtw (v8hi, v8hi) +v16qi __builtin_ia32_vpcomleb (v16qi, v16qi) +v4si __builtin_ia32_vpcomled (v4si, v4si) +v2di __builtin_ia32_vpcomleq (v2di, v2di) +v16qi __builtin_ia32_vpcomleub (v16qi, v16qi) +v4si __builtin_ia32_vpcomleud (v4si, v4si) +v2di __builtin_ia32_vpcomleuq (v2di, v2di) +v8hi __builtin_ia32_vpcomleuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomlew (v8hi, v8hi) +v16qi __builtin_ia32_vpcomltb (v16qi, v16qi) +v4si __builtin_ia32_vpcomltd (v4si, v4si) +v2di __builtin_ia32_vpcomltq (v2di, v2di) +v16qi __builtin_ia32_vpcomltub (v16qi, v16qi) +v4si __builtin_ia32_vpcomltud (v4si, v4si) +v2di __builtin_ia32_vpcomltuq (v2di, v2di) +v8hi __builtin_ia32_vpcomltuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomltw (v8hi, v8hi) +v16qi __builtin_ia32_vpcomneb (v16qi, v16qi) +v4si __builtin_ia32_vpcomned (v4si, v4si) +v2di __builtin_ia32_vpcomneq (v2di, v2di) +v16qi __builtin_ia32_vpcomneub (v16qi, v16qi) +v4si __builtin_ia32_vpcomneud (v4si, v4si) +v2di __builtin_ia32_vpcomneuq (v2di, v2di) +v8hi __builtin_ia32_vpcomneuw (v8hi, 
v8hi) +v8hi __builtin_ia32_vpcomnew (v8hi, v8hi) +v16qi __builtin_ia32_vpcomtrueb (v16qi, v16qi) +v4si __builtin_ia32_vpcomtrued (v4si, v4si) +v2di __builtin_ia32_vpcomtrueq (v2di, v2di) +v16qi __builtin_ia32_vpcomtrueub (v16qi, v16qi) +v4si __builtin_ia32_vpcomtrueud (v4si, v4si) +v2di __builtin_ia32_vpcomtrueuq (v2di, v2di) +v8hi __builtin_ia32_vpcomtrueuw (v8hi, v8hi) +v8hi __builtin_ia32_vpcomtruew (v8hi, v8hi) +v4si __builtin_ia32_vphaddbd (v16qi) +v2di __builtin_ia32_vphaddbq (v16qi) +v8hi __builtin_ia32_vphaddbw (v16qi) +v2di __builtin_ia32_vphadddq (v4si) +v4si __builtin_ia32_vphaddubd (v16qi) +v2di __builtin_ia32_vphaddubq (v16qi) +v8hi __builtin_ia32_vphaddubw (v16qi) +v2di __builtin_ia32_vphaddudq (v4si) +v4si __builtin_ia32_vphadduwd (v8hi) +v2di __builtin_ia32_vphadduwq (v8hi) +v4si __builtin_ia32_vphaddwd (v8hi) +v2di __builtin_ia32_vphaddwq (v8hi) +v8hi __builtin_ia32_vphsubbw (v16qi) +v2di __builtin_ia32_vphsubdq (v4si) +v4si __builtin_ia32_vphsubwd (v8hi) +v4si __builtin_ia32_vpmacsdd (v4si, v4si, v4si) +v2di __builtin_ia32_vpmacsdqh (v4si, v4si, v2di) +v2di __builtin_ia32_vpmacsdql (v4si, v4si, v2di) +v4si __builtin_ia32_vpmacssdd (v4si, v4si, v4si) +v2di __builtin_ia32_vpmacssdqh (v4si, v4si, v2di) +v2di __builtin_ia32_vpmacssdql (v4si, v4si, v2di) +v4si __builtin_ia32_vpmacsswd (v8hi, v8hi, v4si) +v8hi __builtin_ia32_vpmacssww (v8hi, v8hi, v8hi) +v4si __builtin_ia32_vpmacswd (v8hi, v8hi, v4si) +v8hi __builtin_ia32_vpmacsww (v8hi, v8hi, v8hi) +v4si __builtin_ia32_vpmadcsswd (v8hi, v8hi, v4si) +v4si __builtin_ia32_vpmadcswd (v8hi, v8hi, v4si) +v16qi __builtin_ia32_vpperm (v16qi, v16qi, v16qi) +v16qi __builtin_ia32_vprotb (v16qi, v16qi) +v4si __builtin_ia32_vprotd (v4si, v4si) +v2di __builtin_ia32_vprotq (v2di, v2di) +v8hi __builtin_ia32_vprotw (v8hi, v8hi) +v16qi __builtin_ia32_vpshab (v16qi, v16qi) +v4si __builtin_ia32_vpshad (v4si, v4si) +v2di __builtin_ia32_vpshaq (v2di, v2di) +v8hi __builtin_ia32_vpshaw (v8hi, v8hi) +v16qi 
__builtin_ia32_vpshlb (v16qi, v16qi) +v4si __builtin_ia32_vpshld (v4si, v4si) +v2di __builtin_ia32_vpshlq (v2di, v2di) +v8hi __builtin_ia32_vpshlw (v8hi, v8hi) +@end smallexample + +The following built-in functions are available when @option{-mfma4} is used. +All of them generate the machine instruction that is part of the name. + +@smallexample +v2df __builtin_ia32_vfmaddpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmaddps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmaddsd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmaddss (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmsubpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmsubps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmsubsd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmsubss (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfnmaddpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfnmaddps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfnmaddsd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfnmaddss (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfnmsubpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfnmsubps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfnmsubsd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfnmsubss (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmaddsubpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmaddsubps (v4sf, v4sf, v4sf) +v2df __builtin_ia32_vfmsubaddpd (v2df, v2df, v2df) +v4sf __builtin_ia32_vfmsubaddps (v4sf, v4sf, v4sf) +v4df __builtin_ia32_vfmaddpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfmaddps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfmsubpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfmsubps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfnmaddpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfnmaddps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfnmsubpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfnmsubps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfmaddsubpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfmaddsubps256 (v8sf, v8sf, v8sf) +v4df __builtin_ia32_vfmsubaddpd256 (v4df, v4df, v4df) +v8sf __builtin_ia32_vfmsubaddps256 (v8sf, v8sf, v8sf) 
-@smallexample - spu_add ((vector signed int)@{1, 2, 3, 4@}, foo); @end smallexample -@noindent -Since @code{spu_add} is a macro, the vector constant in the example -is treated as four separate arguments. Wrap the entire argument in -parentheses for this to work. - -@item -The extended version of @code{__builtin_expect} is not supported. - -@end itemize - -@emph{Note:} Only the interface described in the aforementioned -specification is supported. Internally, GCC uses built-in functions to -implement the required functionality, but these are not supported and -are subject to change without notice. - -@node TI C6X Built-in Functions -@subsection TI C6X Built-in Functions +The following built-in functions are available when @option{-mlwp} is used. -GCC provides intrinsics to access certain instructions of the TI C6X -processors. These intrinsics, listed below, are available after -inclusion of the @code{c6x_intrinsics.h} header file. They map directly -to C6X instructions. +@smallexample +void __builtin_ia32_llwpcb16 (void *); +void __builtin_ia32_llwpcb32 (void *); +void __builtin_ia32_llwpcb64 (void *); +void * __builtin_ia32_llwpcb16 (void); +void * __builtin_ia32_llwpcb32 (void); +void * __builtin_ia32_llwpcb64 (void); +void __builtin_ia32_lwpval16 (unsigned short, unsigned int, unsigned short) +void __builtin_ia32_lwpval32 (unsigned int, unsigned int, unsigned int) +void __builtin_ia32_lwpval64 (unsigned __int64, unsigned int, unsigned int) +unsigned char __builtin_ia32_lwpins16 (unsigned short, unsigned int, unsigned short) +unsigned char __builtin_ia32_lwpins32 (unsigned int, unsigned int, unsigned int) +unsigned char __builtin_ia32_lwpins64 (unsigned __int64, unsigned int, unsigned int) +@end smallexample +The following built-in functions are available when @option{-mbmi} is used. +All of them generate the machine instruction that is part of the name. 
@smallexample +unsigned int __builtin_ia32_bextr_u32(unsigned int, unsigned int); +unsigned long long __builtin_ia32_bextr_u64 (unsigned long long, unsigned long long); +@end smallexample -int _sadd (int, int) -int _ssub (int, int) -int _sadd2 (int, int) -int _ssub2 (int, int) -long long _mpy2 (int, int) -long long _smpy2 (int, int) -int _add4 (int, int) -int _sub4 (int, int) -int _saddu4 (int, int) +The following built-in functions are available when @option{-mbmi2} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +unsigned int _bzhi_u32 (unsigned int, unsigned int) +unsigned int _pdep_u32 (unsigned int, unsigned int) +unsigned int _pext_u32 (unsigned int, unsigned int) +unsigned long long _bzhi_u64 (unsigned long long, unsigned long long) +unsigned long long _pdep_u64 (unsigned long long, unsigned long long) +unsigned long long _pext_u64 (unsigned long long, unsigned long long) +@end smallexample -int _smpy (int, int) -int _smpyh (int, int) -int _smpyhl (int, int) -int _smpylh (int, int) +The following built-in functions are available when @option{-mlzcnt} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +unsigned short __builtin_ia32_lzcnt_16(unsigned short); +unsigned int __builtin_ia32_lzcnt_u32(unsigned int); +unsigned long long __builtin_ia32_lzcnt_u64 (unsigned long long); +@end smallexample -int _sshl (int, int) -int _subc (int, int) +The following built-in functions are available when @option{-mfxsr} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_fxsave (void *) +void __builtin_ia32_fxrstor (void *) +void __builtin_ia32_fxsave64 (void *) +void __builtin_ia32_fxrstor64 (void *) +@end smallexample -int _avg2 (int, int) -int _avgu4 (int, int) +The following built-in functions are available when @option{-mxsave} is used. +All of them generate the machine instruction that is part of the name. 
+@smallexample +void __builtin_ia32_xsave (void *, long long) +void __builtin_ia32_xrstor (void *, long long) +void __builtin_ia32_xsave64 (void *, long long) +void __builtin_ia32_xrstor64 (void *, long long) +@end smallexample -int _clrr (int, int) -int _extr (int, int) -int _extru (int, int) -int _abs (int) -int _abs2 (int) +The following built-in functions are available when @option{-mxsaveopt} is used. +All of them generate the machine instruction that is part of the name. +@smallexample +void __builtin_ia32_xsaveopt (void *, long long) +void __builtin_ia32_xsaveopt64 (void *, long long) +@end smallexample +The following built-in functions are available when @option{-mtbm} is used. +Both of them generate the immediate form of the bextr machine instruction. +@smallexample +unsigned int __builtin_ia32_bextri_u32 (unsigned int, const unsigned int); +unsigned long long __builtin_ia32_bextri_u64 (unsigned long long, const unsigned long long); @end smallexample -@node TILE-Gx Built-in Functions -@subsection TILE-Gx Built-in Functions -GCC provides intrinsics to access every instruction of the TILE-Gx -processor. The intrinsics are of the form: +The following built-in functions are available when @option{-m3dnow} is used. +All of them generate the machine instruction that is part of the name. 
@smallexample +void __builtin_ia32_femms (void) +v8qi __builtin_ia32_pavgusb (v8qi, v8qi) +v2si __builtin_ia32_pf2id (v2sf) +v2sf __builtin_ia32_pfacc (v2sf, v2sf) +v2sf __builtin_ia32_pfadd (v2sf, v2sf) +v2si __builtin_ia32_pfcmpeq (v2sf, v2sf) +v2si __builtin_ia32_pfcmpge (v2sf, v2sf) +v2si __builtin_ia32_pfcmpgt (v2sf, v2sf) +v2sf __builtin_ia32_pfmax (v2sf, v2sf) +v2sf __builtin_ia32_pfmin (v2sf, v2sf) +v2sf __builtin_ia32_pfmul (v2sf, v2sf) +v2sf __builtin_ia32_pfrcp (v2sf) +v2sf __builtin_ia32_pfrcpit1 (v2sf, v2sf) +v2sf __builtin_ia32_pfrcpit2 (v2sf, v2sf) +v2sf __builtin_ia32_pfrsqrt (v2sf) +v2sf __builtin_ia32_pfsub (v2sf, v2sf) +v2sf __builtin_ia32_pfsubr (v2sf, v2sf) +v2sf __builtin_ia32_pi2fd (v2si) +v4hi __builtin_ia32_pmulhrw (v4hi, v4hi) +@end smallexample -unsigned long long __insn_@var{op} (...) +The following built-in functions are available when both @option{-m3dnow} +and @option{-march=athlon} are used. All of them generate the machine +instruction that is part of the name. +@smallexample +v2si __builtin_ia32_pf2iw (v2sf) +v2sf __builtin_ia32_pfnacc (v2sf, v2sf) +v2sf __builtin_ia32_pfpnacc (v2sf, v2sf) +v2sf __builtin_ia32_pi2fw (v2si) +v2sf __builtin_ia32_pswapdsf (v2sf) +v2si __builtin_ia32_pswapdsi (v2si) @end smallexample -Where @var{op} is the name of the instruction. Refer to the ISA manual -for the complete list of instructions. - -GCC also provides intrinsics to directly access the network registers. -The intrinsics are: +The following built-in functions are available when @option{-mrtm} is used +They are used for restricted transactional memory. These are the internal +low level functions. Normally the functions in +@ref{x86 transactional memory intrinsics} should be used instead. 
@smallexample +int __builtin_ia32_xbegin () +void __builtin_ia32_xend () +void __builtin_ia32_xabort (status) +int __builtin_ia32_xtest () +@end smallexample -unsigned long long __tile_idn0_receive (void) -unsigned long long __tile_idn1_receive (void) -unsigned long long __tile_udn0_receive (void) -unsigned long long __tile_udn1_receive (void) -unsigned long long __tile_udn2_receive (void) -unsigned long long __tile_udn3_receive (void) -void __tile_idn_send (unsigned long long) -void __tile_udn_send (unsigned long long) +@node x86 transactional memory intrinsics +@subsection x86 transaction memory intrinsics -@end smallexample +Hardware transactional memory intrinsics for x86. These allow to use +memory transactions with RTM (Restricted Transactional Memory). +For using HLE (Hardware Lock Elision) see @ref{x86 specific memory model extensions for transactional memory} instead. +This support is enabled with the @option{-mrtm} option. -The intrinsic @code{void __tile_network_barrier (void)} is used to -guarantee that no network operations before it are reordered with -those after it. +A memory transaction commits all changes to memory in an atomic way, +as visible to other threads. If the transaction fails it is rolled back +and all side effects discarded. -@node TILEPro Built-in Functions -@subsection TILEPro Built-in Functions +Generally there is no guarantee that a memory transaction ever succeeds +and suitable fallback code always needs to be supplied. -GCC provides intrinsics to access every instruction of the TILEPro -processor. The intrinsics are of the form: +@deftypefn {RTM Function} {unsigned} _xbegin () +Start a RTM (Restricted Transactional Memory) transaction. +Returns _XBEGIN_STARTED when the transaction +started successfully (note this is not 0, so the constant has to be +explicitely tested). When the transaction aborts all side effects +are undone and an abort code is returned. 
There is no guarantee +any transaction ever succeeds, so there always needs to be a valid +tested fallback path. +@end deftypefn @smallexample +#include <immintrin.h> -unsigned __insn_@var{op} (...) - +if ((status = _xbegin ()) == _XBEGIN_STARTED) @{ + ... transaction code... + _xend (); +@} else @{ + ... non transactional fallback path... +@} @end smallexample -@noindent -where @var{op} is the name of the instruction. Refer to the ISA manual -for the complete list of instructions. - -GCC also provides intrinsics to directly access the network registers. -The intrinsics are: +Valid abort status bits (when the value is not @code{_XBEGIN_STARTED}) are: -@smallexample +@table @code +@item _XABORT_EXPLICIT +Transaction explicitely aborted with @code{_xabort}. The parameter passed +to @code{_xabort} is available with @code{_XABORT_CODE(status)} +@item _XABORT_RETRY +Transaction retry is possible. +@item _XABORT_CONFLICT +Transaction abort due to a memory conflict with another thread +@item _XABORT_CAPACITY +Transaction abort due to the transaction using too much memory +@item _XABORT_DEBUG +Transaction abort due to a debug trap +@item _XABORT_NESTED +Transaction abort in a inner nested transaction +@end table -unsigned __tile_idn0_receive (void) -unsigned __tile_idn1_receive (void) -unsigned __tile_sn_receive (void) -unsigned __tile_udn0_receive (void) -unsigned __tile_udn1_receive (void) -unsigned __tile_udn2_receive (void) -unsigned __tile_udn3_receive (void) -void __tile_idn_send (unsigned) -void __tile_sn_send (unsigned) -void __tile_udn_send (unsigned) +@deftypefn {RTM Function} {void} _xend () +Commit the current transaction. When no transaction is active this will +fault. All memory side effects of the transactions will become visible +to other threads in an atomic matter. +@end deftypefn -@end smallexample +@deftypefn {RTM Function} {int} _xtest () +Return a value not zero when a transaction is currently active, otherwise 0.
+@end deftypefn -The intrinsic @code{void __tile_network_barrier (void)} is used to -guarantee that no network operations before it are reordered with -those after it. +@deftypefn {RTM Function} {void} _xabort (status) +Abort the current transaction. When no transaction is active this is a no-op. +status must be a 8bit constant, that is included in the status code returned +by @code{_xbegin} +@end deftypefn @node Target Format Checks @section Format Checks Specific to Particular Target Machines diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 94ca9472120..ba81ec7a7d8 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -676,44 +676,6 @@ Objective-C and Objective-C++ Dialects}. -mschedule=@var{cpu-type} -mspace-regs -msio -mwsio @gol -munix=@var{unix-std} -nolibdld -static -threads} -@emph{x86 Options} -@gccoptlist{-mtune=@var{cpu-type} -march=@var{cpu-type} @gol --mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol --mfpmath=@var{unit} @gol --masm=@var{dialect} -mno-fancy-math-387 @gol --mno-fp-ret-in-387 -msoft-float @gol --mno-wide-multiply -mrtd -malign-double @gol --mpreferred-stack-boundary=@var{num} @gol --mincoming-stack-boundary=@var{num} @gol --mcld -mcx16 -msahf -mmovbe -mcrc32 @gol --mrecip -mrecip=@var{opt} @gol --mvzeroupper -mprefer-avx128 @gol --mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol --mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol --maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol --mclflushopt -mxsavec -mxsaves @gol --msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol --mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mthreads @gol --mno-align-stringops -minline-all-stringops @gol --minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol --mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol --mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol --m96bit-long-double -mlong-double-64 
-mlong-double-80 -mlong-double-128 @gol --mregparm=@var{num} -msseregparm @gol --mveclibabi=@var{type} -mvect8-ret-in-mem @gol --mpc32 -mpc64 -mpc80 -mstackrealign @gol --momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol --mcmodel=@var{code-model} -mabi=@var{name} -maddress-mode=@var{mode} @gol --m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol --msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol --mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol --malign-data=@var{type} -mstack-protector-guard=@var{guard}} - -@emph{x86 Windows Options} -@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol --mnop-fun-dllimport -mthread @gol --municode -mwin32 -mwindows -fno-set-stack-executable} - @emph{IA-64 Options} @gccoptlist{-mbig-endian -mlittle-endian -mgnu-as -mgnu-ld -mno-pic @gol -mvolatile-asm-stop -mregister-names -msdata -mno-sdata @gol @@ -1081,6 +1043,44 @@ See RS/6000 and PowerPC Options. @gccoptlist{-mrtp -non-static -Bstatic -Bdynamic @gol -Xbind-lazy -Xbind-now} +@emph{x86 Options} +@gccoptlist{-mtune=@var{cpu-type} -march=@var{cpu-type} @gol +-mtune-ctrl=@var{feature-list} -mdump-tune-features -mno-default @gol +-mfpmath=@var{unit} @gol +-masm=@var{dialect} -mno-fancy-math-387 @gol +-mno-fp-ret-in-387 -msoft-float @gol +-mno-wide-multiply -mrtd -malign-double @gol +-mpreferred-stack-boundary=@var{num} @gol +-mincoming-stack-boundary=@var{num} @gol +-mcld -mcx16 -msahf -mmovbe -mcrc32 @gol +-mrecip -mrecip=@var{opt} @gol +-mvzeroupper -mprefer-avx128 @gol +-mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 -mavx @gol +-mavx2 -mavx512f -mavx512pf -mavx512er -mavx512cd -msha @gol +-maes -mpclmul -mfsgsbase -mrdrnd -mf16c -mfma -mprefetchwt1 @gol +-mclflushopt -mxsavec -mxsaves @gol +-msse4a -m3dnow -mpopcnt -mabm -mbmi -mtbm -mfma4 -mxop -mlzcnt @gol +-mbmi2 -mfxsr -mxsave -mxsaveopt -mrtm -mlwp -mmpx -mthreads @gol +-mno-align-stringops -minline-all-stringops @gol +-minline-stringops-dynamically 
-mstringop-strategy=@var{alg} @gol +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol +-mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol +-m96bit-long-double -mlong-double-64 -mlong-double-80 -mlong-double-128 @gol +-mregparm=@var{num} -msseregparm @gol +-mveclibabi=@var{type} -mvect8-ret-in-mem @gol +-mpc32 -mpc64 -mpc80 -mstackrealign @gol +-momit-leaf-frame-pointer -mno-red-zone -mno-tls-direct-seg-refs @gol +-mcmodel=@var{code-model} -mabi=@var{name} -maddress-mode=@var{mode} @gol +-m32 -m64 -mx32 -m16 -mlarge-data-threshold=@var{num} @gol +-msse2avx -mfentry -mrecord-mcount -mnop-mcount -m8bit-idiv @gol +-mavx256-split-unaligned-load -mavx256-split-unaligned-store @gol +-malign-data=@var{type} -mstack-protector-guard=@var{guard}} + +@emph{x86 Windows Options} +@gccoptlist{-mconsole -mcygwin -mno-cygwin -mdll @gol +-mnop-fun-dllimport -mthread @gol +-municode -mwin32 -mwindows -fno-set-stack-executable} + @emph{Xstormy16 Options} @gccoptlist{-msim} @@ -11952,8 +11952,6 @@ platform. * GNU/Linux Options:: * H8/300 Options:: * HPPA Options:: -* x86 Options:: -* x86 Windows Options:: * IA-64 Options:: * LM32 Options:: * M32C Options:: @@ -11989,6 +11987,8 @@ platform. * Visium Options:: * VMS Options:: * VxWorks Options:: +* x86 Options:: +* x86 Windows Options:: * Xstormy16 Options:: * Xtensa Options:: * zSeries Options:: @@ -15361,6568 +15361,6170 @@ under HP-UX@. This option sets flags for both the preprocessor and linker. @end table -@node x86 Options -@subsection x86 Options -@cindex x86 Options +@node IA-64 Options +@subsection IA-64 Options +@cindex IA-64 Options -These @samp{-m} options are defined for the x86 family of computers. +These are the @samp{-m} options defined for the Intel IA-64 architecture. @table @gcctabopt +@item -mbig-endian +@opindex mbig-endian +Generate code for a big-endian target. This is the default for HP-UX@. 
-@item -march=@var{cpu-type} -@opindex march -Generate instructions for the machine type @var{cpu-type}. In contrast to -@option{-mtune=@var{cpu-type}}, which merely tunes the generated code -for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC -to generate code that may not run at all on processors other than the one -indicated. Specifying @option{-march=@var{cpu-type}} implies -@option{-mtune=@var{cpu-type}}. - -The choices for @var{cpu-type} are: +@item -mlittle-endian +@opindex mlittle-endian +Generate code for a little-endian target. This is the default for AIX5 +and GNU/Linux. -@table @samp -@item native -This selects the CPU to generate code for at compilation time by determining -the processor type of the compiling machine. Using @option{-march=native} -enables all instruction subsets supported by the local machine (hence -the result might not run on different machines). Using @option{-mtune=native} -produces code optimized for the local machine under the constraints -of the selected instruction set. +@item -mgnu-as +@itemx -mno-gnu-as +@opindex mgnu-as +@opindex mno-gnu-as +Generate (or don't) code for the GNU assembler. This is the default. +@c Also, this is the default if the configure option @option{--with-gnu-as} +@c is used. -@item i386 -Original Intel i386 CPU@. +@item -mgnu-ld +@itemx -mno-gnu-ld +@opindex mgnu-ld +@opindex mno-gnu-ld +Generate (or don't) code for the GNU linker. This is the default. +@c Also, this is the default if the configure option @option{--with-gnu-ld} +@c is used. -@item i486 -Intel i486 CPU@. (No scheduling is implemented for this chip.) +@item -mno-pic +@opindex mno-pic +Generate code that does not use a global pointer register. The result +is not position independent code, and violates the IA-64 ABI@. -@item i586 -@itemx pentium -Intel Pentium CPU with no MMX support. 
+@item -mvolatile-asm-stop +@itemx -mno-volatile-asm-stop +@opindex mvolatile-asm-stop +@opindex mno-volatile-asm-stop +Generate (or don't) a stop bit immediately before and after volatile asm +statements. -@item pentium-mmx -Intel Pentium MMX CPU, based on Pentium core with MMX instruction set support. +@item -mregister-names +@itemx -mno-register-names +@opindex mregister-names +@opindex mno-register-names +Generate (or don't) @samp{in}, @samp{loc}, and @samp{out} register names for +the stacked registers. This may make assembler output more readable. -@item pentiumpro -Intel Pentium Pro CPU@. +@item -mno-sdata +@itemx -msdata +@opindex mno-sdata +@opindex msdata +Disable (or enable) optimizations that use the small data section. This may +be useful for working around optimizer bugs. -@item i686 -When used with @option{-march}, the Pentium Pro -instruction set is used, so the code runs on all i686 family chips. -When used with @option{-mtune}, it has the same meaning as @samp{generic}. +@item -mconstant-gp +@opindex mconstant-gp +Generate code that uses a single constant global pointer value. This is +useful when compiling kernel code. -@item pentium2 -Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set -support. +@item -mauto-pic +@opindex mauto-pic +Generate code that is self-relocatable. This implies @option{-mconstant-gp}. +This is useful when compiling firmware code. -@item pentium3 -@itemx pentium3m -Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction -set support. +@item -minline-float-divide-min-latency +@opindex minline-float-divide-min-latency +Generate code for inline divides of floating-point values +using the minimum latency algorithm. -@item pentium-m -Intel Pentium M; low-power version of Intel Pentium III CPU -with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks. 
+@item -minline-float-divide-max-throughput +@opindex minline-float-divide-max-throughput +Generate code for inline divides of floating-point values +using the maximum throughput algorithm. -@item pentium4 -@itemx pentium4m -Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support. +@item -mno-inline-float-divide +@opindex mno-inline-float-divide +Do not generate inline code for divides of floating-point values. -@item prescott -Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction -set support. +@item -minline-int-divide-min-latency +@opindex minline-int-divide-min-latency +Generate code for inline divides of integer values +using the minimum latency algorithm. -@item nocona -Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE, -SSE2 and SSE3 instruction set support. +@item -minline-int-divide-max-throughput +@opindex minline-int-divide-max-throughput +Generate code for inline divides of integer values +using the maximum throughput algorithm. -@item core2 -Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 -instruction set support. +@item -mno-inline-int-divide +@opindex mno-inline-int-divide +Do not generate inline code for divides of integer values. -@item nehalem -Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2 and POPCNT instruction set support. +@item -minline-sqrt-min-latency +@opindex minline-sqrt-min-latency +Generate code for inline square roots +using the minimum latency algorithm. -@item westmere -Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support. +@item -minline-sqrt-max-throughput +@opindex minline-sqrt-max-throughput +Generate code for inline square roots +using the maximum throughput algorithm. 
-@item sandybridge -Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. +@item -mno-inline-sqrt +@opindex mno-inline-sqrt +Do not generate inline code for @code{sqrt}. -@item ivybridge -Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C -instruction set support. +@item -mfused-madd +@itemx -mno-fused-madd +@opindex mfused-madd +@opindex mno-fused-madd +Do (don't) generate code that uses the fused multiply/add or multiply/subtract +instructions. The default is to use these instructions. -@item haswell -Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, -BMI, BMI2 and F16C instruction set support. +@item -mno-dwarf2-asm +@itemx -mdwarf2-asm +@opindex mno-dwarf2-asm +@opindex mdwarf2-asm +Don't (or do) generate assembler code for the DWARF 2 line number debugging +info. This may be useful when not using the GNU assembler. -@item broadwell -Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, -BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support. +@item -mearly-stop-bits +@itemx -mno-early-stop-bits +@opindex mearly-stop-bits +@opindex mno-early-stop-bits +Allow stop bits to be placed earlier than immediately preceding the +instruction that triggered the stop bit. This can improve instruction +scheduling, but does not always do so. -@item bonnell -Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3 -instruction set support. +@item -mfixed-range=@var{register-range} +@opindex mfixed-range +Generate code treating the given register range as fixed registers. +A fixed register is one that the register allocator cannot use. 
This is +useful when compiling kernel code. A register range is specified as +two registers separated by a dash. Multiple register ranges can be +specified separated by a comma. -@item silvermont -Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, -SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support. +@item -mtls-size=@var{tls-size} +@opindex mtls-size +Specify bit size of immediate TLS offsets. Valid values are 14, 22, and +64. -@item k6 -AMD K6 CPU with MMX instruction set support. +@item -mtune=@var{cpu-type} +@opindex mtune +Tune the instruction scheduling for a particular CPU, Valid values are +@samp{itanium}, @samp{itanium1}, @samp{merced}, @samp{itanium2}, +and @samp{mckinley}. -@item k6-2 -@itemx k6-3 -Improved versions of AMD K6 CPU with MMX and 3DNow!@: instruction set support. +@item -milp32 +@itemx -mlp64 +@opindex milp32 +@opindex mlp64 +Generate code for a 32-bit or 64-bit environment. +The 32-bit environment sets int, long and pointer to 32 bits. +The 64-bit environment sets int to 32 bits and long and pointer +to 64 bits. These are HP-UX specific flags. -@item athlon -@itemx athlon-tbird -AMD Athlon CPU with MMX, 3dNOW!, enhanced 3DNow!@: and SSE prefetch instructions -support. +@item -mno-sched-br-data-spec +@itemx -msched-br-data-spec +@opindex mno-sched-br-data-spec +@opindex msched-br-data-spec +(Dis/En)able data speculative scheduling before reload. +This results in generation of @code{ld.a} instructions and +the corresponding check instructions (@code{ld.c} / @code{chk.a}). +The default is 'disable'. -@item athlon-4 -@itemx athlon-xp -@itemx athlon-mp -Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and full SSE -instruction set support. +@item -msched-ar-data-spec +@itemx -mno-sched-ar-data-spec +@opindex msched-ar-data-spec +@opindex mno-sched-ar-data-spec +(En/Dis)able data speculative scheduling after reload. 
+This results in generation of @code{ld.a} instructions and +the corresponding check instructions (@code{ld.c} / @code{chk.a}). +The default is 'enable'. -@item k8 -@itemx opteron -@itemx athlon64 -@itemx athlon-fx -Processors based on the AMD K8 core with x86-64 instruction set support, -including the AMD Opteron, Athlon 64, and Athlon 64 FX processors. -(This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow!@: and 64-bit -instruction set extensions.) +@item -mno-sched-control-spec +@itemx -msched-control-spec +@opindex mno-sched-control-spec +@opindex msched-control-spec +(Dis/En)able control speculative scheduling. This feature is +available only during region scheduling (i.e.@: before reload). +This results in generation of the @code{ld.s} instructions and +the corresponding check instructions @code{chk.s}. +The default is 'disable'. -@item k8-sse3 -@itemx opteron-sse3 -@itemx athlon64-sse3 -Improved versions of AMD K8 cores with SSE3 instruction set support. +@item -msched-br-in-data-spec +@itemx -mno-sched-br-in-data-spec +@opindex msched-br-in-data-spec +@opindex mno-sched-br-in-data-spec +(En/Dis)able speculative scheduling of the instructions that +are dependent on the data speculative loads before reload. +This is effective only with @option{-msched-br-data-spec} enabled. +The default is 'enable'. -@item amdfam10 -@itemx barcelona -CPUs based on AMD Family 10h cores with x86-64 instruction set support. (This -supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit -instruction set extensions.) +@item -msched-ar-in-data-spec +@itemx -mno-sched-ar-in-data-spec +@opindex msched-ar-in-data-spec +@opindex mno-sched-ar-in-data-spec +(En/Dis)able speculative scheduling of the instructions that +are dependent on the data speculative loads after reload. +This is effective only with @option{-msched-ar-data-spec} enabled. +The default is 'enable'. -@item bdver1 -CPUs based on AMD Family 15h cores with x86-64 instruction set support. 
(This -supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, -SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.) -@item bdver2 -AMD Family 15h core based CPUs with x86-64 instruction set support. (This -supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, -SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set -extensions.) -@item bdver3 -AMD Family 15h core based CPUs with x86-64 instruction set support. (This -supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES, -PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and -64-bit instruction set extensions. -@item bdver4 -AMD Family 15h core based CPUs with x86-64 instruction set support. (This -supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP, -AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, -SSE4.2, ABM and 64-bit instruction set extensions. +@item -msched-in-control-spec +@itemx -mno-sched-in-control-spec +@opindex msched-in-control-spec +@opindex mno-sched-in-control-spec +(En/Dis)able speculative scheduling of the instructions that +are dependent on the control speculative loads. +This is effective only with @option{-msched-control-spec} enabled. +The default is 'enable'. -@item btver1 -CPUs based on AMD Family 14h cores with x86-64 instruction set support. (This -supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit -instruction set extensions.) +@item -mno-sched-prefer-non-data-spec-insns +@itemx -msched-prefer-non-data-spec-insns +@opindex mno-sched-prefer-non-data-spec-insns +@opindex msched-prefer-non-data-spec-insns +If enabled, data-speculative instructions are chosen for schedule +only if there are no other choices at the moment. This makes +the use of the data speculation much more conservative. +The default is 'disable'. -@item btver2 -CPUs based on AMD Family 16h cores with x86-64 instruction set support. 
This -includes MOVBE, F16C, BMI, AVX, PCL_MUL, AES, SSE4.2, SSE4.1, CX16, ABM, -SSE4A, SSSE3, SSE3, SSE2, SSE, MMX and 64-bit instruction set extensions. +@item -mno-sched-prefer-non-control-spec-insns +@itemx -msched-prefer-non-control-spec-insns +@opindex mno-sched-prefer-non-control-spec-insns +@opindex msched-prefer-non-control-spec-insns +If enabled, control-speculative instructions are chosen for schedule +only if there are no other choices at the moment. This makes +the use of the control speculation much more conservative. +The default is 'disable'. -@item winchip-c6 -IDT WinChip C6 CPU, dealt in same way as i486 with additional MMX instruction -set support. +@item -mno-sched-count-spec-in-critical-path +@itemx -msched-count-spec-in-critical-path +@opindex mno-sched-count-spec-in-critical-path +@opindex msched-count-spec-in-critical-path +If enabled, speculative dependencies are considered during +computation of the instructions priorities. This makes the use of the +speculation a bit more conservative. +The default is 'disable'. -@item winchip2 -IDT WinChip 2 CPU, dealt in same way as i486 with additional MMX and 3DNow!@: -instruction set support. +@item -msched-spec-ldc +@opindex msched-spec-ldc +Use a simple data speculation check. This option is on by default. -@item c3 -VIA C3 CPU with MMX and 3DNow!@: instruction set support. (No scheduling is -implemented for this chip.) +@item -msched-control-spec-ldc +@opindex msched-spec-ldc +Use a simple check for control speculation. This option is on by default. -@item c3-2 -VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support. -(No scheduling is -implemented for this chip.) +@item -msched-stop-bits-after-every-cycle +@opindex msched-stop-bits-after-every-cycle +Place a stop bit after every cycle when scheduling. This option is on +by default. -@item geode -AMD Geode embedded processor with MMX and 3DNow!@: instruction set support. 
-@end table +@item -msched-fp-mem-deps-zero-cost +@opindex msched-fp-mem-deps-zero-cost +Assume that floating-point stores and loads are not likely to cause a conflict +when placed into the same instruction group. This option is disabled by +default. -@item -mtune=@var{cpu-type} -@opindex mtune -Tune to @var{cpu-type} everything applicable about the generated code, except -for the ABI and the set of available instructions. -While picking a specific @var{cpu-type} schedules things appropriately -for that particular chip, the compiler does not generate any code that -cannot run on the default machine type unless you use a -@option{-march=@var{cpu-type}} option. -For example, if GCC is configured for i686-pc-linux-gnu -then @option{-mtune=pentium4} generates code that is tuned for Pentium 4 -but still runs on i686 machines. +@item -msel-sched-dont-check-control-spec +@opindex msel-sched-dont-check-control-spec +Generate checks for control speculation in selective scheduling. +This flag is disabled by default. -The choices for @var{cpu-type} are the same as for @option{-march}. -In addition, @option{-mtune} supports 2 extra choices for @var{cpu-type}: +@item -msched-max-memory-insns=@var{max-insns} +@opindex msched-max-memory-insns +Limit on the number of memory insns per instruction group, giving lower +priority to subsequent memory insns attempting to schedule in the same +instruction group. Frequently useful to prevent cache bank conflicts. +The default value is 1. -@table @samp -@item generic -Produce code optimized for the most common IA32/@/AMD64/@/EM64T processors. -If you know the CPU on which your code will run, then you should use -the corresponding @option{-mtune} or @option{-march} option instead of -@option{-mtune=generic}. But, if you do not know exactly what CPU users -of your application will have, then you should use this option. 
+@item -msched-max-memory-insns-hard-limit +@opindex msched-max-memory-insns-hard-limit +Makes the limit specified by @option{msched-max-memory-insns} a hard limit, +disallowing more than that number in an instruction group. +Otherwise, the limit is ``soft'', meaning that non-memory operations +are preferred when the limit is reached, but memory operations may still +be scheduled. -As new processors are deployed in the marketplace, the behavior of this -option will change. Therefore, if you upgrade to a newer version of -GCC, code generation controlled by this option will change to reflect -the processors -that are most common at the time that version of GCC is released. +@end table -There is no @option{-march=generic} option because @option{-march} -indicates the instruction set the compiler can use, and there is no -generic instruction set applicable to all processors. In contrast, -@option{-mtune} indicates the processor (or, in this case, collection of -processors) for which the code is optimized. +@node LM32 Options +@subsection LM32 Options +@cindex LM32 options -@item intel -Produce code optimized for the most current Intel processors, which are -Haswell and Silvermont for this version of GCC. If you know the CPU -on which your code will run, then you should use the corresponding -@option{-mtune} or @option{-march} option instead of @option{-mtune=intel}. -But, if you want your application performs better on both Haswell and -Silvermont, then you should use this option. +These @option{-m} options are defined for the LatticeMico32 architecture: -As new Intel processors are deployed in the marketplace, the behavior of -this option will change. Therefore, if you upgrade to a newer version of -GCC, code generation controlled by this option will change to reflect -the most current Intel processors at the time that version of GCC is -released. +@table @gcctabopt +@item -mbarrel-shift-enabled +@opindex mbarrel-shift-enabled +Enable barrel-shift instructions. 
-There is no @option{-march=intel} option because @option{-march} indicates -the instruction set the compiler can use, and there is no common -instruction set applicable to all processors. In contrast, -@option{-mtune} indicates the processor (or, in this case, collection of -processors) for which the code is optimized. -@end table +@item -mdivide-enabled +@opindex mdivide-enabled +Enable divide and modulus instructions. -@item -mcpu=@var{cpu-type} -@opindex mcpu -A deprecated synonym for @option{-mtune}. +@item -mmultiply-enabled +@opindex multiply-enabled +Enable multiply instructions. -@item -mfpmath=@var{unit} -@opindex mfpmath -Generate floating-point arithmetic for selected unit @var{unit}. The choices -for @var{unit} are: +@item -msign-extend-enabled +@opindex msign-extend-enabled +Enable sign extend instructions. -@table @samp -@item 387 -Use the standard 387 floating-point coprocessor present on the majority of chips and -emulated otherwise. Code compiled with this option runs almost everywhere. -The temporary results are computed in 80-bit precision instead of the precision -specified by the type, resulting in slightly different results compared to most -of other chips. See @option{-ffloat-store} for more detailed description. +@item -muser-enabled +@opindex muser-enabled +Enable user-defined instructions. -This is the default choice for x86-32 targets. +@end table -@item sse -Use scalar floating-point instructions present in the SSE instruction set. -This instruction set is supported by Pentium III and newer chips, -and in the AMD line -by Athlon-4, Athlon XP and Athlon MP chips. The earlier version of the SSE -instruction set supports only single-precision arithmetic, thus the double and -extended-precision arithmetic are still done using 387. A later version, present -only in Pentium 4 and AMD x86-64 chips, supports double-precision -arithmetic too. 
+@node M32C Options +@subsection M32C Options +@cindex M32C options -For the x86-32 compiler, you must use @option{-march=@var{cpu-type}}, @option{-msse} -or @option{-msse2} switches to enable SSE extensions and make this option -effective. For the x86-64 compiler, these extensions are enabled by default. +@table @gcctabopt +@item -mcpu=@var{name} +@opindex mcpu= +Select the CPU for which code is generated. @var{name} may be one of +@samp{r8c} for the R8C/Tiny series, @samp{m16c} for the M16C (up to +/60) series, @samp{m32cm} for the M16C/80 series, or @samp{m32c} for +the M32C/80 series. -The resulting code should be considerably faster in the majority of cases and avoid -the numerical instability problems of 387 code, but may break some existing -code that expects temporaries to be 80 bits. +@item -msim +@opindex msim +Specifies that the program will be run on the simulator. This causes +an alternate runtime library to be linked in which supports, for +example, file I/O@. You must not use this option when generating +programs that will run on real hardware; you must provide your own +runtime library for whatever I/O functions are needed. -This is the default choice for the x86-64 compiler. +@item -memregs=@var{number} +@opindex memregs= +Specifies the number of memory-based pseudo-registers GCC uses +during code generation. These pseudo-registers are used like real +registers, so there is a tradeoff between GCC's ability to fit the +code into available registers, and the performance penalty of using +memory instead of registers. Note that all modules in a program must +be compiled with the same value for this option. Because of that, you +must not use this option with GCC's default runtime libraries. -@item sse,387 -@itemx sse+387 -@itemx both -Attempt to utilize both instruction sets at once. This effectively doubles the -amount of available registers, and on chips with separate execution units for -387 and SSE the execution resources too. 
Use this option with care, as it is -still experimental, because the GCC register allocator does not model separate -functional units well, resulting in unstable performance. @end table -@item -masm=@var{dialect} -@opindex masm=@var{dialect} -Output assembly instructions using selected @var{dialect}. Supported -choices are @samp{intel} or @samp{att} (the default). Darwin does -not support @samp{intel}. +@node M32R/D Options +@subsection M32R/D Options +@cindex M32R/D options -@item -mieee-fp -@itemx -mno-ieee-fp -@opindex mieee-fp -@opindex mno-ieee-fp -Control whether or not the compiler uses IEEE floating-point -comparisons. These correctly handle the case where the result of a -comparison is unordered. +These @option{-m} options are defined for Renesas M32R/D architectures: -@item -msoft-float -@opindex msoft-float -Generate output containing library calls for floating point. +@table @gcctabopt +@item -m32r2 +@opindex m32r2 +Generate code for the M32R/2@. -@strong{Warning:} the requisite libraries are not part of GCC@. -Normally the facilities of the machine's usual C compiler are used, but -this can't be done directly in cross-compilation. You must make your -own arrangements to provide suitable library functions for -cross-compilation. +@item -m32rx +@opindex m32rx +Generate code for the M32R/X@. -On machines where a function returns floating-point results in the 80387 -register stack, some floating-point opcodes may be emitted even if -@option{-msoft-float} is used. +@item -m32r +@opindex m32r +Generate code for the M32R@. This is the default. -@item -mno-fp-ret-in-387 -@opindex mno-fp-ret-in-387 -Do not use the FPU registers for return values of functions. +@item -mmodel=small +@opindex mmodel=small +Assume all objects live in the lower 16MB of memory (so that their addresses +can be loaded with the @code{ld24} instruction), and assume all subroutines +are reachable with the @code{bl} instruction. +This is the default. 
-The usual calling convention has functions return values of types -@code{float} and @code{double} in an FPU register, even if there -is no FPU@. The idea is that the operating system should emulate -an FPU@. +The addressability of a particular object can be set with the +@code{model} attribute. -The option @option{-mno-fp-ret-in-387} causes such values to be returned -in ordinary CPU registers instead. +@item -mmodel=medium +@opindex mmodel=medium +Assume objects may be anywhere in the 32-bit address space (the compiler +generates @code{seth/add3} instructions to load their addresses), and +assume all subroutines are reachable with the @code{bl} instruction. -@item -mno-fancy-math-387 -@opindex mno-fancy-math-387 -Some 387 emulators do not support the @code{sin}, @code{cos} and -@code{sqrt} instructions for the 387. Specify this option to avoid -generating those instructions. This option is the default on FreeBSD, -OpenBSD and NetBSD@. This option is overridden when @option{-march} -indicates that the target CPU always has an FPU and so the -instruction does not need emulation. These -instructions are not generated unless you also use the -@option{-funsafe-math-optimizations} switch. +@item -mmodel=large +@opindex mmodel=large +Assume objects may be anywhere in the 32-bit address space (the compiler +generates @code{seth/add3} instructions to load their addresses), and +assume subroutines may not be reachable with the @code{bl} instruction +(the compiler generates the much slower @code{seth/add3/jl} +instruction sequence). -@item -malign-double -@itemx -mno-align-double -@opindex malign-double -@opindex mno-align-double -Control whether GCC aligns @code{double}, @code{long double}, and -@code{long long} variables on a two-word boundary or a one-word -boundary. Aligning @code{double} variables on a two-word boundary -produces code that runs somewhat faster on a Pentium at the -expense of more memory. 
+@item -msdata=none +@opindex msdata=none +Disable use of the small data area. Variables are put into +one of @code{.data}, @code{.bss}, or @code{.rodata} (unless the +@code{section} attribute has been specified). +This is the default. -On x86-64, @option{-malign-double} is enabled by default. +The small data area consists of sections @code{.sdata} and @code{.sbss}. +Objects may be explicitly put in the small data area with the +@code{section} attribute using one of these sections. -@strong{Warning:} if you use the @option{-malign-double} switch, -structures containing the above types are aligned differently than -the published application binary interface specifications for the x86-32 -and are not binary compatible with structures in code compiled -without that switch. +@item -msdata=sdata +@opindex msdata=sdata +Put small global and static data in the small data area, but do not +generate special code to reference them. -@item -m96bit-long-double -@itemx -m128bit-long-double -@opindex m96bit-long-double -@opindex m128bit-long-double -These switches control the size of @code{long double} type. The x86-32 -application binary interface specifies the size to be 96 bits, -so @option{-m96bit-long-double} is the default in 32-bit mode. +@item -msdata=use +@opindex msdata=use +Put small global and static data in the small data area, and generate +special instructions to reference them. -Modern architectures (Pentium and newer) prefer @code{long double} -to be aligned to an 8- or 16-byte boundary. In arrays or structures -conforming to the ABI, this is not possible. So specifying -@option{-m128bit-long-double} aligns @code{long double} -to a 16-byte boundary by padding the @code{long double} with an additional -32-bit zero. +@item -G @var{num} +@opindex G +@cindex smaller data references +Put global and static objects less than or equal to @var{num} bytes +into the small data or BSS sections instead of the normal data or BSS +sections. The default value of @var{num} is 8. 
+The @option{-msdata} option must be set to one of @samp{sdata} or @samp{use} +for this option to have any effect. -In the x86-64 compiler, @option{-m128bit-long-double} is the default choice as -its ABI specifies that @code{long double} is aligned on 16-byte boundary. +All modules should be compiled with the same @option{-G @var{num}} value. +Compiling with different values of @var{num} may or may not work; if it +doesn't the linker gives an error message---incorrect code is not +generated. -Notice that neither of these options enable any extra precision over the x87 -standard of 80 bits for a @code{long double}. +@item -mdebug +@opindex mdebug +Makes the M32R-specific code in the compiler display some statistics +that might help in debugging programs. -@strong{Warning:} if you override the default value for your target ABI, this -changes the size of -structures and arrays containing @code{long double} variables, -as well as modifying the function calling convention for functions taking -@code{long double}. Hence they are not binary-compatible -with code compiled without that switch. +@item -malign-loops +@opindex malign-loops +Align all loops to a 32-byte boundary. -@item -mlong-double-64 -@itemx -mlong-double-80 -@itemx -mlong-double-128 -@opindex mlong-double-64 -@opindex mlong-double-80 -@opindex mlong-double-128 -These switches control the size of @code{long double} type. A size -of 64 bits makes the @code{long double} type equivalent to the @code{double} -type. This is the default for 32-bit Bionic C library. A size -of 128 bits makes the @code{long double} type equivalent to the -@code{__float128} type. This is the default for 64-bit Bionic C library. - -@strong{Warning:} if you override the default value for your target ABI, this -changes the size of -structures and arrays containing @code{long double} variables, -as well as modifying the function calling convention for functions taking -@code{long double}. 
Hence they are not binary-compatible -with code compiled without that switch. +@item -mno-align-loops +@opindex mno-align-loops +Do not enforce a 32-byte alignment for loops. This is the default. -@item -malign-data=@var{type} -@opindex malign-data -Control how GCC aligns variables. Supported values for @var{type} are -@samp{compat} uses increased alignment value compatible uses GCC 4.8 -and earlier, @samp{abi} uses alignment value as specified by the -psABI, and @samp{cacheline} uses increased alignment value to match -the cache line size. @samp{compat} is the default. +@item -missue-rate=@var{number} +@opindex missue-rate=@var{number} +Issue @var{number} instructions per cycle. @var{number} can only be 1 +or 2. -@item -mlarge-data-threshold=@var{threshold} -@opindex mlarge-data-threshold -When @option{-mcmodel=medium} is specified, data objects larger than -@var{threshold} are placed in the large data section. This value must be the -same across all objects linked into the binary, and defaults to 65535. +@item -mbranch-cost=@var{number} +@opindex mbranch-cost=@var{number} +@var{number} can only be 1 or 2. If it is 1 then branches are +preferred over conditional code, if it is 2, then the opposite applies. -@item -mrtd -@opindex mrtd -Use a different function-calling convention, in which functions that -take a fixed number of arguments return with the @code{ret @var{num}} -instruction, which pops their arguments while returning. This saves one -instruction in the caller since there is no need to pop the arguments -there. +@item -mflush-trap=@var{number} +@opindex mflush-trap=@var{number} +Specifies the trap number to use to flush the cache. The default is +12. Valid numbers are between 0 and 15 inclusive. -You can specify that an individual function is called with this calling -sequence with the function attribute @code{stdcall}. You can also -override the @option{-mrtd} option by using the function attribute -@code{cdecl}. @xref{Function Attributes}. 
+@item -mno-flush-trap +@opindex mno-flush-trap +Specifies that the cache cannot be flushed by using a trap. -@strong{Warning:} this calling convention is incompatible with the one -normally used on Unix, so you cannot use it if you need to call -libraries compiled with the Unix compiler. +@item -mflush-func=@var{name} +@opindex mflush-func=@var{name} +Specifies the name of the operating system function to call to flush +the cache. The default is @samp{_flush_cache}, but a function call +is only used if a trap is not available. -Also, you must provide function prototypes for all functions that -take variable numbers of arguments (including @code{printf}); -otherwise incorrect code is generated for calls to those -functions. +@item -mno-flush-func +@opindex mno-flush-func +Indicates that there is no OS function for flushing the cache. -In addition, seriously incorrect code results if you call a -function with too many arguments. (Normally, extra arguments are -harmlessly ignored.) +@end table -@item -mregparm=@var{num} -@opindex mregparm -Control how many registers are used to pass integer arguments. By -default, no registers are used to pass arguments, and at most 3 -registers can be used. You can control this behavior for a specific -function by using the function attribute @code{regparm}. -@xref{Function Attributes}. +@node M680x0 Options +@subsection M680x0 Options +@cindex M680x0 options -@strong{Warning:} if you use this switch, and -@var{num} is nonzero, then you must build all modules with the same -value, including any libraries. This includes the system libraries and -startup modules. +These are the @samp{-m} options defined for M680x0 and ColdFire processors. +The default settings depend on which architecture was selected when +the compiler was configured; the defaults for the most common choices +are given below. -@item -msseregparm -@opindex msseregparm -Use SSE register passing conventions for float and double arguments -and return values. 
You can control this behavior for a specific -function by using the function attribute @code{sseregparm}. -@xref{Function Attributes}. +@table @gcctabopt +@item -march=@var{arch} +@opindex march +Generate code for a specific M680x0 or ColdFire instruction set +architecture. Permissible values of @var{arch} for M680x0 +architectures are: @samp{68000}, @samp{68010}, @samp{68020}, +@samp{68030}, @samp{68040}, @samp{68060} and @samp{cpu32}. ColdFire +architectures are selected according to Freescale's ISA classification +and the permissible values are: @samp{isaa}, @samp{isaaplus}, +@samp{isab} and @samp{isac}. -@strong{Warning:} if you use this switch then you must build all -modules with the same value, including any libraries. This includes -the system libraries and startup modules. +GCC defines a macro @code{__mcf@var{arch}__} whenever it is generating +code for a ColdFire target. The @var{arch} in this macro is one of the +@option{-march} arguments given above. -@item -mvect8-ret-in-mem -@opindex mvect8-ret-in-mem -Return 8-byte vectors in memory instead of MMX registers. This is the -default on Solaris@tie{}8 and 9 and VxWorks to match the ABI of the Sun -Studio compilers until version 12. Later compiler versions (starting -with Studio 12 Update@tie{}1) follow the ABI used by other x86 targets, which -is the default on Solaris@tie{}10 and later. @emph{Only} use this option if -you need to remain compatible with existing code produced by those -previous compiler versions or older versions of GCC@. +When used together, @option{-march} and @option{-mtune} select code +that runs on a family of similar processors but that is optimized +for a particular microarchitecture. -@item -mpc32 -@itemx -mpc64 -@itemx -mpc80 -@opindex mpc32 -@opindex mpc64 -@opindex mpc80 +@item -mcpu=@var{cpu} +@opindex mcpu +Generate code for a specific M680x0 or ColdFire processor. 
+The M680x0 @var{cpu}s are: @samp{68000}, @samp{68010}, @samp{68020}, +@samp{68030}, @samp{68040}, @samp{68060}, @samp{68302}, @samp{68332} +and @samp{cpu32}. The ColdFire @var{cpu}s are given by the table +below, which also classifies the CPUs into families: -Set 80387 floating-point precision to 32, 64 or 80 bits. When @option{-mpc32} -is specified, the significands of results of floating-point operations are -rounded to 24 bits (single precision); @option{-mpc64} rounds the -significands of results of floating-point operations to 53 bits (double -precision) and @option{-mpc80} rounds the significands of results of -floating-point operations to 64 bits (extended double precision), which is -the default. When this option is used, floating-point operations in higher -precisions are not available to the programmer without setting the FPU -control word explicitly. +@multitable @columnfractions 0.20 0.80 +@item @strong{Family} @tab @strong{@samp{-mcpu} arguments} +@item @samp{51} @tab @samp{51} @samp{51ac} @samp{51ag} @samp{51cn} @samp{51em} @samp{51je} @samp{51jf} @samp{51jg} @samp{51jm} @samp{51mm} @samp{51qe} @samp{51qm} +@item @samp{5206} @tab @samp{5202} @samp{5204} @samp{5206} +@item @samp{5206e} @tab @samp{5206e} +@item @samp{5208} @tab @samp{5207} @samp{5208} +@item @samp{5211a} @tab @samp{5210a} @samp{5211a} +@item @samp{5213} @tab @samp{5211} @samp{5212} @samp{5213} +@item @samp{5216} @tab @samp{5214} @samp{5216} +@item @samp{52235} @tab @samp{52230} @samp{52231} @samp{52232} @samp{52233} @samp{52234} @samp{52235} +@item @samp{5225} @tab @samp{5224} @samp{5225} +@item @samp{52259} @tab @samp{52252} @samp{52254} @samp{52255} @samp{52256} @samp{52258} @samp{52259} +@item @samp{5235} @tab @samp{5232} @samp{5233} @samp{5234} @samp{5235} @samp{523x} +@item @samp{5249} @tab @samp{5249} +@item @samp{5250} @tab @samp{5250} +@item @samp{5271} @tab @samp{5270} @samp{5271} +@item @samp{5272} @tab @samp{5272} +@item @samp{5275} @tab @samp{5274} @samp{5275} +@item 
@samp{5282} @tab @samp{5280} @samp{5281} @samp{5282} @samp{528x} +@item @samp{53017} @tab @samp{53011} @samp{53012} @samp{53013} @samp{53014} @samp{53015} @samp{53016} @samp{53017} +@item @samp{5307} @tab @samp{5307} +@item @samp{5329} @tab @samp{5327} @samp{5328} @samp{5329} @samp{532x} +@item @samp{5373} @tab @samp{5372} @samp{5373} @samp{537x} +@item @samp{5407} @tab @samp{5407} +@item @samp{5475} @tab @samp{5470} @samp{5471} @samp{5472} @samp{5473} @samp{5474} @samp{5475} @samp{547x} @samp{5480} @samp{5481} @samp{5482} @samp{5483} @samp{5484} @samp{5485} +@end multitable -Setting the rounding of floating-point operations to less than the default -80 bits can speed some programs by 2% or more. Note that some mathematical -libraries assume that extended-precision (80-bit) floating-point operations -are enabled by default; routines in such libraries could suffer significant -loss of accuracy, typically through so-called ``catastrophic cancellation'', -when this option is used to set the precision to less than extended precision. +@option{-mcpu=@var{cpu}} overrides @option{-march=@var{arch}} if +@var{arch} is compatible with @var{cpu}. Other combinations of +@option{-mcpu} and @option{-march} are rejected. -@item -mstackrealign -@opindex mstackrealign -Realign the stack at entry. On the x86, the @option{-mstackrealign} -option generates an alternate prologue and epilogue that realigns the -run-time stack if necessary. This supports mixing legacy codes that keep -4-byte stack alignment with modern codes that keep 16-byte stack alignment for -SSE compatibility. See also the attribute @code{force_align_arg_pointer}, -applicable to individual functions. +GCC defines the macro @code{__mcf_cpu_@var{cpu}} when ColdFire target +@var{cpu} is selected. It also defines @code{__mcf_family_@var{family}}, +where the value of @var{family} is given by the table above. 
-@item -mpreferred-stack-boundary=@var{num} -@opindex mpreferred-stack-boundary -Attempt to keep the stack boundary aligned to a 2 raised to @var{num} -byte boundary. If @option{-mpreferred-stack-boundary} is not specified, -the default is 4 (16 bytes or 128 bits). +@item -mtune=@var{tune} +@opindex mtune +Tune the code for a particular microarchitecture within the +constraints set by @option{-march} and @option{-mcpu}. +The M680x0 microarchitectures are: @samp{68000}, @samp{68010}, +@samp{68020}, @samp{68030}, @samp{68040}, @samp{68060} +and @samp{cpu32}. The ColdFire microarchitectures +are: @samp{cfv1}, @samp{cfv2}, @samp{cfv3}, @samp{cfv4} and @samp{cfv4e}. -@strong{Warning:} When generating code for the x86-64 architecture with -SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be -used to keep the stack boundary aligned to 8 byte boundary. Since -x86-64 ABI require 16 byte stack alignment, this is ABI incompatible and -intended to be used in controlled environment where stack space is -important limitation. This option leads to wrong code when functions -compiled with 16 byte stack alignment (such as functions from a standard -library) are called with misaligned stack. In this case, SSE -instructions may lead to misaligned memory access traps. In addition, -variable arguments are handled incorrectly for 16 byte aligned -objects (including x87 long double and __int128), leading to wrong -results. You must build all modules with -@option{-mpreferred-stack-boundary=3}, including any libraries. This -includes the system libraries and startup modules. +You can also use @option{-mtune=68020-40} for code that needs +to run relatively well on 68020, 68030 and 68040 targets. +@option{-mtune=68020-60} is similar but includes 68060 targets +as well. These two options select the same tuning decisions as +@option{-m68020-40} and @option{-m68020-60} respectively. 
-@item -mincoming-stack-boundary=@var{num} -@opindex mincoming-stack-boundary -Assume the incoming stack is aligned to a 2 raised to @var{num} byte -boundary. If @option{-mincoming-stack-boundary} is not specified, -the one specified by @option{-mpreferred-stack-boundary} is used. +GCC defines the macros @code{__mc@var{arch}} and @code{__mc@var{arch}__} +when tuning for 680x0 architecture @var{arch}. It also defines +@code{mc@var{arch}} unless either @option{-ansi} or a non-GNU @option{-std} +option is used. If GCC is tuning for a range of architectures, +as selected by @option{-mtune=68020-40} or @option{-mtune=68020-60}, +it defines the macros for every architecture in the range. -On Pentium and Pentium Pro, @code{double} and @code{long double} values -should be aligned to an 8-byte boundary (see @option{-malign-double}) or -suffer significant run time performance penalties. On Pentium III, the -Streaming SIMD Extension (SSE) data type @code{__m128} may not work -properly if it is not 16-byte aligned. +GCC also defines the macro @code{__m@var{uarch}__} when tuning for +ColdFire microarchitecture @var{uarch}, where @var{uarch} is one +of the arguments given above. -To ensure proper alignment of this values on the stack, the stack boundary -must be as aligned as that required by any value stored on the stack. -Further, every function must be generated such that it keeps the stack -aligned. Thus calling a function compiled with a higher preferred -stack boundary from a function compiled with a lower preferred stack -boundary most likely misaligns the stack. It is recommended that -libraries that use callbacks always use the default setting. +@item -m68000 +@itemx -mc68000 +@opindex m68000 +@opindex mc68000 +Generate output for a 68000. This is the default +when the compiler is configured for 68000-based systems. +It is equivalent to @option{-march=68000}. -This extra alignment does consume extra stack space, and generally -increases code size. 
Code that is sensitive to stack space usage, such -as embedded systems and operating system kernels, may want to reduce the -preferred alignment to @option{-mpreferred-stack-boundary=2}. +Use this option for microcontrollers with a 68000 or EC000 core, +including the 68008, 68302, 68306, 68307, 68322, 68328 and 68356. -@need 200 -@item -mmmx -@opindex mmmx -@need 200 -@itemx -msse -@opindex msse -@need 200 -@itemx -msse2 -@need 200 -@itemx -msse3 -@need 200 -@itemx -mssse3 -@need 200 -@itemx -msse4 -@need 200 -@itemx -msse4a -@need 200 -@itemx -msse4.1 -@need 200 -@itemx -msse4.2 -@need 200 -@itemx -mavx -@opindex mavx -@need 200 -@itemx -mavx2 -@need 200 -@itemx -mavx512f -@need 200 -@itemx -mavx512pf -@need 200 -@itemx -mavx512er -@need 200 -@itemx -mavx512cd -@need 200 -@itemx -msha -@opindex msha -@need 200 -@itemx -maes -@opindex maes -@need 200 -@itemx -mpclmul -@opindex mpclmul -@need 200 -@itemx -mclfushopt -@opindex mclfushopt -@need 200 -@itemx -mfsgsbase -@opindex mfsgsbase -@need 200 -@itemx -mrdrnd -@opindex mrdrnd -@need 200 -@itemx -mf16c -@opindex mf16c -@need 200 -@itemx -mfma -@opindex mfma -@need 200 -@itemx -mfma4 -@need 200 -@itemx -mno-fma4 -@need 200 -@itemx -mprefetchwt1 -@opindex mprefetchwt1 -@need 200 -@itemx -mxop -@opindex mxop -@need 200 -@itemx -mlwp -@opindex mlwp -@need 200 -@itemx -m3dnow -@opindex m3dnow -@need 200 -@itemx -mpopcnt -@opindex mpopcnt -@need 200 -@itemx -mabm -@opindex mabm -@need 200 -@itemx -mbmi -@opindex mbmi -@need 200 -@itemx -mbmi2 -@need 200 -@itemx -mlzcnt -@opindex mlzcnt -@need 200 -@itemx -mfxsr -@opindex mfxsr -@need 200 -@itemx -mxsave -@opindex mxsave -@need 200 -@itemx -mxsaveopt -@opindex mxsaveopt -@need 200 -@itemx -mxsavec -@opindex mxsavec -@need 200 -@itemx -mxsaves -@opindex mxsaves -@need 200 -@itemx -mrtm -@opindex mrtm -@need 200 -@itemx -mtbm -@opindex mtbm -@need 200 -@itemx -mmpx -@opindex mmpx -These switches enable the use of instructions in the MMX, SSE, -SSE2, SSE3, SSSE3, SSE4.1, 
AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, -SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM, -BMI, BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX or 3DNow!@: -extended instruction sets. Each has a corresponding @option{-mno-} option -to disable use of these instructions. +@item -m68010 +@opindex m68010 +Generate output for a 68010. This is the default +when the compiler is configured for 68010-based systems. +It is equivalent to @option{-march=68010}. -These extensions are also available as built-in functions: see -@ref{x86 Built-in Functions}, for details of the functions enabled and -disabled by these switches. +@item -m68020 +@itemx -mc68020 +@opindex m68020 +@opindex mc68020 +Generate output for a 68020. This is the default +when the compiler is configured for 68020-based systems. +It is equivalent to @option{-march=68020}. -To generate SSE/SSE2 instructions automatically from floating-point -code (as opposed to 387 instructions), see @option{-mfpmath=sse}. +@item -m68030 +@opindex m68030 +Generate output for a 68030. This is the default when the compiler is +configured for 68030-based systems. It is equivalent to +@option{-march=68030}. -GCC depresses SSEx instructions when @option{-mavx} is used. Instead, it -generates new AVX instructions or AVX equivalence for all SSEx instructions -when needed. +@item -m68040 +@opindex m68040 +Generate output for a 68040. This is the default when the compiler is +configured for 68040-based systems. It is equivalent to +@option{-march=68040}. -These options enable GCC to use these extended instructions in -generated code, even without @option{-mfpmath=sse}. Applications that -perform run-time CPU detection must compile separate files for each -supported architecture, using the appropriate flags. In particular, -the file containing the CPU detection code should be compiled without -these options. 
+This option inhibits the use of 68881/68882 instructions that have to be +emulated by software on the 68040. Use this option if your 68040 does not +have code to emulate those instructions. -@item -mdump-tune-features -@opindex mdump-tune-features -This option instructs GCC to dump the names of the x86 performance -tuning features and default settings. The names can be used in -@option{-mtune-ctrl=@var{feature-list}}. +@item -m68060 +@opindex m68060 +Generate output for a 68060. This is the default when the compiler is +configured for 68060-based systems. It is equivalent to +@option{-march=68060}. -@item -mtune-ctrl=@var{feature-list} -@opindex mtune-ctrl=@var{feature-list} -This option is used to do fine grain control of x86 code generation features. -@var{feature-list} is a comma separated list of @var{feature} names. See also -@option{-mdump-tune-features}. When specified, the @var{feature} is turned -on if it is not preceded with @samp{^}, otherwise, it is turned off. -@option{-mtune-ctrl=@var{feature-list}} is intended to be used by GCC -developers. Using it may lead to code paths not covered by testing and can -potentially result in compiler ICEs or runtime errors. +This option inhibits the use of 68020 and 68881/68882 instructions that +have to be emulated by software on the 68060. Use this option if your 68060 +does not have code to emulate those instructions. -@item -mno-default -@opindex mno-default -This option instructs GCC to turn off all tunable features. See also -@option{-mtune-ctrl=@var{feature-list}} and @option{-mdump-tune-features}. +@item -mcpu32 +@opindex mcpu32 +Generate output for a CPU32. This is the default +when the compiler is configured for CPU32-based systems. +It is equivalent to @option{-march=cpu32}. -@item -mcld -@opindex mcld -This option instructs GCC to emit a @code{cld} instruction in the prologue -of functions that use string instructions. 
String instructions depend on -the DF flag to select between autoincrement or autodecrement mode. While the -ABI specifies the DF flag to be cleared on function entry, some operating -systems violate this specification by not clearing the DF flag in their -exception dispatchers. The exception handler can be invoked with the DF flag -set, which leads to wrong direction mode when string instructions are used. -This option can be enabled by default on 32-bit x86 targets by configuring -GCC with the @option{--enable-cld} configure option. Generation of @code{cld} -instructions can be suppressed with the @option{-mno-cld} compiler option -in this case. +Use this option for microcontrollers with a +CPU32 or CPU32+ core, including the 68330, 68331, 68332, 68333, 68334, +68336, 68340, 68341, 68349 and 68360. -@item -mvzeroupper -@opindex mvzeroupper -This option instructs GCC to emit a @code{vzeroupper} instruction -before a transfer of control flow out of the function to minimize -the AVX to SSE transition penalty as well as remove unnecessary @code{zeroupper} -intrinsics. +@item -m5200 +@opindex m5200 +Generate output for a 520X ColdFire CPU@. This is the default +when the compiler is configured for 520X-based systems. +It is equivalent to @option{-mcpu=5206}, and is now deprecated +in favor of that option. -@item -mprefer-avx128 -@opindex mprefer-avx128 -This option instructs GCC to use 128-bit AVX instructions instead of -256-bit AVX instructions in the auto-vectorizer. +Use this option for microcontroller with a 5200 core, including +the MCF5202, MCF5203, MCF5204 and MCF5206. -@item -mcx16 -@opindex mcx16 -This option enables GCC to generate @code{CMPXCHG16B} instructions. -@code{CMPXCHG16B} allows for atomic operations on 128-bit double quadword -(or oword) data types. -This is useful for high-resolution counters that can be updated -by multiple processors (or cores). 
This instruction is generated as part of -atomic built-in functions: see @ref{__sync Builtins} or -@ref{__atomic Builtins} for details. +@item -m5206e +@opindex m5206e +Generate output for a 5206e ColdFire CPU@. The option is now +deprecated in favor of the equivalent @option{-mcpu=5206e}. -@item -msahf -@opindex msahf -This option enables generation of @code{SAHF} instructions in 64-bit code. -Early Intel Pentium 4 CPUs with Intel 64 support, -prior to the introduction of Pentium 4 G1 step in December 2005, -lacked the @code{LAHF} and @code{SAHF} instructions -which are supported by AMD64. -These are load and store instructions, respectively, for certain status flags. -In 64-bit mode, the @code{SAHF} instruction is used to optimize @code{fmod}, -@code{drem}, and @code{remainder} built-in functions; -see @ref{Other Builtins} for details. +@item -m528x +@opindex m528x +Generate output for a member of the ColdFire 528X family. +The option is now deprecated in favor of the equivalent +@option{-mcpu=528x}. -@item -mmovbe -@opindex mmovbe -This option enables use of the @code{movbe} instruction to implement -@code{__builtin_bswap32} and @code{__builtin_bswap64}. +@item -m5307 +@opindex m5307 +Generate output for a ColdFire 5307 CPU@. The option is now deprecated +in favor of the equivalent @option{-mcpu=5307}. -@item -mcrc32 -@opindex mcrc32 -This option enables built-in functions @code{__builtin_ia32_crc32qi}, -@code{__builtin_ia32_crc32hi}, @code{__builtin_ia32_crc32si} and -@code{__builtin_ia32_crc32di} to generate the @code{crc32} machine instruction. +@item -m5407 +@opindex m5407 +Generate output for a ColdFire 5407 CPU@. The option is now deprecated +in favor of the equivalent @option{-mcpu=5407}. 
-@item -mrecip -@opindex mrecip -This option enables use of @code{RCPSS} and @code{RSQRTSS} instructions -(and their vectorized variants @code{RCPPS} and @code{RSQRTPS}) -with an additional Newton-Raphson step -to increase precision instead of @code{DIVSS} and @code{SQRTSS} -(and their vectorized -variants) for single-precision floating-point arguments. These instructions -are generated only when @option{-funsafe-math-optimizations} is enabled -together with @option{-finite-math-only} and @option{-fno-trapping-math}. -Note that while the throughput of the sequence is higher than the throughput -of the non-reciprocal instruction, the precision of the sequence can be -decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). +@item -mcfv4e +@opindex mcfv4e +Generate output for a ColdFire V4e family CPU (e.g.@: 547x/548x). +This includes use of hardware floating-point instructions. +The option is equivalent to @option{-mcpu=547x}, and is now +deprecated in favor of that option. -Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of @code{RSQRTSS} -(or @code{RSQRTPS}) already with @option{-ffast-math} (or the above option -combination), and doesn't need @option{-mrecip}. +@item -m68020-40 +@opindex m68020-40 +Generate output for a 68040, without using any of the new instructions. +This results in code that can run relatively efficiently on either a +68020/68881 or a 68030 or a 68040. The generated code does use the +68881 instructions that are emulated on the 68040. -Also note that GCC emits the above sequence with additional Newton-Raphson step -for vectorized single-float division and vectorized @code{sqrtf(@var{x})} -already with @option{-ffast-math} (or the above option combination), and -doesn't need @option{-mrecip}. +The option is equivalent to @option{-march=68020} @option{-mtune=68020-40}. -@item -mrecip=@var{opt} -@opindex mrecip=opt -This option controls which reciprocal estimate instructions -may be used. 
@var{opt} is a comma-separated list of options, which may -be preceded by a @samp{!} to invert the option: +@item -m68020-60 +@opindex m68020-60 +Generate output for a 68060, without using any of the new instructions. +This results in code that can run relatively efficiently on either a +68020/68881 or a 68030 or a 68040. The generated code does use the +68881 instructions that are emulated on the 68060. -@table @samp -@item all -Enable all estimate instructions. +The option is equivalent to @option{-march=68020} @option{-mtune=68020-60}. -@item default -Enable the default instructions, equivalent to @option{-mrecip}. +@item -mhard-float +@itemx -m68881 +@opindex mhard-float +@opindex m68881 +Generate floating-point instructions. This is the default for 68020 +and above, and for ColdFire devices that have an FPU@. It defines the +macro @code{__HAVE_68881__} on M680x0 targets and @code{__mcffpu__} +on ColdFire targets. -@item none -Disable all estimate instructions, equivalent to @option{-mno-recip}. +@item -msoft-float +@opindex msoft-float +Do not generate floating-point instructions; use library calls instead. +This is the default for 68000, 68010, and 68832 targets. It is also +the default for ColdFire devices that have no FPU. -@item div -Enable the approximation for scalar division. +@item -mdiv +@itemx -mno-div +@opindex mdiv +@opindex mno-div +Generate (do not generate) ColdFire hardware divide and remainder +instructions. If @option{-march} is used without @option{-mcpu}, +the default is ``on'' for ColdFire architectures and ``off'' for M680x0 +architectures. Otherwise, the default is taken from the target CPU +(either the default CPU, or the one specified by @option{-mcpu}). For +example, the default is ``off'' for @option{-mcpu=5206} and ``on'' for +@option{-mcpu=5206e}. -@item vec-div -Enable the approximation for vectorized division. +GCC defines the macro @code{__mcfhwdiv__} when this option is enabled. 
-@item sqrt -Enable the approximation for scalar square root. +@item -mshort +@opindex mshort +Consider type @code{int} to be 16 bits wide, like @code{short int}. +Additionally, parameters passed on the stack are also aligned to a +16-bit boundary even on targets whose API mandates promotion to 32-bit. -@item vec-sqrt -Enable the approximation for vectorized square root. -@end table +@item -mno-short +@opindex mno-short +Do not consider type @code{int} to be 16 bits wide. This is the default. -So, for example, @option{-mrecip=all,!sqrt} enables -all of the reciprocal approximations, except for square root. +@item -mnobitfield +@itemx -mno-bitfield +@opindex mnobitfield +@opindex mno-bitfield +Do not use the bit-field instructions. The @option{-m68000}, @option{-mcpu32} +and @option{-m5200} options imply @w{@option{-mnobitfield}}. -@item -mveclibabi=@var{type} -@opindex mveclibabi -Specifies the ABI type to use for vectorizing intrinsics using an -external library. Supported values for @var{type} are @samp{svml} -for the Intel short -vector math library and @samp{acml} for the AMD math core library. -To use this option, both @option{-ftree-vectorize} and -@option{-funsafe-math-optimizations} have to be enabled, and an SVML or ACML -ABI-compatible library must be specified at link time. +@item -mbitfield +@opindex mbitfield +Do use the bit-field instructions. The @option{-m68020} option implies +@option{-mbitfield}. This is the default if you use a configuration +designed for a 68020. 
-GCC currently emits calls to @code{vmldExp2}, -@code{vmldLn2}, @code{vmldLog102}, @code{vmldLog102}, @code{vmldPow2}, -@code{vmldTanh2}, @code{vmldTan2}, @code{vmldAtan2}, @code{vmldAtanh2}, -@code{vmldCbrt2}, @code{vmldSinh2}, @code{vmldSin2}, @code{vmldAsinh2}, -@code{vmldAsin2}, @code{vmldCosh2}, @code{vmldCos2}, @code{vmldAcosh2}, -@code{vmldAcos2}, @code{vmlsExp4}, @code{vmlsLn4}, @code{vmlsLog104}, -@code{vmlsLog104}, @code{vmlsPow4}, @code{vmlsTanh4}, @code{vmlsTan4}, -@code{vmlsAtan4}, @code{vmlsAtanh4}, @code{vmlsCbrt4}, @code{vmlsSinh4}, -@code{vmlsSin4}, @code{vmlsAsinh4}, @code{vmlsAsin4}, @code{vmlsCosh4}, -@code{vmlsCos4}, @code{vmlsAcosh4} and @code{vmlsAcos4} for corresponding -function type when @option{-mveclibabi=svml} is used, and @code{__vrd2_sin}, -@code{__vrd2_cos}, @code{__vrd2_exp}, @code{__vrd2_log}, @code{__vrd2_log2}, -@code{__vrd2_log10}, @code{__vrs4_sinf}, @code{__vrs4_cosf}, -@code{__vrs4_expf}, @code{__vrs4_logf}, @code{__vrs4_log2f}, -@code{__vrs4_log10f} and @code{__vrs4_powf} for the corresponding function type -when @option{-mveclibabi=acml} is used. +@item -mrtd +@opindex mrtd +Use a different function-calling convention, in which functions +that take a fixed number of arguments return with the @code{rtd} +instruction, which pops their arguments while returning. This +saves one instruction in the caller since there is no need to pop +the arguments there. -@item -mabi=@var{name} -@opindex mabi -Generate code for the specified calling convention. Permissible values -are @samp{sysv} for the ABI used on GNU/Linux and other systems, and -@samp{ms} for the Microsoft ABI. The default is to use the Microsoft -ABI when targeting Microsoft Windows and the SysV ABI on all other systems. -You can control this behavior for specific functions by -using the function attributes @code{ms_abi} and @code{sysv_abi}. -@xref{Function Attributes}. 
+This calling convention is incompatible with the one normally +used on Unix, so you cannot use it if you need to call libraries +compiled with the Unix compiler. -@item -mtls-dialect=@var{type} -@opindex mtls-dialect -Generate code to access thread-local storage using the @samp{gnu} or -@samp{gnu2} conventions. @samp{gnu} is the conservative default; -@samp{gnu2} is more efficient, but it may add compile- and run-time -requirements that cannot be satisfied on all systems. +Also, you must provide function prototypes for all functions that +take variable numbers of arguments (including @code{printf}); +otherwise incorrect code is generated for calls to those +functions. -@item -mpush-args -@itemx -mno-push-args -@opindex mpush-args -@opindex mno-push-args -Use PUSH operations to store outgoing parameters. This method is shorter -and usually equally fast as method using SUB/MOV operations and is enabled -by default. In some cases disabling it may improve performance because of -improved scheduling and reduced dependencies. +In addition, seriously incorrect code results if you call a +function with too many arguments. (Normally, extra arguments are +harmlessly ignored.) -@item -maccumulate-outgoing-args -@opindex maccumulate-outgoing-args -If enabled, the maximum amount of space required for outgoing arguments is -computed in the function prologue. This is faster on most modern CPUs -because of reduced dependencies, improved scheduling and reduced stack usage -when the preferred stack boundary is not equal to 2. The drawback is a notable -increase in code size. This switch implies @option{-mno-push-args}. +The @code{rtd} instruction is supported by the 68010, 68020, 68030, +68040, 68060 and CPU32 processors, but not by the 68000 or 5200. -@item -mthreads -@opindex mthreads -Support thread-safe exception handling on MinGW. Programs that rely -on thread-safe exception handling must compile and link all code with the -@option{-mthreads} option. 
When compiling, @option{-mthreads} defines -@option{-D_MT}; when linking, it links in a special thread helper library -@option{-lmingwthrd} which cleans up per-thread exception-handling data. +@item -mno-rtd +@opindex mno-rtd +Do not use the calling conventions selected by @option{-mrtd}. +This is the default. -@item -mno-align-stringops -@opindex mno-align-stringops -Do not align the destination of inlined string operations. This switch reduces -code size and improves performance in case the destination is already aligned, -but GCC doesn't know about it. +@item -malign-int +@itemx -mno-align-int +@opindex malign-int +@opindex mno-align-int +Control whether GCC aligns @code{int}, @code{long}, @code{long long}, +@code{float}, @code{double}, and @code{long double} variables on a 32-bit +boundary (@option{-malign-int}) or a 16-bit boundary (@option{-mno-align-int}). +Aligning variables on 32-bit boundaries produces code that runs somewhat +faster on processors with 32-bit busses at the expense of more memory. -@item -minline-all-stringops -@opindex minline-all-stringops -By default GCC inlines string operations only when the destination is -known to be aligned to least a 4-byte boundary. -This enables more inlining and increases code -size, but may improve performance of code that depends on fast -@code{memcpy}, @code{strlen}, -and @code{memset} for short lengths. +@strong{Warning:} if you use the @option{-malign-int} switch, GCC +aligns structures containing the above types differently than +most published application binary interface specifications for the m68k. -@item -minline-stringops-dynamically -@opindex minline-stringops-dynamically -For string operations of unknown size, use run-time checks with -inline code for small blocks and a library call for large blocks. +@item -mpcrel +@opindex mpcrel +Use the pc-relative addressing mode of the 68000 directly, instead of +using a global offset table. 
At present, this option implies @option{-fpic}, +allowing at most a 16-bit offset for pc-relative addressing. @option{-fPIC} is +not presently supported with @option{-mpcrel}, though this could be supported for +68020 and higher processors. -@item -mstringop-strategy=@var{alg} -@opindex mstringop-strategy=@var{alg} -Override the internal decision heuristic for the particular algorithm to use -for inlining string operations. The allowed values for @var{alg} are: +@item -mno-strict-align +@itemx -mstrict-align +@opindex mno-strict-align +@opindex mstrict-align +Do not (do) assume that unaligned memory references are handled by +the system. -@table @samp -@item rep_byte -@itemx rep_4byte -@itemx rep_8byte -Expand using i386 @code{rep} prefix of the specified size. +@item -msep-data +Generate code that allows the data segment to be located in a different +area of memory from the text segment. This allows for execute-in-place in +an environment without virtual memory management. This option implies +@option{-fPIC}. -@item byte_loop -@itemx loop -@itemx unrolled_loop -Expand into an inline loop. +@item -mno-sep-data +Generate code that assumes that the data segment follows the text segment. +This is the default. -@item libcall -Always use a library call. -@end table +@item -mid-shared-library +Generate code that supports shared libraries via the library ID method. +This allows for execute-in-place and shared libraries in an environment +without virtual memory management. This option implies @option{-fPIC}. -@item -mmemcpy-strategy=@var{strategy} -@opindex mmemcpy-strategy=@var{strategy} -Override the internal decision heuristic to decide if @code{__builtin_memcpy} -should be inlined and what inline algorithm to use when the expected size -of the copy operation is known. @var{strategy} -is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets. 
-@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies -the max byte size with which inline algorithm @var{alg} is allowed. For the last -triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets -in the list must be specified in increasing order. The minimal byte size for -@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the -preceding range. +@item -mno-id-shared-library +Generate code that doesn't assume ID-based shared libraries are being used. +This is the default. -@item -mmemset-strategy=@var{strategy} -@opindex mmemset-strategy=@var{strategy} -The option is similar to @option{-mmemcpy-strategy=} except that it is to control -@code{__builtin_memset} expansion. +@item -mshared-library-id=n +Specifies the identification number of the ID-based shared library being +compiled. Specifying a value of 0 generates more compact code; specifying +other values forces the allocation of that number to the current +library, but is no more space- or time-efficient than omitting this option. -@item -momit-leaf-frame-pointer -@opindex momit-leaf-frame-pointer -Don't keep the frame pointer in a register for leaf functions. This -avoids the instructions to save, set up, and restore frame pointers and -makes an extra register available in leaf functions. The option -@option{-fomit-leaf-frame-pointer} removes the frame pointer for leaf functions, -which might make debugging harder. +@item -mxgot +@itemx -mno-xgot +@opindex mxgot +@opindex mno-xgot +When generating position-independent code for ColdFire, generate code +that works if the GOT has more than 8192 entries. This code is +larger and slower than code generated without this option. On M680x0 +processors, this option is not needed; @option{-fPIC} suffices. 
-@item -mtls-direct-seg-refs -@itemx -mno-tls-direct-seg-refs -@opindex mtls-direct-seg-refs -Controls whether TLS variables may be accessed with offsets from the -TLS segment register (@code{%gs} for 32-bit, @code{%fs} for 64-bit), -or whether the thread base pointer must be added. Whether or not this -is valid depends on the operating system, and whether it maps the -segment to cover the entire TLS area. +GCC normally uses a single instruction to load values from the GOT@. +While this is relatively efficient, it only works if the GOT +is smaller than about 64k. Anything larger causes the linker +to report an error such as: -For systems that use the GNU C Library, the default is on. +@cindex relocation truncated to fit (ColdFire) +@smallexample +relocation truncated to fit: R_68K_GOT16O foobar +@end smallexample -@item -msse2avx -@itemx -mno-sse2avx -@opindex msse2avx -Specify that the assembler should encode SSE instructions with VEX -prefix. The option @option{-mavx} turns this on by default. +If this happens, you should recompile your code with @option{-mxgot}. +It should then work with very large GOTs. However, code generated with +@option{-mxgot} is less efficient, since it takes 4 instructions to fetch +the value of a global symbol. -@item -mfentry -@itemx -mno-fentry -@opindex mfentry -If profiling is active (@option{-pg}), put the profiling -counter call before the prologue. -Note: On x86 architectures the attribute @code{ms_hook_prologue} -isn't possible at the moment for @option{-mfentry} and @option{-pg}. +Note that some linkers, including newer versions of the GNU linker, +can create multiple GOTs and sort GOT entries. If you have such a linker, +you should only need to use @option{-mxgot} when compiling a single +object file that accesses more than 8192 GOT entries. Very few do. 
-@item -mrecord-mcount -@itemx -mno-record-mcount -@opindex mrecord-mcount -If profiling is active (@option{-pg}), generate a __mcount_loc section -that contains pointers to each profiling call. This is useful for -automatically patching calls in and out. +These options have no effect unless GCC is generating +position-independent code. -@item -mnop-mcount -@itemx -mno-nop-mcount -@opindex mnop-mcount -If profiling is active (@option{-pg}), generate the calls to -the profiling functions as nops. This is useful when they -should be patched in later dynamically. This is likely only -useful together with @option{-mrecord-mcount}. +@end table -@item -mskip-rax-setup -@itemx -mno-skip-rax-setup -@opindex mskip-rax-setup -When generating code for the x86-64 architecture with SSE extensions -disabled, @option{-mskip-rax-setup} can be used to skip setting up RAX -register when there are no variable arguments passed in vector registers. +@node MCore Options +@subsection MCore Options +@cindex MCore options -@strong{Warning:} Since RAX register is used to avoid unnecessarily -saving vector registers on stack when passing variable arguments, the -impact of this option is that callees may waste some stack space, -misbehave or jump to a random location. GCC 4.4 or newer don't have -those issues, regardless of the RAX register value. +These are the @samp{-m} options defined for the Motorola M*Core +processors. -@item -m8bit-idiv -@itemx -mno-8bit-idiv -@opindex m8bit-idiv -On some processors, like Intel Atom, 8-bit unsigned integer divide is -much faster than 32-bit/64-bit integer divide. This option generates a -run-time check. If both dividend and divisor are within range of 0 -to 255, 8-bit unsigned integer divide is used instead of -32-bit/64-bit integer divide. +@table @gcctabopt -@item -mavx256-split-unaligned-load -@itemx -mavx256-split-unaligned-store -@opindex mavx256-split-unaligned-load -@opindex mavx256-split-unaligned-store -Split 32-byte AVX unaligned load and store.
+@item -mhardlit +@itemx -mno-hardlit +@opindex mhardlit +@opindex mno-hardlit +Inline constants into the code stream if it can be done in two +instructions or less. -@item -mstack-protector-guard=@var{guard} -@opindex mstack-protector-guard=@var{guard} -Generate stack protection code using canary at @var{guard}. Supported -locations are @samp{global} for global canary or @samp{tls} for per-thread -canary in the TLS block (the default). This option has effect only when -@option{-fstack-protector} or @option{-fstack-protector-all} is specified. +@item -mdiv +@itemx -mno-div +@opindex mdiv +@opindex mno-div +Use the divide instruction. (Enabled by default). -@end table +@item -mrelax-immediate +@itemx -mno-relax-immediate +@opindex mrelax-immediate +@opindex mno-relax-immediate +Allow arbitrary-sized immediates in bit operations. -These @samp{-m} switches are supported in addition to the above -on x86-64 processors in 64-bit environments. +@item -mwide-bitfields +@itemx -mno-wide-bitfields +@opindex mwide-bitfields +@opindex mno-wide-bitfields +Always treat bit-fields as @code{int}-sized. -@table @gcctabopt -@item -m32 -@itemx -m64 -@itemx -mx32 -@itemx -m16 -@opindex m32 -@opindex m64 -@opindex mx32 -@opindex m16 -Generate code for a 16-bit, 32-bit or 64-bit environment. -The @option{-m32} option sets @code{int}, @code{long}, and pointer types -to 32 bits, and -generates code that runs on any i386 system. +@item -m4byte-functions +@itemx -mno-4byte-functions +@opindex m4byte-functions +@opindex mno-4byte-functions +Force all functions to be aligned to a 4-byte boundary. -The @option{-m64} option sets @code{int} to 32 bits and @code{long} and pointer -types to 64 bits, and generates code for the x86-64 architecture. -For Darwin only the @option{-m64} option also turns off the @option{-fno-pic} -and @option{-mdynamic-no-pic} options. +@item -mcallgraph-data +@itemx -mno-callgraph-data +@opindex mcallgraph-data +@opindex mno-callgraph-data +Emit callgraph information. 
-The @option{-mx32} option sets @code{int}, @code{long}, and pointer types -to 32 bits, and -generates code for the x86-64 architecture. +@item -mslow-bytes +@itemx -mno-slow-bytes +@opindex mslow-bytes +@opindex mno-slow-bytes +Prefer word access when reading byte quantities. -The @option{-m16} option is the same as @option{-m32}, except that -it outputs the @code{.code16gcc} assembly directive at the beginning of -the assembly output so that the binary can run in 16-bit mode. +@item -mlittle-endian +@itemx -mbig-endian +@opindex mlittle-endian +@opindex mbig-endian +Generate code for a little-endian target. -@item -mno-red-zone -@opindex mno-red-zone -Do not use a so-called ``red zone'' for x86-64 code. The red zone is mandated -by the x86-64 ABI; it is a 128-byte area beyond the location of the -stack pointer that is not modified by signal or interrupt handlers -and therefore can be used for temporary data without adjusting the stack -pointer. The flag @option{-mno-red-zone} disables this red zone. +@item -m210 +@itemx -m340 +@opindex m210 +@opindex m340 +Generate code for the 210 processor. -@item -mcmodel=small -@opindex mcmodel=small -Generate code for the small code model: the program and its symbols must -be linked in the lower 2 GB of the address space. Pointers are 64 bits. -Programs can be statically or dynamically linked. This is the default -code model. +@item -mno-lsim +@opindex mno-lsim +Assume that runtime support has been provided and so omit the +simulator library (@file{libsim.a}) from the linker command line. -@item -mcmodel=kernel -@opindex mcmodel=kernel -Generate code for the kernel code model. The kernel runs in the -negative 2 GB of the address space. -This model has to be used for Linux kernel code. +@item -mstack-increment=@var{size} +@opindex mstack-increment +Set the maximum amount for a single stack increment operation.
Large +values can increase the speed of programs that contain functions +that need a large amount of stack space, but they can also trigger a +segmentation fault if the stack is extended too much. The default +value is 0x1000. -@item -mcmodel=medium -@opindex mcmodel=medium -Generate code for the medium model: the program is linked in the lower 2 -GB of the address space. Small symbols are also placed there. Symbols -with sizes larger than @option{-mlarge-data-threshold} are put into -large data or BSS sections and can be located above 2GB. Programs can -be statically or dynamically linked. - -@item -mcmodel=large -@opindex mcmodel=large -Generate code for the large model. This model makes no assumptions -about addresses and sizes of sections. - -@item -maddress-mode=long -@opindex maddress-mode=long -Generate code for long address mode. This is only supported for 64-bit -and x32 environments. It is the default address mode for 64-bit -environments. - -@item -maddress-mode=short -@opindex maddress-mode=short -Generate code for short address mode. This is only supported for 32-bit -and x32 environments. It is the default address mode for 32-bit and -x32 environments. @end table -@node x86 Windows Options -@subsection x86 Windows Options -@cindex x86 Windows Options -@cindex Windows Options for x86 - -These additional options are available for Microsoft Windows targets: +@node MeP Options +@subsection MeP Options +@cindex MeP options @table @gcctabopt -@item -mconsole -@opindex mconsole -This option -specifies that a console application is to be generated, by -instructing the linker to set the PE header subsystem type -required for console applications. -This option is available for Cygwin and MinGW targets and is -enabled by default on those targets. - -@item -mdll -@opindex mdll -This option is available for Cygwin and MinGW targets. 
It -specifies that a DLL---a dynamic link library---is to be -generated, enabling the selection of the required runtime -startup object and entry point. -@item -mnop-fun-dllimport -@opindex mnop-fun-dllimport -This option is available for Cygwin and MinGW targets. It -specifies that the @code{dllimport} attribute should be ignored. +@item -mabsdiff +@opindex mabsdiff +Enables the @code{abs} instruction, which is the absolute difference +between two registers. -@item -mthread -@opindex mthread -This option is available for MinGW targets. It specifies -that MinGW-specific thread support is to be used. +@item -mall-opts +@opindex mall-opts +Enables all the optional instructions---average, multiply, divide, bit +operations, leading zero, absolute difference, min/max, clip, and +saturation. -@item -municode -@opindex municode -This option is available for MinGW-w64 targets. It causes -the @code{UNICODE} preprocessor macro to be predefined, and -chooses Unicode-capable runtime startup code. -@item -mwin32 -@opindex mwin32 -This option is available for Cygwin and MinGW targets. It -specifies that the typical Microsoft Windows predefined macros are to -be set in the pre-processor, but does not influence the choice -of runtime library/startup code. +@item -maverage +@opindex maverage +Enables the @code{ave} instruction, which computes the average of two +registers. -@item -mwindows -@opindex mwindows -This option is available for Cygwin and MinGW targets. It -specifies that a GUI application is to be generated by -instructing the linker to set the PE header subsystem type -appropriately. +@item -mbased=@var{n} +@opindex mbased= +Variables of size @var{n} bytes or smaller are placed in the +@code{.based} section by default. Based variables use the @code{$tp} +register as a base register, and there is a 128-byte limit to the +@code{.based} section. -@item -fno-set-stack-executable -@opindex fno-set-stack-executable -This option is available for MinGW targets. 
It specifies that -the executable flag for the stack used by nested functions isn't -set. This is necessary for binaries running in kernel mode of -Microsoft Windows, as there the User32 API, which is used to set executable -privileges, isn't available. +@item -mbitops +@opindex mbitops +Enables the bit operation instructions---bit test (@code{btstm}), set +(@code{bsetm}), clear (@code{bclrm}), invert (@code{bnotm}), and +test-and-set (@code{tas}). -@item -fwritable-relocated-rdata -@opindex fno-writable-relocated-rdata -This option is available for MinGW and Cygwin targets. It specifies -that relocated-data in read-only section is put into .data -section. This is necessary for older runtimes not supporting -modification of .rdata sections for pseudo-relocation. +@item -mc=@var{name} +@opindex mc= +Selects which section constant data is placed in. @var{name} may +be @samp{tiny}, @samp{near}, or @samp{far}. -@item -mpe-aligned-commons -@opindex mpe-aligned-commons -This option is available for Cygwin and MinGW targets. It -specifies that the GNU extension to the PE file format that -permits the correct alignment of COMMON variables should be -used when generating code. It is enabled by default if -GCC detects that the target assembler found during configuration -supports the feature. -@end table +@item -mclip +@opindex mclip +Enables the @code{clip} instruction. Note that @option{-mclip} is not +useful unless you also provide @option{-mminmax}. -See also under @ref{x86 Options} for standard options. +@item -mconfig=@var{name} +@opindex mconfig= +Selects one of the built-in core configurations. Each MeP chip has +one or more modules in it; each module has a core CPU and a variety of +coprocessors, optional instructions, and peripherals. The +@code{MeP-Integrator} tool, not part of GCC, provides these +configurations through this option; using this option is the same as +using all the corresponding command-line options. The default +configuration is @samp{default}.
-@node IA-64 Options -@subsection IA-64 Options -@cindex IA-64 Options +@item -mcop +@opindex mcop +Enables the coprocessor instructions. By default, this is a 32-bit +coprocessor. Note that the coprocessor is normally enabled via the +@option{-mconfig=} option. -These are the @samp{-m} options defined for the Intel IA-64 architecture. +@item -mcop32 +@opindex mcop32 +Enables the 32-bit coprocessor's instructions. -@table @gcctabopt -@item -mbig-endian -@opindex mbig-endian -Generate code for a big-endian target. This is the default for HP-UX@. +@item -mcop64 +@opindex mcop64 +Enables the 64-bit coprocessor's instructions. -@item -mlittle-endian -@opindex mlittle-endian -Generate code for a little-endian target. This is the default for AIX5 -and GNU/Linux. +@item -mivc2 +@opindex mivc2 +Enables IVC2 scheduling. IVC2 is a 64-bit VLIW coprocessor. -@item -mgnu-as -@itemx -mno-gnu-as -@opindex mgnu-as -@opindex mno-gnu-as -Generate (or don't) code for the GNU assembler. This is the default. -@c Also, this is the default if the configure option @option{--with-gnu-as} -@c is used. +@item -mdc +@opindex mdc +Causes constant variables to be placed in the @code{.near} section. -@item -mgnu-ld -@itemx -mno-gnu-ld -@opindex mgnu-ld -@opindex mno-gnu-ld -Generate (or don't) code for the GNU linker. This is the default. -@c Also, this is the default if the configure option @option{--with-gnu-ld} -@c is used. +@item -mdiv +@opindex mdiv +Enables the @code{div} and @code{divu} instructions. -@item -mno-pic -@opindex mno-pic -Generate code that does not use a global pointer register. The result -is not position independent code, and violates the IA-64 ABI@. +@item -meb +@opindex meb +Generate big-endian code. -@item -mvolatile-asm-stop -@itemx -mno-volatile-asm-stop -@opindex mvolatile-asm-stop -@opindex mno-volatile-asm-stop -Generate (or don't) a stop bit immediately before and after volatile asm -statements. +@item -mel +@opindex mel +Generate little-endian code. 
-@item -mregister-names -@itemx -mno-register-names -@opindex mregister-names -@opindex mno-register-names -Generate (or don't) @samp{in}, @samp{loc}, and @samp{out} register names for -the stacked registers. This may make assembler output more readable. +@item -mio-volatile +@opindex mio-volatile +Tells the compiler that any variable marked with the @code{io} +attribute is to be considered volatile. -@item -mno-sdata -@itemx -msdata -@opindex mno-sdata -@opindex msdata -Disable (or enable) optimizations that use the small data section. This may -be useful for working around optimizer bugs. +@item -ml +@opindex ml +Causes variables to be assigned to the @code{.far} section by default. -@item -mconstant-gp -@opindex mconstant-gp -Generate code that uses a single constant global pointer value. This is -useful when compiling kernel code. +@item -mleadz +@opindex mleadz +Enables the @code{leadz} (leading zero) instruction. -@item -mauto-pic -@opindex mauto-pic -Generate code that is self-relocatable. This implies @option{-mconstant-gp}. -This is useful when compiling firmware code. +@item -mm +@opindex mm +Causes variables to be assigned to the @code{.near} section by default. -@item -minline-float-divide-min-latency -@opindex minline-float-divide-min-latency -Generate code for inline divides of floating-point values -using the minimum latency algorithm. +@item -mminmax +@opindex mminmax +Enables the @code{min} and @code{max} instructions. -@item -minline-float-divide-max-throughput -@opindex minline-float-divide-max-throughput -Generate code for inline divides of floating-point values -using the maximum throughput algorithm. +@item -mmult +@opindex mmult +Enables the multiplication and multiply-accumulate instructions. -@item -mno-inline-float-divide -@opindex mno-inline-float-divide -Do not generate inline code for divides of floating-point values. +@item -mno-opts +@opindex mno-opts +Disables all the optional instructions enabled by @option{-mall-opts}. 
-@item -minline-int-divide-min-latency -@opindex minline-int-divide-min-latency -Generate code for inline divides of integer values -using the minimum latency algorithm. +@item -mrepeat +@opindex mrepeat +Enables the @code{repeat} and @code{erepeat} instructions, used for +low-overhead looping. -@item -minline-int-divide-max-throughput -@opindex minline-int-divide-max-throughput -Generate code for inline divides of integer values -using the maximum throughput algorithm. +@item -ms +@opindex ms +Causes all variables to default to the @code{.tiny} section. Note +that there is a 65536-byte limit to this section. Accesses to these +variables use the @code{$gp} base register. -@item -mno-inline-int-divide -@opindex mno-inline-int-divide -Do not generate inline code for divides of integer values. +@item -msatur +@opindex msatur +Enables the saturation instructions. Note that the compiler does not +currently generate these itself, but this option is included for +compatibility with other tools, like @code{as}. -@item -minline-sqrt-min-latency -@opindex minline-sqrt-min-latency -Generate code for inline square roots -using the minimum latency algorithm. +@item -msdram +@opindex msdram +Link the SDRAM-based runtime instead of the default ROM-based runtime. -@item -minline-sqrt-max-throughput -@opindex minline-sqrt-max-throughput -Generate code for inline square roots -using the maximum throughput algorithm. +@item -msim +@opindex msim +Link the simulator run-time libraries. -@item -mno-inline-sqrt -@opindex mno-inline-sqrt -Do not generate inline code for @code{sqrt}. +@item -msimnovec +@opindex msimnovec +Link the simulator runtime libraries, excluding built-in support +for reset and exception vectors and tables. -@item -mfused-madd -@itemx -mno-fused-madd -@opindex mfused-madd -@opindex mno-fused-madd -Do (don't) generate code that uses the fused multiply/add or multiply/subtract -instructions. The default is to use these instructions.
+@item -mtf +@opindex mtf +Causes all functions to default to the @code{.far} section. Without +this option, functions default to the @code{.near} section. -@item -mno-dwarf2-asm -@itemx -mdwarf2-asm -@opindex mno-dwarf2-asm -@opindex mdwarf2-asm -Don't (or do) generate assembler code for the DWARF 2 line number debugging -info. This may be useful when not using the GNU assembler. +@item -mtiny=@var{n} +@opindex mtiny= +Variables that are @var{n} bytes or smaller are allocated to the +@code{.tiny} section. These variables use the @code{$gp} base +register. The default for this option is 4, but note that there's a +65536-byte limit to the @code{.tiny} section. -@item -mearly-stop-bits -@itemx -mno-early-stop-bits -@opindex mearly-stop-bits -@opindex mno-early-stop-bits -Allow stop bits to be placed earlier than immediately preceding the -instruction that triggered the stop bit. This can improve instruction -scheduling, but does not always do so. +@end table -@item -mfixed-range=@var{register-range} -@opindex mfixed-range -Generate code treating the given register range as fixed registers. -A fixed register is one that the register allocator cannot use. This is -useful when compiling kernel code. A register range is specified as -two registers separated by a dash. Multiple register ranges can be -specified separated by a comma. +@node MicroBlaze Options +@subsection MicroBlaze Options +@cindex MicroBlaze Options -@item -mtls-size=@var{tls-size} -@opindex mtls-size -Specify bit size of immediate TLS offsets. Valid values are 14, 22, and -64. +@table @gcctabopt -@item -mtune=@var{cpu-type} -@opindex mtune -Tune the instruction scheduling for a particular CPU. Valid values are -@samp{itanium}, @samp{itanium1}, @samp{merced}, @samp{itanium2}, -and @samp{mckinley}. +@item -msoft-float +@opindex msoft-float +Use software emulation for floating point (default). -@item -milp32 -@itemx -mlp64 -@opindex milp32 -@opindex mlp64 -Generate code for a 32-bit or 64-bit environment.
-The 32-bit environment sets int, long and pointer to 32 bits. -The 64-bit environment sets int to 32 bits and long and pointer -to 64 bits. These are HP-UX specific flags. +@item -mhard-float +@opindex mhard-float +Use hardware floating-point instructions. -@item -mno-sched-br-data-spec -@itemx -msched-br-data-spec -@opindex mno-sched-br-data-spec -@opindex msched-br-data-spec -(Dis/En)able data speculative scheduling before reload. -This results in generation of @code{ld.a} instructions and -the corresponding check instructions (@code{ld.c} / @code{chk.a}). -The default is 'disable'. +@item -mmemcpy +@opindex mmemcpy +Do not optimize block moves, use @code{memcpy}. -@item -msched-ar-data-spec -@itemx -mno-sched-ar-data-spec -@opindex msched-ar-data-spec -@opindex mno-sched-ar-data-spec -(En/Dis)able data speculative scheduling after reload. -This results in generation of @code{ld.a} instructions and -the corresponding check instructions (@code{ld.c} / @code{chk.a}). -The default is 'enable'. +@item -mno-clearbss +@opindex mno-clearbss +This option is deprecated. Use @option{-fno-zero-initialized-in-bss} instead. -@item -mno-sched-control-spec -@itemx -msched-control-spec -@opindex mno-sched-control-spec -@opindex msched-control-spec -(Dis/En)able control speculative scheduling. This feature is -available only during region scheduling (i.e.@: before reload). -This results in generation of the @code{ld.s} instructions and -the corresponding check instructions @code{chk.s}. -The default is 'disable'. +@item -mcpu=@var{cpu-type} +@opindex mcpu= +Use features of, and schedule code for, the given CPU. +Supported values are in the format @samp{v@var{X}.@var{YY}.@var{Z}}, +where @var{X} is a major version, @var{YY} is the minor version, and +@var{Z} is compatibility code. Example values are @samp{v3.00.a}, +@samp{v4.00.b}, @samp{v5.00.a}, @samp{v5.00.b}, @samp{v5.00.b}, @samp{v6.00.a}. 
-@item -msched-br-in-data-spec -@itemx -mno-sched-br-in-data-spec -@opindex msched-br-in-data-spec -@opindex mno-sched-br-in-data-spec -(En/Dis)able speculative scheduling of the instructions that -are dependent on the data speculative loads before reload. -This is effective only with @option{-msched-br-data-spec} enabled. -The default is 'enable'. +@item -mxl-soft-mul +@opindex mxl-soft-mul +Use software multiply emulation (default). -@item -msched-ar-in-data-spec -@itemx -mno-sched-ar-in-data-spec -@opindex msched-ar-in-data-spec -@opindex mno-sched-ar-in-data-spec -(En/Dis)able speculative scheduling of the instructions that -are dependent on the data speculative loads after reload. -This is effective only with @option{-msched-ar-data-spec} enabled. -The default is 'enable'. +@item -mxl-soft-div +@opindex mxl-soft-div +Use software emulation for divides (default). -@item -msched-in-control-spec -@itemx -mno-sched-in-control-spec -@opindex msched-in-control-spec -@opindex mno-sched-in-control-spec -(En/Dis)able speculative scheduling of the instructions that -are dependent on the control speculative loads. -This is effective only with @option{-msched-control-spec} enabled. -The default is 'enable'. +@item -mxl-barrel-shift +@opindex mxl-barrel-shift +Use the hardware barrel shifter. -@item -mno-sched-prefer-non-data-spec-insns -@itemx -msched-prefer-non-data-spec-insns -@opindex mno-sched-prefer-non-data-spec-insns -@opindex msched-prefer-non-data-spec-insns -If enabled, data-speculative instructions are chosen for schedule -only if there are no other choices at the moment. This makes -the use of the data speculation much more conservative. -The default is 'disable'. +@item -mxl-pattern-compare +@opindex mxl-pattern-compare +Use pattern compare instructions. 
-@item -mno-sched-prefer-non-control-spec-insns -@itemx -msched-prefer-non-control-spec-insns -@opindex mno-sched-prefer-non-control-spec-insns -@opindex msched-prefer-non-control-spec-insns -If enabled, control-speculative instructions are chosen for schedule -only if there are no other choices at the moment. This makes -the use of the control speculation much more conservative. -The default is 'disable'. +@item -msmall-divides +@opindex msmall-divides +Use table lookup optimization for small signed integer divisions. -@item -mno-sched-count-spec-in-critical-path -@itemx -msched-count-spec-in-critical-path -@opindex mno-sched-count-spec-in-critical-path -@opindex msched-count-spec-in-critical-path -If enabled, speculative dependencies are considered during -computation of the instructions priorities. This makes the use of the -speculation a bit more conservative. -The default is 'disable'. +@item -mxl-stack-check +@opindex mxl-stack-check +This option is deprecated. Use @option{-fstack-check} instead. -@item -msched-spec-ldc -@opindex msched-spec-ldc -Use a simple data speculation check. This option is on by default. +@item -mxl-gp-opt +@opindex mxl-gp-opt +Use GP-relative @code{.sdata}/@code{.sbss} sections. -@item -msched-control-spec-ldc -@opindex msched-spec-ldc -Use a simple check for control speculation. This option is on by default. +@item -mxl-multiply-high +@opindex mxl-multiply-high +Use multiply high instructions for high part of 32x32 multiply. -@item -msched-stop-bits-after-every-cycle -@opindex msched-stop-bits-after-every-cycle -Place a stop bit after every cycle when scheduling. This option is on -by default. +@item -mxl-float-convert +@opindex mxl-float-convert +Use hardware floating-point conversion instructions. -@item -msched-fp-mem-deps-zero-cost -@opindex msched-fp-mem-deps-zero-cost -Assume that floating-point stores and loads are not likely to cause a conflict -when placed into the same instruction group. 
This option is disabled by -default. +@item -mxl-float-sqrt +@opindex mxl-float-sqrt +Use hardware floating-point square root instruction. -@item -msel-sched-dont-check-control-spec -@opindex msel-sched-dont-check-control-spec -Generate checks for control speculation in selective scheduling. -This flag is disabled by default. +@item -mbig-endian +@opindex mbig-endian +Generate code for a big-endian target. -@item -msched-max-memory-insns=@var{max-insns} -@opindex msched-max-memory-insns -Limit on the number of memory insns per instruction group, giving lower -priority to subsequent memory insns attempting to schedule in the same -instruction group. Frequently useful to prevent cache bank conflicts. -The default value is 1. +@item -mlittle-endian +@opindex mlittle-endian +Generate code for a little-endian target. -@item -msched-max-memory-insns-hard-limit -@opindex msched-max-memory-insns-hard-limit -Makes the limit specified by @option{msched-max-memory-insns} a hard limit, -disallowing more than that number in an instruction group. -Otherwise, the limit is ``soft'', meaning that non-memory operations -are preferred when the limit is reached, but memory operations may still -be scheduled. - -@end table - -@node LM32 Options -@subsection LM32 Options -@cindex LM32 options - -These @option{-m} options are defined for the LatticeMico32 architecture: +@item -mxl-reorder +@opindex mxl-reorder +Use reorder instructions (swap and byte reversed load/store). -@table @gcctabopt -@item -mbarrel-shift-enabled -@opindex mbarrel-shift-enabled -Enable barrel-shift instructions. +@item -mxl-mode-@var{app-model} +Select application model @var{app-model}. Valid models are +@table @samp +@item executable +normal executable (default), uses startup code @file{crt0.o}. -@item -mdivide-enabled -@opindex mdivide-enabled -Enable divide and modulus instructions. +@item xmdstub +for use with Xilinx Microprocessor Debugger (XMD) based +software intrusive debug agent called xmdstub. 
This uses startup file +@file{crt1.o} and sets the start address of the program to 0x800. -@item -mmultiply-enabled -@opindex multiply-enabled -Enable multiply instructions. +@item bootstrap +for applications that are loaded using a bootloader. +This model uses startup file @file{crt2.o} which does not contain a processor +reset vector handler. This is suitable for transferring control on a +processor reset to the bootloader rather than the application. -@item -msign-extend-enabled -@opindex msign-extend-enabled -Enable sign extend instructions. +@item novectors +for applications that do not require any of the +MicroBlaze vectors. This option may be useful for applications running +within a monitoring application. This model uses @file{crt3.o} as a startup file. +@end table -@item -muser-enabled -@opindex muser-enabled -Enable user-defined instructions. +Option @option{-xl-mode-@var{app-model}} is a deprecated alias for +@option{-mxl-mode-@var{app-model}}. @end table -@node M32C Options -@subsection M32C Options -@cindex M32C options +@node MIPS Options +@subsection MIPS Options +@cindex MIPS options @table @gcctabopt -@item -mcpu=@var{name} -@opindex mcpu= -Select the CPU for which code is generated. @var{name} may be one of -@samp{r8c} for the R8C/Tiny series, @samp{m16c} for the M16C (up to -/60) series, @samp{m32cm} for the M16C/80 series, or @samp{m32c} for -the M32C/80 series. -@item -msim -@opindex msim -Specifies that the program will be run on the simulator. This causes -an alternate runtime library to be linked in which supports, for -example, file I/O@. You must not use this option when generating -programs that will run on real hardware; you must provide your own -runtime library for whatever I/O functions are needed. +@item -EB +@opindex EB +Generate big-endian code. -@item -memregs=@var{number} -@opindex memregs= -Specifies the number of memory-based pseudo-registers GCC uses -during code generation. 
These pseudo-registers are used like real -registers, so there is a tradeoff between GCC's ability to fit the -code into available registers, and the performance penalty of using -memory instead of registers. Note that all modules in a program must -be compiled with the same value for this option. Because of that, you -must not use this option with GCC's default runtime libraries. +@item -EL +@opindex EL +Generate little-endian code. This is the default for @samp{mips*el-*-*} +configurations. -@end table +@item -march=@var{arch} +@opindex march +Generate code that runs on @var{arch}, which can be the name of a +generic MIPS ISA, or the name of a particular processor. +The ISA names are: +@samp{mips1}, @samp{mips2}, @samp{mips3}, @samp{mips4}, +@samp{mips32}, @samp{mips32r2}, @samp{mips32r3}, @samp{mips32r5}, +@samp{mips32r6}, @samp{mips64}, @samp{mips64r2}, @samp{mips64r3}, +@samp{mips64r5} and @samp{mips64r6}. +The processor names are: +@samp{4kc}, @samp{4km}, @samp{4kp}, @samp{4ksc}, +@samp{4kec}, @samp{4kem}, @samp{4kep}, @samp{4ksd}, +@samp{5kc}, @samp{5kf}, +@samp{20kc}, +@samp{24kc}, @samp{24kf2_1}, @samp{24kf1_1}, +@samp{24kec}, @samp{24kef2_1}, @samp{24kef1_1}, +@samp{34kc}, @samp{34kf2_1}, @samp{34kf1_1}, @samp{34kn}, +@samp{74kc}, @samp{74kf2_1}, @samp{74kf1_1}, @samp{74kf3_2}, +@samp{1004kc}, @samp{1004kf2_1}, @samp{1004kf1_1}, +@samp{loongson2e}, @samp{loongson2f}, @samp{loongson3a}, +@samp{m4k}, +@samp{m14k}, @samp{m14kc}, @samp{m14ke}, @samp{m14kec}, +@samp{octeon}, @samp{octeon+}, @samp{octeon2}, @samp{octeon3}, +@samp{orion}, +@samp{p5600}, +@samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400}, +@samp{r4600}, @samp{r4650}, @samp{r4700}, @samp{r6000}, @samp{r8000}, +@samp{rm7000}, @samp{rm9000}, +@samp{r10000}, @samp{r12000}, @samp{r14000}, @samp{r16000}, +@samp{sb1}, +@samp{sr71000}, +@samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300}, +@samp{vr5000}, @samp{vr5400}, @samp{vr5500}, +@samp{xlr} and @samp{xlp}. 
+The special value @samp{from-abi} selects the +most compatible architecture for the selected ABI (that is, +@samp{mips1} for 32-bit ABIs and @samp{mips3} for 64-bit ABIs)@. -@node M32R/D Options -@subsection M32R/D Options -@cindex M32R/D options +The native GNU/Linux toolchain also supports the value @samp{native}, +which selects the best architecture option for the host processor. +@option{-march=native} has no effect if GCC does not recognize +the processor. -These @option{-m} options are defined for Renesas M32R/D architectures: +In processor names, a final @samp{000} can be abbreviated as @samp{k} +(for example, @option{-march=r2k}). Prefixes are optional, and +@samp{vr} may be written @samp{r}. -@table @gcctabopt -@item -m32r2 -@opindex m32r2 -Generate code for the M32R/2@. +Names of the form @samp{@var{n}f2_1} refer to processors with +FPUs clocked at half the rate of the core, names of the form +@samp{@var{n}f1_1} refer to processors with FPUs clocked at the same +rate as the core, and names of the form @samp{@var{n}f3_2} refer to +processors with FPUs clocked at a ratio of 3:2 with respect to the core. +For compatibility reasons, @samp{@var{n}f} is accepted as a synonym +for @samp{@var{n}f2_1} while @samp{@var{n}x} and @samp{@var{n}fx} are +accepted as synonyms for @samp{@var{n}f1_1}. -@item -m32rx -@opindex m32rx -Generate code for the M32R/X@. +GCC defines two macros based on the value of this option. The first +is @code{_MIPS_ARCH}, which gives the name of the target architecture, as +a string. The second has the form @code{_MIPS_ARCH_@var{foo}}, +where @var{foo} is the capitalized value of @code{_MIPS_ARCH}@. +For example, @option{-march=r2000} sets @code{_MIPS_ARCH} +to @code{"r2000"} and defines the macro @code{_MIPS_ARCH_R2000}. -@item -m32r -@opindex m32r -Generate code for the M32R@. This is the default. +Note that the @code{_MIPS_ARCH} macro uses the processor names given +above.
In other words, it has the full prefix and does not +abbreviate @samp{000} as @samp{k}. In the case of @samp{from-abi}, +the macro names the resolved architecture (either @code{"mips1"} or +@code{"mips3"}). It names the default architecture when no +@option{-march} option is given. -@item -mmodel=small -@opindex mmodel=small -Assume all objects live in the lower 16MB of memory (so that their addresses -can be loaded with the @code{ld24} instruction), and assume all subroutines -are reachable with the @code{bl} instruction. -This is the default. +@item -mtune=@var{arch} +@opindex mtune +Optimize for @var{arch}. Among other things, this option controls +the way instructions are scheduled, and the perceived cost of arithmetic +operations. The list of @var{arch} values is the same as for +@option{-march}. -The addressability of a particular object can be set with the -@code{model} attribute. +When this option is not used, GCC optimizes for the processor +specified by @option{-march}. By using @option{-march} and +@option{-mtune} together, it is possible to generate code that +runs on a family of processors, but optimize the code for one +particular member of that family. -@item -mmodel=medium -@opindex mmodel=medium -Assume objects may be anywhere in the 32-bit address space (the compiler -generates @code{seth/add3} instructions to load their addresses), and -assume all subroutines are reachable with the @code{bl} instruction. +@option{-mtune} defines the macros @code{_MIPS_TUNE} and +@code{_MIPS_TUNE_@var{foo}}, which work in the same way as the +@option{-march} ones described above. -@item -mmodel=large -@opindex mmodel=large -Assume objects may be anywhere in the 32-bit address space (the compiler -generates @code{seth/add3} instructions to load their addresses), and -assume subroutines may not be reachable with the @code{bl} instruction -(the compiler generates the much slower @code{seth/add3/jl} -instruction sequence). 
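As a rough illustration of the @code{_MIPS_ARCH} and @code{_MIPS_TUNE} macros described above, the following C sketch reads them at compile time. The helper names (`target_arch`, `target_tune`, `built_for_r2000`) and the `"unknown"` fallback are assumptions made for this example so that the file also builds on non-MIPS hosts; they are not part of GCC.

```c
#include <string.h>

/* _MIPS_ARCH and _MIPS_TUNE are string macros that GCC defines only when
   targeting MIPS.  The "unknown" fallbacks are an assumption of this
   sketch so that the file still compiles for other targets.  */
#ifdef _MIPS_ARCH
# define TARGET_ARCH _MIPS_ARCH
#else
# define TARGET_ARCH "unknown"
#endif

#ifdef _MIPS_TUNE
# define TARGET_TUNE _MIPS_TUNE
#else
# define TARGET_TUNE "unknown"
#endif

/* Hypothetical helper: the architecture name selected by -march.  */
const char *target_arch (void) { return TARGET_ARCH; }

/* Hypothetical helper: the processor name selected by -mtune.  */
const char *target_tune (void) { return TARGET_TUNE; }

/* Nonzero only when the file was compiled with -march=r2000, which
   defines _MIPS_ARCH_R2000 in addition to _MIPS_ARCH.  */
int built_for_r2000 (void)
{
#ifdef _MIPS_ARCH_R2000
  return 1;
#else
  return 0;
#endif
}

/* Nonzero when code generation and tuning name the same processor,
   which is the case when -mtune is not given.  */
int tuned_for_arch (void)
{
  return strcmp (target_arch (), target_tune ()) == 0;
}
```

Because the abbreviated spellings are expanded before the macros are defined, a check such as `built_for_r2000` should behave the same for @option{-march=r2000} and @option{-march=r2k}.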
+@item -mips1 +@opindex mips1 +Equivalent to @option{-march=mips1}. -@item -msdata=none -@opindex msdata=none -Disable use of the small data area. Variables are put into -one of @code{.data}, @code{.bss}, or @code{.rodata} (unless the -@code{section} attribute has been specified). -This is the default. +@item -mips2 +@opindex mips2 +Equivalent to @option{-march=mips2}. -The small data area consists of sections @code{.sdata} and @code{.sbss}. -Objects may be explicitly put in the small data area with the -@code{section} attribute using one of these sections. +@item -mips3 +@opindex mips3 +Equivalent to @option{-march=mips3}. -@item -msdata=sdata -@opindex msdata=sdata -Put small global and static data in the small data area, but do not -generate special code to reference them. +@item -mips4 +@opindex mips4 +Equivalent to @option{-march=mips4}. -@item -msdata=use -@opindex msdata=use -Put small global and static data in the small data area, and generate -special instructions to reference them. +@item -mips32 +@opindex mips32 +Equivalent to @option{-march=mips32}. -@item -G @var{num} -@opindex G -@cindex smaller data references -Put global and static objects less than or equal to @var{num} bytes -into the small data or BSS sections instead of the normal data or BSS -sections. The default value of @var{num} is 8. -The @option{-msdata} option must be set to one of @samp{sdata} or @samp{use} -for this option to have any effect. +@item -mips32r3 +@opindex mips32r3 +Equivalent to @option{-march=mips32r3}. -All modules should be compiled with the same @option{-G @var{num}} value. -Compiling with different values of @var{num} may or may not work; if it -doesn't the linker gives an error message---incorrect code is not -generated. +@item -mips32r5 +@opindex mips32r5 +Equivalent to @option{-march=mips32r5}. -@item -mdebug -@opindex mdebug -Makes the M32R-specific code in the compiler display some statistics -that might help in debugging programs. 
+@item -mips32r6 +@opindex mips32r6 +Equivalent to @option{-march=mips32r6}. -@item -malign-loops -@opindex malign-loops -Align all loops to a 32-byte boundary. +@item -mips64 +@opindex mips64 +Equivalent to @option{-march=mips64}. -@item -mno-align-loops -@opindex mno-align-loops -Do not enforce a 32-byte alignment for loops. This is the default. +@item -mips64r2 +@opindex mips64r2 +Equivalent to @option{-march=mips64r2}. -@item -missue-rate=@var{number} -@opindex missue-rate=@var{number} -Issue @var{number} instructions per cycle. @var{number} can only be 1 -or 2. +@item -mips64r3 +@opindex mips64r3 +Equivalent to @option{-march=mips64r3}. -@item -mbranch-cost=@var{number} -@opindex mbranch-cost=@var{number} -@var{number} can only be 1 or 2. If it is 1 then branches are -preferred over conditional code, if it is 2, then the opposite applies. +@item -mips64r5 +@opindex mips64r5 +Equivalent to @option{-march=mips64r5}. -@item -mflush-trap=@var{number} -@opindex mflush-trap=@var{number} -Specifies the trap number to use to flush the cache. The default is -12. Valid numbers are between 0 and 15 inclusive. +@item -mips64r6 +@opindex mips64r6 +Equivalent to @option{-march=mips64r6}. -@item -mno-flush-trap -@opindex mno-flush-trap -Specifies that the cache cannot be flushed by using a trap. +@item -mips16 +@itemx -mno-mips16 +@opindex mips16 +@opindex mno-mips16 +Generate (do not generate) MIPS16 code. If GCC is targeting a +MIPS32 or MIPS64 architecture, it makes use of the MIPS16e ASE@. -@item -mflush-func=@var{name} -@opindex mflush-func=@var{name} -Specifies the name of the operating system function to call to flush -the cache. The default is @samp{_flush_cache}, but a function call -is only used if a trap is not available. +MIPS16 code generation can also be controlled on a per-function basis +by means of @code{mips16} and @code{nomips16} attributes. +@xref{Function Attributes}, for more information. 
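The per-function control mentioned above might look like the following sketch. The `COMPRESSED` and `UNCOMPRESSED` macros and the two functions are hypothetical names for this example; the guard on `__mips__` is there only so that the attributes expand to nothing on other targets.

```c
/* The mips16 and nomips16 attributes exist only on MIPS targets, so this
   sketch hides them behind hypothetical macros that are empty elsewhere.  */
#if defined(__mips__)
# define COMPRESSED   __attribute__ ((mips16))
# define UNCOMPRESSED __attribute__ ((nomips16))
#else
# define COMPRESSED
# define UNCOMPRESSED
#endif

/* Size-sensitive, rarely executed code: a candidate for MIPS16.  */
COMPRESSED int checksum (const unsigned char *buf, int len)
{
  int sum = 0;
  for (int i = 0; i < len; i++)
    sum += buf[i];
  return sum & 0xff;
}

/* Code kept in the standard (uncompressed) ISA encoding, even when the
   file is compiled with -mips16.  */
UNCOMPRESSED int scale (int x)
{
  return x * 3;
}
```

Whether mixing the two encodings in one file is worthwhile depends on the target and ABI; this only shows the attribute syntax.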
-@item -mno-flush-func -@opindex mno-flush-func -Indicates that there is no OS function for flushing the cache. +@item -mflip-mips16 +@opindex mflip-mips16 +Generate MIPS16 code on alternating functions. This option is provided +for regression testing of mixed MIPS16/non-MIPS16 code generation, and is +not intended for ordinary use in compiling user code. -@end table +@item -minterlink-compressed +@itemx -mno-interlink-compressed +@opindex minterlink-compressed +@opindex mno-interlink-compressed +Require (do not require) that code using the standard (uncompressed) MIPS ISA +be link-compatible with MIPS16 and microMIPS code, and vice versa. -@node M680x0 Options -@subsection M680x0 Options -@cindex M680x0 options +For example, code using the standard ISA encoding cannot jump directly +to MIPS16 or microMIPS code; it must either use a call or an indirect jump. +@option{-minterlink-compressed} therefore disables direct jumps unless GCC +knows that the target of the jump is not compressed. -These are the @samp{-m} options defined for M680x0 and ColdFire processors. -The default settings depend on which architecture was selected when -the compiler was configured; the defaults for the most common choices -are given below. +@item -minterlink-mips16 +@itemx -mno-interlink-mips16 +@opindex minterlink-mips16 +@opindex mno-interlink-mips16 +Aliases of @option{-minterlink-compressed} and +@option{-mno-interlink-compressed}. These options predate the microMIPS ASE +and are retained for backwards compatibility. -@table @gcctabopt -@item -march=@var{arch} -@opindex march -Generate code for a specific M680x0 or ColdFire instruction set -architecture. Permissible values of @var{arch} for M680x0 -architectures are: @samp{68000}, @samp{68010}, @samp{68020}, -@samp{68030}, @samp{68040}, @samp{68060} and @samp{cpu32}.
ColdFire -architectures are selected according to Freescale's ISA classification -and the permissible values are: @samp{isaa}, @samp{isaaplus}, -@samp{isab} and @samp{isac}. +@item -mabi=32 +@itemx -mabi=o64 +@itemx -mabi=n32 +@itemx -mabi=64 +@itemx -mabi=eabi +@opindex mabi=32 +@opindex mabi=o64 +@opindex mabi=n32 +@opindex mabi=64 +@opindex mabi=eabi +Generate code for the given ABI@. -GCC defines a macro @code{__mcf@var{arch}__} whenever it is generating -code for a ColdFire target. The @var{arch} in this macro is one of the -@option{-march} arguments given above. +Note that the EABI has a 32-bit and a 64-bit variant. GCC normally +generates 64-bit code when you select a 64-bit architecture, but you +can use @option{-mgp32} to get 32-bit code instead. -When used together, @option{-march} and @option{-mtune} select code -that runs on a family of similar processors but that is optimized -for a particular microarchitecture. +For information about the O64 ABI, see +@uref{http://gcc.gnu.org/@/projects/@/mipso64-abi.html}. -@item -mcpu=@var{cpu} -@opindex mcpu -Generate code for a specific M680x0 or ColdFire processor. -The M680x0 @var{cpu}s are: @samp{68000}, @samp{68010}, @samp{68020}, -@samp{68030}, @samp{68040}, @samp{68060}, @samp{68302}, @samp{68332} -and @samp{cpu32}. The ColdFire @var{cpu}s are given by the table -below, which also classifies the CPUs into families: +GCC supports a variant of the o32 ABI in which floating-point registers +are 64 rather than 32 bits wide. You can select this combination with +@option{-mabi=32} @option{-mfp64}. This ABI relies on the @code{mthc1} +and @code{mfhc1} instructions and is therefore only supported for +MIPS32R2, MIPS32R3 and MIPS32R5 processors. 
-@multitable @columnfractions 0.20 0.80 -@item @strong{Family} @tab @strong{@samp{-mcpu} arguments} -@item @samp{51} @tab @samp{51} @samp{51ac} @samp{51ag} @samp{51cn} @samp{51em} @samp{51je} @samp{51jf} @samp{51jg} @samp{51jm} @samp{51mm} @samp{51qe} @samp{51qm} -@item @samp{5206} @tab @samp{5202} @samp{5204} @samp{5206} -@item @samp{5206e} @tab @samp{5206e} -@item @samp{5208} @tab @samp{5207} @samp{5208} -@item @samp{5211a} @tab @samp{5210a} @samp{5211a} -@item @samp{5213} @tab @samp{5211} @samp{5212} @samp{5213} -@item @samp{5216} @tab @samp{5214} @samp{5216} -@item @samp{52235} @tab @samp{52230} @samp{52231} @samp{52232} @samp{52233} @samp{52234} @samp{52235} -@item @samp{5225} @tab @samp{5224} @samp{5225} -@item @samp{52259} @tab @samp{52252} @samp{52254} @samp{52255} @samp{52256} @samp{52258} @samp{52259} -@item @samp{5235} @tab @samp{5232} @samp{5233} @samp{5234} @samp{5235} @samp{523x} -@item @samp{5249} @tab @samp{5249} -@item @samp{5250} @tab @samp{5250} -@item @samp{5271} @tab @samp{5270} @samp{5271} -@item @samp{5272} @tab @samp{5272} -@item @samp{5275} @tab @samp{5274} @samp{5275} -@item @samp{5282} @tab @samp{5280} @samp{5281} @samp{5282} @samp{528x} -@item @samp{53017} @tab @samp{53011} @samp{53012} @samp{53013} @samp{53014} @samp{53015} @samp{53016} @samp{53017} -@item @samp{5307} @tab @samp{5307} -@item @samp{5329} @tab @samp{5327} @samp{5328} @samp{5329} @samp{532x} -@item @samp{5373} @tab @samp{5372} @samp{5373} @samp{537x} -@item @samp{5407} @tab @samp{5407} -@item @samp{5475} @tab @samp{5470} @samp{5471} @samp{5472} @samp{5473} @samp{5474} @samp{5475} @samp{547x} @samp{5480} @samp{5481} @samp{5482} @samp{5483} @samp{5484} @samp{5485} -@end multitable +The register assignments for arguments and return values remain the +same, but each scalar value is passed in a single 64-bit register +rather than a pair of 32-bit registers. For example, scalar +floating-point values are returned in @samp{$f0} only, not a +@samp{$f0}/@samp{$f1} pair. 
The set of call-saved registers also +remains the same in that the even-numbered double-precision registers +are saved. -@option{-mcpu=@var{cpu}} overrides @option{-march=@var{arch}} if -@var{arch} is compatible with @var{cpu}. Other combinations of -@option{-mcpu} and @option{-march} are rejected. +Two additional variants of the o32 ABI are supported to enable +a transition from 32-bit to 64-bit registers. These are FPXX +(@option{-mfpxx}) and FP64A (@option{-mfp64} @option{-mno-odd-spreg}). +The FPXX extension mandates that all code must execute correctly +when run using 32-bit or 64-bit registers. The code can be interlinked +with either FP32 or FP64, but not both. +The FP64A extension is similar to the FP64 extension but forbids the +use of odd-numbered single-precision registers. This can be used +in conjunction with the @code{FRE} mode of FPUs in MIPS32R5 +processors and allows both FP32 and FP64A code to interlink and +run in the same process without changing FPU modes. -GCC defines the macro @code{__mcf_cpu_@var{cpu}} when ColdFire target -@var{cpu} is selected. It also defines @code{__mcf_family_@var{family}}, -where the value of @var{family} is given by the table above. +@item -mabicalls +@itemx -mno-abicalls +@opindex mabicalls +@opindex mno-abicalls +Generate (do not generate) code that is suitable for SVR4-style +dynamic objects. @option{-mabicalls} is the default for SVR4-based +systems. -@item -mtune=@var{tune} -@opindex mtune -Tune the code for a particular microarchitecture within the -constraints set by @option{-march} and @option{-mcpu}. -The M680x0 microarchitectures are: @samp{68000}, @samp{68010}, -@samp{68020}, @samp{68030}, @samp{68040}, @samp{68060} -and @samp{cpu32}. The ColdFire microarchitectures -are: @samp{cfv1}, @samp{cfv2}, @samp{cfv3}, @samp{cfv4} and @samp{cfv4e}. +@item -mshared +@itemx -mno-shared +Generate (do not generate) code that is fully position-independent, +and that can therefore be linked into shared libraries. 
This option +only affects @option{-mabicalls}. -You can also use @option{-mtune=68020-40} for code that needs -to run relatively well on 68020, 68030 and 68040 targets. -@option{-mtune=68020-60} is similar but includes 68060 targets -as well. These two options select the same tuning decisions as -@option{-m68020-40} and @option{-m68020-60} respectively. +All @option{-mabicalls} code has traditionally been position-independent, +regardless of options like @option{-fPIC} and @option{-fpic}. However, +as an extension, the GNU toolchain allows executables to use absolute +accesses for locally-binding symbols. It can also use shorter GP +initialization sequences and generate direct calls to locally-defined +functions. This mode is selected by @option{-mno-shared}. -GCC defines the macros @code{__mc@var{arch}} and @code{__mc@var{arch}__} -when tuning for 680x0 architecture @var{arch}. It also defines -@code{mc@var{arch}} unless either @option{-ansi} or a non-GNU @option{-std} -option is used. If GCC is tuning for a range of architectures, -as selected by @option{-mtune=68020-40} or @option{-mtune=68020-60}, -it defines the macros for every architecture in the range. +@option{-mno-shared} depends on binutils 2.16 or higher and generates +objects that can only be linked by the GNU linker. However, the option +does not affect the ABI of the final executable; it only affects the ABI +of relocatable objects. Using @option{-mno-shared} generally makes +executables both smaller and quicker. -GCC also defines the macro @code{__m@var{uarch}__} when tuning for -ColdFire microarchitecture @var{uarch}, where @var{uarch} is one -of the arguments given above. +@option{-mshared} is the default. -@item -m68000 -@itemx -mc68000 -@opindex m68000 -@opindex mc68000 -Generate output for a 68000. This is the default -when the compiler is configured for 68000-based systems. -It is equivalent to @option{-march=68000}. 
+@item -mplt +@itemx -mno-plt +@opindex mplt +@opindex mno-plt +Assume (do not assume) that the static and dynamic linkers +support PLTs and copy relocations. This option only affects +@option{-mno-shared -mabicalls}. For the n64 ABI, this option +has no effect without @option{-msym32}. -Use this option for microcontrollers with a 68000 or EC000 core, -including the 68008, 68302, 68306, 68307, 68322, 68328 and 68356. +You can make @option{-mplt} the default by configuring +GCC with @option{--with-mips-plt}. The default is +@option{-mno-plt} otherwise. -@item -m68010 -@opindex m68010 -Generate output for a 68010. This is the default -when the compiler is configured for 68010-based systems. -It is equivalent to @option{-march=68010}. +@item -mxgot +@itemx -mno-xgot +@opindex mxgot +@opindex mno-xgot +Lift (do not lift) the usual restrictions on the size of the global +offset table. -@item -m68020 -@itemx -mc68020 -@opindex m68020 -@opindex mc68020 -Generate output for a 68020. This is the default -when the compiler is configured for 68020-based systems. -It is equivalent to @option{-march=68020}. - -@item -m68030 -@opindex m68030 -Generate output for a 68030. This is the default when the compiler is -configured for 68030-based systems. It is equivalent to -@option{-march=68030}. - -@item -m68040 -@opindex m68040 -Generate output for a 68040. This is the default when the compiler is -configured for 68040-based systems. It is equivalent to -@option{-march=68040}. - -This option inhibits the use of 68881/68882 instructions that have to be -emulated by software on the 68040. Use this option if your 68040 does not -have code to emulate those instructions. - -@item -m68060 -@opindex m68060 -Generate output for a 68060. This is the default when the compiler is -configured for 68060-based systems. It is equivalent to -@option{-march=68060}. - -This option inhibits the use of 68020 and 68881/68882 instructions that -have to be emulated by software on the 68060. 
Use this option if your 68060 -does not have code to emulate those instructions. - -@item -mcpu32 -@opindex mcpu32 -Generate output for a CPU32. This is the default -when the compiler is configured for CPU32-based systems. -It is equivalent to @option{-march=cpu32}. - -Use this option for microcontrollers with a -CPU32 or CPU32+ core, including the 68330, 68331, 68332, 68333, 68334, -68336, 68340, 68341, 68349 and 68360. - -@item -m5200 -@opindex m5200 -Generate output for a 520X ColdFire CPU@. This is the default -when the compiler is configured for 520X-based systems. -It is equivalent to @option{-mcpu=5206}, and is now deprecated -in favor of that option. - -Use this option for microcontroller with a 5200 core, including -the MCF5202, MCF5203, MCF5204 and MCF5206. +GCC normally uses a single instruction to load values from the GOT@. +While this is relatively efficient, it only works if the GOT +is smaller than about 64k. Anything larger causes the linker +to report an error such as: -@item -m5206e -@opindex m5206e -Generate output for a 5206e ColdFire CPU@. The option is now -deprecated in favor of the equivalent @option{-mcpu=5206e}. +@cindex relocation truncated to fit (MIPS) +@smallexample +relocation truncated to fit: R_MIPS_GOT16 foobar +@end smallexample -@item -m528x -@opindex m528x -Generate output for a member of the ColdFire 528X family. -The option is now deprecated in favor of the equivalent -@option{-mcpu=528x}. +If this happens, you should recompile your code with @option{-mxgot}. +This works with very large GOTs, although the code is also +less efficient, since it takes three instructions to fetch the +value of a global symbol. -@item -m5307 -@opindex m5307 -Generate output for a ColdFire 5307 CPU@. The option is now deprecated -in favor of the equivalent @option{-mcpu=5307}. +Note that some linkers can create multiple GOTs. 
If you have such a +linker, you should only need to use @option{-mxgot} when a single object +file accesses more than 64k's worth of GOT entries. Very few do. -@item -m5407 -@opindex m5407 -Generate output for a ColdFire 5407 CPU@. The option is now deprecated -in favor of the equivalent @option{-mcpu=5407}. +These options have no effect unless GCC is generating position +independent code. -@item -mcfv4e -@opindex mcfv4e -Generate output for a ColdFire V4e family CPU (e.g.@: 547x/548x). -This includes use of hardware floating-point instructions. -The option is equivalent to @option{-mcpu=547x}, and is now -deprecated in favor of that option. +@item -mgp32 +@opindex mgp32 +Assume that general-purpose registers are 32 bits wide. -@item -m68020-40 -@opindex m68020-40 -Generate output for a 68040, without using any of the new instructions. -This results in code that can run relatively efficiently on either a -68020/68881 or a 68030 or a 68040. The generated code does use the -68881 instructions that are emulated on the 68040. +@item -mgp64 +@opindex mgp64 +Assume that general-purpose registers are 64 bits wide. -The option is equivalent to @option{-march=68020} @option{-mtune=68020-40}. +@item -mfp32 +@opindex mfp32 +Assume that floating-point registers are 32 bits wide. -@item -m68020-60 -@opindex m68020-60 -Generate output for a 68060, without using any of the new instructions. -This results in code that can run relatively efficiently on either a -68020/68881 or a 68030 or a 68040. The generated code does use the -68881 instructions that are emulated on the 68060. +@item -mfp64 +@opindex mfp64 +Assume that floating-point registers are 64 bits wide. -The option is equivalent to @option{-march=68020} @option{-mtune=68020-60}. +@item -mfpxx +@opindex mfpxx +Do not assume the width of floating-point registers. @item -mhard-float -@itemx -m68881 @opindex mhard-float -@opindex m68881 -Generate floating-point instructions. 
This is the default for 68020 -and above, and for ColdFire devices that have an FPU@. It defines the -macro @code{__HAVE_68881__} on M680x0 targets and @code{__mcffpu__} -on ColdFire targets. +Use floating-point coprocessor instructions. @item -msoft-float @opindex msoft-float -Do not generate floating-point instructions; use library calls instead. -This is the default for 68000, 68010, and 68832 targets. It is also -the default for ColdFire devices that have no FPU. +Do not use floating-point coprocessor instructions. Implement +floating-point calculations using library calls instead. -@item -mdiv -@itemx -mno-div -@opindex mdiv -@opindex mno-div -Generate (do not generate) ColdFire hardware divide and remainder -instructions. If @option{-march} is used without @option{-mcpu}, -the default is ``on'' for ColdFire architectures and ``off'' for M680x0 -architectures. Otherwise, the default is taken from the target CPU -(either the default CPU, or the one specified by @option{-mcpu}). For -example, the default is ``off'' for @option{-mcpu=5206} and ``on'' for -@option{-mcpu=5206e}. +@item -mno-float +@opindex mno-float +Equivalent to @option{-msoft-float}, but additionally asserts that the +program being compiled does not perform any floating-point operations. +This option is presently supported only by some bare-metal MIPS +configurations, where it may select a special set of libraries +that lack all floating-point support (including, for example, the +floating-point @code{printf} formats). +If code compiled with @option{-mno-float} accidentally contains +floating-point operations, it is likely to suffer a link-time +or run-time failure. -GCC defines the macro @code{__mcfhwdiv__} when this option is enabled. +@item -msingle-float +@opindex msingle-float +Assume that the floating-point coprocessor only supports single-precision +operations. -@item -mshort -@opindex mshort -Consider type @code{int} to be 16 bits wide, like @code{short int}. 
-Additionally, parameters passed on the stack are also aligned to a -16-bit boundary even on targets whose API mandates promotion to 32-bit. +@item -mdouble-float +@opindex mdouble-float +Assume that the floating-point coprocessor supports double-precision +operations. This is the default. -@item -mno-short -@opindex mno-short -Do not consider type @code{int} to be 16 bits wide. This is the default. +@item -modd-spreg +@itemx -mno-odd-spreg +@opindex modd-spreg +@opindex mno-odd-spreg +Enable the use of odd-numbered single-precision floating-point registers +for the o32 ABI. This is the default for processors that are known to +support these registers. When using the o32 FPXX ABI, @option{-mno-odd-spreg} +is set by default. -@item -mnobitfield -@itemx -mno-bitfield -@opindex mnobitfield -@opindex mno-bitfield -Do not use the bit-field instructions. The @option{-m68000}, @option{-mcpu32} -and @option{-m5200} options imply @w{@option{-mnobitfield}}. +@item -mabs=2008 +@itemx -mabs=legacy +@opindex mabs=2008 +@opindex mabs=legacy +These options control the treatment of the special not-a-number (NaN) +IEEE 754 floating-point data with the @code{abs.@i{fmt}} and +@code{neg.@i{fmt}} machine instructions. -@item -mbitfield -@opindex mbitfield -Do use the bit-field instructions. The @option{-m68020} option implies -@option{-mbitfield}. This is the default if you use a configuration -designed for a 68020. +By default or when the @option{-mabs=legacy} is used the legacy +treatment is selected. In this case these instructions are considered +arithmetic and avoided where correct operation is required and the +input operand might be a NaN. A longer sequence of instructions that +manipulate the sign bit of floating-point datum manually is used +instead unless the @option{-ffinite-math-only} option has also been +specified. 
-@item -mrtd -@opindex mrtd -Use a different function-calling convention, in which functions -that take a fixed number of arguments return with the @code{rtd} -instruction, which pops their arguments while returning. This -saves one instruction in the caller since there is no need to pop -the arguments there. +The @option{-mabs=2008} option selects the IEEE 754-2008 treatment. In +this case these instructions are considered non-arithmetic and therefore +operating correctly in all cases, including in particular where the +input operand is a NaN. These instructions are therefore always used +for the respective operations. -This calling convention is incompatible with the one normally -used on Unix, so you cannot use it if you need to call libraries -compiled with the Unix compiler. +@item -mnan=2008 +@itemx -mnan=legacy +@opindex mnan=2008 +@opindex mnan=legacy +These options control the encoding of the special not-a-number (NaN) +IEEE 754 floating-point data. -Also, you must provide function prototypes for all functions that -take variable numbers of arguments (including @code{printf}); -otherwise incorrect code is generated for calls to those -functions. +The @option{-mnan=legacy} option selects the legacy encoding. In this +case quiet NaNs (qNaNs) are denoted by the first bit of their trailing +significand field being 0, whereas signalling NaNs (sNaNs) are denoted +by the first bit of their trailing significand field being 1. -In addition, seriously incorrect code results if you call a -function with too many arguments. (Normally, extra arguments are -harmlessly ignored.) +The @option{-mnan=2008} option selects the IEEE 754-2008 encoding. In +this case qNaNs are denoted by the first bit of their trailing +significand field being 1, whereas sNaNs are denoted by the first bit of +their trailing significand field being 0. -The @code{rtd} instruction is supported by the 68010, 68020, 68030, -68040, 68060 and CPU32 processors, but not by the 68000 or 5200. 
+The default is @option{-mnan=legacy} unless GCC has been configured with +@option{--with-nan=2008}. -@item -mno-rtd -@opindex mno-rtd -Do not use the calling conventions selected by @option{-mrtd}. -This is the default. +@item -mllsc +@itemx -mno-llsc +@opindex mllsc +@opindex mno-llsc +Use (do not use) @samp{ll}, @samp{sc}, and @samp{sync} instructions to +implement atomic memory built-in functions. When neither option is +specified, GCC uses the instructions if the target architecture +supports them. -@item -malign-int -@itemx -mno-align-int -@opindex malign-int -@opindex mno-align-int -Control whether GCC aligns @code{int}, @code{long}, @code{long long}, -@code{float}, @code{double}, and @code{long double} variables on a 32-bit -boundary (@option{-malign-int}) or a 16-bit boundary (@option{-mno-align-int}). -Aligning variables on 32-bit boundaries produces code that runs somewhat -faster on processors with 32-bit busses at the expense of more memory. - -@strong{Warning:} if you use the @option{-malign-int} switch, GCC -aligns structures containing the above types differently than -most published application binary interface specifications for the m68k. +@option{-mllsc} is useful if the runtime environment can emulate the +instructions and @option{-mno-llsc} can be useful when compiling for +nonstandard ISAs. You can make either option the default by +configuring GCC with @option{--with-llsc} and @option{--without-llsc} +respectively. @option{--with-llsc} is the default for some +configurations; see the installation documentation for details. -@item -mpcrel -@opindex mpcrel -Use the pc-relative addressing mode of the 68000 directly, instead of -using a global offset table. At present, this option implies @option{-fpic}, -allowing at most a 16-bit offset for pc-relative addressing. @option{-fPIC} is -not presently supported with @option{-mpcrel}, though this could be supported for -68020 and higher processors. 
+@item -mdsp +@itemx -mno-dsp +@opindex mdsp +@opindex mno-dsp +Use (do not use) revision 1 of the MIPS DSP ASE@. +@xref{MIPS DSP Built-in Functions}. This option defines the +preprocessor macro @code{__mips_dsp}. It also defines +@code{__mips_dsp_rev} to 1. -@item -mno-strict-align -@itemx -mstrict-align -@opindex mno-strict-align -@opindex mstrict-align -Do not (do) assume that unaligned memory references are handled by -the system. +@item -mdspr2 +@itemx -mno-dspr2 +@opindex mdspr2 +@opindex mno-dspr2 +Use (do not use) revision 2 of the MIPS DSP ASE@. +@xref{MIPS DSP Built-in Functions}. This option defines the +preprocessor macros @code{__mips_dsp} and @code{__mips_dspr2}. +It also defines @code{__mips_dsp_rev} to 2. -@item -msep-data -Generate code that allows the data segment to be located in a different -area of memory from the text segment. This allows for execute-in-place in -an environment without virtual memory management. This option implies -@option{-fPIC}. +@item -msmartmips +@itemx -mno-smartmips +@opindex msmartmips +@opindex mno-smartmips +Use (do not use) the MIPS SmartMIPS ASE. -@item -mno-sep-data -Generate code that assumes that the data segment follows the text segment. -This is the default. +@item -mpaired-single +@itemx -mno-paired-single +@opindex mpaired-single +@opindex mno-paired-single +Use (do not use) paired-single floating-point instructions. +@xref{MIPS Paired-Single Support}. This option requires +hardware floating-point support to be enabled. -@item -mid-shared-library -Generate code that supports shared libraries via the library ID method. -This allows for execute-in-place and shared libraries in an environment -without virtual memory management. This option implies @option{-fPIC}. +@item -mdmx +@itemx -mno-mdmx +@opindex mdmx +@opindex mno-mdmx +Use (do not use) MIPS Digital Media Extension instructions. +This option can only be used when generating 64-bit code and requires +hardware floating-point support to be enabled. 
-@item -mno-id-shared-library -Generate code that doesn't assume ID-based shared libraries are being used. -This is the default. +@item -mips3d +@itemx -mno-mips3d +@opindex mips3d +@opindex mno-mips3d +Use (do not use) the MIPS-3D ASE@. @xref{MIPS-3D Built-in Functions}. +The option @option{-mips3d} implies @option{-mpaired-single}. -@item -mshared-library-id=n -Specifies the identification number of the ID-based shared library being -compiled. Specifying a value of 0 generates more compact code; specifying -other values forces the allocation of that number to the current -library, but is no more space- or time-efficient than omitting this option. +@item -mmicromips +@itemx -mno-micromips +@opindex mmicromips +@opindex mno-mmicromips +Generate (do not generate) microMIPS code. -@item -mxgot -@itemx -mno-xgot -@opindex mxgot -@opindex mno-xgot -When generating position-independent code for ColdFire, generate code -that works if the GOT has more than 8192 entries. This code is -larger and slower than code generated without this option. On M680x0 -processors, this option is not needed; @option{-fPIC} suffices. +MicroMIPS code generation can also be controlled on a per-function basis +by means of @code{micromips} and @code{nomicromips} attributes. +@xref{Function Attributes}, for more information. -GCC normally uses a single instruction to load values from the GOT@. -While this is relatively efficient, it only works if the GOT -is smaller than about 64k. Anything larger causes the linker -to report an error such as: +@item -mmt +@itemx -mno-mt +@opindex mmt +@opindex mno-mt +Use (do not use) MT Multithreading instructions. -@cindex relocation truncated to fit (ColdFire) -@smallexample -relocation truncated to fit: R_68K_GOT16O foobar -@end smallexample +@item -mmcu +@itemx -mno-mcu +@opindex mmcu +@opindex mno-mcu +Use (do not use) the MIPS MCU ASE instructions. -If this happens, you should recompile your code with @option{-mxgot}. 
-It should then work with very large GOTs. However, code generated with -@option{-mxgot} is less efficient, since it takes 4 instructions to fetch -the value of a global symbol. +@item -meva +@itemx -mno-eva +@opindex meva +@opindex mno-eva +Use (do not use) the MIPS Enhanced Virtual Addressing instructions. -Note that some linkers, including newer versions of the GNU linker, -can create multiple GOTs and sort GOT entries. If you have such a linker, -you should only need to use @option{-mxgot} when compiling a single -object file that accesses more than 8192 GOT entries. Very few do. +@item -mvirt +@itemx -mno-virt +@opindex mvirt +@opindex mno-virt +Use (do not use) the MIPS Virtualization Application Specific instructions. -These options have no effect unless GCC is generating -position-independent code. +@item -mxpa +@itemx -mno-xpa +@opindex mxpa +@opindex mno-xpa +Use (do not use) the MIPS eXtended Physical Address (XPA) instructions. -@end table +@item -mlong64 +@opindex mlong64 +Force @code{long} types to be 64 bits wide. See @option{-mlong32} for +an explanation of the default and the way that the pointer size is +determined. -@node MCore Options -@subsection MCore Options -@cindex MCore options +@item -mlong32 +@opindex mlong32 +Force @code{long}, @code{int}, and pointer types to be 32 bits wide. -These are the @samp{-m} options defined for the Motorola M*Core -processors. +The default size of @code{int}s, @code{long}s and pointers depends on +the ABI@. All the supported ABIs use 32-bit @code{int}s. The n64 ABI +uses 64-bit @code{long}s, as does the 64-bit EABI; the others use +32-bit @code{long}s. Pointers are the same size as @code{long}s, +or the same size as integer registers, whichever is smaller. -@table @gcctabopt +@item -msym32 +@itemx -mno-sym32 +@opindex msym32 +@opindex mno-sym32 +Assume (do not assume) that all symbols have 32-bit values, regardless +of the selected ABI@. 
This option is useful in combination with +@option{-mabi=64} and @option{-mno-abicalls} because it allows GCC +to generate shorter and faster references to symbolic addresses. -@item -mhardlit -@itemx -mno-hardlit -@opindex mhardlit -@opindex mno-hardlit -Inline constants into the code stream if it can be done in two -instructions or less. +@item -G @var{num} +@opindex G +Put definitions of externally-visible data in a small data section +if that data is no bigger than @var{num} bytes. GCC can then generate +more efficient accesses to the data; see @option{-mgpopt} for details. -@item -mdiv -@itemx -mno-div -@opindex mdiv -@opindex mno-div -Use the divide instruction. (Enabled by default). +The default @option{-G} option depends on the configuration. -@item -mrelax-immediate -@itemx -mno-relax-immediate -@opindex mrelax-immediate -@opindex mno-relax-immediate -Allow arbitrary-sized immediates in bit operations. +@item -mlocal-sdata +@itemx -mno-local-sdata +@opindex mlocal-sdata +@opindex mno-local-sdata +Extend (do not extend) the @option{-G} behavior to local data too, +such as to static variables in C@. @option{-mlocal-sdata} is the +default for all configurations. -@item -mwide-bitfields -@itemx -mno-wide-bitfields -@opindex mwide-bitfields -@opindex mno-wide-bitfields -Always treat bit-fields as @code{int}-sized. +If the linker complains that an application is using too much small data, +you might want to try rebuilding the less performance-critical parts with +@option{-mno-local-sdata}. You might also want to build large +libraries with @option{-mno-local-sdata}, so that the libraries leave +more room for the main program. -@item -m4byte-functions -@itemx -mno-4byte-functions -@opindex m4byte-functions -@opindex mno-4byte-functions -Force all functions to be aligned to a 4-byte boundary. 
+@item -mextern-sdata +@itemx -mno-extern-sdata +@opindex mextern-sdata +@opindex mno-extern-sdata +Assume (do not assume) that externally-defined data is in +a small data section if the size of that data is within the @option{-G} limit. +@option{-mextern-sdata} is the default for all configurations. -@item -mcallgraph-data -@itemx -mno-callgraph-data -@opindex mcallgraph-data -@opindex mno-callgraph-data -Emit callgraph information. +If you compile a module @var{Mod} with @option{-mextern-sdata} @option{-G +@var{num}} @option{-mgpopt}, and @var{Mod} references a variable @var{Var} +that is no bigger than @var{num} bytes, you must make sure that @var{Var} +is placed in a small data section. If @var{Var} is defined by another +module, you must either compile that module with a high-enough +@option{-G} setting or attach a @code{section} attribute to @var{Var}'s +definition. If @var{Var} is common, you must link the application +with a high-enough @option{-G} setting. -@item -mslow-bytes -@itemx -mno-slow-bytes -@opindex mslow-bytes -@opindex mno-slow-bytes -Prefer word access when reading byte quantities. +The easiest way of satisfying these restrictions is to compile +and link every module with the same @option{-G} option. However, +you may wish to build a library that supports several different +small data limits. You can do this by compiling the library with +the highest supported @option{-G} setting and additionally using +@option{-mno-extern-sdata} to stop the library from making assumptions +about externally-defined data. -@item -mlittle-endian -@itemx -mbig-endian -@opindex mlittle-endian -@opindex mbig-endian -Generate code for a little-endian target. +@item -mgpopt +@itemx -mno-gpopt +@opindex mgpopt +@opindex mno-gpopt +Use (do not use) GP-relative accesses for symbols that are known to be +in a small data section; see @option{-G}, @option{-mlocal-sdata} and +@option{-mextern-sdata}. @option{-mgpopt} is the default for all +configurations. 
-@item -m210 -@itemx -m340 -@opindex m210 -@opindex m340 -Generate code for the 210 processor. +@option{-mno-gpopt} is useful for cases where the @code{$gp} register +might not hold the value of @code{_gp}. For example, if the code is +part of a library that might be used in a boot monitor, programs that +call boot monitor routines pass an unknown value in @code{$gp}. +(In such situations, the boot monitor itself is usually compiled +with @option{-G0}.) -@item -mno-lsim -@opindex mno-lsim -Assume that runtime support has been provided and so omit the -simulator library (@file{libsim.a)} from the linker command line. +@option{-mno-gpopt} implies @option{-mno-local-sdata} and +@option{-mno-extern-sdata}. -@item -mstack-increment=@var{size} -@opindex mstack-increment -Set the maximum amount for a single stack increment operation. Large -values can increase the speed of programs that contain functions -that need a large amount of stack space, but they can also trigger a -segmentation fault if the stack is extended too much. The default -value is 0x1000. +@item -membedded-data +@itemx -mno-embedded-data +@opindex membedded-data +@opindex mno-embedded-data +Allocate variables to the read-only data section first if possible, then +next in the small data section if possible, otherwise in data. This gives +slightly slower code than the default, but reduces the amount of RAM required +when executing, and thus may be preferred for some embedded systems. -@end table +@item -muninit-const-in-rodata +@itemx -mno-uninit-const-in-rodata +@opindex muninit-const-in-rodata +@opindex mno-uninit-const-in-rodata +Put uninitialized @code{const} variables in the read-only data section. +This option is only meaningful in conjunction with @option{-membedded-data}. -@node MeP Options -@subsection MeP Options -@cindex MeP options +@item -mcode-readable=@var{setting} +@opindex mcode-readable +Specify whether GCC may generate code that reads from executable sections. 
+There are three possible settings: @table @gcctabopt +@item -mcode-readable=yes +Instructions may freely access executable sections. This is the +default setting. -@item -mabsdiff -@opindex mabsdiff -Enables the @code{abs} instruction, which is the absolute difference -between two registers. - -@item -mall-opts -@opindex mall-opts -Enables all the optional instructions---average, multiply, divide, bit -operations, leading zero, absolute difference, min/max, clip, and -saturation. +@item -mcode-readable=pcrel +MIPS16 PC-relative load instructions can access executable sections, +but other instructions must not do so. This option is useful on 4KSc +and 4KSd processors when the code TLBs have the Read Inhibit bit set. +It is also useful on processors that can be configured to have a dual +instruction/data SRAM interface and that, like the M4K, automatically +redirect PC-relative loads to the instruction RAM. +@item -mcode-readable=no +Instructions must not access executable sections. This option can be +useful on targets that are configured to have a dual instruction/data +SRAM interface but that (unlike the M4K) do not automatically redirect +PC-relative loads to the instruction RAM. +@end table -@item -maverage -@opindex maverage -Enables the @code{ave} instruction, which computes the average of two -registers. +@item -msplit-addresses +@itemx -mno-split-addresses +@opindex msplit-addresses +@opindex mno-split-addresses +Enable (disable) use of the @code{%hi()} and @code{%lo()} assembler +relocation operators. This option has been superseded by +@option{-mexplicit-relocs} but is retained for backwards compatibility. -@item -mbased=@var{n} -@opindex mbased= -Variables of size @var{n} bytes or smaller are placed in the -@code{.based} section by default. Based variables use the @code{$tp} -register as a base register, and there is a 128-byte limit to the -@code{.based} section. 
+@item -mexplicit-relocs +@itemx -mno-explicit-relocs +@opindex mexplicit-relocs +@opindex mno-explicit-relocs +Use (do not use) assembler relocation operators when dealing with symbolic +addresses. The alternative, selected by @option{-mno-explicit-relocs}, +is to use assembler macros instead. -@item -mbitops -@opindex mbitops -Enables the bit operation instructions---bit test (@code{btstm}), set -(@code{bsetm}), clear (@code{bclrm}), invert (@code{bnotm}), and -test-and-set (@code{tas}). +@option{-mexplicit-relocs} is the default if GCC was configured +to use an assembler that supports relocation operators. -@item -mc=@var{name} -@opindex mc= -Selects which section constant data is placed in. @var{name} may -be @samp{tiny}, @samp{near}, or @samp{far}. +@item -mcheck-zero-division +@itemx -mno-check-zero-division +@opindex mcheck-zero-division +@opindex mno-check-zero-division +Trap (do not trap) on integer division by zero. -@item -mclip -@opindex mclip -Enables the @code{clip} instruction. Note that @option{-mclip} is not -useful unless you also provide @option{-mminmax}. +The default is @option{-mcheck-zero-division}. -@item -mconfig=@var{name} -@opindex mconfig= -Selects one of the built-in core configurations. Each MeP chip has -one or more modules in it; each module has a core CPU and a variety of -coprocessors, optional instructions, and peripherals. The -@code{MeP-Integrator} tool, not part of GCC, provides these -configurations through this option; using this option is the same as -using all the corresponding command-line options. The default -configuration is @samp{default}. +@item -mdivide-traps +@itemx -mdivide-breaks +@opindex mdivide-traps +@opindex mdivide-breaks +MIPS systems check for division by zero by generating either a +conditional trap or a break instruction. Using traps results in +smaller code, but is only supported on MIPS II and later. 
Also, some +versions of the Linux kernel have a bug that prevents trap from +generating the proper signal (@code{SIGFPE}). Use @option{-mdivide-traps} to +allow conditional traps on architectures that support them and +@option{-mdivide-breaks} to force the use of breaks. -@item -mcop -@opindex mcop -Enables the coprocessor instructions. By default, this is a 32-bit -coprocessor. Note that the coprocessor is normally enabled via the -@option{-mconfig=} option. +The default is usually @option{-mdivide-traps}, but this can be +overridden at configure time using @option{--with-divide=breaks}. +Divide-by-zero checks can be completely disabled using +@option{-mno-check-zero-division}. -@item -mcop32 -@opindex mcop32 -Enables the 32-bit coprocessor's instructions. +@item -mmemcpy +@itemx -mno-memcpy +@opindex mmemcpy +@opindex mno-memcpy +Force (do not force) the use of @code{memcpy} for non-trivial block +moves. The default is @option{-mno-memcpy}, which allows GCC to inline +most constant-sized copies. -@item -mcop64 -@opindex mcop64 -Enables the 64-bit coprocessor's instructions. +@item -mlong-calls +@itemx -mno-long-calls +@opindex mlong-calls +@opindex mno-long-calls +Disable (do not disable) use of the @code{jal} instruction. Calling +functions using @code{jal} is more efficient but requires the caller +and callee to be in the same 256 megabyte segment. -@item -mivc2 -@opindex mivc2 -Enables IVC2 scheduling. IVC2 is a 64-bit VLIW coprocessor. +This option has no effect on abicalls code. The default is +@option{-mno-long-calls}. -@item -mdc -@opindex mdc -Causes constant variables to be placed in the @code{.near} section. +@item -mmad +@itemx -mno-mad +@opindex mmad +@opindex mno-mad +Enable (disable) use of the @code{mad}, @code{madu} and @code{mul} +instructions, as provided by the R4650 ISA@. -@item -mdiv -@opindex mdiv -Enables the @code{div} and @code{divu} instructions. 
+@item -mimadd +@itemx -mno-imadd +@opindex mimadd +@opindex mno-imadd +Enable (disable) use of the @code{madd} and @code{msub} integer +instructions. The default is @option{-mimadd} on architectures +that support @code{madd} and @code{msub} except for the 74k +architecture where it was found to generate slower code. -@item -meb -@opindex meb -Generate big-endian code. +@item -mfused-madd +@itemx -mno-fused-madd +@opindex mfused-madd +@opindex mno-fused-madd +Enable (disable) use of the floating-point multiply-accumulate +instructions, when they are available. The default is +@option{-mfused-madd}. -@item -mel -@opindex mel -Generate little-endian code. +On the R8000 CPU when multiply-accumulate instructions are used, +the intermediate product is calculated to infinite precision +and is not subject to the FCSR Flush to Zero bit. This may be +undesirable in some circumstances. On other processors the result +is numerically identical to the equivalent computation using +separate multiply, add, subtract and negate instructions. -@item -mio-volatile -@opindex mio-volatile -Tells the compiler that any variable marked with the @code{io} -attribute is to be considered volatile. +@item -nocpp +@opindex nocpp +Tell the MIPS assembler to not run its preprocessor over user +assembler files (with a @samp{.s} suffix) when assembling them. -@item -ml -@opindex ml -Causes variables to be assigned to the @code{.far} section by default. +@item -mfix-24k +@item -mno-fix-24k +@opindex mfix-24k +@opindex mno-fix-24k +Work around the 24K E48 (lost data on stores during refill) errata. +The workarounds are implemented by the assembler rather than by GCC@. -@item -mleadz -@opindex mleadz -Enables the @code{leadz} (leading zero) instruction. 
+@item -mfix-r4000 +@itemx -mno-fix-r4000 +@opindex mfix-r4000 +@opindex mno-fix-r4000 +Work around certain R4000 CPU errata: +@itemize @minus +@item +A double-word or a variable shift may give an incorrect result if executed +immediately after starting an integer division. +@item +A double-word or a variable shift may give an incorrect result if executed +while an integer multiplication is in progress. +@item +An integer division may give an incorrect result if started in a delay slot +of a taken branch or a jump. +@end itemize -@item -mm -@opindex mm -Causes variables to be assigned to the @code{.near} section by default. - -@item -mminmax -@opindex mminmax -Enables the @code{min} and @code{max} instructions. +@item -mfix-r4400 +@itemx -mno-fix-r4400 +@opindex mfix-r4400 +@opindex mno-fix-r4400 +Work around certain R4400 CPU errata: +@itemize @minus +@item +A double-word or a variable shift may give an incorrect result if executed +immediately after starting an integer division. +@end itemize -@item -mmult -@opindex mmult -Enables the multiplication and multiply-accumulate instructions. +@item -mfix-r10000 +@itemx -mno-fix-r10000 +@opindex mfix-r10000 +@opindex mno-fix-r10000 +Work around certain R10000 errata: +@itemize @minus +@item +@code{ll}/@code{sc} sequences may not behave atomically on revisions +prior to 3.0. They may deadlock on revisions 2.6 and earlier. +@end itemize -@item -mno-opts -@opindex mno-opts -Disables all the optional instructions enabled by @option{-mall-opts}. +This option can only be used if the target architecture supports +branch-likely instructions. @option{-mfix-r10000} is the default when +@option{-march=r10000} is used; @option{-mno-fix-r10000} is the default +otherwise. -@item -mrepeat -@opindex mrepeat -Enables the @code{repeat} and @code{erepeat} instructions, used for -low-overhead looping. +@item -mfix-rm7000 +@itemx -mno-fix-rm7000 +@opindex mfix-rm7000 +Work around the RM7000 @code{dmult}/@code{dmultu} errata. 
The +workarounds are implemented by the assembler rather than by GCC@. -@item -ms -@opindex ms -Causes all variables to default to the @code{.tiny} section. Note -that there is a 65536-byte limit to this section. Accesses to these -variables use the @code{%gp} base register. +@item -mfix-vr4120 +@itemx -mno-fix-vr4120 +@opindex mfix-vr4120 +Work around certain VR4120 errata: +@itemize @minus +@item +@code{dmultu} does not always produce the correct result. +@item +@code{div} and @code{ddiv} do not always produce the correct result if one +of the operands is negative. +@end itemize +The workarounds for the division errata rely on special functions in +@file{libgcc.a}. At present, these functions are only provided by +the @code{mips64vr*-elf} configurations. -@item -msatur -@opindex msatur -Enables the saturation instructions. Note that the compiler does not -currently generate these itself, but this option is included for -compatibility with other tools, like @code{as}. +Other VR4120 errata require a NOP to be inserted between certain pairs of +instructions. These errata are handled by the assembler, not by GCC itself. -@item -msdram -@opindex msdram -Link the SDRAM-based runtime instead of the default ROM-based runtime. +@item -mfix-vr4130 +@opindex mfix-vr4130 +Work around the VR4130 @code{mflo}/@code{mfhi} errata. The +workarounds are implemented by the assembler rather than by GCC, +although GCC avoids using @code{mflo} and @code{mfhi} if the +VR4130 @code{macc}, @code{macchi}, @code{dmacc} and @code{dmacchi} +instructions are available instead. -@item -msim -@opindex msim -Link the simulator run-time libraries. +@item -mfix-sb1 +@itemx -mno-fix-sb1 +@opindex mfix-sb1 +Work around certain SB-1 CPU core errata. +(This flag currently works around the SB-1 revision 2 +``F1'' and ``F2'' floating-point errata.) -@item -msimnovec -@opindex msimnovec -Link the simulator runtime libraries, excluding built-in support -for reset and exception vectors and tables. 
+@item -mr10k-cache-barrier=@var{setting} +@opindex mr10k-cache-barrier +Specify whether GCC should insert cache barriers to avoid the +side-effects of speculation on R10K processors. -@item -mtf -@opindex mtf -Causes all functions to default to the @code{.far} section. Without -this option, functions default to the @code{.near} section. +In common with many processors, the R10K tries to predict the outcome +of a conditional branch and speculatively executes instructions from +the ``taken'' branch. It later aborts these instructions if the +predicted outcome is wrong. However, on the R10K, even aborted +instructions can have side effects. -@item -mtiny=@var{n} -@opindex mtiny= -Variables that are @var{n} bytes or smaller are allocated to the -@code{.tiny} section. These variables use the @code{$gp} base -register. The default for this option is 4, but note that there's a -65536-byte limit to the @code{.tiny} section. +This problem only affects kernel stores and, depending on the system, +kernel loads. As an example, a speculatively-executed store may load +the target memory into cache and mark the cache line as dirty, even if +the store itself is later aborted. If a DMA operation writes to the +same area of memory before the ``dirty'' line is flushed, the cached +data overwrites the DMA-ed data. See the R10K processor manual +for a full description, including other potential problems. -@end table +One workaround is to insert cache barrier instructions before every memory +access that might be speculatively executed and that might have side +effects even if aborted. @option{-mr10k-cache-barrier=@var{setting}} +controls GCC's implementation of this workaround. 
It assumes that +aborted accesses to any byte in the following regions does not have +side effects: -@node MicroBlaze Options -@subsection MicroBlaze Options -@cindex MicroBlaze Options +@enumerate +@item +the memory occupied by the current function's stack frame; -@table @gcctabopt +@item +the memory occupied by an incoming stack argument; -@item -msoft-float -@opindex msoft-float -Use software emulation for floating point (default). +@item +the memory occupied by an object with a link-time-constant address. +@end enumerate -@item -mhard-float -@opindex mhard-float -Use hardware floating-point instructions. +It is the kernel's responsibility to ensure that speculative +accesses to these regions are indeed safe. -@item -mmemcpy -@opindex mmemcpy -Do not optimize block moves, use @code{memcpy}. +If the input program contains a function declaration such as: -@item -mno-clearbss -@opindex mno-clearbss -This option is deprecated. Use @option{-fno-zero-initialized-in-bss} instead. +@smallexample +void foo (void); +@end smallexample -@item -mcpu=@var{cpu-type} -@opindex mcpu= -Use features of, and schedule code for, the given CPU. -Supported values are in the format @samp{v@var{X}.@var{YY}.@var{Z}}, -where @var{X} is a major version, @var{YY} is the minor version, and -@var{Z} is compatibility code. Example values are @samp{v3.00.a}, -@samp{v4.00.b}, @samp{v5.00.a}, @samp{v5.00.b}, @samp{v5.00.b}, @samp{v6.00.a}. +then the implementation of @code{foo} must allow @code{j foo} and +@code{jal foo} to be executed speculatively. GCC honors this +restriction for functions it compiles itself. It expects non-GCC +functions (such as hand-written assembly code) to do the same. -@item -mxl-soft-mul -@opindex mxl-soft-mul -Use software multiply emulation (default). +The option has three forms: -@item -mxl-soft-div -@opindex mxl-soft-div -Use software emulation for divides (default). 
+@table @gcctabopt +@item -mr10k-cache-barrier=load-store +Insert a cache barrier before a load or store that might be +speculatively executed and that might have side effects even +if aborted. -@item -mxl-barrel-shift -@opindex mxl-barrel-shift -Use the hardware barrel shifter. +@item -mr10k-cache-barrier=store +Insert a cache barrier before a store that might be speculatively +executed and that might have side effects even if aborted. -@item -mxl-pattern-compare -@opindex mxl-pattern-compare -Use pattern compare instructions. +@item -mr10k-cache-barrier=none +Disable the insertion of cache barriers. This is the default setting. +@end table -@item -msmall-divides -@opindex msmall-divides -Use table lookup optimization for small signed integer divisions. +@item -mflush-func=@var{func} +@itemx -mno-flush-func +@opindex mflush-func +Specifies the function to call to flush the I and D caches, or to not +call any such function. If called, the function must take the same +arguments as the common @code{_flush_func}, that is, the address of the +memory range for which the cache is being flushed, the size of the +memory range, and the number 3 (to flush both caches). The default +depends on the target GCC was configured for, but commonly is either +@code{_flush_func} or @code{__cpu_flush}. -@item -mxl-stack-check -@opindex mxl-stack-check -This option is deprecated. Use @option{-fstack-check} instead. +@item mbranch-cost=@var{num} +@opindex mbranch-cost +Set the cost of branches to roughly @var{num} ``simple'' instructions. +This cost is only a heuristic and is not guaranteed to produce +consistent results across releases. A zero cost redundantly selects +the default, which is based on the @option{-mtune} setting. -@item -mxl-gp-opt -@opindex mxl-gp-opt -Use GP-relative @code{.sdata}/@code{.sbss} sections. 
+@item -mbranch-likely +@itemx -mno-branch-likely +@opindex mbranch-likely +@opindex mno-branch-likely +Enable or disable use of Branch Likely instructions, regardless of the +default for the selected architecture. By default, Branch Likely +instructions may be generated if they are supported by the selected +architecture. An exception is for the MIPS32 and MIPS64 architectures +and processors that implement those architectures; for those, Branch +Likely instructions are not be generated by default because the MIPS32 +and MIPS64 architectures specifically deprecate their use. -@item -mxl-multiply-high -@opindex mxl-multiply-high -Use multiply high instructions for high part of 32x32 multiply. +@item -mfp-exceptions +@itemx -mno-fp-exceptions +@opindex mfp-exceptions +Specifies whether FP exceptions are enabled. This affects how +FP instructions are scheduled for some processors. +The default is that FP exceptions are +enabled. -@item -mxl-float-convert -@opindex mxl-float-convert -Use hardware floating-point conversion instructions. +For instance, on the SB-1, if FP exceptions are disabled, and we are emitting +64-bit code, then we can use both FP pipes. Otherwise, we can only use one +FP pipe. -@item -mxl-float-sqrt -@opindex mxl-float-sqrt -Use hardware floating-point square root instruction. +@item -mvr4130-align +@itemx -mno-vr4130-align +@opindex mvr4130-align +The VR4130 pipeline is two-way superscalar, but can only issue two +instructions together if the first one is 8-byte aligned. When this +option is enabled, GCC aligns pairs of instructions that it +thinks should execute in parallel. -@item -mbig-endian -@opindex mbig-endian -Generate code for a big-endian target. +This option only has an effect when optimizing for the VR4130. +It normally makes code faster, but at the expense of making it bigger. +It is enabled by default at optimization level @option{-O3}. -@item -mlittle-endian -@opindex mlittle-endian -Generate code for a little-endian target. 
+@item -msynci +@itemx -mno-synci +@opindex msynci +Enable (disable) generation of @code{synci} instructions on +architectures that support it. The @code{synci} instructions (if +enabled) are generated when @code{__builtin___clear_cache} is +compiled. -@item -mxl-reorder -@opindex mxl-reorder -Use reorder instructions (swap and byte reversed load/store). +This option defaults to @option{-mno-synci}, but the default can be +overridden by configuring GCC with @option{--with-synci}. -@item -mxl-mode-@var{app-model} -Select application model @var{app-model}. Valid models are -@table @samp -@item executable -normal executable (default), uses startup code @file{crt0.o}. +When compiling code for single processor systems, it is generally safe +to use @code{synci}. However, on many multi-core (SMP) systems, it +does not invalidate the instruction caches on all cores and may lead +to undefined behavior. -@item xmdstub -for use with Xilinx Microprocessor Debugger (XMD) based -software intrusive debug agent called xmdstub. This uses startup file -@file{crt1.o} and sets the start address of the program to 0x800. +@item -mrelax-pic-calls +@itemx -mno-relax-pic-calls +@opindex mrelax-pic-calls +Try to turn PIC calls that are normally dispatched via register +@code{$25} into direct calls. This is only possible if the linker can +resolve the destination at link-time and if the destination is within +range for a direct call. -@item bootstrap -for applications that are loaded using a bootloader. -This model uses startup file @file{crt2.o} which does not contain a processor -reset vector handler. This is suitable for transferring control on a -processor reset to the bootloader rather than the application. +@option{-mrelax-pic-calls} is the default if GCC was configured to use +an assembler and a linker that support the @code{.reloc} assembly +directive and @option{-mexplicit-relocs} is in effect. 
With +@option{-mno-explicit-relocs}, this optimization can be performed by the +assembler and the linker alone without help from the compiler. -@item novectors -for applications that do not require any of the -MicroBlaze vectors. This option may be useful for applications running -within a monitoring application. This model uses @file{crt3.o} as a startup file. -@end table +@item -mmcount-ra-address +@itemx -mno-mcount-ra-address +@opindex mmcount-ra-address +@opindex mno-mcount-ra-address +Emit (do not emit) code that allows @code{_mcount} to modify the +calling function's return address. When enabled, this option extends +the usual @code{_mcount} interface with a new @var{ra-address} +parameter, which has type @code{intptr_t *} and is passed in register +@code{$12}. @code{_mcount} can then modify the return address by +doing both of the following: +@itemize +@item +Returning the new address in register @code{$31}. +@item +Storing the new address in @code{*@var{ra-address}}, +if @var{ra-address} is nonnull. +@end itemize -Option @option{-xl-mode-@var{app-model}} is a deprecated alias for -@option{-mxl-mode-@var{app-model}}. +The default is @option{-mno-mcount-ra-address}. @end table -@node MIPS Options -@subsection MIPS Options -@cindex MIPS options +@node MMIX Options +@subsection MMIX Options +@cindex MMIX Options + +These options are defined for the MMIX: @table @gcctabopt +@item -mlibfuncs +@itemx -mno-libfuncs +@opindex mlibfuncs +@opindex mno-libfuncs +Specify that intrinsic library functions are being compiled, passing all +values in registers, no matter the size. -@item -EB -@opindex EB -Generate big-endian code. +@item -mepsilon +@itemx -mno-epsilon +@opindex mepsilon +@opindex mno-epsilon +Generate floating-point comparison instructions that compare with respect +to the @code{rE} epsilon register. -@item -EL -@opindex EL -Generate little-endian code. This is the default for @samp{mips*el-*-*} -configurations. 
+@item -mabi=mmixware +@itemx -mabi=gnu +@opindex mabi=mmixware +@opindex mabi=gnu +Generate code that passes function parameters and return values that (in +the called function) are seen as registers @code{$0} and up, as opposed to +the GNU ABI which uses global registers @code{$231} and up. -@item -march=@var{arch} -@opindex march -Generate code that runs on @var{arch}, which can be the name of a -generic MIPS ISA, or the name of a particular processor. -The ISA names are: -@samp{mips1}, @samp{mips2}, @samp{mips3}, @samp{mips4}, -@samp{mips32}, @samp{mips32r2}, @samp{mips32r3}, @samp{mips32r5}, -@samp{mips32r6}, @samp{mips64}, @samp{mips64r2}, @samp{mips64r3}, -@samp{mips64r5} and @samp{mips64r6}. -The processor names are: -@samp{4kc}, @samp{4km}, @samp{4kp}, @samp{4ksc}, -@samp{4kec}, @samp{4kem}, @samp{4kep}, @samp{4ksd}, -@samp{5kc}, @samp{5kf}, -@samp{20kc}, -@samp{24kc}, @samp{24kf2_1}, @samp{24kf1_1}, -@samp{24kec}, @samp{24kef2_1}, @samp{24kef1_1}, -@samp{34kc}, @samp{34kf2_1}, @samp{34kf1_1}, @samp{34kn}, -@samp{74kc}, @samp{74kf2_1}, @samp{74kf1_1}, @samp{74kf3_2}, -@samp{1004kc}, @samp{1004kf2_1}, @samp{1004kf1_1}, -@samp{loongson2e}, @samp{loongson2f}, @samp{loongson3a}, -@samp{m4k}, -@samp{m14k}, @samp{m14kc}, @samp{m14ke}, @samp{m14kec}, -@samp{octeon}, @samp{octeon+}, @samp{octeon2}, @samp{octeon3}, -@samp{orion}, -@samp{p5600}, -@samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400}, -@samp{r4600}, @samp{r4650}, @samp{r4700}, @samp{r6000}, @samp{r8000}, -@samp{rm7000}, @samp{rm9000}, -@samp{r10000}, @samp{r12000}, @samp{r14000}, @samp{r16000}, -@samp{sb1}, -@samp{sr71000}, -@samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300}, -@samp{vr5000}, @samp{vr5400}, @samp{vr5500}, -@samp{xlr} and @samp{xlp}. -The special value @samp{from-abi} selects the -most compatible architecture for the selected ABI (that is, -@samp{mips1} for 32-bit ABIs and @samp{mips3} for 64-bit ABIs)@. 
+@item -mzero-extend +@itemx -mno-zero-extend +@opindex mzero-extend +@opindex mno-zero-extend +When reading data from memory in sizes shorter than 64 bits, use (do not +use) zero-extending load instructions by default, rather than +sign-extending ones. -The native Linux/GNU toolchain also supports the value @samp{native}, -which selects the best architecture option for the host processor. -@option{-march=native} has no effect if GCC does not recognize -the processor. +@item -mknuthdiv +@itemx -mno-knuthdiv +@opindex mknuthdiv +@opindex mno-knuthdiv +Make the result of a division yielding a remainder have the same sign as +the divisor. With the default, @option{-mno-knuthdiv}, the sign of the +remainder follows the sign of the dividend. Both methods are +arithmetically valid, the latter being almost exclusively used. -In processor names, a final @samp{000} can be abbreviated as @samp{k} -(for example, @option{-march=r2k}). Prefixes are optional, and -@samp{vr} may be written @samp{r}. +@item -mtoplevel-symbols +@itemx -mno-toplevel-symbols +@opindex mtoplevel-symbols +@opindex mno-toplevel-symbols +Prepend (do not prepend) a @samp{:} to all global symbols, so the assembly +code can be used with the @code{PREFIX} assembly directive. -Names of the form @samp{@var{n}f2_1} refer to processors with -FPUs clocked at half the rate of the core, names of the form -@samp{@var{n}f1_1} refer to processors with FPUs clocked at the same -rate as the core, and names of the form @samp{@var{n}f3_2} refer to -processors with FPUs clocked a ratio of 3:2 with respect to the core. -For compatibility reasons, @samp{@var{n}f} is accepted as a synonym -for @samp{@var{n}f2_1} while @samp{@var{n}x} and @samp{@var{b}fx} are -accepted as synonyms for @samp{@var{n}f1_1}. +@item -melf +@opindex melf +Generate an executable in the ELF format, rather than the default +@samp{mmo} format used by the @command{mmix} simulator. -GCC defines two macros based on the value of this option. 
The first -is @code{_MIPS_ARCH}, which gives the name of target architecture, as -a string. The second has the form @code{_MIPS_ARCH_@var{foo}}, -where @var{foo} is the capitalized value of @code{_MIPS_ARCH}@. -For example, @option{-march=r2000} sets @code{_MIPS_ARCH} -to @code{"r2000"} and defines the macro @code{_MIPS_ARCH_R2000}. +@item -mbranch-predict +@itemx -mno-branch-predict +@opindex mbranch-predict +@opindex mno-branch-predict +Use (do not use) the probable-branch instructions, when static branch +prediction indicates a probable branch. -Note that the @code{_MIPS_ARCH} macro uses the processor names given -above. In other words, it has the full prefix and does not -abbreviate @samp{000} as @samp{k}. In the case of @samp{from-abi}, -the macro names the resolved architecture (either @code{"mips1"} or -@code{"mips3"}). It names the default architecture when no -@option{-march} option is given. +@item -mbase-addresses +@itemx -mno-base-addresses +@opindex mbase-addresses +@opindex mno-base-addresses +Generate (do not generate) code that uses @emph{base addresses}. Using a +base address automatically generates a request (handled by the assembler +and the linker) for a constant to be set up in a global register. The +register is used for one or more base address requests within the range 0 +to 255 from the value held in the register. The generally leads to short +and fast code, but the number of different data items that can be +addressed is limited. This means that a program that uses lots of static +data may require @option{-mno-base-addresses}. -@item -mtune=@var{arch} -@opindex mtune -Optimize for @var{arch}. Among other things, this option controls -the way instructions are scheduled, and the perceived cost of arithmetic -operations. The list of @var{arch} values is the same as for -@option{-march}. 
+@item -msingle-exit +@itemx -mno-single-exit +@opindex msingle-exit +@opindex mno-single-exit +Force (do not force) generated code to have a single exit point in each +function. +@end table -When this option is not used, GCC optimizes for the processor -specified by @option{-march}. By using @option{-march} and -@option{-mtune} together, it is possible to generate code that -runs on a family of processors, but optimize the code for one -particular member of that family. +@node MN10300 Options +@subsection MN10300 Options +@cindex MN10300 options -@option{-mtune} defines the macros @code{_MIPS_TUNE} and -@code{_MIPS_TUNE_@var{foo}}, which work in the same way as the -@option{-march} ones described above. +These @option{-m} options are defined for Matsushita MN10300 architectures: -@item -mips1 -@opindex mips1 -Equivalent to @option{-march=mips1}. +@table @gcctabopt +@item -mmult-bug +@opindex mmult-bug +Generate code to avoid bugs in the multiply instructions for the MN10300 +processors. This is the default. -@item -mips2 -@opindex mips2 -Equivalent to @option{-march=mips2}. +@item -mno-mult-bug +@opindex mno-mult-bug +Do not generate code to avoid bugs in the multiply instructions for the +MN10300 processors. -@item -mips3 -@opindex mips3 -Equivalent to @option{-march=mips3}. +@item -mam33 +@opindex mam33 +Generate code using features specific to the AM33 processor. -@item -mips4 -@opindex mips4 -Equivalent to @option{-march=mips4}. +@item -mno-am33 +@opindex mno-am33 +Do not generate code using features specific to the AM33 processor. This +is the default. -@item -mips32 -@opindex mips32 -Equivalent to @option{-march=mips32}. +@item -mam33-2 +@opindex mam33-2 +Generate code using features specific to the AM33/2.0 processor. -@item -mips32r3 -@opindex mips32r3 -Equivalent to @option{-march=mips32r3}. +@item -mam34 +@opindex mam34 +Generate code using features specific to the AM34 processor. 
-@item -mips32r5 -@opindex mips32r5 -Equivalent to @option{-march=mips32r5}. +@item -mtune=@var{cpu-type} +@opindex mtune +Use the timing characteristics of the indicated CPU type when +scheduling instructions. This does not change the targeted processor +type. The CPU type must be one of @samp{mn10300}, @samp{am33}, +@samp{am33-2} or @samp{am34}. -@item -mips32r6 -@opindex mips32r6 -Equivalent to @option{-march=mips32r6}. +@item -mreturn-pointer-on-d0 +@opindex mreturn-pointer-on-d0 +When generating a function that returns a pointer, return the pointer +in both @code{a0} and @code{d0}. Otherwise, the pointer is returned +only in @code{a0}, and attempts to call such functions without a prototype +result in errors. Note that this option is on by default; use +@option{-mno-return-pointer-on-d0} to disable it. -@item -mips64 -@opindex mips64 -Equivalent to @option{-march=mips64}. +@item -mno-crt0 +@opindex mno-crt0 +Do not link in the C run-time initialization object file. -@item -mips64r2 -@opindex mips64r2 -Equivalent to @option{-march=mips64r2}. +@item -mrelax +@opindex mrelax +Indicate to the linker that it should perform a relaxation optimization pass +to shorten branches, calls and absolute memory addresses. This option only +has an effect when used on the command line for the final link step. -@item -mips64r3 -@opindex mips64r3 -Equivalent to @option{-march=mips64r3}. +This option makes symbolic debugging impossible. -@item -mips64r5 -@opindex mips64r5 -Equivalent to @option{-march=mips64r5}. +@item -mliw +@opindex mliw +Allow the compiler to generate @emph{Long Instruction Word} +instructions if the target is the @samp{AM33} or later. This is the +default. This option defines the preprocessor macro @code{__LIW__}. -@item -mips64r6 -@opindex mips64r6 -Equivalent to @option{-march=mips64r6}. +@item -mnoliw +@opindex mnoliw +Do not allow the compiler to generate @emph{Long Instruction Word} +instructions. 
This option defines the preprocessor macro +@code{__NO_LIW__}. -@item -mips16 -@itemx -mno-mips16 -@opindex mips16 -@opindex mno-mips16 -Generate (do not generate) MIPS16 code. If GCC is targeting a -MIPS32 or MIPS64 architecture, it makes use of the MIPS16e ASE@. +@item -msetlb +@opindex msetlb +Allow the compiler to generate the @emph{SETLB} and @emph{Lcc} +instructions if the target is the @samp{AM33} or later. This is the +default. This option defines the preprocessor macro @code{__SETLB__}. -MIPS16 code generation can also be controlled on a per-function basis -by means of @code{mips16} and @code{nomips16} attributes. -@xref{Function Attributes}, for more information. +@item -mnosetlb +@opindex mnosetlb +Do not allow the compiler to generate @emph{SETLB} or @emph{Lcc} +instructions. This option defines the preprocessor macro +@code{__NO_SETLB__}. -@item -mflip-mips16 -@opindex mflip-mips16 -Generate MIPS16 code on alternating functions. This option is provided -for regression testing of mixed MIPS16/non-MIPS16 code generation, and is -not intended for ordinary use in compiling user code. +@end table -@item -minterlink-compressed -@item -mno-interlink-compressed -@opindex minterlink-compressed -@opindex mno-interlink-compressed -Require (do not require) that code using the standard (uncompressed) MIPS ISA -be link-compatible with MIPS16 and microMIPS code, and vice versa. +@node Moxie Options +@subsection Moxie Options +@cindex Moxie Options -For example, code using the standard ISA encoding cannot jump directly -to MIPS16 or microMIPS code; it must either use a call or an indirect jump. -@option{-minterlink-compressed} therefore disables direct jumps unless GCC -knows that the target of the jump is not compressed. +@table @gcctabopt -@item -minterlink-mips16 -@itemx -mno-interlink-mips16 -@opindex minterlink-mips16 -@opindex mno-interlink-mips16 -Aliases of @option{-minterlink-compressed} and -@option{-mno-interlink-compressed}. 
These options predate the microMIPS ASE -and are retained for backwards compatibility. +@item -meb +@opindex meb +Generate big-endian code. This is the default for @samp{moxie-*-*} +configurations. -@item -mabi=32 -@itemx -mabi=o64 -@itemx -mabi=n32 -@itemx -mabi=64 -@itemx -mabi=eabi -@opindex mabi=32 -@opindex mabi=o64 -@opindex mabi=n32 -@opindex mabi=64 -@opindex mabi=eabi -Generate code for the given ABI@. +@item -mel +@opindex mel +Generate little-endian code. -Note that the EABI has a 32-bit and a 64-bit variant. GCC normally -generates 64-bit code when you select a 64-bit architecture, but you -can use @option{-mgp32} to get 32-bit code instead. +@item -mmul.x +@opindex mmul.x +Generate mul.x and umul.x instructions. This is the default for +@samp{moxiebox-*-*} configurations. -For information about the O64 ABI, see -@uref{http://gcc.gnu.org/@/projects/@/mipso64-abi.html}. +@item -mno-crt0 +@opindex mno-crt0 +Do not link in the C run-time initialization object file. -GCC supports a variant of the o32 ABI in which floating-point registers -are 64 rather than 32 bits wide. You can select this combination with -@option{-mabi=32} @option{-mfp64}. This ABI relies on the @code{mthc1} -and @code{mfhc1} instructions and is therefore only supported for -MIPS32R2, MIPS32R3 and MIPS32R5 processors. +@end table -The register assignments for arguments and return values remain the -same, but each scalar value is passed in a single 64-bit register -rather than a pair of 32-bit registers. For example, scalar -floating-point values are returned in @samp{$f0} only, not a -@samp{$f0}/@samp{$f1} pair. The set of call-saved registers also -remains the same in that the even-numbered double-precision registers -are saved. +@node MSP430 Options +@subsection MSP430 Options +@cindex MSP430 Options -Two additional variants of the o32 ABI are supported to enable -a transition from 32-bit to 64-bit registers. 
These are FPXX -(@option{-mfpxx}) and FP64A (@option{-mfp64} @option{-mno-odd-spreg}). -The FPXX extension mandates that all code must execute correctly -when run using 32-bit or 64-bit registers. The code can be interlinked -with either FP32 or FP64, but not both. -The FP64A extension is similar to the FP64 extension but forbids the -use of odd-numbered single-precision registers. This can be used -in conjunction with the @code{FRE} mode of FPUs in MIPS32R5 -processors and allows both FP32 and FP64A code to interlink and -run in the same process without changing FPU modes. +These options are defined for the MSP430: -@item -mabicalls -@itemx -mno-abicalls -@opindex mabicalls -@opindex mno-abicalls -Generate (do not generate) code that is suitable for SVR4-style -dynamic objects. @option{-mabicalls} is the default for SVR4-based -systems. +@table @gcctabopt -@item -mshared -@itemx -mno-shared -Generate (do not generate) code that is fully position-independent, -and that can therefore be linked into shared libraries. This option -only affects @option{-mabicalls}. +@item -masm-hex +@opindex masm-hex +Force assembly output to always use hex constants. Normally such +constants are signed decimals, but this option is available for +testsuite and/or aesthetic purposes. -All @option{-mabicalls} code has traditionally been position-independent, -regardless of options like @option{-fPIC} and @option{-fpic}. However, -as an extension, the GNU toolchain allows executables to use absolute -accesses for locally-binding symbols. It can also use shorter GP -initialization sequences and generate direct calls to locally-defined -functions. This mode is selected by @option{-mno-shared}. +@item -mmcu= +@opindex mmcu= +Select the MCU to target. This is used to create a C preprocessor +symbol based upon the MCU name, converted to upper case and pre- and +post-fixed with @samp{__}. 
This in turn is used by the +@file{msp430.h} header file to select an MCU-specific supplementary +header file. -@option{-mno-shared} depends on binutils 2.16 or higher and generates -objects that can only be linked by the GNU linker. However, the option -does not affect the ABI of the final executable; it only affects the ABI -of relocatable objects. Using @option{-mno-shared} generally makes -executables both smaller and quicker. +The option also sets the ISA to use. If the MCU name is one that is +known to only support the 430 ISA then that is selected, otherwise the +430X ISA is selected. A generic MCU name of @samp{msp430} can also be +used to select the 430 ISA. Similarly the generic @samp{msp430x} MCU +name selects the 430X ISA. -@option{-mshared} is the default. +In addition an MCU-specific linker script is added to the linker +command line. The script's name is the name of the MCU with +@file{.ld} appended. Thus specifying @option{-mmcu=xxx} on the @command{gcc} +command line defines the C preprocessor symbol @code{__XXX__} and +cause the linker to search for a script called @file{xxx.ld}. -@item -mplt -@itemx -mno-plt -@opindex mplt -@opindex mno-plt -Assume (do not assume) that the static and dynamic linkers -support PLTs and copy relocations. This option only affects -@option{-mno-shared -mabicalls}. For the n64 ABI, this option -has no effect without @option{-msym32}. +This option is also passed on to the assembler. -You can make @option{-mplt} the default by configuring -GCC with @option{--with-mips-plt}. The default is -@option{-mno-plt} otherwise. +@item -mcpu= +@opindex mcpu= +Specifies the ISA to use. Accepted values are @samp{msp430}, +@samp{msp430x} and @samp{msp430xv2}. This option is deprecated. The +@option{-mmcu=} option should be used to select the ISA. -@item -mxgot -@itemx -mno-xgot -@opindex mxgot -@opindex mno-xgot -Lift (do not lift) the usual restrictions on the size of the global -offset table. 
+@item -msim +@opindex msim +Link to the simulator runtime libraries and linker script. Overrides +any scripts that would be selected by the @option{-mmcu=} option. -GCC normally uses a single instruction to load values from the GOT@. -While this is relatively efficient, it only works if the GOT -is smaller than about 64k. Anything larger causes the linker -to report an error such as: +@item -mlarge +@opindex mlarge +Use large-model addressing (20-bit pointers, 32-bit @code{size_t}). -@cindex relocation truncated to fit (MIPS) -@smallexample -relocation truncated to fit: R_MIPS_GOT16 foobar -@end smallexample +@item -msmall +@opindex msmall +Use small-model addressing (16-bit pointers, 16-bit @code{size_t}). -If this happens, you should recompile your code with @option{-mxgot}. -This works with very large GOTs, although the code is also -less efficient, since it takes three instructions to fetch the -value of a global symbol. +@item -mrelax +@opindex mrelax +This option is passed to the assembler and linker, and allows the +linker to perform certain optimizations that cannot be done until +the final link. -Note that some linkers can create multiple GOTs. If you have such a -linker, you should only need to use @option{-mxgot} when a single object -file accesses more than 64k's worth of GOT entries. Very few do. +@item mhwmult= +@opindex mhwmult= +Describes the type of hardware multiply supported by the target. +Accepted values are @samp{none} for no hardware multiply, @samp{16bit} +for the original 16-bit-only multiply supported by early MCUs. +@samp{32bit} for the 16/32-bit multiply supported by later MCUs and +@samp{f5series} for the 16/32-bit multiply supported by F5-series MCUs. +A value of @samp{auto} can also be given. This tells GCC to deduce +the hardware multiply support based upon the MCU name provided by the +@option{-mmcu} option. If no @option{-mmcu} option is specified then +@samp{32bit} hardware multiply support is assumed. 
@samp{auto} is the +default setting. -These options have no effect unless GCC is generating position -independent code. +Hardware multiplies are normally performed by calling a library +routine. This saves space in the generated code. When compiling at +@option{-O3} or higher however the hardware multiplier is invoked +inline. This makes for bigger, but faster code. -@item -mgp32 -@opindex mgp32 -Assume that general-purpose registers are 32 bits wide. +The hardware multiply routines disable interrupts whilst running and +restore the previous interrupt state when they finish. This makes +them safe to use inside interrupt handlers as well as in normal code. -@item -mgp64 -@opindex mgp64 -Assume that general-purpose registers are 64 bits wide. +@item -minrt +@opindex minrt +Enable the use of a minimum runtime environment - no static +initializers or constructors. This is intended for memory-constrained +devices. The compiler includes special symbols in some objects +that tell the linker and runtime which code fragments are required. -@item -mfp32 -@opindex mfp32 -Assume that floating-point registers are 32 bits wide. +@end table -@item -mfp64 -@opindex mfp64 -Assume that floating-point registers are 64 bits wide. +@node NDS32 Options +@subsection NDS32 Options +@cindex NDS32 Options -@item -mfpxx -@opindex mfpxx -Do not assume the width of floating-point registers. +These options are defined for NDS32 implementations: -@item -mhard-float -@opindex mhard-float -Use floating-point coprocessor instructions. +@table @gcctabopt -@item -msoft-float -@opindex msoft-float -Do not use floating-point coprocessor instructions. Implement -floating-point calculations using library calls instead. +@item -mbig-endian +@opindex mbig-endian +Generate code in big-endian mode. -@item -mno-float -@opindex mno-float -Equivalent to @option{-msoft-float}, but additionally asserts that the -program being compiled does not perform any floating-point operations. 
-This option is presently supported only by some bare-metal MIPS -configurations, where it may select a special set of libraries -that lack all floating-point support (including, for example, the -floating-point @code{printf} formats). -If code compiled with @option{-mno-float} accidentally contains -floating-point operations, it is likely to suffer a link-time -or run-time failure. +@item -mlittle-endian +@opindex mlittle-endian +Generate code in little-endian mode. -@item -msingle-float -@opindex msingle-float -Assume that the floating-point coprocessor only supports single-precision -operations. +@item -mreduced-regs +@opindex mreduced-regs +Use reduced-set registers for register allocation. -@item -mdouble-float -@opindex mdouble-float -Assume that the floating-point coprocessor supports double-precision -operations. This is the default. +@item -mfull-regs +@opindex mfull-regs +Use full-set registers for register allocation. -@item -modd-spreg -@itemx -mno-odd-spreg -@opindex modd-spreg -@opindex mno-odd-spreg -Enable the use of odd-numbered single-precision floating-point registers -for the o32 ABI. This is the default for processors that are known to -support these registers. When using the o32 FPXX ABI, @option{-mno-odd-spreg} -is set by default. +@item -mcmov +@opindex mcmov +Generate conditional move instructions. -@item -mabs=2008 -@itemx -mabs=legacy -@opindex mabs=2008 -@opindex mabs=legacy -These options control the treatment of the special not-a-number (NaN) -IEEE 754 floating-point data with the @code{abs.@i{fmt}} and -@code{neg.@i{fmt}} machine instructions. +@item -mno-cmov +@opindex mno-cmov +Do not generate conditional move instructions. -By default or when the @option{-mabs=legacy} is used the legacy -treatment is selected. In this case these instructions are considered -arithmetic and avoided where correct operation is required and the -input operand might be a NaN. 
A longer sequence of instructions that -manipulate the sign bit of floating-point datum manually is used -instead unless the @option{-ffinite-math-only} option has also been -specified. +@item -mperf-ext +@opindex mperf-ext +Generate performance extension instructions. -The @option{-mabs=2008} option selects the IEEE 754-2008 treatment. In -this case these instructions are considered non-arithmetic and therefore -operating correctly in all cases, including in particular where the -input operand is a NaN. These instructions are therefore always used -for the respective operations. +@item -mno-perf-ext +@opindex mno-perf-ext +Do not generate performance extension instructions. -@item -mnan=2008 -@itemx -mnan=legacy -@opindex mnan=2008 -@opindex mnan=legacy -These options control the encoding of the special not-a-number (NaN) -IEEE 754 floating-point data. +@item -mv3push +@opindex mv3push +Generate v3 push25/pop25 instructions. -The @option{-mnan=legacy} option selects the legacy encoding. In this -case quiet NaNs (qNaNs) are denoted by the first bit of their trailing -significand field being 0, whereas signalling NaNs (sNaNs) are denoted -by the first bit of their trailing significand field being 1. +@item -mno-v3push +@opindex mno-v3push +Do not generate v3 push25/pop25 instructions. -The @option{-mnan=2008} option selects the IEEE 754-2008 encoding. In -this case qNaNs are denoted by the first bit of their trailing -significand field being 1, whereas sNaNs are denoted by the first bit of -their trailing significand field being 0. +@item -m16-bit +@opindex m16-bit +Generate 16-bit instructions. -The default is @option{-mnan=legacy} unless GCC has been configured with -@option{--with-nan=2008}. +@item -mno-16-bit +@opindex mno-16-bit +Do not generate 16-bit instructions. -@item -mllsc -@itemx -mno-llsc -@opindex mllsc -@opindex mno-llsc -Use (do not use) @samp{ll}, @samp{sc}, and @samp{sync} instructions to -implement atomic memory built-in functions. 
When neither option is -specified, GCC uses the instructions if the target architecture -supports them. +@item -misr-vector-size=@var{num} +@opindex misr-vector-size +Specify the size of each interrupt vector, which must be 4 or 16. -@option{-mllsc} is useful if the runtime environment can emulate the -instructions and @option{-mno-llsc} can be useful when compiling for -nonstandard ISAs. You can make either option the default by -configuring GCC with @option{--with-llsc} and @option{--without-llsc} -respectively. @option{--with-llsc} is the default for some -configurations; see the installation documentation for details. +@item -mcache-block-size=@var{num} +@opindex mcache-block-size +Specify the size of each cache block, +which must be a power of 2 between 4 and 512. -@item -mdsp -@itemx -mno-dsp -@opindex mdsp -@opindex mno-dsp -Use (do not use) revision 1 of the MIPS DSP ASE@. -@xref{MIPS DSP Built-in Functions}. This option defines the -preprocessor macro @code{__mips_dsp}. It also defines -@code{__mips_dsp_rev} to 1. +@item -march=@var{arch} +@opindex march +Specify the name of the target architecture. -@item -mdspr2 -@itemx -mno-dspr2 -@opindex mdspr2 -@opindex mno-dspr2 -Use (do not use) revision 2 of the MIPS DSP ASE@. -@xref{MIPS DSP Built-in Functions}. This option defines the -preprocessor macros @code{__mips_dsp} and @code{__mips_dspr2}. -It also defines @code{__mips_dsp_rev} to 2. +@item -mcmodel=@var{code-model} +@opindex mcmodel +Set the code model to one of +@table @asis +@item @samp{small} +All the data and read-only data segments must be within 512KB addressing space. +The text segment must be within 16MB addressing space. +@item @samp{medium} +The data segment must be within 512KB while the read-only data segment can be +within 4GB addressing space. The text segment should be still within 16MB +addressing space. +@item @samp{large} +All the text and data segments can be within 4GB addressing space. 
+@end table -@item -msmartmips -@itemx -mno-smartmips -@opindex msmartmips -@opindex mno-smartmips -Use (do not use) the MIPS SmartMIPS ASE. +@item -mctor-dtor +@opindex mctor-dtor +Enable constructor/destructor feature. -@item -mpaired-single -@itemx -mno-paired-single -@opindex mpaired-single -@opindex mno-paired-single -Use (do not use) paired-single floating-point instructions. -@xref{MIPS Paired-Single Support}. This option requires -hardware floating-point support to be enabled. +@item -mrelax +@opindex mrelax +Guide linker to relax instructions. -@item -mdmx -@itemx -mno-mdmx -@opindex mdmx -@opindex mno-mdmx -Use (do not use) MIPS Digital Media Extension instructions. -This option can only be used when generating 64-bit code and requires -hardware floating-point support to be enabled. +@end table -@item -mips3d -@itemx -mno-mips3d -@opindex mips3d -@opindex mno-mips3d -Use (do not use) the MIPS-3D ASE@. @xref{MIPS-3D Built-in Functions}. -The option @option{-mips3d} implies @option{-mpaired-single}. +@node Nios II Options +@subsection Nios II Options +@cindex Nios II options +@cindex Altera Nios II options -@item -mmicromips -@itemx -mno-micromips -@opindex mmicromips -@opindex mno-mmicromips -Generate (do not generate) microMIPS code. +These are the options defined for the Altera Nios II processor. -MicroMIPS code generation can also be controlled on a per-function basis -by means of @code{micromips} and @code{nomicromips} attributes. -@xref{Function Attributes}, for more information. +@table @gcctabopt -@item -mmt -@itemx -mno-mt -@opindex mmt -@opindex mno-mt -Use (do not use) MT Multithreading instructions. +@item -G @var{num} +@opindex G +@cindex smaller data references +Put global and static objects less than or equal to @var{num} bytes +into the small data or BSS sections instead of the normal data or BSS +sections. The default value of @var{num} is 8. 
-@item -mmcu -@itemx -mno-mcu -@opindex mmcu -@opindex mno-mcu -Use (do not use) the MIPS MCU ASE instructions. +@item -mgpopt=@var{option} +@itemx -mgpopt +@itemx -mno-gpopt +@opindex mgpopt +@opindex mno-gpopt +Generate (do not generate) GP-relative accesses. The following +@var{option} names are recognized: -@item -meva -@itemx -mno-eva -@opindex meva -@opindex mno-eva -Use (do not use) the MIPS Enhanced Virtual Addressing instructions. +@table @samp -@item -mvirt -@itemx -mno-virt -@opindex mvirt -@opindex mno-virt -Use (do not use) the MIPS Virtualization Application Specific instructions. +@item none +Do not generate GP-relative accesses. -@item -mxpa -@itemx -mno-xpa -@opindex mxpa -@opindex mno-xpa -Use (do not use) the MIPS eXtended Physical Address (XPA) instructions. +@item local +Generate GP-relative accesses for small data objects that are not +external or weak. Also use GP-relative addressing for objects that +have been explicitly placed in a small data section via a @code{section} +attribute. -@item -mlong64 -@opindex mlong64 -Force @code{long} types to be 64 bits wide. See @option{-mlong32} for -an explanation of the default and the way that the pointer size is -determined. +@item global +As for @samp{local}, but also generate GP-relative accesses for +small data objects that are external or weak. If you use this option, +you must ensure that all parts of your program (including libraries) are +compiled with the same @option{-G} setting. -@item -mlong32 -@opindex mlong32 -Force @code{long}, @code{int}, and pointer types to be 32 bits wide. +@item data +Generate GP-relative accesses for all data objects in the program. If you +use this option, the entire data and BSS segments +of your program must fit in 64K of memory and you must use an appropriate +linker script to allocate them within the addressable range of the +global pointer. -The default size of @code{int}s, @code{long}s and pointers depends on -the ABI@.
All the supported ABIs use 32-bit @code{int}s. The n64 ABI -uses 64-bit @code{long}s, as does the 64-bit EABI; the others use -32-bit @code{long}s. Pointers are the same size as @code{long}s, -or the same size as integer registers, whichever is smaller. +@item all +Generate GP-relative addresses for function pointers as well as data +pointers. If you use this option, the entire text, data, and BSS segments +of your program must fit in 64K of memory and you must use an appropriate +linker script to allocate them within the addressable range of the +global pointer. -@item -msym32 -@itemx -mno-sym32 -@opindex msym32 -@opindex mno-sym32 -Assume (do not assume) that all symbols have 32-bit values, regardless -of the selected ABI@. This option is useful in combination with -@option{-mabi=64} and @option{-mno-abicalls} because it allows GCC -to generate shorter and faster references to symbolic addresses. +@end table -@item -G @var{num} -@opindex G -Put definitions of externally-visible data in a small data section -if that data is no bigger than @var{num} bytes. GCC can then generate -more efficient accesses to the data; see @option{-mgpopt} for details. +@option{-mgpopt} is equivalent to @option{-mgpopt=local}, and +@option{-mno-gpopt} is equivalent to @option{-mgpopt=none}. -The default @option{-G} option depends on the configuration. +The default is @option{-mgpopt} except when @option{-fpic} or +@option{-fPIC} is specified to generate position-independent code. +Note that the Nios II ABI does not permit GP-relative accesses from +shared libraries. -@item -mlocal-sdata -@itemx -mno-local-sdata -@opindex mlocal-sdata -@opindex mno-local-sdata -Extend (do not extend) the @option{-G} behavior to local data too, -such as to static variables in C@. @option{-mlocal-sdata} is the -default for all configurations. +You may need to specify @option{-mno-gpopt} explicitly when building +programs that include large amounts of small data, including large +GOT data sections.
In this case, the 16-bit offset for GP-relative +addressing may not be large enough to allow access to the entire +small data section. -If the linker complains that an application is using too much small data, -you might want to try rebuilding the less performance-critical parts with -@option{-mno-local-sdata}. You might also want to build large -libraries with @option{-mno-local-sdata}, so that the libraries leave -more room for the main program. +@item -mel +@itemx -meb +@opindex mel +@opindex meb +Generate little-endian (default) or big-endian (experimental) code, +respectively. -@item -mextern-sdata -@itemx -mno-extern-sdata -@opindex mextern-sdata -@opindex mno-extern-sdata -Assume (do not assume) that externally-defined data is in -a small data section if the size of that data is within the @option{-G} limit. -@option{-mextern-sdata} is the default for all configurations. +@item -mbypass-cache +@itemx -mno-bypass-cache +@opindex mno-bypass-cache +@opindex mbypass-cache +Force all load and store instructions to always bypass cache by +using I/O variants of the instructions. The default is not to +bypass the cache. -If you compile a module @var{Mod} with @option{-mextern-sdata} @option{-G -@var{num}} @option{-mgpopt}, and @var{Mod} references a variable @var{Var} -that is no bigger than @var{num} bytes, you must make sure that @var{Var} -is placed in a small data section. If @var{Var} is defined by another -module, you must either compile that module with a high-enough -@option{-G} setting or attach a @code{section} attribute to @var{Var}'s -definition. If @var{Var} is common, you must link the application -with a high-enough @option{-G} setting. +@item -mno-cache-volatile +@itemx -mcache-volatile +@opindex mcache-volatile +@opindex mno-cache-volatile +Volatile memory accesses bypass the cache using the I/O variants of +the load and store instructions. The default is not to bypass the cache.
-The easiest way of satisfying these restrictions is to compile -and link every module with the same @option{-G} option. However, -you may wish to build a library that supports several different -small data limits. You can do this by compiling the library with -the highest supported @option{-G} setting and additionally using -@option{-mno-extern-sdata} to stop the library from making assumptions -about externally-defined data. +@item -mno-fast-sw-div +@itemx -mfast-sw-div +@opindex mno-fast-sw-div +@opindex mfast-sw-div +Do not use table-based fast divide for small numbers. The default +is to use the fast divide at @option{-O3} and above. -@item -mgpopt -@itemx -mno-gpopt -@opindex mgpopt -@opindex mno-gpopt -Use (do not use) GP-relative accesses for symbols that are known to be -in a small data section; see @option{-G}, @option{-mlocal-sdata} and -@option{-mextern-sdata}. @option{-mgpopt} is the default for all -configurations. +@item -mno-hw-mul +@itemx -mhw-mul +@itemx -mno-hw-mulx +@itemx -mhw-mulx +@itemx -mno-hw-div +@itemx -mhw-div +@opindex mno-hw-mul +@opindex mhw-mul +@opindex mno-hw-mulx +@opindex mhw-mulx +@opindex mno-hw-div +@opindex mhw-div +Enable or disable emitting @code{mul}, @code{mulx} and @code{div} family of +instructions by the compiler. The default is to emit @code{mul} +and not emit @code{div} and @code{mulx}. -@option{-mno-gpopt} is useful for cases where the @code{$gp} register -might not hold the value of @code{_gp}. For example, if the code is -part of a library that might be used in a boot monitor, programs that -call boot monitor routines pass an unknown value in @code{$gp}. -(In such situations, the boot monitor itself is usually compiled -with @option{-G0}.) 
+@item -mcustom-@var{insn}=@var{N} +@itemx -mno-custom-@var{insn} +@opindex mcustom-@var{insn} +@opindex mno-custom-@var{insn} +Each @option{-mcustom-@var{insn}=@var{N}} option enables use of a +custom instruction with encoding @var{N} when generating code that uses +@var{insn}. For example, @option{-mcustom-fadds=253} generates custom +instruction 253 for single-precision floating-point add operations instead +of the default behavior of using a library call. -@option{-mno-gpopt} implies @option{-mno-local-sdata} and -@option{-mno-extern-sdata}. +The following values of @var{insn} are supported. Except as otherwise +noted, floating-point operations are expected to be implemented with +normal IEEE 754 semantics and correspond directly to the C operators or the +equivalent GCC built-in functions (@pxref{Other Builtins}). -@item -membedded-data -@itemx -mno-embedded-data -@opindex membedded-data -@opindex mno-embedded-data -Allocate variables to the read-only data section first if possible, then -next in the small data section if possible, otherwise in data. This gives -slightly slower code than the default, but reduces the amount of RAM required -when executing, and thus may be preferred for some embedded systems. +Single-precision floating point: +@table @asis -@item -muninit-const-in-rodata -@itemx -mno-uninit-const-in-rodata -@opindex muninit-const-in-rodata -@opindex mno-uninit-const-in-rodata -Put uninitialized @code{const} variables in the read-only data section. -This option is only meaningful in conjunction with @option{-membedded-data}. +@item @samp{fadds}, @samp{fsubs}, @samp{fdivs}, @samp{fmuls} +Binary arithmetic operations. -@item -mcode-readable=@var{setting} -@opindex mcode-readable -Specify whether GCC may generate code that reads from executable sections. -There are three possible settings: +@item @samp{fnegs} +Unary negation. -@table @gcctabopt -@item -mcode-readable=yes -Instructions may freely access executable sections. 
This is the -default setting. +@item @samp{fabss} +Unary absolute value. -@item -mcode-readable=pcrel -MIPS16 PC-relative load instructions can access executable sections, -but other instructions must not do so. This option is useful on 4KSc -and 4KSd processors when the code TLBs have the Read Inhibit bit set. -It is also useful on processors that can be configured to have a dual -instruction/data SRAM interface and that, like the M4K, automatically -redirect PC-relative loads to the instruction RAM. +@item @samp{fcmpeqs}, @samp{fcmpges}, @samp{fcmpgts}, @samp{fcmples}, @samp{fcmplts}, @samp{fcmpnes} +Comparison operations. -@item -mcode-readable=no -Instructions must not access executable sections. This option can be -useful on targets that are configured to have a dual instruction/data -SRAM interface but that (unlike the M4K) do not automatically redirect -PC-relative loads to the instruction RAM. -@end table +@item @samp{fmins}, @samp{fmaxs} +Floating-point minimum and maximum. These instructions are only +generated if @option{-ffinite-math-only} is specified. -@item -msplit-addresses -@itemx -mno-split-addresses -@opindex msplit-addresses -@opindex mno-split-addresses -Enable (disable) use of the @code{%hi()} and @code{%lo()} assembler -relocation operators. This option has been superseded by -@option{-mexplicit-relocs} but is retained for backwards compatibility. +@item @samp{fsqrts} +Unary square root operation. -@item -mexplicit-relocs -@itemx -mno-explicit-relocs -@opindex mexplicit-relocs -@opindex mno-explicit-relocs -Use (do not use) assembler relocation operators when dealing with symbolic -addresses. The alternative, selected by @option{-mno-explicit-relocs}, -is to use assembler macros instead. +@item @samp{fcoss}, @samp{fsins}, @samp{ftans}, @samp{fatans}, @samp{fexps}, @samp{flogs} +Floating-point trigonometric and exponential functions. These instructions +are only generated if @option{-funsafe-math-optimizations} is also specified. 
-@option{-mexplicit-relocs} is the default if GCC was configured -to use an assembler that supports relocation operators. +@end table -@item -mcheck-zero-division -@itemx -mno-check-zero-division -@opindex mcheck-zero-division -@opindex mno-check-zero-division -Trap (do not trap) on integer division by zero. +Double-precision floating point: +@table @asis -The default is @option{-mcheck-zero-division}. +@item @samp{faddd}, @samp{fsubd}, @samp{fdivd}, @samp{fmuld} +Binary arithmetic operations. -@item -mdivide-traps -@itemx -mdivide-breaks -@opindex mdivide-traps -@opindex mdivide-breaks -MIPS systems check for division by zero by generating either a -conditional trap or a break instruction. Using traps results in -smaller code, but is only supported on MIPS II and later. Also, some -versions of the Linux kernel have a bug that prevents trap from -generating the proper signal (@code{SIGFPE}). Use @option{-mdivide-traps} to -allow conditional traps on architectures that support them and -@option{-mdivide-breaks} to force the use of breaks. +@item @samp{fnegd} +Unary negation. -The default is usually @option{-mdivide-traps}, but this can be -overridden at configure time using @option{--with-divide=breaks}. -Divide-by-zero checks can be completely disabled using -@option{-mno-check-zero-division}. +@item @samp{fabsd} +Unary absolute value. -@item -mmemcpy -@itemx -mno-memcpy -@opindex mmemcpy -@opindex mno-memcpy -Force (do not force) the use of @code{memcpy} for non-trivial block -moves. The default is @option{-mno-memcpy}, which allows GCC to inline -most constant-sized copies. +@item @samp{fcmpeqd}, @samp{fcmpged}, @samp{fcmpgtd}, @samp{fcmpled}, @samp{fcmpltd}, @samp{fcmpned} +Comparison operations. -@item -mlong-calls -@itemx -mno-long-calls -@opindex mlong-calls -@opindex mno-long-calls -Disable (do not disable) use of the @code{jal} instruction. 
Calling -functions using @code{jal} is more efficient but requires the caller -and callee to be in the same 256 megabyte segment. +@item @samp{fmind}, @samp{fmaxd} +Double-precision minimum and maximum. These instructions are only +generated if @option{-ffinite-math-only} is specified. -This option has no effect on abicalls code. The default is -@option{-mno-long-calls}. +@item @samp{fsqrtd} +Unary square root operation. -@item -mmad -@itemx -mno-mad -@opindex mmad -@opindex mno-mad -Enable (disable) use of the @code{mad}, @code{madu} and @code{mul} -instructions, as provided by the R4650 ISA@. +@item @samp{fcosd}, @samp{fsind}, @samp{ftand}, @samp{fatand}, @samp{fexpd}, @samp{flogd} +Double-precision trigonometric and exponential functions. These instructions +are only generated if @option{-funsafe-math-optimizations} is also specified. -@item -mimadd -@itemx -mno-imadd -@opindex mimadd -@opindex mno-imadd -Enable (disable) use of the @code{madd} and @code{msub} integer -instructions. The default is @option{-mimadd} on architectures -that support @code{madd} and @code{msub} except for the 74k -architecture where it was found to generate slower code. +@end table -@item -mfused-madd -@itemx -mno-fused-madd -@opindex mfused-madd -@opindex mno-fused-madd -Enable (disable) use of the floating-point multiply-accumulate -instructions, when they are available. The default is -@option{-mfused-madd}. +Conversions: +@table @asis +@item @samp{fextsd} +Conversion from single precision to double precision. -On the R8000 CPU when multiply-accumulate instructions are used, -the intermediate product is calculated to infinite precision -and is not subject to the FCSR Flush to Zero bit. This may be -undesirable in some circumstances. On other processors the result -is numerically identical to the equivalent computation using -separate multiply, add, subtract and negate instructions. +@item @samp{ftruncds} +Conversion from double precision to single precision. 
-@item -nocpp -@opindex nocpp -Tell the MIPS assembler to not run its preprocessor over user -assembler files (with a @samp{.s} suffix) when assembling them. +@item @samp{fixsi}, @samp{fixsu}, @samp{fixdi}, @samp{fixdu} +Conversion from floating point to signed or unsigned integer types, with +truncation towards zero. -@item -mfix-24k -@itemx -mno-fix-24k -@opindex mfix-24k -@opindex mno-fix-24k -Work around the 24K E48 (lost data on stores during refill) errata. -The workarounds are implemented by the assembler rather than by GCC@. +@item @samp{round} +Conversion from single-precision floating point to signed integer, +rounding to the nearest integer and ties away from zero. +This corresponds to the @code{__builtin_lroundf} function when +@option{-fno-math-errno} is used. -@item -mfix-r4000 -@itemx -mno-fix-r4000 -@opindex mfix-r4000 -@opindex mno-fix-r4000 -Work around certain R4000 CPU errata: -@itemize @minus -@item -A double-word or a variable shift may give an incorrect result if executed -immediately after starting an integer division. -@item -A double-word or a variable shift may give an incorrect result if executed -while an integer multiplication is in progress. -@item -An integer division may give an incorrect result if started in a delay slot -of a taken branch or a jump. -@end itemize +@item @samp{floatis}, @samp{floatus}, @samp{floatid}, @samp{floatud} +Conversion from signed or unsigned integer types to floating-point types. -@item -mfix-r4400 -@itemx -mno-fix-r4400 -@opindex mfix-r4400 -@opindex mno-fix-r4400 -Work around certain R4400 CPU errata: -@itemize @minus -@item -A double-word or a variable shift may give an incorrect result if executed -immediately after starting an integer division. -@end itemize +@end table -@item -mfix-r10000 -@itemx -mno-fix-r10000 -@opindex mfix-r10000 -@opindex mno-fix-r10000 -Work around certain R10000 errata: -@itemize @minus -@item -@code{ll}/@code{sc} sequences may not behave atomically on revisions -prior to 3.0.
They may deadlock on revisions 2.6 and earlier. -@end itemize +In addition, all of the following transfer instructions for internal +registers X and Y must be provided to use any of the double-precision +floating-point instructions. Custom instructions taking two +double-precision source operands expect the first operand in the +64-bit register X. The other operand (or only operand of a unary +operation) is given to the custom arithmetic instruction with the +least significant half in source register @var{src1} and the most +significant half in @var{src2}. A custom instruction that returns a +double-precision result returns the most significant 32 bits in the +destination register and the other half in 32-bit register Y. +GCC automatically generates the necessary code sequences to write +register X and/or read register Y when double-precision floating-point +instructions are used. -This option can only be used if the target architecture supports -branch-likely instructions. @option{-mfix-r10000} is the default when -@option{-march=r10000} is used; @option{-mno-fix-r10000} is the default -otherwise. +@table @asis -@item -mfix-rm7000 -@itemx -mno-fix-rm7000 -@opindex mfix-rm7000 -Work around the RM7000 @code{dmult}/@code{dmultu} errata. The -workarounds are implemented by the assembler rather than by GCC@. +@item @samp{fwrx} +Write @var{src1} into the least significant half of X and @var{src2} into +the most significant half of X. -@item -mfix-vr4120 -@itemx -mno-fix-vr4120 -@opindex mfix-vr4120 -Work around certain VR4120 errata: -@itemize @minus -@item -@code{dmultu} does not always produce the correct result. -@item -@code{div} and @code{ddiv} do not always produce the correct result if one -of the operands is negative. -@end itemize -The workarounds for the division errata rely on special functions in -@file{libgcc.a}. At present, these functions are only provided by -the @code{mips64vr*-elf} configurations. +@item @samp{fwry} +Write @var{src1} into Y. 
-Other VR4120 errata require a NOP to be inserted between certain pairs of -instructions. These errata are handled by the assembler, not by GCC itself. +@item @samp{frdxhi}, @samp{frdxlo} +Read the most or least (respectively) significant half of X and store it in +@var{dest}. -@item -mfix-vr4130 -@opindex mfix-vr4130 -Work around the VR4130 @code{mflo}/@code{mfhi} errata. The -workarounds are implemented by the assembler rather than by GCC, -although GCC avoids using @code{mflo} and @code{mfhi} if the -VR4130 @code{macc}, @code{macchi}, @code{dmacc} and @code{dmacchi} -instructions are available instead. +@item @samp{frdy} +Read the value of Y and store it into @var{dest}. +@end table -@item -mfix-sb1 -@itemx -mno-fix-sb1 -@opindex mfix-sb1 -Work around certain SB-1 CPU core errata. -(This flag currently works around the SB-1 revision 2 -``F1'' and ``F2'' floating-point errata.) +Note that you can gain more local control over generation of Nios II custom +instructions by using the @code{target("custom-@var{insn}=@var{N}")} +and @code{target("no-custom-@var{insn}")} function attributes +(@pxref{Function Attributes}) +or pragmas (@pxref{Function Specific Option Pragmas}). -@item -mr10k-cache-barrier=@var{setting} -@opindex mr10k-cache-barrier -Specify whether GCC should insert cache barriers to avoid the -side-effects of speculation on R10K processors. +@item -mcustom-fpu-cfg=@var{name} +@opindex mcustom-fpu-cfg -In common with many processors, the R10K tries to predict the outcome -of a conditional branch and speculatively executes instructions from -the ``taken'' branch. It later aborts these instructions if the -predicted outcome is wrong. However, on the R10K, even aborted -instructions can have side effects. +This option enables a predefined, named set of custom instruction encodings +(see @option{-mcustom-@var{insn}} above). +Currently, the following sets are defined: -This problem only affects kernel stores and, depending on the system, -kernel loads. 
As an example, a speculatively-executed store may load -the target memory into cache and mark the cache line as dirty, even if -the store itself is later aborted. If a DMA operation writes to the -same area of memory before the ``dirty'' line is flushed, the cached -data overwrites the DMA-ed data. See the R10K processor manual -for a full description, including other potential problems. +@option{-mcustom-fpu-cfg=60-1} is equivalent to: +@gccoptlist{-mcustom-fmuls=252 @gol +-mcustom-fadds=253 @gol +-mcustom-fsubs=254 @gol +-fsingle-precision-constant} -One workaround is to insert cache barrier instructions before every memory -access that might be speculatively executed and that might have side -effects even if aborted. @option{-mr10k-cache-barrier=@var{setting}} -controls GCC's implementation of this workaround. It assumes that -aborted accesses to any byte in the following regions do not have -side effects: +@option{-mcustom-fpu-cfg=60-2} is equivalent to: +@gccoptlist{-mcustom-fmuls=252 @gol +-mcustom-fadds=253 @gol +-mcustom-fsubs=254 @gol +-mcustom-fdivs=255 @gol +-fsingle-precision-constant} -@enumerate -@item -the memory occupied by the current function's stack frame; +@option{-mcustom-fpu-cfg=72-3} is equivalent to: +@gccoptlist{-mcustom-floatus=243 @gol +-mcustom-fixsi=244 @gol +-mcustom-floatis=245 @gol +-mcustom-fcmpgts=246 @gol +-mcustom-fcmples=249 @gol +-mcustom-fcmpeqs=250 @gol +-mcustom-fcmpnes=251 @gol +-mcustom-fmuls=252 @gol +-mcustom-fadds=253 @gol +-mcustom-fsubs=254 @gol +-mcustom-fdivs=255 @gol +-fsingle-precision-constant} -@item -the memory occupied by an incoming stack argument; +Custom instruction assignments given by individual +@option{-mcustom-@var{insn}=} options override those given by +@option{-mcustom-fpu-cfg=}, regardless of the +order of the options on the command line. -@item -the memory occupied by an object with a link-time-constant address.
-@end enumerate +Note that you can gain more local control over selection of a FPU +configuration by using the @code{target("custom-fpu-cfg=@var{name}")} +function attribute (@pxref{Function Attributes}) +or pragma (@pxref{Function Specific Option Pragmas}). -It is the kernel's responsibility to ensure that speculative -accesses to these regions are indeed safe. +@end table -If the input program contains a function declaration such as: +These additional @samp{-m} options are available for the Altera Nios II +ELF (bare-metal) target: -@smallexample -void foo (void); -@end smallexample +@table @gcctabopt -then the implementation of @code{foo} must allow @code{j foo} and -@code{jal foo} to be executed speculatively. GCC honors this -restriction for functions it compiles itself. It expects non-GCC -functions (such as hand-written assembly code) to do the same. +@item -mhal +@opindex mhal +Link with HAL BSP. This suppresses linking with the GCC-provided C runtime +startup and termination code, and is typically used in conjunction with +@option{-msys-crt0=} to specify the location of the alternate startup code +provided by the HAL BSP. -The option has three forms: +@item -msmallc +@opindex msmallc +Link with a limited version of the C library, @option{-lsmallc}, rather than +Newlib. -@table @gcctabopt -@item -mr10k-cache-barrier=load-store -Insert a cache barrier before a load or store that might be -speculatively executed and that might have side effects even -if aborted. +@item -msys-crt0=@var{startfile} +@opindex msys-crt0 +@var{startfile} is the file name of the startfile (crt0) to use +when linking. This option is only useful in conjunction with @option{-mhal}. -@item -mr10k-cache-barrier=store -Insert a cache barrier before a store that might be speculatively -executed and that might have side effects even if aborted. 
+@item -msys-lib=@var{systemlib} +@opindex msys-lib +@var{systemlib} is the library name of the library that provides +low-level system calls required by the C library, +e.g. @code{read} and @code{write}. +This option is typically used to link with a library provided by a HAL BSP. -@item -mr10k-cache-barrier=none -Disable the insertion of cache barriers. This is the default setting. @end table -@item -mflush-func=@var{func} -@itemx -mno-flush-func -@opindex mflush-func -Specifies the function to call to flush the I and D caches, or to not -call any such function. If called, the function must take the same -arguments as the common @code{_flush_func}, that is, the address of the -memory range for which the cache is being flushed, the size of the -memory range, and the number 3 (to flush both caches). The default -depends on the target GCC was configured for, but commonly is either -@code{_flush_func} or @code{__cpu_flush}. - -@item -mbranch-cost=@var{num} -@opindex mbranch-cost -Set the cost of branches to roughly @var{num} ``simple'' instructions. -This cost is only a heuristic and is not guaranteed to produce -consistent results across releases. A zero cost redundantly selects -the default, which is based on the @option{-mtune} setting. +@node PDP-11 Options +@subsection PDP-11 Options +@cindex PDP-11 Options -@item -mbranch-likely -@itemx -mno-branch-likely -@opindex mbranch-likely -@opindex mno-branch-likely -Enable or disable use of Branch Likely instructions, regardless of the -default for the selected architecture. By default, Branch Likely -instructions may be generated if they are supported by the selected -architecture. An exception is for the MIPS32 and MIPS64 architectures -and processors that implement those architectures; for those, Branch -Likely instructions are not generated by default because the MIPS32 -and MIPS64 architectures specifically deprecate their use.
+These options are defined for the PDP-11: -@item -mfp-exceptions -@itemx -mno-fp-exceptions -@opindex mfp-exceptions -Specifies whether FP exceptions are enabled. This affects how -FP instructions are scheduled for some processors. -The default is that FP exceptions are -enabled. +@table @gcctabopt +@item -mfpu +@opindex mfpu +Use hardware FPP floating point. This is the default. (FIS floating +point on the PDP-11/40 is not supported.) -For instance, on the SB-1, if FP exceptions are disabled, and we are emitting -64-bit code, then we can use both FP pipes. Otherwise, we can only use one -FP pipe. +@item -msoft-float +@opindex msoft-float +Do not use hardware floating point. -@item -mvr4130-align -@itemx -mno-vr4130-align -@opindex mvr4130-align -The VR4130 pipeline is two-way superscalar, but can only issue two -instructions together if the first one is 8-byte aligned. When this -option is enabled, GCC aligns pairs of instructions that it -thinks should execute in parallel. +@item -mac0 +@opindex mac0 +Return floating-point results in ac0 (fr0 in Unix assembler syntax). -This option only has an effect when optimizing for the VR4130. -It normally makes code faster, but at the expense of making it bigger. -It is enabled by default at optimization level @option{-O3}. +@item -mno-ac0 +@opindex mno-ac0 +Return floating-point results in memory. This is the default. -@item -msynci -@itemx -mno-synci -@opindex msynci -Enable (disable) generation of @code{synci} instructions on -architectures that support it. The @code{synci} instructions (if -enabled) are generated when @code{__builtin___clear_cache} is -compiled. +@item -m40 +@opindex m40 +Generate code for a PDP-11/40. -This option defaults to @option{-mno-synci}, but the default can be -overridden by configuring GCC with @option{--with-synci}. +@item -m45 +@opindex m45 +Generate code for a PDP-11/45. This is the default. -When compiling code for single processor systems, it is generally safe -to use @code{synci}. 
However, on many multi-core (SMP) systems, it -does not invalidate the instruction caches on all cores and may lead -to undefined behavior. +@item -m10 +@opindex m10 +Generate code for a PDP-11/10. -@item -mrelax-pic-calls -@itemx -mno-relax-pic-calls -@opindex mrelax-pic-calls -Try to turn PIC calls that are normally dispatched via register -@code{$25} into direct calls. This is only possible if the linker can -resolve the destination at link-time and if the destination is within -range for a direct call. +@item -mbcopy-builtin +@opindex mbcopy-builtin +Use inline @code{movmemhi} patterns for copying memory. This is the +default. -@option{-mrelax-pic-calls} is the default if GCC was configured to use -an assembler and a linker that support the @code{.reloc} assembly -directive and @option{-mexplicit-relocs} is in effect. With -@option{-mno-explicit-relocs}, this optimization can be performed by the -assembler and the linker alone without help from the compiler. +@item -mbcopy +@opindex mbcopy +Do not use inline @code{movmemhi} patterns for copying memory. -@item -mmcount-ra-address -@itemx -mno-mcount-ra-address -@opindex mmcount-ra-address -@opindex mno-mcount-ra-address -Emit (do not emit) code that allows @code{_mcount} to modify the -calling function's return address. When enabled, this option extends -the usual @code{_mcount} interface with a new @var{ra-address} -parameter, which has type @code{intptr_t *} and is passed in register -@code{$12}. @code{_mcount} can then modify the return address by -doing both of the following: -@itemize -@item -Returning the new address in register @code{$31}. -@item -Storing the new address in @code{*@var{ra-address}}, -if @var{ra-address} is nonnull. -@end itemize +@item -mint16 +@itemx -mno-int32 +@opindex mint16 +@opindex mno-int32 +Use 16-bit @code{int}. This is the default. -The default is @option{-mno-mcount-ra-address}. +@item -mint32 +@itemx -mno-int16 +@opindex mint32 +@opindex mno-int16 +Use 32-bit @code{int}. 
-@end table +@item -mfloat64 +@itemx -mno-float32 +@opindex mfloat64 +@opindex mno-float32 +Use 64-bit @code{float}. This is the default. -@node MMIX Options -@subsection MMIX Options -@cindex MMIX Options - -These options are defined for the MMIX: - -@table @gcctabopt -@item -mlibfuncs -@itemx -mno-libfuncs -@opindex mlibfuncs -@opindex mno-libfuncs -Specify that intrinsic library functions are being compiled, passing all -values in registers, no matter the size. - -@item -mepsilon -@itemx -mno-epsilon -@opindex mepsilon -@opindex mno-epsilon -Generate floating-point comparison instructions that compare with respect -to the @code{rE} epsilon register. - -@item -mabi=mmixware -@itemx -mabi=gnu -@opindex mabi=mmixware -@opindex mabi=gnu -Generate code that passes function parameters and return values that (in -the called function) are seen as registers @code{$0} and up, as opposed to -the GNU ABI which uses global registers @code{$231} and up. - -@item -mzero-extend -@itemx -mno-zero-extend -@opindex mzero-extend -@opindex mno-zero-extend -When reading data from memory in sizes shorter than 64 bits, use (do not -use) zero-extending load instructions by default, rather than -sign-extending ones. +@item -mfloat32 +@itemx -mno-float64 +@opindex mfloat32 +@opindex mno-float64 +Use 32-bit @code{float}. -@item -mknuthdiv -@itemx -mno-knuthdiv -@opindex mknuthdiv -@opindex mno-knuthdiv -Make the result of a division yielding a remainder have the same sign as -the divisor. With the default, @option{-mno-knuthdiv}, the sign of the -remainder follows the sign of the dividend. Both methods are -arithmetically valid, the latter being almost exclusively used. +@item -mabshi +@opindex mabshi +Use @code{abshi2} pattern. This is the default. 
-@item -mtoplevel-symbols -@itemx -mno-toplevel-symbols -@opindex mtoplevel-symbols -@opindex mno-toplevel-symbols -Prepend (do not prepend) a @samp{:} to all global symbols, so the assembly -code can be used with the @code{PREFIX} assembly directive. +@item -mno-abshi +@opindex mno-abshi +Do not use @code{abshi2} pattern. -@item -melf -@opindex melf -Generate an executable in the ELF format, rather than the default -@samp{mmo} format used by the @command{mmix} simulator. +@item -mbranch-expensive +@opindex mbranch-expensive +Pretend that branches are expensive. This is for experimenting with +code generation only. -@item -mbranch-predict -@itemx -mno-branch-predict -@opindex mbranch-predict -@opindex mno-branch-predict -Use (do not use) the probable-branch instructions, when static branch -prediction indicates a probable branch. +@item -mbranch-cheap +@opindex mbranch-cheap +Do not pretend that branches are expensive. This is the default. -@item -mbase-addresses -@itemx -mno-base-addresses -@opindex mbase-addresses -@opindex mno-base-addresses -Generate (do not generate) code that uses @emph{base addresses}. Using a -base address automatically generates a request (handled by the assembler -and the linker) for a constant to be set up in a global register. The -register is used for one or more base address requests within the range 0 -to 255 from the value held in the register. The generally leads to short -and fast code, but the number of different data items that can be -addressed is limited. This means that a program that uses lots of static -data may require @option{-mno-base-addresses}. +@item -munix-asm +@opindex munix-asm +Use Unix assembler syntax. This is the default when configured for +@samp{pdp11-*-bsd}. -@item -msingle-exit -@itemx -mno-single-exit -@opindex msingle-exit -@opindex mno-single-exit -Force (do not force) generated code to have a single exit point in each -function. +@item -mdec-asm +@opindex mdec-asm +Use DEC assembler syntax. 
This is the default when configured for any +PDP-11 target other than @samp{pdp11-*-bsd}. @end table -@node MN10300 Options -@subsection MN10300 Options -@cindex MN10300 options +@node picoChip Options +@subsection picoChip Options +@cindex picoChip options -These @option{-m} options are defined for Matsushita MN10300 architectures: +These @samp{-m} options are defined for picoChip implementations: @table @gcctabopt -@item -mmult-bug -@opindex mmult-bug -Generate code to avoid bugs in the multiply instructions for the MN10300 -processors. This is the default. -@item -mno-mult-bug -@opindex mno-mult-bug -Do not generate code to avoid bugs in the multiply instructions for the -MN10300 processors. +@item -mae=@var{ae_type} +@opindex mcpu +Set the instruction set, register set, and instruction scheduling +parameters for array element type @var{ae_type}. Supported values +for @var{ae_type} are @samp{ANY}, @samp{MUL}, and @samp{MAC}. -@item -mam33 -@opindex mam33 -Generate code using features specific to the AM33 processor. +@option{-mae=ANY} selects a completely generic AE type. Code +generated with this option runs on any of the other AE types. The +code is not as efficient as it would be if compiled for a specific +AE type, and some types of operation (e.g., multiplication) do not +work properly on all types of AE. -@item -mno-am33 -@opindex mno-am33 -Do not generate code using features specific to the AM33 processor. This -is the default. +@option{-mae=MUL} selects a MUL AE type. This is the most useful AE type +for compiled code, and is the default. -@item -mam33-2 -@opindex mam33-2 -Generate code using features specific to the AM33/2.0 processor. +@option{-mae=MAC} selects a DSP-style MAC AE. Code compiled with this +option may suffer from poor performance of byte (char) manipulation, +since the DSP AE does not provide hardware support for byte load/stores. -@item -mam34 -@opindex mam34 -Generate code using features specific to the AM34 processor. 
+@item -msymbol-as-address +Enable the compiler to directly use a symbol name as an address in a +load/store instruction, without first loading it into a +register. Typically, the use of this option generates larger +programs, which run faster than when the option isn't used. However, the +results vary from program to program, so it is left as a user option, +rather than being permanently enabled. -@item -mtune=@var{cpu-type} -@opindex mtune -Use the timing characteristics of the indicated CPU type when -scheduling instructions. This does not change the targeted processor -type. The CPU type must be one of @samp{mn10300}, @samp{am33}, -@samp{am33-2} or @samp{am34}. +@item -mno-inefficient-warnings +Disables warnings about the generation of inefficient code. These +warnings can be generated, for example, when compiling code that +performs byte-level memory operations on the MAC AE type. The MAC AE has +no hardware support for byte-level memory operations, so all byte +load/stores must be synthesized from word load/store operations. This is +inefficient and a warning is generated to indicate +that you should rewrite the code to avoid byte operations, or to target +an AE type that has the necessary hardware support. This option disables +these warnings. -@item -mreturn-pointer-on-d0 -@opindex mreturn-pointer-on-d0 -When generating a function that returns a pointer, return the pointer -in both @code{a0} and @code{d0}. Otherwise, the pointer is returned -only in @code{a0}, and attempts to call such functions without a prototype -result in errors. Note that this option is on by default; use -@option{-mno-return-pointer-on-d0} to disable it. +@end table -@item -mno-crt0 -@opindex mno-crt0 -Do not link in the C run-time initialization object file. 
+@node PowerPC Options +@subsection PowerPC Options +@cindex PowerPC options -@item -mrelax -@opindex mrelax -Indicate to the linker that it should perform a relaxation optimization pass -to shorten branches, calls and absolute memory addresses. This option only -has an effect when used on the command line for the final link step. +These are listed under @ref{RS/6000 and PowerPC Options}. +@node RL78 Options +@subsection RL78 Options +@cindex RL78 Options -This option makes symbolic debugging impossible. +@table @gcctabopt -@item -mliw -@opindex mliw -Allow the compiler to generate @emph{Long Instruction Word} -instructions if the target is the @samp{AM33} or later. This is the -default. This option defines the preprocessor macro @code{__LIW__}. -@item -mnoliw -@opindex mnoliw -Do not allow the compiler to generate @emph{Long Instruction Word} -instructions. This option defines the preprocessor macro -@code{__NO_LIW__}. +@item -msim +@opindex msim +Links in additional target libraries to support operation within a +simulator. -@item -msetlb -@opindex msetlb -Allow the compiler to generate the @emph{SETLB} and @emph{Lcc} -instructions if the target is the @samp{AM33} or later. This is the -default. This option defines the preprocessor macro @code{__SETLB__}. +@item -mmul=none +@itemx -mmul=g13 +@itemx -mmul=rl78 +@opindex mmul +Specifies the type of hardware multiplication support to be used. The +default is @samp{none}, which uses software multiplication functions. +The @samp{g13} option is for the hardware multiply/divide peripheral +only on the RL78/G13 targets. The @samp{rl78} option is for the +standard hardware multiplication defined in the RL78 software manual. -@item -mnosetlb -@opindex mnosetlb -Do not allow the compiler to generate @emph{SETLB} or @emph{Lcc} -instructions. This option defines the preprocessor macro -@code{__NO_SETLB__}.
+@item -m64bit-doubles +@itemx -m32bit-doubles +@opindex m64bit-doubles +@opindex m32bit-doubles +Make the @code{double} data type be 64 bits (@option{-m64bit-doubles}) +or 32 bits (@option{-m32bit-doubles}) in size. The default is +@option{-m32bit-doubles}. @end table -@node Moxie Options -@subsection Moxie Options -@cindex Moxie Options +@node RS/6000 and PowerPC Options +@subsection IBM RS/6000 and PowerPC Options +@cindex RS/6000 and PowerPC Options +@cindex IBM RS/6000 and PowerPC Options +These @samp{-m} options are defined for the IBM RS/6000 and PowerPC: @table @gcctabopt +@item -mpowerpc-gpopt +@itemx -mno-powerpc-gpopt +@itemx -mpowerpc-gfxopt +@itemx -mno-powerpc-gfxopt +@need 800 +@itemx -mpowerpc64 +@itemx -mno-powerpc64 +@itemx -mmfcrf +@itemx -mno-mfcrf +@itemx -mpopcntb +@itemx -mno-popcntb +@itemx -mpopcntd +@itemx -mno-popcntd +@itemx -mfprnd +@itemx -mno-fprnd +@need 800 +@itemx -mcmpb +@itemx -mno-cmpb +@itemx -mmfpgpr +@itemx -mno-mfpgpr +@itemx -mhard-dfp +@itemx -mno-hard-dfp +@opindex mpowerpc-gpopt +@opindex mno-powerpc-gpopt +@opindex mpowerpc-gfxopt +@opindex mno-powerpc-gfxopt +@opindex mpowerpc64 +@opindex mno-powerpc64 +@opindex mmfcrf +@opindex mno-mfcrf +@opindex mpopcntb +@opindex mno-popcntb +@opindex mpopcntd +@opindex mno-popcntd +@opindex mfprnd +@opindex mno-fprnd +@opindex mcmpb +@opindex mno-cmpb +@opindex mmfpgpr +@opindex mno-mfpgpr +@opindex mhard-dfp +@opindex mno-hard-dfp +You use these options to specify which instructions are available on the +processor you are using. The default value of these options is +determined when configuring GCC@. Specifying the +@option{-mcpu=@var{cpu_type}} overrides the specification of these +options. We recommend you use the @option{-mcpu=@var{cpu_type}} option +rather than the options listed above. -@item -meb -@opindex meb -Generate big-endian code. This is the default for @samp{moxie-*-*} -configurations. - -@item -mel -@opindex mel -Generate little-endian code. 
- -@item -mmul.x -@opindex mmul.x -Generate mul.x and umul.x instructions. This is the default for -@samp{moxiebox-*-*} configurations. - -@item -mno-crt0 -@opindex mno-crt0 -Do not link in the C run-time initialization object file. - -@end table +Specifying @option{-mpowerpc-gpopt} allows +GCC to use the optional PowerPC architecture instructions in the +General Purpose group, including floating-point square root. Specifying +@option{-mpowerpc-gfxopt} allows GCC to +use the optional PowerPC architecture instructions in the Graphics +group, including floating-point select. -@node MSP430 Options -@subsection MSP430 Options -@cindex MSP430 Options +The @option{-mmfcrf} option allows GCC to generate the move from +condition register field instruction implemented on the POWER4 +processor and other processors that support the PowerPC V2.01 +architecture. +The @option{-mpopcntb} option allows GCC to generate the popcount and +double-precision FP reciprocal estimate instruction implemented on the +POWER5 processor and other processors that support the PowerPC V2.02 +architecture. +The @option{-mpopcntd} option allows GCC to generate the popcount +instruction implemented on the POWER7 processor and other processors +that support the PowerPC V2.06 architecture. +The @option{-mfprnd} option allows GCC to generate the FP round to +integer instructions implemented on the POWER5+ processor and other +processors that support the PowerPC V2.03 architecture. +The @option{-mcmpb} option allows GCC to generate the compare bytes +instruction implemented on the POWER6 processor and other processors +that support the PowerPC V2.05 architecture. +The @option{-mmfpgpr} option allows GCC to generate the FP move to/from +general-purpose register instructions implemented on the POWER6X +processor and other processors that support the extended PowerPC V2.05 +architecture. 
+The @option{-mhard-dfp} option allows GCC to generate the decimal +floating-point instructions implemented on some POWER processors. -These options are defined for the MSP430: +The @option{-mpowerpc64} option allows GCC to generate the additional +64-bit instructions that are found in the full PowerPC64 architecture +and to treat GPRs as 64-bit, doubleword quantities. GCC defaults to +@option{-mno-powerpc64}. -@table @gcctabopt +@item -mcpu=@var{cpu_type} +@opindex mcpu +Set architecture type, register usage, and +instruction scheduling parameters for machine type @var{cpu_type}. +Supported values for @var{cpu_type} are @samp{401}, @samp{403}, +@samp{405}, @samp{405fp}, @samp{440}, @samp{440fp}, @samp{464}, @samp{464fp}, +@samp{476}, @samp{476fp}, @samp{505}, @samp{601}, @samp{602}, @samp{603}, +@samp{603e}, @samp{604}, @samp{604e}, @samp{620}, @samp{630}, @samp{740}, +@samp{7400}, @samp{7450}, @samp{750}, @samp{801}, @samp{821}, @samp{823}, +@samp{860}, @samp{970}, @samp{8540}, @samp{a2}, @samp{e300c2}, +@samp{e300c3}, @samp{e500mc}, @samp{e500mc64}, @samp{e5500}, +@samp{e6500}, @samp{ec603e}, @samp{G3}, @samp{G4}, @samp{G5}, +@samp{titan}, @samp{power3}, @samp{power4}, @samp{power5}, @samp{power5+}, +@samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8}, @samp{powerpc}, +@samp{powerpc64}, and @samp{rs64}. -@item -masm-hex -@opindex masm-hex -Force assembly output to always use hex constants. Normally such -constants are signed decimals, but this option is available for -testsuite and/or aesthetic purposes. +@option{-mcpu=powerpc}, and @option{-mcpu=powerpc64} specify pure 32-bit +PowerPC and 64-bit PowerPC architecture machine +types, with an appropriate, generic processor model assumed for +scheduling purposes. -@item -mmcu= -@opindex mmcu= -Select the MCU to target. This is used to create a C preprocessor -symbol based upon the MCU name, converted to upper case and pre- and -post-fixed with @samp{__}. 
This in turn is used by the -@file{msp430.h} header file to select an MCU-specific supplementary -header file. +The other options specify a specific processor. Code generated under +those options runs best on that processor, and may not run at all on +others. -The option also sets the ISA to use. If the MCU name is one that is -known to only support the 430 ISA then that is selected, otherwise the -430X ISA is selected. A generic MCU name of @samp{msp430} can also be -used to select the 430 ISA. Similarly the generic @samp{msp430x} MCU -name selects the 430X ISA. +The @option{-mcpu} options automatically enable or disable the +following options: -In addition an MCU-specific linker script is added to the linker -command line. The script's name is the name of the MCU with -@file{.ld} appended. Thus specifying @option{-mmcu=xxx} on the @command{gcc} -command line defines the C preprocessor symbol @code{__XXX__} and -cause the linker to search for a script called @file{xxx.ld}. +@gccoptlist{-maltivec -mfprnd -mhard-float -mmfcrf -mmultiple @gol +-mpopcntb -mpopcntd -mpowerpc64 @gol +-mpowerpc-gpopt -mpowerpc-gfxopt -msingle-float -mdouble-float @gol +-msimple-fpu -mstring -mmulhw -mdlmzb -mmfpgpr -mvsx @gol +-mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol +-mquad-memory -mquad-memory-atomic} -This option is also passed on to the assembler. +The particular options set for any particular CPU varies between +compiler versions, depending on what setting seems to produce optimal +code for that CPU; it doesn't necessarily reflect the actual hardware's +capabilities. If you wish to set an individual option to a particular +value, you may specify it after the @option{-mcpu} option, like +@option{-mcpu=970 -mno-altivec}. -@item -mcpu= -@opindex mcpu= -Specifies the ISA to use. Accepted values are @samp{msp430}, -@samp{msp430x} and @samp{msp430xv2}. This option is deprecated. The -@option{-mmcu=} option should be used to select the ISA. 
+On AIX, the @option{-maltivec} and @option{-mpowerpc64} options are +not enabled or disabled by the @option{-mcpu} option at present because +AIX does not have full support for these options. You may still +enable or disable them individually if you're sure it'll work in your +environment. -@item -msim -@opindex msim -Link to the simulator runtime libraries and linker script. Overrides -any scripts that would be selected by the @option{-mmcu=} option. +@item -mtune=@var{cpu_type} +@opindex mtune +Set the instruction scheduling parameters for machine type +@var{cpu_type}, but do not set the architecture type or register usage, +as @option{-mcpu=@var{cpu_type}} does. The same +values for @var{cpu_type} are used for @option{-mtune} as for +@option{-mcpu}. If both are specified, the code generated uses the +architecture and registers set by @option{-mcpu}, but the +scheduling parameters set by @option{-mtune}. -@item -mlarge -@opindex mlarge -Use large-model addressing (20-bit pointers, 32-bit @code{size_t}). +@item -mcmodel=small +@opindex mcmodel=small +Generate PowerPC64 code for the small model: The TOC is limited to +64k. -@item -msmall -@opindex msmall -Use small-model addressing (16-bit pointers, 16-bit @code{size_t}). +@item -mcmodel=medium +@opindex mcmodel=medium +Generate PowerPC64 code for the medium model: The TOC and other static +data may be up to a total of 4G in size. -@item -mrelax -@opindex mrelax -This option is passed to the assembler and linker, and allows the -linker to perform certain optimizations that cannot be done until -the final link. +@item -mcmodel=large +@opindex mcmodel=large +Generate PowerPC64 code for the large model: The TOC may be up to 4G +in size. Other data and code is only limited by the 64-bit address +space. -@item mhwmult= -@opindex mhwmult= -Describes the type of hardware multiply supported by the target. 
-Accepted values are @samp{none} for no hardware multiply, @samp{16bit} -for the original 16-bit-only multiply supported by early MCUs. -@samp{32bit} for the 16/32-bit multiply supported by later MCUs and -@samp{f5series} for the 16/32-bit multiply supported by F5-series MCUs. -A value of @samp{auto} can also be given. This tells GCC to deduce -the hardware multiply support based upon the MCU name provided by the -@option{-mmcu} option. If no @option{-mmcu} option is specified then -@samp{32bit} hardware multiply support is assumed. @samp{auto} is the -default setting. +@item -maltivec +@itemx -mno-altivec +@opindex maltivec +@opindex mno-altivec +Generate code that uses (does not use) AltiVec instructions, and also +enable the use of built-in functions that allow more direct access to +the AltiVec instruction set. You may also need to set +@option{-mabi=altivec} to adjust the current ABI with AltiVec ABI +enhancements. -Hardware multiplies are normally performed by calling a library -routine. This saves space in the generated code. When compiling at -@option{-O3} or higher however the hardware multiplier is invoked -inline. This makes for bigger, but faster code. +When @option{-maltivec} is used, rather than @option{-maltivec=le} or +@option{-maltivec=be}, the element order for Altivec intrinsics such +as @code{vec_splat}, @code{vec_extract}, and @code{vec_insert} +match array element order corresponding to the endianness of the +target. That is, element zero identifies the leftmost element in a +vector register when targeting a big-endian platform, and identifies +the rightmost element in a vector register when targeting a +little-endian platform. -The hardware multiply routines disable interrupts whilst running and -restore the previous interrupt state when they finish. This makes -them safe to use inside interrupt handlers as well as in normal code. 
+@item -maltivec=be +@opindex maltivec=be +Generate Altivec instructions using big-endian element order, +regardless of whether the target is big- or little-endian. This is +the default when targeting a big-endian platform. -@item -minrt -@opindex minrt -Enable the use of a minimum runtime environment - no static -initializers or constructors. This is intended for memory-constrained -devices. The compiler includes special symbols in some objects -that tell the linker and runtime which code fragments are required. +The element order is used to interpret element numbers in Altivec +intrinsics such as @code{vec_splat}, @code{vec_extract}, and +@code{vec_insert}. By default, these match array element order +corresponding to the endianness for the target. -@end table +@item -maltivec=le +@opindex maltivec=le +Generate Altivec instructions using little-endian element order, +regardless of whether the target is big- or little-endian. This is +the default when targeting a little-endian platform. This option is +currently ignored when targeting a big-endian platform. -@node NDS32 Options -@subsection NDS32 Options -@cindex NDS32 Options - -These options are defined for NDS32 implementations: - -@table @gcctabopt +The element order is used to interpret element numbers in Altivec +intrinsics such as @code{vec_splat}, @code{vec_extract}, and +@code{vec_insert}. By default, these match array element order +corresponding to the endianness for the target. -@item -mbig-endian -@opindex mbig-endian -Generate code in big-endian mode. +@item -mvrsave +@itemx -mno-vrsave +@opindex mvrsave +@opindex mno-vrsave +Generate VRSAVE instructions when generating AltiVec code. -@item -mlittle-endian -@opindex mlittle-endian -Generate code in little-endian mode. +@item -mgen-cell-microcode +@opindex mgen-cell-microcode +Generate Cell microcode instructions. -@item -mreduced-regs -@opindex mreduced-regs -Use reduced-set registers for register allocation. 
+@item -mwarn-cell-microcode +@opindex mwarn-cell-microcode +Warn when a Cell microcode instruction is emitted. An example +of a Cell microcode instruction is a variable shift. -@item -mfull-regs -@opindex mfull-regs -Use full-set registers for register allocation. +@item -msecure-plt +@opindex msecure-plt +Generate code that allows @command{ld} and @command{ld.so} +to build executables and shared +libraries with non-executable @code{.plt} and @code{.got} sections. +This is a PowerPC +32-bit SYSV ABI option. -@item -mcmov -@opindex mcmov -Generate conditional move instructions. +@item -mbss-plt +@opindex mbss-plt +Generate code that uses a BSS @code{.plt} section that @command{ld.so} +fills in, and +requires @code{.plt} and @code{.got} +sections that are both writable and executable. +This is a PowerPC 32-bit SYSV ABI option. -@item -mno-cmov -@opindex mno-cmov -Do not generate conditional move instructions. +@item -misel +@itemx -mno-isel +@opindex misel +@opindex mno-isel +This switch enables or disables the generation of ISEL instructions. -@item -mperf-ext -@opindex mperf-ext -Generate performance extension instructions. +@item -misel=@var{yes/no} +This switch has been deprecated. Use @option{-misel} and +@option{-mno-isel} instead. -@item -mno-perf-ext -@opindex mno-perf-ext -Do not generate performance extension instructions. +@item -mspe +@itemx -mno-spe +@opindex mspe +@opindex mno-spe +This switch enables or disables the generation of SPE simd +instructions. -@item -mv3push -@opindex mv3push -Generate v3 push25/pop25 instructions. +@item -mpaired +@itemx -mno-paired +@opindex mpaired +@opindex mno-paired +This switch enables or disables the generation of PAIRED simd +instructions. -@item -mno-v3push -@opindex mno-v3push -Do not generate v3 push25/pop25 instructions. +@item -mspe=@var{yes/no} +This option has been deprecated. Use @option{-mspe} and +@option{-mno-spe} instead. -@item -m16-bit -@opindex m16-bit -Generate 16-bit instructions. 
+@item -mvsx +@itemx -mno-vsx +@opindex mvsx +@opindex mno-vsx +Generate code that uses (does not use) vector/scalar (VSX) +instructions, and also enable the use of built-in functions that allow +more direct access to the VSX instruction set. -@item -mno-16-bit -@opindex mno-16-bit -Do not generate 16-bit instructions. +@item -mcrypto +@itemx -mno-crypto +@opindex mcrypto +@opindex mno-crypto +Enable (disable) the use of the built-in functions that allow direct +access to the cryptographic instructions that were added in version +2.07 of the PowerPC ISA. -@item -misr-vector-size=@var{num} -@opindex misr-vector-size -Specify the size of each interrupt vector, which must be 4 or 16. +@item -mdirect-move +@itemx -mno-direct-move +@opindex mdirect-move +@opindex mno-direct-move +Generate code that uses (does not use) the instructions to move data +between the general purpose registers and the vector/scalar (VSX) +registers that were added in version 2.07 of the PowerPC ISA. -@item -mcache-block-size=@var{num} -@opindex mcache-block-size -Specify the size of each cache block, -which must be a power of 2 between 4 and 512. +@item -mpower8-fusion +@itemx -mno-power8-fusion +@opindex mpower8-fusion +@opindex mno-power8-fusion +Generate code that keeps (does not keep) some integer operations +adjacent so that the instructions can be fused together on power8 and +later processors. -@item -march=@var{arch} -@opindex march -Specify the name of the target architecture. +@item -mpower8-vector +@itemx -mno-power8-vector +@opindex mpower8-vector +@opindex mno-power8-vector +Generate code that uses (does not use) the vector and scalar +instructions that were added in version 2.07 of the PowerPC ISA. Also +enable the use of built-in functions that allow more direct access to +the vector instructions.
-@item -mcmodel=@var{code-model} -@opindex mcmodel -Set the code model to one of -@table @asis -@item @samp{small} -All the data and read-only data segments must be within 512KB addressing space. -The text segment must be within 16MB addressing space. -@item @samp{medium} -The data segment must be within 512KB while the read-only data segment can be -within 4GB addressing space. The text segment should be still within 16MB -addressing space. -@item @samp{large} -All the text and data segments can be within 4GB addressing space. -@end table +@item -mquad-memory +@itemx -mno-quad-memory +@opindex mquad-memory +@opindex mno-quad-memory +Generate code that uses (does not use) the non-atomic quad word memory +instructions. The @option{-mquad-memory} option requires use of +64-bit mode. -@item -mctor-dtor -@opindex mctor-dtor -Enable constructor/destructor feature. +@item -mquad-memory-atomic +@itemx -mno-quad-memory-atomic +@opindex mquad-memory-atomic +@opindex mno-quad-memory-atomic +Generate code that uses (does not use) the atomic quad word memory +instructions. The @option{-mquad-memory-atomic} option requires use of +64-bit mode. -@item -mrelax -@opindex mrelax -Guide linker to relax instructions. +@item -mupper-regs-df +@itemx -mno-upper-regs-df +@opindex mupper-regs-df +@opindex mno-upper-regs-df +Generate code that uses (does not use) the scalar double precision +instructions that target all 64 registers in the vector/scalar +floating point register set that were added in version 2.06 of the +PowerPC ISA. The @option{-mupper-regs-df} option is turned on by default if you +use any of the @option{-mcpu=power7}, @option{-mcpu=power8}, or +@option{-mvsx} options.
-@end table +@item -mupper-regs-sf +@itemx -mno-upper-regs-sf +@opindex mupper-regs-sf +@opindex mno-upper-regs-sf +Generate code that uses (does not use) the scalar single precision +instructions that target all 64 registers in the vector/scalar +floating point register set that were added in version 2.07 of the +PowerPC ISA. The @option{-mupper-regs-sf} option is turned on by default if you +use either of the @option{-mcpu=power8} or @option{-mpower8-vector} +options. -@node Nios II Options -@subsection Nios II Options -@cindex Nios II options -@cindex Altera Nios II options +@item -mupper-regs +@itemx -mno-upper-regs +@opindex mupper-regs +@opindex mno-upper-regs +Generate code that uses (does not use) the scalar +instructions that target all 64 registers in the vector/scalar +floating point register set, depending on the model of the machine. -These are the options defined for the Altera Nios II processor. +If the @option{-mno-upper-regs} option is used, it turns off both the +@option{-mupper-regs-sf} and @option{-mupper-regs-df} options. -@table @gcctabopt +@item -mfloat-gprs=@var{yes/single/double/no} +@itemx -mfloat-gprs +@opindex mfloat-gprs +This switch enables or disables the generation of floating-point +operations on the general-purpose registers for architectures that +support it. -@item -G @var{num} -@opindex G -@cindex smaller data references -Put global and static objects less than or equal to @var{num} bytes -into the small data or BSS sections instead of the normal data or BSS -sections. The default value of @var{num} is 8. -@item -mgpopt=@var{option} -@item -mgpopt -@itemx -mno-gpopt -@opindex mgpopt -@opindex mno-gpopt -Generate (do not generate) GP-relative accesses. The following -@var{option} names are recognized: +The argument @samp{yes} or @samp{single} enables the use of +single-precision floating-point operations. +The argument @samp{double} enables the use of single and +double-precision floating-point operations.
-@table @samp +The argument @samp{no} disables floating-point operations on the +general-purpose registers. -@item none -Do not generate GP-relative accesses. +This option is currently only available on the MPC854x. -@item local -Generate GP-relative accesses for small data objects that are not -external or weak. Also use GP-relative addressing for objects that -have been explicitly placed in a small data section via a @code{section} -attribute. +@item -m32 +@itemx -m64 +@opindex m32 +@opindex m64 +Generate code for 32-bit or 64-bit environments of Darwin and SVR4 +targets (including GNU/Linux). The 32-bit environment sets int, long +and pointer to 32 bits and generates code that runs on any PowerPC +variant. The 64-bit environment sets int to 32 bits and long and +pointer to 64 bits, and generates code for PowerPC64, as for +@option{-mpowerpc64}. -@item global -As for @samp{local}, but also generate GP-relative accesses for -small data objects that are external or weak. If you use this option, -you must ensure that all parts of your program (including libraries) are -compiled with the same @option{-G} setting. - -@item data -Generate GP-relative accesses for all data objects in the program. If you -use this option, the entire data and BSS segments -of your program must fit in 64K of memory and you must use an appropriate -linker script to allocate them within the addressible range of the -global pointer. - -@item all -Generate GP-relative addresses for function pointers as well as data -pointers. If you use this option, the entire text, data, and BSS segments -of your program must fit in 64K of memory and you must use an appropriate -linker script to allocate them within the addressible range of the -global pointer. 
+@item -mfull-toc +@itemx -mno-fp-in-toc +@itemx -mno-sum-in-toc +@itemx -mminimal-toc +@opindex mfull-toc +@opindex mno-fp-in-toc +@opindex mno-sum-in-toc +@opindex mminimal-toc +Modify generation of the TOC (Table Of Contents), which is created for +every executable file. The @option{-mfull-toc} option is selected by +default. In that case, GCC allocates at least one TOC entry for +each unique non-automatic variable reference in your program. GCC +also places floating-point constants in the TOC@. However, only +16,384 entries are available in the TOC@. -@end table +If you receive a linker error message saying you have overflowed +the available TOC space, you can reduce the amount of TOC space used +with the @option{-mno-fp-in-toc} and @option{-mno-sum-in-toc} options. +@option{-mno-fp-in-toc} prevents GCC from putting floating-point +constants in the TOC and @option{-mno-sum-in-toc} forces GCC to +generate code to calculate the sum of an address and a constant at +run time instead of putting that sum into the TOC@. You may specify one +or both of these options. Each causes GCC to produce very slightly +slower and larger code in exchange for conserving TOC space. -@option{-mgpopt} is equivalent to @option{-mgpopt=local}, and -@option{-mno-gpopt} is equivalent to @option{-mgpopt=none}. +If you still run out of space in the TOC even when you specify both of +these options, specify @option{-mminimal-toc} instead. This option causes +GCC to make only one TOC entry for every file. When you specify this +option, GCC produces code that is slower and larger but which +uses extremely little TOC space. You may wish to use this option +only on files that contain less frequently-executed code. -The default is @option{-mgpopt} except when @option{-fpic} or -@option{-fPIC} is specified to generate position-independent code. -Note that the Nios II ABI does not permit GP-relative accesses from -shared libraries.
+@item -maix64 +@itemx -maix32 +@opindex maix64 +@opindex maix32 +Enable 64-bit AIX ABI and calling convention: 64-bit pointers, 64-bit +@code{long} type, and the infrastructure needed to support them. +Specifying @option{-maix64} implies @option{-mpowerpc64}, +while @option{-maix32} disables the 64-bit ABI and +implies @option{-mno-powerpc64}. GCC defaults to @option{-maix32}. -You may need to specify @option{-mno-gpopt} explicitly when building -programs that include large amounts of small data, including large -GOT data sections. In this case, the 16-bit offset for GP-relative -addressing may not be large enough to allow access to the entire -small data section. +@item -mxl-compat +@itemx -mno-xl-compat +@opindex mxl-compat +@opindex mno-xl-compat +Produce code that conforms more closely to IBM XL compiler semantics +when using AIX-compatible ABI@. Pass floating-point arguments to +prototyped functions beyond the register save area (RSA) on the stack +in addition to argument FPRs. Do not assume that most significant +double in 128-bit long double value is properly rounded when comparing +values and converting to double. Use XL symbol names for long double +support routines. -@item -mel -@itemx -meb -@opindex mel -@opindex meb -Generate little-endian (default) or big-endian (experimental) code, -respectively. +The AIX calling convention was extended but not initially documented to +handle an obscure K&R C case of calling a function that takes the +address of its arguments with fewer arguments than declared. IBM XL +compilers access floating-point arguments that do not fit in the +RSA from the stack when a subroutine is compiled without +optimization. Because always storing floating-point arguments on the +stack is inefficient and rarely needed, this option is not enabled by +default and only is necessary when calling subroutines compiled by IBM +XL compilers without optimization. 
-@item -mbypass-cache -@itemx -mno-bypass-cache -@opindex mno-bypass-cache -@opindex mbypass-cache -Force all load and store instructions to always bypass cache by -using I/O variants of the instructions. The default is not to -bypass the cache. +@item -mpe +@opindex mpe +Support @dfn{IBM RS/6000 SP} @dfn{Parallel Environment} (PE)@. Link an +application written to use message passing with special startup code to +enable the application to run. The system must have PE installed in the +standard location (@file{/usr/lpp/ppe.poe/}), or the @file{specs} file +must be overridden with the @option{-specs=} option to specify the +appropriate directory location. The Parallel Environment does not +support threads, so the @option{-mpe} option and the @option{-pthread} +option are incompatible. -@item -mno-cache-volatile -@itemx -mcache-volatile -@opindex mcache-volatile -@opindex mno-cache-volatile -Volatile memory access bypass the cache using the I/O variants of -the load and store instructions. The default is not to bypass the cache. +@item -malign-natural +@itemx -malign-power +@opindex malign-natural +@opindex malign-power +On AIX, 32-bit Darwin, and 64-bit PowerPC GNU/Linux, the option +@option{-malign-natural} overrides the ABI-defined alignment of larger +types, such as floating-point doubles, on their natural size-based boundary. +The option @option{-malign-power} instructs GCC to follow the ABI-specified +alignment rules. GCC defaults to the standard alignment defined in the ABI@. -@item -mno-fast-sw-div -@itemx -mfast-sw-div -@opindex mno-fast-sw-div -@opindex mfast-sw-div -Do not use table-based fast divide for small numbers. The default -is to use the fast divide at @option{-O3} and above. +On 64-bit Darwin, natural alignment is the default, and @option{-malign-power} +is not supported. 
-@item -mno-hw-mul -@itemx -mhw-mul -@itemx -mno-hw-mulx -@itemx -mhw-mulx -@itemx -mno-hw-div -@itemx -mhw-div -@opindex mno-hw-mul -@opindex mhw-mul -@opindex mno-hw-mulx -@opindex mhw-mulx -@opindex mno-hw-div -@opindex mhw-div -Enable or disable emitting @code{mul}, @code{mulx} and @code{div} family of -instructions by the compiler. The default is to emit @code{mul} -and not emit @code{div} and @code{mulx}. +@item -msoft-float +@itemx -mhard-float +@opindex msoft-float +@opindex mhard-float +Generate code that does not use (uses) the floating-point register set. +Software floating-point emulation is provided if you use the +@option{-msoft-float} option, and pass the option to GCC when linking. -@item -mcustom-@var{insn}=@var{N} -@itemx -mno-custom-@var{insn} -@opindex mcustom-@var{insn} -@opindex mno-custom-@var{insn} -Each @option{-mcustom-@var{insn}=@var{N}} option enables use of a -custom instruction with encoding @var{N} when generating code that uses -@var{insn}. For example, @option{-mcustom-fadds=253} generates custom -instruction 253 for single-precision floating-point add operations instead -of the default behavior of using a library call. +@item -msingle-float +@itemx -mdouble-float +@opindex msingle-float +@opindex mdouble-float +Generate code for single- or double-precision floating-point operations. +@option{-mdouble-float} implies @option{-msingle-float}. -The following values of @var{insn} are supported. Except as otherwise -noted, floating-point operations are expected to be implemented with -normal IEEE 754 semantics and correspond directly to the C operators or the -equivalent GCC built-in functions (@pxref{Other Builtins}). +@item -msimple-fpu +@opindex msimple-fpu +Do not generate @code{sqrt} and @code{div} instructions for hardware +floating-point unit. -Single-precision floating point: -@table @asis +@item -mfpu=@var{name} +@opindex mfpu +Specify type of floating-point unit. 
Valid values for @var{name} are +@samp{sp_lite} (equivalent to @option{-msingle-float -msimple-fpu}), +@samp{dp_lite} (equivalent to @option{-mdouble-float -msimple-fpu}), +@samp{sp_full} (equivalent to @option{-msingle-float}), +and @samp{dp_full} (equivalent to @option{-mdouble-float}). -@item @samp{fadds}, @samp{fsubs}, @samp{fdivs}, @samp{fmuls} -Binary arithmetic operations. +@item -mxilinx-fpu +@opindex mxilinx-fpu +Perform optimizations for the floating-point unit on Xilinx PPC 405/440. -@item @samp{fnegs} -Unary negation. +@item -mmultiple +@itemx -mno-multiple +@opindex mmultiple +@opindex mno-multiple +Generate code that uses (does not use) the load multiple word +instructions and the store multiple word instructions. These +instructions are generated by default on POWER systems, and not +generated on PowerPC systems. Do not use @option{-mmultiple} on little-endian +PowerPC systems, since those instructions do not work when the +processor is in little-endian mode. The exceptions are PPC740 and +PPC750 which permit these instructions in little-endian mode. -@item @samp{fabss} -Unary absolute value. +@item -mstring +@itemx -mno-string +@opindex mstring +@opindex mno-string +Generate code that uses (does not use) the load string instructions +and the store string word instructions to save multiple registers and +do small block moves. These instructions are generated by default on +POWER systems, and not generated on PowerPC systems. Do not use +@option{-mstring} on little-endian PowerPC systems, since those +instructions do not work when the processor is in little-endian mode. +The exceptions are PPC740 and PPC750 which permit these instructions +in little-endian mode. -@item @samp{fcmpeqs}, @samp{fcmpges}, @samp{fcmpgts}, @samp{fcmples}, @samp{fcmplts}, @samp{fcmpnes} -Comparison operations. 
+@item -mupdate +@itemx -mno-update +@opindex mupdate +@opindex mno-update +Generate code that uses (does not use) the load or store instructions +that update the base register to the address of the calculated memory +location. These instructions are generated by default. If you use +@option{-mno-update}, there is a small window between the time that the +stack pointer is updated and the address of the previous frame is +stored, which means code that walks the stack frame across interrupts or +signals may get corrupted data. -@item @samp{fmins}, @samp{fmaxs} -Floating-point minimum and maximum. These instructions are only -generated if @option{-ffinite-math-only} is specified. +@item -mavoid-indexed-addresses +@itemx -mno-avoid-indexed-addresses +@opindex mavoid-indexed-addresses +@opindex mno-avoid-indexed-addresses +Generate code that tries to avoid (not avoid) the use of indexed load +or store instructions. These instructions can incur a performance +penalty on Power6 processors in certain situations, such as when +stepping through large arrays that cross a 16M boundary. This option +is enabled by default when targeting Power6 and disabled otherwise. -@item @samp{fsqrts} -Unary square root operation. +@item -mfused-madd +@itemx -mno-fused-madd +@opindex mfused-madd +@opindex mno-fused-madd +Generate code that uses (does not use) the floating-point multiply and +accumulate instructions. These instructions are generated by default +if hardware floating point is used. The machine-dependent +@option{-mfused-madd} option is now mapped to the machine-independent +@option{-ffp-contract=fast} option, and @option{-mno-fused-madd} is +mapped to @option{-ffp-contract=off}. -@item @samp{fcoss}, @samp{fsins}, @samp{ftans}, @samp{fatans}, @samp{fexps}, @samp{flogs} -Floating-point trigonometric and exponential functions. These instructions -are only generated if @option{-funsafe-math-optimizations} is also specified. 
+@item -mmulhw +@itemx -mno-mulhw +@opindex mmulhw +@opindex mno-mulhw +Generate code that uses (does not use) the half-word multiply and +multiply-accumulate instructions on the IBM 405, 440, 464 and 476 processors. +These instructions are generated by default when targeting those +processors. -@end table +@item -mdlmzb +@itemx -mno-dlmzb +@opindex mdlmzb +@opindex mno-dlmzb +Generate code that uses (does not use) the string-search @samp{dlmzb} +instruction on the IBM 405, 440, 464 and 476 processors. This instruction is +generated by default when targeting those processors. -Double-precision floating point: -@table @asis +@item -mno-bit-align +@itemx -mbit-align +@opindex mno-bit-align +@opindex mbit-align +On System V.4 and embedded PowerPC systems do not (do) force structures +and unions that contain bit-fields to be aligned to the base type of the +bit-field. -@item @samp{faddd}, @samp{fsubd}, @samp{fdivd}, @samp{fmuld} -Binary arithmetic operations. +For example, by default a structure containing nothing but 8 +@code{unsigned} bit-fields of length 1 is aligned to a 4-byte +boundary and has a size of 4 bytes. By using @option{-mno-bit-align}, +the structure is aligned to a 1-byte boundary and is 1 byte in +size. -@item @samp{fnegd} -Unary negation. +@item -mno-strict-align +@itemx -mstrict-align +@opindex mno-strict-align +@opindex mstrict-align +On System V.4 and embedded PowerPC systems do not (do) assume that +unaligned memory references are handled by the system. -@item @samp{fabsd} -Unary absolute value. +@item -mrelocatable +@itemx -mno-relocatable +@opindex mrelocatable +@opindex mno-relocatable +Generate code that allows (does not allow) a static executable to be +relocated to a different address at run time. A simple embedded +PowerPC system loader should relocate the entire contents of +@code{.got2} and 4-byte locations listed in the @code{.fixup} section, +a table of 32-bit addresses generated by this option. 
For this to +work, all objects linked together must be compiled with +@option{-mrelocatable} or @option{-mrelocatable-lib}. +@option{-mrelocatable} code aligns the stack to an 8-byte boundary. -@item @samp{fcmpeqd}, @samp{fcmpged}, @samp{fcmpgtd}, @samp{fcmpled}, @samp{fcmpltd}, @samp{fcmpned} -Comparison operations. +@item -mrelocatable-lib +@itemx -mno-relocatable-lib +@opindex mrelocatable-lib +@opindex mno-relocatable-lib +Like @option{-mrelocatable}, @option{-mrelocatable-lib} generates a +@code{.fixup} section to allow static executables to be relocated at +run time, but @option{-mrelocatable-lib} does not use the smaller stack +alignment of @option{-mrelocatable}. Objects compiled with +@option{-mrelocatable-lib} may be linked with objects compiled with +any combination of the @option{-mrelocatable} options. -@item @samp{fmind}, @samp{fmaxd} -Double-precision minimum and maximum. These instructions are only -generated if @option{-ffinite-math-only} is specified. +@item -mno-toc +@itemx -mtoc +@opindex mno-toc +@opindex mtoc +On System V.4 and embedded PowerPC systems do not (do) assume that +register 2 contains a pointer to a global area pointing to the addresses +used in the program. -@item @samp{fsqrtd} -Unary square root operation. +@item -mlittle +@itemx -mlittle-endian +@opindex mlittle +@opindex mlittle-endian +On System V.4 and embedded PowerPC systems compile code for the +processor in little-endian mode. The @option{-mlittle-endian} option is +the same as @option{-mlittle}. -@item @samp{fcosd}, @samp{fsind}, @samp{ftand}, @samp{fatand}, @samp{fexpd}, @samp{flogd} -Double-precision trigonometric and exponential functions. These instructions -are only generated if @option{-funsafe-math-optimizations} is also specified. +@item -mbig +@itemx -mbig-endian +@opindex mbig +@opindex mbig-endian +On System V.4 and embedded PowerPC systems compile code for the +processor in big-endian mode. The @option{-mbig-endian} option is +the same as @option{-mbig}. 
-@end table +@item -mdynamic-no-pic +@opindex mdynamic-no-pic +On Darwin and Mac OS X systems, compile code so that it is not +relocatable, but that its external references are relocatable. The +resulting code is suitable for applications, but not shared +libraries. -Conversions: -@table @asis -@item @samp{fextsd} -Conversion from single precision to double precision. +@item -msingle-pic-base +@opindex msingle-pic-base +Treat the register used for PIC addressing as read-only, rather than +loading it in the prologue for each function. The runtime system is +responsible for initializing this register with an appropriate value +before execution begins. -@item @samp{ftruncds} -Conversion from double precision to single precision. +@item -mprioritize-restricted-insns=@var{priority} +@opindex mprioritize-restricted-insns +This option controls the priority that is assigned to +dispatch-slot restricted instructions during the second scheduling +pass. The argument @var{priority} takes the value @samp{0}, @samp{1}, +or @samp{2} to assign no, highest, or second-highest (respectively) +priority to dispatch-slot restricted +instructions. -@item @samp{fixsi}, @samp{fixsu}, @samp{fixdi}, @samp{fixdu} -Conversion from floating point to signed or unsigned integer types, with -truncation towards zero. +@item -msched-costly-dep=@var{dependence_type} +@opindex msched-costly-dep +This option controls which dependences are considered costly +by the target during instruction scheduling. The argument +@var{dependence_type} takes one of the following values: -@item @samp{round} -Conversion from single-precision floating point to signed integer, -rounding to the nearest integer and ties away from zero. -This corresponds to the @code{__builtin_lroundf} function when -@option{-fno-math-errno} is used. +@table @asis +@item @samp{no} +No dependence is costly. 
-@item @samp{floatis}, @samp{floatus}, @samp{floatid}, @samp{floatud} -Conversion from signed or unsigned integer types to floating-point types. +@item @samp{all} +All dependences are costly. + +@item @samp{true_store_to_load} +A true dependence from store to load is costly. +@item @samp{store_to_load} +Any dependence from store to load is costly. + +@item @var{number} +Any dependence for which the latency is greater than or equal to +@var{number} is costly. @end table -In addition, all of the following transfer instructions for internal -registers X and Y must be provided to use any of the double-precision -floating-point instructions. Custom instructions taking two -double-precision source operands expect the first operand in the -64-bit register X. The other operand (or only operand of a unary -operation) is given to the custom arithmetic instruction with the -least significant half in source register @var{src1} and the most -significant half in @var{src2}. A custom instruction that returns a -double-precision result returns the most significant 32 bits in the -destination register and the other half in 32-bit register Y. -GCC automatically generates the necessary code sequences to write -register X and/or read register Y when double-precision floating-point -instructions are used. +@item -minsert-sched-nops=@var{scheme} +@opindex minsert-sched-nops +This option controls which NOP insertion scheme is used during +the second scheduling pass. The argument @var{scheme} takes one of the +following values: @table @asis +@item @samp{no} +Don't insert NOPs. -@item @samp{fwrx} -Write @var{src1} into the least significant half of X and @var{src2} into -the most significant half of X. - -@item @samp{fwry} -Write @var{src1} into Y. +@item @samp{pad} +Pad with NOPs any dispatch group that has vacant issue slots, +according to the scheduler's grouping. 
-@item @samp{frdxhi}, @samp{frdxlo} -Read the most or least (respectively) significant half of X and store it in -@var{dest}. +@item @samp{regroup_exact} +Insert NOPs to force costly dependent insns into +separate groups. Insert exactly as many NOPs as needed to force an insn +to a new group, according to the estimated processor grouping. -@item @samp{frdy} -Read the value of Y and store it into @var{dest}. +@item @var{number} +Insert NOPs to force costly dependent insns into +separate groups. Insert @var{number} NOPs to force an insn to a new group. @end table -Note that you can gain more local control over generation of Nios II custom -instructions by using the @code{target("custom-@var{insn}=@var{N}")} -and @code{target("no-custom-@var{insn}")} function attributes -(@pxref{Function Attributes}) -or pragmas (@pxref{Function Specific Option Pragmas}). +@item -mcall-sysv +@opindex mcall-sysv +On System V.4 and embedded PowerPC systems compile code using calling +conventions that adhere to the March 1995 draft of the System V +Application Binary Interface, PowerPC processor supplement. This is the +default unless you configured GCC using @samp{powerpc-*-eabiaix}. -@item -mcustom-fpu-cfg=@var{name} -@opindex mcustom-fpu-cfg +@item -mcall-sysv-eabi +@itemx -mcall-eabi +@opindex mcall-sysv-eabi +@opindex mcall-eabi +Specify both @option{-mcall-sysv} and @option{-meabi} options. -This option enables a predefined, named set of custom instruction encodings -(see @option{-mcustom-@var{insn}} above). -Currently, the following sets are defined: +@item -mcall-sysv-noeabi +@opindex mcall-sysv-noeabi +Specify both @option{-mcall-sysv} and @option{-mno-eabi} options. -@option{-mcustom-fpu-cfg=60-1} is equivalent to: -@gccoptlist{-mcustom-fmuls=252 @gol --mcustom-fadds=253 @gol --mcustom-fsubs=254 @gol --fsingle-precision-constant} +@item -mcall-aixdesc +@opindex m +On System V.4 and embedded PowerPC systems compile code for the AIX +operating system. 
-@option{-mcustom-fpu-cfg=60-2} is equivalent to: -@gccoptlist{-mcustom-fmuls=252 @gol --mcustom-fadds=253 @gol --mcustom-fsubs=254 @gol --mcustom-fdivs=255 @gol --fsingle-precision-constant} +@item -mcall-linux +@opindex mcall-linux +On System V.4 and embedded PowerPC systems compile code for the +Linux-based GNU system. -@option{-mcustom-fpu-cfg=72-3} is equivalent to: -@gccoptlist{-mcustom-floatus=243 @gol --mcustom-fixsi=244 @gol --mcustom-floatis=245 @gol --mcustom-fcmpgts=246 @gol --mcustom-fcmples=249 @gol --mcustom-fcmpeqs=250 @gol --mcustom-fcmpnes=251 @gol --mcustom-fmuls=252 @gol --mcustom-fadds=253 @gol --mcustom-fsubs=254 @gol --mcustom-fdivs=255 @gol --fsingle-precision-constant} +@item -mcall-freebsd +@opindex mcall-freebsd +On System V.4 and embedded PowerPC systems compile code for the +FreeBSD operating system. -Custom instruction assignments given by individual -@option{-mcustom-@var{insn}=} options override those given by -@option{-mcustom-fpu-cfg=}, regardless of the -order of the options on the command line. - -Note that you can gain more local control over selection of a FPU -configuration by using the @code{target("custom-fpu-cfg=@var{name}")} -function attribute (@pxref{Function Attributes}) -or pragma (@pxref{Function Specific Option Pragmas}). +@item -mcall-netbsd +@opindex mcall-netbsd +On System V.4 and embedded PowerPC systems compile code for the +NetBSD operating system. -@end table +@item -mcall-openbsd +@opindex mcall-netbsd +On System V.4 and embedded PowerPC systems compile code for the +OpenBSD operating system. -These additional @samp{-m} options are available for the Altera Nios II -ELF (bare-metal) target: +@item -maix-struct-return +@opindex maix-struct-return +Return all structures in memory (as specified by the AIX ABI)@. -@table @gcctabopt +@item -msvr4-struct-return +@opindex msvr4-struct-return +Return structures smaller than 8 bytes in registers (as specified by the +SVR4 ABI)@. 
-@item -mhal -@opindex mhal -Link with HAL BSP. This suppresses linking with the GCC-provided C runtime -startup and termination code, and is typically used in conjunction with -@option{-msys-crt0=} to specify the location of the alternate startup code -provided by the HAL BSP. +@item -mabi=@var{abi-type} +@opindex mabi +Extend the current ABI with a particular extension, or remove such extension. +Valid values are @samp{altivec}, @samp{no-altivec}, @samp{spe}, +@samp{no-spe}, @samp{ibmlongdouble}, @samp{ieeelongdouble}, +@samp{elfv1}, @samp{elfv2}@. -@item -msmallc -@opindex msmallc -Link with a limited version of the C library, @option{-lsmallc}, rather than -Newlib. +@item -mabi=spe +@opindex mabi=spe +Extend the current ABI with SPE ABI extensions. This does not change +the default ABI, instead it adds the SPE ABI extensions to the current +ABI@. -@item -msys-crt0=@var{startfile} -@opindex msys-crt0 -@var{startfile} is the file name of the startfile (crt0) to use -when linking. This option is only useful in conjunction with @option{-mhal}. +@item -mabi=no-spe +@opindex mabi=no-spe +Disable Book-E SPE ABI extensions for the current ABI@. -@item -msys-lib=@var{systemlib} -@opindex msys-lib -@var{systemlib} is the library name of the library that provides -low-level system calls required by the C library, -e.g. @code{read} and @code{write}. -This option is typically used to link with a library provided by a HAL BSP. +@item -mabi=ibmlongdouble +@opindex mabi=ibmlongdouble +Change the current ABI to use IBM extended-precision long double. +This is a PowerPC 32-bit SYSV ABI option. -@end table +@item -mabi=ieeelongdouble +@opindex mabi=ieeelongdouble +Change the current ABI to use IEEE extended-precision long double. +This is a PowerPC 32-bit Linux ABI option. -@node PDP-11 Options -@subsection PDP-11 Options -@cindex PDP-11 Options +@item -mabi=elfv1 +@opindex mabi=elfv1 +Change the current ABI to use the ELFv1 ABI. 
+This is the default ABI for big-endian PowerPC 64-bit Linux. +Overriding the default ABI requires special system support and is +likely to fail in spectacular ways. -These options are defined for the PDP-11: +@item -mabi=elfv2 +@opindex mabi=elfv2 +Change the current ABI to use the ELFv2 ABI. +This is the default ABI for little-endian PowerPC 64-bit Linux. +Overriding the default ABI requires special system support and is +likely to fail in spectacular ways. -@table @gcctabopt -@item -mfpu -@opindex mfpu -Use hardware FPP floating point. This is the default. (FIS floating -point on the PDP-11/40 is not supported.) +@item -mprototype +@itemx -mno-prototype +@opindex mprototype +@opindex mno-prototype +On System V.4 and embedded PowerPC systems assume that all calls to +variable argument functions are properly prototyped. Otherwise, the +compiler must insert an instruction before every non-prototyped call to +set or clear bit 6 of the condition code register (@code{CR}) to +indicate whether floating-point values are passed in the floating-point +registers in case the function takes variable arguments. With +@option{-mprototype}, only calls to prototyped variable argument functions +set or clear the bit. -@item -msoft-float -@opindex msoft-float -Do not use hardware floating point. +@item -msim +@opindex msim +On embedded PowerPC systems, assume that the startup module is called +@file{sim-crt0.o} and that the standard C libraries are @file{libsim.a} and +@file{libc.a}. This is the default for @samp{powerpc-*-eabisim} +configurations. -@item -mac0 -@opindex mac0 -Return floating-point results in ac0 (fr0 in Unix assembler syntax). +@item -mmvme +@opindex mmvme +On embedded PowerPC systems, assume that the startup module is called +@file{crt0.o} and the standard C libraries are @file{libmvme.a} and +@file{libc.a}. -@item -mno-ac0 -@opindex mno-ac0 -Return floating-point results in memory. This is the default. 
+@item -mads +@opindex mads +On embedded PowerPC systems, assume that the startup module is called +@file{crt0.o} and the standard C libraries are @file{libads.a} and +@file{libc.a}. -@item -m40 -@opindex m40 -Generate code for a PDP-11/40. +@item -myellowknife +@opindex myellowknife +On embedded PowerPC systems, assume that the startup module is called +@file{crt0.o} and the standard C libraries are @file{libyk.a} and +@file{libc.a}. -@item -m45 -@opindex m45 -Generate code for a PDP-11/45. This is the default. +@item -mvxworks +@opindex mvxworks +On System V.4 and embedded PowerPC systems, specify that you are +compiling for a VxWorks system. -@item -m10 -@opindex m10 -Generate code for a PDP-11/10. +@item -memb +@opindex memb +On embedded PowerPC systems, set the @code{PPC_EMB} bit in the ELF flags +header to indicate that @samp{eabi} extended relocations are used. -@item -mbcopy-builtin -@opindex mbcopy-builtin -Use inline @code{movmemhi} patterns for copying memory. This is the -default. +@item -meabi +@itemx -mno-eabi +@opindex meabi +@opindex mno-eabi +On System V.4 and embedded PowerPC systems do (do not) adhere to the +Embedded Applications Binary Interface (EABI), which is a set of +modifications to the System V.4 specifications. Selecting @option{-meabi} +means that the stack is aligned to an 8-byte boundary, a function +@code{__eabi} is called from @code{main} to set up the EABI +environment, and the @option{-msdata} option can use both @code{r2} and +@code{r13} to point to two separate small data areas. Selecting +@option{-mno-eabi} means that the stack is aligned to a 16-byte boundary, +no EABI initialization function is called from @code{main}, and the +@option{-msdata} option only uses @code{r13} to point to a single +small data area. The @option{-meabi} option is on by default if you +configured GCC using one of the @samp{powerpc*-*-eabi*} options. -@item -mbcopy -@opindex mbcopy -Do not use inline @code{movmemhi} patterns for copying memory. 
+@item -msdata=eabi +@opindex msdata=eabi +On System V.4 and embedded PowerPC systems, put small initialized +@code{const} global and static data in the @code{.sdata2} section, which +is pointed to by register @code{r2}. Put small initialized +non-@code{const} global and static data in the @code{.sdata} section, +which is pointed to by register @code{r13}. Put small uninitialized +global and static data in the @code{.sbss} section, which is adjacent to +the @code{.sdata} section. The @option{-msdata=eabi} option is +incompatible with the @option{-mrelocatable} option. The +@option{-msdata=eabi} option also sets the @option{-memb} option. -@item -mint16 -@itemx -mno-int32 -@opindex mint16 -@opindex mno-int32 -Use 16-bit @code{int}. This is the default. +@item -msdata=sysv +@opindex msdata=sysv +On System V.4 and embedded PowerPC systems, put small global and static +data in the @code{.sdata} section, which is pointed to by register +@code{r13}. Put small uninitialized global and static data in the +@code{.sbss} section, which is adjacent to the @code{.sdata} section. +The @option{-msdata=sysv} option is incompatible with the +@option{-mrelocatable} option. -@item -mint32 -@itemx -mno-int16 -@opindex mint32 -@opindex mno-int16 -Use 32-bit @code{int}. +@item -msdata=default +@itemx -msdata +@opindex msdata=default +@opindex msdata +On System V.4 and embedded PowerPC systems, if @option{-meabi} is used, +compile code the same as @option{-msdata=eabi}, otherwise compile code the +same as @option{-msdata=sysv}. -@item -mfloat64 -@itemx -mno-float32 -@opindex mfloat64 -@opindex mno-float32 -Use 64-bit @code{float}. This is the default. +@item -msdata=data +@opindex msdata=data +On System V.4 and embedded PowerPC systems, put small global +data in the @code{.sdata} section. Put small uninitialized global +data in the @code{.sbss} section. Do not use register @code{r13} +to address small data however. 
This is the default behavior unless +other @option{-msdata} options are used. -@item -mfloat32 -@itemx -mno-float64 -@opindex mfloat32 -@opindex mno-float64 -Use 32-bit @code{float}. +@item -msdata=none +@itemx -mno-sdata +@opindex msdata=none +@opindex mno-sdata +On embedded PowerPC systems, put all initialized global and static data +in the @code{.data} section, and all uninitialized data in the +@code{.bss} section. -@item -mabshi -@opindex mabshi -Use @code{abshi2} pattern. This is the default. +@item -mblock-move-inline-limit=@var{num} +@opindex mblock-move-inline-limit +Inline all block moves (such as calls to @code{memcpy} or structure +copies) less than or equal to @var{num} bytes. The minimum value for +@var{num} is 32 bytes on 32-bit targets and 64 bytes on 64-bit +targets. The default value is target-specific. -@item -mno-abshi -@opindex mno-abshi -Do not use @code{abshi2} pattern. +@item -G @var{num} +@opindex G +@cindex smaller data references (PowerPC) +@cindex .sdata/.sdata2 references (PowerPC) +On embedded PowerPC systems, put global and static items less than or +equal to @var{num} bytes into the small data or BSS sections instead of +the normal data or BSS section. By default, @var{num} is 8. The +@option{-G @var{num}} switch is also passed to the linker. +All modules should be compiled with the same @option{-G @var{num}} value. -@item -mbranch-expensive -@opindex mbranch-expensive -Pretend that branches are expensive. This is for experimenting with -code generation only. - -@item -mbranch-cheap -@opindex mbranch-cheap -Do not pretend that branches are expensive. This is the default. +@item -mregnames +@itemx -mno-regnames +@opindex mregnames +@opindex mno-regnames +On System V.4 and embedded PowerPC systems do (do not) emit register +names in the assembly language output using symbolic forms. -@item -munix-asm -@opindex munix-asm -Use Unix assembler syntax. This is the default when configured for -@samp{pdp11-*-bsd}. 
+@item -mlongcall +@itemx -mno-longcall +@opindex mlongcall +@opindex mno-longcall +By default assume that all calls are far away so that a longer and more +expensive calling sequence is required. This is required for calls +farther than 32 megabytes (33,554,432 bytes) from the current location. +A short call is generated if the compiler knows +the call cannot be that far away. This setting can be overridden by +the @code{shortcall} function attribute, or by @code{#pragma +longcall(0)}. -@item -mdec-asm -@opindex mdec-asm -Use DEC assembler syntax. This is the default when configured for any -PDP-11 target other than @samp{pdp11-*-bsd}. -@end table +Some linkers are capable of detecting out-of-range calls and generating +glue code on the fly. On these systems, long calls are unnecessary and +generate slower code. As of this writing, the AIX linker can do this, +as can the GNU linker for PowerPC/64. It is planned to add this feature +to the GNU linker for 32-bit PowerPC systems as well. -@node picoChip Options -@subsection picoChip Options -@cindex picoChip options +On Darwin/PPC systems, @code{#pragma longcall} generates @code{jbsr +callee, L42}, plus a @dfn{branch island} (glue code). The two target +addresses represent the callee and the branch island. The +Darwin/PPC linker prefers the first address and generates a @code{bl +callee} if the PPC @code{bl} instruction reaches the callee directly; +otherwise, the linker generates @code{bl L42} to call the branch +island. The branch island is appended to the body of the +calling function; it computes the full 32-bit address of the callee +and jumps to it. -These @samp{-m} options are defined for picoChip implementations: +On Mach-O (Darwin) systems, this option directs the compiler emit to +the glue for every direct call, and the Darwin linker decides whether +to use or discard it. -@table @gcctabopt +In the future, GCC may ignore all longcall specifications +when the linker is known to generate glue. 
-@item -mae=@var{ae_type} -@opindex mcpu -Set the instruction set, register set, and instruction scheduling -parameters for array element type @var{ae_type}. Supported values -for @var{ae_type} are @samp{ANY}, @samp{MUL}, and @samp{MAC}. +@item -mtls-markers +@itemx -mno-tls-markers +@opindex mtls-markers +@opindex mno-tls-markers +Mark (do not mark) calls to @code{__tls_get_addr} with a relocation +specifying the function argument. The relocation allows the linker to +reliably associate function call with argument setup instructions for +TLS optimization, which in turn allows GCC to better schedule the +sequence. -@option{-mae=ANY} selects a completely generic AE type. Code -generated with this option runs on any of the other AE types. The -code is not as efficient as it would be if compiled for a specific -AE type, and some types of operation (e.g., multiplication) do not -work properly on all types of AE. +@item -pthread +@opindex pthread +Adds support for multithreading with the @dfn{pthreads} library. +This option sets flags for both the preprocessor and linker. -@option{-mae=MUL} selects a MUL AE type. This is the most useful AE type -for compiled code, and is the default. +@item -mrecip +@itemx -mno-recip +@opindex mrecip +This option enables use of the reciprocal estimate and +reciprocal square root estimate instructions with additional +Newton-Raphson steps to increase precision instead of doing a divide or +square root and divide for floating-point arguments. You should use +the @option{-ffast-math} option when using @option{-mrecip} (or at +least @option{-funsafe-math-optimizations}, +@option{-finite-math-only}, @option{-freciprocal-math} and +@option{-fno-trapping-math}). Note that while the throughput of the +sequence is generally higher than the throughput of the non-reciprocal +instruction, the precision of the sequence can be decreased by up to 2 +ulp (i.e.@: the inverse of 1.0 equals 0.99999994) for reciprocal square +roots. 
-@option{-mae=MAC} selects a DSP-style MAC AE. Code compiled with this -option may suffer from poor performance of byte (char) manipulation, -since the DSP AE does not provide hardware support for byte load/stores. +@item -mrecip=@var{opt} +@opindex mrecip=opt +This option controls which reciprocal estimate instructions +may be used. @var{opt} is a comma-separated list of options, which may +be preceded by a @code{!} to invert the option: -@item -msymbol-as-address -Enable the compiler to directly use a symbol name as an address in a -load/store instruction, without first loading it into a -register. Typically, the use of this option generates larger -programs, which run faster than when the option isn't used. However, the -results vary from program to program, so it is left as a user option, -rather than being permanently enabled. +@table @samp -@item -mno-inefficient-warnings -Disables warnings about the generation of inefficient code. These -warnings can be generated, for example, when compiling code that -performs byte-level memory operations on the MAC AE type. The MAC AE has -no hardware support for byte-level memory operations, so all byte -load/stores must be synthesized from word load/store operations. This is -inefficient and a warning is generated to indicate -that you should rewrite the code to avoid byte operations, or to target -an AE type that has the necessary hardware support. This option disables -these warnings. +@item all +Enable all estimate instructions. -@end table +@item default +Enable the default instructions, equivalent to @option{-mrecip}. -@node PowerPC Options -@subsection PowerPC Options -@cindex PowerPC options +@item none +Disable all estimate instructions, equivalent to @option{-mno-recip}. -These are listed under @xref{RS/6000 and PowerPC Options}. +@item div +Enable the reciprocal approximation instructions for both +single and double precision. 
-@node RL78 Options -@subsection RL78 Options -@cindex RL78 Options +@item divf +Enable the single-precision reciprocal approximation instructions. -@table @gcctabopt +@item divd +Enable the double-precision reciprocal approximation instructions. -@item -msim -@opindex msim -Links in additional target libraries to support operation within a -simulator. +@item rsqrt +Enable the reciprocal square root approximation instructions for both +single and double precision. -@item -mmul=none -@itemx -mmul=g13 -@itemx -mmul=rl78 -@opindex mmul -Specifies the type of hardware multiplication support to be used. The -default is @samp{none}, which uses software multiplication functions. -The @samp{g13} option is for the hardware multiply/divide peripheral -only on the RL78/G13 targets. The @samp{rl78} option is for the -standard hardware multiplication defined in the RL78 software manual. +@item rsqrtf +Enable the single-precision reciprocal square root approximation instructions. -@item -m64bit-doubles -@itemx -m32bit-doubles -@opindex m64bit-doubles -@opindex m32bit-doubles -Make the @code{double} data type be 64 bits (@option{-m64bit-doubles}) -or 32 bits (@option{-m32bit-doubles}) in size. The default is -@option{-m32bit-doubles}. +@item rsqrtd +Enable the double-precision reciprocal square root approximation instructions. 
@end table -@node RS/6000 and PowerPC Options -@subsection IBM RS/6000 and PowerPC Options -@cindex RS/6000 and PowerPC Options -@cindex IBM RS/6000 and PowerPC Options - -These @samp{-m} options are defined for the IBM RS/6000 and PowerPC: -@table @gcctabopt -@item -mpowerpc-gpopt -@itemx -mno-powerpc-gpopt -@itemx -mpowerpc-gfxopt -@itemx -mno-powerpc-gfxopt -@need 800 -@itemx -mpowerpc64 -@itemx -mno-powerpc64 -@itemx -mmfcrf -@itemx -mno-mfcrf -@itemx -mpopcntb -@itemx -mno-popcntb -@itemx -mpopcntd -@itemx -mno-popcntd -@itemx -mfprnd -@itemx -mno-fprnd -@need 800 -@itemx -mcmpb -@itemx -mno-cmpb -@itemx -mmfpgpr -@itemx -mno-mfpgpr -@itemx -mhard-dfp -@itemx -mno-hard-dfp -@opindex mpowerpc-gpopt -@opindex mno-powerpc-gpopt -@opindex mpowerpc-gfxopt -@opindex mno-powerpc-gfxopt -@opindex mpowerpc64 -@opindex mno-powerpc64 -@opindex mmfcrf -@opindex mno-mfcrf -@opindex mpopcntb -@opindex mno-popcntb -@opindex mpopcntd -@opindex mno-popcntd -@opindex mfprnd -@opindex mno-fprnd -@opindex mcmpb -@opindex mno-cmpb -@opindex mmfpgpr -@opindex mno-mfpgpr -@opindex mhard-dfp -@opindex mno-hard-dfp -You use these options to specify which instructions are available on the -processor you are using. The default value of these options is -determined when configuring GCC@. Specifying the -@option{-mcpu=@var{cpu_type}} overrides the specification of these -options. We recommend you use the @option{-mcpu=@var{cpu_type}} option -rather than the options listed above. +So, for example, @option{-mrecip=all,!rsqrtd} enables +all of the reciprocal estimate instructions, except for the +@code{FRSQRTE}, @code{XSRSQRTEDP}, and @code{XVRSQRTEDP} instructions +which handle the double-precision reciprocal square root calculations. -Specifying @option{-mpowerpc-gpopt} allows -GCC to use the optional PowerPC architecture instructions in the -General Purpose group, including floating-point square root. 
Specifying -@option{-mpowerpc-gfxopt} allows GCC to -use the optional PowerPC architecture instructions in the Graphics -group, including floating-point select. +@item -mrecip-precision +@itemx -mno-recip-precision +@opindex mrecip-precision +Assume (do not assume) that the reciprocal estimate instructions +provide higher-precision estimates than is mandated by the PowerPC +ABI. Selecting @option{-mcpu=power6}, @option{-mcpu=power7} or +@option{-mcpu=power8} automatically selects @option{-mrecip-precision}. +The double-precision square root estimate instructions are not generated by +default on low-precision machines, since they do not provide an +estimate that converges after three steps. -The @option{-mmfcrf} option allows GCC to generate the move from -condition register field instruction implemented on the POWER4 -processor and other processors that support the PowerPC V2.01 -architecture. -The @option{-mpopcntb} option allows GCC to generate the popcount and -double-precision FP reciprocal estimate instruction implemented on the -POWER5 processor and other processors that support the PowerPC V2.02 -architecture. -The @option{-mpopcntd} option allows GCC to generate the popcount -instruction implemented on the POWER7 processor and other processors -that support the PowerPC V2.06 architecture. -The @option{-mfprnd} option allows GCC to generate the FP round to -integer instructions implemented on the POWER5+ processor and other -processors that support the PowerPC V2.03 architecture. -The @option{-mcmpb} option allows GCC to generate the compare bytes -instruction implemented on the POWER6 processor and other processors -that support the PowerPC V2.05 architecture. -The @option{-mmfpgpr} option allows GCC to generate the FP move to/from -general-purpose register instructions implemented on the POWER6X -processor and other processors that support the extended PowerPC V2.05 -architecture. 
-The @option{-mhard-dfp} option allows GCC to generate the decimal -floating-point instructions implemented on some POWER processors. - -The @option{-mpowerpc64} option allows GCC to generate the additional -64-bit instructions that are found in the full PowerPC64 architecture -and to treat GPRs as 64-bit, doubleword quantities. GCC defaults to -@option{-mno-powerpc64}. +@item -mveclibabi=@var{type} +@opindex mveclibabi +Specifies the ABI type to use for vectorizing intrinsics using an +external library. The only type supported at present is @samp{mass}, +which specifies to use IBM's Mathematical Acceleration Subsystem +(MASS) libraries for vectorizing intrinsics using external libraries. +GCC currently emits calls to @code{acosd2}, @code{acosf4}, +@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4}, +@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4}, +@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4}, +@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4}, +@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4}, +@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4}, +@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4}, +@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4}, +@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4}, +@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4}, +@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2}, +@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2}, +@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code +for power7. Both @option{-ftree-vectorize} and +@option{-funsafe-math-optimizations} must also be enabled. The MASS +libraries must be specified at link time. -@item -mcpu=@var{cpu_type} -@opindex mcpu -Set architecture type, register usage, and -instruction scheduling parameters for machine type @var{cpu_type}. 
-Supported values for @var{cpu_type} are @samp{401}, @samp{403}, -@samp{405}, @samp{405fp}, @samp{440}, @samp{440fp}, @samp{464}, @samp{464fp}, -@samp{476}, @samp{476fp}, @samp{505}, @samp{601}, @samp{602}, @samp{603}, -@samp{603e}, @samp{604}, @samp{604e}, @samp{620}, @samp{630}, @samp{740}, -@samp{7400}, @samp{7450}, @samp{750}, @samp{801}, @samp{821}, @samp{823}, -@samp{860}, @samp{970}, @samp{8540}, @samp{a2}, @samp{e300c2}, -@samp{e300c3}, @samp{e500mc}, @samp{e500mc64}, @samp{e5500}, -@samp{e6500}, @samp{ec603e}, @samp{G3}, @samp{G4}, @samp{G5}, -@samp{titan}, @samp{power3}, @samp{power4}, @samp{power5}, @samp{power5+}, -@samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8}, @samp{powerpc}, -@samp{powerpc64}, and @samp{rs64}. +@item -mfriz +@itemx -mno-friz +@opindex mfriz +Generate (do not generate) the @code{friz} instruction when the +@option{-funsafe-math-optimizations} option is used to optimize +rounding of floating-point values to 64-bit integer and back to floating +point. The @code{friz} instruction does not return the same value if +the floating-point number is too large to fit in an integer. -@option{-mcpu=powerpc}, and @option{-mcpu=powerpc64} specify pure 32-bit -PowerPC and 64-bit PowerPC architecture machine -types, with an appropriate, generic processor model assumed for -scheduling purposes. +@item -mpointers-to-nested-functions +@itemx -mno-pointers-to-nested-functions +@opindex mpointers-to-nested-functions +Generate (do not generate) code to load up the static chain register +(@code{r11}) when calling through a pointer on AIX and 64-bit Linux +systems where a function pointer points to a 3-word descriptor giving +the function address, TOC value to be loaded in register @code{r2}, and +static chain value to be loaded in register @code{r11}. The +@option{-mpointers-to-nested-functions} is on by default. 
You cannot +call through pointers to nested functions or pointers +to functions compiled in other languages that use the static chain if +you use the @option{-mno-pointers-to-nested-functions}. -The other options specify a specific processor. Code generated under -those options runs best on that processor, and may not run at all on -others. +@item -msave-toc-indirect +@itemx -mno-save-toc-indirect +@opindex msave-toc-indirect +Generate (do not generate) code to save the TOC value in the reserved +stack location in the function prologue if the function calls through +a pointer on AIX and 64-bit Linux systems. If the TOC value is not +saved in the prologue, it is saved just before the call through the +pointer. The @option{-mno-save-toc-indirect} option is the default. -The @option{-mcpu} options automatically enable or disable the -following options: +@item -mcompat-align-parm +@itemx -mno-compat-align-parm +@opindex mcompat-align-parm +Generate (do not generate) code to pass structure parameters with a +maximum alignment of 64 bits, for compatibility with older versions +of GCC. -@gccoptlist{-maltivec -mfprnd -mhard-float -mmfcrf -mmultiple @gol --mpopcntb -mpopcntd -mpowerpc64 @gol --mpowerpc-gpopt -mpowerpc-gfxopt -msingle-float -mdouble-float @gol --msimple-fpu -mstring -mmulhw -mdlmzb -mmfpgpr -mvsx @gol --mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol --mquad-memory -mquad-memory-atomic} +Older versions of GCC (prior to 4.9.0) incorrectly did not align a +structure parameter on a 128-bit boundary when that structure contained +a member requiring 128-bit alignment. This is corrected in more +recent versions of GCC. This option may be used to generate code +that is compatible with functions compiled with older versions of +GCC. -The particular options set for any particular CPU varies between -compiler versions, depending on what setting seems to produce optimal -code for that CPU; it doesn't necessarily reflect the actual hardware's -capabilities. 
If you wish to set an individual option to a particular -value, you may specify it after the @option{-mcpu} option, like -@option{-mcpu=970 -mno-altivec}. +The @option{-mno-compat-align-parm} option is the default. +@end table -On AIX, the @option{-maltivec} and @option{-mpowerpc64} options are -not enabled or disabled by the @option{-mcpu} option at present because -AIX does not have full support for these options. You may still -enable or disable them individually if you're sure it'll work in your -environment. +@node RX Options +@subsection RX Options +@cindex RX Options -@item -mtune=@var{cpu_type} -@opindex mtune -Set the instruction scheduling parameters for machine type -@var{cpu_type}, but do not set the architecture type or register usage, -as @option{-mcpu=@var{cpu_type}} does. The same -values for @var{cpu_type} are used for @option{-mtune} as for -@option{-mcpu}. If both are specified, the code generated uses the -architecture and registers set by @option{-mcpu}, but the -scheduling parameters set by @option{-mtune}. +These command-line options are defined for RX targets: -@item -mcmodel=small -@opindex mcmodel=small -Generate PowerPC64 code for the small model: The TOC is limited to -64k. +@table @gcctabopt +@item -m64bit-doubles +@itemx -m32bit-doubles +@opindex m64bit-doubles +@opindex m32bit-doubles +Make the @code{double} data type be 64 bits (@option{-m64bit-doubles}) +or 32 bits (@option{-m32bit-doubles}) in size. The default is +@option{-m32bit-doubles}. @emph{Note} RX floating-point hardware only +works on 32-bit values, which is why the default is +@option{-m32bit-doubles}. -@item -mcmodel=medium -@opindex mcmodel=medium -Generate PowerPC64 code for the medium model: The TOC and other static -data may be up to a total of 4G in size. +@item -fpu +@itemx -nofpu +@opindex fpu +@opindex nofpu +Enables (@option{-fpu}) or disables (@option{-nofpu}) the use of RX +floating-point hardware. 
The default is enabled for the RX600 +series and disabled for the RX200 series. -@item -mcmodel=large -@opindex mcmodel=large -Generate PowerPC64 code for the large model: The TOC may be up to 4G -in size. Other data and code is only limited by the 64-bit address -space. +Floating-point instructions are only generated for 32-bit floating-point +values, however, so the FPU hardware is not used for doubles if the +@option{-m64bit-doubles} option is used. -@item -maltivec -@itemx -mno-altivec -@opindex maltivec -@opindex mno-altivec -Generate code that uses (does not use) AltiVec instructions, and also -enable the use of built-in functions that allow more direct access to -the AltiVec instruction set. You may also need to set -@option{-mabi=altivec} to adjust the current ABI with AltiVec ABI -enhancements. +@emph{Note} If the @option{-fpu} option is enabled then +@option{-funsafe-math-optimizations} is also enabled automatically. +This is because the RX FPU instructions are themselves unsafe. -When @option{-maltivec} is used, rather than @option{-maltivec=le} or -@option{-maltivec=be}, the element order for Altivec intrinsics such -as @code{vec_splat}, @code{vec_extract}, and @code{vec_insert} -match array element order corresponding to the endianness of the -target. That is, element zero identifies the leftmost element in a -vector register when targeting a big-endian platform, and identifies -the rightmost element in a vector register when targeting a -little-endian platform. +@item -mcpu=@var{name} +@opindex mcpu +Selects the type of RX CPU to be targeted. Currently three types are +supported, the generic @samp{RX600} and @samp{RX200} series hardware and +the specific @samp{RX610} CPU. The default is @samp{RX600}. -@item -maltivec=be -@opindex maltivec=be -Generate Altivec instructions using big-endian element order, -regardless of whether the target is big- or little-endian. This is -the default when targeting a big-endian platform. 
+The only difference between @samp{RX600} and @samp{RX610} is that the +@samp{RX610} does not support the @code{MVTIPL} instruction. -The element order is used to interpret element numbers in Altivec -intrinsics such as @code{vec_splat}, @code{vec_extract}, and -@code{vec_insert}. By default, these match array element order -corresponding to the endianness for the target. +The @samp{RX200} series does not have a hardware floating-point unit +and so @option{-nofpu} is enabled by default when this type is +selected. -@item -maltivec=le -@opindex maltivec=le -Generate Altivec instructions using little-endian element order, -regardless of whether the target is big- or little-endian. This is -the default when targeting a little-endian platform. This option is -currently ignored when targeting a big-endian platform. +@item -mbig-endian-data +@itemx -mlittle-endian-data +@opindex mbig-endian-data +@opindex mlittle-endian-data +Store data (but not code) in the big-endian format. The default is +@option{-mlittle-endian-data}, i.e.@: to store data in the little-endian +format. -The element order is used to interpret element numbers in Altivec -intrinsics such as @code{vec_splat}, @code{vec_extract}, and -@code{vec_insert}. By default, these match array element order -corresponding to the endianness for the target. +@item -msmall-data-limit=@var{N} +@opindex msmall-data-limit +Specifies the maximum size in bytes of global and static variables +which can be placed into the small data area. Using the small data +area can lead to smaller and faster code, but the size of area is +limited and it is up to the programmer to ensure that the area does +not overflow. Also when the small data area is used one of the RX's +registers (usually @code{r13}) is reserved for use pointing to this +area, so it is no longer available for use by the compiler. This +could result in slower and/or larger code if variables are pushed onto +the stack instead of being held in this register. 
-@item -mvrsave -@itemx -mno-vrsave -@opindex mvrsave -@opindex mno-vrsave -Generate VRSAVE instructions when generating AltiVec code. +Note, common variables (variables that have not been initialized) and +constants are not placed into the small data area as they are assigned +to other sections in the output executable. -@item -mgen-cell-microcode -@opindex mgen-cell-microcode -Generate Cell microcode instructions. +The default value is zero, which disables this feature. Note, this +feature is not enabled by default with higher optimization levels +(@option{-O2} etc) because of the potentially detrimental effects of +reserving a register. It is up to the programmer to experiment and +discover whether this feature is of benefit to their program. See the +description of the @option{-mpid} option for a description of how the +actual register to hold the small data area pointer is chosen. -@item -mwarn-cell-microcode -@opindex mwarn-cell-microcode -Warn when a Cell microcode instruction is emitted. An example -of a Cell microcode instruction is a variable shift. - -@item -msecure-plt -@opindex msecure-plt -Generate code that allows @command{ld} and @command{ld.so} -to build executables and shared -libraries with non-executable @code{.plt} and @code{.got} sections. -This is a PowerPC -32-bit SYSV ABI option. - -@item -mbss-plt -@opindex mbss-plt -Generate code that uses a BSS @code{.plt} section that @command{ld.so} -fills in, and -requires @code{.plt} and @code{.got} -sections that are both writable and executable. -This is a PowerPC 32-bit SYSV ABI option. - -@item -misel -@itemx -mno-isel -@opindex misel -@opindex mno-isel -This switch enables or disables the generation of ISEL instructions. +@item -msim +@itemx -mno-sim +@opindex msim +@opindex mno-sim +Use the simulator runtime. The default is to use the libgloss +board-specific runtime. -@item -misel=@var{yes/no} -This switch has been deprecated. Use @option{-misel} and -@option{-mno-isel} instead. 
+@item -mas100-syntax +@itemx -mno-as100-syntax +@opindex mas100-syntax +@opindex mno-as100-syntax +When generating assembler output use a syntax that is compatible with +Renesas's AS100 assembler. This syntax can also be handled by the GAS +assembler, but it has some restrictions so it is not generated by default. -@item -mspe -@itemx -mno-spe -@opindex mspe -@opindex mno-spe -This switch enables or disables the generation of SPE simd -instructions. +@item -mmax-constant-size=@var{N} +@opindex mmax-constant-size +Specifies the maximum size, in bytes, of a constant that can be used as +an operand in a RX instruction. Although the RX instruction set does +allow constants of up to 4 bytes in length to be used in instructions, +a longer value equates to a longer instruction. Thus in some +circumstances it can be beneficial to restrict the size of constants +that are used in instructions. Constants that are too big are instead +placed into a constant pool and referenced via register indirection. -@item -mpaired -@itemx -mno-paired -@opindex mpaired -@opindex mno-paired -This switch enables or disables the generation of PAIRED simd -instructions. +The value @var{N} can be between 0 and 4. A value of 0 (the default) +or 4 means that constants of any size are allowed. -@item -mspe=@var{yes/no} -This option has been deprecated. Use @option{-mspe} and -@option{-mno-spe} instead. +@item -mrelax +@opindex mrelax +Enable linker relaxation. Linker relaxation is a process whereby the +linker attempts to reduce the size of a program by finding shorter +versions of various instructions. Disabled by default. -@item -mvsx -@itemx -mno-vsx -@opindex mvsx -@opindex mno-vsx -Generate code that uses (does not use) vector/scalar (VSX) -instructions, and also enable the use of built-in functions that allow -more direct access to the VSX instruction set. +@item -mint-register=@var{N} +@opindex mint-register +Specify the number of registers to reserve for fast interrupt handler +functions. 
The value @var{N} can be between 0 and 4. A value of 1 +means that register @code{r13} is reserved for the exclusive use +of fast interrupt handlers. A value of 2 reserves @code{r13} and +@code{r12}. A value of 3 reserves @code{r13}, @code{r12} and +@code{r11}, and a value of 4 reserves @code{r13} through @code{r10}. +A value of 0, the default, does not reserve any registers. -@item -mcrypto -@itemx -mno-crypto -@opindex mcrypto -@opindex mno-crypto -Enable the use (disable) of the built-in functions that allow direct -access to the cryptographic instructions that were added in version -2.07 of the PowerPC ISA. +@item -msave-acc-in-interrupts +@opindex msave-acc-in-interrupts +Specifies that interrupt handler functions should preserve the +accumulator register. This is only necessary if normal code might use +the accumulator register, for example because it performs 64-bit +multiplications. The default is to ignore the accumulator as this +makes the interrupt handlers faster. -@item -mdirect-move -@itemx -mno-direct-move -@opindex mdirect-move -@opindex mno-direct-move -Generate code that uses (does not use) the instructions to move data -between the general purpose registers and the vector/scalar (VSX) -registers that were added in version 2.07 of the PowerPC ISA. +@item -mpid +@itemx -mno-pid +@opindex mpid +@opindex mno-pid +Enables the generation of position independent data. When enabled any +access to constant data is done via an offset from a base address +held in a register. This allows the location of constant data to be +determined at run time without requiring the executable to be +relocated, which is a benefit to embedded applications with tight +memory constraints. Data that can be modified is not affected by this +option. 
-@item -mpower8-fusion -@itemx -mno-power8-fusion -@opindex mpower8-fusion -@opindex mno-power8-fusion -Generate code that keeps (does not keeps) some integer operations -adjacent so that the instructions can be fused together on power8 and -later processors. +Note, using this feature reserves a register, usually @code{r13}, for +the constant data base address. This can result in slower and/or +larger code, especially in complicated functions. -@item -mpower8-vector -@itemx -mno-power8-vector -@opindex mpower8-vector -@opindex mno-power8-vector -Generate code that uses (does not use) the vector and scalar -instructions that were added in version 2.07 of the PowerPC ISA. Also -enable the use of built-in functions that allow more direct access to -the vector instructions. +The actual register chosen to hold the constant data base address +depends upon whether the @option{-msmall-data-limit} and/or the +@option{-mint-register} command-line options are enabled. Starting +with register @code{r13} and proceeding downwards, registers are +allocated first to satisfy the requirements of @option{-mint-register}, +then @option{-mpid} and finally @option{-msmall-data-limit}. Thus it +is possible for the small data area register to be @code{r8} if both +@option{-mint-register=4} and @option{-mpid} are specified on the +command line. -@item -mquad-memory -@itemx -mno-quad-memory -@opindex mquad-memory -@opindex mno-quad-memory -Generate code that uses (does not use) the non-atomic quad word memory -instructions. The @option{-mquad-memory} option requires use of -64-bit mode. +By default this feature is not enabled. The default can be restored +via the @option{-mno-pid} command-line option. -@item -mquad-memory-atomic -@itemx -mno-quad-memory-atomic -@opindex mquad-memory-atomic -@opindex mno-quad-memory-atomic -Generate code that uses (does not use) the atomic quad word memory -instructions. The @option{-mquad-memory-atomic} option requires use of -64-bit mode. 
+@item -mno-warn-multiple-fast-interrupts +@itemx -mwarn-multiple-fast-interrupts +@opindex mno-warn-multiple-fast-interrupts +@opindex mwarn-multiple-fast-interrupts +Prevents GCC from issuing a warning message if it finds more than one +fast interrupt handler when it is compiling a file. The default is to +issue a warning for each extra fast interrupt handler found, as the RX +only supports one such interrupt. -@item -mupper-regs-df -@itemx -mno-upper-regs-df -@opindex mupper-regs-df -@opindex mno-upper-regs-df -Generate code that uses (does not use) the scalar double precision -instructions that target all 64 registers in the vector/scalar -floating point register set that were added in version 2.06 of the -PowerPC ISA. The @option{-mupper-regs-df} turned on by default if you -use either of the @option{-mcpu=power7}, @option{-mcpu=power8}, or -@option{-mvsx} options. +@end table -@item -mupper-regs-sf -@itemx -mno-upper-regs-sf -@opindex mupper-regs-sf -@opindex mno-upper-regs-sf -Generate code that uses (does not use) the scalar single precision -instructions that target all 64 registers in the vector/scalar -floating point register set that were added in version 2.07 of the -PowerPC ISA. The @option{-mupper-regs-sf} turned on by default if you -use either of the @option{-mcpu=power8}, or @option{-mpower8-vector} +@emph{Note:} The generic GCC command-line option @option{-ffixed-@var{reg}} +has special significance to the RX port when used with the +@code{interrupt} function attribute. This attribute indicates a +function intended to process fast interrupts. GCC ensures +that it only uses the registers @code{r10}, @code{r11}, @code{r12} +and/or @code{r13} and only provided that the normal use of the +corresponding registers have been restricted via the +@option{-ffixed-@var{reg}} or @option{-mint-register} command-line options. 
-@item -mupper-regs -@itemx -mno-upper-regs -@opindex mupper-regs -@opindex mno-upper-regs -Generate code that uses (does not use) the scalar -instructions that target all 64 registers in the vector/scalar -floating point register set, depending on the model of the machine. - -If the @option{-mno-upper-regs} option is used, it turns off both -@option{-mupper-regs-sf} and @option{-mupper-regs-df} options. - -@item -mfloat-gprs=@var{yes/single/double/no} -@itemx -mfloat-gprs -@opindex mfloat-gprs -This switch enables or disables the generation of floating-point -operations on the general-purpose registers for architectures that -support it. +@node S/390 and zSeries Options +@subsection S/390 and zSeries Options +@cindex S/390 and zSeries Options -The argument @samp{yes} or @samp{single} enables the use of -single-precision floating-point operations. +These are the @samp{-m} options defined for the S/390 and zSeries architecture. -The argument @samp{double} enables the use of single and -double-precision floating-point operations. +@table @gcctabopt +@item -mhard-float +@itemx -msoft-float +@opindex mhard-float +@opindex msoft-float +Use (do not use) the hardware floating-point instructions and registers +for floating-point operations. When @option{-msoft-float} is specified, +functions in @file{libgcc.a} are used to perform floating-point +operations. When @option{-mhard-float} is specified, the compiler +generates IEEE floating-point instructions. This is the default. -The argument @samp{no} disables floating-point operations on the -general-purpose registers. +@item -mhard-dfp +@itemx -mno-hard-dfp +@opindex mhard-dfp +@opindex mno-hard-dfp +Use (do not use) the hardware decimal-floating-point instructions for +decimal-floating-point operations. When @option{-mno-hard-dfp} is +specified, functions in @file{libgcc.a} are used to perform +decimal-floating-point operations. 
When @option{-mhard-dfp} is +specified, the compiler generates decimal-floating-point hardware +instructions. This is the default for @option{-march=z9-ec} or higher. -This option is currently only available on the MPC854x. +@item -mlong-double-64 +@itemx -mlong-double-128 +@opindex mlong-double-64 +@opindex mlong-double-128 +These switches control the size of @code{long double} type. A size +of 64 bits makes the @code{long double} type equivalent to the @code{double} +type. This is the default. -@item -m32 -@itemx -m64 -@opindex m32 -@opindex m64 -Generate code for 32-bit or 64-bit environments of Darwin and SVR4 -targets (including GNU/Linux). The 32-bit environment sets int, long -and pointer to 32 bits and generates code that runs on any PowerPC -variant. The 64-bit environment sets int to 32 bits and long and -pointer to 64 bits, and generates code for PowerPC64, as for -@option{-mpowerpc64}. +@item -mbackchain +@itemx -mno-backchain +@opindex mbackchain +@opindex mno-backchain +Store (do not store) the address of the caller's frame as backchain pointer +into the callee's stack frame. +A backchain may be needed to allow debugging using tools that do not understand +DWARF 2 call frame information. +When @option{-mno-packed-stack} is in effect, the backchain pointer is stored +at the bottom of the stack frame; when @option{-mpacked-stack} is in effect, +the backchain is placed into the topmost word of the 96/160 byte register +save area. -@item -mfull-toc -@itemx -mno-fp-in-toc -@itemx -mno-sum-in-toc -@itemx -mminimal-toc -@opindex mfull-toc -@opindex mno-fp-in-toc -@opindex mno-sum-in-toc -@opindex mminimal-toc -Modify generation of the TOC (Table Of Contents), which is created for -every executable file. The @option{-mfull-toc} option is selected by -default. In that case, GCC allocates at least one TOC entry for -each unique non-automatic variable reference in your program. GCC -also places floating-point constants in the TOC@. 
However, only -16,384 entries are available in the TOC@. +In general, code compiled with @option{-mbackchain} is call-compatible with +code compiled with @option{-mno-backchain}; however, use of the backchain +for debugging purposes usually requires that the whole binary is built with +@option{-mbackchain}. Note that the combination of @option{-mbackchain}, +@option{-mpacked-stack} and @option{-mhard-float} is not supported. In order +to build a linux kernel use @option{-msoft-float}. -If you receive a linker error message that saying you have overflowed -the available TOC space, you can reduce the amount of TOC space used -with the @option{-mno-fp-in-toc} and @option{-mno-sum-in-toc} options. -@option{-mno-fp-in-toc} prevents GCC from putting floating-point -constants in the TOC and @option{-mno-sum-in-toc} forces GCC to -generate code to calculate the sum of an address and a constant at -run time instead of putting that sum into the TOC@. You may specify one -or both of these options. Each causes GCC to produce very slightly -slower and larger code at the expense of conserving TOC space. +The default is to not maintain the backchain. -If you still run out of space in the TOC even when you specify both of -these options, specify @option{-mminimal-toc} instead. This option causes -GCC to make only one TOC entry for every file. When you specify this -option, GCC produces code that is slower and larger but which -uses extremely little TOC space. You may wish to use this option -only on files that contain less frequently-executed code. +@item -mpacked-stack +@itemx -mno-packed-stack +@opindex mpacked-stack +@opindex mno-packed-stack +Use (do not use) the packed stack layout. When @option{-mno-packed-stack} is +specified, the compiler uses all fields of the 96/160 byte register save +area only for their default purpose; unused fields still take up stack space.
+When @option{-mpacked-stack} is specified, register save slots are densely +packed at the top of the register save area; unused space is reused for other +purposes, allowing for more efficient use of the available stack space. +However, when @option{-mbackchain} is also in effect, the topmost word of +the save area is always used to store the backchain, and the return address +register is always saved two words below the backchain. -@item -maix64 -@itemx -maix32 -@opindex maix64 -@opindex maix32 -Enable 64-bit AIX ABI and calling convention: 64-bit pointers, 64-bit -@code{long} type, and the infrastructure needed to support them. -Specifying @option{-maix64} implies @option{-mpowerpc64}, -while @option{-maix32} disables the 64-bit ABI and -implies @option{-mno-powerpc64}. GCC defaults to @option{-maix32}. +As long as the stack frame backchain is not used, code generated with +@option{-mpacked-stack} is call-compatible with code generated with +@option{-mno-packed-stack}. Note that some non-FSF releases of GCC 2.95 for +S/390 or zSeries generated code that uses the stack frame backchain at run +time, not just for debugging purposes. Such code is not call-compatible +with code compiled with @option{-mpacked-stack}. Also, note that the +combination of @option{-mbackchain}, +@option{-mpacked-stack} and @option{-mhard-float} is not supported. In order +to build a linux kernel use @option{-msoft-float}. -@item -mxl-compat -@itemx -mno-xl-compat -@opindex mxl-compat -@opindex mno-xl-compat -Produce code that conforms more closely to IBM XL compiler semantics -when using AIX-compatible ABI@. Pass floating-point arguments to -prototyped functions beyond the register save area (RSA) on the stack -in addition to argument FPRs. Do not assume that most significant -double in 128-bit long double value is properly rounded when comparing -values and converting to double. Use XL symbol names for long double -support routines. +The default is to not use the packed stack layout. 
-The AIX calling convention was extended but not initially documented to -handle an obscure K&R C case of calling a function that takes the -address of its arguments with fewer arguments than declared. IBM XL -compilers access floating-point arguments that do not fit in the -RSA from the stack when a subroutine is compiled without -optimization. Because always storing floating-point arguments on the -stack is inefficient and rarely needed, this option is not enabled by -default and only is necessary when calling subroutines compiled by IBM -XL compilers without optimization. +@item -msmall-exec +@itemx -mno-small-exec +@opindex msmall-exec +@opindex mno-small-exec +Generate (or do not generate) code using the @code{bras} instruction +to do subroutine calls. +This only works reliably if the total executable size does not +exceed 64k. The default is to use the @code{basr} instruction instead, +which does not have this limitation. -@item -mpe -@opindex mpe -Support @dfn{IBM RS/6000 SP} @dfn{Parallel Environment} (PE)@. Link an -application written to use message passing with special startup code to -enable the application to run. The system must have PE installed in the -standard location (@file{/usr/lpp/ppe.poe/}), or the @file{specs} file -must be overridden with the @option{-specs=} option to specify the -appropriate directory location. The Parallel Environment does not -support threads, so the @option{-mpe} option and the @option{-pthread} -option are incompatible. +@item -m64 +@itemx -m31 +@opindex m64 +@opindex m31 +When @option{-m31} is specified, generate code compliant to the +GNU/Linux for S/390 ABI@. When @option{-m64} is specified, generate +code compliant to the GNU/Linux for zSeries ABI@. This allows GCC in +particular to generate 64-bit instructions. For the @samp{s390} +targets, the default is @option{-m31}, while the @samp{s390x} +targets default to @option{-m64}. 
-@item -malign-natural -@itemx -malign-power -@opindex malign-natural -@opindex malign-power -On AIX, 32-bit Darwin, and 64-bit PowerPC GNU/Linux, the option -@option{-malign-natural} overrides the ABI-defined alignment of larger -types, such as floating-point doubles, on their natural size-based boundary. -The option @option{-malign-power} instructs GCC to follow the ABI-specified -alignment rules. GCC defaults to the standard alignment defined in the ABI@. +@item -mzarch +@itemx -mesa +@opindex mzarch +@opindex mesa +When @option{-mzarch} is specified, generate code using the +instructions available on z/Architecture. +When @option{-mesa} is specified, generate code using the +instructions available on ESA/390. Note that @option{-mesa} is +not possible with @option{-m64}. +When generating code compliant to the GNU/Linux for S/390 ABI, +the default is @option{-mesa}. When generating code compliant +to the GNU/Linux for zSeries ABI, the default is @option{-mzarch}. -On 64-bit Darwin, natural alignment is the default, and @option{-malign-power} -is not supported. +@item -mmvcle +@itemx -mno-mvcle +@opindex mmvcle +@opindex mno-mvcle +Generate (or do not generate) code using the @code{mvcle} instruction +to perform block moves. When @option{-mno-mvcle} is specified, +use a @code{mvc} loop instead. This is the default unless optimizing for +size. -@item -msoft-float -@itemx -mhard-float -@opindex msoft-float -@opindex mhard-float -Generate code that does not use (uses) the floating-point register set. -Software floating-point emulation is provided if you use the -@option{-msoft-float} option, and pass the option to GCC when linking. +@item -mdebug +@itemx -mno-debug +@opindex mdebug +@opindex mno-debug +Print (or do not print) additional debug information when compiling. +The default is to not print debug information. 
-@item -msingle-float -@itemx -mdouble-float -@opindex msingle-float -@opindex mdouble-float -Generate code for single- or double-precision floating-point operations. -@option{-mdouble-float} implies @option{-msingle-float}. +@item -march=@var{cpu-type} +@opindex march +Generate code that runs on @var{cpu-type}, which is the name of a system +representing a certain processor type. Possible values for +@var{cpu-type} are @samp{g5}, @samp{g6}, @samp{z900}, @samp{z990}, +@samp{z9-109}, @samp{z9-ec} and @samp{z10}. +When generating code using the instructions available on z/Architecture, +the default is @option{-march=z900}. Otherwise, the default is +@option{-march=g5}. -@item -msimple-fpu -@opindex msimple-fpu -Do not generate @code{sqrt} and @code{div} instructions for hardware -floating-point unit. +@item -mtune=@var{cpu-type} +@opindex mtune +Tune to @var{cpu-type} everything applicable about the generated code, +except for the ABI and the set of available instructions. +The list of @var{cpu-type} values is the same as for @option{-march}. +The default is the value used for @option{-march}. -@item -mfpu=@var{name} -@opindex mfpu -Specify type of floating-point unit. Valid values for @var{name} are -@samp{sp_lite} (equivalent to @option{-msingle-float -msimple-fpu}), -@samp{dp_lite} (equivalent to @option{-mdouble-float -msimple-fpu}), -@samp{sp_full} (equivalent to @option{-msingle-float}), -and @samp{dp_full} (equivalent to @option{-mdouble-float}). +@item -mtpf-trace +@itemx -mno-tpf-trace +@opindex mtpf-trace +@opindex mno-tpf-trace +Generate code that adds (does not add) in TPF OS specific branches to trace +routines in the operating system. This option is off by default, even +when compiling for the TPF OS@. -@item -mxilinx-fpu -@opindex mxilinx-fpu -Perform optimizations for the floating-point unit on Xilinx PPC 405/440. 
+@item -mfused-madd +@itemx -mno-fused-madd +@opindex mfused-madd +@opindex mno-fused-madd +Generate code that uses (does not use) the floating-point multiply and +accumulate instructions. These instructions are generated by default if +hardware floating point is used. -@item -mmultiple -@itemx -mno-multiple -@opindex mmultiple -@opindex mno-multiple -Generate code that uses (does not use) the load multiple word -instructions and the store multiple word instructions. These -instructions are generated by default on POWER systems, and not -generated on PowerPC systems. Do not use @option{-mmultiple} on little-endian -PowerPC systems, since those instructions do not work when the -processor is in little-endian mode. The exceptions are PPC740 and -PPC750 which permit these instructions in little-endian mode. +@item -mwarn-framesize=@var{framesize} +@opindex mwarn-framesize +Emit a warning if the current function exceeds the given frame size. Because +this is a compile-time check, a warning does not necessarily indicate a real +problem when the program runs. It is intended to identify functions that most +probably cause a stack overflow. It is useful in an environment with limited stack +size e.g.@: the linux kernel. -@item -mstring -@itemx -mno-string -@opindex mstring -@opindex mno-string -Generate code that uses (does not use) the load string instructions -and the store string word instructions to save multiple registers and -do small block moves. These instructions are generated by default on -POWER systems, and not generated on PowerPC systems. Do not use -@option{-mstring} on little-endian PowerPC systems, since those -instructions do not work when the processor is in little-endian mode. -The exceptions are PPC740 and PPC750 which permit these instructions -in little-endian mode.
- -@item -mupdate -@itemx -mno-update -@opindex mupdate -@opindex mno-update -Generate code that uses (does not use) the load or store instructions -that update the base register to the address of the calculated memory -location. These instructions are generated by default. If you use -@option{-mno-update}, there is a small window between the time that the -stack pointer is updated and the address of the previous frame is -stored, which means code that walks the stack frame across interrupts or -signals may get corrupted data. +@item -mwarn-dynamicstack +@opindex mwarn-dynamicstack +Emit a warning if the function calls @code{alloca} or uses dynamically-sized +arrays. This is generally a bad idea with a limited stack size. -@item -mavoid-indexed-addresses -@itemx -mno-avoid-indexed-addresses -@opindex mavoid-indexed-addresses -@opindex mno-avoid-indexed-addresses -Generate code that tries to avoid (not avoid) the use of indexed load -or store instructions. These instructions can incur a performance -penalty on Power6 processors in certain situations, such as when -stepping through large arrays that cross a 16M boundary. This option -is enabled by default when targeting Power6 and disabled otherwise. +@item -mstack-guard=@var{stack-guard} +@itemx -mstack-size=@var{stack-size} +@opindex mstack-guard +@opindex mstack-size +If these options are provided the S/390 back end emits additional instructions in +the function prologue that trigger a trap if the stack size is @var{stack-guard} +bytes above the @var{stack-size} (remember that the stack on S/390 grows downward). +If the @var{stack-guard} option is omitted the smallest power of 2 larger than +the frame size of the compiled function is chosen. +These options are intended to be used to help debugging stack overflow problems. +The additionally emitted code causes only little overhead and hence can also be +used in production-like systems without greater performance degradation. 
The given +values have to be exact powers of 2 and @var{stack-size} has to be greater than +@var{stack-guard} without exceeding 64k. +In order to be efficient the extra code makes the assumption that the stack starts +at an address aligned to the value given by @var{stack-size}. +The @var{stack-guard} option can only be used in conjunction with @var{stack-size}. -@item -mfused-madd -@itemx -mno-fused-madd -@opindex mfused-madd -@opindex mno-fused-madd -Generate code that uses (does not use) the floating-point multiply and -accumulate instructions. These instructions are generated by default -if hardware floating point is used. The machine-dependent -@option{-mfused-madd} option is now mapped to the machine-independent -@option{-ffp-contract=fast} option, and @option{-mno-fused-madd} is -mapped to @option{-ffp-contract=off}. +@item -mhotpatch=@var{pre-halfwords},@var{post-halfwords} +@opindex mhotpatch +If the hotpatch option is enabled, a ``hot-patching'' function +prologue is generated for all functions in the compilation unit. +The function label is prepended with the given number of two-byte +NOP instructions (@var{pre-halfwords}, maximum 1000000). After +the label, 2 * @var{post-halfwords} bytes are appended, using the +largest NOP-like instructions the architecture allows (maximum +1000000). -@item -mmulhw -@itemx -mno-mulhw -@opindex mmulhw -@opindex mno-mulhw -Generate code that uses (does not use) the half-word multiply and -multiply-accumulate instructions on the IBM 405, 440, 464 and 476 processors. -These instructions are generated by default when targeting those -processors. +If both arguments are zero, hotpatching is disabled. -@item -mdlmzb -@itemx -mno-dlmzb -@opindex mdlmzb -@opindex mno-dlmzb -Generate code that uses (does not use) the string-search @samp{dlmzb} -instruction on the IBM 405, 440, 464 and 476 processors. This instruction is -generated by default when targeting those processors.
+This option can be overridden for individual functions with the +@code{hotpatch} attribute. +@end table -@item -mno-bit-align -@itemx -mbit-align -@opindex mno-bit-align -@opindex mbit-align -On System V.4 and embedded PowerPC systems do not (do) force structures -and unions that contain bit-fields to be aligned to the base type of the -bit-field. +@node Score Options +@subsection Score Options +@cindex Score Options -For example, by default a structure containing nothing but 8 -@code{unsigned} bit-fields of length 1 is aligned to a 4-byte -boundary and has a size of 4 bytes. By using @option{-mno-bit-align}, -the structure is aligned to a 1-byte boundary and is 1 byte in -size. +These options are defined for Score implementations: -@item -mno-strict-align -@itemx -mstrict-align -@opindex mno-strict-align -@opindex mstrict-align -On System V.4 and embedded PowerPC systems do not (do) assume that -unaligned memory references are handled by the system. +@table @gcctabopt +@item -meb +@opindex meb +Compile code for big-endian mode. This is the default. -@item -mrelocatable -@itemx -mno-relocatable -@opindex mrelocatable -@opindex mno-relocatable -Generate code that allows (does not allow) a static executable to be -relocated to a different address at run time. A simple embedded -PowerPC system loader should relocate the entire contents of -@code{.got2} and 4-byte locations listed in the @code{.fixup} section, -a table of 32-bit addresses generated by this option. For this to -work, all objects linked together must be compiled with -@option{-mrelocatable} or @option{-mrelocatable-lib}. -@option{-mrelocatable} code aligns the stack to an 8-byte boundary. +@item -mel +@opindex mel +Compile code for little-endian mode. 
-@item -mrelocatable-lib -@itemx -mno-relocatable-lib -@opindex mrelocatable-lib -@opindex mno-relocatable-lib -Like @option{-mrelocatable}, @option{-mrelocatable-lib} generates a -@code{.fixup} section to allow static executables to be relocated at -run time, but @option{-mrelocatable-lib} does not use the smaller stack -alignment of @option{-mrelocatable}. Objects compiled with -@option{-mrelocatable-lib} may be linked with objects compiled with -any combination of the @option{-mrelocatable} options. +@item -mnhwloop +@opindex mnhwloop +Disable generation of @code{bcnz} instructions. -@item -mno-toc -@itemx -mtoc -@opindex mno-toc -@opindex mtoc -On System V.4 and embedded PowerPC systems do not (do) assume that -register 2 contains a pointer to a global area pointing to the addresses -used in the program. +@item -muls +@opindex muls +Enable generation of unaligned load and store instructions. -@item -mlittle -@itemx -mlittle-endian -@opindex mlittle -@opindex mlittle-endian -On System V.4 and embedded PowerPC systems compile code for the -processor in little-endian mode. The @option{-mlittle-endian} option is -the same as @option{-mlittle}. +@item -mmac +@opindex mmac +Enable the use of multiply-accumulate instructions. Disabled by default. -@item -mbig -@itemx -mbig-endian -@opindex mbig -@opindex mbig-endian -On System V.4 and embedded PowerPC systems compile code for the -processor in big-endian mode. The @option{-mbig-endian} option is -the same as @option{-mbig}. +@item -mscore5 +@opindex mscore5 +Specify the SCORE5 as the target architecture. -@item -mdynamic-no-pic -@opindex mdynamic-no-pic -On Darwin and Mac OS X systems, compile code so that it is not -relocatable, but that its external references are relocatable. The -resulting code is suitable for applications, but not shared -libraries. +@item -mscore5u +@opindex mscore5u +Specify the SCORE5U as the target architecture.
-@item -msingle-pic-base -@opindex msingle-pic-base -Treat the register used for PIC addressing as read-only, rather than -loading it in the prologue for each function. The runtime system is -responsible for initializing this register with an appropriate value -before execution begins. +@item -mscore7 +@opindex mscore7 +Specify the SCORE7 as the target architecture. This is the default. -@item -mprioritize-restricted-insns=@var{priority} -@opindex mprioritize-restricted-insns -This option controls the priority that is assigned to -dispatch-slot restricted instructions during the second scheduling -pass. The argument @var{priority} takes the value @samp{0}, @samp{1}, -or @samp{2} to assign no, highest, or second-highest (respectively) -priority to dispatch-slot restricted -instructions. +@item -mscore7d +@opindex mscore7d +Specify the SCORE7D as the target architecture. +@end table -@item -msched-costly-dep=@var{dependence_type} -@opindex msched-costly-dep -This option controls which dependences are considered costly -by the target during instruction scheduling. The argument -@var{dependence_type} takes one of the following values: +@node SH Options +@subsection SH Options -@table @asis -@item @samp{no} -No dependence is costly. +These @samp{-m} options are defined for the SH implementations: -@item @samp{all} -All dependences are costly. +@table @gcctabopt +@item -m1 +@opindex m1 +Generate code for the SH1. -@item @samp{true_store_to_load} -A true dependence from store to load is costly. +@item -m2 +@opindex m2 +Generate code for the SH2. -@item @samp{store_to_load} -Any dependence from store to load is costly. +@item -m2e +Generate code for the SH2e. -@item @var{number} -Any dependence for which the latency is greater than or equal to -@var{number} is costly. -@end table +@item -m2a-nofpu +@opindex m2a-nofpu +Generate code for the SH2a without FPU, or for a SH2a-FPU in such a way +that the floating-point unit is not used. 
-@item -minsert-sched-nops=@var{scheme} -@opindex minsert-sched-nops -This option controls which NOP insertion scheme is used during -the second scheduling pass. The argument @var{scheme} takes one of the -following values: +@item -m2a-single-only +@opindex m2a-single-only +Generate code for the SH2a-FPU, in such a way that no double-precision +floating-point operations are used. -@table @asis -@item @samp{no} -Don't insert NOPs. +@item -m2a-single +@opindex m2a-single +Generate code for the SH2a-FPU assuming the floating-point unit is in +single-precision mode by default. -@item @samp{pad} -Pad with NOPs any dispatch group that has vacant issue slots, -according to the scheduler's grouping. +@item -m2a +@opindex m2a +Generate code for the SH2a-FPU assuming the floating-point unit is in +double-precision mode by default. -@item @samp{regroup_exact} -Insert NOPs to force costly dependent insns into -separate groups. Insert exactly as many NOPs as needed to force an insn -to a new group, according to the estimated processor grouping. +@item -m3 +@opindex m3 +Generate code for the SH3. -@item @var{number} -Insert NOPs to force costly dependent insns into -separate groups. Insert @var{number} NOPs to force an insn to a new group. -@end table +@item -m3e +@opindex m3e +Generate code for the SH3e. -@item -mcall-sysv -@opindex mcall-sysv -On System V.4 and embedded PowerPC systems compile code using calling -conventions that adhere to the March 1995 draft of the System V -Application Binary Interface, PowerPC processor supplement. This is the -default unless you configured GCC using @samp{powerpc-*-eabiaix}. +@item -m4-nofpu +@opindex m4-nofpu +Generate code for the SH4 without a floating-point unit. -@item -mcall-sysv-eabi -@itemx -mcall-eabi -@opindex mcall-sysv-eabi -@opindex mcall-eabi -Specify both @option{-mcall-sysv} and @option{-meabi} options. 
+@item -m4-single-only +@opindex m4-single-only +Generate code for the SH4 with a floating-point unit that only +supports single-precision arithmetic. -@item -mcall-sysv-noeabi -@opindex mcall-sysv-noeabi -Specify both @option{-mcall-sysv} and @option{-mno-eabi} options. +@item -m4-single +@opindex m4-single +Generate code for the SH4 assuming the floating-point unit is in +single-precision mode by default. -@item -mcall-aixdesc -@opindex m -On System V.4 and embedded PowerPC systems compile code for the AIX -operating system. +@item -m4 +@opindex m4 +Generate code for the SH4. -@item -mcall-linux -@opindex mcall-linux -On System V.4 and embedded PowerPC systems compile code for the -Linux-based GNU system. +@item -m4-100 +@opindex m4-100 +Generate code for SH4-100. -@item -mcall-freebsd -@opindex mcall-freebsd -On System V.4 and embedded PowerPC systems compile code for the -FreeBSD operating system. +@item -m4-100-nofpu +@opindex m4-100-nofpu +Generate code for SH4-100 in such a way that the +floating-point unit is not used. -@item -mcall-netbsd -@opindex mcall-netbsd -On System V.4 and embedded PowerPC systems compile code for the -NetBSD operating system. +@item -m4-100-single +@opindex m4-100-single +Generate code for SH4-100 assuming the floating-point unit is in +single-precision mode by default. -@item -mcall-openbsd -@opindex mcall-netbsd -On System V.4 and embedded PowerPC systems compile code for the -OpenBSD operating system. +@item -m4-100-single-only +@opindex m4-100-single-only +Generate code for SH4-100 in such a way that no double-precision +floating-point operations are used. -@item -maix-struct-return -@opindex maix-struct-return -Return all structures in memory (as specified by the AIX ABI)@. +@item -m4-200 +@opindex m4-200 +Generate code for SH4-200. -@item -msvr4-struct-return -@opindex msvr4-struct-return -Return structures smaller than 8 bytes in registers (as specified by the -SVR4 ABI)@. 
+@item -m4-200-nofpu +@opindex m4-200-nofpu +Generate code for SH4-200 in such a way that the +floating-point unit is not used. -@item -mabi=@var{abi-type} -@opindex mabi -Extend the current ABI with a particular extension, or remove such extension. -Valid values are @samp{altivec}, @samp{no-altivec}, @samp{spe}, -@samp{no-spe}, @samp{ibmlongdouble}, @samp{ieeelongdouble}, -@samp{elfv1}, @samp{elfv2}@. +@item -m4-200-single +@opindex m4-200-single +Generate code for SH4-200 assuming the floating-point unit is in +single-precision mode by default. -@item -mabi=spe -@opindex mabi=spe -Extend the current ABI with SPE ABI extensions. This does not change -the default ABI, instead it adds the SPE ABI extensions to the current -ABI@. +@item -m4-200-single-only +@opindex m4-200-single-only +Generate code for SH4-200 in such a way that no double-precision +floating-point operations are used. -@item -mabi=no-spe -@opindex mabi=no-spe -Disable Book-E SPE ABI extensions for the current ABI@. +@item -m4-300 +@opindex m4-300 +Generate code for SH4-300. -@item -mabi=ibmlongdouble -@opindex mabi=ibmlongdouble -Change the current ABI to use IBM extended-precision long double. -This is a PowerPC 32-bit SYSV ABI option. +@item -m4-300-nofpu +@opindex m4-300-nofpu +Generate code for SH4-300 in such a way that the +floating-point unit is not used. -@item -mabi=ieeelongdouble -@opindex mabi=ieeelongdouble -Change the current ABI to use IEEE extended-precision long double. -This is a PowerPC 32-bit Linux ABI option. +@item -m4-300-single +@opindex m4-300-single +Generate code for SH4-300 in such a way that no double-precision +floating-point operations are used. -@item -mabi=elfv1 -@opindex mabi=elfv1 -Change the current ABI to use the ELFv1 ABI. -This is the default ABI for big-endian PowerPC 64-bit Linux. -Overriding the default ABI requires special system support and is -likely to fail in spectacular ways.
+@item -m4-300-single-only +@opindex m4-300-single-only +Generate code for SH4-300 in such a way that no double-precision +floating-point operations are used. -@item -mabi=elfv2 -@opindex mabi=elfv2 -Change the current ABI to use the ELFv2 ABI. -This is the default ABI for little-endian PowerPC 64-bit Linux. -Overriding the default ABI requires special system support and is -likely to fail in spectacular ways. +@item -m4-340 +@opindex m4-340 +Generate code for SH4-340 (no MMU, no FPU). -@item -mprototype -@itemx -mno-prototype -@opindex mprototype -@opindex mno-prototype -On System V.4 and embedded PowerPC systems assume that all calls to -variable argument functions are properly prototyped. Otherwise, the -compiler must insert an instruction before every non-prototyped call to -set or clear bit 6 of the condition code register (@code{CR}) to -indicate whether floating-point values are passed in the floating-point -registers in case the function takes variable arguments. With -@option{-mprototype}, only calls to prototyped variable argument functions -set or clear the bit. +@item -m4-500 +@opindex m4-500 +Generate code for SH4-500 (no FPU). Passes @option{-isa=sh4-nofpu} to the +assembler. -@item -msim -@opindex msim -On embedded PowerPC systems, assume that the startup module is called -@file{sim-crt0.o} and that the standard C libraries are @file{libsim.a} and -@file{libc.a}. This is the default for @samp{powerpc-*-eabisim} -configurations. +@item -m4a-nofpu +@opindex m4a-nofpu +Generate code for the SH4al-dsp, or for a SH4a in such a way that the +floating-point unit is not used. -@item -mmvme -@opindex mmvme -On embedded PowerPC systems, assume that the startup module is called -@file{crt0.o} and the standard C libraries are @file{libmvme.a} and -@file{libc.a}. +@item -m4a-single-only +@opindex m4a-single-only +Generate code for the SH4a, in such a way that no double-precision +floating-point operations are used. 
-@item -mads -@opindex mads -On embedded PowerPC systems, assume that the startup module is called -@file{crt0.o} and the standard C libraries are @file{libads.a} and -@file{libc.a}. +@item -m4a-single +@opindex m4a-single +Generate code for the SH4a assuming the floating-point unit is in +single-precision mode by default. -@item -myellowknife -@opindex myellowknife -On embedded PowerPC systems, assume that the startup module is called -@file{crt0.o} and the standard C libraries are @file{libyk.a} and -@file{libc.a}. +@item -m4a +@opindex m4a +Generate code for the SH4a. -@item -mvxworks -@opindex mvxworks -On System V.4 and embedded PowerPC systems, specify that you are -compiling for a VxWorks system. +@item -m4al +@opindex m4al +Same as @option{-m4a-nofpu}, except that it implicitly passes +@option{-dsp} to the assembler. GCC doesn't generate any DSP +instructions at the moment. -@item -memb -@opindex memb -On embedded PowerPC systems, set the @code{PPC_EMB} bit in the ELF flags -header to indicate that @samp{eabi} extended relocations are used. +@item -m5-32media +@opindex m5-32media +Generate 32-bit code for SHmedia. -@item -meabi -@itemx -mno-eabi -@opindex meabi -@opindex mno-eabi -On System V.4 and embedded PowerPC systems do (do not) adhere to the -Embedded Applications Binary Interface (EABI), which is a set of -modifications to the System V.4 specifications. Selecting @option{-meabi} -means that the stack is aligned to an 8-byte boundary, a function -@code{__eabi} is called from @code{main} to set up the EABI -environment, and the @option{-msdata} option can use both @code{r2} and -@code{r13} to point to two separate small data areas. Selecting -@option{-mno-eabi} means that the stack is aligned to a 16-byte boundary, -no EABI initialization function is called from @code{main}, and the -@option{-msdata} option only uses @code{r13} to point to a single -small data area. 
The @option{-meabi} option is on by default if you -configured GCC using one of the @samp{powerpc*-*-eabi*} options. +@item -m5-32media-nofpu +@opindex m5-32media-nofpu +Generate 32-bit code for SHmedia in such a way that the +floating-point unit is not used. -@item -msdata=eabi -@opindex msdata=eabi -On System V.4 and embedded PowerPC systems, put small initialized -@code{const} global and static data in the @code{.sdata2} section, which -is pointed to by register @code{r2}. Put small initialized -non-@code{const} global and static data in the @code{.sdata} section, -which is pointed to by register @code{r13}. Put small uninitialized -global and static data in the @code{.sbss} section, which is adjacent to -the @code{.sdata} section. The @option{-msdata=eabi} option is -incompatible with the @option{-mrelocatable} option. The -@option{-msdata=eabi} option also sets the @option{-memb} option. +@item -m5-64media +@opindex m5-64media +Generate 64-bit code for SHmedia. -@item -msdata=sysv -@opindex msdata=sysv -On System V.4 and embedded PowerPC systems, put small global and static -data in the @code{.sdata} section, which is pointed to by register -@code{r13}. Put small uninitialized global and static data in the -@code{.sbss} section, which is adjacent to the @code{.sdata} section. -The @option{-msdata=sysv} option is incompatible with the -@option{-mrelocatable} option. - -@item -msdata=default -@itemx -msdata -@opindex msdata=default -@opindex msdata -On System V.4 and embedded PowerPC systems, if @option{-meabi} is used, -compile code the same as @option{-msdata=eabi}, otherwise compile code the -same as @option{-msdata=sysv}. +@item -m5-64media-nofpu +@opindex m5-64media-nofpu +Generate 64-bit code for SHmedia in such a way that the +floating-point unit is not used. -@item -msdata=data -@opindex msdata=data -On System V.4 and embedded PowerPC systems, put small global -data in the @code{.sdata} section. 
Put small uninitialized global -data in the @code{.sbss} section. Do not use register @code{r13} -to address small data however. This is the default behavior unless -other @option{-msdata} options are used. +@item -m5-compact +@opindex m5-compact +Generate code for SHcompact. -@item -msdata=none -@itemx -mno-sdata -@opindex msdata=none -@opindex mno-sdata -On embedded PowerPC systems, put all initialized global and static data -in the @code{.data} section, and all uninitialized data in the -@code{.bss} section. +@item -m5-compact-nofpu +@opindex m5-compact-nofpu +Generate code for SHcompact in such a way that the +floating-point unit is not used. -@item -mblock-move-inline-limit=@var{num} -@opindex mblock-move-inline-limit -Inline all block moves (such as calls to @code{memcpy} or structure -copies) less than or equal to @var{num} bytes. The minimum value for -@var{num} is 32 bytes on 32-bit targets and 64 bytes on 64-bit -targets. The default value is target-specific. +@item -mb +@opindex mb +Compile code for the processor in big-endian mode. -@item -G @var{num} -@opindex G -@cindex smaller data references (PowerPC) -@cindex .sdata/.sdata2 references (PowerPC) -On embedded PowerPC systems, put global and static items less than or -equal to @var{num} bytes into the small data or BSS sections instead of -the normal data or BSS section. By default, @var{num} is 8. The -@option{-G @var{num}} switch is also passed to the linker. -All modules should be compiled with the same @option{-G @var{num}} value. +@item -ml +@opindex ml +Compile code for the processor in little-endian mode. -@item -mregnames -@itemx -mno-regnames -@opindex mregnames -@opindex mno-regnames -On System V.4 and embedded PowerPC systems do (do not) emit register -names in the assembly language output using symbolic forms. +@item -mdalign +@opindex mdalign +Align doubles at 64-bit boundaries. 
Note that this changes the calling +conventions, and thus some functions from the standard C library do +not work unless you recompile it first with @option{-mdalign}. -@item -mlongcall -@itemx -mno-longcall -@opindex mlongcall -@opindex mno-longcall -By default assume that all calls are far away so that a longer and more -expensive calling sequence is required. This is required for calls -farther than 32 megabytes (33,554,432 bytes) from the current location. -A short call is generated if the compiler knows -the call cannot be that far away. This setting can be overridden by -the @code{shortcall} function attribute, or by @code{#pragma -longcall(0)}. +@item -mrelax +@opindex mrelax +Shorten some address references at link time, when possible; uses the +linker option @option{-relax}. -Some linkers are capable of detecting out-of-range calls and generating -glue code on the fly. On these systems, long calls are unnecessary and -generate slower code. As of this writing, the AIX linker can do this, -as can the GNU linker for PowerPC/64. It is planned to add this feature -to the GNU linker for 32-bit PowerPC systems as well. +@item -mbigtable +@opindex mbigtable +Use 32-bit offsets in @code{switch} tables. The default is to use +16-bit offsets. -On Darwin/PPC systems, @code{#pragma longcall} generates @code{jbsr -callee, L42}, plus a @dfn{branch island} (glue code). The two target -addresses represent the callee and the branch island. The -Darwin/PPC linker prefers the first address and generates a @code{bl -callee} if the PPC @code{bl} instruction reaches the callee directly; -otherwise, the linker generates @code{bl L42} to call the branch -island. The branch island is appended to the body of the -calling function; it computes the full 32-bit address of the callee -and jumps to it. +@item -mbitops +@opindex mbitops +Enable the use of bit manipulation instructions on SH2A. 
-On Mach-O (Darwin) systems, this option directs the compiler to emit -the glue for every direct call, and the Darwin linker decides whether -to use or discard it. +@item -mfmovd +@opindex mfmovd +Enable the use of the instruction @code{fmovd}. Check @option{-mdalign} for +alignment constraints. -In the future, GCC may ignore all longcall specifications -when the linker is known to generate glue. +@item -mrenesas +@opindex mrenesas +Comply with the calling conventions defined by Renesas. -@item -mtls-markers -@itemx -mno-tls-markers -@opindex mtls-markers -@opindex mno-tls-markers -Mark (do not mark) calls to @code{__tls_get_addr} with a relocation -specifying the function argument. The relocation allows the linker to -reliably associate function call with argument setup instructions for -TLS optimization, which in turn allows GCC to better schedule the -sequence. +@item -mno-renesas +@opindex mno-renesas +Comply with the calling conventions defined for GCC before the Renesas +conventions were available. This option is the default for all +targets of the SH toolchain. -@item -pthread -@opindex pthread -Adds support for multithreading with the @dfn{pthreads} library. -This option sets flags for both the preprocessor and linker. +@item -mnomacsave +@opindex mnomacsave +Mark the @code{MAC} register as call-clobbered, even if +@option{-mrenesas} is given. -@item -mrecip -@itemx -mno-recip -@opindex mrecip -This option enables use of the reciprocal estimate and -reciprocal square root estimate instructions with additional -Newton-Raphson steps to increase precision instead of doing a divide or -square root and divide for floating-point arguments. You should use -the @option{-ffast-math} option when using @option{-mrecip} (or at -least @option{-funsafe-math-optimizations}, -@option{-ffinite-math-only}, @option{-freciprocal-math} and -@option{-fno-trapping-math}).
Note that while the throughput of the -sequence is generally higher than the throughput of the non-reciprocal -instruction, the precision of the sequence can be decreased by up to 2 -ulp (i.e.@: the inverse of 1.0 equals 0.99999994) for reciprocal square -roots. +@item -mieee +@itemx -mno-ieee +@opindex mieee +@opindex mno-ieee +Control the IEEE compliance of floating-point comparisons, which affects the +handling of cases where the result of a comparison is unordered. By default +@option{-mieee} is implicitly enabled. If @option{-ffinite-math-only} is +enabled @option{-mno-ieee} is implicitly set, which results in faster +floating-point greater-equal and less-equal comparisons. The implicit settings +can be overridden by specifying either @option{-mieee} or @option{-mno-ieee}. -@item -mrecip=@var{opt} -@opindex mrecip=opt -This option controls which reciprocal estimate instructions -may be used. @var{opt} is a comma-separated list of options, which may -be preceded by a @code{!} to invert the option: +@item -minline-ic_invalidate +@opindex minline-ic_invalidate +Inline code to invalidate instruction cache entries after setting up +nested function trampolines. +This option has no effect if @option{-musermode} is in effect and the selected +code generation option (e.g. @option{-m4}) does not allow the use of the @code{icbi} +instruction. +If the selected code generation option does not allow the use of the @code{icbi} +instruction, and @option{-musermode} is not in effect, the inlined code +manipulates the instruction cache address array directly with an associative +write. This not only requires privileged mode at run time, but it also +fails if the cache line had been mapped via the TLB and has become unmapped. -@table @samp +@item -misize +@opindex misize +Dump instruction size and location in the assembly code. -@item all -Enable all estimate instructions. +@item -mpadstruct +@opindex mpadstruct +This option is deprecated.
It pads structures to multiple of 4 bytes, +which is incompatible with the SH ABI@. -@item default -Enable the default instructions, equivalent to @option{-mrecip}. +@item -matomic-model=@var{model} +@opindex matomic-model=@var{model} +Sets the model of atomic operations and additional parameters as a comma +separated list. For details on the atomic built-in functions see +@ref{__atomic Builtins}. The following models and parameters are supported: -@item none -Disable all estimate instructions, equivalent to @option{-mno-recip}. +@table @samp -@item div -Enable the reciprocal approximation instructions for both -single and double precision. +@item none +Disable compiler generated atomic sequences and emit library calls for atomic +operations. This is the default if the target is not @code{sh*-*-linux*}. -@item divf -Enable the single-precision reciprocal approximation instructions. +@item soft-gusa +Generate GNU/Linux compatible gUSA software atomic sequences for the atomic +built-in functions. The generated atomic sequences require additional support +from the interrupt/exception handling code of the system and are only suitable +for SH3* and SH4* single-core systems. This option is enabled by default when +the target is @code{sh*-*-linux*} and SH3* or SH4*. When the target is SH4A, +this option also partially utilizes the hardware atomic instructions +@code{movli.l} and @code{movco.l} to create more efficient code, unless +@samp{strict} is specified. -@item divd -Enable the double-precision reciprocal approximation instructions. +@item soft-tcb +Generate software atomic sequences that use a variable in the thread control +block. This is a variation of the gUSA sequences which can also be used on +SH1* and SH2* targets. The generated atomic sequences require additional +support from the interrupt/exception handling code of the system and are only +suitable for single-core systems. 
When using this model, the @samp{gbr-offset=} +parameter has to be specified as well. -@item rsqrt -Enable the reciprocal square root approximation instructions for both -single and double precision. +@item soft-imask +Generate software atomic sequences that temporarily disable interrupts by +setting @code{SR.IMASK = 1111}. This model works only when the program runs +in privileged mode and is only suitable for single-core systems. Additional +support from the interrupt/exception handling code of the system is not +required. This model is enabled by default when the target is +@code{sh*-*-linux*} and SH1* or SH2*. -@item rsqrtf -Enable the single-precision reciprocal square root approximation instructions. +@item hard-llcs +Generate hardware atomic sequences using the @code{movli.l} and @code{movco.l} +instructions only. This is only available on SH4A and is suitable for +multi-core systems. Since the hardware instructions support only 32 bit atomic +variables access to 8 or 16 bit variables is emulated with 32 bit accesses. +Code compiled with this option is also compatible with other software +atomic model interrupt/exception handling systems if executed on an SH4A +system. Additional support from the interrupt/exception handling code of the +system is not required for this model. -@item rsqrtd -Enable the double-precision reciprocal square root approximation instructions. +@item gbr-offset= +This parameter specifies the offset in bytes of the variable in the thread +control block structure that should be used by the generated atomic sequences +when the @samp{soft-tcb} model has been selected. For other models this +parameter is ignored. The specified value must be an integer multiple of four +and in the range 0-1020. + +@item strict +This parameter prevents mixed usage of multiple atomic models, even if they +are compatible, and makes the compiler generate atomic sequences of the +specified model only. 
@end table -So, for example, @option{-mrecip=all,!rsqrtd} enables -all of the reciprocal estimate instructions, except for the -@code{FRSQRTE}, @code{XSRSQRTEDP}, and @code{XVRSQRTEDP} instructions -which handle the double-precision reciprocal square root calculations. +@item -mtas +@opindex mtas +Generate the @code{tas.b} opcode for @code{__atomic_test_and_set}. +Notice that depending on the particular hardware and software configuration +this can degrade overall performance due to the operand cache line flushes +that are implied by the @code{tas.b} instruction. On multi-core SH4A +processors the @code{tas.b} instruction must be used with caution since it +can result in data corruption for certain cache configurations. -@item -mrecip-precision -@itemx -mno-recip-precision -@opindex mrecip-precision -Assume (do not assume) that the reciprocal estimate instructions -provide higher-precision estimates than is mandated by the PowerPC -ABI. Selecting @option{-mcpu=power6}, @option{-mcpu=power7} or -@option{-mcpu=power8} automatically selects @option{-mrecip-precision}. -The double-precision square root estimate instructions are not generated by -default on low-precision machines, since they do not provide an -estimate that converges after three steps. +@item -mprefergot +@opindex mprefergot +When generating position-independent code, emit function calls using +the Global Offset Table instead of the Procedure Linkage Table. -@item -mveclibabi=@var{type} -@opindex mveclibabi -Specifies the ABI type to use for vectorizing intrinsics using an -external library. The only type supported at present is @samp{mass}, -which specifies to use IBM's Mathematical Acceleration Subsystem -(MASS) libraries for vectorizing intrinsics using external libraries. 
-GCC currently emits calls to @code{acosd2}, @code{acosf4}, -@code{acoshd2}, @code{acoshf4}, @code{asind2}, @code{asinf4}, -@code{asinhd2}, @code{asinhf4}, @code{atan2d2}, @code{atan2f4}, -@code{atand2}, @code{atanf4}, @code{atanhd2}, @code{atanhf4}, -@code{cbrtd2}, @code{cbrtf4}, @code{cosd2}, @code{cosf4}, -@code{coshd2}, @code{coshf4}, @code{erfcd2}, @code{erfcf4}, -@code{erfd2}, @code{erff4}, @code{exp2d2}, @code{exp2f4}, -@code{expd2}, @code{expf4}, @code{expm1d2}, @code{expm1f4}, -@code{hypotd2}, @code{hypotf4}, @code{lgammad2}, @code{lgammaf4}, -@code{log10d2}, @code{log10f4}, @code{log1pd2}, @code{log1pf4}, -@code{log2d2}, @code{log2f4}, @code{logd2}, @code{logf4}, -@code{powd2}, @code{powf4}, @code{sind2}, @code{sinf4}, @code{sinhd2}, -@code{sinhf4}, @code{sqrtd2}, @code{sqrtf4}, @code{tand2}, -@code{tanf4}, @code{tanhd2}, and @code{tanhf4} when generating code -for power7. Both @option{-ftree-vectorize} and -@option{-funsafe-math-optimizations} must also be enabled. The MASS -libraries must be specified at link time. +@item -musermode +@itemx -mno-usermode +@opindex musermode +@opindex mno-usermode +Don't allow (allow) the compiler to generate privileged-mode code. Specifying +@option{-musermode} also implies @option{-mno-inline-ic_invalidate} if the +inlined code would not work in user mode. @option{-musermode} is the default +when the target is @code{sh*-*-linux*}. If the target is SH1* or SH2* +@option{-musermode} has no effect, since there is no user mode. -@item -mfriz -@itemx -mno-friz -@opindex mfriz -Generate (do not generate) the @code{friz} instruction when the -@option{-funsafe-math-optimizations} option is used to optimize -rounding of floating-point values to 64-bit integer and back to floating -point. The @code{friz} instruction does not return the same value if -the floating-point number is too large to fit in an integer. +@item -multcost=@var{number} +@opindex multcost=@var{number} +Set the cost to assume for a multiply insn.
-@item -mpointers-to-nested-functions -@itemx -mno-pointers-to-nested-functions -@opindex mpointers-to-nested-functions -Generate (do not generate) code to load up the static chain register -(@code{r11}) when calling through a pointer on AIX and 64-bit Linux -systems where a function pointer points to a 3-word descriptor giving -the function address, TOC value to be loaded in register @code{r2}, and -static chain value to be loaded in register @code{r11}. The -@option{-mpointers-to-nested-functions} is on by default. You cannot -call through pointers to nested functions or pointers -to functions compiled in other languages that use the static chain if -you use the @option{-mno-pointers-to-nested-functions}. +@item -mdiv=@var{strategy} +@opindex mdiv=@var{strategy} +Set the division strategy to be used for integer division operations. +For SHmedia @var{strategy} can be one of: -@item -msave-toc-indirect -@itemx -mno-save-toc-indirect -@opindex msave-toc-indirect -Generate (do not generate) code to save the TOC value in the reserved -stack location in the function prologue if the function calls through -a pointer on AIX and 64-bit Linux systems. If the TOC value is not -saved in the prologue, it is saved just before the call through the -pointer. The @option{-mno-save-toc-indirect} option is the default. +@table @samp -@item -mcompat-align-parm -@itemx -mno-compat-align-parm -@opindex mcompat-align-parm -Generate (do not generate) code to pass structure parameters with a -maximum alignment of 64 bits, for compatibility with older versions -of GCC. +@item fp +Performs the operation in floating point. This has a very high latency, +but needs only a few instructions, so it might be a good choice if +your code has enough easily-exploitable ILP to allow the compiler to +schedule the floating-point instructions together with other instructions. +Division by zero causes a floating-point exception. 
-Older versions of GCC (prior to 4.9.0) incorrectly did not align a -structure parameter on a 128-bit boundary when that structure contained -a member requiring 128-bit alignment. This is corrected in more -recent versions of GCC. This option may be used to generate code -that is compatible with functions compiled with older versions of -GCC. +@item inv +Uses integer operations to calculate the inverse of the divisor, +and then multiplies the dividend with the inverse. This strategy allows +CSE and hoisting of the inverse calculation. Division by zero calculates +an unspecified result, but does not trap. -The @option{-mno-compat-align-parm} option is the default. -@end table +@item inv:minlat +A variant of @samp{inv} where, if no CSE or hoisting opportunities +have been found, or if the entire operation has been hoisted to the same +place, the last stages of the inverse calculation are intertwined with the +final multiply to reduce the overall latency, at the expense of using a few +more instructions, and thus offering fewer scheduling opportunities with +other code. -@node RX Options -@subsection RX Options -@cindex RX Options +@item call +Calls a library function that usually implements the @samp{inv:minlat} +strategy. +This gives high code density for @code{m5-*media-nofpu} compilations. -These command-line options are defined for RX targets: +@item call2 +Uses a different entry point of the same library function, where it +assumes that a pointer to a lookup table has already been set up, which +exposes the pointer load to CSE and code hoisting optimizations. -@table @gcctabopt -@item -m64bit-doubles -@itemx -m32bit-doubles -@opindex m64bit-doubles -@opindex m32bit-doubles -Make the @code{double} data type be 64 bits (@option{-m64bit-doubles}) -or 32 bits (@option{-m32bit-doubles}) in size. The default is -@option{-m32bit-doubles}. @emph{Note} RX floating-point hardware only -works on 32-bit values, which is why the default is -@option{-m32bit-doubles}. 
+@item inv:call +@itemx inv:call2 +@itemx inv:fp +Use the @samp{inv} algorithm for initial +code generation, but if the code stays unoptimized, revert to the @samp{call}, +@samp{call2}, or @samp{fp} strategies, respectively. Note that the +potentially-trapping side effect of division by zero is carried by a +separate instruction, so it is possible that all the integer instructions +are hoisted out, but the marker for the side effect stays where it is. +A recombination to floating-point operations or a call is not possible +in that case. -@item -fpu -@itemx -nofpu -@opindex fpu -@opindex nofpu -Enables (@option{-fpu}) or disables (@option{-nofpu}) the use of RX -floating-point hardware. The default is enabled for the RX600 -series and disabled for the RX200 series. +@item inv20u +@itemx inv20l +Variants of the @samp{inv:minlat} strategy. In the case +that the inverse calculation is not separated from the multiply, they speed +up division where the dividend fits into 20 bits (plus sign where applicable) +by inserting a test to skip a number of operations in this case; this test +slows down the case of larger dividends. @samp{inv20u} assumes the case of +such a small dividend to be unlikely, and @samp{inv20l} assumes it to be likely. -Floating-point instructions are only generated for 32-bit floating-point -values, however, so the FPU hardware is not used for doubles if the -@option{-m64bit-doubles} option is used. +@end table -@emph{Note} If the @option{-fpu} option is enabled then -@option{-funsafe-math-optimizations} is also enabled automatically. -This is because the RX FPU instructions are themselves unsafe. +For targets other than SHmedia @var{strategy} can be one of: -@item -mcpu=@var{name} -@opindex mcpu -Selects the type of RX CPU to be targeted. Currently three types are -supported, the generic @samp{RX600} and @samp{RX200} series hardware and -the specific @samp{RX610} CPU. The default is @samp{RX600}.
+@table @samp -The only difference between @samp{RX600} and @samp{RX610} is that the -@samp{RX610} does not support the @code{MVTIPL} instruction. +@item call-div1 +Calls a library function that uses the single-step division instruction +@code{div1} to perform the operation. Division by zero calculates an +unspecified result and does not trap. This is the default except for SH4, +SH2A and SHcompact. -The @samp{RX200} series does not have a hardware floating-point unit -and so @option{-nofpu} is enabled by default when this type is -selected. +@item call-fp +Calls a library function that performs the operation in double precision +floating point. Division by zero causes a floating-point exception. This is +the default for SHcompact with FPU. Specifying this for targets that do not +have a double precision FPU defaults to @code{call-div1}. -@item -mbig-endian-data -@itemx -mlittle-endian-data -@opindex mbig-endian-data -@opindex mlittle-endian-data -Store data (but not code) in the big-endian format. The default is -@option{-mlittle-endian-data}, i.e.@: to store data in the little-endian -format. +@item call-table +Calls a library function that uses a lookup table for small divisors and +the @code{div1} instruction with case distinction for larger divisors. Division +by zero calculates an unspecified result and does not trap. This is the default +for SH4. Specifying this for targets that do not have dynamic shift +instructions defaults to @code{call-div1}. -@item -msmall-data-limit=@var{N} -@opindex msmall-data-limit -Specifies the maximum size in bytes of global and static variables -which can be placed into the small data area. Using the small data -area can lead to smaller and faster code, but the size of area is -limited and it is up to the programmer to ensure that the area does -not overflow. 
Also when the small data area is used one of the RX's -registers (usually @code{r13}) is reserved for use pointing to this -area, so it is no longer available for use by the compiler. This -could result in slower and/or larger code if variables are pushed onto -the stack instead of being held in this register. +@end table -Note, common variables (variables that have not been initialized) and -constants are not placed into the small data area as they are assigned -to other sections in the output executable. +When a division strategy has not been specified the default strategy is +selected based on the current target. For SH2A the default strategy is to +use the @code{divs} and @code{divu} instructions instead of library function +calls. -The default value is zero, which disables this feature. Note, this -feature is not enabled by default with higher optimization levels -(@option{-O2} etc) because of the potentially detrimental effects of -reserving a register. It is up to the programmer to experiment and -discover whether this feature is of benefit to their program. See the -description of the @option{-mpid} option for a description of how the -actual register to hold the small data area pointer is chosen. +@item -maccumulate-outgoing-args +@opindex maccumulate-outgoing-args +Reserve space once for outgoing arguments in the function prologue rather +than around each call. Generally beneficial for performance and size. Also +needed for unwinding to avoid changing the stack frame around conditional code. -@item -msim -@itemx -mno-sim -@opindex msim -@opindex mno-sim -Use the simulator runtime. The default is to use the libgloss -board-specific runtime. +@item -mdivsi3_libfunc=@var{name} +@opindex mdivsi3_libfunc=@var{name} +Set the name of the library function used for 32-bit signed division to +@var{name}. 
+This only affects the name used in the @samp{call} and @samp{inv:call} +division strategies, and the compiler still expects the same +sets of input/output/clobbered registers as if this option were not present. -@item -mas100-syntax -@itemx -mno-as100-syntax -@opindex mas100-syntax -@opindex mno-as100-syntax -When generating assembler output use a syntax that is compatible with -Renesas's AS100 assembler. This syntax can also be handled by the GAS -assembler, but it has some restrictions so it is not generated by default. +@item -mfixed-range=@var{register-range} +@opindex mfixed-range +Generate code treating the given register range as fixed registers. +A fixed register is one that the register allocator can not use. This is +useful when compiling kernel code. A register range is specified as +two registers separated by a dash. Multiple register ranges can be +specified separated by a comma. -@item -mmax-constant-size=@var{N} -@opindex mmax-constant-size -Specifies the maximum size, in bytes, of a constant that can be used as -an operand in a RX instruction. Although the RX instruction set does -allow constants of up to 4 bytes in length to be used in instructions, -a longer value equates to a longer instruction. Thus in some -circumstances it can be beneficial to restrict the size of constants -that are used in instructions. Constants that are too big are instead -placed into a constant pool and referenced via register indirection. +@item -mindexed-addressing +@opindex mindexed-addressing +Enable the use of the indexed addressing mode for SHmedia32/SHcompact. +This is only safe if the hardware and/or OS implement 32-bit wrap-around +semantics for the indexed addressing mode. 
The architecture allows the +implementation of processors with 64-bit MMU, which the OS could use to +get 32-bit addressing, but since no current hardware implementation supports +this or any other way to make the indexed addressing mode safe to use in +the 32-bit ABI, the default is @option{-mno-indexed-addressing}. -The value @var{N} can be between 0 and 4. A value of 0 (the default) -or 4 means that constants of any size are allowed. +@item -mgettrcost=@var{number} +@opindex mgettrcost=@var{number} +Set the cost assumed for the @code{gettr} instruction to @var{number}. +The default is 2 if @option{-mpt-fixed} is in effect, 100 otherwise. -@item -mrelax -@opindex mrelax -Enable linker relaxation. Linker relaxation is a process whereby the -linker attempts to reduce the size of a program by finding shorter -versions of various instructions. Disabled by default. +@item -mpt-fixed +@opindex mpt-fixed +Assume @code{pt*} instructions won't trap. This generally generates +better-scheduled code, but is unsafe on current hardware. +The current architecture +definition says that @code{ptabs} and @code{ptrel} trap when the target +anded with 3 is 3. +This has the unintentional effect of making it unsafe to schedule these +instructions before a branch, or hoist them out of a loop. For example, +@code{__do_global_ctors}, a part of @file{libgcc} +that runs constructors at program +startup, calls functions in a list which is delimited by @minus{}1. With the +@option{-mpt-fixed} option, the @code{ptabs} is done before testing against @minus{}1. +That means that all the constructors run a bit more quickly, but when +the loop comes to the end of the list, the program crashes because @code{ptabs} +loads @minus{}1 into a target register. -@item -mint-register=@var{N} -@opindex mint-register -Specify the number of registers to reserve for fast interrupt handler -functions. The value @var{N} can be between 0 and 4. 
A value of 1 -means that register @code{r13} is reserved for the exclusive use -of fast interrupt handlers. A value of 2 reserves @code{r13} and -@code{r12}. A value of 3 reserves @code{r13}, @code{r12} and -@code{r11}, and a value of 4 reserves @code{r13} through @code{r10}. -A value of 0, the default, does not reserve any registers. +Since this option is unsafe for any +hardware implementing the current architecture specification, the default +is @option{-mno-pt-fixed}. Unless specified explicitly with +@option{-mgettrcost}, @option{-mno-pt-fixed} also implies @option{-mgettrcost=100}; +this deters register allocation from using target registers for storing +ordinary integers. -@item -msave-acc-in-interrupts -@opindex msave-acc-in-interrupts -Specifies that interrupt handler functions should preserve the -accumulator register. This is only necessary if normal code might use -the accumulator register, for example because it performs 64-bit -multiplications. The default is to ignore the accumulator as this -makes the interrupt handlers faster. +@item -minvalid-symbols +@opindex minvalid-symbols +Assume symbols might be invalid. Ordinary function symbols generated by +the compiler are always valid to load with +@code{movi}/@code{shori}/@code{ptabs} or +@code{movi}/@code{shori}/@code{ptrel}, +but with assembler and/or linker tricks it is possible +to generate symbols that cause @code{ptabs} or @code{ptrel} to trap. +This option is only meaningful when @option{-mno-pt-fixed} is in effect. +It prevents cross-basic-block CSE, hoisting and most scheduling +of symbol loads. The default is @option{-mno-invalid-symbols}. -@item -mpid -@itemx -mno-pid -@opindex mpid -@opindex mno-pid -Enables the generation of position independent data. When enabled any -access to constant data is done via an offset from a base address -held in a register. 
This allows the location of constant data to be -determined at run time without requiring the executable to be -relocated, which is a benefit to embedded applications with tight -memory constraints. Data that can be modified is not affected by this -option. +@item -mbranch-cost=@var{num} +@opindex mbranch-cost=@var{num} +Assume @var{num} to be the cost for a branch instruction. Higher numbers +make the compiler try to generate more branch-free code if possible. +If not specified the value is selected depending on the processor type that +is being compiled for. -Note, using this feature reserves a register, usually @code{r13}, for -the constant data base address. This can result in slower and/or -larger code, especially in complicated functions. +@item -mzdcbranch +@itemx -mno-zdcbranch +@opindex mzdcbranch +@opindex mno-zdcbranch +Assume (do not assume) that zero displacement conditional branch instructions +@code{bt} and @code{bf} are fast. If @option{-mzdcbranch} is specified, the +compiler prefers zero displacement branch code sequences. This is +enabled by default when generating code for SH4 and SH4A. It can be explicitly +disabled by specifying @option{-mno-zdcbranch}. -The actual register chosen to hold the constant data base address -depends upon whether the @option{-msmall-data-limit} and/or the -@option{-mint-register} command-line options are enabled. Starting -with register @code{r13} and proceeding downwards, registers are -allocated first to satisfy the requirements of @option{-mint-register}, -then @option{-mpid} and finally @option{-msmall-data-limit}. Thus it -is possible for the small data area register to be @code{r8} if both -@option{-mint-register=4} and @option{-mpid} are specified on the -command line. +@item -mfused-madd +@itemx -mno-fused-madd +@opindex mfused-madd +@opindex mno-fused-madd +Generate code that uses (does not use) the floating-point multiply and +accumulate instructions. 
These instructions are generated by default +if hardware floating point is used. The machine-dependent +@option{-mfused-madd} option is now mapped to the machine-independent +@option{-ffp-contract=fast} option, and @option{-mno-fused-madd} is +mapped to @option{-ffp-contract=off}. -By default this feature is not enabled. The default can be restored -via the @option{-mno-pid} command-line option. +@item -mfsca +@itemx -mno-fsca +@opindex mfsca +@opindex mno-fsca +Allow or disallow the compiler to emit the @code{fsca} instruction for sine +and cosine approximations. The option @option{-mfsca} must be used in +combination with @option{-funsafe-math-optimizations}. It is enabled by default +when generating code for SH4A. Using @option{-mno-fsca} disables sine and cosine +approximations even if @option{-funsafe-math-optimizations} is in effect. -@item -mno-warn-multiple-fast-interrupts -@itemx -mwarn-multiple-fast-interrupts -@opindex mno-warn-multiple-fast-interrupts -@opindex mwarn-multiple-fast-interrupts -Prevents GCC from issuing a warning message if it finds more than one -fast interrupt handler when it is compiling a file. The default is to -issue a warning for each extra fast interrupt handler found, as the RX -only supports one such interrupt. +@item -mfsrra +@itemx -mno-fsrra +@opindex mfsrra +@opindex mno-fsrra +Allow or disallow the compiler to emit the @code{fsrra} instruction for +reciprocal square root approximations. The option @option{-mfsrra} must be used +in combination with @option{-funsafe-math-optimizations} and +@option{-ffinite-math-only}. It is enabled by default when generating code for +SH4A. Using @option{-mno-fsrra} disables reciprocal square root approximations +even if @option{-funsafe-math-optimizations} and @option{-ffinite-math-only} are +in effect. -@end table +@item -mpretend-cmove +@opindex mpretend-cmove +Prefer zero-displacement conditional branches for conditional move instruction +patterns. 
This can result in faster code on the SH4 processor. -@emph{Note:} The generic GCC command-line option @option{-ffixed-@var{reg}} -has special significance to the RX port when used with the -@code{interrupt} function attribute. This attribute indicates a -function intended to process fast interrupts. GCC ensures -that it only uses the registers @code{r10}, @code{r11}, @code{r12} -and/or @code{r13} and only provided that the normal use of the -corresponding registers have been restricted via the -@option{-ffixed-@var{reg}} or @option{-mint-register} command-line -options. +@end table -@node S/390 and zSeries Options -@subsection S/390 and zSeries Options -@cindex S/390 and zSeries Options +@node Solaris 2 Options +@subsection Solaris 2 Options +@cindex Solaris 2 options -These are the @samp{-m} options defined for the S/390 and zSeries architecture. +These @samp{-m} options are supported on Solaris 2: @table @gcctabopt -@item -mhard-float -@itemx -msoft-float -@opindex mhard-float -@opindex msoft-float -Use (do not use) the hardware floating-point instructions and registers -for floating-point operations. When @option{-msoft-float} is specified, -functions in @file{libgcc.a} are used to perform floating-point -operations. When @option{-mhard-float} is specified, the compiler -generates IEEE floating-point instructions. This is the default. +@item -mclear-hwcap +@opindex mclear-hwcap +@option{-mclear-hwcap} tells the compiler to remove the hardware +capabilities generated by the Solaris assembler. This is only necessary +when object files use ISA extensions not supported by the current +machine, but check at runtime whether or not to use them. -@item -mhard-dfp -@itemx -mno-hard-dfp -@opindex mhard-dfp -@opindex mno-hard-dfp -Use (do not use) the hardware decimal-floating-point instructions for -decimal-floating-point operations. When @option{-mno-hard-dfp} is -specified, functions in @file{libgcc.a} are used to perform -decimal-floating-point operations. 
When @option{-mhard-dfp} is -specified, the compiler generates decimal-floating-point hardware -instructions. This is the default for @option{-march=z9-ec} or higher. +@item -mimpure-text +@opindex mimpure-text +@option{-mimpure-text}, used in addition to @option{-shared}, tells +the compiler to not pass @option{-z text} to the linker when linking a +shared object. Using this option, you can link position-dependent +code into a shared object. -@item -mlong-double-64 -@itemx -mlong-double-128 -@opindex mlong-double-64 -@opindex mlong-double-128 -These switches control the size of @code{long double} type. A size -of 64 bits makes the @code{long double} type equivalent to the @code{double} -type. This is the default. +@option{-mimpure-text} suppresses the ``relocations remain against +allocatable but non-writable sections'' linker error message. +However, the necessary relocations trigger copy-on-write, and the +shared object is not actually shared across processes. Instead of +using @option{-mimpure-text}, you should compile all source code with +@option{-fpic} or @option{-fPIC}. -@item -mbackchain -@itemx -mno-backchain -@opindex mbackchain -@opindex mno-backchain -Store (do not store) the address of the caller's frame as backchain pointer -into the callee's stack frame. -A backchain may be needed to allow debugging using tools that do not understand -DWARF 2 call frame information. -When @option{-mno-packed-stack} is in effect, the backchain pointer is stored -at the bottom of the stack frame; when @option{-mpacked-stack} is in effect, -the backchain is placed into the topmost word of the 96/160 byte register -save area. - -In general, code compiled with @option{-mbackchain} is call-compatible with -code compiled with @option{-mmo-backchain}; however, use of the backchain -for debugging purposes usually requires that the whole binary is built with -@option{-mbackchain}. 
Note that the combination of @option{-mbackchain}, -@option{-mpacked-stack} and @option{-mhard-float} is not supported. In order -to build a linux kernel use @option{-msoft-float}. +@end table -The default is to not maintain the backchain. +These switches are supported in addition to the above on Solaris 2: -@item -mpacked-stack -@itemx -mno-packed-stack -@opindex mpacked-stack -@opindex mno-packed-stack -Use (do not use) the packed stack layout. When @option{-mno-packed-stack} is -specified, the compiler uses the all fields of the 96/160 byte register save -area only for their default purpose; unused fields still take up stack space. -When @option{-mpacked-stack} is specified, register save slots are densely -packed at the top of the register save area; unused space is reused for other -purposes, allowing for more efficient use of the available stack space. -However, when @option{-mbackchain} is also in effect, the topmost word of -the save area is always used to store the backchain, and the return address -register is always saved two words below the backchain. +@table @gcctabopt +@item -pthreads +@opindex pthreads +Add support for multithreading using the POSIX threads library. This +option sets flags for both the preprocessor and linker. This option does +not affect the thread safety of object code produced by the compiler or +that of libraries supplied with it. -As long as the stack frame backchain is not used, code generated with -@option{-mpacked-stack} is call-compatible with code generated with -@option{-mno-packed-stack}. Note that some non-FSF releases of GCC 2.95 for -S/390 or zSeries generated code that uses the stack frame backchain at run -time, not just for debugging purposes. Such code is not call-compatible -with code compiled with @option{-mpacked-stack}. Also, note that the -combination of @option{-mbackchain}, -@option{-mpacked-stack} and @option{-mhard-float} is not supported. In order -to build a linux kernel use @option{-msoft-float}. 
+@item -pthread +@opindex pthread +This is a synonym for @option{-pthreads}. +@end table -The default is to not use the packed stack layout. +@node SPARC Options +@subsection SPARC Options +@cindex SPARC options -@item -msmall-exec -@itemx -mno-small-exec -@opindex msmall-exec -@opindex mno-small-exec -Generate (or do not generate) code using the @code{bras} instruction -to do subroutine calls. -This only works reliably if the total executable size does not -exceed 64k. The default is to use the @code{basr} instruction instead, -which does not have this limitation. +These @samp{-m} options are supported on the SPARC: -@item -m64 -@itemx -m31 -@opindex m64 -@opindex m31 -When @option{-m31} is specified, generate code compliant to the -GNU/Linux for S/390 ABI@. When @option{-m64} is specified, generate -code compliant to the GNU/Linux for zSeries ABI@. This allows GCC in -particular to generate 64-bit instructions. For the @samp{s390} -targets, the default is @option{-m31}, while the @samp{s390x} -targets default to @option{-m64}. +@table @gcctabopt +@item -mno-app-regs +@itemx -mapp-regs +@opindex mno-app-regs +@opindex mapp-regs +Specify @option{-mapp-regs} to generate output using the global registers +2 through 4, which the SPARC SVR4 ABI reserves for applications. Like the +global register 1, each global register 2 through 4 is then treated as an +allocable register that is clobbered by function calls. This is the default. -@item -mzarch -@itemx -mesa -@opindex mzarch -@opindex mesa -When @option{-mzarch} is specified, generate code using the -instructions available on z/Architecture. -When @option{-mesa} is specified, generate code using the -instructions available on ESA/390. Note that @option{-mesa} is -not possible with @option{-m64}. -When generating code compliant to the GNU/Linux for S/390 ABI, -the default is @option{-mesa}. When generating code compliant -to the GNU/Linux for zSeries ABI, the default is @option{-mzarch}. 
+To be fully SVR4 ABI-compliant at the cost of some performance loss, +specify @option{-mno-app-regs}. You should compile libraries and system +software with this option. -@item -mmvcle -@itemx -mno-mvcle -@opindex mmvcle -@opindex mno-mvcle -Generate (or do not generate) code using the @code{mvcle} instruction -to perform block moves. When @option{-mno-mvcle} is specified, -use a @code{mvc} loop instead. This is the default unless optimizing for -size. +@item -mflat +@itemx -mno-flat +@opindex mflat +@opindex mno-flat +With @option{-mflat}, the compiler does not generate save/restore instructions +and uses a ``flat'' or single register window model. This model is compatible +with the regular register window model. The local registers and the input +registers (0--5) are still treated as ``call-saved'' registers and are +saved on the stack as needed. -@item -mdebug -@itemx -mno-debug -@opindex mdebug -@opindex mno-debug -Print (or do not print) additional debug information when compiling. -The default is to not print debug information. +With @option{-mno-flat} (the default), the compiler generates save/restore +instructions (except for leaf functions). This is the normal operating mode. -@item -march=@var{cpu-type} -@opindex march -Generate code that runs on @var{cpu-type}, which is the name of a system -representing a certain processor type. Possible values for -@var{cpu-type} are @samp{g5}, @samp{g6}, @samp{z900}, @samp{z990}, -@samp{z9-109}, @samp{z9-ec} and @samp{z10}. -When generating code using the instructions available on z/Architecture, -the default is @option{-march=z900}. Otherwise, the default is -@option{-march=g5}. +@item -mfpu +@itemx -mhard-float +@opindex mfpu +@opindex mhard-float +Generate output containing floating-point instructions. This is the +default. -@item -mtune=@var{cpu-type} -@opindex mtune -Tune to @var{cpu-type} everything applicable about the generated code, -except for the ABI and the set of available instructions. 
-The list of @var{cpu-type} values is the same as for @option{-march}. -The default is the value used for @option{-march}. +@item -mno-fpu +@itemx -msoft-float +@opindex mno-fpu +@opindex msoft-float +Generate output containing library calls for floating point. +@strong{Warning:} the requisite libraries are not available for all SPARC +targets. Normally the facilities of the machine's usual C compiler are +used, but this cannot be done directly in cross-compilation. You must make +your own arrangements to provide suitable library functions for +cross-compilation. The embedded targets @samp{sparc-*-aout} and +@samp{sparclite-*-*} do provide software floating-point support. -@item -mtpf-trace -@itemx -mno-tpf-trace -@opindex mtpf-trace -@opindex mno-tpf-trace -Generate code that adds (does not add) in TPF OS specific branches to trace -routines in the operating system. This option is off by default, even -when compiling for the TPF OS@. +@option{-msoft-float} changes the calling convention in the output file; +therefore, it is only useful if you compile @emph{all} of a program with +this option. In particular, you need to compile @file{libgcc.a}, the +library that comes with GCC, with @option{-msoft-float} in order for +this to work. -@item -mfused-madd -@itemx -mno-fused-madd -@opindex mfused-madd -@opindex mno-fused-madd -Generate code that uses (does not use) the floating-point multiply and -accumulate instructions. These instructions are generated by default if -hardware floating point is used. +@item -mhard-quad-float +@opindex mhard-quad-float +Generate output containing quad-word (long double) floating-point +instructions. -@item -mwarn-framesize=@var{framesize} -@opindex mwarn-framesize -Emit a warning if the current function exceeds the given frame size. Because -this is a compile-time check it doesn't need to be a real problem when the program -runs. It is intended to identify functions that most probably cause -a stack overflow. 
It is useful to be used in an environment with limited stack -size e.g.@: the linux kernel. +@item -msoft-quad-float +@opindex msoft-quad-float +Generate output containing library calls for quad-word (long double) +floating-point instructions. The functions called are those specified +in the SPARC ABI@. This is the default. -@item -mwarn-dynamicstack -@opindex mwarn-dynamicstack -Emit a warning if the function calls @code{alloca} or uses dynamically-sized -arrays. This is generally a bad idea with a limited stack size. +As of this writing, there are no SPARC implementations that have hardware +support for the quad-word floating-point instructions. They all invoke +a trap handler for one of these instructions, and then the trap handler +emulates the effect of the instruction. Because of the trap handler overhead, +this is much slower than calling the ABI library routines. Thus the +@option{-msoft-quad-float} option is the default. -@item -mstack-guard=@var{stack-guard} -@itemx -mstack-size=@var{stack-size} -@opindex mstack-guard -@opindex mstack-size -If these options are provided the S/390 back end emits additional instructions in -the function prologue that trigger a trap if the stack size is @var{stack-guard} -bytes above the @var{stack-size} (remember that the stack on S/390 grows downward). -If the @var{stack-guard} option is omitted the smallest power of 2 larger than -the frame size of the compiled function is chosen. -These options are intended to be used to help debugging stack overflow problems. -The additionally emitted code causes only little overhead and hence can also be -used in production-like systems without greater performance degradation. The given -values have to be exact powers of 2 and @var{stack-size} has to be greater than -@var{stack-guard} without exceeding 64k. -In order to be efficient the extra code makes the assumption that the stack starts -at an address aligned to the value given by @var{stack-size}. 
-The @var{stack-guard} option can only be used in conjunction with @var{stack-size}. +@item -mno-unaligned-doubles +@itemx -munaligned-doubles +@opindex mno-unaligned-doubles +@opindex munaligned-doubles +Assume that doubles have 8-byte alignment. This is the default. -@item -mhotpatch=@var{pre-halfwords},@var{post-halfwords} -@opindex mhotpatch -If the hotpatch option is enabled, a ``hot-patching'' function -prologue is generated for all functions in the compilation unit. -The funtion label is prepended with the given number of two-byte -Nop instructions (@var{pre-halfwords}, maximum 1000000). After -the label, 2 * @var{post-halfwords} bytes are appended, using the -larges nop like instructions the architecture allows (maximum -1000000). +With @option{-munaligned-doubles}, GCC assumes that doubles have 8-byte +alignment only if they are contained in another type, or if they have an +absolute address. Otherwise, it assumes they have 4-byte alignment. +Specifying this option avoids some rare compatibility problems with code +generated by other compilers. It is not the default because it results +in a performance loss, especially for floating-point code. -If both arguments are zero, hotpatching is disabled. +@item -muser-mode +@itemx -mno-user-mode +@opindex muser-mode +@opindex mno-user-mode +Do not generate code that can only run in supervisor mode. This is relevant +only for the @code{casa} instruction emitted for the LEON3 processor. The +default is @option{-mno-user-mode}. -This option can be overridden for individual functions with the -@code{hotpatch} attribute. -@end table +@item -mno-faster-structs +@itemx -mfaster-structs +@opindex mno-faster-structs +@opindex mfaster-structs +With @option{-mfaster-structs}, the compiler assumes that structures +should have 8-byte alignment. This enables the use of pairs of +@code{ldd} and @code{std} instructions for copies in structure +assignment, in place of twice as many @code{ld} and @code{st} pairs. 
+However, the use of this changed alignment directly violates the SPARC +ABI@. Thus, it's intended only for use on targets where the developer +acknowledges that their resulting code is not directly in line with +the rules of the ABI@. -@node Score Options -@subsection Score Options -@cindex Score Options - -These options are defined for Score implementations: - -@table @gcctabopt -@item -meb -@opindex meb -Compile code for big-endian mode. This is the default. +@item -mcpu=@var{cpu_type} +@opindex mcpu +Set the instruction set, register set, and instruction scheduling parameters +for machine type @var{cpu_type}. Supported values for @var{cpu_type} are +@samp{v7}, @samp{cypress}, @samp{v8}, @samp{supersparc}, @samp{hypersparc}, +@samp{leon}, @samp{leon3}, @samp{leon3v7}, @samp{sparclite}, @samp{f930}, +@samp{f934}, @samp{sparclite86x}, @samp{sparclet}, @samp{tsc701}, @samp{v9}, +@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2}, +@samp{niagara3} and @samp{niagara4}. -@item -mel -@opindex mel -Compile code for little-endian mode. +Native Solaris and GNU/Linux toolchains also support the value @samp{native}, +which selects the best architecture option for the host processor. +@option{-mcpu=native} has no effect if GCC does not recognize +the processor. -@item -mnhwloop -@opindex mnhwloop -Disable generation of @code{bcnz} instructions. +Default instruction scheduling parameters are used for values that select +an architecture and not an implementation. These are @samp{v7}, @samp{v8}, +@samp{sparclite}, @samp{sparclet}, @samp{v9}. -@item -muls -@opindex muls -Enable generation of unaligned load and store instructions. +Here is a list of each supported architecture and their supported +implementations. -@item -mmac -@opindex mmac -Enable the use of multiply-accumulate instructions. Disabled by default. +@table @asis +@item v7 +cypress, leon3v7 -@item -mscore5 -@opindex mscore5 -Specify the SCORE5 as the target architecture. 
+@item v8 +supersparc, hypersparc, leon, leon3 -@item -mscore5u -@opindex mscore5u -Specify the SCORE5U of the target architecture. +@item sparclite +f930, f934, sparclite86x -@item -mscore7 -@opindex mscore7 -Specify the SCORE7 as the target architecture. This is the default. +@item sparclet +tsc701 -@item -mscore7d -@opindex mscore7d -Specify the SCORE7D as the target architecture. +@item v9 +ultrasparc, ultrasparc3, niagara, niagara2, niagara3, niagara4 @end table -@node SH Options -@subsection SH Options +By default (unless configured otherwise), GCC generates code for the V7 +variant of the SPARC architecture. With @option{-mcpu=cypress}, the compiler +additionally optimizes it for the Cypress CY7C602 chip, as used in the +SPARCStation/SPARCServer 3xx series. This is also appropriate for the older +SPARCStation 1, 2, IPX etc. -These @samp{-m} options are defined for the SH implementations: +With @option{-mcpu=v8}, GCC generates code for the V8 variant of the SPARC +architecture. The only difference from V7 code is that the compiler emits +the integer multiply and integer divide instructions which exist in SPARC-V8 +but not in SPARC-V7. With @option{-mcpu=supersparc}, the compiler additionally +optimizes it for the SuperSPARC chip, as used in the SPARCStation 10, 1000 and +2000 series. -@table @gcctabopt -@item -m1 -@opindex m1 -Generate code for the SH1. +With @option{-mcpu=sparclite}, GCC generates code for the SPARClite variant of +the SPARC architecture. This adds the integer multiply, integer divide step +and scan (@code{ffs}) instructions which exist in SPARClite but not in SPARC-V7. +With @option{-mcpu=f930}, the compiler additionally optimizes it for the +Fujitsu MB86930 chip, which is the original SPARClite, with no FPU@. With +@option{-mcpu=f934}, the compiler additionally optimizes it for the Fujitsu +MB86934 chip, which is the more recent SPARClite with FPU@. -@item -m2 -@opindex m2 -Generate code for the SH2. 
+With @option{-mcpu=sparclet}, GCC generates code for the SPARClet variant of +the SPARC architecture. This adds the integer multiply, multiply/accumulate, +integer divide step and scan (@code{ffs}) instructions which exist in SPARClet +but not in SPARC-V7. With @option{-mcpu=tsc701}, the compiler additionally +optimizes it for the TEMIC SPARClet chip. -@item -m2e -Generate code for the SH2e. +With @option{-mcpu=v9}, GCC generates code for the V9 variant of the SPARC +architecture. This adds 64-bit integer and floating-point move instructions, +3 additional floating-point condition code registers and conditional move +instructions. With @option{-mcpu=ultrasparc}, the compiler additionally +optimizes it for the Sun UltraSPARC I/II/IIi chips. With +@option{-mcpu=ultrasparc3}, the compiler additionally optimizes it for the +Sun UltraSPARC III/III+/IIIi/IIIi+/IV/IV+ chips. With +@option{-mcpu=niagara}, the compiler additionally optimizes it for +Sun UltraSPARC T1 chips. With @option{-mcpu=niagara2}, the compiler +additionally optimizes it for Sun UltraSPARC T2 chips. With +@option{-mcpu=niagara3}, the compiler additionally optimizes it for Sun +UltraSPARC T3 chips. With @option{-mcpu=niagara4}, the compiler +additionally optimizes it for Sun UltraSPARC T4 chips. -@item -m2a-nofpu -@opindex m2a-nofpu -Generate code for the SH2a without FPU, or for a SH2a-FPU in such a way -that the floating-point unit is not used. +@item -mtune=@var{cpu_type} +@opindex mtune +Set the instruction scheduling parameters for machine type +@var{cpu_type}, but do not set the instruction set or register set that the +option @option{-mcpu=@var{cpu_type}} does. -@item -m2a-single-only -@opindex m2a-single-only -Generate code for the SH2a-FPU, in such a way that no double-precision -floating-point operations are used. 
+The same values for @option{-mcpu=@var{cpu_type}} can be used for +@option{-mtune=@var{cpu_type}}, but the only useful values are those +that select a particular CPU implementation. Those are @samp{cypress}, +@samp{supersparc}, @samp{hypersparc}, @samp{leon}, @samp{leon3}, +@samp{leon3v7}, @samp{f930}, @samp{f934}, @samp{sparclite86x}, @samp{tsc701}, +@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2}, +@samp{niagara3} and @samp{niagara4}. With native Solaris and GNU/Linux +toolchains, @samp{native} can also be used. -@item -m2a-single -@opindex m2a-single -Generate code for the SH2a-FPU assuming the floating-point unit is in -single-precision mode by default. +@item -mv8plus +@itemx -mno-v8plus +@opindex mv8plus +@opindex mno-v8plus +With @option{-mv8plus}, GCC generates code for the SPARC-V8+ ABI@. The +difference from the V8 ABI is that the global and out registers are +considered 64 bits wide. This is enabled by default on Solaris in 32-bit +mode for all SPARC-V9 processors. -@item -m2a -@opindex m2a -Generate code for the SH2a-FPU assuming the floating-point unit is in -double-precision mode by default. +@item -mvis +@itemx -mno-vis +@opindex mvis +@opindex mno-vis +With @option{-mvis}, GCC generates code that takes advantage of the UltraSPARC +Visual Instruction Set extensions. The default is @option{-mno-vis}. -@item -m3 -@opindex m3 -Generate code for the SH3. +@item -mvis2 +@itemx -mno-vis2 +@opindex mvis2 +@opindex mno-vis2 +With @option{-mvis2}, GCC generates code that takes advantage of +version 2.0 of the UltraSPARC Visual Instruction Set extensions. The +default is @option{-mvis2} when targeting a cpu that supports such +instructions, such as UltraSPARC-III and later. Setting @option{-mvis2} +also sets @option{-mvis}. -@item -m3e -@opindex m3e -Generate code for the SH3e. 
+@item -mvis3 +@itemx -mno-vis3 +@opindex mvis3 +@opindex mno-vis3 +With @option{-mvis3}, GCC generates code that takes advantage of +version 3.0 of the UltraSPARC Visual Instruction Set extensions. The +default is @option{-mvis3} when targeting a cpu that supports such +instructions, such as niagara-3 and later. Setting @option{-mvis3} +also sets @option{-mvis2} and @option{-mvis}. -@item -m4-nofpu -@opindex m4-nofpu -Generate code for the SH4 without a floating-point unit. +@item -mcbcond +@itemx -mno-cbcond +@opindex mcbcond +@opindex mno-cbcond +With @option{-mcbcond}, GCC generates code that takes advantage of +compare-and-branch instructions, as defined in the Sparc Architecture 2011. +The default is @option{-mcbcond} when targeting a cpu that supports such +instructions, such as niagara-4 and later. -@item -m4-single-only -@opindex m4-single-only -Generate code for the SH4 with a floating-point unit that only -supports single-precision arithmetic. +@item -mpopc +@itemx -mno-popc +@opindex mpopc +@opindex mno-popc +With @option{-mpopc}, GCC generates code that takes advantage of the UltraSPARC +population count instruction. The default is @option{-mpopc} +when targeting a cpu that supports such instructions, such as Niagara-2 and +later. -@item -m4-single -@opindex m4-single -Generate code for the SH4 assuming the floating-point unit is in -single-precision mode by default. +@item -mfmaf +@itemx -mno-fmaf +@opindex mfmaf +@opindex mno-fmaf +With @option{-mfmaf}, GCC generates code that takes advantage of the UltraSPARC +Fused Multiply-Add Floating-point extensions. The default is @option{-mfmaf} +when targeting a cpu that supports such instructions, such as Niagara-3 and +later. -@item -m4 -@opindex m4 -Generate code for the SH4. +@item -mfix-at697f +@opindex mfix-at697f +Enable the documented workaround for the single erratum of the Atmel AT697F +processor (which corresponds to erratum #13 of the AT697E processor). 
-@item -m4-100 -@opindex m4-100 -Generate code for SH4-100. +@item -mfix-ut699 +@opindex mfix-ut699 +Enable the documented workarounds for the floating-point errata and the data +cache nullify errata of the UT699 processor. +@end table -@item -m4-100-nofpu -@opindex m4-100-nofpu -Generate code for SH4-100 in such a way that the -floating-point unit is not used. +These @samp{-m} options are supported in addition to the above +on SPARC-V9 processors in 64-bit environments: -@item -m4-100-single -@opindex m4-100-single -Generate code for SH4-100 assuming the floating-point unit is in -single-precision mode by default. +@table @gcctabopt +@item -m32 +@itemx -m64 +@opindex m32 +@opindex m64 +Generate code for a 32-bit or 64-bit environment. +The 32-bit environment sets int, long and pointer to 32 bits. +The 64-bit environment sets int to 32 bits and long and pointer +to 64 bits. -@item -m4-100-single-only -@opindex m4-100-single-only -Generate code for SH4-100 in such a way that no double-precision -floating-point operations are used. +@item -mcmodel=@var{which} +@opindex mcmodel +Set the code model to one of -@item -m4-200 -@opindex m4-200 -Generate code for SH4-200. +@table @samp +@item medlow +The Medium/Low code model: 64-bit addresses, programs +must be linked in the low 32 bits of memory. Programs can be statically +or dynamically linked. -@item -m4-200-nofpu -@opindex m4-200-nofpu -Generate code for SH4-200 without in such a way that the -floating-point unit is not used. +@item medmid +The Medium/Middle code model: 64-bit addresses, programs +must be linked in the low 44 bits of memory, the text and data segments must +be less than 2GB in size and the data segment must be located within 2GB of +the text segment. -@item -m4-200-single -@opindex m4-200-single -Generate code for SH4-200 assuming the floating-point unit is in -single-precision mode by default. 
+@item medany +The Medium/Anywhere code model: 64-bit addresses, programs +may be linked anywhere in memory, the text and data segments must be less +than 2GB in size and the data segment must be located within 2GB of the +text segment. -@item -m4-200-single-only -@opindex m4-200-single-only -Generate code for SH4-200 in such a way that no double-precision -floating-point operations are used. +@item embmedany +The Medium/Anywhere code model for embedded systems: +64-bit addresses, the text and data segments must be less than 2GB in +size, both starting anywhere in memory (determined at link time). The +global register %g4 points to the base of the data segment. Programs +are statically linked and PIC is not supported. +@end table -@item -m4-300 -@opindex m4-300 -Generate code for SH4-300. +@item -mmemory-model=@var{mem-model} +@opindex mmemory-model +Set the memory model in force on the processor to one of -@item -m4-300-nofpu -@opindex m4-300-nofpu -Generate code for SH4-300 without in such a way that the -floating-point unit is not used. +@table @samp +@item default +The default memory model for the processor and operating system. -@item -m4-300-single -@opindex m4-300-single -Generate code for SH4-300 in such a way that no double-precision -floating-point operations are used. +@item rmo +Relaxed Memory Order -@item -m4-300-single-only -@opindex m4-300-single-only -Generate code for SH4-300 in such a way that no double-precision -floating-point operations are used. +@item pso +Partial Store Order -@item -m4-340 -@opindex m4-340 -Generate code for SH4-340 (no MMU, no FPU). +@item tso +Total Store Order -@item -m4-500 -@opindex m4-500 -Generate code for SH4-500 (no FPU). Passes @option{-isa=sh4-nofpu} to the -assembler. +@item sc +Sequential Consistency +@end table -@item -m4a-nofpu -@opindex m4a-nofpu -Generate code for the SH4al-dsp, or for a SH4a in such a way that the -floating-point unit is not used. 
+These memory models are formally defined in Appendix D of the Sparc V9 +architecture manual, as set in the processor's @code{PSTATE.MM} field. -@item -m4a-single-only -@opindex m4a-single-only -Generate code for the SH4a, in such a way that no double-precision -floating-point operations are used. +@item -mstack-bias +@itemx -mno-stack-bias +@opindex mstack-bias +@opindex mno-stack-bias +With @option{-mstack-bias}, GCC assumes that the stack pointer, and +frame pointer if present, are offset by @minus{}2047 which must be added back +when making stack frame references. This is the default in 64-bit mode. +Otherwise, assume no such offset is present. +@end table -@item -m4a-single -@opindex m4a-single -Generate code for the SH4a assuming the floating-point unit is in -single-precision mode by default. +@node SPU Options +@subsection SPU Options +@cindex SPU options -@item -m4a -@opindex m4a -Generate code for the SH4a. +These @samp{-m} options are supported on the SPU: -@item -m4al -@opindex m4al -Same as @option{-m4a-nofpu}, except that it implicitly passes -@option{-dsp} to the assembler. GCC doesn't generate any DSP -instructions at the moment. +@table @gcctabopt +@item -mwarn-reloc +@itemx -merror-reloc +@opindex mwarn-reloc +@opindex merror-reloc -@item -m5-32media -@opindex m5-32media -Generate 32-bit code for SHmedia. +The loader for SPU does not handle dynamic relocations. By default, GCC +gives an error when it generates code that requires a dynamic +relocation. @option{-mno-error-reloc} disables the error, +@option{-mwarn-reloc} generates a warning instead. -@item -m5-32media-nofpu -@opindex m5-32media-nofpu -Generate 32-bit code for SHmedia in such a way that the -floating-point unit is not used. +@item -msafe-dma +@itemx -munsafe-dma +@opindex msafe-dma +@opindex munsafe-dma -@item -m5-64media -@opindex m5-64media -Generate 64-bit code for SHmedia. 
+Instructions that initiate or test completion of DMA must not be +reordered with respect to loads and stores of the memory that is being +accessed. +With @option{-munsafe-dma} you must use the @code{volatile} keyword to protect +memory accesses, but that can lead to inefficient code in places where the +memory is known to not change. Rather than mark the memory as volatile, +you can use @option{-msafe-dma} to tell the compiler to treat +the DMA instructions as potentially affecting all memory. -@item -m5-64media-nofpu -@opindex m5-64media-nofpu -Generate 64-bit code for SHmedia in such a way that the -floating-point unit is not used. +@item -mbranch-hints +@opindex mbranch-hints -@item -m5-compact -@opindex m5-compact -Generate code for SHcompact. +By default, GCC generates a branch hint instruction to avoid +pipeline stalls for always-taken or probably-taken branches. A hint +is not generated closer than 8 instructions away from its branch. +There is little reason to disable them, except for debugging purposes, +or to make an object a little bit smaller. -@item -m5-compact-nofpu -@opindex m5-compact-nofpu -Generate code for SHcompact in such a way that the -floating-point unit is not used. +@item -msmall-mem +@itemx -mlarge-mem +@opindex msmall-mem +@opindex mlarge-mem -@item -mb -@opindex mb -Compile code for the processor in big-endian mode. +By default, GCC generates code assuming that addresses are never larger +than 18 bits. With @option{-mlarge-mem} code is generated that assumes +a full 32-bit address. -@item -ml -@opindex ml -Compile code for the processor in little-endian mode. +@item -mstdmain +@opindex mstdmain -@item -mdalign -@opindex mdalign -Align doubles at 64-bit boundaries. Note that this changes the calling -conventions, and thus some functions from the standard C library do -not work unless you recompile it first with @option{-mdalign}. 
+By default, GCC links against startup code that assumes the SPU-style +main function interface (which has an unconventional parameter list). +With @option{-mstdmain}, GCC links your program against startup +code that assumes a C99-style interface to @code{main}, including a +local copy of @code{argv} strings. -@item -mrelax -@opindex mrelax -Shorten some address references at link time, when possible; uses the -linker option @option{-relax}. +@item -mfixed-range=@var{register-range} +@opindex mfixed-range +Generate code treating the given register range as fixed registers. +A fixed register is one that the register allocator cannot use. This is +useful when compiling kernel code. A register range is specified as +two registers separated by a dash. Multiple register ranges can be +specified separated by a comma. -@item -mbigtable -@opindex mbigtable -Use 32-bit offsets in @code{switch} tables. The default is to use -16-bit offsets. +@item -mea32 +@itemx -mea64 +@opindex mea32 +@opindex mea64 +Compile code assuming that pointers to the PPU address space accessed +via the @code{__ea} named address space qualifier are either 32 or 64 +bits wide. The default is 32 bits. As this is an ABI-changing option, +all object code in an executable must be compiled with the same setting. -@item -mbitops -@opindex mbitops -Enable the use of bit manipulation instructions on SH2A. +@item -maddress-space-conversion +@itemx -mno-address-space-conversion +@opindex maddress-space-conversion +@opindex mno-address-space-conversion +Allow/disallow treating the @code{__ea} address space as superset +of the generic address space. This enables explicit type casts +between @code{__ea} and generic pointer as well as implicit +conversions of generic pointers to @code{__ea} pointers. The +default is to allow address space pointer conversions. -@item -mfmovd -@opindex mfmovd -Enable the use of the instruction @code{fmovd}. Check @option{-mdalign} for -alignment constraints. 
+@item -mcache-size=@var{cache-size} +@opindex mcache-size +This option controls the version of libgcc that the compiler links to an +executable and selects a software-managed cache for accessing variables +in the @code{__ea} address space with a particular cache size. Possible +options for @var{cache-size} are @samp{8}, @samp{16}, @samp{32}, @samp{64} +and @samp{128}. The default cache size is 64KB. -@item -mrenesas -@opindex mrenesas -Comply with the calling conventions defined by Renesas. +@item -matomic-updates +@itemx -mno-atomic-updates +@opindex matomic-updates +@opindex mno-atomic-updates +This option controls the version of libgcc that the compiler links to an +executable and selects whether atomic updates to the software-managed +cache of PPU-side variables are used. If you use atomic updates, changes +to a PPU variable from SPU code using the @code{__ea} named address space +qualifier do not interfere with changes to other PPU variables residing +in the same cache line from PPU code. If you do not use atomic updates, +such interference may occur; however, writing back cache lines is +more efficient. The default behavior is to use atomic updates. -@item -mno-renesas -@opindex mno-renesas -Comply with the calling conventions defined for GCC before the Renesas -conventions were available. This option is the default for all -targets of the SH toolchain. +@item -mdual-nops +@itemx -mdual-nops=@var{n} +@opindex mdual-nops +By default, GCC inserts nops to increase dual issue when it expects +it to increase performance. @var{n} can be a value from 0 to 10. A +smaller @var{n} inserts fewer nops. 10 is the default, 0 is the +same as @option{-mno-dual-nops}. Disabled with @option{-Os}. -@item -mnomacsave -@opindex mnomacsave -Mark the @code{MAC} register as call-clobbered, even if -@option{-mrenesas} is given. +@item -mhint-max-nops=@var{n} +@opindex mhint-max-nops +Maximum number of nops to insert for a branch hint. 
A branch hint must +be at least 8 instructions away from the branch it is affecting. GCC +inserts up to @var{n} nops to enforce this, otherwise it does not +generate the branch hint. -@item -mieee -@itemx -mno-ieee -@opindex mieee -@opindex mno-ieee -Control the IEEE compliance of floating-point comparisons, which affects the -handling of cases where the result of a comparison is unordered. By default -@option{-mieee} is implicitly enabled. If @option{-ffinite-math-only} is -enabled @option{-mno-ieee} is implicitly set, which results in faster -floating-point greater-equal and less-equal comparisons. The implcit settings -can be overridden by specifying either @option{-mieee} or @option{-mno-ieee}. +@item -mhint-max-distance=@var{n} +@opindex mhint-max-distance +The encoding of the branch hint instruction limits the hint to be within +256 instructions of the branch it is affecting. By default, GCC makes +sure it is within 125. -@item -minline-ic_invalidate -@opindex minline-ic_invalidate -Inline code to invalidate instruction cache entries after setting up -nested function trampolines. -This option has no effect if @option{-musermode} is in effect and the selected -code generation option (e.g. @option{-m4}) does not allow the use of the @code{icbi} -instruction. -If the selected code generation option does not allow the use of the @code{icbi} -instruction, and @option{-musermode} is not in effect, the inlined code -manipulates the instruction cache address array directly with an associative -write. This not only requires privileged mode at run time, but it also -fails if the cache line had been mapped via the TLB and has become unmapped. +@item -msafe-hints +@opindex msafe-hints +Work around a hardware bug that causes the SPU to stall indefinitely. +By default, GCC inserts the @code{hbrp} instruction to make sure +this stall won't happen. -@item -misize -@opindex misize -Dump instruction size and location in the assembly code. 
+@end table -@item -mpadstruct -@opindex mpadstruct -This option is deprecated. It pads structures to multiple of 4 bytes, -which is incompatible with the SH ABI@. +@node System V Options +@subsection Options for System V -@item -matomic-model=@var{model} -@opindex matomic-model=@var{model} -Sets the model of atomic operations and additional parameters as a comma -separated list. For details on the atomic built-in functions see -@ref{__atomic Builtins}. The following models and parameters are supported: +These additional options are available on System V Release 4 for +compatibility with other compilers on those systems: -@table @samp +@table @gcctabopt +@item -G +@opindex G +Create a shared object. +It is recommended that @option{-symbolic} or @option{-shared} be used instead. -@item none -Disable compiler generated atomic sequences and emit library calls for atomic -operations. This is the default if the target is not @code{sh*-*-linux*}. +@item -Qy +@opindex Qy +Identify the versions of each tool used by the compiler, in a +@code{.ident} assembler directive in the output. -@item soft-gusa -Generate GNU/Linux compatible gUSA software atomic sequences for the atomic -built-in functions. The generated atomic sequences require additional support -from the interrupt/exception handling code of the system and are only suitable -for SH3* and SH4* single-core systems. This option is enabled by default when -the target is @code{sh*-*-linux*} and SH3* or SH4*. When the target is SH4A, -this option also partially utilizes the hardware atomic instructions -@code{movli.l} and @code{movco.l} to create more efficient code, unless -@samp{strict} is specified. +@item -Qn +@opindex Qn +Refrain from adding @code{.ident} directives to the output file (this is +the default). -@item soft-tcb -Generate software atomic sequences that use a variable in the thread control -block. This is a variation of the gUSA sequences which can also be used on -SH1* and SH2* targets. 
The generated atomic sequences require additional -support from the interrupt/exception handling code of the system and are only -suitable for single-core systems. When using this model, the @samp{gbr-offset=} -parameter has to be specified as well. +@item -YP,@var{dirs} +@opindex YP +Search the directories @var{dirs}, and no others, for libraries +specified with @option{-l}. -@item soft-imask -Generate software atomic sequences that temporarily disable interrupts by -setting @code{SR.IMASK = 1111}. This model works only when the program runs -in privileged mode and is only suitable for single-core systems. Additional -support from the interrupt/exception handling code of the system is not -required. This model is enabled by default when the target is -@code{sh*-*-linux*} and SH1* or SH2*. +@item -Ym,@var{dir} +@opindex Ym +Look in the directory @var{dir} to find the M4 preprocessor. +The assembler uses this option. +@c This is supposed to go with a -Yd for predefined M4 macro files, but +@c the generic assembler that comes with Solaris takes just -Ym. +@end table -@item hard-llcs -Generate hardware atomic sequences using the @code{movli.l} and @code{movco.l} -instructions only. This is only available on SH4A and is suitable for -multi-core systems. Since the hardware instructions support only 32 bit atomic -variables access to 8 or 16 bit variables is emulated with 32 bit accesses. -Code compiled with this option is also compatible with other software -atomic model interrupt/exception handling systems if executed on an SH4A -system. Additional support from the interrupt/exception handling code of the -system is not required for this model. +@node TILE-Gx Options +@subsection TILE-Gx Options +@cindex TILE-Gx options -@item gbr-offset= -This parameter specifies the offset in bytes of the variable in the thread -control block structure that should be used by the generated atomic sequences -when the @samp{soft-tcb} model has been selected. 
For other models this -parameter is ignored. The specified value must be an integer multiple of four -and in the range 0-1020. +These @samp{-m} options are supported on the TILE-Gx: -@item strict -This parameter prevents mixed usage of multiple atomic models, even if they -are compatible, and makes the compiler generate atomic sequences of the -specified model only. +@table @gcctabopt +@item -mcmodel=small +@opindex mcmodel=small +Generate code for the small model. The distance for direct calls is +limited to 500M in either direction. PC-relative addresses are 32 +bits. Absolute addresses support the full address range. -@end table +@item -mcmodel=large +@opindex mcmodel=large +Generate code for the large model. There is no limitation on call +distance, pc-relative addresses, or absolute addresses. -@item -mtas -@opindex mtas -Generate the @code{tas.b} opcode for @code{__atomic_test_and_set}. -Notice that depending on the particular hardware and software configuration -this can degrade overall performance due to the operand cache line flushes -that are implied by the @code{tas.b} instruction. On multi-core SH4A -processors the @code{tas.b} instruction must be used with caution since it -can result in data corruption for certain cache configurations. +@item -mcpu=@var{name} +@opindex mcpu +Selects the type of CPU to be targeted. Currently the only supported +type is @samp{tilegx}. -@item -mprefergot -@opindex mprefergot -When generating position-independent code, emit function calls using -the Global Offset Table instead of the Procedure Linkage Table. +@item -m32 +@itemx -m64 +@opindex m32 +@opindex m64 +Generate code for a 32-bit or 64-bit environment. The 32-bit +environment sets int, long, and pointer to 32 bits. The 64-bit +environment sets int to 32 bits and long and pointer to 64 bits. -@item -musermode -@itemx -mno-usermode -@opindex musermode -@opindex mno-usermode -Don't allow (allow) the compiler generating privileged mode code. 
Specifying -@option{-musermode} also implies @option{-mno-inline-ic_invalidate} if the -inlined code would not work in user mode. @option{-musermode} is the default -when the target is @code{sh*-*-linux*}. If the target is SH1* or SH2* -@option{-musermode} has no effect, since there is no user mode. +@item -mbig-endian +@itemx -mlittle-endian +@opindex mbig-endian +@opindex mlittle-endian +Generate code in big/little endian mode, respectively. +@end table -@item -multcost=@var{number} -@opindex multcost=@var{number} -Set the cost to assume for a multiply insn. +@node TILEPro Options +@subsection TILEPro Options +@cindex TILEPro options -@item -mdiv=@var{strategy} -@opindex mdiv=@var{strategy} -Set the division strategy to be used for integer division operations. -For SHmedia @var{strategy} can be one of: +These @samp{-m} options are supported on the TILEPro: -@table @samp +@table @gcctabopt +@item -mcpu=@var{name} +@opindex mcpu +Selects the type of CPU to be targeted. Currently the only supported +type is @samp{tilepro}. -@item fp -Performs the operation in floating point. This has a very high latency, -but needs only a few instructions, so it might be a good choice if -your code has enough easily-exploitable ILP to allow the compiler to -schedule the floating-point instructions together with other instructions. -Division by zero causes a floating-point exception. +@item -m32 +@opindex m32 +Generate code for a 32-bit environment, which sets int, long, and +pointer to 32 bits. This is the only supported behavior so the flag +is essentially ignored. +@end table -@item inv -Uses integer operations to calculate the inverse of the divisor, -and then multiplies the dividend with the inverse. This strategy allows -CSE and hoisting of the inverse calculation. Division by zero calculates -an unspecified result, but does not trap. 
+@node V850 Options +@subsection V850 Options +@cindex V850 Options -@item inv:minlat -A variant of @samp{inv} where, if no CSE or hoisting opportunities -have been found, or if the entire operation has been hoisted to the same -place, the last stages of the inverse calculation are intertwined with the -final multiply to reduce the overall latency, at the expense of using a few -more instructions, and thus offering fewer scheduling opportunities with -other code. +These @samp{-m} options are defined for V850 implementations: -@item call -Calls a library function that usually implements the @samp{inv:minlat} -strategy. -This gives high code density for @code{m5-*media-nofpu} compilations. +@table @gcctabopt +@item -mlong-calls +@itemx -mno-long-calls +@opindex mlong-calls +@opindex mno-long-calls +Treat all calls as being far away (near). If calls are assumed to be +far away, the compiler always loads the function's address into a +register, and calls indirect through the pointer. -@item call2 -Uses a different entry point of the same library function, where it -assumes that a pointer to a lookup table has already been set up, which -exposes the pointer load to CSE and code hoisting optimizations. +@item -mno-ep +@itemx -mep +@opindex mno-ep +@opindex mep +Do not optimize (do optimize) basic blocks that use the same index +pointer 4 or more times to copy pointer into the @code{ep} register, and +use the shorter @code{sld} and @code{sst} instructions. The @option{-mep} +option is on by default if you optimize. -@item inv:call -@itemx inv:call2 -@itemx inv:fp -Use the @samp{inv} algorithm for initial -code generation, but if the code stays unoptimized, revert to the @samp{call}, -@samp{call2}, or @samp{fp} strategies, respectively. Note that the -potentially-trapping side effect of division by zero is carried by a -separate instruction, so it is possible that all the integer instructions -are hoisted out, but the marker for the side effect stays where it is. 
-A recombination to floating-point operations or a call is not possible -in that case. - -@item inv20u -@itemx inv20l -Variants of the @samp{inv:minlat} strategy. In the case -that the inverse calculation is not separated from the multiply, they speed -up division where the dividend fits into 20 bits (plus sign where applicable) -by inserting a test to skip a number of operations in this case; this test -slows down the case of larger dividends. @samp{inv20u} assumes the case of a such -a small dividend to be unlikely, and @samp{inv20l} assumes it to be likely. +@item -mno-prolog-function +@itemx -mprolog-function +@opindex mno-prolog-function +@opindex mprolog-function +Do not use (do use) external functions to save and restore registers +at the prologue and epilogue of a function. The external functions +are slower, but use less code space if more than one function saves +the same number of registers. The @option{-mprolog-function} option +is on by default if you optimize. -@end table +@item -mspace +@opindex mspace +Try to make the code as small as possible. At present, this just turns +on the @option{-mep} and @option{-mprolog-function} options. -For targets other than SHmedia @var{strategy} can be one of: +@item -mtda=@var{n} +@opindex mtda +Put static or global variables whose size is @var{n} bytes or less into +the tiny data area that register @code{ep} points to. The tiny data +area can hold up to 256 bytes in total (128 bytes for byte references). -@table @samp +@item -msda=@var{n} +@opindex msda +Put static or global variables whose size is @var{n} bytes or less into +the small data area that register @code{gp} points to. The small data +area can hold up to 64 kilobytes. -@item call-div1 -Calls a library function that uses the single-step division instruction -@code{div1} to perform the operation. Division by zero calculates an -unspecified result and does not trap. This is the default except for SH4, -SH2A and SHcompact. 
+@item -mzda=@var{n} +@opindex mzda +Put static or global variables whose size is @var{n} bytes or less into +the first 32 kilobytes of memory. -@item call-fp -Calls a library function that performs the operation in double precision -floating point. Division by zero causes a floating-point exception. This is -the default for SHcompact with FPU. Specifying this for targets that do not -have a double precision FPU defaults to @code{call-div1}. +@item -mv850 +@opindex mv850 +Specify that the target processor is the V850. -@item call-table -Calls a library function that uses a lookup table for small divisors and -the @code{div1} instruction with case distinction for larger divisors. Division -by zero calculates an unspecified result and does not trap. This is the default -for SH4. Specifying this for targets that do not have dynamic shift -instructions defaults to @code{call-div1}. +@item -mv850e3v5 +@opindex mv850e3v5 +Specify that the target processor is the V850E3V5. The preprocessor +constant @code{__v850e3v5__} is defined if this option is used. -@end table +@item -mv850e2v4 +@opindex mv850e2v4 +Specify that the target processor is the V850E3V5. This is an alias for +the @option{-mv850e3v5} option. -When a division strategy has not been specified the default strategy is -selected based on the current target. For SH2A the default strategy is to -use the @code{divs} and @code{divu} instructions instead of library function -calls. +@item -mv850e2v3 +@opindex mv850e2v3 +Specify that the target processor is the V850E2V3. The preprocessor +constant @code{__v850e2v3__} is defined if this option is used. -@item -maccumulate-outgoing-args -@opindex maccumulate-outgoing-args -Reserve space once for outgoing arguments in the function prologue rather -than around each call. Generally beneficial for performance and size. Also -needed for unwinding to avoid changing the stack frame around conditional code. 
+@item -mv850e2 +@opindex mv850e2 +Specify that the target processor is the V850E2. The preprocessor +constant @code{__v850e2__} is defined if this option is used. -@item -mdivsi3_libfunc=@var{name} -@opindex mdivsi3_libfunc=@var{name} -Set the name of the library function used for 32-bit signed division to -@var{name}. -This only affects the name used in the @samp{call} and @samp{inv:call} -division strategies, and the compiler still expects the same -sets of input/output/clobbered registers as if this option were not present. +@item -mv850e1 +@opindex mv850e1 +Specify that the target processor is the V850E1. The preprocessor +constants @code{__v850e1__} and @code{__v850e__} are defined if +this option is used. -@item -mfixed-range=@var{register-range} -@opindex mfixed-range -Generate code treating the given register range as fixed registers. -A fixed register is one that the register allocator can not use. This is -useful when compiling kernel code. A register range is specified as -two registers separated by a dash. Multiple register ranges can be -specified separated by a comma. +@item -mv850es +@opindex mv850es +Specify that the target processor is the V850ES. This is an alias for +the @option{-mv850e1} option. -@item -mindexed-addressing -@opindex mindexed-addressing -Enable the use of the indexed addressing mode for SHmedia32/SHcompact. -This is only safe if the hardware and/or OS implement 32-bit wrap-around -semantics for the indexed addressing mode. The architecture allows the -implementation of processors with 64-bit MMU, which the OS could use to -get 32-bit addressing, but since no current hardware implementation supports -this or any other way to make the indexed addressing mode safe to use in -the 32-bit ABI, the default is @option{-mno-indexed-addressing}. +@item -mv850e +@opindex mv850e +Specify that the target processor is the V850E@. The preprocessor +constant @code{__v850e__} is defined if this option is used. 
-@item -mgettrcost=@var{number} -@opindex mgettrcost=@var{number} -Set the cost assumed for the @code{gettr} instruction to @var{number}. -The default is 2 if @option{-mpt-fixed} is in effect, 100 otherwise. +If neither @option{-mv850} nor @option{-mv850e} nor @option{-mv850e1} +nor @option{-mv850e2} nor @option{-mv850e2v3} nor @option{-mv850e3v5} +are defined then a default target processor is chosen and the +relevant @samp{__v850*__} preprocessor constant is defined. -@item -mpt-fixed -@opindex mpt-fixed -Assume @code{pt*} instructions won't trap. This generally generates -better-scheduled code, but is unsafe on current hardware. -The current architecture -definition says that @code{ptabs} and @code{ptrel} trap when the target -anded with 3 is 3. -This has the unintentional effect of making it unsafe to schedule these -instructions before a branch, or hoist them out of a loop. For example, -@code{__do_global_ctors}, a part of @file{libgcc} -that runs constructors at program -startup, calls functions in a list which is delimited by @minus{}1. With the -@option{-mpt-fixed} option, the @code{ptabs} is done before testing against @minus{}1. -That means that all the constructors run a bit more quickly, but when -the loop comes to the end of the list, the program crashes because @code{ptabs} -loads @minus{}1 into a target register. +The preprocessor constants @code{__v850} and @code{__v851__} are always +defined, regardless of which processor variant is the target. -Since this option is unsafe for any -hardware implementing the current architecture specification, the default -is @option{-mno-pt-fixed}. Unless specified explicitly with -@option{-mgettrcost}, @option{-mno-pt-fixed} also implies @option{-mgettrcost=100}; -this deters register allocation from using target registers for storing -ordinary integers. 
+@item -mdisable-callt +@itemx -mno-disable-callt +@opindex mdisable-callt +@opindex mno-disable-callt +This option suppresses generation of the @code{CALLT} instruction for the +v850e, v850e1, v850e2, v850e2v3 and v850e3v5 flavors of the v850 +architecture. -@item -minvalid-symbols -@opindex minvalid-symbols -Assume symbols might be invalid. Ordinary function symbols generated by -the compiler are always valid to load with -@code{movi}/@code{shori}/@code{ptabs} or -@code{movi}/@code{shori}/@code{ptrel}, -but with assembler and/or linker tricks it is possible -to generate symbols that cause @code{ptabs} or @code{ptrel} to trap. -This option is only meaningful when @option{-mno-pt-fixed} is in effect. -It prevents cross-basic-block CSE, hoisting and most scheduling -of symbol loads. The default is @option{-mno-invalid-symbols}. +This option is enabled by default when the RH850 ABI is +in use (see @option{-mrh850-abi}), and disabled by default when the +GCC ABI is in use. If @code{CALLT} instructions are being generated +then the C preprocessor symbol @code{__V850_CALLT__} is defined. -@item -mbranch-cost=@var{num} -@opindex mbranch-cost=@var{num} -Assume @var{num} to be the cost for a branch instruction. Higher numbers -make the compiler try to generate more branch-free code if possible. -If not specified the value is selected depending on the processor type that -is being compiled for. +@item -mrelax +@itemx -mno-relax +@opindex mrelax +@opindex mno-relax +Pass on (or do not pass on) the @option{-mrelax} command line option +to the assembler. -@item -mzdcbranch -@itemx -mno-zdcbranch -@opindex mzdcbranch -@opindex mno-zdcbranch -Assume (do not assume) that zero displacement conditional branch instructions -@code{bt} and @code{bf} are fast. If @option{-mzdcbranch} is specified, the -compiler prefers zero displacement branch code sequences. This is -enabled by default when generating code for SH4 and SH4A. 
It can be explicitly -disabled by specifying @option{-mno-zdcbranch}. +@item -mlong-jumps +@itemx -mno-long-jumps +@opindex mlong-jumps +@opindex mno-long-jumps +Disable (or re-enable) the generation of PC-relative jump instructions. -@item -mfused-madd -@itemx -mno-fused-madd -@opindex mfused-madd -@opindex mno-fused-madd -Generate code that uses (does not use) the floating-point multiply and -accumulate instructions. These instructions are generated by default -if hardware floating point is used. The machine-dependent -@option{-mfused-madd} option is now mapped to the machine-independent -@option{-ffp-contract=fast} option, and @option{-mno-fused-madd} is -mapped to @option{-ffp-contract=off}. +@item -msoft-float +@itemx -mhard-float +@opindex msoft-float +@opindex mhard-float +Disable (or re-enable) the generation of hardware floating point +instructions. This option is only significant when the target +architecture is @samp{V850E2V3} or higher. If hardware floating point +instructions are being generated then the C preprocessor symbol +@code{__FPU_OK__} is defined, otherwise the symbol +@code{__NO_FPU__} is defined. -@item -mfsca -@itemx -mno-fsca -@opindex mfsca -@opindex mno-fsca -Allow or disallow the compiler to emit the @code{fsca} instruction for sine -and cosine approximations. The option @option{-mfsca} must be used in -combination with @option{-funsafe-math-optimizations}. It is enabled by default -when generating code for SH4A. Using @option{-mno-fsca} disables sine and cosine -approximations even if @option{-funsafe-math-optimizations} is in effect. +@item -mloop +@opindex mloop +Enables the use of the e3v5 LOOP instruction. The use of this +instruction is not enabled by default when the e3v5 architecture is +selected because its use is still experimental. -@item -mfsrra -@itemx -mno-fsrra -@opindex mfsrra -@opindex mno-fsrra -Allow or disallow the compiler to emit the @code{fsrra} instruction for -reciprocal square root approximations. 
The option @option{-mfsrra} must be used -in combination with @option{-funsafe-math-optimizations} and -@option{-ffinite-math-only}. It is enabled by default when generating code for -SH4A. Using @option{-mno-fsrra} disables reciprocal square root approximations -even if @option{-funsafe-math-optimizations} and @option{-ffinite-math-only} are -in effect. +@item -mrh850-abi +@itemx -mghs +@opindex mrh850-abi +@opindex mghs +Enables support for the RH850 version of the V850 ABI. This is the +default. With this version of the ABI the following rules apply: -@item -mpretend-cmove -@opindex mpretend-cmove -Prefer zero-displacement conditional branches for conditional move instruction -patterns. This can result in faster code on the SH4 processor. +@itemize +@item +Integer sized structures and unions are returned via a memory pointer +rather than a register. -@end table +@item +Large structures and unions (more than 8 bytes in size) are passed by +value. -@node Solaris 2 Options -@subsection Solaris 2 Options -@cindex Solaris 2 options +@item +Functions are aligned to 16-bit boundaries. -These @samp{-m} options are supported on Solaris 2: +@item +The @option{-m8byte-align} command line option is supported. -@table @gcctabopt -@item -mclear-hwcap -@opindex mclear-hwcap -@option{-mclear-hwcap} tells the compiler to remove the hardware -capabilities generated by the Solaris assembler. This is only necessary -when object files use ISA extensions not supported by the current -machine, but check at runtime whether or not to use them. +@item +The @option{-mdisable-callt} command line option is enabled by +default. The @option{-mno-disable-callt} command line option is not +supported. +@end itemize -@item -mimpure-text -@opindex mimpure-text -@option{-mimpure-text}, used in addition to @option{-shared}, tells -the compiler to not pass @option{-z text} to the linker when linking a -shared object. Using this option, you can link position-dependent -code into a shared object. 
+When this version of the ABI is enabled the C preprocessor symbol +@code{__V850_RH850_ABI__} is defined. -@option{-mimpure-text} suppresses the ``relocations remain against -allocatable but non-writable sections'' linker error message. -However, the necessary relocations trigger copy-on-write, and the -shared object is not actually shared across processes. Instead of -using @option{-mimpure-text}, you should compile all source code with -@option{-fpic} or @option{-fPIC}. +@item -mgcc-abi +@opindex mgcc-abi +Enables support for the old GCC version of the V850 ABI. With this +version of the ABI the following rules apply: -@end table +@itemize +@item +Integer sized structures and unions are returned in register @code{r10}. -These switches are supported in addition to the above on Solaris 2: +@item +Large structures and unions (more than 8 bytes in size) are passed by +reference. -@table @gcctabopt -@item -pthreads -@opindex pthreads -Add support for multithreading using the POSIX threads library. This -option sets flags for both the preprocessor and linker. This option does -not affect the thread safety of object code produced by the compiler or -that of libraries supplied with it. +@item +Functions are aligned to 32-bit boundaries, unless optimizing for +size. -@item -pthread -@opindex pthread -This is a synonym for @option{-pthreads}. -@end table +@item +The @option{-m8byte-align} command line option is not supported. -@node SPARC Options -@subsection SPARC Options -@cindex SPARC options +@item +The @option{-mdisable-callt} command line option is supported but not +enabled by default. +@end itemize -These @samp{-m} options are supported on the SPARC: +When this version of the ABI is enabled the C preprocessor symbol +@code{__V850_GCC_ABI__} is defined. + +@item -m8byte-align +@itemx -mno-8byte-align +@opindex m8byte-align +@opindex mno-8byte-align +Enables support for @code{double} and @code{long long} types to be +aligned on 8-byte boundaries. 
The default is to restrict the +alignment of all objects to at most 4 bytes. When +@option{-m8byte-align} is in effect the C preprocessor symbol +@code{__V850_8BYTE_ALIGN__} is defined. + +@item -mbig-switch +@opindex mbig-switch +Generate code suitable for big switch tables. Use this option only if +the assembler/linker complain about out-of-range branches within a switch +table. + +@item -mapp-regs +@opindex mapp-regs +This option causes r2 and r5 to be used in the code generated by +the compiler. This setting is the default. -@table @gcctabopt @item -mno-app-regs -@itemx -mapp-regs @opindex mno-app-regs -@opindex mapp-regs -Specify @option{-mapp-regs} to generate output using the global registers -2 through 4, which the SPARC SVR4 ABI reserves for applications. Like the -global register 1, each global register 2 through 4 is then treated as an -allocable register that is clobbered by function calls. This is the default. +This option causes r2 and r5 to be treated as fixed registers. -To be fully SVR4 ABI-compliant at the cost of some performance loss, -specify @option{-mno-app-regs}. You should compile libraries and system -software with this option. +@end table -@item -mflat -@itemx -mno-flat -@opindex mflat -@opindex mno-flat -With @option{-mflat}, the compiler does not generate save/restore instructions -and uses a ``flat'' or single register window model. This model is compatible -with the regular register window model. The local registers and the input -registers (0--5) are still treated as ``call-saved'' registers and are -saved on the stack as needed. +@node VAX Options +@subsection VAX Options +@cindex VAX options -With @option{-mno-flat} (the default), the compiler generates save/restore -instructions (except for leaf functions). This is the normal operating mode.
+These @samp{-m} options are defined for the VAX: + +@table @gcctabopt +@item -munix +@opindex munix +Do not output certain jump instructions (@code{aobleq} and so on) +that the Unix assembler for the VAX cannot handle across long +ranges. + +@item -mgnu +@opindex mgnu +Do output those jump instructions, on the assumption that the +GNU assembler is being used. + +@item -mg +@opindex mg +Output code for G-format floating-point numbers instead of D-format. +@end table + +@node Visium Options +@subsection Visium Options +@cindex Visium options + +@table @gcctabopt + +@item -mdebug +@opindex mdebug +A program which performs file I/O and is destined to run on an MCM target +should be linked with this option. It causes the libraries libc.a and +libdebug.a to be linked. The program should be run on the target under +the control of the GDB remote debugging stub. + +@item -msim +@opindex msim +A program which performs file I/O and is destined to run on the simulator +should be linked with this option. This causes libraries libc.a and libsim.a to +be linked. @item -mfpu @itemx -mhard-float @opindex mfpu @opindex mhard-float -Generate output containing floating-point instructions. This is the +Generate code containing floating-point instructions. This is the default. @item -mno-fpu @itemx -msoft-float @opindex mno-fpu @opindex msoft-float -Generate output containing library calls for floating point. -@strong{Warning:} the requisite libraries are not available for all SPARC -targets. Normally the facilities of the machine's usual C compiler are -used, but this cannot be done directly in cross-compilation. You must make -your own arrangements to provide suitable library functions for -cross-compilation. The embedded targets @samp{sparc-*-aout} and -@samp{sparclite-*-*} do provide software floating-point support. +Generate code containing library calls for floating-point.
@option{-msoft-float} changes the calling convention in the output file; therefore, it is only useful if you compile @emph{all} of a program with @@ -21930,926 +21532,1324 @@ this option. In particular, you need to compile @file{libgcc.a}, the library that comes with GCC, with @option{-msoft-float} in order for this to work. -@item -mhard-quad-float -@opindex mhard-quad-float -Generate output containing quad-word (long double) floating-point -instructions. +@item -mcpu=@var{cpu_type} +@opindex mcpu +Set the instruction set, register set, and instruction scheduling parameters +for machine type @var{cpu_type}. Supported values for @var{cpu_type} are +@samp{mcm}, @samp{gr5} and @samp{gr6}. -@item -msoft-quad-float -@opindex msoft-quad-float -Generate output containing library calls for quad-word (long double) -floating-point instructions. The functions called are those specified -in the SPARC ABI@. This is the default. +@samp{mcm} is a synonym of @samp{gr5} present for backward compatibility. -As of this writing, there are no SPARC implementations that have hardware -support for the quad-word floating-point instructions. They all invoke -a trap handler for one of these instructions, and then the trap handler -emulates the effect of the instruction. Because of the trap handler overhead, -this is much slower than calling the ABI library routines. Thus the -@option{-msoft-quad-float} option is the default. +By default (unless configured otherwise), GCC generates code for the GR5 +variant of the Visium architecture. -@item -mno-unaligned-doubles -@itemx -munaligned-doubles -@opindex mno-unaligned-doubles -@opindex munaligned-doubles -Assume that doubles have 8-byte alignment. This is the default. +With @option{-mcpu=gr6}, GCC generates code for the GR6 variant of the Visium +architecture. The only difference from GR5 code is that the compiler will +generate block move instructions. 
-With @option{-munaligned-doubles}, GCC assumes that doubles have 8-byte -alignment only if they are contained in another type, or if they have an -absolute address. Otherwise, it assumes they have 4-byte alignment. -Specifying this option avoids some rare compatibility problems with code -generated by other compilers. It is not the default because it results -in a performance loss, especially for floating-point code. +@item -mtune=@var{cpu_type} +@opindex mtune +Set the instruction scheduling parameters for machine type @var{cpu_type}, +but do not set the instruction set or register set that the option +@option{-mcpu=@var{cpu_type}} would. + +@item -msv-mode +@opindex msv-mode +Generate code for the supervisor mode, where there are no restrictions on +the access to general registers. This is the default. @item -muser-mode -@itemx -mno-user-mode @opindex muser-mode -@opindex mno-user-mode -Do not generate code that can only run in supervisor mode. This is relevant -only for the @code{casa} instruction emitted for the LEON3 processor. The -default is @option{-mno-user-mode}. +Generate code for the user mode, where the access to some general registers +is forbidden: on the GR5, registers r24 to r31 cannot be accessed in this +mode; on the GR6, only registers r29 to r31 are affected. +@end table -@item -mno-faster-structs -@itemx -mfaster-structs -@opindex mno-faster-structs -@opindex mfaster-structs -With @option{-mfaster-structs}, the compiler assumes that structures -should have 8-byte alignment. This enables the use of pairs of -@code{ldd} and @code{std} instructions for copies in structure -assignment, in place of twice as many @code{ld} and @code{st} pairs. -However, the use of this changed alignment directly violates the SPARC -ABI@. Thus, it's intended only for use on targets where the developer -acknowledges that their resulting code is not directly in line with -the rules of the ABI@. 
+@node VMS Options +@subsection VMS Options -@item -mcpu=@var{cpu_type} -@opindex mcpu -Set the instruction set, register set, and instruction scheduling parameters -for machine type @var{cpu_type}. Supported values for @var{cpu_type} are -@samp{v7}, @samp{cypress}, @samp{v8}, @samp{supersparc}, @samp{hypersparc}, -@samp{leon}, @samp{leon3}, @samp{leon3v7}, @samp{sparclite}, @samp{f930}, -@samp{f934}, @samp{sparclite86x}, @samp{sparclet}, @samp{tsc701}, @samp{v9}, -@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2}, -@samp{niagara3} and @samp{niagara4}. +These @samp{-m} options are defined for the VMS implementations: -Native Solaris and GNU/Linux toolchains also support the value @samp{native}, -which selects the best architecture option for the host processor. -@option{-mcpu=native} has no effect if GCC does not recognize -the processor. +@table @gcctabopt +@item -mvms-return-codes +@opindex mvms-return-codes +Return VMS condition codes from @code{main}. The default is to return POSIX-style +condition (e.g.@ error) codes. -Default instruction scheduling parameters are used for values that select -an architecture and not an implementation. These are @samp{v7}, @samp{v8}, -@samp{sparclite}, @samp{sparclet}, @samp{v9}. +@item -mdebug-main=@var{prefix} +@opindex mdebug-main=@var{prefix} +Flag the first routine whose name starts with @var{prefix} as the main +routine for the debugger. -Here is a list of each supported architecture and their supported -implementations. +@item -mmalloc64 +@opindex mmalloc64 +Default to 64-bit memory allocation routines. -@table @asis -@item v7 -cypress, leon3v7 +@item -mpointer-size=@var{size} +@opindex mpointer-size=@var{size} +Set the default size of pointers. Possible options for @var{size} are +@samp{32} or @samp{short} for 32-bit pointers, @samp{64} or @samp{long} +for 64-bit pointers, and @samp{no} for supporting only 32-bit pointers. +The latter option disables @code{pragma pointer_size}.
+@end table -@item v8 -supersparc, hypersparc, leon, leon3 +@node VxWorks Options +@subsection VxWorks Options +@cindex VxWorks Options -@item sparclite -f930, f934, sparclite86x +The options in this section are defined for all VxWorks targets. +Options specific to the target hardware are listed with the other +options for that target. -@item sparclet -tsc701 +@table @gcctabopt +@item -mrtp +@opindex mrtp +GCC can generate code for both VxWorks kernels and real time processes +(RTPs). This option switches from the former to the latter. It also +defines the preprocessor macro @code{__RTP__}. -@item v9 -ultrasparc, ultrasparc3, niagara, niagara2, niagara3, niagara4 -@end table +@item -non-static +@opindex non-static +Link an RTP executable against shared libraries rather than static +libraries. The options @option{-static} and @option{-shared} can +also be used for RTPs (@pxref{Link Options}); @option{-static} +is the default. -By default (unless configured otherwise), GCC generates code for the V7 -variant of the SPARC architecture. With @option{-mcpu=cypress}, the compiler -additionally optimizes it for the Cypress CY7C602 chip, as used in the -SPARCStation/SPARCServer 3xx series. This is also appropriate for the older -SPARCStation 1, 2, IPX etc. +@item -Bstatic +@itemx -Bdynamic +@opindex Bstatic +@opindex Bdynamic +These options are passed down to the linker. They are defined for +compatibility with Diab. -With @option{-mcpu=v8}, GCC generates code for the V8 variant of the SPARC -architecture. The only difference from V7 code is that the compiler emits -the integer multiply and integer divide instructions which exist in SPARC-V8 -but not in SPARC-V7. With @option{-mcpu=supersparc}, the compiler additionally -optimizes it for the SuperSPARC chip, as used in the SPARCStation 10, 1000 and -2000 series. +@item -Xbind-lazy +@opindex Xbind-lazy +Enable lazy binding of function calls. 
This option is equivalent to +@option{-Wl,-z,now} and is defined for compatibility with Diab. -With @option{-mcpu=sparclite}, GCC generates code for the SPARClite variant of -the SPARC architecture. This adds the integer multiply, integer divide step -and scan (@code{ffs}) instructions which exist in SPARClite but not in SPARC-V7. -With @option{-mcpu=f930}, the compiler additionally optimizes it for the -Fujitsu MB86930 chip, which is the original SPARClite, with no FPU@. With -@option{-mcpu=f934}, the compiler additionally optimizes it for the Fujitsu -MB86934 chip, which is the more recent SPARClite with FPU@. +@item -Xbind-now +@opindex Xbind-now +Disable lazy binding of function calls. This option is the default and +is defined for compatibility with Diab. +@end table -With @option{-mcpu=sparclet}, GCC generates code for the SPARClet variant of -the SPARC architecture. This adds the integer multiply, multiply/accumulate, -integer divide step and scan (@code{ffs}) instructions which exist in SPARClet -but not in SPARC-V7. With @option{-mcpu=tsc701}, the compiler additionally -optimizes it for the TEMIC SPARClet chip. +@node x86 Options +@subsection x86 Options +@cindex x86 Options -With @option{-mcpu=v9}, GCC generates code for the V9 variant of the SPARC -architecture. This adds 64-bit integer and floating-point move instructions, -3 additional floating-point condition code registers and conditional move -instructions. With @option{-mcpu=ultrasparc}, the compiler additionally -optimizes it for the Sun UltraSPARC I/II/IIi chips. With -@option{-mcpu=ultrasparc3}, the compiler additionally optimizes it for the -Sun UltraSPARC III/III+/IIIi/IIIi+/IV/IV+ chips. With -@option{-mcpu=niagara}, the compiler additionally optimizes it for -Sun UltraSPARC T1 chips. With @option{-mcpu=niagara2}, the compiler -additionally optimizes it for Sun UltraSPARC T2 chips. With -@option{-mcpu=niagara3}, the compiler additionally optimizes it for Sun -UltraSPARC T3 chips. 
With @option{-mcpu=niagara4}, the compiler -additionally optimizes it for Sun UltraSPARC T4 chips. +These @samp{-m} options are defined for the x86 family of computers. -@item -mtune=@var{cpu_type} -@opindex mtune -Set the instruction scheduling parameters for machine type -@var{cpu_type}, but do not set the instruction set or register set that the -option @option{-mcpu=@var{cpu_type}} does. +@table @gcctabopt -The same values for @option{-mcpu=@var{cpu_type}} can be used for -@option{-mtune=@var{cpu_type}}, but the only useful values are those -that select a particular CPU implementation. Those are @samp{cypress}, -@samp{supersparc}, @samp{hypersparc}, @samp{leon}, @samp{leon3}, -@samp{leon3v7}, @samp{f930}, @samp{f934}, @samp{sparclite86x}, @samp{tsc701}, -@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2}, -@samp{niagara3} and @samp{niagara4}. With native Solaris and GNU/Linux -toolchains, @samp{native} can also be used. +@item -march=@var{cpu-type} +@opindex march +Generate instructions for the machine type @var{cpu-type}. In contrast to +@option{-mtune=@var{cpu-type}}, which merely tunes the generated code +for the specified @var{cpu-type}, @option{-march=@var{cpu-type}} allows GCC +to generate code that may not run at all on processors other than the one +indicated. Specifying @option{-march=@var{cpu-type}} implies +@option{-mtune=@var{cpu-type}}. -@item -mv8plus -@itemx -mno-v8plus -@opindex mv8plus -@opindex mno-v8plus -With @option{-mv8plus}, GCC generates code for the SPARC-V8+ ABI@. The -difference from the V8 ABI is that the global and out registers are -considered 64 bits wide. This is enabled by default on Solaris in 32-bit -mode for all SPARC-V9 processors. +The choices for @var{cpu-type} are: -@item -mvis -@itemx -mno-vis -@opindex mvis -@opindex mno-vis -With @option{-mvis}, GCC generates code that takes advantage of the UltraSPARC -Visual Instruction Set extensions. The default is @option{-mno-vis}. 
+@table @samp +@item native +This selects the CPU to generate code for at compilation time by determining +the processor type of the compiling machine. Using @option{-march=native} +enables all instruction subsets supported by the local machine (hence +the result might not run on different machines). Using @option{-mtune=native} +produces code optimized for the local machine under the constraints +of the selected instruction set. -@item -mvis2 -@itemx -mno-vis2 -@opindex mvis2 -@opindex mno-vis2 -With @option{-mvis2}, GCC generates code that takes advantage of -version 2.0 of the UltraSPARC Visual Instruction Set extensions. The -default is @option{-mvis2} when targeting a cpu that supports such -instructions, such as UltraSPARC-III and later. Setting @option{-mvis2} -also sets @option{-mvis}. +@item i386 +Original Intel i386 CPU@. -@item -mvis3 -@itemx -mno-vis3 -@opindex mvis3 -@opindex mno-vis3 -With @option{-mvis3}, GCC generates code that takes advantage of -version 3.0 of the UltraSPARC Visual Instruction Set extensions. The -default is @option{-mvis3} when targeting a cpu that supports such -instructions, such as niagara-3 and later. Setting @option{-mvis3} -also sets @option{-mvis2} and @option{-mvis}. +@item i486 +Intel i486 CPU@. (No scheduling is implemented for this chip.) -@item -mcbcond -@itemx -mno-cbcond -@opindex mcbcond -@opindex mno-cbcond -With @option{-mcbcond}, GCC generates code that takes advantage of -compare-and-branch instructions, as defined in the Sparc Architecture 2011. -The default is @option{-mcbcond} when targeting a cpu that supports such -instructions, such as niagara-4 and later. +@item i586 +@itemx pentium +Intel Pentium CPU with no MMX support. -@item -mpopc -@itemx -mno-popc -@opindex mpopc -@opindex mno-popc -With @option{-mpopc}, GCC generates code that takes advantage of the UltraSPARC -population count instruction. 
The default is @option{-mpopc} -when targeting a cpu that supports such instructions, such as Niagara-2 and -later. +@item pentium-mmx +Intel Pentium MMX CPU, based on Pentium core with MMX instruction set support. -@item -mfmaf -@itemx -mno-fmaf -@opindex mfmaf -@opindex mno-fmaf -With @option{-mfmaf}, GCC generates code that takes advantage of the UltraSPARC -Fused Multiply-Add Floating-point extensions. The default is @option{-mfmaf} -when targeting a cpu that supports such instructions, such as Niagara-3 and -later. +@item pentiumpro +Intel Pentium Pro CPU@. -@item -mfix-at697f -@opindex mfix-at697f -Enable the documented workaround for the single erratum of the Atmel AT697F -processor (which corresponds to erratum #13 of the AT697E processor). +@item i686 +When used with @option{-march}, the Pentium Pro +instruction set is used, so the code runs on all i686 family chips. +When used with @option{-mtune}, it has the same meaning as @samp{generic}. -@item -mfix-ut699 -@opindex mfix-ut699 -Enable the documented workarounds for the floating-point errata and the data -cache nullify errata of the UT699 processor. -@end table +@item pentium2 +Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set +support. -These @samp{-m} options are supported in addition to the above -on SPARC-V9 processors in 64-bit environments: +@item pentium3 +@itemx pentium3m +Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction +set support. -@table @gcctabopt -@item -m32 -@itemx -m64 -@opindex m32 -@opindex m64 -Generate code for a 32-bit or 64-bit environment. -The 32-bit environment sets int, long and pointer to 32 bits. -The 64-bit environment sets int to 32 bits and long and pointer -to 64 bits. +@item pentium-m +Intel Pentium M; low-power version of Intel Pentium III CPU +with MMX, SSE and SSE2 instruction set support. Used by Centrino notebooks. 
-@item -mcmodel=@var{which} -@opindex mcmodel -Set the code model to one of +@item pentium4 +@itemx pentium4m +Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support. -@table @samp -@item medlow -The Medium/Low code model: 64-bit addresses, programs -must be linked in the low 32 bits of memory. Programs can be statically -or dynamically linked. - -@item medmid -The Medium/Middle code model: 64-bit addresses, programs -must be linked in the low 44 bits of memory, the text and data segments must -be less than 2GB in size and the data segment must be located within 2GB of -the text segment. +@item prescott +Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction +set support. -@item medany -The Medium/Anywhere code model: 64-bit addresses, programs -may be linked anywhere in memory, the text and data segments must be less -than 2GB in size and the data segment must be located within 2GB of the -text segment. +@item nocona +Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE, +SSE2 and SSE3 instruction set support. -@item embmedany -The Medium/Anywhere code model for embedded systems: -64-bit addresses, the text and data segments must be less than 2GB in -size, both starting anywhere in memory (determined at link time). The -global register %g4 points to the base of the data segment. Programs -are statically linked and PIC is not supported. -@end table +@item core2 +Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 +instruction set support. -@item -mmemory-model=@var{mem-model} -@opindex mmemory-model -Set the memory model in force on the processor to one of +@item nehalem +Intel Nehalem CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2 and POPCNT instruction set support. -@table @samp -@item default -The default memory model for the processor and operating system. 
+@item westmere +Intel Westmere CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AES and PCLMUL instruction set support. -@item rmo -Relaxed Memory Order +@item sandybridge +Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AVX, AES and PCLMUL instruction set support. -@item pso -Partial Store Order +@item ivybridge +Intel Ivy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C +instruction set support. -@item tso -Total Store Order +@item haswell +Intel Haswell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, +BMI, BMI2 and F16C instruction set support. -@item sc -Sequential Consistency -@end table +@item broadwell +Intel Broadwell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, +BMI, BMI2, F16C, RDSEED, ADCX and PREFETCHW instruction set support. -These memory models are formally defined in Appendix D of the Sparc V9 -architecture manual, as set in the processor's @code{PSTATE.MM} field. +@item bonnell +Intel Bonnell CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3 +instruction set support. -@item -mstack-bias -@itemx -mno-stack-bias -@opindex mstack-bias -@opindex mno-stack-bias -With @option{-mstack-bias}, GCC assumes that the stack pointer, and -frame pointer if present, are offset by @minus{}2047 which must be added back -when making stack frame references. This is the default in 64-bit mode. -Otherwise, assume no such offset is present. -@end table +@item silvermont +Intel Silvermont CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, +SSE4.1, SSE4.2, POPCNT, AES, PCLMUL and RDRND instruction set support. 
-@node SPU Options -@subsection SPU Options -@cindex SPU options +@item k6 +AMD K6 CPU with MMX instruction set support. -These @samp{-m} options are supported on the SPU: +@item k6-2 +@itemx k6-3 +Improved versions of AMD K6 CPU with MMX and 3DNow!@: instruction set support. -@table @gcctabopt -@item -mwarn-reloc -@itemx -merror-reloc -@opindex mwarn-reloc -@opindex merror-reloc +@item athlon +@itemx athlon-tbird +AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and SSE prefetch instructions +support. -The loader for SPU does not handle dynamic relocations. By default, GCC -gives an error when it generates code that requires a dynamic -relocation. @option{-mno-error-reloc} disables the error, -@option{-mwarn-reloc} generates a warning instead. +@item athlon-4 +@itemx athlon-xp +@itemx athlon-mp +Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow!@: and full SSE +instruction set support. -@item -msafe-dma -@itemx -munsafe-dma -@opindex msafe-dma -@opindex munsafe-dma +@item k8 +@itemx opteron +@itemx athlon64 +@itemx athlon-fx +Processors based on the AMD K8 core with x86-64 instruction set support, +including the AMD Opteron, Athlon 64, and Athlon 64 FX processors. +(This supersets MMX, SSE, SSE2, 3DNow!, enhanced 3DNow!@: and 64-bit +instruction set extensions.) -Instructions that initiate or test completion of DMA must not be -reordered with respect to loads and stores of the memory that is being -accessed. -With @option{-munsafe-dma} you must use the @code{volatile} keyword to protect -memory accesses, but that can lead to inefficient code in places where the -memory is known to not change. Rather than mark the memory as volatile, -you can use @option{-msafe-dma} to tell the compiler to treat -the DMA instructions as potentially affecting all memory. +@item k8-sse3 +@itemx opteron-sse3 +@itemx athlon64-sse3 +Improved versions of AMD K8 cores with SSE3 instruction set support.
-@item -mbranch-hints -@opindex mbranch-hints +@item amdfam10 +@itemx barcelona +CPUs based on AMD Family 10h cores with x86-64 instruction set support. (This +supersets MMX, SSE, SSE2, SSE3, SSE4A, 3DNow!, enhanced 3DNow!, ABM and 64-bit +instruction set extensions.) -By default, GCC generates a branch hint instruction to avoid -pipeline stalls for always-taken or probably-taken branches. A hint -is not generated closer than 8 instructions away from its branch. -There is little reason to disable them, except for debugging purposes, -or to make an object a little bit smaller. +@item bdver1 +CPUs based on AMD Family 15h cores with x86-64 instruction set support. (This +supersets FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, +SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set extensions.) +@item bdver2 +AMD Family 15h core based CPUs with x86-64 instruction set support. (This +supersets BMI, TBM, F16C, FMA, FMA4, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, +SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set +extensions.) +@item bdver3 +AMD Family 15h core based CPUs with x86-64 instruction set support. (This +supersets BMI, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, XOP, LWP, AES, +PCL_MUL, CX16, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and +64-bit instruction set extensions.) +@item bdver4 +AMD Family 15h core based CPUs with x86-64 instruction set support. (This +supersets BMI, BMI2, TBM, F16C, FMA, FMA4, FSGSBASE, AVX, AVX2, XOP, LWP, +AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, +SSE4.2, ABM and 64-bit instruction set extensions.) -@item -msmall-mem -@itemx -mlarge-mem -@opindex msmall-mem -@opindex mlarge-mem +@item btver1 +CPUs based on AMD Family 14h cores with x86-64 instruction set support. (This +supersets MMX, SSE, SSE2, SSE3, SSSE3, SSE4A, CX16, ABM and 64-bit +instruction set extensions.)
-By default, GCC generates code assuming that addresses are never larger -than 18 bits. With @option{-mlarge-mem} code is generated that assumes -a full 32-bit address. +@item btver2 +CPUs based on AMD Family 16h cores with x86-64 instruction set support. This +includes MOVBE, F16C, BMI, AVX, PCL_MUL, AES, SSE4.2, SSE4.1, CX16, ABM, +SSE4A, SSSE3, SSE3, SSE2, SSE, MMX and 64-bit instruction set extensions. -@item -mstdmain -@opindex mstdmain +@item winchip-c6 +IDT WinChip C6 CPU, handled in the same way as i486 with additional MMX instruction +set support. -By default, GCC links against startup code that assumes the SPU-style -main function interface (which has an unconventional parameter list). -With @option{-mstdmain}, GCC links your program against startup -code that assumes a C99-style interface to @code{main}, including a -local copy of @code{argv} strings. +@item winchip2 +IDT WinChip 2 CPU, handled in the same way as i486 with additional MMX and 3DNow!@: +instruction set support. -@item -mfixed-range=@var{register-range} -@opindex mfixed-range -Generate code treating the given register range as fixed registers. -A fixed register is one that the register allocator cannot use. This is -useful when compiling kernel code. A register range is specified as -two registers separated by a dash. Multiple register ranges can be -specified separated by a comma. +@item c3 +VIA C3 CPU with MMX and 3DNow!@: instruction set support. (No scheduling is +implemented for this chip.) -@item -mea32 -@itemx -mea64 -@opindex mea32 -@opindex mea64 -Compile code assuming that pointers to the PPU address space accessed -via the @code{__ea} named address space qualifier are either 32 or 64 -bits wide. The default is 32 bits. As this is an ABI-changing option, -all object code in an executable must be compiled with the same setting. +@item c3-2 +VIA C3-2 (Nehemiah/C5XL) CPU with MMX and SSE instruction set support. +(No scheduling is +implemented for this chip.)
-@item -maddress-space-conversion -@itemx -mno-address-space-conversion -@opindex maddress-space-conversion -@opindex mno-address-space-conversion -Allow/disallow treating the @code{__ea} address space as superset -of the generic address space. This enables explicit type casts -between @code{__ea} and generic pointer as well as implicit -conversions of generic pointers to @code{__ea} pointers. The -default is to allow address space pointer conversions. +@item geode +AMD Geode embedded processor with MMX and 3DNow!@: instruction set support. +@end table -@item -mcache-size=@var{cache-size} -@opindex mcache-size -This option controls the version of libgcc that the compiler links to an -executable and selects a software-managed cache for accessing variables -in the @code{__ea} address space with a particular cache size. Possible -options for @var{cache-size} are @samp{8}, @samp{16}, @samp{32}, @samp{64} -and @samp{128}. The default cache size is 64KB. +@item -mtune=@var{cpu-type} +@opindex mtune +Tune to @var{cpu-type} everything applicable about the generated code, except +for the ABI and the set of available instructions. +While picking a specific @var{cpu-type} schedules things appropriately +for that particular chip, the compiler does not generate any code that +cannot run on the default machine type unless you use a +@option{-march=@var{cpu-type}} option. +For example, if GCC is configured for i686-pc-linux-gnu +then @option{-mtune=pentium4} generates code that is tuned for Pentium 4 +but still runs on i686 machines. -@item -matomic-updates -@itemx -mno-atomic-updates -@opindex matomic-updates -@opindex mno-atomic-updates -This option controls the version of libgcc that the compiler links to an -executable and selects whether atomic updates to the software-managed -cache of PPU-side variables are used. 
If you use atomic updates, changes -to a PPU variable from SPU code using the @code{__ea} named address space -qualifier do not interfere with changes to other PPU variables residing -in the same cache line from PPU code. If you do not use atomic updates, -such interference may occur; however, writing back cache lines is -more efficient. The default behavior is to use atomic updates. +The choices for @var{cpu-type} are the same as for @option{-march}. +In addition, @option{-mtune} supports 2 extra choices for @var{cpu-type}: -@item -mdual-nops -@itemx -mdual-nops=@var{n} -@opindex mdual-nops -By default, GCC inserts nops to increase dual issue when it expects -it to increase performance. @var{n} can be a value from 0 to 10. A -smaller @var{n} inserts fewer nops. 10 is the default, 0 is the -same as @option{-mno-dual-nops}. Disabled with @option{-Os}. +@table @samp +@item generic +Produce code optimized for the most common IA32/@/AMD64/@/EM64T processors. +If you know the CPU on which your code will run, then you should use +the corresponding @option{-mtune} or @option{-march} option instead of +@option{-mtune=generic}. But, if you do not know exactly what CPU users +of your application will have, then you should use this option. -@item -mhint-max-nops=@var{n} -@opindex mhint-max-nops -Maximum number of nops to insert for a branch hint. A branch hint must -be at least 8 instructions away from the branch it is affecting. GCC -inserts up to @var{n} nops to enforce this, otherwise it does not -generate the branch hint. +As new processors are deployed in the marketplace, the behavior of this +option will change. Therefore, if you upgrade to a newer version of +GCC, code generation controlled by this option will change to reflect +the processors +that are most common at the time that version of GCC is released. 
-@item -mhint-max-distance=@var{n} -@opindex mhint-max-distance -The encoding of the branch hint instruction limits the hint to be within -256 instructions of the branch it is affecting. By default, GCC makes -sure it is within 125. +There is no @option{-march=generic} option because @option{-march} +indicates the instruction set the compiler can use, and there is no +generic instruction set applicable to all processors. In contrast, +@option{-mtune} indicates the processor (or, in this case, collection of +processors) for which the code is optimized. -@item -msafe-hints -@opindex msafe-hints -Work around a hardware bug that causes the SPU to stall indefinitely. -By default, GCC inserts the @code{hbrp} instruction to make sure -this stall won't happen. +@item intel +Produce code optimized for the most current Intel processors, which are +Haswell and Silvermont for this version of GCC. If you know the CPU +on which your code will run, then you should use the corresponding +@option{-mtune} or @option{-march} option instead of @option{-mtune=intel}. +But, if you want your application to perform better on both Haswell and +Silvermont, then you should use this option. + +As new Intel processors are deployed in the marketplace, the behavior of +this option will change. Therefore, if you upgrade to a newer version of +GCC, code generation controlled by this option will change to reflect +the most current Intel processors at the time that version of GCC is +released. +There is no @option{-march=intel} option because @option{-march} indicates +the instruction set the compiler can use, and there is no common +instruction set applicable to all processors. In contrast, +@option{-mtune} indicates the processor (or, in this case, collection of +processors) for which the code is optimized. @end table -@node System V Options -@subsection Options for System V +@item -mcpu=@var{cpu-type} +@opindex mcpu +A deprecated synonym for @option{-mtune}.
-These additional options are available on System V Release 4 for -compatibility with other compilers on those systems: +@item -mfpmath=@var{unit} +@opindex mfpmath +Generate floating-point arithmetic for selected unit @var{unit}. The choices +for @var{unit} are: -@table @gcctabopt -@item -G -@opindex G -Create a shared object. -It is recommended that @option{-symbolic} or @option{-shared} be used instead. +@table @samp +@item 387 +Use the standard 387 floating-point coprocessor present on the majority of chips and +emulated otherwise. Code compiled with this option runs almost everywhere. +The temporary results are computed in 80-bit precision instead of the precision +specified by the type, resulting in slightly different results compared to most +of other chips. See @option{-ffloat-store} for more detailed description. -@item -Qy -@opindex Qy -Identify the versions of each tool used by the compiler, in a -@code{.ident} assembler directive in the output. +This is the default choice for x86-32 targets. -@item -Qn -@opindex Qn -Refrain from adding @code{.ident} directives to the output file (this is -the default). +@item sse +Use scalar floating-point instructions present in the SSE instruction set. +This instruction set is supported by Pentium III and newer chips, +and in the AMD line +by Athlon-4, Athlon XP and Athlon MP chips. The earlier version of the SSE +instruction set supports only single-precision arithmetic, thus the double and +extended-precision arithmetic are still done using 387. A later version, present +only in Pentium 4 and AMD x86-64 chips, supports double-precision +arithmetic too. -@item -YP,@var{dirs} -@opindex YP -Search the directories @var{dirs}, and no others, for libraries -specified with @option{-l}. +For the x86-32 compiler, you must use @option{-march=@var{cpu-type}}, @option{-msse} +or @option{-msse2} switches to enable SSE extensions and make this option +effective. For the x86-64 compiler, these extensions are enabled by default. 
-@item -Ym,@var{dir} -@opindex Ym -Look in the directory @var{dir} to find the M4 preprocessor. -The assembler uses this option. -@c This is supposed to go with a -Yd for predefined M4 macro files, but -@c the generic assembler that comes with Solaris takes just -Ym. +The resulting code should be considerably faster in the majority of cases and avoid +the numerical instability problems of 387 code, but may break some existing +code that expects temporaries to be 80 bits. + +This is the default choice for the x86-64 compiler. + +@item sse,387 +@itemx sse+387 +@itemx both +Attempt to utilize both instruction sets at once. This effectively doubles the +amount of available registers, and on chips with separate execution units for +387 and SSE the execution resources too. Use this option with care, as it is +still experimental, because the GCC register allocator does not model separate +functional units well, resulting in unstable performance. @end table -@node TILE-Gx Options -@subsection TILE-Gx Options -@cindex TILE-Gx options +@item -masm=@var{dialect} +@opindex masm=@var{dialect} +Output assembly instructions using selected @var{dialect}. Supported +choices are @samp{intel} or @samp{att} (the default). Darwin does +not support @samp{intel}. -These @samp{-m} options are supported on the TILE-Gx: +@item -mieee-fp +@itemx -mno-ieee-fp +@opindex mieee-fp +@opindex mno-ieee-fp +Control whether or not the compiler uses IEEE floating-point +comparisons. These correctly handle the case where the result of a +comparison is unordered. -@table @gcctabopt -@item -mcmodel=small -@opindex mcmodel=small -Generate code for the small model. The distance for direct calls is -limited to 500M in either direction. PC-relative addresses are 32 -bits. Absolute addresses support the full address range. +@item -msoft-float +@opindex msoft-float +Generate output containing library calls for floating point. -@item -mcmodel=large -@opindex mcmodel=large -Generate code for the large model. 
There is no limitation on call -distance, pc-relative addresses, or absolute addresses. +@strong{Warning:} the requisite libraries are not part of GCC@. +Normally the facilities of the machine's usual C compiler are used, but +this can't be done directly in cross-compilation. You must make your +own arrangements to provide suitable library functions for +cross-compilation. -@item -mcpu=@var{name} -@opindex mcpu -Selects the type of CPU to be targeted. Currently the only supported -type is @samp{tilegx}. +On machines where a function returns floating-point results in the 80387 +register stack, some floating-point opcodes may be emitted even if +@option{-msoft-float} is used. -@item -m32 -@itemx -m64 -@opindex m32 -@opindex m64 -Generate code for a 32-bit or 64-bit environment. The 32-bit -environment sets int, long, and pointer to 32 bits. The 64-bit -environment sets int to 32 bits and long and pointer to 64 bits. +@item -mno-fp-ret-in-387 +@opindex mno-fp-ret-in-387 +Do not use the FPU registers for return values of functions. -@item -mbig-endian -@itemx -mlittle-endian -@opindex mbig-endian -@opindex mlittle-endian -Generate code in big/little endian mode, respectively. -@end table +The usual calling convention has functions return values of types +@code{float} and @code{double} in an FPU register, even if there +is no FPU@. The idea is that the operating system should emulate +an FPU@. + +The option @option{-mno-fp-ret-in-387} causes such values to be returned +in ordinary CPU registers instead. + +@item -mno-fancy-math-387 +@opindex mno-fancy-math-387 +Some 387 emulators do not support the @code{sin}, @code{cos} and +@code{sqrt} instructions for the 387. Specify this option to avoid +generating those instructions. This option is the default on FreeBSD, +OpenBSD and NetBSD@. This option is overridden when @option{-march} +indicates that the target CPU always has an FPU and so the +instruction does not need emulation. 
These +instructions are not generated unless you also use the +@option{-funsafe-math-optimizations} switch. + +@item -malign-double +@itemx -mno-align-double +@opindex malign-double +@opindex mno-align-double +Control whether GCC aligns @code{double}, @code{long double}, and +@code{long long} variables on a two-word boundary or a one-word +boundary. Aligning @code{double} variables on a two-word boundary +produces code that runs somewhat faster on a Pentium at the +expense of more memory. + +On x86-64, @option{-malign-double} is enabled by default. + +@strong{Warning:} if you use the @option{-malign-double} switch, +structures containing the above types are aligned differently than +the published application binary interface specifications for the x86-32 +and are not binary compatible with structures in code compiled +without that switch. + +@item -m96bit-long-double +@itemx -m128bit-long-double +@opindex m96bit-long-double +@opindex m128bit-long-double +These switches control the size of @code{long double} type. The x86-32 +application binary interface specifies the size to be 96 bits, +so @option{-m96bit-long-double} is the default in 32-bit mode. + +Modern architectures (Pentium and newer) prefer @code{long double} +to be aligned to an 8- or 16-byte boundary. In arrays or structures +conforming to the ABI, this is not possible. So specifying +@option{-m128bit-long-double} aligns @code{long double} +to a 16-byte boundary by padding the @code{long double} with an additional +32-bit zero. + +In the x86-64 compiler, @option{-m128bit-long-double} is the default choice as +its ABI specifies that @code{long double} is aligned on 16-byte boundary. + +Notice that neither of these options enable any extra precision over the x87 +standard of 80 bits for a @code{long double}. 
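The distinction between storage size and precision drawn above can be checked from C. The following is a minimal sketch, not part of the patch; the function names are hypothetical, and it assumes a binary floating-point format (FLT_RADIX equal to 2) so that LDBL_MANT_DIG counts bits.

```c
#include <float.h>
#include <limits.h>

/* Storage size of long double in bits, including any padding that
   -m96bit-long-double or -m128bit-long-double adds. */
int long_double_storage_bits(void)
{
    return (int)(sizeof(long double) * CHAR_BIT);
}

/* Number of significand bits actually carried by long double
   (assumes FLT_RADIX == 2). The padding options do not change this. */
int long_double_significand_bits(void)
{
    return LDBL_MANT_DIG;
}
```

On x86 targets that use the x87 extended format, the first function would typically report 96 or 128 while the second reports 64 in both cases, which is the point the paragraph above makes.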
+ +@strong{Warning:} if you override the default value for your target ABI, this +changes the size of +structures and arrays containing @code{long double} variables, +as well as modifying the function calling convention for functions taking +@code{long double}. Hence they are not binary-compatible +with code compiled without that switch. + +@item -mlong-double-64 +@itemx -mlong-double-80 +@itemx -mlong-double-128 +@opindex mlong-double-64 +@opindex mlong-double-80 +@opindex mlong-double-128 +These switches control the size of the @code{long double} type. A size +of 64 bits makes the @code{long double} type equivalent to the @code{double} +type. This is the default for the 32-bit Bionic C library. A size +of 128 bits makes the @code{long double} type equivalent to the +@code{__float128} type. This is the default for the 64-bit Bionic C library. + +@strong{Warning:} if you override the default value for your target ABI, this +changes the size of +structures and arrays containing @code{long double} variables, +as well as modifying the function calling convention for functions taking +@code{long double}. Hence they are not binary-compatible +with code compiled without that switch. + +@item -malign-data=@var{type} +@opindex malign-data +Control how GCC aligns variables. Supported values for @var{type} are +@samp{compat}, which uses an increased alignment value compatible with GCC 4.8 +and earlier; @samp{abi}, which uses the alignment value specified by the +psABI; and @samp{cacheline}, which uses an increased alignment value matching +the cache line size. @samp{compat} is the default. + +@item -mlarge-data-threshold=@var{threshold} +@opindex mlarge-data-threshold +When @option{-mcmodel=medium} is specified, data objects larger than +@var{threshold} are placed in the large data section. This value must be the +same across all objects linked into the binary, and defaults to 65535.
+ +@item -mrtd +@opindex mrtd +Use a different function-calling convention, in which functions that +take a fixed number of arguments return with the @code{ret @var{num}} +instruction, which pops their arguments while returning. This saves one +instruction in the caller since there is no need to pop the arguments +there. + +You can specify that an individual function is called with this calling +sequence with the function attribute @code{stdcall}. You can also +override the @option{-mrtd} option by using the function attribute +@code{cdecl}. @xref{Function Attributes}. + +@strong{Warning:} this calling convention is incompatible with the one +normally used on Unix, so you cannot use it if you need to call +libraries compiled with the Unix compiler. + +Also, you must provide function prototypes for all functions that +take variable numbers of arguments (including @code{printf}); +otherwise incorrect code is generated for calls to those +functions. + +In addition, seriously incorrect code results if you call a +function with too many arguments. (Normally, extra arguments are +harmlessly ignored.) + +@item -mregparm=@var{num} +@opindex mregparm +Control how many registers are used to pass integer arguments. By +default, no registers are used to pass arguments, and at most 3 +registers can be used. You can control this behavior for a specific +function by using the function attribute @code{regparm}. +@xref{Function Attributes}. + +@strong{Warning:} if you use this switch, and +@var{num} is nonzero, then you must build all modules with the same +value, including any libraries. This includes the system libraries and +startup modules. + +@item -msseregparm +@opindex msseregparm +Use SSE register passing conventions for float and double arguments +and return values. You can control this behavior for a specific +function by using the function attribute @code{sseregparm}. +@xref{Function Attributes}. 
+ +@strong{Warning:} if you use this switch then you must build all +modules with the same value, including any libraries. This includes +the system libraries and startup modules. + +@item -mvect8-ret-in-mem +@opindex mvect8-ret-in-mem +Return 8-byte vectors in memory instead of MMX registers. This is the +default on Solaris@tie{}8 and 9 and VxWorks to match the ABI of the Sun +Studio compilers until version 12. Later compiler versions (starting +with Studio 12 Update@tie{}1) follow the ABI used by other x86 targets, which +is the default on Solaris@tie{}10 and later. @emph{Only} use this option if +you need to remain compatible with existing code produced by those +previous compiler versions or older versions of GCC@. + +@item -mpc32 +@itemx -mpc64 +@itemx -mpc80 +@opindex mpc32 +@opindex mpc64 +@opindex mpc80 + +Set 80387 floating-point precision to 32, 64 or 80 bits. When @option{-mpc32} +is specified, the significands of results of floating-point operations are +rounded to 24 bits (single precision); @option{-mpc64} rounds the +significands of results of floating-point operations to 53 bits (double +precision) and @option{-mpc80} rounds the significands of results of +floating-point operations to 64 bits (extended double precision), which is +the default. When this option is used, floating-point operations in higher +precisions are not available to the programmer without setting the FPU +control word explicitly. + +Setting the rounding of floating-point operations to less than the default +80 bits can speed some programs by 2% or more. Note that some mathematical +libraries assume that extended-precision (80-bit) floating-point operations +are enabled by default; routines in such libraries could suffer significant +loss of accuracy, typically through so-called ``catastrophic cancellation'', +when this option is used to set the precision to less than extended precision. + +@item -mstackrealign +@opindex mstackrealign +Realign the stack at entry. 
On the x86, the @option{-mstackrealign} +option generates an alternate prologue and epilogue that realigns the +run-time stack if necessary. This supports mixing legacy code that keeps +4-byte stack alignment with modern code that keeps 16-byte stack alignment for +SSE compatibility. See also the attribute @code{force_align_arg_pointer}, +applicable to individual functions. + +@item -mpreferred-stack-boundary=@var{num} +@opindex mpreferred-stack-boundary +Attempt to keep the stack boundary aligned to a 2 raised to @var{num} +byte boundary. If @option{-mpreferred-stack-boundary} is not specified, +the default is 4 (16 bytes or 128 bits). + +@strong{Warning:} When generating code for the x86-64 architecture with +SSE extensions disabled, @option{-mpreferred-stack-boundary=3} can be +used to keep the stack boundary aligned to an 8-byte boundary. Since the +x86-64 ABI requires 16-byte stack alignment, this is ABI incompatible and +intended to be used in a controlled environment where stack space is +an important limitation. This option leads to wrong code when functions +compiled with 16-byte stack alignment (such as functions from a standard +library) are called with a misaligned stack. In this case, SSE +instructions may lead to misaligned memory access traps. In addition, +variable arguments are handled incorrectly for 16-byte aligned +objects (including x87 long double and __int128), leading to wrong +results. You must build all modules with +@option{-mpreferred-stack-boundary=3}, including any libraries. This +includes the system libraries and startup modules. + +@item -mincoming-stack-boundary=@var{num} +@opindex mincoming-stack-boundary +Assume the incoming stack is aligned to a 2 raised to @var{num} byte +boundary. If @option{-mincoming-stack-boundary} is not specified, +the one specified by @option{-mpreferred-stack-boundary} is used.
+ +On Pentium and Pentium Pro, @code{double} and @code{long double} values +should be aligned to an 8-byte boundary (see @option{-malign-double}) or +suffer significant run time performance penalties. On Pentium III, the +Streaming SIMD Extension (SSE) data type @code{__m128} may not work +properly if it is not 16-byte aligned. + +To ensure proper alignment of these values on the stack, the stack boundary +must be as aligned as that required by any value stored on the stack. +Further, every function must be generated such that it keeps the stack +aligned. Thus calling a function compiled with a higher preferred +stack boundary from a function compiled with a lower preferred stack +boundary most likely misaligns the stack. It is recommended that +libraries that use callbacks always use the default setting. + +This extra alignment does consume extra stack space, and generally +increases code size. Code that is sensitive to stack space usage, such +as embedded systems and operating system kernels, may want to reduce the +preferred alignment to @option{-mpreferred-stack-boundary=2}.
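The relationship between @var{num} and the resulting byte alignment, and the stack space that a larger boundary can cost, can be worked through with a little arithmetic. This is a simplified sketch with hypothetical helper names; it models only the rounding of a frame size to the boundary and is not how GCC itself lays out stack frames.

```c
#include <stddef.h>

/* Alignment in bytes implied by a stack-boundary value num,
   that is, 2 raised to the power num (num = 4 gives 16 bytes). */
size_t stack_boundary_bytes(unsigned num)
{
    return (size_t)1 << num;
}

/* Round a frame size up to the next multiple of the boundary.
   Simplified model: real frames also hold return addresses,
   saved registers and so on. */
size_t padded_frame_size(size_t frame_bytes, unsigned num)
{
    size_t align = stack_boundary_bytes(num);
    return (frame_bytes + align - 1) & ~(align - 1);
}
```

Under this model a 20-byte frame occupies 32 bytes with a boundary of 4 but only 20 bytes with a boundary of 2, which illustrates why space-constrained code may prefer the smaller setting.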
+ +@need 200 +@item -mmmx +@opindex mmmx +@need 200 +@itemx -msse +@opindex msse +@need 200 +@itemx -msse2 +@need 200 +@itemx -msse3 +@need 200 +@itemx -mssse3 +@need 200 +@itemx -msse4 +@need 200 +@itemx -msse4a +@need 200 +@itemx -msse4.1 +@need 200 +@itemx -msse4.2 +@need 200 +@itemx -mavx +@opindex mavx +@need 200 +@itemx -mavx2 +@need 200 +@itemx -mavx512f +@need 200 +@itemx -mavx512pf +@need 200 +@itemx -mavx512er +@need 200 +@itemx -mavx512cd +@need 200 +@itemx -msha +@opindex msha +@need 200 +@itemx -maes +@opindex maes +@need 200 +@itemx -mpclmul +@opindex mpclmul +@need 200 +@itemx -mclflushopt +@opindex mclflushopt +@need 200 +@itemx -mfsgsbase +@opindex mfsgsbase +@need 200 +@itemx -mrdrnd +@opindex mrdrnd +@need 200 +@itemx -mf16c +@opindex mf16c +@need 200 +@itemx -mfma +@opindex mfma +@need 200 +@itemx -mfma4 +@need 200 +@itemx -mno-fma4 +@need 200 +@itemx -mprefetchwt1 +@opindex mprefetchwt1 +@need 200 +@itemx -mxop +@opindex mxop +@need 200 +@itemx -mlwp +@opindex mlwp +@need 200 +@itemx -m3dnow +@opindex m3dnow +@need 200 +@itemx -mpopcnt +@opindex mpopcnt +@need 200 +@itemx -mabm +@opindex mabm +@need 200 +@itemx -mbmi +@opindex mbmi +@need 200 +@itemx -mbmi2 +@need 200 +@itemx -mlzcnt +@opindex mlzcnt +@need 200 +@itemx -mfxsr +@opindex mfxsr +@need 200 +@itemx -mxsave +@opindex mxsave +@need 200 +@itemx -mxsaveopt +@opindex mxsaveopt +@need 200 +@itemx -mxsavec +@opindex mxsavec +@need 200 +@itemx -mxsaves +@opindex mxsaves +@need 200 +@itemx -mrtm +@opindex mrtm +@need 200 +@itemx -mtbm +@opindex mtbm +@need 200 +@itemx -mmpx +@opindex mmpx +These switches enable the use of instructions in the MMX, SSE, +SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, +SHA, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM, +BMI, BMI2, FXSR, XSAVE, XSAVEOPT, LZCNT, RTM, MPX or 3DNow!@: +extended instruction sets. Each has a corresponding @option{-mno-} option +to disable use of these instructions.
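As an illustration of how these switches become visible to source code: GCC predefines macros such as @code{__SSE2__} and @code{__AVX__} when the corresponding instruction set is enabled for a translation unit. The sketch below is not part of the patch and uses hypothetical function names; it reports only how the file was compiled, not what the CPU running it supports.

```c
/* Returns 1 if this translation unit was compiled with SSE2 enabled
   (for example via -msse2 or a -march value that implies it), else 0. */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if this translation unit was compiled with AVX enabled
   (for example via -mavx), else 0. */
int built_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}
```

Because the result is fixed at compile time, code like this complements rather than replaces run-time CPU detection.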
+ +These extensions are also available as built-in functions: see +@ref{x86 Built-in Functions}, for details of the functions enabled and +disabled by these switches. + +To generate SSE/SSE2 instructions automatically from floating-point +code (as opposed to 387 instructions), see @option{-mfpmath=sse}. -@node TILEPro Options -@subsection TILEPro Options -@cindex TILEPro options +GCC suppresses SSEx instructions when @option{-mavx} is used. Instead, it +generates new AVX instructions or AVX equivalents for all SSEx instructions +when needed. -These @samp{-m} options are supported on the TILEPro: +These options enable GCC to use these extended instructions in +generated code, even without @option{-mfpmath=sse}. Applications that +perform run-time CPU detection must compile separate files for each +supported architecture, using the appropriate flags. In particular, +the file containing the CPU detection code should be compiled without +these options. -@table @gcctabopt -@item -mcpu=@var{name} -@opindex mcpu -Selects the type of CPU to be targeted. Currently the only supported -type is @samp{tilepro}. +@item -mdump-tune-features +@opindex mdump-tune-features +This option instructs GCC to dump the names of the x86 performance +tuning features and default settings. The names can be used in +@option{-mtune-ctrl=@var{feature-list}}. -@item -m32 -@opindex m32 -Generate code for a 32-bit environment, which sets int, long, and -pointer to 32 bits. This is the only supported behavior so the flag -is essentially ignored. -@end table +@item -mtune-ctrl=@var{feature-list} +@opindex mtune-ctrl=@var{feature-list} +This option is used to do fine-grained control of x86 code generation features. +@var{feature-list} is a comma-separated list of @var{feature} names. See also +@option{-mdump-tune-features}. When specified, the @var{feature} is turned +on if it is not preceded with @samp{^}, otherwise, it is turned off.
+@option{-mtune-ctrl=@var{feature-list}} is intended to be used by GCC +developers. Using it may lead to code paths not covered by testing and can +potentially result in compiler ICEs or runtime errors. -@node V850 Options -@subsection V850 Options -@cindex V850 Options +@item -mno-default +@opindex mno-default +This option instructs GCC to turn off all tunable features. See also +@option{-mtune-ctrl=@var{feature-list}} and @option{-mdump-tune-features}. -These @samp{-m} options are defined for V850 implementations: +@item -mcld +@opindex mcld +This option instructs GCC to emit a @code{cld} instruction in the prologue +of functions that use string instructions. String instructions depend on +the DF flag to select between autoincrement or autodecrement mode. While the +ABI specifies the DF flag to be cleared on function entry, some operating +systems violate this specification by not clearing the DF flag in their +exception dispatchers. The exception handler can be invoked with the DF flag +set, which leads to wrong direction mode when string instructions are used. +This option can be enabled by default on 32-bit x86 targets by configuring +GCC with the @option{--enable-cld} configure option. Generation of @code{cld} +instructions can be suppressed with the @option{-mno-cld} compiler option +in this case. -@table @gcctabopt -@item -mlong-calls -@itemx -mno-long-calls -@opindex mlong-calls -@opindex mno-long-calls -Treat all calls as being far away (near). If calls are assumed to be -far away, the compiler always loads the function's address into a -register, and calls indirect through the pointer. +@item -mvzeroupper +@opindex mvzeroupper +This option instructs GCC to emit a @code{vzeroupper} instruction +before a transfer of control flow out of the function to minimize +the AVX to SSE transition penalty as well as remove unnecessary @code{zeroupper} +intrinsics. 
-@item -mno-ep -@itemx -mep -@opindex mno-ep -@opindex mep -Do not optimize (do optimize) basic blocks that use the same index -pointer 4 or more times to copy pointer into the @code{ep} register, and -use the shorter @code{sld} and @code{sst} instructions. The @option{-mep} -option is on by default if you optimize. +@item -mprefer-avx128 +@opindex mprefer-avx128 +This option instructs GCC to use 128-bit AVX instructions instead of +256-bit AVX instructions in the auto-vectorizer. -@item -mno-prolog-function -@itemx -mprolog-function -@opindex mno-prolog-function -@opindex mprolog-function -Do not use (do use) external functions to save and restore registers -at the prologue and epilogue of a function. The external functions -are slower, but use less code space if more than one function saves -the same number of registers. The @option{-mprolog-function} option -is on by default if you optimize. +@item -mcx16 +@opindex mcx16 +This option enables GCC to generate @code{CMPXCHG16B} instructions. +@code{CMPXCHG16B} allows for atomic operations on 128-bit double quadword +(or oword) data types. +This is useful for high-resolution counters that can be updated +by multiple processors (or cores). This instruction is generated as part of +atomic built-in functions: see @ref{__sync Builtins} or +@ref{__atomic Builtins} for details. -@item -mspace -@opindex mspace -Try to make the code as small as possible. At present, this just turns -on the @option{-mep} and @option{-mprolog-function} options. +@item -msahf +@opindex msahf +This option enables generation of @code{SAHF} instructions in 64-bit code. +Early Intel Pentium 4 CPUs with Intel 64 support, +prior to the introduction of Pentium 4 G1 step in December 2005, +lacked the @code{LAHF} and @code{SAHF} instructions +which are supported by AMD64. +These are load and store instructions, respectively, for certain status flags. 
+In 64-bit mode, the @code{SAHF} instruction is used to optimize @code{fmod}, +@code{drem}, and @code{remainder} built-in functions; +see @ref{Other Builtins} for details. -@item -mtda=@var{n} -@opindex mtda -Put static or global variables whose size is @var{n} bytes or less into -the tiny data area that register @code{ep} points to. The tiny data -area can hold up to 256 bytes in total (128 bytes for byte references). +@item -mmovbe +@opindex mmovbe +This option enables use of the @code{movbe} instruction to implement +@code{__builtin_bswap32} and @code{__builtin_bswap64}. -@item -msda=@var{n} -@opindex msda -Put static or global variables whose size is @var{n} bytes or less into -the small data area that register @code{gp} points to. The small data -area can hold up to 64 kilobytes. +@item -mcrc32 +@opindex mcrc32 +This option enables built-in functions @code{__builtin_ia32_crc32qi}, +@code{__builtin_ia32_crc32hi}, @code{__builtin_ia32_crc32si} and +@code{__builtin_ia32_crc32di} to generate the @code{crc32} machine instruction. -@item -mzda=@var{n} -@opindex mzda -Put static or global variables whose size is @var{n} bytes or less into -the first 32 kilobytes of memory. +@item -mrecip +@opindex mrecip +This option enables use of @code{RCPSS} and @code{RSQRTSS} instructions +(and their vectorized variants @code{RCPPS} and @code{RSQRTPS}) +with an additional Newton-Raphson step +to increase precision instead of @code{DIVSS} and @code{SQRTSS} +(and their vectorized +variants) for single-precision floating-point arguments. These instructions +are generated only when @option{-funsafe-math-optimizations} is enabled +together with @option{-finite-math-only} and @option{-fno-trapping-math}. +Note that while the throughput of the sequence is higher than the throughput +of the non-reciprocal instruction, the precision of the sequence can be +decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). 
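The "additional Newton-Raphson step" mentioned above can be written out in plain C. This is a minimal sketch of the textbook refinement formula for a reciprocal, with a hypothetical function name; the starting estimate is passed in by the caller rather than produced by @code{RCPSS}, and the exact instruction sequence GCC emits may be arranged differently.

```c
/* One Newton-Raphson step for the reciprocal of a:
   given an estimate x0 of 1/a, return the refined estimate
   x1 = x0 * (2 - a * x0). Each step roughly doubles the number
   of correct bits in the estimate. */
float refine_reciprocal(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}
```

For example, starting from the rough estimate 0.24 for 1/4, one step yields approximately 0.2496, much closer to the exact 0.25. The result is still an approximation, which is why the text above warns of a small loss of precision compared with a true division.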
-@item -mv850 -@opindex mv850 -Specify that the target processor is the V850. +Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of @code{RSQRTSS} +(or @code{RSQRTPS}) already with @option{-ffast-math} (or the above option +combination), and doesn't need @option{-mrecip}. -@item -mv850e3v5 -@opindex mv850e3v5 -Specify that the target processor is the V850E3V5. The preprocessor -constant @code{__v850e3v5__} is defined if this option is used. +Also note that GCC emits the above sequence with additional Newton-Raphson step +for vectorized single-float division and vectorized @code{sqrtf(@var{x})} +already with @option{-ffast-math} (or the above option combination), and +doesn't need @option{-mrecip}. -@item -mv850e2v4 -@opindex mv850e2v4 -Specify that the target processor is the V850E3V5. This is an alias for -the @option{-mv850e3v5} option. +@item -mrecip=@var{opt} +@opindex mrecip=opt +This option controls which reciprocal estimate instructions +may be used. @var{opt} is a comma-separated list of options, which may +be preceded by a @samp{!} to invert the option: -@item -mv850e2v3 -@opindex mv850e2v3 -Specify that the target processor is the V850E2V3. The preprocessor -constant @code{__v850e2v3__} is defined if this option is used. +@table @samp +@item all +Enable all estimate instructions. -@item -mv850e2 -@opindex mv850e2 -Specify that the target processor is the V850E2. The preprocessor -constant @code{__v850e2__} is defined if this option is used. +@item default +Enable the default instructions, equivalent to @option{-mrecip}. -@item -mv850e1 -@opindex mv850e1 -Specify that the target processor is the V850E1. The preprocessor -constants @code{__v850e1__} and @code{__v850e__} are defined if -this option is used. +@item none +Disable all estimate instructions, equivalent to @option{-mno-recip}. -@item -mv850es -@opindex mv850es -Specify that the target processor is the V850ES. This is an alias for -the @option{-mv850e1} option. 
+@item div +Enable the approximation for scalar division. -@item -mv850e -@opindex mv850e -Specify that the target processor is the V850E@. The preprocessor -constant @code{__v850e__} is defined if this option is used. +@item vec-div +Enable the approximation for vectorized division. -If neither @option{-mv850} nor @option{-mv850e} nor @option{-mv850e1} -nor @option{-mv850e2} nor @option{-mv850e2v3} nor @option{-mv850e3v5} -are defined then a default target processor is chosen and the -relevant @samp{__v850*__} preprocessor constant is defined. +@item sqrt +Enable the approximation for scalar square root. -The preprocessor constants @code{__v850} and @code{__v851__} are always -defined, regardless of which processor variant is the target. +@item vec-sqrt +Enable the approximation for vectorized square root. +@end table -@item -mdisable-callt -@itemx -mno-disable-callt -@opindex mdisable-callt -@opindex mno-disable-callt -This option suppresses generation of the @code{CALLT} instruction for the -v850e, v850e1, v850e2, v850e2v3 and v850e3v5 flavors of the v850 -architecture. +So, for example, @option{-mrecip=all,!sqrt} enables +all of the reciprocal approximations, except for square root. -This option is enabled by default when the RH850 ABI is -in use (see @option{-mrh850-abi}), and disabled by default when the -GCC ABI is in use. If @code{CALLT} instructions are being generated -then the C preprocessor symbol @code{__V850_CALLT__} is defined. +@item -mveclibabi=@var{type} +@opindex mveclibabi +Specifies the ABI type to use for vectorizing intrinsics using an +external library. Supported values for @var{type} are @samp{svml} +for the Intel short +vector math library and @samp{acml} for the AMD math core library. +To use this option, both @option{-ftree-vectorize} and +@option{-funsafe-math-optimizations} have to be enabled, and an SVML or ACML +ABI-compatible library must be specified at link time. 
-@item -mrelax -@itemx -mno-relax -@opindex mrelax -@opindex mno-relax -Pass on (or do not pass on) the @option{-mrelax} command line option -to the assembler. +GCC currently emits calls to @code{vmldExp2}, +@code{vmldLn2}, @code{vmldLog102}, @code{vmldLog102}, @code{vmldPow2}, +@code{vmldTanh2}, @code{vmldTan2}, @code{vmldAtan2}, @code{vmldAtanh2}, +@code{vmldCbrt2}, @code{vmldSinh2}, @code{vmldSin2}, @code{vmldAsinh2}, +@code{vmldAsin2}, @code{vmldCosh2}, @code{vmldCos2}, @code{vmldAcosh2}, +@code{vmldAcos2}, @code{vmlsExp4}, @code{vmlsLn4}, @code{vmlsLog104}, +@code{vmlsLog104}, @code{vmlsPow4}, @code{vmlsTanh4}, @code{vmlsTan4}, +@code{vmlsAtan4}, @code{vmlsAtanh4}, @code{vmlsCbrt4}, @code{vmlsSinh4}, +@code{vmlsSin4}, @code{vmlsAsinh4}, @code{vmlsAsin4}, @code{vmlsCosh4}, +@code{vmlsCos4}, @code{vmlsAcosh4} and @code{vmlsAcos4} for corresponding +function type when @option{-mveclibabi=svml} is used, and @code{__vrd2_sin}, +@code{__vrd2_cos}, @code{__vrd2_exp}, @code{__vrd2_log}, @code{__vrd2_log2}, +@code{__vrd2_log10}, @code{__vrs4_sinf}, @code{__vrs4_cosf}, +@code{__vrs4_expf}, @code{__vrs4_logf}, @code{__vrs4_log2f}, +@code{__vrs4_log10f} and @code{__vrs4_powf} for the corresponding function type +when @option{-mveclibabi=acml} is used. -@item -mlong-jumps -@itemx -mno-long-jumps -@opindex mlong-jumps -@opindex mno-long-jumps -Disable (or re-enable) the generation of PC-relative jump instructions. +@item -mabi=@var{name} +@opindex mabi +Generate code for the specified calling convention. Permissible values +are @samp{sysv} for the ABI used on GNU/Linux and other systems, and +@samp{ms} for the Microsoft ABI. The default is to use the Microsoft +ABI when targeting Microsoft Windows and the SysV ABI on all other systems. +You can control this behavior for specific functions by +using the function attributes @code{ms_abi} and @code{sysv_abi}. +@xref{Function Attributes}. 
-@item -msoft-float -@itemx -mhard-float -@opindex msoft-float -@opindex mhard-float -Disable (or re-enable) the generation of hardware floating point -instructions. This option is only significant when the target -architecture is @samp{V850E2V3} or higher. If hardware floating point -instructions are being generated then the C preprocessor symbol -@code{__FPU_OK__} is defined, otherwise the symbol -@code{__NO_FPU__} is defined. +@item -mtls-dialect=@var{type} +@opindex mtls-dialect +Generate code to access thread-local storage using the @samp{gnu} or +@samp{gnu2} conventions. @samp{gnu} is the conservative default; +@samp{gnu2} is more efficient, but it may add compile- and run-time +requirements that cannot be satisfied on all systems. -@item -mloop -@opindex mloop -Enables the use of the e3v5 LOOP instruction. The use of this -instruction is not enabled by default when the e3v5 architecture is -selected because its use is still experimental. +@item -mpush-args +@itemx -mno-push-args +@opindex mpush-args +@opindex mno-push-args +Use PUSH operations to store outgoing parameters. This method is shorter +and usually equally fast as method using SUB/MOV operations and is enabled +by default. In some cases disabling it may improve performance because of +improved scheduling and reduced dependencies. -@item -mrh850-abi -@itemx -mghs -@opindex mrh850-abi -@opindex mghs -Enables support for the RH850 version of the V850 ABI. This is the -default. With this version of the ABI the following rules apply: +@item -maccumulate-outgoing-args +@opindex maccumulate-outgoing-args +If enabled, the maximum amount of space required for outgoing arguments is +computed in the function prologue. This is faster on most modern CPUs +because of reduced dependencies, improved scheduling and reduced stack usage +when the preferred stack boundary is not equal to 2. The drawback is a notable +increase in code size. This switch implies @option{-mno-push-args}. 
-@itemize -@item -Integer sized structures and unions are returned via a memory pointer -rather than a register. +@item -mthreads +@opindex mthreads +Support thread-safe exception handling on MinGW. Programs that rely +on thread-safe exception handling must compile and link all code with the +@option{-mthreads} option. When compiling, @option{-mthreads} defines +@option{-D_MT}; when linking, it links in a special thread helper library +@option{-lmingwthrd} which cleans up per-thread exception-handling data. -@item -Large structures and unions (more than 8 bytes in size) are passed by -value. +@item -mno-align-stringops +@opindex mno-align-stringops +Do not align the destination of inlined string operations. This switch reduces +code size and improves performance in case the destination is already aligned, +but GCC doesn't know about it. -@item -Functions are aligned to 16-bit boundaries. +@item -minline-all-stringops +@opindex minline-all-stringops +By default GCC inlines string operations only when the destination is +known to be aligned to least a 4-byte boundary. +This enables more inlining and increases code +size, but may improve performance of code that depends on fast +@code{memcpy}, @code{strlen}, +and @code{memset} for short lengths. -@item -The @option{-m8byte-align} command line option is supported. +@item -minline-stringops-dynamically +@opindex minline-stringops-dynamically +For string operations of unknown size, use run-time checks with +inline code for small blocks and a library call for large blocks. -@item -The @option{-mdisable-callt} command line option is enabled by -default. The @option{-mno-disable-callt} command line option is not -supported. -@end itemize +@item -mstringop-strategy=@var{alg} +@opindex mstringop-strategy=@var{alg} +Override the internal decision heuristic for the particular algorithm to use +for inlining string operations. 
The allowed values for @var{alg} are: -When this version of the ABI is enabled the C preprocessor symbol -@code{__V850_RH850_ABI__} is defined. +@table @samp +@item rep_byte +@itemx rep_4byte +@itemx rep_8byte +Expand using i386 @code{rep} prefix of the specified size. -@item -mgcc-abi -@opindex mgcc-abi -Enables support for the old GCC version of the V850 ABI. With this -version of the ABI the following rules apply: +@item byte_loop +@itemx loop +@itemx unrolled_loop +Expand into an inline loop. -@itemize -@item -Integer sized structures and unions are returned in register @code{r10}. +@item libcall +Always use a library call. +@end table -@item -Large structures and unions (more than 8 bytes in size) are passed by -reference. +@item -mmemcpy-strategy=@var{strategy} +@opindex mmemcpy-strategy=@var{strategy} +Override the internal decision heuristic to decide if @code{__builtin_memcpy} +should be inlined and what inline algorithm to use when the expected size +of the copy operation is known. @var{strategy} +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets. +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies +the max byte size with which inline algorithm @var{alg} is allowed. For the last +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets +in the list must be specified in increasing order. The minimal byte size for +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the +preceding range. -@item -Functions are aligned to 32-bit boundaries, unless optimizing for -size. +@item -mmemset-strategy=@var{strategy} +@opindex mmemset-strategy=@var{strategy} +The option is similar to @option{-mmemcpy-strategy=} except that it is to control +@code{__builtin_memset} expansion. -@item -The @option{-m8byte-align} command line option is not supported. 
+@item -momit-leaf-frame-pointer +@opindex momit-leaf-frame-pointer +Don't keep the frame pointer in a register for leaf functions. This +avoids the instructions to save, set up, and restore frame pointers and +makes an extra register available in leaf functions. The option +@option{-fomit-leaf-frame-pointer} removes the frame pointer for leaf functions, +which might make debugging harder. -@item -The @option{-mdisable-callt} command line option is supported but not -enabled by default. -@end itemize +@item -mtls-direct-seg-refs +@itemx -mno-tls-direct-seg-refs +@opindex mtls-direct-seg-refs +Controls whether TLS variables may be accessed with offsets from the +TLS segment register (@code{%gs} for 32-bit, @code{%fs} for 64-bit), +or whether the thread base pointer must be added. Whether or not this +is valid depends on the operating system, and whether it maps the +segment to cover the entire TLS area. -When this version of the ABI is enabled the C preprocessor symbol -@code{__V850_GCC_ABI__} is defined. +For systems that use the GNU C Library, the default is on. -@item -m8byte-align -@itemx -mno-8byte-align -@opindex m8byte-align -@opindex mno-8byte-align -Enables support for @code{double} and @code{long long} types to be -aligned on 8-byte boundaries. The default is to restrict the -alignment of all objects to at most 4-bytes. When -@option{-m8byte-align} is in effect the C preprocessor symbol -@code{__V850_8BYTE_ALIGN__} is defined. +@item -msse2avx +@itemx -mno-sse2avx +@opindex msse2avx +Specify that the assembler should encode SSE instructions with VEX +prefix. The option @option{-mavx} turns this on by default. -@item -mbig-switch -@opindex mbig-switch -Generate code suitable for big switch tables. Use this option only if -the assembler/linker complain about out of range branches within a switch -table. +@item -mfentry +@itemx -mno-fentry +@opindex mfentry +If profiling is active (@option{-pg}), put the profiling +counter call before the prologue. 
+Note: On x86 architectures the attribute @code{ms_hook_prologue} +isn't possible at the moment for @option{-mfentry} and @option{-pg}. -@item -mapp-regs -@opindex mapp-regs -This option causes r2 and r5 to be used in the code generated by -the compiler. This setting is the default. +@item -mrecord-mcount +@itemx -mno-record-mcount +@opindex mrecord-mcount +If profiling is active (@option{-pg}), generate a __mcount_loc section +that contains pointers to each profiling call. This is useful for +automatically patching and out calls. -@item -mno-app-regs -@opindex mno-app-regs -This option causes r2 and r5 to be treated as fixed registers. +@item -mnop-mcount +@itemx -mno-nop-mcount +@opindex mnop-mcount +If profiling is active (@option{-pg}), generate the calls to +the profiling functions as nops. This is useful when they +should be patched in later dynamically. This is likely only +useful together with @option{-mrecord-mcount}. -@end table +@item -mskip-rax-setup +@itemx -mno-skip-rax-setup +@opindex mskip-rax-setup +When generating code for the x86-64 architecture with SSE extensions +disabled, @option{-skip-rax-setup} can be used to skip setting up RAX +register when there are no variable arguments passed in vector registers. -@node VAX Options -@subsection VAX Options -@cindex VAX options +@strong{Warning:} Since RAX register is used to avoid unnecessarily +saving vector registers on stack when passing variable arguments, the +impacts of this option are callees may waste some stack space, +misbehave or jump to a random location. GCC 4.4 or newer don't have +those issues, regardless the RAX register value. -These @samp{-m} options are defined for the VAX: +@item -m8bit-idiv +@itemx -mno-8bit-idiv +@opindex m8bit-idiv +On some processors, like Intel Atom, 8-bit unsigned integer divide is +much faster than 32-bit/64-bit integer divide. This option generates a +run-time check. 
If both dividend and divisor are within range of 0 +to 255, 8-bit unsigned integer divide is used instead of +32-bit/64-bit integer divide. -@table @gcctabopt -@item -munix -@opindex munix -Do not output certain jump instructions (@code{aobleq} and so on) -that the Unix assembler for the VAX cannot handle across long -ranges. +@item -mavx256-split-unaligned-load +@itemx -mavx256-split-unaligned-store +@opindex mavx256-split-unaligned-load +@opindex mavx256-split-unaligned-store +Split 32-byte AVX unaligned load and store. -@item -mgnu -@opindex mgnu -Do output those jump instructions, on the assumption that the -GNU assembler is being used. +@item -mstack-protector-guard=@var{guard} +@opindex mstack-protector-guard=@var{guard} +Generate stack protection code using canary at @var{guard}. Supported +locations are @samp{global} for global canary or @samp{tls} for per-thread +canary in the TLS block (the default). This option has effect only when +@option{-fstack-protector} or @option{-fstack-protector-all} is specified. -@item -mg -@opindex mg -Output code for G-format floating-point numbers instead of D-format. @end table -@node Visium Options -@subsection Visium Options -@cindex Visium options +These @samp{-m} switches are supported in addition to the above +on x86-64 processors in 64-bit environments. @table @gcctabopt +@item -m32 +@itemx -m64 +@itemx -mx32 +@itemx -m16 +@opindex m32 +@opindex m64 +@opindex mx32 +@opindex m16 +Generate code for a 16-bit, 32-bit or 64-bit environment. +The @option{-m32} option sets @code{int}, @code{long}, and pointer types +to 32 bits, and +generates code that runs on any i386 system. -@item -mdebug -@opindex mdebug -A program which performs file I/O and is destined to run on an MCM target -should be linked with this option. It causes the libraries libc.a and -libdebug.a to be linked. The program should be run on the target under -the control of the GDB remote debugging stub. 
- -@item -msim -@opindex msim -A program which performs file I/O and is destined to run on the simulator -should be linked with option. This causes libraries libc.a and libsim.a to -be linked. - -@item -mfpu -@itemx -mhard-float -@opindex mfpu -@opindex mhard-float -Generate code containing floating-point instructions. This is the -default. +The @option{-m64} option sets @code{int} to 32 bits and @code{long} and pointer +types to 64 bits, and generates code for the x86-64 architecture. +For Darwin only the @option{-m64} option also turns off the @option{-fno-pic} +and @option{-mdynamic-no-pic} options. -@item -mno-fpu -@itemx -msoft-float -@opindex mno-fpu -@opindex msoft-float -Generate code containing library calls for floating-point. +The @option{-mx32} option sets @code{int}, @code{long}, and pointer types +to 32 bits, and +generates code for the x86-64 architecture. -@option{-msoft-float} changes the calling convention in the output file; -therefore, it is only useful if you compile @emph{all} of a program with -this option. In particular, you need to compile @file{libgcc.a}, the -library that comes with GCC, with @option{-msoft-float} in order for -this to work. +The @option{-m16} option is the same as @option{-m32}, except for that +it outputs the @code{.code16gcc} assembly directive at the beginning of +the assembly output so that the binary can run in 16-bit mode. -@item -mcpu=@var{cpu_type} -@opindex mcpu -Set the instruction set, register set, and instruction scheduling parameters -for machine type @var{cpu_type}. Supported values for @var{cpu_type} are -@samp{mcm}, @samp{gr5} and @samp{gr6}. +@item -mno-red-zone +@opindex mno-red-zone +Do not use a so-called ``red zone'' for x86-64 code. The red zone is mandated +by the x86-64 ABI; it is a 128-byte area beyond the location of the +stack pointer that is not modified by signal or interrupt handlers +and therefore can be used for temporary data without adjusting the stack +pointer. 
The flag @option{-mno-red-zone} disables this red zone. -@samp{mcm} is a synonym of @samp{gr5} present for backward compatibility. +@item -mcmodel=small +@opindex mcmodel=small +Generate code for the small code model: the program and its symbols must +be linked in the lower 2 GB of the address space. Pointers are 64 bits. +Programs can be statically or dynamically linked. This is the default +code model. -By default (unless configured otherwise), GCC generates code for the GR5 -variant of the Visium architecture. +@item -mcmodel=kernel +@opindex mcmodel=kernel +Generate code for the kernel code model. The kernel runs in the +negative 2 GB of the address space. +This model has to be used for Linux kernel code. -With @option{-mcpu=gr6}, GCC generates code for the GR6 variant of the Visium -architecture. The only difference from GR5 code is that the compiler will -generate block move instructions. +@item -mcmodel=medium +@opindex mcmodel=medium +Generate code for the medium model: the program is linked in the lower 2 +GB of the address space. Small symbols are also placed there. Symbols +with sizes larger than @option{-mlarge-data-threshold} are put into +large data or BSS sections and can be located above 2GB. Programs can +be statically or dynamically linked. -@item -mtune=@var{cpu_type} -@opindex mtune -Set the instruction scheduling parameters for machine type @var{cpu_type}, -but do not set the instruction set or register set that the option -@option{-mcpu=@var{cpu_type}} would. +@item -mcmodel=large +@opindex mcmodel=large +Generate code for the large model. This model makes no assumptions +about addresses and sizes of sections. -@item -msv-mode -@opindex msv-mode -Generate code for the supervisor mode, where there are no restrictions on -the access to general registers. This is the default. +@item -maddress-mode=long +@opindex maddress-mode=long +Generate code for long address mode. This is only supported for 64-bit +and x32 environments. 
It is the default address mode for 64-bit +environments. -@item -muser-mode -@opindex muser-mode -Generate code for the user mode, where the access to some general registers -is forbidden: on the GR5, registers r24 to r31 cannot be accessed in this -mode; on the GR6, only registers r29 to r31 are affected. +@item -maddress-mode=short +@opindex maddress-mode=short +Generate code for short address mode. This is only supported for 32-bit +and x32 environments. It is the default address mode for 32-bit and +x32 environments. @end table -@node VMS Options -@subsection VMS Options +@node x86 Windows Options +@subsection x86 Windows Options +@cindex x86 Windows Options +@cindex Windows Options for x86 -These @samp{-m} options are defined for the VMS implementations: +These additional options are available for Microsoft Windows targets: @table @gcctabopt -@item -mvms-return-codes -@opindex mvms-return-codes -Return VMS condition codes from @code{main}. The default is to return POSIX-style -condition (e.g.@ error) codes. - -@item -mdebug-main=@var{prefix} -@opindex mdebug-main=@var{prefix} -Flag the first routine whose name starts with @var{prefix} as the main -routine for the debugger. +@item -mconsole +@opindex mconsole +This option +specifies that a console application is to be generated, by +instructing the linker to set the PE header subsystem type +required for console applications. +This option is available for Cygwin and MinGW targets and is +enabled by default on those targets. -@item -mmalloc64 -@opindex mmalloc64 -Default to 64-bit memory allocation routines. +@item -mdll +@opindex mdll +This option is available for Cygwin and MinGW targets. It +specifies that a DLL---a dynamic link library---is to be +generated, enabling the selection of the required runtime +startup object and entry point. -@item -mpointer-size=@var{size} -@opindex mpointer-size=@var{size} -Set the default size of pointers. 
Possible options for @var{size} are -@samp{32} or @samp{short} for 32 bit pointers, @samp{64} or @samp{long} -for 64 bit pointers, and @samp{no} for supporting only 32 bit pointers. -The later option disables @code{pragma pointer_size}. -@end table +@item -mnop-fun-dllimport +@opindex mnop-fun-dllimport +This option is available for Cygwin and MinGW targets. It +specifies that the @code{dllimport} attribute should be ignored. -@node VxWorks Options -@subsection VxWorks Options -@cindex VxWorks Options +@item -mthread +@opindex mthread +This option is available for MinGW targets. It specifies +that MinGW-specific thread support is to be used. -The options in this section are defined for all VxWorks targets. -Options specific to the target hardware are listed with the other -options for that target. +@item -municode +@opindex municode +This option is available for MinGW-w64 targets. It causes +the @code{UNICODE} preprocessor macro to be predefined, and +chooses Unicode-capable runtime startup code. -@table @gcctabopt -@item -mrtp -@opindex mrtp -GCC can generate code for both VxWorks kernels and real time processes -(RTPs). This option switches from the former to the latter. It also -defines the preprocessor macro @code{__RTP__}. +@item -mwin32 +@opindex mwin32 +This option is available for Cygwin and MinGW targets. It +specifies that the typical Microsoft Windows predefined macros are to +be set in the pre-processor, but does not influence the choice +of runtime library/startup code. -@item -non-static -@opindex non-static -Link an RTP executable against shared libraries rather than static -libraries. The options @option{-static} and @option{-shared} can -also be used for RTPs (@pxref{Link Options}); @option{-static} -is the default. +@item -mwindows +@opindex mwindows +This option is available for Cygwin and MinGW targets. It +specifies that a GUI application is to be generated by +instructing the linker to set the PE header subsystem type +appropriately. 
-@item -Bstatic -@itemx -Bdynamic -@opindex Bstatic -@opindex Bdynamic -These options are passed down to the linker. They are defined for -compatibility with Diab. +@item -fno-set-stack-executable +@opindex fno-set-stack-executable +This option is available for MinGW targets. It specifies that +the executable flag for the stack used by nested functions isn't +set. This is necessary for binaries running in kernel mode of +Microsoft Windows, as there the User32 API, which is used to set executable +privileges, isn't available. -@item -Xbind-lazy -@opindex Xbind-lazy -Enable lazy binding of function calls. This option is equivalent to -@option{-Wl,-z,now} and is defined for compatibility with Diab. +@item -fwritable-relocated-rdata +@opindex fno-writable-relocated-rdata +This option is available for MinGW and Cygwin targets. It specifies +that relocated-data in read-only section is put into .data +section. This is a necessary for older runtimes not supporting +modification of .rdata sections for pseudo-relocation. -@item -Xbind-now -@opindex Xbind-now -Disable lazy binding of function calls. This option is the default and -is defined for compatibility with Diab. +@item -mpe-aligned-commons +@opindex mpe-aligned-commons +This option is available for Cygwin and MinGW targets. It +specifies that the GNU extension to the PE file format that +permits the correct alignment of COMMON variables should be +used when generating code. It is enabled by default if +GCC detects that the target assembler found during configuration +supports the feature. @end table +See also under @ref{x86 Options} for standard options. + @node Xstormy16 Options @subsection Xstormy16 Options @cindex Xstormy16 Options diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 03faa12d4a7..f2c25c2a45d 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -1695,6 +1695,7 @@ constraints that aren't. 
The compiler source file mentioned in the table heading for each architecture is the definitive reference for the meanings of that architecture's constraints. +@c Please keep this table alphabetized by target! @table @emph @item AArch64 family---@file{config/aarch64/constraints.md} @table @code @@ -1931,6 +1932,157 @@ A floating point constant 0.0 A memory address based on Y or Z pointer with displacement. @end table +@item Blackfin family---@file{config/bfin/constraints.md} +@table @code +@item a +P register + +@item d +D register + +@item z +A call clobbered P register. + +@item q@var{n} +A single register. If @var{n} is in the range 0 to 7, the corresponding D +register. If it is @code{A}, then the register P0. + +@item D +Even-numbered D register + +@item W +Odd-numbered D register + +@item e +Accumulator register. + +@item A +Even-numbered accumulator register. + +@item B +Odd-numbered accumulator register. + +@item b +I register + +@item v +B register + +@item f +M register + +@item c +Registers used for circular buffering, i.e. I, B, or L registers. + +@item C +The CC register. + +@item t +LT0 or LT1. + +@item k +LC0 or LC1. + +@item u +LB0 or LB1. + +@item x +Any D, P, B, M, I or L register. + +@item y +Additional registers typically used only in prologues and epilogues: RETS, +RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP. + +@item w +Any register except accumulators or CC. 
+ +@item Ksh +Signed 16 bit integer (in the range @minus{}32768 to 32767) + +@item Kuh +Unsigned 16 bit integer (in the range 0 to 65535) + +@item Ks7 +Signed 7 bit integer (in the range @minus{}64 to 63) + +@item Ku7 +Unsigned 7 bit integer (in the range 0 to 127) + +@item Ku5 +Unsigned 5 bit integer (in the range 0 to 31) + +@item Ks4 +Signed 4 bit integer (in the range @minus{}8 to 7) + +@item Ks3 +Signed 3 bit integer (in the range @minus{}3 to 4) + +@item Ku3 +Unsigned 3 bit integer (in the range 0 to 7) + +@item P@var{n} +Constant @var{n}, where @var{n} is a single-digit constant in the range 0 to 4. + +@item PA +An integer equal to one of the MACFLAG_XXX constants that is suitable for +use with either accumulator. + +@item PB +An integer equal to one of the MACFLAG_XXX constants that is suitable for +use only with accumulator A1. + +@item M1 +Constant 255. + +@item M2 +Constant 65535. + +@item J +An integer constant with exactly a single bit set. + +@item L +An integer constant with all bits set except exactly one. + +@item H + +@item Q +Any SYMBOL_REF. +@end table + +@item CR16 Architecture---@file{config/cr16/cr16.h} +@table @code + +@item b +Registers from r0 to r14 (registers without stack pointer) + +@item t +Register from r0 to r11 (all 16-bit registers) + +@item p +Register from r12 to r15 (all 32-bit registers) + +@item I +Signed constant that fits in 4 bits + +@item J +Signed constant that fits in 5 bits + +@item K +Signed constant that fits in 6 bits + +@item L +Unsigned constant that fits in 4 bits + +@item M +Signed constant that fits in 32 bits + +@item N +Check for 64 bits wide constants for add/sub instructions + +@item G +Floating point constant that is legal for store immediate +@end table + @item Epiphany---@file{config/epiphany/constraints.md} @table @code @item U16 @@ -2002,38 +2154,97 @@ Matches control register values to switch fp mode, which are encapsulated in @code{UNSPEC_FP_MODE}. 
@end table -@item CR16 Architecture---@file{config/cr16/cr16.h} +@item FRV---@file{config/frv/frv.h} @table @code +@item a +Register in the class @code{ACC_REGS} (@code{acc0} to @code{acc7}). @item b -Registers from r0 to r14 (registers without stack pointer) +Register in the class @code{EVEN_ACC_REGS} (@code{acc0} to @code{acc7}). + +@item c +Register in the class @code{CC_REGS} (@code{fcc0} to @code{fcc3} and +@code{icc0} to @code{icc3}). + +@item d +Register in the class @code{GPR_REGS} (@code{gr0} to @code{gr63}). + +@item e +Register in the class @code{EVEN_REGS} (@code{gr0} to @code{gr63}). +Odd registers are excluded not in the class but through the use of a machine +mode larger than 4 bytes. + +@item f +Register in the class @code{FPR_REGS} (@code{fr0} to @code{fr63}). + +@item h +Register in the class @code{FEVEN_REGS} (@code{fr0} to @code{fr63}). +Odd registers are excluded not in the class but through the use of a machine +mode larger than 4 bytes. + +@item l +Register in the class @code{LR_REG} (the @code{lr} register). + +@item q +Register in the class @code{QUAD_REGS} (@code{gr2} to @code{gr63}). +Register numbers not divisible by 4 are excluded not in the class but through +the use of a machine mode larger than 8 bytes. @item t -Register from r0 to r11 (all 16-bit registers) +Register in the class @code{ICC_REGS} (@code{icc0} to @code{icc3}). -@item p -Register from r12 to r15 (all 32-bit registers) +@item u +Register in the class @code{FCC_REGS} (@code{fcc0} to @code{fcc3}). + +@item v +Register in the class @code{ICR_REGS} (@code{cc4} to @code{cc7}). + +@item w +Register in the class @code{FCR_REGS} (@code{cc0} to @code{cc3}). + +@item x +Register in the class @code{QUAD_FPR_REGS} (@code{fr0} to @code{fr63}). +Register numbers not divisible by 4 are excluded not in the class but through +the use of a machine mode larger than 8 bytes. + +@item z +Register in the class @code{SPR_REGS} (@code{lcr} and @code{lr}). 
+ +@item A +Register in the class @code{QUAD_ACC_REGS} (@code{acc0} to @code{acc7}). + +@item B +Register in the class @code{ACCG_REGS} (@code{accg0} to @code{accg7}). + +@item C +Register in the class @code{CR_REGS} (@code{cc0} to @code{cc7}). + +@item G +Floating point constant zero @item I -Signed constant that fits in 4 bits +6-bit signed integer constant @item J -Signed constant that fits in 5 bits - -@item K -Signed constant that fits in 6 bits +10-bit signed integer constant @item L -Unsigned constant that fits in 4 bits +16-bit signed integer constant @item M -Signed constant that fits in 32 bits +16-bit unsigned integer constant @item N -Check for 64 bits wide constants for add/sub instructions +12-bit signed integer constant that is negative---i.e.@: in the +range of @minus{}2048 to @minus{}1 + +@item O +Constant zero + +@item P +12-bit signed integer constant that is greater than zero---i.e.@: in the +range of 1 to 2047. -@item G -Floating point constant that is legal for store immediate @end table @item Hewlett-Packard PA-RISC---@file{config/pa/pa.h} @@ -2107,615 +2318,68 @@ A memory operand for floating-point loads and stores A register indirect memory operand @end table -@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md} +@item Intel IA-64---@file{config/ia64/ia64.h} @table @code -@item b -Address base register - -@item d -Floating point register (containing 64-bit value) - -@item f -Floating point register (containing 32-bit value) +@item a +General register @code{r0} to @code{r3} for @code{addl} instruction -@item v -Altivec vector register - -@item wa -Any VSX register if the -mvsx option was used or NO_REGS. - -@item wd -VSX vector register to hold vector double data or NO_REGS. - -@item wf -VSX vector register to hold vector float data or NO_REGS. - -@item wg -If @option{-mmfpgpr} was used, a floating point register or NO_REGS. - -@item wh -Floating point register if direct moves are available, or NO_REGS. 
- -@item wi -FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS. - -@item wj -FP or VSX register to hold 64-bit integers for direct moves or NO_REGS. - -@item wk -FP or VSX register to hold 64-bit doubles for direct moves or NO_REGS. - -@item wl -Floating point register if the LFIWAX instruction is enabled or NO_REGS. - -@item wm -VSX register if direct move instructions are enabled, or NO_REGS. - -@item wn -No register (NO_REGS). - -@item wr -General purpose register if 64-bit instructions are enabled or NO_REGS. - -@item ws -VSX vector register to hold scalar double values or NO_REGS. - -@item wt -VSX vector register to hold 128 bit integer or NO_REGS. - -@item wu -Altivec register to use for float/32-bit int loads/stores or NO_REGS. - -@item wv -Altivec register to use for double loads/stores or NO_REGS. - -@item ww -FP or VSX register to perform float operations under @option{-mvsx} or NO_REGS. - -@item wx -Floating point register if the STFIWX instruction is enabled or NO_REGS. - -@item wy -FP or VSX register to perform ISA 2.07 float ops or NO_REGS. - -@item wz -Floating point register if the LFIWZX instruction is enabled or NO_REGS. - -@item wD -Int constant that is the element number of the 64-bit scalar in a vector. - -@item wQ -A memory address that will work with the @code{lq} and @code{stq} -instructions. 
- -@item h -@samp{MQ}, @samp{CTR}, or @samp{LINK} register - -@item q -@samp{MQ} register - -@item c -@samp{CTR} register - -@item l -@samp{LINK} register - -@item x -@samp{CR} register (condition register) number 0 - -@item y -@samp{CR} register (condition register) - -@item z -@samp{XER[CA]} carry bit (part of the XER register) - -@item I -Signed 16-bit constant - -@item J -Unsigned 16-bit constant shifted left 16 bits (use @samp{L} instead for -@code{SImode} constants) - -@item K -Unsigned 16-bit constant - -@item L -Signed 16-bit constant shifted left 16 bits - -@item M -Constant larger than 31 - -@item N -Exact power of 2 - -@item O -Zero - -@item P -Constant whose negation is a signed 16-bit constant - -@item G -Floating point constant that can be loaded into a register with one -instruction per word - -@item H -Integer/Floating point constant that can be loaded into a register using -three instructions - -@item m -Memory operand. -Normally, @code{m} does not allow addresses that update the base register. -If @samp{<} or @samp{>} constraint is also used, they are allowed and -therefore on PowerPC targets in that case it is only safe -to use @samp{m<>} in an @code{asm} statement if that @code{asm} statement -accesses the operand exactly once. The @code{asm} statement must also -use @samp{%U@var{}} as a placeholder for the ``update'' flag in the -corresponding load or store instruction. For example: - -@smallexample -asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val)); -@end smallexample - -is correct but: - -@smallexample -asm ("st %1,%0" : "=m<>" (mem) : "r" (val)); -@end smallexample - -is not. - -@item es -A ``stable'' memory operand; that is, one which does not include any -automodification of the base register. This used to be useful when -@samp{m} allowed automodification of the base register, but as those are now only -allowed when @samp{<} or @samp{>} is used, @samp{es} is basically the same -as @samp{m} without @samp{<} and @samp{>}. 
- -@item Q -Memory operand that is an offset from a register (it is usually better -to use @samp{m} or @samp{es} in @code{asm} statements) - -@item Z -Memory operand that is an indexed or indirect from a register (it is -usually better to use @samp{m} or @samp{es} in @code{asm} statements) - -@item R -AIX TOC entry - -@item a -Address operand that is an indexed or indirect from a register (@samp{p} is -preferable for @code{asm} statements) - -@item S -Constant suitable as a 64-bit mask operand - -@item T -Constant suitable as a 32-bit mask operand - -@item U -System V Release 4 small data area reference - -@item t -AND masks that can be performed by two rldic@{l, r@} instructions - -@item W -Vector constant that does not require memory - -@item j -Vector constant that is all zeros. - -@end table - -@item x86 family---@file{config/i386/constraints.md} -@table @code -@item R -Legacy register---the eight integer registers available on all -i386 processors (@code{a}, @code{b}, @code{c}, @code{d}, -@code{si}, @code{di}, @code{bp}, @code{sp}). - -@item q -Any register accessible as @code{@var{r}l}. In 32-bit mode, @code{a}, -@code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register. - -@item Q -Any register accessible as @code{@var{r}h}: @code{a}, @code{b}, -@code{c}, and @code{d}. - -@ifset INTERNALS -@item l -Any register that can be used as the index in a base+index memory -access: that is, any general register except the stack pointer. -@end ifset - -@item a -The @code{a} register. - -@item b -The @code{b} register. - -@item c -The @code{c} register. - -@item d -The @code{d} register. - -@item S -The @code{si} register. - -@item D -The @code{di} register. - -@item A -The @code{a} and @code{d} registers. This class is used for instructions -that return double word results in the @code{ax:dx} register pair. Single -word values will be allocated either in @code{ax} or @code{dx}. 
-For example on i386 the following implements @code{rdtsc}: - -@smallexample -unsigned long long rdtsc (void) -@{ - unsigned long long tick; - __asm__ __volatile__("rdtsc":"=A"(tick)); - return tick; -@} -@end smallexample - -This is not correct on x86-64 as it would allocate tick in either @code{ax} -or @code{dx}. You have to use the following variant instead: - -@smallexample -unsigned long long rdtsc (void) -@{ - unsigned int tickl, tickh; - __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh)); - return ((unsigned long long)tickh << 32)|tickl; -@} -@end smallexample - - -@item f -Any 80387 floating-point (stack) register. - -@item t -Top of 80387 floating-point stack (@code{%st(0)}). - -@item u -Second from top of 80387 floating-point stack (@code{%st(1)}). - -@item y -Any MMX register. - -@item x -Any SSE register. - -@item Yz -First SSE register (@code{%xmm0}). - -@ifset INTERNALS -@item Y2 -Any SSE register, when SSE2 is enabled. - -@item Yi -Any SSE register, when SSE2 and inter-unit moves are enabled. - -@item Ym -Any MMX register, when inter-unit moves are enabled. -@end ifset - -@item I -Integer constant in the range 0 @dots{} 31, for 32-bit shifts. - -@item J -Integer constant in the range 0 @dots{} 63, for 64-bit shifts. - -@item K -Signed 8-bit integer constant. - -@item L -@code{0xFF} or @code{0xFFFF}, for andsi as a zero-extending move. - -@item M -0, 1, 2, or 3 (shifts for the @code{lea} instruction). - -@item N -Unsigned 8-bit integer constant (for @code{in} and @code{out} -instructions). - -@ifset INTERNALS -@item O -Integer constant in the range 0 @dots{} 127, for 128-bit shifts. -@end ifset - -@item G -Standard 80387 floating point constant. - -@item C -Standard SSE floating point constant. - -@item e -32-bit signed integer constant, or a symbolic reference known -to fit that range (for immediate operands in sign-extending x86-64 -instructions). 
- -@item Z -32-bit unsigned integer constant, or a symbolic reference known -to fit that range (for immediate operands in zero-extending x86-64 -instructions). - -@end table - -@item Intel IA-64---@file{config/ia64/ia64.h} -@table @code -@item a -General register @code{r0} to @code{r3} for @code{addl} instruction - -@item b -Branch register +@item b +Branch register @item c Predicate register (@samp{c} as in ``conditional'') -@item d -Application register residing in M-unit - -@item e -Application register residing in I-unit - -@item f -Floating-point register - -@item m -Memory operand. If used together with @samp{<} or @samp{>}, -the operand can have postincrement and postdecrement which -require printing with @samp{%Pn} on IA-64. - -@item G -Floating-point constant 0.0 or 1.0 - -@item I -14-bit signed integer constant - -@item J -22-bit signed integer constant - -@item K -8-bit signed integer constant for logical instructions - -@item L -8-bit adjusted signed integer constant for compare pseudo-ops - -@item M -6-bit unsigned integer constant for shift counts - -@item N -9-bit signed integer constant for load and store postincrements - -@item O -The constant zero - -@item P -0 or @minus{}1 for @code{dep} instruction - -@item Q -Non-volatile memory for floating-point loads and stores - -@item R -Integer constant in the range 1 to 4 for @code{shladd} instruction - -@item S -Memory operand except postincrement and postdecrement. This is -now roughly the same as @samp{m} when not used together with @samp{<} -or @samp{>}. -@end table - -@item FRV---@file{config/frv/frv.h} -@table @code -@item a -Register in the class @code{ACC_REGS} (@code{acc0} to @code{acc7}). - -@item b -Register in the class @code{EVEN_ACC_REGS} (@code{acc0} to @code{acc7}). - -@item c -Register in the class @code{CC_REGS} (@code{fcc0} to @code{fcc3} and -@code{icc0} to @code{icc3}). - -@item d -Register in the class @code{GPR_REGS} (@code{gr0} to @code{gr63}). 
- -@item e -Register in the class @code{EVEN_REGS} (@code{gr0} to @code{gr63}). -Odd registers are excluded not in the class but through the use of a machine -mode larger than 4 bytes. - -@item f -Register in the class @code{FPR_REGS} (@code{fr0} to @code{fr63}). - -@item h -Register in the class @code{FEVEN_REGS} (@code{fr0} to @code{fr63}). -Odd registers are excluded not in the class but through the use of a machine -mode larger than 4 bytes. - -@item l -Register in the class @code{LR_REG} (the @code{lr} register). - -@item q -Register in the class @code{QUAD_REGS} (@code{gr2} to @code{gr63}). -Register numbers not divisible by 4 are excluded not in the class but through -the use of a machine mode larger than 8 bytes. - -@item t -Register in the class @code{ICC_REGS} (@code{icc0} to @code{icc3}). - -@item u -Register in the class @code{FCC_REGS} (@code{fcc0} to @code{fcc3}). - -@item v -Register in the class @code{ICR_REGS} (@code{cc4} to @code{cc7}). - -@item w -Register in the class @code{FCR_REGS} (@code{cc0} to @code{cc3}). - -@item x -Register in the class @code{QUAD_FPR_REGS} (@code{fr0} to @code{fr63}). -Register numbers not divisible by 4 are excluded not in the class but through -the use of a machine mode larger than 8 bytes. - -@item z -Register in the class @code{SPR_REGS} (@code{lcr} and @code{lr}). - -@item A -Register in the class @code{QUAD_ACC_REGS} (@code{acc0} to @code{acc7}). - -@item B -Register in the class @code{ACCG_REGS} (@code{accg0} to @code{accg7}). - -@item C -Register in the class @code{CR_REGS} (@code{cc0} to @code{cc7}). 
- -@item G -Floating point constant zero - -@item I -6-bit signed integer constant - -@item J -10-bit signed integer constant - -@item L -16-bit signed integer constant - -@item M -16-bit unsigned integer constant - -@item N -12-bit signed integer constant that is negative---i.e.@: in the -range of @minus{}2048 to @minus{}1 - -@item O -Constant zero - -@item P -12-bit signed integer constant that is greater than zero---i.e.@: in the -range of 1 to 2047. - -@end table - -@item Blackfin family---@file{config/bfin/constraints.md} -@table @code -@item a -P register - -@item d -D register - -@item z -A call clobbered P register. - -@item q@var{n} -A single register. If @var{n} is in the range 0 to 7, the corresponding D -register. If it is @code{A}, then the register P0. - -@item D -Even-numbered D register - -@item W -Odd-numbered D register - -@item e -Accumulator register. - -@item A -Even-numbered accumulator register. - -@item B -Odd-numbered accumulator register. - -@item b -I register - -@item v -B register - -@item f -M register - -@item c -Registers used for circular buffering, i.e. I, B, or L registers. - -@item C -The CC register. - -@item t -LT0 or LT1. - -@item k -LC0 or LC1. - -@item u -LB0 or LB1. - -@item x -Any D, P, B, M, I or L register. - -@item y -Additional registers typically used only in prologues and epilogues: RETS, -RETN, RETI, RETX, RETE, ASTAT, SEQSTAT and USP. - -@item w -Any register except accumulators or CC. 
- -@item Ksh -Signed 16 bit integer (in the range @minus{}32768 to 32767) - -@item Kuh -Unsigned 16 bit integer (in the range 0 to 65535) - -@item Ks7 -Signed 7 bit integer (in the range @minus{}64 to 63) - -@item Ku7 -Unsigned 7 bit integer (in the range 0 to 127) - -@item Ku5 -Unsigned 5 bit integer (in the range 0 to 31) - -@item Ks4 -Signed 4 bit integer (in the range @minus{}8 to 7) - -@item Ks3 -Signed 3 bit integer (in the range @minus{}3 to 4) - -@item Ku3 -Unsigned 3 bit integer (in the range 0 to 7) +@item d +Application register residing in M-unit -@item P@var{n} -Constant @var{n}, where @var{n} is a single-digit constant in the range 0 to 4. +@item e +Application register residing in I-unit -@item PA -An integer equal to one of the MACFLAG_XXX constants that is suitable for -use with either accumulator. +@item f +Floating-point register -@item PB -An integer equal to one of the MACFLAG_XXX constants that is suitable for -use only with accumulator A1. +@item m +Memory operand. If used together with @samp{<} or @samp{>}, +the operand can have postincrement and postdecrement which +require printing with @samp{%Pn} on IA-64. -@item M1 -Constant 255. +@item G +Floating-point constant 0.0 or 1.0 -@item M2 -Constant 65535. +@item I +14-bit signed integer constant @item J -An integer constant with exactly a single bit set. +22-bit signed integer constant + +@item K +8-bit signed integer constant for logical instructions @item L -An integer constant with all bits set except exactly one. +8-bit adjusted signed integer constant for compare pseudo-ops -@item H +@item M +6-bit unsigned integer constant for shift counts + +@item N +9-bit signed integer constant for load and store postincrements + +@item O +The constant zero + +@item P +0 or @minus{}1 for @code{dep} instruction @item Q -Any SYMBOL_REF. 
+Non-volatile memory for floating-point loads and stores + +@item R +Integer constant in the range 1 to 4 for @code{shladd} instruction + +@item S +Memory operand except postincrement and postdecrement. This is +now roughly the same as @samp{m} when not used together with @samp{<} +or @samp{>}. @end table @item M32C---@file{config/m32c/m32c.c} @@ -3316,33 +2980,232 @@ Floating point constant 0. @item I An integer constant that fits in 16 bits. -@item J -An integer constant whose low order 16 bits are zero. +@item J +An integer constant whose low order 16 bits are zero. + +@item K +An integer constant that does not meet the constraints for codes +@samp{I} or @samp{J}. + +@item L +The integer constant 1. + +@item M +The integer constant @minus{}1. + +@item N +The integer constant 0. + +@item O +Integer constants @minus{}4 through @minus{}1 and 1 through 4; shifts by these +amounts are handled as multiple single-bit shifts rather than a single +variable-length shift. + +@item Q +A memory reference which requires an additional word (address or +offset) after the opcode. + +@item R +A memory reference that is encoded within the opcode. + +@end table + +@item PowerPC and IBM RS6000---@file{config/rs6000/constraints.md} +@table @code +@item b +Address base register + +@item d +Floating point register (containing 64-bit value) + +@item f +Floating point register (containing 32-bit value) + +@item v +Altivec vector register + +@item wa +Any VSX register if the -mvsx option was used or NO_REGS. + +@item wd +VSX vector register to hold vector double data or NO_REGS. + +@item wf +VSX vector register to hold vector float data or NO_REGS. + +@item wg +If @option{-mmfpgpr} was used, a floating point register or NO_REGS. + +@item wh +Floating point register if direct moves are available, or NO_REGS. + +@item wi +FP or VSX register to hold 64-bit integers for VSX insns or NO_REGS. + +@item wj +FP or VSX register to hold 64-bit integers for direct moves or NO_REGS. 
+ +@item wk +FP or VSX register to hold 64-bit doubles for direct moves or NO_REGS. + +@item wl +Floating point register if the LFIWAX instruction is enabled or NO_REGS. + +@item wm +VSX register if direct move instructions are enabled, or NO_REGS. + +@item wn +No register (NO_REGS). + +@item wr +General purpose register if 64-bit instructions are enabled or NO_REGS. + +@item ws +VSX vector register to hold scalar double values or NO_REGS. + +@item wt +VSX vector register to hold 128 bit integer or NO_REGS. + +@item wu +Altivec register to use for float/32-bit int loads/stores or NO_REGS. + +@item wv +Altivec register to use for double loads/stores or NO_REGS. + +@item ww +FP or VSX register to perform float operations under @option{-mvsx} or NO_REGS. + +@item wx +Floating point register if the STFIWX instruction is enabled or NO_REGS. + +@item wy +FP or VSX register to perform ISA 2.07 float ops or NO_REGS. + +@item wz +Floating point register if the LFIWZX instruction is enabled or NO_REGS. + +@item wD +Int constant that is the element number of the 64-bit scalar in a vector. + +@item wQ +A memory address that will work with the @code{lq} and @code{stq} +instructions. 
+ +@item h +@samp{MQ}, @samp{CTR}, or @samp{LINK} register + +@item q +@samp{MQ} register + +@item c +@samp{CTR} register + +@item l +@samp{LINK} register + +@item x +@samp{CR} register (condition register) number 0 + +@item y +@samp{CR} register (condition register) + +@item z +@samp{XER[CA]} carry bit (part of the XER register) + +@item I +Signed 16-bit constant + +@item J +Unsigned 16-bit constant shifted left 16 bits (use @samp{L} instead for +@code{SImode} constants) + +@item K +Unsigned 16-bit constant + +@item L +Signed 16-bit constant shifted left 16 bits + +@item M +Constant larger than 31 + +@item N +Exact power of 2 + +@item O +Zero + +@item P +Constant whose negation is a signed 16-bit constant + +@item G +Floating point constant that can be loaded into a register with one +instruction per word + +@item H +Integer/Floating point constant that can be loaded into a register using +three instructions + +@item m +Memory operand. +Normally, @code{m} does not allow addresses that update the base register. +If @samp{<} or @samp{>} constraint is also used, they are allowed and +therefore on PowerPC targets in that case it is only safe +to use @samp{m<>} in an @code{asm} statement if that @code{asm} statement +accesses the operand exactly once. The @code{asm} statement must also +use @samp{%U@var{}} as a placeholder for the ``update'' flag in the +corresponding load or store instruction. For example: + +@smallexample +asm ("st%U0 %1,%0" : "=m<>" (mem) : "r" (val)); +@end smallexample + +is correct but: + +@smallexample +asm ("st %1,%0" : "=m<>" (mem) : "r" (val)); +@end smallexample + +is not. + +@item es +A ``stable'' memory operand; that is, one which does not include any +automodification of the base register. This used to be useful when +@samp{m} allowed automodification of the base register, but as those are now only +allowed when @samp{<} or @samp{>} is used, @samp{es} is basically the same +as @samp{m} without @samp{<} and @samp{>}. 
+ +@item Q +Memory operand that is an offset from a register (it is usually better +to use @samp{m} or @samp{es} in @code{asm} statements) + +@item Z +Memory operand that is an indexed or indirect from a register (it is +usually better to use @samp{m} or @samp{es} in @code{asm} statements) + +@item R +AIX TOC entry -@item K -An integer constant that does not meet the constraints for codes -@samp{I} or @samp{J}. +@item a +Address operand that is an indexed or indirect from a register (@samp{p} is +preferable for @code{asm} statements) -@item L -The integer constant 1. +@item S +Constant suitable as a 64-bit mask operand -@item M -The integer constant @minus{}1. +@item T +Constant suitable as a 32-bit mask operand -@item N -The integer constant 0. +@item U +System V Release 4 small data area reference -@item O -Integer constants @minus{}4 through @minus{}1 and 1 through 4; shifts by these -amounts are handled as multiple single-bit shifts rather than a single -variable-length shift. +@item t +AND masks that can be performed by two rldic@{l, r@} instructions -@item Q -A memory reference which requires an additional word (address or -offset) after the opcode. +@item W +Vector constant that does not require memory -@item R -A memory reference that is encoded within the opcode. +@item j +Vector constant that is all zeros. @end table @@ -3462,6 +3325,79 @@ A constant in the range 0 to 15, inclusive. @end table +@item S/390 and zSeries---@file{config/s390/s390.h} +@table @code +@item a +Address register (general purpose register except r0) + +@item c +Condition code register + +@item d +Data register (arbitrary general purpose register) + +@item f +Floating-point register + +@item I +Unsigned 8-bit constant (0--255) + +@item J +Unsigned 12-bit constant (0--4095) + +@item K +Signed 16-bit constant (@minus{}32768--32767) + +@item L +Value appropriate as displacement. 
+@table @code +@item (0..4095) +for short displacement +@item (@minus{}524288..524287) +for long displacement +@end table + +@item M +Constant integer with a value of 0x7fffffff. + +@item N +Multiple letter constraint followed by 4 parameter letters. +@table @code +@item 0..9: +number of the part counting from most to least significant +@item H,Q: +mode of the part +@item D,S,H: +mode of the containing operand +@item 0,F: +value of the other parts (F---all bits set) +@end table +The constraint matches if the specified part of a constant +has a value different from its other parts. + +@item Q +Memory reference without index register and with short displacement. + +@item R +Memory reference with index register and short displacement. + +@item S +Memory reference without index register but with long displacement. + +@item T +Memory reference with index register and long displacement. + +@item U +Pointer with short displacement. + +@item W +Pointer with long displacement. + +@item Y +Shift count operand. + +@end table + @need 1000 @item SPARC---@file{config/sparc/sparc.h} @table @code @@ -3581,199 +3517,56 @@ An immediate which can be loaded with @code{fsmbi}. @item A An immediate which can be loaded with the il/ila/ilh/ilhu instructions. const_int is treated as a 32 bit value. -@item B -An immediate for most arithmetic instructions. const_int is treated as a 32 bit value. - -@item C -An immediate for and/xor/or instructions. const_int is treated as a 32 bit value. - -@item D -An immediate for the @code{iohl} instruction. const_int is treated as a 32 bit value. - -@item I -A constant in the range [@minus{}64, 63] for shift/rotate instructions. - -@item J -An unsigned 7-bit constant for conversion/nop/channel instructions. - -@item K -A signed 10-bit constant for most arithmetic instructions. - -@item M -A signed 16 bit immediate for @code{stop}. - -@item N -An unsigned 16-bit constant for @code{iohl} and @code{fsmbi}. 
- -@item O -An unsigned 7-bit constant whose 3 least significant bits are 0. - -@item P -An unsigned 3-bit constant for 16-byte rotates and shifts - -@item R -Call operand, reg, for indirect calls - -@item S -Call operand, symbol, for relative calls. - -@item T -Call operand, const_int, for absolute calls. - -@item U -An immediate which can be loaded with the il/ila/ilh/ilhu instructions. const_int is sign extended to 128 bit. - -@item W -An immediate for shift and rotate instructions. const_int is treated as a 32 bit value. - -@item Y -An immediate for and/xor/or instructions. const_int is sign extended as a 128 bit. - -@item Z -An immediate for the @code{iohl} instruction. const_int is sign extended to 128 bit. - -@end table - -@item S/390 and zSeries---@file{config/s390/s390.h} -@table @code -@item a -Address register (general purpose register except r0) - -@item c -Condition code register - -@item d -Data register (arbitrary general purpose register) - -@item f -Floating-point register - -@item I -Unsigned 8-bit constant (0--255) - -@item J -Unsigned 12-bit constant (0--4095) - -@item K -Signed 16-bit constant (@minus{}32768--32767) - -@item L -Value appropriate as displacement. -@table @code -@item (0..4095) -for short displacement -@item (@minus{}524288..524287) -for long displacement -@end table - -@item M -Constant integer with a value of 0x7fffffff. - -@item N -Multiple letter constraint followed by 4 parameter letters. -@table @code -@item 0..9: -number of the part counting from most to least significant -@item H,Q: -mode of the part -@item D,S,H: -mode of the containing operand -@item 0,F: -value of the other parts (F---all bits set) -@end table -The constraint matches if the specified part of a constant -has a value different from its other parts. - -@item Q -Memory reference without index register and with short displacement. - -@item R -Memory reference with index register and short displacement. 
- -@item S -Memory reference without index register but with long displacement. - -@item T -Memory reference with index register and long displacement. - -@item U -Pointer with short displacement. - -@item W -Pointer with long displacement. - -@item Y -Shift count operand. - -@end table - -@item Xstormy16---@file{config/stormy16/stormy16.h} -@table @code -@item a -Register r0. - -@item b -Register r1. - -@item c -Register r2. - -@item d -Register r8. - -@item e -Registers r0 through r7. - -@item t -Registers r0 and r1. +@item B +An immediate for most arithmetic instructions. const_int is treated as a 32 bit value. -@item y -The carry register. +@item C +An immediate for and/xor/or instructions. const_int is treated as a 32 bit value. -@item z -Registers r8 and r9. +@item D +An immediate for the @code{iohl} instruction. const_int is treated as a 32 bit value. @item I -A constant between 0 and 3 inclusive. +A constant in the range [@minus{}64, 63] for shift/rotate instructions. @item J -A constant that has exactly one bit set. +An unsigned 7-bit constant for conversion/nop/channel instructions. @item K -A constant that has exactly one bit clear. - -@item L -A constant between 0 and 255 inclusive. +A signed 10-bit constant for most arithmetic instructions. @item M -A constant between @minus{}255 and 0 inclusive. +A signed 16 bit immediate for @code{stop}. @item N -A constant between @minus{}3 and 0 inclusive. +An unsigned 16-bit constant for @code{iohl} and @code{fsmbi}. @item O -A constant between 1 and 4 inclusive. +An unsigned 7-bit constant whose 3 least significant bits are 0. @item P -A constant between @minus{}4 and @minus{}1 inclusive. - -@item Q -A memory reference that is a stack push. +An unsigned 3-bit constant for 16-byte rotates and shifts @item R -A memory reference that is a stack pop. +Call operand, reg, for indirect calls @item S -A memory reference that refers to a constant address of known value. +Call operand, symbol, for relative calls. 
@item T -The register indicated by Rx (not implemented yet). +Call operand, const_int, for absolute calls. @item U -A constant that is not between 2 and 15 inclusive. +An immediate which can be loaded with the il/ila/ilh/ilhu instructions. const_int is sign extended to 128 bit. + +@item W +An immediate for shift and rotate instructions. const_int is treated as a 32 bit value. + +@item Y +An immediate for and/xor/or instructions. const_int is sign extended as a 128 bit. @item Z -The constant 0. +An immediate for the @code{iohl} instruction. const_int is sign extended to 128 bit. @end table @@ -4058,6 +3851,214 @@ Integer constant 0 Integer constant 32 @end table +@item x86 family---@file{config/i386/constraints.md} +@table @code +@item R +Legacy register---the eight integer registers available on all +i386 processors (@code{a}, @code{b}, @code{c}, @code{d}, +@code{si}, @code{di}, @code{bp}, @code{sp}). + +@item q +Any register accessible as @code{@var{r}l}. In 32-bit mode, @code{a}, +@code{b}, @code{c}, and @code{d}; in 64-bit mode, any integer register. + +@item Q +Any register accessible as @code{@var{r}h}: @code{a}, @code{b}, +@code{c}, and @code{d}. + +@ifset INTERNALS +@item l +Any register that can be used as the index in a base+index memory +access: that is, any general register except the stack pointer. +@end ifset + +@item a +The @code{a} register. + +@item b +The @code{b} register. + +@item c +The @code{c} register. + +@item d +The @code{d} register. + +@item S +The @code{si} register. + +@item D +The @code{di} register. + +@item A +The @code{a} and @code{d} registers. This class is used for instructions +that return double word results in the @code{ax:dx} register pair. Single +word values will be allocated either in @code{ax} or @code{dx}. 
+For example on i386 the following implements @code{rdtsc}: + +@smallexample +unsigned long long rdtsc (void) +@{ + unsigned long long tick; + __asm__ __volatile__("rdtsc":"=A"(tick)); + return tick; +@} +@end smallexample + +This is not correct on x86-64 as it would allocate tick in either @code{ax} +or @code{dx}. You have to use the following variant instead: + +@smallexample +unsigned long long rdtsc (void) +@{ + unsigned int tickl, tickh; + __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh)); + return ((unsigned long long)tickh << 32)|tickl; +@} +@end smallexample + + +@item f +Any 80387 floating-point (stack) register. + +@item t +Top of 80387 floating-point stack (@code{%st(0)}). + +@item u +Second from top of 80387 floating-point stack (@code{%st(1)}). + +@item y +Any MMX register. + +@item x +Any SSE register. + +@item Yz +First SSE register (@code{%xmm0}). + +@ifset INTERNALS +@item Y2 +Any SSE register, when SSE2 is enabled. + +@item Yi +Any SSE register, when SSE2 and inter-unit moves are enabled. + +@item Ym +Any MMX register, when inter-unit moves are enabled. +@end ifset + +@item I +Integer constant in the range 0 @dots{} 31, for 32-bit shifts. + +@item J +Integer constant in the range 0 @dots{} 63, for 64-bit shifts. + +@item K +Signed 8-bit integer constant. + +@item L +@code{0xFF} or @code{0xFFFF}, for andsi as a zero-extending move. + +@item M +0, 1, 2, or 3 (shifts for the @code{lea} instruction). + +@item N +Unsigned 8-bit integer constant (for @code{in} and @code{out} +instructions). + +@ifset INTERNALS +@item O +Integer constant in the range 0 @dots{} 127, for 128-bit shifts. +@end ifset + +@item G +Standard 80387 floating point constant. + +@item C +Standard SSE floating point constant. + +@item e +32-bit signed integer constant, or a symbolic reference known +to fit that range (for immediate operands in sign-extending x86-64 +instructions). 
+ +@item Z +32-bit unsigned integer constant, or a symbolic reference known +to fit that range (for immediate operands in zero-extending x86-64 +instructions). + +@end table + +@item Xstormy16---@file{config/stormy16/stormy16.h} +@table @code +@item a +Register r0. + +@item b +Register r1. + +@item c +Register r2. + +@item d +Register r8. + +@item e +Registers r0 through r7. + +@item t +Registers r0 and r1. + +@item y +The carry register. + +@item z +Registers r8 and r9. + +@item I +A constant between 0 and 3 inclusive. + +@item J +A constant that has exactly one bit set. + +@item K +A constant that has exactly one bit clear. + +@item L +A constant between 0 and 255 inclusive. + +@item M +A constant between @minus{}255 and 0 inclusive. + +@item N +A constant between @minus{}3 and 0 inclusive. + +@item O +A constant between 1 and 4 inclusive. + +@item P +A constant between @minus{}4 and @minus{}1 inclusive. + +@item Q +A memory reference that is a stack push. + +@item R +A memory reference that is a stack pop. + +@item S +A memory reference that refers to a constant address of known value. + +@item T +The register indicated by Rx (not implemented yet). + +@item U +A constant that is not between 2 and 15 inclusive. + +@item Z +The constant 0. + +@end table + @item Xtensa---@file{config/xtensa/constraints.md} @table @code @item a