From 2bcceb6fc59fcdaf51006d4fcfc71c2d26761396 Mon Sep 17 00:00:00 2001
From: Matthias Kretz
Date: Thu, 21 Jan 2021 11:45:15 +0000
Subject: [PATCH] libstdc++: Add std::experimental::simd from the Parallelism TS 2

Adds <experimental/simd>.

This implements the simd and simd_mask class templates via
[[gnu::vector_size(N)]] data members. It implements overloads for all of
<cmath> for simd. Explicit vectorization of the <cmath> functions is not
finished.

The majority of functions are marked as [[gnu::always_inline]] to enable
quasi-ODR-conforming linking of TUs with different -m flags.
Performance optimization was done for x86_64. ARM, Aarch64, and POWER
rely on the compiler to recognize reduction, conversion, and shuffle
patterns.

Besides verification using many different machine flags, the code was
also verified with different fast-math flags.

libstdc++-v3/ChangeLog:

	* doc/xml/manual/status_cxx2017.xml: Add implementation status
	of the Parallelism TS 2. Document implementation-defined types
	and behavior.
	* include/Makefile.am: Add new headers.
	* include/Makefile.in: Regenerate.
	* include/experimental/simd: New file. New header for
	Parallelism TS 2.
	* include/experimental/bits/numeric_traits.h: New file.
	Implementation of P1841R1 using internal naming. Addition of
	missing IEC559 functionality query.
	* include/experimental/bits/simd.h: New file. Definition of the
	public simd interfaces and general implementation helpers.
	* include/experimental/bits/simd_builtin.h: New file.
	Implementation of the _VecBuiltin simd_abi.
	* include/experimental/bits/simd_converter.h: New file. Generic
	simd conversions.
	* include/experimental/bits/simd_detail.h: New file. Internal
	macros for the simd implementation.
	* include/experimental/bits/simd_fixed_size.h: New file. Simd
	fixed_size ABI specific implementations.
	* include/experimental/bits/simd_math.h: New file. Math
	overloads for simd.
	* include/experimental/bits/simd_neon.h: New file. Simd NEON
	specific implementations.
	* include/experimental/bits/simd_ppc.h: New file.
	Implement bit shifts to avoid invalid results for integral types
	smaller than int.
	* include/experimental/bits/simd_scalar.h: New file. Simd scalar
	ABI specific implementations.
	* include/experimental/bits/simd_x86.h: New file. Simd x86
	specific implementations.
	* include/experimental/bits/simd_x86_conversions.h: New file.
	x86 specific conversion optimizations. The conversion patterns
	work around missing conversion patterns in the compiler and
	should be removed as soon as PR85048 is resolved.
	* testsuite/experimental/simd/standard_abi_usable.cc: New file.
	Test that all (not all fixed_size, though) standard simd and
	simd_mask types are usable.
	* testsuite/experimental/simd/standard_abi_usable_2.cc: New
	file. As above but with -ffast-math.
	* testsuite/libstdc++-dg/conformance.exp: Don't build simd tests
	from the standard test loop. Instead use
	check_vect_support_and_set_flags to build simd tests with the
	relevant machine flags.
---
 .../doc/xml/manual/status_cxx2017.xml | 216 +
 libstdc++-v3/include/Makefile.am | 13 +
 libstdc++-v3/include/Makefile.in | 13 +
 .../experimental/bits/numeric_traits.h | 567 ++
 libstdc++-v3/include/experimental/bits/simd.h | 5051 ++++++++++++++++
 .../include/experimental/bits/simd_builtin.h | 2949 ++++++++++
 .../experimental/bits/simd_converter.h | 354 ++
 .../include/experimental/bits/simd_detail.h | 306 +
 .../experimental/bits/simd_fixed_size.h | 2066 +++++++
 .../include/experimental/bits/simd_math.h | 1500 +++++
 .../include/experimental/bits/simd_neon.h | 519 ++
 .../include/experimental/bits/simd_ppc.h | 123 +
 .../include/experimental/bits/simd_scalar.h | 772 +++
 .../include/experimental/bits/simd_x86.h | 5169 +++++++++++++++++
 .../experimental/bits/simd_x86_conversions.h | 2029 +++++++
 libstdc++-v3/include/experimental/simd | 70 +
 .../experimental/simd/standard_abi_usable.cc | 64 +
 .../simd/standard_abi_usable_2.cc | 4 +
 .../testsuite/libstdc++-dg/conformance.exp | 18 +-
 19 files changed, 21802 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/include/experimental/bits/numeric_traits.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_builtin.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_converter.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_detail.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_fixed_size.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_math.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_neon.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_ppc.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_scalar.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_x86.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_x86_conversions.h
 create mode 100644 libstdc++-v3/include/experimental/simd
 create mode 100644 libstdc++-v3/testsuite/experimental/simd/standard_abi_usable.cc
 create mode 100644 libstdc++-v3/testsuite/experimental/simd/standard_abi_usable_2.cc

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index e6834b3607a..bc740f8e1ba 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -2869,6 +2869,17 @@ since C++14 and the implementation is complete.
 Library Fundamentals 2 TS
+
+
+
+ P0214R9
+
+
+ Data-Parallel Types
+ Y
+ Parallelism 2 TS
+
+
@@ -3014,6 +3025,211 @@ since C++14 and the implementation is complete.
 If !is_regular_file(p), an error is reported.
+
Parallelism 2 TS
+
+
+ 9.3 [parallel.simd.abi]
+ max_fixed_size<T> is 32, except when targeting
+ AVX512BW and sizeof(T) is 1.
+
+
+
+ When targeting 32-bit x86,
+ simd_abi::compatible<T> is an alias for
+ simd_abi::scalar.
+ When targeting 64-bit x86 (including x32) or Aarch64,
+ simd_abi::compatible<T> is an alias for
+ simd_abi::_VecBuiltin<16>,
+ unless T is long double, in which case it is
+ an alias for simd_abi::scalar.
+ When targeting ARM (but not Aarch64) with NEON support,
+ simd_abi::compatible<T> is an alias for
+ simd_abi::_VecBuiltin<16>,
+ unless sizeof(T) > 4, in which case it is
+ an alias for simd_abi::scalar. Additionally,
+ simd_abi::compatible<float> is an alias for
+ simd_abi::scalar unless compiling with
+ -ffast-math.
+
+
+
+ When targeting x86 (both 32-bit and 64-bit),
+ simd_abi::native<T> is an alias for one of
+ simd_abi::scalar,
+ simd_abi::_VecBuiltin<16>,
+ simd_abi::_VecBuiltin<32>, or
+ simd_abi::_VecBltnBtmsk<64>, depending on
+ T and the machine options the compiler was invoked with.
+
+
+
+ When targeting ARM/Aarch64 or POWER,
+ simd_abi::native<T> is an alias for
+ simd_abi::scalar or
+ simd_abi::_VecBuiltin<16>, depending on
+ T and the machine options the compiler was invoked with.
+
+
+
+ For any other targeted machine
+ simd_abi::compatible<T> and
+ simd_abi::native<T> are aliases for
+ simd_abi::scalar. (subject to change)
+
+
+
+ The extended ABI tag types defined in the
+ std::experimental::parallelism_v2::simd_abi namespace are:
+ simd_abi::_VecBuiltin<Bytes>, and
+ simd_abi::_VecBltnBtmsk<Bytes>.
+
+
+
+ simd_abi::deduce<T, N, Abis...>::type,
+ with N > 1 is an alias for an extended ABI tag, if a
+ supported extended ABI tag exists. Otherwise it is an alias for
+ simd_abi::fixed_size<N>. The
+ simd_abi::_VecBltnBtmsk ABI tag is preferred over
+ simd_abi::_VecBuiltin.
+
+
+
+ 9.4 [parallel.simd.traits]
+ memory_alignment<T, U>::value is
+ sizeof(U) * T::size() rounded up to the next power-of-two
+ value.
+
+
+
+ 9.6.1 [parallel.simd.overview]
+ On ARM, simd<T, _VecBuiltin<Bytes>>
+ is supported if __ARM_NEON is defined and
+ sizeof(T) <= 4. Additionally,
+ sizeof(T) == 8 with integral T is supported if
+ __ARM_ARCH >= 8, and double is supported if
+ __aarch64__ is defined.
+
+ On POWER, simd<T, _VecBuiltin<Bytes>>
+ is supported if __ALTIVEC__ is defined and sizeof(T)
+ < 8. Additionally, double is supported if
+ __VSX__ is defined, and any T with
+ sizeof(T) ≤ 8 is supported if __POWER8_VECTOR__
+ is defined.
+
+ On x86, given an extended ABI tag Abi,
+ simd<T, Abi> is supported according to the
+ following table:
+
+ Support for Extended ABI Tags
+
+
+
+
+
+
+
+
+ ABI tag Abi
+ value type T
+ values for Bytes
+ required machine option
+
+
+
+
+
+
+ _VecBuiltin<Bytes>
+
+ float
+ 8, 12, 16
+ "-msse"
+
+
+
+ 20, 24, 28, 32
+ "-mavx"
+
+
+
+ double
+ 16
+ "-msse2"
+
+
+
+ 24, 32
+ "-mavx"
+
+
+
+
+ integral types other than bool
+
+
+ Bytes ≤ 16 and Bytes divisible by
+ sizeof(T)
+
+ "-msse2"
+
+
+
+
+ 16 < Bytes ≤ 32 and Bytes
+ divisible by sizeof(T)
+
+ "-mavx2"
+
+
+
+
+ _VecBuiltin<Bytes> and
+ _VecBltnBtmsk<Bytes>
+
+
+ vectorizable types with sizeof(T) ≥ 4
+
+
+ 32 < Bytes ≤ 64 and Bytes
+ divisible by sizeof(T)
+
+ "-mavx512f"
+
+
+
+
+ vectorizable types with sizeof(T) < 4
+
+ "-mavx512bw"
+
+
+
+
+ _VecBltnBtmsk<Bytes>
+
+
+ vectorizable types with sizeof(T) ≥ 4
+
+
+ Bytes ≤ 32 and Bytes divisible by
+ sizeof(T)
+
+ "-mavx512vl"
+
+
+
+
+ vectorizable types with sizeof(T) < 4
+
+ "-mavx512bw" and "-mavx512vl"
+
+
+
+
+
+
+ +
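
Note (not part of the patch): the 9.4 [parallel.simd.traits] entry above
documents memory_alignment<T, U>::value as sizeof(U) * T::size() rounded
up to the next power-of-two value. The following is a minimal sketch of
that arithmetic in plain C++, assuming that a product which is already a
power of two is left unchanged. The helper names round_up_pow2,
documented_memory_alignment, and check_documented_memory_alignment are
hypothetical and do not appear in libstdc++; the sketch does not use
<experimental/simd> and makes no claim about how the library computes
the value internally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a libstdc++ name): smallest power of two >= n.
// Assumes n >= 1; an exact power of two is returned unchanged.
constexpr std::size_t round_up_pow2(std::size_t n)
{
  std::size_t p = 1;
  while (p < n)
    p *= 2;
  return p;
}

// Hypothetical helper: the value the documentation gives for
// memory_alignment<T, U>::value, with sizeof_u standing in for sizeof(U)
// and simd_size standing in for T::size().
constexpr std::size_t
documented_memory_alignment(std::size_t sizeof_u, std::size_t simd_size)
{ return round_up_pow2(sizeof_u * simd_size); }

// Runtime spot checks of the documented rule.
inline void check_documented_memory_alignment()
{
  // 4 elements of 4 bytes: 16 bytes is already a power of two.
  assert(documented_memory_alignment(4, 4) == 16);
  // 3 elements of 4 bytes (a fixed_size<3>-like case): 12 rounds up to 16.
  assert(documented_memory_alignment(4, 3) == 16);
  // 5 elements of 8 bytes: 40 rounds up to 64.
  assert(documented_memory_alignment(8, 5) == 64);
}
```

Calling check_documented_memory_alignment() from a main() function runs
the three checks. Under the documented rule, odd element counts such as
those of fixed_size ABIs are the cases where the alignment exceeds the
total size of the elements.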
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 90508a8fe83..f24a5489e8e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -747,6 +747,7 @@ experimental_headers = \
 	${experimental_srcdir}/ratio \
 	${experimental_srcdir}/regex \
 	${experimental_srcdir}/set \
+	${experimental_srcdir}/simd \
 	${experimental_srcdir}/socket \
 	${experimental_srcdir}/source_location \
 	${experimental_srcdir}/string \
@@ -766,7 +767,19 @@ experimental_bits_builddir = ./experimental/bits
 experimental_bits_headers = \
 	${experimental_bits_srcdir}/lfts_config.h \
 	${experimental_bits_srcdir}/net.h \
+	${experimental_bits_srcdir}/numeric_traits.h \
 	${experimental_bits_srcdir}/shared_ptr.h \
+	${experimental_bits_srcdir}/simd.h \
+	${experimental_bits_srcdir}/simd_builtin.h \
+	${experimental_bits_srcdir}/simd_converter.h \
+	${experimental_bits_srcdir}/simd_detail.h \
+	${experimental_bits_srcdir}/simd_fixed_size.h \
+	${experimental_bits_srcdir}/simd_math.h \
+	${experimental_bits_srcdir}/simd_neon.h \
+	${experimental_bits_srcdir}/simd_ppc.h \
+	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_x86.h \
+	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
 	${experimental_bits_filesystem_headers}
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 922ba440df0..12c63400706 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1097,6 +1097,7 @@ experimental_headers = \
 	${experimental_srcdir}/ratio \
 	${experimental_srcdir}/regex \
 	${experimental_srcdir}/set \
+	${experimental_srcdir}/simd \
 	${experimental_srcdir}/socket \
 	${experimental_srcdir}/source_location \
 	${experimental_srcdir}/string \
@@ -1116,7 +1117,19 @@ experimental_bits_builddir = ./experimental/bits
 experimental_bits_headers = \
 	${experimental_bits_srcdir}/lfts_config.h \
 	${experimental_bits_srcdir}/net.h \
+	${experimental_bits_srcdir}/numeric_traits.h \
 	${experimental_bits_srcdir}/shared_ptr.h \
+	${experimental_bits_srcdir}/simd.h \
+	${experimental_bits_srcdir}/simd_builtin.h \
+	${experimental_bits_srcdir}/simd_converter.h \
+	${experimental_bits_srcdir}/simd_detail.h \
+	${experimental_bits_srcdir}/simd_fixed_size.h \
+	${experimental_bits_srcdir}/simd_math.h \
+	${experimental_bits_srcdir}/simd_neon.h \
+	${experimental_bits_srcdir}/simd_ppc.h \
+	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_x86.h \
+	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
 	${experimental_bits_filesystem_headers}
diff --git a/libstdc++-v3/include/experimental/bits/numeric_traits.h b/libstdc++-v3/include/experimental/bits/numeric_traits.h
new file mode 100644
index 00000000000..1b60874b788
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/numeric_traits.h
@@ -0,0 +1,567 @@
+// Definition of numeric_limits replacement traits P1841R1 -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#include
+
+namespace std {
+
+template