libstdc++: Fix mask reduction of simd_mask<double> on POWER7
author Matthias Kretz <kretz@kde.org>
Wed, 3 Feb 2021 15:49:30 +0000 (15:49 +0000)
committer Jonathan Wakely <jwakely@redhat.com>
Wed, 3 Feb 2021 15:49:30 +0000 (15:49 +0000)
commit 81c2c32de9c1058c33fcf77ada31186b4ae1f1fe
tree ade43ae42ef8baf375965866e4811c3a871d9389
parent 71f9b9bd0acc7d0749e159efb1b9b4c57197a77d

POWER7 does not support __vector long long reductions, making the
generic _S_popcount implementation ill-formed. Specializing _S_popcount
for PPC allows optimization and avoids the issue.

libstdc++-v3/ChangeLog:

* include/experimental/bits/simd.h: Add __have_power10vec
conditional on _ARCH_PWR10.
* include/experimental/bits/simd_builtin.h: Forward declare
_MaskImplPpc and use it as _MaskImpl when __ALTIVEC__ is
defined.
(_MaskImplBuiltin::_S_some_of): Call _S_popcount from the
_SuperImpl for optimizations and correctness.
* include/experimental/bits/simd_ppc.h: Add _MaskImplPpc.
(_MaskImplPpc::_S_popcount): Implement via vec_cntm for POWER10.
Otherwise, for >=int use -vec_sums divided by a sizeof factor.
For <int use -vec_sums(vec_sum4s(...)) to sum all mask entries.
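The arithmetic behind the two non-POWER10 paths can be illustrated with a
portable scalar sketch. This is not the code from simd_ppc.h: Mask16,
as_words, sum_words and popcount_emulated are hypothetical names, and plain
loops stand in for the AltiVec intrinsics. It assumes a 16-byte mask whose
true lanes are all-ones (-1) and whose false lanes are 0. For lanes of at
least int width, each true lane contributes sizeof(T) / sizeof(int) words of
-1, so the negated word sum is divided by that factor. For narrower lanes,
the lanes are first summed per 32-bit word (the role vec_sum4s plays) and
the four partial sums are then added and negated.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a 16-byte vector mask: true lanes hold -1, false lanes 0.
template <typename T>
using Mask16 = std::array<T, 16 / sizeof(T)>;

// Reinterpret the 16 mask bytes as four 32-bit words.
template <typename T>
std::array<std::int32_t, 4> as_words(const Mask16<T>& k) {
    std::array<std::int32_t, 4> w{};
    std::memcpy(w.data(), k.data(), sizeof(w));
    return w;
}

// Scalar stand-in for the horizontal add across four ints.
inline std::int32_t sum_words(const std::array<std::int32_t, 4>& w) {
    return w[0] + w[1] + w[2] + w[3];
}

template <typename T>
int popcount_emulated(const Mask16<T>& k) {
    static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                  "sketch assumes signed integer mask lanes");
    if constexpr (sizeof(T) >= sizeof(std::int32_t)) {
        // Each true lane spans sizeof(T) / sizeof(int) words of -1.
        const int negated_sum = -sum_words(as_words(k));
        return negated_sum / static_cast<int>(sizeof(T) / sizeof(std::int32_t));
    } else {
        // Sum the narrow lanes within each 32-bit word, then add the partial sums.
        constexpr std::size_t lanes_per_word = sizeof(std::int32_t) / sizeof(T);
        std::array<std::int32_t, 4> partial{};
        for (std::size_t i = 0; i < k.size(); ++i)
            partial[i / lanes_per_word] += k[i];
        return -sum_words(partial);
    }
}
```

With 64-bit lanes, as used for simd_mask<double>, a mask with one true lane
holds two words of -1, so the negated word sum is 2 and dividing by the
sizeof factor 2 gives 1.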
libstdc++-v3/include/experimental/bits/simd.h
libstdc++-v3/include/experimental/bits/simd_builtin.h
libstdc++-v3/include/experimental/bits/simd_ppc.h