From: Benjamin Kosnik Date: Thu, 20 Mar 2008 14:20:49 +0000 (+0000) Subject: re PR libstdc++/35256 (Bad link on http://gcc.gnu.org/onlinedocs/libstdc++/parallel_m... X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=1285e2a25db39ca03eb0c0474a5d03c5a12782b4;p=gcc.git re PR libstdc++/35256 (Bad link on gcc.gnu.org/onlinedocs/libstdc++/parallel_mode.html) 2008-03-19 Benjamin Kosnik PR libstdc++/35256 * doc/xml/manual/parallel_mode.xml: Correct configuration documentation. * doc/html/manual/bk01pt12ch31s04.html: Regenerate. From-SVN: r133378 --- diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog index 996bed982f8..d794b80684b 100644 --- a/libstdc++-v3/ChangeLog +++ b/libstdc++-v3/ChangeLog @@ -1,3 +1,9 @@ +2008-03-19 Benjamin Kosnik + + PR libstdc++/35256 + * doc/xml/manual/parallel_mode.xml: Correct configuration documentation. + * doc/html/manual/bk01pt12ch31s04.html: Regenerate. + 2008-03-18 Benjamin Kosnik * configure.ac (libtool_VERSION): To 6:11:0. diff --git a/libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html b/libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html index 99c1356d85a..3db7d912e63 100644 --- a/libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html +++ b/libstdc++-v3/doc/html/manual/bk01pt12ch31s04.html @@ -1,9 +1,10 @@ Design

Design

-

Interface Basics

All parallel algorithms are intended to have signatures that are +

Interface Basics

+All parallel algorithms are intended to have signatures that are equivalent to the ISO C++ algorithms replaced. For instance, the -std::adjacent_find function is declared as: +std::adjacent_find function is declared as:

 namespace std
 {
@@ -57,36 +58,124 @@ parallel algorithms look like this:
 ISO C++ signature to the correct parallel version. Also, some of the
 algorithms do not have support for run-time conditions, so the last
 overload is therefore missing.
-

Configuration and Tuning

Some algorithm variants can be enabled/disabled/selected at compile-time. -See -<compiletime_settings.h> and -See -<features.h> for details. +

Configuration and Tuning

Setting up the OpenMP Environment

+Several aspects of the overall runtime environment can be manipulated +by standard OpenMP function calls.

-To specify the number of threads to be used for an algorithm, -use omp_set_num_threads. -To force a function to execute sequentially, -even though parallelism is switched on in general, -add __gnu_parallel::sequential_tag() -to the end of the argument list. +To specify the number of threads to be used for an algorithm, use the +function omp_set_num_threads. An example: +

+#include <stdlib.h>
+#include <omp.h>
+
+int main()
+{
+  // Explicitly set number of threads.
+  const int threads_wanted = 20;
+  omp_set_dynamic(false);
+  omp_set_num_threads(threads_wanted);
+  if (omp_get_num_threads() != threads_wanted)
+    abort();
+
+  // Do work.
+
+  return 0;
+}
+

+Other parts of the runtime environment able to be manipulated include +nested parallelism (omp_set_nested), schedule kind +(omp_set_schedule), and others. See the OpenMP +documentation for more information. +
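Alongside the API calls above, standard OpenMP (not anything libstdc++-specific) also lets the initial thread count be set from the environment, with no recompilation. A minimal sketch; the variable is ordinary environment data, as a child shell shows:

```shell
# OMP_NUM_THREADS is the standard OpenMP way to request the initial
# thread count without calling omp_set_num_threads.  A child process
# inherits and can read it like any other environment variable:
OMP_NUM_THREADS=4 sh -c 'echo "requested threads: $OMP_NUM_THREADS"'
```

In practice one would run the parallel-mode binary itself under this variable, e.g. `OMP_NUM_THREADS=4 ./a.out` (binary name is a placeholder).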

Compile Time Switches

+To force an algorithm to execute sequentially, even though parallelism
+is switched on in general via the macro _GLIBCXX_PARALLEL,
+add __gnu_parallel::sequential_tag() to the end
+of the algorithm's argument list, or explicitly qualify the algorithm
+with the __gnu_serial:: namespace.
+

+Like so: +

+std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
+

+or +

+__gnu_serial::sort(v.begin(), v.end());
+

+In addition, some parallel algorithm variants can be enabled/disabled/selected +at compile-time. +

+See compiletime_settings.h and
+features.h for details.
+

Run Time Settings and Defaults

+The default parallelization strategy, the choice of specific algorithm
+strategy, the minimum threshold limits for individual parallel
+algorithms, and aspects of the underlying hardware can be specified as
+desired via manipulation
+of __gnu_parallel::_Settings member data.

-Parallelism always incurs some overhead. Thus, it is not -helpful to parallelize operations on very small sets of data. -There are measures to avoid parallelizing stuff that is not worth it. -For each algorithm, a minimum problem size can be stated, -usually using the variable -__gnu_parallel::Settings::[algorithm]_minimal_n. -Please see -<settings.h> for details.

Implementation Namespaces

One namespace contain versions of code that are explicitly sequential:
+First off, the choice of parallelization strategy: serial, parallel,
+or implementation-deduced. This corresponds
+to __gnu_parallel::_Settings::algorithm_strategy and is a
+value of enum __gnu_parallel::_AlgorithmStrategy
+type. Choices
+include: heuristic, force_sequential,
+and force_parallel. The default is
+implementation-deduced, i.e. heuristic.
+

+Next, the sub-choices for algorithm implementation. Specific
+algorithms like find or sort
+can be implemented in multiple ways: when this is the case,
+a __gnu_parallel::_Settings member exists to
+pick the default strategy. For
+example, __gnu_parallel::_Settings::sort_algorithm can
+have any value of
+enum __gnu_parallel::_SortAlgorithm: MWMS, QS,
+or QS_BALANCED.
+

+Likewise for setting the minimal threshold for algorithm
+parallelization. Parallelism always incurs some overhead. Thus, it is
+not helpful to parallelize operations on very small sets of
+data. Because of this, measures are taken to avoid parallelizing below
+a certain, pre-determined threshold. For each algorithm, a minimum
+problem size is encoded as a variable in the
+active __gnu_parallel::_Settings object. This
+threshold variable follows the naming scheme
+__gnu_parallel::_Settings::[algorithm]_minimal_n. So,
+for fill, the threshold variable
+is __gnu_parallel::_Settings::fill_minimal_n.
+

+Finally, hardware details like L1/L2 cache size can be hardwired +via __gnu_parallel::_Settings::L1_cache_size and friends. +

+All these configuration variables can be changed by the user, if +desired. Please +see settings.h +for complete details. +

+A small example of tuning the default: +

+#include <parallel/algorithm>
+#include <parallel/settings.h>
+
+int main()
+{
+  __gnu_parallel::_Settings s;
+  s.algorithm_strategy = __gnu_parallel::force_parallel;
+  __gnu_parallel::_Settings::set(s);
+
+  // Do work... all algorithms will be parallelized, always.
+
+  return 0;
+}
+

Implementation Namespaces

One namespace contains versions of code that are always
+explicitly sequential:
 __gnu_serial.

Two namespaces contain the parallel mode: std::__parallel and __gnu_parallel.

Parallel implementations of standard components, including template helpers to select parallelism, are defined in namespace -std::__parallel. For instance, std::transform from -<algorithm> has a parallel counterpart in -std::__parallel::transform from -<parallel/algorithm>. In addition, these parallel +std::__parallel. For instance, std::transform from algorithm has a parallel counterpart in +std::__parallel::transform from parallel/algorithm. In addition, these parallel implementations are injected into namespace __gnu_parallel with using declarations.

Support and general infrastructure is in namespace diff --git a/libstdc++-v3/doc/xml/manual/parallel_mode.xml b/libstdc++-v3/doc/xml/manual/parallel_mode.xml index 4236f63c8b1..0bcbbcab04d 100644 --- a/libstdc++-v3/doc/xml/manual/parallel_mode.xml +++ b/libstdc++-v3/doc/xml/manual/parallel_mode.xml @@ -28,7 +28,7 @@ implementation of many algorithms the C++ Standard Library. Several of the standard algorithms, for instance -std::sort, are made parallel using OpenMP +std::sort, are made parallel using OpenMP annotations. These parallel mode constructs and can be invoked by explicit source declaration or by compiling existing sources with a specific compiler flag. @@ -39,52 +39,52 @@ specific compiler flag. Intro The following library components in the include -<numeric> are included in the parallel mode: +numeric are included in the parallel mode: - std::accumulate - std::adjacent_difference - std::inner_product - std::partial_sum + std::accumulate + std::adjacent_difference + std::inner_product + std::partial_sum The following library components in the include -<algorithm> are included in the parallel mode: +algorithm are included in the parallel mode: - std::adjacent_find - std::count - std::count_if - std::equal - std::find - std::find_if - std::find_first_of - std::for_each - std::generate - std::generate_n - std::lexicographical_compare - std::mismatch - std::search - std::search_n - std::transform - std::replace - std::replace_if - std::max_element - std::merge - std::min_element - std::nth_element - std::partial_sort - std::partition - std::random_shuffle - std::set_union - std::set_intersection - std::set_symmetric_difference - std::set_difference - std::sort - std::stable_sort - std::unique_copy + std::adjacent_find + std::count + std::count_if + std::equal + std::find + std::find_if + std::find_first_of + std::for_each + std::generate + std::generate_n + std::lexicographical_compare + std::mismatch + std::search + std::search_n + std::transform + 
std::replace + std::replace_if + std::max_element + std::merge + std::min_element + std::nth_element + std::partial_sort + std::partition + std::random_shuffle + std::set_union + std::set_intersection + std::set_symmetric_difference + std::set_difference + std::sort + std::stable_sort + std::unique_copy The following library components in the includes -<set> and <map> are included in the parallel mode: +set and map are included in the parallel mode: std::(multi_)map/set<T>::(multi_)map/set(Iterator begin, Iterator end) (bulk construction) std::(multi_)map/set<T>::insert(Iterator begin, Iterator end) (bulk insertion) @@ -113,23 +113,25 @@ It might work with other compilers, though. Using Parallel Mode -To use the libstdc++ parallel mode, compile your application with - the compiler flag -D_GLIBCXX_PARALLEL -fopenmp. This + + To use the libstdc++ parallel mode, compile your application with + the compiler flag -D_GLIBCXX_PARALLEL -fopenmp. This will link in libgomp, the GNU OpenMP implementation, whose presence is mandatory. In addition, hardware capable of atomic operations is mandatory. Actually activating these atomic operations may require explicit compiler flags on some targets - (like sparc and x86), such as -march=i686, - -march=native or -mcpu=v9. + (like sparc and x86), such as -march=i686, + -march=native or -mcpu=v9. -Note that the _GLIBCXX_PARALLEL define may change the +Note that the _GLIBCXX_PARALLEL define may change the sizes and behavior of standard class templates such as - std::search, and therefore one can only link code + std::search, and therefore one can only link code compiled with parallel mode and code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage, - and cannot be confused with normal mode symbols. + and cannot be confused with normal mode symbols. + @@ -420,9 +422,10 @@ It might work with other compilers, though. 
Interface Basics -All parallel algorithms are intended to have signatures that are + +All parallel algorithms are intended to have signatures that are equivalent to the ISO C++ algorithms replaced. For instance, the -std::adjacent_find function is declared as: +std::adjacent_find function is declared as: namespace std @@ -506,39 +509,176 @@ overload is therefore missing. Configuration and Tuning - Some algorithm variants can be enabled/disabled/selected at compile-time. -See -<compiletime_settings.h> and -See -<features.h> for details. + + + Setting up the OpenMP Environment + + +Several aspects of the overall runtime environment can be manipulated +by standard OpenMP function calls. + + + +To specify the number of threads to be used for an algorithm, use the +function omp_set_num_threads. An example: + + + +#include <stdlib.h> +#include <omp.h> + +int main() +{ + // Explicitly set number of threads. + const int threads_wanted = 20; + omp_set_dynamic(false); + omp_set_num_threads(threads_wanted); + if (omp_get_num_threads() != threads_wanted) + abort(); + + // Do work. + + return 0; +} + + + +Other parts of the runtime environment able to be manipulated include +nested parallelism (omp_set_nested), schedule kind +(omp_set_schedule), and others. See the OpenMP +documentation for more information. + + + + + + Compile Time Switches + + +To force an algorithm to execute sequentially, even though parallelism +is switched on in general via the macro _GLIBCXX_PARALLEL, +add __gnu_parallel::sequential_tag() to the end +of the algorithm's argument list, or explicitly qualify the algorithm +with the __gnu_parallel:: namespace. + + + +Like so: + + + +std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag()); + + + +or + + + +__gnu_serial::sort(v.begin(), v.end()); + + + +In addition, some parallel algorithm variants can be enabled/disabled/selected +at compile-time. + + + +See compiletime_settings.h and +See features.h for details. 
+
+
+
+
+ Run Time Settings and Defaults
+
+
+The default parallelization strategy, the choice of specific algorithm
+strategy, the minimum threshold limits for individual parallel
+algorithms, and aspects of the underlying hardware can be specified as
+desired via manipulation
+of __gnu_parallel::_Settings member data.
+
+
+
+First off, the choice of parallelization strategy: serial, parallel,
+or implementation-deduced. This corresponds
+to __gnu_parallel::_Settings::algorithm_strategy and is a
+value of enum __gnu_parallel::_AlgorithmStrategy
+type. Choices
+include: heuristic, force_sequential,
+and force_parallel. The default is
+implementation-deduced, i.e. heuristic.
+
+
+
+
+Next, the sub-choices for algorithm implementation. Specific
+algorithms like find or sort
+can be implemented in multiple ways: when this is the case,
+a __gnu_parallel::_Settings member exists to
+pick the default strategy. For
+example, __gnu_parallel::_Settings::sort_algorithm can
+have any value of
+enum __gnu_parallel::_SortAlgorithm: MWMS, QS,
+or QS_BALANCED.
+
+
+
+Likewise for setting the minimal threshold for algorithm
+parallelization. Parallelism always incurs some overhead. Thus, it is
+not helpful to parallelize operations on very small sets of
+data. Because of this, measures are taken to avoid parallelizing below
+a certain, pre-determined threshold. For each algorithm, a minimum
+problem size is encoded as a variable in the
+active __gnu_parallel::_Settings object. This
+threshold variable follows the naming scheme
+__gnu_parallel::_Settings::[algorithm]_minimal_n. So,
+for fill, the threshold variable
+is __gnu_parallel::_Settings::fill_minimal_n.
-To specify the number of threads to be used for an algorithm,
-use omp_set_num_threads.
-To force a function to execute sequentially,
-even though parallelism is switched on in general,
-add __gnu_parallel::sequential_tag()
-to the end of the argument list.
+Finally, hardware details like L1/L2 cache size can be hardwired +via __gnu_parallel::_Settings::L1_cache_size and friends. -Parallelism always incurs some overhead. Thus, it is not -helpful to parallelize operations on very small sets of data. -There are measures to avoid parallelizing stuff that is not worth it. -For each algorithm, a minimum problem size can be stated, -usually using the variable -__gnu_parallel::Settings::[algorithm]_minimal_n. -Please see -<settings.h> for details. +All these configuration variables can be changed by the user, if +desired. Please +see settings.h +for complete details. + + + +A small example of tuning the default: + + + +#include <parallel/algorithm> +#include <parallel/settings.h> + +int main() +{ + __gnu_parallel::_Settings s; + s.algorithm_strategy = __gnu_parallel::force_parallel; + __gnu_parallel::_Settings::set(s); + + // Do work... all algorithms will be parallelized, always. + + return 0; +} + + Implementation Namespaces - One namespace contain versions of code that are explicitly sequential: + One namespace contain versions of code that are always +explicitly sequential: __gnu_serial. @@ -548,10 +688,8 @@ Please see Parallel implementations of standard components, including template helpers to select parallelism, are defined in namespace -std::__parallel. For instance, std::transform from -<algorithm> has a parallel counterpart in -std::__parallel::transform from -<parallel/algorithm>. In addition, these parallel +std::__parallel. For instance, std::transform from algorithm has a parallel counterpart in +std::__parallel::transform from parallel/algorithm. In addition, these parallel implementations are injected into namespace __gnu_parallel with using declarations. @@ -588,7 +726,7 @@ the generated source documentation. The log and summary files for conformance testing are in the - testsuite/parallel directory. + testsuite/parallel directory. @@ -596,13 +734,13 @@ the generated source documentation. 
- check-performance-parallel + make check-performance-parallel The result file for performance testing are in the - testsuite directory, in the file - libstdc++_performance.sum. In addition, the + testsuite directory, in the file + libstdc++_performance.sum. In addition, the policy-based containers have their own visualizations, which have additional software dependencies than the usual bare-boned text file, and can be generated by using the make