X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=libstdc%2B%2B-v3%2Fdoc%2Fhtml%2Fmanual%2Fbk01pt03ch19s07.html;h=780504cbbc537b8bea66bae2d50f2d9dff6c35c0;hb=f25481f470c2810f6af2a7fcd76e2a0804b5f738;hp=3d3117ab82113d080371f354009898e2f370a759;hpb=604b91240e4f3a1b5c97681b575389addf3dd638;p=gcc.git diff --git a/libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html b/libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html index 3d3117ab821..780504cbbc5 100644 --- a/libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html +++ b/libstdc++-v3/doc/html/manual/bk01pt03ch19s07.html @@ -1,6 +1,6 @@ - -
+ +
The table below presents all the diagnostics we intend to implement.
Each diagnostic has a corresponding compile-time switch
-D_GLIBCXX_PROFILE_<diagnostic>
.
@@ -18,24 +18,24 @@
A high accuracy means that the diagnostic is unlikely to be wrong.
These grades are not perfect. They are just meant to guide users with
specific needs or time budgets.
-
Table 19.2. Profile Diagnostics
Group | Flag | Benefit | Cost | Freq. | Implemented |  |
---|---|---|---|---|---|---|
- CONTAINERS | - HASHTABLE_TOO_SMALL | 10 | 1 |  | 10 | yes |
 | - HASHTABLE_TOO_LARGE | 5 | 1 |  | 10 | yes |
 | - INEFFICIENT_HASH | 7 | 3 |  | 10 | yes |
 | - VECTOR_TOO_SMALL | 8 | 1 |  | 10 | yes |
 | - VECTOR_TOO_LARGE | 5 | 1 |  | 10 | yes |
 | - VECTOR_TO_HASHTABLE | 7 | 7 |  | 10 | no |
 | - HASHTABLE_TO_VECTOR | 7 | 7 |  | 10 | no |
 | - VECTOR_TO_LIST | 8 | 5 |  | 10 | yes |
 | - LIST_TO_VECTOR | 10 | 5 |  | 10 | no |
 | - ORDERED_TO_UNORDERED | 10 | 5 |  | 10 | only map/unordered_map |
- ALGORITHMS | - SORT | 7 | 8 |  | 7 | no |
- LOCALITY | - SOFTWARE_PREFETCH | 8 | 8 |  | 5 | no |
 | - RBTREE_LOCALITY | 4 | 8 |  | 5 | no |
 | - FALSE_SHARING | 8 | 10 |  | 10 | no |
Switch: +
Table 19.2. Profile Diagnostics
Group | Flag | Benefit | Cost | Freq. | Implemented |  |
---|---|---|---|---|---|---|
+ CONTAINERS | + HASHTABLE_TOO_SMALL | 10 | 1 |  | 10 | yes |
 | + HASHTABLE_TOO_LARGE | 5 | 1 |  | 10 | yes |
 | + INEFFICIENT_HASH | 7 | 3 |  | 10 | yes |
 | + VECTOR_TOO_SMALL | 8 | 1 |  | 10 | yes |
 | + VECTOR_TOO_LARGE | 5 | 1 |  | 10 | yes |
 | + VECTOR_TO_HASHTABLE | 7 | 7 |  | 10 | no |
 | + HASHTABLE_TO_VECTOR | 7 | 7 |  | 10 | no |
 | + VECTOR_TO_LIST | 8 | 5 |  | 10 | yes |
 | + LIST_TO_VECTOR | 10 | 5 |  | 10 | no |
 | + ORDERED_TO_UNORDERED | 10 | 5 |  | 10 | only map/unordered_map |
+ ALGORITHMS | + SORT | 7 | 8 |  | 7 | no |
+ LOCALITY | + SOFTWARE_PREFETCH | 8 | 8 |  | 5 | no |
 | + RBTREE_LOCALITY | 4 | 8 |  | 5 | no |
 | + FALSE_SHARING | 8 | 10 |  | 10 | no |
Switch:
_GLIBCXX_PROFILE_<diagnostic>
.
Goal: What problem will it diagnose?
Fundamentals:. @@ -52,10 +52,10 @@ program code ... advice sample
-
Switch:
_GLIBCXX_PROFILE_CONTAINERS
.
-
Switch: +
Switch:
_GLIBCXX_PROFILE_HASHTABLE_TOO_SMALL
.
Goal: Detect hashtables with many rehash operations, small construction size and large destruction size. @@ -81,7 +81,7 @@ advice sample foo.cc:1: advice: Changing initial unordered_set size from 10 to 1000000 saves 1025530 rehash operations.
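The effect this diagnostic looks for can be observed without the profile mode. The following is a minimal sketch, not the instrumentation used by the library: `count_rehashes` is a hypothetical helper that treats every change of `bucket_count()` as one rehash, and the element counts are chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper (not part of libstdc++): inserts n distinct keys into
// an unordered_set constructed with initial_buckets buckets and returns how
// many times the bucket count changed, i.e. how many rehashes were observed.
std::size_t count_rehashes(std::size_t initial_buckets, std::size_t n)
{
  std::unordered_set<std::size_t> s(initial_buckets);
  std::size_t rehashes = 0;
  std::size_t buckets = s.bucket_count();
  for (std::size_t i = 0; i < n; ++i)
    {
      s.insert(i);
      if (s.bucket_count() != buckets)
        {
          // The table grew: every element inserted so far was rehashed.
          ++rehashes;
          buckets = s.bucket_count();
        }
    }
  assert(s.size() == n);
  return rehashes;
}
```

Constructing the set with as many buckets as elements it will eventually hold should bring the count down to zero, which is the change the advice sample suggests.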
-
Switch:
_GLIBCXX_PROFILE_HASHTABLE_TOO_LARGE
.
Goal: Detect hashtables which are never filled up because fewer elements than reserved are ever @@ -110,7 +110,7 @@ foo.cc:1: advice: Changing initial unordered_set size from 10 to 1000000 saves 1 foo.cc:1: advice: Changing initial unordered_set size from 100 to 10 saves N bytes of memory and M iteration steps.
-
Switch:
_GLIBCXX_PROFILE_INEFFICIENT_HASH
.
Goal: Detect hashtables with polarized distribution. @@ -141,7 +141,7 @@ class dumb_hash { hs.find(i); }
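A polarized distribution can be made visible by inspecting bucket sizes. The sketch below is only an illustration in the spirit of the `dumb_hash` class named in the hunk context: `constant_hash` and `longest_chain` are hypothetical names, and the measurement (longest bucket chain) is an assumption, not the cost model used by the diagnostic.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical hash functor: every key hashes to the same value, so all
// elements end up in a single bucket.
struct constant_hash
{
  std::size_t operator()(int) const { return 0; }
};

// Hypothetical helper: inserts the keys 0..n-1 using Hash and returns the
// size of the fullest bucket.
template<typename Hash>
std::size_t longest_chain(int n)
{
  assert(n >= 0);
  std::unordered_set<int, Hash> hs;
  for (int i = 0; i < n; ++i)
    hs.insert(i);

  std::size_t longest = 0;
  for (std::size_t b = 0; b < hs.bucket_count(); ++b)
    if (hs.bucket_size(b) > longest)
      longest = hs.bucket_size(b);
  return longest;
}
```

With `constant_hash` the longest chain equals the number of elements, so every `find` degenerates into a linear search. With `std::hash<int>` the chains stay short.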
-
Switch:
_GLIBCXX_PROFILE_VECTOR_TOO_SMALL
.
Goal: Detect vectors with many resize operations, small construction size and large destruction size. @@ -166,7 +166,7 @@ class dumb_hash { foo.cc:1: advice: Changing initial vector size from 10 to 1000000 saves copying 4000000 bytes and 20 memory allocations and deallocations.
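The reallocations behind this advice can be counted by watching `capacity()`. This is a minimal sketch under stated assumptions: `count_reallocations` is a hypothetical helper, and the vector starts empty rather than with the initial size of 10 used in the advice sample.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of libstdc++): appends n elements and returns
// how many times the capacity changed, i.e. how many reallocations (each one
// copying or moving all existing elements) were observed.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
  std::vector<int> v;
  if (reserve_first)
    v.reserve(n);  // One allocation up front, sized for the final contents.

  std::size_t reallocations = 0;
  std::size_t cap = v.capacity();
  for (std::size_t i = 0; i < n; ++i)
    {
      v.push_back(static_cast<int>(i));
      if (v.capacity() != cap)
        {
          ++reallocations;
          cap = v.capacity();
        }
    }
  assert(v.size() == n);
  return reallocations;
}
```

Calling the helper with `reserve_first` set to `true` should report no reallocations during the loop, which corresponds to the saving the advice describes.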
-
Switch:
_GLIBCXX_PROFILE_VECTOR_TOO_LARGE
Goal: Detect vectors which are never filled up because fewer elements than reserved are ever @@ -192,7 +192,7 @@ copying 4000000 bytes and 20 memory allocations and deallocations. foo.cc:1: advice: Changing initial vector size from 100 to 10 saves N bytes of memory and may reduce the number of cache and TLB misses.
-
Switch:
_GLIBCXX_PROFILE_VECTOR_TO_HASHTABLE
.
Goal: Detect uses of
vector
that can be substituted with unordered_set
@@ -223,7 +223,7 @@ bytes of memory and may reduce the number of cache and TLB misses.
foo.cc:1: advice: Changing "vector" to "unordered_set" will save about 500,000
comparisons.
-
Switch:
_GLIBCXX_PROFILE_HASHTABLE_TO_VECTOR
.
Goal: Detect uses of
unordered_set
that can be substituted with vector
@@ -252,7 +252,7 @@ comparisons.
foo.cc:1: advice: Changing "unordered_set" to "vector" will save about N
indirections and may achieve better data locality.
-
Switch:
_GLIBCXX_PROFILE_VECTOR_TO_LIST
.
Goal: Detect cases where
vector
could be substituted with list
for
@@ -282,7 +282,7 @@ indirections and may achieve better data locality.
foo.cc:1: advice: Changing "vector" to "list" will save about 5,000,000
operations.
-
Switch:
_GLIBCXX_PROFILE_LIST_TO_VECTOR
.
Goal: Detect cases where
list
could be substituted with vector
for
@@ -309,7 +309,7 @@ operations.
foo.cc:1: advice: Changing "list" to "vector" will save about 1000000 indirect
memory references.
-
Switch:
_GLIBCXX_PROFILE_LIST_TO_SLIST
.
Goal: Detect cases where
list
could be substituted with forward_list
for
@@ -339,7 +339,7 @@ memory references.
foo.cc:1: advice: Change "list" to "forward_list".
-
Switch:
_GLIBCXX_PROFILE_ALGORITHMS
.
-
Switch: +
Switch:
_GLIBCXX_PROFILE_SORT
.
Goal: Give measure of sort algorithm performance based on actual input. For instance, advise Radix Sort over Quick Sort for a particular call context.
Fundamentals: See papers: - + A framework for adaptive algorithm selection in STAPL and - + Optimizing Sorting with Machine Learning Algorithms.
Sample runtime reduction: 60%.
Recommendation: Change sort algorithm @@ -389,9 +389,9 @@ foo.cc:1: advice: Change "list" to "forward_list". Runtime(algo) for algo in [radix, quick, merge, ...]
Example:
-
Switch:
_GLIBCXX_PROFILE_LOCALITY
.
-
Switch: +
Switch:
_GLIBCXX_PROFILE_SOFTWARE_PREFETCH
.
Goal: Discover sequences of indirect memory accesses that are not regular, thus cannot be predicted by @@ -434,7 +434,7 @@ foo.cc:1: advice: Change "list" to "forward_list". foo.cc:7: advice: Insert prefetch instruction.
-
Switch:
_GLIBCXX_PROFILE_RBTREE_LOCALITY
.
Goal: Give measure of locality of objects stored in linked structures (lists, red-black trees and hashtables) @@ -442,7 +442,7 @@ foo.cc:7: advice: Insert prefetch instruction.
Fundamentals: Allocation can be tuned to a specific traversal pattern, to result in better data locality. See paper: - + Custom Memory Allocation for Free.
Sample runtime reduction: 30%.
Recommendation: @@ -479,13 +479,13 @@ foo.cc:7: advice: Insert prefetch instruction. foo.cc:5: advice: High scatter score NNN for set built here. Consider changing the allocation sequence or switching to a structure conscious allocator.
-
The diagnostics in this group are not meant to be implemented short term. They require compiler support to know when container elements are written to. Instrumentation can only tell us when elements are referenced.
Switch:
_GLIBCXX_PROFILE_MULTITHREADED
.
-
Switch: +
Switch:
_GLIBCXX_PROFILE_DDTEST
.
Goal: Detect container elements that are referenced from multiple threads in the parallel region or @@ -501,7 +501,7 @@ the allocation sequence or switching to a structure conscious allocator. Keep a shadow for each container. Record iterator dereferences and container member accesses. Issue advice for elements referenced by multiple threads. - See paper: + See paper: The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization.
Cost model: @@ -509,7 +509,7 @@ the allocation sequence or switching to a structure conscious allocator.
Example:
-
Switch:
_GLIBCXX_PROFILE_FALSE_SHARING
.
Goal: Detect elements in the same container which share a cache line, are written by at least one @@ -542,7 +542,7 @@ OMP_NUM_THREADS=2 ./a.out foo.cc:1: advice: Change container structure or padding to avoid false sharing in multithreaded access at foo.cc:4. Detected N shared cache lines.
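One way to follow the padding advice is to over-align the element type so that adjacent elements never share a cache line. The sketch below assumes a 64-byte cache line, which is common but not universal; `assumed_line`, `padded_counter` and `element_stride` are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines. Adjust for the target hardware.
constexpr std::size_t assumed_line = 64;

// Each counter is aligned to, and therefore padded out to, one full cache
// line, so two threads updating neighbouring counters touch different lines.
struct alignas(assumed_line) padded_counter
{
  long value;
};

// Hypothetical helper: distance in bytes between two adjacent elements of a
// container of padded counters.
std::size_t element_stride()
{
  std::array<padded_counter, 2> counters{};
  const char* first = reinterpret_cast<const char*>(&counters[0]);
  const char* second = reinterpret_cast<const char*>(&counters[1]);
  assert(second > first);
  return static_cast<std::size_t>(second - first);
}
```

The cost is memory: every element now occupies a full line, so this trade-off only pays off for elements that are actually written concurrently.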
-
Switch:
_GLIBCXX_PROFILE_STATISTICS
.
@@ -555,4 +555,4 @@ sharing in multithreaded access at foo.cc:4. Detected N shared cache lines. This diagnostic will not issue any advice, but it will print statistics for each container construction site. The statistics will contain the cost of each operation actually performed on the container. -