libgo: reduce overhead for memory/block/mutex profiling
Revise the gccgo version of memory/block/mutex profiling to reduce
runtime overhead. The main change is to collect raw stack traces while
the profile is on line, then post-process the stacks just prior to the
point where we are ready to use the final product. Memory profiling
(at a very low sampling rate) is enabled by default, and the overhead
of the symbolization / DWARF-reading from backtrace_full was slowing
things down relative to the main Go runtime.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/171497
From-SVN: r271172