libstdc++: Reduce monotonic_buffer_resource overallocation [PR 96942]

The primary reason for this change is to reduce the size of buffers
allocated by std::pmr::monotonic_buffer_resource. Previously, a new
buffer would always add the size of the linked list node (11 bytes) and
then round up to the next power of two. This almost doubles the
allocation if the expected size of the next buffer is already a power of
two. For example, if the resource is constructed with a desired initial
size of 4096, the first buffer it allocates will be
std::bit_ceil(4096+11), which
is 8192. If the user has carefully selected the initial size to match
their expected memory requirements then allocating double that amount
wastes a lot of memory.
After this patch the allocated size will be rounded up to a 64-byte
boundary, instead of to a power of two. This means for an initial size
of 4096 only 4160 bytes get allocated.
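
The arithmetic above can be checked with a small sketch. The names
next_pow2, round_up, old_buffer_size and new_buffer_size are hypothetical
and not part of the patch. The 16-byte header used for the new policy is
an assumption for illustration; any header of 1 to 64 bytes gives the
same 4160 result for a 4096-byte request.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next power of two (same result as C++20
// std::bit_ceil for the non-zero values used here).
constexpr std::size_t next_pow2(std::size_t n)
{
  std::size_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Round n up to a multiple of align, which must be a power of two.
constexpr std::size_t round_up(std::size_t n, std::size_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return (n + align - 1) & ~(align - 1);
}

// Old policy: add the 11-byte list node, then round up to a power of two.
constexpr std::size_t old_buffer_size(std::size_t requested)
{
  return next_pow2(requested + 11);
}

// New policy: add a per-buffer header, then round up to a multiple of 64.
// The default header_size is an assumption made for this sketch.
constexpr std::size_t new_buffer_size(std::size_t requested,
                                      std::size_t header_size = 16)
{
  return round_up(requested + header_size, 64);
}

static_assert(old_buffer_size(4096) == 8192, "old policy doubles 4096");
static_assert(new_buffer_size(4096) == 4160, "new policy adds one 64-byte step");
```
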
Previously only the base-2 logarithm of the size was stored, which fits
in a single 8-bit integer. Now that the size isn't always a
power of two we need to use more bits to store it. As the size is always
a multiple of 64 the low six bits are not needed, and so we can use the
same approach that the pool resources already use of storing the base-2
logarithm of the alignment in the low bits that are not used for the
size. To avoid code duplication, a new aligned_size<N> helper class is
introduced by this patch, which is then used by both the pool resources'
big_block type and the monotonic_buffer_resource::_Chunk type.
Originally the big_block type used two bit-fields to store the size and
alignment in the space of a single size_t member. The aligned_size type
uses a single size_t member and uses masks and bitwise operations to
manipulate the size and alignment values. This results in better code
than the old version, because the bit-fields weren't optimally ordered
for little endian architectures, so the alignment was actually stored in
the high bits, not the unused low bits, requiring additional shifts to
calculate the values. Using bitwise operations directly avoids needing
to reorder the bit-fields depending on the endianness.
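
As a rough illustration of the packing scheme described above, the
following sketch stores a size that is a multiple of N in the high bits
of one size_t and the base-2 logarithm of the alignment in the low bits.
The packed_size name and its members are hypothetical. This is not the
aligned_size<N> code from the patch, only a minimal model of the idea.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an aligned_size<N>-style helper.
// N must be a power of two; sizes must be multiples of N.
template<std::size_t N>
struct packed_size
{
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

  static constexpr std::size_t align_mask = N - 1;       // low bits
  static constexpr std::size_t size_mask = ~align_mask;  // high bits

  packed_size(std::size_t size, std::size_t align)
  : value(size | log2_of(align))
  {
    assert((size & align_mask) == 0);                  // multiple of N
    assert(align != 0 && (align & (align - 1)) == 0);  // power of two
    assert(log2_of(align) <= align_mask);              // fits in low bits
  }

  // Mask off the low bits to recover the size, no shift needed.
  std::size_t size() const { return value & size_mask; }

  // The low bits hold log2(alignment).
  std::size_t alignment() const
  { return std::size_t(1) << (value & align_mask); }

private:
  // Base-2 logarithm of a power of two.
  static std::size_t log2_of(std::size_t pow2)
  {
    std::size_t n = 0;
    while (pow2 >>= 1)
      ++n;
    return n;
  }

  std::size_t value;
};
```

With N = 64 the low six bits can hold any log2 value up to 63, which
covers every alignment representable in a 64-bit size_t.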
While adapting the _Chunk and big_block types to use aligned_size<N> I
also added checks for size overflows (technically, unsigned wraparound).
The memory resources now ensure that when they require an allocation
that is too large to represent in size_t they will request SIZE_MAX
bytes from the upstream resource, rather than requesting a small value
that results from wraparound. The testsuite is enhanced to verify this.
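
The kind of wraparound check described above might look like the
following sketch. The checked_alloc_size function is hypothetical and
only models the saturating behaviour; the patch's own checks live in the
_Chunk and big_block code. The sketch assumes header and align are small
enough that header + align - 1 cannot itself wrap.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Add a header to bytes and round up to a multiple of align (a power of
// two). If the result cannot be represented in size_t, return the
// maximum size_t value instead of a small wrapped-around value.
inline std::size_t checked_alloc_size(std::size_t bytes,
                                      std::size_t header,
                                      std::size_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  const std::size_t max = std::numeric_limits<std::size_t>::max();
  const std::size_t extra = header + (align - 1);
  if (bytes > max - extra)  // bytes + extra would wrap around
    return max;
  return (bytes + extra) & ~(align - 1);
}
```

Requesting the maximum value lets the upstream resource fail the
allocation, rather than succeed with a buffer that is far too small.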
libstdc++-v3/ChangeLog:
PR libstdc++/96942
* include/std/memory_resource (monotonic_buffer_resource::do_allocate):
Use __builtin_expect when checking if a new buffer needs to be
allocated from the upstream resource, and for checks for edge
cases like zero sized buffers and allocations.
* src/c++17/memory_resource.cc (aligned_size): New class template.
(aligned_ceil): New helper function to round up to a given
alignment.
(monotonic_buffer_resource::chunk): Replace _M_size and _M_align
with an aligned_size member. Remove _M_canary member. Change _M_next
to pointer instead of unaligned buffer.
(monotonic_buffer_resource::chunk::allocate): Round up to multiple
of 64 instead of to power of two. Check for size overflow. Remove
redundant check for minimum required alignment.
(monotonic_buffer_resource::chunk::release): Adjust for changes
to data members.
(monotonic_buffer_resource::_M_new_buffer): Use aligned_ceil.
(big_block): Replace _M_size and _M_align with aligned_size
member.
(big_block::big_block): Check for size overflow.
(big_block::size, big_block::align): Adjust to use aligned_size.
(big_block::alloc_size): Use aligned_ceil.
(munge_options): Use aligned_ceil.
(__pool_resource::allocate): Use big_block::align for alignment.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc: Check
upstream resource gets expected values for impossible sizes.
* testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
Likewise. Adjust checks for expected alignment in existing test.