perf(profiling): improve scaling of memory profiler for large heaps (#13317)
The heap profiler component of memalloc uses a plain array to track
allocations. It performs a linear scan of this array on every free to
see if the freed allocation is tracked. This means the cost of freeing
an allocation increases substantially for programs with large heaps.
Switch to a hash table for tracking allocations. This should have roughly
constant per-free overhead even for larger heaps, where we track more allocations.
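For intuition, here is a minimal before/after sketch of the free path. Every identifier below (`tracked_alloc_t`, `alloc_map_erase`, and so on) is a hypothetical stand-in, not memalloc's actual code; the point is just the O(n) scan versus a single keyed lookup.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not memalloc's actual identifiers. */
typedef struct {
  void* addr;
  /* ... traceback pointer, size, etc. ... */
} tracked_alloc_t;

/* Before: every free scans the whole array, so the cost grows with the number
 * of tracked allocations (i.e. with heap size). */
static int untrack_with_array(tracked_alloc_t* allocs, size_t* count, void* addr) {
  for (size_t i = 0; i < *count; i++) {
    if (allocs[i].addr == addr) {
      allocs[i] = allocs[--*count]; /* swap-remove the matching entry */
      return 1;
    }
  }
  return 0; /* this address wasn't tracked */
}

/* After: one hash-table erase keyed by address, with roughly constant cost no
 * matter how many allocations are tracked. `alloc_map_erase` is hypothetical. */
typedef struct alloc_map alloc_map;
int alloc_map_erase(alloc_map* map, void* addr); /* returns 1 if an entry was removed */

static int untrack_with_table(alloc_map* map, void* addr) {
  return alloc_map_erase(map, addr);
}
```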
This PR uses the C implementation of the [Abseil
SwissTable](https://abseil.io/blog/20180927-swisstables)
from https://github.com/google/cwisstable. This hash table is designed
for memory efficiency, both in how much memory it uses (1 byte of
metadata per entry) and in how it's accessed (cache-friendly layout). The table
is designed in particular for fast lookup and insertion. We do a lookup
for every single free, so it's important that lookups are fast.
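As a rough sketch of what using cwisstable looks like, assuming the declared-container style of API from the project's README: `CWISS_DECLARE_FLAT_HASHSET` and the generated `_new` / `_insert` / `_find` / `_Iter_get` / `_destroy` functions are my reading of that API, not code quoted from this PR.

```c
#include <assert.h>
#include "cwisstable.h"

/* Sketch only: a flat hash set keyed by allocation address. The generated
 * function names follow my reading of cwisstable's README and may differ
 * slightly from what this PR actually uses. */
CWISS_DECLARE_FLAT_HASHSET(AddrSet, void*);

void addr_set_example(void) {
  AddrSet set = AddrSet_new(64);

  void* addr = (void*)0x1000;
  AddrSet_insert(&set, &addr);

  /* The lookup that would happen on every free. */
  AddrSet_Iter it = AddrSet_find(&set, &addr);
  assert(AddrSet_Iter_get(&it) != NULL);

  AddrSet_destroy(&set);
}
```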
This PR previously tried to use a C++ `std::unordered_map`, but it
proved far too difficult to integrate with the existing C `memalloc`
implementation.
This PR includes the single `cwisstable.h` header with the following
modifications, which can be found in the header by searching for
`BEGIN MODIFICATION`:
- Fix `CWISS_Mul128`: 64-bit Windows needs a different intrinsic, and
32-bit systems can't use it at all (see the sketch below)
- Fix a multiple declaration and invalid C++ syntax in `CWISS_LeadingZeroes64`
for 32-bit Windows
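For context on the first fix, here is a minimal sketch (not the PR's actual patch) of why a 64x64-to-128-bit multiply needs per-platform handling: MSVC on 64-bit Windows has no `__int128` and uses the `_umul128` intrinsic, GCC/Clang expose `unsigned __int128`, and 32-bit targets need a manual fallback.

```c
#include <stdint.h>
#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>
#endif

/* Illustrative sketch: 64x64 -> 128-bit multiply, returning the low 64 bits
 * and writing the high 64 bits to *hi. Not the exact code in cwisstable.h. */
static inline uint64_t mul128_sketch(uint64_t a, uint64_t b, uint64_t* hi) {
#if defined(_MSC_VER) && defined(_M_X64)
  /* MSVC on 64-bit Windows: no __int128, so use the _umul128 intrinsic. */
  return _umul128(a, b, hi);
#elif defined(__SIZEOF_INT128__)
  /* GCC/Clang on 64-bit targets: the compiler provides a 128-bit integer type. */
  unsigned __int128 p = (unsigned __int128)a * b;
  *hi = (uint64_t)(p >> 64);
  return (uint64_t)p;
#else
  /* 32-bit targets: neither option is available; build the product from
   * 32-bit limbs by hand. */
  uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
  uint64_t p0 = a_lo * b_lo;
  uint64_t p1 = a_lo * b_hi;
  uint64_t p2 = a_hi * b_lo;
  uint64_t p3 = a_hi * b_hi;
  uint64_t carry = ((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32;
  *hi = p3 + (p1 >> 32) + (p2 >> 32) + carry;
  return p0 + (p1 << 32) + (p2 << 32);
#endif
}
```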
The PR also supplies a 32-bit compatible hash function since the
one in `cwisstable.h` couldn't be easily made to work. We use `xxHash`
instead. Note that `cwisstable.h` requires the hash function to return
a `size_t`, which is 32 bits on 32-bit platforms, but the SwissTable design
expects 64-bit hashes. I don't know if this is a huge problem in practice
or if people even use our profiler on 32-bit systems, but it's something to
watch out for.
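To make the width concern concrete, here is a minimal sketch of hashing an allocation address with xxHash. The wrapper name is hypothetical and this is not the PR's actual hash function; it just shows where the 64-to-32-bit truncation happens.

```c
#include <stddef.h>
#include <stdint.h>
#include "xxhash.h"

/* Hypothetical wrapper for illustration. cwisstable expects a size_t-sized
 * hash, so on a 32-bit platform the 64-bit xxHash value is truncated and the
 * table effectively works with 32-bit hashes. */
static size_t hash_alloc_address(const void* addr) {
  uint64_t h = XXH3_64bits(&addr, sizeof(addr));
  return (size_t)h; /* drops the top 32 bits when size_t is 32 bits wide */
}
```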
Note that the `cwisstable.h` hash table implementation basically never
gets rid of backing memory unless we completely clear the table. This
is not a big deal in practice. Each entry in the table is an 8-byte `void*` key
and an 8-byte `traceback_t*` value, plus 1 byte of control metadata. If we assume
a target load factor of 50%, and we keep the existing `2^16` element cap,
then a completely full table is only about 2 MiB. Most of the heap profiler
memory usage comes from the tracebacks themselves, which we _do_
free when they're removed from the profiler structures.
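To spell out the table-size arithmetic: at a 50% load factor, a `2^16`-entry cap means roughly `2^17` slots, and `2^17 × (8 + 8 + 1)` bytes ≈ 2.2 MB, i.e. about 2.1 MiB, which is where the roughly 2 MiB figure above comes from.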
This PR also fixes a potential memory leak. Since we use try-locks, it is theoretically
possible to fail to un-track an allocation if we can't acquire the lock.
When this happens, the previous implementation would just leak an entry.
With the hash table, if we find an existing entry for a given address when tracking
a new allocation, we can free the old traceback and replace it with the new one.
The real fix will be to rework how we handle locking, but this PR improves the situation.
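A minimal sketch of that replace-on-collision behavior; every identifier here (`alloc_table_t`, `table_find`, `traceback_free`, and so on) is a hypothetical stand-in rather than the PR's actual code.

```c
#include <stddef.h>

/* Hypothetical types and helpers, for illustration only. */
typedef struct traceback_t traceback_t;
typedef struct {
  void* addr;
  traceback_t* traceback;
} alloc_entry_t;
typedef struct alloc_table_t alloc_table_t;

alloc_entry_t* table_find(alloc_table_t* table, void* addr);          /* lookup by address */
void table_insert(alloc_table_t* table, void* addr, traceback_t* tb); /* add a new entry */
void traceback_free(traceback_t* tb);                                 /* release a traceback */

/* Track a new allocation. If a previous entry for this address was never
 * un-tracked (e.g. because a try-lock failed on free), free the stale
 * traceback and reuse the slot instead of leaking it. */
static void track_allocation_sketch(alloc_table_t* table, void* addr, traceback_t* tb) {
  alloc_entry_t* existing = table_find(table, addr);
  if (existing != NULL) {
    traceback_free(existing->traceback);
    existing->traceback = tb;
  } else {
    table_insert(table, addr, tb);
  }
}
```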
I've tested this locally and seen very good performance even for larger heaps,
and this has also been tested in #13307.
Fixes #13307