This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Use a principled, and consistent, implementation of freelists. #89
Comments
FWIW, I tried a similar idea, but I couldn't optimize it enough.
Could this be done as a feature of pymalloc? pymalloc already has a number of the features you're looking for, particularly mapping from size to size-class-specific metadata.
@methane how did it compare? As long as it was no slower, it would be worth it just for the reduction in code size. @kmod, it is a bit arbitrary whether the free lists go inside the allocator or on top of it.
Cool idea! A quick look at Include/internal/pycore_interp.h shows the following freelists:
So all of these would change to use the new mechanism, right? What is the likelihood that any of these share a size? Other questions:
I updated my legacy global-freelist to the current main branch.
Inspired by Sam's nogil effort, I started to play with https://github.com/microsoft/mimalloc as a replacement for libc's malloc and pymalloc. It comes with built-in freelist support and freelist sharding. If we move to mimalloc, then we get lock-free freelists for free (three times free for free)! Pablo ran benchmarks for me:
The first case is slightly slower than vanilla Python. It is a surprisingly good result for a first attempt without any tuning.
Isn't this already done at a low level in obmalloc?
Yes. obmalloc has a "freelist" in each pool. It is very similar to the "freelist" in mimalloc.
So "mimalloc has a freelist" is not enough reason to say "Python doesn't need a freelist in front of malloc/free".
A prototype version of a more principled approach to free lists shows a 1% speedup.
Extending this beyond ints and floats is a pain. Adopting an allocator, such as
How much would replacing pymalloc with mimalloc affect the binary size?
We currently have many ad hoc, per-class free lists.
We should consider changing freelists from per-class to per-size.
My informal experiments show that currently about 50% of allocations are served from free lists, whereas with per-size free lists this increases to about 90%.
With tuning, we should expect a hit rate well over 90%.
Using free lists for blocks of memory, instead of objects, means that there are no interactions between the free lists and the cycle GC, simplifying both. There is a small additional cost of re-assigning the class, but the cost of this will be negligible in most cases.
Design principles
With the above, allocation becomes:
The functions mentioned above must be small and pure so that:
In practice, we will probably inline most of the above function calls; it just helps to think of the parts separately when designing.
Implementation of the freelist.
One possible design of the free lists is:
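Each free list consists of three elements: a pointer to the head of the list, the remaining space (an integer), and the capacity (an integer).
For alignment reasons the integers should be half the size of a pointer, which limits the capacity to 64k on a 32-bit machine, which is plenty.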
The actual list is implemented as a linked list through the blocks of memory themselves, treating the first word of the memory as a pointer to the next entry.
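A free list can be disabled, for example while tracemalloc is active: by setting the head to NULL and the remaining space to zero, no allocations will be served from the free list. Should tracemalloc be turned off, the remaining space can be set back to the capacity.
The tests for whether a free list is full or empty are freelist->remaining == 0 and freelist->head == NULL, respectively.
Adding a block to the list:
    new_entry->next = freelist->head; freelist->head = new_entry; freelist->remaining--;
Removing a block from the list:
    result = freelist->head; freelist->head = result->next; freelist->remaining++;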
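The layout and the push/pop operations described above could be sketched in C roughly as follows. This is only an illustration: the names fl_count_t, fl_entry, freelist_init, freelist_push, freelist_pop, freelist_disable and freelist_enable are hypothetical and not part of CPython, and the sketch assumes every block is at least one pointer in size and suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; this is not CPython's actual API. */
#if UINTPTR_MAX > 0xFFFFFFFFu
typedef uint32_t fl_count_t;   /* half of a 64-bit pointer */
#else
typedef uint16_t fl_count_t;   /* half of a 32-bit pointer: at most 65535 */
#endif

/* A free block is reinterpreted as a link: its first word points to the next. */
typedef struct fl_entry {
    struct fl_entry *next;
} fl_entry;

typedef struct {
    fl_entry  *head;        /* first free block, or NULL when empty */
    fl_count_t remaining;   /* how many more blocks the list may accept */
    fl_count_t capacity;    /* maximum number of blocks held */
} freelist;

/* Two half-pointer integers pack next to the pointer without padding. */
_Static_assert(sizeof(freelist) == 2 * sizeof(void *), "unexpected padding");

static inline void freelist_init(freelist *fl, fl_count_t capacity)
{
    fl->head = NULL;
    fl->remaining = capacity;
    fl->capacity = capacity;
}

/* Returns 1 if the block was kept, 0 if the list is full (or disabled). */
static inline int freelist_push(freelist *fl, void *block)
{
    if (fl->remaining == 0) {
        return 0;
    }
    fl_entry *new_entry = (fl_entry *)block;
    new_entry->next = fl->head;
    fl->head = new_entry;
    fl->remaining--;
    return 1;
}

/* Returns a block, or NULL if the list is empty (or disabled). */
static inline void *freelist_pop(freelist *fl)
{
    fl_entry *result = fl->head;
    if (result == NULL) {
        return NULL;
    }
    fl->head = result->next;
    fl->remaining++;
    return result;
}

/* Disable an already drained list: it then looks both full and empty. */
static inline void freelist_disable(freelist *fl)
{
    assert(fl->head == NULL);
    fl->remaining = 0;
}

static inline void freelist_enable(freelist *fl)
{
    fl->remaining = fl->capacity;
}
```

With this encoding a disabled list needs no extra flag: pushes fail because remaining is zero, and pops fail because head is NULL, so both paths fall through to the underlying allocator.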
By putting the free lists in an array, the mapping from size class to free list is very simple.
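As a rough sketch of that mapping, assuming a 16-byte size-class granularity and a 512-byte upper limit (both values, and the names FL_ALIGNMENT, FL_MAX_SIZE, size_class_lists and freelist_for_size, are assumptions made for illustration and do not come from the proposal):

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values and hypothetical names, chosen only for illustration. */
#define FL_ALIGNMENT   16                            /* size-class granularity */
#define FL_MAX_SIZE    512                           /* largest cached block   */
#define FL_NUM_CLASSES (FL_MAX_SIZE / FL_ALIGNMENT)  /* 32 size classes        */

typedef struct fl_entry {
    struct fl_entry *next;
} fl_entry;

typedef struct {
    fl_entry *head;
    unsigned  remaining;
    unsigned  capacity;
} freelist;

/* One free list per size class, indexed directly by class number. */
static freelist size_class_lists[FL_NUM_CLASSES];

/* Map a request size in bytes to its free list.
 * Returns NULL for sizes that are not served from a free list. */
static inline freelist *freelist_for_size(size_t nbytes)
{
    if (nbytes == 0 || nbytes > FL_MAX_SIZE) {
        return NULL;
    }
    /* Sizes 1..16 map to class 0, 17..32 to class 1, and so on. */
    return &size_class_lists[(nbytes - 1) / FL_ALIGNMENT];
}
```

Under these assumptions a 24-byte block and a 32-byte block land in the same list regardless of which class of object they held, which is the point of moving from per-class to per-size free lists.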
Behavior when free list is full or empty; interaction with the malloc implementation.
The simplest option is to call malloc when the list is empty and to call free when the list is full.
Unfortunately this leads to fragmentation and performs poorly when repeatedly allocating or deallocating.
To avoid fragmentation, we should clear the free list when it is full and we need to free another object.
To avoid thrashing on alternating mallocs and frees, we should half-fill the list when it is empty and we need to allocate an object.
Doing bulk free/mallocs reduces the number of free list misses considerably and keeps fragmentation down.
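The bulk policy could look something like the sketch below, which falls back to plain malloc and free since no bulk API is assumed to exist. The names fl_alloc and fl_free and the block_size field are hypothetical, and freeing the incoming block after clearing a full list (rather than caching it) is one possible reading of the text, not something the proposal specifies.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; a sketch of the policy, not CPython code. */
typedef struct fl_entry {
    struct fl_entry *next;
} fl_entry;

typedef struct {
    fl_entry *head;
    unsigned  remaining;    /* free slots left in the list */
    unsigned  capacity;
    size_t    block_size;   /* assumed to be >= sizeof(fl_entry) */
} freelist;

/* Allocate one block. On a miss, refill the list to half capacity first,
 * so that the next capacity/2 allocations are hits. */
static inline void *fl_alloc(freelist *fl)
{
    if (fl->head == NULL) {
        unsigned want = fl->capacity / 2;
        for (unsigned i = 0; i < want && fl->remaining > 0; i++) {
            fl_entry *e = malloc(fl->block_size);
            if (e == NULL) {
                break;
            }
            e->next = fl->head;
            fl->head = e;
            fl->remaining--;
        }
        return malloc(fl->block_size);
    }
    fl_entry *result = fl->head;
    fl->head = result->next;
    fl->remaining++;
    return result;
}

/* Free one block. When the list is full, release the whole list in one go
 * instead of one block at a time, then release the incoming block too. */
static inline void fl_free(freelist *fl, void *block)
{
    if (fl->remaining == 0) {
        while (fl->head != NULL) {
            fl_entry *e = fl->head;
            fl->head = e->next;
            free(e);
            fl->remaining++;
        }
        free(block);
        return;
    }
    fl_entry *new_entry = (fl_entry *)block;
    new_entry->next = fl->head;
    fl->head = new_entry;
    fl->remaining--;
}
```

A list that has been disabled by setting head to NULL and remaining to zero passes straight through both functions to malloc and free, because neither loop body runs.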
Ideally the malloc implementation would have functions for freeing and allocating multiple objects of the same size, but implementations generally don't. We could add that feature to PyMalloc.