gh-107421: Clarify OrderedDict Examples and Recipes #107613
Conversation
…MultiHitLRUCache`. See the explanation at python#107421 (comment) for more information.
…itLRUCache`. This minor change ensures the cache never holds more than `maxsize` items in `cache`. The previous implementation would temporarily exceed `maxsize` by one item, because it adds the newest entry to `cache` before removing the least recently used item with `self.cache.popitem(last=False)`.
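For context, here is a minimal standalone sketch of the two insertion orderings being compared. The helper names `add_original` and `add_reordered` are illustrative only and do not appear in the recipe; the logic mirrors the `cache`/`maxsize` handling in the documented `MultiHitLRUCache`.

```python
from collections import OrderedDict

def add_original(cache, maxsize, key, value):
    # Recipe ordering: insert first, then evict. Between the two
    # statements the cache briefly holds maxsize + 1 entries.
    cache[key] = value
    if len(cache) > maxsize:
        cache.popitem(last=False)   # evict the least recently used entry

def add_reordered(cache, maxsize, key, value):
    # Ordering proposed by this commit (later reverted): evict first,
    # then insert, so the cache never exceeds maxsize entries.
    if len(cache) == maxsize:
        cache.popitem(last=False)   # evict the least recently used entry
    cache[key] = value

cache = OrderedDict()
for i in range(5):
    add_original(cache, 3, i, i * i)
print(list(cache))   # the three most recently added keys: [2, 3, 4]
```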
Doc/library/collections.rst
Outdated
else:
    self.requests.pop(args, None)
    # entry has been seen more than cache_after times
    self.requests.pop(args)  # no longer need to keep track of how many times this entry has been seen
This comment is reasonable and adds some value. Please move it to a separate line, just before the relevant code.
Also, keep the original `None` argument. It provides some safety in a multithreaded environment. We never want this line to raise an exception.
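For context, a standalone sketch of the difference being described; the key name and the bare `OrderedDict` are illustrative only, not recipe code.

```python
from collections import OrderedDict

requests = OrderedDict()

# With a default, pop() never raises, even if another thread removed
# the key between our membership check and this call:
requests.pop("some_key", None)    # returns None instead of raising

# Without a default, the same situation becomes an exception:
try:
    requests.pop("some_key")
except KeyError:
    print("KeyError: the key was already gone")
```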
I moved this comment to the lines just above the code in commit 77f10b2.
In commit a03483d I also added context about the multi-threading problems you mentioned that could arise. I think the race conditions are fairly obvious to someone who is already designing code for a multi-threaded environment, but they might be missed by someone who initially writes the code for a single-threaded environment and later moves it into a multi-threaded one without reviewing all of it.
I think this is useful context and might help some people avoid problems if they mistakenly remove the default argument from `pop(args, None)` because they think it is unnecessary (in their single-threaded environment). Let me know if you think the comment is too long, should be changed, or is unnecessary altogether.
Please remove the comments altogether. Just stick with the 1/0 to True/False edit.
I think this is related to #107613 (comment), in which case, commit aae218b should take care of this as well.
Doc/library/collections.rst
Outdated
self.requests.pop(args, None)
# entry has been seen more than cache_after times
self.requests.pop(args)  # no longer need to keep track of how many times this entry has been seen
if len(self.cache) == self.maxsize:
I do not agree with this code edit. Letting the cache grow larger than the maxsize and then popping is safer because we are guaranteed that at least one entry is present. Also, we tend to put deletion code after the insertion code so that the cache is in a consistent state prior to deletion whose DECREF logic can trigger arbitrary Python code.
Please restore the original code order.
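For context on the "at least one entry is present" point, a standalone sketch (not recipe code): after an unconditional insertion, the eviction can never fail, whereas `popitem()` on an empty `OrderedDict` raises.

```python
from collections import OrderedDict

cache = OrderedDict()

# Insert first: the cache is now guaranteed to be non-empty, so the
# subsequent eviction cannot raise.
cache["key"] = "value"
if len(cache) > 1:
    cache.popitem(last=False)

# By contrast, popitem() on an empty OrderedDict raises KeyError:
try:
    OrderedDict().popitem(last=False)
except KeyError:
    print("KeyError: dictionary is empty")
```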
I reverted this change in commit 26f9f90.
I think I got the gist of what you're saying here, but I don't think I understand the internals of CPython enough to follow all of it.
Letting the cache grow larger than the maxsize and then popping is safer because we are guaranteed that at least one entry is present.
Is this also to avoid issues in multi-threaded environments, as you mentioned above (and in issue 107421)?
Also, we tend to put deletion code after the insertion code so that the cache is in a consistent state prior to deletion whose DECREF logic can trigger arbitrary Python code.
I think this is referring to garbage collection that might remove the cache when it becomes empty, correct? In that case, wouldn't `self.cache` still be safe from garbage collection, since its reference count is still at least 1 while the instance holds a reference to the `self.cache` attribute? Or is this implementation dependent? (Or is it something else altogether? 😅)
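For context, one way to see what "the DECREF logic can trigger arbitrary Python code" can mean in practice: evicting an entry can drop the last reference to its value, and if that value defines `__del__`, the finalizer runs arbitrary code at that point. The `Noisy` class below is purely illustrative and not part of the recipe.

```python
from collections import OrderedDict

class Noisy:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        # Arbitrary Python code can run here: logging, callbacks,
        # even code that calls back into the cache itself.
        print(f"finalizing {self.name}")

cache = OrderedDict()
cache["a"] = Noisy("a")
cache["b"] = Noisy("b")

# Evicting the oldest entry (and discarding the returned pair) drops the
# last reference to Noisy("a"), so its __del__ runs immediately, in the
# middle of the caller's cache-maintenance logic.
cache.popitem(last=False)   # prints "finalizing a"
```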
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase: "I have made the requested changes; please review again".
These comments just say what the code is doing in different words and do not add any new information or context.
See the comment at https://github.com/python/cpython/pull/107613/files#r1284119710 and python#107421 (comment) for more information.
… `MultiHitLRUCache`" This reverts commit ff22bac. See the explanation at https://github.com/python/cpython/pull/107613/files#r1284124385 for more information.
a7a173d to 12a42e5 (Compare)
See https://github.com/python/cpython/actions/runs/5765303505/job/15631004596?pr=107613. This was missed and should have been included in the previous commit.
I have made the requested changes; please review again
I also added the multi-threaded context you mentioned here and in issue 107421 (mentioned here), but let me know if I should change that. I also had a few questions (added here); including the answers in the example might not be necessary, since they seem to be about interpreter internals (though if they aren't about internals, they might be worth adding).
Thanks for making the requested changes! @rhettinger: please review the changes made to this pull request.
In addition to updating the documentation to include the changes suggested in https://github.com/python/cpython/issues/107421, this PR also makes a minor fix in commit ff22bac to ensure the `MultiHitLRUCache` never has more than `maxsize` items in the `cache` attribute at any given time (see the commit message for more information).

See #107613 (comment) for an explanation of why "never [having] more than `maxsize` items in the `cache` attribute at any given time" might not be a good idea.

📚 Documentation preview 📚: https://cpython-previews--107613.org.readthedocs.build/
Here's the direct link to the preview of the OrderedDict Examples and Recipes section.