Skip to content

gh-107421: Clarify OrderedDict Examples and Recipes #107613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Aug 11, 2023

Conversation

shailshouryya
Copy link
Contributor

@shailshouryya shailshouryya commented Aug 4, 2023

In addition to updating the documentation to include the changes suggested in https://github.com//issues/107421, this PR also makes a minor fix in commit ff22bac to ensure the MultiHitLRUCache never has more than maxsize items in the cache attribute at any given time (see the commit message for more information).

See #107613 (comment) for an explanation about why "never [having] more than maxsize items in the cache attribute at any given time" might not be a good idea.


📚 Documentation preview 📚: https://cpython-previews--107613.org.readthedocs.build/

Here's the direct link to the preview of the OrderedDict Examples and Recipes section

…MultiHitLRUCache`

See explanation at python#107421 (comment)
for more information.
…itLRUCache`

This minor change ensures the cache never has more than `maxsize`
items in `cache`. The previous implementation would temporarily
exceed `maxsize` items by one item when the cache adds the
newest entry to `cache` before removing the least recently used
item from `cache` with `self.cache.popitem(last=False)`.
else:
self.requests.pop(args, None)
# entry has been seen more than cache_after times
self.requests.pop(args) # no longer need to keep track of how many times this entry has been seen
Copy link
Contributor

@rhettinger rhettinger Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is reasonable and add some value. Please move it to a separate line, just before the relevant code.

Also, keep the original None argument. It provides some safety in a multithreaded environment. We never want this line to raise an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this comment to the lines just above the code in commit 77f10b2.

I also added context the multi-threading problems that could arise that you mentioned in commit a03483d, since I think the race conditions that could occur in multi-threading environments is kind of obvious for someone who is already designing code for multi-threading environments - but might be missed by someone who initially designs the code for a single-threaded environment and then moves it into a multi-threaded environment without reviewing all the code.

I think this is useful context and might help some people avoid problems if they mistakenly take out the default argument from pop(args, None) because they think it is unnecessary (in their single-threaded environment), but let me know if you think the comment is too long/should be changed, or if the comment is unnecessary altogether.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the comments altogether. Just stick with the 1/0 to True/False edit.

Copy link
Contributor Author

@shailshouryya shailshouryya Aug 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is related to #107613 (comment), in which case, commit aae218b should take care of this as well.

self.requests.pop(args, None)
# entry has been seen more than cache_after times
self.requests.pop(args) # no longer need to keep track of how many times this entry has been seen
if len(self.cache) == self.maxsize:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not agree with this code edit. Letting the cache grow larger than the maxsize and then popping is safer because we are guaranteed that at least one entry is present. Also, we tend to put deletion code after the insertion code so that the cache is in a consistent state prior to deletion whose DECREF logic can trigger arbitrary Python code.

Please restore the original code order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted this change in commit 26f9f90.

I think I got the gist of what you're saying here, but I don't think I understand the internals of CPython enough to follow all of it.

Letting the cache grow larger than the maxsize and then popping is safer because we are guaranteed that at least one entry is present.

Is this also to avoid issues in multi-threaded environments, as you mentioned above (and in issue 107421)?

Also, we tend to put deletion code after the insertion code so that the cache is in a consistent state prior to deletion whose DECREF logic can trigger arbitrary Python code.

I think this is referring to garbage collection that might remove the cache when the cache becomes empty, correct? In that case, wouldn't the self.cache still be safe from garbage collection since the reference count is still (at least) 1 since the instance has a reference to the self.cache attribute? Or is this implementation dependent? (Or is it something else altogether? 😅)

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@shailshouryya shailshouryya force-pushed the OrderedDict_docs_clarified branch from a7a173d to 12a42e5 Compare August 4, 2023 18:17
@shailshouryya
Copy link
Contributor Author

I have made the requested changes; please review again

I also added the multi-threaded context you mentioned here and in issue 107421 (mentioned here), but let me know if I should change that.

I also had a few questions I added here, but including the answers to those questions might not be necessary to include in the example since it seems based on internals (if it isn't based on internals, though, it might be worth adding).

@bedevere-bot
Copy link

Thanks for making the requested changes!

@rhettinger: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from rhettinger August 4, 2023 18:30
@rhettinger rhettinger merged commit 23a6db9 into python:main Aug 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir skip news
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants