Skip to content

gh-129185: Fix PyTraceMalloc_Untrack() at Python exit #129191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 23, 2025

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Jan 22, 2025

Support calling PyTraceMalloc_Track() and PyTraceMalloc_Untrack() during late Python finalization.

Support calling PyTraceMalloc_Track() and PyTraceMalloc_Untrack()
during late Python finalization.
@vstinner
Copy link
Member Author

cc @ZeroIntensity

@@ -1255,6 +1255,11 @@ int
PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
size_t size)
{
// gh-129185: Pre-check to support calls after _PyTraceMalloc_Fini()
if (!tracemalloc_config.tracing) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to hold the lock or GIL, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this "pre-check" is to avoid the GIL and TABLES_LOCK().

The flag is tested again below with TABLES_LOCK().

PyTraceMalloc_Untrack() doesn't lock the GIL, only TABLES_LOCK(), when tracing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change restores the check that we had before recent refactoring. For example, Python 3.14 alpha3 uses:

int
PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr,
                    size_t size)
{
    int res;
    PyGILState_STATE gil_state;

    if (!tracemalloc_config.tracing) {
        /* tracemalloc is not tracing: do nothing */
        return -2;
    }

    gil_state = PyGILState_Ensure();

    TABLES_LOCK();
    res = tracemalloc_add_trace(domain, ptr, size);
    TABLES_UNLOCK();

    PyGILState_Release(gil_state);
    return res;
}

The check is done without holding the GIL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we lose the thread safety: if one thread that holds the GIL were to write to tracing, then a thread that calls one of these without the GIL would race.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest solution would be to grab the tables lock for the read, and then unlock it before calling PyGILState_Ensure to prevent lock-ordering deadlocks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The race is right here. Thread A might check tracing at the exact same time as thread B, which is a data race.

What is the problem of "continue execution" of thread A, since thread A checks again tracing with the tables lock?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it might not reach the second tracing read at all--if the first one races, then we're probably going to get a crash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're talking about thread B calling _PyTraceMalloc_Fini(), not tracemalloc.stop(). In this case, thread A can call TABLES_LOCK() after thread B deleted the lock, and yes, we can get a crash.

Sadly, PyTraceMalloc_Track() and PyTraceMalloc_Untrack() API doesn't require the caller to hold the GIL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can happen too, but I meant that we'll get a crash from the data race on tracing, not a use-after-free on the lock. I think we just hold the GIL (via PyGILState_Ensure) and call without the tables lock held, that should be thread safe enough for 3.12.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the Untrack() function to get the GIL. It should address your last concern.

@vstinner
Copy link
Member Author

@ZeroIntensity: Please review the updated PR, it should reduce the risk of race.

#129185 is not about threads running in parallel. It's just a single thread and PyTraceMalloc_Track/Untrack() called by Python objects destroyed after _PyTraceMalloc_Fini().

I propose to merge this incomplete fix and backport it, and then consider switching to PyMutex in the main branch to avoid the last race.

@ZeroIntensity
Copy link
Member

#129185 is not about threads running in parallel. It's just a single thread and PyTraceMalloc_Track/Untrack() called by Python objects destroyed after _PyTraceMalloc_Fini().

Yeah, I'm talking about the other issue that you recently fixed. The previous approach would make it a problem again.

I propose to merge this incomplete fix and backport it, and then consider switching to PyMutex in the main branch to avoid the last race.

I agree.

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vstinner vstinner merged commit 46c7e13 into python:main Jan 23, 2025
40 checks passed
@vstinner vstinner deleted the tracemalloc branch January 23, 2025 11:07
@miss-islington-app
Copy link

Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @vstinner, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 46c7e13c055c218e18b0424efc60965e6a5fe6ea 3.13

@miss-islington-app
Copy link

Sorry, @vstinner, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 46c7e13c055c218e18b0424efc60965e6a5fe6ea 3.12

vstinner added a commit to vstinner/cpython that referenced this pull request Jan 23, 2025
…29191)

Support calling PyTraceMalloc_Track() and PyTraceMalloc_Untrack()
during late Python finalization.

* Call _PyTraceMalloc_Fini() later in Python finalization.
* Test also PyTraceMalloc_Untrack() without the GIL
* PyTraceMalloc_Untrack() now gets the GIL.
* Test also PyTraceMalloc_Untrack() in test_tracemalloc_track_race().

(cherry picked from commit 46c7e13)
@bedevere-app
Copy link

bedevere-app bot commented Jan 23, 2025

GH-129217 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jan 23, 2025
@vstinner vstinner removed the needs backport to 3.12 only security fixes label Jan 23, 2025
vstinner added a commit that referenced this pull request Jan 23, 2025
#129217)

gh-129185: Fix PyTraceMalloc_Untrack() at Python exit (#129191)

Support calling PyTraceMalloc_Track() and PyTraceMalloc_Untrack()
during late Python finalization.

* Call _PyTraceMalloc_Fini() later in Python finalization.
* Test also PyTraceMalloc_Untrack() without the GIL
* PyTraceMalloc_Untrack() now gets the GIL.
* Test also PyTraceMalloc_Untrack() in test_tracemalloc_track_race().

(cherry picked from commit 46c7e13)
vstinner added a commit that referenced this pull request Jan 23, 2025
#129217) (#129221)

[3.13] gh-129185: Fix PyTraceMalloc_Untrack() at Python exit (#129191) (#129217)

gh-129185: Fix PyTraceMalloc_Untrack() at Python exit (#129191)

Support calling PyTraceMalloc_Track() and PyTraceMalloc_Untrack()
during late Python finalization.

* Call _PyTraceMalloc_Fini() later in Python finalization.
* Test also PyTraceMalloc_Untrack() without the GIL
* PyTraceMalloc_Untrack() now gets the GIL.
* Test also PyTraceMalloc_Untrack() in test_tracemalloc_track_race().

(cherry picked from commit 46c7e13)
(cherry picked from commit e3b3e01)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants