Skip to content

[lldb] Update async step-in implementation #2742

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 2, 2021

Conversation

kastiglione
Copy link

This followup to #2537 is to support the following:

  1. The new async unwinding implemented across a few other commits (see Add UnwindPlans that can follow an AsyncContext chain #2617 Don't create a swift async unwind plan when in a function prologue #2645 [lldb] Swift async: Hopefully correct unwinder #2663)
  2. The initial implementation set a breakpoint on the async resume function, but the breakpoint was not conditional and would stop even if another async task (ie "thread") was the caller. This change implements the conditional breakpoint
  3. Handle the updated swift_task_switch ABI changes (see Change the async ABI to not pass the active task and executor swift#36437)

Conditional breakpoints need some explanation. This change introduces a new mode for thread plans which has been discussed with @jimingham. As of this change, thread plans are no longer bound to a thread, they can be detached from an initial thread, and connected to a new thread. This is to support Swift Concurrency, where stepping into an async function can not assume that function will be called on the same thread, and cannot assume the state of the OS native stack.

To support this, the Process now manages new list of thread plans that have been detached from a thread (m_async_thread_plans), and has supporting code to move thread plans back and forth between m_thread_plans and m_async_thread_plans.

The new function FindDetachedPlanExplainingStop is used to determine if a detached thread plan explains a current stop, and if so that thread plan is attached to the current thread, in order to continue carrying out its work.

@kastiglione
Copy link
Author

kastiglione commented Mar 23, 2021

TODO:

  • Write API tests
  • Figure out long term StackID comparison for async frames
  • Prevent step-over from stepping in
  • Attempt to get rid of the duct tape solutions

Copy link

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have a bunch of mostly non-substantial comments inline.

@@ -2256,6 +2256,10 @@ bool PruneThreadPlansForTID(lldb::tid_t tid);
/// Prune ThreadPlanStacks for all unreported threads.
void PruneThreadPlans();

void SynchronizeThreadPlans();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It would be nice to have Doxygen comment here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. FindDetachedPlanExplainingStop doesn't indicate that the function also reinstates that stack on the thread that found it relevant. So either a comment or something in the name to make that clear is important.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great points. fwiw I would not have committed without comments, but I also didn't want to write comments only to find out code review obviated them :)

void SetTID(lldb::tid_t tid) { m_tid = tid; }

friend lldb::ThreadPlanSP
Process::FindDetachedPlanExplainingStop(Event *event_ptr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Doxygen comment explaining what this function does?

@@ -145,8 +149,36 @@ class ThreadPlanStackMap {
return &result->second;
}

// rename to Reactivate?
void Activate(ThreadPlanStack &&stack) {
if (m_plans_list.find(stack.GetTID()) == m_plans_list.end()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: LLVM coding style omits {} on single statements.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reactivate is better than Activate. But the operation is more that the Plan Stack gets Detached from the current thread, and then Reattached to the thread the work lands on, than that the stack is somehow deactivated & reactivated. Using that sort of language might make more sense?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't felt a strong attachment to which terms to use for having a thread vs no thread. Detached and Reattached sounds good to me. I've also used dissociate and associate, which have the benefit of not being overloaded.

if (m_plans_list.find(stack.GetTID()) == m_plans_list.end()) {
m_plans_list.emplace(stack.GetTID(), std::move(stack));
} else {
m_plans_list.at(stack.GetTID()) = std::move(stack);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this equivalent to m_plans_list[stack.GetTID()] = std::move(stack); or better in some way?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind of equivalent. To use operator[], the value type needs to be default constructible, and ThreadPlanStack is not. Maybe the better thing to do here is make it default constructible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to this change, it didn't make sense to add ThreadPlanStacks that didn't have a TID associated with them, so a default constructor wouldn't have made sense either. That's no longer the case, so you could make them default constructible. But the other requirement for ThreadPlanStacks is that they have a base plan, either the ThreadPlanBase, or ThreadPlanNull. The latter handles the case where we're shutting down and want to have a do-nothing ThreadPlanStack in case anybody asks it questions.

Since ThreadPlanStacks are only made in one or two places, that's just handled making a stack and immediately pushing on the proper base plan. But if we are going to make ThreadPlanStacks default constructible, we should probably do that bit of business in the constructor to keep us forgetting to do this.

auto arch = reg_ctx_sp->CalculateTarget()->GetArchitecture();
int async_context_regnum = 0;
if (arch.GetMachine() == llvm::Triple::x86_64) {
async_context_regnum = dwarf_r14_x86_64;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to consolidate the knowledge about r14/r22 into a single place, too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have way too many Swift specific files in Target now. There really all belong in the Swift LanguageRuntime Plugin directory.

@kastiglione
Copy link
Author

Thanks for all the comments @adrian-prantl

Copy link

@jimingham jimingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of want to figure out why getting the context didn't work in the AsyncThreadPlan's ShouldStop, but did work in the ThreadPlanBase's ShouldStop? That really shouldn't make any difference, and the code is complicated by having that not work. It would be nice to clear that up before moving forward.

It also seems weird to me that you are disassociating the whole plan stack from the thread when you see an async event. I was imagining that when you detect that an async dispatch is about to start, you would make a disassociated ThreadPlan stack to catch the async when it resumed, and push a step-out plan onto the current stack.

@@ -2256,6 +2256,10 @@ bool PruneThreadPlansForTID(lldb::tid_t tid);
/// Prune ThreadPlanStacks for all unreported threads.
void PruneThreadPlans();

void SynchronizeThreadPlans();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. FindDetachedPlanExplainingStop doesn't indicate that the function also reinstates that stack on the thread that found it relevant. So either a comment or something in the name to make that clear is important.

@@ -145,8 +149,36 @@ class ThreadPlanStackMap {
return &result->second;
}

// rename to Reactivate?
void Activate(ThreadPlanStack &&stack) {
if (m_plans_list.find(stack.GetTID()) == m_plans_list.end()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reactivate is better than Activate. But the operation is more that the Plan Stack gets Detached from the current thread, and then Reattached to the thread the work lands on, than that the stack is somehow deactivated & reactivated. Using that sort of language might make more sense?

auto arch = reg_ctx_sp->CalculateTarget()->GetArchitecture();
int async_context_regnum = 0;
if (arch.GetMachine() == llvm::Triple::x86_64) {
async_context_regnum = dwarf_r14_x86_64;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have way too many Swift specific files in Target now. There really all belong in the Swift LanguageRuntime Plugin directory.

should_resume = !m_thread_list.ShouldStop(event_ptr);
SynchronizeThreadPlans();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you only do this if we aren't restarted? That's going to happen if for instance there was another breakpoint that was hit at same time as when you were catching the async work getting detached, and that breakpoint auto-continued. We still need to make a plan stack to catch the async execution when it starts up onto the unassociated thread plan stack list? And this is just house-keeping, we don't need to ask the process questions to do that since we're just looking at the TIDs as marked in the stacks in CleanUp, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimingham the placement isn't relative to was_restarted, it's that SynchronizeThreadPlans should be run after ThreadList::ShouldStop. The idea is that ThreadPlan::ShouldStop is where a thread plan could choose to dissociate from its current thread.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally it is the ThreadList::ShouldStop that runs actions that might restart the target. Synchronous breakpoint callbacks have run before this, but they should never actually restart, they should return a ShouldStop. So maybe this check is over-defensive. If there were a legit reason why we might have actually restarted here, we would need to think about what work we needed to do.

Note also that ShouldStop CAN restart the process (somebody could have a breakpoint command that is just "continue"... So you need to make sure the SynchronizeThreadPlans can get called when the target is running (or maybe even ran & stopped again in the middle of SynchronizeThreadPlan.

You also need to call SynchronizeThreadPlans above in the Interrupted branch's ShouldStop. It's highly unlikely that somebody will interrupt and stop just as async work is being scheduled, but not impossible.

@kastiglione
Copy link
Author

kastiglione commented Mar 31, 2021

@jimingham I spent time to remove the use of the breakpoint callback to capture info. It's all done from the ThreadPlan now. See 2a4d4ac.

@kastiglione
Copy link
Author

Left for a followup:

  • Function renames and add docstrings
  • Removing StackID::operator< heuristic

@kastiglione kastiglione marked this pull request as ready for review April 1, 2021 20:52
@kastiglione
Copy link
Author

@swift-ci test

Copy link

@jimingham jimingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the comments are about naming or smaller refinements. Overall this looks pretty good.

@@ -2796,6 +2800,7 @@ void PruneThreadPlans();
/// threads in m_thread_list, as well as
/// threads we knew existed, but haven't
/// determined that they have died yet.
std::vector<ThreadPlanStack> m_async_thread_plans;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe m_async_plan_stacks would be a better name here? This isn't a collection of plans it's a collection of plan stacks.

@@ -507,6 +507,14 @@ class ThreadPlan : public std::enable_shared_from_this<ThreadPlan>,
return m_iteration_count;
}

bool HasTID() { return m_tid != LLDB_INVALID_THREAD_ID; }
void ClearTID() { m_tid = LLDB_INVALID_THREAD_ID; }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ClearTID & SetTID should be private and friended to ThreadPlanStack. I don't think there's any case where we would want somebody to change the TID for one plan in a stack, but not the others.

if (m_plans_list.find(stack.GetTID()) == m_plans_list.end()) {
m_plans_list.emplace(stack.GetTID(), std::move(stack));
} else {
m_plans_list.at(stack.GetTID()) = std::move(stack);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior to this change, it didn't make sense to add ThreadPlanStacks that didn't have a TID associated with them, so a default constructor wouldn't have made sense either. That's no longer the case, so you could make them default constructible. But the other requirement for ThreadPlanStacks is that they have a base plan, either the ThreadPlanBase, or ThreadPlanNull. The latter handles the case where we're shutting down and want to have a do-nothing ThreadPlanStack in case anybody asks it questions.

Since ThreadPlanStacks are only made in one or two places, that's just handled making a stack and immediately pushing on the proper base plan. But if we are going to make ThreadPlanStacks default constructible, we should probably do that bit of business in the constructor to keep us forgetting to do this.

std::vector<ThreadPlanStack> CleanUp() {
llvm::SmallVector<lldb::tid_t, 2> invalidated_tids;
for (auto &pair : m_plans_list)
if (pair.second.GetTID() != pair.first)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more obvious - since when a thread plan stack is invalidated its TID gets set to LLDB_INVALID_THREAD_ID, that you explicitly check for that value. If the two key and the stack's TID are different but the Stack is not LLDB_INVALID_THREAD_ID, we probably want to assert, since that shouldn't happen.

}

// rename to ...?
std::vector<ThreadPlanStack> CleanUp() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CleanUp seems too generic a name for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it was a placeholder, to defer thinking of a better name until the dust settled.

m_async_thread_plans.emplace_back(std::move(stack));
}

ThreadPlanSP Process::FindDetachedPlanExplainingStop(Thread &thread,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to have a single name for these Detached/Deactivated/etc. Stacks. Also since this routine doesn't just find the Plan, it also Reattaches/Reactivates the Stack, "Find" isn't a good verb for this function.

should_resume = !m_thread_list.ShouldStop(event_ptr);
SynchronizeThreadPlans();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally it is the ThreadList::ShouldStop that runs actions that might restart the target. Synchronous breakpoint callbacks have run before this, but they should never actually restart, they should return a ShouldStop. So maybe this check is over-defensive. If there were a legit reason why we might have actually restarted here, we would need to think about what work we needed to do.

Note also that ShouldStop CAN restart the process (somebody could have a breakpoint command that is just "continue"... So you need to make sure the SynchronizeThreadPlans can get called when the target is running (or maybe even ran & stopped again in the middle of SynchronizeThreadPlan.

You also need to call SynchronizeThreadPlans above in the Interrupted branch's ShouldStop. It's highly unlikely that somebody will interrupt and stop just as async work is being scheduled, but not impossible.

@@ -61,6 +61,13 @@ bool lldb_private::operator<(const StackID &lhs, const StackID &rhs) {
const lldb::addr_t lhs_cfa = lhs.GetCallFrameAddress();
const lldb::addr_t rhs_cfa = rhs.GetCallFrameAddress();

// FIXME: rdar://76119439

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a async function call another async function that also gets detached? In that case, there's really no natural ordering since they would both be in the heap and where is subject to the whims of malloc...


auto line_start = line_entry.range.GetBaseAddress().GetFileAddress();
if (line_start >= fn_start && line_start < fn_end) {
if (++line_entry_count > 1) {
if (line_start >= fn_start && line_start < fn_end)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it's better not to mix this sort of cleanup into a big patch like this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad, Adrian had commented on the use of curly braces so I did a pass on all of the code I had added since there was a bunch. It could also have been something I did earlier before the size had increased.

}

bool ShouldStop(Event *event) override {
if (!m_async_breakpoint_sp)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if GetBreakpointAsyncContext matches this context the plan is complete. So if GetBreakpointAsyncContext is expensive, you can do the SetPlanComplete in DoPlanExplainsStop, and then have ShouldStop check IsPlanComplete and use that to decide what to return. That way you only have to do the work once.

There's one slightly non-obvious bit to that which is that is it possible that your plan wasn't the one that explained the stop. Rather, an earlier plan explained the stop, but then that planwas finished, and and got popped off the stack. We don't do another round of ExplainsStop at that point (*) but just ask the next plan on the stack if it ShouldStop.

In that case, your ShouldStop will get called, without having previously called PlanExplainsStop. The way to finesse that is to have ShouldStop call DoPlanExplainsStop and then return IsPlanComplete. The up side is no matter how you get to your ShouldStop, you only do the work to check what to do once, and all in one place.

(*) It might be more regular to redo the "PlanExplainsStop" every time we pop a plan to figure out the next plan we should ask ShouldStop of. However, that would require other changes, for instance plans that only use other plans would have to start responding themselves to ExplainsStop, which they don't at present.

@kastiglione
Copy link
Author

@swift-ci test macOS Platform

3 similar comments
@kastiglione
Copy link
Author

@swift-ci test macOS Platform

@kastiglione
Copy link
Author

@swift-ci test macOS Platform

@kastiglione
Copy link
Author

@swift-ci test macOS Platform

@kastiglione kastiglione merged commit e092783 into swift/main Apr 2, 2021
@kastiglione kastiglione deleted the lldb-Update-async-step-in-implementation branch April 2, 2021 22:33
kastiglione added a commit that referenced this pull request Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants