Skip to content

Upload Perfetto traces to manifold on mesh exit #252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

thomasywang
Copy link

Summary: We want our Perfetto trace JSON be uploaded to Manifold so that we can view them in the browser

Differential Revision: D76433475

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 12, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D76433475

Summary:
The simulator event loop can end up in a state where it loops without ever yielding execution back to the tokio runtime causing other tasks to starve.

We will fix this by adding a yield

Differential Revision: D76433473
Summary: Instead of wrapping the records in an Option<Arc<Mutex>> to be shared between the SimnetHandle and the Simnet and moved out when requested, we can just have the Simnet fully own the records and return them when the event loop is completed

Differential Revision: D76433476
Summary:
Previously we relied on matching a hardcoded abstract name for a unix socket to determine if the message was sent by the client. This does not work in non-linux. What we will do instead is determine if a message was sent by the client based on whether or not the src proxy address is different than the local proxy address.

To make this work we do need to remove our logic for using an egress sender (which is currently unused) to send a forward message to the other proxy which then forwards it to the actual recipient and will just send directly to the recipient. That is if the controller is sending the client a message instead of going from controller -> client_proxy -> client we will go directly from controller -> client. This is because client doesn't actually run in a separate process anymore and doesn't actually have a separate proxy, and if we were to set this up this would require more work because we would also need some way to configure it such that only one proxy sends events to the event loop whereas all other proxy forwards messages to that proxy. We can file a follow up task if need be but in the meantime this diff solves our problems very simply without creating any new problems

Differential Revision: D76526071
Summary: We have been treating the entire message send & recv as a single event but this doesn't really make sense since each of these should have their own duration (sending -> in flight -> receiving), and they happen on different processes (This will matter in the next diff when we render our events onto Perfetto).

Differential Revision: D76433477
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D76433475

thomasywang pushed a commit to thomasywang/monarch-1 that referenced this pull request Jun 12, 2025
Summary:
Pull Request resolved: pytorch-labs#252

We want our Perfetto trace JSON be uploaded to Manifold so that we can view them in the browser

Differential Revision: D76433475
thomasywang and others added 2 commits June 12, 2025 11:36
Summary: The frontend simulator outputs events as perfetto traces. We will do the same

Differential Revision: D76433474
Summary:
Pull Request resolved: pytorch-labs#252

We want our Perfetto trace JSON be uploaded to Manifold so that we can view them in the browser

Differential Revision: D76433475
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D76433475

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants