Remove redundant 'resolve_obligations_as_possible' call #13222


Merged 1 commit into rust-lang:master on Sep 12, 2022

Conversation

bminaiev (Contributor)

Hi! I was looking for a "good first issue" and saw this one: #7542. I like searching for performance improvements, so I wanted to try to find something useful there.

There are two tests in integrated_benchmarks.rs; I looked at integrated_highlighting_benchmark (not the one discussed in the issue above).

The profile from that test looks like this:

$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 358.45ms
initial: 9.60s
change: 13.96µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  273ms - highlight
      143ms - infer:wait @ per_query_memory_usage
          143ms - infer_query
                0   - crate_def_map:wait (3165 calls)
                4ms - deref_by_trait (967 calls)
               96ms - resolve_obligations_as_possible (22106 calls)
                0   - trait_solve::wait (2068 calls)
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       36ms - classify_name (19 calls)
       19ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        4ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       45ms - infer:wait (37 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 275.23ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok

The 22106 calls to resolve_obligations_as_possible look like the main bottleneck here.

One thing I noticed (and fixed in this PR) is that InferenceContext::resolve_ty_shallow first calls resolve_obligations_as_possible, and then calls InferenceTable::resolve_ty_shallow; but InferenceTable::resolve_ty_shallow itself calls resolve_obligations_as_possible again.

Internally, resolve_obligations_as_possible runs a loop that keeps going until it can no longer extract any useful information. Calling it a second time in a row therefore does nothing, so one of the two calls can be safely removed.

InferenceContext::resolve_ty_shallow is actually quite a hot path, and after the fix the total number of resolve_obligations_as_possible calls in this test drops from 22106 to 15516. The "after change" time also improves from ~270ms to ~240ms, which is not a huge win, but still measurable.

The same profile after this PR:

$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 339.86ms
initial: 9.28s
change: 10.69µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  236ms - highlight
      110ms - infer:wait @ per_query_memory_usage
          110ms - infer_query
                0   - crate_def_map:wait (3165 calls)
                4ms - deref_by_trait (967 calls)
               64ms - resolve_obligations_as_possible (15516 calls)
                0   - trait_solve::wait (2068 calls)
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       34ms - classify_name (19 calls)
       18ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        3ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       45ms - infer:wait (37 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 238.15ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok

The performance of this test could be improved further, but at the cost of making the code more complicated, so I wanted to check whether such a change is desirable before sending another PR.

resolve_obligations_as_possible is actually called many times even when no new information has been provided. As I understand it, resolve_obligations_as_possible can only do something useful if some variables or values were unified since the last check. We could store a boolean field inside InferenceTable indicating whether try_unify was called after the last resolve_obligations_as_possible; if it wasn't, we can safely skip the next call.

I tested this change locally, and it reduces the number of resolve_obligations_as_possible calls to a few thousand (it no longer shows up in the profile, so I don't know the exact number), and the total time drops to ~180ms. Here is the generated profile:

$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 349.92ms
initial: 8.56s
change: 11.32µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  175ms - highlight
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       33ms - classify_name (19 calls)
       17ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        3ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       97ms - infer:wait (38 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 177.04ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok

Let me know if adding a new bool field seems like a reasonable tradeoff; if so, I can send a PR.

@flodiebold (Member)

@bors r+

Your other suggestion does make sense, but it's a bit dangerous, as missing an edge-case condition can easily lead to hard-to-track bugs. Still, it's probably worth doing. You would at least have to also reset the flag if new obligations are added, though. I'd also suggest running rust-analyzer analysis-stats on various repositories 1. to check for any changes in the numbers of inferred types, and 2. to compare the performance.

@bors (Contributor) commented Sep 12, 2022

📌 Commit 32603ba has been approved by flodiebold

It is now in the queue for this repository.

@bors (Contributor) commented Sep 12, 2022

⌛ Testing commit 32603ba with merge bc13142...

@bors (Contributor) commented Sep 12, 2022

☀️ Test successful - checks-actions
Approved by: flodiebold
Pushing bc13142 to master...

@bors merged commit bc13142 into rust-lang:master on Sep 12, 2022
@lnicola (Member) commented Sep 19, 2022

@BorysMinaiev are you planning to follow up with the other change?

@bminaiev (Contributor, Author)

I ran rust-analyzer analysis-stats on rust-analyzer itself with some additional logging. It turns out that ./crates/ide-db/src/apply_change.rs, the file exercised by the integrated_highlighting_benchmark test, is the only file in the rust-analyzer code base where this optimization improves performance.

The reason is that when analyzing this specific file, InferenceTable::pending_obligations grows large, and each resolve_obligations_as_possible call becomes expensive. For other files, pending_obligations usually stays relatively small, so reducing the number of resolve_obligations_as_possible calls doesn't help.

So I concluded that making the code more complex by adding a new boolean field to the state doesn't make much sense if it only improves performance in such specific cases.

But let me know if you disagree, or if you know of other places where this optimization could help.

@lnicola (Member) commented Sep 19, 2022

I'm not sure, but maybe our diesel benchmark? I think it's trait-heavy and pretty slow, as you can see in https://rust-analyzer.github.io/metrics/.

See metrics.rs; we use commit c52ee623e231e7690a93be88d943016968c1036b of https://github.com/rust-lang/rustc-perf.

@bminaiev (Contributor, Author)

I tried to test locally on diesel, but it seems the difference between the two versions is smaller than the measurement precision (judging from the metrics page, it is normal to get a ±10% difference when running the same test several times).

@lnicola (Member) commented Sep 19, 2022

I don't know, it might still be worth doing, since it could expose other slow parts in that specific benchmark.
