Remove redundant 'resolve_obligations_as_possible' call #13222


Merged 1 commit into rust-lang:master on Sep 12, 2022

Conversation

bminaiev (Contributor)

Hi! I was looking for a "good first issue" and saw this one: #7542. I like searching for performance improvements, so I wanted to try to find something useful there.

There are two tests in integrated_benchmarks.rs; I looked at integrated_highlighting_benchmark (not the one discussed in the issue above).

The profile from that test looks like this:

$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 358.45ms
initial: 9.60s
change: 13.96µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  273ms - highlight
      143ms - infer:wait @ per_query_memory_usage
          143ms - infer_query
                0   - crate_def_map:wait (3165 calls)
                4ms - deref_by_trait (967 calls)
               96ms - resolve_obligations_as_possible (22106 calls)
                0   - trait_solve::wait (2068 calls)
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       36ms - classify_name (19 calls)
       19ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        4ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       45ms - infer:wait (37 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 275.23ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok

The 22106 calls to resolve_obligations_as_possible look like the main bottleneck here.

One thing I noticed (and fixed in this PR) is that InferenceContext::resolve_ty_shallow first calls resolve_obligations_as_possible, and then calls InferenceTable::resolve_ty_shallow; but InferenceTable::resolve_ty_shallow itself calls resolve_obligations_as_possible again.

Internally, resolve_obligations_as_possible runs a loop that keeps going until it can no longer extract any useful information. Calling it a second time in a row therefore does nothing, so one of the two calls can be safely removed.

InferenceContext::resolve_ty_shallow is actually quite a hot path, and after the fix the total number of resolve_obligations_as_possible calls in this test drops from 22106 to 15516. The "after change" time also improves from ~270ms to ~240ms, which is not a huge win, but still measurable.

The same profile after this PR:

$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 339.86ms
initial: 9.28s
change: 10.69µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  236ms - highlight
      110ms - infer:wait @ per_query_memory_usage
          110ms - infer_query
                0   - crate_def_map:wait (3165 calls)
                4ms - deref_by_trait (967 calls)
               64ms - resolve_obligations_as_possible (15516 calls)
                0   - trait_solve::wait (2068 calls)
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       34ms - classify_name (19 calls)
       18ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        3ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       45ms - infer:wait (37 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 238.15ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok

The performance of this test could be improved further, but at the cost of making the code more complicated, so I wanted to check whether such a change is desirable before sending another PR.

resolve_obligations_as_possible is actually called many times even when no new information has been provided. As I understand it, resolve_obligations_as_possible can only do something useful if some variables or values were unified since the last check. We could store a boolean field inside InferenceTable indicating whether try_unify was called after the last resolve_obligations_as_possible; if it wasn't, we can safely skip the next call.

I tested this change locally, and it reduces the number of resolve_obligations_as_possible calls to a few thousand (it no longer shows up in the profile, so I don't know the exact number), and the total time drops to ~180ms. Here is the generated profile:

$ RUN_SLOW_BENCHES=1 cargo test --release --package rust-analyzer --lib -- integrated_benchmarks::integrated_highlighting_benchmark --exact --nocapture
    Finished release [optimized] target(s) in 0.06s
     Running unittests src/lib.rs (target/release/deps/rust_analyzer-a80ca6bb8f877458)

running 1 test
workspace loading: 349.92ms
initial: 8.56s
change: 11.32µs
cpu profiling is disabled, uncomment `default = [ "cpu_profiler" ]` in Cargo.toml to enable.
  175ms - highlight
       21ms - Semantics::analyze_impl (18 calls)
        0   - SourceBinder::to_module_def (20 calls)
       33ms - classify_name (19 calls)
       17ms - classify_name_ref (308 calls)
        0   - crate_def_map:wait (461 calls)
        3ms - descend_into_macros (628 calls)
        0   - generic_params_query (4 calls)
        0   - impl_data_with_diagnostics_query (1 calls)
       97ms - infer:wait (38 calls)
        0   - resolve_obligations_as_possible (2 calls)
        0   - source_file_to_def (1 calls)
        0   - trait_solve::wait (42 calls)
after change: 177.04ms
test integrated_benchmarks::integrated_highlighting_benchmark ... ok

Let me know if adding a new bool field seems like a reasonable tradeoff; if so, I can send a PR.

@flodiebold (Member)

@bors r+

Your other suggestion does make sense, but it's a bit dangerous, as missing an edge-case condition can easily lead to hard-to-track bugs. Still, it's probably worth doing. You would at least have to also reset the flag if new obligations are added, though. I'd also suggest running rust-analyzer analysis-stats on various repositories 1. to check for any changes in the numbers of inferred types, and 2. to compare the performance.

@bors (Contributor) commented Sep 12, 2022

📌 Commit 32603ba has been approved by flodiebold

It is now in the queue for this repository.

@bors (Contributor) commented Sep 12, 2022

⌛ Testing commit 32603ba with merge bc13142...

@bors (Contributor) commented Sep 12, 2022

☀️ Test successful - checks-actions
Approved by: flodiebold
Pushing bc13142 to master...

@bors merged commit bc13142 into rust-lang:master on Sep 12, 2022
@lnicola (Member) commented Sep 19, 2022

@BorysMinaiev are you planning to follow up with the other change?

@bminaiev (Contributor, Author)

I ran rust-analyzer analysis-stats on rust-analyzer itself with some additional logging. It turns out that ./crates/ide-db/src/apply_change.rs, the file exercised by the integrated_highlighting_benchmark test, is the only file in the rust-analyzer code base where this optimization improves performance.

The reason is that when analyzing this specific file, InferenceTable::pending_obligations grows large, and each resolve_obligations_as_possible call becomes expensive. For other files, pending_obligations usually stays relatively small, so reducing the number of resolve_obligations_as_possible calls doesn't help.

So I concluded that making the code more complex by adding a new boolean field to the state doesn't make much sense if it only improves performance in such specific cases.

But let me know if you disagree, or if you know of other places where this optimization could help.

@lnicola (Member) commented Sep 19, 2022

I'm not sure, but maybe our diesel benchmark? I think it's trait-heavy and pretty slow, as you can see in https://rust-analyzer.github.io/metrics/.

See metrics.rs; we use commit c52ee623e231e7690a93be88d943016968c1036b of https://github.com/rust-lang/rustc-perf.

@bminaiev (Contributor, Author)

I tried to test locally on diesel, but it seems the difference between the two versions is smaller than the measurement precision (judging from the metrics page, it is normal to get a ±10% difference when running the same test several times).

@lnicola (Member) commented Sep 19, 2022

I don't know, it might still be worth doing, since it could expose other slow parts in that specific benchmark.
