Enable Link-Time Optimization (LTO) and codegen-units = 1 #1

Closed
zamazan4ik opened this issue Jan 13, 2025 · 6 comments
Comments

@zamazan4ik
Hi!

I noticed that Link-Time Optimization (LTO) is not enabled for the project in the Cargo.toml file. I suggest switching it on, since it will reduce the binary size (always a good thing to have) and will likely improve the application's performance. If you want to read more about LTO and its possible modes, I recommend starting with the Rustc documentation.

I think you can enable LTO only for Release builds, so the developer experience is not sacrificed while working on the project, since LTO adds time to compilation. If you think a regular Release build should not be affected by such a change either, then I suggest adding a separate dist or release-lto profile that applies LTO in addition to the regular release optimizations. Such a profile simplifies life for maintainers and anyone else interested in the project who wants to build the most optimized version of the application. However, if we enable it at the Cargo profile level for the Release profile, users who install the application with cargo install will get the LTO-optimized version of the game "automatically". E.g., check the cargo-outdated Release profile. You might also be interested in other optimization options like codegen-units = 1 - it also brings improvements over the current defaults.

Basically, it can be enabled with the following lines:

[profile.release]
codegen-units = 1
lto = true
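
If the regular Release profile should stay untouched, the separate profile mentioned above could look roughly like the sketch below. The profile name release-lto is only an illustration (any name works), and the sketch assumes a Cargo version with custom profiles and the inherits key (1.57 or later).

```toml
# Hypothetical extra profile; the name "release-lto" is an assumption.
[profile.release-lto]
inherits = "release"   # start from the regular release settings
codegen-units = 1      # a single codegen unit per crate
lto = true             # equivalent to lto = "fat"
```

It would then be selected with cargo build --profile release-lto, and the output lands in target/release-lto/ instead of target/release/. Since cargo install uses the release profile by default, users would have to pass --profile release-lto themselves to get this build.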

I ran quick tests (Fedora 41, Rust 1.84, the latest version of the project at the moment) - here are the results:

  • Release (current default): 1.3 MiB
  • Release + codegen-units = 1 + Fat LTO: 912 KiB
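
For reference, "Fat LTO" in the second result is what lto = true selects. Cargo also accepts a cheaper "thin" mode; the sketch below lists the documented values of the lto key for comparison. Thin LTO was not measured here, so no size numbers are implied for it.

```toml
[profile.release]
codegen-units = 1
# Pick one value for lto:
#   false   - Cargo's default: "thin local LTO", only across the crate's own codegen units
#   "thin"  - ThinLTO across crates; usually builds faster than fat LTO
#   true    - same as "fat": LTO across all crates in the dependency graph (used in the test above)
#   "off"   - disables LTO entirely
lto = "thin"
```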

Thank you.

@GoldenStack
Owner

LOL
image

@berkus

berkus commented Jan 13, 2025

> 912 KiB

nice, and that comes with a speed boost i guess?

@zamazan4ik
Author

I haven't measured a speed boost yet, tbh. At a minimum, I expect the same performance level as without LTO but with a smaller size, which is definitely still a win. At best, yep, performance should increase too. If there is a simple way to measure performance "before and after", I can run the benchmark and post the results here too.

@GoldenStack
Owner

GoldenStack commented Jan 13, 2025

Hi,
Is this some kind of social experiment? Based on your comments here it seems like you're really dedicated to the cause, but what I don't really get is why you're working at this in the first place, especially when this often leads to much slower compile times (even if for release only) with little to no performance gain (although there is clearly a size decrease).

Given the large number of contributions, along with expectations that it'll lead to change, it seems like it would be something that you would want to contribute as a default to the upstream to save a lot of time (I did notice your issue here). Let me know if I missed anything relevant to this.

I looked through your profiles and you seem like you're clearly a real person (i.e. not an AI issue opening bot), and I admire the patience that it takes to go through hundreds of projects. So I'm just really confused why you've embarked on this journey.

@zamazan4ik
Author

zamazan4ik commented Jan 13, 2025

> Is this some kind of social experiment?

Nope. I am simply trying to improve Rust projects by using the opportunities provided by the Rustc compiler to deliver a better experience to users of the Rust-based ecosystem. That's it.

> Based on your comments zed-industries/zed#21450 it seems like you're really dedicated to the cause, but what I don't really get is why you're working at this in the first place, especially when this often leads to much slower compile times (even if for release only) with little to no performance gain (although there is clearly a size decrease).

The answer is pretty simple. I value users' experience much more than the experience of CI systems. My personal opinion on this topic can be demonstrated with one example: personally, I am ready to spend weeks compiling a binary that is delivered to users if it brings an improvement of several percent for a large number of users. That's not the case for this project (at least not yet, hehe), but you get my point. However, in places where such benchmarks can be performed quite easily, I do LTO benchmarks too.

The reason why almost all my LTO issues show binary size improvements instead of performance is that binary size is much easier to measure than performance. If we can improve binary size even without (proven) performance gains, it's still a huge win. Especially for the Rust ecosystem, where binaries are usually large - and this periodically gives rise to various rants. However, if other ecosystems or other langs enable LTO by default, I would be happy for them too! I am simply trying to use more Rust software where I can, mostly for performance and SIGSEGV-free reasons (ofc balancing other criteria depending on the situation - I'm not a Rust zealot).

> Given the large number of contributions, along with expectations that it'll lead to change, it seems like it would be something that you would want to contribute as a default to the upstream (I did notice your issue rust-lang/rust#115344). Let me know if I missed anything relevant to this.

You are 100% right here. One day I hope that LTO will be enabled by default for Cargo's Release profile. Or we will get an additional [profile.truly-optimized-release] profile. Or any other solution. If you look through the Cargo discussions, you will see that discussions about that are currently stuck. There are similar threads in the Rust Zulip, but I don't have the exact links at hand. I've even seen arguments like "Rust is already fast enough so we don't want to spend our time on improving the default performance level more" (and a part of me can even agree with this point of view). If the Cargo dev team later decides to reconsider enabling LTO by default in any way, I already have a pretty large amount of evidence for them of how frequently LTO is enabled by Rust users. Also, I gathered LTO statistics for the Rust ecosystem earlier and shared them with the Cargo dev team - also with no practical effect. Because of all that, I decided to enable LTO "manually" for Rust applications - it's better than nothing, IMHO.

> I looked through your profiles and you seem like you're clearly a real person (i.e. not an AI issue opening bot), and I admire the patience that it takes to go through hundreds of projects.

I am a 100% real person. If you visit Warsaw, we can even meet in a bar and drink beer ;)

> So I'm just really confused why you've embarked on this journey.

Only one reason - I want to improve the "default performance level" of the Rust ecosystem in any way that I can. There is no other reason, believe it or not. The same goes not only for LTO but for Profile-Guided Optimization (PGO) too - exactly the same reasons. Besides that, I hope to write about it in two articles - if you are interested, the current drafts are available here: LTO, LTO todos, PGO, PGO todos.

I hope I answered your questions! If so, could you please enable LTO and codegen-units for the project?

@GoldenStack
Owner

Originally I was not going to enable LTO and codegen-units, but out of respect for your incredible dedication, I just added it in version 0.2.1 :)
