
Performance improvement of Vec's swap_remove. #52166


Merged
merged 3 commits into from
Jul 9, 2018

Conversation

orlp
Contributor

@orlp orlp commented Jul 9, 2018

The old implementation literally swapped and then removed, which resulted in unnecessary move instructions. The new implementation does use unsafe code, but it is easy to see that it is correct.

Fixes #52150.
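For readers without the diff at hand, the two approaches described above can be sketched as free functions over a `Vec<T>`. This is an illustration, not the code from the PR: the names `swap_remove_old` and `swap_remove_new` are hypothetical, and the sketch uses the public `as_mut_ptr` and `set_len` methods because code outside the standard library cannot touch the private `len` field.

```rust
use std::ptr;

/// Hypothetical sketch of the old approach: swap the target with the last
/// element, then pop it off the end.
fn swap_remove_old<T>(v: &mut Vec<T>, index: usize) -> T {
    let len = v.len();
    assert!(index < len, "swap_remove index out of bounds");
    v.swap(index, len - 1);
    v.pop().unwrap()
}

/// Hypothetical sketch of the new approach: move the last element straight
/// into the hole and return the hole's old contents, without writing the
/// removed value back to the end of the vector.
fn swap_remove_new<T>(v: &mut Vec<T>, index: usize) -> T {
    let len = v.len();
    assert!(index < len, "swap_remove index out of bounds");
    unsafe {
        let base = v.as_mut_ptr();
        // Bitwise-copy the last element out; the slot's bytes are left as-is.
        let last = ptr::read(base.add(len - 1));
        // Shrink first, so the vector no longer owns the old last slot.
        v.set_len(len - 1);
        // Write `last` into the hole and hand back what was stored there.
        ptr::replace(base.add(index), last)
    }
}
```

Both functions leave the vector in the same state; the second simply skips the redundant write of the removed value into the last slot.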

@rust-highfive
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 9, 2018
// safe even when index == self.len() - 1, as pop() only uses
// ptr::read and leaves the memory at self[index] untouched.
let hole: *mut T = &mut self[index];
ptr::replace(hole, self.pop().unwrap())
Member


I can't help but feel that it would be clearer and require less explanation to first move the element with ptr::replace, then decrease the size, rather than relying on the implementation of pop.

Contributor Author


I agree - in fact I had that implementation at some point. However in the discussion of #52150 the emphasis was on less unsafe code, so I may have gone overboard a bit. I'll change it.

Member


I'm less concerned about the quantity, and more concerned that the unsafe code we do have should be as straightforward and obvious as possible.

@rust-highfive
Contributor

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.


[00:04:50] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:50] tidy error: /checkout/src/liballoc/vec.rs:813: trailing whitespace
[00:04:52] some tidy checks failed
[00:04:52] 
[00:04:52] 
[00:04:52] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:52] 
[00:04:52] 
[00:04:52] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:52] Build completed unsuccessfully in 0:00:46
[00:04:52] Build completed unsuccessfully in 0:00:46
[00:04:52] Makefile:79: recipe for target 'tidy' failed
[00:04:52] make: *** [tidy] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:09c5b288
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
---
travis_time:end:00e263d1:start=1531110036905404064,finish=1531110036912062911,duration=6658847
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:04ec144f
$ head -30 ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
head: cannot open ‘./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers’ for reading: No such file or directory
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:28c2c6c6
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@joshtriplett
Member

This seems much more straightforward to me. 👍

@joshtriplett
Member

@bors r+

@bors
Collaborator

bors commented Jul 9, 2018

📌 Commit e529dfd has been approved by joshtriplett

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 9, 2018
@bors
Collaborator

bors commented Jul 9, 2018

⌛ Testing commit e529dfd with merge a80a610...

bors added a commit that referenced this pull request Jul 9, 2018
@bors
Collaborator

bors commented Jul 9, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: joshtriplett
Pushing a80a610 to master...

@bors bors merged commit e529dfd into rust-lang:master Jul 9, 2018
@hanna-kruppe
Contributor

Sorry, yeah, absolute number of unsafe widgets is clearly less important than the subtlety of their correctness (I mentioned that in my first comment in #52150 but was sloppy with it). However, I found the replace(hole, pop()) formulation quite intuitive -- that's part of why I suggested making a pull request. The only thing that's making me reconsider now is an edge case that doesn't appear to be explicitly mentioned by either the original comment or by @joshtriplett, namely: if len == 1 on entry, then pop would touch self[0] but it's read again by ptr::replace and that second read is what's returned. This is fine if pop doesn't overwrite the memory it moves from (which seems very obvious, and in libstd we can certainly assume it) but it is unnecessarily subtle.

But speaking of unnecessary subtlety, the get_unchecked that's now in the merged code is completely superfluous for performance and in fact makes me more worried about correctness than anything else (since it's omitting a bounds check).

@joshtriplett
Member

joshtriplett commented Jul 9, 2018

@rkruppe That's a good catch, and no less of a subtlety than the original formulation. I almost suggested a second simplification, which would have also avoided that, but I ended up dismissing the difference; clearly I shouldn't have.

Rather than ptr::read then decreasing length then ptr::replace, why not start out with a ptr::replace of the indexed element from the last element, save the result (the indexed element) into a temporary, decrease the length, then return the temporary?
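One possible reading of this suggestion is sketched below as a free function. The name `swap_remove_replace_first` is hypothetical, and `as_mut_ptr`/`set_len` stand in for direct access to the private `len` field. The hole is filled first, its old contents are held in a temporary, and only then is the length decreased.

```rust
use std::ptr;

/// Hypothetical sketch: replace first, shrink afterwards.
fn swap_remove_replace_first<T>(v: &mut Vec<T>, index: usize) -> T {
    let len = v.len();
    assert!(index < len, "swap_remove index out of bounds");
    unsafe {
        let base = v.as_mut_ptr();
        // A bitwise copy of the last element is still needed as the
        // replacement value.
        let last = ptr::read(base.add(len - 1));
        // Fill the hole and keep the removed element in a temporary.
        let removed = ptr::replace(base.add(index), last);
        // Nothing between the copy and this line can panic, so the duplicate
        // in the last slot is never observed or dropped.
        v.set_len(len - 1);
        removed
    }
}
```

The observable behaviour matches the merged ordering; the difference is only in when the length is updated and whether a named temporary is live across that update.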

@orlp
Contributor Author

orlp commented Jul 9, 2018

@rkruppe

> But speaking of unnecessary subtlety, the get_unchecked that's now in the merged code is completely superfluous for performance and in fact makes me more worried about correctness than anything else (since it's omitting a bounds check).

There is an explicit comment explaining that there must be a last element if the bounds check of hole succeeded.

> The only thing that's making me reconsider now is an edge case that doesn't appear to be explicitly mentioned by either the original comment or by @joshtriplett, namely: if len == 1 on entry, then pop would touch self[0] but it's read again by ptr::replace and that second read is what's returned.

That subtlety is mentioned in the committed versions, and it's not limited to len == 1. It applies in general whenever swap_remove is called on the last element.
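The edge case under discussion (the removed index is also the last index, so the hole and the last slot coincide) can be exercised through the public `Vec::swap_remove` API. The helper name `remove_last_slot` below is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: removes the element at the last valid index, which is
/// the case where the hole and the last slot are the same memory location.
fn remove_last_slot(mut v: Vec<String>) -> (String, Vec<String>) {
    assert!(!v.is_empty(), "needs at least one element");
    let last_index = v.len() - 1;
    let removed = v.swap_remove(last_index);
    (removed, v)
}
```

With a one-element vector this is the len == 1 situation; with a longer vector it is the more general last-element situation mentioned in the reply.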

> Rather than ptr::read then decreasing length then ptr::replace, why not start out with a ptr::replace of the indexed element from the last element, save the result (the indexed element) into a temporary, decrease the length, then return the temporary?

I considered it, but the assembly generated looked less favorable, and most importantly, I followed the implementation of pop, which also avoids a temporary. Decreasing the length first is perfectly safe.

@hanna-kruppe
Contributor

> There is an explicit comment explaining that there must be a last element if the bounds check of hole succeeded.

Yes, but one still has to think that through (and verify that the code actually corresponds to what the comment describes), and it has no advantage either. Using an unchecked method where safe indexing would do needs to be motivated. (It also looks uglier, but that's only a very minor aspect.)
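A sketch of what reading the last element through ordinary checked indexing might look like, written as a free function. The name `swap_remove_checked` is hypothetical, and the public `as_mut_ptr`/`set_len` methods are used in place of the private `len` field. If the reasoning about the length were ever wrong, the indexing expression would panic rather than read out of bounds.

```rust
use std::ptr;

/// Hypothetical sketch: the last element is read via `v[len - 1]`, so the
/// read is bounds-checked instead of relying on a comment for correctness.
fn swap_remove_checked<T>(v: &mut Vec<T>, index: usize) -> T {
    let len = v.len();
    assert!(index < len, "swap_remove index out of bounds");
    unsafe {
        // Checked indexing; the temporary shared borrow ends with this line.
        let last = ptr::read(&v[len - 1]);
        v.set_len(len - 1);
        // The slot at `index` is still inside the allocation even when
        // `index == len - 1`, so writing through the raw pointer is in bounds.
        ptr::replace(v.as_mut_ptr().add(index), last)
    }
}
```

Because the `assert!` already guarantees `len >= 1`, an optimizer may well remove the second bounds check, which is in line with the claim that the unchecked call buys nothing in performance.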

> That subtlety is mentioned in the committed versions, and it's not limited to len == 1. It applies in general whenever swap_remove is called on the last element.

Good point about it applying whenever removing the last element, but my point is specifically about pop. It's not completely obvious whether it might overwrite the element that it moved from and thereby wreak havoc, and the fact that it took me a while to even realize that the pop-based implementation relies on this (undocumented!) guarantee doesn't give me much confidence. In contrast, it's quite obvious that ptr::read doesn't cause problems for the later read by ptr::replace.

> I considered it, but the assembly generated looked less favorable, and most importantly, I followed the implementation of pop, which also avoids a temporary.

I haven't looked at assembly myself, but this sounds plausible. Ideally we shouldn't need to worry about temporaries, but unfortunately our optimizations around temporaries and memcpys are still less than stellar. (In contrast, specializing swap_remove to not write the to-be-removed value back to the end of the Vec is an algorithmic optimization that we can't reasonably expect any optimizer to do for us, so I have no qualms about that.)

let hole: *mut T = &mut self[index];
let last = ptr::read(self.get_unchecked(self.len - 1));
self.len -= 1;
ptr::replace(hole, last)
Contributor


The item on hole doesn't seem to be dropped with this change, does it?

Member


That item is returned.
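The ownership point can be checked with a small drop counter against the public `Vec::swap_remove`. The `Tracked` type and the `drop_counts` function below are hypothetical names made up for this illustration. The removed element is not dropped by `swap_remove`; it is dropped exactly once, when the caller lets go of the returned value.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical element type that counts how many times it has been dropped.
struct Tracked {
    id: u32,
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns (id of the removed element, drops right after swap_remove,
/// drops after dropping the returned element, drops after dropping the Vec).
fn drop_counts() -> (u32, u32, u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let mut v: Vec<Tracked> = (0..3)
        .map(|id| Tracked { id, drops: Rc::clone(&drops) })
        .collect();

    let removed = v.swap_remove(0);
    let after_remove = drops.get(); // nothing has been dropped yet
    let removed_id = removed.id;

    drop(removed);
    let after_drop = drops.get(); // only the returned element was dropped

    drop(v);
    let after_vec = drops.get(); // the two remaining elements follow

    (removed_id, after_remove, after_drop, after_vec)
}
```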
