Add missing _excess methods to Alloc and use them in RawVec #50738


Closed
wants to merge 3 commits

Conversation

gnzlbg
Contributor

@gnzlbg gnzlbg commented May 14, 2018

Adds the following methods to the Alloc trait:

  • alloc_zeroed_excess
  • shrink_in_place_excess
  • grow_in_place_excess

Uses them in the appropriate places within the RawVec implementation.
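For illustration, here is a minimal, self-contained sketch of what one of these signatures could look like on stable Rust. `AllocExcess`, `Excess`, and `Global` below are hypothetical stand-ins for the unstable `Alloc`/`Excess` items of that era, not the real trait:

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

// Stand-in for the unstable `Excess` pair: the pointer plus the usable
// size the allocator actually handed out.
pub struct Excess(pub *mut u8, pub usize);

// Stand-in trait sketching only the zeroed variant; the real trait has
// many more methods.
pub trait AllocExcess {
    unsafe fn alloc_zeroed_excess(&mut self, layout: Layout) -> Result<Excess, ()>;
}

pub struct Global;

impl AllocExcess for Global {
    unsafe fn alloc_zeroed_excess(&mut self, layout: Layout) -> Result<Excess, ()> {
        let ptr = alloc_zeroed(layout);
        if ptr.is_null() {
            Err(())
        } else {
            // A conservative default: report no excess by echoing the
            // requested size back to the caller.
            Ok(Excess(ptr, layout.size()))
        }
    }
}

fn main() {
    let layout = Layout::from_size_align(64, 8).unwrap();
    unsafe {
        let Excess(ptr, usable) = Global.alloc_zeroed_excess(layout).unwrap();
        assert!(usable >= layout.size());
        assert!((0..layout.size()).all(|i| *ptr.add(i) == 0));
        dealloc(ptr, layout);
    }
}
```

A real implementation backed by jemalloc could instead report `nallocx`'s answer as the usable size rather than echoing the request.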

@rust-highfive
Contributor

r? @dtolnay

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 14, 2018
@gnzlbg
Contributor Author

gnzlbg commented May 14, 2018

r? @sfackler

--

cc @glandium

@rust-highfive rust-highfive assigned sfackler and unassigned dtolnay May 14, 2018
@gnzlbg
Contributor Author

gnzlbg commented May 14, 2018

The shrink_in_place_excess method is not currently used anywhere but should be used by RawVec::shrink_to_fit in a subsequent PR.

Also, subsequent PRs should move Vec from using double to using reserve (#50739) and should specialize these methods for jemalloc (where is the Alloc impl for jemalloc? I can't find it anymore; it used to be in liballoc_jemalloc).

@gnzlbg
Contributor Author

gnzlbg commented May 14, 2018

cc @alexcrichton I'll update the jemallocator crate with the specialized implementations once these land on nightly: {grow,shrink}_in_place_excess map directly to xallocx and don't need to call nallocx. Also, {grow,shrink}_in_place for jemalloc then become thin wrappers over the _excess variants that just drop the excess.

@rust-highfive
Contributor

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.


[00:04:59] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:59] tidy error: /checkout/src/liballoc/raw_vec.rs:413: line longer than 100 chars
[00:04:59] tidy error: /checkout/src/liballoc/raw_vec.rs:491: line longer than 100 chars
[00:05:00] some tidy checks failed
[00:05:00] 
[00:05:00] 
[00:05:00] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:05:00] 
[00:05:00] 
[00:05:00] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:05:00] Build completed unsuccessfully in 0:02:00
[00:05:00] Build completed unsuccessfully in 0:02:00
[00:05:00] make: *** [tidy] Error 1
[00:05:00] Makefile:79: recipe for target 'tidy' failed

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:043a73b4
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@sfackler
Member

Could you show some benchmark comparisons between the old and new behavior?

@gnzlbg
Contributor Author

gnzlbg commented May 14, 2018

@sfackler sure, what benchmarks would you like to see?

@sfackler
Member

The existing vec benchmarks would be a good choice: https://github.com/rust-lang/rust/blob/master/src/liballoc/benches/vec.rs

@gnzlbg
Contributor Author

gnzlbg commented May 14, 2018

Yeah I can do those!

@glandium
Contributor

While I think something like this is necessary, I also think that's way too much churn to add to the Alloc API. Background: https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487

@alexcrichton
Member

When this is about ready to land I think we'll also want to do at least a perf.rust-lang.org run; these changes tend to be very performance-sensitive, so we just want to weed out possible regressions if we can!

@gnzlbg
Contributor Author

gnzlbg commented May 16, 2018

After the PR (vec_bench_exc) the benchmarks are faster in most cases (negative diff means faster), but honestly I think all of this is within noise.

 name                                vec_bench_master ns/iter  vec_bench_exc ns/iter  diff ns/iter   diff %  speedup 
 slice::ends_with_same_vector        0                         0                                 0     NaN%    x NaN 
 slice::starts_with_same_vector      0                         0                                 0     NaN%    x NaN 
 vec::bench_clone_0000               14                        13                               -1   -7.14%   x 1.08 
 vec::bench_clone_0010               51 (196 MB/s)             51 (196 MB/s)                     0    0.00%   x 1.00 
 vec::bench_clone_0100               149 (671 MB/s)            139 (719 MB/s)                  -10   -6.71%   x 1.07 
 vec::bench_clone_1000               1,085 (921 MB/s)          1,057 (946 MB/s)                -28   -2.58%   x 1.03 
 vec::bench_clone_from_01_0000_0000  25                        24                               -1   -4.00%   x 1.04 
 vec::bench_clone_from_01_0000_0010  80 (125 MB/s)             77 (129 MB/s)                    -3   -3.75%   x 1.04 
 vec::bench_clone_from_01_0000_0100  212 (471 MB/s)            204 (490 MB/s)                   -8   -3.77%   x 1.04 
 vec::bench_clone_from_01_0000_1000  1,631 (613 MB/s)          1,616 (618 MB/s)                -15   -0.92%   x 1.01 
 vec::bench_clone_from_01_0010_0000  24                        27                                3   12.50%   x 0.89 
 vec::bench_clone_from_01_0010_0010  81 (123 MB/s)             72 (138 MB/s)                    -9  -11.11%   x 1.12 
 vec::bench_clone_from_01_0010_0100  236 (423 MB/s)            204 (490 MB/s)                  -32  -13.56%   x 1.16 
 vec::bench_clone_from_01_0100_0010  81 (123 MB/s)             70 (142 MB/s)                   -11  -13.58%   x 1.16 
 vec::bench_clone_from_01_0100_0100  213 (469 MB/s)            210 (476 MB/s)                   -3   -1.41%   x 1.01 
 vec::bench_clone_from_01_0100_1000  1,715 (583 MB/s)          1,660 (602 MB/s)                -55   -3.21%   x 1.03 
 vec::bench_clone_from_01_1000_0100  230 (434 MB/s)            226 (442 MB/s)                   -4   -1.74%   x 1.02 
 vec::bench_clone_from_01_1000_1000  1,745 (573 MB/s)          1,658 (603 MB/s)                -87   -4.99%   x 1.05 
 vec::bench_clone_from_10_0000_0000  152                       146                              -6   -3.95%   x 1.04 
 vec::bench_clone_from_10_0000_0010  347 (288 MB/s)            321 (311 MB/s)                  -26   -7.49%   x 1.08 
 vec::bench_clone_from_10_0000_0100  1,619 (617 MB/s)          1,513 (660 MB/s)               -106   -6.55%   x 1.07 
 vec::bench_clone_from_10_0000_1000  14,540 (687 MB/s)         14,150 (706 MB/s)              -390   -2.68%   x 1.03 
 vec::bench_clone_from_10_0010_0000  148                       143                              -5   -3.38%   x 1.03 
 vec::bench_clone_from_10_0010_0010  373 (268 MB/s)            321 (311 MB/s)                  -52  -13.94%   x 1.16 
 vec::bench_clone_from_10_0010_0100  1,711 (584 MB/s)          1,513 (660 MB/s)               -198  -11.57%   x 1.13 
 vec::bench_clone_from_10_0100_0010  356 (280 MB/s)            323 (309 MB/s)                  -33   -9.27%   x 1.10 
 vec::bench_clone_from_10_0100_0100  1,719 (581 MB/s)          1,510 (662 MB/s)               -209  -12.16%   x 1.14 
 vec::bench_clone_from_10_0100_1000  14,240 (702 MB/s)         14,201 (704 MB/s)               -39   -0.27%   x 1.00 
 vec::bench_clone_from_10_1000_0100  1,732 (577 MB/s)          1,511 (661 MB/s)               -221  -12.76%   x 1.15 
 vec::bench_clone_from_10_1000_1000  15,541 (643 MB/s)         15,025 (665 MB/s)              -516   -3.32%   x 1.03 
 vec::bench_extend_0000_0000         40                        34                               -6  -15.00%   x 1.18 
 vec::bench_extend_0000_0010         119 (84 MB/s)             110 (90 MB/s)                    -9   -7.56%   x 1.08 
 vec::bench_extend_0000_0100         227 (440 MB/s)            220 (454 MB/s)                   -7   -3.08%   x 1.03 
 vec::bench_extend_0000_1000         1,373 (728 MB/s)          1,332 (750 MB/s)                -41   -2.99%   x 1.03 
 vec::bench_extend_0010_0010         177 (56 MB/s)             180 (55 MB/s)                     3    1.69%   x 0.98 
 vec::bench_extend_0100_0100         410 (243 MB/s)            396 (252 MB/s)                  -14   -3.41%   x 1.04 
 vec::bench_extend_1000_1000         3,133 (319 MB/s)          3,002 (333 MB/s)               -131   -4.18%   x 1.04 
 vec::bench_from_elem_0000           10                        10                                0    0.00%   x 1.00 
 vec::bench_from_elem_0010           52 (192 MB/s)             51 (196 MB/s)                    -1   -1.92%   x 1.02 
 vec::bench_from_elem_0100           168 (595 MB/s)            192 (520 MB/s)                   24   14.29%   x 0.88 
 vec::bench_from_elem_1000           1,284 (778 MB/s)          1,264 (791 MB/s)                -20   -1.56%   x 1.02 
 vec::bench_from_fn_0000             11                        11                                0    0.00%   x 1.00 
 vec::bench_from_fn_0010             54 (185 MB/s)             60 (166 MB/s)                     6   11.11%   x 0.90 
 vec::bench_from_fn_0100             207 (483 MB/s)            228 (438 MB/s)                   21   10.14%   x 0.91 
 vec::bench_from_fn_1000             1,627 (614 MB/s)          1,600 (625 MB/s)                -27   -1.66%   x 1.02 
 vec::bench_from_iter_0000           15                        15                                0    0.00%   x 1.00 
 vec::bench_from_iter_0010           55 (181 MB/s)             51 (196 MB/s)                    -4   -7.27%   x 1.08 
 vec::bench_from_iter_0100           142 (704 MB/s)            142 (704 MB/s)                    0    0.00%   x 1.00 
 vec::bench_from_iter_1000           1,110 (900 MB/s)          1,057 (946 MB/s)                -53   -4.77%   x 1.05 
 vec::bench_from_slice_0000          23                        23                                0    0.00%   x 1.00 
 vec::bench_from_slice_0010          94 (106 MB/s)             110 (90 MB/s)                    16   17.02%   x 0.85 
 vec::bench_from_slice_0100          202 (495 MB/s)            228 (438 MB/s)                   26   12.87%   x 0.89 
 vec::bench_from_slice_1000          1,357 (736 MB/s)          1,320 (757 MB/s)                -37   -2.73%   x 1.03 
 vec::bench_new                      0                         0                                 0     NaN%    x NaN 
 vec::bench_push_all_0000_0000       21                        21                                0    0.00%   x 1.00 
 vec::bench_push_all_0000_0010       61 (163 MB/s)             58 (172 MB/s)                    -3   -4.92%   x 1.05 
 vec::bench_push_all_0000_0100       164 (609 MB/s)            147 (680 MB/s)                  -17  -10.37%   x 1.12 
 vec::bench_push_all_0000_1000       1,089 (918 MB/s)          1,072 (932 MB/s)                -17   -1.56%   x 1.02 
 vec::bench_push_all_0010_0010       139 (71 MB/s)             127 (78 MB/s)                   -12   -8.63%   x 1.09 
 vec::bench_push_all_0100_0100       338 (295 MB/s)            334 (299 MB/s)                   -4   -1.18%   x 1.01 
 vec::bench_push_all_1000_1000       2,979 (335 MB/s)          3,086 (324 MB/s)                107    3.59%   x 0.97 
 vec::bench_push_all_move_0000_0000  36                        37                                1    2.78%   x 0.97 
 vec::bench_push_all_move_0000_0010  118 (84 MB/s)             116 (86 MB/s)                    -2   -1.69%   x 1.02 
 vec::bench_push_all_move_0000_0100  224 (446 MB/s)            234 (427 MB/s)                   10    4.46%   x 0.96 
 vec::bench_push_all_move_0000_1000  1,400 (714 MB/s)          1,352 (739 MB/s)                -48   -3.43%   x 1.04 
 vec::bench_push_all_move_0010_0010  178 (56 MB/s)             196 (51 MB/s)                    18   10.11%   x 0.91 
 vec::bench_push_all_move_0100_0100  399 (250 MB/s)            399 (250 MB/s)                    0    0.00%   x 1.00 
 vec::bench_push_all_move_1000_1000  3,218 (310 MB/s)          3,334 (299 MB/s)                116    3.60%   x 0.97 
 vec::bench_with_capacity_0000       2                         2                                 0    0.00%   x 1.00 
 vec::bench_with_capacity_0010       31 (322 MB/s)             31 (322 MB/s)                     0    0.00%   x 1.00 
 vec::bench_with_capacity_0100       31 (3225 MB/s)            30 (3333 MB/s)                   -1   -3.23%   x 1.03 
 vec::bench_with_capacity_1000       31 (32258 MB/s)           34 (29411 MB/s)                   3    9.68%   x 0.91 
 vec_deque::bench_grow_1025          5,754                     5,546                          -208   -3.61%   x 1.04 
 vec_deque::bench_iter_1000          1,002                     910                             -92   -9.18%   x 1.10 
 vec_deque::bench_mut_iter_1000      983                       924                             -59   -6.00%   x 1.06 
 vec_deque::bench_new                37                        36                               -1   -2.70%   x 1.03 

@alexcrichton perf.rust-lang.org memory consumption results have noise levels of 20-30% (master vs master), and this change would mostly impact memory consumption. This PR does not change the Alloc implementation of jemalloc, which should specialize {grow,shrink}_in_place_excess amongst other things. That should happen once I figure out where that implementation went (I don't know where it is done anymore and haven't been able to find it).

For the run-time parts of the benchmarks, whether that's relevant or not will depend on how much rustc uses Vecs and Strings internally in its performance-critical parts. For example, if we reduced the cost of all Vec operations to zero, by how much would that improve rustc timings? AFAICT rustc uses "ids" and hashtables more often than Vecs and Strings.

@Mark-Simulacrum could you launch a perf run? Does perf.rust-lang.org provide an estimate of the measurement noise? (e.g. running the benchmarks enough times with master should give at least some estimate of the noise)


@glandium

While I think something like this is necessary, I also think that's way too much churn to add to the Alloc API. Background

I am of the opinion that all Alloc methods should always return the Excess, period. If the Excess is not used, rustc should optimize its computation out. This is possible for jemalloc, and should be possible for every allocator, whether it's implemented in C or in Rust. Also, every allocator can return an Excess by just returning the size that was passed in, so it is trivial for anybody to write a newtype that wraps an Alloc and implements Alloc by always dropping whichever Excess the alloc methods return, returning the input size instead.

The only situation in which returning the Excess cannot be optimized out is if usable_size is a side-effecting operation, but given that it isn't in any of the allocators we use today, it suffices to add that as a requirement (taking &self is not enough due to interior mutability).
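The newtype argument above can be sketched like this. The `AllocExcess` trait, `Excess` type, and `Global` allocator are hypothetical stand-ins for the unstable items, not the real API:

```rust
use std::alloc::{alloc, dealloc, Layout};

// Hypothetical excess-returning pair and trait.
pub struct Excess(pub *mut u8, pub usize);

pub trait AllocExcess {
    unsafe fn alloc_excess(&mut self, layout: Layout) -> Result<Excess, ()>;
}

pub struct Global;

impl AllocExcess for Global {
    unsafe fn alloc_excess(&mut self, layout: Layout) -> Result<Excess, ()> {
        let ptr = alloc(layout);
        if ptr.is_null() {
            Err(())
        } else {
            // Echoing the input size back is always a valid Excess.
            Ok(Excess(ptr, layout.size()))
        }
    }
}

// The newtype from the comment above: wraps any excess-returning
// allocator and simply drops the excess for callers that don't want it.
pub struct NoExcess<A>(pub A);

impl<A: AllocExcess> NoExcess<A> {
    pub unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, ()> {
        self.0.alloc_excess(layout).map(|Excess(ptr, _size)| ptr)
    }
}

fn main() {
    let layout = Layout::from_size_align(32, 8).unwrap();
    unsafe {
        let mut a = NoExcess(Global);
        let ptr = a.alloc(layout).unwrap();
        assert!(!ptr.is_null());
        dealloc(ptr, layout);
    }
}
```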

@alexcrichton
Copy link
Member

@bors: try

FWIW a perf run isn't a great way to measure this; it's mostly a "hey look, there's no obvious regression by accident" sort of signal

@bors
Collaborator

bors commented May 16, 2018

⌛ Trying commit 6e2fcbd with merge bf00fd8...

bors added a commit that referenced this pull request May 16, 2018
Add missing _excess methods to Alloc and use them in RawVec

Adds the following methods to the `Alloc` trait:

* `alloc_zeroed_excess`
* `shrink_in_place_excess`
* `grow_in_place_excess`

Uses them in the appropriate places within the `RawVec` implementation, with the exception of `shrink_in_place_excess` which should be used in `RawVec::shrink_to_fit`, but which will be done in a different PR.
@gnzlbg
Contributor Author

gnzlbg commented May 16, 2018

cc @SimonSapin this only touches Alloc but you might want to give this a look as well

@kennytm kennytm added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 16, 2018
@bors
Collaborator

bors commented May 16, 2018

☀️ Test successful - status-travis
State: approved= try=True

@alexcrichton
Member

I've scheduled a perf run, data will show up here when ready

@gnzlbg
Contributor Author

gnzlbg commented May 16, 2018

Awesome, thanks @alexcrichton

@glandium
Contributor

I think the discussion about Excess should happen on https://internals.rust-lang.org/t/pre-rfc-changing-the-alloc-trait/7487. I don't totally disagree that returning Excess could be a good change, but there are serious downsides to it for the callers. Whichever conclusion is reached wrt Excess, it doesn't seem to me that adding more _excess methods is the solution. IOW, I don't think the Alloc trait is going to look like either what it looks like today or what it looks like with this PR's change, so it feels like change for the sake of change that doesn't get us any closer to the final goal.

@gnzlbg
Contributor Author

gnzlbg commented May 16, 2018

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0901r0.html suggests the same. FWIW I implemented that same API on jemalloc (jemalloc/jemalloc#1074 (comment)), and in some cases it is much faster than doing alloc + usable_size.

@glandium
Contributor

Maybe what's missing is for Excess to have some conversion code, like Into<NonNull<Opaque>>, to make it easier for code that doesn't use the size.
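A minimal sketch of that conversion idea, with a hypothetical `Excess` type and `NonNull<u8>` standing in for `NonNull<Opaque>`:

```rust
use std::ptr::NonNull;

// Hypothetical Excess: the real unstable type paired a pointer with the
// usable size.
pub struct Excess(pub NonNull<u8>, pub usize);

// The suggested conversion: code that ignores the usable size can go
// straight from an Excess to the pointer.
impl From<Excess> for NonNull<u8> {
    fn from(e: Excess) -> NonNull<u8> {
        e.0
    }
}

fn main() {
    let mut x = 0u8;
    let e = Excess(NonNull::from(&mut x), 1);
    let p: NonNull<u8> = e.into();
    assert_eq!(unsafe { *p.as_ref() }, 0);
}
```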
I'll note that when I wrote in the internals thread that the compiler doesn't do a good job at optimizing the size out, I was not even talking about an implementation that uses usable_size, but one that just returned the layout size. Presumably, that can be improved.
But pretty please, can we move this discussion to internals?

@gnzlbg
Contributor Author

gnzlbg commented May 16, 2018

I also wondered whether Excess should just return a Layout. Why not alloc(Layout) -> (ptr, Layout)? It might well be that the alignment of the allocation is higher than the requested one. Is that useful? Probably not, but who knows. Worst case, allocators that do not support this just return the input layout as-is.
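That shape might look like the following sketch (a hypothetical free function rather than a trait method; the trivial implementation just echoes the requested layout, exactly the worst case described above):

```rust
use std::alloc::{alloc, dealloc, Layout};

// Hypothetical `alloc(Layout) -> (ptr, Layout)` shape: the allocator
// reports the layout it actually provided, which may have a larger size
// or stricter alignment than requested.
unsafe fn alloc_with_layout(requested: Layout) -> Result<(*mut u8, Layout), ()> {
    let ptr = alloc(requested);
    if ptr.is_null() {
        return Err(());
    }
    // Worst case: return the input layout as-is.
    Ok((ptr, requested))
}

fn main() {
    let req = Layout::from_size_align(24, 8).unwrap();
    unsafe {
        let (ptr, got) = alloc_with_layout(req).unwrap();
        assert!(got.size() >= req.size());
        assert!(got.align() >= req.align());
        // Deallocate with the layout the allocator reported.
        dealloc(ptr, got);
    }
}
```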

But pretty please, can we move this discussion to internals?

I'll try to chime in there tomorrow.

@pietroalbini
Member

Ping from triage @sfackler! What's the status of this PR?

@kennytm
Member

kennytm commented May 21, 2018

@kennytm kennytm removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 21, 2018
@alexcrichton
Member

Great! That shows there aren't any "obvious" regressions

@@ -574,6 +574,10 @@ impl<T> Vec<T> {
/// It will drop down as close as possible to the length but the allocator
/// may still inform the vector that there is space for a few more elements.
///
/// Note: `shrink_to_fit` is a non-binding request. Whether the `Vec` size
Member

In what way will this help someone reading the documentation?

if l <= new_size {
    return Ok(Excess(ptr, u));
} else {
    return Err(CannotReallocInPlace);
Member

How can we end up in this situation? The allocation can always fit less data than it already contains, right?

Contributor Author

@gnzlbg gnzlbg May 22, 2018

There are two issues that interact here:

  • the user is not allowed to call dealloc with an arbitrary size; the size passed to dealloc must be in the range (l, u) returned by usable_size. Sizes in that range must not invoke undefined behavior when passed to dealloc.

  • the only requirement on the sizes (l, u) returned by usable_size is that l <= u. Therefore, a valid usable_size implementation is: usable_size() -> (u, u).

Therefore, we can end up here if the lower bound l on the allocation size returned by usable_size -> (l, u) is larger than new_size, which happens if usable_size() -> (u, u) and new_size < u.

If we were to return Ok in this case, the user would be allowed to call dealloc with new_size, but since that's out of the [l, u] range, that would trigger undefined behavior when calling dealloc.

Does that make sense?
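The invariant being described can be expressed as a small check, with plain numbers standing in for the allocator types:

```rust
// Shrinking in place to `new_size` is only sound if `new_size` stays in
// the (l, u) range that usable_size reports, since dealloc must later
// be called with a size from that range.
fn can_shrink_in_place(usable: (usize, usize), new_size: usize) -> bool {
    let (l, u) = usable;
    l <= new_size && new_size <= u
}

fn main() {
    // usable_size() -> (u, u) is valid (l == u). Shrinking below u would
    // hand the caller a dealloc size outside the permitted range, so it
    // must be rejected.
    assert!(!can_shrink_in_place((64, 64), 32));
    // With a looser lower bound, the same shrink is fine.
    assert!(can_shrink_in_place((16, 64), 32));
}
```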


FWIW I think that the current API of shrink_in_place is a bit broken. For example, malloc does not care about the allocation size on deallocation, so a valid implementation of usable_size for malloc is to return (0, size). This would allow implementing shrink_in_place / shrink_in_place_excess for malloc by just doing nothing and returning Ok, but I don't think that's a useful implementation of shrink_in_place. Even if it were to return Ok only when it was able to shrink the allocation, that might not be useful either. If I have a 1 GB allocation and want to shrink it in place to 100 MB, and shrink_in_place only shrinks it by 1 byte, that's not useful.

The wording that is problematic is that shrink_in_place is allowed to return more size than the one requested, which means that it can just return the original size and do nothing.

For grow_in_place this wording makes sense, but it seems it was copy-pasted into shrink_in_place. So either shrink_in_place should always return the allocation size so that the user can check whether it shrunk the allocation enough, or it should take a requested size and an upper bound, and return an error if it can't shrink the allocation under the upper bound.

Also, while I understand the rationale behind splitting realloc_in_place into {grow,shrink}_in_place, I am starting to doubt whether that was the right call. A single realloc_in_place method would reduce the API surface and simplify things, even if it adds a branch for allocators that have completely different execution paths for growing and shrinking.

In any case, all of this should be discussed in the RFC issue, not here, but those are my thoughts on the matter.
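The second alternative above (a requested size plus an upper bound) could be sketched as follows; the 16-byte size-class model is purely illustrative, not how any real allocator behaves:

```rust
#[derive(Debug, PartialEq)]
struct CannotShrinkEnough;

// Model allocator that rounds sizes up to 16-byte classes (illustrative).
fn round_up(size: usize) -> usize {
    (size + 15) / 16 * 16
}

// Sketch of the proposed shrink API: the caller passes both the size it
// wants and the largest size it is willing to accept. The call fails
// unless the allocation ends up at or under that bound, fixing the
// "shrink by 1 byte and return Ok" loophole described above.
fn shrink_in_place(
    requested: usize,
    upper_bound: usize,
) -> Result<usize, CannotShrinkEnough> {
    let new_size = round_up(requested);
    if new_size <= upper_bound {
        Ok(new_size)
    } else {
        Err(CannotShrinkEnough)
    }
}

fn main() {
    // Requesting 100 bytes with a 128-byte bound succeeds at 112.
    assert_eq!(shrink_in_place(100, 128), Ok(112));
    // A bound tighter than the allocator's granularity fails.
    assert_eq!(shrink_in_place(100, 104), Err(CannotShrinkEnough));
}
```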

@sfackler
Member

On the other hand, it also doesn't show any particular advantage to making this change.

@gnzlbg
Contributor Author

gnzlbg commented May 22, 2018

On the other hand, it also doesn't show any particular advantage to making this change.

Yes, these changes should not have any impact on perf at all, since liballoc_jemalloc does not specialize the _excess methods anymore. One would need to run perf with a rustc compiled with jemallocator configured as the global allocator for that.

@pietroalbini
Member

Ping from triage! What's the status of this PR?

@pietroalbini
Member

Ping from triage @sfackler! What's the status of this PR?

@sfackler
Member

sfackler commented Jun 5, 2018

I'd like to see a concrete performance improvement before adding more complexity here.

@pietroalbini pietroalbini added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 5, 2018
@gnzlbg
Contributor Author

gnzlbg commented Jun 5, 2018

I'd like to see a concrete performance improvement

The liballoc_jemalloc specializations of the Alloc functions were removed from rustc, and the plan is to completely remove liballoc_jemalloc and move to jemallocator.

jemallocator can't implement these APIs until they are part of the Alloc trait. Std can't use these APIs until they are part of the Alloc trait. Perf runs without both std using these APIs and the allocator specializing them are meaningless beyond "this change does not introduce regressions".

So can you explain how to do a meaningful perf run for this? I can try to remove liballoc_jemalloc from rustc, include jemallocator as a third-party dependency, make rustc use it by default, change all of std to use the new APIs, and then report back. Is that what you mean should be done?

@glandium
Contributor

glandium commented Jun 5, 2018

Performance is not the point anyway. The point is aligning the allocated sizes of types like Vec with what the underlying allocator actually allocates.

However, I still think this adds too much API surface.

@gnzlbg
Contributor Author

gnzlbg commented Jun 5, 2018

I have a branch that changes the Alloc trait to use an Allocation type as part of its API (ptr + Layout) and moves from shrink/grow to realloc_in_place. The API surface of Alloc becomes the same as that of GlobalAlloc + realloc_in_place.

@glandium
Contributor

glandium commented Jun 5, 2018

Did you look at the rationale when realloc_in_place was split into grow_in_place/shrink_in_place?

@gnzlbg
Contributor Author

gnzlbg commented Jun 5, 2018

Did you look at the rationale when realloc_in_place was split into grow_in_place/shrink_in_place?

Yes, the rationale for the split was that two different functions make intent clearer and can save a branch in the implementation of Alloc.

AFAIK there is no rationale anywhere for why the current APIs for grow_in_place/shrink_in_place were chosen. AFAICT the split was done naively, which is why the current APIs for shrink_in_place (and shrink_in_place_excess) are not really useful / do not work.

Fixing shrink_in_place so that it can be used in RawVec's shrink_to_fit is too much work for this PR though.

FWIW, whether saving a branch in Alloc was worth adding one or two more methods to Alloc's API was never discussed; back then the concern about Alloc having too many methods did not exist.

The main reason I went with realloc_in_place in my branch was that it made Alloc and GlobalAlloc identical, and I was hoping we could then have one trait instead of two.

@sfackler
Member

sfackler commented Jun 5, 2018

The fact that it's annoying to determine whether a change is actually beneficial doesn't mean we should just YOLO-merge every exotic variant on allocation/reallocation/deallocation/whatever that could possibly exist. I am generally of the opinion that all of the _excess methods are a waste of effort and complexity budget.

If this change can't have any benefit until it's propagated through the global allocator API, then it seems like this change should do that? You could alternatively yank the relevant code out into a separate project and point the allocation calls at something that actually forwards to the right allocator methods for the purpose of benchmarking.

@gnzlbg
Contributor Author

gnzlbg commented Jun 5, 2018 via email

@pietroalbini
Member

Ping from triage! @sfackler, what should we do with this?

@gnzlbg
Contributor Author

gnzlbg commented Jun 18, 2018

@pietroalbini Probably close. This is now in limbo, where the only thing I can do to benchmark this is a synthetic benchmark using jemallocator and a modified version of rustc. For example, creating one vector/string of a particular size and then pushing one element onto it. Depending on whether the vector reallocates or not, the timings vary significantly. So I can always choose a size for which, if there is capacity available, the excess variant takes 1 ns for the push while the non-excess one takes at least 20-30 ns because of the reallocation. Depending on how hard I want to make my point, I can make the vectors huge and choose one of the prime sizes known to be particularly bad for jemalloc, so that I can pump those numbers into the 1000 ns range.
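That synthetic benchmark could be sketched roughly as follows (illustrative sizes only, not the jemalloc-specific ones; timings from a single push are of course extremely noisy):

```rust
use std::hint::black_box;
use std::time::Instant;

// Time a single push into a vector. When len == capacity the push must
// reallocate; with spare capacity it is a plain in-place write.
fn time_push(mut v: Vec<u8>) -> u128 {
    let start = Instant::now();
    v.push(0);
    let elapsed = start.elapsed().as_nanos();
    black_box(&v); // keep the vector (and the push) from being optimized away
    elapsed
}

fn main() {
    let n = 1 << 20; // 1 MiB of u8s; illustrative size

    // Full vector: len == capacity, so the next push reallocates.
    let mut full = Vec::with_capacity(n);
    full.resize(full.capacity(), 0u8);

    // At least one slot of spare capacity: the next push is in place.
    let mut spare = Vec::with_capacity(n + 1);
    spare.resize(n, 0u8);

    println!("push with reallocation: {} ns", time_push(full));
    println!("push in place:          {} ns", time_push(spare));
}
```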

But I don't know what any of that would prove. Before liballoc_jemalloc was crippled we could have used rustc + perf for this, but until we can compile rustc or servo with jemallocator again, measuring the general impact of this is just going to be hard.

@SimonSapin
Contributor

until we can compile rustc or servo with jemallocator again

Off-topic, but you totally can do that in a Servo branch; it just won't pass our Android CI at the moment.

@pietroalbini
Member

Ok, closing this.

Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.

10 participants