Commit 95965c3

Move possible performance improvements into their own file
Keeping the current iteration in tasks.md more focussed on what to accomplish. Related to #1
1 parent 95f8af2 commit 95965c3

3 files changed: +21 -12 lines


README.md

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ Please see _'Development Status'_ for a listing of all crates and their capabili
 * _various memory options allow trading off speed for lower memory consumption_
 * [ ] resolve 'thin' packs
 * [ ] encode
+* [ ] Add support for zlib-ng for 2.5x compression performance and 20% faster decompression
 * [ ] create new pack
 * [ ] create 'thin' pack
 * [x] verify pack with statistics
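
The zlib-ng item added above boils down to swapping the zlib backend behind a build-time feature toggle rather than changing call sites. As a rough sketch (assuming `flate2` as the abstraction layer, as discussed in performance-tasks.md below; the function name and buffer handling are illustrative, not gitoxide's code), a decompression call site stays identical whether the backend is the pure-Rust miniz_oxide or libz-sys/zlib-ng selected at build time:

```rust
use std::io::Read;

use flate2::read::ZlibDecoder;

/// Inflate a zlib-compressed pack entry into a Vec.
/// The call site is backend-agnostic: whether flate2 is compiled against
/// miniz_oxide (pure Rust) or libz-sys/zlib-ng is a Cargo build decision
/// and does not change this code.
fn decompress_entry(compressed: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    ZlibDecoder::new(compressed).read_to_end(&mut out)?;
    Ok(out)
}
```

Keeping the pure-Rust backend as the default and gating a C-based backend behind an opt-in feature matches the "feature toggled" note in performance-tasks.md below.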

performance-tasks.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+### Potential for improving performance
+
+* [ ] @joshtriplett writes: "Regarding decompression performance, try replacing miniz_oxide with a better zlib decoder. Build with libz-sys, and then try substituting zlib-ng built with --zlib-compat. (I'm working on making that easier.) That should substantially improve decompression."
+  * @joshtriplett writes: "As far as I know, I'm not aware of flate2 adding any significant overhead, and it provides fairly low-level interfaces in addition to high-level ones. If there's a good reason to, you could use libz-sys directly, but that's a less safe interface. Either way, if you port to libz-sys or to a crate like flate2 that's based on libz-sys, that'll make it trivial to switch to zlib-ng later, as well as making it easy to test zlib-ng now via LD_LIBRARY_PATH."
+  * potential [savings: MASSIVE](https://github.com/Byron/gitoxide/issues/1#issuecomment-672626465)
+  * Note that this should only be feature toggled. Using any of the above would replace a pure Rust implementation, which we would always like to keep as an option for those who want maximum safety.
+* [ ] Add more control over the amount of memory used for the `less-memory` algorithm of `pack-verify` to increase the cache hit rate at the cost of memory.
+  Note that depending on this setting, it might not be needed anymore to iterate over sorted offsets, freeing 150MB of memory in the process
+  that could be used for the improved cache. With the current cache and no sorted offsets, the time nearly triples.
+* [ ] _progress measuring costs when using 96 cores_ (see [this comment][josh-aug-12])
+  * potential savings: low
+* [ ] Add a '--chunk|batch-size' flag to `pack-verify` and `index-from-pack` to allow tuning sizes for large numbers of cores
+  * @joshtriplett writes: "I did find that algorithm when I was looking for the chunk size, though I didn't dig into the details. As a quick hack, I tried dropping the upper number from 1000 to 250, which made no apparent difference in performance."
+  * potential savings: ~~medium~~ unclear
+* [ ] On 96-core machines, it takes visible time until all threads are started and have work. Is it because starting 100 threads takes so long? Or is it contention to get work?
+* [ ] Improve the cache hit rate of the `lookup` pack traversal by using partial DAGs built with the help of the index
+  * @joshtriplett writes: "Would it be possible, with some care, to use the index to figure out in advance which objects will be needed again and which ones won't? Could you compute a small DAG of objects you need for deltas (without storing the objects themselves), and use that to decide the order you process objects in?"
+  * Note that there is tension between adding more latency to build such a tree and the algorithm's ability to (otherwise) start instantly.
+  * potential savings: unknown
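
To make the '--chunk|batch-size' idea above a bit more concrete, here is a purely hypothetical heuristic sketch; the helper name, the 50/1000 bounds, and the chunks-per-thread factor are invented for illustration and are not gitoxide's implementation:

```rust
/// Hypothetical chunk-size heuristic for splitting `num_items` of work across
/// `num_threads`. Each thread gets several chunks so stragglers can be rebalanced,
/// and the result is clamped to a range; a `--chunk-size` CLI flag would simply
/// bypass the computation.
fn chunk_size(num_items: usize, num_threads: usize, cli_override: Option<usize>) -> usize {
    const MIN_CHUNK: usize = 50; // invented lower bound
    const MAX_CHUNK: usize = 1000; // invented upper bound, cf. the 1000 -> 250 experiment quoted above
    const CHUNKS_PER_THREAD: usize = 4; // invented balancing factor

    if let Some(size) = cli_override {
        return size.max(1);
    }
    let target = num_items / (num_threads * CHUNKS_PER_THREAD).max(1);
    target.clamp(MIN_CHUNK, MAX_CHUNK)
}
```

A flag like this would let users on 96-core machines try values such as 250 without rebuilding, mirroring the experiment quoted above.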
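
One possible reading of the partial-DAG idea above, again as an assumption rather than gitoxide's design: derive per-base child counts from the index so a traversal cache knows how long a decoded base object will still be needed. `EntryInfo` and the eviction rule are invented for this sketch:

```rust
use std::collections::HashMap;

/// Hypothetical, reduced view of a pack index entry: if the entry is a delta,
/// this records the pack offset of the base object it applies to.
struct EntryInfo {
    base_offset: Option<u64>,
}

/// Build the "partial DAG" as a child count per base offset. A traversal cache
/// could keep a decoded base object alive while deltas still refer to it and
/// evict it the moment its count reaches zero, improving hit rates without
/// storing the objects themselves up front.
fn remaining_children(entries: &[EntryInfo]) -> HashMap<u64, u32> {
    let mut counts = HashMap::new();
    for entry in entries {
        if let Some(base) = entry.base_offset {
            *counts.entry(base).or_insert(0u32) += 1;
        }
    }
    counts
}
```

Building such counts adds up-front latency proportional to the number of index entries, which is exactly the tension the note above points out.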

tasks.md

Lines changed: 1 addition & 12 deletions
@@ -25,21 +25,10 @@ To be picked in any order….

 * **prodash**
   * [ ] finish transitioning to futures-lite to get rid of futures-util dependency to reduce compile times
-* **gitoxide performance**
-  * [ ] @joshtriplett writes: "Regarding decompression performance, try replacing miniz_oxide with a better zlib decoder. Build with libz-sys, and then try substituting zlib-ng built with --zlib-compat. (I'm working on making that easier.) That should substantially improve decompression."
-    * potential [savings: MASSIVE](https://github.com/Byron/gitoxide/issues/1#issuecomment-672626465)
-    * Note that this should only be feature toggled. Using any of the above would replace a pure Rust implementation, which we would always like to keep as an option for those who want maximum safety.
-  * [ ] Add '--chunk|batch-size' flag to `pack-verify` and `index-from-pack` to allow tuning sizes for large amounts of cores
-    * potential savings: medium
-  * [ ] Add more control over the amount of memory used for the `less-memory` algorithm of `pack-verify` to increase cache hit rate at the cost of memory.
-    Note that depending on this setting, it might not be needed anymore to iterated over sorted offsets, freeing 150MB of memory in the process
-    that could be used for the improved cache. With the current cache and no sorted offsets, the time nearly triples.
-  * [ ] _progress measuring costs when using 96 cores_ (see [this comment][josh-aug-12])
-    * potential savings: low
 * **criner**
   * [ ] switch to `isahc`
     seems to allow async-reading of bodies, allowing to get rid of reqwest and tokio. Redirect is configurable.
 * **miniz-oxide**
-  * Get [our PR](https://github.com/Frommi/miniz_oxide/pull/92) merged
+  * Get [this PR](https://github.com/Frommi/miniz_oxide/pull/91) merged for faster reset performance

 [josh-aug-12]: https://github.com/Byron/gitoxide/issues/1#issuecomment-672566602
