### Potential for improving performance

* [ ] @joshtriplett writes: "Regarding decompression performance, try replacing miniz_oxide with a better zlib decoder. Build with libz-sys, and then try substituting zlib-ng built with --zlib-compat. (I'm working on making that easier.) That should substantially improve decompression."
    * @joshtriplett writes: "As far as I know, I'm not aware of flate2 adding any significant overhead, and it provides fairly low-level interfaces in addition to high-level ones. If there's a good reason to, you could use libz-sys directly, but that's a less safe interface. Either way, if you port to libz-sys or to a crate like flate2 that's based on libz-sys, that'll make it trivial to switch to zlib-ng later, as well as making it easy to test zlib-ng now via LD_LIBRARY_PATH."
    * potential [savings: MASSIVE](https://github.com/Byron/gitoxide/issues/1#issuecomment-672626465)
    * Note that any of the above must live behind a feature toggle, as it would replace the pure-Rust implementation, which we always want to keep as an option for those who want maximum safety. A sketch of a backend-agnostic decompression call is shown after this list.
* [ ] Add more control over the amount of memory used for the `less-memory` algorithm of `pack-verify` to increase the cache hit rate at the cost of memory.
  Note that depending on this setting, it might no longer be necessary to iterate over sorted offsets, freeing 150MB of memory in the process
  that could be used for the improved cache. With the current cache and no sorted offsets, the time nearly triples.
* [ ] _progress measuring costs when using 96 cores_ (see [this comment][josh-aug-12])
    * potential savings: low
* [ ] Add a '--chunk|batch-size' flag to `pack-verify` and `index-from-pack` to allow tuning chunk sizes for machines with many cores; see the heuristic sketch after this list.
    * @joshtriplett writes: "I did find that algorithm when I was looking for the chunk size, though I didn't dig into the details. As a quick hack, I tried dropping the upper number from 1000 to 250, which made no apparent difference in performance."
    * potential savings: ~~medium~~ unclear
* [ ] On 96-core machines, it takes visible time until all threads are started and have work. Is it because starting 100 threads takes that long, or is it contention while obtaining work?
* [ ] Improve the cache hit rate of the `lookup` pack traversal by using partial DAGs built with the help of the index
    * @joshtriplett writes: "Would it be possible, with some care, to use the index to figure out in advance which objects will be needed again and which ones won't? Could you compute a small DAG of objects you need for deltas (without storing the objects themselves), and use that to decide the order you process objects in?"
    * Note that there is tension between the added latency of building such a DAG and the algorithm's ability to (otherwise) start instantly. A sketch of such an object-less DAG is shown after this list.
    * potential savings: unknown
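
Regarding the first item: a minimal sketch of what decompression could look like when routed through flate2's low-level `Decompress` interface. The point is that this code stays identical no matter which backend the cargo features select (the pure-Rust `flate2/rust_backend`, i.e. miniz_oxide, or `flate2/zlib`, i.e. libz-sys and thus zlib-ng via `--zlib-compat` or `LD_LIBRARY_PATH`). The function name and the known-output-size assumption are illustrative, not gitoxide's actual code.

```rust
use flate2::{Decompress, FlushDecompress, Status};

/// Inflate one zlib stream from a pack entry into a buffer of known size.
/// Which backend does the work (miniz_oxide, system zlib, or zlib-ng) is
/// decided entirely by flate2's cargo features, not by this code.
fn inflate_entry(compressed: &[u8], decompressed_size: usize) -> Result<Vec<u8>, flate2::DecompressError> {
    let mut out = vec![0u8; decompressed_size];
    // `true`: expect a zlib header, as pack entries have one.
    let mut inflater = Decompress::new(true);
    let status = inflater.decompress(compressed, &mut out, FlushDecompress::Finish)?;
    debug_assert_eq!(status, Status::StreamEnd, "the size recorded in the entry header must match");
    Ok(out)
}
```

With that in place, the feature toggle itself becomes a pure `Cargo.toml` decision, satisfying the requirement that the pure-Rust path remains the default.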
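Regarding the chunk-size flag: the interesting part is the heuristic the flag would override. Below is a sketch, assuming the current code derives the chunk size from the amount of work and the thread count and clamps it to an upper bound (the "1000" from the quote above); all names and the exact formula are hypothetical.

```rust
/// Compute the number of items each worker receives per request for work.
/// `override_size` would be fed by a hypothetical `--chunk-size` flag.
fn chunk_size(num_items: usize, num_threads: usize, override_size: Option<usize>) -> usize {
    override_size.unwrap_or_else(|| {
        // Aim for several chunks per thread so faster threads can pick up more work,
        // but cap the size so progress reporting stays responsive.
        let desired = num_items / (num_threads * 4).max(1);
        desired.clamp(1, 1000)
    })
}
```

Experiments like the "1000 → 250" hack above would then be a one-line invocation change on many-core machines instead of a rebuild.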
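Regarding the last item: a sketch of the object-less DAG idea. Only delta-base offsets and counters are stored, never decompressed data, so it can be built from the index alone before decompression starts. `Entry` is a hypothetical, simplified view of an index entry.

```rust
use std::collections::HashMap;

/// Hypothetical view of an index entry: its pack offset and the offset of
/// the delta base it refers to, if it is a delta object at all.
struct Entry {
    offset: u64,
    base_offset: Option<u64>,
}

/// Count, for every object serving as a delta base, how many deltas still
/// depend on it. This is the "small DAG without the objects themselves":
/// offsets and counters only.
fn remaining_uses(entries: &[Entry]) -> HashMap<u64, u32> {
    let mut uses = HashMap::<u64, u32>::new();
    for entry in entries {
        if let Some(base) = entry.base_offset {
            *uses.entry(base).or_insert(0) += 1;
        }
    }
    uses
}

/// After resolving a delta against `base`, decrement its counter.
/// Returns true once nothing will ever need this base again, i.e. it can be
/// evicted from the cache immediately instead of aging out.
fn mark_used(uses: &mut HashMap<u64, u32>, base: u64) -> bool {
    if let Some(n) = uses.get_mut(&base) {
        *n -= 1;
        if *n == 0 {
            uses.remove(&base);
            return true;
        }
        false
    } else {
        true
    }
}
```

Whether the counters should also influence the *order* of traversal, as @joshtriplett suggests, is left open here; even eviction hints alone might improve the hit rate without delaying startup much.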