
[v3] Request coalescing #1758

Open
jhamman opened this issue Apr 5, 2024 · 1 comment
Labels
enhancement New features or improvements

Comments

@jhamman
Member

jhamman commented Apr 5, 2024

Various threads have recently highlighted that request coalescing would be a nice feature to add to the v3 effort. This may be particularly impactful with sharding coming online. We need to decide whether we should handle this and, if so, where (at the store level or elsewhere).

@jhamman jhamman added the V3 label Apr 5, 2024
@jhamman jhamman added this to the After 3.0.0 milestone Apr 5, 2024
@jhamman jhamman moved this to Todo in Zarr-Python - 3.0 Apr 5, 2024
@dstansby dstansby removed the V3 label Dec 12, 2024
@dstansby dstansby added the enhancement New features or improvements label Dec 30, 2024
@aldenks
Contributor

aldenks commented Mar 27, 2025

I'm interested in taking this on, specifically coalescing requests for chunks within the same shard. (Side question: are there other types of coalescing in play?)

I did some profiling of slower-than-anticipated reads from a Zarr array with many small chunks per shard and found that this loop in ShardingCodec._decode_partial_single was the culprit:

            for chunk_coords in all_chunk_coords:
                chunk_byte_slice = shard_index.get_chunk_slice(chunk_coords)
                if chunk_byte_slice:
                    chunk_bytes = await byte_getter.get(
                        prototype=chunk_spec.prototype,
                        byte_range=RangeByteRequest(chunk_byte_slice[0], chunk_byte_slice[1]),
                    )
                    if chunk_bytes:
                        shard_dict[chunk_coords] = chunk_bytes

I had Claude whip up coalescing for those requests, and while the code is a bit of an AI hot mess, it works and gives much higher read throughput (roughly 4 MB/s up to 70 MB/s, i.e. saturating my current network).
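
For concreteness, a rough sketch of the grouping step (plain Python; coalesce_ranges and the max_gap default are placeholders I made up, not anything in zarr-python today):

    def coalesce_ranges(
        ranges: list[tuple[int, int]], max_gap: int = 2**20
    ) -> list[list[tuple[int, int]]]:
        """Group sorted (start, stop) byte ranges whose gaps are <= max_gap bytes."""
        groups: list[list[tuple[int, int]]] = []
        for start, stop in sorted(ranges):
            # Extend the current group if this range starts close enough to the end
            # of the previous one; otherwise start a new group.
            if groups and start - groups[-1][-1][1] <= max_gap:
                groups[-1].append((start, stop))
            else:
                groups.append([(start, stop)])
        return groups

Each group would then turn into a single range request spanning the first start to the last stop.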


In terms of solutions, the two obvious ones to me are:

  • replace the serial for loop with a concurrent_map (see the sketch after this list).
  • do actual coalescing to reduce the request count.
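
A minimal sketch of the first option, using asyncio.gather with a semaphore as a stand-in for concurrent_map (names like byte_getter, chunk_spec, shard_index, and RangeByteRequest are taken from the loop quoted above; the concurrency limit here is an arbitrary assumption):

    import asyncio

    async def fetch_chunks_concurrently(
        all_chunk_coords, shard_index, byte_getter, chunk_spec, limit: int = 10
    ) -> dict:
        # Still one request per chunk, but issued in parallel with a bounded number in flight.
        semaphore = asyncio.Semaphore(limit)

        async def fetch_one(chunk_coords):
            chunk_byte_slice = shard_index.get_chunk_slice(chunk_coords)
            if not chunk_byte_slice:
                return chunk_coords, None
            async with semaphore:
                chunk_bytes = await byte_getter.get(
                    prototype=chunk_spec.prototype,
                    byte_range=RangeByteRequest(chunk_byte_slice[0], chunk_byte_slice[1]),
                )
            return chunk_coords, chunk_bytes

        results = await asyncio.gather(*(fetch_one(c) for c in all_chunk_coords))
        return {coords: data for coords, data in results if data is not None}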

Coalescing seems to me like the ideal longer-term solution (less request overhead, lower costs if paying per request), and a decent implementation shouldn't be too hard. The key questions I see if we go this route are:

  • Is there a maximum size at which we stop coalescing, or do we let it be naturally bounded by the shard size?
  • What's the maximum gap of unnecessary bytes to coalesce over, and should that be a config option? (One point of reference: Rust's object_store defaults to 1 MiB.) See the sketch after this list for how gap-bounded groups might be fetched.
  • Do we make concurrent or serial requests for each of the coalesced groups? My default would be concurrent, but this happens inside an outer concurrent_map in ArrayBytesCodecPartialDecodeMixin.decode_partial, so a nested concurrent_map call could end up not respecting the async.concurrency config value.
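
To make the gap and per-group-request questions concrete, a rough sketch of fetching one coalesced group with a single range request and slicing the per-chunk bytes back out (again reusing the names from the quoted loop; whether the returned buffer supports slicing like this is an assumption on my part):

    async def fetch_coalesced_group(group, byte_getter, chunk_spec, shard_dict) -> None:
        # group: list of (chunk_coords, (start, stop)) pairs, sorted by start, whose
        # gaps have already been checked against the max-gap threshold.
        group_start = group[0][1][0]
        group_stop = group[-1][1][1]
        group_bytes = await byte_getter.get(
            prototype=chunk_spec.prototype,
            byte_range=RangeByteRequest(group_start, group_stop),
        )
        if group_bytes is None:
            return
        for chunk_coords, (start, stop) in group:
            # The bytes between chunks within a group are the "wasted" bytes we chose to read over.
            shard_dict[chunk_coords] = group_bytes[start - group_start : stop - group_start]

Whether these group fetches themselves run serially or through another concurrent_map/gather is exactly the nested-concurrency question above.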

Should I take this on? And if so, does anyone have suggestions or answers to those questions, or see anything I'm missing?

@dstansby dstansby removed this from the After 3.0.0 milestone May 14, 2025