[server] Support generating rack aware bucket assignment when creating table #786
+1,211
−144
Purpose
Linked issue: close #785
Currently, Fluss only supports generating bucket assignments with a round-robin strategy. However, in some cases, such as deployments on K8s, we need to avoid placing multiple replicas of one bucket on the same host.
So, in this PR I introduce a rack-aware bucket assignment strategy. For example:
Suppose we have the following set of tabletServers:
ts-id -> machine rack
0 -> "rack1"
1 -> "rack3"
2 -> "rack3"
3 -> "rack2"
4 -> "rack2"
5 -> "rack1"
First we create a rack-alternated tabletServer list:
[0, 3, 1, 5, 4, 2]
The list cycles through the racks in the order rack1 -> rack2 -> rack3 -> rack1 -> rack2 -> rack3, and within the same rack a tabletServer with a lower id comes before one with a higher id.
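A minimal sketch of how such a rack-alternated list could be built. The class `RackAlternatedList` and its `build` method are hypothetical names used only for illustration, and sorting racks by name is an assumption that happens to match the rack1 -> rack2 -> rack3 order of the example; the code in this PR may differ.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not the class added by this PR.
class RackAlternatedList {

    static List<Integer> build(Map<Integer, String> serverToRack) {
        // Group ids by rack. The outer TreeMap sorts racks by name (an assumption);
        // iterating a TreeMap copy of the input keeps ids ascending inside each rack.
        Map<String, List<Integer>> byRack = new TreeMap<>();
        for (Map.Entry<Integer, String> e : new TreeMap<>(serverToRack).entrySet()) {
            byRack.computeIfAbsent(e.getValue(), r -> new ArrayList<>()).add(e.getKey());
        }

        List<Iterator<Integer>> iterators = new ArrayList<>();
        for (List<Integer> ids : byRack.values()) {
            iterators.add(ids.iterator());
        }

        // Take one server from each rack in turn until every server is placed.
        // Racks that run out of servers are simply skipped.
        List<Integer> result = new ArrayList<>();
        while (result.size() < serverToRack.size()) {
            for (Iterator<Integer> it : iterators) {
                if (it.hasNext()) {
                    result.add(it.next());
                }
            }
        }
        return result;
    }
}
```

For the six tabletServers listed above, `RackAlternatedList.build` returns `[0, 3, 1, 5, 4, 2]`.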
Then a simple round-robin assignment can be applied. Assuming 6 buckets with a replication factor of 3, the assignment will be:
bucket0 -> 0,3,1
bucket1 -> 3,1,5
bucket2 -> 1,5,4
bucket3 -> 5,4,2
bucket4 -> 4,2,0
bucket5 -> 2,0,3
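The first pass shown above amounts to sliding a window of `replicationFactor` entries over the rack-alternated list. The sketch below illustrates only that first pass; `FirstRoundAssignment` is a hypothetical name, and the code assumes racks of equal size so that neighbouring entries always sit on different racks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the first round-robin pass only.
class FirstRoundAssignment {

    static List<List<Integer>> assign(List<Integer> alternated, int numBuckets, int replicationFactor) {
        int n = alternated.size();
        List<List<Integer>> result = new ArrayList<>();
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            List<Integer> replicas = new ArrayList<>();
            // The leader is at position bucket % n; followers are the next entries, wrapping around.
            for (int j = 0; j < replicationFactor; j++) {
                replicas.add(alternated.get((bucket + j) % n));
            }
            result.add(replicas);
        }
        return result;
    }
}
```

Calling `assign` with the list `[0, 3, 1, 5, 4, 2]`, 6 buckets, and a replication factor of 3 reproduces the six rows above. Past the sixth bucket this simple window would repeat itself, which is why a shift is needed for later buckets.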
Once the first round-robin pass is complete, if there are more buckets to assign, the algorithm starts shifting the followers. This ensures that we do not always get the same set of sequences. In this case, if there are further buckets to assign (bucket6, bucket7), the assignment will be:
bucket6 -> 0,4,2 (instead of repeating 0,3,1 as bucket0)
bucket7 -> 3,2,0 (instead of repeating 3,1,5 as bucket1)
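One way the follower shift could work is sketched below. The class `RackAwareAssignment`, the fixed start index of 0, and the exact shift formula are assumptions; the formula is similar to the one used for rack-aware replica assignment in Apache Kafka and was chosen because it reproduces the rows above, so it may not match the code in this PR line for line.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and the shift formula are assumptions.
class RackAwareAssignment {

    // Position of the k-th follower candidate, offset from the leader by 1..n-1.
    static int replicaIndex(int leaderIndex, int shift, int k, int n) {
        int offset = 1 + (shift + k) % (n - 1);
        return (leaderIndex + offset) % n;
    }

    static List<List<Integer>> assign(
            List<Integer> alternated, Map<Integer, String> serverToRack,
            int numBuckets, int replicationFactor) {
        int n = alternated.size();
        if (replicationFactor < 1 || replicationFactor > n) {
            throw new IllegalArgumentException("replicationFactor must be between 1 and " + n);
        }
        int numRacks = new HashSet<>(serverToRack.values()).size();
        List<List<Integer>> result = new ArrayList<>();
        int nextReplicaShift = 0;
        for (int bucket = 0; bucket < numBuckets; bucket++) {
            // After each full pass over the server list, shift the followers.
            if (bucket > 0 && bucket % n == 0) {
                nextReplicaShift++;
            }
            int leaderIndex = bucket % n;
            Integer leader = alternated.get(leaderIndex);
            List<Integer> replicas = new ArrayList<>();
            replicas.add(leader);
            Set<String> racksUsed = new HashSet<>();
            racksUsed.add(serverToRack.get(leader));
            int k = 0;
            while (replicas.size() < replicationFactor) {
                Integer candidate =
                        alternated.get(replicaIndex(leaderIndex, nextReplicaShift * numRacks, k, n));
                k++;
                String rack = serverToRack.get(candidate);
                // Prefer a rack without a replica; once every rack has one, any rack is allowed.
                boolean rackOk = !racksUsed.contains(rack) || racksUsed.size() == numRacks;
                if (rackOk && !replicas.contains(candidate)) {
                    replicas.add(candidate);
                    racksUsed.add(rack);
                }
            }
            result.add(replicas);
        }
        return result;
    }
}
```

With the list `[0, 3, 1, 5, 4, 2]`, the rack mapping above, 8 buckets, and a replication factor of 3, this sketch yields the same first six rows as before, then `0,4,2` for bucket6 and `3,2,0` for bucket7.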
Brief change log
Tests
API and Format
Documentation