[Server] Introduce Default Max bucket number for a table #811

Merged: 6 commits, on May 10, 2025

Conversation

MehulBatra
Contributor

@MehulBatra MehulBatra commented Apr 30, 2025

Purpose

Introduce a cluster-level config that limits the maximum number of buckets that can be created for a table (counted across all partitions for a partitioned table), to avoid creating too many buckets. The default is 128000.

Linked issue: close #688, #582

Brief change log

  • New configuration parameter max.bucket.num with default value 128000 to limit the total number of buckets across all partitions of a table
  • New exception class TooManyBucketsException extending ApiException to handle bucket limit violations
  • Method getTotalBucketNumber(TablePath tablePath) in ZooKeeperClient to count buckets across partitions
  • Check in TableDescriptorValidation.checkDistribution() to enforce the limit during table creation
  • Bucket count validation in MetadataManager.createPartition() to enforce limits
  • Exception handling in AutoPartitionManager to gracefully handle bucket limit exceptions
  • Updated the MetadataManager constructor to store maxBucketNum as a final field so the limit is applied consistently
  • Updated AutoPartitionManager.createPartitions() to catch and log TooManyBucketsException during auto-partitioning
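
The enforcement described in the change log can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the class `BucketLimitSketch`, the helper `checkBucketLimit`, and the local `TooManyBucketsException` are hypothetical stand-ins, and the sketch assumes that a total exactly equal to the limit is still allowed. Only the config name `max.bucket.num` and its default of 128000 come from the description above.

```java
// Illustrative sketch only; names mirror the PR description, but the bodies are assumptions.
class BucketLimitSketch {

    /** Hypothetical stand-in for the TooManyBucketsException mentioned in the PR. */
    static class TooManyBucketsException extends RuntimeException {
        TooManyBucketsException(String message) {
            super(message);
        }
    }

    /** Default value of max.bucket.num as stated in the PR description. */
    static final int DEFAULT_MAX_BUCKET_NUM = 128_000;

    /**
     * Rejects a request whose buckets, added to the buckets the table already has,
     * would exceed the configured limit. Assumes "equal to the limit" is allowed.
     */
    static void checkBucketLimit(long existingBuckets, int bucketsToAdd, int maxBucketNum) {
        long total = existingBuckets + bucketsToAdd;
        if (total > maxBucketNum) {
            throw new TooManyBucketsException(
                    "Total buckets " + total + " would exceed max.bucket.num = " + maxBucketNum);
        }
    }

    public static void main(String[] args) {
        // Table creation: nothing exists yet, so only the requested bucket count matters.
        checkBucketLimit(0, 64, DEFAULT_MAX_BUCKET_NUM);

        // Adding a partition: the buckets of all existing partitions are counted first.
        int bucketsPerPartition = 64;
        long existing = 1_999L * bucketsPerPartition; // 127,936 buckets already in use

        // 127,936 + 64 = 128,000, which does not exceed the limit, so this passes.
        checkBucketLimit(existing, bucketsPerPartition, DEFAULT_MAX_BUCKET_NUM);

        try {
            // One more partition would bring the total to 128,064, so this is rejected.
            checkBucketLimit(existing + bucketsPerPartition, bucketsPerPartition,
                    DEFAULT_MAX_BUCKET_NUM);
        } catch (TooManyBucketsException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The same check serves both call sites named in the change log: at table creation the existing count is zero, while at partition creation it is the sum over the table's current partitions.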

Tests

  • Added testAddTooManyBuckets() to FlussAdminITCase to verify the API behavior when bucket limits are reached for a partitioned table.
  • Added testMaxBucketNum() to AutoPartitionManagerTest to verify that auto-partitioning respects bucket limits
  • Added testBucketLimitForNonPartitionedTable() to FlussAdminITCase to verify the API behavior when bucket limits are reached for a non-partitioned table
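
The auto-partitioning behavior that these tests target can be sketched roughly as follows. This is a hedged illustration, not the project's implementation: `AutoPartitionSketch`, its `Table` class, and the choice to skip an over-limit partition and continue with the rest are all assumptions made for the example. The PR itself only states that the exception is caught and logged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; all types here are hypothetical stand-ins.
class AutoPartitionSketch {

    /** Hypothetical stand-in for the TooManyBucketsException mentioned in the PR. */
    static class TooManyBucketsException extends RuntimeException {
        TooManyBucketsException(String message) {
            super(message);
        }
    }

    /** Hypothetical in-memory table with a fixed bucket count per partition. */
    static class Table {
        final int bucketsPerPartition;
        final int maxBucketNum;
        final List<String> partitions = new ArrayList<>();

        Table(int bucketsPerPartition, int maxBucketNum) {
            this.bucketsPerPartition = bucketsPerPartition;
            this.maxBucketNum = maxBucketNum;
        }

        /** Adds a partition unless the table's total bucket count would exceed the limit. */
        void createPartition(String name) {
            long total = (long) (partitions.size() + 1) * bucketsPerPartition;
            if (total > maxBucketNum) {
                throw new TooManyBucketsException(
                        "Partition " + name + " would raise total buckets to " + total
                                + ", above the limit of " + maxBucketNum);
            }
            partitions.add(name);
        }
    }

    /**
     * Tries to pre-create the given partitions. A bucket-limit violation is logged
     * instead of propagated, so the background task keeps running. Returns the
     * number of partitions actually created.
     */
    static int createPartitions(Table table, List<String> names) {
        int created = 0;
        for (String name : names) {
            try {
                table.createPartition(name);
                created++;
            } catch (TooManyBucketsException e) {
                System.err.println("Skipping auto-partition: " + e.getMessage());
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Table table = new Table(10, 30); // room for three partitions of 10 buckets each
        int created = createPartitions(table, Arrays.asList("p1", "p2", "p3", "p4", "p5"));
        System.out.println("Created " + created + " partitions: " + table.partitions);
    }
}
```

Catching the exception inside the loop keeps a limit violation from failing the whole auto-partitioning pass, while an explicit request through the admin API would still surface the error to the caller.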

Documentation

  • Added JavaDoc comments explaining the bucket limitation behavior
  • Documented the max.bucket.num limit in maintenance/configuration.md

@MehulBatra MehulBatra changed the title [Server] Respect Max bucket number of a table [Server] Introduce Default Max bucket number for a table Apr 30, 2025
@MehulBatra MehulBatra marked this pull request as draft April 30, 2025 21:17
@MehulBatra MehulBatra marked this pull request as ready for review May 1, 2025 12:24
@MehulBatra
Contributor Author

@wuchong, please help me review it. I have tried to do it the way we discussed in #688
Thank you!

@MehulBatra MehulBatra requested a review from wuchong May 6, 2025 10:59
@MehulBatra
Contributor Author

@wuchong, I have addressed the comments. Please have a look whenever you get a chance.

Member

@wuchong wuchong left a comment

LGTM. I pushed a commit to fix the minor comments.

@wuchong wuchong force-pushed the max_bucket_limit branch from 0a0dd5c to 9b7f4fc Compare May 10, 2025 10:41
@wuchong wuchong merged commit 735edb0 into alibaba:main May 10, 2025
6 of 7 checks passed
Development

Successfully merging this pull request may close these issues.

Support max bucket number of a table
2 participants