
fix: error reading multiple batches of Dict(_, FixedSizeBinary(_)) #7585


Merged



@albertlockett albertlockett commented Jun 1, 2025

Which issue does this PR close?

Rationale for this change

See comment here for what I think is the root cause of the issue: #7545 (comment)

What changes are included in this PR?

Are there any user-facing changes?

No breaking changes. Reading a Parquet file into record batches with Arrow type Dictionary(_, FixedSizeBinary(_)) no longer fails when multiple batches are collected.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 1, 2025
@albertlockett albertlockett force-pushed the 7545-fix-reading-dict-fsb-all-batches branch from 5c504a5 to ef8e835 Compare June 1, 2025 16:55

@alamb alamb left a comment


Thank you @albertlockett -- this makes sense to me

cc @etseidl

@@ -2158,6 +2158,41 @@ mod tests {
        );
    }

    #[test]
    fn test_read_dict_fixed_size_binary() {

I verified this test fails without the code change:


thread 'arrow::arrow_reader::tests::test_read_dict_fixed_size_binary' panicked at parquet/src/arrow/buffer/offset_buffer.rs:133:48:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Expected 1 buffers in array of type FixedSizeBinary(8), got 2")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
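
For readers unfamiliar with the buffer counts in the panic message: in Arrow, a variable-length binary array carries two data buffers (offsets and values), while a FixedSizeBinary array carries a single values buffer. The sketch below illustrates that difference using only the standard library. It is not the change made in this PR; `VarLenBytes` and `to_fixed_width` are hypothetical names introduced for illustration, and the real arrow-rs types are not used.

```rust
/// Hypothetical stand-in for a variable-length byte layout: two buffers.
struct VarLenBytes {
    /// `offsets[i]..offsets[i + 1]` delimits value `i` inside `values`.
    offsets: Vec<usize>,
    /// All values concatenated back to back.
    values: Vec<u8>,
}

/// Repack the two-buffer layout into one fixed-width buffer.
/// Returns an error if an offset pair is invalid or a value is not `width` bytes long.
fn to_fixed_width(src: &VarLenBytes, width: usize) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(src.values.len());
    for (i, pair) in src.offsets.windows(2).enumerate() {
        let (start, end) = (pair[0], pair[1]);
        if end < start || end > src.values.len() {
            return Err(format!("value {} has invalid offsets {}..{}", i, start, end));
        }
        if end - start != width {
            return Err(format!(
                "value {} has length {}, expected {}",
                i,
                end - start,
                width
            ));
        }
        // A fixed-width layout needs no offsets: value i lives at i * width.
        out.extend_from_slice(&src.values[start..end]);
    }
    Ok(out)
}
```

Passing two 8-byte values through `to_fixed_width(&src, 8)` yields a single 16-byte buffer, which matches the one-buffer shape that a FixedSizeBinary(8) array expects.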


@etseidl etseidl left a comment


Looks good. Just one little nit

@alamb alamb merged commit e814b97 into apache:main Jun 3, 2025
16 checks passed

alamb commented Jun 3, 2025

Thanks again @albertlockett and @etseidl
