Skip to content

Support types other than String and Int for partition columns #1154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 19, 2025

Conversation

miclegr
Copy link
Contributor

@miclegr miclegr commented Jun 17, 2025

Which issue does this PR close?

Closes #1155

Rationale for this change

Following datafusion PR4221, this PR exposes the functionality in the python bindings

What changes are included in this PR?

Change of the table_partition_cols arguments in PySessionContext.register_listing_table,
PySessionContext.register_parquet, PySessionContext.register_avro, PySessionContext.register_json,
PySessionContext.read_parquet, PySessionContext.read_avro, PySessionContext.read_json

Are there any user-facing changes?

Yes, the signature of the functions above is changed

Copy link
Contributor

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This is a very nice addition. I have one concern about the breaking change without giving users time to adjust. Otherwise I would like to get this into DF48 release.

@@ -535,7 +535,7 @@ def register_listing_table(
self,
name: str,
path: str | pathlib.Path,
table_partition_cols: list[tuple[str, str]] | None = None,
table_partition_cols: list[tuple[str, pa.DataType]] | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid a breaking change, we could make the type hint

table_partition_cols: list[tuple[str, pa.DataType]] | list[tuple[str, str]] | None = None

And then in the python code just below we can do type checking of the table_partition_cols and coerce to pyarrow data type. Alternatively we could overload the method signature and mark the old one as deprecated for a couple of releases. Either way we could avoid a breaking change without giving users the opportunity to upgrade at their schedule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, I opted for overloading and raising the deprecation warning

Copy link
Contributor

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for an excellent contribution!

@timsaucer timsaucer merged commit 9b6acec into apache:main Jun 19, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support types other than String and Int for partition columns
3 participants