-
Notifications
You must be signed in to change notification settings - Fork 114
Support types other than String and Int for partition columns #1154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! This is a very nice addition. I have one concern about the breaking change without giving users time to adjust. Otherwise I would like to get this into DF48 release.
python/datafusion/context.py
Outdated
@@ -535,7 +535,7 @@ def register_listing_table( | |||
self, | |||
name: str, | |||
path: str | pathlib.Path, | |||
table_partition_cols: list[tuple[str, str]] | None = None, | |||
table_partition_cols: list[tuple[str, pa.DataType]] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid a breaking change, we could make the type hint
table_partition_cols: list[tuple[str, pa.DataType]] | list[tuple[str, str]] | None = None
And then in the python code just below we can do type checking of the table_partition_cols
and coerce to pyarrow data type. Alternatively we could overload the method signature and mark the old one as deprecated for a couple of releases. Either way we could avoid a breaking change without giving users the opportunity to upgrade at their schedule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, I opted for overloading and raising the deprecation warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for an excellent contribution!
Which issue does this PR close?
Closes #1155
Rationale for this change
Following datafusion PR4221, this PR exposes the functionality in the python bindings
What changes are included in this PR?
Change of the
table_partition_cols
arguments inPySessionContext.register_listing_table
,PySessionContext.register_parquet
,PySessionContext.register_avro
,PySessionContext.register_json
,PySessionContext.read_parquet
,PySessionContext.read_avro
,PySessionContext.read_json
Are there any user-facing changes?
Yes, the signature of the functions above is changed