-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: Fix FastParquetImpl.write for non-existent file #28326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you add a test for the bug this fixes |
It seems like the only real way to test this is a GCS-specific test like the s3 roundtrip test here: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/io/test_parquet.py#L525-L527 I think everything else besides the GCS branch of the code ends up ignoring the Any other ideas or suggestions on how to set up a test bucket...? |
I take it nothing in test_gcs is helpful for this? Can you mock the relevant server-side behavior? |
pandas/io/parquet.py
Outdated
@@ -170,7 +170,7 @@ def write( | |||
# And pass the opened s3file to the fastparquet internal impl. | |||
kwargs["open_with"] = lambda path, _: path | |||
else: | |||
path, _, _, _ = get_filepath_or_buffer(path) | |||
path, _, _, _ = get_filepath_or_buffer(path, mode="wb") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change the is_s3_url
on L165 to is_s3_url(path) or is_gcs_url(path)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that works too
56999c9
to
ab4ce24
Compare
@jbrockmendel I took a stab at a test, it doesn't check much but it does fail on master and pass here so 🤷♂😄 |
Oh, whoops, can you add a release notes for 1.0? Also, you could update the comments under the |
@TomAugspurger done! |
Linting issue with that last commit :( Can you run black and repush? Looks like a merge conflict also. |
oops, still not used to black :) done |
FastParquetImpl.write
for non-existent file
@bnaul you have a small linting error:
|
BTW, for black and isort, I really started to appreciate pre-commit hooks to avoid getting those failures after pushing to github. See https://dev.pandas.io/development/contributing.html#python-pep8-black |
Thanks! |
* Fix `FastParquetImpl.write` for non-existent file
* Fix `FastParquetImpl.write` for non-existent file
PyArrowImpl
already correctly opens a non-existent file for writing (https://github.com/pandas-dev/pandas/blob/master/pandas/io/parquet.py#L95), withengine='fastparquet'
this fails for e.g. a GCS URL (though it looks like S3 is already correct):