s3.to_csv ignores pandas_kwargs when dataset is set to True #308

Closed
jasadams opened this issue Jul 1, 2020 · 3 comments
Labels
enhancement New feature or request feature minor release Will be addressed in the next minor release ready to release
@jasadams
Contributor

jasadams commented Jul 1, 2020

If `dataset=True` is passed to `s3.to_csv`, then any additional arguments are ignored. For example, `compression='gzip'`, which I wanted passed through to `pandas.to_csv`, has no effect.

@jasadams jasadams added the bug Something isn't working label Jul 1, 2020
@igorborgest igorborgest added blocked Something is blocking the development feature and removed bug Something isn't working labels Jul 3, 2020
@igorborgest igorborgest self-assigned this Jul 3, 2020
@igorborgest
Contributor

Hi @jasadams,

Unfortunately, Pandas does not support in-memory compression for CSV files, so we will not support it for now, because we need to serialize the data in memory during the S3 upload.

Original Issue: pandas-dev/pandas#22555

For now, I am adding a couple of notes to the docs to address this issue.

Note
----
If `dataset=True`, `pandas_kwargs` will be ignored due to the
restrictive quoting, date_format, escapechar, encoding, etc. required by Athena/Glue Catalog.

Note
----
For now, Pandas does not support in-memory CSV compression.
https://github.com/pandas-dev/pandas/issues/22555
So `compression` will not be supported in Wrangler either.
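
Editorial aside: the limitation above concerns compressing CSV output without touching disk. One possible workaround on older Pandas versions, sketched below, is to serialize to a string first and gzip the bytes with the standard library. This is an illustration only, not what Wrangler does; `gzip_csv_text` is a hypothetical helper name, and the literal CSV string stands in for what `df.to_csv(index=False)` returns when no path is given.

```python
import gzip
import io


def gzip_csv_text(csv_text: str, encoding: str = "utf-8") -> bytes:
    """Compress already-serialized CSV text into gzip bytes, entirely in memory.

    Hypothetical helper for illustration; not part of Pandas or Wrangler.
    """
    buffer = io.BytesIO()
    # GzipFile writes the gzip trailer on close but leaves `buffer` open,
    # so the compressed bytes can still be read afterwards.
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        gz.write(csv_text.encode(encoding))
    return buffer.getvalue()


# Stand-in for the string returned by `df.to_csv(index=False)`.
csv_text = "id,name\n1,a\n2,b\n"
payload = gzip_csv_text(csv_text)

# The payload is a valid gzip stream that round-trips to the original text.
restored = gzip.decompress(payload).decode("utf-8")
```

The resulting `payload` bytes could then be handed to whatever upload call is in use; that step is omitted here because it depends on the client library.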

@igorborgest
Contributor

Hi @jasadams

Pandas 1.2.0 is now available, so we added support for it in the PR above 👆.
Could you give it a try before the official release? You can install that directly from the dev branch:

pip install git+https://github.com/awslabs/aws-data-wrangler.git@write-compressed-text
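
Editorial aside: for context, the Pandas capability referred to here is writing compressed CSV to a binary in-memory buffer. The sketch below assumes pandas 1.2.0 or later is installed. It shows only the Pandas side, not Wrangler's implementation, and the sample DataFrame is made up for illustration.

```python
import gzip
import io

import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# With pandas >= 1.2.0, to_csv can compress into a binary file object.
buffer = io.BytesIO()
df.to_csv(buffer, index=False, encoding="utf-8", compression="gzip")
compressed = buffer.getvalue()

# Decompress to confirm the buffer holds gzip-compressed CSV text.
text = gzip.decompress(compressed).decode("utf-8")
rows = text.splitlines()
```

On earlier Pandas versions this in-memory path is not available, which is the limitation discussed earlier in the thread.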

@igorborgest igorborgest added enhancement New feature or request minor release Will be addressed in the next minor release ready to release and removed blocked Something is blocking the development labels Jan 4, 2021
@igorborgest igorborgest added this to the 2.3.0 milestone Jan 4, 2021
@igorborgest igorborgest self-assigned this Jan 4, 2021
@igorborgest
Contributor

Released in version 2.3.0 🚀
