[Parquet][Python] Read and write file/column metadata using pandas attrs #28558

Open
asfimport opened this issue May 18, 2021 · 1 comment
Comments

@asfimport
Collaborator

Related: pandas-dev/pandas#20521

What are the general thoughts on using DataFrame.attrs and Series.attrs for reading and writing metadata to/from Parquet?

For example, here is how the metadata would be written:

import pandas

pdf = pandas.DataFrame({"a": [1]})
pdf.attrs = {"name": "my custom dataset"}
pdf.a.attrs = {"long_name": "Description about data", "nodata": -1, "units": "metre"}
pdf.to_parquet("file.parquet")

Then, when loading in the data:

pdf = pandas.read_parquet("file.parquet")
pdf.attrs

{"name": "my custom dataset"}

pdf.a.attrs

{"long_name": "Description about data", "nodata": -1, "units": "metre"}

Reporter: Alan Snow

Note: This issue was originally created as ARROW-12823. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Alan Snow:
It seems the metadata could be written in get_column_metadata, possibly under a new "attrs" item so that it doesn't conflict with the existing "metadata" item.
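
To make the "attrs" idea concrete, here is a rough sketch of what the stored layout might look like. It uses only the standard library and a hand-written dict shaped like the JSON that pyarrow stores under the `pandas` schema metadata key (a `columns` list whose entries carry `name`, `field_name`, `pandas_type`, `numpy_type` and `metadata`). The helper names `attach_attrs` and `extract_attrs`, and the placement of the `"attrs"` keys, are assumptions for illustration only. This is not how pyarrow or `get_column_metadata` currently behaves.

```python
import json


def attach_attrs(pandas_meta, frame_attrs, column_attrs):
    """Return a copy of pandas_meta with hypothetical "attrs" entries added.

    The existing "metadata" item of each column is left untouched, so the
    two keys cannot conflict.
    """
    # JSON round trip: deep-copies the input and fails early if it is not serializable.
    out = json.loads(json.dumps(pandas_meta))
    out["attrs"] = dict(frame_attrs)  # DataFrame.attrs (assumed top-level key)
    for col in out["columns"]:
        attrs = column_attrs.get(col["name"])
        if attrs:
            col["attrs"] = dict(attrs)  # Series.attrs (assumed per-column key)
    return out


def extract_attrs(pandas_meta):
    """Read the hypothetical "attrs" entries back out of the metadata dict."""
    frame_attrs = pandas_meta.get("attrs", {})
    column_attrs = {
        col["name"]: col["attrs"]
        for col in pandas_meta["columns"]
        if "attrs" in col
    }
    return frame_attrs, column_attrs


# Hand-written stand-in for the per-column entries of the pandas metadata.
pandas_meta = {
    "columns": [
        {
            "name": "a",
            "field_name": "a",
            "pandas_type": "int64",
            "numpy_type": "int64",
            "metadata": None,
        }
    ]
}
frame_attrs = {"name": "my custom dataset"}
column_attrs = {
    "a": {"long_name": "Description about data", "nodata": -1, "units": "metre"}
}

# Schema metadata values are stored as bytes, so encode and decode the JSON.
encoded = json.dumps(attach_attrs(pandas_meta, frame_attrs, column_attrs)).encode("utf-8")
restored_frame_attrs, restored_column_attrs = extract_attrs(json.loads(encoded))
```

One consequence of this layout is that attrs values have to be JSON-serializable, which holds for the strings and integers in the example above but would need a decision for arbitrary Python objects.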
