Two issues have been observed when using pandas 2.2.3 with pyarrow >= 18.0.0:
Failing test cases: pandas/tests/extension/test_arrow.py::test_from_arrow_respecting_given_dtype_unsafe and pandas/tests/io/test_parquet.py::TestParquetPyArrow::test_roundtrip_decimal
Stricter float-to-int casting causes ArrowInvalid in tests like test_from_arrow_respecting_given_dtype_unsafe.
Decimal roundtrip mismatch: test_roundtrip_decimal fails due to dtype mismatches (object vs. string[python]) when reading back a decimal column written with a specified pyarrow schema.
These issues were not present with pyarrow==17.x.
Expected Behavior
Float-to-int casting should either handle truncation as it did with older pyarrow versions, or the tests should be updated to skip or adjust for pyarrow >= 18.0.0.
Decimal roundtrips to parquet should preserve the pandas dtype, or the documentation should state clearly that type coercion is expected.
Installed Versions
python : 3.11.11
pandas : 2.2.3
pyarrow : 19.0.1
In newer versions of PyArrow, type identity is stricter, which is why this code now raises errors.
Issue 1:
Hi, the issue is that types_mapper={pa.float64(): pa.int64()}.get is not reliable in newer versions of PyArrow. This is because each call to pa.float64() creates a new object, so the key in your dictionary does not match the instance passed internally by PyArrow. I fixed the issue by converting the float to pandas and then casting the float to an int using truncation.
Issue 2:
The issue is that both the values and types of result and expected are different. The result column is an object with a decimal value of 123.00, while the expected column is a string with a value of "123". I fixed this by converting the result column to a string and removing the trailing decimal places so that it matched the expected column.
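To make the mismatch described in this comment concrete, here is a small sketch using only the standard-library decimal module. The values 123.00 and "123" are taken from the comment; the variable names are hypothetical, and this is neither the pandas test nor a proposed fix for it.

```python
from decimal import Decimal

# Values as described in the comment above.
result_value = Decimal("123.00")  # read back as a Python Decimal (object dtype)
expected_value = "123"            # expected as a string

# The default string form keeps the trailing zeros, so the two differ.
print(str(result_value) == expected_value)

# Dropping the fractional digits before converting makes the strings match.
normalized = str(result_value.quantize(Decimal("1")))
print(normalized == expected_value)
```

Normalizing like this only hides the difference on the caller's side; it does not address why the dtype read back by pandas changed between pyarrow versions.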
Thanks @phoebecd for the suggestions, but I am running test cases implemented by pandas itself. AFAIK, pandas needs to update these test cases so that they pass with pyarrow's stricter type handling in newer versions; the fix has to come from pandas.
@bhavya2109sharma @phoebecd - Unfortunately your response does not help with this issue and only adds noise that maintainers spend time going through. I suspect it was generated by AI. If that is the case, please do not merely post the AI response to an issue. Using it as an aid when crafting a response is okay, but you should first feel confident that what you post is likely to be helpful.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.