
BUG: Decimal and float-to-int conversion issues with pyarrow ≥18.0.0 in parquet and Arrow dtype tests #61464

Open
3 tasks done
bhavya2109sharma opened this issue May 19, 2025 · 3 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@bhavya2109sharma

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

Issue 1
import pyarrow as pa
array = pa.array([1.5, 2.5], type=pa.float64())
array.to_pandas(types_mapper={pa.float64(): pa.int64()}.get)

ArrowInvalid: Float value 1.5 was truncated converting to int64


Issue 2
import pandas as pd
import pyarrow as pa
from decimal import Decimal

df = pd.DataFrame({"a": [Decimal("123.00")]}, dtype="string[pyarrow]")
df.to_parquet("decimal.pq", schema=pa.schema([("a", pa.decimal128(5))]))
result = pd.read_parquet("decimal.pq")
expected = pd.DataFrame({"a": ["123"]}, dtype="string[python]")

pd.testing.assert_frame_equal(result, expected)

AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
Attribute "dtype" are different
[left]:  object
[right]: string[python]

Issue Description

Two issues have been observed when using pandas 2.2.3 with pyarrow >= 18.0.0:

  • Failing test cases: pandas/tests/extension/test_arrow.py::test_from_arrow_respecting_given_dtype_unsafe and pandas/tests/io/test_parquet.py::TestParquetPyArrow::test_roundtrip_decimal

  • Stricter float-to-int casting causes ArrowInvalid in tests like test_from_arrow_respecting_given_dtype_unsafe.

  • Decimal roundtrip mismatch: test_roundtrip_decimal fails due to dtype mismatches (object vs. string[python]) when reading back a decimal column written with a specified pyarrow schema.

These issues were not present with pyarrow==17.x.
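
To make the "stricter float-to-int casting" point concrete, here is a minimal pure-Python sketch of the difference between a checked ("safe") and an unchecked cast. The helper `cast_floats_to_ints` and its `safe` flag are hypothetical names used only for illustration; this is not pyarrow's implementation, it only mimics the error message shown in the reproducer and assumes finite input values.

```python
import math


def cast_floats_to_ints(values, safe=True):
    """Hypothetical helper: convert floats to ints, optionally rejecting lossy casts."""
    result = []
    for value in values:
        truncated = math.trunc(value)  # drop the fractional part (toward zero)
        if safe and truncated != value:
            # A checked cast refuses to silently lose the fractional part.
            raise ValueError(
                f"Float value {value} was truncated converting to int64"
            )
        result.append(truncated)
    return result


# Unchecked cast: fractional parts are silently dropped.
print(cast_floats_to_ints([1.5, 2.5], safe=False))  # [1, 2]

# Checked cast: the first lossy value raises.
try:
    cast_floats_to_ints([1.5, 2.5])
except ValueError as exc:
    print(exc)  # Float value 1.5 was truncated converting to int64
```

pyarrow's own `Array.cast` takes a comparable `safe` argument; whether `to_pandas` with a `types_mapper` should go through a checked or unchecked cast is the behavior the failing test depends on.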

Expected Behavior

  • Float to int casting should either handle truncation more gracefully (as in older versions) or tests should be updated to skip/adjust.

  • Decimal roundtrips to parquet should maintain the same pandas dtype or document clearly if type coercion is expected.
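
The decimal mismatch can be illustrated without parquet at all. The sketch below uses only the standard library and assumes that `pa.decimal128(5)` means precision 5 with the default scale of 0, so the stored value keeps no fractional digits. The helper `to_scale` is a hypothetical name for illustration, not a pandas or pyarrow function.

```python
from decimal import Decimal


def to_scale(value: Decimal, scale: int) -> Decimal:
    """Hypothetical helper: rescale a Decimal to `scale` fractional digits."""
    return value.quantize(Decimal(1).scaleb(-scale))


original = Decimal("123.00")
stored = to_scale(original, 0)  # scale 0, as assumed for decimal128(5)

print(repr(stored))                  # Decimal('123')
print(stored == original)            # True: numerically equal
print(str(stored) == str(original))  # False: "123" vs "123.00"
print(isinstance(stored, str))       # False: a Decimal object, not a string
```

This is why the test expects the value "123" rather than "123.00", and why a column of `Decimal` objects (dtype `object`) does not compare equal to a `string[python]` column even when the printed values look the same.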

Installed Versions

python : 3.11.11
pandas : 2.2.3
pyarrow : 19.0.1

@bhavya2109sharma bhavya2109sharma added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 19, 2025
@phoebecd

In newer versions of PyArrow, type identity is stricter which is why this code is now causing errors.

Issue 1:
Hi, the issue is that types_mapper={pa.float64(): pa.int64()}.get is not reliable in newer versions of PyArrow. This is because each call to pa.float64() creates a new object, so the key in your dictionary does not match the instance passed internally by PyArrow. I fixed the issue by converting the float to pandas and then casting the float to an int using truncation.

import pyarrow as pa

array = pa.array([1.5, 2.5], type=pa.float64())
s = array.to_pandas()   # float64 Series
s_int = s.astype(int)   # truncates toward zero: 1.5 -> 1, 2.5 -> 2

Issue 2:
The issue is that both the values and types of result and expected are different. The result column is an object with a decimal value of 123.00, while the expected column is a string with a value of "123". I fixed this by converting the result column to a string and removing the trailing decimal places so that it matched the expected column.

import pandas as pd
import pyarrow as pa
from decimal import Decimal

df = pd.DataFrame({"a": [Decimal("123.00")]}, dtype="object")
df.to_parquet("decimal.pq", schema=pa.schema([("a", pa.decimal128(5))]))
result = pd.read_parquet("decimal.pq")
result["a"] = result["a"].apply(lambda x: str(x).split(".")[0] if isinstance(x, Decimal) else str(x))
result = result.astype({"a": "string"})

expected = pd.DataFrame({"a": ["123"]}, dtype="string[python]")
pd.testing.assert_frame_equal(result, expected)

@bhavya2109sharma
Author

Thanks @phoebecd for the suggestions, but I am running test cases that ship with pandas. AFAIK, pandas needs to update these tests in a newer version so that the failures caused by pyarrow's stricter type handling are resolved on the pandas side.

@rhshadrach
Member

rhshadrach commented May 20, 2025

@bhavya2109sharma @phoebecd - Unfortunately your response does not help with this issue and only adds noise that maintainers spend time going through. I suspect it was generated by AI. If that is the case, please do not merely post the AI response to an issue. Using it as an aid when crafting a response is okay, but you should first feel confident that what you post is likely to be helpful.
