BUG: Fix large floats in Excel losing precision when converted to integer #49635

Closed
wants to merge 2 commits into from

Conversation

ng-henry

When opening Excel files with large floating point values like 1E50, pandas converts these values to integers, producing results like 100000000000000007629769841091887003294964970946560. Converting floats this large to integers yields erroneous values.

This PR converts a float to an integer only if its absolute value is smaller than 1E22, an empirically chosen cutoff that is roughly the largest float for which the integer conversion still gives the expected value. Floats at or above that cutoff are left as floats.
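The behaviour described above can be reproduced without Excel. What follows is a minimal sketch under stated assumptions, not the pandas implementation: `to_int_if_safe` is a hypothetical helper written for illustration, and its default cutoff is simply the 1e22 value proposed in this PR.

```python
def to_int_if_safe(value: float, cutoff: float = 1e22):
    """Hypothetical helper: return an int only for integral floats below the cutoff."""
    # The magnitude check comes first, so inf and nan never reach int().
    if abs(value) < cutoff and value == int(value):
        return int(value)
    return value


# Unconditional conversion exposes the exact binary value of the double,
# which is not the same number as the decimal literal 10**50.
print(int(1e50) == 10**50)

# With the cutoff, the large value stays a float and small integral values
# still become ints.
print(to_int_if_safe(1e50))
print(to_int_if_safe(42.0))
print(to_int_if_safe(2.5))
```

The design choice here is to leave large values untouched rather than raise an error, so a cell holding 1E50 still round-trips as a float.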

@ng-henry ng-henry changed the title BUG: Fix large floats in Excel losing precision upon conversion to string BUG: Fix large floats in Excel losing precision when converted to integer Nov 11, 2022
@debnathshoham
Member

Thanks for the PR @ng-henry !
Is there a related bug report? If not, could you please open one?

@mroeschke mroeschke added the IO Excel read_excel, to_excel label Nov 16, 2022
return val
# If we try to convert a large float to an integer, weird issues arise because of precision limitation of
# floating point numbers
if abs(cell.value) < 1e22:
Member

Thanks, but constants like 1e22 are not very robust to platform differences and can hide bugs. Unless there's an openpyxl flag or package constant that can be used, I recommend just documenting this limitation.

Author

Since we know Python represents floats as 64-bit doubles, we are guaranteed that all doubles have about 16 decimal digits of precision (according to https://en.wikipedia.org/wiki/Double-precision_floating-point_format). So I'm thinking we can replace 1e22 with 1e16? 1e22 was just an empirically determined value, but 1e16 is based on the maximum precision of a double.

On a practical note, numbers larger than 1e16 are probably better represented as floats than as integers. If an Excel cell contains a number that large, its meaning is better captured by a float than by an integer.

For context, this int(cell.value) check was done because of #46988.
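The precision argument in this comment can be checked against the interpreter instead of a hand-picked constant. The sketch below uses only the standard-library `sys.float_info`. Deriving a cutoff from it is an assumption for illustration, not something the PR or the reviewers settled on. With IEEE 754 doubles the significand has 53 bits, so every integer up to 2**53 (about 9.007e15, slightly below 1e16) is exactly representable, and the number of decimal digits guaranteed to survive a round trip is 15.

```python
import sys

# Bits in the significand of a Python float (53 for IEEE 754 doubles).
mant_bits = sys.float_info.mant_dig

# Every integer with magnitude up to this value has an exact float representation.
max_exact = 2 ** mant_bits

print(sys.float_info.dig)  # decimal digits guaranteed to round-trip
print(max_exact)

# Just below the limit, neighbouring integers are still distinct floats.
print(float(max_exact - 1) != float(max_exact))

# Just above it, neighbouring integers collapse to the same float.
print(float(max_exact) == float(max_exact + 1))

# 1e16 lies beyond the limit, so some integers below 1e16 have no float of their own.
print(1e16 > max_exact)
```

A cutoff computed this way follows the platform's float format, which may go some way toward the robustness concern raised in review. Whether it is an acceptable fix is left to the discussion.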

@mroeschke
Member

Thanks for the pull request, but given the issue and solution, I think more discussion is needed in a dedicated issue first. Closing this for now, but happy to reopen if other core developers think this is an adequate solution in the issue.

@mroeschke mroeschke closed this Dec 17, 2022