BUG: Fix large floats in Excel losing precision when converted to integer #49635

ng-henry · 2022-11-11T05:04:24Z

Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

When opening Excel files with large floating-point values such as 1E50, pandas converts these values to integers, producing results like 100000000000000007629769841091887003294964970946560. Converting floats this large to integers yields erroneous values.
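The effect described above can be reproduced in plain Python, without Excel or pandas. This is a minimal sketch that assumes the usual 64-bit IEEE 754 double behind Python's `float`; the variable names are illustrative only.

```python
# 1e50 is stored as the nearest 64-bit double, which is not exactly 10**50.
value = 1e50

# int() returns the exact integer value of the stored double,
# including the digits that come from binary rounding.
as_int = int(value)

print(as_int)
print(as_int == 10**50)        # the mathematically intended value is not recovered
print(float(as_int) == value)  # yet the integer round-trips to the same double
```

The integer is an exact rendering of the double, so no information is lost in the conversion itself; the surprising digits were already present in the float.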

This PR converts floats to integers only if their absolute value is smaller than 1E22, an empirically chosen cutoff near the point where floats stop matching the expected integer exactly. Floats at or above that cutoff are left as floats.
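The guarded conversion can be sketched as follows. `maybe_int` and its `cutoff` parameter are hypothetical names used for illustration; this is not the pandas or openpyxl code, only an assumption about how such a check might look.

```python
from typing import Union


def maybe_int(value: float, cutoff: float = 1e22) -> Union[int, float]:
    """Hypothetical helper: convert to int only below the cutoff.

    Integral floats with magnitude below ``cutoff`` become ints;
    everything else (fractions, huge values, inf, nan) stays a float.
    """
    if value.is_integer() and abs(value) < cutoff:
        return int(value)
    return value


print(maybe_int(3.0))   # small integral float becomes an int
print(maybe_int(2.5))   # fractional value stays a float
print(maybe_int(1e50))  # large value stays a float
```

Keeping the cutoff as a parameter makes the threshold explicit, which matters because the choice of constant is the point under review.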

debnathshoham · 2022-11-11T06:13:36Z

Thanks for the PR @ng-henry !
Is there a related bug report? If not, could you please open one?

mroeschke · 2022-11-16T22:15:34Z

pandas/io/excel/_openpyxl.py

-                return val
+            # If we try to convert a large float to an integer, weird issues arise because of precision limitation of
+            # floating point numbers
+            if abs(cell.value) < 1e22:


Thanks, but constants like 1e22 are not very robust to platform differences and can hide bugs. Unless there's an openpyxl flag or package constant that can be used, I recommend just documenting this limitation.

ng-henry · 2022-11-17

Since we know Python represents floats as 64-bit doubles, we are guaranteed that all doubles have roughly 16 decimal digits of precision (according to https://en.wikipedia.org/wiki/Double-precision_floating-point_format). So I'm thinking we can replace 1e22 with 1e16? 1e22 was just an empirically determined value, but 1e16 is based on the maximum precision of a double.

On a practical note, numbers larger than 1e16 are probably better represented with floats than integers. If an Excel cell contains numbers that large, the meaning is better captured as a float than as an integer.

For context, this int(cell.value) check was done because of #46988.
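To make the two candidate cutoffs concrete, here is a minimal sketch of the relevant limits, assuming Python's `float` is an IEEE 754 binary64 double (true for CPython on common platforms). It uses only the standard library.

```python
import sys

# A binary64 double has a 53-bit significand; 15 decimal digits are
# guaranteed to survive a round trip through a float.
print(sys.float_info.mant_dig)
print(sys.float_info.dig)

# Every integer up to 2**53 is exactly representable; above it, gaps appear.
limit = 2**53
print(limit)
print(float(limit) == float(limit + 1))  # 2**53 + 1 rounds back to 2**53

# Powers of ten stay exact for longer: 1e16 and 1e22 are exact, 1e23 is not.
print(int(1e16) == 10**16)
print(int(1e22) == 10**22)
print(int(1e23) == 10**23)
```

Under these assumptions, 1e22 is the largest power of ten that a double stores exactly, which would explain why it shows up as an empirical threshold, while 2**53 (about 9.007e15, slightly below 1e16) is where consecutive integers stop being distinguishable.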

mroeschke · 2022-12-17T19:47:42Z

Thanks for the pull request, but given the issue and solution, I think more discussion is needed in a dedicated issue first. Closing this for now, but happy to reopen if other core developers think this is an adequate solution in the issue.

ng-henry added 2 commits November 10, 2022 23:40

Fix large floats in Excel having a weird value upon conversion to string

6b64ae0

add tests

2867a1d

ng-henry changed the title ~~BUG: Fix large floats in Excel losing precision upon conversion to string~~ BUG: Fix large floats in Excel losing precision when converted to integer Nov 11, 2022

mroeschke added the IO Excel read_excel, to_excel label Nov 16, 2022

mroeschke requested changes Nov 16, 2022

mroeschke closed this Dec 17, 2022
