BUG: Incorrect data when modifying a dataframe column with +=  #37011

Closed
@alexifm

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

Get the column as a Series and modify it in place with +=:

In [1]: import pandas as pd
    ...: df = pd.DataFrame({"col": [1, 2, 3, 4]})
    ...: s = df["col"]
    ...: s += 1
    ...: print(s)
    ...: print(df["col"])
    ...: print(df[["col"]])
0    2
1    3
2    4
3    5
Name: col, dtype: int64
0    2
1    3
2    4
3    5
Name: col, dtype: int64
   col
0    1
1    2
2    3
3    4

Modify the column directly with +=:

In [2]: import pandas as pd
    ...: df = pd.DataFrame({"col": [1, 2, 3, 4]})
    ...: df["col"] += 1
    ...: print(df["col"])
    ...: print(df[["col"]])
0    2
1    3
2    4
3    5
Name: col, dtype: int64
   col
0    2
1    3
2    4
3    5

Problem description

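The data in the column should be consistent whether accessed as df["col"] or df[["col"]]. In the first example, after s += 1, df["col"] shows the incremented values while df[["col"]] still shows the original values.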
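
The consistency property described here can be written as a small check. This is only a sketch: column_views_agree is a hypothetical helper name, not part of pandas, and it is applied to the direct df["col"] += 1 form, which the report shows as consistent.

```python
import pandas as pd


def column_views_agree(df: pd.DataFrame, name: str) -> bool:
    """Return True if df[name] and df[[name]] hold the same values.

    Hypothetical helper for illustration; not a pandas API.
    """
    as_series = df[name]               # single-label access, returns a Series
    as_frame_column = df[[name]][name]  # list access, then pull the column out
    return as_series.equals(as_frame_column)


df = pd.DataFrame({"col": [1, 2, 3, 4]})
df["col"] += 1  # the direct form from the second example
print(column_views_agree(df, "col"))
```

Running the same check after the s = df["col"]; s += 1 sequence from the first example is what exposes the mismatch on the affected versions.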

Expected Output

Either += modifies the values in the dataframe column or it doesn't, but in both cases df["col"] and df[["col"]] should show the same values.
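
Until the behaviour is settled, one way to avoid depending on it is to make the copy and the write-back explicit. This is a minimal sketch of a possible workaround, assuming the goal is for the increment to end up in the dataframe; it is not presented as the maintainers' recommendation.

```python
import pandas as pd

df = pd.DataFrame({"col": [1, 2, 3, 4]})

s = df["col"].copy()  # explicit copy: s no longer shares data with df
s += 1                # modifies only the copy
df["col"] = s         # explicit write-back into the dataframe

# Both access paths now read the same, freshly assigned column.
print(df["col"].tolist())
print(df[["col"]]["col"].tolist())
```

Because the column is replaced by assignment rather than mutated through a cached Series, the result does not rely on whether += propagates back to the parent dataframe.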

Output of pd.show_versions()

In [13]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : db08276bc116c438d3fdee492026f8223584c477
python           : 3.8.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.3
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 47.1.0
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.5.2
html5lib         : None
pymysql          : 0.10.1
psycopg2         : 2.8.6 (dt dec pq3 ext lo64)
jinja2           : 2.11.2
IPython          : 7.14.0
pandas_datareader: None
bs4              : None
bottleneck       : 1.3.2
fsspec           : 0.8.3
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.5
pandas_gbq       : None
pyarrow          : 1.0.1
pytables         : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.4.1
sqlalchemy       : 1.3.19
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : None
xlrd             : None
xlwt             : None
numba            : None

For master branch:

In [1]: import pandas as pd; pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : 9787744272c13a1dcbbcdfc7daaae8cc73ac78a3
python           : 3.8.5.final.0
...

pandas           : 1.2.0.dev0+684.g978774427
...
