BUG: Pandas loc can result in ambiguous result not caught by an exception #42603

Open
@komodovaran

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample

Tested on Pandas 1.3.0.

import pandas as pd
import numpy as np
import pytest

sample_name = ["foo", "foo", "foo"]
sample_number = [3, 3, 3]

y_pred = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]).reshape(-1, 2)

# A DataFrame with default integer column labels 0 and 1
df = pd.DataFrame(y_pred, index=[sample_name, sample_number])

# Returns a df, as expected
c1 = df.loc["foo", 3]
print(c1)
assert hasattr(c1, "columns")

# KeyError, as expected, because there is no entry 2
with pytest.raises(KeyError):
    _c2 = df.loc["foo", 2]

# Now it's a Series (rows "foo", column 0) instead of a DataFrame, so the assert below fails!
c3 = df.loc["foo", 0]
print(c3)
assert hasattr(c3, "columns")

Problem description

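When a DataFrame is built from an array without explicit column names, the columns are automatically labeled 0, 1, and so on. With a MultiIndex on the rows, df.loc["foo", 3] is resolved against the index, while df.loc["foo", 0] is apparently resolved as row "foo" and column 0, because 0 is not in the second index level but happens to be a column label. The problem can be avoided by not relying on these implicit conversions. However, pandas gives no warning, and the same kind of call can return a Series or a DataFrame depending on which value is queried, which, at least for me, caused a fair amount of confusion.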
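As a minimal sketch of the workaround mentioned above (avoiding the implicit integer column labels), the same frame can be built with explicit column names. The names "neg" and "pos" are hypothetical and chosen only for illustration. With no integer column labels, a second key that is missing from the index should raise a KeyError instead of silently selecting a column:

```python
import numpy as np
import pandas as pd

y_pred = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]).reshape(-1, 2)
index = [["foo", "foo", "foo"], [3, 3, 3]]

# Hypothetical column names, used only to avoid the default labels 0 and 1.
named = pd.DataFrame(y_pred, index=index, columns=["neg", "pos"])

# ("foo", 3) exists in the index, so this still selects rows.
rows = named.loc["foo", 3]

# 0 is neither in the second index level nor a column label any more.
try:
    named.loc["foo", 0]
    raised = False
except KeyError:
    raised = True

print(type(rows).__name__, raised)
```

This only sidesteps the problem for frames whose column labels cannot collide with index values; it does not change how loc resolves the keys.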

Expected Output

I'd imagine it should raise an ambiguity error, similar to #21080, like

 ValueError: '0' is both an index and a column label.
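Until such an error exists, one way to get consistent behavior on the user side is to spell out both axes: pass the row key as a tuple and the column selection as a separate argument. The sketch below assumes the same frame as in the code sample. It is an illustration of the documented MultiIndex indexing form, not a fix for the behavior reported here:

```python
import numpy as np
import pandas as pd

y_pred = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]).reshape(-1, 2)
df = pd.DataFrame(y_pred, index=[["foo", "foo", "foo"], [3, 3, 3]])

# Row key as a tuple, all columns: unambiguous row selection.
hit = df.loc[("foo", 3), :]

# A row key that is not in the index raises instead of falling back to a column.
try:
    df.loc[("foo", 0), :]
    missing_row_raised = False
except KeyError:
    missing_row_raised = True

# Column 0 selected explicitly, all rows.
col0 = df.loc[:, 0]

print(type(hit).__name__, missing_row_raised, type(col0).__name__)
```

Written this way, the return type follows from the form of the call rather than from which labels happen to exist in the index or the columns.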

Labels: Bug; Indexing (related to indexing on series/frames, not to indexes themselves); Needs Discussion (requires discussion from core team before further action)
