Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
Tested on pandas 1.3.0.
import pandas as pd
import numpy as np
import pytest
sample_name = ["foo", "foo", "foo"]
sample_number = [3, 3, 3]
y_pred = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]).reshape(-1, 2)
# a df with automatically assigned integer column labels 0 and 1
df = pd.DataFrame(y_pred, index=[sample_name, sample_number])
# Returns a df, as expected
c1 = df.loc["foo", 3]
print(c1)
assert hasattr(c1, "columns")
# Key error, as expected, because there's no entry 2
with pytest.raises(KeyError):
    _c2 = df.loc["foo", 2]
# Now it's a Series of (foo, column 0) instead of a DataFrame!
c3 = df.loc["foo", 0]
print(c3)
# This assertion fails, because a Series has no "columns" attribute
assert hasattr(c3, "columns")
Problem description
When an array is converted to a DataFrame without explicit column names, the columns are automatically labelled with integers (0, 1, ...). The problem can be circumvented by avoiding such implicit conversions, for example by naming the columns explicitly. However, pandas gives no warning, and df.loc["foo", x] returns either a DataFrame or a Series depending on whether x matches a label in the second index level or a column label. At least for me, this caused a fair amount of confusion.
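For anyone hitting this in the meantime, here is a minimal sketch of the workaround I have in mind. It assumes the same data as the code sample above. The column names "p0" and "p1" are made up for illustration. The idea is to pass the full row key as a single tuple and to give the column selection its own position, so pandas does not have to guess.

```python
import numpy as np
import pandas as pd

index = [["foo", "foo", "foo"], [3, 3, 3]]
y_pred = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]).reshape(-1, 2)

# Hypothetical string column names: they cannot collide with the
# integer labels in the second index level.
df = pd.DataFrame(y_pred, index=index, columns=["p0", "p1"])

# Row lookup: the whole index key is one tuple, the column part is a slice.
rows = df.loc[("foo", 3), :]
assert isinstance(rows, pd.DataFrame)

# Column lookup: the column label sits in its own position.
col = df.loc[("foo", 3), "p0"]
assert isinstance(col, pd.Series)
```

This only sidesteps the ambiguity on the caller's side; it does not change what df.loc["foo", 0] does on a frame with integer column labels.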
Expected Output
I'd imagine it should raise an ambiguity error, similar to #21080, like
ValueError: '0' is both an index and a column label.