When using pandas dataframes, comparing a series with a value does not return a series #302

schlichtanders · 2023-05-04T19:38:04Z

using CondaPkg
CondaPkg.add("pandas")
using PythonCall  # provides pyimport
pd = pyimport("pandas")
datafile = download("https://nyc3.digitaloceanspaces.com/owid-public/data/co2/owid-co2-data.csv")
df = pd.read_csv(datafile)

df[df["country"] == "World"]

This fails because df["country"] == "World" simply returns false instead of a boolean series.

It would be great if we could just use the normal pandas indexing syntax.

schlichtanders · 2023-05-04T19:41:48Z

Interestingly, PyCall.jl can do the comparison df["country"] == "World", but then fails at the final indexing step df[df["country"] == "World"].

cjdoris · 2023-05-14T13:25:12Z

The issue here is that PythonCall intentionally does not define comparisons such as == between Python and non-Python objects, for example between df["country"] and "World". Hence df["country"] == "World" falls back to the default Julia behaviour of returning false.
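To illustrate that fallback in plain Julia, here is a minimal sketch. Wrapper is a hypothetical type invented for this example and is not part of PythonCall; it only shows what happens when no == method is defined for a pair of types.

```julia
# Hypothetical wrapper type standing in for a foreign object.
# It deliberately defines no == method of its own.
struct Wrapper
    value::String
end

w = Wrapper("World")

# No specific method applies, so Julia uses the generic
# fallback ==(x, y) = x === y, which tests object identity.
result = (w == "World")

println(result)  # prints false: a Wrapper and a String are never identical
```

The comparison does not throw an error. It quietly returns a plain Bool, which is why the indexing step is the first place the problem becomes visible.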

You've got a few options:

Ensure both sides are Python objects: df["country"] == Py("World")
Use specific Python comparison functions: pyeq(df["country"], "World")
Use @py, which is equivalent to the previous line: @py df["country"] == "World"
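A minimal sketch of the last two options on a small in-memory DataFrame, assuming pandas is already installed in the Python environment that PythonCall uses (for example via CondaPkg.add("pandas")). The data values are made up for illustration and are not taken from the CO2 dataset.

```julia
using PythonCall

pd = pyimport("pandas")

# Tiny stand-in for the real dataset (illustrative values only).
df = pd.DataFrame(pydict([
    "country" => pylist(["World", "France", "World"]),
    "year"    => pylist([2020, 2020, 2021]),
]))

# Option: explicit Python comparison function.
# pyeq returns a Python object, here a boolean pandas Series.
mask = pyeq(df["country"], "World")
world_a = df[mask]

# Option: the @py macro, which rewrites the whole expression
# (indexing and ==) into Python operations.
world_b = @py df[df["country"] == "World"]

# Both selections should contain the two "World" rows.
println(pylen(world_a))
println(pylen(world_b))
```

Both variants build the boolean mask on the Python side, so pandas receives a Series rather than a Julia Bool when indexing.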

schlichtanders · 2023-05-15T08:13:24Z

Thank you for pointing me to the @py macro. It also works together with boolean indexing:

julia> @py df[df["country"] == "World"]
Python DataFrame:
      country  year iso_code  ...  total_ghg_excluding_lucf  trade_co2  trade_co2_share
49810   World  1750      NaN  ...                       NaN        NaN              NaN
49811   World  1751      NaN  ...                       NaN        NaN              NaN
49812   World  1752      NaN  ...                       NaN        NaN              NaN
49813   World  1753      NaN  ...                       NaN        NaN              NaN
49814   World  1754      NaN  ...                       NaN        NaN              NaN
...       ...   ...      ...  ...                       ...        ...              ...
50077   World  2017      NaN  ...                 47031.820      0.004              0.0
50078   World  2018      NaN  ...                 47980.469     -0.004             -0.0
50079   World  2019      NaN  ...                 48116.559      0.000              0.0
50080   World  2020      NaN  ...                       NaN      0.000              0.0
50081   World  2021      NaN  ...                       NaN      0.000              0.0

[272 rows x 79 columns]

schlichtanders added the "bug" label on May 4, 2023

schlichtanders closed this as completed May 15, 2023
