When using pandas dataframes, comparing a series with a value does not return a series #302

Closed
schlichtanders opened this issue May 4, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@schlichtanders
Contributor

schlichtanders commented May 4, 2023

using CondaPkg
using PythonCall  # provides pyimport
CondaPkg.add("pandas")
pd = pyimport("pandas")
datafile = download("https://nyc3.digitaloceanspaces.com/owid-public/data/co2/owid-co2-data.csv")
df = pd.read_csv(datafile)

df[df["country"] == "World"]

This fails because df["country"] == "World" returns a plain false instead of a boolean Series.

It would be great if we could just use the normal pandas indexing syntax.

@schlichtanders schlichtanders added the bug Something isn't working label May 4, 2023
@schlichtanders
Contributor Author

Interestingly, PyCall.jl can do the comparison df["country"] == "World", but then fails at the final indexing step df[df["country"] == "World"].

@cjdoris
Collaborator

cjdoris commented May 14, 2023

The issue here is that PythonCall intentionally does not define comparisons such as == between Python and non-Python objects, such as between df["country"] and "World". Hence df["country"] == "World" falls back to the default Julia behaviour of returning false.

You've got a few options:

  • Ensure both sides are Python objects: df["country"] == Py("World")
  • Use specific Python comparison functions: pyeq(df["country"], "World")
  • Use @py, which is equivalent to the previous line: @py df["country"] == "World"
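As a minimal sketch of the second and third options combined with boolean indexing: the tiny DataFrame below is a hypothetical stand-in for the CO2 dataset (its columns and values are made up for illustration), and it assumes pandas has been installed through CondaPkg as in the original post.

```julia
using CondaPkg, PythonCall

CondaPkg.add("pandas")   # one-time setup, as in the original post
pd = pyimport("pandas")

# Hypothetical stand-in data, built from Python lists so pandas sees plain Python values.
df = pd.DataFrame(pydict(Dict(
    "country" => pylist(["World", "France", "World"]),
    "year"    => pylist([2020, 2020, 2021]),
)))

# pyeq performs the comparison on the Python side, so pandas applies it
# elementwise and the result is a Python object usable as a row mask.
mask = pyeq(df["country"], "World")
rows_pyeq = df[mask]

# @py evaluates the whole expression with Python semantics, including the inner ==.
rows_macro = @py df[df["country"] == "World"]

println(rows_pyeq)
println(rows_macro)
```

Both forms keep the comparison inside Python, which is why the result can be passed straight back to df[...] as a mask.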

@schlichtanders
Contributor Author

Thank you for pointing me to the @py macro. It also works together with the boolean indexing:

julia> @py df[df["country"] == "World"]
Python DataFrame:
      country  year iso_code  ...  total_ghg_excluding_lucf  trade_co2  trade_co2_share
49810   World  1750      NaN  ...                       NaN        NaN              NaN
49811   World  1751      NaN  ...                       NaN        NaN              NaN
49812   World  1752      NaN  ...                       NaN        NaN              NaN
49813   World  1753      NaN  ...                       NaN        NaN              NaN
49814   World  1754      NaN  ...                       NaN        NaN              NaN
...       ...   ...      ...  ...                       ...        ...              ...
50077   World  2017      NaN  ...                 47031.820      0.004              0.0
50078   World  2018      NaN  ...                 47980.469     -0.004             -0.0
50079   World  2019      NaN  ...                 48116.559      0.000              0.0
50080   World  2020      NaN  ...                       NaN      0.000              0.0
50081   World  2021      NaN  ...                       NaN      0.000              0.0

[272 rows x 79 columns]
