Extension integer dtypes (such as pd.Int32Dtype) do not work with numexpr #331
Comments
Yeah, |
Thank you. My understanding is that these are independent of numpy
dtypes: they are pandas dtypes (which pandas calls extension dtypes) that allow
for such things as nullable integers. I am not sure to what extent
these types are meant to integrate with numpy, or how they do so.
I’ll put a post on the numpy GitHub, and also play around in numpy to see
if there is any compatibility with the new pandas dtypes.
Here is the pandas report I posted:
pandas-dev/pandas#25369
|
just fyi: the main documentation reference is here:
https://pandas.pydata.org/pandas-docs/stable/development/extending.html
That mentions that all pandas extension types must be convertible to a
numpy array (even if it’s an expensive operation).
Here is the dtype I was using, a pandas nullable integer
(pd.Int32Dtype—capitalized, instead of np.int32):
https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
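To make the two points above concrete (the capitalized nullable dtype and the requirement that it be convertible to a numpy array), here is a minimal sketch. It assumes a column with no missing values and uses `Series.to_numpy` with an explicit target dtype; the variable names `s` and `arr` are mine, not from the pandas docs.

```python
import numpy as np
import pandas as pd

# A nullable integer column (pd.Int32Dtype, capitalized), as in the report.
s = pd.Series([4, 5, 6], dtype=pd.Int32Dtype())

# Explicitly request a plain numpy dtype. This assumes no missing values;
# with missing values present, an integer target dtype cannot represent them.
arr = s.to_numpy(dtype="int32")

print(s.dtype)    # Int32
print(arr.dtype)  # int32
```

Without an explicit dtype, what the conversion returns may depend on the pandas version, so it seems safer to request the numpy dtype explicitly.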
|
Sounds like |
Do we have an issue raised with |
I have a post with the pandas guys at: I believe the pandas team consider the issue an enhancement request, rather than a bug, so while they're open to someone fixing it as you suggest, it's not a priority. Hopefully as more people use the new nullable integer types, the issue will get assigned to someone for a fix. |
Closing as |
I see the issue marked as “open” in pandas: Or am I looking in the wrong place? |
In the latest pandas version (0.24.1), combined with the latest version of numexpr (2.6.9), extension dtypes do not work (e.g. the “query” method on dataframes fails when the engine parameter is set to the default ):
https://stackoverflow.com/questions/54759936/extension-dtypes-in-pandas-appear-to-have-a-bug-with-query
Code to reproduce:
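import pandas as pd

# Build a one-column frame and cast it to the nullable Int32 extension dtype.
df_test = pd.DataFrame(data=[4, 5, 6], columns=["col_test"])
df_test = df_test.astype(dtype={"col_test": pd.Int32Dtype()})
df_test.query("col_test != 6")  # fails here when numexpr is the engine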
Last lines of the long error message are:
  File "...\site_packages\numexpr\necompiler.py", line 822, in evaluate
    zip(names, arguments)]
  File "...\site_packages\numexpr\necompiler.py", line 821, in
    signature = [(name, getType(arg)) for (name, arg) in
  File "...\site_packages\numexpr\necompiler.py", line 703, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object
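Since the traceback ends inside numexpr, a possible workaround is to keep numexpr out of the evaluation. The sketch below shows two ways to do that: passing `engine="python"` to `query`, and plain boolean indexing. The names `result_query` and `result_mask` are mine, and I have not checked this against every pandas/numexpr combination.

```python
import pandas as pd

df_test = pd.DataFrame(data=[4, 5, 6], columns=["col_test"])
df_test = df_test.astype(dtype={"col_test": pd.Int32Dtype()})

# Option 1: ask query() for the pure-Python engine instead of numexpr.
result_query = df_test.query("col_test != 6", engine="python")

# Option 2: plain boolean indexing, which does not go through query() at all.
result_mask = df_test[df_test["col_test"] != 6]

print(result_query)
print(result_mask)
```

Both options avoid numexpr, so the `getType` call shown in the traceback is never reached. The trade-off is that large frames lose whatever speedup numexpr would have given.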
Thanks!