-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
API: membership checks on ExtensionArray containing NA values #37867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 25 commits
7ab6443
7986bc4
466f7cc
a9b75dd
9d0eca1
b0b32ab
c6e42d2
75c45bc
83c9fe4
08c4c98
5a23b1d
4b0c200
92604e9
8a24f0d
fdb9deb
52e2b43
f21890e
6f633c7
d8bdb2e
4e4dbc4
a1583e7
3c2c2b0
237fe45
37219c3
c4a6c36
245c99a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,6 +50,10 @@ def test_view(self, data): | |
# __setitem__ does not work, so we only have a smoke-test | ||
data.view() | ||
|
||
@pytest.mark.xfail(raises=AssertionError, reason="Not implemented yet") | ||
def test_contains(self, data, data_missing): | ||
super().test_contains(data, data_missing) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Arrow arrays have Arrow Nulls as the dtype.na_value, while data_missing[0] gives ´None´. Maybe @TomAugspurger could look into that as part of his Arrow work? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It's fine to simply skip it as you did. This are only test EAs, that were needed for certain specific tests, so no problem this don't fully work otherwise. The actual public arrays using Arrow under the hood (eg the new ArrowStringArray) has a different implementation and is fully tested. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we also base these test arrays on |
||
|
||
|
||
class TestConstructors(BaseArrowTests, base.BaseConstructorsTests): | ||
def test_from_dtype(self, data): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,31 @@ def test_can_hold_na_valid(self, data): | |
# GH-20761 | ||
assert data._can_hold_na is True | ||
|
||
def test_contains(self, data, data_missing): | ||
# GH-37867 | ||
jreback marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# Tests for membership checks. Membership checks for nan-likes is tricky and | ||
# the settled on rule is: `nan_like in arr` is True if nan_like is | ||
# arr.dtype.na_value and arr.isna().any() is True. Else the check returns False. | ||
|
||
na_value = data.dtype.na_value | ||
# ensure data without missing values | ||
data = data[~data.isna()] | ||
|
||
# first elements are non-missing | ||
assert data[0] in data | ||
assert data_missing[0] in data_missing | ||
|
||
# check the presence of na_value | ||
assert na_value in data_missing | ||
assert na_value not in data | ||
|
||
# the data can never contain other nan-likes than na_value | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. might consider actually using the nulls_fixture as this is more comprehensive (sure it will run the other checks multiple times but no big deal). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am personally -1 on using a fixture for this (we could move the list of nulls that is currently used for defining the fixture to constant in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. in this case? we already have a nulls fixture and use it in a great many places we need to be comprehensive and general in testing - specific cases are ok sometimes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok here's the fixture contents
so if you add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like the generality of using the nulls_fixture too. I've added it in the newest commit. |
||
for na_value_type in {None, np.nan, pd.NA, pd.NaT}: | ||
if na_value_type is na_value: | ||
continue | ||
assert na_value_type not in data | ||
assert na_value_type not in data_missing | ||
|
||
def test_memory_usage(self, data): | ||
s = pd.Series(data) | ||
result = s.memory_usage(index=False) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -87,6 +87,31 @@ def test_memory_usage(self, data): | |
# Is this deliberate? | ||
super().test_memory_usage(data) | ||
|
||
def test_contains(self, data, data_missing): | ||
# GH-37867 | ||
# na value handling in Categorical.__contains__ is deprecated. | ||
# See base.BaseInterFaceTests.test_contains for more details. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This duplicates what you have in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unfortunately not: na values is also more complicated in categoricals, because in some cases we want to accept There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be clear, I was not referring to the base tests that this is overriding, but the original test you added to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah. I'll delete the ones in |
||
|
||
na_value = data.dtype.na_value | ||
# ensure data without missing values | ||
data = data[~data.isna()] | ||
|
||
# first elements are non-missing | ||
assert data[0] in data | ||
assert data_missing[0] in data_missing | ||
|
||
# check the presence of na_value | ||
assert na_value in data_missing | ||
assert na_value not in data | ||
|
||
# the data can never contain other nan-likes than na_value | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same comment here. |
||
for na_value_type in {None, np.nan, pd.NA, pd.NaT}: | ||
if na_value_type is na_value: | ||
continue | ||
|
||
assert na_value_type not in data | ||
assert na_value_type in data_missing | ||
|
||
|
||
class TestConstructors(base.BaseConstructorsTests): | ||
pass | ||
|
Uh oh!
There was an error while loading. Please reload this page.