Implement tensor.isin #2098

Open: ndgrigorian wants to merge 15 commits into master from feature/searchsorted-based-isin

Conversation

ndgrigorian
Collaborator

This PR proposes an implementation of isin, a function likely coming to a future array API specification. It uses a kernel similar to the one behind searchsorted.

This implementation uses the searchsorted kernel to find the position each value would occupy in the sorted array being searched. If that position equals the number of elements in the array, the value is not a member. Otherwise, for a searched array arr and a value val, val is a member if arr[pos] == val.
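
The membership check described above can be sketched in plain Python. This is only an illustration, assuming the standard-library bisect module as a stand-in for the searchsorted kernel; the helper name isin_sorted is hypothetical and does not appear in this PR.

```python
from bisect import bisect_left


def isin_sorted(needles, haystack):
    """Return one boolean per needle: True if it occurs in haystack.

    Hypothetical helper for illustration only; not the dpctl kernel.
    """
    hay = sorted(haystack)  # the searched array must be sorted first
    n = len(hay)
    result = []
    for val in needles:
        # Leftmost position at which val could be inserted to keep hay sorted.
        pos = bisect_left(hay, val)
        # pos == n means val is greater than every element: not a member.
        # Otherwise val is a member only if the element at pos equals it.
        result.append(pos < n and hay[pos] == val)
    return result
```

The pos < n guard matters because a value larger than every element gets position n, which is out of bounds for indexing.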

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

github-actions bot commented Jun 6, 2025

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_8 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

Collaborator

coveralls commented Jun 6, 2025

Coverage Status

coverage: 84.892% (-0.1%) from 84.989%
when pulling 23c61a8 on feature/searchsorted-based-isin
into 35a8c26 on master.

@ndgrigorian ndgrigorian force-pushed the feature/searchsorted-based-isin branch from 1805102 to 5355fb8 Compare June 6, 2025 22:41

github-actions bot commented Jun 6, 2025

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_10 ran successfully.
Passed: 1114
Failed: 7
Skipped: 119


dep_evs = _manager.submitted_events
ht_ev, s_ev = _isin(
needles=x1,
Collaborator

It seems the strided implementation (which is presumably slower) will only be used when the x1 array is not contiguous, since we sort the test_elements array and the isin function has no out keyword.
Would it make sense to flatten the input array x and pass the order keyword there?

That said, it also makes sense to keep the strided implementation of _isin in case it proves helpful for implementing other set functions.

Collaborator Author

I can experiment and compare flattening against not flattening

but in general, this implementation is going to change quite a bit soon; I have some local changes waiting

@ndgrigorian
Collaborator Author

@antonwolfy
the implementation is updated and now permits Python scalars as the second argument

tests still need to be added, but isin itself is ready for review

@ndgrigorian ndgrigorian marked this pull request as ready for review June 11, 2025 22:15
@ndgrigorian ndgrigorian requested a review from antonwolfy June 11, 2025 22:20

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_17 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

@@ -112,6 +112,7 @@ set(_reduction_sources
${CMAKE_CURRENT_SOURCE_DIR}/libtensor/source/reductions/sum.cpp
)
set(_sorting_sources
${CMAKE_CURRENT_SOURCE_DIR}/libtensor/source/sorting/isin.cpp
Collaborator

This doesn't seem related to the sorting routines

Collaborator Author

it uses common utilities with searchsorted (i.e., from rich_comparisons.hpp) which is why it lives there

Collaborator Author

if the code from rich_comparisons gets factored out, I can go ahead and move it elsewhere, I guess to _tensor_impl for now

fnT get() const
{
using dpctl::tensor::kernels::isin_contig_impl;
using Compare = typename AscendingSorter<argTy>::type;
Collaborator

Do we need Compare to be templated here? Is there any use case where the isin kernel would require a different comparator?

Collaborator Author

It's not strictly necessary, but was done to reduce code duplication—these sorters are defined in tensor/source.

I can look at the normal sort implementation to refresh myself on what was done there, but if that isn't sufficient, it may be preferable to template and pass the sorter here as opposed to duplicating in isin.hpp

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_18 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_22 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_23 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

@ndgrigorian ndgrigorian requested a review from antonwolfy June 17, 2025 06:24

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_24 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/searchsorted-based-isin branch from b3822f3 to f7e0967 Compare June 17, 2025 07:40

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_23 ran successfully.
Passed: 1116
Failed: 5
Skipped: 119


if not isinstance(x, dpt.usm_ndarray):
x_arr = dpt.asarray(
x, dtype=dt1, usm_type=res_usm_type, sycl_queue=exec_q
Collaborator

Should we cast to the result dtype here to avoid an unnecessary copy below?

Suggested change
- x, dtype=dt1, usm_type=res_usm_type, sycl_queue=exec_q
+ x, dtype=dt, usm_type=res_usm_type, sycl_queue=exec_q

Collaborator

The same applies to the test_arr cast here

Collaborator

And then it would probably make sense to combine the checks, like:

    if not isinstance(x, dpt.usm_ndarray):
        x_buf = dpt.asarray(
            x, dtype=dt, usm_type=res_usm_type, sycl_queue=exec_q
        )
    elif x_dt != dt:
        x_buf = _empty_like_orderK(x, dt, res_usm_type, sycl_dev)
        ht_ev, ev = _copy_usm_ndarray_into_usm_ndarray(
            src=x, dst=x_buf, sycl_queue=exec_q, depends=dep_evs
        )
        _manager.add_event_pair(ht_ev, ev)
    else:
        x_buf = x

Collaborator Author

I've left it this way because this is how element-wise functions handle it, as well: scalars are put first into the appropriate array type, then the array is cast into another type for computation

I'm not sure it's strictly necessary, but it may avoid some edge cases producing incorrect results



@pytest.mark.parametrize(
"dtype",
Collaborator

What about the case where the inputs have different dtypes and casting is required?

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_24 ran successfully.
Passed: 1112
Failed: 9
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/searchsorted-based-isin branch from 8f19cb5 to 3cf7445 Compare June 17, 2025 21:32

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_26 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the feature/searchsorted-based-isin branch from 3cf7445 to 7c6a4be Compare June 17, 2025 22:20

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_27 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_30 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

isin leverages kernel very similar to searchsorted, but after the search, the position is checked, and if the position is equal to the number of elements in the searched array, existence is considered false
@ndgrigorian ndgrigorian force-pushed the feature/searchsorted-based-isin branch from 16c63e4 to 23c61a8 Compare June 18, 2025 20:06

Array API standard conformance tests for dpctl=0.21.0dev0=py310h93fe807_35 ran successfully.
Passed: 1115
Failed: 6
Skipped: 119

3 participants