Make knn search as a query #97940

Closed
mayya-sharipova opened this issue Jul 25, 2023 · 14 comments · Fixed by #98916
Assignees
Labels
>enhancement :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@mayya-sharipova
Contributor

mayya-sharipova commented Jul 25, 2023

Description

Currently, knn is a top-level section of a search request. The knn search runs in the DFS phase, which allows it to collect the global top k results regardless of the number of shards. This is very useful for aggregations, as only the global top k results will be included in the aggregations' results.
A limitation of having knn as a top-level section is that a knn search cannot be combined with or nested within other queries.

There are two ways to approach this:

  1. Introduce a new knn_shard query that works at the shard level in the query phase, like all other queries.
  2. Modify the current DSL to make knn a query, but keep the internal implementation the same (i.e. internally, knn search will still run during the DFS phase and collect the global top k).
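
To make the difference concrete, here is a minimal sketch of the two request shapes as Python dictionaries. The field name `embedding`, the `title` match clause, and the vector values are hypothetical, and the shape of the knn clause under `query` is an assumption for illustration, not syntax confirmed in this issue.

```python
import json

# Hypothetical query vector; not taken from the issue.
query_vector = [0.1, 0.2, 0.3]

# Today: knn is a top-level section, a sibling of "query".
top_level_knn_request = {
    "knn": {
        "field": "embedding",  # hypothetical field name
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
    },
    "query": {"match": {"title": "vector search"}},
}

# Proposed: knn expressed as a query clause, so it can be nested
# inside other queries such as bool. The clause shape is an assumption.
knn_as_query_request = {
    "query": {
        "bool": {
            "should": [
                {"match": {"title": "vector search"}},
                {
                    "knn": {
                        "field": "embedding",
                        "query_vector": query_vector,
                        "num_candidates": 100,
                    }
                },
            ]
        }
    }
}

print(json.dumps(knn_as_query_request, indent=2))
```

In the second shape the knn clause is an ordinary node of the query tree, which is what makes nesting and combination possible.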
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jul 25, 2023
@mayya-sharipova mayya-sharipova changed the title from "Make knn query" to "Make knn search as a query" Jul 25, 2023
@javanna
Member

javanna commented Jul 25, 2023

I wonder how we could make knn a query that still runs in the dfs phase, while being able to combine it with other queries. We'd need to be able to extract knn queries from the query tree, run them separately and rewrite them into something else for the query phase, similar to what we already do after we run the dfs phase?

I also wonder if we added a knn query that only runs in the query phase, how would that integrate with aggregations?

@mayya-sharipova
Contributor Author

mayya-sharipova commented Jul 25, 2023

@javanna Very good questions.

I need to think more about your first question.

I also wonder if we added a knn query that only runs in the query phase, how would that integrate with aggregations?

For the knn_shard query, aggregations will collect K results per shard, for a total of K * shards results. This is a limitation of this query.
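
The arithmetic can be illustrated with a small simulation. This is only a sketch: the shard contents and scores are made up, and `top_k` is a hypothetical helper, not Elasticsearch code.

```python
import heapq


def top_k(scored_docs, k):
    """Return the k highest-scoring (doc_id, score) pairs."""
    return heapq.nlargest(k, scored_docs, key=lambda pair: pair[1])


# Made-up similarity scores for three shards (illustrative data only).
shards = [
    [("a1", 0.91), ("a2", 0.40), ("a3", 0.35)],
    [("b1", 0.88), ("b2", 0.86), ("b3", 0.10)],
    [("c1", 0.30), ("c2", 0.20), ("c3", 0.15)],
]
k = 2

# Shard-level query: every shard hands its own top k to aggregations.
per_shard_hits = [hit for shard in shards for hit in top_k(shard, k)]

# DFS-phase knn: shard results are merged into a global top k first.
global_hits = top_k(per_shard_hits, k)

print(len(per_shard_hits))                    # k * number of shards
print([doc_id for doc_id, _ in global_hits])  # ids of the global top k
```

With a shard-level query, aggregations would see all of `per_shard_hits`; with the DFS-phase approach they would see only `global_hits`.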

@mayya-sharipova mayya-sharipova self-assigned this Jul 26, 2023
@benwtrent
Member

benwtrent commented Aug 8, 2023

This is a limitation of this query.

IMO, this is a feature, not a limitation. This increases recall and if folks only want vectors that fit within a similarity, they have the similarity threshold option.

There are a couple of things on the query layer that concern me and that we need to be careful about:

  • How do we apply alias filters? These should be made available during query rewrite on the shard, so that they can be added to any knn filters available.
  • Can we auto-apply filters supplied within the DSL as pre-filters?
    • We currently have the frustrating design of requiring a "filter" applied to the knn clause directly. We would need to think carefully about this, but I don't think it's impossible.
    • One option is to look up the DSL tree to find the first bool query that a particular knn query is a member of, remove that knn query, and make the bool query its filter.
    • Another option is to take the entire DSL, rewrite it to remove the inner knn query, and apply that as a filter.
    • I think even if we allow things to be auto-discovered via some flag or by default, users should still be able to apply their own filters to override any auto-calculation of filters.
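
The first option (use the enclosing bool query as the knn filter, while letting a user-supplied filter win) can be sketched on a dictionary representation of the DSL. Everything here is an assumption for illustration: the function name `attach_enclosing_bool_as_filter` is hypothetical, clause lists are assumed to be Python lists, and only one knn clause per bool query is handled.

```python
import copy


def attach_enclosing_bool_as_filter(bool_query):
    """Return the knn clause of `bool_query` with the rest of the bool as its filter.

    Assumes `bool_query` is {"bool": {...}} whose occurrence lists contain
    exactly one {"knn": {...}} clause. The input is not modified.
    """
    rest = copy.deepcopy(bool_query["bool"])
    knn_clause = None
    for occur in ("must", "filter", "should", "must_not"):
        kept = []
        for clause in rest.get(occur, []):
            if "knn" in clause and knn_clause is None:
                knn_clause = clause  # pull the knn clause out of the bool
            else:
                kept.append(clause)
        if kept:
            rest[occur] = kept
        else:
            rest.pop(occur, None)
    if knn_clause is None:
        raise ValueError("no knn clause found in bool query")
    # A filter set explicitly by the user overrides the derived one.
    if rest:
        knn_clause["knn"].setdefault("filter", {"bool": rest})
    return knn_clause


query = {
    "bool": {
        "must": [
            {"knn": {"field": "embedding", "query_vector": [0.1, 0.2], "num_candidates": 50}}
        ],
        "filter": [{"term": {"lang": "en"}}],
    }
}
rewritten = attach_enclosing_bool_as_filter(query)
print(rewritten)
```

Using `setdefault` is one simple way to express the last bullet: auto-derived filters never replace a filter the user attached to the knn clause themselves.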

@benwtrent
Member

Another thing to consider is whether this query supports searching over non-indexed vectors. I think it should. We should have a parameter called exact that allows users to provide the similarity method to use.

Something like:

"exact": { "similarity": "dot_product"}

This would also work for things indexed in HNSW. Instead of searching the graph, we iterate all vectors using the provided similarity. In this case, I think we could allow "exact": {} and we use the similarity configured in the mapping.
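
A minimal sketch of what such an exact search could do, assuming vectors are held in a plain dictionary. The `exact_knn` function, its parameters, and the sample documents are hypothetical; only the `exact` parameter and the `dot_product` similarity name come from the comment above.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


SIMILARITIES = {"dot_product": dot_product, "cosine": cosine}


def exact_knn(vectors, query_vector, k, exact, mapping_similarity):
    """Score every vector (no graph traversal) and return the top k doc ids.

    `exact` mirrors the proposed parameter: {"similarity": "..."} picks a
    function explicitly; {} falls back to the similarity from the mapping.
    """
    name = exact.get("similarity", mapping_similarity)
    score = SIMILARITIES[name]
    ranked = sorted(
        vectors.items(),
        key=lambda item: score(item[1], query_vector),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Made-up documents for illustration.
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [3.0, 3.0]}
by_dot = exact_knn(docs, [1.0, 0.2], k=2, exact={"similarity": "dot_product"},
                   mapping_similarity="cosine")
by_mapping = exact_knn(docs, [1.0, 0.2], k=2, exact={}, mapping_similarity="cosine")
print(by_dot, by_mapping)
```

The two calls rank the same documents differently, which shows why letting the user pick the similarity (or defaulting to the mapping) matters.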

@carlosdelest
Member

This issue blocks #97541 as we need to ensure our design for the knn Query is acceptable as is before we attempt to add new things to it that the top level knn doesn't support.

@saiparsa

@mayya-sharipova thank you so much for your contributions to this project! Just checking in to see if there's an update or if you might have an idea of when you'll have a chance to look at this further. Thanks again!

@mayya-sharipova
Contributor Author

@saiparsa I am wondering what your need for knn as a query is. We already have top-level knn search; is there something you can't accomplish with it?

@saiparsa

@mayya-sharipova Thanks for the reply! We're attempting to leverage KNN for querying embeddings in child docs while concurrently searching metadata in parent docs. However, current Elasticsearch limits prevent KNN from being nested within has_child queries, limiting our desired query approach.

@NhuanTDBK

@mayya-sharipova I would love to see the knn query work with other query features such as BooleanQuery, ConstantScore, and Boosting. Is there any plan to upgrade the knn query?

@mayya-sharipova
Contributor Author

Closed by #98916

@rishubhgupta

Thank you so much @mayya-sharipova for working on this. You're the best! This is a really important feature people have been awaiting. If it's not too much to ask, could you please help us understand whether this enhancement will be part of the 8.11 series of releases or will be available in a new series such as 8.12?

@mayya-sharipova
Contributor Author

@rishubhgupta Thank you for your kind words. This will be available from 8.12.

@yanchaoguo

@rishubhgupta Thank you for your kind words. This will be available from 8.12

When is version 8.12 coming?

@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024
9 participants