Support exact search in a better way #97541

Open
benwtrent opened this issue Jul 10, 2023 · 10 comments
Labels: >enhancement, :Search Relevance/Vectors, Team:Search Relevance

Comments

@benwtrent
Member

Description

Currently the only way for users to do a brute-force or exact search with KNN is with a script query. This requires some knowledge of the function names and scoring methodologies.

We should provide a better interface for exact scan. One idea is to have an exact: true field within kNN.

The name of the field is debatable, as is whether we should update kNN at all. Maybe we need a new kNN query that allows both exact and approximate search within the query DSL?
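For readers unfamiliar with the term, an exact (brute-force) search scores every candidate vector against the query and keeps the k best, instead of traversing an approximate index. The following is a minimal, self-contained sketch of that idea in plain Java. The class and method names are hypothetical, and dot product is an assumed similarity; this is not Lucene or Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: plain Java, not Lucene or Elasticsearch code.
final class ExactKnnSketch {

    /** A document id paired with its similarity score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Dot-product similarity (an assumed choice); both vectors must have the same length. */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Scores every vector against the query and returns the k best hits, best first. */
    static List<Hit> exactSearch(float[][] vectors, float[] query, int k) {
        Comparator<Hit> byScore = Comparator.comparingDouble(hit -> hit.score);
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (int doc = 0; doc < vectors.length; doc++) {
            heap.add(new Hit(doc, dotProduct(vectors[doc], query)));
            if (heap.size() > k) {
                heap.poll(); // evict the current weakest hit
            }
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(byScore.reversed());
        return hits;
    }
}
```

The cost is linear in the number of vectors, which is the trade-off an `exact: true` style option would be asking users to accept in exchange for exact results.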

@elasticsearchmachine added the Team:Search label Jul 10, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@benwtrent
Member Author

Some edge cases here are:

  • dense_vector fields that are not indexed, which means they don't have a similarity_function defined. We need to allow users to specify their expected similarity function AND define a good default.
  • Folks who want to use a different similarity function than the one configured for the indexed dense_vector field. Should we allow this? It's technically possible, since exact search simply iterates over the vectors and calculates the similarity.
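One way to picture both edge cases is a small resolution step that picks the similarity to score with: a request-level override if one is given, otherwise the similarity from the mapping, otherwise a default. The sketch below assumes that precedence and assumes cosine as the default; neither is decided in this thread, and all names are hypothetical rather than part of the Elasticsearch mapper API. The scoring formulas are illustrative raw values.

```java
import java.util.Optional;

// Hypothetical names throughout; this is not the Elasticsearch mapper API.
final class SimilarityResolutionSketch {

    enum Similarity { COSINE, DOT_PRODUCT, L2_NORM }

    // Assumed default for unindexed fields; the thread leaves the actual choice open.
    static final Similarity DEFAULT_SIMILARITY = Similarity.COSINE;

    /** Request-level override first, then the mapping, then the assumed default. */
    static Similarity resolve(Optional<Similarity> requested, Optional<Similarity> mapped) {
        if (requested.isPresent()) {
            return requested.get();
        }
        return mapped.orElse(DEFAULT_SIMILARITY);
    }

    /** Scores two equal-length vectors; an exact scan can apply any of these per document. */
    static float score(Similarity similarity, float[] a, float[] b) {
        float dot = 0f;
        float normA = 0f;
        float normB = 0f;
        float squaredDistance = 0f;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
            float diff = a[i] - b[i];
            squaredDistance += diff * diff;
        }
        switch (similarity) {
            case DOT_PRODUCT:
                return dot;
            case COSINE:
                return (float) (dot / Math.sqrt((double) normA * normB));
            case L2_NORM:
                // Maps a distance to a score where closer vectors score higher.
                return 1f / (1f + squaredDistance);
            default:
                throw new IllegalArgumentException("unknown similarity: " + similarity);
        }
    }
}
```

Because an exact scan computes the score per vector at query time, swapping the similarity only changes which branch runs; nothing in the stored data depends on it.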

@carlosdelest
Member

carlosdelest commented Sep 5, 2023

Task Refinement

cc @benwtrent and @mayya-sharipova for validation

Questions

Implementation plan

In case we do this before making knn search a query, these are the changes to the code that I've planned:

Lucene

  • Modify AbstractKnnVectorQuery to add an exact attribute to it for doing exact search.
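  • Modify AbstractKnnVectorQuery.getLeafResults to do an exact search when the exact attribute is set, similar to the exact searches currently performed when too many nodes are visited or there are fewer than k possible matches:
    if (exact) {
      // Collect the filtered, live documents into a bit set, then score each one directly.
      Scorer scorer = filterWeight.scorer(ctx);
      BitSet acceptDocs = createBitSet(scorer.iterator(), liveDocs, maxDoc);

      // The second BitSetIterator argument is a cost estimate, so pass the number of accepted docs.
      return exactSearch(ctx, new BitSetIterator(acceptDocs, acceptDocs.cardinality()));
    }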
  • Update the KnnVectorQuery, KnnByteVectorQuery and KnnFloatVectorQuery queries to add the exact attribute. I'm thinking of using a Builder or a Parameter Object to avoid duplicating the current constructors, though that would involve changes in the callers as well.
  • Add tests for exact query parameter and ensuring it performs an exact search. I'm thinking about checking that the proper method (exactSearch) is invoked, but happy to hear other opinions on how to test this.
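The Builder / Parameter Object idea mentioned above could look roughly like the following sketch. Every name here is hypothetical and none of it exists in Lucene; it only illustrates how an optional `exact` flag could be added without multiplying constructor overloads.

```java
import java.util.Objects;

// Hypothetical parameter object; none of these names exist in Lucene.
final class KnnQueryParamsSketch {
    final String field;
    final float[] target;
    final int k;
    final boolean exact;

    private KnnQueryParamsSketch(Builder builder) {
        this.field = builder.field;
        this.target = builder.target.clone(); // defensive copy of the query vector
        this.k = builder.k;
        this.exact = builder.exact;
    }

    /** Required arguments go here; optional ones are set on the builder. */
    static Builder builder(String field, float[] target, int k) {
        return new Builder(field, target, k);
    }

    static final class Builder {
        private final String field;
        private final float[] target;
        private final int k;
        private boolean exact = false; // approximate search remains the default

        private Builder(String field, float[] target, int k) {
            if (k < 1) {
                throw new IllegalArgumentException("k must be at least 1, got " + k);
            }
            this.field = Objects.requireNonNull(field, "field");
            this.target = Objects.requireNonNull(target, "target");
            this.k = k;
        }

        Builder exact(boolean exact) {
            this.exact = exact;
            return this;
        }

        KnnQueryParamsSketch build() {
            return new KnnQueryParamsSketch(this);
        }
    }
}
```

With this shape, existing callers that never mention `exact` keep their current behaviour, and a later optional parameter would need only one more builder method.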

Elasticsearch

  • Modify KnnVectorQueryBuilder to add an exact parameter
  • Change KnnVectorQueryBuilder.doToQuery to pass along the exact parameter to DenseVectorFieldMapper.createKnnQuery
  • Modify DenseVectorFieldMapper.createKnnQuery to create the Lucene KnnByteVectorQuery or KnnFloatVectorQuery with the Elasticsearch query's exact parameter
  • Add YAML tests and AbstractKnnVectorQueryBuilderTestCase tests

@carlosdelest
Member

After discussing with @benwtrent , we'll tackle this issue after #97940

@carlosdelest
Member

This is no longer blocked - @liranabn for prioritisation.

@benwtrent
Member Author

We need to consider the case when vectors are not in an HNSW graph at all (e.g. "index: false"). We need to allow kNN queries and top level kNN to work there as well IMO. This may require some configuration from the user to indicate the similarity they want to use. Possibly, we just require the similarity to be set in the mapping and if they want to use custom similarity functions, they must switch back to script.

I wonder if we should allow similarity to be stored in the mapping configuration even when index: false.

@kderusso
Member

Proposal:

  • The scope of this issue is to add a flag e.g. exact: true to the KNN request.
    • This will only work if index: true
    • If index: false this should probably error
  • Create a followup issue to address "flat" indices where content is not indexed and we don't have a default similarity in place.

This turns the issue into an actionable item that delivers value to our users, and defers the edge-case questions around flat indices to the follow-up issue.
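A minimal sketch of the proposed guard, assuming the request is rejected with an `IllegalArgumentException` when `exact` is requested on a field mapped with `index: false`. The class name, method name, and error message are all hypothetical.

```java
// Hypothetical validation helper; not part of Elasticsearch.
final class ExactFlagValidationSketch {

    /** Call only when the request sets exact: true. */
    static void validateExact(String fieldName, boolean fieldIndexed) {
        if (!fieldIndexed) {
            // Unindexed fields carry no similarity in the mapping, so there is nothing to score with.
            throw new IllegalArgumentException(
                "[exact] search on field [" + fieldName + "] requires [index: true] in the mapping");
        }
    }
}
```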

CC: @liranabn @benwtrent @mayya-sharipova

@saikatsarkar056 self-assigned this Feb 23, 2024
@saikatsarkar056
Contributor

saikatsarkar056 commented Mar 5, 2024

From the above discussion, we will take the following steps for the scope of this work.

  • Based on this comment, create a PR in Lucene and merge it after review.
  • Wait for the next Lucene release and for Elasticsearch to upgrade to it.
  • Based on this comment, create a PR in Elasticsearch.
  • Handling the edge cases is out of scope for this issue.

@saikatsarkar056
Contributor

For the Lucene changes, we need a new public interface that all leaf readers can read.

Assigning the issue to @benwtrent for the Lucene work. Once the Lucene work is done, the Search Relevance team can take the Elasticsearch work.

@javanna added the Team:Search Relevance label and removed the Team:Search label Jul 12, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

6 participants