Support exact search in a better way #97541

benwtrent · 2023-07-10T19:24:23Z

Description

Currently, the only way for users to run a brute-force (exact) kNN search is with a script query. This requires knowing the function names and scoring methodologies.

We should provide a better interface for exact scans. One idea is to add an exact: true field to kNN.

The name of the field is debatable, as is whether we should update kNN at all. Maybe a new kNN query that allows both exact and approximate search within the query DSL?
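For reference, an exact (brute-force) kNN search scores the query against every stored vector and keeps the k best. The sketch below is plain Java, not the Lucene or Elasticsearch implementation; it assumes in-memory float vectors and cosine similarity, and the class ExactKnn is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical brute-force (exact) kNN over in-memory float vectors; illustration only. */
final class ExactKnn {
  private ExactKnn() {}

  /** A scored candidate: document index plus similarity score. */
  private static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
  static float cosine(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0f;
    }
    return (float) (dot / Math.sqrt(normA * normB));
  }

  /** Scores every vector against the query and returns the indices of the k best, best first. */
  static List<Integer> topK(float[][] vectors, float[] query, int k) {
    // Min-heap ordered by score: the weakest of the current best k sits at the head.
    PriorityQueue<Hit> heap = new PriorityQueue<>((x, y) -> Float.compare(x.score, y.score));
    for (int doc = 0; doc < vectors.length; doc++) {
      heap.offer(new Hit(doc, cosine(vectors[doc], query)));
      if (heap.size() > k) {
        heap.poll(); // evict the lowest-scoring candidate
      }
    }
    List<Integer> result = new ArrayList<>();
    while (!heap.isEmpty()) {
      result.add(0, heap.poll().doc); // prepend, so the highest score ends up first
    }
    return result;
  }
}
```

Every vector is visited, so the cost grows linearly with the number of documents; this is why exact search is usually reserved for small or heavily filtered candidate sets.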


elasticsearchmachine · 2023-07-10T19:24:47Z

Pinging @elastic/es-search (Team:Search)

benwtrent · 2023-07-24T19:34:07Z

Some edge cases here are:

- dense_vector fields that are not indexed. These don't have a similarity function defined, so we need to allow users to specify the similarity function they expect AND have a good default.
- Users who want to use a different similarity function than the one indexed in the dense_vector field. Should we allow this? It's technically possible, as exact search simply iterates over the vectors and calculates similarity.
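One way to handle both edge cases is a fixed resolution order for the similarity used by an exact scan. A minimal sketch, assuming a per-request override is allowed; ExactSimilarity, SimilarityResolver, and the choice of COSINE as the default are hypothetical and not existing Elasticsearch behavior:

```java
import java.util.Optional;

/** Hypothetical similarity options for an exact scan (illustrative names, not the ES enum). */
enum ExactSimilarity { COSINE, DOT_PRODUCT, L2_NORM }

/** Hypothetical resolution order for the similarity used by an exact kNN scan. */
final class SimilarityResolver {
  private SimilarityResolver() {}

  /** Assumed fallback when neither the request nor the mapping names a similarity. */
  static final ExactSimilarity DEFAULT = ExactSimilarity.COSINE;

  /**
   * A per-request override wins (even if it differs from the indexed similarity),
   * then the similarity stored in the mapping, then the default.
   */
  static ExactSimilarity resolve(Optional<ExactSimilarity> requested,
                                 Optional<ExactSimilarity> mapped) {
    return requested.or(() -> mapped).orElse(DEFAULT);
  }
}
```

Whether the override should exist at all is the open question above; dropping the requested argument gives the stricter behavior.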

carlosdelest · 2023-09-05T08:30:49Z

Task Refinement

cc @benwtrent and @mayya-sharipova for validation

Questions

Should this task be done before or after "Make knn search as a query" (#97940)?

Implementation plan

If we do this before making knn search a query, these are the code changes I've planned:

Lucene

Modify AbstractKnnVectorQuery to add an exact attribute for doing exact search.
Modify AbstractKnnVectorQuery.getLeafResults to do an exact search when the exact attribute is set, similar to the existing exact searches that are done when too many nodes are visited or there are fewer than k possible matches:

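    if (exact) {
      // Bypass HNSW: score every candidate vector; with no filter, all docs are candidates.
      DocIdSetIterator candidates = DocIdSetIterator.all(maxDoc);
      if (filterWeight != null) {
        Scorer scorer = filterWeight.scorer(ctx);
        if (scorer == null) return NO_RESULTS; // filter matches nothing in this segment
        candidates = scorer.iterator();
      }
      BitSet acceptDocs = createBitSet(candidates, liveDocs, maxDoc);
      // BitSetIterator's second argument is its cost (number of set bits), not k.
      return exactSearch(ctx, new BitSetIterator(acceptDocs, acceptDocs.cardinality()));
    }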

Update the KnnVectorQuery, KnnByteVectorQuery and KnnFloatVectorQuery queries to add the exact attribute. I'm thinking of using a Builder or a Parameter Object to avoid duplicating the current constructors, though that would involve changes in the callers as well.
Add tests for the exact query parameter, ensuring it performs an exact search. I'm thinking about checking that the proper method (exactSearch) is invoked, but I'm happy to hear other opinions on how to test this.
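The Parameter Object idea, combined with the per-segment rule for when to skip the graph, might look roughly like this. It is a hypothetical plain-Java sketch, not Lucene's API: KnnQueryParams, its Builder, and useExactSearch are illustrative names, and the fallback conditions paraphrase the existing ones mentioned in the plan (at most k candidates, or the approximate search reaching its visit limit).

```java
/** Hypothetical parameter object so new options like `exact` don't multiply constructors. */
final class KnnQueryParams {
  final String field;
  final float[] target;
  final int k;
  final boolean exact;

  private KnnQueryParams(Builder b) {
    this.field = b.field;
    this.target = b.target;
    this.k = b.k;
    this.exact = b.exact;
  }

  static Builder builder(String field, float[] target, int k) {
    return new Builder(field, target, k);
  }

  /**
   * Per-segment decision: always exact when requested; otherwise exact only when there are
   * at most k candidates or the approximate search reached its visit limit.
   */
  boolean useExactSearch(int candidateCount, boolean visitLimitReached) {
    return exact || candidateCount <= k || visitLimitReached;
  }

  static final class Builder {
    private final String field;
    private final float[] target;
    private final int k;
    private boolean exact = false; // approximate (HNSW) search by default

    private Builder(String field, float[] target, int k) {
      if (k < 1) {
        throw new IllegalArgumentException("k must be >= 1, got " + k);
      }
      this.field = field;
      this.target = target;
      this.k = k;
    }

    Builder exact(boolean exact) {
      this.exact = exact;
      return this;
    }

    KnnQueryParams build() {
      return new KnnQueryParams(this);
    }
  }
}
```

A builder lets a new option such as exact default to false, so adding it doesn't require another constructor overload; the trade-off, as noted in the plan, is that callers have to switch to the builder.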

Elasticsearch

Modify KnnVectorQueryBuilder to add an exact parameter
Change KnnVectorQueryBuilder.doToQuery to pass along the exact parameter to DenseVectorFieldMapper.createKnnQuery
Modify DenseVectorFieldMapper.createKnnQuery to create the Lucene KnnByteVectorQuery or KnnFloatVectorQuery with the Elasticsearch query's exact parameter
Add YAML tests and AbstractKnnVectorQueryBuilderTestCase tests
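A rough sketch of how the exact flag could be threaded from the query builder to the field mapper. The classes here are simplified, hypothetical stand-ins modeled on the names in the plan; they are not the real KnnVectorQueryBuilder or DenseVectorFieldMapper signatures, and KnnQuerySpec merely stands in for the Lucene query that would be created.

```java
/** Hypothetical stand-in for the Lucene kNN query the mapper would create (not a Lucene class). */
final class KnnQuerySpec {
  final String field;
  final float[] target;
  final int numCands;
  final boolean exact;

  KnnQuerySpec(String field, float[] target, int numCands, boolean exact) {
    this.field = field;
    this.target = target;
    this.numCands = numCands;
    this.exact = exact;
  }
}

/** Hypothetical stand-in for DenseVectorFieldMapper: builds the query and keeps the flag. */
final class SketchFieldMapper {
  final String name;

  SketchFieldMapper(String name) {
    this.name = name;
  }

  KnnQuerySpec createKnnQuery(float[] target, int numCands, boolean exact) {
    return new KnnQuerySpec(name, target, numCands, exact);
  }
}

/** Hypothetical stand-in for KnnVectorQueryBuilder: holds the parsed `exact` option. */
final class SketchKnnQueryBuilder {
  private final float[] target;
  private final int numCands;
  private boolean exact = false; // approximate unless the request asks for exact

  SketchKnnQueryBuilder(float[] target, int numCands) {
    this.target = target;
    this.numCands = numCands;
  }

  SketchKnnQueryBuilder exact(boolean exact) {
    this.exact = exact;
    return this;
  }

  /** Mirrors doToQuery: hand the field's mapper the parameters, including `exact`, unchanged. */
  KnnQuerySpec doToQuery(SketchFieldMapper mapper) {
    return mapper.createKnnQuery(target, numCands, exact);
  }
}
```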

carlosdelest · 2023-09-06T11:37:07Z

After discussing with @benwtrent, we'll tackle this issue after #97940.

carlosdelest · 2023-11-07T14:11:35Z

This is no longer blocked - @liranabn for prioritisation.

benwtrent · 2023-11-07T14:25:38Z

We need to consider the case when vectors are not in an HNSW graph at all (e.g. "index: false"). IMO, we need to allow kNN queries and top-level kNN to work there as well. This may require some configuration from the user to indicate the similarity they want to use. Possibly we just require the similarity to be set in the mapping, and if users want custom similarity functions, they must switch back to a script query.

I wonder if we should allow similarity to be stored in the mapping configuration even when index: false.
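If the similarity could be stored in the mapping even with index: false, an exact scan could read it from there and otherwise refuse, pointing users to a script query. A hypothetical sketch of that rule; VectorMappingSketch and its error message are illustrative only, not Elasticsearch code:

```java
import java.util.Optional;

/** Hypothetical mapping options for a dense_vector field (illustration only). */
final class VectorMappingSketch {
  final boolean index;
  final Optional<String> similarity;

  VectorMappingSketch(boolean index, Optional<String> similarity) {
    this.index = index;
    this.similarity = similarity;
  }

  /**
   * Returns the similarity an exact kNN scan should use. The stored similarity is honored
   * even when index is false; without one, the caller is told to use a script query.
   */
  String similarityForExactScan() {
    return similarity.orElseThrow(() -> new IllegalArgumentException(
        "no similarity in mapping" + (index ? "" : " (index: false)")
            + "; set one or use a script query"));
  }
}
```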

kderusso · 2023-11-27T21:03:54Z

Proposal:

The scope of this issue is to add a flag, e.g. exact: true, to the kNN request.
- This will only work if index: true
- If index: false, this should probably error
Create a follow-up issue to address "flat" indices where content is not indexed and we don't have a default similarity in place.
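Under this narrower scope, the request-side check could be as small as the following hypothetical sketch; ExactFlagValidator and the error message are illustrative, not existing Elasticsearch code:

```java
/** Hypothetical request-time check: exact search is only accepted on indexed fields. */
final class ExactFlagValidator {
  private ExactFlagValidator() {}

  static void validate(String field, boolean indexed, boolean exact) {
    if (exact && !indexed) {
      // Non-indexed ("flat") fields are deferred to a follow-up issue, so reject for now.
      throw new IllegalArgumentException(
          "[exact] requires dense_vector field [" + field + "] to be mapped with index: true");
    }
  }
}
```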

This turns this issue into a doable action item that provides value to our users, and defers scoping of some of the edge case questions surrounding flat indices to outside the scope of this issue.

CC: @liranabn @benwtrent @mayya-sharipova

saikatsarkar056 · 2024-03-05T15:50:50Z

From the above discussion, we will take the following steps for the scope of this work.

Based on this comment, create a PR in Lucene and merge it after review.
Wait for the next Lucene release and for Elasticsearch to upgrade to it.
Based on this comment, create a PR in Elasticsearch.
Handling the edge cases is out of scope for this issue.

saikatsarkar056 · 2024-03-05T17:54:49Z

For the Lucene changes, we need a new public interface that all leaf readers can read.

Assigning the issue to @benwtrent for the Lucene work. Once the Lucene work is done, the search relevance team can take on the Elasticsearch work.

elasticsearchmachine · 2024-07-12T08:32:43Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

benwtrent added >enhancement :Search Relevance/Vectors Vector search labels Jul 10, 2023

elasticsearchmachine added the Team:Search Meta label for search team label Jul 10, 2023

benwtrent mentioned this issue Aug 10, 2023

Incorrect total value with k-NN search #97807

Closed

carlosdelest self-assigned this Sep 4, 2023

carlosdelest removed their assignment Sep 6, 2023

carlosdelest mentioned this issue Sep 6, 2023

Make knn search as a query #97940

Closed

saikatsarkar056 self-assigned this Feb 23, 2024

saikatsarkar056 assigned benwtrent and unassigned saikatsarkar056 Mar 5, 2024

javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024
