Make k and num_candidates optional in the knn section #97533

Closed
jpountz opened this issue Jul 10, 2023 · 35 comments · Fixed by #101209
Labels
>enhancement · :Search Relevance/Vectors · Team:Search Relevance

Comments

@jpountz
Contributor

jpountz commented Jul 10, 2023

Description

k and num_candidates are required parameters of the knn section. Could they have sensible defaults instead? E.g. could we default k to size and num_candidates to a value that would yield a sensible latency/recall trade-off depending on k?
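A minimal sketch of the proposed defaulting (hypothetical helper names, not the actual Elasticsearch implementation; the right heuristic for num_candidates is exactly what needs deciding):

```java
// Hypothetical resolution of the knn section's parameters (illustrative only).
class KnnDefaultsSketch {
    // Default k to the search request's size when k is absent.
    static int resolveK(Integer k, int size) {
        return k != null ? k : size;
    }

    // Default num_candidates to a k-dependent heuristic when absent; which
    // heuristic gives a good latency/recall trade-off is the open question.
    static int resolveNumCandidates(Integer numCandidates, int k) {
        return numCandidates != null ? numCandidates : heuristicFor(k);
    }

    private static int heuristicFor(int k) {
        return 3 * k; // placeholder; candidates discussed below include
                      // a multiplier on k and max(100, k)
    }
}
```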

@elasticsearchmachine added the Team:Search label on Jul 10, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@jimczi
Contributor

jimczi commented Aug 7, 2023

Another way to streamline this would be to replace num_candidates with a multiplicative factor. A potential default could be 10, meaning num_candidates becomes 10 times k. This approach would eliminate the need to adjust the value whenever k is overridden.
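A small sketch of that idea (hypothetical name), so the candidate pool scales automatically with k:

```java
// num_candidates expressed as a multiplier on k rather than an absolute value.
class FactorBasedCandidates {
    static int numCandidates(int k, double factor) {
        return (int) Math.round(factor * k); // e.g. factor = 10, k = 10 -> 100
    }
}
```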

@pmpailis
Contributor

pmpailis commented Nov 2, 2023

I ran a few experiments with different k and numCands combinations. I've also made some "hacky" changes in Lucene to expose the number of nodes visited as hits in the final response, in order to give us an idea of how these parameters affect graph exploration.

As suggested, I explored k defaulting to size, and numCands being relative to k, expressed as a multiplicative factor ([1x, 1.5x, 2x, 3x, 4x, ...]). The values of k considered were [10, 20, 50, 100, 1000]. I've used Rally's dense_vector and so_vector tracks for evaluation with the default datasets, after enhancing the latter with 2,000 vector queries extracted from the corpus (adding noise in some cases), measuring latency, avg_recall, and nodes_visited (median and 99th percentile) for all queries.

I've used Kibana to visualize the results which are as follows:
[Kibana screenshot of the results, 2023-11-02]

And in tabular format:

  • For the dense_vector track:

| k | numCands | nodes_visited (99th pct) | nodes_visited (avg) | recall (avg) | latency (ms) |
|--------|----------|--------------------------|---------------------|--------------|--------------|
| 10 | 10 | 44.43k | 31.50k | 0.879 | 105.776 |
| 10 | 15 | 46.38k | 32.37k | 0.919 | 11.977 |
| 10 | 20 | 55.01k | 38.33k | 0.94 | 10.319 |
| 10 | 30 | 71.28k | 49.45k | 0.963 | 11.9 |
| 10* | 100* | 195.69k | 132.93k | 0.994 | 15.801 |
| 20 | 20 | 77.04k | 54.19k | 0.932 | 116.899 |
| 20 | 30 | 101.20k | 69.75k | 0.958 | 15.943 |
| 20 | 40 | 121.91k | 84.07k | 0.971 | 14.342 |
| 20 | 80 | 196.59k | 133.52k | 0.99 | 13.397 |
| 20* | 100* | 228.79k | 155.09k | 0.993 | 13.217 |
| 50 | 50 | 141.80k | 97.47k | 0.971 | 10.765 |
| 50 | 75 | 187.85k | 127.86k | 0.984 | 11.941 |
| 50* | 100* | 228.79k | 155.09k | 0.989 | 12.4 |
| 50 | 150 | 299.92k | 203.20k | 0.994 | 14.956 |
| 50 | 200 | 361.62k | 245.46k | 0.996 | 18.298 |
| 100* | 100* | 228.79k | 155.09k | 0.984 | 12.683 |
| 100 | 150 | 299.92k | 203.20k | 0.992 | 13.594 |
| 100 | 200 | 361.62k | 245.46k | 0.995 | 14.532 |
| 100 | 300 | 469.63k | 318.90k | 0.997 | 17.702 |
| 100 | 400 | 560.77k | 382.58k | 0.998 | 23.566 |
| 1,000* | 1,000* | 965.15k | 669.10k | 0.998 | 38.155 |
| 1,000 | 1,500 | 1.21m | 848.51k | 0.999 | 43.291 |
| 1,000 | 2,000 | 1.42m | 1.00m | 0.999 | 43.783 |
| 1,000 | 3,000 | 1.77m | 1.26m | 1 | 59.581 |
| 1,000 | 4,000 | 2.04m | 1.48m | 1 | 69.658 |
  • For the so_vector track:

| k | numCands | nodes_visited (99th pct) | nodes_visited (avg) | recall (avg) | latency (ms) |
|--------|----------|--------------------------|---------------------|--------------|--------------|
| 10 | 10 | 137.05k | 115.74k | 0.925 | 494.525 |
| 10 | 15 | 173.10k | 145.54k | 0.948 | 80.989 |
| 10 | 20 | 206.64k | 173.40k | 0.963 | 96.038 |
| 10 | 30 | 268.52k | 225.34k | 0.975 | 139.671 |
| 10* | 100* | 627.21k | 517.94k | 0.993 | 616.19 |
| 20 | 20 | 198.46k | 167.63k | 0.954 | 632.526 |
| 20 | 30 | 258.98k | 218.01k | 0.97 | 112.374 |
| 20 | 40 | 312.58k | 264.54k | 0.977 | 125.152 |
| 20 | 80 | 511.22k | 426.34k | 0.99 | 190.286 |
| 20* | 100* | 597.47k | 497.62k | 0.992 | 134.843 |
| 50 | 50 | 367.91k | 308.11k | 0.981 | 28.984 |
| 50 | 75 | 488.69k | 407.70k | 0.988 | 35.175 |
| 50* | 100* | 597.47k | 497.62k | 0.991 | 35.718 |
| 50 | 150 | 795.82k | 658.62k | 0.994 | 317.462 |
| 50 | 200 | 974.52k | 802.20k | 0.995 | 221.507 |
| 100* | 100* | 597.47k | 497.62k | 0.99 | 29.468 |
| 100 | 150 | 795.82k | 658.62k | 0.994 | 27.239 |
| 100 | 200 | 974.52k | 802.20k | 0.995 | 31.492 |
| 100 | 300 | 1.29m | 1.06m | 0.997 | 360.641 |
| 100 | 400 | 1.55m | 1.28m | 0.998 | 272.023 |
| 1,000* | 1,000* | 2.75m | 2.30m | 0.999 | 803.109 |
| 1,000 | 1,500 | 3.48m | 2.94m | 0.999 | 205.52 |
| 1,000 | 2,000 | 4.08m | 3.46m | 0.999 | 155.893 |
| 1,000 | 3,000 | 5.03m | 4.32m | 0.999 | 158.196 |
| 1,000 | 4,000 | 5.78m | 5.01m | 1 | 174.905 |

Note: high latency is reported for small exploration parameters in some cases; I'm still looking into it.

Based on the above, I'd suggest defaulting numCands to max(100, k): this gives high enough recall in all scenarios where k <= 100, and for larger values, exploring just enough results (i.e. numCands == k) appears sufficient to achieve high accuracy. This may be because the exploration needed in the final layer to reach a "good" neighbourhood is, at least in these datasets, independent of k once k is large enough; even if we want 1,000 results, increasing numCands from 1,000 to 3,000 brings almost no gain in recall (0.998 to 0.999) but a significant increase in latency (38ms to 60ms).
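As a quick illustration (hypothetical snippet), the suggested max(100, k) default resolves to the following for the tested k values:

```java
// max(100, k): a floor of 100 candidates for small k, and numCands == k beyond that.
public class DefaultNumCandidates {
    public static void main(String[] args) {
        for (int k : new int[] {10, 20, 50, 100, 1000}) {
            System.out.printf("k=%d -> numCands=%d%n", k, Math.max(100, k));
        }
        // k=10 -> 100, k=20 -> 100, k=50 -> 100, k=100 -> 100, k=1000 -> 1000
    }
}
```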

@pmpailis
Contributor

pmpailis commented Nov 3, 2023

We discussed the results & conclusions with @mayya-sharipova and agreed to verify a few more things before reaching a final conclusion on the default value for num_candidates. More specifically:

  • Re-run the latency tests with proper warmup first (to address the high numbers reported above) and verify the results
  • Experiment with a few more k values, including something in the [100, 1000] range, which is a rather big gap for now
  • Experiment with another dataset whose initial (k=10, numCands=10) recall is rather low. We should also look into the ones mentioned in ANN-benchmarks
  • Check how the size of the individual graphs affects recall - i.e. force-merge to a single segment and limit the index to a few (or just one) shards so that all documents are part of the same graph - and run benchmarks against that.

@benwtrent
Member

Could you put these latencies and recalls in a Pareto graph (probably each k deserves its own graph per data set)? This way we can easily see the optimal point where recall starts to level off as latency increases.

@pmpailis
Contributor

pmpailis commented Nov 28, 2023

Now that we have also introduced knn as a query, I was wondering what the target behavior should be for kNN searches as far as parameter validation goes, as there are different scenarios and combinations in which the parameters may be set. As discussed, the defaults could be:

  • k defaults to size if k is missing,
  • num_candidates defaults to N if missing (N is TBD but does not affect the discussion)

Now on to some examples (many are derived from some really nice tests that @mayya-sharipova added):

Case 1
knn search where num_candidates and size are both specified, e.g.:

POST /test_index/_search?
{
    "knn": {
        "field": "embedding",
        "query_vector": [1, 2, 3],
        "num_candidates": 10
    },
    "size": 5
}

Case 2
knn search where only num_candidates is specified, e.g.:

POST /test_index/_search?
{
    "knn": {
        "field": "embedding",
        "query_vector": [1, 2, 3],
        "num_candidates": 5
    }
}

In this scenario, what should we expect the result to be? Should we assume the default size of 10 and act as in Case 1, or, since size is not explicitly set, not take it into account?

Case 3
knn query where both num_candidates and size are set

For knn queries we do not specify k; instead we always assume it to be equal to size. So the question is pretty similar to Case 1 - i.e. do we want to throw based on size, or return as many results as possible (even if, by definition, fewer than size)?

POST /test_index/_search?
{
    "query": {
        "knn": {
            "field": "embedding",
            "query_vector": [1, 2, 3],
            "num_candidates": 5
        }
    },
    "size": 10
}

Case 4
knn query where only num_candidates is set

Similar to Case 2:

POST /test_index/_search?
{
    "query": {
        "knn": {
            "field": "embedding",
            "query_vector": [1, 2, 3],
            "num_candidates": 5
        }
    }
}

Case 5
nested/function/filter/pinned knn queries, etc.

POST /test_index/_search?
{
    "size": 3,
    "query": {
        "function_score": {
            "query": {
                "knn": {
                    "field": "embedding",
                    "query_vector": [1, 2, 3],
                    "num_candidates": 5
                }
            },
            "functions": [
                {
                    "filter": { "match": { "my_field": "v1" } },
                    "weight": 10
                }
            ],
            "boost_mode": "multiply"
        }
    }
}

A knn query can be nested on multiple levels, and IMHO it might be a bit confusing why a request fails because somewhere deep in the tree a num_candidates was set to an invalid value.

I would think that for knn-search, validation is much more straightforward and intuitive. For knn-query, while it would be nice to have similar restrictions and validations to knn-search, that could also very easily end up being a bit confusing the more complex the query gets. Maybe we could "auto-correct" the param with a warning instead of throwing? 🤔 WDYT?

@benwtrent
Member

preamble:

kNN requires running in the DFS phase, where we return at most k elements from each shard. These are then merged into a global top-k. Once there is a global top-k, the results are serialized out to be combined with any other query objects and collected up to size.

This is very different from the knn query. The knn query runs on the shard, exploring num_candidates, merging those results with other queries on the shard, and then collecting size results. There is no "k" in the knn query.
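A greatly simplified model of that difference (hypothetical types and names, not the actual Elasticsearch code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

record Hit(int docId, float score) {}

class KnnFlowSketch {
    // Top-level knn section (DFS phase): each shard contributes at most k hits,
    // which are merged into a single global top-k before being combined with
    // other queries and collected up to `size`.
    static List<Hit> globalTopK(List<List<Hit>> perShardTopK, int k) {
        List<Hit> all = new ArrayList<>();
        perShardTopK.forEach(all::addAll);
        all.sort(Comparator.comparingDouble(Hit::score).reversed());
        return all.subList(0, Math.min(k, all.size()));
    }

    // knn query, by contrast: no global k at all; each shard explores
    // num_candidates, merges with the shard's other queries, and collects
    // `size` results locally.
}
```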

Case 2

I think only returning num_candidates values for kNN here is OK.

For consistent behavior, we should enforce k = Math.min(num_candidates, size).

Case 3

No, we don't want to throw. Note that there is no k for a knn query. You can consider num_candidates as a "shard-level top-k".

Case 4

Again, we don't want to throw; this is fine. We gather num_candidates per shard, collecting size documents based on the configured query.

Case 5

Again, this is fine.

For knn-query, while it would be nice to have similar restrictions and validations to knn-search

To me this is the opposite of nice. kNN as a query is meant to be more expert and less restrictive.

@pmpailis
Contributor

I see your point. By "nice" I mean that it would be nice to be consistent. E.g. one might assume that the following two requests would yield the same results, but that might not be the case:

POST /test_index/_search?
{
    "knn": {
        "field": "embedding",
        "query_vector": [1, 2, 3],
        "num_candidates": 5
    },
    "size": 10
}
POST /test_index/_search?
{
    "query": {
        "knn": {
            "field": "embedding",
            "query_vector": [1, 2, 3],
            "num_candidates": 5
        }
    },
    "size": 10
}

So, if someone is met with an exception in the first case, instead of adjusting the exploration params they could just as easily "nest" the search under query, and the search would succeed (albeit with possibly sub-optimal results).
Taking into account the potential usages of the knn query, I agree that it shouldn't be so restrictive or have complicated dependencies overall - but I believe that while onboarding/first using knn on ES, the very straightforward example above would seem a bit confusing to me as an end user.

@benwtrent
Member

The query shouldn't throw.

The top-level knn body object shouldn't throw.

k should default to Math.min(num_candidates, size)

@pmpailis
Contributor

Fair enough - given that the existing behavior was to throw, I was under the impression that we wanted to keep this as an API design decision.

So if we omit k, or in knn queries where we don't have k at all, we'll respect num_candidates as a parameter no matter the user-specified size (even if this means returning fewer results than the requested size). So in the examples above, we wouldn't throw an exception, and whether we meet the requested size will effectively come down to the number of shards the index has.

@benwtrent
Member

I would like @mayya-sharipova's input as well. But that is my current thought process.

So in the examples above, we wouldn't throw an exception, and whether we meet the requested size will effectively come down to the number of shards the index has.

This would only be for the knn query. The top-level kNN clause should return a consistent number of values no matter the number of shards.

@mayya-sharipova
Contributor

mayya-sharipova commented Nov 30, 2023

@pmpailis @benwtrent Thanks for the discussion so far.

@pmpailis If k is omitted for the top-level knn search, using size instead is a good idea, as size is always available (the default size is 10). Let's proceed with that.


If k (or size) > num_candidates then, summarizing the discussion, we have 3 options (each sketched in code after the list):

  1. Throw an exception
    • easy for users to reason about
    • we consistently return the final k results (what the user requested) for the top-level knn search and knn query regardless of the number of shards (as we will always have k results available).
  2. Auto-correct: k = Math.min(num_candidates, k)
    • more difficult for users to reason about
    • we consistently return the final num_candidates results (we are consistent, but that's NOT what the user requested)
  3. Proceed
    • most difficult for users to reason about
    • inconsistency: the final number of returned results depends on the number of shards (num_candidates * number_of_shards) for both the top-level knn search and the knn query.
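A rough sketch of how the three options behave (hypothetical code, for illustration only):

```java
class ValidationOptions {
    // Option 1: reject up front; results are always exactly what was asked for.
    static int throwIfInvalid(int k, int numCandidates) {
        if (k > numCandidates) {
            throw new IllegalArgumentException("[k] cannot be greater than [num_candidates]");
        }
        return k;
    }

    // Option 2: silently shrink k down to num_candidates.
    static int autoCorrect(int k, int numCandidates) {
        return Math.min(numCandidates, k);
    }

    // Option 3: proceed as-is; the number of returned hits now depends on
    // num_candidates * number_of_shards.
    static int proceed(int k, int numCandidates) {
        return k;
    }
}
```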

I am OK with options 1 or 2. But since we already throw an exception for the top-level knn search, I prefer to go with option 1 and also throw an exception for the knn query when size > num_candidates.

@pmpailis @benwtrent What do you think?

@jpountz
Contributor Author

jpountz commented Nov 30, 2023

we consistently return the final k results (what the user requested) for the top-level knn search and knn query regardless of the number of shards (as we will always have k results available).

I agree with @benwtrent's statement above that the knn query doesn't have a k, so it shouldn't be affected by the size either; should we just proceed with the configured number of candidates, regardless of the size configured on the search request body?

Auto-correct: k = Math.min(num_candidates, k)

Intuitively, k is more important than num_candidates (since k is the "what" while num_candidates is rather the "how"), so I would rather auto-correct the other way around, i.e. num_candidates = Math.max(num_candidates, k)?

@benwtrent
Member

Intuitively, k is more important than num_candidates (since k is the "what" while num_candidates is rather the "how"), so I would rather auto-correct the other way around, i.e. num_candidates = Math.max(num_candidates, k)?

I don't think we should "auto-correct", as the user is specifically requesting a lower num_candidates without supplying a k, which will default to size.

I think the only valid options are:

  • Throw an error for the knn section if num_candidates < size when k isn't present
  • Auto-correct k instead of defaulting to size

To move this forward, I am fine with throwing an error for now. It's always easier to become more lenient in the future if users consider this frustrating :)

@pmpailis
Contributor

pmpailis commented Dec 1, 2023

Thanks @mayya-sharipova, @jpountz & @benwtrent !

Personally, in the broader view of things, I agree with @mayya-sharipova's conclusion that we should either throw or auto-correct - not just let the query proceed, as this could lead to weird and potentially inconsistent behavior for the user, as in the example here.

I would also consider auto-correcting, but as @benwtrent mentioned - it'll always be easier to become more lenient in the future than to start throwing exceptions for previously working functionality (if, for example, we later discover that this configuration isn't working as expected) - so I'd opt for throwing in all cases (both knn search & query).

Would potentially auto-correcting num_candidates have an effect on the collected aggs / combining results/hits/scores etc? 🤔

So, if everyone's OK with this, I'd suggest proceeding with throwing an exception for now in the following cases, making sure that the exception message is clear and easy to understand (a validation sketch for the knn-search rules follows the lists below):

knn-search

  • When num_candidates and k are both present, and num_candidates < k
  • When num_candidates and size are present, and num_candidates < size
  • When only num_candidates is defined and num_candidates < 10 (default size)

knn-query

  • When num_candidates and size are present, and num_candidates < size
  • When only num_candidates is defined and num_candidates < 10 (default size)
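A minimal sketch of the knn-search side of this validation (hypothetical, not the actual implementation; 10 is the search API's default size):

```java
class KnnSearchValidation {
    // Throw when num_candidates cannot cover the effective k:
    // explicit k if present, else size if present, else the default size of 10.
    static void validate(Integer k, int numCandidates, Integer size) {
        int effectiveK = (k != null) ? k : (size != null) ? size : 10;
        if (numCandidates < effectiveK) {
            throw new IllegalArgumentException(
                "[num_candidates] cannot be less than " + ((k != null) ? "[k]" : "[size]"));
        }
    }
}
```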

@benwtrent
Member

, so I'd opt for throwing in all cases (both knn search & query).

No, knn search only. I stand firm we shouldn't throw on the query.

@pmpailis
Contributor

pmpailis commented Dec 1, 2023

, so I'd opt for throwing in all cases (both knn search & query).

No, knn search only. I stand firm we shouldn't throw on the query.

Ok, fair enough. If we don't auto-correct either, though, and just proceed with what the user specified, then as mentioned the results will vary depending on the number of shards. This in turn could be a bit confusing, as the same query might return different results on different setups (e.g. prod vs staging vs benchmarking environments, where the clusters might differ somewhat). On the other hand, it's not clear to me what the implications of auto-correcting might be, so for now I don't have a strong opinion.

@benwtrent
Member

then as mentioned the results will vary depending on the number of shards.

The query already has this problem IMO. The number of hits and the recall are affected directly by the number of shards (we gather num_candidates per shard and collect size per shard). It's considered "expert" for a reason.
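A worked example of that shard dependence (illustrative arithmetic, not actual Elasticsearch code):

```java
class ShardDependence {
    // For the knn query, candidates are gathered per shard, so the attainable
    // number of knn hits scales with the shard count before being cut to `size`.
    static int maxKnnQueryHits(int numCandidates, int numShards, int size) {
        return Math.min(numCandidates * numShards, size);
    }
    // e.g. num_candidates = 5, size = 10: one shard -> at most 5 hits,
    // two shards -> up to 10.
}
```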

@pmpailis
Contributor

pmpailis commented Dec 1, 2023

TBH I believe that adoption will be very wide - despite it being labeled as "expert use". If people already have pipelines for generating query requests, it might be easier to reuse them for knn - with all the additional capabilities that come with it - or to add it as a scoring function in a hybrid mode (we do have rrf for that though), rather than using a single knn search. But I do see your point - just thinking whether it'd be helpful to somehow let the user know when something might not be properly set up, or when we know that the search will not fill size results based on the num_candidates provided.

@mayya-sharipova
Contributor

@pmpailis It looks like we have a way forward for the top-level knn search that we all agree on. Let's proceed with it, and leave the knn query for a later discussion.

@pmpailis
Contributor

pmpailis commented Dec 4, 2023

Ok - I'll set up a PR covering the knn-search part and leave validation for the knn-query as-is (except for the already existing max value limit).
We should just keep this in mind in case any inquiries come in later w.r.t. the potentially different behavior of the two - and the implicit dependency on the number of shards (as an edge case).

@pmpailis
Contributor

pmpailis commented Dec 11, 2023

Ran some more experiments using Rally with the following 5 tracks/datasets: dense_vector, so_vector, openai_vector, cohere_vector, and glove.

Furthermore, for the dense_vector and so_vector tracks we explicitly tested one vs many segments by removing the force-merge action from the track's schedule.

For each of the above datasets we explored k being one of [10, 20, 50, 100, 500, 1000] and num_candidates being relative to k, defined by one of the following multipliers: [1x, 1.5x, 2x, 3x, 4x], plus one additional entry defined as max(100, k). For each run we report recall@k, the number of nodes visited, and latency at the 50th, 90th, and 99.9th percentiles.
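For reference, a small sketch enumerating the swept combinations (hypothetical code):

```java
import java.util.List;

// Enumerate the (k, num_candidates) grid used in these runs.
public class ParamGrid {
    public static void main(String[] args) {
        List<Integer> ks = List.of(10, 20, 50, 100, 500, 1000);
        List<Double> multipliers = List.of(1.0, 1.5, 2.0, 3.0, 4.0);
        for (int k : ks) {
            for (double m : multipliers) {
                System.out.printf("k=%d numCands=%d%n", k, (int) (m * k));
            }
            System.out.printf("k=%d numCands=%d  (max(100, k))%n", k, Math.max(100, k));
        }
    }
}
```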

Attaching a pdf with the full results & will post recall/latency graphs for each dataset as a separate comment for better readability - followed by an interpretation of the results & a suggestion on how to move forward.

@pmpailis
Contributor

pmpailis commented Dec 11, 2023

DENSE-VECTOR

[recall/latency graphs: dense_vector_single_segment]

@pmpailis
Contributor

pmpailis commented Dec 11, 2023

DENSE-VECTOR multiple segments

[recall/latency graphs: dense_vector_multiple_segments]

@pmpailis
Contributor

pmpailis commented Dec 11, 2023

SO_VECTOR

[recall/latency graphs: so_vector_single_segment]

@pmpailis
Contributor

pmpailis commented Dec 11, 2023

SO_VECTOR multiple segments

[recall/latency graphs: so_vector_multiple_segments]

@pmpailis
Contributor

pmpailis commented Dec 11, 2023

OPENAI_VECTOR

[recall/latency graphs: openai_vector]

@pmpailis
Contributor

pmpailis commented Dec 11, 2023

COHERE_VECTOR

[recall/latency graphs: cohere_vector]

@pmpailis
Contributor

pmpailis commented Dec 11, 2023

GLOVE

[recall/latency graphs: glove_vector]

@jimczi
Contributor

jimczi commented Dec 11, 2023

Thanks for sharing @pmpailis. One aspect to keep in mind is that the number of candidates should also be a factor of the total number of indexed vectors, as well as of the number of segments. The previously shared results indicate that some of the tested queries need to visit a fairly significant percentage of the total documents. So rather than latency, I think it would be easier to look at the percentage of documents visited.
Are you trying to find the default value for num_candidates when it is not set?

@pmpailis
Contributor

Thanks for taking a look @jimczi! TBH we also briefly discussed with @benwtrent an alternative: making num_candidates relative to the graph size instead of dependent on query-time parameters (e.g. k or size), but agreed to handle this in a different issue and not in this PR. The previous results - which wasn't clear, and that's my fault - were with multiple segments, i.e. without force-merge.

I've also updated the comment above, as we also ran a couple of tests (with the dense_vector and so_vector tracks) to account for both one vs many segments. Although, taking another look now, I think there might have been an issue with generating the heatmaps for the multiple segments, so I will update them ASAP.

@jimczi
Contributor

jimczi commented Dec 11, 2023

TBH we also briefly discussed with @benwtrent an alternative: making num_candidates relative to the graph size instead of dependent on query-time parameters (e.g. k or size), but agreed to handle this in a different issue and not in this PR. The previous results - which wasn't clear, and that's my fault - were with multiple segments, i.e. without force-merge.

++, there's also apache/lucene#12794, which should help with reasoning about num_candidates in a multi-graph setup. Hopefully this will make it easier to set a value in ES without looking at the shape of the segments in the underlying graphs.

@pmpailis
Contributor

pmpailis commented Jan 8, 2024

This took some time, as I had to work around some issues with OOM exceptions on local runs after upgrading to Lucene 9.9.0 (and due to the holiday period!), but I'm finally attaching the results for the runs described in the comment above. The recall/latency graphs above are also slightly different for the new runs, so I have updated them in place.

num_candidates_analysis.pdf

Based on the results, and as discussed with @benwtrent and @mayya-sharipova, there are 2 main suggestions on how to proceed with setting the default value for num_candidates (as k will default to size); both are sketched in code after the list:

  • num_candidates = Math.min(1.5 * k, 10_000) - this achieves good enough recall in almost all cases, with very good latency scores. The only thing we should carefully state is that for lower k values, e.g. top-1 or top-3 searches, the default behavior might not yield satisfactory results, as the exploration would be minimal: num_candidates would be equal to 2 and 5 respectively.
  • num_candidates = Math.max(100, k) - this achieves slightly better recall, especially for lower k values, with the tradeoff of increased default latency compared to the previous approach.
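In code, the two suggested defaults look roughly like this (hypothetical sketch):

```java
class CandidateDefaults {
    // Option A: scale with k, capped at the existing 10,000 ceiling.
    static int optionA(int k) {
        return (int) Math.min(Math.round(1.5 * k), 10_000); // k=1 -> 2, k=3 -> 5, k=1000 -> 1500
    }

    // Option B: floor at 100 for better recall at small k, at some latency cost.
    static int optionB(int k) {
        return Math.max(100, k); // k=10 -> 100, k=1000 -> 1000
    }
}
```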

IMHO both options provide good enough defaults, with some tradeoffs between potential applications and latencies. Looking forward to hearing your thoughts!

Note: the results posted in a previous comment for the multi-segment dense_vector and so_vector tracks, in tabular form extracted from Kibana, had some aggregation errors; so while they are comparable against each other, the absolute numbers are incorrect. Maybe I should mark them as invalid, or update them in place with the proper results to avoid confusion?

@benwtrent
Member

@pmpailis thanks a bunch for digging into all this. Given the results, I still think num_candidates = Math.min(1.5 * k, 10_000) is a good default. Folks can easily override this by providing their own num_candidates and out of the box, we default to size: 20.

@pmpailis
Contributor

Don't have a strong preference on this, so I'll gladly move on with the suggested 1.5 * k option.

@javanna added the Team:Search Relevance label and removed the Team:Search label on Jul 12, 2024