Making k and num_candidates optional for knn search #101209

Merged 75 commits on Feb 1, 2024
75 commits
7eba0ce updating (pmpailis, Oct 9, 2023)
b5b2b90 temp commit (pmpailis, Oct 13, 2023)
0860626 Merge branch 'elastic:main' into feature/97533 (pmpailis, Oct 16, 2023)
c31b2b7 Merge branch 'feature/97533' of github.com:pmpailis/elasticsearch int… (pmpailis, Oct 16, 2023)
f9c1345 updating (pmpailis, Oct 19, 2023)
d4e8445 updating docs (pmpailis, Oct 20, 2023)
7cd412d updating (pmpailis, Oct 23, 2023)
20554ac updating (pmpailis, Oct 23, 2023)
343acbc updating (pmpailis, Oct 23, 2023)
eaade90 updating (pmpailis, Oct 23, 2023)
4b5ce54 Merge branch 'elastic:main' into feature/97533 (pmpailis, Oct 23, 2023)
a0f8152 updating (pmpailis, Oct 23, 2023)
ec07bfc updating (pmpailis, Oct 23, 2023)
d24ad28 updating (pmpailis, Oct 23, 2023)
2586f9c Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Nov 2, 2023)
bf7ef86 adding requestSize to SearchExecutionContext (pmpailis, Nov 8, 2023)
a66f965 merging main (pmpailis, Nov 8, 2023)
642f481 updating (pmpailis, Nov 8, 2023)
c3794fc updating (pmpailis, Nov 8, 2023)
7e29689 Update docs/changelog/101209.yaml (pmpailis, Nov 8, 2023)
8cadde7 updating (pmpailis, Nov 8, 2023)
c26da26 Merge branch 'feature/97533' of github.com:pmpailis/elasticsearch int… (pmpailis, Nov 8, 2023)
46522ff updating (pmpailis, Nov 8, 2023)
e1bcdfc updating (pmpailis, Nov 8, 2023)
f5f96b0 Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Nov 9, 2023)
48fc088 merge main (pmpailis, Nov 23, 2023)
e64b0dd updating (pmpailis, Nov 23, 2023)
59f1d95 updating (pmpailis, Nov 23, 2023)
d2515bd updating (pmpailis, Nov 23, 2023)
1e759bd updating (pmpailis, Nov 23, 2023)
75be049 updating (pmpailis, Nov 23, 2023)
767a423 updating (pmpailis, Nov 28, 2023)
8abc91d Merge branch 'main' into feature/97533 (pmpailis, Nov 28, 2023)
6a50a33 Merge branch 'main' into feature/97533 (elasticmachine, Nov 28, 2023)
6deca60 Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Dec 5, 2023)
d1fb407 updating (pmpailis, Dec 5, 2023)
e32190a updating (pmpailis, Dec 5, 2023)
d3d1f3f updating (pmpailis, Dec 6, 2023)
db92720 Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Dec 6, 2023)
e0c8e8b updating (pmpailis, Dec 6, 2023)
07dea0b updating (pmpailis, Dec 7, 2023)
af23344 Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Dec 7, 2023)
091f3e5 updating (pmpailis, Dec 8, 2023)
2a8b5f9 updating (pmpailis, Dec 8, 2023)
de8cd7a updating (pmpailis, Dec 8, 2023)
5c548bb Update 101209.yaml (pmpailis, Dec 8, 2023)
50c4c25 updating (pmpailis, Dec 8, 2023)
4fcdc1c Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Dec 11, 2023)
78cbbdc minor (pmpailis, Dec 13, 2023)
02fee64 Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Jan 10, 2024)
7fb67c1 updating (pmpailis, Jan 10, 2024)
9092f37 setting k through SearchSourceBuilder (pmpailis, Jan 10, 2024)
05a0990 Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Jan 10, 2024)
86a72e9 removing unused method (pmpailis, Jan 10, 2024)
6f1ce26 removing unused imports (pmpailis, Jan 11, 2024)
8729217 pr iter - updating documentation (pmpailis, Jan 12, 2024)
a1237a5 pr iter - removing unecessary check from tests (pmpailis, Jan 12, 2024)
1aa708b pr iter - rolling back changes to deprecated class (pmpailis, Jan 12, 2024)
4082cf7 pr iter - setting default requestSize to DEFAULT_SIZE in SearchExecut… (pmpailis, Jan 12, 2024)
78a7fc0 minor iter (pmpailis, Jan 12, 2024)
2d760f7 Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Jan 12, 2024)
d1b457c minor iter (pmpailis, Jan 12, 2024)
2b5d15d Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Jan 30, 2024)
6b1c94e iter (pmpailis, Jan 30, 2024)
cfee774 iter (pmpailis, Jan 30, 2024)
d1a072d Update test/framework/src/main/java/org/elasticsearch/test/AbstractQu… (pmpailis, Jan 30, 2024)
7fc1157 Update test/framework/src/main/java/org/elasticsearch/test/AbstractQu… (pmpailis, Jan 30, 2024)
208a483 Merge remote-tracking branch 'origin/main' into feature/97533 (pmpailis, Jan 30, 2024)
390aff2 restoring final variables & updating TransportVersion (pmpailis, Jan 30, 2024)
8965d58 renaming TransportVersion (pmpailis, Jan 30, 2024)
69f92f6 Merge branch 'main' into feature/97533 (pmpailis, Feb 1, 2024)
d2cb4b2 updating exception message (pmpailis, Feb 1, 2024)
9121b82 Merge branch 'feature/97533' of github.com:pmpailis/elasticsearch int… (pmpailis, Feb 1, 2024)
f964fa8 checkstyle (pmpailis, Feb 1, 2024)
5d195e3 Update KnnVectorQueryBuilder.java (pmpailis, Feb 1, 2024)

6 changes: 6 additions & 0 deletions docs/changelog/101209.yaml
@@ -0,0 +1,6 @@
+pr: 101209
+summary: "Making `k` and `num_candidates` optional for knn search"
+area: Vector Search
+type: enhancement
+issues:
+ - 97533

4 changes: 2 additions & 2 deletions docs/reference/query-dsl/knn-query.asciidoc
@@ -94,10 +94,10 @@ as the vector field you are searching against.
 `num_candidates`::
 +
 --
-(Required, integer) The number of nearest neighbor candidates to consider per shard.
+(Optional, integer) The number of nearest neighbor candidates to consider per shard.
 Cannot exceed 10,000. {es} collects `num_candidates` results from each shard, then
 merges them to find the top results. Increasing `num_candidates` tends to improve the
-accuracy of the final results.
+accuracy of the final results. Defaults to `Math.min(1.5 * size, 10_000)`.
 --
 
 `filter`::

9 changes: 5 additions & 4 deletions docs/reference/rest-api/common-parms.asciidoc
@@ -584,14 +584,15 @@ end::knn-filter[]
 
 tag::knn-k[]
 Number of nearest neighbors to return as top hits. This value must be less than
-`num_candidates`.
+`num_candidates`. Defaults to `size`.
 end::knn-k[]
 
 tag::knn-num-candidates[]
-The number of nearest neighbor candidates to consider per shard. Cannot exceed
-10,000. {es} collects `num_candidates` results from each shard, then merges them
+The number of nearest neighbor candidates to consider per shard.
+Needs to be greater than `k`, or `size` if `k` is omitted, and cannot exceed 10,000.
+{es} collects `num_candidates` results from each shard, then merges them
 to find the top `k` results. Increasing `num_candidates` tends to improve the
-accuracy of the final `k` results.
+accuracy of the final `k` results. Defaults to `Math.min(1.5 * k, 10_000)`.
 end::knn-num-candidates[]
 
 tag::knn-query-vector[]

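The defaults stated in the documentation changes above can be sketched as a small helper. This is only an illustration under stated assumptions, not the Elasticsearch implementation: `resolve_knn_defaults`, its parameter names, and both constants are hypothetical, a default `size` of 10 is assumed, and half-up rounding of `1.5 * k` is assumed because the docs give the formula only as `Math.min(1.5 * k, 10_000)`.

```python
import math

NUM_CANDIDATES_LIMIT = 10_000  # upper bound stated in the docs
DEFAULT_SIZE = 10              # assumed default for the search `size` parameter


def resolve_knn_defaults(size=None, k=None, num_candidates=None):
    """Return (k, num_candidates) after filling in omitted values (hypothetical helper)."""
    effective_size = DEFAULT_SIZE if size is None else size
    # `k` falls back to the request size when omitted.
    resolved_k = effective_size if k is None else k
    if num_candidates is None:
        # 1.5 * k, rounded half-up (an assumption), capped at the limit.
        resolved_nc = min(math.floor(1.5 * resolved_k + 0.5), NUM_CANDIDATES_LIMIT)
    else:
        resolved_nc = num_candidates
    return resolved_k, resolved_nc


print(resolve_knn_defaults())         # nothing provided
print(resolve_knn_defaults(size=2))   # k follows size
print(resolve_knn_defaults(k=8000))   # the 10,000 cap applies
```

Under these assumptions, a request that supplies only `size: 2` would search with `k = 2` and `num_candidates = 3`.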
8 changes: 4 additions & 4 deletions docs/reference/search/knn-search.asciidoc
@@ -102,22 +102,22 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=routing]
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-filter]
 
 `knn`::
-(Required, object)
+(Required, object)
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn]
 +
 .Properties of `knn` object
 [%collapsible%open]
 ====
 `field`::
-(Required, string)
+(Required, string)
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-field]
 
 `k`::
-(Required, integer)
+(Optional, integer)
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-k]
 
 `num_candidates`::
-(Required, integer)
+(Optional, integer)
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-num-candidates]
 
 `query_vector`::

4 changes: 2 additions & 2 deletions docs/reference/search/search.asciidoc
@@ -505,11 +505,11 @@ include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-field]
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-filter]
 
 `k`::
-(Required, integer)
+(Optional, integer)
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-k]
 
 `num_candidates`::
-(Required, integer)
+(Optional, integer)
 include::{es-repo-dir}/rest-api/common-parms.asciidoc[tag=knn-num-candidates]
 
 `query_vector`::

Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@
+setup:
+  - skip:
+      version: ' - 8.12.99'
+      reason: '[k] and [num_candidates] were made optional for kNN search in 8.13.0'
+  - do:
+      indices.create:
+        index: knn_search_test_index
+        body:
+          mappings:
+            properties:
+              vector:
+                type: dense_vector
+                dims: 5
+                index: true
+                similarity: l2_norm
+
+  - do:
+      index:
+        index: knn_search_test_index
+        id: "1"
+        body:
+          vector: [1.0, -10.5, 1.3, 0.593, 41]
+
+  - do:
+      index:
+        index: knn_search_test_index
+        id: "2"
+        body:
+          vector: [-0.5, 100.0, -13, 14.8, -156.0]
+
+  - do:
+      index:
+        index: knn_search_test_index
+        id: "3"
+        body:
+          vector: [0.5, 111.3, -13.0, 14.8, -156.0]
+
+  - do:
+      indices.refresh: {}
+
+---
+"kNN with missing k param using default size":
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_search_test_index
+        body:
+          knn:
+            field: vector
+            query_vector: [-0.5, 90.0, -10, 14.8, -156.0]
+            num_candidates: 10
+
+  - match: {hits.total: 3}
+
+---
+"kNN with missing k param using provided size":
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_search_test_index
+        body:
+          knn:
+            field: vector
+            query_vector: [-0.5, 90.0, -10, 14.8, -156.0]
+            num_candidates: 10
+          size: 2
+
+  - match: {hits.total: 2}
+
+---
+"kNN search with missing num_candidates param":
+
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_search_test_index
+        body:
+          knn:
+            field: vector
+            query_vector: [-0.5, 90.0, -10, 14.8, -156.0]
+            k: 2
+
+  - match: {hits.total: 2}
+
+---
+"kNN search with missing both k and num_candidates param - default size":
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_search_test_index
+        body:
+          knn:
+            field: vector
+            query_vector: [-0.5, 90.0, -10, 14.8, -156.0]
+
+  - match: {hits.total: 3}
+
+
+---
+"kNN search with missing both k and num_candidates param - provided size":
+
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_search_test_index
+        body:
+          knn:
+            field: vector
+            query_vector: [-0.5, 90.0, -10, 14.8, -156.0]
+          size: 2
+
+  - match: {hits.total: 2}
+
+---
+"kNN search with missing k, and num_candidates < size":
+
+  - do:
+      catch: bad_request
+      search:
+        index: knn_search_test_index
+        body:
+          knn:
+            field: vector
+            query_vector: [-0.5, 90.0, -10, 14.8, -156.0]
+            num_candidates: 2
+          size: 10
+
+---
+"kNN search with missing k, default size, and invalid num_candidates":
+
+  - do:
+      catch: bad_request
+      search:
+        index: knn_search_test_index
+        body:
+          knn:
+            field: vector
+            query_vector: [-0.5, 90.0, -10, 14.8, -156.0]
+            num_candidates: 2
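
The two `bad_request` cases above, together with the passing cases that set `num_candidates: 10`, suggest a check along the following lines for the top-level `knn` search. This is a hedged sketch rather than the actual validation code: `validate_knn_search` and its error messages are hypothetical, and a default `size` of 10 is assumed. Equality is accepted here because the first test (`num_candidates: 10` with the default size) is expected to succeed, even though the docs text says "greater than".

```python
DEFAULT_SIZE = 10              # assumed default for the search `size` parameter
NUM_CANDIDATES_LIMIT = 10_000  # upper bound stated in the docs


def validate_knn_search(num_candidates, k=None, size=None):
    """Return the effective k, or raise ValueError for a rejected combination (hypothetical)."""
    # `k` falls back to `size`, which itself falls back to the assumed default.
    if k is not None:
        effective_k = k
    elif size is not None:
        effective_k = size
    else:
        effective_k = DEFAULT_SIZE
    if num_candidates > NUM_CANDIDATES_LIMIT:
        raise ValueError("num_candidates cannot exceed 10,000")
    if num_candidates < effective_k:
        raise ValueError("num_candidates cannot be less than k (or size when k is omitted)")
    return effective_k


for args in ({"num_candidates": 10}, {"num_candidates": 2, "size": 10}, {"num_candidates": 2}):
    try:
        print(args, "-> accepted with k =", validate_knn_search(**args))
    except ValueError as err:
        print(args, "-> rejected:", err)
```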
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
+setup:
+  - skip:
+      version: ' - 8.12.99'
+      reason: '[k] and [num_candidates] were made optional for kNN query in 8.13.0'
+  - do:
+      indices.create:
+        index: knn_query_test_index
+        body:
+          mappings:
+            properties:
+              vector:
+                type: dense_vector
+                dims: 3
+                index: true
+                similarity: l2_norm
+              category:
+                type: keyword
+              nested:
+                type: nested
+                properties:
+                  paragraph_id:
+                    type: keyword
+                  vector:
+                    type: dense_vector
+                    dims: 5
+                    index: true
+                    similarity: l2_norm
+
+  - do:
+      index:
+        index: knn_query_test_index
+        id: "1"
+        body:
+          vector: [1.0, 1.0, 0.0]
+          category: A
+          nested:
+            - paragraph_id: 0
+              vector: [ 230.0, 300.33, -34.8988, 15.555, -200.0 ]
+            - paragraph_id: 1
+              vector: [ 240.0, 300, -3, 1, -20 ]
+
+  - do:
+      index:
+        index: knn_query_test_index
+        id: "2"
+        body:
+          vector: [1.0, 0.5, 1.0]
+          category: A
+          nested:
+            - paragraph_id: 2
+              vector: [ 0, 100.0, 0, 14.8, -156.0 ]
+
+  - do:
+      index:
+        index: knn_query_test_index
+        id: "3"
+        body:
+          vector: [-1, -1, -1]
+          category: B
+          nested:
+            - paragraph_id: 0
+              vector: [ 100, 200.0, 300, 14.8, -156.0 ]
+
+  - do:
+      indices.refresh: {}
+
+---
+"kNN query with missing num_candidates param - default size":
+
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_query_test_index
+        body:
+          query:
+            knn:
+              field: vector
+              query_vector: [0, 0, 0]
+
+  - match: { hits.total: 3 }
+
+---
+"kNN query with missing num_candidates param - size provided":
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_query_test_index
+        body:
+          query:
+            knn:
+              field: vector
+              query_vector: [1, 1, 1]
+          size: 1
+  - match: { hits.total: 2 } # due to num_candidates defined as round(1.5 * size), so we only see 2 results
+  - length: { hits.hits: 1 } # one result is only returned though
+
+---
+"kNN query with num_candidates less than size":
+
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_query_test_index
+        body:
+          query:
+            knn:
+              field: vector
+              query_vector: [-1, -1, -1]
+              num_candidates: 1
+          size: 10
+
+  - match: { hits.total: 1 }
+
+
+---
+"kNN query in a bool clause - missing num_candidates":
+  - do:
+      search:
+        rest_total_hits_as_int: true
+        index: knn_query_test_index
+        body:
+          query:
+            bool:
+              must:
+                - term:
+                    category: A
+                - knn:
+                    field: vector
+                    query_vector: [ 1, 1, 0]
+          size: 1
+
+  - match: { hits.total: 2 } # due to num_candidates defined as round(1.5 * size), so we only see 2 results from cat:A
+  - length: { hits.hits: 1 }
+
+---
+"kNN search in a dis_max query - missing num_candidates":
+  - do:
+      search:
+        index: knn_query_test_index
+        body:
+          query:
+            dis_max:
+              queries:
+                - knn:
+                    field: vector
+                    query_vector: [1, 1, 0]
+                - match:
+                    category: B
+              tie_breaker: 0.8
+          size: 1
+
+  - match: { hits.total.value: 3 } # 2 knn result + 1 extra from match query
+  - length: { hits.hits: 1 }
+
+---
+"kNN search used in nested field - missing num_candidates":
+  - do:
+      search:
+        index: knn_query_test_index
+        body:
+          query:
+            nested:
+              path: nested
+              query:
+                knn:
+                  field: nested.vector
+                  query_vector: [ -0.5, 90.0, -10, 14.8, -156.0 ]
+              inner_hits: { size: 1, "fields": [ "nested.paragraph_id" ], _source: false }
+          size: 1
+
+  - match: { hits.total.value: 2 }
+  - length: { hits.hits: 1 }
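
The comments in the tests above explain `hits.total: 2` with `size: 1` by `num_candidates` being `round(1.5 * size)`. The following sketch replays that arithmetic on the three top-level `vector` values indexed in `setup`. It is a simplification under stated assumptions: an exact brute-force L2 search on a single shard stands in for the approximate search, half-up rounding is assumed, and `knn_candidates` and `search` are hypothetical names.

```python
import math

# Top-level `vector` and `category` values from the test setup above.
DOCS = {
    "1": {"vector": [1.0, 1.0, 0.0], "category": "A"},
    "2": {"vector": [1.0, 0.5, 1.0], "category": "A"},
    "3": {"vector": [-1.0, -1.0, -1.0], "category": "B"},
}


def knn_candidates(query_vector, num_candidates):
    """Exact nearest neighbours by L2 distance (stand-in for the approximate search)."""
    ranked = sorted(DOCS, key=lambda doc_id: math.dist(DOCS[doc_id]["vector"], query_vector))
    return ranked[:num_candidates]


def search(query_vector, size, category=None):
    """Return the total matched candidates and the top `size` hits."""
    num_candidates = math.floor(1.5 * size + 0.5)  # assumed default when omitted
    matched = knn_candidates(query_vector, num_candidates)
    if category is not None:
        # Mimics a bool `must` with a term clause: keep candidates in that category.
        matched = [doc_id for doc_id in matched if DOCS[doc_id]["category"] == category]
    return {"total": len(matched), "hits": matched[:size]}


print(search([1, 1, 1], size=1))                # two candidates matched, one hit returned
print(search([1, 1, 0], size=1, category="A"))  # both candidates fall in category A
```

With `size: 1` the sketch gathers two candidates, so the total is 2 while only one hit is returned, which is consistent with the `hits.total` and `hits.hits` assertions in the "size provided" and bool-clause tests.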