Commit 61c7483

Make knn search a query (#98916)
This introduces a new knn query:
- The knn query is executed during the Query phase, similar to all other queries.
- There is no k parameter; k defaults to size.
- num_candidates is the size of the queue of candidates to consider while searching the graph on each shard.
- For aggregations: "size" results are collected with total = size * shards. Aggregations will see size * shards results.
- All filters from the DSL are applied as post-filters, except: 1) an alias filter, which is applied as a pre-filter, or 2) a filter provided as a parameter inside the knn query.
1 parent 41f09fb commit 61c7483
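
The collect-then-merge behaviour described in the commit message can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `ShardMergeSketch`, `Hit`, `SHARDS` and the distances are hypothetical and are not part of this commit, and an exact sort stands in for the approximate graph search that each shard actually performs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
class ShardMergeSketch {
    // A candidate document and its distance to the query vector (smaller is nearer).
    record Hit(String id, double distance) {}

    // Made-up per-shard contents, used only to exercise the sketch.
    static final List<List<Hit>> SHARDS = List.of(
        List.of(new Hit("a", 0.1), new Hit("b", 0.5), new Hit("c", 0.9)),
        List.of(new Hit("d", 0.2), new Hit("e", 0.3), new Hit("f", 0.4))
    );

    // The n nearest hits of a list, nearest first.
    static List<Hit> closest(List<Hit> hits, int n) {
        return hits.stream().sorted(Comparator.comparingDouble(Hit::distance)).limit(n).toList();
    }

    // Each shard contributes at most numCandidates hits.
    static List<Hit> collectPerShard(List<List<Hit>> shards, int numCandidates) {
        List<Hit> collected = new ArrayList<>();
        for (List<Hit> shard : shards) {
            collected.addAll(closest(shard, numCandidates));
        }
        return collected;
    }

    // Merge the per-shard candidates and keep the top `size` ids.
    static List<String> search(List<List<Hit>> shards, int numCandidates, int size) {
        return closest(collectPerShard(shards, numCandidates), size).stream().map(Hit::id).toList();
    }

    public static void main(String[] args) {
        System.out.println(collectPerShard(SHARDS, 2).size()); // 4
        System.out.println(search(SHARDS, 2, 3));              // [a, d, e]
    }
}
```

With two shards and `numCandidates = 2`, four hits are collected in total and the merge step keeps the three nearest. Document `f` is never considered, even though it is nearer than `b`, because it is not among the two nearest on its own shard.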

19 files changed

+1346
-193
lines changed

docs/changelog/98916.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
pr: 98916
2+
summary: Make knn search a query
3+
area: Vector Search
4+
type: feature
5+
issues: []

docs/reference/query-dsl/knn-query.asciidoc

Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
1+
[[query-dsl-knn-query]]
2+
=== Knn query
3+
++++
4+
<titleabbrev>Knn</titleabbrev>
5+
++++
6+
7+
Finds the _k_ nearest vectors to a query vector, as measured by a similarity
8+
metric. The _knn_ query finds the nearest vectors through approximate search on indexed
9+
dense_vectors. The preferred way to do approximate kNN search is through the
10+
<<knn-search,top level knn section>> of a search request. The _knn_ query is reserved for
11+
expert cases, where there is a need to combine this query with other queries.
12+
13+
[[knn-query-ex-request]]
14+
==== Example request
15+
16+
[source,console]
17+
----
18+
PUT my-image-index
19+
{
20+
"mappings": {
21+
"properties": {
22+
"image-vector": {
23+
"type": "dense_vector",
24+
"dims": 3,
25+
"index": true,
26+
"similarity": "l2_norm"
27+
},
28+
"file-type": {
29+
"type": "keyword"
30+
}
31+
}
32+
}
33+
}
34+
----
35+
36+
. Index your data.
37+
+
38+
[source,console]
39+
----
40+
POST my-image-index/_bulk?refresh=true
41+
{ "index": { "_id": "1" } }
42+
{ "image-vector": [1, 5, -20], "file-type": "jpg" }
43+
{ "index": { "_id": "2" } }
44+
{ "image-vector": [42, 8, -15], "file-type": "png" }
45+
{ "index": { "_id": "3" } }
46+
{ "image-vector": [15, 11, 23], "file-type": "jpg" }
47+
----
48+
//TEST[continued]
49+
50+
. Run the search using the `knn` query, asking for the top 3 nearest vectors.
51+
+
52+
[source,console]
53+
----
54+
POST my-image-index/_search
55+
{
56+
"size" : 3,
57+
"query" : {
58+
"knn": {
59+
"field": "image-vector",
60+
"query_vector": [-5, 9, -12],
61+
"num_candidates": 10
62+
}
63+
}
64+
}
65+
----
66+
//TEST[continued]
67+
68+
NOTE: The `knn` query doesn't have a separate `k` parameter. `k` is defined by
69+
the `size` parameter of a search request, similar to other queries. The `knn` query
70+
collects `num_candidates` results from each shard, then merges them to get
71+
the top `size` results.
72+
73+
74+
[[knn-query-top-level-parameters]]
75+
==== Top-level parameters for `knn`
76+
77+
`field`::
78+
+
79+
--
80+
(Required, string) The name of the vector field to search against. Must be a
81+
<<index-vectors-knn-search, `dense_vector` field with indexing enabled>>.
82+
--
83+
84+
`query_vector`::
85+
+
86+
--
87+
(Required, array of floats) Query vector. Must have the same number of dimensions
88+
as the vector field you are searching against.
89+
--
90+
91+
`num_candidates`::
92+
+
93+
--
94+
(Required, integer) The number of nearest neighbor candidates to consider per shard.
95+
Cannot exceed 10,000. {es} collects `num_candidates` results from each shard, then
96+
merges them to find the top results. Increasing `num_candidates` tends to improve the
97+
accuracy of the final results.
98+
--
99+
100+
`filter`::
101+
+
102+
--
103+
(Optional, query object) Query to filter the documents that can match.
104+
The kNN search will return the top documents that also match this filter.
105+
The value can be a single query or a list of queries. If `filter` is not provided,
106+
all documents are allowed to match.
107+
108+
The filter is a pre-filter, meaning that it is applied **during** the approximate
109+
kNN search to ensure that `num_candidates` matching documents are returned.
110+
--
111+
112+
`similarity`::
113+
+
114+
--
115+
(Optional, float) The minimum similarity required for a document to be considered
116+
a match. The similarity value calculated relates to the raw
117+
<<dense-vector-similarity, `similarity`>> used, not the document score. The matched
118+
documents are then scored according to <<dense-vector-similarity, `similarity`>>
119+
and the provided `boost` is applied.
120+
--
121+
122+
`boost`::
123+
+
124+
--
125+
(Optional, float) Floating point number used to multiply the
126+
scores of matched documents. This value cannot be negative. Defaults to `1.0`.
127+
--
128+
129+
`_name`::
130+
+
131+
--
132+
(Optional, string) Name used to identify the query.
133+
--
134+
135+
[[knn-query-filtering]]
136+
==== Pre-filters and post-filters in knn query
137+
138+
There are two ways to filter documents that match a kNN query:
139+
140+
. **pre-filtering** – filter is applied during the approximate kNN search
141+
to ensure that `k` matching documents are returned.
142+
. **post-filtering** – filter is applied after the approximate kNN search
143+
completes, which can result in fewer than `k` results, even when there are enough
144+
matching documents.
145+
146+
Pre-filtering is supported through the `filter` parameter of the `knn` query.
147+
Filters from <<filter-alias,aliases>> are also applied as pre-filters.
148+
149+
All other filters found in the Query DSL tree are applied as post-filters.
150+
For example, the following `knn` query finds the top 3 documents with the nearest vectors
151+
(`num_candidates=3`) and combines them with a `term` filter, which is applied as a
152+
post-filter. The final set of documents contains only the single document
153+
that passes the post-filter.
154+
155+
156+
[source,console]
157+
----
158+
POST my-image-index/_search
159+
{
160+
"size" : 10,
161+
"query" : {
162+
"bool" : {
163+
"must" : {
164+
"knn": {
165+
"field": "image-vector",
166+
"query_vector": [-5, 9, -12],
167+
"num_candidates": 3
168+
}
169+
},
170+
"filter" : {
171+
"term" : { "file-type" : "png" }
172+
}
173+
}
174+
}
175+
}
176+
----
177+
//TEST[continued]
178+
179+
[[knn-query-with-nested-query]]
180+
==== Knn query inside a nested query
181+
182+
`knn` query can be used inside a nested query. The behaviour here is similar
183+
to <<nested-knn-search, top level nested kNN search>>:
184+
185+
* kNN search over nested dense_vectors diversifies the top results over
186+
the top-level document
187+
* `filter` over the top-level document metadata is supported and acts as a
188+
post-filter
189+
* `filter` over `nested` field metadata is not supported
190+
191+
A sample query can look like this:
192+
193+
[source,js]
194+
----
195+
{
196+
"query" : {
197+
"nested" : {
198+
"path" : "paragraph",
199+
"query" : {
200+
"knn": {
201+
"query_vector": [
202+
0.45,
203+
45
204+
],
205+
"field": "paragraph.vector",
206+
"num_candidates": 2
207+
}
208+
}
209+
}
210+
}
211+
}
212+
----
213+
// NOTCONSOLE
214+
215+
[[knn-query-aggregations]]
216+
==== Knn query with aggregations
217+
The `knn` query calculates aggregations on the `num_candidates` results from each shard.
218+
Thus, the aggregations are computed over
219+
`num_candidates * number_of_shards` documents. This is different from
220+
the <<knn-search,top level knn section>> where aggregations are
221+
calculated on the global top k nearest documents.
222+
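
The difference between pre-filtering and post-filtering documented in the file above can be sketched with the three sample documents and the query vector from its examples. This is a minimal illustration, not the implementation in this commit: `KnnFilterSketch` and its members are hypothetical names, and an exact brute-force sort stands in for the approximate kNN search.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
class KnnFilterSketch {
    record Doc(String id, float[] vector, String fileType) {}

    // The three documents and the query vector from the docs example.
    static final List<Doc> DOCS = List.of(
        new Doc("1", new float[] { 1, 5, -20 }, "jpg"),
        new Doc("2", new float[] { 42, 8, -15 }, "png"),
        new Doc("3", new float[] { 15, 11, 23 }, "jpg")
    );
    static final float[] QUERY = { -5, 9, -12 };
    static final Predicate<Doc> PNG_ONLY = d -> d.fileType().equals("png");

    // Squared Euclidean distance; ordering by it matches ordering by l2_norm.
    static double squaredL2(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Exact k nearest documents (stand-in for the approximate graph search).
    static List<Doc> nearest(List<Doc> docs, float[] query, int k) {
        return docs.stream()
            .sorted(Comparator.comparingDouble((Doc d) -> squaredL2(d.vector(), query)))
            .limit(k)
            .toList();
    }

    // Pre-filter: restrict the candidates first, then take the k nearest.
    static List<Doc> preFilter(List<Doc> docs, float[] query, int k, Predicate<Doc> filter) {
        return nearest(docs.stream().filter(filter).toList(), query, k);
    }

    // Post-filter: take the k nearest first, then drop the ones that do not match.
    static List<Doc> postFilter(List<Doc> docs, float[] query, int k, Predicate<Doc> filter) {
        return nearest(docs, query, k).stream().filter(filter).toList();
    }

    static List<String> ids(List<Doc> docs) {
        return docs.stream().map(Doc::id).toList();
    }

    public static void main(String[] args) {
        System.out.println(ids(preFilter(DOCS, QUERY, 2, PNG_ONLY)));  // [2]
        System.out.println(ids(postFilter(DOCS, QUERY, 2, PNG_ONLY))); // []
        System.out.println(ids(postFilter(DOCS, QUERY, 3, PNG_ONLY))); // [2]
    }
}
```

With `k = 2` the post-filter returns nothing, because the two nearest documents are both `jpg`, while the pre-filter still finds the `png` document. With `k = 3` all three documents are candidates, so the post-filter keeps the single `png` document, which matches the documented example.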

docs/reference/query-dsl/special-queries.asciidoc

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,10 @@ or collection of documents.
1717
This query finds queries that are stored as documents that match with
1818
the specified document.
1919

20+
<<query-dsl-knn-query,`knn` query>>::
21+
A query that finds the _k_ nearest vectors to a query
22+
vector, as measured by a similarity metric.
23+
2024
<<query-dsl-rank-feature-query,`rank_feature` query>>::
2125
A query that computes scores based on the values of numeric features and is
2226
able to efficiently skip non-competitive hits.
@@ -43,6 +47,8 @@ include::mlt-query.asciidoc[]
4347

4448
include::percolate-query.asciidoc[]
4549

50+
include::knn-query.asciidoc[]
51+
4652
include::rank-feature-query.asciidoc[]
4753

4854
include::script-query.asciidoc[]

docs/reference/search/search-your-data/knn-search.asciidoc

Lines changed: 3 additions & 2 deletions
@@ -43,7 +43,7 @@ based on a similarity metric, the better its match.
4343
{es} supports two methods for kNN search:
4444

4545
* <<approximate-knn,Approximate kNN>> using the `knn` search
46-
option
46+
option or `knn` query
4747

4848
* <<exact-knn,Exact, brute-force kNN>> using a `script_score` query with a
4949
vector function
@@ -129,7 +129,8 @@ POST image-index/_bulk?refresh=true
129129
//TEST[continued]
130130
//TEST[s/\.\.\.//]
131131

132-
. Run the search using the <<search-api-knn, `knn` option>>.
132+
. Run the search using the <<search-api-knn, `knn` option>> or the
133+
<<query-dsl-knn-query,`knn` query>> (expert case).
133134
+
134135
[source,console]
135136
----

modules/percolator/src/internalClusterTest/java/org/elasticsearch/percolator/PercolatorQuerySearchIT.java

Lines changed: 33 additions & 0 deletions
@@ -9,6 +9,7 @@
99

1010
import org.apache.lucene.search.join.ScoreMode;
1111
import org.elasticsearch.ElasticsearchException;
12+
import org.elasticsearch.action.index.IndexRequestBuilder;
1213
import org.elasticsearch.action.search.MultiSearchResponse;
1314
import org.elasticsearch.action.search.SearchResponse;
1415
import org.elasticsearch.action.support.WriteRequest;
@@ -22,10 +23,12 @@
2223
import org.elasticsearch.index.query.MatchPhraseQueryBuilder;
2324
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
2425
import org.elasticsearch.index.query.Operator;
26+
import org.elasticsearch.index.query.QueryBuilder;
2527
import org.elasticsearch.index.query.QueryBuilders;
2628
import org.elasticsearch.plugins.Plugin;
2729
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
2830
import org.elasticsearch.search.sort.SortOrder;
31+
import org.elasticsearch.search.vectors.KnnVectorQueryBuilder;
2932
import org.elasticsearch.test.ESIntegTestCase;
3033
import org.elasticsearch.xcontent.XContentBuilder;
3134
import org.elasticsearch.xcontent.XContentFactory;
@@ -1295,4 +1298,34 @@ public void testWithWildcardFieldNames() throws Exception {
12951298
).get();
12961299
assertEquals(1, response.getHits().getTotalHits().value);
12971300
}
1301+
1302+
public void testKnnQueryNotSupportedInPercolator() throws IOException {
1303+
String mappings = org.elasticsearch.common.Strings.format("""
1304+
{
1305+
"properties": {
1306+
"my_query" : {
1307+
"type" : "percolator"
1308+
},
1309+
"my_vector" : {
1310+
"type" : "dense_vector",
1311+
"dims" : 5,
1312+
"index" : true,
1313+
"similarity" : "l2_norm"
1314+
}
1315+
1316+
}
1317+
}
1318+
""");
1319+
indicesAdmin().prepareCreate("index1").setMapping(mappings).get();
1320+
ensureGreen();
1321+
QueryBuilder knnVectorQueryBuilder = new KnnVectorQueryBuilder("my_vector", new float[] { 1, 1, 1, 1, 1 }, 10, null);
1322+
1323+
IndexRequestBuilder indexRequestBuilder = client().prepareIndex("index1")
1324+
.setId("knn_query1")
1325+
.setSource(jsonBuilder().startObject().field("my_query", knnVectorQueryBuilder).endObject());
1326+
1327+
DocumentParsingException exception = expectThrows(DocumentParsingException.class, () -> indexRequestBuilder.get());
1328+
assertThat(exception.getMessage(), containsString("the [knn] query is unsupported inside a percolator"));
1329+
}
1330+
12981331
}

modules/percolator/src/main/java/org/elasticsearch/percolator/PercolatorFieldMapper.java

Lines changed: 3 additions & 0 deletions
@@ -61,6 +61,7 @@
6161
import org.elasticsearch.index.query.QueryShardException;
6262
import org.elasticsearch.index.query.Rewriteable;
6363
import org.elasticsearch.index.query.SearchExecutionContext;
64+
import org.elasticsearch.search.vectors.KnnVectorQueryBuilder;
6465
import org.elasticsearch.xcontent.XContentParser;
6566

6667
import java.io.ByteArrayOutputStream;
@@ -438,6 +439,8 @@ static QueryBuilder parseQueryBuilder(DocumentParserContext context) {
438439
throw new IllegalArgumentException("the [has_child] query is unsupported inside a percolator query");
439440
} else if (queryName.equals("has_parent")) {
440441
throw new IllegalArgumentException("the [has_parent] query is unsupported inside a percolator query");
442+
} else if (queryName.equals(KnnVectorQueryBuilder.NAME)) {
443+
throw new IllegalArgumentException("the [knn] query is unsupported inside a percolator query");
441444
}
442445
});
443446
} catch (IOException e) {
