-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Make flattened synthetic source concatenate object keys on scalar/object mismatch #129600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Make flattened synthetic source concatenate object keys on scalar/object mismatch #129600
Conversation
.../main/java/org/elasticsearch/index/mapper/flattened/FlattenedFieldSyntheticWriterHelper.java
Outdated
Show resolved
Hide resolved
.../main/java/org/elasticsearch/index/mapper/flattened/FlattenedFieldSyntheticWriterHelper.java
Outdated
Show resolved
Hide resolved
next = nextValue == null ? KeyValue.EMPTY : new KeyValue(nextValue); | ||
|
||
var startPrefix = curr.prefix.diff(openObjects); | ||
if (startPrefix.prefix.isEmpty() == false && startPrefix.prefix.getFirst().equals(lastScalarSingleLeaf)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the conflict doesn't happen on the first part of the prefix, e.g.
field {
path {
to: 10
to {
foo: bar
}
}
}
Would this be caught here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, so this will become:
field {
path {
to: 10
to.foo: bar
}
}
When it get to the first key/value field.path.to|10
it will take the else block and traverse down into the object, adding field
and path
to the openObject
context. When it reaches the key value field.path.to.foo|bar
that object will still be open, and seeing that lastScalarSingleLeaf
has a value of to
, and that to
is the first token in the startPrexix (to.foo
), it will make a concatenated path.
(Updated a test to this situation to verify it)
.../main/java/org/elasticsearch/index/mapper/flattened/FlattenedFieldSyntheticWriterHelper.java
Outdated
Show resolved
Hide resolved
@@ -103,7 +103,7 @@ public void testSingleObject() throws IOException { | |||
|
|||
// THEN | |||
assertEquals( | |||
"{\"a\":\"value_a\",\"a\":{\"b\":\"value_b\",\"b\":{\"c\":\"value_c\"},\"d\":\"value_d\"}}", | |||
"{\"a\":\"value_a\",\"a.b\":\"value_b\",\"a.b.c\":\"value_c\",\"a.d\":\"value_d\"}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, there was a test which had a scalar/object mismatch. But it produced duplicate keys. When the xcontent was converted to jsont these duplicate keys originally threw an error, but now just drop the duplicates. (Something must have changed in xcontent stuff since the issue was opened to cause this change from an error to deduplication)
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Hi @parkertimmins, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document this? Otherwise LGTM
.../main/java/org/elasticsearch/index/mapper/flattened/FlattenedFieldSyntheticWriterHelper.java
Outdated
Show resolved
Hide resolved
.../main/java/org/elasticsearch/index/mapper/flattened/FlattenedFieldSyntheticWriterHelper.java
Outdated
Show resolved
Hide resolved
// In the open object, there is a leaf with a scalar value, which is also the first | ||
// part of the current path. Instead of traversing down into the path and building objects, | ||
// combine the path into a single leaf and add it as a field. | ||
if (curr.pathEquals(next) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what this if
does to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if
is where we add fields to the output. Specifically, fields that need to be specified with the concatenated path due to a scalar/object mismatch. The similar if
in the else block below serves the same purpose. The if
is necessary to handle multiple values for the same field. For example if we have the following key/value pairs (where I'm using |
instead of \0
to separate key and value):
foo|5
foo.bar|10
foo.bar|20
baz|99
Assume we've already written out foo|5
. When we go through the loop for foo.bar|10
, we'll add 10 to the values array (which will be empty) at the beginning of the loop. But when we get to this if
statement, we'll see that the next pair (foo.bar|20
) has the same path foo.bar
. Because of this we won't enter the if
and won't write out the field yet. In the next loop iteration, curr is now foo.bar|20
. We'll add 20 to values, making it [10, 20]. When we get to this if
statement, we see that next path is baz
. Since this not equal to foo.bar
we know that we have seen all values for foo.bar
. So we enter the if statement and write out all values for the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good example, consider adding it to the comment above, or splitting some of that before the first branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @parkertimmins, i think it's worth a comment too. Alternatively it sounds like we could restructure this to first read all the values for a field (move the if into a separate loop above this one) and then write them but this is not critical of course.
🔍 Preview links for changed docs: 🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes. |
`"bar"` has both a scalar value `"10"`, and an object value of `{ "baz": "20" }`. | ||
|
||
With synthetic source, objects with such duplicate fields will appear | ||
differently in `_source`. For example, if the above field has synthetic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More accurately: if the field is defined in an index configured with synthetic source.
This is because, when constructing the source, the key `"foo.bar"` is eagerly expanding into the key `"foo"` with an object value. | ||
Then `"bar"` is added to the object with the scalar value of `"10"`. Then, because another | ||
`"bar"` cannot be added with an object value, `"bar"` and `"baz"` are collapsed | ||
in the flat key `"bar.baz"` with a scalar value of `"20"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably skip this paragraph, and maybe mention above:
... to produce a valid JSON output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, let's not go into the details
var syntheticSource = syntheticSource(mapper, b -> { | ||
b.startObject("field"); | ||
{ | ||
b.field("key1.key2", "foo"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add also: `b.field("key1", "baz");
final FlattenedFieldSyntheticWriterHelper writer = new FlattenedFieldSyntheticWriterHelper(new SortedSetSortedKeyedValues(dv)); | ||
final ByteArrayOutputStream baos = new ByteArrayOutputStream(); | ||
final XContentBuilder builder = new XContentBuilder(XContentType.JSON.xContent(), baos); | ||
final List<byte[]> bytes = List.of("a.b.c" + '\0' + "10", "a.b.c.d" + '\0' + "20") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add something for a
and a.b
.
}); | ||
assertThat(syntheticSource, equalTo(""" | ||
{"field":{"key1":{"key2":"foo","key2.key3":"bar"}}}""")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we also add an example with arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, some minor comments to address but otherwise it looks good.
@@ -364,3 +364,42 @@ Will become (note the nested objects instead of the "flattened" array): | |||
} | |||
} | |||
``` | |||
|
|||
Flattened fields allow for a duplicate key to contain both an object and a scalar value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flattened fields allow for a duplicate key to contain both an object and a scalar value. | |
Flattened fields allow for a key to contain both an object and a scalar value. |
This is because, when constructing the source, the key `"foo.bar"` is eagerly expanding into the key `"foo"` with an object value. | ||
Then `"bar"` is added to the object with the scalar value of `"10"`. Then, because another | ||
`"bar"` cannot be added with an object value, `"bar"` and `"baz"` are collapsed | ||
in the flat key `"bar.baz"` with a scalar value of `"20"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, let's not go into the details
There is an issue where for Flattened fields with synthetic source, if there is a key with a scalar value, and a duplicate key with an object value, one of the values will be left out of the produced synthetic source.
This fixes the issue by replacing the problematic object with paths to each of its keys. These paths consist of the concatenation of all keys going down to a given scalar, joined by
.
. For example, they are of the formfoo.bar.baz
. This applies recursively, so that every value within the object, no matter how nested, will be accessible through a full specified path.For example if the following flattened field values is indexed:
The following synthetic source will be produced:
Fixes #122936