Add support for dots in field names for metrics usecases #86166

Merged
merged 33 commits into from
May 17, 2022
Changes from 15 commits
e4c9458
Add support for dots in field names for metrics usecases
javanna Apr 25, 2022
5d34e0f
Merge branch 'main' into enhancement/collapsed_objects
javanna Apr 26, 2022
af1d3fa
checkstyle
javanna Apr 26, 2022
64cb72b
iter
javanna Apr 26, 2022
b96ae79
add test
javanna Apr 26, 2022
8d47bfe
spotless
javanna Apr 26, 2022
c6c3b2c
Merge branch 'main' into enhancement/collapsed_objects
javanna Apr 26, 2022
f825e00
more tests
javanna Apr 26, 2022
7d12ca7
spotless
javanna Apr 26, 2022
227fe53
iter
javanna Apr 26, 2022
6175fc2
iter
javanna Apr 27, 2022
9761211
spotless
javanna Apr 28, 2022
33d570f
docs
javanna Apr 28, 2022
5034bc1
Update docs/changelog/86166.yaml
javanna Apr 28, 2022
4ebdd8f
Update docs/changelog/86166.yaml
javanna Apr 28, 2022
7a6143d
Update docs/changelog/86166.yaml
javanna Apr 28, 2022
3acbdfe
iter
javanna Apr 28, 2022
f48a4a1
add array of objects tests
javanna Apr 28, 2022
eab7d83
typo
javanna Apr 28, 2022
4ddb176
rename
javanna Apr 28, 2022
a5b5f9c
line length
javanna Apr 28, 2022
4df6111
changelog
javanna Apr 28, 2022
1430aff
rename leftover
javanna Apr 28, 2022
17bc4dd
spotless
javanna Apr 28, 2022
e5b0e20
Merge branch 'main' into enhancement/collapsed_objects
javanna May 16, 2022
ef89e05
iter
javanna May 16, 2022
df40355
changelog
javanna May 16, 2022
9e48d1b
iter
javanna May 16, 2022
cd3a6e5
iter
javanna May 16, 2022
cc7b420
add yaml test
javanna May 16, 2022
1ebb167
add test for synthetic source
javanna May 17, 2022
153a150
add another test for synthetic source
javanna May 17, 2022
6db29eb
update changelog
javanna May 17, 2022
6 changes: 6 additions & 0 deletions docs/changelog/86166.yaml
@@ -0,0 +1,6 @@
pr: 86166
summary: Add support for dots in field names for metrics usecases
area: Mapping
type: feature
issues:
- 63530
3 changes: 3 additions & 0 deletions docs/reference/mapping/params.asciidoc
@@ -10,6 +10,7 @@ The following mapping parameters are common to some or all field data types:
* <<analyzer,`analyzer`>>
* <<coerce,`coerce`>>
* <<copy-to,`copy_to`>>
* <<collapsed,`collapsed`>>
* <<doc-values,`doc_values`>>
* <<dynamic,`dynamic`>>
* <<eager-global-ordinals,`eager_global_ordinals`>>
@@ -41,6 +42,8 @@ include::params/coerce.asciidoc[]

include::params/copy-to.asciidoc[]

include::params/collapsed.asciidoc[]

include::params/doc-values.asciidoc[]

include::params/dynamic.asciidoc[]
102 changes: 102 additions & 0 deletions docs/reference/mapping/params/collapsed.asciidoc
@@ -0,0 +1,102 @@
[[collapsed]]
=== `collapsed`

When indexing a document or updating mappings, Elasticsearch accepts fields that contain dots in their names,
which get expanded to their corresponding object structure. For instance, the field `metrics.time.max`
is mapped as a `max` leaf field with a parent `time` object, belonging to its parent `metrics` object.

This default behaviour is reasonable for most scenarios, but causes problems when, for instance,
a field `metrics.time` holds a value too, which is common when indexing metrics data.
A document holding a value for both `metrics.time.max` and `metrics.time` gets rejected, because `time`
would need to be both a leaf field that holds a value and an object that holds the `max` sub-field.

The `collapsed` setting, which can be applied only to the top-level mapping definition and
to <<object,`object`>> fields, makes it possible to store metrics where field names contain dots and share
common prefixes. From the example above, if the object container `metrics` is collapsed, it can hold values
for both `time` and `time.max` directly without the need for any intermediate object, as dots in field
names are preserved.

[source,console]
--------------------------------------------------
PUT my-index-000001
{
"mappings": {
"properties": {
"metrics": {
"type": "object",
"collapsed": true <1>
}
}
}
}

PUT my-index-000001/_doc/metric_1
{
"metrics.time" : 100, <2>
"metrics.time.min" : 10,
"metrics.time.max" : 900
}

PUT my-index-000001/_doc/metric_2
{
"metrics" : {
"time" : 100, <3>
"time.min" : 10,
"time.max" : 900
}
}

GET my-index-000001/_mapping
--------------------------------------------------

[source,console-result]
--------------------------------------------------
{
"metrics" : {
"type" : "object",
"collapsed" : true,
"properties" : {
"time" : {
"type" : "long"
},
"time.min" : { <4>
"type" : "long"
},
"time.max" : {
"type" : "long"
}
}
}
}
--------------------------------------------------

<1> The `metrics` field is collapsed.
<2> Sample document holding flat paths.
<3> Sample document holding an object (mapped as collapsed) and its leaf sub-fields.
<4> The resulting mapping, where dots in field names are preserved.

The entire mapping may be collapsed as well, in which case the document can
only ever hold leaf sub-fields:

[source,console]
--------------------------------------------------
PUT my-index-000001
{
"mappings": {
"collapsed": true <1>
}
}

PUT my-index-000001/_doc/metric_1
{
"time" : "100ms", <2>
"time.min" : "10ms",
"time.max" : "900ms"
}

--------------------------------------------------

<1> The entire mapping is collapsed.
<2> The document cannot hold objects, only leaf fields.

The `collapsed` setting for existing fields and the top-level mapping definition cannot be updated.
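
As a rough illustration of the behaviour described in these docs, the sketch below resolves a dotted field name into path elements and stops expanding once it reaches a collapsed object. `CollapsedPathSketch`, `resolve`, and the use of `""` to stand for the root mapping are hypothetical names chosen for this example and are not part of Elasticsearch; the actual logic lives in the parser changes further down in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CollapsedPathSketch {

    /**
     * Splits a dotted field name into path elements. Expansion stops as soon as the
     * prefix built so far names a collapsed object; the remainder is kept as one leaf
     * name with its dots preserved. The root mapping is represented by "" (assumption).
     */
    static List<String> resolve(String fieldName, Set<String> collapsedObjects) {
        List<String> elements = new ArrayList<>();
        String prefix = "";
        String rest = fieldName;
        while (true) {
            int dot = rest.indexOf('.');
            if (collapsedObjects.contains(prefix) || dot < 0) {
                elements.add(rest);
                return elements;
            }
            String head = rest.substring(0, dot);
            elements.add(head);
            prefix = prefix.isEmpty() ? head : prefix + "." + head;
            rest = rest.substring(dot + 1);
        }
    }

    public static void main(String[] args) {
        // Default behaviour: every dot introduces an intermediate object.
        System.out.println(resolve("metrics.time.max", Set.of()));          // [metrics, time, max]
        // `metrics` is collapsed: `time.max` stays a single leaf.
        System.out.println(resolve("metrics.time.max", Set.of("metrics"))); // [metrics, time.max]
        // The whole mapping is collapsed: nothing is expanded.
        System.out.println(resolve("time.max", Set.of("")));                // [time.max]
    }
}
```

With `metrics` collapsed, `metrics.time` and `metrics.time.max` resolve to two sibling leaves (`time` and `time.max`), which is why the conflict described above no longer arises.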
6 changes: 6 additions & 0 deletions docs/reference/mapping/types/object.asciidoc
@@ -90,6 +90,12 @@ The following parameters are accepted by `object` fields:
Whether the JSON value given for the object field should be
parsed and indexed (`true`, default) or completely ignored (`false`).

<<collapsed,`collapsed`>>::

Whether sub-fields with dots in their names should be treated as leaves (`true`),
or their prefix should be expanded to their corresponding object structure (`false`, default).
A collapsed object can only ever hold leaf sub-fields and does not support further objects.

<<properties,`properties`>>::

The fields within the object, which can be of any
@@ -20,6 +20,8 @@ public final class ContentPath {

private String[] path = new String[10];

private boolean withinCollapsedPath = false;

public ContentPath() {
this(0);
}
@@ -54,6 +56,14 @@ public void remove() {
path[index--] = null;
}

public void setWithinCollapsedPath(boolean withinCollapsedPath) {
this.withinCollapsedPath = withinCollapsedPath;
}

public boolean isWithinCollapsedPath() {
return withinCollapsedPath;
}

public String pathAsText(String name) {
sb.setLength(0);
for (int i = offset; i < index; i++) {
@@ -445,7 +445,13 @@ private static void parseObject(final DocumentParserContext context, ObjectMappe
Mapper objectMapper = getMapper(context, mapper, currentFieldName);
if (objectMapper != null) {
context.path().add(currentFieldName);
if (objectMapper instanceof ObjectMapper objMapper) {
if (objMapper.isCollapsed()) {
context.path().setWithinCollapsedPath(true);
}
}
parseObjectOrField(context, objectMapper);
context.path().setWithinCollapsedPath(false);
context.path().remove();
} else {
parseObjectDynamic(context, mapper, currentFieldName);
@@ -474,7 +480,13 @@ private static void parseObjectDynamic(DocumentParserContext context, ObjectMapp
throwOnCreateDynamicNestedViaCopyTo(dynamicObjectMapper);
}
context.path().add(currentFieldName);
if (dynamicObjectMapper instanceof ObjectMapper objectMapper) {
if (objectMapper.isCollapsed()) {
context.path().setWithinCollapsedPath(true);
}
}
parseObjectOrField(context, dynamicObjectMapper);
context.path().setWithinCollapsedPath(false);
context.path().remove();
}
}
@@ -789,7 +801,7 @@ protected String contentType() {

private static class NoOpObjectMapper extends ObjectMapper {
NoOpObjectMapper(String name, String fullPath) {
super(name, fullPath, Explicit.IMPLICIT_TRUE, Dynamic.RUNTIME, Collections.emptyMap());
super(name, fullPath, Explicit.IMPLICIT_TRUE, Explicit.IMPLICIT_FALSE, Dynamic.RUNTIME, Collections.emptyMap());
}
}

@@ -815,7 +827,11 @@ private static class InternalDocumentParserContext extends DocumentParserContext
XContentParser parser
) throws IOException {
super(mappingLookup, indexSettings, indexAnalyzers, parserContext, source);
this.parser = DotExpandingXContentParser.expandDots(parser);
if (mappingLookup.getMapping().getRoot().isCollapsed()) {
this.parser = parser;
} else {
this.parser = DotExpandingXContentParser.expandDots(parser, this.path::isWithinCollapsedPath);
}
this.document = new LuceneDocument();
this.documents.add(document);
this.maxAllowedNumNestedDocs = indexSettings().getMappingNestedDocsLimit();
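
The parser is handed `this.path::isWithinCollapsedPath` rather than a plain boolean, presumably so that it sees the flag's current value each time it meets a field name instead of the value at construction time. A minimal sketch of that pattern follows; `PathState` and `FieldNameSplitter` are hypothetical stand-ins for `ContentPath` and the dot-expanding parser, reduced to the one flag involved.

```java
import java.util.function.BooleanSupplier;

public class LazyFlagSketch {

    /** Hypothetical stand-in for ContentPath, reduced to the flag this PR adds. */
    static final class PathState {
        private boolean withinCollapsedPath = false;

        void setWithinCollapsedPath(boolean withinCollapsedPath) {
            this.withinCollapsedPath = withinCollapsedPath;
        }

        boolean isWithinCollapsedPath() {
            return withinCollapsedPath;
        }
    }

    /** Hypothetical stand-in for the parser: the flag is read per field name, not once at construction. */
    static final class FieldNameSplitter {
        private final BooleanSupplier isWithinCollapsedPath;

        FieldNameSplitter(BooleanSupplier isWithinCollapsedPath) {
            this.isWithinCollapsedPath = isWithinCollapsedPath;
        }

        String[] split(String fieldName) {
            if (isWithinCollapsedPath.getAsBoolean()) {
                // Inside a collapsed object: keep the dotted name as a single leaf.
                return new String[] { fieldName };
            }
            return fieldName.split("\\.");
        }
    }

    public static void main(String[] args) {
        PathState path = new PathState();
        FieldNameSplitter splitter = new FieldNameSplitter(path::isWithinCollapsedPath);

        System.out.println(splitter.split("time.max").length); // 2: expanded
        path.setWithinCollapsedPath(true);                     // entering a collapsed object
        System.out.println(splitter.split("time.max").length); // 1: dots preserved
        path.setWithinCollapsedPath(false);                    // leaving it again
        System.out.println(splitter.split("time.max").length); // 2: expanded
    }
}
```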
@@ -314,7 +314,7 @@ public LuceneDocument doc() {
*/
public final DocumentParserContext createCopyToContext(String copyToField, LuceneDocument doc) throws IOException {
ContentPath path = new ContentPath(0);
XContentParser parser = DotExpandingXContentParser.expandDots(new CopyToParser(copyToField, parser()));
XContentParser parser = DotExpandingXContentParser.expandDots(new CopyToParser(copyToField, parser()), path::isWithinCollapsedPath);
return new Wrapper(this) {
@Override
public ContentPath path() {
@@ -20,6 +20,7 @@
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/**
@@ -35,9 +36,11 @@ class DotExpandingXContentParser extends FilterXContentParserWrapper {

private static final class WrappingParser extends FilterXContentParser {

private final BooleanSupplier isWithinCollapsedPath;
final Deque<XContentParser> parsers = new ArrayDeque<>();

WrappingParser(XContentParser in) throws IOException {
WrappingParser(XContentParser in, BooleanSupplier isWithinCollapsedPath) throws IOException {
this.isWithinCollapsedPath = isWithinCollapsedPath;
parsers.push(in);
if (in.currentToken() == Token.FIELD_NAME) {
expandDots();
@@ -61,6 +64,11 @@ public Token nextToken() throws IOException {
}

private void expandDots() throws IOException {
// this handles fields that belong to collapsed objects, where the document contains the object which holds the flat fields
// e.g. { "metrics.service": { "time.max" : 10 } } with service being collapsed
if (isWithinCollapsedPath.getAsBoolean()) {
return;
}
String field = delegate().currentName();
String[] subpaths = splitAndValidatePath(field);
if (subpaths.length == 0) {
@@ -76,11 +84,13 @@ private void expandDots() throws IOException {
XContentLocation location = delegate().getTokenLocation();
Token token = delegate().nextToken();
if (token == Token.START_OBJECT || token == Token.START_ARRAY) {
parsers.push(new DotExpandingXContentParser(new XContentSubParser(delegate()), subpaths, location));
parsers.push(new DotExpandingXContentParser(new XContentSubParser(delegate()), subpaths, location, isWithinCollapsedPath));
} else if (token == Token.END_OBJECT || token == Token.END_ARRAY) {
throw new IllegalStateException("Expecting START_OBJECT or START_ARRAY or VALUE but got [" + token + "]");
} else {
parsers.push(new DotExpandingXContentParser(new SingletonValueXContentParser(delegate()), subpaths, location));
parsers.push(
new DotExpandingXContentParser(new SingletonValueXContentParser(delegate()), subpaths, location, isWithinCollapsedPath)
);
}
}

@@ -121,25 +131,26 @@ public List<Object> listOrderedMap() throws IOException {
}
}

private static String[] splitAndValidatePath(String fullFieldPath) {
if (fullFieldPath.isEmpty()) {
private static String[] splitAndValidatePath(String fieldName) {
if (fieldName.isEmpty()) {
throw new IllegalArgumentException("field name cannot be an empty string");
}
if (fullFieldPath.contains(".") == false) {
return new String[] { fullFieldPath };
if (fieldName.contains(".") == false) {
return new String[] { fieldName };
}
String[] parts = fullFieldPath.split("\\.");
String[] parts = fieldName.split("\\.");
if (parts.length == 0) {
throw new IllegalArgumentException("field name cannot contain only dots");
}

for (String part : parts) {
// check if the field name contains only whitespace
if (part.isEmpty()) {
throw new IllegalArgumentException("field name cannot contain only whitespace: ['" + fullFieldPath + "']");
throw new IllegalArgumentException("field name cannot contain only whitespace: ['" + fieldName + "']");
}
if (part.isBlank()) {
throw new IllegalArgumentException(
"field name starting or ending with a [.] makes object resolution ambiguous: [" + fullFieldPath + "]"
"field name starting or ending with a [.] makes object resolution ambiguous: [" + fieldName + "]"
);
}
}
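
The checks in `splitAndValidatePath` lean on how `String.split` treats empty parts, which is easy to misremember. The standalone sketch below only demonstrates that standard-library behaviour; `SplitBehaviourSketch` and `parts` are hypothetical names, and nothing here is taken from the Elasticsearch code beyond the `"\\."` pattern.

```java
import java.util.Arrays;

public class SplitBehaviourSketch {

    /** Same split call as in the hunk above: a literal dot as the delimiter. */
    static String[] parts(String fieldName) {
        return fieldName.split("\\.");
    }

    public static void main(String[] args) {
        // Only dots: every part is empty and trailing empties are dropped, so the array is empty.
        System.out.println(parts("...").length);               // 0
        // Leading dot: the empty leading part is kept.
        System.out.println(parts(".a")[0].isEmpty());          // true
        // Consecutive dots: an empty part appears in the middle.
        System.out.println(Arrays.asList(parts("a..b")));
        // Trailing dot: the empty trailing part is dropped, so only one part remains.
        System.out.println(parts("a.").length);                // 1
        // Whitespace-only part: not empty, but blank.
        System.out.println(parts("a. .b")[1].isBlank());       // true
    }
}
```

Because trailing empty strings are removed by `split`, a loop over `parts` on its own does not see a trailing dot; whether that case is handled elsewhere is not visible in this hunk.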
@@ -151,8 +162,8 @@ private static String[] splitAndValidatePath(String fullFieldPath) {
* @param in the parser to wrap
* @return the wrapped XContentParser
*/
static XContentParser expandDots(XContentParser in) throws IOException {
return new WrappingParser(in);
static XContentParser expandDots(XContentParser in, BooleanSupplier isWithinCollapsedPath) throws IOException {
return new WrappingParser(in, isWithinCollapsedPath);
}

private enum State {
@@ -161,17 +172,24 @@ private enum State {
ENDING_EXPANDED_OBJECT
}

final String[] subPaths;
private final BooleanSupplier isWithinCollapsedPath;

private String[] subPaths;
private XContentLocation currentLocation;
private int expandedTokens = 0;
private int innerLevel = -1;
private State state = State.EXPANDING_START_OBJECT;

private DotExpandingXContentParser(XContentParser subparser, String[] subPaths, XContentLocation startLocation) {
private DotExpandingXContentParser(
XContentParser subparser,
String[] subPaths,
XContentLocation startLocation,
BooleanSupplier isWithinCollapsedPath
) {
super(subparser);
this.subPaths = subPaths;
this.currentLocation = startLocation;
this.isWithinCollapsedPath = isWithinCollapsedPath;
}

@Override
@@ -189,6 +207,25 @@ public Token nextToken() throws IOException {
}
// The expansion consists of adding pairs of START_OBJECT and FIELD_NAME tokens
if (expandedTokens % 2 == 0) {
int currentIndex = expandedTokens / 2;
// if there's more than one element left to expand and the parent is collapsed, we rewrite the array
// e.g. metrics.service.time.max -> ["metrics", "service", "time.max"]
if (currentIndex < subPaths.length - 1 && isWithinCollapsedPath.getAsBoolean()) {
String[] newSubPaths = new String[currentIndex + 1];
StringBuilder collapsedPath = new StringBuilder();
for (int i = 0; i < subPaths.length; i++) {
if (i < currentIndex) {
newSubPaths[i] = subPaths[i];
} else {
collapsedPath.append(subPaths[i]);
if (i < subPaths.length - 1) {
collapsedPath.append(".");
}
}
}
newSubPaths[currentIndex] = collapsedPath.toString();
subPaths = newSubPaths;
}
return Token.FIELD_NAME;
}
return Token.START_OBJECT;
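
The array rewrite in the hunk above can be read in isolation as: keep the elements before `currentIndex` and join everything from `currentIndex` onwards back into one dotted name. A minimal sketch of that step, using a hypothetical helper `collapseFrom` that is not part of the PR:

```java
import java.util.Arrays;

public class SubPathRewriteSketch {

    /**
     * Keeps subPaths[0..currentIndex) unchanged and joins subPaths[currentIndex..] with dots
     * into a single trailing element.
     */
    static String[] collapseFrom(String[] subPaths, int currentIndex) {
        String[] rewritten = Arrays.copyOf(subPaths, currentIndex + 1);
        rewritten[currentIndex] = String.join(".", Arrays.copyOfRange(subPaths, currentIndex, subPaths.length));
        return rewritten;
    }

    public static void main(String[] args) {
        // Mirrors the example in the code comment: metrics.service.time.max with `service` collapsed.
        String[] subPaths = { "metrics", "service", "time", "max" };
        System.out.println(Arrays.toString(collapseFrom(subPaths, 2))); // [metrics, service, time.max]
    }
}
```

The rewritten array is shorter than the original, which may be why the `expandedTokens == subPaths.length * 2 - 1` assertion in `currentName()` is commented out in this revision of the diff.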
@@ -232,7 +269,7 @@ public Token currentToken() {
@Override
public String currentName() throws IOException {
if (state == State.PARSING_ORIGINAL_CONTENT) {
assert expandedTokens == subPaths.length * 2 - 1;
// assert expandedTokens == subPaths.length * 2 - 1;
// whenever we are parsing some inner object/array we can easily delegate to the inner parser
// e.g. field.with.dots: { obj:{ parsing here } }
if (innerLevel > 0) {
@@ -92,7 +92,7 @@ protected static void parseNested(String name, Map<String, Object> node, NestedO
private final Query nestedTypeFilter;

NestedObjectMapper(String name, String fullPath, Map<String, Mapper> mappers, Builder builder) {
super(name, fullPath, builder.enabled, builder.dynamic, mappers);
super(name, fullPath, builder.enabled, Explicit.IMPLICIT_FALSE, builder.dynamic, mappers);
if (builder.indexCreatedVersion.before(Version.V_8_0_0)) {
this.nestedTypePath = "__" + fullPath;
} else {