Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement aggregate_metric field mapper #49830

Merged
merged 31 commits into from
Mar 31, 2020

Conversation

csoulios
Copy link
Contributor

@csoulios csoulios commented Dec 4, 2019

This PR addresses part of #42720 creating a field mapper for storing pre-computed aggregate metrics such as min, max, sum and value_count

@csoulios csoulios added :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data WIP labels Dec 4, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Rollup)

@csoulios csoulios added the v8.0.0 label Dec 4, 2019
@csoulios csoulios requested review from jimczi and iverase December 4, 2019 14:02
Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @csoulios.

I think the parse logic should be on the parse method not the parseCreateField method. The latter is used (I think) for dynamic mapping and I don't think this field allow that.

I am not sure the leniency in the parse logic. I would like to confirm it but if there is an error on one of the metrics (does not exist or have an incorrect value) we should stop generating the doc value if ignore malformed is true.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @csoulios . I think you can use a dedicated module for this field mapper (see mapper-flattened), no need to put it in the rollup module.
I left the other comments in the pr.

Copy link
Contributor

@polyfractal polyfractal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments! :)

Open question for discussion: since this is isolated functionality at the moment (can configure but not actually use the fields yet), should we potentially merge to a feature branch instead? Not sure of our plans to backport to 7x, but it would make a potential backport (or revert) easier too.

"] must contain all metrics " + metrics.value().toString());
}

if (fieldType().hasDocValues()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should override setHasDocValues() on the MappedFieldType so that it throws an exception if the user tries to disable doc values? Since we don't index anything and only store doc values, seems like we shouldn't let the user disable them?

Not sure, WDYT?

@csoulios
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/2

Make parsing of malformed values stricter and
ignore all field metrics even if a single metric
is wrong
@csoulios
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/default-distro

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good @csoulios, I left some comments regarding the tests. Can you also create a branch to merge this pr (can be in your fork) since we said that we'll need metrics aggregation support to publish in master ?

builder.store(false);
builder.index(true);
builder.docValues(true);
builder.fieldType().setDocValuesType(DocValuesType.NUMERIC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these options are set by default in the NumberFieldMapper.Builder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

public Query rangeQuery(Object lowerTerm, Object upperTerm, boolean includeLower, boolean includeUpper, QueryShardContext context) {
return delegateFieldType().rangeQuery(lowerTerm, upperTerm, includeLower, includeUpper, context);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should also override isFieldWithinQuery to handle range queries efficiently, fielddataBuilder to handle aggregations and sort, and docValueFormat to format double correctly.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the idea now. It is looking good, just left a few comments for clarification.

@csoulios csoulios changed the base branch from master to feature/aggregate-metrics January 14, 2020 14:55
Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @csoulios. Easy to follow the mapping logic. I would only recommend to add a test that uses a parent object in the mapping so we make sure we are resolving the path correctly in the metric's docValues in that case.

@csoulios
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/1
@elasticmachine run elasticsearch-ci/packaging-sample-unix-docker

@csoulios csoulios merged commit 7b64d83 into elastic:feature/aggregate-metrics Mar 31, 2020
@csoulios csoulios deleted the rollups branch March 31, 2020 10:16
csoulios added a commit that referenced this pull request May 14, 2020
Following the implementation of the aggregate_metric_double field mapper(#49830) we are implementing the Min, Max, ValueCount, Sum and Average aggregations on aggregate metrics.

The code builds on the excellent work done for #42949 and uses the extensible ValuesSources infrastructure to wire up common metric aggregation on the aggregate_metric_double field type.

This PR is part of the rollups v2 refactoring as described in meta issue #42720
@talevy talevy added the Team:Deployment Management Meta label for Management Experience - Deployment Management team label Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport pending :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data Team:Deployment Management Meta label for Management Experience - Deployment Management team v8.0.0-alpha1 WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants