Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sub-fields of aggregate_metric_double field type optional #74145

Open
csoulios opened this issue Jun 15, 2021 · 14 comments
Open

Make sub-fields of aggregate_metric_double field type optional #74145

csoulios opened this issue Jun 15, 2021 · 14 comments
Labels
>enhancement feedback_needed :Search Foundations/Mapping Index mappings, including merging and defining field types stalled :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch Team:StorageEngine

Comments

@csoulios
Copy link
Contributor

csoulios commented Jun 15, 2021

Currently, when a user defines a field of type aggregate_metric_double with any of the supported summary sub-fields (min, max, value_count, sum), all inserted documents are required to contain all sub-fields. We have found this approach to be very restrictive and we should relax this constraint.

Perhaps, we should allow some (or all?) sub-fields of an aggregate_metric_double field to be empty. In this case we should decide how we handle the following cases:

  • If an aggregation process a sub-field that is empty (null), we handle this as the aggregation would handle any document with a missing field.
  • What happens if the default_metric sub-field is empty? We rely on this field for delegating many operations.

I am opening this ticket for discussion so that we plan implementing it.

@csoulios csoulios added >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data labels Jun 15, 2021
@csoulios csoulios self-assigned this Jun 15, 2021
@elasticmachine elasticmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jun 15, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@csoulios
Copy link
Contributor Author

csoulios commented Mar 9, 2022

@axw we discussed about this issue with the team and we would like to ask what is the use case that fixing this issue would solve for you. This will help us understand if relaxing this constrain is the best solution for your use case.

@axw
Copy link
Member

axw commented Mar 14, 2022

@csoulios the use-case is for APM to store custom application metrics. Different metrics sources will provide different sets of summary metrics. Often it will just be sum+count. Other times min and max will also be available. We never know ahead of time.

@csoulios
Copy link
Contributor Author

@axw Thanks for the answer. I have a follow up question on this. Why can't you define different fields (one for each summary metric), so you can have a field to store sum+count and another field to store min+max. In this case you will define the supported metrics for each field in their mapping.

@axw
Copy link
Member

axw commented Mar 14, 2022

Why can't you define different fields (one for each summary metric), so you can have a field to store sum+count and another field to store min+max.

We don't know the full set of metric names up front. They're application-defined, not within APM's control.

Metrics come to APM Server something like this:

{
  "metrics": {
    "count_and_sum": {
      "count": 100,
      "sum": 500
    },
    "count_and_sum_and_minmax": {
      "count": 100,
      "sum": 500,
      "min": 0,
      "max": 500
    }
  }
}

The names "count_and_sum" and "count_and_sum_and_minmax" are completely arbitrary, and unknown to APM Server. We dynamically map these metrics as individual fields, so the resulting document should look like this:

{
  "count_and_sum": {
    "value_count": 100,
    "sum": 500
  },
  "count_and_sum_and_minmax": {
    "count": 100,
    "sum": 500,
    "min": 0,
    "max": 500
  }
}

Each of those fields would be mapped as aggregate_metric_double based on the metric type, not on the names.

@csoulios
Copy link
Contributor Author

Thanks, let me discuss this again with the team and I will get back to you

@wchaparro
Copy link
Member

We'd like to discuss this one a bit more with APM, looking for some times to coordinate discussions.

@axw
Copy link
Member

axw commented Apr 1, 2022

@wchaparro @csoulios I've been chatting with @felixbarny about this issue, and our needs in APM. Re-reading the description, maybe we can reduce the scope a bit?

In APM we would use the field as follows:

  • we will always store value_count and sum; we can set the default_metric to either sum or value_count (undecided which one)
  • we will sometimes store min and max as well

So for APM, value_count, sum, and default_metric need not be optional -- only min and max.

Regarding making min and max optional, Felix asked me a good question: how does Elasticsearch currently handle aggregation over two aggregate_metric_double fields when one does and one doesn't have those fields? e.g.

PUT with
{
  "mappings": {
    "properties": {
      "my-agg-metric-field": {
        "type": "aggregate_metric_double",
        "metrics": [ "min", "max", "sum", "value_count" ],
        "default_metric": "value_count"
      }
    }
  }
}

PUT without
{
  "mappings": {
    "properties": {
      "my-agg-metric-field": {
        "type": "aggregate_metric_double",
        "metrics": [ "sum", "value_count" ],
        "default_metric": "value_count"
      }
    }
  }
}

POST /with/_doc
{
  "my-agg-metric-field": {
    "sum": 100,
    "value_count": 10,
    "min": 0,
    "max": 20
  }
}

POST /without/_doc
{
  "my-agg-metric-field": {
    "sum": 100,
    "value_count": 10
  }
}

GET /with,without/_search
{
  "size": 0,
  "aggs": {
    "min": {
      "min": {
        "field": "my-agg-metric-field"
      }
    },
    "max": {
      "max": {
        "field": "my-agg-metric-field"
      }
    },
    "sum": {
      "sum": {
        "field": "my-agg-metric-field"
      }
    },
    "value_count": {
      "value_count": {
        "field": "my-agg-metric-field"
      }
    }
  }
}

Results in:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "min" : {
      "value" : 0.0
    },
    "max" : {
      "value" : 20.0
    },
    "sum" : {
      "value" : 200.0
    },
    "value_count" : {
      "value" : 20
    }
  }
}

If you aggregate over just without, then min and max aggs return null.

Seems to me that aggregating over a single index/field should behave exactly the same.

@wchaparro
Copy link
Member

wchaparro commented Jun 23, 2023

@csoulios planning to close this one in favor of current discussions on downsampling. Please reopen if needed.

@wchaparro wchaparro closed this as not planned Won't fix, can't repro, duplicate, stale Jun 23, 2023
@felixbarny
Copy link
Member

felixbarny commented Jul 7, 2023

@wchaparro is there an equivalent of this issue in the context of downsampling? Note that in APM, we're using aggregate_metric_double outside of the context of downsampling.

We just define summary metrics as a pair of sum and value_count, and ignore all other sub fields. So we have a workaround but I don't really understand why this got closed unless this is a duplicate.

@wchaparro
Copy link
Member

reopening this one - tagging with Downsampling, to track going forward.

@wchaparro wchaparro reopened this Jul 13, 2023
@wchaparro wchaparro removed the :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data label Jul 13, 2023
@wchaparro wchaparro added the :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data label Jul 13, 2023
@wchaparro wchaparro removed the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 21, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement feedback_needed :Search Foundations/Mapping Index mappings, including merging and defining field types stalled :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch Team:StorageEngine
Projects
None yet
Development

No branches or pull requests

7 participants