Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model: align labels with ECS #3873

Closed
axw opened this issue Jun 11, 2020 · 16 comments · Fixed by #6633
Closed

model: align labels with ECS #3873

axw opened this issue Jun 11, 2020 · 16 comments · Fixed by #6633
Assignees
Milestone

Comments

@axw
Copy link
Member

axw commented Jun 11, 2020

ECS defines labels as key-value pairs, where all values are stored as keywords. APM also allows booleans and scaled_float values.

We should propose an extension to ECS.

@axw axw added the ecs label Jun 11, 2020
@axw
Copy link
Member Author

axw commented Jun 11, 2020

I think it's also worth considering extending to allow integers (long type in ES), to better support storing OpenTelemetry attributes. This isn't specific to traces/APM.

@zube
Copy link

zube bot commented Jun 17, 2020

simitt said: For starting work on ECS follow the process described in https://github.com/elastic/ecs/tree/master/rfcs

@simitt
Copy link
Contributor

simitt commented Jun 25, 2020

Allowing long could be problematic from an implementation point, for cases where the first value of an intended float would be e.g. 24.0 and the type long would be auto-derived from it instead of scaled_float. We specifically used scaled_float in the past to be less ambiguous.

@axw
Copy link
Member Author

axw commented Jun 26, 2020

This might be an unpopular opinion, but I'm starting to wonder if having dynamically-typed label fields is something we should continue. We see that sometimes users have a bad time because they changed the label type in their instrumentation, or they have multiple applications with colliding label names/different types.

Maybe we should split them into separate fields, like labels, double_labels, long_labels, and boolean_labels?

@simitt
Copy link
Contributor

simitt commented Jun 26, 2020

I don't see how this change would cover the issues arising from using the same lable name with different types in different applications, but otherwise agree that dynamic typing can be very painful here and adding the type to the name might improve the situation.

@jalvz
Copy link
Contributor

jalvz commented Jun 26, 2020

split them into separate fields, like labels, double_labels, long_labels, and boolean_labels

That would be the Go way, but APIs might feel very unidiomatic on pretty much every other language, thou.

@webmat
Copy link

webmat commented Jun 26, 2020

I like the idea of adding labels fields for different types. I'd like to avoid having an explosion of types though (e.g. half_float, float, double, scaled_float). So perhaps I would err on the side of using only the roomier versions of any given type (long and double). Happy to make exceptions if any are needed.

I think it may be acceptable define these new object fields within the existing labels object type:

  • labels.*: unless otherwise noted, are keyword
  • labels.long.*
  • labels.double.*
  • labels.boolean.*

We're currently starting to plan the ECS 2.0 breaking changes. This could be part of it.

On the other hand, labels sounds like a misnomer for numerics and booleans. We're open to ideas for capturing ad hoc metrics or other kinds of numerics :-)

cc @ebeahan

@axw
Copy link
Member Author

axw commented Jun 28, 2020

@simitt

I don't see how this change would cover the issues arising from using the same lable name with different types in different applications

They would go into different field names, so no indexing errors. i.e. app one records a string label "name": "value", and app two records a numeric label "name": 123.456. The first would be recorded in a field called labels.name, whereas the latter would be recorded in a field named double_labels.name.

@jalvz

split them into separate fields, like labels, double_labels, long_labels, and boolean_labels

That would be the Go way, but APIs might feel very unidiomatic on pretty much every other language, thou.

I'm not sure how programming languages are relevant here. I do think it's less than ideal from a consumption perspective to encode the type in the name, however it seems to be a recurring issue for users.

@webmat I like your counter proposal. I also think you have a good point about them being misnomers. They're probably more like request-scoped metrics. Will have to ponder on this, and see if we can capture some info on how people are using them...

@webmat
Copy link

webmat commented Jun 30, 2020

And just to be clear, if we do come up with a better naming strategy that's great. But if not, I think nesting it all under labels.[long|double|etc].* would probably be fine as well. I think users will get that this is for their custom stuff, even if it's a bit of a misnomer.

@jalvz
Copy link
Contributor

jalvz commented Jul 1, 2020

I'm not sure how programming languages are relevant here

I meant that users set labels via an agent API.

So eg. in Python users would expect elasticapm.label(thing=1) or elasticapm.label(thing='hi'); not elasticapm.label_int(thing=1), elasticapm.label_double(thing=1.0), etc.

Sure they could keep the same API and do type shenanigans inside the implementation (with some perf and complexity cost) to populate the right fields; but I'm not neglecting the problem to solve, only suggesting that all agent devs should agree to it.

Which I guess is a convoluted way to say that this issue would fit better in elastic/apm imo.
Just my 2c anyways.

@simitt
Copy link
Contributor

simitt commented Jul 1, 2020

@jalvz from my understanding adding the type to the label name was focused on the storage side, concerning ES field names and type inference.

Having an APM meta issue and opening up the discussion to also include the agent APIs and the server Intake API sounds good though.

@ebeahan
Copy link
Member

ebeahan commented Mar 31, 2021

I think it may be acceptable define these new object fields within the existing labels object type:

Is there still interest in expanding labels in ECS to support these additional types?

There had been some thought given to switching the ECS labels field to flattened, but with Kibana having limited support and no support for numerics, flattened wouldn't help here currently.

@axw
Copy link
Member Author

axw commented Apr 1, 2021

@ebeahan We've been kicking the can down the road, but still would like to either expand labels to permit additional types or find a reasonable alternative.

There had been some thought given to switching the ECS labels field to flattened, but with Kibana having limited support and no support for numerics, flattened wouldn't help here currently.

Not currently, but I believe there are intentions for Kibana to support for flattened fields. I haven't thought about it super deeply, but to me flattened seems ideal for labels since it'll avoid mapping explosion.

@ebeahan
Copy link
Member

ebeahan commented Apr 2, 2021

@axw I've opened an ECS RFC proposal PR to capture a design and have a discussion around this: elastic/ecs#1341.

@axw
Copy link
Member Author

axw commented May 28, 2021

@felixbarny @Mpdreamz and @sqren and I discussed this a bit, and we came to an agreement on separating string and numeric labels. Booleans can be stored as strings.

For string labels, we would (ideally, assuming ECS goes this way) use flattened under labels.*. For numeric labels, we would (eventually) rely on numeric flattened fields when they arrive, and store them under a separate fieldset such as numeric_labels.*

With these changes we should be able to move away from dynamic mapping for most of our documents, avoid the possibility of mapping explosions and conflicts. This in turn makes it more feasible to combine data streams for multiple services, with the probable exception of application metrics which would continue to be dynamically mapped.

@simitt
Copy link
Contributor

simitt commented Aug 18, 2021

The proposed way of moving foward with this in 8.0 is to have

  • labels (keyword) - containing strings and booleans stored as strings
  • numeric_labels (scaled_float) - containing numbers

Once the flattened fields are fully supported in ES and Kibana (elastic/elasticsearch#61550, elastic/ecs#1341 (comment)) the type of the labels and numeric_labels fields are going to be changed to the flattened types. This won't be considered a breaking change (#5963)

@simitt simitt added the v8.0.0 label Aug 20, 2021
@simitt simitt added this to the 8.0 milestone Aug 20, 2021
@marclop marclop self-assigned this Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants