Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecating custom context in favor of tags #7

Closed
felixbarny opened this issue Oct 26, 2018 · 37 comments
Closed

Deprecating custom context in favor of tags #7

felixbarny opened this issue Oct 26, 2018 · 37 comments

Comments

@felixbarny
Copy link
Member

felixbarny commented Oct 26, 2018

Today, users have two options for adding additional metadata to transactions and spans.

We offer custom context and tags in most agent APIs.

Both are quite similar which causes confusion to users https://discuss.elastic.co/t/custom-context-vs-tags/151126.

Cons of custom context

  • Users can set any objects. Setting a large object graph, a circular object graph or a non JSON serializable object can lead to errors.
  • In contrast to tags, we don't have custom context on spans
  • Custom context is not indexed, so users can't build dashboards on top of it

Cons of Tags

  • Currently, we only support string tags, so we should consider adding support for number and boolean tags. This would enable users to build graphs on top of number tags. For example by adding a cart_value number tag on a checkout cart transaction. See also Allow setting numbers as tags apm-server#828
  • Only allows for indexed values

So I propose deprecating the custom context and suggest users to move to span tags. In the agents which are still in beta, we should remove the custom context and agents which are already generally available should deprecate the custom context.

Open questions:

  • Do we need non-indexed tags? What are the use cases for that? If we decide that we do need them, we can have two different api methods like addIndexedTag and addNonIndexedTag.
  • How do we map non-string tags? The zipkin ES storage, for example, converts all tags to strings because of mapping difficulties. @xeraa knows some background around this
@watson
Copy link
Contributor

watson commented Oct 26, 2018

While I'm generally in favor of this proposal, I just have a few comments on your listed cons:

Users can set any objects. Setting a large object graph, a circular object graph or a non JSON serializable object can lead to errors.

I'm not sure how this is handled in the other agents, but in the Node.js agent this isn't a problem as the JSON serializer we're using is safe when it comes to circular references etc. I hope all agents guard against this so that they don't risk crashing the system.

In contrast to tags, we don't have custom context on spans

We could just add that if we wanted to

Custom context is not indexed

This could also be seen as a feature as it means users can store data without the fear of breaking the index 🤔

Currently, we only support string tags, so we should consider adding support for number and boolean tags.

Yes 👍

@felixbarny
Copy link
Member Author

Custom context is not indexed

This could also be seen as a feature as it means users can store data without the fear of breaking the index 🤔

True, that's why I also added that as a con to tags. But if the only value of custom context is that it's not indexed, I think its better to have non-indexed tags so it is verry clear to users what the difference is.

I hope all agents guard against this so that they don't risk crashing the system.

In case of the manual serialization approach in the Java agent, this would add both complexity and overhead. Serializing arbitarty objects is also not that easy with the JSON serializer we are using there.

If we were ever to support additional formats like Protocol Buffers, arbitrary, user-supplied custom context objects would be a blocker.

@watson
Copy link
Contributor

watson commented Oct 26, 2018

If we were ever to support additional formats like Protocol Buffers, arbitrary, user-supplied custom context objects would be a blocker.

This is a very big point plus for removing custom context 💯

If the user really wanted to store an object, they could just serialize the JSON them selfs and store it as a tag value (though the size would of course be limited to 1024 chars)

@xeraa
Copy link

xeraa commented Oct 28, 2018

One other thing to watch out for: Having keys like foo and foo.bar, which won't work since you can't have string and object as the type of foo. Spring Boot actuator are doing that for their JSON representation for example. In Beats there is a dedot function and we'll probably need something similar here if we allow the indexing of custom JSON structures.

@felixbarny
Copy link
Member Author

AFAIK, our agents de-dot by default.

@watson
Copy link
Contributor

watson commented Oct 29, 2018

AFAIK, our agents de-dot by default.

Not to my knowledge

@felixbarny
Copy link
Member Author

felixbarny commented Dec 10, 2018

@elastic/apm-agent-devs please vote:

So I propose deprecating the custom context and suggest users to move to span tags. In the agents which are still in beta, we should remove the custom context and agents which are already generally available should deprecate the custom context.

@watson
Copy link
Contributor

watson commented Dec 10, 2018

@felixbarny there's a few open questions in the issue description, so I'm not really sure what we're voting on? Update: Resolved IRL

@mikker
Copy link
Contributor

mikker commented Dec 10, 2018

Vote on what? Summary + question would be nice.

@felixbarny
Copy link
Member Author

Cited what to vote on from the issue description

@beniwohli
Copy link
Contributor

This probably also needs some input from product and UI. We do have a nag screen in the UI that explicitly tells users to pretty please provide some custom context, after all

screenshot from 2018-12-11 10-25-42

@felixbarny
Copy link
Member Author

I'm not sure this is a product-related decision. If we agree to deprecate custom context we will ofc have to also deemphasize and deprecate the custom context in the UI. But we can only do that after we agree that makes sense from an agent's perspective. Let me know if you still think product should be involved 🙂

@mikker
Copy link
Contributor

mikker commented Dec 11, 2018

Thanks Felix for updating the comment ❤️

I am fine with this but don't agree that it isn't a product decision. We should hear @makwarth – and he can take into account that the agents have concerns about custom context.

@simitt
Copy link
Contributor

simitt commented Dec 21, 2018

But if the only value of custom context is that it's not indexed, I think its better to have non-indexed tags so it is verry clear to users what the difference is.

@felixbarny does this come down to a naming problem then? Because context.custom is basically non indexed tags, but with no restrictions on the format, while tags need to be string key-value pairs.

Before making a decision here, do we know if/how context.custom is used by customers? From what I understand the motivation for removing this is potential user confusion (which might be improved by revisiting our documentation around it) and potential issues with json serialisation, which, as @watson pointed out, should be avoided anyways in the agents.

If we are aiming for removing context.custom I would see this as a breaking change, on the Intake API level and on the ES level, as we would take away a feature. So if we decide on this, I suggest to mark it as deprecated in the next 6.x release and remove it in 7.0.

@alvarolobato
Copy link
Contributor

@roncohen @makwarth can you give us your feedback on this?

@roncohen
Copy link

the major open question that @felixbarny posted in the top hasn't been addressed:

  • Do we need non-indexed tags? What are the use cases for that? If we decide that we do need them, we can have two different api methods like addIndexedTag and addNonIndexedTag.

The ability to send up non-indexed data was the reason for custom in the first place. We've since heard customer requests for more control over how things are indexed. I've posted #53 to address that issue but haven't heard anything.

@felixbarny
Copy link
Member Author

felixbarny commented Feb 28, 2019

The ability to send up non-indexed data was the reason for custom in the first place.

By now, I also think we need support for non-indexed key/value pairs. For example to add a value which is larger than 1024 chars. But I question that we need to have support for arbitrarily nested objects. Do you have use cases for that in mind? We could still allow customizing how those key/value pairs are indexed.

Restricting to key/value pairs has the advantage that we know the schema up-front which reduces the risk of serialization errors and allows us to use formats like protobuf in the future.

I think this issue and #53 are complementary.

@roncohen
Copy link

Great @felixbarny!

I'm fine to restrict to key/value pairs. In some languages we can't guarantee that users don't add a complex structure to "custom" at compile time. In those cases we could either try to serialize to JSON gracefully or ignore it all together. I think for something like Python, you'd expect something that resembles what you set, to show up. Thoughts @beniwohli ?

@felixbarny
Copy link
Member Author

try to serialize to JSON gracefully

Do you mean serializing the object to a JSON string and setting that string as the value? For example "foo": "{\"bar\": \"baz\"}"

@beniwohli
Copy link
Contributor

@roncohen we already say in the documentation that the custom context should be a flat key/value dict: https://www.elastic.co/guide/en/apm/agent/python/master/api.html#api-set-custom-context

Currently we don't enforce that (although I thought we did, but I can't find it in the code), but I'd be in favor of doing that, by stringifying the value (as in, call str(value) (aka value.toString() or value.to_s or whatever other languages call it).

@felixbarny
Copy link
Member Author

we already say in the documentation that the custom context should be a flat key/value dict

Ah, great, I wasn't aware of this. Then this basically boils down to a naming issue. If the only difference between custom context and labels (formerly tags) is how it's indexed, I think renaming to non_indexed_labels would make that intent clearer. The JSON spec of the intake API should then also reflect that (only allowing number, boolean and string values).

@simitt
Copy link
Contributor

simitt commented Feb 28, 2019

I am re-raising my questions and my concerns about this being a breaking change on Intake API and ES level from #7 (comment).

I'd really be interested in the motivation of why to limit sending up non-indexed fields to simple key:value pairs.

@felixbarny
Copy link
Member Author

Restricting to key/value pairs has the advantage that we know the schema up-front which reduces the risk of serialization errors and allows us to use formats like protobuf in the future.

@felixbarny
Copy link
Member Author

This would not be a breaking change. The context.custom namespace would just be deprecated and we would create a new one for non-indexed simple key/value pairs.

@beniwohli
Copy link
Contributor

@felixbarny just to clarify, I linked to documentation of the Python agent. I'm not aware if any other agent has similar wording

@simitt
Copy link
Contributor

simitt commented Apr 17, 2019

To summarize this, @felixbarny you are suggesting to:

  • deprecate context.custom at Intake API and remove with next major version
  • add context.non-indexed-tags at Intake API, allowing for key value pairs of type string, number, boolean
  • release new agent builds that support new intake API. Remove support for context.custom from agents that offered that support would probably be released as breaking change in the agents.

Did I get this right?

I am still not sure whether or not we have any data backing up the assumption that the feature is not used at the moment.
From the server perspective the above mentioned process wouldn't be a breaking change, so good to move forward with this.

@roncohen
Copy link

since this includes "deprecate context.custom at Intake API" it conflicts with #53. So we need to consider them together.

@simitt
Copy link
Contributor

simitt commented Apr 17, 2019

I think it is fine from a technical perspective. Following along the conversations here, and also the fact that @roncohen agreed on removing context.custom after creating #53, led to the impression that this was also decided from a product perspective.

@felixbarny
Copy link
Member Author

Great summary @simitt.

I am still not sure whether or not we have any data backing up the assumption that the feature is not used at the moment.

I'm sure context.custom is currently used so users would have to migrate to non-indexed tags/labels (or whatever we decide the name to be). Summary of the benefits:

  • Simple JSON schema, not allowing for arbitrarily nested objects. Important should we consider formats like protobuf in the future.
  • Makes it clearer what the purpose and difference of labels and non-indexed labels are as opposed to labels and context.custom.

since this includes "deprecate context.custom at Intake API" it conflicts with #53. So we need to consider them together.

I don't see them conflicting. We should still allow users to customize the indexing of the non-indexed labels.

An alternative to introducing non-indexed labels would be to restrict the context.custom to key/value pairs. But then it might still be confusing to users what the difference to tags is. But we could solve that with documentation.

@simitt
Copy link
Contributor

simitt commented Apr 17, 2019

since this includes "deprecate context.custom at Intake API" it conflicts with #53. So we need to consider them together.

I don't see them conflicting. We should still allow users to customize the indexing of the non-indexed labels.

From the issue #53:

We could extend custom to mean user-defined data

The difference is that you suggest to restrict which data can be uploaded, while at the moment there are no restrictions. That is the crucial point.

@roncohen
Copy link

@simitt and I discussed offline. We need to discuss the product implications. I'll bring it up on the appropriate sync.

We should still allow users to customize the indexing of the non-indexed labels.

I don't think it makes sense to call them non-indexed labels and then let people add indexing.

@felixbarny
Copy link
Member Author

In the meeting, we concluded the following:

  • Taking away this feature can be problematic in terms of backwards compatibility
  • If we want to support protobuf in the future, we can map values which are not a string, number or boolean to a string by converting them to a string (either a JSON string or by calling toString() to_s, or similar)
  • The confusion around whether to use labels or custom context does not seem to warrant breaking backwards compatibility or having users to migrate to non-indexed labels. That confusion is better resolved with documentation.
  • For agents where it's difficult to support serializing arbitrary objects (like the Java agent) the path forward is to support only key/value pairs where the value can be a string, number or boolean. Example API methods:
    • context.addCustom(String key, String value)
    • context.addCustom(String key, Number value)
    • context.addCustom(String key, boolean value)

If there are no objections, I'll close this issue in a week.

@bmorelli25
Copy link
Member

The confusion around whether to use labels or custom context does not seem to warrant breaking backwards compatibility or having users to migrate to non-indexed labels. That confusion is better resolved with documentation.

elastic/apm-server#2111 will hopefully fix this problem by adding more in-depth and centralized documentation on labels/user/custom.

@felixbarny
Copy link
Member Author

felixbarny commented Apr 17, 2019 via email

@axw
Copy link
Member

axw commented Apr 23, 2019

Was there any discussion of when to add custom context APIs to the agents that currently lack them? (only Java and Go I think)

@felixbarny
Copy link
Member Author

Those agents can start adding these APIs now. As mentioned, I plan to only offer an API to add key/value pairs where the value is either a String, Number or boolean. What's your plan for the go agent?

@axw
Copy link
Member

axw commented Apr 23, 2019

Not sure yet. Probably accept an arbitrary value, encode basic types to string/number/bool, and stringify complex types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants