Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TraceQL] Support attribute existence filter #2188

Open
joe-elliott opened this issue Mar 9, 2023 · 21 comments
Open

[TraceQL] Support attribute existence filter #2188

joe-elliott opened this issue Mar 9, 2023 · 21 comments
Assignees
Labels
enhancement New feature or request keepalive Label to exempt Issues / PRs from stale workflow traceql

Comments

@joe-elliott
Copy link
Member

Is your feature request related to a problem? Please describe.
There should be a way to query for the existence of an attribute without asserting a value. Perhaps something like:

{ span.att = nil }
{ exists(span.att) }
{ span.att }

Note that the last one overloads querying a boolean attribute for true or false values. If we want to go that route we should think through it carefully.

@lanapouney
Copy link

Hi, I am looking for a way to query that an attribute does not exist for a span, could this be achieved via { !exists(span.att) } or would you propose another method for this?

@joe-elliott
Copy link
Member Author

Unfortunately this does not currently exist. I want it too, but haven't found the time to add it. Do you have a preferred syntax above?

If you know the type of your attribute you can do something like { span.att != ""} or `{ span.att != 0 }'. I know that's not great, but it's the only current work around.

@lanapouney
Copy link

lanapouney commented Mar 31, 2023

For me the second syntax makes the most sense. ( { exists(span.att) } )

Unfortunately that doesn't work in this case, my only idea for a work around would be to set att="" on span creation as { span.att != ""} checks that the attribute does exist, and I want to find all spans where is does not exist at all, and in theory my only other option there is to query for if it exists but is only set to an empty string.

@cyrille-leclerc
Copy link
Contributor

I prefer { span.att = nil } and { span.att = nil } because they reuse the existing TraceQL comparison operators and work well IMO for auto-completion

@cedricziel
Copy link

I'm putting my money on { .attr = nil } as well because it's simple and intuitive

@adirmatzkin
Copy link
Contributor

Just making sure I don't miss out on something here...
By querying for a value of "nil" for some attribute x (syntax as { span.x = nil }), we would assume x doesn't even exist?
How would we differentiate between the 2 cases:

  1. x is not even a field in a span
  2. x is a field in a span, but its value is None/Null/Nil.

If we're about to use "nil", "none", or "null" for different meanings here - let me raise a red flag, because it would be confusing and not intuitive for users...

Unless I'm missing something - I feel like the { exists(span.x) } syntax would be better here due to the need to differentiate between the above 2 cases.

@cyrille-leclerc
Copy link
Contributor

cyrille-leclerc commented May 18, 2023 via email

@cyrille-leclerc
Copy link
Contributor

cyrille-leclerc commented May 18, 2023 via email

@adirmatzkin
Copy link
Contributor

Nice, so I did miss something 🙃 , thanks @cyrille-leclerc!

If that's the case, and it looks like the first two options Joe mentioned in the original comment are the leading ones, I'd like to help push this feature forward - which means at this point helping out getting to a decision 😃.

Here are (IMO) the main considerations we should focus on and take into account when dealing with this kind of syntax question:

1. Expressiveness: Allow users to express a wide range of queries and operations in different query patterns.
2. Readability, simplicity, and consistency: Keep it easy and simple to learn/use/read-understand/maintain, by leveraging familiar conventions that will make it intuitive.
3. Performance: Although syntax won't directly impact performance, design decisions can affect execution efficiency by enabling/limiting optimizations in query processing or execution.

The more I think about it - I'm getting convinced the { span.att = nil } syntax will be the better choice.

  • It is more consistent with the current syntax - also saving us unnecessary complications and introducing new functions and overall ways to express queries.
  • As for expressiveness - I see this feature as a very specific one. Basically allowing 2 (that are kind of the same) querying abilities - { span.attr = nil } for "not exists", and { span.attr != nil } for "exists".
  • As for performance - I'm not (yet) familiar with TraceQL engine, perhaps @joe-elliott could help out with this consideration.

I think that unless the performance parameter is a win for the exists(span.attr) - there is a winning option and we can continue moving! 💪🏼
Looking forward to hearing an opinion from someone whose familiar with the TraceQL engine 🙃

@joe-elliott
Copy link
Member Author

Agree with { span.attr = nil } and { span.attr != nil }.

!= nil is relatively simple to implement and could probably be done in a couple days by a dev experienced in the TraceQL parser and engine.

= nil will be quite difficult. The fetch layer all the way down to Parquet is entirely built around finding row numbers that match conditions. To make this work we'd need to pull a list of all row numbers at the span level, and then all row numbers where != nil is true and negate them.

@cyrille-leclerc
Copy link
Contributor

cyrille-leclerc commented May 23, 2023

Adding != nil would bring so much value that the inconsistency of lacking of = nil is IMO a minor disadvantage. I support the idea to implement != nil first and postpone the implementation of = nil that we may never prioritize

@mdisibio
Copy link
Contributor

= nil will be quite difficult. The fetch layer all the way down to Parquet is entirely built around finding row numbers that match conditions. To make this work we'd need to pull a list of all row numbers at the span level, and then all row numbers where != nil is true and negate them.

Agree, can't think of a more efficient algorithm and that type of query would always be much slower than != nil. But I do think it's achievable with the current storage layer. Fetch something like status (all the row numbers), the attr with no filtering (opnone), and then let the engine determine the matches.

@knylander-grafana
Copy link
Contributor

The output from the work on this issue can be documented with #2188

@ie-pham ie-pham moved this from In Progress to Next in Tempo squad Jul 25, 2023
@ie-pham
Copy link
Contributor

ie-pham commented Jul 25, 2023

@ie-pham ie-pham moved this from Next to Todo in Tempo squad Aug 1, 2023
@github-actions
Copy link
Contributor

This issue has been automatically marked as stale because it has not had any activity in the past 60 days.
The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed after 15 days if there is no new activity.
Please apply keepalive label to exempt this Issue.

@github-actions github-actions bot added the stale Used for stale issues / PRs label Sep 24, 2023
@joe-elliott joe-elliott removed the stale Used for stale issues / PRs label Sep 25, 2023
@joe-elliott joe-elliott added enhancement New feature or request keepalive Label to exempt Issues / PRs from stale workflow labels Sep 25, 2023
@zohnannor
Copy link

zohnannor commented Feb 16, 2024

Is there a way to make a query "if attribute exists, compare to X"? For example, span could have attr attribute or it might not, { name = "span" } finds all spans, { name = "span" && span.attr = "test" } filters out all the spans that don't include such attribute. What I want is a monadic behavior, "find all span attributes, and in case span has attr, it must be "test""

@mdisibio
Copy link
Contributor

"find all span attributes, and in case span has attr, it must be "test""

Something like this: { name = "span" && (span.attr = nil || span.attr = "test") } ? That would work as expected when support for = nil is added.

@zohnannor
Copy link

That's unfortunate. The lack of ability to find nil values cannot be solved with a workaround and thus requires me to use two different TraceQL queries.

@ie-pham ie-pham removed their assignment Jun 5, 2024
@ie-pham ie-pham self-assigned this Aug 13, 2024
@mdisibio
Copy link
Contributor

Hi everyone, I discovered a way to achieve this using the existing language and wanted to share it. I think this could be useful if anyone has a pressing need for the functionality today, it's not meant to be a replacement for the better options discussed above.

The query is: {} | select(.attr) | { !(.attr != nil)}

To explain: We can already pull back an attribute optionally using select(). The span will be populated with the value if it exists. The trick is that those results can be run through another filter to get just the spans that were missing it. The second clause acts like .attr = nil but that specifically can't be used yet since our query validation catches it.

Here's a screenshot using https://github.com/grafana/xk6-client-tracing showing that it works. All of the results are for cache or identityService which do not populate these values (and you can see on the right-hand side http.url is missing):
Image

@09jvilla
Copy link
Contributor

@mdisibio , I noticed that

{} | select(.attr) | { !(.attr != nil)} worked when I tried it, but that {} | select(resource.attr) | { !(resource.attr != nil)} and {} | select(span.attr) | { !(span.attr != nil)} did not. Essentially, specifying a scope in front of the attribute results in the query executing but returning no matches. Is this expected? I would think that if a span is missing an attribute entirely, then it also is definitely missing that attribute when you look at the resource scope, or the span scope, or any other scope in fact.

@mdisibio
Copy link
Contributor

specifying a scope in front of the attribute results in the query executing but returning no matches

Good catch, that's a bug. Specifying a scope shouldn't matter. Not 100% unexpected since this was kind of a hacky work-around already. Checking briefly, it has to do with the optimizer trying things when a scope is specified, that it's not able to do with unscoped (.foo).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request keepalive Label to exempt Issues / PRs from stale workflow traceql
Projects
Status: Todo
Development

No branches or pull requests

10 participants