-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LogQL: Introduce distinct
#8662
Conversation
@liguozhong Thank you very much for solving the problem that is bothering me, I just want to say cool! Two questions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Docs squad] One small comment, otherwise the docs LGTM.
nice, I submitted a commit to complete this feature.
|
No, currently the earliest log is kept. because the |
The mysql's distinct was no keep the last one too, maybe we can keep the same logic with mysql's SQL now. |
Thanks.
You are right, pipeline filtering is not possible, which conflicts with streaming log lines, since the latest log line is in the future. However, when the date range and query conditions are determined, the log lines is no longer a stream, but a determined lines. Is it possible to reevaluate the log lines to be returned(if there is a suitable place to do that)? |
The result of the MySQL query is the column values, it doesn't matter whether it is earliest or latest. But for logs, the log content itself is important, the deduplication strategy become more important, right? |
And there is an important question we have to consider. I'm afraid it's hard for loki to pull the last log.But I also think that the last line of log is the most important log. Maybe you can modify the start time continuously. |
Anyway, your issue is very valuable, I asked our developers, they also want a feature in their daily oncall work. |
if !ok { | ||
return line, false | ||
} | ||
_, ok = r.datas[label][val] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when r.datas[label]
returns nil? E.g. | logfmt
would add new labels that have not been seens when r.data
was initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the label does not exist, it will be discarded, and the new label is based on the order of the pipeline. Such as the following example
log
{"a":"1", "b":2}
logql 1 :
{app="buy2"} | json | distinct a
logql 2 :
{app="buy2"} | distinct a | json
logql 1 can get log. logql 2 will not get log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, but you'd still have a nil pointer right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point.This code does look dangerous.
No, each label has been initialized in the NewDistinctFilter
() function.
func NewDistinctFilter(labels []string) (Stage, error) {
datas := make(map[string]map[string]int, 0)
for _, label := range labels {
datas[label] = make(map[string]int, 0)
}
...
}
If distinctFilter
struct initialized through NewDistinctFilter
(), the distinctFilter
struct will never get nil point. But if we directly build a distinctFilter
struct with &distinctFilter
{}, it is possible that the nil point you mentioned will appear.
another way.
this for performance, it will not need to make a judgment like this for each log.
_, ok = r.datas[label]
if !ok {
r.datas[label] = make(map[string]int, 0)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is NewDestinctFilter
called for each line? How is preserving the state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience. I still don't understand how data[label]
can be not nil if the label not known when NewDestinctFilter
is called. I'll try out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no I get it. The labels are not the labels seen in the line but the ones passed to distinct
🙈
Co-authored-by: Dylan Guedes <[email protected]>
Co-authored-by: Dylan Guedes <[email protected]>
Co-authored-by: Dylan Guedes <[email protected]>
Co-authored-by: Dylan Guedes <[email protected]>
thanks, ready for review. |
{ | ||
`{app="foo"} | logfmt | distinct id`, | ||
[]*labels.Matcher{ | ||
mustNewMatcher(labels.MatchEqual, "app", "foo"), | ||
}, | ||
[]linecheck{{"id=foo", true}, {"id=foo", false}, {"id=bar", true}}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liguozhong I hope you don't mind me adding this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DylanGuedes I'm fine with this change. What do you say?
thanks |
same 😄 @liguozhong thank you again for another awesome contribution ❤️ |
distinct
Hi,what's the milestone of this feature? The lastest version 2.8.4 don't support distinct for now. |
**What this PR does / why we need it**: This reverts commit 38b298c. We are removing this PR #8662 because [known issue tracked here](#9594) and some issues we facing internally with query splitting and sharding work(hard to parallelize, merging two `distinct` sets can have duplicates, hard to make it work with metric queries, etc). We decided best to remove it before make it to public release. **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e) --------- Signed-off-by: Kaviraj <[email protected]>
**What this PR does / why we need it**: This reverts commit 38b298c. We are removing this PR #8662 because [known issue tracked here](#9594) and some issues we facing internally with query splitting and sharding work(hard to parallelize, merging two `distinct` sets can have duplicates, hard to make it work with metric queries, etc). We decided best to remove it before make it to public release. **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e) --------- Signed-off-by: Kaviraj <[email protected]> (cherry picked from commit c030217)
Backport c030217 from #10356 --- **What this PR does / why we need it**: This reverts commit 38b298c. We are removing this PR #8662 because [known issue tracked here](#9594) and some issues we facing internally with query splitting and sharding work(hard to parallelize, merging two `distinct` sets can have duplicates, hard to make it work with metric queries, etc). We decided best to remove it before make it to public release. **Which issue(s) this PR fixes**: Fixes #<issue number> **Special notes for your reviewer**: **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [x] Documentation added - [x] Tests updated - [x] `CHANGELOG.md` updated - [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/setup/upgrade/_index.md` - [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e) Co-authored-by: Kaviraj Kanagaraj <[email protected]>
What this PR does / why we need it:
Introduces
distinct
to LogQL. Usage:{job="varlogs"} | distinct filename
It is similar to
distinct
in SQL and especially useful for sampling. Similar syntax exists in other log query languages.Example
For the following log lines:
The query below:
Will return:
Example with multiple labels
Which issue(s) this PR fixes:
Fixes #8649
Special notes for your reviewer:
{job="varlogs"} |distinct filename
The test passed in a simple local development environment
snapshot.
Checklist
CONTRIBUTING.md
guide (required)CHANGELOG.md
updateddocs/sources/upgrading/_index.md