-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LogQL: Simple JSON expressions #3280
LogQL: Simple JSON expressions #3280
Conversation
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Benchmarks:
Methodology: We got lucky here since |
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All in all, looks good -- I expect you can teach us a few thing about yacc.
I left a few discussion points.
pkg/logql/jsonexpr/lexer.go
Outdated
|
||
for { | ||
r := sc.read() | ||
if r == '"' || r == 1 || r == ']' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the significance of || r == 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually nothing! I think I was trying something out and this is an artifact of that; thanks for catching
pkg/logql/log/parser.go
Outdated
|
||
for identifier, paths := range j.expressions { | ||
result := jsoniter.ConfigFastest.Get(line, paths...).ToString() | ||
if result == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return line, false
in this case? i.e. if | json foo=bar.baz
is specified against the line {}
log line, what should happen?
Should we implicitly filter these? Is it a best-effort label setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't because this is in a loop of multiple expressions - one expression may not evaluate to a value, but a subsequent one could.
Any expression that doesn't return a value will be ignored. If any expression is evaluated against an empty JSON document, it'll return InvalidType
which will be stringed to ""
.
Could you expand a bit on what you mean by this?
Should we implicitly filter these? Is it a best-effort label setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current JSON parser will set a JSONParserErr
when it finds malformed JSON here: https://github.com/grafana/loki/blob/master/pkg/logql/log/parser.go#L54.
I'm curious if we should do something similar here if one of the expected labels does not exist, because we're being explicit.
My original comment was trying to say "if one of the explicitly extracted labels doesn't exist, should we drop the line from this result set?", or Does this new json extension imply a filter operation?
For instance, if we try to extract the "foo" label from an entry like {"bar": 2}
, should we skip this line? Should we add an error label such as JSONExpectedLabelNotPresent
? Should we ignore it and continue (bummer if you aggregated on foo
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for clarifying. I think we need to distinguish here between incorrect and malformed expressions.
For example, with label_format
if you use an incorrect expression (i.e. one that doesn't return a value), it will evaluate correctly (blank result).
A malformed expression (i.e. one that breaks syntactic rules) should indeed raise an error:
You're right though that this PR doesn't set a __error__="JSONParserErr"
label, I'll fix that.
How do you think we should handle malformed/incorrect expressions though?
I'm thinking we follow the behaviour of label_format
: for malformed queries, raise an error that fails the whole query, and for incorrect queries just set a blank label. We don't, AFAIK, have a mechanism to mark a particular label as being problematic, and since we could be returning multiple labels here this gets a bit tricky in terms of UX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep parity with what we have & set empty labels when they don't have corresponding JSON fields. i.e. extracting | json foo="foo"
from {"bar": 2}
should just yield a foo=""
label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
expressions []JSONExpression | ||
lbs labels.Labels | ||
want labels.Labels | ||
}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test against labels that already exist, such as:
"duplicate field"
[]byte(`{"foo": "bar"}`),
[]JSONExpression{NewJSONExpr("foo": "foo"},
labels.Labels{"foo": "bazz"}
labels.Labels{"foo": "bazz", "foo_extracted": "bar"}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would foo_extracted
come from?
[]JSONExpression{NewJSONExpr("foo": "foo"},
would need to be []JSONExpression{NewJSONExpr("foo_extracted": "foo"},
in this case. Or am I missing something?
Does https://github.com/grafana/loki/pull/3280/files#diff-c3a5c0501669f6c25bb23430a1f1bb4fcce88bfccf3cfa6741122af4ad3e803aR254 go some way towards this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have functionality that currently moves <x>
to <x>_extracted
if the label we extract from i.e. JSON is already present in the log stream's labels. It's how we deal with label collision during extraction.
See https://github.com/grafana/loki/blob/master/pkg/logql/log/parser.go#L123-L125
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! I didn't actually know about that behaviour. Fixed
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
…ng/loki into dannykopping/json_expressions
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
I've modified the benchmarks a bit, so you can disregard the previous results. There are now equivalent JSON & JSONExpression benchmarks: $ go test -run=NONE -bench='JSON.*Parser$' -benchtime 5s -count 3 ./pkg/logql/log
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
BenchmarkJSONParser-8 3266301 1712 ns/op 992 B/op 14 allocs/op
BenchmarkJSONParser-8 3213999 2054 ns/op 992 B/op 14 allocs/op
BenchmarkJSONParser-8 3021211 1859 ns/op 992 B/op 14 allocs/op
BenchmarkJSONExpressionParser-8 3185694 1884 ns/op 936 B/op 16 allocs/op
BenchmarkJSONExpressionParser-8 3250711 1903 ns/op 936 B/op 16 allocs/op
BenchmarkJSONExpressionParser-8 3172682 1845 ns/op 936 B/op 16 allocs/op
PASS
ok github.com/grafana/loki/pkg/logql/log 47.300s And also benchmarks around handling invalid lines: $ go test -run=NONE -bench='JSON.*Invalid' -benchtime 5s -count 3 ./pkg/logql/log
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
BenchmarkJSONParserInvalidLine-8 12737628 424 ns/op 112 B/op 4 allocs/op
BenchmarkJSONParserInvalidLine-8 14350429 456 ns/op 112 B/op 4 allocs/op
BenchmarkJSONParserInvalidLine-8 10928316 482 ns/op 112 B/op 4 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8 6386509 999 ns/op 336 B/op 11 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8 6215274 969 ns/op 336 B/op 11 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8 6367006 932 ns/op 336 B/op 11 allocs/op
PASS
ok github.com/grafana/loki/pkg/logql/log 39.899s |
… label_format Signed-off-by: Danny Kopping <[email protected]>
ping @cyriltovena if you want to review this before I merge it |
Yeah I will today |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
* New approach, still rough Signed-off-by: Danny Kopping <[email protected]> * Adding benchmark Signed-off-by: Danny Kopping <[email protected]> * Adding tests Signed-off-by: Danny Kopping <[email protected]> * Minor refactoring Signed-off-by: Danny Kopping <[email protected]> * Appeasing the linter Signed-off-by: Danny Kopping <[email protected]> * Further appeasing the linter Signed-off-by: Danny Kopping <[email protected]> * Adding more tests Signed-off-by: Danny Kopping <[email protected]> * Adding documentation Signed-off-by: Danny Kopping <[email protected]> * Docs fixup Signed-off-by: Danny Kopping <[email protected]> * Removing unnecessary condition Signed-off-by: Danny Kopping <[email protected]> * Adding extra tests from suggestion in review Signed-off-by: Danny Kopping <[email protected]> * Adding JSONParseErr Signed-off-by: Danny Kopping <[email protected]> * Adding test to cover invalid JSON line Signed-off-by: Danny Kopping <[email protected]> * Adding equivalent benchmarks for JSON and JSONExpression parsing Signed-off-by: Danny Kopping <[email protected]> * Adding suffix if label would be overridden Signed-off-by: Danny Kopping <[email protected]> * Reparenting jsonexpr directory to more appropriate location Signed-off-by: Danny Kopping <[email protected]> * Setting empty label on non-matching expression, to retain parity with label_format Signed-off-by: Danny Kopping <[email protected]> * Adding statement about returned complex JSON types Signed-off-by: Danny Kopping <[email protected]> * Added check for valid label name Signed-off-by: Danny Kopping <[email protected]> * Making json expressions shardable Signed-off-by: Danny Kopping <[email protected]>
What this PR does / why we need it:
This PR introduces the following synax in LogQL:
where a log entry would contain:
{"proxy_protocol_addr": "","remote_addr": "1.2.3.4","remote_user": "api_key","upstream_addr": "10.57.54.72:80","the_real_ip": "10.56.136.15","timestamp": "2021-01-26T10:01:09+00:00", "request": {"method" : "POST", "headers": {"user_agent": "Go-http-client/1.1"},"referer": ""},"response": {"status": 200,"upstream_status": "200","size": "6379","size_sent": "6385","latency_seconds": "0.063"}}
...which would produce a single label named
resp_code
with a value of"200"
.This also adds support for accessing JSON arrays.
Which issue(s) this PR fixes:
N/A
Special notes for your reviewer:
We discussed introducing JMESPath, but decided against it because it adds a lot of complexity (DSL within a DSL) and we probably won't need most of its features.
This PR attempts to combine the ideas of JMESPath with the syntax of
label_format
Checklist
Future optimisations: