Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LogQL: Simple JSON expressions #3280

Merged

Conversation

dannykopping
Copy link
Contributor

@dannykopping dannykopping commented Feb 2, 2021

What this PR does / why we need it:
This PR introduces the following synax in LogQL:

{app="foo"} | json resp_code="response.status"

where a log entry would contain:

{"proxy_protocol_addr": "","remote_addr": "1.2.3.4","remote_user": "api_key","upstream_addr": "10.57.54.72:80","the_real_ip": "10.56.136.15","timestamp": "2021-01-26T10:01:09+00:00", "request": {"method" : "POST", "headers": {"user_agent": "Go-http-client/1.1"},"referer": ""},"response": {"status": 200,"upstream_status": "200","size": "6379","size_sent": "6385","latency_seconds": "0.063"}}

...which would produce a single label named resp_code with a value of "200".
This also adds support for accessing JSON arrays.

Which issue(s) this PR fixes:
N/A

Special notes for your reviewer:
We discussed introducing JMESPath, but decided against it because it adds a lot of complexity (DSL within a DSL) and we probably won't need most of its features.

This PR attempts to combine the ideas of JMESPath with the syntax of label_format

Checklist

  • Documentation added
  • Tests updated

Future optimisations:

  • consider using a parser pool to prevent excessive allocations

Danny Kopping added 3 commits February 2, 2021 22:53
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
@dannykopping
Copy link
Contributor Author

dannykopping commented Feb 2, 2021

Benchmarks:

benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                           old ns/op     new ns/op     delta
BenchmarkJSONExpressionParser-8     2204          2237          +1.50%

benchmark                           old allocs     new allocs     delta
BenchmarkJSONExpressionParser-8     21             20             -4.76%

benchmark                           old bytes     new bytes     delta
BenchmarkJSONExpressionParser-8     1120          1088          -2.86%

Methodology:
Ran BenchmarkJSONExpressionParser with the new JSONExpressionParser and with the existing JSON Parser

We got lucky here since jsoniter exposes an API for extracting fields from a JSON document in a very convenient manner, and very efficiently.

Danny Kopping added 4 commits February 2, 2021 23:42
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
Signed-off-by: Danny Kopping <[email protected]>
@dannykopping dannykopping marked this pull request as ready for review February 3, 2021 09:46
Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All in all, looks good -- I expect you can teach us a few thing about yacc.

I left a few discussion points.


for {
r := sc.read()
if r == '"' || r == 1 || r == ']' {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the significance of || r == 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually nothing! I think I was trying something out and this is an artifact of that; thanks for catching


for identifier, paths := range j.expressions {
result := jsoniter.ConfigFastest.Get(line, paths...).ToString()
if result == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return line, false in this case? i.e. if | json foo=bar.baz is specified against the line {} log line, what should happen?

Should we implicitly filter these? Is it a best-effort label setting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't because this is in a loop of multiple expressions - one expression may not evaluate to a value, but a subsequent one could.

Any expression that doesn't return a value will be ignored. If any expression is evaluated against an empty JSON document, it'll return InvalidType which will be stringed to "".

Could you expand a bit on what you mean by this?

Should we implicitly filter these? Is it a best-effort label setting?

Copy link
Member

@owen-d owen-d Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current JSON parser will set a JSONParserErr when it finds malformed JSON here: https://github.com/grafana/loki/blob/master/pkg/logql/log/parser.go#L54.
I'm curious if we should do something similar here if one of the expected labels does not exist, because we're being explicit.

My original comment was trying to say "if one of the explicitly extracted labels doesn't exist, should we drop the line from this result set?", or Does this new json extension imply a filter operation?

For instance, if we try to extract the "foo" label from an entry like {"bar": 2}, should we skip this line? Should we add an error label such as JSONExpectedLabelNotPresent? Should we ignore it and continue (bummer if you aggregated on foo)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for clarifying. I think we need to distinguish here between incorrect and malformed expressions.
For example, with label_format if you use an incorrect expression (i.e. one that doesn't return a value), it will evaluate correctly (blank result).

image

A malformed expression (i.e. one that breaks syntactic rules) should indeed raise an error:
image

You're right though that this PR doesn't set a __error__="JSONParserErr" label, I'll fix that.

How do you think we should handle malformed/incorrect expressions though?
I'm thinking we follow the behaviour of label_format: for malformed queries, raise an error that fails the whole query, and for incorrect queries just set a blank label. We don't, AFAIK, have a mechanism to mark a particular label as being problematic, and since we could be returning multiple labels here this gets a bit tricky in terms of UX.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep parity with what we have & set empty labels when they don't have corresponding JSON fields. i.e. extracting | json foo="foo" from {"bar": 2} should just yield a foo="" label.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

expressions []JSONExpression
lbs labels.Labels
want labels.Labels
}{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test against labels that already exist, such as:

"duplicate field"
[]byte(`{"foo": "bar"}`),
[]JSONExpression{NewJSONExpr("foo": "foo"},
labels.Labels{"foo": "bazz"}
labels.Labels{"foo": "bazz", "foo_extracted": "bar"}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would foo_extracted come from?
[]JSONExpression{NewJSONExpr("foo": "foo"}, would need to be []JSONExpression{NewJSONExpr("foo_extracted": "foo"}, in this case. Or am I missing something?

Does https://github.com/grafana/loki/pull/3280/files#diff-c3a5c0501669f6c25bb23430a1f1bb4fcce88bfccf3cfa6741122af4ad3e803aR254 go some way towards this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have functionality that currently moves <x> to <x>_extracted if the label we extract from i.e. JSON is already present in the log stream's labels. It's how we deal with label collision during extraction.

See https://github.com/grafana/loki/blob/master/pkg/logql/log/parser.go#L123-L125

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I didn't actually know about that behaviour. Fixed

@dannykopping
Copy link
Contributor Author

I've modified the benchmarks a bit, so you can disregard the previous results. There are now equivalent JSON & JSONExpression benchmarks:

$ go test -run=NONE -bench='JSON.*Parser$' -benchtime 5s -count 3 ./pkg/logql/log
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
BenchmarkJSONParser-8                    3266301              1712 ns/op             992 B/op         14 allocs/op
BenchmarkJSONParser-8                    3213999              2054 ns/op             992 B/op         14 allocs/op
BenchmarkJSONParser-8                    3021211              1859 ns/op             992 B/op         14 allocs/op
BenchmarkJSONExpressionParser-8          3185694              1884 ns/op             936 B/op         16 allocs/op
BenchmarkJSONExpressionParser-8          3250711              1903 ns/op             936 B/op         16 allocs/op
BenchmarkJSONExpressionParser-8          3172682              1845 ns/op             936 B/op         16 allocs/op
PASS
ok      github.com/grafana/loki/pkg/logql/log   47.300s

And also benchmarks around handling invalid lines:

$ go test -run=NONE -bench='JSON.*Invalid' -benchtime 5s -count 3 ./pkg/logql/log 
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
BenchmarkJSONParserInvalidLine-8                12737628               424 ns/op             112 B/op          4 allocs/op
BenchmarkJSONParserInvalidLine-8                14350429               456 ns/op             112 B/op          4 allocs/op
BenchmarkJSONParserInvalidLine-8                10928316               482 ns/op             112 B/op          4 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8       6386509               999 ns/op             336 B/op         11 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8       6215274               969 ns/op             336 B/op         11 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8       6367006               932 ns/op             336 B/op         11 allocs/op
PASS
ok      github.com/grafana/loki/pkg/logql/log   39.899s

@owen-d
Copy link
Member

owen-d commented Feb 8, 2021

ping @cyriltovena if you want to review this before I merge it

@cyriltovena
Copy link
Contributor

Yeah I will today

pkg/logql/ast.go Outdated Show resolved Hide resolved
Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Danny Kopping added 3 commits February 9, 2021 14:15
@cyriltovena cyriltovena merged commit feb7fb4 into grafana:master Feb 9, 2021
@dannykopping dannykopping deleted the dannykopping/json_expressions branch February 9, 2021 13:42
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Feb 15, 2021
* New approach, still rough

Signed-off-by: Danny Kopping <[email protected]>

* Adding benchmark

Signed-off-by: Danny Kopping <[email protected]>

* Adding tests

Signed-off-by: Danny Kopping <[email protected]>

* Minor refactoring

Signed-off-by: Danny Kopping <[email protected]>

* Appeasing the linter

Signed-off-by: Danny Kopping <[email protected]>

* Further appeasing the linter

Signed-off-by: Danny Kopping <[email protected]>

* Adding more tests

Signed-off-by: Danny Kopping <[email protected]>

* Adding documentation

Signed-off-by: Danny Kopping <[email protected]>

* Docs fixup

Signed-off-by: Danny Kopping <[email protected]>

* Removing unnecessary condition

Signed-off-by: Danny Kopping <[email protected]>

* Adding extra tests from suggestion in review

Signed-off-by: Danny Kopping <[email protected]>

* Adding JSONParseErr

Signed-off-by: Danny Kopping <[email protected]>

* Adding test to cover invalid JSON line

Signed-off-by: Danny Kopping <[email protected]>

* Adding equivalent benchmarks for JSON and JSONExpression parsing

Signed-off-by: Danny Kopping <[email protected]>

* Adding suffix if label would be overridden

Signed-off-by: Danny Kopping <[email protected]>

* Reparenting jsonexpr directory to more appropriate location

Signed-off-by: Danny Kopping <[email protected]>

* Setting empty label on non-matching expression, to retain parity with label_format

Signed-off-by: Danny Kopping <[email protected]>

* Adding statement about returned complex JSON types

Signed-off-by: Danny Kopping <[email protected]>

* Added check for valid label name

Signed-off-by: Danny Kopping <[email protected]>

* Making json expressions shardable

Signed-off-by: Danny Kopping <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants