LogQL: Simple JSON expressions #3280

dannykopping · 2021-02-02T21:18:14Z

What this PR does / why we need it:
This PR introduces the following synax in LogQL:

{app="foo"} | json resp_code="response.status"

where a log entry would contain:

{"proxy_protocol_addr": "","remote_addr": "1.2.3.4","remote_user": "api_key","upstream_addr": "10.57.54.72:80","the_real_ip": "10.56.136.15","timestamp": "2021-01-26T10:01:09+00:00", "request": {"method" : "POST", "headers": {"user_agent": "Go-http-client/1.1"},"referer": ""},"response": {"status": 200,"upstream_status": "200","size": "6379","size_sent": "6385","latency_seconds": "0.063"}}

...which would produce a single label named resp_code with a value of "200".
This also adds support for accessing JSON arrays.

Which issue(s) this PR fixes:
N/A

Special notes for your reviewer:
We discussed introducing JMESPath, but decided against it because it adds a lot of complexity (DSL within a DSL) and we probably won't need most of its features.

This PR attempts to combine the ideas of JMESPath with the syntax of label_format

Checklist

Documentation added
Tests updated

Future optimisations:

consider using a parser pool to prevent excessive allocations

Signed-off-by: Danny Kopping <[email protected]>

dannykopping · 2021-02-02T21:37:40Z

Benchmarks:

benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                           old ns/op     new ns/op     delta
BenchmarkJSONExpressionParser-8     2204          2237          +1.50%

benchmark                           old allocs     new allocs     delta
BenchmarkJSONExpressionParser-8     21             20             -4.76%

benchmark                           old bytes     new bytes     delta
BenchmarkJSONExpressionParser-8     1120          1088          -2.86%

Methodology:
Ran BenchmarkJSONExpressionParser with the new JSONExpressionParser and with the existing JSON Parser

We got lucky here since jsoniter exposes an API for extracting fields from a JSON document in a very convenient manner, and very efficiently.

Signed-off-by: Danny Kopping <[email protected]>

pkg/logql/log/parser.go

Signed-off-by: Danny Kopping <[email protected]>

owen-d

All in all, looks good -- I expect you can teach us a few thing about yacc.

I left a few discussion points.

owen-d · 2021-02-03T17:50:00Z

pkg/logql/jsonexpr/lexer.go

+
+	for {
+		r := sc.read()
+		if r == '"' || r == 1 || r == ']' {


What's the significance of || r == 1?

Actually nothing! I think I was trying something out and this is an artifact of that; thanks for catching

owen-d · 2021-02-03T18:16:11Z

pkg/logql/log/parser.go

+
+	for identifier, paths := range j.expressions {
+		result := jsoniter.ConfigFastest.Get(line, paths...).ToString()
+		if result == "" {


Should we return line, false in this case? i.e. if | json foo=bar.baz is specified against the line {} log line, what should happen?

Should we implicitly filter these? Is it a best-effort label setting?

We can't because this is in a loop of multiple expressions - one expression may not evaluate to a value, but a subsequent one could.

Any expression that doesn't return a value will be ignored. If any expression is evaluated against an empty JSON document, it'll return InvalidType which will be stringed to "".

Could you expand a bit on what you mean by this?

Should we implicitly filter these? Is it a best-effort label setting?

The current JSON parser will set a JSONParserErr when it finds malformed JSON here: https://github.com/grafana/loki/blob/master/pkg/logql/log/parser.go#L54.
I'm curious if we should do something similar here if one of the expected labels does not exist, because we're being explicit.

My original comment was trying to say "if one of the explicitly extracted labels doesn't exist, should we drop the line from this result set?", or Does this new json extension imply a filter operation?

For instance, if we try to extract the "foo" label from an entry like {"bar": 2}, should we skip this line? Should we add an error label such as JSONExpectedLabelNotPresent? Should we ignore it and continue (bummer if you aggregated on foo)?

OK, thanks for clarifying. I think we need to distinguish here between incorrect and malformed expressions.
For example, with label_format if you use an incorrect expression (i.e. one that doesn't return a value), it will evaluate correctly (blank result).

A malformed expression (i.e. one that breaks syntactic rules) should indeed raise an error:

You're right though that this PR doesn't set a __error__="JSONParserErr" label, I'll fix that.

How do you think we should handle malformed/incorrect expressions though?
I'm thinking we follow the behaviour of label_format: for malformed queries, raise an error that fails the whole query, and for incorrect queries just set a blank label. We don't, AFAIK, have a mechanism to mark a particular label as being problematic, and since we could be returning multiple labels here this gets a bit tricky in terms of UX.

Let's keep parity with what we have & set empty labels when they don't have corresponding JSON fields. i.e. extracting | json foo="foo" from {"bar": 2} should just yield a foo="" label.

owen-d · 2021-02-03T19:26:17Z

pkg/logql/log/parser_test.go

+		expressions []JSONExpression
+		lbs         labels.Labels
+		want        labels.Labels
+	}{


Can we add a test against labels that already exist, such as:

"duplicate field" []byte(`{"foo": "bar"}`), []JSONExpression{NewJSONExpr("foo": "foo"}, labels.Labels{"foo": "bazz"} labels.Labels{"foo": "bazz", "foo_extracted": "bar"}

Where would foo_extracted come from?
[]JSONExpression{NewJSONExpr("foo": "foo"}, would need to be []JSONExpression{NewJSONExpr("foo_extracted": "foo"}, in this case. Or am I missing something?

Does https://github.com/grafana/loki/pull/3280/files#diff-c3a5c0501669f6c25bb23430a1f1bb4fcce88bfccf3cfa6741122af4ad3e803aR254 go some way towards this?

We have functionality that currently moves <x> to <x>_extracted if the label we extract from i.e. JSON is already present in the log stream's labels. It's how we deal with label collision during extraction.

See https://github.com/grafana/loki/blob/master/pkg/logql/log/parser.go#L123-L125

Ah! I didn't actually know about that behaviour. Fixed

Signed-off-by: Danny Kopping <[email protected]>

…_expressions

…ng/loki into dannykopping/json_expressions

Signed-off-by: Danny Kopping <[email protected]>

dannykopping · 2021-02-04T07:45:47Z

I've modified the benchmarks a bit, so you can disregard the previous results. There are now equivalent JSON & JSONExpression benchmarks:

$ go test -run=NONE -bench='JSON.*Parser$' -benchtime 5s -count 3 ./pkg/logql/log
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
BenchmarkJSONParser-8                    3266301              1712 ns/op             992 B/op         14 allocs/op
BenchmarkJSONParser-8                    3213999              2054 ns/op             992 B/op         14 allocs/op
BenchmarkJSONParser-8                    3021211              1859 ns/op             992 B/op         14 allocs/op
BenchmarkJSONExpressionParser-8          3185694              1884 ns/op             936 B/op         16 allocs/op
BenchmarkJSONExpressionParser-8          3250711              1903 ns/op             936 B/op         16 allocs/op
BenchmarkJSONExpressionParser-8          3172682              1845 ns/op             936 B/op         16 allocs/op
PASS
ok      github.com/grafana/loki/pkg/logql/log   47.300s

And also benchmarks around handling invalid lines:

$ go test -run=NONE -bench='JSON.*Invalid' -benchtime 5s -count 3 ./pkg/logql/log 
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
BenchmarkJSONParserInvalidLine-8                12737628               424 ns/op             112 B/op          4 allocs/op
BenchmarkJSONParserInvalidLine-8                14350429               456 ns/op             112 B/op          4 allocs/op
BenchmarkJSONParserInvalidLine-8                10928316               482 ns/op             112 B/op          4 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8       6386509               999 ns/op             336 B/op         11 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8       6215274               969 ns/op             336 B/op         11 allocs/op
BenchmarkJSONExpressionParserInvalidLine-8       6367006               932 ns/op             336 B/op         11 allocs/op
PASS
ok      github.com/grafana/loki/pkg/logql/log   39.899s

… label_format Signed-off-by: Danny Kopping <[email protected]>

owen-d · 2021-02-08T13:41:33Z

ping @cyriltovena if you want to review this before I merge it

cyriltovena · 2021-02-09T06:49:40Z

Yeah I will today

docs/sources/logql/_index.md

pkg/logql/ast.go

pkg/logql/expr.y

pkg/logql/log/parser.go

cyriltovena

LGTM

Signed-off-by: Danny Kopping <[email protected]>

* New approach, still rough Signed-off-by: Danny Kopping <[email protected]> * Adding benchmark Signed-off-by: Danny Kopping <[email protected]> * Adding tests Signed-off-by: Danny Kopping <[email protected]> * Minor refactoring Signed-off-by: Danny Kopping <[email protected]> * Appeasing the linter Signed-off-by: Danny Kopping <[email protected]> * Further appeasing the linter Signed-off-by: Danny Kopping <[email protected]> * Adding more tests Signed-off-by: Danny Kopping <[email protected]> * Adding documentation Signed-off-by: Danny Kopping <[email protected]> * Docs fixup Signed-off-by: Danny Kopping <[email protected]> * Removing unnecessary condition Signed-off-by: Danny Kopping <[email protected]> * Adding extra tests from suggestion in review Signed-off-by: Danny Kopping <[email protected]> * Adding JSONParseErr Signed-off-by: Danny Kopping <[email protected]> * Adding test to cover invalid JSON line Signed-off-by: Danny Kopping <[email protected]> * Adding equivalent benchmarks for JSON and JSONExpression parsing Signed-off-by: Danny Kopping <[email protected]> * Adding suffix if label would be overridden Signed-off-by: Danny Kopping <[email protected]> * Reparenting jsonexpr directory to more appropriate location Signed-off-by: Danny Kopping <[email protected]> * Setting empty label on non-matching expression, to retain parity with label_format Signed-off-by: Danny Kopping <[email protected]> * Adding statement about returned complex JSON types Signed-off-by: Danny Kopping <[email protected]> * Added check for valid label name Signed-off-by: Danny Kopping <[email protected]> * Making json expressions shardable Signed-off-by: Danny Kopping <[email protected]>

Danny Kopping added 3 commits February 2, 2021 22:53

New approach, still rough

fdbb6ef

Signed-off-by: Danny Kopping <[email protected]>

Adding benchmark

9c7eec2

Signed-off-by: Danny Kopping <[email protected]>

Adding tests

8620235

Signed-off-by: Danny Kopping <[email protected]>

pull-request-size bot added the size/XXL label Feb 2, 2021

Minor refactoring

f7bd1f8

Signed-off-by: Danny Kopping <[email protected]>

Danny Kopping added 4 commits February 2, 2021 23:42

Appeasing the linter

807d4a4

Signed-off-by: Danny Kopping <[email protected]>

Further appeasing the linter

51ee0f5

Signed-off-by: Danny Kopping <[email protected]>

Adding more tests

ae96ca9

Signed-off-by: Danny Kopping <[email protected]>

Adding documentation

bc15fbe

Signed-off-by: Danny Kopping <[email protected]>

dannykopping commented Feb 3, 2021

View reviewed changes

pkg/logql/log/parser.go Show resolved Hide resolved

Docs fixup

8078304

Signed-off-by: Danny Kopping <[email protected]>

dannykopping marked this pull request as ready for review February 3, 2021 09:46

dannykopping requested review from cyriltovena and owen-d February 3, 2021 09:46

owen-d reviewed Feb 3, 2021

View reviewed changes

Danny Kopping added 9 commits February 3, 2021 22:28

Removing unnecessary condition

608c570

Signed-off-by: Danny Kopping <[email protected]>

Adding extra tests from suggestion in review

4b3b315

Signed-off-by: Danny Kopping <[email protected]>

Adding JSONParseErr

79abe05

Signed-off-by: Danny Kopping <[email protected]>

Merge remote-tracking branch 'upstream/master' into dannykopping/json…

cf509af

…_expressions

Merge branch 'dannykopping/json_expressions' of github.com:dannykoppi…

640cb0c

…ng/loki into dannykopping/json_expressions

Adding test to cover invalid JSON line

55c58cd

Signed-off-by: Danny Kopping <[email protected]>

Adding equivalent benchmarks for JSON and JSONExpression parsing

d932763

Signed-off-by: Danny Kopping <[email protected]>

Adding suffix if label would be overridden

37ec118

Signed-off-by: Danny Kopping <[email protected]>

Reparenting jsonexpr directory to more appropriate location

70bfda5

Signed-off-by: Danny Kopping <[email protected]>

Setting empty label on non-matching expression, to retain parity with…

7e081ca

… label_format Signed-off-by: Danny Kopping <[email protected]>

owen-d approved these changes Feb 5, 2021

View reviewed changes

cyriltovena reviewed Feb 9, 2021

View reviewed changes

docs/sources/logql/_index.md Show resolved Hide resolved

cyriltovena reviewed Feb 9, 2021

View reviewed changes

pkg/logql/ast.go Outdated Show resolved Hide resolved

cyriltovena reviewed Feb 9, 2021

View reviewed changes

pkg/logql/expr.y Show resolved Hide resolved

cyriltovena reviewed Feb 9, 2021

View reviewed changes

pkg/logql/log/parser.go Show resolved Hide resolved

cyriltovena approved these changes Feb 9, 2021

View reviewed changes

Danny Kopping added 3 commits February 9, 2021 14:15

Adding statement about returned complex JSON types

1ad2976

Signed-off-by: Danny Kopping <[email protected]>

Added check for valid label name

61f5a1e

Signed-off-by: Danny Kopping <[email protected]>

Making json expressions shardable

7a8a4e2

Signed-off-by: Danny Kopping <[email protected]>

cyriltovena merged commit feb7fb4 into grafana:master Feb 9, 2021

dannykopping deleted the dannykopping/json_expressions branch February 9, 2021 13:42

dannykopping mentioned this pull request Feb 15, 2021

What do you want from Loki in 2021? #3119

Open

This was referenced Feb 18, 2021

Loki fails to create labels containing the symbol \uf04a #3038

Closed

LogQL: Selective label extraction with logfmt pipeline #3355

Closed

btaani mentioned this pull request Sep 29, 2022

Add logfmt selective label extraction #6675

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LogQL: Simple JSON expressions #3280

LogQL: Simple JSON expressions #3280

dannykopping commented Feb 2, 2021 •

edited

Loading

dannykopping commented Feb 2, 2021 •

edited

Loading

owen-d left a comment

owen-d Feb 3, 2021

dannykopping Feb 3, 2021

owen-d Feb 3, 2021

dannykopping Feb 3, 2021

owen-d Feb 3, 2021 •

edited

Loading

dannykopping Feb 4, 2021

owen-d Feb 4, 2021

dannykopping Feb 5, 2021

owen-d Feb 3, 2021

dannykopping Feb 3, 2021

owen-d Feb 3, 2021

dannykopping Feb 4, 2021

dannykopping commented Feb 4, 2021

owen-d commented Feb 8, 2021

cyriltovena commented Feb 9, 2021

cyriltovena left a comment

LogQL: Simple JSON expressions #3280

LogQL: Simple JSON expressions #3280

Conversation

dannykopping commented Feb 2, 2021 • edited Loading

dannykopping commented Feb 2, 2021 • edited Loading

owen-d left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

owen-d Feb 3, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dannykopping commented Feb 4, 2021

owen-d commented Feb 8, 2021

cyriltovena commented Feb 9, 2021

cyriltovena left a comment

Choose a reason for hiding this comment

dannykopping commented Feb 2, 2021 •

edited

Loading

dannykopping commented Feb 2, 2021 •

edited

Loading

owen-d Feb 3, 2021 •

edited

Loading