Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regexp simplification #1787

Merged
merged 16 commits into from
Mar 10, 2020
Merged

Regexp simplification #1787

merged 16 commits into from
Mar 10, 2020

Conversation

cyriltovena
Copy link
Contributor

@cyriltovena cyriltovena commented Mar 10, 2020

This PRs introduces simplification of regexp filter by parsing regexp and replacing them with combination of bytes.Contains aka literal filter.

The reasoning behind is that many regexp can be simplified to execute faster than actual regexp that requires backtracking and string capturing.

Currently we only support recursively those regexp operations:

  • literal (foo)
  • alternate (foo|bar)
  • concat (foo.*) and (f(oo|bar))
  • capture of any above.

Obviously we can support more (I have a lot of ideas) but I wanted to started with this for now and focus on what is next based on usage.

Now regarding benchmark I'm really impressed see below:

goos: darwin
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql
Benchmark_LineFilter/default_true_foo.*-4         	21158535	        52.0 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_foo.*-4      	85218762	        14.0 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_foo.*-4        	22269228	        54.7 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_foo.*-4     	70311904	        17.3 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_.*foo.*-4       	  266133	      4547 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_.*foo.*-4    	86887345	        13.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_.*foo.*-4      	  265743	      4561 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_.*foo.*-4   	70870168	        17.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_.*foo-4         	  264886	      4608 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_.*foo-4      	86199532	        14.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_.*foo-4        	  214016	      5971 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_.*foo-4     	70008873	        18.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_foo|bar-4       	 3362852	       337 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_foo|bar-4    	26095603	        53.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_foo|bar-4      	 3665444	       381 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_foo|bar-4   	26247618	        43.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_foo|bar|buzz-4  	 3709946	       322 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_foo|bar|buzz-4         	29003151	        41.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_foo|bar|buzz-4           	 3651050	       323 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_foo|bar|buzz-4        	24917181	        53.7 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_foo|(bar|buzz)-4          	 2345220	       544 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_foo|(bar|buzz)-4       	28986980	        43.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_foo|(bar|buzz)-4         	 3044107	       402 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_foo|(bar|buzz)-4      	24970191	        49.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_foo|bar.*|buzz-4          	  568879	      2115 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_foo|bar.*|buzz-4       	29001889	        42.7 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_foo|bar.*|buzz-4         	  561146	      2210 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_foo|bar.*|buzz-4      	24876418	        48.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_.*foo.*|bar|uzz-4         	  320361	      3819 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_.*foo.*|bar|uzz-4      	26418298	        43.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_.*foo.*|bar|uzz-4        	  319743	      3692 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_.*foo.*|bar|uzz-4     	24772117	        48.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_((f.*)|foobar.*)|.*buzz-4 	  513464	      2356 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_((f.*)|foobar.*)|.*buzz-4         	14787681	        84.6 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_((f.*)|foobar.*)|.*buzz-4           	  385447	      2688 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_((f.*)|foobar.*)|.*buzz-4        	12987230	        96.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_true_(?P<foo>.*foo.*|bar)-4               	  323151	      5924 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_true_(?P<foo>.*foo.*|bar)-4            	27697644	        39.8 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/default_false_(?P<foo>.*foo.*|bar)-4              	  263703	      3805 ns/op	       0 B/op	       0 allocs/op
Benchmark_LineFilter/simplified_false_(?P<foo>.*foo.*|bar)-4           	27809758	        46.5 ns/op	       0 B/op	       0 allocs/op

Basically the smallest improvement is 5x and the biggest one is 333x 😮 !

Fixes #1542 and more !

Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
git push

Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
Signed-off-by: Cyril Tovena <[email protected]>
@cyriltovena cyriltovena requested review from pstibrany and owen-d March 10, 2020 02:32
@cyriltovena cyriltovena changed the title Regex simplification Regexp simplification Mar 10, 2020
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautiful!

It took me a while to understand simplifyConcat and simplifyConcatAlternate, but with many tests with good coverage, it wasn't difficult to see what it is doing.

Well done, and I look forward to see more optimizations here.

pkg/logql/filter_test.go Outdated Show resolved Hide resolved
pkg/logql/filter_test.go Show resolved Hide resolved
pkg/logql/filter.go Show resolved Hide resolved
pkg/logql/filter.go Outdated Show resolved Hide resolved
pkg/logql/filter.go Outdated Show resolved Hide resolved
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good, I just have a few comments and questions.

pkg/logql/filter.go Outdated Show resolved Hide resolved
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!

Signed-off-by: Cyril Tovena <[email protected]>
@codecov-io
Copy link

Codecov Report

Merging #1787 into master will increase coverage by 0.32%.
The diff coverage is 88.34%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1787      +/-   ##
==========================================
+ Coverage   64.01%   64.34%   +0.32%     
==========================================
  Files         121      122       +1     
  Lines        9027     9143     +116     
==========================================
+ Hits         5779     5883     +104     
- Misses       2844     2853       +9     
- Partials      404      407       +3
Impacted Files Coverage Δ
pkg/chunkenc/interface.go 87.5% <ø> (ø) ⬆️
pkg/chunkenc/dumb_chunk.go 0% <0%> (ø) ⬆️
pkg/ingester/tailer.go 38.01% <0%> (ø) ⬆️
pkg/chunkenc/lazy_chunk.go 0% <0%> (ø) ⬆️
pkg/logentry/stages/match.go 93.33% <100%> (ø) ⬆️
pkg/ingester/stream.go 77.77% <100%> (ø) ⬆️
pkg/storage/iterator.go 85.82% <100%> (ø) ⬆️
pkg/logql/ast.go 88.83% <66.66%> (+0.65%) ⬆️
pkg/chunkenc/memchunk.go 72.78% <83.33%> (ø) ⬆️
pkg/logql/filter.go 91.36% <91.36%> (ø)
... and 2 more

@cyriltovena cyriltovena merged commit d34b0a9 into grafana:master Mar 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize multiple simple regex "or" filters to use byte.contains
4 participants