-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster regexp prefixes #728
base: main
Are you sure you want to change the base?
Conversation
f9ee6fd
to
e4f85f7
Compare
30ca2ce
to
5e6f0bd
Compare
Given a regex like `a(b|c).*`, find ab and ac as prefixes. Even though the original may have looked like `ab.*|ac.*`, Go extracts the sub-prefixes, but we want the longer strings when the optimisation to put them into a map applies. Signed-off-by: Bryan Boreham <[email protected]>
Up to 32-byte values this saves garbage, runs faster. For prefixes, only `toLower` the part we need for the map lookup. Split toNormalisedLower into fast and slow paths, to avoid a penalty for the `copy` call in the case where no allocations are done. Signed-off-by: Bryan Boreham <[email protected]>
The case-sensitive path goes ~10% faster if we don't stop to ask whether it is insensitive. Signed-off-by: Bryan Boreham <[email protected]>
5e6f0bd
to
0f12cf6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting PR. I think the overall idea is nice, but I think I've found a potential issue. Generally speaking, I think we're lacking some tests around edge cases that could be introduced by these changes. Also the fact that I've run fuzzy tests for 5m and they passed, but my simple test has failed is a bit concerning about the tests coverage we have here :|
m.setMatches = matches | ||
m.stringMatcher = stringMatcherFromRegexp(parsed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I think this is superfluous and can be removed to keep code easier to follow. If there are setMatches
we only use them in matcher (see compileMatchStringFunction()
) so stringMatcher
is never used.
@@ -164,27 +171,27 @@ func (m *FastRegexMatcher) IsOptimized() bool { | |||
// findSetMatches extract equality matches from a regexp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] This comment needs to be updated. It returns prefixes too now.
} | ||
|
||
func findSetMatchesFromAlternate(re *syntax.Regexp, base string) (matches []string, matchesCaseSensitive bool) { | ||
func findSetMatchesFromAlternate(re *syntax.Regexp, base string) (matches []string, prefixes []prefixMatcher, matchesCaseSensitive bool) { | ||
matches = []string{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? 🤔
if i > 0 && re.Sub[i].Op == syntax.OpStar && len(matches) > 0 { | ||
prefixes = make([]prefixMatcher, 0, len(matches)) | ||
for _, prefix := range matches { | ||
right := stringMatcherFromRegexpInternal(re.Sub[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we're matching re.Sub[i]
as the right side, but what if i
is not the last sub of regexp and there are others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example this test fails:
const pattern = "a.*a|b.*b|c.*c|d.*d|e.*e|f.*f|g.*g|h.*h|i.*i|l.*l|m.*m|n.*n|o.*o|p.*p|q.*q|r.*r|s.*s|t.*t|u.*u|v.*v|z.*z"
standard := regexp.MustCompile(pattern)
require.True(t, standard.MatchString("aa"))
require.True(t, standard.MatchString("aXa"))
require.False(t, standard.MatchString("aXb"))
matcher, err := newFastRegexMatcherWithoutCache(pattern)
require.NoError(t, err)
require.True(t, matcher.MatchString("aa"))
require.True(t, matcher.MatchString("aXa"))
require.False(t, matcher.MatchString("aXb"))
if i == len(re.Sub)-1 && len(p) > 0 { | ||
prefixes = append(prefixes, p...) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got lost here. What is this used for? Can we have a code commend to explain it?
This is cherry-picking prometheus/prometheus#15210 and prometheus/prometheus#15237.
Do a better job of regexps with a long series of choices followed by
.*
, like:Benchmarks: