feat: optimize common case of GlobPath #180

Merged: 3 commits, Dec 16, 2024
internal/strdist/strdist.go: 43 additions, 0 deletions

@@ -105,6 +105,15 @@ func Distance(a, b string, f CostFunc, cut int64) int64 {
// * - Any zero or more characters, except for /
// ** - Any zero or more characters, including /
func GlobPath(a, b string) bool {
	if !wildcardPrefixMatch(a, b) {
		// Fast path.
		return false
	}
	if !wildcardSuffixMatch(a, b) {
		// Fast path.
		return false
	}

	a = strings.ReplaceAll(a, "**", "⁑")
	b = strings.ReplaceAll(b, "**", "⁑")
	return Distance(a, b, globCost, 1) == 0
@@ -125,3 +134,37 @@ func globCost(ar, br rune) Cost {
	}
	return Cost{SwapAB: 1, DeleteA: 1, InsertB: 1}
}

// wildcardPrefixMatch compares whether the prefixes of a and b are equal up
// to the length of the shorter one. The prefix is defined as the longest
// substring that starts at index 0 and does not contain a wildcard.
func wildcardPrefixMatch(a, b string) bool {
	ai := strings.IndexAny(a, "*?")
	bi := strings.IndexAny(b, "*?")
	if ai == -1 {
		ai = len(a)
	}
	if bi == -1 {
		bi = len(b)
	}
	mini := min(ai, bi)
	return a[:mini] == b[:mini]
}

// wildcardSuffixMatch compares whether the suffixes of a and b are equal up
// to the length of the shorter one. The suffix is defined as the longest
// substring that ends at the string length and does not contain a wildcard.
func wildcardSuffixMatch(a, b string) bool {
	ai := strings.LastIndexAny(a, "*?")
	la := 0
	if ai != -1 {
		la = len(a) - ai - 1
Contributor:

These algorithms are exact opposites of one another, so I'm missing why the choice was made not to lay them out as opposites as well.

For example, if we look at the prefix function, the opposite would be:

if ai == -1 {
	ai = 0
}

Right? Doing something entirely different seems to be more work, and it also adds unnecessary cognitive load.
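To make the layout question concrete, here is a minimal sketch of how the suffix helper could be written as a line-by-line mirror of wildcardPrefixMatch, assuming the aim is only to restructure the code without changing its behavior. `wildcardSuffixMatchMirrored` is a hypothetical name and is not part of the PR. It turns the last wildcard index into the start of the wildcard-free suffix (index + 1), uses the end of the string when there is no wildcard (so such a string contributes an empty suffix, matching `la := 0` in the diff), and still takes the minimum of the suffix lengths before comparing.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardSuffixMatchMirrored is a hypothetical restructuring of
// wildcardSuffixMatch that mirrors the layout of wildcardPrefixMatch.
// It is intended to behave the same as the version in the diff.
func wildcardSuffixMatchMirrored(a, b string) bool {
	// Start of the wildcard-free suffix: one past the last wildcard.
	// LastIndexAny returns -1 when there is no wildcard, so the start is 0.
	ai := strings.LastIndexAny(a, "*?") + 1
	bi := strings.LastIndexAny(b, "*?") + 1
	// As in the diff, a string without wildcards contributes an empty
	// suffix, so its suffix starts at the end of the string.
	if ai == 0 {
		ai = len(a)
	}
	if bi == 0 {
		bi = len(b)
	}
	// Compare only the shorter of the two suffixes, measured from the end.
	minl := min(len(a)-ai, len(b)-bi)
	return a[len(a)-minl:] == b[len(b)-minl:]
}

func main() {
	fmt.Println(wildcardSuffixMatchMirrored("foo*baz", "foobar*b"))  // false: "z" != "b"
	fmt.Println(wildcardSuffixMatchMirrored("foo*baz", "foobar*az")) // true: "az" == "az"
	fmt.Println(wildcardSuffixMatchMirrored("foo/bar", "foo*baz"))   // true: no wildcard in a, nothing compared
}
```

The built-in `min` requires Go 1.21 or later, as does the code in the diff.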

Collaborator Author:

I am probably thinking of the wrong algorithm, but I do not think the above will work. We use the indexes for the prefix because they coincide with the prefix length, which is what we really want. For the suffixes, by contrast, we want to compare the lengths of the suffixes to find the minimum one, and as far as I know we cannot do that with indexes without first computing the deltas, because an index does not correspond to a suffix length. For example:

foo*baz
foobar*b

We know that the indexes are 3 and 6, respectively. If we take the minimum of the positions, I do not see how we arrive at the fact that we should check only the last character of both strings, because the minimum suffix length is 1.
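The example can be worked through in a few lines. In this sketch, `suffixInfo` is a hypothetical helper written only for illustration; it follows the diff's convention of giving a string without wildcards a suffix length of 0. It prints the last-wildcard index and the suffix length for both strings, then the two one-character slices that end up being compared.

```go
package main

import (
	"fmt"
	"strings"
)

// suffixInfo is a hypothetical helper for this example. It returns the
// index of the last wildcard in s and the length of the wildcard-free
// suffix after it. Following the diff, a string without wildcards is
// given a suffix length of 0.
func suffixInfo(s string) (idx, length int) {
	idx = strings.LastIndexAny(s, "*?")
	if idx == -1 {
		return idx, 0
	}
	return idx, len(s) - idx - 1
}

func main() {
	a, b := "foo*baz", "foobar*b"
	ai, la := suffixInfo(a)
	bi, lb := suffixInfo(b)
	fmt.Println(ai, la) // 3 3
	fmt.Println(bi, lb) // 6 1
	// The minimum suffix length is 1, so only the last character of
	// each string is compared.
	minl := min(la, lb)
	fmt.Printf("%q vs %q\n", a[len(a)-minl:], b[len(b)-minl:]) // "z" vs "b"
}
```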

Contributor:

We must be talking about different things. I'm not suggesting a change in algorithm; I'm just pointing out that the algorithm was laid out differently in the suffix and prefix versions. The actual comparison at the end is exactly the same.

Collaborator Author (letFunny, Dec 16, 2024):

I am talking about this sentence, for example:

"At first I thought this was wrong, again because I had the prefix logic in mind and didn't realize here it's the minimum of the delta, rather than the minimum of the position"

I cannot write it like the prefix logic, which uses the position, because for the suffix the position is not the length of the suffix. That means I have to use the deltas, unless I am missing something.

Contributor:

Yes, I think you are missing something.

What's the difference between a[len(a)-minl:] and a[ai:]? :)

	}
	lb := 0
	bi := strings.LastIndexAny(b, "*?")
	if bi != -1 {
		lb = len(b) - bi - 1
	}
	minl := min(la, lb)
Contributor:

The opposite would be max here. At first I thought this was wrong, again because I had the prefix logic in mind and didn't realize here it's the minimum of the delta, rather than the minimum of the position, per the logic above.

	return a[len(a)-minl:] == b[len(b)-minl:]
}
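The two new helpers depend only on the standard library, so they can be tried in isolation. The sketch below copies them from the diff unchanged (the built-in `min` requires Go 1.21 or later) and runs them on a few pattern pairs chosen for illustration; the pairs are not taken from the PR's tests. GlobPath treats a `false` from either helper as a definite mismatch and returns early; when both return `true`, the decision still falls to the `Distance`-based comparison, which is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// wildcardPrefixMatch, copied from the diff: compares the literal prefixes
// of a and b (the text before the first '*' or '?') up to the shorter one.
func wildcardPrefixMatch(a, b string) bool {
	ai := strings.IndexAny(a, "*?")
	bi := strings.IndexAny(b, "*?")
	if ai == -1 {
		ai = len(a)
	}
	if bi == -1 {
		bi = len(b)
	}
	mini := min(ai, bi)
	return a[:mini] == b[:mini]
}

// wildcardSuffixMatch, copied from the diff: compares the literal suffixes
// of a and b (the text after the last '*' or '?') up to the shorter one.
// A string without wildcards contributes an empty suffix.
func wildcardSuffixMatch(a, b string) bool {
	ai := strings.LastIndexAny(a, "*?")
	la := 0
	if ai != -1 {
		la = len(a) - ai - 1
	}
	lb := 0
	bi := strings.LastIndexAny(b, "*?")
	if bi != -1 {
		lb = len(b) - bi - 1
	}
	minl := min(la, lb)
	return a[len(a)-minl:] == b[len(b)-minl:]
}

func main() {
	pairs := [][2]string{
		{"foo/*.go", "bar/*.go"},  // prefix false, suffix true: "foo/" != "bar/"
		{"foo/*.go", "foo/*.txt"}, // prefix true, suffix false: ".go" != "txt"
		{"foo/*.go", "foo/**"},    // prefix true, suffix true: left to the slow path
	}
	for _, p := range pairs {
		fmt.Println(p[0], p[1], wildcardPrefixMatch(p[0], p[1]), wildcardSuffixMatch(p[0], p[1]))
	}
}
```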