-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use regex to detect exclusions instead of string.Contains #21
Conversation
@jonesbrianc26 Thanks for the contribution! I'll have to review/test it in more depth tomorrow, but I took a brief glance and it's looking very nice so far. |
My test additions are pretty minimal, but I think this would be a perfect chance to add something like mentioned here #20 Also this regex is rather naive, so it could definitely be optimized and would probably clean up some of the logic during building it |
I've actually gotten started on something for that. The temporary directory idea that was mentioned in #20 sounds like a cool idea, but might be a little excessive for this. Go automatically ignores anything inside a |
I'm not too familiar with go tests unfortunately, but a good mock library would he helpful, but data driven tests would definitely fit the case here I think 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great, I love the direction.
Left some comments for things that can be improved. Let me know what you think.
"strings" | ||
) | ||
|
||
//Excluder allows us to support different methods of excluding in the future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
I think we can use this same interface to detect paths we've seen and apply regex to that to check for leading/trailing slashes (this is also naively implemented with a simple map and no regex right now).
Let me know if you'd like to work on that as well (should be in a different PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make an issue for this, see if anyone picks it up, otherwise I'll gladly take care of it!
query/excluder.go
Outdated
func (r *RegexpExclude) ShouldExclude(path string) bool { | ||
var b bool | ||
var ok error | ||
if r.regex == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When no exclusion is provided, this is running every single time and nothing is being outputted (since everything matches ""
).
The solution here would be to track if the expression has been built or not, and if it's been built and still is empty, then this method should return false without ever running the match method.
query/excluder.go
Outdated
|
||
//RegexpExclude uses regular expressions to tell if a file/path should be excluded | ||
type RegexpExclude struct { | ||
recursive bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think this is being used anywhere? Would be a good idea to remove it (interested in hearing what it was intended for though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was toying with the idea of preemptively excluding paths here, but decided against that for the scope of this PR. The idea being you could pass in flags to determine the behavior of the struct
query/excluder.go
Outdated
type RegexpExclude struct { | ||
recursive bool | ||
exclusions []string | ||
regex string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be more appropriate if this was of type *regexp.Regexp
.
query/excluder.go
Outdated
} | ||
regex = or(prev, curr) | ||
} | ||
r.regex = regex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since r.regex
is of type *regexp.Regexp
now, this should read
r.regex = regexp.MustCompile(regex)
Note that this panics on invalid regex, which I think is the appropriate solution here, since it means the input was invalid/malformed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Behavior now is to print out the same broken error message, so I agree a panic would be preferred
query/excluder.go
Outdated
r.buildRegex() | ||
} | ||
|
||
if b, ok = regexp.MatchString(r.regex, path); not(ok) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can now be reduced to:
return r.regex.MatchString(path)
query/query_test.go
Outdated
|
||
func TestShouldExclude_ExpectAllExcluded(t *testing.T) { | ||
exclusions := make([]string, 0) | ||
exclusions = append(exclusions, ".git", ".gitignore") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little bit nitpicky, but you can do the equivalent in a single line with:
exclusions := []string{".git", ".gitignore"}
query/query_test.go
Outdated
|
||
func TestShouldExclude_ExpectNotExcluded(t *testing.T) { | ||
exclusions := make([]string, 0) | ||
exclusions = append(exclusions, ".git") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too 😄
query/excluder.go
Outdated
var curr string | ||
for _, exclusion := range r.exclusions { | ||
prev = curr | ||
if strings.HasSuffix(exclusion, "/") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like what's happening here, but I think there's a simpler way to do it. So instead of ^.git$|^.git/.*$
, we can use ^.git(/.*)?$
.
You can simplify this loop by first right-trimming any /
and replacing all .
with \\.
, then wrapping the result with ^
and (/.*)?$
(I'd recommend doing this with fmt.Sprintf
instead of individual functions).
I suggest creating a slice where each exclusion is put into that form, and then joining that slice with |
(using strings.Join
) after the loop is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like his alot. Definitely simplifies the logic
Refactored a bit of the code. I went with what you said for all of these, great catches! |
if r.regex == nil { | ||
r.buildRegex() | ||
} | ||
if r.regex.String() == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
Thanks @jonesbrianc26! Edit: Just FYI, you'll show up as a contributor when feature/subquery gets merged into master. |
* feature/subquery: Minor clean-up Use regex to detect exclusions instead of string.Contains (#21) Fix failing tests; track tokens with slice instead of linked list Add prelim. implementation of subqueries; update package structure Fix bug with missing ending identifier in parsed subqueries Replace stack implementation with Lane (oleiade/lane) Minor tokenizer refactor Add subquery tokenizer
I added a struct which uses a regex to determine whether or not a file should be excluded.
if .git is excluded, it excludes .git and .git/*, and will not exclude .gitignore
in general if you have a list of exclusions
.git, .gitignore
the generated regexp will look like:^\.git$|^\.git/.*$|^\.gitignore$|^\.gitignore/.*$
Fixes: #5