Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use regex to detect exclusions instead of string.Contains #21

Merged
merged 11 commits into from May 24, 2017
Merged

Use regex to detect exclusions instead of string.Contains #21

merged 11 commits into from May 24, 2017

Conversation

ghost
Copy link

@ghost ghost commented May 22, 2017

I added a struct which uses a regex to determine whether or not a file should be excluded.

if .git is excluded, it excludes .git and .git/*, and will not exclude .gitignore

in general if you have a list of exclusions .git, .gitignore the generated regexp will look like:
^\.git$|^\.git/.*$|^\.gitignore$|^\.gitignore/.*$

Fixes: #5

@kashav
Copy link
Owner

kashav commented May 22, 2017

@jonesbrianc26 Thanks for the contribution! I'll have to review/test it in more depth tomorrow, but I took a brief glance and it's looking very nice so far.

@ghost
Copy link
Author

ghost commented May 22, 2017

My test additions are pretty minimal, but I think this would be a perfect chance to add something like mentioned here #20

Also this regex is rather naive, so it could definitely be optimized and would probably clean up some of the logic during building it

@kashav
Copy link
Owner

kashav commented May 22, 2017

I've actually gotten started on something for that.

The temporary directory idea that was mentioned in #20 sounds like a cool idea, but might be a little excessive for this. Go automatically ignores anything inside a testdata directory, so the idea is to fill that directory with arbitrary files/fixtures and then use those in the tests.

@ghost
Copy link
Author

ghost commented May 22, 2017

I'm not too familiar with go tests unfortunately, but a good mock library would he helpful, but data driven tests would definitely fit the case here I think 👍

@kashav kashav changed the title Bugfix/fix exclusions Use regex to detect exclusions instead of string.Contains May 22, 2017
Copy link
Owner

@kashav kashav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, I love the direction.

Left some comments for things that can be improved. Let me know what you think.

"strings"
)

//Excluder allows us to support different methods of excluding in the future
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️

I think we can use this same interface to detect paths we've seen and apply regex to that to check for leading/trailing slashes (this is also naively implemented with a simple map and no regex right now).

Let me know if you'd like to work on that as well (should be in a different PR).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make an issue for this, see if anyone picks it up, otherwise I'll gladly take care of it!

func (r *RegexpExclude) ShouldExclude(path string) bool {
var b bool
var ok error
if r.regex == "" {
Copy link
Owner

@kashav kashav May 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When no exclusion is provided, this is running every single time and nothing is being outputted (since everything matches "").

The solution here would be to track if the expression has been built or not, and if it's been built and still is empty, then this method should return false without ever running the match method.


//RegexpExclude uses regular expressions to tell if a file/path should be excluded
type RegexpExclude struct {
recursive bool
Copy link
Owner

@kashav kashav May 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think this is being used anywhere? Would be a good idea to remove it (interested in hearing what it was intended for though).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was toying with the idea of preemptively excluding paths here, but decided against that for the scope of this PR. The idea being you could pass in flags to determine the behavior of the struct

type RegexpExclude struct {
recursive bool
exclusions []string
regex string
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be more appropriate if this was of type *regexp.Regexp.

}
regex = or(prev, curr)
}
r.regex = regex
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since r.regex is of type *regexp.Regexp now, this should read

r.regex = regexp.MustCompile(regex)

Note that this panics on invalid regex, which I think is the appropriate solution here, since it means the input was invalid/malformed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Behavior now is to print out the same broken error message, so I agree a panic would be preferred

r.buildRegex()
}

if b, ok = regexp.MatchString(r.regex, path); not(ok) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can now be reduced to:

return r.regex.MatchString(path)


func TestShouldExclude_ExpectAllExcluded(t *testing.T) {
exclusions := make([]string, 0)
exclusions = append(exclusions, ".git", ".gitignore")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little bit nitpicky, but you can do the equivalent in a single line with:

exclusions := []string{".git", ".gitignore"}


func TestShouldExclude_ExpectNotExcluded(t *testing.T) {
exclusions := make([]string, 0)
exclusions = append(exclusions, ".git")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too 😄

var curr string
for _, exclusion := range r.exclusions {
prev = curr
if strings.HasSuffix(exclusion, "/") {
Copy link
Owner

@kashav kashav May 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like what's happening here, but I think there's a simpler way to do it. So instead of ^.git$|^.git/.*$, we can use ^.git(/.*)?$.

You can simplify this loop by first right-trimming any / and replacing all . with \\., then wrapping the result with ^ and (/.*)?$ (I'd recommend doing this with fmt.Sprintf instead of individual functions).

I suggest creating a slice where each exclusion is put into that form, and then joining that slice with | (using strings.Join) after the loop is done.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like his alot. Definitely simplifies the logic

@ghost
Copy link
Author

ghost commented May 24, 2017

@kshvmdn

Refactored a bit of the code.

I went with what you said for all of these, great catches!

if r.regex == nil {
r.buildRegex()
}
if r.regex.String() == "" {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@kashav kashav merged commit bb9efe2 into kashav:feature/subquery May 24, 2017
@kashav
Copy link
Owner

kashav commented May 24, 2017

Thanks @jonesbrianc26!

Edit: Just FYI, you'll show up as a contributor when feature/subquery gets merged into master.

kashav added a commit that referenced this pull request May 24, 2017
* feature/subquery:
  Minor clean-up
  Use regex to detect exclusions instead of string.Contains (#21)
  Fix failing tests; track tokens with slice instead of linked list
  Add prelim. implementation of subqueries; update package structure
  Fix bug with missing ending identifier in parsed subqueries
  Replace stack implementation with Lane (oleiade/lane)
  Minor tokenizer refactor
  Add subquery tokenizer
@kashav kashav mentioned this pull request May 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants