Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch directory ownership to git ls-files #80

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

gmacdougall
Copy link

This switches away from using a glob match on '**' within
DirectoryOwnership matcher in favour of using git ls-files. This
provides a significant speed improvement for large repositories for a
few reasons.

The primary reason is that we're only searching through tracked files
and avoiding potentially large directores, such as node_modules.

The major intended side effect of this is that it only is dealing with
tracked files rather than all files.

I'm opening the PR for further discussion and refinement to see if this
is worth considering.

Speed comparison for me locally:

Before:

❯ time bin/codeownership validate

________________________________________________________
Executed in   16.72 secs    fish           external
   usr time    3.08 secs    0.10 millis    3.08 secs
   sys time   10.70 secs    1.59 millis   10.70 secs

After:

❯ time bin/codeownership validate

________________________________________________________
Executed in   10.67 secs    fish           external
   usr time    2.90 secs  111.00 micros    2.90 secs
   sys time    8.65 secs  741.00 micros    8.65 secs

@gmacdougall
Copy link
Author

The tests are failing here because the use of git ls-files. The tests are acting on failes in a non-git repository. I can fix this if these tests if this is a path forward we're interested in.

@technicalpickles
Copy link
Collaborator

I just remembered why we don't use git ls-files. In production & CI, we don't always have a git checkout. So it fails like CI fails here 😅

@gmacdougall
Copy link
Author

Maybe we could add an optional configuration to do this? Could even be something super customizable like a proc configuration that would just do the file glob be default.

@technicalpickles
Copy link
Collaborator

@gmacdougall you read my mind 😁 Yeah, detecting the ability to use git ls-files would make sense. If we make it a lambda, could even experiment with some of the fancier tools like fd

@professor
Copy link
Contributor

I like that we are trying to improve the speed of this command. I would prefer to find a solution that runs the same code in local development and on CI.

This switches away from using a glob match on '**' within
`DirectoryOwnership` matcher in favour of using `git ls-files`. This
provides a significant speed improvement for large repositories for a
few reasons.

The primary reason is that we're only searching through tracked files
and avoiding potentially large directores, such as `node_modules`.

The major intended side effect of this is that it only is dealing with
tracked files rather than all files.

I'm opening the PR for further discussion and refinement to see if this
is worth considering.

Speed comparison for me locally:

Before:
```
❯ time bin/codeownership validate

________________________________________________________
Executed in   16.72 secs    fish           external
   usr time    3.08 secs    0.10 millis    3.08 secs
   sys time   10.70 secs    1.59 millis   10.70 secs
```

After:
```
❯ time bin/codeownership validate

________________________________________________________
Executed in   10.67 secs    fish           external
   usr time    2.90 secs  111.00 micros    2.90 secs
   sys time    8.65 secs  741.00 micros    8.65 secs
```
@gmacdougall gmacdougall force-pushed the gmacdougall-use-ls-files-in-directory-ownership branch from 3e135ec to a7f1a70 Compare June 24, 2024 03:47
This allows us to decide which algorithm to use for finding code
ownership files within the repo.

If the configuration directive is set to use_git_ls_files then we will
use that rather than finding files via a glob. This will lead to a
significant speed improvement as finding files via glob is slow.
This allows us to configure a tool to use git ls-files over using the
standard file glob matching.
@gmacdougall gmacdougall force-pushed the gmacdougall-use-ls-files-in-directory-ownership branch from a7f1a70 to 04eea8f Compare June 24, 2024 03:50
This switches `-use-git-ls-files` to `--use-git-ls-files` because I
screwed this up.
This passes the argument from the CLI onto the configuration to be
loaded into the configuration through semi-hacky means.
@corsonknowles
Copy link

corsonknowles commented Oct 17, 2024

Yeah, this is great.

Don't worry about this comment:

I like that we are trying to improve the speed of this command. I would prefer to find a solution that runs the same code in local development and on CI.

The problem is that CI doesn't have everything we have locally.

I'm OK with compensating for CI if we don't want to add a checkout step there for performance reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants