-
Notifications
You must be signed in to change notification settings - Fork 1k
gps: Refactor prune and filesystem functions #1219
gps: Refactor prune and filesystem functions #1219
Conversation
6f898f0
to
07a3587
Compare
4e92699
to
4db54a5
Compare
internal/gps/filesystem.go
Outdated
|
||
// calculateFilesystemState returns a filesystemState based on the state of | ||
// the filesystem on root. | ||
func calculateFilesystemState(root string) (filesystemState, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using calculate
a lot in this context but I'm not very comfortable with it. Any better suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deriveFilesystemState
, analyzeFilesystem
, or even just newFilesystemState
?
maybe "derive"?
…On September 24, 2017 12:58:04 PM EDT, Ibrahim AshShohail ***@***.***> wrote:
ibrasho commented on this pull request.
> +type filesystemState struct {
+ root string
+ dirs []string
+ files []string
+ links []fsLink
+}
+
+// fsLink represents a symbolic link.
+type fsLink struct {
+ path string
+ to string
+}
+
+// calculateFilesystemState returns a filesystemState based on the
state of
+// the filesystem on root.
+func calculateFilesystemState(root string) (filesystemState, error) {
I've been using `calculate` a lot in this context but I'm not very
comfortable with it. Any better suggestions?
--
You are receiving this because your review was requested.
Reply to this email directly or view it on GitHub:
#1219 (review)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I wonder how the API would feel with these functions as methods on filesystemState
instead.
I also noticed some of the logger
args are unused, if you want to clean up while you're in here.
b76ed0f
to
4d7004b
Compare
Hmm... 🤔 This could look nicer, but I didn't want to couple
I actually planned to use those to log all files that will be deleted. This was added in the last push. But I'm a bit reluctant now about logging all these lines. |
c4d36a0
to
92e138a
Compare
internal/gps/prune.go
Outdated
@@ -107,152 +111,141 @@ func pruneNestedVendorDirs(baseDir string) error { | |||
return filepath.Walk(baseDir, stripVendor) | |||
} | |||
|
|||
// pruneUnusedPackages deletes unimported packages found within baseDir. | |||
// pruneVendorDirs deletes all nested vendor directories within baseDir. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why keep the commented funcs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why this is a [WIP]. 😁
I'm trying to refactor strip vendor func
s to use the filesystemState
.
internal/gps/prune.go
Outdated
for _, path := range toDelete { | ||
logger.Printf(" * %s", path) | ||
|
||
if err := os.Remove(path); err != nil && !os.IsNotExist(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we ordering the deletion of directories such that we don't leave unnecessary empty ones around, but still do it in the correct order so that directories are only removed once all of their contents have been removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the most annoying parts of pruning for me.
I opted to only delete files in the new prune functions, and empty directories are kept behind.
I think empty directories are a minor issue atm since git
and hg
ignore empty directories by default (not sure how bzr
handles them).
We'll need to replace filesystemState
with a hierarchical representation to allow determining which directories are empty efficiently or come up with a different solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to replace filesystemState with a hierarchical representation to allow determining which directories are empty efficiently or come up with a different solution.
If the paths are normalized, can the list just be sorted so that everything is deleted bottom-up? Then whenever the next item is a new dir, empty-check the previous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dirs are not added to the list currently. The logic to determine when to add a dir to the list will be more involved with the current struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't need to be in the list for what I was describing. I just meant that if the next file is in a different dir then the previous file, then it was the last file to delete from that dir (assuming a proper normalized sort), so empty-check it (and possibly its parent, and so on). This doesn't account for directories which were already empty before any pruning, but that seems like a weird edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think empty directories as a minor issue atm
we may be able to skate by in an initial implementation without them because of git/hg's treatment of empty dirs, but i suspect we'll need to catch up rather quickly. beyond the general annoyance empty dirs will cause for folks, i suspect that correct implementation of #1113 will also demand that we do something a little more sophisticated here, anyway.
i don't have a clear implementation in mind yet, but i have some vague ideas swirling around regarding prefix-based iteration/exploration of a trie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've 2 approaches in mind to do this:
- Loop through all dirs in the
filesystemState
struct and check if each dir is empty or not. This can be done once after all options are done, or after each option. I prefer doing it once at the end. - After each option deletes its files, determine the touched dirs from the slice containing the
toDelete
files, and loop through only those dirs and check for emptiness.
The key operation that I'm trying to minimize is opening the file to use os.File.Readdirname
.
I can keep looping over the slices to find files in a dir, but a prefix tree would make that easier. Giving filesystemState
hierarchy could be even better and more specialized.
But I'm trying to resist the temptation to change filesystemState
to a more complex struct that provides more functionality by finding a simpler solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just off the cuff, but maybe it's as simple as a sort function, where records represent dirs and their contents, and dirs that are subpaths of other dir records are moved to the front? that way you can proceed linearly through the sorted slice, assured that you will never visit a dir before any of its ancestors.
@jmank88 would you mind dismissing your review until this is no longer WIP? |
92e138a
to
12a1b64
Compare
a6ad0cc
to
b50a8b5
Compare
We now delete anything that looks like a symlink if its called "vendor" while pruning. Hopefully, you didn't make a bad decision by relying on the magical properties of symlinks. Signed-off-by: Ibrahim AshShohail <[email protected]>
b50a8b5
to
b4fca9b
Compare
I discussed this with @sdboyer and this is the approach we want to keep just because we can never be sure about the scope of those files.
@ChrisHines This should be solved now. Could you please try? If this still doesn't keep some directories (and it doesn't fail in the process) I would really appreciate an example project where this occurs. |
@ibrasho Pruning unused packages still leaves empty directories for pruned packages that are sub-directories of a bigger repository. Here is a tiny reproducer for you: https://github.com/ChrisHines/dep-test. If you clone that repo and run When I do it on my machine I see the following empty directories under vendor:
|
Signed-off-by: Ibrahim AshShohail <[email protected]>
can we feasibly add a test case to cover what's represented by @ChrisHines repository example there? |
@ChrisHines Thanks for the repo. Found the issue and fixed it now. Turns out the nested empty dirs were not handled properly. @sdboyer Sure. The basic issue was very simple and a small example can cover it. |
@ibrasho It looks like it's working perfectly now. I don't see any more problems. |
Signed-off-by: Ibrahim AshShohail <[email protected]>
wow. codeclimate has gotta go. |
@sdboyer Any update on the fix from their side? |
no and now they've got all these additional issues that are much less clear to fix - e.g. the code duplication complaints for table based tests. I don't see there being an easy solution there; I'd much rather we just move back to something strictly for code coverage. I think I was using codecov before with gps. |
@ibrasho (and others!), than you so much for this PR! I'm moving a decently sized Go monorepo at $WORK from using One edge case I'm seeing: Some larger libraries have license and legal files not only in the root, but also in the nested packages. We should probably delete such files if the nested package is not otherwise used. An example library if you need it for troubleshooting is
|
@alexkay so glad this PR is helping you make the jump! I realize it can be a bit annoying, but because there is no well-defined relationship between legal files and the code they are intended to cover, in order to err on the side of not pissing off lawyers unnecessarily, we're opting to just always keep those files. |
I have updated the target branch for this PR to the |
ok, we're good on this - pulling it into the release prep branch! 🎉 🎉 |
such a great job! @sdboyer which is the ETA for the next release including these changes? |
I've been trying to get a release out for a couple weeks now 😢. everything's currently held up on docs. i have an opportunity to blitz those on a plane this weekend, so I'm hoping... next week.
…On January 5, 2018 5:44:49 AM EST, Gorka Lerchundi Osa ***@***.***> wrote:
such a great job!
@sdboyer which is the ETA for the next release including these changes?
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#1219 (comment)
|
don't worry, would be a huge win so it's worth waiting! 👍 |
Did this get released yet? I can't seem to find it |
yes, it now happens automatically when you set appropriate values in |
What does this do / why do we need it?
This PR moves
filesystemState
from being a test struct to become part ofgps
. It's used to optimize the prune function by calculating the filesystem state and determining the files to prune in-memory.addition: This PR also includes the remaining plumbing required to make prune part of ensure.
What should your reviewer look out for in this PR?
Correctness and optimizations as usual. 😁
Do you need help or clarification on anything?
Any suggestion on how to handle striping vendor directories?
Which issue(s) does this PR fix?
One more step towards #944