This repository has been archived by the owner on Sep 9, 2020. It is now read-only.
https://golang.github.io/dep/docs/Gopkg.toml.html#prune currently defines 3 pruning modes:
non-go just removes files that are not relevant to Go anyway (e.g. .travis-ci.yml files, READMEs and so on)
go-tests also removes test files
unused-packages even removes some Go code that is definitely not being used, since it is not referenced by the package that "owns" the vendor folder
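For contrast with the proposal below, the first two existing modes boil down to simple file-name rules. The Go sketch that follows is a hypothetical simplification: prunedByNonGo and prunedByGoTests are illustrative names, not dep's API, whatever exceptions dep's real implementation makes are not reproduced, and unused-packages (which needs the import graph) is left out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// prunedByNonGo approximates a "non-go"-style rule (hypothetical
// simplification): every file that is not a .go source file is dropped.
func prunedByNonGo(path string) bool {
	return filepath.Ext(path) != ".go"
}

// prunedByGoTests approximates a "go-tests"-style rule (hypothetical
// simplification): Go test files, whose names end in _test.go, are dropped.
func prunedByGoTests(path string) bool {
	return strings.HasSuffix(filepath.Base(path), "_test.go")
}

func main() {
	for _, f := range []string{"README.md", ".travis.yml", "conn.go", "conn_test.go"} {
		fmt.Printf("%-14s non-go=%v go-tests=%v\n", f, prunedByNonGo(f), prunedByGoTests(f))
	}
}
```

Rules like these decide relevance by name alone, which is exactly the judgment call the proposed mode wants to replace with a measurement.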
I'd like to propose a pruning mode that is even more rigorous than unused-packages:
unused-files would remove every file that does not influence the (hash of the) resulting binary of the package that "owns" the vendor folder. This means that the package has to produce the same binary with a vendor folder pruned by unused-files as with an unpruned vendor folder. At the same time, it must not be possible to remove any further set of files/directories/symlinks from an unused-files vendored dependency without also changing the compilation result. Conversely, this means that every file/directory/symlink whose removal does not change the resulting binary has been removed, even if it is in the same folder as an imported dependency. For example, a Go file that contains only comments, or only types and functions that are never used and are thus stripped away later, would not be vendored in the first place with this strategy.
A naive approach would be to take an initial measurement with an unpruned vendor folder, list the files/folders/symlinks of the folder to be pruned, and run a ddmin algorithm (e.g. https://github.com/dgryski/go-ddmin) over that list, with the criterion that the binary hash must still match the initial one. Unfortunately, the remaining list of files/folders/symlinks is not guaranteed to be a global minimum, but it would be at least 1-minimal (removing any single entry from that list would change the outcome). This can be sped up by various heuristics: for example, it is VERY likely that pruning with the existing strategies first would already shrink the list of candidates considerably, whereas pure ddmin would struggle for a while.
The advantages compared to the existing pruning strategies:
No need to decide if/how/which tests, testdata or symlinks are relevant, there is an objective measurement to decide if they are necessary
Minimizes the amount of data that needs to be checked in while keeping files unmodified (only rewriting code, e.g. stripping comments, could make it any smaller)
Guarantees the same behavior as the unpruned version (the current strategies just assume this property, I guess?)
The same strategy could be used to identify dead code/unused files in the actual code base too.
Downsides:
Computationally expensive if done by (re)compiling and comparing hashes
If more features of a vendored dependency are used later, it might become necessary to first fetch an unpruned version from upstream and then re-prune (this is already a potential issue with unused-packages too)
You are only guaranteed to have the code you currently need available in your vendor folder, not a full "insurance" against vanishing upstreams (this is a general issue with pruning)
Hi! Sorry I didn't respond to this earlier; after releasing v0.4.1, my attention turned entirely to rebutting vgo.
This is an interesting approach, but the computational costs would likely undermine its utility for the big wins with verification, as now realized in #1912.
I'll keep this in mind for the successor tool, though!