Reproducible source checkouts for release builds #2388
Comments
+1000 I actually brought up this idea yesterday in #core-dev but suggested a slightly different approach (different tooling). The Go community's official tool for dependency and package management is dep and, as discussed at Gophercon 2017, it should be the tool of choice going forward. Development of other tools in the same space (govendor, glide, etc.) has, as far as I am aware, stopped, and the different tool communities are combining to work on dep. As for whether or not the I would suggest (and would be happy to do it!) vendoring code with |
Here is my branch that I started putting together last night with what the repo would look like vendoring with Still a WIP as I am working on issue with github.com/huin/goupnp as it wasn't properly |
One thing I'm fairly paranoid about is retroactive malware. If you gain access to the developer's account for one of our dependencies, you could potentially sneak in malware that ends up harming our users. Vendoring is one good way to protect against this - it means that you don't update your deps until you choose to, and that also means you have a chance to visually inspect all changes to make sure nothing drastic happened. I think it's much preferable to have an out-of-date package than to update continuously for this reason. At the same time, we do want to make sure that we're notified any time a dependency is updated, because if there's a security patch we need to get it asap (while still inspecting the fix for malicious intent) |
@DavidVorick Exactly, using a tool like dep and bringing around the vendor/ directory in vcs protects against what you are saying. Especially if we can get the packages that we import to tag releases, we can track this type of change easily using dep. Here is an example of the current status of all Sia's dependencies, from which we could track changes:
PROJECT CONSTRAINT VERSION REVISION LATEST PKGS USED
github.com/NebulousLabs/bolt branch master branch master a22e934 a22e934 1
github.com/NebulousLabs/demotemutex branch master branch master 235395f 235395f 1
github.com/NebulousLabs/entropy-mnemonics branch master branch master 7b01a64 7b01a64 1
github.com/NebulousLabs/errors branch master branch master 98e1f05 98e1f05 1
github.com/NebulousLabs/fastrand branch master branch master 60b6156 60b6156 1
github.com/NebulousLabs/go-upnp branch master branch master 620e235 620e235 1
github.com/NebulousLabs/merkletree branch master branch master 8482d02 8482d02 1
github.com/NebulousLabs/muxado branch master branch master b4de4d8 b4de4d8 5
github.com/bgentry/speakeasy ^0.1.0 v0.1.0 4aabc24 4aabc24 1
github.com/cpuguy83/go-md2man * v1.0.7 1d903dc 1d903dc 1
github.com/huin/goupnp * branch master 5b7801a 5b7801a 6
github.com/inconshreveable/go-update branch master branch master 8152e7e 8152e7e 3
github.com/inconshreveable/mousetrap * v1.0 76626ae 76626ae 1
github.com/julienschmidt/httprouter ^1.1.0 v1.1 8c199fb 8c199fb 1
github.com/kardianos/osext branch master branch master ae77be6 ae77be6 1
github.com/klauspost/cpuid * v1.1 ae7887d ae7887d 1
github.com/klauspost/reedsolomon ^1.6.0 v1.6 6bb6130 6bb6130 1
github.com/pkg/errors * v0.8.0 645ef00 645ef00 1
github.com/russross/blackfriday * v1.5 4048872 cadec56 1
github.com/spf13/cobra branch master branch master e5f66de 0dacccf 2
github.com/spf13/pflag * v1.0.0 e57e3ee e57e3ee 1
github.com/xtaci/smux ^1.0.5 v1.0.5 a1a5df8 a1a5df8 1
golang.org/x/crypto branch master branch master 9419663 9419663 4
golang.org/x/net * branch master 0a93976 0a93976 3
golang.org/x/sys * branch master 314a259 314a259 1
golang.org/x/text * branch master 1cbadb4 1cbadb4 20
gopkg.in/yaml.v2 * branch v2 eb3733d eb3733d 1
As you can see, |
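To illustrate how a status listing like the one above could be used to spot dependencies that have moved upstream, here is a minimal Go sketch. It assumes the column layout shown in the table (REVISION, LATEST and PKGS USED are the last three whitespace-separated fields); `outdated` is a hypothetical helper written for this example, not part of dep.

```go
package main

import (
	"fmt"
	"strings"
)

// outdated returns the projects whose pinned REVISION differs from LATEST.
// CONSTRAINT and VERSION may contain spaces ("branch master"), so the
// revision columns are located by counting fields from the right.
func outdated(status string) []string {
	var projects []string
	for _, line := range strings.Split(status, "\n") {
		f := strings.Fields(line)
		if len(f) < 6 || f[0] == "PROJECT" {
			continue // skip blank or short lines and the header row
		}
		revision, latest := f[len(f)-3], f[len(f)-2]
		if revision != latest {
			projects = append(projects, f[0])
		}
	}
	return projects
}

func main() {
	// Three rows copied from the listing above.
	status := `PROJECT CONSTRAINT VERSION REVISION LATEST PKGS USED
github.com/pkg/errors * v0.8.0 645ef00 645ef00 1
github.com/russross/blackfriday * v1.5 4048872 cadec56 1
github.com/spf13/cobra branch master branch master e5f66de 0dacccf 2`
	for _, p := range outdated(status) {
		fmt.Println("upstream has moved, review before updating:", p)
	}
}
```

In the listing above, only blackfriday and cobra have a LATEST revision that differs from the pinned one, so those are the two a reviewer would need to inspect.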
Hey
I took a look at "dep". I like the tool - the only thing that I'd avoid is
the "pollution" of the top-level directory by the files Gopkg.lock and
Gopkg.toml, which have an unusual format and are usable only by `dep` itself.
`govendor` stores its config inside the "vendor" dir and uses JSON, which is
nicer, I believe.
With regard to keeping sources of deps in Git or not:
ensure truly reproducible builds, as it guards against upstream renames,
deletes, and commit history overwrites
If commit hashes are stored, then upstream changes are not an issue - it is
still possible to checkout a specific commit. The only possible issue is a
repo deletion, but this is very unlikely to happen and many of us will have
a local copy to reupload it anyway.
I personally believe that anytime you import something, you are taking on
the responsibility of that code, which is why I like tracking it in vcs.
I think that the vendoring-all approach is needed only to build binaries for
releases. The traditional approach to dependencies (having them in GOPATH, not
in vendor) is much more flexible, which is good for everyday development.
Imagine you play with the whole source tree (which I do from time to time) -
it is much easier to see your changes if dependencies are in GOPATH/src/,
not in vendor. Another use case: you change a dependency and run tests in
Sia - much shorter change-to-result latency.
Also, if we want 100% reproducibility, we also have to vendor all the
standard library and the compiler. Obviously we don't do it. I think we
don't have to do it with the sources of deps for the same reason. Commit
hashes are enough I believe.
If we still go with tracking sources of all deps in Git, we need to find a
way to get rid of unused files (e.g. tests). `govendor` already does it: it
only adds Go packages (not tests) and README. The size of "vendor" made by
`govendor` was 8MB, while `dep` made 34MB. Sia's own files are only 5.7MB.
…On Tue, Oct 3, 2017 at 5:30 PM, Derek McQuay ***@***.***> wrote:
@DavidVorick <https://github.com/davidvorick> Exactly, using a tool like
dep and bringing around the vendor/ directory in vcs protects against
what you are saying.
Especially if we can get our packages that we import to tag releases, we
can track this type of changes easily using dep.
--
Best regards,
Boris Nagaev
|
@starius After running |
It certainly seems like the Go community is gravitating towards What do we do about the Sia dependencies that are not using tagged releases? |
My branch: master...starius:vendor |
dep's homepage says "dep is the official experiment, but not yet the official tool." (emphasis by me)
https://github.com/golang/go/wiki/PackageManagementTools refers to https://github.com/blindpirate/report-of-build-tools-for-java-and-golang#conclusion-1 for vendoring tools popularity. |
Hmm, it reminds me of the xkcd about Standards :-) |
There are several orthogonal issues in this issue:
|
It is very likely that As to the issues brought up, I'll voice my opinion on them.
which tool to use: dep
whether to keep sources of deps in Sia's Git: deps should be brought along for the ride (that way we can ensure protection against retroactive malware by seeing what changes when we upgrade pkgs)
whether to use the latest version or tags of deps: the best scenario is to use tagged releases (see here). If that is not possible, we have to do what is second best and do it off a commit hash. As a lot of the repos that are not tagged
@tbenz9 as to your question, we basically tag based on a commit hash for now. It's the least safe of the options, but it is better than having everything based on a |
I don't like having other projects' source code in Sia's git. It feels needless. I think my ideal dependency management tool would keep the sources outside in your GOPATH as usual, and just ensure that the correct git hash was checked out for each dependency before running That said, vendoring a copy does come with some advantages, and the disadvantages (larger repo size?) don't concern me as much as they might have in 1995. I think the best thing to do here is use whichever of @dmmcquay why is using a tag better than a commit hash? |
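The "sources stay in GOPATH, only the hashes are pinned" idea described above can be sketched in Go as follows. This is only an illustration under stated assumptions: the pin list, the placeholder hash, the GOPATH location and the helper names `headCommit` and `mismatches` are all hypothetical, and the only external command used is `git -C <dir> rev-parse HEAD`.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"sort"
	"strings"
)

// headCommit asks git which commit is currently checked out in dir.
func headCommit(dir string) (string, error) {
	out, err := exec.Command("git", "-C", dir, "rev-parse", "HEAD").Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

// mismatches returns, in sorted order, the import paths whose checked-out
// commit differs from the pinned one (a missing entry counts as a mismatch).
func mismatches(pinned, actual map[string]string) []string {
	var bad []string
	for path, want := range pinned {
		if actual[path] != want {
			bad = append(bad, path)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	// Hypothetical pin list; the hash is a placeholder, not a real revision.
	pinned := map[string]string{
		"github.com/NebulousLabs/fastrand": "0000000000000000000000000000000000000000",
	}
	gopathSrc := "/path/to/gopath/src" // assumed location of GOPATH/src

	actual := map[string]string{}
	for path := range pinned {
		if head, err := headCommit(filepath.Join(gopathSrc, path)); err == nil {
			actual[path] = head
		}
	}
	if bad := mismatches(pinned, actual); len(bad) > 0 {
		fmt.Println("refusing to build, wrong revision checked out for:", bad)
		return
	}
	fmt.Println("all dependencies are at their pinned commits")
}
```

A check like this only detects drift; it does not preserve the sources, which is the concern raised later in the thread about commit hashes alone.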
@lukechampine Dave Cheney says it better than I can: Gophers, please tag your releases Whether we like the I don't think of it as having "other projects' source code in Sia" because the reality is that this is already the case. When you have a Most large projects carry the vendor dir along for the ride (coming from the kubernetes community, go look at projects there and you'll see it). It really isn't an issue (it just creates one large commit, but once the code is in you don't really notice, unless you want to, as in the case of a pkg update, which allows us to inspect the code more closely). |
whether to keep sources of deps in Sia's Git: IMHO, no. Keep only hashes to have a buildable version and to protect against retroactive malware. I think
It is the duty of maintainers (e.g. in Debian), not software authors, to pin versions of dependencies. Developers normally say "A depends on B >= 1.1", then maintainers pick a particular B (for instance, 1.2) and ship it to end users. Note that this B=1.2 is global for the whole distribution (e.g. Debian): all packages depending on B use B=1.2. When we put commit hashes of dependencies to Why don't "import" operators in Go contain commit hashes? I asked this question when I was new to Go. The answer is that nobody would update them: developers are lazy and maintainers don't touch things that work. In contrast, having "import"s pointing to the latest version forces people to move forward. If you clone and build a project with In large companies the only version of software of interest is the latest one (in the master branch). It is guaranteed to work correctly. If the package development is in progress, it can be done in a separate branch (e.g. "dev") which is merged into "master" after an isolation period. This approach works really well and that is the ecosystem Go was born in and for, I guess. |
Keeping the code in vcs is a topic of discussion and both camps have valid points. People have different opinions; personally, I like bringing it along for the ride. If we don't track the versions of the packages we depend on and only keep track of commit hashes or checksums of the code, combing through the changes will be a bear (leaving open the possibility of retroactive malware). I'm open to either option and would love for more people to chime in to get a sort of consensus. I disagree with you about imports in Go and I think it's evident the Go team and others do as well. The reason these tools exist is to avoid building from master. The assumption that |
Not really. You can start a branch, checkout all the packages of current commits, add whole "vendor" dir to Git, update packages to the latest version and run
Was there a message from the Go team discouraging usage of traditional imports?
I believe it is true for the majority of Go packages. If we find ourselves using a package that has a broken master, we can either fork a working commit of it to NebulousLabs/ or stop using the package. When I build software (not in Go) that has copies of other software's source in its tree, the first thing I do is remove those copies and patch it to use the copies provided by my environment. |
You can also easily compare commits using GitHub, e.g. NebulousLabs/bolt@657f184...a22e934 I agree with @dmmcquay that we shouldn't be building from master. I thought that was kind of the point, honestly. It's not that we're worried about a dependency changing their APIs and breaking our build, it's that we're worried about someone pushing malicious code to a dependency's As for why @starius, there's also some discussion about how best to manage modifying our dependencies (here). One of the obvious disadvantages of not copying the sources is that we can't easily modify them as needed. We've already done this at least once (for our go-upnp package). If we're not copying the sources, we need to either get our changes merged upstream, or fork the repo. Not saying that these are necessarily bad (some would argue the opposite, i.e. that vendoring discourages open source contributions), just something to be aware of. |
Tags are not secure because you can change which commit hash your tag refers to. Therefore they're not useful if the concern is stopping retroactive malware. Having a commit hash is also not good enough because there's no guarantee that you'll be able to find code that maps to that commit hash. We need to make sure that we preserve that code somewhere, and if that's not in the Sia repo itself, it needs to be in an NL repo at the very least. Otherwise we are vulnerable to data-withholding classes of attack |
TL;DR: use the latest versions of dependencies for development and pinned versions for releases.
We should not build production binaries (included in releases) from our dependencies' master branches. But we should use the dependencies' master for development: to catch new bugs asap, not at release time.
We'll have a better chance of discovering it (and of keeping it out of a release binary) if we use the dependencies' master during development.
It is possible to push such a change in 3 steps (make a backward-compatible change in the dependency first, then update the main repo to use the new API, then remove the old API from the dependency). It should not be a common case though.
Keeping dependencies up to date in the development environment is both safe and necessary in open source as well. Infecting the development environment can't be avoided, because you have to update dependencies at some point. Infecting the development environment shouldn't result in a disaster anyway. What we can (and must) do is avoid infecting released binaries - by discovering a bad commit in a dependency early.
Or vendor and fix there. IMHO, the best approach is to fix locally (in the vendor dir or in a fork) and keep the fix there until it is merged upstream. The "vendor" dir doesn't bring much in this situation if you track the source and commit of dependencies in your repo.
I think the safest place is our local hard disks where it is already :-) |
Perhaps another reason to fully vendor our dependencies: As for pulling from master in development and using fixed versions for release, I would be concerned there about subtle divergent behavior. For example, it's plausible that our test suite could pass when using latest dependencies, but fail with our pinned production dependencies. Also, infecting a developer's environment with retroactive malware is still quite awful. Not sure I understand @starius 's point about it being unavoidable because you have to update dependencies at some point. If we control and review every update made to our dependencies it seems far less likely that malware could slip in. |
I didn't know about the What do you think about vendoring the standard library and the toolchain? A few weeks ago I spent hours figuring out how to clean one project which has an unusual source control system and build system, including several gigs of gcc binaries for several targets in the repo. I wanted to make sure I didn't run any of them and finally wrote a script removing everything but sources before starting the build. But I still don't know exactly how that thing builds. I wish that project was written in Go and built with just The priorities of a build for dev and for prod are different. The former is interested in a fast build using the latest (or even locally modified) version of dependencies (exactly what I was not accurate in saying that the risk of infecting the dev environment is unavoidable - it is hard to avoid, but still possible. But instead of securing the dev environment you can give developers maximum freedom (including root on the machine to install anything) but no privileges (like
We can run two Travis jobs: one for |
Yeah, I don't think that will be necessary. Seems extremely excessive to me.
I can see your point, but I don't think the priorities are that different. What is dev hopefully becomes prod eventually. I think they should be treated as similarly as possible (in general). My 2 cents: Use dep, bring in the dependencies, use hashes to mark our dependencies, tag our releases. |
This would likely be necessary in order to create bit-perfect deterministic builds. However, the current Go toolchain doesn't appear to support deterministic compilation anyway, so it's out of scope for now. I think the best we can do here is:
That way we can at least identify and reproduce any bugs or security issues introduced by a specific Go toolchain. |
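One way to make the toolchain behind a given binary identifiable is to have the binary report it. The following Go sketch assumes that reporting the version from inside the program is acceptable; `buildInfo` is a hypothetical helper and not part of Sia. It relies only on `runtime.Version`, `runtime.GOOS` and `runtime.GOARCH` from the standard library.

```go
package main

import (
	"fmt"
	"runtime"
)

// buildInfo describes the Go toolchain and target that produced this binary.
// runtime.Version reports the version of the toolchain used for the build.
func buildInfo() string {
	return fmt.Sprintf("built with %s for %s/%s",
		runtime.Version(), runtime.GOOS, runtime.GOARCH)
}

func main() {
	fmt.Println(buildInfo())
}
```

Publishing this string alongside each release would let someone reproducing a bug install the same Go version before rebuilding.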
I think at this point we've agreed that we must fully vendor all our dependencies. As per @DavidVorick's comment, we should also be pinning those dependencies to commit hashes, not tags (although IIUC this is less important when the sources are fully vendored). That just leaves the decision of which vendoring tool to use. |
I think we should provide a shell script producing the released binary from scratch: it downloads and installs Go, downloads Sia and all its dependencies, and builds them. Then everybody can run the script and compare results bit-by-bit. See also #2410 Let me summarize arguments. Pro-fully vendoring (put all sources of deps to vendor dir):
Pro-hash-only vendoring (putting only hashes/commits of dependencies to Sia source):
Both decisions provide:
|
I tried to write a script for #2410 and discovered a few things which made me less enthusiastic about the hash-only approach:
Can |
I think if we fully vendor and when releasing versions of Sia post what the hash of the binary needs to be, we can eliminate a portion of the security concern with bit-perfect builds, etc.. This puts the responsibility on us when releasing versions but this might be a path to quell these concerns a bit. Since we seem to be leaning towards committing the I would propose taking a look at this PR and adding comments there. It seems to me that |
I changed my mind and now lean towards committing the vendor. The PR looks good to me and I used it in my script building reproducible binaries. One comment: it bundles tests and other unneeded files. Could you vendor only Go files that are actually needed + README, please? |
We would also need the LICENSE files, right? Personally I think it's fine to include tests, but if there's a dependency that includes some 5MB testdata file or something, we should eliminate that |
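The filtering rule discussed in the last two comments (keep needed Go files plus README and LICENSE, drop tests and test data) can be written down as a small predicate. This Go sketch is only an illustration of that rule; `keepVendored` is a hypothetical function, and the exact policy (for example, dropping everything under a `testdata` directory) is an assumption rather than the behavior of dep or govendor.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// keepVendored reports whether a slash-separated path under vendor/ should
// be committed: non-test Go sources plus README and LICENSE files, and
// nothing located under a testdata directory.
func keepVendored(p string) bool {
	for _, seg := range strings.Split(path.Dir(p), "/") {
		if seg == "testdata" {
			return false
		}
	}
	base := path.Base(p)
	upper := strings.ToUpper(base)
	if strings.HasPrefix(upper, "LICENSE") || strings.HasPrefix(upper, "README") {
		return true
	}
	return strings.HasSuffix(base, ".go") && !strings.HasSuffix(base, "_test.go")
}

func main() {
	for _, p := range []string{
		"vendor/github.com/pkg/errors/errors.go",
		"vendor/github.com/pkg/errors/errors_test.go",
		"vendor/github.com/pkg/errors/LICENSE",
		"vendor/github.com/pkg/errors/README.md",
		"vendor/example.org/lib/testdata/big.bin", // hypothetical path
	} {
		fmt.Println(p, keepVendored(p))
	}
}
```

Packages that embed non-Go files needed at build time (assembly, C sources) would need extra cases, which is one reason to prefer the vendoring tool's own pruning over a hand-written filter.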
These are things handled by dep; it’s not something you mess with manually, otherwise you’ll have to keep track of each of these changes and would have no tooling around rebuilding those changes. IIRC, if you run ‘dep prune’ it should remove many of these concerns. |
add "Feature Request" Label |
As of 4fe48d5, packages under the Sia/ directory depend on 84 packages, many of which are not under NebulousLabs/.
If somebody wants to build Sia 1.3.0 in a year, it will be nontrivial to checkout proper revisions of all those packages (including those under NebulousLabs). Also it is hard to sign all the source needed to build a specific version: if you sign the commit behind tag 1.3.0, most of source is not covered by this signature (because it is outside Sia/ and its changes do not invalidate the signature).
At the same time some packages are already vendored: smux, pkg/errors, twofish.
I found the tool that can fix the issue: https://github.com/kardianos/govendor.
We can remove all files currently in the `vendor` dir and instead put vendor/vendor.json there. The vendored sources will not be used for development (instead normal Go imports will be used, as usual) and they will be added to `.gitignore`, but `make release-std` will checkout all vendored sources using `govendor sync`. An alternative is to put all sources of dependencies in `vendor` - it is only 8MB - but then normal packages would not be used at all, which is likely to result in using old versions (even the packages currently in `vendor` are not the latest revisions).