-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate .cargo_vcs_info.json and include in cargo package
#5786
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
The windows CI failures are real. I'll fix that with additional commits here. |
cargo package
cargo package
f5e6562
to
7eb7720
Compare
The package_verbose test has different output on windows, and from that output I can only conclude that on windows, and windows only, cargo is failing to find the parent repo. Note that difference. As the new .cargo_vcs_info.json of PR rust-lang#5786 is a best effort, informative only, this independent problem on windows seems acceptable to the feature.
7eb7720
to
b553d72
Compare
cargo package
cargo package
Ping @alexcrichton, this is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks for this! Could a test also be added to ensure that this file exists and has an expected structure?
Also cc @rust-lang/cargo, do any of y'all have feelings about this? This is a pretty minor feature but seems like it could be useful in the future! We may also want to have a bit more discussion about exactly what goes into this file
And finally, could documentation be added for this as well? Just to document the json format and such.
// Manual JSON format is safe given sha1 hash is hex encoded. | ||
writeln!(ifile, r#"{{"git_head_revision_sha1":"{}"}}"#, hsh)?; | ||
ifile.flush()?; | ||
ifile.file().sync_all()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm could you elaborate a bit on why the file locks and syncs are needed here? I was thinking we'd just synthesize this file directly into the tarball (like Cargo.toml.orig
without actually putting it on the filesystem)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured early on that I wanted a sentry file to try and make it at least inconvenient to spoof the hash, for example by protecting against interference from another cargo package
process. The changes would be reduced without the sentry file however. Do you think the anti-spoof efforts don't justify the increased complexity? Note: The test failure by platform diff will remain an issue even without the linked sentry file.
Another early thought was that cargo package
was doing something sentry file like, with the tarball produced being valided before being moved in place, so this was for parity.
); | ||
} | ||
|
||
// FIXME: From this required change in the test (omits final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm this seems like it's a bit worrisome, do we know what changed from before to cause this test to fail on windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we don't know that anything changed. the prior check_not_dirty
would not bail!
(which is avoided for the tests on both platforms) if either no-git-repo-found or git-repo-clean. So if there has been a change with this PR, for windows, then its a change between one of these results to the other. There is no existing test to pin down which one it is.
Also I've changed the order of checks (in cargo_package) vs list output, so another side-effect could be an earlier failure in a check, but that is not apparent. With the PR changes, cargo package
just won't generate a .cargo_vcs_info.json file in the repo-not-found or git-repo-clean cases, which makes the output, coincidentally, the same as before.
So I'm asking to move forward with merging this, then followup with a specific test for this which can be backported if necessary for personal diagnostic purposes. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: I have a logic flaw in the code comment, clarified above, which I'll fix. When I manually reviewed test fixtures, it looked like the sub-repo was dirty, so that makes me think this the no-git-repo-found case.
I'm thrilled to have this information available. I do think we should generate this directly into the tarball, if we can. |
We certainly can, per my understanding of the
As I personally think the odds of the above items are quite high, by objective review of past issues/PRs, I respectfully request reconsideration of this PR in its original form. While you consider that, I will be attempting a reliable test of the no-git-repo-found or git-repo-clean conditions on Windows. :-) |
It's a client-side mechanism, it's always going to be possible to fake it; let's not pretend it's a security mechanism. It's helpful to inform people, and if the build is reproducible then it can be cross-checked by independent builds. Some basic guard to avoid unintentional concurrent builds in the same target directory seems like a feature, but that's orthogonal to this. |
I accept that the sentry aspect is security in depth at best. I thought I was careful not to suggest greater sanctimony, see above:
...and its not entirely orthogonal. That would require significant independent new feature development. As in stands in master, the tarball acts like a guard but |
Prior to this PR and with #5820, the package_verbose tests now fails with the warning on windows: https://ci.appveyor.com/project/rust-lang-libs/cargo/build/1.0.5293/job/ug1evef71f6d6pav#L1656 While the test succeeds without warning on So the no-git-repo-found case is causing the disparity in the package_verbose test, as I feared. Suggested plan of action:
@alexcrichton : reasonable plan? |
@dekellum You were clear in your original post; I was responding to your comment at #5786 (comment) . I don't have any objection to attempting to detect concurrent builds; my only objection is that doing that only in the specific case of building in a git repository, rather than in every case, seems wrong to me. |
On a different note: we don't currently have a check for packages already containing a file I understand the rationale for doing so; it just seems unfortunate. |
@joshtriplett, regarding comment I would say the strategy is instead: write the .cargo_vcs_info.json under a lock, ensuring the same file makes it into the tarball. It just so happens that that file only contains metadata from git currently. In the future (see discussion in #5629) it may be extended to other (D)VCS like hg, or with other cargo assertions/metadata. With this reasoning there is no precedent set as this being a git-specific feature. On the latter point: It seems completely reasonable to me for any package manager, cargo included, to reserve certain file names for which it is to control the providence, format and meaning. Related, Ok with me proceeding as per plan? Note I'm also trying in parallel to just fix the windows repo difference noted above. |
@dekellum This shouldn't be specific to packages stored in a VCS, either. Why not just have a general check to avoid concurrent builds, rather than as a side effect of a lock on a particular file? (I'm not objecting to using As for reserving filenames: right now, as far as I can tell, Cargo doesn't reserve any names, even those it uses (which is potentially a problem). Changing that could potentially introduce compatibility issues, as well as interesting questions. What happens, for instance, if someone wants to have a directory of files used in test cases? I'm not trying to come up with arbitrary corner cases, and I'm not necessarily objecting to blocking the use of certain filenames, but we'd need to handle that carefully and think about it explicitly. What could potentially go wrong with starting to reserve filenames and prevent them from being in a Cargo crate? |
This PR agrees I think: |
The general check would be a |
Its probably best to defer this until #5820 is merged. I can then resolve the conflicts here and avoid the unfortunate windows specific test allowance. |
@dekellum let's remove the synchronization logic here. As @joshtriplett mentioned this isn't buying us any security really and this file is largely going to only be informative to other tools downloading from the registry. |
I strongly disagree with "this isn't buying us any security really" part of your statement, for the reasons previously described. Does your preferred direction also include hamstringing (and/or subverting) the feature by allowing the same filename in the source? |
@dekellum this is not a security feature, full stop. This is an informational feature for downloading crates from crates.io and allowing tools to opportunistically connect the source of a downloaded crate to the exact commit of a git repository. Arbitrary tarballs can be uploaded to the registry at any time, there's practically no restrictions other than file size. We do not have a threat model here, nor are we trying to develop a threat model. Publishing and packaging are "best effort" when it comes to concurrent publication because it almost never reasonably happens, so the file locking here is not necessary. |
Fun fact: When new people come to your project, they frequently attempt to emulate design choices made in the existing code. May I ask why there is the (somewhat inflexible) flock utility in cargo and why there is specific effort made to move the tarball into position only after it is completely built? Why aren't you clamoring to yank that sentry aspect out? And before your full stop, you haven't even illuminated me as to your position for reserving the file name/tarball path. I guess "help wanted" on the original feature request was a good bit more provisional than I originally imagined. |
With my efforts in #5820 decisively thwarted, I think this PR is now similarly rendered immovable and you can close it if you wish, as in full stop. I just want to leave one last note here for posterity: another way to achieve my initial (apparently not entirely embraced) objective of a reasonably reliable git sha1 hash recording, would be to change Best of luck. |
@dekellum the flock utilities are there to handle concurrent invocations of I do not have an opinion about the file/tarball path option. It's fine to go either way, I was trying to be clear about a different aspect of this PR before moving on to that. |
Closing in favor of #5886. |
Generate .cargo_vcs_info.json and include in `cargo package` (take 2) Implements #5629 and supersedes #5786, with the following improvements: * With an upstream git2-rs fix (tracked #5823, validated and min version update in: #5858), no longer requires windows/unix split tests. * Per review in #5786, drops file system output and locks for .cargo_vcs_info.json. * Now uses serde `json!` macro for json production with appropriate escaping. * Now includes a test of the output json format. Possible followup: * Per discussion in #5786, we could improve reliability of both the VCS dirty check and the git head sha1 recording by preferring (and/or warning otherwise) the local repository bytes of each source file, at the same head commit. This makes the process more appropriately like an atomic snapshot, with no sentry file or other locking required. However given my lack of a window's license and dev setup, as exhibited by troubles of #5823, this feel intuitively like too much to attempt to change in one iteration here. I accept the "best effort" concept of this feature as suggested in #5786, and think it acceptable progress if merged in this form. @alexcrichton r? @joshtriplett cc
Implements #5629
Edit 1, implementation notes:
I elected to generate the JSON with link in filesystem as
target/.cargo_vcs_info.json
(compare totarget/.rustc_info.json
). This makes it usable as a sentinel file. It also makes windows-mandory vs unix-advisoryFileLock
s a relevent distinction, which required some trial and error on appveyor CI here, now all sorted.I found one test case (
cargo test package::package_verbose
) with different output on windows, to quote b553d72:Let me know if you would also like a new issue for that test difference. Cargo windows bug?
Edit 2, more implementation notes:
Note that I needed to adjust the handling of listing in ops/cargo_package vs checks such as what is now
check_repo_state
(previously namedcheck_not_dirty
). This is so that .cargo_vcs_info.json, if it can be produced, is included in the listing. (It would seem poor to omit there.) A side effect of this, is that a user might need to use the--allow-dirty
flag to get a listing if the repo is dirty. That doesn't seem onerous to me. One test case reflects that change.