Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit deterministic pack #8601

Open
nkolev92 opened this issue Sep 17, 2019 · 28 comments
Open

Revisit deterministic pack #8601

nkolev92 opened this issue Sep 17, 2019 · 28 comments

Comments

@nkolev92
Copy link
Member

nkolev92 commented Sep 17, 2019

The pack implementation for deterministic was reverted in 5.3 due to #8599.

This issue tracks re-enabling that feature.

Important:

The implementation of this feature should account for #7001.
Consider the #8603 scenario.

@nkolev92
Copy link
Member Author

Important scenario in #8603, hit because of #7001.

tl;dr; 1980-1-1 might not be the best standard date because of timezones.

@nkolev92 nkolev92 changed the title Long term plan for deterministic pack Revisit deterministic pack Sep 18, 2019
@poke
Copy link

poke commented Sep 22, 2019

I personally think that having a real date is somewhat useful. Since NuGet probably won't be able to figure out a deterministic real date for a package, I think having a way to externally set the date would be useful.

@tmat said in the dotnet issue the following:

Making the timestamp settable just shifts the problem elsewhere but it does not address the problem.

But I actually think that's exactly a valid way to solve this. NuGet does not have this information, nor has MSBuild. But there are systems that do have this kind of information: version control systems.

When generating a deterministic build and pack output, I would expect that to always happen based on a specific version in the VCS. So when packing deterministically, we could pass the commit date from the source revision the build is based on. That way, there would be a fixed and deterministic date (1) that makes sense (because it's real), (2) and that's even able to identify the source a pack is based on.

I could see that actually figuring out that date is not a responsibility for NuGet (since it would have to interact with the VCS then), so the default could be a different behavior. But then I would really like there to be some way to externally set the date during pack, so that e.g. a build process can pass the revision date to the pack step.

@olivier-spinelli
Copy link

Yes, having a real date is somewhat useful, and yes the commit date is the best candidate (when a VCS is used), otherwise the date can be derived from... whatever you can imagine.

Honnestly a simple PackageDate[Time]Utc or a more general BuildDate[Time]Utc property that could be initialized with UtcNow seems perfect to me...

...or did I miss something huge?

@tmat
Copy link

tmat commented Oct 3, 2019

I don't see why is there a need for timestamps on the items zipped in the NuGet package. Why wouldn't the version stored in these files be sufficient/better.
NuGet packages already have commit SHA metadata from VCS, so it is easy to find out what source snapshot were they built from.

@nkolev92
Copy link
Member Author

nkolev92 commented Oct 3, 2019

@olivier-spinelli Using UTC now would break determinism.

I don't see why is there a need for timestamps on the items zipped in the NuGet package. Why wouldn't the version stored in these files be sufficient/better.

I agree.

NuGet packages already have commit SHA metadata from VCS, so it is easy to find out what source snapshot were they built from.

The problem is not every package does that. Even then, depending on the timestamp is a bad idea.

@tmat
Copy link

tmat commented Oct 3, 2019

he problem is not every package does that. Even then, depending on the timestamp is a bad idea.

Agreed. Just saying we already have a mechanism that associates package with the source snapshot.

@olivier-spinelli
Copy link

You got me wrong: I absolutely don't care of all these date/time. My goal is only to achieve "Deterministic Packages".

My "proposal" is only because there are dates at some places (and that purely removing them seems unrealistic).

I'm just saying: let's define ONE MSBuild date property, let's it empty by default (following the MSBuild empty property pattern for (re)definition), at the very last moment, use the Conditional == '' to set the UtcNow time (and yes, in such default, unspecified, scenario, we have lost the "Deterministic Package")...

But for ANY other scenario, we can set this MSBuild Date to be what "we" (we being the tool) want...

@rrelyea rrelyea modified the milestones: 5.4, Backlog Oct 31, 2019
@nkolev92 nkolev92 added the Priority:2 Issues for the current backlog. label Apr 23, 2020
@mikeblakeuk
Copy link

Where are we with this? It's super confusing to me.

@erdembayar
Copy link
Contributor

@zivkan
I'm solving this problem with together with other problem #7001 and #7395. Please check this comment.

@nkolev92
Copy link
Member Author

@erdembayar

Deterministic pack being enabled by default broke a lot of tooling (admittedly a fragile one, a dependency that should've never been taken).

Please do not do anything for this issue at this point.

@erdembayar
Copy link
Contributor

@erdembayar

Deterministic pack being enabled by default broke a lot of tooling (admittedly a fragile one, a dependency that should've never been taken).

Please do not do anything for this issue at this point.

Sorry for miscommunication. Actually I'm not really fixing it, but I'm making sure my change it ready for deterministic pack re-enabling later. It seems my change it good for it, I tested locally by manually enabling it, currently it's explicitly disabled in code.
Anywhere I'm not going to do anything for this issue at this point as you suggested.

@olivier-spinelli
Copy link

olivier-spinelli commented Jan 30, 2021

... I'm lost ... @nkolev92, please, could you explain the proposal above that could be named "Tool, Please Choose the Date!" MAY weaken anything?
It's not enabled by default, it's just about deciding a release date (and working in UTC of course...).

If it's the (unfortunate choice of) DateTimeOffset that is blocking it can obviously stay (whith a 0 offset) WHEN the <PackageReleaseDate> property is set.

@zivkan
Copy link
Member

zivkan commented Jan 30, 2021

@olivier-spinelli if you follow the links, you'll find this issue where customers were saying that web deploy was not updating binaries.

In other words, it appears that web deploy is only checking file timestamp. Therefore, if packge.dll has the timestamp in both version 1 and version 2 of the package, web deploy may not detect that the file has changed and therefore needs to be deployed.

Your proposal only works if the package author changes the MSBuild property specifying what file timestamp they want every time they increment the package version. However, the lowest effort, and therefore most likely, usage of your proposal is to set the date once in the project file, and keep using it forever. This will not cause immediate problems to the package author, but will to consumers of the package who are also using web deploy or zip deploy, as explained by the linked issue.

@olivier-spinelli
Copy link

Ok... but... I cannot imagine for 1 second a dev that could hard code this MSBuild property in a .proj file!
This is obviously a property that has to be externally set by tools: by default it simply remains empty (that is: "use now").

And IF someone hard codes it, then he/she has exactly what he/she wants: a fixed released date of its package that leads to binary compatibility of its package.

@John0King
Copy link

Since NuGet probably won't be able to figure out a deterministic real date for a package

why nuget care about the timestamp or date ? shouldn't nuget only care about the binary and version ?

all the binary is not changed for rebuild , but the nuget package is changed even all the nuget property (package id, name, version, author etc.)

the name of <name>.psmdcp can use sha1 or something to compute the binary to get a unchange name for unchanged binary

@nkolev92
Copy link
Member Author

@John0King If the date isn't deterministic then the package content would still be different from build to build, thus not deterministic.

@John0King
Copy link

@nkolev92 why is that ? the build output (eg. output.dll and output.xml) no change is the source code is not change ?

@Kuinox
Copy link

Kuinox commented Aug 29, 2022

The zip register the date of the file, the file contents itself are deterministic.

I made a dotnet tool to make a nuget package deterministic:

https://github.com/Kuinox/NupkgDeterministicator

@John0King
Copy link

@Smaug123
Copy link

Smaug123 commented Jul 11, 2024

Can we not just respect SOURCE_DATE_EPOCH if it's set, like every other tool does? That way, people who care about determinism can opt in using exactly the same mechanism they use for every other tool, and people who don't care can just not.

(By the way, Nixpkgs computes an appropriate value by taking the max of the input file timestamps, but that only works if you've got the source from a tarball or zip file or something that contains file modification metadata. git checkout, for example, doesn't set file metadata this way, so it's probably not a goer.)

@zivkan
Copy link
Member

zivkan commented Jul 11, 2024

When we tried deterministic nupkgs (using the same timestamp on all files, although there were other changes too) back in 2019, some deployment tools stopped working correctly because it didn't notice that when the contents of x.dll changed, but kept the same timestamp.

I didn't spend much time looking into your suggestion, but I don't think that an environment variable is going to fix that problem, and it definitely doesn't allow some people to opt in while other opt out, since the zip/nupkg itself bakes in timestamps. Although I don't know if anyone checked to see if those deployment tools have changed to stop using last modified as the only information to make decisions on.

@Smaug123
Copy link

Personally I'd argue that if anyone is still using broken-by-design deployment tools five years after being made aware of the problem, then it's up to them to fix their tools, but of course it's for the NuGet team (not me) to make decisions about backward compatibility. After all, anyone publishing a NuGet package is free to set their system clock however they want, in principle; person X's deployment system should not have the property that I can break X's deployments just because my computer is suffering from clock skew.

@joshudson
Copy link

It looks a lot less likely to break things to fix the contents of the zip file to be 100% reproducible.

There are two sources of non-reproducibility:

  1. The Id="Rhex" in _rels/.rels ; Clearly this should be a hash of the file contents that is being referred to.
  2. The file name of the .psmdp file; I'm not sure what that guid is supposed to be but it should either be a hashguid of the contents of something (either itself or all the packaged binaries) or it should be specified in the build sources somewhere.

@joshudson
Copy link

Hilariously there is a null value for dates in a zip file, and I think it very likely the MS Zip library blows up encountering it. The value is 0xFFFF for the day part and the time part doesn't matter. This is the intended value for storing dates after the year 2107 as that would be an overflow and overflows convert any field into 0xFFFF.

@nkolev92
Copy link
Member Author

It looks a lot less likely to break things to fix the contents of the zip file to be 100% reproducible.

There are two sources of non-reproducibility:

  1. The Id="Rhex" in _rels/.rels ; Clearly this should be a hash of the file contents that is being referred to.
  2. The file name of the .psmdp file; I'm not sure what that guid is supposed to be but it should either be a hashguid of the contents of something (either itself or all the packaged binaries) or it should be specified in the build sources somewhere.

Digging into the code, the implementation of deterministic pack is still checked in actually.

I think the #1 indeterminism might be driven by #2.

@joshudson
Copy link

@nkolev92 : On re-build and test, the guid and the Rhex reference to the file are not the same. The other Rhex to the nuspec file is the same on rebuild.

The contents of the .psmdp file are the same in both cases so the Rhex isn't a hash of the contents.

@joshudson
Copy link

joshudson commented Oct 19, 2024

Yup it's just generating a new Guid every time. https://github.com/NuGet/NuGet.Client/blob/428d2d0013971187ac5f4ce4fc6402f191cf28e0/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L465

I haven't been able to determine any outside definition of such a file type; it's just embedding an arbitrary Guid here for no apparent reason.

@nkolev92
Copy link
Member Author

@joshudson If you look at the code right above it, there's already a code path that makes that consistent when deterministic packages are specified, so that probably would be solved in a theoretical partial implementation of deterministic pack.

https://github.com/NuGet/NuGet.Client/blob/428d2d0013971187ac5f4ce4fc6402f191cf28e0/src/NuGet.Core/NuGet.Packaging/PackageCreation/Authoring/PackageBuilder.cs#L451-L462

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests