-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.IO.Compression test asset and native dependency proposal #14403
Comments
Another alternative might be forking https://github.com/madler/zlib into https://github.com/dotnet and applying the few changes there, to keep the corefx repo on a single license. |
Thank you, very good suggestion, I'll reach out to Mark. |
Though it does seem reasonable to me to have the test assets in separate nuget packages, I'd probably would just bypass nuget, and add a build target to download a specific revision of the assets separate GitHub repository. One can even take advantage of GitHub builtin download feature to bypass git and just get the files at a specific commit (target would then download and unzip the assets). |
Hi @pedrolamas - thanks for the reply. To elaborate, I planned to host the test asset nuget packages on a myget.org feed "dotnet-corefxtestdata", because we already have dependencies on myget.org and nuget.org engrained in to our build. I'd like to avoid introducing for the first time a build-time dependency on github.com unless there was a compelling reason to. I'm also unfamiliar with the github feature to download files at a specific commit - but would that download all files in the repository? By splitting the test assets in to individual nuget packages per test assembly, I'd hoped to prevent individuals running tests for one assembly to need to download assets for another assembly - see in the spec where System.IO.Packaging has 60GB of total test assets. Actually, if some libraries end up having very large test assets across all their tiers, that may cause us to want to split the test asset git repositories in the future, too. In the speclet above I said that all the assets would be in a single repository, what do people think of that? It's turned out to be very convenient for us to have the corefx libraries share the same corefx repository. The advantages we've seen include a single issue and pull request list, single URL we can point people to in documentation, and single area for people to collaborate together, contribute and witness progress. Full disclosure - I was originally a proponent of having more repositories with fewer libraries, but in hindsight I'm glad we did what we did. |
Although I understand the desire to not commit large binary assets to the same repo, I am not especially comfortable with keeping test assets in another repo. As an alternative, would it be possible to generate these assets, with the code to generate them checked in to the main repo?
|
Also, is it possible to look at the existing tests and understand whether the size and number of existing tests assets is actually required? 60GB of test assets feels particularly excessive.
|
Sounds fine to me. In reality, almost no-one is going to clone that repo anyway and the test data is not changing constantly as well, so the arguments for "don't put binaries in the repo" don't really apply. Having said that, the 60 gigabytes of test assets are a different story, not sure if that should really be checked into a git repository (does GitHub even support repos with that size)? Personally, I don't really care if those assets are published in a repo, having them only on myget and internally on some MS file share should be enough.
The problem with that is you can't be sure you get the exact same results in a few years from now cause dependencies, etc. might change and you might not notice that you regressed something. |
Yes... 60GB is a lot... I don't directly own that area so I don't have full context on the test assets, so I'm not sure if they should be reduced but operating on the assumption that they shouldn't. Unfortunately @eatdrinksleepcode it's not feasible to generate these test assets on the fly. To elaborate on @akoeplinger 's response, many of these test assets are designed to maintain compatibility with binary formats. In System.IO.Compression's example, we generated archives with 7-Zip, Windows Explorer, and other compression libraries to ensure compatibility. @akoeplinger - that was my original thought for jamming them all in the same repository but splitting it in to individual nuget packages for test time. We'll have to see what happens when System.IO.Packaging comes online. If we didn't keep the System.IO.Packaging assets in a git repository, where would we keep the nuspec to regenerate new versions of the package? Keep the nuspec in the repo, and the assets on myget, with a workflow to update the package of:
|
@akoeplinger Why would that be the case? Our build should always be deterministic; if it's not, we need to fix that. And there is no reason for the dependencies of the binary generator to change, so I don't see that as a problem either. |
@eatdrinksleepcode as Matt mentioned, some of them are designed to ensure compatibility. You'd need to e.g. keep a Windows XP machine around to generate a ZIP file with Windows Explorer as later OSes might've changed the implementation. @FiveTimesTheFun .nuspec in the repo, yep. sounds acceptable. |
GitHub announced Git Large File Storage (LFS) a couple days ago, which might be useful for the test assets. |
In regards to the native dependencies, I maintain a fork of the Zlib Silverlight implementation from DotNetZip that is portable, and currently distributed on NuGet. Would it make sense to provide that code for use in the Framework? HTH! |
While zlib is quite stable/secure, I would like to see minimal friction around updates to native libraries in general. Anything we can do to ease the pain of a native binary nuget package would be wonderful for all of us. We've hacked our tree together on AppVeyor, but have to use custom scripts to extract the nuget packages and assembly the components. |
@nathanaeljones , we're actively working with the NuGet folks to enable better support for native dependencies. |
@FiveTimesTheFun, can this be closed now? |
There are much faster versions of zlib. Consider Intel or Cloudflare's fork of 1.2.8: http://www.htslib.org/benchmarks/zlib.html Any reason you are using a buggy 2005 edition of zlib (v1.2.3) instead of 1.2.8? |
Closing since @FiveTimesTheFun went ahead and created https://github.com/dotnet/corefx-testdata back in April to store Compression, Packaging, and X509 Cert test data. |
System.IO.Compression provides an API to compress and decompress files. It supports the ZIP and gzip archive formats and the Deflate compression algorithm.
_Testing_
System.IO.Compression has existing test assets that fit three categories:
The existing assets include archives, as well as decompressed binary files both for use as source material, and an oracle to verify decompression. Altogether, the binary assets weigh in at about 40MB.
Problem: these test assets need to exist at test run time. Placing them in our corefx git repository is problematic because binary assets do not compress well and will bloat the size of our repository.
Proposal: Binary test assets are packaged in to a nuget package per test assembly to be deployed at test run time. The binary assets to generate these nuget package are persisted in a single separate Git repository.
Git repository structure:
corefx-testdata
corefx-testdata\System.IO.Compression.TestData.nuspec
corefx-testdata\System.IO.Compression.TestData<binary assets>
Nupkg structure:
content<binary assets>
Test asset nuget packages are depended on by the test assembly’s packages.config
Currently, nuget is designed to deploy at package reference time (ie. when a package is referenced in Visual Studio) rather than runtime. In Dev14 this is changing. Until then, we’ll include a target in the test assembly’s csproj to copy the assets from the downloaded nuget package to the test’s working directory.
In addition to System.IO.Compression, metadata reader uses binary test assets, but as they are only 100KB combined and are not a practical problem. System.IO.Packaging has 60GB of test assets.
Proposal: System.IO.Packaging split their test assets in to tiers, with a single nuget package per tier. The inner loop tests should be no more than some number of megabytes.
_Native Dependencies_
In the desktop .NET Framework, although we have a purely managed implementation of the deflate algorithm, customers have reported that the popular cross-platform open-source zlib library is faster and performs high quality compression. In response we include zlib, which we wrap and delegate to. We build it from source using the unmodified 1.2.3 version of the sources in the FX partition, and name the resulting output “clrcompression.dll.”
On Windows, to fit with .NET Core’s app-local, side-by-side model, if we include zlib, we should deploy a copy of the native assembly alongside the managed System.IO.Compression.dll assembly included with an application. zlib distributes both sources and official binaries. While we could redistribute the official zlib binary, unfortunately they only release a 32-bit version and we are currently distributing a 64-bit version of coreclr.
While zlib has a permissive license, it is not the MIT license, and including it in the corefx repository would mean the corefx repository no longer only contained code under a single license.
Proposal: Continue building clrcompression.dll from the FX repository. Publish the 64-bit version of clrcompression.dll in a nuget package System.IO.Compression.clrcompression-x64.nupkg. Make System.IO.Compression.clrcompression-x64 a dependency of the System.IO.Compression package.
X-Plat todo: Linux has a strong history of deploying applications and their dependencies through package managers. Packagers bundle the application binaries and metadata describing dependencies in to packages, and publish those on package manager repositories. We are planning to declare other framework and runtime dependencies through package managers where appropriate, and System.IO.Compression’s zlib dependency should be consistent.
The text was updated successfully, but these errors were encountered: