Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement stream-based ZipFile ExtractToDirectory and CreateFromDirectory method overloads #85491

Merged
merged 11 commits into from
May 30, 2023

Conversation

carlossanlop
Copy link
Member

Fixes #1555

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Apr 27, 2023

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #1555

Author: carlossanlop
Assignees: carlossanlop
Labels:

area-System.IO.Compression

Milestone: 8.0.0

Comment on lines 2 to 61
<root>
<!--
Microsoft ResX Schema

Version 2.0

The primary goals of this format is to allow a simple XML format
that is mostly human readable. The generation and parsing of the
various data types are done through the TypeConverter classes
associated with the data types.

Example:

... ado.net/XML headers & schema ...
<resheader name="resmimetype">text/microsoft-resx</resheader>
<resheader name="version">2.0</resheader>
<resheader name="reader">System.Resources.ResXResourceReader, System.Windows.Forms, ...</resheader>
<resheader name="writer">System.Resources.ResXResourceWriter, System.Windows.Forms, ...</resheader>
<data name="Name1"><value>this is my long string</value><comment>this is a comment</comment></data>
<data name="Color1" type="System.Drawing.Color, System.Drawing">Blue</data>
<data name="Bitmap1" mimetype="application/x-microsoft.net.object.binary.base64">
<value>[base64 mime encoded serialized .NET Framework object]</value>
</data>
<data name="Icon1" type="System.Drawing.Icon, System.Drawing" mimetype="application/x-microsoft.net.object.bytearray.base64">
<value>[base64 mime encoded string representing a byte array form of the .NET Framework object]</value>
<comment>This is a comment</comment>
</data>

There are any number of "resheader" rows that contain simple
name/value pairs.

Each data row contains a name, and value. The row also contains a
type or mimetype. Type corresponds to a .NET class that support
text/value conversion through the TypeConverter architecture.
Classes that don't support this are serialized and stored with the
mimetype set.

The mimetype is used for serialized objects, and tells the
ResXResourceReader how to depersist the object. This is currently not
extensible. For a given mimetype the value must be set accordingly:

Note - application/x-microsoft.net.object.binary.base64 is the format
that the ResXResourceWriter will generate, however the reader can
read any of the formats listed below.

mimetype: application/x-microsoft.net.object.binary.base64
value : The object must be serialized with
: System.Runtime.Serialization.Formatters.Binary.BinaryFormatter
: and then encoded with base64 encoding.

mimetype: application/x-microsoft.net.object.soap.base64
value : The object must be serialized with
: System.Runtime.Serialization.Formatters.Soap.SoapFormatter
: and then encoded with base64 encoding.

mimetype: application/x-microsoft.net.object.bytearray.base64
value : The object must be serialized into a byte array
: using a System.ComponentModel.TypeConverter
: and then encoded with base64 encoding.
-->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<root>
<!--
Microsoft ResX Schema
Version 2.0
The primary goals of this format is to allow a simple XML format
that is mostly human readable. The generation and parsing of the
various data types are done through the TypeConverter classes
associated with the data types.
Example:
... ado.net/XML headers & schema ...
<resheader name="resmimetype">text/microsoft-resx</resheader>
<resheader name="version">2.0</resheader>
<resheader name="reader">System.Resources.ResXResourceReader, System.Windows.Forms, ...</resheader>
<resheader name="writer">System.Resources.ResXResourceWriter, System.Windows.Forms, ...</resheader>
<data name="Name1"><value>this is my long string</value><comment>this is a comment</comment></data>
<data name="Color1" type="System.Drawing.Color, System.Drawing">Blue</data>
<data name="Bitmap1" mimetype="application/x-microsoft.net.object.binary.base64">
<value>[base64 mime encoded serialized .NET Framework object]</value>
</data>
<data name="Icon1" type="System.Drawing.Icon, System.Drawing" mimetype="application/x-microsoft.net.object.bytearray.base64">
<value>[base64 mime encoded string representing a byte array form of the .NET Framework object]</value>
<comment>This is a comment</comment>
</data>
There are any number of "resheader" rows that contain simple
name/value pairs.
Each data row contains a name, and value. The row also contains a
type or mimetype. Type corresponds to a .NET class that support
text/value conversion through the TypeConverter architecture.
Classes that don't support this are serialized and stored with the
mimetype set.
The mimetype is used for serialized objects, and tells the
ResXResourceReader how to depersist the object. This is currently not
extensible. For a given mimetype the value must be set accordingly:
Note - application/x-microsoft.net.object.binary.base64 is the format
that the ResXResourceWriter will generate, however the reader can
read any of the formats listed below.
mimetype: application/x-microsoft.net.object.binary.base64
value : The object must be serialized with
: System.Runtime.Serialization.Formatters.Binary.BinaryFormatter
: and then encoded with base64 encoding.
mimetype: application/x-microsoft.net.object.soap.base64
value : The object must be serialized with
: System.Runtime.Serialization.Formatters.Soap.SoapFormatter
: and then encoded with base64 encoding.
mimetype: application/x-microsoft.net.object.bytearray.base64
value : The object must be serialized into a byte array
: using a System.ComponentModel.TypeConverter
: and then encoded with base64 encoding.
-->
<root>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was automatically added by Visual Studio. Shouldn't it be preserved?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a lot of resx files in our repo that preserve the automatically-added info:

I see fewer examples of resx files that do not include that info:

Copy link
Member

@ViktorHofer ViktorHofer May 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we are inconsistent here but it shows up as added in your diff. For this PR, please undo the change. Ideally, we would reach out to the team that owns the VS resx tooling and ask them to add an option to remove these auto-added comments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can remove it. Do you know who the owner of VS resx is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say -- let's do this :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an option is created to control this, we (dotnet/project-system) can make sure it's exposed in the Properties pane in VS for SDK-style projects, likely setting metadata on the MSBuild item. The CSPROJ team would need to do something similar for legacy projects. It's possible that other project flavours would need to be changed to support this everywhere.

I've long wanted to improve the generated comment in the .Designer.cs file too, to make it clearer what the string actually is (such as with a <c> tag) but again was concerned about pushback from zillions of diffs for something that's really just a cosmetic quality-of-life thing, and doesn't impact running code. For the .Designer.cs file, I assume doing this in a source generator would be a better option, but that doesn't help with the .resx comment/XSD.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an option is created to control this

Do we need an option? The idea in my mind is that it's removed from all templates and when editing via ResXResourceWriter or the designer (I don't recall whether that uses ResxResourceXXX or its own code) it is only written out if it previously existed. Would that work?

I agree that the Designer.cs should become a source generator eventually, but that is orthogonal other than touching many of the same components.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewnoakes I would like to open an issue in the right repo to continue the conversation, I'll copy what was mentioned here. I assume dotnet/project-system is the right place?

Copy link
Member

@drewnoakes drewnoakes May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dotnet/project-system owns the Resource Editor UI, but not the code generation. If we don't want an option for this, then I don't think there's any involvement from any project system. Eric's comment is a good summary of the situation.

The Resource Editor in VS uses the .NET Framework classes to read and write the file. I don't think the current object model would track whether such a comment/XSD existed when read in order to know whether to include it on write. That could be changed, but it'd need to be changed in .NET Framework.

@carlossanlop
Copy link
Member Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Source changes LGTM. I only quickly skimmed the tests.

Copy link
Member Author

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestions to address feedback.

@carlossanlop carlossanlop force-pushed the ZipFileStreamMethods branch from 95de910 to 322ca3f Compare May 23, 2023 19:30
@carlossanlop
Copy link
Member Author

carlossanlop commented May 23, 2023

Conflict resolved.
Edit: Wait, no. I did something weird.
Edit 2: I fixed the conflict causing the build failure, but I am seeing a weird test regression in wasm. I'm investigating.

public static void ExtractToDirectory(Stream source, string destinationDirectoryName, Encoding? entryNameEncoding, bool overwriteFiles)
{
ArgumentNullException.ThrowIfNull(source);
if (!source.CanRead)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will throw anyway


throw new ArgumentException(SR.ReadModeCapabilities);

@lewing
Copy link
Member

lewing commented May 25, 2023

It looks like the wasm failure is just that enumerating the files ends up in a different order after extraction?

@carlossanlop
Copy link
Member Author

It looks like the wasm failure is just that enumerating the files ends up in a different order after extraction?

Yep, that seems to be the case. I'll fix it by ordering the results. I wasn't sure that was the root cause at first, the error wasn't clear about that. It's why I pushed f1f23c4 to be able to see the enumeration comparison values in the error:

[02:11:18] info: [2023-05-24T02:11:18.029Z] [FAIL] System.IO.Compression.Tests.ZipFile_Extract_Stream.ExtractToDirectoryRoundTrip
[02:11:18] info: Showing first 10 differences
[02:11:18] info:   Position 2: Expected: notempty, Actual: binary.wmv
[02:11:18] info:   Position 3: Expected: binary.wmv, Actual: notempty
[02:11:18] info: Total number of differences: 2 out of 5
[02:11:18] info:    at System.AssertExtensions.SequenceEqual[String](ReadOnlySpan`1 expected, ReadOnlySpan`1 actual)
[02:11:18] info:    at System.AssertExtensions.SequenceEqual[String](Span`1 expected, Span`1 actual)
[02:11:18] info:    at System.AssertExtensions.SequenceEqual[String](String[] expected, String[] actual)
[02:11:18] info:    at System.IO.Compression.Tests.ZipFileTestBase.DirFileNamesEqual(String actual, String expected)
[02:11:18] info:    at System.IO.Compression.Tests.ZipFile_Extract_Stream.ExtractToDirectoryRoundTrip()
[02:11:18] info:    at System.Reflection.MethodInvoker.InterpretedInvoke(Object obj, IntPtr* args)
[02:11:18] info:    at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)

@carlossanlop carlossanlop force-pushed the ZipFileStreamMethods branch from e402b38 to 42097ce Compare May 29, 2023 04:22
@carlossanlop carlossanlop merged commit cfa28e3 into dotnet:main May 30, 2023
@carlossanlop carlossanlop deleted the ZipFileStreamMethods branch May 30, 2023 16:48
IDisposable added a commit to IDisposable/dotnet-runtime that referenced this pull request May 30, 2023
In dotnet#85491 the stream-based version of the `DoCreateFromDirectory` override copy-pasted comments from the string-based version
```csharp
// Rely on Path.GetFullPath for validation of sourceDirectoryName and destinationArchive
```
The last clause of that comment is not relevant in the stream-based version (the argument is `Stream destination`) so deleted that clause.

It's possible that the other comment is also misleading as we test the `compressionLevel ` argument just above (unlike in the `String destinationArchive` version
```csharp
            // Checking of compressionLevel is passed down to DeflateStream and the IDeflater implementation
            // as it is a pluggable component that completely encapsulates the meaning of compressionLevel.
```
carlossanlop added a commit that referenced this pull request May 31, 2023
* Remove comment in ZipFile.Create.cs

In #85491 the stream-based version of the `DoCreateFromDirectory` override copy-pasted comments from the string-based version
```csharp
// Rely on Path.GetFullPath for validation of sourceDirectoryName and destinationArchive
```
The last clause of that comment is not relevant in the stream-based version (the argument is `Stream destination`) so deleted that clause.

It's possible that the other comment is also misleading as we test the `compressionLevel ` argument just above (unlike in the `String destinationArchive` version
```csharp
            // Checking of compressionLevel is passed down to DeflateStream and the IDeflater implementation
            // as it is a pluggable component that completely encapsulates the meaning of compressionLevel.
```

* Remove compressionLevel validation comment

Co-authored-by: Carlos Sánchez López <[email protected]>

---------

Co-authored-by: Carlos Sánchez López <[email protected]>
@ghost ghost locked as resolved and limited conversation to collaborators Jun 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

System.IO.Compression.ZipFile.CreateFromDirectory: overloads to write to a stream rather than a file
9 participants