Large zip file cannot be read by Excel #388
Comments
Also, the file was created with the following zip.rs options:
pub(crate) fn new(writer: W) -> Packager<W> {
    let zip = zip::ZipWriter::new(writer);
    let zip_options = FileOptions::default()
        .compression_method(zip::CompressionMethod::Deflated)
        .unix_permissions(0o600)
        .last_modified_time(DateTime::default())
        .large_file(false);
    Packager { zip, zip_options }
}
And the following dependency in Cargo.toml:
zip = {version = "0.6.4", default-features = false, features = ["deflate"]}
Turning on the Also, and I have no idea if this is helpful, but I get the same bad file result when I add the |
Looking at the
EDIT: In case it's not easy to regenerate the file, you can modify the field by replacing
#!/usr/bin/env python3
with open('temp.xlsx', 'rb+') as f:
    f.seek(0xCAA3A20)
    assert f.read(1) == b'\x2e'
    f.seek(0xCAA3A20)
    f.write(b'\x2d')
zipdetails outputs: Local file header for
|
Thanks for the analysis. I'll try the suggested fixes and let you know. |
@chenxiaolong That was a good guess. Changing the "Created Zip Spec" record to 0x2D (via the Python program) didn't help but changing "Extract Zip Spec" (or both) did. Here is the "ZIP64 END CENTRAL DIR" header from the modified file that will open in Excel:
|
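For anyone who wants to check these two fields without zipdetails, here is a minimal Rust sketch that reads them straight from the file bytes. It assumes the ZIP64 end of central directory layout from the ZIP specification: a 4-byte signature `PK\x06\x06`, an 8-byte record size, then the 2-byte "version made by" and 2-byte "version needed to extract" fields. The `Zip64Versions` type and `find_zip64_versions` function are hypothetical names for this example and are not part of zip.rs. Taking the last occurrence of the signature is a heuristic, since the same bytes could in principle appear inside compressed data; a robust reader would follow the ZIP64 locator instead.

```rust
/// Signature of the ZIP64 end of central directory record ("PK\x06\x06").
const ZIP64_EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4b, 0x06, 0x06];

/// Version fields read from a ZIP64 end of central directory record.
/// Hypothetical helper type for this sketch, not part of zip.rs.
#[derive(Debug, PartialEq)]
struct Zip64Versions {
    /// Byte offset of the record signature within the buffer.
    record_offset: usize,
    /// Low byte of "version made by" (zipdetails: "Created Zip Spec").
    created_zip_spec: u8,
    /// Low byte of "version needed to extract" (zipdetails: "Extract Zip Spec").
    extract_zip_spec: u8,
}

/// Locate the ZIP64 end of central directory record and read its version bytes.
/// Returns `None` if the signature is missing or the record is truncated.
fn find_zip64_versions(data: &[u8]) -> Option<Zip64Versions> {
    // Assumption: the last occurrence of the signature is the real record.
    let record_offset = data
        .windows(4)
        .rposition(|w| w == &ZIP64_EOCD_SIGNATURE[..])?;

    // Layout after the signature: record size (8 bytes), version made by
    // (2 bytes), version needed to extract (2 bytes). The version fields are
    // little-endian, so the spec version is the first byte of each.
    let created_zip_spec = *data.get(record_offset + 12)?;
    let extract_zip_spec = *data.get(record_offset + 14)?;

    Some(Zip64Versions {
        record_offset,
        created_zip_spec,
        extract_zip_spec,
    })
}
```

Calling `find_zip64_versions(&std::fs::read("temp.xlsx")?)` on the original file should report 0x2E (46) for both fields if the analysis above is right, and 0x2D (45) for the extract field after patching.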
Awesome, glad that worked! I didn't notice I accidentally grabbed the |
For additional context, I managed to get the Here is the "ZIP64 END CENTRAL DIR" header from the Excel generated file. It seems like Excel prefers the 4.5 Spec:
|
If the The > 65k file entries is a little bit unusual but there are more common cases of xlsx files that need ZIP64 support and the 4.6/4.5 issue would block those. |
In that case would a fix to only use 4.6 when the
diff --git a/src/types.rs b/src/types.rs
index c3d0a45..e2f44d4 100644
--- a/src/types.rs
+++ b/src/types.rs
@@ -272,7 +272,10 @@ impl TryFrom<OffsetDateTime> for DateTime {
}
}
+#[cfg(feature = "bzip2")]
pub const DEFAULT_VERSION: u8 = 46;
+#[cfg(not(feature = "bzip2"))]
+pub const DEFAULT_VERSION: u8 = 45;
/// A type like `AtomicU64` except it implements `Clone` and has predefined
/// ordering. |
It looks to me that the version needed to extract in the zip64 record still uses the old fixed method. The non-64 version is already written dynamically, as can be seen here. Should be easy enough to fix. |
@mvdnes thanks for the pointer. I initially thought of something like this:
diff --git a/src/write.rs b/src/write.rs
index 4cdc031..f9ef3fa 100644
--- a/src/write.rs
+++ b/src/write.rs
@@ -836,12 +836,19 @@ impl<W: Write + io::Seek> ZipWriter<W> {
}
let central_size = writer.stream_position()? - central_start;
+ let max_version_needed = self
+ .files
+ .iter()
+ .map(|f| f.version_needed())
+ .max()
+ .unwrap_or(DEFAULT_VERSION as u16);
+
if self.files.len() > spec::ZIP64_ENTRY_THR
|| central_size.max(central_start) > spec::ZIP64_BYTES_THR
{
let zip64_footer = spec::Zip64CentralDirectoryEnd {
- version_made_by: DEFAULT_VERSION as u16,
- version_needed_to_extract: DEFAULT_VERSION as u16,
+ version_made_by: max_version_needed,
+ version_needed_to_extract: max_version_needed,
disk_number: 0,
disk_with_central_directory: 0,
number_of_files_on_this_disk: self.files.len() as u64,
However, that will only set the version to 4.5/45 if one of the file sizes is > 4GB and not for the case where the number of files > 64k. So that probably isn't correct. My previous Any suggestions on a better way to handle this? |
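One possible way to cover both cases, sketched here as standalone Rust rather than as a patch to zip.rs: take the maximum per-file version, but never go below 4.5 whenever the ZIP64 record is being written at all. The constants and function names below are hypothetical stand-ins (the thresholds mirror the `spec::ZIP64_ENTRY_THR` and `spec::ZIP64_BYTES_THR` checks in the diff above, with values assumed from the ZIP format limits), and this is not necessarily the fix that was adopted upstream.

```rust
/// Assumed ZIP format limits above which ZIP64 records are required.
const ZIP64_ENTRY_THR: usize = 0xFFFF;
const ZIP64_BYTES_THR: u64 = 0xFFFF_FFFF;

/// Spec version 2.0 (deflate), used when there are no per-file versions.
const VERSION_DEFAULT: u16 = 20;
/// Spec version 4.5, the minimum for ZIP64 extensions.
const VERSION_ZIP64: u16 = 45;

/// Same condition as the diff above: too many entries, or a central
/// directory size/offset that does not fit in 32 bits.
fn needs_zip64(entry_count: usize, central_size: u64, central_start: u64) -> bool {
    entry_count > ZIP64_ENTRY_THR || central_size.max(central_start) > ZIP64_BYTES_THR
}

/// Version to write into the ZIP64 end of central directory record, or
/// `None` if no ZIP64 record is needed.
/// `file_versions` holds each entry's "version needed to extract".
fn zip64_record_version(
    file_versions: &[u16],
    central_size: u64,
    central_start: u64,
) -> Option<u16> {
    if !needs_zip64(file_versions.len(), central_size, central_start) {
        return None;
    }
    let max_file_version = file_versions
        .iter()
        .copied()
        .max()
        .unwrap_or(VERSION_DEFAULT);
    // The record itself is a ZIP64 structure, so require at least 4.5 even
    // when every individual file only needs 2.0.
    Some(max_file_version.max(VERSION_ZIP64))
}
```

With 87814 small deflated entries this yields 45, and it only rises to 46 when some entry actually needs 4.6 (for example bzip2).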
Closing here and moving to zip2: zip-rs/zip2#100 |
I have a library called rust_xlsxwriter that uses zip.rs to create xlsx files. The crate is reasonably well used and to date I haven't had any issues from Excel, either personally or from end users.
However, one user who is creating large (but not large_file large) 400MB files with lots of images reported an issue with the following file:
Bad file: basic-full.xlsx
Excel complains about this almost immediately when it tries to open it:
I don't see any issues in the zipped XML data (the content) and if I unzip and rezip the file then Excel can open it:
Good file: basic-rezip.xlsx
Also, a Microsoft Open XML SDK validation tool complains about the zip file when I try to open the bad file:
So it looks to me like there is an issue with the zip container.
The bad file above was created with zip.rs v0.6.4 but I get the same result with v0.6.6. See the following for additional details: jmcnamara/rust_xlsxwriter#51
Also, I'm second-guessing that the issue is with large files. It could be related to the number of files in the zip: 87814.
Thanks for taking a look at this and let me know if you need any more data.