zip `2.1.3` regressed large zips with >64k files #189

Swatinem · 2024-06-07T09:50:51Z

I have a huge zip file with well over 64k files.

Version 2.1.1 was able to correctly parse that because inside of get_metadata, there would only be one ok_results, the one with the correct number of files.

Version 2.1.3 however regressed that behavior, and it would parse both zip64 and zip32 indices, reading both indices into an IndexMap, and then picking the wrong one which has the number of files capped at 64k.

As the dir_start is the same for both, max_by_key picks the last one as per its documentation.

As mentioned, version 2.1.1 rejects the index with the capped number of files and it fails somewhere in central_header_to_zip_file, though I haven’t debugged it deeper.

This regression might be related to 8efd233, or 68f7f5d which are touching the relevant code, though I’m not quite sure about that.

The text was updated successfully, but these errors were encountered:

Swatinem · 2024-06-07T10:01:18Z

Another interesting result here is that performance also suffers because it is collecting into two IndexMaps. A better idea might be to try using the "better" directory info first, and just return that right away when it succeeds parsing and collecting also the file entries.
If that fails for whatever reason, then fall back to a secondary directory, potentially even clearing and reusing the already allocated IndexMap.

cosmicexplorer · 2024-06-13T21:14:12Z

I suspect #93 is likely the culprit. Will take a look at this.

cosmicexplorer · 2024-06-13T21:17:45Z

@Swatinem could I ask you to provide a repro case as in OP of #138? Your investigation seems very likely to be correct (thanks so much!) but I would like to be sure of the result.

…present (#189)

Swatinem · 2024-06-14T11:23:41Z

Thanks for taking a look here.

I was able to quickly reproduce this by using an older version of zip to create the archive, and adding a custom header in front.

This is to mimic the usage within https://github.com/getsentry/symbolic/blob/master/symbolic-debuginfo/src/sourcebundle.rs as closely as possible.

I can confirm using that testcase that version 2.1.3 throws a FileNotFound error, whereas master (which I believe will be published as 2.1.4 pretty soon) runs that testcase fine:

    #[test]
    fn test_64k() {
        let mut buffer = vec![];
        buffer.write_all(b"SYSB").unwrap();
        buffer.write_all(&0u32.to_le_bytes()).unwrap();

        let cursor = Cursor::new(buffer);
        let mut zipwriter = zip064::write::ZipWriter::new(cursor);

        let opt =
            zip064::write::FileOptions::default().last_modified_time(zip064::DateTime::default());

        for i in 0..100_000 {
            let file_contents = format!("{i}.txt");
            zipwriter.start_file(&file_contents, opt).unwrap();
            zipwriter.write_all(file_contents.as_bytes()).unwrap();
        }

        let cursor = zipwriter.finish().unwrap();

        let mut zipreader = zip213::read::ZipArchive::new(cursor).unwrap();

        for i in 0..100_000 {
            let expected_contents = format!("{i}.txt");
            let mut file = zipreader.by_name(&expected_contents).unwrap();
            let mut file_contents = String::new();
            file.read_to_string(&mut file_contents).unwrap();
            assert_eq!(file_contents, expected_contents);
        }
    }

feel free to adopt this to your testsuite, though mind you that it runs very slowly on a debug build.

…or slowly (#189)

Pr0methean · 2024-06-22T03:25:13Z

Done adapting it, and I've now used it to test a fix with some refactoring that avoids enumerating the files repeatedly and unnecessarily. When writing a lot of tiny files, it's important to choose CompressionMethod::Stored.

Swatinem added the bug Something isn't working label Jun 7, 2024

Swatinem mentioned this issue Jun 7, 2024

Downgrade and pin zip to fix SourceBundles with >64k files getsentry/symbolic#846

Merged

Shatur mentioned this issue Jun 13, 2024

Performance issue after upgrading to 1.3.0 #138

Closed

cosmicexplorer self-assigned this Jun 13, 2024

Pr0methean added a commit that referenced this issue Jun 13, 2024

perf: Skip searching for the ZIP32 header if a valid ZIP64 header is …

3eb2a04

…present (#189)

Pr0methean mentioned this issue Jun 13, 2024

chore: release v2.1.4 #191

Closed

Pr0methean mentioned this issue Jun 15, 2024

chore: release v2.1.4 #194

Merged

Pr0methean added fix merged Fix is merged, but the issue will stay open until it's released. and removed fix merged Fix is merged, but the issue will stay open until it's released. labels Jun 22, 2024

Pr0methean added a commit that referenced this issue Jun 22, 2024

fix: Some archives with over u16::MAX files were handled incorrectly …

c934c82

…or slowly (#189)

syphar mentioned this issue Jul 12, 2024

Crate succeeds to build, but some pages are unreachable rust-lang/docs.rs#2536

Closed

Pr0methean closed this as completed Jul 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zip `2.1.3` regressed large zips with >64k files #189

zip `2.1.3` regressed large zips with >64k files #189

Swatinem commented Jun 7, 2024

Swatinem commented Jun 7, 2024

cosmicexplorer commented Jun 13, 2024

cosmicexplorer commented Jun 13, 2024

Swatinem commented Jun 14, 2024

Pr0methean commented Jun 22, 2024

zip 2.1.3 regressed large zips with >64k files #189

zip 2.1.3 regressed large zips with >64k files #189

Comments

Swatinem commented Jun 7, 2024

Swatinem commented Jun 7, 2024

cosmicexplorer commented Jun 13, 2024

cosmicexplorer commented Jun 13, 2024

Swatinem commented Jun 14, 2024

Pr0methean commented Jun 22, 2024

zip `2.1.3` regressed large zips with >64k files #189

zip `2.1.3` regressed large zips with >64k files #189