
Non-standard zip file #1

Closed
pmqs opened this issue Feb 7, 2020 · 4 comments
Labels
bug Something isn't working

@pmqs

pmqs commented Feb 7, 2020

I've just answered a question on StackOverflow (here) about an issue someone was having uncompressing a zip file from your site https://www.geoboundaries.org/. The exact file was https://www.geoboundaries.org/data/geoBoundaries-2_0_0/NGA/ADM1/geoBoundaries-2_0_0-NGA-ADM1-all.zip. I've checked other zip files on your site and they all have the same issue.

The fundamental issue is that the filename stored for each member in the central directory and the filename in the matching local file header should be identical. In these zip files they are not.

I'm seeing the central directory filename entries without any leading path components, while the matching local header entries do have a leading path. For example, the central directory stores geoBoundaries-2_0_0-NGA-ADM1-shp.zip, but the local header stores release/geoBoundaries-2_0_0/NGA/ADM1/geoBoundaries-2_0_0-NGA-ADM1-shp.zip.

That means the zip files are badly-formed. See the StackOverflow write-up for more details.
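One way to confirm the mismatch is to compare, for each member, the name in the central directory with the name in its local file header. The sketch below is a hypothetical helper (`find_name_mismatches` is not part of zipfile or this repository). It assumes the archive fits in memory, and it reads the fixed 30-byte local header layout from the zip specification at the offset zipfile reports in `ZipInfo.header_offset`.

```python
import io
import struct
import zipfile

# Fixed-size part of a zip local file header (30 bytes, little-endian):
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
LOCAL_SIGNATURE = b"PK\x03\x04"
UTF8_NAME_FLAG = 0x800  # general purpose bit 11: name is UTF-8 encoded


def find_name_mismatches(data):
    """Return (central_name, local_name) pairs that differ.

    `data` is the raw archive bytes, e.g. open(path, "rb").read().
    """
    fileobj = io.BytesIO(data)
    with zipfile.ZipFile(fileobj) as zf:
        entries = zf.infolist()  # names here come from the central directory
    mismatches = []
    for info in entries:
        # header_offset points at this member's local file header.
        fileobj.seek(info.header_offset)
        fields = LOCAL_HEADER.unpack(fileobj.read(LOCAL_HEADER.size))
        signature, flags, name_len = fields[0], fields[2], fields[9]
        if signature != LOCAL_SIGNATURE:
            raise ValueError(f"no local header at offset {info.header_offset}")
        encoding = "utf-8" if flags & UTF8_NAME_FLAG else "cp437"
        local_name = fileobj.read(name_len).decode(encoding)
        if local_name != info.filename:
            mismatches.append((info.filename, local_name))
    return mismatches
```

An empty list means every central directory name agrees with its local header.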

There are a couple of implications of the zip files being in their current format:

  1. The uncompression behaviour will vary depending on the unzipping utility used.
    The person reporting the issue on StackOverflow saw two different behaviours.

  2. The zip file may get flagged as malicious.
    See here for details.

Looking briefly at the Python code in this repository, it appears that zipfile is being used. I didn't think Python would let you create a badly-formed zip file.

Is that how you are creating these zip files?

@pmqs
Author

pmqs commented Feb 8, 2020

I see the reason for the badly-formed zip files. There are a few places in the code where something like this is used to write files into a zip archive.

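  with zipfile.ZipFile(allZipPath, 'w') as allZip:
    for f in allFilesToZip:
      # The local header for f is written here, using its full path.
      allZip.write(f, compress_type=zipfile.ZIP_DEFLATED)
    for zip_info in allZip.infolist():
        if zip_info.filename[-1] == '/':
            continue
        # Renaming after the write only changes the central directory entry,
        # which is written when the archive is closed.
        zip_info.filename = os.path.basename(zip_info.filename)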

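The problem is with the second half of the code, where it uses the ZipInfo objects to remove the leading path components of the files that have just been added. By then each local header has already been written with the full path, so changing zip_info.filename only affects the central directory entry that is written when the archive is closed. That is not the way to achieve what you want.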
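A minimal in-memory reproduction of that sequence, assuming CPython's zipfile. `build_archive` and `first_unreadable` are illustrative names, and the member path is made up to resemble the one above. `writestr` stands in for `write` so the sketch needs no files on disk; both write the member's local header at the moment it is added.

```python
import io
import zipfile


def build_archive(rename_after_write):
    """Add one member with a nested path, optionally renaming it afterwards
    the way the loop quoted above does."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The local header is written here, using the full path.
        zf.writestr("release/NGA/ADM1/example-shp.zip", b"payload")
        if rename_after_write:
            # Only the in-memory ZipInfo changes; it is what gets written
            # to the central directory when the archive is closed.
            for info in zf.infolist():
                info.filename = info.filename.rsplit("/", 1)[-1]
    return buf


def first_unreadable(buf):
    """Return the first member zipfile refuses to read back, or None."""
    with zipfile.ZipFile(buf) as zf:
        return zf.testzip()
```

With `rename_after_write=True`, `first_unreadable` should report the renamed member: when zipfile opens a member for reading it checks that the local header name matches the central directory name and raises `BadZipFile` if they differ, which `testzip()` catches and reports.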

You should use the arcname option in the write method to force zipfile to use the filename you want.

Something like this should do what you need.

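import os
import zipfile

with zipfile.ZipFile("abc1.zip", 'w') as allZip:
    for f in allFilesToZip:
        aname = f
        # Keep directory entries (ending in '/') as-is; store files by basename.
        if not aname.endswith('/'):
            aname = os.path.basename(aname)
        # arcname is used for both the local header and the central directory.
        allZip.write(f, arcname=aname, compress_type=zipfile.ZIP_DEFLATED)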
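The same arcname approach, wrapped as a hypothetical `zip_flat` helper and checked end to end against a temporary directory. The duplicate-name check is an added assumption: flattening paths to basenames can make two different source files collide, and zipfile would only emit a "Duplicate name" warning, so this sketch raises an error instead.

```python
import os
import tempfile
import zipfile


def zip_flat(zip_path, files):
    """Write each file into zip_path under its basename (hypothetical helper)."""
    seen = set()
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path in files:
            arcname = os.path.basename(path)
            # Flattening can map two different paths to one name; stop
            # instead of storing duplicate members.
            if arcname in seen:
                raise ValueError(f"duplicate archive name: {arcname}")
            seen.add(arcname)
            # The name is fixed before the local header is written, so the
            # local header and central directory agree.
            zf.write(path, arcname=arcname, compress_type=zipfile.ZIP_DEFLATED)


def demo():
    """Flatten one nested file into a new archive and read the archive back."""
    with tempfile.TemporaryDirectory() as tmp:
        nested = os.path.join(tmp, "release", "NGA", "ADM1")
        os.makedirs(nested)
        src = os.path.join(nested, "example-shp.zip")
        with open(src, "wb") as fh:
            fh.write(b"payload")
        out = os.path.join(tmp, "all.zip")
        zip_flat(out, [src])
        with zipfile.ZipFile(out) as zf:
            # namelist() comes from the central directory; testzip() opens
            # every member, which also checks its local header name.
            return zf.namelist(), zf.testzip()
```

`demo()` is expected to return the single flattened name and `None` from `testzip()`, meaning every member could be read back.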

@DanRunfola
Member

Wow, thank you for this incredibly in-depth analysis (and response on Stack Overflow)! We'll get this fix wrapped into the next minor release.

@DanRunfola DanRunfola self-assigned this Mar 3, 2020
@DanRunfola DanRunfola added the bug Something isn't working label Mar 3, 2020
@DanRunfola DanRunfola added this to the 2.0.1 milestone Mar 3, 2020
@DanRunfola
Member

I just pushed geoboundaries 2.0.1, which contains this fix (alongside a few other minor fixes). Thanks again for the report and help with the solution!

@pmqs
Author

pmqs commented Mar 5, 2020

No problem. Glad to help.

DanRunfola added a commit that referenced this issue Aug 29, 2020
Updating my local copy with master
DanRunfola pushed a commit that referenced this issue Sep 13, 2020
updating my local copy with master
slfuhrig pushed a commit that referenced this issue Sep 18, 2020
Updating Local - LindseyR

2 participants