Skip to content
This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

ArchiveFS.ReadDir does not list implicit directories in ZIP archives. #338

Closed
jeremyje opened this issue Jun 15, 2022 · 2 comments · Fixed by #339
Closed

ArchiveFS.ReadDir does not list implicit directories in ZIP archives. #338

jeremyje opened this issue Jun 15, 2022 · 2 comments · Fixed by #339

Comments

@jeremyje
Copy link
Contributor

What version of the package or command are you using?

v4.0.0-alpha.7

What are you trying to do?

ArchiveFS.ReadDir with a zip file that does not have explicit directory entries.

zip -qr9 ../nodir-testassets.zip index.html assets/1.txt assets/2.txt site.js "weird #1.txt" weird#.txt weird$$.txt assets/more/3.txt assets/four/4.txt assets/fivesix/5.txt assets/fivesix/6.txt
unzip -l nodir-testassets.zip 
Archive:  nodir-testassets.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       10  2022-06-04 05:25   index.html
       12  2022-06-04 05:25   assets/1.txt
       12  2022-06-04 05:25   assets/2.txt
        7  2022-06-04 05:25   site.js
       12  2022-06-08 06:50   weird #1.txt
       10  2022-06-07 08:27   weird#.txt
       10  2022-06-07 08:27   weird$.txt
       17  2022-06-04 05:25   assets/more/3.txt
       17  2022-06-04 05:25   assets/four/4.txt
       20  2022-06-04 05:25   assets/fivesix/5.txt
       20  2022-06-04 05:25   assets/fivesix/6.txt
---------                     -------
      147                     11 files

Archiver works fine when there are explicit directory entries like:

zip -qr9 ../testassets.zip *
cd ..
unzip -l testassets.zip
Archive:  testassets.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2022-06-04 05:25   assets/
        0  2022-06-04 05:25   assets/images/
    50603  2022-06-04 05:25   assets/images/rocks.jpg
       12  2022-06-04 05:25   assets/images/1.txt
     2335  2022-06-04 05:25   assets/images/Example.png
    13508  2022-06-04 05:25   assets/images/300.jpg
       10  2022-06-04 05:25   assets/images/index.html
    68793  2022-06-04 05:25   assets/images/ocean.jpg
    39681  2022-06-04 05:25   assets/images/nature.jpg
     3430  2022-06-04 05:25   assets/images/300.webp
       12  2022-06-04 05:25   assets/images/2.txt
        7  2022-06-04 05:25   assets/images/site.js
    49396  2022-06-04 05:25   assets/images/underwater.jpg
       12  2022-06-04 05:25   assets/1.txt
        0  2022-06-04 05:25   assets/more/
       17  2022-06-04 05:25   assets/more/3.txt
        0  2022-06-04 05:25   assets/fivesix/
       20  2022-06-04 05:25   assets/fivesix/5.txt
       20  2022-06-04 05:25   assets/fivesix/6.txt
       12  2022-06-04 05:25   assets/2.txt
        0  2022-06-04 05:25   assets/four/
       17  2022-06-04 05:25   assets/four/4.txt
       10  2022-06-04 05:25   index.html
        7  2022-06-04 05:25   site.js
       12  2022-06-08 06:50   weird #1.txt
       10  2022-06-07 08:27   weird#.txt
       10  2022-06-07 08:27   weird$.txt
---------                     -------
   227934                     27 files

What steps did you take?

ArchiveFS.ReadDir("assets")

This returns:

assets/1.txt
assets/2.txt
assets

What did you expect to happen, and what actually happened instead?

ArchiveFS.ReadDir("assets")

Should Return

assets/fivesix/
assets/four/
assets/more/
assets/1.txt
assets/2.txt

How do you think this should be fixed?

I'm guessing the err = f.Format.Extract(f.context(), inputStream, filter, handler) call should be smarter to understand that if there are files listed with directories not represented in the dir list that they should have a implicitDirEntry{} entry stubbed in.

// ReadDir reads the named directory from within the archive.
func (f ArchiveFS) ReadDir(name string) ([]fs.DirEntry, error) {
	...
	handler := func(_ context.Context, file File) error {
		// directories may end with trailing slash; standardize name
		trimmedName := strings.Trim(file.NameInArchive, "/")

		// don't include the named directory itself in the list of entries
		if trimmedName == name {
			return nil
		}

		// items added to an archive depth-first results in the subfolder entry being
		// added to the archive after all the files within it, meaning we won't have
		// the chance to return SkipDir before traversing into it; so we have to also
		// check if we are within a subfolder deeper than the requested name (because
		// this is a ReadDir function, we do not intend to traverse subfolders) (issue #310)
		// in other words, archive entries can be created out-of-(breadth-first)-order,
		// or even an arbitrary/random order, and we need to make sure we get all entries
		// in just this directory
		if path.Dir(trimmedName) != name {
			// additionally, some archive files don't have actual entries for folders,
			// leaving them to be inferred from the names of files instead (issue #330)
			// so as we traverse deeper, we need to implicitly find subfolders within
			// this current directory and add fake entries to the output
			remainingPath := strings.TrimPrefix(file.NameInArchive, name)
			nextDir := topDir(remainingPath)        // if name in archive is "a/b/c" and root is "a", this becomes "b" (the implied folder to add)
			implicitDir := path.Join(name, nextDir) // the full path of the implied directory

			// create fake entry only if no entry currently exists (don't overwrite a real entry)
			if _, ok := entries[implicitDir]; !ok {
				entries[implicitDir] = implicitDirEntry{implicitDir}
			}

			return fs.SkipDir
		}

		entries[file.NameInArchive] = fs.FileInfoToDirEntry(file)

		// don't traverse deeper into subfolders
		if file.IsDir() {
			return fs.SkipDir
		}

		return nil
	}

	// handle special case of reading from root of archive
	var filter []string
	if name != "." {
		filter = []string{name}
	}

	var inputStream io.Reader = archiveFile
	if f.Stream != nil {
		inputStream = io.NewSectionReader(f.Stream, 0, f.Stream.Size())
	}

	err = f.Format.Extract(f.context(), inputStream, filter, handler)
	if err != nil {
		return nil, err
	}

        ...
}

Please link to any related issues, pull requests, and/or discussion

jeremyje/gowebserver#64

You can trigger this problem by editing xTestDirlessArchive to TestDirlessArchive. Then run the tests via:

git clone [email protected]:jeremyje/gowebserver.git
cd gowebserver
make assets
make test

Bonus: What do you use archiver for, and do you find it useful?

http.FileSystem to browse different archives and archive file nesting.

@mholt
Copy link
Owner

mholt commented Jun 15, 2022

Hi, thanks for the report (and the neat gowebserver project).

I believe it should already be doing that... or is this code something else than what you're talking about?

archiver/fs.go

Lines 457 to 468 in 3869ede

// additionally, some archive files don't have actual entries for folders,
// leaving them to be inferred from the names of files instead (issue #330)
// so as we traverse deeper, we need to implicitly find subfolders within
// this current directory and add fake entries to the output
remainingPath := strings.TrimPrefix(file.NameInArchive, name)
nextDir := topDir(remainingPath) // if name in archive is "a/b/c" and root is "a", this becomes "b" (the implied folder to add)
implicitDir := path.Join(name, nextDir) // the full path of the implied directory
// create fake entry only if no entry currently exists (don't overwrite a real entry)
if _, ok := entries[implicitDir]; !ok {
entries[implicitDir] = implicitDirEntry{implicitDir}
}

(Apologies if I am misunderstanding, it's after 1am here, I should probably get some sleep, heh.)

@jeremyje
Copy link
Contributor Author

I'm writing a test that reproduces this issue locally and also discovered another bug (which I have fixed). I'll send a PR for this fairly soon.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants