This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

archiver.Identify fails on non-archive file: zlib: invalid header #406

Closed
rgmz opened this issue Jun 8, 2024 · 1 comment

@rgmz
Contributor

rgmz commented Jun 8, 2024

What version of the package or command are you using?

The latest v4 release, v4.0.0-alpha.8.

What are you trying to do?

Recursively extract a .tar.gz file, calling archiver.Identify on each extracted file to detect nested archives.

What steps did you take?

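  1. Download https://github.com/kubernetes/git-sync/blob/b161f3f0c78b56f27188b4e4aabf672ba0b03706/vendor/github.com/google/licenseclassifier/licenses/licenses.db and save it as /tmp/licenses.db. Despite its extension, it is a gzip-compressed tar archive.
  2. Run the following reproducer with go test: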
    package archiver_test

    import (
    	"context"
    	"errors"
    	"fmt"
    	"os"
    	"testing"

    	"github.com/mholt/archiver/v4"
    )

    func TestTarGz(t *testing.T) {
    	f, err := os.Open("/tmp/licenses.db")
    	if err != nil {
    		t.Fatal(err)
    	}
    	defer f.Close()

    	// licenses.db is a gzip-compressed tar archive.
    	format := archiver.CompressedArchive{
    		Compression: archiver.Gz{},
    		Archival:    archiver.Tar{},
    	}
    	err = format.Extract(context.Background(), f, nil, handler(t))
    	if err != nil {
    		t.Fatal(err)
    	}
    }

    // handler tries to identify the format of every file extracted from the archive.
    func handler(t *testing.T) func(ctx context.Context, file archiver.File) error {
    	return func(ctx context.Context, file archiver.File) error {
    		f, err := file.Open()
    		if err != nil {
    			return err // Extract returns this error, and the test fails
    		}
    		defer f.Close()

    		format, _, err := archiver.Identify(file.Name(), f)
    		if err == nil {
    			fmt.Printf("File '%s' is format '%s'\n", file.Name(), format.Name())
    		} else if errors.Is(err, archiver.ErrNoMatch) {
    			// Not an archive: the expected result for plain files such as X11.txt.
    		} else {
    			t.Errorf("Error identifying '%s' format: %v", file.Name(), err)
    		}

    		return nil
    	}
    }

What did you expect to happen, and what actually happened instead?

I expected archiver.Identify to return archiver.ErrNoMatch, since X11.txt is not an archive. Instead, it returned a different error:

Error identifying 'X11.txt' format: matching zip: zlib: invalid header

How do you think this should be fixed?

I'm not sure; it depends on the cause.

Please link to any related issues, pull requests, and/or discussion

trufflesecurity/trufflehog#2928

Bonus: What do you use archiver for, and do you find it useful?

I use archiver via TruffleHog. It is quite useful in that regard. :)

@rgmz
Contributor Author

rgmz commented Jun 8, 2024

I tested the reproducer against HEAD, and it no longer returns that error. It appears this was fixed in 24fa33e (#386), which isn't included in the latest release.
