Valid .zip file: zlib: invalid header #2933

Closed
rgmz opened this issue Jun 6, 2024 · 6 comments
Contributor

rgmz commented Jun 6, 2024

TruffleHog Version

3.78.0

Trace Output

$ ./trufflehog filesystem /tmp/002-linoise.zip --trace
2024-06-06T18:43:39-04:00       info-2  trufflehog      trufflehog dev
🐷🔑🐷  TruffleHog. Unearth your secrets. 🐷🔑🐷

...
2024-06-06T18:43:39-04:00       info-3  trufflehog      scanning file   {"source_manager_worker_id": "AAGFT", "unit": "/tmp/002-linoise.zip", "unit_kind": "unit", "path": "/tmp/002-linoise.zip"}
2024-06-06T18:43:39-04:00       info-5  trufflehog      Handling extracted file.        {"source_manager_worker_id": "AAGFT", "unit": "/tmp/002-linoise.zip", "unit_kind": "unit", "timeout": 30, "filename": "linoise.csv", "size": 689}
2024-06-06T18:43:39-04:00       error   trufflehog      error unarchiving chunk.        {"source_manager_worker_id": "AAGFT", "unit": "/tmp/002-linoise.zip", "unit_kind": "unit", "timeout": 30, "error": "error extracting archive with format: .zip: handling file 0: linoise.csv: error creating custom reader: error identifying archive: matching tar: zlib: invalid header"}
2024-06-06T18:43:39-04:00       info-5  trufflehog      handler channel closed, all chunks processed    {"source_manager_worker_id": "AAGFT", "unit": "/tmp/002-linoise.zip", "unit_kind": "unit"}

Expected Behavior

The valid zip file should be extracted and scanned.

$ unzip 002-linoise.zip
Archive:  002-linoise.zip
  inflating: linoise.csv
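
For a cross-check that does not depend on `unzip`, the same validity test can be run with Go's standard `archive/zip` package. The sketch below is only an illustration: it builds a small in-memory zip as a stand-in for 002-linoise.zip (the entry contents are made up) and reads every entry back. Pointing `readEntries` at the bytes of the real file would be the actual check.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildZip returns an in-memory zip with a single deflated entry.
// It is a stand-in for 002-linoise.zip; the contents are invented.
func buildZip(name string, content []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name) // Create compresses the entry with Deflate
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(content); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readEntries opens every entry and returns name -> decompressed bytes.
// Any error here would mean the zip itself is not readable.
func readEntries(data []byte) (map[string][]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	out := make(map[string][]byte)
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, fmt.Errorf("opening %s: %w", f.Name, err)
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, fmt.Errorf("reading %s: %w", f.Name, err)
		}
		out[f.Name] = b
	}
	return out, nil
}

func main() {
	data, err := buildZip("linoise.csv", []byte("a,b\n1,2\n"))
	if err != nil {
		panic(err)
	}
	entries, err := readEntries(data)
	if err != nil {
		panic(err)
	}
	for name, b := range entries {
		fmt.Printf("%s: %d bytes\n", name, len(b))
	}
}
```

If the standard library reads the archive without error, the zip container is fine and the failure is more likely in the per-entry format identification step.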

Actual Behavior

Extraction fails with "error identifying archive: matching $X: zlib: invalid header", where $X varies from run to run (7z, rar, and tar in the runs below).

2024-06-06T18:42:44-04:00       error   trufflehog      error unarchiving chunk.        {"source_manager_worker_id": "W21um", "unit": "/tmp/002-linoise.zip", "unit_kind": "unit", "timeout": 30, "error": "error extracting archive with format: .zip: handling file 0: linoise.csv: error creating custom reader: error identifying archive: matching 7z: zlib: invalid header"}
...
2024-06-06T18:42:58-04:00       error   trufflehog      error unarchiving chunk.        {"source_manager_worker_id": "ibT8o", "unit": "/tmp/002-linoise.zip", "unit_kind": "unit", "timeout": 30, "error": "error extracting archive with format: .zip: handling file 0: linoise.csv: error creating custom reader: error identifying archive: matching rar: zlib: invalid header"}
...
2024-06-06T18:43:39-04:00       error   trufflehog      error unarchiving chunk.        {"source_manager_worker_id": "AAGFT", "unit": "/tmp/002-linoise.zip", "unit_kind": "unit", "timeout": 30, "error": "error extracting archive with format: .zip: handling file 0: linoise.csv: error creating custom reader: error identifying archive: matching tar: zlib: invalid header"}

Steps to Reproduce

  1. Download https://github.com/microsoft/OCP-ISV-Machine-Learning-Hands-on-Lab/blob/629a7750a1dd50c4af80043eced0bdc8da5bc216/appx/002-linoise.zip
  2. Run ./trufflehog filesystem ./002-linoise.zip

Environment

N/A

Additional Context

N/A

References

N/A

@rgmz rgmz added the bug label Jun 6, 2024
Contributor Author

rgmz commented Jun 6, 2024

Another one: https://github.com/Azure/tfs-matlab-connector/blob/f172fb73c3c3edb944b3ab775dc8b87879256150/TFS%20Version%20Control%20Integration.mlappinstall

CLI

$ unzip 'TFS Version Control Integration.mlappinstall'
Archive:  TFS Version Control Integration.mlappinstall
  inflating: LICENSE.txt
  inflating: ThirdPartyNotices.txt
...

TruffleHog

$ ./trufflehog/trufflehog filesystem '/tmp/TFS Version Control Integration.mlappinstall'
🐷🔑🐷  TruffleHog. Unearth your secrets. 🐷🔑🐷

2024-06-06T18:49:34-04:00       info-0  trufflehog      running source  {"source_manager_worker_id": "UzYPa", "with_units": true}
2024-06-06T18:49:34-04:00       error   trufflehog      error unarchiving chunk.        {"source_manager_worker_id": "UzYPa", "unit": "/tmp/TFS Version Control Integration.mlappinstall", "unit_kind": "unit", "timeout": 30, "error": "error extracting archive with format: .zip: handling file 7: dist/TFS-SDK/redist/lib/com.microsoft.tfs.sdk-14.0.3.jar: error extracting archive with format: .zip: handling file 13735: license/LICENSE.dom-documentation.txt: error creating custom reader: error identifying archive: matching rar: zlib: invalid header"}
2024-06-06T18:49:34-04:00       info-0  trufflehog      finished scanning       {"chunks": 290, "bytes": 870987, "verified_secrets": 0, "unverified_secrets": 0, "scan_duration": "188.000855ms", "trufflehog_version": "dev"}

Contributor Author

rgmz commented Jun 6, 2024

Another one: https://github.com/0xdaryl/openj9-openjdk-jdk21/blob/ef265ca9399171d89d02031c3aa8b6d33e72c71c/jdk/test/java/util/PluggableLocale/barprovider.jar

CLI

$ unzip barprovider.jar
Archive:  barprovider.jar
  inflating: META-INF/MANIFEST.MF
   creating: META-INF/services/
...

TruffleHog

$ ./trufflehog filesystem /tmp/barprovider.jar
🐷🔑🐷  TruffleHog. Unearth your secrets. 🐷🔑🐷

2024-06-06T19:19:36-04:00       info-0  trufflehog      running source  {"source_manager_worker_id": "Mjv50", "with_units": true}
2024-06-06T19:19:36-04:00       error   trufflehog      error unarchiving chunk.        {"source_manager_worker_id": "Mjv50", "unit": "/tmp/barprovider.jar", "unit_kind": "unit", "timeout": 30, "error": "error extracting archive with format: .zip: handling file 15: com/bar/LocaleNames_xx.properties: error creating custom reader: error identifying archive: matching tar: zlib: invalid header"}
2024-06-06T19:19:36-04:00       info-0  trufflehog      finished scanning       {"chunks": 8, "bytes": 1230, "verified_secrets": 0, "unverified_secrets": 0, "scan_duration": "8.352035ms", "trufflehog_version": "dev"}

Collaborator

ahrav commented Jun 9, 2024

This issue seems to originate from the archiver library. When we call Identify on the underlying CSV, it incorrectly identifies it as a zlib file instead of a text/csv file.

Example:

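package main

import (
	"bufio"
	"fmt"
	"os"

	"github.com/mholt/archiver/v4"
)

func main() {
	file, err := os.Open("testdata/linoise.csv")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	reader := bufio.NewReader(file)

	// Empty filename: identification relies on the stream contents only.
	format, arReader, err := archiver.Identify("", reader)
	if err != nil {
		panic(err)
	}

	fmt.Println(format)
	fmt.Println(arReader)
}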

This results in the error: panic: matching tar: zlib: invalid header.

We can work around this by merging the case errors.Is(err, archiver.ErrNoMatch) with the default case, so that any identification error falls back to mimetype detection. This approach appears to resolve the issue.

I'll get the fix in on the TruffleHog side, and look into an upstream fix if possible.
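
A rough sketch of the shape of that workaround, under stated assumptions: `errNoMatch`, `failingIdentify`, and `classify` are hypothetical stand-ins rather than TruffleHog or archiver code, and `http.DetectContentType` from the standard library substitutes for whatever mimetype detection the handler actually uses. The point is only that the "no match" case and every other identification error take the same fallback path.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errNoMatch is a hypothetical stand-in for archiver.ErrNoMatch.
var errNoMatch = errors.New("no match (stand-in)")

// failingIdentify is a hypothetical stand-in for archiver.Identify
// that fails the way the trace output shows.
func failingIdentify(data []byte) (string, error) {
	return "", errors.New("matching tar: zlib: invalid header")
}

// classify returns the archive format when identification succeeds.
// On any error it falls back to content sniffing instead of failing.
func classify(data []byte, identify func([]byte) (string, error)) string {
	format, err := identify(data)
	switch {
	case err == nil:
		return format
	default:
		// Formerly two branches: errNoMatch fell back, everything else
		// aborted. Merged here so both reach the mimetype fallback.
		return http.DetectContentType(data)
	}
}

func main() {
	csv := []byte("a,b\n1,2\n") // illustrative content only

	fmt.Println(classify(csv, failingIdentify))

	noMatch := func([]byte) (string, error) { return "", errNoMatch }
	fmt.Println(classify(csv, noMatch))
}
```

The trade-off is that genuine identification failures are no longer reported as errors; they are treated as "not an archive" and scanned as plain content.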

Contributor Author

rgmz commented Jun 9, 2024

Perhaps the same issue as #2928: mholt/archiver#406?

If so, it's been fixed but there hasn't been a new release since last year.

@rgmz rgmz closed this as completed Jun 9, 2024
@rgmz rgmz reopened this Jun 9, 2024
Collaborator

ahrav commented Jun 9, 2024

> Perhaps the same issue as #2928: mholt/archiver#406?
>
> If so, it's been fixed but there hasn't been a new release since last year.

ah, I missed that. Yep that looks like it indeed. 👏

Collaborator

ahrav commented Jun 9, 2024

We could consider using the HEAD of the library instead of the latest tagged version. I'm not sure what the best course of action is here. Forking the library is another option, but it's not very appealing 😅. I'll defer to @dustin-decker for his thoughts on this.
