This repository has been archived by the owner on Nov 19, 2024. It is now read-only.
Several bug fixes were needed, including properly closing decompressor resources when closing files within compressed archives.
Additionally, tar files are quite inefficient, in particular because they are not indexed like zip files: you have to read them sequentially to find the file you're looking for. This gets very expensive when calling `fs.WalkDir()`, which calls `ReadDir()` on every directory it visits, because each `ReadDir()` call requires scanning the entire archive to produce a properly-structured file hierarchy. The total work therefore grows with the number of directories multiplied by the size of the archive.

This PR introduces some breaking changes to v4 alpha (but what's new, it's in alpha), but mostly minor ones.
All that is to say:

- `ArchiveFS` with tar files will be much, much faster now. At most one scan of the archive is performed while walking, since we now amortize the results, but another scan is still required when calling `Open` (only until the file is found).
- `FileSystem()` can now create an FS from a stream, but the stream has to implement all three `io` interfaces: `Reader`, `ReaderAt`, and `Seeker`.

There will likely be some more API changes before a beta or stable release. For example, I'm questioning the usefulness of the filename filter in `Extract()` (the third argument), since its enforcement depends on the individual format types that implement it. It's generally treated as a prefix; sometimes callers want that, other times they want exact matches. Maybe the logic is best left to the user in their callback. It's not really an efficiency gain.