This repository has been archived by the owner on Nov 19, 2024. It is now read-only.

Refactor FS types; improve performance #426

Merged
merged 3 commits into from
Nov 8, 2024


mholt
Owner

@mholt mholt commented Nov 8, 2024

Several bug fixes were needed, including properly closing decompressor resources when closing files within compressed archives.

Additionally, tar files are particularly inefficient because, unlike zip files, they are not indexed: you have to read them sequentially to find the file you're looking for. The cost multiplies when calling fs.WalkDir(), which calls ReadDir() on every directory it visits, and each ReadDir() call requires scanning the entire archive to produce a properly structured file hierarchy.
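To make the sequential-read cost concrete, here is a minimal sketch using only the standard library's archive/tar. The helper names (buildTar, findInTar) and the file names are made up for illustration and are not part of this package's API. It counts how many headers have to be read before the wanted file turns up.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// buildTar writes an in-memory tar archive from (name, content) pairs.
// Hypothetical helper, used only to get some bytes to scan.
func buildTar(files [][2]string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, f := range files {
		hdr := &tar.Header{Name: f[0], Mode: 0o644, Size: int64(len(f[1]))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(f[1])); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// findInTar reads headers in archive order until name is found.
// It returns the file content and the number of headers it had to read.
// There is no index to consult, so a miss costs a scan of the whole archive.
func findInTar(data []byte, name string) (string, int, error) {
	tr := tar.NewReader(bytes.NewReader(data))
	visited := 0
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return "", visited, fmt.Errorf("%s: not found", name)
		}
		if err != nil {
			return "", visited, err
		}
		visited++
		if hdr.Name == name {
			content, err := io.ReadAll(tr)
			return string(content), visited, err
		}
	}
}

func main() {
	data, err := buildTar([][2]string{
		{"a.txt", "A"}, {"dir/b.txt", "B"}, {"dir/c.txt", "C"},
	})
	if err != nil {
		panic(err)
	}
	content, visited, err := findInTar(data, "dir/c.txt")
	if err != nil {
		panic(err)
	}
	fmt.Printf("content=%q after reading %d headers\n", content, visited)
}
```

If a directory listing is produced the same way, every ReadDir() during a walk repeats this scan, which is where the cost adds up.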

This PR introduces some breaking changes to v4 alpha (but what's new, it's in alpha), though they are mostly minor.

All that is to say:

  • Fixed some bugs; resources are (I think) properly managed now
  • At the expense of some memory, using ArchiveFS with tar files will be much, much faster now. At most one scan of the archive is performed while walking, since we now cache the results and amortize the scan across calls, but another scan is still required when calling Open (only until the file is found).
  • Some exported API changes
  • FileSystem() can now create an FS from a stream, but the stream has to implement all three of these io interfaces: io.Reader, io.ReaderAt, and io.Seeker.
  • Oh, I also added stream filetype detection for brotli, which doesn't have a header or magic number (sigh), so we basically try reading some bytes from a brotli reader and see if we succeeded 🤷‍♂️ -- not great, but oh well.
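The "one scan while walking" idea can be sketched roughly as follows. This is not the code from this PR, just an illustration under assumptions: the names dirIndex, indexTar, and sampleTar are hypothetical. A single pass over the tar headers builds a map from each directory to its direct children, so later directory listings become map lookups rather than fresh scans.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
	"path"
	"sort"
)

// dirIndex maps a directory path ("." for the root) to the sorted names
// of its direct children. Hypothetical type for this sketch.
type dirIndex map[string][]string

// indexTar builds a dirIndex in one pass over the archive. Parent
// directories that have no header of their own are implied from the
// paths of the entries beneath them.
func indexTar(r io.Reader) (dirIndex, error) {
	children := map[string]map[string]bool{".": {}}
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return nil, err
		}
		// Walk from the entry up to the root, registering each path
		// element as a child of its parent directory.
		for name := path.Clean(hdr.Name); name != "." && name != "/"; name = path.Dir(name) {
			parent := path.Dir(name)
			if children[parent] == nil {
				children[parent] = map[string]bool{}
			}
			children[parent][path.Base(name)] = true
		}
	}
	idx := make(dirIndex, len(children))
	for dir, set := range children {
		names := make([]string, 0, len(set))
		for n := range set {
			names = append(names, n)
		}
		sort.Strings(names)
		idx[dir] = names
	}
	return idx, nil
}

// sampleTar returns an in-memory archive of empty files with the given
// names. It exists only to give the example something to index.
func sampleTar(names ...string) io.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, name := range names {
		if err := tw.WriteHeader(&tar.Header{Name: name, Mode: 0o644}); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	idx, err := indexTar(sampleTar("a.txt", "dir/b.txt", "dir/sub/c.txt"))
	if err != nil {
		panic(err)
	}
	// Each listing below is a map lookup; the archive is not read again.
	fmt.Println(idx["."], idx["dir"], idx["dir/sub"])
}
```

The trade-off matches the one described above: the index costs memory proportional to the number of entries, and opening a file's contents still needs its own read of the archive.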

Likely some more API changes coming before a beta or stable release. For example, I'm questioning the usefulness of the filename filter in Extract() (the third argument), since the enforcement of that depends on the individual format types that implement it... it's generally treated as a prefix, and sometimes callers want that, other times they want exact matches... maybe the logic is best left to the user in their callback. It's not really an efficiency gain.
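As a rough illustration of leaving the matching to the caller, the two behaviors mentioned above (exact match versus "everything under this path") are each a few lines of standard-library Go that a callback could apply to every file name it receives. The function names matchExact and matchUnder are hypothetical and not part of this package.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchExact reports whether name refers to exactly target, ignoring
// cosmetic differences such as a trailing slash.
func matchExact(name, target string) bool {
	return path.Clean(name) == path.Clean(target)
}

// matchUnder reports whether name is dir itself or lies beneath it.
// It compares whole path elements, so "docs" does not match
// "docs-old/readme.md" the way a plain string prefix check would.
func matchUnder(name, dir string) bool {
	name, dir = path.Clean(name), path.Clean(dir)
	return name == dir || strings.HasPrefix(name, dir+"/")
}

func main() {
	// Stand-in for the names a per-file callback would see.
	names := []string{"docs", "docs/readme.md", "docs-old/readme.md"}
	for _, n := range names {
		fmt.Printf("%-20s exact=%v under=%v\n", n, matchExact(n, "docs"), matchUnder(n, "docs"))
	}
}
```

A caller who wants one behavior or the other picks the matching function in the callback and skips entries that don't match, so the format implementations would not need to agree on a single interpretation.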
