Refactor FS types; improve performance #426

mholt · 2024-11-08T03:21:58Z

Several bug fixes were needed, including properly closing decompressor resources when closing files within compressed archives.

Additionally, tar files are particularly inefficient because they are not indexed like zip files: you have to read them sequentially to find the file you're looking for. This gets much worse when calling fs.WalkDir(), which calls ReadDir() for each directory, and each ReadDir() call requires scanning the entire archive to produce a properly structured file hierarchy, so a full walk costs roughly one complete scan per directory.

This PR introduces some breaking changes to the v4 alpha (but what's new; it's an alpha), though mostly minor ones.

All that is to say:

- Fixed some bugs; resources are (I think) properly managed now.
- At the expense of some memory, using ArchiveFS with tar files will be much, much faster now. At most one scan of the archive is performed while walking, since we now amortize the results, but another scan is still required when calling Open (only until the file is found).
- Some exported API changes:
  - FileSystem() can now create an FS from a stream, but the stream has to implement all three io interfaces: Reader, ReaderAt, and Seeker.
- Oh, I also added stream file type detection for Brotli, which doesn't have a header or magic number (sigh), so we basically try reading some bytes with a Brotli reader and see if that succeeds 🤷‍♂️. Not great, but oh well.

More API changes are likely before a beta or stable release. For example, I'm questioning the usefulness of the filename filter in Extract() (the third argument), since how it's enforced depends on each format type that implements it. It's generally treated as a prefix; sometimes callers want that, other times they want exact matches. Maybe the logic is best left to the user in their callback. It's not really an efficiency gain.
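If the filtering moves into the caller's callback, the prefix-versus-exact choice could be made explicitly there. A minimal sketch with a hypothetical `wanted` helper (not part of archiver); it matches prefixes on path boundaries, so "docs" selects "docs/readme.md" but not "docsets/x".

```go
package main

import (
	"fmt"
	"strings"
)

// matchMode picks how a caller-side filter interprets its targets.
type matchMode int

const (
	exactMatch matchMode = iota // name must equal a target exactly
	pathPrefix                  // name equals a target or lies beneath it
)

// wanted is a hypothetical filter a caller could run inside its own
// extraction callback instead of relying on a format-specific argument.
func wanted(name string, targets []string, mode matchMode) bool {
	name = strings.TrimSuffix(name, "/")
	for _, t := range targets {
		t = strings.TrimSuffix(t, "/")
		if name == t {
			return true
		}
		// Compare on a path boundary so "docs" matches "docs/a" but not "docsets".
		if mode == pathPrefix && strings.HasPrefix(name, t+"/") {
			return true
		}
	}
	return false
}

func main() {
	targets := []string{"docs"}
	for _, n := range []string{"docs", "docs/readme.md", "docsets/x", "src/main.go"} {
		fmt.Println(n, wanted(n, targets, exactMatch), wanted(n, targets, pathPrefix))
	}
}
```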

mholt added 3 commits October 30, 2024 18:33

- WIP (e4bb348)
- More WIP (5ca8902)
- Finish improvements (probably) (3c8ec72)

mholt merged commit e310539 into master Nov 8, 2024
6 checks passed

mholt deleted the wip branch November 8, 2024 04:01

This was referenced Nov 8, 2024:

- archiver.FileSystem() alternative that takes io.Reader #358 (Closed)
- File.Open given to the handler for Tar.Extract can't be used after the handler has returned #371 (Open)
