-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support recursion across all archive types #47
Conversation
Generate changelog in
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor feedback and PR needs to be deconflicted, but on the whole the structure is much nicer and looks to make sense.
In the abstract may have a bit of feedback on the WalkerProvider
construction, but nothing that's fundamental/major enough to derail the current approach.
pkg/archive/paths_test.go
Outdated
@@ -96,3 +105,21 @@ func mustZipFileReader(t *testing.T, path string) (*zip.Reader, func(*testing.T) | |||
assert.NoError(t, file.Close()) | |||
} | |||
} | |||
func mustTarGZReader(t *testing.T, path string) (*tar.Reader, func(*testing.T)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add newline between end of previous declaration and start of this one.
pkg/archive/paths.go
Outdated
|
||
// FileWalkFn is called by a WalkFn on each file contained in an archive. | ||
type FileWalkFn func(ctx context.Context, path string, size int64, contents io.Reader) (proceed bool, err error) | ||
|
||
// WalkZipFiles walks the files in the provided *zip.Reader, calling walkFn on each header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function can now be private (or even inlined into the one calling location) -- previously this was exposed since it was referenced directly, but no more need to do that with this refactor.
pkg/archive/paths.go
Outdated
return err | ||
} | ||
// WalkTarFiles walks the files in the provided *tar.Reader, calling walkFn on each header. | ||
func WalkTarFiles(ctx context.Context, r *tar.Reader, walkFn FileWalkFn) (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for zip (can either be unexported or possibly even inlined to one calling location)
d22b3b3
to
c325f6e
Compare
Needs changelog and conflict resolution |
277b990
to
c4afd39
Compare
c4afd39
to
b446ca8
Compare
Recursion supported across all archive types by making
crawl.Log4jIdentifier
type no longer holds any concepts of specific archives or archive types and then reusing pretty much the same approach that was used before.Instead, the following fields have been added to it:
When a file or path is encountered, the
ParseArchiveFormat
function is used to determine if the file is an archive, and the archive type is passed along to theArchiveWalkers
function to retrieve anarchive.WalkerProvider
.A new
archive.WalkerProvider
should support creating aWalkFn
from either a file or a reader. This is handy because during the first walk, we have access to files on disk and so do not have to load anything into memory. When we are recursing into archives we do not have access to disk as everything is currently in memory, so aWalkerProvider
must also provide aFromReader
method to create one from anio.Reader
. Reading into memory is only required by thearchive.ZipWalkers
provider currently, which still has a configurable memory limit, as it did before this PR (this situation will be improved in #40).