Copy a subtree of a web site, with smart page filters. Distinctive features:

- Crawl pages without saving them, in order to discover links to the pages you really want.
- Crawl linked pages to the maximum depth, but only save pages whose URLs/MIME types match certain filters.
- Stay on the same domain, or set of domains.
- Crawl from the Internet Wayback Machine instead of the live site, with fancy date filtering to get the page version you want (please be nice to the Archive!).
WORK IN PROGRESS: This program is not yet ready to use.
To install, test, and build with the standard Go tools:

```
go get -t github.com/jesand/webcp
go test github.com/jesand/webcp/...
go install github.com/jesand/webcp
```
See the usage notes:

```
webcp -h
```
Download a URL and its sub-pages into the current directory:

```
webcp <url> .
```
By default, the crawl fetches all linked pages up to a depth of 5 and waits 5 seconds between successive requests to the same domain.
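The per-domain delay is the crawler's politeness mechanism. As a rough sketch of the idea (illustrative only, not webcp's actual implementation), a crawler can remember the time of its last request to each host and sleep until the configured delay has elapsed before fetching from that host again:

```go
package main

import (
	"log"
	"net/url"
	"sync"
	"time"
)

// domainLimiter spaces out requests to the same host by a fixed delay.
// Illustrative sketch only; it is not code from the webcp crawler.
type domainLimiter struct {
	mu    sync.Mutex
	delay time.Duration
	last  map[string]time.Time
}

func newDomainLimiter(delay time.Duration) *domainLimiter {
	return &domainLimiter{delay: delay, last: make(map[string]time.Time)}
}

// Wait blocks until at least `delay` has passed since the previous
// request to the host of rawURL, then reserves the new request slot.
func (l *domainLimiter) Wait(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	host := u.Hostname()

	l.mu.Lock()
	slot := time.Now()
	if earliest := l.last[host].Add(l.delay); earliest.After(slot) {
		slot = earliest
	}
	l.last[host] = slot
	l.mu.Unlock()

	time.Sleep(time.Until(slot))
	return nil
}

func main() {
	lim := newDomainLimiter(5 * time.Second)
	pages := []string{
		"https://example.com/a",
		"https://example.com/b", // waits about 5 seconds after /a
	}
	for _, p := range pages {
		if err := lim.Wait(p); err != nil {
			log.Fatal(err)
		}
		log.Println("fetching", p) // the actual HTTP GET is omitted here
	}
}
```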
If you have a large crawl that you might need to kill and later resume, provide a resume file:

```
webcp --resume=links.txt <url> .
```
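The exact contents of the resume file are defined by webcp. Conceptually, though, resuming a crawl just means checkpointing the frontier of links that still need to be visited so a later run can pick up where the last one stopped. A minimal sketch of that idea, with hypothetical helper names and a one-URL-per-line format that may differ from webcp's:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// saveFrontier writes the not-yet-crawled links to path, one URL per line.
// Hypothetical helper for illustration; webcp's resume format may differ.
func saveFrontier(path string, links []string) error {
	return os.WriteFile(path, []byte(strings.Join(links, "\n")+"\n"), 0o644)
}

// loadFrontier reads a previously saved frontier, skipping blank lines.
func loadFrontier(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var links []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if line := strings.TrimSpace(sc.Text()); line != "" {
			links = append(links, line)
		}
	}
	return links, sc.Err()
}

func main() {
	// Checkpoint a pending frontier, then reload it as a later run would.
	pending := []string{"https://example.com/a", "https://example.com/b"}
	if err := saveFrontier("links.txt", pending); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	restored, err := loadFrontier("links.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("resuming with", len(restored), "links")
}
```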
The command line interface described here is just a thin wrapper around the `Crawler` type in the `crawl` package. You can easily use the crawler component directly in some other program. See the API reference on godoc for details.
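For orientation only, the outline below shows roughly what embedding a crawler in another program involves: constructing it with options such as depth, per-domain delay, and a save filter, then running it on a start URL. Every identifier here is a stand-in invented for illustration, not the real API of `github.com/jesand/webcp/crawl`; consult the godoc reference for the actual `Crawler` type.

```go
// Hypothetical sketch only: these types are placeholders, not the real
// github.com/jesand/webcp/crawl API.
package main

import (
	"fmt"
	"time"
)

// options collects the knobs the README describes: crawl depth, the
// per-domain delay, and a filter deciding which fetched pages to save.
type options struct {
	MaxDepth    int
	DomainDelay time.Duration
	Save        func(pageURL, mimeType string) bool
}

// crawler is a placeholder for the real Crawler component.
type crawler struct{ opts options }

// Run would walk the link graph from start, respecting opts.MaxDepth and
// opts.DomainDelay, and write pages accepted by opts.Save into outDir.
func (c *crawler) Run(start, outDir string) error {
	fmt.Printf("crawl %s into %s with %+v\n", start, outDir, c.opts)
	return nil // fetching and link extraction are omitted in this sketch
}

func main() {
	c := &crawler{opts: options{
		MaxDepth:    5,
		DomainDelay: 5 * time.Second,
		// Save only HTML documents; other pages are crawled just for links.
		Save: func(pageURL, mimeType string) bool { return mimeType == "text/html" },
	}}
	if err := c.Run("https://example.com/docs/", "."); err != nil {
		fmt.Println(err)
	}
}
```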