This is a Go implementation of the design described in /Kraaler: A User Perspective Web Crawler/, presented at TMA 2019.
Kraaler requires CGO_ENABLED=1 (C support in Go) because it uses SQLite.
Compiling the binary requires a set of C libraries.
The official Golang Docker images come pre-bundled with these C dependencies, which makes them a convenient tool for compilation.
docker run \
--rm \
-v "$(pwd)":/go/src/github.com/aau-network-security/kraaler \
-w /go/src/github.com/aau-network-security/kraaler/app/ \
-e GO111MODULE=on \
-e GOOS=linux \
-e GOARCH=amd64 \
-e CGO_ENABLED=1 \
-e HOST_UID=`id -u` \
golang:1.12.6 \
bash build.sh
Remember to set GOOS and GOARCH according to your platform.
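One way to pick these values is to map the output of uname onto Go's platform names. The sketch below is an assumption and not part of Kraaler: the function names detect_goos and detect_goarch are hypothetical, and only a few common platforms are covered.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kraaler): translate `uname` output
# into the GOOS/GOARCH names used by the Go toolchain.

# Map a kernel name (as printed by `uname -s`) to a GOOS value.
detect_goos() {
    case "$1" in
        Linux)   echo linux ;;
        Darwin)  echo darwin ;;
        FreeBSD) echo freebsd ;;
        *)       echo "unsupported kernel: $1" >&2; return 1 ;;
    esac
}

# Map a machine name (as printed by `uname -m`) to a GOARCH value.
detect_goarch() {
    case "$1" in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        i386|i686)     echo 386 ;;
        *)             echo "unsupported machine: $1" >&2; return 1 ;;
    esac
}

# Print the two -e flags for the docker command, if the platform is known.
goos=$(detect_goos "$(uname -s)") &&
    goarch=$(detect_goarch "$(uname -m)") &&
    echo "-e GOOS=$goos -e GOARCH=$goarch"
```

The printed flags can replace the GOOS and GOARCH lines of the docker command above. Running go env GOOS GOARCH gives the same information on a machine that already has Go installed.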
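$ krl run -n 3 \
--provider-file urls.txt \
--sampler 'uni' \
--filter-resp-bodies-ct '^text/'
Here -n sets the number of workers, --provider-file names the file that provides URLs, --sampler selects the sampler used to prioritize URLs, and --filter-resp-bodies-ct restricts response bodies to text content types.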
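To see what a pattern such as '^text/' selects, it can be tried against sample Content-Type values with grep -E. This is only an illustration: the helper name matches_ct is hypothetical, the sample values are made up, and grep's extended regular expressions may differ in detail from whatever engine Kraaler applies to this flag.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kraaler): succeed when the content type
# in $2 matches the extended regular expression in $1.
matches_ct() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

pattern='^text/'

# Report, for a few sample Content-Type values, whether the pattern matches.
for ct in 'text/html; charset=utf-8' 'text/css' 'application/json' 'image/png'; do
    if matches_ct "$pattern" "$ct"; then
        echo "match:    $ct"
    else
        echo "no match: $ct"
    fi
done
```

Because the pattern is anchored with ^, a value such as application/json does not match even though it describes textual data. A broader pattern would be needed to include such types.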
- Thomas Kobber Panum (@tpanum)