A vendor plugin for Saber that allows webmasters to scrape & archive content from the web & RSS feeds.
Clone the Charlotte repository anywhere on your web server, outside of the Saber folder:
git clone https://github.com/Datasilk/Charlotte
- Open the solution Charlotte.sln using Visual Studio 2019 or newer and build Charlotte
- Execute bin\x64\Debug\Charlotte.exe -register in PowerShell to register the Charlotte console application as a Windows Service, which will automatically start the WCF Hosted Service
If you are using the latest source code for Saber, do the following:
- Execute git clone https://github.com/Datasilk/Saber-Collector Collector within the folder /App/Vendors/
If you are using the latest release of Saber, do the following:
- Download the latest release of Saber.Vendors.Collector
- Extract all files & folders from either the win-x64 or linux-x64 zip folder to Saber's /Vendors/ folder
- Run the command ./publish.bat
- Publish bin/Publish/Collector.7z as the latest release
{
  "browser": {
    "endpoint": {
      "development": "http://localhost:7007/GetDOM",
      "staging": "http://localhost:7007/GetDOM",
      "production": "http://localhost:7007/GetDOM"
    }
  },
  "storage": {
    "development": "/Content/Collector/",
    "staging": "/Content/Collector/",
    "production": "/Content/Collector/"
  },
  "domains": {
    "downloads": {
      "minIntervals": 60
    }
  }
}
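To make the layout of this file easier to follow, here is a minimal Python sketch that parses the same structure and picks out the values for one environment. It is only an illustration: the `resolve` helper and the `settings` dictionary are hypothetical names and do not reflect how Collector itself reads its configuration.

```python
import json

# Sample text mirroring the configuration structure shown above.
CONFIG_TEXT = """
{
  "browser": {
    "endpoint": {
      "development": "http://localhost:7007/GetDOM",
      "staging": "http://localhost:7007/GetDOM",
      "production": "http://localhost:7007/GetDOM"
    }
  },
  "storage": {
    "development": "/Content/Collector/",
    "staging": "/Content/Collector/",
    "production": "/Content/Collector/"
  },
  "domains": {
    "downloads": {
      "minIntervals": 60
    }
  }
}
"""


def resolve(config: dict, environment: str) -> dict:
    """Collect the values that apply to one environment (hypothetical helper)."""
    return {
        # "browser.endpoint" and "storage" are keyed by environment name.
        "endpoint": config["browser"]["endpoint"][environment],
        "storage": config["storage"][environment],
        # "minIntervals" is a single value shared by all environments.
        "min_interval_seconds": config["domains"]["downloads"]["minIntervals"],
    }


settings = resolve(json.loads(CONFIG_TEXT), "development")
```

Note that the endpoint and storage settings carry one value per environment, while the download interval is a single number.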
browser.endpoint: The URL for your instance of Charlotte's Web, a load balancer application that delegates requests to a cluster of Charlotte workers.
storage: The relative or absolute path to the folder where you'd like to store downloaded content for Collector.
This path should typically be located on a network drive so that instances of Collector running on multiple machines in the local network can all access it.
Also note that the path must end with a / slash.
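The trailing-slash requirement can be checked before the path is used. The sketch below assumes that file names are appended to the storage path by plain string concatenation, which would explain the requirement but is not confirmed by this document; `validate_storage_path` and `item_path` are hypothetical helpers.

```python
def validate_storage_path(path: str) -> str:
    """Return the path unchanged if it ends with '/', otherwise raise ValueError."""
    if not path.endswith("/"):
        raise ValueError(f"storage path must end with '/': {path!r}")
    return path


def item_path(storage: str, filename: str) -> str:
    """Build the full path of a stored file (assumes plain concatenation)."""
    # Concatenation only yields a valid path when storage ends with '/'.
    return validate_storage_path(storage) + filename


stored = item_path("/Content/Collector/", "article.html")
```

Failing early on a missing slash avoids silently writing to a path such as `/Content/Collectorarticle.html`.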
domains.downloads.minIntervals: This number keeps Collector from making too many requests to any given domain in a short period of time. The value is in seconds and sets the minimum time between consecutive requests to a single domain. When finding the next item in the download queue, Collector skips any item whose domain was requested more recently than this interval.
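The queue behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not Collector's actual implementation: `QueueItem`, `next_item`, and the `last_request` dictionary (domain mapped to the time of its last request, in seconds) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class QueueItem:
    url: str


def next_item(
    queue: List[QueueItem],
    last_request: Dict[str, float],
    now: float,
    min_interval: float = 60,
) -> Optional[QueueItem]:
    """Return the first queued item whose domain is outside the minimum interval."""
    for item in queue:
        domain = urlparse(item.url).netloc
        last = last_request.get(domain)
        # Eligible if the domain was never requested, or was requested
        # at least min_interval seconds ago.
        if last is None or now - last >= min_interval:
            return item
    return None  # every queued domain was requested too recently


queue = [QueueItem("https://example.com/a"), QueueItem("https://example.org/b")]
last_request = {"example.com": 100.0}

# 30 seconds after the example.com request, that domain is still excluded.
picked_early = next_item(queue, last_request, now=130.0)
# 60 seconds after, example.com is eligible again.
picked_later = next_item(queue, last_request, now=160.0)
```

Here `picked_early` is the example.org item, because example.com was requested only 30 seconds earlier, while `picked_later` is the example.com item.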