Releases: liserman/archiveRetriever

archiveRetriever 0.4.0

11 Jun 09:31
  • Replace deprecated functions of dependencies
  • Fix bugs in archive_overview() and retrieve_urls()
  • New option nonArchive added to retrieve_links() and scrape_urls(). This option allows users to scrape web pages that are not stored in the Internet Archive.
  • New feature added to the collapse option of scrape_urls(). collapse can now also take an XPath as input, so that results are collapsed based on a structuring XPath. This works only with XPaths, not with CSS selectors. When a structuring XPath is given, Paths refers only to children of that XPath.
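
The two new options can be sketched as follows. This is a minimal illustration rather than a tested recipe: the argument names Urls, Paths, collapse and nonArchive are taken from these notes and the package documentation, while the URLs and XPath expressions are hypothetical placeholders that would have to match the structure of the page actually being scraped.

```r
library(archiveRetriever)

# Hypothetical memento URL; in practice this would come from retrieve_urls()
memento <- "http://web.archive.org/web/20190801001228/https://www.example.com/"

# collapse as a structuring XPath: every node matched by `collapse` is treated
# as one unit, and the entries in `Paths` are looked up among its children.
# Both XPaths below are placeholders.
posts <- scrape_urls(
  Urls = memento,
  Paths = c(title = "//h3", teaser = "//p"),
  collapse = "//article"
)

# nonArchive = TRUE: scrape a page that is not an Internet Archive memento
live <- scrape_urls(
  Urls = "https://www.example.com/",
  Paths = c(title = "//h1"),
  nonArchive = TRUE
)
```

Because the structuring XPath restricts what Paths can see, elements outside the nodes matched by collapse will not appear in the result.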

archiveRetriever 0.3.1

27 Dec 10:34
  • Changes to the testing environment.
  • Disable progress bar in non-interactive use.

archiveRetriever 0.3.0

20 Dec 21:03
  • Fixes to filtering of links in retrieve_links() to enable link scraping from domains with more than one domain ending.
  • New option filter added to retrieve_links(). This option allows users to disable the default filtering that keeps only links belonging to sub-domains of the top-level domain.
  • New option pattern added to retrieve_links(). This option allows for custom patterns by which links are filtered before output.
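
A hedged sketch of how filter and pattern might be used. The memento URL is a hypothetical placeholder, and treating pattern as a regular-expression string is an assumption that should be checked against the function's help page.

```r
library(archiveRetriever)

# Hypothetical memento URL; in practice this would come from retrieve_urls()
memento <- "http://web.archive.org/web/20190801001228/https://www.example.com/"

# Default behaviour: links are filtered by the domain of the archived page
links_default <- retrieve_links(ArchiveUrls = memento)

# filter = FALSE: switch the domain-based filtering off
links_all <- retrieve_links(ArchiveUrls = memento, filter = FALSE)

# pattern: keep only links matching a custom pattern
# (assumed here to be a regular expression given as a string)
links_politics <- retrieve_links(ArchiveUrls = memento, pattern = "politics")
```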

archiveRetriever 0.2.0

21 Jun 15:22
  • New option collapseDate added to retrieve_urls(). This option allows users to choose whether retrieve_urls outputs all or just one memento per requested day.
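
As a rough illustration of collapseDate, the sketch below requests the same date range twice. The homepage is a hypothetical placeholder, and the date strings assume that an ISO-style "YYYY-MM-DD" format is accepted.

```r
library(archiveRetriever)

# One memento per requested day
daily <- retrieve_urls(
  homepage = "www.example.com",   # hypothetical homepage
  startDate = "2019-08-01",
  endDate = "2019-08-07",
  collapseDate = TRUE
)

# All mementos available in the same range
all_mementos <- retrieve_urls(
  homepage = "www.example.com",
  startDate = "2019-08-01",
  endDate = "2019-08-07",
  collapseDate = FALSE
)

# Compare how many memento URLs each call returned
length(daily)
length(all_mementos)
```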

archiveRetriever 0.1.2

08 Jun 08:31
  • Fixes to ignoreErrors option for HTML reading errors in scrape_urls()
  • Fixes to retrieve_links() for errors occurring in the last URL
  • Improve compatibility between retrieve_links() and scrape_urls()

archiveRetriever 0.1.1

12 Jan 18:37
  • Fixes to ignoreErrors option for encoding errors in retrieve_links()

archiveRetriever 0.1.0

22 Sep 12:54
  • New option collapse added to scrape_urls()
  • Improved functionality
  • Fixes to the test environment

archiveRetriever 0.0.2

19 Mar 12:01

Scrape content from archived web pages stored in the 'Internet Archive' (https://archive.org) using a systematic workflow. Get an overview of the mementos available for a given homepage, retrieve the memento URLs and the links found on the page, and finally scrape the content. The final output is stored in tibbles, which can then easily be used for further analysis.
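
The workflow described above could look roughly like the sketch below. It assumes the four functions named elsewhere in these notes, with argument names as given in the package documentation. The homepage, date range and XPath are hypothetical placeholders, and the "YYYY-MM-DD" date format is an assumption.

```r
library(archiveRetriever)

homepage <- "www.example.com"   # hypothetical homepage

# 1. Overview of the mementos available for the homepage
archive_overview(homepage = homepage,
                 startDate = "2019-08-01", endDate = "2019-08-31")

# 2. Retrieve the memento URLs for the same period
urls <- retrieve_urls(homepage = homepage,
                      startDate = "2019-08-01", endDate = "2019-08-31")

# 3. Retrieve the links found on the first memento
links <- retrieve_links(ArchiveUrls = urls[1])

# 4. Scrape content from the first memento; the XPath is a placeholder
content <- scrape_urls(Urls = urls[1], Paths = c(title = "//h1"))
```

Each step feeds the next, so the date range chosen in the overview step can be narrowed before any pages are scraped.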