---
title: webbotparseR - Parse HTML files containing search engine results
format:
  html:
    embed-resources: true
  gfm: default
---

## Description

<!-- - Provide a brief and clear description of the method, its purpose, and what it aims to achieve. Add a link to a related paper from social science domain and show how your method can be applied to solve that research question.   -->

webbotparseR parses HTML files containing search engine results that were scraped with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>.

## Keywords

* Digital Behavioral Data 
* Search Engine Results
* Data Preprocessing

## Science Usecase(s)

Search engine results data is a valuable resource for research, enabling the study of information-seeking behaviors and the broader impact of search algorithms on society. By analyzing queries and result rankings, researchers can investigate how individuals interact with information and how biases in search algorithms influence knowledge acquisition and decision-making. This data is essential for examining phenomena like the spread of misinformation, the creation of filter bubbles, and the public's access to diverse perspectives. Additionally, search data can reveal temporal trends in societal concerns and interests, providing insights into collective behavior during events like elections, crises, or cultural moments. Leveraging techniques such as query classification and network analysis, researchers can explore the interplay between user intent, algorithmic curation, and societal outcomes, contributing to a better understanding of digital information ecosystems.

## Repository structure

This repository follows [the standard structure of an R package](https://cran.r-project.org/doc/FAQ/R-exts.html#Package-structure).

## Environment Setup

With R installed:

```r
install.packages("webbotparseR")
```

<!-- ## Hardware Requirements (Optional) -->
<!-- - The hardware requirements may be needed in specific cases when a method is known to require more memory/compute power.  -->
<!-- - The method need to be executed on a specific architecture (GPUs, Hadoop cluster etc.) -->


## Input Data 

<!-- - The input data has to be a Digital Behavioral Data (DBD) Dataset -->
<!-- - You can provide link to a public DBD dataset. GESIS DBD datasets (https://www.gesis.org/en/institute/digital-behavioral-data) -->

The package accepts HTML files of search engine results that were gathered with the 'WebBot' browser extension <https://github.com/gesiscss/WebBot>.

## Sample Input and Output Data

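The package ships with a sample input file, `www.google.com_climatechange_text_2023-03-16_08_16_11.html`, which contains scraped search engine results. Parsing it with `parse_search_results()` returns the parsed results as output.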

## How to Use

```r
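library(webbotparseR)

# Locate the sample HTML file that ships with the package
ex_file <- system.file("www.google.com_climatechange_text_2023-03-16_08_16_11.html", package = "webbotparseR")

# Parse the file; `engine` names the search engine and result type (here: Google text results)
output <- parse_search_results(path = ex_file, engine = "google text")

# Print the parsed results
output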
```
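
A scraping run usually produces many HTML files rather than one. The following is a minimal sketch of how the call above could be applied to a whole directory. `parse_directory()` is a hypothetical helper, not part of webbotparseR, and the directory path in the usage comment is a placeholder. The sketch assumes that all files in the directory come from the same search engine and result type.

```r
# Hypothetical helper (not part of webbotparseR): apply a parser function
# to every HTML file in a directory and collect the results in a named list.
parse_directory <- function(dir, parser, pattern = "\\.html$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  results <- lapply(files, function(f) {
    # A file that fails to parse is skipped with a warning instead of
    # aborting the whole run.
    tryCatch(
      parser(f),
      error = function(e) {
        warning("Skipping ", basename(f), ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(results) <- basename(files)
  # Drop the entries of files that could not be parsed
  Filter(Negate(is.null), results)
}

# Usage with webbotparseR (the path is a placeholder):
# parsed <- parse_directory(
#   "path/to/scraped/files",
#   function(f) webbotparseR::parse_search_results(path = f, engine = "google text")
# )
```

Returning a named list keeps each result tied to its source file and leaves the choice of how to combine the results to the caller.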

## Contact Details

Maintainer: David Schoch <david@schochastics.net>

Issue Tracker: [https://github.com/gesistsa/webbotparseR/issues](https://github.com/gesistsa/webbotparseR/issues)

<!-- ## Publication -->
<!-- - Include information on publications or articles related to the method, if applicable. -->

<!-- ## Acknowledgements -->
<!-- - Acknowledgements if any -->

<!-- ## Disclaimer -->
<!-- - Add any disclaimers, legal notices, or usage restrictions for the method, if necessary. -->