Skip to content

dataforgoodfr/12_observatoire_des_imaginaires

Repository files navigation

Observatoire des imaginaires

Installing with poetry

Prerequisites:

  1. Python 3.12.3 installed on your system.
  2. Ensure you have poetry installed. If not, you can install them using pip.
pip install poetry

Steps:

  1. Clone the GitHub Repository:

    Clone the GitHub repository you want to install locally using the git clone command.

    git clone https://github.com/dataforgoodfr/12_observatoire_des_imaginaires.git
  2. Navigate to the Repository Directory:

    Use the cd command to navigate into the repository directory.

    cd 12_observatoire_des_imaginaires/
  3. Configure poetry to create a Virtual Environment inside the project:

    Ensure that poetry will create a .venv directory into the project with the command:

    poetry config virtualenvs.in-project true
  4. Install Project Dependencies using poetry:

    Use poetry to install the project dependencies.

    poetry install

    This will read the pyproject.toml file in the repository and install all the dependencies specified.

  5. Activate the Virtual Environment:

    Activate the virtual environment to work within its isolated environment.

    On Unix or MacOS:

    poetry shell
  6. Run & edit notebooks:

    jupyter notebook

Environment Variables

This code base uses a .env file at the root directory of the code base.

Variable Description Default Value
HF_TOKEN Hugging Face API Token. You must have write access to the datasets. N/A
TMDB_API_KEY TMDB API Token. N/A
TMDB_BATCH_SIZE Number of TMDB entries to download before updating a HF dataset. 10000
TMDB_MAX_RETRIES Maximum number of times to retry a failed TMDB API call. 500

Website to select a specific movie or TV show

The observable directory contains an observable framework site that collect film and movie data from datasets on Hugging Face and filters the datasets according to the following rules in order to reduced the size of the data present on the generated web site. This site provides a search UI allow a user to select a specific movie or TV show. The user can then click on the link for their selection to kick off the questionnaire on tally andis destined to be embedded in an iframe in the main Observatoire des Imaginaires web site.

Movies:

  • filter out adult movies
  • filter out movies released more that two years ago

TV Shows:

  • filter out adult shows

The web site is currently hosted on the Observable hosting platform and is available at the following URL:

https://observatoire-des-imaginaires.observablehq.cloud/questionnaire

Run precommit-hook locally

Install precommits

pre-commit run --all-files

Use Tox to test your code

tox -vv

Tasks

This repo includes invoke for pythonic task execution. To see the is of available tasks you can run:

invoke -l

To run the observable site in development mode you can run:

invoke dev

Updating the Movie Dataset

The French regional TMDB Movies Dataset on Hugging Face can be updated using the following command:

invoke update-movies-dataset

Updating the Series Dataset

The French regional TMDB Series Dataset on Hugging Face can be updated using the following command:

invoke update-series-dataset

Python CLI

The Python CLI supports the following commands:

python -m observatoire.tmdb.movies --mode=[latest | missing]
python -m observatoire.tmdb.series --mode=[latest | missing]

In the latest mode, which is the default, these commands sync the latest records from TMDB to our datasets on Hugging Face. In the missing mode, they calculate which rows may be missing from the Hugging Face datasets and attempt to sync these records.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published