KNMI Dataset Downloader

A Python package for easily downloading datasets from the KNMI (Royal Netherlands Meteorological Institute) Data Platform. This tool supports concurrent downloads and provides both a command-line interface and a Python API.

Background

This project grew out of my time at Clairify (www.clairify.io), where I worked extensively with KNMI datasets. After leaving, I had the time to build this tool and streamline the download process. The goal is to simplify dataset acquisition in Python projects, so that developers and data scientists can work with KNMI's meteorological data more easily.

Features

Concurrent downloads for improved performance
Progress bars for both overall and individual file downloads
Support for date range filtering
Skips already downloaded files
Both CLI and Python API interfaces
Detailed download statistics
Anonymous API key support with automatic fetching
Built with a Kiota-generated API client for type-safe KNMI API interactions
Request timeouts for improved reliability
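
To illustrate how a concurrency cap and per-request timeouts can work together, here is a minimal sketch using only the standard library. It is not the package's internal code: `run_limited`, `fake_download`, and the timeout value are hypothetical names and numbers chosen for illustration, and the real downloader may be structured differently.

```python
import asyncio


async def run_limited(jobs, max_concurrent=10, timeout=30.0):
    """Run coroutine factories with a concurrency cap and a per-job timeout.

    `jobs` is a list of zero-argument callables that each return a coroutine.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # At most `max_concurrent` jobs hold the semaphore at once.
        async with semaphore:
            try:
                return await asyncio.wait_for(job(), timeout)
            except asyncio.TimeoutError:
                return None  # a timed-out job is reported as None

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_download(name, delay):
    # Hypothetical stand-in for a real HTTP download.
    await asyncio.sleep(delay)
    return name


def demo():
    jobs = [lambda n=n: fake_download(f"file{n}", 0.01) for n in range(5)]
    return asyncio.run(run_limited(jobs, max_concurrent=2, timeout=1.0))
```

Calling `demo()` runs five simulated downloads with at most two in flight at a time.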

Installation

You can install the package using pip:

pip install knmi-dataset-downloader

Prerequisites

Python 3.7 or higher
A KNMI Data Platform API key (optional - an anonymous API key is used if none is provided)

Usage

Command Line Interface

The simplest way to use the downloader is through the command line:

# Using your own API key
knmi-download --api-key YOUR_API_KEY --start-date 2024-01-01T00:00:00 --end-date 2024-01-31T23:59:59

# Using anonymous API key (automatically fetched)
knmi-download --start-date 2024-01-01 --end-date 2024-01-31

# Limit the number of files to download
knmi-download --start-date 2024-01-01 --end-date 2024-01-31 --limit 5

Available options:

Options:
  -d, --dataset TEXT     Name of the dataset to download (default: Actuele10mindataKNMIstations)
  -v, --version TEXT     Version of the dataset (default: 2)
  -c, --concurrent INT   Maximum number of concurrent downloads (default: 10)
  -s, --start-date TEXT  Start date in ISO 8601 format (e.g., 2024-01-01T00:00:00 or 2024-01-01)
                         Default is 1 hour and 30 minutes ago
  -e, --end-date TEXT    End date in ISO 8601 format (e.g., 2024-01-01T00:00:00 or 2024-01-01)
                         Default is now
  --api-key TEXT         KNMI API key (optional - will fetch anonymous API key if not provided)
  -o, --output-dir PATH  Output directory for downloaded files
  --limit INT            Maximum number of files to download (optional)
  --help                 Show this message and exit
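
If you want to drive the CLI from a script, the options above can be assembled into an argument list. The sketch below is one way to do that. `build_command` is a hypothetical helper, not part of the package, and it only uses flags listed in the options above.

```python
from typing import List, Optional


def build_command(
    start_date: str,
    end_date: str,
    api_key: Optional[str] = None,
    output_dir: Optional[str] = None,
    limit: Optional[int] = None,
    concurrent: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the knmi-download CLI (hypothetical helper)."""
    cmd = ["knmi-download", "--start-date", start_date, "--end-date", end_date]
    # Optional flags are only added when a value was given, so the CLI
    # falls back to its own defaults otherwise.
    if api_key is not None:
        cmd += ["--api-key", api_key]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if concurrent is not None:
        cmd += ["--concurrent", str(concurrent)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems.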

Python API

You can also use the package in your Python code:

from knmi_dataset_downloader import dataset
import asyncio
from datetime import datetime

async def main():
    # Download files for a specific date range
    stats = await dataset.download(
        api_key="YOUR_API_KEY",  # Optional - will use anonymous API key if not provided
        dataset_name="Actuele10mindataKNMIstations",  # Optional - uses default if not provided
        version="2",  # Optional - uses default if not provided
        max_concurrent=10,  # Optional - uses default if not provided
        output_dir="path/to/output",  # Optional - uses default if not provided
        start_date=datetime(2024, 1, 1),
        end_date=datetime(2024, 1, 31),
        limit=5  # Optional - limit the number of files to download
    )
    
    # Access download statistics
    print(f"Total files found: {stats.total_files}")
    print(f"Files downloaded: {stats.downloaded_files}")
    print(f"Files skipped: {stats.skipped_files}")

# Run the download
if __name__ == "__main__":
    asyncio.run(main())
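
The Python API takes `datetime` objects, whereas the CLI accepts ISO 8601 strings and defaults to the last 1 hour and 30 minutes. The sketch below shows one way to produce matching values with the standard library. `recent_window` and `parse_iso` are hypothetical helpers, and whether the package interprets naive datetimes as local time or UTC is not stated here, so treat that as an open assumption.

```python
from datetime import datetime, timedelta


def recent_window(hours=1, minutes=30, now=None):
    """Return (start, end) covering the most recent period.

    The defaults mirror the CLI default of 1 hour and 30 minutes ago until now.
    """
    end = now if now is not None else datetime.now()
    start = end - timedelta(hours=hours, minutes=minutes)
    return start, end


def parse_iso(value):
    """Parse '2024-01-01' or '2024-01-01T00:00:00' into a datetime."""
    # A date-only string is parsed as midnight of that day.
    return datetime.fromisoformat(value)
```

The returned values can be passed as `start_date` and `end_date` to `dataset.download`.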

Download Statistics

After each download session, the tool provides detailed statistics including:

Total number of files found
Number of files already present (skipped)
Number of files downloaded
Number of failed downloads
Total data downloaded
List of any failed downloads
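
As a rough illustration of how these statistics could be turned into a report, the sketch below uses a stand-in dataclass. Only `total_files`, `downloaded_files`, and `skipped_files` appear in the Python API example above. `failed_files` and `total_bytes` are hypothetical field names, so check the actual statistics object before relying on them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DownloadSummary:
    """Hypothetical stand-in for the statistics object returned by a download."""
    total_files: int = 0
    downloaded_files: int = 0
    skipped_files: int = 0
    failed_files: List[str] = field(default_factory=list)  # hypothetical name
    total_bytes: int = 0  # hypothetical name


def format_summary(stats):
    """Render the statistics as a multi-line text report."""
    lines = [
        f"Total files found: {stats.total_files}",
        f"Files skipped: {stats.skipped_files}",
        f"Files downloaded: {stats.downloaded_files}",
        f"Failed downloads: {len(stats.failed_files)}",
        f"Total data downloaded: {stats.total_bytes / 1_000_000:.2f} MB",
    ]
    # List each failed file on its own indented line.
    lines += [f"  failed: {name}" for name in stats.failed_files]
    return "\n".join(lines)
```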

Configuration

By default, files are downloaded to the directory specified by DATASET_OUTPUT_DIR in your configuration. To change it, set the corresponding environment variable or update the config file. For a single run you can also pass --output-dir on the command line or output_dir in the Python API.
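
One way to picture the lookup order is the sketch below. It assumes an environment variable literally named `DATASET_OUTPUT_DIR` and a fallback directory of `downloads`. Both the precedence and the fallback are assumptions for illustration, and `resolve_output_dir` is a hypothetical helper, not the package's own function.

```python
import os
from pathlib import Path


def resolve_output_dir(explicit=None, env=None, fallback="downloads"):
    """Pick an output directory: explicit argument, then environment, then fallback."""
    # `env` can be injected for testing; it defaults to the process environment.
    env = os.environ if env is None else env
    chosen = explicit or env.get("DATASET_OUTPUT_DIR") or fallback
    return Path(chosen).expanduser()
```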

Error Handling

The downloader automatically skips existing files
Partially downloaded files are removed in case of failures
Failed downloads are logged and reported in the final statistics
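
The three behaviours above can be sketched in a few lines. This is an illustration of the pattern, not the package's implementation. `download_one` is a hypothetical helper, and `fetch` is a hypothetical callable that yields the file's content as byte chunks.

```python
from pathlib import Path


def download_one(target: Path, fetch, failed: list) -> str:
    """Download one file; return 'skipped', 'downloaded', or 'failed'."""
    if target.exists():
        return "skipped"  # an existing file is left untouched
    try:
        with open(target, "wb") as handle:
            for chunk in fetch():
                handle.write(chunk)
    except Exception:
        # Broad catch is deliberate in this sketch: any error counts as a failure.
        if target.exists():
            target.unlink()  # remove the partially written file
        failed.append(target.name)  # collect the name for the final report
        return "failed"
    return "downloaded"
```

Removing the partial file matters because the skip check only tests for existence: a truncated file left on disk would be skipped on the next run.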

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Acknowledgments

KNMI for providing the Data Platform API
Built with Python's asyncio for efficient concurrent downloads

Support

If you run into problems or have suggestions, please open an issue on the GitHub repository (tiborrr/knmi-dataset-downloader).
