ddelange
💥
["translatio", "imitatio", "aemulatio"]

Stars

etl

Extract-Transform-Load, Data Wrangling, Data Mining, ...
251 repositories

YTsaurus is a scalable and fault-tolerant open-source big data platform.

C++ · 1,971 stars · 139 forks · Updated Jan 23, 2025

Quilt is a data mesh for connecting people with actionable data

TypeScript · 1,330 stars · 90 forks · Updated Jan 23, 2025

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Java · 2,688 stars · 794 forks · Updated Jan 21, 2025

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python · 5,033 stars · 481 forks · Updated Aug 15, 2024

Source code for my collection of articles on using pandas.

Jupyter Notebook · 1,541 stars · 384 forks · Updated Dec 14, 2022

Distributed task queue with full async support

Python · 954 stars · 57 forks · Updated Dec 23, 2024

A fast and reliable background task processing library for Python 3.

Python · 4,434 stars · 323 forks · Updated Dec 22, 2024

A Pure Python, React-style Framework for Scaling Your Jupyter and Web Apps

Python · 1,965 stars · 146 forks · Updated Jan 23, 2025

Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

TypeScript · 56,244 stars · 10,600 forks · Updated Jan 23, 2025

Cluster tools for running Dask on Databricks

Python · 13 stars · 5 forks · Updated Jun 3, 2024

Port of Wappalyzer (uncovers technologies used on websites) to automate mass scanning.

Go · 996 stars · 143 forks · Updated Nov 26, 2023

This project aims to maintain Wappalyzer technologies

Python · 272 stars · 60 forks · Updated Jan 22, 2025

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

HTML · 9,857 stars · 824 forks · Updated Jan 23, 2025

Task pipelining for taskiq

Python · 26 stars · 3 forks · Updated Aug 3, 2024

Bokeh Plotting Backend for Pandas and GeoPandas

Python · 878 stars · 112 forks · Updated Apr 10, 2024

Easily create large video dataset from video urls

Python · 561 stars · 67 forks · Updated Jul 30, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python · 2,163 stars · 164 forks · Updated Jan 22, 2025

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, …

C · 1,204 stars · 68 forks · Updated Jan 23, 2025

All-in-one infrastructure for search, recommendations, RAG, and analytics offered via API

Rust · 1,859 stars · 159 forks · Updated Jan 23, 2025

Data-Centric Pipelines and Data Versioning

Go · 6,199 stars · 569 forks · Updated Jan 15, 2025

Efficient data transformation and modeling framework that is backwards compatible with dbt.

Python · 1,985 stars · 178 forks · Updated Jan 23, 2025

PISA: Performant Indexes and Search for Academia

C++ · 959 stars · 66 forks · Updated Jan 12, 2025

⬛️ CLI tool for saving complete web pages as a single HTML file

Rust · 12,415 stars · 347 forks · Updated Dec 2, 2024

Capture a URL with Playwright

Python · 30 stars · 3 forks · Updated Jan 7, 2025

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python · 25,283 stars · 3,225 forks · Updated Sep 24, 2024

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript · 22,451 stars · 1,807 forks · Updated Jan 23, 2025

the file filesystem: mount semi-structured data (like JSON) as a Unix filesystem

Rust · 468 stars · 14 forks · Updated May 2, 2024

Python scraper based on AI

Python · 17,369 stars · 1,457 forks · Updated Jan 22, 2025

A lightweight message queue. Like AWS SQS and RSMQ but on Postgres.

PLpgSQL · 2,923 stars · 77 forks · Updated Dec 27, 2024

High-performance and seamless sharing and modification of Python objects between processes, without the periodic overhead of serialization and deserialization. Provides fast inter-process communica…

Python · 66 stars · 3 forks · Updated Nov 25, 2024