- Digital Nomad
etl
A Python package to manage extremely large amounts of data
A curated list of analytics frameworks, software and other tools.
Downloadable snapshots of the Chrome Top Million Websites pulled from public CrUX data in Google BigQuery.
Fast data store for Pandas time-series data
Apache DataFusion Ballista Distributed Query Engine
DatenLord, Computing Defined Storage, an application-orientated, cloud-native distributed storage system
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data ever…
A scikit-learn compatible neural network library that wraps PyTorch
C++ library for value-oriented design using the unidirectional data-flow architecture (Redux for C++)
Cylon is a fast, scalable, distributed memory, parallel runtime with a Pandas like DataFrame.
Deploy a Prefect flow to serverless AWS Lambda function
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
a high-performance, POSIX-ish Amazon S3 file system written in Go
Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
LogAI - An open-source library for log analytics and intelligence
Scalable Python DS & ML, in an API compatible & lightning fast way.
The country converter (coco) - a Python package for converting country names between different classification schemes.
Trigger.dev is the open source background jobs platform.
(currently broken) Back up Google Takeout archives (YouTube channel and Google Photos) at 1GB/s+ to Azure Storage periodically with minimal human toil and financial cost
Basic AWS S3 WebDAV interface implemented in Rust
Python library and CLI you can use to move relational data from one place to another - DBs/CSV/gsheets/dataframes/...
Web Serving and Remote Procedure Calls at 50x lower latency and 70x higher bandwidth than FastAPI, implementing JSON-RPC & REST over io_uring ☎️