Datapark is a self-hosted data platform for educational purposes. It consists of a collection of containerized services that let you build solutions for data-related problems. To use them, you'll need Docker installed. In the docker-compose file you can find the following services:
- jupyterlab: a JupyterLab server. This is where a developer can use notebooks to explore data and prototype solutions.
- postgresql: a PostgreSQL database. It can be used for storing data, and it is also used by other services, such as MinIO and MLflow, to store their metadata.
- minio: a MinIO object storage service with an S3-compatible API (similar to AWS S3). This is the intended place for storing data.
- mlflow: an MLflow tracking server to support machine learning tasks and applications.
- spark: three Spark containers (one master and two workers) that provide a Spark cluster for distributed computing tasks (see the sketch after this list).
- airflow: three Airflow containers (one for initialization, one for the web UI, and one for the scheduler) that allow scheduling and monitoring of data workflows.
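As a minimal sketch of how a notebook might talk to the Spark cluster, assuming the master is reachable from the JupyterLab container at spark://spark-master:7077 (check docker-compose.yml for the actual service name and port):

```python
from pyspark.sql import SparkSession

# Connect to the cluster; the master URL below is an assumption.
spark = (
    SparkSession.builder
    .appName("datapark-example")           # shows up in the Spark web UI
    .master("spark://spark-master:7077")   # adjust to the compose service name/port
    .getOrCreate()
)

df = spark.range(1000)   # tiny DataFrame just to exercise the workers
print(df.count())        # should print 1000 if the cluster is reachable
spark.stop()
```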
To get started, clone this repository. To run everything (from a Unix/WSL terminal):
docker compose up -d
To shut it down:
docker compose down
To access the different services in the browser:
- jupyterlab: http://localhost:8888
- minio: http://localhost:9001
- mlflow: http://localhost:8080
- airflow: http://localhost:8081
- spark: http://localhost:9090
You can find the usernames and passwords for the different services in the .env file. Make sure you change them before using the platform.
The platform includes example notebooks to help you use the different services, as well as an example of how to build Airflow DAGs that run jobs on Spark.
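For orientation, here is a rough sketch of the kind of calls a notebook can make against MinIO and MLflow. The internal hostnames and ports (minio:9000, mlflow:5000), the bucket name, and the MINIO_ROOT_USER / MINIO_ROOT_PASSWORD variable names are assumptions; use the values defined in docker-compose.yml and your .env file.

```python
import os

import mlflow
from minio import Minio

# MinIO: endpoint and credential variable names are assumptions,
# read the actual values from the .env file.
client = Minio(
    "minio:9000",
    access_key=os.getenv("MINIO_ROOT_USER"),
    secret_key=os.getenv("MINIO_ROOT_PASSWORD"),
    secure=False,  # plain HTTP inside the compose network
)
if not client.bucket_exists("demo"):
    client.make_bucket("demo")
client.fput_object("demo", "data.csv", "data.csv")  # upload a local file

# MLflow: the tracking URI is an assumption; point it at the mlflow
# service as it is named in docker-compose.yml.
mlflow.set_tracking_uri("http://mlflow:5000")
mlflow.set_experiment("datapark-demo")
with mlflow.start_run():
    mlflow.log_param("alpha", 0.1)
    mlflow.log_metric("rmse", 0.42)
```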
By default, notebooks are stored in platform/jupyterlab/notebooks/ and DAGs can be found in platform/airflow/dags.
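The shipped DAG example is the reference; as a rough sketch of the general pattern, a minimal DAG that submits a job to Spark can use the SparkSubmitOperator from the apache-spark provider. The application path and the spark_default connection id below are placeholders, and the exact DAG arguments depend on the Airflow version in use.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_job_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,   # trigger manually from the Airflow web UI
    catchup=False,
) as dag:
    submit = SparkSubmitOperator(
        task_id="submit_spark_job",
        application="/opt/airflow/dags/jobs/example_job.py",  # placeholder script path
        conn_id="spark_default",  # Airflow connection pointing at the Spark master
    )
```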