Merge pull request #23 from big-unibo/develop
Merging develop to master
ManuelePasini authored Sep 12, 2024
2 parents d0d4ea1 + cbf74f4 commit 6320234
Showing 31 changed files with 541 additions and 495 deletions.
27 changes: 0 additions & 27 deletions .env

This file was deleted.

4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
.env
dataplatform/.env
/dataplatform/tests/HDFD_VENV_3.7
/dataplatform/multiple_stacks/runtime/
@@ -639,3 +641,5 @@ fabric.propertyDataList
/python-utils/src/main/python/gmail/credentials.json
/python-utils/src/main/python/gmail/token.pickle
*.png

*.airflow.env
51 changes: 51 additions & 0 deletions airflow_dags/README.md
@@ -0,0 +1,51 @@
# Airflow
[Installation](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html)

## Docker registry
- Stack: `dataplatform/multiple_stacks/registry.yaml`
- Reachable locally within the cluster at 127.0.0.0:5000
- To build and push from a Dockerfile on a cluster machine:
  - `docker image build --tag 127.0.0.0:5000/IMAGE_NAME:VERSION -f PATH_DOCKERFILE .`
  - `docker push 127.0.0.0:5000/IMAGE_NAME:VERSION`
- To pull from any other cluster machine:
  - `docker pull 127.0.0.0:5000/IMAGE_NAME:VERSION`

- To clean data in the registry (run inside the container):
- `registry garbage-collect -m /etc/docker/registry/config.yml`

## Airflow
Startup example:
- In the DAGs directory, `${NFSPATH}/dataplatform_config/airflow_data/dags`,
  - create a directory for each project and put the file that generates the DAG there (in a subdirectory)
  - example files are in the __abds-bigdata__ project, under `\cimice\src\main\resources` and `\ingestion-weather\src\main\resources`
- [DAG](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/dags.html): A DAG (Directed Acyclic Graph) is the core concept of Airflow, collecting Tasks together, organized with dependencies and relationships to say how they should run.
- [scheduling options](https://airflow.apache.org/docs/apache-airflow/1.10.1/scheduler.html)
- Our DAGs always consist of a single task, which is a [Docker Operator](https://airflow.apache.org/docs/apache-airflow-providers-docker/1.0.2/_api/airflow/providers/docker/operators/docker/index.html)
  - its name must be made exclusively of alphanumeric characters, dashes, dots, and underscores
- In particular, we use a specialization of it, the [Docker swarm operator](https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/_api/airflow/providers/docker/operators/docker_swarm/index.html#airflow.providers.docker.operators.docker_swarm.DockerSwarmOperator),
which is useful for constraining where Docker containers are spawned; a minimal sketch of such a DAG file follows below.
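
As a reference, here is a minimal sketch of such a one-task DAG file; the `dag_id`, image name, command, and schedule are illustrative assumptions, not values taken from the project:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator

# Hypothetical single-task DAG; names, image, and schedule are illustrative.
with DAG(
    dag_id="example-project-ingestion",  # alphanumerics, dashes, dots, underscores only
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    DockerSwarmOperator(
        task_id="run-ingestion",
        image="127.0.0.0:5000/IMAGE_NAME:VERSION",  # image from the local registry
        command="python /app/main.py ",  # trailing space: not treated as a template
    )
```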
### DockerSwarmOperator notes
- The last line printed by the Docker container is pushed to XComs
  - to push everything written to standard output, set `xcom_all=True`
- Supports constraints on CPU and memory usage
- **auto_remove**=True removes the container once it exits (the `docker rm`)
- **mounts**=[] mounts volumes, each specified by "source", "target", "type", "read_only"
- **command**="command to be run in the container" overrides the image CMD; add a trailing space to tell Airflow it is not a Jinja template
- **mount_tmp_dir**=False does not mount a temporary directory
- **container_name** is usually kept similar to the task name
- **placement** constrains where the container is scheduled
- **network_mode** and **networks**: use BIG-dataplatform-network
- For anything else, refer to the official documentation; a sketch using these parameters follows below
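
A hedged sketch of how these parameters fit together in a single task; the mount paths, placement constraint, and memory limit are illustrative assumptions (note that `auto_remove` is a string enum in recent provider versions):

```python
from docker.types import Mount, Placement

from airflow.providers.docker.operators.docker_swarm import DockerSwarmOperator

# Illustrative use of the parameters listed above; all values are assumptions.
run_task = DockerSwarmOperator(
    task_id="run-ingestion",
    image="127.0.0.0:5000/IMAGE_NAME:VERSION",
    command="python /app/main.py ",  # trailing space: not a Jinja template
    xcom_all=True,                   # push every stdout line to XCom, not only the last
    auto_remove=True,                # remove the container once it exits
    mount_tmp_dir=False,             # do not mount a temporary directory
    container_name="run-ingestion",  # kept similar to the task name
    mounts=[Mount(source="/nfs/data", target="/data", type="bind", read_only=True)],
    placement=Placement(constraints=["node.role == worker"]),  # where containers spawn
    networks=["BIG-dataplatform-network"],
    mem_limit="2g",                  # memory constraint
)
```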

### Trigger a dag from python application
This is done in the **abds-bigdata** project's `ingestion-weather` module,
through the `python-service-interaction-utils/src/main/python/airflow_interaction.py` service; a sketch of the underlying REST call is shown below.
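
The service itself is not shown here; a minimal sketch of what triggering a run looks like against Airflow's stable REST API (the endpoint is standard, while host, credentials, and `dag_id` are illustrative assumptions):

```python
import requests

# Trigger a DAG run via Airflow's stable REST API (Airflow >= 2.0).
AIRFLOW_URL = "http://airflow-webserver:8080"  # service name and internal port
DAG_ID = "example-project-ingestion"           # hypothetical DAG id

response = requests.post(
    f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    auth=("airflow_user", "airflow_password"),  # basic-auth backend assumed
    json={"conf": {}},  # optional parameters forwarded to the run
)
response.raise_for_status()
print(response.json()["dag_run_id"])
```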

### Common errors in the deploy
- do not pass files listed in .gitignore to the container build
- use service names and internal ports when referring to other services (do not use the exposed ports)
- point the service links in the config (e.g., HDFS) to the new cluster
- DAG import errors: enter the airflow-scheduler container and launch `airflow scheduler` to see them

### Possible updates
- configure an SMTP server to [send mails on failure](https://stackoverflow.com/questions/58736009/email-on-failure-retry-with-airflow-in-docker-container); a sketch of the DAG-level settings follows below
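
Once SMTP is reachable (e.g., via the `AIRFLOW__SMTP__SMTP_HOST` family of settings), enabling failure mails typically only takes the DAG's `default_args`; a minimal sketch with an assumed recipient:

```python
# Hedged sketch: mail on task failure, assuming SMTP is already configured
# in airflow.cfg or via AIRFLOW__SMTP__* environment variables.
default_args = {
    "email": ["team@example.org"],  # illustrative recipient
    "email_on_failure": True,
    "email_on_retry": False,
}
```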
51 changes: 0 additions & 51 deletions airflow_dags/bashscript/bash_script_example.py

This file was deleted.

3 changes: 0 additions & 3 deletions airflow_dags/bashscript/bash_script_example.sh

This file was deleted.

13 changes: 0 additions & 13 deletions airflow_dags/docker_with_code/Dockerfile

This file was deleted.

26 changes: 0 additions & 26 deletions airflow_dags/docker_with_code/docker_include_python.py

This file was deleted.

17 changes: 0 additions & 17 deletions airflow_dags/docker_with_code/python_script.py

This file was deleted.

10 changes: 0 additions & 10 deletions airflow_dags/dockerpyscript/Dockerfile

This file was deleted.

68 changes: 0 additions & 68 deletions airflow_dags/dockerpyscript/README.md

This file was deleted.

47 changes: 0 additions & 47 deletions airflow_dags/dockerpyscript/docker_operation_example.py

This file was deleted.

34 changes: 0 additions & 34 deletions airflow_dags/dockerpyscript/python_docker_example.py

This file was deleted.

