AMS Background Tasks is a set of tools designed to create and update the database of the Amazon Situation Room (AMS). The execution of these tools is managed by Airflow.
In short, Airflow is a platform created by the community to programmatically author, schedule, and monitor workflows or DAGs. In Airflow, a DAG (Directed Acyclic Graph) is a collection of tasks that you want to run, organized in a way that reflects their relationships and dependencies. A DAG is defined in a Python script, which represents the DAG's structure (tasks and their dependencies) as code.
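For illustration only, a minimal DAG definition in this style might look like the sketch below; the DAG id, schedule, and task commands are placeholders, not the actual ams-create-db code:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal illustrative DAG: two tasks, the second depending on the first.
with DAG(
    dag_id="example-dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # triggered manually
    catchup=False,
) as dag:
    first = BashOperator(task_id="first-task", bash_command="echo first")
    second = BashOperator(task_id="second-task", bash_command="echo second")

    first >> second  # "second-task" runs only after "first-task" succeeds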
The DAG ams-create-db is responsible for creating and updating the AMS database. This DAG consists of the following tasks:
- check-variables
- update-environment
- check-recreate-db
- create-db
- update-biome
- update-spatial-units
- update-active-fires
- update-amz-deter
- update-cer-deter
- download-risk-file
- update-ibama-risk
- prepare-classification
- classify-deter-by-land-use
- classify-fires-by-land-use
- finalize-classification
Each of these tasks is a Python command-line tool developed using the Click library.
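As an illustration of that pattern, a task entry point written with Click could look roughly like the sketch below; the option names and behavior are hypothetical, not the actual interface of these tools:

import click


@click.command()
@click.option("--db-url", required=True, help="Database URL of the target AMS database.")
@click.option("--biome", default="Amazônia", show_default=True, help="Biome to process.")
def main(db_url: str, biome: str):
    """Hypothetical sketch of a task command-line entry point."""
    click.echo(f"Updating {biome} data on {db_url}")


if __name__ == "__main__":
    main()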
To run the DAG ams-create-db, three external databases are required: one for DETER data (one for each biome), another for active fires data, and an auxiliary database.
From the auxiliary database, the following tables are required:
- public.lm_bioma_250
- public.municipio_test
- public.lml_unidade_federacao_a
- cs_amz_5km
- cs_amz_5km_biome
- cs_amz_25km
- cs_amz_25km_biome
- cs_amz_150km
- cs_amz_150km_biome
- cs_cer_5km
- cs_cer_5km_biome
- cs_cer_25km
- cs_cer_25km_biome
- cs_cer_150km
- cs_cer_150km_biome
The cell tables (prefixed with cs_), except for the 5km ones, are created by the notebook update_auxiliary.ipynb, which uses data from the existing AMS database. The cs_*_5km* tables, however, are created by the notebook import_cells_from_shapefile.ipynb, which imports cells into the auxiliary database from a shapefile. If the cell tables are not defined in the auxiliary database, these notebooks must be run. The shapefile containing the 5km cells is attached to issue #26.
$ jupyter-notebook notebooks/update_auxiliary.ipynb
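For orientation, the core of the shapefile import can be pictured as the sketch below; it assumes geopandas and SQLAlchemy are available, and the file name, connection URL, and table name are placeholders. The authoritative steps are those in import_cells_from_shapefile.ipynb:

import geopandas as gpd
from sqlalchemy import create_engine

# Sketch only: read the 5km cells shapefile and load it into the auxiliary
# database. The file name, connection URL, and table name are placeholders.
cells = gpd.read_file("cells_5km.shp")
engine = create_engine("postgresql://ams:ams@localhost:5432/auxiliary")
cells.to_postgis("cs_amz_5km", engine, if_exists="replace", index=False)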
This DAG is designed to run from a DagBag, which means that all the DAG files are located in the root folder; the Airflow environment is assumed to use this folder as its DAGs directory. To run it on an Airflow instance, it is necessary to set up the following Airflow configurations:
Set up the following connection IDs:
- AMS_AF_DB_URL (raw fires database, e.g. raw_fires_data)
- AMS_AUX_DB_URL (auxiliary database, e.g. auxiliary)
- AMS_AMZ_DETER_B_DB_URL (DETER Amazônia database, e.g. deter_amazonia_nb)
- AMS_CER_DETER_B_DB_URL (DETER Cerrado database, e.g. deter_amazonia_nb)
- AMS_DB_URL (AMS output database, e.g. ams_new)
- AMS_FTP_URL (FTP used to retrieve the risk file provided by IBAMA)
Example of how to set up the connection fields:
- Connection Id: AMS_AF_DB_URL (Id used by DAG)
- Connection Type: Postgres
- Host: 192.168.1.9 (Database host or IP)
- Database: raw_fires_data (Database name)
- Login: ams (Database username)
- Password: ams (Database password)
- Port: 5432 (Database port)
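Inside the DAG, these connection IDs are typically resolved through Airflow's connection API; the lines below are a minimal sketch of how a task might turn AMS_AF_DB_URL into a database URL, not necessarily the exact code used by these tools:

from airflow.hooks.base import BaseHook

# Sketch: build a SQLAlchemy-style URL from the AMS_AF_DB_URL connection.
conn = BaseHook.get_connection("AMS_AF_DB_URL")
db_url = f"postgresql://{conn.login}:{conn.password}@{conn.host}:{conn.port}/{conn.schema}"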
Set up the following variables:
- AIRFLOW_UID: see "Setting the right Airflow user" in the Airflow documentation. Example: AIRFLOW_UID=1000
- AMS_FORCE_RECREATE_DB: the expected values are 0 or 1. When enabled, it forces the recreation of the AMS database. Example: AMS_FORCE_RECREATE_DB=1
- AMS_ALL_DATA_DB: the expected values are 0 or 1. When enabled, it updates all data, including historical data. Example: AMS_ALL_DATA_DB=1
- AMS_BIOMES: a list of biomes separated by semicolons. Example: AMS_BIOMES="Amazônia;Cerrado;"
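For reference, the DAG can read these variables at runtime through Airflow's Variable API; a minimal sketch is shown below, and the actual tasks may handle defaults and validation differently:

from airflow.models import Variable

# Sketch: read the AMS variables set above.
force_recreate = Variable.get("AMS_FORCE_RECREATE_DB", default_var="0") == "1"
biomes = [b for b in Variable.get("AMS_BIOMES", default_var="").split(";") if b]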
Additionally, it is necessary to place the land use files in the land_use directory. The naming convention for the files is {BIOMA}_land_use.tif (e.g., Amazônia_land_use.tif, Cerrado_land_use.tif, and so on).
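Following that convention, a task could locate the raster for each biome along the lines of the sketch below; the directory path and error handling are illustrative only:

from pathlib import Path

# Sketch: resolve one land use raster per biome following the naming convention.
land_use_dir = Path("land_use")
for biome in ["Amazônia", "Cerrado"]:
    raster = land_use_dir / f"{biome}_land_use.tif"
    if not raster.exists():
        raise FileNotFoundError(f"Missing land use file: {raster}")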