Biodiversity areas, especially primary forests, provide multiple ecosystem services for the local population and the planet as a whole. The rapid expansion of human land use into natural ecosystems and the impacts of the global climate crisis put natural ecosystems and the global biodiversity under threat.
The mapme.biodiversity package helps to analyse a number of biodiversity related indicators and biodiversity threats based on freely available geodata-sources such as the Global Forest Watch. It supports computational efficient routines and heavy parallel computing in cloud-infrastructures such as AWS or Microsoft Azure using in the statistical programming language R. The package allows for the analysis of global biodiversity portfolios with a thousand or millions of AOIs which is normally only possible on dedicated platforms such as the Google Earth Engine. It provides the possibility to e.g. analyse the World Database of Protected Areas (WDPA) for a number of relevant indicators. The primary use case of this package is to support scientific analysis and data science for individuals and organizations who seek to preserve the planet biodiversity. Its development is funded by the German Development Bank KfW.
The package and its dependencies can be installed from CRAN via:
install.packages("mapme.biodiversity", dependencies = TRUE)
To install the development version, use the following command:
remotes::install_github("https://github.com/mapme-initiative/mapme.biodiversity", dependencies = TRUE)
Below is a list of the resources currently supported by
mapme.biodiversity
.
name | description | licence |
---|---|---|
accessibility_2000 | Accessibility data for the year 2000 from the Global Accessibility Map project | See JRC data policy: https://joint-research-centre.ec.europa.eu/jrc-mission-statement-work-programme/data-policy_en |
acled | Armed Conflict Location & Event Data (ACLED) | Visit acleddata.com |
biodiversity_intactness_index | Biodiversity Intactness Index | CC-BY-4.0 |
chelsa | Climatologies at High resolution for the Earth Land Surface Areas (CHELSA) | Unknown - Must cite! |
chirps | Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) | CC - unknown |
esalandcover | Copernicus Land Monitoring Service (CLMS) 100 meter land cover product | CC-BY 4.0 |
fritz_et_al | Drivers of deforestation in the tropics | CC-BY 4.0 |
gfw_emissions | Global Forest Watch - CO2 Emssions caused by forest cover loss | CC-BY 4.0 |
gfw_lossyear | Global Forest Watch - Year of forest cover loss occurence | CC-BY 4.0 |
gfw_treecover | Global Forest Watch - Percentage of canopy closure in 2000 | CC-BY 4.0 |
global_surface_water_change | Global Surface Water - Change of water occurrence intensity | https://www.copernicus.eu/en/access-data |
global_surface_water_occurrence | Global Surface Water - Percentage of water occurrence | https://www.copernicus.eu/en/access-data |
global_surface_water_recurrence | Global Surface Water - Percentage of water recurrence | https://www.copernicus.eu/en/access-data |
global_surface_water_seasonality | Global Surface Water - Seasonality of water occurrrence | https://www.copernicus.eu/en/access-data |
global_surface_water_transitions | Global Surface Water - Transition classes | https://www.copernicus.eu/en/access-data |
gmw | Global Mangrove Watch - Vector data of mangrove extent | CC BY 4.0 |
gsw_time_series | Global Surface Water - Yearly Time Series | https://global-surface-water.appspot.com/download |
humanfootprint | Time series on human pressures on natural ecosystems. | CC BY 4.0 |
ipbes_biomes | Global Assessment Report on Biodiversity and Ecosystem Services division of the earth’s surface into biomes and anthromes. | CC 4.0 |
irr_carbon | Amount of carbon irrecoverably lost by a typical land use conversion event until mid-century. | CC NC 4.0 |
iucn | IUCN Species Richness Raser Dataset | https://www.iucnredlist.org/terms/terms-of-use |
key_biodiversity_areas | Key Biodiversity Areas | https://www.keybiodiversityareas.org/termsofservice |
man_carbon | Amount of carbon that is manageable by humans. | CC NC 4.0 |
mcd64a1 | MODIS Burned Area Monthly Product (Aqua and Terra) | https://lpdaac.usgs.gov/data/data-citation-and-policies/ |
nasa_grace | NASA Gravity Recovery And Climate Experiment (GRACE) - Measurments of Earth’s mass and water changes | https://nasagrace.unl.edu/About.aspx |
nasa_srtm | NASA Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) | https://lpdaac.usgs.gov/data/data-citation-and-policies/ |
nelson_et_al | Global maps of traveltime to cities | CC-BY 4.0 |
soilgrids | ISRIC - Modelled global soil property layers | CC-BY 4.0 |
teow | Terrestrial Ecosystems of the World (TEOW) from WWF-US | unknown |
ucdp_ged | UCDP Georeferenced Event Dataset (UCDP GED) | CC-BY 4.0 |
vul_carbon | Amount of carbon that is vulnerable to a typical land use conversion event. | CC NC 4.0 |
worldclim_max_temperature | WorldClim - Monthly maximum temperature 1960 - 2021 | https://www.worldclim.org/about.html |
worldclim_min_temperature | WorldClim - Monthly minimum temperature 1960 - 2021 | https://www.worldclim.org/about.html |
worldclim_precipitation | WorldClim - Monthly precipitation 1960 - 2021 | https://www.worldclim.org/about.html |
worldpop | WorldPop - Unconstrained Global Mosaics 2000 - 2020 | CC-BY 4.0 |
Next, is a list of supported indicators.
name | description |
---|---|
biodiversity_intactness_index | Averaged biodiversity intactness index. |
biome | Areal statistics of biomes from TEOW |
burned_area | Monthly burned area detected by MODIS satellites |
deforestation_drivers | Areal statistics of deforestation drivers |
drought_indicator | Relative wetness statistics based on NASA GRACE |
ecoregion | Areal statstics of ecoregions based on TEOW |
elevation | Statistics of elevation based on NASA SRTM |
exposed_population_acled | Number of people exposed to conflicts based on ACLED |
exposed_population_ucdp | Number of people exposed to conflicts based on UCDP GED |
fatalities_acled | Number of fatalities by event type based on ACLED. |
fatalities_ucdp | Number of fatalities by group of conflict based on UCDP GED |
gsw_change | Statistics of the surface water change layer by JRC |
gsw_occurrence | Areal statistic of surface water based on occurrence threshold |
gsw_recurrence | Areal statistic of surface water based on reccurence threshold |
gsw_seasonality | Areal statistic of surface water by seasonality |
gsw_time_series | Global Surface Water - Yearly Time Series area estimation of water classes. |
gsw_transitions | Areal statistics of surface water grouped by transition class |
humanfootprint | Statistics of the human footprint data set per polygon. |
ipbes_biomes | Area distibution of IBPES biomes within a polygon. |
irr_carbon | Statistics of irrecoverable carbon per polygon. |
key_biodiversity_areas | Area estimation of intersection with key biodiversity areas. |
landcover | Areal statistics grouped by landcover class |
man_carbon | Statistics of manageable carbon per polygon. |
mangroves_area | Area covered by mangroves |
population_count | Statistic of population counts |
precipitation_chelsa | Statistics of CHELSA precipitation layer |
precipitation_chirps | Statistics of CHIRPS precipitation layer |
precipitation_wc | Statistics of WorldClim precipitation layer |
slope | Statistics of slope based on NASA SRTM |
soilproperties | Statistics of SoilGrids layers |
species_richness | Species richness statistics based on user-specified raster files. |
temperature_max_wc | Statistics of WorldClim maximum temperature layer |
temperature_min_wc | Statistics of WorldClim minimum temperature layer |
traveltime | Statistics of traveltime to the clostes city grouped by city category |
traveltime_2000 | Statistics of traveltime to the clostest city in 2000 |
treecover_area | Area of forest cover by year |
treecover_area_and_emissions | Area of forest cover and greenhouse gas emssions caused by forest loss by year |
treecoverloss_emissions | Greenouse gas emissions cause by forest loss by year |
tri | Statistics of terrain rudgedness index based on NASA SRTM DEM |
vul_carbon | Statistics of vulnerable carbon per polygon. |
{mapme.biodiversity}
works by constructing a portfolio from an sf
object. Specific raster and vector resource matching the spatio-temporal
extent of the portfolio are made available locally. Once all required
resources are available, indicators can be calculated individually for
each asset in the portfolio.
library(mapme.biodiversity)
library(sf)
## Linking to GEOS 3.13.0, GDAL 3.9.2, PROJ 9.5.0; sf_use_s2() is TRUE
Once you have decided on an indicator you are interested in, you can
start by making the required resource available for your portfolio.
Using mapme_options()
you can set an output directory, control the
maximum size of polygons before they are chunked into smaller parts, and
control the verbosity of the package.
A portfolio is represented by an sf-object. It is required for the
object to only contain geometries of type POLYGON
and MULTIPOLYGON
as assets. We can request the download of a resource for the spatial
extent of our portfolio by using the get_resources()
function. We
simply supply our portfolio and one or more resource functions. Once the
resources were made available, we can query the calculation of an
indicator by using the calc_indicators()
function. This function also
expects the portfolio as input and one or more indicator functions. Once
the indicator has been calculated for all assets in a portfolio, the
data is returned as a nested list column to the original portfolio
object. The output of each indicator is standardized to common format,
consisting of a tibble with columns datetime
, variable
, unit
, and
value
. We can transform the the data to long format by using
portfolio_long()
.
mapme_options(
outdir = system.file("res", package = "mapme.biodiversity"),
chunk_size = 1e6, # in ha
verbose = FALSE
)
aoi <- system.file("extdata", "sierra_de_neiba_478140_2.gpkg", package = "mapme.biodiversity") %>%
sf::read_sf() %>%
get_resources(
get_gfw_treecover(version = "GFC-2023-v1.11"),
get_gfw_lossyear(version = "GFC-2023-v1.11"),
get_gfw_emissions()
) %>%
calc_indicators(calc_treecover_area_and_emissions(years = 2016:2017, min_size = 1, min_cover = 30)) %>%
portfolio_long()
aoi
## Simple feature collection with 4 features and 8 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -71.80933 ymin: 18.57668 xmax: -71.33201 ymax: 18.69931
## Geodetic CRS: WGS 84
## # A tibble: 4 × 9
## WDPAID ISO3 assetid indicator datetime variable unit value
## <dbl> <chr> <int> <chr> <dttm> <chr> <chr> <dbl>
## 1 478140 DOM 1 treecover_area_… 2016-01-01 00:00:00 emissio… Mg 4296.
## 2 478140 DOM 1 treecover_area_… 2016-01-01 00:00:00 treecov… ha 2370.
## 3 478140 DOM 1 treecover_area_… 2017-01-01 00:00:00 emissio… Mg 4970.
## 4 478140 DOM 1 treecover_area_… 2017-01-01 00:00:00 treecov… ha 2358.
## # ℹ 1 more variable: geom <POLYGON [°]>
{mapme.biodiversity}
leverages GDAL’s capabilities for data I/O. For
users of this package, that means that integrating a cloud storage is as
easy as setting up a configuration file and changing the outdir
argument in mapme_options()
. While you could also decide to use
environment variables, we recommend to set up a GDAL config file. You
can find GDAL’s documentation on this topic
here.
Suppose that we want to use an AWS S3 bucket that we control to write
resource data to. Let’s assume this bucket is already set up and we wish
to refer to it in our R code as mapme-data
. The GDAL configuration
file should look something like this:
[credentials]
[.mapme-data]
path=/vsis3/mapme-data
AWS_SECRET_ACCESS_KEY=<your-access-key>
AWS_ACCESS_KEY_ID=<your-access-id>
The connection will be handled based on GDAL’s virtual file system. You can find documentation on specific options for your cloud provider here.
Ideally, you would also set the following in the .Renviron
file in
your user’s home directory to ensure that GDAL is aware of this
configuration when an R session is started:
GDAL_CONFIG_FILE = "<path-to-your-config-file>"
Then, in your scripts set the outdir
option to the value specified
with the path
variable in the configuration file:
mapme_options(outdir = "/vsis3/mapme-data")
{mapme.biodiversity}
follows the parallel computing paradigm of the
{future}
package. That
means that you as a user are in the control if and how you would like to
set up parallel processing. Since {mapme.biodiversity} v0.9
, we apply
pre-chunking to all assets in the portfolio. That means that assets are
split up into components of roughly the size of chunk_size
. These
components can than be iterated over in parallel to speed up processing.
Indicator values will be aggregated automatically.
library(future)
plan(cluster, workers = 6)
As another example, with the code below one would apply parallel processing of 2 assets, with each having 4 workers available to process chunks, thus requiring a total of 8 available cores on the host machine. Be sure to not request more workers than available on your machine.
library(progressr)
plan(cluster, workers = 2)
with_progress({
aoi <- calc_indicators(
aoi,
calc_treecover_area_and_emissions(
min_size = 1,
min_cover = 30
)
)
})
plan(sequential) # close child processes
Head over to the online documentation find more detailed information about the package.