Skip to content

Commit

Permalink
47 sentinel 1 radiometric calibration (#48)
Browse files Browse the repository at this point in the history
* Ajusted sorting of file list

* Created sub-module for clipping

* Created sub-module for clipping

* Added calibration of S1 DN values and removing thermal noise

* Remove redundant line

* Adding xarray to requirements

* Updating comments

* Fixed linting

* Added xmlschema to requirements

* Updated docs

* Added __init__ file

* Extended docstring

* Adjusted variable naming

* Fixed linting

* Updated docstring

* Removed debugging code

* Added argument denoise

* Update docs/dataset.md

Co-authored-by: jmaces <[email protected]>

* Update eurocropsml/acquisition/clipping/calibration.py

Co-authored-by: jmaces <[email protected]>

* Update eurocropsml/acquisition/clipping/clipper.py

Co-authored-by: jmaces <[email protected]>

* Fixed linting

* Removing convertion to integer

* Changing values to float

* Adding None if all clipped values are zero

* Changed padding value for filtering None

* Adjusted padding value

* Adjusted padding value to -999

* Adjusted if-clauses

* Updated acquisition pipeline diagram with S1

* Updated diagram

* Updated diagram

* Updated diagram

* patch_median as float

* Changed precision of calibration values

* Updated colour

---------

Co-authored-by: jmaces <[email protected]>
  • Loading branch information
jsreuss and jmaces authored Nov 11, 2024
1 parent 99617b8 commit 6f734a5
Show file tree
Hide file tree
Showing 18 changed files with 442 additions and 167 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ Changes from previous releases are listed below.
- Adjusting split generation _(see #45)_
- Fixing padding mask _(see #50)_
- Fasten up padding to 366 days _(see #54)_
- Radiometric calibration of Sentinel-1 _(see #47)_

## 0.3.1 (2024-07-29)
- Remove country_code variable in collector downloader _(see #33)_
Expand Down
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ Therefore, this dataset is particularly suited to explore transfer-learning meth

The data acquisition, aggregation, and pre-processing steps are schematically illustrated below. A more detailed description is given in the [dataset section](https://eurocropsml.readthedocs.io/en/latest/dataset.html) of our documentation.

![Data Acquisition Pipeline.](docs/_static/acquisition-pipeline.png)
![Data Acquisition Pipeline.](docs/_static/acquisition-pipeline-s1s2.png)
<!-- teaser-end -->

## Getting Started
Expand Down
Binary file added docs/_static/acquisition-pipeline-s1s2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file removed docs/_static/acquisition-pipeline.png
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/api.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ The second is used for building ready-to-use machine learning datasets from the

The usage of individual sub-packages and modules within our data processing is illustrated below.

.. image:: _static/acquisition-pipeline.png
.. image:: _static/acquisition-pipeline-s1s2.png
:height: 300px
:alt: Data Acquisition Pipeline.
:align: center
Expand Down
6 changes: 3 additions & 3 deletions docs/dataset.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,14 +18,14 @@ In order to obtain the observation data for a given country and year, the follow

2. Clipping of satellite data and calculation of median pixel values.

{any}`Polygon clipping<eurocropsml.acquisition.clipper>`: Clip parcels from the `.SAFE` files to obtain time series of corresponding reflectance data. As the dataset is intended to be used for crop type classification, we aggregated the collected pixel values. For every parcel and each available time step observation, we calculated the median pixel value for each of the two polarization bands of the Sentinel-1 or 13 spectral bands of the Sentinel-2 raster tiles, as also done in the [tiny EuroCrops dataset](https://arxiv.org/abs/2106.08151).
{any}`Polygon clipping<eurocropsml.acquisition.clipping.clipper>`: Clip parcels from the `.SAFE` files to obtain time series of corresponding reflectance data. As the dataset is intended to be used for crop type classification, we aggregated the collected pixel values. For every parcel and each available time step observation, we calculated the median pixel value for each of the two polarization bands of the Sentinel-1 or 13 spectral bands of the Sentinel-2 raster tiles, as also done in the [tiny EuroCrops dataset](https://arxiv.org/abs/2106.08151).

3. Regional mapping: To enhance the precision of geographical data and facilitate the effective partitioning of the dataset, we utilized the [Eurostat GISCO database](https://ec.europa.eu/eurostat/de/web/gisco/geodata/statistical-units/territorial-units-statistics) to link the $\texttt{EuroCrops}$ parcels with their corresponding NUTS region.

{any}`NUTS regions<eurocropsml.acquisition.region>`: Add NUTS1-NUTS3 regions. The shapefiles for the NUTS-regions have been obtained from [Eurostat](https://ec.europa.eu/eurostat/de/web/gisco/geodata/statistical-units/territorial-units-statistics).

:::{note}
By default, the Sentinel-1 data is acquired at processing level LEVEL1, utilizing the IW operational mode. Additionally, the observations are selected for the VH and VV polarization.
By default, the Sentinel-1 data is acquired at processing level LEVEL1, utilizing the IW operational mode. Additionally, the observations are selected for the VH and VV polarization. The pipeline provides the median pixel value of radar backscatter in decibels (dB). For information on how to to remove thermal noise, please see {doc}`Examples<examples>`.
:::

:::{note}
Expand All @@ -36,7 +36,7 @@ However, since we are only relying on the median pixel value and not on individu
Furthermore, the $\texttt{EuroCrops}$ sometimes contains duplicate parcel geometries. If this is the case, only one entry is kept.
:::

![Data Acquisition Pipeline.](_static/acquisition-pipeline.png)
![Data Acquisition Pipeline.](_static/acquisition-pipeline-s1s2.png)


We provide all scripts that are necessary to perform the above steps.
Expand Down
7 changes: 6 additions & 1 deletion docs/examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ Please first make sure to download and unzip the necessary [vector data](https:/
:::

### Adjusting multiprocessing parameters
Each country config contains a number of parameters that are utilized for multiprocessing, which may be adjusted in accordance with the available resources. It should be noted that the limitations pertain, in particular, to the {any}`clipping module<eurocropsml.acquisition.clipper>`.
Each country config contains a number of parameters that are utilized for multiprocessing, which may be adjusted in accordance with the available resources. It should be noted that the limitations pertain, in particular, to the {any}`clipping module<eurocropsml.acquisition.clipping.clipper>`.
- `workers` (default 16, used in multiple modules): Maximum number of parallel workers used for multiprocessing. This is contingent upon the vailability of central processing units (CPUs). In the event that this exceeds the number of CPUs, the parallel workers will be set to the number of CPUs. However, in instances where the random-access memory (RAM) capacity is insufficient—such as when constructing the argument list prior to clipping—this value can be reduced within the configuration parameters. It is essential to note that this adjustment may result in a slowing of the process. Therefore, it is recommended to only reduce the number of parallel workers when absolutely necessary.

The following two parameters are exclusively used during the clipping process. In the event that the available RAM is insufficient, they can be lowered. It is important to note that this will impede the clipping process, and thus, we again only advise reducing them if absolutely necessary.
Expand All @@ -91,6 +91,11 @@ Additional settings can also be added, e.g.
$ eurocropsml-cli acquisition eurocrops <COMMAND> cfg.country_config.satellite="S1" +cfg.country_config.operational_mode="EW"
```

By default, thermal noise will not be removed for Sentinel-1. If wanting to do so, please use the following command:
```console
$ eurocropsml-cli acquisition eurocrops <COMMAND> cfg.country_config.satellite="S1" +cfg.country_config.denoise=True
```

Please see {any}`config<eurocropsml.acquisition.config>` for the Sentinel-1 defaults and possible other values.

## Customizing the dataset pipeline
Expand Down
3 changes: 2 additions & 1 deletion eurocropsml/acquisition/builder.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@
from pathlib import Path
from typing import cast

from eurocropsml.acquisition import clipper, collector, copier, region
from eurocropsml.acquisition import collector, copier, region
from eurocropsml.acquisition.clipping import clipper
from eurocropsml.acquisition.config import AcquisitionConfig
from eurocropsml.settings import Settings

Expand Down
1 change: 1 addition & 0 deletions eurocropsml/acquisition/clipping/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
"""Clipping Sentinel tiles for the EuroCropsML dataset."""
170 changes: 170 additions & 0 deletions eurocropsml/acquisition/clipping/calibration.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
"""Utilities for applying radiometric calibration to S-1 data."""

from pathlib import Path
from typing import Any, cast
from xml.etree import ElementTree

import numpy as np
import xarray as xr
import xmlschema


def _get_xml_file(filepath: Path, band: str, identifier: str = "calibration") -> Path:
files: list[Path] = list(filepath.iterdir())
return [file for file in files if f"{band.lower()}" in str(file) and identifier in str(file)][0]


def _parse_tag_as_list(
xml_path: Path,
query: str,
schema_dir: Path,
validation: str = "skip",
) -> list[dict[str, Any]]:
"""Function to parse xml tags into list.
Adjusted from xarray-sentinel (https://github.com/bopen/xarray-sentinel).
"""
schema = xmlschema.XMLSchema(schema_dir)
xml_tree = ElementTree.parse(xml_path)
tag: Any = schema.decode(xml_tree, query, validation=validation)
if tag is None:
tag = []
elif isinstance(tag, dict):
tag = [tag]
tag_list: list[dict[str, Any]] = tag
assert isinstance(tag_list, list), f"{type(tag_list)} is not list"
return tag_list


def _open_noise_dataset(safe_dir: Path, band: str) -> xr.Dataset:
"""Function to read noise from LUT.
This reads the NADS (Noise Annotation Data Set) file.
Adjusted from xarray-sentinel (https://github.com/bopen/xarray-sentinel).
"""
xml_dir: Path = safe_dir / "annotation" / "calibration"
schema_dir: Path = safe_dir / "support" / "s1-level-1-noise.xsd"
xml_calibration_file = _get_xml_file(xml_dir, band, identifier="noise")
noise_vectors = _parse_tag_as_list(xml_calibration_file, ".//noiseRangeVector", schema_dir)

pixel_list = []
line_list = []
noise_range_lut_list = []
for vector in noise_vectors:
line_list.append(vector["line"])
pixel = np.fromstring(vector["pixel"]["$"], dtype=int, sep=" ")
pixel_list.append(pixel)
noise_range_lut = np.fromstring(vector["noiseRangeLut"]["$"], dtype=np.float32, sep=" ")
noise_range_lut_list.append(noise_range_lut)

pixel = np.array(pixel_list)
if (pixel - pixel[0]).any():
raise ValueError("Unable to organize noise vectors in a regular line-pixel grid")
data_vars = {
"noiseRangeLut": (("line", "pixel"), noise_range_lut_list),
}
coords = {"line": line_list, "pixel": pixel_list[0]}

return xr.Dataset(data_vars=data_vars, coords=coords)


def _open_calibration_dataset(safe_dir: Path, band: str) -> xr.Dataset:
"""Function to read calibration from LUT.
This reads the CADS (Calibration Annotation Data Set) file.
Adjusted from xarray-sentinel (https://github.com/bopen/xarray-sentinel).
"""
xml_dir: Path = safe_dir / "annotation" / "calibration"
schema_dir: Path = safe_dir / "support" / "s1-level-1-calibration.xsd"
xml_calibration_file = _get_xml_file(xml_dir, band)
calibration_vectors = _parse_tag_as_list(
xml_calibration_file, ".//calibrationVector", schema_dir
)

pixel_list = []
line_list = []
sigmanought_list = []
for vector in calibration_vectors:
line_list.append(vector["line"])
pixel = np.fromstring(vector["pixel"]["$"], dtype=int, sep=" ")
pixel_list.append(pixel)
sigma_nought = np.fromstring(vector["sigmaNought"]["$"], dtype=np.float32, sep=" ")
sigmanought_list.append(sigma_nought)

pixel = np.array(pixel_list)
if (pixel - pixel[0]).any():
raise ValueError("Unable to organise calibration vectors in a regular line-pixel grid")
data_vars = {
"sigmaNought": (("line", "pixel"), sigmanought_list),
}
coords = {"line": line_list, "pixel": pixel_list[0]}

return xr.Dataset(data_vars=data_vars, coords=coords)


def _get_lut_value(
digital_number: xr.DataArray, available_lut: xr.DataArray, **kwargs: Any
) -> xr.DataArray:
lut_mean = available_lut.mean()
if np.allclose(lut_mean, available_lut, **kwargs):
lut: xr.DataArray = lut_mean.astype(np.float64)
else:
lut = available_lut.interp(
line=digital_number.line,
pixel=digital_number.pixel,
).astype(np.float64)
if digital_number.chunks is not None:
lut = lut.chunk(digital_number.chunksizes)

return lut


def _calibrate(
digital_number: xr.DataArray,
backscatter_calibration_lut: xr.DataArray,
thermal_noise_lut: xr.DataArray | None = None,
**kwargs: Any,
) -> xr.DataArray:
"""Return the calibrated sigma nought (backscatter coefficient) using the calibration LUT.
Apply thermal noise removal if wanted.
Adjusted from xarray-sentinel (https://github.com/bopen/xarray-sentinel).
digital_number: Digital numbers from the original raster tile to be calibrated.
backscatter_calibration_lut: Calibration LUT for backscatter (sigma nought).
thermal_noise_lut: Thermal noise LUT to remove sensor noise.
"""
radar_intensity = digital_number**2
backscatter_calibration = _get_lut_value(digital_number, backscatter_calibration_lut, **kwargs)
if thermal_noise_lut is not None:
thermal_noise = _get_lut_value(digital_number, thermal_noise_lut, **kwargs)
radar_intensity = radar_intensity - thermal_noise
sigma_nought: xr.DataArray = radar_intensity / backscatter_calibration**2
return abs(sigma_nought)


def _calibrate_digital_number_in_db(
digital_number: xr.DataArray,
backscatter_calibration_lut: xr.DataArray,
thermal_noise_lut: xr.DataArray | None = None,
min_db: float | None = -50.0,
) -> np.ndarray:
"""Return calibrated sigma nought (backscatter coefficient) in dB using the calibration LUT.
Adjusted from xarray-sentinel (https://github.com/bopen/xarray-sentinel).
digital_number: Digital numbers from the original raster tile to be calibrated.
backscatter_calibration_lut: calibration LUT (sigmaNought).
thermal_noise_lut: Thermal noise LUT.
min_db: minimal value in db, to avoid infinity values.
"""
sigma_nought = _calibrate(digital_number, backscatter_calibration_lut, thermal_noise_lut)
# convert to decibels (dB)
sigma_nought_db = 10.0 * np.log10(np.maximum(sigma_nought, 1e-10)) # prevent division by 0

if min_db is not None:
sigma_nought_db = cast(xr.DataArray, np.maximum(sigma_nought_db, min_db))

return cast(np.ndarray, sigma_nought_db.values)
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,8 @@
import typer
from tqdm import tqdm

from eurocropsml.acquisition.clipping.utils import _merge_clipper, mask_polygon_raster
from eurocropsml.acquisition.config import CollectorConfig
from eurocropsml.acquisition.utils import _merge_clipper, mask_polygon_raster

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -163,18 +163,23 @@ def _filter_args(

def _process_raster_parallel(
satellite: Literal["S1", "S2"],
denoise: bool,
polygon_df: pd.DataFrame,
parcel_id_name: str,
bands: list[str],
filtered_images: gpd.GeoDataFrame,
band_tiles: list[Path],
) -> pd.DataFrame:
"""Processing one raster file.
Args:
satellite: S1 for Sentinel-1 and S2 for Sentinel-2.
denoise: Whether to perform thermal noise removal for Sentinel-1.
For Sentinel-2 this argument has no effect.
polygon_df: Dataframe containing all parcel ids. Will be merged with the clipped values.
parcel_id_name: The country's parcel ID name (varies from country to country).
filtered_images: Dataframe containing all parcel ids that lie in this raster tile.
bands: (Sub-)set of Sentinel-1 (radar) or Sentinel-2 (spectral) bands.
band_tiles: Paths to the raster's band tiles.
Returns:
Expand All @@ -192,7 +197,7 @@ def _process_raster_parallel(
filtered_geom = polygon_df[polygon_df[parcel_id_name].isin(parcel_ids)]

result = mask_polygon_raster(
satellite, band_tiles, filtered_geom, parcel_id_name, product_date
satellite, band_tiles, bands, filtered_geom, parcel_id_name, product_date, denoise
)

if result is not None:
Expand Down Expand Up @@ -248,7 +253,12 @@ def clipping(

polygon_df[config.parcel_id_name] = polygon_df[config.parcel_id_name].astype(int)
func = partial(
_process_raster_parallel, config.satellite, polygon_df, cast(str, config.parcel_id_name)
_process_raster_parallel,
config.satellite,
config.denoise,
polygon_df,
cast(str, config.parcel_id_name),
config.bands,
)

polygon_df = polygon_df.drop(["geometry"], axis=1)
Expand Down
Loading

0 comments on commit 6f734a5

Please sign in to comment.