Fields of The World (FTW) is a large-scale benchmark dataset designed to advance machine learning models for instance segmentation of agricultural field boundaries. The dataset addresses the need for accurate and scalable field boundary data, which is essential for global agricultural monitoring, land use assessments, and environmental studies.
This repository provides the codebase for working with the FTW dataset, including tools for data pre-processing, model training, and evaluation.
- System setup
- Dataset setup
- Dataset visualization
- Inference
- Experimentation
- Notes
- Upcoming features
- Contributing
- License
You need to install Python 3.9 or later and GDAL with libgdal-arrow-parquet.
The easiest way to install the FTW CLI is to run:
pip install ftw-tools
The alternative is to install the required software through Anaconda/Mamba. Set up the environment using the provided env.yml file:
conda env create -f env.yml
conda activate ftw
Or, with Mamba:
mamba env create -f env.yml
mamba activate ftw
Verify that PyTorch and CUDA are installed correctly (if using a GPU):
python -c "import torch; print(torch.cuda.is_available())"
To set up the FTW CLI from a local clone of this repository, run the command below. It creates the ftw command-line tool, which is used to download and unpack the data.
pip install -e .
Or, for development purposes:
pip install -e ".[dev]"
Usage: ftw [OPTIONS] COMMAND [ARGS]...
Fields of The World (FTW) - Command Line Interface
Options:
--help Show this message and exit.
Commands:
data Downloading, unpacking, and preparing the FTW dataset.
inference Running inference on satellite images plus data prep.
model Training and testing FTW models.
Download and unpack the dataset using the FTW CLI. After unpacking, this creates an ftw folder inside the given output folder.
ftw data download --help
Usage: ftw data download [OPTIONS]
Download and unpack the FTW dataset.
Options:
-o, --out TEXT Folder where the files will be downloaded to. Defaults
to './data'.
-f, --clean_download If set, the script will delete the root folder before
downloading.
--countries TEXT Comma-separated list of countries to download. If
'all' (default) is passed, downloads all available
countries.
--no-unpack If set, the script will NOT unpack the downloaded
files.
--help Show this message and exit.
If you used --no-unpack during download, you can unpack the downloaded files manually with the unpack command. This also creates an ftw folder inside the given folder.
ftw data unpack --help
Usage: ftw data unpack [OPTIONS] [INPUT]
Unpack the downloaded FTW dataset. Specify the folder where the data is
located via INPUT. Defaults to './data'.
Options:
--help Show this message and exit.
To download and unpack the complete dataset, use the following command:
ftw data download
To download and unpack a specific set of countries, use the following command:
ftw data download --countries belgium,kenya,vietnam
Note: Do not put spaces between the comma-separated country names.
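When the country list comes from a script, stray spaces are easy to introduce. The sketch below normalizes a list of names into the space-free, comma-separated value that --countries expects and builds the matching ftw data download argument list for Python's subprocess module. The helper names are hypothetical (not part of ftw), and lowercasing the names is an assumption based on the example above.

```python
import shlex
import subprocess


def countries_arg(names):
    """Join country names into one comma-separated value without spaces.

    Hypothetical helper: strips surrounding whitespace, lowercases each name
    (an assumption based on the README example) and rejects names that still
    contain whitespace, because the CLI expects no spaces in the list.
    """
    cleaned = [name.strip().lower() for name in names if name.strip()]
    if not cleaned:
        raise ValueError("at least one country name is required")
    for name in cleaned:
        if any(ch.isspace() for ch in name):
            raise ValueError(f"country name contains whitespace: {name!r}")
    return ",".join(cleaned)


def download_command(names, out="./data"):
    """Build the argv list for `ftw data download` for the given countries."""
    return ["ftw", "data", "download", "-o", out, "--countries", countries_arg(names)]


def download(names, out="./data"):
    """Run the download; requires the ftw CLI on PATH and network access."""
    subprocess.run(download_command(names, out), check=True)


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(shlex.join(download_command(["Belgium", " kenya ", "vietnam"])))
```

Passing an argument list (rather than one shell string) also avoids any shell splitting of the country value.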
Explore visualize_dataset.ipynb to learn more about the dataset.
We provide ftw inference CLI commands that let users run models pre-trained on FTW on any temporal pair of Sentinel-2 (S2) images.
ftw inference --help
Usage: ftw inference [OPTIONS] COMMAND [ARGS]...
Inference-related commands.
Options:
--help Show this message and exit.
Commands:
download Download 2 Sentinel-2 scenes & stack them in a single file...
polygonize Polygonize the output from inference
run Run inference on the stacked satellite images
First, you need a trained model: either download a pre-trained model (we provide an example pre-trained model in the Releases list) or train your own model as explained in the Training section.
Second, you need to concatenate the bands of two aligned Sentinel-2 scenes that show your area of interest in two seasons (e.g. planting and harvesting seasons) in the following order: B04_t1, B03_t1, B02_t1, B08_t1, B04_t2, B03_t2, B02_t2, B08_t2 (t1 and t2 represent two different points in time). The ftw inference download command does this automatically given two STAC items. The Microsoft Planetary Computer Explorer is a convenient tool for finding relevant scenes and their corresponding STAC items.
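The required band order can be written down explicitly. The sketch below builds the eight band labels in the order listed above and checks a list of labels against it, which can help if you build stacks yourself instead of using ftw inference download. The label format (e.g. B04_t1) follows this README; whether ftw stores such labels in its output GeoTIFF is not stated here, so check_order is a hypothetical helper for your own band lists.

```python
# Sentinel-2 bands per time step, in the order listed in this README:
# red (B04), green (B03), blue (B02), near-infrared (B08).
BANDS_PER_STEP = ["B04", "B03", "B02", "B08"]


def stack_order(steps=("t1", "t2")):
    """Return band labels for all time steps, window A (t1) first."""
    return [f"{band}_{step}" for step in steps for band in BANDS_PER_STEP]


def check_order(labels):
    """Raise ValueError if the (hypothetical) band labels are out of order."""
    expected = stack_order()
    if list(labels) != expected:
        raise ValueError(f"expected bands {expected}, got {list(labels)}")


if __name__ == "__main__":
    order = stack_order()
    print(order)
    print("0-based index of B08_t2:", order.index("B08_t2"))
```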
To select the timeframe for the two images (Window A and Window B), we looked up the approximate planting and harvesting times in the USDA crop calendar. For example, if you open the crop calendar and select China, you will find that most crops are planted from February to May and harvested from August to November. We then used these dates as filtering parameters in the Planetary Computer Explorer, set the cloud threshold to 10% or less, and selected a clear observation that covers the full tile.
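The selection procedure above (restrict to a season window, keep scenes at or below 10% cloud cover, pick a clear one) can be sketched over STAC item metadata. The records in the usage example are hypothetical, hand-written dictionaries shaped like STAC items (an id plus properties with datetime and the EO extension's eo:cloud_cover); in practice they would come from a STAC search such as the one behind the Planetary Computer Explorer. Picking the least cloudy scene is only a stand-in for visually confirming that the scene is clear and covers the full tile.

```python
from datetime import date, datetime


def pick_scene(items, start, end, max_cloud=10.0):
    """Return the id of the least cloudy item acquired within [start, end].

    `items` are dicts shaped like STAC items: an "id" plus "properties"
    with an ISO-8601 "datetime" and an "eo:cloud_cover" percentage.
    """
    candidates = []
    for item in items:
        props = item["properties"]
        # datetime.fromisoformat() in older Pythons rejects a trailing "Z"
        stamp = props["datetime"].replace("Z", "+00:00")
        acquired = datetime.fromisoformat(stamp).date()
        cloud = props["eo:cloud_cover"]
        if start <= acquired <= end and cloud <= max_cloud:
            candidates.append((cloud, acquired, item["id"]))
    if not candidates:
        raise LookupError("no scene matches the window and cloud threshold")
    return min(candidates)[2]  # lowest cloud cover, then earliest date


if __name__ == "__main__":
    # Hypothetical metadata for four scenes of one tile.
    items = [
        {"id": "scene-1", "properties": {"datetime": "2021-05-20T10:00:00Z", "eo:cloud_cover": 35.0}},
        {"id": "scene-2", "properties": {"datetime": "2021-06-17T10:05:59Z", "eo:cloud_cover": 4.2}},
        {"id": "scene-4", "properties": {"datetime": "2021-09-25T10:10:19Z", "eo:cloud_cover": 1.0}},
    ]
    window_a = pick_scene(items, date(2021, 5, 1), date(2021, 6, 30))
    window_b = pick_scene(items, date(2021, 8, 1), date(2021, 11, 30))
    print(window_a, window_b)
```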
ftw inference download --help
Usage: ftw inference download [OPTIONS]
Download 2 Sentinel-2 scenes & stack them in a single file for inference.
Options:
--win_a TEXT URL to or Microsoft Planetary Computer ID of an Sentinel-2
L2A STAC item for the window A image [required]
--win_b TEXT URL to or Microsoft Planetary Computer ID of an Sentinel-2
L2A STAC item for the window B image [required]
-o, --out TEXT Filename to save results to [required]
-f, --overwrite Overwrites the outputs if they exist
--bbox TEXT Bounding box to use for the download in the format
'minx,miny,maxx,maxy'
--help Show this message and exit.
Then ftw inference run runs a given model on overlapping patches of the input imagery (i.e. the output of ftw inference download) and stitches the results together into a GeoTIFF.
ftw inference run --help
Usage: ftw inference run [OPTIONS] INPUT
Run inference on the stacked Sentinel-2 L2A satellite images specified via
INPUT.
Options:
-m, --model PATH Path to the model checkpoint. [required]
-o, --out TEXT Output filename. [required]
--resize_factor INTEGER Resize factor to use for inference. [default: 2]
--gpu INTEGER GPU ID to use. If not provided, CPU will be used by
default.
--patch_size INTEGER Size of patch to use for inference. Defaults to
1024 unless the image is < 1024x1024px.
--batch_size INTEGER Batch size. [default: 2]
--padding INTEGER Pixels to discard from each side of the patch.
[default: 64]
-f, --overwrite Overwrite outputs if they exist.
--mps_mode Run inference in MPS mode (Apple GPUs).
--help Show this message and exit.
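To illustrate how overlapping patches and --padding fit together, the sketch below plans patches along one image axis: consecutive patches advance by patch_size - 2 * padding, and only the central part of each prediction is kept (except at the image borders), since pixels near a patch border are predicted with less surrounding context. This is a simplified illustration of the general overlap-and-discard technique, not ftw's actual code; how ftw places the last patch and treats the image borders is an assumption. The defaults mirror the help text above (1024-pixel patches, 64 pixels of padding).

```python
def tile_axis(length, patch_size=1024, padding=64):
    """Plan overlapping patches along one image axis.

    Returns (start, keep_lo, keep_hi) tuples: each patch covers
    [start, start + patch_size), and only [keep_lo, keep_hi) of its
    prediction is written to the stitched output.
    """
    if length <= patch_size:
        return [(0, 0, length)]  # a single patch covers the whole axis
    stride = patch_size - 2 * padding
    if stride <= 0:
        raise ValueError("padding must be smaller than patch_size / 2")
    starts = list(range(0, length - patch_size + 1, stride))
    if starts[-1] + patch_size < length:
        starts.append(length - patch_size)  # final patch flush with the edge
    tiles = []
    for start in starts:
        end = start + patch_size
        # Discard `padding` pixels on inner edges only; keep image borders.
        keep_lo = start if start == 0 else start + padding
        keep_hi = end if end == length else end - padding
        tiles.append((start, keep_lo, keep_hi))
    return tiles


if __name__ == "__main__":
    for tile in tile_axis(2000):
        print(tile)
```

Applying tile_axis to the rows and to the columns separately and combining every pair gives the 2D patch grid; kept regions of neighboring patches meet exactly, and the final edge-aligned patch may overlap its neighbor.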
You can then use the ftw inference polygonize command to convert the output of the inference into a vector format (GeoParquet/fiboa by default, with GeoPackage, FlatGeobuf and GeoJSON as other options).
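The output naming rules from the polygonize help text can be expressed as a small lookup. The sketch below derives the default output name (the input name with a .parquet extension) and the format implied by an extension; it mirrors the documented options and is not ftw's own code, and the helper name is hypothetical.

```python
from pathlib import Path

# Extensions accepted by `ftw inference polygonize --out`, per its help text.
FORMATS = {
    ".parquet": "GeoParquet (fiboa-compliant)",
    ".fgb": "FlatGeobuf",
    ".gpkg": "GeoPackage",
    ".geojson": "GeoJSON",
    ".json": "GeoJSON",
}


def polygonize_output(input_path, out=None):
    """Return (output path, format name) following the documented defaults."""
    path = Path(out) if out else Path(input_path).with_suffix(".parquet")
    fmt = FORMATS.get(path.suffix)
    if fmt is None:
        raise ValueError(f"unsupported extension {path.suffix!r}; use one of {sorted(FORMATS)}")
    return path, fmt


if __name__ == "__main__":
    print(polygonize_output("austria_example_output_full.tif"))
    print(polygonize_output("austria_example_output_full.tif", "fields.gpkg"))
```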
ftw inference polygonize --help
Usage: ftw inference polygonize [OPTIONS] INPUT
Polygonize the output from inference for the raster image given via INPUT.
Results are in the CRS of the given raster image.
Options:
-o, --out TEXT Output filename for the polygonized data. If not given
defaults to the name of the input file with parquet
extension. Available file extensions: .parquet
(GeoParquet, fiboa-compliant), .fgb (FlatGeoBuf), .gpkg
(GeoPackage), .geojson and .json (GeoJSON)
--simplify FLOAT Simplification factor to use when polygonizing in the
unit of the CRS, e.g. meters for Sentinel-2 imagery in
UTM. Set to 0 to disable simplification. [default: 15]
--min_size FLOAT Minimum area size in square meters to include in the
output. [default: 500]
-f, --overwrite Overwrite outputs if they exist.
--close_interiors Remove the interiors holes in the polygons.
--help Show this message and exit.
The simplification factor is measured in the units of the coordinate reference system (CRS); for Sentinel-2 imagery in UTM this is meters, so a simplification factor of 15 or 20 is usually sufficient. Simplification is recommended: without it, the polygons follow every pixel edge and the vector file can be as large as the raster file.
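To make the units concrete: with Sentinel-2 imagery in UTM, coordinates are in meters, so --simplify 20 allows vertices within about 20 m (two 10 m pixels) of the simplified outline to be removed, and the default --min_size 500 drops polygons smaller than 500 m², i.e. five 10 m x 10 m pixels. The sketch below illustrates both ideas with the classic Douglas-Peucker line simplification and the shoelace area formula in plain Python. It is a conceptual illustration only; that ftw simplifies exactly this way is an assumption, and libraries such as Shapely provide production implementations. The helper names are hypothetical.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, tolerance):
    """Simplify a polyline; `tolerance` is in the same units as the coordinates."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [a, b]  # every intermediate vertex is within tolerance
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


def ring_area(ring):
    """Area of a closed ring (list of (x, y) vertices) via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def drop_small(rings, min_size=500.0):
    """Keep rings whose area is at least `min_size` (square CRS units)."""
    return [ring for ring in rings if ring_area(ring) >= min_size]


if __name__ == "__main__":
    # A pixel-edge "staircase" with 10 m steps, in meters.
    staircase = [(0, 0), (10, 0), (10, 10), (20, 10), (20, 20), (30, 20)]
    print(douglas_peucker(staircase, 15))
```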
The following commands show these four steps for a pair of Sentinel-2 scenes over Austria:
- Download a pretrained checkpoint from the v1 release.
  - 3 Class
    wget https://github.com/fieldsoftheworld/ftw-baselines/releases/download/v1/3_Class_FULL_FTW_Pretrained.ckpt
  - 2 Class
    wget https://github.com/fieldsoftheworld/ftw-baselines/releases/download/v1/2_Class_FULL_FTW_Pretrained.ckpt
- Download the S2 image scenes.
  ftw inference download --win_a S2B_MSIL2A_20210617T100559_R022_T33UUP_20210624T063729 --win_b S2B_MSIL2A_20210925T101019_R022_T33UUP_20210926T121923 --out inference_imagery/austria_example.tif
  You can also specify a bbox to download a smaller subset of the data, e.g. add:
  --bbox 13.0,48.0,13.3,48.3
- Run inference on the entire scene.
  ftw inference run inference_imagery/austria_example.tif --model 3_Class_FULL_FTW_Pretrained.ckpt --out austria_example_output_full.tif --gpu 0 --overwrite
- Polygonize the output.
  ftw inference polygonize austria_example_output_full.tif --simplify 20
  This results in a fiboa-compliant file named austria_example_output_full.parquet.
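If you want to script the walkthrough, the sketch below builds the same three ftw inference commands (steps 2 to 4) as argument lists for Python's subprocess module, optionally adding the --bbox subset. The helper names are hypothetical, and treating the bbox as longitude/latitude degrees is an assumption inferred from the Austria example. The bbox_size_km helper gives a rough size estimate: the example bbox is about 22 km by 33 km.

```python
import math
import shlex
import subprocess


def format_bbox(minx, miny, maxx, maxy):
    """Format a --bbox value as 'minx,miny,maxx,maxy'.

    Assumes longitude/latitude degrees, as in the Austria example.
    """
    if not (minx < maxx and miny < maxy):
        raise ValueError("expected minx < maxx and miny < maxy")
    if not (-180 <= minx and maxx <= 180 and -90 <= miny and maxy <= 90):
        raise ValueError("coordinates are outside the lon/lat range")
    return ",".join(str(float(v)) for v in (minx, miny, maxx, maxy))


def bbox_size_km(minx, miny, maxx, maxy):
    """Approximate bbox width and height in km (spherical Earth, ~111.32 km/degree)."""
    km_per_degree = 111.32
    mid_lat = math.radians((miny + maxy) / 2)
    width = (maxx - minx) * km_per_degree * math.cos(mid_lat)
    height = (maxy - miny) * km_per_degree
    return width, height


def walkthrough_commands(win_a, win_b, model,
                         image="inference_imagery/austria_example.tif",
                         output="austria_example_output_full.tif",
                         bbox=None, gpu=None):
    """Build argv lists for the download, run and polygonize steps."""
    download = ["ftw", "inference", "download", "--win_a", win_a,
                "--win_b", win_b, "--out", image]
    if bbox is not None:
        download += ["--bbox", format_bbox(*bbox)]
    run = ["ftw", "inference", "run", image, "--model", model,
           "--out", output, "--overwrite"]
    if gpu is not None:
        run += ["--gpu", str(gpu)]
    polygonize = ["ftw", "inference", "polygonize", output, "--simplify", "20"]
    return [download, run, polygonize]


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    commands = walkthrough_commands(
        "S2B_MSIL2A_20210617T100559_R022_T33UUP_20210624T063729",
        "S2B_MSIL2A_20210925T101019_R022_T33UUP_20210926T121923",
        "3_Class_FULL_FTW_Pretrained.ckpt",
        bbox=(13.0, 48.0, 13.3, 48.3),
        gpu=0,
    )
    for cmd in commands:
        print(shlex.join(cmd))  # dry run; call run_all(commands) to execute
```

The checkpoint download (step 1) is left to wget as shown above.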
For commercial purposes, consider using the CC-BY FTW trained checkpoints from the release files; for non-commercial and academic purposes, you can use the FULL FTW trained checkpoints (see the images below for a performance comparison).
We have also made FTW model checkpoints available that are pretrained only on CC-BY (or equivalently openly licensed) datasets. You can download these checkpoints using the following commands:
- 3 Class
  wget https://github.com/fieldsoftheworld/ftw-baselines/releases/download/v1/3_Class_CCBY_FTW_Pretrained.ckpt
- 2 Class
  wget https://github.com/fieldsoftheworld/ftw-baselines/releases/download/v1/2_Class_CCBY_FTW_Pretrained.ckpt
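The four v1 release files follow one naming pattern, so picking a checkpoint can be scripted. The sketch below encodes the guidance above (CC-BY checkpoints for commercial use, FULL checkpoints otherwise) and downloads with Python's standard urllib; the helper names are hypothetical, and non-commercial users are of course free to choose the CC-BY variant as well.

```python
import urllib.request
from pathlib import Path

RELEASE = "https://github.com/fieldsoftheworld/ftw-baselines/releases/download/v1"


def checkpoint_url(num_classes=3, commercial=False):
    """URL of a v1 FTW checkpoint, following the release file names above."""
    if num_classes not in (2, 3):
        raise ValueError("v1 provides 2-class and 3-class checkpoints only")
    variant = "CCBY" if commercial else "FULL"
    return f"{RELEASE}/{num_classes}_Class_{variant}_FTW_Pretrained.ckpt"


def download_checkpoint(num_classes=3, commercial=False, dest="."):
    """Download the chosen checkpoint into `dest` (needs network access)."""
    url = checkpoint_url(num_classes, commercial)
    target = Path(dest) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


if __name__ == "__main__":
    print(checkpoint_url(3, commercial=True))
```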
For details on the experimentation process, see the Experimentation section.
If you see any warnings in this format:
/home/byteboogie/miniforge3/envs/ftw/lib/python3.12/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
These warnings come from dependencies (here, kornia) that still call a PyTorch API that has since been deprecated. You can safely ignore them: ftw works as expected for experimentation and dataset exploration.
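If the warning clutters your output, you can silence just this message with Python's standard warnings module. The sketch below installs a filter whose pattern is copied from the warning above; it must run before the libraries that emit the warning are imported (e.g. at the top of a script or notebook), and other warnings, including other FutureWarnings, are still shown. The function name is hypothetical.

```python
import warnings


def silence_amp_deprecation():
    """Ignore only the torch.cuda.amp.custom_fwd deprecation FutureWarning."""
    warnings.filterwarnings(
        "ignore",
        # Regex matched against the start of the warning message.
        message=r"`torch\.cuda\.amp\.custom_fwd\(args\.\.\.\)` is deprecated",
        category=FutureWarning,
    )


silence_amp_deprecation()
# Import torchgeo / kornia-based code only after the filter is installed.
```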
We have made the dataset compatible with TorchGeo for ease of use, and TorchGeo release 0.7 will include both the dataset and pre-trained models for streamlined integration. To get started, install the development version of TorchGeo and load the Fields of the World dataset with the following code:
pip install git+https://github.com/Microsoft/torchgeo.git # to get version 0.7 dev
from torchgeo.datasets import FieldsOfTheWorld
# Download (if needed) and load the Austria training split into dataset/
ds = FieldsOfTheWorld("dataset/", countries="austria", split="train", download=True)
We welcome contributions! Please fork the repository, make your changes, and submit a pull request. For any issues, feel free to open an issue ticket.
This codebase is released under the MIT License. See the LICENSE file for details.