This project explores semantic segmentation techniques for road detection in aerial imagery. It evaluates several architectures, including U-Net, FPN, and SPIN with an FCN8 backbone, on a road segmentation dataset. The best configuration, SPIN + FCN8, reaches an F1 score of 87.9% and a pixel accuracy of 93.3%.
- Source: AIcrowd Road Segmentation Challenge.
- Details: The dataset consists of:
  - 100 training images with masks.
  - 50 test images without labels.
- Image Specifications:
  - Size: 400x400 pixels.
  - Channels: RGB.
To download the dataset, follow these steps:
- Navigate to the AIcrowd Road Segmentation Challenge.
- Log in with your credentials. If you do not have an account, create one using your institutional email.
- Download the dataset files:
  - train.zip (contains training images and masks).
  - test.zip (contains test images).
- Extract the downloaded files:
    unzip train.zip -d data/training
    unzip test.zip -d data/test_set_images
The dataset does not include a predefined validation set. To create one:
- Split the train directory into:
  - A training subset containing 90% of the images and masks.
  - A validation subset containing the remaining 10%.
- Manually move the selected files into the following structure:
data/
├── training/
│ ├── images/ # Training images
│ └── groundtruth/ # Corresponding segmentation masks
├── validation/
│ ├── images/ # Validation images
│ └── groundtruth/ # Corresponding segmentation masks
└── test/
└── test_set_images/ # Test images
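The manual split described above could also be scripted. The following is a minimal sketch, not part of the repository: the function name split_train_val is hypothetical, and it assumes that each mask in groundtruth/ has the same file name as its image in images/.

```python
import random
import shutil
from pathlib import Path


def split_train_val(data_dir, val_fraction=0.1, seed=0):
    """Move a random subset of image/mask pairs from training/ to validation/.

    Returns the sorted list of file names that were moved.
    """
    data_dir = Path(data_dir)
    train_images = data_dir / "training" / "images"
    train_masks = data_dir / "training" / "groundtruth"
    val_images = data_dir / "validation" / "images"
    val_masks = data_dir / "validation" / "groundtruth"
    val_images.mkdir(parents=True, exist_ok=True)
    val_masks.mkdir(parents=True, exist_ok=True)

    # Sort before shuffling so the split is reproducible for a given seed.
    names = sorted(p.name for p in train_images.iterdir() if p.is_file())
    random.Random(seed).shuffle(names)
    n_val = round(len(names) * val_fraction)
    val_names = sorted(names[:n_val])

    for name in val_names:
        # Assumption: the mask shares the file name of its image.
        shutil.move(str(train_images / name), str(val_images / name))
        shutil.move(str(train_masks / name), str(val_masks / name))
    return val_names
```

Calling split_train_val("data") on the 100 training images would move 10 image/mask pairs into data/validation. Fixing the seed keeps the split identical across runs, which makes validation scores comparable between models.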
- Normalization of pixel values.
- Data augmentation techniques such as flipping and rotation.
- Splitting the training set into 90% training and 10% validation subsets.
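As an illustration of the normalization and augmentation steps listed above, here is a minimal NumPy sketch. It is not the repository's implementation: the function names are hypothetical, images are assumed to be 8-bit arrays of shape (H, W, 3) with masks of shape (H, W), and rotations are restricted to multiples of 90 degrees as a simplifying assumption.

```python
import numpy as np


def normalize(image):
    """Scale 8-bit pixel values from [0, 255] to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0


def augment(image, mask, rng):
    """Apply the same random flips and 90-degree rotation to image and mask."""
    if rng.random() < 0.5:
        # Horizontal flip: reverse the width axis.
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:
        # Vertical flip: reverse the height axis.
        image, mask = image[::-1], mask[::-1]
    # Rotate by k * 90 degrees in the (height, width) plane, k in {0, 1, 2, 3}.
    k = int(rng.integers(0, 4))
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    # Flips and rotations return views; copy so downstream code gets plain arrays.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```

The important detail is that the image and its mask receive exactly the same geometric transform, so road pixels stay aligned with their labels. A generator such as np.random.default_rng(0) can be passed as rng for reproducible augmentation.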
- Custom U-Net: baseline model implemented from scratch, with an encoder-decoder structure and skip connections.
- FPN: Multiscale feature aggregation.
- DeepLabV3+: Atrous Spatial Pyramid Pooling (ASPP).
- Encoder: ResNet50, in some runs with pretrained weights.
- SPIN: combines spatial pyramids with fully convolutional layers, optimized for efficient and accurate segmentation.
- Python 3.8 or later.
- Required libraries: numpy, pytorch, torchvision, segmentation_models_pytorch.
- Clone the repository:
    git clone https://github.com/CS-433/ml-project-2-morocco
- Install dependencies:
    pip install -r requirements.txt
- Create a directory for storing the prediction results and model checkpoints (model.pth) by running the following command in your terminal:
    mkdir -p results/current/predictions
- For each model, you can adjust the hyperparameters and select the backbone in config.py. To reproduce the best performance from our tests with the SPIN model, copy the configuration from best_config.py into config.py.
- In src/SPIN/config.py, the variable MODEL defaults to SPINRoadMapperFCN8(). You can instead use SPINRoadMapper() with a specific backbone and weights, for example:
    MODEL = SPINRoadMapper(model_func=segmentation.deeplabv3_resnet101, weights=segmentation.DeepLabV3_ResNet101_Weights)
- To test a different model, change the model name in run.py (for SMP, also set model_name):
    model = 'SPIN'        # or 'UNET', 'SMP'
    model_name = 'FPN'    # or 'UNET', 'UNET_PRETRAINED'
- Run the training script:
    python run.py
Model | F1 score (%) | Pixel Accuracy (%) |
---|---|---|
Custom U-Net | 74.5 | 81.6 |
SMP U-Net (pretrained weights) | 78.3 | 87.8 |
SPIN + DeepLabV3+ | 87.2 | 93.0 |
SPIN + FCN8 | 87.9 | 93.3 |
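For reference, the two metrics in the table can be computed from binary masks as in the sketch below. This is an illustrative implementation, not the project's evaluation code: it assumes per-pixel binary masks in which 1 marks a road pixel, and the function names are hypothetical. The source does not state the exact evaluation protocol behind the reported numbers.

```python
import numpy as np


def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label matches the ground truth."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    return float((pred == target).mean())


def f1_score(pred, target):
    """F1 for the road (positive) class: 2*TP / (2*TP + FP + FN)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))    # road predicted, road in ground truth
    fp = int(np.sum(pred & ~target))   # road predicted, background in truth
    fn = int(np.sum(~pred & target))   # background predicted, road in truth
    denom = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when neither mask has any road pixel.
    return 2 * tp / denom if denom > 0 else 0.0
```

Pixel accuracy counts background pixels as well, so on images where roads cover a small share of the area it tends to be higher than F1, which is consistent with the gap between the two columns in the table.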
- Risk: misuse in surveillance and military applications.
- Mitigation: limited the scope to civilian road segmentation applications.
- Mitigation: ensured the dataset contains no personally identifiable information.
- Course: Machine Learning (CS-433), EPFL.
- Libraries: segmentation_models_pytorch, PyTorch.
- Dataset: AIcrowd Road Segmentation Challenge.
This project is licensed under the MIT License. See the LICENSE file for details.