Environmental Scene Classification: Comparing Pseudolabeled Data and Traditionally Augmented Data

This paper explores semi-supervised learning as a way to improve environmental scene classification by generating pseudolabels for ESC-US, a previously unlabeled dataset of 250,000 audio recordings. A MobileNetV3 convolutional neural network (CNN), pretrained on ImageNet and fine-tuned on the labeled ESC-50 dataset (2,000 recordings), is used to pseudolabel ESC-US. Various VGG-like CNNs are then trained from scratch on ESC-50 combined with one of two additional data sources: augmented ESC-50 data (e.g., pitch shifting, time stretching, silence trimming) or pseudolabeled ESC-US data filtered at different confidence thresholds. The results show that, while incorporating the 250,000 pseudolabeled ESC-US samples can in principle enhance performance, carefully applied data augmentation on the much smaller ESC-50 dataset can yield better performance at lower computational cost.
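To illustrate what filtering pseudolabels at a confidence threshold can look like, here is a minimal sketch in plain Python. It is not taken from the paper's code: the function name `select_pseudolabels`, the toy probabilities, and the threshold values are all hypothetical, and it assumes the teacher model outputs one row of class probabilities per unlabeled clip.

```python
def select_pseudolabels(probs, threshold):
    """Keep clips whose most likely class has probability >= threshold.

    probs: list of per-clip class-probability lists (assumed teacher output).
    Returns a list of (clip_index, predicted_class) pairs.
    """
    selected = []
    for i, row in enumerate(probs):
        # Index of the highest-probability class for this clip.
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] >= threshold:
            selected.append((i, label))
    return selected


# Toy teacher outputs for four unlabeled clips over three classes (made up).
toy_probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
]

# Hypothetical thresholds: a stricter one keeps fewer pseudolabeled clips.
for threshold in (0.5, 0.85):
    print(threshold, select_pseudolabels(toy_probs, threshold))
```

A higher threshold trades quantity for label quality: fewer pseudolabeled clips are kept, but those that remain are the ones the teacher model was most confident about.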