This paper explores semi-supervised learning for environmental scene classification by generating pseudolabels for the previously unlabeled ESC-US dataset of 250,000 audio records. A MobileNetV3 convolutional neural network (CNN), pretrained on ImageNet and fine-tuned on the labeled ESC-50 dataset (2,000 records), pseudolabels ESC-US. Various VGG-like CNNs are then trained from scratch on ESC-50, supplemented either with augmented ESC-50 data (e.g., pitch shifting, time stretching, silence trimming) or with pseudolabeled ESC-US data filtered at different confidence thresholds. The results show that, while adding the 250,000 pseudolabeled ESC-US samples can in principle improve performance, carefully applied data augmentation on the much smaller ESC-50 dataset can yield better performance at lower computational cost.
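The confidence-threshold step described above can be illustrated with a minimal sketch. It is not taken from this repository: it assumes that "confidence" means the maximum softmax probability of the fine-tuned classifier, and the names `softmax`, `select_pseudolabels`, and `toy_logits` are hypothetical, introduced only for illustration.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def select_pseudolabels(logits: np.ndarray, threshold: float):
    """Keep only samples whose top class probability reaches the threshold.

    Assumption: confidence is the maximum softmax probability per sample.
    Returns the indices of the kept samples and their pseudolabels.
    """
    probs = softmax(logits)
    confidence = probs.max(axis=1)      # top probability per sample
    labels = probs.argmax(axis=1)       # predicted class per sample
    keep = np.flatnonzero(confidence >= threshold)
    return keep, labels[keep]


# Toy logits for three unlabeled clips over three classes (made-up values).
toy_logits = np.array([
    [4.0, 0.0, 0.0],   # confident prediction of class 0
    [0.2, 0.1, 0.0],   # near-uniform, low confidence
    [0.0, 0.0, 5.0],   # confident prediction of class 2
])
kept_idx, kept_labels = select_pseudolabels(toy_logits, threshold=0.9)
print(kept_idx.tolist(), kept_labels.tolist())
```

Raising the threshold trades quantity for label quality: fewer pseudolabeled clips are admitted to training, but those that remain are less likely to be mislabeled.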
teaden/ESC-Semi-Supervised
About
Semi-Supervised Learning Audio Classification Task Focused on Environmental Scene Classification