Divyanshi-Bhojak/Classification-of-Urban-Sounds

Classification-of-Urban-Sounds

This project classifies audio signals using a convolutional neural network (CNN) and VGG-16, trained on the UrbanSound8K dataset. The model assigns an input audio file to one of ten classes: Air Conditioner, Car Horn, Children Playing, Dog Bark, Drilling, Engine Idling, Gun Shot, Jackhammer, Siren, and Street Music. Feature extraction is the most important step in audio processing; here, Mel Frequency Cepstral Coefficients (MFCCs) serve as the feature space for the sound samples.

Dataset

The UrbanSound8K dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music. The classes are drawn from the urban sound taxonomy. All excerpts are taken from field recordings uploaded to www.freesound.org. The audio files are in WAV format; the sampling rate, bit depth, and number of channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file). The dataset used for model training can be downloaded from the following link: https://urbansounddataset.weebly.com/
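Because the excerpts vary in length (up to 4 s) and in sampling rate, a fixed-size model input usually needs some normalization before feature extraction. The sketch below is an assumption, not the repository's confirmed preprocessing: the helper `pad_or_trim` and the 22050 Hz target rate are hypothetical choices. Resampling itself could be handled when loading, for example with `librosa.load(path, sr=22050)`.

```python
import numpy as np

TARGET_SR = 22050     # assumed target sampling rate (librosa's default)
CLIP_SECONDS = 4.0    # UrbanSound8K excerpts are at most 4 s long


def pad_or_trim(y: np.ndarray, sr: int = TARGET_SR,
                seconds: float = CLIP_SECONDS) -> np.ndarray:
    """Return a mono signal of exactly `seconds` length (hypothetical helper).

    Shorter clips are zero-padded at the end; longer clips are truncated.
    """
    target_len = int(round(sr * seconds))
    if len(y) >= target_len:
        return y[:target_len]
    return np.pad(y, (0, target_len - len(y)))


# A synthetic 1 s clip stands in for a real recording.
short_clip = np.ones(TARGET_SR)
fixed = pad_or_trim(short_clip)
print(fixed.shape)
```

Zero-padding at the end is only one option; centering the clip or repeating it are equally plausible choices.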

Features Extracted

Librosa was used for data preprocessing and feature extraction.

MEL Features

MFCC

In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum").

Results

Test accuracy for CNN Model using original data: 47.0339%
Test accuracy for CNN Optimised Model using original data: 93.1598%
Test accuracy for CNN-V 16 Model using augmented data: 92.9177%

Future Work

Extend the dataset further by using different parameters for augmentation.
Apply hyperparameter optimization and test different architectures.
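One way to explore different augmentation parameters is to sweep over them and generate one variant per setting. The sketch below is hypothetical: the source does not say which augmentations were used, so additive noise and circular time shifting (and the helpers `add_noise` and `time_shift`) are assumptions chosen for simplicity.

```python
import numpy as np


def add_noise(y, snr_db, rng):
    """Add Gaussian noise at the requested signal-to-noise ratio (in dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise


def time_shift(y, shift_samples):
    """Circularly shift the signal; samples pushed off the end wrap around."""
    return np.roll(y, shift_samples)


rng = np.random.default_rng(0)
sr = 22050  # assumed sampling rate
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)  # synthetic stand-in for a real clip

# Sweep over a few assumed parameter values to produce augmented variants.
augmented = [add_noise(y, snr_db, rng) for snr_db in (20, 10, 5)]
augmented += [time_shift(y, s) for s in (sr // 10, sr // 4)]
print(len(augmented))
```

Librosa also offers `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`, which could be swept over in the same way.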