Speech-Augmentation-and-Endpoint-Detection

This repository is developed in MATLAB. Speech augmentation is based on adaptive filtering, while endpoint detection is based on Voice Activity Detection (VAD).

Prerequisite

Please make sure that the Voicebox toolbox is installed in your MATLAB environment. It provides functions that this code relies on, such as enframe(). Installation guides for the toolbox are widely available online. Official website: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

Installation

1. git clone all files under the same directory.

Clone the repository, keeping all files in the same directory:

https://github.com/MorrisXu-Driving/Speech-Augmentation-and-Endpoint-Detection.git

Then run SpeechDetect.m in MATLAB.

2. File Introduction

SpeechDetect.m: The main program.
SpeechSegment.m: The endpoint detection algorithm based on VAD. It returns:
- zcr: the zero-crossing rate (ZCR)
- amp: the short-time energy (STE)
- voiceseg: a structure containing the start, end, and duration of each speech segment
- vsl: the total number of speech segments
- SF: an array in which speech frames are labeled 1
- NF: an array in which non-speech frames are labeled 1

SegmentInfo.m: A helper function that SpeechSegment.m depends on.
frame2time.m: Calculates the time corresponding to each frame after the signal has been enframed.
zc2.m: Calculates the ZCR of each frame.
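
To make the per-frame quantities above concrete, here is a minimal sketch in Python rather than MATLAB. It is not a port of zc2.m or frame2time.m: the function names are hypothetical, and the choice of the frame centre as the frame's time stamp is an assumption.

```python
def short_time_energy(frame):
    """Short-time energy (STE): the sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples in one frame.

    A sample equal to zero is treated as non-negative.
    """
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def frame_to_time(num_frames, wlen, inc, fs):
    """Time in seconds of each frame, taken at the frame centre (an assumption).

    wlen and inc are in samples; fs is the sampling rate in Hz.
    """
    return [(i * inc + wlen / 2) / fs for i in range(num_frames)]
```

Dividing the zero-crossing count by the frame length would give a rate per sample; the count is used here because every frame has the same length.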

3. Parameter Setting:

Parameters in SpeechDetect.m:
- wlen: window length used for enframing
- inc: frame increment used for enframing
- IS: duration, in seconds, of the non-speech/background signal at the start of the input audio

Parameters in SpeechSegment.m:
- maxsilence: maximum length of silence, in frames, tolerated inside one speech segment
- minspeech: minimum length, in frames, for a stretch of speech to be accepted as a speech segment
- r1: threshold coefficient for the lower bound of the STE gate
- r2: threshold coefficient for the upper bound of the STE gate
- r3: threshold coefficient for the ZCR gate

To understand the parameters, please go through the Algorithm Architecture section carefully.
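
As a rough illustration of how wlen, inc, and IS interact, the sketch below (Python rather than MATLAB, with hypothetical function names) splits a signal into frames and counts how many whole frames fit inside the first IS seconds. It assumes wlen and inc are given in samples. The frame-count formula is a common convention and is assumed here, not taken from SpeechDetect.m, and the framing function only loosely mirrors what enframe() does, without any windowing.

```python
def split_into_frames(signal, wlen, inc):
    """Split a signal into frames of wlen samples, advancing inc samples each time.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if len(signal) < wlen:
        return []
    num_frames = (len(signal) - wlen) // inc + 1
    return [signal[i * inc : i * inc + wlen] for i in range(num_frames)]


def leading_noise_frames(IS, fs, wlen, inc):
    """Number of whole frames that lie entirely within the first IS seconds."""
    n_samples = int(round(IS * fs))
    if n_samples < wlen:
        return 0
    return (n_samples - wlen) // inc + 1
```

For example, with fs = 8000 Hz, wlen = 200, inc = 80, and IS = 0.25 s, the first 2000 samples hold 23 whole frames, and those frames would be the ones used to estimate the background statistics.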

Algorithm Architecture

1. Speech Augmentation based on Adaptive Filtering

The technique used in this algorithm is the MMSE filter, also called the Wiener filter. It is an LTI system, shown below.

a. Optimization Objective

The filter's optimization objective is V₂ * W = V₁. Intuitively, the filter tries to learn the impulse response W of the path along which the noise propagates from the noise source (V₂) to the wanted signal source, where it appears as V₁.
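
One way to write this objective out explicitly is given below. The notation is assumed for illustration and does not come from the repository: d(n) = s(n) + v₁(n) is the noisy recording, s(n) the wanted speech, v₂(n) the reference noise, and w_k the M filter taps.

```latex
\begin{aligned}
y(n) &= \sum_{k=0}^{M-1} w_k \, v_2(n-k)
      && \text{(filter output, i.e. } V_2 * W \text{)} \\
e(n) &= d(n) - y(n) = s(n) + \bigl( v_1(n) - y(n) \bigr)
      && \text{(enhanced signal)} \\
J(\mathbf{w}) &= \mathbb{E}\!\left[ e^2(n) \right]
      = \mathbb{E}\!\left[ s^2(n) \right]
      + \mathbb{E}\!\left[ \bigl( v_1(n) - y(n) \bigr)^2 \right]
\end{aligned}
```

The last equality holds when the speech is zero-mean and uncorrelated with both noise signals. Under that assumption the speech term does not depend on the taps, so minimizing J pushes y(n) toward v₁(n), which is the V₂ * W = V₁ objective stated above.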

b. Optimization Method

After determining the optimization objective, we need to choose an optimization method. Since minimizing the MSE is a typical convex optimization problem, Gradient Descent or Stochastic Gradient Descent (SGD) is simple and efficient. The following equation shows the optimization approach in this project.
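
For reference, the textbook stochastic-gradient rule for an MSE objective of this kind is the LMS update, w ← w + 2μ·e(n)·x(n), where x(n) holds the most recent reference-noise samples and e(n) is the current error. The sketch below assumes that rule. It is written in Python rather than MATLAB, and the names lms_noise_cancel, num_taps, and mu are hypothetical, so it should be read as an illustration of the idea, not as the repository's code.

```python
def lms_noise_cancel(primary, reference, num_taps=8, mu=0.01):
    """Adaptive noise cancellation with the LMS (stochastic gradient) update.

    primary   : noisy recording, wanted signal plus noise
    reference : noise measured at its source
    Returns (enhanced, weights), where enhanced[n] = primary[n] - y[n]
    and y[n] is the filter's estimate of the noise in the recording.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # latest reference samples, newest first
    enhanced = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wk * xk for wk, xk in zip(w, buf))  # noise estimate
        e = d - y                                   # enhanced sample, also the error
        # One gradient step on the instantaneous squared error e^2.
        w = [wk + 2.0 * mu * e * xk for wk, xk in zip(w, buf)]
        enhanced.append(e)
    return enhanced, w
```

The step size mu trades convergence speed against steady-state error, and too large a value makes the update diverge. After convergence, the returned weights approximate the impulse response of the noise path.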

c. Denoising Result

Speech waveform before denoising (SNR: -6 dB)

Speech waveform after denoising (SNR: 16.2 dB)
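
How the SNR figures were measured is not stated. A common definition, assumed here, compares the power of a known clean signal with the power of whatever else remains in the signal under test. The Python sketch below uses a hypothetical function name and requires the clean reference to be available and time-aligned.

```python
import math


def snr_db(clean, test):
    """SNR in dB of `test` relative to the known, time-aligned `clean` signal.

    Everything in `test` that differs from `clean` is counted as noise.
    The two signals must not be identical, or the noise power is zero.
    """
    signal_power = sum(s * s for s in clean)
    noise_power = sum((x - s) ** 2 for s, x in zip(clean, test))
    return 10.0 * math.log10(signal_power / noise_power)
```

Calling it once on the noisy input and once on the denoised output gives a before/after pair comparable to the figures quoted above.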

2. Endpoint Detection based on VAD

a. Algorithm Flow

In this project, the algorithm classifies each frame as speech or non-speech using its Short-Time Energy (STE) and Zero-Crossing Rate (ZCR), judged against the STE and ZCR of the background signal at the start of the input audio, that is, the signal between 0 and IS seconds. The decision flowchart is shown below:
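
A decision process of this kind is often implemented as a dual-threshold state machine. The Python sketch below shows one such scheme. It is an assumption about how a detector like this typically works, not a transcription of SpeechSegment.m: the thresholds are taken as r1, r2, and r3 times the background averages, the default values are hypothetical, and the function name is invented for illustration.

```python
def detect_segments(amp, zcr, nis, r1=2.0, r2=4.0, r3=1.5,
                    maxsilence=8, minspeech=5):
    """Dual-threshold endpoint detection on per-frame STE (amp) and ZCR.

    The first `nis` frames are assumed to contain background only and are
    used to set the thresholds. Returns a list of (start, end) frame index
    pairs, with `end` inclusive.
    """
    if nis < 1:
        raise ValueError("nis must be at least 1")
    amp_bg = sum(amp[:nis]) / nis
    zcr_bg = sum(zcr[:nis]) / nis
    t_low, t_high, t_zcr = r1 * amp_bg, r2 * amp_bg, r3 * zcr_bg

    def keep(start, last, confirmed):
        # A candidate counts as speech only if it crossed the upper STE gate
        # at least once and is long enough.
        return confirmed and last - start + 1 >= minspeech

    segments = []
    start = last_active = None
    confirmed = False
    for i, (a, z) in enumerate(zip(amp, zcr)):
        if a > t_low or z > t_zcr:          # frame passes the lower STE or ZCR gate
            if start is None:
                start = i
            last_active = i
            confirmed = confirmed or a > t_high
        elif start is not None and i - last_active > maxsilence:
            # Silence has lasted too long: close the current candidate.
            if keep(start, last_active, confirmed):
                segments.append((start, last_active))
            start, confirmed = None, False
    if start is not None and keep(start, last_active, confirmed):
        segments.append((start, last_active))
    return segments
```

In this sketch, pauses of up to maxsilence frames are bridged into the surrounding segment, while bursts shorter than minspeech frames are discarded, which matches the roles the two parameters are given in the Parameter Setting section.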

b. Detection Result

The result on the clean speech signal:

The result on the denoised signal:

The endpoints marked in yellow are wrong, which suggests that more features need to be considered when dealing with high-ZCR signals.

