DeepBhashan

DeepBhashan - Personalised Speech Recognition Using Deep Neural Networks

About :

The speech recognition problem can be solved using traditional approaches such as HMMs. With increased computational power and bigger datasets, however, DNNs can achieve better accuracy than HMMs. Broad structure:

Speech input
Feature extraction using a pre-trained base model
Fine-tuning the encoder network (LSTMs/RNNs)
Fine-tuning an English LM for Hindi speech
Repeating the tasks for multiple users

Pre-processing data :

We have taken a Hindi audiobook from LibriVox, which has 33 chapters of about 20-25 minutes each, along with their MP3s and text files. Each Hindi text file has been converted to English-alphabet (romanized) text using the Google Translate API via the script csv_mapping_EnglishToHindi.py. Each sentence has been separated individually, and special characters and Unicode characters have been removed from the text. The Aeneas library has been used to preprocess the text files and generate a JSON sync map for each pair of text and audio files.
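The sentence-splitting and cleaning step can be sketched with the standard library alone. This is a minimal sketch, not the project's script: the sentence-boundary rule (split on whitespace after '.', '?' or '!'), the choice to keep only lowercase letters, apostrophes and spaces, and the helper names split_sentences and clean_sentence are all assumptions.

```python
import re
import unicodedata

# Assumed boundary rule: a sentence ends at '.', '?' or '!' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.?!])\s+")


def split_sentences(text: str) -> list[str]:
    """Split a chapter's text into one sentence per list entry (hypothetical helper)."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def clean_sentence(sentence: str) -> str:
    """Lower-case a sentence and strip non-ASCII and special characters (hypothetical helper)."""
    # NFKD splits accented letters (e.g. 'ī' -> 'i' + combining mark), so the
    # ASCII encode below keeps the base letter and drops only the mark.
    ascii_text = unicodedata.normalize("NFKD", sentence).encode("ascii", "ignore").decode("ascii")
    # Assumption: the output alphabet is a-z, apostrophe and space.
    letters_only = re.sub(r"[^a-z' ]+", " ", ascii_text.lower())
    # Collapse the runs of spaces left behind by removed characters.
    return re.sub(r"\s+", " ", letters_only).strip()
```

For example, clean_sentence("Bas keval ek hee, aap aaj...") returns "bas keval ek hee aap aaj". Splitting before cleaning matters, because cleaning removes the punctuation that marks sentence ends.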

Preparing the dataset :

The chapter-wise text files are in the folder dataset -> text_files. To prepare the JSON files, run the script Preparing_jsons.py. After creating the JSON files, fine-tune them manually so that the text is properly matched with the audio. Once the JSONs are done, run the script Preparing_csvs.py to prepare a CSV for each chapter. Merge all the CSVs into a final one, and then use Splitting_data.py to separate it into training, testing and validation data with a split ratio of 70:20:10. To separate the wav files listed in each CSV, use Splitting_wav_files.py.
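A minimal sketch of the JSON-to-CSV and splitting steps follows. It is not the code of Preparing_csvs.py or Splitting_data.py. It assumes Aeneas's JSON sync map layout (a "fragments" list whose entries carry "id", "begin", "end" and "lines"), and the wav naming scheme and the helpers syncmap_to_rows and split_rows are hypothetical. DeepSpeech's training CSVs use the columns wav_filename, wav_filesize and transcript; the begin/end times kept here are what a clip-cutting step would need before the file sizes are known.

```python
import json
import random


def syncmap_to_rows(syncmap_json: str, chapter: str) -> list[dict]:
    """Turn an Aeneas JSON sync map into one row per aligned sentence (hypothetical helper)."""
    rows = []
    for frag in json.loads(syncmap_json)["fragments"]:
        transcript = " ".join(frag["lines"]).strip()
        if not transcript:  # skip fragments with no text, e.g. trailing silence
            continue
        rows.append({
            # Hypothetical naming scheme for the per-sentence wav clip.
            "wav_filename": f"{chapter}_{frag['id']}.wav",
            "begin": float(frag["begin"]),  # Aeneas stores times as strings
            "end": float(frag["end"]),
            "transcript": transcript,
        })
    return rows


def split_rows(rows, train=0.7, test=0.2, seed=0):
    """Shuffle rows and split them 70:20:10 into train/test/validation lists.

    The validation set receives whatever remains after train and test.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])
```

Shuffling before splitting keeps one chapter from landing entirely in a single split. Fixing the seed means the same clips end up in the same split on every run.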

Preparing Hindi Devanagari dataset:

A Hindi dictionary and an English dictionary are present in the folder dataset. To prepare the one-to-one mapping, an array of unique English words can be formed using English_dictionary_fromHindi.py. These arrays can then be used to make the final CSVs of Devanagari text using csv_mapping_EnglishToHindi.py.
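The one-to-one mapping idea can be sketched as follows. This is an illustration only, not the logic of English_dictionary_fromHindi.py or csv_mapping_EnglishToHindi.py. The helper names unique_words and to_devanagari and the tiny sample mapping are assumptions.

```python
def unique_words(sentences: list[str]) -> list[str]:
    """Unique romanized words in first-seen order (hypothetical helper)."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for sentence in sentences:
        for word in sentence.split():
            seen.setdefault(word, None)
    return list(seen)


def to_devanagari(sentence: str, mapping: dict[str, str]) -> str:
    """Rewrite a romanized transcript word by word via a one-to-one mapping."""
    words = sentence.split()
    missing = [w for w in words if w not in mapping]
    if missing:  # fail loudly instead of silently dropping words
        raise KeyError(f"no Devanagari entry for: {missing}")
    return " ".join(mapping[w] for w in words)


# Tiny illustrative mapping; the real one comes from the dataset dictionaries.
SAMPLE_MAPPING = {"aap": "आप", "baat": "बात", "nahin": "नहीं", "karoge": "करोगे"}
```

For example, to_devanagari("aap baat nahin karoge", SAMPLE_MAPPING) gives "आप बात नहीं करोगे". Raising on an unknown word makes gaps in the dictionary visible before they reach the training CSVs.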

Training :

The pre-trained DeepSpeech model has been used, and its parameters have been fine-tuned on the pre-processed data. All the training has been done from scratch.
Language models have been generated with KenLM for both the Devanagari and the English-alphabet versions of the Hindi speech dataset.
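DeepSpeech needs an alphabet file listing every output character, and KenLM builds its n-gram model from plain text with one sentence per line. Below is a minimal sketch of preparing both from the transcripts. The helpers build_alphabet, alphabet_file_text and lm_corpus_text are hypothetical, and the header comment text is arbitrary. The same code handles the English-alphabet and the Devanagari transcripts, because Python iterates over Unicode code points.

```python
def build_alphabet(transcripts: list[str]) -> list[str]:
    """Sorted set of characters used in the transcripts, space included."""
    chars: set[str] = set()
    for t in transcripts:
        chars.update(t)  # iterating a str yields Unicode code points
    return sorted(chars)


def alphabet_file_text(chars: list[str]) -> str:
    """Alphabet file body: one character per line, '#' lines are comments."""
    header = "# Each line is one character of the output alphabet\n"
    return header + "".join(f"{c}\n" for c in chars)


def lm_corpus_text(transcripts: list[str]) -> str:
    """LM training text: one whitespace-normalised sentence per line."""
    return "".join(" ".join(t.split()) + "\n" for t in transcripts if t.strip())
```

With the corpus written to a file, an ARPA model can be built with "lmplz -o 5 < corpus.txt > lm.arpa" and converted with "build_binary lm.arpa lm.binary". The n-gram order 5 and the file names are only examples.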
The DeepSpeech model has been trained on the English-alphabet Hindi speech dataset. The run completed successfully; its training and validation loss curves are shown below.

The model was trained for 20 epochs, stopping once the validation loss began to stagnate.
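Stopping when the validation loss stagnates is usually formalised as a patience rule. The sketch below is a generic version of that rule, not DeepSpeech's own early-stopping code. The helper stop_epoch, its patience and min_delta parameters, and the loss values in the example are all hypothetical.

```python
from typing import Optional


def stop_epoch(val_losses: list[float], patience: int = 3,
               min_delta: float = 0.0) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to beat its best value
    by more than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss  # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None  # the loss was still improving when the list ran out
```

With made-up losses [10, 8, 6, 5.5, 5.6, 5.5, 5.7] and patience 3, the rule stops at epoch 7, after three epochs without beating 5.5.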

The code, trained models, training/testing/validation data and LMs can be found here: Code

The link contains separate folders for all the wav files and the corresponding CSV files used for training, testing and validation, along with the models used for training. There are two separate models, one for the English data and one for the Hindi data. The English model has been trained on 11 hours of data along with fine-tuning of parameters. The Hindi model, made using transfer learning, has been trained on 1 hour of data (personalised data). Both folders contain the models for their respective language, along with alphabet files, vocabulary files, LM files and LM scorers, and the data needed for training and inference.

Results :

Sample outputs for some of the sentences are shown below:

Audio	Ground Truth	DeepSpeech output with LM
Audio_1	na apane kisee bhaee se aur na hee mere kisee parivaar ke sadasy se	na apane kisee bhaee se aur na hee mere kisee parivaar ke sadasy se
Audio_2	bas keval ek hee aap aaj ke baad kisee se bhee is vishay mein koee baat nahin karoge	bas keval ek hee ki aap aaj ke baad kisee se bhee is vishay mein koee baat nahin karenge
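The usual way to score rows like these is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below computes it. The function name word_error_rate is hypothetical, and no WER figures are reported in this README beyond what the two rows above imply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,            # delete reference word r
                          curr[j - 1] + 1,        # insert hypothesis word h
                          prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = curr
    return prev[-1] / len(ref)
```

Applied to the table, Audio_1 scores 0.0. Audio_2 scores 2/18 ≈ 0.11: one inserted word ("ki") and one substitution ("karoge" to "karenge") against an 18-word reference.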

0xproflupin/DeepBhashan
