[DMP 2024]: Create offline audio-phonetic matching model #313
Comments
Hello @ChakshuGautam, in what format will the Hindi words be displayed? For example:
Also, is there a specific corpus of Hindi text to be used? |
The 2nd one: a paragraph that a child can read. Ideally, the UI would show around 2 sentences at a time as the child reads, with the paragraph scrolling down until it is fully read. I have added a sample dataset. |
Because this is meant to check whether a person has read correctly, the model needs to rely more on the phonetics of the audio than on auto-regressively decoding the next word. |
I would like to work on this project |
Do not ask process related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer instructions listed on Unstop and any further queries can be taken up on our Discord channel titled DMP queries. Here's a Video Tutorial on how to submit a proposal for a project. |
Hello @GautamR-Samagra, I've delved into this use case and stumbled upon some pre-trained models that yield promising results with just a minor bit of fine-tuning. Although I had limited resources on the free version of Colab, I managed to achieve notable improvements. As depicted in the image, there's still some difference between the actual output and the output generated by the model. My approach involves recording the .wav file and converting it into words, subsequently comparing them against the repository of pre-stored correct words and sentences to derive a score. This initial evaluation phase sets the stage for fine-tuning the model to suit our specific needs. I would greatly appreciate any feedback or suggestions you may have on refining this approach. Thank you. |
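The transcribe-then-compare scoring described in the comment above could be sketched roughly as follows. This is only an illustration, not the commenter's actual code: it assumes the speech model has already produced a transcript string, and it scores with word error rate (word-level edit distance divided by reference length), a common ASR metric chosen here as an assumption. The names `word_error_rate` and `reading_score` are hypothetical.

```python
def word_error_rate(reference: str, transcript: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = transcript.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j transcript words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word was skipped (deletion)
                curr[j - 1] + 1,     # extra word was spoken (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def reading_score(reference: str, transcript: str) -> float:
    """Hypothetical score in [0, 1]; 1.0 means the transcript matches exactly."""
    return max(0.0, 1.0 - word_error_rate(reference, transcript))


if __name__ == "__main__":
    # One substituted word out of four.
    print(reading_score("राम घर जाता है", "राम घर जाती है"))
```

Because the score depends entirely on the transcript, it inherits any bias from the recognizer's language model, which is the concern raised earlier in the thread about auto-regressive decoding.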
Hello @GautamR-Samagra @ChakshuGautam, I have been working on this project, where we are supposed to implement a read-along app in offline mode. Since MFCCs may not capture all aspects of pronunciation and also give a good similarity score for incomplete speech, I am currently experimenting with x-vectors. I would appreciate your feedback. Thank you for your time and consideration. |
Hi @GautamR-Samagra @ChakshuGautam, after experimenting with x-vectors, I got the following results. The approach was time-consuming and computationally demanding (not suitable for edge devices). In addition, it did not solve the existing problem with MFCCs. So I will try to set up a workaround for that issue and keep you posted. |
Hi @GautamR-Samagra @ChakshuGautam, I tried setting up the prototype the other way around, and here are the results (read.along.demo.mp4). This setup works in an offline environment. The score is not perfect because the selected sentence contains inconsistent spacing. The model is based on Whisper (OpenAI) and is quite large for an edge device, so I will try to reduce its size. (The time taken to predict the score is due to Gradio's framework, not the model.) Any feedback would help the development of the project and would mean a lot. Thank you for your time. |
@RohanHBTU what did you use to create the x-vectors? Can you mention which Whisper model you used for the last comment? |
Hi @GautamR-Samagra @ChakshuGautam, the Whisper model was too big for an offline edge device even after quantization, so I tried another model that is lightweight and low-latency (demo: vosk_demo.mp4). The model is only 42 MB zipped and 78 MB after extraction. |
Hi @GautamR-Samagra, I wish to work on this project as part of the C4GT program. I am a pre-final-year student at IIIT Delhi, India, and I believe I will be able to contribute positively to the project. Since I only recently learned about this program and the deadline is approaching, could you please clarify what steps I should take to show my dedication and strengthen my proposal? Furthermore, it would be great if I could get your Discord handle so that I can work directly under your supervision. Awaiting your reply. Thank you. |
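The spacing problem mentioned above is usually handled by normalizing both the reference sentence and the recognized text before scoring. A minimal sketch follows, assuming plain Unicode strings on both sides; the function name `normalize_hindi` and the exact punctuation set are illustrative choices, not something taken from the prototype.

```python
import re
import unicodedata

# Punctuation treated as word separators: danda, double danda, and common
# ASCII marks. This set is an assumption for illustration only.
_PUNCTUATION = re.compile(r"[।॥,.!?;:\"'()\-]")
_WHITESPACE = re.compile(r"\s+")


def normalize_hindi(text: str) -> str:
    """Return text with a consistent Unicode form and single spaces."""
    # Use one canonical Unicode form so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Replace punctuation with spaces so it never glues two words together.
    text = _PUNCTUATION.sub(" ", text)
    # Collapse runs of spaces, tabs, and newlines into a single space.
    text = _WHITESPACE.sub(" ", text)
    return text.strip()


if __name__ == "__main__":
    print(normalize_hindi("  राम   घर\tजाता है। "))
```

Applying the same normalization to both strings means differences in spacing or sentence-final punctuation no longer lower the score.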
looking at other force alignment tools here
|
@xorsuyash can you comment here so that I can assign this to you ? |
@GautamR-Samagra |
cc @GautamR-Samagra Training acoustic word embedding model to optimize audio transcript matching
|
@prabakaranc98 here |
Weekly Goals
Week 1
Week 2
Week 3
Week 4
Week 5
Week 6
Week 7
Week 8
Week 9
Week 10
Week 11
Week 12
|
Offline Alternative to Google's Read Along App in Hindi
Description
Develop an offline application (POC - web) that can display a set of Hindi words and accurately determine if the user has pronounced each word correctly. The app aims to be an educational tool for Hindi language learners, providing instant feedback on their pronunciation.
The application is envisioned as an offline tool similar to Google's Read Along app but specifically for the Hindi language. It should present users with Hindi words and listen to the user's attempt to pronounce these words, providing feedback on the accuracy of their pronunciation.
Approaches for Consideration:
Implementation Details:
This is an open invitation for contributors to suggest ideas, approaches, and potential technologies that could be utilized to achieve the project goals. Contributions at all stages of development are welcome, from conceptualization to implementation.
Goals & Mid-Point Milestone
Sample audio files:
Acceptance Criteria
Being able to create a lite model that is able to detect the subset of words that a child has correctly pronounced.
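As a rough illustration of the output this criterion asks for, the sketch below marks which reference words count as correctly read, given the word sequence produced by some recognizer. It is not the project's model: the alignment uses Python's standard `difflib`, and the function name `correctly_read_words` and the `min_similarity` threshold are hypothetical choices. Character-level similarity is used only as a crude stand-in for phonetic closeness.

```python
from difflib import SequenceMatcher


def correctly_read_words(reference_words, recognized_words, min_similarity=0.8):
    """Return one (word, is_correct) pair per reference word."""
    flags = [False] * len(reference_words)
    matcher = SequenceMatcher(a=reference_words, b=recognized_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Words recognized exactly, in the expected order.
            for i in range(i1, i2):
                flags[i] = True
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same number of words on both sides: compare them pairwise and
            # accept near matches above the (assumed) similarity threshold.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                ratio = SequenceMatcher(
                    a=reference_words[i], b=recognized_words[j], autojunk=False
                ).ratio()
                flags[i] = ratio >= min_similarity
        # Skipped reference words ("delete") and uneven replacements stay False.
    return list(zip(reference_words, flags))


if __name__ == "__main__":
    reference = "राम घर जाता है".split()
    recognized = "राम जाता है".split()
    for word, ok in correctly_read_words(reference, recognized):
        print(word, "correct" if ok else "missed")
```

The threshold would need tuning on real recordings; a stricter or phoneme-based comparison may be more appropriate for judging pronunciation.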
Mockups/Wireframes
Product Name
Nipun Lakshya App
Organisation Name
SamagraX
Domain
Education
Tech Skills Needed
Machine Learning, Natural Language Processing, Python
Mentor(s)
@GautamR-Samagra
Category
Machine Learning