Discussion thread for LM and AM development for speech recognition of Malayalam using CMU Sphinx
Mentor : Mrs. Deepa Gopinath
Link to the current work I have done: ml-lm-am
The workflow I followed for my initial trial of the idea during my major project:
- Use the Datuk corpus as the source of Malayalam text (http://olam.in/open/datuk/).
- Build a phonetic dictionary from this corpus.
- Develop unigram, bigram, and trigram statistical language models using the CMU-Cambridge Statistical Language Modeling toolkit (CMUCLMTK).
- Train the acoustic model using speech from five different people.
- Test for accuracy. I could only achieve roughly 60% because of the time limits for the project submission.
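To make the language model step concrete, here is a minimal sketch of what unigram, bigram, and trigram counting amounts to. It is not the CMU toolkit and not the code from my repository: the function names (`ngram_counts`, `conditional_probability`) and the placeholder tokens `w1`..`w4` are made up for illustration, and it uses plain maximum-likelihood estimates without the smoothing or back-off a real toolkit would apply.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams over tokenized sentences.

    Each sentence is padded with n-1 "<s>" start markers and one "</s>"
    end marker, so that sentence boundaries are counted as well.
    """
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts


def conditional_probability(ngram, counts):
    """Unsmoothed estimate of P(last word | preceding words).

    The denominator is the total count of all n-grams that share the
    same context (every word except the last one).
    """
    context = ngram[:-1]
    total = sum(c for g, c in counts.items() if g[:-1] == context)
    return counts[ngram] / total if total else 0.0


# Hypothetical toy corpus; real input would be tokenized Malayalam text.
corpus = [["w1", "w2", "w3"], ["w1", "w2", "w4"]]

unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)
trigrams = ngram_counts(corpus, 3)

print(conditional_probability(("w1",), unigrams))
print(conditional_probability(("w2", "w3"), bigrams))
print(conditional_probability(("w1", "w2", "w3"), trigrams))
```

On a real corpus most trigrams never occur, which is why toolkits add discounting and back-off on top of raw counts like these.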
Find the complete proposal here.
I will create a repo on GitLab under SMC; please fork it and submit PRs regularly.
What should the name of the repo be?
That’s fine.
How about naming it “samsaaram”?
Blog Updates
Follow me here for further updates. You can also read my blog posts from the past three weeks.
How about naming it “samsaaram”?
I think a name that non-Malayalees can recognize will be a better choice.
Ohh… right!
Well then, a plain name like ‘ml-speech’ or something similar. I can’t think of anything simple enough right now.
One more question: is it possible to edit the description?
There is a mistake in the hyperlink I provided for ‘ml-lm-am’. It’s supposed to be this: GitHub - sreecodeslayer/ml-am-lm-cmusphinx: This is Malayalam Speech Recognition model developed for CMUSphinx. This is now used for Google Summer Code 2016
[Blog Update] (Slow as a Snail. “by persistence the snail reached the… | by Sreenadh T C | Medium)
I am currently pushing my work to the previously mentioned repository on GitHub.
Yes, it is possible to edit Discourse posts. Just find the edit button among the tools at the bottom of the post.
## Status update:
Completed an initial build of the language model and the acoustic model.
I still need to test the accuracy and see whether I can improve it a bit.
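For the accuracy testing, one common measure is word error rate (WER), with word accuracy taken as 1 - WER. The sketch below is only an illustration of that measure, assuming whitespace-separated reference and hypothesis transcripts; the function name `word_error_rate` and the sample strings are hypothetical, and this is not necessarily how the roughly 60% figure above was obtained.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Counts substitutions, deletions, and insertions needed to turn the
    hypothesis into the reference, using the standard dynamic-programming
    (Levenshtein) table.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Placeholder transcripts: one substitution (b -> x) and one deletion (d).
wer = word_error_rate("a b c d", "a x c")
print(f"WER: {wer:.2f}, word accuracy: {1 - wer:.2f}")
```

Note that WER can exceed 1.0 when the recognizer inserts many extra words, so word accuracy computed this way can go negative.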