Discussion thread for LM and AM development for speech recognition of Malayalam using CMU Sphinx
Mentor : Mrs. Deepa Gopinath
Link to the current work I have done: ml-lm-am
The workflow I followed for my initial trial of the idea during my major project:
- Use the Datuk corpus as the source Malayalam text corpus (http://olam.in/open/datuk/)
- Build a phonetic dictionary from this corpus
- Develop unigram, bigram, and trigram statistical language models using CMUCSLM
- Train the acoustic model using recordings from five different speakers
- Test the accuracy: I could only achieve roughly 60% because of time constraints before the project submission.
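The language-model step above rests on counting unigrams, bigrams, and trigrams. Below is a minimal Python sketch of that counting with unsmoothed maximum-likelihood estimates, only to show what the counts mean; it is not what CMUCSLM does internally, since a real toolkit also applies discounting and back-off and writes an ARPA-format file. The function names and the two-sentence toy corpus are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngram_counts(sentences: Iterable[List[str]], n: int) -> Counter:
    """Count n-grams of order n, padding each sentence with <s> and </s>."""
    counts: Counter = Counter()
    for words in sentences:
        # Sentence-boundary markers, as used in ARPA-format language models
        tokens = ["<s>"] + list(words) + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def mle_prob(ngram: Tuple[str, ...], higher: Counter, lower: Counter) -> float:
    """Unsmoothed estimate P(last word | history) = c(history, word) / c(history)."""
    history = ngram[:-1]
    if lower[history] == 0:
        return 0.0  # unseen history: a real LM would back off instead
    return higher[ngram] / lower[history]


# Hypothetical toy corpus, already split into words
corpus = [["ഞാൻ", "പോകുന്നു"], ["ഞാൻ", "വരുന്നു"]]
unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)
trigrams = ngram_counts(corpus, 3)
print(mle_prob(("ഞാൻ", "പോകുന്നു"), bigrams, unigrams))  # 0.5
```

On a real corpus such as Datuk, the unsmoothed estimates give zero probability to any unseen n-gram, which is exactly why the toolkit's discounting and back-off matter.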
Find the complete proposal here.
I will create a repo on GitLab under SMC; please fork it and send PRs regularly.
What should the repo be named?
That’s fine.
How about naming it “samsaaram”?
Blog Updates
Follow me here for further updates; you can also read my blog posts from the past three weeks.
> How about naming it “samsaaram”?

I think a name that non-Malayalees can recognize would be a better choice.
Ohh… right!
Then a plain name like ‘ml-speech’ or something similar would work. I can’t think of anything simpler.
One more question: is it possible to edit the description?
There is a mistake in the hyperlink I provided for ‘ml-lm-am’. It’s supposed to be this: GitHub - sreecodeslayer/ml-am-lm-cmusphinx: This is Malayalam Speech Recognition model developed for CMUSphinx. This is now used for Google Summer Code 2016
Blog update: Slow as a Snail. “by persistence the snail reached the…” by Sreenadh T C on Medium
I am currently pushing my work to the previously mentioned GitHub repository.
Yes, it is possible to edit Discourse posts. Just find the edit button among the tools at the bottom of the post.
## Status update
Completed an initial build of the language model and the acoustic model.
Next, I need to test the accuracy and see whether I can improve it.
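For the accuracy testing, speech recognizers are commonly scored by word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. One common convention reports word accuracy as 1 − WER, though other definitions exist. Below is a minimal sketch, assuming whitespace-tokenized transcripts; `word_error_rate` is a hypothetical helper, not a CMU Sphinx API.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # dist[i][j] = minimum edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i reference words
    for j in range(cols):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    return dist[-1][-1] / len(reference)


# Hypothetical example: the recognizer dropped one of three words
reference = "ഞാൻ വീട്ടിൽ പോകുന്നു".split()
hypothesis = "ഞാൻ പോകുന്നു".split()
print(word_error_rate(reference, hypothesis))  # 0.3333333333333333
```

For a whole test set, summing the edit counts over all utterances and dividing by the total number of reference words gives a corpus-level figure that is not skewed by short utterances.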