Discussion thread for LM and AM development for speech recognition of Malayalam using CMU Sphinx
Mentor : Mrs. Deepa Gopinath
Link to the current work I have done: ml-lm-am
The workflow I followed for my initial trial of the idea during my Major project:
- Use the Datuk corpus as the source of Malayalam text (http://olam.in/open/datuk/)
- Build a phonetic dictionary from this corpus
- Develop unigram, bigram, and trigram statistical language models using the CMU-Cambridge Statistical Language Modeling toolkit (CMUCLMTK).
- Carry out the training phase and develop the acoustic model using five different speakers.
- Test for accuracy. I could only achieve roughly 60% because of the time limit for the project submission.
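To make the language-model step in the workflow above more concrete, here is a minimal Python sketch of what counting unigrams, bigrams, and trigrams and turning them into probabilities looks like. It is only an illustration and not the CMU toolkit itself: the function names `ngram_counts` and `mle_probability` are hypothetical, the toy sentences use Latin placeholder tokens instead of Malayalam words, and it uses plain maximum-likelihood estimates with no smoothing or back-off, which a real toolkit would add.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams over whitespace-tokenized sentences.

    Each sentence is padded with n-1 start markers (<s>) and one
    end marker (</s>). Hypothetical helper, for illustration only.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def mle_probability(ngram, counts):
    """Unsmoothed estimate of P(last word | preceding words).

    Divides the n-gram count by the total count of all n-grams
    that share the same context (all words except the last).
    """
    context = ngram[:-1]
    context_total = sum(c for g, c in counts.items() if g[:-1] == context)
    return counts[ngram] / context_total if context_total else 0.0


# Toy corpus; a real run would read Malayalam sentences from the corpus.
corpus = ["a b", "a c"]
unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)
trigrams = ngram_counts(corpus, 3)

# "b" follows "a" in one of the two sentences.
p_b_given_a = mle_probability(("a", "b"), bigrams)
```

With this toy corpus, "a" is followed by "b" once and by "c" once, so `p_b_given_a` is 0.5. Unseen n-grams get probability 0.0 here, which is exactly the problem that smoothing in a proper toolkit is meant to address.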
Find the complete Proposal over here