Language model and Acoustic Model Speech recognition development for Malayalam using CMU Sphinx

IMSreenadh · March 24, 2016, 6:36am

Discussion thread for LM and AM development for speech recognition of Malayalam using CMU Sphinx

Mentor : Mrs. Deepa Gopinath

Link to the current work I have done: ml-lm-am

The workflow I followed for my initial trial of the idea during my Major project:

Use the Datuk corpus as the source of Malayalam text (http://olam.in/open/datuk/).
Build a phonetic dictionary from this corpus.
Develop unigram, bigram, and trigram statistical language models using the CMUCSLM toolkit.
Train the acoustic model on recordings from five different speakers.
Test for accuracy; I could only achieve roughly 60% because of the time limit for the project submission.
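To make the language-model step concrete, here is a minimal sketch of what unigram, bigram, and trigram counting amounts to. It is plain Python, not the toolkit's own implementation, and it uses unsmoothed relative frequencies where a real toolkit would apply discounting and back-off. The function names `ngram_counts` and `conditional_probability` and the `<s>`/`</s>` padding scheme are assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams over whitespace-tokenised sentences.

    Each sentence is padded with n-1 start markers and one end marker
    (an illustrative convention, not necessarily the toolkit's format).
    """
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def conditional_probability(ngram, counts):
    """Unsmoothed estimate of P(last word | preceding words).

    Divides the n-gram count by the total count of all n-grams that
    share the same context (every word except the last).
    """
    context = ngram[:-1]
    context_total = sum(c for g, c in counts.items() if g[:-1] == context)
    if context_total == 0:
        return 0.0
    return counts[ngram] / context_total
```

Calling `ngram_counts(corpus_lines, 3)` on a list of corpus sentences gives the trigram table; unseen n-grams get probability zero here, which is exactly the gap that smoothing in a real toolkit is meant to fill.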

Find the complete Proposal over here

IMSreenadh · May 27, 2016, 4:49am

Blog update : Here

stultus · May 28, 2016, 7:50pm

I will create a repo on GitLab under SMC; please fork it and send PRs regularly.
What should the name of the repo be?

IMSreenadh · May 29, 2016, 6:58am

That’s fine.

How about naming it “samsaaram” ?

IMSreenadh · May 29, 2016, 11:24am

Blog Updates

Follow me here for further updates; you can also read my blog posts from the past three weeks.

stultus · May 30, 2016, 5:50am

How about naming it “samsaaram” ?

I think a name that non-Malayalees can recognize would be a better choice.

IMSreenadh · May 30, 2016, 3:47pm

Ohh… right!

Well then, a plain name like ‘ml-speech’ or something similar. I can’t think of anything simple enough.

IMSreenadh · June 10, 2016, 12:57pm

I am currently pushing my work to the previously mentioned repository on GitHub.

asd · June 27, 2016, 4:24pm

Yes, it is possible to edit Discourse posts. Just find the edit button among the tools at the bottom of the post.

IMSreenadh · July 19, 2016, 2:28pm

## Status update:
Completed an initial build of the language model and the acoustic model.
Next I need to test the accuracy and see whether I can improve it.
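
For the accuracy test, one common measure is word error rate (WER): the word-level edit distance between the reference transcript and the recogniser output, divided by the number of reference words. The sketch below is a generic implementation under that assumption, not the scoring script used in this project; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1] / len(ref)
```

Word accuracy can then be read as `1 - word_error_rate(reference, hypothesis)`, averaged or pooled over the test utterances. Note that WER can exceed 1 when the hypothesis contains many inserted words.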