LibIndic Improvements

Mentors: Hrishi, copyninja

The project aims to tackle the problems that search and spellchecking face because of the agglutinative nature of Indic languages. The implementation in this project targets Malayalam. The project also covers a REST API for the libindic modules.

Subprojects

  1. sandhi-splitter for Malayalam
  2. Improvements to the spellchecker using sandhi-splitter
  3. libindic REST API

Target Milestones

Community Bonding Period

  • Semi-automated annotation tool
  • Feasibility study - non-rule-based splitting/joining
  • Initial Setup
    • Standard - PEP-8
    • Packaging - pbr
    • Testing Tool - testtool

Mid Term Evaluation

  • Sandhi splitter
    • Split point identification
    • Splitter
    • Joiner
  • Spellchecker Integration
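
To make the mid-term pieces concrete, here is a toy sketch of how split point identification, a splitter, a joiner, and a spellchecker lookup could fit together. It is not the sandhi-splitter code: the function names, the `LEXICON` set, and the English sample word are all made up for illustration, and it ignores the sound changes that real Malayalam sandhi introduces at the join.

```python
# Toy lexicon (hypothetical); a real spellchecker would load a full word list.
LEXICON = {"spell", "check", "checker"}


def split_points(word, lexicon):
    """Return every index i where both word[:i] and word[i:] are known words."""
    return [
        i
        for i in range(1, len(word))
        if word[:i] in lexicon and word[i:] in lexicon
    ]


def split(word, lexicon):
    """Split at the first valid split point, or return the word unchanged."""
    points = split_points(word, lexicon)
    if not points:
        return [word]
    i = points[0]
    return [word[:i], word[i:]]


def join(parts):
    """Naive joiner: plain concatenation, with no sandhi rules applied."""
    return "".join(parts)


def is_spelled_correctly(word, lexicon):
    """Accept a word if it, or every part of its split, is in the lexicon."""
    return all(part in lexicon for part in split(word, lexicon))
```

With this lexicon, `is_spelled_correctly("spellchecker", LEXICON)` accepts the compound even though only its parts are listed, which is the kind of gain the spellchecker integration is after.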

Final Evaluation

  • REST API for libindic
  • Finishing touches.

Links

  1. sandhi-splitter, working github repo under libindic.
  2. Proposal Draft - Google Docs, view only.

Blog posts related to this work can be found on the following page: