Automatic gloss generator for Otomí language

Dependencies

Dependency Manager

Poetry (see the Poetry documentation for installation instructions)

Python packages

scikit-learn
jupyter
jupyterlab
matplotlib
pandas
python-crfsuite

Installation

$ poetry install

Notebooks and examples

Experimental environments

Training pipelines are available in the notebooks/ folder. Each notebook can be executed cell by cell to reproduce the results.

linearCRF: This setting uses all the available information. The features are listed in the first cell of each notebook.
POSLess: This setting excludes the POS tags.
HMMLike: This setting uses the minimum information, i.e. the current letter and the immediately preceding one. We use this name because this configuration relies on information similar to that of an HMM, but builds the model with CRFs.
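A minimal sketch of the HMMLike feature idea, assuming each word is processed letter by letter and that features are passed to python-crfsuite as lists of strings. The feature names (`letter=`, `prev_letter=`) and the `BOS` padding token are hypothetical; the real features are the ones listed in the notebooks.

```python
def hmmlike_features(word: str) -> list:
    """Per-letter features: only the current letter and the one before it.

    Feature names and the BOS marker are illustrative assumptions,
    not the exact strings used in the project's notebooks.
    """
    features = []
    for i, letter in enumerate(word):
        # The first letter has no predecessor, so use a beginning-of-sequence marker.
        prev = word[i - 1] if i > 0 else "BOS"
        features.append([f"letter={letter}", f"prev_letter={prev}"])
    return features


print(hmmlike_features("ab"))
```

Each inner list describes one letter, so a word of length n yields n feature lists, matching the one-tag-per-letter sequence a linear-chain CRF expects.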

Examples

Inside the notebooks/ folder, each experimental environment has a notebook with the suffix _ejemplos.ipynb ("ejemplos" is Spanish for "examples"). These notebooks are useful for seeing the pre-trained models in action.

Baseline: HMMLike

L1 regularization = 0.0
L2 regularization = 0.0
Max iterations = 50
Model name: HMMLike_baseline_k_[1-3].crfsuite
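A sketch of how the baseline settings above could be passed to python-crfsuite, assuming L1 and L2 map to the `c1` and `c2` coefficients of its L-BFGS trainer and that `k` takes the values 1 to 3 from the model-name pattern. The helper names (`baseline_model_name`, `train_baseline`, `tag_with_model`) are hypothetical, not functions from this repository.

```python
import os

try:
    import pycrfsuite  # provided by the python-crfsuite package
except ImportError:  # keep the sketch importable without the dependency
    pycrfsuite = None

# Baseline hyperparameters: no L1/L2 regularization, at most 50 iterations.
BASELINE_PARAMS = {"c1": 0.0, "c2": 0.0, "max_iterations": 50}


def baseline_model_name(k: int) -> str:
    """File name following HMMLike_baseline_k_[1-3].crfsuite (k assumed in 1..3)."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    return f"HMMLike_baseline_k_{k}.crfsuite"


def train_baseline(xseqs, yseqs, k, out_dir="."):
    """Train one baseline CRF; each xseq is a list of per-letter feature lists."""
    if pycrfsuite is None:
        raise RuntimeError("python-crfsuite is not installed")
    trainer = pycrfsuite.Trainer(algorithm="lbfgs", verbose=False)
    for xseq, yseq in zip(xseqs, yseqs):
        trainer.append(xseq, yseq)
    trainer.set_params(BASELINE_PARAMS)
    path = os.path.join(out_dir, baseline_model_name(k))
    trainer.train(path)
    return path


def tag_with_model(model_path, xseq):
    """Load a trained (or pre-trained) model and predict one tag per letter."""
    if pycrfsuite is None:
        raise RuntimeError("python-crfsuite is not installed")
    tagger = pycrfsuite.Tagger()
    tagger.open(model_path)
    try:
        return tagger.tag(xseq)
    finally:
        tagger.close()
```

Setting both `c1` and `c2` to 0.0 turns regularization off entirely, which keeps this configuration a plain, unregularized reference point for the other settings.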

Preprocessing

Corpus cleaning

Delete duplicated lines
- $ sort -u corpus > corpus_uniq
Show duplicated lines (compare the sorted corpus with the deduplicated one)
- $ sort corpus > corpus_sort
- $ diff --color corpus_sort corpus_uniq
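The same cleaning can be sketched in Python, assuming the corpus is a UTF-8 text file with one entry per line. The function names are hypothetical. Note that `sort -u` orders lines according to the shell locale, while Python's `sorted` uses code-point order, so the order of the output may differ even though the set of lines is the same.

```python
from collections import Counter
from pathlib import Path


def unique_lines(lines):
    """Sorted, deduplicated lines (like `sort -u`, but in code-point order)."""
    return sorted(set(lines))


def duplicated_lines(lines):
    """Lines that occur more than once (what the diff above reveals)."""
    counts = Counter(lines)
    return sorted(line for line, n in counts.items() if n > 1)


def dedupe_corpus(src, dst):
    """Write the deduplicated corpus from src to dst and return its lines."""
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    unique = unique_lines(lines)
    text = "\n".join(unique) + ("\n" if unique else "")
    Path(dst).write_text(text, encoding="utf-8")
    return unique
```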

Conventions

Character substitutions

To avoid encoding/decoding problems with python-crfsuite, we substitute the following Otomí characters:

u̱ -> μ
a̱ -> α
e̱ -> ε
i̱ -> ι
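A minimal sketch of this substitution and its inverse, assuming each underlined vowel is stored as the base letter followed by the combining macron below (U+0331) and that the replacements are the Greek small letters mu, alpha, epsilon and iota. Only the four lowercase pairs listed above are covered; the function names are hypothetical.

```python
# Otomí vowel + combining macron below (U+0331) -> single Greek letter.
TO_CRF = {
    "u\u0331": "\u03bc",  # u̱ -> μ
    "a\u0331": "\u03b1",  # a̱ -> α
    "e\u0331": "\u03b5",  # e̱ -> ε
    "i\u0331": "\u03b9",  # i̱ -> ι
}
# Inverse mapping, used to restore the original spelling after tagging.
FROM_CRF = {greek: otomi for otomi, greek in TO_CRF.items()}


def to_crf(text: str) -> str:
    """Replace two-code-point Otomí vowels with one-code-point stand-ins."""
    for src, dst in TO_CRF.items():
        text = text.replace(src, dst)
    return text


def from_crf(text: str) -> str:
    """Undo to_crf (assumes the corpus contains no genuine Greek letters)."""
    for src, dst in FROM_CRF.items():
        text = text.replace(src, dst)
    return text
```

Replacing each vowel-plus-diacritic pair with a single code point also keeps the one-feature-list-per-letter alignment intact, since every underlined vowel then counts as one character.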

Pipeline

Get the glossed corpus
Preprocess the text
Build the feature lists for each letter in the sentences
Split the data into training and test sets
Train and build the models
Generate tags and run performance tests
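The splitting and evaluation steps above can be sketched as follows, assuming a seeded random split and per-letter tag accuracy as the performance measure. The split ratio, seed and metric are illustrative assumptions; the notebooks may use different settings or metrics.

```python
import random


def split_train_test(sentences, test_ratio=0.2, seed=42):
    """Shuffle reproducibly and return (train, test) lists."""
    items = list(sentences)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


def tag_accuracy(gold, predicted):
    """Fraction of letters whose predicted tag matches the gold tag."""
    total = correct = 0
    for gold_seq, pred_seq in zip(gold, predicted):
        if len(gold_seq) != len(pred_seq):
            raise ValueError("gold and predicted sequences differ in length")
        total += len(gold_seq)
        correct += sum(g == p for g, p in zip(gold_seq, pred_seq))
    return correct / total if total else 0.0
```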
