Releases: martijndeb/haxe-linguistics
Butter, bread and green cheese
This release adds Frisian as a language. It is treated as a second-class citizen for now, while English, Dutch and German remain the primary focus.
Also new in this release is the separation of token filtering from the tokenizers, so all token filters must implement the new ITokenFilter. One example is the new StopwordTokenFilter, which uses the updated stopword lists in languages.
You can now use the new BasicStringBuilder to convert a token list back to a string.
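The flow described above (tokenize, then filter, then rebuild a string) can be sketched roughly as follows. This is a language-neutral illustration in Python, not the library's Haxe API: `tokenize`, `stopword_filter`, `build_string` and the `STOPWORDS` set are hypothetical stand-ins for the tokenizer, StopwordTokenFilter and BasicStringBuilder, whose real signatures are not shown in these notes.

```python
# Hypothetical stand-ins; these are not the library's actual classes or signatures.
STOPWORDS = {"the", "a", "an", "of", "and"}  # illustrative list, not the library's


def tokenize(text):
    """Split on whitespace only; the tokenizer itself does no filtering."""
    return text.split()


def stopword_filter(tokens, stopwords=STOPWORDS):
    """Drop tokens whose lowercase form is in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


def build_string(tokens):
    """Join the tokens back into a single space-separated string."""
    return " ".join(tokens)


tokens = tokenize("The smell of butter and bread")
filtered = stopword_filter(tokens)
result = build_string(filtered)
print(result)  # smell butter bread
```

Keeping the filter as a separate step means the same tokenizer output can be passed through different filters, or none at all.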
Smack my Bayes up
Adds support for Dictionaries, which allow you to extract unique words from a text and keep a count of them.
Adds support for calculating the Levenshtein distance between strings.
Adds support for Naive Bayes classification.
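For readers unfamiliar with the three features in this release, here is a minimal Python sketch of the underlying techniques. It is not the library's Haxe code: the names `word_counts`, `levenshtein` and `NaiveBayes` are hypothetical, whitespace splitting stands in for the real tokenizer, and the classifier assumes a multinomial model with add-one smoothing, which may differ from what the library does.

```python
import math
from collections import Counter, defaultdict


def word_counts(text):
    """Dictionary sketch: map each unique lowercase word to its count."""
    return Counter(text.lower().split())


def levenshtein(a, b):
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (assumed)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + len(self.vocabulary)
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


counts = word_counts("bread butter bread")
distance = levenshtein("kitten", "sitting")

classifier = NaiveBayes()
classifier.train("butter bread green cheese", "food")
classifier.train("tokenizer filter stopword list", "code")
label = classifier.classify("green cheese")
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps unseen words from zeroing out a class.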
The Basics
First release, created and submitted to haxelib.
Contains a basic tokenizer supporting English and Dutch as languages.
Provides complete test coverage for this release.