Tokenizers (or stemmers) improve the quality of matches by recognizing inflected words in source and translation memory data. They also improve glossary matching.
A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty", etc.) as based on the root "cat", and "stemmer", "stemming", and "stemmed" as based on "stem". Likewise, a stemming algorithm reduces the words "fishing", "fished", "fish", and "fisher" to the root word "fish". This is especially useful for languages that inflect stem words with prefixes and suffixes. To borrow an example from Slovenian, here is the adjective "lep" ("beautiful") in several of its grammatically correct forms:
lep, lepa, lepo - singular: masculine, feminine, neuter
lepši, lepša, lepše - comparative, nominative: masculine, feminine, neuter, respectively (or plural forms of the adjective)
najlepših - superlative, plural, genitive for masculine, feminine, and neuter
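The idea of reducing inflected forms to a shared root can be illustrated with a minimal sketch in Python. This is a toy suffix stripper written only for this illustration; it is not the algorithm used by the tokenizers bundled with OmegaT. The names toy_stem and same_root, the suffix list, and the minimum stem length are all assumptions made for the example. Real stemmers use much larger, language-specific rule sets.

```python
# Toy suffix-stripping stemmer (illustrative only, not OmegaT's implementation).
# The suffix list and the minimum stem length are assumptions for this sketch.
SUFFIXES = ("ing", "ed", "er", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip one common English suffix and return the remaining stem."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if not lowered.endswith(suffix):
            continue
        stem = lowered[: -len(suffix)]
        if len(stem) < min_stem:
            continue  # avoid stripping very short words down to nothing
        # Undo consonant doubling, e.g. "stemm" -> "stem".
        if stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
            stem = stem[:-1]
        return stem
    return lowered


def same_root(first: str, second: str) -> bool:
    """Treat two words as a match when they reduce to the same stem."""
    return toy_stem(first) == toy_stem(second)


if __name__ == "__main__":
    for word in ("fishing", "fished", "fish", "fisher", "cats", "stemming"):
        print(word, "->", toy_stem(word))
    print(same_root("stemmed", "stemmer"))
```

With a comparison like same_root, a glossary entry stored as "cat" can still be found when the source segment contains "cats", which is the kind of matching improvement described above.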
Tokenizers are included in OmegaT. OmegaT automatically selects a tokenizer for the source language and for the target language according to the language settings of the project. You can select another tokenizer, or a different version of the same tokenizer, in the Project Properties window.
OmegaT will not launch if tokenizers are found in the /plugin folder. Remove all the tokenizers from the /plugin folder before starting OmegaT.