
Spelling Checkers

Spellcheckers are an important part of creating a complete, language-specific view of the operating system. Even English speakers prefer a spellchecker that is correct for their locale (UK vs US vs South African English). Some languages, by their nature, do not need or cannot use the wordlist-type spellcheckers found on Linux.

There are three main spellcheckers on Linux:

Ispell is the original and includes affix compression. Aspell is billed as a replacement for Ispell and has better algorithms for guessing the intended spelling of misspelled words. MySpell is used by OpenOffice.org and Mozilla and works on both Windows and Linux; it uses the affix compression found in Ispell (although in a new format). Aspell has now also adopted the MySpell affix file format.
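
Affix compression means the wordlist stores a stem plus flags, and the affix file defines what each flag expands to. The sketch below shows this for suffixes only. It assumes simplified MySpell-style rules of the form `SFX flag strip add condition`, where `0` means "nothing", and treats the condition as a regular expression matched at the end of the word. The rules and function names are hypothetical. Real affix files also have prefix rules, cross-product handling and richer condition syntax.

```python
import re

# Hypothetical sample rules in simplified MySpell syntax: "SFX flag strip add condition".
AFF = """\
SFX S Y 3
SFX S 0 s [^sy]
SFX S y ies [^aeiou]y
SFX S 0 s [aeiou]y
"""

def parse_suffix_rules(aff_text):
    """Collect suffix rules as {flag: [(strip, add, condition), ...]}."""
    rules = {}
    for line in aff_text.splitlines():
        parts = line.split()
        # Rule lines have five fields; the "SFX S Y 3" header has four and is skipped.
        if len(parts) == 5 and parts[0] == "SFX":
            _, flag, strip, add, condition = parts
            strip = "" if strip == "0" else strip
            add = "" if add == "0" else add
            rules.setdefault(flag, []).append((strip, add, condition))
    return rules

def expand(entry, rules):
    """Expand a .dic entry such as 'fly/S' into every word form it stands for."""
    word, _, flags = entry.partition("/")
    forms = [word]
    for flag in flags:  # assumes single-character flags
        for strip, add, condition in rules.get(flag, []):
            # Simplification: the condition is treated as a regex matched at the word end.
            if word.endswith(strip) and re.search(condition + "$", word):
                forms.append(word[: len(word) - len(strip)] + add)
    return forms

rules = parse_suffix_rules(AFF)
print(expand("fly/S", rules))  # ['fly', 'flies']
```

One dictionary line plus one flag thus stands for several word forms. This is why affix-compressed dictionaries stay small.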

For languages with more sophisticated spelling needs, such as agglutination, you will want to look at Hunspell.

Resource

Web based corpus building

A corpus is a body of text used by language researchers and spellchecker builders. You can find missing or new words for your spellchecker by scanning the web. There are two free tools that you can use to build your own web-based corpus: corpusbuilder and text_cat (FIXME How to use these). The former searches the web using a public search engine, and the latter uses a statistical model to determine whether the text found is indeed in your target language.
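
After the pages are fetched, you can pull candidate words out with a simple tokenizer and a frequency count. The helper below is hypothetical and is not part of corpusbuilder or text_cat. The `min_count` threshold is an assumed way to filter out one-off typos. The word pattern and the lowercasing (which also lowercases proper nouns) will need adjusting for your orthography.

```python
import re
from collections import Counter

# Runs of letters, allowing internal apostrophes or hyphens; tune for your orthography.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")

def candidate_words(texts, min_count=2):
    """Count word tokens across crawled pages, keeping words seen at least min_count times."""
    counts = Counter()
    for text in texts:
        counts.update(match.group(0).lower() for match in WORD_RE.finditer(text))
    return {word: n for word, n in counts.items() if n >= min_count}

pages = ["Die kat sit op die mat.", "Die hond en die kat."]
print(candidate_words(pages))  # {'die': 4, 'kat': 2}
```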

Once you have a list of potential words, you can use the new-words script in src/wordlist in the Translate Toolkit CVS to identify words that are not yet in your language's wordlist. Review these words and add them to your master wordlist.
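
At its core, this comparison is a set difference between the crawled words and the existing wordlist. The sketch below is not the new-words script itself. It assumes the wordlist has one word per line and that the candidates are a word-to-count mapping. Results are sorted by frequency so that the most common gaps are reviewed first.

```python
def new_words(candidates, wordlist_lines):
    """Return candidate words missing from the wordlist, most frequent first."""
    # Assumes one word per line; comparison is case-insensitive.
    known = {line.strip().lower() for line in wordlist_lines if line.strip()}
    missing = [word for word in candidates if word.lower() not in known]
    return sorted(missing, key=lambda word: (-candidates[word], word))

master = ["die\n", "kat\n", "mat\n"]
found = {"die": 4, "hond": 3, "kat": 2, "hoed": 3}
print(new_words(found, master))  # ['hoed', 'hond']
```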

Kevin Scannell's An Crúbadán

Language detection

This Python code could easily be used to develop language detection for a web crawler: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/326576
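
The linked recipe is its own approach. As an independent sketch, here is the character n-gram "out-of-place" ranking method of Cavnar and Trenkle, which text_cat is based on. The profile size, the `_` word-boundary marker and the toy training texts are assumptions. Real language profiles need far more text per language.

```python
from collections import Counter

def ngram_profile(text, max_n=3, size=300):
    """Map each of the `size` most frequent character n-grams (1..max_n) to its rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(size)]
    return {gram: rank for rank, gram in enumerate(ranked)}

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams unknown to the language get the maximum penalty."""
    penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else penalty
        for gram, rank in doc_profile.items()
    )

def detect(text, lang_profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))

# Toy training texts; real profiles need far more text per language.
profiles = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog"),
    "af": ngram_profile("die vinnige bruin jakkals spring oor die lui hond"),
}
print(detect("the lazy dog jumps over the fox", profiles))
```

A crawler would keep a page only when `detect` returns the target language. In practice, a distance threshold is also useful for rejecting pages that match no profile well.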

Other Crawlers

Letter Frequencies

The Translate project has a simple Python script that computes letter frequencies, which can be used in the TRY line of MySpell affix files. See translate/src/wordlist/letter-frequency.py in the Translate Toolkit CVS.
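
The underlying idea is small: count the letters across the wordlist and list them in descending order of frequency, so that suggestion code tries the most likely letters first. The sketch below is not letter-frequency.py itself. It assumes only lowercased letters matter and skips everything else, which may not suit every orthography. Some affix files also list uppercase letters or apostrophes in the TRY line.

```python
from collections import Counter

def try_line(words):
    """Build a TRY line listing letters from most to least frequent."""
    counts = Counter()
    for word in words:
        # Assumption: only lower-cased letters matter; apostrophes etc. are skipped.
        counts.update(ch for ch in word.lower() if ch.isalpha())
    # Break ties alphabetically so the output is deterministic.
    letters = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "TRY " + "".join(letters)

print(try_line(["banana", "Ant"]))  # TRY anbt
```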

Building

The easiest way to build your spellcheckers is to use our project's spellchecker build framework. It builds MySpell and Aspell spellcheckers (Ispell is temporarily disabled) from one or more common wordlists. Look at the Afrikaans and Zulu dictionaries for a template of the process. Again, this is in CVS, in the dict module.
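
At its simplest, building a MySpell dictionary from several wordlists means merging them, removing duplicates and writing a `.dic` file whose first line is the number of entries. The helper below is hypothetical and is not part of the dict module's build framework, which also handles affix files and Aspell output. Words are sorted by code point only so that the output is reproducible.

```python
def build_dic(*wordlists):
    """Merge wordlists into MySpell .dic text: an entry count, then sorted unique words."""
    words = set()
    for wordlist in wordlists:
        for line in wordlist:
            word = line.strip()
            if word:  # skip blank lines
                words.add(word)
    ordered = sorted(words)
    return "\n".join([str(len(ordered)), *ordered]) + "\n"

common = ["kat\n", "hond\n"]
extra = ["kat\n", "\n", "mat\n"]
print(build_dic(common, extra), end="")
```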

In more detail

Check out the dict/ module from CVS:

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/translate login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/translate co -P dict

Directory layout:

Simple make instructions:

Making it work

Make sure that your language is included in http://cvs.gnome.org/viewcvs/gnome-spell/gnome-spell/dictionary.c so that GNOME applications such as Evolution can make use of your Aspell spellchecker.

Publishing

OpenOffice.org

To get the spellchecker onto the OpenOffice.org pages, and thus make it downloadable from within OpenOffice.org, you will need to submit a bug report. Here is an example issue: http://www.openoffice.org/issues/show_bug.cgi?id=23201

ASpell

FIXME

Mozilla

FIXME: we have tried requesting updates on the Mozilla dictionary site but have had no response.