Chapter 2. Indexation

Table of Contents
2.1. Introduction
2.2. The indexation configuration
2.3. Starting indexation
2.4. Using cron to automate indexation

2.1. Introduction

Indexation is the process by which the set of documents is analyzed and the data entered into the database. Recoll indexation is normally incremental: documents will only be processed if they have been modified. On the first execution, of course, all documents will need processing. A full index build can be forced later on by specifying an option to the indexation command (recollindex -z).
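As a minimal sketch of the two modes described above, the following shell fragment selects between an incremental update and a forced full rebuild. The recollindex command and its -z option come from the text; the build_index_cmd helper and its "full" argument are hypothetical names introduced only for illustration, and the fragment prints the command rather than running it.

```shell
#!/bin/sh
# build_index_cmd is a hypothetical helper, not part of Recoll.
# It prints the indexation command matching the requested mode.
build_index_cmd() {
    case "$1" in
        full) echo "recollindex -z" ;;  # -z forces a full index build
        *)    echo "recollindex" ;;     # default: incremental update
    esac
}

# Show the command for each mode.
build_index_cmd incremental
build_index_cmd full
```

In an actual script you would run the selected command instead of printing it, for example with `$(build_index_cmd full)`, assuming recollindex is installed and on your PATH.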

Recoll indexation takes place at discrete times. There is currently no interface to real-time file modification monitors. The typical usage is to schedule a nightly indexation run in your cron file.

Recoll knows about quite a few different document types. The parameters for document type recognition and processing are set in configuration files. Most file types, like HTML or word processing files, only hold one document. Some file types, like mail folder files, can hold many individually indexed documents.

Recoll indexation processes plain text, HTML, OpenOffice and e-mail files internally. Other types (e.g. PostScript, PDF, MS Word, RTF) need external applications for preprocessing. The list of these applications is in the installation section.

Without further configuration, Recoll will index all appropriate files from your home directory, using a set of defaults that is reasonable if you live in western Europe or the USA. If your normal character set is not iso8859-1, you will almost certainly need to adjust the configuration.