Recoll uses the Xapian information retrieval library as its storage and retrieval engine. Xapian is a very mature package that uses a sophisticated probabilistic ranking model. Recoll provides the interface for getting data into the system (indexation) and out of it (searching).
In practice, Xapian works by recording which terms appear in each of your documents, and at which positions. The process of acquiring this information is called indexation.
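To make "recording where terms appear" concrete, here is a toy positional inverted index in Python. It is a hypothetical sketch of the general idea only, not Xapian's on-disk format or API. The tokenizer (lowercased runs of word characters) and the class and method names are assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercased runs of word characters.
    return re.findall(r"\w+", text.lower())


class PositionalIndex:
    """Toy inverted index mapping term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add_document(self, doc_id, text):
        # Record every position at which each term occurs in this document.
        for pos, term in enumerate(tokenize(text)):
            self.postings[term][doc_id].append(pos)

    def docs_with(self, term):
        # Answer "which documents contain this term?" from the index alone,
        # without rereading the documents themselves.
        return sorted(self.postings.get(term.lower(), {}))

    def phrase(self, words):
        # Documents where the words occur consecutively, found by checking
        # that the stored positions follow each other.
        terms = [w.lower() for w in words]
        if not terms:
            return []
        candidates = set(self.postings.get(terms[0], {}))
        for t in terms[1:]:
            candidates &= set(self.postings.get(t, {}))
        hits = []
        for doc in sorted(candidates):
            starts = self.postings[terms[0]][doc]
            if any(all(p + i in self.postings[t][doc] for i, t in enumerate(terms))
                   for p in starts):
                hits.append(doc)
        return hits
```

Positions are what make phrase queries possible. With the two documents "Recoll uses Xapian" (`a.txt`) and "Xapian uses probabilistic ranking" (`b.txt`), both contain "uses", but only `a.txt` contains the phrase "uses xapian".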
The resulting database can be big (roughly the size of the original document set), but it is not a document archive. Recoll can only display documents that still exist at the place from which they were indexed.
Recoll stores all internal data in Unicode UTF-8 format, and it can index files with different character sets, encodings, and languages into the same database. It has input filters for many document types.
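As an illustration of how text arriving in different character sets can end up in a single UTF-8 representation, here is a small Python sketch. It is not Recoll's input filter code. It assumes the source charset is already known (for example from an email header or an HTML meta tag), and the function name `to_utf8` and the Latin-1 fallback for unknown charset names are hypothetical choices.

```python
import codecs


def to_utf8(raw: bytes, charset: str) -> bytes:
    """Decode raw bytes in the declared charset and re-encode them as UTF-8.

    Assumption: the charset is known in advance. Undecodable bytes are
    replaced instead of aborting the whole document.
    """
    try:
        codecs.lookup(charset)
    except LookupError:
        # Hypothetical fallback: Latin-1 can decode any byte sequence.
        charset = "latin-1"
    text = raw.decode(charset, errors="replace")
    return text.encode("utf-8")
```

The same word stored as Latin-1 in one file and as UTF-16 in another normalizes to identical UTF-8 bytes, so both occurrences map to the same index term.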
Stemming (reducing words to a common root, so that a search for "index" can also match "indexed" or "indexing") depends on the document language. Recoll stores the unstemmed versions of terms in the main index and uses auxiliary databases, one for each stemming language, to expand search terms into their variants at query time. Because of this, it can switch stemming languages, or add a language, without reindexing. Storing documents in different languages in the same database is possible, and useful in practice, but can cause some confusion. Recoll makes no attempt at automatic language recognition.
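The following Python sketch shows why keeping unstemmed terms plus a separate expansion table avoids reindexing. It is an illustration of the idea under stated assumptions, not Recoll's implementation: `toy_english_stem` is a deliberately crude, hypothetical suffix stripper standing in for a real stemmer, and `build_expansion_db` and `expand` are hypothetical names.

```python
from collections import defaultdict


def toy_english_stem(word):
    # Hypothetical, very crude suffix stripper standing in for a real
    # language-specific stemmer. Illustration only.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_expansion_db(index_terms, stem):
    """Map each stem to the unstemmed index terms that reduce to it.

    Built only from the term list of an existing index, so adding or
    switching a language never requires rereading the documents.
    """
    db = defaultdict(set)
    for term in index_terms:
        db[stem(term)].add(term)
    return db


def expand(query_term, db, stem):
    # At search time, replace the query term by every indexed variant that
    # shares its stem; fall back to the term itself if none is indexed.
    return sorted(db.get(stem(query_term), {query_term}))
```

For example, with the unstemmed index terms `index`, `indexed`, `indexes`, `indexing`, `search`, `searched` and `searching`, a query for "searches" (which never appears in the index) expands to `search`, `searched` and `searching`. Switching or adding a language amounts to calling `build_expansion_db` again with another stemmer over the same term list, leaving the main index untouched.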
Recoll has many parameters that define exactly what to index and how to classify and decode the source documents. These are kept in a configuration file. A sample configuration is installed into the .recoll subdirectory of your home directory the first time you run a Recoll command. This initial configuration indexes your home directory with default parameters and should be enough to give Recoll a try, but you may want to adjust it later.
Indexation starts automatically the first time you run recoll, the graphical search interface. You can also start it at any time by running the recollindex command.
Searches are performed inside the recoll program, which has many options to help you find what you are looking for.