Key concepts

To fully understand the capabilities of DB2 Net Search Extender, it is necessary to understand key terms, which are found bold in this section, and the various options available. It is also necessary to have a basic understanding of DB2 Universal Database concepts and terms.

Basically, DB2 Net Search Extender searches text documents that are held in the column of a database table.

The text documents must be uniquely identifiable. Net Search Extender uses the primary key of the table for this purpose.

The documents can be in various formats, such as HTML or XML.

Rather than sequentially searching through the text documents that would take a considerable amount of time, Net Search Extender creates a text index in order to make documents searchable.

A text index consists of significant terms that are extracted from the text documents.

Figure 1. Creating a text index
Creating a text index

Text index creation is the process of defining and declaring the properties of the index, such as the location of the index. After creation, the text index contains no data. Index update is the process of adding data to the text index. The first index update adds all text documents from the text column to the index. It is known as the initial update.

By using a text index for searching, there are synchronization issues between the table and the text index that must be taken into account, as any follow-up changes to the table, such as additions, deletions, and updates to the text documents must be reflected in the text index.

Synchronization in Net Search Extender is based on triggers that automatically store information about new, changed, and deleted documents in a log table. There is one log table for each text index. Applying the contents of the log table to its corresponding text index is called incremental update.

Figure 2. Incremental update process
Incremental updates

You can update the text index by using a manual or automatic option. The automatic option uses an update schedule to set days and times.

Note that neither of these options synchronizes the text index within the scope of a transaction that updates, deletes, and inserts text documents. Net Search Extender's asynchronous text indexing improves performance and concurrency. The update is applied within a separate transaction to a copy of a very small part of the index. The index is only locked for read access during a very short period of time when the copy is put in place of the original. This is invisible to search operations, see Net Search Extender instance services for information.

A text index has certain properties, such as index file location and automatic update properties. If necessary, you can change some of the properties. This is known as altering the index.

One such property is whether the ORDER BY phrase should presort the text index on the table columns. In such a case, the initial update will index the text document in the order specified, and return the search results in this order.

For example, specifying presorted book abstracts according to the book price. When looking for the least expensive books about relational database systems, you can restrict your text search to return only the first couple of books as these will be the cheapest. However, without presorted indexes, you would have to search for all books and join these with the cheapest books, which would be a more costly operation.

Net Search Extender allows several presorted indexes per text column. For example, one index for presorting books according to the date of publication and, a second presorting books according to the price.

Usually the first update after creating a text index is an initial update, and the following updates are incremental. However, when working with presorted indexes you want to keep the order in case of updates. This is addressed by the Recreate Index on Update option, which totally rebuilds the index each time an update is performed.

After the text index is updated, you can search using one of the following options:

As the search options have different operating characteristics, they are explained in the following sections.