Creating a text index which the stored procedure search can use

To use the stored procedure search, you need to specify cache options during the CREATE INDEX command. This enables high performance, by moving all the specified data into main memory.

However, before the first index update for searching, ensure that your table contains documents to avoid updating an index on a non-populated table. This provides a better indexing performance and a solid estimation of cache memory requirements.

The stored procedure search allows you to quickly return predefined data that is associated with a document. Use the cache table option to define this in the CREATE INDEX command. The ACTIVATE CACHE command then moves the specified data into a memory cache.

Note

The SQL scalar search functions can also use this text index, if the text index is not created on a view.

In a distributed DB2 environment, you must explicitely specify a tablespace for administration tables on a single node for the stored procedure and explicitely call on this node.

To ensure that you connect to the correct node, use the DB2NODE environment variable.

When creating a text index for stored procedure search, you must determine and calculate the following parameters:

The following types of cache are available:

A temporary cache
This is rebuilt with each DB2TEXT ACTIVATE CACHE command, and requires loading the data from your DB2 table to memory. This takes longer than activation of a persistent cache, especially for large indexes. However, it might provide slightly better search performance.
A persistent cache
This is maintained on disk and can be quickly mapped to memory by means of the operating system on each DB2TEXT ACTIVATE CACHE command. In incremental index update scenarios, it must remain activated to allow synchronization between the index and the cache. If this does not occur, the next DB2TEXT ACTIVATE CACHE command recreates the cache from scratch.

The following methods of updating a text index are available:

Without the Recreate index on update option
Avoid deleting and re-inserting a document in the table as the slot for a deleted document cannot be reused in the cache. As a consequence, the changing of key columns should be avoided on an activated index.

This is also known as incremental update.

With the Recreate index on update option
This recreates the index on each update. Use variable data types in the cache column expressions wherever possible. This will save cache space. Use the corresponding cast expressions in the CACHE TABLE clause.

Use this option if you expect to insert more than 50% of your documents after the initial index activation.

Net Search Extender provides two SQL functions to help you determine the CREATE INDEX memory parameters. These are: MAXIMUM CACHE SIZE and PCTFREE.

For incremental and recreate updates
The following UDF function returns the recommended MAXIMUM CACHE SIZE value in megabytes (MB):
DB2EXT.MAXIMUM_CACHE_SIZE(maximumNumberDocs INTEGER, 
       averageRowLength INTEGER, numberOfCacheColumns INTEGER)
The following command returns the average row length parameter from your table:
SELECT AVG(LENGTH(cache column_1) + ... + LENGTH(cache column_n))
Note that the average may change significantly when further documents are inserted into your table. The number of cache columns relates to the number of column expressions used in the CACHE TABLE clause of the DB2TEXT CREATE INDEX command.

For additional information, see Appendix B. Using large amounts of memory.

For incremental updates only
The following UDF function returns the recommended PCTFREE value based on the actual and maximum numbers of documents.
DB2EXT.PCTFREE(actualNumberDocs INTEGER, maximumNumberDocs INTEGER)
The actual numbers of documents are the number of rows in your table at the time of the first ACTIVATE CACHE command, which creates the memory cache.

The maximum number of documents is an estimate of the maximum number of documents in your table before the next DB2TEXT ACTIVATE command (for a temporary cache), or DB2TEXT ACTIVATE CACHE RECREATE command (for a persistent cache) is run.

If you are recreating the index on each update, set the PCTFREE value to 0.

Examples

Assume that you have 10 000 rows in your table and you do not expect more then 20 000. Use the following call to calculate the PCTFREE value you require:

db2  "values DB2EXT.PCTFREE(10000,20000) "

Assume that your maximum row size is 20 000 and that you have 2 columns in your cache with an average size of 76. Use the following call to return the size:

db2 " values DB2EXT.MAXIMUM_CACHE_SIZE(20000,76,2) "

After determining suitable parameters, you can create your index and cache table by using the following call:

db2text CREATE INDEX db2ext.comment FOR TEXT ON db2ext.texttab (comment) 
          CACHE TABLE (docid) PCTFREE 10 MAXIMUM CACHE SIZE 5

In this example, the docid column is built in addition to the index, using main memory for fast result table return. Ten percent of the cache memory is reserved for future documents and the cache is limited to a maximum of 5 MB.

Updating the text index

To search on this index, you need to update and then activate the index. This copies the specified table cache expression from the database into memory.

If during ACTIVATE or UPDATE operations, the MAXIMUM CACHE SIZE is exceeded, the following actions are recommended:

If you expect frequent updates on documents, consider using fixed-size data types for the cache column expressions in the CACHE TABLE clause. The following example shows how you can use the same cache storage during update operations:

CACHE TABLE(cast(C1 as char(20)), cast(substr(C2,1,10) as char(10))....

In this case, you ensure that only the non-variable data types are used.

Activating and deactivating the cache for a text index

Before cache activation, perform any pending incremental updates to avoid a poor PCTFREE calculation.

To activate the text index, use the following command:

db2text ACTIVATE CACHE FOR INDEX db2ext.comment FOR TEXT

This command retrieves the specified cache table data out of the database and stores this in memory. The time taken depends on the size of the table.

Note

If you call update index when an index is activated, this will also update the cache tables. As deleted documents may take slots in cache memory, ensure that you set PCTFREE with a high enough value.

The ACTIVATE CACHE call needs to be redone every time you stop your system. If you use the persistent cache, the new activate will be quicker.

Note

If an update occurs when the persistent cache is not activated, the persistent cache is dropped and recreated during the activate call.

To save resources, you should also deactivate any indexes that are not currently required.

To check how much memory is left, use the following call:

db2text control show cache status for database cte index db2ext.comment

This displays whether the index has been activated and how much of the specified cache space is left.