IBM Books
(C) IBM Corp. 2000

DB2 Net Search Extender Administration and User's Guide

Creating a text index

Select the Create command, and a Create Text Index Wizard pops up. Use the wizard to specify the configuration options for the text index in a number of panels.

To move between the panels, enter all the mandatory information and click on the Next button until the Finish button is enabled. Click on the Finish button to create the text index.

To create a text index on views, use the CREATE INDEX command described in CREATE INDEX.

Name panel

This panel allows you to specify the schema and name for the text index. You can also specify a work and index directory for the text index files. Create the administrative tables for the index on the administration tablespace.

Figure 10. Create Text Index Wizard: Name panel

Create Text Index Wizard: Name panel

Here is a description of the fields in the panel:

Table 1. Name panel text fields

Field Name Mandatory/ Optional Default Description
Index schema Mandatory user ID Select a schema name of the text index. This is the DB2 schema name for the index-specific administration tables.
Index name Mandatory N/A Enter a valid DB2 index name for the text index. With the index schema, this uniquely identifies a full-text index in the database.
Index directory Optional See the path name Specify the directory path where you will store the text index. The directory must exist with read, write, and run permissions for the DB2 instance owner user ID.
Work directory Optional See the path name Specify the work directory where you will store temporary files during search and administration operations. The directory must exist with read, write, and run permissions for the DB2 instance owner user ID.
Administration tablespace Optional Use the default tablespace Select a tablespace name for the text index administration tables. You must define the tablespace on the same node group as the tablespace for the user table.

Target panel

This panel allows you to specify the schema and name of the table or nickname table, and the name of the text column containing the data you want to index. You can use a transformation function to modify the content of the text column. In addition to the text column, you can also specify numeric attributes if you want to index content of a table column expression.

Figure 11. Create Text Index Wizard: Target panel

Create Text Index Wizard: Target panel

Here is a description of the fields in the panel:

Table 2. Target panel text fields

Field Name Mandatory/ Optional Default Description
Table schema (1) Mandatory user ID Select the schema of the table or nickname table on which you are creating a text index.
Table name (2) Mandatory N/A Select the name of the table or nickname table on which you are creating an index. The table must have a primary key.
Text column (3) Mandatory N/A Select the name of the column used for creating the text index. The column must be transformed to, or be one of the following types: CHAR (for bit data), VARCHAR (for bit data), LONG VARCHAR (for bit data), CLOB, DBCLOB, BLOB, GRAPHIC, VARGRAPHIC, LONG VARGRAPHIC, and DATALINK.
Transformation function Optional Disabled Select to use a transformation function.
Transformation function: Schema Mandatory (if function selected) user ID Select the schema of the UDF used to access the text documents.
Transformation function: Name As above N/A Select the name of a UDF used to access the text documents.

Note that you can only specify the table schema (1), table name (2) and text column (3) in this order.

Figure 12. Numeric Attributes dialog

Numeric Attributes dialog

To view or add attributes, click on the Numeric Attributes button. A window displays. To add numeric attributes to the index, click on the Add button and a further window displays. Specify the SQL column expression and name for the attribute.

Alternatively, select an attribute and press the appropriate buttons to change, move, or remove an entry.

Explaining Numeric Attributes

Use Numeric Attributes to index column expressions in addition to the text column. For example, if you want to index the column date of type TIMESTAMP in addition to the text column, specify a numeric attribute "cast(julian_day(date) as double)" and specify a name for the attribute.

Specify a numeric attribute if you want to use a numeric expression inside a search query. If you are searching with SQL queries, you can use a combined search instead of using numeric attributes, for example: WHERE numattrib = 123 AND contains('...').

Text Properties panel

This panel allows you to specify the language and format of the text documents. If the documents are not the same CCSID as the database and the text column is of binary type, specify the CCSID. Note that the database CCSID is initially selected. If your documents are of a GPP, HTML, Outside-In, or XML structured format, you can specify a document model.

Note

In the format list box, the Outside-In filtering format is known as INSO.

Figure 13. Create Text Index Wizard: Text Properties panel

Create Text Index Wizard: Text Properties panel

Here is a description of the fields in the panel:

Table 3. Text Properties panel text fields

Field Name Mandatory/ Optional Default Description
Language Optional EN_US Select a language to determine end-of-sentence and end-of-paragraph delimiters when indexing documents.
CCSID Optional CCSID of database Select the CCSID for indexing text documents.
Format Optional TEXT Select the text document format: HTML, XML, TEXT, INSO or GPP.
Default Document Model Optional Enabled Use the default document model.
User Document Model Optional Disabled Use your document model.
Model name Mandatory (if User Document Model selected) N/A Enter the name of the document model. For HTML, XML, Outside-In, and GPP formats, you can specify a document model. Note that the name is only found in the model file.
Model file As above N/A Specify the document model file. The file must be readable by DB2 instance owners.
Model CCSID As above Database CCSID Select the CCSID to interpret the contents of the document model file.
Treat numbers as words Optional Disabled Select to interpret sequences of digits as separate words, even if they are adjacent to characters.
Index stopwords Optional Enabled Select to enable language-specific stopword processing. The <language>.tsw in the directory sqllib/db2ext/resources contains the stopword list.

Update characteristics panel

This panel allows you to specify whether the index updates incrementally or is recreated from scratch. You can specify update settings so that the index automatically updates at the specified time.

Figure 14. Create Text Index Wizard: Update Characteristics panel

Create Text Index Wizard: Update Characteristics panel

Here is a description of the fields in the panel:

Table 4. Update Characteristics panel text fields

Field Name Mandatory/ Optional Default Description
Incremental update Optional Enabled Select for incremental index updates. If you do not enable the check box, you recreate the index when an update operation is performed.
Commitcount Optional 0 Number of changes processed during an update in one transaction.

Commitcount has implications on performance. For information, see Performance considerations.

Capture table characteristics Optional N/A Select to use a replication capture table for capturing changes on the source table. The replication capture table must either be a Capture Data (CD) table , or a Capture Change Data (CCD) table and replaces the DB2 Net Search Extender generated log table.
Replication capture schema name Optional User ID The schema name of the replication capture table. Note that the table must have been previously created using DB2 Replication.
Replication capture table name Mandatory, if Capture table characteristics enabled N/A The table name of the replication capture table. Note that the table must have been previously created using DB2 Replication.
Control table schema name Mandatory, if Capture table characteristics enabled N/A The control table schema name. Note that the tables must have been previously created using DB2 Replication.
Reorg automatic or manual radio button Optional or Mandatory Enabled/ disabled Completes index reorganization automatically or manually.
Minimum number of changes for Update Optional 1 Specify the minimum number of changes to the text documents before the index incrementally updates at the specified time.
Update schedule Optional Disabled Select to add automatic update settings.

To add index update settings, click on the Settings button. Note that this button is only enabled if you select Update Schedule. In the dialog, select the days, hours, and minutes for the update time. Note that if you select multiple days, the update occurs at the same time on all the selected days.

Cache table panel

This panel allows you to specify a cached table in addition to the index. You can specify the result columns to be cached and you can search the cache using a stored procedure. You can also specify other cache parameters, such as type, maximum size, and the order in which you retrieve the contents of the user table during initial indexing.

Figure 15. Create Text Index Wizard: Cache Table panel

Create Text Index Wizard: Cache Table panel

Here is a description of the fields in the panel:

Table 5. Result Cache panel text fields

Field Name Mandatory/ Optional Default Description
Enable table cache Optional Disabled Select to enable the building of a cached table.
Result column table Mandatory (if Enable table cache selected) N/A Displays a list of SQL column expressions specifying the search result columns.
Maximum number of documents Mandatory Row count of table See the following section: Determining cache utilization and cache size.
Average cache row length Mandatory N/A See the following section: Determining cache utilization and cache size.
Percentage of initial cache utilization Optional 50% Select the percentage of the cache held free for additional documents.
Maximum cache size Optional N/A Specify a maximum size for the cached table built during index activate. If the number is too small, the activation will fail.
Initial search result order Optional Disabled Select to define the search result order. Documents are returned in the same indexing order as in the cached table. This order can not be ensured after incremental update.
Use persistent cache Optional Enabled This option enables a fast activate execution after a deactivation or system reboot. Note that you must specify a directory path for the persistent cache. Leave disabled if the cache should be temporary.
Determining cache utilization and cache size

The Percentage of initial cache utilization specifies the percentage of the cache to be held free for additional documents. The Maximum cache size specifies the maximum size of the cached table to be built during activate cache. These options depend on the following factors:

  • The actual number of documents in the table.
  • The expected number of updates.
  • The average size of the SQL expressions you want to cache.

You can enter the recommended values for the Percentage of initial cache utilization and Maximum cache size. Alternatively, you can let them be calculated each time you enter values in the Maximum number of documents or Average cache row length fields.

The Maximum number of documents value is initially set to the row count of the table. Modify this according to the number of documents and expected number of changes. Include all document updates, additions, and deletions.

When you add an SQL expression to the cache table list, the Average cache row length is calculated according to the length of the result. As this is based on the number of rows in your table, the calculation can take a considerable amount of time. If you know that on average this value is smaller, modify the value.

For example, if your table has 10 entries and the sum of your column expressions is 100, then these values are initially set. If you expect that the maximum number of documents (including deleted ones) is 10 000, enter this figure. If you know that column expressions on average are smaller than the calculated value, such as a VARCHAR(100) and a filled-in text size of 10, use this figure for the average row size.

To define the initial search result order, click on the Define button. Note that this button is only enabled if you select the Initial Search Result Order check box. A dialog displays all the specified SQL column expressions. To add a result order, click on the Add button and, in the dialog, specify the SQL result order.

To change, move, or remove an entry, select the expression and click on the appropriate buttons.

Figure 16. Initial Search Result Order dialog

Initial Search Result Order dialog

To add SQL column expressions, click on the Add button, next to the Result Column table. In the dialog, specify the result column expression and name.

To change or remove an entry, click on the column expression which enables the appropriate buttons.

Figure 17. Change column expression dialog

Change column expression dialog

Summary panel

This panel provides an overview of the previously selected parameters.

Figure 18. Create Text Index Wizard: Summary panel

Create Text Index Wizard: Summary panel

Click on the Show Command button to view the commands that are run when you click on the Finish button. This action creates the text index.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]