IBM Books
(C) IBM Corp. 2000

DB2 Net Search Extender Administration and User's Guide

CREATE INDEX

This command creates a full-text index on a text column for use in DB2 Net Search Extender full-text queries.

In a distributed DB2 environment, a full-text index is created on every partition of the tablespace the user table is defined on. Subsequent changes to the distribution of the tablespace are not allowed and will lead to unexpected behavior in the administration commands and during the search process.

Authorization

According to DB2 catalog views, the user ID in this command must have the CONTROL privilege on the table where the full-text index was created.

Command syntax

>>-CREATE-INDEX--+------------------+--index-name FOR TEXT------>
                 '-index-schema-"."-'
 
>--ON--+------------------+--table-name------------------------->
       '-table-schema-"."-'
 
>--+-(text-column-name)--------------------------------------------+-->
   '-+-----------------------+--function-name-(-text-column-name-)-'
     '-|function-schema "."|-'
 
>--+------------------+--+----------------------------+--------->
   '-|attribute-list|-'  '-|text-default-information|-'
 
>--+--------------------------+--+-------------------+---------->
   '-|update-characteristics|-'  '-|storage-options|-'
 
>--+-------------------------------+---------------------------->
   '-|cache-search-result-options|-'
 
>--+-------------------------------+---------------------------->
   '-|index-configuration-options|-'
 
>--+----------------------+------------------------------------><
   '-|connection-options|-'
 
attribute list
 
                  .-','------------------------------------------.
                  V                                              |
|--ATTRIBUTES--(----SQL-column-expression--+-------------------+-+--)--|
                                           '-AS-attribute-name-'
 
text-default-information
 
|--+--------------+--+--------------------+--------------------->
   '-CCSID--ccsid-'  '-LANGUAGE--language-'
 
>--+-----------------------------------------+------------------|
   '-FORMAT--format--+---------------------+-'
                     '-|model-information|-'
 
model-information
 
|--DOCUMENTMODEL--documentmodel-name--IN--modelfilepath--------->
 
>--+--------------------+---------------------------------------|
   '-USING-CCSID--ccsid-'
 
update-characteristics
 
|--+------------------------------------------+----------------->
   '-UPDATE-FREQUENCY--+-NONE---------------+-'
                       '-|update-frequency|-'
 
>--+-|incremental-update-characteristics|-+---------------------|
   '-RECREATE INDEX ON UPDATE-------------'
 
incremental-update-characteristics
 
|--+----------------------------+------------------------------->
   '-UPDATE-MINIMUM--minchanges-'
 
>--+---------------------------+-------------------------------->
   '-REORGANIZE--+-AUTOMATIC-+-'
                 '-MANUAL----'
 
>--+---------------------------------+--------------------------|
   +-COMMITCOUNT-FOR-UPDATE--count---+
   '-|capture-table-characteristics|-'
 
capture-table-characteristics
 
|--REPLICATION-CAPTURE-TABLE------------------------------------>
 
>--+--------------------------+--capture-table-name------------->
   '-capture-table-schema-"."-'
 
>--CONTROL TABLE SCHEMA--capture-control-schema-----------------|
 
update-frequency
 
|--D--(--+-*---------+--)--H--(--+-*----------+--)--M----------->
         | .-,-----. |           | .-,------. |
         | V       | |           | V        | |
         '---0...6-+-'           '---0...23-+-'
 
      .-,------.
      V        |
>--(----0...59-+--)---------------------------------------------|
 
storage-options
 
|--+----------------------------+------------------------------->
   '-INDEX-DIRECTORY--directory-'
 
>--+-------------------------------+---------------------------->
   '-WORK-DIRECTORY--workdirectory-'
 
>--+-------------------------------------------+----------------|
   '-ADMINISTRATION-TABLES-IN--tablespace-name-'
 
cache-search-results-options
 
|--CACHE TABLE-------------------------------------------------->
 
      .-','-------------------------------------------.
      V                                               |
>--(----SQL-column-expression--+--------------------+-+--)------>
                               '-AS--attribute-name-'
 
>--+-------------------------------+--+---------------------+--->
   +-PERSISTENT--+---------------+-+  '-PCTFREE--percentage-'
   |             '-IN--directory-' |
   '-TEMPORARY---------------------'
 
>--MAXIMUM CACHE SIZE--memsize---------------------------------->
 
>--+------------------------------------------------------+----->
   '-INITIAL SEARCH RESULT ORDER--(--SQL-order-by-list--)-'
 
>--+------------------------------------------------------+-----|
   '-KEY COLUMNS FOR INDEX ON VIEW--(SQL-columnname-list)-'
 
index-configuration-options
 
                           .-,------------.
                           V              |
|--INDEX CONFIGURATION--(----option-value-+--)------------------|
 
connection-options
 
|--CONNECT-TO--database-name--+-------------------------------+--|
                              '-USER--userid--USING--password-'
 
 

Command parameters

index schema
The schema of the text index. Use this as a DB2 schema name for the index-specific administration tables. If no schema is specified, the user ID of the DB2 connection is used. Note that the index schema must be a valid DB2 schema name.

index name
The name of the index. Together with the index schema, this uniquely identifies a full-text index in a database. It also serves as the name of the index event table.

See Appendix C, Net Search Extender information catalogs for details. Note that the index name must be a valid DB2 index name.

table schema
The schema for which of the table, nickname, or view the index is created. If no schema is specified, the user ID of the DB2 connection is used.

table name
The name of the text table, nickname, or view in the connected database that contains the column the full-text index is created for.

Note that when the table name does not refer to a DB2 base table, there are the following restrictions:

text-column-name
The name of the column containing the text used for creating the full-text index. The column must be one of the following types:

If the column type is none of these, specify a transformation function using function-schema.function-name to convert the column type.

Note that, if you use a Data Link column, the referenced content is be fetched for indexing. This is via the protocol that is part of the Data Link value, for example, Http. When using protocols other than "file" or "unc", ensure that you support these with servers that are part of the Data Link values. As proxy servers might be necessary to get the file content, the database administrator can specify them in the DB2EXT.PROXYINFORMATION table before index creation.

Note that several indexes on the same columns are allowed, but only with the following conditions:

The index is created on a view
Therefore, you can not use the index in the CONTAINS, SCORE, or NUMBEROFMATCHES search arguments.

The index is created on a table
If all the indexes are synchronized, they have identical properties on the same column in the following CREATE INDEX command details:
  • Function name and schema
  • ATTRIBUTES
  • CCSID
  • LANGUAGE
  • FORMAT
  • DOCUMENTMODEL
  • INDEX CONFIGURATION

Therefore, it does not matter which index is chosen by the CONTAINS, SCORE, or NUMBEROFMATCHES arguments.

function-schema.function-name
The schema and the name of a user-defined function used to access text documents that are in a column of an unsupported type. The function performs a column type conversion, using the one input parameter of an arbitrary column type. It returns the value of one of the Net Search Extender supported types.

ATTRIBUTES (SQL-column-expression AS Attribute-name, ...)
Ensures that the content of a column expression is indexed in addition to the text column. This content can also be searched by the ATTRIBUTE clause in a search statement. The SQL-column expressions have to be defined using unqualified column names of the table on which the index is created. The only data types allowed are double. Cast operators can be used in the column expressions, but implicit casting of DB2 is not possible. The attribute-names must follow the rules for attribute-names in document models and must be different from attribute names in the indexes model-definition file.

Determine the attribute names for expressions by using the following rules:

For example: ATTRIBUTES (CAST(JULIAN_DAY(date) AS DOUBLE) as day, (price1+price2)/2 as avg_price)

Note that attributes without quotes are mapped to uppercase and must be specified in this way during search.

CCSID ccsid
The Coded Character Set Identifier is used when indexing text documents. The default value is from the DB2EXT.DBDEFAULTS view where DEFAULTNAME='CCSID'.

LANGUAGE language
For a list, see Appendix E, Supported languages. The default value is from the DB2EXT.DBDEFAULTS view where DEFAULTNAME='LANGUAGE'.

FORMAT format
The format of text documents in the column, for example, HTML. This information is necessary for indexing documents. See Document formats and supported code pages for a list of document formats that are supported for structured documents.

For structured document formats, you can specify information in a document model file. If no document model is specified, the text of the document is indexed using a default document model. See Document models.

If the format keyword is not specified, the default value is from the DB2EXT.DBDEFAULTS view where the DEFAULTNAME='FORMAT'.

DOCUMENTMODEL documentmodel-name IN modelfilepath
The modelfilepath specifies the location of a model file. This contains a model definition for the format in the FORMAT clause. It must be readable by the DB2 instance owner. A document model enables you to index and search specific sections of a document. You can define markup tags and section names in a document model. A document model is bound to a document format that supports HTML, XML, or GPP structures. You can only specify one document model in a model file.

As document models do not need to be referenced in search conditions, use all the section names in the model file instead. For details on document models, see Chapter 9, Working with structured documents. Note that as the document model is only read during the CREATE INDEX command, any later changes are not recognized for this index.

Note that in a distributed DB2 environment, use a shared file system to ensure the modelfilepath is accessible on every node.

USING CCSID ccsid
Specify a CCSID to interpret the contents of the model file. The default value is from the DB2EXT.DBDEFAULTS view where DEFAULTNAME='MODELCCSID'.

UPDATE FREQUENCY
The index update frequency determines when the update occurs. If changes to the user table are less than that specified by the UPDATE MINIMUM option, the index is not updated. If you do not specify the UPDATE FREQUENCY, the default NONE is used, so that no further index updates are made. This is useful when there are to be no further changes to a text column.

The default value is from the DB2EXT.DBDEFAULTS view where DEFAULTNAME='UPDATEFREQUENCY'.

UPDATE MINIMUM minchanges
The minimum number of changes allowed to text documents before the index is updated automatically by the UPDATE FREQUENCY. Positive integer values are allowed. The default value is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME='UPDATEMINIMUM'.

Note that this value is ignored in a DB2TEXT UPDATE command. This option cannot be used with the RECREATE INDEX ON UPDATE option, as the number of changes is not available without a log table and triggers for incremental update.

For distributed databases, the UPDATE MINIMUM is checked on every node.

REORGANIZE AUTOMATIC/MANUAL
Updates performed using the update frequency will only recognize the index if REORGANIZE AUTOMATIC is specified. This step is completed automatically according to the value of select REORGSUGGESTED from DB2EXT.TEXTINDEXES after the update.

REORGANIZE MANUAL can only be performed with a manual UPDATE command, using the REORGANIZE option.

If the REORGANIZE clause is omitted, the default is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME='AUTOMATICREORG'.

For further information on the REORGANIZE option, see UPDATE INDEX.

REPLICATION CAPTURE TABLE capture-table-schema.capture-table-name CONTROL TABLE SCHEMA capture-control-schema
For incremental update processing, the specified replication capture table is taken instead of a log table that is normally created for the index. Therefore, schemaname, tablename, and the replication capture table name relate to objects in the local DB2 (federated) database.

The capture-control-schema is the schema name of the replication control tables, for example IBMSNAP_PRUNE_SET on the local DB2. The replication control tables must be available as nicknames on the local DB2 system after setting up the replication.

At minimum, there must be nicknames available for the following capture control tables:

As DB2 Replication Center does not automatically guarantee to create local nicknames for a remote capture table and capture control tables, this can be a manual task. The task is similar to creating a nickname for the table that the text index is to be created on.

The column names of primary key columns in the user table nickname and the capture table nickname must match. In addition, the names of the columns IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ and IBMSNAP_INTENTSEQ must not be changed in the capture table nickname.

After index creation, the column names DB2EXT.TEXTINDEXES(LOGVIEWNAME) and DB2EXT.TEXTINDEXES(LOGVIEWSCHEMA) both refer to the local name of the replication capture table.

As Net Search Extender does not require all the functionality of the DB2 Replication Center, the Change Data table (CD) or the Consistent-Change Data (CCD) table must obey following rules:

Other prerequisites include:

Notes and restrictions

Ensure that the correct source table name is inserted into the registration table. Depending on the type of remote DBMS, the remote tablename or the local nickname must be used:

  • DB2: remote table name (the tablename on the remote server)
  • Non-DB2: local nickname (the corresponding nickname in the federated DB2 database)

A user mapping must exist for the local user to access the remote data source via nicknames and the remote user must have control privilege on the tables.

If the DB2 instance owner user ID is different from the local user ID, an additional user mapping for the DB2 instance owner user ID is needed.

The specified base table name must not be a view on a nickname. This is because a view can be over several nicknames and several CD and CCD tables can also be involved. As only one CD or CCD table can be specified in the replication capture clause, a view on nicknames can not be supported. In addition, nicknames on a remote views can not be supported because the primary key is missing.

The CD or CCD table must be a nickname and can not be a view or an alias.

For information on the DB2 Replication Guide and Reference Version 8, see Related information.

COMMITCOUNT FOR UPDATE count
For incremental update processing a commitcount can be specified, see UPDATE INDEX for further information. If not specified, a default value is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME='COMMITCOUNT'.

The COMMITCOUNT FOR UPDATE value for the index can be found in DB2EXT.TEXTINDEXES.COMMITCOUNT. This can be changed for each index using the ALTER INDEX command. It also applies to the scheduled update processing according to the UPDATE FREQUENCY specification. A value of 0 means that the update is completed in one transaction, with values >0 specifying the number of documents to process in one transaction.

The use of commitcount has implications on performance. For information, see Performance considerations.

RECREATE INDEX ON UPDATE
This does not allow incremental index updates, but recreates the index when an update operation is performed (by command or scheduled update). See the Usage Notes on UPDATE INDEX for additional information.
Note

No triggers are created on the user table and no log table is created.

INDEX DIRECTORY directory
The directory path in which the text index is to be stored. As the directory will contain index data, ensure that the directory has read/write and execute permissions for the DB2 instance owner user ID.

The default value is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME=INDEXDIRECTORY'. A subdirectory, NODE<nr>, is created under the directory to distinguish indexes on logical nodes of a server.

Note that in a distributed DB2 environment, this directory has to exist on every physical node.

WORK DIRECTORY directory
A separate work directory may be specified optionally, that will be used to store temporary files during index search and administration operations. The directory must exist and have read/write and execute permissions for the DB2 instance owner user ID.

The default value is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME='WORKDIRECTORY'. A subdirectory, NODE<nr>, is created under the directory to distinguish indexes on logical nodes of a server.

Note that in a distributed DB2 environment, this directory has to exist on every physical node.

ADMINISTRATION TABLES IN tablespace-name
The name of the regular table space for administration tables created for the index. The table space must exist. If not specified, the tablespace of the user table is chosen, if the index is created on a base table.

In case of a nickname or a view, a default tablespace is chosen by DB2.

When creating text indexes on views, nicknames, or text indexes for stored procedure search on a distributed DB2 environment, the tablespace has to be single-noded.

CACHE TABLE (SQL-column-expression-list)
A cached table is built in addition to the index, which consists of the specified column expressions. This cache is used to return the result set via a stored procedure search without joining full-text search results with a DB2 table. Note that a regular DB2 search using the full-text index with the CONTAINS function is always possible.

Define the SQL-column expressions using unqualified column names of the table the index is created on. The allowed SQL-column expression types are all built-in and user-defined distinct types. The column names in the result set are determined using the following rules:

CLOB data types are not supported as cache data types. You need to cast these to VARCHARS.
Note

Note that if the column names of the result set are not disjunct, the CREATE INDEX command returns an error. Also note that the cached table is not implicitly activated after creation, for example search by stored procedure is not possible until DB2TEXT ACTIVATE CACHE is performed.

This option may be used in a distributed DB2 environment only if the user table is stored in a single-noded tablespace.

PERSISTENT IN directory
Specifies that the cache is also created persistent and could be activated shortly after a deactivation or a system reboot. The persistent cache is stored in the specified directory.

Note that if the directory is not specified, the default is taken from the db2ext.dbdefaults view, where DEFAULTNAME='CACHEDIRECTORY'.

TEMPORARY
Specifies that the cache is not stored persistent. If neither PERSISTENT or TEMPORARY is specified, the default is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME='USEPERSISTENTCACHE'.

MAXIMUM CACHE SIZE memsize
Specifies the maximum size of the cached table to be built during DB2TEXT ACTIVATE CACHE. The memsize parameter must be specified in megabytes as a positive integer. There is no default value for memsize. If the integer is too small, the ACTIVATE CACHE command will fail. The actual cache size is calculated during the ACTIVATE CACHE command.

The limit for the maximum cache size on the different platforms is:

For more information, see Appendix B, Using large amounts of memory.

PCTFREE percentage
Specifies the percentage of the cache to be held free for additional documents. The percentage must be an integer value lower than 100 and greater or equal to 0. If not specified, the default is taken from the db2ext.dbdefaults view, where DEFAULTNAME='PCTFREE'.

See ACTIVATE CACHE for details.

INITIAL SEARCH RESULT ORDER (SQL-order-by-list)
Specifies the order used for retrieving the user table contents during initial indexing. When using this option and skipping the dynamic ranking of full-text search results, the documents are returned in their indexing order, as stored in the cached result table.

For further information, see Chapter 16, Stored procedure search function.

Note

The index order can not be ensured for the new or changed documents after incremental update. For example: INITIAL RESULT ORDER(length(column1) asc, column2+column3 desc)

KEY COLUMNS FOR INDEX ON VIEW (SQL-columnname-list)
If indexes on views are created, the KEY COLUMNS FOR INDEX ON VIEW clause must be specified, otherwise it MUST NOT be specified. The list of column names specifies the columns that UNIQUELY identify a row in the view.

As this uniqueness cannot be checked by DB2 as in case of primary keys, the user is responsible to ensure the equivalent uniqueness. The specified columns build part of the log table for the index.

INDEX CONFIGURATION (option-value), ...
These are the index configuration values. The default values are underlined.
Option Values Description
TreatNumbersAsWords 0 or 1 Interprets sequences of digits as separate words, even if they are adjacent to characters, For example, the 0 default means that tea42at5 is considered as one word.
IndexStopWords 0 or 1 Considers or ignores stopwords during indexing. Currently, the stopword list is an UCS-2 file <language>.tsw in directory <instance>/sqllib/db2ext/resources. Changes to this file have no effect after index creation. Also note that <language> is the LANGUAGE value from the CREATE INDEX command.
UpdateDelay seconds Specifies the duration in seconds for incremental update without capture tables. Only entries older than this duration will be taken from the log table. This is to avoid lost updates. For example, document changes that are not reflected in the index in transaction scenarios where user transactions interfere with update commands. Therefore, the UpdateDelay parameter should be set to the maximum duration of a user write transaction on the table the index was created on.

CONNECT TO database-name
The name of the database that is target for this command. You can omit this parameter, if DB2DBDFT is set and the user is running the command on the server. Note that the user ID must have the required DB2 authorizations.

USER userid USING password
Use a password and userid to connect to the database. If not specified, a connection is attempted from the current user ID without a password.

Changes to the database

Changes to the shared memory
Deferred to ACTIVATE execution: If CACHE TABLE clause is used, a cache for the result table is built in shared memory.

Changes to the file system

Usage

Creation of a full-text index requires a primary key on the user table. In DB2 Net Search Extender Version 8.1, a multicolumn DB2 primary key can be used without type limitations. However, to use the table-valued search, no compound primary key is allowed.

The number of primary key columns is limited to 14, the total length of all primary key columns is limited to 1024 - 14 = 1010 bytes.

Note

After creating the index, the length of primary key columns or the view key columns must not be changed by ALTER TABLE commands.

The synchronization between user table, full-text index and the cached result table is completed during the update index command. For further information, see the UPDATE INDEX.


[ Top of Page | Previous Page | Next Page | Table of Contents | Index ]