This command creates a full-text index on a text column for use in DB2 Net Search Extender full-text queries.
In a distributed DB2 environment, a full-text index is created on every partition of the tablespace the user table is defined on. Subsequent changes to the distribution of the tablespace are not allowed and will lead to unexpected behavior in the administration commands and during the search process.
Authorization
According to DB2 catalog views, the user ID in this command must have the CONTROL privilege on the table where the full-text index was created.
Command syntax
>>-CREATE-INDEX--+------------------+--index-name FOR TEXT------> '-index-schema-"."-' >--ON--+------------------+--table-name-------------------------> '-table-schema-"."-' >--+-(text-column-name)--------------------------------------------+--> '-+-----------------------+--function-name-(-text-column-name-)-' '-|function-schema "."|-' >--+------------------+--+----------------------------+---------> '-|attribute-list|-' '-|text-default-information|-' >--+--------------------------+--+-------------------+----------> '-|update-characteristics|-' '-|storage-options|-' >--+-------------------------------+----------------------------> '-|cache-search-result-options|-' >--+-------------------------------+----------------------------> '-|index-configuration-options|-' >--+----------------------+------------------------------------>< '-|connection-options|-' attribute list .-','------------------------------------------. V | |--ATTRIBUTES--(----SQL-column-expression--+-------------------+-+--)--| '-AS-attribute-name-' text-default-information |--+--------------+--+--------------------+---------------------> '-CCSID--ccsid-' '-LANGUAGE--language-' >--+-----------------------------------------+------------------| '-FORMAT--format--+---------------------+-' '-|model-information|-' model-information |--DOCUMENTMODEL--documentmodel-name--IN--modelfilepath---------> >--+--------------------+---------------------------------------| '-USING-CCSID--ccsid-' update-characteristics |--+------------------------------------------+-----------------> '-UPDATE-FREQUENCY--+-NONE---------------+-' '-|update-frequency|-' >--+-|incremental-update-characteristics|-+---------------------| '-RECREATE INDEX ON UPDATE-------------' incremental-update-characteristics |--+----------------------------+-------------------------------> '-UPDATE-MINIMUM--minchanges-' >--+---------------------------+--------------------------------> '-REORGANIZE--+-AUTOMATIC-+-' '-MANUAL----' >--+---------------------------------+--------------------------| +-COMMITCOUNT-FOR-UPDATE--count---+ '-|capture-table-characteristics|-' capture-table-characteristics |--REPLICATION-CAPTURE-TABLE------------------------------------> >--+--------------------------+--capture-table-name-------------> '-capture-table-schema-"."-' >--CONTROL TABLE SCHEMA--capture-control-schema-----------------| update-frequency |--D--(--+-*---------+--)--H--(--+-*----------+--)--M-----------> | .-,-----. | | .-,------. | | V | | | V | | '---0...6-+-' '---0...23-+-' .-,------. V | >--(----0...59-+--)---------------------------------------------| storage-options |--+----------------------------+-------------------------------> '-INDEX-DIRECTORY--directory-' >--+-------------------------------+----------------------------> '-WORK-DIRECTORY--workdirectory-' >--+-------------------------------------------+----------------| '-ADMINISTRATION-TABLES-IN--tablespace-name-' cache-search-results-options |--CACHE TABLE--------------------------------------------------> .-','-------------------------------------------. V | >--(----SQL-column-expression--+--------------------+-+--)------> '-AS--attribute-name-' >--+-------------------------------+--+---------------------+---> +-PERSISTENT--+---------------+-+ '-PCTFREE--percentage-' | '-IN--directory-' | '-TEMPORARY---------------------' >--MAXIMUM CACHE SIZE--memsize----------------------------------> >--+------------------------------------------------------+-----> '-INITIAL SEARCH RESULT ORDER--(--SQL-order-by-list--)-' >--+------------------------------------------------------+-----| '-KEY COLUMNS FOR INDEX ON VIEW--(SQL-columnname-list)-' index-configuration-options .-,------------. V | |--INDEX CONFIGURATION--(----option-value-+--)------------------| connection-options |--CONNECT-TO--database-name--+-------------------------------+--| '-USER--userid--USING--password-'
Command parameters
See Appendix C, Net Search Extender information catalogs for details. Note that the index name must be a valid DB2 index name.
Note that when the table name does not refer to a DB2 base table, there are the following restrictions:
If the column type is none of these, specify a transformation function using function-schema.function-name to convert the column type.
Note that, if you use a Data Link column, the referenced content is be fetched for indexing. This is via the protocol that is part of the Data Link value, for example, Http. When using protocols other than "file" or "unc", ensure that you support these with servers that are part of the Data Link values. As proxy servers might be necessary to get the file content, the database administrator can specify them in the DB2EXT.PROXYINFORMATION table before index creation.
Note that several indexes on the same columns are allowed, but only with the following conditions:
Therefore, it does not matter which index is chosen by the CONTAINS, SCORE, or NUMBEROFMATCHES arguments.
Determine the attribute names for expressions by using the following rules:
For example: ATTRIBUTES (CAST(JULIAN_DAY(date) AS DOUBLE) as day, (price1+price2)/2 as avg_price)
Note that attributes without quotes are mapped to uppercase and must be specified in this way during search.
For structured document formats, you can specify information in a document model file. If no document model is specified, the text of the document is indexed using a default document model. See Document models.
If the format keyword is not specified, the default value is from the DB2EXT.DBDEFAULTS view where the DEFAULTNAME='FORMAT'.
As document models do not need to be referenced in search conditions, use all the section names in the model file instead. For details on document models, see Chapter 9, Working with structured documents. Note that as the document model is only read during the CREATE INDEX command, any later changes are not recognized for this index.
Note that in a distributed DB2 environment, use a shared file system to ensure the modelfilepath is accessible on every node.
The default value is from the DB2EXT.DBDEFAULTS view where DEFAULTNAME='UPDATEFREQUENCY'.
Note that this value is ignored in a DB2TEXT UPDATE command. This option cannot be used with the RECREATE INDEX ON UPDATE option, as the number of changes is not available without a log table and triggers for incremental update.
For distributed databases, the UPDATE MINIMUM is checked on every node.
REORGANIZE MANUAL can only be performed with a manual UPDATE command, using the REORGANIZE option.
If the REORGANIZE clause is omitted, the default is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME='AUTOMATICREORG'.
For further information on the REORGANIZE option, see UPDATE INDEX.
The capture-control-schema is the schema name of the replication control tables, for example IBMSNAP_PRUNE_SET on the local DB2. The replication control tables must be available as nicknames on the local DB2 system after setting up the replication.
At minimum, there must be nicknames available for the following capture control tables:
As DB2 Replication Center does not automatically guarantee to create local nicknames for a remote capture table and capture control tables, this can be a manual task. The task is similar to creating a nickname for the table that the text index is to be created on.
The column names of primary key columns in the user table nickname and the capture table nickname must match. In addition, the names of the columns IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ and IBMSNAP_INTENTSEQ must not be changed in the capture table nickname.
After index creation, the column names DB2EXT.TEXTINDEXES(LOGVIEWNAME) and DB2EXT.TEXTINDEXES(LOGVIEWSCHEMA) both refer to the local name of the replication capture table.
As Net Search Extender does not require all the functionality of the DB2 Replication Center, the Change Data table (CD) or the Consistent-Change Data (CCD) table must obey following rules:
Other prerequisites include:
Notes and restrictions |
---|
Ensure that the correct source table name is inserted into the registration table. Depending on the type of remote DBMS, the remote tablename or the local nickname must be used:
A user mapping must exist for the local user to access the remote data source via nicknames and the remote user must have control privilege on the tables. If the DB2 instance owner user ID is different from the local user ID, an additional user mapping for the DB2 instance owner user ID is needed. The specified base table name must not be a view on a nickname. This is because a view can be over several nicknames and several CD and CCD tables can also be involved. As only one CD or CCD table can be specified in the replication capture clause, a view on nicknames can not be supported. In addition, nicknames on a remote views can not be supported because the primary key is missing. The CD or CCD table must be a nickname and can not be a view or an alias. |
For information on the DB2 Replication Guide and Reference Version 8, see Related information.
The COMMITCOUNT FOR UPDATE value for the index can be found in DB2EXT.TEXTINDEXES.COMMITCOUNT. This can be changed for each index using the ALTER INDEX command. It also applies to the scheduled update processing according to the UPDATE FREQUENCY specification. A value of 0 means that the update is completed in one transaction, with values >0 specifying the number of documents to process in one transaction.
The use of commitcount has implications on performance. For information, see Performance considerations.
Note |
---|
No triggers are created on the user table and no log table is created. |
The default value is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME=INDEXDIRECTORY'. A subdirectory, NODE<nr>, is created under the directory to distinguish indexes on logical nodes of a server.
Note that in a distributed DB2 environment, this directory has to exist on every physical node.
The default value is taken from the DB2EXT.DBDEFAULTS view, where DEFAULTNAME='WORKDIRECTORY'. A subdirectory, NODE<nr>, is created under the directory to distinguish indexes on logical nodes of a server.
Note that in a distributed DB2 environment, this directory has to exist on every physical node.
In case of a nickname or a view, a default tablespace is chosen by DB2.
When creating text indexes on views, nicknames, or text indexes for stored procedure search on a distributed DB2 environment, the tablespace has to be single-noded.
Define the SQL-column expressions using unqualified column names of the table the index is created on. The allowed SQL-column expression types are all built-in and user-defined distinct types. The column names in the result set are determined using the following rules:
Note |
---|
Note that if the column names of the result set are not disjunct, the CREATE INDEX command returns an error. Also note that the cached table is not implicitly activated after creation, for example search by stored procedure is not possible until DB2TEXT ACTIVATE CACHE is performed. |
This option may be used in a distributed DB2 environment only if the user table is stored in a single-noded tablespace.
Note that if the directory is not specified, the default is taken from the db2ext.dbdefaults view, where DEFAULTNAME='CACHEDIRECTORY'.
The limit for the maximum cache size on the different platforms is:
For more information, see Appendix B, Using large amounts of memory.
See ACTIVATE CACHE for details.
For further information, see Chapter 16, Stored procedure search function.
Note |
---|
The index order can not be ensured for the new or changed documents after incremental update. For example: INITIAL RESULT ORDER(length(column1) asc, column2+column3 desc) |
As this uniqueness cannot be checked by DB2 as in case of primary keys, the user is responsible to ensure the equivalent uniqueness. The specified columns build part of the log table for the index.
Option | Values | Description |
TreatNumbersAsWords | 0 or 1 | Interprets sequences of digits as separate words, even if they are adjacent to characters, For example, the 0 default means that tea42at5 is considered as one word. |
IndexStopWords | 0 or 1 | Considers or ignores stopwords during indexing. Currently, the stopword list is an UCS-2 file <language>.tsw in directory <instance>/sqllib/db2ext/resources. Changes to this file have no effect after index creation. Also note that <language> is the LANGUAGE value from the CREATE INDEX command. |
UpdateDelay | seconds | Specifies the duration in seconds for incremental update without capture tables. Only entries older than this duration will be taken from the log table. This is to avoid lost updates. For example, document changes that are not reflected in the index in transaction scenarios where user transactions interfere with update commands. Therefore, the UpdateDelay parameter should be set to the maximum duration of a user write transaction on the table the index was created on. |
The entries in these tables are uniquely identified by the columns:
See page *** for more information on the columns.
Usage
Creation of a full-text index requires a primary key on the user table. In DB2 Net Search Extender Version 8.1, a multicolumn DB2 primary key can be used without type limitations. However, to use the table-valued search, no compound primary key is allowed.
The number of primary key columns is limited to 14, the total length of all primary key columns is limited to 1024 - 14 = 1010 bytes.
Note |
---|
After creating the index, the length of primary key columns or the view key columns must not be changed by ALTER TABLE commands. The synchronization between user table, full-text index and the cached result table is completed during the update index command. For further information, see the UPDATE INDEX. |