DB2 Net Search Extender needs to know the format (or type) of text documents that you intend to search. This information is necessary for indexing text documents.
DB2 Net Search Extender supports the following document formats:
For the document formats HTML, XML, GPP, and the Outside-In filter formats, searching can be restricted to specific parts of a document. Chapter 9, Working with structured documents, explains how to define and work with document models.
Where the Outside-In filters cannot be used because a document format is not supported, you can write a User Defined Function (UDF). You must specify this UDF at index creation time; it converts the data from the unsupported format to a supported format.
See CREATE INDEX for more information.
You can index documents if they are in one of the supported Coded Character Set Identifiers (CCSIDs), also known as code pages. See Appendix D, Supported CCSIDs, for a list of these code pages.
To check the database code page, use the following DB2 command:
db2 GET DB CFG for <dbname>
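The code page appears as one entry among many in the configuration listing. As a minimal sketch, the following Python snippet picks that entry out of captured command output. The sample text, the exact "Database code page" label and spacing, and the helper name database_code_page are assumptions made for illustration; check them against the output of your own DB2 installation.

```python
import re
from typing import Optional

# Hypothetical excerpt of the configuration listing; the real layout may differ.
SAMPLE_OUTPUT = """\
 Database territory                                      = US
 Database code page                                      = 1208
 Database code set                                       = UTF-8
"""


def database_code_page(cfg_output: str) -> Optional[int]:
    """Return the number on the 'Database code page' line, or None if absent."""
    match = re.search(
        r"^\s*Database code page\s*=\s*(\d+)\s*$",
        cfg_output,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


print(database_code_page(SAMPLE_OUTPUT))
```

In practice you would pass the captured output of the command above to the helper instead of the sample string.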
For consistency, DB2 normally converts the code page of a document to the code page of the database. However, when you store data in a DB2 database in a column having a binary data type, such as BLOB, FOR BIT DATA, or a datalink value, DB2 does not convert the data, and the documents retain their original CCSIDs.
Note that having two different code pages might cause problems when creating a text index or searching. See Creating a text index on binary data types for further information.
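To illustrate why a mismatch matters, the sketch below stores a document as raw bytes in one code page and then interprets those bytes in another. It is plain Python and does not call DB2 or Net Search Extender; the two-entry mapping from CCSID to Python codec name and the helper decode_document are assumptions made for this example only.

```python
# Assumed mapping from two CCSIDs to Python codec names (illustrative only).
CCSID_TO_CODEC = {1208: "utf-8", 1252: "cp1252"}


def decode_document(raw: bytes, ccsid: int) -> str:
    """Interpret raw document bytes using the codec assumed for the CCSID."""
    return raw.decode(CCSID_TO_CODEC[ccsid])


document_ccsid = 1252  # code page the document was written in
database_ccsid = 1208  # code page of the database

# Bytes as they would sit unconverted in a binary column.
raw = "Übergröße".encode(CCSID_TO_CODEC[document_ccsid])

# Read with the document's own CCSID, the text comes back intact.
print(decode_document(raw, document_ccsid))

# Read with the database code page, the same bytes are not valid text.
try:
    decode_document(raw, database_ccsid)
except UnicodeDecodeError as err:
    print("decoding with the database code page failed:", err.reason)
```

The same bytes are only meaningful when read with the CCSID they were written in, which is why the CCSID of documents held in binary columns has to be known when the text index is created.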