IBM Books
(C) IBM Corp. 2000

DB2 Net Search Extender Administration and User's Guide

Search argument

Search argument syntax

>>-+----------------------+--+-------------------------+-------->
   '-RESULT LIMIT--number-'  '-EXPANSION LIMIT--number-'
 
>--+------------------------------------------+----------------->
   '-STOP SEARCH AFTER--number--+-DOCUMENT--+-'
                                '-DOCUMENTS-'
 
>--+-| boolean-search-expression |-+---------------------------><
   '-| freetext-argument |---------'
 
Boolean-search-expression
 
|--+-| search-term |-------------------------------------------------+--|
   '-| boolean-search-expression |--| operator-or |--| search-term |-'
 
search-term
 
|--+-| search-factor |-----------------------------------------------+--|
   +-| search-term |--| operator-and |--| search-factor |------------+
   +-| search-term |--| operator-accum |--| search-factor |----------+
   '-| search-term |--| operator-minus |--| positive-search-factor |-'
 
Search-factor
 
|--+-----+--| positive-search-factor |--------------------------|
   '-NOT-'
 
Positive-search-factor
 
|--+-+------------------------------------------------------------+--| search-primary |-+--|
   | |                  .-,----------------------------------.    |                     |
   | |                  V                                    |    |                     |
   | '-+-SECTION--+--(----"section-name"--+----------------+-+--)-'                     |
   |   '-SECTIONS-'                       '-WEIGHT--number-'                            |
   '-attribute-factor-------------------------------------------------------------------'
 
Search-primary
 
|--+-| text-literal |-------------------+-----------------------|
   +-| context-condition |--------------+
   +-| thesaurus-invocation |-----------+
   +-(--| boolean-search-expression |--)+
   '-(--| text-literal-list |--)--------'
 
Operator-and
 
|--&------------------------------------------------------------|
 
Operator-or
 
|--|------------------------------------------------------------|
 
Operator-accum
 
|--ACCUM--------------------------------------------------------|
 
Operator-minus
 
|--MINUS--------------------------------------------------------|
 
Context-condition
 
|----| context-argument |--| IN SAME |--| context-unit |--| AS |--| context-argument |---->
 
>--+-------------------------------+----------------------------|
   | .---------------------------. |
   | V                           | |
   '---AND--| Context-argument |-+-'
 
Context-argument
 
|--+-| text-literal |------------+------------------------------|
   +-(--| text-literal-list |--)-+
   '-| thesaurus-invocation |----'
 
Text-literal-list
 
   .-,------------.
   V              |
|----text-literal-+---------------------------------------------|
 
Context-unit
 
|--+-PARAGRAPH-+------------------------------------------------|
   '-SENTENCE--'
 
Text-literal
 
|--+--------------------------------+--+----------------+------->
   +-PRECISE FORM OF----------------+  '-WEIGHT--number-'
   +-STEMMED FORM OF----------------+
   '-FUZZY FORM OF--+-------------+-'
                    '-match-level-'
 
>--"word-or-phrase"--+----------------------------+-------------|
                     '-ESCAPE--"escape-character"-'
 
thesaurus-invocation
 
|--THESAURUS--"thesaurus-name"--EXPAND-------------------------->
 
>--+-+-SYNONYM------------+--TERM OF--| text-literal |-------------------+--|
   | +-RELATED------------+                                              |
   | '-RELATION--(number)-'                                              |
   '-+-BROADER--+--TERM OF--| text-literal |--+------------------------+-'
     '-NARROWER-'                             '-FOR--count--+-LEVEL--+-'
                                                            '-LEVELS-'
 
Attribute-factor
 
|--ATTRIBUTE--"attribute-name"---------------------------------->
 
>--+-BETWEEN--valueFrom AND valueTo-+---------------------------|
   +->--valueFrom-------------------+
   '-<--valueTo---------------------'
 
freetext-argument
 
|--IS ABOUT--+----------+--"word-or-phrase"--------------------->
             '-language-'
 
>--+----------------------------+-------------------------------|
   '-ESCAPE--"escape-character"-'
 
 

Examples

Examples are given in Specifying SQL search arguments.

Search parameters

RESULT LIMIT number
A keyword specifying the maximum number of results to be returned by the full-text search.

The RESULT LIMIT should be used together with the SCORE function to ensure that the returned results are scored and only the best results are processed.

EXPANSION LIMIT number
A keyword specifying the maximum number of times a term can be expanded for searching. For example, it determines how many times the search term 'a*' can be expanded.

STOP SEARCH AFTER number DOCUMENT(S)
A keyword specifying the search threshold. The search is stopped when the number of documents is reached during the search and an intermediate result is returned. A lower value will increase the search performance, but may lead to fewer results and omit documents with a potentially high rank.

Note that there is no default value and the number value must be a positive integer.
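
The three optional clauses always precede the search expression, in the order shown in the syntax diagram. The following Python sketch assembles such a prefix as a string. The helper names search_options and search_argument are hypothetical and are not part of the product; the sketch only illustrates the clause order and the positive-integer rule for STOP SEARCH AFTER.

```python
def search_options(result_limit=None, expansion_limit=None, stop_after=None):
    """Build the optional clauses in the order given by the syntax diagram."""
    parts = []
    for keyword, value in (("RESULT LIMIT", result_limit),
                           ("EXPANSION LIMIT", expansion_limit)):
        if value is not None:
            parts.append(f"{keyword} {int(value)}")
    if stop_after is not None:
        count = int(stop_after)
        # The text defines no default and requires a positive integer.
        if count < 1:
            raise ValueError("STOP SEARCH AFTER needs a positive integer")
        unit = "DOCUMENT" if count == 1 else "DOCUMENTS"
        parts.append(f"STOP SEARCH AFTER {count} {unit}")
    return " ".join(parts)


def search_argument(expression, **options):
    """Prefix a search expression with whichever clauses were requested."""
    prefix = search_options(**options)
    return f"{prefix} {expression}" if prefix else expression


print(search_argument('"pilot"', result_limit=10, stop_after=500))
# RESULT LIMIT 10 STOP SEARCH AFTER 500 DOCUMENTS "pilot"
```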

boolean-search-expression
The search-terms and search-factors can be combined using the boolean operators NOT, AND, OR, ACCUM, and MINUS according to the syntax diagrams. The operators have the following precedence order (strongest first): NOT > MINUS = ACCUM = AND > OR. This can be seen in the following example:
"Pilot" MINUS "passenger" & "vehicle" | "transport" & "public"

is evaluated as:

(("Pilot" MINUS "passenger") & ("vehicle")) | ("transport" & "public")
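
This grouping follows from two rules in the syntax diagrams: | binds weakest, and &, ACCUM, and MINUS share one level and associate to the left. The following Python sketch makes the implied grouping explicit. It is not part of the product: parenthesize is a hypothetical helper, and it handles only quoted text-literals and the four binary operators (no NOT, parentheses, or SECTION clauses).

```python
import re

# A quoted literal (with doubled quotes inside), or one of the binary operators.
TOKEN = re.compile(r'"(?:[^"]|"")*"|[&|]|ACCUM|MINUS')


def parenthesize(expression):
    """Return the expression with the implied grouping written out."""
    tokens = TOKEN.findall(expression)

    # Split at "|", the weakest operator, into search-terms.
    terms, current = [], []
    for token in tokens:
        if token == "|":
            terms.append(current)
            current = []
        else:
            current.append(token)
    terms.append(current)

    def group(term):
        # &, ACCUM, and MINUS share one level and are applied left to right.
        result = term[0]
        for operator, operand in zip(term[1::2], term[2::2]):
            result = f"({result} {operator} {operand})"
        return result

    return " | ".join(group(term) for term in terms)


print(parenthesize('"Pilot" MINUS "passenger" & "vehicle" | "transport" & "public"'))
# (("Pilot" MINUS "passenger") & "vehicle") | ("transport" & "public")
```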

The operator ACCUM evaluates to true if one of the boolean arguments evaluates to true (which is comparable to the OR operator). The rank value is computed by accumulating the rank values from both operands. The ACCUM operator has the same binding (precedence) as AND. The operator MINUS evaluates to true if the left operand evaluates to true. The rank value is computed by taking the rank value of the left operand and subtracting a penalty if the right operand evaluates to true.
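
The truth and rank rules for ACCUM and MINUS can be modeled in a few lines of Python. This is an illustrative model only, not the engine's scoring code: the Result type, the rule that an unmatched operand contributes a rank of 0, and the PENALTY value are all assumptions, because the text does not state how large the penalty is or how ranks are scaled.

```python
from dataclasses import dataclass


@dataclass
class Result:
    matched: bool
    rank: float  # assumed to be 0.0 when matched is False


PENALTY = 0.25  # hypothetical value; the actual penalty is not documented here


def accum(left, right):
    # True if either operand is true (like OR); ranks are accumulated.
    return Result(left.matched or right.matched, left.rank + right.rank)


def minus(left, right):
    # Truth depends on the left operand only; a true right operand
    # lowers the rank by the penalty.
    rank = left.rank - PENALTY if right.matched else left.rank
    return Result(left.matched, rank)


print(accum(Result(True, 0.5), Result(False, 0.0)))
print(minus(Result(True, 0.75), Result(True, 0.5)))
```

In this model, a document matching only the left operand of MINUS keeps its full rank, while one that also matches the right operand still qualifies but is ranked lower.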

search-primary
A search-primary consisting of a thesaurus-invocation evaluates to true if any of the expanded text-literals is found in the document (or in the specified section of the document). A search-primary consisting of a text-literal-list evaluates to true if any of the text-literals is found in the document (or in the specified section of the document).

SECTION(S) section-name

A keyword specifying one or more sections in a structured document that the search is to be restricted to. The section name must be specified in a model file specified at index creation time, see CREATE INDEX.

Section names are case sensitive. Ensure that the section names in the model file and in the query are identical.

This model describes the structure of documents that contain identifiable sections, so that the content of these sections can be searched individually. Section names cannot be masked using masking characters. The positive-search-factor using the SECTION clause evaluates to true if the search-primary is found in one of the sections.

context-argument IN SAME context-unit AS context-argument AND context-argument ...
This condition lets you search for a combination of text-literals occurring in the same paragraph or same sentence. Context arguments are always equivalent to text-literal-lists, and thesaurus expansion may be used to expand a text-literal to such a list.

The condition evaluates to true if there is a context-unit (paragraph or sentence) in the document that contains at least one of the text-literals of each expanded context-argument. This can be seen in the following example:

("a","b") IN SAME PARAGRAPH AS ("c","d") 
          AND THESAURUS "t1" EXPAND SYNONYM TERM OF "e"

Assuming e1, e2 as synonyms of e, the following paragraphs would match:

".. a c e .." ,  ".. a c e1..",  "a c e2..",
".. a d e .." ,  ".. a d e1..",  "a d e2..",
".. b c e .." ,  ".. b c e1..",  "b c e2..",
".. b d e .." ,  ".. b d e1..",  "b d e2..".

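The matching rule for a context condition (at least one text-literal from each expanded context-argument within the same unit) can be sketched in Python as follows. This is a simplified model under stated assumptions: each context unit is already tokenized into single words, matching is exact, and the function names are hypothetical.

```python
def unit_matches(unit_words, context_arguments):
    """True if the unit contains at least one literal of every argument."""
    words = set(unit_words)
    return all(words & set(argument) for argument in context_arguments)


def document_matches(units, context_arguments):
    """True if any single paragraph (or sentence) satisfies the condition."""
    return any(unit_matches(unit, context_arguments) for unit in units)


# ("a","b") IN SAME PARAGRAPH AS ("c","d") AND <synonyms of "e">,
# assuming the thesaurus expands "e" to e, e1, and e2.
arguments = [["a", "b"], ["c", "d"], ["e", "e1", "e2"]]

print(unit_matches("x a c e1 y".split(), arguments))    # True
print(unit_matches("a c".split(), arguments))           # False: no e, e1, or e2
print(document_matches([["a", "c"], ["e"]], arguments))  # False: spread over two units
```
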
PRECISE FORM OF
A keyword that causes the word (or each word in the phrase) following PRECISE FORM OF to be searched for exactly as typed. This form of search is case-sensitive; that is, the use of upper- and lowercase letters is significant. For example, if you search for mouse, you do not find "Mouse".

STEMMED FORM OF
A keyword that causes the word (or each word in the phrase) following STEMMED FORM OF to be reduced to its word stem before the search is carried out. This form of search is not case-sensitive. For example, if you search for mouse, you find "Mouse".

The way in which words are reduced to their stem form is language-dependent. Currently, only English is supported and the word must follow regular inflection endings.

FUZZY FORM OF
A keyword for making a "fuzzy" search, which is a search for terms that have a similar spelling to the search term. This is particularly useful when searching in documents that were created by an Optical Character Recognition (OCR) program. Such documents often include misspelled words. For example, the word economy could be recognized by an OCR program as econony. Note that the first three characters must match and that fuzzy search cannot be used if a word in the search atom contains a masking character.

match level
An integer from 1 to 100 specifying the degree of similarity, where 100 is more similar than 1. A value of 100 specifies an "exact match", and 60 is already considered a very "fuzzy" value. The fuzzier the match level, the longer the search takes, because more documents qualify for the search. The default match level is 70.

WEIGHT number
Associates a text-literal with a weight value to change the default score. The allowed weight values are integers between 0 (the lowest score weighting) and 1000 (the highest); the default value is 100.

word-or-phrase
A word or phrase to be searched for. The characters that can be used within a word are language-dependent. It is also language-dependent whether words need to be separated by separator characters. For English and most other languages, each word in a phrase must be separated by a blank character.

To search for a character string that contains double quotation marks, type the double quotation marks twice. For example, to search for the text "wildcard" character, use:

"""wildcard"" character"

Note that in the example, it is only possible to search for one set of quotation marks. You cannot search for two quotation marks in a sequence. There is also a maximum length of 128 bytes for each word or phrase.
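
The quoting rule can be applied mechanically when a search argument is built from user input. The following Python sketch is a hypothetical helper, not a product API. It assumes that the 128-byte limit applies to the phrase before the quotation marks are doubled and that the phrase is encoded as UTF-8; the actual encoding depends on the database code page.

```python
def text_literal(phrase, max_bytes=128, encoding="utf-8"):
    """Wrap a word or phrase in double quotes, doubling embedded quotes."""
    if '""' in phrase:
        # Two quotation marks in a sequence cannot be searched for.
        raise ValueError("consecutive quotation marks are not supported")
    if len(phrase.encode(encoding)) > max_bytes:
        raise ValueError(f"phrase is longer than {max_bytes} bytes")
    return '"' + phrase.replace('"', '""') + '"'


print(text_literal('"wildcard" character'))
# """wildcard"" character"
```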

Masking characters
A word can contain the following masking characters:

_ (underscore)
Represents any single character.

% (percent)
Represents any number of arbitrary characters. If a word consists of a single %, then it represents an optional word of any length.

A word cannot be composed exclusively of masking characters, except when a single % is used to represent an optional word. If you use a masking character, you cannot use THESAURUS. Masking characters cannot follow a non-alphanumeric character.

ESCAPE escape-character
A character that identifies the next character as one to be searched for and not as one to be used as a masking character. For example, if an escape-character is $, then $%, $_, and $$ represent %, _, and $ respectively. Any % and _ characters not preceded by $ represent masking characters.
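
The interplay of masking characters and the escape character can be illustrated by translating a masked word into a regular expression. The Python sketch below is only an approximation for a single word: mask_to_regex is a hypothetical helper, it does not model a lone % standing for an optional word, and it says nothing about how the search engine matches terms internally.

```python
import re


def mask_to_regex(word, escape=None):
    """Translate _ and % into regex wildcards, honoring an escape character."""
    pieces = []
    chars = iter(word)
    for ch in chars:
        if escape is not None and ch == escape:
            following = next(chars, None)
            if following is None:
                raise ValueError("escape character at end of word")
            pieces.append(re.escape(following))  # taken literally
        elif ch == "_":
            pieces.append(".")    # any single character
        elif ch == "%":
            pieces.append(".*")   # any number of arbitrary characters
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces))


print(bool(mask_to_regex("m_use").fullmatch("mouse")))               # True
print(bool(mask_to_regex("comp%").fullmatch("compress")))            # True
print(bool(mask_to_regex("100$%", escape="$").fullmatch("100%")))    # True
print(bool(mask_to_regex("100$%", escape="$").fullmatch("1000")))    # False
```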

THESAURUS thesaurus-name
A keyword used to specify the name of the thesaurus to be used to expand a text-literal. The thesaurus name is the file name (without its extension) of a thesaurus that has been compiled using the thesaurus compiler. It must be located in <os-dependent>/sqllib/db2ext/thes. Alternatively, the path can be specified preceding the file name.

EXPAND relation
Specifies which relation is used to expand the text-literal using the thesaurus. The thesaurus has predefined relations, described in the DB2EXTTH command. These are referred to using the following keywords: SYNONYM, RELATED, BROADER, and NARROWER.

For user-defined relations, use RELATION(number), which corresponds to the relation definition in DB2EXTTH.

TERM OF text-literal

The text-literal, to which other search terms are to be added from the thesaurus.

count LEVELS

A keyword used to specify the number of levels (the depth) of terms in the thesaurus that are to be used to expand the search term for the given relation. If you do not specify this keyword, a count of 1 is assumed. The value of depth must be a positive integer value.
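
The effect of the level count can be pictured as a breadth-first walk through the thesaurus. The Python sketch below is a model under assumptions, not the product's thesaurus format or lookup code: the thesaurus is represented as a plain dictionary mapping a term to its directly narrower terms, and both the NARROWER data and the expand function are hypothetical.

```python
def expand(term, relation, levels=1):
    """Return the term plus all terms reachable within the given depth."""
    if levels < 1:
        raise ValueError("count must be a positive integer")
    expanded = [term]          # the original literal stays in the list
    frontier = [term]
    for _ in range(levels):
        next_frontier = []
        for current in frontier:
            for related in relation.get(current, []):
                if related not in expanded:
                    expanded.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return expanded


# Hypothetical sample data for a NARROWER relation.
NARROWER = {"vehicle": ["car", "bus"], "car": ["taxi"]}

print(expand("vehicle", NARROWER))             # ['vehicle', 'car', 'bus']
print(expand("vehicle", NARROWER, levels=2))   # ['vehicle', 'car', 'bus', 'taxi']
```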

ATTRIBUTE Attribute-name
Searches for documents having attributes matching the specified condition. The attribute-name refers to the name of an attribute expression in the CREATE INDEX command, or to an attribute definition in the document model file.

The attribute-factor is allowed only for attributes of type double. The precision of the value is guaranteed for 15 digits; numbers of 16 digits or more are rounded. Masking characters are not allowed in attribute-name, valueFrom, and valueTo. The comparison forms are as follows:

BETWEEN valueFrom AND valueTo
A BETWEEN attribute factor evaluates to true if the value of the attribute is greater than (not equal to) valueFrom and lower than (not equal to) valueTo.

>valueFrom
A ">" attribute factor evaluates to true if the value of the attribute is greater than (not equal to) valueFrom.

<valueTo
A "<" attribute factor evaluates to true if the value of the attribute is lower than (not equal to) valueTo.

If the attribute name in the CREATE INDEX command is specified with quotes, or is defined in a model file, the specified attribute name must match exactly. If no quotes are specified in the CREATE INDEX command, the attribute name must be in uppercase.
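
All three attribute comparisons exclude their boundary values. The Python sketch below restates them with strict inequalities; the function names are hypothetical. Python floats are IEEE double-precision values, which is assumed here to correspond to the attribute type double, and sys.float_info.dig reports the 15 decimal digits that such a value is guaranteed to preserve.

```python
import sys


def between(value, value_from, value_to):
    # BETWEEN valueFrom AND valueTo: both bounds are excluded.
    return value_from < value < value_to


def greater_than(value, value_from):
    # >valueFrom: the bound itself does not qualify.
    return value > value_from


def less_than(value, value_to):
    # <valueTo: the bound itself does not qualify.
    return value < value_to


print(between(15.0, 10.0, 20.0))   # True
print(between(10.0, 10.0, 20.0))   # False: equal to valueFrom
print(sys.float_info.dig)          # 15
```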

IS ABOUT language word-or-phrase
An option that lets you specify a free-text search argument. Use it to obtain a different scoring algorithm, one that takes the positioning of the terms within the documents into account. The closer together the terms of the word-or-phrase occur, and the more of those terms the document contains, the higher the score value returned.

The values allowed for language are described in Appendix E, Supported languages, and are only relevant for the Thai language. If not specified, the language en_US is used as default. The language is used only for tokenization of the word-or-phrase.

Note that IS ABOUT is useful only if the score values are requested and the search results are ordered by score values.

