
StringSearch Class Reference

StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object. More...

#include <stsearch.h>

Inherits SearchIterator.

Public Methods

 StringSearch (const UnicodeString &pattern, const UnicodeString &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the argument locale. More...

 StringSearch (const UnicodeString &pattern, const UnicodeString &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the argument collator. More...

 StringSearch (const UnicodeString &pattern, CharacterIterator &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the argument locale. More...

 StringSearch (const UnicodeString &pattern, CharacterIterator &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance using the language rule set of the argument collator. More...

 StringSearch (const StringSearch &that)
 Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text. More...

virtual ~StringSearch (void)
 Destructor. More...

StringSearch & operator= (const StringSearch &that)
 Assignment operator. More...

virtual UBool operator== (const SearchIterator &that) const
 Equality operator. More...

virtual void setOffset (int32_t position, UErrorCode &status)
 Sets the index to point to the given position, and clears any state that's affected. More...

virtual int32_t getOffset (void) const
 Return the current index in the text being searched. More...

virtual void setText (const UnicodeString &text, UErrorCode &status)
 Set the target text to be searched. More...

virtual void setText (CharacterIterator &text, UErrorCode &status)
 Set the target text to be searched. More...

RuleBasedCollator * getCollator () const
 Gets the collator used for the language rules. More...

void setCollator (RuleBasedCollator *coll, UErrorCode &status)
 Sets the collator used for the language rules. More...

void setPattern (const UnicodeString &pattern, UErrorCode &status)
 Sets the pattern used for matching. More...

const UnicodeString & getPattern () const
 Gets the search pattern. More...

virtual void reset ()
 Reset the iteration. More...

virtual SearchIterator * safeClone (void) const
 Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one. More...


Protected Methods

virtual int32_t handleNext (int32_t position, UErrorCode &status)
 Search forward for matching text, starting at a given location. More...

virtual int32_t handlePrev (int32_t position, UErrorCode &status)
 Search backward for matching text, starting at a given location. More...


Private Attributes

RuleBasedCollator m_collator_
 RuleBasedCollator that contains exactly the same UCollator * as m_strsrch_. More...

UnicodeString m_pattern_
 Pattern text. More...

UnicodeString m_collation_rules_
 Corresponding collation rules. More...

UStringSearch * m_strsrch_
 String search struct data. More...


Detailed Description

StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.

StringSearch handles language-specific matching rules. For example, with the German collator, the characters ß and SS match when case is ignored. See the "ICU Collation Design Document" for more information.

The algorithm implemented is a modified form of the Boyer-Moore search. For more information on the algorithm, see "Efficient Text Searching in Java", published in Java Report in February 1999.
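To illustrate the shifting idea behind Boyer-Moore style searching, the following is a minimal sketch of the simplified Horspool variant over raw UTF-16 code units. It is not ICU's implementation: StringSearch compares text according to the collator's rules rather than code unit by code unit, and the helper name horspoolFindAll is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of ICU: Boyer-Moore-Horspool over raw
// UTF-16 code units. Returns the start offset of every match.
std::vector<std::size_t> horspoolFindAll(const std::u16string &text,
                                         const std::u16string &pattern) {
    std::vector<std::size_t> matches;
    const std::size_t m = pattern.size();
    const std::size_t n = text.size();
    if (m == 0 || m > n) return matches;

    // Bad-character table: for each code unit in the pattern (except the
    // last position), the distance from its last occurrence to the end.
    std::unordered_map<char16_t, std::size_t> shift;
    for (std::size_t i = 0; i + 1 < m; ++i) shift[pattern[i]] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= n) {
        // Compare the pattern against the text window from right to left.
        std::size_t j = m;
        while (j > 0 && text[pos + j - 1] == pattern[j - 1]) --j;
        if (j == 0) matches.push_back(pos);

        // Shift by the table entry for the text unit aligned with the
        // last pattern position, or by the full pattern length if absent.
        auto it = shift.find(text[pos + m - 1]);
        pos += (it == shift.end()) ? m : it->second;
    }
    return matches;
}
```

The table lets the search skip several positions after a mismatch instead of advancing one position at a time, which is where the speed advantage over a naive scan comes from.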

Two match options are available:
Let S' be the sub-string of a text string S between the offsets start and end <start, end>.
A pattern string P matches a text string S at the offsets <start, end> if

 
 option 1. Some canonical equivalent of P matches some canonical equivalent 
           of S'
 option 2. P matches S' and if P starts or ends with a combining mark, 
           there exists no non-ignorable combining mark before or after S' 
           in S respectively. 
 
Option 2 is the default.
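The boundary condition in option 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not ICU code: "combining mark" is approximated by the Combining Diacritical Marks block (U+0300 to U+036F), "before or after" is read as immediately adjacent, P is compared with S' code unit by code unit rather than through a collator, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Assumption for illustration only: treat U+0300..U+036F as the set of
// non-ignorable combining marks. ICU decides this through the collator.
static bool isCombiningMark(char16_t c) {
    return c >= 0x0300 && c <= 0x036F;
}

// Hypothetical helper: returns true if pattern matches text[start, end)
// and the option 2 boundary condition holds.
bool acceptsOption2(const std::u16string &text,
                    const std::u16string &pattern,
                    std::size_t start, std::size_t end) {
    if (pattern.empty() || start > end || end > text.size()) return false;

    // P must match S' (here: plain code unit equality).
    if (text.compare(start, end - start, pattern) != 0) return false;

    // P starts with a combining mark: reject if a combining mark
    // immediately precedes S' in S.
    if (isCombiningMark(pattern.front()) && start > 0 &&
        isCombiningMark(text[start - 1])) {
        return false;
    }

    // P ends with a combining mark: reject if a combining mark
    // immediately follows S' in S.
    if (isCombiningMark(pattern.back()) && end < text.size() &&
        isCombiningMark(text[end])) {
        return false;
    }
    return true;
}
```

Under these assumptions, the pattern "a" followed by U+0301 is accepted in a text where the next character is "b", but rejected where the next character is another combining mark such as U+0300.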

This search has APIs similar to those of other text iteration mechanisms, such as the break iterators in BreakIterator. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. The search iterator can change direction by calling reset followed by next or previous. A direction change can also occur without calling reset first, but this incurs a speed penalty. Matches found in the forward direction are the same as those found in the backward direction, in reverse order.

SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setOffset, preceding and following. Since the starting position is set exactly as specified, note that there are some danger points at which the search may return incorrect results: