
ComposedCharIter Class Reference

ComposedCharIter is an iterator class that returns all of the precomposed characters defined in the Unicode standard, along with their decomposed forms.

#include <compitr.h>


Public Types

enum  { DONE = 0xffff }
 Constant that indicates the iteration has completed.


Public Methods

 ComposedCharIter ()
 Construct a new ComposedCharIter.

 ComposedCharIter (UBool compat, int32_t options)
 Constructs a non-default ComposedCharIter with optional behavior.

UBool hasNext (void) const
 Determines whether there are any precomposed Unicode characters not yet returned by #next.

UChar next (void)
 Returns the next precomposed Unicode character.

void getDecomposition (UnicodeString& result) const
 Returns the Unicode decomposition of the current character.


Detailed Description

ComposedCharIter is an iterator class that returns all of the precomposed characters defined in the Unicode standard, along with their decomposed forms.

This is often useful when building data tables (e.g. collation tables) which need to treat composed and decomposed characters equivalently.

For example, imagine that you have built a collation table with ordering rules for the Normalizer#DECOMP forms of all characters used in a particular language. When you process input text using this table, the text must first be decomposed so that it matches the form used in the table. This can impose a performance penalty that may be unacceptable in some situations.

You can avoid this problem by ensuring that the collation table contains rules for both the decomposed and composed versions of each character. To do so, use a ComposedCharIter to iterate through all of the composed characters in Unicode. If the decomposition for that character consists solely of characters that are listed in your ruleset, you can add a new rule for the composed character that makes it equivalent to its decomposition sequence.
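The rule-augmentation step described above can be sketched in isolation. This is a minimal illustration, not ICU code: the type aliases RuledChars, Decompositions and Equivalences and the function deriveComposedRules are hypothetical names, and the composed/decomposed pairs are passed in as a plain map standing in for what a ComposedCharIter loop would yield.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical types: this page does not define a ruleset representation.
using RuledChars = std::set<char16_t>;                      // characters that already have rules
using Decompositions = std::map<char16_t, std::u16string>;  // composed character -> decomposition
using Equivalences = std::map<char16_t, std::u16string>;    // new "composed equals sequence" rules

// For each composed character whose decomposition uses only characters that
// already have rules, record a rule making it equivalent to that sequence.
Equivalences deriveComposedRules(const RuledChars& ruled, const Decompositions& pairs) {
    Equivalences added;
    for (const auto& entry : pairs) {
        const std::u16string& decomposed = entry.second;
        // Assumed invariant: a composed character never has an empty decomposition.
        assert(!decomposed.empty());
        const bool covered = std::all_of(
            decomposed.begin(), decomposed.end(),
            [&ruled](char16_t unit) { return ruled.count(unit) != 0; });
        if (covered) {
            added[entry.first] = decomposed;
        }
    }
    return added;
}
```

With rules for 'e' and U+0301 but none for 'n' or U+0303, this sketch would add a rule for U+00E9 and skip U+00F1.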

Note that ComposedCharIter iterates over a static table of the composed characters in Unicode. If you want to iterate over the composed characters in a particular string, use Normalizer instead.

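When constructing a ComposedCharIter there is one optional feature that you can enable or disable: Normalizer#IGNORE_HANGUL, which causes the iterator to skip the Hangul characters and their corresponding Jamo decompositions.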

ComposedCharIter is currently based on version 2.1.8 of the Unicode Standard. It will be updated as later versions of Unicode are released.

Definition at line 58 of file compitr.h.


Member Enumeration Documentation

anonymous enum
 

Constant that indicates the iteration has completed.

#next returns this value when there are no more composed characters over which to iterate. This value is equal to Normalizer::DONE.

Enumeration values:
DONE  

Definition at line 67 of file compitr.h.


Constructor & Destructor Documentation

ComposedCharIter::ComposedCharIter ( )
 

Construct a new ComposedCharIter.

The iterator will return all Unicode characters with canonical decompositions, including Korean Hangul characters.

ComposedCharIter::ComposedCharIter ( UBool compat,
int32_t options )
 

Constructs a non-default ComposedCharIter with optional behavior.

Parameters:
compat   false for canonical decompositions only; true for both canonical and compatibility decompositions.
options   Optional decomposition features. Currently, the only supported option is Normalizer#IGNORE_HANGUL, which causes this ComposedCharIter not to iterate over the Hangul characters and their corresponding Jamo decompositions.
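To make concrete what the Hangul option leaves out, the sketch below decomposes a precomposed Hangul syllable into its Jamo using the arithmetic mapping defined by the Unicode Standard. It is not taken from ICU's implementation, whose internals this page does not describe; the function names isHangulSyllable and decomposeHangul and the constant names are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Constants from the Unicode Standard's Hangul syllable decomposition.
const int kSBase = 0xAC00;   // first precomposed syllable
const int kLBase = 0x1100;   // first leading consonant (choseong)
const int kVBase = 0x1161;   // first vowel (jungseong)
const int kTBase = 0x11A7;   // one before the first trailing consonant (jongseong)
const int kVCount = 21;
const int kTCount = 28;
const int kNCount = kVCount * kTCount;  // 588 syllables per leading consonant
const int kSCount = 19 * kNCount;       // 11172 syllables in total

// True if c is a precomposed Hangul syllable (U+AC00..U+D7A3).
bool isHangulSyllable(char16_t c) {
    return c >= kSBase && c < kSBase + kSCount;
}

// Returns the two or three Jamo that the syllable decomposes into.
std::u16string decomposeHangul(char16_t syllable) {
    assert(isHangulSyllable(syllable));
    const int sIndex = syllable - kSBase;
    const int lIndex = sIndex / kNCount;
    const int vIndex = (sIndex % kNCount) / kTCount;
    const int tIndex = sIndex % kTCount;

    std::u16string jamo;
    jamo.push_back(static_cast<char16_t>(kLBase + lIndex));
    jamo.push_back(static_cast<char16_t>(kVBase + vIndex));
    if (tIndex != 0) {  // index 0 means "no trailing consonant"
        jamo.push_back(static_cast<char16_t>(kTBase + tIndex));
    }
    return jamo;
}
```

Because these decompositions follow from arithmetic rather than a lookup table, a caller who handles Hangul separately may prefer to pass Normalizer#IGNORE_HANGUL and keep the iteration limited to the table-driven characters.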


Member Function Documentation

void ComposedCharIter::getDecomposition ( UnicodeString & result ) const
 

Returns the Unicode decomposition of the current character.

This method returns the decomposition of the precomposed character most recently returned by #next. The resulting decomposition is affected by the compat and options arguments passed to the constructor.

UBool ComposedCharIter::hasNext ( void ) const
 

Determines whether there are any precomposed Unicode characters not yet returned by #next.

UChar ComposedCharIter::next ( void )
 

Returns the next precomposed Unicode character.

Repeated calls to next return all of the precomposed characters defined by Unicode, in ascending order. After all precomposed characters have been returned, #hasNext will return false and further calls to next will return #DONE.
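The iteration contract described above (hasNext, next, getDecomposition, and the DONE sentinel) can be sketched with a stand-in class. SketchComposedCharIter and collectAll are hypothetical names, the class is backed by a four-entry hand-written table rather than ICU's data, and std::u16string takes the place of UnicodeString so that the example compiles without ICU headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: same method names as documented on this page,
// but NOT the ICU class and NOT the full Unicode table.
class SketchComposedCharIter {
public:
    enum { DONE = 0xffff };

    bool hasNext() const { return pos_ < table().size(); }

    // Returns the next precomposed character in ascending order,
    // or DONE once the table is exhausted.
    char16_t next() {
        if (!hasNext()) {
            return static_cast<char16_t>(DONE);
        }
        current_ = pos_++;
        started_ = true;
        return table()[current_].composed;
    }

    // Stores the decomposition of the character most recently returned by next().
    void getDecomposition(std::u16string& result) const {
        assert(started_ && "call next() before getDecomposition()");
        result = table()[current_].decomposed;
    }

private:
    struct Entry {
        char16_t composed;
        std::u16string decomposed;
    };

    // Tiny illustrative table of canonical decompositions, in ascending order.
    static const std::vector<Entry>& table() {
        static const std::vector<Entry> entries = {
            {0x00C0, u"A\u0300"},  // A with grave
            {0x00C1, u"A\u0301"},  // A with acute
            {0x00E9, u"e\u0301"},  // e with acute
            {0x00F1, u"n\u0303"},  // n with tilde
        };
        return entries;
    }

    std::size_t pos_ = 0;
    std::size_t current_ = 0;
    bool started_ = false;
};

// Walks the iterator to completion, pairing each character with its decomposition.
std::vector<std::pair<char16_t, std::u16string>> collectAll() {
    std::vector<std::pair<char16_t, std::u16string>> out;
    SketchComposedCharIter iter;
    while (iter.hasNext()) {
        const char16_t composed = iter.next();
        std::u16string decomposed;
        iter.getDecomposition(decomposed);
        out.emplace_back(composed, decomposed);
    }
    return out;
}
```

Guarding the loop with hasNext rather than comparing the return value of next against DONE keeps the sentinel out of the collected results.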


The documentation for this class was generated from the following file:
compitr.h
Generated at Tue Dec 5 17:56:01 2000 for ICU by doxygen 1.2.3, written by Dimitri van Heesch, © 1997-2000