
TransliterationRule Class Reference

A transliteration rule used by RuleBasedTransliterator.

#include <rbt_rule.h>


Public Types

enum  { MISMATCH, PARTIAL_MATCH, FULL_MATCH }
 Constants returned by getMatchDegree() indicating the degree of match between the text and this rule.


Public Methods

 TransliterationRule (const UnicodeString& input, int32_t anteContextPos, int32_t postContextPos, const UnicodeString& output, int32_t cursorPos, int32_t cursorOffset, int32_t* adoptedSegs, UBool anchorStart, UBool anchorEnd, UErrorCode& status)
 Construct a new rule with the given input, output text, and other attributes.

 TransliterationRule (const UnicodeString& input, int32_t anteContextPos, int32_t postContextPos, const UnicodeString& output, int32_t cursorPos, UErrorCode& status)
 Construct a new rule with the given input, output text, and other attributes.

 TransliterationRule (TransliterationRule& other)
 Copy constructor.

virtual ~TransliterationRule ()
 Destructor.

virtual int32_t getCursorPos (void) const
 Return the position of the cursor within the output string.

virtual int32_t getAnteContextLength (void) const
 Return the preceding context length.

int16_t getIndexValue (const TransliterationRuleData& data) const
 Internal method.

int32_t replace (Replaceable& text, int32_t offset, const TransliterationRuleData& data) const
 Do a replacement of the input pattern with the output text in the given string, at the given offset.

UBool matchesIndexValue (uint8_t v, const TransliterationRuleData& data) const
 Internal method.

virtual UBool masks (const TransliterationRule& r2) const
 Return true if this rule masks another rule.

virtual UBool matches (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const
 Return true if this rule matches the given text.

virtual int32_t getMatchDegree (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const
 Return the degree of match between this rule and the given text.

virtual int32_t getRegionMatchLength (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const
 Return the number of characters of the text that match this rule.

virtual UBool charMatches (UChar keyChar, const Replaceable& textChar, int32_t index, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const
 Return true if the given key matches the given text.


Static Public Attributes

const UChar ETHER
 The character at index i, where i < contextStart || i >= contextLimit, is ETHER.


Detailed Description

A transliteration rule used by RuleBasedTransliterator.

TransliterationRule is an immutable object.

A rule consists of an input pattern and an output string. When the input pattern is matched, the output string is emitted. The input pattern consists of zero or more characters which are matched exactly (the key) and optional context. Context must match if it is specified. Context may be specified before the key, after the key, or both. The key, preceding context, and following context may contain variables. Variables represent a set of Unicode characters, such as the letters a through z. Variables are detected by looking up each character in a supplied variable list to see if it has been so defined.
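
The description above can be illustrated with a minimal sketch. It is not the ICU implementation: SimpleRule, VariableMap, patternCharMatches, and ruleMatchesAt are hypothetical names, and plain char strings stand in for UnicodeString. It shows a key with optional preceding and following context, and variables that are detected by looking each pattern character up in a supplied table.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the variable list: maps a placeholder
// character to the set of characters it represents.
using VariableMap = std::map<char, std::set<char>>;

// Hypothetical, simplified rule; not the layout of the ICU class.
struct SimpleRule {
    std::string ante;    // context that must precede the key
    std::string key;     // characters that are matched and replaced
    std::string post;    // context that must follow the key
    std::string output;  // text emitted when the rule matches
};

// True if pattern character p matches text character c. If p is defined
// in vars it is treated as a variable, otherwise it must match exactly.
// The two character arguments are not interchangeable.
bool patternCharMatches(char p, char c, const VariableMap& vars) {
    auto it = vars.find(p);
    if (it != vars.end()) return it->second.count(c) > 0;
    return p == c;
}

// True if the rule matches text with the key starting at keyStart.
bool ruleMatchesAt(const SimpleRule& r, const std::string& text,
                   std::size_t keyStart, const VariableMap& vars) {
    if (keyStart < r.ante.size()) return false;  // no room for ante context
    const std::string pattern = r.ante + r.key + r.post;
    const std::size_t begin = keyStart - r.ante.size();
    if (begin + pattern.size() > text.size()) return false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (!patternCharMatches(pattern[i], text[begin + i], vars)) return false;
    }
    return true;
}
```

With a variable '$' defined as the vowels and a rule whose ante context is "x", key is "$", and post context is "z", the sketch matches "xaz" at key position 1 but not "xbz".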

Author(s):
Alan Liu

Definition at line 36 of file rbt_rule.h.


Member Enumeration Documentation

anonymous enum
 

Constants returned by getMatchDegree() indicating the degree of match between the text and this rule.

See also:
getMatchDegree
Enumeration values:
MISMATCH   Constant returned by getMatchDegree() indicating a mismatch between the text and this rule.

One or more characters of the context or key do not match the text.

PARTIAL_MATCH   Constant returned by getMatchDegree() indicating a partial match between the text and this rule.

All characters of the text match the corresponding context or key, but more characters are required for a complete match. There are some key or context characters at the end of the pattern that remain unmatched because the text isn't long enough.

FULL_MATCH   Constant returned by getMatchDegree() indicating a complete match between the text and this rule.

The text matches all context and key characters.

Definition at line 45 of file rbt_rule.h.
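
As a rough illustration of the three constants, the sketch below classifies a literal pattern (context plus key, with no variables) against text starting at a given index. The MatchDegree type name and the matchDegree helper are assumptions made for this example; the real method takes a Replaceable, a UTransPosition, rule data, and a filter.

```cpp
#include <cassert>
#include <string>

// Mirrors the documented constant names; the enum type name is hypothetical.
enum MatchDegree { MISMATCH, PARTIAL_MATCH, FULL_MATCH };

// Hypothetical helper: compares a literal pattern with the text beginning
// at start. A simplified model, not the ICU implementation.
MatchDegree matchDegree(const std::string& pattern, const std::string& text,
                        std::size_t start) {
    assert(start <= text.size());
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        // The text ended before the pattern did: everything so far matched.
        if (start + i >= text.size()) return PARTIAL_MATCH;
        if (text[start + i] != pattern[i]) return MISMATCH;
    }
    return FULL_MATCH;
}
```

A partial match is the case that matters for incremental input: the caller can wait for more text rather than discard the rule.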


Constructor & Destructor Documentation

TransliterationRule::TransliterationRule ( const UnicodeString & input,
int32_t anteContextPos,
int32_t postContextPos,
const UnicodeString & output,
int32_t cursorPos,
int32_t cursorOffset,
int32_t * adoptedSegs,
UBool anchorStart,
UBool anchorEnd,
UErrorCode & status )
 

Construct a new rule with the given input, output text, and other attributes.

A cursor position may be specified for the output text.

Parameters:
input   input string, including key and optional ante and post context
anteContextPos   offset into input to end of ante context, or -1 if none. Must be <= input.length() if not -1.
postContextPos   offset into input to start of post context, or -1 if none. Must be <= input.length() if not -1, and must be >= anteContextPos.
output   output string
cursorPos   offset into output at which the cursor is located, or -1 if none. If less than zero, the cursor is placed after the output; that is, -1 is equivalent to output.length(). If greater than output.length(), construction fails and the error is reported through status.
cursorOffset   an offset to be added to cursorPos to position the cursor either in the ante context, if < 0, or in the post context, if > 0. For example, the rule "abc{def} > | @@@ xyz;" changes "def" to "xyz" and moves the cursor to before "a". It would have a cursorOffset of -3.
adoptedSegs   array of 2n integers. Each of the n pairs consists of an offset and a limit for a segment of the input string. Characters in the output string refer to these segments if they are in a special range determined by the associated TransliterationRuleData object. May be NULL if there are no segments.
anchorStart   TRUE if the rule is anchored on the left to the context start
anchorEnd   TRUE if the rule is anchored on the right to the context limit
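
The cursorPos and cursorOffset conventions can be sketched as follows. This is a simplified model under stated assumptions: CursorResult, normalizeCursor, and effectiveCursor are hypothetical names, and a boolean stands in for the UErrorCode that the real constructor uses to report failure.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type; the real constructor reports failure through
// its UErrorCode& status argument instead.
struct CursorResult {
    bool ok;   // false if cursorPos was out of range
    int pos;   // normalized cursor position within the output
};

// A negative cursorPos means "after the output"; a value greater than
// output.length() is rejected.
CursorResult normalizeCursor(int cursorPos, const std::string& output) {
    const int len = static_cast<int>(output.size());
    if (cursorPos < 0) return {true, len};
    if (cursorPos > len) return {false, 0};
    return {true, cursorPos};
}

// cursorOffset is added to the normalized position. A negative result
// points into the ante context, a result past the output length points
// into the post context.
int effectiveCursor(int normalizedPos, int cursorOffset) {
    return normalizedPos + cursorOffset;
}
```

For an output of "xyz" with the cursor at position 0 and a cursorOffset of -3, the effective position is -3, that is, three characters back into the ante context.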

TransliterationRule::TransliterationRule ( const UnicodeString & input,
int32_t anteContextPos,
int32_t postContextPos,
const UnicodeString & output,
int32_t cursorPos,
UErrorCode & status )
 

Construct a new rule with the given input, output text, and other attributes.

A cursor position may be specified for the output text.

Parameters:
input   input string, including key and optional ante and post context
anteContextPos   offset into input to end of ante context, or -1 if none. Must be <= input.length() if not -1.
postContextPos   offset into input to start of post context, or -1 if none. Must be <= input.length() if not -1, and must be >= anteContextPos.
output   output string
cursorPos   offset into output at which the cursor is located, or -1 if none. If less than zero, the cursor is placed after the output; that is, -1 is equivalent to output.length(). If greater than output.length(), construction fails and the error is reported through status.

TransliterationRule::TransliterationRule ( TransliterationRule & other )
 

Copy constructor.

TransliterationRule::~TransliterationRule ( ) [virtual]
 

Destructor.


Member Function Documentation

UBool TransliterationRule::charMatches ( UChar keyChar,
const Replaceable & textChar,
int32_t index,
const UTransPosition & pos,
const TransliterationRuleData & data,
const UnicodeFilter * filter ) const [virtual]
 

Return true if the given key matches the given text.

This method accounts for the fact that the key character may represent a character set. Note that the key and text characters may not be interchanged without altering the results.

Parameters:
keyChar   a character in the match key
textChar   a character in the text being transliterated
data   a dictionary of variables mapping Character to UnicodeSet
filter   the filter. Any character for which filter.isIn() returns false will not be altered by this transliterator. If filter is null then no filtering is applied.

int32_t TransliterationRule::getAnteContextLength ( void ) const [virtual]
 

Return the preceding context length.

This method is needed to support the Transliterator method getMaximumContextLength().

int32_t TransliterationRule::getCursorPos ( void ) const [virtual]
 

Return the position of the cursor within the output string.

Returns:
a value from 0 to getOutput().length(), inclusive.

int16_t TransliterationRule::getIndexValue ( const TransliterationRuleData & data ) const
 

Internal method.

Returns the 8-bit index value for this rule. This is the low byte of the first character of the key, unless the first character of the key is a set. If it is a set, or can otherwise match multiple keys, the index value is -1.
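
A minimal sketch of the index computation, assuming a hypothetical indexValue helper that receives the key and the collection of characters that stand for sets. Treating an empty key as "can match multiple keys" is an assumption of this example.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper, not the ICU implementation. setVariables holds the
// characters that represent sets rather than literal characters.
int indexValue(const std::u16string& key,
               const std::set<char16_t>& setVariables) {
    if (key.empty()) return -1;  // assumption: no key, so no single index
    const char16_t first = key[0];
    if (setVariables.count(first) > 0) return -1;  // first char is a set
    return first & 0xFF;  // low byte of the first key character
}
```

A caller could use the result to bucket rules by first character, keeping the rules that return -1 in every bucket.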

int32_t TransliterationRule::getMatchDegree ( const Replaceable & text,
const UTransPosition & pos,
const TransliterationRuleData & data,
const UnicodeFilter * filter ) const [virtual]
 

Return the degree of match between this rule and the given text.

The degree of match may be mismatch, a partial match, or a full match. A mismatch means at least one character of the text does not match the context or key. A partial match means some context and key characters match, but the text is not long enough to match all of them. A full match means all context and key characters match.

Parameters:
text   the text, both translated and untranslated
start   the beginning index, inclusive; 0 <= start <= limit.
limit   the ending index, exclusive; start <= limit <= text.length().
cursor   position at which to translate next, representing offset into text. This value must be between start and limit.
filter   the filter. Any character for which filter.isIn() returns false will not be altered by this transliterator. If filter is null then no filtering is applied.
Returns:
one of MISMATCH, PARTIAL_MATCH, or FULL_MATCH.
See also:
MISMATCH , PARTIAL_MATCH , FULL_MATCH

int32_t TransliterationRule::getRegionMatchLength ( const Replaceable & text,
const UTransPosition & pos,
const TransliterationRuleData & data,
const UnicodeFilter * filter ) const [virtual]
 

Return the number of characters of the text that match this rule.

If there is a mismatch, return -1. If the text is not long enough to match any characters, return 0.

Parameters:
text   the text, both translated and untranslated
start   the beginning index, inclusive; 0 <= start <= limit.
limit   the ending index, exclusive; start <= limit <= text.length().
cursor   position at which to translate next, representing offset into text. This value must be between start and limit.
data   a dictionary of variables mapping Character to UnicodeSet
filter   the filter. Any character for which filter.isIn() returns false will not be altered by this transliterator. If filter is null then no filtering is applied.
Returns:
-1 if there is a mismatch, 0 if the text is not long enough to match any characters, otherwise the number of characters of text that match this rule.
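
The return contract can be sketched with a literal pattern and an explicit start and limit. The regionMatchLength helper below is hypothetical and ignores variables, filters, and context handling.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns -1 on a mismatch, otherwise the number of
// text characters in [start, limit) that match the pattern. The result is
// 0 when the region is too short to match any character.
int regionMatchLength(const std::string& pattern, const std::string& text,
                      std::size_t start, std::size_t limit) {
    assert(start <= limit && limit <= text.size());
    int matched = 0;
    for (std::size_t i = 0; i < pattern.size() && start + i < limit; ++i) {
        if (text[start + i] != pattern[i]) return -1;
        ++matched;
    }
    return matched;
}
```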

UBool TransliterationRule::masks ( const TransliterationRule & r2 ) const [virtual]
 

Return true if this rule masks another rule.

If r1 masks r2 then r1 matches any input string that r2 matches. If r1 masks r2 and r2 masks r1 then r1 == r2. Examples: "a>x" masks "ab>y". "a>x" masks "a[b]>y". "[c]a>x" masks "[dc]a>y".
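
For rules made only of literal characters, the masking relation can be sketched as a suffix test on the preceding context and a prefix test on the key plus following context. This assumes that the bracketed text in the examples above denotes context; LiteralRule and masksLiteral are hypothetical names, and sets are not handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical literal-only rule: ante is the preceding context, rest is
// the key followed by any following context.
struct LiteralRule {
    std::string ante;
    std::string rest;
};

// True if r1 matches every input that r2 matches: r1 requires no more
// preceding context than r2 (suffix test) and no more key or following
// context than r2 (prefix test).
bool masksLiteral(const LiteralRule& r1, const LiteralRule& r2) {
    if (r1.ante.size() > r2.ante.size()) return false;
    const std::size_t tail = r2.ante.size() - r1.ante.size();
    if (r2.ante.compare(tail, r1.ante.size(), r1.ante) != 0) return false;
    return r2.rest.compare(0, r1.rest.size(), r1.rest) == 0;
}
```

Under this model, a rule with key "a" masks one with key "ab", and a rule with ante context "c" masks one with ante context "dc", but not the reverse.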

UBool TransliterationRule::matches ( const Replaceable & text,
const UTransPosition & pos,
const TransliterationRuleData & data,
const UnicodeFilter * filter ) const [virtual]
 

Return true if this rule matches the given text.

Parameters:
text   the text, both translated and untranslated
start   the beginning index, inclusive; 0 <= start <= limit.
limit   the ending index, exclusive; start <= limit <= text.length().
cursor   position at which to translate next, representing offset into text. This value must be between start and limit.
filter   the filter. Any character for which filter.isIn() returns false will not be altered by this transliterator. If filter is null then no filtering is applied.

UBool TransliterationRule::matchesIndexValue ( uint8_t v,
const TransliterationRuleData & data ) const
 

Internal method.

Returns true if this rule matches the given index value. The index value is an 8-bit integer, 0..255, representing the low byte of the first character of the key. It matches this rule if it matches the first character of the key, or if the first character of the key is a set, and the set contains any character with a low byte equal to the index value. If the rule contains only ante context, as in foo)>bar, then it will match any key.
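
The three cases described above (literal first character, set as first character, no key at all) can be sketched as follows. SetMap and matchesIndex are hypothetical names, and std::u16string stands in for the ICU string type.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the rule data: maps a placeholder character
// to the set of characters it represents.
using SetMap = std::map<char16_t, std::set<char16_t>>;

// True if a rule with the given key could match text whose first key
// character has low byte v. A simplified model, not the ICU implementation.
bool matchesIndex(const std::u16string& key, std::uint8_t v,
                  const SetMap& sets) {
    if (key.empty()) return true;  // only context: matches any index
    const char16_t first = key[0];
    auto it = sets.find(first);
    if (it == sets.end()) return (first & 0xFF) == v;  // literal character
    for (char16_t c : it->second) {                    // set: any member
        if ((c & 0xFF) == v) return true;
    }
    return false;
}
```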

int32_t TransliterationRule::replace ( Replaceable & text,
int32_t offset,
const TransliterationRuleData & data ) const
 

Do a replacement of the input pattern with the output text in the given string, at the given offset.

This method assumes that a match has already been found in the given text at the given position.

Parameters:
text   the text containing the substring to be replaced
offset   the offset into the text at which the pattern matches. This is the offset to the point after the ante context, if any, and before the match string and any post context.
data   the TransliterationRuleData object specifying context for this transliterator.
Returns:
the change in the length of the text
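
The return value can be illustrated with a small sketch on std::string. The replaceAt helper and its explicit keyLength parameter are assumptions made for this example; the real method takes the key length from the rule and the text as a Replaceable.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces keyLength characters at offset with output
// and returns the change in the length of the text. As with the documented
// method, a match at offset is assumed to have been found already.
int replaceAt(std::string& text, std::size_t offset, std::size_t keyLength,
              const std::string& output) {
    assert(offset + keyLength <= text.size());
    text.replace(offset, keyLength, output);
    return static_cast<int>(output.size()) - static_cast<int>(keyLength);
}
```

A caller that tracks a limit index into the text would add the returned delta to that limit after each replacement.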


Member Data Documentation

const UChar TransliterationRule::ETHER [static]
 

The character at index i, where i < contextStart || i >= contextLimit, is ETHER.

This allows explicit matching by rules and UnicodeSets of text outside the context. In traditional terms, this allows anchoring at the start and/or end.

Definition at line 79 of file rbt_rule.h.
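
A minimal sketch of the ETHER convention: any index outside the context region reads as a sentinel character, so a rule or set can match "outside the text" explicitly. The charAtOrEther helper is hypothetical, and the sentinel value used here is a placeholder chosen for the example.

```cpp
#include <cassert>
#include <string>

// Placeholder sentinel for this sketch; an assumption, not taken from the
// header documented here.
const char16_t ETHER_SKETCH = 0xFFFF;

// Returns the character at index i, or the sentinel if i lies outside
// [contextStart, contextLimit).
char16_t charAtOrEther(const std::u16string& text, int i,
                       int contextStart, int contextLimit) {
    assert(0 <= contextStart && contextStart <= contextLimit &&
           contextLimit <= static_cast<int>(text.size()));
    if (i < contextStart || i >= contextLimit) return ETHER_SKETCH;
    return text[static_cast<std::size_t>(i)];
}
```

An anchored rule can then be modeled as one whose pattern begins or ends with the sentinel.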


The documentation for this class was generated from the following file:
rbt_rule.h
Generated at Tue Dec 5 17:56:25 2000 for ICU by doxygen1.2.3 written by Dimitri van Heesch, © 1997-2000