A transliteration rule used by RuleBasedTransliterator.
#include <rbt_rule.h>
Public Types
enum { MISMATCH, PARTIAL_MATCH, FULL_MATCH }
Constants returned by getMatchDegree() indicating the degree of match between the text and this rule.
Public Methods
TransliterationRule (const UnicodeString& input, int32_t anteContextPos, int32_t postContextPos, const UnicodeString& output, int32_t cursorPos, int32_t cursorOffset, int32_t* adoptedSegs, UBool anchorStart, UBool anchorEnd, UErrorCode& status)
Construct a new rule with the given input, output text, and other attributes.
TransliterationRule (const UnicodeString& input, int32_t anteContextPos, int32_t postContextPos, const UnicodeString& output, int32_t cursorPos, UErrorCode& status)
Construct a new rule with the given input, output text, and other attributes.
TransliterationRule (TransliterationRule& other)
Copy constructor.
virtual ~TransliterationRule ()
Destructor.
virtual int32_t getCursorPos (void) const
Return the position of the cursor within the output string.
virtual int32_t getAnteContextLength (void) const
Return the preceding context length.
int16_t getIndexValue (const TransliterationRuleData& data) const
Internal method.
int32_t replace (Replaceable& text, int32_t offset, const TransliterationRuleData& data) const
Do a replacement of the input pattern with the output text in the given string, at the given offset.
UBool matchesIndexValue (uint8_t v, const TransliterationRuleData& data) const
Internal method.
virtual UBool masks (const TransliterationRule& r2) const
Return true if this rule masks another rule.
virtual UBool matches (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const
Return true if this rule matches the given text.
virtual int32_t getMatchDegree (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const
Return the degree of match between this rule and the given text.
virtual int32_t getRegionMatchLength (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const
Return the number of characters of the text that match this rule.
virtual UBool charMatches (UChar keyChar, const Replaceable& textChar, int32_t index, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const
Return true if the given key matches the given text.
Static Public Attributes
const UChar ETHER
The character at index i, where i < contextStart || i >= contextLimit, is ETHER.
A transliteration rule used by RuleBasedTransliterator. TransliterationRule is an immutable object.
A rule consists of an input pattern and an output string. When the input pattern is matched, the output string is emitted. The input pattern consists of zero or more characters which are matched exactly (the key) and optional context. Context must match if it is specified. Context may be specified before the key, after the key, or both. The key, preceding context, and following context may contain variables. Each variable represents a set of Unicode characters, such as the letters a through z. Variables are detected by looking up each character in a supplied variable list to see if it has been so defined.
Definition at line 36 of file rbt_rule.h.
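The structure described above can be illustrated with a minimal, self-contained sketch. It does not use the ICU classes: ToyRule, VariableTable, accepts() and matchesAt() are hypothetical names, and plain char and std::string stand in for UChar and UnicodeString.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the variable list: maps a placeholder
// character to the set of characters it represents.
using VariableTable = std::map<char, std::string>;

// Toy rule: optional context before and after an exactly matched key.
struct ToyRule {
    std::string anteContext;  // must precede the key if non-empty
    std::string key;          // the part that would be replaced
    std::string postContext;  // must follow the key if non-empty
    std::string output;       // emitted when the rule matches
};

// True if pattern character p accepts text character c. A pattern
// character found in the variable table stands for a whole set.
bool accepts(char p, char c, const VariableTable& vars) {
    auto it = vars.find(p);
    if (it != vars.end()) {
        return it->second.find(c) != std::string::npos;
    }
    return p == c;
}

// True if the rule matches text with the key starting at keyStart.
bool matchesAt(const ToyRule& r, const std::string& text,
               std::size_t keyStart, const VariableTable& vars) {
    assert(keyStart <= text.size());
    const std::string pattern = r.anteContext + r.key + r.postContext;
    if (keyStart < r.anteContext.size()) return false;  // no room for context
    const std::size_t start = keyStart - r.anteContext.size();
    if (start + pattern.size() > text.size()) return false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (!accepts(pattern[i], text[start + i], vars)) return false;
    }
    return true;
}
```

With '$' defined as the vowels "aeiou", a rule with preceding context "$" and key "t" matches the "t" in "late" but not the "t" in "lfte".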
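enum { MISMATCH, PARTIAL_MATCH, FULL_MATCH }
Constants returned by getMatchDegree() indicating the degree of match between the text and this rule.
Definition at line 45 of file rbt_rule.h.
TransliterationRule (const UnicodeString& input, int32_t anteContextPos, int32_t postContextPos, const UnicodeString& output, int32_t cursorPos, int32_t cursorOffset, int32_t* adoptedSegs, UBool anchorStart, UBool anchorEnd, UErrorCode& status)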
Construct a new rule with the given input, output text, and other attributes. A cursor position may be specified for the output text.
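TransliterationRule (const UnicodeString& input, int32_t anteContextPos, int32_t postContextPos, const UnicodeString& output, int32_t cursorPos, UErrorCode& status)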
Construct a new rule with the given input, output text, and other attributes. A cursor position may be specified for the output text.
TransliterationRule (TransliterationRule& other)
Copy constructor.
virtual ~TransliterationRule ()
Destructor.
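virtual UBool charMatches (UChar keyChar, const Replaceable& textChar, int32_t index, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const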
Return true if the given key matches the given text. This method accounts for the fact that the key character may represent a character set. Note that the key and text characters may not be interchanged without altering the results.
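The asymmetry between key and text characters can be shown with a small sketch. It is not the ICU implementation: SetTable and toyCharMatches() are hypothetical, and char stands in for UChar.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table: placeholder character -> characters in its set.
using SetTable = std::map<char, std::string>;

// keyChar comes from the rule and may name a set; textChar is literal text.
bool toyCharMatches(char keyChar, char textChar, const SetTable& sets) {
    auto it = sets.find(keyChar);
    if (it == sets.end()) return keyChar == textChar;  // plain character
    return it->second.find(textChar) != std::string::npos;
}

// Swapping the arguments changes the answer when a set is involved.
void charMatchesExample() {
    const SetTable sets{{'$', "abc"}};
    assert(toyCharMatches('$', 'b', sets));   // 'b' is in the set named by '$'
    assert(!toyCharMatches('b', '$', sets));  // literal 'b' is not '$'
}
```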
virtual int32_t getAnteContextLength (void) const
Return the preceding context length.
This method is needed to support the |
virtual int32_t getCursorPos (void) const
Return the position of the cursor within the output string.
int16_t getIndexValue (const TransliterationRuleData& data) const
Internal method. Returns the 8-bit index value for this rule. This is the low byte of the first character of the key, unless the first character of the key is a set. If it is a set, or the rule can otherwise match multiple keys, the index value is -1.
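A rough sketch of the index computation follows. SetVariables and toyIndexValue() are hypothetical, and treating an empty key as "can match multiple keys" is an assumption of this sketch, not something the class documentation spells out.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical: characters that stand for sets rather than for themselves.
using SetVariables = std::set<char16_t>;

// Returns the low byte of the first key character, or -1 if the key is
// empty or begins with a set (assumed here to cover "can match multiple keys").
int16_t toyIndexValue(const std::u16string& key, const SetVariables& sets) {
    if (key.empty() || sets.count(key[0]) != 0) return -1;
    return static_cast<int16_t>(key[0] & 0xFF);
}

void indexValueExample() {
    const SetVariables sets{u'$'};
    assert(toyIndexValue(u"ab", sets) == 0x61);       // low byte of U+0061
    assert(toyIndexValue(u"\u0431x", sets) == 0x31);  // low byte of U+0431
    assert(toyIndexValue(u"$a", sets) == -1);         // key starts with a set
}
```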
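virtual int32_t getMatchDegree (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const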
Return the degree of match between this rule and the given text. The degree of match may be mismatch, a partial match, or a full match. A mismatch means at least one character of the text does not match the context or key. A partial match means some context and key characters match, but the text is not long enough to match all of them. A full match means all context and key characters match.
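The three outcomes can be sketched over plain strings. ToyMatchDegree and toyMatchDegree() are hypothetical; the pattern here is the context and key concatenated, with no variables. The sketch assumes that running out of text counts as a partial match even when no character has been compared yet.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum ToyMatchDegree { MISMATCH, PARTIAL_MATCH, FULL_MATCH };

// Compare pattern against text beginning at index start.
ToyMatchDegree toyMatchDegree(const std::string& pattern,
                              const std::string& text, std::size_t start) {
    assert(start <= text.size());
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (start + i >= text.size()) return PARTIAL_MATCH;    // text ran out
        if (pattern[i] != text[start + i]) return MISMATCH;    // a character differs
    }
    return FULL_MATCH;  // every pattern character matched
}
```

For the pattern "abc" starting at index 1, the text "xabc" gives a full match, "xab" a partial match, and "xabd" a mismatch.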
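virtual int32_t getRegionMatchLength (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const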
Return the number of characters of the text that match this rule. If there is a mismatch, return -1. If the text is not long enough to match any characters, return 0.
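The return convention can be sketched as follows. toyRegionMatchLength() is a hypothetical helper over plain strings, with the pattern taken as context and key concatenated and no variables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of pattern characters matched from index start, -1 on mismatch,
// 0 if the text ends before any character can be compared.
int toyRegionMatchLength(const std::string& pattern,
                         const std::string& text, std::size_t start) {
    assert(start <= text.size());
    int matched = 0;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (start + i >= text.size()) break;               // text ran out
        if (pattern[i] != text[start + i]) return -1;      // mismatch
        ++matched;
    }
    return matched;
}
```

Under this convention a caller could distinguish a full match (result equals the pattern length) from a partial one (a smaller non-negative result).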
virtual UBool masks (const TransliterationRule& r2) const
Return true if this rule masks another rule. If r1 masks r2 then r1 matches any input string that r2 matches. If r1 masks r2 and r2 masks r1 then r1 == r2. Examples: "a>x" masks "ab>y". "a>x" masks "a[b]>y". "[c]a>x" masks "[dc]a>y".
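One way to read the examples is sketched below, for literal characters only. It assumes the bracketed parts in the examples denote context, so that r1 masks r2 when r1's preceding context is a suffix of r2's and r1's key plus following context is a prefix of r2's. ToyPattern and toyMasks() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical split of a rule's input pattern.
struct ToyPattern {
    std::string ante;  // preceding context
    std::string rest;  // key followed by any following context
};

// True if r1 matches every input that r2 matches (literal characters only).
bool toyMasks(const ToyPattern& r1, const ToyPattern& r2) {
    if (r1.ante.size() > r2.ante.size()) return false;
    if (r1.rest.size() > r2.rest.size()) return false;
    const std::size_t skip = r2.ante.size() - r1.ante.size();
    return r2.ante.compare(skip, std::string::npos, r1.ante) == 0 &&  // suffix
           r2.rest.compare(0, r1.rest.size(), r1.rest) == 0;          // prefix
}

void masksExample() {
    assert(toyMasks({"", "a"}, {"", "ab"}));    // "a>x" masks "ab>y"
    assert(toyMasks({"c", "a"}, {"dc", "a"}));  // "[c]a>x" masks "[dc]a>y"
    assert(!toyMasks({"", "ab"}, {"", "a"}));   // the longer key does not mask
}
```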
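virtual UBool matches (const Replaceable& text, const UTransPosition& pos, const TransliterationRuleData& data, const UnicodeFilter* filter) const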
Return true if this rule matches the given text.
UBool matchesIndexValue (uint8_t v, const TransliterationRuleData& data) const
Internal method. Returns true if this rule matches the given index value. The index value is an 8-bit integer, 0..255, representing the low byte of the first character of the key. It matches this rule if it matches the first character of the key, or if the first character of the key is a set, and the set contains any character with a low byte equal to the index value. If the rule contains only ante context, as in foo)>bar, then it will match any key.
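The test described above might look like the following sketch. SetMembers and toyMatchesIndexValue() are hypothetical, and an empty key is used to model a rule that has only context.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical table: placeholder character -> characters in its set.
using SetMembers = std::map<char16_t, std::u16string>;

bool toyMatchesIndexValue(uint8_t v, const std::u16string& key,
                          const SetMembers& sets) {
    if (key.empty()) return true;  // context-only rule: matches any key
    auto it = sets.find(key[0]);
    if (it == sets.end()) return (key[0] & 0xFF) == v;  // plain first character
    for (char16_t c : it->second) {
        if ((c & 0xFF) == v) return true;  // some set member has this low byte
    }
    return false;
}

void matchesIndexValueExample() {
    const SetMembers sets{{u'$', u"xy\u0431"}};
    assert(toyMatchesIndexValue(0x61, u"ab", sets));   // 'a' is U+0061
    assert(!toyMatchesIndexValue(0x62, u"ab", sets));
    assert(toyMatchesIndexValue(0x31, u"$a", sets));   // U+0431 has low byte 0x31
    assert(!toyMatchesIndexValue(0x61, u"$a", sets));
}
```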
int32_t replace (Replaceable& text, int32_t offset, const TransliterationRuleData& data) const
Do a replacement of the input pattern with the output text in the given string, at the given offset. This method assumes that a match has already been found in the given text at the given position.
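The "match first, then replace" contract can be sketched on a std::string. toyReplace() and its keyLength parameter are hypothetical; the sketch returns nothing and makes no claim about what the real method returns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace keyLength characters of text at offset with output.
// The caller is assumed to have already confirmed a match there.
void toyReplace(std::string& text, std::size_t offset,
                std::size_t keyLength, const std::string& output) {
    assert(offset + keyLength <= text.size());
    text.replace(offset, keyLength, output);
}
```

For example, replacing the three characters at offset 1 of "xabcx" with "Z" leaves "xZx".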
const UChar ETHER
The character at index i, where i < contextStart || i >= contextLimit, is ETHER. This allows explicit matching by rules and UnicodeSets of text outside the context. In traditional terms, this allows anchoring at the start and/or end.
Definition at line 79 of file rbt_rule.h.
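The sentinel idea can be sketched as follows. TOY_ETHER, charAtOrEther() and startsAtAnchor() are hypothetical, and the value 0xFFFF is chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sentinel; the value is an assumption of this sketch.
const char16_t TOY_ETHER = 0xFFFF;

// Character at index i, or TOY_ETHER outside [contextStart, contextLimit).
char16_t charAtOrEther(const std::u16string& text, int contextStart,
                       int contextLimit, int i) {
    assert(0 <= contextStart && contextStart <= contextLimit &&
           contextLimit <= static_cast<int>(text.size()));
    if (i < contextStart || i >= contextLimit) return TOY_ETHER;
    return text[static_cast<std::size_t>(i)];
}

// A start anchor is an ordinary match of TOY_ETHER just before pos.
bool startsAtAnchor(const std::u16string& text, int contextStart,
                    int contextLimit, int pos) {
    return charAtOrEther(text, contextStart, contextLimit, pos - 1) == TOY_ETHER;
}
```

Because positions outside the context read as the sentinel, anchoring needs no special case in the matching loop: it becomes one more character comparison.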