#include <tblcoll.h>
RuleBasedCollator inherits from Collator.
Public Methods
RuleBasedCollator(const UnicodeString& rules, UErrorCode& status)
RuleBasedCollator constructor.
RuleBasedCollator(const UnicodeString& rules, ECollationStrength collationStrength, UErrorCode& status)
RuleBasedCollator constructor.
RuleBasedCollator(const UnicodeString& rules, Normalizer::EMode decompositionMode, UErrorCode& status)
RuleBasedCollator constructor.
RuleBasedCollator(const UnicodeString& rules, ECollationStrength collationStrength, Normalizer::EMode decompositionMode, UErrorCode& status)
RuleBasedCollator constructor.
RuleBasedCollator(const RuleBasedCollator& other)
Copy constructor.
virtual ~RuleBasedCollator()
Destructor.
RuleBasedCollator& operator=(const RuleBasedCollator& other)
Assignment operator.
virtual UBool operator==(const Collator& other) const
Returns true if argument is the same as this object.
virtual UBool operator!=(const Collator& other) const
Returns true if argument is not the same as this object.
virtual Collator* clone(void) const
Makes a deep copy of the object.
virtual CollationElementIterator* createCollationElementIterator(const UnicodeString& source) const
Creates a collation element iterator for the source string.
virtual CollationElementIterator* createCollationElementIterator(const CharacterIterator& source) const
Creates a collation element iterator for the source.
virtual EComparisonResult compare(const UnicodeString& source, const UnicodeString& target) const
Compares a range of character data stored in two different strings based on the collation rules.
virtual EComparisonResult compare(const UnicodeString& source, const UnicodeString& target, int32_t length) const
Compares a range of character data stored in two different strings based on the collation rules, up to the specified length.
virtual EComparisonResult compare(const UChar* source, int32_t sourceLength, const UChar* target, int32_t targetLength) const
The comparison function compares the character data stored in two different string arrays.
virtual CollationKey& getCollationKey(const UnicodeString& source, CollationKey& key, UErrorCode& status) const
Transforms a specified region of the string into a series of characters that can be compared with CollationKey::compare.
virtual CollationKey& getCollationKey(const UChar* source, int32_t sourceLength, CollationKey& key, UErrorCode& status) const
Transforms a specified region of the string into a series of characters that can be compared with CollationKey::compare.
virtual int32_t hashCode(void) const
Generates the hash code for the rule-based collation object.
const UnicodeString& getRules(void) const
Gets the table-based rules for the collation object.
int32_t getMaxExpansion(int32_t order) const
Returns the maximum length of any expansion sequences that end with the specified comparison order.
virtual UClassID getDynamicClassID(void) const
Returns a unique class ID polymorphically.
uint8_t* cloneRuleData(int32_t& length, UErrorCode& status)
Returns the binary format of the class's rules.
UnicodeString getRules(UColRuleOption delta)
Returns current rules.
virtual void setAttribute(UColAttribute attr, UColAttributeValue value, UErrorCode& status)
Universal attribute setter.
virtual UColAttributeValue getAttribute(UColAttribute attr, UErrorCode& status)
Universal attribute getter.
virtual Collator* safeClone(void)
Thread-safe cloning operation.
virtual EComparisonResult compare(ForwardCharacterIterator& source, ForwardCharacterIterator& target)
String comparison that uses user-supplied character iteration.
virtual int32_t getSortKey(const UnicodeString& source, uint8_t* result, int32_t resultLength) const
Gets the sort key as an array of bytes from a UnicodeString.
virtual int32_t getSortKey(const UChar* source, int32_t sourceLength, uint8_t* result, int32_t resultLength) const
Gets the sort key as an array of bytes from a UChar buffer.
virtual ECollationStrength getStrength(void) const
Determines the minimum strength that will be used in comparison or transformation.
virtual void setStrength(ECollationStrength newStrength)
Sets the minimum strength to be used in comparison or transformation.
virtual void setDecomposition(Normalizer::EMode mode)
Sets the decomposition mode of the Collator object.
virtual Normalizer::EMode getDecomposition(void) const
Gets the decomposition mode of the Collator object.
Static Public Methods
static UClassID getStaticClassID(void)
Returns the class ID for this class.
Private Methods
RuleBasedCollator()
Default constructor.
RuleBasedCollator(UCollator* collator, UnicodeString* rule)
Constructor that takes in a UCollator struct.
RuleBasedCollator(const Locale& desiredLocale, UErrorCode& status)
RuleBasedCollator constructor.
void setUCollator(const Locale& locale, UErrorCode& status)
Creates the C struct for ucollator.
void setUCollator(const char* locale, UErrorCode& status)
Creates the C struct for ucollator.
void setUCollator(UCollator* collator)
Creates the C struct for ucollator.
Collator::EComparisonResult getEComparisonResult(const UCollationResult& result) const
Converts C's UCollationResult to EComparisonResult.
Collator::ECollationStrength getECollationStrength(const UCollationStrength& strength) const
Converts C's UCollationStrength to ECollationStrength.
UCollationStrength getUCollationStrength(const Collator::ECollationStrength& strength) const
Converts C++'s ECollationStrength to UCollationStrength.
Private Attributes
UBool dataIsOwned
UCollator* ucollator
C struct for collation.
UnicodeString* urulestring
Rule UnicodeString.
Static Private Attributes
const int32_t UNMAPPED
const int32_t CHARINDEX
const int32_t EXPANDCHARINDEX
const int32_t CONTRACTCHARINDEX
const int32_t PRIMARYORDERINCREMENT
const int32_t SECONDARYORDERINCREMENT
const int32_t TERTIARYORDERINCREMENT
const int32_t PRIMARYORDERMASK
const int32_t SECONDARYORDERMASK
const int32_t TERTIARYORDERMASK
const int32_t IGNORABLEMASK
const int32_t PRIMARYDIFFERENCEONLY
const int32_t SECONDARYDIFFERENCEONLY
const int32_t PRIMARYORDERSHIFT
const int32_t SECONDARYORDERSHIFT
const int32_t COLELEMENTSTART
const int32_t PRIMARYLOWZEROMASK
const int32_t RESETSECONDARYTERTIARY
const int32_t RESETTERTIARY
const int32_t PRIMIGNORABLE
const int16_t FILEID
const char* kFilenameSuffix
char fgClassID
Static class ID.
Friends
class RuleBasedCollatorStreamer
class CollationElementIterator
class Collator
The user can create a customized table-based collation.
RuleBasedCollator maps characters to collation keys.
Table collation has the following restrictions for efficiency (other subclasses may be used for more complex languages):
1. If French secondary ordering is specified in a collation object, it is applied to the whole object.
2. All Unicode characters not mentioned in the rules are placed at the end of the collation order.
3. Private use characters are treated as identical. The private use area in Unicode is 0xE000-0xF8FF.
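To illustrate what backward secondary ordering (restriction 1, and the '@' modifier) does, here is a toy model, not ICU code. It assumes a hypothetical ASCII encoding in which a lowercase letter may be followed by '\'' (acute) or '^' (circumflex), and it uses made-up weights; Element, toElements and compareFrench are hypothetical names. Primary weights are always compared front to back; only the secondary pass changes direction.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy collation element: a primary weight (the letter) and a secondary weight (the accent).
struct Element {
    int primary;
    int secondary;
};

// Hypothetical encoding used only in this sketch: a lowercase letter may be
// followed by '\'' (acute accent) or '^' (circumflex), so "cote'" stands for "cot" + e-acute.
std::vector<Element> toElements(const std::string& s) {
    std::vector<Element> out;
    for (char ch : s) {
        if (ch == '\'' && !out.empty()) {
            out.back().secondary = 1;   // acute on the previous letter
        } else if (ch == '^' && !out.empty()) {
            out.back().secondary = 2;   // circumflex on the previous letter
        } else {
            out.push_back(Element{ch - 'a' + 1, 0});
        }
    }
    return out;
}

// Returns -1, 0 or 1. Primary weights are compared front to back; on a tie,
// secondary weights are compared front to back, or back to front when 'french' is set.
int compareFrench(const std::string& a, const std::string& b, bool french) {
    const std::vector<Element> x = toElements(a);
    const std::vector<Element> y = toElements(b);
    const std::size_t n = std::min(x.size(), y.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (x[i].primary != y[i].primary) return x[i].primary < y[i].primary ? -1 : 1;
    }
    if (x.size() != y.size()) return x.size() < y.size() ? -1 : 1;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const std::size_t i = french ? x.size() - 1 - k : k;  // backwards for French
        if (x[i].secondary != y[i].secondary) return x[i].secondary < y[i].secondary ? -1 : 1;
    }
    return 0;
}
```

With french set, "co^te" (c-o-circumflex-t-e) sorts before "cote'" (c-o-t-e-acute), the traditional French order; with it cleared, the order of that pair flips.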
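The collation table is composed of a list of collation rules, where each rule takes one of three forms:
< modifier >
< relation > < text-argument >
< reset > < text-argument >
Text-argument: any sequence of characters other than whitespace and rule-syntax characters. To use such characters literally, enclose them in single quotes (for example, '&' for an ampersand).
Modifier: '@' indicates that secondary differences, such as accents, are sorted backwards, as in French.
Relation: '<' marks a primary difference (for example, a different letter), ';' a secondary difference (for example, an accent), ',' a tertiary difference (for example, case), and '=' marks equality.
Reset: '&' indicates that the next rule follows the position to where the reset text-argument would be sorted.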
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:
a < b < c
a < b & b < c
a < c & a < b
Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:
a < b & a < c
a < c & a < b
Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present (for example, "a < b & ae < e" is valid since "a" is present in the sequence before "ae" is reset). In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as though it contracts to a single character (expressed as "c < ch < d"), while in traditional German "ä" (a-umlaut) is treated as though it expands to two characters (expressed as "a & ae ; ä < b").
Ignorable Characters
For ignorable characters, the first rule must start with a relation (the examples used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all the text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, so that at primary strength a word such as "black-birds" compares as if the hyphen were not there. In the samples for different languages, you can see that most accents are ignorable.
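A minimal sketch of how ignorable characters drop out of a primary-strength comparison. It is a toy model with hypothetical weights: letters a-z (either case) get primary weights 1-26, and every other character gets 0 and is treated as ignorable, roughly as if it had been listed before the first '<'. primaryWeight and comparePrimary are hypothetical names, not ICU API.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical primary weights: letters a-z (either case) get 1-26; everything
// else gets 0, meaning "ignorable" in this toy.
int primaryWeight(char ch) {
    const unsigned char u = static_cast<unsigned char>(ch);
    if (std::isalpha(u)) return std::tolower(u) - 'a' + 1;
    return 0;
}

// Primary-strength comparison that skips ignorable characters; returns -1, 0 or 1.
int comparePrimary(const std::string& s, const std::string& t) {
    std::size_t i = 0, j = 0;
    for (;;) {
        while (i < s.size() && primaryWeight(s[i]) == 0) ++i;  // skip ignorables
        while (j < t.size() && primaryWeight(t[j]) == 0) ++j;
        const bool endS = (i == s.size());
        const bool endT = (j == t.size());
        if (endS || endT) return endS == endT ? 0 : (endS ? -1 : 1);
        const int ws = primaryWeight(s[i++]);
        const int wt = primaryWeight(t[j++]);
        if (ws != wt) return ws < wt ? -1 : 1;
    }
}
```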
Normalization and Accents
The Collator object automatically normalizes text internally to separate accents from base characters where possible. This is done both when processing the rules and when comparing two strings. Collator also uses the Unicode canonical mapping to ensure that combining sequences are sorted properly (for more information, see The Unicode Standard, Version 2.0).
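To make the normalization step concrete, here is a deliberately tiny sketch of canonical decomposition. It is not ICU's Normalizer: the hypothetical helper decompose covers only three precomposed letters, while real Unicode normalization covers far more characters and also reorders combining marks. It shows why a precomposed "é" (U+00E9) and the sequence "e" + U+0301 can be treated as the same text once both are decomposed.

```cpp
#include <string>

// Hypothetical, deliberately tiny canonical-decomposition table (three letters only).
// Real Unicode normalization (NFD) covers many more characters and also puts
// combining marks into canonical order.
std::u32string decompose(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        switch (c) {
            case U'\u00E9': out += U'e'; out += U'\u0301'; break;  // U+00E9 -> U+0065 U+0301
            case U'\u00E8': out += U'e'; out += U'\u0300'; break;  // U+00E8 -> U+0065 U+0300
            case U'\u00E4': out += U'a'; out += U'\u0308'; break;  // U+00E4 -> U+0061 U+0308
            default:        out += c;                      break;  // not in this toy table
        }
    }
    return out;
}
```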
Errors
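The following are errors:
1. A text-argument contains unquoted punctuation symbols (for example, "a < b-c < d").
2. A relation or reset character is not followed by a text-argument (for example, "a < , b").
3. A reset where the text-argument (or an initial substring of the text-argument) is not already in the sequence (for example, "a < b & e < f").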
Examples:
Simple: "< a < b < c < d"
Norwegian: "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J < k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T < u,U< v,V< w,W< x,X< y,Y< z,Z < å=a°,Å=A° ;aa,AA< æ,Æ< ø,Ø"
To create a table-based collation object, simply supply the collation rules to the RuleBasedCollator constructor. For example (where Simple and Norwegian are UnicodeString objects holding the example rules above):
    UErrorCode status = U_ZERO_ERROR;
    RuleBasedCollator *mySimple = new RuleBasedCollator(Simple, status);
Another example:
    UErrorCode status = U_ZERO_ERROR;
    RuleBasedCollator *myNorwegian = new RuleBasedCollator(Norwegian, status);
To add rules on top of an existing table, simply supply the original rules and modifications to the RuleBasedCollator constructor. For example:
Traditional Spanish (fragment):
... & C < ch , cH , Ch , CH ...
German (fragment):
...< y , Y < z , Z & AE, Ä & AE, ä & OE , Ö & OE, ö & UE , Ü & UE, ü
Symbols (fragment):
...< y, Y < z , Z & Question-mark ; '?' & Ampersand ; '&' & Dollar-sign ; '$'
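The fragments above rely on contractions (a two-character sequence such as "ch" acting as one collation element) and expansions (one character such as "ä" acting as several). The sketch below models both with hypothetical weights: letters get primary weights 10, 20, ..., "ch" gets 35 so that it falls between c and d, and U+00E4 expands to the elements of "a" and "e" with an extra secondary weight on the first. CE, collationElements and compareElements are hypothetical names; this is not how ICU stores its tables.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy collation element with a primary and a secondary weight.
struct CE {
    int primary;
    int secondary;
};

// Hypothetical table: a..z get primary weights 10, 20, ..., 260.
// "ch" is a contraction with weight 35 (between c = 30 and d = 40);
// U+00E4 expands to the elements of "a" and "e", the first one marked
// with secondary weight 1 so that "ae" sorts just before a-umlaut + "e" material.
std::vector<CE> collationElements(const std::u32string& s) {
    std::vector<CE> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t c = s[i];
        if (c == U'c' && i + 1 < s.size() && s[i + 1] == U'h') {
            out.push_back(CE{35, 0});  // contraction: two characters, one element
            ++i;
        } else if (c == U'\u00E4') {
            out.push_back(CE{10, 1});  // expansion: one character, two elements
            out.push_back(CE{50, 0});
        } else if (c >= U'a' && c <= U'z') {
            out.push_back(CE{static_cast<int>(c - U'a' + 1) * 10, 0});
        }
        // characters outside this table are ignored in the sketch
    }
    return out;
}

// Compares all primary weights first, then all secondary weights; returns -1, 0 or 1.
int compareElements(const std::u32string& a, const std::u32string& b) {
    const std::vector<CE> x = collationElements(a);
    const std::vector<CE> y = collationElements(b);
    for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
        if (x[i].primary != y[i].primary) return x[i].primary < y[i].primary ? -1 : 1;
    }
    if (x.size() != y.size()) return x.size() < y.size() ? -1 : 1;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i].secondary != y[i].secondary) return x[i].secondary < y[i].secondary ? -1 : 1;
    }
    return 0;
}
```

With these weights, "cuna" sorts before "chico" (traditional Spanish), and U+00E4 followed by "b" sorts after "aeb" but before "af" (German expansion).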
To create a collation object for traditional Spanish, the user can take the English collation rules and add the additional rules to the table. For example:
    UErrorCode status = U_ZERO_ERROR;
    UnicodeString rules(DEFAULTRULES);
    rules += "& C < ch, cH, Ch, CH";
    RuleBasedCollator *mySpanish = new RuleBasedCollator(rules, status);
To sort symbols in the same order as their alphabetic equivalents, you can do the following:
    UErrorCode status = U_ZERO_ERROR;
    UnicodeString rules(DEFAULTRULES);
    rules += "& Question-mark ; '?' & Ampersand ; '&' & Dollar-sign ; '$' ";
    RuleBasedCollator *myTable = new RuleBasedCollator(rules, status);
Another way of creating the table-based collation object, mySimple, is:
    UErrorCode status = U_ZERO_ERROR;
    RuleBasedCollator *mySimple = new RuleBasedCollator(" < a < b & b < c & c < d", status);
Or:
    UErrorCode status = U_ZERO_ERROR;
    RuleBasedCollator *mySimple = new RuleBasedCollator(" < a < b < d & b < c", status);
This works because " < a < b < c < d" is the same as "a < b < d & b < c" or "< a < b & b < c & c < d".
To combine collations from two locales (without error handling, for clarity):
    UErrorCode success = U_ZERO_ERROR;
    // Create an en_US Collator object
    Locale locale_en_US("en", "US", "");
    RuleBasedCollator* en_USCollator = (RuleBasedCollator*) Collator::createInstance(locale_en_US, success);
    // Create a da_DK Collator object
    Locale locale_da_DK("da", "DK", "");
    RuleBasedCollator* da_DKCollator = (RuleBasedCollator*) Collator::createInstance(locale_da_DK, success);
    // Combine the two
    // First, get the collation rules from en_USCollator
    UnicodeString rules = en_USCollator->getRules();
    // Second, get the collation rules from da_DKCollator
    rules += da_DKCollator->getRules();
    RuleBasedCollator* newCollator = new RuleBasedCollator(rules, success);
    // newCollator has the combined rules
A more interesting example is to modify an existing table to create a new collation object. For example, add "& C < ch, cH, Ch, CH" to the rules of en_USCollator to create your own English collation object:
    // Create a new Collator object with additional rules
    rules = en_USCollator->getRules();
    rules += "& C < ch, cH, Ch, CH";
    RuleBasedCollator* myCollator = new RuleBasedCollator(rules, success);
    // myCollator contains the new rules
The following example demonstrates how to change the order of non-spacing accents:
    UChar contents[] = {
        '=', 0x0301, ';', 0x0300, ';', 0x0302, ';', 0x0308, ';', 0x0327, ',', 0x0303,  // main accents
        ';', 0x0304, ';', 0x0305, ';', 0x0306,  // main accents
        ';', 0x0307, ';', 0x0309, ';', 0x030A,  // main accents
        ';', 0x030B, ';', 0x030C, ';', 0x030D,  // main accents
        ';', 0x030E, ';', 0x030F, ';', 0x0310,  // main accents
        ';', 0x0311, ';', 0x0312,               // main accents
        '<', 'a', ',', 'A', ';', 'a', 'e', ',', 'A', 'E', ';', 0x00e6, ',', 0x00c6,
        '<', 'b', ',', 'B', '<', 'c', ',', 'C', '<', 'e', ',', 'E', '&', 'C',
        '<', 'd', ',', 'D', 0 };
    UnicodeString oldRules(contents);
    UErrorCode status = U_ZERO_ERROR;
    // change the order of accent characters
    UChar addOn[] = { '&', ',', 0x0300, ';', 0x0308, ';', 0x0302, 0 };
    oldRules += addOn;
    RuleBasedCollator *myCollation = new RuleBasedCollator(oldRules, status);
The last example shows how to put new primary orderings before the default ones. For example, in Japanese collation, you can sort English characters either before or after Japanese characters:
    UErrorCode status = U_ZERO_ERROR;
    // get en_US collation rules
    RuleBasedCollator* en_USCollation = (RuleBasedCollator*) Collator::createInstance(Locale::US, status);
    // Always check the error code after each call.
    if (U_FAILURE(status)) return;
    // add a few Japanese characters to sort before English characters
    // suppose the last character before the first base letter 'a' in
    // the English collation rule is 0x2212
    UChar jaString[] = {'&', 0x2212, '<', 0x3041, ',', 0x3042, '<', 0x3043, ',', 0x3044, 0};
    UnicodeString rules(en_USCollation->getRules());
    rules += jaString;
    RuleBasedCollator *myJapaneseCollation = new RuleBasedCollator(rules, status);
NOTE: Typically, a collation object is created with Collator::createInstance().
Note: RuleBasedCollators with different Locale, CollationStrength and Decomposition mode settings will return different sort orders for the same set of strings. Locales have specific collation rules, and the way in which secondary and tertiary differences are taken into account, for example, will result in a different sorting order for the same strings.
Definition at line 355 of file tblcoll.h.
Constructor & Destructor Documentation
RuleBasedCollator::RuleBasedCollator(const UnicodeString& rules, UErrorCode& status)
RuleBasedCollator::RuleBasedCollator(const UnicodeString& rules, ECollationStrength collationStrength, UErrorCode& status)
RuleBasedCollator::RuleBasedCollator(const UnicodeString& rules, Normalizer::EMode decompositionMode, UErrorCode& status)
RuleBasedCollator::RuleBasedCollator(const UnicodeString& rules, ECollationStrength collationStrength, Normalizer::EMode decompositionMode, UErrorCode& status)
RuleBasedCollator constructor. This takes the table rules and builds a collation table out of them. See the RuleBasedCollator class description for more details on the collation rule syntax.
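RuleBasedCollator::RuleBasedCollator(const RuleBasedCollator& other)
Copy constructor.
virtual RuleBasedCollator::~RuleBasedCollator()
Destructor.
RuleBasedCollator::RuleBasedCollator() [private]
Default constructor.
RuleBasedCollator::RuleBasedCollator(UCollator* collator, UnicodeString* rule) [private]
Constructor that takes in a UCollator struct.
RuleBasedCollator::RuleBasedCollator(const Locale& desiredLocale, UErrorCode& status) [private]
RuleBasedCollator constructor. This constructor takes a locale. The only caller of this constructor should be Collator::createInstance(). If createInstance() happens to know that the requested locale's collation is implemented as a RuleBasedCollator, it can then call this constructor. Otherwise it should not, because this constructor always returns a valid collation table: it does so by falling back to defaults.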
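Member Function Documentation
virtual Collator* RuleBasedCollator::clone(void) const
Makes a deep copy of the object. The caller owns the returned object.
Reimplemented from Collator.
uint8_t* RuleBasedCollator::cloneRuleData(int32_t& length, UErrorCode& status)
Returns the binary format of the class's rules. The format is that of .col files.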
virtual EComparisonResult RuleBasedCollator::compare(ForwardCharacterIterator& source, ForwardCharacterIterator& target)
String comparison that uses user-supplied character iteration. The idea is to spare users from converting the whole string into UChars before comparing, since strings often differ within the first few characters.
Reimplemented from Collator.
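A sketch of the idea behind iterator-based comparison, assuming plain code-unit order stands in for collation weights (a real collator would first map each character to collation elements). It uses standard C++ iterators rather than ICU's ForwardCharacterIterator; compareIncremental and compareStrings are hypothetical names. Because it stops at the first difference, inputs that differ early are never read in full.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: compares two sources through caller-supplied forward
// iterators, one code unit at a time, and stops at the first difference.
// Plain code-unit order stands in for real collation weights here.
// 'consumed' reports how many positions were read from each source.
template <typename ItA, typename ItB>
int compareIncremental(ItA a, ItA aEnd, ItB b, ItB bEnd, std::size_t& consumed) {
    consumed = 0;
    for (; a != aEnd && b != bEnd; ++a, ++b) {
        ++consumed;
        if (*a != *b) return *a < *b ? -1 : 1;   // early exit on first difference
    }
    if (a == aEnd && b == bEnd) return 0;
    return a == aEnd ? -1 : 1;                   // the shorter source sorts first
}

// Convenience wrapper for std::string inputs.
int compareStrings(const std::string& x, const std::string& y, std::size_t& consumed) {
    return compareIncremental(x.begin(), x.end(), y.begin(), y.end(), consumed);
}
```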
virtual EComparisonResult RuleBasedCollator::compare(const UChar* source, int32_t sourceLength, const UChar* target, int32_t targetLength) const
The comparison function compares the character data stored in two different string arrays. Returns information about whether a string array is less than, greater than or equal to another string array. Example of use:
    UErrorCode status = U_ZERO_ERROR;
    Collator *myCollation = Collator::createInstance(Locale::US, status);
    if (U_FAILURE(status)) return;
    UChar abc[] = { 'a', 'b', 'c' };
    UChar ABC[] = { 'A', 'B', 'C' };
    myCollation->setStrength(Collator::PRIMARY);
    // result would be Collator::EQUAL ("abc" == "ABC")
    // (no primary difference between "abc" and "ABC")
    Collator::EComparisonResult result = myCollation->compare(abc, 3, ABC, 3);
    myCollation->setStrength(Collator::TERTIARY);
    // result would be Collator::LESS ("abc" <<< "ABC")
    // (with tertiary difference between "abc" and "ABC")
    result = myCollation->compare(abc, 3, ABC, 3);
    delete myCollation;
Reimplemented from Collator.
virtual EComparisonResult RuleBasedCollator::compare(const UnicodeString& source, const UnicodeString& target, int32_t length) const
Compares a range of character data stored in two different strings based on the collation rules, up to the specified length. Returns information about whether a string is less than, greater than or equal to another string in a language. This can be overridden in a subclass.
Reimplemented from Collator.
virtual EComparisonResult RuleBasedCollator::compare(const UnicodeString& source, const UnicodeString& target) const
Compares a range of character data stored in two different strings based on the collation rules. Returns information about whether a string is less than, greater than or equal to another string in a language. This can be overridden in a subclass.
Reimplemented from Collator.
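virtual CollationElementIterator* RuleBasedCollator::createCollationElementIterator(const CharacterIterator& source) const
Creates a collation element iterator for the source. The caller of this method is responsible for the memory management of the returned pointer.
virtual CollationElementIterator* RuleBasedCollator::createCollationElementIterator(const UnicodeString& source) const
Creates a collation element iterator for the source string. The caller of this method is responsible for the memory management of the returned pointer.
virtual UColAttributeValue RuleBasedCollator::getAttribute(UColAttribute attr, UErrorCode& status)
Universal attribute getter.
Reimplemented from Collator.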
virtual CollationKey& RuleBasedCollator::getCollationKey(const UChar* source, int32_t sourceLength, CollationKey& key, UErrorCode& status) const
virtual CollationKey& RuleBasedCollator::getCollationKey(const UnicodeString& source, CollationKey& key, UErrorCode& status) const
Transforms a specified region of the string into a series of characters that can be compared with CollationKey::compare. Use a CollationKey when you need to do repeated comparisons on the same string. For a single comparison, the compare method will be faster.
Reimplemented from Collator.
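A sketch of why precomputed keys pay off for repeated comparisons. It is a toy, not ICU's sort key format: it assumes letter-only ASCII input and writes one primary byte per character (the lowercased letter), a 0x01 level separator, then one case byte per character (0x05 lowercase, 0x06 uppercase). Because the separator is smaller than every weight, plain byte-wise comparison of two keys gives the same answer as a level-by-level comparison. makeSortKey and sortWithKeys are hypothetical names.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical two-level sort key for letter-only ASCII strings (not ICU's format):
// [primary bytes] 0x01 [case bytes]. The 0x01 separator is smaller than every weight,
// so a string whose primaries are a prefix of another's sorts first.
std::vector<std::uint8_t> makeSortKey(const std::string& s) {
    std::vector<std::uint8_t> key;
    for (char ch : s)  // level 1: primary weight = lowercased letter
        key.push_back(static_cast<std::uint8_t>(std::tolower(static_cast<unsigned char>(ch))));
    key.push_back(0x01);  // level separator
    for (char ch : s)  // level 2: case weight, lowercase before uppercase
        key.push_back(static_cast<std::uint8_t>(std::isupper(static_cast<unsigned char>(ch)) ? 0x06 : 0x05));
    return key;
}

// Builds each key once, then sorts by plain byte-wise key comparison.
std::vector<std::string> sortWithKeys(const std::vector<std::string>& words) {
    std::vector<std::pair<std::vector<std::uint8_t>, std::string>> keyed;
    for (const std::string& w : words) keyed.emplace_back(makeSortKey(w), w);
    std::sort(keyed.begin(), keyed.end());  // pairs order by key first
    std::vector<std::string> out;
    for (const auto& kv : keyed) out.push_back(kv.second);
    return out;
}
```

Each key is built once per string, so sorting n strings performs n key constructions instead of repeating collation work inside each of the O(n log n) comparisons.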
virtual Normalizer::EMode RuleBasedCollator::getDecomposition(void) const
Gets the decomposition mode of the Collator object.
Reimplemented from Collator.
virtual UClassID RuleBasedCollator::getDynamicClassID(void) const
Returns a unique class ID polymorphically. This overrides a pure virtual method. It implements a simple version of RTTI, since not all C++ compilers support genuine RTTI. Polymorphic operator==() and clone() methods call this method.
Reimplemented from Collator.
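Collator::ECollationStrength RuleBasedCollator::getECollationStrength(const UCollationStrength& strength) const [private]
Converts C's UCollationStrength to ECollationStrength.
Collator::EComparisonResult RuleBasedCollator::getEComparisonResult(const UCollationResult& result) const [private]
Converts C's UCollationResult to EComparisonResult.
int32_t RuleBasedCollator::getMaxExpansion(int32_t order) const
Returns the maximum length of any expansion sequences that end with the specified comparison order.
UnicodeString RuleBasedCollator::getRules(UColRuleOption delta)
Returns current rules. Delta defines whether the full rules are returned or just the tailoring.
const UnicodeString& RuleBasedCollator::getRules(void) const
Gets the table-based rules for the collation object.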
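virtual int32_t RuleBasedCollator::getSortKey(const UChar* source, int32_t sourceLength, uint8_t* result, int32_t resultLength) const
Gets the sort key as an array of bytes from a UChar buffer.
Reimplemented from Collator.
virtual int32_t RuleBasedCollator::getSortKey(const UnicodeString& source, uint8_t* result, int32_t resultLength) const
Gets the sort key as an array of bytes from a UnicodeString.
Reimplemented from Collator.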
static UClassID RuleBasedCollator::getStaticClassID(void)
Returns the class ID for this class. This is useful only for comparing to a return value from getDynamicClassID(). For example:
    Base* polymorphic_pointer = createPolymorphicObject();
    if (polymorphic_pointer->getDynamicClassID() == Derived::getStaticClassID()) ...
Definition at line 623 of file tblcoll.h. Referenced by getDynamicClassID().
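The RTTI substitute described for getDynamicClassID() and getStaticClassID() can be sketched as follows. Each class owns a private static char (compare the fgClassID attribute listed among the static private attributes), and the address of that char serves as the class ID. ClassID, Base and Derived are hypothetical names; ICU's own declarations may differ in detail.

```cpp
// Hypothetical names throughout; this mirrors the pattern, not ICU's exact declarations.
typedef void* ClassID;

class Base {
public:
    virtual ~Base() {}
    // One ID per class: the address of a class-private static char.
    static ClassID getStaticClassID() { return &fgClassID; }
    // Each subclass overrides this to report its own static ID.
    virtual ClassID getDynamicClassID() const { return getStaticClassID(); }
private:
    static char fgClassID;
};
char Base::fgClassID = 0;

class Derived : public Base {
public:
    static ClassID getStaticClassID() { return &fgClassID; }
    ClassID getDynamicClassID() const override { return getStaticClassID(); }
private:
    static char fgClassID;
};
char Derived::fgClassID = 0;
```

Only the addresses matter, and distinct static objects always have distinct addresses, so the comparison works without compiler RTTI support.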
virtual ECollationStrength RuleBasedCollator::getStrength(void) const
Determines the minimum strength that will be used in comparison or transformation.
For example, with strength == SECONDARY, tertiary differences are ignored; with strength == PRIMARY, both secondary and tertiary differences are ignored.
Reimplemented from Collator.
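A toy three-level model of what the strength setting means. It is an assumption-based sketch, not ICU: each character gets a primary weight (the lowercased letter), a secondary weight (1 when followed by a hypothetical '\'' accent marker, else 0) and a tertiary weight (1 for uppercase, else 0). Strength, Weights, weigh, weightAt and compareAt are hypothetical names. Levels are compared in order, and levels beyond the chosen strength are never examined.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Strength { Primary = 1, Secondary = 2, Tertiary = 3 };

// Toy weights per character: primary (letter), secondary (accent), tertiary (case).
struct Weights {
    int primary;
    int secondary;
    int tertiary;
};

// Hypothetical encoding: a letter may be followed by '\'' to mark an accent.
std::vector<Weights> weigh(const std::string& str) {
    std::vector<Weights> w;
    for (char ch : str) {
        if (ch == '\'') {
            if (!w.empty()) w.back().secondary = 1;  // accent on the previous letter
            continue;
        }
        const unsigned char u = static_cast<unsigned char>(ch);
        w.push_back(Weights{std::tolower(u), 0, std::isupper(u) ? 1 : 0});
    }
    return w;
}

int weightAt(const Weights& w, int level) {
    return level == 1 ? w.primary : level == 2 ? w.secondary : w.tertiary;
}

// Compares level by level, never looking past the requested strength; returns -1, 0 or 1.
int compareAt(const std::string& a, const std::string& b, Strength strength) {
    const std::vector<Weights> x = weigh(a);
    const std::vector<Weights> y = weigh(b);
    const int maxLevel = static_cast<int>(strength);
    for (int level = 1; level <= maxLevel; ++level) {
        for (std::size_t i = 0; i < x.size() && i < y.size(); ++i) {
            const int wx = weightAt(x[i], level);
            const int wy = weightAt(y[i], level);
            if (wx != wy) return wx < wy ? -1 : 1;
        }
        // equal prefix at the primary level: the shorter string sorts first
        if (level == 1 && x.size() != y.size()) return x.size() < y.size() ? -1 : 1;
    }
    return 0;
}
```

As in the compare() example for this class, "abc" and "ABC" are equal at primary and secondary strength, and "abc" sorts first at tertiary strength.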
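UCollationStrength RuleBasedCollator::getUCollationStrength(const Collator::ECollationStrength& strength) const [private]
Converts C++'s ECollationStrength to UCollationStrength.
virtual int32_t RuleBasedCollator::hashCode(void) const
Generates the hash code for the rule-based collation object.
Reimplemented from Collator.
virtual UBool RuleBasedCollator::operator!=(const Collator& other) const
Returns true if argument is not the same as this object.
Reimplemented from Collator.
RuleBasedCollator& RuleBasedCollator::operator=(const RuleBasedCollator& other)
Assignment operator.
virtual UBool RuleBasedCollator::operator==(const Collator& other) const
Returns true if argument is the same as this object.
Reimplemented from Collator.
virtual Collator* RuleBasedCollator::safeClone(void)
Thread-safe cloning operation.
Reimplemented from Collator.
virtual void RuleBasedCollator::setAttribute(UColAttribute attr, UColAttributeValue value, UErrorCode& status)
Universal attribute setter.
Reimplemented from Collator.
virtual void RuleBasedCollator::setDecomposition(Normalizer::EMode mode)
Sets the decomposition mode of the Collator object. success is set to U_ILLEGAL_ARGUMENT_ERROR if an error occurs.
Reimplemented from Collator.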
virtual void RuleBasedCollator::setStrength(ECollationStrength newStrength)
Sets the minimum strength to be used in comparison or transformation.
Example of use:
    UErrorCode status = U_ZERO_ERROR;
    Collator* myCollation = Collator::createInstance(Locale::US, status);
    if (U_FAILURE(status)) return;
    myCollation->setStrength(Collator::PRIMARY);
    // result will be "abc" == "ABC"
    // tertiary differences will be ignored
    Collator::EComparisonResult result = myCollation->compare("abc", "ABC");
Reimplemented from Collator.
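void RuleBasedCollator::setUCollator(UCollator* collator) [private]
void RuleBasedCollator::setUCollator(const char* locale, UErrorCode& status) [private]
void RuleBasedCollator::setUCollator(const Locale& locale, UErrorCode& status) [private]
Creates the C struct for ucollator.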
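Friends And Related Function Documentation
friend class CollationElementIterator
Used to iterate over collation elements in a character source.
friend class Collator
Collator only needs access to RuleBasedCollator(const Locale&, UErrorCode&).
friend class RuleBasedCollatorStreamer
Streamer used to read/write binary collation data files.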
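Member Data Documentation
static char RuleBasedCollator::fgClassID [private]
Static class ID.
UCollator* RuleBasedCollator::ucollator [private]
C struct for collation. All initialisation for it has to be done through setUCollator().
UnicodeString* RuleBasedCollator::urulestring [private]
Rule UnicodeString.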