com.ibm.icu4jni.text
Class RuleBasedCollator

java.lang.Object
  extended by com.ibm.icu4jni.text.Collator
      extended by com.ibm.icu4jni.text.RuleBasedCollator
All Implemented Interfaces:
java.lang.Cloneable

public final class RuleBasedCollator
extends Collator

Concrete implementation class for Collation.

The collation table is composed of a list of collation rules, where each rule takes one of three forms:

    < modifier >
    < relation > < text-argument >
    < reset > < text-argument >
 

RuleBasedCollator has the following restrictions for efficiency (other subclasses may be used for more complex languages):

  1. If a French secondary ordering is specified, it applies to the whole collator object.
  2. All non-mentioned Unicode characters are at the end of the collation order.
  3. If a character is not located in the RuleBasedCollator, the default Unicode Collation Algorithm (UCA) rule-based table is automatically searched as a backup.
The following demonstrates how to create your own collation rules:

This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:

 a < b < c
 a < b & b < c
 a < c & a < b
 
Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:
 a < b & a < c
 a < c & a < b
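
The effect of rule order can be checked with a short sketch. It assumes the JDK's java.text.RuleBasedCollator as a stand-in, since that class accepts the same "<" and "&" rule syntax; the ICU4JNI class itself is not exercised here and may differ in detail. The class name RuleOrderDemo and the helper order() are illustrative only.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class RuleOrderDemo {
    // Builds a collator from the given rules and returns -1, 0 or 1
    // for the comparison of left with right.
    // Assumption: java.text.RuleBasedCollator stands in for the ICU4JNI class.
    static int order(String rules, String left, String right) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            return Integer.signum(collator.compare(left, right));
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Equivalent rule sets: in all three, b sorts before c.
        System.out.println(order("< a < b < c", "b", "c"));
        System.out.println(order("< a < b & b < c", "b", "c"));
        System.out.println(order("< a < c & a < b", "b", "c"));
        // Not equivalent: c is placed immediately after a, ahead of b.
        System.out.println(order("< a < b & a < c", "b", "c"));
    }
}
```

Each rule string starts with a relation because the fragments shown above are shorthand; java.text rejects a rule that begins with a bare text-argument.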
 
Either the text-argument must already be present in the sequence, or some initial substring of the text-argument must be present (e.g. "a < b & ae < e" is valid since "a" is present in the sequence before "ae" is reset). In this latter case, "ae" is not entered and treated as a single character; instead, "e" is sorted as if it were expanded to two characters: "a" followed by an "e". This difference appears in natural languages: in traditional Spanish "ch" is treated as though it contracts to a single character (expressed as "c < ch < d"), while in traditional German a-umlaut is treated as though it expanded to two characters (expressed as "a,A < b,B ... & ae;\u00E4 & AE;\u00C4"). [\u00E4 and \u00C4 are the escape sequences for lowercase and uppercase a-umlaut.]
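
A minimal sketch of a contraction and an expansion follows. It again assumes java.text.RuleBasedCollator as a stand-in for the ICU4JNI class; the rule strings are cut-down illustrations rather than real Spanish or German tailorings, and the names ContractionExpansionDemo, spanish() and german() are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class ContractionExpansionDemo {
    // Assumption: java.text.RuleBasedCollator stands in for the ICU4JNI class.
    static RuleBasedCollator build(String rules, int strength) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            collator.setStrength(strength);
            return collator;
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    // Contraction: "ch" is a single collation element between c and d.
    static RuleBasedCollator spanish() {
        return build("< c < ch < d < z", Collator.TERTIARY);
    }

    // Expansion: a-umlaut is sorted as if it were "a" followed by "e".
    static RuleBasedCollator german(int strength) {
        return build("< a < b < e & ae ; \u00E4", strength);
    }

    public static void main(String[] args) {
        // "cz" sorts before "ch" because ch follows c as its own element.
        System.out.println(spanish().compare("cz", "ch") < 0);
        System.out.println(spanish().compare("ch", "d") < 0);
        // At primary strength a-umlaut and "ae" compare as equal.
        System.out.println(german(Collator.PRIMARY).compare("\u00E4", "ae") == 0);
        // a-umlaut still sorts before b, like "ae" does.
        System.out.println(german(Collator.TERTIARY).compare("\u00E4", "b") < 0);
    }
}
```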

Ignorable Characters

For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all the text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, as in the word "black-birds". In the samples for different languages, you see that most accents are ignorable.
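
The sketch below illustrates an ignorable hyphen. It assumes java.text.RuleBasedCollator as a stand-in for the ICU4JNI class. Unlike the rule shown above, java.text requires punctuation inside a rule to be single-quoted, so the hyphen is written as '-'. The class and method names are illustrative.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class IgnorableDemo {
    // Compares two strings at primary strength with a collator in which
    // the hyphen appears before the first "<" and is therefore ignorable.
    // Assumption: java.text.RuleBasedCollator stands in for the ICU4JNI class.
    static int comparePrimary(String left, String right) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(", '-' < a < b");
            collator.setStrength(Collator.PRIMARY);
            return collator.compare(left, right);
        } catch (ParseException e) {
            throw new IllegalStateException("rule string should be valid", e);
        }
    }

    public static void main(String[] args) {
        // The hyphen carries no primary weight, so both spellings match.
        System.out.println(comparePrimary("black-birds", "blackbirds") == 0);
        // Ordinary letters are still ordered by the rules.
        System.out.println(comparePrimary("a", "b") < 0);
    }
}
```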

Normalization and Accents

RuleBasedCollator automatically processes its rule table to include both pre-composed and combining-character versions of accented characters. Even if the provided rule string contains only base characters and separate combining accent characters, the pre-composed accented characters matching all canonical combinations of characters from the rule string will be entered in the table.

This allows you to use a RuleBasedCollator to compare accented strings even when the collator is set to NO_DECOMPOSITION. However, if the strings to be collated contain combining sequences that may not be in canonical order, you should set the collator to CANONICAL_DECOMPOSITION to enable sorting of combining sequences. For more information, see The Unicode Standard, Version 3.0.
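
As a hedged illustration of the decomposition setting, the sketch below compares a pre-composed e-acute with the sequence "e" plus combining acute. It assumes java.text.RuleBasedCollator and its CANONICAL_DECOMPOSITION constant as stand-ins for the ICU4JNI equivalents; the small rule string and the name DecompositionDemo are made up for the example.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class DecompositionDemo {
    // Compares two strings with canonical decomposition switched on.
    // Assumption: java.text.RuleBasedCollator stands in for the ICU4JNI class.
    static int compareDecomposed(String left, String right) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator("< a < e < z");
            collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
            return collator.compare(left, right);
        } catch (ParseException e) {
            throw new IllegalStateException("rule string should be valid", e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 and "e" + U+0301 decompose to the same sequence,
        // so they compare as equal once decomposition is enabled.
        System.out.println(compareDecomposed("\u00E9", "e\u0301") == 0);
    }
}
```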

Errors

The following are errors:

If you produce one of these errors, a RuleBasedCollator throws a ParseException.
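
A small sketch of handling the ParseException follows. It assumes java.text.RuleBasedCollator as a stand-in; the ICU4JNI constructors may reject a different set of rule strings. Only the empty rule string, which the constructor documentation names explicitly, is used as the failing case. RuleErrorDemo and isRejected() are hypothetical names.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class RuleErrorDemo {
    // Returns true if the constructor rejects the rules with a ParseException.
    // Assumption: java.text.RuleBasedCollator stands in for the ICU4JNI class.
    static boolean isRejected(String rules) {
        try {
            new RuleBasedCollator(rules);
            return false;
        } catch (ParseException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(""));        // empty rules are rejected
        System.out.println(isRejected("< a < b")); // a well-formed rule is accepted
    }
}
```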

Examples

Simple: "< a < b < c < d"

Norwegian: "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J < k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T < u,U< v,V< w,W< x,X< y,Y< z,Z < \u00E5=a\u030A,\u00C5=A\u030A ;aa,AA< \u00E6,\u00C6< \u00F8,\u00D8"

Normally, to create a rule-based Collator object, you will use Collator's factory method getInstance. However, to create a rule-based Collator object with specialized rules tailored to your needs, you construct the RuleBasedCollator with the rules contained in a String object. For example:

 String Simple = "< a < b < c < d";
 RuleBasedCollator mySimple = new RuleBasedCollator(Simple);
 
Or:
 String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J" +
                 "< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T" +
                 "< u,U< v,V< w,W< x,X< y,Y< z,Z" +
                 "< \u00E5=a\u030A,\u00C5=A\u030A" +
                 ";aa,AA< \u00E6,\u00C6< \u00F8,\u00D8";
 RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
 

Combining Collators is as simple as concatenating strings. Here's an example that combines two Collators from two different locales:

 // Create an en_US Collator object
 RuleBasedCollator en_USCollator = (RuleBasedCollator)
     Collator.getInstance(new Locale("en", "US", ""));
 // Create a da_DK Collator object
 RuleBasedCollator da_DKCollator = (RuleBasedCollator)
     Collator.getInstance(new Locale("da", "DK", ""));
 // Combine the two
 // First, get the collation rules from en_USCollator
 String en_USRules = en_USCollator.getRules();
 // Second, get the collation rules from da_DKCollator
 String da_DKRules = da_DKCollator.getRules();
 RuleBasedCollator newCollator =
     new RuleBasedCollator(en_USRules + da_DKRules);
 // newCollator has the combined rules
 

Another more interesting example would be to make changes on an existing table to create a new Collator object. For example, add "& C < ch, cH, Ch, CH" to the en_USCollator object to create your own:

 // Create a new Collator object with additional rules
 String addRules = "& C < ch, cH, Ch, CH";
 RuleBasedCollator myCollator =
     new RuleBasedCollator(en_USCollator.getRules() + addRules);
 // myCollator contains the new rules
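
The tailoring above can be tried out with the following sketch. It assumes the JDK's java.text classes as stand-ins for the ICU4JNI ones, and assumes that Collator.getInstance returns a RuleBasedCollator, which holds for the standard JDK implementation. TailoringDemo, base() and tailored() are illustrative names.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class TailoringDemo {
    // The untailored en_US collator.
    // Assumption: the JDK returns a RuleBasedCollator from getInstance.
    static RuleBasedCollator base() {
        return (RuleBasedCollator) Collator.getInstance(Locale.US);
    }

    // The en_US rules with "ch" added as a contraction after C.
    static RuleBasedCollator tailored() {
        try {
            return new RuleBasedCollator(base().getRules() + "& C < ch, cH, Ch, CH");
        } catch (ParseException e) {
            throw new IllegalStateException("tailored rules should be valid", e);
        }
    }

    public static void main(String[] args) {
        // Untailored: "ch" is just c followed by h, so it precedes "cz".
        System.out.println(base().compare("ch", "cz") < 0);
        // Tailored: "ch" is one element after C, so it follows "cz"
        // but still precedes "d".
        System.out.println(tailored().compare("ch", "cz") > 0);
        System.out.println(tailored().compare("ch", "d") < 0);
    }
}
```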
 

The following example demonstrates how to change the order of non-spacing accents:

 // old rule
 String oldRules = "=\u00A8;\u00AF;\u00BF"    // main accents Diaeresis 00A8, Macron 00AF
                                             // Acute 00BF
                 + "< a , A ; ae, AE ; \u00E6 , \u00C6"
                 + "< b , B < c, C < e, E & C < d, D";
 // change the order of accent characters
 String addOn = "& \u00BF;\u00AF;\u00A8"; // Acute 00BF, Macron 00AF, Diaeresis 00A8
 RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
 

The last example shows how to put a new primary ordering before the default setting. For example, in a Japanese Collator, you can sort English characters either before or after Japanese characters:

 // get en_US Collator rules
 RuleBasedCollator en_USCollator = 
                      (RuleBasedCollator)Collator.getInstance(Locale.US);
 // add a few Japanese characters to sort before English characters
 // suppose the last character before the first base letter 'a' in
 // the English collation rule is \u30A2
 String jaString = "& \\u30A2 , \\u30FC < \\u30C8";
 RuleBasedCollator myJapaneseCollator = new
     RuleBasedCollator(en_USCollator.getRules() + jaString);
 

Status:
Stable ICU 2.4.

Field Summary
 
Fields inherited from class com.ibm.icu4jni.text.Collator
CANONICAL_DECOMPOSITION, IDENTICAL, NO_DECOMPOSITION, PRIMARY, QUATERNARY, RESULT_DEFAULT, RESULT_EQUAL, RESULT_GREATER, RESULT_LESS, SECONDARY, TERTIARY
 
Constructor Summary
RuleBasedCollator(java.lang.String rules)
          RuleBasedCollator constructor.
RuleBasedCollator(java.lang.String rules, int strength)
          RuleBasedCollator constructor.
RuleBasedCollator(java.lang.String rules, int normalizationmode, int strength)
          RuleBasedCollator constructor.
 
Method Summary
 java.lang.Object clone()
          Makes a complete copy of the current object.
 int compare(java.lang.String source, java.lang.String target)
          The comparison function compares the character data stored in two different strings.
 boolean equals(java.lang.Object target)
          Checks if the argument object is equal to this object.
protected  void finalize()
          Garbage collection.
 int getAttribute(int type)
          Gets the attribute to be used in comparison or transformation.
 CollationElementIterator getCollationElementIterator(java.lang.String source)
          Create a CollationElementIterator object that will iterate over the elements in a string, using the collation rules defined in this RuleBasedCollator.
 CollationKey getCollationKey(java.lang.String source)
          Get the sort key as a CollationKey object from the argument string.
 int getDecomposition()
          Get the normalization mode for this object.
 java.lang.String getRules()
          Get the collation rules of this Collation object. The rules will follow the rule syntax.
 byte[] getSortKey(java.lang.String source)
          Get a sort key for the argument string. Sort keys may be compared using java.util.Arrays.equals.
 int getStrength()
          Determines the minimum strength that will be used in comparison or transformation.
 int hashCode()
          Returns a hash of this collation object. Note that this method is not complete; it only returns 0 at the moment.
 void setAttribute(int type, int value)
          Sets the attribute to be used in comparison or transformation.
 void setDecomposition(int decompositionmode)
          Sets the decomposition mode of the Collator object on or off.
 void setStrength(int strength)
          Sets the minimum strength to be used in comparison or transformation.
 
Methods inherited from class com.ibm.icu4jni.text.Collator
equals, getInstance, getInstance
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RuleBasedCollator

public RuleBasedCollator(java.lang.String rules)
                  throws java.text.ParseException
RuleBasedCollator constructor. This takes the table rules and builds a collation table out of them. Please see RuleBasedCollator class description for more details on the collation rule syntax.

Parameters:
rules - the collation rules to build the collation table from.
Throws:
java.text.ParseException - thrown if the rules are empty. A runtime error is raised if the collator cannot be created.
Status:
Stable ICU 2.4.

RuleBasedCollator

public RuleBasedCollator(java.lang.String rules,
                         int strength)
                  throws java.text.ParseException
RuleBasedCollator constructor. This takes the table rules and builds a collation table out of them. Please see RuleBasedCollator class description for more details on the collation rule syntax.

Parameters:
rules - the collation rules to build the collation table from.
strength - collation strength
Throws:
java.text.ParseException - thrown if the rules are empty. A runtime error is raised if the collator cannot be created.
See Also:
Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.QUATERNARY, Collator.IDENTICAL
Status:
Stable ICU 2.4.

RuleBasedCollator

public RuleBasedCollator(java.lang.String rules,
                         int normalizationmode,
                         int strength)
RuleBasedCollator constructor. This takes the table rules and builds a collation table out of them. Please see RuleBasedCollator class description for more details on the collation rule syntax.

Note the API change starting from release 2.4. Prior to release 2.4, the normalizationmode argument values came from the class com.ibm.icu4jni.text.Normalization. From 2.4 onwards, the valid normalizationmode arguments for this API are CollationAttribute.VALUE_ON and CollationAttribute.VALUE_OFF.

Parameters:
rules - the collation rules to build the collation table from.
normalizationmode - normalization mode
strength - collation strength
Throws:
java.lang.IllegalArgumentException - thrown when constructor error occurs
See Also:
Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.QUATERNARY, Collator.IDENTICAL, Collator.CANONICAL_DECOMPOSITION, Collator.NO_DECOMPOSITION
Status:
Stable ICU 2.4.
Method Detail

clone

public java.lang.Object clone()
Makes a complete copy of the current object.

Specified by:
clone in class Collator
Returns:
a copy of this object if the data was cloned successfully, otherwise null
Status:
Stable ICU 2.4.

compare

public int compare(java.lang.String source,
                   java.lang.String target)
The comparison function compares the character data stored in two different strings. Returns information about whether a string is less than, greater than or equal to another string.

Example of use:
 Collator myCollation = Collator.getInstance(Locale.US);
 myCollation.setStrength(CollationAttribute.VALUE_PRIMARY);
 // result would be Collator.RESULT_EQUAL ("abc" == "ABC")
 // (no primary difference between "abc" and "ABC")
 int result = myCollation.compare("abc", "ABC");
 myCollation.setStrength(CollationAttribute.VALUE_TERTIARY);
 // result would be Collator.RESULT_LESS ("abc" <<< "ABC")
 // (with tertiary difference between "abc" and "ABC")
 result = myCollation.compare("abc", "ABC");

Specified by:
compare in class Collator
Parameters:
source - The source string.
target - The target string.
Returns:
result of the comparison, Collator.RESULT_EQUAL, Collator.RESULT_GREATER or Collator.RESULT_LESS
Status:
Stable ICU 2.4.
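
For readers without ICU4JNI at hand, the same strength-dependent comparison can be sketched with the JDK's java.text.Collator. This is an assumption-laden stand-in: the JDK version returns a negative, zero or positive int rather than the RESULT_* constants listed above, and StrengthDemo and compareAt() are illustrative names.

```java
import java.text.Collator;
import java.util.Locale;

final class StrengthDemo {
    // Compares two strings with an en_US collator at the given strength.
    // Assumption: java.text.Collator stands in for the ICU4JNI class.
    static int compareAt(int strength, String left, String right) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(left, right);
    }

    public static void main(String[] args) {
        // Primary strength ignores case: "abc" and "ABC" are equal.
        System.out.println(compareAt(Collator.PRIMARY, "abc", "ABC") == 0);
        // Tertiary strength sees the case difference: "abc" sorts first.
        System.out.println(compareAt(Collator.TERTIARY, "abc", "ABC") < 0);
    }
}
```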

getDecomposition

public int getDecomposition()
Get the normalization mode for this object. The normalization mode influences how strings are compared.

Specified by:
getDecomposition in class Collator
Returns:
the decomposition mode
See Also:
Collator.CANONICAL_DECOMPOSITION, Collator.NO_DECOMPOSITION
Status:
Stable ICU 2.4.

setDecomposition

public void setDecomposition(int decompositionmode)

Sets the decomposition mode of the Collator object on or off. If the decomposition mode is set to on, strings are decomposed into NFD form where necessary before sorting.

Specified by:
setDecomposition in class Collator
Parameters:
decompositionmode - the new decomposition mode
See Also:
Collator.CANONICAL_DECOMPOSITION, Collator.NO_DECOMPOSITION
Status:
Stable ICU 2.4.

getStrength

public int getStrength()
Determines the minimum strength that will be used in comparison or transformation.

E.g. with strength == CollationAttribute.VALUE_SECONDARY, the tertiary difference is ignored.

E.g. with strength == PRIMARY, the secondary and tertiary differences are ignored.

Specified by:
getStrength in class Collator
Returns:
the current comparison level.
See Also:
Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.QUATERNARY, Collator.IDENTICAL
Status:
Stable ICU 2.4.

setStrength

public void setStrength(int strength)
Sets the minimum strength to be used in comparison or transformation.

Example of use:
 Collator myCollation = Collator.getInstance(Locale.US);
 myCollation.setStrength(Collator.PRIMARY);
 // result will be "abc" == "ABC"
 // tertiary differences will be ignored
 int result = myCollation.compare("abc", "ABC");

Specified by:
setStrength in class Collator
Parameters:
strength - the new comparison level.
Throws:
java.lang.IllegalArgumentException - when argument does not belong to any collation strength mode or error occurs while setting data.
See Also:
Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.QUATERNARY, Collator.IDENTICAL
Status:
Stable ICU 2.4.

setAttribute

public void setAttribute(int type,
                         int value)
Sets the attribute to be used in comparison or transformation.

Example of use:
 Collator myCollation = Collator.getInstance(Locale.US);
 myCollation.setAttribute(CollationAttribute.CASE_LEVEL,
                          CollationAttribute.VALUE_ON);
 int result = myCollation.compare("\u30C3\u30CF", "\u30C4\u30CF");
 // result will be Collator.RESULT_LESS.

Specified by:
setAttribute in class Collator
Parameters:
type - the attribute to be set from CollationAttribute
value - attribute value from CollationAttribute
Status:
Stable ICU 2.4.

getAttribute

public int getAttribute(int type)
Gets the attribute to be used in comparison or transformation.

Specified by:
getAttribute in class Collator
Parameters:
type - the attribute to be retrieved, from CollationAttribute
Returns:
attribute value from CollationAttribute
Status:
Stable ICU 2.4.

getCollationKey

public CollationKey getCollationKey(java.lang.String source)
Get the sort key as a CollationKey object from the argument string. To retrieve the sort key as a byte array, use getSortKey as shown below:

 Collator collator = Collator.getInstance();
 byte[] array = collator.getSortKey(source);

The resulting byte arrays are zero-terminated and can be compared using java.util.Arrays.equals().

Specified by:
getCollationKey in class Collator
Parameters:
source - string to be processed.
Returns:
the sort key
Status:
Stable ICU 2.4.
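
Collation keys pay off when the same strings are compared many times, for example while sorting. The sketch below assumes the JDK's java.text.CollationKey as a stand-in for the ICU4JNI class; its toByteArray() plays the role of the byte-array sort key described here. SortKeyDemo and its helpers are hypothetical names.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

final class SortKeyDemo {
    // Sorts words by precomputed collation keys instead of calling
    // compare() repeatedly.
    // Assumption: java.text.CollationKey stands in for the ICU4JNI class.
    static String[] sortByKey(Collator collator, String... words) {
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // CollationKey is Comparable
        String[] sorted = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    // Returns the raw bytes of the key for one string.
    static byte[] keyBytes(Collator collator, String text) {
        return collator.getCollationKey(text).toByteArray();
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        System.out.println(Arrays.toString(sortByKey(collator, "banana", "Apple", "cherry")));
        // Keys for the same string from the same collator are byte-for-byte equal.
        System.out.println(Arrays.equals(keyBytes(collator, "abc"), keyBytes(collator, "abc")));
    }
}
```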

getSortKey

public byte[] getSortKey(java.lang.String source)
Get a sort key for the argument string. Sort keys may be compared using java.util.Arrays.equals.

Parameters:
source - string for key to be generated
Returns:
sort key
Status:
Stable ICU 2.4.

getRules

public java.lang.String getRules()
Get the collation rules of this Collation object. The rules will follow the rule syntax.

Returns:
collation rules.
Status:
Stable ICU 2.4.

getCollationElementIterator

public CollationElementIterator getCollationElementIterator(java.lang.String source)
Create a CollationElementIterator object that will iterate over the elements in a string, using the collation rules defined in this RuleBasedCollator.

Parameters:
source - string to iterate over
Returns:
a CollationElementIterator over the source string
Throws:
java.lang.IllegalArgumentException - thrown when error occurs
Status:
Stable ICU 2.4.
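
The following sketch walks the collation elements of a string. It assumes the JDK's java.text.CollationElementIterator, with its next(), NULLORDER and primaryOrder members, as a stand-in; the ICU4JNI iterator may expose a different surface. ElementIteratorDemo and primaryOrders() are illustrative names.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

final class ElementIteratorDemo {
    // Returns the primary order of every collation element in text.
    // Assumption: java.text classes stand in for the ICU4JNI ones.
    static List<Integer> primaryOrders(String rules, String text) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            CollationElementIterator it = collator.getCollationElementIterator(text);
            List<Integer> orders = new ArrayList<>();
            int element;
            while ((element = it.next()) != CollationElementIterator.NULLORDER) {
                orders.add(CollationElementIterator.primaryOrder(element));
            }
            return orders;
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // One element per character; the orders follow the rule sequence.
        System.out.println(primaryOrders("< a < b < c", "cab"));
        // The contraction "ch" yields a single element, so "chd" has two.
        System.out.println(primaryOrders("< c < ch < d", "chd").size());
    }
}
```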

hashCode

public int hashCode()
Returns a hash of this collation object. Note that this method is not complete; it only returns 0 at the moment.

Specified by:
hashCode in class Collator
Returns:
hash of this collation object
Status:
Stable ICU 2.4.

equals

public boolean equals(java.lang.Object target)
Checks if the argument object is equal to this object.

Specified by:
equals in class Collator
Parameters:
target - object
Returns:
true if this object is equivalent to target, false otherwise
Status:
Stable ICU 2.4.

finalize

protected void finalize()
Garbage collection. Close C collator and reclaim memory.