com.ibm.icu4jni.text
Class Collator

java.lang.Object
  |
  +--com.ibm.icu4jni.text.Collator
All Implemented Interfaces:
java.lang.Cloneable
Direct Known Subclasses:
RuleBasedCollator

public abstract class Collator
extends java.lang.Object
implements java.lang.Cloneable

Abstract class handling locale-specific collation via JNI and ICU. Subclasses implement specific collation strategies. One subclass, com.ibm.icu4jni.text.RuleBasedCollator, is currently provided and is applicable to a wide set of languages. Other subclasses may be created to handle more specialized needs. You can use the static factory method, getInstance(), to obtain the appropriate Collator object for a given locale.

 // Compare two strings in the default locale
 Collator myCollator = Collator.getInstance();
 if (myCollator.compare("abc", "ABC") < 0) {
   System.out.println("abc is less than ABC");
 }
 else {
   System.out.println("abc is greater than or equal to ABC");
 }
 
You can set a Collator's strength property to determine the level of difference considered significant in comparisons. CollationAttribute provides five strengths: VALUE_PRIMARY, VALUE_SECONDARY, VALUE_TERTIARY, VALUE_QUATERNARY and VALUE_IDENTICAL. The exact assignment of strengths to language features is locale dependent. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ê" (Latin small letter e with circumflex) are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.

The following shows how both case and accents could be ignored for US English.

 //Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
   System.out.println("Strings are equivalent");
 }
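
The strength levels described above can be explored with a small experiment. The sketch below is a stand-in rather than the documented class: it assumes java.text.Collator (which this page describes as having a similar API) because it runs without the ICU native library, and the class name StrengthLevels and helper compareAt are hypothetical. Results under com.ibm.icu4jni.text.Collator may differ in detail.

```java
import java.text.Collator;
import java.util.Locale;

// Stand-in sketch: uses java.text.Collator, NOT com.ibm.icu4jni.text.Collator.
final class StrengthLevels {
    /** Compares a and b under Locale.US at the given strength; returns -1, 0 or 1. */
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return Integer.signum(collator.compare(a, b));
    }

    public static void main(String[] args) {
        // Pairs mirror the class description: base letter, accent, case, identical.
        String[][] pairs = { {"e", "f"}, {"e", "\u00EA"}, {"e", "E"}, {"e", "e"} };
        int[] strengths = { Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY };
        String[] names = { "PRIMARY", "SECONDARY", "TERTIARY" };
        for (String[] pair : pairs) {
            for (int i = 0; i < strengths.length; i++) {
                int result = compareAt(strengths[i], pair[0], pair[1]);
                System.out.println(names[i] + ": \"" + pair[0] + "\" vs \""
                        + pair[1] + "\" -> " + result);
            }
        }
    }
}
```

A result of 0 means the two strings are treated as equivalent at that strength; a difference only shows up once the strength is raised to the level at which the pair differs.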
 
For comparing Strings exactly once, the compare method provides the best performance. When sorting a list of Strings, however, it is generally necessary to compare each String multiple times. In this case, com.ibm.icu4jni.text.CollationKey provides better performance. The CollationKey class converts a String to a series of bits that can be compared bitwise against other CollationKeys. A CollationKey is created by a Collator object for a given String. Note: CollationKeys from different Collators cannot be compared.
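
As an illustration of the key-based approach, the sketch below computes one key per string up front and then sorts the keys. It is a stand-in that assumes java.text.Collator and java.text.CollationKey rather than the icu4jni classes; the class name KeySort and the helper sortWithKeys are hypothetical, and getSourceString() is the java.text accessor, not one confirmed for com.ibm.icu4jni.text.CollationKey.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Stand-in sketch: uses java.text classes, NOT the com.ibm.icu4jni.text ones.
final class KeySort {
    /** Sorts words by building each collation key once, then sorting the keys. */
    static List<String> sortWithKeys(List<String> words, Locale locale) {
        // All keys come from the same Collator, so they are comparable.
        Collator collator = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.size()];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = collator.getCollationKey(words.get(i));
        }
        // java.text.CollationKey implements Comparable, so a plain sort works.
        Arrays.sort(keys);
        List<String> sorted = new ArrayList<>(keys.length);
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortWithKeys(Arrays.asList("cherry", "Apple", "banana"), Locale.US));
    }
}
```

The collation rules are applied once per string here, instead of once per comparison as with repeated calls to compare.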

Considerations: 1) ErrorCode is not returned to the user; exceptions are thrown instead. 2) The API is similar to java.text.Collator.

Status:
Stable ICU 2.4.

Field Summary
static int CANONICAL_DECOMPOSITION
          Decomposition mode value.
static int IDENTICAL
           Smallest Collator strength value.
static int NO_DECOMPOSITION
          Decomposition mode value.
static int PRIMARY
          Strongest collator strength value.
static int QUATERNARY
          Fourth level collator strength value.
static int RESULT_DEFAULT
          accepted by most attributes
static int RESULT_EQUAL
          string a == string b
static int RESULT_GREATER
          string a > string b
static int RESULT_LESS
          string a < string b
static int SECONDARY
          Second level collator strength value.
static int TERTIARY
          Third level collator strength value.
 
Constructor Summary
Collator()
           
 
Method Summary
abstract  java.lang.Object clone()
          Makes a copy of the current object.
abstract  int compare(java.lang.String source, java.lang.String target)
          The comparison function compares the character data stored in two different strings.
abstract  boolean equals(java.lang.Object target)
          Checks if argument object is equal to this object.
 boolean equals(java.lang.String source, java.lang.String target)
          Locale dependent equality check for the argument strings.
abstract  int getAttribute(int type)
          Gets the attribute to be used in comparison or transformation.
abstract  CollationKey getCollationKey(java.lang.String source)
          Get the sort key as a CollationKey object from the argument string.
abstract  int getDecomposition()
          Get the decomposition mode of this Collator.
static Collator getInstance()
          Factory method to create an appropriate Collator which uses the default locale collation rules.
static Collator getInstance(java.util.Locale locale)
          Factory method to create an appropriate Collator which uses the argument locale collation rules.
abstract  int getStrength()
          Determines the minimum strength that will be used in comparison or transformation.
abstract  int hashCode()
          Returns a hash of this collation object.
abstract  void setAttribute(int type, int value)
          Sets the attribute to be used in comparison or transformation.
abstract  void setDecomposition(int mode)
          Set the normalization mode used in this object. The normalization mode influences how strings are compared.
abstract  void setStrength(int strength)
          Sets the minimum strength to be used in comparison or transformation.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PRIMARY

public static final int PRIMARY
Strongest collator strength value. Typically used to denote differences between base characters. See class documentation for more explanation.

See Also:
setStrength(int), getStrength(), Constant Field Values
Status:
Draft ICU 2.4.

SECONDARY

public static final int SECONDARY
Second level collator strength value. Accents in the characters are considered secondary differences. Other differences between letters can also be considered secondary differences, depending on the language. See class documentation for more explanation.

See Also:
setStrength(int), getStrength(), Constant Field Values
Status:
Draft ICU 2.4.

TERTIARY

public static final int TERTIARY
Third level collator strength value. Upper and lower case differences in characters are distinguished at this strength level. In addition, a variant of a letter differs from the base form on the tertiary level. See class documentation for more explanation.

See Also:
setStrength(int), getStrength(), Constant Field Values
Status:
Draft ICU 2.4.

QUATERNARY

public static final int QUATERNARY
Fourth level collator strength value. When punctuation is ignored (see Ignoring Punctuations in the user guide) at PRIMARY to TERTIARY strength, an additional strength level can be used to distinguish words with and without punctuation. See class documentation for more explanation.

See Also:
setStrength(int), getStrength(), Constant Field Values
Status:
Draft ICU 2.4.

IDENTICAL

public static final int IDENTICAL

Smallest Collator strength value. When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared only when no difference is found at the other levels. See class documentation for more explanation.

Note this value is different from the JDK's.

See Also:
Constant Field Values
Status:
Draft ICU 2.4.

NO_DECOMPOSITION

public static final int NO_DECOMPOSITION

Decomposition mode value. With NO_DECOMPOSITION set, Strings will not be decomposed for collation. This is the default decomposition setting unless otherwise specified by the locale used to create the Collator.

Note this value is different from the JDK's.

See Also:
CANONICAL_DECOMPOSITION, getDecomposition(), setDecomposition(int), Constant Field Values
Status:
Draft ICU 2.4.

CANONICAL_DECOMPOSITION

public static final int CANONICAL_DECOMPOSITION

Decomposition mode value. With CANONICAL_DECOMPOSITION set, characters that are canonical variants according to the Unicode standard will be decomposed for collation.

CANONICAL_DECOMPOSITION corresponds to Normalization Form D as described in Unicode Technical Report #15.

See Also:
NO_DECOMPOSITION, getDecomposition(), setDecomposition(int), Constant Field Values
Status:
Draft ICU 2.4.

RESULT_EQUAL

public static final int RESULT_EQUAL
string a == string b

See Also:
Constant Field Values
Status:
Stable ICU 2.4.

RESULT_GREATER

public static final int RESULT_GREATER
string a > string b

See Also:
Constant Field Values
Status:
Stable ICU 2.4.

RESULT_LESS

public static final int RESULT_LESS
string a < string b

See Also:
Constant Field Values
Status:
Stable ICU 2.4.

RESULT_DEFAULT

public static final int RESULT_DEFAULT
accepted by most attributes

See Also:
Constant Field Values
Status:
Stable ICU 2.4.

Constructor Detail

Collator

public Collator()

Method Detail

getInstance

public static Collator getInstance()
Factory method to create an appropriate Collator which uses the default locale collation rules. Current implementation createInstance() returns a RuleBasedCollator(Locale) instance. The RuleBasedCollator will be created in the following order,

Returns:
an instance of Collator
Status:
Stable ICU 2.4.

getInstance

public static Collator getInstance(java.util.Locale locale)
Factory method to create an appropriate Collator which uses the argument locale collation rules.
Current implementation createInstance() returns a RuleBasedCollator(Locale) instance. The RuleBasedCollator will be created in the following order,

Parameters:
locale - to be used for collation
Returns:
an instance of Collator
Status:
Stable ICU 2.4.

equals

public boolean equals(java.lang.String source,
                      java.lang.String target)
Locale dependent equality check for the argument strings.

Parameters:
source - string
target - string
Returns:
true if source is equivalent to target, false otherwise
Status:
Stable ICU 2.4.
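
To show how a locale-dependent equality check differs from String.equals, here is a minimal sketch. It assumes java.text.Collator as a stand-in, which offers an equals(String, String) method with the same shape; the class name EquivalenceCheck and the helper equivalent are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Stand-in sketch: uses java.text.Collator, NOT com.ibm.icu4jni.text.Collator.
final class EquivalenceCheck {
    /** Returns true if a and b are equivalent under Locale.US at the given strength. */
    static boolean equivalent(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        // Collation-based equality: true when compare(a, b) reports no difference.
        return collator.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println("String.equals:   " + "abc".equals("ABC"));
        System.out.println("PRIMARY equals:  " + equivalent("abc", "ABC", Collator.PRIMARY));
        System.out.println("TERTIARY equals: " + equivalent("abc", "ABC", Collator.TERTIARY));
    }
}
```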

equals

public abstract boolean equals(java.lang.Object target)
Checks if argument object is equal to this object.

Overrides:
equals in class java.lang.Object
Parameters:
target - object
Returns:
true if target is equal to this object, false otherwise
Status:
Stable ICU 2.4.

clone

public abstract java.lang.Object clone()
                                throws java.lang.CloneNotSupportedException
Makes a copy of the current object.

Overrides:
clone in class java.lang.Object
Returns:
a copy of this object
Throws:
java.lang.CloneNotSupportedException
Status:
Stable ICU 2.4.
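
A typical reason to clone a collator is to vary one setting without disturbing the original. The sketch below assumes java.text.Collator as a stand-in; its clone() does not declare a checked exception, whereas the method documented here declares java.lang.CloneNotSupportedException, so a call on the documented class would need a try/catch or a throws clause. The class name CloneDemo and its helper are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

// Stand-in sketch: uses java.text.Collator, NOT com.ibm.icu4jni.text.Collator.
final class CloneDemo {
    /** Returns { original strength, copy strength } after changing only the copy. */
    static int[] strengthsAfterCloneChange() {
        Collator original = Collator.getInstance(Locale.US);
        original.setStrength(Collator.TERTIARY);

        // clone() returns Object, so a cast back to Collator is required.
        Collator copy = (Collator) original.clone();
        copy.setStrength(Collator.PRIMARY);

        return new int[] { original.getStrength(), copy.getStrength() };
    }

    public static void main(String[] args) {
        int[] strengths = strengthsAfterCloneChange();
        System.out.println("original=" + strengths[0] + ", copy=" + strengths[1]);
    }
}
```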

compare

public abstract int compare(java.lang.String source,
                            java.lang.String target)
The comparison function compares the character data stored in two different strings. Returns information about whether a string is less than, greater than or equal to another string.

Example of use:

 Collator myCollation = Collator.getInstance(Locale.US);
 myCollation.setStrength(CollationAttribute.VALUE_PRIMARY);
 // result would be Collator.RESULT_EQUAL ("abc" == "ABC")
 // (no primary difference between "abc" and "ABC")
 int result = myCollation.compare("abc", "ABC");
 myCollation.setStrength(CollationAttribute.VALUE_TERTIARY);
 // result would be Collator.RESULT_LESS ("abc" <<< "ABC")
 // (with tertiary difference between "abc" and "ABC")
 result = myCollation.compare("abc", "ABC");
 

Parameters:
source - source string.
target - target string.
Returns:
result of the comparison, Collator.RESULT_EQUAL, Collator.RESULT_GREATER or Collator.RESULT_LESS
Status:
Stable ICU 2.4.
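
Because compare takes two strings and returns an int, it can be adapted to a java.util.Comparator with a lambda. The sketch below assumes java.text.Collator as a stand-in for the class documented here, and the class name CompareSort and its helper are hypothetical. It contrasts collation order with the code-unit order of String.compareTo.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Stand-in sketch: uses java.text.Collator, NOT com.ibm.icu4jni.text.Collator.
final class CompareSort {
    /** Returns a copy of words sorted by the Locale.US collation order. */
    static List<String> sortWithCollator(List<String> words) {
        Collator collator = Collator.getInstance(Locale.US);
        List<String> sorted = new ArrayList<>(words);
        // Adapt compare(String, String) to a Comparator<String>.
        sorted.sort((a, b) -> collator.compare(a, b));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");

        List<String> natural = new ArrayList<>(words);
        natural.sort(null); // natural String order: uppercase sorts before lowercase
        System.out.println("String order:   " + natural);
        System.out.println("Collator order: " + sortWithCollator(words));
    }
}
```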

getDecomposition

public abstract int getDecomposition()
Get the decomposition mode of this Collator.

Returns:
the decomposition mode
See Also:
CANONICAL_DECOMPOSITION, NO_DECOMPOSITION
Status:
Draft ICU 2.4.

setDecomposition

public abstract void setDecomposition(int mode)
Set the normalization mode used in this object. The normalization mode influences how strings are compared.

Parameters:
mode - desired normalization mode
See Also:
CANONICAL_DECOMPOSITION, NO_DECOMPOSITION
Status:
Draft ICU 2.4.
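
One way to see the effect of the decomposition mode is to compare a precomposed character with its decomposed spelling. The sketch below assumes java.text.Collator as a stand-in (its constant values differ from those documented here, as noted in the field details), and the class name DecompositionDemo and its helper are hypothetical. It compares U+00E9 with "e" followed by the combining acute accent U+0301.

```java
import java.text.Collator;
import java.util.Locale;

// Stand-in sketch: uses java.text.Collator, NOT com.ibm.icu4jni.text.Collator.
final class DecompositionDemo {
    /** Compares a and b under Locale.US with canonical decomposition enabled. */
    static boolean canonicallyEqual(String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String precomposed = "caf\u00E9";  // single code point for e-acute
        String decomposed = "cafe\u0301";  // "e" plus combining acute accent
        System.out.println("String.equals:    " + precomposed.equals(decomposed));
        System.out.println("canonicallyEqual: " + canonicallyEqual(precomposed, decomposed));
    }
}
```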

getStrength

public abstract int getStrength()
Determines the minimum strength that will be used in comparison or transformation.

E.g. with strength == SECONDARY, the tertiary difference is ignored.

E.g. with strength == PRIMARY, the secondary and tertiary differences are ignored.

Returns:
the current comparison level.
See Also:
PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL
Status:
Draft ICU 2.4.

getAttribute

public abstract int getAttribute(int type)
Gets the attribute to be used in comparison or transformation.

Parameters:
type - the attribute to be retrieved from CollationAttribute
Returns:
attribute value from CollationAttribute
Status:
Stable ICU 2.4.

setStrength

public abstract void setStrength(int strength)
Sets the minimum strength to be used in comparison or transformation.

Example of use:

 Collator myCollation = Collator.getInstance(Locale.US);
 myCollation.setStrength(Collator.PRIMARY);
 // result will be "abc" == "ABC"
 // tertiary differences will be ignored
 int result = myCollation.compare("abc", "ABC");
 

Parameters:
strength - the new comparison level.
See Also:
PRIMARY, SECONDARY, TERTIARY, QUATERNARY, IDENTICAL
Status:
Draft ICU 2.4.

setAttribute

public abstract void setAttribute(int type,
                                  int value)
Sets the attribute to be used in comparison or transformation.

Example of use:

 Collator myCollation = Collator.getInstance(Locale.US);
 myCollation.setAttribute(CollationAttribute.CASE_LEVEL,
                          CollationAttribute.VALUE_ON);
 int result = myCollation.compare("\u30C3\u30CF",
                                  "\u30C4\u30CF");
 // result will be Collator.RESULT_LESS.
 

Parameters:
type - the attribute to be set from CollationAttribute
value - attribute value from CollationAttribute
Status:
Stable ICU 2.4.

getCollationKey

public abstract CollationKey getCollationKey(java.lang.String source)
Get the sort key as a CollationKey object from the argument string. To retrieve the sort key as a byte array, use the method as follows:

 Collator collator = Collator.getInstance();
 CollationKey collationkey = collator.getCollationKey("string");
 byte[] array = collationkey.toByteArray();

The resulting byte arrays are zero-terminated and can be compared using java.util.Arrays.equals().

Parameters:
source - string to be processed.
Returns:
the sort key
Status:
Stable ICU 2.4.
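
The byte-array comparison described above can be sketched as follows. This is a stand-in that assumes java.text.Collator and java.text.CollationKey, whose key encoding is not the same as ICU's; the class name KeyBytes and the helper sameSortKey are hypothetical. Both keys are produced by the same collator, since keys from different collators are not comparable.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Stand-in sketch: uses java.text classes, NOT the com.ibm.icu4jni.text ones.
final class KeyBytes {
    /** Returns true if a and b yield byte-identical sort keys at the given strength. */
    static boolean sameSortKey(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        // Both keys come from the same collator with the same settings.
        byte[] keyA = collator.getCollationKey(a).toByteArray();
        byte[] keyB = collator.getCollationKey(b).toByteArray();
        return Arrays.equals(keyA, keyB);
    }

    public static void main(String[] args) {
        System.out.println("PRIMARY:  " + sameSortKey("abc", "ABC", Collator.PRIMARY));
        System.out.println("TERTIARY: " + sameSortKey("abc", "ABC", Collator.TERTIARY));
    }
}
```

The strength in effect when a key is generated determines which differences are encoded in it, so keys should be regenerated after changing the strength.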

hashCode

public abstract int hashCode()
Returns a hash of this collation object.

Overrides:
hashCode in class java.lang.Object
Returns:
hash of this collation object
Status:
Stable ICU 2.4.