com.ibm.icu.lang
Class UCharacter

java.lang.Object
  |
  +--com.ibm.icu.lang.UCharacter

public final class UCharacter
extends java.lang.Object

The UCharacter class provides extensions to the java.lang.Character class. These extensions provide support for Unicode 3.1 properties and, together with the UTF16 class, support for supplementary characters (those with code points above U+FFFF).

Code points are represented in this API using ints. While it would be more convenient in Java to have a separate primitive datatype for them, ints suffice in the meantime.
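
As a rough illustration of code points carried in ints, the sketch below walks a UTF-16 string one code point at a time. It uses only java.lang methods (String.codePointAt, Character.charCount) as stand-ins; it does not call UCharacter or UTF16, and the class and method names are hypothetical.

```java
final class CodePointIntSketch {
    /** Counts the code points in a UTF-16 string, reading each one as an int. */
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // the code point is held in an int
            i += Character.charCount(cp);   // 1 char for BMP, 2 for supplementary
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00b";             // 'a', U+10000, 'b'
        System.out.println(s.length());          // 4 UTF-16 code units
        System.out.println(countCodePoints(s));  // 3 code points
    }
}
```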

To use this class, add the jar file icu4j.jar to the class path, since it contains the data files that supply the information used by this class.
E.g. in Windows
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/icu4j.jar
Alternatively, copy the files uprops.dat and unames.dat from the icu4j source subdirectory $ICU4J_SRC/src/com.ibm.icu.impl.data to your class directory $ICU4J_CLASS/com.ibm.icu.impl.data.

Aside from the additions for UTF-16 support, and the updated Unicode 3.1 properties, the main differences between UCharacter and Character are:

Further detailed differences can be determined from the program com.ibm.icu.dev.test.lang.UCharacterCompare.

Since:
Oct 06 2000
Author:
Syn Wee Quek
See Also:
UCharacterCategory, UCharacterDirection

Field Summary
static int MAX_VALUE
          The highest Unicode code point value (scalar value) according to the Unicode Standard.
static int MIN_VALUE
          The lowest Unicode code point value.
protected static com.ibm.icu.lang.UCharacterName NAME_
          Database storing the sets of character names
static int REPLACEMENT_CHAR
          Unicode value used when translating into Unicode encoding form and there is no existing character.
static int SUPPLEMENTARY_MIN_VALUE
          The minimum value for Supplementary code points
 
Method Summary
static int digit(int ch)
          Retrieves the numeric value of a decimal digit code point.
static int digit(int ch, int radix)
          Retrieves the numeric value of a decimal digit code point.
static int foldCase(int ch, boolean defaultmapping)
          The given character is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if the character has no case folding equivalent, the character itself is returned.
static java.lang.String foldCase(java.lang.String str, boolean defaultmapping)
          The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned.
static VersionInfo getAge(int ch)
          Get the "age" of the code point.
static int getCharFromExtendedName(java.lang.String name)
          Find a Unicode character by either its name or its extended name and return its code point value.
static int getCharFromName(java.lang.String name)
          Find a Unicode code point by its most current Unicode name and return its code point value.
static int getCharFromName1_0(java.lang.String name)
          Find a Unicode character by its version 1.0 Unicode name and return its code point value.
static int getCodePoint(char char16)
          Returns the code point corresponding to the UTF16 character.
static int getCodePoint(char lead, char trail)
          Returns a code point corresponding to the two UTF16 characters.
static int getCombiningClass(int ch)
          Gets the combining class of the argument codepoint
static int getDirection(int ch)
          Returns the bidirectional property of a code point.
static java.lang.String getExtendedName(int ch)
          Retrieves a name for a valid codepoint.
static ValueIterator getExtendedNameIterator()
          Gets an iterator for character names, iterating over codepoints.
static int getHanNumericValue(int ch)
          Return numeric value of Han code points.
static int getMirror(int ch)
          Maps the specified code point to a "mirror-image" code point.
static java.lang.String getName(int ch)
          Retrieve the most current Unicode name of the argument code point, or null if the character is unassigned or outside the range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
static java.lang.String getName1_0(int ch)
          Retrieve the earlier version 1.0 Unicode name of the argument code point, or null if the character is unassigned or outside the range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
static ValueIterator getName1_0Iterator()
          Gets an iterator for character names, iterating over codepoints.
static ValueIterator getNameIterator()
          Gets an iterator for character names, iterating over codepoints.
static int getNumericValue(int ch)
          Returns the numeric value of the code point as a nonnegative integer.
static int getType(int ch)
          Returns a value indicating a code point's Unicode category.
static RangeValueIterator getTypeIterator()
          Gets an iterator for character types, iterating over codepoints.
static int getUnicodeNumericValue(int ch)
          Returns the Unicode numeric value of the code point as a nonnegative integer.
static VersionInfo getUnicodeVersion()
          Gets the version of Unicode data used.
static boolean hasBinaryProperty(int ch, int property)
          Check a binary Unicode property for a code point.
static boolean isBaseForm(int ch)
          Determines whether the specified code point is of base form.
static boolean isBMP(int ch)
          Determines if the code point is in the BMP (Basic Multilingual Plane).
static boolean isDefined(int ch)
          Determines if a code point has a defined meaning in the up-to-date Unicode standard.
static boolean isDigit(int ch)
          Determines if a code point is a Java digit.
static boolean isIdentifierIgnorable(int ch)
          Determines if the specified code point should be regarded as an ignorable character in a Unicode identifier.
static boolean isISOControl(int ch)
          Determines if the specified code point is an ISO control character.
static boolean isLegal(int ch)
          A code point is illegal if and only if it is out of bounds (less than 0 or greater than UCharacter.MAX_VALUE), a surrogate value (0xD800 to 0xDFFF), or a not-a-character of the form 0x xxFFFF or 0x xxFFFE. Note: legal does not mean that it is assigned in this version of Unicode.
static boolean isLegal(java.lang.String str)
          A string is legal iff all its code points are legal.
static boolean isLetter(int ch)
          Determines if the specified code point is a letter.
static boolean isLetterOrDigit(int ch)
          Determines if the specified code point is a letter or digit.
static boolean isLowerCase(int ch)
          Determines if the specified code point is a lowercase character.
static boolean isMirrored(int ch)
          Determines whether the code point has the "mirrored" property.
static boolean isPrintable(int ch)
          Determines whether the specified code point is a printable character according to the Unicode standard.
static boolean isSpaceChar(int ch)
          Determines if the specified code point is a Unicode specified space character, i.e. if the code point is in the category Zs, Zl or Zp.
static boolean isSupplementary(int ch)
          Determines if the code point is a supplementary character.
static boolean isTitleCase(int ch)
          Determines if the specified code point is a titlecase character.
static boolean isUAlphabetic(int ch)
          Check if a code point has the Alphabetic Unicode property.
static boolean isULowercase(int ch)
          Check if a code point has the Lowercase Unicode property.
static boolean isUnicodeIdentifierPart(int ch)
          Determines if the specified code point may be any part of a Unicode identifier other than the starting character.
static boolean isUnicodeIdentifierStart(int ch)
          Determines if the specified code point is permissible as the first character in a Unicode identifier.
static boolean isUpperCase(int ch)
          Determines if the specified code point is an uppercase character.
static boolean isUUppercase(int ch)
          Check if a code point has the Uppercase Unicode property.
static boolean isUWhiteSpace(int ch)
          Check if a code point has the White_Space Unicode property.
static boolean isWhitespace(int ch)
          Determines if the specified code point is a white space character.
static int toLowerCase(int ch)
          The given code point is mapped to its lowercase equivalent; if the code point has no lowercase equivalent, the code point itself is returned.
static java.lang.String toLowerCase(java.util.Locale locale, java.lang.String str)
          Gets lowercase version of the argument string.
static java.lang.String toLowerCase(java.lang.String str)
          Gets lowercase version of the argument string.
static java.lang.String toString(int ch)
          Converts argument code point and returns a String object representing the code point's value in UTF16 format.
static int toTitleCase(int ch)
          Converts the code point argument to titlecase.
static java.lang.String toTitleCase(java.util.Locale locale, java.lang.String str, BreakIterator breakiter)
          Gets the titlecase version of the argument string.
static java.lang.String toTitleCase(java.lang.String str, BreakIterator breakiter)
          Gets the titlecase version of the argument string.
static int toUpperCase(int ch)
          Converts the character argument to uppercase.
static java.lang.String toUpperCase(java.util.Locale locale, java.lang.String str)
          Gets uppercase version of the argument string.
static java.lang.String toUpperCase(java.lang.String str)
          Gets uppercase version of the argument string.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MIN_VALUE

public static final int MIN_VALUE
The lowest Unicode code point value.

MAX_VALUE

public static final int MAX_VALUE
The highest Unicode code point value (scalar value) according to the Unicode Standard. This is a 21-bit value (21 bits, rounded up).
Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE

SUPPLEMENTARY_MIN_VALUE

public static final int SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points

REPLACEMENT_CHAR

public static final int REPLACEMENT_CHAR
Unicode value used when translating into Unicode encoding form and there is no existing character.

NAME_

protected static final com.ibm.icu.lang.UCharacterName NAME_
Database storing the sets of character names

Method Detail

digit

public static int digit(int ch,
                        int radix)
Retrieves the numeric value of a decimal digit code point.
This method observes the semantics of java.lang.Character.digit(). Note that this will return positive values for code points for which isDigit returns false, just like java.lang.Character.
Semantic Change: In release 1.3.1 and prior, this did not treat the European letters as having a digit value, and also treated numeric letters and other numbers as digits. This has been changed to conform to the java semantics.
A code point is a valid digit if and only if:
Parameters:
ch - the code point to query
radix - the radix
Returns:
the numeric value represented by the code point in the specified radix, or -1 if the code point is not a decimal digit or if its value is too large for the radix
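
Because this method is described as observing the semantics of java.lang.Character.digit(), those semantics can be sketched with java.lang.Character alone. The snippet below is only a stand-in under that assumption; it does not call UCharacter, and the class and method names are hypothetical.

```java
final class DigitRadixSketch {
    /** Value of the code point in the given radix, or -1 if it is not a digit there. */
    static int valueIn(int cp, int radix) {
        return Character.digit(cp, radix);
    }

    public static void main(String[] args) {
        System.out.println(valueIn('7', 10));        // 7
        System.out.println(valueIn('f', 16));        // 15: a letter with a digit value
        System.out.println(valueIn('9', 8));         // -1: value too large for the radix
        System.out.println(Character.isDigit('f'));  // false, although digit('f', 16) is positive
    }
}
```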

digit

public static int digit(int ch)
Retrieves the numeric value of a decimal digit code point.
This is a convenience overload of digit(int, int) that provides a decimal radix.
Semantic Change: In release 1.3.1 and prior, this treated numeric letters and other numbers as digits. This has been changed to conform to the java semantics.
Parameters:
ch - the code point to query
Returns:
the numeric value represented by the code point, or -1 if the code point is not a decimal digit or if its value is too large for a decimal radix

getNumericValue

public static int getNumericValue(int ch)
Returns the numeric value of the code point as a nonnegative integer.
If the code point does not have a numeric value, then -1 is returned.
If the code point has a numeric value that cannot be represented as a nonnegative integer (for example, a fractional value), then -2 is returned.
Semantic Change: In release 1.3.1 and prior, this returned -1 for ASCII letters and their fullwidth counterparts. This has been changed to conform to the java semantics.
Parameters:
ch - the code point to query
Returns:
the numeric value of the code point, or -1 if it has no numeric value, or -2 if it has a numeric value that cannot be represented as a nonnegative integer
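
The three kinds of return value (a nonnegative value, -1, and -2) can be seen with java.lang.Character.getNumericValue, whose semantics this method is said to follow. This is a sketch using the JDK method as a stand-in, not a call into UCharacter; the class and method names are hypothetical.

```java
final class NumericValueSketch {
    /** Numeric value per java.lang.Character: >= 0, -1 (none), or -2 (not a nonnegative int). */
    static int numericValue(int cp) {
        return Character.getNumericValue(cp);
    }

    public static void main(String[] args) {
        System.out.println(numericValue('7'));     // 7
        System.out.println(numericValue('a'));     // 10: ASCII letters have values
        System.out.println(numericValue(0x216C));  // 50: ROMAN NUMERAL FIFTY
        System.out.println(numericValue(0x00BD));  // -2: VULGAR FRACTION ONE HALF
        System.out.println(numericValue('$'));     // -1: no numeric value
    }
}
```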

getUnicodeNumericValue

public static int getUnicodeNumericValue(int ch)
Returns the Unicode numeric value of the code point as a nonnegative integer.
If the code point does not have a numeric value, then -1 is returned.
If the code point has a numeric value that cannot be represented as a nonnegative integer (for example, a fractional value), then -2 is returned. This returns values other than -1 for all and only those code points whose type is a numeric type.
Parameters:
ch - the code point to query
Returns:
the numeric value of the code point, or -1 if it has no numeric value, or -2 if it has a numeric value that cannot be represented as a nonnegative integer

getType

public static int getType(int ch)
Returns a value indicating a code point's Unicode category. Up-to-date Unicode implementation of java.lang.Character.getType() except for the above mentioned code points that had their category changed.
Return results are constants from the interface UCharacterCategory
Parameters:
ch - code point whose type is to be determined
Returns:
category which is a value of UCharacterCategory

isDefined

public static boolean isDefined(int ch)
Determines if a code point has a defined meaning in the up-to-date Unicode standard. E.g. supplementary code points, though allocated space, are not defined in Unicode yet.
Up-to-date Unicode implementation of java.lang.Character.isDefined()
Parameters:
ch - code point to be determined if it is defined in the most current version of Unicode
Returns:
true if this code point is defined in Unicode

isDigit

public static boolean isDigit(int ch)
Determines if a code point is a Java digit.
This method observes the semantics of java.lang.Character.isDigit(). It returns true for decimal digits only.
Semantic Change: In release 1.3.1 and prior, this treated numeric letters and other numbers as digits. This has been changed to conform to the java semantics.
Parameters:
ch - code point to query
Returns:
true if this code point is a digit

isISOControl

public static boolean isISOControl(int ch)
Determines if the specified code point is an ISO control character. A code point is considered to be an ISO control character if it is in the range \u0000 through \u001F or in the range \u007F through \u009F.
Up-to-date Unicode implementation of java.lang.Character.isISOControl()
Parameters:
ch - code point to determine if it is an ISO control character
Returns:
true if code point is an ISO control character
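
The two ranges given above translate directly into a range check. The following is a minimal sketch of that rule only, with hypothetical names; it is not the UCharacter implementation.

```java
final class IsoControlSketch {
    /** True for U+0000..U+001F and U+007F..U+009F, the ranges stated above. */
    static boolean isIsoControl(int cp) {
        return (cp >= 0x0000 && cp <= 0x001F) || (cp >= 0x007F && cp <= 0x009F);
    }

    public static void main(String[] args) {
        System.out.println(isIsoControl(0x000A));  // true: LINE FEED
        System.out.println(isIsoControl(0x0020));  // false: SPACE
        System.out.println(isIsoControl(0x007F));  // true: DELETE
        System.out.println(isIsoControl(0x00A0));  // false: NO-BREAK SPACE
    }
}
```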

isLetter

public static boolean isLetter(int ch)
Determines if the specified code point is a letter. Up-to-date Unicode implementation of java.lang.Character.isLetter()
Parameters:
ch - code point to determine if it is a letter
Returns:
true if code point is a letter

isLetterOrDigit

public static boolean isLetterOrDigit(int ch)
Determines if the specified code point is a letter or digit. Note that this method, unlike java.lang.Character, does not regard the ASCII characters 'A' - 'Z' and 'a' - 'z' as digits.
Parameters:
ch - code point to determine if it is a letter or a digit
Returns:
true if code point is a letter or a digit

isLowerCase

public static boolean isLowerCase(int ch)
Determines if the specified code point is a lowercase character. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings.
For more information about Unicode case mapping please refer to the Technical report #21.
Up-to-date Unicode implementation of java.lang.Character.isLowerCase()
Parameters:
ch - code point to determine if it is in lowercase
Returns:
true if code point is a lowercase character

isWhitespace

public static boolean isWhitespace(int ch)
Determines if the specified code point is a white space character. A code point is considered to be a whitespace character if and only if it satisfies one of the following criteria:
Up-to-date Unicode implementation of java.lang.Character.isWhitespace().
Parameters:
ch - code point to determine if it is a white space
Returns:
true if the specified code point is a white space character

isSpaceChar

public static boolean isSpaceChar(int ch)
Determines if the specified code point is a Unicode specified space character, i.e. if the code point is in the category Zs, Zl or Zp. Up-to-date Unicode implementation of java.lang.Character.isSpaceChar().
Parameters:
ch - code point to determine if it is a space
Returns:
true if the specified code point is a space character
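
The difference between a Unicode space character and a white space character is easiest to see on a few code points. The sketch below uses the java.lang.Character methods that isSpaceChar and isWhitespace are described as updating, as stand-ins; results from UCharacter itself may differ for code points added after the JDK's Unicode version. The class name is hypothetical.

```java
final class SpaceSketch {
    /** Describes how java.lang.Character classifies a code point. */
    static String classify(int cp) {
        return "spaceChar=" + Character.isSpaceChar(cp)
             + " whitespace=" + Character.isWhitespace(cp);
    }

    public static void main(String[] args) {
        System.out.println(classify(0x0020));  // spaceChar=true whitespace=true   (SPACE, Zs)
        System.out.println(classify(0x0009));  // spaceChar=false whitespace=true  (TAB, a control)
        System.out.println(classify(0x00A0));  // spaceChar=true whitespace=false  (NO-BREAK SPACE, Zs)
        System.out.println(classify(0x2028));  // spaceChar=true whitespace=true   (LINE SEPARATOR, Zl)
    }
}
```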

isTitleCase

public static boolean isTitleCase(int ch)
Determines if the specified code point is a titlecase character. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings.
For more information about Unicode case mapping please refer to the Technical report #21.
Up-to-date Unicode implementation of java.lang.Character.isTitleCase().
Parameters:
ch - code point to determine if it is in title case
Returns:
true if the specified code point is a titlecase character

isUnicodeIdentifierPart

public static boolean isUnicodeIdentifierPart(int ch)
Determines if the specified code point may be any part of a Unicode identifier other than the starting character. A code point may be part of a Unicode identifier if and only if it is one of the following:
Up-to-date Unicode implementation of java.lang.Character.isUnicodeIdentifierPart().
See UTR #8.
Parameters:
ch - code point to determine if it can be part of a Unicode identifier
Returns:
true if code point is any character belonging to a Unicode identifier suffix after the first character

isUnicodeIdentifierStart

public static boolean isUnicodeIdentifierStart(int ch)
Determines if the specified code point is permissible as the first character in a Unicode identifier. A code point may start a Unicode identifier if it is of type either
Up-to-date Unicode implementation of java.lang.Character.isUnicodeIdentifierStart().
See UTR #8.
Parameters:
ch - code point to determine if it can start a Unicode identifier
Returns:
true if code point is the first character of a Unicode identifier

isIdentifierIgnorable

public static boolean isIdentifierIgnorable(int ch)
Determines if the specified code point should be regarded as an ignorable character in a Unicode identifier. A character is ignorable in the Unicode standard if it is of the type Cf, Formatting code.
Up-to-date Unicode implementation of java.lang.Character.isIdentifierIgnorable().
See UTR #8.
Parameters:
ch - code point to be determined if it can be ignored in a Unicode identifier.
Returns:
true if the code point is ignorable

isUpperCase

public static boolean isUpperCase(int ch)
Determines if the specified code point is an uppercase character. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings.
For language specific case conversion behavior, use toUpperCase(locale, str).
For example, the case conversion for dot-less i and dotted I in Turkish, or for final sigma in Greek. For more information about Unicode case mapping please refer to the Technical report #21.
Up-to-date Unicode implementation of java.lang.Character.isUpperCase().
Parameters:
ch - code point to determine if it is in uppercase
Returns:
true if the code point is an uppercase character

toLowerCase

public static int toLowerCase(int ch)
The given code point is mapped to its lowercase equivalent; if the code point has no lowercase equivalent, the code point itself is returned. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings.
For language specific case conversion behavior, use toLowerCase(locale, str).
For example, the case conversion for dot-less i and dotted I in Turkish, or for final sigma in Greek. For more information about Unicode case mapping please refer to the Technical report #21.
Up-to-date Unicode implementation of java.lang.Character.toLowerCase()
Parameters:
ch - code point whose lowercase equivalent is to be retrieved
Returns:
the lowercase equivalent code point

toString

public static java.lang.String toString(int ch)
Converts argument code point and returns a String object representing the code point's value in UTF16 format. The result is a string whose length is 1 for non-supplementary code points, 2 otherwise.
com.ibm.icu.text.UTF16 can be used to parse Strings generated by this function.
Up-to-date Unicode implementation of java.lang.Character.toString()
Parameters:
ch - code point
Returns:
string representation of the code point, null if code point is not defined in Unicode
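
The length rule (1 char for non-supplementary code points, 2 otherwise) can be sketched with java.lang.Character.toChars. This is a stand-in, not the UCharacter code: the names are hypothetical, and returning null only for out-of-range values is an assumption that is narrower than "not defined in Unicode".

```java
final class ToStringSketch {
    /** UTF-16 string for a code point; null if outside 0..0x10FFFF (assumed rule). */
    static String toUtf16String(int cp) {
        if (cp < 0 || cp > 0x10FFFF) {
            return null;  // assumption: only out-of-range values map to null here
        }
        return new String(Character.toChars(cp));
    }

    public static void main(String[] args) {
        System.out.println(toUtf16String(0x0041).length());   // 1
        System.out.println(toUtf16String(0x10000).length());  // 2: a surrogate pair
        System.out.println(toUtf16String(0x110000));          // null
    }
}
```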

toTitleCase

public static int toTitleCase(int ch)
Converts the code point argument to titlecase. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings.
There are only four Unicode characters that are truly titlecase forms that are distinct from uppercase forms. For more information about Unicode case mapping please refer to the Technical report #21.
If no titlecase is available, the uppercase is returned. If no uppercase is available, the code point itself is returned.
Up-to-date Unicode implementation of java.lang.Character.toTitleCase()
Parameters:
ch - code point whose title case is to be retrieved
Returns:
titlecase code point

toUpperCase

public static int toUpperCase(int ch)
Converts the character argument to uppercase. UnicodeData only contains case mappings for characters where they are one-to-one mappings; it also omits information about context-sensitive case mappings.
For more information about Unicode case mapping please refer to the Technical report #21.
If no uppercase is available, the character itself is returned.
Up-to-date Unicode implementation of java.lang.Character.toUpperCase()
Parameters:
ch - code point whose uppercase is to be retrieved
Returns:
uppercase code point

isSupplementary

public static boolean isSupplementary(int ch)
Determines if the code point is a supplementary character. A code point is a supplementary character if and only if it is greater than or equal to SUPPLEMENTARY_MIN_VALUE
Parameters:
ch - code point to be determined if it is in the supplementary plane
Returns:
true if code point is a supplementary character

isBMP

public static boolean isBMP(int ch)
Determines if the code point is in the BMP (Basic Multilingual Plane).
Parameters:
ch - code point to be determined if it is not a supplementary character
Returns:
true if code point is not a supplementary character
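
The split between BMP and supplementary code points is a pair of range checks. In the sketch below the constant values 0x10000 and 0x10FFFF are assumptions standing in for SUPPLEMENTARY_MIN_VALUE and MAX_VALUE, since this page does not print them; the class and method names are hypothetical.

```java
final class PlaneSketch {
    // Assumed values; the page names these constants but does not list them.
    static final int SUPPLEMENTARY_MIN = 0x10000;
    static final int MAX = 0x10FFFF;

    /** True for code points U+0000..U+FFFF. */
    static boolean isBmp(int cp) {
        return cp >= 0 && cp < SUPPLEMENTARY_MIN;
    }

    /** True for code points U+10000..U+10FFFF, lower bound included. */
    static boolean isSupplementary(int cp) {
        return cp >= SUPPLEMENTARY_MIN && cp <= MAX;
    }

    public static void main(String[] args) {
        System.out.println(isBmp(0xFFFF));             // true
        System.out.println(isSupplementary(0x10000));  // true
        System.out.println(isSupplementary(0x110000)); // false
    }
}
```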

isPrintable

public static boolean isPrintable(int ch)
Determines whether the specified code point is a printable character according to the Unicode standard.
Parameters:
ch - code point to be determined if it is printable
Returns:
true if the code point is a printable character

isBaseForm

public static boolean isBaseForm(int ch)
Determines whether the specified code point is of base form. A code point of base form does not graphically combine with preceding characters, and is neither a control nor a format character.
Parameters:
ch - code point to be determined if it is of base form
Returns:
true if the code point is of base form

getDirection

public static int getDirection(int ch)
Returns the bidirectional property of a code point. For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional property.
Result returned belongs to the interface UCharacterDirection
Parameters:
ch - the code point whose direction is to be determined
Returns:
direction constant from UCharacterDirection. If the character is not defined, UCharacterDirection.BOUNDARY_NEUTRAL will be returned.
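
For comparison, the JDK exposes the same property through java.lang.Character.getDirectionality. The sketch below uses that method as a stand-in and does not use UCharacterDirection constants; the class and method names are hypothetical.

```java
final class DirectionSketch {
    /** True if the JDK reports the strong left-to-right class for the code point. */
    static boolean isLeftToRight(int cp) {
        return Character.getDirectionality(cp) == Character.DIRECTIONALITY_LEFT_TO_RIGHT;
    }

    /** True if the JDK reports the strong right-to-left class for the code point. */
    static boolean isRightToLeft(int cp) {
        return Character.getDirectionality(cp) == Character.DIRECTIONALITY_RIGHT_TO_LEFT;
    }

    public static void main(String[] args) {
        System.out.println(isLeftToRight(0x0041));  // true: LATIN CAPITAL LETTER A
        System.out.println(isRightToLeft(0x05D0));  // true: HEBREW LETTER ALEF
    }
}
```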

isMirrored

public static boolean isMirrored(int ch)
Determines whether the code point has the "mirrored" property. This property is set for characters that are commonly used in Right-To-Left contexts and need to be displayed with a "mirrored" glyph.
Parameters:
ch - code point to be checked for the "mirrored" property
Returns:
true if the code point has the "mirrored" property

getMirror

public static int getMirror(int ch)
Maps the specified code point to a "mirror-image" code point. For code points with the "mirrored" property, implementations sometimes need a "poor man's" mapping to another code point such that the default glyph may serve as the mirror-image of the default glyph of the specified code point.
This is useful for text conversion to and from codepages with visual order, and for displays without glyph selection capabilities.
Parameters:
ch - code point whose mirror is to be retrieved
Returns:
another code point that may serve as a mirror-image substitute, or ch itself if there is no such mapping or ch does not have the "mirrored" property

getCombiningClass

public static int getCombiningClass(int ch)
Gets the combining class of the argument codepoint
Parameters:
ch - code point whose combining class is to be retrieved
Returns:
the combining class of the codepoint

isLegal

public static boolean isLegal(int ch)
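A code point is illegal if and only if it is out of bounds (less than 0 or greater than UCharacter.MAX_VALUE), a surrogate value (0xD800 to 0xDFFF), or a not-a-character of the form 0x xxFFFF or 0x xxFFFE.
Note: legal does not mean that it is assigned in this version of Unicode.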
Parameters:
ch - code point to determine if it is a legal code point by itself
Returns:
true if and only if legal.
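
The three conditions this page lists for an illegal code point (out of bounds, surrogate, or of the form xxFFFE / xxFFFF) can be sketched as follows. The upper bound 0x10FFFF is an assumed value for UCharacter.MAX_VALUE, the names are hypothetical, and the real method may apply further checks.

```java
final class LegalSketch {
    static final int MAX = 0x10FFFF;  // assumed value of UCharacter.MAX_VALUE

    /** Applies the three documented illegality rules to one code point. */
    static boolean isLegal(int cp) {
        if (cp < 0 || cp > MAX) {
            return false;               // out of bounds
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;               // surrogate value
        }
        if ((cp & 0xFFFE) == 0xFFFE) {
            return false;               // low 16 bits are FFFE or FFFF
        }
        return true;                    // legal, though not necessarily assigned
    }

    public static void main(String[] args) {
        System.out.println(isLegal(0x0041));   // true
        System.out.println(isLegal(0xD800));   // false
        System.out.println(isLegal(0x1FFFE));  // false
    }
}
```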

isLegal

public static boolean isLegal(java.lang.String str)
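A string is legal iff all its code points are legal. A code point is illegal if and only if it is out of bounds (less than 0 or greater than UCharacter.MAX_VALUE), a surrogate value (0xD800 to 0xDFFF), or a not-a-character of the form 0x xxFFFF or 0x xxFFFE.
Note: legal does not mean that it is assigned in this version of Unicode.
Parameters:
str - string whose code points are to be checked for legality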
Returns:
true if and only if legal.

getUnicodeVersion

public static VersionInfo getUnicodeVersion()
Gets the version of Unicode data used.
Returns:
the Unicode version number used

getName

public static java.lang.String getName(int ch)
Retrieve the most current Unicode name of the argument code point, or null if the character is unassigned or outside the range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
Note calling any methods related to code point names, e.g. get*Name*() incurs a one-time initialisation cost to construct the name tables.
Parameters:
ch - the code point for which to get the name
Returns:
most current Unicode name

getName1_0

public static java.lang.String getName1_0(int ch)
Retrieve the earlier version 1.0 Unicode name of the argument code point, or null if the character is unassigned or outside the range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name.
Note calling any methods related to code point names, e.g. get*Name*() incurs a one-time initialisation cost to construct the name tables.
Parameters:
ch - the code point for which to get the name
Returns:
version 1.0 Unicode name

getExtendedName

public static java.lang.String getExtendedName(int ch)

Retrieves a name for a valid codepoint. Unlike getName(int) and getName1_0(int), this method will return a name even for codepoints that are not assigned a name in UnicodeData.txt.

The names are returned in the following order.
Note calling any methods related to code point names, e.g. get*Name*() incurs a one-time initialisation cost to construct the name tables.
Parameters:
ch - the code point for which to get the name
Returns:
a name for the argument codepoint

getCharFromName

public static int getCharFromName(java.lang.String name)

Find a Unicode code point by its most current Unicode name and return its code point value. All Unicode names are in uppercase.

Note calling any methods related to code point names, e.g. get*Name*() incurs a one-time initialisation cost to construct the name tables.
Parameters:
name - most current Unicode character name whose code point is to be returned
Returns:
code point or -1 if name is not found

getCharFromName1_0

public static int getCharFromName1_0(java.lang.String name)

Find a Unicode character by its version 1.0 Unicode name and return its code point value. All Unicode names are in uppercase.

Note calling any methods related to code point names, e.g. get*Name*() incurs a one-time initialisation cost to construct the name tables.
Parameters:
name - Unicode 1.0 code point name whose code point is to be returned
Returns:
code point or -1 if name is not found

getCharFromExtendedName

public static int getCharFromExtendedName(java.lang.String name)

Find a Unicode character by either its name or its extended name and return its code point value. All Unicode names are in uppercase. Extended names are all lowercase except for numbers and are contained within angle brackets.

The names are searched in the following order.
Note calling any methods related to code point names, e.g. get*Name*() incurs a one-time initialisation cost to construct the name tables.
Parameters:
name - codepoint name
Returns:
code point associated with the name or -1 if the name is not found.

getCodePoint

public static int getCodePoint(char lead,
                               char trail)
Returns a code point corresponding to the two UTF16 characters.
Parameters:
lead - the lead char
trail - the trail char
Returns:
code point if surrogate characters are valid.
Throws:
java.lang.IllegalArgumentException - thrown when argument characters do not form a valid codepoint
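
Combining a lead and trail surrogate into a code point follows the standard UTF-16 arithmetic. The sketch below shows that arithmetic with hypothetical names; it is not the UCharacter source, and the exception message is illustrative.

```java
final class SurrogatePairSketch {
    /** Combines a valid lead/trail surrogate pair into a supplementary code point. */
    static int combine(char lead, char trail) {
        boolean validLead = lead >= 0xD800 && lead <= 0xDBFF;
        boolean validTrail = trail >= 0xDC00 && trail <= 0xDFFF;
        if (!validLead || !validTrail) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        // Each surrogate contributes 10 bits; the result is offset by 0x10000.
        return ((lead - 0xD800) << 10) + (trail - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(combine('\uD800', '\uDC00')));  // 10000
        System.out.println(Integer.toHexString(combine('\uDBFF', '\uDFFF')));  // 10ffff
    }
}
```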

getCodePoint

public static int getCodePoint(char char16)
Returns the code point corresponding to the UTF16 character.
Parameters:
char16 - the UTF16 character
Returns:
code point if argument is a valid character.
Throws:
java.lang.IllegalArgumentException - thrown when char16 is not a valid codepoint

toUpperCase

public static java.lang.String toUpperCase(java.lang.String str)
Gets uppercase version of the argument string. Casing is dependent on the default locale and context-sensitive.
Parameters:
str - source string to be converted
Returns:
uppercase version of the argument string

toLowerCase

public static java.lang.String toLowerCase(java.lang.String str)
Gets lowercase version of the argument string. Casing is dependent on the default locale and context-sensitive.
Parameters:
str - source string to be converted
Returns:
lowercase version of the argument string

toTitleCase

public static java.lang.String toTitleCase(java.lang.String str,
                                           BreakIterator breakiter)

Gets the titlecase version of the argument string.

The positions for titlecasing are determined by the argument break iterator, so the user can customize the break iterator for specialized titlecasing. In this case only forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.

Only positions returned by the break iterator will be titlecased; characters in between the positions will all be in lowercase.

Casing is dependent on the default locale and context-sensitive.

Parameters:
str - source string to be converted
breakiter - break iterator to determine the positions at which characters should be titlecased.
Returns:
titlecase version of the argument string
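
The rule described for this method (titlecase at each position the break iterator returns, lowercase everywhere else) can be sketched with java.text.BreakIterator and java.lang.Character. This is a simplified stand-in: it maps one code point at a time, ignores locale and context-sensitive mappings, and falls back to a JDK word iterator when null is passed, which is an assumption rather than the default Unicode algorithm. The class name is hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class TitleCaseSketch {
    /** Titlecases the code point at each break position and lowercases the rest. */
    static String toTitleCase(String str, BreakIterator breakIter) {
        if (breakIter == null) {
            breakIter = BreakIterator.getWordInstance(Locale.ROOT);  // assumed fallback
        }
        breakIter.setText(str);
        StringBuilder out = new StringBuilder(str.length());
        int nextBreak = breakIter.first();  // only forward iteration is used
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            if (i == nextBreak) {
                out.appendCodePoint(Character.toTitleCase(cp));
            } else {
                out.appendCodePoint(Character.toLowerCase(cp));
            }
            i += Character.charCount(cp);
            // Advance to the first break position at or after the new index.
            while (nextBreak != BreakIterator.DONE && nextBreak < i) {
                nextBreak = breakIter.next();
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTitleCase("hELLO wORLD", null));  // Hello World
    }
}
```

A word iterator also reports the position of each space as a break; titlecasing a space leaves it unchanged, so the sketch does not filter those positions out.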

toUpperCase

public static java.lang.String toUpperCase(java.util.Locale locale,
                                           java.lang.String str)
Gets uppercase version of the argument string. Casing is dependent on the argument locale and context-sensitive.
Parameters:
locale - the locale whose casing rules are applied
str - the source string to convert
Returns:
uppercase version of the argument string

toLowerCase

public static java.lang.String toLowerCase(java.util.Locale locale,
                                           java.lang.String str)
Gets the lowercase version of the argument string. Casing depends on the argument locale and is context-sensitive.
Parameters:
locale - the locale whose casing rules are applied
str - the source string to convert
Returns:
lowercase version of the argument string

toTitleCase

public static java.lang.String toTitleCase(java.util.Locale locale,
                                           java.lang.String str,
                                           BreakIterator breakiter)

Gets the titlecase version of the argument string.

The positions for titlecasing are determined by the argument break iterator, so the user can customize the break iterator for specialized titlecasing. In this case only forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm is used to determine the titlecase positions.

Only the positions returned by the break iterator are titlecased; the characters in between the positions are all lowercased.

Casing depends on the argument locale and is context-sensitive.
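
The rule described above (titlecase at each break position, lowercase in between) can be sketched with JDK classes only. This is an assumption-laden illustration, not the ICU4J algorithm: it uses java.text.BreakIterator rather than the ICU BreakIterator, substitutes the JDK word iterator when breakiter is null, and applies only simple per-code-point titlecasing. The names TitleCaseSketch and titleCase are hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// JDK-only sketch: titlecase the first code point after each boundary
// reported by the break iterator and lowercase everything in between.
class TitleCaseSketch {
    static String titleCase(Locale locale, String str, BreakIterator breakiter) {
        // Assumption: a null iterator falls back to the JDK word iterator,
        // standing in for the "default Unicode algorithm" of the real API.
        BreakIterator it = (breakiter != null)
                ? breakiter : BreakIterator.getWordInstance(locale);
        it.setText(str);
        StringBuilder out = new StringBuilder(str.length());
        int start = it.first();
        // Only forward iteration (first/next) is used.
        for (int end = it.next(); end != BreakIterator.DONE;
                start = end, end = it.next()) {
            int first = str.codePointAt(start);
            out.appendCodePoint(Character.toTitleCase(first));
            // Guard against a custom iterator that splits a surrogate pair.
            int rest = Math.min(start + Character.charCount(first), end);
            out.append(str.substring(rest, end).toLowerCase(locale));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(titleCase(Locale.ENGLISH, "hELLO wORLD", null));
    }
}
```

Passing a different iterator, for example BreakIterator.getSentenceInstance(locale), changes which positions are titlecased without changing the rest of the loop.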

Parameters:
locale - the locale whose casing rules are applied
str - the source string to convert
breakiter - break iterator that determines the positions at which characters are titlecased
Returns:
titlecase version of the argument string

foldCase

public static int foldCase(int ch,
                           boolean defaultmapping)
The given character is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if the character has no case folding equivalent, the character itself is returned. Only "simple", single-code point case folding mappings are used. For "full", multiple-code point mappings use the API foldCase(String str, boolean defaultmapping).
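
The difference between "simple" single-code point mappings and "full" multiple-code point mappings can be seen in the JDK's own case conversion, which has the same split between an int overload and a String overload. The sketch below is only an analogy under that assumption: it uses uppercasing rather than case folding, and the names SimpleVsFullSketch, simpleUpper and fullUpper are hypothetical.

```java
import java.util.Locale;

// JDK-only analogy for simple versus full case mappings.
class SimpleVsFullSketch {
    // Simple mapping: one code point in, one code point out.
    static int simpleUpper(int ch) {
        return Character.toUpperCase(ch);
    }

    // Full mapping: the result may be longer than the input.
    static String fullUpper(String str) {
        return str.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // U+00DF (sharp s) has no single-code point uppercase mapping,
        // so the simple mapping returns the character itself.
        System.out.println(simpleUpper(0x00DF) == 0x00DF);
        // The full mapping expands it to two code points.
        System.out.println(fullUpper("\u00DF"));
    }
}
```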
Parameters:
ch - the character to be converted
defaultmapping - if true, all mappings defined in CaseFolding.txt are used; otherwise the mappings for dotted I and dotless i, marked with 'I' in CaseFolding.txt, are skipped.
Returns:
the case folding equivalent of the character, if any; otherwise the character itself.
See Also:
foldCase(String, boolean)

foldCase

public static java.lang.String foldCase(java.lang.String str,
                                        boolean defaultmapping)
The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned. "Full", multiple-code point case folding mappings are returned here. For "simple" single-code point mappings use the API foldCase(int ch, boolean defaultmapping).
Parameters:
str - the String to be converted
defaultmapping - if true, all mappings defined in CaseFolding.txt are used; otherwise the mappings for dotted I and dotless i, marked with 'I' in CaseFolding.txt, are skipped.
Returns:
the case folding equivalent of the string, if any; otherwise the string itself.
See Also:
foldCase(int, boolean)

getHanNumericValue

public static int getHanNumericValue(int ch)
Returns the numeric value of Han code points.
This returns the value of Han 'numeric' code points, including those for zero, ten, hundred, thousand, ten thousand, and hundred million. Unicode does not consider these to be numeric. This includes both the standard and 'checkwriting' characters, the 'big circle' zero character, and the standard zero character.
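
As a rough picture of the contract (a value for Han 'numeric' code points, -1 for everything else), here is a hypothetical lookup covering only a subset of the standard characters. The table, the class name HanNumericSketch and the method name hanNumericValue are assumptions made for illustration; the 'checkwriting' forms are omitted and the real data lives inside ICU4J.

```java
// Hypothetical, partial lookup illustrating the getHanNumericValue contract.
class HanNumericSketch {
    static int hanNumericValue(int ch) {
        switch (ch) {
            case 0x3007: // 'big circle' zero
            case 0x96F6: // standard zero
                return 0;
            case 0x4E00: return 1;
            case 0x4E8C: return 2;
            case 0x4E09: return 3;
            case 0x56DB: return 4;
            case 0x4E94: return 5;
            case 0x516D: return 6;
            case 0x4E03: return 7;
            case 0x516B: return 8;
            case 0x4E5D: return 9;
            case 0x5341: return 10;        // ten
            case 0x767E: return 100;       // hundred
            case 0x5343: return 1000;      // thousand
            case 0x4E07: // ten thousand (simplified)
            case 0x842C: // ten thousand (traditional)
                return 10000;
            case 0x4EBF: // hundred million (simplified)
            case 0x5104: // hundred million (traditional)
                return 100000000;
            default:
                return -1;                 // not a Han 'numeric' character
        }
    }

    public static void main(String[] args) {
        System.out.println(hanNumericValue(0x5341));
        System.out.println(hanNumericValue('A'));
    }
}
```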
Parameters:
ch - code point to query
Returns:
the numeric value if ch is a Han 'numeric' character; otherwise -1.

getTypeIterator

public static RangeValueIterator getTypeIterator()

Gets an iterator for character types, iterating over codepoints.

Example of use:
 RangeValueIterator iterator = UCharacter.getTypeIterator();
 RangeValueIterator.Element element = new RangeValueIterator.Element();
 while (iterator.next(element)) {
     System.out.println("Codepoint \\u" + 
                        Integer.toHexString(element.start) + 
                        " to codepoint \\u" +
                        Integer.toHexString(element.limit - 1) + 
                        " has the character type " + 
                        element.value);
 }
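
To make the shape of each range concrete (an inclusive start, an exclusive limit, and one value shared by every code point in between), the following JDK-only sketch groups consecutive code points by java.lang.Character.getType. It is an analogy, not the RangeValueIterator implementation: the names TypeRangeSketch and typeRanges are hypothetical, and the type values are java.lang.Character constants, which are assumed rather than known to match what UCharacter reports.

```java
import java.util.ArrayList;
import java.util.List;

// JDK-only sketch of run-length grouping of code points by character type.
class TypeRangeSketch {
    // Returns {start, limit, type} triples covering [from, to);
    // limit is exclusive, so the last code point of a range is limit - 1.
    static List<int[]> typeRanges(int from, int to) {
        List<int[]> ranges = new ArrayList<>();
        int start = from;
        while (start < to) {
            int type = Character.getType(start);
            int limit = start + 1;
            // Extend the range while the type stays the same.
            while (limit < to && Character.getType(limit) == type) {
                limit++;
            }
            ranges.add(new int[] {start, limit, type});
            start = limit;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] r : typeRanges(0x41, 0x7B)) {
            System.out.println("Codepoint \\u" + Integer.toHexString(r[0])
                    + " to codepoint \\u" + Integer.toHexString(r[1] - 1)
                    + " has the character type " + r[2]);
        }
    }
}
```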
 
Returns:
an iterator

getNameIterator

public static ValueIterator getNameIterator()

Gets an iterator for character names, iterating over codepoints.

This API only gets the iterator for the modern, most up-to-date Unicode names. For older 1.0 Unicode names use getName1_0Iterator() or for extended names use getExtendedNameIterator().

Example of use:
 ValueIterator iterator = UCharacter.getNameIterator();
 ValueIterator.Element element = new ValueIterator.Element();
 while (iterator.next(element)) {
     System.out.println("Codepoint \\u" + 
                        Integer.toHexString(element.codepoint) +
                        " has the name " + (String)element.value);
 }
 

The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.

Returns:
an iterator

getName1_0Iterator

public static ValueIterator getName1_0Iterator()

Gets an iterator for character names, iterating over codepoints.

This API only gets the iterator for the older 1.0 Unicode names. For modern, most up-to-date Unicode names use getNameIterator() or for extended names use getExtendedNameIterator().

Example of use:
 ValueIterator iterator = UCharacter.getName1_0Iterator();
 ValueIterator.Element element = new ValueIterator.Element();
 while (iterator.next(element)) {
     System.out.println("Codepoint \\u" + 
                        Integer.toHexString(element.codepoint) +
                        " has the name " + (String)element.value);
 }
 

The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.

Returns:
an iterator

getExtendedNameIterator

public static ValueIterator getExtendedNameIterator()

Gets an iterator for character names, iterating over codepoints.

This API only gets the iterator for the extended names. For modern, most up-to-date Unicode names use getNameIterator() or for older 1.0 Unicode names use getName1_0Iterator().

Example of use:
 ValueIterator iterator = UCharacter.getExtendedNameIterator();
 ValueIterator.Element element = new ValueIterator.Element();
 while (iterator.next(element)) {
     System.out.println("Codepoint \\u" + 
                        Integer.toHexString(element.codepoint) +
                        " has the name " + (String)element.value);
 }
 

The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.

Returns:
an iterator

getAge

public static VersionInfo getAge(int ch)

Get the "age" of the code point.

The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.

This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.

The data is from the UCD file DerivedAge.txt.

Parameters:
ch - The code point.
Returns:
the Unicode version number

hasBinaryProperty

public static boolean hasBinaryProperty(int ch,
                                        int property)

Check a binary Unicode property for a code point.

Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.

This API is intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).

For details about the properties see http://www.unicode.org/.

For names of Unicode properties see the UCD file PropertyAliases.txt.

This API does not check the validity of the codepoint.

Important: If ICU is built with UCD files from Unicode versions below 3.2, then properties marked with "new" are either unavailable or not fully available.

Parameters:
ch - code point to test.
property - selector constant from com.ibm.icu.lang.UProperty, identifies which binary property to check.
Returns:
true or false according to the binary Unicode property value for ch. Also false if property is out of bounds or if the Unicode version does not have data for the property at all, or not for this code point.
See Also:
UProperty

isUAlphabetic

public static boolean isUAlphabetic(int ch)

Check if a code point has the Alphabetic Unicode property.

Same as UCharacter.hasBinaryProperty(ch, UProperty.ALPHABETIC).

This is different from UCharacter.isLetter(ch)!
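
Why the Alphabetic property and the letter test can disagree is easiest to see with a concrete code point. The sketch below uses the JDK's Character.isAlphabetic(int) and Character.isLetter(int) as analogs, on the assumption that the UCharacter methods draw the same distinction; the class and method names are hypothetical.

```java
// JDK-only sketch: code points that are Alphabetic without being letters.
class AlphabeticVsLetterSketch {
    static boolean isAlphabeticButNotLetter(int ch) {
        return Character.isAlphabetic(ch) && !Character.isLetter(ch);
    }

    public static void main(String[] args) {
        // U+2160 ROMAN NUMERAL ONE is a letter-number (category Nl):
        // it has the Alphabetic property but is not in a letter category.
        System.out.println(isAlphabeticButNotLetter(0x2160));
        // An ordinary letter satisfies both tests.
        System.out.println(isAlphabeticButNotLetter('A'));
    }
}
```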

Parameters:
ch - codepoint to be tested

isULowercase

public static boolean isULowercase(int ch)

Check if a code point has the Lowercase Unicode property.

Same as UCharacter.hasBinaryProperty(ch, UProperty.LOWERCASE).

This is different from UCharacter.isLowerCase(ch)!

Parameters:
ch - codepoint to be tested

isUUppercase

public static boolean isUUppercase(int ch)

Check if a code point has the Uppercase Unicode property.

Same as UCharacter.hasBinaryProperty(ch, UProperty.UPPERCASE).

This is different from UCharacter.isUpperCase(ch)!

Parameters:
ch - codepoint to be tested

isUWhiteSpace

public static boolean isUWhiteSpace(int ch)

Check if a code point has the White_Space Unicode property.

Same as UCharacter.hasBinaryProperty(ch, UProperty.WHITE_SPACE).

This is different from both UCharacter.isSpace(ch) and UCharacter.isWhiteSpace(ch)!
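
A similar gap exists in the JDK between the Unicode White_Space property and Character.isWhitespace, which may help explain the warning above. The sketch below is a JDK-only analogy and assumes nothing about UCharacter internals; it tests the property through the regular expression \p{IsWhite_Space}, and the names WhiteSpacePropertySketch and hasWhiteSpaceProperty are hypothetical.

```java
import java.util.regex.Pattern;

// JDK-only sketch: the White_Space property versus Character.isWhitespace.
class WhiteSpacePropertySketch {
    private static final Pattern WHITE_SPACE =
            Pattern.compile("\\p{IsWhite_Space}");

    // True if the code point has the Unicode White_Space binary property.
    static boolean hasWhiteSpaceProperty(int ch) {
        String s = new String(Character.toChars(ch));
        return WHITE_SPACE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // U+00A0 NO-BREAK SPACE has White_Space, but the JDK's
        // isWhitespace deliberately excludes non-breaking spaces.
        System.out.println(hasWhiteSpaceProperty(0x00A0));
        System.out.println(Character.isWhitespace(0x00A0));
    }
}
```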

Parameters:
ch - codepoint to be tested


Copyright (c) 2001 IBM Corporation and others.