java.lang.Object
  |
  +--com.ibm.icu.lang.UCharacter
The UCharacter class provides extensions to the java.lang.Character class. These extensions provide support for Unicode 3.1 properties and, together with the UTF16 class, provide support for supplementary characters (those with code points above U+FFFF).
Code points are represented in this API using ints. While it would be more convenient in Java to have a separate primitive datatype for them, ints suffice in the meantime.
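To make the int representation concrete, here is a minimal sketch of how a supplementary code point relates to its two UTF-16 code units. It uses plain arithmetic rather than any ICU4J call, so it runs without icu4j.jar; the class name CodePointSketch and its helper methods are hypothetical and are not part of UCharacter or UTF16.

```java
// Illustrative only: plain-JDK arithmetic, not the ICU4J implementation.
final class CodePointSketch {
    static final int SUPPLEMENTARY_MIN = 0x10000; // first code point above U+FFFF
    static final int MAX_CODE_POINT = 0x10FFFF;   // highest Unicode code point

    /** Combines a lead (high) and trail (low) surrogate into one int code point. */
    static int toCodePoint(char lead, char trail) {
        if (lead < 0xD800 || lead > 0xDBFF || trail < 0xDC00 || trail > 0xDFFF) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return SUPPLEMENTARY_MIN + ((lead - 0xD800) << 10) + (trail - 0xDC00);
    }

    /** Encodes an int code point as one or two UTF-16 code units. */
    static String toUtf16(int cp) {
        if (cp < 0 || cp > MAX_CODE_POINT) {
            throw new IllegalArgumentException("code point out of range");
        }
        if (cp < SUPPLEMENTARY_MIN) {
            return String.valueOf((char) cp); // fits in a single char
        }
        int offset = cp - SUPPLEMENTARY_MIN;
        char lead = (char) (0xD800 + (offset >> 10));
        char trail = (char) (0xDC00 + (offset & 0x3FF));
        return new String(new char[] {lead, trail});
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF is stored as the pair D834 DD1E.
        int clef = toCodePoint('\uD834', '\uDD1E');
        System.out.println("U+" + Integer.toHexString(clef).toUpperCase());
        System.out.println("UTF-16 length: " + toUtf16(clef).length());
    }
}
```

A char can only hold values up to U+FFFF, which is why the API takes and returns int wherever a supplementary character may appear.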
To use this class, add the jar file icu4j.jar to the class path, since it contains the data files that supply the information used by this class.
E.g. in Windows:
set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/icu4j.jar
Alternatively, copy the files uprops.dat and unames.dat from the icu4j source subdirectory $ICU4J_SRC/src/com/ibm/icu/impl/data to your class directory $ICU4J_CLASS/com/ibm/icu/impl/data.
Aside from the additions for UTF-16 support, and the updated Unicode 3.1 properties, the main differences between UCharacter and Character are:
Further detailed differences can be determined from the program com.ibm.icu.dev.test.lang.UCharacterCompare.
See Also: UCharacterCategory, UCharacterDirection
Field Summary | |
static int |
MAX_VALUE
The highest Unicode code point value (scalar value) according to the Unicode Standard. |
static int |
MIN_VALUE
The lowest Unicode code point value. |
protected static com.ibm.icu.lang.UCharacterName |
NAME_
Database storing the sets of character names |
static int |
REPLACEMENT_CHAR
Unicode value used when translating into Unicode encoding form and there is no existing character. |
static int |
SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points |
Method Summary | |
static int |
digit(int ch)
Retrieves the numeric value of a decimal digit code point. |
static int |
digit(int ch,
int radix)
Retrieves the numeric value of a decimal digit code point. |
static int |
foldCase(int ch,
boolean defaultmapping)
The given character is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if the character has no case folding equivalent, the character itself is returned. |
static java.lang.String |
foldCase(java.lang.String str,
boolean defaultmapping)
The given string is mapped to its case folding equivalent according to UnicodeData.txt and CaseFolding.txt; if any character has no case folding equivalent, the character itself is returned. |
static VersionInfo |
getAge(int ch)
Get the "age" of the code point. |
static int |
getCharFromExtendedName(java.lang.String name)
Find a Unicode character by its name or extended name and return its code point value. |
static int |
getCharFromName(java.lang.String name)
Find a Unicode code point by its most current Unicode name and return its code point value. |
static int |
getCharFromName1_0(java.lang.String name)
Find a Unicode character by its version 1.0 Unicode name and return its code point value. |
static int |
getCodePoint(char char16)
Returns the code point corresponding to the UTF16 character. |
static int |
getCodePoint(char lead,
char trail)
Returns a code point corresponding to the two UTF16 characters. |
static int |
getCombiningClass(int ch)
Gets the combining class of the argument codepoint |
static int |
getDirection(int ch)
Returns the bidirectional property of a code point. |
static java.lang.String |
getExtendedName(int ch)
Retrieves a name for a valid codepoint. |
static ValueIterator |
getExtendedNameIterator()
Gets an iterator for character names, iterating over codepoints. |
static int |
getHanNumericValue(int ch)
Return numeric value of Han code points. |
static int |
getMirror(int ch)
Maps the specified code point to a "mirror-image" code point. |
static java.lang.String |
getName(int ch)
Retrieve the most current Unicode name of the argument code point, or null if the character is unassigned or outside the range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name. |
static java.lang.String |
getName1_0(int ch)
Retrieve the earlier version 1.0 Unicode name of the argument code point, or null if the character is unassigned or outside the range UCharacter.MIN_VALUE and UCharacter.MAX_VALUE or does not have a name. |
static ValueIterator |
getName1_0Iterator()
Gets an iterator for character names, iterating over codepoints. |
static ValueIterator |
getNameIterator()
Gets an iterator for character names, iterating over codepoints. |
static int |
getNumericValue(int ch)
Returns the numeric value of the code point as a nonnegative integer. |
static int |
getType(int ch)
Returns a value indicating a code point's Unicode category. |
static RangeValueIterator |
getTypeIterator()
Gets an iterator for character types, iterating over codepoints. |
static int |
getUnicodeNumericValue(int ch)
Returns the Unicode numeric value of the code point as a nonnegative integer. |
static VersionInfo |
getUnicodeVersion()
Gets the version of Unicode data used. |
static boolean |
hasBinaryProperty(int ch,
int property)
Check a binary Unicode property for a code point. |
static boolean |
isBaseForm(int ch)
Determines whether the specified code point is of base form. |
static boolean |
isBMP(int ch)
Determines if the code point is in the BMP plane. |
static boolean |
isDefined(int ch)
Determines if a code point has a defined meaning in the up-to-date Unicode standard. |
static boolean |
isDigit(int ch)
Determines if a code point is a Java digit. |
static boolean |
isIdentifierIgnorable(int ch)
Determines if the specified code point should be regarded as an ignorable character in a Unicode identifier. |
static boolean |
isISOControl(int ch)
Determines if the specified code point is an ISO control character. |
static boolean |
isLegal(int ch)
A code point is illegal if and only if it is: out of bounds (less than 0 or greater than UCharacter.MAX_VALUE); a surrogate value (0xD800 to 0xDFFF); or a not-a-character of the form 0xxxFFFF or 0xxxFFFE. Note: legal does not mean that it is assigned in this version of Unicode. |
static boolean |
isLegal(java.lang.String str)
A string is legal iff all its code points are legal. |
static boolean |
isLetter(int ch)
Determines if the specified code point is a letter. |
static boolean |
isLetterOrDigit(int ch)
Determines if the specified code point is a letter or digit. |
static boolean |
isLowerCase(int ch)
Determines if the specified code point is a lowercase character. |
static boolean |
isMirrored(int ch)
Determines whether the code point has the "mirrored" property. |
static boolean |
isPrintable(int ch)
Determines whether the specified code point is a printable character according to the Unicode standard. |
static boolean |
isSpaceChar(int ch)
Determines if the specified code point is a Unicode-specified space character. |
static boolean |
isSupplementary(int ch)
Determines if the code point is a supplementary character. |
static boolean |
isTitleCase(int ch)
Determines if the specified code point is a titlecase character. |
static boolean |
isUAlphabetic(int ch)
Check if a code point has the Alphabetic Unicode property. |
static boolean |
isULowercase(int ch)
Check if a code point has the Lowercase Unicode property. |
static boolean |
isUnicodeIdentifierPart(int ch)
Determines if the specified code point may be any part of a Unicode identifier other than the starting character. |
static boolean |
isUnicodeIdentifierStart(int ch)
Determines if the specified code point is permissible as the first character in a Unicode identifier. |
static boolean |
isUpperCase(int ch)
Determines if the specified code point is an uppercase character. |
static boolean |
isUUppercase(int ch)
Check if a code point has the Uppercase Unicode property. |
static boolean |
isUWhiteSpace(int ch)
Check if a code point has the White_Space Unicode property. |
static boolean |
isWhitespace(int ch)
Determines if the specified code point is a white space character. |
static int |
toLowerCase(int ch)
The given code point is mapped to its lowercase equivalent; if the code point has no lowercase equivalent, the code point itself is returned. |
static java.lang.String |
toLowerCase(java.util.Locale locale,
java.lang.String str)
Gets lowercase version of the argument string. |
static java.lang.String |
toLowerCase(java.lang.String str)
Gets lowercase version of the argument string. |
static java.lang.String |
toString(int ch)
Converts argument code point and returns a String object representing the code point's value in UTF16 format. |
static int |
toTitleCase(int ch)
Converts the code point argument to titlecase. |
static java.lang.String |
toTitleCase(java.util.Locale locale,
java.lang.String str,
BreakIterator breakiter)
Gets the titlecase version of the argument string. |
static java.lang.String |
toTitleCase(java.lang.String str,
BreakIterator breakiter)
Gets the titlecase version of the argument string. |
static int |
toUpperCase(int ch)
Converts the character argument to uppercase. |
static java.lang.String |
toUpperCase(java.util.Locale locale,
java.lang.String str)
Gets uppercase version of the argument string. |
static java.lang.String |
toUpperCase(java.lang.String str)
Gets uppercase version of the argument string. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public static final int MIN_VALUE
public static final int MAX_VALUE
public static final int SUPPLEMENTARY_MIN_VALUE
public static final int REPLACEMENT_CHAR
protected static final com.ibm.icu.lang.UCharacterName NAME_
Method Detail |
public static int digit(int ch, int radix)
See java.lang.Character.digit(). Note that this will return positive values for code points for which isDigit returns false, just like java.lang.Character.
ch - the code point to query
radix - the radix
public static int digit(int ch)
An overload of digit(int, int) that provides a decimal radix.
ch - the code point to query
public static int getNumericValue(int ch)
ch - the code point to query
public static int getUnicodeNumericValue(int ch)
ch - the code point to query
public static int getType(int ch)
ch - code point whose type is to be determined
public static boolean isDefined(int ch)
ch - code point to be determined if it is defined in the most current version of Unicode
public static boolean isDigit(int ch)
See java.lang.Character.isDigit(). It returns true for decimal digits only.
ch - code point to query
public static boolean isISOControl(int ch)
ch - code point to determine if it is an ISO control character
public static boolean isLetter(int ch)
ch - code point to determine if it is a letter
public static boolean isLetterOrDigit(int ch)
ch - code point to determine if it is a letter or a digit
public static boolean isLowerCase(int ch)
ch - code point to determine if it is in lowercase
public static boolean isWhitespace(int ch)
ch - code point to determine if it is a white space
public static boolean isSpaceChar(int ch)
ch - code point to determine if it is a space
public static boolean isTitleCase(int ch)
ch - code point to determine if it is in title case
public static boolean isUnicodeIdentifierPart(int ch)
ch - code point to determine if it can be part of a Unicode identifier
public static boolean isUnicodeIdentifierStart(int ch)
ch - code point to determine if it can start a Unicode identifier
public static boolean isIdentifierIgnorable(int ch)
ch - code point to be determined if it can be ignored in a Unicode identifier.
public static boolean isUpperCase(int ch)
ch - code point to determine if it is in uppercase
public static int toLowerCase(int ch)
ch - code point whose lowercase equivalent is to be retrieved
public static java.lang.String toString(int ch)
ch - code point
public static int toTitleCase(int ch)
ch - code point whose title case is to be retrieved
public static int toUpperCase(int ch)
ch - code point whose uppercase is to be retrieved
public static boolean isSupplementary(int ch)
ch - code point to be determined if it is in the supplementary plane
public static boolean isBMP(int ch)
ch - code point to be determined if it is not a supplementary character
public static boolean isPrintable(int ch)
ch - code point to be determined if it is printable
public static boolean isBaseForm(int ch)
ch - code point to be determined if it is of base form
public static int getDirection(int ch)
ch - the code point whose direction is to be determined
public static boolean isMirrored(int ch)
ch - code point whose mirror is to be determined
public static int getMirror(int ch)
ch - code point whose mirror is to be retrieved
public static int getCombiningClass(int ch)
ch - code point whose combining class is to be retrieved
public static boolean isLegal(int ch)
ch - code point to determine if it is a legal code point by itself
public static boolean isLegal(java.lang.String str)
str - string to determine if all of its code points are legal
public static VersionInfo getUnicodeVersion()
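The legality rule given for isLegal(int) and isLegal(String) in the method summary can be sketched in plain Java as follows. This is an illustration of the three listed conditions only, not the ICU4J source: the class name LegalCodePointSketch is hypothetical, and the value 0x10FFFF used for the upper bound is an assumption standing in for UCharacter.MAX_VALUE.

```java
// Illustrative only: re-states the three conditions from the summary,
// without calling ICU4J.
final class LegalCodePointSketch {
    // Assumed stand-in for UCharacter.MAX_VALUE (highest Unicode code point).
    static final int MAX_VALUE = 0x10FFFF;

    /** Returns false for out-of-bounds values, surrogates and xxFFFE/xxFFFF. */
    static boolean isLegal(int ch) {
        if (ch < 0 || ch > MAX_VALUE) {
            return false; // out of bounds
        }
        if (ch >= 0xD800 && ch <= 0xDFFF) {
            return false; // surrogate value
        }
        if ((ch & 0xFFFE) == 0xFFFE) {
            return false; // not-a-character: low 16 bits are FFFE or FFFF
        }
        return true; // legal, though not necessarily assigned
    }

    /** A string is legal iff every code point in it is legal. */
    static boolean isLegal(String str) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i); // an unpaired surrogate is returned as-is
            if (!isLegal(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal(0x41));       // an ordinary letter
        System.out.println(isLegal(0x1FFFF));    // not-a-character in plane 1
        System.out.println(isLegal("a\uD800"));  // contains an unpaired surrogate
    }
}
```

Stepping by Character.charCount keeps a valid surrogate pair together, so only unpaired surrogates are rejected by the string overload.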
public static java.lang.String getName(int ch)
ch - the code point for which to get the name
public static java.lang.String getName1_0(int ch)
ch - the code point for which to get the name
public static java.lang.String getExtendedName(int ch)
Retrieves a name for a valid codepoint. Unlike getName(int) and getName1_0(int), this method will return a name even for codepoints that are not assigned a name in UnicodeData.txt.
The names are returned in the following order.
ch - the code point for which to get the name
public static int getCharFromName(java.lang.String name)
Find a Unicode code point by its most current Unicode name and return its code point value. All Unicode names are in uppercase.
Note: calling any method related to code point names, e.g. get*Name*(), incurs a one-time initialisation cost to construct the name tables.
name - most current Unicode character name whose code point is to be returned
public static int getCharFromName1_0(java.lang.String name)
Find a Unicode character by its version 1.0 Unicode name and return its code point value. All Unicode names are in uppercase.
Note: calling any method related to code point names, e.g. get*Name*(), incurs a one-time initialisation cost to construct the name tables.
name - Unicode 1.0 code point name whose code point is to be returned
public static int getCharFromExtendedName(java.lang.String name)
Find a Unicode character by its name or extended name and return its code point value. All Unicode names are in uppercase. Extended names are all lowercase except for numbers and are contained within angle brackets.
The names are searched in the following order.
name - codepoint name
public static int getCodePoint(char lead, char trail)
lead - the lead char
trail - the trail char
java.lang.IllegalArgumentException - thrown when argument characters do not form a valid codepoint
public static int getCodePoint(char char16)
char16 - the UTF16 character
java.lang.IllegalArgumentException - thrown when char16 is not a valid codepoint
public static java.lang.String toUpperCase(java.lang.String str)
str - source string to be converted
public static java.lang.String toLowerCase(java.lang.String str)
str - source string to be converted
public static java.lang.String toTitleCase(java.lang.String str, BreakIterator breakiter)
Gets the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so the user can customize the break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
Only positions returned by the break iterator will be title cased; characters in between the positions will all be in lower case.
Casing is dependent on the default locale and is context-sensitive.
str - source string to be converted
breakiter - break iterator to determine the positions in which the character should be title cased.
public static java.lang.String toUpperCase(java.util.Locale locale, java.lang.String str)
locale - the locale in which the string is to be converted
str - source string to be converted
public static java.lang.String toLowerCase(java.util.Locale locale, java.lang.String str)
locale - the locale in which the string is to be converted
str - source string to be converted
public static java.lang.String toTitleCase(java.util.Locale locale, java.lang.String str, BreakIterator breakiter)
Gets the titlecase version of the argument string.
Positions for titlecasing are determined by the argument break iterator, so the user can customize the break iterator for specialized titlecasing. In this case only the forward iteration needs to be implemented. If the break iterator passed in is null, the default Unicode algorithm will be used to determine the titlecase positions.
Only positions returned by the break iterator will be title cased; characters in between the positions will all be in lower case.
Casing is dependent on the argument locale and is context-sensitive.
locale - the locale in which the string is to be converted
str - source string to be converted
breakiter - break iterator to determine the positions in which the character should be title cased.
public static int foldCase(int ch, boolean defaultmapping)
ch - the character to be converted
defaultmapping - Indicates if all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped.
See Also: foldCase(String, boolean)
public static java.lang.String foldCase(java.lang.String str, boolean defaultmapping)
str - the String to be converted
defaultmapping - Indicates if all mappings defined in CaseFolding.txt are to be used; otherwise the mappings for dotted I and dotless i marked with 'I' in CaseFolding.txt will be skipped.
See Also: foldCase(int, boolean)
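As a rough illustration of why case folding is useful for caseless comparison, the sketch below approximates a simple fold with the JDK's per-code-point uppercase-then-lowercase mapping. This is an assumption-laden stand-in, not foldCase: it does not read CaseFolding.txt, it has no equivalent of the defaultmapping flag for dotted I and dotless i, and the class and method names (SimpleFoldSketch, foldSimple, caselessEquals) are hypothetical.

```java
// Illustrative only: approximates simple case folding with JDK mappings.
// It is not driven by CaseFolding.txt and is not the ICU4J implementation.
final class SimpleFoldSketch {
    /** Maps one code point to a case-neutral form (upper, then lower). */
    static int foldSimple(int cp) {
        return Character.toLowerCase(Character.toUpperCase(cp));
    }

    /** Folds a whole string, code point by code point. */
    static String foldSimple(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.appendCodePoint(foldSimple(cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    /** Compares two strings after folding both. */
    static boolean caselessEquals(String a, String b) {
        return foldSimple(a).equals(foldSimple(b));
    }

    public static void main(String[] args) {
        // Greek capital sigmas versus small sigma and final sigma:
        // folding maps all three sigma forms to the same code point.
        String upper = "\u03A3\u0391\u03A3";
        String lower = "\u03C3\u03B1\u03C2";
        System.out.println(caselessEquals(upper, lower));
    }
}
```

Going through uppercase first is what lets the final sigma (U+03C2) and the ordinary small sigma (U+03C3) compare equal; lowercasing alone would leave them distinct.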
public static int getHanNumericValue(int ch)
ch - code point to query
public static RangeValueIterator getTypeIterator()
Gets an iterator for character types, iterating over codepoints.
Example of use:
RangeValueIterator iterator = UCharacter.getTypeIterator();
RangeValueIterator.Element element = new RangeValueIterator.Element();
while (iterator.next(element)) {
    System.out.println("Codepoint \\u" + Integer.toHexString(element.start)
                       + " to codepoint \\u" + Integer.toHexString(element.limit - 1)
                       + " has the character type " + element.value);
}
public static ValueIterator getNameIterator()
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the modern, most up-to-date Unicode names. For older 1.0 Unicode names use getName1_0Iterator(), or for extended names use getExtendedNameIterator().
Example of use:
ValueIterator iterator = UCharacter.getNameIterator();
ValueIterator.Element element = new ValueIterator.Element();
while (iterator.next(element)) {
    System.out.println("Codepoint \\u" + Integer.toHexString(element.codepoint)
                       + " has the name " + (String)element.value);
}
The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
public static ValueIterator getName1_0Iterator()
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the older 1.0 Unicode names. For modern, most up-to-date Unicode names use getNameIterator(), or for extended names use getExtendedNameIterator().
Example of use:
ValueIterator iterator = UCharacter.getName1_0Iterator();
ValueIterator.Element element = new ValueIterator.Element();
while (iterator.next(element)) {
    System.out.println("Codepoint \\u" + Integer.toHexString(element.codepoint)
                       + " has the name " + (String)element.value);
}
The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
public static ValueIterator getExtendedNameIterator()
Gets an iterator for character names, iterating over codepoints.
This API only gets the iterator for the extended names. For modern, most up-to-date Unicode names use getNameIterator(), or for older 1.0 Unicode names use getName1_0Iterator().
Example of use:
ValueIterator iterator = UCharacter.getExtendedNameIterator();
ValueIterator.Element element = new ValueIterator.Element();
while (iterator.next(element)) {
    System.out.println("Codepoint \\u" + Integer.toHexString(element.codepoint)
                       + " has the name " + (String)element.value);
}
The maximal range which the name iterator iterates is from UCharacter.MIN_VALUE to UCharacter.MAX_VALUE.
public static VersionInfo getAge(int ch)
Get the "age" of the code point.
The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.
This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.
The data is from the UCD file DerivedAge.txt.
ch - The code point.
public static boolean hasBinaryProperty(int ch, int property)
Check a binary Unicode property for a code point.
Unicode, especially in version 3.2, defines many more properties than the original set in UnicodeData.txt.
This API is intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR).
For details about the properties see http://www.unicode.org/.
For names of Unicode properties see the UCD file PropertyAliases.txt.
This API does not check the validity of the codepoint.
Important: If ICU is built with UCD files from Unicode versions below 3.2, then properties marked with "new" are not available or only partially available.
ch - Code point to test.
property - selector constant from com.ibm.icu.lang.UProperty, identifies which binary property to check.
See Also: UProperty
public static boolean isUAlphabetic(int ch)
Check if a code point has the Alphabetic Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.ALPHABETIC).
This is different from UCharacter.isLetter(ch)!
ch - codepoint to be tested
public static boolean isULowercase(int ch)
Check if a code point has the Lowercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.LOWERCASE).
This is different from UCharacter.isLowerCase(ch)!
ch - codepoint to be tested
public static boolean isUUppercase(int ch)
Check if a code point has the Uppercase Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.UPPERCASE).
This is different from UCharacter.isUpperCase(ch)!
ch - codepoint to be tested
public static boolean isUWhiteSpace(int ch)
Check if a code point has the White_Space Unicode property.
Same as UCharacter.hasBinaryProperty(ch, UProperty.WHITE_SPACE).
This is different from both UCharacter.isSpaceChar(ch) and UCharacter.isWhitespace(ch)!
ch - codepoint to be tested
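To see why three separate whitespace tests can disagree, the sketch below uses JDK analogues only: java.lang.Character.isWhitespace, java.lang.Character.isSpaceChar, and the regular-expression binary property \p{IsWhite_Space}. These stand in for the UCharacter methods so the example runs without icu4j.jar; results from ICU4J itself may differ, and the class name WhiteSpaceSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative only: JDK analogues of the three whitespace notions,
// not calls into ICU4J.
final class WhiteSpaceSketch {
    // The JDK regex engine exposes the Unicode White_Space binary property.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\p{IsWhite_Space}");

    /** True if the code point has the Unicode White_Space property (per the JDK). */
    static boolean hasWhiteSpaceProperty(int cp) {
        return WHITE_SPACE.matcher(new String(Character.toChars(cp))).matches();
    }

    /** Formats the three answers for one code point. */
    static String describe(int cp) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b White_Space=%b",
                cp,
                Character.isWhitespace(cp),
                Character.isSpaceChar(cp),
                hasWhiteSpaceProperty(cp));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0009)); // CHARACTER TABULATION
        System.out.println(describe(0x00A0)); // NO-BREAK SPACE
        System.out.println(describe(0x001C)); // FILE SEPARATOR control
    }
}
```

Each of the three sample code points gets a different combination of answers, which is the kind of divergence the warning above is pointing at.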