java.lang.Object | +--com.ibm.text.UCharacter
A static class designed to be a generic code point information source that
handles surrogate pairs.
Data for code point information originates from Unicode 3.0 data files,
UnicodeData.txt and Mirror.txt, downloadable from the Unicode Consortium site
ftp://ftp.unicode.org/Public/
ICU's gennames and genprops programs are used to compact the information from
the above mentioned files before being used by this package. The binary
result files are named unames.dat and uprops.dat.
Both are packaged in the jar file released with this package, so to use this class
please add the jar file ucharacter.jar
to your class path.
E.g. on Windows: set CLASSPATH=%CLASSPATH%;$JAR_FILE_PATH/ucharacter.jar
For more information about the data file format, please refer to
Read Me.
Each code point used here is represented as a 32-bit int. This is so as to
handle supplementary code points, which need up to 21 bits.
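The following is a minimal sketch of why a code point is carried as an int rather than a char. It does not use this package; it relies only on java.lang.Character methods available since Java 5, and the class and helper names (CodePointWidthSketch, toUtf16, bitsNeeded) are hypothetical.

```java
// Sketch only: illustrates code point width with java.lang.Character,
// not with com.ibm.text.UCharacter. All names here are hypothetical.
final class CodePointWidthSketch {

    /** Returns the UTF-16 form of a code point: 1 char for BMP, 2 otherwise. */
    static String toUtf16(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    /** Returns the number of bits needed to hold a non-negative value. */
    static int bitsNeeded(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    public static void main(String[] args) {
        int bmp = 0x0041;            // LATIN CAPITAL LETTER A
        int supplementary = 0x10400; // DESERET CAPITAL LETTER LONG I

        System.out.println(toUtf16(bmp).length());           // 1
        System.out.println(toUtf16(supplementary).length()); // 2
        System.out.println(bitsNeeded(0x10FFFF));            // 21
    }
}
```

A 16-bit char cannot hold the second value, which is why every method of this class takes an int argument.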
The APIs provide an up-to-date Unicode implementation of java.lang.Character.
Difference between UCharacter and java.lang.Character
See Also: UCharacterCategory, UCharacterDirection, com.ibm.icu.test.text.UCharacterCompare, com.ibm.icu.test.text.UCharacterTest
Field Summary | |
protected static int |
LEAD_SURROGATE_SHIFT_
Shift and mask value for surrogates |
static int |
MAX_VALUE
The highest Unicode code point value (scalar value) according to the Unicode Standard. This is a 21-bit value. Up-to-date Unicode implementation of java.lang.Character.MAX_VALUE |
static int |
MIN_VALUE
The lowest Unicode code point value. |
static int |
REPLACEMENT_CHAR
Unicode value used when translating into Unicode encoding form and there is no existing character. |
static int |
SUPPLEMENTARY_MIN_VALUE
The minimum value for Supplementary code points |
protected static int |
TRAIL_SURROGATE_MASK_
|
Method Summary | |
static int |
digit(int ch)
Retrieves the decimal numeric value of a digit code point in radix 10. Note that this method, unlike java.lang.Character.digit(), does not regard the ASCII characters 'A' - 'Z' and 'a' - 'z' as digits. |
static int |
digit(int ch,
int radix)
Retrieves the decimal numeric value of a digit code point. A code point is a valid digit if the following is true: The method isDigit(ch) is true and the Unicode decimal digit value of ch is less than the specified radix. |
static int |
getCharFromName(java.lang.String name)
Find a Unicode code point by its most current Unicode name and return its code point value. Note calling any methods related to code point names, e.g. |
static int |
getCharFromName1_0(java.lang.String name)
Find a Unicode character by its version 1.0 Unicode name and return its code point value. Note calling any methods related to code point names, e.g. |
static int |
getCodePoint(char char16)
Returns the code point corresponding to the UTF16 character. If the argument char16 is a surrogate character, UCharacter.REPLACEMENT_CHAR is returned. |
static int |
getCodePoint(char lead,
char trail)
Returns a code point corresponding to the two UTF16 characters. If the argument lead is not a high surrogate character or trail is not a low surrogate character, UCharacter.REPLACEMENT_CHAR is returned. |
static byte |
getCombiningClass(int ch)
Gets the combining class of the argument codepoint |
static int |
getDirection(int ch)
Returns the bidirectional property of a code point. For example, 0x0041 (letter A) has the LEFT_TO_RIGHT directional property. The result is one of the constants in the interface UCharacterDirection. |
static int |
getMirror(int ch)
Maps the specified code point to a "mirror-image" code point. For code points with the "mirrored" property, implementations sometimes need a "poor man's" mapping to another code point such that the default glyph may serve as the mirror-image of the default glyph of the specified code point. This is useful for text conversion to and from codepages with visual order, and for displays without glyph selection capabilities. |
static java.lang.String |
getName(int ch)
Retrieve the most current Unicode name of the argument code point. Note calling any methods related to code point names, e.g. |
static java.lang.String |
getName1_0(int ch)
Retrieve the earlier version 1.0 Unicode name of the argument code point. |
static int |
getNumericValue(int ch)
Returns the Unicode numeric value of the code point as a nonnegative integer. |
protected static int |
getRawSupplementary(char lead,
char trail)
Forms a supplementary code point from the argument characters. Note that this is for internal use, hence no checks for the validity of the surrogate characters are done. |
static int |
getType(int ch)
Returns a value indicating a code point's Unicode category. Up-to-date Unicode implementation of java.lang.Character.getType() except for the above mentioned code points that had their category changed. Return results are constants from the interface UCharacterCategory |
static java.lang.String |
getUnicodeVersion()
Gets the version of Unicode data used. |
static boolean |
isBaseForm(int ch)
Determines whether the specified code point is of base form. A code point of base form does not graphically combine with preceding characters, and is neither a control nor a format character. |
static boolean |
isBMP(int ch)
Determines if the code point is in the BMP (Basic Multilingual Plane). |
static boolean |
isDefined(int ch)
Determines if a code point has a defined meaning in the up-to-date Unicode standard. E.g. |
static boolean |
isDigit(int ch)
Determines if a code point is a digit. Note that this method, unlike java.lang.Character.isDigit(), does not regard the ASCII characters 'A' - 'Z' and 'a' - 'z' as digits. |
static boolean |
isIdentifierIgnorable(int ch)
Determines if the specified code point should be regarded as an ignorable character in a Unicode identifier. A character is ignorable in the Unicode standard if it is of the type Cf, Formatting code. Up-to-date Unicode implementation of java.lang.Character.isIdentifierIgnorable(). See UTR #8. |
static boolean |
isISOControl(int ch)
Determines if the specified code point is an ISO control character. A code point is considered to be an ISO control character if it is in the range \u0000 through \u001F or in the range \u007F through \u009F. Up-to-date Unicode implementation of java.lang.Character.isISOControl() |
static boolean |
isLegal(int ch)
A code point is illegal if and only if it is one of the following: out of bounds (less than 0 or greater than UCharacter.MAX_VALUE); a surrogate value (0xD800 to 0xDFFF); not-a-character (of the form 0xxxFFFF or 0xxxFFFE). Note: legal does not mean that it is assigned in this version of Unicode. |
static boolean |
isLegal(java.lang.String str)
A string is legal iff all its code points are legal. |
static boolean |
isLetter(int ch)
Determines if the specified code point is a letter. Up-to-date Unicode implementation of java.lang.Character.isLetter() |
static boolean |
isLetterOrDigit(int ch)
Determines if the specified code point is a letter or digit. Note that this method, unlike java.lang.Character, does not regard the ASCII characters 'A' - 'Z' and 'a' - 'z' as digits. |
static boolean |
isLowerCase(int ch)
Determines if the specified code point is a lowercase character. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings. For more information about Unicode case mapping please refer to the Technical report #21. Up-to-date Unicode implementation of java.lang.Character.isLowerCase() |
static boolean |
isMirrored(int ch)
Determines whether the code point has the "mirrored" property. This property is set for characters that are commonly used in Right-To-Left contexts and need to be displayed with a "mirrored" glyph. |
static boolean |
isPrintable(int ch)
Determines whether the specified code point is a printable character according to the Unicode standard. |
static boolean |
isSpaceChar(int ch)
Determines if the specified code point is a Unicode specified space character, i.e. if the code point is in category Zs, Zl or Zp. Up-to-date Unicode implementation of java.lang.Character.isSpaceChar(). |
static boolean |
isSupplementary(int ch)
Determines if the code point is a supplementary character. A code point is a supplementary character if and only if it is greater than or equal to SUPPLEMENTARY_MIN_VALUE. |
static boolean |
isTitleCase(int ch)
Determines if the specified code point is a titlecase character. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings. For more information about Unicode case mapping please refer to the Technical report #21. Up-to-date Unicode implementation of java.lang.Character.isTitleCase(). |
static boolean |
isUnicodeIdentifierPart(int ch)
Determines if the specified code point may be any part of a Unicode identifier other than the starting character. A code point may be part of a Unicode identifier if and only if it is of one of the following types: Lu (uppercase letter), Ll (lowercase letter), Lt (titlecase letter), Lm (modifier letter), Lo (other letter), Nl (letter number), Pc (connecting punctuation character), Nd (decimal number), Mc (spacing combining mark), Mn (non-spacing mark), Cf (formatting code). Up-to-date Unicode implementation of java.lang.Character.isUnicodeIdentifierPart(). See UTR #8. |
static boolean |
isUnicodeIdentifierStart(int ch)
Determines if the specified code point is permissible as the first character in a Unicode identifier. A code point may start a Unicode identifier if it is of one of the following types: Lu (uppercase letter), Ll (lowercase letter), Lt (titlecase letter), Lm (modifier letter), Lo (other letter), Nl (letter number). Up-to-date Unicode implementation of java.lang.Character.isUnicodeIdentifierStart(). See UTR #8. |
static boolean |
isUpperCase(int ch)
Determines if the specified code point is an uppercase character. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings. For language specific case conversion behavior, use toUpperCase(locale, str). |
static boolean |
isWhitespace(int ch)
Determines if the specified code point is a white space character. A code point is considered to be a whitespace character if and only if it satisfies one of the following criteria: It is a Unicode space separator (category "Zs"), but is not a no-break space (\u00A0 or \u202F or \uFEFF). |
static int |
toLowerCase(int ch)
The given code point is mapped to its lowercase equivalent; if the code point has no lowercase equivalent, the code point itself is returned. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings. For language specific case conversion behavior, use toLowerCase(locale, str). |
static java.lang.String |
toLowerCase(java.util.Locale locale,
java.lang.String str)
Gets lowercase version of the argument string. |
static java.lang.String |
toLowerCase(java.lang.String str)
Gets lowercase version of the argument string. |
static java.lang.String |
toString(int ch)
Converts argument code point and returns a String object representing the code point's value in UTF16 format. The result is a string whose length is 1 for non-supplementary code points, 2 otherwise. com.ibm.ibm.icu.UTF16 can be used to parse Strings generated by this function. Up-to-date Unicode implementation of java.lang.Character.toString() |
static int |
toTitleCase(int ch)
Converts the code point argument to titlecase. UnicodeData only contains case mappings for code points where they are one-to-one mappings; it also omits information about context-sensitive case mappings. There are only four Unicode characters that are truly titlecase forms that are distinct from uppercase forms. |
static int |
toUpperCase(int ch)
Converts the character argument to uppercase. UnicodeData only contains case mappings for characters where they are one-to-one mappings; it also omits information about context-sensitive case mappings. For more information about Unicode case mapping please refer to the Technical report #21. If no uppercase is available, the character itself is returned. Up-to-date Unicode implementation of java.lang.Character.toUpperCase() |
static java.lang.String |
toUpperCase(java.util.Locale locale,
java.lang.String str)
Gets uppercase version of the argument string. |
static java.lang.String |
toUpperCase(java.lang.String str)
Gets uppercase version of the argument string. |
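The legality rule stated in the isLegal(int) and isLegal(java.lang.String) summaries above can be sketched as follows. This is an illustration of the three listed conditions only, not the package's source; the class name LegalCodePointSketch is hypothetical, and 0x10FFFF is assumed to be the value of UCharacter.MAX_VALUE, following the Unicode Standard.

```java
// Sketch only: the three illegality conditions listed in the isLegal summary.
// Names and the MAX_CODE_POINT value are assumptions, not taken from the package.
final class LegalCodePointSketch {

    /** Assumed to equal UCharacter.MAX_VALUE (highest Unicode code point). */
    static final int MAX_CODE_POINT = 0x10FFFF;

    static boolean isLegal(int ch) {
        if (ch < 0 || ch > MAX_CODE_POINT) {
            return false; // out of bounds
        }
        if (ch >= 0xD800 && ch <= 0xDFFF) {
            return false; // surrogate value
        }
        if ((ch & 0xFFFE) == 0xFFFE) {
            return false; // not-a-character: low 16 bits are FFFE or FFFF
        }
        return true;
    }

    /** A string is legal iff all its code points are legal. */
    static boolean isLegal(String str) {
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i); // an unpaired surrogate is returned as itself
            if (!isLegal(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal(0x0041));   // true
        System.out.println(isLegal(0xD800));   // false
        System.out.println(isLegal(0x1FFFE));  // false
        System.out.println(isLegal(0x110000)); // false
    }
}
```

As the summary notes, a legal code point is not necessarily assigned: the sketch accepts any value that passes the three checks, whether or not Unicode defines a character there.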
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
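The digit rule given in the Method Summary (a code point is a valid digit only if it is a decimal digit and its value is less than the radix) differs from java.lang.Character.digit(), which also accepts ASCII letters. A minimal sketch of that rule, built on java.lang.Character, is shown here. The class name DecimalDigitSketch is hypothetical, and returning -1 for an invalid digit is an assumption borrowed from java.lang.Character.digit(); the summary does not state what this class returns in that case.

```java
// Sketch only: the "decimal digits only" rule from the digit(ch, radix) summary.
// Built on java.lang.Character; the -1 return for invalid input is an assumption.
final class DecimalDigitSketch {

    /** True only for code points of category Nd (decimal digit number). */
    static boolean isDigit(int ch) {
        return Character.getType(ch) == Character.DECIMAL_DIGIT_NUMBER;
    }

    /** Decimal value of ch if it is a decimal digit below radix, else -1. */
    static int digit(int ch, int radix) {
        if (!isDigit(ch)) {
            return -1; // letters such as 'A' are never digits here
        }
        int value = Character.digit(ch, 10);
        return value < radix ? value : -1;
    }

    public static void main(String[] args) {
        System.out.println(digit('7', 10));           // 7
        System.out.println(digit('8', 8));            // -1
        System.out.println(digit('A', 16));           // -1
        System.out.println(Character.digit('A', 16)); // 10
    }
}
```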
Field Detail |
public static final int MIN_VALUE
public static final int MAX_VALUE
public static final int SUPPLEMENTARY_MIN_VALUE
public static final int REPLACEMENT_CHAR
protected static final int LEAD_SURROGATE_SHIFT_
protected static final int TRAIL_SURROGATE_MASK_
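The Field Detail entries above list no constant values. The sketch below fills them in from the Unicode Standard, as an assumption rather than a quotation from this package: 0x0000 and 0x10FFFF for the code point range, 0x10000 for the first supplementary code point, and U+FFFD for the replacement character. The class name PlaneSketch is hypothetical; it shows how the isBMP and isSupplementary tests follow from those values.

```java
// Sketch only: constant values are assumptions taken from the Unicode Standard,
// not copied from com.ibm.text.UCharacter.
final class PlaneSketch {

    static final int MIN_VALUE = 0x0000;
    static final int MAX_VALUE = 0x10FFFF;
    static final int SUPPLEMENTARY_MIN_VALUE = 0x10000;
    static final int REPLACEMENT_CHAR = 0xFFFD;

    /** True if ch lies in the Basic Multilingual Plane (U+0000..U+FFFF). */
    static boolean isBMP(int ch) {
        return ch >= MIN_VALUE && ch < SUPPLEMENTARY_MIN_VALUE;
    }

    /** True if ch lies in a supplementary plane (U+10000..U+10FFFF). */
    static boolean isSupplementary(int ch) {
        return ch >= SUPPLEMENTARY_MIN_VALUE && ch <= MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(isBMP(0xFFFF));            // true
        System.out.println(isSupplementary(0x10000)); // true
        System.out.println(isSupplementary(0xFFFF));  // false
    }
}
```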
Method Detail |
public static int digit(int ch, int radix)
ch - the code point whose numeric value is to be determined
radix - the radix to which the digit is to be converted
public static int digit(int ch)
ch - the code point whose numeric value is to be determined
public static int getNumericValue(int ch)
ch - Unicode code point
public static int getType(int ch)
ch - code point whose type is to be determined
public static boolean isDefined(int ch)
ch - code point to be determined if it is defined in the most current version of Unicode
public static boolean isDigit(int ch)
ch - code point to determine if it is a digit
public static boolean isISOControl(int ch)
ch - code point to determine if it is an ISO control character
public static boolean isLetter(int ch)
ch - code point to determine if it is a letter
public static boolean isLetterOrDigit(int ch)
ch - code point to determine if it is a letter or a digit
public static boolean isLowerCase(int ch)
ch - code point to determine if it is in lowercase
public static boolean isWhitespace(int ch)
ch - code point to determine if it is a white space
public static boolean isSpaceChar(int ch)
ch - code point to determine if it is a space
public static boolean isTitleCase(int ch)
ch - code point to determine if it is in title case
public static boolean isUnicodeIdentifierPart(int ch)
ch - code point to determine if it can be part of a Unicode identifier
public static boolean isUnicodeIdentifierStart(int ch)
ch - code point to determine if it can start a Unicode identifier
public static boolean isIdentifierIgnorable(int ch)
ch - code point to be determined if it can be ignored in a Unicode identifier
public static boolean isUpperCase(int ch)
ch - code point to determine if it is in uppercase
public static int toLowerCase(int ch)
ch - code point whose lowercase equivalent is to be retrieved
public static java.lang.String toString(int ch)
ch - code point
public static int toTitleCase(int ch)
ch - code point whose title case is to be retrieved
public static int toUpperCase(int ch)
ch - code point whose uppercase is to be retrieved
public static boolean isSupplementary(int ch)
ch - code point to be determined if it is in the supplementary plane
public static boolean isBMP(int ch)
ch - code point to be determined if it is not a supplementary character
public static boolean isPrintable(int ch)
ch - code point to be determined if it is printable
public static boolean isBaseForm(int ch)
ch - code point to be determined if it is of base form
public static int getDirection(int ch)
ch - the code point whose direction is to be determined
public static boolean isMirrored(int ch)
ch - code point whose mirror is to be determined
public static int getMirror(int ch)
ch - code point whose mirror is to be retrieved
public static byte getCombiningClass(int ch)
ch - code point whose combining class is to be retrieved
public static boolean isLegal(int ch)
ch - code point to determine if it is a legal code point by itself
public static boolean isLegal(java.lang.String str)
str - string to determine if all of its code points are legal
public static java.lang.String getUnicodeVersion()
public static java.lang.String getName(int ch)
ch - the code point for which to get the name
public static java.lang.String getName1_0(int ch)
ch - the code point for which to get the name
public static int getCharFromName(java.lang.String name)
name - most current Unicode character name whose code point is to be returned
public static int getCharFromName1_0(java.lang.String name)
name - Unicode 1.0 code point name whose code point is to be returned
public static int getCodePoint(char lead, char trail)
lead - the lead char
trail - the trail char
public static int getCodePoint(char char16)
char16 - the UTF16 character
java.lang.IllegalArgumentException - thrown when char16 is not a valid codepoint
public static java.lang.String toUpperCase(java.lang.String str)
str - source string to be performed on
public static java.lang.String toLowerCase(java.lang.String str)
str - source string to be performed on
public static java.lang.String toUpperCase(java.util.Locale locale, java.lang.String str)
locale - the locale in which the string is to be converted
str - source string to be performed on
public static java.lang.String toLowerCase(java.util.Locale locale, java.lang.String str)
locale - the locale in which the string is to be converted
str - source string to be performed on
protected static int getRawSupplementary(char lead, char trail)
lead - lead surrogate character
trail - trailing surrogate character
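The difference between the unchecked getRawSupplementary(lead, trail) and the checked getCodePoint(lead, trail) can be sketched with the standard UTF-16 arithmetic. This is not the package's source: the class and method names are hypothetical, and the shift of 10 and mask of 0x3FF are assumptions derived from the UTF-16 definition, standing in for LEAD_SURROGATE_SHIFT_ and TRAIL_SURROGATE_MASK_, whose values this page does not give.

```java
// Sketch only: standard UTF-16 surrogate arithmetic. The shift and mask values
// are assumptions based on the UTF-16 definition, not read from the package.
final class SurrogatePairSketch {

    static final int LEAD_SURROGATE_SHIFT = 10;    // each surrogate carries 10 bits
    static final int TRAIL_SURROGATE_MASK = 0x3FF; // low 10 bits of the trail
    static final int SUPPLEMENTARY_MIN_VALUE = 0x10000;
    static final int REPLACEMENT_CHAR = 0xFFFD;

    /** Combines a pair without validating it (caller must pass real surrogates). */
    static int rawSupplementary(char lead, char trail) {
        return SUPPLEMENTARY_MIN_VALUE
                + ((lead - 0xD800) << LEAD_SURROGATE_SHIFT)
                + (trail & TRAIL_SURROGATE_MASK);
    }

    /** Validates the pair first and falls back to U+FFFD if it is malformed. */
    static int codePoint(char lead, char trail) {
        boolean leadOk = lead >= 0xD800 && lead <= 0xDBFF;
        boolean trailOk = trail >= 0xDC00 && trail <= 0xDFFF;
        return (leadOk && trailOk) ? rawSupplementary(lead, trail) : REPLACEMENT_CHAR;
    }

    public static void main(String[] args) {
        char lead = (char) 0xD801;
        char trail = (char) 0xDC00;
        System.out.println(Integer.toHexString(codePoint(lead, trail))); // 10400
        System.out.println(Integer.toHexString(codePoint('A', trail)));  // fffd
    }
}
```

Skipping validation in the raw variant saves two range checks per call, which is presumably why it is restricted to internal use where the caller has already verified the pair.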