java.lang.Object
  |
  +--com.ibm.text.UTF16
Standalone utility class providing UTF-16 character conversions and indexing conversions.
Code that uses strings alone rarely needs modification.
By design, UTF-16 does not allow overlap, so searching for strings is a safe
operation. Similarly, concatenation is always safe. Substringing is safe if
the start and end are both on UTF-32 boundaries. In normal code, the values
for start and end are on those boundaries, since they arose from operations
like searching. If not, the nearest UTF-32 boundaries can be determined
using bounds().
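As a minimal sketch of the substring rule above, the following uses only java.lang.Character rather than this class. The names snapToCodePointStart and safeSubstring are hypothetical helpers for illustration and are not part of UTF16. An index is moved back by one when it falls between a lead and a trail surrogate, so that both ends of the substring sit on UTF-32 boundaries.

```java
final class SubstringBoundarySketch {
    /** Returns index, moved back by one if it falls inside a surrogate pair. */
    static int snapToCodePointStart(String s, int index) {
        if (index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index))) {
            return index - 1; // index pointed at the trail half of a pair
        }
        return index;
    }

    /** Substring whose start and end are both snapped to code point boundaries. */
    static String safeSubstring(String s, int start, int end) {
        return s.substring(snapToCodePointStart(s, start),
                           snapToCodePointStart(s, end));
    }

    public static void main(String[] args) {
        // 'a', then U+10000 as a surrogate pair, then 'b'
        String s = "a\uD800\uDC00b";
        // An end index of 2 would split the pair, so it is snapped back to 1.
        System.out.println(safeSubstring(s, 0, 2));
    }
}
```

Snapping both ends backwards is only one possible policy; a caller that prefers to include the whole character could move the end index forwards instead.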
Examples:
The following examples illustrate use of some of these methods.
    // iteration forwards: Original
    for (int i = 0; i < s.length(); ++i) {
        char ch = s.charAt(i);
        doSomethingWith(ch);
    }

    // iteration forwards: Changes for UTF-32
    int ch;
    for (int i = 0; i < s.length(); i += UTF16.getCharCount(ch)) {
        ch = UTF16.charAt(s, i);
        doSomethingWith(ch);
    }

    // iteration backwards: Original
    for (int i = s.length() - 1; i >= 0; --i) {
        char ch = s.charAt(i);
        doSomethingWith(ch);
    }

    // iteration backwards: Changes for UTF-32
    int ch;
    // i >= 0 so that the character at index 0 is also visited
    for (int i = s.length() - 1; i >= 0; i -= UTF16.getCharCount(ch)) {
        ch = UTF16.charAt(s, i);
        doSomethingWith(ch);
    }

Notes:
Lead and Trail are the names used in the API for what are otherwise called
High and Low surrogates, which gives a better sense of their ordering in a
string. offset16 and offset32 are used to distinguish offsets to UTF-16
boundaries from offsets to UTF-32 boundaries. An int char32 is used to
contain UTF-32 characters, as opposed to a char16, which is a UTF-16 code unit.
A UTF-16 offset can be round-tripped to a UTF-32 offset and back only if
bounds(string, offset16) != TRAIL_SURROGATE_BOUNDARY.
UCharacter.isLegal() can be used to check for validity if desired.
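The relationship between UTF-16 offsets and UTF-32 offsets can be illustrated with a short sketch. It relies on java.lang.String.offsetByCodePoints and codePointCount as stand-ins, on the assumption that they behave like findOffsetFromCodePoint and findCodePointOffset for well-formed input. The helper names toOffset16, toOffset32 and roundTrips16 are hypothetical.

```java
final class OffsetRoundTripSketch {
    /** UTF-16 offset of the code point at code point offset offset32. */
    static int toOffset16(String s, int offset32) {
        return s.offsetByCodePoints(0, offset32);
    }

    /** Code point offset of the first code point boundary at or after offset16. */
    static int toOffset32(String s, int offset16) {
        // A lead surrogate cut off from its trail counts as one code point,
        // so an offset inside a pair maps to the boundary after the pair.
        return s.codePointCount(0, offset16);
    }

    /** True if offset16 survives conversion to a code point offset and back. */
    static boolean roundTrips16(String s, int offset16) {
        return toOffset16(s, toOffset32(s, offset16)) == offset16;
    }

    public static void main(String[] args) {
        // 'a', then U+10000 as a surrogate pair, then 'b'
        String s = "a\uD800\uDC00b";
        for (int i = 0; i <= s.length(); i++) {
            System.out.println(i + " round trips: " + roundTrips16(s, i));
        }
    }
}
```

In this string only offset 2, which points at the trail surrogate, fails to round trip. Every code point offset converts to a UTF-16 offset and back unchanged.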
Inner Class Summary
static class | UTF16.StringComparator | Compare strings using Unicode code point order, instead of UTF-16 code unit order.
Field Summary
static int | LEAD_SURROGATE_BOUNDARY | Value returned in bounds().
static int | SINGLE_CHAR_BOUNDARY | Value returned in bounds().
static int | TRAIL_SURROGATE_BOUNDARY | Value returned in bounds().
Method Summary
static java.lang.StringBuffer | append(java.lang.StringBuffer target, int char32) | Append a single UTF-32 value to the end of a StringBuffer.
static int | bounds(java.lang.String source, int offset16) | Returns the type of the boundaries around the char at offset16.
static int | boundsAtCodePointOffset(java.lang.String source, int offset32) | Returns the type of the boundaries around the char at offset32.
static int | charAt(java.lang.String source, int offset16) | Extract a single UTF-32 value from a string.
static int | charAtCodePointOffset(java.lang.String source, int offset32) | Extract a single UTF-32 value from a string.
static int | countCodePoint(java.lang.String s) | Number of code points in a UTF-16 String.
static int | findCodePointOffset(java.lang.String source, int offset16) | Returns the UTF-32 offset corresponding to the first UTF-32 boundary at or after the given UTF-16 offset.
static int | findOffsetFromCodePoint(java.lang.String source, int offset32) | Returns the UTF-16 offset that corresponds to a UTF-32 offset.
static int | getCharCount(int char32) | Determines how many chars this char32 requires.
static int | getLeadSurrogate(int char32) | Returns the lead surrogate.
static int | getTrailSurrogate(int char32) | Returns the trail surrogate.
static boolean | isLeadSurrogate(char char16) | Determines whether the character is a lead surrogate.
static boolean | isSurrogate(char char16) | Determines whether the code value is a surrogate.
static boolean | isTrailSurrogate(char char16) | Determines whether the character is a trail surrogate.
static void | setCharAt(java.lang.StringBuffer source, int offset16, int char32) | Set a code point into a UTF-16 position.
static void | setCharAtCodePointOffset(java.lang.StringBuffer str, int offset32, int char32) | Sets a code point into a UTF-32 position.
static java.lang.String | valueOf(int char32) | Convenience method corresponding to String.valueOf(char).
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail

public static final int SINGLE_CHAR_BOUNDARY
Value returned in bounds(). The values are chosen so that each one encodes
the position of the character as the range
[offset16 - (value >> 2), offset16 + (value & 3)]

public static final int LEAD_SURROGATE_BOUNDARY
Value returned in bounds(). The values are chosen so that each one encodes
the position of the character as the range
[offset16 - (value >> 2), offset16 + (value & 3)]

public static final int TRAIL_SURROGATE_BOUNDARY
Value returned in bounds(). The values are chosen so that each one encodes
the position of the character as the range
[offset16 - (value >> 2), offset16 + (value & 3)]

Method Detail
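To make the boundary encoding described under Field Detail concrete, here is a hedged sketch of a bounds()-style classifier. The numeric values 1, 2 and 5 are an assumption: the page does not list them, and they are chosen here only because they satisfy the range formula [offset16 - (value >> 2), offset16 + (value & 3)]. The class BoundsSketch and its helpers start and end are hypothetical, and the logic is not taken from the UTF16 implementation.

```java
final class BoundsSketch {
    // Assumed values, picked so that (value >> 2) is the distance back to the
    // start of the character and (value & 3) is the distance to its end.
    static final int SINGLE_CHAR_BOUNDARY = 1;     // [offset16,     offset16 + 1]
    static final int LEAD_SURROGATE_BOUNDARY = 2;  // [offset16,     offset16 + 2]
    static final int TRAIL_SURROGATE_BOUNDARY = 5; // [offset16 - 1, offset16 + 1]

    /** Classifies the code unit at offset16; unpaired surrogates count as single. */
    static int bounds(String source, int offset16) {
        char ch = source.charAt(offset16); // throws if offset16 is out of bounds
        if (Character.isHighSurrogate(ch)) {
            if (offset16 + 1 < source.length()
                    && Character.isLowSurrogate(source.charAt(offset16 + 1))) {
                return LEAD_SURROGATE_BOUNDARY;
            }
        } else if (Character.isLowSurrogate(ch)) {
            if (offset16 > 0
                    && Character.isHighSurrogate(source.charAt(offset16 - 1))) {
                return TRAIL_SURROGATE_BOUNDARY;
            }
        }
        return SINGLE_CHAR_BOUNDARY;
    }

    /** Start of the character containing offset16, decoded from a bounds value. */
    static int start(int offset16, int value) {
        return offset16 - (value >> 2);
    }

    /** End (exclusive) of the character containing offset16. */
    static int end(int offset16, int value) {
        return offset16 + (value & 3);
    }
}
```

With these assumed values, a caller can recover the full extent of a character from any offset inside it without a second lookup, which is the property the field descriptions refer to.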
public static int charAt(java.lang.String source, int offset16)
Extract a single UTF-32 value from a string. Can be used for iteration
(with UTF16.getCharCount()), as well as random access. If a validity
check is required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is not
found, the incomplete character will be returned.
Parameters:
    source - array of UTF-16 chars
    offset16 - UTF-16 offset to the start of the character.
Returns:
    the UTF-32 value that contains the char at offset16. The boundaries of that code point are the same as in bounds32().

public static int charAtCodePointOffset(java.lang.String source, int offset32)
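Extract a single UTF-32 value from a string. If a validity check is
required, use UCharacter.isLegal() on the return value.
If the char retrieved is part of a surrogate pair, its supplementary
character will be returned. If a complete supplementary character is not
found, the incomplete character will be returned.
Parameters:
    source - array of UTF-16 chars
    offset32 - UTF-32 offset to the start of the character.
Returns:
    the UTF-32 value at offset32. The boundaries of that code point are the same as in bounds32().

public static int getCharCount(int char32)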
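Determines how many chars this char32 requires. If a validity check is
required, use isLegal() on char32 before calling.
Parameters:
    char32 - the input character.

public static int bounds(java.lang.String source, int offset16)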
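Returns the type of the boundaries around the char at offset16, as one of
SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY or TRAIL_SURROGATE_BOUNDARY.
Parameters:
    source - text to analyse
    offset16 - UTF-16 offset
Throws:
    java.lang.StringIndexOutOfBoundsException - if offset16 is out of bounds.

public static int boundsAtCodePointOffset(java.lang.String source, int offset32)
Returns the type of the boundaries around the char at offset32.
Parameters:
    source - text to analyse
    offset32 - UTF-32 offset
Throws:
    java.lang.StringIndexOutOfBoundsException - if offset32 is out of bounds.

public static boolean isSurrogate(char char16)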
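Determines whether the code value is a surrogate.
Parameters:
    char16 - the input character.

public static boolean isTrailSurrogate(char char16)
Determines whether the character is a trail surrogate.
Parameters:
    char16 - the input character.

public static boolean isLeadSurrogate(char char16)
Determines whether the character is a lead surrogate.
Parameters:
    char16 - the input character.

public static int getLeadSurrogate(int char32)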
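Returns the lead surrogate. If a validity check is required, use
isLegal() on char32 before calling.
Parameters:
    char32 - the input character.

public static int getTrailSurrogate(int char32)
Returns the trail surrogate. If a validity check is required, use
isLegal() on char32 before calling.
Parameters:
    char32 - the input character.

public static java.lang.String valueOf(int char32)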
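Convenience method corresponding to String.valueOf(char).
Parameters:
    char32 - the input character.

public static int findOffsetFromCodePoint(java.lang.String source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset.
Parameters:
    source - the UTF-16 string
    offset32 - UTF-32 offset
Throws:
    java.lang.StringIndexOutOfBoundsException - if offset32 is out of bounds.

public static int findCodePointOffset(java.lang.String source, int offset16)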
Returns the UTF-32 offset corresponding to the first UTF-32 boundary at or
after the given UTF-16 offset.
To find the UTF-32 length of a string, use:
    len32 = UTF16.findCodePointOffset(source, source.length());
Parameters:
    source - text to analyse
    offset16 - UTF-16 offset < source text length.
Throws:
    java.lang.StringIndexOutOfBoundsException - if offset16 is out of bounds.

public static java.lang.StringBuffer append(java.lang.StringBuffer target, int char32)
Append a single UTF-32 value to the end of a StringBuffer.
Parameters:
    char32 - value to append. If out of bounds, substitutes UTF32.REPLACEMENT_CHAR.

public static int countCodePoint(java.lang.String s)
Number of code points in a UTF-16 String.
Parameters:
    s - UTF-16 string

public static void setCharAtCodePointOffset(java.lang.StringBuffer str, int offset32, int char32)
Sets a code point into a UTF-32 position.
Parameters:
    str - string buffer
    offset32 - UTF-32 position to insert into
    char32 - code point

public static void setCharAt(java.lang.StringBuffer source, int offset16, int char32)
Set a code point into a UTF-16 position.
Parameters:
    source - string buffer
    offset16 - UTF-16 position to insert into
    char32 - code point
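
The arithmetic behind getCharCount(), getLeadSurrogate(), getTrailSurrogate() and valueOf() can be sketched from the UTF-16 encoding rules. The class below is an illustration only, not the UTF16 source. Returning 0 from the surrogate getters for a value that needs a single char is an assumption of this sketch, since the page does not state the return value for that case.

```java
final class SurrogateMathSketch {
    private static final int SUPPLEMENTARY_MIN = 0x10000;

    /** Number of UTF-16 code units needed for char32 (no validity check). */
    static int getCharCount(int char32) {
        return char32 < SUPPLEMENTARY_MIN ? 1 : 2;
    }

    /** Lead surrogate of char32, or 0 (assumed) if char32 fits in one char. */
    static int getLeadSurrogate(int char32) {
        if (char32 < SUPPLEMENTARY_MIN) {
            return 0;
        }
        // The top 10 bits of the 20-bit offset select the lead surrogate.
        return 0xD800 + ((char32 - SUPPLEMENTARY_MIN) >> 10);
    }

    /** Trail surrogate of char32, or 0 (assumed) if char32 fits in one char. */
    static int getTrailSurrogate(int char32) {
        if (char32 < SUPPLEMENTARY_MIN) {
            return 0;
        }
        // The low 10 bits of the 20-bit offset select the trail surrogate.
        return 0xDC00 + ((char32 - SUPPLEMENTARY_MIN) & 0x3FF);
    }

    /** String holding the one or two chars that encode char32. */
    static String valueOf(int char32) {
        if (getCharCount(char32) == 1) {
            return String.valueOf((char) char32);
        }
        return "" + (char) getLeadSurrogate(char32) + (char) getTrailSurrogate(char32);
    }

    public static void main(String[] args) {
        int char32 = 0x1F600;
        System.out.println(Integer.toHexString(getLeadSurrogate(char32)));
        System.out.println(Integer.toHexString(getTrailSurrogate(char32)));
    }
}
```

For valid supplementary code points, the results agree with java.lang.Character.highSurrogate and Character.lowSurrogate, which gives a convenient cross-check when porting code away from this class.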