
UnicodeConverter Class Reference

UnicodeConverter is a C++ wrapper class for UConverter.

#include <convert.h>

List of all members.

Public Methods

 UnicodeConverter ()
 Creates a Unicode conversion object that defaults to LATIN1 <-> Unicode conversion.

 UnicodeConverter (const char* name, UErrorCode& err)
 Creates a Unicode conversion object by specifying the codepage name.

 UnicodeConverter (const UnicodeString& name, UErrorCode& err)
 Creates a UnicodeConverter object with the name specified as a Unicode string.

 UnicodeConverter (int32_t codepageNumber, UConverterPlatform platform, UErrorCode& err)
 Creates a Unicode conversion object using the codepage ID number.

 ~UnicodeConverter ()
void fromUnicodeString (char* target, int32_t& targetSize, const UnicodeString& source, UErrorCode& err) const
 Transcodes the source UnicodeString to the target string in a codepage encoding with the specified Unicode converter.

void toUnicodeString (UnicodeString& target, const char* source, int32_t sourceSize, UErrorCode& err) const
 Transcodes the source string in codepage encoding to the target string in Unicode encoding.

void fromUnicode (char*& target, const char* targetLimit, const UChar*& source, const UChar* sourceLimit, int32_t * offsets, UBool flush, UErrorCode& err)
 Transcodes an array of Unicode characters to an array of codepage characters.

void toUnicode (UChar*& target, const UChar* targetLimit, const char*& source, const char* sourceLimit, int32_t * offsets, UBool flush, UErrorCode& err)
 Converts an array of codepage characters into an array of Unicode characters.

int8_t getMaxBytesPerChar (void) const
 Returns the maximum number of bytes used by a character.

int8_t getMinBytesPerChar (void) const
 Returns the minimum byte length for characters in this codepage.

UConverterType getType (void) const
 Gets the type of conversion associated with the converter, e.g. SBCS, MBCS, DBCS, UTF8, UTF16_BE, UTF16_LE, ISO_2022, EBCDIC_STATEFUL, LATIN_1.

void getStarters (UBool starters[256], UErrorCode& err) const
 Gets the "starter" bytes for converters of type MBCS; sets err to U_ILLEGAL_ARGUMENT_ERROR if the converter is not MBCS.

void getSubstitutionChars (char* subChars, int8_t& len, UErrorCode& err) const
 Fills in the output parameter, subChars, with the substitution characters as multiple bytes.

void setSubstitutionChars (const char* subChars, int8_t len, UErrorCode& err)
 Sets the substitution chars when converting from Unicode to a codepage.

void resetState (void)
 Resets the state of stateful conversion to the default state.

const char* getName ( UErrorCode& err) const
 Gets the name of the converter (zero-terminated).

int32_t getCodepage (UErrorCode& err) const
 Gets a codepage number associated with the converter.

void getMissingCharAction (UConverterToUCallback *action, void **context) const
 Returns the action currently taken when a character from a codepage is missing, a byte sequence is illegal, etc.

void getMissingUnicodeAction (UConverterFromUCallback *action, void **context) const
 Returns the action currently taken when a Unicode character is missing, there is an unpaired surrogate, etc.

void setMissingCharAction (UConverterToUCallback newAction, void* newContext, UConverterToUCallback *oldAction, void** oldContext, UErrorCode& err)
 Sets the action taken when a character from a codepage is missing.

void setMissingUnicodeAction (UConverterFromUCallback newAction, void* newContext, UConverterFromUCallback *oldAction, void** oldContext, UErrorCode& err)
 Sets the action taken when a Unicode character is missing.

void getDisplayName (const Locale& displayLocale, UnicodeString& displayName) const
 Returns the localized name of the UnicodeConverter. If for any reason it is unavailable, the internal name will be returned instead.

UConverterPlatform getCodepagePlatform (UErrorCode& err) const
 Returns the codepage platform (the ICU-defined enum UConverterPlatform) of the UnicodeConverter.

UnicodeConverter& operator= (const UnicodeConverter& that)
UBool operator== (const UnicodeConverter& that) const
UBool operator!= (const UnicodeConverter& that) const
 UnicodeConverter (const UnicodeConverter& that)
void fixFileSeparator (UnicodeString& source) const
 Fixes the backslash character mismapping.

UBool isAmbiguous (void) const
 Determines if the converter contains ambiguous mappings of the same character or not.


Static Public Methods

const char* const* getAvailableNames (int32_t& num, UErrorCode& err)
 Returns the available names.

int32_t flushCache (void)
 Iterates through every cached converter and frees all the unused ones.


Private Methods

void printRef (void) const

Private Attributes

UConverter* myUnicodeConverter

Static Private Attributes

const char** availableConverterNames
int32_t availableConverterNamesCount


Detailed Description

UnicodeConverter is a C++ wrapper class for UConverter.

You need one UnicodeConverter object in place of one UConverter object. For details on the API and implementation of the codepage converter interface, see ucnv.h.

See also:
UConverter
Stable:

Definition at line 28 of file convert.h.


Constructor & Destructor Documentation

UnicodeConverter::UnicodeConverter ( )
 

Creates a Unicode conversion object that defaults to LATIN1 <-> Unicode conversion.

Returns:
An object Handle if successful or a NULL if the creation failed
Stable:

UnicodeConverter::UnicodeConverter ( const char * name,
UErrorCode & err )
 

Creates a Unicode conversion object by specifying the codepage name.

The name string is in ASCII format.

Parameters:
name   the pointer to a char[] object containing a codepage name. (I)
err   Error status (I/O). U_ILLEGAL_ARGUMENT_ERROR will be returned if the string is empty. If the internal program does not work correctly, for example, if there's no such codepage, U_INTERNAL_PROGRAM_ERROR will be returned.
Returns:
An object Handle if successful or a NULL if the creation failed
Stable:

UnicodeConverter::UnicodeConverter ( const UnicodeString & name,
UErrorCode & err )
 

Creates a UnicodeConverter object with the name specified as a Unicode string.

The name should be limited to the ASCII-7 alphanumerics. Dash and underscore characters are allowed for readability, but are ignored in the search.

Parameters:
name   name of the uconv table as a Unicode string (I)
err   Error status (I/O). U_ILLEGAL_ARGUMENT_ERROR will be returned if the string is empty. If the internal program does not work correctly, for example, if there's no such codepage, U_INTERNAL_PROGRAM_ERROR will be returned.
Returns:
the created Unicode converter object
Stable:
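
The name matching described above (dashes and underscores ignored) can be sketched without ICU as follows. This is only an illustration: normalizeName and sameConverterName are hypothetical helpers, not ICU functions, and because this page does not say whether the lookup also ignores case, the sketch compares case-sensitively.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of ICU): drops '-' and '_' from a converter name.
std::string normalizeName(const std::string& name) {
    std::string out;
    for (char c : name) {
        if (c != '-' && c != '_') {
            out.push_back(c);
        }
    }
    return out;
}

// Hypothetical helper: two names match if they are equal once '-' and '_' are removed.
bool sameConverterName(const std::string& a, const std::string& b) {
    return normalizeName(a) == normalizeName(b);
}
```

Under these assumptions, sameConverterName("Shift_JIS", "ShiftJIS") is true, while "ibm-943" and "ibm-949" remain distinct.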

UnicodeConverter::UnicodeConverter ( int32_t codepageNumber,
UConverterPlatform platform,
UErrorCode & err )
 

Creates a Unicode conversion object using the codepage ID number.

Parameters:
codepageNumber   a codepage number (I)
err   Error status (I/O). U_ILLEGAL_ARGUMENT_ERROR will be returned if the string is empty. If the internal program does not work correctly, for example, if there's no such codepage, U_INTERNAL_PROGRAM_ERROR will be returned.
Returns:
An object Handle if successful or a NULL if failed
Stable:

UnicodeConverter::~UnicodeConverter ( )
 

UnicodeConverter::UnicodeConverter ( const UnicodeConverter & that )
 


Member Function Documentation

void UnicodeConverter::fixFileSeparator ( UnicodeString & source ) const
 

Fixes the backslash character mismapping.

For example, in SJIS, the backslash character in the ASCII portion is also used to represent the yen currency sign. When mapping from Unicode character 0x005C, it's unclear whether to map the character back to yen or backslash in SJIS. This function will take the input buffer and replace all the yen sign characters with backslash. This is necessary when the user tries to open a file with the input buffer on Windows.

Parameters:
source   the input buffer to be fixed
Draft:
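
The replacement step described above can be sketched on a plain UTF-16 buffer. This is a minimal illustration that assumes the yen sign in question is U+00A5; replaceYenWithBackslash is a hypothetical stand-in, and the real function may act only for certain converters or handle further code points.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the replacement step: every U+00A5 (YEN SIGN)
// in the buffer becomes U+005C (the backslash).
void replaceYenWithBackslash(std::u16string& buffer) {
    for (char16_t& unit : buffer) {
        if (unit == 0x00A5) {
            unit = 0x005C;
        }
    }
}
```

After the call, a path that was decoded with yen signs as separators uses backslashes throughout, which is the form a Windows file API expects.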

int32_t UnicodeConverter::flushCache ( void ) [static]
 

Iterates through every cached converter and frees all the unused ones.

Returns:
the number of cached converters successfully deleted
Stable:

void UnicodeConverter::fromUnicode ( char *& target,
const char * targetLimit,
const UChar *& source,
const UChar * sourceLimit,
int32_t * offsets,
UBool flush,
UErrorCode & err )
 

Transcodes an array of Unicode characters to an array of codepage characters.

The source pointer is an I/O parameter. It starts out pointing at the place to begin translating, and ends up pointing after the first semantically invalid sequence that it encounters. If setMissingUnicodeAction is called with an action other than STOP before a call is made to this API, consumed and source should point to the same place (unless target ends with an incomplete sequence of bytes and flush is FALSE).

Parameters:
target   I/O parameter. Input: points to the beginning of the buffer to copy codepage characters to. Output: points to after the last codepage character copied to target.
targetLimit   the pointer to the end of the target array
source   the source Unicode character array
sourceLimit   the pointer to the end of the source array
flush   TRUE if the buffer is the last buffer and the conversion will finish in this call, FALSE otherwise. (future feature pending)
err   the error status. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null.
Draft:
backslash versus Yen sign in shift-JIS
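
The in/out pointer convention used by fromUnicode (target and source are advanced in place, the limits mark the end of each array) can be sketched with a toy encoder. The code below does not call ICU: Status, latin1FromUnicode and encodeInChunks are hypothetical names, the encoder handles only Latin-1 and substitutes '?' for anything else, and the overflow status merely stands in for whatever error code the real converter reports when the target is full.

```cpp
#include <cassert>
#include <string>

// Hypothetical status codes standing in for UErrorCode values.
enum class Status { Ok, BufferOverflow };

// Hypothetical Latin-1 encoder with the same pointer-pair shape as fromUnicode:
// target and source are advanced in place; the limits are one past the end.
Status latin1FromUnicode(char*& target, const char* targetLimit,
                         const char16_t*& source, const char16_t* sourceLimit) {
    while (source < sourceLimit) {
        if (target >= targetLimit) {
            return Status::BufferOverflow;  // caller drains target and calls again
        }
        char16_t unit = *source++;
        // Code units above 0xFF have no Latin-1 byte; substitute '?'.
        *target++ = (unit <= 0xFF) ? static_cast<char>(unit) : '?';
    }
    return Status::Ok;
}

// Drives the encoder with a deliberately small buffer to show chunked use.
std::string encodeInChunks(const std::u16string& text) {
    std::string result;
    char buffer[4];
    const char16_t* source = text.data();
    const char16_t* sourceLimit = source + text.size();
    Status status;
    do {
        char* target = buffer;
        status = latin1FromUnicode(target, buffer + sizeof(buffer), source, sourceLimit);
        result.append(buffer, static_cast<std::size_t>(target - buffer));  // bytes written this round
    } while (status == Status::BufferOverflow);
    return result;
}
```

Because source is passed by reference, each call resumes exactly where the previous one stopped, so the caller only has to reset target to the start of its buffer between rounds.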

void UnicodeConverter::fromUnicodeString ( char * target,
int32_t & targetSize,
const UnicodeString & source,
UErrorCode & err ) const
 

Transcodes the source UnicodeString to the target string in a codepage encoding with the specified Unicode converter.

For example, if a Unicode to/from JIS converter is specified, the source string in Unicode will be transcoded to JIS encoding. The result will be stored in JIS encoding.

Parameters:
source   the source Unicode string
target   the target string in codepage encoding
targetSize   I/O parameter. Input: the number of bytes available in the "target" buffer. Output: the number of bytes copied to it.
err   the error status code. U_MEMORY_ALLOCATION_ERROR will be returned if the internal process buffer cannot be allocated for transcoding. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null or the source or target string is empty.
Draft:
backslash versus Yen sign in shift-JIS

const char *const * UnicodeConverter::getAvailableNames ( int32_t & num,
UErrorCode & err ) [static]
 

Returns the available names.

Lazily evaluated; the library owns the storage.

Parameters:
num   the number of available converters
err   the error code status
Returns:
the name array
Stable:

int32_t UnicodeConverter::getCodepage ( UErrorCode & err ) const
 

Gets a codepage number associated with the converter.

This is not guaranteed to be the one used to create the converter. Some converters do not represent IBM registered codepages and return zero for the codepage number. The error code fill-in parameter indicates if the codepage number is available.

Parameters:
err   the error status code. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null or if the converter's data table is null.
Returns:
If any error occurs, null will be returned.
Stable:

UConverterPlatform UnicodeConverter::getCodepagePlatform ( UErrorCode & err ) const
 

Returns the codepage platform (the ICU-defined enum UConverterPlatform) of the UnicodeConverter.

Parameters:
err   the error code status
Returns:
the codepage's platform
Stable:

void UnicodeConverter::getDisplayName ( const Locale & displayLocale,
UnicodeString & displayName ) const
 

Returns the localized name of the UnicodeConverter. If for any reason it is unavailable, the internal name will be returned instead.

Parameters:
displayLocale   the valid Locale for which the name is to be localized
displayName   a UnicodeString that is going to be filled in.
Stable:

int8_t UnicodeConverter::getMaxBytesPerChar ( void ) const
 

Returns the maximum number of bytes used by a character.

This varies between 1 and 4.

Returns:
the maximum number of bytes per codepage character
Stable:

int8_t UnicodeConverter::getMinBytesPerChar ( void ) const
 

Returns the minimum byte length for characters in this codepage.

This is either 1 or 2 for all supported codepages.

Returns:
the minimum number of bytes per codepage character
Stable:

void UnicodeConverter::getMissingCharAction ( UConverterToUCallback * action,
void ** context ) const
 

Returns the action currently taken when a character from a codepage is missing, a byte sequence is illegal, etc.

Parameters:
action   the callback function pointer
context   the callback function state
Stable:

void UnicodeConverter::getMissingUnicodeAction ( UConverterFromUCallback * action,
void ** context ) const
 

Returns the action currently taken when a Unicode character is missing, there is an unpaired surrogate, etc.

Parameters:
action   the callback function pointer
context   the callback function state
Stable:

const char * UnicodeConverter::getName ( UErrorCode & err ) const
 

Gets the name of the converter (zero-terminated).

The name will be the internal name of the converter.

Parameters:
converter   the Unicode converter
err   the error status code. U_INDEX_OUTOFBOUNDS_ERROR if the converterNameLen is too small to contain the name.
Stable:

void UnicodeConverter::getStarters ( UBool starters[256],
UErrorCode & err ) const
 

Gets the "starter" bytes for converters of type MBCS; sets err to U_ILLEGAL_ARGUMENT_ERROR if the converter is not MBCS.

Fills in an array of booleans, using the value of each byte as the index into the array. On return, if TRUE is found at offset 0x20, it means that the byte 0x20 is a starter byte in this converter.

Parameters:
starters   an array of size 256 to be filled in
err   the error status code. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is not MBCS.
See also:
ucnv_getType
Stable:
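
To show how a table indexed by byte value might be consumed, here is a minimal sketch that does not call ICU. It assumes that a "starter" byte is the lead byte of a two-byte character; fillExampleStarters and countCharacters are hypothetical, and the ranges 0x81-0x9F and 0xE0-0xFC are an assumed Shift-JIS-like layout chosen purely for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for getStarters: marks assumed lead-byte ranges
// (0x81-0x9F and 0xE0-0xFC) of a Shift-JIS-like double-byte codepage.
void fillExampleStarters(bool starters[256]) {
    for (int b = 0; b < 256; ++b) {
        starters[b] = (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }
}

// Counts characters, assuming every starter byte begins a two-byte character
// and every other byte is a complete single-byte character.
std::size_t countCharacters(const unsigned char* bytes, std::size_t length,
                            const bool starters[256]) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < length) {
        i += starters[bytes[i]] ? 2 : 1;  // index the table by the byte value
        ++count;
    }
    return count;
}
```

With this table, the four bytes 0x41 0x82 0xA0 0x20 are counted as three characters, because 0x82 is marked as a starter and consumes the byte that follows it.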

void UnicodeConverter::getSubstitutionChars ( char * subChars,
int8_t & len,
UErrorCode & err ) const
 

Fills in the output parameter, subChars, with the substitution characters as multiple bytes.

Parameters:
subChars   the substitution characters
len   the number of bytes of the substitution character array
err   the error status code. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null. If the substitution character array is too small, a U_INDEX_OUTOFBOUNDS_ERROR will be returned.
Stable:

UConverterType UnicodeConverter::getType ( void ) const
 

Gets the type of conversion associated with the converter, e.g. SBCS, MBCS, DBCS, UTF8, UTF16_BE, UTF16_LE, ISO_2022, EBCDIC_STATEFUL, LATIN_1.

Returns:
the type of the converter
Stable:

UBool UnicodeConverter::isAmbiguous ( void ) const
 

Determines if the converter contains ambiguous mappings of the same character or not.

Returns:
TRUE if the converter contains ambiguous mapping of the same character, FALSE otherwise.
Draft:

UBool UnicodeConverter::operator!= ( const UnicodeConverter & that ) const
 

UnicodeConverter& UnicodeConverter::operator= ( const UnicodeConverter & that )
 

UBool UnicodeConverter::operator== ( const UnicodeConverter & that ) const
 

void UnicodeConverter::printRef ( void ) const [private]
 

void UnicodeConverter::resetState ( void )
 

Resets the state of stateful conversion to the default state.

This is used in the case of error to restart a conversion from a known default state.

Stable:

void UnicodeConverter::setMissingCharAction ( UConverterToUCallback newAction,
void * newContext,
UConverterToUCallback * oldAction,
void ** oldContext,
UErrorCode & err )
 

Sets the action taken when a character from a codepage is missing.

(Currently STOP or SUBSTITUTE).

Parameters:
newAction   the action constant if an equivalent codepage character is missing
newContext   the new toUnicode callback function state
oldAction   the original action constant, saved for later restoration.
oldContext   the old toUnicode callback function state
err   the error status code
Stable:
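
The oldAction and oldContext out-parameters make it possible to install a callback temporarily and restore the previous one afterwards. The sketch below shows only that swap pattern: Callback, FakeConverter, stopAction and substituteAction are hypothetical stand-ins, not ICU types or callbacks, and error handling is omitted.

```cpp
#include <cassert>

// Hypothetical callback type; only the swap pattern matters here.
using Callback = int (*)(void* context);

// Hypothetical converter holding one callback and its context.
struct FakeConverter {
    Callback action = nullptr;
    void* context = nullptr;

    // Mirrors the shape of setMissingCharAction: installs the new pair and
    // hands back the previous pair through the out-parameters.
    void setAction(Callback newAction, void* newContext,
                   Callback* oldAction, void** oldContext) {
        if (oldAction) *oldAction = action;
        if (oldContext) *oldContext = context;
        action = newAction;
        context = newContext;
    }
};

// Two placeholder actions, distinguishable by their return value.
int stopAction(void*) { return 0; }
int substituteAction(void*) { return 1; }
```

A caller would keep the values written to oldAction and oldContext and pass them back as the new pair once the temporary behaviour is no longer needed.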

void UnicodeConverter::setMissingUnicodeAction ( UConverterFromUCallback newAction,
void * newContext,
UConverterFromUCallback * oldAction,
void ** oldContext,
UErrorCode & err )
 

Sets the action taken when a Unicode character is missing.

(Currently T_UnicodeConverter_MissingUnicodeAction is either STOP or SUBSTITUTE; SKIP, CLOSEST_MATCH, and ESCAPE_SEQ may be added in the future.)

Parameters:
newAction   the action constant if an equivalent Unicode character is missing
newContext   the new fromUnicode callback function state
oldAction   the original action constant, saved for later restoration.
oldContext   the old fromUnicode callback function state
err   the error status code
Stable:

void UnicodeConverter::setSubstitutionChars ( const char * subChars,
int8_t len,
UErrorCode & err )
 

Sets the substitution chars when converting from Unicode to a codepage.

The substitution is specified as a string of 1-4 bytes, and may contain a null byte. The fill-in parameter err will get the error status on return.

Parameters:
subChars   the substitution character array to be set
len   the number of bytes in the substitution character array
err   the error status code. U_ILLEGAL_ARGUMENT_ERROR if the converter is null, or if the number of bytes provided is not in the codepage's range (e.g. length 1 for UCS-2).
Stable:
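
The length rule described above (1-4 bytes, within the codepage's byte range, null bytes allowed) can be sketched as a small validation routine. Nothing here is ICU code: SubstitutionState and setSubstitution are hypothetical, and a false return merely stands in for U_ILLEGAL_ARGUMENT_ERROR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical converter state; the field names are illustrative only.
struct SubstitutionState {
    std::int8_t minBytesPerChar;
    std::int8_t maxBytesPerChar;
    char subChars[4];
    std::int8_t subCharLen;
};

// Returns true and stores the bytes if len fits the codepage's byte range;
// returns false (standing in for U_ILLEGAL_ARGUMENT_ERROR) otherwise.
bool setSubstitution(SubstitutionState& state, const char* subChars, std::int8_t len) {
    if (subChars == nullptr || len < 1 || len > 4 ||
        len < state.minBytesPerChar || len > state.maxBytesPerChar) {
        return false;
    }
    // memcpy rather than strcpy: the substitution may contain a null byte.
    std::memcpy(state.subChars, subChars, static_cast<std::size_t>(len));
    state.subCharLen = len;
    return true;
}
```

For a codepage whose characters are always two bytes wide, a two-byte substitution is accepted and a one-byte substitution is rejected, which matches the "length 1 for UCS-2" case mentioned above.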

void UnicodeConverter::toUnicode ( UChar *& target,
const UChar * targetLimit,
const char *& source,
const char * sourceLimit,
int32_t * offsets,
UBool flush,
UErrorCode & err )
 

Converts an array of codepage characters into an array of Unicode characters.

The source pointer is an I/O parameter. It starts out pointing at the place to begin translating, and ends up pointing after the first sequence of bytes that it encounters that is semantically invalid. If setMissingCharAction is called with an action other than STOP before a call is made to this API, consumed and source should point to the same place (unless target ends with an incomplete sequence of bytes and flush is FALSE).

Parameters:
target   I/O parameter. Input: points to the beginning of the buffer to copy Unicode characters to. Output: points to after the last UChar copied to target.
targetLimit   the pointer to the end of the target array
source   the source codepage character array
sourceLimit   the pointer to the end of the source array
flush   TRUE if the buffer is the last buffer and the conversion will finish in this call, FALSE otherwise. (future feature pending)
err   the error code status. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null, targetLimit < target, or sourceLimit < source.
Stable:

void UnicodeConverter::toUnicodeString ( UnicodeString & target,
const char * source,
int32_t sourceSize,
UErrorCode & err ) const
 

Transcodes the source string in codepage encoding to the target string in Unicode encoding.

For example, if a Unicode to/from JIS converter is specified, the source string in JIS encoding will be transcoded to Unicode encoding. The result will be stored in Unicode encoding.

Parameters:
source   the source string in codepage encoding
target   the target string in Unicode encoding
sourceSize   the number of bytes in the source string
err   the error status code. U_MEMORY_ALLOCATION_ERROR will be returned if the internal process buffer cannot be allocated for transcoding. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null or the source or target string is empty.
Stable:


Member Data Documentation

const char ** UnicodeConverter::availableConverterNames [static, private]
 

Definition at line 37 of file convert.h.

int32_t UnicodeConverter::availableConverterNamesCount [static, private]
 

Definition at line 38 of file convert.h.

UConverter * UnicodeConverter::myUnicodeConverter [private]
 

Definition at line 32 of file convert.h.


The documentation for this class was generated from the following file:
convert.h
Generated at Fri Dec 15 12:13:59 2000 for ICU 1.7 by doxygen1.2.3 written by Dimitri van Heesch, © 1997-2000