ifx_gl_cv_mconv - convert characters from one codeset to another
SYNOPSIS
#include <ifxgls.h>
int ifx_gl_cv_mconv(conv_state_t *state,
gl_mchar_t **dst, int *dstbytesleft, char *dstcodeset,
gl_mchar_t **src, int *srcbytesleft, char *srccodeset)
DESCRIPTION
This function converts the string of characters in *src
into the same characters, but encoded in another codeset, and stores
the result in the buffer pointed to by *dst.
The codesets, srccodeset and dstcodeset, may be locale
specifiers (for example, "de_de.8859-1" or "ja_jp.ujis") or code set
names found in the file $INFORMIXDIR/gls/cmX/registry (for
example, "8859-1" or "ujis"). The macro, IFX_GL_PROC_CS, may be passed as
srccodeset or dstcodeset to specify the code set of the
current processing locale. Depending on the context, the value of IFX_GL_PROC_CS is based on either
the client's environment or the database that the server is currently
accessing.
Source Buffer
The src argument points to a pointer to the first character in
the source buffer and srcbytesleft points to the number of bytes in
*src to convert.
The pointer pointed to by src is updated to point to the byte
following the last source character successfully converted.
The integer pointed to by srcbytesleft is updated to be the number of
bytes in *src that have not been converted. After a successful
conversion, the integer pointed to by srcbytesleft should be zero.
If srcbytesleft is NULL, the function converts data
until a null character is found in the source buffer.
Destination Buffer
The dst argument points to a pointer
to the first character in the destination buffer and dstbytesleft
points to the maximum number of bytes to write into *dst.
The pointer
pointed to by dst is updated to point to the byte following
the last destination character. The integer pointed to by
dstbytesleft is updated to be the number of bytes still available
in *dst.
If dst is NULL, the function updates *src,
*srcbytesleft, and *dstbytesleft, but it does not write the converted
data anywhere.
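The Source Buffer and Destination Buffer rules can be modeled with a small stand-in converter. The sketch below is hypothetical: toy_mconv() is not part of the GLS API, it converts only ISO 8859-1 to UTF-8, and it omits the state and codeset arguments. It does follow the pointer-and-count contract described above: it advances *src and *dst past each whole character, decrements the counts, stops at a null byte when srcbytesleft is NULL, stops before a character that does not fit, and only counts output when dst is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ifx_gl_cv_mconv(): converts ISO 8859-1 bytes
   to UTF-8 using the same pointer-and-count contract. Returns 0 when the
   source is exhausted, or -1 when the next character does not fit (the
   real function would report IFX_GL_E2BIG through ifx_gl_lc_errno()). */
int toy_mconv(unsigned char **dst, int *dstbytesleft,
              unsigned char **src, int *srcbytesleft)
{
    assert(dstbytesleft != NULL && src != NULL && *src != NULL);
    /* With srcbytesleft == NULL, convert up to (not including) a null byte. */
    while (srcbytesleft != NULL ? *srcbytesleft > 0 : **src != '\0') {
        unsigned char c = **src;
        int need = (c < 0x80) ? 1 : 2;       /* UTF-8 length of this char */
        if (*dstbytesleft < need)
            return -1;                        /* stop after last whole char */
        if (dst != NULL) {                    /* dst == NULL: count only */
            if (need == 1) {
                *(*dst)++ = c;
            } else {
                *(*dst)++ = (unsigned char)(0xC0 | (c >> 6));
                *(*dst)++ = (unsigned char)(0x80 | (c & 0x3F));
            }
        }
        *dstbytesleft -= need;                /* space still available */
        (*src)++;                             /* past the converted byte */
        if (srcbytesleft != NULL)
            (*srcbytesleft)--;                /* bytes not yet converted */
    }
    return 0;
}
```

Because a NULL dst still decrements *dstbytesleft, a caller of this toy can measure the output size by starting from a generous count and subtracting what remains.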
Memory Management
Sometimes the caller of this function will need to fragment a complete
source string into two or more non-adjacent source buffers to meet the
memory management requirements of their component. This will result in
the need to call this function multiple times: once for each fragment.
Because of the nature of state-dependent codesets (and since the caller
of this function cannot know whether either srccodeset or dstcodeset is a state-dependent codeset), state information
must be preserved across these multiple calls. The state
argument is used for this purpose.
Therefore, the caller must supply two
pieces of information to each call to this function:
- whether *src is the first fragment of
the complete string (state->first_frag = 0/1) and
- whether *src is the last fragment of the
complete string (state->last_frag = 0/1)
If *src and *srcbytesleft represent a complete string,
set both state->first_frag and state->last_frag to 1.
The caller of this function should only set the first_frag and
last_frag fields of state. This structure contains other
fields that are used internally; however, the caller should not
set or inspect these other fields.
This function must be passed the
fragments in the same order in which they appear in the complete
string, and the same conv_state_t structure must be used for
all of the fragments of the same complete string.
For example,
to convert a complete character string that is not fragmented:
int
foo(gl_mchar_t *out, int outlen, char *outcs,
    gl_mchar_t *in, int inlen, char *incs)
{
    conv_state_t state;
    int ret;

    /* the complete string is both the first and the last fragment */
    state.first_frag = 1;
    state.last_frag = 1;
    ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                          &in, &inlen, incs);
    /* ... on -1, consult ifx_gl_lc_errno(); inlen holds the bytes left ... */
    return ret;
}
To convert a complete character string that is fragmented into 4 different
buffers:
int
foo(gl_mchar_t *out, int outlen, char *outcs,
    gl_mchar_t *in[], int inlen[], char *incs)
{
    /* inlen[i] is the number of bytes in fragment in[i] */
    conv_state_t state;
    int ret;

    state.first_frag = 1;
    state.last_frag = 0;
    ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                          &in[0], &inlen[0], incs);
    /* ... check ret before continuing ... */
    state.first_frag = 0;
    state.last_frag = 0;
    ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                          &in[1], &inlen[1], incs);
    /* ... */
    /* no need to set the flags again, since they are already 0 */
    ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                          &in[2], &inlen[2], incs);
    /* ... */
    state.first_frag = 0;
    state.last_frag = 1;
    ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                          &in[3], &inlen[3], incs);
    /* ... */
    return ret;
}
For an additional issue involving the processing of fragmented multi-byte character strings,
see Fragmenting Long Multi-Byte Strings.
CALCULATING dstbytesleft
The number of bytes written to *dst might be more or less than the
number of bytes read from *src. There are two ways to determine
the number of bytes that will be written to *dst.
The function
ifx_gl_cv_outbuflen(srccodeset, dstcodeset, *srcbytesleft)
calculates either exactly the number of bytes that will be written to
*dst or a close over-approximation of the number. The third
argument to
ifx_gl_cv_outbuflen()
is the number of bytes in *src.
The expression
(*srcbytesleft * IFX_GL_MB_MAX)
is the maximum number of bytes that will be written to *dst for any value
of *src in any locale.
This value is always greater than or equal to the value of
(*srcbytesleft * ifx_gl_mb_loc_max()).
Of the two options, the expression
(*srcbytesleft * IFX_GL_MB_MAX)
is the faster, but the function
ifx_gl_cv_outbuflen(srccodeset, dstcodeset, *srcbytesleft)
is more precise.
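The worst-case sizing rule is simple arithmetic, but the product can overflow an int for long inputs. The sketch takes the multiplier as a parameter, mb_max, standing in for IFX_GL_MB_MAX, whose actual value comes from ifxgls.h and is not assumed here; worst_case_dstlen() and alloc_dst() are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Worst-case destination size: srcbytes * mb_max, where mb_max stands in
   for IFX_GL_MB_MAX. Returns -1 if the product would overflow an int. */
int worst_case_dstlen(int srcbytes, int mb_max)
{
    assert(srcbytes >= 0 && mb_max > 0);
    if (srcbytes > INT_MAX / mb_max)
        return -1;
    return srcbytes * mb_max;
}

/* Allocates a worst-case destination buffer; *dstlen receives the value
   to pass as *dstbytesleft. Returns NULL on overflow or allocation failure. */
unsigned char *alloc_dst(int srcbytes, int mb_max, int *dstlen)
{
    int n = worst_case_dstlen(srcbytes, mb_max);
    if (n < 0)
        return NULL;
    unsigned char *buf = malloc(n > 0 ? (size_t)n : 1);  /* avoid malloc(0) */
    if (buf != NULL)
        *dstlen = n;
    return buf;
}
```

Checking the division before multiplying keeps the test itself free of overflow; the same guard applies if the multiplier is replaced by a per-locale value.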
PERFORMANCE
Any performance overhead involved in codeset conversion results from
either memory management
or multi-byte string traversal. Not every pair of codesets requires
these overheads for a correct conversion. See
ifx_gl_cv_sb2sb_table()
to determine how to optimize your code for these situations.
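When both codesets are single-byte, conversion can reduce to one table lookup per byte, with no allocation, no traversal of multi-byte sequences, and an output exactly as long as the input. The sketch below assumes a 256-entry translation table is already available; it does not show how ifx_gl_cv_sb2sb_table() is called, and sb2sb_convert() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Converts n bytes in place through a 256-entry translation table.
   Assumes a single-byte to single-byte mapping, so the output length
   always equals the input length and no conversion state is kept. */
void sb2sb_convert(unsigned char *buf, size_t n, const unsigned char table[256])
{
    assert(table != NULL && (n == 0 || buf != NULL));
    for (size_t i = 0; i < n; i++)
        buf[i] = table[buf[i]];      /* one lookup per byte, no allocation */
}
```

Because the mapping is one byte to one byte, the conversion can run in place, which is exactly the memory management overhead the general function cannot avoid.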
RETURN VALUES
This function updates the variables pointed to by its
arguments to reflect the extent of the conversion (as described
above). On success, it returns zero; if the entire source buffer is converted,
the value pointed to by srcbytesleft is zero. If conversion stops
because of an error, the function returns -1, and the value pointed to by
srcbytesleft is greater than zero.
ERRORS
If an error occurs, this function returns -1 and sets
ifx_gl_lc_errno() to one of the
following:
- [IFX_GL_FILEERR]
- Retrieving the conversion information for the specified
codesets failed. This may be due to invalid codeset names,
a missing registry file, a missing codeset conversion object file or
one with an incorrect format, or a lack of memory for the codeset
conversion object.
- [IFX_GL_EILSEQ]
- *src contains an invalid character;
conversion stops after the last successfully converted
source and destination characters.
- [IFX_GL_EINVAL]
- The function cannot determine whether the last character of *src
is a valid character or the end of a shift sequence, because it would need to
read more than *srcbytesleft bytes from *src;
conversion stops after the last
successfully converted source and destination characters. See Keeping Multi-Byte Strings
Consistent for more information about this error.
- [IFX_GL_E2BIG]
- Not enough space in the destination buffer;
conversion stops after the last successfully converted
source and destination characters.
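[IFX_GL_E2BIG] lends itself to recovery: because conversion stops after the last whole character and *src and *srcbytesleft are updated, the caller can flush the destination buffer and call again. The sketch shows that drain loop with a hypothetical converter, toy_conv(), that writes two bytes per source byte and reports errors through an out parameter rather than ifx_gl_lc_errno(); TOY_E2BIG is a placeholder, not the value of IFX_GL_E2BIG.

```c
#include <assert.h>
#include <string.h>

/* Placeholder error code for this sketch; not the value of IFX_GL_E2BIG. */
enum { TOY_E2BIG = 7 };

/* Hypothetical converter: every source byte becomes two destination bytes.
   Like ifx_gl_cv_mconv(), it advances *src/*dst and decrements the counts,
   stopping after the last whole character when the destination is full. */
static int toy_conv(unsigned char **dst, int *dstleft,
                    unsigned char **src, int *srcleft, int *err)
{
    while (*srcleft > 0) {
        if (*dstleft < 2) {
            *err = TOY_E2BIG;
            return -1;
        }
        *(*dst)++ = **src;
        *(*dst)++ = **src;
        *dstleft -= 2;
        (*src)++;
        (*srcleft)--;
    }
    return 0;
}

/* Converts all of src through a small chunk buffer, appending each chunk
   to out. Returns the total bytes written, or -1 on a non-E2BIG error, on
   lack of progress, or if out (capacity outcap) is too small. */
int convert_all(unsigned char *src, int srclen, unsigned char *out, int outcap)
{
    unsigned char chunk[5];
    int total = 0;

    assert(out != NULL && (src != NULL || srclen == 0));
    for (;;) {
        unsigned char *dp = chunk;
        int dleft = (int)sizeof chunk, err = 0;
        int ret = toy_conv(&dp, &dleft, &src, &srclen, &err);
        int produced = (int)(dp - chunk);

        if (total + produced > outcap)
            return -1;
        memcpy(out + total, chunk, (size_t)produced);  /* flush the chunk */
        total += produced;
        if (ret == 0)
            return total;                   /* srclen has reached zero */
        if (err != TOY_E2BIG || produced == 0)
            return -1;                      /* real error or no progress */
        /* E2BIG: src and srclen already point past the converted part */
    }
}
```

The no-progress check matters: if a single character is larger than the whole chunk, retrying would loop forever.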
SEE ALSO
ifx_gl_conv_needed()
ifx_gl_cv_outbuflen()
ifx_gl_cv_sb2sb_table()
IFX_GL_PROC_CS
ACKNOWLEDGEMENT
Portions of this description were derived from the X/Open CAE
Specification: "System Interfaces and Headers, Issue 4"; X/Open
Document Number: C202; ISBN: 1-872630-47-2; Published by X/Open Company
Ltd., U.K.