ifx_gl_cv_mconv - convert characters from one codeset to another

SYNOPSIS


#include <ifxgls.h>
int ifx_gl_cv_mconv(conv_state_t *state,
                    gl_mchar_t **dst, int *dstbytesleft, char *dstcodeset,
                    gl_mchar_t **src, int *srcbytesleft, char *srccodeset);

DESCRIPTION

This function converts the string of characters in *src into the same characters, but encoded in another codeset, and stores the result in the buffer pointed to by *dst.

The codesets srccodeset and dstcodeset may be locale specifiers (for example, "de_de.8859-1" or "ja_jp.ujis") or codeset names found in the file $INFORMIXDIR/gls/cmX/registry (for example, "8859-1" or "ujis"). The macro IFX_GL_PROC_CS may be passed as srccodeset or dstcodeset to specify the codeset of the current processing locale. Depending on the context, the value of IFX_GL_PROC_CS is based on either the client's environment or the database that the server is currently accessing.

Source Buffer

The src argument points to a pointer to the first character in the source buffer, and srcbytesleft points to the number of bytes in *src to convert. The pointer pointed to by src is updated to point to the byte following the last source character successfully converted. The integer pointed to by srcbytesleft is updated to be the number of bytes in *src that have not been converted. After a successful conversion, *srcbytesleft is zero.

If srcbytesleft is NULL, the function converts data until a null character is found in the source buffer.

Destination Buffer

The dst argument points to a pointer to the first character in the destination buffer, and dstbytesleft points to the maximum number of bytes to write into *dst. The pointer pointed to by dst is updated to point to the byte following the last destination character written. The integer pointed to by dstbytesleft is updated to be the number of bytes still available in *dst.

If dst is NULL, then the function updates *src, *srcbytesleft, and *dstbytesleft but the converted data is not written to *dst.
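The buffer-update convention described above can be illustrated without the GLS library. The following is a minimal sketch, not the real converter: demo_conv() is a hypothetical stand-in that performs an identity byte copy, uses plain char instead of gl_mchar_t, and omits the state and codeset arguments. It models only how the four pointer and count arguments are advanced, including the case where dst is NULL. It does not model the srcbytesleft == NULL case.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for ifx_gl_cv_mconv(): copies bytes unchanged,
 * but advances the caller's pointers and counters in the way the
 * Source Buffer and Destination Buffer sections describe.
 * Returns 0 when all of *src was consumed, -1 when *dst ran out of room.
 */
int demo_conv(char **dst, int *dstbytesleft,
              const char **src, int *srcbytesleft)
{
    assert(src != NULL && *src != NULL);
    assert(srcbytesleft != NULL && dstbytesleft != NULL);

    while (*srcbytesleft > 0) {
        if (*dstbytesleft == 0)
            return -1;          /* destination exhausted */
        if (dst != NULL) {      /* dst == NULL: count only, write nothing */
            **dst = **src;
            (*dst)++;
        }
        (*src)++;
        (*srcbytesleft)--;
        (*dstbytesleft)--;
    }
    return 0;
}
```

After a call, the number of bytes written is the original destination size minus *dstbytesleft, and the unconverted input starts at the updated *src. Passing NULL for dst gives a dry run that reports how much destination space the input would need.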

Memory Management

Sometimes the caller of this function will need to fragment a complete source string into two or more non-adjacent source buffers to meet the memory management requirements of their component. This will result in the need to call this function multiple times: once for each fragment. Because of the nature of state-dependent codesets (and since the caller of this function cannot know whether either srccodeset or dstcodeset is a state-dependent codeset), state information must be preserved across these multiple calls. The state argument is used for this purpose.

Therefore, the caller must supply two pieces of information to each call to this function:

  1. whether *src is the first fragment of the complete string (state->first_frag = 0/1), and
  2. whether *src is the last fragment of the complete string (state->last_frag = 0/1).

If *src and *srcbytesleft represent a complete string, set both state->first_frag and state->last_frag to 1.

The caller of this function should only set the first_frag and last_frag fields of state. This structure contains other fields that are used internally; however, the caller should not set or inspect these other fields.

This function must be passed the fragments in the same order in which they appear in the complete string; and the same conv_state_t structure must be used for all of the fragments of the same complete string.

For example, to convert a complete character string that is not fragmented:

 int
 foo(out, outlen, outcs, in, inlen, incs)
     gl_mchar_t *out;
     int outlen;
     char *outcs;
     gl_mchar_t *in;
     int inlen;
     char *incs;
 {
     conv_state_t state;
     int ret;

     state.first_frag = 1;
     state.last_frag = 1;
     ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                                   &in, &inlen, incs);
     ...
 }

To convert a complete character string that is fragmented into 4 different buffers, where in[i] is fragment i and inlen[i] is its length in bytes:

 int
 foo(out, outlen, outcs, in, inlen, incs)
     gl_mchar_t *out;
     int outlen;
     char *outcs;
     gl_mchar_t *in[];
     int inlen[];    /* one byte count per fragment */
     char *incs;
 {
     conv_state_t state;
     int ret;

     /* first fragment */
     state.first_frag = 1;
     state.last_frag = 0;
     ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                                   &in[0], &inlen[0], incs);
     ...
     /* middle fragments */
     state.first_frag = 0;
     state.last_frag = 0;
     ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                                   &in[1], &inlen[1], incs);
     ...
     /* first_frag and last_frag are still 0; no need to set them again */
     ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                                   &in[2], &inlen[2], incs);
     ...
     /* last fragment */
     state.first_frag = 0;
     state.last_frag = 1;
     ret = ifx_gl_cv_mconv(&state, &out, &outlen, outcs,
                                   &in[3], &inlen[3], incs);
     ...
 }

For an additional issue involving the processing of fragmented multi-byte character strings, see Fragmenting Long Multi-Byte Strings.
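For an arbitrary number of fragments, the first_frag and last_frag settings follow directly from the fragment's position. The following is a minimal sketch under stated assumptions: demo_state_t is a hypothetical stand-in that mirrors only the two caller-settable fields of conv_state_t, so the code compiles without ifxgls.h, and demo_set_frag_flags() is an illustrative helper, not part of the GLS API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for conv_state_t: only the two fields the
 * caller is allowed to set. Real code would use conv_state_t. */
typedef struct {
    int first_frag;
    int last_frag;
} demo_state_t;

/*
 * Set the fragment flags for fragment number `index` (0-based) out of
 * `nfrags` fragments. A single-fragment string gets both flags set to 1,
 * which matches the unfragmented case.
 */
void demo_set_frag_flags(demo_state_t *state, size_t index, size_t nfrags)
{
    assert(state != NULL);
    assert(nfrags > 0 && index < nfrags);

    state->first_frag = (index == 0);
    state->last_frag = (index + 1 == nfrags);
}
```

In a loop over the fragments, a helper like this would be called immediately before each conversion call, always with the same state structure and with the fragments in their original order.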

CALCULATING dstbytesleft

The number of bytes written to *dst might be more or less than the number of bytes read from *src. There are two ways to determine the number of bytes that will be written to *dst.

The function ifx_gl_cv_outbuflen(srccodeset, dstcodeset, *srcbytesleft) calculates either exactly the number of bytes that will be written to *dst or a close over-approximation of the number. The third argument to ifx_gl_cv_outbuflen() is the number of bytes in *src.

The expression (*srcbytesleft * IFX_GL_MB_MAX) is the maximum number of bytes that will be written to *dst for any value of *src in any locale. This value will always be equal to or greater than the value returned by (*srcbytesleft * ifx_gl_mb_loc_max()).

Of the two options, the expression (*srcbytesleft * IFX_GL_MB_MAX) is the faster, but the function ifx_gl_cv_outbuflen(srccodeset, dstcodeset, *srcbytesleft) is more precise.
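The worst-case expression is a plain multiplication, so the only care needed is to keep the product within the range of int, since dstbytesleft is an int. The following is a minimal sketch, assuming a placeholder constant: DEMO_MB_MAX is hypothetical and stands in for IFX_GL_MB_MAX, whose actual value comes from ifxgls.h and is not stated here. The helper name is also illustrative.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical placeholder for IFX_GL_MB_MAX; the value 4 is an
 * assumption chosen only so that this sketch compiles on its own. */
#define DEMO_MB_MAX 4

/*
 * Worst-case destination size for `srcbytes` bytes of input,
 * computed as srcbytes * DEMO_MB_MAX.
 * Returns -1 if srcbytes is negative or the product would overflow int.
 */
int demo_worst_case_dstbytes(int srcbytes)
{
    assert(DEMO_MB_MAX > 0);    /* guards the division below */

    if (srcbytes < 0 || srcbytes > INT_MAX / DEMO_MB_MAX)
        return -1;
    return srcbytes * DEMO_MB_MAX;
}
```

The result can be used to size the destination buffer before the conversion call. When memory is tight, ifx_gl_cv_outbuflen() gives the tighter bound at the cost of a function call.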

PERFORMANCE

Any performance overhead involved in codeset conversion is a result of either memory management or multi-byte string traversal. Not every pair of codesets requires this overhead in order to convert correctly. See ifx_gl_cv_sb2sb_table() to determine how to optimize your code for these situations.

RETURN VALUES

On success, this function updates the variables pointed to by the arguments to reflect the extent of the conversion (as described above) and returns zero. If the entire source buffer is converted, the value pointed to by srcbytesleft is zero. If the conversion is stopped due to an error, the function returns -1 and the value pointed to by srcbytesleft is greater than zero.

ERRORS

If an error has occurred, this function returns -1 and sets ifx_gl_lc_errno() to one of the following:

[IFX_GL_FILEERR]
Retrieving the conversion information for the specified codesets failed. This may be due to invalid codeset names, a missing registry file, a codeset conversion object file that is missing or has an incorrect format, or a lack of memory for the codeset conversion object.

[IFX_GL_EILSEQ]
*src contains an invalid character. Conversion stops after the last successfully converted source character and the corresponding destination character.

[IFX_GL_EINVAL]
The function cannot determine whether the last character of *src is a valid character or the end of a shift sequence, because it would need to read more than *srcbytesleft bytes from *src. Conversion stops after the last successfully converted source character and the corresponding destination character. See Keeping Multi-Byte Strings Consistent for more information about this error.

[IFX_GL_E2BIG]
There is not enough space in the destination buffer. Conversion stops after the last successfully converted source character and the corresponding destination character.

SEE ALSO

ifx_gl_conv_needed(), ifx_gl_cv_outbuflen(), ifx_gl_cv_sb2sb_table(), IFX_GL_PROC_CS

ACKNOWLEDGEMENT

Portions of this description were derived from the X/Open CAE Specification: "System Interfaces and Headers, Issue 4"; X/Open Document Number: C202; ISBN: 1-872630-47-2; Published by X/Open Company Ltd., U.K.