An internationalized data handler is a data handler that has been written in such a way that it can be customized for a particular locale. A locale is the part of a user's environment that brings together information about how to handle data that is specific to the end user's country, language, or territory. The locale is typically installed as part of the operating system. Creating a data handler that handles locale-sensitive data is called the internationalization (I18N) of the data handler. Preparing an internationalized data handler for a particular locale is called the localization (L10N) of the data handler.
This section provides the following information on an internationalized data handler:
A locale provides the following information for the user environment:
A locale name has the following format:
ll_TT.codeset
where ll is a two-character language code (usually in lower case), TT is a two-letter country or territory code (usually in upper case), and codeset is the name of the associated character code set. The codeset portion of the name is often optional. The locale is typically installed as part of the installation of the operating system.
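The name format above can be taken apart mechanically. The following is a minimal sketch, not part of the product: the `LocaleName` class and its members are hypothetical names introduced for illustration, and the code assumes a well-formed name such as `ja_JP.eucJP` or `en_US`.

```java
import java.util.Locale;

/** Hypothetical helper that splits a locale name of the form ll_TT.codeset. */
final class LocaleName {
    final String language;   // the "ll" part
    final String territory;  // the "TT" part, empty if absent
    final String codeset;    // the optional code set name, empty if absent

    private LocaleName(String language, String territory, String codeset) {
        this.language = language;
        this.territory = territory;
        this.codeset = codeset;
    }

    /** Splits the name at the first '.' and then at the first '_'. */
    static LocaleName parse(String name) {
        String base = name;
        String codeset = "";
        int dot = name.indexOf('.');
        if (dot >= 0) {
            base = name.substring(0, dot);
            codeset = name.substring(dot + 1);
        }
        String language = base;
        String territory = "";
        int underscore = base.indexOf('_');
        if (underscore >= 0) {
            language = base.substring(0, underscore);
            territory = base.substring(underscore + 1);
        }
        return new LocaleName(language, territory, codeset);
    }

    /** Builds a java.util.Locale from the language and territory parts. */
    Locale toLocale() {
        return new Locale.Builder()
                .setLanguage(language)
                .setRegion(territory)
                .build();
    }

    public static void main(String[] args) {
        LocaleName parsed = LocaleName.parse("ja_JP.eucJP");
        System.out.println(parsed.language + " / " + parsed.territory + " / " + parsed.codeset);
        System.out.println(parsed.toLocale());
    }
}
```

The code set is kept as a plain string because a Java `Locale` carries no code set; character encoding is handled separately.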
This section provides the following categories of design considerations for internationalizing a data handler:
To be internationalized, a data handler must be coded to be locale-sensitive; that is, its behavior must take the locale setting into consideration and perform the task appropriate to that locale. For example, for locales that use English, the data handler should obtain its error messages from an English-language message file. The data handler framework that is installed with the product is internationalized. To complete the internationalization (I18N) of a data handler you develop, you must ensure that your data-handler implementation is internationalized.
An internationalized data handler must follow a set of locale-sensitive design principles:
The data handler might need to perform locale-sensitive processing (such as data format conversions) when it converts between serialized data and a business object. To track the locale associated with the data handler's environment, the DataHandler class has a private locale variable, which is initialized to the locale of the operating system on which the data handler runs. You can access the data handler environment's locale (the value of this private locale variable) at runtime through the accessor methods in Table 78.
Table 78. Methods to access the data handler environment's locale
Data Handler class | Method |
---|---|
DataHandler | getLocale(), setLocale() |
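To illustrate why the environment locale matters for a data format conversion, here is a minimal sketch. `SketchHandler` is a hypothetical stand-in, not the product's DataHandler class: only the method names getLocale() and setLocale() come from Table 78, and the signatures shown here, as well as the `parseAmount` helper, are assumptions made for the example.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

/** Hypothetical stand-in that mimics a handler with a private locale variable. */
class SketchHandler {
    // Initialized from the environment, as the text describes for DataHandler.
    private Locale locale = Locale.getDefault();

    Locale getLocale() {
        return locale;
    }

    void setLocale(Locale locale) {
        this.locale = locale;
    }

    /** Parses a serialized number using the handler's current locale. */
    double parseAmount(String text) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number in locale " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        SketchHandler handler = new SketchHandler();

        // German convention: '.' groups digits, ',' marks the decimal point.
        handler.setLocale(Locale.GERMANY);
        System.out.println(handler.parseAmount("1.234,56"));

        // U.S. convention: ',' groups digits, '.' marks the decimal point.
        handler.setLocale(Locale.US);
        System.out.println(handler.parseAmount("1,234.56"));
    }
}
```

Both calls yield the same numeric value, because each string is read with the conventions of the locale that is set at the time of the conversion.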
When a business object is created, it has a locale associated with its data. This locale applies to the data in the business object, not to the name of the business object definition or its attributes (which must be characters in the code set associated with the U.S. English locale, en_US). To create a business object, your data handler can use the methods shown in Table 79. These methods have access to the private locale variable in the DataHandler class. When one of these methods creates a business object, it associates with this business object the locale that the private DataHandler locale variable specifies.
Use the methods in Table 79 to create a business object and set the locale for its data. To ensure that the private locale variable specifies the correct locale for the data in the business object, you can use the setLocale() method before you call either of the methods in Table 79.
Table 79. Methods to assign a locale to a business object
Data Handler class | Method |
---|---|
DataHandler | getBO() - public, getBOName() |
If data transfers from a location that uses one code set to a location that uses a different code set, some form of character conversion needs to be performed for the data to retain its meaning. The Java runtime environment within the Java Virtual Machine (JVM) represents data in the Unicode character set. The Unicode character set is a universal character set that contains encodings for characters in most known character code sets (both single-byte and multibyte). There are several encoding formats of Unicode. The following encodings are used most frequently within the business integration system:
The UCS-2 encoding is the Unicode character set encoded in 2 bytes (octets).
The UTF-8 encoding is designed to address the use of Unicode character data in UNIX environments. It supports all ASCII code values (0...127) so that they are never interpreted as anything except a true ASCII code. Each code value is usually represented as a 1-, 2-, or 3-byte value.
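The difference between the two encodings can be observed directly with the standard Java charset classes. This is a small sketch: the `EncodingWidths` class and its method names are made up for illustration, and UTF-16BE is used in place of UCS-2 because the two produce the same bytes for the characters shown here.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper that reports how many bytes a string occupies per encoding. */
class EncodingWidths {
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF-16BE writes no byte-order mark, so each character below takes exactly 2 bytes.
    static int twoByteLength(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",        // ASCII letter
            "\u00e9",   // e with acute accent
            "\u20ac"    // euro sign
        };
        for (String sample : samples) {
            System.out.println("U+" + Integer.toHexString(sample.charAt(0))
                    + ": UTF-8 bytes = " + utf8Length(sample)
                    + ", 2-byte encoding bytes = " + twoByteLength(sample));
        }
    }
}
```

The ASCII letter takes one byte in UTF-8, the accented letter two, and the euro sign three, while the fixed-width encoding uses two bytes for each.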
Most components in the IBM WebSphere business integration system are written in Java. Therefore, when data is transferred between most system components, it is encoded in the Unicode code set and there is no need for character conversion.
Because a data handler is a component written in Java, it handles the serialized data in the Unicode code set. Usually, the source of the data's input stream also uses Unicode. Therefore, a data handler does not normally need to perform character conversion on the serialized data. However, if the input or output data contains a byte array whose character encoding is not the same as the system default, the data handler must provide the character encoding.
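The byte-array case can be sketched with standard Java classes. The `ByteArrayDecoding` class and its `decode` method are hypothetical names used only for this example; the point is that the encoding is passed explicitly instead of relying on the system default.

```java
import java.nio.charset.Charset;

/** Illustrative helper that decodes a byte array with an explicitly named encoding. */
class ByteArrayDecoding {
    /** Converts bytes to a Unicode string using the named character encoding. */
    static String decode(byte[] data, String encodingName) {
        return new String(data, Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        // The word "cafe" with an accented final letter, encoded in ISO-8859-1:
        // the accented letter is the single byte 0xE9.
        byte[] latin1Bytes = {0x63, 0x61, 0x66, (byte) 0xE9};

        // Correct: the encoding of the byte array is supplied.
        System.out.println(decode(latin1Bytes, "ISO-8859-1"));

        // Incorrect: the same bytes read as UTF-8 do not yield the original text,
        // because 0xE9 on its own is not a complete UTF-8 sequence.
        System.out.println(decode(latin1Bytes, "UTF-8"));
    }
}
```

Decoding with the wrong encoding does not raise an error here; it silently produces different text, which is why the encoding has to be tracked alongside the data.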
To track the character encoding associated with the data handler's environment, the DataHandler class has a private character-encoding variable, which is initialized to the character encoding associated with the locale of the operating system on which the data handler runs. You can access the data handler environment's character encoding (the value of this private character-encoding variable) at runtime through the accessor methods in Table 80.
Table 80. Methods to retrieve the data handler's character encoding
Data Handler class | Method |
---|---|
DataHandler | getEncoding(), setEncoding() |