International Components for Unicode for Java

Release Notes


This Release - Version 2.1

This release is distributed as icu4jbin2_1.zip (total size 6553 KB) and was released on April 15, 2002.

License

Please read and understand the license attached to this release before installing and using the ICU4J libraries.

What's new in Release 2.1

  • Changes for JDK 1.4
  • Enhancements to Character Properties
    • Now supports UnicodeData 3.1.1.
    • Besides the general categories, ICU now supports derived categories
    • Internal data structure updated for faster access, optimized for BMP characters
    • Extended character name access is provided; every character will be given a name if the option is selected
    • New character category iterator and name iterator.
  • Transliterator enhancements
    • Transliterator rules are now implemented with an object-oriented output side. This allows more powerful output mechanisms.
    • The first new output mechanism is the &function syntax. This allows one transliterator to invoke another one within a rule.
    • Case transliterators fixed to use new and improved API.
  • Added API to DateFormat to parse and format using a Calendar. This allows incremental parsing and more flexible behavior.
  • Fixed Calendar.fieldDifference to handle leap years and large ranges.

Platform Dependencies

Parts of ICU4J depend on functionality that is only available in JDK 1.3 or later, although some components work under earlier JVMs. All components should be compiled using a Java2 compiler, as even components that run under earlier JVMs can require language features that are only present in Java2. Currently 1.1.x and 1.2.x JVMs are unsupported and untested, and you use the components on these JVMs at your own risk.

The platforms on which we have built and tested ICU4J are:

  • Win98, WinNT, Win2000, WinXP / IBM JDK 1.3, Sun JDK 1.3.1, 1.4
  • Solaris 2.6, 2.7 / Sun JDK 1.3.1, 1.4
  • AIX 5.1 / IBM JDK 1.3

Installation Dependencies

  • To install ICU4J as it is, simply place the prebuilt jar file icu4j.jar on your Java CLASSPATH. No other files are needed.
  • If building ICU4J is required, you can use the Ant build system.
    The Ant build system is part of the Apache Software Foundation's Jakarta project. Ant is a portable, Java-based build system similar to make. ICU4J uses Ant because it introduces no other dependencies, it's portable, and it's easier to manage than a collection of makefiles. We currently build ICU4J using a single build.xml file on both Windows and Solaris using Ant. Installing Ant is straightforward.
    Note: It's recommended to install both the JDK and Ant somewhere outside the ICU4J directory, to keep them out of CVS's hair.
    For more information, read the Ant documentation and the build.xml file.

For further detailed information about the ICU4J library, please refer to the ReadMe.


ICU Resource Data added to ICU4J

Starting with JDK 1.4, the resource information that used to be available through public classes in java.text.resources is no longer available. Sun has moved these classes to an internal package. This has two consequences. One, both the format and contents of the resources can now change at any time: dot releases and special bugfix releases can be different. Two, the resources are no longer accessible without explicit permission granted by the Java user.

For these reasons, ICU4J 2.1 now includes its own resource information which is completely independent of the JDK resource information. The ICU4J 2.1 information is equivalent to the information in ICU4C and ultimately derives from the same source. This allows ICU4J 2.1 to be built on, and run on, JDK 1.4.

There are two main consequences of this decision. The first is an increase in size of ICU4J. The new resource information, currently stored as class files residing in a jar file, is approximately 1.15 megabytes. The second is an increased difference between ICU's resource information and Java's. Neither is a clear superset of the other. For example, Java core currently has more timezone information than ICU. ICU's model for handling currency is also different than Java's. This will change over time as new versions of Java and ICU are released.

In addition to the resource information that corresponds to the Java resource information, ICU4J also includes resource information needed to support its additional features, such as Transliteration, Calendar, and DictionaryBasedBreakIterator. This information has existed in some form in prior releases of ICU4J and has not greatly changed in size.

How to Remove Unneeded Resource Information

This section will focus on the new information included in ICU4J 2.1.

By default the ICU4J distribution includes all of the new resource information. It is located in the package com.ibm.icu.impl.data, as a set of class files named "LocaleElements" followed by the names of locales in the form _xx_YY_ZZZZ, where 'xx' is the two-letter language code, 'YY' is the country code, and 'ZZZZ' (which can be any length) is a variant. Many of these fields can be omitted. Locale naming is documented in the Locale class, java.util.Locale, and the use of these names in searching for resources is documented in java.util.ResourceBundle.

Some of these files require separate binary data. The names of the binary data files start with "CollationElements", then the corresponding Locale string, and end with '.res'. Another data file (only one at the moment) starts with the name "BreakDictionaryData", the corresponding Locale string, and ends with '.ucs'.
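The naming conventions above can be sketched in a few lines of Java. This is only an illustration: the class and method names are hypothetical, and the underscore between "BreakDictionaryData" and the locale string is an assumption made by analogy with the CollationElements files.

```java
// Hypothetical helper; not part of ICU4J.
class ResourceFileNamesSketch {
    // Joins prefix, locale string, and extension; the root locale ("") gets no suffix.
    static String withLocale(String prefix, String localeName, String extension) {
        String suffix = localeName.isEmpty() ? "" : "_" + localeName;
        return prefix + suffix + extension;
    }

    static String localeElementsClass(String localeName) {
        return withLocale("LocaleElements", localeName, ".class");
    }

    static String collationData(String localeName) {
        return withLocale("CollationElements", localeName, ".res");
    }

    // Assumption: same "_locale" convention as the other two file kinds.
    static String breakDictionaryData(String localeName) {
        return withLocale("BreakDictionaryData", localeName, ".ucs");
    }

    public static void main(String[] args) {
        System.out.println(localeElementsClass("zh__PINYIN"));
        System.out.println(collationData("zh__PINYIN"));
        System.out.println(breakDictionaryData("xx_YY"));
    }
}
```

Note that an omitted country code leaves a double underscore before the variant, as in zh__PINYIN.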

Some of the LocaleElements files share data with other LocaleElements files, because some Locale names have changed. For example, he_IL used to be iw_IL. In order to support both names but not duplicate the data, one of the class files refers to the other class file's data.

The list of supported resources is found in a file called LocaleElements_index.class. This contains the names of all the LocaleElements resources and is the source of the information returned by APIs such as Calendar.getAvailableLocales. (Note: for ease of customization this probably should be a text file.)

LocaleElements files form a hierarchy with up to four levels: root, language, region (country), and variant. Searches for locale data attempt to match as far down the hierarchy as possible. For example, 'he_IL' will match LocaleElements_he_IL, but 'he_US' will match LocaleElements_he (since there is no 'US' entry under 'he'), and 'xx_YY' will match LocaleElements (since there is no 'xx' language code in the LocaleElements hierarchy). Again, see java.util.ResourceBundle for more information.
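The matching behavior described above can be sketched as follows. This is a simplified model, not the actual lookup code: the class and method names are hypothetical, the set of available resources is supplied by the caller, and the default-locale fallback step that java.util.ResourceBundle also performs is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the hierarchy search; not ICU4J code.
class LocaleFallbackSketch {
    // Candidate resource names for a locale string, most specific first, root last.
    static List<String> candidates(String baseName, String localeName) {
        List<String> names = new ArrayList<>();
        String current = localeName;
        while (!current.isEmpty()) {
            names.add(baseName + "_" + current);
            int cut = current.lastIndexOf('_');
            current = (cut < 0) ? "" : current.substring(0, cut);
            // An omitted country (as in zh__PINYIN) leaves a trailing '_'; drop it.
            while (current.endsWith("_")) {
                current = current.substring(0, current.length() - 1);
            }
        }
        names.add(baseName); // the root resource
        return names;
    }

    // Returns the first candidate present in 'available', or null if even the root is missing.
    static String resolve(String baseName, String localeName, Set<String> available) {
        for (String name : candidates(baseName, localeName)) {
            if (available.contains(name)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> available = new HashSet<>(Arrays.asList(
                "LocaleElements", "LocaleElements_he", "LocaleElements_he_IL"));
        for (String locale : new String[] {"he_IL", "he_US", "xx_YY"}) {
            System.out.println(locale + " -> " + resolve("LocaleElements", locale, available));
        }
    }
}
```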

With this in mind, the way to remove locale data is to make sure you also remove everything that depends on that data. For example, if you remove LocaleElements_he.class, you must also remove LocaleElements_he_IL.class, since it is lower in the hierarchy; LocaleElements_iw.class, since it references LocaleElements_he; and LocaleElements_iw_IL.class, since it depends on LocaleElements_iw (and also references LocaleElements_he_IL). For another example, if you remove CollationElements_zh__PINYIN.res, you must also remove LocaleElements_zh__PINYIN.class, since it depends on CollationElements_zh__PINYIN.res.
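One way to avoid missing a dependent file is to write the dependencies down and compute the full removal set mechanically. The sketch below assumes a hand-maintained map from each file to the files it depends on; the map holds only the relationships named in the examples above, and the class and method names are hypothetical. ICU4J itself provides no such tool.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the dependency data is entered by hand.
class RemovalClosureSketch {
    // Each key depends on (or references) every file in its value list.
    static Map<String, List<String>> exampleDependencies() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("LocaleElements_he_IL.class", Arrays.asList("LocaleElements_he.class"));
        deps.put("LocaleElements_iw.class", Arrays.asList("LocaleElements_he.class"));
        deps.put("LocaleElements_iw_IL.class",
                Arrays.asList("LocaleElements_iw.class", "LocaleElements_he_IL.class"));
        deps.put("LocaleElements_zh__PINYIN.class",
                Arrays.asList("CollationElements_zh__PINYIN.res"));
        return deps;
    }

    // Returns the target plus every file that depends on it, directly or indirectly.
    static Set<String> removalSet(String target, Map<String, List<String>> dependsOn) {
        Set<String> toRemove = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(target);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!toRemove.add(current)) {
                continue; // already handled
            }
            for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
                if (entry.getValue().contains(current)) {
                    pending.push(entry.getKey());
                }
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        System.out.println(removalSet("LocaleElements_he.class", exampleDependencies()));
        System.out.println(removalSet("CollationElements_zh__PINYIN.res", exampleDependencies()));
    }
}
```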

Unfortunately, the jar tool in the JDK provides no way to remove items from a jar file. Thus you have to extract the resources, remove the ones you don't want, and then create a new jar file with the remaining resources. See the jar tool documentation for how to do this. Before 're-jarring' the files, be sure to thoroughly test your application with the remaining resources, making sure each required resource is present.
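As an alternative to extracting and re-jarring by hand, the same result can be sketched with the standard java.util.jar classes by copying a jar while skipping the unwanted entries. This is a minimal illustration only: the class name, method name, and command-line arguments are hypothetical, and it makes no attempt to handle signed jars.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper; not part of ICU4J or the JDK tools.
class JarFilterSketch {
    // Copies every entry of 'source' into 'target' except those named in 'excluded'.
    // Returns the number of entries copied (the manifest counts as an entry).
    static int copyWithout(Path source, Path target, Set<String> excluded) throws IOException {
        int copied = 0;
        byte[] buffer = new byte[8192];
        try (JarFile in = new JarFile(source.toFile());
             JarOutputStream out = new JarOutputStream(Files.newOutputStream(target))) {
            Enumeration<JarEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (excluded.contains(entry.getName())) {
                    continue; // skip resources that are being removed
                }
                out.putNextEntry(new JarEntry(entry.getName()));
                try (InputStream data = in.getInputStream(entry)) {
                    int n;
                    while ((n = data.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                out.closeEntry();
                copied++;
            }
        }
        return copied;
    }

    // Usage: java JarFilterSketch source.jar target.jar entry/to/remove.class ...
    public static void main(String[] args) throws IOException {
        Set<String> excluded = new HashSet<>(Arrays.asList(args).subList(2, args.length));
        int copied = copyWithout(Paths.get(args[0]), Paths.get(args[1]), excluded);
        System.out.println("Copied " + copied + " entries");
    }
}
```

Entry names use forward slashes and the full package path, for example com/ibm/icu/impl/data/LocaleElements_he.class.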

Developing Resources to be used with ICU4J

ICU4J 2.1 uses the standard class lookup mechanism. This means any appropriately named resource on the CLASSPATH will be located, in the order listed in the classpath.

If you create a resource file com.ibm.icu.impl.data.LocaleElements_xx_YY.class, and list it on the CLASSPATH before icu4j.jar, your resource will be used in place of any existing LocaleElements_xx_YY resource in icu4j. This is a good way to try out changes to resources. You can, for example, include the resource in your application's jar file and list it ahead of icu4j.jar.
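When trying out a replacement resource this way, it can help to confirm which copy the class loader actually picked up. The sketch below uses only standard JDK calls (Class.forName and the class's code source) to report where a class was loaded from; the helper class and its method names are hypothetical, and LocaleElements_xx_YY is the placeholder name used above, not a real resource.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of ICU4J.
class ResourceOriginSketch {
    // Describes the jar or directory a loaded class came from.
    static String originOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return (location == null) ? "(no code source reported)" : location.toString();
    }

    // Looks the class up by name without initializing it, then reports its origin.
    static String originOfName(String className) {
        try {
            ClassLoader loader = ResourceOriginSketch.class.getClassLoader();
            return originOf(Class.forName(className, false, loader));
        } catch (ClassNotFoundException e) {
            return "(not found on the classpath)";
        }
    }

    public static void main(String[] args) {
        String name = (args.length > 0)
                ? args[0]
                : "com.ibm.icu.impl.data.LocaleElements_xx_YY"; // placeholder name
        System.out.println(name + " -> " + originOfName(name));
    }
}
```

If the reported location is icu4j.jar rather than your own jar or directory, your resource is listed too late on the CLASSPATH.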

In order to create new resources, you first must thoroughly understand the various elements contained in the resource files, their syntax and dependencies. You cannot simply 'patch' existing resource files with a single change because the new file completely replaces the old file in the resource hierarchy. In general, the new resource file should contain all the different data that the old one did, plus your changes.

Adding a new 'leaf' resource is easiest. Elements defined in that resource will override corresponding ones in the resources further up the hierarchy. Thus you can, for example, try out new localized names of days of the week, as they are all contained in one element. The variant mechanism can be used to temporarily try out new versions of existing resource elements (though we don't recommend shipping this way). Note though that some resources have detailed dependencies on each other, so that you cannot simply assume that a new element with the same structure and number of contents will 'just work.'
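The override behavior of a leaf resource can be illustrated with plain java.util.ListResourceBundle classes. This is a sketch only: the two bundle classes, the element names "DayNames" and "MonthAbbreviations", and their contents are stand-ins chosen for illustration and are not the actual LocaleElements data. The parent is wired up explicitly here, whereas in normal use ResourceBundle.getBundle establishes the chain.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical stand-ins for a parent resource and a new leaf resource.
class LeafOverrideSketch {
    // Plays the role of a resource further up the hierarchy.
    static class ParentData extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"DayNames", new String[] {"Sunday", "Monday", "Tuesday", "Wednesday",
                                           "Thursday", "Friday", "Saturday"}},
                {"MonthAbbreviations", new String[] {"Jan", "Feb", "Mar"}},
            };
        }
    }

    // Plays the role of a new leaf resource: it redefines one whole element.
    static class LeafData extends ListResourceBundle {
        LeafData(ResourceBundle parent) {
            setParent(parent); // lookups for missing keys fall through to the parent
        }

        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"DayNames", new String[] {"So", "Mo", "Di", "Mi", "Do", "Fr", "Sa"}},
            };
        }
    }

    public static void main(String[] args) {
        ResourceBundle leaf = new LeafData(new ParentData());
        // Overridden in the leaf:
        System.out.println(leaf.getStringArray("DayNames")[0]);
        // Not defined in the leaf, so inherited from the parent:
        System.out.println(leaf.getStringArray("MonthAbbreviations")[0]);
    }
}
```

The leaf replaces the entire "DayNames" element rather than individual entries, which is why an element has to be supplied complete even when only part of it changes.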

Patching an 'internal' resource (say, one corresponding to an existing language resource that has children) requires careful analysis of the contents of the resources.

LocaleElements resource data in ICU4J 2.1 is checked in to the repository as precompiled class files. This means that inspecting the contents of these resources is difficult. They are compiled from java files that in turn are machine-generated from ICU4C binary data, using the genrb tool in ICU4C. [At the moment some of the files are then hand-tweaked; we weren't able to fix this before release.] You can view the contents of the ICU4C text resource files to understand the contents of the ICU4J resources, as they are the same.

Developing ICU4J Resources

Currently only the LocaleElements resource data is shared; other ICU resources (calendar, transliterator, etc.) are still checked in directly to ICU4J as source files. This means that development and maintenance of these resources continues as before; only the LocaleElements resource data has changed in ICU4J 2.1. This will probably change in the future once we work out a reasonable mechanism for storing and generating the resource data.

One goal of using the same resource data as ICU4C is to avoid keeping redundant copies of the resource data. Currently there is no separate repository of the 'master' resource data; it is checked in to ICU4C, and the tools for converting it to .java files are ICU4C tools. This is inconvenient for working in Java, but since maintenance of ICU4J and ICU4C is supposed to go on 'in parallel,' as a practical matter people will have to be familiar with development in both C and Java, and with the conventions and structure of each project. Additionally, sharing of data means that modifications to data immediately impact both projects (as they should), and thus both projects need to be tested when such changes are made. The bulk of the tools are currently on the ICU4C side, and will likely stay that way, so this seems like a reasonable initial approach to sharing the data.

While prototyping of LocaleElements data can occur in either Java or C, the final version should be checked in to ICU4C in text format. Genrb is then run to generate the .java and .res files. They are then (except for the current tweaking) compiled and jar'd into the file ICULocaleData.jar. The resulting jar file is then checked in to ICU4J as src/com/ibm/icu/dev/data/ICULocaleData.jar. (This is not great but it allows ICU4J to be downloaded and built as one project, instead of two, one for locale data and one for ICU4J proper. Given the 2.1 schedule it wasn't possible to work out the larger data sharing problem in time, so we tried to limit the impact to just what was needed to get JDK 1.4 support up and running.)

The files in ICULocaleData.jar get extracted to com/ibm/icu/impl/data in the build directory when the 'core' target is built. Thereafter, as long as the file LocaleElements_index.class is untouched, they will not be extracted again. Building the 'resource' target will force the resources to be extracted once again. Extraction will overwrite any corresponding .class files already in that directory.