com.ibm.text
Class UnicodeCompressor

java.lang.Object
  |
  +--com.ibm.text.UnicodeCompressor
All Implemented Interfaces:
com.ibm.text.SCSU

public final class UnicodeCompressor
extends java.lang.Object
implements com.ibm.text.SCSU

A compression engine implementing the Standard Compression Scheme for Unicode (SCSU) as outlined in Unicode Technical Report #6.

The SCSU works by using dynamically positioned windows, each consisting of 128 consecutive Unicode characters. During compression, characters within the active window are encoded in the compressed stream as the bytes 0x80 - 0xFF. The SCSU is transparent for most characters between U+0000 and U+00FF: in the initial state, Latin-1 text is encoded as its ISO-8859-1 byte values (C0 control characters other than NUL, TAB, LF and CR must be quoted). The SCSU approximates the storage size of traditional character sets, for example 1 byte per character for ASCII or Latin-1 text, and 2 bytes per character for CJK ideographs.
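As a rough sketch of the windowing idea, assume a single active dynamic window described only by its starting code point. A character inside that window maps to one byte in 0x80 - 0xFF and back. The class `WindowMapping` and its methods are hypothetical illustrations, not part of com.ibm.text:

```java
/** Hypothetical illustration of SCSU single-byte window encoding; not part of com.ibm.text. */
final class WindowMapping {
    static final int WINDOW_SIZE = 128; // each SCSU window spans 128 consecutive code points

    /** Returns true if c falls in the window starting at windowOffset. */
    static boolean inWindow(char c, int windowOffset) {
        return c >= windowOffset && c < windowOffset + WINDOW_SIZE;
    }

    /** Encodes a character of the active window as a byte value in 0x80..0xFF. */
    static int toWindowByte(char c, int windowOffset) {
        if (!inWindow(c, windowOffset)) {
            throw new IllegalArgumentException("character outside window");
        }
        return 0x80 + (c - windowOffset);
    }

    /** Decodes a byte value in 0x80..0xFF back to a character of the active window. */
    static char fromWindowByte(int b, int windowOffset) {
        if (b < 0x80 || b > 0xFF) {
            throw new IllegalArgumentException("not a window byte");
        }
        return (char) (windowOffset + (b - 0x80));
    }

    public static void main(String[] args) {
        // GREEK SMALL LETTER ALPHA (U+03B1) in a window starting at U+0370
        System.out.printf("0x%02X%n", toWindowByte('\u03B1', 0x0370)); // prints 0xC1
    }
}
```

With a window at U+0080, as in the initial state, the byte value equals the Latin-1 code point. This is why Latin-1 text compresses to its ISO-8859-1 bytes.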

USAGE

The static methods on UnicodeCompressor may be used in a straightforward manner to compress simple strings:

  String s = ... ; // get string from somewhere
  byte [] compressed = UnicodeCompressor.compress(s);
 

The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeCompressor offers more powerful APIs allowing iterative compression:

  // Compress an array "chars" of length "len" to the OutputStream "out"
  // using a buffer of 512 bytes (out.write may throw java.io.IOException)

  UnicodeCompressor myCompressor         = new UnicodeCompressor();
  final int         BUFSIZE              = 512;
  byte []           byteBuffer           = new byte [ BUFSIZE ];
  int               bytesWritten         = 0;
  int []            unicharsRead         = new int [1];
  int               totalCharsCompressed = 0;
  int               totalBytesWritten    = 0;

  do {
    // do the compression
    bytesWritten = myCompressor.compress(chars, totalCharsCompressed, 
                                         len, unicharsRead,
                                         byteBuffer, 0, BUFSIZE);

    // do something with the current set of bytes
    out.write(byteBuffer, 0, bytesWritten);

    // update the no. of characters compressed
    totalCharsCompressed += unicharsRead[0];

    // update the no. of bytes written
    totalBytesWritten += bytesWritten;

  } while(totalCharsCompressed < len);

  myCompressor.reset(); // reuse compressor
 

Version:
1.5 05 Aug 99
Author:
Stephen F. Booth
See Also:
UnicodeDecompressor

Field Summary
static int ARMENIANINDEX
           
static int COMPRESSIONOFFSET
           
static int GREEKINDEX
           
static int HALFWIDTHKATAKANAINDEX
           
static int HIRAGANAINDEX
           
static int INVALIDCHAR
           
static int INVALIDWINDOW
           
static int IPAEXTENSIONINDEX
           
static int KATAKANAINDEX
           
static int LATININDEX
           
static int MAXINDEX
           
static int NUMSTATICWINDOWS
           
static int NUMWINDOWS
           
static int RESERVEDINDEX
           
static int SCHANGE0
           
static int SCHANGE1
           
static int SCHANGE2
           
static int SCHANGE3
           
static int SCHANGE4
           
static int SCHANGE5
           
static int SCHANGE6
           
static int SCHANGE7
           
static int SCHANGEU
           
static int SDEFINE0
           
static int SDEFINE1
           
static int SDEFINE2
           
static int SDEFINE3
           
static int SDEFINE4
           
static int SDEFINE5
           
static int SDEFINE6
           
static int SDEFINE7
           
static int SDEFINEX
           
static int SINGLEBYTEMODE
           
static int[] sOffsets
          Static compression window offsets
static int[] sOffsetTable
          For window offset mapping
static int SQUOTE0
           
static int SQUOTE1
           
static int SQUOTE2
           
static int SQUOTE3
           
static int SQUOTE4
           
static int SQUOTE5
           
static int SQUOTE6
           
static int SQUOTE7
           
static int SQUOTEU
           
static int SRESERVED
           
static int UCHANGE0
           
static int UCHANGE1
           
static int UCHANGE2
           
static int UCHANGE3
           
static int UCHANGE4
           
static int UCHANGE5
           
static int UCHANGE6
           
static int UCHANGE7
           
static int UDEFINE0
           
static int UDEFINE1
           
static int UDEFINE2
           
static int UDEFINE3
           
static int UDEFINE4
           
static int UDEFINE5
           
static int UDEFINE6
           
static int UDEFINE7
           
static int UDEFINEX
           
static int UNICODEMODE
           
static int UQUOTEU
           
static int URESERVED
           
 
Constructor Summary
UnicodeCompressor()
          Create a UnicodeCompressor.
 
Method Summary
static byte[] compress(char[] buffer, int start, int limit)
          Compress a Unicode character array into a byte array.
 int compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit)
          Compress a Unicode character array into a byte array.
static byte[] compress(java.lang.String buffer)
          Compress a string into a byte array.
 void reset()
          Reset the compressor to its initial state.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMPRESSIONOFFSET

public static final int COMPRESSIONOFFSET

NUMWINDOWS

public static final int NUMWINDOWS

NUMSTATICWINDOWS

public static final int NUMSTATICWINDOWS

INVALIDWINDOW

public static final int INVALIDWINDOW

INVALIDCHAR

public static final int INVALIDCHAR

SINGLEBYTEMODE

public static final int SINGLEBYTEMODE

UNICODEMODE

public static final int UNICODEMODE

MAXINDEX

public static final int MAXINDEX

RESERVEDINDEX

public static final int RESERVEDINDEX

LATININDEX

public static final int LATININDEX

IPAEXTENSIONINDEX

public static final int IPAEXTENSIONINDEX

GREEKINDEX

public static final int GREEKINDEX

ARMENIANINDEX

public static final int ARMENIANINDEX

HIRAGANAINDEX

public static final int HIRAGANAINDEX

KATAKANAINDEX

public static final int KATAKANAINDEX

HALFWIDTHKATAKANAINDEX

public static final int HALFWIDTHKATAKANAINDEX

SDEFINEX

public static final int SDEFINEX

SRESERVED

public static final int SRESERVED

SQUOTEU

public static final int SQUOTEU

SCHANGEU

public static final int SCHANGEU

SQUOTE0

public static final int SQUOTE0

SQUOTE1

public static final int SQUOTE1

SQUOTE2

public static final int SQUOTE2

SQUOTE3

public static final int SQUOTE3

SQUOTE4

public static final int SQUOTE4

SQUOTE5

public static final int SQUOTE5

SQUOTE6

public static final int SQUOTE6

SQUOTE7

public static final int SQUOTE7

SCHANGE0

public static final int SCHANGE0

SCHANGE1

public static final int SCHANGE1

SCHANGE2

public static final int SCHANGE2

SCHANGE3

public static final int SCHANGE3

SCHANGE4

public static final int SCHANGE4

SCHANGE5

public static final int SCHANGE5

SCHANGE6

public static final int SCHANGE6

SCHANGE7

public static final int SCHANGE7

SDEFINE0

public static final int SDEFINE0

SDEFINE1

public static final int SDEFINE1

SDEFINE2

public static final int SDEFINE2

SDEFINE3

public static final int SDEFINE3

SDEFINE4

public static final int SDEFINE4

SDEFINE5

public static final int SDEFINE5

SDEFINE6

public static final int SDEFINE6

SDEFINE7

public static final int SDEFINE7

UCHANGE0

public static final int UCHANGE0

UCHANGE1

public static final int UCHANGE1

UCHANGE2

public static final int UCHANGE2

UCHANGE3

public static final int UCHANGE3

UCHANGE4

public static final int UCHANGE4

UCHANGE5

public static final int UCHANGE5

UCHANGE6

public static final int UCHANGE6

UCHANGE7

public static final int UCHANGE7

UDEFINE0

public static final int UDEFINE0

UDEFINE1

public static final int UDEFINE1

UDEFINE2

public static final int UDEFINE2

UDEFINE3

public static final int UDEFINE3

UDEFINE4

public static final int UDEFINE4

UDEFINE5

public static final int UDEFINE5

UDEFINE6

public static final int UDEFINE6

UDEFINE7

public static final int UDEFINE7

UQUOTEU

public static final int UQUOTEU

UDEFINEX

public static final int UDEFINEX

URESERVED

public static final int URESERVED
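The tag constants above carry no descriptions on this page. UTR #6 assigns the following single-byte-mode tags:

- SQ0 - SQ7 = 0x01 - 0x08
- SDX = 0x0B
- reserved = 0x0C
- SQU = 0x0E
- SCU = 0x0F
- SC0 - SC7 = 0x10 - 0x17
- SD0 - SD7 = 0x18 - 0x1F

It assigns these Unicode-mode tags:

- UC0 - UC7 = 0xE0 - 0xE7
- UD0 - UD7 = 0xE8 - 0xEF
- UQU = 0xF0
- UDX = 0xF1
- reserved = 0xF2

The sketch below assumes the SQUOTEn ... URESERVED fields use these UTR #6 values, since the values themselves are not listed here. The `TagNames` class is hypothetical and simply names a tag byte:

```java
/**
 * Hypothetical helper naming SCSU tag bytes as defined in UTR #6.
 * Assumes the SCSU interface constants use these same values.
 */
final class TagNames {
    /** Names a byte read in single-byte mode, or returns null if it is not a tag. */
    static String singleByteTag(int b) {
        if (b >= 0x01 && b <= 0x08) return "SQUOTE" + (b - 0x01);  // quote from window n
        if (b == 0x0B) return "SDEFINEX";                          // define extended window
        if (b == 0x0C) return "SRESERVED";
        if (b == 0x0E) return "SQUOTEU";                           // quote one UTF-16 unit
        if (b == 0x0F) return "SCHANGEU";                          // switch to Unicode mode
        if (b >= 0x10 && b <= 0x17) return "SCHANGE" + (b - 0x10); // change to window n
        if (b >= 0x18 && b <= 0x1F) return "SDEFINE" + (b - 0x18); // define window n
        return null; // NUL, TAB, LF, CR and 0x20..0xFF are data bytes
    }

    /** Names the first byte of a pair read in Unicode mode, or returns null if it is not a tag. */
    static String unicodeTag(int b) {
        if (b >= 0xE0 && b <= 0xE7) return "UCHANGE" + (b - 0xE0); // back to single-byte mode, window n
        if (b >= 0xE8 && b <= 0xEF) return "UDEFINE" + (b - 0xE8); // define window n
        if (b == 0xF0) return "UQUOTEU";
        if (b == 0xF1) return "UDEFINEX";
        if (b == 0xF2) return "URESERVED";
        return null; // high byte of a UTF-16BE code unit
    }

    public static void main(String[] args) {
        System.out.println(singleByteTag(0x12)); // SCHANGE2
        System.out.println(unicodeTag(0xE8));    // UDEFINE0
    }
}
```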

sOffsetTable

public static final int[] sOffsetTable
For window offset mapping

sOffsets

public static final int[] sOffsets
Static compression window offsets
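UTR #6 defines how the index byte that follows a define tag (SDn, UDn) selects a window offset:

- Indices 0x01 - 0x67 give index * 0x80.
- Indices 0x68 - 0xA7 give index * 0x80 + 0xAC00.
- Indices 0x00 and 0xA8 - 0xF8 are reserved.
- Indices 0xF9 - 0xFF select fixed offsets for Latin, IPA extensions, Greek, Armenian, Hiragana, Katakana and halfwidth Katakana.

It is an assumption that sOffsetTable and sOffsets store exactly these values. The hypothetical `OffsetTable` class below only follows the UTR #6 rules, together with the eight static window offsets listed there:

```java
/** Hypothetical sketch of UTR #6 window offset rules; not the com.ibm.text tables themselves. */
final class OffsetTable {
    /** Static window offsets 0-7 as listed in UTR #6. */
    static final int[] STATIC_OFFSETS = {
        0x0000, 0x0080, 0x0100, 0x0300, 0x2000, 0x2080, 0x2100, 0x3000
    };

    /** Fixed offsets selected by indices 0xF9..0xFF. */
    private static final int[] FIXED_OFFSETS = {
        0x00C0, // 0xF9 Latin-1 letters and part of Latin Extended-A
        0x0250, // 0xFA IPA extensions
        0x0370, // 0xFB Greek
        0x0530, // 0xFC Armenian
        0x3040, // 0xFD Hiragana
        0x30A0, // 0xFE Katakana
        0xFF60  // 0xFF Halfwidth Katakana
    };

    /** Maps a window index byte to a dynamic window offset, or -1 if the index is reserved. */
    static int windowOffset(int index) {
        if (index >= 0x01 && index <= 0x67) return index * 0x80;
        if (index >= 0x68 && index <= 0xA7) return index * 0x80 + 0xAC00;
        if (index >= 0xF9 && index <= 0xFF) return FIXED_OFFSETS[index - 0xF9];
        return -1; // 0x00 and 0xA8..0xF8 are reserved
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(windowOffset(0x08))); // 400 (Cyrillic)
        System.out.println(Integer.toHexString(windowOffset(0xFD))); // 3040 (Hiragana)
    }
}
```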

Constructor Detail

UnicodeCompressor

public UnicodeCompressor()
Create a UnicodeCompressor. Sets all windows to their default values.
See Also:
reset()
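UTR #6 defines the initial state as follows:

- The compressor is in single-byte mode.
- Dynamic window 0 is active.
- The eight dynamic windows sit at their default offsets 0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0 and 0xFF00.

This sketch models that state. `CompressorState` is hypothetical and assumes nothing about the real class's private fields:

```java
import java.util.Arrays;

/** Hypothetical model of SCSU compressor state; the real class's fields are not documented here. */
final class CompressorState {
    /** Default dynamic window offsets from UTR #6. */
    static final int[] DEFAULT_OFFSETS = {
        0x0080, // Latin-1 Supplement
        0x00C0, // Latin-1 letters and part of Latin Extended-A
        0x0400, // Cyrillic
        0x0600, // Arabic
        0x0900, // Devanagari
        0x3040, // Hiragana
        0x30A0, // Katakana
        0xFF00  // Fullwidth ASCII
    };

    boolean unicodeMode;              // false means single-byte mode
    int activeWindow;                 // index of the active dynamic window
    final int[] offsets = new int[8]; // current dynamic window offsets

    CompressorState() {
        reset();
    }

    /** Restores the UTR #6 initial state: single-byte mode, window 0 active, default offsets. */
    void reset() {
        unicodeMode = false;
        activeWindow = 0;
        System.arraycopy(DEFAULT_OFFSETS, 0, offsets, 0, offsets.length);
    }

    public static void main(String[] args) {
        CompressorState s = new CompressorState();
        s.offsets[2] = 0x0370; // e.g. a window redefined to Greek during compression
        s.reset();             // back to the defaults
        System.out.println(Arrays.toString(s.offsets));
    }
}
```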

Method Detail

compress

public static byte[] compress(java.lang.String buffer)
Compress a string into a byte array.
Parameters:
buffer - The string to compress.
Returns:
A byte array containing the compressed characters.
See Also:
compress(char[], int, int)

compress

public static byte[] compress(char[] buffer,
                              int start,
                              int limit)
Compress a Unicode character array into a byte array.
Parameters:
buffer - The character buffer to compress.
start - The start (inclusive) of the character run to compress.
limit - The limit (exclusive) of the character run to compress.
Returns:
A byte array containing the compressed characters.
See Also:
compress(String)

compress

public int compress(char[] charBuffer,
                    int charBufferStart,
                    int charBufferLimit,
                    int[] charsRead,
                    byte[] byteBuffer,
                    int byteBufferStart,
                    int byteBufferLimit)
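Compress a Unicode character array into a byte array. This method consumes only input whose compressed form fits completely in byteBuffer. Characters that cannot be fully output are left unread, so the caller can continue from charBufferStart + charsRead[0].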
Parameters:
charBuffer - The character buffer to compress.
charBufferStart - The start (inclusive) of the character run to compress.
charBufferLimit - The limit (exclusive) of the character run to compress.
charsRead - A one-element array. If not null, on return charsRead[0] holds the number of characters read from charBuffer.
byteBuffer - A buffer to receive the compressed data. This buffer must be at least four bytes in size.
byteBufferStart - The starting offset at which to write compressed data.
byteBufferLimit - The limit (exclusive) offset for writing compressed data.
Returns:
The number of bytes written to byteBuffer.
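The following sketch illustrates the contract of this method. It consumes only what fits, reports the count through charsRead[0] and returns the number of bytes written. `MiniScsu` is a hypothetical, deliberately minimal encoder for a small subset of SCSU, not this class's algorithm.

It never switches or defines windows. It writes ASCII and Latin-1 as single bytes, using dynamic window 0 at its default offset 0x0080. It quotes other C0 controls with SQ0 and quotes every other UTF-16 code unit, surrogates included, with SQU. The result is valid SCSU but compresses poorly.

```java
/**
 * Hypothetical minimal SCSU encoder illustrating the buffer contract of
 * compress(char[], int, int, int[], byte[], int, int). It stays in
 * single-byte mode with dynamic window 0 at its default offset 0x0080.
 */
final class MiniScsu {
    private static final int SQ0 = 0x01; // quote one byte from static window 0
    private static final int SQU = 0x0E; // quote one UTF-16 code unit

    /** Number of bytes this scheme needs to encode c. */
    private static int encodedLength(char c) {
        if (c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0D) return 1;
        if (c < 0x20) return 2;  // other C0 controls collide with tags, so quote them
        if (c <= 0xFF) return 1; // ASCII passes through; U+0080..U+00FF via window 0
        return 3;                // SQU + high byte + low byte
    }

    static int compress(char[] chars, int charStart, int charLimit, int[] charsRead,
                        byte[] bytes, int byteStart, int byteLimit) {
        int ci = charStart;
        int bi = byteStart;
        while (ci < charLimit) {
            char c = chars[ci];
            int need = encodedLength(c);
            if (bi + need > byteLimit) {
                break; // consume only input whose output fits completely
            }
            if (need == 1) {
                bytes[bi++] = (byte) c; // byte value equals the code point here
            } else if (need == 2) {
                bytes[bi++] = (byte) SQ0;
                bytes[bi++] = (byte) c;
            } else {
                bytes[bi++] = (byte) SQU;
                bytes[bi++] = (byte) (c >>> 8);
                bytes[bi++] = (byte) c;
            }
            ci++;
        }
        if (charsRead != null) {
            charsRead[0] = ci - charStart;
        }
        return bi - byteStart;
    }

    public static void main(String[] args) {
        char[] text = "Caf\u00E9 \u03B1".toCharArray();
        byte[] buf = new byte[4]; // tiny buffer to force several calls
        int[] read = new int[1];
        int done = 0;
        while (done < text.length) {
            int n = compress(text, done, text.length, read, buf, 0, buf.length);
            System.out.println(n + " bytes for " + read[0] + " chars");
            done += read[0];
        }
    }
}
```

The driver loop in main has the same shape as the iterative usage example in the class description. In this subset no single character needs more than three bytes, so a four-byte buffer always makes progress.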

reset

public void reset()
Reset the compressor to its initial state.


Copyright (c) 1998-2000 IBM Corporation and others.