org.biojava.bio.seq.io
Class CharacterTokenization

java.lang.Object
  extended by org.biojava.utils.Unchangeable
      extended by org.biojava.bio.seq.io.CharacterTokenization
All Implemented Interfaces:
Serializable, Annotatable, SymbolTokenization, Changeable

public class CharacterTokenization
extends Unchangeable
implements SymbolTokenization, Serializable

Implementation of SymbolTokenization which binds symbols to single unicode characters.

Many alphabets (including all the simple built-in alphabets such as DNA, RNA and Protein) have an instance of CharacterTokenization registered under the name "token", so you can write CharacterTokenization ct = (CharacterTokenization) alpha.getTokenization("token"); and expect it to work. When you construct a new instance of this class for an alphabet, it starts with no associations between Symbols and characters. It is your responsibility to populate the new tokenization appropriately.
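As a rough illustration of the lifecycle described above, the following self-contained sketch models a tokenization that starts empty, is populated by binding characters, and only then parses text one character at a time. It is not BioJava code: ToyDnaTokenization and the Base enum are hypothetical stand-ins for CharacterTokenization and the DNA Symbols, and IllegalArgumentException stands in for IllegalSymbolException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Symbols of a DNA alphabet (not a BioJava type).
enum Base { A, C, G, T }

// Hypothetical model of a character-based tokenization; not the BioJava class.
class ToyDnaTokenization {
    private final Map<Character, Base> charToSymbol = new HashMap<>();

    // Associate one character with one symbol.
    void bindSymbol(Base s, char c) {
        charToSymbol.put(c, s);
    }

    // Resolve each character of the text; unbound characters are rejected.
    List<Base> parse(String text) {
        List<Base> out = new ArrayList<>();
        for (char c : text.toCharArray()) {
            Base s = charToSymbol.get(c);
            if (s == null) {
                throw new IllegalArgumentException("no symbol bound to '" + c + "'");
            }
            out.add(s);
        }
        return out;
    }
}

class ToyDnaTokenizationDemo {
    public static void main(String[] args) {
        ToyDnaTokenization tok = new ToyDnaTokenization();
        // A new instance has no bindings, so parsing fails until it is populated.
        try {
            tok.parse("acgt");
        } catch (IllegalArgumentException e) {
            System.out.println("before binding: " + e.getMessage());
        }
        // Populate the tokenization: bind each lower-case letter to its base.
        for (Base b : Base.values()) {
            tok.bindSymbol(b, Character.toLowerCase(b.name().charAt(0)));
        }
        System.out.println("after binding: " + tok.parse("acgt"));
    }
}
```

With the real class, the equivalent population step is a series of bindSymbol calls made after construction.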

Since:
1.2
Author:
Thomas Down, Matthew Pocock, Greg Cox, Keith James
See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from interface org.biojava.bio.seq.io.SymbolTokenization
SymbolTokenization.TokenType
 
Nested classes/interfaces inherited from interface org.biojava.bio.Annotatable
Annotatable.AnnotationForwarder
 
Field Summary
 
Fields inherited from interface org.biojava.bio.seq.io.SymbolTokenization
CHARACTER, FIXEDWIDTH, SEPARATED, UNKNOWN
 
Fields inherited from interface org.biojava.bio.Annotatable
ANNOTATION
 
Constructor Summary
CharacterTokenization(Alphabet alpha, boolean caseSensitive)
           
 
Method Summary
 void bindSymbol(Symbol s, char c)
           Bind a Symbol to a character.
 Alphabet getAlphabet()
          The alphabet to which this tokenization applies.
 Annotation getAnnotation()
          Should return the associated annotation object.
protected  Symbol[] getTokenTable()
           
 SymbolTokenization.TokenType getTokenType()
          Determine the style of tokenization represented by this object.
 StreamParser parseStream(SeqIOListener listener)
          Return an object which can parse an arbitrary character stream into symbols.
 Symbol parseToken(String token)
          Returns the symbol for a single token.
protected  Symbol parseTokenChar(char c)
           
 String tokenizeSymbol(Symbol s)
          Return a token representing a single symbol.
 String tokenizeSymbolList(SymbolList sl)
          Return a string representation of a list of symbols.
 
Methods inherited from class org.biojava.utils.Unchangeable
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.biojava.utils.Changeable
addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener
 

Constructor Detail

CharacterTokenization

public CharacterTokenization(Alphabet alpha,
                             boolean caseSensitive)
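This page does not describe the caseSensitive flag. One plausible reading, assumed here and not confirmed by this documentation, is that a case-insensitive tokenization resolves both the upper-case and lower-case form of a bound character to the same symbol, while a case-sensitive one keeps them distinct. CaseAwareTable is a hypothetical sketch of that assumption, using symbol names as strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how a caseSensitive flag might affect character lookup.
class CaseAwareTable {
    private final boolean caseSensitive;
    private final Map<Character, String> charToSymbol = new HashMap<>();

    CaseAwareTable(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    void bind(String symbolName, char c) {
        if (caseSensitive) {
            // Only the exact character supplied is bound.
            charToSymbol.put(c, symbolName);
        } else {
            // Assumption: a case-insensitive binding registers both case variants.
            charToSymbol.put(Character.toLowerCase(c), symbolName);
            charToSymbol.put(Character.toUpperCase(c), symbolName);
        }
    }

    // Returns null when no symbol is bound to c.
    String lookup(char c) {
        return charToSymbol.get(c);
    }
}
```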
Method Detail

getAlphabet

public Alphabet getAlphabet()
Description copied from interface: SymbolTokenization
The alphabet to which this tokenization applies.

Specified by:
getAlphabet in interface SymbolTokenization

getTokenType

public SymbolTokenization.TokenType getTokenType()
Description copied from interface: SymbolTokenization
Determine the style of tokenization represented by this object.

Specified by:
getTokenType in interface SymbolTokenization

getAnnotation

public Annotation getAnnotation()
Description copied from interface: Annotatable
Should return the associated annotation object.

Specified by:
getAnnotation in interface Annotatable
Returns:
an Annotation object, never null

bindSymbol

public void bindSymbol(Symbol s,
                       char c)

Bind a Symbol to a character.

This method ensures that when this char is observed, it resolves to this symbol. If the char was previously associated with another symbol, the old binding is removed. If this is the first time the symbol has been bound to any character, this character becomes the default tokenization of the Symbol: it is the char produced when converting symbols into characters. If the symbol has previously been bound to another character, this char will not be produced when stringifying the symbol, but the symbol will still be produced when parsing this character.

Parameters:
s - the Symbol to bind
c - the char to bind it to
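The binding rules above can be modelled with two maps: one from characters to symbols, which every binding overwrites, and one from symbols to their default character, which only the first binding for a symbol sets. BindingTable is a hypothetical sketch of those rules, with symbols represented by names; it is not the BioJava implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the binding rules described for bindSymbol.
class BindingTable {
    private final Map<Character, String> charToSymbol = new HashMap<>();
    private final Map<String, Character> defaultChar = new HashMap<>();

    void bindSymbol(String symbol, char c) {
        // Rebinding a char replaces whatever symbol it previously resolved to.
        charToSymbol.put(c, symbol);
        // Only the first char ever bound to a symbol becomes its default token.
        defaultChar.putIfAbsent(symbol, c);
    }

    // Parsing direction: which symbol does this char resolve to (null if none)?
    String parse(char c) {
        return charToSymbol.get(c);
    }

    // Stringifying direction: which char is produced for this symbol (null if none)?
    Character defaultToken(String symbol) {
        return defaultChar.get(symbol);
    }
}
```

For example, binding "adenine" to 'a' and then to 'A' makes both characters parse to adenine, while only 'a' is produced when adenine is written out.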

parseToken

public Symbol parseToken(String token)
                  throws IllegalSymbolException
Description copied from interface: SymbolTokenization
Returns the symbol for a single token.

The Symbol will be a member of the alphabet. If the token is not recognized as mapping to a symbol, an exception will be thrown.

Specified by:
parseToken in interface SymbolTokenization
Parameters:
token - the token to retrieve a Symbol for
Returns:
the Symbol for that token
Throws:
IllegalSymbolException - if there is no Symbol for the token
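Because this class binds symbols to single characters, a reasonable assumption (not stated on this page) is that parseToken accepts only one-character tokens. The sketch below models that assumption together with the documented failure case. SingleCharParser is hypothetical, and IllegalArgumentException stands in for IllegalSymbolException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of parseToken for a single-character tokenization.
class SingleCharParser {
    private final Map<Character, String> charToSymbol = new HashMap<>();

    void bind(String symbolName, char c) {
        charToSymbol.put(c, symbolName);
    }

    String parseToken(String token) {
        // Assumption: tokens that are not exactly one character long are rejected.
        if (token.length() != 1) {
            throw new IllegalArgumentException("expected a single-character token, got \"" + token + "\"");
        }
        String s = charToSymbol.get(token.charAt(0));
        if (s == null) {
            // Mirrors the documented behaviour: there is no Symbol for the token.
            throw new IllegalArgumentException("no symbol for token \"" + token + "\"");
        }
        return s;
    }
}
```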

getTokenTable

protected Symbol[] getTokenTable()

parseTokenChar

protected Symbol parseTokenChar(char c)
                         throws IllegalSymbolException
Throws:
IllegalSymbolException

tokenizeSymbol

public String tokenizeSymbol(Symbol s)
                      throws IllegalSymbolException
Description copied from interface: SymbolTokenization
Return a token representing a single symbol.

Specified by:
tokenizeSymbol in interface SymbolTokenization
Parameters:
s - The symbol
Throws:
IllegalSymbolException - if the symbol isn't recognized.

tokenizeSymbolList

public String tokenizeSymbolList(SymbolList sl)
                          throws IllegalAlphabetException
Description copied from interface: SymbolTokenization
Return a string representation of a list of symbols.

Specified by:
tokenizeSymbolList in interface SymbolTokenization
Parameters:
sl - A SymbolList
Throws:
IllegalAlphabetException - if the alphabet of sl does not match the alphabet of this tokenization
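A minimal sketch of what tokenizing a whole list involves: check that the list belongs to the tokenization's alphabet, then append the default character of each symbol. ListTokenizer is hypothetical; alphabets are represented by plain names, and IllegalArgumentException stands in for both IllegalAlphabetException and IllegalSymbolException.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of tokenizeSymbolList; alphabets are represented by plain names.
class ListTokenizer {
    private final String alphabetName;
    private final Map<String, Character> defaultChar = new HashMap<>();

    ListTokenizer(String alphabetName) {
        this.alphabetName = alphabetName;
    }

    void bind(String symbolName, char c) {
        // The first binding for a symbol is the character written out for it.
        defaultChar.putIfAbsent(symbolName, c);
    }

    String tokenizeSymbolList(String listAlphabet, List<String> symbols) {
        // Stand-in for IllegalAlphabetException when the alphabets differ.
        if (!alphabetName.equals(listAlphabet)) {
            throw new IllegalArgumentException("alphabet mismatch: " + listAlphabet + " vs " + alphabetName);
        }
        StringBuilder sb = new StringBuilder(symbols.size());
        for (String s : symbols) {
            Character c = defaultChar.get(s);
            if (c == null) {
                throw new IllegalArgumentException("unbound symbol: " + s);
            }
            sb.append(c.charValue());
        }
        return sb.toString();
    }
}
```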

parseStream

public StreamParser parseStream(SeqIOListener listener)
Description copied from interface: SymbolTokenization
Return an object which can parse an arbitrary character stream into symbols.

Specified by:
parseStream in interface SymbolTokenization
Parameters:
listener - The listener which gets notified of parsed symbols.
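To illustrate why a character tokenization suits stream parsing, the sketch below resolves characters as they arrive in arbitrary chunks and forwards each batch of symbols to a listener. ToyStreamParser, SymbolSink, and their method names are hypothetical and only loosely analogous to StreamParser and SeqIOListener; IllegalArgumentException stands in for IllegalSymbolException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical receiver of parsed symbols (loosely analogous to a SeqIOListener).
interface SymbolSink {
    void addSymbols(List<String> symbols);
}

// Hypothetical incremental parser: characters may arrive in arbitrary chunks.
class ToyStreamParser {
    private final Map<Character, String> charToSymbol;
    private final SymbolSink sink;

    ToyStreamParser(Map<Character, String> charToSymbol, SymbolSink sink) {
        this.charToSymbol = charToSymbol;
        this.sink = sink;
    }

    // Resolve data[start, start + len) and forward the symbols as one batch.
    void characters(char[] data, int start, int len) {
        List<String> batch = new ArrayList<>(len);
        for (int i = start; i < start + len; i++) {
            String s = charToSymbol.get(data[i]);
            if (s == null) {
                throw new IllegalArgumentException("no symbol bound to '" + data[i] + "'");
            }
            batch.add(s);
        }
        sink.addSymbols(batch);
    }
}
```

Because every token is exactly one character, no state has to be carried between chunks, so a chunk boundary can fall anywhere in the input.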