class Bio::Nexus

DESCRIPTION

Bio::Nexus is a parser for nexus formatted data. It contains classes and constants enabling the representation and processing of nexus data.

USAGE

# Parsing a nexus formatted string nexus_str:
nexus = Bio::Nexus.new( nexus_str )

# Getting all nexus blocks as an Array of GenericBlock or
# any of its subclasses (such as DistancesBlock):
blocks = nexus.get_blocks

# Getting a block by name:
my_blocks = nexus.get_blocks_by_name( "my_block" )

# Getting distances blocks:
distances_blocks = nexus.get_distances_blocks

# Getting trees blocks:
trees_blocks = nexus.get_trees_blocks

# Getting data blocks:
data_blocks = nexus.get_data_blocks

# Getting characters blocks: 
character_blocks = nexus.get_characters_blocks

# Getting taxa blocks: 
taxa_blocks = nexus.get_taxa_blocks

Constants

BEGIN_BLOCK
BEGIN_COMMENT
BEGIN_NEXUS
CHARACTERS
CHARACTERS_BLOCK
DATA
DATATYPE
DATA_BLOCK
DELIMITER
DIMENSIONS
DISTANCES
DISTANCES_BLOCK
DOUBLE_QUOTE
END_BLOCK
END_COMMENT
END_OF_LINE
FORMAT
INDENTENTION
MATRIX
NCHAR
NTAX
SINGLE_QUOTE
TAXA
TAXA_BLOCK
TAXLABELS
TREES
TREES_BLOCK

Public Class Methods

new( nexus_str )

Creates a new nexus parser for 'nexus_str'.


Arguments:

  • (required) nexus_str: String - nexus formatted data

# File lib/bio/db/nexus.rb, line 177
def initialize( nexus_str )
  @blocks             = Array.new
  @current_cmd        = nil
  @current_subcmd     = nil
  @current_block_name = nil
  @current_block      = nil
  parse( nexus_str )
end

Public Instance Methods

get_blocks()

Returns an Array of all blocks found in the String 'nexus_str' set via ::new( nexus_str ).


Returns

Array of GenericBlocks or any of its subclasses

# File lib/bio/db/nexus.rb, line 192
def get_blocks
  @blocks
end
get_blocks_by_name( name )

A convenience method which returns an array of all nexus blocks whose name equals 'name', found in the String 'nexus_str' set via ::new( nexus_str ).


Arguments:

  • (required) name: String - the name of the blocks to return

Returns

Array of GenericBlocks or any of its subclasses

# File lib/bio/db/nexus.rb, line 204
def get_blocks_by_name( name )
  found_blocks = Array.new
  @blocks.each do | block |
    if ( name == block.get_name )
      found_blocks.push( block )
    end
  end
  found_blocks
end
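
The lookup above is a plain linear filter over the parsed blocks. The following is a minimal standalone sketch of the same idea, not the library code: `Block` is a hypothetical stand-in for GenericBlock, its `name` accessor stands in for `get_name`, and `blocks_by_name` is an illustrative helper so that the example runs without BioRuby.

```ruby
# Hypothetical stand-in for Bio::Nexus::GenericBlock; only the name matters here.
Block = Struct.new(:name)

# Return every block whose name equals +name+ (exact, case-sensitive match),
# preserving the order in which the blocks were parsed.
def blocks_by_name(blocks, name)
  blocks.select { |block| block.name == name }
end

blocks = [Block.new("taxa"), Block.new("trees"), Block.new("taxa")]
puts blocks_by_name(blocks, "taxa").length
```

The comparison is case-sensitive. Since parse lowercases block names before creating blocks, a lowercase name (as in the USAGE section) is presumably what the real method expects.
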
get_characters_blocks()

A convenience method which returns an array of all characters blocks.


Returns

Array of CharactersBlocks

# File lib/bio/db/nexus.rb, line 228
def get_characters_blocks
  get_blocks_by_name( CHARACTERS_BLOCK.chomp( ";").downcase )
end
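
The typed getters all derive the lookup name from a block constant by removing a trailing ";" and lowercasing the result. A small sketch of that string handling follows. The value "Characters;" is an assumption made for illustration (the constant values are not shown on this page), and `lookup_name_for` is a hypothetical helper.

```ruby
# Assumed value for illustration only: block name followed by the ";" delimiter.
ASSUMED_CHARACTERS_BLOCK = "Characters;"

# Strip one trailing ";" (if present) and lowercase the rest.
def lookup_name_for(block_constant)
  block_constant.chomp(";").downcase
end

puts lookup_name_for(ASSUMED_CHARACTERS_BLOCK)
```

String#chomp with an argument removes that suffix only when it is present, so a constant without the delimiter would pass through unchanged apart from the lowercasing.
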
get_data_blocks()

A convenience method which returns an array of all data blocks.


Returns

Array of DataBlocks

# File lib/bio/db/nexus.rb, line 219
def get_data_blocks
  get_blocks_by_name( DATA_BLOCK.chomp( ";").downcase )
end
get_distances_blocks()

A convenience method which returns an array of all distances blocks.


Returns

Array of DistancesBlocks

# File lib/bio/db/nexus.rb, line 246
def get_distances_blocks
  get_blocks_by_name( DISTANCES_BLOCK.chomp( ";").downcase )
end
get_taxa_blocks()

A convenience method which returns an array of all taxa blocks.


Returns

Array of TaxaBlocks

# File lib/bio/db/nexus.rb, line 255
def get_taxa_blocks
  get_blocks_by_name( TAXA_BLOCK.chomp( ";").downcase )
end
get_trees_blocks()

A convenience method which returns an array of all trees blocks.


Returns

Array of TreesBlocks

# File lib/bio/db/nexus.rb, line 237
def get_trees_blocks
  get_blocks_by_name( TREES_BLOCK.chomp( ";").downcase )
end
to_s()

Returns a String listing how many blocks of each type were parsed.


Returns

String

# File lib/bio/db/nexus.rb, line 263
def to_s
  str = String.new
  if get_blocks.length < 1
    str << "empty"
  else 
    str << "number of blocks: " << get_blocks.length.to_s
    if get_characters_blocks.length > 0
      str << " [characters blocks: " << get_characters_blocks.length.to_s << "] "
    end  
    if get_data_blocks.length > 0
      str << " [data blocks: " << get_data_blocks.length.to_s << "] "
    end
    if get_distances_blocks.length > 0
      str << " [distances blocks: " << get_distances_blocks.length.to_s << "] "
    end  
    if get_taxa_blocks.length > 0
      str << " [taxa blocks: " << get_taxa_blocks.length.to_s << "] "
    end    
    if get_trees_blocks.length > 0
      str << " [trees blocks: " << get_trees_blocks.length.to_s << "] "
    end        
  end
  str
end
Also aliased as: to_str
to_str()
Alias for: to_s

Private Instance Methods

add_token_to_matrix( token, scan_token, matrix, row, col )

Helper method for make_matrix.


Arguments:

  • (required) token: String

  • (required) scan_token: true or false - add whole token or scan into chars

  • (required) matrix: NexusMatrix - the matrix to which to add token

  • (required) row: Integer - the row for matrix

  • (required) col: Integer - the starting column

Returns

Integer - ending column

# File lib/bio/db/nexus.rb, line 686
def add_token_to_matrix( token, scan_token, matrix, row, col )
  if ( scan_token )
    token.scan(/./) { |w|
      col += 1
      matrix.set_value( row, col, w )
    }
  else
    col += 1
    matrix.set_value( row, col, token )
  end
  col
end
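
The helper above either writes the whole token into the next column or spreads it over one column per character. A minimal standalone sketch of that behaviour follows; a Hash keyed by [row, col] is used as a hypothetical stand-in for NexusMatrix, and `add_token` is an illustrative name.

```ruby
# Add +token+ to +matrix+ (a Hash keyed by [row, col]) in +row+, starting in
# the column after +col+. With scan_token the token is split into single
# characters, one per column; otherwise the whole token fills one column.
# Returns the index of the last column written.
def add_token(token, scan_token, matrix, row, col)
  pieces = scan_token ? token.scan(/./) : [token]
  pieces.each do |piece|
    col += 1
    matrix[[row, col]] = piece
  end
  col
end

matrix = {}
add_token("ACG", true, matrix, 0, 0)   # sequence data: columns 1, 2 and 3
add_token("0.5", false, matrix, 1, 0)  # distance value: column 1 only
p matrix
```

Column 0 is left untouched here because the caller reserves it for the row name.
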
begin_block()

Operations required when the beginning of a block is encountered.


# File lib/bio/db/nexus.rb, line 341
def begin_block() 
  if @current_block_name != nil
    raise NexusParseError, "Cannot have nested nexus blocks (\"end;\" might be missing)"
  end
  reset_command_state()
end
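cmds_equal_to?( command, subcommand )

Returns true if @current_cmd == command and @current_subcmd == subcommand, false otherwise.


Arguments:

  • (required) command: String - compared with @current_cmd

  • (required) subcommand: String or nil - compared with @current_subcmd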

Returns

true or false

# File lib/bio/db/nexus.rb, line 736
def cmds_equal_to?( command, subcommand )
  return ( @current_cmd == command && @current_subcmd == subcommand )
end
create_block()

Creates GenericBlock (or any of its subclasses) the type of which is determined by the state of @current_block_name.


Returns

GenericBlock (or any of its subclasses) object

# File lib/bio/db/nexus.rb, line 395
def create_block()
  case @current_block_name
    when TAXA_BLOCK.downcase
      return Bio::Nexus::TaxaBlock.new( @current_block_name )
    when CHARACTERS_BLOCK.downcase
      return Bio::Nexus::CharactersBlock.new( @current_block_name )
    when DATA_BLOCK.downcase
      return Bio::Nexus::DataBlock.new( @current_block_name )
    when DISTANCES_BLOCK.downcase
      return Bio::Nexus::DistancesBlock.new( @current_block_name )
    when TREES_BLOCK.downcase
      return Bio::Nexus::TreesBlock.new( @current_block_name )
    else
      return Bio::Nexus::GenericBlock.new( @current_block_name )
  end 
end
end_block()

Operations required when the end of a block is encountered.


# File lib/bio/db/nexus.rb, line 351
def end_block()
  if @current_block_name == nil
    raise NexusParseError, "Cannot have two or more \"end;\" tokens in sequence"
  end
  @current_block_name = nil
end
equal?( str1, str2 )

Returns true if Strings str1 and str2 are equal, ignoring case. Returns false if either argument is nil.


Arguments:

  • (required) str1: String

  • (required) str2: String

Returns

true or false

# File lib/bio/db/nexus.rb, line 721
def equal?( str1, str2 )
  if ( str1 == nil || str2 == nil )
    return false
  else
    return ( str1.downcase == str2.downcase )
  end
end
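
The comparison above can be tried in isolation. This is a minimal sketch under a different, hypothetical name (`same_ignoring_case?`) so that it can be defined at top level without touching Object#equal?.

```ruby
# Nil-safe, case-insensitive String comparison.
def same_ignoring_case?(str1, str2)
  return false if str1.nil? || str2.nil?
  str1.downcase == str2.downcase
end

puts same_ignoring_case?("NTax", "ntax")
puts same_ignoring_case?(nil, "ntax")
```

For two non-nil Strings, String#casecmp? (Ruby 2.4 and later) offers a similar comparison based on Unicode case folding.
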
make_matrix( token, ary, size, scan_token = false )

Makes a NexusMatrix out of token and the following tokens from the token Array ary. Used by the process_token_for_X_block methods for blocks which contain data in matrix form. Column 0 contains names. This will shift tokens from ary.


Arguments:

  • (required) token: String

  • (required) ary: Array

  • (required) size: Integer

  • (optional) scan_token: true or false

Returns

NexusMatrix

# File lib/bio/db/nexus.rb, line 647
def make_matrix( token, ary, size, scan_token = false )
  matrix = NexusMatrix.new
  col = -1
  row = 0
  done = false
  while ( !done )
    if ( col == -1 )
      # name
      col = 0
      matrix.set_value( row, col, token ) # name is in col 0  
    else
      # values
      col = add_token_to_matrix( token, scan_token, matrix, row, col )
      if ( col == size.to_i  )
        col = -1
        row += 1
      end
    end
    token = ary.shift
    if ( token.index( DELIMITER ) != nil )
      col = add_token_to_matrix( token.chomp( ";" ), scan_token, matrix, row, col )
      done = true
    end
  end # while
  matrix
end
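
The row-filling loop above can be sketched as follows. This is a simplified standalone illustration, not the library code: a Hash keyed by [row, col] stands in for NexusMatrix, `fill_matrix` is a hypothetical name, and the first row name is taken from the token array itself rather than passed separately. Like the original, it shifts tokens off the array it is given and stops at the first token that ends with ";".

```ruby
# Build a matrix (Hash keyed by [row, col]) from +tokens+. Column 0 holds the
# row name; a row is complete once +size+ value columns have been filled.
def fill_matrix(tokens, size, scan_token = false)
  matrix = {}
  row = 0
  col = -1 # -1 means: the next token is a row name
  while (token = tokens.shift)
    last  = token.end_with?(";")
    token = token.chomp(";")
    if col == -1
      col = 0
      matrix[[row, col]] = token
    else
      pieces = scan_token ? token.scan(/./) : [token]
      pieces.each do |piece|
        col += 1
        matrix[[row, col]] = piece
      end
      if col == size
        col = -1
        row += 1
      end
    end
    break if last
  end
  matrix
end

tokens = %w[fish AC GT frog AG TT; End;]
matrix = fill_matrix(tokens, 4, true)
p matrix[[1, 0]]
p tokens
```

Because the tokens are consumed in place, whatever follows the matrix (here the "End;" token) is left in the array for the caller to process.
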
parse( str )

The master method for parsing. Stores the resulting blocks in array @blocks.


Arguments:

  • (required) str: String - the nexus formatted data to be parsed

# File lib/bio/db/nexus.rb, line 297
def parse( str )
  str = str.chop if str[-1..-1] == ';'
  ary = str.split(/[\s+=]/)
  ary.collect! { |x| x.strip!; x.empty? ? nil : x }
  ary.compact!
  in_comment = false
  comment_level = 0
 
  # Main loop
  while token = ary.shift
    # Quotes:
    if ( token.index( SINGLE_QUOTE ) == 0 ||
         token.index( DOUBLE_QUOTE ) == 0 )
      token << "_" << ary.shift
      token = token.chop if token[-1..-1] == ';'
      token = token.slice( 1, token.length - 2 )
    end
    # Comments: 
    open = token.count( BEGIN_COMMENT )
    close = token.count( END_COMMENT )
    comment = comment_level > 0
    comment_level = comment_level + open - close
    if ( open > 0 && open == close  )
      next
    elsif comment_level > 0 || comment
      next
    elsif equal?( token, END_BLOCK )
      end_block()
    elsif equal?( token, BEGIN_BLOCK )
      begin_block()
      @current_block_name = token = ary.shift
      @current_block_name.downcase!
      @current_block = create_block()
      @blocks.push( @current_block )
    elsif ( @current_block_name != nil )  
      process_token( token.chomp( DELIMITER ), ary )
    end
  end # main loop
  @blocks.compact!
end
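
Two steps of the parser above, tokenizing and skipping bracketed comments, can be tried in isolation. The sketch below is a standalone illustration with hypothetical helper names (`tokenize`, `strip_comments`); it reuses the same split pattern and the same open/close bracket counting, but leaves out quote handling and block dispatch.

```ruby
# Split on whitespace, "+" and "=" (inside a character class "+" is a literal
# character), then drop the empty strings left by runs of separators.
def tokenize(str)
  str.split(/[\s+=]/).reject(&:empty?)
end

# Drop every token that belongs to a (possibly nested) [ ... ] comment.
def strip_comments(tokens)
  level = 0
  tokens.reject do |token|
    opened   = token.count("[")
    closed   = token.count("]")
    was_open = level > 0
    level   += opened - closed
    # Skip tokens that hold a whole comment, that leave a comment open,
    # or that close a comment opened by an earlier token.
    (opened > 0 && opened == closed) || level > 0 || was_open
  end
end

tokens = tokenize("TaxLabels fish [a comment] frog [x] cow;")
p strip_comments(tokens)
```

Counting brackets per token keeps the loop simple, at the cost that text sharing a token with a bracket (no whitespace in between) is dropped together with the comment.
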
process_token( token, ary )

This calls the various process_token_for_<name>_block methods depending on the state of @current_block_name.


Arguments:

  • (required) token: String

  • (required) ary: Array

# File lib/bio/db/nexus.rb, line 365
def process_token( token, ary )
  case @current_block_name
    when TAXA_BLOCK.downcase
      process_token_for_taxa_block( token )
    when CHARACTERS_BLOCK.downcase
      process_token_for_character_block( token, ary )
    when DATA_BLOCK.downcase
      process_token_for_data_block( token, ary )
    when DISTANCES_BLOCK.downcase
      process_token_for_distances_block( token, ary )
    when TREES_BLOCK.downcase  
      process_token_for_trees_block( token, ary )
    else
      process_token_for_generic_block( token )  
  end
end
process_token_for_character_block( token, ary )

This processes the tokens (between Begin Characters; and End;) for a characters block. Example of a currently parseable characters block:

Begin Characters;
Dimensions NChar=20 NTax=4;
Format DataType=DNA Missing=x Gap=- MatchChar=.;
Matrix
fish  ACATA GAGGG TACCT CTAAG
frog  ACTTA GAGGC TACCT CTAGC
snake ACTCA CTGGG TACCT TTGCG
mouse ACTCA GACGG TACCT TTGCG;
End;


Arguments:

  • (required) token: String

  • (required) ary: Array

# File lib/bio/db/nexus.rb, line 458
def process_token_for_character_block( token, ary )
  if ( equal?( token, DIMENSIONS ) )
    @current_cmd    = DIMENSIONS
    @current_subcmd = nil
  elsif ( equal?( token, FORMAT ) )
    @current_cmd    = FORMAT
    @current_subcmd = nil  
  elsif ( equal?( token, MATRIX ) )
    @current_cmd    = MATRIX
    @current_subcmd = nil
  elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
    @current_subcmd = NTAX
  elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
    @current_subcmd = NCHAR
  elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
    @current_subcmd = DATATYPE
  elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) )
    @current_subcmd = CharactersBlock::MISSING 
  elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) )
    @current_subcmd = CharactersBlock::GAP
  elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) )
    @current_subcmd = CharactersBlock::MATCHCHAR  
  elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
    @current_block.set_number_of_taxa( token )
  elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
    @current_block.set_number_of_characters( token )  
  elsif ( cmds_equal_to?( FORMAT, DATATYPE ) )
    @current_block.set_datatype( token )
  elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) )
    @current_block.set_missing( token )
  elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) )
    @current_block.set_gap_character( token )
  elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) )
    @current_block.set_match_character( token )  
  elsif ( cmds_equal_to?( MATRIX, nil ) )
    @current_block.set_matrix( make_matrix( token, ary,
                               @current_block.get_number_of_characters, true ) )
  end
end
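
The method above is a small state machine: a command word sets @current_cmd, a subcommand word sets @current_subcmd, and any other token is treated as the value for the current command and subcommand pair. A reduced standalone sketch covering only the Dimensions command follows. The class name `DimensionsReader` and its accessors are hypothetical, and values are converted to Integer here, whereas the original passes the raw token to the block's setters.

```ruby
# Minimal command/subcommand state machine for a "Dimensions" command.
class DimensionsReader
  attr_reader :ntax, :nchar

  def initialize
    @cmd    = nil
    @subcmd = nil
    @ntax   = nil
    @nchar  = nil
  end

  # Feed one token; keywords are matched case-insensitively.
  def feed(token)
    token = token.chomp(";")
    word  = token.downcase
    if word == "dimensions"
      @cmd    = :dimensions
      @subcmd = nil
    elsif @cmd == :dimensions && word == "ntax"
      @subcmd = :ntax
    elsif @cmd == :dimensions && word == "nchar"
      @subcmd = :nchar
    elsif @cmd == :dimensions && @subcmd == :ntax
      @ntax = token.to_i
    elsif @cmd == :dimensions && @subcmd == :nchar
      @nchar = token.to_i
    end
    self
  end
end

reader = DimensionsReader.new
%w[Dimensions NChar 20 NTax 4;].each { |token| reader.feed(token) }
puts "ntax=#{reader.ntax} nchar=#{reader.nchar}"
```

The order of the branches matters: keyword checks come before value checks, so a keyword is never mistaken for a value.
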
process_token_for_data_block( token, ary )

This processes the tokens (between Begin Data; and End;) for a data block. Example of a currently parseable data block:

Begin Data;
Dimensions ntax=5 nchar=14;
Format Datatype=RNA gap=# MISSING=x MatchChar=^;
TaxLabels ciona cow [comment] ape 'purple urchin' "green lizard";
Matrix
taxon_1 A- CCGTCGA-GTTA
taxon_2 T- CCG-CGA-GATA
taxon_3 A- C-GTCGA-GATA
taxon_4 A- CCTCGA--GTTA
taxon_5 T- CGGTCGT-CTTA;
End;


Arguments:

  • (required) token: String

  • (required) ary: Array

# File lib/bio/db/nexus.rb, line 591
def process_token_for_data_block( token, ary )
  if ( equal?( token, DIMENSIONS ) )
    @current_cmd    = DIMENSIONS
    @current_subcmd = nil
  elsif ( equal?( token, FORMAT ) )
    @current_cmd    = FORMAT
    @current_subcmd = nil
  elsif ( equal?( token, TAXLABELS ) )
    @current_cmd    = TAXLABELS
    @current_subcmd = nil  
  elsif ( equal?( token, MATRIX ) )
    @current_cmd    = MATRIX
    @current_subcmd = nil
  elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
    @current_subcmd = NTAX
  elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
    @current_subcmd = NCHAR
  elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
    @current_subcmd = DATATYPE
  elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) )
    @current_subcmd = CharactersBlock::MISSING 
  elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) )
    @current_subcmd = CharactersBlock::GAP
  elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) )
    @current_subcmd = CharactersBlock::MATCHCHAR  
  elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
    @current_block.set_number_of_taxa( token )
  elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
    @current_block.set_number_of_characters( token )  
  elsif ( cmds_equal_to?( FORMAT, DATATYPE ) )
    @current_block.set_datatype( token )
  elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) )
    @current_block.set_missing( token )
  elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) )
    @current_block.set_gap_character( token )
  elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) )
    @current_block.set_match_character( token )
  elsif ( cmds_equal_to?( TAXLABELS, nil ) )
    @current_block.add_taxon( token ) 
  elsif ( cmds_equal_to?( MATRIX, nil ) )
    @current_block.set_matrix( make_matrix( token, ary,
                               @current_block.get_number_of_characters, true ) )
  end
end
process_token_for_distances_block( token, ary )

This processes the tokens (between Begin Distances; and End;) for a distances block. Example of a currently parseable distances block:

Begin Distances;
Dimensions nchar=20 ntax=5;
Format Triangle=Upper;
Matrix
taxon_1 0.0 1.0 2.0 4.0 7.0
taxon_2 1.0 0.0 3.0 5.0 8.0
taxon_3 3.0 4.0 0.0 6.0 9.0
taxon_4 7.0 3.0 1.0 0.0 9.5
taxon_5 1.2 1.3 1.4 1.5 0.0;
End;


Arguments:

  • (required) token: String

  • (required) ary: Array

# File lib/bio/db/nexus.rb, line 542
def process_token_for_distances_block( token, ary )
  if ( equal?( token, DIMENSIONS ) )
    @current_cmd    = DIMENSIONS
    @current_subcmd = nil
  elsif ( equal?( token, FORMAT ) )
    @current_cmd    = FORMAT
    @current_subcmd = nil  
  elsif ( equal?( token, MATRIX ) )
    @current_cmd    = MATRIX
    @current_subcmd = nil
  elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
    @current_subcmd = NTAX
  elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
    @current_subcmd = NCHAR
  elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
    @current_subcmd = DATATYPE
  elsif ( @current_cmd == FORMAT && equal?( token, DistancesBlock::TRIANGLE ) )
    @current_subcmd = DistancesBlock::TRIANGLE   
  elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
    @current_block.set_number_of_taxa( token )
  elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
    @current_block.set_number_of_characters( token )  
  elsif ( cmds_equal_to?( FORMAT, DistancesBlock::TRIANGLE ) )
    @current_block.set_triangle( token )
  elsif ( cmds_equal_to?( MATRIX, nil ) )
    @current_block.set_matrix( make_matrix( token, ary,
                               @current_block.get_number_of_taxa, false ) )
  end
end
process_token_for_generic_block( token )

This processes the tokens (between Begin Taxa; and End;) for a block for which a specific parser is not available. Example of a currently parseable generic block:

Begin Taxa;
token1 token2 token3 ...
End;


Arguments:

  • (required) token: String

# File lib/bio/db/nexus.rb, line 709
def process_token_for_generic_block( token )
    @current_block.add_token( token )
end
process_token_for_taxa_block( token )

This processes the tokens (between Begin Taxa; and End;) for a taxa block. Example of a currently parseable taxa block:

Begin Taxa;
Dimensions NTax=4;
TaxLabels fish [comment] 'african frog' "rat snake" 'red mouse';
End;


Arguments:

  • (required) token: String

# File lib/bio/db/nexus.rb, line 422
def process_token_for_taxa_block( token )
  if ( equal?( token, DIMENSIONS ) )
    @current_cmd    = DIMENSIONS
    @current_subcmd = nil
  elsif ( equal?( token, TAXLABELS ) )
    @current_cmd    = TAXLABELS
    @current_subcmd = nil
  elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
    @current_subcmd = NTAX
  elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
    @current_block.set_number_of_taxa( token )
  elsif ( cmds_equal_to?( TAXLABELS, nil ) )
    @current_block.add_taxon( token )
  end
end
process_token_for_trees_block( token, ary )

This processes the tokens (between Begin Trees; and End;) for a trees block. Example of a currently parseable trees block:

Begin Trees;
Tree best=(fish,(frog,(snake, mouse)));
Tree other=(snake,(frog,( fish, mouse)));
End;


Arguments:

  • (required) token: String

  • (required) ary: Array

# File lib/bio/db/nexus.rb, line 509
def process_token_for_trees_block( token, ary )
  if ( equal?( token, TreesBlock::TREE ) )
    @current_cmd    = TreesBlock::TREE
    @current_subcmd = nil
  elsif ( cmds_equal_to?( TreesBlock::TREE, nil ) )
    @current_block.add_tree_name( token )
    tree_string = ary.shift
    while ( tree_string.index( ";" ) == nil )
      tree_string << ary.shift
    end
    @current_block.add_tree( tree_string )
    @current_cmd    = nil
  end  
end
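
Because the parser splits on whitespace, a tree description containing spaces arrives as several tokens. The loop above glues them back together until the terminating ";" appears. A standalone sketch of that step, with the hypothetical helper name `take_tree_string`:

```ruby
# Rejoin a tree string that whitespace tokenization split apart: keep
# shifting tokens off +tokens+ until the collected string contains ";".
def take_tree_string(tokens)
  tree = tokens.shift.dup
  tree << tokens.shift until tree.include?(";")
  tree
end

tokens = ["(snake,(frog,(", "fish,", "mouse)));", "End;"]
puts take_tree_string(tokens)
p tokens
```

As in the original loop, a tree without a terminating ";" exhausts the token array, at which point appending nil raises a TypeError.
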
reset_command_state()

Resets @current_cmd and @current_subcmd to nil.


# File lib/bio/db/nexus.rb, line 385
def reset_command_state()
  @current_cmd    = nil
  @current_subcmd = nil
end